Lexical Analysis
Lexical Analysis Steps
- The source code is scanned and any parts of it that are unnecessary to the compiler, such as comments and spaces, are removed.
- Keywords and symbols are then replaced with 'tokens'. A token is a generic description of the type of symbol. For example, a constant called 'months_in_year' would be replaced by the token CONSTANT; variables called 'speed' and 'distance' would be replaced with the token VARIABLE; the keyword 'IF' would be replaced with the selection token for 'IF'; and a comparison operator such as '>' would be replaced with the token OPERATOR.
- By replacing symbols, variables, keywords and so on with generic tokens, you turn a program into a set of 'patterns' of instructions. It is then a relatively easy job for the compiler, in the syntax analysis stage, to check each pattern against the allowable ones.
- Tokens replace:
- Keywords such as IF, FOR, PRINT and THEN.
- Symbols with a fixed meaning, such as + and *.
- Numeric constants.
- User-defined variable names.
Example
- For example, if you had the following line in a program: IF x > 10 THEN, it might be changed into this pattern of tokens:
- SELECTION/IF-VARIABLE-OPERATOR-CONSTANT-SELECTION/THEN
- Notice that the spaces have been removed and generic descriptions (tokens) have replaced keywords, variables, symbols and numeric constants.
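To make this step concrete, here is a minimal sketch of a tokeniser in Python. The token names, the keyword list and the helper function are illustrative assumptions rather than part of any real compiler:

```python
# A minimal sketch of lexical analysis, assuming a tiny language with
# IF/THEN keywords, single-character operators, integer constants and
# simple variable names. All names here are illustrative.

KEYWORD_TOKENS = {"IF": "SELECTION/IF", "THEN": "SELECTION/THEN"}
OPERATORS = {">", "<", "+", "*", "="}

def tokenise(line):
    """Split a source line on whitespace and replace each lexeme
    with a generic token describing its type."""
    tokens = []
    for lexeme in line.split():          # splitting discards the spaces
        if lexeme in KEYWORD_TOKENS:
            tokens.append(KEYWORD_TOKENS[lexeme])
        elif lexeme in OPERATORS:
            tokens.append("OPERATOR")
        elif lexeme.isdigit():
            tokens.append("CONSTANT")
        else:
            tokens.append("VARIABLE")
    return tokens

print("-".join(tokenise("IF x > 10 THEN")))
# Prints: SELECTION/IF-VARIABLE-OPERATOR-CONSTANT-SELECTION/THEN
```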
- A look-up table (often called a symbol table) is also created. This stores the value of each constant used, along with its data type.
- Variable names are entered into the look-up table as well.
- Once you have lexically analysed your program, you end up with a string of tokens. This is passed to the syntax analysis stage.
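As a sketch of how the look-up table and the token string fit together, the hypothetical tokeniser above can be extended to record an entry for each variable name and constant as it is encountered. The (name, type, value) entry format is an assumption for illustration:

```python
# A sketch of building a look-up table alongside the token string.
# The entry format (name, type, value) is an illustrative assumption.

def lex(line):
    tokens, lookup_table = [], []
    for lexeme in line.split():
        if lexeme in ("IF", "THEN"):
            tokens.append("SELECTION/" + lexeme)
        elif lexeme in (">", "<", "+", "*", "="):
            tokens.append("OPERATOR")
        elif lexeme.isdigit():
            tokens.append("CONSTANT")
            # Record the constant's value and data type.
            lookup_table.append((lexeme, "INTEGER", int(lexeme)))
        else:
            tokens.append("VARIABLE")
            # Record the variable name; its type may be filled in later.
            lookup_table.append((lexeme, "VARIABLE", None))
    return tokens, lookup_table

tokens, table = lex("IF x > 10 THEN")
print(tokens)  # The token string handed on to syntax analysis
print(table)   # [('x', 'VARIABLE', None), ('10', 'INTEGER', 10)]
```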