LennyGonz/Pascal-Compiler

Pascal Compiler

This is a Pascal compiler built using Python 2.7

Running this Compiler

Enter the command: python run.py

Inside the run.py file there are 5 examples:

  1. array example
  2. assignment example
  3. for-loop example
  4. if-loop example
  5. while-loop example
The array example runs by default. To see another example, open run.py, comment out the array example, and uncomment exactly ONE other command.

Resources used for this project:

Compiler Theory

Parsing Methods

How does an interpreter/compiler work?

A very simple form of a compiler/interpreter:

Source File ==> Scanner ==> Lexer ==> Parser ==> Interpreter/Code Generator

  1. Source File: This is the program that is read by the interpreter/compiler. This is the text that needs to be compiled or interpreted.

  2. Scanner: This is the first module in a compiler/interpreter.

    • The job of a scanner is to read the source file, one character at a time.
    • It also keeps track of which line number and character is currently being read.
    • A typical scanner can be instructed to move backwards and forwards through the source file.
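      • Why do we need to move backwards? Because the lexer sometimes has to read one character past the end of a token to find out where the token ends (see the Lexer section).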
    • For now assume that each time the scanner is called:
      • it returns the next character in the file
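
The scanner behaviour described above can be sketched roughly as follows. This is only an illustration, not the code from this repository: the class name `Scanner` and the methods `next_char` and `back_up` are assumed names chosen for the example.

```python
class Scanner(object):
    """Hypothetical scanner: hands out one character at a time and
    remembers the line and column of the next character."""

    def __init__(self, text):
        self.text = text
        self.pos = 0    # index of the next character to return
        self.line = 1   # 1-based line number of the next character
        self.col = 1    # 1-based column of the next character

    def next_char(self):
        """Return the next character, or None at the end of the file."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch

    def back_up(self):
        """Move backwards one character so it is returned again."""
        if self.pos == 0:
            return
        self.pos -= 1
        if self.text[self.pos] == "\n":
            # We stepped back onto the previous line: recompute the column.
            self.line -= 1
            line_start = self.text.rfind("\n", 0, self.pos) + 1
            self.col = self.pos - line_start + 1
        else:
            self.col -= 1


scanner = Scanner("cx = cy;\nprint cx;")
first = scanner.next_char()   # "c"
scanner.back_up()             # the same "c" will be returned again
```

Keeping the line and column inside the scanner means later stages can ask it where an error happened without tracking positions themselves.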
  3. Lexer: This module breaks the source file up into chunks (called tokens). It calls the scanner to get characters one at a time and organizes them into:

    • tokens
    • token types

For example, consider this source:

cx = cy + 324;
print "value of cx is ", cx;

A lexer would break it like this:

cx                 --> Identifier       (variable)
=                  --> Symbol           (assignment operator)
cy                 --> Identifier       (variable)
+                  --> Symbol           (addition operator)
324                --> Numeric Constant (integer)
;                  --> Symbol           (end of statement)
print              --> Identifier       (keyword)
"value of cx is "  --> String Constant  (string)
,                  --> Symbol           (string concatenation operator)
cx                 --> Identifier       (variable)
;                  --> Symbol           (end of statement)
  • The lexer calls the scanner, which passes it one character at a time
  • The lexer then groups the characters together and identifies them as tokens for the language parser (which is the next stage)
  • So, basically, TOKENS are characters grouped together
  • The lexer also identifies the type of token:
    • variable vs keyword
    • assignment operator vs addition operator vs string concatenation operator, etc.
  • Occasionally, the lexer has to tell the scanner to back up.
    • Consider a language that has operators that may be more than one character long
      • For example ! vs !=
      • < vs <=
      • + vs ++
  • Suppose the lexer needs to determine whether the operator is < or <=. It asks the scanner for one more character.
  • If the next character is '=', it changes the token to "<=" and passes it to the parser
  • If not, it tells the scanner to back up one character, so that character is read again as part of the next token, and passes the '<' to the parser.
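
The grouping and back-up steps above could look something like the sketch below. It is a simplified illustration under several assumptions, not this project's lexer: the names `MiniScanner`, `read_while`, `tokenize`, `KEYWORDS` and `TWO_CHAR_SYMBOLS` are invented for the example, and keywords get their own "Keyword" type instead of being listed as identifiers.

```python
KEYWORDS = {"print"}                     # assumed keyword list
TWO_CHAR_SYMBOLS = {"<=", "!=", "++"}    # assumed two-character operators


class MiniScanner(object):
    """Bare-bones stand-in for the scanner (no line tracking)."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def next_char(self):
        ch = self.text[self.pos] if self.pos < len(self.text) else None
        self.pos += 1
        return ch

    def back_up(self):
        self.pos -= 1


def read_while(scanner, first, accept):
    """Collect characters while accept(ch) holds, then back up one."""
    text = first
    ch = scanner.next_char()
    while ch is not None and accept(ch):
        text += ch
        ch = scanner.next_char()
    scanner.back_up()   # we read one character too many
    return text


def tokenize(source):
    """Return a list of (token type, token text) pairs."""
    scanner = MiniScanner(source)
    tokens = []
    ch = scanner.next_char()
    while ch is not None:
        if ch.isspace():
            pass  # whitespace only separates tokens
        elif ch.isalpha():
            word = read_while(scanner, ch, str.isalnum)
            kind = "Keyword" if word in KEYWORDS else "Identifier"
            tokens.append((kind, word))
        elif ch.isdigit():
            tokens.append(("Numeric Constant",
                           read_while(scanner, ch, str.isdigit)))
        elif ch == '"':
            body = ""
            ch = scanner.next_char()
            while ch is not None and ch != '"':
                body += ch
                ch = scanner.next_char()
            tokens.append(("String Constant", body))
        else:
            nxt = scanner.next_char()
            if nxt is not None and ch + nxt in TWO_CHAR_SYMBOLS:
                tokens.append(("Symbol", ch + nxt))
            else:
                scanner.back_up()   # not a two-character operator
                tokens.append(("Symbol", ch))
        ch = scanner.next_char()
    return tokens


print(tokenize("a <= b"))
# [('Identifier', 'a'), ('Symbol', '<='), ('Identifier', 'b')]
```

Note that `back_up` is used in two places: after reading one character past a word or number, and after peeking for the second half of an operator such as <=.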
  4. Parser: This is the part of the compiler that really understands the syntax of the language
  • It calls the lexer to get tokens and processes the tokens per the syntax of the language
  • For example, taking the example from the lexer above, the hypothetical interaction between the lexer and parser could go like this:

Parser: Give me the next token
Lexer : Next token is "cx" which is a variable.
Parser: Ok, I have "cx" as a declared integer variable. Give me next token
Lexer : Next token is "=", the assignment operator.
Parser: Ok, the program wants me to assign something to "cx". Next token
------> Lexer : The next token is "cy" which is a variable.
------> Parser: Ok, I know "cy" is an integer variable. Next token please
------> Lexer : The next token is '+', which is an addition operator.
------> Parser: Ok, so I need to add something to the value in "cy". Next token please.
--------------> Lexer: The next token is "324", which is an integer.
--------------> Parser: Ok, both "cy" and "324" are integers, so I can add them. Next token please:
--------------> Lexer: The next token is ";" which is end of statement.
------> Parser: Ok, I will evaluate "cy + 324" and get the answer
Parser: I'll take the answer from "cy + 324" and assign it to "cx"


In the section above, the indentation shows a subprocess that the parser enters to evaluate "cy + 324". This gives an idea of how the parser operates.

  • Also note that the parser is checking types and syntax rules (for instance, it checked whether cy and 324 were both integer types before adding them).
  • If the parser gets a token that it was not expecting, it will stop processing and complain to the user about an error.
  • The scanner holds the current line number and character, so the parser can inform the user approximately where the error occurred.
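
The conversation above maps naturally onto a small recursive-descent parser. The sketch below is a hedged illustration only: `Parser`, `ParseError` and the method names are made up for this example, tokens are assumed to arrive as (type, text, line) tuples, every variable is assumed to be a declared integer, and the parser evaluates the statement directly, the way an interpreter would.

```python
class ParseError(Exception):
    """Raised when the parser gets a token it was not expecting."""


class Parser(object):
    def __init__(self, tokens, variables):
        self.tokens = tokens        # list of (type, text, line) tuples
        self.pos = 0
        self.variables = variables  # declared integer variables: name -> value

    def next_token(self):
        if self.pos >= len(self.tokens):
            raise ParseError("unexpected end of input")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect_symbol(self, symbol):
        kind, text, line = self.next_token()
        if (kind, text) != ("Symbol", symbol):
            raise ParseError("line %d: expected %r but got %r"
                             % (line, symbol, text))

    def parse_assignment(self):
        """assignment := identifier '=' expression ';'"""
        kind, name, line = self.next_token()
        if kind != "Identifier" or name not in self.variables:
            raise ParseError("line %d: %r is not a declared variable"
                             % (line, name))
        self.expect_symbol("=")
        value = self.parse_expression()   # the indented "subprocess"
        self.expect_symbol(";")
        self.variables[name] = value

    def parse_expression(self):
        """expression := operand ('+' operand)*"""
        value = self.parse_operand()
        while (self.pos < len(self.tokens)
               and self.tokens[self.pos][:2] == ("Symbol", "+")):
            self.pos += 1
            value += self.parse_operand()
        return value

    def parse_operand(self):
        """operand := integer constant | declared integer variable"""
        kind, text, line = self.next_token()
        if kind == "Numeric Constant":
            return int(text)
        if kind == "Identifier" and text in self.variables:
            return self.variables[text]
        raise ParseError("line %d: expected an integer operand but got %r"
                         % (line, text))


tokens = [("Identifier", "cx", 1), ("Symbol", "=", 1),
          ("Identifier", "cy", 1), ("Symbol", "+", 1),
          ("Numeric Constant", "324", 1), ("Symbol", ";", 1)]
parser = Parser(tokens, {"cx": 0, "cy": 10})
parser.parse_assignment()
print(parser.variables["cx"])   # 334
```

An unexpected token, for example `cx = + ;`, makes `parse_operand` raise `ParseError` with the line number taken from the offending token.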
  5. Interpreter/Code Generator: This is the part that actually takes the action that is specified by a program statement.
  • In some cases, this is actually part of the parser (especially for interpreters)
    • The parser interprets and takes action directly
  • In other cases, the parser converts the statements into byte-code
  • In the case of a compiler, it then hands them to the Code Generator to convert into machine code instructions
  • If you want a compiler for a different CPU or architecture, all you have to do is plug in a new code generator that translates the same byte-code into machine code for the new CPU
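
To make the split between byte-code and code generation concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the four-instruction byte-code (`LOAD`, `PUSH`, `ADD`, `STORE`), the function names `run` and `generate`, and the two "targets", whose mnemonics are invented and do not correspond to any real CPU.

```python
# Hypothetical byte-code a parser might emit for:  cx = cy + 324;
BYTECODE = [
    ("LOAD", "cy"),    # push the value of variable cy
    ("PUSH", 324),     # push the constant 324
    ("ADD", None),     # pop two values, push their sum
    ("STORE", "cx"),   # pop a value into variable cx
]


def run(bytecode, variables):
    """Interpreter back end: execute the byte-code on a small stack."""
    stack = []
    for op, arg in bytecode:
        if op == "LOAD":
            stack.append(variables[arg])
        elif op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            variables[arg] = stack.pop()
        else:
            raise ValueError("unknown instruction: %s" % op)
    return variables


# Two made-up instruction sets, one template per byte-code instruction.
TARGET_A = {"LOAD": "push [{arg}]", "PUSH": "push {arg}",
            "ADD": "add", "STORE": "pop [{arg}]"}
TARGET_B = {"LOAD": "LDV {arg}", "PUSH": "LDC {arg}",
            "ADD": "SUM", "STORE": "STV {arg}"}


def generate(bytecode, target):
    """Code generator back end: translate byte-code for one target."""
    return [target[op].format(arg=arg) for op, arg in bytecode]


print(run(BYTECODE, {"cx": 0, "cy": 10})["cx"])   # 334
print(generate(BYTECODE, TARGET_A))
# ['push [cy]', 'push 324', 'add', 'pop [cx]']
```

Only the template table changes between `TARGET_A` and `TARGET_B`; the scanner, lexer, parser and byte-code stay the same, which is the point of putting a byte-code layer in front of the code generator.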