Compiler Studio

Introduction:
Compiler Studio generates a Recursvie Decent Predictive Parser for LL(1) grammar. Semantic Rules are supported through Translation Schemes. It Also include A Lexical Analyzer Generator which generates a lexical table for given token defintion.

Background Information:
Compiler Studio was developed as a course project for Compiler Construction course during my BS at NU-FAST. The actual project was to develop an LL(1) Compiler that will accept programs written for a subset of Pascal grammar. I found the course very much interesting , and made it as a parser generator and infact generated the actual project with it.

The target compiler was expected to cover three major phases of Compilation excluding Binary Code Generation .

  1. Lexical Analysis
  2. Parsing ( Syntax Analysis)
  3. Semantic Analysis

Description & Usage:
There are 3 main components of the Compiler Studio,

  1. Lexical Analyzer Generator
  2. Parser Generator
  3. Editor for Parsing and lexical analyze a given file

1) Lexical Ananlyzer Generator:
To use Lexical Analyzer Generator, open a Token Definition file (*.gla) by opening a new file or open an existing one. A form will appear; here u can enter the Token definition for the lexical analyzer. The first column is for the regular expression to match lexemes, the second column is the name of token, third column specify whether this will be generated by lex or not(suitable for space etc),the fourth and fifth column specify number of characters to be trimmed from the lexeme from start and end respectively . You should specify all possible regular expression for a file , that means if there is a lexeme with no regular expression then Analyzer will hang miserably.
The regular expression are of very limited (I have designed my own regular expression parser instead of .net’s).You can use only ‘*’, ‘|’, ‘(’, ‘)’ and any character to form regular expression. ‘-’, ‘?’ are not supported.
The Format of gla file is such that it consist of token definition ,each token definition is in following format
(A period)(New line)
(Expression)(New line)
(Token name) (New line)
(Generate) (New line) use 0=no 1=yes
(Start trim) (New line)
(End trim) (New line)

After entering the desired information in the form click Generate Lex Table from Tools menu. The generation will take considerable time which depends on the number characters and operator used in your regular expression .The lex table will be saved to the same location as that of file.

2)Parser Generator:
The Current version of the parser is capable of handling only left factored LL (1) grammars with no left recursion. Open Grammar file (new or existing one) write Grammar, then select a lex table for it and select generate from Tools menu. This will generate c sharp code file for the
parser( recursive decent predictive parser) with couple of additional files,
Change the name space to your desired namespace of all c sharp files, (keep one name for all).Now the add references to the dlls generated by the parser generator and build your project. (see next section for executing the generated parser)
Syntax for writing grammar is
Non-terminals enclosed in <>
Terminals enclosed in { }
Keyword Terminals enclosed in [ ] (key word terminals are tokens for which match is done on their lexeme else on the token )
Semantic Rules or Translation Schemes enclosed in (* *)

Productions are of the form
<N-T>:: symbol string composed of N-T ,T or K-T ;
You can also specify the attributes for your Non terminals. If there are attributes in your grammar they should appear before the productions
Attribute declaration
BeginAttr
<NT1> : attr1type attr1name attrclass, attr2type attr2name attrclass;
<NT2> : attr3type attr3name attrclass, attr4type attr4name attrclass;
EndAttr

Semantic Rules are in the form of Translational schemes.
To learn more about the syntax of the grammar refer to GrammarParser.cfg (this file is used to generate the grammar parser for this application )

3) Source Editor:
You can use this editor for 2 main purposes ;lexically analyze a file to see the generating tokens. After generating the parser and building it as a .net assembly , you can load that assembly (exe,dll) into the editor.
For lexical analysis only :First open a source file , select the lex table for it then select generate token from Tools menu , a list of token will be generated.
For parsing :open a source file , now load parser assembly through Tools menu, the editor will then check for the presence of class that implements IParser interface. Any class( that means including which are not generated by my generator ) that implements the IParser interface can be loaded into the editor and executed , so this is Universal GUI for all parser .After the assembly is loaded , you can parse it by selecting Parse from Tools menu. If the parsing were successful the symbol table will be shown., else an error message will be shown.

You can also use settings to specify lex table and parser assembly, these files will be loaded automatically while opening the corresponding file extension , eg. You can specify xyz.glb and xyz.dll to be loaded every time a *.txt file is loaded

Inside Out:
During lex generation phase the token definition is read and regular expressions are parsed to syntax tree. The individual expression tree are then converted to NFA (Non-deterministic Finite Automata) using Thompson Contruction. The resulting NFAs are then combined to form a Combined NFA. This is done by oring all the NFA . I used a different description for combined NFA . An NFA is described by the alphabets, set of initial states ,set of final states and transition function . I altered the description so that each final state will have the expression it accepts (This was inspired by the approach in Compiler book by Aho Sethi, Ullman). The resulting NFA is then converted to DFA (Deterministic Finite Automat ) using subset construction, to achive better runtime performance gain. DFA is then serialized to a file for later use.Lexer is a simple engine which loads a serialized DFA and then analyze the input string, generating tokens.

During Parser Generation , grammar is read and parsed into a data structure representing Grammar. and then a simple traversal of the data structure representing generates a Recursive Decent Predictive Parser for LL(1) grammar. Parser generator accepts an attributed grammar with translation schemes. The embedded code is extracted and merged with the generated parser. Parser Generator actually works in two phase first it traverse the Grammar data structure and generates a CodeDom like DOM for the resulting parser, then from the DOM source code is generated.

As of this release ,there is no support for Intermediate Code Generation , Optimization and Binary Code generation.

One important point of interest is that the Parser to parse the Input Grammar is generated by the Generator itself ( this is known as boot-strapping).

The application also includes a source code editor alongwith token Editor and grammar Editor. Source Editor loads Lex file and parser assembly to parse the input file. You can open different source eidtor at the same time , each loaded with different lexer and parser. The parser need not to be generated by the application , it can be any implementation provided it implements the supplied IParser interface

Download:
Compiler Studio is uploaded at gotdotnet in user samples area, alongwith sample parser and lexer. DOWNLOAD COMPILER STUDIO .
(Update: Since Microsoft is scraping gotdotnet ,so i am moving the source code to a new location. ALTERNATE DOWNLOAD LINK )

Note: This page will be updated with more description about the compiler studio as soon as i get enough time 🙂

One response to “Compiler Studio

  1. Lexical Analyzer Generator
    Parser Generator
    Editor for Parsing and lexical analyze a given file

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s