- Written by Robert van der Hulst
- Created: 13 October 2015
Some people have asked us if we have lost our minds. And that is a very good question:
What they mean is that writing a compiler from scratch is a hell of a job and will take a lot of time.
But we are not writing a compiler from scratch.
So we have NOT lost our minds. Let me explain that a little bit:
In early 2015 Microsoft published the source code for its C# and VB compilers. This is called the Roslyn project.
Since this source code is licensed under the Apache 2.0 open source license, we can use and change this code to create our X# compiler.
That means that the biggest part of the code in our compiler has been developed by the geniuses at Microsoft and has been tested by hundreds of thousands of developers all over the world. That is a HUGE benefit for our project.
So it is easy then?
No, it is still not easy to write the compiler, but a lot of work has already been done.
The following image displays a compiler in a schematic way:
For the core X# compiler, "all" we have to do is replace the parser with a component that understands the xBase language, and make a few changes to the other code because xBase is case insensitive whereas C# is case sensitive.
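To give an idea of what the case sensitivity difference means for the compiler, here is a minimal sketch in Python (not the actual Roslyn code). The `SymbolTable` class and its methods are hypothetical names made up for this illustration; the only point is that a case insensitive language has to fold names to a common form before storing them or looking them up.

```python
class SymbolTable:
    """Toy symbol table that resolves names with or without case sensitivity."""

    def __init__(self, case_sensitive):
        self.case_sensitive = case_sensitive
        self._symbols = {}

    def _key(self, name):
        # C#-style lookup keeps the name as written; xBase-style lookup
        # folds case so that "Customer" and "CUSTOMER" share one key.
        return name if self.case_sensitive else name.casefold()

    def declare(self, name, kind):
        key = self._key(name)
        if key in self._symbols:
            raise KeyError(f"duplicate symbol: {name}")
        # Keep the original spelling next to the kind of symbol.
        self._symbols[key] = (name, kind)

    def lookup(self, name):
        # Returns the (name, kind) pair, or None when nothing matches.
        return self._symbols.get(self._key(name))


csharp_like = SymbolTable(case_sensitive=True)
csharp_like.declare("Customer", "class")

xbase_like = SymbolTable(case_sensitive=False)
xbase_like.declare("Customer", "class")
```

With these two tables, `csharp_like.lookup("CUSTOMER")` finds nothing, while `xbase_like.lookup("CUSTOMER")` finds the class that was declared as `Customer`.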
To write the parser we use a tool called ANTLR (http://www.antlr.org/).
We feed ANTLR a definition of the xBase language, and it generates for us a set of C# classes that analyze xBase source code and build a parse tree from that source code.
And that parse tree looks almost like the parse tree that the C# compiler generates in its parser.
So after the parsing has completed we have to convert the ANTLR parse tree to a Roslyn parse tree, and then the Roslyn compiler can do the complicated work for us.
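The conversion step can be pictured with a small Python sketch. This is only an illustration under assumptions, not our converter: `ParseNode`, `SyntaxNode`, the rule names and the node kinds in `RULE_TO_KIND` are all hypothetical stand-ins for the real ANTLR and Roslyn types. It shows the general idea of walking one tree and rebuilding it in the shape that the rest of the compiler expects.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    """Stand-in for a node in the parser's tree: a grammar rule plus children."""
    rule: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class SyntaxNode:
    """Stand-in for a node in the tree that the compiler back end expects."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical mapping from grammar rule names to back-end node kinds.
RULE_TO_KIND = {
    "functionCall": "InvocationExpression",
    "identifier": "IdentifierName",
    "stringLiteral": "StringLiteralExpression",
}


def convert(node):
    """Recursively rebuild the parse tree as a tree of SyntaxNode objects."""
    if node.rule not in RULE_TO_KIND:
        raise ValueError(f"no conversion for rule: {node.rule}")
    return SyntaxNode(
        kind=RULE_TO_KIND[node.rule],
        text=node.text,
        children=[convert(child) for child in node.children],
    )


# Hand-built parse tree for an xBase-style call such as: QOut("Hello")
parsed = ParseNode("functionCall", children=[
    ParseNode("identifier", "QOut"),
    ParseNode("stringLiteral", '"Hello"'),
])
converted = convert(parsed)
```

In this sketch a grammar rule without a mapping raises an error instead of being silently dropped, so a construct that the converter does not handle yet shows up immediately.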
So the steps in the X# compiler become:
- The ANTLR based parser converts source code to a parse tree.
- The hand-written converter converts the ANTLR parse tree to a Roslyn parse tree.
- The Roslyn engine looks up the types in the referenced assemblies (metadata import).
- The Roslyn engine keeps track of all the symbols in our code (types, variables, methods, properties).
- The Roslyn engine creates new types and methods based on the parse tree and binds this (IL) code to the external references.
- The Roslyn engine finally creates the output assembly.
So what is left for us to do:
- Write the language definition
- Write the tree converter
- Make some adjustments to the Roslyn code to handle the differences in case sensitivity
Of course in a later stage we will also have to add:
- Support for xBase types (DATE, ARRAY, USUAL, etc.)
- Support for "Clipper Calling Convention" (untyped parameters)
- Support for xBase runtime functions
We plan to have these xBase types and functions ready in the summer of 2016.
A document with the project planning will be available on this website shortly.
In the next blog article I will go into more detail about our language definition and the new commands and command options in the X# language. Think of ASYNC - AWAIT, LINQ, creating generics and more.
And the good thing is
The Roslyn code has exposed its API so it can be used by others, such as the language service inside Visual Studio.
That way the language service does not have to "know" the language. All it has to do is talk with the Roslyn API.
The following image shows how the C# and VB language services do this.
And because our compiler uses the same source base, we can also fairly easily expose our compiler API. So the developer of the VS integration has much less work to do!
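To illustrate why an exposed compiler API means less work for the integration, here is a small Python sketch. Everything in it is an assumption made for the example: `ToyCompilerApi`, `find_symbols` and `completion_list` are hypothetical names and have nothing to do with the real Roslyn API. The "language service" function builds a completion list without knowing anything about the syntax; it only asks the compiler.

```python
class ToyCompilerApi:
    """Hypothetical compiler facade: reads declarations and answers queries."""

    def __init__(self, source):
        self._symbols = []
        for line in source.splitlines():
            words = line.split()
            # Treat "FUNCTION Name" and "CLASS Name" lines as declarations.
            if len(words) >= 2 and words[0].upper() in ("FUNCTION", "CLASS"):
                self._symbols.append((words[1], words[0].upper()))

    def find_symbols(self, prefix):
        """Return declared (name, kind) pairs whose name starts with prefix."""
        folded = prefix.casefold()
        return [s for s in self._symbols if s[0].casefold().startswith(folded)]


def completion_list(api, typed_text):
    """Language-service side: knows nothing about the syntax, only the API."""
    return sorted(name for name, _kind in api.find_symbols(typed_text))


source = "CLASS Customer\nFUNCTION CustomerCount\nFUNCTION Main"
api = ToyCompilerApi(source)
```

Calling `completion_list(api, "cust")` gives the two names that start with "cust", regardless of case. If the language rules change, only the compiler side has to change; the language-service function stays the same.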