"Cola" - A compiler for the Parrot/Perl6 VM

  V0.0.6.0

  I've started this compiler (I call it Cola) to simultaneously
  teach myself how to write a compiler (ok I admit I squeaked
  through Compiler Design in school because I was too busy
  playing online MUD and basketball), as well as help
  out a project that I love, Perl6/Parrot!

  To really start having fun with the Parrot runtime I wanted
  a language similar to C/C++/C#/Java.

  The parser is LALR, developed with flex/bison, with
  an intermediate code compiler written in Perl. The
  parser generates a simple intermediate code (more on that later)
  which is easier to read than Parrot assembly, and
  also includes a few directives that Parrot currently
  lacks. The parser leaves register allocation/spill control,
  optimization and a few other specifics to the intermediate
  compiler (int2pasm.pl).

Where to Get the Latest Compiler

  Currently Cola is included in the Parrot distribution,
  see http://www.parrotcode.org or http://dev.perl.org/perl6
  At some point the distributions may diverge for some reason
  so you can grab the latest tarball from CPAN.

    http://cpan.org/authors/id/M/ME/MELVIN


The Syntax

  Compatible (eventually) C# syntax.

  The easiest way to see what it currently looks like is
  to read the examples.

  I aim to eventually achieve C# source level compliance, targetted
  to the Parrot runtime. Whether or not we do on the fly bytecode
  interfacing with .NET later is a whole different rhinoceros.


Supported Constructs

  For some quick samples, see the cola/examples/ subdirectory.

  Statements: FOR, WHILE, IF, ELSE, BREAK, CONTINUE, RETURN.

    All of these are limited in their current version but follow
    typical C-ish rules (break breaks out of current loop, continue
    shortcuts to the next iteration). return may return a value
    from anywhere inside a method.

    0.0.4 adds logical operators.
    Conditional expressions may now be compound and use logical
    operators (&& and ||).
    CAUTION: Assignments inside conditionals such as...

        if((i = 15) == 15)

    are still not supported. This sort of expression will be fixed
    in the next update.

  Types: int, float, string (classes and arrays coming soon) 
    Currently these map directly to the Parrot primitive types.
    Very simple type coercement and checking is supported.

  Variable Declarations:
    For now declare all your variables at the top of each method.
    Block scoping will be done soon.
    You may initialize your variables in the declaration.

  Expressions:  Parens, +, -, *, /, %, ++ and -- are supported in
    any combination or complexity. Post and pre-increment are also
    supported. You may use the + operator on strings for concatenation.

    The following works:

        puts("Hello Mr " + name + "\n");
    
    There is no support for concatenation with non-strings yet.    
    Need to whip up a few conversion routines.
 
  Comparisons:  <, >, <=, >=, !=, ==

  Bitwise operators:  <<, >>, |, &, ^, and ~ are now supported in 0.0.4

  Conditional expressions:  Commonly known as the ternary conditional...
    As I understand the Java and C# language spec,
    conditional expressions must be on the right hand side of an
    assignment or anywhere that uses the return value, however you
    cannot use ternaries standalone as a statement. So in Cola you can
    say:
           max = i > j ? i : j;
    or:
           max = i > j ? foo() : bar();

    But it is not proper (although C allows you) to say:

           i > j ? printf("max=i\n") : printf("max=j\n");

    If you think I have misread the C# grammar spec, please email me.

  Boolean: The boolean ! operator isn't yet supported.
 
  Objects and Methods:  
    Classic C++/Java style, recursion supported. Return types supported.
    Will be adding variable argument support per the C# spec.

    Member variables or "fields" aren't yet supported. You
    can use const definitions, but member variables
    are incomplete pending a little more work on Parrot.
    Its possible to do them now with PerlHash or PerlArray,
    but I'm working on something faster for strictly typed,
    non-dynamic languages.

    Currently instance methods are the same as class methods. So
    you could do:

        Console.WriteLine("");

    or  Console c = new Console();
        c.WriteLine("");

    The compiler doesn't yet differentiate between static or class
    methods and instance methods, and whether you are calling them
    as such.

    OOP is really just faked for now, enough for people to write
    in a high level language for Parrot. As everything, it is a work
    in progress. All objects are simply PerlString references for now.

  Current Builtin Subroutines

    Since I have yet to implement class importing, the system routines
    are just plain wrappers around the Parrot ops. Currently they are:

       strlen, substr, strchop, ord, puts, puti, putf, gets, sleep

    See gen.c and main() for the current kludgy way to patch in more wrappers.

    In calc.cola I also did a sample implementation of a string to int
    conversion called StrToInt().


  Arrays:
    Parrot now has a substr with replace op so we can emulate arrays
    on top of it. Its still a hack but it works.

What you currently CANNOT do even though the Parser may eat it...
  Statement lists:          i = j, j = k;
  Nested assigns:           i = (j = 0);
  Fancy For loops:          for(i = 0; j++, j < 4; i++)
  Empty For headers:        for(;;) // use while(1)
    
Using the Compiler

  You will need Flex and Bison installed to build the compiler.
  These are the GNU versions of the classic lex and yacc, but
  are more modern. The grammars should work with standard lex/yacc
  but I've not tested this lately.

  If for some odd reason you have Perl and Parrot but no Flex/Bison
  send me an email <melvin.smith@mindspring.com> and I might take pity
  on you and send you the generated parser C files.

  Build colac by typing:

	make

  Usage:

	colac examples/mandelbrot.cola

  Then copy "a.pasm" to your Parrot directory, assemble it and run
  as usual. In case you need help there to, try:

        assemble.pl a.pasm > a.pbc
        parrot a.pbc

  Currently colac is a short Perl pre-processor that includes
  class for any import statements (using System;)
  If you have trouble with colac you can just use colacc which
  is the raw compiler which ignores 'using' directives.

  Also, if you look in "a.imc" you will see an intermediate
  language, delimited by tags (<program>....</program>) with some
  debug info outside of those tags. Debugging the compiler is easier
  by looking at the intermediate code. This language can be
  piped through int2pasm.pl to re-generate the .pasm file.

  Currently the compiler is very limited with a few warnings and
  a few simple type coercions.
  
  If you try to do something fancy, the grammar might accept
  it but the code will probably come out wrong or simply crash the
  compiler. Read all the samples before trying anything.
 
Intermediate Code

  NOTE: The translator is very simple, mostly it just pre-processes
    code that it understands and leaves the bulk of the work to
    the Parrot assembler.

  The intermediate code output supports 3 primary types of statements.

    Assignment statements, Branch statements, and Stack manipulation

  It additionally supports:

    Labels (Parrot style), Directives (.foo), Line comments (# ...),
      and Here documents (for emitting code that skips through the
      intermediate translator). The compiler uses the here document
      to emit a few support routines (puts, substr). I'm working on
      moving this stuff into System.cola.

    Limited type checking. Each assignment has a type associated
      with it, in the same syntax as a C-style cast.

      Example:
        (i32)$R12 = $N7 * 0.1

        A floating point expression coerced to a integer.

      The compiler will check for bad assignments such as int to string, etc.

  For a nice sample of intermediate code, compile mandelbrot.cola or
  calc.cola which actually does a limited form of parsing with Parrot!


Register Allocation

  The register tracker took about 30 minutes to hack out and is
  good enough for now. Temporaries from the compiler are either named
  $LN or $RN (where N is an integer indicating the temporary). The
  translator will assign registers from a register stack in "first
  come first serve" fashion, and upon each line translation, it
  will return any registers to the stack that were used for rvalues
  on the right hand side of an assignment.
  Rvalues are indicated by $RN. Named variables and $L (lvalues)
  will currently hold their register for the scope of the method.

  This is quick and dirty. I plan to implement a flavor of the
  graph coloring algorithm for the final translator.

  For now, if the translator runs out of registers, it will simply exit.
  Parrot has 32 registers each of string, int, float and pmc so
  it takes quite a number of local variables and/or quite a complex
  expression to hold 32 live registers at any one point.

  Mandelbrot currently uses 9 of 32 float registers and 6 of 32 int
  registers to get its work done.


Optimization

  Very limited. No explicit optimization phase yet.

  You might see dumb code generated in the form of:

    set I1, I0
    set I2, I1

  or..

    branch LABEL34
    LABEL34: ...

  generated between basic blocks or in situations where
  the generator is currently dumb.
 
  I'll at a most add in a simple peephole optimizer to skip
  useless assignments, temporaries and branches.


Coming Soon
  Some form of a printf()
  Full array support
    

Gaping Holes

  Arrays
    Strings can be treated as arrays, thats it for now.
    Parrot still needs keyed aggregates finished to do this.
 
  Class/Struct instantiation
    Lotta work involved here, I won't go there for now.
    Currently you can define multiple Cola classes per file, but cannot
    instantiate any of them. Parrot currently does not
    have an adequate bytecode "class format" in which to write
    symbol table information. This should change soon; at least
    thats what Dan says. :)

  Object Methods and Field references
    Again, a lot of work, but on the list.


