THE MARST PACKAGE BRIEF GUIDE
*****************************



Copyright (C) 2000 Andrew Makhorin <mao@mai2.rcnet.ru>,
                   Department for Applied Informatics,
                   Moscow Aviation Institute, Moscow, Russia.
                   All rights reserved.

The MARST package is a part of the GNU project, released under the
aegis of GNU.

This package is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This software is distributed "as is" in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty
of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.



1. INTRODUCTION
2. INSTALLATION
3. PROGRAM INVOCATION
4. USAGE EXAMPLE
5. INPUT LANGUAGE
6. INPUT/OUTPUT
7. LANGUAGE EXTENSIONS
8. CONVERTER UTILITY



1. INTRODUCTION
===============

The MARST package is intended to translate Algol 60 programs to the C
programming language.

The processing scheme is as follows:

                    Algol 60 source program
                               |
                               V
                        +-------------+
                        |    MARST    |
                        +-------------+
                               |
                               V
                         C source code
                               |
                               V
                        +-------------+
         algol.h ------>| C compiler  |<------ Standard headers
                        +-------------+
                               |
                               V
                          Object code
                               |
                               V
                        +-------------+
          ALGLIB ------>|   Linker    |<------ Standard libraries
                        +-------------+
                               |
                               V
                        +-------------+
      Input data ------>| Executable  |-------> Output data
                        +-------------+

where:

Algol 60 source program - a text file that contains a program written in
                          the algorithmic language Algol 60. (See
                          Section 5 for the coding requirements.)

MARST                   - the MARST translator, a program that converts
                          a source Algol 60 program to the C programming
                          language. This program is a part of the MARST
                          package.

C source code           - a text file that contains C source code
                          generated by the MARST translator.

algol.h                 - header file that contains declarations of all
                          objects used by any program generated by the
                          MARST translator. This header includes some
                          standard headers (stdio.h, stdlib.h, etc.) but
                          no other headers are used explicitly in any
                          generated code. This file is a part of the
                          MARST package.

Standard headers        - standard header files (they are used in the
                          header algol.h only).

C compiler              - C compiler.

Object code             - a binary file containing object code produced
                          by the C compiler.

ALGLIB                  - a library (archive) file that contains object
                          code for all standard and library routines
                          used by the Algol 60 program. Some of them,
                          which correspond to standard Algol procedures
                          (such as ininteger, outreal, etc.), are
                          written in Algol 60 and translated to C by the
                          MARST translator. The source code of these
                          routines is a part of the MARST package.

                          The file name of this library may be different
                          for different distributions. For example, in
                          this distribution the name libalgol.a is used.

Standard libraries      - standard C run-time libraries.

Linker                  - linkage resolving program.

Executable              - a binary file containing ready-to-run program
                          in loadable form.

Input data              - a text file read by the Algol 60 program.

Output data             - a text file written by the Algol 60 program.



2. INSTALLATION
===============

To install the MARST package under GNU/Linux use the standard
installation procedure. (For details see the file INSTALL.)

As a result of the installation four components should be installed:

marst       (usually in /usr/local/bin),

macvt       (usually in /usr/local/bin),

algol.h     (usually in /usr/local/include and/or in /usr/include), and

libalgol.a  (usually in /usr/local/lib).



3. PROGRAM INVOCATION
=====================

To invoke the MARST translator the following syntax is used:

   marst [options...] [filename]

Options:

   -d, --debug          run translator in debug mode

                        If this option is set then the translator emits
                        elementary syntactic units (so-called sections)
                        of the source Algol 60 program to the output C
                        code as comments.

                        This option is useful for locating a syntax
                        error more precisely. For example, Algol 60 has
                        three kinds of comments (usual comments,
                        comments following 'end', and extended parameter
                        delimiters). Therefore it is easy to make a
                        mistake by, for example, forgetting the
                        semicolon between 'end' and the next statement.

   -e nnn, --errormax nnn
                        maximal error allowance

                        This option sets the maximal error allowance.
                        The translator stops after the specified number
                        of errors has been detected. The value of nnn
                        should be in the range from 0 to 255. If this
                        option is not specified then -e 0 is used as
                        the default value, which means that translation
                        continues until the end of the input file.

   -h, --help           display help information and exit(0)

   -l nnn, --linewidth nnn
                        desirable output line width

                        This option sets the desirable line width for
                        the output C code produced by the translator.
                        The value of nnn should be in the range from 50
                        to 255. If this option is not specified then
                        -l 72 is used as the default value.

                        Note that the actual line width may happen to be
                        greater than nnn, since the translator can't
                        break the output text at an arbitrary place,
                        but this happens very rarely.

   -o filename, --output filename
                        the name of file to which the translator sends
                        produced output C code

                        If this option is not set then the translator
                        uses standard output by default.

   -t, --notimestamp    don't write time stamp to output C code

                        By default the translator writes date and time
                        of translation to output C code.

   -v, --version        display translator version and exit(0)

   -w, --nowarn         don't display any warning messages

                        By default during translation the translator
                        displays warning messages reflecting potential
                        errors and non-standard features used in source
                        Algol 60 program.

To translate a source Algol 60 program, it should be prepared in a text
file and the name of this file should be specified on the command line.

If the name of the input text file is not specified then the translator
uses the standard input file instead. *Note* that the translator reads
the input file *twice*, therefore this file should be a regular file,
not a pipe, terminal input, etc. Hence if the standard input file is
used it should be redirected from a regular file.
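
Whether a given input satisfies this restriction can be checked in
advance. The following C sketch is not part of the MARST package; it
assumes a POSIX system, and the function name is_regular_file is
hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if the stream refers to a regular file and 0 otherwise
   (pipe, terminal, other special file, or fstat failure). */
int is_regular_file(FILE *stream)
{
    struct stat st;

    if (stream == NULL || fstat(fileno(stream), &st) != 0)
        return 0;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Under these assumptions is_regular_file(stdin) should give 0 when the
input comes from a pipe or a terminal, and 1 when it is redirected from
a regular file.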

In one run the translator can process only one input text file.



4. USAGE EXAMPLE
================

The following example shows how to use the MARST package in most cases.

First we prepare a source Algol 60 program, for example, in a text file
named sample.alg:

   begin
      outstring(1, "Hello, world\n")
   end

Now we translate this program to the C programming language:

   marst sample.alg -o sample.c

and get a text file named sample.c, which we then compile and link in
the usual way (we should remember the algol and math libraries):

   gcc sample.c -o sample -lalgol -lm

Finally, we run the executable:

   ./sample

and see what we get. That's all.

For more examples see directory 'ex' in the distribution.



5. INPUT LANGUAGE
=================

The input language of the MARST translator is a hardware representation
of the reference language Algol 60 as described in the IFIP document:

Modified Report on the Algorithmic Language ALGOL 60. The Computer
Journal, Vol. 19, No. 4, Nov. 1976, pp. 364-379. (This document is an
official IFIP standard document. It is *not* a part of the MARST
package.)

Note that there are some differences between the Revised Report and the
Modified Report since the latter is the result of applying the following
IFIP document to the Revised Report:

R.M.De Morgan, I.D.Hill, and B.A.Wichmann. A Supplement to the ALGOL 60
Revised Report. The Computer Journal, Vol. 19, No. 3, 1976, pp. 276-288.
(This document is an official IFIP standard document. It is *not* a part
of the MARST package.)

A source Algol 60 program should be coded as an ordinary text file using
the ASCII character set.

Basic symbols should be coded as follows:

   Basic symbol            Hardware representation
   -----------------------------------------------
   a, b, ..., z            a, b, ..., z
   A, B, ..., Z            A, B, ..., Z
   0, 1, ..., 9            0, 1, ..., 9
   +                       +
   -                       -
   x                       *
   /                       /
   integer division        %
   exponentiation          ^ (or **)
   <                       <
   not greater             <=
   =                       =
   not less                >=
   >                       >
   not equal               !=
   equivalence             ==
   implication             ->
   or                      |
   and                     &
   not                     !
   ,                       ,
   .                       .
   ten (10)                # (pound sign)
   :                       :
   ;                       ;
   :=                      :=
   (                       (
   )                       )
   [                       [
   ]                       ]
   opening quote           "
   closing quote           "
   array                   array
   begin                   begin
   Boolean                 Boolean (or boolean)
   comment                 comment
   do                      do
   else                    else
   end                     end
   false                   false
   for                     for
   go to                   go to (or goto)
   if                      if
   integer                 integer
   label                   label
   own                     own
   procedure               procedure
   real                    real
   step                    step
   string                  string
   switch                  switch
   then                    then
   true                    true
   until                   until
   value                   value
   while                   while

Any symbol can be surrounded by any number of white-space characters
(i.e. by blanks, HT, CR, LF, FF, or VT), but a multi-character symbol
should contain no white-space characters. Moreover, a letter sequence
is recognized as a keyword if and only if there is no letter or digit
immediately preceding or following that sequence (except for the keyword
'go to', which can contain zero or more blanks between 'go' and 'to').

   Examples
   --------

   ... 123 then abc ...    'then' will be recognized as 'then' symbol

   ... 123then abc ...     'then' will be recognized as letters 't',
   ... 123 thenabc ...     'h', 'e', 'n', but not as 'then' symbol

   ... th en ...           'th en' will be recognized as letters 't',
                           'h', 'e', 'n'
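
The keyword rule and the examples above can be expressed as a short C
sketch. It is only an illustration and not the code of the MARST
scanner; the function name keyword_at is hypothetical, and the special
case of 'go to' is not handled:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the keyword kw occurs in text at offset pos and is not
   immediately preceded or followed by a letter or digit; otherwise
   return 0. */
int keyword_at(const char *text, size_t pos, const char *kw)
{
    size_t len = strlen(kw);
    size_t text_len = strlen(text);

    if (pos > text_len || text_len - pos < len)
        return 0;
    if (strncmp(text + pos, kw, len) != 0)
        return 0;
    if (pos > 0 && isalnum((unsigned char)text[pos - 1]))
        return 0;
    /* text[pos + len] is at worst the terminating null character */
    if (isalnum((unsigned char)text[pos + len]))
        return 0;
    return 1;
}
```

With this sketch keyword_at("123 then abc", 4, "then") gives 1, while
the same call for "123then abc" (offset 3) or "123 thenabc" (offset 4)
gives 0.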

Note that identifiers and numbers can contain white-space characters.
This may be used if an identifier is the same as a keyword. For example,
the identifier 'label' should be coded as 'la bel' or 'lab el'. Note
also that white-space characters are not significant (except within
strings), so 'abc' and 'a b c' denote the same identifier.

All letters are case sensitive (except the first 'b' in the keyword
'Boolean'). This means that 'abc' and 'ABC' are different identifiers,
and 'Then' will not be recognized as the keyword 'then'.

Any identifier or number may contain up to 100 characters (excluding
internal white-space characters).

Quoted strings are coded in C style. For example:

   outstring(1, "This\tis a string\n");

   outstring(1, "This\tis a st"  "ring\n");

   outstring(1, "This\tis all one st"
         "ring\n");

Within a string (i.e. between the double quotes enclosing the string
body) escape sequences may be used (such as \t and \n in the examples
above). A double quote and a backslash within a string should be coded
as \" and \\. Between the parts of a string any number of white-space
characters is allowed.
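
Since strings are coded in C style, the effect of splitting a string
into parts can be observed in C itself, where adjacent string literals
are joined by the compiler. The following C fragment is an illustration
only; it assumes that MARST joins the parts of a string in the same way,
and all names in it are hypothetical:

```c
#include <string.h>

/* The first two examples above written as C literals. The compiler
   joins adjacent literals, so both denote the same characters. */
static const char one_part[]  = "This\tis a string\n";
static const char two_parts[] = "This\tis a st"  "ring\n";

/* The third example: the parts may be placed on separate lines. */
static const char two_lines[] = "This\tis all one st"
      "ring\n";

/* Return 1 if both strings consist of the same characters. */
int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here same_string(one_part, two_parts) gives 1, because the white space
between the two parts does not belong to the string.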

Apart from the coding of strings and the limits on the length of
identifiers and numbers, there are no differences between the syntax of
the reference language and the syntax of the MARST input language.



6. INPUT/OUTPUT
===============

All input/output is performed by standard Algol 60 procedures.

This implementation allows up to 16 I/O channels that have numbers 0,
1, ..., 15. The channel 0 is always connected to 'stdin' and so only
input from this channel is allowed. The channel 1 is always connected
to 'stdout' and so only output to this channel is allowed. The other
channels allow both input and output.

(The standard procedure 'fault' uses channel number 'sigma' that is not
available to the programmer. This latent channel is always connected to
'stderr'.)

Before Algol program startup all channels (excluding channels 0 and 1)
are disconnected, i.e. no files are assigned to them.

If input from (output to) the channel n is required then the following
happens:

1) If the channel n is connected for output (input) then the I/O routine
   closes the file assigned to the channel, making it disconnected.

2) If the channel n is disconnected then the I/O routine opens a file in
   'read' ('write') mode and assigns this file to the channel, making it
   connected for input (output).

3) Finally, the I/O routine performs input from (output to) the channel;
   if an end-of-file is detected on input then the I/O routine signals
   an error condition.
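
Steps 1 and 2 can be sketched in C as follows. This is not the code of
the library routines; the type struct channel and the function
channel_prepare are hypothetical, the file name is passed by the caller,
and the fixed channels 0 and 1 are not covered:

```c
#include <stdio.h>

enum chan_state { CHAN_CLOSED, CHAN_INPUT, CHAN_OUTPUT };

struct channel {
    FILE *fp;               /* NULL while the channel is disconnected */
    enum chan_state state;
};

/* Make the channel ready for the wanted direction (CHAN_INPUT or
   CHAN_OUTPUT). Returns the stream assigned to the channel, or NULL
   if the file cannot be opened. */
FILE *channel_prepare(struct channel *ch, const char *filename,
                      enum chan_state wanted)
{
    /* Step 1: connected for the opposite direction, so close first */
    if (ch->state != CHAN_CLOSED && ch->state != wanted) {
        fclose(ch->fp);
        ch->fp = NULL;
        ch->state = CHAN_CLOSED;
    }
    /* Step 2: disconnected, so open the file in the wanted mode */
    if (ch->state == CHAN_CLOSED) {
        ch->fp = fopen(filename, wanted == CHAN_INPUT ? "r" : "w");
        if (ch->fp == NULL)
            return NULL;
        ch->state = wanted;
    }
    return ch->fp;
}
```

In this sketch a channel that was written to and is then read from is
closed and reopened, so reading starts from the beginning of the file.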

To determine the name of the file to be assigned to the channel n, the
I/O routine checks the environment variable named "FILE_n". If such a
variable exists then its value is used as the filename; otherwise its
name (i.e. the character string "FILE_n") is used as the filename.
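
The lookup of the file name can be sketched in C as follows. The sketch
is not taken from the library source; the function name
channel_filename is hypothetical, and it is an assumption that n is
written in the variable name as a decimal number (e.g. FILE_7):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Store in buf the name of the file for the channel n: the value of
   the environment variable FILE_n if it exists, otherwise the string
   "FILE_n" itself. Returns buf, or NULL if the name does not fit. */
char *channel_filename(int n, char *buf, size_t size)
{
    char var[32];
    const char *name;

    snprintf(var, sizeof var, "FILE_%d", n);
    name = getenv(var);
    if (name == NULL)
        name = var;         /* no such variable: use its name */
    if (strlen(name) >= size)
        return NULL;
    strcpy(buf, name);
    return buf;
}
```

For example, if the environment contains FILE_7=results.txt then the
sketch yields "results.txt" for the channel 7; without this variable it
yields "FILE_7".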



7. LANGUAGE EXTENSIONS
======================

The MARST translator supports some extensions of the reference language
to make the package more convenient to the programmer.

7.1. Modular programming
------------------------

The possibility of modular programming is illustrated by the following
example:

   First file                    Second file
   ----------------------------------------------------
   procedure one(a, b);          procedure one(a, b);
   value a, b; real a, b;        value a, b; real a, b;
   begin                         code;
         ...
   end;                          procedure two(x, y);
                                 value x, y; real x, y;
   procedure two(x, y);          code;
   value x, y; real x, y;
   begin                         begin
         ...                           <main program>
   end;                          end

The procedures 'one' and 'two' in the first file are called precompiled
procedures. Declarations of these procedures should be outside the block
or compound statement representing the program. The procedures 'one' and
'two' in the second file are called code procedures; they have the
keyword 'code' instead of a statement representing the procedure body.
Declarations of code procedures should also be outside the program block
or compound statement.

This mechanism allows precompiled procedures to be translated
independently of the main program (and precompiled procedures may be
programmed in any other C-compatible programming language). The
programmer can consider that, directly before program execution, the
declarations of all precompiled procedures are placed into the file
containing the main program (the second file in the example above)
instead of the declarations of the corresponding code procedures. (Of
course, this is nothing new to C programmers.)

7.2. Pseudo procedure 'inline'
------------------------------

Pseudo procedure 'inline' has the following (implicit) heading:

   procedure inline(str);
   string str;

Any procedure statement using the 'inline' procedure is translated into
code which is the string 'str' after deletion of the enclosing quotes.

Here is an example:

   Source program                  Output C code
   ------------------------------------------------
   . . .                           . . .
   a := 1;                         dsa_0->a_5 = 1;
   b := 2;                         dsa_0->b_8 = 2;
   inline("printf(\"OK\");");      printf("OK");
   c := 3;                         dsa_0->c_4 = 3;
   . . .                           . . .

The procedure statement 'inline' may be used as a usual Algol statement
anywhere in the program.

7.3. Pseudo procedure 'print'
-----------------------------

Pseudo procedure 'print' is intended mainly for test printing (since
standard Algol input/output is beneath criticism). This procedure has
an unspecified heading with a variable parameter list.

Here is an example:

   real a, b; integer c; Boolean d;
   array u, v[1:10], w[-5:5,-10:10];
   . . .
   print(a, b, u);
   print(c);
   . . .
   print("test shot", (a+b)*c, !d | u[1] > v[1], u, v, w);
   . . .

Each actual parameter passed to the procedure 'print' is output to
standard channel 1 (stdout) in printable form.



8. CONVERTER UTILITY
====================

The Algol converter utility is MACVT. It is an auxiliary program which
is intended to convert Algol 60 programs from some other representation
to the MARST representation. Such conversion is necessary when existing
Algol programs have to be adjusted in order to translate them with
MARST.

MACVT is not a translator itself. This program just reads the original
code of an Algol 60 program from an input text file, converts each basic
symbol to the MARST representation (see Section 5. Input Language), and
writes the resulting code to an output text file. It is assumed that the
output code produced by MACVT will be further translated by MARST in the
usual way. Note that MACVT performs no syntax checking.

The input language understood by MACVT differs from the MARST input
language only in the representation of basic symbols. (Note that in this
sense the MARST input language is a subset of the MACVT input language.)

The representation of basic symbols implemented in MACVT is based mainly
on the Algol 60 compiler (well known in the 1960s) developed by IBM
Corp. first for the IBM 7090 and later for System/360. This
representation may be considered an unofficial standard because it was
widely used at the time when Algol 60 was a programming language in
active use.

To invoke the MACVT converter the following syntax is used:

   macvt [options...] [filename]

Options:

   -c, --classic        classic representation

                        This option is used by default unless another
                        representation is chosen. It assumes that the
                        input Algol 60 program is coded using the
                        classic representation: all white-space
                        characters are non-significant (except in quoted
                        strings) and every keyword should be enclosed in
                        apostrophes. For details see below.

   -f, --free-coding    free representation

                        If this option is set then it is allowed not to
                        enclose keywords in apostrophes. But in this
                        case white-space characters should not be used
                        within multi-character basic symbols. See below
                        for details.

   -h, --help           display help information and exit(0)

   -i, --ignore-case    convert letters to lower case

                        If this option is set then all letters (except
                        in comments and strings) are converted to lower
                        case, i.e. conversion is case-insensitive.

   -m, --more-free      more free representation

                        This option is the same as --free-coding, but
                        additionally keywords for arithmetic, logical,
                        and relational operators can be coded without
                        apostrophes. For details see below.

   -o filename, --output filename
                        the name of file to which the converter sends
                        converted Algol 60 program

                        If this option is not set then the converter
                        uses standard output by default.

   -s, --old-sc         old (classic) semicolon representation allowed

                        This option allows the converter to recognise
                        the digraph ., as a semicolon (including its use
                        to terminate a comment sequence).

   -t, --old-ten        old (classic) ten symbol representation allowed

                        This option allows the converter to recognise
                        a single apostrophe (when it is followed by +,
                        -, or a digit) as the ten symbol.

   -v, --version        display converter version and exit(0)

To convert a source Algol 60 program, it should be prepared in a text
file and the name of this file should be specified on the command line.

If the name of the input text file is not specified then the converter
uses the standard input file by default.

In one run the converter can process only one input text file.

In the table shown below one or more valid representations are given
for each basic symbol.

   Basic symbol            Extended hardware representation
   -----------------------------------------------------------
   a, b, ..., z            a, b, ..., z
   A, B, ..., Z            A, B, ..., Z
   0, 1, ..., 9            0, 1, ..., 9
   +                       +
   -                       -
   x                       *
   /                       /
   integer division        %                    '/'      'div'
   exponentiation          ^     **             'power'  'pow'
   <                       <                    'less'
   not greater             <=                   'notgreater'
   =                       =                    'equal'
   not less                >=                   'notless'
   >                       >                    'greater'
   not equal               !=                   'notequal'
   equivalence             ==                   'equiv'
   implication             ->                   'impl'
   or                      |                    'or'
   and                     &                    'and'
   not                     !                    'not'
   ,                       ,
   .                       .
   ten (10)                #     '              '10'
   :                       :     ..
   ;                       ;     .,
   :=                      :=    .=    ..=
   (                       (
   )                       )
   [                       [     (/
   ]                       ]     /)
   opening quote           "     `
   closing quote           "     '
   array                                        'array'
   begin                                        'begin'
   Boolean                                      'boolean'
   code                                         'code'
   comment                                      'comment'
   do                                           'do'
   else                                         'else'
   end                                          'end'
   false                                        'false'
   for                                          'for'
   go to                                        'goto'
   if                                           'if'
   integer                                      'integer'
   label                                        'label'
   own                                          'own'
   procedure                                    'procedure'
   real                                         'real'
   step                                         'step'
   string                                       'string'
   switch                                       'switch'
   then                                         'then'
   true                                         'true'
   until                                        'until'
   value                                        'value'
   while                                        'while'

Remarks
-------

1. The classic (apostrophized) form of keywords and some other basic
   symbols is allowed in any (i.e. classic as well as free)
   representation.

2. In the classic representation all white-space characters (except
   those in comments and quoted strings) are ignored everywhere.

3. A basic symbol coded in apostrophes may contain white-space
   characters, which are ignored. Besides, its letters are
   case-insensitive.

4. A basic symbol may be coded in free form (without apostrophes) only
   if the free representation (--free-coding) is used.

5. In the free representation a multi-character basic symbol should
   contain no white-space characters.

6. The free form of keywords that denote arithmetic, logical, or
   relational operators (e.g. greater instead of 'greater') is allowed
   only if the more free representation (--more-free) is used.

7. A single apostrophe is recognised as the ten symbol only if the
   --old-ten option is specified on the command line. (Note that in this
   case '10' will not be recognised as the ten symbol.)

8. The digraph ., is recognised as a semicolon only if the --old-sc
   option is specified on the command line.

9. If the opening quote is coded as ", then the closing quote should be
   coded as " too. If the opening quote is coded as `, then the closing
   quote should be coded as '. (For the coding of strings see Section 5.)
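
Remark 3 can be illustrated by the following C sketch. It is not the
code of MACVT; the function name read_quoted_keyword is hypothetical,
and the other uses of the apostrophe (the ten symbol with --old-ten and
the closing quote of a string opened by `) are deliberately ignored:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the basic symbol enclosed in apostrophes at the start of src
   into dst, dropping white-space characters and converting letters to
   lower case. Returns the number of characters consumed from src
   (including both apostrophes), or 0 if src does not start with a
   symbol in apostrophes or dst is too small. */
size_t read_quoted_keyword(const char *src, char *dst, size_t size)
{
    const char *close;
    size_t i, n = 0;

    if (size == 0 || src[0] != '\'')
        return 0;
    close = strchr(src + 1, '\'');
    if (close == NULL)
        return 0;
    for (i = 1; src + i < close; i++) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c))
            continue;       /* white space inside is ignored */
        if (n + 1 >= size)
            return 0;
        dst[n++] = (char)tolower(c);
    }
    dst[n] = '\0';
    return (size_t)(close - src) + 1;
}
```

With this sketch the input 'BE GIN' yields begin, and 'not greater'
yields notgreater, which corresponds to an entry of the table above.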

Finally it should be noted that MACVT copies comments and white-space
characters to output text to keep original formatting of input text.
