agcpp - ANSI C pre-processor and translator.

Here is an ANSI C conforming pre-processor. It is also a translator from
ANSI C to "K & R C" (with one exception, and one restriction - see below).
There are also, probably, lots of bugs, which my coding style has not
found. Please inform me if you find any of these. Most of the draft
standard's implications, bells and whistles were taken from "The C Book,
featuring the draft ANSI C standard", by Mike Banahan of The Instruction
Set.

This was originally written to replace the old, creaking Minix cpp, because
it was driving me up the wall. As I am working in Berlin, this could get
quite dangerous.

Hopefully, this will allow people to write code in ANSI C, whilst continuing
to use their old compilers - so start writing in ANSI C, folks.

Please note that it's not quick - but I haven't optimised it yet.

1. The directives it will recognise at the moment are:

   #include, #define, #undef, #if, #ifdef, #ifndef,
   #elif, #else, #endif, #line, #error, #pragma, #

   I have also put the #ident processing in there, but the default is not
   to have it. If you want it, compile agcpp with -DIDENT_DIR.

2. Token pasting (##) and token stringizing (#arg) are carried out during
   expansion of macros.
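
   As an illustration of the two operators (a sketch only; STR, GLUE,
   counter and paste_and_stringize_ok are made-up names, not part of
   agcpp), the following should pass through any ANSI pre-processor:

```c
#include <string.h>

/* STR turns its argument into a string literal (the # operator). */
#define STR(x)      #x
/* GLUE joins two tokens into a single token (the ## operator). */
#define GLUE(a, b)  a ## b

/* GLUE(count, er) is rewritten to the single identifier "counter". */
static int GLUE(count, er) = 42;

/* Returns 1 if both operators behaved as ANSI C specifies, else 0. */
int paste_and_stringize_ok(void)
{
    return strcmp(STR(a + b), "a + b") == 0 && counter == 42;
}
```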

3. Comments are recognised as white space within macros.

4. const, volatile and signed will be passed on to the compiler, if ANSI is
   defined (at run-time), or silently forgotten if not.

5. Multiple whitespace within macros is folded to a single space. (Whitespace
   in strings is not touched).

6. Command line arguments (run time) are:

   -ANSI for the pre-processor to output ANSI-style C.

   -Cnumber for the pre-processor to fold identifier names longer than
   <number> characters to an internal format of the name. This is really
   needed for Minix, more than anything else. Default is 8, can be
   over-ridden at compile-time by defining COMPCHARS, and that in turn can
   be over-ridden at run-time by this argument.

   -Dsymbol[=value] for a symbol to be defined. This will override any
   previous definition in agcpp, but will be overridden by any definitions
   in the text.

   -Idirectory for an additional include directory to be searched before
   the standard ones.

   -P to suppress line directives for the compiler itself.

   -Usymbol for a symbol to be undefined. It is not an error if symbol is
   not already defined.

7. Both #if defined(macroname) and #if defined macroname will be recognised.
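
   A small sketch of the two spellings (DEBUG, TRACE, HAVE_DEBUG and
   HAVE_TRACE are hypothetical names chosen for the example):

```c
/* DEBUG is defined here; TRACE is deliberately left undefined. */
#define DEBUG 1

#if defined(DEBUG)          /* form with parentheses */
#define HAVE_DEBUG 1
#else
#define HAVE_DEBUG 0
#endif

#if defined TRACE           /* form without parentheses */
#define HAVE_TRACE 1
#else
#define HAVE_TRACE 0
#endif

int have_debug(void) { return HAVE_DEBUG; }
int have_trace(void) { return HAVE_TRACE; }
```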

8. The ANSI standard's __FILE__, __LINE__, __DATE__ and __TIME__ will all
   be recognised and expanded. It will also recognise __STDC__ (it has the
   value 1).
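
   For example, code along these lines relies on those names being
   expanded (the function names are invented for the illustration, and
   the date and time layouts in the comments are the ones the ANSI
   standard describes):

```c
/* Each function returns whatever the pre-processor substituted here. */
const char *where_file(void) { return __FILE__; }  /* source file name  */
int         where_line(void) { return __LINE__; }  /* this line number  */
const char *build_date(void) { return __DATE__; }  /* "Mmm dd yyyy"     */
const char *build_time(void) { return __TIME__; }  /* "hh:mm:ss"        */

/* Returns the value of __STDC__, or 0 if it is not defined at all. */
int is_stdc(void)
{
#if defined(__STDC__)
    return __STDC__;
#else
    return 0;
#endif
}
```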

9. It will recognise, and correctly include

   #define NAME	<stdio.h>
   #define INDIRECTNAME	NAME
   #include INDIRECTNAME

   (The number of layers of indirection is unlimited - although why you
   would want to do this is a mystery to me).

10. Character constants are reduced to decimal numbers.
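
    To show what that means in practice: on a machine using ASCII, the
    constants below have the decimal values given in the comments, so
    the compiler would presumably see those numbers in their place. The
    exact output text of agcpp is an assumption here, and the function
    names are made up.

```c
/* Assumes an ASCII character set; the decimal values differ elsewhere. */
int letter_a(void) { return 'A'; }    /* same value as 65 in ASCII */
int newline(void)  { return '\n'; }   /* same value as 10 in ASCII */
```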

11. Trigraphs are expanded.
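
    The nine ANSI trigraphs each start with two question marks. The
    replacement can be sketched as below; this is not agcpp's actual
    code, and trigraph_char and expand_trigraphs are hypothetical names.

```c
/*
 * Returns the character that two question marks followed by c stand
 * for, or 0 if that sequence is not one of the nine ANSI trigraphs.
 */
static char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copies src to dst, replacing each trigraph by its single character.
   dst must have room for strlen(src) + 1 bytes. */
void expand_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r = 0;

        if (src[0] == '?' && src[1] == '?')
            r = trigraph_char(src[2]);
        if (r != 0) {
            *dst++ = r;         /* emit the replacement character */
            src += 3;           /* skip all three trigraph characters */
        } else {
            *dst++ = *src++;    /* ordinary character, copy unchanged */
        }
    }
    *dst = '\0';
}
```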
   
12. Adjacent strings will be concatenated, even on different lines.
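
    For instance (greeting is just an illustrative name), the two
    literals below sit on different lines but form one string:

```c
/* The adjacent literals are joined into the single string "Hello, world". */
const char *greeting(void)
{
    return "Hello, "
           "world";
}
```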

13. Unary '+' is NOT recognised - it was too difficult really, as I'd
    have to do a syntax check of the program first, to distinguish unary
    '+' from the binary '+' in (a - b) + (c() / d) etc. Anyway, I don't
    like unary '+' - I consider it ill-conceived, and horrible syntax -
    so I won't use it.

14. The '\a' (bell) character will be recognised in string and character
    constants.

15. I left "noalias" out - is it still in the standard?

16. Function prototypes. If ANSI is not specified (at run-time), these
    are expanded at the basic level (i.e. not within the definitions in
    a function).

17. New-style argument passing (including const) is performed, and
    translated to old (K & R) style arguments if ANSI is not specified
    (at run-time). Please choose one or the other method of argument
    passing - mixing methods in the same function will not work.
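
    To make the difference between the two styles concrete, here is a
    new-style function, with a K & R equivalent shown in the comment.
    The function add is invented for the example, and the exact text
    agcpp emits is not shown in this document, so treat the old-style
    version as an illustration only.

```c
/* ANSI (new-style) prototype and definition, as they might appear in
   the input. */
int add(const int a, int b);

int add(const int a, int b)
{
    return a + b;
}

/*
 * A K & R (old-style) equivalent, with const dropped and the argument
 * types moved out of the parentheses:
 *
 *     int add();
 *
 *     int add(a, b)
 *     int a;
 *     int b;
 *     {
 *         return a + b;
 *     }
 */
```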

18. I have added code to track the level at which each file was included.
    If an attempt is made to include a file again, and that file was
    already included more deeply in the nested includes, agcpp gives an
    error message and does not include it.

19. Long name identifier folding may or may not be needed - but I took the
    trouble to put it in for Minix's somewhat rudimentary (8 char ids)
    compiler.
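
    The problem being solved can be sketched as follows. This only shows
    why long names clash on such a compiler; it says nothing about the
    internal format agcpp folds names to. names_collide is a hypothetical
    helper, and COMPCHARS is used here simply to mirror the default of 8.

```c
#include <string.h>

/* Number of significant characters, mirroring the default of 8. */
#define COMPCHARS 8

/* Returns 1 if a compiler that only looks at the first COMPCHARS
   characters would treat the two identifiers as the same name. */
int names_collide(const char *a, const char *b)
{
    return strncmp(a, b, COMPCHARS) == 0;
}
```

    Here names_collide("process_input", "process_output") reports a
    clash, because both names begin with the same 8 characters.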

It's not as efficient as it could be, but it takes up less space than the
cpp's I've been able to look at. Main target for efficiency is all the
string copying that takes place - I may pass pointers and their lengths when
I get time, but that's some distance away - I'll do it the day after Spurs
win the league, or Rangers do the double.

I may also get around to implementing the run-time library for some systems
sometime, but don't hold your breath.

Constructive comments gratefully received. If you have any flames about
processing time etc, take them somewhere else.

Regards,
Alistair G. Crooks,
Joypace Ltd., 2 Vale Road, Hawkhurst, Kent TN18 4BU, UK. (+44 580 753114)
UUCP Europe:			        ...!mcvax!unido!nixpbe!nixbln!agc   
UUCP the rest of the world:	 ...!uunet!linus!nixbur!nixpbe!nixbln!agc
