genser - generate serialising code

This is a pair of programs that generate code to serialise and
deserialise data structures, given a description of them.  I wrote them
for the protocol used by userfs, but they are quite general.  One
program (gencode) generates code for encoding, decoding and finding the
size of the encoded representation; the other (genhdr) emits the types
and the prototypes for those functions in ANSI C.

Feel free to use this code for your own projects, but remember this
is GPL code.

GENHDR

Usage: genhdr [-C] input.ty 

This emits all the datatypes and function prototypes in ANSI C to
standard output.  If -C is specified, array structures are generated
with C++ destructors which free memory allocated by decoding.

GENCODE

Usage: gencode [-sedC] [-l dir] [-s suff] input.ty [> output.c]

Gencode generates C code for the encode/decode/sizeof functions
(generated with the -e, -d and -s options respectively).  -l dir writes
one function per file into directory "dir", so that the functions can
be built into an archive library and programs only link in what they
use.  -s sets the suffix of the output files (.c by default).  -C
generates code that works with the destructors generated by genhdr -C,
and sets the suffix to .cc when used with -l.

INPUT FILE FORMAT

The input file is essentially C type definitions, with a few exceptions.
By default, code is generated for any type named with typedef, and any
anonymous type used in a typedef.

Arrays are defined as follows:

	typedef int foo[];

This defines a type "foo", which is an unbounded array of ints (signed
32-bit words).  This generates a structure of the form:

	struct {
		int *elems;
		long nelem;
	};

that is, a pointer to the base of the array and the number of elements.

This, and the complete exclusion of functions from the type system, are
the main differences from pure C syntax.  A number of parsing hacks have
been put in place so that C constructs which carry no meaning for genser
can still be parsed.
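As an illustration of what encode, decode and sizeof functions do with
such an array structure, here is a hand-written sketch for foo.  It is
not genser's generated code: the names foo_t, sizeof_foo, encode_foo and
decode_foo, the buffer-based signatures, and the wire format (a 32-bit
big-endian element count followed by each element as a 32-bit big-endian
word) are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Shape of the structure generated for "typedef int foo[];".
   The name foo_t is hypothetical. */
typedef struct {
    int *elems;
    long nelem;
} foo_t;

/* Assumed wire format: 4-byte count, then 4 bytes per element,
   all big-endian. */
static size_t sizeof_foo(const foo_t *f)
{
    return 4 + 4 * (size_t)f->nelem;
}

static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Writes f into buf, which must hold at least sizeof_foo(f) bytes.
   Returns the number of bytes written. */
static size_t encode_foo(unsigned char *buf, const foo_t *f)
{
    unsigned char *p = buf;
    long i;

    assert(f->nelem >= 0);
    put32(p, (uint32_t)f->nelem);
    p += 4;
    for (i = 0; i < f->nelem; i++) {
        put32(p, (uint32_t)f->elems[i]);
        p += 4;
    }
    return (size_t)(p - buf);
}

/* Reads a foo from buf.  elems is allocated here (plain malloc stands
   in for genser's allocation hook) and must be freed by the caller.
   Returns the number of bytes consumed, or 0 if allocation failed. */
static size_t decode_foo(const unsigned char *buf, foo_t *f)
{
    const unsigned char *p = buf;
    long i;

    f->nelem = (long)get32(p);
    p += 4;
    f->elems = malloc((size_t)f->nelem * sizeof *f->elems);
    if (f->nelem > 0 && f->elems == NULL)
        return 0;
    for (i = 0; i < f->nelem; i++) {
        f->elems[i] = (int)(int32_t)get32(p);
        p += 4;
    }
    return (size_t)(p - buf);
}
```

The sizeof function lets a caller size the output buffer before
encoding, and the decoder owns the allocation of elems, which is why
decoded arrays have to be freed by hand afterwards.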

Structures may be named with the "struct foo {...};" syntax, but they
are ignored until they are used in a named type.

Often you want to include a system include file for a couple of types,
but it defines dozens.  Typedefs can be marked as "generate on demand"
(when used in other types) by enclosing them in a notypedef block:

notypedef {
#include <sys/types.h>
}

This will only generate code and definitions for the types in <sys/types.h>
if another type uses them.

The input file is run through cpp ("/lib/cpp -Ulinux -C").

It is possible to quote parts of the input file directly into the output
file by putting '%' at the beginning of the line.  These lines are
completely uninterpreted and are copied through with the '%' stripped
off.  Quoted lines keep their relative order in the output, but their
position relative to the genser output generated from the surrounding
input is not specified; generally they will appear before any generated
output.  Quoted lines are copied through only by genhdr, not by gencode,
so they appear only in the header.  For example:

%/* Copy into output file */
%#include <sys/types.h>

When decoding arrays of variable size and pointers to objects, the
decode routine calls a function or macro void *ALLOC(size_t size) to
allocate memory.  It expects this function always to return a valid
pointer to free memory.  By default, it is defined as malloc(), but it
can be redefined in the quoted section to something appropriate to local
conditions.  Memory allocated by a decode function must be freed
manually when you have finished with it.  If one generates C++ code
(the -C option to gencode and genhdr), then destructors which call
FREE() are generated for arrays.  They will only attempt to free memory
if it was allocated by a decode function.
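As an example of adapting ALLOC to local conditions, the sketch below
replaces it with an allocator that aborts on failure, so that the
expectation of always getting a valid pointer holds even when memory
runs out.  The helper name xalloc is hypothetical, and the sketch
assumes that defining ALLOC and FREE as macros before the generated code
uses them is enough to override the defaults.  In a .ty file each of
these lines would carry a leading '%' so that genhdr copies it into the
header.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical checked allocator: never returns NULL, so decode
   routines can use the result without testing it. */
static void *xalloc(size_t size)
{
    void *p = malloc(size ? size : 1);  /* avoid malloc(0) returning NULL */

    if (p == NULL) {
        fprintf(stderr, "ALLOC: out of memory (%lu bytes)\n",
                (unsigned long)size);
        abort();
    }
    return p;
}

/* Redirect the allocation hooks (assumed to be plain macros). */
#define ALLOC(size) xalloc(size)
#define FREE(ptr) free(ptr)
```

Aborting is only one policy; a program could equally log and exit, or
allocate from a pool, as long as the function never hands NULL back to
the decoder.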

coder.h is a file that must always be included.  It contains the
definitions of the encode/decode/sizeof functions for the base types
used by genser.  It is normally included as "coder.h", but if one
defines _NO_CODER_H_ in the output file (with a quoted define such as
"%#define _NO_CODER_H_"), it will not be included automatically.  One
can then include it in a more appropriate way, or replace it altogether.

There are no known actual *bugs*, but there are a few limitations
and desirable features.  Most importantly, C++ support could be
much better.  Each type could be a class with encode/decode/sizeof
methods, and memory could be managed with new and delete if required.

Also, this readme file could be turned into a real man page or
texinfo page.

Bugs and comments to
	Jeremy Fitzhardinge <jeremy@sw.oz.au>
