
ABE File Format:

ABE1 Format:
94 Printable characters from "!" to "~" are used.  Space and TAB are not.
The 86 characters from "%" to "z" are used to represent bytes on data lines.

ABE2 Format: (and line number format)
84 printable characters are used.  The 64 character set:
"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
represents data bytes.  It is also used to form line numbers and checksum
bytes on lines.  Characters from the sets: +,-"#$%&'()* and
:;<=>?@_ are used to shift between sets.

The ABE2 system avoids the characters in the following set:
	! ` [ \ ] ^ { | } ~
as they have been reported to fail reliable translation between ASCII
and EBCDIC on IBM machines.  (!) fails due to a bug in "dd" on some unix
machines.

Normally, ABE lines begin with a 3 character line "number".  This is formed from
the 64 ABE2 printables, in base 64, where '.' represents 0.  The first
byte is special, in that 'T' represents 0, rather than '.'.  (This avoids
lines that start with dots, or with "From".)
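
As a rough illustration, the line number scheme can be sketched in Python.
Two things here are assumptions, not stated above: that the most significant
digit comes first, and that the first character is simply rotated so that
'T' (index 31 in the set) stands for 0.  The function names are made up for
this sketch.

```python
ABE2_SET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
FIRST_OFFSET = ABE2_SET.index("T")  # 31; assumed rotation for the first byte


def encode_line_number(n: int) -> str:
    """Encode n as 3 base-64 digits (assumed most significant digit first)."""
    if not 0 <= n < 64 ** 3:
        raise ValueError("line number out of range")
    hi, mid, lo = n // 4096, (n // 64) % 64, n % 64
    return ABE2_SET[(hi + FIRST_OFFSET) % 64] + ABE2_SET[mid] + ABE2_SET[lo]


def decode_line_number(s: str) -> int:
    """Inverse of encode_line_number, under the same assumptions."""
    hi = (ABE2_SET.index(s[0]) - FIRST_OFFSET) % 64
    return hi * 4096 + ABE2_SET.index(s[1]) * 64 + ABE2_SET.index(s[2])


print(encode_line_number(0))   # T..
print(encode_line_number(65))  # T//
```

Under these assumptions a line would only begin with 'F' once the line
number reached 50 * 4096.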

The 4th byte in any ABE line is a checksum of the remaining bytes in the
line.  The bytes are summed (as their ASCII values) and the result is taken
mod 64.  (The newline is not included.)  The checksum character is formed
by indexing into the ABE2 64 character set.
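
The per-line checksum described above might be computed as in the following
Python sketch.  The function names are ours, and the sketch assumes the line
carries a line number, so that the checksum really is the 4th byte.

```python
ABE2_SET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def checksum_char(rest: str) -> str:
    """Sum the ASCII values of the bytes after the checksum, mod 64."""
    return ABE2_SET[sum(rest.encode("ascii")) % 64]


def check_line(line: str) -> bool:
    """True if the 4th byte matches the checksum of the rest of the line."""
    line = line.rstrip("\n")  # the newline is not part of the sum
    return len(line) >= 4 and line[3] == checksum_char(line[4:])


print(checksum_char("AB"))     # 65 + 66 = 131, 131 % 64 = 3, prints 1
print(check_line("T..1AB\n"))  # True
```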

The remaining bytes can be data or header items.  All header items begin
with two identical characters that are NOT in the main character map --
i.e., chosen from the shifting set.

If a line starts with two different characters, or two identical characters
that are not header characters, it is a data line.

The following header-lines are defined:

MAIN_HEAD	'##S'		Start of File
SUB_HEAD	'$$'		General Header with English keyword
CODE_HEAD	'""'		Character map line
MAIN_END	'##E'		End of file

Most header lines use the SUB_HEAD and an English keyword.
Note that the three header chars '#', '$' and '"' can't be used as
mapping characters, unless it is assured no line will start with two
of them.  They should always be used as shifting characters.
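
The header test can be sketched as follows.  This is only an illustration in
Python: the function name and the "UNKNOWN_HEAD" label are invented here, and
the payload is assumed to be the line with its line number and checksum
already stripped.

```python
HEADER_CHARS = '#$"'  # the default header characters (h=#$")


def classify(payload: str) -> str:
    """Classify a line payload as one of the header kinds or as data."""
    doubled = len(payload) >= 2 and payload[0] == payload[1]
    if doubled and payload[0] in HEADER_CHARS:
        if payload.startswith("##S"):
            return "MAIN_HEAD"
        if payload.startswith("##E"):
            return "MAIN_END"
        if payload.startswith("$$"):
            return "SUB_HEAD"
        if payload.startswith('""'):
            return "CODE_HEAD"
        return "UNKNOWN_HEAD"  # e.g. "##" followed by something else
    # Two different characters, or two identical non-header characters.
    return "DATA"


print(classify("$$os=unix"))  # SUB_HEAD
print(classify("%%abc"))      # DATA
```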

(An undocumented option to dabe, h=string, redefines the three header characters
in case of future problems.  The default is h=#$", as shown above.  If you
make an encoding with different header chars, you would have to inform
dabe users to provide the correct option.)

Data lines are streams of printable characters, all 65 to 68 characters
long (plus line number and checksum.)  Bytes are represented with three
(or more) sets of printable characters.  A table is built defining which real
byte is meant by each printable character in the sets.

By default, we assume we are in set 0, the most common set.  If a
printable character is encountered in the mapping set, we simply look it
up in the table for set 0, and output the proper full byte.

Special escape characters, defined below, cause a temporary
shift into sets 1 and 2, or combinations of those sets.  After
encountering an escape character, we temporarily shift set for the next
1 to 3 characters.  Shifts never cross a line break -- we always go back
to set 0 with each fresh line.  The shift character meanings are defined
later.


MAIN_HEAD:	##Stver,fver,ever,style

The first three values are decimal numbers, the style is a string.
	
	tver -	Earliest version of the tiny dabe decoder that can
		decode this file.

	fver -	Version number of ABE encoder that encoded this file.

	ever -	Earliest full ABE decoder that can decode this file

	style - ident, up to 8 digits or upper case letters long, of the
		encoding style. Currently ABE1, ABE2, UUENCODE or TEXT.
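
A minimal Python sketch of reading this line, assuming the payload has
already had its line number and checksum removed.  The MainHead type, the
function name and the version numbers in the example are hypothetical.

```python
from typing import NamedTuple


class MainHead(NamedTuple):
    tver: int   # earliest tiny dabe decoder version
    fver: int   # encoder version
    ever: int   # earliest full decoder version
    style: str  # e.g. ABE1, ABE2, UUENCODE or TEXT


def parse_main_head(payload: str) -> MainHead:
    """Split '##Stver,fver,ever,style' into its four fields."""
    if not payload.startswith("##S"):
        raise ValueError("not a MAIN_HEAD line")
    tver, fver, ever, style = payload[3:].split(",", 3)
    return MainHead(int(tver), int(fver), int(ever), style)


print(parse_main_head("##S1,2,1,ABE2").style)  # ABE2
```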

CODE_HEAD:	""<which32> 8(<enc1><enc2><enc3><enc4><sets>)   (ABE1)
		""<which32> 16(<enc1><enc2><sets>)		(ABE2)

These lines define the printable character encodings used in these files.
Each line gives the encoding for a set of 32 bytes, and there will be 8
such lines.  The first character in these lines, <which32> will be the
character representing a number from 0 to 7 in the mapping set indicating
which of the blocks of 32 bytes this line defines.

ABE1:
This is followed by eight sets of five characters each.
The sets of 5 characters consist of 4 characters that are the printable
characters that will be used to encode the byte in question.  The 5th character
indicates which of the three sets each printable character resides in, for
all four bytes.   Thus the first set of five characters from CODE_HEAD 0
define, for the bytes 0, 1, 2 and 3, which printable characters in which
set will represent them.   The <sets> byte defines the set for each of the
4 bytes.  It is a number from 0 to 80, where '%' represents 0, as always in ABE1.
Express the number as a 4 digit number in base 3 to get the 4 sets.  The
first (most) significant digit ( or <sets>/27 ) gives the set of the first
of the 4 bytes.  The last (least) significant digit (or <sets> % 3) gives
the set of the last of the 4 bytes, and so on.
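
The base 3 arithmetic for the ABE1 <sets> byte can be sketched like this
(Python; the function name is ours):

```python
def abe1_sets(sets_char: str) -> list[int]:
    """Return the set (0, 1 or 2) of each of the 4 bytes in a group."""
    n = ord(sets_char) - ord("%")  # '%' represents 0 in ABE1
    if not 0 <= n <= 80:
        raise ValueError("<sets> must encode a number from 0 to 80")
    # Four base-3 digits, most significant first.
    return [n // 27, (n // 9) % 3, (n // 3) % 3, n % 3]


# 'G' is 34 places after '%': 34 = 1*27 + 0*9 + 2*3 + 1
print(abe1_sets("G"))  # [1, 0, 2, 1]
```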

ABE2:
This is followed by 16 sets of 3 characters each.  The third character
indicates which of the 4 sets each printable character resides in, for
both bytes.  It is thus the same as the ABE1 encoding, except the ABE2
mapping set is used, and the set byte is decoded as <sets>/4 for the first
byte and <sets>%4 for the second byte.
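
Putting the ABE2 rules together, one CODE_HEAD line could be turned into a
lookup table roughly as below.  This is a sketch only: the function name and
the (set, printable) -> byte table layout are our choices, and the payload is
assumed to be the line with line number and checksum removed.

```python
ABE2_SET = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def parse_abe2_code_head(payload: str) -> dict[tuple[int, str], int]:
    """Map (set, printable char) -> byte value for one block of 32 bytes."""
    if not payload.startswith('""') or len(payload) != 3 + 16 * 3:
        raise ValueError("not an ABE2 CODE_HEAD line")
    base = ABE2_SET.index(payload[2]) * 32  # <which32> selects the block
    table = {}
    for i in range(16):
        enc1, enc2, sets = payload[3 + 3 * i:6 + 3 * i]
        s = ABE2_SET.index(sets)
        table[(s // 4, enc1)] = base + 2 * i      # first byte of the pair
        table[(s % 4, enc2)] = base + 2 * i + 1   # second byte of the pair
    return table
```

A full decoder would merge the tables from all 8 CODE_HEAD lines before
reading any data lines.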


MAIN_END:	##Elongint

	This line closes off the file.  It stops the tinydabe
	decoder from reading further, and includes the file checksum (longint).
	The regular DABE decoder can expect data after this line in an
	unsorted series of random blocks, or a multi-file encoding.
	The checksum is that of all printable characters representing
	data bytes in the file, mod 65536.  Only bytes from data
	lines are used in this checksum, and the sum is made of
	the printable characters in the data lines, not the actual
	bytes in the output file.

	Like all other checksums, the line number, checksum and newline
	bytes are not included in the sum.
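
A sketch of the file checksum in Python.  It assumes the caller has already
picked out the data lines and stripped the line number, checksum and newline
from each; the function name is ours.

```python
def file_checksum(data_payloads) -> int:
    """Sum the printable characters of all data lines, mod 65536."""
    total = 0
    for payload in data_payloads:
        # The sum is over the printable characters as they appear in
        # the encoding, not over the decoded output bytes.
        total += sum(payload.encode("ascii"))
    return total % 65536


print(file_checksum(["AB", "C"]))  # 65 + 66 + 67 = 198
```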

SUB_HEAD:	$$keyword=value

	Many sub-header lines are defined, and more are possible in the
	future.  Keywords are alphanumeric, and the case of the letters
	is unimportant.  Values can be any string of printable, non-blank
	characters.   While blanks could be used in values, they are
	advised against.
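
Splitting a sub-header line might look like this in Python.  Folding the
keyword to lower case is just one way of making the comparison
case-insensitive, and the function name is ours.

```python
def parse_sub_head(payload: str) -> tuple[str, str]:
    """Split '$$keyword=value' into (keyword, value)."""
    if not payload.startswith("$$"):
        raise ValueError("not a SUB_HEAD line")
    keyword, sep, value = payload[2:].partition("=")
    if not sep:
        raise ValueError("missing '=' in sub-header")
    return keyword.lower(), value  # keyword case is unimportant


print(parse_sub_head("$$Blocking=true"))  # ('blocking', 'true')
```

Only the first '=' is treated as the separator, so a value may itself
contain '=' characters.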

	Here are the sub-headers:

	startblock=blocknum,seekaddr,earlyver,filename

		blocknum - Decimal integer, the index number of this block, from
			0 to N-1.
		seekaddr - Decimal long integer, the seek address into the file
			where the block should be written.
		earlyver - Earliest decoder that can decode the file.
		filename - Ascii string, the universal filename of the file.
			(See below and the man pages for a discussion of
			universal file names.)

		This record begins a new block.  It usually is found at the
		start of an independent file, although multiple blocks can
		exist in a file.

	closeblock=blocknum,block_checksum,bytecount,blockcrc

		blocknum - Decimal integer, the index number of this block, from
		   0 to N-1.
		block_checksum - Decimal long integer, the checksum of the
		   printable characters in the block (not including the
		   CLOSE_HEAD line) after the 4th (checksum) byte of every
		   valid data or header line in the block.  This checksum is
		   presented modulo 65536.
		bytecount - The number of data bytes that should have been
		   present in the block.  (Not printable characters, but
		   actual output data bytes.)
		blockcrc - An unsigned long int, the 32 bit CRC of the block.

	style=string
		Sets the encoding style for this block.  (For now, used
		only in redundant block encoding, as the main file header
		sets the encoding style normally.)  The ident can be up
		to 8 upper case alphanumerics long.  Currently defined
		are ABE1, ABE2, UUENCODE and TEXT.

	os=string
		Defines the operating system that encoding took place on.
		Currently defined values are "unix" and "msdos",
		but anything can be used.  The two OSs must match if
		full file pathnames and machine independent forms of
		file information are to be used.

	blocking=true|false
		The value may be either "true" or "false".  This indicates
		whether this file will be split into blocks or not.

	fname=string
		Defines the file's true filename, to a limit of 60 characters.
		The true filename is only used when decoding on the same OS
		as the encoding was made on.  This field is optional.

	uname=string
		Defines the short universal name.  Universal names should
		include no directory characters and must be 12 characters
		in length or less.  While no other rules are enforced, it
		is advised that universal names be limited to alphanumerics
		and the dot (.) character, and that there be no more than
		one dot in a universal name, and that there be no more than
		3 significant characters after the dot.

		Decoding programs must ensure that universal names conform
		to the rules of their operating system.

	owner=string
		The name of the user who owns the file.  Currently optional
		and unused.

	total-blocks=longint
		In a blocked file, the argument will be a decimal integer
		indicating the total number of blocks the file was split
		into.  If the number is "10", then blocks 0 through 9 should
		be found.  This field is not found on non-blocked files.

	end_file=string
		The string will be the file's universal name, once again,
		just to be redundant.  This is the sub-header version of
		the MAIN_END header line.

	date=longint
		The modification date/time for the file, expressed as the
		number of seconds since 00:00:00 GMT, January 1, 1970.
		This is the unix epoch, and unix systems will be able to
		use this number directly.  Other systems must convert if
		they wish to use this number.  This field is optional.

	perm=int
		Specifies access permissions for the file.  Only the
		lower 3 bits (perms & 7) are OS independent.  The rest
		of the number is OS-dependent.  Of the lower 3 bits,
		bit 0 (lsb) indicates general execute permission, bit
		1 indicates general write permission and bit 2 indicates
		general read permission.

		On unix systems, this number will be the file's "mode."

	size=longint
		The size, in bytes, of the file.  This field is optional.

	filecount=int
		Number of files in this encoding.  Currently present but
		unused.

	linenumbers=true|false
		Indicates whether line numbers are present in this block,
		after this line.  If set to false, further lines until
		the end of the block need no line numbers.  The need for
		line numbers will resume in the next block or file.
		Thus this must appear in every block if line numbers are
		not to be used.

	filecrc32=unsigned long int
		This gives the 32 bit CRC for the entire file.
		The decoder only checks this value if the file was
		not blocked.  Blocked files rely on the block CRCs, as
		the file CRC will not be right if the blocks come in
		a random order.
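
For the perm sub-header above, the OS independent part can be pulled apart
with three bit tests.  A small Python sketch (the function name is ours, and
the value is assumed to have been converted to an integer already):

```python
def portable_perms(perm: int) -> dict[str, bool]:
    """Decode the lower 3 (OS independent) bits of a perm value."""
    low = perm & 7
    return {
        "read": bool(low & 4),     # bit 2
        "write": bool(low & 2),    # bit 1
        "execute": bool(low & 1),  # bit 0 (lsb)
    }


# On unix the value is the file's mode, e.g. 0o755:
print(portable_perms(0o755))
# {'read': True, 'write': False, 'execute': True}
```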

The following sub-headings are understood by the decoder, but not
currently used by the encoder.  They make the decoder more general.  All
encodings start off as a standard encoding (ABE1, ABE2, UUENCODE, TEXT) but
these headers can change the parameters.

	numsets=int
		The number of character sets.  Numbers from 1 to 6 are
		valid.  ABE1 defaults to 3, ABE2 to 4, UUENCODE to 0.
	setgroup=int
		Defines the number of characters per set group in a
		CODE_HEAD line.  This is 4 for ABE1 and 2 for ABE2.  The
		number must be 1, 2 or 4.  This controls how CODE_HEAD
		lines are decoded.
	prints1=string
		Defines the mapping character set -- the N characters which
		are used to represent characters, but not the shifts.
		This either defines the entire set, or the first 48 characters
		of it.
	prints48=string
		Defines the rest of the mapping character set if it is more
		than 48 bytes long.  Thus a set up to 96 bytes long can be
		defined.   prints48 MUST come after prints1, or the
		prints48 will be ignored.

The following headings define the shift characters for an ABE encoding.
There are 4 types of shift characters that can be defined, and the number
of shift characters per type depends on the number of sets.  If a shift
character is not to be defined, use "0" in its place.  Thus "0" cannot be
a shift character in an ABE encoding.

	xshifts=string
		Define the (sets-1) shift characters that encode a single
		byte temporary shift to a given set.  The first char is
		the shift to set 1, the second is the shift to set 2, etc.
		There is no shift to set 0, as that's the default.
	xxshifts=string
		Define the (sets-1)^2 shift characters that encode a
		double byte temporary shift to two arbitrary non-zero sets.
		In ABE1, for example, the first character means shift
		to set 1, and set 1 again.   The second is set 1, set 2.
		The third is set 2, set 1, the fourth and last is set 2, set 2.
	xcxshifts=string
		Define the (sets-1)^2 shift characters that encode a
		triple byte temporary shift to one arbitrary set, the
		default set (0) and another arbitrary set.  The same
		system is used as for xxshifts, except a return to the
		default set (normally 0) is in the middle.  You often
		don't have room for all of these, so some will be marked
		undefined by using 0 in that place.
	runlength=char
		Defines the character as a 'run length' character.  This
		character takes the mappable character after it, gets the
		character index of it and adds 1 to get a count N.  N
		repetitions of the last unshifted or single-shifted
		character will be placed on the output file.  Note that
		this only repeats the last character from set 0 or the
		result of an xshift single-byte shift.  Note as well
		that since the character has already appeared in the
		output, you get N+1 of it.  (C+2, where C is the index
		number of the mappable character you used for the count.)

		This is not currently used by the encoder.  In fact,
		compression of this sort of thing should really be left
		up to compression programs.  But it could sometimes be
		of use in ABE encodings, so it's here.
	changeset=string
		Define the (sets) shift characters that encode a
		permanent change (for the rest of the line) in the
		default set.   In all current ABE encodings, the default
		set is always 0, but a future encoder might use this system.
		The change is only for the rest of the line.  The default
		returns to 0 at the start of any line.   The first char
		in the string will be the shift to set 0 byte, the second
		char will shift to set 1, and so on.
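
The ordering rules for xshifts, xxshifts and xcxshifts can be sketched in
Python as follows.  The function name and table layout are ours.  The ABE1
strings in the example are inferred from the ABE1 shift table in this
document, with "0" filling the two undefined triple shifts; they are not
quoted from a real header.

```python
from itertools import product


def shift_tables(numsets, xshifts="", xxshifts="", xcxshifts=""):
    """Map each shift character to the sequence of sets it selects."""
    nonzero = range(1, numsets)  # there is no shift to set 0
    table = {}
    for ch, s in zip(xshifts, nonzero):
        if ch != "0":            # "0" marks an undefined shift
            table[ch] = (s,)
    # Pairs run (1,1), (1,2), ..., (2,1), (2,2), ... as described above.
    for ch, (a, b) in zip(xxshifts, product(nonzero, repeat=2)):
        if ch != "0":
            table[ch] = (a, b)
    for ch, (a, b) in zip(xcxshifts, product(nonzero, repeat=2)):
        if ch != "0":
            table[ch] = (a, 0, b)  # default set in the middle
    return table


abe1 = shift_tables(3, "{|", '!"#$', "}~00")
print(abe1["#"])  # (2, 1)
print(abe1["~"])  # (1, 0, 2)
```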
	



Sub-headings currently undefined, but possible for further expansion:
	variant=string
		OS variant.  Version numbers or things like "SysV" or
		"BSD" could be placed here.  Few files should contain
		anything so machine dependent that an OS variant should
		be needed, but who knows?
	group=string
		Group owner of the file
	link=string
		This file is just a link to another file in the same
		encoding.
	textfile=true|false
		Indicates whether the file is to be decoded as a text
		file.  Default is false on this optional field.
	newline=byte,byte,...
		Gives the string, as a series of decimal integer byte numbers,
		that represents a newline.  The decoder should output whatever
		is a newline on its own system.  For example, a unix system
		might say newline=10; an MS-DOS system would say newline=13,10.

Escape 'Shifting' Characters for Data Lines (ABE1):

	!	Set 1, Set 1 (2 chars)
	"	Set 1, Set 2 (2 chars)
	#	Set 2, Set 1 (2 chars)
	$	Set 2, Set 2 (2 chars)
	{	Set 1 (1 char)
	|	Set 2 (1 char)
	}	Set 1, Set 0, Set 1 (3 chars)
	~	Set 1, Set 0, Set 2 (3 chars)
	( No mappings are defined for 2, 0, 1 and 2, 0, 2 )

Escape 'Shifting' Characters for Data Lines (ABE2):
	+	Set 1 (1 char)
	,	Set 2 (1 char)
	-	Set 3 (1 char)

	"	Set 1, Set 1
	#	Set 1, Set 2
	$	Set 1, Set 3
	%	Set 2, Set 1
	&	Set 2, Set 2
	'	Set 2, Set 3
	(	Set 3, Set 1
	)	Set 3, Set 2
	*	Set 3, Set 3

	:	Sets 1, 0, 1
	;	Sets 1, 0, 2
	<	Sets 1, 0, 3
	=	Sets 2, 0, 1
	>	Sets 2, 0, 2
	?	Sets 2, 0, 3
	@	Sets 3, 0, 1
	_	Sets 3, 0, 2

	(No mapping is defined for 3, 0, 3)
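
To show how the temporary shifts play out on a data line, here is a Python
sketch that tags each mapping character with the set it should be looked up
in, using the ABE2 table above.  The names are ours.  Treating a shift inside
a shifted run, or a shift left hanging at the end of a line, as an error is
our guess; the text above only says that shifts never cross a line break.

```python
from itertools import product

NONZERO = (1, 2, 3)
ABE2_SHIFTS = {}
for ch, s in zip("+,-", NONZERO):
    ABE2_SHIFTS[ch] = (s,)
for ch, (a, b) in zip("\"#$%&'()*", product(NONZERO, repeat=2)):
    ABE2_SHIFTS[ch] = (a, b)
# Only 8 triple shifts exist, so zip stops before (3, 0, 3).
for ch, (a, b) in zip(":;<=>?@_", product(NONZERO, repeat=2)):
    ABE2_SHIFTS[ch] = (a, 0, b)


def assign_sets(payload: str) -> list[tuple[int, str]]:
    """Return (set, printable) pairs for one data line payload."""
    out = []
    pending = []  # sets queued up by the most recent shift character
    for ch in payload:
        if ch in ABE2_SHIFTS:
            if pending:
                raise ValueError("shift character inside a shifted run")
            pending = list(ABE2_SHIFTS[ch])
        else:
            out.append((pending.pop(0) if pending else 0, ch))
    if pending:
        raise ValueError("shift runs past the end of the line")
    return out


print(assign_sets(";xyz."))
# [(1, 'x'), (0, 'y'), (2, 'z'), (0, '.')]
```

Each resulting pair would then be looked up in the table built from the
CODE_HEAD lines to get the output byte.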

Encoding Style:

	ABE decoders ignore (give a warning for) unknown sub-header
	keywords, so expansions to the format can add these without
	necessarily hurting backwards-compatibility.

UUENCODE Format:
	The UUENCODE format uses no CODE_HEAD lines, and, in the data
	region, is identical to the basic format used by uuencode(1),
	with the exception of the presence of 4 bytes of line number
	and checksum (in ABE2, Base 64 form) on the front of each line.
	If line-numbers are turned off, the middle of a UUENCODE ABE
	encoding looks just like a UUENCODE encoding.  UUENCODE files
	can be blocked, but uudecode programs may not fully understand
	them in this format.

	Our uuencode method, like most uuencoders, uses the grave
	accent to represent 0, rather than space, as the original ones
	did.

TEXT Format:
	This format is designed for unix text files.  It maps all
	characters to themselves, other than the shift character
	'#' and the newline.  This allows text files to appear almost
	verbatim, although their lines must not begin with ##, $$ or "",
	as these will be mistaken for headers.

	Lines in a text encoding will have a newline output after them,
	unless the shift character '#' appears as the last character on
	the line.  These special lines that don't get a newline can be
	used to break long lines into a series of short ones.

	The encoder does not currently make TEXT format files.  Some
	future encoder may.

	The shift character '#' is a single byte shift.  It can be
	followed by any of the following bytes which will map to the
	appropriate useful byte:
	
		G	^G
		H	^H (backspace)
		r	Carriage return (byte 13)
		n	Newline (byte 10)
		E	Escape (byte 27)
		@	# (the shift char itself)
		EOL	Suppress newline on this line

	The remainder of the characters in the shifted set all map to
	themselves.  In particular, '"' and '$' map this way, and shifting
	can be used to avoid lines starting with "" or $$.  These defaults
	can be changed with code_map lines in the ABE1 style.
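
A decoder for the default TEXT mapping could look roughly like the Python
sketch below.  It assumes an ASCII payload with line number and checksum
already removed, and it writes byte 10 as the newline, as a unix system
would.  The names are ours.

```python
TEXT_SHIFTED = {"G": 7, "H": 8, "r": 13, "n": 10, "E": 27, "@": ord("#")}


def decode_text_line(payload: str) -> bytes:
    """Decode one TEXT style line using the default '#' shift table."""
    out = bytearray()
    newline = True
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch != "#":
            out.append(ord(ch))  # unshifted characters map to themselves
            i += 1
        elif i + 1 == len(payload):
            newline = False      # '#' at end of line suppresses the newline
            i += 1
        else:
            nxt = payload[i + 1]
            # Other shifted characters map to themselves.
            out.append(TEXT_SHIFTED.get(nxt, ord(nxt)))
            i += 2
    if newline:
        out.append(10)
    return bytes(out)


print(decode_text_line("a#@b#"))   # b'a#b'
print(decode_text_line('#"" x'))   # b'"" x\n'
```

The second example shows the trick mentioned above: shifting the first '"'
keeps the encoded line from starting with "".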

	The generation of a proper TEXT encoder will allow dabe to replace
	'shar' and other text encodings.

Newlines:
	While we talk about only using safe printable characters here,
	one other very special character -- the newline -- is used.
	ABE files most definitely consist of lines.  Anything that removes
	all newlines will damage the files.

	Should this occur in the future, the fact that the ABE
	file formats do not use whitespace can be used, by writing
	translate utilities that substitute newlines for whitespace
	and back.  If you find a system that doesn't support newlines
	or whitespace, I guess this format just won't work.
