ThisForth -- a macro-oriented Forth based on Standard (ANS) Forth  94-08-14
===========================================================================

"No, Daddy. When you've seen one, you have seen one." Dorothy, age 6 (1963)

This Forth extends Standard (ANS) Forth with integrated macro processing.  
Macros are not recognized and expanded by a pre-processor, but by the 
built-in console/textfile interpreter-compiler.  `This Forth' named itself 
from the many times I had to write it when describing it.

>IN and SOURCE are replaced by GET-CHAR, STACK-CHAR, and PLEASE.

Source is provided for the Double Number, Exception, and Search Order word 
sets, and for the foundations of the Floating Point and Tool word sets.

Among other things, This Forth can be used as a framework to provide 
application specific Forth primitives incorporating third party material.

FEATURES
--------

This Forth features:

	Integrated macro processing
	
	Input/output re-assignment
	
	Portable implementation
	
	Easy extensibility
	
	Pinhole optimization

ThisForth was written:

(1)	To try out Standard (ANS) Forth.

(2)	To integrate macro processing with Forth.

(3)	To be able to use Forth in a conventional environment.

(4)	To have Forth as an interpretive shell and scripting
	language for applications.

INPUT and OUTPUT
----------------

What distinguishes This Forth is the approach to input and output.  
Logically there is just one input device and one output device.  These 
can be dynamically reassigned, but This Forth sees all input source as 
one single sequence of characters, with no awareness of the origin.

There are Forth words to insert textfiles (`STREAM') or character strings 
(`PLEASE') into the input source.  This Forth still sees the result as one 
single sequence of characters.  That is how macros are implemented.
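A minimal C sketch of that single sequence, assuming every source (a file's 
text or a macro's string) is held as a NUL-terminated string.  `please()' 
pushes a new source on top and `get_char()' reads from the top one, falling 
back to the one below when it is exhausted.  The names are hypothetical; this 
is a model of the idea, not This Forth's kernel.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SOURCES 16

/* Hypothetical model: each source is a string plus a read position. */
struct source { const char *text; size_t pos; };

static struct source sources[MAX_SOURCES];
static int depth = 0;

/* Insert a string into the input source: it is read before anything else. */
void please(const char *text)
{
    if (depth < MAX_SOURCES)
        sources[depth++] = (struct source){ text, 0 };
}

/* Next character of the one logical input sequence, or EOF. */
int get_char(void)
{
    while (depth > 0) {
        struct source *s = &sources[depth - 1];
        if (s->text[s->pos] != '\0')
            return (unsigned char)s->text[s->pos++];
        depth--;                 /* exhausted: continue with the one below */
    }
    return EOF;
}
```

The reader never sees a boundary between sources, which is what lets a 
macro's text be consumed by whatever word reads next.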

MACROS
------

A macro is a word that inserts a character string into the input source.  
The character string may have slots that are filled by string parameters.  
The character string will be processed by the next Forth word(s) that 
consume input source.

A _word_ -- a sequence of visible characters -- is the fundamental lexical 
unit in all Forths.  In This Forth, successive invisible characters are 
filtered out of the input source.  When a visible character is encountered, 
characters are collected until a character that is not visible is 
encountered.  The collected characters form a word, which is then handled by 
Forth's text interpreter.

No lines, records, blocks, or files are recognized in this process.  
There is no concept of an input buffer.
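The scanning rule can be sketched in C over an in-memory string.  Taking 
"visible" to mean `isgraph()' in the C locale is an assumption, and 
`next_word()' is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scan the next word starting at *p: skip invisible characters, then
   collect visible ones. Returns the word's length (0 when the input is
   exhausted), points *word at its first character, and advances *p. */
size_t next_word(const char **p, const char **word)
{
    const char *s = *p;
    while (*s != '\0' && !isgraph((unsigned char)*s))
        s++;                        /* a newline is just another separator */
    *word = s;
    while (*s != '\0' && isgraph((unsigned char)*s))
        s++;
    *p = s;
    return (size_t)(s - *word);
}
```

Given "  : SQUARE\n\tDUP *  ;" it yields `:', `SQUARE', `DUP', `*', and `;'; 
the line break between SQUARE and DUP is just a separator.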

RE-SCANNING INPUT
-----------------

In most traditional implementations the value at >IN can be modified so the 
input source will be re-scanned.  The input source itself must not be 
changed.  In This Forth text that has just been read can be modified, 
expanded, and re-inserted in the input source.  The file is not changed.

PLEASE <spaces><char><text><char>
---------------------------------

PLEASE is the word used by macros to insert a character string or section
of a character string into the input source.  The delimiter <char> is 
usually ", ', or /.

	A complete implementation of the Eaker CASE structure in This Forth:

	: CASE    PLEASE "[ 0 ] " ; IMMEDIATE
	: OF      PLEASE "OVER = IF DROP " ; IMMEDIATE
	: ENDOF   PLEASE "ELSE " ; IMMEDIATE
	: ENDCASE PLEASE "DROP ESAC " ; IMMEDIATE
	: ESAC    ?DUP IF PLEASE "THEN ESAC " THEN ; IMMEDIATE

The character ~ (swung dash, but usually miscalled tilde) can be used as a 
parameter placeholder within <text>.  If it is used, PLEASE takes a 
character string (address and length) from the data stack as an argument 
and consumes it.
	
	: ABORT"   [CHAR] " PARSE PLEASE 'IF ." ~ " ABORT THEN ' ; IMMEDIATE

	: GET-WORD  BL WORD COUNT ;            ( Used often in macros. )
	: GET-LINE  EOL PARSE ;            ( Get the rest of the line. )

	: ??        GET-WORD PLEASE "IF ~ THEN " ; IMMEDIATE
	( Used:	?? EXIT   ?? LEAVE   ?? NEGATE   ?? CHAR+   and so on. )
	
	: TIMES     GET-LINE PLEASE "0 ?DO ~ LOOP CR " ; IMMEDIATE
	( Used:	5 TIMES rest-of-the-line    and so on. )
	
	: [']       GET-WORD PLEASE "[ ' ~ ] LITERAL " ; IMMEDIATE

	: POSTPONE  GET-WORD PLEASE "['] ~ EXECUTE " ; IMMEDIATE
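The substitution PLEASE performs can be sketched in C.  `expand_tilde()' is 
a hypothetical helper, not part of the kernel: it copies <text> and writes 
the same parameter string in place of every ~.

```c
#include <string.h>

/* Hypothetical helper: copy `tmpl' to `out', writing the parameter
   (param, plen) in place of every '~'. Returns the length written, or 0
   if the result plus its NUL would not fit in outsize bytes. */
size_t expand_tilde(const char *tmpl, const char *param, size_t plen,
                    char *out, size_t outsize)
{
    size_t o = 0;
    if (outsize == 0)
        return 0;
    for (const char *t = tmpl; *t != '\0'; t++) {
        if (*t == '~') {
            if (o + plen >= outsize)
                return 0;
            memcpy(out + o, param, plen);
            o += plen;
        } else {
            if (o + 1 >= outsize)
                return 0;
            out[o++] = *t;
        }
    }
    out[o] = '\0';
    return o;
}
```

With the template of `??' and the word EXIT this gives "IF EXIT THEN ".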

A ~ can occur more than once in <text>, with each occurrence being replaced 
by the same character string parameter.  On the infrequent occasions when a 
macro takes more than one parameter, the text must be broken into sections 
that are inserted first-in last-out: the section inserted last is read first.

	( Candy coating.  Used:  `SET variablename rest-of-the-line'. )
	: SET			( "<spaces><name><rest-of-the-line>" -- )
		GET-WORD		( variablename .)
		GET-LINE		( variablename . rest-of-the-line .)
		2SWAP PLEASE "~ ! "	( rest-of-the-line .)
		PLEASE "~ "		( )
	; IMMEDIATE
	
	Given `VARIABLE COLUMN', the following lines compile the same
	one or two primitive Forth instructions.

	0 COLUMN !
	SET COLUMN 0
	0 SET COLUMN

Instead of GET-LINE you might want to use `[CHAR] ; PARSE'.
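Why the sections go in last-first can be checked with a small C model, 
assuming the unread input is one buffer and PLEASE prepends to it.  
`please_prepend()' and `expand_set()' are hypothetical names; `expand_set()' 
mirrors SET after GET-WORD and GET-LINE have consumed `COLUMN' and `0'.

```c
#include <stdio.h>
#include <string.h>

/* Unread input, modelled as a single NUL-terminated buffer. */
static char pending[256];

/* PLEASE: the inserted text is read before anything already pending. */
void please_prepend(const char *text)
{
    size_t tl = strlen(text), pl = strlen(pending);
    if (tl + pl + 1 > sizeof pending)
        return;                               /* sketch: refuse on overflow */
    memmove(pending + tl, pending, pl + 1);   /* shift old text and its NUL */
    memcpy(pending, text, tl);
}

/* SET's two PLEASEs: the section inserted last is read first. */
void expand_set(const char *name, const char *rest)
{
    char section[128];
    snprintf(section, sizeof section, "%s ! ", name);   /* PLEASE "~ ! " */
    please_prepend(section);
    snprintf(section, sizeof section, "%s ", rest);     /* PLEASE "~ "   */
    please_prepend(section);
}
```

`SET COLUMN 0' leaves "0 COLUMN ! " to be read; `0 SET COLUMN' leaves 
" COLUMN ! " after the 0 that was already interpreted.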

LOW LEVEL FORTH
---------------

This Forth's primitives are written in Low Level Forth.  Low Level Forth is 
an application-specific form of C, implemented as M4 macros, designed to 
define Forth primitives.  Low Level Forth makes it easy to integrate third 
party material.

The kernel of This Forth has three parts: a part in Low Level Forth, a part 
in (high level) Forth, and utility functions written in straight C.

To build This Forth you have to take two steps:

First, use M4 to convert the Low Level Forth part of the kernel to ANSI C.

Second, compile all the C code together.  Be sure you have an ANSI C 
compiler.

See the accompanying README and Makefile.

You can get a public domain M4 from the same place you get This Forth.

EXTENDING THE KERNEL
--------------------

Low Level Forth lets you extend the kernel with new primitives and 
extensions integrating third party programming.  This Forth itself has a 
good example of this in how it implements File Access.

As another example, here's how Andrew Mckewan implemented the basic 
floating point operations in Low Level Forth:

	# define	fpop	ftop = *F--
	# define	fpush	*++F = ftop, ftop =
	
	double  fstack[FSTACK_CELLS], *F = fstack, ftop; 
	double  f;
	
	Execution(`FDROP')      fpop;                          Done
	Execution(`FDUP')       *++F = ftop;                   Done
	Execution(`FNIP')       F--;                           Done
	Execution(`FOVER')      fpush F[-1];                   Done
	Execution(`FSWAP')      f = ftop, ftop = *F, *F = f;   Done
	Execution(`FROT')
	        f = F[-1], F[-1] = *F, *F = ftop, ftop = f;
	Done
	
	Execution(`S>F')        fpush (double)top, pop;        Done
	Execution(`F>S')        push (cell)ftop, fpop;         Done
	
	Execution(`F+')         ftop += *F--;                  Done
	Execution(`F-')         ftop = *F-- - ftop;            Done
	Execution(`F*')         ftop *= *F--;                  Done
	Execution(`F/')         ftop = *F-- / ftop;            Done
	Execution(`FSQRT')      ftop = sqrt(ftop);             Done
	Execution(`F**')        ftop = pow(*F--, ftop);        Done

	void	fliteral(void)
	{
		if (! state) return ;
		latest = c(NEXT);
		u.Double = ftop, fpop;
		c(u.Short[0]), c(u.Short[1]), c(u.Short[2]), c(u.Short[3]);
	}
	Behavior
	        u.Short[0] = code[I++], u.Short[1] = code[I++],
	        u.Short[2] = code[I++], u.Short[3] = code[I++],
	        fpush u.Double;
	Done

In the text interpreter at the appropriate location:

		char * pp;
	        f = strtod((char *) &name[pocket + 1], &pp);
	        if (*pp == NUL)
	            fpush f, fliteral();
	        else 

You're not expected to understand all of this yet.  It's just to give you 
the flavor.
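For readers without M4 at hand, a few of the words above can be restated as 
plain C functions, keeping the same `fpush' and `fpop' macros and the cached 
top element `ftop'.  The function names, the small stack size, and 
`try_float()' (which follows the text-interpreter fragment's rule that 
strtod must consume the whole word) are assumptions for this sketch; there is 
no overflow or underflow checking.

```c
#include <stdlib.h>

#define FSTACK_CELLS 32

/* ftop caches the top item; F points at the second item in fstack.
   fstack[0] is a dummy slot, so F can start there on an empty stack. */
static double fstack[FSTACK_CELLS], *F = fstack, ftop;

#define fpop   ftop = *F--
#define fpush  *++F = ftop, ftop =

void   f_push(double x) { fpush x; }
double f_pop(void)      { double x = ftop; fpop; return x; }
void   f_over(void)     { fpush F[-1]; }   /* after ++F, F[-1] is the old second */
void   f_swap(void)     { double f = ftop; ftop = *F; *F = f; }
void   f_rot(void)      { double f = F[-1]; F[-1] = *F; *F = ftop; ftop = f; }
void   f_sub(void)      { ftop = *F-- - ftop; }   /* second minus top */
void   f_div(void)      { ftop = *F-- / ftop; }   /* second divided by top */

/* A word is a float literal only if strtod consumes all of it. */
int try_float(const char *word)
{
    char *end;
    double f = strtod(word, &end);
    if (end == word || *end != '\0')
        return 0;
    f_push(f);
    return 1;
}
```

Pushing 1, 2, and 3 and then rotating leaves 2 3 1, as FROT should.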

FILE ACCESS
-----------

This Forth has no special words to read or write files.

Instead a program temporarily reassigns the input or output device, and 
uses the words GET-CHAR, WORD, PARSE, EMIT, TYPE, CR, SPACE, SPACES, .R, 
U., U.R, ." and . to read and write files.  The Double Number and 
Floating Point word sets provide more words.

This Forth has words to open, flush, and close files, to tell the current 
position in a file, and to seek to a given position in a file.  
These are interfaces to the Standard C Library functions.

The names are the same as the Standard C Library functions, and the 
arguments are the same, with Forth character strings replacing 
NUL-terminated strings.  The Low Level Forth definitions do nothing more 
than call the function and convert arguments between Forth and C format.

The words are FOPEN, FFLUSH, FCLOSE, FTELL, and FSEEK.  The value 
returned by a Standard C Library function is left on the data stack.

In C you can often ignore the return value; in Forth you must do 
something with it because it's on the stack.  So in practice the 
primitive words are seldom used.  Instead words that check for an error 
are used.  The names are all past participles indicating that the action 
has been done.

	: OPENED ( filename . mode . -- file-id )
	        FOPEN DUP 0= ABORT" Can't open "
	;
	: FLUSHED ( file-id | 0 -- ) FFLUSH ABORT" Can't flush. " ;
	: CLOSED ( file-id -- ) FCLOSE ABORT" Can't close. " ;
	: TOLD ( file-id -- offset ) FTELL DUP 0< ABORT" Can't tell. " ;
	: SOUGHT ( file-id offset whence -- ) FSEEK ABORT" Can't seek. " ;
	: INPUT ( filename . -- file-id ) S" r" OPENED ;
	: OUTPUT ( filename . -- file-id ) S" w" OPENED ;
  	
Low Level Forth:

	Execution(`GET-CHAR')  push char();       Done
	Execution(`STACK-CHAR') unchar(top), pop; Done
	Execution(`NEXT-CHAR')
		push char(); if (top != EOF) unchar(top) ;
	Done

GET-CHAR gets a character from the input source and removes it from
the input source.

STACK-CHAR inserts a character into the input source.

NEXT-CHAR gets a character from the input source but does not remove it.

Use GET-CHAR, GET-WORD (or BL WORD COUNT), and GET-LINE (or EOL PARSE) to 
get a character, a word, or a line from the input source.
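In C terms, the three primitives can be modelled as a pushback stack in 
front of the current input file, so that any number of characters can be 
returned (Standard C `ungetc' guarantees only one).  This is a sketch under 
that assumption, not the kernel's `char()' and `unchar()'; a null `input' 
stands for standard input.

```c
#include <stdio.h>

#define PUSHBACK_MAX 256

static FILE *input;                   /* NULL means standard input */
static int pushback[PUSHBACK_MAX];
static int npushed = 0;

int get_char(void)                    /* GET-CHAR */
{
    if (npushed > 0)
        return pushback[--npushed];
    return getc(input ? input : stdin);
}

void stack_char(int c)                /* STACK-CHAR */
{
    if (c != EOF && npushed < PUSHBACK_MAX)
        pushback[npushed++] = c;
}

int next_char(void)                   /* NEXT-CHAR: get it, then put it back */
{
    int c = get_char();
    if (c != EOF)
        stack_char(c);
    return c;
}
```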

REASSIGNING INPUT/OUTPUT
------------------------

First get `file-id' by using INPUT, OUTPUT, OPENED, or FOPEN.

	STREAM ( file-id | 0 -- )

		Save the file identity of the input device on the file stack,
		and replace it with `file-id'.

		0 is used to identify standard input.

	UNSTREAM ( -- )

		Restore the previous file identity of the input device from
		the file stack.

	DISPLAY ( file-id | 0 -- )

		Set the file identity of the output device to `file-id'.

		0 is used to identify standard output.
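A minimal C sketch of these three words, assuming file-ids are `FILE *' 
values and using a null pointer for 0.  The file stack depth and all the 
function names are illustrative, not taken from the kernel.

```c
#include <stdio.h>

#define FILE_STACK_MAX 16

static FILE *in_dev = NULL, *out_dev = NULL;   /* NULL stands for 0 */
static FILE *file_stack[FILE_STACK_MAX];
static int fsp = 0;

static FILE *current_in(void)  { return in_dev ? in_dev : stdin; }
static FILE *current_out(void) { return out_dev ? out_dev : stdout; }

int stream(FILE *fid)                  /* STREAM ( file-id | 0 -- ) */
{
    if (fsp == FILE_STACK_MAX)
        return -1;
    file_stack[fsp++] = in_dev;        /* save the current input device */
    in_dev = fid;
    return 0;
}

int unstream(void)                     /* UNSTREAM ( -- ) */
{
    if (fsp == 0)
        return -1;
    in_dev = file_stack[--fsp];        /* restore the previous one */
    return 0;
}

void display(FILE *fid)                /* DISPLAY ( file-id | 0 -- ) */
{
    out_dev = fid;
}
```

Because STREAM saves rather than overwrites, a file read in the middle of 
another resumes the outer one when UNSTREAM runs.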

PINHOLE OPTIMIZATION
--------------------

This Forth uses peephole optimization when compiling to reduce common 
two-instruction and three-instruction sequences to one anonymous primitive 
Forth instruction.  Because of this narrow window it's called `pinhole' 
rather than `peephole'.  An instruction compiled afterwards may combine with 
the result for further optimization.

        In the examples, single lowercase letters other than `b' are
        used indiscriminately for variable names, short literals, and
        short constants.  `b' is used for a literal or constant that
        is a power of 2.  Sequences of lowercase letters are used for
        arbitrary strings.

	Some examples of single Forth instructions.

	n +    x + n +    x n +    b *    x b *    x b * n +    b MOD
	x @    x !    x +!    x n + @    x n + !    n OVER =    DUP n =
	SWAP -    n SWAP    ROT ROT   n CELLS    n CELLS +    
	x n CELLS +    x n CELLS + @	1 +LOOP    1 CHARS +LOOP
	['] x >BODY    ['] x >BODY @    ['] x >BODY !    TO x
	['] x >BODY n CELLS + @    n PICK    n <    n =
	n ALIGNED   x n AND   x n OR   x n XOR   n LSHIFT   n RSHIFT
	x n LSHIFT    x n RSHIFT

	C" aaa" COUNT   FALSE UNTIL

	DUP IF    ?DUP IF    0= IF    0<> IF
	DUP 0= IF    ?DUP 0= IF    0= 0= IF

	The optimizations with IF also apply to UNTIL, WHILE, and the
	test in ABORT"

	BEGIN 1- ?DUP 0= UNTIL compiles to two words, the same as
	BEGIN 5 - CELL+ ?DUP FALSE = UNTIL

        Sometimes everything gets optimized away.  This can happen
        across macros and with constants or expressions.
		
	0 +   0 -   1 *   1 /   TRUE UNTIL   ROT ROT ROT
	
	The body of the following definition is 5 Forth instructions.
		
        : >UPPER    DUP  [CHAR] a -  26 U< IF   [CHAR] a - [CHAR] A +   THEN ;
                    ---  ----------  ----- --   --------------------- 
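The idea can be sketched in C as a compile routine that looks at the 
instruction just laid down before laying down a new one.  The opcodes LIT, 
ADD, and ADD_LIT and the rules below are hypothetical, chosen to show `n +' 
collapsing to one instruction, `0 +' disappearing, and a later instruction 
folding into an earlier result.

```c
#include <stddef.h>

/* Hypothetical instruction set for this sketch only. */
enum opcode { LIT, ADD, ADD_LIT };

struct insn { enum opcode op; long arg; };

#define CODE_MAX 256
static struct insn code[CODE_MAX];
static size_t ncode = 0;                  /* HERE, counted in instructions */

/* Compile one instruction, first trying to merge it with the one before. */
void compile(enum opcode op, long arg)
{
    struct insn *prev = ncode > 0 ? &code[ncode - 1] : NULL;

    if (op == ADD && prev && prev->op == LIT) {        /* n +    -> +n      */
        long n = prev->arg;
        ncode--;
        compile(ADD_LIT, n);                           /* may merge again   */
        return;
    }
    if (op == ADD_LIT && prev && prev->op == LIT) {    /* a +b   -> (a+b)   */
        prev->arg += arg;
        return;
    }
    if (op == ADD_LIT && prev && prev->op == ADD_LIT) {/* +a +b  -> +(a+b)  */
        long n = prev->arg + arg;
        ncode--;
        compile(ADD_LIT, n);
        return;
    }
    if (op == ADD_LIT && arg == 0)                     /* 0 +    -> nothing */
        return;
    if (ncode < CODE_MAX)
        code[ncode++] = (struct insn){ op, arg };
}
```

Compiling `5 + 3 +' leaves a single ADD_LIT 8, and `0 +' leaves nothing.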

SELF-DEFINING WORDS
-------------------

As a by-product of optimization, certain required words do not appear in 
the dictionary, but are nonetheless recognized when encountered.

	0<     0=     0>     1+     1-     2*
	
What is happening is that when the text interpreter does not find a word in 
the dictionary it tries to convert it to a number.  If successful it treats 
the number as a literal.  In This Forth if it is partially successful and 
the next character is one of +-*/,<=> it accepts what it has converted and 
inserts the rest of the word back into the input source.

Thus the text interpreter sees the "words" above as

	0 <    0 =    0 >    1 +    1 -    2 *

Optimization then makes single anonymous words out of them.
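The fallback can be sketched in C.  `split_self_defining()' is a 
hypothetical helper: it converts leading digits in the current base and 
succeeds only when conversion stops at one of the characters + - * / , < = >, 
returning the number and the text to push back into the input source.  
Accepting only uppercase digits above 9 is a simplification.

```c
#include <string.h>

/* Hypothetical helper. Converts leading digits of `word' in `base'
   (an optional leading '-' is allowed). Succeeds only if conversion
   stops early at one of + - * / , < = > ; then *number gets the value
   and *rest points at the text to push back into the input source. */
int split_self_defining(const char *word, int base,
                        long *number, const char **rest)
{
    const char *p = word;
    int negative = (p[0] == '-' && p[1] != '\0');
    long n = 0;

    if (negative)
        p++;
    const char *digits = p;
    for (;; p++) {
        int d;
        if (*p >= '0' && *p <= '9')      d = *p - '0';
        else if (*p >= 'A' && *p <= 'Z') d = *p - 'A' + 10;
        else break;
        if (d >= base) break;
        n = n * base + d;
    }
    if (p == digits || *p == '\0')
        return 0;              /* not a number at all, or a whole number */
    if (strchr("+-*/,<=>", *p) == NULL)
        return 0;              /* e.g. 5x is just an unknown word */
    *number = negative ? -n : n;
    *rest = p;
    return 1;
}
```

So `0<' yields the literal 0 followed by the word `<', while `42' is left to 
the ordinary number conversion.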

`b *', where b is a power of 2, becomes a single shift instruction.

Given `: K PLEASE " 5 LSHIFT 5 LSHIFT " ; IMMEDIATE' ( Used in hex or decimal. )

`2 K CELLS 1-' becomes the literal 8191, and `2 K CELLS *' becomes the one 
primitive Forth instruction `13 LSHIFT'.
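The arithmetic behind that example, assuming 4-byte cells: `2 K' is 2 
shifted left by 10 bits, or 2048; CELLS gives 8192, which is 2^13; `1-' 
leaves 8191.  A C sketch of the test a compiler might apply before replacing 
`b *' with a shift (`log2_exact()' is a hypothetical name):

```c
/* Return k if b == 2^k, else -1. A power of two has exactly one bit set,
   so b & (b - 1) clears that bit and leaves 0. */
int log2_exact(unsigned long b)
{
    int k = 0;
    if (b == 0 || (b & (b - 1)) != 0)
        return -1;
    while (b > 1) {
        b >>= 1;
        k++;
    }
    return k;
}
```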

SELF-COMPILING WORDS
--------------------

The Standard allows lowercase names and interpretation-time conditionals in 
a Standard System.  Just don't use them in a program you want to run on all 
other Standard Systems.

Mitch Bradley has shown the right interpretation behavior for most 
interpreted structures.

Some compilation-only words, that is, words with ambiguous execution 
behavior, can resolve the ambiguity by temporarily entering compiling 
state.  When the structure that required compiling has been completed, 
interpreting state is entered, and what was just compiled is executed 
and then discarded.

The temporary code should be compiled where it won't interfere with HERE.

Such words are

	IF   BEGIN   DO   ?DO   CASE    ."    ABORT"    

This is such a simple thing to implement there is no excuse for a 
system not to do it.

CODESPACE NAMESPACE DATASPACE
-----------------------------

In This Forth dataspace and namespace are separate from codespace.
Namespace is also used for text-literals.

Strings in namespace are unique.

	: hi ." hello" ;
	: hello ." howdy" ;
	: hi S" howdy" ;
	: howdy ABORT" hi" ;
	
There will be only one string each for hi, hello, and howdy.
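A C sketch of how such sharing could work, assuming namespace is a byte 
array of counted strings and using a plain linear search (a real system 
might hash).  `intern()' is a hypothetical name: it returns the offset of an 
identical string already present, or appends a new one.

```c
#include <string.h>

#define NAMESPACE_MAX 4096

/* Namespace as counted strings packed end to end (an assumption). */
static unsigned char names[NAMESPACE_MAX];
static size_t names_used = 0;

/* Return the offset of an identical string already in namespace, or
   append a new counted string and return its offset; -1 if it won't fit. */
long intern(const char *s, size_t len)
{
    if (len > 255)
        return -1;                          /* count must fit in one byte */
    for (size_t off = 0; off < names_used; off += 1 + names[off]) {
        if (names[off] == len && memcmp(names + off + 1, s, len) == 0)
            return (long)off;               /* share the existing copy */
    }
    if (names_used + 1 + len > NAMESPACE_MAX)
        return -1;
    long off = (long)names_used;
    names[names_used] = (unsigned char)len;
    memcpy(names + names_used + 1, s, len);
    names_used += 1 + len;
    return off;
}
```

Interning the eight names and literals in the example stores just three 
strings.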

Programs in This Forth can exploit the separation of dataspace and codespace.  
See checques.fo, crc.fo, and the definition of VALUE.

DICTIONARY STRUCTURE
--------------------

In dataspace and namespace the address unit is an unsigned character.

In codespace it is an `instruction'.  As provided here it's a short 
integer.

The structure for a definition is:

	The relative location in codespace of the previous definition.
	
	The relative address in namespace of the name of the definition.
	
	The primitive code value for the immediate behavior for the definition.
	
	Optionally (but for almost all definitions) the primitive code values 
	for the execution behavior or behaviors of the definition.

There is no immediate bit.  The text interpreter is ignorant and 
apathetic about whether a word is immediate.  Also, the text interpreter 
doesn't know and doesn't care whether the compilation state is compiling 
or interpreting.
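A C sketch of that layout, with codespace as an array of `short' 
instructions.  The exact encodings are assumptions for illustration: the link 
is the distance back to the previous header (0 for none), the name field is 
a plain byte offset into a separate namespace of counted strings, and 
`create()' and `find()' are hypothetical names.

```c
#include <string.h>

typedef short instr;                    /* one codespace address unit */

#define CODE_MAX  1024
#define NAMES_MAX 1024

static instr codespace[CODE_MAX];
static int here = 1;                    /* index 0 is reserved to mean "none" */
static int latest = 0;                  /* header of the newest definition */

static unsigned char names[NAMES_MAX];  /* namespace: counted strings */
static int names_here = 0;

/* Lay down a header: link, name, immediate-behavior code. Execution
   codes, if any, would follow at `here'. Returns the header or 0. */
int create(const char *name, instr immediate_code)
{
    size_t len = strlen(name);
    if (len > 255 || here + 3 > CODE_MAX || names_here + 1 + (int)len > NAMES_MAX)
        return 0;
    int hdr = here;
    names[names_here] = (unsigned char)len;
    memcpy(names + names_here + 1, name, len);
    codespace[here++] = latest ? (instr)(hdr - latest) : 0;  /* relative link */
    codespace[here++] = (instr)names_here;                   /* name offset   */
    codespace[here++] = immediate_code;      /* no immediate bit, just code */
    names_here += 1 + (int)len;
    latest = hdr;
    return hdr;
}

/* Search from the newest definition back to the oldest. */
int find(const char *name)
{
    size_t len = strlen(name);
    int h = latest;
    while (h != 0) {
        const unsigned char *n = names + codespace[h + 1];
        if ((size_t)n[0] == len && memcmp(n + 1, name, len) == 0)
            return h;
        h = codespace[h] ? h - codespace[h] : 0;
    }
    return 0;
}
```

A redefinition shadows the older header because the search starts at the 
newest one.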

NOT AINT STANDARD
-----------------

The First Law of Forth is:

	If you don't like something, change it.

This law has been revoked by Standard Forth, but the Second Law is still
in effect.

The Second Law of Forth is:

	Anything controversial is unnecessary.

Unnecessary is not a bad thing.  I wouldn't give up `+' because it can be 
expressed with `0' and `-'.  Macros by definition are unnecessary.

The most controversial word in Forth is NOT.  It is so controversial that 
it had to be left out of the Standard.  You are expected to say whether it 
is definable as `: NOT 0= ;' or `: NOT INVERT ;' in every Standard Program 
that uses it, or whether it doesn't matter.  In This Forth it matters.

In This Forth the definition is `: NOT PLEASE "0 = " ; IMMEDIATE'.

This is so that optimization can take place.

Given `: S= PLEASE "COMPARE 0= " ; IMMEDIATE',
`S= NOT IF' becomes `COMPARE IF'.  So does `COMPARE 0<> IF'.

FILTERS
-------

I use This Forth as an alternative to SED, AWK, and PERL, especially away 
from Unix.

I find it faster to write filters in This Forth than in those other
languages.

Here is a fast-hack filter to convert my coding style to the LOUD form
required to be portable Standard.

( Change everything to uppercase,
  except after [CHAR], [char], \, or between quotes or parentheses. )

: STANDARDIZE                           ( file-id -- )
        filter                                  ( )
                case
                        next-char graph? not
                if					( Invisible char )
                        get-char emit
                else
                        get-word              ( word .)

                        2dup s" [char]" s=
                        orif 2dup s" [CHAR]" s= then
                if					( [char] or [CHAR] )
                        2drop ." [CHAR] "
                        get-word type
                else
                        2dup s" (" s=
                if					( Comment )
                        2drop ." ( "
                        [char] ) parse type
                        ." )"
                else
                        2dup s" \" s=
                if					( Backslash )
                        2drop ." \ "
                        get-line type
                        eol stack-char
                else
                        please "~"              ( )
                ( Re-scan word char by char. )
                        next-char [char] " =
                if					( Quote )
                        get-char emit
                        [char] " parse type
                        [char] " emit
                else					( Anything else )
                        get-char >upper emit
                esac
        unfilter
;

Of the components used above, FILTER, NOT, S", [CHAR], .", ORIF, S=, and 
UNFILTER are macros defined in the high level Forth part of the kernel.

: FILTER    please "stream begin next-char eof <> while " ; immediate
: UNFILTER  please "repeat source-id unstream rewind " ; immediate

: ORIF      please "?dup 0= if " ; immediate
: [CHAR]    ~PLEASE "[ CHAR ~ ] LITERAL " ; IMMEDIATE
: S"        [CHAR] " PARSE PLEASE 'c" ~" count ' ; IMMEDIATE 
: ."        [CHAR] " PARSE PLEASE 'c" ~" count type ' ; IMMEDIATE

In the last group, each compiles as one primitive Forth instruction.

(This filter has been expanded in file `standize.fo'.)
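For comparison, a simplified C version of the same filter working on an 
in-memory string.  It is a sketch, not a port: it omits the [CHAR] rule, 
treats `(' and `\' as comments only when they stand alone as words, and 
copies quoted text unchanged.  It uses the same unsigned range test as 
>UPPER.  `standardize()' is a hypothetical name, and since the output is 
never longer than the input, `out' needs room for strlen(in) + 1 bytes.

```c
#include <string.h>

/* Uppercase a-z with the unsigned range test used by >UPPER:
   (unsigned)(c - 'a') < 26 is true only for 'a'..'z'. */
static int to_upper(int c)
{
    return (unsigned)(c - 'a') < 26u ? c - 'a' + 'A' : c;
}

static int visible(int c) { return c > ' ' && c != 127; }

/* Copy from in[i] through the first `stop' (inclusive) or the end of the
   string; returns the index just past what was copied. */
static size_t copy_through(const char *in, size_t i, char stop,
                           char *out, size_t *o)
{
    while (in[i] != '\0') {
        char c = in[i++];
        out[(*o)++] = c;
        if (c == stop)
            break;
    }
    return i;
}

/* Simplified STANDARDIZE over a string. */
void standardize(const char *in, char *out)
{
    size_t i = 0, o = 0;
    while (in[i] != '\0') {
        unsigned char c = (unsigned char)in[i];
        if (!visible(c)) {                      /* invisible: copy as is */
            out[o++] = in[i++];
            continue;
        }
        size_t end = i;                         /* find the end of the word */
        while (in[end] != '\0' && visible((unsigned char)in[end]))
            end++;
        if (end - i == 1 && c == '(') {         /* comment: copy through ) */
            i = copy_through(in, i, ')', out, &o);
        } else if (end - i == 1 && c == '\\') { /* comment to end of line */
            i = copy_through(in, i, '\n', out, &o);
        } else {                                /* re-scan char by char */
            while (i < end) {
                if (in[i] == '"') {             /* quoted text: unchanged */
                    out[o++] = in[i++];
                    i = copy_through(in, i, '"', out, &o);
                    break;                      /* resume after the quote */
                }
                out[o++] = (char)to_upper((unsigned char)in[i++]);
            }
        }
    }
    out[o] = '\0';
}
```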

DISTRIBUTION
------------

I'm giving This Forth away.  If you do anything great with it, remember me.

I'm calling it `This Forth Beta' to let you tell me the things I messed up, 
so I can fix them.

I hope that this is enough for you to understand how to use it.

I'm writing a book going into detail about whys, wherefores, and how comes.

Procedamus in pace -- Wil       wilbaden@netcom.com

ACKNOWLEDGMENTS
---------------

Thanks to:

Jocelyn Baden, Sean Barrett, Neil Bawd, Mitch Bradley, Dorothy Bullard, 
Skip Carter, Loring Craymer, Chris Heilmann, Andrew Mckewan, Elizabeth 
Rather, Robert Reiling, C. H. Ting, Kiyoshi Yoneda, Tom Zimmer, for 
inspiration and such.

Ozan Yigit and Richard A. O'Rourke, for PD m4.

