This file describes how to use the collector interface to the Gatherer
and how to run the gatherd daemon.

Last-Modified: 1995/03/24 16:55:54

		Configuration File for Access Control
                -------------------------------------

Gatherd supports a simple configuration file for access control.  The
configuration file has two directives, "Allow" and "Deny".  A line that
starts with "Allow" is followed by any number of domain or host names
that are allowed to connect to gatherd.  If the word "all" is used, then
all hosts are allowed to connect to gatherd (the default).  "Deny" is the
opposite of "Allow".  For example, this configuration

	Allow	cs.colorado.edu ftp.wustl.edu
	Deny	all	

allows only hosts in the cs.colorado.edu domain and the host
ftp.wustl.edu to connect to gatherd.
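
The rules above can be sketched in a few lines of Python.  This is only
an illustration, not gatherd's own code: the function names are made up,
and it assumes that the first matching rule decides and that a name
matches either the exact host or any host inside that domain.

```python
def parse_access_rules(text):
    """Return a list of (directive, [names]) pairs from Allow/Deny lines."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("Allow", "Deny"):
            rules.append((fields[0], fields[1:]))
    return rules


def matches(host, name):
    """True if name is 'all', the host itself, or a domain the host is in."""
    host, name = host.lower(), name.lower()
    return name == "all" or host == name or host.endswith("." + name)


def is_allowed(host, rules):
    # Assumption: the first rule that matches decides.  With no match the
    # host is allowed, since allowing all hosts is described as the default.
    for directive, names in rules:
        if any(matches(host, n) for n in names):
            return directive == "Allow"
    return True


rules = parse_access_rules("Allow\tcs.colorado.edu ftp.wustl.edu\nDeny\tall\n")
print(is_allowed("powell.cs.colorado.edu", rules))  # True
print(is_allowed("example.com", rules))             # False
```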

You may also add the 'Gzip' tag to gatherd.cf to specify the full path to gzip:

        Gzip /usr/local/bin/gzip

Or simply add the directory that contains gzip to your PATH before
running gatherd.

		  The Collector/Gatherer Protocol
		  -------------------------------

First, you must have a description of a Gatherer.  The description
is, for now, a SOIF template that looks like this:

    @GATHERER { http://rd.cs.colorado.edu/~hardy/www-home-pages-gatherer.soif
    Gatherer-Host{22}:	powell.cs.colorado.edu
    Gatherer-Name{39}:	Global - Selected Text - WWW Home Pages
    Gatherer-Port{4}:	1171
    Gatherer-Version{3}:	0.1
    Last-Modification-Time{9}:	772263506
    Refresh-Rate{6}:	604800
    Time-to-Live{7}:	2419200
    Update-Time{9}:	772263510
    }
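
A client needs the Gatherer-Host and Gatherer-Port values from such a
description.  The following Python sketch reads one template, using the
{n} count after each attribute name as the length of the value.  The
name parse_soif is hypothetical, and the sketch assumes one separator
character (tab or space) after the colon and ASCII values, so that
characters and bytes coincide.

```python
import re

# Attribute header such as "Gatherer-Port{4}:" followed by one separator.
ATTR = re.compile(r"\s*([^\s{}]+)\{(\d+)\}:[ \t]")


def parse_soif(text):
    """Parse one SOIF template into (template_type, url, attributes)."""
    head = re.match(r"\s*@(\S+)\s*\{\s*(\S*)[ \t]*\n", text)
    if head is None:
        raise ValueError("not a SOIF template")
    template_type, url = head.group(1), head.group(2)
    attrs, pos = {}, head.end()
    while True:
        m = ATTR.match(text, pos)
        if m is None:
            break  # reached the closing "}" (or malformed input)
        size = int(m.group(2))
        # The {n} count says how long the value after the separator is.
        attrs[m.group(1)] = text[m.end():m.end() + size]
        pos = m.end() + size
    return template_type, url, attrs


sample = (
    "@GATHERER { http://rd.cs.colorado.edu/~hardy/www-home-pages-gatherer.soif\n"
    "Gatherer-Host{22}:\tpowell.cs.colorado.edu\n"
    "Gatherer-Port{4}:\t1171\n"
    "}\n"
)
kind, url, attrs = parse_soif(sample)
print(kind, attrs["Gatherer-Host"], int(attrs["Gatherer-Port"]))
```

Reading values by length rather than by line means a value that itself
contains newlines would still be read whole.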

The protocol supports the following commands:

	HELLO <hostname>         - Friendly Greeting
	HELP                     - This message
	SEND-OBJECT <oid>        - Send an Object Description
	SEND-UPDATE <timestamp>  - Send all Object Descriptions that
				   have been changed/created since timestamp
	SET compression          - Enable GNU zip compressed transfers
	QUIT                     - Close session


For example, to retrieve all of the gatherer's files, the session would
look like this (a simple implementation is included as gather.c):

	HELLO client.host.name
	SEND-UPDATE 0
	QUIT

To retrieve all of the templates that have been changed/created in the
last week, the session would look like this (where
t == time(NULL) - (1 * WEEK), and WEEK = (60 * 60 * 24 * 7)):

	HELLO client.host.name
	SEND-UPDATE t
	QUIT
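
The timestamp arithmetic and the three commands of such a session can be
written out as follows.  This is a small Python sketch, separate from
the gather.c client mentioned above; update_session is a hypothetical
helper that only builds the command lines and does not open a
connection.

```python
import time

WEEK = 60 * 60 * 24 * 7  # seconds in one week


def update_session(client_host, since):
    """Return the command lines for one HELLO / SEND-UPDATE / QUIT session."""
    return [
        "HELLO %s" % client_host,
        "SEND-UPDATE %d" % since,
        "QUIT",
    ]


# Everything changed/created in the last week; use 0 to retrieve everything.
t = int(time.time()) - 1 * WEEK
for line in update_session("client.host.name", t):
    print(line)
```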

The protocol has an NNTP flavor; you can 'telnet localhost 1171' and play
around interactively, a la NNTP or SMTP.  The server sends a welcome
message when the connection opens.  Then, after each command that the
client submits, the server sends an error/success message.  The first 3
characters of each message contain one of the following codes.  If the
client doesn't send a command within 5 minutes or so, the connection
times out.

Protocol Error Codes:

	000 Successful Greeting sent by server
	001 Unknown command
	002 Unimplemented command
	003 Access Denied
	100 Successful HELLO command
	101 Invalid usage of HELLO command
	102 DNS name & given name don't match in HELLO command
	200 Successful HELP command
	300 Successful SEND-OBJECT command
	400 Successful SEND-UPDATE command
	401 Invalid usage of SEND-UPDATE command
	499 End of SEND-UPDATE output
	999 Goodbye
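
Because the code is always the first 3 characters, a client can classify
any reply line with a table lookup.  The sketch below copies the table
above into a Python dictionary; the names REPLY_CODES and parse_reply
are made up for this example.

```python
REPLY_CODES = {
    "000": "Successful Greeting sent by server",
    "001": "Unknown command",
    "002": "Unimplemented command",
    "003": "Access Denied",
    "100": "Successful HELLO command",
    "101": "Invalid usage of HELLO command",
    "102": "DNS name & given name don't match in HELLO command",
    "200": "Successful HELP command",
    "300": "Successful SEND-OBJECT command",
    "400": "Successful SEND-UPDATE command",
    "401": "Invalid usage of SEND-UPDATE command",
    "499": "End of SEND-UPDATE output",
    "999": "Goodbye",
}


def parse_reply(line):
    """Split a reply line into (code, meaning of the code, remaining text)."""
    code = line[:3]
    if code not in REPLY_CODES:
        raise ValueError("unrecognized reply: %r" % line)
    # Drop the " - " separator shown in the sample replies, if present.
    text = line[3:].lstrip(" -").rstrip()
    return code, REPLY_CODES[code], text


print(parse_reply("499 - Sent 3 Object Descriptions"))
```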


The WELCOME message

The welcome message contains information about the protocol version.  The
welcome message that the server sends is in the following format:

	000 - HELLO <version> <server host> - are you <client host>?

like this:

	000 - HELLO 0.1 powell.cs.colorado.edu - are you burton.cs.colorado.edu?
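
A client that wants to check the protocol version can pick the welcome
line apart with a regular expression.  This Python sketch assumes the
line has exactly the layout shown above; parse_welcome is a hypothetical
name.

```python
import re

WELCOME = re.compile(r"^000 - HELLO (\S+) (\S+) - are you (\S+)\?\s*$")


def parse_welcome(line):
    """Return (version, server_host, client_host) from the welcome line."""
    m = WELCOME.match(line)
    if m is None:
        raise ValueError("unexpected welcome message: %r" % line)
    return m.group(1), m.group(2), m.group(3)


print(parse_welcome(
    "000 - HELLO 0.1 powell.cs.colorado.edu - are you burton.cs.colorado.edu?"
))
```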


The SEND-UPDATE command

The output of the SEND-UPDATE command looks like this:

	400 - Sending all Object Descriptions since 0
	@DELETE { }
	@REFRESH { }
	@UPDATE { 
	@DOCUMENT { /* template for object 1 */ }
	@DOCUMENT { /* template for object 2 */ }
	@DOCUMENT { /* template for object 3 */ }
	...
	@DOCUMENT { /* template for object n */ }
	}
	499 - Sent n Object Descriptions

Currently, only the @UPDATE section is implemented.
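
A client can collect the uncompressed output by reading lines from the
400 reply until the 499 reply.  The Python sketch below works on any
iterable of lines (for example a file object wrapped around the socket).
The name read_update is hypothetical, and the sketch assumes that no
line inside a template starts with "499".

```python
def read_update(lines):
    """Return (status line, template lines, trailer line) for one reply."""
    it = iter(lines)
    status = next(it, "").rstrip("\r\n")
    if not status.startswith("400"):
        raise ValueError("SEND-UPDATE failed: %r" % status)
    body = []
    for line in it:
        line = line.rstrip("\r\n")
        if line.startswith("499"):
            return status, body, line
        body.append(line)
    raise ValueError("connection ended before the 499 line")


reply = [
    "400 - Sending all Object Descriptions since 0\n",
    "@DELETE { }\n",
    "@REFRESH { }\n",
    "@UPDATE { \n",
    "}\n",
    "499 - Sent 0 Object Descriptions\n",
]
status, body, trailer = read_update(reply)
print(len(body), trailer)
```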

Some simple compression support has been added.  The 'SET' command puts
the server in compression mode.  After the server receives the command
'SET COMPRESSION', all SEND-UPDATE responses are compressed using GNU zip
(gzip).  However, when all of the data for the SEND-UPDATE answer has
been sent, the socket is closed.  This is so that the client doesn't need
to look for an end-of-transmission message (normally the 499 response).
So when the SET COMPRESSION command has been issued, the output of the
SEND-UPDATE command looks like this:

	400 - Sending all Object Descriptions since 0
	...GNU zip'ed data here...
	...socket closed at end-of-transmission...
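
Since the socket is closed at the end, a client in compression mode can
read everything up to end-of-file and then split off the 400 line.  The
Python sketch below does the split and the decompression.  It assumes
the 400 line ends with a newline and that the gzip stream starts
immediately after it; split_compressed_update is a hypothetical name.

```python
import gzip


def split_compressed_update(raw):
    """Split the bytes read up to EOF into (status line, decompressed data)."""
    status, sep, payload = raw.partition(b"\n")
    if not sep or not status.startswith(b"400"):
        raise ValueError("SEND-UPDATE failed: %r" % status)
    # Assumption: everything after the first newline is one gzip stream.
    return status.decode("ascii").rstrip("\r"), gzip.decompress(payload)


# Simulated server output, built locally so the example runs without gatherd.
raw = (b"400 - Sending all Object Descriptions since 0\n"
       + gzip.compress(b"@UPDATE { \n}\n"))
status, data = split_compressed_update(raw)
print(status)
print(data.decode("ascii"))
```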

The server can use faster index files to serve the templates.  When given
the original database, mkindex will build an index for gatherd to use to
serve the templates faster.  The server will also support a cache for
sending all templates in compressed mode; use the mkcompressed command to
build this cache.

Version 0.3.x of the interface packetizes the gzip data so that the
connection doesn't have to be closed at end-of-transmission.  This code
hasn't been integrated yet.  Also, the @DELETE and @REFRESH sections have
not been implemented yet.

-Darren Hardy, July 1994
