Notes about the Harvest Broker code, intended for developers.

----------------------------------------------------------------------
Registry file format (stor_reg.c):

It's one file with a record-based format.  A record looks like:

	4 bytes in network-byte order for record size
	4 bytes in network-byte order for magic number
	4 bytes in network-byte order for record flag
	4 bytes in network-byte order for URL length
	n bytes of the URL
	4 bytes in network-byte order for Gatherer Name length
	n bytes of the Gatherer Name
	[... and so on for other ASCII fields (empty fields have length 0)...]
	4 bytes in network-byte order for number 1
	4 bytes in network-byte order for number 2
	[... and so on for other numeric fields ...]
	[...end of record of record-size bytes...]

The record header for each record includes:

	4 bytes in network-byte order for a record size
	4 bytes in network-byte order for a magic number
	4 bytes in network-byte order for a flag 

The flag (an unsigned int) marks a record as valid or deleted; other
flag values may be added in the future.

With this format, the broker issues 2 read() calls per record: the
first to get the record size, the second to read the remaining n bytes
of the record.  The broker code then checks the magic number, examines
the flag, and (if needed) parses the record.  This keeps the number of
system calls down.
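
The two-read scheme can be sketched roughly as follows.  This is only an
illustration, not the code in stor_reg.c: REG_MAGIC, REG_FLAG_DELETED,
REG_MAX_RECORD, read_full() and reg_read_record() are hypothetical names and
values, and the sketch assumes that the record size counts the bytes that
follow the size field (magic number, flag and fields).

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REG_MAGIC        0x48525631u  /* hypothetical magic number */
#define REG_FLAG_DELETED 0x1u         /* hypothetical "deleted" bit */
#define REG_MAX_RECORD   (1u << 20)   /* hypothetical sanity limit */

/* Read exactly n bytes; 0 on success, -1 on EOF or error.
 * On a regular file this is normally a single read() call. */
static int read_full(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Decode a 4-byte integer stored in network byte order. */
static uint32_t get_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return ntohl(v);
}

/* Read one record.  Returns a malloc'd buffer holding everything after
 * the size field (magic, flag, fields) and sets *len and *flag, or
 * returns NULL on EOF, I/O error or a bad magic number. */
unsigned char *reg_read_record(int fd, uint32_t *len, uint32_t *flag)
{
    unsigned char szbuf[4];
    unsigned char *body;
    uint32_t size;

    assert(len != NULL && flag != NULL);
    if (read_full(fd, szbuf, sizeof szbuf) < 0)     /* read #1: size */
        return NULL;
    size = get_u32(szbuf);
    if (size < 8 || size > REG_MAX_RECORD)          /* needs magic + flag */
        return NULL;
    body = malloc(size);
    if (body == NULL)
        return NULL;
    if (read_full(fd, body, size) < 0               /* read #2: the rest */
        || get_u32(body) != REG_MAGIC) {
        free(body);
        return NULL;
    }
    *flag = get_u32(body + 4);
    *len = size;
    return body;
}
```

A caller would test the returned flag against REG_FLAG_DELETED and only then
walk the length-prefixed fields that start at offset 8 of the buffer.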

We might also want a header to the registry file that includes:

	4 bytes in network-byte order for a magic number
	4 bytes in network-byte order for a version number

and maybe some other things like:

	4 bytes in network-byte order for the number of records
	4 bytes in network-byte order for the number of deleted records
	4 bytes in network-byte order for the number of valid records

The version number would tell us exactly how many ASCII fields and numeric
fields each record has.  The record stats would help to decide when to
garbage collect the registry file, but they would need to be updated
continually.
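
As a sketch of what such a 20-byte header could look like in code (the struct
and function names are hypothetical, and the check that the record count
equals deleted plus valid is an assumption about how the three counters
relate, not something the format defines):

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_HDR_SIZE 20  /* five 4-byte fields */

struct reg_header {      /* hypothetical in-memory form */
    uint32_t magic;
    uint32_t version;
    uint32_t nrecords;
    uint32_t ndeleted;
    uint32_t nvalid;
};

/* Serialize the header into 20 bytes, each field in network byte order. */
void reg_header_pack(const struct reg_header *h,
                     unsigned char out[REG_HDR_SIZE])
{
    uint32_t w[5];

    assert(h != NULL && out != NULL);
    w[0] = htonl(h->magic);
    w[1] = htonl(h->version);
    w[2] = htonl(h->nrecords);
    w[3] = htonl(h->ndeleted);
    w[4] = htonl(h->nvalid);
    memcpy(out, w, REG_HDR_SIZE);
}

/* Parse a 20-byte header.  Returns 0 on success, -1 if the counters
 * are inconsistent (assumed rule: nrecords == ndeleted + nvalid). */
int reg_header_unpack(const unsigned char in[REG_HDR_SIZE],
                      struct reg_header *h)
{
    uint32_t w[5];

    assert(in != NULL && h != NULL);
    memcpy(w, in, REG_HDR_SIZE);
    h->magic    = ntohl(w[0]);
    h->version  = ntohl(w[1]);
    h->nrecords = ntohl(w[2]);
    h->ndeleted = ntohl(w[3]);
    h->nvalid   = ntohl(w[4]);
    return (h->nrecords == h->ndeleted + h->nvalid) ? 0 : -1;
}
```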

So, the whole file looks like:

	[registry header of 20 bytes]
	[record header of 12 bytes]     --------|
	[record data of n bytes]		|
	[...]					| n records
	[record header of 12 bytes]		|
	[record data of n bytes]	--------|

The problem with this format is garbage collection.  When you delete an entry,
you just set the flag in the record header to mark the record as deleted; an
updated entry is appended to the end as a new record.  Deleted records keep
taking up space in the file, so the Broker compresses the Registry every so
often.
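
A minimal sketch of the mark-then-compress idea, working on the record area
held in memory.  The function names and the REG_FLAG_DELETED bit are
hypothetical, and the sketch again assumes that the record size counts the
bytes after the size field, so the flag sits at offset 8 of each record.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_FLAG_DELETED 0x1u   /* hypothetical "deleted" bit */

static uint32_t get_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return ntohl(v);
}

static void put_u32(unsigned char *p, uint32_t v)
{
    v = htonl(v);
    memcpy(p, &v, sizeof v);
}

/* Deleting an entry only sets a bit in the record header. */
void reg_mark_deleted(unsigned char *rec)
{
    put_u32(rec + 8, get_u32(rec + 8) | REG_FLAG_DELETED);
}

/* Slide all valid records toward the front of buf, dropping deleted
 * ones.  Returns the new length of the record area.  A truncated
 * record at the tail ends the scan and is dropped. */
size_t reg_compact(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || len == 0);
    while (len - in >= 12) {                /* room for a record header */
        uint32_t size = get_u32(buf + in);
        size_t total;

        if (size < 8 || size > len - in - 4)
            break;
        total = 4 + (size_t)size;           /* size field + the rest */
        if (!(get_u32(buf + in + 8) & REG_FLAG_DELETED)) {
            memmove(buf + out, buf + in, total);
            out += total;
        }
        in += total;
    }
    return out;
}
```

A real compression pass would more likely stream from the old file into a new
one and rename it into place; the in-memory version just keeps the example
short.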

----------------------------------------------------------------------
Below are the valid Query manager flags to the indexers:

Common:		#desc				Show Description Lines
		#opaque				Force no matched lines

Glimpse:	#index case insensitive		Case Insensitive
		#index error number		Allow "number" errors
		#index matchword		Matches on word boundaries
		#index maxresult number		Allow max of "number" results

Wais:		#index maxresult number		Allow max of "number" results

----------------------------------------------------------------------

Each SOIF object in the Registry contains the following attributes:

	URL			MANDATORY
	Gatherer-Name		MANDATORY
	Gatherer-Host		MANDATORY
	Gatherer-Version	MANDATORY
	Update-Time		MANDATORY
	MD5			OPTIONAL
	Description		OPTIONAL

Two objects are the same if they both have the same:
	Gatherer-Name, Gatherer-Host, Gatherer-Version, Update-Time
and either the same URL or the same MD5.
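
The sameness rule can be written as a small predicate.  The struct below is a
hypothetical stand-in for the Broker's real object type, and treating two
objects that both lack an MD5 as "not the same MD5" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

struct reg_object {                 /* hypothetical, for illustration */
    const char *url;
    const char *gatherer_name;
    const char *gatherer_host;
    const char *gatherer_version;
    time_t      update_time;
    const char *md5;                /* NULL when the attribute is absent */
};

/* Returns 1 if a and b describe the same object, 0 otherwise. */
int reg_same_object(const struct reg_object *a, const struct reg_object *b)
{
    assert(a != NULL && b != NULL);

    /* All four Gatherer attributes must match... */
    if (strcmp(a->gatherer_name, b->gatherer_name) != 0
        || strcmp(a->gatherer_host, b->gatherer_host) != 0
        || strcmp(a->gatherer_version, b->gatherer_version) != 0
        || a->update_time != b->update_time)
        return 0;

    /* ...and then either the URL or the MD5 must match. */
    if (strcmp(a->url, b->url) == 0)
        return 1;
    return a->md5 != NULL && b->md5 != NULL && strcmp(a->md5, b->md5) == 0;
}
```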

----------------------------------------------------------------------
Running the Broker:

To start the Broker, type:

      % broker /your/broker.conf [-new | -nocol]
   
The -new flag causes the broker to begin a new collection.  The broker will do
a collection immediately by default, rather than waiting for the normal
collection time.  This is useful for starting the Broker the very first time.
If you don't want the broker to do a collection on startup, then use the
-nocol flag instead.  

----------------------------------------------------------------------
Gatherer Bookkeeping Attributes:

	Update-Time
		- The time that the summary object was last updated.
		  REQUIRED field, no default.

	Last-Modification-Time
		- The L-M-T of the object itself.  Defaults to 0.

	MD5
		- The unique string identifying the object itself.
		  Defaults to NULL.

	Refresh-Rate
		- The number of seconds after Update-Time when the
		  summary object is to be re-generated.  Defaults
		  to 1 week.

	Time-to-Live
		- The number of seconds after Update-Time when
		  the summary object is no longer valid.  Defaults
		  to 1 month.
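
	A sketch of how these attributes and defaults might be applied.  All
	names are hypothetical, "1 month" is taken to be 30 days, and an
	object is treated as due or expired once the interval has fully
	elapsed; those are assumptions, not Broker behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define DEFAULT_REFRESH_RATE (7L * 24 * 60 * 60)    /* 1 week */
#define DEFAULT_TIME_TO_LIVE (30L * 24 * 60 * 60)   /* "1 month", assumed 30 days */

struct bookkeeping {                /* hypothetical, for illustration */
    time_t      update_time;        /* required, no default */
    time_t      last_modification_time;
    const char *md5;
    long        refresh_rate;       /* seconds after update_time */
    long        time_to_live;       /* seconds after update_time */
};

/* Fill in the documented defaults around a required Update-Time. */
void bookkeeping_defaults(struct bookkeeping *b, time_t update_time)
{
    assert(b != NULL);
    b->update_time = update_time;
    b->last_modification_time = 0;
    b->md5 = NULL;
    b->refresh_rate = DEFAULT_REFRESH_RATE;
    b->time_to_live = DEFAULT_TIME_TO_LIVE;
}

/* Is the summary object due to be re-generated at time now? */
int needs_refresh(const struct bookkeeping *b, time_t now)
{
    return now >= b->update_time + (time_t)b->refresh_rate;
}

/* Is the summary object no longer valid at time now? */
int is_expired(const struct bookkeeping *b, time_t now)
{
    return now >= b->update_time + (time_t)b->time_to_live;
}
```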

----------------------------------------------------------------------
The Broker's Query Result set (we're in the middle of redoing it, sorry) is a
stream of newline-separated items with a 3-digit code, space, hyphen, and space
at the beginning of each line.  It looks like this:

	101 - Message to the User
	103 - Error Message to the User
	111 - Error Message to the User that ends the Broker results
	120 - URL of the Match
	122 - Opaque data
	124 - nbytes\nnbytes of Description 
	125 - URL of the SOIF object
	126 - URL of the Broker's home page
	130 - End of Object marker

The line '200 - ...' is always sent first (for the version) and should always
be *ignored*.  It may also be sent a few more times during the output to test
the connection; ignore those as well.

For bulk transfers:
	000 - Bulk xfer success
	400 - Bulk xfer error
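
A client could split each line into its code and text along these lines.
parse_result_line() and RESULT_VERSION are hypothetical names; the sketch only
handles the "NNN - " prefix and leaves the byte-counted payload of a 124 item
to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define RESULT_VERSION 200   /* the '200 - ...' line, to be ignored */

/* Parse "NNN - text".  On success stores the numeric code in *code,
 * points *text at the first character after "NNN - ", and returns 0.
 * Returns -1 if the line does not start with the expected prefix. */
int parse_result_line(const char *line, int *code, const char **text)
{
    assert(line != NULL && code != NULL && text != NULL);

    /* The checks run left to right, so a short line stops at its
     * terminating NUL before any later character is examined. */
    if (!isdigit((unsigned char)line[0])
        || !isdigit((unsigned char)line[1])
        || !isdigit((unsigned char)line[2])
        || line[3] != ' ' || line[4] != '-' || line[5] != ' ')
        return -1;

    *code = (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
    *text = line + 6;
    return 0;
}
```

A reader loop would call this on every line and skip any line whose code is
RESULT_VERSION before dispatching on the other codes.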

--------------------------------------------------------------------
Glimpse Performance Issues:  Limiting the lifetime of 'glimpse' queries:

This is the broker's view of things right now; so far it works very well...

  1. The Broker runs 'glimpse', and allows it to run for LIFETIME seconds;
     it also puts a *hard* time limit of LIFETIME CPU-seconds using setrlimit.
  2. after LIFETIME seconds, if 'glimpse' has not exited, then the Broker 
     sends SIGTERM to 'glimpse', sleeps for a few seconds, and sends 
     SIGKILL to 'glimpse'.
  3. The Broker sends SIGUSR1 to 'glimpseserver' to verify that it really
     did a clean up.  The SIGTERM to 'glimpse' should send 'glimpseserver'
     a SIGPIPE which will also cause a cleanup.  But the redundancy helps...
  4. The Broker uses whatever results 'glimpse' returned as the result set
     and then sends it to the user.  This is nice for very heavily loaded 
     brokers: you can give each user a small time slice worth of result sets.
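
Steps 1 and 2 can be sketched with plain POSIX calls as below.  This is not
the Broker's code: run_with_lifetime() is a hypothetical helper, it polls once
per second rather than using the Broker's event loop, and it leaves out the
SIGUSR1 to 'glimpseserver' in step 3 as well as the collection of output in
step 4.

```c
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] for at most lifetime wall-clock seconds and lifetime
 * CPU-seconds.  Returns the wait status of the child, or -1 if the
 * fork failed. */
int run_with_lifetime(char *const argv[], unsigned lifetime, unsigned grace)
{
    int status = 0;
    unsigned waited;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && lifetime > 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;

        rl.rlim_cur = lifetime;             /* hard CPU-time limit */
        rl.rlim_max = lifetime;
        setrlimit(RLIMIT_CPU, &rl);
        execvp(argv[0], argv);
        _exit(127);                         /* exec failed */
    }

    /* Let the child run for up to lifetime seconds. */
    for (waited = 0; waited < lifetime; waited++) {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return status;
        sleep(1);
    }

    /* Still there: ask politely, wait a little, then insist. */
    kill(pid, SIGTERM);
    sleep(grace);
    if (waitpid(pid, &status, WNOHANG) == pid)
        return status;
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return status;
}
```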

Use <INPUT TYPE="hidden" NAME="lifetime" VALUE="LIFETIME"> in your query.html
to change the lifetime per query to LIFETIME.

The MAX_LIFETIME seconds value is configurable in the Broker's broker.conf
file.  LIFETIME is always between 10 seconds and MAX_LIFETIME seconds.  By
default, LIFETIME == MAX_LIFETIME, but LIFETIME can be passed along via
query.html.  See Glimpse-MaxLife in broker.conf.
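
The bounds on LIFETIME amount to a clamp.  A minimal sketch, in which
clamp_lifetime() and MIN_LIFETIME are hypothetical names, a request of 0 or
less stands for "no lifetime passed in the query", and a MAX_LIFETIME below
10 is raised to 10 (an assumption):

```c
#include <assert.h>

#define MIN_LIFETIME 10   /* seconds, lower bound from the notes above */

/* Pick the lifetime for one query from the value requested in
 * query.html (<= 0 if none) and the configured maximum. */
int clamp_lifetime(int requested, int max_lifetime)
{
    if (max_lifetime < MIN_LIFETIME)    /* assumed handling of a bad config */
        max_lifetime = MIN_LIFETIME;
    assert(max_lifetime >= MIN_LIFETIME);

    if (requested <= 0)                 /* default: LIFETIME == MAX_LIFETIME */
        return max_lifetime;
    if (requested < MIN_LIFETIME)
        return MIN_LIFETIME;
    if (requested > max_lifetime)
        return max_lifetime;
    return requested;
}
```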


--------------------------------------------------------------------
Debugging:  Use -Dsection,level (or -Dsection for everything) after the
	    broker.conf argument to broker

registry.c	section 70, uses level 1, 5, and 9	  REGISTRY
collector.c 	section 71, uses level 1
parser.c	section 72, uses level 1
registry.c	section 73, uses level 1, 5, and 9	  HASH TABLES
stor_man.c	section 74, uses level 1
query_man.c	section 75, uses level 1
event.c		section 76, uses level 1
main.c		section 77, uses level 1
select_loop.c	section 78, uses level 9
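
For reference, an option of this shape can be parsed as sketched below.
parse_debug_opt() and DEBUG_ALL_LEVELS are hypothetical names, not the
Broker's own debug code, and the -1 sentinel for "everything" is an
assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEBUG_ALL_LEVELS (-1)   /* hypothetical: -Dsection with no level */

/* Parse "-Dsection,level" or "-Dsection".  Returns 0 and fills in
 * *section and *level on success, -1 if arg is not a valid -D option. */
int parse_debug_opt(const char *arg, int *section, int *level)
{
    const char *p;
    char *end;
    long s, l;

    assert(arg != NULL && section != NULL && level != NULL);
    if (strncmp(arg, "-D", 2) != 0)
        return -1;

    p = arg + 2;
    s = strtol(p, &end, 10);
    if (end == p || s < 0)              /* no digits, or negative section */
        return -1;
    if (*end == '\0') {                 /* "-Dsection": every level */
        *section = (int)s;
        *level = DEBUG_ALL_LEVELS;
        return 0;
    }
    if (*end != ',')
        return -1;

    p = end + 1;
    l = strtol(p, &end, 10);
    if (end == p || *end != '\0' || l < 0)
        return -1;
    *section = (int)s;
    *level = (int)l;
    return 0;
}
```

With this sketch, "-D70,5" selects section 70 (registry.c) at level 5, and
"-D73" selects every level of section 73.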

--------------------------------------------------------------------
WIP: Proposed query result interface specification (3/95):

  BrokerReturn   --> Version Header Body Trailer

  Version        --> INTERFACEVERSION Separator VersionRev
  VersionRev     --> MajorNumber MinorNumber string
  MajorNumber    --> number
  MinorNumber    --> number

  Header         --> InfoField Header
  Header         --> 

  InfoField      --> BROKER_URL       Separator string
  InfoField      --> BROKER_INDEXER   Separator string
  InfoField      --> BROKER_COLLECT   Separator string
  InfoField      --> MESSAGE_TO_USER  Separator string
  InfoField      --> USER_EXT         Separator UserExtType Separator Data
  UserExtType    --> string

  Body           --> BulkTransfer
  Body           --> ObjectList
  Body           --> 

  BulkTransfer   --> CompressedBulkTransfer
  BulkTransfer   --> RawBulkTransfer

  CompressedBulkTransfer --> STARTMARKER "gzip'd RawBulkTransfer" ENDMARKER

  RawBulkTransfer   --> @DELETE  { SOIFStream } RawBulkTransfer
  RawBulkTransfer   --> @UPDATE  { SOIFStream } RawBulkTransfer
  RawBulkTransfer   --> @REFRESH { SOIFStream } RawBulkTransfer
  RawBulkTransfer   --> 

  SOIFStream     --> SingleSOIFObject SOIFStream
  SOIFStream     --> 

  ObjectList     --> Object ObjectList
  ObjectList     --> 

  Object         --> OptWarning ResourceURL ObjectURL OptExt ObjectEnd

  OptWarning     --> WARNING Separator WarningNumber string
  OptWarning     -->
  WarningNumber  --> number

  ResourceURL    --> RESOURCE Separator string

  ObjectURL      --> OBJECT Separator string

  OptExt         --> DescData OptExt
  OptExt         --> OpaqueData OptExt
  OptExt         --> AttributeData OptExt
  OptExt         --> UserExtData OptExt
  OptExt         -->

  DescData       --> DESCRIPTION Separator Data

  OpaqueData     --> OPAQUE Separator Data

  AttributeData  --> ATTRIBUTE Separator AttrString Separator Data
  AttrString     --> string

  UserExtData    --> USEREXTENSION Separator ExtensionType Separator Data
  ExtensionType  --> string

  ObjectEnd      --> OBJEND

  Trailer        --> ObjectCount
  Trailer        --> Error
  Trailer        --> Stats
  Trailer        --> 

  ObjectCount    --> OBJCOUNT Separator number

  Error          --> ERROR Separator ErrorNumber string

  Stats          --> STATS Separator Data

  ErrorNumber    --> number

  Data           --> MagicNumber Nbytes NbytesOfData 
  Nbytes         --> number

  string         --> [^\n]*\n
  number         --> htonl(number)
  MagicNumber    --> htonl(0x329fa1d2)
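
  As an illustration of the Data production, a block could be encoded and
  checked as sketched below.  data_encode() and data_decode() are hypothetical
  helper names; only the magic value and the use of htonl() come from the
  grammar above, and 4-byte integers are assumed.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_MAGIC 0x329fa1d2u

/* Write MagicNumber, Nbytes and the payload into out.  Returns the
 * number of bytes written, or 0 if out (cap bytes) is too small. */
size_t data_encode(unsigned char *out, size_t cap,
                   const void *payload, uint32_t nbytes)
{
    uint32_t magic = htonl(DATA_MAGIC);
    uint32_t n = htonl(nbytes);

    assert(out != NULL && (payload != NULL || nbytes == 0));
    if (cap < 8 || cap - 8 < nbytes)
        return 0;
    memcpy(out, &magic, 4);
    memcpy(out + 4, &n, 4);
    if (nbytes > 0)
        memcpy(out + 8, payload, nbytes);
    return 8 + (size_t)nbytes;
}

/* Check the magic number and length of a Data block held in in[0..len).
 * On success points *payload at the data, sets *nbytes, returns 0. */
int data_decode(const unsigned char *in, size_t len,
                const unsigned char **payload, uint32_t *nbytes)
{
    uint32_t magic, n;

    assert(in != NULL && payload != NULL && nbytes != NULL);
    if (len < 8)
        return -1;
    memcpy(&magic, in, 4);
    memcpy(&n, in + 4, 4);
    if (ntohl(magic) != DATA_MAGIC || ntohl(n) > len - 8)
        return -1;
    *payload = in + 8;
    *nbytes = ntohl(n);
    return 0;
}
```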

