This is the README file for the Bunyip Information Systems Alpha 1.0 WHOIS++
server.

This server implements the basic WHOIS++ protocol and functionality as
described in the draft RFC from the WNILS working group of the IETF with
a few modifications which are described below. 

This distribution is considered ALPHA software to test the concepts put
out in the protocol document. Bug reports/fixes will be handled on a
best-effort basis by Bunyip. You can send mail to

	whois-group@bunyip.com

If possible, please include a fix with any bug you report. You can
also send mail to the ietf-wnils@aggie.ucdavis.edu list, which is the
mailing list for the WNILS working group.

NOTES

The whoisd server uses a whois++ protocol library (libwhois.a) to parse
incoming requests and format replies. The sources are included in the
whois subdirectory, along with a README file that explains the library
in more detail. whoisd uses a modified version of the freely available
freeWAIS-0.1 from CNIDR as its search engine. The modifications add a
basic field indexing capability. Users should note that this capability
has been hacked in and is really in proof-of-concept mode, but it seems
to work pretty well, with a few restrictions which are mentioned below.
This field indexing capability could be extended to the full WAIS server
with a few modifications, but this has not been done here.

The whoisd server is designed to be run from inetd. It does not talk to
sockets itself; it reads from stdin and writes to stdout, so it can be
debugged directly from the command line. Remember that commands end in
CRLF, so from the shell you'll have to type

<protocol string>
<newline>
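If you would rather script a debugging session than type it, each request
must end in CRLF. The following is a minimal C sketch, assuming the command
text itself is already valid WHOIS++; format_request is a hypothetical
helper for illustration, not part of libwhois.a.

```c
#include <assert.h>
#include <string.h>

/* Copy a WHOIS++ command into buf and terminate it with CR LF, as the
 * protocol expects. Returns the number of bytes written (excluding the
 * trailing NUL), or -1 if buf is too small.
 * format_request is a hypothetical helper, not part of libwhois. */
int format_request(const char *cmd, char *buf, size_t bufsize)
{
    size_t len;

    assert(cmd != NULL && buf != NULL);
    len = strlen(cmd);

    /* Need room for the command, "\r\n" and the terminating NUL. */
    if (len + 3 > bufsize)
        return -1;

    memcpy(buf, cmd, len);
    buf[len] = '\r';
    buf[len + 1] = '\n';
    buf[len + 2] = '\0';
    return (int)(len + 2);
}
```

Writing the resulting buffer to whoisd's standard input (through a pipe,
for example) sends "bajan" followed by CR LF for the request "bajan".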

Centroids are not currently implemented in any form.

The section PROTOCOL CHANGES contains the changes to the last RFC draft
on the WHOIS++ protocol which are implemented in this server.


FEATURES

This server supports the basic WHOIS++ protocol. General, attribute,
value, attribute-value and template searches are implemented. Handle
searches are not (other than by saying "handle=<whatever>", which is a
standard attribute-value search implying that you have an attribute
called "handle").

A HOLD global constraint with a modified syntax is implemented for both
commands and searches. In addition, the global constraint "LOGICAL"
allows users to either AND or OR together the components of a search.
The local constraint "MATCH=PARTIAL" allows partial (wildcard) matches
AFTER an initial segment of the search string. See CONSTRAINTS below for
more information on all of these.

Quoting is implemented although it is not specified by the RFC. See
QUOTING below for an explanation.

COMPILATION & INSTALLATION

This installation assumes a certain knowledge of UNIX systems and will
not explain basic system administration procedures in great detail.
However, examples will be given where appropriate.

1) Unpack the release if you haven't already done so. This can be done by
going into the parent directory of the place you want it installed and
running the following command:

	zcat bunyip-whois++-1.0a.tar.Z | tar xvpf -

It will create a directory called whoisd containing the subdirectories
whois and freeWAIS-0.1 (the freeWAIS release).

2) The whois server "whoisd" code itself is ANSI C; freeWAIS, however, is
not, so you can't use a strict ANSI compiler. Standard cc should work on
most (if not all) platforms. You can also use the GNU compiler gcc, but
you may not specify the -ansi flag (the freeWAIS code will barf). In the
file whoisd/Makefile you can set the CC and CFLAGS variables to the
appropriate values; the defaults should suit most systems. However, the
system has only been compiled on a Sun SPARC under SunOS 4.1.3. I don't
know how easy it will be under Solaris or its kin.

3) For those of you familiar with WAIS/freeWAIS, the following flags must
be defined for the system:

LOCAL_SEARCH
LITERAL
PARTIALWORD
BOOLEANS

in order for the whois server to work.

This is the case with the default Makefile in the freeWAIS-0.1 directory.

4) Type "make". The compiler may give a few warnings.

5) The result should be the executable "whoisd" which you can copy to
where you would like it to reside in your directory tree.

CONFIGURATION

The whoisd server uses one configuration file to tell it what templates
you have available and where they are. The file tmpl.config in the
distribution gives a sample of what this looks like:

user		user.tmpl	user.dir/user 		User template
services	services.tmpl 	services.dir/services 	Services template

The fields, separated by whitespace (blanks, tabs) are as follows:

1) The name of the template

2) The path of the file containing the definition of the template. If not
   a fully specified path, then it is relative to the current working
   directory.

3) The path of the WAIS database for this template. If not a fully
   specified path, then it is relative to the current working directory.

4) The description of the template

Comment lines may start with a '#' and are terminated by the NEWLINE.
Lines may be continued by the use of a backslash '\' followed by NEWLINE
like make(1).
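To make the field layout concrete, here is a sketch in C of parsing one
tmpl.config line. It assumes continuation lines have already been joined
and that the description is everything after the third field; the struct
tmpl_entry and the function names are hypothetical, not taken from the
whoisd sources.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One entry from tmpl.config (hypothetical struct, not whoisd's own). */
struct tmpl_entry {
    char name[64];         /* field 1: template name */
    char tmpl_path[256];   /* field 2: template definition file */
    char db_path[256];     /* field 3: WAIS database path */
    char description[256]; /* field 4: rest of the line */
};

/* Copy the next blank- or tab-delimited word from *p into out.
 * Returns 1 on success, 0 if no word is left or it does not fit. */
static int next_word(const char **p, char *out, size_t outsize)
{
    const char *s = *p;
    size_t n = 0;

    while (*s == ' ' || *s == '\t')
        s++;
    while (s[n] != '\0' && s[n] != ' ' && s[n] != '\t' && s[n] != '\n')
        n++;
    if (n == 0 || n >= outsize)
        return 0;
    memcpy(out, s, n);
    out[n] = '\0';
    *p = s + n;
    return 1;
}

/* Parse one logical line of tmpl.config (continuations already joined).
 * Returns 1 for an entry, 0 for a blank or comment line, -1 on error. */
int parse_tmpl_config_line(const char *line, struct tmpl_entry *e)
{
    const char *p = line;
    size_t len;

    assert(line != NULL && e != NULL);
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0' || *p == '\n' || *p == '#')
        return 0;

    if (!next_word(&p, e->name, sizeof e->name) ||
        !next_word(&p, e->tmpl_path, sizeof e->tmpl_path) ||
        !next_word(&p, e->db_path, sizeof e->db_path))
        return -1;

    /* The description is the rest of the line, minus surrounding blanks. */
    while (*p == ' ' || *p == '\t')
        p++;
    len = strlen(p);
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;
    if (len >= sizeof e->description)
        return -1;
    memcpy(e->description, p, len);
    e->description[len] = '\0';
    return 1;
}
```

Keeping the description as "the rest of the line" is what lets it contain
blanks, as in "User template".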

A typical template looks like this (this is user.tmpl):

1	Template-Type		Type of the template
2	Name			Name of the user
3	Organization-Name	Name of the organization the user belongs to 
4	Organization-Type	Type of organization the user belongs to
5	Work-Phone		Work phone number
6	Work-Fax		Work FAX number
7	Work-Postal		Work postal address
8	Job-Title		Job title
9	Email			Email address
10	Handle*			Handle

The fields here are:

1) Index number of the attribute
2) The attribute name
3) The human-readable description of the attribute

The fields are separated by whitespace (tabs or blanks). Comment lines
may start with a '#' and are terminated by the NEWLINE. Lines may be
continued by the use of a backslash '\' and NEWLINE a la make(1).
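Parsing a template definition line can be sketched the same way. This is
an illustration under the assumption that every non-comment line holds a
number, a name (with an optional trailing '*') and a description;
tmpl_attr and parse_tmpl_attr_line are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One attribute from a template definition file such as user.tmpl.
 * The struct and function names are hypothetical. */
struct tmpl_attr {
    int  number;           /* field 1: index number */
    char name[64];         /* field 2: attribute name, without any '*' */
    int  is_headline;      /* 1 if the name was followed by '*' */
    char description[128]; /* field 3: human-readable description */
};

/* Parse a line like "10\tHandle*\t\t\tHandle".
 * Returns 1 for an attribute, 0 for a blank/comment line, -1 on error. */
int parse_tmpl_attr_line(const char *line, struct tmpl_attr *a)
{
    const char *p = line;
    char *end;
    long num;
    size_t n, len;

    assert(line != NULL && a != NULL);
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0' || *p == '\n' || *p == '#')
        return 0;

    /* Field 1: the index number, followed by a blank or tab. */
    num = strtol(p, &end, 10);
    if (end == p || (*end != ' ' && *end != '\t'))
        return -1;
    a->number = (int)num;
    p = end;

    /* Field 2: the attribute name, possibly ending in '*'. */
    while (*p == ' ' || *p == '\t')
        p++;
    n = 0;
    while (p[n] != '\0' && !isspace((unsigned char)p[n]))
        n++;
    if (n == 0)
        return -1;
    a->is_headline = (p[n - 1] == '*');
    if (a->is_headline)
        n--;
    if (n == 0 || n >= sizeof a->name)
        return -1;
    memcpy(a->name, p, n);
    a->name[n] = '\0';
    p += n + (size_t)a->is_headline;

    /* Field 3: the rest of the line is the description. */
    while (*p == ' ' || *p == '\t')
        p++;
    len = strlen(p);
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;
    if (len >= sizeof a->description)
        return -1;
    memcpy(a->description, p, len);
    a->description[len] = '\0';
    return 1;
}
```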

You will notice that one of the attributes (Handle) has an asterisk
("*"). One (and only one) of the attribute names must have an asterisk
after it. This signifies to the system that this attribute is to be used
as the "headline" (in WAIS-speak). Under whoisd, the value associated
with this attribute will be used in the response to identify this record
uniquely in the database. Since there is currently no way to generate a
unique identifier, you should pick something like a name (or a handle if
you have entered such a beast in your template).

The number is used internally by the system. No two attributes may have
the same number. The numbers do not have to be in any particular order
and the value of the number itself is not significant. 
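The two rules above (exactly one '*' attribute, no repeated numbers) can
be checked once a template definition has been read. A minimal C sketch;
the attr_info struct and check_template function are hypothetical and
only illustrate the rules:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal view of a parsed attribute: its index number and whether
 * its name carried the '*' headline marker (hypothetical struct). */
struct attr_info {
    int number;
    int is_headline;
};

/* Check the two rules for a template definition: exactly one
 * headline attribute, and no index number used twice.
 * Returns 0 if the template is valid, -1 otherwise. */
int check_template(const struct attr_info *attrs, size_t count)
{
    size_t i, j, headlines = 0;

    assert(attrs != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (attrs[i].is_headline)
            headlines++;
        /* Order and value are not significant, only uniqueness. */
        for (j = i + 1; j < count; j++)
            if (attrs[i].number == attrs[j].number)
                return -1;
    }
    return headlines == 1 ? 0 : -1;
}
```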

A data template looks like this:

Template-Type: user
Name:	Alan Emtage
Organization-Name:	Bunyip Information Systems
Organization-Type:	Commercial
Work-Phone:	(514) 875-8611
Work-Fax:	(514) 875-8134
Work-Postal:	310 St. Catherine St. W., Suite 202, Montreal, QC, H2X 2A1
Job-Title:	VP Research and Development
Email:	bajan@bunyip.com
Handle: bunyip.ae1

and should only contain the attributes described in the associated
template description file. If an attribute is listed that does not occur
there, then the line will be treated "as is" and no field indexing will
be possible for that attribute. 

Your data can be in separate files (one template per file), or may all
be in one file, separated by blank lines. I suggest that you put them
one per file, since with large data files containing many templates the
hack that adds field indexing to WAIS may get confused. However, it does
work correctly most of the time. The templates may contain attribute
names without associated values, but these lines will be ignored by the
system.

As above, the format is:

<attribute-name>:	<value>

In the data template the attribute name must be terminated with a colon.
NOTE: The blank template has no ":".
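A data template line can be split at its first colon, which lets values
such as Work-Postal contain colons of their own. This C sketch assumes
the attribute name starts at the beginning of the line; parse_data_line
is a hypothetical helper, and looking the attribute up in the template
definition is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Split one data-template line of the form "<attribute-name>:<value>".
 * The attribute name and value are copied (without the colon or
 * surrounding whitespace) into attr and value.
 * Returns 1 if the line has a value, 0 if the attribute has no value
 * (such lines are ignored), and -1 if there is no colon or a field is
 * too long. parse_data_line is a hypothetical helper. */
int parse_data_line(const char *line,
                    char *attr, size_t attrsize,
                    char *value, size_t valuesize)
{
    const char *colon, *v;
    size_t n;

    assert(line != NULL && attr != NULL && value != NULL);
    colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;

    n = (size_t)(colon - line);
    if (n >= attrsize)
        return -1;
    memcpy(attr, line, n);
    attr[n] = '\0';

    /* Skip blanks or tabs between the colon and the value. */
    v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    n = strlen(v);
    while (n > 0 && isspace((unsigned char)v[n - 1]))
        n--;
    if (n >= valuesize)
        return -1;
    memcpy(value, v, n);
    value[n] = '\0';
    return n > 0 ? 1 : 0;
}
```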

The files user.tmpl, tmpl.config and a few sample user records (in
user.dir) are included in the distribution.

The freeWAIS program "waisindex" in whoisd/freeWAIS-0.1/bin is used to
index the templates that you have made up. For example, suppose you are
using the sample files described above, with a set of user data
templates in the directory user.dir named user001, user002, etc. (the
actual file names of the data template files are not significant to the
system), and that you are in the parent directory of user.dir:

freeWAIS-0.1/bin/waisindex -d user.dir/user -t field_index:user user.dir/user*

The -d option will cause the wais database to be put in user.dir under
the name "user". The -t option tells the program that it is a field
indexed type and the ":user" says that the name of the template is
"user". The program will look in the file tmpl.config in the current
directory to find the path of the template definition (in this case in
user.tmpl). Finally, user.dir/user* gives the names of the data files to index.

This is only a slight modification to the waisindex program and full
details can be found in the file
freeWAIS-0.1/doc/origical-TM-wais/waisindex.txt.

Note that once the data files are indexed they cannot be moved around,
since WAIS uses hard-coded paths into the database. If you move them,
they will have to be re-indexed.

Now that you have created a wais database, you can set up the server.

You'll need to add a line in the /etc/services file for the whois server.
Our line looks like:

whois++		6969/tcp 	# experimental whois server

You can choose whatever port number you would like it to run on. The
name "whois++" is unimportant; you can call it whatever you like. The
protocol spec says that it runs on the standard whois port (43); however,
this may change the behavior of the UNIX whois(1) command, so check this
out first. Also, if you are running NIS (formerly Yellow Pages), you'll
have to do the appropriate thing to change the services file that it
knows about.

In the file /etc/inetd.conf you'll have to add a line of the form:

whois++	stream tcp nowait root <path1>/whoisd whoisd -d <path> -l <logfile> -C <config>

The first field must be the same string that you used in the
/etc/services file. The user to run the program under can be root (there
are no security problems with this as far as I know), but you can make it
any user that has permission to run whoisd and read the configuration and
data files. <path1> must be the full path of the whoisd executable.

The following options are recognized by the whoisd program:

-d <path>
   This causes the program to do a chdir() (cd) to the directory given by
   <path>. This is useful as the first option to the program because it
   means that you can then omit full path names from the rest of the
   options.

-C <config>
   The name of the configuration file <config> described above.
   By default the program will use the file called tmpl.config in the
   current working directory. If you don't use the -d option, you'll have
   to specify the full path of the configuration file here. You have to
   call the configuration file tmpl.config for the waisindex program to
   work correctly.

-l <logfile>
   The file to log incoming requests to; if you don't use the -d option,
   give its full path. If you don't have a -l option, then no logging
   will be done. The program will create the file if it doesn't already
   exist. IP addresses of connecting hosts and the raw protocol input will
   be written to this file.

-D <debug level>
   If set, the program will write debugging output (which does not
   conform to the protocol) to the connection. You can use this if you
   are trying to track down a problem. See the file whois/README for a
   full explanation of
   the wSetDebug API call. The value for <debug level> is passed directly
   to this call.
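The option handling can be sketched with POSIX getopt(3). This is an
illustration only: the whoisd_opts struct and parse_whoisd_opts function
are hypothetical, and unlike the description of -d above, the sketch
merely records the directory rather than calling chdir() itself.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Settings taken from the whoisd command line (hypothetical struct;
 * the real whoisd source may store these differently). */
struct whoisd_opts {
    const char *dir;      /* -d: directory to chdir() to, or NULL */
    const char *config;   /* -C: configuration file */
    const char *logfile;  /* -l: log file, or NULL for no logging */
    int debug;            /* -D: value for wSetDebug, 0 if unset */
};

/* Parse the options described above with POSIX getopt().
 * Returns 0 on success, -1 on an unknown option or missing argument. */
int parse_whoisd_opts(int argc, char *argv[], struct whoisd_opts *o)
{
    int c;

    assert(o != NULL);
    o->dir = NULL;
    o->config = "tmpl.config";   /* default named in the -C description */
    o->logfile = NULL;
    o->debug = 0;

    optind = 1;                  /* start scanning at argv[1] */
    opterr = 0;                  /* report errors via the return value */
    while ((c = getopt(argc, argv, "d:C:l:D:")) != -1) {
        switch (c) {
        case 'd': o->dir = optarg; break;
        case 'C': o->config = optarg; break;
        case 'l': o->logfile = optarg; break;
        case 'D': o->debug = atoi(optarg); break;
        default:  return -1;     /* '?': unknown option or no argument */
        }
    }
    return 0;
}
```

A caller would chdir() to dir (when set) before opening the configuration
and log files, which has the same effect as giving -d first.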

Finally, send the inetd process a HUP signal (kill -HUP <pid>) to get it
to reread its configuration file.

You can test the server out by telnet'ing to the port specified in the
/etc/services file. You should see the banner message:

% 200 Welcome to the WHOIS++ server. Type HELP for help. Bunyip V1.0 Alpha

BTW, the help facility has not yet been implemented.... :-(

You can add new data by defining the appropriate blank template, making
the entry in tmpl.config and indexing the files with the waisindex
program. The new template will automatically appear in the LIST command
results.

PROTOCOL CHANGES

The following are coded up in the server and will be submitted to the
working group as proposed changes to the protocol.

1) The HOLD constraint has changed from a unary to a binary operator. You
need to say "hold=true" rather than just "hold".

2) The LIST, COMMANDS and CONSTRAINTS commands create "virtual"
templates for their responses. For example, the CONSTRAINTS command
returns:

# FULL 4 
# constraint HOLD 
 Name: HOLD
 Description: Hold the connection for next query
 Values: <"true","FALSE">
 Scope: global
# constraint MAXHITS 
 Name: MAXHITS
 Description: Maximum number of responses
 Values: <100>
 Scope: global
# constraint LOGICAL 
 Name: LOGICAL
 Description: Logical connection between search terms
 Values: <"AND","or">
 Scope: global
# constraint MATCH 
 Name: MATCH
 Description: Type of match
 Values: <"STANDARD","initial">
 Scope: local
# END 

This gives the constraints supported, a description, the valid values for
each constraint and whether it is local or global in scope. Default
values are given in CAPS.

3) System responses (starting with a '%') have status numbers, following
the conventions for the FTP protocol (RFC 959). Values of 100 - 199 are
informational and temporary, 200 - 299 are status values and permanent,
300 - 399 are non-fatal temporary warnings, 400 - 499 are permanent
warnings and 500 - 599 are fatal permanent errors.

4) ABRIDGED responses are given after 10 hits, not 2 as in the RFC. The
WAIS search system does not permit the determination of which element of
the initial search query caused a match, so ABRIDGED responses here do
not return this value. It is possible that very few search engines will
allow this, and this may need to be reviewed. The ABRIDGED responses here
have degenerated to HANDLE responses.

CONSTRAINTS

1) The "HOLD" constraint can be specified as in the example below:

bajan:hold=true

This means: perform a general search for the string "bajan" (on both
attributes and values in all templates), and hold the connection open
after the results have been returned. The default value is "false".
While implemented, this constraint is not terribly useful.

2) By default, non-template search terms are OR'd together. For example,

name=alan; attribute=email

says find all records in which the attribute "name" has the value "alan"
OR all records which have the attribute called "email". However, the
global constraint "LOGICAL" can change this behavior. For example,

name=alan; attribute=email:logical=and

means find the records that match both search terms, that is, those
with name=alan AND an attribute called email.

Template search terms are always OR'd together.

3) The global constraint MAXHITS is numeric and puts an upper bound on
the number of records that will be returned from a search; more matching
records may exist. The order of the returned items is undefined.

bunyip:maxhits=20

returns at most 20 records from the general search for "bunyip".

The internal default is 100.

4) The local constraint "MATCH" determines the kind of match to be
performed on the search term in question. By default, the search term
must be completely specified for it to match an attribute or value in a
record. For example

buny

would not match "bunyip".

However, terminal wildcard matching can be requested with the MATCH
constraint. The search

buny,match=partial

will match "bunyip". 

Note that this constraint may not be applied on a global basis, so that

buny:match=partial

will not be recognized as valid.
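The LOGICAL and MATCH behaviour described in items 2 and 4 can be
sketched as two small C functions. This is not how the modified freeWAIS
engine is implemented; it assumes matches are identified by sorted
integer record numbers and that comparisons ignore case, neither of which
this README specifies. combine_hits and term_matches are hypothetical
names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* LOGICAL: combine two sorted, duplicate-free lists of record numbers.
 * logical_and != 0 gives LOGICAL=AND (records matched by both terms);
 * otherwise LOGICAL=OR, the default (records matched by either term).
 * Integer record numbers are an assumption of this sketch.
 * out needs room for na + nb entries. Returns the number written. */
size_t combine_hits(const int *a, size_t na, const int *b, size_t nb,
                    int logical_and, int *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] == b[j]) {          /* matched by both terms */
            out[n++] = a[i];
            i++;
            j++;
        } else if (a[i] < b[j]) {    /* matched by the first term only */
            if (!logical_and)
                out[n++] = a[i];
            i++;
        } else {                     /* matched by the second term only */
            if (!logical_and)
                out[n++] = b[j];
            j++;
        }
    }
    /* Leftovers were matched by one term only: keep them for OR. */
    if (!logical_and) {
        while (i < na)
            out[n++] = a[i++];
        while (j < nb)
            out[n++] = b[j++];
    }
    return n;
}

/* MATCH: does a search term match a word from a record?
 * partial == 0 is MATCH=STANDARD: the whole word must be given.
 * partial != 0 is MATCH=PARTIAL: the term may be an initial segment
 * of the word (a terminal wildcard). Ignoring case is an assumption. */
int term_matches(const char *term, const char *word, int partial)
{
    assert(term != NULL && word != NULL);
    if (partial)
        return strncasecmp(term, word, strlen(term)) == 0;
    return strcasecmp(term, word) == 0;
}
```

So term_matches("buny", "bunyip", 0) fails while term_matches("buny",
"bunyip", 1) succeeds, mirroring the "buny" examples above.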


QUOTING

Strings to the left of `=' must start with a letter, and consist of
letters, digits and the dash (`-'), while strings to the right of `='
must either conform to the preceding rule, or be surrounded by double
quotes (`"').  Whitespace (spaces and/or tabs) may surround names,
strings and `='.  Quoted strings follow the normal convention that a
backslash (`\') quotes itself and a double quote.


Discussions about this distribution will be carried out on the WNILS list
so you should check there if you have comments or suggestions.
