


WAISINDEX(1)             USER COMMANDS               WAISINDEX(1)



NAME
     waisindex - Indexes files

SYNOPSIS
     waisindex      [ -d index_filename ]      [ -a ]      [ -r ]
     [ -mem mbytes ]  [ -register ]  [ -export ]  [ -e [ file ] ]
     [ -l log_level ]   [ -pos | -nopos ]   [ -nopairs | -pairs ]
     [ -nocat ]  [ -T type ]  [ -t type ]  [ -contents  | -nocon-
     tents ] filename filename ...

DESCRIPTION
     waisindex creates an index of the words  in  files  so  that
     they  can  be  searched quickly (see waissearch).  The index
     takes about as much disk space as  the  original  text.   It
     also creates a new source structure named index_filename.src
     if none exists.

OPTIONS
     -d index_filename
               This is the base filename  for  the  index  files.
               Therefore if /usr/local/foo is specified, then the
               index files will be called /usr/local/foo.dct etc.
               The index should be stored on the local file  sys-
               tem  of  the  machine running waisindex.  It works
               over NFS, but it is much slower.

     -a        Append this index to an existing one.  Useful  for
               incremental  additions or updates.  This will only
               add onto an index, so that if a file has  changed,
               it  will  get  reindexed, but the old entries will
               not be purged.  Therefore, to save space, it is  a
               good  idea  to  reindex  the  whole  set  of files
               periodically.

     -r        Recursively index subdirectories.

     -mem      How much main memory to use during indexing.  This
               variable  will  have  a  large  effect on how fast
               indexing is done.

     -register Register  this  database  with  the  directory  of
               servers.   You  are  encouraged  to register data-
               bases, but only ones  that  will  be  consistently
               running.  The directory of servers is available to
               anyone that is on the internet or can phone in.

     -export   This causes the resulting source description  file
               to  include  the host-name and tcp-port for use by
               the clients.  Otherwise the file contains no  con-
               nection  information,  and  is expected to be used
               only for local searches.




Thinking Machines Last change: Sun May 10 1992                  1






WAISINDEX(1)             USER COMMANDS               WAISINDEX(1)



     -e [ filename ]
               Redirect error output to pathname, if supplied, or
               to  /dev/null.   Error  output defaults to stderr,
               unless -s is selected, in which case  it  defaults
               to /dev/null.

     -l log_level
               set logging level.  Currently only levels 0, 1,  5
               and  10  are meaningful: Level 0 means log nothing
               (silent).  Level 1 logs only errors  and  warnings
               (messages of HIGH priority), level 5 logs messages
               of MEDIUM priority (like indexing filename  info).
               Level 10 logs everything.

     -pos (-nopos)
               Include (don't include -  default)  word  position
               information  in the index.  This will increase the
               index size, but will allow search  engines  to  do
               proximity.

     -nopairs (-pairs)
               Don't build (build - the default) word pairs  from
               consecutive capitalized words.

     -nocat    Inhibits the creation of a catalog.  This is  use-
               ful  for  databases  with  a large number of docu-
               ments, as the catalog contains 3 lines  per  docu-
               ment.

     -contents (-nocontents)
               Include (exclude) the contents of  the  file  from
               the  index.  The filename and header will still be
               indexed.  Default is type depedant.

     -T type   Sets the TYPE of the document to "type".

     -t type   This is the format of files that  are  handled  by
               waisindex.   It  is easy to parse a different for-
               mat, but that has  to  be  done  by  changing  the
               source  (ircfiles.c).   To  find  out  the list of
               currently known types, execute the waisindex  com-
               mand with no arguments and it will list them.

     filename filename...
               These are the files that will be indexed according
               to  the  arguments above.  To insure the files are
               registered in the filename table correctly, it  is
               advised that these be full paths (beginning with a
               /).  If the database is to be used from a  machine
               other  than  the  machine  on  which  the index is
               created,  this  should  be  a  machine-independant
               path.



Thinking Machines Last change: Sun May 10 1992                  2






WAISINDEX(1)             USER COMMANDS               WAISINDEX(1)



SEE ALSO
     waissearch(1), waisserver(1), waissearch-gmacs(1), xwais(1),
     xwaisq(1)

     Wide Area Information Servers Concepts by Brewster Kahle.
     Brewster@think.com


DIAGNOSTICS
     The diagnostics produced by the waisindex are  meant  to  be
     self-explanatory.


BUGS
     It temporarily takes twice the space it needs for an index.

     Due to some compile time constants  the  document  table  is
     limited  to  16 Megabytes.  This limits the indexer to data-
     bases with headlines that add up to less than  16  megabytes
     (since thats the principal component of the table).  This is
     typically a problem for database types  where  a  record  is
     essentially a headline (one_line, archie).

     See the note in ir/README in the wais distribution for  more
     detail.






























Thinking Machines Last change: Sun May 10 1992                  3



