NAME
    Wais - access to freeWAIS-sf libraries

SYNOPSIS
    `require Wais;'

DESCRIPTION
    The interface is divided into four major parts.

    SFgate 4.0
              For backward compatibility the functions used in SFgate up to
              version 4 are still present. Their use is deprecated and they
              are not documented here. These functions may not be supported
              in future versions of this module.

    Protocol  XS functions that provide low-level access to the WAIS
              protocol. For example, `generate_search_apdu()' constructs a
              request message.

    SFgate 5  Perl functions that implement high-level access to WAIS
              servers. For example, parallel searching is supported.

    dictionary
              A set of XS functions useful for inspecting local databases.

    We will start with the SFgate 5 functions.

USAGE
    The main high-level interface consists of the functions `Wais::Search'
    and `Wais::Retrieve'. Both return a reference to an object of the class
    `Wais::Result'.

  Wais::Search

    Arguments of `Wais::Search' are hash references, one for each database
    to search. The keys of the hashes should be:

    query     The query to submit.

    database  The database which should be searched.

    host      host is optional. It defaults to `'localhost''.

    port      port is optional. It defaults to `210'.

    tag       A tag by which individual results can be associated with a
              database/host/port triple. If omitted, it defaults to the
              database name.

    relevant  If present, it must be a reference to an array containing
              alternating document IDs and types. Document IDs must be of
              type `Wais::Docid'.

              Here is a complete example:

                   $result = Wais::Search({'query'    => 'pfeifer', 
                                           'database' => $db1, 
                                           'host'     => 'ls6',
                                           'relevant' => [$id, 'TEXT']},
                                          {'query'    => 'pfeifer', 
                                           'database' => $db2});

              If *host* is `'localhost'' and *database*`.src' exists, a
              local search is performed instead of connecting to a server.

              `Wais::Search' opens at most `$Wais::maxnumfd' connections in
              parallel.

  Wais::Retrieve

              `Wais::Retrieve' should be called with named parameters (i.e.
              a hash). Valid parameters are database, host, port, docid, and
              type.

                      $result = Wais::Retrieve('database' => $db,
                                               'docid'    => $id, 
                                               'host'     => 'ls6',
                                               'type'     => 'TEXT');

              Defaults are the same as for `Wais::Search'. In addition, type
              defaults to `'TEXT''.
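
              The way named parameters combine with the documented defaults
              can be sketched as follows. This is only an illustration of
              the calling convention: `with_defaults' is a hypothetical
              helper, not a function of this module, and the database name
              and document id are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Wais module: fills in the
# documented defaults for every named parameter the caller left out.
sub with_defaults {
    my %args = (
        host => 'localhost',
        port => 210,
        type => 'TEXT',
        @_,                 # caller-supplied pairs override the defaults
    );
    return %args;
}

# 'demo' and 'ID-1' are placeholders; a real docid is a Wais::Docid object.
my %params = with_defaults(database => 'demo', docid => 'ID-1', host => 'ls6');
printf "%s:%d/%s (%s)\n", @params{qw(host port database type)};
```

              Because the caller's pairs come last in the list, they replace
              any default with the same key.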

  `Wais::Result'

              The functions `Wais::Search' and `Wais::Retrieve' return
              references to objects blessed into `Wais::Result'. The
              following methods are available:

              diagnostics
                        Returns an array of diagnostic messages. Each
                        element (if any) is a reference to an array
                        consisting of

                             tag       The tag of the corresponding search
                                       request or `'document'' if the
                                       request was a retrieve request.

                             code      The WAIS diagnostic code.

                             message   A textual diagnostic message.

              header    Returns an array of WAIS document headers. Each
                        element (if any) is a reference to an array
                        consisting of

                             tag       The tag of the corresponding search
                                       request or `'document'' if the
                                       request was a retrieve request.

                             score     The score the server assigned to
                                       the document.

                             lines     Length of the corresponding document
                                       in lines.

                             length    Length of the corresponding document
                                       in bytes.

                             headline  The headline of the document.

                             types     A reference to an array of types
                                       valid for docid.

                             docid     A reference to the WAIS identifier
                                       blessed into `Wais::Docid'.

              text      Returns the text fetched by `Wais::Retrieve'.
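
              Unpacking the header records described above might look like
              the following sketch. It runs on mock data instead of a live
              search: the field order follows the list above, but the
              values are invented and plain strings stand in for the
              `Wais::Docid' objects.

```perl
use strict;
use warnings;

# Mock data shaped like the documented return value of $result->header:
# [tag, score, lines, length, headline, types, docid].
# A plain string stands in for each Wais::Docid object.
my @headers = (
    ['db1', 1000, 12,  640, 'First document',  ['TEXT'],         'docid-a'],
    ['db2',  750, 40, 2048, 'Second document', ['TEXT', 'HTML'], 'docid-b'],
);

my @lines_out;
for my $record (@headers) {
    # Each record is an array reference; unpack it in the documented order.
    my ($tag, $score, $lines, $length, $headline, $types, $docid) = @$record;
    push @lines_out, sprintf('%-4s %5d %5d bytes  %s [%s]',
        $tag, $score, $length, $headline, join(',', @$types));
}
print "$_\n" for @lines_out;
```

              With a real result object the loop would iterate over
              `$result->header' instead of `@headers'.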

Dictionary
              There are a couple of functions to inspect local databases.
              See the inspect script in the distribution. You need the
              Curses module to run it, and you must adapt the directory
              settings at the top of the script.

  Wais::dictionary

                     %frequency = Wais::dictionary($database);
                     %frequency = Wais::dictionary($database, $field);
                     %frequency = Wais::dictionary($database, 'foo*');
                     %frequency = Wais::dictionary($database,  $field, 'foo*');

              The function returns an array of alternating words and
              frequencies: each word from the global or field dictionary
              that matches the prefix (if one is given) is followed by its
              frequency. In a scalar context, the number of matching words
              is returned.
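
              The two calling contexts can be illustrated with a stand-in
              function. `mock_dictionary' is hypothetical and returns
              made-up words and counts; it only mimics the documented
              behaviour of returning pairs in list context and a word
              count in scalar context.

```perl
use strict;
use warnings;

# Stand-in for Wais::dictionary($database, 'foo*'): a flat list of
# alternating words and frequencies. Words and counts are made up.
sub mock_dictionary {
    my @pairs = (foo => 12, food => 3, fool => 7);
    return wantarray ? @pairs : @pairs / 2;
}

my %frequency = mock_dictionary();      # list context: word => frequency
my $count     = mock_dictionary();      # scalar context: number of words

# Print the words, most frequent first.
for my $word (sort { $frequency{$b} <=> $frequency{$a} } keys %frequency) {
    printf "%-6s %d\n", $word, $frequency{$word};
}
print "$count matching words\n";
```

              Assigning the flat list to a hash works because each word is
              immediately followed by its frequency.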

  Wais::list_offset

              The function takes the same arguments as Wais::dictionary. It
              returns the same array (or, in a scalar context, the same word
              count), with the word frequencies replaced by the offset of
              the posting list in the inverted file.

  Wais::postings

                     %postings = Wais::postings($database, 'foo');
                     %postings = Wais::postings($database, $field, 'foo');

              Returns an array of alternating numeric document IDs and
              references to arrays. The first element of each such array is
              the internal weight of the word with respect to the document.
              The remaining elements are the positions of the occurrences
              of the word in the document. If freeWAIS-sf is compiled with
              `-DPROXIMITY', these are word positions; otherwise they are
              character positions.

              In a scalar context the number of occurrences of the word is
              returned.
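
              A sketch of how such a postings structure can be taken apart
              is shown below. The hash is mock data in the documented
              shape; the document ids, weights and positions are invented,
              and the real weights may use a different numeric format.

```perl
use strict;
use warnings;

# Mock data shaped like the documented return value of Wais::postings:
# numeric document id => [weight, position, position, ...].
my %postings = (
    3  => [0.42, 5, 17, 88],
    11 => [0.10, 2],
);

my $total = 0;
for my $docid (sort { $a <=> $b } keys %postings) {
    # The first element is the weight, the rest are positions.
    my ($weight, @positions) = @{ $postings{$docid} };
    $total += @positions;
    printf "doc %d: weight %.2f, %d occurrence(s) at %s\n",
        $docid, $weight, scalar @positions, join(', ', @positions);
}
print "total occurrences: $total\n";
```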

  Wais::headline

                     $headline = Wais::headline($database, $docid);

              The function retrieves the headline (only the text!) of the
              document numbered `$docid'.

  Wais::document

                     $text = Wais::document($database, $docid);

              The function retrieves the text of the document numbered
              `$docid'.

Protocol
  Wais::generate_search_apdu

                     $apdu = Wais::generate_search_apdu($query,$database);
                     $relevant = [$id1, 'TEXT', $id2, 'HTML'];
                     $apdu = Wais::generate_search_apdu($query,$database,$relevant);

              Document IDs must be of type `Wais::Docid' as returned by
              `Wais::Result::header' or `Wais::Search::header'.
              `$Wais::maxdoc' may be set to modify the number of documents
              to retrieve.

  Wais::generate_retrieval_apdu

                     $apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
                     $apdu = Wais::generate_retrieval_apdu($database, $docid, 
                                                           $type, $chunk);

              Requests chunk number `$chunk' of the document whose ID is
              `$docid' (which must be of type `Wais::Docid'). $chunk defaults
              to `0'. $Wais::CHARS_PER_PAGE may be set to influence the
              chunk size.

  Wais::local_answer

                     $answer = Wais::local_answer($apdu);

              Answers the request by local search or retrieval. The message
              header is stripped from the result for convenience (see the
              code of `Wais::Search' and the documentation of
              Wais::Search::new below).

  Wais::Search::new

                     $result = Wais::Search::new($message);

              Turns the result message into an object of type `Wais::Search'.
              The following methods are available: diagnostics, header, and
              text. The values they return are much the same as for
              `Wais::Result'; only the tags are missing.

  Wais::Docid::new

                     $result = new Wais::Docid($distserver, $distdb, $distid,
                                   $copyright,  $origserver, $origdb, $origid);

              Only the first four arguments are mandatory.

  Wais::Docid::split

                     ($distserver, $distdb, $distid, $copyright, $origserver, 
                      $origdb, $origid) = Wais::Docid::split($result);
                     ($distserver, $distdb, $distid) = Wais::Docid::split($result);
                     ($distserver, $distdb, $distid) = $result->split;

              The inverse of `Wais::Docid::new'.

              Objects of type `Wais::Search' provide these methods:

              diagnostics
                        Returns an array of references to `[$code, $message]'.

              header    Returns an array of references to `[$score, $lines,
                        $length, $headline, $types, $docid]'.

              text      Returns the requested chunk of the document. For
                        documents larger than $Wais::CHARS_PER_PAGE more
                        than one request must be sent.

  Wais::Search::DESTROY

              The objects will be destroyed by Perl.

VARIABLES
              $Wais::version
                        Generated by: `sprintf(buf, "Wais %3.1f%d", VERSION,
                        PATCHLEVEL);'

              $Wais::errmsg
                        Set to a verbose error message if something went
                        wrong. Most functions return `undef' on failure
                        after setting `$Wais::errmsg'.

              $Wais::maxdoc
                        Maximum number of hits to return when searching.
                        Defaults to `40'.

              $Wais::CHARS_PER_PAGE
                        Maximum number of bytes to retrieve in a single
                        retrieve request. `Wais::Retrieve' sends multiple
                        requests if necessary to retrieve a document.
                        `CHARS_PER_PAGE' defaults to `4096'.

              $Wais::timeout
                        Number of seconds to wait for an answer from remote
                        servers. Defaults to `120'.

              $Wais::maxnumfd
                        Maximum number of file descriptors to use
                        simultaneously in `Wais::Search'. Defaults to `10'.

Access to the basic freeWAIS-sf reduction functions
    Wais::Type::stemmer(*word*)
              reduces *word* using the well-known Porter algorithm.

                AU: Porter, M.F.
                TI: An Algorithm for Suffix Stripping
                JT: Program
                VO: 14
                PP: 130-137
                PY: 1980
                PM: JUL

    Wais::Type::soundex(*word*)
              computes the 4-byte Soundex code for *word*.

                AU: Gadd, T.N.
                TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
                    Information Retrieval Systems
                JT: Program
                VO: 22
                NO: 3
                PP: 222-237
                PY: 1988

    Wais::Type::phonix(*word*)
              computes the 8-byte Phonix code for *word*.

                AU: Gadd, T.N.
                TI: PHONIX: The Algorithm
                JT: Program
                VO: 24
                NO: 4
                PP: 363-366
                PY: 1990
                PM: OCT

BUGS
              `Wais::Search' currently splits the requests into groups of
              `$Wais::maxnumfd' requests. Since some requests of a group
              might be local and/or some might refer to the same host/port,
              a group may not use all `$Wais::maxnumfd' possible file
              descriptors. Therefore some performance may be lost when more
              than `$Wais::maxnumfd' requests are processed.
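
              The grouping described above can be pictured with the
              following sketch. It is not the module's code: it only shows
              how a list of requests falls into fixed-size groups, using a
              local `$maxnumfd' variable and placeholder request names.

```perl
use strict;
use warnings;

my $maxnumfd = 10;                             # mirrors the documented default
my @requests = map { "request-$_" } 1 .. 23;   # placeholder requests

# Cut the request list into consecutive groups of at most $maxnumfd.
my @groups;
my @pending = @requests;
while (@pending) {
    push @groups, [ splice(@pending, 0, $maxnumfd) ];
}

printf "%d groups of sizes %s\n",
    scalar @groups, join(', ', map { scalar @$_ } @groups);
```

              The groups are fixed by position in the list, so a group
              whose requests are local or share a host/port would need
              fewer descriptors than the limit allows.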

AUTHORS
              Ulrich Pfeifer <pfeifer@ls6.cs.uni-dortmund.de>, Norbert
              Goevert <goevert@ls6.cs.uni-dortmund.de>

