How to Make a Collection - A Quick Introduction

                                 Cristian Francu
                              francu@cs.rutgers.edu
                                  Jan 12, 2000

First, go to the directory where you installed GSDL. To make sure you
can run the GSDL perl scripts, source either setup.bash or setup.csh,
depending on the shell you use:

source setup.bash          (if you use bash)

source setup.csh           (if you use csh or tcsh)

These scripts set the variables GSDLHOME, GSDLOS and PATH. Of course,
you can add the corresponding command to your .cshrc or .profile so
that the variables are set automatically.
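After sourcing one of the scripts, you can check that the variables are
actually set before going further. The function below is a hypothetical
helper, not part of GSDL, written in POSIX sh (run it from an
sh-compatible shell if you use csh). Besides GSDLHOME and GSDLOS, it
assumes that $GSDLHOME/bin/script is the directory the setup scripts put
on PATH, since the scripts in this tutorial are run by name.

```shell
# Hypothetical helper (not part of GSDL): check that the variables the
# setup scripts are expected to set are present in the current shell.
check_gsdl_env() {
    rc=0
    if [ -z "${GSDLHOME:-}" ]; then
        echo "GSDLHOME is not set" >&2
        rc=1
    fi
    if [ -z "${GSDLOS:-}" ]; then
        echo "GSDLOS is not set" >&2
        rc=1
    fi
    # Assumption: the setup scripts add $GSDLHOME/bin/script to PATH,
    # so that mkcol.pl, import.pl and buildcol.pl can be run by name.
    case ":$PATH:" in
        *":${GSDLHOME:-}/bin/script:"*) ;;
        *)
            echo "\$GSDLHOME/bin/script is not on PATH" >&2
            rc=1
            ;;
    esac
    return "$rc"
}

# Usage, after "source setup.bash":
#   check_gsdl_env && echo "GSDL environment looks OK"
```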

Next, run mkcol.pl to create the collection. This perl script sets up
the environment the collection needs, such as its directories and its
configuration file, collect.cfg. mkcol.pl is located in the directory

bin/script

This directory contains all the scripts that you'll need, so it's a
good idea to peek at it.

If you run mkcol.pl without arguments, it tells you how to use it:

$ mkcol.pl

  usage: mkcol.pl [options] collection-name

  options:
   -creator email      Your email address
   -maintainer email   The current maintainer's email address
   -public true|false  If this collection has anonymous access
   -beta true|false    If this collection is still under development
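For example, a call that sets every option might look like the sketch
below. new_collection is a hypothetical wrapper, the email address is a
placeholder, and the check afterwards assumes that the collect/
directory mentioned below lives under $GSDLHOME. Only the four options
shown in the usage message are used.

```shell
# Hypothetical wrapper around mkcol.pl: create a collection, then check
# that its collect.cfg was produced. Uses only the options mkcol.pl lists.
new_collection() {
    name=$1    # collection name, e.g. tutorial
    email=$2   # your email address (placeholder)
    mkcol.pl -creator "$email" -maintainer "$email" \
             -public true -beta true "$name" || return 1
    # Assumption: collections are created under $GSDLHOME/collect/<name>.
    if [ ! -f "$GSDLHOME/collect/$name/etc/collect.cfg" ]; then
        echo "collect.cfg not found for $name" >&2
        return 1
    fi
}

# Usage:
#   new_collection tutorial you@your.domain.edu
```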

After running mkcol.pl, the collection resides in
collect/<collection-name>. Next, edit the file

collect/<collection-name>/etc/collect.cfg

You should make at least two changes. The first is to add a line like
this:

collectionmeta iconcollection    "http://sequence.rutgers.edu/~gsdl/collect/cstr/images/cstr.jpg"

This line sets the icon of the collection (the image that users click
to access the collection once it is online). Make sure you put the
full URL of the image between the quotes. It is best to do this now,
because changing the icon later means rebuilding the collection, which
is time consuming. (Hey, gurus: is there a simpler way to change the
icon of a collection once it has been built?)
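If you prefer to script this edit, the sketch below writes the line for
you. set_collection_icon is a hypothetical helper; it drops any existing
iconcollection line before appending the new one, so running it twice
leaves a single entry. As noted above, the icon only takes effect once
the collection is built.

```shell
# Hypothetical helper: set the collection icon in collect.cfg, replacing
# any iconcollection line that is already there.
set_collection_icon() {
    cfg=$1   # e.g. collect/tutorial/etc/collect.cfg
    url=$2   # full URL of the icon image
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    # Keep every line except an existing iconcollection entry.
    grep -v '^collectionmeta[[:space:]][[:space:]]*iconcollection' "$cfg" > "$cfg.tmp"
    printf 'collectionmeta iconcollection    "%s"\n' "$url" >> "$cfg.tmp"
    mv "$cfg.tmp" "$cfg"
}

# Usage (placeholder URL):
#   set_collection_icon collect/tutorial/etc/collect.cfg \
#       "http://hostname.domain.edu/images/tutorial.jpg"
```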

The second change is to make sure the plugin lines in collect.cfg list
the plugins your documents need. These lines read:

plugin        GMLPlug 
plugin        TEXTPlug 
plugin        ArcPlug 
plugin        RecPlug

The plugins you need depend on the format of your documents. If the
documents are plain text or in GSDL's own format, GML, you don't need
to change anything. If your documents are in other formats, look for a
suitable plugin in the directory

perllib/plugins

A very useful plugin is HTMLPlug, which processes files with the .html
and .htm extensions. You would normally replace the TEXTPlug plugin
with the one you want to use. Say your collection is in HTML format;
then you would change the plugin lines to:

plugin       GMLPlug 
plugin       HTMLPlug
plugin       ArcPlug 
plugin       RecPlug
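The same change can be made with sed. swap_plugin is a hypothetical
helper that rewrites only lines beginning with "plugin" whose plugin
name matches exactly, keeping the original spacing and anything after
the name. It writes to a temporary file first because sed -i is not
portable across systems.

```shell
# Hypothetical helper: replace plugin OLD with plugin NEW on the "plugin"
# lines of collect.cfg, keeping spacing and anything after the name.
# Assumes plugin names are plain identifiers (no sed special characters).
swap_plugin() {
    cfg=$1 old=$2 new=$3
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    # First expression: name at end of line. Second: name followed by
    # whitespace (trailing spaces or plugin options).
    sed -e "s/^\(plugin[[:space:]][[:space:]]*\)$old\$/\1$new/" \
        -e "s/^\(plugin[[:space:]][[:space:]]*\)$old\([[:space:]]\)/\1$new\2/" \
        "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Usage:
#   swap_plugin collect/tutorial/etc/collect.cfg TEXTPlug HTMLPlug
```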

That's it for collect.cfg. Suppose you are creating a collection named
"tutorial". Next, go to the directory collect/tutorial and create two
directories, import and archives:

cd collect/tutorial
mkdir import
mkdir archives

The material to be indexed must reside in the 'import' directory. You
can either copy it there or create links to the directories that hold
it. The material can be organized in directories and subdirectories:
the processing scripts descend into them recursively and look for files
to index. This recursion is what the RecPlug plugin does.
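Because the plugins you need depend on the formats present, it can
help to see what a recursive walk of 'import' will find. The sketch
below approximates that walk with find, following symbolic links (-L)
since you may have linked material into import, and counts files by
extension. count_by_extension is hypothetical, and it does not
reproduce the matching rules of RecPlug or any other plugin.

```shell
# Hypothetical helper: count files under a directory by (lower-cased)
# extension. It walks subdirectories recursively and follows symbolic
# links (-L). Files without an extension are not counted.
count_by_extension() {
    find -L "$1" -type f |
        sed -n 's/.*\.\([^./]*\)$/\1/p' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c
}

# Usage:
#   count_by_extension collect/tutorial/import
```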

Once the documents to be indexed are in the import directory, you are
ready to run the processing scripts. The fastest way to build a
collection is in two steps:

1. Process the documents in the 'import' directory and generate their
equivalents in .gml format in the 'archives' directory.

2. Process the documents in the 'archives' directory (now in .gml
format) and create the necessary indexes in the 'building' directory.
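The two steps can be chained so that the second only runs if the first
succeeds. build_collection is a hypothetical helper; it assumes that
import.pl and buildcol.pl are on PATH (after sourcing setup.bash or
setup.csh) and take the collection name as their argument, as in the
commands that follow.

```shell
# Hypothetical helper: run both processing steps for one collection,
# skipping the second step if the first one fails.
build_collection() {
    name=$1
    import.pl "$name" || { echo "import.pl failed for $name" >&2; return 1; }
    buildcol.pl "$name" || { echo "buildcol.pl failed for $name" >&2; return 1; }
}

# Usage:
#   build_collection tutorial
```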

For the first step just run the script import.pl:

import.pl tutorial

Depending on the size of your documents, this can take anywhere from
minutes to hours. You may also want to redirect stdout and stderr to
files to capture any errors. You can also change the verbosity of the
script; run it without arguments to get a complete list of options.
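The plain form of the redirection is
import.pl tutorial > import.out 2> import.err
The hypothetical run_logged helper below wraps the same redirection and
also reports the exit status, so a failed run is easy to spot after a
long wait. The log file names are placeholders.

```shell
# Hypothetical helper: run a command with stdout in LOG.out and stderr
# in LOG.err, then report the exit status and the log locations.
run_logged() {
    log=$1
    shift
    "$@" > "$log.out" 2> "$log.err"
    rc=$?
    echo "exit status $rc; output in $log.out, errors in $log.err"
    return "$rc"
}

# Usage:
#   run_logged import-tutorial import.pl tutorial
```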

For the second step run the script buildcol.pl:

buildcol.pl tutorial

Again, depending on how much material is processed, this may take
minutes to hours. Make sure you have enough space on your hard drive
for both steps, since the .gml documents take up about as much space
as the original documents.
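A rough pre-flight check is sketched below. enough_space_for_archives
is a hypothetical helper that compares the size of 'import' (counting
linked material, via du -L) with the free space on the file system
holding the collection. It only covers the .gml copies, since no
estimate is given here for the indexes, so leave a generous margin
beyond what it reports.

```shell
# Hypothetical helper: rough check that the file system holding a
# collection has room for the .gml copies in 'archives', taken to be
# about the size of 'import'. Space for the indexes is not estimated.
enough_space_for_archives() {
    coll=$1   # e.g. collect/tutorial
    # du -L follows symbolic links, so linked material is counted too.
    need=$(du -L -s -k "$coll/import" | awk '{print $1}')
    # df -P prints one line per file system; field 4 is the free space.
    avail=$(df -P -k "$coll" | awk 'NR == 2 {print $4}')
    echo "import uses ${need} KB; ${avail} KB free"
    [ "$avail" -gt "$need" ]
}

# Usage:
#   enough_space_for_archives collect/tutorial || echo "low on disk space"
```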

If everything went well, you should now have a directory named
'building' under collect/tutorial. That directory contains the results
of processing your documents. To use them, move the contents of the
'building' directory into a new directory named 'index'. First create
it:

cd collect/tutorial
mkdir index

Then move the contents:

mv building/* index
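When you rebuild a collection later, 'index' already holds the previous
results, and moving a subdirectory of 'building' onto an existing,
non-empty one in 'index' fails. One way to handle both the first build
and rebuilds, sketched below as a hypothetical helper rather than a
GSDL script, is to set the old index aside as index.old and rename
'building' to 'index'.

```shell
# Hypothetical helper: make freshly built indexes live by turning
# 'building' into 'index'. A previous 'index' is kept as 'index.old'
# rather than deleted, so it can be restored if something went wrong.
# The body runs in a subshell, so the cd does not affect the caller.
install_index() (
    cd "$1" || exit 1          # e.g. collect/tutorial
    [ -d building ] || { echo "no building directory" >&2; exit 1; }
    rm -rf index.old
    if [ -d index ]; then
        mv index index.old || exit 1
    fi
    mv building index
)

# Usage:
#   install_index collect/tutorial
```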

As long as your collect.cfg file contains the line

public  true

and the collection built successfully, the GSDL software should notice
your new collection automatically. The collection should then appear
on the main page, which can be accessed at:

http://hostname.domain.edu/cgi-bin/library?a=p&p=home

(Replace hostname.domain.edu with the name of your server.)
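If the collection does not show up, the public line is a quick thing to
rule out. The hypothetical is_public helper below succeeds only when
collect.cfg has a line setting public to true.

```shell
# Hypothetical helper: succeed only if collect.cfg contains a line
# "public true" (any amount of whitespace between the two words).
is_public() {
    grep -q '^[[:space:]]*public[[:space:]][[:space:]]*true[[:space:]]*$' "$1"
}

# Usage:
#   is_public collect/tutorial/etc/collect.cfg || echo "collection is not public"
```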

Keep in mind that these instructions are just a jump start to get you
up and running quickly. The scripts have more options, and you can
explore more of GSDL by reading its documentation. You can also email
the creators for further details.
