Existing bugs/problems (some may have been solved accidentally):

The cache returns non-standard error codes.  (In fact, we return 200 even
when reporting an error.)

Half-loaded documents are possible (says James Beal).  Fix it with an
interrupt handler.

The d directive introduces the possibility of cache servers looping.
No measures are taken against this.

Core dumps ('malformed header from script' in NCSA server) upon leaving
out the CacheServerMaintainer parameter.  A bug in fix_server_maintainer()
no doubt.  (This bug may have been fixed?)

NCSA httpd 1.1 leaves scripts hanging as zombie processes.  Loads of
'nph-cache' processes can accumulate over time.  There is no attempt
yet to fix this.  (Wrong BSD setting, or an independent problem?)
(Maybe this isn't a problem with Lagoon itself?  Can't reproduce it.)

No locking yet, so simultaneous access on the same cached document is bound to
give problems.
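
A minimal sketch of what locking could look like, assuming the platform
offers POSIX fcntl() record locks.  The function cache_lock() is hypothetical
and not part of Lagoon.  It takes an advisory lock on the whole file, so it
only helps if every process that touches the cache uses it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take or release an advisory lock on a whole cache
 * file.  type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK.
 * Blocks until the lock is granted.  Returns 0 on success, -1 on error
 * (including interruption by a signal, with errno set to EINTR). */
int cache_lock(int fd, short type)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}
```

A writer would call cache_lock(fd, F_WRLCK) before filling the cache file and
cache_lock(fd, F_UNLCK) afterwards; readers would use F_RDLCK, so several of
them can read the same document at once.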

Sometimes, a page is partly duplicated in the cache.  I can't consistently
reproduce this behavior.

getdomainname() does not exist on all systems - find a replacement.
(related to the core dump above)

Error messages do not mention the server maintainer's address.  (Fixed?)

Filenames grow too long, especially for form-based queries.
Lagoon requires a filesystem that supports filenames as long as URLs
(120 characters are common).  There is no safeguard against this.
Turning off query caching helps, but doesn't solve the problem.
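
One possible safeguard, sketched here as an assumption rather than as
anything Lagoon does, is to name cache files after a fixed-length hash of the
URL.  The functions url_hash() and url_to_cache_name() are hypothetical; the
hash used is the 32-bit FNV-1a function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: 32-bit FNV-1a hash of a URL. */
unsigned long url_hash(const char *url)
{
    unsigned long h = 2166136261UL;             /* FNV offset basis */
    const unsigned char *p;

    assert(url != NULL);
    for (p = (const unsigned char *)url; *p != '\0'; p++) {
        h ^= *p;
        h = (h * 16777619UL) & 0xffffffffUL;    /* FNV prime, keep 32 bits */
    }
    return h;
}

/* Hypothetical helper: write an 8-digit hex filename for url into out.
 * out must hold at least 9 bytes (8 digits plus the terminating NUL). */
void url_to_cache_name(const char *url, char *out, size_t outlen)
{
    assert(out != NULL && outlen >= 9);
    snprintf(out, outlen, "%08lx", url_hash(url));
}
```

Different URLs can hash to the same name, so the full URL would still have to
be stored with the cached document and compared on every lookup.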

Dates are written to cache headers incorrectly.
(Some improvement has been made.)

A list of pages for which things may still go wrong, for at least partly
unidentified reasons:

http://web.cnam.fr/Images/Usenet/abpm/summaries/

Inline images (ISMAPs) don't display in
  the old Teletext interface;
  Brandon Plewe's Europe map (is a redirection, I think)

In
  http://www.dik.maschinenbau.th-darmstadt.de/demos/simple/mjackson.mpg
the MPEG is truncated (this occurred on theatre), maybe a timeout problem?

To be done:

Implement If-modified-since (see HTTP spec and
  http://www.ics.uci.edu/WebSoft/caching.html).
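
The header value has to be a date in GMT.  A minimal sketch of producing one
in the RFC 1123 style that HTTP prefers, using only the standard C library;
the function http_date() is hypothetical and not part of Lagoon.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format t as an HTTP date in GMT, suitable for an
 * If-Modified-Since header.  Returns the number of characters written
 * (not counting the NUL), or 0 if the buffer is too small. */
size_t http_date(time_t t, char *buf, size_t buflen)
{
    struct tm *gmt = gmtime(&t);

    assert(buf != NULL);
    if (gmt == NULL)
        return 0;
    /* Day and month names come out in English as long as the program
     * stays in the default "C" locale. */
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", gmt);
}
```

For example, the time value 784111777 is formatted as
"Sun, 06 Nov 1994 08:49:37 GMT".  The cache would send the modification time
of the cached copy this way and refetch only if the remote server does not
answer 304.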

Process headers more intelligently: store them in a separate file or, where
dbm is available, in a database.

Speed up processing of documents that don't need to be cached.

Add a query facility that enables users to find out in what way they are
accessing Lagoon!

Protect against recursive cache entries.

Use a separate configuration file for cache cleanout and make it accept
more criteria.

Change http_accept[] in mime.c to have a normal length.

Improve portability e.g. the string.h / strings.h thing.
strcasecmp() is in strings.h (on my machine), strdup() in string.h.
Another example: fclose() is in /usr/5include/stdio.h, not /usr/include's.
ndbm.h is in /usr/ucbinclude on at least one machine.

Allow cache_prefix to be specified in http://host:port/prefix/ format.

Time stuff in MIME headers; and use them in refreshing the cache.

Add the option to use unescaped URLs.

See if default paths can be obtained from ServerRoot in some way.

Accept more HTTP methods (HEAD).

Whitespace is perhaps not skipped in all cases in rel_to_abs(), which may
result in some untranslated links or strange behavior in general.
It also screws up on nested tags and escaped <>s.

Implement a simple database on the files.

Use the libwww HTML parsing library.

Use the most recent NCSA httpd server code for util.h and mime.h etc.
(using 1.1 at present)

Be somewhat smarter about identical documents with different URLs.

Be a better client - for one thing, determine the type of untyped data from
the filename extensions of the URLs by which they are called.
Accept returns without MIME headers, fill in defaults.
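
A minimal sketch of guessing a type from the extension.  Everything in it is
an assumption made for illustration: the function guess_type(), the small
table of extensions, and the choice of text/html as the fallback default.

```c
#include <assert.h>
#include <string.h>

/* Assumption: fallback type for data we cannot classify. */
#define DEFAULT_TYPE "text/html"

struct ext_type {
    const char *ext;
    const char *type;
};

/* Hypothetical table; a real one would be read from mime.types. */
static const struct ext_type ext_types[] = {
    { "html", "text/html" },
    { "txt",  "text/plain" },
    { "gif",  "image/gif" },
    { "jpeg", "image/jpeg" },
    { "jpg",  "image/jpeg" },
    { "mpg",  "video/mpeg" },
    { "ps",   "application/postscript" }
};

/* Hypothetical helper: guess a MIME type from the extension of the last
 * path component of url.  The comparison is case-sensitive, and query
 * strings and fragments are not stripped. */
const char *guess_type(const char *url)
{
    const char *slash, *dot;
    size_t i;

    assert(url != NULL);
    slash = strrchr(url, '/');
    dot = strrchr(slash != NULL ? slash : url, '.');
    if (dot != NULL) {
        for (i = 0; i < sizeof ext_types / sizeof ext_types[0]; i++)
            if (strcmp(dot + 1, ext_types[i].ext) == 0)
                return ext_types[i].type;
    }
    return DEFAULT_TYPE;
}
```

The same fallback could serve for replies that arrive without MIME headers
at all.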

Have a better mechanism to determine what to cache, and what links to
translate.  I think we wish to translate iff we wish to cache.
Possible categorizations: http vs. other protocols; by file extension;
query/non-query; certain types of query (e.g. without commas ...).

Refreshment also with config file based on MIME type and size.

Improve error messages - error codes and msgs to log, not to end user!

Some of the thousands of extra features possible thanks to the cache:

Back links.
Automatic merging of document trees.
Topological analysis (point out overview nodes, etc.)
Salton-like link creation.
Bidirectional links.
Annotation mechanisms.


Bug fixes and (some) feature enhancements (don't read this.  You don't
want to know):

From 0.1 to 0.2:

Caching didn't look at query prefixes - consequently, subsequent queries on
the same script returned an incorrect result.

FORM ACTION=URLs are now recognized in translation

There was a bug in host parsing that resulted in an extra leading slash
sometimes being sent to the remote server.

From 0.2 to 0.3:

Repaired a small bug that blocked translation of lowercase refs.
It is configurable whether or not query results are cached.

Working on a small dbm database for storing files - but it doesn't work yet.
Some further changes in where things are put (config.c, constants.h).
Removed the cache subdirectory - everything is now in ./src/.
Separate INSTALL script, courtesy of debra.
Using standard path /usr/local/etc/httpd in configuration parameters.
Some bugs regarding configuration were removed.

From 0.3 to 0.4:

Parameters are now parsed into global variables in config.h.

A bug in strange relative URLs was removed:
//www.win.tue.nl/wwwgate/phf used to be translated to
http://wsinis10:4322/mirror/http://www.win.tue.nl//www.win.tue.nl/wwwgate/phf
Is this a legal relative URL with this interpretation?
I fixed it anyway.
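
For what it is worth, the relative URL specification (RFC 1808) does treat a
reference starting with "//" as a network path that inherits only the scheme
of the base document.  A minimal sketch of that rule follows; the function
resolve_net_path() is hypothetical and is not the code used in Lagoon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: resolve a "//host/path" reference against the
 * scheme of base.  Returns 0 and fills out on success; returns -1 if ref
 * is not a network path, base has no scheme, or out is too small. */
int resolve_net_path(const char *base, const char *ref,
                     char *out, size_t outlen)
{
    const char *colon;
    size_t scheme_len;

    assert(base != NULL && ref != NULL && out != NULL);
    if (ref[0] != '/' || ref[1] != '/')
        return -1;
    colon = strchr(base, ':');
    if (colon == NULL)
        return -1;
    scheme_len = (size_t)(colon - base) + 1;    /* include the ':' */
    if (scheme_len + strlen(ref) + 1 > outlen)
        return -1;
    memcpy(out, base, scheme_len);
    strcpy(out + scheme_len, ref);
    return 0;
}
```

Given a base document on http://www.win.tue.nl/, the reference
//www.win.tue.nl/wwwgate/phf resolves to http://www.win.tue.nl/wwwgate/phf,
which the cache can then prefix in the usual way.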

Impatient read() replaced with readn() (the W3 Search Page failed to return
anything but headers).
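
For readers unfamiliar with the idiom: a single read() on a socket or pipe
may return fewer bytes than requested, so a patient reader loops until it has
them all.  The sketch below follows the well-known readn() pattern from
Stevens' network programming books; it is named readn_sketch() to make clear
that it is an illustration and not necessarily the code in Lagoon.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read up to n bytes from fd, retrying short reads until n bytes have
 * arrived, end of file is reached, or an error occurs.  Returns the
 * number of bytes read (less than n only at end of file), or -1 on error. */
ssize_t readn_sketch(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;
    ssize_t r;

    assert(buf != NULL);
    while (left > 0) {
        r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        if (r == 0)
            break;              /* end of file */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

A server that sends its headers and body in separate packets is exactly the
case where one read() sees only the headers.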

Removed a bug causing host lookups to fail.

Added the option to not escape URLs in the translation.

Made print_error() return document titles of the form "%d Caching Error",
which the Mosaic fish search uses.

From 0.4 to 0.5:

Fixed a problem that made the cache impossible to use from a machine
in a different Internet domain.  In case the server and script run in
different Internet domains, the problem remains.  The real problem is in the
$SERVER_NAME variable, which, used with NCSA httpd 1.1 at least, does not
necessarily provide domain information.

Name change: print_error() is now die().

The configuration variable translate_escaped was not set correctly;
more variables were added.

On most machines but my own, the cache didn't work at all due to overflowing
strings.  This should be over now.

A new version of the mime.c code was picked and adapted from NCSA httpd 1.1.

print_error() was renamed to what it does, die().

From 0.5 to 0.6:

Removed a nasty bug in html.c that caused the script to crash on certain
Solaris machines.

A simple cache refreshment scheme was added.

From 0.6 to 0.7:

Nothing much - just a bug fix.

From 0.7 to 0.8:

Merged proto.h into system.h

Support for proxy added.

Somewhat more sophisticated cache refreshing and a separate program to
clean out the cache - not very interesting.

Some directories are checked for existence.

From 0.8 to 0.9:

Better refresh and cleanup control with separate configuration file.
Separate cleanup program finished to keep cache empty (use from crontab).

From 0.9 to 0.10:

More efficient output of cached documents.
Identifying headers are sent to the remote server.
Renaming of configuration variables.
Bug fixes in string handling.
Bug fix in # URL anchor handling (reported by ngs@ukc.ac.uk).
Bug fix for whitespace skipping in html.c (contributed by slshen@lbl.gov).

From 0.10 to 0.11:

A correct cgi output header is produced.
A better request header is produced (we now get inline images from CERN httpd).
Support for a d directive to use remote cache servers and/or HTTP gateways
(thanks to ngs@ukc.ac.uk for help with design).
Redirections now work (and error handling should be better, too).

From 0.11 to 0.12:

Redirections now work even for the mirror method.
Cache files are now world readable.
Some error messages used to screw up - fixed.

From 0.12 to 0.13:

cache.c was rewritten - the major functional improvement is the possibility
of sending documents to stdout without caching them first.  This also gives
improved handling of queries with query result caching off.

new configuration directive CacheOnlyAfter to turn off caching for certain docs

POST method implemented
POSTed requests are *never* cached

bugs introduced in 0.12 were removed (some reported by Michael Pellmann)

t directive implemented to complement d directive for 'mirror' use
