Copyright 1988, 1989 Hans-J. Boehm, Alan J. Demers
This material may be freely distributed, provided this notice is retained.
This material is provided as is, with no warranty expressed or implied.
Use at your own risk.

This is version 1.6.

HISTORY -

  This collector was developed as a part of research projects supported in
part by the National Science Foundation and the Defense Advance Research
Projects Agency.  The SPARC specific code was contributed by Mark Weiser
(weiser.pa@xerox.com).  The Encore Multimax modifications were supplied by
Kevin Kenny (kenny@m.cs.uiuc.edu).  The adaptation to the RT is largely due
to Vernon Lee (scorpion@rice.edu), on machines made available by IBM.
The HP specific code and a number of good suggestions for improving the
generic code are due to Walter Underwood (wunder@hp-ses.sde.hp.com).
(Blame for misinstallation of those modifications goes to the first author,
however.) Some of the improvements incorporated in this version were
suggested by David Chase at Olivetti Research.

  This is intended to be a general purpose, garbage collecting storage
allocator.  The algorithms used are described in:

Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment",
Software Practice & Experience, September 1988, pp. 807-820.

  Many of the ideas underlying the collector have previously been explored
by others.  (We discovered recently that Doug McIlroy wrote a more or less
similar collector that is part of version 8 UNIX (tm).)  However none of this
work appears to have been widely disseminated.

  The tools for detecting storage leaks described in the above paper
are not included here.  There is some hope that they might be released
by Xerox in the future.


GENERAL DESCRIPTION

  Since the collector does not require pointers to be tagged, it does not
attempt to insure that all inaccessible storage is reclaimed.  However,
in our experience, it is typically more successful at reclaiming unused
memory than most C programs using explicit deallocation.

  In the following, an "object" is defined to be a region of memory allocated
by the routines described below.  

  Any objects not intended to be collected must be pointed to either
from other such accessible objects, or from the registers,
stack, data, or statically allocated bss segments.  It is usually assumed
that all such pointers point to the beginning of the object.  (This does
not disallow interior pointers; it simply requires that there must be a
pointer to the beginning of every accessible object, in addition to any
interior pointers.  Conditionally compiled code to check for pointers to the
interiors of objects is supplied.  As explained in "gc.h", this
may create other problems, however.)

  Note that pointers inside memory allocated by the standard "malloc" are not
seen by the garbage collector.  Thus objects pointed to only from such a
region may be prematurely deallocated.  It is thus suggested that the
standard "malloc" be used only for memory regions, such as I/O buffers, that
are guaranteed not to contain pointers.  Pointers in C language automatic,
static, or register variables, are correctly recognized.

  The collector does not understand SunOS 4.x dynamic libraries.  Space
allocated by the dynamic linker past at addresses higher than "_end" will not
be seen by the collector.  (We have not had a chance to track down exactly
what ends up there.  Some data does.  If we understood exactly where things
ended up, it would probably be easy to fix this problem.)  When in doubt,
use -Bstatic.

  The collector is designed to minimize stack growth if list-like structures
store the link in their first field; for example
 
  struct list_node {
	  struct list_node * link; /* first field */
	  ...
	  };
  
instead of

  struct list_node {
	  ...
	  struct list_node * link; /* last field */
	  };

  This should not matter for lists that are less than tens of thousands
of elements long.

  Signal processing for most signals is deferred during collection. (The
necessary calls to sigsetmask may need to be commented out under a pure
system V implementation, since there does not seem to be an equivalent
call.  Multiple calls to signal are likely to be slow.)

INSTALLATION AND PORTABILITY

  As distributed, the collector produces garbage collection statistics
during every collection.  Once the collector is known to operate properly,
these can be suppressed by defining the macro SILENT at the top
of "gc.h".  (The given statistics exhibit a few peculiarities.
Things don't appear to add up for a variety of reasons, most notably
fragmentation losses.  These are probably much more significant for the
contrived program "test.c" than for your application.)

  Note that typing "make test" will automatically compare the output
of the test program against the correct output.  This does require that
collection statistics have been disabled.

  The Makefile will generate a library gc.a which you should link against.
It is suggested that if you need to replace a piece of the collector
(e.g. mark_roots.c) you simply list your version ahead of gc.a on the
ld command line, rather than replacing the one in gc.a.

  The collector currently is designed to run essentially unmodified on
the following machines:

	    Sun 3
	    Sun 4  (except under some versions of 3.2)
	    Vax under Berkeley UNIX
	    Sequent Symmetry  (no concurrency)
	    Encore Multimax   (no concurrency)
	    MIPS M/120 (and presumably M/2000) (RISC/os 4.0 with BSD libraries)
	    IBM PC/RT  (Berkeley UNIX)
	    HP9000/300

  For these machines you should check the beginning of gc.h
to verify that the machine type is correctly defined.  On an Encore Multimax,
MIPS M/120, or a PC/RT, you will also need to make changes to the
Makefile, as described by comments there.

  In all cases we assume that pointer alignment is consistent with that
enforced by the standard C compilers.  If you use a nonstandard compiler
you may have to adjust the alignment parameters defined in gc.h.

  On a MIPS machine or PC/RT, we assume that no calls to sbrk occur during a
collection. (This is necessary due to the way stack expansion works on these
machines.) This may become false if certain kinds of I/O calls are inserted
into the collector.

  For machines not already mentioned, or for nonstandard compilers, the
following are likely to require change:

1.  The parameters at the top of gc.h and the definition of
    TMP_POINTER_MASK further down in the same file.

2.  mach_dep.c.
      The most important routine here is one to mark from registers.
    The distributed file includes a generic hack (based on setjmp) that
    happens to work on many machines, and may work on yours.  Try
    compiling and running setjmp_test.c to see whether it has a chance of
    working.  (This is not correct C, so don't blame your compiler if it
    doesn't work.  Based on limited experience, register window machines
    are likely to cause trouble.  If your version of setjmp claims that
    all accessible variables, including registers, have the value they
    had at the time of the longjmp, it also will not work.  Vanilla 4.2 BSD
    makes such a claim.  SunOS does not.)
      This file also contains interface routines that save registers
    not normally preserved by the C compiler.  These are intended for
    a fast assembly language interface to the allocator, such as the
    one that is used by the Russell compiler.  (These routines work
    only for small objects.  A call to one of these routines ensures
    that the free list for a particular object size is nonempty.  Normally
    in-line code would call these routines only after finding an empty free
    list for an about-to-be-allocated object size.)  If a pure C interface
    is used, these routines are not needed.
      If your machine does not allow in-line assembly code, or if you prefer
    not to use such a facility, mach_dep.c may be replaced by a .s file
    (as we did for the MIPS machine and the PC/RT).

3.  mark_roots.c.
      These are the top level mark routines that determine which sections
    of memory the collector should mark from.  This is normally not
    architecture specific (aside from the macros defined in gc.h and
    referenced here), but it can be programming language and compiler
    specific.  The supplied routine should work for most C compilers
    running under UNIX.

4.  The sigsetmask call does not appear to exist under system V UNIX.
    It is used by the collector to block and unblock signals at times at
    which an asynchronous allocation inside a signal handler could not
    be tolerated.  Under system V, it is possible to remove these calls,
    provided no storage allocation is done by signal handlers.  The
    alternative is to issue a sequence of system V system calls, one per
    signal that is actually used.  This may be a bit slow.

  For a different versions of Berkeley UN*X or different machines using the
Motorola 68000, Vax, SPARC, 80386, NS 32000, PC/RT, or MIPS architecture,
it should frequently suffice to change definitions in gc.h.


THE C INTERFACE TO THE ALLOCATOR

  The following routines are intended to be directly called by the user.
Note that only gc_malloc and gc_init are necessary.  Gc_realloc is provided
for applications that already use realloc.  The remaining routines are used
solely to enhance performance.  It is suggested that they be used only after
initial debugging.

1)  gc_init()
    - called once before allocation to initialize the collector.

2)  gc_malloc(nbytes)
    - allocate an object of size nbytes.  Unlike malloc, the object is
      cleared before being returned to the user.  (For even better performance,
      it may help to expand the relevant part of gc_malloc in line.
      This is done by the Russell compiler, for example.)  Gc_malloc will
      invoke the garbage collector when it determines this to be appropriate.
      (A number of previous collector bugs resulted in objects not getting
      completely cleared.  We claim these are all fixed.  But if you encounter
      problems, this is a likely source to check for.  The collector tries
      hard to avoid clearing any words that it doesn't have to.  Thus this
      is a bit subtle.)  Gc_malloc fails (generates a segmentation fault)
      if it is called with a 0 argument.

3)  gc_malloc_atomic(nbytes)
    - allocate an object of size nbytes that is guaranteed not to contain any
      pointers.  The returned object is not guaranteed to be cleeared.
      (Can always be replaced by gc_malloc, but results in faster collection
      times.  The collector will probably run faster if large character
      arrays, etc. are allocated with gc_malloc_atomic than if they are
      statically allocated.)

4)  gc_realloc(object, new_size)
    - change the size of object to be new_size.  Returns a pointer to the
      new object, which may, or may not, be the same as the pointer to
      the old object.  The new object is taken to be atomic iff the old one
      was.  If the new object is composite and larger than the original object,
      then the newly added bytes are cleared (we hope).  This is very likely
      to allocate a new object, unless MERGE_SIZES is defined in gc.h.
      Even then, it is likely to recycle the old object only if the object
      is grown in small additive increments (which, we claim, is generally bad
      coding practice.)

5)  gc_free(object)
    - explicitly deallocate an object returned by gc_malloc or
      gc_malloc_atomic.  Not necessary, but can be used to minimize
      collections if performance is critical.

6)  expand_hp(number_of_4K_blocks)
    - Explicitly increase the heap size.  (This is normally done automatically
      if a garbage collection failed to reclaim enough memory.  Explicit
      calls to expand_hp may prevent unnecessarily frequent collections at
      program startup.)

  The global variable dont_gc can be set to a non-zero value to inhibit
collections, e.g. during a time-critical section of code.  (This may cause
otherwise unnecessary expansion of the process' memory.)

  The variable non_gc_bytes, which is normally 0, may be changed to reflect
the amount of memory allocated by the above routines that should not be
considered as a candidate for collection.  Collections are inhibited
if this exceeds a given fraction (currently 3/4) of the total heap size.
The heap is simply expanded instead.  Careless use may, of course, result
in excessive memory consumption.

  Some additional tuning is possible through the parameters defined
near the top of gc.h.
  
  The two gc_malloc routines may be declared to return a suitable pointer
type.  It is not intended that gc.h be included by the user program.
If only gc_malloc is intended to be used, it might be appropriate to define:

#define malloc(n) gc_malloc(n)
#define calloc(m,n) gc_malloc((m)*(n))

  More complete emulations of the standard C allocation routines are
contained and described in "interface.c" (contributed by David Chase).

  No attempt is made to use obscure names for garbage collector routines
and data structures.  Name conflicts are possible.  (Running "nm gc.a"
should identify names to be avoided.)


ASSEMBLY LANGUAGE INTERFACE

  There is a provision for a very fast assembly language and/or in-line
C interface.  See the beginning comments in alloc.c.  On some architectures,
additional code must be supplied near the beginning of mach_dep.c for
this to work.  Using an assembly language interface, and partially
expanding the allocation code in-line, most allocations will take on the
order of 4 or 5 instructions each.  (Explicit deallocations can be kept
down to something similar if the object is atomic and of known size.
Note that in-line deallocation code for composite objects should clear
the object before returning it to the appropriate free list.)


BUGS

  Recently fixed bugs:

  Version 1.3 and immediately preceding versions contained spurious
assembly language assignments to TMP_SP.  Only the assignment in the PC/RT
code is necessary.  On other machines, with certain compiler options,
the assignments can lead to an unsaved register being overwritten.
Known to cause problems under SunOS 3.5 WITHOUT the -O option.  (With
-O the compiler recognizes it as dead code.  It probably shouldn't,
but that's another story.)

  Version 1.4 and earlier versions used compile time determined values
for the stack base.  This no longer works on Sun 3s, since Sun 3/80s use
a different stack base.  We now use a straightforward heuristic on all
machines on which it is known to work (incl. Sun 3s) and compile-time
determined values for the rest.  There should really be library calls
to determine such values.

  Version 1.5 and earlier did not ensure 8 byte alignment for objects
allocated on a sparc based machine.

  Please address bug reports to boehm@rice.edu.  If you are contemplating
a major addition, you might also send mail to ask whether it's already
been done.

