THIS CODE WAS OBTAINED FROM THE ARCHIVES AT pprg.unm.edu, AND WAS
MODIFIED TO BE DOUBLE PRECISION.

           Scott Wilson
           Center for High Technology Materials
           Albuquerque, NM 87131
           (505)277-0780
           swilson@chtm.unm.edu

The author of these routines was Richard Krukar, University of New Mexico,
who wrote a program that writes heavily inlined FFT routines. All butterflies
of size <= 128 are done FULLY UNROLLED. Butterflies larger than that are done
by recursively calling ones of size 128 or smaller. This whole code is a gamble
on the direction processor architectures are heading: 1) longer pipes,
2) RISC, 3) memory systems that can feed their CPUs at full speed.

The following is the README that came with the code:
===========================================================================

This is an initial release of some heavily inlined fft routines.  I
do not consider them finished, but there was enough demand that these
ones will be released.  Eventually, they will be replaced by some
functionally equivalent routines that should be faster and require
a little less memory.  These routines can handle at most 4096 complex
points or 8192 real points.  The restriction is due to the size of the
weight tables.  A little more smarts in the routines and new weight
tables will be computed as needed.  Support for any size will also be
added soon.  Here's what there is:

	fft( x, length)
	ifft( x, length)

    These are the complex forward and inverse fft routines.  They expect x to
be an array of length "struct complex { double r,i; };" numbers.

	rfft( x, length)
	rifft( x, length)

    These are the real number versions of the above routines.  Same as the
above, but the array is of type double. NOTICE: On return the Nyquist component
is stored in the imaginary part of the DC component.  I think Numerical
Recipes discusses this.
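
    The packing described above can be undone with a few lines.  This is only
a sketch: unpack_rfft is a made-up helper, not one of the released routines,
and it assumes the n doubles come back as n/2 interleaved (r,i) pairs with
x[0] = DC and x[1] = Nyquist, which is the Numerical Recipes style layout.
Check that against the actual output before relying on it.

```c
#include <stddef.h>

struct complex { double r, i; };

/*
 * Hypothetical helper, not part of the released routines.
 * ASSUMPTION: after rfft(), x holds n doubles as n/2 interleaved (r,i)
 * pairs, with x[0] = DC and x[1] = Nyquist (both purely real).
 * Writes n/2 + 1 bins to out, which must have room for them.
 */
void unpack_rfft(const double *x, size_t n, struct complex *out)
{
    size_t half = n / 2;
    size_t k;

    out[0].r = x[0];        /* DC component */
    out[0].i = 0.0;
    out[half].r = x[1];     /* Nyquist, stored where Im(DC) would be */
    out[half].i = 0.0;

    /* Remaining bins are ordinary interleaved complex values. */
    for (k = 1; k < half; k++) {
        out[k].r = x[2 * k];
        out[k].i = x[2 * k + 1];
    }
}
```

    The output needs one more bin than the packed form holds, so it goes to
a separate array rather than being expanded in place.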

	dintime( x, length, wtab )
	idintime( x, length, wtab )

   These are the recursive routines that start the process.  As soon as the
fft length gets to the unroll value, these routines hand off the work.  The
degree of unrolling indicated that the recursion was not the place to go
looking for speed yet. The weights are passed in the wtab array.
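
    The recurse-then-hand-off idea can be sketched as below.  This is not the
released code: dit_sketch and dft4 are made-up names, the cutoff is 4 points
instead of 128, the sketch works out of place with strides instead of in place,
and it assumes a forward transform with exp(-2*pi*i*k/N) weights.  The table
layout (wtab[k] = exp(-2*pi*i*k/N) for k < N/2, N being the top-level length)
is also an assumption.

```c
#include <stddef.h>

struct complex { double r, i; };

#define UNROLL 4   /* stand-in for the 128-point cutoff */

/* Stand-in for a generated kernel: fully unrolled 4-point forward DFT
 * of in[0], in[s], in[2s], in[3s]. */
static void dft4(const struct complex *in, size_t s, struct complex *out)
{
    struct complex a = in[0], b = in[s], c = in[2 * s], d = in[3 * s];

    out[0].r = a.r + b.r + c.r + d.r;  out[0].i = a.i + b.i + c.i + d.i;
    out[1].r = a.r + b.i - c.r - d.i;  out[1].i = a.i - b.r - c.i + d.r;
    out[2].r = a.r - b.r + c.r - d.r;  out[2].i = a.i - b.i + c.i - d.i;
    out[3].r = a.r - b.i - c.r + d.i;  out[3].i = a.i + b.r - c.i - d.r;
}

/*
 * Hypothetical decimation-in-time recursion.  n must be a power of two
 * and >= UNROLL.  Top-level call: dit_sketch(in, 1, n, out, wtab, 1).
 */
void dit_sketch(const struct complex *in, size_t stride, size_t n,
                struct complex *out, const struct complex *wtab,
                size_t wstep)
{
    size_t half, k;

    if (n == UNROLL) {          /* hand off to the unrolled kernel */
        dft4(in, stride, out);
        return;
    }

    half = n / 2;
    dit_sketch(in, 2 * stride, half, out, wtab, 2 * wstep);          /* even samples */
    dit_sketch(in + stride, 2 * stride, half, out + half, wtab,
               2 * wstep);                                           /* odd samples */

    /* Butterflies: combine the two half-length transforms. */
    for (k = 0; k < half; k++) {
        struct complex w = wtab[k * wstep];
        struct complex e = out[k], o = out[k + half];
        double tr = o.r * w.r - o.i * w.i;
        double ti = o.r * w.i + o.i * w.r;

        out[k].r = e.r + tr;         out[k].i = e.i + ti;
        out[k + half].r = e.r - tr;  out[k + half].i = e.i - ti;
    }
}
```

    Passing the weights in as a table keeps sin/cos out of the inner loop;
each recursion level just steps through the same table twice as fast.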

	dint( x, length, wtab)
	idint( x, length, wtab)

   The computer generated unrolled routines.  Changes here can result in
nice speedups.  128 point ffts are presently unrolled.  Didn't go higher
because the code length got gross and took too long to compile.  If your machine
can't handle the 128 point routine, let me know.  I can post shorter lengths
and directions on using them.

	bitrev(x, length)
	bitrev512(x, length)

    Unshuffling routines.  They just do the typical bit reversal stuff.  All
lengths up to and including 512 are unrolled.  If the length is longer, it
calls the regular version.
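
    For reference, the typical bit reversal stuff looks roughly like this in
loop form.  bitrev_sketch is a made-up name and this is the textbook
permutation, not the released bitrev; it assumes the length is a power of two.

```c
#include <stddef.h>

struct complex { double r, i; };

/*
 * Hypothetical loop-based bit reversal permutation, in place.
 * ASSUMPTION: n is a power of two.
 * j always holds the bit-reversed value of i.
 */
void bitrev_sketch(struct complex *x, size_t n)
{
    size_t i, j = 0;

    for (i = 0; i + 1 < n; i++) {
        size_t bit;

        if (i < j) {                /* swap each pair only once */
            struct complex t = x[i];
            x[i] = x[j];
            x[j] = t;
        }

        /* Add one to j in bit-reversed order: clear leading set bits
         * from the top down, then set the first clear one. */
        bit = n >> 1;
        while (bit != 0 && (j & bit)) {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
    }
}
```

    For 8 points this sends indices 0..7 to 0, 4, 2, 6, 1, 5, 3, 7.  An
unrolled version would just list those swaps directly for a fixed length.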

    Coming soon:
	calc_weights(length) - In charge of large weight tables.
	2dfft() - two dimensional complex forward fft.
	2difft() - two dimensional complex inverse fft.
	2drfft() - two dimensional real forward fft.
	2drifft() - two dimensional real inverse fft.

    These routines already exist in specialized form, but are not general
enough to release.

	Richard Krukar (krukar@pprg.unm.edu)

