
   ==================================================================
   ===                                                            ===
   ===           GENESIS Distributed Memory Benchmarks            ===
   ===                                                            ===
   ===                           PDE1                             ===
   ===                                                            ===
   ===   3-Dimensional Poisson Solver Using Red-Black Relaxation  ===
   ===                                                            ===
   ===     Original authors:        J. Klose, M. Lemke            ===
   ===     PARMACS macros  :        Clemens-August Thole          ===
   ===              PALLAS GmbH                                   ===
   ===              Hermulheimer Str. 10                          ===
   ===              5040 Bruhl, GERMANY                           ===
   ===     tel.:+49-2232-18960   e-mail:karls@pallas-gmbh.de      ===
   ===                                                            ===
   ===     Copyright: PALLAS GmbH                                 ===
   ===                                                            ===
   ===          Last update: April 1992; Release: 2.0             ===
   ===                                                            ===
   ==================================================================


1. Description
--------------
The benchmark solves the Poisson equation on a 3-dimensional
grid by parallel red-black relaxation.

Many problems in the area of scientific computing are formulated in
terms of partial differential equations. Typical application areas are
Computational Fluid Dynamics, Meteorology, Climate Research, Oil
Reservoir simulation and others. The resulting PDEs are discretized
on some grid structure. The PDE is then represented by a large set of
(non)linear equations, each of which couples the values at neighbouring
grid points. For time-dependent problems, this set of
equations has to be determined (integration phase) and solved (solution
phase) at each time step.

This benchmark is an extreme example of the class of PDE solvers:
because the discretization of Poisson's equation is so simple, the
number of floating point operations per grid point is quite small
compared with more complex PDEs. The ratio of computation to
communication is thus rather low.
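
The low operation count per grid point can be seen in a minimal sketch
of one red-black sweep. This is an illustration in Python, not the
benchmark's own source. It assumes the standard 7-point discretization
of -Laplace(u) = f on a cubic grid with spacing h and fixed boundary
values; the names red_black_sweep and make_grid are hypothetical.

```python
def make_grid(n, interior=0.0, boundary=1.0):
    """Build an n*n*n nested list with fixed boundary values (hypothetical helper)."""
    last = n - 1
    return [[[boundary if 0 in (i, j, k) or last in (i, j, k) else interior
              for k in range(n)] for j in range(n)] for i in range(n)]


def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep over the interior points of u.

    Assumes the 7-point stencil for -Laplace(u) = f with grid spacing h.
    Points with (i + j + k) even are updated first, then the odd points,
    so every point of one colour depends only on points of the other.
    """
    n = len(u)
    for colour in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    if (i + j + k) % 2 != colour:
                        continue
                    # Six additions, one multiply-add and one division per point.
                    u[i][j][k] = (u[i - 1][j][k] + u[i + 1][j][k]
                                  + u[i][j - 1][k] + u[i][j + 1][k]
                                  + u[i][j][k - 1] + u[i][j][k + 1]
                                  + h * h * f[i][j][k]) / 6.0
    return u
```

Because all points of one colour can be updated independently, each
half-sweep can run in parallel over the subgrids, with a boundary
exchange between the two colours.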

The parallelization is performed by grid splitting. A part of the
computational grid is assigned to each processor. After each
computational step, values at the boundary of the subgrids are exchanged
with nearest neighbours.


2. Operating Instructions
-------------------------

A. Sequential version

The sequential version automatically produces results for a range of
problem sizes. The problem size is determined by the grid size, which
is related to the parameter N. The number of grid points in each
direction is 2**N + 1, giving (2**N + 1)**3 points in 3 dimensions.

The parameter N is varied from 3 to MMAX within the benchmark, and
the benchmark performance is calculated for each resulting problem size.
The value of MMAX can be changed by editing the PARAMETER statement
in the file pde1.inc. Choose the largest value of MMAX that is consistent
with the available processor memory. The memory required is of the order
{ 2 * (2**MMAX + 1)**3 } * 8 bytes.

For the largest problem size, the relative sizes of the grid in each
dimension are varied whilst keeping the overall problem size roughly
constant. This variation in the shape of the grid for the same problem
size can increase performance by allowing more efficient vectorization.
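
As a rough aid for choosing MMAX, the memory estimate above can be
evaluated with a small helper. This is a sketch in Python and not part
of the benchmark; the function name sequential_memory_bytes is
hypothetical, and the result is only the order-of-magnitude estimate
given in this section.

```python
def sequential_memory_bytes(mmax, bytes_per_word=8):
    """Estimate { 2 * (2**MMAX + 1)**3 } * 8 bytes for the sequential version."""
    points = (2 ** mmax + 1) ** 3   # grid points for the largest problem size
    return 2 * points * bytes_per_word


if __name__ == "__main__":
    for mmax in range(3, 9):
        print(mmax, sequential_memory_bytes(mmax))
```

For example, MMAX = 6 gives 2 * 65**3 * 8 = 4394000 bytes (about
4.4 MB), and MMAX = 7 gives 34347024 bytes (about 34 MB).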

For each problem size, the benchmark gives the performance first for the
standard implementation and then for two potential optimisations:
loop unrolling to depth 2 in each dimension, and a single loop
in 2 dimensions.

Compiling and running the sequential benchmark:

1) Change the value of MMAX in file pde1.inc, if appropriate, to give the
   maximum problem size compatible with the available memory (see above).

2) To compile and link the benchmark, type:   make

3) To run the benchmark, type:     pde1
 
   Output from the benchmark is written to the file "result"


B. Distributed version

In the distributed version of the program, the problem size and the
number of processes are read from standard input (Fortran unit 5).

The total lattice size is determined by the parameter NN.
The number of grid points in each direction is 2**NN + 1,
giving (2**NN + 1)**3 points in 3 dimensions.
This lattice is distributed over a 3-dimensional process array of
dimensions NXPROC * NYPROC * NZPROC, where NXPROC, NYPROC and NZPROC
are each greater than or equal to 1.

The size of the local lattice determines the size of the workspace
required in the node program. The size of this workspace is determined
by a PARAMETER statement in the file node.u of the form:

       PARAMETER (NWORKD = 10000)

The value of NWORKD should be changed if necessary to ensure that it is
greater than or equal to (2**NN + 4)**3 / (NXPROC * NYPROC * NZPROC).
The maximum value of NWORKD, and hence the local lattice size, is
constrained by the available node memory. The node memory required
will be of the order 3 * NWORKD * 8 bytes. The maximum value of NN is
likely to be 6 or 7.
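
The workspace bound and the node memory estimate can be checked before
editing node.u. The following is a sketch in Python, not part of the
benchmark; the function names min_nworkd and node_memory_bytes are
hypothetical. Rounding the quotient up is an assumption made here so
that the result is the smallest integer satisfying the bound.

```python
def min_nworkd(nn, nxproc, nyproc, nzproc):
    """Smallest integer NWORKD >= (2**NN + 4)**3 / (number of processes)."""
    total = (2 ** nn + 4) ** 3
    nprocs = nxproc * nyproc * nzproc
    return -(-total // nprocs)      # integer ceiling division


def node_memory_bytes(nworkd, bytes_per_word=8):
    """Estimate 3 * NWORKD * 8 bytes of node memory."""
    return 3 * nworkd * bytes_per_word


if __name__ == "__main__":
    nworkd = min_nworkd(6, 2, 2, 2)
    print(nworkd, node_memory_bytes(nworkd))
```

For NN = 6 on a 2 * 2 * 2 process array, this gives NWORKD = 39304 and
about 943296 bytes per node, so the value 10000 shown above would have
to be increased. For NN = 5 on the same process array the bound is 5832,
which fits within 10000.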


Compiling and running the distributed benchmark:

1) Change the value of NWORKD in file node.u, if appropriate, to give the
   maximum work space compatible with the available memory (see above).

2) To compile and link the benchmark, type:   make

3) To run the benchmark, type:     host

4) Enter the parameters NN, NXPROC, NYPROC and NZPROC (see above) on
   standard input.

   Output from the benchmark is written to the file "result"


