Subject: 	An EXTENDED ABSTRACT to HPCN Europe 1994
		April 18-20, 1994
		Munich, Germany


Title:		EASYPVM - an EnhAnced Subroutine librarY on the top of PVM

Author: 	Sami Saarinen
		Center for Scientific Computing (CSC)
		P.O.Box 405, Tietotie 6
		FIN-02101 Espoo
		Finland
		e-mail: Sami.Saarinen@csc.fi


Introduction
------------

  EASYPVM is a message passing library built on top of the Parallel
Virtual Machine (PVM) version 3.2. The purpose of EASYPVM is to make
PVM-programs considerably easier to write by supplying a small set of
functions that hide the details of PVM-message passing syntax and process
creation. It also provides a well-coordinated and clear message passing
programming model and eliminates unnecessary parameters from message
passing calls.

  EASYPVM offers an easy way to port mesh-oriented applications created for
systems like the Intel Paragon or Cray T3D to run on workstation clusters.
EASYPVM closely mimics the call syntax of the Intel message passing library
(NX-lib). The main exceptions are that all messages are typeless and that
each receive-function takes an explicit address and message tag from which
the message is expected to arrive. Also, data lengths are given not in bytes
but in words, which is more suitable for architecture-independent
communication.

  Typeless message passing means that the programmer fixes a certain data
type for subsequent data communication. Data is then sent or received using
the generic "send()" and "recv()" functions, and the length of the data
arrays is expressed not in bytes but in words of the currently active data
type.  This offers a nearly trivial port to machines with different
word lengths (e.g. Cray vs. other systems).
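
  The idea of counting message lengths in words of an active data type can
be sketched in C as follows. This is only an illustration under stated
assumptions, not the EASYPVM implementation: the "sk_" names and the enum
values are hypothetical, and only the type names REAL8, REAL4 and INTEGER4
are taken from the examples in this abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes (assumption: not the real EASYPVM constants). */
typedef enum { SK_INTEGER4, SK_REAL4, SK_REAL8 } sk_type;

static sk_type sk_active = SK_REAL8;   /* currently active data type */

/* Fix the data type used by all subsequent length computations. */
void sk_setdatatype(sk_type t) { sk_active = t; }

/* Size in bytes of one word of the active type on this machine. */
size_t sk_wordsize(void)
{
    switch (sk_active) {
    case SK_INTEGER4: return sizeof(int);
    case SK_REAL4:    return sizeof(float);
    case SK_REAL8:    return sizeof(double);
    }
    assert(0 && "unknown data type");
    return 0;
}

/* Bytes that a message of nwords words occupies on this machine. */
size_t sk_message_bytes(size_t nwords)
{
    return nwords * sk_wordsize();
}
```

  The caller only ever states a word count; the byte count is derived
locally, so the same call stays valid on a machine with another word length.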

  EASYPVM supports both HOST-NODE and NODE-only (SPMD) approaches. The
single HOST can be a user-written program or the so-called PVMSERVER, a
generic HOST-program that automatically drives and terminates the
NODE-programs. Also, the NODEs need not necessarily be the same program,
which provides support for full MIMD-like computation as well. Furthermore,
a user-written HOST-program can operate simultaneously with several clusters
of NODE-programs, which need not be aware of each other.

  The underlying logical processor topology in EASYPVM is a 3D-torus, as
found in the Cray T3D architecture. However, the so-called nearest
neighbour functions and direct indexed access to NODE-processes guarantee
flexible access to any NODE-process in a logical NODE-cluster. Also,
hypercube-like primitives are being implemented.

  EASYPVM provides many common global communication routines like global
synchronization, summation, product, min-max and concatenate functions. All
these functions are also typeless, that is, exactly the same function call
is suitable for INTEGER, REAL or DOUBLE PRECISION data types, as well as
for different COMPLEX data types.

  EASYPVM is in principle available for any platform to which PVM has been
ported. It can be used from C, C++ or Fortran programs. Unlike PVM,
EASYPVM-calls are similar in all of the aforementioned languages. The
performance of EASYPVM is determined by the underlying PVM-implementation.
Also, an EASYPVM-application can still call any of the PVM-routines. This
means that the EASYPVM-library "only" offers easier access to some commonly
used message passing calls.


Description
-----------

  A typical PVM-application contains several message passing calls that are
repeated over and over again, thus making a PVM-application hard to read,
tedious to code and (thus) error-prone. For example, before sending a
message, one must

	- initiate a message buffer
	- pack message into that buffer
	- send or multicast buffer to specific destination(s)

  In most cases this requires three PVM-library calls. In PVM a single
message can be formed of different data types.  However, this is very
seldom the case in a real program. For that reason EASYPVM offers a
subroutine call "setdatatype()" which fixes the data type to be used in the
subsequent message passing operations. Fortran code may look as follows:

	DOUBLE PRECISION A(N)
	... do something with your data ..
	CALL setdatatype(REAL8)    ! Sets data type for subsequent operations
	CALL send(dest,100,A,N)
	... do something else with your data ..
	CALL send(dest,100,A,N)    ! This sends another array of REAL8-values

  EASYPVM can also send or receive matrix blocks. Using the function
calls "send2d()" and "recv2d()" a specific sub-block of a matrix can be
accessed. This simplifies programming in many linear algebra applications.
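
  What sending a sub-block involves can be sketched in C as gathering the
block into a contiguous buffer before transmission. This is an assumption
made for illustration; the abstract does not describe how "send2d()" works
internally or what parameters it takes, and the function name below is
hypothetical. The matrix is taken to be stored column by column, as in
Fortran.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the nrows x ncols sub-block whose upper left corner is at
 * (row0, col0), counted from zero, out of a column-major matrix `a`
 * with leading dimension lda into the contiguous buffer `buf`.
 * Returns the number of words copied. */
size_t sk_pack_block(const double *a, size_t lda,
                     size_t row0, size_t col0,
                     size_t nrows, size_t ncols, double *buf)
{
    size_t i, j, k = 0;

    assert(row0 + nrows <= lda);   /* block must fit inside a column */
    for (j = 0; j < ncols; j++)
        for (i = 0; i < nrows; i++)
            buf[k++] = a[(col0 + j) * lda + (row0 + i)];
    return k;
}
```

  The returned word count is what a subsequent typeless send of the packed
buffer would use as its length.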
	
  Another "weakness" in pure PVM is process creation. Every time a HOST
process creates one or several NODE-processes, nearly all of the following
steps must be repeated:
	
	- get my PVM task id via XXXmytid() call ( "XXX" = pvmf or pvm_ )
	- get my PVM parent process id from XXXparent()
	- call XXXspawn to create NODE processes
	- multicast task ids of the NODE processes to every NODE just
	  created so that each NODE knows about its neighbour NODEs
	
On the NODE side the repeated steps are:

	- get my PVM task id via XXXmytid() call
	- get my PVM parent process id from XXXparent()
	- receive task ids of the sibling NODE processes from HOST

  In EASYPVM this can be expressed by simple subroutine calls and by
providing a NODE Definition File (ndf-file) for the HOST process. In
EASYPVM, a HOST simply calls:

	CALL createproc(ndffile)

And a NODE:

	CALL attachproc()

  Thus these simple-looking subroutine calls hide all the "details" from
the programmer. The user is only responsible for creating an appropriate
ndf-file just before starting the application or "on-the-fly". [Sometimes
the file is created automatically by a specific script that comes with
EASYPVM]. An ndf-file typically contains information about the 3D-torus
dimensions, the NODE-executable names and parameter lists, and the
computers on which the NODEs should be started.

  The last few important features of the EASYPVM-library worth mentioning
are related to global communication, Intel NX-library calls, nearest
neighbour functions and NODE-clustering.

  If all NODE processes share the same (physical) SPMD-program, one can
synchronize the processes (as in NX-lib) in the following manner:

	CALL gsync()

  Only after all processes have reached this point in the source code can
the NODEs proceed with their execution. In PVM 3.x this feature is
available only in a considerably more complicated way, through dynamic
process groups; in the older PVM 2.4.x it was called "barrier()".

  Global summation and product are also available through a single,
typeless library call, unlike in Intel NX-lib. It is the user's
responsibility to ensure that the correct data type is in use. Code for
INTEGER summation may look as follows:

	INTEGER MYVAR, RESULT, mynode
	MYVAR = mynode()
	CALL setdatatype(INTEGER4)
	CALL gsum(MYVAR,1,RESULT)  ! Perform global summation with INTEGERs

  As a result, each of the NODE processes has the sum of the MYVAR values
in the RESULT-variable. It should be noted that the program is essentially
the same for REAL-variables:
	
	REAL MYVAR, RESULT
	INTEGER mynode
	MYVAR = mynode()
	CALL setdatatype(REAL4)
	CALL gsum(MYVAR,1,RESULT)  ! Perform global summation with REALs

  Apart from the variable declarations, the only difference is the
parameter passed to the "setdatatype()" function.
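
  One way a single call can serve several data types is to dispatch on a
type code when combining contributions. The C sketch below shows only this
local combining step and involves no communication. It is an assumption
about how such a routine could be built, not a description of "gsum()";
the "red_" names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes for the combining step. */
typedef enum { RED_INTEGER4, RED_REAL4, RED_REAL8 } red_type;

/* Add n words of `in` element by element into `acc`. Both buffers are
 * untyped; the type code t decides how their contents are interpreted. */
void red_accumulate(red_type t, void *acc, const void *in, size_t n)
{
    size_t i;

    switch (t) {
    case RED_INTEGER4:
        for (i = 0; i < n; i++)
            ((int *)acc)[i] += ((const int *)in)[i];
        break;
    case RED_REAL4:
        for (i = 0; i < n; i++)
            ((float *)acc)[i] += ((const float *)in)[i];
        break;
    case RED_REAL8:
        for (i = 0; i < n; i++)
            ((double *)acc)[i] += ((const double *)in)[i];
        break;
    default:
        assert(0 && "unknown data type");
    }
}
```

  As in the Fortran examples, the caller must make sure that the type code
matches the actual contents of the buffers; nothing in the call checks it.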

  The previous two examples also showed some popular Intel NX-library
calls. For the time being EASYPVM contains "mynode()", "myhost()" and the
global operations. It should be noted, however, that they may have a
slightly different parameter list or meaning than in NX-lib.

  The nearest neighbour capability in EASYPVM is a set of function calls
that return the NODE process index of each nearest neighbour process in the
3D-torus. For example, exchanging data with the neighbour processes on the
left and right side is coded as follows:

	INTEGER west, east, left, right
	DOUBLE PRECISION Xdataleft, Xdataright
	... user coding for Xdataleft or Xdataright ...
	CALL setdatatype(REAL8)
	... non-blocking send phase ...
	CALL send(left(),110,Xdataleft,1)
	CALL send(right(),110,Xdataright,1)
	... receive phase ...
	CALL recv(west(),110,Xdataleft,1)
	CALL recv(east(),110,Xdataright,1)
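
  How a neighbour index can be derived on a 3D-torus is sketched below in
C. The numbering of the NODEs (x running fastest) and all names are
assumptions made for this illustration; the abstract does not specify how
EASYPVM numbers the NODE processes.

```c
#include <assert.h>

/* Hypothetical torus description: number of NODEs in each direction. */
typedef struct { int nx, ny, nz; } torus3d;

/* Linear index of the NODE at coordinates (x, y, z), x running fastest. */
int torus_index(const torus3d *t, int x, int y, int z)
{
    return x + t->nx * (y + t->ny * z);
}

/* Index of the NODE displaced by (dx, dy, dz) from `node`. Coordinates
 * wrap around in every direction, which is what makes the mesh a torus. */
int torus_neighbour(const torus3d *t, int node, int dx, int dy, int dz)
{
    int x, y, z;

    assert(node >= 0 && node < t->nx * t->ny * t->nz);
    x = node % t->nx;
    y = (node / t->nx) % t->ny;
    z = node / (t->nx * t->ny);

    /* Adding the dimension before the final % keeps the result
     * non-negative for negative displacements. */
    x = (x + dx % t->nx + t->nx) % t->nx;
    y = (y + dy % t->ny + t->ny) % t->ny;
    z = (z + dz % t->nz + t->nz) % t->nz;
    return torus_index(t, x, y, z);
}
```

  With this numbering a left-hand neighbour corresponds to the displacement
(-1, 0, 0) and a right-hand neighbour to (+1, 0, 0).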
	
  Finally, NODE-clustering is a feature which allows a HOST-process to
create several simultaneously operable "clusters" of NODEs, each allocated
to a specific task. A portion of such a program is presented here:

	REAL A(N,N), B(N,N), C(N,N)
	INTEGER cluids(2),getcluster
	... user coding here ..
	CALL createproc('matmul.ndf')    ! Create the first cluster for MATMUL
	cluids(1) = getcluster()
	CALL createproc('lu-decomp.ndf') ! Create the 2nd cluster for LUDCMP
	cluids(2) = getcluster()
	... access cluster#1 ...
	CALL setcluster(cluids(1))
	... perform C = A * B in parallel thru PVM ...
	CALL MATMUL(C,A,B,N)
	... access cluster#2 ...
	CALL setcluster(cluids(2))
	... perform LU-decomp of C in parallel thru PVM ...
	CALL LUDCMP(C,N)

MATMUL and LUDCMP are user-written driver subprograms that distribute the data.


Results
-------

  For the time being EASYPVM is getting its final shape and is being ported
(compiled) to many common RISC-workstations as well as Cray and Convex
supercomputers. A real test of its strengths and weaknesses will take place
soon, when a number of parallel benchmarks and message passing codes are
converted to the EASYPVM-format. Initial porting experiences show, however,
that thanks especially to the global communication features of EASYPVM,
message passing applications created for Intel systems are almost trivial
to convert to use EASYPVM-calls or, more specifically, PVM.


Discussion & Conclusions
------------------------

  EASYPVM offers a simple way to extend the capabilities of the popular
PVM.  It facilitates testing and production runs of distributed memory
programs written for the "real" MIMD/MPP-systems in a workstation cluster
environment.

  EASYPVM removes most of the PVM-programming burden, but it also creates
some new barriers related to the mixed data type packing available in PVM.
However, EASYPVM is not intended to restrict PVM-programming in any way;
all functions available in PVM are still available in EASYPVM programs.

