.\" To runoff, use: tbl file | troff -ms | print
.nr PS 12
.nr VS 14
.TL
Parallel and Distributed Compilations in Loosely-Coupled Systems:
A Case Study
.AU
Erik H. Baalbergen
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam
The Netherlands
.AB
One application of large-grain parallelism is the use of
parallel and distributed compilations by
.I make ,
running under
.UX .
The original version of
.I make
executes its compilation commands successively.
'Making' a large system could therefore take a large amount of time.
An increase in efficiency may be achieved by a parallel version of
.I make ,
which tries to execute the compilations simultaneously.
A parallel, non-distributed, version of
.I make
turns out to be inefficient.
The compilations, which are mainly cpu-bound, slow each other down
due to degradation of the processor's performance.
A solution may be found in
the idea of boarding out (part of the) compilations to other
processors.
This resulted in a study of how to do compilations in a distributed
manner.
.PP
The aspect of having a system of loosely-coupled processors is
an important issue in the field of distributed compilations.
The relatively high cost of doing transactions (compared to local actions)
in a loosely-coupled system makes the use of low-level inter-processor
communication (e.g., the execution of system calls on another processor)
inefficient.
A
.UX
network system like the
.I
Amoeba Connection
.R
turns out to be unsuitable for doing distributed compilations.
It is shown that much overhead results from the communication between
the system that contains the source code to be compiled
and the system that does the compilation.
Another possibility is to copy the source code to the other processor's
data space, execute a local compilation on that processor and send the
results back; this greatly reduces the communication overhead.
The time needed to send the source to and receive the code from the remote
processor is negligible compared with the overhead mentioned earlier.
.PP
In order to create a parallel and distributed
.I make ,
I adapted the original 'make' program by adding a module for finding out which
compilations can be executed in parallel, depending on
the actions to be taken, the actions already finished, and the present files.
Furthermore, I created various versions of the 
.UX
C compiler
.I cc
in order to perform some measurements.
.AE
.NH
INTRODUCTION
.PP
The availability of networks of personal workstations has
increased interest in parallel and distributed compilations.
A decrease of the response time is the main motive for executing compilations
in parallel on several processors.
This paper is a description of an experiment set up to examine the
feasibility of using more than one processor for doing compilations.
The experiment took place in a
.UX
environment and consisted of two parts: creating two different distributed
versions of the C compiler
.I cc
and constructing a version of the 
.UX
tool 
.I make [Feldman78]
that could run its compilations in parallel.
The aspect of being tightly-coupled or loosely-coupled turns out to be
an important issue in determining whether a specific network is suitable for
doing distributed compilations.
.PP
The test configuration consisted of four VAX 11/750s, each of them running 
.UX
version 4.1 BSD.
The machines are connected by a 10 Mbps token ring (Pronet).
Pronet was made available to user programs by incorporating
the
.B Amoeba
3.0 [Tanenbaum81] communication primitives into
.UX
[Renesse84].
.PP
A distributed compilation can be done in various ways.
One possibility is to create a version of the compiler in which the system calls
may be carried out remotely.
This can be achieved by using a
.UX
network system like the
.I
Amoeba Connection,
.R
similar to the
.I
Newcastle Connection
.R
[Brownbridge82].
The system combines the file systems of each of the connected machines
by allowing access to files and execution of programs on
other systems.
The compiler runs remotely (i.e., on another processor)
but each system call concerning
the source code should be executed on the processor that runs the file
system of the source code.
Another possibility is to isolate components from the compiler
and execute some of them remotely.
One problem with remote compilations is that the compiling program should
produce code for the local machine.
Each of the connected machines should therefore have a compiler for each of the
other machines.
This is no problem if the connected machines are similar, as is the
case in the test environment.
Another problem arises in the second kind of compilation: libraries and
included source code should be derived from the source machine.
This, too, caused no problem in our test environment, as will be shown.
.PP
It must be said that many of the results depend strongly on our configuration,
especially with regard to the communication overhead.
Results from a common network operation, performed in the same
configuration, are included to give an indication of the overhead.
.NH
AMOEBA AND THE AMOEBA CONNECTION
.PP
.B Amoeba
is a distributed operating system developed at the Vrije Universiteit
[Tanenbaum81].
The
.B Amoeba
communication primitives are described in [Tanenbaum84].
.B Amoeba
uses a "request-reply" or "transaction" style of communication, in which the
basic primitive is the client sending a request to a server and the server
sending a reply back to the client.
Such a pair of request and reply messages is henceforth called a
.I transaction .
The implementation of the primitives in
.UX
created the possibility for two processes running on different systems to
communicate by means of transactions.
This has been exploited in various user programs such as copying files between
various systems, remote execution of commands, sharing of resources and
remote logging in [Renesse84].
An application of the transaction mechanism is a
.UX
system-call server.
A process on machine
.I A
can ask a system-call server on machine
.I B
to execute a system call, such as
.B open ,
.B read ,
or
.B write .
The strategy used to implement the remote system calls is to build an
extra layer on the kernel.
A program does not directly invoke the kernel but calls a stub routine
which checks whether the command must be done locally or remotely.
Local commands are passed directly to the kernel.
Remote commands are passed to a system call server on the proper
machine by doing a transaction with the system-call server.
A great advantage of the use of this extra layer is that existing programs
need not be rewritten or even recompiled.
They only have to be relinked with a library of stub routines.
The naming scheme for remote files (i.e., files on other 
.UX
systems), the system-call server and the stub-routine library together form the
.I
Amoeba Connection.
.R
The connection was found useful in our experiment, although the overhead
was large.
.PP
The following table gives an indication of the speed of the connection in
terms of response time, measured in seconds.
Three versions of the 
.UX
file-copy command
.I cp
are compared: plain
.I cp ,
able to copy files on the local machine only;
.I rcp ,
the inter-machine file-copy program as described in [Renesse84];
and
.I fcp ,
which is plain
.I cp
linked with the
.I
Amoeba Connection
.R
library.
'Local' is a file copy from one disk to another on the same system.
'Remote' is a file copy from the local system to another system.
All measurements took place on lightly loaded machines.
.TS
center,box;
c||c|c s|c s.
number of bytes	cp	rcp	fcp
.T&
c||c|c|c|c|c
r||r|r|r|r|r.
\^	_	_	_	_	_
\^	local	local	remote	local	remote
=
1	0.28	1.08	3.55	0.25	0.40
_
1024	0.28	1.20	3.60	0.25	0.43
_
10240	0.40	1.93	3.95	0.38	0.83
.TE
.ce
Table 1.
.NH
THE EXPERIMENT AND ITS RESULTS
.NH 2
A distributed compiler
.PP
The first phase of the experiment was to construct two distributed
versions of 
.I cc
which is the
.UX
C compiler.
.I Cc
is a program that causes C source code to be passed through several compilation
programs.
The first step is done by the C preprocessor
.I cpp
which performs macro substitution, file inclusion and elimination of
source code, depending on several user-specified, preprocess-time conditions.
Next follows the compiler proper,
.I ccom ,
which is a two-pass portable C compiler
[Johnson79].
The assembly code generated by the two-pass compiler is translated to object
code by the
.UX 
assembler
.I as .
The program
.I ld ,
which is the
.UX
link editor, finally combines the object programs, together with some
libraries, into one program which may be executed.
The first distributed version of the C compiler is based on the
.I
Amoeba Connection.
.R
The four compilation programs are relinked to allow files on
other systems to be compiled.
The resulting compiler, together with its driver
.I cc ,
is installed on each of the connected machines.
(Having multiple copies of the compiler is in fact an optimization with
regards to our implementation of the
.I Amoeba
.I Connection ;
remote execution
is allowed only if the program is situated in the file system belonging to
the remote processor.)
The remote compilation of a source file on
.B A
is now done by invoking 
.I cc
on some other machine
.B B.
The overhead caused by the implementation of the
.I
Amoeba Connection
.R
is large.
A transaction takes one system call on the local machine and two system calls
on the remote machine.
Furthermore, a remote system call takes one system
call on the remote machine.
It turns out that if the compiler executes one remote system call, four
system calls are needed to get the result.
Communication overhead, too, plays an important role.
Some results, compared to local execution, are listed in table 2.
.PP
The second distributed version of the compiler was created using the
local versions of each of the four components.
.I Cpp
and
.I ld
run locally, because both programs are strongly file-system dependent.
.I Cpp
may include some standard files and 
.I ld
may use some libraries from the local file system.
.I Ccom
and
.I as
are combined into a server which takes preprocessed C-source code as input
and produces object code as output.
The input and output (and compiler diagnostics) may be sent across the network
using transactions.
The compiler server can therefore run remotely, getting its input from
the locally-executed preprocessor and sending its output to the local file
system.
The advantage of this approach is that the various components need not be 
recompiled and that the only network communication is limited to invoking
the server and passing the input and output.
Some results for response times (compared to plain 
.I cc )
are listed below.
.TS
center,box;
c||c|c s|c s.
T{
number of C
source lines
T}	T{
local
using cc
T}	Amoeba Connection	Compiler server
.T&
c||c|c|c|c|c.
\^	\^	_	_	_	_
\^	\^	local	remote	local	remote
=
.T&
r||r|r|r|r|r.
20	3.27	4.05	21.45	3.07	5.58
_
200	9.17	9.82	30.57	9.07	11.33
_
2000	68.75	68.68	105.28	68.98	69.63
.TE
.ce
Table 2.
Table 2 shows that remote compilation using the compiler server gives
a better response time
than using the compiler, based on the Amoeba Connection.
.NH 2
Distributed and parallel \fImake\fR
.PP
The second part of the experiment was to apply the result of the first part
to the 
.UX
program
.I make .
Many programs developed under
.UX
consist of a set of C source files which have to be compiled.
.I Make
performs the compilations sequentially.
Doing the compilations in parallel on the same processor is not
a solution, as table 3 shows.
A solution is found in parallel and distributed execution of the compilations.
Table 3 shows results of the three ways of doing independent compilations
(i.e., no compilation uses results of the other compilations.)
The file to be compiled was about 1800 lines.
The maximum number of available processors was 4.
The compiler server is used for doing the distributed compilations.
.TS
center,box;
c||c|c|c
c||c|c|c
r||r|r|r.
number of	locally	locally	distributed
compilations	sequentially	in parallel	in parallel
=
1	68.67	-	-
_
2	137.34	127.32	69.58
_
3	206.00	183.45	73.22
_
4	274.66	242.50	78.35
.TE
.ce
Table 3.
.PP
Adapting 
.I make
in order to execute compilations in parallel on several processors
was done easily.
.I Make
maintains a list of programs that can also run on other machines.
A command is executed in parallel to the other commands under the
following conditions:
the program appears on the distributed-program list; the necessary
files are present; the compilations on which the command depends are
already done; and there is still enough remote processing power.
The only program currently in the distributed-program list is
.I cc ,
but other compilers and translating programs may be added to this list.
The processing power of a machine is computed by keeping an account
of the number of compilations started by
.I make
on that machine.
.NH
PLANS
.PP
From the experiment we learned that splitting up
compilations and running the environment and machine independent
phases on other processors results in a remarkable increase in response time,
especially if several sources need to be compiled.
Splitting up a single compilation is also a result of the philosophy
behind the 
.B
Amsterdam Compiler Kit (ACK),
.R
a project at the Vrije Universiteit in the area of compiler construction
[Tanenbaum83].
A compilation in
.B ACK
causes a program to pass through several components: a front end, which
translates the program into machine-independent intermediate code;
a peephole and global optimizer; a back end, which translates from 
intermediate code into assembly code; an assembler; and a linker.
The idea is to have a pool of processors, each of them running a dedicated
server performing one of the components.
.B ACK
tries to allocate a server for each of the phases and passes the program
through the pipeline of servers.
.NH
References
.IP [Brownbridge82]
.br
D.R. Brownbridge, L.F. Marshall and B. Randell, "The Newcastle Connection
or UNIXes of the World Unite!,"
.I
Software-Practice and Experience,
.R
vol. 12,
pp. 1147-1162 (1982).
.IP [Devarakonda85]
.br
M. Devarakonda, R. McGrath, R. Campbell and W. Kubitz, "Networking a
Large Number of Workstations Using
.UX
United,"
.I
Proc. 1st IEEE Int. Conf. on Computer Workstations, 
.R
pp. 231-239 (1985).
.IP [Feldman78]
.br
S.I. Feldman, "Make - A Program for Maintaining Computer Programs,"
appeared in 
.I
UNIX Programmers Manual,
.R
vol. 2A, Bell Laboratories, Murray Hill NJ, 1979.
.IP [Johnson79]
.br
S.C. Johnson, "A Portable Compiler: Theory and Practice,"
.I
Proc. 5th ACM Sym. on Principles of Programming Languages,
.R
pp. 97-104 (January 1978).
.IP [Miller82]
.br
J.A. Miller and R.J. LeBlanc, "Distributed Compilation: A Case Study,"
.I
Proc. 3th IEEE Int. Conf. on Distributed Computing Systems,
.R
pp. 548-553 (October 1982).
.IP [Renesse84]
.br
R. van Renesse, A.S. Tanenbaum and S.J. Mullender, "Connecting
.UX
systems Using a Token Ring," Rapport IR-91,
Vrije Universiteit, Amsterdam, The Netherlands (October 1984).
.IP [Tanenbaum81]
.br
A.S. Tanenbaum and S.J. Mullender, "An Overview of the Amoeba Distributed
Operating system,"
.I
Operating Systems Review,
.R
vol. 15, no. 3, pp. 51-64 (July 1981).
.IP [Tanenbaum83]
.br
A.S. Tanenbaum, J.M. van Staveren, E.G. Keizer and J.W. Stevenson,
"A Practical Toolkit for Making Portable Compilers,"
.I CACM
vol. 26, no. 9, pp. 654-660 (September 1983).
.IP [Tanenbaum84]
.br
A.S. Tanenbaum and S.J. Mullender, "The Design of a Capability-Based
Distributed Operating System," Rapport nr. IR-88,
Vrije Universiteit, Amsterdam, The Netherlands (November 1984).
