                                       11




                                   NETWORKING


   MINIX supports networking.  This chapter describes the kind of  support  pro-
vided, how to use it, and how it should be installed.


11.1.  INTRODUCTION

   Network software can be divided into two general categories differing in  the
way  the software is integrated into the operating system and the user software.
When networks first developed, they were used over slow wide-area links (56 kbps
or less), so the designers' main concern was using the available bandwidth effi-
ciently.  Programmer convenience was not considered. Later, as higher  bandwidth
networks  became  widespread (especially local area networks, such as Ethernet),
the focus changed from worrying about bandwidth utilization, to  worrying  about
making  the network interface convenient for the programmers.  This evolution is
very similar to the evolution from  assembly  language  programming,  where  the
machine came first, to programming in high level languages, where the programmer
came first.
   Networks of the first type are said to be connection oriented, and use what
are called sliding window protocols.  All older networks, especially wide area
networks, are of this type.  Some of the better known protocols are X.25,
TCP/IP, and OSI.  Networks of the second type are connectionless, and use what
is called remote procedure call (RPC).  Virtually all modern distributed
operating systems are based on this concept.  Some well-known examples are the
work of
Xerox PARC [1], the V kernel [2], and Amoeba [3-11].  While it is certainly pos-
sible  to  build  RPC on top of a connection-oriented protocol, this approach is
inefficient compared to building the RPC on top of the  bare  network.   For  an
introduction  to  connection-oriented protocols, RPC, and networking in general,
see [12].
   Networking in MINIX is based on RPC.  Briefly summarized, communication
between two processes works as follows.  One of the processes, called the
server, has some service to offer, such as file storage.  The other process,
the client, wants to use this service.  The interface to the service consists of
a collection of procedures that the client can call.  In the case of a file
server, the procedures might be CREATE_FILE, RENAME_FILE, READ_DATA, WRITE_DATA,
and so on.  These are library routines available on the client's machine.
   When the client calls one of these procedures, the procedure sends a  message
to  the  server containing the procedure name and its parameters.  The procedure
then blocks waiting for the reply.  When the message gets to the server,  it  is
decoded  there and executed.  The reply is sent back to the calling procedure on
the client's machine, which then returns the results to the  caller.   From  the
programmer's  point  of  view, having remote services in the network essentially
means that there is a new collection of procedures to call.  The  programmer  is
not  burdened  with concepts like opening connections, sending data, or thinking
in terms of acknowledgements, all of which are needed in the connection-oriented
model.  Nor is the network software burdened with having to manage connections.
   In effect, RPC is based on the abstraction of  the  procedure  call,  whereas
connection-oriented networks are based on the much lower-level concept of making
the network look like an input/output device.  While at first  glance  it  might
seem  that  connection-oriented  networking  could  be  made  to  fit  with  the
UNIX/MINIX concept of a pipe, pipes are set up in a very  different  way  (by  a
common  ancestor),  and  fit  very poorly to the most common style of local area
network programming, where the client has a  request  and  the  server  gives  a
response.   With wide area networks, this kind of interaction is painfully slow,
due to the low bandwidth, so the only services generally available are mail  and
file  transfer, which are batch-oriented. MINIX networking has been designed for
interactive use on high performance local area networks, so for this reason, RPC
has been chosen over the older connection-oriented style.
   In particular, MINIX networking has been designed to be compatible  with  the
form  of  RPC  used in the Amoeba distributed operating system [3-11].  Not only
have the concepts and the implementation been well tested, but  the  performance
is  exceedingly  good.   For  example,  for doing file transfers, something that
connection-oriented protocols are supposed to be good at, Amoeba running on  two
Sun  3s  achieves  triple the throughput of TCP/IP running on the same hardware.
Data transfers between two Zenith Z-248s running the Amoeba RPC  on  MINIX  have
been  measured at 165 kbytes/sec, almost as fast as TCP/IP transfers between two
Sun 3/50s.  Considering that the Suns are two times as fast as  the  Z-248s  and
the  network  software  is  100% CPU limited (doubling the CPU speed doubles the
throughput), this is a strong argument for the Amoeba RPC.  As a  final  statis-
tic, the RPC throughput between a client and server located on the same Z-248 is
1.5 Mbytes/sec, an extremely high figure for this class  of  machine,  and  much
better  than what Suns and VAXes normally achieve locally, despite their greater
CPU power.  In conclusion, although RPC was chosen for its elegance and ease  of
use, it turns out that it also has excellent performance, even doing things like
bulk transfer, and certainly doing things like short request-reply interactions.
   A few words about Amoeba are probably in order here.   It  is  a  distributed
operating  system  that was developed at the Vrije Universiteit in Amsterdam and
is now being further developed there and at the Centre for Mathematics and  Com-
puter  Science  in  Amsterdam.   It currently runs on the Sun 3 (and other 680x0
processors), VAXstations, and 80386s.  Note that Amoeba is a complete  operating
system, just like UNIX, MINIX or VMS. The only relation between Amoeba and MINIX
is that MINIX networking uses the Amoeba RPC protocols.  Other  than  that  they
are quite different in structure, functionality, and goals.  Amoeba was designed
to run on systems consisting of dozens of processors, and yet give the  program-
mer  the  illusion that it is a traditional single-CPU time sharing system.  For
more information about Amoeba, see the references.


11.2.  OBJECTS

   Amoeba is an object-based system, and to a considerable extent this  orienta-
tion is reflected in the protocol.  As a consequence, MINIX also acquires a cer-
tain object-orientation.  Very  briefly,  an  object  is  a  programmer  defined
abstract  data  type  that  has well-defined operations on it.  As an example, a
file server could define file and directory objects, and provide  operations  to
read  and  write  the  file objects, and insert files in, and delete files from,
directory objects.  Clients can perform these operations by doing RPCs with  the
file  server.   Henceforth  we  will adopt the Amoeba terminology and call these
RPCs _t_r_a_n_s_a_c_t_i_o_n_s. A transaction consists of a request message from a client  to
a server, followed by a reply message from the server back to the client.
   It is up to the writer of each server to decide what  kinds  of  objects  the
server  will  support and what operations will be available on them.  The struc-
ture of the system guarantees that clients can only perform the operations  pro-
vided  by the server.  This style of networking is intended to force constraints
on programmers,  just  as  high-level  languages  force  constraints  on  former
assembly-language programmers.
   Objects are normally protected by capabilities, which are currently (Amoeba
4.0) 128-bit numbers, although in the next version of Amoeba (Amoeba 5.0) this
will become 256 bits.  When a client asks a server to create an object, the
server returns a capability for the object.  This capability must be presented
by the client to perform subsequent operations on the object.  In Amoeba,
capabilities are protected cryptographically.  Since the MINIX kernel, unlike
the Amoeba kernel, was not designed from scratch as a distributed system, the
protection aspects in MINIX are not fully implemented.
   A capability has 4 fields,  described  below.   These  fields  are  important
because they appear in the Amoeba and MINIX message headers.
  Port:     48-bit number used to identify the server owning the object.
  Object:   24-bit number used by the server to identify the object.
  Rights:   8 bits telling which operations are allowed.
  Cksum:    48-bit checksum to prevent tampering with the capability.
The port field is a (random) 48-bit number used for addressing.  Any 48-bit
number  can  be used as a port.  In some situations, an ASCII string can be used
as a port, with the first 48 bits taken as the port  number.   All  messages  in
Amoeba  and  MINIX  are sent to ports, not to machine addresses.  The mapping of
ports to machine addresses is done deep down in the system,  and  is  of  little
concern  to  the  average programmer.  Thus: a port uniquely identifies a server
and provides a logical address to which all messages for the server are sent.
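
The four capability fields add up to 128 bits (48 + 24 + 8 + 48).  As a rough
sketch only, the layout and the "first 48 bits of an ASCII string" rule could
be written as below.  The names capability_sketch and port_from_name are
hypothetical, and zero-padding of short names is an assumption; the real
definitions are those in amoeba.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PORT_BYTES   6   /* 48-bit port */
#define OBJECT_BYTES 3   /* 24-bit object number */
#define CKSUM_BYTES  6   /* 48-bit checksum */

/* Hypothetical layout for illustration; not the declaration in amoeba.h. */
typedef struct {
    uint8_t port[PORT_BYTES];
    uint8_t object[OBJECT_BYTES];
    uint8_t rights;
    uint8_t cksum[CKSUM_BYTES];
} capability_sketch;

/* Build a port from an ASCII name: only the first 48 bits (6 bytes) are
 * used.  Padding short names with zero bytes is an assumption. */
static void port_from_name(uint8_t port[PORT_BYTES], const char *name)
{
    size_t n;

    assert(name != NULL);
    n = strlen(name);
    if (n > PORT_BYTES)
        n = PORT_BYTES;
    memset(port, 0, PORT_BYTES);
    memcpy(port, name, n);
}
```

With this layout a name such as "MyServer" and the name "MyServ" yield the
same port, since only six bytes are significant.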
   The remaining three fields are called the private part of the capability.  In
theory,  each  server can use them any way it wants to.  In practice, to prevent
total chaos, all existing servers adhere to the following conventions  (just  as
most  UNIX  programs  adhere  to the convention that certain files contain ASCII
characters with a line feed at the end of each line).  The object field is used
by the server to identify the specific object being accessed.  For example, when
a file server creates a new file on behalf of a client, it could put the i-node
number of the new file in this field, so that when the client later uses the
capability, the server can tell which file is being addressed.  The field is
24 bits long, providing each server with about 16 million object identifiers.
   The _r_i_g_h_t_s field contains a bit map for up  to  eight  protected  operations.
Each bit controls permission to perform one operation.  Thus a file server could
allocate bit 0 for READ_DATA, bit 1 for WRITE_DATA, bit 2 for APPEND_DATA, bit 3
for DELETE_FILE, and so on.  When a capability arrives from a client, the server
checks to see if the bit corresponding to the relevant operation is on.   If  it
is  not,  the operation is rejected.  In this way, a user can create a file, ask
the server to turn off the WRITE_DATA and DELETE_FILE bits, and  then  give  the
capability  to  another  user.   This  new  user  cannot  perform WRITE_DATA and
DELETE_FILE operations, but can perform the operations whose bits are turned on.
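
The rights check can be sketched in a few lines of C.  The bit assignments
follow the file server example in the text; the macro and function names are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit assignments from the file server example; names are hypothetical. */
#define RIGHT_READ_DATA    (1u << 0)
#define RIGHT_WRITE_DATA   (1u << 1)
#define RIGHT_APPEND_DATA  (1u << 2)
#define RIGHT_DELETE_FILE  (1u << 3)

/* Nonzero if every bit in `needed` is turned on in `rights`. */
static int rights_allow(uint8_t rights, uint8_t needed)
{
    return (rights & needed) == needed;
}

/* Return `rights` with the bits in `revoked` turned off. */
static uint8_t rights_restrict(uint8_t rights, uint8_t revoked)
{
    return (uint8_t)(rights & ~revoked);
}

/* Self-check mirroring the example in the text: the owner gives away a
 * capability without the WRITE_DATA and DELETE_FILE rights. */
static void rights_example(void)
{
    uint8_t owner = RIGHT_READ_DATA | RIGHT_WRITE_DATA |
                    RIGHT_APPEND_DATA | RIGHT_DELETE_FILE;
    uint8_t shared = rights_restrict(owner,
                                     RIGHT_WRITE_DATA | RIGHT_DELETE_FILE);

    assert(rights_allow(shared, RIGHT_READ_DATA));
    assert(!rights_allow(shared, RIGHT_WRITE_DATA));
}
```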
   A moment's thought will reveal that the above protection scheme is worthless
if users can turn the rights bits on and off by themselves.  To prevent this,
the cksum field is used.  When creating a new object, the server simultaneously
creates a random number and stores it in its internal tables (e.g., in the
i-node).  It then combines the rights bits and the random number, and passes the
result through a one-way cryptographic function.  The result of this function is
put in the cksum field.  When a capability comes in from a client, the server
uses the object number to locate the original random number.  It then combines
it with the rights bits present in the capability, and runs the result through
the one-way function.  If the result disagrees with the cksum field, the
capability is considered invalid, and an error return is sent back.  In this
way, users who change the rights bits will simply invalidate their capabilities.
Attempts to break the scheme by finding an inverse to the one-way function can
be handled by choosing a cryptographically strong one-way function.  Brute force
does not work either, as picking checksums at random will require, on the
average, 2**47 attempts to guess the 48-bit checksum.  Since a null transaction
over a 10 Mbit/sec Ethernet using Sun 3/50s takes about 1.4 msec, about 6000
years are needed to perform the search (2**47 x 1.4 msec is roughly 2 x 10**11
seconds).  Furthermore, it is easy enough to program a server to artificially
increase the transaction time to 1 sec after 10 unsuccessful attempts have been
made, thus increasing the mean search time to over 4,000,000 years.
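
The create-and-verify steps above might be sketched as follows.  This is an
illustration under stated assumptions, not the Amoeba algorithm: the way the
rights and the random number are combined is invented here, and toy_one_way
is a simple invertible mixer standing in for the one-way function.  It is NOT
cryptographically strong and must not be used for real protection.

```c
#include <assert.h>
#include <stdint.h>

#define CKSUM_MASK 0xFFFFFFFFFFFFULL    /* keep 48 bits */

/* Stand-in for the one-way function.  This mixer is a bijection on 48-bit
 * values and is easily inverted; it only demonstrates the data flow. */
static uint64_t toy_one_way(uint64_t x)
{
    x &= CKSUM_MASK;
    x = (x * 0x5DEECE66DULL + 0xBULL) & CKSUM_MASK;
    x ^= x >> 23;
    x = (x * 0x5DEECE66DULL + 0xBULL) & CKSUM_MASK;
    x ^= x >> 25;
    return x;
}

/* Checksum the server puts in a new capability.  `secret` is the random
 * number kept in the server's tables (assumed here to be 48 bits). */
static uint64_t make_cksum(uint8_t rights, uint64_t secret)
{
    assert(secret <= CKSUM_MASK);
    return toy_one_way(secret ^ ((uint64_t)rights << 40));
}

/* Server-side check: recompute the checksum from the rights presented in
 * the capability and the stored secret, then compare with the cksum field. */
static int cap_is_valid(uint8_t rights, uint64_t cksum, uint64_t secret)
{
    return make_cksum(rights, secret) == cksum;
}
```

A client that flips a rights bit presents a capability whose recomputed
checksum no longer matches, so the request is rejected.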


11.3.  OVERVIEW OF TRANSACTIONS

   To summarize what we have covered so far, the normal style of  networking  in
MINIX  (and  Amoeba)  is to structure dialogues in terms of clients and servers.
Each server manages one or more types of objects, and provides operations that
clients can perform on these objects.  When a client asks a server to
create an object for it, the server then returns a capability for the object  to
the  client.   This capability identifies the server, identifies the object, and
tells which subset of the operations the holder of the capability  may  perform.
To have an operation performed, the client sends a request message to the server
(with the capability embedded in the message header), and the server then  sends
back  a  reply.   In most cases, the calls to the server are embedded in library
procedures, called stubs, to encapsulate the message passing and hide it from
the users.
   Transactions provide a basis for a large number of user services.  In MINIX,
users can use them to build arbitrary services.  Two key services are provided
as standard for MINIX: remote execution and remote file copying.  These services
make  use  of  a  process  called  the  shell server, or sherver for short.  The
sherver accepts messages from remote (or local) clients, executes  the  commands
in them, and returns the output.
   Communication is implemented as follows. Each server listens to a unique  48-
bit  port.   A client that wants service from the server sends a request to that
port and blocks until it receives a reply. (If the  client  cannot  find  anyone
listening  to  the  port after a given period, it times out and returns an error
status.)  When the server is ready, it returns a reply  to  the  client,   which
then  continues execution. Each transaction is independent of the previous tran-
sactions; there is no connection or virtual circuit.
   Clients must have some way of discovering a server's port.   Under  Amoeba  a
directory  server  is used. The directory server stores capabilities for objects
and associates them with an ASCII string.  The directory server has a well known
port.  Under MINIX you make initial contact with a sherver that has a well known
port and then the sherver creates a secret port for all further transactions  on
that machine.
   There are four stub routines in the user library which provide the basic
interface between user processes and transactions.  They are:
  1. getreq  - Get request (used by servers to get a request)
  2. putrep  - Put reply (used by servers to send a reply)
  3. trans   - Transaction (used by clients to do a transaction)
  4. timeout - Set the time limit at which trans gives up
Getreq and putrep are used by servers to get a request from a client and to
send a reply.  A server may not do a getreq until it has replied to the previous
getreq.  The call trans is used by clients to send a request to a server.  It
blocks until a reply or a signal arrives, or, if it cannot find a server
listening to the requested port, it times out and returns an error code.  The
length of the timeout is set using the function timeout.  This timeout applies
to locating servers, not to how long they take to do the work.
   Messages of up to 30000 bytes can be sent between client  and  server.   This
limit  will  increase to 1 Gbyte in the next version of Amoeba but will probably
remain at 30000 bytes in MINIX due to the small address space of the IBM PC.  It
is  possible  to  provide security so that servers only execute remote procedure
calls for authorized users. The protection mechanism uses  capabilities  and  is
discussed  in  detail  in  the  references.  It will not be discussed much here.
This protection mechanism is not implemented in the remote shell software avail-
able  with  MINIX.   (It  requires  a  directory server, among other things. The
implementation is left as an exercise for the reader.)


11.4.  SYNTAX AND SEMANTICS OF TRANSACTION PRIMITIVES

   Now we will take a detailed look at the syntax and semantics of  the  library
routines  for  using  transactions, followed by some simple examples to indicate
how the functions are typically used.  Remember that when programming with
transactions, the primitives used in C programs are getreq, putrep, trans, and
timeout.  These can be thought of as network system calls, although they are not
implemented quite like that in MINIX.  If you are building a server, it will
typically have a main loop with a getreq at the top, a switch in the middle
based on some field of the incoming message, and a putrep at the bottom.
Furthermore, the server writer will generally also provide a set of stub
procedures that contain trans calls to access the server.  The average user will
call these library procedures, and will not make trans calls directly, although
he is, of course, free to do so if he wishes.
   Transaction messages always begin with a special header.  The exact layout of
these messages is defined by the Amoeba protocol.  By using this protocol, MINIX
machines can communicate with one another, and with Suns and VAXes running
Amoeba.  Device drivers have also been written for UNIX to allow UNIX processes
to speak Amoeba, and have Amoeba clients and servers run on UNIX.  At the Vrije
Universiteit, all the Suns, VAXes, and other machines that run UNIX have such
drivers to communicate with each other and  with  machines  running  Amoeba  and
MINIX.  It is the local lingua franca, just as TCP/IP is at some sites.
   The Amoeba header is defined in the header file /usr/include/amoeba.h, which
must be included in all programs using transactions.  The header definition is
given below.  The types used in the header struct are also defined in amoeba.h.
typedef struct {
     port    h_port;       /* port (i.e., logical address) of the dest. */
     port    h_signature;  /* used for authentication and protection */
     private h_priv;       /* 10 bytes: object, rights, and cksum */
     unshort h_command;    /* code for operation desired/status returned */
     long    h_offset;     /* parameter field */
     unshort h_size;       /* parameter field */
     unshort h_extra;      /* parameter field */
} header;
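
For readers without amoeba.h at hand, the following sketch shows one plausible
set of supporting types and the header size that results.  All the _sketch
names and the field layouts of port and private are assumptions made for
illustration; a 32-bit long is also assumed, as on the machines of the time,
so int32_t is used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned short unshort_sketch;          /* 16 bits assumed */

/* Hypothetical stand-ins for the types defined in amoeba.h. */
typedef struct { uint8_t bytes[6]; } port_sketch;       /* 48 bits */
typedef struct {
    uint8_t     object[3];                      /* 24-bit object number */
    uint8_t     rights;                         /* 8 rights bits */
    port_sketch cksum;                          /* 48-bit checksum */
} private_sketch;                               /* 10 bytes in all */

typedef struct {
    port_sketch    h_port;
    port_sketch    h_signature;
    private_sketch h_priv;
    unshort_sketch h_command;
    int32_t        h_offset;                    /* `long` in the original */
    unshort_sketch h_size;
    unshort_sketch h_extra;
} header_sketch;

/* With these assumptions the fields pack without padding into 32 bytes. */
static void check_header_layout(void)
{
    assert(sizeof(private_sketch) == 10);
    assert(offsetof(header_sketch, h_command) == 22);
    assert(sizeof(header_sketch) == 32);
}
```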
   The message header contains the port to which the message should be  sent,  a
command/status  field  for use by the server and space for some parameters to go
with the command or status.  Let us now look at  the  four  network  primitives.
The first one, getreq, has the following declaration:
     unshort getreq(hdr, buffer, size)
     header *hdr;
     char *buffer;
     unshort size;
The three parameters refer to the header, the buffer, and the buffer size,
respectively.  In a sense, they are analogous to the parameters of the MINIX
READ and WRITE system calls.  The hdr parameter points to a header struct, which
is used to allow the server to specify which port it wants to listen to.  The
h_port field of the header must be initialized with the port number.  The buffer
parameter is a pointer to a buffer to hold the incoming message.  It can hold a
maximum of size bytes, specified by the third parameter.  If successful, getreq
returns the number of bytes of data in the buffer that were actually received.
In addition, the other fields of the header are filled in by the system.  If an
error occurs, it returns a negative error code.  Because the declared return
type is unshort, the result must be cast to short before its sign is tested, as
the server example in Sec. 11.5 does.  Possible error codes (defined in
amoeba.h) are:
     FAILED:     - Null port or getreq done before previous putrep
     BADADDRESS: - The buffer pointer and/or size was not valid
     ABORTED:    - A signal was received
     TRYAGAIN:   - There were no free transaction slots in the kernel tables
Note that after a getreq, trans may be used to communicate with another server
before doing the putrep.  In other words, a server may call other servers to help
it do its job, but it may not process multiple transactions simultaneously.  (In
Amoeba,  server processes may contain multiple threads to allow parallelism, but
MINIX does not allow multiple threads per process.)
   The next call is putrep, used by servers to reply to requests and send back
results and status information.  The declaration is:

     unshort putrep(hdr, buffer, size)
     header *hdr;
     char *buffer;
     unshort size;

The header returned contains status information, and possibly a new port (in the
h_signature field).  A buffer containing size bytes of data is also returned to
the client.  If successful, putrep returns the number of bytes sent.  The reply
message is not acknowledged, so a successful return from this call does not
guarantee that the client got the reply.  In general, it is up to the client to
try again if the reply is not forthcoming quickly enough.  Possible error
conditions for putrep are defined in amoeba.h as follows:

     FAILED:     - No getreq was done first
     BADADDRESS: - The buffer pointer and/or size was not valid
     ABORTED:    - A signal was received
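
The ordering rule behind the two FAILED cases (no second getreq before the
putrep, no putrep without a getreq) amounts to a one-bit state machine per
server.  The sketch below models only that rule; the names and the value of
FAILED are assumptions, and no messages are actually sent.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_OK = 0, FAILED = -1 };    /* FAILED value is assumed */

/* One flag per server process: is a request waiting for its reply? */
typedef struct { int request_pending; } server_state;

/* Models getreq: refused while the previous request is unanswered. */
static int sketch_getreq(server_state *s)
{
    assert(s != NULL);
    if (s->request_pending)
        return FAILED;
    s->request_pending = 1;
    return SKETCH_OK;
}

/* Models putrep: refused unless a getreq was done first. */
static int sketch_putrep(server_state *s)
{
    assert(s != NULL);
    if (!s->request_pending)
        return FAILED;
    s->request_pending = 0;
    return SKETCH_OK;
}
```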

   Now we come to the call used by clients to  request  services  and  wait  for
replies.  Servers can also use this call to request services from other servers.
Thus at one instant a process may be acting as a server and at another the  same
process may be acting as a client.  The client call is:

     unshort trans(hdr1, buffer1, size1, hdr2, buffer2, size2)
     header *hdr1, *hdr2;
     char *buffer1, *buffer2;
     unshort size1, size2;

The call has two independent sets of parameters.  Those with suffix 1  are  used
for sending the request message to the server.  Those with suffix 2 are used for
getting the reply.  Both sets have a header, a buffer, and a size.  The two hdr
pointers point to structs for message headers.  The first one contains parameters
copied to the outgoing message to the server and the second one contains space
for the data to be copied in from the server's putrep.  The two buffer
parameters are for the outgoing and incoming data,  respectively,  and  the  two
sizes tell how large these buffers are.
   After making a trans call, the client blocks until the message has been sent,
received, processed by the server, and replied to.  Only then can the client
continue execution.  At this point the fields of hdr2 and buffer2 will contain
the  reply  data.  Like MINIX itself, transactions support only this synchronous
form of communication.  Experience has painfully shown that asynchronous  stream
communication  is difficult for programmers to deal with.  After all, everything
else in programming languages is synchronous.  (Can you imagine what it would be
like  to  have  a procedure call return control to the caller before having fin-
ished its work?)
   If successful, trans returns the number of bytes in the reply.  Possible
error codes are:

     FAILED:     - Null port or server crashed between getreq and putrep
     NOTFOUND:   - The port locate failed to find a server before the timeout
     BADADDRESS: - A buffer pointer and/or size was not valid
     ABORTED:    - A signal was received
     TRYAGAIN:   - There were no free transaction slots in the kernel's tables
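
Since all three primitives return an unshort that doubles as an error code, a
small helper for interpreting the result may be convenient.  The sketch below
assumes the numeric values shown for the error codes (the real ones are in
amoeba.h) and relies on the usual two's-complement conversion when casting to
short; am_result and am_describe are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef unsigned short unshort;

/* Assumed values for illustration; use the definitions in amoeba.h. */
enum { FAILED = -1, NOTFOUND = -2, BADADDRESS = -3, ABORTED = -4,
       TRYAGAIN = -5 };

/* Byte count (>= 0) or error code (< 0).  Messages are at most 30000
 * bytes, so a valid count always fits in the positive range of a short. */
static int am_result(unshort rv)
{
    return (int)(short)rv;
}

/* Short description of an error code, following the lists in the text. */
static const char *am_describe(int code)
{
    assert(code < 0);
    switch (code) {
    case FAILED:     return "null port, protocol misuse, or server crash";
    case NOTFOUND:   return "no server located before the timeout";
    case BADADDRESS: return "buffer pointer and/or size not valid";
    case ABORTED:    return "a signal was received";
    case TRYAGAIN:   return "no free transaction slots in the kernel";
    default:         return "unknown error";
    }
}
```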

The final network primitive deals with setting timeouts.  When  a  client  first
does  a transaction on a previously unknown port, the kernel broadcasts a locate
message to find the server.  It then waits a certain amount of time for a server
to reply.  If no server replies before the timer goes off, the trans fails with
NOTFOUND.  The timeout call allows the client to determine how long to wait for
a  server  to  reply.  After a reply has been received, the kernel keeps it in a
cache, so that locates will not be needed  subsequently.   It  is  important  to
realize  that  the  timeout  relates  to  locating servers, not to how much time
servers have to perform their work.  The declaration is:

     unshort timeout(time)
     unshort time;

The function sets the length of the locate timeout in tenths of a  second.   The
default is 300 (30 seconds).  A timeout of 0 means do not time out.  The timeout
call returns the length of the previous timeout.
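
Because timeout returns the previous setting, a client can shorten the locate
timeout for one transaction and restore it afterwards.  The sketch below uses
mock_timeout, a hypothetical stand-in with the same set-and-return-old
behavior, so that it can be run without the MINIX library.

```c
#include <assert.h>

typedef unsigned short unshort;

static unshort locate_timeout = 300;    /* default: 300 tenths = 30 seconds */

/* Stand-in for the library's timeout(): set the new value (in tenths of
 * a second) and return the previous one. */
static unshort mock_timeout(unshort tenths)
{
    unshort old = locate_timeout;

    locate_timeout = tenths;
    return old;
}

/* Convert whole seconds to the tenths-of-a-second unit timeout expects. */
static unshort seconds_to_tenths(unsigned seconds)
{
    assert(seconds <= 6553);            /* result must fit in 16 bits */
    return (unshort)(seconds * 10);
}
```

A typical use would be old = mock_timeout(seconds_to_tenths(5)), then the
transaction, then mock_timeout(old) to restore the earlier setting.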


11.5.  SERVER STRUCTURE

   A typical server has the following form:
  /* Declarations needed by the server. */
  header hdr;                 /* header for receiving requests */
  char buffer[BUFSIZE];       /* buffer for receiving requests */
  char reply[BUF2SIZE];       /* buffer for sending replies */
  unshort size, replysize;    /* request size and reply size */
  unshort getreq();           /* function declaration */
  char *strncpy();            /* string function */

  signal(SIGAMOEBA, SIG_IGN); /* ignore signals */

  while (1) {

    /* Have the server listen to a 48-bit port equal to ASCII "MyServ" */
    strncpy(&hdr.h_port, "MyServ", HEADERSIZE);

    /* Wait for a request to come in for that port. */
    size = getreq(&hdr, buffer, BUFSIZE);

    /* If the size returned is negative then an error occurred. */
    if ((short) size < 0) {
          handle_error();
    } else {
          perform_request();  /* carry out the work; sets replysize */
          hdr.h_status = OK;  /* or whatever */
          putrep(&hdr, reply, replysize); /* send reply back */
    }
  }
If all the information necessary for the request is  in  the  headers  then  the
buffers in getreq and putrep can be replaced by the value NILBUF and the buffer
sizes can be replaced by 0.
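
The perform_request step in the loop above is usually a switch on the command
field of the incoming header.  The following sketch shows that dispatch in
isolation.  The cut-down mini_header, the two commands, and the status codes
are all hypothetical; a real server would use the header from amoeba.h.

```c
#include <assert.h>
#include <string.h>

typedef unsigned short unshort;

/* Cut-down stand-in for the transaction header. */
typedef struct {
    unshort h_command;          /* operation on input, status on output */
    long    h_offset;
    unshort h_size;
} mini_header;

enum { CMD_ECHO = 1, CMD_LENGTH = 2 };          /* hypothetical commands */
enum { STATUS_OK = 0, STATUS_BADCMD = 1 };      /* hypothetical statuses */

/* The "switch in the middle": decode the command, do the work, store the
 * status in the header, and return the number of reply bytes. */
static unshort handle_request(mini_header *hdr, const char *req,
                              unshort reqsize, char *reply, unshort replymax)
{
    assert(hdr != NULL);
    switch (hdr->h_command) {
    case CMD_ECHO:              /* send the request data straight back */
        if (reqsize > replymax)
            reqsize = replymax;
        memcpy(reply, req, reqsize);
        hdr->h_command = STATUS_OK;
        return reqsize;
    case CMD_LENGTH:            /* answer fits in the header alone */
        hdr->h_size = reqsize;
        hdr->h_command = STATUS_OK;
        return 0;
    default:
        hdr->h_command = STATUS_BADCMD;
        return 0;
    }
}
```

The CMD_LENGTH case illustrates a reply carried entirely in the header, for
which putrep could be called with NILBUF and a size of 0.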


11.6.  CLIENT STRUCTURE

   The structure of a client program is much  more  variable.   A  program  that
deals with the above server might look like this:
  /* Declarations needed by the client. */
  header hdr;                 /* header used for request and reply */
  char buffer[BUFSIZE];       /* buffer used for request */
  short size;                 /* size of the reply, or error code */
  unshort trans();            /* function declaration */
  char *strncpy();            /* string function */

  /* Initialize server port to "MyServ". */
  strncpy(&hdr.h_port, "MyServ", HEADERSIZE);

  /* Send request to server listening to that port. */
  size = (short) trans(&hdr, buffer, BUFSIZE, &hdr, NILBUF, 0);
  if (size < 0) {
          printf("trans failed %d\n", size);
  } else {
          if (hdr.h_status != OK) /* nonzero status is an error */
                  work_not_done();
          else
                  successful_trans();
  }
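
In practice the client code above is usually wrapped in a stub so that the
caller sees an ordinary procedure.  The sketch below shows the idea for a
hypothetical READ_DATA operation.  Since the real trans needs the MINIX
kernel, fake_trans is a loopback stand-in that answers from a fixed string;
mini_header, the command code, and the status values are likewise assumptions.

```c
#include <assert.h>
#include <string.h>

typedef unsigned short unshort;

typedef struct {                /* cut-down stand-in for the header */
    char    h_port[6];
    unshort h_command;
    long    h_offset;
    unshort h_size;
} mini_header;

enum { CMD_READ_DATA = 1 };                     /* hypothetical command */
enum { STATUS_OK = 0, STATUS_ERROR = 1 };       /* hypothetical statuses */

/* Loopback stand-in for trans(): serves READ_DATA from a fixed "file". */
static unshort fake_trans(mini_header *h1, char *b1, unshort s1,
                          mini_header *h2, char *b2, unshort s2)
{
    static const char file[] = "hello, world";
    long len = (long)strlen(file);
    unshort n = 0;

    (void)b1; (void)s1;         /* this request carries no data */
    if (h1->h_command == CMD_READ_DATA &&
        h1->h_offset >= 0 && h1->h_offset <= len) {
        n = (unshort)(len - h1->h_offset);
        if (n > h1->h_size) n = h1->h_size;
        if (n > s2) n = s2;
        memcpy(b2, file + h1->h_offset, n);
        h2->h_command = STATUS_OK;
    } else {
        h2->h_command = STATUS_ERROR;
    }
    return n;
}

/* Client stub: returns the number of bytes read, or -1 on any failure. */
static int read_data(const char *server, long offset, char *buf,
                     unshort nbytes)
{
    mini_header hdr;
    size_t plen;
    unshort n;

    assert(server != NULL && buf != NULL);
    memset(&hdr, 0, sizeof hdr);
    plen = strlen(server);
    if (plen > sizeof hdr.h_port)
        plen = sizeof hdr.h_port;
    memcpy(hdr.h_port, server, plen);   /* port = first 6 bytes of name */
    hdr.h_command = CMD_READ_DATA;
    hdr.h_offset = offset;
    hdr.h_size = nbytes;
    n = fake_trans(&hdr, (char *)0, 0, &hdr, buf, nbytes);
    if ((short)n < 0 || hdr.h_command != STATUS_OK)
        return -1;
    return (int)n;
}
```

The caller simply writes read_data("MyServ", 7L, buf, 5) and never sees the
header or the transaction.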


11.7.  SIGNAL HANDLING

   It is important for programmers to understand how signals work.  If a client
receives a signal while doing a trans, the signal propagates to the server.  If
the server is also doing a trans then it propagates again to the next server,
and so on.  The aim of this is to request all servers to terminate their
transactions as soon as possible.
   If the server receiving the signal is not doing a transaction and not already
doing a putrep then the server code must handle the signal.  It may choose to
catch the signal and send a reply immediately or simply ignore the signal.  If
it neither catches nor ignores the signal then it will die, since the signal
propagated is SIGAMOEBA (which is defined as SIGEMT for MINIX).  In this case
the transaction will fail (with return status FAILED for the client).
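
One way for a server to "catch the signal and send a reply immediately" is to
set a flag in the handler and poll it between units of work.  The sketch below
shows that pattern only.  Outside MINIX the name SIGAMOEBA does not exist, so
SIGUSR1 is substituted here as an assumption; the function names are
hypothetical.

```c
#include <assert.h>
#include <signal.h>

#ifndef SIGAMOEBA
#define SIGAMOEBA SIGUSR1       /* stand-in so the sketch builds elsewhere */
#endif

/* Set by the handler, polled by the server's work loop. */
static volatile sig_atomic_t abort_requested = 0;

static void on_sigamoeba(int sig)
{
    (void)sig;
    abort_requested = 1;
}

static void install_abort_handler(void)
{
    signal(SIGAMOEBA, on_sigamoeba);
}

/* Do up to `steps` units of work, stopping early once the client's signal
 * has arrived.  Returns the number of units completed; the caller would
 * then do its putrep, with an error status if the work was cut short. */
static int do_work(int steps)
{
    int done = 0;

    assert(steps >= 0);
    while (done < steps && !abort_requested)
        done++;                 /* one unit of real work would go here */
    return done;
}
```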
   Once the transaction is completed the client process will be signaled.  It in
turn  must  handle  the  original signal (not necessarily SIGAMOEBA).  The exact
transaction semantics of Amoeba are not supported under MINIX due to  difficulty
in  keeping  user processes alive until a transaction terminates after a signal.
Signal propagation does occur, but the client may die before a reply  comes  in.
This  should  not  matter too much for most applications. In the next rewrite of
Amoeba the syntax and semantics of these functions will change in non-compatible
ways, but this will probably not appear in MINIX.


11.8.  IMPLEMENTATION OF TRANSACTIONS IN MINIX

   Amoeba transactions are implemented in the MINIX kernel as a number of kernel
tasks.  Several alterations were made to the kernel to support these tasks,
including the addition of an (optional) Ethernet driver (for the Western Digital
EtherCard Plus, also known as the WD8003E) and the ability to specify the
size of the stack for kernel tasks on a per-task basis.  (Amoeba tasks need
larger stacks than the other MINIX kernel tasks.)  There is also an extra system
call that is handled by MM.  This is the Amoeba system call and is the interface
to the kernel.  Special handling of signals is also provided for in the MM task.
   There are five kernel tasks for Amoeba.  The first acts as  a  manager  which
accepts asynchronous events.  Possible events are:
     1. An Ethernet packet has arrived
     2. A local signal has arrived
     3. A user task involved in an active transaction has died
     4. A sweep timeout has occurred
(Locate timeouts are implemented using a counter which is decremented every
tenth of a second by a sweep routine.)  Each of the other four tasks manages a
single user process's transactions.  Thus, a maximum of four processes can
simultaneously do transactions under MINIX.  The number of transaction tasks is,
however, a constant in an include file and can be increased if needed.
   In the MINIX kernel there is a table which keeps a record of the current
state of a transaction.  This table is called am_task and is declared in the
file amoeba.c.  It records many things, including the process number of the
task doing the transaction, the current state (locating, waiting for a reply,
waiting for a request, etc.) and the relevant ports and machine addresses.
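
The fixed number of transaction tasks is what gives rise to the TRYAGAIN
error: when every slot is busy, a new transaction cannot start.  The sketch
below models such a table.  It is not the am_task declaration from amoeba.c;
the struct fields, state names, and functions are hypothetical.

```c
#include <assert.h>

#define AM_NTASKS 4             /* four transaction tasks, as in the text */

enum am_state { AM_IDLE = 0, AM_LOCATING, AM_WAIT_REPLY, AM_WAIT_REQUEST };

struct am_slot {
    int           proc_nr;      /* process doing the transaction */
    enum am_state state;        /* AM_IDLE means the slot is free */
};

static struct am_slot am_table[AM_NTASKS];      /* zeroed: all AM_IDLE */

/* Claim a free slot for proc_nr.  Returns the slot index, or -1 when all
 * slots are busy (the case reported to the caller as TRYAGAIN). */
static int am_claim_slot(int proc_nr, enum am_state initial)
{
    int i;

    assert(initial != AM_IDLE);
    for (i = 0; i < AM_NTASKS; i++) {
        if (am_table[i].state == AM_IDLE) {
            am_table[i].proc_nr = proc_nr;
            am_table[i].state = initial;
            return i;
        }
    }
    return -1;
}

/* Free a slot when its transaction completes or its process dies. */
static void am_release_slot(int idx)
{
    assert(idx >= 0 && idx < AM_NTASKS);
    am_table[idx].state = AM_IDLE;
}
```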
   The Amoeba network protocol is a stop-and-wait protocol that guarantees
at-most-once delivery of a message.  A message consists of the concatenation of
the transaction header with the data in the buffer (if any) given to trans,
getreq or putrep.  The transaction code divides each message into packets which
fit on the underlying network medium (which is ethernet in the case of MINIX).
It then sends the packets, which are reassembled on the remote machine before
the message is given to the recipient.
   Each packet begins with an ethernet header (which consists of the source  and
destination  ethernet  addresses)  followed  by a 10-byte Amoeba internet header
containing data about the source and destination processes to  ensure  that  the
message is delivered to the correct process.  The rest of the packet is used for
sending data.
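
   The arithmetic behind this fragmentation can be sketched in C as follows.
Only the 10-byte Amoeba internet header comes from the description above.  The
14-byte ethernet header (two 6-byte addresses plus a 2-byte type field) and the
1514-byte maximum packet size are the usual ethernet figures and are assumed
here, and the function names are hypothetical.

```c
#include <assert.h>

#define ETH_HDR_SIZE   14    /* assumed: two 6-byte addresses + 2-byte type */
#define AM_HDR_SIZE    10    /* Amoeba internet header, from the text */
#define ETH_MAX_PACKET 1514  /* assumed maximum packet size, without CRC */

/* Data bytes that fit in one packet after both headers. */
#define AM_MAX_DATA (ETH_MAX_PACKET - ETH_HDR_SIZE - AM_HDR_SIZE)

/* Number of packets needed for a message of msg_len bytes, where the
 * message is the transaction header followed by the data (if any). */
long packets_needed(long msg_len)
{
    assert(msg_len > 0);
    return (msg_len + AM_MAX_DATA - 1) / AM_MAX_DATA;
}

/* Number of message bytes carried by packet number seq (counting from 0). */
long packet_data_len(long msg_len, long seq)
{
    long remaining = msg_len - seq * AM_MAX_DATA;

    assert(seq >= 0 && remaining > 0);
    return remaining < AM_MAX_DATA ? remaining : AM_MAX_DATA;
}
```

Under these assumptions each packet carries at most 1490 bytes of the message,
so a 30000-byte message would be sent as 21 packets, the last of which carries
200 bytes.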


11.9.  COMPILING THE SYSTEM

   There are several things you need to know before you can build a MINIX
kernel with Amoeba transactions in it.  First of all, you do not need an
Ethernet to use transactions.  You can have your clients and servers running on
a single machine.  In this mode, it is possible to write and debug network
software without having a network.  Later, when you move to a real network, the
code will already be debugged, as the system itself makes no distinction
between local and remote transactions.
   Second, the transaction code is quite substantial, so much so that it would
tend to overshadow the rest of MINIX if it were fully integrated into it.  This
fact, combined with the knowledge that not all MINIX users are interested in
networking, has led to a new top-level directory in MINIX, amoeba.  This
directory and its subdirectories contain all the networking code.  If you are
not interested in networking, just ignore it.
   Installation of networking is largely auto-configured by the makefiles
provided.  Two new -D entries are used in the mm and amoeba/kernel makefiles:
     -DAM_KERNEL    (used in mm and amoeba/kernel) enables networking
     -DNONET        (used in amoeba/kernel) single-machine networking; in
                    other words, local transactions only
If you use -DAM_KERNEL but not -DNONET, you get full networking and MUST have a
Western Digital Etherplus card.
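
   For readers unfamiliar with -D entries, the following C sketch shows how two
such compile-time symbols can select between the configurations just described.
It is not taken from the MINIX sources: the enumeration and function names are
hypothetical, and treating a build without AM_KERNEL as having no networking at
all is an assumption.

```c
/* Networking configurations selectable at compile time. */
enum am_net_mode {
    AM_NET_NONE,    /* built without -DAM_KERNEL: no transactions */
    AM_NET_LOCAL,   /* -DAM_KERNEL -DNONET: local transactions only */
    AM_NET_FULL     /* -DAM_KERNEL alone: full networking, card required */
};

/* Report which configuration this file was compiled for. */
enum am_net_mode am_net_mode_compiled(void)
{
#if !defined(AM_KERNEL)
    return AM_NET_NONE;
#elif defined(NONET)
    return AM_NET_LOCAL;
#else
    return AM_NET_FULL;
#endif
}

/* Only the full configuration needs the ethernet driver. */
int am_needs_ethernet(void)
{
    return am_net_mode_compiled() == AM_NET_FULL;
}
```

Compiled with "cc -DAM_KERNEL -DNONET", am_net_mode_compiled() in this sketch
returns AM_NET_LOCAL; compiled with neither symbol, it returns AM_NET_NONE.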
   If you add a new kernel task of your own then it MUST come between the Amoeba
kernel tasks and the printer task in the file kernel/table.c and should be
numbered relative to AMOEBA_CLASS in the file h/com.h (i.e., the task number
should be AMOEBA_CLASS+1 for the first new task, AMOEBA_CLASS+2 for the second
new task, etc.).  Be sure to set NR_TASKS correctly.
   To compile and install networking, you must follow the steps below carefully.


11.10.  HOW TO INSTALL NETWORKING IN MINIX

   Carry out the following steps carefully.  Before starting, however, make
sure that /usr/lib/cpp has at least 50000 bytes of stack space (size will tell
you).  If it does not, use chmem to give it more.
   1. Make sure that you are in the amoeba directory and that there is plenty
      of free disk space.  Now edit Makefile to include or exclude NONET from
      CFLAGS as you prefer.
   2. Type:
           make
   3. When you are instructed to do so, insert a blank diskette and  hit  the
      return key.
   4. Reboot your machine using the new boot floppy.
   5. Test the system.  The directory amoeba/examples contains several
      programs to test the reliability of transactions.  The READ_ME file in
      the directory gives more details.
   6. If you have an ethernet card then install the network tools.  The
      directory amoeba/util contains utilities for remote shells, remote file
      copying and message sending.  These only work with machines that have
      Amoeba transactions installed.  The READ_ME file there gives more
      details.
















11.11.  NETWORKING UTILITIES

   There are several utility programs which you may find useful if you have a
network connection.  They are listed below with a brief outline of their use.
Other utilities are possible and reasonably simple to write as shell scripts
that use rsh (remote shell, described below).  The utilities are located in the
amoeba/utilities directory.


11.12.  REMOTE SHELL

   One of the main features of MINIX networking is the remote shell.  This
utility sends a command over the network to a server, which executes it.  The
syntax of this command is:
     rsh [-bei] port command
This program executes the command given by the command argument on the machine
with a sherver (described below) listening to the port given by port, which is
an ASCII string of up to 6 characters.  It is used to generate a unique port
name for the underlying transaction mechanism.
   Normally standard output and standard error from the command are written on
standard output of the local process.  If the -e flag is specified then they are
kept separate.  The -i flag specifies that standard input for the command should
come from the local process.  The -b flag specifies that rsh should be started
in the background.  Some examples:
     rsh bozo
_s_t_a_r_t_s _a_n _i_n_t_e_r_a_c_t_i_v_e _s_h_e_l_l _o_n _t_h_e _m_a_c_h_i_n_e _r_u_n_n_i_n_g _a  _s_h_e_r_v_e_r  _w_i_t_h  _p_o_r_t  _b_o_z_o.
Subsequent  commands that you type will be fed to the remote shell.  You can use
cd to change to a directory on the remote machine,  ls  to  list  files  in  the
remote  directory,  and any other commands you want.  In effect, _r_s_h gives you a
simple form of remote login.  Note that to make this work,  the  remote  process
listening on the port _b_o_z_o must be a shell server (sherver).
   As a second example of rsh, consider
     rsh jumbo cat /etc/passwd
which displays on your screen the file /etc/passwd from the machine running a
sherver with port jumbo.  The rsh command could also have redirected this output
to a local file or pipe.
   A slightly more complex example is
     rsh -i freddo 'cat >/usr/ast/junk' </etc/termcap
which runs the command
     cat >/usr/ast/junk
on the machine running a sherver with port freddo and takes as input the file
/etc/termcap from the local machine.  Note that by quoting the second argument,
it is passed as a single string to the remote sherver.  If the command contains
magic characters (e.g., *.c) the resulting action depends on whether the command
is quoted or not.  If it is not quoted, the local shell will expand the magic
characters before rsh is even called.  If the command is quoted, the command
string is passed unmodified to the remote sherver, which then expands it in the
directory it is currently working in.
   When you log into a remote machine with rsh, you get a shell having the uid
and gid of the sherver (see below).  To get your own uid and gid, type
     exec su george
assuming that your login is george.  If you have a password, su will ask for it.
Needless to say, the su program will use /etc/passwd on the remote machine.  Do
not forget to use exec, as this eliminates the need for an extra shell.  If you
do not need your own uid, do not bother, as it costs memory.


11.13.  SHERVERS

   To enable remote shell operations, it is necessary to have a sherver running
on the destination machine.  Shervers can be started up by:
     sherver port
assuming that sherver is kept in /usr/bin.  This program listens to the port
specified and accepts a single request from the program rsh.  It then executes
the request with the uid and gid of the sherver.  When it is finished, the
sherver exits.
   The sherver gets its input from a pipe.  This means that it can only do those
things  possible  with  a pipe as input. In particular, signals (e.g., DEL), EOF
(e.g., CTRL-D), and the ioctl system call do not  work  properly.   Hitting  DEL
remotely will kill the sherver.  There is no simple solution, except to use stty
to change your DEL character so that you do not hit it out of habit.


11.14.  MASTERS

   Another useful program is master.  It is started up as follows:
     master count uid gid command
This program starts up count copies of the program specified by command with
user id uid and group id gid.  The command may be given parameters.  If at any
time the command exits or dies then master will start up a new invocation of it.
This was designed to work with shervers but has other applications as well.  For
example,
     /usr/bin/master 1 2 2 /etc/sherver mumbo
will start a single sherver listening to the port mumbo and ensure that there is
always a sherver running.  This sherver will have uid=2 and gid=2, so that rsh
calls to mumbo will be executed with this uid/gid combination.  We suggest
starting master in the /etc/rc file of any machine running shervers.  When a
sherver finishes executing a command, it exits.  By having master running in
the background all the time, every time a sherver exits, its parent, master,
will create a new one.  This mechanism is somewhat akin to init creating a new
login process whenever a shell exits.  Since $PATH is generally not set prior to
executing /etc/rc, master should be specified as /usr/bin/master.
   The amount of stack space to give to master (and sherver) is important.  If
it is too little, the programs will behave erratically.  If it is too much,
everything will work fine, but memory will be wasted and there may not be enough
left to run all the programs.  Some experimentation is required.  In any event,
if things act strangely, use chmem to allocate more stack space to these
programs to see if that helps.


11.15.  FILE TRANSFER

   The standard MINIX networking provides for file transfer using a shell script
called rcp (remote cp).  The syntax of the call is
     rcp [port!]from_file [port!]to_file
It can also do local file copy but this is more  easily  accomplished  with  cp.
Here are two examples of rcp usage:
     rcp jumbo!/etc/passwd
     rcp jumbo!/etc/passwd freddo!/usr/ast/pebble
The first one will copy the file /etc/passwd from the machine running a sherver
with the port jumbo to the file passwd in the current directory.  The second one
will copy the file /etc/passwd from the machine running a sherver with the port
jumbo to the file /usr/ast/pebble on the machine running a sherver with the port
freddo.  Thus it is possible to issue commands on machine A to copy files from
machine B to machine C.
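
   The [port!]file notation is easy to take apart.  Since rcp itself is a shell
script, the C function below is only an illustration of the argument syntax,
not part of the distribution.  Its name, its treatment of an argument without
a port as a local file, and its rejection of ports longer than the 6 characters
allowed for rsh ports are assumptions.

```c
#include <assert.h>
#include <string.h>

#define PORT_MAX 6   /* ports are ASCII strings of up to 6 characters */

/* Split an argument of the form [port!]file.  The port is copied to
 * port[]; an empty port means a local file.  Returns a pointer to the
 * file name inside arg, or NULL if the port is longer than PORT_MAX
 * characters or the file name is empty. */
const char *split_rcp_arg(const char *arg, char port[PORT_MAX + 1])
{
    const char *bang;
    size_t len;

    assert(arg != NULL && port != NULL);
    bang = strchr(arg, '!');
    if (bang == NULL) {                 /* no port given: a local file */
        port[0] = '\0';
        return *arg != '\0' ? arg : NULL;
    }
    len = (size_t)(bang - arg);
    if (len > PORT_MAX || bang[1] == '\0')
        return NULL;
    memcpy(port, arg, len);
    port[len] = '\0';
    return bang + 1;
}
```

Given the argument jumbo!/etc/passwd, this function stores jumbo in port and
returns a pointer to /etc/passwd.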


11.16.  REMOTE PIPES

   It is possible to set up remote pipes using the programs 'to' and 'from'.
The program 'to' reads from standard input and writes its output to the named
port.  Similarly, 'from' reads from the named port and writes to standard
output.  For example, consider the following commands, possibly given on two
different machines:
     cat F* | sort | to 'port66'
     from 'port66' | uniq -c | sort -n
The first command concatenates files beginning with 'F', sorts them, and writes
the output to 'port66'.  The second command reads from 'port66' and provides
input to the rest of the pipeline.


11.17.  THE ETHERNET INTERFACE

   The ethernet driver in this version of MINIX is for the Western Digital
EtherCard Plus, which is also known as the WD1003E.  The ethernet controller
chip on this board is the National Semiconductor DP8390.  If you have a
different type of ethernet controller then there are several things you need to
know about the interface between the driver and the Amoeba transaction layer in
order to write a suitable driver for your card.
   Several fundamental assumptions made while designing the high-level protocol
affect the ethernet driver.
   1. The ethernet controller has enough local memory to buffer at least one
      incoming packet and one outgoing packet and will not overwrite a buffer
      with a new incoming packet until the buffer has been released.
   2. Read buffers are released in the same order as they were allocated.
      After a read interrupt has occurred and (*bufread)() has been called,
      bufread will not be called again until an eth_release has been done.
   3. The ethernet driver generates no write interrupts.  This is because we
      found that busy waiting was more efficient than doing a context switch
      and waiting for an interrupt.  By the time the context switch was done,
      the interrupt had already happened, so we had to switch back.  It is
      faster to just wait for it.  On a very slow machine, a different
      strategy might be appropriate.

There are several routines used by the high-level code which should be provided
by the ethernet driver.  Unless otherwise stated, these routines are called in
the file amoeba.c.
   1. etheraddr - gets the ethernet address of this host from ROM.
   2. eth_init - initialises the ethernet card and sets pointers to routines
      to be called on packet arrival and departure.
   3. eth_getbuf - returns a pointer to the next write buffer.
   4. eth_write - writes the current "write buffer" to the net.
   5. eth_release - releases a read buffer for reuse.
   6. eth_stp - shuts up the ethernet chip so that reboot can stop all
      interrupts from the chip.  The normal reboot procedure does not stop the
      WD1003E from running, so the next time interrupts are enabled it makes
      a fuss (called from klib88.s).
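
   To make the shape of this interface concrete, here is a loopback stub in C
that provides the six routines without touching any hardware.  The routine
names come from the list above, but every argument list and return value is a
guess made for this sketch, as are the buffer size, the made-up address, and
the example arrival routine on_packet; consult amoeba.c for the real
declarations before writing a driver.

```c
#include <assert.h>
#include <string.h>

#define ETH_ADDR_SIZE 6
#define ETH_BUF_SIZE  1514             /* assumed maximum packet size */

typedef void (*eth_read_fn)(const unsigned char *pkt, int len);
typedef void (*eth_sent_fn)(void);

static unsigned char write_buf[ETH_BUF_SIZE];  /* the one write buffer */
static unsigned char read_buf[ETH_BUF_SIZE];   /* the one read buffer */
static int read_busy;                  /* read buffer not yet released */
static eth_read_fn bufread;            /* called on packet arrival */
static eth_sent_fn bufwritten;         /* called on packet departure */

/* Get the ethernet address of this host (a real driver reads the ROM). */
void etheraddr(unsigned char addr[ETH_ADDR_SIZE])
{
    static const unsigned char fake[ETH_ADDR_SIZE] = { 2, 0, 0, 0, 0, 1 };
    memcpy(addr, fake, ETH_ADDR_SIZE);
}

/* Initialise the "card" and remember the arrival and departure routines. */
void eth_init(eth_read_fn on_read, eth_sent_fn on_sent)
{
    bufread = on_read;
    bufwritten = on_sent;
    read_busy = 0;
}

/* Return a pointer to the next write buffer. */
unsigned char *eth_getbuf(void) { return write_buf; }

/* Release the read buffer so that it can hold the next packet. */
void eth_release(void) { read_busy = 0; }

/* Write the first len bytes of the write buffer to the "net".  The stub
 * loops the packet straight back, but never overwrites a read buffer
 * that has not been released.  Returns 1 if delivered, 0 if dropped. */
int eth_write(int len)
{
    int delivered = 0;

    assert(len >= 0 && len <= ETH_BUF_SIZE);
    if (!read_busy) {
        memcpy(read_buf, write_buf, (size_t)len);
        read_busy = 1;                 /* held until eth_release() */
        if (bufread != NULL)
            bufread(read_buf, len);
        delivered = 1;
    }
    if (bufwritten != NULL)
        bufwritten();                  /* called directly: no interrupt */
    return delivered;
}

/* Silence the "chip" before a reboot: stop calling back. */
void eth_stp(void) { bufread = NULL; bufwritten = NULL; }

/* Example arrival routine: remember the length of the last packet. */
int last_len = -1;
void on_packet(const unsigned char *pkt, int len) { (void)pkt; last_len = len; }
```

A caller would obtain the buffer with eth_getbuf, fill it, and call eth_write.
In this stub a second eth_write before eth_release is dropped, which mirrors
the rule that bufread is not called again until the read buffer has been
released.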

The files _d_p_8_3_9_0._c, _d_p_8_3_9_0._h, _d_p_8_3_9_0_i_n_f_o._h  and  _d_p_8_3_9_0_s_t_a_t._h  contain  routines
specific to the NS DP8390 chip.  These may need some slight changes before work-
ing correctly with another manufacturer's board which also uses this chip.   The
files  _e_t_h_e_r_p_l_u_s._c  and  _e_t_h_e_r_p_l_u_s._h  contain  routines  specific to the WD1003E
board.



















11.18.  REFERENCES

 1. Birrell, A.D., and Nelson, B.J.: "Implementing Remote Procedure Calls," ACM
  Transactions on Computer Systems, vol. 2, pp. 39-59, Feb. 1984.

 2. Cheriton, D.: "The V Kernel: A Software Base for Distributed Systems," IEEE
  Software Magazine, vol. 1, pp. 19-42, April 1984.

 3. Bal, H.E., Renesse, R. van, and Tanenbaum, A.S.: "Implementing Distributed
  Algorithms using Remote Procedure Call," Proc. National Computer Conference
  AFIPS, pp. 499-505, 1987.

 4. Renesse, R. van, Tanenbaum, A.S., Staveren, H., and Hall, J.: "Connecting
  RPC-Based Distributed Systems using Wide-Area Networks," Proc. Seventh
  International Conf. on Distr. Computer Systems, IEEE, pp. 28-34, 1987.

 5. Tanenbaum, A.S., Mullender, S.J., and van Renesse, R.: "Using Sparse
  Capabilities in a Distributed Operating System," Proc. Sixth International
  Conf. on Distr. Computer Systems, IEEE, 1986.

 6. Mullender, S.J., and Tanenbaum, A.S.: "The Design of a Capability-Based
  Distributed Operating System," Computer Journal, vol. 29, pp. 289-299, Aug.
  1986.

 7. Tanenbaum, A.S., and Renesse, R. van: "Distributed Operating Systems,"
  Computing Surveys, vol. 17, pp. 419-470, Dec. 1985.

 8. Mullender, S.J., and Tanenbaum, A.S.: "A Distributed File Service Based on
  Optimistic Concurrency Control," Proc. Tenth Symp. Oper. Syst. Prin., pp.
  51-62, 1985.

 9. Mullender, S.J., and Tanenbaum, A.S.: "Protection and Resource Control in
  Distributed Operating Systems," Computer Networks, vol. 8, pp. 421-432, Oct.
  1984.

10. Mullender, S.J., Rossum, G. van, Tanenbaum, A.S., Renesse, R. van, Staveren,
  H. van: "Amoeba: A Distributed Operating System for the 1990s," IEEE Computer
  Magazine, May 1990.

11. Tanenbaum, A.S., Renesse, R. van, Staveren, H. van, Sharp, G.J., Mullender,
  S.J., Jansen, A.J., and Rossum, G. van: "Experiences with the Amoeba
  Distributed Operating System," Communications of the ACM.

12. Tanenbaum, A.S.: Computer Networks, 2nd ed., Englewood Cliffs, NJ:
  Prentice-Hall, 1989.













