







               TThhee 44..44BBSSDD NNFFSS IImmpplleemmeennttaattiioonn

                        Rick Macklem
                    _U_n_i_v_e_r_s_i_t_y _o_f _G_u_e_l_p_h



                          AABBSSTTRRAACCTT

     The  4.4BSD  implementation  of the Network File System
(NFS)1  is intended to interoperate with other NFS Version 2
Protocol (RFC1094) implementations but also allows use of an
alternate  protocol  that is hoped to provide better perfor-
mance in certain environments.  This paper  will  informally
discuss  these  various  protocol  features  and  their use.
There is a brief overview of the implementation followed  by
several sections on various problem areas related to NFS and
some hints on how to deal with them.

     Not Quite NFS (NQNFS) is an NFS like protocol  designed
to  maintain  full  cache  consistency  between clients in a
crash tolerant manner. It is an adaptation of the NFS proto-
col such that the server supports both NFS and NQNFS clients
while maintaining full consistency between  the  server  and
NQNFS  clients.   It  borrows  heavily  from  work  done  on
Spritely-NFS [Srinivasan89], but  uses  Leases  [Gray89]  to
avoid  the  need to recover server state information after a
crash.


11..  NNFFSS IImmpplleemmeennttaattiioonn

     The 4.4BSD implementation of NFS and the alternate pro-
tocol  nicknamed  Not Quite NFS (NQNFS) are kernel resident,
but make use of a few system daemons.  The kernel  implemen-
tation does not use an RPC library, handling the RPC request
and reply messages directly in mbuf data areas.  NFS inter-
faces to the network using sockets via the kernel interface
available in sys/kern/uipc_syscalls.c as sosend(),
soreceive(), etc.  There are connection management routines
for
support of sockets for  connection  oriented  protocols  and
timeout/retransmit  support  for  datagram  sockets  on  the
client side.  For connection oriented  transport  protocols,
such  as  TCP/IP, there is one connection for each client to
server mount point that is maintained until an  umount.   If
the  connection  breaks, the client will attempt a reconnect
with a new socket.  The client side can operate without  any
daemons running, but performance will be improved by running
nfsiod daemons that perform read-aheads and write-behinds.
For the server side to function, the daemons portmap, mountd
and nfsd must be running.  The mountd daemon performs two
important functions.
____________________
   1Network File System (NFS) is believed to be a registered
trademark of Sun Microsystems Inc.

1)   Upon startup and after a hangup signal, mountd reads
     the exports file and pushes the export information for
     each local file system down into the kernel via the
     mount system call.

2)   Mountd handles remote mount protocol (RFC1094, Appendix
     A) requests.

The nfsd master daemon forks off children that enter the
kernel via the nfssvc system call.  The children normally
remain kernel resident, providing a process context for the
NFS RPC servers.  The only exception to this is when a Ker-
beros [Steiner88] ticket is received; at that time the nfsd
exits the kernel temporarily to verify the ticket via the
Kerberos libraries and then returns to the kernel with the
results.  (This only happens for Kerberos mount points as
described further under Security.)  Meanwhile, the master
nfsd waits to accept new connections from clients using con-
nection  oriented  transport  protocols  and  passes the new
sockets down into the kernel.   The  client  side  mount_nfs
along  with portmap and mountd are the only parts of the NFS
subsystem that make any use of the Sun RPC library.

22..  MMoouunntt PPrroobblleemmss

     There are several problems that can be  encountered  at
the time of an NFS mount, ranging from an unresponsive NFS
server (crashed, network partitioned from client,  etc.)  to
various  interoperability  problems  between  different  NFS
implementations.

     On the server side, if the 4.4BSD NFS  server  will  be
handling any PC clients, mountd will require the -n option
to enable non-root mount request servicing.   Running  of  a
pcnfsd2 daemon will also  be  necessary.   The  server  side
requires  that  the  daemons  mountd and nfsd be running and
that they be registered with portmap properly.  If  problems
are  encountered,  the safest fix is to kill all the daemons
and then restart them in the order portmap, mountd and nfsd.
Other  server  side problems are normally caused by problems
with the format of the exports file, which is covered  under
Security and in the exports man page.
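For example, a recovery sequence on the server might look like
the following sketch (daemon flags and paths vary between sys-
tems; the nfsd server count shown is arbitrary, and the flags
are illustrative, not authoritative -- see the nfsd man page):

```
# kill the running daemons, then restart in dependency order
kill <pids of nfsd, mountd and portmap>
portmap
mountd
nfsd -u -t -n 4        # illustrative flags; see nfsd(8)
rpcinfo -p             # verify mountd and nfs are registered
```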
____________________
   2Pcnfsd is available in source form from Sun Microsys-
tems and many anonymous ftp sites.
     On the client side, there are several mount options
useful for dealing with server problems.  In cases where a
file system is not critical for system operation, the -b
mount option may be specified so that mount_nfs will go into
the background for a mount attempt on an unresponsive
server.  This is useful for mounts specified in fstab(5), so
that the system will not get hung while booting doing mount
-a because a file server is not responsive.  On the other
hand, if the file system is critical to system operation,
this option should not be used so that the client will wait
for the server to come up before completing bootstrapping.
There are also three mount options to help deal with inter-
operability issues with various non-BSD NFS servers.  The -P
option specifies that the NFS client use a reserved IP port
number to satisfy some servers' security requirements.3 The
-c option stops the NFS client from doing a connect on the
UDP socket, so that the mount works with servers that send
NFS replies from port numbers other than the standard 2049.4
Finally, the -g=num option sets the maximum size of the
group list in the credentials passed to an NFS server in
every RPC request.  Although RFC1057 specifies a maximum
size of 16 for the group list, some servers can't handle
that many.  If a user, particularly root doing a mount,
keeps getting access denied from a file server, try tempo-
rarily reducing the number of groups that user is in to
less than 5 by editing /etc/group.  If the user can then
access the file system, slowly increase the number of
groups for that user until the limit is found and then peg
the limit there with the -g=num option.  This implies that
the server will only see the first num groups that the user
is in, which can cause some accessibility problems.
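Put together, a client invocation using these options might
look like the following sketch (the server name and paths are
hypothetical):

```
# background the attempt, use a reserved port, cap group list
mount_nfs -b -P -g=8 fserv:/usr/src /usr/src
```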

     For sites that have many NFS servers, amd [Pendry93] is
a useful administration tool.  It also reduces the number of
actual  NFS mount points, alleviating problems with commands
such as df(1) that hang when  any  of  the  NFS  servers  is
unreachable.

33..  DDeeaalliinngg wwiitthh HHuunngg SSeerrvveerrss

     There  are  several  mount  options available to help a
client deal with being hung  waiting  for  response  from  a
crashed  or  unreachable5  server.  By default, a hard mount
will continue to try to contact the  server  ``forever''  to
complete  the system call. This type of mount is appropriate
when processes on the client that access files in the file
system do not tolerate file I/O system calls that return -1
with errno == EINTR and/or access to the file system is
critical for normal system operation.
____________________
   3Any security benefit of this is highly questionable and
as such the BSD server does not require a client to use a
reserved port number.
   4The Encore Multimax is known to require this.
   5Due to a network partitioning or similar.

There are two other alternatives:

1)   A soft mount (-s option) retries an RPC n times and
     then the corresponding system call returns -1 with
     errno set to EINTR.  For TCP transport, the actual RPC
     request is not retransmitted, but the timeout intervals
     waiting for a reply from the server are done in the
     same manner as UDP for this purpose.  The problem with
     this type of mount is that most applications do not
     expect an EINTR error return from file I/O system calls
     (since it never occurs for a local file system) and get
     confused by the error return from the I/O system call.
     The option -x=num is used to set the RPC retry limit
     and if set too low, the error returns will start occur-
     ring whenever the NFS server is slow due to heavy load.
     Alternately, a large retry limit can result in a
     process hung for a long time, due to a crashed server
     or network partitioning.

2)   An interruptible mount (-i option) checks to see if a
     termination signal is pending for the process when
     waiting for server response and if it is, the I/O sys-
     tem call posts an EINTR.  Normally this results in the
     process being terminated by the signal when returning
     from the system call.  This feature allows you to
     ``^C'' out of processes that are hung due to unrespon-
     sive servers.  The problem with this approach is that
     signals that are caught by a process are not recognized
     as termination signals and the process will remain
     hung.6
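As a sketch, the two alternatives might be requested as fol-
lows (hostnames and paths are hypothetical):

```
# soft mount: fail I/O with EINTR after 10 RPC retries
mount_nfs -s -x=10 fserv:/home /home

# interruptible mount: a pending termination signal
# (e.g. ^C) aborts the wait
mount_nfs -i fserv:/home /home
```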

44..  RRPPCC TTrraannssppoorrtt IIssssuueess

     The NFS Version 2 protocol runs over  UDP/IP  transport
by   sending   each  Sun  Remote  Procedure  Call  (RFC1057)
request/reply message in a single UDP  datagram.  Since  UDP
does  not  guarantee datagram delivery, the Remote Procedure
Call (RPC) layer times out and retransmits an RPC request if
no  RPC reply has been received. Since this round trip time-
out (RTO) value is for the entire RPC  operation,  including
RPC  message  transmission  to  the  server,  queuing at the
server for an nfsd, performing the RPC and sending  the  RPC
reply message back to the client, it can be highly variable
for even a moderately loaded NFS server.  As a result, the
RTO interval must be a conservative (large) estimate, in
order to avoid extraneous RPC request retransmits.7 Also,
with an 8Kbyte read/write data size (the default), the
read/write reply/request will be an 8+Kbyte UDP datagram
that must normally be fragmented at the IP layer for trans-
mission.8 For IP fragments to be successfully reassembled
into the IP datagram at the receive end, all fragments must
be received within a fairly short ``time to live''.  If one
fragment is lost/damaged in transit, the entire RPC must be
retransmitted and redone.  This problem can be exacerbated
by a network interface on the receiver that cannot handle
the reception of back to back network packets. [Kent87a]
____________________
   6Unfortunately, there are also some resource allocation
situations in the BSD kernel where the termination signal
will be ignored and the process will not terminate.

     There are several tuning mount options on the client
side that can prove useful when trying to alleviate perfor-
mance problems related to UDP RPC transport.  The options
-r=num and -w=num specify the maximum read or write data
size respectively.  The size num should be a power of 2 (4K,
2K, 1K) and adjusted downward from the maximum of 8Kbytes
whenever IP fragmentation is causing problems.  The best
indicator of IP fragmentation problems is a significant num-
ber of fragments dropped after timeout reported by the ip:
section of a netstat -s command on either the client or
server.  Of course, if the fragments are being dropped at
the server, it can be fun figuring out which client(s) are
involved.  The most likely candidates are clients that are
not on the same local area network as the server or have
network interfaces that do not receive several back to back
network packets properly.
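The counter can be picked out of the netstat -s output with a
simple filter; the sample text below merely stands in for
live output and only sketches the shape of the ip: section:

```shell
# extract the "fragments dropped after timeout" counter from
# netstat -s style output; the variable is a stand-in for
# `netstat -s` on a live system
sample='ip:
        1252 total packets received
        17 fragments dropped after timeout
udp:
        0 bad checksums'
echo "$sample" | awk '/fragments dropped after timeout/ { print $1 }'
```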

     By default, the 4.4BSD NFS client dynamically estimates
the retransmit timeout interval for the RPC and this appears
to work reasonably well for many environments.  However, the
-d flag can be specified to turn off the dynamic estimation
of retransmit timeout, so that the client will use a static
initial timeout interval.9  The -t=num option can be used
with -d to set the initial timeout interval to other than
the default of 2 seconds.  The best indicator that dynamic
estimation should be turned off would be a significant num-
ber10 in the X Replies field and a large number in the
Retries field in the Rpc Info: section as reported by the
nfsstat command.  On the server, there would be significant
numbers of Inprog recent request cache hits in the Server
Cache Stats: section, as reported by nfsstat when run on
the server.
____________________
   7At best, an extraneous RPC request retransmit increases
the load on the server and at worst can result in damaged
files on the server when non-idempotent RPCs are redone
[Juszczak89].
   8Six IP fragments for an Ethernet, which has a maximum
transmission unit of 1500 bytes.
   9After the first retransmit timeout, the initial inter-
val is backed off exponentially.

     The tradeoff is that a smaller timeout interval results
in  a  better  average  RPC response time, but increases the
risk of extraneous retries that in turn increase server load
and  the  possibility  of damaged files on the server. It is
probably best to err on the safe side and use  a  large  (>=
2sec)  fixed timeout if the dynamic retransmit timeout esti-
mation seems to be causing problems.

     An alternative to all this fiddling is to run NFS over
TCP transport instead of UDP.  Since the 4.4BSD TCP imple-
mentation provides reliable delivery with congestion con-
trol, it avoids all of the above problems.  It also permits
the use of read and write data sizes greater than the
8Kbyte limit for UDP transport.11 NFS over TCP usually
delivers performance comparable to or significantly better
than NFS over UDP, unless the client or server processor
runs at less than 5-10MIPS.  For a slow processor, the
extra CPU overhead of using TCP transport will become sig-
nificant and TCP transport may only be useful when the
client to server interconnect traverses congested gateways.
The main problem with using TCP transport is that it is
only supported between BSD clients and servers.12
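Selecting TCP transport is a single mount option on a 4.4BSD
client; for example (hypothetical server and path):

```
# NFS over TCP; sidesteps UDP fragmentation and RTO tuning
mount_nfs -T fserv:/usr/src /usr/src
```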

55..  OOtthheerr TTuunniinngg TTrriicckkss

     Another mount option that may improve performance over
certain network interconnects is -a=num which sets the num-
ber of blocks that the system will attempt to read-ahead
during sequential reading of a file.  The default value of 1
seems to be appropriate for most situations, but a larger
value might achieve better performance for some environ-
ments, such as a mount to a server across a ``high bandwidth
* round trip delay'' interconnect.

     For  the  adventurous,  playing  with  the  size of the
buffer cache can also improve performance for some  environ-
ments  that use NFS heavily.  Under some workloads, a buffer
cache of 4-6Mbytes can result in significant performance
improvements over 1-2Mbytes, both in client side system
call response time and reduced server RPC load.  The buffer
cache size defaults to 10% of physical memory, but this can
be overridden by specifying the BUFPAGES option in the
machine's config file.13  When increasing BUFPAGES, it is
also advisable to increase the number of buffers NBUF by a
corresponding amount.  Note that there is a tradeoff between
memory allocated to the buffer cache and memory available
for paging, which implies that making the buffer cache
larger will increase the paging rate, with possibly disas-
trous results.
____________________
   10Even 0.1% of the total RPCs is probably significant.
   11Read/write data sizes greater than 8Kbytes will not
normally improve performance unless the kernel constant
MAXBSIZE is increased and the file system on the server has
a block size greater than 8Kbytes.
   12There are rumors of commercial NFS over TCP implemen-
tations on the horizon and these may well be worth explor-
ing.
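In the kernel config file, the override might look like the
following sketch (the values are examples only, not recommen-
dations; BUFPAGES is counted in machine pages):

```
options         BUFPAGES=1024   # buffer cache of 1024 * NBPG bytes
options         NBUF=512        # buffer headers, scaled with BUFPAGES
```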

66..  SSeeccuurriittyy IIssssuueess

     When a machine is running an NFS server it opens up a
great big security hole.  For ordinary NFS, the server
receives client credentials in the RPC request as a user id
and a list of group ids and trusts them to be authentic!
The only tool available to restrict remote access to file
systems is the exports(5) file, so file systems should be
exported with great care.  The exports file is read by
mountd upon startup and after a hangup signal is posted for
it, and then as much of the access specification as possi-
ble is pushed down into the kernel for use by the nfsd(s).
The trick here is that the kernel information is stored on
a per local file system mount point and client host address
basis and cannot refer to individual directories within the
local server file system.  It is best to think of the
exports file as referring to the various local file systems
and not just directory paths as mount points.  A local file
system may be exported to a specific host, all hosts that
match a subnet mask or all other hosts (the world).  The
latter is very dangerous and should only be used for public
information.  It is also strongly recommended that file
systems exported to ``the world'' be exported read-only.
For each host or group of hosts, the file system can be
exported read-only or read/write.  You can also define one
of three client user id to server credential mappings to
help control access.  Root (user id == 0) can be mapped to
some default credentials while all other user ids are
accepted as given.  If the default credentials for user id
zero are root, then there is essentially no remapping.
Most NFS file systems are exported this way, most commonly
mapping user id == 0 to the credentials for the user
nobody.  Since the client user id and group id list is used
unchanged on the server (except for root), this also
implies that the user id and group id
space must be common between the client and server.  (i.e.
user id N on the client must refer to the same user on the
server.)  All user ids can be mapped to a default set of
credentials, typically that of the user nobody.  This
essentially gives world access to all users on the corre-
sponding hosts.
____________________
   13BUFPAGES is the number of physical machine pages allo-
cated to the buffer cache; i.e. BUFPAGES * NBPG = buffer
cache size in bytes.
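As a sketch, an exports file exercising these mappings might
read as follows (the hosts and paths are hypothetical; see
the exports man page for the authoritative syntax):

```
# read/write for one client, root mapped to nobody (common case)
/usr -maproot=nobody client1.cis.uoguelph.ca
# read-only to the world: public information only
/pub -ro -mapall=nobody
```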

     There is also a non-standard BSD -kerb export option
that requires that the client provide a KerberosIV rcmd
service ticket to authenticate the user on the server.  If
successful, the Kerberos principal is looked up in the
server's password and group databases to get a set of cre-
dentials and a map of client userid to these credentials is
then cached.  The use of TCP transport is strongly recom-
mended, since the scheme depends on the TCP connection to
avert replay attempts.  Unfortunately, this option is only
usable between BSD clients and servers since it is not com-
patible with other known ``kerberized'' NFS systems.  To
enable use of this Kerberos option, both mount_nfs on the
client and nfsd on the server must be rebuilt with the
-DKERBEROS option and linked to KerberosIV libraries.  The
file system is then exported to the client(s) with the
-kerb option in the exports file on the server and the
client mount specifies the -K and -T options.  The -m=realm
mount option may be used to specify a Kerberos Realm for
the ticket (it must be the Kerberos Realm of the server)
that is other than the client's local Realm.  To access
files in a -kerb mount point, the user must have a valid
TGT for the server's Realm, as provided by kinit or
similar.
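Putting the pieces together, a Kerberos mount might be set up
as in the following sketch (hostnames, path and realm are
hypothetical):

```
# on the server, in the exports file:
/usr/homes -kerb client1.cis.uoguelph.ca

# on the client, after kinit has obtained a TGT for the
# server's realm:
mount_nfs -K -T -m=SERVER.REALM fserv:/usr/homes /usr/homes
```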

     As well as the standard NFS Version 2 protocol
(RFC1094) implementation, BSD systems can use a variant of
the protocol called Not Quite NFS (NQNFS) that supports a
variety of protocol extensions.  This protocol uses 64bit
file offsets and sizes, an access rpc, an append option on
the write rpc and extended file attributes to support 4.4BSD
file system functionality more fully.  It also makes use of
a variant of short term leases [Gray89] with delayed write
client caching, in an effort to provide full cache consis-
tency and better performance.  This protocol is available
between 4.4BSD systems only and is used when the -q mount
option is specified.  It can be used with any of the afore-
mentioned options for NFS, such as TCP transport (-T) and
KerberosIV authentication (-K).  Although this protocol is
experimental, it is recommended over NFS for mounts between
4.4BSD systems.14
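An NQNFS mount is requested with the -q option and combines
freely with the others; for example (hypothetical names):

```
# NQNFS between 4.4BSD systems, over TCP with Kerberos
mount_nfs -q -T -K fserv:/usr/src /usr/src
```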
____________________
   14I would appreciate email from anyone who can provide
NFS vs. NQNFS performance measurements, particularly fast
clients, many clients or over an internetwork connection
with a large ``bandwidth * RTT'' product.
77..  MMoonniittoorriinngg NNFFSS AAccttiivviittyy

     The basic command for monitoring NFS activity on
clients and servers is nfsstat.  It reports cumulative sta-
tistics of various NFS activities, such as counts of the
various different RPCs and cache hit rates on the client and
server.  Of particular interest on the server are the fields
in the Server Cache Stats: section, which gives numbers for
RPC retries received in the first three fields and total
RPCs in the fourth.  The first three fields should remain a
very small percentage of the total.  If not, it would indi-
cate one or more clients doing retries too aggressively and
the fix would be to isolate these clients, disable the
dynamic RTO estimation on them and make their initial time-
out interval a conservative (i.e. large) value.

     On the client side, the fields in the Rpc Info: section
are of particular interest, as they give an overall picture
of NFS activity.  The TimedOut field is the number of I/O
system calls that returned -1 for ``soft'' mounts and can be
reduced by increasing the retry limit or changing the mount
type to ``intr'' or ``hard''.  The Invalid field is a count
of trashed RPC replies that are received and should remain
zero.15  The X Replies field counts the number of repeated
RPC replies received from the server and is a clear indica-
tion of a too aggressive RTO estimate.  Unfortunately, a
good NFS server implementation will use a ``recent request
cache'' [Juszczak89] that will suppress the extraneous
replies.  A large value for Retries indicates a problem, but
it could be any of:

o    a too aggressive RTO estimate

o    an overloaded NFS server

o    IP fragments being dropped (gateway, client or server)

and requires further investigation.  The Requests field is
the total count of RPCs done on all servers.
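Since nfsstat reports cumulative counters, one quick check is
to compare Retries against Requests.  The sketch below does
the arithmetic with hand-supplied counter values (the exact
layout of nfsstat output varies, so no parsing is attempted)
and flags a retry rate above 1% of total RPCs:

```shell
# flag a retry rate above 1%, given the two nfsstat counters
retries=350
requests=20000
awk -v r="$retries" -v t="$requests" 'BEGIN {
    pct = 100 * r / t
    printf "retry rate %.2f%%%s\n", pct, (pct > 1 ? " - investigate" : "")
}'
```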

     The nneettssttaatt --ss comes in useful during investigation  of
RPC  transport  problems.  The field _f_r_a_g_m_e_n_t_s _d_r_o_p_p_e_d _a_f_t_e_r
_t_i_m_e_o_u_t in the _i_p_: section indicates IP fragments are  being
lost  and  a significant number of these occurring indicates
that the use of TCP transport or a smaller  read/write  data
size  is  in  order.   A significant number of _b_a_d _c_h_e_c_k_s_u_m_s
reported in the _u_d_p_: section would suggest network  problems
of  a  more  generic sort.  (cabling, transceiver or network
hardware interface problems or similar.)
____________________
   15Some NFS implementations run with UDP checksums dis-
abled, so garbage RPC messages can be received.

     There is an RPC activity logging facility for both the
client and server side in the kernel.  When logging is
enabled by setting the kernel variable nfsrtton to one, the
logs in the kernel structures nfsrtt (for the client side)
and nfsdrt (for the server side) are updated upon the com-
pletion of each RPC in a circular manner.  The pos element
of the structure is the index of the next element of the log
array to be updated.  In other words, elements of the log
array from log[pos] to log[pos - 1] are in chronological
order.  The include file <sys/nfsrtt.h> should be consulted
for details on the fields in the two log structures.16
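The chronological-order rule for the circular logs can be
demonstrated with a toy buffer: starting at pos (the next
slot to be overwritten) and stepping (pos + i) mod n visits
entries oldest-first.  A sketch, using letters in place of
log entries:

```shell
# 7-entry circular log: E F G occupy slots 0-2 (written most
# recently, after the buffer wrapped), so with pos=3 the
# oldest surviving entry is A.
printf '%s\n' E F G A B C D |
awk -v pos=3 '{ buf[NR - 1] = $0 }
    END { n = NR
          for (i = 0; i < n; i++) printf "%s", buf[(pos + i) % n]
          print "" }'
```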

88..  DDiisskklleessss CClliieenntt SSuuppppoorrtt

     The NFS client does include kernel support for disk-
less/dataless operation where the root file system and
optionally the swap area is remote NFS mounted.  A disk-
less/dataless client is configured using a version of the
``swapvmunix.c'' file as provided in the directory con-
trib/diskless.nfs.  If the swap device == NODEV, it speci-
fies an NFS mounted swap area and should be configured the
same size as set up by diskless_setup when run on the
server.  This file must be put in the sys/com-
pile/<machine_name> kernel build directory after the config
command has been run, since config does not know about spec-
ifying NFS root and swap areas.  The kernel variable mount-
root must be set to nfs_mountroot instead of ffs_mountroot
and the kernel structure nfs_diskless must be filled in
properly.  There are some primitive system administration
tools in the contrib/diskless.nfs directory to assist in
filling in the nfs_diskless structure and in setting up an
NFS server for diskless/dataless clients.  The tools were
designed to provide a bare bones capability, to allow maxi-
mum flexibility when setting up different servers.

The tools are as follows:

o    diskless_offset.c - This little program reads a ``vmu-
     nix'' object file and writes the file byte offset of
     the nfs_diskless structure in it to standard out.  It
     was kept separate because it sometimes has to be com-
     piled/linked in funny ways depending on the client
     architecture.  (See the comment at the beginning of
     it.)


o    diskless_setup.c - This program is run on the server
     and sets up files for a given client.  It mostly just
     fills in an nfs_diskless structure and writes it out to
     either the ``vmunix'' file or a separate file called
     /var/diskless/setup.<official-hostname>.

o    diskless_boot.c - There are two functions in here that
     may be used by a bootstrap server such as tftpd to per-
     mit sharing of the ``vmunix'' object file for similar
     clients.  This saves disk space on the bootstrap server
     and simplifies organization, but is not critical for
     correct operation.  The functions read the ``vmunix''
     file, but optionally fill in the nfs_diskless structure
     from a separate ``setup.<official-hostname>'' file so
     that there is only one copy of ``vmunix'' for all simi-
     lar (same architecture etc.) clients.  These functions
     use a text file called /var/diskless/boot.<official-
     hostname> to control the netboot.
____________________
   16Unfortunately, a monitoring tool that uses these logs
is still in the planning (dreaming) stage.

The basic setup steps are:

o    make a ``vmunix'' for the client(s) with mountroot() ==
     nfs_mountroot() and swdevt[0].sw_dev == NODEV if it is
     to do nfs swapping as well (see the same swapvmunix.c
     file)

o    run diskless_offset on the vmunix file to find out the
     byte offset of the nfs_diskless structure

o    run diskless_setup on the server to set up the server
     and fill in the nfs_diskless structure for that client.
     The nfs_diskless structure can either be written into
     the vmunix file (the -x option) or saved in /var/disk-
     less/setup.<official-hostname>.

o    Set up the bootstrap server.  If the nfs_diskless
     structure was written into the ``vmunix'' file, any
     vanilla bootstrap protocol such as bootp/tftp can be
     used.  If the bootstrap server has been modified to use
     the functions in diskless_boot.c, then a file called
     /var/diskless/boot.<official-hostname> must be created.
     It is simply a two line text file, where the first line
     is the pathname of the correct ``vmunix'' file and the
     second line has the pathname of the nfs_diskless
     structure file and its byte offset in it.  For example:
             /var/diskless/vmunix.pmax
             /var/diskless/setup.rickers.cis.uoguelph.ca 642308

o    Create a /var subtree for each client in an appropriate
     place on the server, such as /var/diskless/var/<client-
     hostname>/...  By using the <client-hostname> to dif-
     ferentiate /var for each host, /etc/rc can be modified
     to mount the correct /var from the server.

99..   NNoott  QQuuiittee  NNFFSS,,  CCrraasshh  TToolleerraanntt CCaacchhee CCoonnssiisstteennccyy ffoorr
NNFFSS

     Not Quite NFS (NQNFS) is an NFS like protocol  designed
to  maintain  full  cache  consistency  between clients in a
crash tolerant manner.  It is an adaptation of the NFS  pro-
tocol  such  that  the  server  supports  both NFS and NQNFS
clients  while  maintaining  full  consistency  between  the
server and NQNFS clients.  This section borrows heavily from
work done on Spritely-NFS [Srinivasan89],  but  uses  Leases
[Gray89]  to avoid the need to recover server state informa-
tion after a crash.  The reader is  strongly  encouraged  to
read  these  references  before trying to grasp the material
presented here.

99..11..  OOvveerrvviieeww

     The protocol maintains cache  consistency  by  using  a
somewhat  Sprite-like  [Nelson88]  protocol, but is based on
short term leases17 instead of hard state information  about
open files.  The basic principle is that the  protocol  will
disable client caching of a file whenever that file is write
shared18.  Whenever a client wishes to cache data for a file
it must hold a  valid  lease.   There  are  three  types  of
leases:  read  caching,  write caching and non-caching.  The
latter type requires that all file operations be  done  syn-
chronously  with the server via RPCs.  A read caching lease
allows for client data caching, but  no  file  modifications
may  be  done.   A  write  caching  lease  allows for client
caching of writes, but requires that all writes be pushed to
the  server  when  the lease expires.  If a client has dirty
buffers19  when  a  write cache lease has almost expired, it
will attempt to extend the lease but is required to push the
dirty  buffers  if extension fails.  A client gets leases by
either doing a GGeettLLeeaassee RRPPCC or by  piggybacking  a  GGeettLLeeaassee
RReeqquueesstt  onto another RPC. Piggybacking is supported for the
frequent RPCs  Getattr,  Setattr,  Lookup,  Readlink,  Read,
Write  and  Readdir  in  an effort to minimize the number of
GGeettLLeeaassee RRPPCCss required.  All leases are at  the  granularity
of  a  file,  since all NFS RPCs operate on individual files
and NFS has no intrinsic notion of a file hierarchy.  Direc-
tories,  symbolic  links  and  file  attributes  may be read
____________________
   17  A  lease  is  a ticket permitting an activity that is
valid until some expiry time.
   18 Write sharing occurs when at least one client is modi-
fying a file while other client(s) are reading the file.
   19 Cached write data is not yet pushed (written)  to  the
server.

cached but are not write cached.  The exception here is  the
attribute  file_size, which is updated during cached writing
on the client to reflect a growing file.

     It is the server's responsibility to ensure  that  con-
sistency  is maintained among the NQNFS clients by disabling
client caching whenever a server file operation would  cause
inconsistencies.  Inconsistencies can arise whenever a client
has a write caching lease and any other client, or  a  local
operation on the server, tries to access the file,  or  when
a modify operation is attempted on a  file  being  read
cached by client(s).  At this time, the server
sends an eevviiccttiioonn nnoottiiccee to all clients  holding  the  lease
and  then  waits  for  lease termination.  Lease termination
occurs when a vvaaccaatteedd tthhee pprreemmiisseess message has been received
from  all the clients that have signed the lease or when the
lease expires  via  timeout.   The  message  pair  eevviiccttiioonn
nnoottiiccee  and  vvaaccaatteedd  tthhee  pprreemmiisseess  roughly correspond to a
Sprite server->client callback, but are not  implemented  as
an  actual RPC, to avoid the server waiting indefinitely for
a reply from a dead client.

     Server consistency checking can be  viewed  as  issuing
intrinsic  leases  for  a file operation for the duration of
the operation only. For example, the CCrreeaattee RRPPCC will get  an
intrinsic  write lease on the directory in which the file is
being created, disabling client read caches for that  direc-
tory.

     By  relegating  this responsibility to the server, con-
sistency between the server and NQNFS clients is  maintained
when NFS clients are modifying the file system as well.20

     The leases are issued as time intervals  to  avoid  the
requirement  of time of day clock synchronization. There are
three important time constants known to the server. The mmaaxx--
iimmuumm__lleeaassee__tteerrmm  sets an upper bound on lease duration.  The
cclloocckk__sskkeeww is added to all lease terms on the server to cor-
rect  for  differing  clock  speeds  between  the client and
server and wwrriittee__ssllaacckk is the number of seconds  the  server
is  willing  to  wait  for  a  client  with an expired write
caching lease to push dirty writes.

     The server maintains a mmooddiiffyy__rreevviissiioonn number for  each
file.  It is defined as an unsigned quadword integer that is
never zero and that must increase whenever the corresponding
file is modified on the server.  It is used by the client to
determine whether or not cached data for the file is  stale.
____________________
   20 The NFS clients will continue to be _a_p_p_r_o_x_i_m_a_t_e_l_y con-
sistent with the server.

Generating  this value is easier said than done. The current
implementation  uses  the  following  technique,  which   is
believed  to be adequate.  The high order longword is stored
in the ufs inode and is initialized to one when an inode  is
first  allocated.   The low order longword is stored in main
memory only and is initialized to zero when an inode is read
in  from disk.  When the file is modified for the first time
within a given second of wall clock  time,  the  high  order
longword  is  incremented  by one and the low order longword
reset to zero.  For subsequent modifications within the same
second  of wall clock time, the low order longword is incre-
mented. If the low order longword wraps around to zero,  the
high  order  longword  is incremented again.  Since the high
order longword only increments once per second and the inode
is  pushed to disk frequently during file modification, this
implies 0 <= Current-Disk <= 5.  When the inode is  read  in
from  disk,  10  is  added to the high order longword, which
ensures that the quadword is greater than any value it could
have had before a crash.  This introduces apparent modifica-
tions every time the inode falls out of the LRU inode cache,
but  this  should only reduce the client caching performance
by a (hopefully) small margin.

99..22..  CCrraasshh RReeccoovveerryy aanndd ootthheerr FFaaiilluurree SScceennaarriiooss

     The server must maintain the state of all  the  current
leases  held  by  clients.   The nice thing about short term
leases is that maximum_lease_term seconds after  the  server
stops  issuing leases, there are no current leases left.  As
such, server crash  recovery  does  not  require  any  state
recovery. After rebooting, the server refuses to service any
RPCs except for writes until write_slack seconds  after  the
last  lease would have expired21.  By then, the server would
not have any outstanding leases to recover the state of  and
the  clients  have  had at least write_slack seconds to push
dirty writes to the server and get the server sync'd  up  to
date.  After  this, the server simply services requests in a
manner similar to NFS.  In an effort to minimize the  effect
of   "recovery   storms"   [Baker91],   the  server  replies
ttrryy__aaggaaiinn__llaatteerr to the RPCs it is not yet ready to  service.

     After a client crashes, the server may have to wait for
a lease to timeout before servicing a request if write shar-
ing  of  a file with a cachable lease on the client is about
to occur.  As for the client, it simply  starts  up  getting
any  leases  it  now  needs. Any outstanding leases for that
client on the server prior  to  the  crash  will  either  be
____________________
   21  The last lease expiry time may be safely estimated as
"boottime+maximum_lease_term+clock_skew" for  machines  that
cannot store it in nonvolatile RAM.

renewed or expire via timeout.

     Certain network partitioning failures are more problem-
atic. If a client to server network  connection  is  severed
just before a write caching lease expires, the client cannot
push the dirty writes to the server. After the lease expires
on  the  server,  the server permits other clients to access
the file with the potential of getting stale data.  Unfortu-
nately  I  believe this failure scenario is intrinsic in any
delayed write caching scheme unless the server is required to
wait ffoorreevveerr for a client to regain  contact22.   Since  the
write  caching lease has expired on the client, it will sync
up with the server as soon as  the  network  connection  has
been re-established.

     There  is another failure condition that can occur when
the server is congested.  The worst case scenario would have
the  client  pushing  dirty writes to the server but a large
request queue on the server delays  these  writes  for  more
than wwrriittee__ssllaacckk seconds. It is hoped that a congestion con-
trol scheme using the ttrryy__aaggaaiinn__llaatteerr RPC reply after  boot-
ing  combined  with the following lease termination rule for
write caching leases can minimize the risk  of  this  occur-
rence.   A  write  caching  lease  is only terminated on the
server when there have been no writes to the  file  and  the
server  has  not  been  overloaded  during  the  previous
write_slack seconds.  "The server has not  been  overloaded"
is approximated by a test for sleeping nfsd(s) at the end of
the write_slack period.

99..33..  SSeerrvveerr DDiisskk FFuullll

     There is a serious unresolved problem for delayed write
caching  with respect to server disk space allocation.  When
the disk on the file server is full, delayed write RPCs  can
fail  due  to  "out  of  space".   For  NFS, this occurrence
results in an error return from the close system call on the
file, since the dirty blocks are pushed on close.  Processes
writing important files can check for this error  return  to
ensure  that  the file was written successfully.  For NQNFS,
the dirty blocks are not pushed on close  and  as  such  the
client may not attempt the write RPC until after the process
has done the close which implies no error  return  from  the
close.   For  the current prototype, the only solution is to
modify programs writing important file(s) to call fsync  and
check for an error return from it instead of close.


____________________
   22  Gray and Cheriton avoid this problem by using a wwrriittee
tthhrroouugghh policy.












SSMMMM::0066--1166                      TThhee 44..44BBSSDD NNFFSS IImmpplleemmeennttaattiioonn


99..44..  PPrroottooccooll DDeettaaiillss

     The  protocol specification is identical to that of NFS
[Sun89] except for the following changes.

+o    RPC Information

                 Program Number 300105
                 Version Number 1


+o    Readdir_and_Lookup RPC

                 struct readdirlookargs {
                         fhandle file;
                         nfscookie cookie;
                         unsigned count;
                         unsigned duration;
                 };

                 struct entry {
                         unsigned cachable;
                         unsigned duration;
                         modifyrev rev;
                         fhandle entry_fh;
                         nqnfs_fattr entry_attrib;
                         unsigned fileid;
                         filename name;
                         nfscookie cookie;
                         entry *nextentry;
                 };

                 union readdirlookres switch (stat status) {
                 case NFS_OK:
                         struct {
                                 entry *entries;
                                 bool eof;
                         } readdirlookok;
                 default:
                         void;
                 };

                 readdirlookres
                 NQNFSPROC_READDIRLOOK(readdirlookargs) = 18;

     Reads entries in a directory in a manner  analogous  to
     the  NFSPROC_READDIR  RPC  in NFS, but returns the file
     handle and attributes of  each  entry  as  well.   This
     allows the attribute and lookup caches to be primed.

+o    Get Lease RPC

                 struct getleaseargs {
                         fhandle file;
                         cachetype readwrite;
                         unsigned duration;
                 };

                 union getleaseres switch (stat status) {
                 case NFS_OK:
                         bool cachable;
                         unsigned duration;
                         modifyrev rev;
                         nqnfs_fattr attributes;
                 default:
                         void;
                 };

                 getleaseres
                 NQNFSPROC_GETLEASE(getleaseargs) = 19;

     Gets  a  lease  for "file" valid for "duration" seconds
     from when the lease was issued on  the  server23.   The
     lease  permits  client  caching  if "cachable" is true.
     The modify revision level and attributes for  the  file
     are also returned.

+o    Eviction Message

                 void
                 NQNFSPROC_EVICTED (fhandle) = 21;

     This  message  is  sent  from the server to the client.
     When the client receives the message, it  should  flush
     data  associated with the file represented by "fhandle"
     from its caches and then send the VVaaccaatteedd MMeessssaaggee  back
     to  the  server.  Flushing  includes  pushing any dirty
     writes via write RPCs.

+o    Vacated Message

                 void
                 NQNFSPROC_VACATED (fhandle) = 20;

     This message is sent from the client to the  server  in
     response to the EEvviiccttiioonn MMeessssaaggee. See above.

+o    Access RPC

                 struct accessargs {
____________________
   23 To be safe, the client may only assume that the  lease
is  valid for ``duration'' seconds from when the RPC request
was sent to the server.












SSMMMM::0066--1188                      TThhee 44..44BBSSDD NNFFSS IImmpplleemmeennttaattiioonn


                         fhandle file;
                         bool read_access;
                         bool write_access;
                         bool exec_access;
                 };

                 stat
                 NQNFSPROC_ACCESS(accessargs) = 22;

     The  access  RPC does permission checking on the server
     for the given type of access required by the client for
     the  file.   Use of this RPC avoids accessibility prob-
     lems caused by client->server uid mapping.

+o    Piggybacked Get Lease Request

     The  piggybacked  get  lease  request  is  functionally
equivalent to the Get Lease RPC except that it  is  attached
to  one  of  the  other  NQNFS  RPC  requests as follows.  A
getleaserequest is prepended to all of the request arguments
for NQNFS and a getleaserequestres is inserted  in  all  NFS
result  structures just after the "stat" field only if "stat
== NFS_OK".

            union getleaserequest switch (cachetype type) {
            case NQLREAD:
            case NQLWRITE:
                    unsigned duration;
            default:
                    void;
            };

            union getleaserequestres switch (cachetype type) {
            case NQLREAD:
            case NQLWRITE:
                    bool cachable;
                    unsigned duration;
                    modifyrev rev;
            default:
                    void;
            };

The get lease request applies to the file that the  attached
RPC  operates  on and the file attributes remain in the same
location as for the NFS RPC reply structure.

+o    Three additional "stat" values

     Three additional values have been added to the  enumer-
ated type "stat".

            NQNFS_EXPIRED=500
            NQNFS_TRYLATER=501
            NQNFS_AUTHERR=502

The "expired" value indicates that a lease has expired.  The
"try later" value is returned by the server when  it  wishes
the  client to retry the RPC request after a short delay. It
is used during crash recovery (Section 9.2) and may also  be
useful  for  server congestion control.  The "authentication
error" value is returned  for  kerberized  mount  points  to
indicate  that there is no cached authentication mapping and
a Kerberos ticket for the principal is required.

99..55..  DDaattaa TTyyppeess

+o    cachetype

                 enum cachetype {
                         NQLNONE = 0,
                         NQLREAD = 1,
                         NQLWRITE = 2
                 };

     Type of lease requested. NQLNONE is used to indicate no
     piggybacked lease request.

+o    modifyrev

                 typedef unsigned hyper modifyrev;

     The  "modifyrev"  is  an unsigned quadword integer value
     that is never zero and increases every time the  corre-
     sponding file is modified on the server.

+o    nqnfs_time

                 struct nqnfs_time {
                         unsigned seconds;
                         unsigned nano_seconds;
                 };

     For  NQNFS,  times are handled at nanosecond resolution
     instead of the microsecond resolution used by NFS.

+o    nqnfs_fattr

                 struct nqnfs_fattr {
                         ftype type;
                         unsigned mode;
                         unsigned nlink;
                         unsigned uid;
                         unsigned gid;
                         unsigned hyper size;
                         unsigned blocksize;
                         unsigned rdev;
                         unsigned hyper bytes;
                         unsigned fsid;
                         unsigned fileid;
                         nqnfs_time atime;
                         nqnfs_time mtime;
                         nqnfs_time ctime;
                         unsigned flags;
                         unsigned generation;
                         modifyrev rev;
                 };

     The nqnfs_fattr structure  is  modified  from  the  NFS
     fattr so that it stores the file size as a 64-bit quan-
     tity and the storage occupied as a  64-bit  number  of
     bytes.  It also adds the 4.4BSD va_flags  and  va_gen
     fields, as well as the file's modify rev level.

+o    nqnfs_sattr

                 struct nqnfs_sattr {
                         unsigned mode;
                         unsigned uid;
                         unsigned gid;
                         unsigned hyper size;
                         nqnfs_time atime;
                         nqnfs_time mtime;
                         unsigned flags;
                         unsigned rdev;
                 };

     The  nqnfs_sattr  structure  is  modified  from the NFS
     sattr structure in the same manner as fattr.

The arguments to several of the NFS RPCs have been  modified
as  well.  Mostly, these are minor changes to use 64-bit file
offsets or similar. The modified argument structures follow.

+o    Lookup RPC

                 struct lookup_diropargs {
                         unsigned duration;
                         fhandle dir;
                         filename name;
                 };

                 union lookup_diropres switch (stat status) {
                 case NFS_OK:
                         struct {
                                 union getleaserequestres lookup_lease;
                                 fhandle file;
                                 nqnfs_fattr attributes;
                          } lookup_diropok;
                  default:
                         void;
                 };


     The  additional "duration" argument tells the server to
     get a lease for the name being looked up if it is  non-
     zero; the resulting lease is returned in "lookup_lease".

+o    Read RPC

                 struct nqnfs_readargs {
                         fhandle file;
                         unsigned hyper offset;
                         unsigned count;
                 };


+o    Write RPC

                 struct nqnfs_writeargs {
                         fhandle file;
                         unsigned hyper offset;
                         bool append;
                         nfsdata data;
                 };

     The  "append"  argument  is true for append-only  write
     operations.

+o    Get Filesystem Attributes RPC

                 union nqnfs_statfsres switch (stat status) {
                 case NFS_OK:
                         struct {
                                 unsigned tsize;
                                 unsigned bsize;
                                 unsigned blocks;
                                 unsigned bfree;
                                 unsigned bavail;
                                 unsigned files;
                                 unsigned files_free;
                         } info;
                 default:
                         void;
                 };

     The "files" field is the number of files  in  the  file
     system and the "files_free" is the number of additional
     files that can be created.

1100..  SSuummmmaarryy

     The configuration and  tuning  of  an  NFS  environment
tends  to be a bit of a mystic art, but hopefully this paper
along with the man pages and other reading will be  helpful.
Good Luck.

1111..  BBiibblliiooggrraapphhyy

[Baker91]       Mary Baker and John Ousterhout, Availability
                in the Sprite Distributed  File  System,  In
                _O_p_e_r_a_t_i_n_g  _S_y_s_t_e_m  _R_e_v_i_e_w, (25)2, pg. 95-98,
                April 1991.

[Baker91a]      Mary Baker, Private Email Communication, May
                1991.

[Burrows88]     Michael  Burrows,  Efficient  Data  Sharing,
                Technical Report #153, Computer  Laboratory,
                University of Cambridge, Dec. 1988.

[Gray89]        Cary  G. Gray and David R. Cheriton, Leases:
                An Efficient  Fault-Tolerant  Mechanism  for
                Distributed File Cache Consistency, In _P_r_o_c_.
                _o_f _t_h_e _T_w_e_l_f_t_h _A_C_M  _S_y_m_p_o_s_i_u_m  _o_n  _O_p_e_r_a_t_i_n_g
                _S_y_s_t_e_m_s  _P_r_i_n_c_i_p_a_l_s,  Litchfield  Park,  AZ,
                Dec. 1989.

[Howard88]      John H. Howard, Michael L. Kazar, Sherri  G.
                Menees, David A. Nichols, M. Satyanarayanan,
                Robert N. Sidebotham and  Michael  J.  West,
                Scale  and Performance in a Distributed File
                System,  _A_C_M  _T_r_a_n_s_.  _o_n  _C_o_m_p_u_t_e_r  _S_y_s_t_e_m_s,
                (6)1, pg 51-81, Feb. 1988.

[Juszczak89]    Chet Juszczak, Improving the Performance and
                Correctness of an NFS Server, In _P_r_o_c_.  _W_i_n_-
                _t_e_r  _1_9_8_9  _U_S_E_N_I_X _C_o_n_f_e_r_e_n_c_e_, pg. 53-63, San
                Diego, CA, January 1989.

[Keith90]       Bruce E. Keith,  Perspectives  on  NFS  File
                Server   Performance   Characterization,  In
                _P_r_o_c_. _S_u_m_m_e_r  _1_9_9_0  _U_S_E_N_I_X  _C_o_n_f_e_r_e_n_c_e,  pg.
                267-277, Anaheim, CA, June 1990.

[Kent87]        Christopher A. Kent,  _C_a_c_h_e  _C_o_h_e_r_e_n_c_e _i_n
                _D_i_s_t_r_i_b_u_t_e_d _S_y_s_t_e_m_s, Research  Report  87/4,
                Digital    Equipment   Corporation   Western
                Research Laboratory, April 1987.

[Kent87a]       Christopher A. Kent and Jeffrey  C.  Mogul,
                _F_r_a_g_m_e_n_t_a_t_i_o_n  _C_o_n_s_i_d_e_r_e_d  _H_a_r_m_f_u_l, Research
                Report 87/3, Digital  Equipment  Corporation
                Western Research Laboratory, Dec. 1987.

[Macklem91]     Rick  Macklem,  Lessons  Learned  Tuning the
                4.3BSD Reno Implementation of the NFS Proto-
                col,  In _P_r_o_c_. _W_i_n_t_e_r _U_S_E_N_I_X _C_o_n_f_e_r_e_n_c_e, pg.
                53-64, Dallas, TX, January 1991.

[Nelson88]      Michael N. Nelson, Brent B. Welch, and  John
                K. Ousterhout, Caching in the Sprite Network
                File System, _A_C_M  _T_r_a_n_s_a_c_t_i_o_n_s  _o_n  _C_o_m_p_u_t_e_r
                _S_y_s_t_e_m_s (6)1 pg. 134-154, February 1988.

[Nowicki89]     Bill  Nowicki,  Transport Issues in the Net-
                work File System, In _C_o_m_p_u_t_e_r  _C_o_m_m_u_n_i_c_a_t_i_o_n
                _R_e_v_i_e_w,  pg. 16-20, Vol. 19, Number 2, April
                1989.

[Ousterhout90]  John K.  Ousterhout,  Why  Aren't  Operating
                Systems  Getting Faster As Fast as Hardware?
                In _P_r_o_c_. _S_u_m_m_e_r _1_9_9_0 _U_S_E_N_I_X _C_o_n_f_e_r_e_n_c_e,  pg.
                247-256, Anaheim, CA, June 1990.

[Pendry93]      Jan-Simon Pendry, 4.4 BSD Automounter Refer-
                ence Manual, In _s_r_c_/_u_s_r_._s_b_i_n_/_a_m_d_/_d_o_c  _d_i_r_e_c_-
                _t_o_r_y _o_f _4_._4 _B_S_D _d_i_s_t_r_i_b_u_t_i_o_n _t_a_p_e.

[Reid90]        Jim  Reid, N(e)FS: the Protocol is the Prob-
                lem, In _P_r_o_c_. _S_u_m_m_e_r _1_9_9_0 _U_K_U_U_G  _C_o_n_f_e_r_e_n_c_e,
                London, England, July 1990.

[Sandberg85]    Russel   Sandberg,   David  Goldberg,  Steve
                Kleiman, Dan Walsh, and Bob Lyon, Design and
                Implementation  of  the Sun Network filesys-
                tem, In _P_r_o_c_. _S_u_m_m_e_r _1_9_8_5 _U_S_E_N_I_X _C_o_n_f_e_r_e_n_c_e,
                pages 119-130, Portland, OR, June 1985.

[Schroeder85]   Michael  D.  Schroeder, David K. Gifford and
                Roger M. Needham, A Caching File System  For
                A  Programmer's Workstation, In _P_r_o_c_. _o_f _t_h_e
                _T_e_n_t_h _A_C_M  _S_y_m_p_o_s_i_u_m  _o_n  _O_p_e_r_a_t_i_n_g  _S_y_s_t_e_m_s
                _P_r_i_n_c_i_p_a_l_s,  pg.  25-34,  Orcas  Island, WA,
                Dec. 1985.

[Srinivasan89]  V. Srinivasan  and  Jeffrey  C.  Mogul,
                _S_p_r_i_t_e_l_y _N_F_S_: _I_m_p_l_e_m_e_n_t_a_t_i_o_n _a_n_d _P_e_r_f_o_r_m_a_n_c_e
                _o_f  _C_a_c_h_e_-_C_o_n_s_i_s_t_e_n_c_y  _P_r_o_t_o_c_o_l_s,   Research
                Report  89/5,  Digital Equipment Corporation
                Western Research Laboratory, May 1989.

[Steiner88]     Jennifer G.  Steiner,  Clifford  Neuman  and
                Jeffrey  I. Schiller, Kerberos: An Authenti-
                cation Service for Open Network Systems,  In
                _P_r_o_c_. _W_i_n_t_e_r _1_9_8_8 _U_S_E_N_I_X _C_o_n_f_e_r_e_n_c_e, Dallas,










SSMMMM::0066--2244                      TThhee 44..44BBSSDD NNFFSS IImmpplleemmeennttaattiioonn


                TX, February 1988.

[Stern]         Hal Stern, _M_a_n_a_g_i_n_g _N_F_S  _a_n_d  _N_I_S,  O'Reilly
                and Associates, ISBN 0-937175-75-7.

[Sun87]         Sun  Microsystems  Inc.,  _X_D_R_: _E_x_t_e_r_n_a_l _D_a_t_a
                _R_e_p_r_e_s_e_n_t_a_t_i_o_n  _S_t_a_n_d_a_r_d,  RFC1014,  Network
                Information  Center, SRI International, June
                1987.

[Sun88]         Sun Microsystems Inc., _R_P_C_: _R_e_m_o_t_e _P_r_o_c_e_d_u_r_e
                _C_a_l_l   _P_r_o_t_o_c_o_l   _S_p_e_c_i_f_i_c_a_t_i_o_n  _V_e_r_s_i_o_n  _2,
                RFC1057,  Network  Information  Center,  SRI
                International, June 1988.

[Sun89]         Sun  Microsystems  Inc.,  _N_F_S_:  _N_e_t_w_o_r_k _F_i_l_e
                _S_y_s_t_e_m _P_r_o_t_o_c_o_l _S_p_e_c_i_f_i_c_a_t_i_o_n, ARPANET Work-
                ing  Group Requests for Comment, DDN Network
                Information Center, SRI International, Menlo
                Park, CA, March 1989, RFC-1094.







































