		     Internet Rover 3.0 Ping Daemon


	********************************************
	*
	*  graphics -- Cannot represent as text.
	*
	********************************************

  The most common test performed by rover is the PING() test.  This
test determines whether a node is reachable by sending an ICMP echo
request packet and awaiting an ICMP echo reply.  The problem is that
it takes time to determine that a node is not responding to ping.  In
practice, this causes network management tools to work least
efficiently during times of network calamity, and most efficiently during
times of network stability.  Obviously, this is not the desired result.

  An additional problem arises as well.  During network routing
information exchanges associated with metric changes and other routing
instabilities, several seconds can elapse before packets are once
again routed.  The original ping test sent ICMP echo
request packets one second apart until a response was received, or five
seconds had elapsed.  Instability during a route exchange could
therefore cause nodes to be incorrectly marked unreachable.  The next
cycle through would indicate that the nodes were once again reachable.
Which nodes were marked unreachable depended on which nodes were being
polled during the routing exchange.  These false alerts caused the
alerts' authenticity to be questioned.
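
  The effect can be reproduced with a small simulation.  The sketch
below is not rover's code; it models the original sequential test
under assumed values: a half-second round trip for each answered ping
and a routing outage lasting from t=2 to t=8 seconds.  The names
sequential_poll and answers are hypothetical.

```python
def sequential_poll(nodes, answers, rtt=0.5, interval=1.0, limit=5.0):
    """Poll nodes one after another, as the original ping test did.

    answers(node, t) is a hypothetical callback that returns True when
    an echo request sent at time t would be answered.
    Returns ({node: state}, time at which the cycle finished).
    """
    clock = 0.0
    states = {}
    for node in nodes:
        sent = clock
        states[node] = "unreachable"
        # Send requests one interval apart until one is answered or
        # the limit (five seconds by default) has elapsed.
        while sent < clock + limit:
            if answers(node, sent):
                states[node] = "reachable"
                break
            sent += interval
        # An answered ping costs one round trip; silence costs the
        # whole limit before the next node can be polled.
        if states[node] == "reachable":
            clock = sent + rtt
        else:
            clock = clock + limit
    return states, clock


def during_outage(node, t):
    # Assumed routing outage: nothing is answered for 2 <= t < 8.
    return not (2.0 <= t < 8.0)


states, finished = sequential_poll("abcdef", during_outage)
for node, state in states.items():
    print(node, state)
print("cycle finished at", finished, "seconds")
```

  Under these assumptions only node e, the one being polled when the
outage began, is marked unreachable.  Node f waits out the tail of
the outage and is reported reachable, although it was affected by the
same routing event.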

  The ping daemon was written to solve these problems.  By changing the
implementation of the PING() test from sequential ICMP echo requests to
parallel ICMP echo requests, there is an upper bound on the elapsed
time before a node outage is detected.  pingd reads from the same
hostfile to determine which nodes need to be pinged.  It maintains a
table recording, for each node, when the next ping is to be sent, the
current state of the node, and its configuration (number of retries
and time between retries).

  Initially, all pings are sent out in parallel.  As responses are
received, the next ping times are calculated, and the state of the node
is set to up. Since the responses are received at different times,
there is a natural skewing of the next ping times.  
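
  A minimal sketch of such a table follows.  It is an illustration,
not pingd's actual data structure: the NodeEntry fields mirror the
description above, while the round-trip times, the parameter values,
and the rule "next ping = reply time + cachetimeout" are assumptions
made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeEntry:
    """One row of a hypothetical pingd-style table."""
    name: str
    retries: int          # maximum retries before marking DOWN
    timeout: float        # seconds between retries
    cachetimeout: float   # seconds to hold the state before pinging again
    state: str = "unknown"
    next_ping: float = 0.0   # every node is due at t=0 initially


def record_reply(entry, reply_time):
    """Mark the node up and schedule its next ping (assumed rule)."""
    entry.state = "up"
    entry.next_ping = reply_time + entry.cachetimeout


# All pings go out together at t=0; replies arrive after different
# (made-up) round-trip times.
round_trip = {"a": 0.25, "b": 0.5, "c": 2.0}
table = [NodeEntry(name, retries=3, timeout=1.0, cachetimeout=60.0)
         for name in round_trip]
for entry in table:
    record_reply(entry, 0.0 + round_trip[entry.name])

schedule = [(entry.name, entry.next_ping) for entry in table]
print(schedule)
```

  Although every node was pinged at the same instant, the next ping
times already differ by the spread in round-trip times, so later
rounds are not sent in a single burst.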

  There is a default number of retries and a default time between
retries.  Users may edit the hostfile and override the default
retry, timeout, and cache timeout parameters on the PING() test.  The
syntax is:

  PING( retries, timeout, cachetimeout)

  where:

  retries      = the maximum number of retries before marking the host DOWN

  timeout      = the time in seconds between retries

  cachetimeout = the time to hold the node state before restarting the
                 pings
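
  The following sketch shows one way a PING() test could be parsed.
It is not taken from rover.  The default values, the use of whole
numbers, and the rule that omitted trailing arguments fall back to
the defaults are all assumptions, and parse_ping and PingParams are
hypothetical names.

```python
import re
from typing import NamedTuple


class PingParams(NamedTuple):
    retries: int
    timeout: int
    cachetimeout: int


# Hypothetical defaults; the real values are not stated here.
DEFAULTS = PingParams(retries=3, timeout=1, cachetimeout=60)

_PING_RE = re.compile(r"PING\(\s*([^)]*?)\s*\)")


def parse_ping(text, defaults=DEFAULTS):
    """Extract the parameters of the first PING(...) test in text."""
    match = _PING_RE.search(text)
    if match is None:
        raise ValueError("no PING() test found")
    body = match.group(1)
    given = [int(field) for field in body.split(",")] if body else []
    if len(given) > len(defaults):
        raise ValueError("too many PING() arguments")
    # Fill any omitted trailing arguments from the defaults.
    return PingParams(*given, *defaults[len(given):])


explicit = parse_ping("PING( 5, 2, 120)")
implicit = parse_ping("PING()")
print(explicit)
print(implicit)
```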

  This facility solves the previous problems in two ways.  First, it
works equally well during outages and network steady states, because
there is now a cap on how long a node is non-responsive before it is
marked down.  Second, short-duration outages caused by routing
transitions no longer cause alerts, because the user can adjust the
sensitivity of the test.  (Kind of like a squelch knob on a citizens
band radio.)
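
  The squelch effect can be worked through numerically.  The sketch
below assumes that a node is marked DOWN only when the first ping and
every retry go unanswered, that the outage begins just as the first
ping is sent, and that the last retry is given one further timeout to
be answered.  These rules and the function names are illustrative
assumptions, not pingd's documented behaviour.

```python
def raises_alert(outage, retries, timeout):
    """True if an outage of the given length (seconds) marks the node DOWN."""
    # The first ping goes out at t=0, followed by one retry per timeout.
    send_times = [k * timeout for k in range(retries + 1)]
    # The node goes DOWN only if every ping falls inside the outage.
    return all(t < outage for t in send_times)


def worst_case_detection(retries, timeout):
    """Seconds until a genuinely dead node is marked DOWN (assumed rule)."""
    return (retries + 1) * timeout


# A four-second routing transition, seen at two sensitivity settings.
sensitive = raises_alert(outage=4, retries=2, timeout=1)
squelched = raises_alert(outage=4, retries=5, timeout=1)
bound = worst_case_detection(retries=5, timeout=1)
print(sensitive, squelched, bound)
```

  With two retries the transition produces an alert; with five it
does not, at the cost of a longer wait before a real outage is
reported.  Because the pings are sent in parallel, that wait does not
grow with the number of nodes being polled.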

