.\" $Header: watchdog.8,v 1.1 89/12/15 16:27:13 network Exp $
.\"
.TH watchdog 8 
.SH NAME
watchdog \- periodically executes specific instructions based on the
time elapsed since an event occurred. It is a part of
.IR NOCOL .
.SH USAGE
.B watchdog  <config file name>
.SH DESCRIPTION
.LP
.I watchdog
periodically processes data files created by other
.I nocol
monitoring programs (such as
.I pingmon
and
.IR perfmon ).
If the elapsed time of an event exceeds a value
set in the configuration file, it executes the program associated with
that time. It then sleeps for a specified period and
processes the data files again after waking up. All necessary information is
provided to
.I watchdog
in the configuration file: the
.I datafiles
(which contain the data generated by the monitoring programs), the
.I sleeptime
(in minutes), and the programs to be executed once their wait times are
exceeded.
.LP
Each data file is processed and the time elapsed since each event occurred
is calculated. If the
.I elapsed time
exceeds any of the times specified in the config file for the sender of
that event (\fIPINGMON\fR or \fIPERFMON\fR),
then the string associated with that time is executed.
As an example, a line such as
.IP
PINGMON  45  /usr/users/nocol/bin/watchdog-45
.LP
would cause the program
.I watchdog-45
to be executed if the event happened more than 45 minutes ago. Note that the
sender has to be specified; the senders currently recognized are
\fIPERFMON\fR and \fIPINGMON\fR.
Up to ten such
.I levels
per sender can be specified in the config file. As soon as a level matches
the time elapsed for an event, no further tests are made for that event
for the same sender. Thus, if the configuration file contains the lines:
.IP
.nf
PINGMON  20   /usr/nocol/watchdog/pingmon-20
PINGMON  40   /usr/nocol/watchdog/pingmon-40
PERFMON  20   /usr/nocol/watchdog/perfmon-20
PERFMON  30   /usr/nocol/watchdog/perfmon-30
.fi
.LP
then the actions are sorted in descending time order, and for each event
in a data file the \fIelapsed time\fR is calculated. If the elapsed
time exceeds 40 min, then
.I pingmon-40
is executed and
.I pingmon-20
is not executed (only the highest matching level, that is the one with the
maximum time, is used for a particular sender). The next check is for the
.I PERFMON
levels, and
.I perfmon-30
is executed (\fIperfmon-20\fR is \fBnot\fR executed).
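.LP
As a worked illustration of this rule, the table below is a sketch that
assumes only the four sample lines shown above, the "exceeds" comparison
described in this section, and that an event is tested against the levels
of its own sender. The elapsed times are arbitrary values chosen for the
example. A critical event would then trigger:
.IP
.nf
Sender    Elapsed   Program executed
PINGMON   15 min    (none)
PINGMON   25 min    pingmon-20
PINGMON   45 min    pingmon-40
PERFMON   25 min    perfmon-20
PERFMON   35 min    perfmon-30
.fi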
.LP
.I watchdog
only tests for
.I critical
events and ignores all other events. When the program specified in
the config file is executed, the
\fIsite name, IP address, sender, variable, data, time,\fR and
\fIelapsed time in present state\fR
are passed to the program as options, and can be used for
.B problem identification and information.
.SH WATCHDOG CONFIG FILE
.LP
The configuration file for
.I watchdog
has the names of the 
.I data files,
.I programs to execute,
and the
.I sleeptime
(which is the time the program sleeps between processing the data files).
Comments should be on their own lines and begin with a '#'. The
data files (up to 5) should each be on a separate line. All the times
in the data files are in minutes. The format of the configuration
file is a
.I keyword
followed by the requisite information for that keyword.
.IP -
DATA \fIfilename\fR: The full pathname of a data file that contains
monitored information (in the format specified by the
.I nocol
structure). Up to 5 data files can be specified, each on its own line
(\fBnot\fR on the same line).
.IP -
SLEEPTIME \fIminutes\fR: The interval between processing the data files.
The program sleeps for this interval each time it finishes processing
all the data files.
.IP -
PINGMON  \fIwait-time  exec-string\fR: The string to be executed if the
\fBsender\fR of the event is \fIPINGMON\fR and the event has been
in the critical state for more than \fIwait-time\fR minutes. Up to 10 levels
can be specified. The levels are sorted in descending time order, and
the \fIhighest\fR matching level is sought and the corresponding string
executed.
This restriction is solely to prevent multiple instances of the same
action being taken; the intent is to escalate the level of the action
and not repeat the lower level actions. (It might be useful to have
a large time value pointing to a dummy executable program so that
the event is ignored if the time exceeds a certain maximum.) Lastly,
the maximum value of \fIwait-time\fR is 1440 min (1 day).
.IP -
PERFMON  \fIwait-time  exec-string\fR: Same as for \fIPINGMON\fR above,
except that the string is executed if the \fBsender\fR of the event is
.IR PERFMON .
A separate test is made for each sender.
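.LP
Putting the keywords together, a configuration file might look like the
sketch below. The data file paths, the 5 minute sleep time, the 720 minute
ceiling and the
.I ignore-event
program are hypothetical values chosen for illustration; the other level
programs reuse the names from the example in the DESCRIPTION section.
.IP
.nf
# Data files written by the monitors, one per line (at most 5)
DATA  /usr/nocol/data/pingmon-output
DATA  /usr/nocol/data/perfmon-output
# Minutes to sleep between passes over the data files
SLEEPTIME  5
# Escalation levels for PINGMON events (at most 10 per sender)
PINGMON  20    /usr/nocol/watchdog/pingmon-20
PINGMON  40    /usr/nocol/watchdog/pingmon-40
# Hypothetical dummy program: ignore events critical for over 12 hours
PINGMON  720   /usr/nocol/watchdog/ignore-event
# Escalation levels for PERFMON events
PERFMON  20    /usr/nocol/watchdog/perfmon-20
PERFMON  30    /usr/nocol/watchdog/perfmon-30
.fi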
.SH DIAGNOSTICS & RESTRICTIONS
.LP
The maximum wait-time that can be specified is a day's worth of
minutes (1440 min). This restriction arose simply because it is
difficult to calculate the elapsed time if the event occurred
on the 30th and it is now the 1st of the next month (it is messy trying
to keep track of how many days are in each month).
.LP
If the program is compiled with the \fBDEBUG\fR option, then certain
useful data is written to \fIstdout\fR. The executable strings
and the times are stored in
.I wd_act
structures, and in the \fBDEBUG\fR mode, the contents of the structures
are printed out.
.LP
A sample config file can be found in the source level directory.
.SH AUTHOR
Vikas Aggarwal
.SH SEE ALSO
netmon(1), pingmon(8), perfmon(8)
