You must read the Installation document prior to reading this Operations guide.
The main directory where nocol gets installed is specified at compile time (default is set to /usr/local/nocol). Under this directory, the following sub-directories exist:
Running the Monitors
Nocol has a large number of independent monitors. All desired monitors should be listed in the PROGRAMS variable of the keepalive_monitors script. This script is run periodically from crontab and ensures that all the desired monitors are running (the crontab.nocol file is installed into cron during the installation steps).
Generally the monitors do not need any command line arguments: the name and location of the configuration file and the data directory are compiled into the monitors. However, you can always specify an alternate config file or output data file using the '-c' or '-o' command line options respectively. All monitors also accept the '-d' flag to indicate debug mode, in which case they write verbose error messages to stderr. If started from keepalive_monitors, these error messages are stored in the run/xxxx.error file.
The configuration file for each monitor is located in the etc/ directory. Each of these files should be edited for your site. Note that in most monitors, the 'name' of the device is not used by the monitor itself; it is simply an operator-friendly label for the device.
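As an illustration only, the keepalive check and the command line options described above can be sketched in Python. This is not the actual keepalive_monitors script: the function names, the monitor names, and the file paths used here are placeholders, and only the '-c', '-o', and '-d' options come from this guide.

```python
from typing import Iterable, List, Optional


def monitors_to_start(desired: Iterable[str], running: Iterable[str]) -> List[str]:
    """Return the desired monitors that are not running, in their listed order."""
    active = set(running)
    return [name for name in desired if name not in active]


def monitor_argv(program: str,
                 config: Optional[str] = None,
                 output: Optional[str] = None,
                 debug: bool = False) -> List[str]:
    """Build a monitor command line using the -c, -o and -d options."""
    argv = [program]
    if config:
        argv += ["-c", config]   # alternate configuration file
    if output:
        argv += ["-o", output]   # alternate output data file
    if debug:
        argv.append("-d")        # debug mode: verbose messages on stderr
    return argv


# Placeholder monitor names; a real list would mirror the PROGRAMS variable.
desired = ["monitor_a", "monitor_b", "monitor_c"]
running = ["monitor_b"]
for name in monitors_to_start(desired, running):
    print(" ".join(monitor_argv(name, debug=True)))
```

With the placeholder lists above, the loop prints a command line for monitor_a and monitor_c only, since monitor_b is already running.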
Currently, sending a HUP signal to the monitors does NOT cause them to re-read the configuration file and preserve the existing state of the variables being monitored.
noclogd - the Logging Daemon
The noclogd daemon listens on port 5354 of the logging host for any events sent by the monitors. The name of the host where noclogd runs is compiled into all the monitors and is not configurable in their config files at this time.
The noclogd process is similar to the Unix 'syslog' daemon, and its configuration file allows piping the logged events to any external process. To prevent arbitrary hosts from sending it messages, the IP addresses that are allowed to log to it are listed in the noclogd configuration file.
Since this process can run external programs, it is used to run the pager notification scripts. It can also be used to log messages to a database, send email, and so on.
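The access check can be pictured with a short Python sketch. It assumes the allowed addresses have already been read from the configuration file into a list; the function name and the sample addresses are hypothetical, and noclogd's real implementation and config syntax are not shown here.

```python
import ipaddress
from typing import List


def is_allowed(sender: str, allowed: List[str]) -> bool:
    """Return True if the sender's IP address is in the allowed list."""
    addr = ipaddress.ip_address(sender)
    # Compare parsed addresses rather than raw strings so that
    # equivalent spellings of the same address still match.
    return any(addr == ipaddress.ip_address(entry) for entry in allowed)


# Sample addresses from documentation ranges, not from a real config.
allowed_hosts = ["127.0.0.1", "192.0.2.10"]
print(is_allowed("192.0.2.10", allowed_hosts))
print(is_allowed("198.51.100.7", allowed_hosts))
```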
It should be noted that an 'event' in nocol is generated only when a value crosses a threshold in a polling interval. Hence, normally you will not see any logging activity in noclogd, but when a device variable changes its state, an event will be logged. This means that a monitor sends an event to noclogd both when a device goes down (e.g. from info level to warning level) and when it comes back up (e.g. from warning level to info level).
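The following Python sketch shows the idea of logging only on state changes. It is a simplification that assumes a single warning threshold and two levels (info and warning); the function names and sample values are made up for illustration and do not reflect the monitors' actual code.

```python
from typing import List, Sequence, Tuple


def level_for(value: float, warn_threshold: float) -> str:
    """Map a polled value to a severity level (two levels only in this sketch)."""
    return "warning" if value >= warn_threshold else "info"


def events(samples: Sequence[float], warn_threshold: float,
           initial: str = "info") -> List[Tuple[int, str, str]]:
    """Return (poll_index, old_level, new_level) for each level change."""
    current = initial
    changes = []
    for poll, value in enumerate(samples):
        new = level_for(value, warn_threshold)
        if new != current:          # only a threshold crossing produces an event
            changes.append((poll, current, new))
            current = new
    return changes


# Six polls: the value crosses the threshold at poll 2 and recovers at poll 4.
print(events([10, 20, 95, 97, 30, 25], warn_threshold=90))
# -> [(2, 'info', 'warning'), (4, 'warning', 'info')]
```

Six polls produce only two events: one when the value rises above the threshold and one when it falls back below it.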
Routine admin tasks in nocol consist of ensuring that all the monitors are running (done by running keepalive_monitors from cron), and rotating all the log files maintained by noclogd (done by running log-maint periodically from crontab). See the sample nocol.crontab for achieving these tasks.
There are three different user interfaces to view the nocol data. The simplest of them is netconsole, a non-graphical, curses-based tool for displaying the raw data being collected by the monitors. Any user on the system where the monitors are running can run this tool.
The Web interface for displaying nocol data is divided into two scripts. The first, genweb.pl, runs periodically from crontab and generates 4 web pages (one for each severity level). The second is a CGI script, webnocol.cgi, which gives added functionality to the user such as troubleshooting, adding notes for an event, hiding a known event, etc. This script has its own built-in access control based on the user, but as an alternative the typical .htaccess method can easily be used.
This is a Tcl/Tk based monitor using client-server technology. A simple daemon (called 'ndaemon') runs on the nocol machine listening on TCP port 5005, and all it does is periodically send the nocol raw data to all connected clients. The clients then parse, format, and display this raw data. ndaemon has no access control at this time, so it is important to use a firewall to restrict unauthorized access to ndaemon's TCP port.
Note that none of these interfaces displays historical data from 'noclogd'- they all work directly on the data being collected by the monitors which represents the current state of the network.
Notifications & Reports
A very flexible notification script called 'notifier.pl' is provided with nocol which has a configuration file describing the type of event and required action. Currently the possible actions are mail and page. A minimum and maximum age of the event can be defined indicating that the action should be taken (paging or email) only if the age of the event lies between these two values (in seconds). An option exists to allow 'repeat' notification (once every hour) until the age is exceeded.
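The age window and the repeat option can be illustrated with a small Python sketch. It is not taken from notifier.pl: the function and parameter names are hypothetical, and treating the minimum and maximum ages as inclusive bounds is an assumption.

```python
from typing import Optional

REPEAT_INTERVAL = 3600  # seconds; repeat notifications are sent once every hour


def should_notify(age: int, min_age: int, max_age: int,
                  last_sent_age: Optional[int] = None,
                  repeat: bool = False) -> bool:
    """Decide whether to mail or page for an event of the given age (seconds)."""
    if not (min_age <= age <= max_age):
        return False                      # outside the configured age window
    if last_sent_age is None:
        return True                       # first notification for this event
    # Already notified: send again only if repeat is enabled and an hour has passed.
    return repeat and (age - last_sent_age) >= REPEAT_INTERVAL


print(should_notify(100, min_age=300, max_age=7200))    # too young
print(should_notify(600, min_age=300, max_age=7200))    # inside the window
print(should_notify(4300, 300, 7200, last_sent_age=600, repeat=True))
```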
A more 'event' driven notification system can be written by using noclogd. Any event can be piped to an external script by noclogd, so a page or email can be sent as soon as an event occurs and is logged to noclogd. As an example, look at the 'utility/beep_oncall' script.
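A script on the receiving end of such a pipe could look roughly like the Python sketch below. It assumes that noclogd writes one event per line to the script's standard input and treats each line as opaque text, since the event format is not described here. The addresses and function names are placeholders, and this is not the 'utility/beep_oncall' script.

```python
from email.message import EmailMessage
from typing import Iterable, Iterator


def build_alert(event_line: str, recipient: str,
                sender: str = "nocol@localhost") -> EmailMessage:
    """Wrap one logged event line in an email message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "nocol event"
    msg.set_content(event_line.strip() + "\n")
    return msg


def alerts_from(stream: Iterable[str], recipient: str) -> Iterator[EmailMessage]:
    """Yield one alert per non-empty line read from the piped event stream."""
    for line in stream:
        if line.strip():
            yield build_alert(line, recipient)


# Example with an in-memory list; a real script would iterate over sys.stdin.
for alert in alerts_from(["example event text\n", "\n"], "oncall@example.com"):
    print(alert["To"], "<-", alert.get_content().strip())
```

Actually delivering the message (for example with smtplib or a pager gateway) is left out, since it depends on the site.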
Currently the only reporting tool for historical analysis is 'logstats' which parses the historical noclogd event logs and generates a simple summary report. This is run by the 'log-maint' script which in turn is run periodically from crontab.