     Re: [nocol-users] MySQL monitor?

  • To: Vikas Aggarwal <vikas at navya com>
  • Subject: Re: [nocol-users] MySQL monitor?
  • From: "Nathan Clemons [Staff]" <nathan at ici net>
  • Date: Tue, 18 Jan 2000 18:07:58 -0500 (EST)
On Tue, 18 Jan 2000, Vikas Aggarwal wrote:

> Date: Tue, 18 Jan 2000 17:06:41 -0500
> From: Vikas Aggarwal <vikas@navya.com>
> To: "Nathan Clemons [Staff]" <nathan@ici.net>
> Cc: Velocet <mathboy@velocet.ca>, nocol-users@navya.com
> Subject: Re: [nocol-users] MySQL monitor?
> 
> "Nathan Clemons [Staff]" wrote:
> > 
> > I'd love to write a good Perl DBI monitor, where you could specify
> > username, password, port, and DBD type in the config file and have a SQL
> > statement to use to test it.
> > 
> > If anyone feels up to writing a FAQ on how to write a Perl based monitor,
> > I'll be happy to contribute it when complete.
> 
> Nathan,
> 
> A 'sample' perl monitor is located in perlnocol/SAMPLE-perl-monitor.
> Ideally (and also in the case of the Perl DBI monitor), it only needs 2
> functions:

I'll have to take a look at this. Thanks for the pointer. One downside of
lacking comprehensive documentation is that, in a project as big as
NOCOL, it's easy to have things and not even notice them *smile*

> 
>    sub readconf() = reads the config file and builds the 'item' list
>    sub dotest()   = tests one host and calls &calc_status()
> 
>    &nocol_main() will then automatically call these above routines, etc.
> 
> Yes, these need to be documented better.
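
For anyone following along, here is a rough sketch of that two-function
shape. It is written in Python rather than Perl purely for illustration
and is not taken from perlnocol/SAMPLE-perl-monitor. The Item class, the
"host port" config format, step_severity() (a stand-in for whatever
&calc_status() really does), and run_pass() are all hypothetical. The
severity ladder is the one mentioned later in this thread; resetting to
'info' on success and starting at 'warning' on the first failure are
assumptions.

```python
from dataclasses import dataclass

# Severity ladder as described in this thread, least to most severe.
LADDER = ["info", "warning", "error", "critical"]


@dataclass
class Item:
    """One monitored entry (hypothetical layout)."""
    host: str
    port: int
    severity: str = "unknown"  # state before the first test


def readconf(lines):
    """Build the item list from 'host port' lines (assumed format).

    Blank lines and '#' comments are skipped.
    """
    items = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, port = line.split()
        items.append(Item(host, int(port)))
    return items


def step_severity(item, ok):
    """Stand-in for calc_status(): move one rung per failed test."""
    if ok:
        item.severity = "info"  # assumption: a passing test resets to info
    elif item.severity in LADDER:
        idx = LADDER.index(item.severity)
        item.severity = LADDER[min(idx + 1, len(LADDER) - 1)]
    else:
        item.severity = "warning"  # assumption: first failure from unknown


def dotest(item, probe):
    """Test one item; probe(item) returns True when the host is healthy."""
    step_severity(item, probe(item))


def run_pass(items, probe):
    """One polling pass, roughly what a main loop would repeat."""
    for item in items:
        dotest(item, probe)
```

A monitor-specific script would then only supply its own readconf() and
probe logic, and leave the looping to the shared main routine.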
> 
> Regarding the issue with handling a "HUP" signal, the problem lies in
> the fact that it is difficult to determine the changes in the config
> files. Consider the case of 'portmon'... I might edit an existing file
> and change just the IP address in an entry, or just the 'return-string'.
> On getting a HUP, the monitors would have to go through each of these
> parameters, decide what has been changed, and then delete that 'item'.
> It is specific for each monitor, hence it cannot be made into a library
> function.
> 
> One simple way to do this is, upon getting a HUP, for a monitor to:
> 
> 	- erase the old file (effectively as good as restarting)
> 	- but on the first pass, don't reinit everything to 'unknown';
> 	  instead just escalate each event directly to the 'highest'
> 	  severity.
> 
> The only downside to this could be if a site just went down, then all
> the monitors would NOT step thru the severities (info -> warning ->
> error -> critical), but directly escalate the site to 'critical'. This
> is the easiest approach to the problem, and with the least impact.
> 

Oh, I know that *smile* That's why I have already submitted a patch for
portmon. Granted, if you change a response, it will still be at the old
severity after the restart, but that should clear out afterwards.

The problem I'm having with SIGHUPs on monitors other than portmon is
that they read the config file (makes sense) and then, on each polling
iteration, read the data file to figure out what to poll. Frankly, with
a migration to an SQL-based system, there should ideally be no need for
the data directory (potentially). Portmon kept an array of structs for
keeping track of what is what. Radiusmon seems to have a linked list, but
doesn't keep track of the severity in that list. Once I grok the code a
bit more I'll be able to overcome this. Pingmon looks to be more
difficult, since it doesn't even have a linked list to work with and
relies completely on the data file.

> On another note... Once we have all the events in a database (courtesy
> jonz@netrail.net), we should be able to assign 'nodenames' to each
> monitor and refer to each event using 'nodename.event'. A meta database
> could collect data from all these different databases, and correlation
> between nodeA.eventX and nodeB.eventX can be done (this idea from
> Velocet folks). Of course, the tool to do any kind of analysis is also
> TBD.
> 
> 	-vikas
> 

*nod* Our migration to a database-backed system needs to happen ASAP.
Unfortunately, we can't wait for you all to finish the changes; we'll
have to get our system working first and then, once you finalize the next
step up w/ DB access, evaluate whether to convert or to stay with what we
have.

Thanks for the responses,
Nathan.

____________________________________________________________________
Nathan Clemons, Systems Engineer
WinStar Internet and Hosting Services

800 South Main St. Mansfield, MA  02048
____________________________________________________________________  
nclemons@winstar.com    www.winstar.com    (v) 800-234-0002 ext.1109
nathan@ici.net                             (f) 508-261-0430