     Re: Monitoring the monitors

> 
> On Sat, Jul 10, 1999 at 06:11:41PM -0400, TTSG wrote:
> > Hi,
> > 
> > 	We just ran into a problem where a machine failed, and it caused
> > hostmon to "lock".  The last thing it appears to be doing was an rcp of
> > the file from the machine that failed.  We fixed the machine last Friday
> > and didn't check NOCOL.  Since then, 2 other machines had the hostmon
> > monitor die, which we didn't know.  Even worse, though, was that another
> > machine went down and we didn't find out until we saw other indications.
> > 
> > 	I hate to do this, but is there something we can do to monitor the
> > monitors?  (We'd of course have a monitor monitor). 
> 
> *ack*
> 
> Perhaps, once every ten minutes or so:
> 
>    ls -lt ~nocol/logs | head -1
> 
> ...if a file hasn't changed "recently," either you have a fairly docile
> network or something's buggy with the monitoring (specifically, hostmon
> tends to log the CPU idle time and context switches once every few
> minutes, as it's rather rare that the relative load on my server(s)
> doesn't change ever so slightly).
>
	But you couldn't tell whether it was hostmon logging an info-level
event or something else... no?
>
> So now we need a monitor-monitor monitor, right?  *chuckle*
>
	We're actually considering a 2nd machine to monitor the first.
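
	A rough sketch of the staleness check, not tested against NOCOL.
It assumes each monitor writes to its own file under the log directory,
which would sidestep the question of who wrote the newest file.  The
function name stale_files and the 10-minute threshold are made up for
illustration, and -mmin needs a GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under directory $1 that
# has NOT been modified within the last $2 minutes.
# Empty output means every file was written recently.
stale_files() {
    dir=$1
    minutes=$2
    # -mmin +N matches files last modified more than N minutes ago.
    # It is a GNU/BSD find extension, not strict POSIX.
    find "$dir" -type f -mmin +"$minutes" -print
}
```

	Run from cron every ten minutes or so, e.g. "stale_files ~nocol/logs 10",
and mail out anything it prints.  On the 2nd machine it would need some
way to see the first machine's logs, which this sketch does not cover.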
>
> 
> > 	In an unrelated story....... Is there a way to keep "X" previous copies
> > of hostmon output?  We sometimes miss a situation where it would have been
> > really useful to see the hostmon information, because by then it has been
> > refreshed.  Only keep the last "X" rolling copies of the config per machine.
> 
> Well, a "painful" way to rebuild the data may be to:
> 
>   cat ~nocol/logs/* | grep '\[hostmon\]' | sort
>
	Doesn't show the data I'm looking for.  Only some.
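
	What I have in mind is closer to the sketch below.  It assumes the
hostmon data for each machine sits in a single file that gets overwritten
on every refresh, and that something (cron, or a wrapper around the fetch)
can call it just before the refresh.  The name rotate_copies and the
numbered-suffix scheme are invented here, not anything NOCOL provides.

```shell
#!/bin/sh
# Hypothetical helper: keep the last $2 copies of file $1 as
# $1.1 (newest) through $1.$2 (oldest); older copies are overwritten.
rotate_copies() {
    src=$1
    keep=$2
    i=$keep
    # Shift existing copies up by one, starting from the oldest slot
    # so that nothing is clobbered before it has been moved.
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$src.$prev" ]; then
            mv -f "$src.$prev" "$src.$i"
        fi
        i=$prev
    done
    # Snapshot the current file as the newest copy, preserving its mtime.
    cp -p "$src" "$src.1"
}
```

	Calling "rotate_copies somefile 5" before each refresh would leave
somefile.1 through somefile.5 holding the five most recent versions.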
> 
> 
> I know... perhaps not the most elegant, but...
> 
	Any hints/tips/tricks are appreciated!

		Tuc/TTSG