

     Re: [snips-users] hostmon falling over / snipstv/eventselect segmentation fault:

On Wed, Jul 26, 2006 at 03:04:58PM +0100, Robert Lister wrote:
> On Mon, Jul 24, 2006 at 01:32:35PM -0700, Russell Van Tassell wrote:
> > I've not tried going 64-bit on NOCOL, however, though I've been building
> > a new 64-bit Solaris machine... I might be inclined to check it out,
> > though.
> > 
> > (Apologies for the delayed reply here... life has kind of "taken over"
> > recently)
> Well, the problem seems to be (amongst others) anything that tries to use 
> the SNIPS perl module pack_event() will fall over with a segmentation 
> fault... but not always, so I think it may be something to do with monitors 
> that use subdevices.

...sounds like the perl module might need to be recompiled on that
platform, then?  (or, by "fall over," do you mean the aforementioned

For the record, the one thing I've been able to actually *panic* a box
with recently was attempting to get Perl installed 64-bit on a Solaris
box... it would panic, predictably, while compiling some modules (I
don't remember which ones anymore), though I'm just as ready to blame
GCC... ;-)

> As our monitoring is made up of several machines there are other problems.
> The SNIPS database files, and RRD files generated on 32bit machines are not 
> readable by the 64bit machine. I haven't yet got round to fixing this, but 
> it would involve exporting the database files as text on the 32bit boxes 
> with display_snips_datafile / rrdtool dump and then re-importing them on the 
> 64bit box.

...an issue with RRDtool or B-DB?  In any case, yeah, "rrdtool dump" and
such should help you out, there...
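
For what it's worth, the dump/restore round trip for the RRD side can be
sketched roughly as below. This is only a sketch: "rrdtool dump" and
"rrdtool restore" are the real subcommands, but the helper names and the
foo.rrd -> foo.xml naming scheme are my own assumptions, and it does
nothing for the SNIPS data files themselves.

```python
import subprocess
from pathlib import Path


def xml_name_for(rrd_path: Path, out_dir: Path) -> Path:
    """Map foo.rrd -> <out_dir>/foo.xml (the naming scheme is an assumption)."""
    return out_dir / (rrd_path.stem + ".xml")


def dump_rrd(rrd_path: Path, xml_path: Path) -> None:
    """Run on the 32-bit box: write a portable XML dump of one RRD file."""
    with open(xml_path, "wb") as out:
        # "rrdtool dump" writes the XML to stdout, so redirect it to a file.
        subprocess.run(["rrdtool", "dump", str(rrd_path)], stdout=out, check=True)


def restore_rrd(xml_path: Path, rrd_path: Path) -> None:
    """Run on the 64-bit box: rebuild a native RRD file from the XML dump."""
    subprocess.run(["rrdtool", "restore", str(xml_path), str(rrd_path)], check=True)
```

Call dump_rrd() for each *.rrd file on the 32-bit machine, copy the XML
files across, then call restore_rrd() for each one on the 64-bit machine.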

> Unfortunately I can't get my re-import script for SNIPS data to work on 
> either the 32bit machines or the 64bit machines. Maybe something I'm doing 
> wrong. The SNIPS Perl API seems a bit vaguely documented in places, and 
> doesn't give any good examples of how to use pack_event correctly. (i.e., 
> what you need to present it with and how, in which order.)
> If I can't get this to work then I will have to rewrite some stuff (in perl) 
> that uses the SNIPS data files to use the text files instead (snipstv and 
> snipsweb). This is easy enough to do I suppose, just takes a bit of time.

...I'd expect this to be RRD native, myself (and yeah, I'm painfully
aware of how bad the documentation may be, in both arenas).

> On the day of the migration I discovered that snipslogd on the central host 
> did not seem to work remotely. It works fine receiving events from monitors 
> locally, but does not receive events from the remote boxes.
> I can see the UDP packets arrive, but snipslogd doesn't do anything with 
> them, but logs errors about: "readevent: socket read failed (incomplete)--"
> The workaround to this was to configure the remote boxes to use syslog as a 
> transport, by running snipslogd on each remote box, and configuring 
> everything to log to that local snipslogd, and then configure snipslogd to 
> pipe everything to logger:
> *               info    |/usr/bin/logger^-t^snipslogd^-p^local3.info

Ouch... are you sure there's not an ACL in the way, by chance?  I'll have
to look a bit closer on this to comment intelligently, as honestly this
part tends to be a "set and forget" on my part...
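
One other thing that might be worth ruling out, and this is purely a
guess on my part: if the central host is the 64-bit machine and the
remote boxes are 32-bit (the thread doesn't actually say so), a record
written with 32-bit C types is shorter than what a 64-bit reader
expects, which would look like a short read. The sketch below uses a
made-up record layout, NOT the real SNIPS event structure, and ignores
compiler padding.

```python
import struct

# Hypothetical record: 16-byte sender name, a C `long` timestamp, an `int`
# severity. This is NOT the real SNIPS event layout; padding is ignored.
ILP32_FORMAT = "<16sii"  # 32-bit ABI: `long` is 4 bytes
LP64_FORMAT = "<16sqi"   # 64-bit (LP64) ABI: `long` is 8 bytes


def pack_as_32bit(sender: bytes, timestamp: int, severity: int) -> bytes:
    """Build the datagram the way a 32-bit sender would lay it out."""
    return struct.pack(ILP32_FORMAT, sender, timestamp, severity)


def try_read_as_64bit(datagram: bytes):
    """Return the decoded fields, or None if the datagram is too short."""
    expected = struct.calcsize(LP64_FORMAT)
    if len(datagram) < expected:
        return None  # the reader would see this as an incomplete record
    return struct.unpack(LP64_FORMAT, datagram[:expected])


if __name__ == "__main__":
    packet = pack_as_32bit(b"mon1", 1153922698, 3)
    print(len(packet), struct.calcsize(LP64_FORMAT), try_read_as_64bit(packet))
```

Comparing the size of the datagrams on the wire against the size of the
event structure on the receiving host would confirm or kill this theory
quickly.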

> I then configured syslog-ng to ship those events back to the central host, 
> and tweak my event processing script to strip off the syslog date and time 
> bits before trying to parse as a SNIPS event. It turns out that this is also
> slightly more robust since I set it up to use TCP and not UDP, and it allows 
> me to keep a local log on each server as well as log to multiple 
> destinations. (Something that you couldn't really do with snipslogd)

Well, that's *one* way to add a bit more "stability" (?), eh?  *smirk*
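
For anyone wanting to do the same, stripping the syslog header before
parsing could look something like the sketch below. It assumes the
traditional "Mon DD HH:MM:SS host tag:" syslog layout and the
"snipslogd" tag from the logger line above; a different syslog-ng
template would need a different pattern, and the event text in the
example is just a placeholder, not a real SNIPS event.

```python
import re
from typing import Optional

# Traditional syslog header: "Jul 26 15:04:58 host snipslogd: " with an
# optional "[pid]" after the tag. The layout is an assumption; adjust the
# pattern to match the syslog-ng template actually in use.
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+"  # month, day, time
    r"\S+\s+"                                            # hostname
    r"snipslogd(?:\[\d+\])?:\s?"                         # tag, optional pid
)


def strip_syslog_prefix(line: str) -> Optional[str]:
    """Return the payload after the syslog header, or None if no header matches."""
    match = SYSLOG_PREFIX.match(line)
    if match is None:
        return None
    return line[match.end():].rstrip("\n")


if __name__ == "__main__":
    # "EVENT payload here" stands in for whatever snipslogd actually emits.
    print(strip_syslog_prefix("Jul 26 15:04:58 mon1 snipslogd: EVENT payload here\n"))
```

Lines that don't match come back as None, so the event processing script
can skip anything that didn't come from snipslogd.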

> Longer term I will probably need to replace SNIPS with something else, as we 
> will shortly have more requirements like IPv6 monitoring and more precise 
> latency/jitter/packet loss monitoring.

Strangely enough, I'm attempting to do similar things, albeit on
Solaris, and while some stats are available via netstat, net-snmp
doesn't expose them as part of its MIB... so now I'm attempting to wedge
snmpdx into net-snmp as a sub-agent and hoping the proper data points
are included with Sun's implementation (and then, if I get that far,
seeing what it takes to "catch up" Sun's implementation to the current
version of net-snmp).

> It may be that I can come up with a replacement "multiping" wrapper script 
> that makes this work. Unfortunately every alternative out there seems to 
> involve a huge bloaty system that back-ends on mysql or some other database, 
> uses php/java or is yet another open monitoring framework which may or may 
> not be developed to a usable point, or just be too complicated for me to 
> understand. Network engineers are not usually also software developers, so I 
> can manage to hack together something in perl that works, but some of these 
> new monitoring frameworks require advanced knowledge of java, python, php or 
> some other wacky script language.

Check out "smokeping" (also from Tobias); I'm not sure if it'll fit your
needs, but it's relatively easy to set up and configure -- even for
network engineers.  ;-)

> The nice thing about snips was that it's lightweight and it just worked 
> (except for a few foibles, most of which have been corrected.)
> Maybe I'm getting too old for this lark, I should let the bright new kids 
> have a go, say I'm old-fashioned with my rickety perl scripts, and not 
> listen to me, and then repeat all the same mistakes I made in the past. :)
> Regards,
> Rob

Yep... I've personally been looking into using cacti (though again
that's largely for general statistics gathering rather than any real
level of monitoring).  The problem there, like most freeware, is the
(lack of) good documentation... and, of course, templating a whole new
system all over again.

Let me know if I can be of any further assistance, a shoulder to cry on,
etc... ;-)  I'm facing a lot of the same issues, here, though I've not
been so compelled to try to do the 64-bit'ness migration on our systems
yet (though again, I might try "just to do so" soon).

Russell M. Van Tassell
russell at loosenut com

Never underestimate the power of stupid people in large groups.
