Web Administration Guide

Figure 7.1 shows the top rows of the NetVigil Administrator interface. All NetVigil pages have the components shown, including:

Tabs with links to the four main areas: STATUS, REPORTS, MANAGE, and HELP

A navigation bar with links to the pages within the current main area

The name of the current page.

Figure 7.2 displays the NetVigil icons used to communicate device and test status. OK, WARNING, and CRITICAL test statuses are determined by the thresholds for a specific test. Test status is UNKNOWN if the test cannot be performed (for example, if the device is down) or if the database cannot be reached. Although not represented by a particular icon, a test can have status = FAIL, which means that the device was reached but the test failed to be performed. An example is when a POP3 port test is performed and the supplied login/password combination fails.

A test status is UNCONFIGURED if the device has been added to the application for monitoring but no tests have been created for it. An UNREACHABLE test status will be returned if a parent/child dependency has been created between two or more devices and a failure in the parent device causes the child to be physically unreachable via the network. (For detailed information on device dependencies, see Section 8.3, "Device Dependency" on page 8-3.) Test status is TRANSIENT if the test's status has changed, but the flap prevention threshold has not been crossed. For example, if you configure a test so that no action is taken until the result has been CRITICAL for three test cycles, test status changes to TRANSIENT after the first CRITICAL result is returned. It remains TRANSIENT until either the problem is resolved, in which case test status changes to a lower severity, or the third CRITICAL result is returned, after which test status is CRITICAL and appropriate action is taken.
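The TRANSIENT behavior described above can be modeled as a small state machine. The following Python sketch is illustrative only and is not NetVigil code: the class name FlapFilter and its fields are hypothetical, and it assumes that the same number of test cycles is required to confirm a change in either direction.

```python
from dataclasses import dataclass


@dataclass
class FlapFilter:
    """Hypothetical model of a flap prevention threshold (not NetVigil code)."""
    cycles_required: int = 3   # consecutive results needed to confirm a change
    confirmed: str = "OK"      # last status that crossed the threshold
    candidate: str = "OK"      # new status currently being counted
    streak: int = 0            # consecutive cycles the candidate has been seen

    def observe(self, raw: str) -> str:
        """Feed one raw test result; return the status that would be shown."""
        if raw == self.confirmed:
            # The result matches the confirmed status, so nothing is pending.
            self.candidate, self.streak = raw, 0
            return self.confirmed
        if raw == self.candidate:
            self.streak += 1
        else:
            # A different status appeared: start counting from one.
            self.candidate, self.streak = raw, 1
        if self.streak >= self.cycles_required:
            self.confirmed, self.streak = raw, 0
            return self.confirmed
        return "TRANSIENT"


flap = FlapFilter(cycles_required=3)
shown = [flap.observe(r) for r in ["OK", "CRITICAL", "CRITICAL", "CRITICAL"]]
print(shown)  # ['OK', 'TRANSIENT', 'TRANSIENT', 'CRITICAL']
```

If the second result had been OK instead of CRITICAL, the sketch would return to OK immediately, which corresponds to the "problem is resolved" case in the text.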

7.1.2 Logging In

The NetVigil superuser will probably create the Admin-Group structure and assign you to an Admin-Class, which will determine the scope of your ability to see, create, modify and delete entities within the application.

To access NetVigil:

Type http://netvigil.your.domain into your web browser.

Enter the Department name, Username and Password given to you by your administrator.

Click on the Login button to enter the site.

If you are an Administrator, you will see the administration interface as described above. If you are not an administrator, please refer to the User Guide for assistance with your NetVigil account.

7.2 Viewing Status & Events

To view the Status Summary for all your Departments, do one of the following:

If you are already logged in, click on the STATUS tab and the Department Status Summary screen will load.

7.2.1 Department Status Summary View

The Department Status Summary View is the administrative default view when the STATUS tab is selected. There is one row for each Department with monitored devices. Each row gives the Department name and an icon representing the worst test status for the Department at the far right of the row.

If the Department status for one group of tests is WARNING, at least one current test result for that test category on the Department is in WARNING range. Similarly, if the Department status for one category of tests is CRITICAL, at least one current test result for that category on the Department is in CRITICAL range. The worst test status of all tests in the category determines the icon displayed. The rule for displaying the icons (from most to least severe) is:

CRITICAL (most severe)

WARNING

UNREACHABLE

UNKNOWN

SUSPENDED

UNCONFIGURED (least severe)

7.2.2 Device Status Summary View

The Device Status Summary View under the main STATUS tab displays all the devices on all the Departments that you have been given permission to view/manage. Each row displayed gives the device name and an icon representing the worst test status for the device at the far right of the row.
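The worst-status rule used for the summary icons can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation. The function name worst_status is hypothetical, and two assumptions are made: any status that is not in the severity list (for example OK) ranks below every listed status, and an empty set of tests is treated as UNCONFIGURED.

```python
# Severity order as listed in this guide, most severe first.
SEVERITY_ORDER = [
    "CRITICAL", "WARNING", "UNREACHABLE",
    "UNKNOWN", "SUSPENDED", "UNCONFIGURED",
]


def worst_status(statuses):
    """Return the most severe status found in `statuses`.

    Assumption: statuses missing from SEVERITY_ORDER (such as OK)
    rank below every listed status.
    """
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    return min(
        statuses,
        key=lambda s: rank.get(s, len(SEVERITY_ORDER)),
        default="UNCONFIGURED",  # assumption: no tests means UNCONFIGURED
    )


print(worst_status(["OK", "WARNING", "UNKNOWN"]))  # WARNING
```

The same function could be applied once per device and then again across the devices of a Department to obtain the Department icon.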

To view the device status summary for a specific Department:

Click on the STATUS tab on the main navigation bar to go to the Department Status Summary page.

Click on the Department name link for the Department of interest and you will be taken to the Device Status Summary page.

7.2.3 Test Summary View

The Test Summary page contains one row for each test being conducted. Each row contains test status, test name, current test value, the warning and critical thresholds, the time the last test was conducted, and the time the test has remained in the current state. For example, in the sample Device Test Status Summary page in Figure 7.4 below, the ping Packet Loss test has been in OK status for 1 hour & 18 minutes.

To view the test summary for a specific device:

Click on the STATUS tab on the main navigation bar to go to the Department Status Summary page.

Click on the Department name link for the Department of interest and you will be taken to the Device Status Summary page.

Click on the device name link for the device of interest and you will be taken to the Device Test Status Summary page.

7.2.4 Test Details View

The Test Details page graphically displays performance and event history for a single test over the last 6-24 hours. There are four graphs on the Test Details page (see Figure 7.5 below):

A pie chart showing percent of last 24 hours in each of OK, WARNING, CRITICAL, UNKNOWN and UNREACHABLE statuses

A three-dimensional bar graph of test results for the last 6 hours

A line graph of test results for the last 24 hours

A frequency distribution graph for the last 24 hours
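The figures behind the pie chart can be derived from a list of status changes. The Python sketch below is an assumption about how such percentages might be computed, not a description of NetVigil internals. The function name and the (timestamp, status) input format are hypothetical.

```python
from collections import defaultdict


def status_percentages(changes, window_start, window_end):
    """Return the percent of the window spent in each status.

    `changes` is a time-sorted list of (timestamp_seconds, status) pairs.
    The first entry is assumed to be at or before `window_start`.
    """
    totals = defaultdict(float)
    for i, (t, status) in enumerate(changes):
        seg_start = max(t, window_start)
        # A status lasts until the next change, or until the window ends.
        seg_end = changes[i + 1][0] if i + 1 < len(changes) else window_end
        seg_end = min(seg_end, window_end)
        if seg_end > seg_start:
            totals[status] += seg_end - seg_start
    span = window_end - window_start
    return {s: 100.0 * d / span for s, d in totals.items()}


HOUR = 3600
changes = [(0, "OK"), (6 * HOUR, "WARNING"), (12 * HOUR, "OK")]
print(status_percentages(changes, 0, 24 * HOUR))
# {'OK': 75.0, 'WARNING': 25.0}
```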

To view the details for a specific test:

Click on the STATUS tab on the main navigation bar to go to the Department Status Summary page.

Click on the Department name link for the Department of interest and you will be taken to the Device Status Summary page.

Click on the device name link for the device of interest and you will be taken to the Device Test Status Summary page.

Click on the test name link for the test of interest and you will be taken to the Device Test Details page for that test.

7.2.5 Event Logs

An Event Log lists every time a test status has changed in the past 24 hours. Each line gives the device name, time the event occurred, test name, type of test, low (warning) and high (critical) thresholds, and the test value. The Event Log is typically sorted by device and then by test, but may not show other devices or even a device name, depending on which level of detail you are viewing. The various levels of viewing event logs are explained below.

Please wait for the information to load, as the databases for all the data gathering elements (DGEs) are being queried.

To view the Event Log for only one device:

Go to the Device Test Status Summary page as described in "To view the test summary for a specific device:" on page 7-5.

Click on the Events for the last 24 hours link to see the events for that device.

To view the Event Log for a single test:

Go to the Device Test Details page as described in "To view the details for a specific test" on page 7-7.

Click on the Events for the last 24 hours link to see the events for only that test.

7.3 Managing NetVigil

7.3.1 Manage Your Own Department

To update your own Department information or change your password:

Click on the MANAGE tab on the main navigation bar.

Click on the USER tab on the secondary navigation bar and you will be taken to the Update Department page.

Enter any changes desired. Modifiable fields include: E-mail, day phone, evening phone, mobile phone, pager, timezone, and password. Please contact your administrator if you wish to modify your contact address.

In the Preferences section, select the checkboxes for all the device states you wish to view on the summary pages, leaving blank all those you wish to filter out by default.

Change the number of devices to view on each page in the Maximum To Display field.

Click the Update User button to save your changes.

These changes will become part of your user profile and will serve as defaults each time you log in to NetVigil.

7.4 Administrative Reports

NetVigil provides report templates for analyzing systems usage and performance. The reports are designed to provide a summary view of all the Departments assigned to you as an administrator. The currently available reports detail Department/device health, present event history for Departments, devices, and tests in a drill-down fashion, and audit Department and user activity. The Admin-Class to which you are assigned follows the privileges matrix and determines which User-Classes appear on your reports. Consequently, if you are managing a single department, you may have full access to that department's information but will not be able to see another department's reports (and vice versa). This restriction can be modified by the enterprise's Superuser to fit your needs.


Important Note: All WARNING or CRITICAL events used to generate admin reports are based on the Shadow Thresholds, which are thresholds established by the administrator for each combination of test type and User-Class. See the Assigning Shadow Thresholds and Actions section for more specific details. End-users who run reports similar to those below will see results based on the WARNING and CRITICAL thresholds that they have established themselves on a per-test basis or by accepting the default test thresholds as defined in the Assigning Default Thresholds and Actions section. Thus, reports based on severities equal to WARNING or CRITICAL may show different results, depending on whether the user viewing them has admin or only end-user privileges. Because SLA thresholds are established for the benefit of admin users and end-users alike, reports based on SLA severities will display the same results.
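The effect of the two threshold sets can be illustrated with a short Python sketch. All names and numbers here are hypothetical, and the sketch assumes that higher test values are worse. It shows how a single test result can appear under different severities in an admin report and in an end-user report.

```python
def severity(value, warning, critical):
    """Classify a result, assuming higher values are worse."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical thresholds for one CPU utilization test (percent).
shadow_thresholds = {"warning": 70, "critical": 85}  # set by the administrator
user_thresholds = {"warning": 80, "critical": 95}    # set by the end-user

result = 88
admin_view = severity(result, **shadow_thresholds)
user_view = severity(result, **user_thresholds)
print(admin_view, user_view)  # CRITICAL WARNING
```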

To view the following reports:

Click on the REPORTS tab on the main navigation bar. You will be taken to the Manage Reports page (see Figure 7.6 below).

Depending on report type, select a Duration and/or Severity via the drop down list, then click Go.

To view the User Audit Report, simply click Go.

7.4.1 Fault Management Reports

The Fault Management Reports provide an in-depth analysis of the events in which tests, devices, and services crossed their thresholds. They include Device and Service reports on the most fault-prone services and the number of events that occurred.

is designed to provide a consolidated view of events for either the last 24 hours or a specific historical month. Each report entry is a unique combination of device name, test name, and severity, detailing both the total duration in the specified severity (for example, CRITICAL or WARNING) and the number of times that the test entered that severity. Below the text listing is a horizontal bar graph of the top 10 'worst' results. Clicking on any of the column headings in the text list automatically updates this graph.
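The consolidation described above amounts to grouping events by device, test, and severity. The following Python sketch shows one way this could be done. The function names and the event tuple layout are hypothetical, and the sketch is not taken from the product.

```python
from collections import defaultdict


def consolidate(events):
    """Group events by (device, test, severity).

    `events` is an iterable of (device, test, severity, duration_seconds)
    tuples, one per occasion on which the test entered that severity.
    """
    summary = defaultdict(lambda: {"duration": 0, "count": 0})
    for device, test, sev, duration in events:
        entry = summary[(device, test, sev)]
        entry["duration"] += duration
        entry["count"] += 1
    return dict(summary)


def top_n(summary, column="duration", n=10):
    """Return the n 'worst' entries for the chosen column, largest first."""
    ranked = sorted(summary.items(), key=lambda kv: kv[1][column], reverse=True)
    return ranked[:n]


events = [
    ("router1", "Packet Loss", "CRITICAL", 300),
    ("router1", "Packet Loss", "CRITICAL", 120),
    ("web1", "CPU", "WARNING", 600),
]
for key, totals in top_n(consolidate(events), column="count"):
    print(key, totals)
```

Passing a different column to top_n mirrors the way the graph is re-sorted when a column heading is clicked.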

provides reports on the top 10, 25, or 50 services affected, ranked by number of events. The report consists of the frequency distribution of events during each hour of the day and each day of the week or month, together with the duration of events.
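A frequency distribution of events by hour of day and by day of week can be built with simple counters. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes that event times are available as datetime objects.

```python
from collections import Counter
from datetime import datetime


def event_distributions(timestamps):
    """Count events per hour of day (0-23) and per weekday (0 = Monday)."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(ts.weekday() for ts in timestamps)
    return by_hour, by_weekday


events = [
    datetime(2024, 1, 1, 9, 30),   # a Monday
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 2, 14, 5),   # a Tuesday
]
by_hour, by_weekday = event_distributions(events)
print(by_hour[9], by_hour[14], by_weekday[0], by_weekday[1])  # 2 1 2 1
```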

are the most important reports; they give you useful data on threshold violations for Bandwidth, CPU, Memory, and Disk Utilization.

7.4.2 Performance and Capacity Planning Reports

These reports help you plan your IT infrastructure investments and target them in the right direction. They show exactly where capacity constraints are creating performance bottlenecks.

gives useful data for capacity planning, showing where redundant capacity should be added and where excess capacity can be removed, by reporting on the top N devices or tests by highest or lowest usage values. This report can be based on the status of one or more test types.

for Bandwidth, CPU, and Disk Space utilization gives a trend analysis for the next month and allows you to plan accordingly.
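This guide does not state how the trend is calculated. As a rough illustration of what a one-month projection involves, the Python sketch below fits a least-squares straight line to daily utilization samples and extrapolates it 30 days ahead. The linear model, the function name, and the sample data are all assumptions.

```python
def linear_forecast(samples, days_ahead=30):
    """Extrapolate a least-squares line through (day, value) samples.

    Assumption: utilization follows a straight-line trend. The method
    actually used by the product is not documented here.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    return slope * (last_day + days_ahead) + intercept


# Disk utilization (percent) sampled on days 0, 10, and 20.
print(linear_forecast([(0, 50.0), (10, 55.0), (20, 60.0)]))  # 75.0
```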

7.4.3 SLA Reports

This report is based on device availability as measured by the ICMP packet loss test. Metrics are captured for the device state equal to CRITICAL or UNREACHABLE. The report shows the Top 10 devices by amount of "unavailability", displaying total time unavailable and % unavailable, with graphics showing either view.

Users may link to an availability distribution report/graph as well. This histogram is a distribution of the numbers of devices falling into blocks of 10% availability. That is, it displays the number of devices falling between 0-10% availability, 10-20% availability, and so on.
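The Top 10 listing and the availability histogram can both be derived from per-device downtime totals. The Python sketch below is a minimal illustration under stated assumptions. The function names and input format are hypothetical, a boundary value such as 10% is placed in the higher block, and 100% is placed in the last block.

```python
def percent_unavailable(down_seconds, period_seconds):
    """Percent of the reporting period spent CRITICAL or UNREACHABLE."""
    return 100.0 * down_seconds / period_seconds


def top_unavailable(down_by_device, period_seconds, n=10):
    """Return up to n (device, down_seconds, percent) rows, worst first."""
    rows = [
        (dev, down, percent_unavailable(down, period_seconds))
        for dev, down in down_by_device.items()
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def availability_histogram(availability_by_device):
    """Count devices per 10% availability block (index 0 is 0-10%)."""
    counts = [0] * 10
    for pct in availability_by_device.values():
        if not 0 <= pct <= 100:
            raise ValueError(f"availability out of range: {pct}")
        counts[min(int(pct // 10), 9)] += 1
    return counts


DAY = 86400
down = {"router1": 8640, "web1": 0, "db1": 43200}
print(top_unavailable(down, DAY, n=2))
# [('db1', 43200, 50.0), ('router1', 8640, 10.0)]

availability = {dev: 100.0 - percent_unavailable(d, DAY) for dev, d in down.items()}
print(availability_histogram(availability))
# [0, 0, 0, 0, 0, 1, 0, 0, 0, 2]
```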

7.4.4 Management Reports

User Audits

This administrator-level report displays discrete numbers of Departments, users, devices, tests by category, and logins for each User-Class within the domain of the administrator seeking the information. Also displayed for each User-Class are the average numbers per Department of devices, ICMP tests, SNMP tests, and Port tests.

The Superuser is also able to see the currently logged in users by clicking on the link 'Who's logged in now'.

7.4.5 Stored and Scheduled Reports

Stored and scheduled reports are custom reports that are saved as queries, so they can be generated on the fly just like the standard reports.

7.4.6 Custom Reports

Generates one or more of the Top Ten, Number of Events Distribution, Event Duration Distribution, Number of Events, Performance, Statistics, and Trend Analysis reports for particular tests of the chosen test types on a device.

Generates one or more of the Top Ten, Number of Events Distribution, Event Duration Distribution, and Number of Events reports for devices of a particular vendor and device type.

Features reports of event distribution over time for the chosen types of tests or for a particular device.

Short graphical reports covering the last 24 hours at 5-minute intervals for a particular test or device, or for the chosen types of tests.

Plots similar tests on a single graph, allowing their performance to be compared.