Netcon - Network Condition Monitoring Tool

Written by David Jeske, (C) Copyright 2004,2005.

Released under Apache License v1.1. See LICENSE.txt

What is Netcon?

Netcon is an operational machine and service monitoring tool. It lets you set up monitoring for machine parameters such as CPU and disk usage, as well as services such as HTTP and MySQL. When the data reported for these services matches a set of predetermined triggers, the people responsible for those services can be notified.

What is different about Netcon?

One of Netcon's primary goals is to separate the discovery of individual errors from the notification process. Instead of receiving individual notifications about service problems, which can often include tens or hundreds of notifications, with Netcon the user receives strictly time-periodic updates of the system state during an incident. For example, you might receive one notification every five minutes during an incident, each one telling you how many service failures are still pending on that incident, followed by a notification when the incident is resolved and cleared.

Another primary goal of Netcon is to configure failure conditions not only in terms of discrete failure events (such as service-down), but also in terms of the predicted time to reach a failure threshold. For example, Netcon can be configured to alert you when it predicts that available disk space will drop below 10% in less than 12 hours. To make this prediction, it uses a simple linear regression of available disk space over a few time periods.
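As a rough illustration of the prediction idea, here is a minimal sketch of a least-squares fit over recent disk-space samples, extrapolated to a threshold. The function names, the sample format (hours, percent free), and the default 10% / 12 hour values are assumptions chosen to match the example above, not Netcon's actual code.

```python
from typing import Optional, Sequence, Tuple


def hours_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate hours from the last sample until the fitted line hits `threshold`.

    `samples` holds (hours, percent_free) pairs (a hypothetical format).
    Returns None when no crossing is predicted (flat or rising trend).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no slope to fit
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope >= 0:
        return None  # free space is not shrinking
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    return max(0.0, crossing_t - samples[-1][0])


def should_alert(
    samples: Sequence[Tuple[float, float]],
    threshold: float = 10.0,
    horizon_hours: float = 12.0,
) -> bool:
    """True when the threshold is predicted to be reached within the horizon."""
    eta = hours_until_threshold(samples, threshold)
    return eta is not None and eta < horizon_hours
```

For instance, samples of 30%, 28%, 26%, and 24% free taken one hour apart fit a line losing 2% per hour, so the 10% mark is predicted about 7 hours after the last sample and `should_alert` returns True.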

What else does Netcon do?

The Netcon data collection uses a uniform and extensible naming scheme for storing service status metrics. The same history that records the CPU usage of a machine over time is also used to record the duration of a trigger or failure, so both can use the same display and graphing capabilities. This makes it easy to layer new kinds of data on top of the system.
For example, by setting up an agent to report the user impact of an incident to Netcon, Netcon can report on the user-perceived impact of failures over time.
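To make the "one history for everything" idea concrete, here is a small sketch of a store where machine metrics and trigger durations are both just named time series. The `History` class and the slash-separated names are hypothetical, invented for illustration; Netcon's real naming scheme and database schema may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class History:
    """Hypothetical store: every metric is a named list of (timestamp, value)."""

    def __init__(self) -> None:
        self._series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

    def record(self, name: str, timestamp: float, value: float) -> None:
        self._series[name].append((timestamp, value))

    def series(self, name: str) -> List[Tuple[float, float]]:
        return list(self._series.get(name, []))

    def names(self, prefix: str = "") -> List[str]:
        return sorted(n for n in self._series if n.startswith(prefix))


history = History()
# A machine metric and a trigger duration share one code path.
# Both names are made up for this example.
history.record("web1/cpu/percent", 0.0, 35.0)
history.record("web1/cpu/percent", 60.0, 42.0)
history.record("web1/trigger/disk-low/active_seconds", 60.0, 300.0)
```

Because both kinds of data come back from `series()` in the same shape, a single graphing routine could plot either one.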

Netcon's basic architecture borrows my favorite features from other tools. Like QOS, it has a lightweight data-collection agent which is deployed as needed to query data, and which can be easily extended with application-specific collectors. Like Netsaint/Nagios, it has an SQL backend database and a UI for configuration and for browsing information. Like some larger commercial counterparts, it is configured from the Netcon web user interface. This makes it easy to configure, and because the configuration is stored in the database, it is also easy to write scripts that modify the configuration without fear of breaking a big configuration file.

What are the other basic features of Netcon?

data is stored in a MySQL database
monitoring is performed by a lightweight data-collection client
configuration data about what to monitor is administered centrally
custom data-collection clients can be written by extending the Netcon data-collection agent in Python, or by simply speaking the Netcon HTTP protocol
clients can (optionally) save and report data for disconnected periods
hierarchical redundant trigger suppression
services are specified in role-groups and applied to a set of machines

How does Netcon work?

One way to understand Netcon is to consider the flow of monitored data through the system. Here is a description of the cycle of data collection through an incident notification and resolution.

1. netcon server startup
2. netcon client startup
   a. check in with the server to get configuration
   b. begin monitoring, periodically reporting data to the server
3. netcon server accepts reported data from many clients
   a. for each piece of data, update the 'current' state of that service
   b. roll previous data into 'history'
4. netcon server periodically checks for errors
   a. load all triggers and check them against the 'current' state
   b. record any trigger state changes
   c. for any triggered errors, add them to the active incident, creating one if necessary
5. netcon server periodically handles notifications
   a. iterate through active incidents and make sure currently active users are watching these incidents
   b. iterate over watched incidents for each user and generate notifications (the user can choose a single email, or a single email per incident)
   c. deactivate incidents which have been resolved and which have passed their 'watch' period without any activity
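The error-check and notification steps above can be sketched roughly as follows. This is an illustration only: the `Trigger` and `Incident` classes, the service names, and the message wording are assumptions, not Netcon's real data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class Trigger:
    """Hypothetical trigger: a named predicate over one service's current value."""
    name: str
    service: str
    is_failing: Callable[[float], bool]


@dataclass
class Incident:
    """Hypothetical incident: the set of trigger names currently failing."""
    failures: Set[str] = field(default_factory=set)


def check_triggers(
    triggers: Iterable[Trigger],
    current: Dict[str, float],
    active: Optional[Incident],
) -> Optional[Incident]:
    """Check every trigger against 'current' state and fold failures into one incident."""
    for trigger in triggers:
        value = current.get(trigger.service)
        if value is None:
            continue  # no data reported for this service yet
        if trigger.is_failing(value):
            if active is None:
                active = Incident()  # create the incident on first failure
            active.failures.add(trigger.name)
        elif active is not None:
            active.failures.discard(trigger.name)
    return active


def notification_text(incident: Incident) -> str:
    """One summary per incident, rather than one message per failure."""
    return f"incident active: {len(incident.failures)} failure(s) pending"
```

Calling `check_triggers` on a timer and sending `notification_text` once per period gives the behavior described earlier: many simultaneous failures collapse into a single periodic summary instead of a flood of individual alerts.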

When the user receives a notification, that notification will indicate the severity of the incident, and the number of failures present on that incident. By visiting the web-interface, the user can check the detailed information reported on the incident, as well as add notes to the incident.

When the problem is resolved, the user must acknowledge and resolve the incident before it will be cleared. When acknowledging, the user can indicate the user-perceived result of the failure (degraded-performance, degraded-functionality, inaccessibility), as well as the length of time this incident should be watched for. After the watch timeout has expired, Netcon will clear the incident and make it part of the incident history.

What is there left to work on?

Check out the TODO.txt file!

What other programs are available?

QOS: a client/server data collection and error notification system. Uses raw files for data, and Python for configuration. Simple web interface for viewing current failures. No graphing.
NMIS: a centralized SNMP data collection server with notification and graphing based on RRDTool. http://www.sins.com.au/nmis/
remstats: uses client collectors and a central server with RRDTool. http://remstats.sourceforge.net/release/index.html
other tools based on RRDTool: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/rrdworld/index.html
