Topitall

Introduction

Topitall is a Unix system monitoring tool designed to run as an agent, or daemon, that gathers the most useful system resource information and displays it in HTML as time-series graphs. Topitall uses the common Unix system tools vmstat, ps, df, netstat and uptime to gather system information every 15 minutes, and uses gnuplot to display it as a series of time-series graphs. The parameters collected are basic system parameters such as memory, network, swap and CPU usage, resource usage per process or application, and file system usage.

If you want to know how your Unix boxes are spending their time whilst you are not watching, for example when you are off sleeping, eating or sailing to far-off places, then Topitall could be for you!

Topitall can also be easily configured to generate alerts by syslog and email based on any of the collected parameters, and so act as a simple Enterprise System Monitoring [ESM] tool. For example, Topitall could be used to detect system resource problems such as stopped or runaway processes, over- or under-used file system capacity, or a process or application that has exceeded a system resource threshold. What more do you really need?

So what do the graphs look like? Here is an example with just three graphs; a normal page will have over thirty graphs, plus three graphs per configured process/application.

Topitall will run on any system where perl5, df, vmstat, ps and rrdtool are installed. Topitall has been developed on Solaris and Linux, so let me know if you have problems on other platforms. [gnuplot is required instead of RRD for version 1.x]

Topitall is designed to be trivial to install, and compared to most other traditional and commercial ESM software, is easy to customize for your own particular applications.

Topitall is implemented in JABOPS [Just A Bunch Of Perl Scripts] using the KISS [Keep It Simple, Stupid] paradigm, and so is reliable and easy to customize and extend. Version 1.2 uses ASCII files to store data and produces the GIF graphs using gnuplot. Version 2 uses the very excellent Round Robin Database [RRD] library to store the data and produce the GIF files. The topitall daemon and the RRD library are quite efficient and so will not, in themselves, add greatly to the system load.

The Topitall executable has three main functions. The data collection agent, topitall.pl -daemon, is designed to run permanently, gathering data averaged over 15 minutes. Basic system resource data is collected first; then, for each configured process or group of processes, the number of processes running and their CPU and memory usage are collected. The plotting tool, topitall.pl -plot, takes the data collected by the daemon, produces time-series graphs in GIF format and writes an HTML file. The time period plotted can be specified as a day, a week, a four-week month or a whole year. The plotting tool is a little CPU intensive as it produces graphs in GIF format, but it can be scheduled to run as often or as rarely as you desire.
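
To get a feel for how much data each plot period covers, here is a small sketch in Python [not Topitall's own Perl code]. It assumes exactly one averaged point every 15 minutes and a 365 day year; the names SAMPLE_INTERVAL_MIN, PERIOD_DAYS and points_per_period are made up for this illustration.

```python
# Sketch only: how many 15 minute samples fall inside each plot period.
# Assumes one point per 15 minutes and a 365 day year (hypothetical names).
SAMPLE_INTERVAL_MIN = 15

# "month" is the four-week month described in the text.
PERIOD_DAYS = {"day": 1, "week": 7, "month": 28, "year": 365}


def points_per_period(period):
    """Return the number of samples the daemon would record in a period."""
    minutes = PERIOD_DAYS[period] * 24 * 60
    return minutes // SAMPLE_INTERVAL_MIN


for name in PERIOD_DAYS:
    print(name, points_per_period(name))
```

A day works out at 96 points per parameter and a year at 35040, which is why the yearly plots take noticeably longer to produce than the daily ones.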

Here is an example alert email:

Subject: jcbhp named : N <1
Date: Fri, 28 Dec 2001 08:54:03 GMT
From: John Belshaw

Named has stopped, check slave server for dns !

Automatic Mail from Topitall

Here is an example syslog message:

Dec 28 08:51:08 jcbhp Topitall: Alert named N <1
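
A message like the one above could be produced with the standard logger utility. The sketch below, in Python rather than Topitall's own Perl, only builds the command line; it assumes the tag Topitall and the "Alert" prefix seen in the example, and the function name logger_command is made up for this illustration. How topitall.pl actually calls logger may differ.

```python
import shlex


def logger_command(subject, tag="Topitall"):
    """Build (but do not run) a logger call for the local6.info facility.

    -p sets the facility.level pair and -t sets the tag that appears
    before the colon in the syslog line.
    """
    return ["logger", "-p", "local6.info", "-t", tag, "Alert " + subject]


cmd = logger_command("named N <1")
print(shlex.join(cmd))  # shell-quoted form, handy for testing by hand
```

Running the printed command by hand is a quick way to check that syslog is configured to record local6.info messages before relying on the alerts.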

Installation

Just download the tgz file, gunzip it and untar it in a normal user's directory. I would not recommend running or installing Topitall as the root user. The script topitall.pl should be run with: perl topitall.pl -daemon. For Version 2 you must install RRD according to the instructions given with that package. There is an example rc script to start Topitall on boot, and an example crontab file to generate the HTML files automagically. The utilities top, netstat, and df must be in the user's path. For alerts to work, logger must be in the path and sendmail should be at /usr/lib/sendmail and configured so as to be able to send mail.

Download

You can download Topitall here. The current version is 2.0, 11 Jan 2002. Version 1.2 is here.

Quick Start

Simply run perl topitall.pl -daemon as the user you installed it as, and data will start to be written in the tia directory [the data directory, one file per parameter]. If you use the -q option topitall will run in quick mode, where data will be gathered at one point per ten seconds with butchered time stamps so you can fine tune your config files. To see the results run perl topitall.pl -plot -day and today's files will be created in the html directory, including the hostname.day.html file, which is the one you should look at first.

Configuration Files

The tiaProcess.cfg file specifies the names of the processes [in fact a regular expression on the Command field given by ps], one per line. If, for example, you are running the apache httpd daemon then just put httpd on a line in the tiaProcess.cfg file and restart topitall. One of the parameters plotted is the number of processes matched by each line in the tiaProcess.cfg file, and so it can be used to see whether the process or application is running or not. If you wish to monitor a whole application then you may be able to enter a string or pattern to match all the processes in the application. For example, all your processes may start with ora_ and so will be matched by a line containing ora_. If your regular expressions are anything like mine then you will need to keep them very simple, otherwise you will get confused results. You can use the -v 2 option to see what is actually being matched.
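
The matching described above can be tried out with a few lines of Python [a sketch, not Topitall's Perl code]. It assumes an unanchored regular expression search on each Command string; the command list and the function name count_matches are hypothetical.

```python
import re


def count_matches(patterns, commands):
    """Count, for each pattern, how many command strings it matches.

    Uses an unanchored search, so the pattern may match anywhere
    in the command string.
    """
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        counts[pattern] = sum(1 for cmd in commands if regex.search(cmd))
    return counts


# Hypothetical Command values, as ps might report them.
commands = [
    "/usr/sbin/httpd",
    "/usr/sbin/httpd",
    "/usr/sbin/lighttpd",
    "ora_pmon_PROD",
    "ora_dbw0_PROD",
]

# One pattern per line, as in tiaProcess.cfg.
print(count_matches(["httpd", "ora_"], commands))

# A tighter pattern that does not also pick up lighttpd.
print(count_matches([r"/httpd$"], commands))
```

Note that the plain pattern httpd also matches lighttpd, because the search is not anchored. This is the kind of confused result mentioned above, and it is why checking the matches with -v 2 is worthwhile.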

The tiaAlert.cfg file is a bit more complicated, so you should see the comments in the file for a detailed description and examples, but I will give a summary here. Each parameter has a category, System, Disk or Process, and a keyword to identify it. The category and the keyword identify the measured parameter, and then an expression is defined; when it is satisfied at a 15 minute interval, an alert is generated. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is constructed from the expression as defined in the file and will be more or less in plain English. The alert subject line is also logged locally to the messages file using the local6.info syslog facility.

For example, this line will generate an alert when the Load Average is greater than 5 and send an email to the user jbelshaw@jcbhp with the comment in the body of the email:
System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy

And this line will generate an alert for the process httpd when the number of processes drops below 4:
httpd N <4 webmaster@localhost # Web server processes have stopped

Here are two more example lines:
netscape Mem >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/hda1 >=65 jbelshaw@jcbhp # Boot disk getting full

The processes httpd and netscape must have been included in the tiaProcess.cfg file for the process examples to work.
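
To make the line format concrete, here is a rough Python sketch [not Topitall's Perl code] that splits an alert line into its fields and tests the expression against a measured value. It assumes four whitespace-separated fields followed by a # comment, as in the examples above, where process rules put the process name in the first field. The operators >, < and >= appear in the examples; <= is an assumption. The names parse_alert_line and is_triggered are made up for this illustration, and the comments in tiaAlert.cfg remain the real reference.

```python
import operator
import re

# Comparison operators; ">", "<" and ">=" appear in the examples,
# "<=" is assumed to exist as well.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_alert_line(line):
    """Split 'target keyword expression email # comment' into a dict."""
    fields, _, comment = line.partition("#")
    target, keyword, expr, email = fields.split()
    match = re.fullmatch(r"(>=|<=|>|<)(\d+(?:\.\d+)?)", expr)
    if match is None:
        raise ValueError("unsupported expression: %r" % expr)
    return {
        "target": target,        # System, Disk, or a process name
        "keyword": keyword,      # e.g. LoadAv, /dev/hda1, N, Mem
        "op": match.group(1),
        "threshold": float(match.group(2)),
        "email": email,
        "comment": comment.strip(),
    }


def is_triggered(rule, value):
    """Return True when the measured value satisfies the rule's expression."""
    return OPS[rule["op"]](value, rule["threshold"])


rule = parse_alert_line(
    "httpd N <4 webmaster@localhost # Web server processes have stopped"
)
print(rule)
print(is_triggered(rule, 0))  # no httpd processes running
print(is_triggered(rule, 6))  # six httpd processes running
```

With no httpd processes the rule fires, and with six it does not, which matches the "drops below 4" description of the example line.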

The prune.cfg file is only used in version 1.2. The functionality has been replaced by RRD in version 2. It is designed to reduce the number of data points stored in each data file for data older than a specified number of days. If a parameter is not mentioned in the file then its data is never reduced. I suggest running prune manually when you wish to reduce the amount of data stored, then deciding on a prune policy, editing the prune.cfg file and putting an entry in the crontab to run it occasionally, say once a week.
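
The idea behind pruning can be sketched as follows, in Python rather than Topitall's Perl. This is only one possible way to reduce old data: it assumes points are (timestamp, value) pairs and simply keeps every n-th point older than the cutoff. How the real prune script thins the data, and the format of its data files, are not described here, so treat the function prune and its parameters as hypothetical.

```python
SECONDS_PER_DAY = 86400


def prune(points, now, max_age_days, keep_every):
    """Thin out points older than max_age_days, keeping every n-th one.

    points is a list of (timestamp, value) pairs sorted by timestamp;
    recent points are returned untouched.
    """
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old = [p for p in points if p[0] < cutoff]
    recent = [p for p in points if p[0] >= cutoff]
    return old[::keep_every] + recent


# Hypothetical data: one point every six hours for four days.
points = [(i * 21600, i) for i in range(16)]
now = 16 * 21600

pruned = prune(points, now, max_age_days=2, keep_every=4)
print(len(points), "->", len(pruned))
```

In this made-up data set the eight points older than two days are reduced to two, while the eight recent points are kept in full.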

Performance

The topitall agent is designed to run at a very low load. When run in quick mode on a normal-speed laptop under Linux it uses less than 1% of the CPU, so I believe it will average less than 0.01% of CPU when running normally. It has a memory footprint of circa 2.5 MB.

Comments

I wrote version 1 of Topitall on my laptop, using RedHat Linux 6.2, whilst bored sailing to the Seychelles, and so I had limited resources to distract me. If you like Topitall, tell all your friends and get them to install it. If you have any comments or constructive criticism please send them to topitall@eircom.net and I will try to answer them. If you have any success customising the config files for particular applications I would welcome seeing them, and I will include them in future example files with an acknowledgement.

Restrictions

Topitall is distributed under the Artistic License, so you can freely download it, use it and extend it as long as you respect my copyright. See the license file here. If you wish to use it in a commercial environment you should consider paying me to supply support [it's either begging here or on the streets of Dublin, so I'll try here first ;-]. If you need help in rolling out Topitall in an enterprise environment I can offer a standard service to configure your config files for you; please contact the author, John Belshaw, at topitall@eircom.net.

Caveats

If vmstat, df, or ps hangs whilst being called from topitall then the agent will hang, and will need to be restarted when the problem is fixed.
The alerts generated can only be sent by mail if /usr/lib/sendmail is correctly configured.
The alerts generated will be logged by local6.info and require syslog to be configured.
vmstat generates different output on different platforms, and so the data collected is different. If the platform is not SunOS or Linux then topitall will just use the cryptic names given by vmstat itself as the names of the parameters.
Some older versions of gnuplot do not support the gif terminal type and so will not produce the graphs.
The system commands, vmstat, df, uptime, ps and gnuplot must be in the path of the user running topitall.
The network parameter data is currently in development and is only certain to work on Linux and Solaris.

BUGS

Version 1.x is not designed to run for more than a year; after that the data files will be overwritten.

Appendix

System parameters measured

**All Platforms**
Name	Description
Mem	Percent Memory used [not very accurate]
MemFree	Memory Free KB
MemSwap	Swap Used KB
Uptime	System uptime
Users	Number of users
LoadAv	15 min Load average
NProc	Total number of processes
CPUuser	Percent CPU used by user processes
CPUsystem	Percent CPU used by system or kernel
CPUidle	Percent CPU not used
Nrun	Number of running processes
Nsleep	Number of sleeping processes
Nswapped	Number of swapped processes
SwapIn	KB/s swapped in
SwapOut	KB/s swapped out
Interrupts	Number of Interrupts/s including clock
ContextSwitches	Number of context switches /s

**Linux Specific**
IOin	Blocks/s IO in
IOout	Blocks/s IO out
MemCache	Cache Used

**Solaris Specific**
PageReclaims
MinorFaults
SwapFreed
MemShortfall
ScanRate
disk0
disk1
disk2
disk3

Disks

Percentage free space for all locally mounted disks, identified by /dev/nodename
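
As an illustration of where such figures can come from, the Python sketch below [not Topitall's Perl code] reads df-style output and reports the percentage used and the remaining percentage for each /dev/ filesystem. The sample output, its numbers and the function name disk_usage are invented for this example, and 100 minus the Use% column is only an approximation of free space. The exact df options and columns topitall.pl relies on are not covered here.

```python
def disk_usage(df_output):
    """Map each /dev/... filesystem to (percent_used, percent_free)."""
    usage = {}
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Skip anything that is not a local /dev/ device or is incomplete.
        if len(fields) < 6 or not fields[0].startswith("/dev/"):
            continue
        used = int(fields[4].rstrip("%"))
        usage[fields[0]] = (used, 100 - used)
    return usage


# Invented sample in the usual df -k layout.
SAMPLE = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda1 1011928 657752 302772 69% /
/dev/hda5 4031680 1209504 2617376 32% /home
none 127632 0 127632 0% /dev/shm
"""

print(disk_usage(SAMPLE))
```

Only the two /dev/ entries are reported; the memory filesystem is skipped because it is not identified by a /dev/ node name.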

Process, per match

Name	Description
Cpu	% CPU used
Mem	% Memory used
N	Number of processes matched