If you want to know how your Unix boxes are spending their time while you are not watching, for example when you are off sleeping, eating or sailing to far-off places, then Topitall could be for you!
Topitall can also be easily configured to generate alerts by syslog and email based on any of the collected parameters, and so act as a simple Enterprise System Monitoring [ESM] tool. For example, Topitall could be used to detect system resource problems such as stopped or runaway processes, over- or under-used file system capacity, or a process or application that has exceeded a system resource threshold. What more do you really need?
So what do the graphs look like? Here is an example with just three graphs; a normal page will have over thirty graphs, plus three graphs per configured process/application.
Topitall will run on any system where perl5, df, vmstat, ps and rrdtool are installed. Topitall has been developed on Solaris and Linux, so let me know if you have problems on other platforms. [gnuplot is required instead of RRD for version 1.x]
Topitall is designed to be trivial to install and, compared to most traditional and commercial ESM software, is easy to customize for your own particular applications.
Topitall is implemented in JABOPS [Just A Bunch Of Perl Scripts] using the KISS [Keep it Simple, Stupid] paradigm, and so is reliable and easy to customize and extend. Version 1.2 uses ASCII files to store data and produces the gif graphs using gnuplot. Version 2 uses the very excellent Round Robin Database [RRD] library to store the data and produce the gif files. The topitall daemon and the RRD library are quite efficient and so will not, in themselves, add greatly to the system load.
The Topitall executable has three main functions. The data collection agent, topitall.pl -daemon, is designed to run permanently, gathering data averaged over 15 minutes. Basic system resource data is collected first; then, for each configured process or group of processes, the number of processes running and their cpu and memory usage are collected. The plotting tool, topitall.pl -plot, takes the data collected by the daemon, produces time series graphs in gif format and writes an html file. The time period plotted can be specified as a day, a week, a four-week month or a whole year. The plotting tool is a little cpu intensive as it produces graphs in gif format, but it can be scheduled to run as often or as rarely as you desire.
Named has stopped, check slave server for dns !
Automatic Mail from Topitall
The tiaAlert.cfg file is a bit more complicated, so you should see the comments in the file for a detailed description and examples, but I will give a summary here. Each parameter has a category (System, Disk or Process) and a keyword to identify it. Together, the category and the keyword identify the measured parameter. An expression is then defined, and an alert is generated whenever that expression is satisfied at a 15 minute interval. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is constructed from the expression as defined in the file and will be more or less in plain English. The alert subject line is also logged locally to the messages file using the local6.info syslog facility.
For example, this line will generate an alert when the Load Average is greater than 5 and send an email to the user jbelshaw@jcbhp with the comment in the body of the email:
System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy
And this line will generate an alert for the process httpd when the number of processes drops below 4:
httpd N <4 webmaster@localhost # Web server processes have stopped
Here are two more example lines:
netscape Mem >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/hda1 >=65 jbelshaw@jcbhp # Boot disk getting full
The processes httpd and netscape must have been included in the tiaProcess.cfg file for the process examples to work.
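To make the field layout of the example lines concrete, here is a minimal Python sketch that splits such a line into its parts and checks the expression against a measured value. It is not Topitall's own parser (Topitall is written in Perl). The names AlertRule, parse_rule and is_triggered are hypothetical, and the set of comparison operators is an assumption based only on the examples above; the comments in tiaAlert.cfg remain the authoritative description.

```python
import operator
import re
from typing import NamedTuple

# Comparison operators assumed to be valid in an alert expression.
# Only ">", "<" and ">=" appear in the examples; the rest are guesses.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


class AlertRule(NamedTuple):
    category: str     # "System", "Disk", or a process name such as "httpd"
    keyword: str      # e.g. "LoadAv", "/dev/hda1", "N", "Mem"
    op: str           # comparison operator from the expression
    threshold: float  # numeric limit from the expression
    email: str        # address the alert is sent to
    comment: str      # text placed in the body of the email


# category keyword expression email [# comment]
RULE_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+(>=|<=|==|>|<)\s*(\d+(?:\.\d+)?)\s+(\S+)\s*(?:#\s*(.*))?$"
)


def parse_rule(line: str) -> AlertRule:
    """Split one alert line into its fields, or raise ValueError."""
    match = RULE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse alert line: {line!r}")
    category, keyword, op, threshold, email, comment = match.groups()
    return AlertRule(category, keyword, op, float(threshold), email, comment or "")


def is_triggered(rule: AlertRule, value: float) -> bool:
    """Return True when the measured value satisfies the rule's expression."""
    return OPS[rule.op](value, rule.threshold)


if __name__ == "__main__":
    rule = parse_rule("httpd N <4 webmaster@localhost # Web server processes have stopped")
    # Three httpd processes found at this 15 minute interval: alert.
    print(rule.category, rule.keyword, is_triggered(rule, 3))
```

Keeping the parsed rule as a plain tuple means the same check can be applied to System, Disk and process parameters alike; only the lookup of the measured value differs.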
The prune.cfg file is only used in version 1.2. Its functionality has been replaced by RRD in version 2. It is designed to reduce the number of data points stored in each data file for data older than a specified number of days. If a parameter is not mentioned in the file then its data is never reduced. I suggest running prune manually when you wish to reduce the amount of data stored, then deciding on a prune policy, editing the prune.cfg file and putting an entry in the crontab to run it occasionally, say once a week.
Version 1.x is not designed to run for more than a year, so the data files will be overwritten.
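The idea of thinning out old data can be illustrated with a short Python sketch. This is not the prune script itself: the function name prune, the (timestamp, value) data layout and the choice to average old samples into one-hour buckets are all assumptions made for illustration, since the text above only says that the number of older data points is reduced.

```python
from statistics import mean


def prune(points, now, older_than_days, bucket_seconds=3600):
    """Thin out samples older than a cutoff.

    points: list of (timestamp_seconds, value) pairs.
    Samples older than `older_than_days` are replaced by one averaged
    sample per bucket (assumed policy); newer samples are kept as they are.
    """
    cutoff = now - older_than_days * 86400
    buckets = {}
    recent = []
    for ts, value in points:
        if ts < cutoff:
            # Group old samples by the start time of their bucket.
            buckets.setdefault(ts - ts % bucket_seconds, []).append(value)
        else:
            recent.append((ts, value))
    old = [(start, mean(values)) for start, values in sorted(buckets.items())]
    return old + recent


if __name__ == "__main__":
    # Two days of 15 minute samples; thin everything older than one day.
    samples = [(i * 900, float(i)) for i in range(192)]
    thinned = prune(samples, now=2 * 86400, older_than_days=1)
    print(len(samples), "->", len(thinned))
```

With 15 minute samples and one-hour buckets, each group of four old points collapses into one, which is the kind of reduction a weekly crontab entry would apply.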
Name | Description |
Mem | Percent Memory used [not very accurate] |
MemFree | Memory Free KB |
MemSwap | Swap Used KB |
Uptime | System uptime |
Users | Number of users |
LoadAv | 15 min Load average |
NProc | Total number of processes |
CPUuser | Percent CPU used by user processes |
CPUsystem | Percent CPU used by system or kernel |
CPUidle | Percent CPU not used |
Nrun | Number of running processes |
Nsleep | Number of sleeping processes |
Nswapped | Number of swapped processes |
SwapIn | KB/s swapped in |
SwapOut | KB/s swapped out |
Interrupts | Number of Interrupts/s including clock |
ContextSwitches | Number of context switches /s |
IOin | Block/s IO in |
IOout | Block/s IO out |
MemCache | Cache Used |
PageReclaims | |
MinorFaults | |
SwapFreed | |
MemShortfall | |
ScanRate | |
disk0 | |
disk1 | |
disk2 | |
disk3 | |
Name | Description |
Cpu | % CPU used |
Mem | % Memory used |
N | Number of processes matched |