16.2 Collecting and Analyzing Qmail Logs with Qmailanalog

The qmailanalog package extracts statistics from the logs created by qmail-send. It consists of matchup, which preprocesses the qmail logs; some scripts such as zoverall and zddist, which collect and print statistics; a second set of scripts, such as xsender, for picking out subsets of messages to analyze; and a few other auxiliary programs and scripts. The only C programs are matchup and columnt, an auxiliary program that neatens up the columns in the reports. Everything else in the package is short awk or shell scripts that are not hard to edit.

Using qmailanalog is more painful than it should be because it expects its input log files to use an older decimal timestamp format used by the now obsolete splogger and accustamp rather than the TAI64N format used by multilog. I have a patch to matchup to translate TAI64N to the older format as the logs are read at http://www.iecc.com/qmailanalog-date-patch. The rest of the discussion here assumes that matchup has that patch. To build the qmailanalog package, download the current version (0.70 as of this writing) from http://cr.yp.to, download and apply the patch, do the usual make, then become super-user and make setup check. Normally qmailanalog installs itself in /usr/local/qmailanalog. To change the installation directory, edit conf-home. The setup instructions advise against installing the programs in /usr/local/bin because some of the names may collide with other unrelated programs.

To use qmailanalog, first you pass the raw logs through matchup to create a condensed file with one line per message and one line per delivery. Then the analysis scripts read the condensed files and produce reports. matchup writes both the condensed file and a second file listing messages that haven't been completely processed. The next time matchup runs, it needs that second file to pick up where it left off. The condensed file is written to standard output, and the second file to file descriptor 5.

16.2.1 Log Analysis at Rotation Time

The condensed files produced by matchup are about half the size of the raw qmail logs and matchup is fairly fast, so it makes sense to call matchup from multilog to create the condensed logs each time it switches log files, as shown in Example 16-1.

Example 16-1. Qmail log run with analysis

 1. #!/bin/sh  2.   exec setuidgid qmaill \  3.   multilog t s4000000 \  4.   !'cat /dev/fd/4 - | /usr/local/qmailanalog/bin/matchup' \  5.   ./main

Line 4 in this modified run file creates the condensed logs, using the short quoted shell script as the log processor. Because matchup was written before multilog, their file descriptor conventions almost, but not quite, agree with each other. When multilog runs the log processor, it opens the existing log file as standard input and as a file of saved data from the previous run on file descriptor 4. The standard output is saved as the old log file, named with the current TAI64N timestamp, and any output on file descriptor 5 is stored away for the next time the processor runs. Although matchup does write information about partially processed messages to file descriptor 5, the next run reads that information from the previous run from the standard input along with the next log file. Hence use cat /dev/fd/4 - to read from the two file decriptors and pipe it all to matchup. The result of all of this is a set of condensed log files in /service/qmail-send/log/main.

16.2.2 Log Analysis Once a Day

It's equally possible to do the log analysis once a day from cron or /etc/daily. If the original logs have to be saved, you should do all of the processing at once. In this case, be sure that the file rotation options for multilog make the log files it creates large enough and that it saves enough of them to keep a full day's log files. I use "s 4000000" so that each log file is up to four megabytes or a total of 40 MB before multilog starts overwriting them.

Example 16-2, to be run once a day, saves the logs as qmail-send.yyyymmdd and a preprocessed version as qmail-summary.yyyymmdd.

Example 16-2. Daily log save and analyze

 1.  #!/bin/sh  2.  cd /var/log                     # or wherever logs are archived  3.  a=$(date +'%Y%m%d')             # yyyymmdd  4.  svc -a /service/qmail-send/log  # force log rotation  5.  sleep 5                         # give rotation a moment to happen  6.  cat /service/qmail-send/log/main/@* > qmail-send.$a  7.  cat qmail-send.yesterday qmail-send.$a | \  8.   /usr/local/qmailanalog/bin/matchup > qmail-summary.$a 5>qmail-send.tomorrow  9.  mv qmail-send.tomorrow qmail-send.yesterday 10.  gzip qmail-send.$a  qmail-summary.$a  # log files are big, save space

The file of incompletely processed deliveries is saved to qmail-send.tomorrow, then it is renamed to qmail-send.yesterday for the next run. This creates a new pair of log files every day, so you need some provision for deleting old logs now, e.g., to delete logs over a month old:

$ find . \( -name qmail-send.[0-9]* -name qmail-summary.[0-9]* \) \     -mtime +30 -exec rm {  } \;

16.2.3 Getting Statistics with Qmailanalog

Once the summary files are created, you can run the various summary scripts, all of which have names starting with z, to get mail system statistics. All of the scripts read the summary file from their standard input:

# Summaries created at rotation time log directory $ cat /service/qmail-send/log/main/@* |     /usr/local/qmailanalog/bin/zoverall # Daily summaries in /var/log $ gzcat /var/log/qmail-summary.yyyymmdd.gz |     /usr/local/qmailanalog/bin/zoverall

The most useful report is zoverall, which as its name suggests produces overall statistics. Example 16-3 is from my main mail server, which hosts some mailing lists, a few dozen personal mailboxes, and the abuse.net message forwarding service.

Example 16-3. A zoverall report

Basic statistics qtime is the time spent by a message in the queue. ddelay is the latency for a successful delivery to one recipient---the end of successful delivery, minus the time when the message was queued. xdelay is the latency for a delivery attempt---the time when the attempt finished, minus the time when it started. The average concurrency is the total xdelay for all deliveries divided by the time span; this is a good measure of how busy the mailer is. Completed messages: 56013 Recipients for completed messages: 65158 Total delivery attempts for completed messages: 66940 Average delivery attempts per completed message: 1.19508 Bytes in completed messages: 309400658 Bytes weighted by success: 349381796 Average message qtime (s): 31.3781 Total delivery attempts: 75035   success: 66080   failure: 974   deferral: 7981 Total ddelay (s): 2353455.027418 Average ddelay per success (s): 35.615240 Total xdelay (s): 437123.420922 Average xdelay per delivery attempt (s): 5.825594 Time span (days): 0.631722 Average concurrency: 8.00874

In this case, the summary file covers about 15 hours (0.631 days), long enough to be interesting but perhaps not typical of a full 24-hour period. The system is moderately busy with an average of eight messages in transit at once, and the average message dealt with in 31 seconds and the average delivery taking about 35 seconds. (This is unusually slow, probably because abuse.net sends messages to very overloaded recipient hosts via long international links.) This system is configured to permit 110 remote deliveries at a time, but the average concurrency is only 8, so increasing the maximum probably wouldn't make much difference.

Another useful summary is zsuids, summary by numeric sender ID, as shown in Example 16-4.

Example 16-4. Log summary by sender user IDs

One line per sender uid. Information on each line: * mess is the number of messages sent by this uid. * bytes is the number of bytes sent by this uid. * sbytes is the number of bytes successfully received from this uid. * rbytes is the number of bytes from this uid, weighted by recipient. * recips is the number of recipients (success plus failure). * tries is the number of delivery attempts (success, failure, deferral). * xdelay is the total xdelay incurred by this uid.  mess      bytes     sbytes     rbytes  recips  tries         xdelay  uid    21      27319      27319      27319      21     21       3.736360  0    13      25340      25340      25340      13     13       9.240442  9  1608    6597028   30396182   31342313    6143   7233  129119.309333  85 27052  156054190  167640496  168084828   30104  30392   53368.913426  120   113    1552110    1552110    1552110     113    119     120.743584  121   838    7325053    7256425    7325053     838    900    2406.317767  124     4      18179      18179      18179       4      4      64.632176  130   233    1113023    1208518    1210043     299    308    3266.386563  143  1955    6336818    8803216    8968141    2714   2758   20493.535539  162  1028    6983187    6985508    7248128    1120   1155    4744.157863  166 23060  123111117  125207318  129722143   23698  23945   87628.298410  170    24     224755     224755     224755      24     24      11.962693  172    64      32539      36430      36430      67     68     327.511405  32767

In this system, user 85 is majordomo, so most of its messages are to mailing lists. (Note that it sent 1608 messages to 6143 recipients, an average of almost four recipients per message, which is very high.) User 120 is qmaild, which is considered responsible for all mail arriving via SMTP. User 162 is the spam trap, sending out many semiautomatic abuse reports; user 166 is the POP toaster for the individual mail users; and 170 is abuse.net, forwarding third-party messages. It's easy to see that abuse.net accounts for the largest part of the mail traffic, followed by majordomo, the POP toaster, and the spam trap with about equal traffic.

Another level of filters makes it possible to look at just mail to or from a particular address. Use xsender to pick out just mail sent from a particular address:

# assume /usr/local/qmailanalog/bin is in $PATH $ cat qmail-summary | xsender fred@example.com  | zoverall

Use xrecipient to pick out mail just to a particular address. The summary file prefixes each address by local. or remote., depending on whether deliveries are local or remote, and virtual addresses are expanded out to the full local address:

# assume /usr/local/qmailanalog/bin is in $PATH # local user $ cat qmail-summary | xrecipient local.fred@example.com  | zoverall # virtual user $ cat qmail-summary | xrecipient local.myvirt-fred@myvirt.com  | zoverall # remote user $ cat qmail-summary | xrecipient remote.fred@domain.com  | zoverall

There's also xqp to pick out particular message numbers, which I don't find very useful.

The full set of analysis programs includes:

zddist: Reports a percentage distribution showing how long mail deliveries take. If 90% of deliveries aren't done within a few seconds, there's probably a network problem.
zdeferrals: Reports delivery deferrals with reasons and can be useful if there are particularly recalcitrant remote hosts.
zfailures: Reports delivery failures with reasons, if you want to see all the bounce messages.
zoverall: Overall summary.
zrecipients: Summarizes all deliveries by recipient, with message counts and sizes.
zrhosts: Summarizes remote deliveries by recipient hosts, with counts and sizes.
zrxdelay: Summarizes deliveries by recipients, sorted by how fast mail to them is delivered.
zsenders: Summarizes messages by envelope sender, with counts and sizes.
zsendmail: A log report that is similar to sendmail's log report, for people who like that sort of thing.
zsuccesses: All successful deliveries, with the log messages and delivery delays.
zsuids: Summarize by numeric user ID.

It can be enlightening to run these programs from time to time to see if there are senders or recipients with inexplicably large amounts of mail, remote hosts that consistently reject large amounts of mail, or other anomalies.