Previously, we've considered spam suppression features in both sendmail and Postfix. These features can be very effective at blocking some spam before it ever enters your site. The procmail program, written by Stephen van den Berg, offers a different method for accomplishing this task; the package's homepage is http://www.procmail.org. procmail is actually a very powerful, general-purpose mail filtering facility, and its capabilities are not limited to removing spam. As the examples in this section show, it can also sort incoming mail into folders, forward selected messages to other users, send automatic replies, transform message contents, and scan messages for security problems.
procmail is the mail filtering tool of choice for many users on Unix systems. procmail can be applied to incoming mail in two main ways:[34] by using it as the local delivery agent (the program to which the transport agent hands off local messages for actual delivery), or by piping individual users' incoming mail to it, usually from the user's .forward file, as in this canonical example:
"|IFS=' ' && exec /usr/bin/procmail -Yf- || exit 75 #username" This example first sets the shell's interfield separator character to a space (see Chapter 3) and execs procmail, specifying -Y (assume BSD mailbox format) and -f- (which tells the program to update the timestamp in the leading From header). You may need to modify the path to one appropriate to your system. If you want to be extra cautious, you can use an entry like this one: "|IFS=' ' && p=/usr/bin/procmail && test -f $p && exec $p -Yf- || exit 75 #username" This version tests for the existence of the procmail executable before running it. The output is wrapped here, but it is a single line in the .forward file. In any case, if the procmail program fails, the process returns an exit code of 75. The final item is a shell comment, but it is required. As the procmail man page explains it, this item "is not actually a parameter that is required by procmail; in fact, it will be discarded by sh before procmail ever sees it; it is however a necessary kludge against overoptimizing sendmail programs." Whatever.
9.6.1 Configuring procmail

procmail gets its instructions about which mail filtering operations to perform from a configuration file. The systemwide configuration file is /etc/procmailrc, and the user-specific configuration file is ~/.procmailrc. The systemwide configuration file is also processed when individual users run procmail, unless the -p option is included or the configuration file to use is explicitly specified as the command's final argument.

NOTE
When procmail is being used only on a per-user basis, it is best to leave the global configuration file empty. Actions specified in the global configuration file are run in the root account context, and you have to set up this file very carefully in order to avoid security risks.

procmail examines each successive mail message it receives and applies the filters defined in the configuration file (known as "recipes") in turn. The first recipe that results in a destination or other disposition for the message causes all further processing to stop. If all of the recipes are applied without effect (in other words, if the message passes unaffected through all the filters), the mail is appended to the user's normal mailbox, which can be defined via the procmail DEFAULT variable.

procmail configuration file entries have this general format:

    :0 [flags]      Indicates the start of a new recipe.
    * condition     Zero or more lines of regular expressions.
    disposition     Destination/treatment of matching messages.

Let's begin with some simple examples:

    # Define variables
    PATH=/bin:/usr/bin:/usr/local/bin:$HOME/bin:/usr/sbin
    MAILDIR=$HOME/Mail
    DEFAULT=$MAILDIR/unseen

    # Discard message from this user.
    :0
    * ^From.*jerk@bad-guys.org
    /dev/null

    # Copy all mail messages into my message archive file.
    :0c:
    archive

The initial section of the configuration file defines some procmail variables: the search path, the mail directory, and the default destination for messages not redirected or discarded by any recipe. The first recipe filters out mail from user jerk at bad-guys.org by redirecting it to /dev/null. Note that the condition is a regular expression against which incoming message text (by default, the message headers) is matched. Contrary to expectations, however, pattern matching is not case-sensitive by default; the D flag makes a recipe's matching case-sensitive.
The second recipe unconditionally copies all incoming messages to the file ~/Mail/archive (relative pathnames are interpreted with respect to MAILDIR) while retaining the original message in the input stream. Since no condition is specified, all messages match and are processed by the recipe. Copying occurs because the c flag (clone the message) is included in the start line.

As this recipe indicates, the start line can include a variety of items. The 0 can be followed by one or more code letters (flags specifying message-handling variations), and the entire string can be followed by another colon, which causes procmail to use a lock file when processing a message with this recipe. The lock file prevents multiple procmail processes, each handling a different mail message (as happens when mail is arriving rapidly and the transport agent starts several deliveries at once), from trying to write to the same file simultaneously. The terminal colon can optionally be followed by a lock-file name. In most cases, the filename is left blank (as it was here), allowing procmail to generate the name itself.

If this were the entire .procmailrc configuration file, all messages not discarded by the first recipe would end up in the location specified by the DEFAULT variable: ~/Mail/unseen.

Similar recipes can be used to direct procmail to sort incoming mail into bins:

    # Set directory for relative pathnames
    MAILDIR=/home/aefrisch/Mail

    # Sort and transfer various types of messages
    :0:
    * ^From: (patti_singleton|craig_stone|todd_stone)@notaol\.org
    new-family

    :0c:
    * ^TO_help@zoas\.org
    support/archive

    :0:
    * ^TO_help@zoas\.org
    * ^Subject: Case.*[GVPM][0-9][0-9][0-9]+
    support/existing

    :0:
    * ^TO_help@zoas\.org
    support/incoming

The first recipe sends mail from various users at notaol.org to the indicated mail folder (they are some of my siblings).
The remaining three recipes copy all messages addressed to help into the file archive in the support subdirectory and then sort them into two other mail folders. The third recipe directs messages whose subject line begins with "Case" and contains one of the indicated letters followed by three or more consecutive digits into the file existing; all other messages go into the file incoming (both in my ~/Mail/support subdirectory). The ordering of configuration-file recipes can be important. For example, mail to help from one of my siblings will still go into the new-family file, not one of the ~/Mail/support files, because the first recipe matches it and ends processing.

The ^TO_ component used in some of the preceding recipes is actually a procmail keyword, and it causes the program to check all recipient-related headers for the specified pattern.

You can specify more than one condition by including multiple asterisk lines:

    # Define a FROM header set
    FROM="^(From[ ]|(Resent-)?(From|Reply-To|Sender):)"

    # Discard some junk
    :0H
    * $ ${FROM}.*@bad-guys\.org
    * ^Subject: .*what a deal
    /dev/null

    :0
    * ^Subject:.*last chance|\
    ^Subject:.*viagra|\
    ^Subject:.*\?\?
    /dev/null

The first recipe discards mail from anyone in the indicated domain that also contains the indicated string in the subject line. Note that conditions are joined with AND logic. If you want to use OR logic, you must construct a single condition using the regular expression | construct. The second recipe provides an example of doing so. Its search expression could be written more succinctly, but this way it is easier to read.

The first recipe also illustrates the use of configuration-file variables. We define one named FROM, which matches a variety of headers indicating a message's sender/origin (the square brackets contain a space and a tab character). The variable is then used in the first condition; the initial dollar sign is required to force variable expansion within the pattern.
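The evaluation rules described so far (conditions on separate lines are ANDed, matching is case-insensitive unless D is given, the c flag delivers a copy and continues, the first delivering recipe ends processing, and DEFAULT is the fallback) can be modeled in a few lines of Python. This is a simplified sketch, not procmail's implementation: Recipe and process() are hypothetical names, conditions are Python regular expressions searched against the header text, and ^(To|Cc): is a rough stand-in for the ^TO_ keyword, which checks more headers than this.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical stand-in for one procmail recipe."""
    destination: str
    conditions: list = field(default_factory=list)  # all must match (AND)
    clone: bool = False           # c flag: deliver a copy, keep processing
    case_sensitive: bool = False  # D flag


def process(header: str, recipes: list, default: str) -> list:
    """Return every destination the message is written to, in order."""
    delivered = []
    for recipe in recipes:
        flags = re.MULTILINE
        if not recipe.case_sensitive:
            flags |= re.IGNORECASE            # procmail's default
        # all() over an empty list is True: no conditions matches everything.
        if all(re.search(c, header, flags) for c in recipe.conditions):
            delivered.append(recipe.destination)
            if not recipe.clone:
                return delivered              # first delivering recipe wins
    delivered.append(default)                 # fell through: use DEFAULT
    return delivered


# The sorting recipes from above; "^(To|Cc):" is a simplified ^TO_ stand-in.
TO_HELP = r"^(To|Cc):.*help@zoas\.org"
RULES = [
    Recipe("new-family",
           [r"^From: (patti_singleton|craig_stone|todd_stone)@notaol\.org"]),
    Recipe("support/archive", [TO_HELP], clone=True),
    Recipe("support/existing",
           [TO_HELP, r"^Subject: Case.*[GVPM][0-9][0-9][0-9]+"]),
    Recipe("support/incoming", [TO_HELP]),
]

for header in ("From: todd_stone@notaol.org\nTo: help@zoas.org\n",
               "To: help@zoas.org\nSubject: case p1234 printer\n"):
    print(process(header, RULES, "unseen"))
```

The first message, from a sibling to help, lands only in new-family because that recipe comes first. The second is copied to support/archive and then filed in support/existing; note that the lowercase "case p1234" still satisfies the pattern, because without D even [GVPM] matches lowercase letters.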
9.6.1.1 Other procmail disposition options

You can also use a pipe as the destination by including a vertical bar as the first character in the line:

    # Run message (except from root and cron) through a script
    :0
    * !^From: (root|cron)
    | $HOME/bin/chomp_mail.pl

This recipe sends all mail not from root or cron (the exclamation mark indicates a negative test) to the indicated Perl script. We don't use procmail locking here; if the script does any writing to files, it will need to do its own locking (procmail locking is not recommended for this purpose).

NOTE
Be aware that procmail assumes, at a very deep level, that commands will be executed in the context of the Bourne (sh) shell. If your login shell is a C shell variant, place the following line at the top of your procmail configuration file to avoid unwanted weirdness:

    SHELL=/bin/sh

In these next examples, we forward mail to other users and generate and send a mail message within procmail recipes:

    # Distribute CCL mail list messages related to Gaussian
    :0
    * ^Subject: CCL:.*g(aussian|9)
    ! ccl_gauss,ccl_all

    # Distribute remaining CCL mailing list messages
    :0
    * ^Subject: CCL:
    ! ccl_all

    # Send rejection message to this guy
    :0
    * ^From:.*persistent@bad-guys\.org
    * !^X-Loop: chavez@ahania\.com
    | ( formail -r -a "X-Loop: chavez@ahania.com"; \
        echo "This is an auto-generated reply."; \
        echo "Mail rejected; it will never be read." ) \
      | sendmail -t -oi

The first recipe distributes selected items from a mailing list to a group of local users. Messages from the mailing list are identifiable by the beginning of their subject lines, and the recipe selects messages with either "gaussian" or "g9" anywhere in the rest of the subject line. The selected messages are forwarded to the two indicated local users, which are actually aliases expanding to lists of users. The second recipe sends all the remaining messages from the same list to the ccl_all alias. The users in this internal list want to receive the entire mailing list, and the combination of recipes 1 and 2 produces that result.

The final recipe sends a reply to any mail message from the specified user. It uses the formail utility, which is part of the procmail package. The formail -r command creates a reply to the mail message it receives as input, discarding the existing message headers and the message body. The new body text is created via the two echo commands that follow, and the completed message is piped to sendmail for submission to the mail facility.
sendmail's -t option tells the program to determine the recipients from the message headers, and -oi causes it not to treat a line containing a sole period as the end of input (only rarely needed, but traditionally included just to be safe). This recipe also illustrates a technique for avoiding mail loops with procmail. The formail command adds an X-Loop header to the outgoing mail message (via the -a option), and the recipe's conditions check for the presence of this header, skipping any message that contains it. In this way, the recipe prevents procmail from replying to its own generated message should it bounce. Table 9-11 lists some useful formail options.
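The loop-avoidance logic can be modeled with Python's standard email package. This is a sketch of what the recipe does, not a replacement for formail: auto_reject() is a hypothetical name, the reply address is approximated as Reply-To: or else From:, and the "Re: " subject prefix imitates the kind of reply header formail -r produces.

```python
from email import message_from_string
from email.message import EmailMessage

LOOP_TAG = "chavez@ahania.com"   # value used in the X-Loop: header


def auto_reject(raw: str):
    """Return a rejection reply, or None if the message carries our X-Loop tag."""
    original = message_from_string(raw)
    # Equivalent of the "* !^X-Loop: chavez@ahania\.com" condition.
    if any(v.strip() == LOOP_TAG for v in original.get_all("X-Loop", [])):
        return None
    reply = EmailMessage()
    reply["To"] = original.get("Reply-To") or original.get("From")  # like formail -r
    subject = original.get("Subject", "")
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject
    reply["X-Loop"] = LOOP_TAG                                     # like formail -a
    reply.set_content("This is an auto-generated reply.\n"
                      "Mail rejected; it will never be read.\n")
    return reply
```

If the generated reply bounces and comes back, it still carries the X-Loop tag, so auto_reject() returns None instead of answering it again.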
procmail recipes can also be used to transform incoming mail messages. Here is a nice example by Tony Nugent (slightly modified):

    # --- Strip out PGP stuff ---
    :0fBbw
    * (BEGIN|END) PGP (SIG(NATURE|NED MESSAGE)|PUBLIC KEY BLOCK)
    | sed -e 's+^- -+-+' \
          -e '/BEGIN PGP SIGNED MESSAGE/d' \
          -e '/BEGIN PGP SIGNATURE/,/END PGP SIGNATURE/d' \
          -e '/BEGIN PGP PUBLIC KEY BLOCK/,/END PGP PUBLIC KEY BLOCK/d'

    # Add (or replace) an X-PGP header
    :0Afhw
    | formail -I "X-PGP: PGP Signature stripped"

These recipes introduce several new procmail flags. The set in the first recipe, fBbw, tells procmail that the recipe is a filter (f), so messages continue to be processed by later configuration file entries after it completes; that the conditions are to be matched against the message body (B) rather than the headers, which are searched by default; that only the body is passed to the filter program (b); and that procmail should wait for the filter program to complete before proceeding to the next recipe in the configuration file (w). The sed command in the disposition edits the body: it replaces a hyphen, space, hyphen sequence at the beginning of a line with a single hyphen (undoing PGP's dash-escaping) and removes various PGP-related text, signature blocks, and public key blocks (accomplishing the last two operations by using sed's address-range deletion feature).

The next recipe will be applied only to messages that matched the conditions in the previous recipe (the A flag), operating as a filter (f) on the message headers only (h) and waiting for the filter program to complete before continuing with the remainder of the configuration file (w). The disposition causes the message to be piped to formail, where an X-PGP header is added to the message or an existing header of this type is replaced (-I option). Table 9-12 lists the most important procmail start-line flags.
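To see what the sed filter does to a clearsigned body, here is a line-by-line Python equivalent. It is a simplified sketch that assumes signature and key blocks are never nested inside one another; strip_pgp() is a hypothetical name. As in sed, the substitution is applied to each line before the deletion tests, so a dash-escaped line is un-escaped first.

```python
import re


def strip_pgp(body: str) -> str:
    """Approximate the four sed expressions from the PGP-stripping recipe."""
    out = []
    end_marker = None                       # set while inside a deleted range
    for line in body.splitlines(keepends=True):
        line = re.sub(r"^- -", "-", line)   # s+^- -+-+ (undo dash-escaping)
        if end_marker is not None:          # inside /BEGIN .../,/END .../d
            if end_marker in line:
                end_marker = None           # the END line is deleted too
            continue
        if "BEGIN PGP SIGNED MESSAGE" in line:
            continue                        # /BEGIN PGP SIGNED MESSAGE/d
        if "BEGIN PGP SIGNATURE" in line:
            end_marker = "END PGP SIGNATURE"
            continue
        if "BEGIN PGP PUBLIC KEY BLOCK" in line:
            end_marker = "END PGP PUBLIC KEY BLOCK"
            continue
        out.append(line)
    return "".join(out)
```

Like the sed command, this leaves the "Hash:" line that follows the BEGIN PGP SIGNED MESSAGE marker in place.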
9.6.1.2 Using procmail to discard spam

procmail can be very useful in identifying and removing spam messages. For it to be successful, you must be able to describe common patterns in the messages you want to treat as spam and write recipes accordingly. In this section, we will look at a variety of recipes that may be useful as starting points for dealing with spam. They happen to come from my own .procmailrc file, and so are applied only to my mail. As an administrator, you can choose to deal with spam at several levels: via the transport agent (e.g., checking against blacklists), at the system level, and/or on a per-user basis. In the case of procmail-based filtering, anti-spam recipes can be used in a systemwide procmailrc file or made available to users wanting to filter their own mail.

The following recipe is useful at the beginning of any procmail configuration file, because it puts mail headers into a predictable format:

    # Make sure there's a space after header names
    :0fwh
    | formail -z

The next two recipes provide simple examples of one approach to handling spam:

    # Mail from mailing lists I subscribe to
    :0:
    * ^From: RISKS List Owner|\
    ^From: Mark Russinovich
    to-read

    # Any other mail not addressed to me is spam
    # Warning: may discard BCC's to me
    :0
    * !To: .*aefrisch
    /dev/null

Spam is discarded by the second recipe, which defines spam as mail not addressed to me. The first recipe saves mail from a couple of specific senders to the file to-read. It serves to define exceptions to the second recipe, because it saves messages from these two senders regardless of whom they are addressed to. This recipe is included because I want to retain the mail from the mailing lists corresponding to these senders, but it does not arrive addressed to me. In fact, there are other recipes that fall between these two, because there are a lot of exceptions to be handled before I can discard every message not addressed to me.
Here are two of them:

    # Mail not addressed to me that I know I want
    :0:
    * !To: .*aefrisch
    * ^From: .*oreilly\.com|\
    ^From: .*marj@zoas\.org|\
    ^From: aefrisch
    $DEFAULT

    # Keep these just in case
    :0:
    * ^To: .*undisclosed.*recipients
    spam

The first recipe saves mail that is not addressed to me but was sent from the specified domain or by the remote user marj@zoas.org. I include this recipe because I receive mail from these sources which is not addressed to me and thus can resemble spam, because of the way their mailer programs handle personal mailing lists. I also retain messages from myself, which result from a CC or BCC on an outgoing message. The second recipe saves messages addressed to any variant of "Undisclosed Recipients" to a file called spam. Such mail is almost always spam, but once in a while I discover a new exception.

The next few recipes in my configuration file handle mail that is addressed to me but is still spam. This recipe discards mail with any of the specified strings anywhere in the message headers:

    # Vendors who won't let me unsubscribe
    :0H
    * cdw buyer|spiegel|ebizmart|bluefly gifts|examcram
    /dev/null

Such messages are spam sent by vendors from which I did once buy something and who ignore my requests to stop sending me email. The next two recipes identify other spam messages based on the Subject: header:

    # Assume screaming headers are spam
    :0D
    * ^Subject: [-A-Z0-9\?!._ ]*$
    /dev/null

    # More spam patterns
    :0
    * ^Subject: .*(\?\?|!!|\$\$|viagra|make.*money|out.*debt)
    /dev/null

The first recipe discards messages whose subjects consist entirely of uppercase letters, numbers, and a few other characters; its D flag makes the match case-sensitive, so that lowercase letters do not match [A-Z]. The second recipe discards messages whose subject lines contain two consecutive question marks, exclamation marks, or dollar signs, the word "viagra," "make" followed by "money," or "out" followed by "debt" (with any intervening text in the latter two cases).
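Patterns destined for /dev/null are worth trying out first. This Python check assumes Python's re behaves like procmail's matcher for these particular expressions; the only difference between the two compiled patterns is the IGNORECASE flag, which mirrors procmail's default, and its absence, which mirrors D. classify() is a hypothetical helper.

```python
import re

# :0D -> case-sensitive, so no IGNORECASE flag here.
SCREAMING = re.compile(r"^Subject: [-A-Z0-9\?!._ ]*$", re.MULTILINE)
# plain :0 -> procmail's default, case-insensitive matching.
SPAMMY = re.compile(r"^Subject: .*(\?\?|!!|\$\$|viagra|make.*money|out.*debt)",
                    re.MULTILINE | re.IGNORECASE)


def classify(header: str) -> str:
    """Report which of the two Subject: recipes would discard the message."""
    if SCREAMING.search(header):
        return "screaming"
    if SPAMMY.search(header):
        return "spam pattern"
    return "keep"


for subject in ("Subject: FREE OFFER!!!", "Subject: Get out of debt today",
                "Subject: Meeting notes", "Subject: "):
    print(repr(subject), "->", classify(subject))
```

The last test case shows a side effect worth knowing about: because the character class may match zero times, a message with an empty subject also counts as "screaming" and would be discarded.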
It is also possible to check mail senders against the contents of an external file containing spam addresses, partial addresses, or any other patterns to be matched:

    # Check my blacklist (a la Timo Salmi)
    :0
    * ? formail -x"From" -x"From:" -x"Sender:" \
    -x"X-Sender:" -x"Reply-To:" -x"Return-Path" -x"To:" | \
    egrep -i -f $HOME/.spammers
    /dev/null

This recipe is slightly simplified from one by Timo Salmi. The ? at the start of the condition tells procmail to run the command that follows and to treat a zero exit status as a match. The command uses formail to extract just the text from selected headers and pipes the resulting output into egrep, which takes the patterns to match from the file specified to its -f option (-i makes matches case-insensitive) and exits with status 0 if any pattern matches.

My spam identification techniques are very simple and therefore quite aggressive. Some circumstances call for more restraint than I am inclined to use. There are several ways of tempering such a drastic approach. The most obvious is to save spam messages to a file rather than simply discarding them. Another is to write more detailed and nuanced recipes for identifying spam. Here is an example:

    # Discard if From:=To:
    SENTBY=`formail -z -x"From:"`

    :0
    * ! ^To: aefrisch
    * $ ^To: .*$SENTBY
    /dev/null

This recipe discards messages where the sender and recipient addresses are the same (a classic spam characteristic) and differ from my address. The contents of the From: header are extracted into the SENTBY variable via the backquoted formail command. This variable is used in the second condition, which examines the To: header for the same string; the $ modifier at the start of that condition makes procmail expand the variable before matching. More complex versions of such a test are also possible (e.g., one could examine headers other than just From:). There are also myriad spam recipes that other people have created and made available on the web.

9.6.1.3 Using procmail for security scanning

procmail's pattern-matching and message-disposition features can also be used to scan incoming mail messages for security purposes: for viruses, unsafe macros, and so on.
You can create your own recipes to do so, or you can take advantage of the ones that other people have written and generously made available. In this brief section, we will look at Bjarni Einarsson's Anomy Sanitizer (see http://mailtools.anomy.net/sanitizer.html). This package is written in Perl and requires a basic knowledge of Perl regular expressions to configure.[36] Once configured, you can run the program via procmail using a recipe like this one:
    :0fw
    | /usr/local/bin/sanitizer.pl /etc/sanitizer.cfg

This recipe uses the sanitizer.pl script as a filter on all messages (run synchronously), using the configuration file given as the script's argument.

The package's configuration file, conventionally /etc/sanitizer.cfg, contains two types of entries: general parameters indicating desired features and program behavior, and definitions of file/attachment types and the way they should be examined and modified. Here are some examples of the first sort of configuration file entries:

    # Global parameters
    feat_log_inline = 1     # Append log to modified messages.
    feat_log_stderr = 0     # Don't log to standard error also.
    feat_verbose = 0        # Keep logging brief.
    feat_scripts = 1        # Sanitize incoming shell scripts.
    feat_html = 1           # Sanitize active HTML content.
    feat_forwards = 1       # Sanitize forwarded messages.

    # Template for saved file names
    file_name_tpl = /var/quarantine/saved-$F-$T.$$

The first group of entries specifies various aspects of sanitizer.pl's behavior, including the level of detail and destinations for its log messages, as well as whether certain types of message content should be "sanitized": examined and potentially transformed to avoid security problems. The final entry specifies the location of the package's quarantine area: the directory where potentially dangerous parts of mail messages are stored after being removed.

The next set of entries enables scanning based on file/attachment extension, specifies the number of groups of extensions that will be defined, and sets default actions for all other types:

    feat_files = 1                         # Use type-based scanning.
    file_list_rules = 3                    # We will define 3 groups.
    # Set defaults for all other types
    file_default_policy = defang           # Rewrite risky constructs.
    file_default_filename = unnamed.file   # Use if no file name given.

A sanitizer policy indicates how a mail part/attachment will be treated when it is encountered. The policies used in this section are accept (pass the part through unchanged), defang (rewrite risky constructs), and save (remove the part from the message and store it in the quarantine area).
We'll now turn to some example file-type definitions. This set of entries defines the first file type as the filename winmail.dat (the composite mail message and attachment archive generated by some Microsoft mailers) and all files with extensions .exe, .vbs, .vbe, .com, .chm, .bat, .sys, or .scr:

    # Always quarantine these file types
    file_list_1_scanner = 0
    file_list_1_policy = save
    file_list_1 = (?i)(winmail\.dat
    file_list_1 += |\.(exe|vb[es]|c(om|hm)|bat|s(ys|cr))*)$

Notice that the file_list_1 parameter defines the list of filenames and extensions using Perl regular expression syntax, and that += appends further text to a parameter's value. The policy for this group of files is save, meaning that files of these types are always removed from the mail message and saved to the quarantine area. The attachment is replaced by some explanatory text within the modified mail message:

    NOTE: An attachment was deleted from this part of the message,
    because it failed one or more checks by the virus scanning system.
    The file has been quarantined on the mail server, with the following
    file name:

        saved-Putty.exe-3af65504.4R

This message is a bit inaccurate, since in this case the attachment was not actually scanned for viruses but merely identified by its file type, but the information that the user will need is included.

NOTE
Clearly, it will be necessary to inform users about any attachment removal and/or scanning policies that you institute. It will also be helpful to provide them with alternative methods for receiving files of prohibited types that they may actually need. For example, they can be taught to send and receive word-processing documents as Rich Text Format files rather than, say, Word documents.

Here are two more examples of file group definitions:

    # Allow these file types through: images, music, sound, etc.
    file_list_2_scanner = 0
    file_list_2_policy = accept
    file_list_2 = (?i)\.(jpe?g|pn[mg]
    file_list_2 += |x[pb]m|dvi|e?ps|p(df|cx)|bmp
    file_list_2 += |mp[32]|wav|au|ram?
    file_list_2 += |avi|mov|mpe?g)*$

    # Scan these file types for macros, viruses
    file_list_3_scanner = 0:1:2:builtin 25
    file_list_3_policy = accept:save:save:defang
    file_list_3 = (?i)\.(xls|d(at|oc|ot)|p(pt|l)|rtf
    file_list_3 += |ar[cj]|lha|[tr]ar|rpm|deb|slp|tgz
    file_list_3 += |(\.g?z|\.bz\d?))*$

The first section of entries defines some file types that can be passed through unexamined (via the accept policy). The second group defines some extensions for which we want to perform explicit content scanning for dangerous items, including viruses and embedded macros in Microsoft documents. The file_list_3 extension list includes extensions corresponding to various Microsoft document and template files (e.g., .doc, .xls, .dot, .ppt, and so on) and a variety of popular archive extensions. The scanner and policy parameters for this file group now contain four entries. The file_list_3_scanner parameter's four colon-separated subfields define four sets of return values for the specified scanning program: the values 0, 1, and 2, and all other return values resulting from running the builtin program.
The final subfield specifies the program to run (here, a keyword requesting sanitizer.pl's built-in scanning routines with the argument 25) and serves as a placeholder for all other possible return values that are not explicitly named in earlier subfields (each subfield can hold a single return value or a comma-separated list of them). The subfields of the file_list_3_policy parameter define the policy to be applied when each return value is received. In this case, a return value of 0 causes the attachment to be accepted, a return value of 1 or 2 causes it to be saved to the quarantine area, and any other return value causes it to be defanged.
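The way the scanner and policy subfields line up can be expressed as a small lookup. This Python sketch follows the description above (earlier subfields list explicit return values; the last one names the scanning program and catches everything else); policy_for() is a hypothetical name and is not part of sanitizer.pl.

```python
def policy_for(return_value: int, scanner: str, policy: str) -> str:
    """Map a scanner return value to a policy, e.g. for file_list_3."""
    value_fields = scanner.split(":")
    policies = policy.split(":")
    if len(value_fields) != len(policies):
        raise ValueError("scanner and policy need the same number of subfields")
    # Every subfield but the last lists explicit values ("1" or "1,2").
    for values, action in zip(value_fields[:-1], policies[:-1]):
        if str(return_value) in [v.strip() for v in values.split(",")]:
            return action
    # The last subfield names the program and covers all other values.
    return policies[-1]


SCANNER = "0:1:2:builtin 25"
POLICY = "accept:save:save:defang"
for rv in (0, 1, 2, 3):
    print(rv, policy_for(rv, SCANNER, POLICY))
```

With the file_list_3 settings this yields accept for 0, save for 1 and 2, and defang for anything else.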
By default, the sanitizer.pl script checks macros in Microsoft documents for dangerous operations (e.g., attempting to modify the system registry or the Normal template). However, I want to be more conservative and quarantine all documents containing any macros. To do so, I must modify the script's source code. Here is a quick and dirty solution to my problem, which consists of adding a single line to the script:

    # Lots of while loops here - we replace the leading \000 boundary
    # with 'x' characters to ensure this eventually completes.
    #
    $score += 99 while ($buff =~ s/\000(Macro recorded)/x$1/i);
    $score += 99 while ($buff =~ s/\000(VirusProtection)/x$1/i);

The first $score line (the one matching "Macro recorded") is the added one. It detects macros within the document that have been recorded by the user. The solution is not an ideal one, because there are other methods of creating macros which would not be detected by this string, but it illustrates what is involved in extending this script, if needed.

9.6.1.4 Debugging procmail

Setting up procmail configuration files can be both addictive and time-consuming. To make debugging easier, procmail provides some logging capabilities, specified with these configuration file entries:

    LOGFILE=path
    LOGABSTRACT=all

These variables set the path to the log file and specify that a summary of every successful delivery be logged. If you would like even more information, including a recipe-by-recipe summary for each incoming message, add this entry as well:

    VERBOSE=yes

Here are some additional hints for debugging procmail recipes:
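Once LOGABSTRACT is enabled, a quick tally of where messages actually went can reveal a misbehaving recipe. The Python sketch below counts deliveries per destination, assuming the abstract layout procmail usually writes (a From line, an indented Subject: line, and an indented Folder: line ending with the message size in bytes); folder_counts() is a hypothetical helper, and the pattern may need adjusting for your log, for example if folder names contain spaces.

```python
import re
from collections import Counter

# "  Folder: <destination>   <size>" lines from a LOGABSTRACT-style log.
FOLDER_LINE = re.compile(r"^[ \t]+Folder:[ \t]+(\S+)[ \t]+\d+[ \t]*$",
                         re.MULTILINE)


def folder_counts(log_text: str) -> Counter:
    """Count how many logged deliveries went to each destination."""
    return Counter(FOLDER_LINE.findall(log_text))
```

Calling folder_counts(open(logfile).read()).most_common() after a day of mail shows at a glance whether, say, /dev/null is receiving far more than you expected.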
You can also run the sanitizer.pl script to test your configuration with a command like this one:

    # cat mail-file | /path/sanitizer.pl config-file

You will also want to include this line within the configuration file:

    feat_verbose = 1     # Produce maximum detail in log messages.

9.6.1.5 Additional information

Here are some other useful procmail-related web pages: