Spam mail: you know it when you see it, and everyone has seen the ridiculous advertisements for investment schemes, pornography, herbal cures, and everything else imaginable. Many experts categorize all Unsolicited Commercial Email (UCE) as spam, and certainly UCE clutters electronic mailboxes around the world. However, legitimate advertisers who offer real "opt-out" schemes are not the worst problem. [1] The real problems come from spammers. What distinguishes spammers from legitimate advertisers is that spammers hide their identity and abuse other people's systems. These traits can be seen by the fact that:
You must do your part to reduce spam for everyone by ensuring that your system is not misused by remote spammers and that your local users do not contribute to spam. Spammers use open mail relays to hide their identity. See the recipes in Chapter 3 for examples of how to configure a mail relay host. Test your relay as shown in Chapter 3 every time the configuration is changed to ensure that it can't be abused. The first step in addressing local spammers is to create an acceptable use policy (AUP) that forbids spamming and defines the actions you will take to stop it. Make sure that the policy clearly states that email is not private and is subject to logging, scanning, and filtering. Because this is a policy, it must be approved and issued by management, and it must be reviewed and approved by the legal department. Policies are always a hassle because they are not technical solutions and because they involve management. However, it is a necessary step to give you the authority you need to analyze mail. Once the policy is in place, all users must agree to it as a condition of obtaining a user account. In addition to ensuring that your system doesn't contribute to the spam problem, there are technical tools that you can use to actively fight spam. sendmail provides a number of tools to combat incoming junk mail:
The access database

The access database provides fine-grained control over mail relaying and mail delivery. Use the access_db feature to add the access database to the sendmail configuration:

FEATURE(`access_db')

By default, the access database is a hash database, and its default pathname is /etc/mail/access.db. Use the optional argument field with the access_db feature to change the default values, as in this example:

FEATURE(`access_db', `btree -T<TMPF> /var/mail/access')

This example changes the database type, specifies the -T<TMPF> option to deal with temporary lookup failures, and changes the database pathname. Only change the defaults when it is truly necessary.

Each line in the access database contains two fields: the key field and the return value. When the access database is used to control junk mail, the key to the database is an address, and the return value specifies the action that sendmail should take with regard to mail to or from the specified address. When the access database was used in Chapter 3, the return value was, logically enough, the keyword RELAY. To control spam, the recipes in this chapter use the access database to determine whether mail should be accepted from or delivered to specified addresses. For that, we need a different set of return values. The keywords that apply to controlling junk mail are listed in Table 6-1.

Table 6-1. access database keywords used for spam control
The OK command tells sendmail to accept mail from the source identified by the address in the key field regardless of other conditions. For example, if the hostname specified in the address field cannot be resolved by DNS, sendmail accepts mail from that host even if the accept_unresolvable_domains feature is not enabled. OK accepts mail for local delivery; OK does not grant relaying privileges. The RELAY keyword, described in Chapter 3, is required for that.

The REJECT keyword returns a standard error message to the source and rejects the mail. The DISCARD action drops the mail without sending an error message back to the source. Many anti-spam authorities disagree with silently discarding mail because they feel it does not discourage the spammer. For all the spammer knows, you received the mail, so he just keeps sending more junk. Other anti-spam authorities prefer to silently discard mail because they believe that responding to the mail in any form verifies the address for the spammer, which encourages the spammer to continue his assault. Both approaches protect your users from the spammer. However, using REJECT, which returns an error message to the source of the mail, helps when legitimate mail is accidentally classified as spam because it notifies the source of the mail that their mail was rejected.

The REJECT action sends a default error message. Use the ERROR keyword to reject a message with your own custom error message. For example:

example.com ERROR:5.7.1:550 Relaying denied to spammers

In this case, the error message returned to the sender is "Relaying denied to spammers." This error message includes delivery status notification code 5.7.1 and the SMTP error code 550. Use a valid DSN code from RFC 1893 that is compatible with the RFC 821 error code and the message. [2] The format for the error message shown in Table 6-1 is ERROR:dsn:code text. This format is recommended, but not required.
It could be specified without the keyword ERROR or the DSN code. However, that old error format has been deprecated. Use the ERROR keyword and the DSN code to ensure compatibility with future sendmail releases.
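To make the recommended ERROR:dsn:code text layout concrete, here is a minimal Python sketch that splits an access database return value into its parts. It is only an illustration of the format described above, not sendmail's own parser; the names AccessAction, parse_access_value, and codes_agree are hypothetical, and the "compatibility" test assumes that a DSN code and an SMTP code agree when they share the same leading digit (5 for permanent failures, 4 for temporary ones).

```python
from typing import NamedTuple, Optional


class AccessAction(NamedTuple):
    """Parts of an access database return value (illustrative only)."""
    keyword: str
    dsn: Optional[str]
    smtp_code: Optional[str]
    text: Optional[str]


def parse_access_value(value: str) -> AccessAction:
    """Split a return value written as ERROR:dsn:code text.

    Any other value (OK, REJECT, DISCARD, ...) is returned as a bare keyword.
    The deprecated forms without ERROR or without a DSN code are not handled.
    """
    value = value.strip()
    if not value.upper().startswith("ERROR:"):
        return AccessAction(value, None, None, None)
    rest = value[len("ERROR:"):]
    dsn, _, remainder = rest.partition(":")
    smtp_code, _, text = remainder.strip().partition(" ")
    return AccessAction("ERROR", dsn.strip(), smtp_code, text.strip())


def codes_agree(action: AccessAction) -> bool:
    """Assumed rule of thumb: DSN class digit equals first SMTP code digit."""
    if action.dsn is None or action.smtp_code is None:
        return True  # nothing to compare for plain keywords
    return action.dsn[:1] == action.smtp_code[:1]


if __name__ == "__main__":
    action = parse_access_value("ERROR:5.7.1:550 Relaying denied to spammers")
    print(action, codes_agree(action))
```

A check like codes_agree could be run over a draft access file before the database is rebuilt, to catch an entry that pairs a 4.x.x DSN code with a 5xx SMTP code.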
The FRIEND and HATER keywords only apply if the delay_checks feature is used to control when the check_mail and check_relay rulesets are applied. The FRIEND keyword in the return value allows mail that would normally be discarded by check_mail and check_relay to pass through the system to the recipient specified in the key field. When HATER is used, check_mail and check_relay are only applied to mail addressed to recipients who have the HATER keyword in their access database entries. FRIEND and HATER cannot appear in the same access database because the delay_checks feature must be configured to accept either FRIEND or HATER; it cannot be configured to accept both at the same time. Recipe 6.13 provides an example of how these keywords are used.

Most of the actions in Table 6-1 are described as affecting mail "from or to" an address. This is only true when tags are used or the blacklist_recipients feature is used. When that feature is not used, the actions only affect mail coming from a source address, unless the address field is modified by an optional To:, From:, or Connect: tag. These tags limit the address test to the envelope recipient, envelope sender, and connection address, respectively. [3] For example, an access database entry to reject connections from 10.0.187.215 might contain:

Connect:10.0.187.215 ERROR:5.7.1:550 UCE not accepted

The Connect: tag limits the match to the address of the remote system that connected to the server to deliver the mail. If that address is 10.0.187.215, the mail is rejected and the message "UCE not accepted" is returned to the sender.

The address in the key field of an access database entry can define a user, an individual email address, a source IP address, a network address, or the name of a domain:
In addition to the Connect: entry shown above, other address formats can be used in the access database to identify mail from 10.0.187.215. Here are a few:

10.0.187 REJECT
[10.0.187.215] DISCARD
example.com ERROR:5.7.1:550 Mail not accepted.

The first two lines in this access database match IP addresses. The first entry rejects mail from any computer whose IP address begins with the network number 10.0.187, which is the network from which the UCE was received. The second line defines a specific computer with the address 10.0.187.215. The square brackets surrounding the individual address indicate that this IP address doesn't resolve to a hostname. The last entry defines an entire domain. It rejects mail from any host in the domain example.com. Of course, you wouldn't do this if you got only one piece of UCE from that domain, but if you consistently received junk mail from the domain, you may decide to block all mail from that domain until they improved their security. The dnsbl feature provides another method for blocking mail from specific hosts and domains.

Blackhole lists with dnsbl and enhdnsbl

Add the dnsbl feature to the sendmail configuration to use a blackhole list to block spam. A blackhole list is a DNS database that identifies spam contributors. The contributors can be the original source of the spam or open mail relays that permit spammers to relay mail. Blackhole list services are implemented through DNS. Every Unix system can issue DNS queries, so this is a very effective way to distribute information. Of course, a program can only make use of the information if it understands it, which sendmail does.

The dnsbl feature accepts two optional arguments. The first argument is the name of the domain that contains the blackhole list. This defaults to the Realtime Blackhole List (RBL) that is maintained by the Mail Abuse Prevention System (MAPS). There are several other groups that maintain lists and make them available to the public. Table 6-2 lists a few of these.

Table 6-2. Blackhole list services
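Because a blackhole list is published through DNS, the lookup itself is easy to picture. The following Python sketch assumes the usual DNS blackhole list convention: the octets of the connecting IPv4 address are reversed and prepended to the list's domain, and an answer of any kind means the address is listed. The function names are hypothetical, and this is a conceptual illustration rather than a description of sendmail's internal code.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up for client_ip in the blackhole zone."""
    # IPv4Address validates the input and normalizes the dotted quad.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the zone answers for the reversed address.

    Every lookup failure, including a temporary one, is treated as
    "not listed" in this sketch.
    """
    try:
        resolve(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Prints the name that would be queried; no network access is needed.
    print(dnsbl_query_name("10.0.187.215", "sbl.spamhaus.org"))
```

The resolve parameter exists only so the logic can be exercised with a stand-in resolver instead of a live DNS query.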
To use a blackhole list specified in Table 6-2, point the first argument of the dnsbl feature to the domain listed in the table. For example, the following command would configure sendmail to use the Spamhaus Block List:

FEATURE(`dnsbl', `sbl.spamhaus.org')

The policies enforced by the different blackhole list services vary. Most of these services focus on blocking open relays instead of focusing on spam sources. The reason for this focus is that spam sources are constantly changing and hiding their true identities. They are aided in this, albeit unwittingly, by open relays. The open relay doesn't really want to help the spammer. Blocking the open relay's mail quickly gets the attention of the administrator of that relay, who fixes the system, and thus denies the spammer access to resources. This indirect defense against spam impacts many innocent, if naive, network users. For this reason, many consider blackhole lists to be a cure that is as bad as the disease, and they discourage the use of blackhole lists.

There are many systems listed in the public blackhole lists. Any site that relays spam (which could be your site if you don't properly configure sendmail) is likely to be blackhole listed. This is one of the reasons that it is essential to configure relaying properly. A mistake in configuring relaying could get your site added to the blackhole list. If a site stops relaying spam, it should be removed from the list after about a month. Of course, this policy varies from list to list, as does the efficiency by which sites are added to and removed from the lists. If your site gets added to a blackhole list, fix the problem and apply to have your site removed from the list by following the instructions on the list's web site. Visit the web site of each listing service to find out more about the list before you start using it.

The simplest way to block spam is to let someone else do it. However, while using a public blackhole list is simple, it isn't perfect.
You can't choose which sites are added to the list, which means the list might block email from a friendly site just because the administrator at that site forgot to turn off relaying. You can, however, override entries in a blackhole list using the access database, as described in Recipe 6.4. For even more control, some organizations decide to build their own DNS-based blackhole list. Recipe 6.5 shows how you build and invoke your own blackhole list.

The second argument available for the dnsbl feature is the error message displayed when mail is rejected because of the blackhole server. The format of the default message is:

550 Rejected $&{client_addr} listed at dnsbl-domain

where $&{client_addr} is the IP address that was rejected, and dnsbl-domain is the DNS blackhole list that rejected the address (i.e., dnsbl-domain is the value from the first argument provided to the dnsbl feature). Use the second dnsbl argument only if you wish to change the standard error message. Most administrators stick with the default message.

The third argument available for the dnsbl feature allows you to specify how temporary DNS lookup failures should be handled. By default, sendmail does not defer the message just because the blackhole list service is not able to respond to a DNS lookup. Placing a t in the third argument field causes sendmail to return a temporary error message and defer the message. Here is an example:

FEATURE(`dnsbl', `sbl.spamhaus.org', ,`t')

An alternative to the dnsbl feature is the enhdnsbl feature. The syntax of the enhdnsbl feature has the same three initial arguments as the dnsbl feature, but it adds a fourth argument. The fourth argument is the return value that sendmail expects from the DNS lookup. By default, any DNS lookup value returned by a blackhole list service indicates that the address being looked up is listed by the service and should therefore be rejected.
The fourth argument allows you to change this so that only a value matching the fourth argument will trigger a rejection. The fourth argument does not need to be a single value: it can be a list of values or it can use the same operators as the lefthand side of a rewrite rule to match multiple values. Here is an example of the enhdnsbl feature:

FEATURE(`enhdnsbl', `sbl.spamhaus.org', , ,`127.0.0.2', `127.0.0.3')

This macro entry enables the enhdnsbl feature and uses the sbl.spamhaus.org blackhole list service. It tells sendmail to reject the incoming message if the blackhole list service returns either the value 127.0.0.2 or 127.0.0.3 in response to the lookup of the connection address.

The access database and the blackhole lists block mail from known spam sources and from open relays. But not all spam comes from known spam sources. Sometimes you don't know it is junk mail until you read it. Mail filtering tools can examine the content of the mail and decide how it should be handled based on the information found in the mail itself.

MILTER

sendmail provides direct access via the sockets interface to external mail filtering programs, called MILTERs, which are written in accordance with the Sendmail Mail Filter API. External mail filters are defined in the sendmail configuration using either the INPUT_MAIL_FILTER macro or the MAIL_FILTER macro. Other than the macro name, the syntax of both macros is identical. For example, here is the syntax of the INPUT_MAIL_FILTER macro:

INPUT_MAIL_FILTER(name, equates)

The name is an arbitrary name used by sendmail, much like an internal mailer name or an internal database name. There are up to three equates written in the form of key-letter=value, where key-letter is one of the following:
Given this syntax, an INPUT_MAIL_FILTER macro that adds support for the external MIMEDefang program might look like the following:

INPUT_MAIL_FILTER(`mimedefang',`S=unix:/var/run/mimedefang.sock, T=S:5m;R:5m')

This macro defines mimedefang as the internal name for this filter. The filter program creates and listens on the Unix socket /var/run/mimedefang.sock, and sendmail connects to that socket to communicate with the filter. The filter must therefore be running before sendmail can use it. The F equate is not used in the example; therefore, sendmail will continue to process the message in a normal manner even if the socket fails or the filter responds incorrectly. The T equate increases the send and receive timers to five minutes.

The INPUT_MAIL_FILTER macro defines only one filter. To use multiple filters, add multiple INPUT_MAIL_FILTER or MAIL_FILTER macros to the sendmail configuration. When multiple filters are used, the difference between MAIL_FILTER and INPUT_MAIL_FILTER becomes clear. Here is an example. Assume the sendmail configuration contains the following macros:

INPUT_MAIL_FILTER(`filter1', `S=unix:/var/run/filter1.soc')
INPUT_MAIL_FILTER(`filter2', `S=unix:/var/run/filter2.soc')
INPUT_MAIL_FILTER(`filter3', `S=unix:/var/run/filter3.soc')

The INPUT_MAIL_FILTER macro sets the order in which the filters are used. Given these three macros and the order in which they are listed, sendmail would send data through filter1, filter2, and filter3 in that order. To create an equivalent configuration with the MAIL_FILTER macro requires four sendmail configuration lines:

MAIL_FILTER(`filter1', `S=unix:/var/run/filter1.soc')
MAIL_FILTER(`filter2', `S=unix:/var/run/filter2.soc')
MAIL_FILTER(`filter3', `S=unix:/var/run/filter3.soc')
define(`confINPUT_MAIL_FILTERS', `filter1, filter2, filter3')

MAIL_FILTER macros do not set the order in which the filters are used; therefore, they must be accompanied by a confINPUT_MAIL_FILTERS define that specifies the order of execution.
If the confINPUT_MAIL_FILTERS define is not used when MAIL_FILTER macros are used, the filters defined by the macro are ignored. Of course, when MAIL_FILTER macros and the confINPUT_MAIL_FILTERS define are used, mail filters do not need to be run in the order in which they are declared. For example, changing the confINPUT_MAIL_FILTERS define to the one shown here would run the filters in reverse order:

define(`confINPUT_MAIL_FILTERS', `filter3, filter2, filter1')

The filters used by sendmail are external programs. Any reasonably skilled programmer can write a basic filter program, but creating one that is truly effective at fighting mail abuse is a challenge. Luckily, many skilled people have already created useful filters. Before you reinvent the wheel, search the Web for filters that may solve your email problems. Here are a few places to start:
There are many more MILTER-related web sites and many more MILTERs available. However, MILTERs are not the only tools available for mail filtering; procmail is also widely used.

Filtering with procmail

Most sendmail texts, and this one is no exception, concentrate on procmail for mail filtering. There are a few reasons for this emphasis:
The variety of ways that procmail can be invoked adds to the flexibility of this tool. As noted above, sendmail can be configured to use procmail as the local mailer. procmail can also be invoked from the shell command line. It can be invoked from the mailertable, as described in Recipe 6.8, and it can be executed from the user's .forward file, in the following manner:

$ cat > .forward
"|/usr/bin/procmail"
Ctrl-D

A common mistake is to think that, because system-wide mail filters affect a large number of users, the most important mail filtering takes place at the system level. User-level mail filtering is just as important as system-level filters. User-level filters:
For these reasons, and more, it is good to encourage users to learn about and use the mail filters available to them. Many end user mail tools come with mail filtering features. These are not integrated into sendmail and thus are not discussed here.

procmail is a powerful, although complex, mail filtering system. Personal procmail filters are defined by the user in the user's home directory in a file named .procmailrc. The system administrator defines system-wide mail filters in the /etc/procmailrc file and uses the /etc/procmailrc file for general anti-spam filtering. The end user uses .procmailrc to add filtering for personal preferences. The format of both files is the same.

The .procmailrc file contains two types of entries: environment variable assignments and mail filtering rules, which are called recipes in procmail parlance. Environment variable assignments are straightforward and look just like these assignments would in a shell script. For example, HOME=/home/craig is a valid environment variable assignment. The .procmailrc manpage lists more than 30 environment variables.

The real substance of a .procmailrc file is its recipes. The syntax of each recipe is:

:0 [flags] [:[lockfile]]
[* condition]
action

Every recipe begins with :0, which differentiates it from an assignment statement. The :0 is optionally followed by flags that change how the filter is processed. Table 6-3 lists the flags and their uses.

Table 6-3. procmail recipe flags
Use the optional lockfile variable to specify the name of the local lock file to be used for this recipe. The lock file prevents multiple copies of procmail from writing to the same mailbox at the same time, which can happen on a busy system. The lockfile name is preceded by a colon. If the colon is used and no name is specified, a default lockfile name created from the mailbox name and the extension .lock is used. If no local lock file is specified, a default lock file will be used. However, the procmail documentation encourages the use of local lock files.

The conditional test is optional. If no condition is provided, the recipe acts as if the condition is true, which means that the action is taken. If a condition is specified, it must begin with an asterisk (*). The condition is written as a regular expression. If the value defined by the regular expression is found in the mail, the condition evaluates to true and the action is taken. To take an action when mail does not contain the specified value, put an exclamation point in front of the regular expression. Here are some examples of valid conditional tests:

* ^From.*simon@oreilly\.com
* !^Subject: Chapter

The first conditional checks to see if the mail contains a line that begins with (^) the literal string From followed by any number of characters (.*) and the literal string simon@oreilly.com. The second conditional matches all mail that does not (!) contain a line that begins with the string Subject: Chapter. If multiple conditions are defined for one recipe, each condition appears on a separate line.

While there may be multiple conditions in a procmail recipe, there can be only one action. The action can direct the mail to a file, forward it to another email address, or send it to a program, or the action can define additional recipes to process the message. If the action is an additional recipe, it begins with :0. If the action directs the mail to an email address, it begins with an exclamation point (!), and if it directs the mail to a program, it begins with a vertical bar (|). If the action directs the mail to a file, just the name of the file is specified. The following example illustrates how mail is passed to an external program for processing:

:0 B
* .*pheromones
| awk -f spamscript > spam-suspects

The B flag applies the conditional test to the content of the message body. All messages that contain the word "pheromones" anywhere in the message body are passed to awk for processing. In this example, awk runs a program file named spamscript that extracts information from the mail and stores it in a file named spam-suspects. You can imagine that the administrator of this system created spamscript to extract the email addresses from suspected spam.

The example shows procmail filtering the message body. By default, procmail looks at the message headers. Message headers can also be given special attention inside of the sendmail configuration by using custom rulesets.

Custom rulesets

sendmail allows you to define custom processing for the addresses and headers from incoming mail, and it provides some hooks for this purpose. The hooks used for custom address processing are:
These rulesets are not specially designed to detect and delete junk mail; they have a broader applicability. However, these ruleset hooks are useful for fighting spam.

In addition to these hooks that are called from standard rulesets, a ruleset can be called from a header definition to perform custom header processing. The basic syntax of the sendmail.cf H command defines the format of mail headers. In the basic syntax, the header definition starts with the H command followed by the name of the header and the format of that header. The syntax to call a ruleset from an H command is:

Hname: $>ruleset

where name is the header name and ruleset is the ruleset called to process incoming headers of that name. Use this capability to check incoming headers to detect spam mail from the header information. Recipe 6.9 provides an example of how this capability is used.
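The idea behind calling a ruleset from a header definition can be pictured as a dispatch table: each header name is associated with one check, and an incoming header of that name is handed to its check. The Python sketch below is only a conceptual analogy, assuming a made-up Subject check; it is not sendmail ruleset syntax, and the names HEADER_CHECKS, check_subject, and check_header, as well as the rejection text, are hypothetical.

```python
import re
from typing import Callable, Dict, Optional


def check_subject(value: str) -> Optional[str]:
    """Hypothetical check: return a rejection reason for suspect subjects."""
    if re.search(r"pheromones", value, re.IGNORECASE):
        return "Subject looks like junk mail"
    return None  # header is acceptable


# Maps a lowercased header name to the check that processes it, in the same
# spirit as associating a header name with a ruleset.
HEADER_CHECKS: Dict[str, Callable[[str], Optional[str]]] = {
    "subject": check_subject,
}


def check_header(name: str, value: str) -> Optional[str]:
    """Run the check registered for this header name, if there is one."""
    check = HEADER_CHECKS.get(name.lower())
    return check(value) if check else None


if __name__ == "__main__":
    print(check_header("Subject", "Amazing pheromones offer"))
    print(check_header("Subject", "Chapter 6 draft"))
```

Headers with no registered check pass through untouched, which mirrors the fact that only headers whose definition names a ruleset receive custom processing.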