Spam Detection | Blocking Unsolicited Bulk Email

As long as you're not operating an open relay, you can be confident that your systems are not being used to harm other systems. Your next consideration is to protect yourself and your users by limiting the spam your network receives. Ideally, your mail server could simply reject any message that looks like spam. Unfortunately, whereas humans can look at a message and know instantly that it's spam, computers have a tougher time detecting it without making mistakes. The ugly truth is that once you start to reject spam, there is always a risk that you will block legitimate correspondence.

Misidentifying a legitimate message as spam is referred to as a false-positive identification. Your anti-spam efforts are an attempt to detect as much spam as you can with the fewest possible false-positives. You have to weigh the size of your spam problem against the possibility of rejecting real email when deciding how aggressive to be in implementing your anti-spam measures. The extremes range from permitting all spam to accepting mail only from preapproved individuals. Preapproval may seem severe, but the problem is getting bad enough for some people that whitelist applications, where any correspondent you receive mail from must be identified ahead of time, are becoming more common.

There are two primary ways of detecting spam: identifying a known spamming client and inspecting the contents of a message for tell-tale phrases or other clues that reveal the true nature of a spam message. Despite the difficulties, postmasters can achieve some success with minimal false-positives by implementing various spam-detection measures.

11.4.1 Client-Based Spam Detection

Client-blocking techniques use IP addresses, hostnames, or email addresses supplied by clients when they connect to deliver a message. Each piece of information supplied can be compared to lists of items from known spamming systems. Spamming systems might be owned by actual spammers, but they might also be unintentionally open relays managed by hapless, (almost) innocent mail administrators. In either case, if a system is regularly sending you spam, you will probably decide to block messages from it. One problem with identifying spam by IP address, hostname, or email address is that these items are easily forged. While the IP address of the connecting system requires some sophistication to spoof, envelope email addresses are trivial to fake.

11.4.1.1 DNS-based blacklists

In a grass-roots effort to stem the tide of spam on the Internet, various anti-spam services, generally called DNS-based Blacklists (DNSBL) or Realtime Blacklists, have developed. These services maintain large databases of systems that are known to be open relays or that have been used for spam. A newer, increasingly more common problem is with systems that have been hijacked by spammers who install their own proxy software that allows them to relay messages. These hijacked systems can also be used in distributed denial-of-service attacks. There are DNSBL lists that are dedicated to listing these unwitting spam relays. The idea is that by pooling the information from hundreds or thousands of postmasters, legitimate sites can try to stay ahead of spammers.

Usually, these systems work by adding a DNS entry to their domain space for each of the IP addresses in their database that have been identified as spam-friendly open relays. For example, if the host at IP address 192.168.254.31 has been identified as an open relay, the (fictitious) DNSBL service No Spam Unlimited using a domain name of nospam.example.com creates a DNS entry like 31.254.168.192.nospam.example.com. When a client connects to your Postfix system, Postfix can check the No Spam DNS server to see if there is an entry for the client's IP address. If the IP address has been identified as an open relay system, Postfix can reject the message.

Consider very carefully before you decide to make use of a DNSBL service. Many open relays used to forward spam also operate mail services for nonspamming users. You are very likely to block legitimate mail in addition to the spam. Also keep in mind that you are offloading to a third party the responsibility of making important decisions about who can and cannot send mail to your users. On the other hand, if you're buried in spam, DNSBL services can definitely help. If you decide to use one, review their service options and policies very carefully. Again, you have to balance your aggressiveness and the likelihood of losing legitimate mail against the magnitude of your spam problem.

11.4.2 Content-Based Spam Detection

In addition to identifying clients, you can often recognize spam by its contents. Certain strings within email messages mark them as likely to be spam ("Our Rates Have Never Been Lower!!"). But trying to distinguish spam by the contents of the message can be problematic. Imagine that you receive lots of spam offering new house mortgages. You figure you can eliminate most of it by blocking messages that contain words like "really low interest rate on a new mortgage." This may indeed block many spam messages, but you might also block a message from your friend (or one of your user's friends) who just got a great deal on a new house and wrote to tell you about it.

11.4.3 Detection Difficulties

The problem with both client- and content-based techniques to identify spam is that spammers are constantly finding ways to get around them. There is a sort of arms race going on between legitimate users of email and spammers. You can compile lists of open relays, but spammers expend a great deal of effort seeking out new open relays or proxy servers to abuse (and there always seem to be more of them).

You may discover that you receive a lot of spam with the same return address. You can block messages that use that return address, but spammers use hit-and-run tactics. They obtain an email address from one of the free email sites and use that address to send thousands or millions of spam messages, and then discard it for another. Within a couple of days, you'll never see the address you listed again.

Even content filters have to adjust for spammers escalating tactics. Some spammers embed HTML codes within the words of their messages to break up phrases you might filter against. Or they encode the entire message so that when Postfix scans it for recognized spam phrases, there are no intelligible phrases. Most email clients oblige users by automatically rendering such messagesdecoding or ignoring extraneous HTML codes. Recipients often don't even notice that the message had originally been encoded.

Introduction

Prerequisites

Postfix Architecture

General Configuration and Administration

Queue Management

Email and DNS

Local Delivery and POP/IMAP

Hosting Multiple Domains

Mail Relaying

Mailing Lists

Blocking Unsolicited Bulk Email

SASL Authentication

Transport Layer Security

Content Filtering

External Databases

Appendix A. Configuration Parameters

Appendix B. Postfix Commands

Appendix B. Postfix Commands

Appendix C. Compiling and Installing Postfix

Appendix D. Frequently Asked Questions

Appendix D. Frequently Asked Questions