9.2 Places to Filter

Filtering can be applied at several places in the receipt and delivery process. The earlier a filter is applied, the more quickly a message is dealt with. Filtering points include:

At connection time, for IP address and rDNS-based filters
During the SMTP session, before the message is received, for filters based on envelope information
During the SMTP session, after the message is received, for filters based on message contents
As the message is delivered, for user-customizable filters
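
As an illustration of the first filtering point, a DNS blacklist check is an IP address filter that needs nothing beyond the connecting address. The sketch below shows only how the query name is conventionally built: the IPv4 octets are reversed and the blacklist zone is appended. The function name `dnsbl_query_name` and the zone `dnsbl.example` are placeholders invented for this example, not part of qmail or of any real blacklist.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    # Validate the address; malformed input raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the dotted-quad octets, as in a reverse DNS lookup.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# "dnsbl.example" is a hypothetical zone used only for illustration.
print(dnsbl_query_name("192.0.2.25", "dnsbl.example"))
# prints 25.2.0.192.dnsbl.example
```

A connection-time filter would then do an ordinary address lookup on that name. An answer means the address is listed, and a "no such name" response means it is not.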

Most systems use multiple filters applied at different points. The standard qmail SMTP daemon is very lightweight compared to most other MTAs, and does as little work as possible to collect the message and queue it, leaving all of the rest of the work for delivery time. Many spam filtering tools, such as SpamAssassin, a complex filter that computes a "spamminess" score based on multiple criteria, can run at either SMTP time or delivery time.

If you run such a filter at SMTP time, the disadvantage is that it ties up an incoming SMTP process far longer than normal, possibly causing mail to be rejected if tcpserver reaches its concurrency limit. Also, the SMTP daemon doesn't know where the mail will be delivered, which makes it hard to apply per-user parameters. The advantages of filtering at SMTP time are that mail can be rejected before it's queued, so the bounce goes back to the actual sending system rather than to a probably forged return address; a message addressed to multiple recipients can be processed once rather than separately for each user; and in case of a barrage of spam, the tcpserver concurrency limit prevents mail from being accepted faster than it can be delivered.^[1]

^[1] Hitting the concurrency limit and rejecting mail is good if the rejected mail is spam; it's not so good if the rejected mail isn't spam. But legitimate mail software will retry the delivery, so real mail will only be delayed, not lost.

I used to think that only lightweight filters, such as IP address lookups in DNS blacklists and envelope address lookups in badmailfrom, should be run at SMTP time. But the ratio of spam to real mail has grown, and I see blasts of spam come in that flood the queue and can take hours to filter at delivery time. I now think that it makes sense to run at SMTP time anything that isn't user-specific and doesn't need access to data that the SMTP daemon doesn't have.
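
The rule of thumb in the last sentence can be written down as a small sketch. The `Filter` class, its field names, and the way each sample filter is classified are assumptions made for this illustration; they are not qmail configuration or anything the SMTP daemon itself knows about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    user_specific: bool           # depends on one recipient's own settings
    needs_data_smtpd_lacks: bool  # needs data the SMTP daemon doesn't have


def run_at_smtp_time(f: Filter) -> bool:
    """Apply the rule of thumb: SMTP time unless either condition rules it out."""
    return not f.user_specific and not f.needs_data_smtpd_lacks


# Illustrative classifications only; a real site would decide these itself.
FILTERS = [
    Filter("DNS blacklist lookup", False, False),
    Filter("badmailfrom envelope check", False, False),
    Filter("content scoring, site-wide settings", False, False),
    Filter("content scoring, per-user settings", True, True),
]

for f in FILTERS:
    where = "SMTP time" if run_at_smtp_time(f) else "delivery time"
    print(f"{f.name}: {where}")
```

Under these assumptions, only the per-user content scoring is left for delivery time; everything else runs before the message is queued.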