Section 23.3. Filtering Spam

23.3. Filtering Spam

The constant flood of so-called spam (more precisely, unsolicited commercial email) has decreased the usefulness of email as a communication medium considerably. Luckily, there are tools that can help us with that as well. These are called spam filters, and what they do is to attempt to categorize each incoming message according to a large number of rules to determine whether it is spam. The filters then mark up the message with either certain additional header lines or a changed subject line. It is then your task (or your mail user agent's task) to sort the messages according to these criteria into separate folders (or, quite dangerously, into the trash can directly). At the end of the day, you decide how aggressively you want to handle spam. You need to make up your mind what is more important to you: to filter out as much spam as possible, or to ensure that no important message (such as a request from a potential customer) will ever get filtered out.

There are two different ways of using a spam filter: either directly on the mail server, or in your email client. Filtering directly on the mail server is advantageous if the mail server serves more than one mail client, because then the same set of filtering rules can be applied and maintained for all users connected to this mail server, and a message coming in to several users on this server only needs to pass the spam filter once, which saves processing time. On the other hand, filtering on the client side allows you to define your own rules and filter spam completely.

The best-known spam filter in the Linux world (even though it is by no means Linux-dependent) is a tool called SpamAssassin . You can find lots of information about SpamAssassin at its home page, http://spamassassin.spache.org. SpamAssassin can work both on the server and on the client; we'll leave it to you to read the ample documentation available on the web site for installing SpamAssassin on a Postfix (or other) mail server.

When SpamAssassin is run on a server, the best way to use it is to let it run in client/server mode. That way, the large tables that SpamAssassin needs do not have to be reread for each message. Instead, SpamAssassin runs as a daemon process called spamd, which is accessed for each message by a frontend command called spamc.

If you want to configure your email client to use SpamAssassin, you need to pipe every incoming email through the command spamassassin (you can even use the spamc/spamd combo on the client, of course). spamassassin will accept the incoming message on standard input, analyze it, and write the changed message to standard output. Most modern mail user agents have facilities for piping all (or just some) incoming messages through an external command, so you should almost always find a way to hook up spamassassin somehow.

If SpamAssassin has analyzed your message to be spam, it will add the header line:

 X-Spam-Status: Yes

to your message. It is then up to you to configure the filters in your email client to do to this message whatever you want to be done to spam (sort into a separate folder, move directly to the trash can, etc.). If you want to do more detailed filtering, you can also look at the header line starting with:

 X-Spam-Status:

This marker is followed by a number of stars; the more stars there are, the more likely the message is spam.

Before we look at one email client in more detail, to sum up, you need to do two things in order to set up SpamAssassin on the client:

Configure your mail client to pipe each incoming message through the spamassassin command.
According to the header lines added by SpamAssassin, filter the message per your personal requirements.

You can even use the procmail command that we covered in the previous section to pass the email messages through spamassassin. http://wiki.apache.org/spamassassin/UsedViaProcmail has ample information about how to do this.

As an example of how you can set up an email client to support SpamAssassin, we will look at KMail, the KDE email client. KMail allows you to perform the steps just mentioned, of course. But it can also automate the procedure by means of the anti-spam wizard. You can invoke it from Tools Anti-Spam Wizard. This tool first scans for the available anti-spam tools on your system (searching for a couple more than just SpamAssassin), and then lets you select those that you want KMail to use. It is not a good idea to just select all available tools here, because each additional filtering slows down the processing of incoming email messages.

On the next page of the wizard, you will be given a number of options of what to do with spam. You should check at least "Classify messages using the anti-spam tools" and "Move detected messages to the selected folder." Then select a target folder for messages that are quite sure to be spam, and a target folder for messages where it is a bit less certain. Once you click Finish, KMail sets up all the necessary filter rules for you, and on your next email download, you can watch the spam folders filling. Your inbox should be, if not completely spam-free, then still a lot more free from spam than previously.

SpamAssassin has a lot of functionality that we have not covered at all here. For example, it contains a Bayes filter that operates on statistical data. When a spam message that comes into the system is not marked as spam, you can teach SpamAssassin to recognize similar messages as spam in the future. Likewise, if a message is erroneously recognized as spam, you can teach SpamAssassin to not consider messages like it as spam in the future (but rather ham, as the opposite of spam is often called). Please see the SpamAssassin documentation on how to set this up.

We have now discussed a number of options that you have when setting up your email system. Our advice is to start slowly, setting up one piece at a time and making sure that everything works after each step; trying to perform the whole setup in one go can be quite challenging.