4.1 Autowhitelisting


SpamAssassin's autowhitelisting algorithm learns each sender's history of sending spam or non-spam messages and modifies the spam score of their subsequent mailings on the basis of this history. The primary goal of autowhitelisting is to reduce false positives (that is, to make it less likely that a non-spam message will be tagged as spam) by assuming that people who send you non-spam messages will not begin to spam you. It can also reduce false negatives if a spammer consistently sends email from the same email address, but this happens infrequently enough that autowhitelisting rarely has a significant effect on false negatives.

4.1.1 Principles

When autowhitelisting is enabled, SpamAssassin maintains a database keyed on message senders' email addresses and the IP addresses of their nearest untrusted relay (if any). Each time a message from a given sender is received, the message's spam score is added to the sender's total score in the database, and a count of the number of messages received from that sender is updated.
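To make this bookkeeping concrete, here is a minimal Perl sketch that models the database as an in-memory hash. It assumes a key formed by joining the sender address and relay IP with a "|" character; %awl_db, sender_key, and record_score are hypothetical names, and SpamAssassin's real storage format differs.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the autowhitelist database:
# each key combines the sender address and relay IP; each value
# holds a message count and a running total of spam scores.
my %awl_db;

# Build the lookup key (illustrative format, not SpamAssassin's own).
sub sender_key {
    my ($email, $ip) = @_;
    return $email . '|' . (defined $ip ? $ip : 'none');
}

# Add one message's spam score to the sender's totals.
sub record_score {
    my ($email, $ip, $score) = @_;
    my $entry = $awl_db{ sender_key($email, $ip) } ||= { count => 0, totscore => 0 };
    $entry->{count}++;
    $entry->{totscore} += $score;
    return $entry;
}

record_score('alice@example.com', '192.0.2.10', 2);
my $e = record_score('alice@example.com', '192.0.2.10', 1);
printf "count=%d total=%g\n", $e->{count}, $e->{totscore};   # count=2 total=3
```

Because the relay IP is part of the key, the same address arriving through a different relay starts a separate history in this model.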

The average sender score (the total score divided by the number of messages received) is used to modify the spam score of new messages from that sender. Specifically, the difference between the average score and the new message's score is multiplied by a configurable factor, and the result is added to the new message's spam score. The effect is that when the new message has a higher spam score than average, its spam score is adjusted downward; when the new message has a lower spam score than average, its spam score is adjusted upward.
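The adjustment itself is a one-line formula. The following sketch uses a hypothetical function, awl_adjust, and assumes that a sender with no history keeps the unmodified score, as in the first row of Table 4-1. It illustrates the arithmetic only and is not SpamAssassin's code.

```perl
use strict;
use warnings;

# Apply the autowhitelist modifier: move the message's score toward
# the sender's historical average by the given factor.
sub awl_adjust {
    my ($score, $average, $factor) = @_;
    return $score unless defined $average;   # no history yet: unchanged
    my $delta = ($average - $score) * $factor;
    return $score + $delta;
}

# A spammy-looking message (6) from a sender averaging 1.2, with a
# factor of 0.5, ends up halfway between the two.
printf "%.1f\n", awl_adjust(6, 1.2, 0.5);   # 3.6
```

With a factor of 0 the function returns the original score, and with a factor of 1 it returns the average.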

As you might expect from this explanation, the autowhitelist tests are the last ones performed by SpamAssassin. All other tests must be run first so that SpamAssassin has the most accurate spam score for a message before comparing it to the sender's historical average. The average used for that comparison covers the sender's earlier messages; the new message's score from before the autowhitelist modifier is applied is then added to the sender's history.
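The ordering can be sketched as a single function that computes the modifier from the sender's earlier messages and only then records the unmodified score. This is a simplified model built on that assumption (which matches the numbers in Table 4-1); check_message and %history are hypothetical names.

```perl
use strict;
use warnings;

my %history;   # hypothetical store: sender => { count, totscore }

# Score one message from $sender: compute the modifier from the
# sender's earlier messages, then record the unmodified score.
sub check_message {
    my ($sender, $raw_score, $factor) = @_;
    my $h = $history{$sender} ||= { count => 0, totscore => 0 };

    my $final = $raw_score;
    if ($h->{count} > 0) {
        my $average = $h->{totscore} / $h->{count};
        $final = $raw_score + $factor * ($average - $raw_score);
    }

    # The pre-autowhitelist score, not $final, joins the history.
    $h->{count}++;
    $h->{totscore} += $raw_score;
    return $final;
}

check_message('carol@example.com', 2, 0.5);                     # first message: 2
printf "%.2f\n", check_message('carol@example.com', 1, 0.5);    # 1.50
```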

4.1.2 Configuration

The most important decisions to make in autowhitelisting are how much weight SpamAssassin should put on a sender's history of sending spam or non-spam messages and how much weight it should put on the spam score of the message it is checking.

Use the auto_whitelist_factor directive to set the multiplier that is applied to the difference between a message's spam score and the sender's historical average score. It can range from 0 to 1. The default factor is 0.5, which causes the final spam score to be halfway between the message's spam score and the sender's average score.

To put more weight on the historical average, increase the auto_whitelist_factor. When the auto_whitelist_factor is set to 1, the historical average alone becomes the new message's spam score (recall, however, that the message's score from before autowhitelisting is still added to the sender's history and becomes part of the new historical average).

To put less weight on the historical average, decrease the auto_whitelist_factor. When the auto_whitelist_factor is set to 0, the historical average is ignored, and the current message's spam score will not be modified based on the sender's past messages.

Table 4-1 illustrates the impact of several different settings for auto_whitelist_factor. Each row of the table represents a new message from the same sender. The columns show the spam score of each message before the autowhitelist modifier is applied, the sender's historical average score (computed over the sender's earlier messages), and the spam score after the modifier is applied with each factor. In this example, the sender sends several non-spam messages and then sends a message that looks like spam to SpamAssassin (a false positive). As you can see, with factors of 0.5, 0.75, or 1, the message does not reach the usual spam threshold of 5 because of the sender's history of non-spam messages. Without autowhitelisting (i.e., with a factor of 0), the message receives a score of 6.

Table 4-1. The impact of auto_whitelist_factor (AWF)

Message number	Message score (before autowhitelist)	Sender average score	After autowhitelist, AWF 0	After autowhitelist, AWF 0.5	After autowhitelist, AWF 0.75	After autowhitelist, AWF 1
1	2	(none)	2	2	2	2
2	1	2	1	1.5	1.75	2
3	1	1.5	1	1.25	1.375	1.5
4	0	1.33	0	0.67	1.00	1.33
5	2	1.0	2	1.5	1.25	1.0
6	6	1.2	6	3.6	2.4	1.2
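You can reproduce Table 4-1 with a short script. It assumes, as the table's numbers imply, that each message is compared with the average of the sender's earlier messages and that the unmodified score is what gets added to the history; run_sequence is a hypothetical helper.

```perl
use strict;
use warnings;

# Raw (pre-autowhitelist) scores of six messages from one sender.
my @raw     = (2, 1, 1, 0, 2, 6);
my @factors = (0, 0.5, 0.75, 1);

# Return the post-autowhitelist scores for one factor.
sub run_sequence {
    my ($factor, @scores) = @_;
    my ($count, $total) = (0, 0);
    my @out;
    for my $s (@scores) {
        my $final = $s;
        if ($count) {
            my $avg = $total / $count;          # average of earlier messages
            $final = $s + $factor * ($avg - $s);
        }
        push @out, $final;
        $count++;
        $total += $s;                            # unmodified score recorded
    }
    return @out;
}

for my $f (@factors) {
    my @final = run_sequence($f, @raw);
    printf "AWF %-4s %s\n", $f, join(' ', map { sprintf '%.2f', $_ } @final);
}
```

For a factor of 0.5, the script prints 2.00 1.50 1.25 0.67 1.50 3.60, the values in the AWF 0.5 column.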

SpamAssassin stores its autowhitelist data in database files. SpamAssassin lets Perl's AnyDBM_File module choose which database format will be used, based on which system libraries are available. In SpamAssassin 3.0, you can control this choice by setting the auto_whitelist_db_modules option to a space-separated list of Perl database modules to be tried in order; the first module that loads successfully will be used. For example, the default module order is specified like this:

 auto_whitelist_db_modules DB_File GDBM_File NDBM_File SDBM_File
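The "first module that loads successfully" rule can be illustrated with Perl's require. This sketch is not SpamAssassin's implementation; first_loadable is a hypothetical helper, and which module it picks depends on what is installed on your system.

```perl
use strict;
use warnings;

# Try each module in order and return the first one that loads,
# mimicking the "first module that loads successfully" rule.
sub first_loadable {
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;   # DB_File -> DB_File.pm
        return $module if eval { require $file; 1 };
    }
    return undef;
}

my @order  = qw(DB_File GDBM_File NDBM_File SDBM_File);
my $chosen = first_loadable(@order);
print defined $chosen ? "Would use $chosen\n" : "No DBM module available\n";
```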

How you configure autowhitelisting also depends on whether you want each user to have his own whitelist database, or whether you want to use one database in common across all users.

4.1.2.1 Configuring per-user autowhitelists

By default, SpamAssassin maintains a separate autowhitelist for each user on the system. SpamAssassin stores the autowhitelist database for a user in the auto-whitelist file in the .spamassassin subdirectory of each user's home directory. SpamAssassin uses one of several database formats for this file, depending on what database libraries are available on the system; the Berkeley DB format is chosen when it's available.

SpamAssassin 3.0 can also store autowhitelists in an SQL database, which is useful when users don't have accounts on the mail server. To store addresses in SQL, you must install the DBI Perl module and an appropriate driver module for your SQL server. Common choices are DBD-mysql (for the MySQL server), DBD-Pg (for the PostgreSQL server), and DBD-ODBC (for connection to an ODBC-compliant server).

You should create a database and a user with privileges to access it. You must then create a table in the database to store the user autowhitelist. The SpamAssassin source code includes schemas for MySQL and PostgreSQL tables in the sql subdirectory. Here is the MySQL schema:

 CREATE TABLE awl (
   username varchar(100) NOT NULL default '',
   email varchar(200) NOT NULL default '',
   ip varchar(10) NOT NULL default '',
   count int(11) default '0',
   totscore float default '0',
   PRIMARY KEY (username,email,ip)
 ) TYPE=MyISAM;

Each row in this table specifies an autowhitelist entry for a single sender for an individual SpamAssassin user. SpamAssassin uses the columns to store the following information:

username: Stores the username or email address of the user (the latter is more useful in virtual hosting environments).
email: Stores the email address of a sender whose messages' spam scores are being tracked.
ip: Stores the IP address of the sender.
count: Stores the total number of messages received from the sender.
totscore: Stores the total spam score of messages received from the sender.

To configure SQL support for autowhitelists, set the following configuration parameters in your systemwide configuration file (local.cf):

auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList

Configures SpamAssassin to use SQL-based autowhitelists instead of file-based autowhitelists.

user_awl_dsn DSN

Defines the data source name for the SQL database, telling spamd how it will connect to the database server. A typical DSN for the Perl DBI module is written like this:

 DBI:databasetype:databasename:hostname:port

For example, to use a MySQL database named saawl running on a database server on the SpamAssassin host, the DSN would read:

 DBI:mysql:saawl:localhost:3306

If the server were running PostgreSQL, the DSN would read:

 dbi:Pg:dbname=saawl;host=localhost;port=5432;

user_awl_sql_username username

Defines the username that will be used to connect to the database server. This user must have permission to modify the data in the table (including inserting and deleting rows).

user_awl_sql_password password

Defines the password associated with the username that will be used to connect to the server.

user_awl_sql_table tablename

Defines the name of the table that contains autowhitelist data. The default tablename is awl.
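If you generate configuration files from a script, a small helper can assemble the two DSN forms shown above for user_awl_dsn. The helper name awl_dsn and its arguments are hypothetical; it only builds strings and does not connect to a database.

```perl
use strict;
use warnings;

# Build a DBI data source name in the two forms shown above.
# The helper name and argument layout are illustrative only.
sub awl_dsn {
    my (%p) = @_;
    if ($p{driver} eq 'mysql') {
        return "DBI:mysql:$p{db}:$p{host}:$p{port}";
    }
    if ($p{driver} eq 'Pg') {
        return "dbi:Pg:dbname=$p{db};host=$p{host};port=$p{port}";
    }
    die "unsupported driver: $p{driver}\n";
}

print awl_dsn(driver => 'mysql', db => 'saawl', host => 'localhost', port => 3306), "\n";
# DBI:mysql:saawl:localhost:3306
```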

4.1.2.2 Configuring a system-wide autowhitelist

It is often desirable to maintain a single autowhitelist for all users of a system. When users don't have home directories, such an approach is not just desirable but may be necessary if autowhitelisting is to be used. You can configure a systemwide autowhitelist by setting the auto_whitelist_path directive to the full path of the autowhitelist database file. Set auto_whitelist_path in the systemwide configuration file. For example, to set up a systemwide autowhitelist in the file /etc/mail/spamassassin/auto-whitelist, use the following directive:

 auto_whitelist_path /etc/mail/spamassassin/auto-whitelist

If SpamAssassin encounters this directive, it checks to be sure the database file exists. If the file does not exist, SpamAssassin attempts to create it. You may not want to give SpamAssassin write access to the directory you specify. One way around that is to create the file as root, change its ownership to the SpamAssassin user, and set the mode to allow read/write access, all before you add the auto_whitelist_path directive to your configuration file.
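The steps above can be sketched in Perl. This example creates the file in a temporary directory instead of /etc/mail/spamassassin, skips the ownership change unless it runs as root, and uses a hypothetical helper, precreate_awl_file. Whether your database module accepts a pre-created empty file depends on the format it uses, so treat this as an illustration of the ownership and mode steps only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Create an empty file, optionally hand it to $owner (root only),
# and set its mode; returns the resulting permission bits.
sub precreate_awl_file {
    my ($path, $mode, $owner) = @_;
    open my $fh, '>>', $path or die "cannot create $path: $!";
    close $fh;
    if (defined $owner && $> == 0) {              # chown needs root
        my ($uid, $gid) = (getpwnam $owner)[2, 3];
        die "no such user: $owner\n" unless defined $uid;
        chown $uid, $gid, $path or die "chown failed: $!";
    }
    chmod $mode, $path or die "chmod failed: $!";
    return (stat $path)[2] & 07777;
}

# A temporary directory stands in for /etc/mail/spamassassin.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'auto-whitelist');
printf "mode %04o\n", precreate_awl_file($path, 0700);
```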

However you create it, the systemwide autowhitelist database file should be readable and writable by the user running SpamAssassin. Depending on your configuration, SpamAssassin may be running as root, as one of several users on the system, or as a default unprivileged user such as nobody. If you let SpamAssassin create the systemwide autowhitelist database file, you can use the auto_whitelist_file_mode directive to specify the file's mode. It defaults to 0700 but may need to be set to 0770 or 0777 when multiple users must access the file.

Using a systemwide autowhitelist with mode 0777 (or 0770 and an inappropriate group) will enable a curious local user to learn the email addresses of message senders and their average spam scores, or to modify those scores. A malicious user could modify the database to give legitimate senders a false history of spamming. In general, file modes other than 0700 should be avoided.
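A quick way to check a mode against these risks is to test its permission bits. The audit_awl_mode helper below is hypothetical and only inspects a numeric mode; to audit a real file, pass it (stat $path)[2] & 07777.

```perl
use strict;
use warnings;

# List the non-owner permission bits set in $mode, with the risk
# each one carries for an autowhitelist database file.
sub audit_awl_mode {
    my ($mode) = @_;
    my @warnings;
    push @warnings, 'group can read (addresses and averages exposed)'  if $mode & 0040;
    push @warnings, 'group can write (scores can be altered)'          if $mode & 0020;
    push @warnings, 'others can read (addresses and averages exposed)' if $mode & 0004;
    push @warnings, 'others can write (scores can be altered)'         if $mode & 0002;
    return @warnings;
}

for my $mode (0700, 0770, 0777) {
    my @w = audit_awl_mode($mode);
    printf "%04o: %s\n", $mode, @w ? join('; ', @w) : 'ok';
}
```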

4.1.3 Using an Autowhitelist

Once the autowhitelisting system is configured, you must instruct SpamAssassin to use it. In SpamAssassin 2.63, if you invoke SpamAssassin with the spamassassin script, add the --auto-whitelist option to direct the script to consult your autowhitelist. If you invoke SpamAssassin with the spamc client, you should start spamd (the daemon) with the --auto-whitelist option to direct it to consult user autowhitelists.

SpamAssassin 3.0 has no --auto-whitelist command-line option. Instead, autowhitelists are used whenever the use_auto_whitelist configuration option is set in a user's (or the systemwide) configuration file.

Using Autowhitelists in Perl

If you've written a Perl application that uses Mail::SpamAssassin to check messages, you can take advantage of autowhitelists, but it requires a little additional setup. You must create an address list factory (an object that generates objects to store autowhitelisted addresses) and associate it with your Mail::SpamAssassin object. Here is sample code that does this:

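 #!/usr/bin/perl
 use strict;
 use warnings;
 use Mail::SpamAssassin;
 use Mail::SpamAssassin::DBBasedAddrList;   # load the factory class explicitly
 my $spamtest = Mail::SpamAssassin->new();
 # Create the address list factory and attach it to the checker.
 my $awl = Mail::SpamAssassin::DBBasedAddrList->new;
 $spamtest->set_persistent_address_list_factory($awl);
 # Now go on to use $spamtest as usual.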

Mail::SpamAssassin also provides methods for adding addresses to and removing them from the autowhitelist. See the Mail::SpamAssassin manpage for more information.

You can use the spamassassin script to manipulate the contents of your autowhitelist. The following command-line options to spamassassin operate on your autowhitelist:

--add-addr-to-whitelist=emailaddress: Adds emailaddress to the autowhitelist with an initial score of -100. SpamAssassin will forget any past history associated with the address.
--add-addr-to-blacklist=emailaddress: Adds emailaddress to the autowhitelist with an initial score of 100. SpamAssassin will forget any past history associated with the address.
--remove-addr-from-whitelist=emailaddress: Removes emailaddress from the autowhitelist. SpamAssassin will forget any past history associated with the address.
--add-to-whitelist: When you pipe an email message to spamassassin --add-to-whitelist, SpamAssassin adds all email addresses found in the To, From, Cc, Reply-To, Sender, Errors-To, and Mail-Followup-To headers or in the body of the message to the autowhitelist with initial scores of -100. SpamAssassin will forget any past history associated with these addresses.
--add-to-blacklist: When you pipe an email message to spamassassin --add-to-blacklist, SpamAssassin adds all email addresses found in the To, From, Cc, Reply-To, Sender, Errors-To, and Mail-Followup-To headers or in the body of the message to the autowhitelist with initial scores of 100. SpamAssassin will forget any past history associated with these addresses. Because this behavior will probably result in the blacklisting of your own email address, this option is usually useless.
--remove-from-whitelist: When you pipe an email message to spamassassin --remove-from-whitelist, SpamAssassin removes all email addresses found in the To, From, Cc, Reply-To, Sender, Errors-To, and Mail-Followup-To headers or in the body of the message from the autowhitelist and forgets any past history associated with these addresses.
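To see why seeding an address has such a strong effect, consider a toy model in which the seeded address is left with a single history entry whose total is the seed score. That model, and the names seed_address and adjusted, are assumptions made for illustration; SpamAssassin's internal bookkeeping may differ.

```perl
use strict;
use warnings;

# Hypothetical history: this sender already has 12 messages on file.
my %history = ('friend@example.com' => { count => 12, totscore => 30 });

# Assumption: seeding replaces any past history with one entry
# whose total is the seed score (-100 or 100).
sub seed_address {
    my ($addr, $seed) = @_;
    $history{$addr} = { count => 1, totscore => $seed };   # forget the past
}

# Apply the autowhitelist modifier using the stored average.
sub adjusted {
    my ($addr, $score, $factor) = @_;
    my $h = $history{$addr} or return $score;   # unknown sender: unchanged
    my $avg = $h->{totscore} / $h->{count};
    return $score + $factor * ($avg - $score);
}

seed_address('friend@example.com', -100);       # like --add-addr-to-whitelist
printf "%.1f\n", adjusted('friend@example.com', 10, 0.5);   # -45.0
```

In this model, with a factor of 0.5, a message scoring 10 from a whitelisted address ends up at -45, far below any spam threshold.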

Be careful with --add-to-blacklist. A malicious spammer could send you HTML email with friendly addresses (including your own) embedded in invisible mailto: links. Piping this message to spamassassin --add-to-blacklist causes SpamAssassin to add all of those addresses to the autowhitelist as likely spammers! Using --add-addr-to-blacklist with individual email addresses is safer.
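The following sketch shows how easily hidden addresses end up in such a list. The scanner is deliberately naive and hypothetical (it is not the parser SpamAssassin uses), but it finds the two mailto: addresses that a mail client would not display, along with the visible sender.

```perl
use strict;
use warnings;

# A deliberately naive address scanner: it collects every
# address-like string, including ones hidden in markup.
sub scan_addresses {
    my ($text) = @_;
    my %seen;
    my @found = grep { !$seen{lc $_}++ }
                $text =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g;
    return @found;
}

my $html = <<'HTML';
<p>Buy now at spam.example!</p>
<a href="mailto:you@example.com"></a>
<a href="mailto:boss@example.com" style="display:none"></a>
From: offers@spam.example
HTML

print join("\n", scan_addresses($html)), "\n";
```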
