4.1 Autowhitelisting

SpamAssassin's autowhitelisting algorithm learns each sender's history of sending spam or non-spam messages and modifies the spam score of their subsequent mailings on the basis of this history. The primary goal of autowhitelisting is to reduce false positives (to make it less likely that a non-spam message will be tagged as spam) by assuming that people who send you non-spam messages will not begin to spam you. It can also reduce false negatives if a spammer consistently sends email from the same email address, but this happens infrequently enough that autowhitelisting rarely has a significant effect on false negatives.

4.1.1 Principles

When autowhitelisting is enabled, SpamAssassin maintains a database keyed on message senders' email addresses and the IP addresses of their nearest untrusted relay (if any). Each time a message from a given sender is received, the message's spam score is added to the sender's total score in the database, and a count of the number of messages received from that sender is updated.

The average sender score (the total score divided by the number of messages received) is used to modify the spam score of new messages from that sender. Specifically, the difference between the average score and the new message's score is multiplied by a configurable factor, and the result is added to the new message's spam score. The effect is that when the new message has a higher spam score than average, its spam score is adjusted downward; when the new message has a lower spam score than average, its spam score is adjusted upward.

As you might expect from this explanation, the autowhitelist tests are the last ones performed by SpamAssassin. All other tests must be run first in order to have the most accurate spam score for a message before comparing it to the sender's historical average. In addition, the sender's historical average is updated with the new message's spam score as computed before the autowhitelist modifier is applied.
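The bookkeeping described above can be sketched in a few lines of Python. This is a toy in-memory model, not SpamAssassin's implementation: the class name `AutoWhitelist`, the method `adjust`, and the sample address and IP are all hypothetical. It also assumes that the average is taken over the sender's earlier messages and that the unadjusted score is what gets recorded.

```python
from collections import defaultdict


class AutoWhitelist:
    """Toy in-memory model of an autowhitelist (illustrative only)."""

    def __init__(self, factor=0.5):
        self.factor = factor
        # (sender address, relay IP) -> [total score, message count]
        self.history = defaultdict(lambda: [0.0, 0])

    def adjust(self, sender, relay_ip, score):
        """Return the adjusted score, then record the unadjusted one."""
        entry = self.history[(sender, relay_ip)]
        total, count = entry
        if count > 0:
            # Assumption: the average covers earlier messages only.
            average = total / count
            adjusted = score + (average - score) * self.factor
        else:
            # No history yet, so there is nothing to adjust against.
            adjusted = score
        # The score before adjustment is what feeds the history.
        entry[0] = total + score
        entry[1] = count + 1
        return adjusted


awl = AutoWhitelist(factor=0.5)
for earlier in (1.0, 0.0, 2.0):
    awl.adjust("alice@example.com", "192.0.2.1", earlier)

# A message scoring 6.0 from a sender whose earlier average is 1.0
print(awl.adjust("alice@example.com", "192.0.2.1", 6.0))  # prints 3.5
```

Keying the history on both the address and the relay IP means that the same address arriving through a different relay starts with no history in this model.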
4.1.2 Configuration

The most important decisions to make in autowhitelisting are how much weight SpamAssassin should put on a sender's history of sending spam or non-spam messages and how much weight it should put on the spam score of the message it is checking. Use the auto_whitelist_factor directive to set the multiplier that is applied to the difference between a message's spam score and the sender's historical average score. It can range from 0 to 1. The default factor is 0.5, which causes the final spam score to be halfway between the message's spam score and the sender's average score.

To put more weight on the historical average, increase the auto_whitelist_factor. When the auto_whitelist_factor is set to 1, the historical average alone will be the new message's spam score (recall, however, that the score before autowhitelisting is performed is fed back into the system and becomes part of the new historical average). To put less weight on the historical average, decrease the auto_whitelist_factor. When the auto_whitelist_factor is set to 0, the historical average is ignored, and the current message's spam score will not be modified based on the sender's past messages.

Table 4-1 illustrates the impact of several different settings for auto_whitelist_factor. Each row of the table represents a new message from the same sender. Table columns show the spam score of each message before applying an autowhitelist modifier, the sender's historical average score, and the spam score after applying an autowhitelist modifier. In this example, the sender sends several non-spam messages and then sends a message that looks like spam to SpamAssassin (a false positive). As you can see, with autowhitelisting using factors of 0.5, 0.75, or 1, the message will not reach the usual spam threshold of 5 because of the sender's history of non-spam messages. Without autowhitelisting (i.e., with a factor of 0), the message receives a score of 6.

Table 4-1. The impact of auto_whitelist_factor (AWF)
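The effect of the factor can be checked with a short calculation. The sketch below uses made-up scores for the sender's earlier messages (they are not the values from Table 4-1) and a hypothetical helper, `awl_adjusted`, that applies the adjustment described in the text. Only the final message's score of 6 and the threshold of 5 come from the example above.

```python
def awl_adjusted(score, average, factor):
    """Move `score` toward `average` by the fraction `factor`."""
    return score + (average - score) * factor


# Hypothetical history: four earlier non-spam messages from one sender.
earlier_scores = [1.0, 0.0, 2.0, 1.0]
average = sum(earlier_scores) / len(earlier_scores)

new_score = 6.0   # the message that looks like spam
threshold = 5.0   # the usual spam threshold

for factor in (0.0, 0.5, 0.75, 1.0):
    final = awl_adjusted(new_score, average, factor)
    verdict = "spam" if final >= threshold else "not spam"
    print(f"AWF={factor:<4}  final score={final:.2f}  {verdict}")
```

With these made-up numbers the historical average is 1.0, so the final scores work out to 6.00, 3.50, 2.25, and 1.00 for factors of 0, 0.5, 0.75, and 1. Only the factor of 0 leaves the message above the threshold.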
SpamAssassin stores its autowhitelist data in database files. SpamAssassin lets Perl's AnyDBM_File module choose which database format will be used, based on which system libraries are available. In SpamAssassin 3.0, you can control this choice by setting the auto_whitelist_db_modules option to a space-separated list of Perl database modules to be tried in order; the first module that loads successfully will be used. For example, the default module order is specified like this:

    auto_whitelist_db_modules DB_File GDBM_File NDBM_File SDBM_File

How you configure autowhitelisting also depends on whether you want each user to have his own whitelist database, or whether you want to use one database in common across all users.

4.1.2.1 Configuring per-user autowhitelists

By default, SpamAssassin maintains a separate autowhitelist for each user on the system. SpamAssassin stores the autowhitelist database for a user in the auto-whitelist file in the .spamassassin subdirectory of each user's home directory. SpamAssassin uses one of several database formats for this file, depending on what database libraries are available on the system; the Berkeley DB format is chosen when it's available.

SpamAssassin 3.0 can also store autowhitelists in an SQL database, which is useful when users don't have accounts on the mail server. To store addresses in SQL, you must install the DBI Perl module and an appropriate driver module for your SQL server. Common choices are DBD-mysql (for the MySQL server), DBD-Pg (for the PostgreSQL server), and DBD-ODBC (for connection to an ODBC-compliant server).

You should create a database and a user with privileges to access it. You must then create a table in the database to store the user autowhitelist. The SpamAssassin source code includes schemas for MySQL and PostgreSQL tables in the sql subdirectory.
Here is the MySQL schema:

    CREATE TABLE awl (
      username varchar(100) NOT NULL default '',
      email varchar(200) NOT NULL default '',
      ip varchar(10) NOT NULL default '',
      count int(11) default '0',
      totscore float default '0',
      PRIMARY KEY (username,email,ip)
    ) TYPE=MyISAM;

Each row in this table specifies an autowhitelist entry for a single sender for an individual SpamAssassin user. SpamAssassin uses the columns to store the following information:
To configure SQL support for autowhitelists, set the following configuration parameters in your systemwide configuration file (local.cf):
4.1.2.2 Configuring a system-wide autowhitelist

It is often desirable to maintain a single autowhitelist for all users of a system. When users don't have home directories, such an approach is not just desirable but may be necessary if autowhitelisting is to be used.

You can configure a systemwide autowhitelist by setting the auto_whitelist_path directive to the full path of the autowhitelist database file. Set auto_whitelist_path in the systemwide configuration file. For example, to set up a systemwide autowhitelist in the file /etc/mail/spamassassin/auto-whitelist, use the following directive:

    auto_whitelist_path /etc/mail/spamassassin/auto-whitelist

If SpamAssassin encounters this directive, it checks to be sure the database file exists. If the file does not exist, SpamAssassin attempts to create it. You may not want to give SpamAssassin write access to the directory you specify. One way around that is to create the file as root, change its ownership to the SpamAssassin user, and set the mode to allow read/write access, all before you add the auto_whitelist_path to your configuration file.

However you create it, the systemwide autowhitelist database file should be readable and writable by the user running SpamAssassin. Depending on your configuration, SpamAssassin may be running as root, as one of several users on the system, or as a default unprivileged user such as nobody. If you let SpamAssassin create the systemwide autowhitelist database file, you can use the auto_whitelist_file_mode directive to specify the file's mode. It defaults to 0700; when multiple users must access the file, you may need to set it to 0770 or 0777, depending on your configuration.
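The manual approach (create the file, hand it to the SpamAssassin user, and set its mode) might be scripted along these lines. This is only a sketch: the function name `create_awl_file`, the default mode of 0660, and the account name in the commented-out call are assumptions rather than anything SpamAssassin requires, and changing ownership normally needs root privileges.

```python
import os
import shutil


def create_awl_file(path, mode=0o660, owner=None, group=None):
    """Create an empty file at `path`, set its mode, and optionally chown it."""
    # O_EXCL makes this fail instead of touching an existing database file.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    os.close(fd)
    # The mode passed to os.open is filtered by the umask, so set it again.
    os.chmod(path, mode)
    if owner is not None or group is not None:
        # Usually requires root; the owner and group names are site-specific.
        shutil.chown(path, user=owner, group=group)


# Hypothetical usage, run as root; "spamd" is an assumed account name:
# create_awl_file("/etc/mail/spamassassin/auto-whitelist", owner="spamd")
```

Setting the mode explicitly after creation avoids surprises from a restrictive umask, which would otherwise silently remove group or world permissions.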
4.1.3 Using an Autowhitelist

Once the autowhitelisting system is configured, you must instruct SpamAssassin to use it. In SpamAssassin 2.63, if you invoke SpamAssassin with the spamassassin script, add the --auto-whitelist option to direct the script to consult your autowhitelist. If you invoke SpamAssassin with the spamc client, you should start spamd (the daemon) with the --auto-whitelist option to direct it to consult user autowhitelists.

SpamAssassin 3.0 has no --auto-whitelist command-line option. Instead, autowhitelists are always used when the use_auto_whitelist configuration option is set in a user's (or a systemwide) configuration file.
You can use the spamassassin script to manipulate the contents of your autowhitelist. The following command-line options to spamassassin operate on your autowhitelist: