2.2 Building SpamAssassin


The easiest way to download and install SpamAssassin is through CPAN. Here's what a CPAN installation of SpamAssassin looks like:

    $ su
    Password: XXXXXXX
    # perl -MCPAN -e shell
    cpan shell -- CPAN exploration and modules installation (v1.61)
    ReadLine support enabled

    cpan> o conf prerequisites_policy ask
        prerequisites_policy ask
    cpan> install Mail::SpamAssassin
    CPAN: Storable loaded ok
    CPAN: LWP::UserAgent loaded ok
    Fetching with LWP:
      ftp://ftp.perl.org/pub/CPAN/authors/01mailrc.txt.gz
    ...
    Running install for module Mail::SpamAssassin
    Running make for J/JM/JMASON/Mail-SpamAssassin-2.60.tar.gz
    Fetching with LWP:
      ftp://ftp.perl.org/pub/CPAN/authors/id/J/JM/JMASON/Mail-SpamAssassin-2.60.tar.gz
    CPAN: Digest::MD5 loaded ok
    Fetching with LWP:
      ftp://ftp.perl.org/pub/CPAN/authors/id/J/JM/JMASON/CHECKSUMS
    Checksum for /root/.cpan/sources/authors/id/J/JM/JMASON/Mail-SpamAssassin-2.60.tar.gz ok
    Scanning cache /root/.cpan/build for sizes
    Mail-SpamAssassin-2.60/
    Mail-SpamAssassin-2.60/ninjabutton.png
    ...
    Mail-SpamAssassin-2.60/sample-spam.txt

      CPAN.pm: Going to build J/JM/JMASON/Mail-SpamAssassin-2.60.tar.gz

    What email address or URL should be used in the suspected-spam report
    text for users who want more information on your filter installation?
    (In particular, ISPs should change this to a local Postmaster contact)
    default text: [the administrator of that system] postmaster@example.com

    Checking if your kit is complete...
    Looks good
    Writing Makefile for Mail::SpamAssassin
    Makefile written by ExtUtils::MakeMaker 6.03
    /usr/bin/perl build/preprocessor -Mconditional -Mbytes -DPERL_VERSION=5.8.0 -Mvars -DVERSION=2.60 -DPREFIX=/usr <lib/Mail/SpamAssassin/AutoWhitelist.pm >blib/lib/Mail/SpamAssassin/AutoWhitelist.pm
    ...
    gcc -g -O2 spamd/spamc.c spamd/libspamc.c spamd/utils.c \
            -o spamd/spamc -ldl
    ...
    Manifying blib/man3/Mail::SpamAssassin::PerMsgLearner.3pm
      /usr/bin/make  -- OK
    Running make test
    PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
    t/basic_lint................ok
    ...
    t/zz_cleanup................ok
    All tests successful, 1 test skipped.
    Files=40, Tests=301, 426 wallclock secs (238.53 cusr + 14.19 csys = 252.72 CPU)
      /usr/bin/make test -- OK
    Running make install
    Installing /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin.pm
    Installing /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/PerMsgLearner.pm
    ...
    Installing /usr/bin/spamc
    Installing /usr/bin/spamd
    Installing /usr/bin/sa-learn
    Installing /usr/bin/spamassassin
    Writing /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/auto/Mail/SpamAssassin/.packlist
    Appending installation info to /usr/lib/perl5/5.8.0/i586-linux-thread-multi/perllocal.pod
    /usr/bin/perl "-MExtUtils::Command" -e mkpath /etc/mail/spamassassin
    ...
      /usr/bin/make install  -- OK
    cpan> quit

It is also possible to install SpamAssassin manually by downloading the code as a gzipped tar archive from http://www.spamassassin.org and following these steps from the directory where you keep local source code (/usr/local/src on many systems):

    $ gunzip -c Mail-SpamAssassin-2.60.tar.gz | tar xf -
    $ cd Mail-SpamAssassin-2.60
    $ perl Makefile.PL
    What email address or URL should be used in the suspected-spam report
    text for users who want more information on your filter installation?
    (In particular, ISPs should change this to a local Postmaster contact)
    default text: [the administrator of that system] postmaster@example.com

    Checking if your kit is complete...
    Looks good
    Writing Makefile for Mail::SpamAssassin
    $ make
    ...compilation messages...
    $ su
    Password: XXXXXXXX
    # make install
    ...installation messages...

If you install SpamAssassin manually, remember that you may first need to install or update the other Perl modules listed in Section 2.1, earlier in this chapter.

FreeBSD users can install SpamAssassin from the ports collection, where it is available both as a traditional port (which downloads the source code and compiles it) and as a precompiled package. For example, SpamAssassin 2.63 is included in the collection as p5-Mail-SpamAssassin-2.63.

Finally, Linux users can install SpamAssassin in one of several packaged formats. SpamAssassin is available in the Debian GNU/Linux and Gentoo Linux packaging systems as the "spamassassin" and "Mail-SpamAssassin" packages, respectively. Many other Linux distributions bundle SpamAssassin (although not always the latest version). The latest version of SpamAssassin is also distributed as a source RPM by one of its developers. The source RPM is used to build three platform-specific RPMs that are then installed in the usual way. Example 2-1 shows the process on a Red Hat Linux system.

Example 2-1. Building SpamAssassin from source rpm

    (download spamassassin-2.60-1.src.rpm from http://www.spamassassin.org)
    # rpm -Uvh spamassassin-2.60-1.src.rpm
       1:spamassassin           ###################################### [100%]
    # cd /usr/src/redhat/SPECS
    # rpm -bb spamassassin.spec
    Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.57624
    ...
    # cd ../RPMS/i386
    # ls -l
    perl-Mail-SpamAssassin-2.60-1.i386.rpm
    spamassassin-tools-2.60-1.i386.rpm
    spamassassin-2.60-1.i386.rpm
    # rpm -Uvh perl-Mail-SpamAssassin*.rpm spamassassin*2.60*.rpm
    ...installation messages...

Installing SpamAssassin for Personal Use

If you do not have superuser access on your mail server, but do have a shell account, it is possible to install SpamAssassin into private directories in your account.

Follow the instructions for manual installation, and indicate the directory structure you'd like to use for the program, its libraries, and its configuration files. For example, if you have personal bin, share, lib, and etc directories under your home directory, you might use this build process:

    $ perl Makefile.PL PREFIX=$HOME SYSCONFDIR=$HOME/etc
    $ make
    $ make install

Note that you must still have the prerequisite Perl modules installed systemwide or you must install them into your private directories as well.

To use a personal installation of SpamAssassin, you will need to make sure that <PREFIX>/bin is on your PATH.

2.2.1 What Gets Installed

An installation of SpamAssassin includes the following components:

Perl modules

SpamAssassin's core functions are in a set of Perl modules. The most important of these are Mail::SpamAssassin, the top-level module that includes most of the others, and Mail::SpamAssassin::Conf, the module that includes documentation of the configuration files for SpamAssassin. These modules are usually installed under a directory with a name like /usr/lib/perl5/site_perl/5.8.1, but you do not need to know their location, as the Perl installer will ensure that they are installed in a path that Perl searches when loading modules.

SpamAssassin 3.0 introduced a distinction between core SpamAssassin modules and plug-ins, modules that may be written for SpamAssassin by third parties and loaded in rulesets. Plug-in modules have names in the Mail::SpamAssassin::Plugin hierarchy (e.g., Mail::SpamAssassin::Plugin::URIDNSBL).

Rulesets

The rules that SpamAssassin uses to help decide whether or not a message is spam are kept in a set of configuration files that are usually installed in /usr/share/spamassassin. You can find the default location of these files by running spamassassin --local --debug, but you can always specify alternative locations.

A systemwide configuration file

The systemwide configuration file controls the default behavior of the spamassassin (and spamd) programs when not overridden by per-user preferences. The file is called local.cf and is installed in /etc/mail/spamassassin. Other applications that use the Mail::SpamAssassin modules often put their systemwide configuration files in this directory as well. You can find the default location of these files by running spamassassin --local --debug, but you can always specify alternative locations.

spamassassin

The spamassassin program is a Perl script that accepts a message on standard input, applies the functions of Mail::SpamAssassin, and returns the message on standard output with spam scores, reports, or other modifications added as warranted. It has several other functions as well, which are described in detail later in this chapter. It is usually installed in /usr/bin.

spamd and spamc

On sites that receive large amounts of mail, invoking the spamassassin script for each message is costly, due to the overhead of starting a new process and running the Perl interpreter. spamd is a daemon that is started once (at system boot) and remains in memory to perform spam-checking. It listens on either a Unix domain socket or a TCP port for requests to check messages, performs the checks, and returns the (possibly modified) messages to the client.

spamc is the client program for sites that run the spamd daemon. It accepts a message on standard input, transmits it to spamd, and returns the response on standard output. Like spamassassin, it is invoked for each message, but it is written in C and compiled, and thus avoids the overhead of starting Perl. It provides the most important functionality of spamassassin.

spamc and spamd are usually installed in /usr/bin. They are described in greater detail later in this chapter.
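To make the client/server split concrete, here is a minimal Python sketch of the kind of exchange a spamc-like client has with spamd. It assumes spamd is listening on TCP port 783 on localhost and accepts a plain-text request made of a `CHECK SPAMC/1.2` line, a `Content-length` header, a blank line, and the message; it also assumes spamd answers with a status line such as `SPAMD/1.1 0 EX_OK` and a header such as `Spam: True ; 1000.0 / 5.0`. The function names are hypothetical, and the real spamc supports many more options than this.

```python
import socket

# Assumptions for this sketch: spamd listens on TCP port 783 on localhost
# and accepts the SPAMC/1.2 protocol version used in the request line.
SPAMD_HOST = "127.0.0.1"
SPAMD_PORT = 783


def build_check_request(message: bytes) -> bytes:
    """Return a CHECK request: request line, Content-length header, blank line, message."""
    head = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return head.encode("ascii") + message


def parse_check_response(response: bytes) -> tuple[bool, float, float]:
    """Return (is_spam, score, threshold) from a spamd reply to CHECK."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    lines = head.split("\r\n")
    status = lines[0].split(None, 2)  # e.g. ["SPAMD/1.1", "0", "EX_OK"]
    if len(status) < 2 or status[1] != "0":
        raise RuntimeError(f"spamd reported an error: {lines[0]}")
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "spam":
            # The value looks like " True ; 1000.0 / 5.0"
            flag, _, scores = value.partition(";")
            score, _, threshold = scores.partition("/")
            return flag.strip().lower() in ("true", "yes"), float(score), float(threshold)
    raise RuntimeError("spamd reply contained no Spam: header")


def check_message(message: bytes, host: str = SPAMD_HOST, port: int = SPAMD_PORT) -> tuple[bool, float, float]:
    """Send one message to a running spamd and parse its verdict."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # tell spamd the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

With spamd running, you could try `check_message(open("sample-spam.txt", "rb").read())` against the sample file shipped with SpamAssassin. Keeping the request builder and response parser separate from the socket code makes them easy to test without a daemon.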

sa-learn

The sa-learn script is used to train SpamAssassin's Bayesian spam classification system. It teaches SpamAssassin which messages you consider spam and which you consider non-spam. Eventually, SpamAssassin can use this information to make better judgments of whether or not you want a message marked as spam. SpamAssassin's learning systems are described in detail in Chapter 4.

2.2.2 Basic Configuration

Once SpamAssassin has been installed, it's a good idea to adjust the basic systemwide configuration before testing. A complete guide to the configuration directives is given in Chapter 3; only the most commonly adjusted systemwide directives are described here.

Configuration is usually controlled by the file /etc/mail/spamassassin/local.cf. Example 2-2 shows a typical local.cf that might be used with SpamAssassin 2.63.

Example 2-2. A typical local.cf file

    # This is the right place to customize your installation of SpamAssassin.
    #
    # See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
    # tweaked.
    #
    ###########################################################################

    # How high a score is considered spam?
    required_hits 5

    # How should spam reports be inserted into the message?
    report_safe 1

    # Should we tag the subject of spam messages?
    rewrite_subject 1

    # By default, SpamAssassin will run RBL checks.  If your ISP already
    # does this, set this to 1.
    skip_rbl_checks 0

Blank lines and lines beginning with a number sign (#) are ignored in configuration files. Other lines begin with a configuration directive (e.g., required_hits), followed by whitespace and then the value for the directive (e.g., 5).
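The line format just described is simple enough to read with a few lines of code. The following Python sketch is hypothetical and implements only the rules stated above; in particular, it keeps just the last value of each directive, which is an assumption that suits simple settings such as required_hits but not directives that may legitimately appear many times.

```python
def parse_cf(text: str) -> dict[str, str]:
    """Parse SpamAssassin-style configuration lines into {directive: value}.

    Blank lines and lines whose first non-space character is '#' are skipped.
    Every other line is split at the first run of whitespace into a directive
    and its value. A repeated directive overwrites the earlier value (assumed).
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


example = """
# How high a score is considered spam?
required_hits 5
report_safe 1
"""
print(parse_cf(example))
```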

The directives you will most want to adjust are:

required_hits (SpamAssassin 2.63) or required_score (SpamAssassin 3.0)

Each SpamAssassin rule that matches a message adds points to (or subtracts points from) the message's total spam score. When the total score reaches the value of this directive, SpamAssassin reports the message as spam. The default value, 5, is suitable for most installations. If you are particularly worried about false positives, you can increase this value, but doing so also reduces the number of true positives (i.e., more spam will be missed).
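The arithmetic behind this directive can be sketched in a few lines of Python. The rule names and point values below are hypothetical (real scores come from the rulesets), and the sketch assumes a message counts as spam once its total is greater than or equal to the threshold, matching "reaches" in the description above.

```python
def classify(rule_scores: dict[str, float], required: float = 5.0) -> tuple[float, bool]:
    """Sum the points of every matching rule and compare the total with the threshold."""
    total = sum(rule_scores.values())
    return total, total >= required


# Hypothetical rule names and scores, for illustration only.
hits = {"HYPOTHETICAL_RULE_A": 3.0, "HYPOTHETICAL_RULE_B": 2.5}
print(classify(hits))                # spam at the default threshold of 5
print(classify(hits, required=6.0))  # the same message is missed at a threshold of 6
```

The second call shows the trade-off described above: raising the threshold protects against false positives at the cost of letting this 5.5-point message through.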

report_safe

This directive determines how SpamAssassin modifies messages that it determines are spam.

No matter how report_safe is set, SpamAssassin adds three headers to spam mail: X-Spam-Level (set to a number of asterisks representing the spam score), X-Spam-Status (set to a one-line description of the spam score and matching tests), and X-Spam-Flag (set to Yes).

When report_safe is set to 0, the message body is kept intact, and the header X-Spam-Report is added with a detailed description of the rules that matched. When report_safe is set to 1, a new MIME message is created with the spam report as an attachment and the original spam message as an attachment with content-type message/rfc822. When report_safe is set to 2, SpamAssassin behaves similarly, but the original spam message is attached with content-type text/plain.
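The two wrapping modes and the X-Spam-Level header can be approximated with Python's standard email package. This is a rough sketch, not SpamAssassin's code: the function names are hypothetical, the set of headers copied onto the wrapper is an assumption, and so is the one-asterisk-per-whole-point rule used for X-Spam-Level.

```python
from email import message_from_bytes
from email.message import EmailMessage, Message


def spam_level(score: float) -> str:
    """One asterisk per whole point of score (assumed); negative scores give none."""
    return "*" * max(int(score), 0)


def wrap_report_safe(original: Message, report: str, score: float, mode: int = 1) -> EmailMessage:
    """Build a new MIME message with the report first and the original attached.

    mode 1 attaches the original as message/rfc822, mode 2 as text/plain.
    Which headers are copied onto the wrapper is an assumption of this sketch.
    """
    if mode not in (1, 2):
        raise ValueError("mode must be 1 or 2")
    wrapper = EmailMessage()
    for name in ("From", "To", "Subject", "Date"):
        if original[name] is not None:
            wrapper[name] = str(original[name])
    wrapper["X-Spam-Level"] = spam_level(score)
    wrapper.set_content(report)  # the text/plain report becomes the first part
    if mode == 1:
        wrapper.add_attachment(original)  # attached as message/rfc822
    else:
        wrapper.add_attachment(original.as_string())  # attached as text/plain
    return wrapper


spam = message_from_bytes(b"From: a@example.com\r\nSubject: Buy now\r\n\r\nCheap stuff\r\n")
wrapped = wrap_report_safe(spam, "This message was classified as spam.\n", score=7.9)
print([part.get_content_type() for part in wrapped.iter_parts()])
```

Attaching the original as message/rfc822 lets mail clients display it as an embedded message, while the text/plain variant shows it as inert text.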

rewrite_subject (SpamAssassin 2.x only)

If this directive is set to 1, SpamAssassin will prepend "*****SPAM*****" to the message subject in the Subject header if the message is considered spam. This is useful when users have mail clients that can filter only on standard headers.

rewrite_header (SpamAssassin 3.0 only)

This directive can be used to rewrite the Subject, From, or To headers of messages that SpamAssassin considers spam. Rewriting the Subject header prepends a given string to the message subject. For example, to prepend "*****SPAM*****" to a spam message's subject, use the following:

 rewrite_header subject *****SPAM*****

Rewriting From or To headers adds the given string to the email address as a parenthetical comment.
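The effect of these rewrites on a message can be sketched with Python's email package. The function below is hypothetical and only mirrors the behavior described here; the single space between the tag and the original subject, and the exact placement of the parenthetical comment, are assumptions.

```python
from email import message_from_string
from email.message import Message

TAG = "*****SPAM*****"


def tag_header(msg: Message, header: str, tag: str = TAG) -> None:
    """Tag Subject, From, or To in place (hypothetical helper).

    Subject: the tag is prepended, separated by a space (spacing assumed).
    From/To: the tag is appended as a parenthetical comment.
    """
    name = header.capitalize()
    if name not in ("Subject", "From", "To"):
        raise ValueError("only Subject, From, or To can be rewritten")
    old = msg.get(name)
    if old is None:
        if name != "Subject":
            return  # no address to annotate
        old = ""
    new = f"{tag} {old}".rstrip() if name == "Subject" else f"{old} ({tag})"
    if name in msg:
        msg.replace_header(name, new)
    else:
        msg[name] = new


msg = message_from_string("From: spammer@example.com\nSubject: Cheap pills\n\nbody\n")
tag_header(msg, "subject")
tag_header(msg, "from")
print(msg["Subject"])  # *****SPAM***** Cheap pills
print(msg["From"])     # spammer@example.com (*****SPAM*****)
```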

skip_rbl_checks

SpamAssassin typically looks up a sender's IP address in a set of Domain Name System (DNS)-based real-time blacklists (DNSBLs or RBLs) to determine whether it has been listed as a known spam source, open proxy or relay, dialup host, etc. Many ISPs perform these checks in the MTA itself in order to reject connections from such hosts at the earliest possible point. If you do that, you can prevent SpamAssassin from doing its own lookups by setting this directive to 1; the default is 0. It is also possible to perform lookups against one set of DNSBLs at the MTA and a different set in SpamAssassin.
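By convention, a DNSBL is queried by reversing the octets of the IPv4 address and looking the result up as a hostname under the list's DNS zone; if the name resolves, the address is listed. The Python sketch below illustrates that convention. The zone dnsbl.example.org is a placeholder rather than a real list, the function names are hypothetical, and treating every lookup failure as "not listed" is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves, meaning the DNSBL lists the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # simplification: NXDOMAIN and lookup failures both count as "not listed"
    return True


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))  # 1.2.0.192.dnsbl.example.org
```

Because each check is a DNS query, running them at the MTA and again in SpamAssassin doubles the lookups, which is why skip_rbl_checks exists.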

2.2.3 Testing SpamAssassin

Once the basic systemwide configuration is in place, it's a good idea to test SpamAssassin to ensure that it can correctly distinguish a known non-spam message from a known spam message. To facilitate this, the SpamAssassin source code includes two files, sample-nonspam.txt and sample-spam.txt. The former contains an email message that has very few hallmarks of spam; the latter contains an email message that includes the GTUBE (Generic Test for Unsolicited Bulk Email) string, a special test string that is used to validate spam-checkers.
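The GTUBE string is a fixed sequence of characters, which makes the idea behind its rule easy to illustrate. The Python below is a simplified, hypothetical stand-in, not SpamAssassin's rule: it awards 1000 points (the score the GTUBE rule receives in Example 2-4) when the literal string appears, and builds a minimal test message whose headers are placeholders.

```python
# The GTUBE test string; it must appear exactly like this, on one line.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def gtube_points(body: str) -> float:
    """Simplified stand-in for the GTUBE rule: 1000 points if the string is present."""
    return 1000.0 if GTUBE in body else 0.0


def make_gtube_message(to_addr: str) -> str:
    """Build a minimal test message containing GTUBE (headers are placeholders)."""
    return (
        f"To: {to_addr}\n"
        "From: tester@example.com\n"
        "Subject: GTUBE test\n"
        "\n"
        f"{GTUBE}\n"
    )


print(gtube_points(make_gtube_message("you@example.com")))
```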

If you installed SpamAssassin using CPAN, you'll find the sample-nonspam.txt and sample-spam.txt files in whichever directory CPAN uses for its builds. Often that will be a subdirectory of root's home directory named .cpan/build/Mail-SpamAssassin-2.63.

To test the spamassassin script, run it in test mode by using the --test-mode command-line argument and provide one of the sample files on its standard input. In test mode, spamassassin will produce a spam score at the bottom of the message whether or not the message meets the required score for spam. Example 2-3 shows a test of spamassassin on the sample-nonspam.txt file, which produces a final score of 0.0.

Example 2-3. Testing spamassassin with sample-nonspam.txt

    $ cd Mail-SpamAssassin-2.63
    $ spamassassin --test-mode < sample-nonspam.txt
    Return-Path: <tbtf-approval@world.std.com>
    Delivered-To: foo@foo.com
    Received: from europe.std.com (europe.std.com [199.172.62.20])
            by mail.netnoteinc.com (Postfix) with ESMTP id 392E1114061
            for <foo@foo.com>; Fri, 20 Apr 2001 21:34:46 +0000 (Eire)
    ...
    Content preview:  -----BEGIN PGP SIGNED MESSAGE----- TBTF ping for
      2001-04-20: Reviving T a s t y B i t s f r o m t h e T e c h n o l o g
      y F r o n t [...]

    Content analysis details:   (0.0 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- --------------------------------------------
     0.0 LINES_OF_YELLING       BODY: A WHOLE LINE OF YELLING DETECTED

Example 2-4 shows the same test using sample-spam.txt, which produces a final score of 1000.

Example 2-4. Testing spamassassin with sample-spam.txt

    $ spamassassin --test-mode < sample-spam.txt
    Received: from localhost [127.0.0.1] by tala.mede.uic.edu
            with SpamAssassin (2.60 1.212-2003-09-23-exp);
            Sun, 16 Nov 2003 21:38:03 -0600
    ...
    Content preview:  This is the GTUBE, the Generic Test for Unsolicited
      Bulk Email. If your spam filter supports it, the GTUBE provides a test
      by which you can verify that the filter is installed correctly and is
      detecting incoming spam. You can send yourself a test mail containing
      the following string of characters (in uppercase and with no white
      spaces and line breaks): [...]

    Content analysis details:   (1000.0 points, 5.0 required)

     pts rule name              description
    ---- ---------------------- ---------------------------------------------
    1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email

If these tests succeed, you might try testing with a few real spam and non-spam messages from your mailbox to get a feel for how the scoring works.
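When you move on to your own saved messages, a small script can run each one through spamassassin --test-mode and pull out the score. This Python sketch assumes that spamassassin is on your PATH and that the "Content analysis details" line looks like the ones in Examples 2-3 and 2-4 (the format may differ between versions); the function names are hypothetical.

```python
import re
import subprocess

# Matches e.g. "Content analysis details:   (1000.0 points, 5.0 required)"
DETAILS = re.compile(
    r"Content analysis details:\s*\(\s*(-?\d+(?:\.\d+)?)\s+points,\s*(-?\d+(?:\.\d+)?)\s+required\)"
)


def parse_score(report: str) -> tuple[float, float]:
    """Extract (score, required) from spamassassin --test-mode output."""
    match = DETAILS.search(report)
    if match is None:
        raise ValueError("no 'Content analysis details' line found")
    return float(match.group(1)), float(match.group(2))


def score_file(path: str) -> tuple[float, float]:
    """Run spamassassin --test-mode on one message file and return its score."""
    with open(path, "rb") as message:
        result = subprocess.run(
            ["spamassassin", "--test-mode"], stdin=message, capture_output=True
        )
    return parse_score(result.stdout.decode("utf-8", "replace"))
```

For example, `for name in ("sample-nonspam.txt", "sample-spam.txt"): print(name, score_file(name))` would repeat both tests in one go.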

2.2.4 SpamAssassin Options

The spamassassin script has a large number of command-line options that control its behavior. Some of the most commonly used for spam-checking are detailed here; others are featured in Chapter 3 and Chapter 4. A complete list of options can be found in the man page for spamassassin.

2.2.4.1 Locating configuration files

SpamAssassin expects to find its rulesets in /usr/share/spamassassin, its systemwide configuration file in /etc/mail/spamassassin, and per-user preferences in ~/.spamassassin/user_prefs. If you've installed SpamAssassin in different locations, you may need to use the following command-line options to help the spamassassin script locate these files.

--configpath /path/to/ruleset/directory: Specifies the path to the directory containing the SpamAssassin ruleset configuration files. This option can also be given as --config-file or --config-dir.
--siteconfigpath /path/to/sitewide/directory: Specifies the path to the directory containing the sitewide configuration file local.cf.
--prefspath /path/to/user_prefs: Specifies the path to the file containing user preferences for the user running spamassassin. --prefs-file can also be used.

2.2.4.2 Scripting and testing options

Two spamassassin options are useful in scripting.

--exit-code [integer]

When this option is used, the spamassassin script exits with a nonzero exit code if the message it checked was determined to be spam, and a zero exit code if it was not. The default spam exit code is 5, but you can specify a different one as an argument to this option. If spamassassin exits due to a program error, it returns exit code 64 (if bad arguments were given to spamassassin) or 70 (for other errors).

This option provides a useful way for a calling script to determine if a message is considered spam.
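A calling script written in Python might interpret these statuses as follows. This is a sketch: the exit-code values come from the description above, the function names are hypothetical, and if you pass a non-default code to --exit-code you would pass the same value as spam_code here.

```python
import subprocess

EX_USAGE = 64     # bad arguments, per the description above
EX_SOFTWARE = 70  # other program errors


def interpret_exit(code: int, spam_code: int = 5) -> str:
    """Map a spamassassin --exit-code status to 'ham', 'spam', or an error description."""
    if code == 0:
        return "ham"
    if code == spam_code:
        return "spam"
    if code == EX_USAGE:
        return "error: bad arguments"
    if code == EX_SOFTWARE:
        return "error: internal failure"
    return f"error: unexpected exit status {code}"


def classify_message(message: bytes) -> str:
    """Check one message, discarding the (possibly tagged) output; only the status matters."""
    result = subprocess.run(
        ["spamassassin", "--exit-code"], input=message, stdout=subprocess.DEVNULL
    )
    return interpret_exit(result.returncode)
```

Checking for the error codes explicitly keeps a failed run from being mistaken for a spam verdict.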

--log-to-mbox /path/to/mbox/file (SpamAssassin 2.x only)

This option causes copies of all of the messages processed by spamassassin to be logged to the given file in mbox format. The messages are logged in the form in which spamassassin receives them, with no spam-tagging. This option can be used to preserve pristine copies of email, but such a function is probably better performed by the MTA itself, rather than by SpamAssassin.

2.2.4.3 Untagging

No spam-checking system is perfect. If SpamAssassin mistakenly tags a non-spam message as spam, it will add several message headers and reformat the message to include its report as the first MIME attachment and the original message as a second attachment. To remove these headers and restore the message to a near-original state, pipe the message to spamassassin with the --remove-markup option, as shown in Example 2-5.

Example 2-5. Removing SpamAssassin markup

    $ spamassassin < sample-spam.txt > marked-message
    $ spamassassin --remove-markup < marked-message > unmarked-message
    $ diff -s sample-spam.txt unmarked-message
    Files sample-spam.txt and unmarked-message are identical

Messages that have been tagged and then untagged via --remove-markup may differ in minor ways from the original message. For example, headers that may have included line breaks in the original message may be concatenated into one long line.
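Because of differences like these, a byte-for-byte diff is a strict test. As a rough, hypothetical alternative, the following Python helper treats two raw messages as equivalent when they differ only in header folding and whitespace within header values. It compares bodies exactly and makes no attempt to undo any other change.

```python
import re


def _split(raw: str) -> tuple[str, str]:
    """Split a raw message at the first blank line into (headers, body)."""
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    return head, body


def _headers(head: str) -> list[tuple[str, str]]:
    """Unfold continuation lines, then normalize each header's name and whitespace."""
    unfolded = re.sub(r"\n[ \t]+", " ", head)
    pairs = []
    for line in unfolded.split("\n"):
        name, _, value = line.partition(":")
        pairs.append((name.strip().lower(), " ".join(value.split())))
    return pairs


def same_apart_from_folding(a: str, b: str) -> bool:
    """True if two raw messages differ at most in header folding and spacing."""
    head_a, body_a = _split(a)
    head_b, body_b = _split(b)
    return _headers(head_a) == _headers(head_b) and body_a == body_b
```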

2.2.4.4 Reporting

If you've installed clients for spam checksum clearinghouses, you can report spam to those clearinghouses by piping a message to spamassassin --report. The message will be untagged before being reported. In SpamAssassin 2.63, if you also provide the --warning-from=emailaddress option, the sender of the spam will receive an email (apparently from the provided emailaddress) warning her that her message has been reported as spam. This is rarely useful (because most spam forges or obfuscates the sender's address), and the option was removed in SpamAssassin 3.0.

You can also use SpamAssassin's reporting capability to set up spam traps. A spam trap is an email address that has never been used by a real recipient and never requests email from anyone. People who set up spam trap addresses often include the addresses on web pages or in Usenet postings with instructions that people should not send mail to them; spammers' address-harvesting programs ignore such instructions. Because any email that's sent to the spam trap address can be safely assumed to be spam, you can report it as such to spam clearinghouses. To set up a spam trap with SpamAssassin, create an email alias that pipes messages to spamassassin --report. For most clearinghouse systems, you will need to determine which user your mail system will run spamassassin --report as, and set up some files in that user's home directory to control how it interacts with the clearinghouse client. See your clearinghouse documentation for details.

Never report spam sent to a legitimate address that you have not verified with your own eyes. The clearinghouse systems rely on these spam reports, and their effectiveness is diminished when non-spam messages are reported as spam. If you do accidentally report a non-spam message, you can revoke your report by piping the message to spamassassin --revoke. Not all clearinghouses support message revocation; as of SpamAssassin 3.0, only Vipul's Razor does.
