3.1 The Anatomy of a Test


Most SpamAssassin tests consist of the same basic components:

  • A test name, consisting of up to 22 uppercase letters, numbers, or underscores. Names that begin with T_ refer to rules in testing.

  • A more verbose description of the test, which is used in the reports generated by SpamAssassin. Typically, descriptions are up to 50 characters long.

  • An indication of where to look. Tests can be applied to the message headers only, the message body only, uniform resource identifiers (URIs) in the message body, or the complete message. When testing the message body, the body can be analyzed in its raw state, after MIME-decoding the text, or after MIME-decoding, stripping of HTML, and removal of all line breaks.

  • A description of what to look for. Tests can specify a header to check for existence, a Perl regular expression pattern to match, a DNS-based blacklist to query, or a SpamAssassin function to evaluate.

  • Optional test flags that control the conditions under which the test is applied or other exceptional features.

  • A score or scores for the test. Tests can have a single score that is always used, or they can have separate scores for messages that test positive under each of four conditions:

    • When the Bayesian classifier and network tests are not in use

    • When the Bayesian classifier is not in use, but network tests are

    • When the Bayesian classifier is in use, but network tests are not

    • When the Bayesian classifier and network tests are both in use
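The naming convention in the first bullet above can be checked mechanically. The following is a minimal Python sketch, not SpamAssassin code: the function names `is_valid_test_name` and `is_rule_in_testing` are hypothetical, and the pattern encodes only the rule as stated here, so SpamAssassin's own parser may be more or less permissive.

```python
import re

# Assumption: this pattern encodes only the convention stated above
# (1 to 22 uppercase letters, digits, or underscores). It is not taken
# from SpamAssassin's source.
TEST_NAME = re.compile(r"[A-Z0-9_]{1,22}")


def is_valid_test_name(name: str) -> bool:
    """Return True if name follows the naming convention described above."""
    return TEST_NAME.fullmatch(name) is not None


def is_rule_in_testing(name: str) -> bool:
    """Return True if name carries the T_ prefix used for rules in testing."""
    return name.startswith("T_")
```

For instance, `is_valid_test_name("FROM_STARTS_WITH_NUMS")` is true (the name is 21 characters long), while a lowercase name or a name longer than 22 characters is rejected.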

Example 3-1 shows the complete definition of a test that matches when a message's From address begins with at least two digits. This test is defined in the file /usr/share/spamassassin/20_head_tests.cf (although its score appears in the 50_scores.cf file).

Example 3-1. A test definition and score
 header   FROM_STARTS_WITH_NUMS  From =~ /^\d\d/
 describe FROM_STARTS_WITH_NUMS  From: starts with nums
 score    FROM_STARTS_WITH_NUMS  0.390 1.574 1.044 0.579

How does this test work? The header directive defines it as a test that will be applied to the message headers and gives the test name (FROM_STARTS_WITH_NUMS) and the test itself, a match of the From header against the regular expression /^\d\d/. That regular expression denotes a string that begins with two digits.

For information about how to read and write regular expressions, see the Perl manual page perlre, or Jeffrey Friedl's book Mastering Regular Expressions (O'Reilly).
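To see the pattern from Example 3-1 in action outside SpamAssassin, here is a small Python sketch. Python's re module stands in for Perl, which is an assumption that holds for a pattern this simple; the function name `from_starts_with_nums` and the sample addresses are hypothetical.

```python
import re

# Same pattern as in Example 3-1: two digits at the very start of the value.
FROM_STARTS_WITH_NUMS = re.compile(r"^\d\d")


def from_starts_with_nums(from_value: str) -> bool:
    """Return True if the From header value begins with two digits."""
    return FROM_STARTS_WITH_NUMS.search(from_value) is not None


# Hypothetical header values, for illustration only.
print(from_starts_with_nums("12345@example.com"))  # begins with two digits
print(from_starts_with_nums("bob42@example.com"))  # digits appear later, not at the start
```

Because the pattern is anchored with ^, digits elsewhere in the address do not trigger the test, and a single leading digit is not enough.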


The describe directive provides a human-readable description of the test that SpamAssassin will insert in reports when the test matches. The score directive determines how many points SpamAssassin will add to the spam score of a message if the test matches. Higher scores mean that a message that matches the test is more likely to be spam. In this example, SpamAssassin will add 0.39 points to the spam score of a matching message if network and Bayesian tests are not in use, 1.574 points if network tests are in use but Bayesian tests are not, 1.044 points if Bayesian tests are in use but network tests are not, and 0.579 points if both network and Bayesian tests are in use.
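The choice among the four scores can be sketched in a few lines of Python. This illustrates the ordering described above and is not SpamAssassin's implementation; the function name `select_score` and its parameters are hypothetical.

```python
from typing import Sequence


def select_score(scores: Sequence[float], bayes: bool, network: bool) -> float:
    """Pick the score that applies under the given conditions.

    scores holds either one value (always used) or four values in the
    order described above: neither in use, network tests only,
    Bayesian classifier only, both in use.
    """
    if len(scores) == 1:
        return scores[0]
    if len(scores) != 4:
        raise ValueError("expected one score or four scores")
    # Index 0..3: the Bayesian flag selects the pair, the network flag
    # selects the entry within that pair.
    index = (2 if bayes else 0) + (1 if network else 0)
    return scores[index]


# Scores of FROM_STARTS_WITH_NUMS from Example 3-1.
example = [0.390, 1.574, 1.044, 0.579]
print(select_score(example, bayes=False, network=True))
```

With the scores from Example 3-1, enabling network tests without the Bayesian classifier selects the second value, 1.574.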

The tests distributed with SpamAssassin are typically stored in files in /usr/share/spamassassin. Tests are stored in a set of ruleset files based on the type of test being performed, and scores for all tests are stored together in one file. These tests are discussed in detail later in this chapter. Following are some other examples of test definitions from the distributed tests, along with their scores.

Testing for a To, From, or Cc header that mentions friend@public.com (this test is distributed disabled):

 header   FRIEND_PUBLIC  ALL =~ /^(?:to|cc|from):.*friend\@public\.com/im
 describe FRIEND_PUBLIC  sent from or to friend@public.com
 score    FRIEND_PUBLIC  0
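This test is matched against all of the headers at once, so the i and m modifiers do real work: i ignores case, and m lets ^ match at the start of every header line rather than only at the start of the whole header block. The Python sketch below imitates that behavior with a pattern that alternates among the three header names. It assumes the headers arrive as a single newline-separated string; the function name `mentions_friend` and the sample headers are hypothetical.

```python
import re

# re.MULTILINE plays the role of Perl's /m, re.IGNORECASE the role of /i.
FRIEND_PUBLIC = re.compile(
    r"^(?:to|cc|from):.*friend@public\.com",
    re.IGNORECASE | re.MULTILINE,
)


def mentions_friend(all_headers: str) -> bool:
    """Return True if a To, Cc, or From line mentions friend@public.com."""
    return FRIEND_PUBLIC.search(all_headers) is not None


# Hypothetical header block, for illustration only.
headers = (
    "Received: from mail.example.net\n"
    "From: Friend <friend@public.com>\n"
    "To: you@example.org\n"
    "Subject: hello\n"
)
print(mentions_friend(headers))
```

Since . does not match a newline here, the address must appear on the same line as the header name; a mention in the Subject line alone does not count.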

Testing for the existence of the X-PMFLAGS header:

 header   X_PMFLAGS_PRESENT  exists:X-PMFLAGS
 describe X_PMFLAGS_PRESENT  Message has X-PMFLAGS header
 score    X_PMFLAGS_PRESENT  2.900 2.800 2.800 2.700

Testing for long lines of hexadecimal code in the message body:

 body     LARGE_HEX  /[0-9a-fA-F]{70,}/
 describe LARGE_HEX  Contains a large block of hexadecimal code
 score    LARGE_HEX  0.633 1.595 1.193 1.160
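The quantifier {70,} requires an unbroken run of at least 70 hexadecimal characters. A short Python sketch shows the threshold; it applies the pattern to a plain string and does not model how SpamAssassin prepares the body text. The function name `contains_large_hex` is hypothetical.

```python
import re

# Same character class and quantifier as the LARGE_HEX test.
LARGE_HEX = re.compile(r"[0-9a-fA-F]{70,}")


def contains_large_hex(body_text: str) -> bool:
    """Return True if the text holds 70 or more consecutive hex characters."""
    return LARGE_HEX.search(body_text) is not None


print(contains_large_hex("deadbeef" * 9))  # one run of 72 hex characters
print(contains_large_hex("deadbeef" * 8))  # one run of 64 hex characters
```

Any non-hexadecimal character, including a space, ends the run, so two 40-character blocks separated by a space do not match.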

Testing for a Subject header in all capital letters, by evaluating a SpamAssassin function:

 header   SUBJ_ALL_CAPS  eval:subject_is_all_caps()
 describe SUBJ_ALL_CAPS  Subject is all capitals
 score    SUBJ_ALL_CAPS  0.550 0.567 0 0

Testing for a message that includes HTML to open a new window with JavaScript (disabled by default):

 body     HTML_WIN_OPEN  eval:html_test('window_open')
 describe HTML_WIN_OPEN  Javascript to open a new window
 score    HTML_WIN_OPEN  0

Testing for an HTTP (Hypertext Transfer Protocol) URI anywhere in the message that uses a numeric IP address (e.g., http://3502894884):

 uri      NUMERIC_HTTP_ADDR  /^https?\:\/\/\d{7,}/is
 describe NUMERIC_HTTP_ADDR  Uses a numeric IP address in URL
 score    NUMERIC_HTTP_ADDR  2.899 2.800 2.696 0.989
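A host written as one long decimal number is simply a 32-bit IPv4 address in integer form, which is why such URIs hide their real destination. The Python sketch below applies the same pattern to a few URIs and decodes the example number with the standard ipaddress module. The function names `uses_numeric_http_addr` and `decode_decimal_host` are hypothetical, and Python's re module is again standing in for Perl.

```python
import ipaddress
import re

# re.IGNORECASE and re.DOTALL correspond to the /i and /s modifiers.
NUMERIC_HTTP_ADDR = re.compile(r"^https?://\d{7,}", re.IGNORECASE | re.DOTALL)


def uses_numeric_http_addr(uri: str) -> bool:
    """Return True if the URI's host part starts with seven or more digits."""
    return NUMERIC_HTTP_ADDR.search(uri) is not None


def decode_decimal_host(number: int) -> str:
    """Convert an integer-form IPv4 address to dotted-quad notation."""
    return str(ipaddress.IPv4Address(number))


print(uses_numeric_http_addr("http://3502894884/"))
print(decode_decimal_host(3502894884))  # 208.201.239.36
```

An ordinary dotted-quad URI such as http://192.168.1.1/ does not match, because no run of digits after the scheme reaches seven characters.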

SpamAssassin
ISBN: 0596007078
Year: 2004
Pages: 88
