When none of the existing tests does what you'd like, you can write a custom test of your own. Custom tests are just like the distributed tests, except that you install them in the systemwide configuration file or in a per-user preference file.
The first step in writing a custom test is to choose a symbolic test name and write a meaningful test description with the describe directive. For now, do not begin any of your names with a double underscore (__). Test names that begin with two underscores are not listed in test hit reports, nor are they added to the spam score on their own; such names are used for creating sets of subtests that should be applied in combination. SpamAssassin calls these combinations meta tests, and they are discussed later in this section.

Second, determine what part of the message you wish to test. Table 3-1 summarizes the directives used to test different portions of a message. Each is covered in greater detail in the following sections.

Table 3-1. Message portions and associated test directives
Third, decide whether your test requires any special test flags. Test flags tell SpamAssassin that your test applies only under certain conditions or does something unusual. Use the directive tflags TESTNAME flaglist to set test flags; flaglist is a space-separated list of flags. Table 3-2 lists the available flags in SpamAssassin and their effects.

Table 3-2. Test flags
For example, the RCVD_IN_BL_SPAMCOP_NET test, which checks the message's Received headers against the DNS-based blacklist at bl.spamcop.net, is defined in 20_dnsbl_tests.cf like this: [2]
    header RCVD_IN_BL_SPAMCOP_NET eval:check_rbl_txt('spamcop', 'bl.spamcop.net.')
    describe RCVD_IN_BL_SPAMCOP_NET Received via a relay in bl.spamcop.net
    tflags RCVD_IN_BL_SPAMCOP_NET net

Finally, after adding or modifying a test, you should run spamassassin --lint to check your new rules for correct syntax. This command attempts to parse all of the rules and configuration files in the ruleset directory and the systemwide configuration directory. It exits quietly if no errors are found.
3.3.1 Header Tests

Use the header directive to define a header test. Header tests can test for the existence of a header or check whether a header matches (or fails to match) a regular expression. To check for the existence of a header, use the following syntax:

    header TESTNAME exists:headername

Regular expression tests can be applied to any single header in a message, to both the To and Cc headers, to all Message-Id headers, or to all headers. Use the following form to match a header against a Perl regular expression:

    header TESTNAME headername =~ /regexp/modifiers

Use this next syntax to test whether a header does not match a regular expression:

    header TESTNAME headername !~ /regexp/modifiers

In these tests, headername can be the name of a single header, or can be ToCc (to match in the To or Cc header), MESSAGEID (to match in any Message-Id header), or ALL (to match in any header). SpamAssassin 3.0 also supports the headername EnvelopeFrom, which matches against the address supplied in the SMTP MAIL FROM command if the MTA provides this information to SpamAssassin.

A header that does not exist will not match any regular expression. To handle the possibility of a nonexistent header, you can add an optional [if-unset: STRING] after the regular expression and modifiers; STRING is then tested against the regular expression if the header does not exist. For example, to look for a Reply-To header that either contains @localhost or is missing, you could use this rule:

    header LOCAL_OR_NO_REPLY reply-to =~ /\@localhost/ [if-unset: @localhost]

(The @ inside the regular expression is escaped because rules are compiled as Perl regular expressions, where an unescaped @localhost would be interpolated as an array variable.)

Many of the methods available in the Mail::SpamAssassin::EvalTests module test headers. This module is not documented, but you can learn about its methods by reading the rules distributed with SpamAssassin. For example, the subject_is_all_caps() method matches when the Subject header contains all capital letters.
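To make the matching rules above concrete, here is a small Python model of how a header test's outcome might be decided, including the if-unset fallback. It is only a sketch under simplifying assumptions: headers arrive as a plain dictionary, only single named headers are handled (not ToCc, MESSAGEID, ALL, or EnvelopeFrom), Python's re stands in for Perl regular expressions, and the helper names header_value(), header_exists(), and header_test() are hypothetical, not part of SpamAssassin.

```python
import re
from typing import Dict, Optional


def header_value(headers: Dict[str, str], name: str) -> Optional[str]:
    """Return a header's value (name compared case-insensitively), or None if absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def header_exists(headers: Dict[str, str], name: str) -> bool:
    """Model of `header TESTNAME exists:headername`."""
    return header_value(headers, name) is not None


def header_test(headers: Dict[str, str], name: str, pattern: str, *,
                negate: bool = False, if_unset: Optional[str] = None,
                flags: int = 0) -> bool:
    """Model of `headername =~ /pattern/` (or `!~` when negate=True).

    A missing header is replaced by the if_unset string when one is given;
    otherwise nothing can match, so `=~` fails and `!~` succeeds.
    """
    value = header_value(headers, name)
    if value is None:
        value = if_unset
    if value is None:
        return negate
    matched = re.search(pattern, value, flags) is not None
    return matched != negate  # invert the result for `!~`


# LOCAL_OR_NO_REPLY: Reply-To contains @localhost, or Reply-To is missing.
rule = dict(name="Reply-To", pattern=r"@localhost", if_unset="@localhost")
print(header_test({"Reply-To": "admin@localhost"}, **rule))      # True
print(header_test({"From": "someone@example.com"}, **rule))      # True (header missing)
print(header_test({"Reply-To": "someone@example.com"}, **rule))  # False
```

Python's re does not interpolate @, so the pattern needs no escaping here, unlike the Perl-based rule.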
The subject_is_all_caps() method is the basis of the SUBJ_ALL_CAPS rule distributed with SpamAssassin:

    header SUBJ_ALL_CAPS eval:subject_is_all_caps()

3.3.1.1 Configurable header tests (SpamAssassin 3.0)

Some of the header tests in SpamAssassin 3.0 that use Mail::SpamAssassin::EvalTests methods have configurable parameters that control their operation. These parameters should be defined in sitewide or user configuration files.

The check_for_from_dns() method performs a DNS lookup on the domain of the address in the message's Reply-To or From header to ensure that an MX record exists for it, that is, that some host is willing to receive mail for the sender's domain. Because DNS lookups can be slow, two configuration file options, check_mx_attempts and check_mx_delay, are provided so you can adjust these lookups. Set check_mx_attempts to the number of lookup attempts you are willing to have SpamAssassin make (the default is 2). Set check_mx_delay to the number of seconds to wait between attempts, in case the domain name server is temporarily down (the default is 5).

The check_hashcash_value() and check_hashcash_double_spend() methods implement Hashcash verification (http://www.hashcash.org). If a message includes an X-Hashcash header, SpamAssassin can quickly verify that the sender spent the required processing time to produce a valid header, and it reduces the message's spam score in proportion to how difficult the header was to produce. To control SpamAssassin's use of Hashcash, define the following configuration variables:
3.3.1.2 check_rbl()

The check_rbl(), check_rbl_txt(), and check_rbl_sub() methods are a useful basis for new tests. These methods extract IP addresses from a message's Received headers, discard those that are known to be reserved addresses or on trusted networks, and query a DNS-based blacklist for each remaining address. If any of the addresses are listed in the blacklist, the test matches. Rules using these methods are written like other eval rules:

    header A_NEW_BLACKLIST eval:check_rbl('nasties','new.blacklist.zone')

Call check_rbl() with two arguments. The first argument is the zone ID, a string that's used to identify the blacklist. It's primarily useful when you're querying a blacklist that's composed of many different lists and you later want to evaluate the query result by which sublists the addresses were on (this topic is discussed later in this chapter).

If you append -notfirsthop to the zone ID, the originating IP address will be excluded from RBL lookups unless it is the only IP address. This is useful when querying blacklists of dialup or DSL (Digital Subscriber Line) hosts, which are expected to relay all their email through an ISP's mail server. If new.blacklist.zone were this kind of blacklist, you might write the test like this:

    header A_NEW_BLACKLIST eval:check_rbl('nasties-notfirsthop','new.blacklist.zone')

Similarly, you can append -firsttrusted to check only the IP address that appears in the Received header added by the most remote trusted server (IP addresses in Received headers added by more remote relays cannot be trusted). This is useful for querying a DNS-based whitelist to determine whether the server that first relayed the email to a trusted server appears on the whitelist. By appending -untrusted, you will check only the untrusted IP addresses (those more remote than the most remote trusted server).
Here's a definition for a test of a DNS-based whitelist:

    header A_NEW_WHITELIST eval:check_rbl('friends-firsttrusted','new.whitelist.zone')
    tflags A_NEW_WHITELIST nice

(Remember, as Table 3-2 points out, when defining a test that will lower the spam score, you must set the nice test flag.)
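The effect of the -notfirsthop suffix on which addresses get looked up can be sketched in Python. This is a hypothetical model, not SpamAssassin's code: it assumes the candidate addresses have already been extracted from the Received headers, filtered of reserved and trusted-network addresses, and ordered from the originating host inward (that is, starting from the bottommost Received header). The -firsttrusted and -untrusted suffixes depend on where the trusted-network boundary falls and are not modeled; split_zone_id() and addresses_to_query() are illustrative names.

```python
from typing import List, Tuple

NOT_FIRST_HOP = "-notfirsthop"


def split_zone_id(zone_id: str) -> Tuple[str, bool]:
    """Split e.g. 'nasties-notfirsthop' into ('nasties', True)."""
    if zone_id.endswith(NOT_FIRST_HOP):
        return zone_id[: -len(NOT_FIRST_HOP)], True
    return zone_id, False


def addresses_to_query(zone_id: str, relay_ips: List[str]) -> List[str]:
    """relay_ips[0] is assumed to be the originating host."""
    _, not_first_hop = split_zone_id(zone_id)
    if not_first_hop and len(relay_ips) > 1:
        # Skip the originating address unless it is the only one.
        return relay_ips[1:]
    return list(relay_ips)


hops = ["198.51.100.23", "203.0.113.5"]  # a dialup client, then its ISP's relay
print(addresses_to_query("nasties", hops))                  # ['198.51.100.23', '203.0.113.5']
print(addresses_to_query("nasties-notfirsthop", hops))      # ['203.0.113.5']
print(addresses_to_query("nasties-notfirsthop", hops[:1]))  # ['198.51.100.23']
```

The last call shows why the suffix suits dialup lists: a dialup address is still checked when it delivered the message directly, but not when it properly handed the message to its ISP's relay.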
The second argument is the DNS zone for the blacklist. SpamAssassin checks the blacklist by performing a DNS query for a hostname in this zone. SpamAssassin determines the hostname by reversing the order of the octets in the IP address it's trying to check (e.g., 128.0.10.0 becomes 0.10.0.128) and prepending the result to the zone name (e.g., creating 0.10.0.128.new.blacklist.zone). It then issues a query for a DNS A record associated with that hostname. Typically, if an address is blacklisted, the DNS query will succeed and return an IP address (usually 127.0.0.1). If the address is not on the blacklist, the DNS query will fail with an NXDOMAIN response.

3.3.1.3 check_rbl_txt()

Some blacklists are based on DNS TXT records instead of DNS A records. (Blacklist operators should indicate which kind of lookup is appropriate for their blacklist.) Use the check_rbl_txt() method to perform lookups using a blacklist based on TXT records. check_rbl_txt() accepts the same arguments as check_rbl() and works analogously: SpamAssassin builds the same reversed-address hostname (e.g., 0.10.0.128.new.blacklist.zone) but queries it for a DNS TXT record. If the address is blacklisted, the TXT query will return a string explaining why the address is blacklisted. If the address is not on the blacklist, the DNS query will fail with an NXDOMAIN response.

3.3.1.4 check_rbl_sub()

Some DNSBLs are aggregations of many different blacklists. These DNSBLs typically return different IP addresses in response to a successful A lookup to indicate on which sublist(s) the blacklisted address appears (e.g., the query returns 127.0.0.1 for addresses on sublist 1, 127.0.0.2 for addresses on sublist 2, etc.). Use the check_rbl_sub() method to query a combined DNSBL and determine whether the IP address is on a specific sublist.
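The lookup mechanics described in these sections can be modeled in a few lines of Python. The sketch builds the query hostname from an IPv4 address and a zone, defines an A lookup using the standard library resolver (treating a failed lookup as "not listed"), and interprets an A-record answer against a sublist in the style of check_rbl_sub(). It is an illustration, not SpamAssassin's implementation: the helper names are hypothetical, TXT lookups are omitted because Python's standard library has no TXT-record API, and for the bitmask form of check_rbl_sub() the sketch assumes the mask is ANDed with the whole returned address taken as a 32-bit integer, which may differ from SpamAssassin's exact rule.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and prepend them to the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def lookup_a(hostname: str) -> Optional[str]:
    """Return the A record for hostname, or None if resolution fails.

    socket.gaierror covers NXDOMAIN but also timeouts and other resolver
    failures, so None here is not proof that the address is unlisted.
    """
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def in_sublist(response: str, sub: str) -> bool:
    """Interpret an A-record answer against a check_rbl_sub()-style argument."""
    if sub.isdigit():
        # Bitmask form (assumption: mask applied to the address as a 32-bit integer).
        return (int(ipaddress.IPv4Address(response)) & int(sub)) != 0
    return response == sub  # exact-address form, e.g. '127.0.0.2'


name = dnsbl_query_name("128.0.10.0", "new.blacklist.zone")
print(name)  # 0.10.0.128.new.blacklist.zone
print(in_sublist("127.0.0.2", "127.0.0.2"), in_sublist("127.0.0.6", "2"))  # True True
```

To query a real list, pass the result of dnsbl_query_name() to lookup_a() and, if it returns an address, hand that address to in_sublist().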
check_rbl_sub() also takes two arguments: the first is a zone ID, and the second indicates which response is associated with the desired sublist. For example, if the new.blacklist.zone blacklist is composed of sublists that return 127.0.0.1 and 127.0.0.2, you could check IP addresses against only the second sublist:

    header A_NEW_BLACKLIST eval:check_rbl('nasties','new.blacklist.zone')
    header NEW_BLACKLIST_2 eval:check_rbl_sub('nasties','127.0.0.2')

Less commonly, composite lists may return a single A record whose IP address is to be interpreted as a bitmask of matching sublists. To check a sublist in this case, provide a bitmask (as a positive decimal number) as the second argument to check_rbl_sub(). Note that to check the result against a sublist, you must also have a rule that uses check_rbl() or check_rbl_txt() to associate the zone ID string with the blacklist.

3.3.2 Body Tests

The body, rawbody, and full directives define tests on the body of an email message. Two basic kinds of tests are provided: message bodies can be tested against a regular expression pattern, or they can be submitted to an eval test defined in Mail::SpamAssassin::EvalTests.

The body directive defines a test to be applied to the text of a message as it would be likely to appear to a person reading the message in a text-based mail client. The Subject header is considered to be the first paragraph of the message body. All textual MIME components of the message are decoded, and HTML tags are removed. The message is reformatted into paragraphs (text separated by multiple newlines), and newlines within paragraphs are removed. The test is then applied to each message paragraph.
Here's an example of a body test distributed with SpamAssassin that matches if the word "remove" appears in quotes in the body:

    body REMOVE_IN_QUOTES /\"remove\"/i

The rawbody directive defines a test to be applied to the decoded text of a message with its HTML markup left in place, as an HTML-based mail client would receive it. The Subject header is not included. All textual MIME components of the message are decoded, HTML tags are not removed, and the message is split into lines based on the line breaks in the message. The test is then applied to each message line. Here's an example of a rawbody test distributed with SpamAssassin that's designed to find a JavaScript statement that's common in spam:

    rawbody HIDE_WIN_STATUS /<[^>]+onMouseOver=[^>]+window\.status=/i

Note that this test could not be written as a body test because this JavaScript appears inside an HTML tag, and body tests see the text only after HTML tags have been removed.

The full directive defines a test to be applied to the full text of a message. All headers are included, along with all textual MIME components of the message body, but no decoding is performed. The message is split into lines based on the line breaks in the message, and the test is then applied to each header and message line. SpamAssassin does not distribute any full tests that match regular expressions; it reserves full for eval tests that must submit the raw message to external spam clearinghouses (which are discussed later in this chapter).
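The difference between what body and rawbody rules see can be approximated in Python. This is a rough model under stated assumptions, not SpamAssassin's renderer: the input is a single, already decoded text part, HTML tags are stripped with a naive regular expression, and MIME decoding and the full directive are not modeled. The first two patterns are Python translations of REMOVE_IN_QUOTES and HIDE_WIN_STATUS; REPLY_REMOVE is a hypothetical rule whose phrase spans a line break in the raw text.

```python
import re
from typing import List

TAG = re.compile(r"<[^>]*>")  # naive HTML tag matcher, adequate for this sample only


def body_paragraphs(subject: str, text: str) -> List[str]:
    """What a body rule sees: Subject first, tags removed, one string per paragraph."""
    paragraphs = [subject]
    for block in re.split(r"\n\s*\n", TAG.sub("", text)):
        # Newlines inside a paragraph are removed by joining its lines with spaces.
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paragraphs.append(joined)
    return paragraphs


def rawbody_lines(text: str) -> List[str]:
    """What a rawbody rule sees: decoded text, tags intact, one string per line."""
    return text.splitlines()


def hits(rule: "re.Pattern[str]", units: List[str]) -> bool:
    """A rule fires if it matches any paragraph (body) or line (rawbody)."""
    return any(rule.search(unit) for unit in units)


REMOVE_IN_QUOTES = re.compile(r'"remove"', re.I)
HIDE_WIN_STATUS = re.compile(r"<[^>]+onMouseOver=[^>]+window\.status=", re.I)
REPLY_REMOVE = re.compile(r'reply with "remove"', re.I)  # hypothetical rule

text = (
    "<p>To stop these mailings, reply with\n"
    "\"REMOVE\" in the subject.</p>\n"
    "\n"
    "<a href=\"http://example.com/\" onMouseOver=\"window.status='Hi'\">Click</a>\n"
)
body, raw = body_paragraphs("Free offer", text), rawbody_lines(text)
for rule in (REMOVE_IN_QUOTES, HIDE_WIN_STATUS, REPLY_REMOVE):
    print(rule.pattern, "body:", hits(rule, body), "rawbody:", hits(rule, raw))
```

In this sample, HIDE_WIN_STATUS fires only on the raw lines because the tag is gone from the body view, while REPLY_REMOVE fires only on the body view because paragraph reassembly joins the two raw lines it spans.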
3.3.3 URI Tests

The uri directive defines a test on all URIs that appear in an email message. SpamAssassin creates a list of http, https, ftp, mailto, javascript, and file URIs, and transforms bare hostnames starting with www or ftp into appropriate URIs. The test is applied to each URI in the message. URIs can be matched against a regular expression pattern. Here's an example of a distributed URI test that checks for a mailto URI with the string "remove" in the address portion:

    uri MAILTO_TO_REMOVE /^mailto:.*?remove/is

SpamAssassin 3.0 includes a plug-in called Mail::SpamAssassin::Plugin::URIDNSBL. When loaded, this plug-in enables the uridnsbl directive, which takes each URI in the message, extracts the name of the host in the URI, looks up its IP address in DNS, and then checks that IP address against a specified DNSBL. These tests catch spam that is relayed through innocent (or temporary) mail servers but advertises web sites hosted on spammers' servers. Here's a portion of SpamAssassin 3.0's 25_rules.cf file that defines a uridnsbl test called URIBL_SBLXBL:

    loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
    ...
    uridnsbl URIBL_SBLXBL sbl-xbl.spamhaus.org. TXT
    header URIBL_SBLXBL eval:check_uridnsbl('URIBL_SBLXBL')
    describe URIBL_SBLXBL Contains a URL listed in the SBL/XBL blocklist

3.3.4 Meta Tests

A meta test is a test that combines the results of several other tests using Boolean logic. For example, a meta test might be positive if either of two subtests is positive, or might require that both subtests be positive. A meta test can combine several tests using the Boolean operators for and (&&), or (||), and not (!), along with parentheses to modify the precedence in the expression.

When using meta tests, you will often want some or all of the subtests to contribute only to the meta test and not to be scored separately. To achieve this effect, give the subtests names that begin with two underscores. This prevents SpamAssassin from scoring them separately.
You can then assign a single score to the meta test. Because nonscoring subtests will never be listed in a SpamAssassin report, you need not include a describe directive for them. Example 3-3 shows the CLICK_BELOW meta test in SpamAssassin.

Example 3-3. A meta test and its subtests

    body CLICK_BELOW_CAPS /CLICK\s.{0,30}(?:HERE|BELOW)/s
    describe CLICK_BELOW_CAPS Asks you to click below (in capital letters)
    body __CLICK_BELOW /click\s.{0,30}(?:here|below)/is
    meta CLICK_BELOW (__CLICK_BELOW && !CLICK_BELOW_CAPS)
    describe CLICK_BELOW Asks you to click below

The CLICK_BELOW_CAPS test is a standard body test that is positive if the words "CLICK BELOW" or "CLICK HERE" appear in the message in uppercase. Although it is a standard test that is used and scored on its own, SpamAssassin also uses it as a subtest in a meta test. The __CLICK_BELOW test is a nonscoring subtest that is positive if the same phrases appear in any combination of upper- and lowercase letters. The CLICK_BELOW meta test is positive when __CLICK_BELOW is positive and CLICK_BELOW_CAPS is not, that is, when the phrase appears in anything except all uppercase. Typically, a mixed-case or lowercase occurrence is assigned a lower score than the uppercase version.

In addition to Boolean logic operators, you can also use arithmetic operators (+, -, *, /) and comparisons (>, >=, <, <=, !=, ==). When you combine tests with arithmetic operators, the value of a subtest is 1 if it is positive and 0 if it is negative. One such meta test in SpamAssassin is MULTI_FORGED, which counts the number of positive tests for different kinds of Received header forgery and is positive when two or more forgeries appear in the same message. This test is shown in Example 3-4.

Example 3-4. The MULTI_FORGED meta test

    meta MULTI_FORGED ((FORGED_AOL_RCVD + FORGED_HOTMAIL_RCVD + FORGED_EUDORAMAIL_RCVD + FORGED_YAHOO_RCVD + FORGED_JUNO_RCVD + FORGED_GW05_RCVD) > 1)
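Meta-test logic is easy to mimic once each subtest result is treated as 1 or 0. The Python sketch below hand-translates the two example meta rules; it is an illustration, not SpamAssassin's meta-rule compiler. The CLICK_BELOW subtests reuse the example's regular expressions (Python's re accepts them unchanged) and are applied to a single string rather than per paragraph, while the MULTI_FORGED subtest results are simply supplied as inputs. The function names are hypothetical.

```python
import re
from typing import Dict

# Body subtests from Example 3-3.
SUBTESTS = {
    "CLICK_BELOW_CAPS": re.compile(r"CLICK\s.{0,30}(?:HERE|BELOW)", re.S),
    "__CLICK_BELOW": re.compile(r"click\s.{0,30}(?:here|below)", re.I | re.S),
}


def run_subtests(text: str) -> Dict[str, int]:
    """Each subtest contributes 1 if positive, 0 if negative."""
    return {name: int(bool(rx.search(text))) for name, rx in SUBTESTS.items()}


def click_below(r: Dict[str, int]) -> bool:
    # meta CLICK_BELOW (__CLICK_BELOW && !CLICK_BELOW_CAPS)
    return bool(r["__CLICK_BELOW"] and not r["CLICK_BELOW_CAPS"])


FORGERY_TESTS = ["FORGED_AOL_RCVD", "FORGED_HOTMAIL_RCVD", "FORGED_EUDORAMAIL_RCVD",
                 "FORGED_YAHOO_RCVD", "FORGED_JUNO_RCVD", "FORGED_GW05_RCVD"]


def multi_forged(r: Dict[str, int]) -> bool:
    # meta MULTI_FORGED ((FORGED_AOL_RCVD + ... + FORGED_GW05_RCVD) > 1)
    return sum(r.get(name, 0) for name in FORGERY_TESTS) > 1


for sample in ("Please click here to claim", "CLICK HERE NOW"):
    r = run_subtests(sample)
    print(sample, "| CLICK_BELOW_CAPS:", bool(r["CLICK_BELOW_CAPS"]),
          "| CLICK_BELOW:", click_below(r))

print(multi_forged({"FORGED_AOL_RCVD": 1, "FORGED_YAHOO_RCVD": 1}))  # True
print(multi_forged({"FORGED_AOL_RCVD": 1}))                          # False
```

For the mixed-case sample only CLICK_BELOW fires; for the all-caps sample only CLICK_BELOW_CAPS does, so the two rules never score the same phrase twice.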