Spam filter evasion can be summed up in one
Defeating a spam filter can be hard, but it often just comes back to the golden rule of spam:
Spam is a numbers game, and the majority of
Let’s say you’re a spammer and, as in real estate, location is very important to consider. If you want to evade a host-based spam filter you need to think about the best host to use, since the majority of network- and host-based filters carry more weight than content filters. For example, if a dial-up modem in Brazil is constantly sending you e-mail, and there is no reason for them to do so, it is easier to just filter anything coming from that host or network. Inspecting the message content is harder. There are so many different types of
The following conditions raise the suspicion of these types of filters:
The host is listed in an RBL and is a known
The sender has been sending large amounts of spam
The host sent a fake HELO
The host has no reverse DNS or MX records
The host’s reverse DNS record uses a different domain than the HELO
The hostname contains DSL, dial up, Point-to-Point Protocol (PPP), or Serial Line Internet Protocol (SLIP).
If any of these conditions are met, the
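Seen from the filter's side, the conditions listed above amount to a short checklist. The following is a minimal sketch of such a checklist, not the logic of any particular product. The function names, the pattern list, and the naive "last two labels" domain comparison are all assumptions made for illustration, and the caller is assumed to have already done the RBL and reverse DNS lookups.

```python
import re

# Hypothetical pattern list; a real filter would use a larger, tuned set.
DYNAMIC_HOST_PATTERN = re.compile(r"(dsl|dial-?up|ppp|slip)", re.IGNORECASE)


def base_domain(hostname):
    """Naive guess at the registered domain: the last two labels."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def host_suspicion_reasons(helo, reverse_dns, listed_in_rbl):
    """Return the reasons a host-based filter might distrust a sender.

    helo          -- the name the client gave in its HELO command
    reverse_dns   -- the PTR name for the client IP, or None if absent
    listed_in_rbl -- True if an RBL lookup (done elsewhere) was positive
    """
    reasons = []
    if listed_in_rbl:
        reasons.append("listed in an RBL")
    if reverse_dns is None:
        reasons.append("no reverse DNS record")
    else:
        if base_domain(reverse_dns) != base_domain(helo):
            reasons.append("reverse DNS domain differs from HELO")
        if DYNAMIC_HOST_PATTERN.search(reverse_dns):
            reasons.append("hostname looks like a dial-up or DSL line")
    return reasons


if __name__ == "__main__":
    print(host_suspicion_reasons("mail.microsoft.com",
                                 "ppp-123.companyx.com", False))
```

A production filter would weight each reason rather than simply listing it, and would use a public-suffix list instead of the two-label shortcut used here.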
Notes from the Underground…
More About Proxy Servers
As mentioned in Chapter 3, using proxy servers to send spam can be highly useful and effective. However, when sending e-mail to a host that is actively checking so many elements of the proxy server, you must have a very legitimate looking host or none of your spam will get through. When filtered for these criteria, a list of 3,000 to 4,000 proxies may only produce a list of five to ten
When you know what the filters are looking for, a spammer should simply find a host that meets their criteria. A compromised mail server is an ideal host, which already sends e-mail, has valid forward and reverse DNS entries, and even looks like a mail server! However, the majority of the time a spammer is not that lucky, and most spammers end up using home DSL users or
There are many ways of getting around a host-based filter. The simplest is to find a legitimate host or register a DNS entry and set up an MX or pointer record (PTR) for it.
Notes from the Underground…
Spammers chew through domain names very quickly; large spammers have thousands of names registered at any given time. Spammers promote from these domains until every filter
For the most part, only the truly devoted or the corporate spammer will go to the trouble of setting up a host with valid DNS records and legitimate information. The majority of spammers will just play the numbers game again. Although 20 percent of hosts in the world may drop e-mail coming from a DSL modem in Brazil, 80 percent will accept it.
The amount of bad spam I see amazes me. When I see my Bayesian filter mark a message with a score of 40, I know this spammer is not going to have much success. The reason is always the same: spammers try to be crafty and falsify any credentials they send. It’s not hard to pick up on this; the majority of inexperienced spammers get very low delivery rates because their spam is incredibly obvious.
The best trick to evade a spam filter is to
. Spam filters search for anything that is not legitimate looking. Think about e-mail, its content, and how a legitimate e-mail should look. Compare the two in your mind and try to make your spam look legitimate, like you sent it from Outlook, with the body looking like a
X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.xxxx
This adds to the message’s validity. The message should look like a real message; every little detail will help it get past the filters. Sending spam without an X-Mailer flag shows that the e-mail came from a questionable source. Corresponding message IDs must be set to the correct value of the source you say sent the e-mail. If you’re sending e-mail through a proxy server but want people to think it is being sent from a qmail mail server (slightly more credible), you need to use the same format that a qmail server uses. Do not try to
A message ID is a unique string assigned by the mail server where the message originated. The message ID is in the format of:
Each e-mail daemon has its own unique string. It is easy to spot a fake if you use an incorrect string or the wrong syntax for your message ID.
When sending your HELO command just before e-mail delivery, be highly creative when you say what host you are coming from. Do not identify yourself as
, or a home DSL
The FROM, TO, CC, and BCC fields are also
Keep the e-mail as personalized as possible and as legitimate looking as possible. Many rule sets now contain filters for any reply address that contains suspect information. Setting a legitimate reply address can mean the difference between e-mail being delivered or not. The phrase “offers” and strings of random numbers and
One of the best ways to
However, there is nothing to stop a spammer from appending into the message.
------BEGIN PGP SIGNATURE------ version: pgpfreeware 6.5.2 for non-commercial use <http://www.pgp.com> sp118fg4j8r7m3s9od5h2ixrqheafer3ysepsq1azdhzuvskfcntfpe9xs4fhqs wacj49dk6u883sxo4kb9u6/jnjdxawasqnzxpetxk9b2doglc/60hwrpn+vujdu xav65sop+px4knaqcciecamqj7ugiherempnbxwyatymjafkbkh1eulc2vrwdmd cjdi57fh43ks9cm78h4t ------END PGP SIGNATURE------
In early 2003, many spam messages were sent using a legitimate signature and the majority of spam filters delivered the spam with no questions asked. After all, it had a legitimate PGP signature so a human being must have written it, right? It did not take long for the spam filters to catch up to the spammers, though, and before you knew it spam filters were actively dropping anything that contained a signature or PGP-encoded data for suspicion of it being spam.
Once again, the integrity of your e-mail, your communication, and your privacy rests in the hands of software developers who can stop you from reading your PGP e-mail.
This is a highly
Remember, keep it readable; do not use CAPS to write everything. Use real English words and do not overuse the language; try to repeat some words. For example:
Subject: my is much hookup is happening.
This does not read well to you and me, but to a spam filter it reads fine. Spam
Subject: FREE SAVE DOLLARS 891723
Although spammers like adding random numbers into subjects, too many have
Random numbers are usually added into a subject field to defeat hash-based spam filters. The idea is that each message should have a unique subject (a different random number) that makes the hash of each message subject different. This lowers the probability of the message being flagged as spam.
The only problem is that spam filters quickly caught onto this, and now random numbers also equate to the message being spam.
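To see why the random-number trick stopped working, it helps to look at the problem from the filter's side. The sketch below is an illustration only, assuming a filter that fingerprints subjects with SHA-256; the function names and the normalization rule (drop every run of digits, lowercase, collapse whitespace) are assumptions, not a description of any specific filter.

```python
import hashlib
import re


def raw_fingerprint(subject):
    """Hash the subject exactly as received."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


def normalized_fingerprint(subject):
    """Hash the subject after removing digit runs and case differences."""
    cleaned = re.sub(r"\d+", "", subject).lower()
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = "FREE SAVE DOLLARS 891723"
    second = "FREE SAVE DOLLARS 104552"
    # The raw hashes differ, so a naive hash filter sees two messages.
    print(raw_fingerprint(first) == raw_fingerprint(second))
    # After normalization both subjects collapse to one fingerprint.
    print(normalized_fingerprint(first) == normalized_fingerprint(second))
```

Once a filter normalizes before hashing, the random suffix no longer changes the fingerprint, and the presence of the digits becomes a signal in its own right.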
Instead of adding obvious random data, my preferred method is to substitute words with other words of the same length. Keep it looking legitimate and use only English words. I know of spammers who have large lists of random two-, four-, and seven-letter words that they use to compose a unique subject line for each e-mail. This keeps the subject unique enough to defeat hash-based filters, while not obviously trying to be unique.
A subject template that looks like “
There do weanels do. Juicy to ballium to. Glitz as colling as. Xerox to balming to.
These four subject lines look legitimate. Granted they do not make any sense, but if you look at the words and the length of each word, they match common English language structure. Spam filters will agree with this and have more faith in the e-mail’s legitimacy. Filters like to look for language discrepancies in subjects, to compare the subject to how a typical English sentence should look. It is easy to detect an invalid subject. Tricks such as using a question mark and exclamation point in the subject add to the message’s score. This
The downside of not using overly identifiable words such as “Hey, buy my Viagra” in your subject line is that the reader has no clue what your spam is about. This is a major
Although the message will probably get through more filters, the chances of someone opening the message because of something written in the subject line is low, and this can affect your sales. There are middle-of-the-road
Language frequency statistics can be used as a measure of identifying if a real language was used in the message subject or if strings of random characters were thrown in. English is a highly predictable language and
As an example, the following subject line shows no language pattern:
Jioea oifje ifje qo yd yhue uhfo uihje ojq uehf pie ie ha e oge os eb
Although the subject contains 17 words, not one word was repeated and many of the words are under three letters in length (short words are most common in the English language). Words such as “I,” “at,” “is,” “be,” “and,” “are,” and “was” may be repeated two or three times in a long sentence. Yet this sentence used 10 words, three characters long or shorter, and managed not to repeat any of them. This means that either the message is legitimate but not written in English, or the message is spam and contains random characters designed to look like words. Either way, the probability of being filtered is much higher.
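The counts quoted above are easy to reproduce. The following is a rough sketch of the kind of word statistics a filter could collect from a subject line; the function name, the three-character threshold, and the returned field names are assumptions for illustration, and a real filter would compare these figures against a model of English rather than inspect them directly.

```python
import string
from collections import Counter


def subject_word_stats(subject, short_len=3):
    """Count total words, short words, and distinct words that repeat."""
    words = [w.strip(string.punctuation).lower() for w in subject.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    counts = Counter(words)
    return {
        "total": len(words),
        "short": sum(1 for w in words if len(w) <= short_len),
        "repeated": sum(1 for c in counts.values() if c > 1),
    }


if __name__ == "__main__":
    gibberish = ("Jioea oifje ifje qo yd yhue uhfo uihje ojq uehf "
                 "pie ie ha e oge os eb")
    print(subject_word_stats(gibberish))
    print(subject_word_stats("There do weanels do."))
```

For the gibberish subject the sketch reports 17 words, 10 of them three characters or shorter, and no repeats, which matches the figures given in the text.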
This has always
Using encoding as an evasion method involves encoding e-mail with an unusual encoding method. Many spam filters fail to read the message’s true contents because they have no support for that encoding type. Seeing only the encoded data, spam filters often make mistakes, misjudging the e-mail and its contents. Obvious spam is often mistaken for legitimate e-mail. A text body containing the phrase “Buy Viagra Now” becomes QnV5IFZpYWdyYSBOb3c= when encoded with Base64. Although spam filters will not understand the e-mail contents, many e-mail
The most popular encoding methods are:
The following is an example of a Base64-encoded message (often called a Multipurpose Internet Mail Extensions [MIME]-encoded message). The majority of e-mail clients will be able to decode the Base64 body and show the real text that was hidden from the spam filter.
Reply-To: <firstname.lastname@example.org> Message-ID: <031c068291029384125b2$5da01aa2@eiquhe> From: <email@example.com> To: Don't try to hide Subject: Don't wait, hide today Date: Tue, 24 Sep 2003 11:08:41 +0600 MiME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0013_83C84A5C.B4868D82" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Internet Mail Service (5.5.26xx.xx) Importance: Normal ------=_NextPart_000_00A3_83C8AD5C.B486A182 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: base64 PGh0bWw+DQo8Ym9keT4NCjxmb250IGNvbG9yPSJmZmZmZmYiPnNreTwvZm9u dD4NCjxwPllvdXIgaG9tZSByZWZpbmFuY2UgbG9hbiBpcyBhcHByb3ZlZCE8 YnI+PC9wPjxicj4NCjxwPlRvIGdldCB5b3VyIGFwcHJvdmVkIGFtb3VudCA8 YSBocmVmPSJodHRwOi8vd3d3LjJnZXRmcmVlcXVvdGVzLmNvbS8iPmdvDQpo ZXJlPC9hPi48L3A+DQo8YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+PGJy Pjxicj48YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+ DQo8cD5UbyBiZSBleGNsdWRlZCBmcm9tIGZ1cnRoZXIgbm90aWNlcyA8YSBo cmVmPSJodHRwOi8vd3d3LjJnZXRmcmVlcXVvdGVzLmNvbS9yZW1vdmUuaHRt bCI+Z28NCmhlcmU8L2E+LjwvcD4NCjxmb250IGNvbG9yPSJmZmZmZmYiPnNr eTwvZm9udD4NCjwvYm9keT4NCjxmb250IGNvbG9yPSJmZmZmZmYiPjFnYXRl DQo8L2h0bWw+DQo4MzM0Z1RpbzgtbDk=
The bolded line acts as an identifier to notify any e-mail clients that the
There is an obvious flaw in using this method. If each e-mail has to identify that it has encoded the body in Base64, you are telling the spam filter that you are trying to evade it by encoding the data. Since hardly
The following is an example of implementing Base64 encoding by not specifying that the subject is in fact Base64 encoded. This message will only be readable by an e-mail client that can actively identify and decode it, such as Outlook, and will not be readable from many Web-based e-mail clients.
Subject: =?iso-8859-1?B?SGV5LCBsZXQgbWUga25vdyB3aGF0J3MgZ29pbmcgb24gaGVyZS4u?=
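A filter that decodes before it scans is not fooled by either form of encoding. As a small sketch of that step, the code below uses Python's standard base64 and email.header modules to recover the text from the two encoded strings shown in this section; the wrapper function names are assumptions for illustration.

```python
import base64
from email.header import decode_header, make_header


def decode_base64_text(encoded, charset="iso-8859-1"):
    """Decode a Base64 body fragment into readable text."""
    return base64.b64decode(encoded).decode(charset)


def decode_subject(raw_subject):
    """Decode a MIME encoded-word subject (the =?charset?B?...?= form)."""
    return str(make_header(decode_header(raw_subject)))


if __name__ == "__main__":
    print(decode_base64_text("QnV5IFZpYWdyYSBOb3c="))
    print(decode_subject(
        "=?iso-8859-1?B?SGV5LCBsZXQgbWUga25vdyB3aGF0J3MgZ29pbmcgb24gaGVyZS4u?="
    ))
```

The first call prints "Buy Viagra Now" and the second prints the plain subject line, after which the usual content rules can be applied to the decoded text.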
There is one exception to the rule. What happens when you use message encryption?
Take the following e-mail body as an example.
-----BEGIN PGP MESSAGE----- Version: PGP 8.1 qANQR1DBwU4DkfwNh5oP7QAQBEFADFkE9jXhEU7b3u0Mx67REBop4qp9yYQUP2RNZ bQsOfKKH73J6ndLM8hlbi/I59rDfzKQ9kIDYjaOJxDHdu8FieIQ6EPJ+AA1mngjk…
This is slightly different than using a PGP signature, since the only way the client can read this data is by having a copy of PGP installed and the public key for the message added to their key ring. There is no plaintext data above the signature; the entire message is encrypted.
Encrypted spam is a new idea. Hidden inside PGP, spam could have a decent chance of bypassing a spam filter using previously set up rules to ignore encrypted data. Currently, I haven’t seen any spam that has the entire message body encrypted with PGP, but why not? If you had a
The downside is that the key would be bound to your e-mail address, so you would need to have an e-mail address at a server that allows spam.
There are other encoding methods used in spam, usually when trying to hide data inside plaintext e-mails. Spam filters commonly catch links in e-mails, so you should try to obfuscate any addresses you want someone to click on, as much as possible. This also
However, if I used an encoded string such as:
<a href="http://www.&#103;oogle.com"> </a>
and encoded my spam in HTML format, you would have no idea what that host is without clicking on it. The host is actually www.google.com , but I have replaced each character with the decimal notation of its character code rather than the plain American Standard Code for Information Interchange (ASCII) character. Your browser will understand and decode the data, but it will probably look like gibberish to you.
Sadly, spam filters have caught onto this encoding trick, and trying to use any strange decimal encoding does not work against modern filters. Filters can detect that you are trying to hide or obfuscate a link by using decimal characters. Why would you do this unless you didn’t want the spam filter to see it? This is a classic example of how rule-based spam filters are designed. They look for anyone trying to hide information, not
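The rule described above is simple to express. The following sketch shows one way a filter might both decode a link and note that it had been disguised; it relies on html.unescape from the Python standard library, while the regular expressions and the function name are assumptions made for illustration and are far less thorough than a real HTML parser.

```python
import html
import re

# Matches decimal (&#103;) or hexadecimal (&#x67;) character references.
NUMERIC_REF = re.compile(r"&#(?:\d+|x[0-9a-fA-F]+);?")
# Naive href extractor; quoted or unquoted attribute values.
HREF = re.compile(r"href\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)


def inspect_links(html_body):
    """Return (decoded_url, was_obfuscated) for each href in the body."""
    findings = []
    for raw in HREF.findall(html_body):
        decoded = html.unescape(raw)
        findings.append((decoded, bool(NUMERIC_REF.search(raw))))
    return findings


if __name__ == "__main__":
    print(inspect_links('<a href="http://www.&#103;oogle.com">x</a>'))
```

The second value in each pair is the interesting one for a rule-based filter: the decoded host may be harmless, but the attempt to hide it is what raises the score.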
Encoding methods can be used to evade, but only to a certain point. Although you may have a high success rate with old filters, you will have much more trouble with up-to-date rule-based filters and well-learned Bayesian rules. However, because the majority of hosts on the Internet are running out-of-date spam protection, you may receive up to a 50 percent success rate using an encoding method, depending on the different
One of the problems with spam evasion is that successful evasion depends on the filter you are trying to evade. It can be hard to evade multiple filters with one technique. Although there is a significant amount of
E-mail headers are one of the most exploited attributes in spam. The majority of spam that is now sent contains some type of fake header. These headers usually falsify which hosts the e-mail was relayed through by adding headers that
Before adding fake headers into your spam you should know that it is illegal to do so, as defined by the following sections of the Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM):
“(A) Header information that is technically accurate but includes an originating electronic e-mail address, domain name, or Internet Protocol address the access to which for purposes of initiating the message was obtained by means of false or fraudulent pretenses or representations shall be
“(C) Header information shall be considered materially misleading if it fails to identify accurately a protected computer used to initiate the message because the person initiating the message knowingly uses another protected computer to relay or retransmit the message for purposes of
Following is an example of injected message headers that attempt to fool the recipient of the message into believing the e-mail originated from Microsoft.com:
Received: from ppp-123.companyx.com (ppp-123.companyx.com [198.113.xx.x]) by mail.microsoft.com (8.8.7/8.8.7) with ESMTP id XAA1923 for <firstname.lastname@example.org>; Sat, 10 Sep 1998 11:16:34 -0400 (EDT) Message-Id: <199709201416.XAA24492@mail.microsoft.com> Received: from mail.microsoft.com (222.214.171.124) by mail.microsoft.com (MX E5.0) with ESMTP; Sat, 20 Sep 1997 07:20:30 -1300 EST
The last line of this entry is where the injected headers begin. The spammer who sent this e-mail was not very clever, and my rule-based spam filter easily caught this message. To start with, the headers suggest that the e-mail passed through
. Also, it shows that Microsoft is running sendmail 8.8.7 and that the local time at Microsoft is still the year 1998. The IP address of
cannot be 2126.96.36.199, because this is an invalid IP address. (Note that 295. is beyond the scope of a legitimate
This e-mail really originated from the host at the beginning of the e-mail headers: ppp-123.companyx.com . This host is either part of a botnet or the user is running an insecure proxy server. The spammer is trying to use ppp-123.companyx.com to send e-mail; however, for some reason, they decided to be stealthy in how they do this, and have falsely claimed that the e-mail was relayed through microsoft.com , when it obviously wasn’t.
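Checks like the invalid IP address test are cheap for a filter to run. As a minimal sketch, the code below pulls dotted-quad candidates out of a Received header and validates them with Python's standard ipaddress module; the regular expression, the function name, and the sample header values are assumptions for illustration only.

```python
import ipaddress
import re

# Dotted-quad candidates wrapped in brackets or parentheses.
ADDRESS_CANDIDATE = re.compile(r"[\[(](\d+(?:\.\d+){3})[\])]")


def invalid_relay_addresses(received_header):
    """Return the relay addresses in a header that are not valid IPv4."""
    invalid = []
    for candidate in ADDRESS_CANDIDATE.findall(received_header):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Raised for out-of-range octets such as 295.
            invalid.append(candidate)
    return invalid


if __name__ == "__main__":
    forged = "Received: from mail.example.com (295.10.20.30) by mx.example.net"
    print(invalid_relay_addresses(forged))
```

A header that names a relay with an impossible address cannot have been written by a real mail server, so a single hit is usually enough to treat the whole Received chain as forged.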
Injecting different header fields can have very different results. One method that was popular in early 2000 to 2002 was injecting a virus scanner such as “Scanned-by xxxxxx anti-virus” into the header. This header often bypassed the need for the message to be scanned again. However, this method is now overused and has very little effect on modern filtering methods.
The most significant downside to adding false headers into e-mail messages is that you need to be careful about what you actually add. You need to have some idea of what should be there. Including an invalid IP address or an incorrect e-mail daemon will confuse no one, and will only draw a filter’s attention to your message. A suspicious e-mail header can greatly increase an e-mail’s chance of being filtered, because many spam filters now look for dubious or suspicious information being added into spam headers. Keep it legitimate. If you like adding headers, copy and paste a legitimate e-mail header. Be original but do not try to be creative. By nature, spam headers are not creative.
The following is a better example of a header injection, but it still lacks quality and is highly detectable. This spammer, although more effective than the previous spammer, is still not injecting efficient or correct headers,
Return-Path: email@example.com Received: from pcp04613952pcs.gambrl01.md.comcast.net (pcp04613952pcs.gambrl01.md.comcast.net [68.49.xxx.xxx]) by mta05-svc.ntlworld.com (InterMail vM.4.01.03.37 201-229-121-137-20020806) with SMTP id <firstname.lastname@example.org>; Sun, 26 May 2004 16:52:21 +0100 Received: from pxlvx.cvp5tr.net ([195.216.xx.xxx]) by pcp04613952pcs.gambrl01.md.comcast.net with ESMTP id 9821319; Fri, 26 May 2004 22:48:11 -0100 Message-ID: email@example.com From: "Ralph Pegash" <firstname.lastname@example.org>
Message ID email@example.com is obviously fake; h00.h00 should be the name of the server currently processing the e-mail (in this case pxlvx.cvp5tr.net ). You can see that IP address 195.216.xx.xxx is listed as the address for pxlvx.cvp5tr.net ; however, an nslookup on that IP address shows that 195.216.xx.xxx really resolves to support.kamino.co.uk , and pxlvx.cvp5tr.net is not even a valid DNS entry. This spammer’s e-mail has been caught because they did not think about what they were injecting!
The following is what he should have done.
Find a real host to spoof. Give the real IP address of that host and its real name. Do not be lazy; it only takes a few seconds to find a real and currently active host.
Find out what e-mail software your spoofed host is running and then issue a correct Message ID for that software. Do a Google search to find out what the correct Message ID should look like. Make sure you include the correct server name after the “@” sign in the Message ID.
Make the Message ID unique. Message ID’s often contain the date, time, second, and millisecond. Follow this trend. For example: firstname.lastname@example.org starts with the year (2004), then the month (05), then the day (26), and then the second, the millisecond, and a random number or two. Given this layout, you can easily predict what valid Message ID’s can look like coming from this e-mail daemon with the following expression:
where each question mark can be a sequential decimal number. This produces 999,999
A successful header spoofing attack can result in a large portion of complaint e-mails being lost or redirected to a wrong party. Spam is a world where complaints can mean the difference between being paid and not being paid. It is worth investing time into a well thought-out header injection attack.
RBLs can be hard for spammers to bypass,
One such incident
A month later, another variant of Mimail dubbed
Good afternoon, We are going to bill your credit card for amount of $22.95 on a weekly basis. Free pack of child porn CDs is already on the way to your billing address. If you want to cancel membership and your CD pack
pleaseemail order and credit card details to email@example.com Are you ready for all types of underage porn? We have the best selection for every taste! Just click the secret links below and have fun: www.authorizenet.com disney.go.com www.spamcop.net www.carderplanet.net www.cardcops.com www.register.com www.spews.org www.spamhaus.org Nude boys under 16! Nude girlsunder 16! Incest, a daddy& a daughter! We have everything you have ever dreamed for!
It is very
A constant war
Alternatively, spammers can attack the RBL so much that clients cannot query it when they need to question a host’s validity. Once the RBL is unreachable, spammers send out the spam to the users from a handful of proxy servers. Spammers know that without the RBL the mail server has no way of knowing that the servers sending them e-mail are known spam hosts; therefore the e-mail will have a much higher delivery rate.
Such attacks require a lot of work, and there is a certain level of risk involved with attacking an RBL. The scale of e-mails sent out is very significant; 100 million spam e-mails would be the least I would send out for an operation like this. Many spammers do not have network security experience (like
The ability to use HTML inside an e-mail has opened up a new world of evasion techniques. HTML is a very functional rendering language, and there is much scope when you use HTML as a rendering engine to evade a spam filter.
HTML is becoming known as a method spammers use to hide messages. Because of this, many spam filters are becoming suspicious of HTML messages, and more precautionary measures are being taken when analyzing them.
HTML’s success is its ability to contain data that is only visible at an un-rendered level, keeping it out of the rendered page. This means the recipient sees one message while the spam filter is shown another. This is due to spam filters not having the intelligence to render HTML.
One method of achieving HTML obfuscation is to insert junk or invalid HTML tags into the message. Some spam filters are affected by this and so instead of seeing the whole word “Viagra” they see something entirely different, as shown in the following example:
<html> <b> <aef>F</aef>e<ira>e</ira>l<spa> like</spa> <aea>b</aea>uy<ea>in</ea>g <ie>V</ie>i<xtag>a</xtag>g<ali>r</ali>a? <a>C</a>l<b>i</b>c<aef>k</aef> <a href=http://www.drugsaregood.com> h<b>e</b>r<ac>e</ac> </a> and make it so! </b> </html>
As can be seen, this is highly confusing; the
Unless a filter is actively stripping out all HTML tags before parsing the e-mail, any checks to see if “Viagra” is present will fail. However, the e-mail client used to view this message is much
Figure 7.3: The Rendered Page: Clean and Readable
This is probably the most common method of obfuscation used today. If you change each junk tag name for each spam, you can beat most simple hash-based filters that are thrown off by the ever-changing HTML markup. More recent spam filters strip out all of the HTML tags and then parse the e-mail for spam content. Many people still run out-of-date filters, so these HTML obfuscation methods are still actively used with good success.
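The tag-stripping step that newer filters perform can be sketched in a few lines. The example below uses HTMLParser from the Python standard library to discard every tag, known or invented, and keep only the text a reader would see; the class and function names are assumptions for illustration, and the sample input is the junk-tag message shown earlier in this section.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data and ignore every tag, real or invented."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_body):
    """Return the text of an HTML body with tags removed."""
    parser = TextCollector()
    parser.feed(html_body)
    parser.close()
    # Join first, then collapse whitespace, so split words are rejoined.
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    sample = ("<html> <b> <aef>F</aef>e<ira>e</ira>l<spa> like</spa> "
              "<aea>b</aea>uy<ea>in</ea>g <ie>V</ie>i<xtag>a</xtag>g"
              "<ali>r</ali>a? <a>C</a>l<b>i</b>c<aef>k</aef> "
              "<a href=http://www.drugsaregood.com> h<b>e</b>r<ac>e</ac> "
              "</a> and make it so! </b> </html>")
    print(strip_tags(sample))
```

Once the tags are gone the keyword checks see the same words the reader does, which is why this method only works against filters that skip the pre-parsing step.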
Other methods include adding more visible words or letters into the HTML body. These letters are rendered but are not visible to the reader because of their size or font. Using a 1-pixel font
The following example uses 1-pixel high characters to obfuscate the main message:
<html> <font size="1" color="#ffffff">a</font>B<font size="1" color="#ffffff">a</font>u<font size="1" color="#ffffff">x</font>y <font size="1" color="#ffffff">-</font> V<font size="1" color="#ffffff">a</font>i<font size="1" color="#ffffff">u</font>a<font size="1" color="#ffffff">a</font>g<font size="1" color="#ffffff">a</font>r<font size="1" color="#ffffff">i</font>a<font size="1" color="#ffffff">a</font> <font size="1" color="#ffffff"> </font><a href=http://www.drugsaregood.com>H<font size="1" color="#ffffff">a</font>e<font size="1" color="#ffffff">p</font>r<font size="1"
Although junk HTML tags in a browser can easily be filtered by an HTML pre-filter, how do you filter against something that is visible and rendered to the user, but not visible to the naked eye?
The phrase “Buy Viagra Here” (Figure 7.4) becomes “aBauay - Vaiaaagaraaa Haearae” once an HTML filter
Figure 7.4: HTML Character Obfuscation – Rendered Version
The long words, single-pixel fonts, and white-on-white text will raise some suspicion, but as long as the rest of the e-mail is legitimate looking it will pass most rule-based filters. This is another highly popular method of obfuscating text. The message is still readable to anyone who has an HTML-enabled e-mail client, but is unreadable to any spam filters that do not actively pre-parse messages.
An easy way to bypass filters that look for white-on-white text is to use #FFFFFE instead of #FFFFFF for the text color.
Many older filters look for the string font color="ffffff". These filters can be bypassed by using a different string, and the text will still be invisible to the user on screen.
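A filter that compares colors numerically, rather than matching one literal string, is not affected by the #FFFFFE substitution. The sketch below shows the idea under some loud assumptions: the message background is taken to be white, "near white" is arbitrarily defined as every color channel at 0xF0 or above, only font tags are examined, and all names are hypothetical.

```python
from html.parser import HTMLParser


def is_near_white(color):
    """True if a six-digit hex color is close to white (assumed threshold)."""
    value = color.strip().lstrip("#")
    if len(value) != 6:
        return False
    try:
        channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]
    except ValueError:
        return False
    return all(channel >= 0xF0 for channel in channels)


class VisibleTextExtractor(HTMLParser):
    """Keep only text that is not inside a tiny or near-white font tag."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one flag per currently open <font> tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            attributes = dict(attrs)
            tiny = attributes.get("size") == "1"
            pale = is_near_white(attributes.get("color") or "")
            self.hidden_stack.append(tiny or pale)

    def handle_endtag(self, tag):
        if tag == "font" and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        if not any(self.hidden_stack):
            self.chunks.append(data)


def visible_text(html_body):
    """Return the text a reader would actually be able to see."""
    parser = VisibleTextExtractor()
    parser.feed(html_body)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    sample = ('<font size="1" color="#ffffff">a</font>B'
              '<font size="1" color="#fffffe">a</font>u'
              '<font size="1" color="#ffffff">x</font>y')
    print(visible_text(sample))
```

Dropping the hidden characters leaves the word the reader sees, so the usual keyword and Bayesian checks can run on the visible text, and the number of hidden fragments can be scored separately.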
Also, never try to use a negative pixel size for the text. Negative size fonts incur a much higher spam score than using very small text.
When using HTML to inject characters, do not use obvious padding characters such as “.,-()^~`” etc.; use real letters. More suspicion would be raised if you wrote .B.u.y . V.i.a.g.r.a. H.e.r.e, as the filter would reason, what English sentence has thirteen full stops in it? Filters are catching onto this, so it is best to use
Inside the program you use to send spam, define a variable to be one of the letters: A, E, I, O, U, and then randomly insert each one with a 1-pixel size HTML tag between each letter of the word you’re trying to obfuscate. If you want to go
“I” before “E,” except after “C” or except when sounding like “A” such as in neighbor and weigh.
This may help with Bayesian filters if it is able to match your new words to previously known words. Using a semi-legitimate word structure for all of your injected words can help greatly.
HTML also offers the use of images, whether it’s using it as a method of verifying an e-mail address’s validity, or a method of displaying dubious words. Images can play a vital part in HTML spam.
A spam filter is unable to perform optical character recognition (OCR) on the image you are linking to. Spam filters have no idea what the image is really saying; it could be a logo for a company or a sign saying “Buy Viagra Here.” There really is no way to tell, which can be a big problem for spam filters. If you have the money, invest in a bulletproof Web host, someone who will allow you to host pictures with them. Next, write a spam e-mail and include a link to a modestly
No one sends e-mail where the entire body is a jpg, because it would be caught by the majority of spam filters. You need to be creative and stealthy. A good rule is that for every image, add 3,000 bytes of random data (3,000 random characters). This will produce a good result with spam filters that are checking for weighted percentages of pictorial data within a message.
Random data is essential in spam for a few reasons. First, it offers a method in which the spam can be unique from every other spam message sent. Well-placed random data can ensure that the message is always unique, even to a spam filter that is trying to filter obvious attempts to be unique. If enough thought is used when making spam unique, it is possible to evade many hash-based spam filters. Again, the trick is to look normal; do not draw too much attention to obvious strings of random data.
If you decide to always have a unique message subject in your spam, do not be blatant about how you do it. Having a different random number at the end of the subject is not the best way. The majority of legitimate e-mails do not contain a single number in the subject. Stick to this rule and instead of numbers, use random words or random placements of words. If random data is to be purely random, it must be placed at random intervals throughout the message. For example, when a string of five numbers is placed at the end of a message subject, you can make the subject contain ever-changing random data but that data is always located in a very predictable location at the end of the subject. Filters have caught onto this and now filter any spam that contains a string of random numbers at the end of the subject. Mix it up a little. The use of phrases, letters, and random words from the e-mail body can greatly help your chances when trying to pass random data off as legitimate text.
Avoid using large amounts of white space as a method of hiding random data in spam. By including 10 or 20 carriage returns in the e-mail before your random data, spammers are able to push the gibberish sentences out of sight of the reader. This is an easy trick and is one rule that filters use to catch a lot of spam. Remember to keep the e-mail looking legitimate. If you typed half a page of text, why hide it out of sight below 10
Statistics show that users actively click on anything you give them. Even if there are a few lines of random data on either side, the majority of the time if the user was going to click on it before they saw the lines of random data, they will still click on it.
Readers are very used to mentally deciphering spam e-mails as they read them. Tricks such as V1^gr@ have taught readers to be very astute when reading spam. Including a few lines of text that make no literal sense in the e-mail will not deter anyone who was interested in the first place from buying your product.
Another use for random data is to bulk up the size of your spam message. If you were to analyze a few hundred spam messages and then compared each to a real message, you would probably see that on average spam messages are shorter than legitimate messages, with usually only a few hundred bytes to the message. Spam usually contains a quick catch phrase and a link to the product. How many legitimate e-mails do you receive that are two lines long with an HTTP link in the body? Not many I bet, and this has become a method of filtering spam; catch the messages that are short and often HTML-encoded with
There are many legitimate reasons you might send someone a message in HTML with embedded hyperlinks, but not many of those messages are short in length. E-mails such as HTML newsletters or automated e-mail
This is where random data becomes useful; with most mailing software you can add random words, letters, or characters to an e-mail. Simply have a text paragraph or two of random phrases, include random lines from a text file of quotes, and include a joke or two. Spammers don’t usually have much to say to anyone; the majority of the time a message involves “Hey, buy my product, click here.”
Make sure there is enough text in the body so that any spam filter will think long and hard about the message and its validity. Spam filters are written to avoid dropping legitimate e-mails; make them think your e-mail is just that. Sure, the message comes encoded in HTML and contains a hyperlink to some questionable .com site, but it also contains a large number of legitimate English words.
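To see why message size matters, it helps to look at the heuristic from the filter's point of view. The sketch below is an illustration only and is not how any particular product works: the 500-byte cut-off, the regular expressions, and the function names are all assumptions, loosely based on the "few hundred bytes" figure given earlier.

```python
import re

# Assumed cut-off; the text only says spam is often "a few hundred bytes".
SHORT_BODY_BYTES = 500

LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(body: str) -> str:
    """Strip HTML tags and bare URLs, then collapse whitespace."""
    no_tags = TAG_RE.sub(" ", body)
    no_links = LINK_RE.sub(" ", no_tags)
    return " ".join(no_links.split())


def looks_like_short_link_spam(body: str, is_html: bool,
                               max_bytes: int = SHORT_BODY_BYTES) -> bool:
    """Flag HTML messages that carry a link but very little readable text."""
    has_link = bool(LINK_RE.search(body))
    text_size = len(visible_text(body).encode("utf-8"))
    return is_html and has_link and text_size < max_bytes
```

A check like this is cheap to run, which is one reason short HTML messages with a single link attract attention, while a long newsletter with the same link passes untouched.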
As mentioned earlier in this chapter, be sure to use correct English words in spam and do not try to make up too many new words from random characters. Also, repeat several words in the body multiple times, preferably nouns, the kind of words that would be common in a passage of text. If you can, also include punctuation marks and
The following is an example template of a message that contains eight lines of random data. The mailing program adds this data in when the message is queued for sending, but the random data is positioned in a way that it adds to the validity of the message without drawing too much attention to itself.
From: firstname.lastname@example.org
Message Subject: %RND_WORD %RND_WORD.

Yo %FIRST_NAME,
I %RND_WORD %RND_WORD and %RND_WORD, %FIRST_NAME and I %RND_WORD %RND_WORD.
%RND_WORD the %RND_WORD %RND_WORD %RND_WORD it.
%RND_WORD %RND_WORD %RND_WORD a %RND_WORD %RND_WORD the %RND_WORD %RND_WORD %RND_WORD.
%RND_WORD the %RND_WORD %RND_WORD %RND_WORD it.
%RND_WORD %RND_WORD

You should buy my Viagra and Xennax
Low prices, will keep your wife happy!
http://www.drugsaregood.com

%RND_WORD the %RND_WORD %RND_WORD %RND_WORD it.
%RND_WORD %RND_WORD %RND_WORD a %RND_WORD %RND_WORD the %RND_WORD %RND_WORD %RND_WORD.
%RND_WORD the %RND_WORD %RND_WORD %RND_WORD it.
%RND_WORD %RND_WORD
This plaintext message was sent to my mail server, which was running SpamAssassin with the latest rule set available. The test is to see whether I can use random data to deliver spam, and what impact it has on the score of my message.
The message is sent from a host with valid reverse DNS, PTR, and MX records set up. This server is known as
. Currently, my spambox’s IP is not listed in any RBL, and it has never sent my mail server e-mail. For demonstration purposes, I will use my spambox to
Received: from email@example.com by mails by uid 89 with qmail scanner-1.22st (clamdscan: 0.74. spamassassin: 2.63. perlscan: 1.22st. Clear:RC:0(188.8.131.52):SA:0(-4.3/5.0):. Processed in 8.234695 secs); 25 Aug 2004 03:41:22 -0000 X-Spam-Status: No, hits=-4.3 required=5.0
This message scored 4.3; SpamAssassin needs a score of 5 by default to declare a message spam. It came very close, but 4.3 is still under 5. For another experiment, I sent the following:
Received: from firstname.lastname@example.org by mails by uid 89 with qmail scanner-1.22st (clamdscan: 0.74. spamassassin: 2.63. perlscan: 1.22st. Clear:RC:0(184.108.40.206):SA:0(-4.8/5.0):. Processed in 2.344291 secs); 25 Aug 2004 03:44:13 -0000 X-Spam-Status: No, hits=-4.8 required=5.0
This e-mail was still delivered even without random data in the body. However, its score was higher than that of the previous message with random data, and it took only 2.3 seconds for the server to derive the score. If I had been sending this from a questionable host, something that was listed in an RBL or had a dubious DNS record, my message would have been
Random data helped evade the spam filter and would help against future filtering based on the message’s size or exact contents. Filters would quickly learn the Web site mentioned, or the catch phrase “Low prices will keep your wife happy!” Ideally, if I were a spammer, I would be using additional random data within or around any spam catch phrases, to keep all elements of the e-mail unique and to help protect it from smarter hash-based filters that may be able to detect my random data.
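When comparing test runs like the two above, it is less error-prone to pull the numbers out of the X-Spam-Status header than to read them by eye. The following is a rough sketch that assumes the "hits=... required=..." layout shown in the headers earlier; the function name and the returned field names are hypothetical, and other SpamAssassin versions may format the header differently.

```python
import re

# Matches the header layout shown above, e.g.
#   X-Spam-Status: No, hits=-4.3 required=5.0
STATUS_RE = re.compile(
    r"X-Spam-Status:\s*(?P<verdict>Yes|No),\s*"
    r"hits=(?P<hits>-?\d+(?:\.\d+)?)\s+"
    r"required=(?P<required>-?\d+(?:\.\d+)?)"
)


def parse_spam_status(header: str):
    """Return verdict, score, and threshold as a dict, or None if absent."""
    match = STATUS_RE.search(header)
    if match is None:
        return None
    return {
        "is_spam": match["verdict"] == "Yes",
        "hits": float(match["hits"]),
        "required": float(match["required"]),
    }
```

Feeding it each delivered header in turn gives a score and threshold that can be logged side by side, so the effect of a change to the message can be measured across many test messages rather than judged from a single pair.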