Spam filter evasion can be summed up in one sentence: from the subject line to the reply address, every element of spam must look like the day-to-day e-mails people receive. Spam cannot contain any suspicious content, because spam detection is highly intelligent and spam filters can catch and remove large quantities of the messages that spammers send. As can be seen from the different types of spam filters available, many different and unique techniques are used to identify a message’s validity and the legitimacy of its contents. This section focuses on defeating rule- and hash-based filters.
Defeating a spam filter can be hard, but it often just comes back to the golden rule of spam: numbers. If you send ten million spam e-mails and only six million messages are read, you can say that you expect an instant 40 percent loss with any mailings you send out. If you send twice as much e-mail as usual, in theory you come closer to breaking even. Flooding the world with spam can dilute the impact of your product, but if enough spam is sent, rewards will eventually be gained.
Spam is a numbers game, and the majority of hosts on the Internet are not running highly effective spam filters. Legacy mail servers that have been running for years are not using the latest state-of-the-art Bayesian filters. What small company or school has the money to employ a spam-catching expert to set up their mail server? Microsoft just started shipping a spam filtering plug-in for Exchange (Smart Screen) last year, and it uses only a single Bayesian filter. I estimate that 60 percent of the Internet’s mail servers that are running any spam filtration are using inefficient or highly outdated spam filters.
Let’s say you’re a spammer, and like real estate, location is very important to consider. If you want to evade a host-based spam filter you need to think about the best host to use, since the majority of network- and host-based filters carry more weight than content filters. For example, if a dial-up modem in Brazil is constantly sending you e-mail, and there is no reason for it to do so, it is easier to just filter anything coming from that host or network. Inspecting the message content is harder. There are so many different types of legitimate e-mails and ways of using the English language that trying to find illegitimate messages can be very hard. A filter has a greater chance of catching spam if it is rigorously looking for any suspicious hosts that are sending it e-mail.
The following conditions raise the suspicion of these types of filters:
The host is listed in an RBL and is a known open proxy
The sender has been sending large amounts of spam
The host sent a fake HELO
The host has no reverse DNS or MX records
The host’s reverse DNS record uses a different domain than the HELO
The hostname contains DSL, dial up, Point-to-Point Protocol (PPP), or Serial Line Internet Protocol (SLIP).
If any of these conditions are met, the chances are your message will not be delivered.
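Seen from the filter's side, the conditions above amount to a short checklist run against each connecting host. The following is a minimal sketch of such a checklist, not the logic of any particular product: the function names, the keyword pattern, and the naive "last two labels" domain comparison are all assumptions made for illustration, and the DNS and RBL lookups are assumed to have been done already by the caller.

```python
import re
from typing import List, Optional

# Hypothetical keyword pattern based on the hostname hints listed above.
# It is deliberately naive and would also match unrelated names.
DYNAMIC_HINTS = re.compile(r"(dsl|dial-?up|ppp|slip)", re.IGNORECASE)


def registered_domain(hostname: str) -> str:
    """Return the last two labels of a hostname (ignores TLDs such as co.uk)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def host_suspicion(helo: str, reverse_dns: Optional[str], listed_in_rbl: bool) -> List[str]:
    """Collect the reasons a host-based filter might distrust a sender."""
    reasons = []
    if listed_in_rbl:
        reasons.append("listed in RBL")
    if reverse_dns is None:
        reasons.append("no reverse DNS")
    else:
        if registered_domain(reverse_dns) != registered_domain(helo):
            reasons.append("HELO domain differs from reverse DNS")
        if DYNAMIC_HINTS.search(reverse_dns):
            reasons.append("hostname looks like a dial-up/DSL pool")
    return reasons
```

A real filter would weight each reason rather than treat them equally, but even this flat list shows why a host with matching, static-looking DNS records passes where a dial-up address does not.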
Notes from the Underground…
More About Proxy Servers
As mentioned in Chapter 3, using proxy servers to send spam can be highly useful and effective. However, when sending e-mail to a host that is actively checking so many elements of the proxy server, you must have a very legitimate-looking host or none of your spam will get through. When filtered for these criteria, a list of 3,000 to 4,000 proxies may only produce a list of five to ten legitimate-looking hosts. You must have huge numbers of available proxy servers to be able to find the few that are of good enough quality to evade host-based filters.
When you know what the filters are looking for, you should simply find a host that meets their criteria. A compromised mail server is an ideal host: it already sends e-mail, has valid forward and reverse DNS entries, and even looks like a mail server! However, the majority of the time a spammer is not that lucky, and most spammers end up using home DSL users or insecure servers at universities.
There are many ways of getting around a host-based filter. The simplest is to find a legitimate host or register a DNS entry and set up an MX or pointer record (PTR) for it.
Notes from the Underground…
Spammers chew through domain names very quickly; large spammers have thousands of names registered at any given time. Spammers promote from these domains until every filter knows them as a prolific spamming domain, at which time spammers discard the domain and register a new one. Each DNS name costs only a few dollars so registering 1,000 to 2,000 is not a big deal considering the potential returns you’ll earn.
For the most part, only the truly devoted or the corporate spammer will go to the trouble of setting up a host with valid DNS records and legitimate information. The majority of spammers will just play the numbers game again. Although 20 percent of hosts in the world may drop e-mail coming from a DSL modem in Brazil, 80 percent will accept it.
The amount of bad spam I see amazes me. When I see my Bayesian filter mark a message with a score of 40, I know this spammer is not going to have much success. The reason is always the same: spammers try to be crafty and falsify any credentials they send. It’s not hard to pick up on this; the majority of inexperienced spammers get very low delivery rates because their spam is incredibly obvious.
The best trick to evade a spam filter is to look legitimate. Spam filters search for anything that is not legitimate looking. Think about e-mail, its content, and how a legitimate e-mail should look. Compare the two in your mind and try to make your spam look legitimate, like you sent it from Outlook, with the body looking like a genuine e-mail. Start from the beginning and build the message based on how Outlook messages are built. Start by sending a valid Outlook MUA as seen here:
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.xxxx
This adds to the message’s validity. The message should look like a real message; every little detail will help it get past the filters. Sending spam without an X-Mailer flag shows that the e-mail came from a questionable source. Corresponding message IDs must be set to the correct value of the source you say sent the e-mail. If you’re sending e-mail through a proxy server but want people to think it is being sent from a qmail mail server (slightly more credible), you need to use the same format that a qmail server uses. Do not try to randomly make up your own message ID; filters look for this. Remember to keep it as realistic as possible, like the one shown here:
A message ID is a unique string assigned by the mail server where the message originated. The message ID is in the format of:
Each e-mail daemon has its own unique string. It is easy to spot a fake if you use an incorrect string or the wrong syntax for your message ID.
When sending your HELO command just before e-mail delivery, be highly creative when you say what host you are coming from. Do not identify yourself as hotmail.com, yahoo.com, or a home DSL user at chello.nl. Trust me, there is a good chance that your e-mail will be blocked at the HELO command. Use a host from a dictionary such as red.com, jack.com, or style.com. If you want a better chance of delivery, use your own e-mail host or relay as the HELO. Unfortunately, this can sometimes backfire when RBLs catch many messages being delivered from the same HELO. Issuing your own host name as the HELO will cause any checks that match your host against your HELO host to succeed.
The FROM, TO, CC, and BCC fields are also vital to a spam message. How many legitimate e-mails have you received that don’t have a sender reply address? The point of e-mail is to talk to each other, so it doesn’t make sense to send an e-mail and not give a reply address. The TO address should be the person receiving the e-mail. Other users should be included in the CC and BCC fields. Do not try to hide anything.
Keep the e-mail as personalized as possible and as legitimate looking as possible. Many rule sets now contain filters for any reply address that contains suspect information. Setting a legitimate reply address can mean the difference between e-mail being delivered or not. The phrase “offers” and strings of random numbers and letters in a reply address usually strongly point toward the message being spam. Who really has an account like email@example.com?
One of the best ways to prove that your e-mail is legitimate is by cryptographically signing it. Any e-mail message that has a Pretty Good Privacy (PGP) signature is usually treated as a legitimate message. It seems only logical to assume that a “real” person signed the message with PGP and that this is not spam.
However, there is nothing to stop a spammer from appending a fake signature block to the message:
------BEGIN PGP SIGNATURE------
version: pgpfreeware 6.5.2 for non-commercial use <http://www.pgp.com>
sp118fg4j8r7m3s9od5h2ixrqheafer3ysepsq1azdhzuvskfcntfpe9xs4fhqs
wacj49dk6u883sxo4kb9u6/jnjdxawasqnzxpetxk9b2doglc/60hwrpn+vujdu
xav65sop+px4knaqcciecamqj7ugiherempnbxwyatymjafkbkh1eulc2vrwdmd
cjdi57fh43ks9cm78h4t
------END PGP SIGNATURE------
In early 2003, a great deal of spam was sent using a legitimate-looking signature, and the majority of spam filters delivered the spam with no questions asked. After all, it had a legitimate PGP signature so a human being must have written it, right? It did not take long for the spam filters to catch up to the spammers, though, and before you knew it spam filters were actively dropping anything that contained a signature or PGP-encoded data for suspicion of it being spam.
Once again, the integrity of your e-mail, your communication, and your privacy rests in the hands of software developers who can stop you from reading your PGP e-mail.
The subject line is a highly analyzed field, and anything that looks slightly different is actively filtered. So, what do you say in the subject? You should not try to be sneaky; using a subject like RE: 98324 will get you nowhere. And don’t try to fake the fact that you have replied to the e-mail or that it is a forward of another e-mail; filters are quickly catching on to this. A subject’s validity can come down to a matching word found in the subject that also exists in the body, proving that the subject is not a string of random characters and that the body relates to the subject data.
Remember, keep it readable; do not use CAPS to write everything. Use real English words and do not overuse the language; try to repeat some words. For example:
Subject: my is much hookup is happening.
This does not read well to you and me, but to a spam filter it reads fine. Spam Assassin will not judge the subject as spam because it doesn’t contain any dubious text. The next example is slightly easier to detect with its use of CAPS, language, and random numbers:
Subject: FREE SAVE DOLLARS 891723
Although spammers like adding random numbers into subjects, too many have overused this method. Now there is a rule in filters that looks for a string of random numbers in the subject. Again I ask, how many people receive legitimate e-mail with a six-digit string of random numbers in the subject?
Random numbers are usually added into a subject field to defeat hash-based spam filters. The idea is that each message should have a unique subject (a different random number) that makes the hash of each message subject different. This lowers the probability of the message being flagged as spam.
The only problem is that spam filters quickly caught onto this, and now random numbers also equate to the message being spam.
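To see why filters had to adapt, it helps to look at the hashing step itself. The sketch below is an illustration under assumptions, not the algorithm of any specific filter: SHA-256, the "four or more digits" normalization rule, and the function names are all chosen here for the example. It shows that an exact hash changes when a single digit changes, that a filter can collapse digit runs before hashing so the variants match again, and that the digit run can itself be treated as a spam indicator.

```python
import hashlib
import re


def exact_hash(subject: str) -> str:
    """Hash the subject exactly as received."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


def normalized_hash(subject: str) -> str:
    """Hash the subject after collapsing long digit runs, case, and spacing."""
    cleaned = re.sub(r"\d{4,}", "#", subject.lower())
    cleaned = " ".join(cleaned.split())
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def has_random_number(subject: str) -> bool:
    """Rule-style check for a standalone run of six or more digits."""
    return re.search(r"\b\d{6,}\b", subject) is not None
```

With these definitions, "FREE SAVE DOLLARS 891723" and "FREE SAVE DOLLARS 104455" produce different exact hashes but the same normalized hash, and both trip the random-number rule.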
Instead of adding obvious random data, my preferred method is to substitute words with other words of the same length. Keep it looking legitimate and use only English words. I know of spammers who have large lists of random two-, four-, and seven-letter words that they use to compose a unique subject line for each e-mail. This keeps the subject unique enough to defeat hash-based filters, while not obviously trying to be unique.
A subject template that looks like “five-letter, two-letter, seven-letter, two-letter <full stop>” produces the message subjects seen in the following example. These subjects will not cause any problems with a rule-based spam filter, yet they are each different.
There do weanels do.
Juicy to ballium to.
Glitz as colling as.
Xerox to balming to.
These four subject lines look legitimate. Granted they do not make any sense, but if you look at the words and the length of each word, they match common English language structure. Spam filters will agree with this and have more faith in the e-mail’s legitimacy. Filters like to look for language discrepancies in subjects, to compare the subject to how a typical English sentence should look. It is easy to detect an invalid subject. Tricks such as using a question mark and exclamation point in the subject add to the message’s score. This increases its chance of being flagged spam, since both a question mark and an exclamation point are not used together in the same sentence in traditionally correct English.
The downside of not using overly identifiable words such as “Hey, buy my Viagra” in your subject line is that the reader has no clue what your spam is about. This is a major tradeoff when using any form of pseudo-random data.
Although the message will probably get through more filters, the chances of someone opening the message because of something written in the subject line is low, and this can affect your sales. There are middle-of-the-road methods such as obfuscating the subject field and its text, although these are often caught by Bayesian filters because they look so different. Rule-based filters can be easily beaten with a few simple tricks (covered later in this chapter).
Language frequency statistics can be used as a measure of identifying if a real language was used in the message subject or if strings of random characters were thrown in. English is a highly predictable language and follows many set structures around sentence composure and word use. Many words are commonly used more than once in a sentence, and spam filters look for this.
As an example, the following subject line shows no language pattern:
Jioea oifje ifje qo yd yhue uhfo uihje ojq uehf pie ie ha e oge os eb
Although the subject contains 17 words, not one word was repeated, even though many of the words are three letters or shorter (short words are the most common in the English language). Words such as “I,” “at,” “is,” “be,” “and,” “are,” and “was” may be repeated two or three times in a long sentence. Yet this sentence used 10 words that are three characters long or shorter, and managed not to repeat any of them. This means that either the message is legitimate but not written in English, or the message is spam and contains random characters designed to look like words. Either way, the probability of being filtered is much higher.
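The counting described here is simple to reproduce. The following sketch shows one way a filter might gather these statistics; the function names and the threshold of five short words are assumptions for illustration, and a real filter would compare against measured language frequencies rather than a fixed cutoff.

```python
from collections import Counter
from typing import Dict


def subject_stats(subject: str) -> Dict[str, int]:
    """Count total words, short words (3 letters or fewer), and repeated words."""
    words = [w.strip(".,!?;:\"'").lower() for w in subject.split()]
    words = [w for w in words if w]
    counts = Counter(words)
    return {
        "words": len(words),
        "short_words": sum(1 for w in words if len(w) <= 3),
        "repeated_words": sum(1 for c in counts.values() if c > 1),
    }


def looks_unnatural(subject: str, min_short: int = 5) -> bool:
    """Flag subjects with many short words but no repetition at all."""
    stats = subject_stats(subject)
    return stats["short_words"] >= min_short and stats["repeated_words"] == 0
```

Run against the subject line above, `subject_stats` reports 17 words, 10 short words, and no repeated words, so `looks_unnatural` returns `True`.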
This has always confused me. Why do so many spammers attempt to make their own words? Picking random legitimate words from the dictionary is not hard and has a much better delivery rate against even the smartest filters.
Using encoding as an evasion method involves encoding e-mail with an unusual encoding method. Many spam filters fail to read the message’s true contents because they have no support for that encoding type. Seeing only the encoded data, spam filters often make mistakes, misjudging the e-mail and its contents. Obvious spam is often mistaken for legitimate e-mail. A text body containing the phrase “Buy Viagra Now” becomes QnV5IFZpYWdyYSBOb3c= when encoded with Base64. Although spam filters will not understand the e-mail contents, many e-mail clients contain support for alternative encoding formats, allowing the end user to read the e-mail perfectly. This method is very useful; it makes it possible to easily defeat a filter that is unable to understand different encoding methods.
The most popular encoding methods are:
The following is an example of a Base64-encoded message (often called a Multipurpose Internet Mail Extension [MIME]-encoded message). The majority of e-mail clients will be able to decode the Base64 body and show the real text that was hidden from the spam filter.
Reply-To: <firstname.lastname@example.org>
Message-ID: <031c068291029384125b2$5da01aa2@eiquhe>
From: <email@example.com>
To: Don't try to hide
Subject: Don't wait, hide today
Date: Tue, 24 Sep 2003 11:08:41 +0600
MiME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0013_83C84A5C.B4868D82"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Internet Mail Service (5.5.26xx.xx)
Importance: Normal

------=_NextPart_000_00A3_83C8AD5C.B486A182
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: base64

PGh0bWw+DQo8Ym9keT4NCjxmb250IGNvbG9yPSJmZmZmZmYiPnNreTwvZm9u
dD4NCjxwPllvdXIgaG9tZSByZWZpbmFuY2UgbG9hbiBpcyBhcHByb3ZlZCE8
YnI+PC9wPjxicj4NCjxwPlRvIGdldCB5b3VyIGFwcHJvdmVkIGFtb3VudCA8
YSBocmVmPSJodHRwOi8vd3d3LjJnZXRmcmVlcXVvdGVzLmNvbS8iPmdvDQpo
ZXJlPC9hPi48L3A+DQo8YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+PGJy
Pjxicj48YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+PGJyPjxicj48YnI+
DQo8cD5UbyBiZSBleGNsdWRlZCBmcm9tIGZ1cnRoZXIgbm90aWNlcyA8YSBo
cmVmPSJodHRwOi8vd3d3LjJnZXRmcmVlcXVvdGVzLmNvbS9yZW1vdmUuaHRt
bCI+Z28NCmhlcmU8L2E+LjwvcD4NCjxmb250IGNvbG9yPSJmZmZmZmYiPnNr
eTwvZm9udD4NCjwvYm9keT4NCjxmb250IGNvbG9yPSJmZmZmZmYiPjFnYXRl
DQo8L2h0bWw+DQo4MzM0Z1RpbzgtbDk=
The Content-Transfer-Encoding: base64 line (bolded in the original) acts as an identifier to notify any e-mail client that the text following it is encoded in Base64. The client then decodes the data and reveals the correct decoded body.
There is an obvious flaw in using this method. If each e-mail has to identify that it has encoded the body in Base64, you are telling the spam filter that you are trying to evade it by encoding the data. Since hardly anyone on the Internet sends a body of an e-mail encoded, modern day spam filters look for a message body that is encoded. You could drop all messages that contain a strange encoding type, simply because legitimate messages are not commonly encoded.
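The other response available to a filter is simply to decode the body before inspecting it, exactly as the mail client does. The sketch below illustrates that step under assumptions: the function names are hypothetical, the character set is fixed to ISO-8859-1 to match the example message, and only Base64 is handled.

```python
import base64


def decoded_body(body: str, transfer_encoding: str) -> str:
    """Return the text a mail client would display for this body part."""
    if transfer_encoding.strip().lower() == "base64":
        # b64decode ignores line breaks by default, so folded bodies work too.
        return base64.b64decode(body).decode("iso-8859-1")
    return body


def contains_phrase(body: str, transfer_encoding: str, phrase: str) -> bool:
    """Apply a content rule to the decoded text rather than the raw bytes."""
    return phrase.lower() in decoded_body(body, transfer_encoding).lower()
```

A filter that skips the decoding step sees only `QnV5IFZpYWdyYSBOb3c=`; one that performs it sees the same phrase the reader does.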
The following is an example of Base64 encoding applied to the subject line. No Content-Transfer-Encoding header announces it; the encoding is signaled only by the ?B? marker inside the subject itself. This message will only be readable by an e-mail client that can actively identify and decode it, such as Outlook, and will not be readable from many Web-based e-mail clients.
Subject: =?iso-8859-1?B?SGV5LCBsZXQgbWUga25vdyB3aGF0J3MgZ29pbmcgb24gaGVyZS4u?=
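An encoded subject of this form can be turned back into readable text with standard tooling. As a small sketch, Python's standard `email.header` module handles it; the wrapper name `readable_subject` is an assumption made for the example.

```python
from email.header import decode_header, make_header


def readable_subject(raw_subject: str) -> str:
    """Decode an encoded-word subject into the text a client would show."""
    return str(make_header(decode_header(raw_subject)))


encoded = "=?iso-8859-1?B?SGV5LCBsZXQgbWUga25vdyB3aGF0J3MgZ29pbmcgb24gaGVyZS4u?="
print(readable_subject(encoded))
```

Any filter that decodes the subject this way before applying its rules sees the plain sentence rather than the encoded string.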
There is one exception to the rule. What happens when you use message encryption?
Take the following e-mail body as an example.
-----BEGIN PGP MESSAGE-----
Version: PGP 8.1

qANQR1DBwU4DkfwNh5oP7QAQBEFADFkE9jXhEU7b3u0Mx67REBop4qp9yYQUP2RNZ
bQsOfKKH73J6ndLM8hlbi/I59rDfzKQ9kIDYjaOJxDHdu8FieIQ6EPJ+AA1mngjk…
This is slightly different than using a PGP signature, since the only way the client can read this data is by having a copy of PGP installed and the public key for the message added to their key ring. There is no plaintext data above the signature; the entire message is encrypted.
Encrypted spam is a new idea. Hidden inside PGP, spam could have a decent chance of bypassing a spam filter using previously set up rules to ignore encrypted data. Currently, I haven’t seen any spam that has the entire message body encrypted with PGP, but why not? If you had a publicly available key added to a central key server and the message encrypted with this key, any clients using that key server and who have PGP installed would be able to decrypt your message, which spam filters cannot.
The downside is that the key would be bound to your e-mail address, so you would need to have an e-mail address at a server that allows spam.
There are other encoding methods used in spam, usually when trying to hide data inside plaintext e-mails. Spam filters commonly catch links in e-mails, so you should try to obfuscate any addresses you want someone to click on as much as possible. This also reduces what people see from the spam, and may help to keep a spam Web site up, because an obvious “Click here www.myhost.com” invites millions of complaint e-mails about myhost.com.
However, if I used an encoded string such as:
<a href="http://www.&#103;oogle.com"> </a>
and encoded my spam in HTML format, you would have no idea what that host is without clicking on it. The host is actually www.google.com, but I have replaced characters with their decimal character references (the decimal ASCII code written in the form &#nnn;) instead of the literal characters. Your browser will understand and decode the data, but it will probably look like gibberish to you.
Sadly, spam filters have caught onto this encoding trick, and trying to use any strange decimal encoding does not work against modern filters. Filters can detect that you are trying to hide or obfuscate a link by using decimal characters. Why would you do this unless you didn’t want the spam filter to see it? This is a classic example of how rule-based spam filters are designed. They look for anyone trying to hide information, not necessarily the information itself, just someone attempting to obfuscate or trick the filter. Evading a smart spam filter is simple: do not look suspicious and play it cool, and the filter will let you pass.
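A rule of this kind is short to write. The sketch below is a hypothetical example of the idea rather than an actual rule from any filter: it pulls `href` values out of an HTML body with a simple regular expression (an assumption that ignores single-quoted and unquoted attributes), flags any that contain numeric character references, and records the decoded form.

```python
import html
import re
from typing import List, Tuple

HREF = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)
NUMERIC_REF = re.compile(r"&#(?:\d+|x[0-9a-fA-F]+);")


def obfuscated_links(markup: str) -> List[Tuple[str, str]]:
    """Return (raw, decoded) pairs for links hidden behind numeric references."""
    findings = []
    for raw in HREF.findall(markup):
        if NUMERIC_REF.search(raw):
            findings.append((raw, html.unescape(raw)))
    return findings
```

The rule does not care which host is being hidden. The presence of the numeric reference inside the link is the signal, which matches the point made above about filters looking for the attempt to hide.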
Encoding methods can be used to evade, but only to a certain point. Although you may have a high success rate with old filters, you will have much more trouble with up-to-date rule-based filters and well-trained Bayesian filters. However, because the majority of hosts on the Internet are running out-of-date spam protection, you may see up to a 50 percent success rate using an encoding method, depending on the countries and hosts to which you send e-mail. Chances are the person who set up the mail server has since left the company, and no proper updates have been implemented since their departure. Of course, this leads to the current users on the server receiving more spam.
One of the problems with spam evasion is that successful evasion depends on the filter you are trying to evade. It can be hard to evade multiple filters with one technique. Although there is a significant number of badly set up mail servers, there are also many that are set up very efficiently.
E-mail headers are one of the most exploited attributes in spam. The majority of spam that is now sent contains some type of fake header. These headers usually falsify which hosts the e-mail was relayed through by adding headers that roughly say, “Server X relayed this e-mail at 5pm EST.” The goal of injecting false headers is to confuse the reader as much as to try to evade a spam filter. If you are able to confuse the user to the point that they are unable to figure out where the e-mail originated from, they will be unable to complain to anyone. Alternatively, the reader will complain to the wrong e-mail host, frustrating them more and giving you the time to get away.
Before adding fake headers into your spam you should know that it is illegal to do so, as defined by the following sections of the Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM):
“(A) Header information that is technically accurate but includes an originating electronic e-mail address, domain name, or Internet Protocol address the access to which for purposes of initiating the message was obtained by means of false or fraudulent pretenses or representations shall be considered materially misleading;”
“(C) Header information shall be considered materially misleading if it fails to identify accurately a protected computer used to initiate the message because the person initiating the message knowingly uses another protected computer to relay or retransmit the message for purposes of disguising its origin.”
Following is an example of injected message headers that attempt to fool the recipient of the message into believing the e-mail originated from Microsoft.com:
Received: from ppp-123.companyx.com (ppp-123.companyx.com [198.113.xx.x])
    by mail.microsoft.com (8.8.7/8.8.7) with ESMTP id XAA1923
    for <firstname.lastname@example.org>; Sat, 10 Sep 1998 11:16:34 -0400 (EDT)
Message-Id: <199709201416.XAA24492@mail.microsoft.com>
Received: from mail.microsoft.com (2188.8.131.52)
    by mail.microsoft.com (MX E5.0) with ESMTP; Sat, 20 Sep 1997 07:20:30 -1300 EST
The last line of this entry is where the injected headers begin. The spammer who sent this e-mail was not very clever, and my rule-based spam filter easily caught this message. To start with, the headers suggest that the e-mail passed through mail.microsoft.com. Also, it shows that Microsoft is running sendmail 8.8.7 and that the local time at Microsoft is still the year 1998. The IP address of mail.microsoft.com cannot be 2184.108.40.206, because this is an invalid IP address. (Note that 295. is beyond the scope of a legitimate dotted decimal address.) Apparently, mail.microsoft.com relayed this e-mail through itself, which also points toward the e-mail headers being invalid.
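Two of the checks that caught this message are mechanical enough to sketch in code. The example below is an assumption-laden illustration, not the author's filter: the function names and regular expressions are hypothetical, the header strings in the usage are made-up stand-ins, and only IPv4 addresses written inside parentheses or square brackets are considered.

```python
import ipaddress
import re
from typing import List

BRACKETED_IP = re.compile(r"[\[(]([0-9]+(?:\.[0-9]+){3})[\])]")
FROM_BY = re.compile(r"from\s+(\S+).*?\bby\s+(\S+)", re.DOTALL)


def is_valid_ipv4(text: str) -> bool:
    """True only when every octet is within 0-255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def invalid_ips(received: str) -> List[str]:
    """List dotted-decimal addresses in a Received header that cannot exist."""
    return [ip for ip in BRACKETED_IP.findall(received) if not is_valid_ipv4(ip)]


def relays_through_itself(received: str) -> bool:
    """True when the 'from' host and the 'by' host are the same name."""
    match = FROM_BY.search(received)
    return bool(match) and match.group(1).lower() == match.group(2).lower()
```

Either result alone is enough for a rule-based filter to raise a message's score, since a genuine mail server produces neither.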
This e-mail really originated from the host at the beginning of the e-mail headers: ppp-123.companyx.com. This host is either part of a botnet or the user is running an insecure proxy server. The spammer is trying to use ppp-123.companyx.com to send e-mail; however, for some reason, they decided to be stealthy in how they do this, and have falsely claimed that the e-mail was relayed through microsoft.com, when it obviously wasn’t.
Injecting different header fields can have very different results. One method that was popular in early 2000 to 2002 was injecting a virus scanner such as “Scanned-by xxxxxx anti-virus” into the header. This header often bypassed the need for the message to be scanned again. However, this method is now overused and has very little effect on modern filtering methods.
The most significant downside to adding false headers into e-mail messages is that you need to be careful about what you actually add. You need to have some idea of what should be there. Including an invalid IP address or an incorrect e-mail daemon will confuse no one, and will only draw a filter’s attention to your message. A suspicious e-mail header can greatly increase an e-mail’s chance of being filtered, because many spam filters now look for dubious or suspicious information being added into spam headers. Keep it legitimate. If you like adding headers, copy and paste a legitimate e-mail header. Be original but do not try to be creative. By nature, e-mail headers are not creative.
The following is a better example of a header injection, but it still lacks quality and is highly detectable. This spammer, although more effective than the previous spammer, is still not injecting efficient or correct headers, causing this e-mail to be filtered.
Return-Path: email@example.com
Received: from pcp04613952pcs.gambrl01.md.comcast.net (pcp04613952pcs.gambrl01.md.comcast.net [68.49.xxx.xxx])
    by mta05-svc.ntlworld.com (InterMail vM.4.01.03.37 201-229-121-137-20020806) with SMTP
    id <firstname.lastname@example.org>; Sun, 26 May 2004 16:52:21 +0100
Received: from pxlvx.cvp5tr.net ([195.216.xx.xxx])
    by pcp04613952pcs.gambrl01.md.comcast.net with ESMTP id 9821319; Fri, 26 May 2004 22:48:11 -0100
Message-ID: email@example.com
From: "Ralph Pegash" <firstname.lastname@example.org>
Message ID email@example.com is obviously fake; h00.h00 should be the name of the server currently processing the e-mail (in this case pxlvx.cvp5tr.net). You can see that IP address 195.216.xx.xxx is listed as the address for pxlvx.cvp5tr.net; however, an nslookup on that IP address shows that 195.216.xx.xxx really resolves to support.kamino.co.uk, and pxlvx.cvp5tr.net is not even a valid DNS entry. This spammer’s e-mail has been caught because they did not think about what they were injecting!
The following is what he should have done.
Find a real host to spoof. Give the real IP address of that host and its real name. Do not be lazy; it only takes a few seconds to find a real and currently active host.
Find out what e-mail software your spoofed host is running and then issue a correct Message ID for that software. Do a Google search to find out what the correct Message ID should look like. Make sure you include the correct server name after the “@” sign in the Message ID.
Make the Message ID unique. Message IDs often contain the date, time, second, and millisecond. Follow this trend. For example: firstname.lastname@example.org starts with the year (2004), then the month (05), then the day (26), and then the second, the millisecond, and a random number or two. Given this layout, you can easily predict what valid Message IDs can look like coming from this e-mail daemon with the following expression:
where each question mark can be a sequential decimal number. This produces 999,999 permutations on the one Message ID. A little bit of thinking will result in your spoofed header entries looking legitimate and not being filtered at any spam filter. Many automated spam-sending programs that offer a way of injecting headers to fool the recipient are flawed, so you must be careful what you use. Bad software designs have led some mailers to use highly predictable and incorrect information by default, with flaws such as using the same Simple Mail Transfer Protocol (SMTP) ID no matter what host you relayed through. Others give incorrect time zone information, such as EST being -0600 (EST should be -0500), or use invalid IP address numbers that go beyond 255 or below 0.
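The time zone flaw is one a filter can test for directly. The following sketch is illustrative only: the function name and the regular expression are assumptions, and the table covers just the North American abbreviations and their fixed offsets as listed in the obsolete-zone section of the Internet message format standard, plus GMT and UTC.

```python
import re
from typing import Optional

# Abbreviation -> offset; anything not listed here is simply not checked.
KNOWN_OFFSETS = {
    "EST": "-0500", "EDT": "-0400",
    "CST": "-0600", "CDT": "-0500",
    "MST": "-0700", "MDT": "-0600",
    "PST": "-0800", "PDT": "-0700",
    "GMT": "+0000", "UTC": "+0000",
}
ZONE = re.compile(r"([+-]\d{4})\s*\(?([A-Z]{2,4})\)?\s*$")


def zone_problem(date_value: str) -> Optional[str]:
    """Describe a mismatch between a numeric offset and its zone abbreviation."""
    match = ZONE.search(date_value.strip())
    if not match:
        return None  # no abbreviation present, nothing to compare
    offset, abbrev = match.groups()
    expected = KNOWN_OFFSETS.get(abbrev)
    if expected is not None and expected != offset:
        return f"{abbrev} should be {expected}, header says {offset}"
    return None
```

A date ending in `-0600 EST` is reported as a mismatch, while `-0400 (EDT)` passes cleanly.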
A successful header spoofing attack can result in a large portion of complaint e-mails being lost or redirected to a wrong party. Spam is a world where complaints can mean the difference between being paid and not being paid. It is worth investing time into a well thought-out header injection attack.
RBLs can be hard for spammers to bypass, especially spammers that send e-mail solely through open proxy servers, because these proxies are quickly found and banned by the RBL. This has led to many attacks on RBLs from spammers attempting to take their service offline. Taking an RBL down effectively stops any clients from querying another host’s validity. A few hosts can then be used to send all spam, since there is no RBL present to detect the spam-sending host’s presence.
One such incident happened on November 1, 2003, when a new Trojan worm called W32.Mimail.E surfaced. This Trojan focused on replicating itself to all of the people in an address book. Once infected, the client took part in a global Distributed Denial of Service (DDOS) against popular spam RBL SpamHaus, sending as much junk traffic as possible to six of their anti-spam Web sites. Within hours of Mimail.E being launched, spamhaus.org was receiving up to 12MB of DOS traffic at each of their Web servers.
A month later, another variant of Mimail dubbed Mimail.L began to spread. Mimail.L not only attacked the Web site www.spamhaus.org but also replicated itself with a message informing victims that their credit card was going to be billed; unless the recipient sent an e-mail to email@example.com they would be charged $22.95 a week. The e-mail also hinted that the recipient had purchased child pornography from spamhaus.org. This was a very sneaky approach since it not only spurred all of the infected clients to partake in a second DOS aimed at spamhaus’s mail server, but also gave a bad impression of spamhaus as a company, as seen in the following example message Mimail.L used:
Good afternoon, We are going to bill your credit card for amount of $22.95 on a weekly basis. Free pack of child porn CDs is already on the way to your billing address. If you want to cancel membership and your CD pack please email order and credit card details to firstname.lastname@example.org Are you ready for all types of underage porn? We have the best selection for every taste! Just click the secret links below and have fun: www.authorizenet.com disney.go.com www.spamcop.net www.carderplanet.net www.cardcops.com www.register.com www.spews.org www.spamhaus.org Nude boys under 16! Nude girls under 16! Incest, a daddy & a daughter! We have everything you have ever dreamed for!
Painting spamhaus as a child pornography company is very damaging, and the message may have done more public relations damage than it did network damage. Spammers who promoted pornography and sexual enhancer pills were responsible for this particular attack; if spamhaus.org went down it would disable the spam protection of any client who wished to verify a host’s validity. This would give the spammers a substantial e-mail delivery rate, since no host or message filtering would take place. A large delivery rate directly affects your payout, because more clients are able to read your spam so there is a higher chance of someone buying your product.
A constant war wages between spammers and RBLs; it’s fairly common for smaller RBLs to be targeted by hackers and spammers. A common trick is to find mail servers that use RBL-based spam-filtering software. A spammer will then harvest as many e-mail addresses as possible for users at these given sites. With the help of hackers (if required), that particular RBL is then broken into or taken down by means of a DOS/DDOS attack. The goal is to make the RBL unusable. Often, if spammers can get inside an RBL they will add localhost or *@* to the banned blacklist to force clients to stop using the RBL, since it would block all of their incoming e-mail.
Alternatively, spammers can attack the RBL so much that clients cannot query it when they need to question a host’s validity. Once the RBL is unreachable, spammers send out the spam to the users from a handful of proxy servers. Spammers know that without the RBL the mail server has no way of knowing that the servers sending them e-mail are known spam hosts; therefore the e-mail will have a much higher delivery rate.
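To make the dependency on the RBL concrete, the following is a minimal sketch of how a mail server typically asks a DNS-based RBL about a connecting host: the IPv4 octets are reversed, the RBL's zone is appended, and the resulting name is looked up. The function names and the zone rbl.example.org are hypothetical and only illustrate the convention. The sketch also shows why an unreachable RBL helps the spammer: a naive client cannot tell "not listed" apart from "no answer".

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a client looks up to ask an RBL about an IPv4 host."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the RBL answers with an address record for this host.

    A naive check like this one treats every failed lookup the same way,
    so "host is not listed" and "RBL is unreachable" both return False.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # covers resolver errors and timeouts
        return False
```

A mail server that wants to stay protected during an outage would need to distinguish the two failure cases and, for example, defer mail instead of accepting it when the RBL does not answer.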
Such attacks require a lot of work, and there is a certain level of risk involved with attacking an RBL. The scale of e-mails sent out is very significant; 100 million spam e-mails would be the least I would send out for an operation like this. Many spammers, myself included, do not have network security experience, so they often hire hackers to help with attacks on RBLs. This furthers the social relationships between hackers and spammers.
The ability to use HTML inside an e-mail has opened up a new world of evasion techniques. HTML is a very functional rendering language, and there is much scope when you use HTML as a rendering engine to evade a spam filter.
HTML is becoming known as a method spammers use to hide messages. Because of this, many spam filters are becoming suspicious of HTML messages, and more precautionary measures are taking place when analyzing them.
HTML’s success is its ability to contain data that is only visible at an un-rendered level, keeping it out of the rendered page. This means the recipient sees one message while the spam filter is shown another. This is due to spam filters not having the intelligence to render HTML.
One method of achieving HTML obfuscation is to insert junk or invalid HTML tags into the message. Some spam filters are affected by this and so instead of seeing the whole word “Viagra” they see something entirely different, as shown in the following example:
<html> <b> <aef>F</aef>e<ira>e</ira>l<spa> like</spa> <aea>b</aea>uy<ea>in</ea>g <ie>V</ie>i<xtag>a</xtag>g<ali>r</ali>a? <a>C</a>l<b>i</b>c<aef>k</aef> <a href=http://www.drugsaregood.com> h<b>e</b>r<ac>e</ac> </a> and make it so! </b> </html>
As can be seen, this is highly confusing; the wasted HTML tags scattered around the page help break up the words for any spam filters.
Unless a filter is actively stripping out all HTML tags before parsing the e-mail, any checks to see if “Viagra” is present will fail. However, the e-mail client used to view this message is much smarter. The recipient’s e-mail client will not render unused or invalid HTML tags, so although the e-mail may look cryptically strange to the spam filter, the user will have no problem reading it (see Figure 7.3).
Figure 7.3: The Rendered Page: Clean and Readable
This is probably the most common method of obfuscation used today. If you change each junk tag name for each spam, you can beat most simple hash-based filters that are thrown off by the ever-changing HTML markup. More recent spam filters strip out all of the HTML tags before parsing the e-mail for spam content. Many people still run out-of-date filters, however, so these HTML obfuscation methods are still actively used with good success.
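Seen from the filter's side, the tag-stripping step can be sketched with Python's standard html.parser module. This is only an illustration of the idea, not how any particular filter implements it; the names TextExtractor, visible_text, and contains_keyword are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data, ignoring every tag (valid or junk)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the message text with all HTML tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


def contains_keyword(html: str, keyword: str) -> bool:
    """Case-insensitive keyword check performed on the stripped text."""
    return keyword.lower() in visible_text(html).lower()
```

A keyword search on the raw markup of a fragment such as `<ie>V</ie>i<xtag>a</xtag>g<ali>r</ali>a?` finds nothing, while the same search on the stripped text matches, which is why junk tags only work against filters that skip this step.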
Other methods include adding extra words or letters into the HTML body. These letters are rendered but are not visible to the reader because of their size or font. Using a 1-pixel font size is a common method of inserting rogue characters into spam. As seen earlier in Chapter 5, you can change the entire body of the message while keeping the rendered version still readable. When multiple characters are injected into the phrase “Buy Viagra Here” it can become “BAuZy ~VWiEaGgVrZa !H<eWrWa” to a spam filter.
The following example uses 1-pixel high characters to obfuscate the main message:
<html> <font size="1" color="#ffffff">a</font>B<font size="1" color="#ffffff">a</font>u<font size="1" color="#ffffff">x</font>y <font size="1" color="#ffffff">-</font> V<font size="1" color="#ffffff">a</font>i<font size="1" color="#ffffff">u</font>a<font size="1" color="#ffffff">a</font>g<font size="1" color="#ffffff">a</font>r<font size="1" color="#ffffff">i</font>a<font size="1" color="#ffffff">a</font> <font size="1" color="#ffffff"> </font><a href=http://www.drugsaregood.com>H<font size="1" color="#ffffff">a</font>e<font size="1" color="#ffffff">p</font>r<font size="1" color="#ffffff">a</font>e</a> </html>
Although junk HTML tags in a browser can easily be filtered by an HTML pre-filter, how do you filter against something that is visible and rendered to the user, but not visible to the naked eye?
The phrase “Buy Viagra Here” (Figure 7.4) becomes “aBauxy - Vaiuaagariaa Haeprae” once an HTML filter strips out all of the font tags. Many spam filters will now have problems dealing with the message because they don’t know what it is. Other rules can be triggered using this method, such as using long words or no English text, but in general you will have a lower score than if you wrote “Buy Viagra Now.”
Figure 7.4: HTML Character Obfuscation – Rendered Version
The long words, single-pixel fonts, and white-on-white text will raise some suspicion, but as long as the rest of the e-mail is legitimate looking it will pass most rule-based filters. This is another highly popular method of obfuscating text; the message is still readable to anyone who has an HTML-enabled e-mail client, but is unreadable to any spam filters that do not actively pre-parse messages.
An easy way to bypass filters that look for white-on-white text is to use #FFFFFE instead of #FFFFFF for the text color.
Many older filters look for the string font color="ffffff". These filters can be bypassed by using a different string, and the text shown in the e-mail client will still not be visible to the user.
Also, never try to use a negative pixel size for the text. Negative size fonts incur a much higher spam score than using very small text.
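A filter that pre-parses messages does not have to match one exact color string. The sketch below, written from the filter's point of view, separates text the reader can see from text hidden inside font tags that are tiny or nearly white, so #FFFFFE is treated the same as #FFFFFF. It assumes a white background, handles only six-digit hex colors and plain numeric sizes, and uses hypothetical names and an arbitrary threshold.

```python
from html.parser import HTMLParser


def is_near_white(color: str, threshold: int = 0xF0) -> bool:
    """True if a 6-digit hex color has every channel at or above threshold."""
    value = color.strip().lstrip("#")
    if len(value) != 6:
        return False
    try:
        channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]
    except ValueError:
        return False
    return all(channel >= threshold for channel in channels)


def is_hidden_font(attrs: dict) -> bool:
    """Flag a <font> tag whose size is 1 or less, or whose color is near white."""
    size = attrs.get("size") or ""
    color = attrs.get("color") or ""
    tiny = size.isdigit() and int(size) <= 1
    return tiny or is_near_white(color)


class VisibleTextExtractor(HTMLParser):
    """Split character data into text the reader sees and text that is hidden."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one flag per currently open <font> tag
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.hidden_stack.append(is_hidden_font(dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "font" and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        target = self.hidden if any(self.hidden_stack) else self.visible
        target.append(data)


def split_text(html: str):
    """Return (visible_text, hidden_text) for an HTML fragment."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.visible), "".join(parser.hidden)
```

Scoring the visible text on its own recovers the phrase the reader actually sees, and the amount of hidden text can itself be used as a signal.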
When using HTML to inject characters, do not use obvious padding characters such as “.,-()^~`” etc.; use real letters. More suspicion would be raised if you wrote .B.u.y . V.i.a.g.r.a. H.e.r.e, as the filter would reason, what English sentence has thirteen full stops in it? Filters are catching onto this, so it is best to use vowels to pad words.
Inside the program you use to send spam, define a variable to be one of the letters: A, E, I, O, U, and then randomly insert each one with a 1-pixel size HTML tag between each letter of the word you’re trying to obfuscate. If you want to go one step further, keep the words linguistically correct.
“I” before “E,” except after “C” or except when sounding like “A” such as in neighbor and weigh.
This may help with Bayesian filters if they are able to match your new words to previously known words. Using a semi-legitimate word structure for all of your injected words can help greatly.
HTML also offers the use of images, whether as a method of verifying an e-mail address’s validity or as a method of displaying dubious words. Images can play a vital part in HTML spam.
A spam filter is unable to perform optical character recognition (OCR) on the image you are linking to. Spam filters have no idea what the image is really saying; it could be a logo for a company or a sign saying “Buy Viagra Here.” There really is no way to tell, which can be a big problem for spam filters. If you have the money, invest in a bulletproof Web host, someone who will allow you to host pictures with them. Next, write a spam e-mail and include a link to a modestly sized picture. Call it logo.gif and have it contain the main spam keywords the filter will be looking for most (e.g., “Buy Viagra Now”); anything you do not want to say directly, but still want to say. Keep the size of the file small and do not overuse colors. Remember: if a million people are going to download this picture and the picture is 100 KB in size, it equates to a lot of bandwidth. Also, make sure that you don’t use one picture as the entire body of your message; it’s all about normalization and moderation. Keep an even amount of both textual and pictorial data such as random text (as seen in the next section), pages of the bible, or quotes from a song.
No one sends e-mail where the entire body is a jpg, because it would be caught by the majority of spam filters. You need to be creative and stealthy. A good rule is that for every image, add 3,000 bytes of random data (3,000 random characters). This will produce a good result with spam filters that are checking for weighted percentages of pictorial data within a message.
Random data is essential in spam for a few reasons. First, it offers a method in which the spam can be unique from every other spam message sent. Well-placed random data can ensure that the message is always unique, even to a spam filter that is trying to filter obvious attempts to be unique. If enough thought is used when making spam unique, it is possible to evade many hash-based spam filters. Again, the trick is to look normal; do not draw too much attention to obvious strings of random data.
If you decide to always have a unique message subject in your spam, do not be blatant about how you do it. Having a different random number at the end of the subject is not the best way. The majority of legitimate e-mails do not contain a single number in the subject. Stick to this rule and instead of numbers, use random words or random placements of words. If random data is to be purely random, it must be placed at random intervals throughout the message. For example, when a string of five numbers is placed at the end of a message subject, you can make the subject contain ever-changing random data but that data is always located in a very predictable location at the end of the subject. Filters have caught onto this and now filter any spam that contains a string of random numbers at the end of the subject. Mix it up a little. The use of phrases, letters, and random words from the e-mail body can greatly help your chances when trying to pass random data off as legitimate text.
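The subject-line rule described above is simple for a filter to express. The following is a hedged sketch of such a check, not a rule taken from any real filter; the function name and the choice of four or more digits as the cutoff are assumptions.

```python
import re

# A standalone run of four or more digits at the very end of the subject.
TRAILING_DIGITS = re.compile(r"\b\d{4,}\s*$")


def has_trailing_random_number(subject: str) -> bool:
    """Flag subjects that end in a standalone run of digits."""
    return TRAILING_DIGITS.search(subject) is not None
```

Because the random data always sits in the same position, one short pattern catches every variation of the subject, which is exactly the weakness of predictable placement.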
Avoid using large amounts of white space as a method of hiding random data in spam. By including 10 or 20 carriage returns in the e-mail before your random data, spammers are able to push the gibberish sentences out of sight of the reader. This is an easy trick and is one rule that filters use to catch a lot of spam. Remember to keep the e-mail looking legitimate. If you typed half a page of text, why hide it out of sight below 10 carriage returns? Do not be worried about the reader becoming lost in the random data, or somehow not buying the product if it contains random data.
Statistics show that users actively click on anything you give them. Even if there are a few lines of random data on either side, the majority of the time if the user was going to click on it before they saw the lines of random data, they will still click on it.
Readers are very used to mentally deciphering spam e-mails as they read them. Tricks such as V1^gr@ have taught readers to be very astute when reading spam. Including a few lines of text that make no literal sense in the e-mail will not stop anyone who was interested in the first place from buying your product.
Another use for random data is to bulk up the size of your spam message. If you were to analyze a few hundred spam messages and then compare each to a real message, you would probably see that on average spam messages are shorter than legitimate messages, usually with only a few hundred bytes to the message. Spam usually contains a quick catch phrase and a link to the product. How many legitimate e-mails do you receive that are two lines long with an HTTP link in the body? Not many, I bet, and this has become a method of filtering spam: catch the messages that are short and often HTML-encoded with hyperlinks inside the body.
There are many legitimate reasons you might send someone a message in HTML with embedded hyperlinks, but not many of those messages are short in length. E-mails such as HTML newsletters or automated e-mail reports may contain links and be sent in HTML format, but they are usually decent in length with a large percentage of the e-mail being text.
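The heuristic described here, a short HTML body that carries a hyperlink, can be sketched as a single filter rule. The class and function names and the 200-character cutoff are hypothetical; a real rule-based filter would add a weighted score for this condition instead of returning a yes or no answer.

```python
from html.parser import HTMLParser


class LinkAndTextCounter(HTMLParser):
    """Count hyperlinks and the amount of text in an HTML body."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def is_short_link_message(html_body: str, min_text_chars: int = 200) -> bool:
    """True if the body has at least one link but very little text."""
    counter = LinkAndTextCounter()
    counter.feed(html_body)
    counter.close()
    return counter.link_count > 0 and counter.text_chars < min_text_chars
```

A newsletter with several paragraphs of text and a few links falls on the safe side of this rule, while a two-line message with a link does not.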
This is where random data becomes useful; with most mailing software you can add random words, letters, or characters to an e-mail. Simply have a text paragraph or two of random phrases, include random lines from a text file of quotes, and include a joke or two. Spammers don’t usually have much to say to anyone; the majority of the time a message involves “Hey, buy my product, click here.”
Make sure there is enough text in the body so that any spam filter will think long and hard about the message and its validity. Spam filters contain code to not drop legitimate e-mails; make them think your e-mail is just that. Sure, the message comes encoded in HTML and contains a hyperlink to some questionable .com site, but it also contains a large amount of legitimate English words.
As mentioned earlier in this chapter, be sure to use correct English words in spam and do not try to make up too many new words from random characters. Also, repeat several words in the body multiple times, preferably a noun, something that would be common in a passage of text. If you can, also include punctuation marks and grammatical elements, which will add to the message’s validity.
The following is an example template of a message that contains eight lines of random data. The mailing program adds this data in when the message is queued for sending, but the random data is positioned in a way that it adds to the validity of the message without drawing too much attention to itself.
From: email@example.com Message Subject: %RND_WORD %RND_WORD. Yo %FIRST_NAME, I %RND_WORD %RND_WORD and %RND_WORD, %FIRST_NAME and I %RND_WORD %RND_WORD. %RND_WORD the %RND_WORD %RND_WORD %RND_WORD it. %RND_WORD %RND_WORD %RND_WORD a %RND_WORD %RND_WORD the %RND_WORD %RND_WORD %RND_WORD. %RND_WORD the %RND_WORD %RND_WORD %RND_WORD it. %RND_WORD %RND_WORD You should buy my Viagra and Xennax Low prices, will keep your wife happy! http://www.drugsaregood.com %RND_WORD the %RND_WORD %RND_WORD %RND_WORD it. %RND_WORD %RND_WORD %RND_WORD a %RND_WORD %RND_WORD the %RND_WORD %RND_WORD %RND_WORD. %RND_WORD the %RND_WORD %RND_WORD %RND_WORD it. %RND_WORD %RND_WORD
This plaintext message was sent to my mail server, which was running SpamAssassin using the latest rule set available. The test is to see if I can use random data to deliver a spam and what impact it has on the score of my message.
The message is sent from a host with valid reverse DNS, PTR, and MX records set up. This server is known as spambox. Currently, my spambox’s IP is not listed in any RBL, and it has never sent my mail server e-mail. I will use my spambox to demonstrate how random text can reduce the score you receive from a rule-based filter. The following results are the output of e-mail headers, identifying what the spam score of each message was with and without the random data.
Received: from firstname.lastname@example.org by mails by uid 89 with qmail scanner-1.22st (clamdscan: 0.74. spamassassin: 2.63. perlscan: 1.22st. Clear:RC:0(220.127.116.11):SA:0(-4.3/5.0):. Processed in 8.234695 secs); 25 Aug 2004 03:41:22 -0000 X-Spam-Status: No, hits=-4.3 required=5.0
This message scored 4.3; SpamAssassin needs a score of 5 by default to declare a message spam. It came very close, but 4.3 is still under 5. For another experiment, I sent the following:
Received: from email@example.com by mails by uid 89 with qmail scanner-1.22st (clamdscan: 0.74. spamassassin: 2.63. perlscan: 1.22st. Clear:RC:0(18.104.22.168):SA:0(-4.8/5.0):. Processed in 2.344291 secs); 25 Aug 2004 03:44:13 -0000 X-Spam-Status: No, hits=-4.8 required=5.0
This e-mail was still delivered even with the absence of random data in the body. However, the score was higher than the previous message with random data, plus it took only 2.3 seconds for the server to derive this message’s score. If I had been sending this from a questionable host, something that was listed in an RBL or had a dubious DNS record, my message would have been scored much higher or flagged as spam; it only needs 0.2 more points in the score to become spam. My e-mail host’s credibility gave me some leeway with the message, since the host looks highly legitimate with DNS records and “e-mail” in the host name. However, if I began delivering this message to millions of other hosts, hash-based spam filters would soon catch the message trend and ban messages with this content and my spambox’s IP.
Random data helped evade the spam filter and would help against future filtering based on the message’s size or exact contents. Filters would quickly grow to know the Web site mentioned, or the catch phrase “Low prices will keep your wife happy!” Ideally, if I were a spammer, I would be using additional random data within or around any spam catch phrases, to keep all elements of the e-mail unique and to help protect it from smarter hash-based filters that may be able to detect my random data.