Harvesting the Crumbs of the Internet

Wherever you travel on the Internet, whatever you say and whatever you do leaves a trail: a breadcrumb trail of facts, reply addresses, Internet Protocol (IP) addresses, names, and dates. The Internet, now littered with this information, has become an old dusty house with millions of random facts and traces left in the corners of cyberspace.

This information contains far too much detail for its own good. What’s more, it is easily accessible by anyone on the Internet, and searchable with common Internet search engines. Spammers caught on to this fact in the late 1990s and a technique known as harvesting was born. Harvesting was one of the first methods used to find new e-mail contacts. The idea is simple: search newsgroups, mailing lists, and bulletin boards for posts containing the sender’s and recipient’s e-mail addresses. As you can see in Figure 4.2, such addresses are easy to find. Harvesting millions of e-mail addresses at a time, the early pioneers of spam could build large e-mail distribution lists quickly, simply by sifting through the breadcrumb trail of facts.

Figure 4.2: Jungshik and AmirBehzad: Fancy Some Viagra?

Software was soon developed to take full advantage of this information, and today there are dozens of Web, instant messenger, and newsgroup “harvester” applications in production. These programs scan millions of messages, posts, and contacts, collecting any e-mail addresses found within. An ideal Web site to harvest is a directory or a Web-based “yellow pages.” These online e-mail databases allow spammers to quickly harvest millions of legitimate accounts. Often, spammers write their own custom harvester programs, designed to quickly pillage new applications such as peer-to-peer (P2P) networks, online game servers, and new searchable online address books.

Harvesters aimed at instant messenger networks scan user profiles, request each user’s information, and record any listed e-mail addresses. Many people who chat on I Seek You (ICQ) or MSN list not only their real e-mail address but also their cell-phone number in their user profile, so this method is highly effective at collecting legitimate e-mail addresses.

Network News Transfer Protocol

In the early days of the Internet, a popular method of talking to many people was using Network News Transfer Protocol (NNTP) boards, which were much like the Web-based bulletin boards of today but were primitive and lacked any privacy measures. Unlike the bulletin boards of today that hide e-mail addresses from prying eyes, NNTP clearly shows each sender’s e-mail address and often their IP address. This information is visible to anyone with access to the NNTP server. An example of these older NNTP boards is shown in Figure 4.3.

Figure 4.3: NNTP Message Example

As you can see, the e-mail address is clearly visible. To build a quick contact list you would simply scan the entire NNTP server and collect the e-mail addresses in each message. NNTP harvesting is a very popular harvesting method that is still used today, and is seen in Figure 4.4.

Figure 4.4: Newsgroup Harvesting in Action

The sudden interest in e-mail harvesting made Internet users very conscious of the information they disclosed, and people began reducing the amount of information they gave out.
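One widely used way of giving out less was address munging: rewriting an address so that a person can still read it but a simple pattern match no longer finds it. The following is only a minimal sketch of that idea. The helper name `munge_addresses` and the deliberately simple address pattern are assumptions made for illustration, not a complete parser for valid e-mail syntax.

```python
import re

# Deliberately simple pattern for illustration; real address syntax is broader.
EMAIL_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b"
)


def munge_addresses(text: str) -> str:
    """Rewrite user@host.tld as 'user [at] host [dot] tld' before posting."""

    def _munge(match: re.Match) -> str:
        user, host = match.group(1), match.group(2)
        return f"{user} [at] {host.replace('.', ' [dot] ')}"

    return EMAIL_RE.sub(_munge, text)


if __name__ == "__main__":
    print(munge_addresses("Reply to jane@example.com for details."))
```

A post cleaned this way still tells a human reader how to reply, but it no longer contains a string shaped like user@host for a scanner to collect.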

The early methods of spam were quick, easy, and highly untargeted. You could harvest a million e-mail addresses and still have no idea about the users’ likes or dislikes, so selling a product to them was much harder. It was all about luck; send as much e-mail as possible to as many people as possible, and hope that they buy your product.

This crude method worked well in the early days of spam, when the world was new to the idea of unsolicited e-mail and people were easily swayed by slick offers. However, as time passed and spam became more unpopular and thus ignored, harvesting became a very unsuccessful method of obtaining new e-mail contacts. Sure, people still bought the product, but the percentage of recipients who did so was as low as 0.001 percent.

Now a spammer had to send many millions of messages just to break even financially. Some spammers dislike the idea of harvesting e-mail addresses, as it provides a highly untargeted user base and you usually end up sending spam to people who not only do not want spam, but also have no interest in buying the advertised product. For them, quality, not quantity, is what matters in a mailing list.
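The break-even arithmetic can be sketched roughly as follows. Only the 0.001 percent response rate comes from the text; the campaign cost and the revenue per sale are hypothetical placeholders, and the function name is illustrative.

```python
def messages_to_break_even(cost_usd: int, revenue_per_sale_usd: int,
                           messages_per_sale: int) -> int:
    """Return how many messages must be sent for expected revenue to cover cost."""
    # Ceiling division: part of a sale still requires one whole sale.
    sales_needed = -(-cost_usd // revenue_per_sale_usd)
    return sales_needed * messages_per_sale


# 0.001 percent response rate (from the text) = 1 sale per 100,000 messages.
MESSAGES_PER_SALE = 100_000

# Hypothetical inputs: $1,000 campaign cost, $40 revenue per sale.
needed = messages_to_break_even(1_000, 40, MESSAGES_PER_SALE)
print(f"{needed:,} messages to break even")
```

With these placeholder numbers, 25 sales are needed, which means 2,500,000 messages at one sale per 100,000.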

Internet Relay Chat Harvesting

Internet Relay Chat (IRC) is a popular chat network used worldwide. Clients connect to an IRC server, join channels, and discuss random topics. IRC is very popular with younger Internet users and offers a much richer chat experience, letting users talk in large chat rooms filled with like-minded people. However, IRC is also known for leaking information through the Identification (IDENT) protocol. IDENT is an old UNIX-based protocol that, when queried, reports the user account running the IRC client. For example:

    _Wrillge is xxxxx@box21.stanford.edu * I'm too lame to read BitchX.doc *
    _Wrillge on #imatstanford
    _Wrillge using irc.choopa.net Divided we stand, united we fall
    _Wrillge End of /WHOIS list.

Here we can see that the nickname _Wrillge is actually xxxxx who is using BitchX (a UNIX-based IRC client) on a UNIX server at Stanford University.

There is a good chance that user xxxxx@box21.stanford.edu is a valid e-mail account, but it will require the server to be running an e-mail daemon.

This method, although easy, is highly unpredictable. The majority of people who use IRC run Windows-based clients on a home Digital Subscriber Line (DSL) connection and are not usually running an e-mail server. For example:

    exad is manny@61-166-154-55.clvdoh.adelphia.net * Manny
    exad on #idler
    exad using irc.blessed.net A fool's mouth invites a beating.
    exad End of /WHOIS list.

In this example, the chance of manny@61-166-154-55.clvdoh.adelphia.net being a valid e-mail account is slim to none.

Harvesting e-mail accounts from IRC was one of the earliest methods used and is obviously not very accurate. Still, it can produce some valid e-mail addresses, mostly from users running IRC on UNIX-based computers that have sendmail and IDENT installed and running by default. However, these may not be the users’ primary addresses and thus may never be checked. In fact, the users may not even be aware that they are running an e-mail daemon. This greatly decreases the usefulness of the addresses; a spammer should not expect a wondrous return from collecting e-mail addresses from IRC.

whois Database

When you register a new domain, you are required to enter personal details to support the billing and technical responsibilities of the domain. These details include a phone number, postal address, and e-mail address. For example (the real names and contact information have been replaced with “X’s”):

    [root@spammerx ~]# whois apple.com
    [Querying whois.internic.net]
    [Querying whois.markmonitor.com]
    [whois.markmonitor.com]
        Administrative Contact:
            XXXXXXXX XXXXXXX (XX557)
            (NIC-14211601)
            Apple Computer, Inc.
            1 Infinite Loop M/S 60-DR
            Cupertino
            CA
            95014
            US
            XXXX@apple.com
            +1.40XXXXXXXX
            Fax- +1.40XXXXXXXX
        Technical Contact, Zone Contact:
            NOC Apple (NA4189-ORG)
            (NIC-14211609)
            Apple Computer, Inc.
            1 Infinite Loop
            M/S 60-DR
            Cupertino
            CA
            95014
            US
            XXXX@APPLE.COM
            +1.40XXXXXXXX
            Fax- +1.40XXXXXXXX
        Created on..............: 19XX-Feb-19.
        Expires on..............: 20XX-Feb-20.
        Record last updated on..: 20XX-May-20 12:16:06.

Spammers love anything that requires you to enter your e-mail address, and sure enough, many spammers actively harvest contacts from the whois database.

Using the UNIX whois tool, anyone can easily see who is listed as the administrative contact; this is a valid address and is probably active right now.

There are applications that were developed to harvest contact details from the whois database. One such application is whois extractor (see Figure 4.5). Developed by www.bestextractor.com, its design lets you quickly enumerate name, phone number, and e-mail address for both the technical and administrative contact for any domain currently active.

Figure 4.5: whois Extractor in Action

Although these are legitimate e-mail addresses and more than likely currently active, the majority of the users are probably not interested in buying erectile dysfunction medication or investing in a new home loan. Their worth is much less than that of a direct opt-in list because they lack targeting; the only common interest these addresses share is that their owners have all registered a domain name. So, perhaps spamming a domain registration offer just before the domain expires is not such a bad idea.

Purchasing a Bulk Mailing List

How often do you receive an e-mail offering to sell you 100 million e-mails for use in bulk e-mailing or direct e-mail marketing?

The numbers are staggering: at least 100 million verified e-mail addresses for around $100.00. That works out to $0.000001 (0.0001 cents) per e-mail address, far cheaper than buying from another spammer or hacker, where you may get only one or two million e-mail addresses for the same price.
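Working the figure through: $100 divided by 100 million addresses is $0.000001 per address, or 0.0001 cents. The short sketch below repeats that division and compares it with the two-million-address case mentioned above. The function name is illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def price_per_address_usd(list_price_usd: int, addresses: int) -> Fraction:
    """Exact price of a single address, in U.S. dollars."""
    return Fraction(list_price_usd, addresses)


bulk = price_per_address_usd(100, 100_000_000)   # bulk mailing-list company
private = price_per_address_usd(100, 2_000_000)  # another spammer or hacker

print(f"bulk list:    ${float(bulk):.6f} = {float(bulk * 100):.4f} cents per address")
print(f"private list: ${float(private):.6f} = {float(private * 100):.4f} cents per address")
print(f"private list costs {private / bulk}x more per address")
```

At the same $100 price, the two-million-address list costs 50 times more per address than the bulk list.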

Usually, bulk mailing-list companies are run by ex-spammers or hackers, are often run anonymously out of an offshore P.O. box tax free, and in general are very discreet operations. Even though bulk mailing list companies often keep their word and sell you 100 million e-mail addresses, the usability of the e-mails is often very poor. The majority of the e-mails originate from other well-used mailing lists. Furthermore, large amounts of the addresses originate from harvested public Web sites. This means that the e-mail addresses have been receiving spam for a long time, and by now are either running very strict spam filtering software or are very sick of receiving spam and are likely trashing the messages without opening them.

E-mail addresses such as Webmaster@company.com and contact@company.com litter the lists; they are obviously addresses that would not be interested in purchasing any product, even though they are legitimate e-mail addresses.

Notes from the Underground…

Using a List

In my early days of spam I fell for one of these lists. I paid a small US-based company $50.00 for 75 million verified e-mail addresses. I was very eager about the possible income this could produce, so over the next week I sent spam to all 75 million. I was selling a new diet pill called Solidax ADX. It offered an “easy, effective way to lose those unwanted pounds” by suppressing your appetite.

Previous spam sent to known buyers of weight-loss products had sold at a ratio of 1 sale per 900 e-mails sent, with the average sale bringing in around $40.00 (U.S.). With this in mind, I predicted I would make at least 500 sales from 75 million untargeted e-mails; I did not expect to match my very successful 1:900 ratio.

To my utter astonishment, not a single person out of the 75 million bought any diet pills. A ratio of 0:75,000,000 is beyond ridiculous (only 400 people even clicked on the e-mail), showing that the average user was very sick of spam and had no interest whatsoever in the product. The quality of the list was severely affected by its untargeted nature. The likelihood is that 10 to 20 other spammers had already used the list to send spam, further decreasing my chances.
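For scale, the figures in this account can be worked through directly. Every input below comes from the text; only the variable names are illustrative.

```python
SENT = 75_000_000        # e-mails sent to the purchased list
TARGETED_RATIO = 900     # 1 sale per 900 e-mails on the earlier targeted list
PREDICTED_SALES = 500    # the "at least 500 sales" estimate
AVERAGE_SALE_USD = 40    # average value of one sale
CLICKS = 400             # recipients who clicked at all

# Sales the list would have produced at the targeted 1:900 ratio.
sales_at_targeted_ratio = SENT // TARGETED_RATIO

# The ratio implied by the far more cautious 500-sale prediction.
emails_per_predicted_sale = SENT // PREDICTED_SALES

predicted_revenue_usd = PREDICTED_SALES * AVERAGE_SALE_USD
click_rate_percent = CLICKS / SENT * 100

print(f"sales at 1:{TARGETED_RATIO}: {sales_at_targeted_ratio:,}")
print(f"predicted ratio: 1:{emails_per_predicted_sale:,}")
print(f"predicted revenue: ${predicted_revenue_usd:,}")
print(f"click rate: {click_rate_percent:.6f} percent")
```

The 500-sale prediction already assumed a ratio of 1:150,000, more than 160 times worse than the targeted list, and the actual result still fell short of it.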

Some companies offer targeted bulk mailing lists with offers such as “100 million guaranteed American addresses” and “290 million married older men.” The prices are almost ten times higher than those of the untargeted lists; against the baseline of $100 per 100 million addresses (0.0001 cents each), that is roughly 0.001 cents for each e-mail address.

These lists promise a more targeted approach and a younger group of users, ensuring that the user is not already sick of receiving spam. The majority of these companies obtain their lists through hackers, spackers, and insiders, buying any personal demographics and customer contact lists that are for sale.

Notes from the Underground…

Bulk Mailing Lists

I have asked friends of mine who actively buy targeted bulk mailing lists for their opinion on the lists’ success versus untargeted lists. The general feeling is that the return rate is much higher than that of an untargeted list, with an average targeted list giving a 5 to 15 percent click rate versus an average of 1 to 5 percent on an untargeted list.

Although this percentage is higher than an untargeted list, it is still much lower than a list you might source yourself (i.e., from hacking an opt-in list). This is because every list may have been bought by at least five other spammers, which significantly lessens the impact factor of the e-mail. Targeted or not, if a user has to deal with five or ten spam messages per day, the chance is much higher that they will delete your e-mail without even reading it due to the amount of spam in their in-box. Most of the time, buying a bulk mailing list does not produce amazing results—anything that can be sold will be sold multiple times.

In the end, no one really profits from these bulk mailing lists except the entity selling the list. If you only receive one spam e-mail per day, you may be tempted to open it and click on the link within. If you receive 50 spam e-mails daily, you will probably select all of them and press delete. The potential customer has become irritated with the spam, and the spammer fails to gain any profit.

Some spammers recommend never using an untargeted bulk mailing list. The returns are too poor and you end up aggravating the public unnecessarily. If you must use a bought mailing list, use a semi-targeted one whose data ensures your message goes to an English-speaking person who will have some interest in the product you are selling.

Tricks of the Trade…The Great Circle of Spam

You may notice a trend around the amount of spam you receive. Some weeks you may receive one or two messages a day, other weeks up to 100 per day, and then back to one or two the following week. This trend is mostly due to companies selling bulk e-mail lists. Your e-mail address was probably harvested, sold, collected, or stolen and is now part of a large bulk mailing list along with hundreds of millions of others.

When the list is sold to a new spammer you will receive more spam. For a week or two you will be bombarded with many offers from that particular spammer. Once the spammer finds little or no revenue left in the e-mail addresses, they will stop spamming them and probably sell the list to another spammer for $5.00, at which point you will start receiving new types of spam from a new spammer. This trend creates what I call “The Great Circle of Spam”: a predictable and mappable life cycle showing the spread and growth of spam to your e-mail account.

Inside the SPAM Cartel: Trade Secrets from the Dark Side
By Spammer-X
ISBN: 1932266860
Year: 2004
Pages: 79
