An important aspect of following the cybertrail in an investigation is to search for related information on the Internet such as a victim's Web pages or Usenet messages, an offender's e-mail address or telephone number, and personal data in various online databases. Because the Internet contains so much loosely ordered information, searching for something in particular can be like looking for a needle in a haystack. This is why it is crucial to learn how to search the Internet effectively. In addition to becoming familiar with various search tools, it is necessary to develop search strategies.
One method of searching for digital evidence on the Internet is to look for online resources in a particular geographical area. For instance, if a victim or unknown offender lives in San Francisco, there is likely to be a higher concentration of related information in that area. Searching online telephone directories, newspaper archives, bulletin boards, chat rooms, and other resources dedicated to San Francisco can uncover unknown aspects of a known victim's online activities and can lead to the identity of a previously unknown offender. Search engines that focus on a particular country (e.g. www.google.it, ie.altavista.com) can also be useful for a geographically focused search.
Another strategy is to search within a particular organization. For instance, if a victim or offender is affiliated with a particular company or school, there is likely to be a higher concentration of personal information in associated online resources. As with a geographically focused search, looking through an organization's online telephone directory, internal bulletins or newsletters, discussion boards or mailing lists, and other publicly accessible online resources can lead to useful information. Additionally, it may be possible to query systems on an organization's network for information about users. Although it is permissible to access information on an organization's computer systems in non-invasive ways, care should be taken not to cross the line into unauthorized access.
Besides searching for real names, nicknames, full e-mail addresses, and segments of e-mail addresses, it can be productive to focus searches around unusual interests, searching areas on the Internet that the victim or suspect frequented. Given the difficulty in making informed guesses of where a victim or offender might go on the Internet, this type of search usually develops from a lead. For instance, interviews with family and friends, or an examination of a victim's computer may reveal that she subscribed to a particular newsgroup and frequented a particular IRC chat room to arrange sexual encounters. An offender or victim may have left traces of their activities in these online areas. Searching these areas can be particularly productive if the offender and victim communicated with each other in a public area on the Internet, revealing connections between them.
In addition to the traces of activities that remain on the Internet, online witnesses who used the same areas may have logs of the activities on their computers. For instance, in the Sharon Lopatka case, participants in the AOL and IRC channels that the victim and offender frequented recalled that neither of them employed "safe-words" to prevent injury during rough sex (Cairns 1996). As another example, after apprehending an offender, some digital evidence examiners will contact people with whom the offender was in contact on the Internet (e.g. through sent e-mail or an AOL Buddy list). By sending a letter to these individuals informing them of the situation and asking them for any related information, it is possible to locate witnesses and other victims. In some cases, victims of a common offender seek each other out to form online support networks. These associations can be helpful to the victims. They can also be useful to investigators because the networks make identifying and contacting victims easier. However, when victims who are also potential witnesses share information about the criminal activity and the offender, it may complicate matters when the time comes for them to testify.
Notably, these search strategies are not mutually exclusive and can be effectively combined to locate the majority of available information on the Internet regarding the search subject. Whichever combination of search strategies is used, investigators should document important searches, indicating when, where, and how specific items were found. Handwritten notes combined with the investigator's Web browser history are generally sufficient to show when, where, and how information was located. Also, because information on the Internet can change at any moment, screenshots and copies of Web pages are useful for documenting what investigators saw at the time. Some tools for capturing a Web site efficiently and fairly completely are:
Web Whacker: www.webwhacker.com
Adobe Acrobat: www.adobe.com
Teleport: www.tenmax.com/teleport/pro/home.htm
Httrack: www.httrack.com
Web Copier: www.maximumsoft.com
Snagit: www.techsmith.com
Anawave's WebSnake: http://www.websnake.com/
Htdig: http://www.htdig.org
Surfsaver: www.surfsaver.com/download
Wget: http://www.gnu.org/software/wget/wget.html
Black Widow: www.softbytelabs.com/BlackWidow
Some of these tools will not copy subpages of a Web site if links to these subpages are encoded in a scripting language that the tool does not understand. Therefore, it is advisable to test a tool to ensure that it is adequate for the task and inspect the resulting files to verify that they are satisfactory. Any files that are generated during the search process should be inventoried, documenting file names, MD5 values, and date-time stamps.
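The inventory described above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the function names and the record layout are assumptions made for this example, and the modification time reported is whatever the file system holds after the capture tool wrote the file.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(directory):
    """Return one record per regular file under `directory`, sorted by path."""
    records = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            records.append({
                "file": str(path),              # file name, with its path
                "size_bytes": info.st_size,
                "md5": md5_of_file(path),       # MD5 value of the contents
                "modified_utc": modified.isoformat(),  # date-time stamp
            })
    return records
```

Calling `inventory("captured_site")` on a hypothetical directory of saved pages returns a list of dictionaries that can be printed or written to a spreadsheet and kept with the case notes.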
Search engines are among the most useful tools for finding information on the Internet. Although search engines are not particularly difficult to use, there is some skill involved in using them effectively. Each search engine has different contents, archiving methods, search features, and limitations. Therefore, it is important to understand how each search engine works and which ones are best suited for particular tasks.
Many search engines, like Altavista, actively update themselves by running programs that search the Web incessantly for new data. As a result, they can turn up recent information but lack older, outdated data.[10] Google compensates for this shortcoming by retaining a copy of Web pages it has found - this "cached" information is useful when the original is gone. Google is also capable of searching Word documents and PDF files that other search engines overlook. Additionally, Google has a searchable archive of Usenet messages stretching back to 1981. Another unique feature of Google is its search algorithm (PageRank), which estimates the relevance and quality of data based on the number of links to the data from other sources on the Web. It is important to be aware of how each search engine attempts to "help" with a search so that this "help" can be utilized when it is useful and avoided when it is not.
Investigators can employ the language of the search engines they are using to create more narrowly focused searches. For example, some search engines understand words like AND, OR, NOT, and NEAR. Some search engines also allow symbols such as "-" to exclude terms from the search and "+" to require them. For instance, in Altavista, either of the following queries can be used to find documents containing the words "unsolved" and "homicide" but not the words "mystery" or "mysteries":
+homicide +unsolved -mystery -mysteries
homicide AND unsolved AND NOT myster*
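To make the meaning of these operators concrete, the following sketch applies the same include and exclude logic to a piece of text held locally. It is an illustration of the semantics only, not how Altavista or any other search engine is implemented. The function names are hypothetical, terms without a prefix are treated as required (a simplifying assumption), and a trailing `*` is treated as a prefix wildcard.

```python
import re


def parse_query(query):
    """Split a '+term -term' style query into required and excluded terms."""
    required, excluded = [], []
    for term in query.lower().split():
        if term.startswith("-"):
            excluded.append(term[1:])
        else:
            # Assumption: an unprefixed term is required, just like '+term'.
            required.append(term.lstrip("+"))
    return required, excluded


def term_matches(term, words):
    """True if `term` is in `words`; a trailing '*' matches any word prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def matches_query(text, query):
    """True if `text` has every required term and none of the excluded ones."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded = parse_query(query)
    return (all(term_matches(t, words) for t in required)
            and not any(term_matches(t, words) for t in excluded))
```

For example, `matches_query("Unsolved homicide in Baltimore", "+homicide +unsolved -myster*")` is true, whereas the same query applied to "An unsolved homicide mystery" is false.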
Some offenders protect themselves by using computer-smart nicknames such as En0ch|an instead of Enochian. The zero instead of an "o" and the pipe (|) instead of an "i" confound search algorithms. In such cases, clever use of search engine syntax (e.g. AND, OR, NEAR) is required. Search engines can also be useful for finding connections on the Web. For instance, pages containing links to a suspect's Web site can be found by searching Google or Altavista using the syntax "link:www.suspectswebpage.com." For additional discussion about utilizing advanced features of search engines see SearchEngineWatch.[11]
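One way to cope with nicknames like En0ch|an is to generate the likely spellings and join them with OR. The sketch below does this under stated assumptions: the substitution table is a small, hypothetical sample rather than a complete list of character swaps, and the quoting of each variant assumes a search engine that accepts quoted terms combined with OR.

```python
from itertools import product

# Hypothetical sample of common character swaps; extend as a case requires.
SUBSTITUTIONS = {
    "o": "o0",
    "i": "i|1",
    "e": "e3",
    "a": "a4",
}


def nickname_variants(nickname, limit=500):
    """Return spellings of `nickname` with every combination of swaps applied."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in nickname.lower()]
    variants = []
    for combination in product(*choices):
        variants.append("".join(combination))
        if len(variants) >= limit:  # guard against very long nicknames
            break
    return variants


def or_query(variants):
    """Join the variants into a single query of quoted terms separated by OR."""
    return " OR ".join('"{}"'.format(v) for v in variants)
```

With this table, `nickname_variants("Enochian")` yields 24 spellings, including `enochian` and `en0ch|an`. Characters such as the pipe may still be ignored or treated specially by a given search engine, so the resulting query should be tested before it is relied on.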
Keep in mind that searching for obviously illegal terms will rarely turn up anything illegal. Many Web sites use illegal terms to attract interest, but actual criminals make some effort to hide their activities using euphemisms. For instance, some offenders use the terms "lolita" or "nature shots" to refer to images of children, or "family fun" to refer to incest. These euphemisms may turn up during the initial searches, in which case it will be necessary to expand the search using this new knowledge and gradually narrow the search again. Also, individuals who want their Web pages to be excluded by search engines can simply place "robots.txt" files on their Web sites.
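Whether a given page is excluded by a site's "robots.txt" file can be checked with the parser in Python's standard library. The sketch below parses rules supplied as text, so it needs no network access. The sample rules and the example.com addresses are invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented example: the site owner asks all crawlers to skip /private/.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

hidden = crawler_allowed(SAMPLE_RULES, "http://www.example.com/private/photos.html")
public = crawler_allowed(SAMPLE_RULES, "http://www.example.com/index.html")
```

Here `hidden` is false and `public` is true. A "robots.txt" file only asks well-behaved crawlers to stay away. It does not prevent anyone from viewing the pages directly, and the paths it lists can themselves point an investigator to areas the site owner preferred to keep out of search results.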
Metasearch engines such as Copernic and Metacrawler enable individuals to search multiple search engines simultaneously from a single site. Because they utilize many other search engines, metasearch engines can be useful for brainstorming or finding very specific details. However, since metasearch engines tend to usurp control of the search, their results can be incomplete or can contain pages that are unrelated to the subject in question but that happen to contain some of the keywords. As a result, metasearch engines make it harder to determine why certain pages were included in the results and to explain to others how a page was found. Failing to explain exactly how a particular piece of evidence was found can weaken a case. Furthermore, the large number of hits that are common in metasearch engines can be overwhelming and can hinder an investigation.
Although metasearch engines can be useful when searching for very specific details (e.g. occurrences of a telephone number on a Web page), it is important to also search specialized search engines or databases (e.g. telephone directories) when looking for fine details.
There are many databases on the Web containing data within specific subject areas. For example, online databases contain information about sex offenders, missing children, individuals' assets and credit history, and medical information. Many of these databases can be located using search engines but the information they contain can only be queried directly. For instance, using Google or Altavista for "sex AND offender AND database" leads to various Sex Offender Registries around the United States. Some databases are organized on the following Web sites, making them easier to find.
InvisibleWeb: http://invisibleweb.com
Internets: http://www.internets.com
JournalismNet: http://www.journalismnet.com
PowerReporting: http://www.powerreporting.com
There are also online databases, such as AutoTrack and KnowX, containing a wide variety of information about individuals but these databases charge fees for use.
Whois databases are particularly useful for investigations involving the Internet. Whois databases are maintained by Internet registrars and contain the names and contact information of people who are responsible for the many computer systems that make up the Internet. These databases can reveal who is responsible for a particular Web site, including their name, telephone number and address. There are separate Whois databases for different countries - some of the main databases are listed here and others can be found at Allwhois.[12]
United States (NetSol): http://www.netsol.com/cgi-bin/whois/whois
United States (ARIN): http://whois.arin.net/whois/index.html
Europe: http://www.ripe.net/db/whois.html
Asia: http://whois.apnic.net/
Some registrar databases only have information on high-level domains while others have information on IP addresses. For instance, to find the contact information for "www.wsex.com," search Netsol, whereas to find contact information for the associated IP address (207.42.132.101), search ARIN. Note that these databases have slightly different contact information for the World Sports Exchange.
| Domain name: www.wsex.com | IP Address: 207.42.132.101 |
|---|---|
| Registrant: Big Green (WSEX-DOM) | ISP: Cable & Wireless Antigua |
| SPRINT-CF2A87 | |
| | OrgName: World Sports Exchange |
| | OrgID: WSE-9 |
| | Address: Friar's Hill Road |
| Domain Name: WSEX.COM | Address: Woods Center, St John's |
| | City: |
| Administrative Contact: | StateProv: |
| | PostalCode: |
| | Country: AG |
| | NetRange: 207.42.132.96-207.42.132.127 |
| | CIDR: 207.42.132.96/27 |
| | NetName: CWAG-207-42-132-96 |
| | NetHandle: NET-207-42-132-96-1 |
| | Parent: NET-207-42-132-0-1 |
| | NetType: Reassigned |
| | Comment: |
| | RegDate: 2001-04-20 |
| | Updated: 2001-04-20 |
| | TechHandle: MH1271-ARIN |
| | TechName: Hayden, Matthew |
| Record expires on 19-Sep-2009. | TechPhone: (268)-480-3888 |
| Record created on 18-Sep-1996. | TechEmail: jay@wsex.com |
| Domain servers in listed order: | |
| 207.42.132.101 | |
| 207.42.132.119 | |
| 66.216.122.143 | |
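The NetRange and CIDR fields in the ARIN record describe the same block of addresses in two notations. A short check with Python's standard `ipaddress` module, using only the values shown in the record, confirms that the Web server's address falls inside the block reassigned to the World Sports Exchange:

```python
import ipaddress

# Values copied from the ARIN record for 207.42.132.101.
network = ipaddress.ip_network("207.42.132.96/27")
address = ipaddress.ip_address("207.42.132.101")

in_block = address in network            # True: the server is in the block
first, last = network[0], network[-1]    # 207.42.132.96 and 207.42.132.127
block_size = network.num_addresses       # 32 addresses in a /27

# Going the other way: derive the CIDR block from the NetRange endpoints.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("207.42.132.96"),
    ipaddress.ip_address("207.42.132.127"),
))
```

The same test shows that the first two domain servers listed (207.42.132.101 and 207.42.132.119) lie within this block, while the third (66.216.122.143) does not and would need a separate lookup.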
Sites such as Geektools[13] facilitate searches by providing a single interface to many Whois databases. It is also possible to search some Whois databases for other fields such as names and e-mail addresses. Some individuals use services like Domains by Proxy[14] to prevent their contact information from being placed in the Whois database system.
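Whois databases can also be queried directly rather than through a Web form. Under the Whois protocol (RFC 3912), a client opens a TCP connection to port 43, sends the query followed by a line ending, and reads the reply until the server closes the connection. The sketch below is a minimal client under stated assumptions: the server name must be supplied by the investigator (for example, `whois.arin.net` is assumed here to answer on port 43 for IP addresses), reply formats differ between registries, and `parse_fields` is a hypothetical helper that only understands simple "Key: value" lines like those in the ARIN record shown earlier.

```python
import socket


def whois_query(query, server, port=43, timeout=10.0):
    """Send `query` to a Whois server and return the raw text of the reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in reply.splitlines():
        # Skip comment lines and lines that are not key-value pairs.
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

A call such as `parse_fields(whois_query("207.42.132.101", "whois.arin.net"))` would return the fields of the current record, which may differ from the 2001 record reproduced above. The raw reply should be saved along with the date and time of the query, since Whois records change.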
Archives such as Google Groups contain millions of messages from tens of thousands of newsgroups. These archives are invaluable tools for investigators because they contain a vast amount of detailed information about individuals and their interactions. By searching this archive, it may be possible to learn about a person's interests, personality, and much more. However, these archives are not comprehensive and should not be depended on completely when dealing with Usenet. Few archives include message attachments and anyone can specify that they do not want their postings to be archived. Any newsgroup posting with "x-no-archive: yes" as its first line will be ignored by archiving software. Also, there are private newsgroups that are not archived.
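When an investigator has collected raw postings directly from a news server, it can be useful to know which of them would have been skipped by archives. The following sketch checks a raw posting for the opt-out marker. It looks at the first line of the message body, as described above, and also at an `X-No-Archive` message header, which is an assumption based on common practice rather than something stated in this text. The function name is hypothetical.

```python
from email import message_from_string


def opts_out_of_archiving(raw_post):
    """Return True if a raw Usenet posting asks archives not to keep it."""
    message = message_from_string(raw_post)

    # Assumed convention: the request appears as a message header.
    header = (message.get("X-No-Archive") or "").strip().lower()
    if header == "yes":
        return True

    # Convention described in the text: the request is the first body line.
    body = message.get_payload()
    if isinstance(body, str):
        lines = body.lstrip().splitlines()
        first = lines[0].strip().lower() if lines else ""
        return first.replace(" ", "") == "x-no-archive:yes"
    return False
```

Postings for which this function returns true are the ones most likely to be missing from an archive search, and therefore the ones worth preserving from the live newsgroup.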
Therefore, it is important for investigators to become familiar with and involved in the actual newsgroups related to an investigation rather than rely entirely on the archives. As well as seeing information that is not archived by Google Groups (e.g. images and other file attachments), it is useful to see discussions develop and progress, get to know the characters of the participants, and observe patterns of a particular group's behavior. Additionally, investigators may be able to observe offenders of their local community in newsgroups dedicated to a specific geographic region.
[10]An archive of many Web pages can be found at http://web.archive.org/
[11]http://www.searchenginewatch.com/facts/index.php
[12]http://www.allwhois.com
[13]http://www.geektools.com
[14]http://www.domainsbyproxy.com