Forensic Investigation: Not Exactly a Needle in a Haystack

These are some logical areas that may interest an investigator in locating digital evidence:

File space. This refers to blocks on the drive that are assigned either to an active file or to the file system itself, depending on the structure in use, such as FAT (Windows) or inodes (UNIX). Viewing interesting files in file space is merely a matter of using a disk editor, locating the file, and copying the file to other media for viewing by the investigator. In this fashion, the original media is not changed.
Slack space. This is the space made up of file system blocks that are only partially used by the file written to them. Slack space arises when a file system has written to a block and later overwrites it with new information that does not occupy the entire block, leaving data from the previous contents in the unused remainder. Tools like EnCase or a disk editor allow investigators to see the "junk" contained in the slack space. Slack space seldom contains enough information to reconstruct an entire file; however, there is often enough to interest investigators. File names, file extensions, and pieces of text files are the usual finds.

RAM space. This term describes the empty space between the end of the data and the end of the last sector written. If there is such a gap, the operating system fills it with information taken from the data currently in RAM. It can be similar to slack space in appearance.

Experience Note

An investigator conducting an analysis of a target hard drive was able to effectively refute a defendant's claim that he had never installed pirated software on his workstation. The defendant had installed a number of expensive applications on his workstation, deleted them, and attempted to write over the disk space. However, there was enough data left in the slack space to demonstrate that he had indeed installed these applications. The most incriminating evidence was the extensions of the applications' files.

Unallocated file space. This is any unclaimed sector, whether or not it falls within an active partition.
Unclaimed sectors can often be restored by undelete utilities, depending on the operating system and on whether the unallocated file space has been partially overwritten.

Physical Level Search

Investigators should consider beginning with the raw data contained on the target media. These analyses are often performed with tools like a disk editor or EnCase. Working from a forensically sound duplicate, many experienced investigators will perform these principal processes:

String search
Slack space
Free space examination

All analysis operations must be performed on the forensic image or the restored image of the evidence. Never perform examinations on the original evidence.

A frequently pursued avenue is to run string searches that produce lists of data; for example:

All e-mail addresses
All Web site URLs
All gif and jpeg file extensions
String searches matching specific words
String search

Experience Note

There is a very handy DOS-based program called SearchString written by Dan Mares. It is available at www.maresware.com. This tool provides the context of each string search hit as well as its location, expressed as the byte offset from the beginning of the file. Given the specific string to be searched for, the tool scans the target media and reports the relative location of each occurrence.

Also, most disk editors have well-developed string search capabilities. Many experienced investigators use disk editors to search for file extensions that are pertinent to the case, e.g., eml, png, gif, jpg, doc, txt, or exe.
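
The kind of string search described above can be sketched in a few lines of Python. This is only an illustration, not a substitute for a forensic tool: the function name search_image and both patterns are assumptions made for the example, and the patterns are deliberately simple. The sketch reports the byte offset of each hit along with a little surrounding context, in the spirit of the tools mentioned above.

```python
import re

# Illustrative patterns only; real casework usually needs broader ones.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(rb"https?://[^\s\"'<>]+")


def search_image(data, pattern, context=16):
    """Return (byte_offset, match, surrounding_bytes) for each hit in data."""
    hits = []
    for m in pattern.finditer(data):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(data))
        hits.append((m.start(), m.group(), data[start:end]))
    return hits


# A tiny stand-in for raw bytes read from a forensic image (never the original).
sample = b"\x00\x00junk alice@largeU.edu \xff see http://www.maresware.com now\x00"
for offset, match, ctx in search_image(sample, EMAIL_RE):
    print(offset, match, ctx)
```

For a real image, the bytes would come from a file opened in binary mode on the forensic copy; very large images would need to be read in overlapping chunks rather than all at once.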

File Slack and Free Space

Depending on the operating system's file system, there will be residue that can be located and examined when looking for evidence. File residue basically falls into two categories: file slack and free space.

Free space is that space located on a hard drive that is not allocated to a file. It can be space that has never been allocated to a file or space that is considered unallocated. This unallocated condition usually occurs after a file has been deleted. Unallocated file space occurring after a file has been deleted will often contain remnants of the deleted file. Fragmented data previously written could still reside in these areas and not be easily accessible to the everyday user. In order to gain visibility into these areas, it is necessary to work on the physical level.

Slack space occurs when data is written to a storage medium in amounts that fail to completely fill the block size defined by the operating system. Investigators attempting to look into this area for evidence will also have to work beneath the operating system, at the physical level of the medium.
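
The arithmetic behind slack space can be made concrete with a short Python sketch. The sector size of 512 bytes and the 8 sectors per block are assumptions chosen for illustration, as is the function name slack_layout; actual values depend on the file system and how the volume was formatted. The sketch splits the unused tail of a file's last block into the gap up to the end of the last sector written and the whole sectors left untouched after it.

```python
def slack_layout(file_size, sector_size=512, sectors_per_block=8):
    """Describe the unused tail of the last block occupied by a file."""
    block_size = sector_size * sectors_per_block
    used_in_last_block = file_size % block_size
    if used_in_last_block == 0:
        # Empty file, or the data ends exactly on a block boundary.
        return {"sector_tail": 0, "unused_sectors": 0, "total_slack": 0}
    total_slack = block_size - used_in_last_block
    # Bytes between the end of the data and the end of its last sector.
    sector_tail = (-file_size) % sector_size
    # Whole sectors in the block that the new data never touched.
    unused_sectors = (total_slack - sector_tail) // sector_size
    return {
        "sector_tail": sector_tail,
        "unused_sectors": unused_sectors,
        "total_slack": total_slack,
    }


print(slack_layout(5000))
```

With these assumed sizes, a 5,000-byte file occupies two 4,096-byte blocks and leaves 3,192 bytes of the second block unused: 120 bytes up to the end of the last sector written, plus six untouched sectors that may still hold older data.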

Experience Note

An employee had been downloading obscene images to his workstation and subsequently deleting them. After a time, he performed word processing and other types of work, thinking this had overwritten the images he had previously downloaded and would make viewing them impossible. Fragments of these images and their file extensions remained in the slack space and unallocated file space of his workstation hard drive. After forensically imaging the hard drive, investigators peered into the slack areas using a disk editor. Investigators were aware that most photographic-quality image files have extensions such as .gif, .jpeg, and .png. They merely used the find function of the disk editor to perform a string search for these extensions. Experience and training had taught them that in DOS-based operating systems the first character of a deleted file's name is replaced by the σ character (lower-case sigma), which has a hexadecimal value of E5h. They easily located the deleted files. After completing their search, they were able to identify the nature of the deleted files by their names and extensions and even recover some of the image fragments.

DOS-Based Operating Systems File Deletions

The file deletion process in DOS-based operating systems has two steps. In the first, the operating system replaces the first character of the file's directory entry with the lower-case sigma character, σ. This character has a hexadecimal value of E5h. In the second, it clears the file's FAT chain, marking all of the file's data blocks as free. In principle, many operating systems handle file deletions in a similar fashion.

Undelete utilities, such as those in the Norton Utilities suite, search the directory tree for file names beginning with σ, that is, with the hexadecimal value E5h. Once such an entry is found, the utility starts at the first cluster specified in the directory entry. If that cluster is not claimed by another file in the file allocation table (FAT), the utility will indicate that the file has a good chance of recovery. In processing, the utility also looks at the file size specified in the directory entry and determines whether the blocks needed to hold a file of that size are free. Many commercial file recovery utilities will reconstruct the deleted file by replacing the sigma character with another recognizable character and rebuilding the file's FAT chain.
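
The first step an undelete utility takes, finding directory entries marked with E5h, can be sketched as follows. This is a simplified illustration that assumes the classic 32-byte FAT directory entry (8-character name, 3-character extension, attribute byte at offset 11, starting cluster at offset 26, file size at offset 28) on a FAT12 or FAT16 volume; the function names find_deleted_entries and make_entry are hypothetical, and the sketch does not attempt the recovery itself.

```python
import struct

ENTRY_SIZE = 32          # size of one FAT directory entry in bytes
DELETED_MARKER = 0xE5    # first byte of a deleted entry
LFN_ATTR = 0x0F          # attribute value used by long-file-name entries


def find_deleted_entries(directory):
    """Return (name, first_cluster, size) for deleted 8.3 entries."""
    found = []
    for pos in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = directory[pos:pos + ENTRY_SIZE]
        if entry[0] == 0x00:          # no further entries in this directory
            break
        if entry[0] != DELETED_MARKER or entry[11] == LFN_ATTR:
            continue
        # The first character was overwritten by the marker; show it as "?".
        base = "?" + entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        first_cluster, size = struct.unpack_from("<HI", entry, 26)
        found.append((base + ("." + ext if ext else ""), first_cluster, size))
    return found


def make_entry(name, ext, first_cluster, size):
    """Build a minimal directory entry for demonstration purposes only."""
    raw = name.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")
    raw += bytes([0x20]) + bytes(14)  # attribute byte, then fields unused here
    return raw + struct.pack("<HI", first_cluster, size)


live = make_entry("REPORT", "DOC", 2, 1000)
dead = b"\xe5" + make_entry("PHOTO1", "JPG", 5, 20480)[1:]
print(find_deleted_entries(live + dead + bytes(ENTRY_SIZE)))
```

A recovery tool would go on to check whether the starting cluster and the clusters needed for the recorded size are still free in the FAT before attempting to rebuild the chain.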

Reading E-Mail Headers

As it appears in your e-mail client, e-mail seems to pass directly from the sender to the recipient without any intermediate steps. In fact, an e-mail typically passes through at least four computers on its route. In the case of an ISP whose users connect via dial-up, DSL, cable Internet, or T1, the client is the user's machine and the actual mail server belongs to the user's ISP. To review the process, when a user sends e-mail, she normally composes the message on her workstation and sends it off to a mail server located either within her company or at her ISP. At this point, her workstation usually keeps a copy of the e-mail in the sent folder. Even if she deletes the contents of the sent folder, the e-mail will reside in the deleted folder until she deletes it from that folder as well.

Experience Note

It is possible that the e-mail client is configured to automatically empty the deleted folder, but as you have seen, there are ways to recover deleted files.

The e-mail server receives the message from her workstation and begins to look for the recipient's e-mail server, exchanging information packets with that server and eventually delivering the message. It does not really matter whether she is sending her e-mail through the Internet or merely within her own organization. For practical purposes, the process is basically the same. The e-mail will reside on the recipient's server until the recipient opens his e-mail client and reads it. Depending on the e-mail configuration and the type of e-mail server, the server either retains a copy of the e-mail or downloads it to the recipient's e-mail client on his workstation. Even when the e-mail has been downloaded to the recipient's workstation and removed from the account, it is very possible that a copy remains on the e-mail server's backup storage. Tenacious investigators will pursue the chance of obtaining a copy of the e-mail from one of the many e-mail servers involved in the message's transmission and receipt.

E-Mail Processing

For example, consider the users <alice@largeU.edu> and <bob@biggycorp.com>. Bob is a dial-up user at biggycorp.com while Alice is located on the university network, largeU.edu. If Alice wants to send an e-mail to Bob, she composes it at her workstation and the message is transmitted to the mail server at largeU.edu. In her mind, this is the last time she will see the e-mail. Alice's server contacts Bob's e-mail server at biggycorp.com and delivers Alice's message to it, where the message resides until Bob retrieves it through his e-mail client.

During this e-mail processing, there are headers added to the message: at the time of Alice's e-mail composition, when that program passes the e-mail to the largeU e-mail server, and when the e-mail is transmitted to Bob's e-mail server at biggycorp.com.

As generated by Alice's e-mail program and transmitted to email.largeU.edu:

From: <Alice@largeU.edu>
To: <Bob@biggycorp.com>
Date: Tue, Mar 18 2003 18:20:15PST
X-Mailer: Sendmail v2.2
Subject: Up for lunch today?

This is the transmission from Alice's server to Bob's e-mail server:

Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)
From: <Alice@largeU.edu>
To: <Bob@biggycorp.com>
Date: Tue, Mar 18 2003 18:20:15PST
Message-Id: <alice031897143614-23446298@email.largeU.edu>
X-Mailer: Sendmail v2.2
Subject: Up for lunch today?

Below is Alice's e-mail header when mailhost.biggycorp.com completes processing the message and stores it for Bob to retrieve:

Received: from email.largeU.edu (email.largeU.edu [127.234.3.78]) by mailhost.biggycorp.com with SMTP id CTX39794 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:39:24-0800 (PST)
Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)
From: <Alice@largeU.edu>
To: <Bob@biggycorp.com>
Date: Tue, Mar 18 2003 18:20:15PST
Message-Id: <alice031897143614-23446298@email.largeU.edu>
X-Mailer: Sendmail v2.2
Subject: Up for lunch today?

Here's a line-by-line description of these headers and exactly what each one means:

Received: from email.largeU.edu	This e-mail was received from a machine calling itself email.largeU.edu.
(email.largeU.edu [127.234.3.78])	This part shows that the machine is really named email.largeU.edu and has the IP address 127.234.3.78.
with SMTP id CTX39794	The receiving server assigned the identification number CTX39794 to the message. This number is unique and can be used to look up the message in the server's log files.
for <Bob@biggycorp.com>;	This message was addressed to <Bob@biggycorp.com>.
Tue, 18 Mar 2003 14:39:24-0800 (PST)	This transfer took place on Tuesday, March 18, 2003, at 14:39:24 (2:39:24 in the afternoon) Pacific Standard Time (which is 8 hours behind Greenwich Mean Time; hence the time written: "-0800").
Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)	This line indicates that the mail transmission from theta.largeU.edu (Alice's workstation) to email.largeU.edu (Alice's e-mail server) happened at 14:36:17 Pacific Standard Time. The sending machine called itself theta.largeU.edu. It really is named theta.largeU.edu and has the IP address 106.104.3.33. The receiving server assigned the ID number 004A21 to this e-mail for internal logging and processing.
From: <Alice@largeU.edu>	The mail was sent by <Alice@largeU.edu>.
To: <Bob@biggycorp.com>	The letter is addressed to <Bob@biggycorp.com>.
Date: Tue, Mar 18 2003 18:20:15PST	The message was composed at 18:20:15 Pacific Standard Time on Tuesday, March 18, 2003, according to the clock on the sender's machine.
Message-Id: <alice031897143614-23446298@email.largeU.edu>	The message has been given this identifier (by email.largeU.edu). This ID is different from the SMTP ID numbers in the "Received" headers, because it is attached to this message for life; the other IDs are only associated with specific mail transactions at specific machines. In this fashion, one machine's ID number means nothing to another machine.
X-Mailer: Sendmail v2.2	The message was sent using a UNIX program called Sendmail, version 2.2.
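
The field-by-field reading above can be mirrored in a small Python sketch that splits a Received line into its parts and converts the timestamp to a time-zone-aware value. The regular expressions are assumptions tailored to the two line layouts shown in this section (including the missing space before the "-0800" offset); Received lines in the wild vary far more, so treat this as an illustration rather than a general parser. The names parse_received and parse_timestamp are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

RECEIVED_RE = re.compile(
    r"(?:Received: )?from (?P<claimed>\S+) \((?P<resolved>\S+) \[(?P<ip>[\d.]+)\]\) "
    r"by (?P<by>\S+)(?: with (?P<protocol>\S+))?(?: id (?P<id>[^\s;]+))?"
    r"(?: for <(?P<recipient>[^>]+)>)?; (?P<date>.+)$"
)
# Accepts both "18 Mar 2003" and "Mar 18 2003", as both appear in the examples.
DATE_RE = re.compile(
    r"(?P<a>\w+) (?P<b>\w+) (?P<year>\d{4}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) ?(?P<tz>[+-]\d{4})"
)


def parse_timestamp(text):
    """Return an aware datetime for the date portion of a Received line."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    day, month = (m["a"], m["b"]) if m["a"].isdigit() else (m["b"], m["a"])
    offset = timedelta(hours=int(m["tz"][1:3]), minutes=int(m["tz"][3:5]))
    if m["tz"][0] == "-":
        offset = -offset
    return datetime(int(m["year"]), MONTHS.index(month) + 1, int(day),
                    int(m["h"]), int(m["m"]), int(m["s"]),
                    tzinfo=timezone(offset))


def parse_received(line):
    """Split one Received line into named fields, or return None."""
    m = RECEIVED_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["timestamp"] = parse_timestamp(fields["date"])
    return fields


line = ("Received: from email.largeU.edu (email.largeU.edu [127.234.3.78]) "
        "by mailhost.biggycorp.com with SMTP id CTX39794 "
        "for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:39:24-0800 (PST)")
info = parse_received(line)
print(info["claimed"], info["ip"], info["id"])
print(info["timestamp"].astimezone(timezone.utc))
```

Converting every timestamp to UTC makes it easier to compare hops recorded by servers in different time zones.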

E-Mail with Firewall Headers

A computer trying to deliver e-mail to a system behind a firewall has to exchange information with the firewall rather than with the system itself. The firewall then behaves like any other machine that passes e-mail along. The e-mail example from above would be modified to resemble this:

Received: from firewall.biggycorp.com (firewall.biggycorp.com [121.214.13.129]) by mailhost.biggycorp.com with SMTP id CTX39794 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:40:34-0800 (PST)
Received: from email.largeU.edu (email.largeU.edu [127.234.3.78]) by firewall.biggycorp.com with SMTP id CTX39794 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:39:24-0800 (PST)
Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)
From: <Alice@largeU.edu>
To: <Bob@biggycorp.com>
Date: Tue, Mar 18 2003 18:20:15PST
Message-Id: <alice031897143614-23446298@email.largeU.edu>
X-Mailer: Sendmail v2.2
Subject: Up for lunch today?

If an outgoing e-mail message from largeU.edu were passed through a firewall, the outgoing firewall would insert an additional Received line. In the same fashion, the e-mail may pass through other common routing points. For example, if biggycorp.com maintains machines in different physical locations and uses separate mail servers, it would be quite possible to see many headers, as in this example:

Received: from mailgate.biggycorp.com (mailgate.biggycorp.com [121.214.34.102]) by mail5.biggycorp.com with SMTP id PDA30141 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:41:08-0800 (PST)
Received: from firewall.biggycorp.com (firewall.biggycorp.com [121.214.13.129]) by mailgate.biggycorp.com with SMTP id CTX39794 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:40:34-0800 (PST)
Received: from firewall.largeU.edu (firewall.largeU.edu [127.234.4.13]) by firewall.biggycorp.com with SMTP id PDA28874 for <Bob@biggycorp.com>; Tue, 18 Mar 2003 14:39:34-0800 (PST)
Received: from email.largeU.edu (email.largeU.edu [127.234.3.78]) by firewall.largeU.edu with SMTP id PDA61271; Tue, 18 Mar 2003 14:39:08-0800 (PST)
Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)
From: <Alice@largeU.edu>
To: <Bob@biggycorp.com>
Date: Tue, Mar 18 2003 18:20:15PST
Message-Id: <alice031897143614-23446298@email.largeU.edu>
X-Mailer: Sendmail v2.2
Subject: Up for lunch today?

The history of the e-mail can be seen by reading the Received headers from bottom to top. It traveled from theta.largeU.edu to email.largeU.edu, to firewall.largeU.edu, to firewall.biggycorp.com, to mailgate.biggycorp.com, and finally to mail5.biggycorp.com. Here the e-mail is stored, waiting for Bob to read it.
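
Reading the Received headers from bottom to top can be automated with Python's standard email package. The sketch below is a minimal illustration: the function name trace_route is hypothetical, the sample message is a shortened version of the headers above, and the host names are pulled out by simply looking for the words "from" and "by", which is enough for these examples but not for every real-world header.

```python
from email import message_from_string


def trace_route(raw_message):
    """Return (sending_host, receiving_host) pairs, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    # Each server prepends its own Received line, so reverse the list.
    for value in reversed(msg.get_all("Received", [])):
        words = value.split()
        sender = words[words.index("from") + 1] if "from" in words else "?"
        receiver = (words[words.index("by") + 1].rstrip(";")
                    if "by" in words else "?")
        hops.append((sender, receiver))
    return hops


SAMPLE = (
    "Received: from firewall.largeU.edu (firewall.largeU.edu [127.234.4.13]) "
    "by firewall.biggycorp.com with SMTP id PDA28874; "
    "Tue, 18 Mar 2003 14:39:34-0800 (PST)\n"
    "Received: from email.largeU.edu (email.largeU.edu [127.234.3.78]) "
    "by firewall.largeU.edu with SMTP id PDA61271; "
    "Tue, 18 Mar 2003 14:39:08-0800 (PST)\n"
    "Received: from theta.largeU.edu (theta.largeU.edu [106.104.3.33]) "
    "by email.largeU.edu id 004A21; Tue, Mar 18 2003 14:36:17-0800 (PST)\n"
    "From: <Alice@largeU.edu>\n"
    "To: <Bob@biggycorp.com>\n"
    "Subject: Up for lunch today?\n"
    "\n"
    "Body text.\n"
)

for sender, receiver in trace_route(SAMPLE):
    print(sender, "->", receiver)
```

Each printed line is one hop, so the output reads in the order the message actually traveled.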

Relaying

The following examples provide some e-mail header possibilities:

Received: from emailserver.emailpassing.com (emailserver.emailpassing.com [98.134.34.32]) by email.largeU.edu id 004B32 for <Alice@largeU.edu>; Wed, Jul 20 2003 16:39:50-0800 (PST)
Received: from tipidwater.com ([104.128.23.205]) by emailserver.emailpassing.com with SMTP id PDA12741; Wed, Jul 20 2003 19:36:28-0500 (EST)
From: Shameless Spammer <junkmail@tipidwater.com>
To: (recipient list suppressed)
Message-Id: <w45ppz23-34ls5@emailserver.emailpassing.com>
X-Mailer: Massive Annoyance
Subject: FREE PRESCRIPTION DRUGS

From an investigator's point of view, there are some interesting features in this header example. The message originated at tipidwater.com and was transmitted to emailserver.emailpassing.com and then on to its ultimate destination, email.largeU.edu. Basically, tipidwater.com merely connected to the SMTP port at emailserver.emailpassing.com and directed this server to transmit the e-mail message to <Alice@largeU.edu>. Spammers frequently use this type of e-mail forwarding to disguise their true location and avoid detection and identification. They simply look for an open SMTP machine that is so poorly configured that it will relay e-mail from their location. Pointing their e-mail client at that relaying machine, spammers send their e-mail using the resources and bandwidth of the organization's SMTP server.
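
One rough way to spot a candidate relay in a chain of hops is to look for an intermediate server whose domain matches neither the originating host nor the final receiving server. The Python sketch below illustrates that heuristic only; the function names are hypothetical, the domain comparison naively keeps the last two labels of each host name, and a flagged host is not proof of abuse, since legitimate forwarding services produce the same pattern.

```python
def registered_domain(host):
    """Naive: keep the last two labels (wrong for names like example.co.uk)."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def third_party_relays(hops):
    """hops is a list of (sending_host, receiving_host), oldest hop first."""
    if not hops:
        return []
    origin = registered_domain(hops[0][0])
    destination = registered_domain(hops[-1][1])
    flagged = []
    # The last receiver is the destination itself, so skip the final hop.
    for _, receiver in hops[:-1]:
        domain = registered_domain(receiver)
        if domain not in (origin, destination) and receiver not in flagged:
            flagged.append(receiver)
    return flagged


spam_hops = [
    ("tipidwater.com", "emailserver.emailpassing.com"),
    ("emailserver.emailpassing.com", "email.largeU.edu"),
]
print(third_party_relays(spam_hops))
```

Applied to the spam example above, the heuristic singles out emailserver.emailpassing.com, the machine whose logs an investigator would want to obtain next.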

Experience Note

E-mail servers should not be allowed to relay e-mail except from IP addresses originating within the organization's networks. Poorly configured e-mail servers mark the organization as having sloppy system configurations.

Common E-Mail Headers

Message-Id: The Message-Id is a unique identifier assigned to the message, usually by the first e-mail server to handle it. It takes the form <identifier@domain>, for example <alice031897143614-23446298@email.largeU.edu>. The part before the @ identifies the individual message, and the part after it is the name of the machine or domain that assigned the ID. A message ID that is malformed, or that names a site other than the apparent site of origin, can be indicative of a forgery (spoofing).
Content-Transfer-Encoding: This header relates to MIME (Multipurpose Internet Mail Extensions), a method of enclosing non-text content in e-mail messages. It does not have any effect on the delivery of e-mail, but it tells MIME-compliant mail programs how to handle the message content.
Mime-Version: (also seen as MIME-Version) This is another MIME header. This header identifies the version of the MIME protocol used by the message-sender.
Newsgroups: This header only appears in e-mail posted to a Usenet (Newsgroup).
Reply-To: Identifies the address to which replies should be sent. Although this header has many legitimate purposes, it is often used by spammers to conceal their identities and origins.
X-Confirm-Reading-To: This e-mail header requests an automated confirmation reply.
X-Mailer: (also X-mailer) Information relating to the sender's e-mail software.
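
As a small illustration of examining the Message-Id header, the Python sketch below checks that an ID has the general <identifier@domain> shape and whether its domain matches one of the hosts already seen elsewhere in the headers. The function name check_message_id and the three result labels are assumptions for this example, and the pattern is intentionally loose; a mismatch is a lead to follow up, not proof of forgery.

```python
import re

# Loose shape check: <something@something>, with no spaces or nested brackets.
MESSAGE_ID_RE = re.compile(r"^<([^<>@\s]+)@([^<>@\s]+)>$")


def check_message_id(message_id, known_hosts):
    """Classify a Message-Id against hosts seen in the Received headers."""
    m = MESSAGE_ID_RE.match(message_id.strip())
    if m is None:
        return "malformed"
    domain = m.group(2).lower()
    hosts = {h.lower() for h in known_hosts}
    return "consistent" if domain in hosts else "unrecognized host"


hosts = ["theta.largeU.edu", "email.largeU.edu"]
print(check_message_id("<alice031897143614-23446298@email.largeU.edu>", hosts))
print(check_message_id("<w45ppz23-34ls5@emailserver.emailpassing.com>", hosts))
```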

Network Resources

Here are two valuable resources when researching domain, ISP, and network contacts:

www.forensicsweb.com/downloads/cfid/isplist/isplist.htm
www.loc.gov/copyright/onlinesp/list/index.html

Networking Review

If a refresher is needed on IP addresses, a good resource is RFC 791, available at www.ietf.org/rfc/rfc0791.txt?number=791.
A review of the Open Systems Interconnect, OSI, model is available at www.inetdaemon.com/tutorials/theory/osi/.
A review of IP addresses is available at ftp://ftp.rfc-editor.org/in-notes/rfc1466.txt.
A review of domain name service is available at www.ietf.org/rfc/rfc1034.txt?number=1034.
A review of TCP/IP protocols is available at www.ietf.org/rfc/rfc1180.txt?number=1180.
There is a UNIX networking administration review available at www.unituebingen.de/zdv/projekte/linux/books/nag/node1.html.
A review of TCP/IP networks is available at www.onlamp.com/lpt/a/345.
