Digital Crime Scene Investigation Process | Digital Investigation Foundations

There is no single way to conduct an investigation. If you ask five people to find the person who drank the last cup of coffee without starting a new pot, you will probably see five different approaches. One person may dust the pot for fingerprints, another may ask for security camera tapes of the break room, and another may look for the person with the hottest cup of coffee. As long as we find the right person and do not break any laws in the process, it does not matter which process is used, although some are more efficient than others.

The approach that I use for a digital investigation is based on the physical crime scene investigation process [Carrier and Spafford 2003]. In this case, we have a digital crime scene that includes the digital environment created by software and hardware. The process has three major phases, which are system preservation, evidence searching, and event reconstruction. These phases do not need to occur one after another, and the flow is shown in Figure 1.1.

Figure 1.1. The three major phases of a digital crime scene investigation.

This process can be used when investigating both live and dead systems. A live analysis occurs when you use the operating system or other resources of the system being investigated to find evidence. A dead analysis occurs when you are running trusted applications in a trusted operating system to find evidence. With a live analysis, you risk getting false information because the software could maliciously hide or falsify data. A dead analysis is more ideal, but is not possible in all circumstances.

System Preservation Phase

The first phase in the investigation process is the System Preservation Phase where we try to preserve the state of the digital crime scene. The actions that are taken in this phase vary depending on the legal, business, or operational requirements of the investigation. For example, legal requirements may cause you to unplug the system and make a full copy of all data. On the other extreme could be a case involving a spyware infection or a honeypot[2] and no preservation is performed. Most investigations in a corporate or military setting that will not go to court use techniques in between these two extremes.

[2] A honeypot is "an information resource whose value lies in unauthorized or illicit use of that resource" [Honeynet Project 2004].

The purpose of this phase is to reduce the amount of evidence that may be overwritten. This process continues after data has been acquired from the system because we need to preserve the data for future analysis. In Chapter 3, "Hard Disk Data Acquisition," we will look at how to make a full copy of a hard disk, and the remainder of the book will cover how to analyze the data and search for evidence.

Preservation Techniques

The goal of this phase is to reduce the amount of evidence that is overwritten, so we want to limit the number processes that can write to our storage devices. For a dead analysis, we will terminate all processes by turning the system off, and we will make duplicate copies of all data. As will be discussed in Chapter 3, write blockers can be used to prevent evidence from being overwritten.

For a live analysis, suspect processes can be killed or suspended. The network connection can be unplugged (plug the system into an empty hub or switch to prevent log messages about a dead link), or network filters can be applied so that the perpetrator cannot connect from a remote system and delete data. Important data should be copied from the system in case it is overwritten while searching for evidence. For example, if you are going to be reading files, then you can save the temporal data for each file so that you have a copy of the last access times before you cause them to be updated.

When important data are saved during a dead or live analysis, a cryptographic hash should be calculated to later show that the data have not changed. A cryptographic hash, such as MD5, SHA-1, and SHA-256, is a mathematical formula that generates a very big number based on input data. If any bit of the input data changes, the output number changes dramatically. (A more detailed description can be found in Applied Cryptography, 2nd Edition [Schneier 1995].) The algorithms are designed such that it is extremely difficult to find two inputs that generate the same output. Therefore, if the hash value of your important data changes, then you know that the data has been modified.

Evidence Searching Phase

After we have taken steps to preserve the data we need to search them for evidence. Recall that we are looking for data that support or refute hypotheses about the incident. This process typically starts with a survey of common locations based on the type of incident, if one is known. For example, if we are investigating Web-browsing habits, we will look at the Web browser cache, history file, and bookmarks. If we are investigating a Linux intrusion, we may look for signs of a rootkit or new user accounts. As the investigation proceeds and we develop hypotheses, we will search for evidence that will refute or support them. It is important to look for evidence that refutes your hypothesis instead of only looking for evidence that supports your hypothesis.

The theory behind the searching process is fairly simple. We define the general characteristics of the object for which we are searching and then look for that object in a collection of data. For example, if we want all files with the JPG extension, we will look at each file name and identify the ones that end with the characters ".JPG." The two key steps are determining for what we are looking and where we expect to find it.

Part 2, "Volume Analysis," and Part 3, "File System Analysis," of this book are about searching for evidence in a volume and file system. In fact, the file system analysis chapters are organized so that you can focus on a specific category of data that may contain your evidence. The end of this chapter contains a summary of the popular investigation toolkits, and they all allow you to view, search, and sort the data from a suspect system so that you can find evidence.

Search Techniques

Most searching for evidence is done in a file system and inside files. A common search technique is to search for files based on their names or patterns in their names. Another common search technique is to search for files based on a keyword in their content. We can also search for files based on their temporal data, such as the last accessed or written time.

We can search for known files by comparing the MD5 or SHA-1 hash of a file's content with a hash database such as the National Software Reference Library (NSRL) (http://www.nsrl.nist.gov). Hash databases can be used to find files that are known to be bad or good. Another common method of searching is to search for files based on signatures in their content. This allows us to find all files of a given type even if someone has changed their name.

When analyzing network data, we may search for all packets from a specific source address or all packets going to a specific port. We also may want to find packets that have a certain keyword in them.

Event Reconstruction Phase

The last phase of the investigation is to use the evidence that we found and determine what events occurred in the system. Our definition of an investigation was that we are trying to answer questions about digital events in the system. During the Evidence Searching Phase, we might have found several files that violate a corporate policy or law, but that does not answer questions about events. One of the files may have been the effect of an event that downloaded it, but we should also try to determine which application downloaded it. Is there evidence that a Web browser downloaded them, or could it be from malware? (Several cases have used malware as a defense when contraband or other digital evidence has been found [George 2004; Brenner, Carrier, and Henninger 2004].) After the digital event reconstruction phase, we may be able to correlate the digital events with physical events.

Event reconstruction requires knowledge about the applications and the OS that are installed on the system so that you can create hypotheses based on their capabilities. For example, different events can occur in Windows 95 than Windows XP, and different versions of the Mozilla Web browser can cause different events. This type of analysis is out of the scope of this book, but general guidelines can be found in Casey [2004].

General Guidelines

Not every investigation will use the same procedures, and there could be situations where you need to develop a new procedure. This book might be considered a little academic because it does not cover only what exists in current tools. There are some techniques that have not been implemented, so you may have to improvise to find the evidence. Here are my PICL guidelines, which will hopefully keep you out of one when you are developing new procedures. PICL stands for preservation, isolation, correlation, and logging.

The first guideline is preservation of the system being investigated. The motivation behind this guideline is that you do not want to modify any data that could have been evidence, and you do not want to be in a courtroom where the other side tries to convince the jury that you may have overwritten exculpatory evidence. This is what we saw in the Preservation Phase of the investigation process. Some examples of how the preservation guideline is implemented are

Copy important data, put the original in a safe place, and analyze the copy so that you can restore the original if the data is modified.
Calculate MD5 or SHA hashes of important data so that you can later prove that the data has not changed.
Use a write-blocking device during procedures that could write to the suspect data.
Minimize the number of files created during a live analysis because they could overwrite evidence in unallocated space.
Be careful when opening files on the suspect system during a live analysis because you could be modifying data, such as the last access time.

The second guideline is to isolate the analysis environment from both the suspect data and the outside world. You want to isolate yourself from the suspect data because you do not know what it might do. Running an executable from the suspect system could delete all files on your computer, or it could communicate with a remote system. Opening an HTML file from the suspect system could cause your Web browser to execute scripts and download files from a remote server. Both of these are potentially dangerous, and caution should be taken. Isolation from the suspect data is implemented by viewing data in applications that have limited functionality or in a virtual environment, such as VMWare (http://www.vmware.com), that can be easily rebuilt if it is destroyed.

You should isolate yourself from the outside world so that no tampering can occur and so that you do not transmit anything that you did not want to. For example, the previous paragraph described how something as simple as an HTML page could cause you to connect to a remote server. Isolation from the outside world is typically implemented using an analysis network that is not connected to the outside world or that is connected using a firewall that allows only limited connectivity.

Note that isolation is difficult with live analysis. By definition, you are not isolated from the suspect data because you are analyzing a system using its OS, which is suspect code. Every action you take involves suspect data. Further, it is difficult to isolate the system from the outside world because that requires removing network connectivity, and live analysis typically occurs because the system must remain active.

The third guideline is to correlate data with other independent sources. This helps reduce the risk of forged data. For example, we will later see that timestamps can be easily changed in most systems. Therefore, if time is very important in your investigation, you should try to find log entries, network traffic, or other events that can confirm the file activity times.

The final guideline is to log and document your actions. This helps identify what searches you have not yet conducted and what your results were. When doing a live analysis or performing techniques that will modify data, it is important to document what you do so that you can later document what changes in the system were because of your actions.

Part I: Foundations

Digital Investigation Foundations