15.7 Investigative Reconstruction

The fundamentals of investigative reconstruction covered in Chapter 5 do not change when networks are involved. For instance, it may be necessary to perform a relational reconstruction to discern patterns in evidence obtained from a network. Figure 15.5, for example, shows network traffic represented as host-to-host connections, highlighting one host that is generating the most activity and deserves further attention.

Figure 15.5: Network traffic depicted in IP address-IP address connections creating a circular mesh using NetIntercept.

Creating this type of link diagram showing client-server connections can help identify important systems. For instance, in computer intrusion investigations, first focusing on the attacker's IP address can reveal which hosts were targeted and then examining traffic from each target can show which systems were compromised. Examining traffic from a compromised target can give investigators a general sense of what the attacker did on the system.
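The counting that underlies such a link diagram can be sketched in a few lines of Python. This is only a minimal illustration and not the method used by NetIntercept: the connection records, the addresses, and the function names `host_activity` and `targets_of` are all hypothetical.

```python
from collections import Counter

# Hypothetical (source IP, destination IP) pairs; placeholder addresses,
# not data from any investigation described in this chapter.
connections = [
    ("10.0.0.5", "10.0.0.20"),
    ("10.0.0.5", "10.0.0.21"),
    ("10.0.0.5", "10.0.0.22"),
    ("10.0.0.7", "10.0.0.20"),
    ("10.0.0.20", "10.0.0.5"),
]


def host_activity(pairs):
    """Count the connections each host takes part in, as source or destination."""
    counts = Counter()
    for src, dst in pairs:
        counts[src] += 1
        counts[dst] += 1
    return counts


def targets_of(pairs, address):
    """Return the set of hosts that the given address connected to."""
    return {dst for src, dst in pairs if src == address}


activity = host_activity(connections)
busiest_host, busiest_count = activity.most_common(1)[0]
print(busiest_host, busiest_count)
print(sorted(targets_of(connections, busiest_host)))
```

Starting from the busiest host (or a known attacker address) and listing its targets mirrors the approach described above of first focusing on the attacker and then examining each target in turn.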

However, the reconstruction process can be more challenging when networks are involved. A criminal or victim can be in several (virtual) places on a network at any given time, making the reconstruction process more complicated and arduous. For instance, a computer intruder may be sharing information with accomplices on IRC while breaking into computers around the world. Also, because it is difficult to obtain all relevant digital evidence on a network, there are often gaps in parts of the crime reconstruction.

CASE EXAMPLE

In an intellectual property theft case, one suspect has been identified but his contact within the organization is unknown. Most of the prime suspect's activities during the key time period are known except for details of his connections to Hushmail and Ziplip. Evidence on his hard drive indicates that he received stolen data at the time but it cannot be determined who sent them. Also, log files on the victim organization's network indicate that the prime suspect used a second dial-up account to access the Internet, connect to the organization's systems, and steal information but the Internet Service Provider for this second account does not have related log files. Without these intermediate log files, the continuity of offense cannot be established and the activities cannot be attributed to the offender.

An offender can also use the Internet to conceal his actual location by connecting through computers in other parts of the country or world. Computer intruders use this technique, launching their attack from a compromised computer in a distant location to hide their IP address and geographic location. Also, a Virtual Private Network (VPN) securely extends a local area network to anywhere in the world, providing an encrypted tunnel from the individual's computer at a remote location to the local network. In this way, people can connect their computers to a remote VPN server and obtain an IP address on that network, giving the impression that their computers are on the remote network (Figure 15.6).

Figure 15.6: VPN connection makes an offender in California appear to be in Connecticut, throwing investigators off track and giving the victim a false sense of security.

When AOL users access Web pages and some other Internet resources (e.g. AOL IM), their connections pass through proxies that AOL uses to manage network bandwidth but that also conceal the individual's actual IP address. Other types of connections do not pass through these proxies (e.g. a Telnet connection to a server on the Internet) and so disclose users' IP addresses that can be tied to an AOL user account.

Developing relational reconstructions is made more difficult by the mobility of hosts and changeability of networks. Computers can be moved, IP addresses reassigned, DNS entries changed, and individuals can connect to a computer remotely, or through a number of systems. Therefore, before assuming that an individual was in a particular location simply based on an IP address or the current location of the computer, examine the alternative possibilities closely. Furthermore, be careful not to assume too much from a log entry. A connection attempt recorded in network logs does not necessarily imply that an individual gained access to the system. Additional corroborating data is needed to determine if the individual successfully entered the system. Also, a functional analysis may reveal that the computer in question was configured to prevent such access.

Fortunately, networks often contain multiple sources of corroborating data that can be used to fill in any gaps, improve the fidelity of a reconstruction, and generally increase the certainty of what occurred. An intrusion investigation involving a Linux server compromised via FTP demonstrates the value of corroborating sources of evidence on a network.

CASE EXAMPLE

A computer intrusion was quickly detected by Tripwire when several system components were replaced using a rootkit (e.g. /bin/login, /usr/bin/du, /usr/bin/top, /usr/bin/killall, /usr/bin/find). The following entry in /var/log/secure showed a connection to the FTP server at the time:

    Apr 24 22:50:34 ftpserver in.ftpd[2103]: connect from 62.30.247.138

There was a corresponding entry in /var/log/wtmp as shown here:

    ftp ftp pc-62-30-247-138-do.blueyonder.co.uk [62.30.247.138] Tue Apr 24 22:50-22:50 (00:00)

This unauthorized connection was partially supported by the following entry in /var/log/messages, the only difference being the time stamp.^[9]

    Apr 25 02:50:40 ftpserver ftpd[2103]: ANONYMOUS FTP LOGIN FROM pc-62-30-247-138-do.blueyonder.co.uk [62.30.247.138], guest@here.com

Knowing that the intruder could have altered logs on the compromised host, digital investigators checked the intrusion detection system logs for a corresponding entry but did not find one. However, they did find an entry for a different time and source.

    [**] FTP-site-exec [**]
    04/25-02:48:44 04/25-02:49:37 63 62.122.10.221 -> 192.168.2.6 S: 4158 D: 21

To get a more detailed picture of what occurred, the digital investigators searched the NetFlow logs for all connections to and from the compromised computer. They found that the original connection from blueyonder.co.uk at 22:50:34 was part of a broader scan for FTP servers, which was not logged by the intrusion detection system. The NetFlow logs also showed that the actual intrusion occurred at 02:47:12 from 62-122-10-221.flat.galactica.it and that the intruder downloaded a patch from RPMfind.net and fixed the vulnerability. Intruders often fix the vulnerability they exploit to prevent other intruders from gaining unauthorized access and to hide the fact that the system may be compromised (if computer security professionals scan the system for vulnerabilities it will not raise an alarm).

The intrusion detection system and NetFlow logs provided more reliable sources of digital evidence (C4 on the Certainty Scale discussed in Chapter 7) than the tampered logs on the compromised host (C0). Rather than coming from the United Kingdom, the intrusion actually originated in Italy.
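The time stamp discrepancy between the two host log entries for the same FTP process in this case example can be quantified with simple arithmetic. The following Python sketch is only an illustration: syslog-style stamps omit the year, so the year used here is an assumption, and `parse_syslog_time` is a hypothetical helper name.

```python
from datetime import datetime

ASSUMED_YEAR = 2001  # assumption: syslog-style entries do not record the year


def parse_syslog_time(stamp, year=ASSUMED_YEAR):
    """Parse a syslog-style stamp such as 'Apr 24 22:50:34'."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


secure_time = parse_syslog_time("Apr 24 22:50:34")    # entry in /var/log/secure
messages_time = parse_syslog_time("Apr 25 02:50:40")  # entry in /var/log/messages

# Difference between the two entries recorded for the same FTP process
offset = messages_time - secure_time
print(offset)
```

A discrepancy of roughly four hours between entries that should describe the same event is a signal to seek corroboration from sources outside the compromised host, as the investigators did here.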

Piecing together the large amounts of data that are common in network investigations can also be a challenge. One approach is to only extract portions that seem relevant to the investigation. Consider a harassment case in which the offender was reading the victim's e-mail via a Web proxy.

CASE EXAMPLE

Starting with the e-mail server logs shown below, digital investigators determined when the offender was accessing the victim's account and that he was connected through a Web proxy.

    Apr 4 18:12:29 mailsrv imapd4[18788]: Login user= tsmith host= www-proxy.domain.net [10.10.2.10]
    Apr 4 18:16:03 mailsrv imapd4[18788]: Logout user= tsmith host= www-proxy.domain.net [10.10.2.10]
    Apr 5 17:52:47 mailsrv imapd4[19405]: Login user= tsmith host= www-proxy.domain.net [10.10.2.10]
    Apr 5 17:56:14 mailsrv imapd4[19405]: Logout user= tsmith host= www-proxy.domain.net [10.10.2.10]
    Apr 6 19:01:56 mailsrv imapd4[19956]: Login user= tsmith host= www-proxy.domain.net [10.10.2.10]
    Apr 6 19:04:42 mailsrv imapd4[19956]: Logout user = tsmith host= www-proxy.domain.net [10.10.2.10]

Extracting the portions of the Web proxy logs that corresponded to the e-mail server logs, digital investigators found the offender's IP address. As an example, the following simplified log segment from April 6, 2003 shows the e-mail of a victim of harassment being accessed through the Web proxy from IP address 172.16.34.14.

    172.16.34.14, anonymous, 4/6/02, 19:01:24, WWW-PROXY, mailsrv.ispX.com, GET, http://mailsrv.ispX.com/login.html, 200
    172.16.34.14, anonymous, 4/6/02, 19:02:02, WWW-PROXY, mailsrv.ispX.com, GET, http://mailsrv.ispX.com/tsmith/inbox.html, 200
    172.16.34.14, anonymous, 4/6/02, 19:03:27, WWW-PROXY, mailsrv.ispX.com, GET, http://mailsrv.ispX.com/tsmith/message13.html, 200
    172.16.34.14, anonymous, 4/6/02, 19:04:36, WWW-PROXY, mailsrv.ispX.com, GET, http://mailsrv.ispX.com/tsmith/message14.html, 200

The offending IP address was a DSL account and the ISP provided investigators with the subscriber information, including his home address. This individual was the victim's ex-boyfriend who used a Web proxy to conceal his IP address while connecting to the victim's e-mail account. A search of his computer revealed incriminating Web browser history logs and portions of the victim's e-mail messages, confirming that the suspect's computer had been used to access the victim's e-mail account. In conclusion, the harasser's computer was located using e-mail server and Web proxy server logs (C-value C4) and implicating evidence was found on his computer (C-value C5), indicating that it was used to commit the offense.
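The correlation step in this case example, matching Web proxy entries to a mail session by time, can be sketched as follows. This is a minimal illustration under stated assumptions: the year is taken from the proxy log because the mail server stamps omit it, the one-minute tolerance is an arbitrary choice to allow for the login page being requested shortly before the mail server records the login, the last proxy entry is a hypothetical unrelated request added for contrast, and the function names are invented for the example.

```python
from datetime import datetime, timedelta


def parse_proxy_time(date_text, time_text):
    """Parse the proxy log's M/D/YY date and HH:MM:SS time fields."""
    return datetime.strptime(f"{date_text} {time_text}", "%m/%d/%y %H:%M:%S")


# Login and logout times from the 6 April mail server entries (year assumed).
session_start = datetime(2002, 4, 6, 19, 1, 56)
session_end = datetime(2002, 4, 6, 19, 4, 42)

# (client IP, date, time, URL). The first four come from the excerpt above;
# the last is a hypothetical unrelated request.
proxy_entries = [
    ("172.16.34.14", "4/6/02", "19:01:24", "http://mailsrv.ispX.com/login.html"),
    ("172.16.34.14", "4/6/02", "19:02:02", "http://mailsrv.ispX.com/tsmith/inbox.html"),
    ("172.16.34.14", "4/6/02", "19:03:27", "http://mailsrv.ispX.com/tsmith/message13.html"),
    ("172.16.34.14", "4/6/02", "19:04:36", "http://mailsrv.ispX.com/tsmith/message14.html"),
    ("172.16.50.9", "4/6/02", "21:15:00", "http://www.example.com/"),
]


def clients_during(entries, start, end, slack=timedelta(minutes=1)):
    """Return client IPs with proxy requests inside the session window, widened by slack."""
    hits = set()
    for ip, date_text, time_text, _url in entries:
        when = parse_proxy_time(date_text, time_text)
        if start - slack <= when <= end + slack:
            hits.add(ip)
    return hits


print(clients_during(proxy_entries, session_start, session_end))
```

Repeating the same check for each mail session and intersecting the resulting sets would narrow the candidates further when the proxy is busy.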

The main problem with only extracting portions of logs is that important details might be missed. For instance, in the previous example, Web proxy logs from prior days might have shown the harasser accessing the victim's e-mail many times over an extended period, demonstrating persistent and intentional spying as opposed to a single, isolated event.

Another approach to dealing with large amounts of network-related data is to reconstruct smaller, more manageable portions of the crime separately before combining them into a complete crime reconstruction. For example, when criminal activity is spread out over an extended period of time, prioritizing and focusing on several critical periods and locations before combining them into a larger reconstruction will provide clues and leads more quickly than trying to reconstruct the entire crime all at once.

CASE EXAMPLE

A computer intruder broke into several servers over a period of months. It was not initially clear that the same individual had compromised all of these servers. The commonalities between these intrusions were only apparent after individual timelines were created using log files and file date-time stamps from each of the compromised systems. A rough timeline of the entire incident was constructed, providing an overview of events, but the individual timelines for each system were also useful to investigators in the long run because they contained more details.

It may not be possible to identify critical periods in a crime without performing some analysis on all available log files. Logs from routers, firewalls, intrusion detection systems, and other sources may only reveal important patterns when combined.^[10] For instance, when an intruder is targeting systems on a network, firewall logs may only show a few denied connection attempts that do not cause alarm on their own. Similarly, when viewed independently, system logs on the targeted hosts may not cause alarm. However, when combined with router and intrusion detection system logs, it may become clear that the denied connections were part of a more widespread series of attacks against several systems on the network. When performing temporal analysis on multiple log files, it is generally more efficient to combine them before sorting them and analyzing them for patterns.

However, before combining log files, it is crucial to correct for time zone differences and system clock discrepancies. Even log files from a single system can contain date-time stamps with different time zones. For instance, Microsoft's Internet Information Server logs are in GMT by default whereas the NT Event Logs generally use the local time. Internet service providers like AOL have been known to adjust date-time stamps in their logs into British Summer Time instead of GMT, resulting in a 1-hour discrepancy. Additionally, it may be necessary to rearrange certain log files before combining them with others. For instance, some logs are ordered by end time (e.g. pacct, NetFlow) and may provide a clearer picture of events when they are sorted by start time.
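The normalization described above can be sketched in Python as follows. This is a minimal illustration, assuming each log's time zone offset and any clock error have already been determined; the entries, offsets, three-minute skew, and function names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time, utc_offset_hours, clock_skew=timedelta(0)):
    """Convert a naive local time stamp to UTC after removing a known clock error.

    clock_skew is how far the logging system's clock ran ahead of true time.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    return (local_time - clock_skew).replace(tzinfo=zone).astimezone(timezone.utc)


def normalize(entries, utc_offset_hours, clock_skew=timedelta(0)):
    """Return entries with their time stamps converted to UTC."""
    return [(to_utc(t, utc_offset_hours, clock_skew), src, msg)
            for t, src, msg in entries]


# Hypothetical entries: (time stamp as logged, source, message)
web_log = [(datetime(2003, 4, 6, 23, 2, 10), "web", "GET /index.html")]         # logged in GMT
event_log = [(datetime(2003, 4, 6, 19, 1, 50), "eventlog", "logon")]             # local time, UTC-4
firewall_log = [(datetime(2003, 4, 6, 19, 5, 0), "firewall", "denied tcp/21")]   # UTC-4, clock 3 min fast

# Combine first, then sort, so that patterns across sources become visible.
merged = sorted(
    normalize(web_log, 0)
    + normalize(event_log, -4)
    + normalize(firewall_log, -4, clock_skew=timedelta(minutes=3))
)

for when, source, message in merged:
    print(when.isoformat(), source, message)
```

Without the corrections, the firewall entry would appear to follow the Web request; after normalization it precedes it. Logs ordered by end time can be handled the same way by sorting on the start time field before merging.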

In some cases, it may be necessary to determine how a criminal was able to commit the crime. For example, when an intruder breaks into a computer that appears to be secure, digital investigators may need to conduct a detailed functional reconstruction or even a reenactment to determine if an unknown vulnerability was exploited or if the intruder had inside information such as a password to the system. Whenever possible, as part of the functional reconstruction of a crime, investigators should replicate the process that created the digital evidence. When asked to testify that a certain process created a given piece of digital evidence, investigators may be asked if they verified the process or even to provide a demonstration. Additionally, trying to replicate the process can improve digital investigators' understanding of evidence and the criminal or victim.

In a missing persons investigation, there was a question regarding how much an individual deliberated over a goodbye e-mail message. Creating a test e-mail message and comparing the time stamps in the header may indicate how long it took the author to compose the message. For instance, the time in the Message-ID line of the following message indicates that it was started at 1019 hours on November 19 and the other times in the header indicate that it was sent at 1103 hours, a difference of 44 minutes.

    Received: from mail.corpX.com (mail.corpX.com [192.168.5.18])
         by lsh110.siteprotect.com (8.9.3/8.9.3) with ESMTP id KAA09889
         for <eco@corpus-delicti.com>; Tue, 19 Nov 2002 10:03:36 -0600
    Received: from localhost (sysadmln@localhost)
         by mail.corpX.com (8.11.6/8.11.6) with ESMTP id gAJG3W725027
         for <eco@corpus-delicti.com>; Tue, 19 Nov 2002 11:03:32 -0500
    Date: Tue, 19 Nov 2002 11:03:32 -0500 (EST)
    From: sysadmin <sysadmin@mail.corpX.com>
    To: eco@corpus-delicti.com
    Subject: Test time
    Message-ID: <Pine.LNX.4.44.0211191019020.14986-100000@mail.corpX.com>
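The comparison of header times can be sketched in Python. This is a minimal illustration that assumes the Message-ID layout seen in this one example (a digit run beginning with YYMMDDHHMM after the version fields) and assumes the embedded time is in the same local time zone as the Date header; other mail clients use different layouts. The function name `pine_message_id_time` is hypothetical, and any digits after the minutes are not interpreted.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

message_id = "<Pine.LNX.4.44.0211191019020.14986-100000@mail.corpX.com>"
date_header = "Tue, 19 Nov 2002 11:03:32 -0500 (EST)"


def pine_message_id_time(msg_id):
    """Extract the YYMMDDHHMM stamp embedded in a Message-ID of this layout."""
    match = re.search(r"Pine\.\w+\.\d+\.\d+\.(\d{10})", msg_id)
    if match is None:
        raise ValueError("no time stamp of the expected layout found")
    return datetime.strptime(match.group(1), "%y%m%d%H%M")


started = pine_message_id_time(message_id)

# Drop the zone so both values are naive local times (assumed to be the same zone).
sent = parsedate_to_datetime(date_header).replace(tzinfo=None)

# Compare at minute resolution, as in the text.
elapsed = sent.replace(second=0) - started
print(started, sent, elapsed)
```

For the header shown, the elapsed time is 44 minutes, matching the figure given above. Verifying the layout with a test message sent from the same mail client, as the text recommends, is what justifies reading the Message-ID this way.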

15.7.1 Behavioral Evidence Analysis

When examining digital evidence, particularly on networks, it is important to keep in mind that we are looking at effects of human activities and trying to reconstruct associated behavior and intent. People are creatures of habit to a certain degree - we seek the illusion of order, stability, and certainty in many areas of life. Our daily activities often revolve around things like our family, friends, meals, exercise, work, and entertainment. These activities can reflect our needs and, to some degree, our personalities and exposure to risk. For instance, bartenders and taxi drivers are at high risk of robbery and assault but also have access to a large number of potential victims. If someone becomes a victim, it is likely to occur through some aspect of his or her regular activities. If there is no clue how someone became a victim, some evidence may be missing or the targeting may have been opportunistic. Opportunistic does not mean random, because the offender selected the victim with a purpose and for certain reasons, whether it was the time, place, or victim's appearance. Offenders have patterns in life and crime - again, these patterns as seen in evidence can reveal their needs.

Log files are a particularly rich source of behavioral evidence because they record so many actions. Using the information in these log files it is often possible to determine with a high degree of detail what an individual did or was trying to achieve. An appreciation for patterns of activity in log files can help digital investigators differentiate between an automated worm and a computer intruder gaining unauthorized access to a computer. In some cases it is possible to discern modus operandi behaviors from log files that can be used to determine if the same computer intruder was responsible for multiple intrusions. Patience, familiarity with data processing tools, and some understanding of the underlying technology are required to sift through large log files for the few pieces of relevant information, but the effort will pay off in the long run as we become more reliant on technology.

It is often worthwhile to think about what the individual would have to do in order to achieve a given result, breaking activity into smaller segments and looking for signs of these segments. For instance, a computer intruder generally performs some level of surveillance of a target before attempting to break into the system. This approach can improve one's understanding of events, lead to additional sources of evidence, and give an indication of planning. Online sexual offenders often groom their victims to gain their trust - this can be a complex and prolonged process that can generate significant amounts of digital evidence.

CASE EXAMPLE

Individuals break into Web sites and vandalize the pages in retaliation for a perceived wrong and/or to assert their power over the owner(s) of the site. An obvious part of investigating this type of occurrence is to examine the log files of the Web server that was broken into for information about the intruders. Of course, this is obvious to intruders as well, so if they cannot delete the log files on the Web server they often break in from another computer that they have compromised. Typically, intruders will delete all of the digital evidence on the host they use to break into the Web server making it difficult for an investigator to track them down.

Fortunately, investigators can take advantage of a vandal's behavior and the Web server access log to narrow the pool of suspects. A vandal usually looks at the page after (and sometimes before) modifying it. The Web server access log contains IP addresses of computers that accessed the Web page. Therefore, by looking at entries in the log file around the time of the vandalism, investigators often find the IP address of the vandal. In many cases vandals use the browser on their personal computer to view the Web page so the IP address in the Web server access log is a direct link, bypassing any intermediate hosts that the vandal used to break into the Web server. Although it is not conclusive, this IP address can help investigators reconstruct the crime and find suspects.
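The search described in this case example can be sketched in Python. This is a minimal illustration with hypothetical data: the access log entries are shown already parsed, the addresses are documentation placeholders, the modification time would in practice come from a source such as file system time stamps, and the five-minute window is an arbitrary starting point.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed access log entries: (client IP, time, path)
access_log = [
    ("192.0.2.10", datetime(2003, 5, 1, 2, 58, 0), "/index.html"),
    ("198.51.100.7", datetime(2003, 5, 1, 3, 14, 30), "/index.html"),
    ("203.0.113.5", datetime(2003, 5, 1, 3, 16, 5), "/index.html"),
    ("203.0.113.5", datetime(2003, 5, 1, 3, 16, 9), "/images/logo.gif"),
    ("192.0.2.44", datetime(2003, 5, 1, 9, 30, 0), "/index.html"),
]

defaced_page = "/index.html"
modified_at = datetime(2003, 5, 1, 3, 15, 0)  # hypothetical time of the vandalism


def viewers_near(entries, page, when, window=timedelta(minutes=5)):
    """Return (IP, time) for requests of the page within the window, in time order."""
    hits = [(t, ip) for ip, t, path in entries
            if path == page and abs(t - when) <= window]
    return [(ip, t) for t, ip in sorted(hits)]


for ip, t in viewers_near(access_log, defaced_page, modified_at):
    print(ip, t.isoformat())
```

Addresses that viewed the page shortly before and after the modification are leads rather than conclusions; they still need to be corroborated with other evidence.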

Keep in mind that the same individual behavior can mean different things in different situations, so, rather than considering items of evidence in isolation, it is necessary to consider all activities together to gain insight into their overall meaning. Some individuals view Web pages via a Web proxy because the resources they are interested in are only accessible through the proxy. Some individuals use Web proxies to conceal their identities.

To understand how digital evidence on networks reflects behavior, it is instructive to consider some examples. When thieves target an organization's computer systems, their actions leave behind digital evidence that can reveal their intent, skill level, and knowledge of the target. Network logs may show a broad network scan prior to an intrusion, suggesting that the individual was exploring the network for vulnerable and/or valuable systems. This exploration implies that the individual does not have much prior knowledge of the network and may not even know what he/she is looking for but is simply prospecting. Conversely, thieves who have prior knowledge of their target will launch a more focused and intricate attack. For instance, if a thief only targets the financial systems on a network, this directness suggests that the intruder is interested in the organization's financial information and knows where it is located.

So, if the targeting is very narrow - the thief focuses on a single machine - this indicates that he/she is already familiar with the network and there is something about the machine that interests him/her. Similarly, time pattern analysis of the target's file system can show how long it took the intruder to locate desired information on a system. A short duration is a telltale sign that the intruder already knew where the data was located, whereas protracted searches of files on a system indicate less knowledge.

The sophistication of the intrusion and subsequent precautionary acts help determine the perpetrator's skill level. The thief's knowledge of the target and his/her criminal skill can be very helpful in narrowing the suspect pool, particularly when only a few individuals possess the requisite knowledge and skills - suggesting insider involvement.

^[9]The particular FTP exploit used in this intrusion often inserts an incorrect time stamp, possibly because it is using the time on the computer used to launch the attack.

^[10]Commercial software is available for combining and analyzing log files, but such tools are often limited to a few log formats or require customization to accommodate new log formats. Using such tools may be justified if they help digital investigators analyze log files they regularly encounter in many investigations. However, few tools surpass Perl and UNIX for special purpose tasks such as analyzing log files that are only encountered occasionally.