DD: A Forensic Duplication Tool | Anti-Hacker Tool Kit, Third Edition


The dd tool is used to copy bits from one file to another. Copying bits in this manner is the basis for all forensic duplication tools. dd is versatile and the source code is available to the public. Furthermore, dd can be compiled on nearly every Unix platform. This section discusses the methods that dd can implement to perform a forensic duplication.

dd was originally written for data conversion by Paul Rubin, David MacKenzie, and Stuart Kemp. The source code and man page don’t actually say what dd stands for, but it is generally thought of as “data dump.” dd is included in the GNU fileutils package and can be downloaded from http://mirrors.kernel.org/gnu/fileutils/.

Implementation

The command-line options pertinent to forensic duplication for dd are as follows:

if Specifies the input file to be read.
of Specifies the output file to be written.
bs Specifies the block size, in bytes, to be read and written.
count Specifies the number of blocks to copy from the input file to the output file.
skip Specifies the number of blocks to skip from the beginning before reading from the input file.
conv Allows extra arguments to be specified, some of which are as follows:
- notrunc Will not truncate the output file, so existing output is left in place rather than cut short.
- noerror Will not stop reading the input file in case of an error (that is, if bad blocks were read in, the process would continue).
- sync Will pad each short or unreadable input block with zeros up to the block size. Used in conjunction with the noerror option, this fills the corresponding output bits with zeros when an input error occurs.
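As a minimal sketch of how bs, skip, and count interact, the following commands run dd against a small scratch file instead of a real device. The temporary directory and the file names sample.bin and part.bin are assumptions made only for this illustration.

```shell
# Scratch directory and file names are illustrative, not from the text.
workdir=$(mktemp -d)
sample="$workdir/sample.bin"
part="$workdir/part.bin"

# Build a 16-byte input file made of four 4-byte "blocks".
printf 'AAAABBBBCCCCDDDD' > "$sample"

# Skip the first block, then copy the next two blocks of 4 bytes each.
dd if="$sample" of="$part" bs=4 skip=1 count=2 2>/dev/null

copied=$(cat "$part")
echo "$copied"    # prints BBBBCCCC

rm -rf "$workdir"
```

Because skip and count are both measured in blocks of bs bytes, changing bs changes the byte offsets that the same skip and count values refer to.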

It should be obvious that dd operates with files rather than directly on physical devices. However, open-source Unix operating systems such as Linux and FreeBSD implement devices as files. These special files, located in the /dev directory, allow direct access to devices mediated by the operating system. Therefore, input files to dd can be entire hard drives, partitions of hard drives, or other devices. To create a forensic duplication of a hard drive, the hard drive device file (for example, /dev/hdb in Linux or /dev/ad1 in FreeBSD) will be the input file. To create a forensic duplication of a single partition, the input file will be the partition device file (for example, /dev/hdb1 in Linux or /dev/ad1s1 in FreeBSD).

Naturally, the next consideration is what the destination will be for the duplication. The destination could be another hard drive (using the device files mentioned), which is called a bit-for-bit copy of the source hard drive. We could extend this idea beyond using hard drives as the destination media and use a tape drive instead, albeit a far slower method. The destination could also be a regular file (also denoted as an evidence file), saved on any file system as a logical file. This is typically the way most modern forensic duplications are stored, because of the ease of manipulation when moving the evidence file between storage devices. Lastly, the destination could be the standard output (that is, output to the display). Although we cannot do anything with the data being output directly to the screen (standard out) at this point, later in this section we will examine a method of duplication that will rely on this method.

All of these output destinations have been successfully used in the past for one reason or another when creating a forensic duplication. The type (also known as the method) of duplication is typically dictated by problems encountered during the duplication attempt, which are often out of the investigator’s control. For instance, if it is impossible to remove the hard drive from the source computer during a duplication and no other connectors are available to attach an additional storage hard drive, it would be difficult to save the hard drive’s contents directly to another hard drive. Similarly, you could not save the duplication to a regular file because it would have to be copied to media already in the source computer, therefore overwriting potential evidence. The only choice in such a case would be to image over a network, as will be discussed in upcoming sections.

Many options in dd can make forensic duplication more efficient. For instance, you can adjust the block size to make the copy faster on the host that dd is running on; the bs switch is typically set to 1KB or 1MB. Another option you should use is the conv switch, which accepts extra parameters that control the copying process. Two highly recommended parameters are noerror and notrunc. With noerror, dd continues past bad blocks read from the source media, and with notrunc the output on the evidence media is not truncated. Adding sync alongside noerror turns those bad blocks from the input into zeros in the output.
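The padding behavior of sync can be seen without a damaged disk, because dd pads any short input block in the same way. The sketch below feeds dd three bytes with an 8-byte block size; the block size and input string are arbitrary values chosen for the illustration.

```shell
# With conv=sync, the short 3-byte block is padded with NUL bytes
# so that a full 8-byte block is written.
padded_size=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')

# Without conv=sync, only the 3 bytes that were read are written.
plain_size=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c | tr -d ' ')

echo "with sync: $padded_size bytes, without sync: $plain_size bytes"
```

This is why sync keeps the output aligned with the input: every block written has the full block size, whether or not the read behind it was complete.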

Note

When duplicating CD-ROMs, be sure to use a block size that is a multiple of 2048 bytes.

It is always a good idea to generate a log when you’re performing a forensic duplication so that you can refer to it in the future or make it available for legal proceedings. The script command in Unix will capture the input and output of a Unix console or xterm session and save it to a file. It’s a good idea to run the following command before you start your duplication. You should type exit after you finish duplication.

forensic# script /root/disk.bin.duplication

Caution

Upcoming sections will show forensic duplication either on the forensic workstation (the destination computer) or on the source machine (the victim computer) and will transmit the duplication across the network. When you save the script file, you don’t want to save it on the media you are duplicating, as it will destroy some of the evidence. Because this file is small, it is best saved to a floppy disk if it cannot be saved to the destination hard drive’s logical file system.
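As a complement to script, dd reports its record counts on standard error, so they can also be redirected to a small log file of their own. This is only a sketch: the scratch files source.bin, disk.bin, and dd.log are hypothetical stand-ins, and on a real case the log should go to media other than the source, as cautioned above.

```shell
# Scratch files stand in for the source media, evidence file, and log.
workdir=$(mktemp -d)
src="$workdir/source.bin"
img="$workdir/disk.bin"
log="$workdir/dd.log"

printf 'evidence' > "$src"    # 8 bytes of sample data

# LC_ALL=C keeps the statistics in English; stderr goes to the log.
LC_ALL=C dd if="$src" of="$img" bs=4 conv=noerror,notrunc,sync 2> "$log"

records_in=$(grep 'records in' "$log")
echo "$records_in"    # prints 2+0 records in

rm -rf "$workdir"
```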

Forensic Duplication #1: Exact Binary Duplications of Hard Drives

To create a mirror image copy of a hard drive using dd, you must tell the utility which source hard drive will be the input file and which will be the output (the evidence) hard drive where you will store the image.

You can determine which hard drive is the source and which is the destination by studying the output of the dmesg command. In both Linux and FreeBSD, the dmesg command presents the information that appeared on the console as the machine was booted (and any other console messages that have appeared since bootup). Determining which hard drive is which isn’t a scientific process; rather, you might want to connect a storage hard drive made by a manufacturer different from that of the source hard drive, which makes it obvious which is the source and which is the destination. After you have cleansed the destination hard drive at /dev/hdd (discussed later in the section "dd: A Hard Drive Cleansing Tool"), the following syntax can be used to create a forensic duplication from a source hard drive attached to /dev/hdc in Linux:

forensic# dd if=/dev/hdc of=/dev/hdd bs=1024 conv=noerror,notrunc,sync
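A minimal sketch of the same copy followed by a byte-for-byte comparison is shown here, using two scratch files as stand-ins for /dev/hdc and /dev/hdd. The file names are hypothetical, and the comparison assumes that source and destination are the same size; a larger destination drive would need a different check.

```shell
workdir=$(mktemp -d)
source_dev="$workdir/source.img"   # stands in for /dev/hdc
dest_dev="$workdir/dest.img"       # stands in for /dev/hdd

# 64 KB of sample data for the stand-in source.
dd if=/dev/urandom of="$source_dev" bs=1024 count=64 2>/dev/null

# Same options as the command in the text.
dd if="$source_dev" of="$dest_dev" bs=1024 conv=noerror,notrunc,sync 2>/dev/null

# cmp -s is silent and returns success only if every byte matches.
if cmp -s "$source_dev" "$dest_dev"; then
    verdict="identical"
else
    verdict="different"
fi
copied_bytes=$(wc -c < "$dest_dev" | tr -d ' ')
echo "$copied_bytes bytes copied, result: $verdict"

rm -rf "$workdir"
```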

Caution

If the source and destination are reversed, this process will overwrite all data, file system structures, and unallocated space on your source drive. Be very careful when assigning the source and destination with the dd command.

Forensic Duplication #2: Creating a Local Evidence File

In the first method, we made a bit-for-bit copy of the source hard drive and laid it on top of the destination hard drive. Using this method, we cannot simply copy the evidence from one medium to another. A method that facilitates simpler management of the evidence is to create a logical file that is a bit-for-bit representation of the source hard drive. Obviously, we should never save the evidence file to the source hard drive or we may destroy evidence. The following command demonstrates the creation of a forensic duplication from a source hard drive on /dev/hdc to a regular file located at /mnt/storage/disk.bin in Linux.

Caution

If the if and of arguments are reversed, this process could overwrite data, file system structures, and unallocated space on your source drive. Be very careful when assigning the source and destination with the dd command.

forensic# dd if=/dev/hdc of=/mnt/storage/disk.bin bs=1024 conv=noerror,notrunc,sync

The process for creating a duplication in other flavors of Unix operating systems is similar. The only difference is signifying the correct device filename for the input. The following command demonstrates duplicating a source Windows 98 hard drive within FreeBSD. The source drive is connected to /dev/ad0 and the result is an evidence file located at /mnt/storage/disk.bin.

forensic# dd if=/dev/ad0 of=/mnt/storage/disk.bin bs=1024 conv=notrunc,noerror,sync
20044080+0 records in
20044080+0 records out
20525137920 bytes transferred in 5665.925325 secs (3622557 bytes/sec)
forensic# cd /mnt/storage
forensic# ls -al
total 20048997
drwxr-xr-x  2 root  wheel           512 Jan 15 13:30 .
drwxr-xr-x  7 root  wheel           512 Jan 15 11:58 ..
-rw-r--r--  1 root  wheel   20525237920 Jan 15 13:30 disk.bin

Caution

Some file systems have file size limitations. For example, older file systems may be able to support only 2GB files, while newer file systems may be larger. Be sure to check the limitations of your destination file system before you begin imaging.

In the preceding example, if we were to encounter an error during the duplication process, the number of records in will not match the number of records out. For instance, if one bad block were present, the following would have been output from dd:

forensic# dd if=/dev/ad0 of=/mnt/storage/disk.bin bs=1024 conv=notrunc,noerror,sync
20044079+1 records in
20044080+0 records out

The +1 field counts partial records, that is, blocks that dd could not read in full, which here corresponds to the one block that was read with errors. When this happens, because we provided the conv=notrunc,noerror,sync arguments, dd pads the matching block in the output with zeros. Because the block size is 1024 (set with the bs argument), 1024 bytes of data are unreliable in our forensic duplication. If we were to calculate the MD5 checksums of /dev/ad0 and /mnt/storage/disk.bin, it is highly probable that the two values would not match. In short, this output is the reason we run the script command: it lets us document the error in our investigative report.
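When the dd statistics have been captured in a log, the partial-record count can be pulled out automatically. The sketch below is one possible approach, not part of dd itself: it parses the two statistics lines from the bad-block example above, which are pasted into a shell variable for the illustration.

```shell
# Statistics lines copied from the bad-block example in the text.
log_text='20044079+1 records in
20044080+0 records out'

# Each count has the form FULL+PARTIAL; take the partial count
# from the "records in" line.
partial_in=$(printf '%s\n' "$log_text" |
    awk '/records in/ { split($1, n, "+"); print n[2] }')

if [ "$partial_in" -gt 0 ]; then
    status="WARNING: $partial_in partial input record(s), document in report"
else
    status="OK: no partial input records"
fi
echo "$status"
```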

Sometimes an investigator will create many output evidence files for a single source hard drive or partition. This usually occurs when the investigator wants the evidence files to be small enough to fit on a CD-ROM for archiving, or when the host file system does not support files of enormous length. This problem can be solved using a combination of the skip and count switches. The skip switch dictates the position, in blocks, where dd will start copying from in the input file. The count switch dictates how many blocks, sized by the bs switch, dd will read from the input source file. Therefore, running a series of dd commands with a fixed count and an increasing skip value will create many output files, as seen here:

forensic# dd if=/dev/hdc of=/mnt/storage/disk.1.bin bs=1M skip=0 count=620 conv=noerror,notrunc,sync
forensic# dd if=/dev/hdc of=/mnt/storage/disk.2.bin bs=1M skip=620 count=620 conv=noerror,notrunc,sync
forensic# dd if=/dev/hdc of=/mnt/storage/disk.3.bin bs=1M skip=1240 count=620 conv=noerror,notrunc,sync
forensic# dd if=/dev/hdc of=/mnt/storage/disk.4.bin bs=1M skip=1860 count=620 conv=noerror,notrunc,sync

Caution

If you are using these commands, you are probably splitting a large duplication into many smaller pieces for archival (on a CD-ROM or otherwise). Be sure that you verify, via MD5 checksum (discussed later in this chapter), the individual files and the image they form when combined whenever you transfer them from one medium to the next.

When you need to reassemble the different parts of the duplication that represent the source hard drive to analyze it, use the following command:

forensic# cat disk.1.bin disk.2.bin disk.3.bin disk.4.bin > disk.whole.bin
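The split and reassembly steps can be sketched as a loop that advances skip by a fixed count. This version runs on a 10 KB scratch file instead of /dev/hdc, and it uses bs=1024 with 4 blocks per piece because size suffixes such as 1M can be spelled differently across dd implementations; all file names and sizes here are assumptions for the illustration.

```shell
workdir=$(mktemp -d)
src="$workdir/source.img"    # stand-in for /dev/hdc
dd if=/dev/urandom of="$src" bs=1024 count=10 2>/dev/null   # 10 KB sample

chunk_blocks=4     # blocks per piece (the text uses 620 with bs=1M)
total_blocks=10    # size of the stand-in source, in blocks
skip=0
part=1
while [ "$skip" -lt "$total_blocks" ]; do
    dd if="$src" of="$workdir/disk.$part.bin" bs=1024 \
        skip="$skip" count="$chunk_blocks" \
        conv=noerror,notrunc,sync 2>/dev/null
    skip=$((skip + chunk_blocks))
    part=$((part + 1))
done
pieces=$((part - 1))

# Reassemble in numeric order and compare against the source.
cat "$workdir/disk.1.bin" "$workdir/disk.2.bin" "$workdir/disk.3.bin" \
    > "$workdir/disk.whole.bin"
if cmp -s "$src" "$workdir/disk.whole.bin"; then
    result="match"
else
    result="mismatch"
fi
echo "$pieces pieces, reassembled image: $result"

rm -rf "$workdir"
```

The last piece is shorter than the others because only two blocks remain after the first eight. With ten or more pieces, list the files explicitly or pad the numbers, since a shell wildcard would sort disk.10.bin before disk.2.bin.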

Finally, you can speed up the process of duplication by varying the block size. Because sectors on the disk are 512 bytes, you can speed up the read and write time by changing to a bigger block size with the bs switch. The following command demonstrates how the process was accelerated when duplicating a portion of an external hard drive in FreeBSD:

freebsd# /usr/bin/time -h dd if=/dev/ad0 of=test.bin bs=512 count=200000 conv=notrunc,noerror,sync
200000+0 records in
200000+0 records out
102400000 bytes transferred in 69.452716 secs (1474384 bytes/sec)
        1m9.51s real            0.28s user              8.46s sys
forensic# /usr/bin/time -h dd if=/dev/ad0 of=test.bin bs=1024 count=100000 conv=notrunc,noerror,sync
100000+0 records in
100000+0 records out
102400000 bytes transferred in 41.785020 secs (2450639 bytes/sec)
        41.79s real             0.20s user              4.42s sys

You may be unfamiliar with the time command, which simply places a stopwatch on the command you supply. In this example, time times the duplication process from start to finish and supplies the real, user, and system time. Notice how the real time is less when we increase the block size from 512 to 1024. This happens because it is more efficient for the workstation to read (and write) 1024 bytes at a time than 512 (for the same given total file size). The preceding commands copied only approximately 100MB of information. Imagine the efficiency if we increased the block size for an 80GB hard drive! Of course at some point you’ll experience diminishing returns, so you may want to experiment with your particular hardware to see what works best for you.
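When experimenting with block sizes, count has to be scaled so that each trial copies the same number of bytes. The following sketch works through that arithmetic for the two runs above and computes the ratio of their measured times; the 4096-byte trial is a hypothetical extra size that was not measured in the text.

```shell
# Total taken from the timing runs in the text: 200000 blocks of 512 bytes.
total_bytes=$((200000 * 512))

# The count needed to copy the same total at each block size.
count_512=$((total_bytes / 512))
count_1024=$((total_bytes / 1024))
count_4096=$((total_bytes / 4096))   # hypothetical extra trial size
echo "total=$total_bytes bs=512:$count_512 bs=1024:$count_1024 bs=4096:$count_4096"

# Ratio of the two wall-clock times reported by dd in the text.
speedup=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 69.452716 / 41.785020 }')
echo "speedup from bs=512 to bs=1024: ${speedup}x"
```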

Forensic Duplication #3: Creating a Remote Evidence File

Typically as a last resort, the forensic duplication can be transmitted to a separate workstation altogether. This is accomplished by piping dd’s standard output through Netcat (or Cryptcat) to another machine connected by a TCP/IP network. (For a discussion on Netcat or Cryptcat, see Chapter 1.)

The source machine containing the media to be imaged must be booted with a trusted floppy disk or CD-ROM into Linux or FreeBSD. You do not have to go to great lengths to create a trusted floppy or CD-ROM, as whole projects have been dedicated to creating an application to accomplish this. One example is Trinux, which you can research at http://trinux.sourceforge.net to see whether it is right for you. The destination workstation should be booted into a Unix environment to keep things consistent. After that, forensic duplication over a network can be accomplished with simple commands.

Note

It is not necessary that the forensic workstation is booted into a Unix operating system. The destination workstation can be a Windows operating system instead.

On the destination workstation, execute the following command:

forensic# nc -l -p 2222 > /mnt/storage/disk.bin

You should use Cryptcat to transmit the forensic duplication. Cryptcat gives you two benefits Netcat will not: validation and secrecy. Because the data is encrypted on the source machine, decryption on the destination workstation should produce a bit-for-bit copy of the input. If an attacker were changing bits midstream on the network, the output would be significantly altered after decryption. Furthermore, an attacker could not capture an exact copy of the source machine’s hard drive on the network with a sniffer such as tcpdump or Ethereal (see Chapter 14). If an attacker is capable of acquiring the duplication, too, he would be able to sidestep any local security measures and examine all files just like a forensic analyst!

On the source machine, execute the following command (the forensic workstation uses 192.168.1.1 as its IP address):

source# dd if=/dev/hdc bs=1024 conv=noerror,notrunc,sync | nc 192.168.1.1 2222
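One way to confirm that the evidence file received matches what was sent is to hash the stream on the sending side and the file on the receiving side. The sketch below is an assumption-laden illustration: a named pipe (link.fifo) stands in for the Netcat connection so that it can run on a single machine, the scratch files are hypothetical, and the helper md5_of_stdin is not part of any tool discussed here.

```shell
workdir=$(mktemp -d)
src="$workdir/source.img"    # stand-in for /dev/hdc
dest="$workdir/disk.bin"     # evidence file on the "forensic workstation"
link="$workdir/link.fifo"    # stand-in for the nc connection
dd if=/dev/urandom of="$src" bs=1024 count=32 2>/dev/null
mkfifo "$link"

# Hypothetical helper: print the MD5 of standard input using
# whichever tool is installed (md5sum on Linux, md5 on FreeBSD).
md5_of_stdin() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | awk '{ print $1 }'
    else
        md5 -q
    fi
}

# Receiver: plays the role of "nc -l -p 2222 > /mnt/storage/disk.bin".
cat "$link" > "$dest" &

# Sender: tee forwards the stream (in place of "| nc 192.168.1.1 2222")
# while a second copy of the same bytes is hashed.
sent_hash=$(dd if="$src" bs=1024 conv=noerror,notrunc,sync 2>/dev/null |
    tee "$link" | md5_of_stdin)
wait

received_hash=$(md5_of_stdin < "$dest")
if [ "$sent_hash" = "$received_hash" ]; then
    transfer="verified"
else
    transfer="corrupted"
fi
echo "transfer $transfer"

rm -rf "$workdir"
```

Hashing the stream as it leaves dd avoids reading the source media a second time, which matters when the source has bad blocks that may not read the same way twice.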
