All digital data are ultimately combinations of ones and zeros, commonly called bits. Digital investigators often need to work with data at the bit level, which requires an understanding of how different systems represent data. For instance, the number 511 is represented as 00000001 11111111 on big-endian systems (e.g., Macintosh computers with Motorola processors and RISC-based computers such as Sun). The same number is represented as 11111111 00000001 on little-endian systems such as Intel-based computers. In other words, big-endian architectures store the most significant byte first, on the left (putting the big end first), whereas little-endian architectures store the least significant byte first, so that the most significant byte ends up on the right (putting the little end first).[1]
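The two layouts of 511 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `to_bits` is invented for this example, and the value is assumed to occupy an unsigned 16-bit field, as in the text above.

```python
import struct

value = 511

# ">H" packs an unsigned 16-bit integer most significant byte first (big-endian);
# "<H" packs it least significant byte first (little-endian).
big = struct.pack(">H", value)
little = struct.pack("<H", value)


def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


print(to_bits(big))     # 00000001 11111111
print(to_bits(little))  # 11111111 00000001
```

Only the order of the bytes changes between the two systems; the bits inside each byte stay in the same order.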
Whether little- or big-endian, this binary representation of data (ones and zeros) is cumbersome to read. Instead, digital investigators often view data in hexadecimal, in which each byte is written as two digits from 0 to F. Another commonly used representation is ASCII. The ASCII standard assigns a specific numeric code, and therefore a specific combination of ones and zeros, to each letter, digit, and punctuation mark. Table 8.1 shows the ASCII codes of the capital letters in hexadecimal and decimal.
Letter | Hexadecimal | ASCII (decimal) |
---|---|---|
A | 41 | 65 |
B | 42 | 66 |
C | 43 | 67 |
D | 44 | 68 |
E | 45 | 69 |
F | 46 | 70 |
G | 47 | 71 |
H | 48 | 72 |
I | 49 | 73 |
J | 4A | 74 |
K | 4B | 75 |
L | 4C | 76 |
M | 4D | 77 |
N | 4E | 78 |
O | 4F | 79 |
P | 50 | 80 |
Q | 51 | 81 |
R | 52 | 82 |
S | 53 | 83 |
T | 54 | 84 |
U | 55 | 85 |
V | 56 | 86 |
W | 57 | 87 |
X | 58 | 88 |
Y | 59 | 89 |
Z | 5A | 90 |
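The values in Table 8.1 can be checked with Python's built-in `ord` function, which returns the numeric code of a character. The function name `ascii_row` is invented for this sketch and simply mirrors the layout of the table.

```python
def ascii_row(letter: str) -> str:
    """Format one letter with its code in hexadecimal and in decimal."""
    code = ord(letter)  # numeric code of the character
    return f"{letter} | {code:02X} | {code} |"


for letter in "AJZ":
    print(ascii_row(letter))
# A | 41 | 65 |
# J | 4A | 74 |
# Z | 5A | 90 |
```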
Conceptually, programs that display each byte of data in hexadecimal and ASCII format are like microscopes, allowing digital investigators to view features that are normally invisible. For instance, Word documents contain data that are not generally visible but can be displayed using a program like WinHex[2], as shown in Table 8.2, with hexadecimal on the left and ASCII on the right. Bytes that have no printable ASCII character appear as dots.
Hexadecimal | ASCII |
---|---|
1e 10 00 00 01 00 00 00 0a 00 00 00 43 68 61 70 | ............Chap |
74 65 72 20 38 00 0c 10 00 00 02 00 00 00 1e 00 | ter 8........... |
00 00 06 00 00 00 54 69 74 6c 65 00 03 00 00 00 | ......Title..... |
01 00 00 00 98 00 00 00 03 00 00 00 00 00 00 00 | ................ |
20 00 00 00 01 00 00 00 36 00 00 00 02 00 00 00 | .......6....... |
3e 00 00 00 01 00 00 00 02 00 00 00 0a 00 00 00 | >............... |
5f 50 49 44 5f 47 55 49 44 00 02 00 00 00 10 27 | _PID_GUID......' |
00 00 41 00 00 00 4e 00 00 00 7b 00 30 00 43 00 | ..A...N...{.0.C. |
33 00 37 00 34 00 46 00 30 00 30 00 2d 00 42 00 | 3.7.4.F.0.0.-.B. |
37 00 30 00 30 00 2d 00 31 00 31 00 44 00 32 00 | 7.0.0.-.1.1.D.2. |
2d 00 38 00 46 00 43 00 46 00 2d 00 39 00 35 00 | -.8.F.C.F.-.9.5. |
46 00 39 00 43 00 38 00 34 00 37 00 41 00 31 00 | F.9.C.8.4.7.A.1. |
33 00 30 00 7d 00 00 00 00 00 00 00 00 00 00 00 | 3.0.}........... |
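A view like Table 8.2 is straightforward to generate. The following is a minimal sketch, not a description of how WinHex works internally; the function name `hex_dump` and the choice to print a dot for every byte outside the printable ASCII range (0x20 to 0x7E) are assumptions made for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> list:
    """Return one 'hex | ascii |' line per `width` bytes of input."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{hex_part} | {ascii_part} |")
    return lines


# First row of Table 8.2.
sample = bytes.fromhex("1e 10 00 00 01 00 00 00 0a 00 00 00 43 68 61 70")
for line in hex_dump(sample):
    print(line)
# 1e 10 00 00 01 00 00 00 0a 00 00 00 43 68 61 70 | ............Chap |

# The later rows alternate a character byte with 00, which is consistent with
# text stored as 16-bit little-endian characters (an interpretation, not
# something stated in the table itself).
wide_row = bytes.fromhex("33 00 37 00 34 00 46 00 30 00 30 00 2d 00 42 00")
wide_text = wide_row.decode("utf-16-le")
print(wide_text)  # 374F00-B
```

Read this way, the dots between the characters in the lower rows of the table are the zero-valued high bytes of each 16-bit character.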
The difference between little- and big-endian representations is most apparent when converting data from their computer representation into a more readable form. For instance, Table 8.3 shows the first two lines of a tcpdump file created on an Intel-based computer (left) compared with a tcpdump file created at the same time on a Sun computer (right). As discussed in Chapter 11, UNIX represents the date "Sat, 10 May 2003 08:37:01 GMT" using the sequence of bytes that forms the third group on the second line of Table 8.3 (2DBABC3E on the Intel system, 3EBCBA2D on the Sun system); the reversed byte order on the two systems is clearly visible.
Linux on Intel (little-endian) | Solaris on Sun (big-endian) |
---|---|
D4C3B2A1 02000400 00000000 00000000 | A1B2C3D4 00020004 00000000 00000000 |
60000000 01000000 2DBABC3E 46C30500 | 00000044 00000001 3EBCBA2D 0004BFF0 |
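The conversion from these four bytes to a readable date can be sketched in Python. The example assumes the bytes hold an unsigned 32-bit count of seconds since 1 January 1970 UTC, and the variable and function names are chosen only for illustration.

```python
import struct
from datetime import datetime, timezone

intel_bytes = bytes.fromhex("2DBABC3E")  # third group, second line, Intel column
sun_bytes = bytes.fromhex("3EBCBA2D")    # third group, second line, Sun column

# "<I" reads an unsigned 32-bit integer as little-endian, ">I" as big-endian.
(intel_seconds,) = struct.unpack("<I", intel_bytes)
(sun_seconds,) = struct.unpack(">I", sun_bytes)


def as_gmt(seconds: int) -> str:
    """Format a count of seconds since 1970-01-01 UTC as a readable date."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%a, %d %b %Y %H:%M:%S GMT")


print(intel_seconds, sun_seconds)  # 1052555821 1052555821
print(as_gmt(intel_seconds))       # Sat, 10 May 2003 08:37:01 GMT

# Reading the Intel bytes with the wrong byte order yields a different number,
# and therefore a misleading date.
(misread,) = struct.unpack(">I", intel_bytes)
print(misread == intel_seconds)    # False
```

Both byte sequences decode to the same moment once each is read with the byte order of the system that wrote it.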
An awareness of byte order is also required when searching through digital evidence for specific combinations of bytes.
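One way to account for byte order in a search is to look for a value in both layouts. The sketch below is an assumption-laden illustration rather than a description of any particular forensic tool: the function name `find_uint32` is invented, the value is assumed to be stored as an unsigned 32-bit integer, and the sample data are simply the bytes shown in Table 8.3.

```python
import struct


def find_uint32(data: bytes, value: int) -> dict:
    """Return the offsets at which `value` occurs in each byte order."""
    patterns = {
        "little-endian": struct.pack("<I", value),
        "big-endian": struct.pack(">I", value),
    }
    hits = {}
    for name, pattern in patterns.items():
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = data.find(pattern, start + 1)
        hits[name] = offsets
    return hits


# The two columns of Table 8.3 as raw bytes.
intel_capture = bytes.fromhex(
    "D4C3B2A1 02000400 00000000 00000000 60000000 01000000 2DBABC3E 46C30500"
)
sun_capture = bytes.fromhex(
    "A1B2C3D4 00020004 00000000 00000000 00000044 00000001 3EBCBA2D 0004BFF0"
)

timestamp = 1052555821  # Sat, 10 May 2003 08:37:01 GMT
print(find_uint32(intel_capture, timestamp))
# {'little-endian': [24], 'big-endian': []}
print(find_uint32(sun_capture, timestamp))
# {'little-endian': [], 'big-endian': [24]}
```

A search that tried only one byte order would find the timestamp in one of the two files and miss it in the other.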
[1] The terms big-endian and little-endian are based on the story in Gulliver's Travels, in which the Lilliputians' main political conflict was whether soft-boiled eggs should be opened on the big end or the little end.
[2] http://www.winhex.com