8.3 Representation of Data

All digital data are combinations of ones and zeros, commonly called bits. Digital investigators often need to deal with data at the bit level, which requires an understanding of how different systems represent data. For instance, the number 511 is represented as 00000001 11111111 on big-endian systems (e.g., computers with Motorola processors such as Macintosh; RISC-based computers such as Sun). The same number is represented as 11111111 00000001 on little-endian systems such as Intel-based computers. In other words, big-endian architectures store the most significant byte first (putting the big end first), whereas little-endian architectures store the least significant byte first (putting the little end first).[1]
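The two byte orders for 511 can be reproduced with a short Python sketch. It assumes the value is stored as an unsigned 16-bit integer, which matches the two bytes shown above; the helper name `bits` is introduced here only for illustration.

```python
import struct

value = 511

# ">H" packs an unsigned 16-bit integer big-endian; "<H" packs it little-endian.
big = struct.pack(">H", value)
little = struct.pack("<H", value)

def bits(data: bytes) -> str:
    """Render each byte as eight binary digits, in storage order."""
    return " ".join(f"{byte:08b}" for byte in data)

print(bits(big))     # 00000001 11111111
print(bits(little))  # 11111111 00000001

# Reading the bytes back gives 511 only when the correct order is named.
assert int.from_bytes(big, "big") == 511
assert int.from_bytes(little, "little") == 511
```

Reading either sequence with the wrong byte order yields a different number, which is why the order must be known before raw bytes are interpreted.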

Whether little- or big-endian, this binary representation of data (ones and zeros) is cumbersome to read. Instead, digital investigators often view data in hexadecimal, where each byte is written as two hexadecimal digits. Another commonly used representation of data is ASCII. The ASCII standard specifies that certain combinations of ones and zeros represent certain letters, numbers, and other characters. Table 8.1 shows the hexadecimal and decimal ASCII values of some capital letters.

Table 8.1: ASCII and hexadecimal values of some capital letters.

Letter    Hexadecimal    ASCII (decimal)
A         41             65
B         42             66
C         43             67
D         44             68
E         45             69
F         46             70
G         47             71
H         48             72
I         49             73
J         4A             74
K         4B             75
L         4C             76
M         4D             77
N         4E             78
O         4F             79
P         50             80
Q         51             81
Y         59             89
Z         5A             90
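The values in Table 8.1 can be regenerated rather than memorized. The following is a minimal Python sketch; the function name `ascii_row` is hypothetical and not part of any tool mentioned in this section.

```python
def ascii_row(letter: str) -> tuple:
    """Return (letter, hexadecimal, decimal, binary) for one ASCII character."""
    code = ord(letter)  # numeric ASCII value of the character
    return letter, f"{code:02X}", code, f"{code:08b}"

for letter in "ABJZ":
    print(*ascii_row(letter))
# A 41 65 01000001
# B 42 66 01000010
# J 4A 74 01001010
# Z 5A 90 01011010
```

The binary column shows that the hexadecimal and decimal values are just two ways of writing the same eight bits.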

Conceptually, programs that display each byte of data in hexadecimal and ASCII format are like microscopes, allowing digital investigators to view features that are normally invisible. For instance, Word documents contain data that are not generally visible but can be displayed using a program like WinHex[2] as shown in Table 8.2 with hexadecimal on the left and ASCII on the right.

Table 8.2: Segment of a Word document shown in hexadecimal and ASCII format.

Hexadecimal                                      ASCII
1e 10 00 00 01 00 00 00 0a 00 00 00 43 68 61 70  ............Chap
74 65 72 20 38 00 0c 10 00 00 02 00 00 00 1e 00  ter 8...........
00 00 06 00 00 00 54 69 74 6c 65 00 03 00 00 00  ......Title.....
01 00 00 00 98 00 00 00 03 00 00 00 00 00 00 00  ................
20 00 00 00 01 00 00 00 36 00 00 00 02 00 00 00   .......6.......
3e 00 00 00 01 00 00 00 02 00 00 00 0a 00 00 00  >...............
5f 50 49 44 5f 47 55 49 44 00 02 00 00 00 10 27  _PID_GUID......'
00 00 41 00 00 00 4e 00 00 00 7b 00 30 00 43 00  ..A...N...{.0.C.
33 00 37 00 34 00 46 00 30 00 30 00 2d 00 42 00  3.7.4.F.0.0.-.B.
37 00 30 00 30 00 2d 00 31 00 31 00 44 00 32 00  7.0.0.-.1.1.D.2.
2d 00 38 00 46 00 43 00 46 00 2d 00 39 00 35 00  -.8.F.C.F.-.9.5.
46 00 39 00 43 00 38 00 34 00 37 00 41 00 31 00  F.9.C.8.4.7.A.1.
33 00 30 00 7d 00 00 00 00 00 00 00 00 00 00 00  3.0.}...........
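A view like Table 8.2 is straightforward to produce. The sketch below is a simplified stand-in for what a hex viewer displays, not a description of how WinHex works; the function name `hexdump` and the choice to print a dot for every non-printable byte are assumptions made for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> list:
    """Return one line per `width` bytes: hexadecimal on the left, ASCII on the right."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is 0x20 through 0x7e; everything else is shown as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{hex_part:<{width * 3 - 1}}  {text_part}")
    return lines

# The first two rows of Table 8.2.
sample = bytes.fromhex(
    "1e 10 00 00 01 00 00 00 0a 00 00 00 43 68 61 70 "
    "74 65 72 20 38 00 0c 10 00 00 02 00 00 00 1e 00"
)
for line in hexdump(sample):
    print(line)

# The final rows alternate a character with a 00 byte, which suggests
# two-byte little-endian characters (an interpretation, assumed to be UTF-16LE).
wide = bytes.fromhex("7b 00 30 00 43 00 33 00 37 00 34 00 46 00 30 00 30 00 2d 00")
print(wide.decode("utf-16-le"))  # {0C374F00-
```

If that interpretation is right, the low-order byte of each character comes first in the last rows of the table, so byte order matters for text as well as for numbers.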

The difference between little- and big-endian representations is most apparent when converting data from their computer representation into a more readable form. For instance, Table 8.3 shows the first two lines of a tcpdump file created on an Intel-based computer (left) compared with a tcpdump file created at the same time on a Sun computer (right). As discussed in Chapter 11, UNIX represents the date "Sat, 10 May 2003 08:37:01 GMT" using the four-byte sequence that appears in Table 8.3 as 2DBABC3E on the Intel system and as 3EBCBA2D on the Sun system; the different byte order on the two systems is clearly visible.

Table 8.3: Viewing two tcpdump files created on Intel-based and Sun systems shows the difference between little- and big-endian representations of the same UNIX date (2DBABC3E and 3EBCBA2D).

Linux on Intel (little-endian)         Solaris on Sun (big-endian)
D4C3B2A1 02000400 00000000 00000000    A1B2C3D4 00020004 00000000 00000000
60000000 01000000 2DBABC3E 46C30500    00000044 00000001 3EBCBA2D 0004BFF0
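The date in Table 8.3 can be recovered from either file once the byte order is known. The sketch below assumes the files use the classic libpcap layout, in which the first four bytes hold the magic number 0xA1B2C3D4 written in the capturing machine's byte order, and that the date is stored as a 32-bit count of seconds since 1 January 1970 UTC. The function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number (assumed file format)

def byte_order(first_four: bytes) -> str:
    """Return the struct prefix ('<' or '>') under which the magic number reads correctly."""
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "I", first_four)[0] == PCAP_MAGIC:
            return prefix
    raise ValueError("not a classic pcap magic number")

def decode_seconds(first_four: bytes, stamp: bytes) -> datetime:
    """Decode a 4-byte UNIX timestamp using the byte order implied by the magic number."""
    (seconds,) = struct.unpack(byte_order(first_four) + "I", stamp)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

intel = decode_seconds(bytes.fromhex("D4C3B2A1"), bytes.fromhex("2DBABC3E"))
sun = decode_seconds(bytes.fromhex("A1B2C3D4"), bytes.fromhex("3EBCBA2D"))

assert intel == sun  # same moment, different byte order on disk
print(intel.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sat, 10 May 2003 08:37:01 GMT
```

Both sequences decode to 1052555821 seconds, which is the date quoted in the text. Decoding the Intel bytes as big-endian would instead give an unrelated value.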

An awareness of byte order is also required when searching through digital evidence for specific combinations of bytes.
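One way to account for byte order in a search is to look for the value in both orders. The following is a minimal sketch, assuming the value of interest is a 32-bit unsigned integer; `find_uint32` is a hypothetical helper, not a feature of any particular forensic tool.

```python
import struct

def find_uint32(data: bytes, value: int) -> dict:
    """Return the offsets at which `value` occurs in each byte order."""
    patterns = {
        "little-endian": struct.pack("<I", value),
        "big-endian": struct.pack(">I", value),
    }
    hits = {}
    for label, pattern in patterns.items():
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = data.find(pattern, start + 1)
        hits[label] = offsets
    return hits

# Second row of each column in Table 8.3.
intel_row = bytes.fromhex("60000000 01000000 2DBABC3E 46C30500")
sun_row = bytes.fromhex("00000044 00000001 3EBCBA2D 0004BFF0")

print(find_uint32(intel_row, 1052555821))  # {'little-endian': [8], 'big-endian': []}
print(find_uint32(sun_row, 1052555821))    # {'little-endian': [], 'big-endian': [8]}
```

A search for only one byte order would miss the timestamp in one of the two files.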

[1] The terms big-endian and little-endian are based on the story in Gulliver's Travels, in which the Lilliputians' main political conflict was whether soft-boiled eggs should be opened on the big end or the little end.

[2] http://www.winhex.com




Digital Evidence and Computer Crime, Second Edition
ISBN: 0121631044
Year: 2003
Pages: 279
