2.5 Itanium Information Units and Data Types

The basic information unit of the Itanium architecture is the 8-bit byte. Individual bytes have 64-bit addresses, but groups of adjacent bytes also have addresses, as shown in Figure 2-5. These multibyte units include the 16-bit word (2 bytes), the 32-bit double word (4 bytes), and the 64-bit quad word (8 bytes). On little-endian systems, such units are addressed by the low-order byte of the group; the higher-order bytes within a larger information unit occupy successive addresses beyond that of the lowest-order byte.

Figure 2-5. Itanium information units

graphics/02fig05.gif

In the convention that has been used by Intel and Digital Equipment Corporation, the individual bits within any information unit are numbered from the least significant bit on the right, bit 0. The most significant bit is then bit 7 for a byte, bit 15 for a word, bit 31 for a double word, and bit 63 for a quad word. Some other machine designers, including Hewlett-Packard Company, have adopted the opposite numbering convention of naming the most significant bit on the left as bit 0. Note that the convention for the Itanium architecture does have the convenience of corresponding directly to the positional weighting scheme for evaluating binary values presented in Chapter 1. That is, the weight of bit i is 2^i.
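This numbering convention can be checked with a minimal C sketch (ours, not from the book) that extracts each bit of a byte and prints its positional weight 2^i:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t byte = 0xA5;                    /* 1010 0101 */
        for (int i = 7; i >= 0; i--) {
            unsigned bit = (byte >> i) & 1u;    /* bit i, numbered from the right */
            unsigned weight = 1u << i;          /* its positional weight, 2^i     */
            printf("bit %d = %u (weight %3u)\n", i, bit, weight);
        }
        return 0;
    }

Summing bit times weight over all eight positions recovers the value 0xA5 = 165.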

The corresponding convention for ordering the bytes within words, double words, and quad words is to store the lowest order byte of the group at the lowest address. This is the little-endian convention. The opposite convention, where the highest order byte of a group is stored at the lowest address, is called big-endian, which has historically been followed by Hewlett-Packard and Motorola. When character string data are transmitted between systems, the bytes travel in the same order as letters in words and words in sentences in Western languages. But when little-endian and big-endian systems attempt to break up, say, a 32-bit binary number into four 8-bit binary bytes for sequential transmission, what one system views as WXYZ will be perceived by the other as ZYXW when reassembled. This problem affects only the byte ordering; all systems agree on the ordering (but not the numbering!) of the bits within bytes.

Let us consider, as a specific example of little-endian data storage, that the quad word quantity 0x0F0E0D0C0B0A0908 is stored at address Q. Location Q is then also the address of the double word whose value is 0x0B0A0908, the word whose value is 0x0908, and the byte whose value is 0x08. In similar fashion, location Q+1 is the address of the byte whose value is 0x09, the unaligned word whose value is 0x0A09, and so forth. The Itanium instruction set provides separate load opcodes (ld1, ld2, ld4, ld8) and store opcodes (st1, st2, st4, st8) to specify which information unit (byte, word, double word, quad word) is transferred between memory and a register.
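This addressing scheme can be observed directly in C on a little-endian host. In the following sketch (an illustration, not from the book), the array mem stands in for address Q:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint64_t quad = 0x0F0E0D0C0B0A0908ULL;
        unsigned char mem[8];               /* mem[0] plays the role of address Q */
        memcpy(mem, &quad, sizeof quad);    /* store the quad word at Q           */

        uint32_t dw;
        uint16_t w;
        memcpy(&dw, &mem[0], sizeof dw);    /* the double word at Q */
        memcpy(&w,  &mem[0], sizeof w);     /* the word at Q        */

        /* On a little-endian host this prints 0B0A0908, 0908, 08, 09. */
        printf("double word at Q: %08X\n", (unsigned)dw);
        printf("word at Q:        %04X\n", (unsigned)w);
        printf("byte at Q:        %02X\n", mem[0]);
        printf("byte at Q+1:      %02X\n", mem[1]);
        return 0;
    }

On a big-endian host the same program would print 0F0E0D0C, 0F0E, 0F, and 0E, which is precisely the WXYZ-versus-ZYXW hazard described above.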

Itanium integer registers are 64 bits wide and can thus accommodate any of these four information units. The hardware design specifies precisely how to widen or narrow information of other sizes when it is placed into or retrieved from a 64-bit register.
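The precise widening rules are deferred to later chapters, but C's integer conversions give a feel for the two possibilities. A small sketch of ours, using C's rules as a stand-in rather than the hardware's:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t b = 0x98;                      /* a byte whose top bit is set */

        /* Zero extension: the upper 56 bits are filled with 0. */
        uint64_t zext = (uint64_t)b;           /* 0x0000000000000098 */

        /* Sign extension: the upper 56 bits copy the sign bit. */
        int64_t sext = (int64_t)(int8_t)b;     /* 0xFFFFFFFFFFFFFF98 */

        printf("zero-extended: %016llX\n", (unsigned long long)zext);
        printf("sign-extended: %016llX\n", (unsigned long long)sext);
        return 0;
    }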

What interpretations can be made from the bit patterns stored in these information units? The fundamental data types supported by the instruction set of the Itanium architecture are integers and floating-point numbers. In some contexts, an integer will represent an address instead of a data value. In addition, a compiler or an assembly language programmer can impart further purposes to integers, for example to represent characters or Boolean variables.

2.5.1 Integers

We reviewed the concepts of binary representation of integers in Chapter 1. A span of N bits can be used in one of two ways: to represent a range of unsigned integers, 0 to 2^N − 1, or to represent a range of signed integers, −2^(N−1) through 0 to +2^(N−1) − 1. Table 2-1 shows the numeric ranges for the various integer sizes that are pertinent to Itanium contexts.
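The ranges of Table 2-1 can be confirmed with the limit macros of C's <stdint.h>; a brief sketch:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* The signed and unsigned ranges of Table 2-1. */
        printf("byte:        %d to %d, 0 to %u\n",
               INT8_MIN,  INT8_MAX,  (unsigned)UINT8_MAX);
        printf("word:        %d to %d, 0 to %u\n",
               INT16_MIN, INT16_MAX, (unsigned)UINT16_MAX);
        printf("double word: %d to %d, 0 to %u\n",
               INT32_MIN, INT32_MAX, (unsigned)UINT32_MAX);
        printf("quad word:   %lld to %lld, 0 to %llu\n",
               (long long)INT64_MIN, (long long)INT64_MAX,
               (unsigned long long)UINT64_MAX);
        return 0;
    }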

The Itanium architecture has integer arithmetic instructions only for data of quad word width, even though the detailed operation of Itanium load and store instructions facilitates packing and unpacking information units of smaller widths. Itanium logical instructions work only with quad word data, but these instructions provide some capability for access to data packed at the bit or group-of-bits level.

2.5.2 Floating-Point Numbers

Since integers may lack the dynamic range necessary for certain scientific applications, most computer architectures provide for floating-point numbers, which correspond to scientific notation. Whereas hand-held calculators display numbers as a decimal number that is multiplied by some positive or negative power of 10, computers typically represent non-integer data as a significand that is multiplied by some power of 2. The exponent and sign of a number can be bit-packed with the significand into an information unit in several ways.

Table 2-1. Integer Data Types

Type          Bits   Bytes   Signed Range (decimal)                                       Unsigned Range (decimal)
Byte             8       1   −128 to +127                                                 0 to 255
Word            16       2   −32,768 to +32,767                                           0 to 65,535
Double word     32       4   −2,147,483,648 to +2,147,483,647                             0 to 4,294,967,295
Quad word       64       8   −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807    0 to 18,446,744,073,709,551,615

In the past, various computer manufacturers represented floating-point data in ways that were not fully compatible across architectures. Accordingly, concerns arose about inaccuracies that might compound in repeated mathematical operations. Some of these difficulties fell away as the computer industry began to consolidate, but a satisfactory solution came about only through industry-wide adoption of the agreed-upon standard ANSI/IEEE 754, IEEE Standard for Binary Floating-Point Arithmetic.

Two fundamental formats have been supported by nearly every new architecture introduced since the standard emerged: single and double, requiring respectively 32 bits (4 bytes) and 64 bits (8 bytes) for storage. Two additional IEEE formats, extended single and extended double, provided some leeway within which certain older formats could be retained; for example, an Intel format requiring 80 bits (10 bytes) for storage. In this book, we shall discuss only the widely supported IEEE single and double formats for floating-point data, whose characteristics are summarized in Table 2-2.

The IEEE representations not only facilitate direct interchange of data between computer systems with different architectures, but also provide for special values that could not be represented in some of the older proprietary formats. For example, special bit patterns are assigned to represent positive infinity and negative infinity. These obey standard algebraic rules, ensuring, for example, that positive infinity plus a valid finite number yields positive infinity as the sum. Other special bit patterns are called NaN, not a number. These can be used when a computed result is algebraically indeterminate, such as infinity minus infinity.

Double precision

An IEEE double-precision datum occupies 8 adjacent bytes in memory. In order to minimize the time required to load and store the datum, it should start on an address boundary that is evenly divisible by 8; that is, the datum should be naturally aligned, which is along quad word boundaries for the Itanium architecture. In a little-endian representation the bits are labeled from right to left, 0 through 63, as follows, where D denotes the lowest byte address of the information units storing the datum:

graphics/02fig05a.gif

Bit 63 is the sign bit, bits <62:52> represent the exponent of 2 biased by addition of 1023 to the true value, and bits <51:0> represent a 52-bit fraction. If all the bits in the representation are zero, the number represented is zero by convention.

Table 2-2. IEEE Floating-Point Numbers

                                     Single             Double
Size of representation in memory:
  Sign                               1 bit              1 bit
  Exponent                           8 bits             11 bits
  Fraction[*]                        23 bits            52 bits
Bias for exponent                    127                1023
Minimum magnitude                    1.175 × 10^−38     2.225 × 10^−308
Maximum magnitude                    3.403 × 10^+38     1.798 × 10^+308
Precision:
  binary                             24 bits            53 bits
  decimal                            6 decimal digits   15 decimal digits

[*] The significand consists of an implicit "hidden bit" followed by the fraction.

The significand is adjusted so that it consists of a leading bit of 1 to the left of an implied binary point; that is, it is scaled into the range from 1 up to (but not including) 2. For storage in the information units of memory, this logically known bit to the left of the implied binary point is not represented physically. The precision of the significand is thus one part in 2^53 even though only 52 bits store the fraction physically. Except for special cases, the value of the number is

(1 − 2 × S) × 1.F × 2^(E − B)

where S is the sign of the number (0 for positive, 1 for negative), F is the binary fraction, 1.F is the significand, E is the true exponent, and B is the bias (equal to 1023 for double precision).
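As an illustration (ours, not the book's), the following C sketch pulls a double apart into S, E, and F with shifts and masks and then reassembles the value using the formula above; it assumes the host stores double in the IEEE format:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    int main(void)
    {
        double d = -6.5;
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);           /* the 64 stored bits      */

        unsigned S = (unsigned)(bits >> 63);      /* bit 63:      sign       */
        unsigned E = (bits >> 52) & 0x7FF;        /* bits <62:52>: exponent  */
        uint64_t F = bits & 0xFFFFFFFFFFFFFULL;   /* bits <51:0>:  fraction  */

        /* (1 - 2*S) * 1.F * 2^(E - 1023), special cases excluded */
        double value = (S ? -1.0 : 1.0)
                     * ldexp(1.0 + (double)F / 4503599627370496.0,  /* F / 2^52 */
                             (int)E - 1023);

        printf("S=%u E=%u F=%013llX -> %g\n",
               S, E, (unsigned long long)F, value);   /* prints -6.5 */
        return 0;
    }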

In order to facilitate certain IEEE constraints on accuracy when rounding computed results, as well as to accommodate the 80-bit (10-byte) extended double-precision format brought forward from Intel's IA-32 architecture, the datapath for floating-point manipulations in an Itanium processor (including 128 floating-point registers) has a total width of 82 bits.

When the various bit regions of a double-precision datum are retrieved from memory into an Itanium floating-point register, their arrangement is as follows:

graphics/02fig05b.gif

The "hidden bit" that is suppressed for economy of storage in memory is thus made explicit in the representation of a floating-point number in a processor register. We defer discussion of the expansion of space for the exponent to a later chapter.

Single precision

An IEEE single-precision datum occupies 4 adjacent bytes in memory. In order to minimize the time required for loading and storing the datum, it should start on an address boundary that is evenly divisible by 4; that is, the datum should be naturally aligned (i.e., double word aligned). In a little-endian representation the bits are labeled from right to left, 0 through 31, as follows, where S denotes the lowest byte address of the information units storing the datum:

graphics/02fig05c.gif

Bit 31 is the sign bit, bits <30:23> represent the exponent of 2 biased by addition of 127 to the true value, and bits <22:0> represent a 23-bit fraction. If all the bits in the representation are zero, the number represented is zero by convention.

The significand is adjusted so that it consists of a leading bit of 1 to the left of an implied binary point; that is, it is scaled into the range from 1 up to (but not including) 2. For storage in the information units of memory, this logically known bit to the left of the implied binary point is not represented physically. The precision of the significand is thus one part in 2^24 even though only 23 bits store the fraction physically. Except for special cases, the value of the number is

(1 − 2 × S) × 1.F × 2^(E − B)

where S is the sign of the number (0 for positive, 1 for negative), F is the binary fraction, 1.F is the significand, E is the true exponent, and B is the bias (equal to 127 for single precision).

When the various bit regions of a single-precision datum are retrieved from memory into an Itanium floating-point register, their arrangement is as follows:

graphics/02fig05d.gif

Again, the "hidden bit" that is suppressed for economy of storage in memory is thus made explicit in the representation of a floating-point number in a processor register. We defer discussion of the expansion of space for the exponent to a later chapter.

The Itanium processor hardware can thus work with the same 82-bit register representation of floating-point quantities, while the memory representation takes different amounts of storage space depending on the precision required for an application.

As one example, the IEEE single-precision representation for the decimal number 4.25 as it would be stored in memory can be constructed using the following steps:

4.25₁₀  =  2^2 + 2^−2  =  100.01₂          (convert from base 10 to base 2)
        =  1.0001 × 2^2                    (shift into normalized form)
After the "hidden bit" is suppressed, the binary fraction is F = 00010000000000000000000 (23 bits in all). The true exponent is 2, but with the bias of 12710 this becomes 12910 or E = 100000012. The sign is S = 0, since the number is positive. Putting all those pieces together in the order S-E-F, we have

0    10000001    00010000000000000000000
S    E           F

By regrouping the bits 4 at a time, we can see how this pattern would be printed as an unsigned hexadecimal number:

0100 0000 1000 1000 0000 0000 0000 0000₂ = 40880000₁₆

Note that only a real number whose fractional part can be represented exactly as a sum of negative powers of 2 can be stored exactly. Common decimal fractions like 0.1 or 0.7 cannot be stored exactly.
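Both points, the assembled bit pattern and the inexactness of decimal fractions, can be checked with a short C program of ours (assuming the host uses IEEE single precision for float):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint32_t pattern = 0x40880000;   /* the S-E-F pattern assembled above  */
        float f;
        memcpy(&f, &pattern, sizeof f);  /* reinterpret the 32 bits as a float */
        printf("0x%08X -> %g\n", pattern, f);      /* prints 4.25 */

        /* 0.1 has no finite binary expansion, so it is stored approximately. */
        printf("0.1f is stored as %.9g\n", 0.1f);  /* about 0.100000001 */
        return 0;
    }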

As another example, the 32-bit pattern 4126000016 for a single-precision number stored in memory can be interpreted by reversing the steps just illustrated:

41260000₁₆  =  0100 0001 0010 0110 0000 0000 0000 0000₂
            =  S = 0,  E = 10000010₂ = 130₁₀,  F = 01001100000000000000000₂   (split into fields)
            =  +1.010011₂ × 2^(130 − 127)  =  1.010011₂ × 2^3  =  1010.011₂
            =  +(8 + 2 + 0.25 + 0.125)₁₀  =  10.375₁₀

Conversions for double-precision numbers would proceed in a similar fashion.
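These steps can be mechanized; this C sketch of ours decodes the single-precision fields directly (again assuming an IEEE-format host):

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        uint32_t bits = 0x41260000;

        unsigned S = bits >> 31;            /* sign bit             */
        unsigned E = (bits >> 23) & 0xFF;   /* biased exponent: 130 */
        unsigned F = bits & 0x7FFFFF;       /* 23-bit fraction      */

        /* (1 - 2*S) * 1.F * 2^(E - 127) */
        double value = (S ? -1.0 : 1.0)
                     * ldexp(1.0 + (double)F / 8388608.0,   /* F / 2^23 */
                             (int)E - 127);

        printf("S=%u E=%u F=%06X -> %g\n", S, E, F, value); /* prints 10.375 */
        return 0;
    }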

2.5.3 Alphanumeric Characters

Binary numbers can encode any information, including alphanumeric characters (letters and numerals) and punctuation marks. The development of coded character sets is an old and continuing story. Morse code for telegraphy in the nineteenth century, the use of punch cards for tabulating the US census in the early twentieth century, the later spread of computer applications into business and commerce, and present requirements for encoding the character sets of all the world's written human languages all require compact and consistent encoding schemes.

Providing enough codes while facilitating efficient storage and convenient sorting algorithms has led to many different systems, incompatibilities, and compromises. As a consensus, Unicode® provides methods for accommodating about a million different historical and currently used character symbols, requiring 21 bits for encoding. Several Unicode transformation formats (UTF) have been defined:

  • The UTF-32 convention uses code values running from 0 to 10FFFF₁₆, treated as 32-bit data elements.

  • The UTF-16 variable-length convention represents 1,112,064 codes: the first 65,536 as 2 bytes and the rest as 4 bytes.

  • The UTF-8 variable-length convention represents 128 characters (ASCII, see below) as 1 byte, 1,920 characters as 2 bytes (European, Hebrew, and Arabic elements), 63,488 characters as 3 bytes (Chinese, Japanese, Korean elements), and 2,147,418,112 additional characters using up to 6 bytes. A sketch of this byte-level scheme appears after this list.

To ensure unambiguous transmission, both big-endian (default) and little-endian variants are defined for UTF-16 and UTF-32.
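Here is a minimal C sketch of the UTF-8 byte-level scheme, covering the modern 1- to 4-byte range; the utf8_encode helper is our own illustration, not a library function, and the original design extended the same prefix pattern to 6 bytes:

    #include <stdio.h>
    #include <stdint.h>

    /* Encode one code value into buf; returns the byte count (1 to 4). */
    static int utf8_encode(uint32_t cp, unsigned char buf[4])
    {
        if (cp < 0x80) {                     /* ASCII: 0xxxxxxx                 */
            buf[0] = (unsigned char)cp;
            return 1;
        } else if (cp < 0x800) {             /* 110xxxxx 10xxxxxx               */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        } else if (cp < 0x10000) {           /* 1110xxxx 10xxxxxx 10xxxxxx      */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        } else {                             /* 11110xxx 10xxxxxx ... to 10FFFF */
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            return 4;
        }
    }

    int main(void)
    {
        /* 'A', the cent sign, a CJK character, a supplementary-plane character */
        uint32_t samples[] = { 0x41, 0xA2, 0x4E2D, 0x10400 };
        for (int i = 0; i < 4; i++) {
            unsigned char buf[4];
            int n = utf8_encode(samples[i], buf);
            printf("U+%05X ->", (unsigned)samples[i]);
            for (int j = 0; j < n; j++)
                printf(" %02X", buf[j]);
            printf("\n");
        }
        return 0;
    }

Note how the 7-bit ASCII characters pass through unchanged as single bytes, which is the efficiency advantage mentioned above.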

Linux® and other contemporary programming environments support UTF-8, which can handle the full generality of Unicode definitions, while still taking advantage of the efficiency of a single-byte coding scheme when possible. The programming language Java includes Unicode support.

The American Standard Code for Information Interchange (ASCII) character set includes both uppercase and lowercase alphabetic characters (A through Z, and a through z), the decimal digits (0 through 9), punctuation marks, and special control characters. The ASCII code was accepted by the American National Standards Institute (ANSI) to standardize the exchange of textual information between computers and peripherals of different manufacturers. This code exists in 7-bit and 8-bit forms; for simplicity we show the 7-bit chart that is compatible with UTF-8 as Table 2-3.

Anyone even slightly familiar with world languages and cultures will perceive at a glance the inadequacy of 7-bit ASCII. There are no diacritically marked letters as used in most Western languages. The symbol $ is not universally used for currency, and the symbol ¢ is not included. Some of these needs for Western languages can be accommodated with extensions of ASCII character representations to 8 bits, but a truly global solution obviously requires Unicode.

Table 2-3. ASCII Character Encoding

Dec Hex Char   Dec Hex Char   Dec Hex Char   Dec Hex Char
000 00  NUL    032 20  SP     064 40  @      096 60  `
001 01  SOH    033 21  !      065 41  A      097 61  a
002 02  STX    034 22  "      066 42  B      098 62  b
003 03  ETX    035 23  #      067 43  C      099 63  c
004 04  EOT    036 24  $      068 44  D      100 64  d
005 05  ENQ    037 25  %      069 45  E      101 65  e
006 06  ACK    038 26  &      070 46  F      102 66  f
007 07  BEL    039 27  '      071 47  G      103 67  g
008 08  BS     040 28  (      072 48  H      104 68  h
009 09  HT     041 29  )      073 49  I      105 69  i
010 0A  LF     042 2A  *      074 4A  J      106 6A  j
011 0B  VT     043 2B  +      075 4B  K      107 6B  k
012 0C  FF     044 2C  ,      076 4C  L      108 6C  l
013 0D  CR     045 2D  -      077 4D  M      109 6D  m
014 0E  SO     046 2E  .      078 4E  N      110 6E  n
015 0F  SI     047 2F  /      079 4F  O      111 6F  o
016 10  DLE    048 30  0      080 50  P      112 70  p
017 11  DC1    049 31  1      081 51  Q      113 71  q
018 12  DC2    050 32  2      082 52  R      114 72  r
019 13  DC3    051 33  3      083 53  S      115 73  s
020 14  DC4    052 34  4      084 54  T      116 74  t
021 15  NAK    053 35  5      085 55  U      117 75  u
022 16  SYN    054 36  6      086 56  V      118 76  v
023 17  ETB    055 37  7      087 57  W      119 77  w
024 18  CAN    056 38  8      088 58  X      120 78  x
025 19  EM     057 39  9      089 59  Y      121 79  y
026 1A  SUB    058 3A  :      090 5A  Z      122 7A  z
027 1B  ESC    059 3B  ;      091 5B  [      123 7B  {
028 1C  FS     060 3C  <      092 5C  \      124 7C  |
029 1D  GS     061 3D  =      093 5D  ]      125 7D  }
030 1E  RS     062 3E  >      094 5E  ^      126 7E  ~
031 1F  US     063 3F  ?      095 5F  _      127 7F  DEL

Any ASCII character-oriented peripheral device, such as a printer, will output an A when the ASCII code for A (0x41) is sent to it. Similarly, such devices should provide a horizontal space in response to the SP nonprinting character (0x20). The ASCII encoding of the string My Itanium would be as follows:

STRING:  4D 79 20 49 74 61 6E 69 75 6D
         M  y     I  t  a  n  i  u  m

Note that each character uses one byte of storage, shown here as two hex digits. The entire string can be referenced by the address of its first byte, the one containing the representation of the character M, symbolized here by STRING.
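A C program can reproduce this encoding by printing each byte of the string (an illustration of ours; the variable name string is arbitrary):

    #include <stdio.h>

    int main(void)
    {
        const char *string = "My Itanium";   /* string holds the address of the M */
        for (const char *p = string; *p != '\0'; p++)
            printf("%02X ", (unsigned char)*p);
        printf("\n");   /* prints 4D 79 20 49 74 61 6E 69 75 6D */
        return 0;
    }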

The arrangement of Table 2-3 makes evident the convenient feature of ASCII coding that corresponding uppercase and lowercase letters differ by only a single bit. Uppercase A is 0x41 (0100 0001), and lowercase a is 0x61 (0110 0001). This relationship simplifies case conversion or collapsing of the two cases to facilitate certain alphabetic sorting operations.
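The single-bit relationship can be exploited directly. A small sketch, valid only for the ASCII letters A through Z and a through z:

    #include <stdio.h>

    int main(void)
    {
        char upper = 'A';                   /* 0x41 = 0100 0001        */
        char lower = upper | 0x20;          /* set bit 5:   0x61 = 'a' */
        char again = lower & ~0x20;         /* clear bit 5: 0x41 = 'A' */
        printf("%c %c %c\n", upper, lower, again);
        return 0;
    }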

About one-fourth of the 7-bit ASCII codes designate control characters intended for device control. The presence of these extra codes has given the ASCII code its versatility in such areas as the control of laboratory instrumentation through relatively simple interfaces attached to the serial communication ports of inexpensive microcomputers.

Viewed as a data structure, any string has two attributes: an address and a length in bytes (or number of characters). The VAX architecture and a few others have machine instructions intended specifically for manipulating strings as a special data type. The Itanium architecture, like most others, instead provides only the machine instructions that handle small information units (e.g., byte, word, and double word), out of which string manipulations must be built. The programmer or compiler must therefore take responsibility for managing strings as data structures.
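The two attributes can be carried together in a small structure; this struct string_desc is a hypothetical illustration of how a program might manage a string as a data structure:

    #include <stdio.h>
    #include <string.h>

    struct string_desc {
        const char *addr;   /* address of the first byte */
        size_t      len;    /* length in bytes           */
    };

    int main(void)
    {
        const char *text = "My Itanium";
        struct string_desc s = { text, strlen(text) };
        printf("%zu bytes at %p: %.*s\n",
               s.len, (void *)s.addr, (int)s.len, s.addr);
        return 0;
    }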


