6.3 Bit and Field Operations

We have seen how arithmetic and logical operations combine two source operands to form a single result, with logical operations acting bit by bit. In shift operations, all of the bits move together, to the left or right, by an amount determined by one of the source operands. In deposit and extract operations, selected bits from the source operands are assembled into a new result.

6.3.1 Shift Instructions

Computer architectures usually provide some instructions that shift a bit pattern to the left or right. Early architectures offered shifts by only a single bit position, while modern architectures tend to offer shifts by multiple bit positions. The Itanium architecture offers left logical, right logical, and right arithmetic shift instructions:

 shl     r1=r3,r2       // r1 <- r3 shifted left r2 bits
 shr     r1=r3,r2       // r1 <- r3 shifted right r2 bits
 shr.u   r1=r3,r2       // r1 <- r3 shifted right r2 bits,
                        // unsigned

where left shifts are always unsigned (logical) shifts. For right shifts, the instruction completer u specifies an unsigned (logical) shift; otherwise, shr is a signed (arithmetic) shift. Note the operand order: register r3, the data source, comes before register r2, the shift count.

With all of these shift instructions, bits shifted beyond position 63 at the left or beyond position 0 at the right are lost into the "bit bucket." Shift counts greater than 63 produce a zero result, with one exception: shr with a count greater than 63 and a negative source value produces an all-ones result, which is the value -1.

The shift left operation (shl) and the shift right unsigned (logical) operation (shr.u) fill in vacated bit positions with 0 bits. The shift right arithmetic operation (shr) fills in vacated positions with either 0 or 1 bits that will replicate the sign bit of the original value (bit 63 of source register r3). The Itanium architecture uses separate opcodes to specify the direction of logical shifts, while other architectures may specify the direction using a signed shift count.
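The following C sketch models the three register-count shifts, including the behavior for counts above 63 described above. The names shl, shr, and shr_u are hypothetical helpers, not part of any library. The sketch assumes the count is the full unsigned 64-bit value of r2; the explicit special cases are needed because C's own shift operators are undefined for counts of 64 or more.

```c
#include <assert.h>
#include <stdint.h>

/* shl r1=r3,r2: logical left shift; 0 bits fill from the right. */
uint64_t shl(uint64_t r3, uint64_t r2) {
    return r2 > 63 ? 0 : r3 << r2;
}

/* shr.u r1=r3,r2: logical right shift; 0 bits fill from the left. */
uint64_t shr_u(uint64_t r3, uint64_t r2) {
    return r2 > 63 ? 0 : r3 >> r2;
}

/* shr r1=r3,r2: arithmetic right shift; copies of bit 63 fill from the left. */
uint64_t shr(uint64_t r3, uint64_t r2) {
    uint64_t fill = (r3 >> 63) ? ~UINT64_C(0) : 0;  /* all 1s if the source is negative */
    if (r2 > 63)
        return fill;                                /* 0, or all ones (-1) */
    if (r2 == 0)
        return r3;                                  /* avoid an undefined shift by 64 */
    return (r3 >> r2) | (fill << (64 - r2));        /* sign bits fill the vacated top r2 bits */
}
```

For example, shr applied to the bit pattern of -8 with a count of 1 yields the bit pattern of -4, while shr.u on the same input yields a large positive value.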

Using a register for the shift count allows that count to change dynamically during program execution. When the shift count is constant, it can be encoded within the shift instruction as an immediate value:

 shl     r1=r3,count6   // r1 <- r3 shifted left count6 bits
 shr     r1=r3,count6   // r1 <- r3 shifted right count6 bits
 shr.u   r1=r3,count6   // r1 <- r3 shifted right count6 bits,
                        // unsigned

where count6 is a 6-bit unsigned binary value specifying a shift count of zero to 63 bit positions. When the shift count is specified with an immediate value, the opcode shl is actually a pseudo-op for a special case of the deposit instruction and the opcodes shr and shr.u are pseudo-ops for special cases of the extract instruction. The deposit and extract instructions are discussed below.
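To see why an immediate shift can be expressed as a deposit or an extract, the C sketch below models simplified dep.z and extr.u operations and builds the immediate shifts from them. It assumes the expansions shl r1=r3,count6 becomes dep.z r1=r3,count6,64-count6 and shr.u r1=r3,count6 becomes extr.u r1=r3,count6,64-count6. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified dep.z: the low len6 bits of r2 are placed at bit pos6, zeros elsewhere.
   Assumes len6 >= 1 and pos6 + len6 <= 64. */
uint64_t dep_z(uint64_t r2, unsigned pos6, unsigned len6) {
    assert(pos6 <= 63 && len6 >= 1 && pos6 + len6 <= 64);
    uint64_t mask = len6 == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len6) - 1;
    return (r2 & mask) << pos6;
}

/* Simplified extr.u: len6 bits starting at pos6, right-justified and zero-filled. */
uint64_t extr_u(uint64_t r3, unsigned pos6, unsigned len6) {
    assert(pos6 <= 63 && len6 >= 1 && pos6 + len6 <= 64);
    uint64_t mask = len6 == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len6) - 1;
    return (r3 >> pos6) & mask;
}

/* shl r1=r3,count6: the low 64-count6 bits of r3 land at bit count6. */
uint64_t shl_imm(uint64_t r3, unsigned count6) {
    assert(count6 <= 63);
    return dep_z(r3, count6, 64 - count6);
}

/* shr.u r1=r3,count6: the high 64-count6 bits of r3 move down to bit 0. */
uint64_t shr_u_imm(uint64_t r3, unsigned count6) {
    assert(count6 <= 63);
    return extr_u(r3, count6, 64 - count6);
}
```

The deposit keeps exactly the bits that a left shift would keep, so no separate shift hardware path is needed for the immediate forms.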

A note of caution: Many other architectures give special properties to their shift instructions, including implementing circular shifts rather than tossing bits into the "bit bucket." For this and many other reasons, great care should always be taken when adapting any algorithm from one assembly language to another.

6.3.2 Applications of Shift Operations

Up to 64 individual Boolean values can be packed as single bits into a quad word. Logical instructions, such as and and andcm, and shift instructions provide useful means to determine individual bit values directly or by pruning away bit values of no interest.
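A minimal C sketch of this idea, assuming a 64-bit unsigned integer stands in for the quad word: a bit is tested by shifting it down to position 0, set with a bitwise or, and cleared by and-ing with the complement of a mask, which is what andcm computes (r2 and the complement of r3). The helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test one flag: shift the bit of interest to position 0 and prune the rest. */
bool flag_test(uint64_t flags, unsigned bit) {
    assert(bit < 64);
    return (flags >> bit) & 1;
}

/* Set one flag with a bitwise or. */
uint64_t flag_set(uint64_t flags, unsigned bit) {
    assert(bit < 64);
    return flags | (UINT64_C(1) << bit);
}

/* Clear one flag: and with the complement of the mask, as andcm does. */
uint64_t flag_clear(uint64_t flags, unsigned bit) {
    assert(bit < 64);
    return flags & ~(UINT64_C(1) << bit);
}
```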

The shr (shift right arithmetic) instruction divides a signed integer value by 2 for each bit position shifted. In C, integer division with the / operator (or the div library function) discards the remainder (i.e., 3/2 and 2/2 both yield 1 as the result). What happens for 3/2, 2/2, and 1/2 when using the shr instruction? Is shr suitable for division of signed integers by 2?
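These questions can be explored with a small C model. Because C leaves right shifts of negative signed values implementation-defined, the hypothetical helper asr below reproduces shr explicitly: for a negative value it shifts the non-negative complement and complements the result, which replicates the sign bit. Its results can then be compared directly with C's / operator.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of shr (arithmetic right shift) by n bits, 0 <= n <= 63. */
int64_t asr(int64_t x, unsigned n) {
    assert(n <= 63);
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* asr(x, n) rounds toward minus infinity, while C's x / 2 truncates toward
   zero, so the two agree for non-negative x but can differ for negative x
   whose division leaves a remainder. */
```

Trying asr(x, 1) alongside x / 2 for 3, 2, 1, and then -1 and -3 shows where the two operations part company.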

Unlike some other architectures, the Itanium ISA does not have an arithmetic shift left instruction; shl is a logical shift left. While a logical shift left can be used to multiply by powers of 2 (the counterpart of using shr for division by 2), the Itanium ISA does not check for overflow. Without an overflow check, significant bits can be shifted out silently and the result may even change sign, so the programmer should test for these situations wherever they can arise. Nevertheless, shl has a wider range than the shladd instruction (Section 4.2.3) and, along with shr, provides a useful facility for multiplication and division by powers of 2.
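One way to perform such an overflow test is sketched here in C: before shifting, check that x lies in the range that can be multiplied by 2^n without losing significant bits. The function name and interface are hypothetical, and the final conversion back to a signed type assumes two's complement wraparound, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Multiply x by 2^n (0 <= n <= 63) with an explicit overflow test.
   Returns false, leaving *out untouched, if the product does not fit. */
bool mul_pow2_checked(int64_t x, unsigned n, int64_t *out) {
    assert(n <= 63);
    int64_t hi = INT64_MAX >> n;   /* largest x that still fits: 2^(63-n) - 1 */
    int64_t lo = -hi - 1;          /* smallest x that still fits: -2^(63-n)   */
    if (x < lo || x > hi)
        return false;              /* significant bits (or the sign) would be lost */
    /* Shift as unsigned to avoid undefined signed shifts; converting back to
       int64_t assumes two's complement wraparound (GCC/Clang behavior). */
    *out = (int64_t)((uint64_t)x << n);
    return true;
}
```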

6.3.3 The Shift Right Pair Instruction

Many architectures include a "long shift" instruction, which can shift a bit pattern that is twice the width of a single register. In this vein, the Itanium architecture offers the shrp (shift right pair) instruction:

 shrp     r1 = r2,r3,count6  // r1 <- [r2:r3] shifted right count6 bits

which acts as though the two 64-bit source operands are placed into a virtual 128-bit working register, with r2 as the high-order (left) half and r3 as the low-order (right) half. This virtual register is then shifted to the right count6 bits, and the rightmost 64 bits of the shifted result are put into the destination register r1, which can be any register, including one used as a source.
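The behavior of shrp can be modeled in C with two ordinary shifts and an or. This sketch assumes, as described above, that r2 supplies the high-order half of the 128-bit value and r3 the low-order half; the function name shrp is a hypothetical stand-in for the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Model of shrp r1 = r2,r3,count6 for 0 <= count6 <= 63. */
uint64_t shrp(uint64_t r2, uint64_t r3, unsigned count6) {
    assert(count6 <= 63);
    if (count6 == 0)
        return r3;                                 /* avoid an undefined shift by 64 */
    /* r3 moves right; the low count6 bits of r2 slide into the top of the result. */
    return (r3 >> count6) | (r2 << (64 - count6));
}
```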

Rotating a 64-bit pattern

Many architectures have offered an instruction that rotates the bit pattern in an integer register to the left or right by one or more positions, preserving the bit(s) shifted out at one end by sending them to the other. The Itanium shrp instruction operates in this manner when the same register is specified for both sources (r2 and r3).
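A C sketch of rotation built the same way: passing one value as both halves of a shrp-style long shift turns it into a rotate right, because the bits shifted out of the low half re-enter from the high half. A left rotation by n is then a right rotation by 64 - n. The names rotr64 and rotl64 are illustrative, not a library API.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n (0..63), as shrp r1 = r,r,n would. */
uint64_t rotr64(uint64_t r, unsigned n) {
    assert(n <= 63);
    return n == 0 ? r : (r >> n) | (r << (64 - n));
}

/* Rotate left by n (0..63): a right rotation by 64 - n. */
uint64_t rotl64(uint64_t r, unsigned n) {
    assert(n <= 63);
    return n == 0 ? r : rotr64(r, 64 - n);
}
```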

Long shift instructions have long been used to facilitate computer arithmetic. We shall show an application, involving multiplication, later in this chapter.

6.3.4 Extract and Deposit Instructions

The Itanium architecture provides a useful set of instructions for working with any number of bits within a register. These instructions allow the programmer to directly read or write bit sequences and patterns.

Extract instruction

The Itanium extract instruction can isolate a contiguous span of bits from anywhere within the full width of the source register, and place those bits, right-justified, back into the same or another general register. There are two forms:

 extr     r1 = r3,pos6,len6       // Signed form
 extr.u   r1 = r3,pos6,len6       // Unsigned form

where pos6 is a bit position <63:0> and len6 is a field width from 1 to 64 (encoded within the instruction as 0 to 63). Bits <pos6+len6-1:pos6> from source register r3 are copied into destination register r1 as bits <len6-1:0>. Bits <63:len6> in the destination register are set to match the sign bit <pos6+len6-1> of the segment moved for a signed extract (extr), but are set to 0 for an unsigned extract (extr.u).
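A C model of the two extract forms follows. It is a sketch under a simplifying assumption: the field must lie entirely within the register (pos6 + len6 <= 64), and fields that would extend past bit 63 are rejected by an assertion rather than modeled. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* extr.u r1 = r3,pos6,len6: right-justify the field and zero bits <63:len6>. */
uint64_t extr_u(uint64_t r3, unsigned pos6, unsigned len6) {
    assert(pos6 <= 63 && len6 >= 1 && len6 <= 64 && pos6 + len6 <= 64);
    uint64_t field = r3 >> pos6;
    return len6 == 64 ? field : field & ((UINT64_C(1) << len6) - 1);
}

/* extr r1 = r3,pos6,len6: as above, but copy field bit <len6-1> into <63:len6>. */
uint64_t extr(uint64_t r3, unsigned pos6, unsigned len6) {
    uint64_t field = extr_u(r3, pos6, len6);
    if (len6 < 64 && ((field >> (len6 - 1)) & 1))  /* is the field's sign bit set? */
        field |= ~UINT64_C(0) << len6;             /* replicate it upward */
    return field;
}
```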

Deposit instruction

The Itanium deposit instruction can isolate contiguous bits from the right-hand side of a source register and reposition that segment at an arbitrary location within the same or another general register. There are two principal forms, each of which can have an immediate value in place of one of the register operands:

 dep.z    r1 = r2,pos6,len6        // Zero form
 dep.z    r1 = imm8,pos6,len6      // Zero form, immediate
 dep      r1 = r2,r3,pos6,len4     // Merge form
 dep      r1 = imm1,r3,pos6,len6   // Merge form, immediate

where pos6 is a bit position <63:0>, lenx is a field width encoded within the instruction (x = 4 or 6), and imm1 or imm8 is a signed 1- or 8-bit immediate value.

The zero form (dep.z) copies bits <len6-1:0> from source register r2 into the destination register r1 as bits <pos6+len6-1:pos6>, and all other bits of the destination are set to zero. When an immediate value is specified for the zero form, it is first sign-extended to the full register width, and bits <len6-1:0> of the extended value form the source segment to be repositioned.
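The zero form can be sketched in C as a mask followed by a shift. The sketch assumes pos6 + len6 <= 64, and the names dep_z and dep_z_imm8 are hypothetical. The immediate variant sign-extends the 8-bit value before masking, so a negative imm8 contributes 1 bits above bit 7 when len6 exceeds 8.

```c
#include <assert.h>
#include <stdint.h>

/* dep.z r1 = r2,pos6,len6: bits <len6-1:0> of r2 move to <pos6+len6-1:pos6>;
   every other bit of the result is 0. */
uint64_t dep_z(uint64_t r2, unsigned pos6, unsigned len6) {
    assert(pos6 <= 63 && len6 >= 1 && pos6 + len6 <= 64);
    uint64_t mask = len6 == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len6) - 1;
    return (r2 & mask) << pos6;
}

/* dep.z r1 = imm8,pos6,len6: sign-extend the 8-bit immediate, then deposit. */
uint64_t dep_z_imm8(int8_t imm8, unsigned pos6, unsigned len6) {
    return dep_z((uint64_t)(int64_t)imm8, pos6, len6);
}
```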

The merge form copies bits <len4-1:0> from source register r2 into the destination register r1 as bits <pos6+len4-1:pos6>; all other bits of the destination are taken from the corresponding bits of the other source register, r3. The length of the segment from source register r2 is limited to 16 bits because len4 is only a 4-bit field.

When an immediate value is specified for the merge form, the 1-bit immediate imm1 is sign-extended to the full register width, and bits <len6-1:0> of that value form the segment to be repositioned. Because a sign-extended 1-bit value is either all 0s or all 1s, this has the effect of forcing bits <pos6+len6-1:pos6> of the destination to all 0s or all 1s, with the remaining bits taken from r3.
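The merge form replaces only the target field of r3, which C expresses with a mask of 1s over bits <pos6+len4-1:pos6>. The sketch below also models the 1-bit immediate variant, where sign extension reduces the operation to setting or clearing the field. It assumes pos6 plus the field length does not exceed 64, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* dep r1 = r2,r3,pos6,len4: bits <len4-1:0> of r2 replace the field of r3. */
uint64_t dep_merge(uint64_t r2, uint64_t r3, unsigned pos6, unsigned len4) {
    assert(pos6 <= 63 && len4 >= 1 && len4 <= 16 && pos6 + len4 <= 64);
    uint64_t mask = ((UINT64_C(1) << len4) - 1) << pos6;  /* 1s over the target field */
    return (r3 & ~mask) | ((r2 << pos6) & mask);
}

/* dep r1 = imm1,r3,pos6,len6: a sign-extended imm1 is all 0s or all 1s,
   so the field is simply cleared or set. */
uint64_t dep_merge_imm1(unsigned imm1, uint64_t r3, unsigned pos6, unsigned len6) {
    assert(imm1 <= 1 && pos6 <= 63 && len6 >= 1 && pos6 + len6 <= 64);
    uint64_t ones = len6 == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len6) - 1;
    uint64_t mask = ones << pos6;
    return imm1 ? (r3 | mask) : (r3 & ~mask);
}
```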

Applicability and generality

While the position pos6 is not required to coincide with a byte boundary, it is often useful to work in a byte-aligned mode. Figure 6-2 illustrates the four major forms of these instructions for data modifications at byte 5 of a quad word (bits <47:40>, counting bytes from 0 at the right), that is, pos6 = 40 and len6 = 8.

Figure 6-2. Extract and deposit instructions


While some of the capabilities of these instructions can be mimicked using combinations of shift instructions and logical masking, the extract and deposit instructions are easier to comprehend and generally more efficient.
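For comparison, the following C sketch performs the four extract and deposit operations at pos6 = 40 and len6 = 8 (bits <47:40>) using only shifts, ands, ors, and complements. Each operation takes several C operators, where the Itanium needs a single extr or dep instruction. The function and macro names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define POS 40                                /* field starts at bit 40 */
#define FIELD_MASK (UINT64_C(0xFF) << POS)    /* 1s over bits <47:40> */

/* Like extr.u r1 = x,40,8 */
uint64_t byte5_extr_u(uint64_t x) { return (x >> POS) & 0xFF; }

/* Like extr r1 = x,40,8: replicate bit 47 (the field's sign bit) upward. */
uint64_t byte5_extr(uint64_t x) {
    uint64_t b = byte5_extr_u(x);
    return (b & 0x80) ? (b | ~UINT64_C(0xFF)) : b;
}

/* Like dep.z r1 = src,40,8 */
uint64_t byte5_dep_z(uint64_t src) { return (src & 0xFF) << POS; }

/* Like dep r1 = src,x,40,8: clear the field in x, then or in the new byte. */
uint64_t byte5_dep(uint64_t src, uint64_t x) {
    return (x & ~FIELD_MASK) | byte5_dep_z(src);
}
```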



Itanium® Architecture for Programmers. Understanding 64-Bit Processors and EPIC Principles
ISBN: N/A
EAN: N/A
Year: 2003
Pages: 223
