11.5 Packing and Unpacking Bit Strings

A common bit operation is inserting a bit string into an operand or extracting a bit string from an operand. Chapter 2 provided simple examples of packing and unpacking such data; now it is time to formally describe how to do this.

For the purposes of the current discussion, we will assume that we're dealing with bit strings — that is, a contiguous sequence of bits. A little later in this chapter we'll take a look at how to extract and insert bit sets in an operand. Another simplification we'll make is that the bit string completely fits within a byte, word, or double word operand. Large bit strings that cross object boundaries require additional processing; a discussion of bit strings that cross double word boundaries appears later in this section.

A bit string has two attributes that we must consider when packing and unpacking that bit string: a starting bit position and a length. The starting bit position is the bit number of the L.O. bit of the string in the larger operand. The length, of course, is the number of bits in the operand. To insert (pack) data into a destination operand, you start with a bit string of the appropriate length that is right-justified (i.e., starts in bit position zero) and is zero extended to 8, 16, or 32 bits. The task is to insert this data at the appropriate starting position in some other operand that is 8, 16, or 32 bits wide. There is no guarantee that the destination bit positions contain any particular value.

The first two steps (which can occur in any order) is to clear out the corresponding bits in the destination operand and shift (a copy of) the bit string so that the L.O. bit begins at the appropriate bit position. After completing these two steps, the third step is to OR the shifted result with the destination operand. This inserts the bit string into the destination operand (see Figure 11-3).

click to expand
Figure 11-3: Inserting a Bit String into a Destination Operand.

It only takes three instructions to insert a bit string of known length into a destination operand. The following three instructions demonstrate how to handle the insertion operation in Figure 11-3. These instructions assume that the source operand is in BX and the destination operand is AX:

         shl( 5, bx );         and( %111111000011111, ax );         or( bx, ax );

If the length and the starting position aren't known when you're writing the program (that is, you have to calculate them at runtime), then bit string insertion is a little more difficult. However, with the use of a look-up table it's still an easy operation to accomplish. Let's assume that we have two 8-bit values: a starting bit position for the field we're inserting and a non-zero 8-bit length value. Also assume that the source operand is in EBX and the destination operand is EAX. The code to insert one operand into another could take the following form:

 readonly      // The index into the following table specifies the length of the bit string      // at each position:      MaskByLen: dword[ 32 ] :=           [                0, $1, $3, $7, $f, $1f, $3f, $7f,                $ff, $1ff, $3ff, $7ff, $fff, $1fff, $3fff, $7fff, $ffff,                $1_ffff, $3_ffff, $7_ffff, $f_ffff,                $1f_ffff, $3f_ffff, $7f_ffff, $ff_ffff,                $1ff_ffff, $3ff_ffff, $7ff_ffff, $fff_ffff,                $1fff_ffff, $3fff_ffff, $7fff_ffff, $ffff_ffff           ];                .                .                .           movzx( Length, edx );           mov( MaskByLen[ edx*4 ], edx );           mov( StartingPosition, cl );           shl( cl, edx );           not( edx );           shl( cl, ebx );           and( edx, eax );           or( ebx, eax );

Each entry in the MaskByLen table contains the number of one bits specified by the index into the table. Using the Length value as an index into this table fetches a value that has as many one bits as the Length value. The code above fetches an appropriate mask, shifts it to the left so that the L.O. bit of this run of ones matches the starting position of the field into which we want to insert the data, then it inverts the mask and uses the inverted value to clear the appropriate bits in the destination operand.

To extract a bit string from a larger operand is just as easy as inserting a bit string into some larger operand. All you've got to do is mask out the unwanted bits and then shift the result until the L.O. bit of the bit string is in bit zero of the destination operand. For example, to extract the 4-bit field starting at bit position five in EBX and leave the result in EAX, you could use the following code:

           mov( ebx, eax );            // Copy data to destination.           and( %1_1110_0000, ebx );   // Strip unwanted bits.           shr( 5, eax );              // Right justify to bit position zero.

If you do not know the bit string's length and starting position when you're writing the program, you can still extract the desired bit string. The code is very similar to insertion (though a tiny bit simpler). Assuming you have the Length and StartingPosition values we used when inserting a bit string, you can extract the corresponding bit string using the following code (assuming source=EBX and dest=EAX):

           movzx( Length, edx );           mov( MaskByLen[ edx*4 ], edx );           mov( StartingPosition, cl );           mov( ebx, eax );           shr( cl, eax );           and( edx, eax );

The examples up to this point all assume that the bit string appears completely within a double word (or smaller) object. This will always be the case if the bit string is less than or equal to 24 bits in length. However, if the length of the bit string plus its starting position (mod eight) within an object is greater than 32, then the bit string will cross a double word boundary within the object. To extract such bit strings requires up to three operations: one operation to extract the start of the bit string (up to the first double word boundary), an operation that copies whole double words (assuming the bit string is so long that it consumes several double words), and a final operation that copies leftover bits in the last double word at the end of the bit string. The actual implementation of this operation is left as an exercise for the reader.