11.6 Coalescing Bit Sets and Distributing Bit Strings

Inserting and extracting bit sets is little different than inserting and extract bit strings if the "shape" of the bit set you're inserting (or resulting bit set you're extracting) is the same as the bit set in the main object. The "shape" of a bit set is the distribution of the bits in the set, ignoring the starting bit position of the set. So a bit set that includes bits zero, four, five, six, and seven has the same shape as a bit set that includes bits 12, 16, 17, 18, and 19 because the distribution of the bits is the same. The code to insert or extract this bit set is nearly identical to that of the previous section; the only difference is the mask value you use. For example, to insert this bit set starting at bit number zero in EAX into the corresponding bit set starting at position 12 in EBX, you could use the following code:

         and( %1111_0001_0000_0000_0000, ebx );  // Mask out destination bits.         shl( 12, eax );                         // Move source bits into posn.         or( eax, ebx );                         // Merge the bit set into EBX.

However, suppose you have 5 bits in bit positions zero through four in EAX and you want to merge them into bits 12, 16, 17, 18, and 19 in EBX. Somehow you've got to distribute the bits in EAX prior to logically ORing the values into EBX. Given the fact that this particular bit set has only two runs of one bits, the process is somewhat simplified. The following code achieves this in a somewhat sneaky fashion:

           and( %1111_0001_0000_0000_0000, ebx );           shl( 3, eax );   // Spread out the bits: 1-4 goes to 4-7 and 0 to 3.           btr( 3, eax );   // Bit 3->carry and then clear bit 3           rcl( 12, eax );  // Shift in carry and put bits into final position           or( eax, ebx );  // Merge the bit set into EBX.

This trick with the btr (bit test and reset) instruction worked well because we only had 1 bit out of place in the original source operand. Alas, had the bits all been in the wrong location relative to one another, this scheme might not have worked quite as well. We'll see a more general solution in just a moment.

Extracting this bit set and collecting ("coalescing") the bits into a bit string is not quite as easy. However, there are still some sneaky tricks we can pull. Consider the following code that extracts the bit set from EBX and places the result into bits 0..4 of EAX:

           mov( ebx, eax );           and( %1111_0001_0000_0000_0000, eax );  // Strip unwanted bits.           shr( 5, eax );                          // Put bit 12 into bit 7, etc.           shr( 3, ah );                           // Move bits 11..14 to 8..11.           shr( 7, eax );                          // Move down to bit zero.

This code moves (original) bit 12 into bit position seven, the H.O. bit of AL. At the same time it moves bits 16..19 down to bits 11..14 (bits 3..6 of AH). Then the code shifts the bits 3..6 in AH down to bit 0. This positions the H.O. bits of the bit set so that they are adjacent to the bit left in AL. Finally, the code shifts all the bits down to bit zero. Again, this is not a general solution, but it shows a clever way to attack this problem if you think about it carefully.

The problem with the coalescing and distribution algorithms above is that they are not general. They apply only to their specific bit sets. In general, specific solutions are going to provide the most efficient solution. A generalized solution (perhaps that lets you specify a mask and the code distributes or coalesces the bits accordingly) is going to be a bit more difficult. The following code demonstrates how to distribute the bits in a bit string according to the values in a bit mask:

 // EAX- Originally contains some value into which we insert bits from EBX. // EBX- L.O. bits contain the values to insert into EAX. // EDX- bitmap with ones indicating the bit positions in EAX to insert. // CL- Scratchpad register.            mov( 32, cl );    // Count # of bits we rotate.            jmp DistLoop; CopyToEAX: rcr( 1, ebx );    // Don't use SHR here, must preserve Z-flag.            rcr( 1, eax );            jz Done; DistLoop: dec( cl );            shr( 1, edx );            jc CopyToEAX;            ror( 1, eax );    // Keep current bit in EAX.            jnz DistLoop; Done:      ror( cl, eax );   // Reposition remaining bits.

In the code above, if we load EDX with %1100_1001 then this code will copy bits 0..3 to bits 0, 3, 6, and 7 in EAX. Notice the short-circuit test that checks to see if we've exhausted the values in EDX (by checking for a zero in EDX). Note that the rotate instructions do not affect the zero flag while the shift instructions do. Hence the shr instruction above will set the zero flag when there are no more bits to distribute (i.e., when EDX becomes zero).

The general algorithm for coalescing bits is a tad more efficient than distribution. Here's the code that will extract bits from EBX via the bit mask in EDX and leave the result in EAX:

 // EAX- Destination register. // EBX- Source register. // EDX- Bitmap with ones representing bits to copy to EAX. // EBX and EDX are not preserved.             sub( eax, eax );  // Clear destination register.             jmp ShiftLoop; ShiftInEAX: rcl( 1, ebx );    // Up here we need to copy a bit from             rcl( 1, eax );    // EBX to EAX. ShiftLoop:  shl( 1, edx );    // Check mask to see if we need to copy a bit.             jc ShiftInEAX;    // If carry set, go copy the bit.             rcl( 1, ebx );    // Current bit is uninteresting, skip it.             jnz ShiftLoop;    // Repeat as long as there are bits in EDX.

This sequence takes advantage of one sneaky trait of the shift and rotate instructions: The shift instructions affect the zero flag, while the rotate instructions do not. Therefore, the "shl( 1, edx);" instruction sets the zero flag when EDX becomes zero (after the shift). If the carry flag was also set, the code will make one additional pass through the loop in order to shift a bit into EAX, but the next time the code shifts EDX one bit to the left, EDX is still zero and so the carry will be clear. On this iteration, the code falls out of the loop.

Another way to coalesce bits is via table look-up. By grabbing a byte of data at a time (so your tables don't get too large) you can use that byte's value as an index into a look-up table that coalesces all the bits down to bit zero. Finally, you can merge the bits at the low end of each byte together. This might produce a more efficient coalescing algorithm in certain cases. The implementation is left to the reader… .