2.12 Bit Fields and Packed Data

Although the 80x86 operates most efficiently on byte, word, and double word data types, occasionally you'll need to work with a data type that uses some number of bits other than 8, 16, or 32. For example, consider a date of the form "04/02/01." It takes three numeric values to represent this date: a month, day, and year value. Months, of course, take on the values 1..12. It will require at least 4 bits (maximum of 16 different values) to represent the month. Days range between 1..31. So it will take 5 bits (maximum of 32 different values) to represent the day entry. The year value, assuming that we're working with values in the range 0..99, requires 7 bits (that can be used to represent up to 128 different values). Four plus five plus seven is 16 bits, or two bytes. In other words, we can pack our date data into two bytes rather than the three that would be required if we used a separate byte for each of the month, day, and year values. This saves one byte of memory for each date stored, which could be a substantial saving if you need to store a lot of dates. The bits could be arranged as shown in Figure 2-20.

Figure 2-20: Short Packed Date Format (Two Bytes).

MMMM represents the 4 bits making up the month value; DDDDD represents the 5 bits making up the day, and YYYYYYY is the 7 bits comprising the year. Each collection of bits representing a data item is a bit field. April 2, 2001 would be represented as $4101:

 0100 00010 0000001 = %0100_0001_0000_0001 or $4101
   4    2     01
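As a quick check of the $4101 value, the following minimal sketch packs the constants 4, 2, and 1 using the field widths described above (4 bits for the month, 5 for the day, 7 for the year). The program name packCheck is made up for this illustration.

```hla
program packCheck;
#include( "stdlib.hhf" )

begin packCheck;

    mov( 4, ax );       // Month (April) in the L.O. bits of AX.
    shl( 5, ax );       // Make room for the 5-bit day field.
    or( 2, ax );        // Merge in the day.
    shl( 7, ax );       // Make room for the 7-bit year field.
    or( 1, ax );        // Merge in the year (01); AX now holds $4101.

    stdout.put( "Packed date = $", ax, nl );

end packCheck;
```

After the two shifts the month has moved left by 12 bits and the day by 7 bits, which matches the bit positions in the layout above.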

Although packed values are space efficient (that is, very efficient in terms of memory usage), they are computationally inefficient (slow!). The reason? It takes extra instructions to unpack the data packed into the various bit fields. These extra instructions take additional time to execute (and additional bytes to hold the instructions); hence, you must carefully consider whether packed data fields will save you anything. The sample program in Listing 2-9 demonstrates the effort that must go into packing and unpacking this 16-bit date format.

Listing 2-9: Packing and Unpacking Date Data.

 program dateDemo;
 #include( "stdlib.hhf" )

 static
     day:        uns8;
     month:      uns8;
     year:       uns8;
     packedDate: word;

 begin dateDemo;

     stdout.put( "Enter the current month, day, and year: " );
     stdin.get( month, day, year );

     // Pack the data into the following bits:
     //
     //  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
     //   m  m  m  m  d  d  d  d  d  y  y  y  y  y  y  y

     mov( 0, ax );
     mov( ax, packedDate );  // Just in case there is an error.
     if( month > 12 ) then

         stdout.put( "Month value is too large", nl );

     elseif( month = 0 ) then

         stdout.put( "Month value must be in the range 1..12", nl );

     elseif( day > 31 ) then

         stdout.put( "Day value is too large", nl );

     elseif( day = 0 ) then

         stdout.put( "Day value must be in the range 1..31", nl );

     elseif( year > 99 ) then

         stdout.put( "Year value must be in the range 0..99", nl );

     else

         mov( month, al );
         shl( 5, ax );
         or( day, al );
         shl( 7, ax );
         or( year, al );
         mov( ax, packedDate );

     endif;

     // Okay, display the packed value:

     stdout.put( "Packed data = $", packedDate, nl );

     // Unpack the date:

     mov( packedDate, ax );
     and( $7f, al );         // Retrieve the year value.
     mov( al, year );

     mov( packedDate, ax );  // Retrieve the day value.
     shr( 7, ax );
     and( %1_1111, al );
     mov( al, day );

     mov( packedDate, ax );  // Retrieve the month value.
     rol( 4, ax );
     and( %1111, al );
     mov( al, month );

     stdout.put( "The date is ", month, "/", day, "/", year, nl );

 end dateDemo;

Of course, having gone through the problems with Y2K, using a date format that limits you to 100 years (or even 127 years) would be quite foolish at this time. If you're concerned about your software running 100 years from now, perhaps it would be wise to use a three-byte date format rather than a two-byte format. As you will see in the chapter on arrays, however, you should always try to create data objects whose length is a power of two (one byte, two bytes, four bytes, eight bytes, and so on) or you will pay a performance penalty. Hence, it is probably wise to go ahead and use four bytes and pack this data into a double word variable. Figure 2-21 shows one possible data organization for a four-byte date.

Figure 2-21: Long Packed Date Format (Four Bytes).

In this long packed date format several changes were made beyond simply extending the number of bits associated with the year. First, because there are extra bits in a 32-bit double word variable, this format allocates extra bits to the month and day fields. Because these two fields now consist of 8 bits each, they can be easily extracted as a byte object from the double word. This leaves only 16 bits for the year, but 65,536 years is probably sufficient; you can probably assume without too much concern that your software will not still be in use 63,000 years from now when this date format will no longer work.
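The following minimal sketch shows how little work the byte-sized fields require. It assumes a layout with the year in bits 16..31, the month in bits 8..15, and the day in bits 0..7; the program name and all variable names are hypothetical, chosen only for this illustration.

```hla
program longDateDemo;
#include( "stdlib.hhf" )

static
    day:      uns8  := 2;
    month:    uns8  := 4;
    year:     uns16 := 2001;
    longDate: dword;

    uDay:     uns8;     // Unpacked copies of the three fields.
    uMonth:   uns8;
    uYear:    uns16;

begin longDateDemo;

    // Pack: year into the H.O. word, month into bits 8..15,
    // day into bits 0..7 (assumed layout).

    mov( year, ax );
    shl( 16, eax );     // Year moves to bits 16..31; bits 0..15 are now zero.
    mov( month, ah );
    mov( day, al );
    mov( eax, longDate );

    stdout.put( "Long packed date = $", longDate, nl );

    // Unpack: no masking is needed because each field
    // occupies a whole byte or a whole word.

    mov( longDate, eax );
    mov( al, uDay );
    mov( ah, uMonth );
    shr( 16, eax );
    mov( ax, uYear );

    stdout.put( "The date is ", uMonth, "/", uDay, "/", uYear, nl );

end longDateDemo;
```

Compared with the short format, the unpacking code needs no and instructions at all, because AL and AH already line up with the day and month fields.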

Of course, you could argue that this is no longer a packed date format. After all, we needed three numeric values, two of which fit just nicely into one byte each and one that should probably have at least two bytes. Because this "packed" date format consumes the same four bytes as the unpacked version, what is so special about this format? Well, another difference you will note between this long packed date format and the short date format appearing in Figure 2-20 is the fact that this long date format rearranges the bits so the Year is in the H.O. bit positions, the Month field is in the middle bit positions, and the Day field is in the L.O. bit positions. This is important because it allows you to very easily compare two dates to see if one date is less than, equal to, or greater than another date. Consider the following code:

      mov( Date1, eax );       // Assume Date1 and Date2 are dword variables
      if( eax > Date2 ) then   // using the Long Packed Date format.

           << do something if Date1 > Date2 >>

      endif;

Had you kept the different date fields in separate variables or organized the fields differently, you would not have been able to compare Date1 and Date2 in such a straightforward fashion. Therefore, this example demonstrates another reason for packing data even if you don't realize any space savings: It can make certain computations more convenient or even more efficient (contrary to what normally happens when you pack data).
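To see why the field order matters, consider December 31, 2000 and January 1, 2001 in both formats. The sketch below is illustrative only: the variable names are invented, the short values use the month/day/year layout of the two-byte format, and the long values assume the year occupies bits 16..31, the month bits 8..15, and the day bits 0..7.

```hla
program compareDemo;
#include( "stdlib.hhf" )

static
    // Short format (month in the H.O. bits):
    shortDec31: word  := $CF80;         // 1100 11111 0000000 = 12/31/00
    shortJan1:  word  := $1081;         // 0001 00001 0000001 = 01/01/01

    // Long format (year in the H.O. bits, assumed layout):
    longDec31:  dword := $07D0_0C1F;    // 2000, 12, 31
    longJan1:   dword := $07D1_0101;    // 2001,  1,  1

begin compareDemo;

    mov( shortDec31, ax );
    if( ax > shortJan1 ) then

        // The earlier date compares as the larger value.
        stdout.put( "Short format: integer compare gives the wrong order", nl );

    endif;

    mov( longDec31, eax );
    if( eax < longJan1 ) then

        // The earlier date compares as the smaller value.
        stdout.put( "Long format: integer compare gives the right order", nl );

    endif;

end compareDemo;
```

Because the most significant field (the year) sits in the most significant bits, an ordinary unsigned comparison of the long values orders the dates chronologically. The short format puts the month on top, so the same comparison misorders dates from different years.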

Examples of practical packed data types abound. You could pack eight boolean values into a single byte; you could pack two BCD digits into a byte, and so on. Of course, a classic example of packed data is the EFLAGS register (see Figure 2-22). The L.O. 16 bits of this register pack nine important boolean objects (along with seven important system flags) into a single 16-bit value. You will commonly need to access many of these flags. For this reason, the 80x86 instruction set provides many ways to manipulate the individual bits in the flags register. Of course, you can test many of the condition code flags using HLA pseudo-boolean variables such as @c, @nc, @z, and @nz in an if statement or any other statement that uses a boolean expression.
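As a small illustration of testing a condition code flag, the sketch below forces an unsigned overflow and then checks the carry flag with @c. The program name is hypothetical; note that the test has to come immediately after the instruction that sets the flag, because intervening instructions may change it.

```hla
program carryTest;
#include( "stdlib.hhf" )

begin carryTest;

    mov( $FFFF_FFFF, eax );
    add( 1, eax );          // Unsigned overflow: the result wraps to zero
                            // and the carry flag is set.
    if( @c ) then

        stdout.put( "The addition produced a carry", nl );

    else

        stdout.put( "The addition did not produce a carry", nl );

    endif;

end carryTest;
```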

In addition to the condition codes, the 80x86 provides instructions that directly affect certain flags (Table 2-7).

Table 2-7: Instructions That Affect Certain Flags
Instruction	Explanation

cld();	Clears (sets to zero) the direction flag.
std();	Sets (to one) the direction flag.
cli();	Clears the interrupt enable flag (disables maskable interrupts).
sti();	Sets the interrupt enable flag (enables maskable interrupts).
clc();	Clears the carry flag.
stc();	Sets the carry flag.
cmc();	Complements (inverts) the carry flag.
sahf();	Stores the AH register into the L.O. 8 bits of the flags register.
lahf();	Loads AH from the L.O. 8 bits of the flags register.

There are other instructions that affect the flags register as well; these instructions, however, demonstrate how to access several of the packed boolean values in the flags register. The lahf and sahf instructions, in particular, provide a convenient way to access the L.O. 8 bits of the flags register as an 8-bit byte (rather than as eight separate 1-bit values). See Figure 2-22 for a layout of the flags register.
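The carry flag instructions from Table 2-7 are easy to try out, since they work in ordinary application code. The following minimal sketch (with a made-up program name) sets, clears, and complements the carry flag, checking the result each time with the @c and @nc pseudo-variables.

```hla
program carryFlagDemo;
#include( "stdlib.hhf" )

begin carryFlagDemo;

    stc();                  // Carry = 1.
    cmc();                  // Complement: carry = 0.
    if( @nc ) then

        stdout.put( "Carry is clear after stc/cmc", nl );

    endif;

    clc();                  // Carry = 0.
    cmc();                  // Complement: carry = 1.
    if( @c ) then

        stdout.put( "Carry is set after clc/cmc", nl );

    endif;

end carryFlagDemo;
```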

Figure 2-22: Layout of the flags register.

The lahf (load AH with the L.O. 8 bits of the flags register) and sahf (store AH into the L.O. byte of the flags register) instructions use the following syntax:

 lahf();
 sahf();
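Here is a minimal sketch of treating the L.O. byte of the flags register as packed data. It relies on the standard 80x86 bit positions, with the carry flag in bit 0 and the zero flag in bit 6; the program name is invented for this example.

```hla
program lahfDemo;
#include( "stdlib.hhf" )

begin lahfDemo;

    // Read the flags as a byte and isolate the zero flag.

    mov( 5, eax );
    sub( 5, eax );              // The result is zero, so the zero flag is set.
    lahf();                     // AH = L.O. 8 bits of the flags register.
    and( %0100_0000, ah );      // Keep only bit 6 (the zero flag).
    if( @nz ) then

        stdout.put( "Zero flag was set when lahf executed", nl );

    endif;

    // Write the flags from a byte: set the carry flag via AH.

    mov( %0000_0001, ah );      // Bit 0 corresponds to the carry flag.
    sahf();                     // Copy AH into the L.O. byte of the flags.
    if( @c ) then

        stdout.put( "Carry flag was set by sahf", nl );

    endif;

end lahfDemo;
```

The and instruction here is the same masking technique the date example uses to extract a bit field; the only difference is that the packed value came from the flags register rather than from memory.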