11.2 What Is Bit Data, Anyway?

Before describing how to manipulate bits, it might not be a bad idea to define exactly what this text means by bit data. Most readers probably assume that bit manipulation programs twiddle individual bits in memory. While programs that do this are definitely bit manipulation programs, we're not going to limit our definition to just those programs. For our purposes, "bit manipulation" refers to working with data types that consist of strings of bits that are non-contiguous or are not an even multiple of 8 bits long. Generally, such bit objects will not represent numeric integers, although we will not place this restriction on our bit strings.

A bit string is some contiguous sequence of one or more bits (this term even applies if the bit string's length is an even multiple of 8 bits). Note that a bit string does not have to start or end at any special point. For example, a bit string could start in bit 7 of one byte in memory and continue through to bit 6 of the next byte in memory. Likewise, a bit string could begin in bit 30 of EAX, consume the upper 2 bits of EAX, and then continue from bit 0 through bit 17 of EBX. In memory, the bits must be physically contiguous (i.e., the bit numbers are always increasing except when crossing a byte boundary, and at byte boundaries the memory address increases by one byte). In registers, if a bit string crosses a register boundary, the application defines the continuation register, but the bit string always continues in bit zero of that second register.

A bit set is a collection of bits, not necessarily contiguous (though they may be contiguous), within some larger data structure. For example, bits 0..3, 7, 12, 24, and 31 from some double word forms a set of bits. Usually, we will limit bit sets to some reasonably sized container object (that is, the data structure that encapsulates the bit set), but the definition doesn't specifically limit the size. Normally, we will deal with bit sets that are part of an object no more than about 32 or 64 bits in size, though this limit is completely artificial. Note that bit strings are special cases of bit sets.

A bit run is a sequence of bits with all the same value. A run of zeros is a bit string that contains all zeros, and a run of ones is a bit string containing all ones. The first set bit in a bit string is the bit position of the first bit containing a one in a bit string, i.e., the first ‘1’ bit following a possible run of zeros. A similar definition exists for the first clear bit. The last set bit is the last bit position in a bit string containing that contains ‘1’; afterward, the remainder of the string forms an uninterrupted run of zeros. A similar definition exists for the last clear bit.

A bit offset is the number of bits from some boundary position (usually a byte boundary) to the specified bit. As noted in Chapter 2, we number the bits starting from zero at the boundary location. If the offset is less than 32, then the bit offset is the same as the bit number in a byte, word, or double word value.

A mask is a sequence of bits that we'll use to manipulate certain bits in another value. For example, the bit string %0000_1111_0000, when it's used with the and instruction, can mask away (clear) all the bits except bits 4 through 7. Likewise, if you use the same value with the or instruction, it can force bits 4 through 7 to ones in the destination operand. The term "mask" comes from the use of these bit strings with the and instruction; in those situations the one and zero bits behave like masking tape when you're painting something; they pass through certain bits unchanged while masking out (clearing) the other bits.

Armed with these definitions, we're ready to start manipulating some bits!