A numbering system is a mechanism we use to represent numeric values. In today's society, people most often use the decimal numbering system (base 10) and most computer systems use binary representation. Confusion between the two can lead to poor coding practices. So to write great code, you must eliminate this confusion.
To appreciate the difference between numbers and their representations, let's start with a concrete discussion of the decimal numbering system. The Arabs developed the decimal numbering system we commonly use today (indeed, the ten decimal digits are known as Arabic numerals). The Arabic system uses positional notation to represent values with a relatively small number of different symbols. That is, the Arabic representation takes into consideration not only the symbol itself, but the position of the symbol in a sequence of symbols, a scheme that is far superior to other, nonpositional representations. To appreciate the difference between a positional system and a nonpositional system, consider the tally-slash representation of the number 25 in Figure 2-1.
The tally-slash representation uses a sequence of n marks to represent the value n. To make the values easier to read, most people arrange the tally marks in groups of five, as in Figure 2-1. The advantage of the tally-slash numbering system is that it is easy to use when counting objects. The disadvantages include the fact that the notation is bulky and arithmetic operations are difficult. However, without question, the biggest problem with the tally-slash representation is the amount of physical space it consumes. To represent the value n requires some amount of space proportional to n. Therefore, for large values of n, the tally-slash notation becomes unusable.
The decimal positional notation (base 10) represents numbers using strings of Arabic numerals. The symbol immediately to the left of the decimal point in the sequence represents some value between zero and nine. If there are at least two digits, then the second symbol to the left of the decimal point represents some value between zero and nine, multiplied by ten. In the decimal positional numbering system, each digit appearing to the left of the decimal point represents a value between zero and nine times an increasing power of ten (see Figure 2-2).
When you see a numeric sequence like '123.45,' you don't think about the value 123.45; rather, you generate a mental image of this quantity. In reality, 123.45 represents:
1 × 10² + 2 × 10¹ + 3 × 10⁰ + 4 × 10⁻¹ + 5 × 10⁻²
or
100 + 20 + 3 + 0.4 + 0.05
To get a clear picture of how powerful this notation is, consider the following facts:
The positional numbering system, using base 10, can represent the value ten in one-fifth the space of the tally-slash system (two symbols instead of ten marks).
The base-10 positional numbering system can represent the value one hundred in about 3 percent of the space of the tally-slash system.
The base-10 positional numbering system can represent the value one thousand in 0.4 percent of the space of the tally-slash system.
As the numbers grow larger, the disparity becomes even greater. Because of the compact and easy-to-recognize notation, positional numbering systems are quite popular.
Human beings developed the decimal numbering system because it corresponds to the number of fingers ('digits') on their hands. However, the decimal numbering system isn't the only positional numbering system possible. In fact, for most computer-based applications, the decimal numbering system isn't even the best numbering system available. Again, our goal of writing great code requires that we learn to 'think like the machine,' and that means we need to understand different ways to represent numbers on our machines. So let's take a look at how we represent values in other bases.
The decimal positional numbering system uses powers of ten and ten unique symbols for each digit position. Because decimal numbers use powers of ten, we call such numbers 'base-10' numbers. By substituting a different set of numeric digits and multiplying those digits by powers of some base other than 10, we can devise a different numbering system to represent our numbers. The base, or radix, is the value that we raise to successive powers for each digit to the left of the radix point (note that the term decimal point applies only to decimal numbers).
As an example, we can create a base-8 numbering system using eight symbols (0-7) and powers of eight (base 8, or octal, was actually a common representation on early binary computer systems). The base-8 system uses successive powers of eight to represent values. Consider the octal number 123₈ (the subscript denotes the base using standard mathematical notation), which is equivalent to 83₁₀:
1 × 8² + 2 × 8¹ + 3 × 8⁰
or
64 + 16 + 3
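The same digit-times-power computation works in any radix, and it's easy to capture in code. The following C sketch is illustrative only (the function name base_to_value is my own invention; C's standard strtoul performs the same conversion). Rather than computing explicit powers, it accumulates the digits left to right, multiplying the running total by the base at each step, which produces an identical sum:

    #include <ctype.h>
    #include <stdio.h>

    /* Convert a digit string in the given base (2..36) to a value.
       Digits beyond 9 may be 'a'..'z' or 'A'..'Z'. The name
       base_to_value is illustrative; C's strtoul does the same job. */
    unsigned long base_to_value(const char *s, unsigned base)
    {
        unsigned long result = 0;
        for (; *s != '\0'; ++s) {
            unsigned digit;
            if (isdigit((unsigned char)*s))
                digit = (unsigned)(*s - '0');
            else
                digit = (unsigned)(tolower((unsigned char)*s) - 'a') + 10;
            result = result * base + digit;  /* shift one digit position left */
        }
        return result;
    }

    int main(void)
    {
        printf("%lu\n", base_to_value("123", 8));  /* prints 83 */
        return 0;
    }

Accumulating left to right this way (Horner's method) yields the same sum as the explicit powers of eight shown above, without ever computing a power.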
To create a base-n numbering system, you need n unique digits. The smallest possible radix is two (for this scheme). For bases two through ten, the convention is to use the Arabic digits zero through n − 1 (for a base-n system). For bases greater than ten, the convention is to use the alphabetic digits a..z [1] or A..Z (ignoring case) for digits greater than nine. This scheme supports numbering systems through base 36 (10 numeric digits and 26 alphabetic digits). No agreed-upon convention exists for symbols beyond the 10 Arabic numeric digits and the 26 alphabetic digits. Throughout this book, we'll deal with base-2, base-8, and base-16 values because base 2 (binary) is the native representation most computers use, and base 16 is more compact than base 2. Base 8 deserves a short discussion because it was a popular numeric representation on older computer systems. You'll find many programs that use these three different bases, so you'll want to be familiar with them.
If you're reading this book, chances are pretty good that you're already familiar with the base-2, or binary, numbering system; nevertheless, a quick review is in order. The binary numbering system works just like the decimal numbering system, with two exceptions: binary only uses the digits 0 and 1 (rather than 0-9), and binary uses powers of two rather than powers of ten.
Why even worry about binary? After all, almost every computer language available allows programmers to use decimal notation (automatically converting decimal representation to the internal binary representation). Despite this convenience, most modern computer systems talk to I/O devices using binary, and their arithmetic circuitry operates on binary data. Many algorithms depend upon binary representation for correct operation. Therefore, a complete understanding of binary representation is necessary if you want to write great code.
In order to allow human beings to work with decimal representation, the computer has to convert between the decimal notation that humans use and the binary format that computers use. To appreciate what the computer does for you, it's useful to learn how to do these conversions manually.
To convert a binary value to decimal, we add 2ⁱ for each '1' in the binary string, where i is the zero-based position of the binary digit. For example, the binary value 11001010₂ represents:
1 × 2⁷ + 1 × 2⁶ + 0 × 2⁵ + 0 × 2⁴ + 1 × 2³ + 0 × 2² + 1 × 2¹ + 0 × 2⁰
or
128 + 64 + 8 + 2
or
202₁₀
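That rule translates into a few lines of C. This is a minimal sketch under the assumption that the input contains only '0' and '1' characters (the function name binary_to_decimal is illustrative): it scans from the least significant end and adds 2ⁱ for each set bit.

    #include <stdio.h>
    #include <string.h>

    /* Sum 2^i for each '1' in the string, where i is the zero-based
       position counted from the rightmost (least significant) digit.
       The function name is illustrative. */
    unsigned long binary_to_decimal(const char *bits)
    {
        unsigned long result = 0;
        size_t len = strlen(bits);
        for (size_t i = 0; i < len; ++i)
            if (bits[len - 1 - i] == '1')
                result += 1UL << i;  /* 1UL << i is 2 to the ith power */
        return result;
    }

    int main(void)
    {
        printf("%lu\n", binary_to_decimal("11001010"));  /* prints 202 */
        return 0;
    }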
To convert decimal to binary is almost as easy. Here's an algorithm that converts decimal representation to the corresponding binary representation (a code sketch of these steps follows the list):
1. If the number is even, emit a zero. If the number is odd, emit a one.
2. Divide the number by two and throw away any fractional component or remainder.
3. If the quotient is zero, the algorithm is complete.
4. If the quotient is not zero and is odd, insert a one before the current string; if the quotient is not zero and is even, prefix your binary string with a zero.
5. Go back to step 2 and repeat.
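Here's one way the steps above might look in C; it's a hedged sketch, not a definitive implementation (the function name decimal_to_binary and the 33-character buffer, enough for a 32-bit value plus terminator, are my assumptions). It builds the string right to left, exactly as steps 1 and 4 prefix each new digit:

    #include <stdio.h>

    /* Build the binary string right to left by repeated division by
       two, following the steps above. The name decimal_to_binary and
       the 33-byte buffer (32 bits plus terminator) are assumptions. */
    void decimal_to_binary(unsigned n, char *out)
    {
        char buf[33];
        int pos = 32;
        buf[pos] = '\0';
        do {
            buf[--pos] = (n % 2 == 0) ? '0' : '1';  /* steps 1 and 4 */
            n /= 2;                                  /* step 2 */
        } while (n != 0);                            /* step 3 */
        /* copy the finished digits to the caller's buffer */
        for (int i = 0; (out[i] = buf[pos + i]) != '\0'; ++i)
            ;
    }

    int main(void)
    {
        char bits[33];
        decimal_to_binary(202, bits);
        printf("%s\n", bits);  /* prints 11001010 */
        return 0;
    }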
As you can tell by the equivalent representations, 202₁₀ and 11001010₂, binary representation is not as compact as decimal representation. Because binary representation is bulky, we need some way to make the digits, or bits, in binary numbers easier to read.
In the United States, most people separate every three digits with a comma to make larger numbers easier to read. For example, 1,023,435,208 is much easier to read and comprehend than 1023435208. We'll adopt a similar convention in this book for binary numbers. We will separate each group of four binary bits with an underscore. For example, we will write the binary value 1010111110110010₂ as 1010_1111_1011_0010₂.
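If you ever need to apply this convention programmatically, a small C helper suffices. This sketch (the name group_bits is my own) inserts an underscore after every group of four bits, counting groups from the right:

    #include <stdio.h>
    #include <string.h>

    /* Insert an underscore after every group of four bits, counting
       groups from the right. The name group_bits is illustrative;
       out must be large enough for the digits plus the underscores. */
    void group_bits(const char *bits, char *out)
    {
        size_t len = strlen(bits);
        size_t o = 0;
        for (size_t i = 0; i < len; ++i) {
            out[o++] = bits[i];
            size_t remaining = len - 1 - i;
            if (remaining != 0 && remaining % 4 == 0)
                out[o++] = '_';
        }
        out[o] = '\0';
    }

    int main(void)
    {
        char grouped[64];
        group_bits("1010111110110010", grouped);
        printf("%s\n", grouped);  /* prints 1010_1111_1011_0010 */
        return 0;
    }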
This chapter has been using the subscript notation embraced by mathematicians to denote binary values (the lack of a subscript indicates the decimal base). While this works great in word processing systems, most program text editors do not provide the ability to specify a numeric base using a subscript. Even if a particular editor does support this feature, very few programming language compilers would recognize the subscript. Therefore, we need some way to represent various bases within a standard ASCII text file.
Generally, only assembly language compilers ('assemblers') allow the use of literal binary constants in a program. Because there is a wide variety of assemblers out there, it should come as no surprise that there are many different ways to represent binary literal constants in an assembly language program. Because this text presents examples using MASM and HLA, it makes sense to adopt the conventions these two assemblers use.
MASM treats any sequence of binary digits (zero and one) that ends with a 'b' or 'B' as a binary value. The 'b' suffix differentiates binary values like '1001b' from decimal values of the same form ('1001,' or one thousand one). Therefore, the binary representation for nine would be '1001b' in a MASM source file.
HLA prefixes binary values with the percent symbol (%). To make binary numbers more readable, HLA also allows you to insert underscores within binary strings:
%11_1011_0010_1101
Binary number representation is verbose. Because reading and writing binary values is awkward, programmers often avoid binary representation in program source files, preferring hexadecimal notation. Hexadecimal representation offers two great features: it's very compact, and it's easy to convert between binary and hexadecimal. Therefore, software engineers generally use hexadecimal representation rather than binary to make their programs more readable.
Because hexadecimal representation is base 16, each digit to the left of the hexadecimal point represents some value times a successive power of 16. For example, the number 1234₁₆ is equal to:
1 × 16³ + 2 × 16² + 3 × 16¹ + 4 × 16⁰
or
4096 + 512 + 48 + 4
or
4660₁₀
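(For what it's worth, the hedged base_to_value sketch from earlier in this chapter computes the same result: base_to_value("1234", 16) returns 4660.)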
Hexadecimal representation uses the letters A through F for the additional six digits it requires (above and beyond the ten standard decimal digits, 0-9). The following are all examples of valid hexadecimal numbers:
234₁₆  DEAD₁₆  BEEF₁₆  0AFB₁₆  FEED₁₆  DEAF₁₆
One problem with hexadecimal representation is that it is difficult to differentiate hexadecimal values like 'dead' from standard program identifiers. Therefore, most programming languages use a special prefix or suffix character to denote the hexadecimal radix for constants appearing in your source files. Here's how you specify literal hexadecimal constants in several popular languages:
The C, C++, C#, Java, and other C-derivative programming languages use the prefix '0x' to denote a hexadecimal value. Therefore, you'd use the character sequence '0xdead' for the hexadecimal value DEAD₁₆.
The MASM assembler uses an 'h' or 'H' suffix to denote a hexadecimal value. This doesn't completely resolve the ambiguity between certain identifiers and literal hexadecimal constants; 'deadh' still looks like an identifier to MASM. Therefore, MASM also requires that a hexadecimal value begin with a numeric digit. So for hexadecimal values that don't already begin with a numeric digit, you would add '0' to the beginning of the value (adding a zero to the beginning of any numeric representation does not alter the value of that representation). For example, use '0deadh' to unambiguously represent the hexadecimal value DEAD₁₆.
Visual Basic uses the '&H' or '&h' prefix to denote a hexadecimal value. Continuing with our current example (DEAD₁₆), you'd use '&Hdead' to represent this hexadecimal value in Visual Basic.
Pascal (Delphi/Kylix) uses the symbol $ as a prefix for hexadecimal values. So you'd use '$dead' to represent our current example in Delphi/Kylix.
HLA similarly uses the symbol $ as a prefix for hexadecimal values. So you'd also use '$dead' to represent DEAD₁₆ with HLA. HLA allows you to insert underscores into the middle of a hexadecimal number to make it easier to read, for example '$FDEC_A012.'
In general, this text will use the HLA/Delphi/Kylix notation for hexadecimal numbers except in examples specific to some other programming language. Because there are several C/C++ examples in this book, you'll frequently see the C/C++ notation, as well.
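To see that these notations are pure spelling, consider a short C example (a hedged sketch; the printed values simply restate the positional arithmetic above). The compiler converts the 0x literal at compile time, so 0xDEAD and 57005 are interchangeable:

    #include <stdio.h>

    int main(void)
    {
        /* 0xDEAD and 57005 are two spellings of the same value;
           the compiler converts the hexadecimal literal for you */
        printf("%d %d\n", 0xDEAD, 57005);  /* prints 57005 57005 */
        printf("%X\n", 57005u);            /* prints DEAD */
        return 0;
    }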
On top of being a compact way to represent values in code, hexadecimal notation is also popular because it is easy to convert between the binary and hexadecimal representations. By memorizing a few simple rules, you can mentally convert between these two representations. Consider Table 2-1.
Table 2-1: Binary/Hexadecimal Conversion

Binary | Hexadecimal
-------|------------
%0000  | $0
%0001  | $1
%0010  | $2
%0011  | $3
%0100  | $4
%0101  | $5
%0110  | $6
%0111  | $7
%1000  | $8
%1001  | $9
%1010  | $A
%1011  | $B
%1100  | $C
%1101  | $D
%1110  | $E
%1111  | $F
To convert the hexadecimal representation of a number into binary, substitute the corresponding four binary bits for each hexadecimal digit. For example, to convert $ABCD into the binary form %1010_1011_1100_1101, convert each hexadecimal digit according to the values in Table 2-1:
   A    B    C    D     (hexadecimal)
 1010 1011 1100 1101    (binary)
To convert the binary representation of a number into hexadecimal is almost as easy. The first step is to pad the binary number with zeros to make sure it is a multiple of four bits long. For example, given the binary number 1011001010, the first step would be to add two zero bits to the left of the number so that it contains 12 bits without changing its value. The result is 001011001010. The next step is to separate the binary value into groups of four bits: 0010_1100_1010. Finally, look up these binary values in Table 2-1 and substitute the appropriate hexadecimal digits, which are $2CA. Contrast this with the difficulty of converting between decimal and binary or decimal and hexadecimal!
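The table-substitution rule is mechanical enough to code directly. Here's a minimal C sketch (the array and function names are my own) that replaces each hexadecimal digit with its four-bit group from Table 2-1:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Four-bit patterns for the hex digits $0..$F, per Table 2-1. */
    static const char *nibble[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
    };

    /* Replace each hexadecimal digit with its four-bit group.
       Names are illustrative; out must hold 4 bits per hex digit. */
    void hex_to_binary(const char *hex, char *out)
    {
        *out = '\0';
        for (; *hex != '\0'; ++hex) {
            int d = isdigit((unsigned char)*hex)
                  ? *hex - '0'
                  : toupper((unsigned char)*hex) - 'A' + 10;
            strcat(out, nibble[d]);
        }
    }

    int main(void)
    {
        char bits[128];
        hex_to_binary("ABCD", bits);
        printf("%s\n", bits);  /* prints 1010101111001101 */
        return 0;
    }

Converting in the other direction is the reverse lookup: pad the binary string to a multiple of four bits, split it into groups, and find each group's hexadecimal digit in the table.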
Octal (base-8) representation was common in early computer systems. As a result, you may still see people use the octal representation now and then. Octal is great for 12-bit and 36-bit computer systems (or any other size that is a multiple of three). However, it's not particularly great for computer systems whose bit size is some power of two (8-bit, 16-bit, 32-bit, and 64-bit computer systems). As a result, octal has fallen out of favor over the past several decades. Nevertheless, some programming languages provide the ability to specify numeric values in octal notation, and you can still find some older Unix applications that use octal, so octal is worth discussing here.
The C programming language (and derivatives like C++ and Java), Visual Basic, and MASM support octal representation. You should be aware of the notation various programming languages use for octal numbers in case you come across it in programs written in these languages.
In C, you specify the octal base by prefixing a numeric string with a zero. For example, '0123' is equivalent to the decimal value 83₁₀ and is definitely not equivalent to the decimal value 123₁₀.
MASM uses a 'Q' or 'q' suffix to denote an octal number (Microsoft/Intel probably chose 'Q' because it looks like an 'O' and they didn't want to use 'O' or 'o' because of the possible confusion with zero).
Visual Basic uses the '&O' (that's the letter O, not a zero) prefix to denote an octal value. For example, you'd use '&O123' to represent the decimal value 83₁₀.
Converting between binary and octal is similar to converting between binary and hexadecimal, except that you work in groups of three bits rather than four. See Table 2-2 for the list of binary and octal equivalent representations.
Table 2-2: Binary/Octal Conversion

Binary | Octal
-------|------
%000   | 0
%001   | 1
%010   | 2
%011   | 3
%100   | 4
%101   | 5
%110   | 6
%111   | 7
To convert octal into binary, replace each octal digit in the number with the corresponding three bits from Table 2-2. For example, when converting 123q into a binary value, the final result is %0_0101_0011:
  1   2   3      (octal)
 001 010 011     (binary)
To convert a binary number into octal, you break up the binary string into groups of three bits (padding with zeros, as necessary), and then you look up each triad in Table 2-2 and substitute the corresponding octal digit.
If you've got an octal value and you'd like to convert it to hexadecimal notation, convert the octal number to binary and then convert the binary value to hexadecimal.
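As with hexadecimal, the octal substitution rule reduces to a table lookup. A hedged C sketch follows (names are my own); combined with the reverse four-bit lookup described earlier, it also gives you the octal-to-hexadecimal path just mentioned:

    #include <stdio.h>
    #include <string.h>

    /* Three-bit patterns for the octal digits 0..7, per Table 2-2. */
    static const char *triad[8] = {
        "000", "001", "010", "011", "100", "101", "110", "111"
    };

    /* Replace each octal digit with its three-bit group.
       Names are illustrative; out must hold 3 bits per octal digit. */
    void octal_to_binary(const char *oct, char *out)
    {
        *out = '\0';
        for (; *oct != '\0'; ++oct)
            strcat(out, triad[*oct - '0']);
    }

    int main(void)
    {
        char bits[64];
        octal_to_binary("123", bits);
        printf("%s\n", bits);  /* prints 001010011 */
        return 0;
    }

From the binary string, one more four-bit grouping (Table 2-1) completes the octal-to-hexadecimal trip.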
[1] The '..' notation, taken from Pascal and other programming languages, denotes a range of values. For example, 'a..z' denotes all the lowercase alphabetic characters between a and z.