When you create a computer program, you are creating a set of instructions that tell the computer exactly and completely what to do. Now before you jump all over me and hammer me with comments like, "Well, duh! Of course programming a computer is like telling it what to do," I want you to read the first sentence again. It is not an analogy, and it is not some kind of vague and airy all-encompassing cop-out.
Everything that a computer does, at any time, is decided by at least one programmer. In the vast majority of cases, the computer's instructions—contained in programs—are the work-product of hundreds, if not thousands, of programmers. All of the programs that a computer uses are organized and classified in many different ways. The organization helps us humans keep track of what they do, why we need them, how to link one program with another, and other useful things. The computer's operating system is a huge collection of programs designed to work in conjunction with other programs, or sometimes to work alone, but in the context created by other programs.
We leverage the efforts of other programmers whenever we sit down to program a computer for any purpose. One result of the work of the many programmers who have gone before us is the creation of programming languages. Computers operate using a language called machine code, which is usually specific to each family of processor. Machine code is designed to directly control the computer's electronics, the hardware. Machine code is not very friendly to humans.
To give you an idea, we'll look at an example of machine code that tells a computer using an Intel 80386 chip to add together two numbers and save the result somewhere. What we will do is add A and B together and leave the result in C. To start, A will equal 4 and B will equal 6.
So our formula will be a simple math problem:
A = 4
B = 6
C = A + B
The computer machine code looks like this:
11000111000001010000000000000000000000000000000000000010000000000000000000000000110001110 00001010000000000000000000000000000000000000110000000000000000000000000101000010000000000 00000000000000000000000000001100000101000000000000000000000000000000001010001100000000000 000000000000000000000
Now go ahead and look carefully at that and tell yourself honestly whether you could work with a computer using machine code for longer than, oh, about 12 minutes! My personal best is somewhere around 30 seconds, but that's just me. The number system used here is the binary system.
Each one of those 1s and 0s is called a bit and has a precise meaning to the computer. This is all the computer actually understands—the ones, the zeros, their location and organization, and when and how they are to be used. To make it easier for humans to read machine code at those rare times when it is actually necessary, we normally organize the machine code with a different number system, called hexadecimal (or hex), which is a base-16 number system (rather than base-10 like the decimal system we use in everyday work). Every 4 bits becomes a hex numeral, using the symbols from 0 to 9 and the letters A to F. We pair two hex numerals to carry the information contained in 8 bits from the machine code. This compresses the information into an easier-to-read and more manageable size. Here is the same calculation written in the hex form of machine code:
C7 05 00 00 00 00 04 00 00 00 C7 05 00 00 00 00 06 00 00 00 A1 00 00 00 00 03 05 00 00 00 00 A3 00 00 00 00
Much better and easier on the eyes! Many people who work close to the computer hardware use hex quite often, but it is still pretty obscure. Fortunately, there is a human-readable form of the machine code for every microprocessor or computer, which in general is known as assembly language. In assembly language we use words and symbols that mean something to us as programmers. Tools called assemblers convert assembly language programs to the machine code we looked at earlier. Here is the Intel 80386 Assembler version of our little math problem:
mov DWORD PTR a, 4       ; (1)
mov DWORD PTR b, 6       ; (2)
mov eax, DWORD PTR a     ; (3)
add eax, DWORD PTR b     ; (4)
mov DWORD PTR c, eax     ; (5)
Now we are getting somewhere! Let's take a closer look. Lines 1 and 2 save the numbers 4 and 6 in memory somewhere, referenced by the symbols a and b. The third line gets the value for a (4) and stores it in some scratch memory, in this case a storage slot inside the processor called the eax register. Line 4 gets the value for b (6), adds it to the 4 in scratch memory, and leaves the result in the same place. The last line moves the result into a place represented by the symbol c. The semicolon tells the assembler tool to ignore what comes after it; we use the area after the semicolon to write commentary and notes about the program. In this case I've used the comment space to mark the line numbers for reference.
Now that, my friends, is a program! Small and simple, yes, but it is clear and explicit and in complete control of the computer.
As useful as assembly language code is, you can see that it is still somewhat awkward. It is important to note that some large and complex programs have been written in assembly language, but it is not done often these days. Assembly language is as close to the computer hardware as one would ever willingly want to approach. You are better served by using a high-level language. The next version of our calculation is in a powerful high-level language called C. No, really! That's the name of the language. Here is our calculation written in C:
a=4;     // (1)
b=6;     // (2)
c=a+b;   // (3)
Now, if you're thinking what I think you're thinking, then you're thinking, "Hey! That code looks an awful lot like the original formula!" And you know what? I think you are right. And that's part of the point behind this rather long-winded introduction: When we program, we want to use a programming language that best represents the elements of the problem we want to solve. Another point is that quite a few things are done for the programmer behind the scenes, and a great deal of complexity is hidden there. You should also realize that there are even more layers of complexity "below" the machine code: the electronics. We're not even going to go there. The complexity exists simply because it is the nature of the computer software beast. Be aware, though, that the same hidden complexity can sometimes lead to problems that will need to be resolved. It's not magic; it's software.
The C language you've just seen is what is known as a procedural language. It is designed to allow programmers to solve problems by describing the procedure to use, and defining the elements that are used during the procedure. Over time, programmers started looking for more powerful methods of describing problems, and one such method that surfaced was called object-oriented programming (OOP).
The simplest point behind OOP is that programmers have a means to describe the relationships between collections of code and variables that are known as objects. The C language eventually spawned a very popular variant called C++. C++ includes the ability to use the original C procedural programming techniques, as well as the new object-oriented methods. So we commonly refer to C/C++, acknowledging the existence of both procedural and object-oriented capabilities. From here on, in the book, I will refer to C/C++ as the general name of the language, unless I need to specifically refer to one or the other for some detailed reason.
