Data Types | C++ Demystified(c) A Self-Teaching Guide

The ones and zeroes that may be stored at a memory address may represent text, such as my name, Jeff Kent. These ones and zeroes instead may represent a whole number, such as my height in inches, 72, or a number with digits to the right of the decimal point, such as my GPA in high school, which I'll say was 3.75 (I honestly don't remember, it was too long ago). Alternatively, the ones and zeroes may represent either true or false, such as whether I am a U.S. citizen.

Data comes in many forms, and is generally either numeric or textual. Additionally, some numeric data uses whole numbers, such as 6, 0, or -7, while other numeric data uses floating-point numbers, such as .6, 7.3, and -6.1.

There are different data types for each of the many forms of data. The data type you choose will affect not only the form in which the data is stored, but also the amount of memory required to store the data. Let's now take a look at these different data types.

Whole Number Data Types

We deal with whole numbers all the time. Think of the answers to questions such as how many cars are in the parking lot, how many classes are you taking, or how many brothers and sisters do you have? Each answer involves a number, with no need to express any value to the right of the decimal point. After all, who has 3.71 brothers and sisters?

Often, you don't need a large whole number. What unfortunate student would be taking 754,361 classes at one time? However, sometimes the whole number needs to be large. For example, if you are studying astronomy, the moon is approximately 240,000 miles from Earth. Indeed, sometimes the whole number may need to be very, very large. Pluto's minimum distance from the Earth is about 2.7 billion miles.

Many times, the whole number won't be negative. No matter how badly you do on a test, chances are you won't score below zero points. However, some whole numbers may be below zero, such as the temperature at the North Pole.

Because of the different needs whole numbers may have to meet, there are several different whole number data types (shown in Table 2-2). The listed sizes and ranges are typical, but may vary depending on the compiler and operating system. In the sizeof operator project later in this chapter, you will determine through code the size of different data types on your compiler and operating system.

Table 2-2: Whole Number Data Types, Sizes, and Ranges
Data Type	Size (in Bytes)	Range
short	2	-32,768 to 32,767
unsigned short	2	0 to 65,535
int	4	-2,147,483,648 to 2,147,483,647
unsigned int	4	0 to 4,294,967,295
long	4	-2,147,483,648 to 2,147,483,647
unsigned long	4	0 to 4,294,967,295

Note

You may be wondering about the purpose of the long data type, since its size and range are the same as those of an int in Table 2-2. However, as noted just before that table, the actual size, and therefore the range, of a particular data type varies depending on the compiler and operating system. On some combinations of compilers and operating systems, int may be 2 bytes while long is 4 bytes; on others, int may be 4 bytes while long is 8 bytes.
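Since the sizes and ranges in Table 2-2 are only typical, it can help to ask your own compiler what it uses. The following is a minimal sketch, not the project from later in this chapter; the helper names bits_in, print_row, and print_whole_number_types are my own inventions for illustration, while sizeof, CHAR_BIT, and std::numeric_limits are standard C++.

```cpp
#include <climits>   // CHAR_BIT, the number of bits in one byte
#include <iostream>
#include <limits>    // std::numeric_limits

// Hypothetical helper: number of bits a type T occupies on this system.
template <typename T>
constexpr int bits_in() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical helper: prints one row in the layout of Table 2-2
// (name, size in bytes, lowest value "to" highest value).
template <typename T>
void print_row(const char* name) {
    std::cout << name << '\t' << sizeof(T) << '\t'
              << std::numeric_limits<T>::min() << " to "
              << std::numeric_limits<T>::max() << '\n';
}

// Prints the size and range of each whole number type from Table 2-2
// as they exist on the compiler and operating system in use.
void print_whole_number_types() {
    print_row<short>("short");
    print_row<unsigned short>("unsigned short");
    print_row<int>("int");
    print_row<unsigned int>("unsigned int");
    print_row<long>("long");
    print_row<unsigned long>("unsigned long");
}
```

Calling print_whole_number_types() from main prints one line per data type. The numbers you see may differ from Table 2-2, which is exactly the point of the note above.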

Beginning programmers sometimes see information like that shown in Table 2-2 and panic that they can't possibly memorize all of it. The good news is you don't have to. To be sure, some memorization is necessary for almost any task. However, since there really is too much information to memorize, programmers frequently resort to online help or reference books. Believe me, I do.

Far more important to a programmer than rote memorization is to understand how and why a program works as it does. Therefore, this section will go into some detail as to how data types work. Some arithmetic necessarily is involved, but it is not difficult, and if you follow the arithmetic, you will have a good understanding of data types that will help you in your programming in the following chapters.

Unsigned vs. Signed Data Type

Table 2-2 lists three data types: short, int, and long. Each of these three data types appears either with the word unsigned in front of it or with nothing at all, as in unsigned short and short.

Unsigned means the number is always zero or positive, never negative. Signed means the number may be negative or positive (or zero). If you don't specify signed or unsigned, the data type is presumed to be signed. Thus, signed short and short are the same.

Since an unsigned data type means its value is always 0 or positive, never negative, in Table 2-2 the smallest value of an unsigned short is therefore zero; an unsigned short cannot be negative. By contrast, the smallest value of a short is -32,768, since a signed data type may be negative, positive, or zero.
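These two points can be checked by the compiler itself. Here is a small sketch under the assumption that you are comfortable glancing at code before we cover variables in the next chapter; the three constant names are hypothetical, chosen only for this illustration.

```cpp
#include <limits>        // std::numeric_limits
#include <type_traits>   // std::is_same

// true when "short" and "signed short" name exactly the same data type.
constexpr bool short_is_signed_short =
    std::is_same<short, signed short>::value;

// The smallest value an unsigned short can hold: always zero.
constexpr unsigned short smallest_unsigned_short =
    std::numeric_limits<unsigned short>::min();

// The smallest value a short can hold: a negative number
// (-32,768 when a short is 2 bytes, as in Table 2-2).
constexpr short smallest_short = std::numeric_limits<short>::min();
```

The exact value of smallest_short depends on the size of a short on your system, but it is always below zero, while smallest_unsigned_short is zero everywhere.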

Size

Each of the whole number data types listed in Table 2-2 has a size. Indeed, all C++ data types have a size. However, unlike people, the size of a data type is not expressed in inches or in pounds (a sore subject for me), but in bytes.

Since a byte is the smallest unit of memory that has its own address, no data type may be smaller than one byte. Most data types are larger than one byte; all the whole number data types listed in Table 2-2 are. However, regardless of the size, the number of bytes is always a whole number. You cannot have a data type whose size is 3.5 bytes because .5 bytes, or 4 bits, is smaller than the smallest unit the computer can address.

Generally, the number of bytes for a data type is the result of a power of 2 since computers use a binary number system. Thus, typical data type sizes are 1 byte (2⁰), 2 bytes (2¹), 4 bytes (2²), or 8 bytes (2³).

The size of a data type matters in two related respects: (1) the range of different values that the data type may represent and (2) the amount of memory required to store the data type.

Range

Range means the highest and lowest value that may be represented by a given data type. For example, the range of the unsigned short data type is 0 to 65,535. These lowest and highest values are not arbitrary, but instead can be calculated.

The number of different values that a data type can represent is 2ⁿ, n being the number of bits in the data type. The size of a short data type is 2 bytes, or 16 bits. Therefore, the number of different whole numbers that the short data type can represent is 2¹⁶, which is 65,536.

However, the highest value that an unsigned short can represent is 65,535, not 65,536, because the unsigned short data type starts at 0, not 1. Therefore, the highest number that an unsigned data type may represent is 2ⁿ - 1; n again being the number of bits in the data type, and the minus 1 being used because we are starting at 0, not 1.
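The arithmetic for unsigned data types is short enough to write out as code. This is only a sketch: value_count and highest_unsigned are hypothetical names of my own, and I am assuming the number of bits is less than 64 so the result fits in a 64-bit whole number.

```cpp
#include <cstdint>   // std::uint64_t

// Number of different values that `bits` bits can represent: 2 to the
// power of `bits`. Shifting 1 left by `bits` places computes that power.
// Assumption: bits is between 0 and 63.
constexpr std::uint64_t value_count(int bits) {
    return std::uint64_t{1} << bits;
}

// Highest value of an unsigned data type with `bits` bits. The minus 1
// is there because counting starts at 0, not 1.
constexpr std::uint64_t highest_unsigned(int bits) {
    return value_count(bits) - 1;
}
```

For a 16-bit unsigned short, value_count(16) is 65,536 and highest_unsigned(16) is 65,535. For a 32-bit unsigned int, highest_unsigned(32) is 4,294,967,295.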

Signed data types involve an additional issue. Since the range of a signed data type includes negative numbers, there needs to be a way of determining if a number is positive or negative. We determine if a decimal number is positive or negative by looking to see if the number is preceded by a negative sign (-). However, a bit can be only 1 or 0; there is no option for a negative sign in a binary number.

There are several different explanations in computer science for the representation of negative numbers, such as signed magnitude, one's complement, and two's complement. However, we don't need to get into the complexities of these explanations.

What matters here is the resulting range. A signed short data type, like an unsigned short data type, can represent 2¹⁶, or 65,536, different numbers. However, with a signed data type, these different numbers must be split evenly between those starting at zero and going up, and those starting just below zero and going down. To do this, the two ranges would be 0 to 32,767 and -1 to -32,768. This can be confirmed by Table 2-2, which shows the range of a signed short as -32,768 to 32,767.

Another way of explaining the high and low numbers of the range of the signed short data type is that one of the bits is used to store the sign, positive or negative. That leaves 15 bits. The highest number in the range is 2¹⁵ - 1, or 32,767; the minus 1 being used because we are starting at 0, not 1. The lowest number in the range is -(2¹⁵), or -32,768; there's no minus 1 because we are starting at -1, not 0.
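The same calculation for signed data types can be sketched as follows. The function names are hypothetical, and I am assuming the ranges shown in Table 2-2 and a bit count between 2 and 32, which covers short, int, and long as listed there.

```cpp
#include <cstdint>   // std::int64_t

// Highest value of a signed data type with `bits` bits. One bit is set
// aside for the sign, leaving bits - 1; the minus 1 is there because the
// non-negative half of the range starts at 0.
constexpr std::int64_t highest_signed(int bits) {
    return (std::int64_t{1} << (bits - 1)) - 1;
}

// Lowest value of a signed data type with `bits` bits. There is no
// minus 1 because the negative half of the range starts at -1.
constexpr std::int64_t lowest_signed(int bits) {
    return -(std::int64_t{1} << (bits - 1));
}
```

For a 16-bit short, these give 32,767 and -32,768. For a 32-bit int, they give 2,147,483,647 and -2,147,483,648, matching the table.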

Storage

In binary, 65535 as an unsigned short is represented by sixteen ones: 1111111111111111. You cannot fit 16 bits into a single memory address. A memory address can hold only 8 bits, or a byte. How then can you store this value in memory?

The answer is you need two memory addresses to store 65535 in decimal. This provides two bytes of storage, sufficient to store this value. This is why the short data type requires 2 bytes of storage. Figure 2-3 shows how this value would be stored as an unsigned short data type.

Figure 2-3: Storage in memory of 65535 in decimal as an unsigned short data type

The int data type requires 4 bytes of storage. Figure 2-4 shows how 65535 in decimal would be stored as an unsigned int data type.

Figure 2-4: Storage in memory of 65535 in decimal as an unsigned int data type

You may legitimately wonder why 65535 in decimal as an unsigned int data type requires four bytes of storage when 65535 in decimal as an unsigned short data type requires only two bytes of storage. In other words, if you specify int instead of short as the data type, four bytes of storage will be reserved, even if you could store the number in fewer bytes. The reason is that it is not known, when memory is reserved, what value will be stored there. Additionally, the value could change. Accordingly, enough bytes of storage are reserved for the maximum possible value of that data type.
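To see the individual bytes described above, here is a rough sketch. The helper byte_at is hypothetical, and it numbers bytes by their arithmetic weight (byte 0 holds the lowest 8 bits); the order in which a particular computer lays those bytes out in memory varies and is not shown here.

```cpp
#include <cstddef>   // std::size_t

// Hypothetical helper: returns byte number `index` of `value`, where
// byte 0 holds the lowest 8 bits, byte 1 the next 8 bits, and so on.
// Assumption: index is less than sizeof(unsigned long).
constexpr unsigned byte_at(unsigned long value, std::size_t index) {
    // Shift the wanted byte down to the lowest position, then keep
    // only its 8 bits by masking with 11111111 (0xFF).
    return static_cast<unsigned>((value >> (8 * index)) & 0xFFul);
}
```

With the largest value of a 2-byte unsigned short, byte_at(65535, 0) and byte_at(65535, 1) are both 255, which is 11111111 in binary. Viewed as a 4-byte value, byte_at(65535, 2) and byte_at(65535, 3) are both 0: the extra two bytes are reserved but hold only zeroes.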

Why Use a Smaller Size Data Type?

Given that an int can store a far wider range of numbers than a short, you also may be wondering why you ever would use a short rather than an int. The answer is that the wider range of an int comes at a price; it requires twice as much RAM as a short, four instead of two bytes.

However, since computers these days come with hundreds of megabytes of RAM, each megabyte being 1,048,576 bytes, you still may wonder why you should care about two measly extra bytes. If it were just 2 extra bytes, you wouldn't care. However, if you are writing a program for an insurance company that has one million customers, you won't be talking about 2 extra bytes, but instead 2 million extra bytes. Therefore, you should not just reflexively choose the largest data type.
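The insurance company arithmetic can be written as a short sketch. Both function names are hypothetical, and the result depends on the sizes your compiler actually uses for the two data types being compared.

```cpp
#include <cstddef>   // std::size_t

// Bytes needed to store `count` values of data type T.
template <typename T>
constexpr std::size_t bytes_needed(std::size_t count) {
    return count * sizeof(T);
}

// Extra bytes spent by choosing the Wide data type over the Narrow one
// for `count` values. Assumption: Wide is at least as large as Narrow.
template <typename Wide, typename Narrow>
constexpr std::size_t extra_bytes(std::size_t count) {
    return bytes_needed<Wide>(count) - bytes_needed<Narrow>(count);
}
```

Where int is 4 bytes and short is 2 bytes, extra_bytes<int, short>(1000000) comes to 2,000,000, the 2 million extra bytes mentioned above.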

All this said, as a general rule, of the six whole number data types, you most often will use int. However, it is good to know about the other choices.

Floating-Point Data Types

I was nearsighted my entire adult life until I had lasik surgery on my eyes. In this surgery, the eye surgeon programs the information that the laser uses to reshape the eyeball by shaving off very thin slices of the cornea, measuring only thousandths of an inch, in certain areas, while leaving untouched other areas that are, again, only thousandths of an inch away.

Can you imagine my reaction if the eye surgeon had told me his philosophy was "close enough for government work," so he was using only whole numbers, ignoring any values to the right of the decimal point? You next would have seen my silhouette through the wall after I ran through it to escape. (Since I still go to my eye surgeon, who, by the way, earned his way through college as a computer programmer, and it is not in my best interest to get on his bad side, let me hasten to add that he was very precise and the surgery was successful.)

Whole numbers work fine for certain information where fractions don't apply. For example, who would say they have 2¾ children? Whole numbers also work fine for certain information where fractions do apply but are not important. For example, it would be sufficient normally to say the location is 98 miles away; precision such as 98.177 miles usually is not necessary.

However, other times fractions, expressed as numbers to the right of the decimal point, are very important. My lasik surgery is an extreme example, but there are many other more common ones. If you had a 3.9 GPA, you probably would not want the school to just forget about the .9 and say your GPA was 3. Similarly, a bank that kept track of dollars but not cents with deposits and withdrawals would, with potentially millions of transactions a day, soon have very inaccurate information as to how much money it has, and its depositors have.

Accordingly, there are floating-point data types that you can use when a value to the right of the decimal point is important. The term floating point comes from the fact that there is no fixed number of digits before and after the decimal point; that is, the decimal point can float. Floating-point numbers also are sometimes referred to as real numbers.

Table 2-3 lists each of the floating-point number data types. As with the whole number data types, the listed sizes and ranges are typical, but may vary depending on the compiler and operating system.

Table 2-3: Floating-point Number Data Types, Sizes, and Ranges
Data Type	Size (in Bytes)	Range (in E notation)
float	4	±1.2E-38 to ±3.4E38
double	8	±2.2E-308 to ±1.8E308
long double	10	±3.4E-4932 to ±1.2E4932

Note	The size of a long double on many combinations of compilers and operating systems may be 8 bytes, not 10.

Scientific and E Notations

The range column in Table 2-3 may not look like any number you have ever seen before. That is because these are not usual decimal numbers, but instead numbers expressed in E notation, the letter E standing for exponent.

The float data types can store very large numbers, such as (in decimal) 10000000000000000000000000000000000000, which could be a distance across the universe. The float data types also can store very small numbers, such as .00000000000000000000000000000000000001, which could be the diameter of a subatomic particle.

Rather than having digits running across the page, the number can be expressed more compactly. One way is with scientific notation, another is with E notation. Table 2-4 shows how certain floating-point numbers are represented in both notations.

Table 2-4: Scientific and E Notation Representations of Floating Point Values
Decimal Notation	Scientific Notation	E Notation
123.45	1.2345 x 10²	1.2345E2
0.0051	5.1 x 10⁻³	5.1E-3
1,200,000,000	1.2 x 10⁹	1.2E9

In scientific notation, the number before the multiplication operator, called the mantissa, always is expressed as having a single digit to the left of the decimal point, and as many digits as necessary to the right side of the decimal point to express the number. The number after the multiplication operator is a power of 10, which may be positive for very large numbers or negative for very small fractions. The value of the expression is the mantissa multiplied by the power of 10.

E notation is very similar to scientific notation. The only difference is the multiplication operator, followed by 10 and an exponent, is replaced by an E followed by the exponent.
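C++ accepts E notation directly in code, so the three rows of Table 2-4 can be written as they appear in the table's last column. The constant names and the nearly_equal helper below are hypothetical, added only for this sketch.

```cpp
// The three values from Table 2-4 written in E notation.
constexpr double table_row_1 = 1.2345E2;   // 123.45
constexpr double table_row_2 = 5.1E-3;     // 0.0051
constexpr double table_row_3 = 1.2E9;      // 1,200,000,000

// Hypothetical helper: true when x and y differ by no more than
// `tolerance`. Floating-point values are best compared this way
// rather than for exact equality.
constexpr bool nearly_equal(double x, double y, double tolerance) {
    return (x > y ? x - y : y - x) <= tolerance;
}
```

Here nearly_equal(table_row_1, 123.45, 1e-9) is true: the E notation and the decimal notation are just two ways of writing the same value.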

Storage of Floating-Point Numbers

Since only ones and zeroes can be stored in memory, complex codes, well beyond the scope of this book, are required to store floating-point numbers. Even with complex codes, a computer can only approximately represent many floating-point values. Indeed, in certain programs the programmer has to take care to ensure that small discrepancies in each of a number of approximations don't accumulate to the point where the final result is wrong.
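A classic illustration of these small discrepancies is adding one tenth ten times. The sketch below uses a hypothetical function, add_repeatedly, and assumes the double data type follows the common IEEE 754 format, which is true of most current systems but is not guaranteed by C++.

```cpp
// Hypothetical helper: starts at 0.0 and adds `step` to the running
// total `times` times, one addition after another.
constexpr double add_repeatedly(double step, int times) {
    return times == 0 ? 0.0 : add_repeatedly(step, times - 1) + step;
}
```

One half can be stored exactly, so add_repeatedly(0.5, 2) is exactly 1.0. One tenth cannot be stored exactly, so on a typical system add_repeatedly(0.1, 10) lands very slightly away from 1.0, and a test for exact equality with 1.0 fails even though the arithmetic looks right on paper.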

Note	Because mathematics with floating-point numbers requires a great deal of computing power, many CPUs come with a chip specialized for performing floating-point arithmetic. These chips often are referred to as math coprocessors.

Text Data Types

There are two text data types. The first is char, which stands for character. It usually is 1 byte, and can represent any single character, including a letter, a digit, a punctuation mark, or a space.

The second text data type is string. The string data type may store a number of characters, including this sentence, or paragraph, or page. The number of bytes required depends on the number of characters involved.

Note

Unlike char and the other data types we have discussed, the string type is not a data type built into C++. Instead, it is defined in the standard library file string, which therefore must be included with an include directive (#include <string>) to use the string data type. Chapter 1 covers the include directive, which in the Hello World! program was #include <iostream>.
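Here is a brief sketch of the two text data types side by side. The function names are hypothetical. Note that in code the full name of the string type is std::string, since it lives in the standard library's std namespace.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <string>    // required before std::string can be used

// Hypothetical helper: the first character of a string, as a char.
// A string may hold no characters at all, so this checks first.
char first_character(const std::string& text) {
    assert(!text.empty());
    return text[0];
}

// Hypothetical helper: how many characters the string holds.
std::size_t character_count(const std::string& text) {
    return text.size();
}
```

For the text "Jeff Kent", first_character returns the single char 'J', while character_count returns 9, counting the space.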

Storage of Character Values

There is a reason why the size of a character data type usually is 1 byte.

ASCII (American Standard Code for Information Interchange), a standard published through ANSI (American National Standards Institute), defines for the English language a set of 128 characters, which includes all alphabetical characters (upper- and lowercase), digits, and punctuation marks. Extended character sets built on ASCII use a full byte to define 256 characters, adding, among others, characters used in graphics and line drawing. Each of these 256 different characters is represented by a number between 0 and 255 that it corresponds to. Table 2-5 lists the ASCII values of commonly used characters.

Table 2-5: ASCII Values of Commonly Used Characters
Characters	Values	Comments
0 through 9	48 to 57	0 is 48, 9 is 57
A through Z	65 to 90	A is 65, Z is 90
a through z	97 to 122	a is 97, z is 122

Each of the 256 different values can be represented by different combinations of 8 bits, or one byte. This is true because 2⁸ equals 256. Thus, 00000000 is equal to 0, the smallest value, and 11111111 is equal to 255, the largest value.

For example, the letter J has the ASCII code 74. The binary equivalent of 74 is 1001010. Thus, 1001010 at a memory address could indicate the letter J.

Note

1001010 also could indicate the number 74; you wouldn't know which value was being represented unless you knew the data type associated with that memory address. In the next chapter, you will learn about variables, which enable you to associate a particular data type with a specific memory address.
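The idea that the same bits can mean either J or 74, depending on the data type, can be sketched with two conversions. The function names are hypothetical, and the values assume your system uses ASCII for its characters, as nearly all current systems do.

```cpp
// Hypothetical helper: reads the bits of a character as a whole number.
constexpr int code_of(char character) {
    return static_cast<int>(character);
}

// Hypothetical helper: reads a whole number as a character.
// Assumption: code is between 0 and 127, the ASCII range.
constexpr char char_of(int code) {
    return static_cast<char>(code);
}
```

On an ASCII system, code_of('J') is 74 and char_of(74) is 'J'. The values from Table 2-5 can be checked the same way: code_of('0') is 48 and code_of('a') is 97.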

Storage of Strings

The amount of memory required for a string depends on the number of characters in the string. However, each memory address set aside for the string would store one character of the string.
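As a rough sketch of this one-character-per-address idea, the hypothetical helpers below count the bytes taken up by a string's characters and read the value stored for one of them. The count covers only the characters themselves; a string also keeps some bookkeeping information of its own, which is not included here. The values again assume ASCII.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <string>    // std::string

// Hypothetical helper: bytes occupied by the characters of the string.
// Each char is one byte, so this equals the number of characters.
std::size_t character_bytes(const std::string& text) {
    return text.size() * sizeof(char);
}

// Hypothetical helper: the numeric value stored for the character at
// position `index`, counting from 0.
int value_at(const std::string& text, std::size_t index) {
    assert(index < text.size());
    return static_cast<unsigned char>(text[index]);
}
```

For the string "Jeff", character_bytes returns 4, and on an ASCII system value_at gives 74, 101, 102, and 102 for positions 0 through 3.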

The bool Data Type

There is one more data type, bool. This data type has only two possible values, true and false, and its size usually is one byte. The term bool is a shortening of Boolean, which is usually used in connection with Boolean algebra, named after the British mathematician George Boole.

The bool data type is mentioned separately since it does not neatly fit into either the number or text categories. It could be regarded as a numeric data type in that zero is seen as false, and one (or any other non-zero number) as true. While it may not seem intuitive why zero would be false and one would be true, remember that computers essentially store information in switches, where 1 is on, and 0 is off.
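The link between bool and numbers can be sketched with two conversions. The function names are hypothetical; the conversions themselves are standard C++.

```cpp
// Hypothetical helper: converts a whole number to a bool.
// Zero becomes false; any other number becomes true.
constexpr bool to_bool(int number) {
    return static_cast<bool>(number);
}

// Hypothetical helper: converts a bool to a whole number.
// false becomes 0; true becomes 1.
constexpr int to_int(bool flag) {
    return static_cast<int>(flag);
}
```

So to_bool(0) is false, while to_bool(1) and to_bool(-7) are both true. Going the other way, to_int(true) is always 1, even if the bool started out as some other non-zero number.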