2.1. Primitive Built-in Types

C++ defines a set of arithmetic types, which represent integers, floating-point numbers, individual characters, and boolean values. In addition, there is a special type named void. The void type has no associated values and can be used in only a limited set of circumstances, most often as the return type of a function that returns no value.

The size of the arithmetic types varies across machines. By size, we mean the number of bits used to represent the type. The standard guarantees a minimum size for each of the arithmetic types, but it does not prevent compilers from using larger sizes. Indeed, almost all compilers use a larger size for int than is strictly required. Table 2.1 lists the built-in arithmetic types and the associated minimum sizes.

Table 2.1. C++: Arithmetic Types
Type	Meaning	Minimum Size
`bool`	boolean	NA
`char`	character	8 bits
`wchar_t`	wide character	16 bits
`short`	short integer	16 bits
`int`	integer	16 bits
`long`	long integer	32 bits
`float`	single-precision floating-point	6 significant digits
`double`	double-precision floating-point	10 significant digits
`long double`	extended-precision floating-point	10 significant digits

Because the number of bits varies, the maximum (or minimum) values that these types can represent also vary by machine.
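As a rough illustration, the minimums in Table 2.1 can be checked against whatever sizes a particular compiler actually chose. The sketch below assumes a C++11 or later compiler; the helper name bits_in is ours and is not part of the language or library.

```cpp
#include <climits>   // CHAR_BIT: number of bits in a char
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Hypothetical helper: number of bits this implementation uses to store a T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Minimum sizes from Table 2.1, verified at compile time.
static_assert(bits_in<char>() >= 8, "char holds at least 8 bits");
static_assert(bits_in<short>() >= 16, "short holds at least 16 bits");
static_assert(bits_in<int>() >= 16, "int holds at least 16 bits");
static_assert(bits_in<long>() >= 32, "long holds at least 32 bits");
static_assert(std::numeric_limits<float>::digits10 >= 6,
              "float offers at least 6 significant digits");
static_assert(std::numeric_limits<double>::digits10 >= 10,
              "double offers at least 10 significant digits");

// Because sizes vary, so do the largest representable values; only the
// minimum guarantee is portable.
static_assert(std::numeric_limits<int>::max() >= 32767,
              "a 16-bit int must reach at least 32767");
```

On many current desktop compilers bits_in<int>() evaluates to 32, but a program that depends on that value is relying on the implementation rather than on the guarantee.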

2.1.1. Integral Types

The arithmetic types that represent integers, characters, and boolean values are collectively referred to as the integral types.

There are two character types: char and wchar_t. The char type is guaranteed to be big enough to hold numeric values that correspond to any character in the machine's basic character set. As a result, chars are usually a single machine byte. The wchar_t type is used for extended character sets, such as those used for Chinese and Japanese, in which some characters cannot be represented within a single char.

The types short, int, and long represent integer values of potentially different sizes. Typically, shorts are represented in half a machine word, ints in a machine word, and longs in either one or two machine words (on 32-bit machines, ints and longs are usually the same size).

Machine-Level Representation of the Built-in Types

The C++ built-in types are closely tied to their representation in the computer's memory. Computers store data as a sequence of bits, each of which holds either 0 or 1. A segment of memory might hold

      00011011011100010110010000111011 ...

At the bit level, memory has no structure and no meaning.

The most primitive way we impose structure on memory is by processing it in chunks. Most computers deal with memory as chunks of bits of particular sizes, usually powers of 2. They usually make it easy to process 8, 16, or 32 bits at a time, and chunks of 64 and 128 bits are becoming more common. Although the exact sizes can vary from one machine to another, we usually refer to a chunk of 8 bits as a "byte" and 32 bits, or 4 bytes, as a "word."

Most computers associate a number, called an address, with each byte in memory. Given a machine that has 8-bit bytes and 32-bit words, we might represent a word of memory as follows:

736424	0	0	0	1	1	0	1	1
736425	0	1	1	1	0	0	0	1
736426	0	1	1	0	0	1	0	0
736427	0	0	1	1	1	0	1	1

In this illustration, each byte's address is shown on the left, with the 8 bits of the byte following the address.

We can use an address to refer to any of several variously sized collections of bits starting at that address. It is possible to speak of the word at address 736424 or the byte at address 736426. We can say, for example, that the byte at address 736425 is not equal to the byte at address 736427.

To give meaning to the byte at address 736425, we must know the type of the value stored there. Once we know the type, we know how many bits are needed to represent a value of that type and how to interpret those bits.

If we know that the byte at location 736425 has type "unsigned 8-bit integer," then we know that the byte represents the number 113. On the other hand, if that byte is a character in the ISO-Latin-1 character set, then it represents the lower-case letter q. The bits are the same in both cases, but by ascribing different types to them, we interpret them differently.
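The same idea can be sketched in code: one bit pattern, read through two different types. The example assumes an ASCII-compatible execution character set (true of ISO-Latin-1, but not of every machine); the names kByte, as_number, and as_character are ours.

```cpp
#include <cassert>

// The bit pattern stored at address 736425 in the illustration,
// written in hexadecimal: 0x71 is 0111 0001.
const unsigned char kByte = 0x71;

// Interpret the bits as an unsigned 8-bit integer.
unsigned as_number(unsigned char bits) {
    return bits;
}

// Interpret the very same bits as a character code.
char as_character(unsigned char bits) {
    return static_cast<char>(bits);
}

// Usage: both checks read the same byte; only the type differs.
void demo_interpretation() {
    assert(as_number(kByte) == 113u);
    assert(as_character(kByte) == 'q');  // assumes an ASCII-compatible character set
}
```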

The type bool represents the truth values, true and false. We can assign any of the arithmetic types to a bool. An arithmetic type with value 0 yields a bool that holds false. Any nonzero value is treated as true.
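A minimal sketch of this conversion follows. The helper to_bool is a hypothetical name introduced only to make the conversion visible; ordinary code would simply assign the value to a bool.

```cpp
#include <cassert>

// Hypothetical helper: converts any arithmetic value to bool.
// Zero becomes false; every other value becomes true.
template <typename T>
bool to_bool(T value) {
    return static_cast<bool>(value);
}

// Usage: note that the test is "is it zero?", not "does it truncate to zero?".
void demo_bool_conversion() {
    assert(to_bool(0) == false);
    assert(to_bool(42) == true);
    assert(to_bool(-1) == true);    // negative values are nonzero, hence true
    assert(to_bool(0.0) == false);
    assert(to_bool(0.5) == true);   // nonzero floating-point value is true
}
```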

Signed and Unsigned Types

The integral types, except the boolean type, may be either signed or unsigned. As its name suggests, a signed type can represent both negative and positive numbers (including zero), whereas an unsigned type represents only values greater than or equal to zero.

The integers, int, short, and long, are all signed by default. To get an unsigned type, the type must be specified as unsigned, such as unsigned long. The unsigned int type may be abbreviated as unsigned. That is, unsigned with no other type implies unsigned int.

Unlike the other integral types, there are three distinct types for char: plain char, signed char, and unsigned char. Although there are three distinct types, there are only two ways a char can be represented. The char type is represented using either the signed char or unsigned char version. Which representation is used for char varies by compiler.
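These relationships can be confirmed at compile time. The sketch below assumes C++11 for static_assert and the type_traits header; plain_char_is_signed is a name we introduce for illustration.

```cpp
#include <limits>        // std::numeric_limits
#include <type_traits>   // std::is_same

// Plain char is a distinct type from both signed char and unsigned char.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are different types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are different types");

// By contrast, unsigned with no other type is exactly unsigned int.
static_assert(std::is_same<unsigned, unsigned int>::value,
              "unsigned is an abbreviation for unsigned int");

// Hypothetical helper: reports which representation this compiler
// chose for plain char. The answer may differ between compilers.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}
```

Code that must behave identically everywhere should not depend on the result of plain_char_is_signed(); it should name signed char or unsigned char explicitly.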

How Integral Values Are Represented

In an unsigned type, all the bits represent the value. If a type is defined for a particular machine to use 8 bits, then the unsigned version of this type could hold the values 0 through 255.

The C++ standard does not define how signed types are represented at the bit level. Instead, each compiler is free to decide how it will represent signed types. These representations can affect the range of values that a signed type can hold. We are guaranteed that an 8-bit signed type will hold at least the values from -127 through 127; many implementations allow values from -128 through 127.

Under the most common strategy for representing signed integral types, we can view one of the bits as a sign bit. Whenever the sign bit is 1, the value is negative; when it is 0, the value is either 0 or a positive number. An 8-bit integral signed type represented using a sign bit can hold values from -128 through 127.
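To make the sign-bit view concrete, the sketch below computes the value of an 8-bit pattern under the scheme usually called two's complement. The text does not name the scheme, so treating it as two's complement is our assumption, as is the helper name twos_complement_value. The function uses ordinary arithmetic, so its result does not depend on how the compiler itself represents signed types.

```cpp
#include <cassert>
#include <limits>

// The guaranteed portion of the signed char range.
static_assert(std::numeric_limits<signed char>::min() <= -127,
              "signed char reaches at least -127");
static_assert(std::numeric_limits<signed char>::max() >= 127,
              "signed char reaches at least 127");

// Hypothetical helper: value of an 8-bit pattern read as two's complement.
// The low seven bits contribute 0..127; a set sign bit contributes -128.
int twos_complement_value(unsigned char bits) {
    int value = bits & 0x7F;      // low seven bits
    if (bits & 0x80) {            // sign bit is 1: the value is negative
        value -= 128;
    }
    return value;
}

// Usage: the extremes of the range and the two patterns around zero.
void demo_sign_bit() {
    assert(twos_complement_value(0x00) == 0);     // 00000000
    assert(twos_complement_value(0x7F) == 127);   // 01111111
    assert(twos_complement_value(0x80) == -128);  // 10000000
    assert(twos_complement_value(0xFF) == -1);    // 11111111
}
```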

Assignment to Integral Types

The type of an object determines the values that the object can hold. This fact raises the question of what happens when one tries to assign a value outside the allowable range to an object of a given type. The answer depends on whether the type is signed or unsigned.

For unsigned types, the compiler must adjust the out-of-range value so that it will fit. The compiler does so by taking the remainder of the value modulo the number of distinct values the unsigned target type can hold. An object that is an 8-bit unsigned char, for example, can hold values from 0 through 255 inclusive. If we assign a value outside this range, the compiler actually assigns the remainder of the value modulo 256. For example, if we try to store 336 in an 8-bit unsigned char, the actual value assigned will be 80, because 80 is equal to 336 modulo 256.

For the unsigned types, a negative value is always out of range. An object of unsigned type may never hold a negative value. Some languages make it illegal to assign a negative value to an unsigned type, but C++ does not.

In C++ it is perfectly legal to assign a negative number to an object with unsigned type. The result is the negative value modulo the number of distinct values the type can hold. So, if we assign -1 to an 8-bit unsigned char, the resulting value will be 255, which is -1 modulo 256.
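The two worked examples above can be reproduced directly. The sketch assumes a machine with 8-bit bytes, which the static_assert enforces; store_in_unsigned_char is a hypothetical helper name.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT

// The arithmetic below is modulo 256 only when unsigned char has 8 bits.
static_assert(CHAR_BIT == 8, "this example assumes 8-bit bytes");

// Hypothetical helper: stores value in an unsigned char and returns
// what the object actually holds afterward.
unsigned store_in_unsigned_char(int value) {
    unsigned char c = static_cast<unsigned char>(value);  // reduced modulo 256
    return c;
}

// Usage: in-range values are unchanged; out-of-range values wrap.
void demo_unsigned_wraparound() {
    assert(store_in_unsigned_char(255) == 255u);  // in range
    assert(store_in_unsigned_char(256) == 0u);    // 256 modulo 256
    assert(store_in_unsigned_char(336) == 80u);   // 336 modulo 256
    assert(store_in_unsigned_char(-1) == 255u);   // -1 modulo 256
}
```

The explicit static_cast only silences compiler warnings about narrowing; a plain assignment performs the same conversion.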

When assigning an out-of-range value to a signed type, it is up to the compiler to decide what value to assign. In practice, many compilers treat signed types similarly to how they are required to treat unsigned types. That is, they reduce the value modulo the number of distinct values the type can hold. However, we are not guaranteed that the compiler will do so for the signed types.

2.1.2. Floating-Point Types

The types float, double, and long double represent single-, double-, and extended-precision floating-point values. Typically, floats are represented in one word (32 bits), doubles in two words (64 bits), and long doubles in either three or four words (96 or 128 bits). The size of the type determines the number of significant digits a floating-point value might contain.

The float type is usually not precise enough for real programs: float is guaranteed to offer only 6 significant digits. The double type guarantees at least 10 significant digits, which is sufficient for most calculations.
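A small experiment shows where float runs out of digits. The sketch assumes the common IEEE 754 formats (a 24-bit significand for float, 53 bits for double), which the standard does not require; loses_increment is a hypothetical helper name.

```cpp
#include <cassert>
#include <limits>

// The guaranteed precision, as stated in the text.
static_assert(std::numeric_limits<float>::digits10 >= 6,
              "float offers at least 6 significant digits");
static_assert(std::numeric_limits<double>::digits10 >= 10,
              "double offers at least 10 significant digits");

// Hypothetical helper: returns true if adding 1 to value has no effect
// once the sum is stored back into an object of type T.
template <typename T>
bool loses_increment(T value) {
    // volatile forces the sum to be rounded to T rather than kept
    // in a wider register.
    volatile T sum = value + T(1);
    return sum == value;
}

// Usage: 16777216 is 2 to the 24th power, an 8-digit number.
// Assumes IEEE 754 float and double.
void demo_float_precision() {
    assert(!loses_increment(1.0f));          // small values are exact
    assert(loses_increment(16777216.0f));    // float cannot represent 16777217
    assert(!loses_increment(16777216.0));    // double still has room
}
```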