Section 2.2. Literal Constants | C Primer Plus (5th Edition)

2.2. Literal Constants

A value, such as 42, in a program is known as a literal constant: literal because we can speak of it only in terms of its value; constant because its value cannot be changed. Every literal has an associated type. For example, 0 is an int and 3.14159 is a double. Literals exist only for the built-in types. There are no literals of class types. Hence, there are no literals of any of the library types.

Advice: Using the Built-in Arithmetic Types

The number of integral types in C++ can be bewildering. C++, like C, is designed to let programs get close to the hardware when necessary, and the integral types are defined to cater to the peculiarities of various kinds of hardware. Most programmers can (and should) ignore these complexities by restricting the types they actually use.

In practice, many uses of integers involve counting. For example, programs often count the number of elements in a data structure such as a vector or an array. We'll see in Chapters 3 and 4 that the library defines a set of types to use when dealing with the size of an object. When counting such elements it is always right to use the library-defined type intended for this purpose. When counting in other circumstances, it is usually right to use an unsigned value. Doing so avoids the possibility that a value that is too large to fit results in a (seemingly) negative result.

When performing integer arithmetic, it is rarely right to use shorts. In most programs, using shorts leads to mysterious bugs when a value is assigned to a short that is bigger than the largest number it can hold. What happens depends on the machine, but typically the value "wraps around" so that a number too large to fit turns into a large negative number. For the same reason, even though char is an integral type, the char type should be used to hold characters and not for computation. The fact that char is signed on some implementations and unsigned on others makes it problematic to use it as a computational type.

On most machines, integer calculations can safely use int. Technically speaking, an int can be as small as 16 bits, which is too small for most purposes. In practice, almost all general-purpose machines use 32 bits for ints, which is often the same size used for long. The difficulty in deciding whether to use int or long occurs on machines that have 32-bit ints and 64-bit longs. On such machines, the run-time cost of doing arithmetic with longs can be considerably greater than doing the same calculation using a 32-bit int. Deciding whether to use int or long requires detailed understanding of the program and the actual run-time performance cost of using long versus int.

Determining which floating-point type to use is easier: It is almost always right to use double. The loss of precision implicit in float is significant, whereas the cost of double precision calculations versus single precision is negligible. In fact, on some machines, double precision is faster than single. The precision offered by long double usually is unnecessary and often entails considerable extra run-time cost.

Rules for Integer Literals

We can write a literal integer constant using one of three notations: decimal, octal, or hexadecimal. These notations, of course, do not change the bit representation of the value, which is always binary. For example, we can write the value 20 in any of the following three ways:

      20     // decimal
      024    // octal
      0x14   // hexadecimal

Literal integer constants that begin with a leading 0 (zero) are interpreted as octal; those that begin with either 0x or 0X are interpreted as hexadecimal.

By default, the type of a literal integer constant is either int or long. The precise type depends on the value of the literal: values that fit in an int are type int and larger values are type long. By adding a suffix, we can force the type of a literal integer constant to be type long or unsigned or unsigned long. We specify that a constant is a long by immediately following the value with either L or l (the letter "ell" in either uppercase or lowercase).

Exercises Section 2.1.2

Exercise 2.1:
What is the difference between an int, a long, and a short value?

Exercise 2.2:
What is the difference between an unsigned and a signed type?

Exercise 2.3:
If a short on a given machine has 16 bits then what is the largest number that can be assigned to a short? To an unsigned short?

Exercise 2.4:
What value is assigned if we assign 100,000 to a 16-bit unsigned short? What value is assigned if we assign 100,000 to a plain 16-bit short?

Exercise 2.5:
What is the difference between a float and a double?

Exercise 2.6:
To calculate a mortgage payment, what types would you use for the rate, principal, and payment? Explain why you selected each type.

When specifying a long, use the uppercase L: the lowercase letter l is too easily mistaken for the digit 1.

In a similar manner, we can specify unsigned by following the literal with either U or u. We can obtain an unsigned long literal constant by following the value by both L and U. The suffix must appear with no intervening space:

      128u      /* unsigned        */
      1024UL    /* unsigned long   */
      1L        /* long            */
      8Lu       /* unsigned long   */

There are no literals of type short.

Rules for Floating-Point Literals

We can use either common decimal notation or scientific notation to write floating-point literal constants. Using scientific notation, the exponent is indicated either by E or e. By default, floating-point literals are type double. We indicate single precision by following the value with either F or f. Similarly, we specify extended precision by following the value with either L or l (again, use of the lowercase l is discouraged). Each pair of literals below denotes the same underlying value:

      3.14159F            .001f          12.345L            0.
      3.14159E0f          1E-3F          1.2345E1L          0e0

Boolean and Character Literals

The words true and false are literals of type bool:

      bool test = false;

Printable character literals are written by enclosing the character within single quotation marks:

      'a'         '2'         ','         ' ' // blank

Such literals are of type char. We can obtain a wide-character literal of type wchar_t by immediately preceding the character literal with an L, as in

      L'a'

Escape Sequences for Nonprintable Characters

Some characters are nonprintable. A nonprintable character is a character for which there is no visible image, such as backspace or a control character. Other characters have special meaning in the language, such as the single and double quotation marks, and the backslash. Nonprintable characters and special characters are written using an escape sequence. An escape sequence begins with a backslash. The language defines the following escape sequences:

newline	`\n`	horizontal tab	`\t`
vertical tab	`\v`	backspace	`\b`
carriage return	`\r`	formfeed	`\f`
alert (bell)	`\a`	backslash	`\\`
question mark	`\?`	single quote	`\'`
double quote	`\"`

We can write any character as a generalized escape sequence of the form

      \ooo

where ooo represents a sequence of as many as three octal digits. The value of the octal digits represents the numerical value of the character. The following examples are representations of literal constants using the ASCII character set:

      \7 (bell)      \12 (newline)     \40 (blank)
      \0 (null)      \062 ('2')        \115 ('M')

The character represented by '\0' is often called a "null character," and has special significance, as we shall soon see.

We can also write a character using a hexadecimal escape sequence

      \xddd

consisting of a backslash, an x, and one or more hexadecimal digits.

Character String Literals

All of the literals we've seen so far have primitive built-in types. There is one additional literal, the string literal, that is more complicated. String literals are arrays of constant characters, a type that we'll discuss in more detail in Section 4.3 (p. 130).

String literal constants are written as zero or more characters enclosed in double quotation marks. Nonprintable characters are represented by their underlying escape sequence.

      "Hello World!"                 // simple string literal
      ""                             // empty string literal
      "\nCC\toptions\tfile.[cC]\n"   // string literal using newlines and tabs

For compatibility with C, string literals in C++ have one character in addition to those typed in by the programmer. Every string literal ends with a null character added by the compiler. A character literal

      'A' // single quote: character literal

represents the single character A, whereas

      "A" // double quote: character string literal

represents an array of two characters: the letter A and the null character.

Just as there is a wide character literal, such as

         L'a'

there is a wide string literal, again preceded by L, such as

       L"a wide string literal"

The type of a wide string literal is an array of constant wide characters. It is also terminated by a wide null character.

Concatenated String Literals

Two string literals (or two wide string literals) that appear adjacent to one another and separated only by spaces, tabs, or newlines are concatenated into a single new string literal. This usage makes it easy to write long literals across separate lines:

      // concatenated long string literal
      std::cout << "a multi-line "
                   "string literal "
                   "using concatenation"
                << std::endl;

When executed this statement would print:

      a multi-line string literal using concatenation

What happens if you attempt to concatenate a string literal and a wide string literal? For example:

      // Concatenating plain and wide character strings is undefined
      std::cout << "multi-line " L"literal " << std::endl;

The result is undefined; that is, there is no standard behavior defined for concatenating the two different types. The program might appear to work, but it also might crash or produce garbage values. Moreover, the program might behave differently under one compiler than under another.

Advice: Don't Rely on Undefined Behavior

Programs that use undefined behavior are in error. If they work, it is only by coincidence. Undefined behavior results from a program error that the compiler cannot detect or from an error that would be too much trouble to detect.

Unfortunately, programs that contain undefined behavior can appear to execute correctly in some circumstances and/or on one compiler. There is no guarantee that the same program, compiled under a different compiler or even a subsequent release of the current compiler, will continue to run correctly. Nor is there any guarantee that what works with one set of inputs will work with another.

Programs should not (knowingly) rely on undefined behavior. Similarly, programs usually should not rely on machine-dependent behavior, such as assuming that the size of an int is a fixed and known value. Such programs are said to be nonportable. When the program is moved to another machine, any code that relies on machine-dependent behavior may have to be found and corrected. Tracking down these sorts of problems in previously working programs is, mildly put, a profoundly unpleasant task.

Multi-Line Literals

There is a more primitive (and less useful) way to handle long strings that depends on an infrequently used program formatting feature: Putting a backslash as the last character on a line causes that line and the next to be treated as a single line.

As noted on page 14, C++ programs are largely free-format. In particular, there are only a few places that we may not insert whitespace. One of these is in the middle of a word. In particular, we may not break a line in the middle of a word. We can circumvent this rule by using a backslash:

       // ok: A \ before a newline ignores the line break
       std::cou\
       t << "Hi" << st\
       d::endl;

is equivalent to

       std::cout << "Hi" << std::endl;

We could use this feature to write a long string literal:

       #include <iostream>

       int main()
       {
            // multiline string literal
            std::cout << "a multi-line \
       string literal \
       using a backslash"
                      << std::endl;
            return 0;
       }

Note that the backslash must be the last thing on the line; no comments or trailing blanks are allowed. Also, any leading spaces or tabs on the subsequent lines are part of the literal. For this reason, the continuation lines of the long literal do not have the normal indentation.

Exercises Section 2.2

Exercise 2.7:
Explain the difference between the following sets of literal constants:
   (a) 'a', L'a', "a", L"a"
   (b) 10, 10u, 10L, 10uL, 012, 0xC
   (c) 3.14, 3.14f, 3.14L

Exercise 2.8:
Determine the type of each of these literal constants:
       (a) -10 (b) -10u (c) -10. (d) -10e-2 
Exercise 2.9:
Which, if any, of the following are illegal?
       (a) "Who goes with F\145rgus?\012"
       (b) 3.14e1L
       (c) "two" L"some"
       (d) 1024f
       (e) 3.14UL
       (f) "multiple line
            comment"

Exercise 2.10:
Using escape sequences, write a program to print 2M followed by a newline. Modify the program to print 2, then a tab, then an M, followed by a newline.