C Data Types

Now let's look at the specifics of the basic data types used by C. For each type, we describe how to declare a variable, how to represent a constant, and what a typical use would be. Some pre-ANSI C compilers do not support all these types, so check your manual to see which ones you have available.

The int Type

C offers a variety of integer types. They vary in the range of values offered and in whether negative numbers can be used. The int type is the basic choice, but should you need other choices to meet the requirements of a particular task or machine, they are available.

The int type is a signed integer. That means it must be an integer and it can be positive, negative, or zero. The range of possible values depends on the computer system. Typically, an int uses one machine word for storage. Therefore, older IBM PC compatibles, which have a 16-bit word, use 16 bits to store an int. This allows a range in values from -32768 to +32767. Other machines might have different ranges. See Table 3.2 near the end of this chapter for examples. ANSI C specifies that the minimum range for type int should be from -32767 to +32767. Typically, systems represent signed integers by using the value of a particular bit to indicate the sign. Chapter 15 discusses common methods.

Declaring an int Variable

As you saw in Chapter 2, the keyword int is used to declare the basic integer variable. First comes int , then the chosen name of the variable, and then a semicolon. To declare more than one variable, you can declare each variable separately, or you can follow the int with a list of names in which each name is separated from the next by a comma. The following are valid declarations:

 int erns;
 int hogs, cows, goats;

You could have used a separate declaration for each variable, or you could have declared all four variables in the same statement. The effect is the same: Associate names and arrange storage space for four int-sized variables.

These declarations create variables but don't supply values for them. How do variables get values? You've seen two ways that they can pick up values in the program. First, there is assignment:

 cows = 112;

Second, a variable can pick up a value from a function, from scanf() , for example. Now let's look at a third way.

Initializing a Variable

To initialize a variable means to assign it an initial, or starting, value. In C, this can be done as part of the declaration. Just follow the variable name with the assignment operator ( = ) and the value you want the variable to have. Here are some examples:

 int hogs = 21;
 int cows = 32, goats = 14;
 int dogs, cats = 94;    /* valid, but poor, form */

In the last line, only cats is initialized. A quick reading might lead you to think that dogs is also initialized to 94, so it is best to avoid putting initialized and noninitialized variables in the same declaration statement.

In short, these declarations create and label the storage for the variables and assign starting values to each (see Figure 3.4).

Figure 3.4. Defining and initializing a variable.

Type int Constants

The various integers ( 21 , 32 , 14 , and 94 ) in the last example are integer constants. When you write a number without a decimal point and without an exponent, C recognizes it as an integer. Therefore, 22 and -44 are integer constants, but 22.0 and 2.2E1 are not. C treats most integer constants as type int. Very large integers may be treated differently; see the later discussion of the long int type in section "Type long and long long Constants."

Printing int Values

You can use the printf() function to print int types. As you saw in Chapter 2, the %d notation is used to indicate just where in a line the integer is to be printed. The %d is an example of a format specifier, because it indicates the form that printf() uses to display a value. Each %d in the format string must be matched by a corresponding int value in the list of items to be printed. That value can be an int variable, an int constant, or any other expression having an int value. It's your job to make sure the number of format specifiers matches the number of values; the compiler is not required to catch mistakes of that kind. Listing 3.2 presents a simple program that initializes a variable and prints the value of the variable, the value of a constant, and the value of a simple expression. It also shows what can happen if you are not careful.

Listing 3.2 The `print1.c` program.

 /* print1.c--displays some properties of printf() */
 #include <stdio.h>
 int main(void)
 {
     int ten = 10;
     printf("Doing it right: ");
     printf("%d minus %d is %d\n", ten, 2, ten - 2 );
     printf("Doing it wrong: ");
     printf("%d minus %d is %d\n", ten );  // forgot 2 arguments
     return 0;
 }

Compiling and running the program produces this output:

 Doing it right: 10 minus 2 is 8
 Doing it wrong: 10 minus 10 is 6618680

Therefore, the first %d represents the int variable ten , the second %d represents the int constant 2 , and the third %d represents the value of the int expression ten - 2 . The second time, however, the program used ten to provide a value for the first %d and used whatever values happened to be lying around in memory for the next two! (The numbers you get could very well be different from those shown here.)

You might be annoyed that the compiler doesn't catch such an obvious error. Blame the unusual design of printf() . Most functions take a specific number of arguments, and the compiler can check to see whether you've used the correct number. However, printf() can have one, two, three, or more arguments, and that keeps the compiler from using its usual methods for error checking. Remember: Check to see that the number of format specifiers you give to printf() matches the number of values to be displayed.

Octal and Hexadecimal

Normally, C assumes that integer constants are decimal, or base 10, numbers. However, octal (base 8) and hexadecimal (base 16) numbers are popular with many programmers. Because 8 and 16 are powers of 2, and 10 is not, these number systems occasionally offer a more convenient way for expressing computer-related values. For example, the number 65536, which often pops up in 16-bit machines, is just 10000 in hexadecimal. But how can the computer tell whether 10000 is meant to be a decimal, hexadecimal, or octal value? In C, special prefixes indicate which number base you are using. A prefix of 0x or 0X (zero-ex) means that you are specifying a hexadecimal value, so 16 is written as 0x10, or 0X10, in hexadecimal. Similarly, a 0 (zero) prefix means that you are writing in octal. For example, in C the decimal value 16 is written as 020 in octal. Chapter 15 discusses these alternative number bases more fully.

Be aware that this option of using different number systems is provided as a service for your convenience. It doesn't affect how the number is stored. That is, you can write 16 or 020 or 0x10, and the number is stored exactly the same way in each case: in the binary code used internally by computers.

Incidentally, an octal or hexadecimal constant is treated as type int when its value fits, but one that is too large for int may be given an unsigned type instead; this chapter takes up unsigned types shortly.

Displaying Octal and Hexadecimal

Just as C enables you to write a number in any one of three number systems, it also enables you to display a number in any of these three systems. To display an integer in octal notation instead of decimal, use %o instead of %d . To display an integer in hexadecimal, use %x . Listing 3.3 shows a short example.

Listing 3.3 The `bases.c` program.

 /* bases.c--prints 100 in decimal, octal, and hex */
 #include <stdio.h>
 int main(void)
 {
     int x = 100;
     printf("dec = %d; octal = %o; hex = %x\n", x, x, x);
     return 0;
 }

Compiling and running this program produces this output:

 dec = 100; octal = 144; hex = 64

You see the same value displayed in three different number systems. The printf() function makes the conversions. Note that the 0 and the 0x prefixes are not displayed in the output. ANSI C provides that the specifiers %#o, %#x, and %#X generate the 0, 0x, and 0X prefixes, respectively.

Other Integer Types

When you are just learning the language, the int type will probably meet most of your integer needs. To be complete, however, we'll cover the other forms now. If you like, you can skim this section and jump to the discussion of the char type, returning here when you have a need.

C offers three adjective keywords to modify the basic integer type: short , long , and unsigned .

The type short int or, more briefly, short may use less storage than int, thus saving space when only small numbers are needed. Like int, short is a signed type.
The type long int , or long , may use more storage than int, thus enabling you to express larger integer values. Like int, long is a signed type.
The type unsigned int , or unsigned , is used for variables that have only nonnegative values. This type shifts the range of numbers that can be stored. For example, a 2-byte unsigned int allows a range from 0 to 65535 in value instead of from -32768 to +32767 . The bit used to indicate the sign of signed numbers now becomes another binary digit, allowing the larger number.
ANSI C and many pre-ANSI C compilers also recognize as valid types unsigned long int , or unsigned long , and unsigned short int , or unsigned short .
Many C compilers have added the types long long int ( long long , for short) and unsigned long long int ( unsigned long long , for short) to provide for even larger integer values. The C9X committee proposes adding these two types to the standard.

Declaring Other Integer Types

Other integer types are declared in the same manner as the int type. The following list shows several examples. Not all pre-ANSI C compilers recognize the last three, and the final example is (at the time of this writing) part of the proposed revision of the ANSI C standard.

 long int estine;
 long johns;
 short int erns;
 short ribs;
 unsigned int s_count;
 unsigned players;
 unsigned long headcount;
 unsigned short yesvotes;
 long long ago;

Why Multiple Integer Types?

Why do we say that long and short types "may" use more or less storage than int ? Because C guarantees only that short is no longer than int and that long is no shorter than int . The idea is to fit the types to the machine. On an IBM PC running Windows 3.1, for example, an int and a short are both 16 bits, and a long is 32 bits. On a Windows 95 machine or a Macintosh PowerPC, however, a short is 16 bits, and both int and long are 32 bits. The natural word size on a Pentium chip or a PowerPC chip is 32 bits. Because this allows integers in excess of 2 billion (see Table 3.2 later in this chapter), the implementors of C on these processor/operating system combinations did not see a necessity for anything larger; therefore, long is the same as int . For many uses, integers of that size are not needed, so a space-saving short was created. The original IBM PC, on the other hand, has only a 16-bit word, which means that a larger long was needed.

The most common practice today is to set up long as 32 bits, short as 16 bits, and int to either 16 bits or 32 bits, depending on the machine's natural word size. In principle, however, these three types could represent three distinct sizes.

ANSI C provides guidelines specifying the minimum allowable size for each basic data type. The minimum range for both short and int is -32,767 to +32,767, corresponding to a 16-bit unit, and the minimum range for long is -2,147,483,647 to +2,147,483,647, corresponding to a 32-bit unit. (Note: For legibility, we've used commas, but C code doesn't allow that option.) For unsigned short and unsigned int, the minimum range is 0 to 65,535, and for unsigned long, the minimum range is 0 to 4,294,967,295. The C9X committee's proposed long long type is intended to support 64-bit needs. Its minimum range is a substantial -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807, and the minimum range for unsigned long long is 0 to 18,446,744,073,709,551,615. (For those of you writing checks, that's eighteen quintillion, four hundred and forty-six quadrillion, seven hundred forty-four trillion, seventy-three billion, seven hundred nine million, five hundred fifty-one thousand, six hundred fifteen in U.S. notation, but who's counting?)

When do you use the various int types? First, consider unsigned types. It is natural to use them for counting because you don't need negative numbers, and the unsigned types enable you to reach higher positive numbers than the signed types.

Use the long type if you need to use numbers that long can handle and that int cannot. However, on systems for which long is bigger than int , using long may slow down calculations, so don't use long if it is not essential. One further point: If you are writing code on a machine for which int and long are the same size, and if you do need 32-bit integers, you should use long instead of int so that the program will function correctly if transferred to a 16-bit machine.

Similarly, use long long if you need 64-bit integer values. Some computers already use 64-bit processors, and 64-bit processing should become common in the early 2000s.

Use short to save storage space or if, say, you need a 16-bit value on a system where int is 32-bit. Usually, saving storage space is important only if your program uses arrays of integers that are large in relation to a system's available memory. Another reason to use short is that it may correspond in size to hardware registers used by particular components in a computer.

Integer Overflow

What happens if an integer tries to get too big for its type? Let's set an integer to its largest possible value, add to it, and see what happens. Try both signed and unsigned types. (The program uses the %u specifier to display unsigned int values.)

 /* toobig.c--exceeds maximum int size on our system */
 #include <stdio.h>
 int main(void)
 {
     int i = 2147483647;
     unsigned int j = 4294967295;
     printf("%d %d %d\n", i, i+1, i+2);
     printf("%u %u %u\n", j, j+1, j+2);
     return 0;
 }

Here's the result for our system:

 2147483647 -2147483648 -2147483647
 4294967295 0 1

The unsigned integer j is acting like a car's odometer. When it reaches its maximum value, it starts over at the beginning. The integer i acts similarly. The main difference is that an odometer and the unsigned int variable j begin at 0, but the int variable i begins at -2147483648. Notice that you are not informed that i has exceeded (overflowed) its maximum value. You would have to include your own programming to keep tabs on that.

The behavior described here is mandated by the rules of C for unsigned types. The standard doesn't define how signed types should behave, but the behavior shown here is typical.

Type long and long long Constants

Normally, when you use a number like 2345 in your program code, it is stored as an int type. What if you use a number like 1000000 on a system in which int will not hold such a large number? Then the compiler treats it as a long int , assuming that type is large enough. If the number is larger than the long maximum, C treats it as unsigned long . If that is still insufficient, C treats the value as long long or unsigned long long , if those types are available.

Octal and hexadecimal constants are treated as type int if the value fits. If the value is too large for int, the compiler tries unsigned int. If that doesn't work, it tries long, then unsigned long, and then long long and unsigned long long, if available.

Sometimes you might want the compiler to store a small number as a long integer. Programming that involves explicit use of memory addresses on an IBM PC, for instance, can create such a need. Also, some standard C functions require type long values. To cause a small constant to be treated as type long , you can append an l (lowercase L ) or L as a suffix. I recommend the second form because it looks less like the digit 1. Therefore, a system with a 16-bit int and a 32-bit long treats the integer 7 as 16 bits and the integer 7L as 32 bits. The l and L suffixes can also be used with octal and hex integers, as in 020L and 0x10L .

Similarly, on those systems supporting the long long type, you can use an ll or LL suffix to indicate a long long value, as in 3LL . Add a u or U to the suffix for unsigned long long , as in 5ull or 10LLU or 6LLU or 9Ull .

Printing `long` , `short` , and `unsigned` Types

To print an unsigned int number, use the %u notation. To print a long value, use the %ld format specifier. If int and long are the same size on your system, just %d will suffice, but your program will not work properly when transferred to a system on which the two types are different, so use the %ld specifier. You can use the l prefix for x and o , too. Therefore, you would use %lx to print a long integer in hexadecimal format and %lo to print in octal format. Note that although C allows both uppercase and lowercase letters for constant suffixes, these format specifiers use just lowercase.

ANSI C has several additional printf() forms. First, you can use an h prefix for short types. Therefore, %hd displays a short integer in decimal form, and %ho displays a short integer in octal form. Both the h and l prefixes can be used with u for unsigned types. For instance, you would use the %lu notation for printing unsigned long types. Listing 3.4 provides an example. Systems supporting the long long types use %lld and %llu for the signed and unsigned versions.

Listing 3.4 The `print2.c` program.

 /* print2.c--more printf() properties */
 #include <stdio.h>
 int main(void)
 {
     unsigned int un = 3000000000; /* system with 32-bit int */
     short sn = 200;               /* and 16-bit short       */
     long ln = 65537;
     printf("un = %u and not %d\n", un, un);
     printf("sn = %hd and %d\n", sn, sn);
     printf("ln = %ld and not %hd\n", ln, ln);
     return 0;
 }

Here is the output on one system:

 un = 3000000000 and not -1294967296
 sn = 200 and 200
 ln = 65537 and not 1

This example points out that using the wrong specification can produce unexpected results. First, note that using the %d specifier for the unsigned variable un produces a negative number! The reason for this is that the unsigned value 3000000000 and the signed value -1294967296 have exactly the same internal representation in memory on our system. (Chapter 15 explains this property in more detail.) So if you tell printf() that the number is unsigned, it prints one value, and if you tell it that the same number is signed, it prints the other value. This behavior shows up with values larger than the maximum signed value. Smaller positive values, such as 96, are stored and displayed the same for both signed and unsigned types.

Next, note that the short variable sn is displayed the same whether you tell printf() that sn is a short (the %hd specifier) or an int (the %d specifier). That's because C automatically expands a type short value to a type int value when it's passed as an argument to a function. This may raise two questions in your mind: Why does this conversion take place, and what's the use of the h modifier? The answer to the first question is that the int type is intended to be the integer size that the computer handles most efficiently. So, on a computer for which short and int are different sizes, it may be faster to pass the value as an int. The answer to the second question is that you can use the h modifier to show how a longer integer would look if truncated to the size of short. The third line of output illustrates this point. When the value 65537 is written in binary format as a 32-bit number, it looks like this: 00000000000000010000000000000001. Using the %hd specifier persuaded printf() to look at just the last 16 bits; so it displayed the value as 1.

Earlier you saw that it is your responsibility to make sure the number of specifiers matches the number of values to be displayed. Here you see that it is also your responsibility to use the correct specifier for the type of value to be displayed.

Using Characters: Type char

The char type is used for storing characters such as letters and punctuation marks, but technically it is an integer type. Why? Because the char type actually stores integers, not characters. To handle characters, the computer uses a numerical code in which certain integers represent certain characters. The most commonly used code is the ASCII code given in Appendix E, "ASCII Table." It is the code this book assumes. In it, for example, the integer value 65 represents an uppercase A. So to store the letter A, you actually need to store the integer 65 . (Many IBM mainframes use a different code, called EBCDIC, but the principle is the same.)

The standard ASCII code runs numerically from 0 to 127. This range is small enough that 7 bits can hold it. The char type is typically defined as an 8-bit unit of memory, so it is more than large enough to encompass the standard ASCII code. Many systems, such as the IBM PC and the Apple Macintosh, offer extended ASCII codes (different for the two systems) that still stay within an 8-bit limit. More generally, C guarantees that the char type is large enough to store the basic character set for the system on which C is implemented.

Many character sets have many more than 127 or even 255 values. For example, there is the Japanese kanji character set. The Unicode initiative has created a system to represent a variety of character sets worldwide and currently has over 40,000 characters. A platform that used one of these sets as its basic character set could use a 16-bit char representation. The C language defines a byte to be the number of bits used by type char, so a byte would be 16 bits rather than 8 bits on such systems, at least as far as C documentation goes.

Declaring Type char Variables

As you might expect, char variables are declared in the same manner as other variables. Here are some examples:

 char response;
 char itable, latan;

This code would create three char variables: response, itable, and latan.

Character Constants and Initialization

Suppose you want to initialize a character variable to the letter A. Computer languages are supposed to make things easy, so you shouldn't have to memorize the ASCII code, and you don't. You can assign the character A to grade with the following initialization:

 char grade = 'A';

A single letter contained between single quotes is a C character constant. When the compiler sees 'A' , it converts the 'A' to the proper code value. The single quotes are essential.

 char broiled;        /* declare a char variable        */
 broiled = 'T';       /* OK                             */
 broiled = T;         /* NO! Thinks T is a variable     */
 broiled = "T";       /* NO! Thinks "T" is a string     */

If you leave off the quotes, the compiler thinks that T is the name of a variable. If you use double quotes, it thinks you are using a string. We'll discuss strings in Chapter 4.

Because characters are really stored as numeric values, you can also use the numerical code to assign values:

 char grade = 65;  /* ok for ASCII, but poor style */

In this example, 65 is type int , but, because the value is smaller than the maximum char size, it can be assigned to grade without any problems. Because 65 is the ASCII code for the letter A , this example assigns the value A to grade . Note, however, that this example assumes that the system is using ASCII code. Using 'A' instead of 65 produces code that works on any system. Therefore, it's better to use character constants than numeric code values.

Somewhat oddly, C treats character constants as type int rather than type char . For example, on an ASCII system with a 32-bit int and an 8-bit char , the code

 char grade = 'B';

represents 'B' as the numerical value 66 stored in a 32-bit unit, but grade winds up with 66 stored in an 8-bit unit. This characteristic of character constants makes it possible to define a character constant like 'FATE' , with 4 separate 8-bit ASCII codes stored in a 32-bit unit. However, attempting to assign such a character constant to a char variable results in only the last 8 bits being used, so the variable gets the value 'E' .

Nonprinting Characters

The single-quote technique is fine for letters, digits, and punctuation marks, but if you look through Appendix E, you see that some of the ASCII characters are nonprinting. For example, some represent actions such as backspacing or going to the next line or making the terminal bell ring (or speaker beep). How can these be represented?

The first way we have already mentioned: Just use the ASCII code. For example, the ASCII value for the beep character is 7, so you can do this:

 char beep = 7;

The second way to represent certain awkward characters in C is to use special symbol sequences. These are called escape sequences. Table 3.1 shows the escape sequences and their meanings.

Table 3.1. Escape sequences.

Sequence	Meaning
`\a`	Alert (ANSI C)
`\b`	Backspace
`\f`	Form feed
`\n`	Newline
`\r`	Carriage return
`\t`	Horizontal tab
`\v`	Vertical tab (ANSI C)
`\\`	Backslash `(\)`
`\'`	Single quote `(')`
`\"`	Double quote `(")` (ANSI C)
`\0oo`	Octal value ( `o` represents an octal digit)
`\xhh`	Hexadecimal value ( `h` represents a hexadecimal digit)

Escape sequences must be enclosed in single quotes when assigned to a character variable. For example, you could make the statement

 nerf = '\n';

and then print the variable nerf to advance the printer or screen one line.

Now take a closer look at what each escape sequence does. The alert character ( \a ), added by ANSI C, produces an audible or visible alert. The nature of the alert depends on the hardware, with the beep being the most common. (With many Windows implementations, the alert character has no effect.) The ANSI standard states that the alert character shall not change the active position. By active position, the standard means the location on the display device (screen, teletype, printer, and so forth) at which the next character would otherwise appear. In short, the active position is a generalization of the screen cursor you are probably accustomed to. Using the alert character in a program displayed on a screen should produce a beep without moving the screen cursor.

Next, the \b , \f , \n , \r , \t , and \v escape sequences are common output device control characters. They are best described in terms of how they affect the active position. A backspace ( \b ) moves the active position back one space on the current line. A form feed ( \f ) advances the active position to the start of the next page. A newline ( \n ) sets the active position to the beginning of the next line. A carriage return ( \r ) moves the active position to the beginning of the current line. A horizontal tab ( \t ) moves the active position to the next horizontal tab stop; typically, these are found at character positions 1, 9, 17, 25, and so on. A vertical tab ( \v ) moves the active position to the next vertical tab position.

These escape characters do not necessarily work with all display devices. For instance, the form feed and vertical tab characters produce odd symbols on a PC screen instead of any cursor movement, but they work as described if sent to a printer instead of to the screen.

The next three escape sequences ( \\ , \' , and \" ) enable you to use \, ' , and " as character constants. (Because these symbols are used to define character constants as part of a printf() command, the situation could get confusing if you use them literally.) Suppose you want to print the following line:

 Gramps sez, "a \ is a backslash."

Then use this code:

 printf("Gramps sez, \"a \\ is a backslash.\"\n");

The final two forms ( \0oo and \xhh ) are special representations of the ASCII code. To represent a character by its octal ASCII code, precede it with a backslash ( \ ), and enclose the whole thing in single quotes. For instance, if your compiler doesn't recognize the alert character ( \a ), you could use the ASCII code instead:

 beep = '\007';

You can omit the leading zeros, so '\07' or even '\7' will do. This notation causes numbers to be interpreted as octal even if there is no initial 0.

ANSI C and many new implementations accept a hexadecimal form for character constants. In this case, the backslash is followed by an x and one or more hexadecimal digits. For example, the Control-P character has an ASCII hex code of 10 (16, in decimal), so it can be expressed as '\x10' or '\x010'. Figure 3.5 shows some representative integer types.

Figure 3.5. Writing constants with the `int` family.

When you use ASCII code, note the difference between numbers and number characters. For example, the character 4 is represented by ASCII code value 52. The notation '4' represents the symbol 4, not the numerical value 4.

At this point, you may have three questions. One, why aren't the escape sequences enclosed in single quotes in the last example ( printf("Gramps sez, \"a \\ is a backslash.\"\n"); )? Two, when should you use the ASCII code, and when should you use the escape sequences? Three, if you need to use numeric code, why use, say, '\032' instead of 032? Here are the answers:

When a character, be it an escape sequence or not, is part of a string of characters enclosed in double quotes, don't enclose it in single quotes. Notice that none of the other characters in this example ( G , r , a , m , p , s , and so on) are marked off by single quotes. A string of characters enclosed in double quotes is called a character string . You explore strings in Chapter 4. Similarly, printf("Hello!\007\n"); will print Hello! and beep, but printf("Hello!7\n"); will print Hello!7 . Digits not part of an escape sequence are treated as ordinary characters to be printed.
If you have a choice between using one of the special escape sequences, say '\f' , or an equivalent ASCII code, say '\014' , use the '\f' . First, the representation is more mnemonic. Second, it is more portable. If you have a system that doesn't use ASCII code, the '\f' will still work.
First, using '\032' instead of 032 makes it clear to someone reading the code that you intend to represent a character code. Second, an escape sequence like \032 can be embedded in part of a C string, the way \007 was in point #1.

Printing Characters

The printf() function uses %c to indicate that a character should be printed. Recall that a character variable is stored as a 1-byte integer value. Therefore, if you print the value of a char variable with the usual %d specifier, you get an integer. The %c format specifier tells printf() to convert the integer to the corresponding character. Listing 3.5 shows a char variable both ways.

Listing 3.5 The `charcode.c` program.

 /* charcode.c--displays code number for a character */
 #include <stdio.h>
 int main(void)
 {
     char ch;
     printf("Please enter a character.\n");
     scanf("%c", &ch);   /* user inputs character */
     printf("The code for %c is %d.\n", ch, ch);
     return 0;
 }

Here is a sample run:

 Please enter a character.
 C
 The code for C is 67.

When you use the program, remember to press the Enter or Return key after typing the character. The scanf() function then fetches the character you typed, and the ampersand ( & ) causes the character to be assigned to the variable ch . The printf() function then prints the value of ch twice, first as a character (prompted by the %c code) and then as a decimal integer (prompted by the %d code). Note that the printf() specifiers determine how data is displayed, not how it is stored (see Figure 3.6).

Figure 3.6. Data display versus data storage.

Signed or Unsigned?

Some C implementations make char a signed type. This means a char can hold values typically in the range -128 through +127. Other implementations make char an unsigned type, which provides a range of 0 through 255. Your compiler manual should tell you which type your char is, or you can check the limits.h header file, discussed in the next chapter.

ANSI C and many newer implementations enable you to use the keywords signed and unsigned with char . Then, regardless of what your default char is, signed char would be signed, and unsigned char would be unsigned.

Types `float` and `double`

The various integer types serve well for most software development projects. However, mathematically oriented programs often make use of floating-point numbers. In C, such numbers are called type float . They correspond to the real types of FORTRAN and Pascal. The floating-point approach, as already mentioned, enables you to represent a much greater range of numbers, including decimal fractions. Floating-point number representation is similar to scientific notation , a system used by scientists to express very large and very small numbers. Let's take a look.

In scientific notation, numbers are represented as decimal numbers times powers of 10. Here are some examples.

Number	Scientific Notation	Exponential Notation
1,000,000,000	= 1.0 × 10⁹	= 1.0e9
123,000	= 1.23 × 10⁵	= 1.23e5
322.56	= 3.2256 × 10²	= 3.2256e2
0.000056	= 5.6 × 10⁻⁵	= 5.6e-5

The first column shows the usual notation, the second column scientific notation, and the third column exponential notation, which is the way scientific notation is usually written for and by computers, with the e followed by the power of 10. Figure 3.7 shows more floating-point representations.

Figure 3.7. Some floating-point numbers.

ANSI C provides that a float has to be able to represent at least six significant figures and allow a range of at least 10⁻³⁷ to 10⁺³⁷. The first requirement means, for example, that a float has to represent accurately at least the first six digits in a number like 33.333333. The second requirement is handy if you like to use numbers such as the mass of the sun (2.0e30 kilograms) or the charge of a proton (1.6e-19 coulombs) or the national debt. Often, systems use 32 bits to store a floating-point number. Eight bits are used to give the exponent its value and sign, and 24 bits are used to represent the nonexponent part, called the mantissa or significand, and its sign.

C also has a double (for double precision) floating-point type. The double type has the same minimum range requirements as float , but it extends the minimum number of significant figures that can be represented to 10. Typical double representations use 64 bits instead of 32. Some systems use all 32 additional bits for the nonexponent part. This increases the number of significant figures and reduces round-off errors. Other systems use some of the bits to accommodate a larger exponent; this increases the range of numbers that can be accommodated. Either approach leads to at least 13 significant figures, more than meeting the minimum standard.

ANSI C allows for a third floating-point type: long double . The intent is to provide for even more precision than double . However, ANSI C guarantees only that long double is at least as precise as double .

Declaring Floating-Point Variables

Floating-point variables are declared and initialized in the same manner as their integer cousins. Here are some examples:

 float noah, jonah;
 double trouble;
 float planck = 6.63e-34;
 long double gnp;

Floating-Point Constants

There are many choices open to you when you write a floating-point constant. The basic form of a floating-point constant is a signed series of digits including a decimal point, followed by an e or E , followed by a signed exponent indicating the power of 10 used. Here are two examples of valid floating-point constants:

 -1.56E+12
 2.87e-3

You can leave out positive signs. You can do without a decimal point (2E5) or an exponential part (19.28), but not both simultaneously . You can omit a fractional part (3.E16) or an integer part (.45E-6), but not both (that wouldn't leave much!). Here are some more valid floating-point constants:

 3.14159 .2 4e16 .8E-5 100.

Don't use spaces in a floating-point constant.

 WRONG  1.56  E+12

By default, the compiler assumes floating-point constants are double precision. Suppose, for example, that some is a float variable, and that you have this statement:

 some = 4.0 * 2.0;

Then the 4.0 and 2.0 are stored as double , using (typically) 64 bits for each. The product is calculated using double-precision arithmetic, and only then is the answer trimmed to regular float size. This ensures greater precision for your calculations, but can slow down a program.

ANSI C enables you to override this default by using an f or F suffix to make the compiler treat a floating-point constant as type float : examples are 2.3f and 9.11E9F. An l or L suffix makes it type long double ; examples are 54.3l and 4.32e4L. Note that L is less likely to be mistaken for a 1 than is l . If the floating-point number has no suffix, it is type double .

Printing Floating-Point Values

The printf() function uses the %f format specifier to print type float and double numbers using decimal notation, and it uses %e to print them in exponential notation. Listing 3.6 illustrates this.

Listing 3.6 The `showfpt.c` program.

 /* showfpt.c--displays float value in two ways */
 #include <stdio.h>
 int main(void)
 {
    float value = 32000.0;

    printf("%f can be written %e\n", value, value);
    return 0;
 }

This is the output:

 32000.000000 can be written 3.200000e+004

The preceding example illustrates the default output. The next chapter discusses how to control the appearance of this output by setting field widths and the number of places to the right of the decimal.

Those implementations that support the ANSI C long double type use the %Lf and %Le specifiers to print that type. Note, however, that both float and double use the %f or %e specifiers for output. That's because C automatically expands type float values to type double when they are passed as arguments to any function, such as printf() , that doesn't explicitly prototype the argument type.

Other Types

That finishes the list of fundamental data types (see Figure 3.8). For some of you, the list must seem long. Others of you might be thinking that more types are needed. What about a Boolean type or a string type? C doesn't have them, but it can still deal quite well with logical manipulations and with strings. You will take a first look at strings in Chapter 4.

Figure 3.8. C data types for a typical system.

Floating-Point Overflow and Underflow

What happens if you try to make a float variable exceed its limits? For example, suppose you multiply 1.0e38f by 1000.0f (overflow) or divide 1.0e-37f by 1.0e8f (underflow). The result depends on the system. Either operation could cause the program to abort and print a runtime error message. Alternatively, an overflow might be replaced by a special value, such as the largest possible float value, and an underflow might be replaced by 0. Other systems may not issue warnings or may offer you a choice of responses. If this matter concerns you, check the rules for your system. If you can't find the information, don't be afraid of a little trial and error.

C does have other types derived from the basic types. These types include arrays, pointers, structures, and unions. Although they are subject matter for later chapters, we have already smuggled some pointers into this chapter's examples. (A pointer points to the location of a variable or other data object. The & prefix used with the scanf() function creates a pointer telling scanf() where to place information.)

Floating-Point Round-Off Errors

Take a number, add 1 to it, and subtract the original number. What do you get? You get 1. A floating-point calculation, such as the following, may give another answer:

 /* floaterr.c--demonstrates round-off error */
 #include <stdio.h>
 int main(void)
 {
     float a, b;

     b = 2.0e20 + 1.0;
     a = b - 2.0e20;
     printf("%f \n", a);
     return 0;
 }

The output is this:

 0.000000                  -VAX 750, UNIX
 -13584010575872.000000    -Turbo C 1.5
 4008175468544.000000      -Borland C 3.1, MSVC++ 5.0

The reason for these odd results is that the computer doesn't keep track of enough decimal places to do the operation correctly. The number 2.0e20 is 2 followed by 20 zeros, and by adding 1, you are trying to change the 21st digit. To do this correctly, the program would need to be able to store a 21-digit number. A float number is typically just 6 or 7 digits scaled to bigger or smaller numbers with an exponent. The attempt is doomed. On the other hand, if you used, say, 2.0e4 instead of 2.0e20, you would get the correct answer because you are trying to change the 5th digit, and float numbers are precise enough for that.

Summary: The Basic Data Types

Keywords:

The basic data types are set up using eight keywords: int, long, short, unsigned, char, float, double, and signed (ANSI C).

Signed Integers:

They can have positive or negative values.

int : The basic integer type for a given system. ANSI guarantees at least 16 bits for int .

short or short int : The largest short integer is no larger than the largest int and may be smaller. ANSI guarantees at least 16 bits for short .

long or long int : Can hold an integer at least as large as the largest int and possibly larger. ANSI guarantees at least 32 bits for long.

long long or long long int : This proposed extension can hold an integer at least as large as the largest long and possibly larger. The long long type is at least 64 bits.

Typically, long will be bigger than short , and int will be the same as one of the two. For example, DOS-based systems for the PC provide 16-bit short and int and 32-bit long , and Windows 95-based systems provide 16-bit short and 32-bit int and long .

Unsigned Integers:

They have zero or positive values only. This extends the range of the largest possible positive number. Use the keyword unsigned before the desired type: unsigned int , unsigned long , unsigned short . A lone unsigned is the same as unsigned int .

Characters:

They are typographic symbols such as A , & , and + . By definition, the char type uses 1 byte of memory to represent a character. Historically, this character byte has most often been 8 bits, but it can be 16 bits or larger, if needed to represent the base character set.

char : The keyword for this type. Some implementations use a signed char , but others use an unsigned char . ANSI C enables you to use the keywords signed and unsigned to specify which form you want.

Floating Point:

They can have positive or negative values.

float : The basic floating-point type for the system; it can represent at least six significant figures accurately.

double : A (possibly) larger unit for holding floating-point numbers. It may allow more significant figures (at least 10, typically more) and perhaps larger exponents than float .

long double : A (possibly) even larger unit for holding floating-point numbers. It may allow more significant figures and perhaps larger exponents than double .

Summary: How to Declare a Simple Variable

Choose the type you need.
Choose a name for the variable.
Use the following format for a declaration statement:
```
 type-specifier variable-name; 
```
The type-specifier is formed from one or more of the type keywords; here are examples of declarations:
```
 int erest;
 unsigned short cash;
```
You can declare more than one variable of the same type by separating the variable names with commas, for example:
```
 char ch, init, ans;
```
You can initialize a variable in a declaration statement:
```
 float mass = 6.0E24; 
```

Type Sizes

Tables 3.2 and 3.3 show type sizes for some common C environments. (In some environments, you have a choice.) What is your system like? Try running the program in Listing 3.7 to find out.

Table 3.2. Integer type sizes (bytes) for representative systems.

Type	Macintosh Metrowerks CW (default)	IBM PC Borland DOS and Windows 3.1	IBM PC Windows 98 and Windows NT	ANSI C Minimum
`char`	1	1	1	1
`int`	4	2	4	2
`short`	2	2	2	2
`long`	4	4	4	4

Table 3.3. Floating-point facts for representative systems.

Type	Macintosh Metrowerks CW (default)	IBM PC Borland 3.1 (DOS)	IBM PC Windows 98 and Windows NT	ANSI C Minimum
`float`	6 digits	6 digits	6 digits	6 digits
	-37 to 38	-37 to 38	-37 to 38	-37 to 37
`double`	18 digits	15 digits	15 digits	10 digits
	-4931 to 4932	-307 to 308	-307 to 308	-37 to 37
`long double`	18 digits	19 digits	18 digits	10 digits
	-4931 to 4932	-4931 to 4932	-4931 to 4932	-37 to 37

For each type, the top row is the number of significant digits and the second row is the exponent range (base 10).

Listing 3.7 The `typesize.c` program.

 /* typesize.c--prints out type sizes */
 #include <stdio.h>
 int main(void)
 {
    printf("Type int has a size of %d bytes.\n", sizeof(int));
    printf("Type char has a size of %d bytes.\n", sizeof(char));
    printf("Type long has a size of %d bytes.\n", sizeof(long));
    printf("Type double has a size of %d bytes.\n",
            sizeof(double));
    return 0;
 }

C has a built-in operator called sizeof that gives sizes in bytes. (Some compilers, such as Think C for the Macintosh, require %ld instead of %d for printing sizeof quantities. That's because C leaves some latitude as to the actual integer type that sizeof uses to report its findings.) The output from this program is as follows:

 Type int has a size of 4 bytes.
 Type char has a size of 1 bytes.
 Type long has a size of 4 bytes.
 Type double has a size of 8 bytes.

This program found the size of only four types, but you can easily modify it to find the size of any other type that interests you. Note that the size of char is necessarily 1 byte because C defines the size of one byte in terms of char . So, on a system with a 16-bit char and a 64-bit double , sizeof will report double as having a size of 4 bytes. If you have an ANSI C compiler, you can check the limits.h and float.h header files for more detailed information on type limits. (The next chapter discusses these two files further.)

Incidentally, notice in the last line how the printf() statement is spread over two lines. You can do this as long as the break does not occur in the quoted section or in the middle of a word.
