Character String Arrays and Initialization

When you define a character string array, you must let the compiler know how much space is needed. One way to do this is to initialize the array with a string constant. With older C implementations, you must use static or external arrays for this purpose, but we'll assume you're using ANSI C and can initialize arrays of any storage class. The following declaration initializes the external array m1 to the characters of the indicated string:

 char m1[] = "Just limit yourself to one line's worth."; 

This form of initialization is short for the standard array initialization form:

 char m1[] = { 'J', 'u', 's', 't', ' ', 'l', 'i', 'm', 'i', 't', ' ',
               'y', 'o', 'u', 'r', 's', 'e', 'l', 'f', ' ', 't', 'o', ' ',
               'o', 'n', 'e', ' ', 'l', 'i', 'n', 'e', '\'', 's', ' ',
               'w', 'o', 'r', 't', 'h', '.', '\0' };

Note the closing null character. Without it, you have a character array, but not a string. For either form (and we do recommend the first), the compiler counts the characters and sizes the array accordingly. Initializing character arrays is one case when it really does make sense to let the compiler determine the array size.

The array name m1, like any array name, yields the address of the first element of the array:

 m1 == &m1[0], *m1 == 'J', and *(m1+1) == m1[1] == 'u'

Indeed, you can use pointer notation to set up a string. For example, Listing 11.1 uses this declaration:

 char *m3 = "\nEnough about me -- what's your name?"; 

This declaration is very nearly the same as this one:

 char m3[] = "\nEnough about me -- what's your name?";

Both declarations amount to saying that m3 is a pointer to the indicated string. In both cases, the string itself determines the amount of storage set aside for the string. Nonetheless, the forms are not identical.

Array Versus Pointer

What is the difference, then, between an array and a pointer form? The array form (m3[]) causes an array of 38 elements (one for each character plus one for the terminating '\0') to be created in static storage. (The storage is static because the declaration is external.) Each element is initialized to the corresponding character. Hereafter, the compiler will recognize the name m3 as a synonym for the address of the first array element, &m3[0]. One important point here is that in the array form, m3 is an address constant. You can't change m3 because that would mean changing the location (address) where the array is stored. You can use operations like m3+1 to identify the next element in an array, but ++m3 is not allowed. The increment operator can be used only with the names of variables, not with constants.

The pointer form (*m3) also causes 38 elements in static storage to be set aside for the string. In addition, it sets aside one more storage location for the pointer variable m3. This variable initially points to the first character of the string, but the value can be changed. Therefore, you can use the increment operator. For instance, ++m3 would point to the second character (E). Note that *m3 does not have to be declared as static, even for K&R C. The reason is that you are not initializing an array of 38 elements; rather, you are initializing a single pointer variable. There are no storage class restrictions for initializing ordinary, non-array variables, either in K&R C or in ANSI C.

Are these differences important? Often they are not, but it depends on what you try to do. See the following discussion for some examples.

Array and Pointer Differences

Let's examine the differences between initializing a character array to hold a string and initializing a pointer to point to a string. (By "pointing to a string," we really mean pointing to the first character of a string.) For example, consider these two declarations:

 char heart[] = "I love Tillie!";
 char *head = "I love Millie!";

The chief difference is that the array name heart is a constant, but the pointer head is a variable. What practical difference does this make?

First, both can use array notation:

 for (i = 0; i < 6; i++)
     putchar(heart[i]);
 putchar('\n');
 for (i = 0; i < 6; i++)
     putchar(head[i]);
 putchar('\n');

This is the output:

 I love
 I love

Next, both can use pointer addition:

 for (i = 0; i < 6; i++)
     putchar(*(heart + i));
 putchar('\n');
 for (i = 0; i < 6; i++)
     putchar(*(head + i));
 putchar('\n');

Again, the output is this:

 I love
 I love

Only the pointer version, however, can use the increment operator:

 while (*(head) != '\0')    /* stop at end of string */
     putchar(*(head++));    /* print character, advance pointer */

This produces the following output:

 I love Millie! 

Suppose you want head to agree with heart . You can say this:

 head = heart; /* head now points to the array heart */ 

This makes the head pointer point to the first element of the heart array.

However, you cannot say this:

 heart = head; /* illegal construction */ 

The situation is analogous to x = 3; versus 3 = x; in that the left side of the assignment statement must be a variable or, more generally, an lvalue, such as *p_int. Incidentally, head = heart; does not make the Millie string vanish; it just changes the address stored in head. Unless you've saved the address of "I love Millie!" elsewhere, however, you won't be able to access that string when head points to another location.

There is a way to alter the heart message: go to the individual array elements:

 heart[7] = 'M';

or

 *(heart + 7) = 'M';

The elements of an array are variables (unless the array was declared as const), but the name is not a variable.

Specifying Array Size Explicitly

Another way to set up storage is to be explicit. In the external declaration, you could have said

 char m1[44] = "Just limit yourself to one line's worth."; 

instead of

 char m1[] = "Just limit yourself to one line's worth."; 

Just be sure that the number of elements is at least one more (that null character again) than the string length. As with other static or external arrays, any unused elements are automatically initialized to 0 (which in char form is the null character, not the zero digit character). See Figure 11.1.

Figure 11.1. Initializing an array.

Note that in the program you had to assign a size for the array name:

 #define LINELEN 81    /* maximum string length + 1 */
 char name[LINELEN];

Because the contents for name are to be read when the program runs, the compiler has no way of knowing in advance how much space to set aside unless you tell it. There is no string constant present whose characters the compiler can count, so we gambled that 80 characters would be enough to hold the user's name. When you declare an array, the array size must evaluate to an integer constant. You can't use a variable that gets set at runtime. The array size is locked into the program at compile time.

 int n = 8;
 char cakes[2 + 5];  /* valid, size is a constant expression */
 char crumbs[n];     /* invalid, size is a variable          */

Arrays of Character Strings

Often it is convenient to have an array of character strings. Then you can use a subscript to access several different strings. Listing 11.1 used this example:

 static char *mytal[LIM] = {"Adding numbers swiftly",
        "Multiplying accurately", "Stashing data",
        "Following instructions to the letter",
        "Understanding the C language"};

Let's study this declaration. Because LIM is 5, you can say that mytal is an array of 5 pointers-to-char. That is, mytal is a one-dimensional array, and each element in the array holds the address of a char. The first pointer is mytal[0], and it points to the first character of the first string. The second pointer is mytal[1], and it points to the beginning of the second string. In general, each pointer points to the first character of the corresponding string.

 *mytal[0] == 'A', *mytal[1] == 'M', *mytal[2] == 'S'

And so on. The mytal array doesn't actually hold the strings; it just holds the addresses of the strings. You can think of mytal[0] as representing the first string and *mytal[0] as the first character of the first string. Because of the relationship between array notation and pointers, you can also use mytal[0][0] to represent the first character of the first string, even though mytal is not defined as a two-dimensional array.

The initialization follows the rules for arrays. The braced portion is equivalent to this:

 {{...}, {...},...,{...} }; 

The ellipses indicate the stuff we were too lazy to type in. The main point is that the first set of double quotation marks corresponds to a brace-pair and so is used to initialize the first character string pointer. The next set of double quotation marks initializes the second pointer, and so on. A comma separates adjacent strings.

Another approach is to create a two-dimensional array:

 char mytal_2[LIM][LINLIM]; 

Here, mytal_2 is an array of five elements, and each of these elements is itself an array of 81 char values. In this case, the strings themselves are stored in the array. One difference is that this second choice sets up a rectangular array with all the rows of the same length. That is, 81 elements are used to hold each string. The array of pointers, however, sets up a ragged array, with each row's length determined by the string it was initialized to:

 char *mytal[LIM]; 

This ragged array doesn't waste any storage space. Figure 11.2 shows the two kinds of arrays. (Actually, the strings pointed to by the mytal array elements don't necessarily have to be stored consecutively in memory, but the figure does illustrate the difference in storage requirements.)

Figure 11.2. Rectangular versus ragged array.

Another difference is that mytal and mytal_2 have different types; mytal is an array of pointers-to-char, but mytal_2 is an array of arrays of char. In short, mytal holds five addresses, but mytal_2 holds five complete character arrays.

Pointers and Strings

Perhaps you noticed an occasional reference to pointers in this discussion of strings. Most C operations for strings actually work with pointers. Consider, for example, the instructive program shown in Listing 11.3.

Listing 11.3 The p_and_s.c program.
 /* p_and_s.c -- pointers and strings */
 #include <stdio.h>
 int main(void)
 {
   char * mesg = "Don't be a fool!";
   char * copy;

   copy = mesg;
   printf("%s\n", copy);
   printf("mesg = %s; &mesg = %p; value = %p\n",
        mesg, &mesg, mesg);
   printf("copy = %s; &copy = %p; value = %p\n",
        copy, &copy, copy);
   return 0;
 }

Note

Use %u or %lu instead of %p if your compiler doesn't support %p.


Looking at this program, you might think that it makes a copy of the string "Don't be a fool!", and your first glance at the output might seem to confirm this guess.

 Don't be a fool!
 mesg = Don't be a fool!; &mesg = 0064FDF0; value = 00410A30
 copy = Don't be a fool!; &copy = 0064FDF4; value = 00410A30

Study the printf() output. First, mesg and copy are printed as strings (%s). No surprises here; all the strings are "Don't be a fool!".

The next item on each line is the address of the specified pointer. The two pointers mesg and copy are stored in locations 0064FDF0 and 0064FDF4, respectively.

Now notice the final item, the one we called value. It is the value of the specified pointer. The value of the pointer is the address it contains. You can see that mesg points to location 00410A30, and so does copy. Therefore, the string itself was never copied. All that copy = mesg; does is produce a second pointer pointing to the very same string.

Why all this pussyfooting around? Why not just copy the whole string? Well, ask yourself which is more efficient: copying one address or copying, say, 50 separate elements? Often, the address is all that is needed to get the job done. If you truly require a copy that is a duplicate, you can use the strcpy() or strncpy() functions discussed later in this chapter.

Now that we have discussed defining strings within a program, let's turn to strings that are read in.



C++ Primer Plus
C Primer Plus (5th Edition)
ISBN: 0672326965
EAN: 2147483647
Year: 2000
Pages: 314
Authors: Stephen Prata
