4.3. C-Style Character Strings

Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many, many security problems.

In Section 2.2 (p. 40) we first used string literals and learned that the type of a string literal is array of constant characters. We can now be more explicit and note that the type of a string literal is an array of const char. A string literal is an instance of a more general construct that C++ inherits from C: C-style character strings. C-style strings are not actually a type in either C or C++. Instead, C-style strings are null-terminated arrays of characters:

           char ca1[] = {'C', '+', '+'};        // no null, not C-style string
           char ca2[] = {'C', '+', '+', '\0'};  // explicit null
           char ca3[] = "C++";     // null terminator added automatically
           const char *cp = "C++"; // null terminator added automatically
           char *cp1 = ca1;   // points to first element of an array, but not C-style string
           char *cp2 = ca2;   // points to first element of a null-terminated char array

Neither ca1 nor cp1 is a C-style string: ca1 is a character array, but the array is not null-terminated. cp1, which points to ca1, therefore does not point to a null-terminated array. The other declarations are all C-style strings, remembering that the name of an array is treated as a pointer to the first element of the array. Thus, ca2 and ca3 are treated as pointers to the first elements of their respective arrays.
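The difference between these arrays can be checked directly. The following is a minimal sketch, assuming we know the size of the array; the function name ends_with_null is our own and is not part of any library:

```cpp
#include <cstddef>

// Hypothetical helper: reports whether the last of the n characters
// starting at p is the null character.
bool ends_with_null(const char *p, std::size_t n)
{
    return n != 0 && p[n - 1] == '\0';
}
```

Called as ends_with_null(ca1, sizeof(ca1)), the function would return false, because ca1 holds exactly three characters. For ca2 and ca3, each of which holds four characters including the null, it would return true.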

Exercises Section 4.3

Exercise 4.19:
Explain the meaning of the following five definitions. Identify any illegal definitions.
           (a) int i;
           (b) const int ic;
           (c) const int *pic;
           (d) int *const cpi;
           (e) const int *const cpic;
Exercise 4.20:
Which of the following initializations are legal? Explain why.
           (a) int i = -1;
           (b) const int ic = i;
           (c) const int *pic = &ic;
           (d) int *const cpi = &ic;
           (e) const int *const cpic = &ic;
Exercise 4.21:
Based on the definitions in the previous exercise, which of the following assignments are legal? Explain why.
           (a) i = ic;
           (b) pic = &ic;
           (c) cpi = pic;
           (d) pic = cpic;
           (e) cpic = &ic;
           (f) ic = *cpic;

Using C-style Strings

C-style strings are manipulated through (const) char* pointers. One frequent usage pattern uses pointer arithmetic to traverse the C-style string. The traversal tests and increments the pointer until we reach the terminating null character:

           const char *cp = "some value";
           while (*cp) {
               // do something to *cp
               ++cp;
           }

The condition in the while dereferences the const char* pointer cp and the resulting character is tested for its true or false value. A true value is any character other than the null. So, the loop continues until it encounters the null character that terminates the array to which cp points. The body of the while does whatever processing is needed and concludes by incrementing cp to advance the pointer to address the next character in the array.

This loop will fail if the array that cp addresses is not null-terminated. In this case, the loop is apt to read characters starting at cp until it encounters a null character somewhere in memory.
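As an illustration of this traversal, here is a small sketch that counts the characters in a C-style string, much as the library strlen function does. The name count_chars is our own, and the function assumes that its argument is null-terminated:

```cpp
#include <cstddef>

// Counts the characters that precede the terminating null.
// Precondition: cp points to a null-terminated character array.
std::size_t count_chars(const char *cp)
{
    std::size_t cnt = 0;
    while (*cp) {   // stop when we reach the null character
        ++cnt;
        ++cp;       // advance to the next character
    }
    return cnt;
}
```

Given the string "some value", count_chars returns 10. Given an empty string, the loop body never executes and the function returns 0.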

C Library String Functions

The Standard C library provides a set of functions, listed in Table 4.1, that operate on C-style strings. To use these functions, we must include the associated C header file

           #include <cstring>

which is the C++ version of the string.h header from the C library.

Table 4.1. C-Style Character String Functions
`strlen(s)`	Returns the length of `s`, not counting the null.
`strcmp(s1, s2)`	Compares `s1` and `s2` for equality. Returns 0 if `s1 == s2`, positive value if `s1 > s2`, negative value if `s1 < s2`.
`strcat(s1, s2)`	Appends `s2` to `s1`. Returns `s1`.
`strcpy(s1, s2)`	Copies `s2` into `s1`. Returns `s1`.
`strncat(s1, s2, n)`	Appends at most `n` characters from `s2` onto `s1`, then adds a null. Returns `s1`.
`strncpy(s1, s2, n)`	Copies `n` characters from `s2` into `s1`. Returns `s1`.

These functions do no checking on their string parameters.

The pointer(s) passed to these routines must be nonzero and each pointer must point to the initial character in a null-terminated array. Some of these functions write to a string they are passed. These functions assume that the array to which they write is large enough to hold whatever characters the function generates. It is up to the programmer to ensure that the target string is big enough.

When we compare library strings, we do so using the normal relational operators. We can use these operators to compare pointers to C-style strings, but the effect is quite different; what we're actually comparing is the pointer values, not the strings to which they point:

           if (cp1 < cp2) // compares addresses, not the values pointed to

Assuming cp1 and cp2 point to elements in the same array (or one past that array), then the effect of this comparison is to compare the address in cp1 with the address in cp2. If the pointers do not address the same array, then the comparison is undefined.

To compare the strings, we must use strcmp and interpret the result:

           const char *cp1 = "A string example";
           const char *cp2 = "A different string";
           int i = strcmp(cp1, cp2);    // i is positive
           i = strcmp(cp2, cp1);        // i is negative
           i = strcmp(cp1, cp1);        // i is zero

The strcmp function returns one of three kinds of values: 0 if the strings are equal; or a positive or negative value, depending on whether the first string is larger or smaller than the second.
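Because strcmp returns 0 when the strings are equal, a test such as if (strcmp(s1, s2)) succeeds when the strings differ, which is easy to get backward. One way to make the intent explicit is to wrap the comparison. The following sketch uses two helper names of our own, same_text and sorts_before:

```cpp
#include <cstring>

// true if the two C-style strings hold the same characters
bool same_text(const char *s1, const char *s2)
{
    return std::strcmp(s1, s2) == 0;
}

// true if s1 compares less than s2
bool sorts_before(const char *s1, const char *s2)
{
    return std::strcmp(s1, s2) < 0;
}
```

Two distinct arrays that each hold "C++" have different addresses, so comparing pointers to them with == yields false, whereas same_text yields true.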

Never Forget About the Null-Terminator

When using the C library string functions it is essential to remember the strings must be null-terminated:

           char ca[] = {'C', '+', '+'}; // not null-terminated
           cout << strlen(ca) << endl; // disaster: ca isn't null-terminated

In this case, ca is an array of characters but is not null-terminated. What happens is undefined. The strlen function assumes that it can rely on finding a null character at the end of its argument. The most likely effect of this call is that strlen will keep looking through the memory that follows wherever ca happens to reside until it encounters a null character. In any event, the return from strlen will not be the correct value.

Caller Is Responsible for Size of a Destination String

The array that we pass as the first argument to strcat and strcpy must be large enough to hold the generated string. The code we show here, although a common usage pattern, is fraught with the potential for serious error:

           // Dangerous: What happens if we miscalculate the size of largeStr?
           char largeStr[16 + 18 + 2];         // will hold cp1 a space and cp2
           strcpy(largeStr, cp1);              // copies cp1 into largeStr
           strcat(largeStr, " ");              // adds a space at end of largeStr
           strcat(largeStr, cp2);              // concatenates cp2 to largeStr
           // prints A string example A different string
           cout << largeStr << endl;

The problem is that we could easily miscalculate the size needed in largeStr. Similarly, if we later change the sizes of the strings to which either cp1 or cp2 point, then the calculated size of largeStr will be wrong. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security leaks.

When Using C-Style Strings, Use the `strn` Functions

If you must use C-style strings, it is usually safer to use the strncat and strncpy functions instead of strcat and strcpy:

           char largeStr[16 + 18 + 2]; // to hold cp1 a space and cp2
           strncpy(largeStr, cp1, 17); // size to copy includes the null
           strncat(largeStr, " ", 1);  // pedantic, but a good habit
           strncat(largeStr, cp2, 18); // adds at most 18 characters, plus a null

The trick to using these versions is to properly calculate the value to control how many characters get copied. In particular, we must always remember to account for the null when copying or concatenating characters. We must allocate space for the null because that is the character that terminates largeStr after each call. Let's walk through these calls in detail:

On the call to strncpy, we ask to copy 17 characters: all the characters in cp1 plus the null. Leaving room for the null is necessary so that largeStr is properly terminated. After the strncpy call, largeStr has a strlen value of 16. Remember, strlen counts the characters in a C-style string, not including the null.
On the first call to strncat, we ask to append at most one character: the space. Unlike strncpy, strncat always writes a terminating null after the characters it appends, so its count does not include the null. After this call, largeStr has a strlen of 17. The null that had ended largeStr is overwritten by the space that we appended. A new null is written after that space.
When we append cp2 in the second call to strncat, we ask to append at most 18 characters, which is all of cp2; strncat again adds the null. After this call, the strlen of largeStr is 35: 16 characters from cp1, 18 from cp2, and 1 for the space that separates the two strings.

The array size of largeStr remains 36 throughout.

These operations are safer than the simpler versions that do not take a size argument as long as we calculate the size argument correctly. If we ask to copy or concatenate more characters than the size of the target array, we will still overrun that array. If the string we're copying from or concatenating is bigger than the requested size, then we'll inadvertently truncate the new version. Truncating is safer than overrunning the array, but it is still an error.
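Two details of these functions are easy to overlook. If the source string holds n or more characters, strncpy copies n characters and does not write a null. The count passed to strncat, by contrast, excludes the null that strncat always appends. One way to keep this bookkeeping in a single place is sketched below, assuming the caller knows the full size of the destination array. The names bounded_copy and bounded_append are hypothetical helpers of our own, not library functions:

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dest, which has room for dest_size characters,
// truncating if necessary. dest is always null-terminated when
// dest_size is nonzero. Returns true if all of src fit.
bool bounded_copy(char *dest, std::size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return false;
    std::strncpy(dest, src, dest_size - 1); // leave room for the null
    dest[dest_size - 1] = '\0';             // strncpy may not have written one
    return std::strlen(src) < dest_size;
}

// Appends src to the null-terminated string in dest, truncating if
// necessary. Returns true if all of src fit.
bool bounded_append(char *dest, std::size_t dest_size, const char *src)
{
    std::size_t used = std::strlen(dest);
    if (used >= dest_size)
        return false;                        // dest is not a valid string
    std::size_t room = dest_size - used - 1; // space left, excluding the null
    std::strncat(dest, src, room);           // strncat adds the null itself
    return std::strlen(src) <= room;
}
```

With these helpers, the size of the destination is stated once per call as sizeof(largeStr), and a false return signals that the result was truncated rather than silently losing characters.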

Whenever Possible, Use Library `string`s

None of these issues matter if we use C++ library strings:

           string largeStr = cp1; // initialize largeStr as a copy of cp1
           largeStr += " ";       // add space at end of largeStr
           largeStr += cp2;       // concatenate cp2 onto end of largeStr

Now the library handles all memory management, and we need no longer worry if the size of either string changes.
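The same steps can be packaged as a function that works for strings of any length. This is only a sketch; the name join_with_space is our own:

```cpp
#include <string>

// Returns a library string holding s1, a space, and s2.
// Precondition: s1 and s2 point to null-terminated arrays.
std::string join_with_space(const char *s1, const char *s2)
{
    std::string result = s1; // copy the characters of s1
    result += " ";           // the string grows as needed
    result += s2;
    return result;
}
```

No size calculation appears anywhere in this code, so there is nothing to update if the input strings change.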

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings.

4.3.1. Dynamically Allocating Arrays

A variable of array type has three important limitations: Its size is fixed, the size must be known at compile time, and the array exists only until the end of the block in which it was defined. Real-world programs usually cannot live with these restrictions; they need a way to allocate an array dynamically at run time. Although all arrays have fixed size, the size of a dynamically allocated array need not be fixed at compile time. It can be (and usually is) determined at run time. Unlike an array variable, a dynamically allocated array continues to exist until it is explicitly freed by the program.

Exercises Section 4.3

Exercise 4.22:
Explain the difference between the following two while loops:
           const char *cp = "hello";
           int cnt;
           while (cp) { ++cnt; ++cp; }
           while (*cp) { ++cnt; ++cp; }
Exercise 4.23:
What does the following program do?
           const char ca[] = {'h', 'e', 'l', 'l', 'o'};
           const char *cp = ca;
           while (*cp) {
               cout << *cp << endl;
               ++cp;
           }
Exercise 4.24:
Explain the differences between strcpy and strncpy. What are the advantages of each? The disadvantages?

Exercise 4.25:
Write a program to compare two strings. Now write a program to compare the value of two C-style character strings.

Exercise 4.26:
Write a program to read a string from the standard input. How might you write a program to read from the standard input into a C-style character string?

Every program has a pool of available memory it can use during program execution to hold dynamically allocated objects. This pool of available memory is referred to as the program's free store or heap. C programs use a pair of functions named malloc and free to allocate space from the free store. In C++ we use new and delete expressions.

Defining a Dynamic Array

When we define an array variable, we specify a type, a name, and a dimension. When we dynamically allocate an array, we specify the type and size but do not name the object. Instead, the new expression returns a pointer to the first element in the newly allocated array:

           int *pia = new int[10]; // array of 10 uninitialized ints

This new expression allocates an array of ten ints and returns a pointer to the first element in that array, which we use to initialize pia.

A new expression takes a type and optionally an array dimension specified inside a bracket-pair. The dimension can be an arbitrarily complex expression. When we allocate an array, new returns a pointer to the first element in the array. Objects allocated on the free store are unnamed. We use objects on the heap only indirectly through their address.

Initializing a Dynamically Allocated Array

When we allocate an array of objects of a class type, then that type's default constructor (Section 2.3.4, p. 50) is used to initialize each element. If the array holds elements of built-in type, then the elements are uninitialized:

           string *psa = new string[10]; // array of 10 empty strings
           int *pia = new int[10];       // array of 10 uninitialized ints

Each of these new expressions allocates an array of ten objects. In the first case, those objects are strings. After allocating memory to hold the objects, the default string constructor is run on each element of the array in turn. In the second case, the objects are a built-in type; memory to hold ten ints is allocated, but the elements are uninitialized.

Alternatively, we can value-initialize (Section 3.3.1, p. 92) the elements by following the array size by an empty pair of parentheses:

           int *pia2 = new int[10] (); // array of 10 ints, each value-initialized to 0

The parentheses are effectively a request to the compiler to value-initialize the array, which in this case sets its elements to 0.
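To see the effect of value-initialization, we might allocate an array this way and inspect its elements. The following sketch uses two function names of our own, make_zeroed and all_zero:

```cpp
#include <cstddef>

// Allocates an array of n ints, each value-initialized to 0.
// The caller is responsible for freeing the result with delete [].
int *make_zeroed(std::size_t n)
{
    return new int[n]();
}

// true if each of the n elements starting at p is 0
bool all_zero(const int *p, std::size_t n)
{
    for (const int *q = p; q != p + n; ++q)
        if (*q != 0)
            return false;
    return true;
}
```

Had make_zeroed omitted the parentheses, the elements would be uninitialized and reading them in all_zero would be an error.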

The elements of a dynamically allocated array can be initialized only to the default value of the element type. The elements cannot be initialized to separate values as can be done for elements of an array variable.

Dynamic Arrays of `const` Objects

If we create an array of const objects of built-in type on the free store, we must initialize that array: Because the elements are const, there is no way to assign values to the elements. The only way to initialize the elements is to value-initialize the array:

           // error: uninitialized const array
           const int *pci_bad = new const int[100];
           // ok: value-initialized const array
           const int *pci_ok = new const int[100]();

It is possible to have a const array of elements of a class type that provides a default constructor:

           // ok: array of 100 empty strings
           const string *pcs = new const string[100];

In this case, the default constructor is used to initialize the elements of the array.

Of course, once the elements are created, they may not be changed, which means that such arrays usually are not very useful.

It Is Legal to Dynamically Allocate an Empty Array

When we dynamically allocate an array, we often do so because we don't know the size of the array at compile time. We might write code such as

           size_t n = get_size(); // get_size returns number of elements needed
           int* p = new int[n];
           for (int* q = p; q != p + n; ++q)
                /* process the array */ ;

to figure out the size of the array and then allocate and process the array.

An interesting question is: What happens if get_size returns 0? The answer is that our code works fine. The language specifies that a call to new to create an array of size zero is legal. It is legal even though we could not create an array variable of size 0:

           char arr[0];            // error: cannot define zero-length array
           char *cp = new char[0]; // ok: but cp can't be dereferenced

When we use new to allocate an array of zero size, new returns a valid, nonzero pointer. This pointer will be distinct from any other pointer returned by new. The pointer cannot be dereferenced; after all, it points to no element. The pointer can be compared and so can be used in a loop such as the preceding one. It is also legal to add (or subtract) zero to such a pointer and to subtract the pointer from itself, yielding zero.

In our hypothetical loop, if the call to get_size returned 0, then the call to new would still succeed. However, p would not address any element; the array is empty. Because n is zero, the for loop effectively compares q to p. These pointers are equal; q was initialized to p, so the condition in the for fails and the loop body is not executed.
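The following sketch makes the loop concrete. The function name sum_of_indices is our own, and the delete [] expression it uses to free the array is described in the next subsection. The function works unchanged when n is 0:

```cpp
#include <cstddef>

// Allocates an array of n ints, stores 0, 1, 2, ... in it,
// adds up the elements, and frees the array.
long sum_of_indices(std::size_t n)
{
    int *p = new int[n];               // legal even when n is 0
    int next = 0;
    for (int *q = p; q != p + n; ++q)  // body never runs if n is 0
        *q = next++;
    long total = 0;
    for (int *q = p; q != p + n; ++q)
        total += *q;
    delete [] p;                       // free the array, even if it is empty
    return total;
}
```

When n is 0, both loops compare q to p, find them equal, and skip their bodies, so the function returns 0 without ever dereferencing the pointer.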

Freeing Dynamic Memory

When we allocate memory, we must eventually free it. Otherwise, memory is gradually used up and may be exhausted. When we no longer need the array, we must explicitly return its memory to the free store. We do so by applying the delete [] expression to a pointer that addresses the array we want to release:

           delete [] pia;

This expression deallocates the array pointed to by pia, returning the associated memory to the free store. The empty bracket pair between the delete keyword and the pointer is necessary: It indicates to the compiler that the pointer addresses an array of elements on the free store and not simply a single object.

If the empty bracket pair is omitted, it is an error, but an error that the compiler is unlikely to catch; the program may fail at run time.

The least serious run-time consequence of omitting brackets when freeing an array is that too little memory will be freed, leading to a memory leak. On some systems and/or for some element types, more serious run-time problems are possible. It is essential to remember the bracket-pair when deleting pointers to arrays.
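A simple rule is to match the form of delete to the form of new that produced the pointer. The sketch below, with a function name of our own choosing, shows the two forms side by side:

```cpp
#include <cstddef>

// Allocates one int and an array of n ints, uses them,
// and frees each with the matching form of delete.
int use_and_free(std::size_t n)
{
    int *pi = new int(42);     // a single object
    int *pia = new int[n]();   // an array of n zeros
    int result = *pi + (n ? pia[0] : 0);
    delete pi;                 // single object: no brackets
    delete [] pia;             // array: brackets required
    return result;
}
```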

Contrasting C-Style Strings and C++ Library `string`s

The following two programs illustrate the differences in using C-style character strings versus using the C++ library string type. The string version is shorter, easier to understand, and less error-prone:

           // C-style character string implementation
           const char *pc = "a very long literal string";
           const size_t len = strlen(pc);         // length, not counting the null
           // performance test on string allocation and copy
           for (size_t ix = 0; ix != 1000000; ++ix) {
               char *pc2 = new char[len + 1]; // allocate the space, including the null
               strcpy(pc2, pc);               // do the copy
               if (strcmp(pc2, pc))           // use the new string
                   ;   // do nothing
               delete [] pc2;                 // free the memory
           }

           // string implementation
           string str("a very long literal string");
           // performance test on string allocation and copy
           for (int ix = 0; ix != 1000000; ++ix) {
               string str2 = str; // do the copy, automatically allocated
               if (str != str2)   // use the new string
                   ;   // do nothing
           }                      // str2 is automatically freed

These programs are further explored in the exercises to Section 4.3.1 (p. 139).

Using Dynamically Allocated Arrays

A common reason to allocate an array dynamically is if its dimension cannot be known at compile time. For example, char* pointers are often used to refer to multiple C-style strings during the execution of a program. The memory used to hold the various strings typically is allocated dynamically during program execution based on the length of the string to be stored. This technique is considerably safer than allocating a fixed-size array. Assuming we correctly calculate the size needed at run time, we no longer need to worry that a given string will overflow the fixed size of an array variable.

Suppose we have the following C-style strings:

           const char *noerr = "success";
           // ...
           const char *err189 = "Error: a function declaration must "
                                "specify a function return type!";

We might want to copy one or the other of these strings at run time to a new character array. We could calculate the dimension at run time, as follows:

     const char *errorTxt;
     if (errorFound)
         errorTxt = err189;
     else
         errorTxt = noerr;
     // remember the 1 for the terminating null
     size_t dimension = strlen(errorTxt) + 1;
     char *errMsg = new char[dimension];
     // copy the text for the error into errMsg
     strncpy(errMsg, errorTxt, dimension);

Recall that strlen returns the length of the string not including the null. It is essential to remember to add 1 to the length returned from strlen to accommodate the trailing null.
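The allocate-and-copy steps can be gathered into one function so that the size calculation is written only once. This is a sketch under the assumption that the argument is null-terminated; the name copy_c_string is hypothetical:

```cpp
#include <cstddef>
#include <cstring>

// Returns a pointer to a dynamically allocated copy of src.
// The caller must free the copy with delete [].
char *copy_c_string(const char *src)
{
    std::size_t dimension = std::strlen(src) + 1; // 1 for the terminating null
    char *copy = new char[dimension];
    std::strncpy(copy, src, dimension);           // copies the null as well
    return copy;
}
```

A caller might write char *errMsg = copy_c_string(errorTxt); and later release the memory with delete [] errMsg;.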

Exercises Section 4.3.1

Exercise 4.27:
Given the following new expression, how would you delete pa?
      int *pa = new int[10]; 
Exercise 4.28:
Write a program to read the standard input and build a vector of ints from values that are read. Allocate an array of the same size as the vector and copy the elements from the vector into the array.
Exercise 4.29:
Given the two program fragments in the highlighted box on page 138,
Explain what the programs do.
As it happens, on average, the string class implementation executes considerably faster than the C-style string functions. The relative average execution times on our more than five-year-old PC are as follows:
           user       0.47    # string class
           user       2.55    # C-style character string
Did you expect that? How would you account for it?
Exercise 4.30:
Write a program to concatenate two C-style string literals, putting the result in a C-style string. Write a program to concatenate two library strings that have the same value as the literals used in the first program.

4.3.2. Interfacing to Older Code

Many C++ programs exist that predate the standard library and so do not yet use the string and vector types. Moreover, many C++ programs interface to existing C programs that cannot use the C++ library. Hence, it is not infrequent to encounter situations where a program written in modern C++ must interface to code that uses arrays and/or C-style character strings. The library offers facilities to make the interface easier to manage.

Mixing Library `string`s and C-Style Strings

As we saw on page 80 we can initialize a string from a string literal:

           string st3("Hello World");  // st3 holds Hello World

More generally, because a C-style string has the same type as a string literal and is null-terminated in the same way, we can use a C-style string anywhere that a string literal can be used:

We can initialize or assign to a string from a C-style string.
We can use a C-style string as one of the two operands to the string addition or as the right-hand operand to the compound assignment operators.

The reverse functionality is not provided: there is no direct way to use a library string when a C-style string is required. For example, there is no way to initialize a character pointer from a string:

           char *str = st2; // compile-time type error

There is, however, a string member function named c_str that we can often use to accomplish what we want:

           char *str = st2.c_str(); // almost ok, but not quite

The name c_str indicates that the function returns a C-style character string. Literally, it says, "Get me the C-style string representation," that is, a pointer to the beginning of a null-terminated character array that holds the same data as the characters in the string.

This initialization fails because c_str returns a pointer to an array of const char. It does so to prevent changes to the array. The correct initialization is:

           const char *str = st2.c_str(); // ok

The array returned by c_str is not guaranteed to be valid indefinitely. Any subsequent use of st2 that might change the value of st2 can invalidate the array. If a program needs continuing access to the data, then the program must copy the array returned by c_str.
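One way to make such a copy is sketched below. The function name copy_from_string is our own, and the sketch assumes the string holds no embedded null characters:

```cpp
#include <cstring>
#include <string>

// Returns a dynamically allocated array holding a copy of the
// characters in s. The caller must free it with delete [].
char *copy_from_string(const std::string &s)
{
    char *copy = new char[s.size() + 1]; // 1 for the terminating null
    std::strcpy(copy, s.c_str());        // copy while the pointer is still valid
    return copy;
}
```

Because the copy is independent of the string, later changes to the string do not affect it.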

Using an Array to Initialize a `vector`

On page 112 we noted that it is not possible to initialize an array from another array. Instead, we have to create the array and then explicitly copy the elements from one array into the other. It turns out that we can use an array to initialize a vector, although the form of the initialization may seem strange at first. To initialize a vector from an array, we specify the address of the first element and one past the last element that we wish to use as initializers:

           const size_t arr_size = 6;
           int int_arr[arr_size] = {0, 1, 2, 3, 4, 5};
           // ivec has 6 elements: each a copy of the corresponding element in int_arr
           vector<int> ivec(int_arr, int_arr + arr_size);

The two pointers passed to ivec mark the range of values with which to initialize the vector. The second pointer points one past the last element to be copied. The range of elements marked can also represent a subset of the array:

           // copies 3 elements: int_arr[1], int_arr[2], int_arr[3]
           vector<int> ivec(int_arr + 1, int_arr + 4);

This initialization creates ivec with three elements. The values of these elements are copies of the values in int_arr[1] through int_arr[3].
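The same idea can be expressed as a function that takes the array and a pair of indices. This is a sketch; the name make_vector and its parameters are our own:

```cpp
#include <cstddef>
#include <vector>

// Returns a vector holding copies of arr[first] up to,
// but not including, arr[last].
std::vector<int> make_vector(const int *arr, std::size_t first, std::size_t last)
{
    return std::vector<int>(arr + first, arr + last);
}
```

Calling make_vector(int_arr, 1, 4) yields the same three elements as the initialization above, and make_vector(int_arr, 0, arr_size) copies the whole array.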

Exercises Section 4.3.2

Exercise 4.31:
Write a program that reads a string into a character array from the standard input. Describe how your program handles varying size inputs. Test your program by giving it a string of data that is longer than the array size you've allocated.

Exercise 4.32:
Write a program to initialize a vector from an array of ints.

Exercise 4.33:
Write a program to copy a vector of ints into an array of ints.

Exercise 4.34:
Write a program to read strings into a vector. Now, copy that vector into an array of character pointers. For each element in the vector, allocate a new character array and copy the data from the vector element into that character array. Then insert a pointer to the character array into the array of character pointers.

Exercise 4.35:
Print the contents of the vector and the array created in the previous exercise. After printing the array, remember to delete the character arrays.
