4.1 STRINGS IN C, A BRIEF REVIEW



A C-style string is defined as an array of characters that terminates in the null character. For example, in a C program the following would declare a string variable str with a storage allocation of 6 characters, the last character being reserved for the terminating null character:

     char str[6]; 

If we wish to also initialize a string variable at the time it is declared, we can do so by

     char str[5 + 1] = "hello"; 

or by

     char str[] = "hello";                       /* (A) */ 

where we have omitted the length of the array str. The double-quoted string of characters on the right-hand side, "hello", is called a string literal.[2] A string literal is a string constant, very much like the number 5 is an integer constant. Since a string literal is stored as an array of chars, the compiler in most contexts represents it by the memory address of the first character, in the above case the address of the character h. More precisely, the type of a string literal is an array of characters (const char[6] for "hello" in C++, char[6] in C), and in most expressions that array is converted to a pointer to its first character, which in C++ is of type const char*.

We can also use a character pointer directly to represent a string, as in

      char* str = "hello";                       /* (B) */ 

which causes the address of the first character, h, of the string literal "hello" to be stored in the pointer variable str. Note that the declaration in (B) gives you direct access to the read-only block of memory in which the string literal is stored. On the other hand, the declaration in (A) copies the string literal from wherever it is stored into the designated array.

While we may declare a string variable to be an array of characters, as in the definition in line (A) above, or to be a character pointer, as in the definition in line (B), the two versions are not always interchangeable. In the array version, the individual characters can be modified, as would be the case with an array in general. With the pointer version, however, the individual characters of the string must not be changed, because a string literal is typically stored in a read-only section of the memory. A statement such as the one shown in line (B) is legal because the compiler allows a string literal to be assigned to a char*. Older C++ compilers accept this as a deprecated conversion from const char* to char*, whereas standard C++ from C++11 onward requires the pointer to be declared as const char*. So whereas the pointer str in line (B) is of type char*, it is pointing to a block of read-only memory in which the string literal itself is stored.[3] For another difference between the string definitions in lines (A) and (B), the identifier str in the array version is the name of an array; it cannot be assigned a new value because an array name is not a modifiable lvalue. On the other hand, in the pointer version in line (B), str is a pointer variable that, during program execution, could be given any value of type char*.

We will now review briefly the frequently used functions in C that are provided by the string.h header file for performing operations on strings. These include strcmp whose prototype is given by

      int strcmp( const char* arg1, const char* arg2 ); 

for comparing two strings that are supplied to it as arg1 and arg2. It returns a value less than, equal to, or greater than 0 depending on whether arg1 is less than, equal to, or greater than arg2. Typically, ASCII character sets are used and strings are compared using the ASCII integer codes associated with the characters. For example, the following inequality is true for the one-character strings shown

      strcmp( "A", "a" ) < 0 

because the ASCII code for the character A is 65, whereas the ASCII code for a is 97, making the string literal "A" less than the string literal "a". Given this character-by-character comparison on the basis of ASCII codes, longer strings are compared using lexicographic ordering, an ordering that is akin to how words are arranged in a dictionary. For example, in lexicographic ordering, the string "abs" will occur before the string "absent", so the former is less than the latter. However, the string "Zebra" will occur before the string "debra", because the ASCII codes for all uppercase letters, A through Z, occupy the range 65 through 90, whereas the codes for lowercase letters, a through z, occupy the range 97 through 122.

Another frequently used string function from the string.h header file is the strlen function for ascertaining the length of a string. This function has the following prototype:

     size_t strlen( const char* arg ); 

where the return type, size_t, defined in the header file stddef.h, is usually either unsigned int or unsigned long int. For practically all cases, we can simply think of the value returned by strlen as an integer. To illustrate,

     strlen( "hello" ) 

returns 5. Note that the integer count returned by strlen does not include the terminating null character.

Another very useful C function for dealing with strings is

     char* strcpy( char* arg1, const char* arg2 ); 

which copies the characters from the string arg2 into the memory locations pointed to by arg1. For illustration, we could say

     char str1[6];
     char* str2 = "hello";
     strcpy( str1, str2 ); 

or, using the C memory allocation function malloc(),

     char* str1 = (char*) malloc( 6 );
     char* str2 = "hello";
     strcpy( str1, str2 ); 

In both cases above, the string hello will be copied into the memory locations pointed to by str1. The function strcpy() returns the pointer that is its first argument. In most programs the value returned by strcpy() is ignored, but it can be useful in nested calls to this function [45, p. 252].

When one wants to join two strings together, the following function from the string.h header comes in handy:

     char* strcat( char* arg1, const char* arg2 ); 

This function appends the string pointed to by arg2 to the string pointed to by arg1. For example,

     char str1[8];
     strcpy( str1, "hi" );
     strcat( str1, "there" ); 

will cause the string hithere to be stored at the memory locations pointed to by str1. As with the strcpy() function, the string concatenation function returns the pointer that is its first argument. As before, the returned value is usually ignored.

[2]The initialization syntax shown at (A) copies over the string literal stored in a read-only section of the memory into the array. Therefore, effectively, the declaration shown at (A) is equivalent to

    char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' }; 

[3]Some C and C++ compilers do allow a string literal to be modified through a pointer to which the string literal is assigned. For example, the following will work with some compilers:

      char* str = "hello";
      *str = 'j'; 

But modifying a string literal through a pointer in this manner has undefined behavior in both C and C++, so the code is not portable and may crash at run time. If you must modify the characters of a string literal, first copy it into an array that is stored at a location different from where the string literal itself is stored, as in

      char str[] = "hello";
      str[0] = 'j'; 

Treating string literals as read-only data of type const char* allows for code optimization, such as storing only one copy of each distinct literal.




Programming with Objects: A Comparative Presentation of Object Oriented Programming with C++ and Java
ISBN: 0471268526
EAN: 2147483647
Year: 2005
Pages: 273
Authors: Avinash Kak
