9.6. strings Revisited

We introduced the string type in Section 3.2 (p. 80). Table 9.12 (p. 337) recaps the string operations covered in that section.

Table 9.12. string Operations Introduced in Section 3.2

string s;

Defines a new, empty string named s.

string s(cp);

Defines a new string initialized from the null-terminated C-style string pointed to by cp.

string s(s2);

Defines a new string initialized as a copy of s2.

is >> s;

Reads a whitespace-separated string from the input stream is into s.

os << s;

Writes s to the output stream os.

getline(is, s)

Reads characters up to the first newline from input stream is into s.

s1 + s2

Concatenates s1 and s2, yielding a new string.

s1 += s2

Appends s2 to s1.

Relational Operators

The equality (== and !=) and relational (<, <=, >, and >=) operators can be used to compare strings. string comparison is equivalent to (case-sensitive) dictionary ordering.


In addition to the operations we've already used, strings also support most of the sequential container operations. In some ways, we can think of a string as a container of characters. With some exceptions, strings support the same operations that vectors support. The exceptions are the operations that treat the container like a stack: We cannot use the front, back, and pop_back operations on strings.

Exercises Section 9.5

Exercise 9.33:

Which is the most appropriate (a vector, a deque, or a list) for the following program tasks? Explain the rationale for your choice. If there is no reason to prefer one container over another, explain why not.

  1. Read an unknown number of words from a file for the purpose of generating English language sentences.

  2. Read a fixed number of words, inserting them in the container alphabetically as they are entered. We'll see in the next chapter that associative containers are better suited to this problem.

  3. Read an unknown number of words. Always insert new words at the back. Remove the next value from the front.

  4. Read an unknown number of integers from a file. Sort the numbers and then print them to standard output.


The container operations that string supports are:

  • The typedefs, including the iterator types, listed in Table 9.5 (p. 316).

  • The constructors listed in Table 9.2 (p. 307) except for the constructor that takes a single size parameter.

  • The operations to add elements listed in Table 9.7 (p. 319) that vector supports. Note: Neither vector nor string supports push_front.

  • The size operations in Table 9.8 (p. 324).

  • The subscript and at operations listed in Table 9.9 (p. 325); string does not provide back or front operations listed in that table.

  • The begin and end operations of Table 9.6 (p. 317).

  • The erase and clear operations of Table 9.10 (p. 326); string does not support either pop_back or pop_front.

  • The assignment operations in Table 9.11 (p. 329).

  • Like the elements in a vector, the characters of a string are stored contiguously. Therefore, string supports the capacity and reserve operations described in Section 9.4 (p. 330).

When we say that string supports the container operations, we mean that we could take a program that manipulates a vector and rewrite that same program to operate on strings. For example, we could use iterators to print the characters of a string a line at a time to the standard output:

      string s("Hiya!");
      string::iterator iter = s.begin();
      while (iter != s.end())
          cout << *iter++ << endl; // postfix increment: print old value

Not surprisingly, this code looks almost identical to the code from page 163 that printed the elements of a vector<int>.

In addition to the operations that string shares with the containers, string supports other operations that are specific to strings. We will review these string-specific operations in the remainder of this section. These operations include additional versions of container-related operations as well as other, completely new functions. The additional functions that string provides are covered starting on page 341.

The additional versions of the container operations that string provides are defined to support attributes that are unique to strings and not shared by the containers. For example, several operations permit us to specify arguments that are pointers to character arrays. These operations support the close interaction between library strings and character arrays, whether null-terminated or not. Other versions let us use indices rather than iterators. These versions operate positionally: We specify a starting position, and in some cases a count, to specify the element or range of elements which we want to manipulate.

Exercises Section 9.6

Exercise 9.34:

Use iterators to change the characters in a string to uppercase.

Exercise 9.35:

Use iterators to find and to erase each capital letter from a string.

Exercise 9.36:

Write a program that initializes a string from a vector<char>.

Exercise 9.37:

Given that you want to read a character at a time into a string, and you know that the data you need to read is at least 100 characters long, how might you improve the performance of your program?


The string library defines a great number of functions, which use repeated patterns. Given the number of functions supported, this section can be mind-numbing on first reading.


Readers might want to skim the remainder of Section 9.6. Once you know what kinds of operations are available, you can return for the details when writing programs that need to use a given operation.


9.6.1. Other Ways to Construct strings

The string class supports all but one of the constructors in Table 9.2 (p. 307). The constructor that takes a single size parameter is not supported for string. We can create a string: as the empty string, by providing no argument; as a copy of another string; from a pair of iterators; or from a count and a character:

      string s1;           // s1 is the empty string
      string s2(5, 'a');   // s2 == "aaaaa"
      string s3(s2);       // s3 is a copy of s2
      string s4(s3.begin(),
                s3.begin() + s3.size() / 2); // s4 == "aa"

In addition to these constructors, the string type supports three other ways to create a string. We have already used the constructor that takes a pointer to the first character in a null-terminated, character array. There is another constructor that takes a pointer to an element in a character array and a count of how many characters to copy. Because the constructor takes a count, the array does not have to be null-terminated:

      const char *cp = "Hiya";      // null-terminated array (string literals are const)
      char c_array[] = "World!!!!"; // null-terminated
      char no_null[] = {'H', 'i'};  // not null-terminated
      string s1(cp);             // s1 == "Hiya"
      string s2(c_array, 5);     // s2 == "World"
      string s3(c_array + 5, 4); // s3 == "!!!!"
      string s4(no_null);        // runtime error: no_null not null-terminated
      string s5(no_null, 2);     // ok: s5 == "Hi"

We define s1 using the constructor that takes a pointer to the first character of a null-terminated array. All the characters in that array, up to but not including the terminating null, are copied into the newly created string.

The initializer for s2 uses the second constructor, taking a pointer and a count. In this case, we start at the character denoted by the pointer and copy as many characters as indicated in the second argument. s2, therefore, is a copy of the first five characters from the array c_array. Remember that when we pass an array as an argument, it is automatically converted to a pointer to its first element. Of course, we are not restricted to passing a pointer to the beginning of the array. We initialize s3 to hold four exclamation points by passing a pointer to the first exclamation point in c_array.

The initializers for s4 and s5 are not C-style strings. The definition of s4 is an error. This form of initialization may be called only with a null-terminated array. Passing an array that does not contain a null is a serious error (Section 4.3, p. 130), although it is an error that the compiler cannot detect. What happens at run time is undefined.

The initialization of s5 is fine: That initializer includes a count that says how many characters to copy. As long as the count is within the size of the array, it doesn't matter whether the array is null-terminated.

Table 9.13. Additional Ways to Construct strings

string s(cp, n)

Create s as a copy of n characters from array pointed to by cp.

string s(s2, pos2)

Create s as a copy of characters in the string s2 starting at index pos2. Throws out_of_range if pos2 > s2.size().

string s(s2, pos2, len2)

 

Create s as a copy of len2 characters from s2 starting at index pos2. Throws out_of_range if pos2 > s2.size(). Regardless of the value of len2, copies at most s2.size() - pos2 characters.

Note: n, len2 and pos2 are all unsigned values.


Using a Substring as the Initializer

The other pair of constructors allow us to create a string as a copy of a substring of the characters in another string:

      string s6(s1, 2);    // s6 == "ya"
      string s7(s1, 0, 2); // s7 == "Hi"
      string s8(s1, 0, 8); // s8 == "Hiya"

The first two arguments are the string from which we want to copy and a starting position. In the two-argument version, the newly created string is initialized with the characters from that position to the end of the string argument. We can also provide a third argument that specifies how many characters to copy. In this case, we copy as many characters as indicated (up to the size of the string), starting at the specified position. For example, when we create s7, we copy two characters from s1, starting at position zero. When we create s8, we copy only four characters, not the requested eight. Regardless of how many characters we ask to copy, the library copies up to the size of the string, but not more.
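The clamping behavior described above can be sketched as follows. This is only an illustration: copy_part is a hypothetical helper name, not part of the library.

```cpp
#include <string>

// Hypothetical helper (not a library function): copy at most len characters
// of src starting at index pos, using the three-argument substring constructor.
std::string copy_part(const std::string &src,
                      std::string::size_type pos,
                      std::string::size_type len)
{
    // The library copies no more than src.size() - pos characters,
    // however large len is.
    return std::string(src, pos, len);
}
```

For example, copy_part("Hiya", 0, 8) yields "Hiya" even though eight characters were requested, and a starting position equal to the size of the string yields the empty string.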

9.6.2. Other Ways to Change a string

Many of the container operations that string supports operate in terms of iterators. For example, erase takes an iterator or iterator range to specify which element(s) to remove from the container. Similarly, each version of insert takes as its first argument an iterator that indicates the position before which to insert the values represented by the other arguments. Although string supports these iterator-based operations, it also supplies operations that work in terms of an index. The index is used to indicate the starting element to erase or the position before which to insert the appropriate values. Table 9.14 lists the operations that are common to both string and the containers; Table 9.15 lists the string-only operations.

Table 9.14. string Operations in Common with the Containers

s.insert(p, t)

Insert copy of value t before element referred to by iterator p.

Returns an iterator referring to the inserted element.

s.insert(p, n, t)

Insert n copies of t before p. Returns void.

s.insert(p, b, e)

Insert elements in range denoted by iterators b and e before p.

Returns void.

s.assign(b, e)

Replace s by elements in range denoted by b and e. For string, returns s; for the containers, returns void.

s.assign(n, t)

Replace s by n copies of value t. For string, returns s; for the containers, returns void.

s.erase(p)

Erase element referred to by iterator p.

Returns an iterator to the element after the one deleted.

s.erase(b, e)

Remove elements in range denoted by b and e.

Returns an iterator to the first element after the range deleted.


Table 9.15. string-Specific Versions

s.insert(pos, n, c)

Insert n copies of character c before element at index pos.

s.insert(pos, s2)

Insert copy of string s2 before pos.

s.insert(pos, s2, pos2, len)

 

Insert len characters from s2 starting at pos2 before pos.

s.insert(pos, cp, len)

Insert len characters from array pointed to by cp before pos.

s.insert(pos, cp)

Insert copy of null-terminated string pointed to by cp before pos.

s.assign(s2)

Replace s by a copy of s2.

s.assign(s2, pos2, len)

Replace s by a copy of len characters from s2 starting at index pos2 in s2.

s.assign(cp, len)

Replace s by len characters from array pointed to by cp.

s.assign(cp)

Replace s by null-terminated array pointed to by cp.

s.erase(pos, len)

Erase len characters starting at pos.

Unless noted otherwise, all operations return a reference to s.


Position-Based Arguments

The string-specific versions of these operations take arguments similar to those of the additional constructors covered in the previous section. These operations let us deal with strings positionally and/or let us use arguments that are pointers to character arrays rather than strings.

For example, all containers let us specify a pair of iterators that denote a range of elements to erase. For strings, we can also specify the range by passing a starting position and count of the number of elements to erase. Assuming s is at least five characters long, we could erase the last five characters as follows:

      s.erase(s.size() - 5, 5); // erase last five characters from s 

Similarly, we can insert a given number of values in a container before the element referred to by an iterator. In the case of strings, we can specify the insertion point as an index rather than using an iterator:

      s.insert(s.size(), 5, '!'); // insert five exclamation points at end of s 
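A minimal sketch of the index-based forms follows. The helper names erase_last and pad_end are hypothetical. The guard in erase_last is an addition of this sketch: it covers the case the text sets aside by assuming s is long enough.

```cpp
#include <string>

// Hypothetical helper: remove up to n characters from the end of s.
// The size check avoids computing s.size() - n when s is shorter than n,
// which would wrap around because size_type is unsigned.
void erase_last(std::string &s, std::string::size_type n)
{
    if (n >= s.size())
        s.clear();
    else
        s.erase(s.size() - n, n);   // index-and-count form of erase
}

// Hypothetical helper: add n copies of c at the end of s using the
// index form of insert.
void pad_end(std::string &s, std::string::size_type n, char c)
{
    s.insert(s.size(), n, c);       // inserting at index size() adds at the end
}
```

Starting from "hello world", erase_last(s, 5) leaves "hello " and a following pad_end(s, 3, '!') gives "hello !!!".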

Specifying the New Contents

The characters to insert or assign into the string can be taken from a character array or another string. For example, we can use a null-terminated character array as the value to insert or assign into a string:

      const char *cp = "Stately plump Buck"; // string literals are const
      string s;
      s.assign(cp, 7);            // s == "Stately"
      s.insert(s.size(), cp + 7); // s == "Stately plump Buck"

Similarly, we can insert a copy of one string into another as follows:

      s = "some string";
      string s2 = "some other string";
      // 3 equivalent ways to insert all the characters from s2 at beginning of s
      // insert iterator range before s.begin()
      s.insert(s.begin(), s2.begin(), s2.end());
      // insert copy of s2 before position 0 in s
      s.insert(0, s2);
      // insert s2.size() characters from s2 starting at s2[0] before s[0]
      s.insert(0, s2, 0, s2.size());
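To see that the three forms really are interchangeable, each can be wrapped in a small function and the results compared. The function names here are hypothetical and exist only for this sketch. Each takes its first argument by value so the caller's string is left unchanged.

```cpp
#include <string>

// Iterator-range form: insert [s2.begin(), s2.end()) before s.begin().
std::string prepend_iter(std::string s, const std::string &s2)
{
    s.insert(s.begin(), s2.begin(), s2.end());
    return s;
}

// Index form: insert a copy of s2 before position 0.
std::string prepend_index(std::string s, const std::string &s2)
{
    s.insert(0, s2);
    return s;
}

// Index-and-substring form: insert s2.size() characters of s2,
// starting at s2[0], before position 0.
std::string prepend_range(std::string s, const std::string &s2)
{
    s.insert(0, s2, 0, s2.size());
    return s;
}
```

Called with "some string" and "some other string", all three return "some other stringsome string".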

9.6.3. string-Only Operations

The string type provides several other operations that the containers do not:

  • The substr function that returns a substring of the current string

  • The append and replace functions that modify the string

  • A family of find functions that search the string

The substr Operation

The substr operation lets us retrieve a substring from a given string. We can pass substr a starting position and a count. It creates a new string that has that many characters (up to the end of the string) from the target string, starting at the given position:

      string s("hello world");
      // return substring of 5 characters starting at position 6
      string s2 = s.substr(6, 5);   // s2 = world

Alternatively, we could obtain the same result by writing:

      // return substring from position 6 to the end of s
      string s3 = s.substr(6);      // s3 = world

Table 9.16. Substring Operation

s.substr(pos, n)

Return a string containing n characters from s starting at pos.

s.substr(pos)

Return a string containing characters from pos to the end of s.

s.substr()

Return a copy of s.
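As a small illustration of the substr forms in Table 9.16, the following sketch returns the last n characters of a string. The name tail is hypothetical. The size check is there because a starting position past the end of the string is not valid.

```cpp
#include <string>

// Hypothetical helper: return the last n characters of s, or all of s
// if n is at least s.size().
std::string tail(const std::string &s, std::string::size_type n)
{
    if (n >= s.size())
        return s.substr();          // no arguments: a copy of the whole string
    return s.substr(s.size() - n);  // one argument: from that position to the end
}
```

With this sketch, tail("hello world", 5) produces "world", the same value as s.substr(6, 5) and s.substr(6) in the examples above.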


The append and replace Functions

There are six overloaded versions of append and ten versions of replace. The append and replace functions are overloaded using the same set of arguments, which are listed in Table 9.18. These arguments specify the characters to add to the string. In the case of append, the characters are added at the end of the string. In the case of replace, these characters are inserted in place of a specified range of existing characters in the string.

The append operation is a shorthand way of inserting at the end:

      string s("C++ Primer");        // initialize s to "C++ Primer"
      s.append(" 3rd Ed.");          // s == "C++ Primer 3rd Ed."
      // equivalent to s.append(" 3rd Ed.")
      s.insert(s.size(), " 3rd Ed.");

The replace operations remove an indicated range of characters and insert a new set of characters in their place. The replace operations have the same effect as calling erase and insert.

The ten different versions of replace differ from each other in how we specify the characters to remove and in how we specify the characters to insert in their place. The first two arguments specify the range of elements to remove. We can specify the range either with an iterator pair or an index and a count. The remaining arguments specify what new characters to insert.

We can think of replace as a shorthand way of erasing some characters and inserting others in their place:

Table 9.17. Operations to Modify strings (args defined in Table 9.18)

s.append( args)

Append args to s. Returns reference to s.

s.replace(pos, len, args)

Remove len characters from s starting at pos and replace them by characters formed by args. Returns reference to s.

This version does not take args equal to b2, e2.

s.replace(b, e, args)

Remove characters in the range denoted by iterators b and e and replace them by args. Returns reference to s.

This version does not take args equal to s2, pos2, len2.


      // starting at position 11, erase 3 characters and then insert "4th"
      s.replace(11, 3, "4th");          // s == "C++ Primer 4th Ed."
      // equivalent way to replace "3rd" by "4th"
      s.erase(11, 3);                   // s == "C++ Primer  Ed." (two spaces remain)
      s.insert(11, "4th");              // s == "C++ Primer 4th Ed."

There is no requirement that the size of the text removed and inserted be the same.



In the previous call to replace, the text we inserted happens to be the same size as the text we removed. We could insert a larger or smaller string:

      s.replace(11, 3, "Fourth"); // s == "C++ Primer Fourth Ed." 

In this call we remove three characters but insert six in their place.
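The relationship between replace and the erase/insert pair can be checked with a short sketch. Both function names are hypothetical, chosen only for this illustration.

```cpp
#include <string>

// Hypothetical helper: replace len characters of s starting at pos with
// text, written as the erase-then-insert pair that replace abbreviates.
std::string replace_by_hand(std::string s, std::string::size_type pos,
                            std::string::size_type len,
                            const std::string &text)
{
    s.erase(pos, len);
    s.insert(pos, text);
    return s;
}

// The same edit expressed as a single call to replace.
std::string replace_direct(std::string s, std::string::size_type pos,
                           std::string::size_type len,
                           const std::string &text)
{
    s.replace(pos, len, text);
    return s;
}
```

Given "C++ Primer 3rd Ed.", both functions turn the three characters at position 11 into "Fourth", and the result grows by three characters.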

Table 9.18. Arguments to append and replace

s2

The string s2.

s2, pos2, len2

Up to len2 characters from s2 starting at pos2.

cp

Null-terminated array pointed to by pointer cp.

cp, len2

Up to len2 characters from character array pointed to by cp.

n, c

n copies of character c.

b2, e2

Characters in the range formed by iterators b2 and e2.


9.6.4. string Search Operations

The string class provides six search functions, each named as a variant of find. The operations all return a string::size_type value that is the index of where the match occurred, or a special value named string::npos if there is no match. The string class defines npos as a value that is guaranteed to be greater than any valid index.

There are four versions of each of the search operations, each of which takes a different set of arguments. The arguments to the search operations are listed in Table 9.20. Basically, these operations differ as to whether they are looking for a single character, another string, a C-style, null-terminated string, or a given number of characters from a character array.

Table 9.19. string Search Operations (Arguments in Table 9.20)

s.find( args)

Find first occurrence of args in s.

s.rfind( args)

Find last occurrence of args in s.

s.find_first_of( args)

Find first occurrence of any character from args in s.

s.find_last_of( args)

Find last occurrence of any character from args in s.

s.find_first_not_of( args)

Find first character in s that is not in args.

s.find_last_not_of( args)

Find last character in s that is not in args.


Table 9.20. Arguments to string find Operations

c, pos

Look for the character c starting at position pos in s. pos defaults to 0.

s2, pos

Look for the string s2 starting at position pos in s. pos defaults to 0.

cp, pos

Look for the C-style null-terminated string pointed to by the pointer cp.

Start looking at position pos in s. pos defaults to 0.

cp, pos, n

Look for the first n characters in the array pointed to by the pointer cp.

Start looking at position pos in s. No default for pos or n.


Finding an Exact Match

The simplest of the search operations is the find function. It looks for its argument and returns the index of the first match that is found, or npos if there is no match:

      string name("AnnaBelle");
      string::size_type pos1 = name.find("Anna"); // pos1 == 0

The call returns 0, the index at which the substring "Anna" is found in "AnnaBelle".

By default, the find operations (and other string operations that deal with characters) use the built-in operators to compare characters in the string. As a result, these operations are case sensitive.



When we look for a value in the string, case matters:

      string lowercase("annabelle");
      pos1 = lowercase.find("Anna"); // pos1 == npos

This code sets pos1 to npos: the string Anna does not match anna.

The find operations return a string::size_type. Use an object of that type to hold the return from find.
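One way to apply this advice is sketched below. The function name contains is hypothetical. The point is only that the result of find is held in a string::size_type and tested against npos.

```cpp
#include <string>

// Hypothetical helper: report whether text occurs anywhere in s.
bool contains(const std::string &s, const std::string &text)
{
    std::string::size_type pos = s.find(text);  // hold the result in size_type
    return pos != std::string::npos;            // npos signals "no match"
}
```

Because the search is case sensitive, contains("AnnaBelle", "Anna") is true while contains("annabelle", "Anna") is false.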



Find Any Character

A slightly more complicated problem would be if we wanted to match any character in our search string. For example, the following locates the first digit within name:

      string numerics("0123456789");
      string name("r2d2");
      string::size_type pos = name.find_first_of(numerics);
      cout << "found number at index: " << pos
           << " element is "  << name[pos] << endl;

In this example, pos is set to a value of 1 (the elements of a string, remember, are indexed beginning at 0).

Specifying Where to Start the Search

We can pass an optional starting position to the find operations. This optional argument indicates the index position from which to start the search. By default, that position is set to zero. One common programming pattern uses this optional argument to loop through a string finding all occurrences. We could rewrite our search of "r2d2" to find all the numbers in name:

      string::size_type pos = 0;
      // each trip reset pos to the next instance in name
      while ((pos = name.find_first_of(numerics, pos))
                    != string::npos) {
          cout << "found number at index: " << pos
               << " element is " << name[pos] << endl;
          ++pos; // move to the next character
      }

In this case, we initialize pos to zero so that on the first trip through the while, name is searched beginning at position 0. The condition in the while resets pos to the index of the first number encountered, starting from the current value of pos. As long as the return from find_first_of is a valid index, we print our result and increment pos.

Had we neglected to increment pos at the end of this loop, it would never terminate. To see why, consider what would happen if we didn't. On the second trip through the loop, we would start looking at the character indexed by pos. That character would be a number, so find_first_of would (repeatedly) return pos!

It is essential that we increment pos. Doing so ensures that we start looking for the next number at a point after the number we just found.
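The same loop can be packaged so that it collects the positions instead of printing them. This is a sketch: find_all_of is a hypothetical name, and returning a vector of indices is a choice made here for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect the index of every character in s that
// also appears in chars.
std::vector<std::string::size_type>
find_all_of(const std::string &s, const std::string &chars)
{
    std::vector<std::string::size_type> hits;
    std::string::size_type pos = 0;
    while ((pos = s.find_first_of(chars, pos)) != std::string::npos) {
        hits.push_back(pos);
        ++pos;   // resume the search just past the match
    }
    return hits;
}
```

For "r2d2" and the digits "0123456789", the result holds the indices 1 and 3. Once pos is incremented past the last character, find_first_of returns npos and the loop ends.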



Looking for a Nonmatch

Instead of looking for a match, we might call find_first_not_of to find the first position that is not in the search argument. For example, to find the first non-numeric character of a string, we can write

      string numbers("0123456789");
      string dept("03714p3");
      // returns 5, which is the index to the character 'p'
      string::size_type pos = dept.find_first_not_of(numbers);
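A typical use of find_first_not_of is to peel off a leading run of characters. The sketch below combines it with substr. The name leading_digits is hypothetical.

```cpp
#include <string>

// Hypothetical helper: return the leading run of digits in s.
std::string leading_digits(const std::string &s)
{
    std::string::size_type pos = s.find_first_not_of("0123456789");
    // If every character is a digit (or s is empty), pos is npos and
    // substr copies through the end of s.
    return s.substr(0, pos);
}
```

Applied to "03714p3", this returns "03714", the five characters before the 'p' at index 5.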

Searching Backward

Each of the find operations that we've seen so far executes left to right. The library provides an analogous set of operations that look through the string from right to left. The rfind member searches for the last (that is, rightmost) occurrence of the indicated substring:

      string river("Mississippi");
      string::size_type first_pos = river.find("is"); // returns 1
      string::size_type last_pos = river.rfind("is"); // returns 4

find returns an index of 1, indicating the start of the first "is", while rfind returns an index of 4, indicating the start of the last occurrence of "is".

The find_last Functions

The find_last functions operate like the corresponding find_first functions, except that they return the last match rather than the first:

  • find_last_of searches for the last character that matches any element of the search string.

  • find_last_not_of searches for the last character that does not match any element of the search string.

Each of these operations takes an optional second argument indicating the position within the string to begin searching.
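As an illustration of the two find_last functions, the following sketch trims trailing whitespace and locates the last digit in a string. Both helper names are hypothetical, and the set of whitespace characters is an assumption of this example.

```cpp
#include <string>

// Hypothetical helper: remove trailing spaces, tabs, and newlines from s.
std::string rtrim(const std::string &s)
{
    std::string::size_type last = s.find_last_not_of(" \t\n");
    if (last == std::string::npos)
        return std::string();        // s is empty or entirely whitespace
    return s.substr(0, last + 1);    // keep everything through the last non-space
}

// Hypothetical helper: index of the last digit in s, or npos if there is none.
std::string::size_type last_digit(const std::string &s)
{
    return s.find_last_of("0123456789");
}
```

For the string "r2d2", last_digit returns 3, whereas find_first_of on the same arguments would return 1.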

9.6.5. Comparing strings

As we saw in Section 3.2.3 (p. 85), the string type defines all the relational operators so that we can compare two strings for equality (==), inequality (!=), and the less- or greater-than operations (<, <=, >, >=). Comparison between strings is lexicographical; that is, string comparison is the same as a case-sensitive, dictionary ordering:

      string cobol_program_crash("abend");
      string cplus_program_crash("abort");

Exercises Section 9.6.4

Exercise 9.38:

Write a program that, given the string

      "ab2c3d7R4E6" 

finds each numeric character and then each alphabetic character. Write two versions of the program. The first should use find_first_of, and the second find_first_not_of.

Exercise 9.39:

Write a program that, given the strings

      string line1 = "We were her pride of 10 she named us:";
      string line2 = "Benjamin, Phoenix, the Prodigal";
      string line3 = "and perspicacious pacific Suzanne";
      string sentence = line1 + ' ' + line2 + ' ' + line3;

counts the number of words in sentence and identifies the largest and smallest words. If several words have the largest or smallest length, report all of them.


Here cobol_program_crash is less than cplus_program_crash. The relational operators compare two strings character by character until reaching a position where the two strings differ. The overall comparison of the strings depends on the comparison between these unequal characters. In this case, the first unequal characters are 'e' and 'o'. The letter 'e' occurs before (is less than) 'o' in the English alphabet and so "abend" is less than "abort". If the strings are of different length, and the shorter string matches the beginning of the longer, then the shorter string is less than the longer.

The compare Functions

In addition to the relational operators, string provides a set of compare operations that perform lexicographical comparisons. The results of these operations are similar to the C library strcmp function (Section 4.3, p. 132). Given

      s1.compare (args); 

compare returns one of three possible values:

  1. A positive value if s1 is greater than the string represented by args

  2. A negative value if s1 is less than the string represented by args

  3. 0 if s1 is equal to the string represented by args

For example

      // returns a negative value
      cobol_program_crash.compare(cplus_program_crash);
      // returns a positive value
      cplus_program_crash.compare(cobol_program_crash);
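Only the sign of the value returned by compare is specified, so code that needs a fixed result has to normalize it. A minimal sketch, using the hypothetical name sign_of_compare:

```cpp
#include <string>

// Hypothetical helper: reduce the result of compare to -1, 0, or 1.
// Only the sign of compare's return value is meaningful.
int sign_of_compare(const std::string &a, const std::string &b)
{
    int r = a.compare(b);
    return (r > 0) - (r < 0);   // each comparison yields 0 or 1
}
```

With the strings above, sign_of_compare("abend", "abort") is -1 and the reverse call is 1.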

Table 9.21. string compare Operations

s.compare(s2)

Compare s to s2.

s.compare(pos1, n1, s2)

 

Compares n1 characters starting at pos1 from s to s2.

s.compare(pos1, n1, s2, pos2, n2)

 

Compares n1 characters starting at pos1 from s to the n2 characters starting at pos2 in s2.

s.compare(cp)

Compares s to the null-terminated string pointed to by cp.

s.compare(pos1, n1, cp)

 

Compares n1 characters starting at pos1 from s to cp.

s.compare(pos1, n1, cp, n2)

 

Compares n1 characters starting at pos1 from s to n2 characters starting from the pointer cp.


The overloaded set of six compare operations allows us to compare a substring of either one or both strings. They also let us compare a string to a character array or portion thereof:

      char second_ed[] = "C++ Primer, 2nd Edition";
      string third_ed("C++ Primer, 3rd Edition");
      string fourth_ed("C++ Primer, 4th Edition");
      // compares C++ library string to C-style string
      fourth_ed.compare(second_ed); // ok, second_ed is null-terminated
      // compare substrings of fourth_ed and third_ed
      fourth_ed.compare(fourth_ed.find("4th"), 3,
                        third_ed, third_ed.find("3rd"), 3);

The second call to compare is the most interesting. This call uses the version of compare that takes five arguments. We use find to locate the position of the beginning of the substring "4th". We compare three characters starting at that position to a substring from third_ed. That substring begins at the position returned from find when looking for "3rd" and again we compare three characters. Essentially, this call compares "4th" to "3rd".
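The three-argument form from Table 9.21 is handy for prefix tests. The sketch below is one possible use. The free function starts_with is a hypothetical helper written for this example.

```cpp
#include <string>

// Hypothetical helper: report whether s begins with prefix, using the
// three-argument form of compare.
bool starts_with(const std::string &s, const std::string &prefix)
{
    // compare uses at most the characters available in s, so a prefix
    // longer than s simply compares unequal.
    return s.compare(0, prefix.size(), prefix) == 0;
}
```

Both edition strings above start with "C++ Primer", so starts_with returns true for each, while comparing the three characters at index 12 of fourth_ed and third_ed ("4th" against "3rd") yields a positive value.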


