3.2. Library `string` Type

The string type supports variable-length character strings. The library takes care of managing the memory associated with storing the characters and provides various useful operations. The library string type is intended to be efficient enough for general use.

As with any library type, programs that use strings must first include the associated header. Our programs will be shorter if we also provide an appropriate using declaration:

      #include <string>
      using std::string;

3.2.1. Defining and Initializing `string`s

The string library provides several constructors (Section 2.3.3, p. 49). A constructor is a special member function that defines how objects of that type can be initialized. Table 3.1 lists the most commonly used string constructors. The default constructor (Section 2.3.4, p. 52) is used "by default" when no initializer is specified.

Table 3.1. Ways to Initialize a `string`
`string s1;`	Default constructor; `s1` is the empty string
`string s2(s1);`	Initialize `s2` as a copy of `s1`
`string s3("value");`	Initialize `s3` as a copy of the string literal
`string s4(n, 'c');`	Initialize `s4` with `n` copies of the character `'c'`

Caution: Library `string` Type and String Literals

For historical reasons, and for compatibility with C, character string literals are not the same type as the standard library string type. This fact can cause confusion and is important to keep in mind when using a string literal or the string data type.

Exercises Section 3.2.1

Exercise 3.2:
What is a default constructor?

Exercise 3.3:
Name the three ways to initialize a string.

Exercise 3.4:
What are the values of s and s2?
      string s;
      int main() {
          string s2;
      }

3.2.2. Reading and Writing `string`s

As we saw in Chapter 1, we use the iostream library to read and write values of built-in types such as int, double, and so on. Similarly, we can use the iostream and string libraries to allow us to read and write strings using the standard input and output operators:

      // Note: #include and using declarations must be added to compile this code
      int main()
      {
          string s;          // empty string
          cin >> s;          // read whitespace-separated string into s
          cout << s << endl; // write s to the output
          return 0;
      }

This program begins by defining a string named s. The next line,

      cin >> s;        // read whitespace-separated string into s

reads the standard input, storing what is read into s. The string input operator:

Reads and discards any leading whitespace (e.g., spaces, newlines, tabs)
Then reads characters until the next whitespace character is encountered

So, if the input to this program is "   Hello World!   " (note the leading and trailing spaces), then the output will be "Hello" with no extra spaces.

The input and output operations behave similarly to the operators on the builtin types. In particular, the operators return their left-hand operand as their result. Thus, we can chain together multiple reads or writes:

      string s1, s2;
      cin >> s1 >> s2; // read first input into s1, second into s2
      cout << s1 << s2 << endl; // write both strings

If we give this version of the program the same input as in the previous paragraph, our output would be

      HelloWorld!

To compile this program, you must add #include directives for both the iostream and string libraries and must issue using declarations for all the names used from the library: string, cin, cout, and endl.

The programs presented from this point on will assume that the needed #include and using declarations have been made.

Reading an Unknown Number of `string`s

Like the input operators that read built-in types, the string input operator returns the stream from which it read. Therefore, we can use a string input operation as a condition, just as we did when reading ints in the program on page 18. The following program reads a set of strings from the standard input and writes what it has read, one string per line, to the standard output:

      int main()
      {
          string word;
          // read until end-of-file, writing each word to a new line
          while (cin >> word)
              cout << word << endl;
          return 0;
      }

In this case, we read into a string using the input operator. That operator returns the istream from which it read, and the while condition tests the stream after the read completes. If the stream is valid (it hasn't hit end-of-file or encountered an invalid input), then the body of the while is executed and the value we read is printed to the standard output. Once we hit end-of-file, we fall out of the while.

Using `getline` to Read an Entire Line

There is an additional useful string IO operation: getline. This is a function that takes both an input stream and a string. The getline function reads the next line of input from the stream and stores what it read, not including the newline, in its string argument. Unlike the input operator, getline does not ignore leading newlines. Whenever getline encounters a newline, even if it is the first character in the input, it stops reading the input and returns. The effect of encountering a newline as the first character in the input is that the string argument is set to the empty string.

The getline function returns its istream argument so that, like the input operator, it can be used as a condition. For example, we could rewrite the previous program that wrote one word per line to write a line at a time instead:

      int main()
      {
          string line;
          // read line at time until end-of-file
          while (getline(cin, line))
              cout << line << endl;
          return 0;
      }

Because line does not contain a newline, we must write our own if we want the strings written one to a line. As usual, we use endl to write a newline and flush the output buffer.

The newline that causes getline to return is discarded; it does not get stored in the string.

Exercises Section 3.2.2

Exercise 3.5:
Write a program to read the standard input a line at a time. Modify your program to read a word at a time.

Exercise 3.6:
Explain how whitespace characters are handled in the string input operator and in the getline function.

3.2.3. Operations on `string`s

Table 3.2 lists the most commonly used string operations.

Table 3.2. `string` Operations
`s.empty()`	Returns `true` if `s` is empty; otherwise returns `false`
`s.size()`	Returns number of characters in `s`
`s[n]`	Returns the character at position `n` in `s`; positions start at 0.
`s1 + s2`	Returns a `string` equal to the concatenation of `s1` and `s2`
`s1 = s2`	Replaces characters in `s1` by a copy of `s2`
`s1 == s2`	Returns `true` if `s1` and `s2` are equal; `false` otherwise
`!=`, `<`, `<=`, `>`, and `>=`	Have their normal meanings

The `string` `size` and `empty` Operations

The length of a string is the number of characters in the string. It is returned by the size operation:

      int main()
      {
          string st("The expense of spirit\n");
          cout << "The size of " << st << "is " << st.size()
               << " characters, including the newline" << endl;
          return 0;
      }

If we compile and execute this program it yields

      The size of The expense of spirit
      is 22 characters, including the newline

Often it is useful to know whether a string is empty. One way we could do so would be to compare size with 0:

      if (st.size() == 0)           // ok: empty

In this case, we don't really need to know how many characters are in the string; we are only interested in whether the size is zero. We can more directly answer this question by using the empty member:

      if (st.empty())           // ok: empty

The empty function returns the bool (Section 2.1, p. 34) value true if the string contains no characters; otherwise, it returns false.

`string::size_type`

It might be logical to expect that size returns an int, or, thinking back to the note on page 38, an unsigned. Instead, the size operation returns a value of type string::size_type. This type requires a bit of explanation.

The string class, like many other library types, defines several companion types. These companion types make it possible to use the library types in a machine-independent manner. The type size_type is one of these companion types. It is defined as a synonym for an unsigned type (typically unsigned int or unsigned long) that is guaranteed to be big enough to hold the size of any string. To use the size_type defined by string, we use the scope operator to say that the name size_type is defined in the string class.

Any variable used to store the result from the string size operation ought to be of type string::size_type. It is particularly important not to assign the return from size to an int.

Although we don't know the precise type of string::size_type, we do know that it is an unsigned type (Section 2.1.1, p. 34). We also know that for a given type, the unsigned version can hold a positive value twice as large as the corresponding signed type can hold. This fact implies that the largest string could be twice as large as the size an int can hold.

Another problem with using an int is that on some machines the size of an int is too small to hold the size of even plausibly large strings. For example, if a machine has 16-bit ints, then the largest string an int could represent would have 32,767 characters. A string that held the contents of a file could easily exceed this size. The safest way to hold the size of a string is to use the type the library defines for this purpose, which is string::size_type.

The `string` Relational Operators

The string class defines several operators that compare two string values. Each of these operators works by comparing the characters from each string.

string comparisons are case-sensitive: the upper- and lowercase versions of a letter are different characters. On most computers, the uppercase letters come first: Every uppercase letter is less than any lowercase letter.

The equality operator compares two strings, returning true if they are equal. Two strings are equal if they are the same length and contain the same characters. The library also defines != to test whether two strings are unequal.

The relational operators <, <=, >, >= test whether one string is less than, less than or equal, greater than, or greater than or equal to another:

      string big = "big", small = "small";
      string s1 = big;    // s1 is a copy of big
      if (big == small)   // false
          // ...
      if (big <= s1)      // true, they're equal, so big is less than or equal to s1
          // ...

The relational operators compare strings using the same strategy as in a (case-sensitive) dictionary:

If two strings have different lengths and if every character in the shorter string is equal to the corresponding character of the longer string, then the shorter string is less than the longer one.
If the characters in two strings differ, then we compare them by comparing the first character at which the strings differ.

As an example, given the strings

      string substr = "Hello";
      string phrase = "Hello World";
      string slang  = "Hiya";

then substr is less than phrase, and slang is greater than either substr or phrase.

Assignment for `string`s

In general the library types strive to make it as easy to use a library type as it is to use a built-in type. To this end, most of the library types support assignment. In the case of strings, we can assign one string object to another:

      // st1 is an empty string, st2 is a copy of the literal
      string st1, st2 = "The expense of spirit";
      st1 = st2; // replace st1 by a copy of st2

After the assignment, st1 contains a copy of the characters in st2.

Most string library implementations go to some trouble to provide efficient implementations of operations such as assignment, but it is worth noting that conceptually, assignment requires a fair bit of work. It must delete the storage containing the characters associated with st1, allocate the storage needed to contain a copy of the characters associated with st2, and then copy those characters from st2 into this new storage.

Adding Two `string`s

Addition on strings is defined as concatenation. That is, it is possible to concatenate two or more strings through the use of either the plus operator (+) or the compound assignment operator (+=) (Section 1.4.1, p. 13). Given the two strings

      string s1("hello, ");
      string s2("world\n");

we can concatenate the two strings to create a third string as follows:

      string s3 = s1 + s2;   // s3 is hello, world\n

If we wanted to append s2 to s1 directly, then we would use +=:

      s1 += s2;   // equivalent to s1 = s1 + s2

Adding Character String Literals and `string`s

The strings s1 and s2 included punctuation directly. We could achieve the same result by mixing string objects and string literals as follows:

      string s1("hello");
      string s2("world");
      string s3 = s1 + ", " + s2 + "\n";

When mixing strings and string literals, at least one operand to each + operator must be of string type:

      string s1 = "hello";   // no punctuation
      string s2 = "world";
      string s3 = s1 + ", ";           // ok: adding a string and a literal
      string s4 = "hello" + ", ";      // error: no string operand
      string s5 = s1 + ", " + "world"; // ok: each + has string operand
      string s6 = "hello" + ", " + s2; // error: can't add string literals

The initializations of s3 and s4 involve only a single operation. In these cases, it is easy to determine that the initialization of s3 is legal: We initialize s3 by adding a string and a string literal. The initialization of s4 attempts to add two string literals and is illegal.

The initialization of s5 may appear surprising, but it works in much the same way as when we chain together input or output expressions (Section 1.2, p. 5). In this case, the string library defines addition to return a string. Thus, when we initialize s5, the subexpression s1 + ", " returns a string, which can be concatenated with the literal "world". It is as if we had written

      string tmp = s1 + ", "; // ok: + has a string operand
      s5 = tmp + "world";     // ok: + has a string operand

On the other hand, the initialization of s6 is illegal. Looking at each subexpression in turn, we see that the first subexpression adds two string literals. There is no way to do so, and so the statement is in error.

Fetching a Character from a `string`

The string type uses the subscript ([ ]) operator to access the individual characters in the string. The subscript operator takes a size_type value that denotes the character position we wish to fetch. The value in the subscript is often called "the subscript" or "an index."

Subscripts for strings start at zero; if s is a string, then if s isn't empty, s[0] is the first character in the string, s[1] is the second if there is one, and the last character is in s[s.size() - 1].

It is an error to use an index outside this range.

We could use the subscript operator to print each character in a string on a separate line:

      string str("some string");
      for (string::size_type ix = 0; ix != str.size(); ++ix)
          cout << str[ix] << endl;

On each trip through the loop we fetch the next character from str, printing it followed by a newline.

Subscripting Yields an Lvalue

Recall that a variable is an lvalue (Section 2.3.1, p. 45), and that the left-hand side of an assignment must be an lvalue. Like a variable, the value returned by the subscript operator is an lvalue. Hence, a subscript can be used on either side of an assignment. The following loop sets each character in str to an asterisk:

      for (string::size_type ix = 0; ix != str.size(); ++ix)
          str[ix] = '*';

Computing Subscript Values

Any expression that results in an integral value can be used as the index to the subscript operator. For example, assuming someval and someotherval are integral objects, we could write

      str[someotherval * someval] = someval;

Although any integral type can be used as an index, the actual type of the index is string::size_type, which is an unsigned type.

The same reasons to use string::size_type as the type for a variable that holds the return from size apply when defining a variable to serve as an index. A variable used to index a string should have type string::size_type.

When we subscript a string, we are responsible for ensuring that the index is "in range." By in range, we mean that the index is a number that, when assigned to a size_type, is a value in the range from 0 through the size of the string minus one. By using a string::size_type or another unsigned type as the index, we ensure that the subscript cannot be less than zero. As long as our index is an unsigned type, we need only check that it is less than the size of the string.

The library is not required to check the value of the index. Using an index that is out of range is undefined and usually results in a serious run-time error.

3.2.4. Dealing with the Characters of a `string`

Often we want to process the individual characters of a string. For example, we might want to know if a particular character is a whitespace character or whether the character is alphabetic or numeric. Table 3.3 lists the functions that can be used on the characters in a string (or on any other char value). These functions are defined in the cctype header.

Table 3.3. `cctype` Functions
`isalnum(c)`	`true` if `c` is a letter or a digit.
`isalpha(c)`	`true` if `c` is a letter.
`iscntrl(c)`	`true` if `c` is a control character.
`isdigit(c)`	`true` if `c` is a digit.
`isgraph(c)`	`true` if `c` is not a space but is printable.
`islower(c)`	`true` if `c` is a lowercase letter.
`isprint(c)`	`true` if `c` is a printable character.
`ispunct(c)`	`true` if `c` is a punctuation character.
`isspace(c)`	`true` if `c` is whitespace.
`isupper(c)`	`true` if `c` is an uppercase letter.
`isxdigit(c)`	`true` if `c` is a hexadecimal digit.
`tolower(c)`	If `c` is an uppercase letter, returns its lowercase equivalent; otherwise returns `c` unchanged.
`toupper(c)`	If `c` is a lowercase letter, returns its uppercase equivalent; otherwise returns `c` unchanged.

These functions mostly test the given character and return an int, which acts as a truth value. Each function returns zero if the test fails; otherwise, it returns a (meaningless) nonzero value indicating that the character is of the requested kind.

For these functions, a printable character is a character with a visible representation; whitespace is one of space, tab, vertical tab, return, newline, and formfeed; and punctuation is a printable character that is not a digit, a letter, or (printable) whitespace character such as space.

As an example, we could use these functions to print the number of punctuation characters in a given string:

      string s("Hello World!!!");
      string::size_type punct_cnt = 0;
      // count number of punctuation characters in s
      for (string::size_type index = 0; index != s.size(); ++index)
          if (ispunct(s[index]))
              ++punct_cnt;
      cout << punct_cnt
           << " punctuation characters in " << s << endl;

The output of this program is

      3 punctuation characters in Hello World!!!

Rather than returning a truth value, the tolower and toupper functions return a character: either the argument unchanged or the lower- or uppercase version of the character. We could use tolower to change s to lowercase as follows:

      // convert s to lowercase
      for (string::size_type index = 0; index != s.size(); ++index)
          s[index] = tolower(s[index]);
      cout << s << endl;

which generates

      hello world!!!

Advice: Use the C++ Versions of C Library Headers

In addition to facilities defined specifically for C++, the C++ library incorporates the C library. The cctype header makes available the C library functions defined in the C header file named ctype.h.

The standard C header names use the form name.h. The C++ versions of these headers are named cname: the C++ versions remove the .h suffix and precede the name by the letter c. The c indicates that the header originally comes from the C library. Hence, cctype has the same contents as ctype.h, but in a form that is appropriate for C++ programs. In particular, the names defined in the cname headers are defined inside the std namespace, whereas those defined in the .h versions are not.

Ordinarily, C++ programs should use the cname versions of headers and not the name.h versions. That way names from the standard library are consistently found in the std namespace. Using the .h headers puts the burden on the programmer to remember which library names are inherited from C and which are unique to C++.

Exercises Section 3.2.4

Exercise 3.7:
Write a program to read two strings and report whether the strings are equal. If not, report which of the two is the larger. Now, change the program to report whether the strings have the same length and if not report which is longer.

Exercise 3.8:
Write a program to read strings from the standard input, concatenating what is read into one large string. Print the concatenated string. Next, change the program to separate adjacent input strings by a space.

Exercise 3.9:
What does the following program do? Is it valid? If not, why not?
      string s;
      cout << s[0] << endl;

Exercise 3.10:
Write a program to strip the punctuation from a string. The input to the program should be a string of characters including punctuation; the output should be a string in which the punctuation is removed.
