Using Regular Expressions to Split a String

Problem

You want to split a string into tokens, but you require more sophisticated searching or flexibility than Recipe 4.7 provides. For example, you may want delimiters that are more than one character long or that can take on many different forms. Handling these cases by hand often results in long, intricate code that confuses consumers of your class or function.

Solution

Use the Boost.Regex library. Its regex class enables the use of regular expressions on string and text data. Example 4-33 shows how to use regex to split strings.

Example 4-33. Using Boost's regular expressions

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main( ) {

   std::string s = "who,lives:in-a,pineapple under the sea?";

   boost::regex re(",|:|-|\\s+");        // Create the reg exp
   boost::sregex_token_iterator          // Create an iterator using a
      p(s.begin( ), s.end( ), re, -1);   // sequence and that reg exp
   boost::sregex_token_iterator end;     // Create an end-of-reg-exp
                                         // marker
   while (p != end)
      std::cout << *p++ << '\n';         // Print one token per line
}

Discussion

Example 4-33 shows how to use regex to split a string by iterating over the text that lies between matches of a regular expression. The following line sets up the regular expression:

boost::regex re(",|:|-|\\s+");

Essentially, it says that each match of the regular expression is a comma, a colon, a dash, or a run of one or more whitespace characters. The pipe character is the alternation operator, which ORs the delimiters together. Because the pattern is written as a C++ string literal, the backslash in \s must be doubled. The next two lines set up the iterator:

boost::sregex_token_iterator
 p(s.begin( ), s.end( ), re, -1);
boost::sregex_token_iterator end;

The iterator p is constructed from the input string's range and the regular expression. The final argument, -1, tells the iterator to return the text between matches (the tokens) rather than the matches themselves. Once p has been built, you can treat it like an iterator on a standard library sequence. An sregex_token_iterator constructed with no arguments is a special value that represents the end of a token sequence, so you can compare against it to know when you have reached the end.
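
To make the role of the last constructor argument concrete, here is a minimal sketch that gathers the iterator's output into a vector. It is not part of Example 4-33: it assumes a C++11 compiler and swaps in std::regex and std::sregex_token_iterator, the standard library counterparts of the Boost classes, so that it builds without Boost. The helper name collect is hypothetical, introduced only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect everything the token iterator yields.
// submatch == -1 selects the text between matches (the tokens);
// submatch ==  0 selects the matches themselves (the delimiters).
std::vector<std::string> collect(const std::string& text,
                                 const std::regex& re,
                                 int submatch) {
    std::vector<std::string> out;
    std::sregex_token_iterator p(text.begin(), text.end(), re, submatch);
    std::sregex_token_iterator end;  // default-constructed: end of sequence
    for (; p != end; ++p)
        out.push_back(p->str());     // copy the current sub_match as a string
    return out;
}
```

With the string and pattern from Example 4-33, calling collect with -1 should return the eight tokens from "who" through "sea?", while calling it with 0 should return the seven delimiters that separated them. Returning a vector instead of printing is a design choice for this sketch; it lets the caller decide what to do with the tokens.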
