Problem
You want to split a string into tokens, but you require more sophisticated searching or flexibility than Recipe 4.7 provides. For example, you may want tokens that are more than one character or can take on many different forms. Handling such cases by hand often results in convoluted code and confuses consumers of your class or function.
Solution
Use Boost's regex class template. regex enables the use of regular expressions on string and text data. Example 4-33 shows how to use regex to split strings.
Example 4-33. Using Boost's regular expressions
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main( ) {

   std::string s = "who,lives:in-a,pineapple under the sea?";

   boost::regex re(",|:|-|\\s+");        // Create the reg exp
   boost::sregex_token_iterator         // Create an iterator using a
     p(s.begin( ), s.end( ), re, -1);   // sequence and that reg exp
   boost::sregex_token_iterator end;    // Create an end-of-reg-exp
                                        // marker
   while (p != end)
      std::cout << *p++ << ' ';
}
Discussion
Example 4-33 shows how to use regex to iterate over the tokens of a string that are separated by matches of a regular expression. The following line sets up the regular expression:
boost::regex re(",|:|-|\\s+");
What it says, essentially, is that each match of the regular expression is either a comma, or a colon, or a dash, or one or more whitespace characters. The pipe character is the alternation operator that ORs the delimiters together. The next two lines set up the iterator:
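To see what the delimiter pattern actually matches, it can help to list the matches themselves. The following is a minimal sketch, not part of the original example: it swaps in std::regex from the C++11 standard library (whose interface is modeled on Boost.Regex) so that it builds without Boost, and find_delimiters is a hypothetical helper name. The backslash in \s is doubled because it appears inside a C++ string literal.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the recipe): collect every substring
// that the delimiter pattern matches, in order of appearance.
std::vector<std::string> find_delimiters(const std::string& text) {
    // Same alternation as in the recipe: comma, colon, dash, or a run
    // of whitespace. "\\s" in the literal becomes \s in the pattern.
    static const std::regex re(",|:|-|\\s+");

    std::vector<std::string> found;
    std::sregex_iterator it(text.begin(), text.end(), re);
    std::sregex_iterator end;  // default-constructed: end of matches
    for (; it != end; ++it)
        found.push_back(it->str());  // the text of the whole match
    return found;
}
```

Called on the string from Example 4-33, this sketch returns seven delimiters: a comma, a colon, a dash, a comma, and three single spaces. A run of several spaces counts as one match because \s+ is greedy.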
boost::sregex_token_iterator p(s.begin( ), s.end( ), re, -1);
boost::sregex_token_iterator end;
The iterator p is constructed from the input string's iterator range, the regular expression, and a final argument of -1, which tells the iterator to return the text between matches rather than the matches themselves. Once it has been built, you can treat p as you would an iterator on a standard library sequence. An sregex_token_iterator constructed with no arguments is a special value that represents the end of a regular expression token sequence, so you can compare against it to know when you have reached the end.
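The iterator pattern described above can be wrapped in a small function that returns the tokens in a container instead of printing them. This is a sketch under stated assumptions rather than the recipe's own code: it uses std::regex and std::sregex_token_iterator from the C++11 standard library in place of their Boost counterparts, and split_on_delimiters is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the recipe): split text on every match
// of the delimiter pattern and return the pieces in between.
std::vector<std::string> split_on_delimiters(const std::string& text) {
    static const std::regex re(",|:|-|\\s+");

    // The final argument, -1, selects the text between matches
    // rather than the matches themselves.
    std::sregex_token_iterator p(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator end;  // default-constructed end marker

    std::vector<std::string> tokens;
    for (; p != end; ++p)
        tokens.push_back(p->str());
    return tokens;
}
```

For the string in Example 4-33, this returns the eight tokens who, lives, in, a, pineapple, under, the, and sea?. Returning a vector keeps the iterators an implementation detail, so callers never hold iterators into a string that may have gone out of scope.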