Strings | MUD Game Programming (Premier Press Game Development)


This book is about MUDs; therefore, it wouldn't be complete if I didn't include the information about strings that you need.

Strings in C are a messy issue. Almost every other language in the world has better string abilities. Standard C-strings are ugly to use, inflexible, prone to errors, and cause lots of security problems. When you create a C-string, you can't change its size, ever. Your only option is to completely destroy the string and create a new one. Rather than show you examples of C-strings, I'm just going to pretend that they don't exist. Believe me, it's better that way. You'll be saner without them.

NOTE

C-strings are a security issue because of something called a buffer-overflow attack. There will be times when the users of your server send lots of data, and with C-strings, you'll inevitably forget to check whether you're writing past the end of the string in one place or another. Attackers use this situation to overwrite memory they shouldn't have access to, such as the return address your program stores on the stack when it calls a function. By overwriting that address, they can make your program jump into code of their choosing and execute anything they want!

So, what should you use instead of C-strings? I had considered writing my own string class for the book, with cool reference-counting features and optimizations. On doing more research, I found that reference counting isn't really such a great optimization after all. Reference counting means that the string keeps track of how many things are pointing at the same string data, and only makes a new copy when one of them is modified. This sounds great in theory, but in reality it has tons of problems, especially in multithreaded programs, where every copy has to update the shared count safely. I also decided I didn't want to waste space by devoting an entire chapter to creating a string class. This is a book about MUDs after all, not strings!

C++ to the rescue! The designers of the C++ standard library were smart; they realized that people were just plain sick of C-strings. So, when creating the C++ standard template library (STL), they added a string class template called basic_string. The absolute best thing about it is that it's flexible; it doesn't assume that you are using 8-bit ASCII characters. Instead, basic_string takes the character type as a template parameter, so you can use characters of almost any type:

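    #include <string>

    std::basic_string<char> str1;      // ASCII 8-bit string
    std::basic_string<wchar_t> str2;   // wide string (wchar_t is 16 bits on
                                       // Windows, 32 bits on most Unix systems)
    std::basic_string<int> str3;       // string of integers; not portable, since
                                       // the standard only provides char_traits
                                       // for character types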

Okay, so that notation looks a little ugly. Luckily, the designers of the STL put in a couple of typedefs (inside namespace std):

    typedef basic_string<char> string;
    typedef basic_string<wchar_t> wstring;

By the way, wchar_t is a built-in C++ type that represents a wide character. So, to create an 8-bit ASCII string (the most common type of string there is), all you need to do is this:

 std::string str1;
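If you want to convince yourself that these typedefs are just shorter names for the same types, the following sketch checks it at compile time. It assumes a C++11 or later compiler (for static_assert and <type_traits>), and the two helper functions are illustrative only: they report how many bytes one character of each string type takes, which for wchar_t varies by platform.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Compile-time proof that the typedefs and the spelled-out templates
// name exactly the same types, so the two spellings are interchangeable.
static_assert( std::is_same<std::string, std::basic_string<char> >::value,
               "std::string is just basic_string<char>" );
static_assert( std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
               "std::wstring is just basic_string<wchar_t>" );

// Hypothetical helpers: size in bytes of one character of each string type.
// sizeof(char) is always 1; sizeof(wchar_t) is platform dependent.
std::size_t CharBytes()     { return sizeof( std::string::value_type ); }
std::size_t WideCharBytes() { return sizeof( std::wstring::value_type ); }
```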

NOTE

For a primer on templates, see Appendix D, "Template Primer," on the CD.

Creating Strings

Strings are easy to use. Here are a few ways to create them:

    std::string str1 = "Hello!";           // "Hello!"
    std::string str2( "How are you?" );    // "How are you?"
    std::string str3( 8, 'C' );            // "CCCCCCCC"
    std::string str4 = str1;               // "Hello!"
    std::string str5( str4 );              // "Hello!"

And here are a few ways to create strings from plain C-strings:

    char cstring[] = "Hello!";             // "Hello!"
    std::string str6 = cstring;            // "Hello!"
    std::string str7( cstring );           // "Hello!"
    std::string str8( cstring, 2 );        // "He"

I think it's a great idea to be able to do these kinds of things. In fact, whenever you have a string literal such as "Hello!" inside your program, the compiler sees it as a plain C-string (an array of const char, which converts to a const char*). So the lines

    std::string str1 = "Hello!";
    std::string str6 = cstring;

from earlier are almost identical; they both create a std::string from a C-string.
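One subtlety hides in the str8 example. The two-argument constructor means different things depending on what you pass first: with a C-string, the number is a character count, but with a std::string, it's a starting position. This small sketch (the function names are made up for illustration) shows both:

```cpp
#include <cstddef>
#include <string>

// With a C-string, the second argument is a COUNT: copy the first p_count
// characters. p_count must not exceed the C-string's length.
std::string FirstChars( const char* p_cstring, std::size_t p_count )
{
    return std::string( p_cstring, p_count );
}

// With a std::string, the second argument is a starting POSITION:
// copy everything from p_pos to the end.
std::string FromPosition( const std::string& p_string, std::size_t p_pos )
{
    return std::string( p_string, p_pos );
}
```

So FirstChars( "Hello!", 2 ) gives "He", while FromPosition( "Hello!", 2 ) gives "llo!".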

Using Strings

C-strings are notoriously difficult to work with. You need to call functions on them to do anything useful, such as concatenating, comparing, or even just finding out the length of them. BLEH! However, C++ strings take care of all those little things for you. For example:

    std::string str1 = "Hello!";
    std::string str2 = "How are you?";
    std::string str3;
    str3 = str1 + " " + str2;          // "Hello! How are you?"

Isn't that cool? Using C-strings, you'd have to first check to see if str3 had enough room, which would mean that you'd need to find out the length of str1 and str2 , and possibly resize a buffer, before calling a concatenate function.

Catch my drift? Pain in the butt! C++ strings take care of all of that junk for you.

Or how about comparing strings? Using C-strings, it looks like this:

    char cstr1[] = "Hello!";
    char cstr2[] = "Hello!";
    if( cstr1 == cstr2 )
    {
        // write some code here
    }

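A beginner would think that the code inside the if-statement would be executed; after all, the strings are equal, right? Not quite. When you use the name of a C-string array like that, it turns into a pointer to its first character, so you're comparing two memory addresses, not the actual characters. The two arrays live in different places, so the comparison is false. D'oh! (In C, you'd have to call strcmp() to compare the contents.)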
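For comparison, here's how you would compare C-string contents correctly with the standard strcmp() function, next to the pointer comparison that trips beginners up. Both helper names are hypothetical:

```cpp
#include <cstring>

// Compares the CHARACTERS of two C-strings.
// std::strcmp returns 0 when both strings hold the same characters.
bool SameCString( const char* p_a, const char* p_b )
{
    return std::strcmp( p_a, p_b ) == 0;
}

// Compares the ADDRESSES only: two separate arrays that both
// hold "Hello!" are not equal by this test.
bool SameAddress( const char* p_a, const char* p_b )
{
    return p_a == p_b;
}
```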

Now look at C++ strings:

    std::string str1 = "Hello!";
    std::string str2 = "Hello!";
    if( str1 == str2 )
    {
        // do some code here
    }

This time the code does what you'd expect. The != operator and the less-than and greater-than operators work as well:

    std::string str1 = "ABC";
    std::string str2 = "BCD";
    bool b;
    b = ( str1 != str2 );          // true
    b = ( str1 < str2 );           // true
    b = ( str1 > str2 );           // false

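The less-than and greater-than operators compare the strings character by character, using each character's numeric code. For strings that are all uppercase or all lowercase, that works out to alphabetical order: ABC is less than BCD because it would come first in the dictionary. Mixed case is another story, since in ASCII every uppercase letter comes before every lowercase one.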
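Because std::string provides operator<, standard algorithms such as std::sort work on strings directly. This sketch (SortedNames is a made-up helper) also shows the mixed-case caveat in action: with ASCII, "Zebra" sorts ahead of "apple", and a string sorts before any longer string that it is a prefix of.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts a copy of the names using std::string's built-in operator<,
// which compares character codes one position at a time.
std::vector<std::string> SortedNames( std::vector<std::string> p_names )
{
    std::sort( p_names.begin(), p_names.end() );
    return p_names;
}
```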

And finally, you can find the size of a string:

    std::string str1 = "Hello!";
    std::size_t size = str1.size();        // 6

Pretty easy, isn't it?

NOTE

To use other terminology, you can also say that you compare strings lexicographically. Impress your friends by using this big word that almost no one knows the meaning of! "Oh, today I lexicographically compared two strings!"... never mind.

Other String Functions

There are lots of string functions, so I'm only going to go over the important ones. Most string functions are based on searching, which is great, because that's how I use strings most often.

For example:

    std::string str1 = "Hello Mr. Anderson.";
    size_t pos;
    pos = str1.find( "Mr" );           // 6
    pos = str1.find( "He" );           // 0
    pos = str1.find( "der" );          // 12
    pos = str1.find( "narf" );         // string::npos
    pos = str1.find( "o" );            // 4
    pos = str1.find( "o", 5 );         // 16

There are two interesting cases in this code. When I try searching for narf, which doesn't exist within str1, the value string::npos is returned. npos is a special value that can never be a valid index into a string, and whenever a search function returns it, it means that the search string was not found. The second interesting case is the last line of code. The second parameter of the function says, "Start searching for the search string at index 5," so the search for o begins just after the first o, which was found at index 4.
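The npos check and the start-position parameter combine naturally into a loop. Here's a sketch of a hypothetical CountOccurrences() helper (not part of BasicLib) that counts non-overlapping matches by resuming each search just past the previous match:

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of p_search in p_string.
// Assumption: an empty search string counts as zero matches
// (find() would otherwise "match" at every position).
std::size_t CountOccurrences( const std::string& p_string,
                              const std::string& p_search )
{
    if( p_search.empty() )
        return 0;

    std::size_t count = 0;
    std::size_t pos = p_string.find( p_search );
    while( pos != std::string::npos )        // npos means "not found"
    {
        ++count;
        // resume the search right after the match we just counted
        pos = p_string.find( p_search, pos + p_search.size() );
    }
    return count;
}
```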

There's also a reverse-find function:

 pos = str1.rfind( "o" );        // 16

The function to find any character within a given set of characters is also useful. For example:

    std::string vowels = "aeiou";
    std::string str1 = "That is the sound of inevitability.";
    size_t pos;
    pos = str1.find_first_of( vowels );        // 2
    pos = str1.find_first_of( vowels, 3 );     // 5
    pos = str1.find_first_not_of( vowels );    // 0
    pos = str1.find_last_of( vowels );         // 31
    pos = str1.find_last_of( vowels, 30 );     // 29
    pos = str1.find_last_not_of( vowels );     // 34

In the calls with a second parameter, that parameter tells the function where to start searching: the find_first_ functions start at that index and search forward, and the find_last_ functions start at that index and search backward.
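The same resume-after-the-last-hit pattern works with find_first_of(). This hypothetical CountAnyOf() helper counts how many characters of a string belong to a given set, such as the vowels from the example above:

```cpp
#include <cstddef>
#include <string>

// Counts the characters of p_string that appear anywhere in p_set,
// restarting find_first_of() one position past each hit.
std::size_t CountAnyOf( const std::string& p_string, const std::string& p_set )
{
    std::size_t count = 0;
    std::size_t pos = p_string.find_first_of( p_set );
    while( pos != std::string::npos )
    {
        ++count;
        pos = p_string.find_first_of( p_set, pos + 1 );
    }
    return count;
}
```

For "That is the sound of inevitability." and "aeiou", it returns 12.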

There are more functions, of course, but I use them rarely if at all, so I don't want to spend any extra time explaining them. A good STL reference should explain all the functions in depth.

My Own String Functions

For a MUD, you're going to need to do a bit of string parsing on your own, since the C++ string library doesn't include those kinds of functions. That's not a big deal, though. Basically, I feel the extra string functions you're going to need in this book are functions to convert strings to uppercase/lowercase, trim whitespace off the ends of strings, get individual words from strings, and remove individual words from strings. And since the C++ string library doesn't have string->datatype or vice-versa functions, I'll create some of those, too.

Changing Cases

If you look at the standard C++ string class, you'll see that the equality operator is handy, but it has a problem. Look at the following example:

    std::string str1 = "HELLO";
    std::string str2 = "hello";
    bool b = ( str1 == str2 );     // false

Even though the strings contain the same word, they are not equal, because uppercase and lowercase letters are different characters. This can be particularly troublesome. So how would you go about checking whether the words are the same?

The logical answer is to convert both strings to the same case, and then compare them. Unfortunately, there is no standard C++ method to convert a string to a particular case. Who knows whether this was an oversight or the developers felt it wasn't needed? Many compilers implement this function on their own, but it's not part of the standard.

So let's build our own! C++ does include functions for converting individual characters to upper- or lowercase: std::toupper() and std::tolower(), declared in the standard header <cctype>. The functions we build on top of them will go in the BasicLib files: BasicLibString.h and BasicLibString.cpp.

First, an uppercase function:

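    std::string UpperCase( const std::string& p_string )
    {
        std::string str = p_string;
        for( std::size_t i = 0; i < str.size(); i++ )
        {
            // cast to unsigned char first: passing a negative char value
            // (possible for non-ASCII characters when char is signed)
            // to toupper is undefined behavior
            str[i] = static_cast<char>( std::toupper( static_cast<unsigned char>( str[i] ) ) );
        }
        return str;
    }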

The function essentially goes through every character in the string, converts it to uppercase, and then returns the new string. The LowerCase() function is identical, except the call to toupper is replaced with tolower . Therefore, there is no need to show it here.
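For completeness, here's a sketch of LowerCase() written the way the text describes (UpperCase() with tolower in place of toupper), plus a hypothetical EqualsIgnoreCase() helper that applies the convert-then-compare idea from earlier. BasicLib may well do this differently; treat it as an illustration:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Same loop as UpperCase(), using std::tolower instead.
std::string LowerCase( const std::string& p_string )
{
    std::string str = p_string;
    for( std::size_t i = 0; i < str.size(); i++ )
    {
        // unsigned char cast avoids undefined behavior for negative chars
        str[i] = static_cast<char>( std::tolower( static_cast<unsigned char>( str[i] ) ) );
    }
    return str;
}

// Hypothetical helper: case-insensitive equality by lowercasing both sides.
bool EqualsIgnoreCase( const std::string& p_a, const std::string& p_b )
{
    return LowerCase( p_a ) == LowerCase( p_b );
}
```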

Trimming Whitespace

Whitespace is the term for characters in text that show up as blank space. Spaces, tabs, newline characters: they're all whitespace. Many times you're going to want to trim the whitespace off the front and the back of a string. For example, say you've got the string

 "    hello    "

that has four blank spaces both in front of and behind the word "hello". You just want to get the word in there and ignore all that extra junk. Luckily, this is pretty easy using the string's search functions.

First, you need to define what whitespace actually is:

 const std::string WHITESPACE = " \t\n\r";

This is a global const string that defines the four common whitespace characters. The first is a space; the second is \t, which is the C++ escape sequence that means tab; then \n, which is newline; and finally \r, which is a carriage return. Here's the actual function:

    std::string TrimWhitespace( const std::string& p_string )
    {
        int wsf, wsb;   // "white space front" and "white space back"
        wsf = p_string.find_first_not_of( WHITESPACE );
        wsb = p_string.find_last_not_of( WHITESPACE );
        if( wsf == std::string::npos )
        {
            // the string is all whitespace: set up an empty substring
            wsf = 0;
            wsb = -1;
        }
        return p_string.substr( wsf, wsb - wsf + 1 );
    }

There are two locals: wsf and wsb. These stand for "white space front" and "white space back." The function uses the find functions to find the first and last characters in the string that aren't whitespace. For the example string I showed you before, wsf would be 4 and wsb would be 8, pointing at h and o. The if-statement checks whether the string was entirely whitespace (meaning that no non-whitespace characters were found); if so, the entire string needs to be cleared, so wsf is set to 0, and wsb is set to -1. Next you'll see why.

Finally, the substr function is called, chopping off the front and back whitespace portions. The first parameter to substr is the position at which the substring starts, and the second parameter is the length of the substring. So, in the example, the substring starts at index 4 ("h") and is 8 - 4 + 1 (5) characters long. This results in "hello". For a string of only whitespace, the length calculation is -1 - 0 + 1, which is 0. So the result of TrimWhitespace( " " ); is a string that is 0 characters long.
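The 0 and -1 trick relies on storing find() results in an int. If you'd rather keep every index in the string's own unsigned size_type, a variant like this one (TrimWhitespaceSized is a made-up name) avoids the trick by returning early for an empty or all-whitespace string:

```cpp
#include <string>

const std::string WHITESPACE = " \t\n\r";

// Variant of TrimWhitespace that never converts npos to an int.
std::string TrimWhitespaceSized( const std::string& p_string )
{
    std::string::size_type wsf = p_string.find_first_not_of( WHITESPACE );
    if( wsf == std::string::npos )
        return std::string();                       // nothing but whitespace

    std::string::size_type wsb = p_string.find_last_not_of( WHITESPACE );
    return p_string.substr( wsf, wsb - wsf + 1 );   // inclusive range [wsf, wsb]
}
```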

Parsing

Parsing words out of a string is a fairly simple task, once again using the string class's search features. Basically, the idea is to count the runs of whitespace contained in a string (a run of whitespace is just one continuous chunk of characters that are all whitespace) until you reach the word you want.

To show you how the function works in detail, I'll be referring to this string:

 "This is a string"

The first thing the function does is find the first character in the string that isn't whitespace. This will be the beginning of the first word:

    std::string ParseWord( const std::string& p_string, int p_index )
    {
        int wss = p_string.find_first_not_of( WHITESPACE );

After that executes, wss has the index of the first word (word index 0) in the string. For the example, wss would be 0, because index 0, "T" is a non-whitespace character.

Now, in order to find the correct word, you must loop through the string p_index times, finding the end of the current word, and then the beginning of the next word:

        while( p_index > 0 )
        {
            p_index--;
            wss = p_string.find_first_of( WHITESPACE, wss );
            wss = p_string.find_first_not_of( WHITESPACE, wss );
        }

As you can see, the loop runs until p_index is zero; if p_index is zero to begin with, the loop never executes. On each pass, it first searches for the index of the first whitespace character after the current word. With the example string, this would be 4, since index 4 is the first whitespace character after the word "This." Then the loop searches for the next non-whitespace character; in the example, that character is at index 5, "i."

Depending on which word you want, this loop can repeat over and over, until the desired word is found. At the end of the loop, the wss variable should be pointing to the first letter of the word you want to extract.

Now that you have the index of the first letter of the word you want, you need to find out how long the word is, by finding the end of the word:

 int wse = p_string.find_first_of( WHITESPACE, wss );

Now, if there was a problem finding the appropriate word (for example, you wanted the fifth word when there were only four words), wss should be std::string::npos . So you need to check that:

        if( wss == std::string::npos )
        {
            wss = 0;
            wse = 0;
        }

If you couldn't find the word, set both indexes to 0, to signify that you want to return an empty string. Finally, you return the substring:

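        // for the last word, wse is npos (-1 as an int), so the length
        // converts to a huge unsigned value and substr takes the rest
        return p_string.substr( wss, wse - wss );
    }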

Since wse is pointing to the first whitespace character after the word you want, you don't need to add 1 to the length this time (as opposed to the TrimWhitespace() function). So, here it is in action:

    std::string str1 = "This is a string";
    std::string str2;
    str2 = BasicLib::ParseWord( str1, 0 ); // "This"
    str2 = BasicLib::ParseWord( str1, 1 ); // "is"
    str2 = BasicLib::ParseWord( str1, 2 ); // "a"
    str2 = BasicLib::ParseWord( str1, 3 ); // "string"
    str2 = BasicLib::ParseWord( str1, 4 ); // ""

This kind of function becomes incredibly useful when dealing with a MUD.
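Putting the pieces together, here's a sketch of the complete ParseWord() as a single function. It follows the steps above, but it assumes std::string::size_type for the indexes instead of int and stops skipping as soon as it runs out of words; it's a reconstruction, not copied from the BasicLib sources:

```cpp
#include <string>

const std::string WHITESPACE = " \t\n\r";

// Returns word number p_index (0-based), or "" if there aren't that many words.
std::string ParseWord( const std::string& p_string, int p_index )
{
    // start of word 0
    std::string::size_type wss = p_string.find_first_not_of( WHITESPACE );

    // skip p_index words: jump to the end of the current word,
    // then to the start of the next one
    while( p_index > 0 && wss != std::string::npos )
    {
        p_index--;
        wss = p_string.find_first_of( WHITESPACE, wss );
        wss = p_string.find_first_not_of( WHITESPACE, wss );
    }

    if( wss == std::string::npos )
        return std::string();                         // no such word

    // end of the word; npos for the last word, in which case
    // wse - wss is still huge and substr takes the rest of the string
    std::string::size_type wse = p_string.find_first_of( WHITESPACE, wss );
    return p_string.substr( wss, wse - wss );
}
```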

There is a similar function, RemoveWord(), but instead of returning the requested word, it returns the original string without the word. The function is identical to ParseWord() except for two aspects. The first comes from the fact that removing a word is a bit trickier than it sounds. Imagine the string "This is a string". If you wanted to remove word one ("is"), and only the word, you'd end up with "This  a string", with two spaces between "This" and "a". That would look weird, wouldn't it? The best way to remove a word from a string is to remove not only the word, but the whitespace after it as well. So you want to remove "is " and not "is". Therefore, you need to add another line of code to the function. After finding the first whitespace after the word to be removed, you need to find the beginning of the next word after that:

        int wse = p_string.find_first_of( WHITESPACE, wss );
        wse = p_string.find_first_not_of( WHITESPACE, wse );

Notice that the first line is the same as it was in the ParseWord() function. Now, wse should be the index of the first letter of the word after the word you want to remove.

The next difference is that you're not returning a substring. This time, you're going to call the string's erase function to remove the word:

        std::string str = p_string;
        str.erase( wss, wse - wss );
        return str;
    }

Ta-da! Here are some examples:

    std::string str1 = "This is a string";
    std::string str2;
    str2 = BasicLib::RemoveWord( str1, 0 );    // "is a string"
    str2 = BasicLib::RemoveWord( str1, 1 );    // "This a string"
    str2 = BasicLib::RemoveWord( str1, 2 );    // "This is string"
    str2 = BasicLib::RemoveWord( str1, 3 );    // "This is a "
    str2 = BasicLib::RemoveWord( str1, 4 );    // "This is a string"

Notice that when trying to remove word index 3, there is a space at the end of the string. This is the only anomaly in the algorithm, but it actually makes sense if you think about it. The RemoveWord() function treats the whitespace after the word as part of that word.
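Here's the same kind of assembled sketch for RemoveWord(), again using size_type indexes rather than int. Apart from that it follows the steps described above, so treat it as a reconstruction rather than the BasicLib code:

```cpp
#include <string>

const std::string WHITESPACE = " \t\n\r";

// Returns p_string with word number p_index (and the whitespace after it)
// removed, or p_string unchanged if there is no such word.
std::string RemoveWord( const std::string& p_string, int p_index )
{
    std::string::size_type wss = p_string.find_first_not_of( WHITESPACE );
    while( p_index > 0 && wss != std::string::npos )
    {
        p_index--;
        wss = p_string.find_first_of( WHITESPACE, wss );
        wss = p_string.find_first_not_of( WHITESPACE, wss );
    }
    if( wss == std::string::npos )
        return p_string;                                  // nothing to remove

    // end of the word, then start of the next word (npos if none)
    std::string::size_type wse = p_string.find_first_of( WHITESPACE, wss );
    wse = p_string.find_first_not_of( WHITESPACE, wse );

    std::string str = p_string;
    str.erase( wss, wse - wss );    // if wse is npos, this erases to the end
    return str;
}
```

In a MUD, a call like RemoveWord( "say hello there", 0 ) strips the command word and leaves "hello there" for the command to work with.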

Conversions

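Unfortunately, there is no direct way in standard C++ to convert a string to another datatype, or vice versa. (C++11 later added std::to_string() and functions such as std::stoi() and std::stod() for the built-in numeric types, but nothing that works for any type.) But it's still an easy process.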

C++ has a stream class called stringstream, which acts like an input/output buffer, much like cin and cout. Luckily, the standard stream operators << and >> already know how to convert the basic datatypes to and from text, so a stream can do the conversion for you. For example, you can use cout to print ints, floats, strings, and so on:

 std::cout << 10 << 3.1415 << "hello!";

You can do the same with stringstreams:

    #include <sstream>

    std::stringstream str;
    str << 10 << 3.1415 << "hello!";   // "103.1415hello!"

You can also do it the other way around:

    int i;
    str >> i;                          // 103

Why is 103 in the integer? Because it picked out every digit it could find before it got to the period. If you streamed it into a float instead, the float would hold 103.1415.
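Streams also tell you when a conversion fails, which matters when the text comes from a player. This sketch uses a hypothetical ReadLeadingInt() helper that reports whether an int could be extracted from the front of a string:

```cpp
#include <sstream>
#include <string>

// Tries to read an int from the front of p_text into p_result.
// A failed extraction sets the stream's fail state, and testing the
// stream as a bool then yields false.
bool ReadLeadingInt( const std::string& p_text, int& p_result )
{
    std::stringstream str( p_text );
    return static_cast<bool>( str >> p_result );
}
```

Given "103.1415hello!", it succeeds with 103, just as described above; given "hello", it fails because no digits come first.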

That's how you can convert datatypes. To make conversions easy to work with, I've created two functions to convert to and from strings. The first function converts a datatype into a string:

    template< class type >
    inline std::string tostring( const type& p_type )
    {
        std::stringstream str;
        str << p_type;
        return str.str();
    }

The first thing you should notice is that this is a template function. Templates make life much easier, so if you're not familiar with them, you should probably try to catch up. Since this function is templated, it works with any datatype that can be streamed into a stringstream. If you create a custom class and give it an operator<< for std::ostream, objects of your class can automatically be converted into strings using this function.

So, the function takes a parameter of any datatype, creates a stringstream buffer, and then streams the parameter into the buffer. Finally, the buffer's contents are retrieved as a string using its str() function, and returned. Voilà! Here's how you use it:

    std::string str1;
    str1 = BasicLib::tostring( 42 );                 // "42"
    str1 += " " + BasicLib::tostring( 3.1415 );      // "42 3.1415"

It even works with the sint64 type.
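To back up the earlier point about custom classes, here's a sketch with a made-up Coordinate type. Once it has an operator<< for std::ostream, the tostring() template (repeated here so the example stands alone) converts it with no extra work:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// tostring() as shown above.
template< class type >
inline std::string tostring( const type& p_type )
{
    std::stringstream str;
    str << p_type;
    return str.str();
}

// A hypothetical MUD type: a room coordinate.
struct Coordinate
{
    int x;
    int y;
};

// Providing operator<< for std::ostream is all tostring() needs.
std::ostream& operator<<( std::ostream& p_stream, const Coordinate& p_coord )
{
    return p_stream << "(" << p_coord.x << ", " << p_coord.y << ")";
}
```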

Now, the other way around is the totype() function, which converts a string into any datatype that has a defined stream-extraction operator.

    template< class type >
    inline type totype( const std::string& p_string )
    {
        std::stringstream str;
        str << p_string;
        type t;
        str >> t;
        return t;
    }

The totype() function is also a template function, which takes a string as a parameter and returns whatever you want. This function streams the string into the stringstream , creates a value of type type named t , and streams the buffer into it. Finally, t is returned.

Using this function is a little odd, however. For example, try doing this:

 int i = BasicLib::totype( "42" );      // COMPILER ERROR!

The compiler can deduce template parameters only from the arguments you pass to a function, never from what you do with its return value. This function only takes a std::string as its parameter, so it can't tell that you want it to return an int. So you need to tell the function yourself:

    using namespace BasicLib;
    int i = totype<int>( "42" );
    float f = totype<float>( "3.1415" );
    sint64 s = totype<sint64>( "1152921504606846976" );

Now you've got functions to convert datatypes to and from strings.
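One drawback of totype() is that it can't tell you when the string didn't hold a valid value. A hypothetical variant, totype_checked() (not part of BasicLib), writes the value into a reference parameter and returns whether the whole string converted cleanly. As a side benefit, because the type now appears in a parameter, the compiler can deduce it, so the <int> syntax isn't needed:

```cpp
#include <sstream>
#include <string>

// Converts p_string to a value of the requested type. Returns false (and
// leaves p_result untouched) if nothing valid could be read or if
// non-whitespace characters are left over, as in "42abc".
template< class type >
bool totype_checked( const std::string& p_string, type& p_result )
{
    std::stringstream str( p_string );
    type t;
    if( !( str >> t ) )
        return false;          // nothing valid at the front
    str >> std::ws;            // skip any trailing whitespace
    if( !str.eof() )
        return false;          // leftover characters
    p_result = t;
    return true;
}
```

For example, after int i; a call to totype_checked( "42", i ) returns true and sets i to 42, while totype_checked( "42abc", i ) returns false.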

Searching and Replacing

MUDs need another common function: the ability to search for a substring inside of a string and replace it.

To do this, I've created the SearchAndReplace helper function:

    std::string SearchAndReplace(
        const std::string& p_target,
        const std::string& p_search,
        const std::string& p_replace );

SearchAndReplace is really just a simple function that uses the string's find and replace functions, so I'm not going to bother showing you the code. The code is on the CD in the file BasicLibString.cpp if you want to see it, though.

Basically, the function goes through a string and replaces all instances of p_search with p_replace , and returns the result. Here's some code:

    std::string s = "This string has been read once, and only once.";
    s = BasicLib::SearchAndReplace( s, "once", "twice" );

After that code runs, s will hold "This string has been read twice, and only twice."
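Since the actual code lives on the CD, here's a sketch of how such a function could be written with find() and replace(). It follows only the behavior described above and is not the BasicLibString.cpp implementation; the handling of an empty search string is an assumption:

```cpp
#include <string>

// Replaces every non-overlapping instance of p_search in p_target
// with p_replace and returns the result.
std::string SearchAndReplace(
    const std::string& p_target,
    const std::string& p_search,
    const std::string& p_replace )
{
    std::string str = p_target;
    if( p_search.empty() )
        return str;            // assumption: an empty search replaces nothing

    std::string::size_type pos = str.find( p_search );
    while( pos != std::string::npos )
    {
        str.replace( pos, p_search.size(), p_replace );
        // resume after the inserted text, so a replacement that contains
        // p_search (for example "a" -> "aa") can't loop forever
        pos = str.find( p_search, pos + p_replace.size() );
    }
    return str;
}
```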
