Strings | Cross-Platform GUI Programming with wxWidgets

The benefits of working with a string class instead of standard character pointers are well established. wxWidgets includes its own string class, wxString, used both internally and for passing and returning information. wxString has all the standard operations you expect to find in a string class: dynamic memory management, construction from other strings, C strings, and characters, assignment operators, access to individual characters, string concatenation and comparison, substring extraction, case conversion, trimming and padding (with spaces), searching and replacing, C-like printf, stream-like insertion functions, and more.

Beyond being just another string class, wxString has other useful features. wxString fully supports Unicode, including methods for converting to and from ANSI and Unicode regardless of your build configuration. Using wxString gives you the ability to pass strings to the library or receive them back without any conversion process. Lastly, wxString implements 90% of the STL std::string methods, meaning that anyone familiar with std::string can use wxString without any learning curve.

Using wxString

Using wxString in your application is very straightforward. Wherever you would normally use std::string or your favorite string implementation, use wxString instead. All functions taking string arguments should take const wxString& (which makes assignment to the strings inside the function faster because of reference counting), and all functions returning strings should return wxString, which makes it safe to return local variables.

Because C and C++ programmers are familiar with most string methods, a long and detailed API reference for wxString has been omitted. Please consult the wxWidgets documentation for wxString, which provides a comprehensive list of all its methods.

You may notice that wxString sometimes has two or more functions that do the same thing. For example, Length, Len, and length all return the length of the string. In all cases of such duplication, the usage of std::string-compatible methods is strongly advised. It will make your code more familiar to other C++ programmers and will let you reuse the same code in both wxWidgets and other programs, where you can typedef wxString as std::string. Also, wxWidgets might start using std::string at some point in the future, so using these methods will make your programs more forward-compatible (although the wxString methods would be supported for some time for backwards compatibility).
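As a small illustration of this advice, the sketch below is written only against std::string-style methods (find, substr, npos). The AppString typedef and the FirstWord helper are hypothetical names introduced for this example; the sketch assumes that a wxWidgets program could point the typedef at wxString instead, provided the methods used are among those wxString implements.

```cpp
#include <string>

// Hypothetical alias: a non-wxWidgets program maps it to std::string;
// a wxWidgets program could map it to wxString instead (assumption).
typedef std::string AppString;

// Returns the text before the first space, or the whole string if
// there is no space. Only std::string-compatible methods are used.
AppString FirstWord(const AppString& text)
{
    const AppString::size_type pos = text.find(' ');
    if (pos == AppString::npos)
        return text;
    return text.substr(0, pos);
}
```

Because the function takes a const reference and returns by value, it also follows the argument and return conventions recommended earlier in this section.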

wxString, Characters, and String Literals

wxWidgets has a wxChar type which maps either to char or wchar_t depending on the application build configuration (Unicode or ANSI). As already mentioned, there is no need for a separate type for char or wchar_t strings because wxString stores strings using the appropriate underlying C type. Whenever you work directly with strings that you intend to use with a wxWidgets class, use wxChar instead of char or wchar_t directly. Doing so ensures compatibility with both ANSI and Unicode build configuration without complicated preprocessor conditions.

When using wxWidgets with Unicode enabled, standard string literals are not the correct type: an unmodified string literal is always of type char*. In order for a string literal to be used in Unicode mode, it must be a wide character constant, usually marked with an L. wxWidgets provides the wxT macro (identical to _T) to wrap string literals for use with or without Unicode. When Unicode is not enabled, _T is an empty macro, but with Unicode enabled, it adds the necessary L for the string literal to become a wide character string constant. For example:

 wxChar ch = wxT('*');
 wxString s = wxT("Hello, world!");
 const wxChar* pChar = wxT("My string");
 wxString s2 = pChar;
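To make the mechanism concrete, here is a minimal sketch of how such a character type and literal macro can be defined. MY_UNICODE, MyChar, MY_T, and MyStrlen are hypothetical stand-ins invented for this example; they are not the wxWidgets definitions, which may differ in detail.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for wxChar and wxT; MY_UNICODE plays the role
// of the Unicode build setting. None of these names come from wxWidgets.
#ifdef MY_UNICODE
    typedef wchar_t MyChar;
    #define MY_T(x) L##x          // paste an L prefix onto the literal
#else
    typedef char MyChar;
    #define MY_T(x) x             // leave the literal unchanged
#endif

// Length of a string in characters, whichever build is active.
std::size_t MyStrlen(const MyChar* s)
{
#ifdef MY_UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}

const MyChar  kStar     = MY_T('*');
const MyChar* kGreeting = MY_T("Hello, world!");
```

Code written against MyChar and MY_T compiles unchanged whether or not MY_UNICODE is defined, which is the property the wxChar and wxT pair is meant to provide.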

For more details about using Unicode in your applications, please see Chapter 16, "Writing International Applications."

Basic wxString to C Pointer Conversions

Because there may be times when you need to access a wxString's data as a C type for low-level processing, wxWidgets provides several accessors:

mb_str returns a C string representation of the string, a const char*, regardless of whether Unicode is enabled. In Unicode mode, the string is converted, and data may be lost.
wc_str returns a wide character representation of the string, a wchar_t*, regardless of whether Unicode is enabled. In ANSI mode, the string is converted to Unicode.
c_str returns a pointer to the string data (const char* in ANSI mode, const wchar_t* in Unicode mode). No conversion takes place.

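In an ANSI build, you can convert between std::string and wxString by means of c_str, as follows. (In a Unicode build, c_str returns const wchar_t*, so a narrow conversion such as mb_str is needed instead.)

 std::string str1 = "hello";
 wxString str2 = str1.c_str();
 std::string str3 = str2.c_str();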

One trap when using wxString is the implicit conversion operator to const char *. It is advised that you use c_str to indicate clearly when the conversion is done. The danger of this implicit conversion may be seen in the following code fragment:

 // converts the input string to uppercase, outputs it to the
 // screen, and returns the result (buggy)
 const char *SayHELLO(const wxString& input)
 {
     wxString output = input.Upper();
     printf("Hello, %s!\n", output);
     return output;
 }

There are two nasty bugs in these four lines. The first is in the call to the printf function. The compiler applies the implicit conversion to a C string automatically for functions like puts, because the argument of puts is known to be of type const char *. It does not do so for printf, which takes a variable number of arguments whose types are unknown. Such a call to printf might do anything at all (including displaying the correct string on screen), although the most likely result is a program crash. The solution is to call c_str explicitly; using wxPrintf, the wxWidgets counterpart of printf that takes a wxChar format string, keeps the call valid in both ANSI and Unicode builds:

 wxPrintf(wxT("Hello, %s!\n"), output.c_str());

The second bug is that returning the variable named output doesn't work. The implicit cast is used again, so the code compiles, but it returns a pointer to a buffer belonging to a local variable that is deleted as soon as the function exits. The solution to this problem is also easy: have the function return a wxString instead of a C string. The corrected code looks like this:

 // converts the input string to uppercase, outputs it to the
 // screen, and returns the result (corrected)
 wxString SayHELLO(const wxString& input)
 {
     wxString output = input.Upper();
     wxPrintf(wxT("Hello, %s!\n"), output.c_str());
     return output;
 }
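The same two rules apply to any string class, so they can be tried without wxWidgets. The following sketch uses std::string as a stand-in; SayHelloUpper is a hypothetical name, and the manual uppercase loop replaces wxString's Upper method, which std::string lacks.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Uppercases the input, prints a greeting, and returns the result.
// Returning std::string by value avoids handing out a pointer into a
// local object that is destroyed when the function exits.
std::string SayHelloUpper(const std::string& input)
{
    std::string output = input;
    for (std::string::size_type i = 0; i < output.size(); ++i)
        output[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(output[i])));

    // printf is variadic, so the string object must be converted to a
    // C string explicitly before it is passed.
    std::printf("Hello, %s!\n", output.c_str());
    return output;
}
```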

Standard C String Functions

Because most programs use character strings, the standard C library provides quite a few functions to work with them. Unfortunately, some of them have rather counterintuitive behavior (like strncpy, which doesn't always terminate the resulting string with a NUL character) or are considered unsafe because of possible buffer overflows. Moreover, some very useful functions are not standard at all. This is why, in addition to all the wxString functions, there are a few global string functions that try to correct these problems: wxIsEmpty verifies whether the string is empty (returning true for NULL pointers), wxStrlen handles NULLs correctly and returns 0 for them, and wxStricmp is a platform-independent version of the case-insensitive string comparison function known either as stricmp or strcasecmp on different platforms.
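The kind of problem these functions address can be sketched in standard C++. SafeStrlen, SafeIsEmpty, and SafeCopy below are hypothetical helpers written for this example in the spirit of wxStrlen and wxIsEmpty; they are not the wxWidgets implementations.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical NULL-tolerant helpers in the spirit of wxStrlen and
// wxIsEmpty; these are not the wxWidgets implementations.
std::size_t SafeStrlen(const char* s)
{
    return s ? std::strlen(s) : 0;
}

bool SafeIsEmpty(const char* s)
{
    return s == NULL || *s == '\0';
}

// Copies src into dst (capacity dstSize) and always terminates the
// result, unlike strncpy, which leaves dst unterminated when src has
// dstSize or more characters. Returns true if nothing was cut off.
bool SafeCopy(char* dst, std::size_t dstSize, const char* src)
{
    if (dst == NULL || dstSize == 0)
        return false;
    const std::size_t len = SafeStrlen(src);
    const std::size_t n = len < dstSize ? len : dstSize - 1;
    if (n > 0)
        std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}
```

Copying "abcdef" into a four-character buffer with SafeCopy leaves the terminated string "abc" and reports the truncation, whereas strncpy with the same arguments would fill all four characters and leave no terminator.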

The "wx/string.h" header also defines wxSnprintf and wxVsnprintf functions that should be used instead of the inherently dangerous standard sprintf. The "n" functions use snprintf, which does buffer size checks whenever possible. You may also use wxString::Printf without worrying about the vulnerabilities typically found in printf.
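The size check that the "n" functions perform can be seen with the standard snprintf and vsnprintf (C++11). In this sketch, FormatInto and FormatString are hypothetical helpers: the first formats into a fixed buffer and reports truncation, and the second sizes its buffer to fit, roughly in the way a string class's formatting method can. Neither is the wxWidgets implementation.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats n into a fixed buffer; returns true if the whole text fit.
// snprintf never writes more than size bytes, terminator included.
bool FormatInto(char* buffer, std::size_t size, int n)
{
    const int written = std::snprintf(buffer, size, "value=%d", n);
    return written >= 0 && static_cast<std::size_t>(written) < size;
}

// Hypothetical helper that returns the formatted text as a string.
// It first asks for the required length, then formats into a buffer
// of exactly that size.
std::string FormatString(const char* format, ...)
{
    va_list args;
    va_start(args, format);
    va_list argsCopy;
    va_copy(argsCopy, args);

    // With a zero-sized buffer, vsnprintf writes nothing and returns
    // the number of characters the full result would need.
    const int needed = std::vsnprintf(nullptr, 0, format, args);
    va_end(args);

    std::string result;
    if (needed > 0)
    {
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&buffer[0], buffer.size(), format, argsCopy);
        result.assign(&buffer[0], static_cast<std::size_t>(needed));
    }
    va_end(argsCopy);
    return result;
}
```

Neither helper protects against a format string that does not match its arguments, so format strings should still never come from untrusted input.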

Converting to and from Numbers

Programmers often need to convert between string and numeric representations of numbers, such as when processing user input or displaying the results of a calculation.

ToLong(long* val, int base=10) attempts to convert the string to a signed integer in base base. It returns true on success, in which case the number is stored in the location pointed to by val, or false if the string does not represent a valid number in the given base. The value of base must be between 2 and 36, inclusive, or the special value 0, which means that the usual rules of C numbers are applied: if the number starts with 0x, it is considered to be in base 16; if it starts with 0, it is considered to be in base 8; otherwise, it is considered to be in base 10.

ToULong(unsigned long* val, int base=10) works identically to ToLong, except that the string is converted to an unsigned integer.

ToDouble(double* val) attempts to convert the string to a floating point number. It returns true on success (the number is stored in the location pointed to by val) or false if the string does not represent such a number.

Printf(const wxChar* pszFormat, ...) is similar to the C standard function sprintf, enabling you to put information into a wxString using standard C string formatting. The number of characters written is returned.

static Format(const wxChar* pszFormat, ...) returns a wxString containing the results of calling Printf with the passed parameters. The advantage of Format over Printf is that Format can be used to add to an existing string:

 int n = 10;
 wxString s = wxT("Some Stuff");
 s += wxString::Format(wxT("%d"), n);

operator<< can be used to append an int, a float, or a double to a wxString.
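The base rules described for ToLong match those of the standard C function strtol, so the contract can be sketched without wxWidgets. StringToLong below is a hypothetical helper, not the wxWidgets implementation; it assumes that leftover characters after the number and out-of-range values both count as failure.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the described contract of ToLong:
// returns true and stores the number only if the whole string is a
// valid number in the given base. Not the wxWidgets implementation.
bool StringToLong(const std::string& text, long* val, int base = 10)
{
    if (val == nullptr || text.empty())
        return false;

    const char* start = text.c_str();
    char* end = nullptr;
    errno = 0;
    const long result = std::strtol(start, &end, base);

    // Fail if nothing was parsed, if characters were left over, or
    // if the value was out of range for long.
    if (end == start || *end != '\0' || errno == ERANGE)
        return false;

    *val = result;
    return true;
}
```

With base 0, this helper reads "0x1F" as 31, "017" as 15, and "17" as 17, and it rejects input such as "12abc" while leaving the output variable untouched.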

wxStringTokenizer

wxStringTokenizer helps you to break a string into a number of tokens, replacing and expanding the C function strtok. To use it, create a wxStringTokenizer object and give it the string to tokenize and the delimiters that separate the tokens. By default, white space characters will be used. Then call GetNextToken repeatedly until HasMoreTokens returns false.

 #include <wx/tokenzr.h>

 wxStringTokenizer tkz(wxT("first:second:third:fourth"), wxT(":"));
 while ( tkz.HasMoreTokens() )
 {
     wxString token = tkz.GetNextToken();
     // process token here
 }

By default, wxStringTokenizer will behave in the same way as strtok if the delimiters string contains only white space characters. Unlike the standard functions, however, it will return empty tokens if appropriate for other non-white space delimiters. This is helpful for parsing strictly formatted data where the number of fields is fixed but some of them may be empty, as in the case of tab- or comma-delimited text files.

wxStringTokenizer's behavior is governed by the last constructor parameter, which may be one of the following:

wxTOKEN_DEFAULT: Default behavior as described previously; same as wxTOKEN_STRTOK if the delimiter string contains only white space, and same as wxTOKEN_RET_EMPTY otherwise.
wxTOKEN_RET_EMPTY: In this mode, the empty tokens in the middle of the string will be returned. So a::b: will be tokenized into three tokens a, "", and b.
wxTOKEN_RET_EMPTY_ALL: In this mode, empty trailing tokens (after the last delimiter character) will be returned as well. a::b: will contain four tokens: the same as wxTOKEN_RET_EMPTY and another empty one as the last one.
wxTOKEN_RET_DELIMS: In this mode, the delimiter character after the end of the current token is appended to the token (except for the last token, which has no trailing delimiter). Otherwise, it is the same mode as wxTOKEN_RET_EMPTY.
wxTOKEN_STRTOK: In this mode, the class behaves exactly like the standard strtok function. Empty tokens are never returned.
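The difference between the strtok-like mode and the modes that return empty tokens can be reproduced in standard C++. In this sketch, SplitLikeStrtok wraps the real strtok, while SplitKeepEmpty is a hypothetical splitter written to follow the descriptions of wxTOKEN_RET_EMPTY and wxTOKEN_RET_EMPTY_ALL above; neither is the wxStringTokenizer implementation, and the handling of an empty input is a choice made for this example.

```cpp
#include <cstring>
#include <string>
#include <vector>

// strtok-style splitting: runs of delimiters are skipped, so empty
// tokens never appear. strtok modifies its input, so work on a copy.
std::vector<std::string> SplitLikeStrtok(const std::string& text,
                                         const char* delims)
{
    std::vector<std::string> tokens;
    std::vector<char> copy(text.begin(), text.end());
    copy.push_back('\0');
    for (char* tok = std::strtok(&copy[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.push_back(tok);
    return tokens;
}

// Hypothetical splitter that keeps empty tokens. With keepTrailing
// false it follows the description of wxTOKEN_RET_EMPTY; with true,
// that of wxTOKEN_RET_EMPTY_ALL.
std::vector<std::string> SplitKeepEmpty(const std::string& text,
                                        const std::string& delims,
                                        bool keepTrailing)
{
    std::vector<std::string> tokens;
    if (text.empty())
        return tokens;  // choice made for this sketch: no tokens at all

    std::string::size_type start = 0;
    for (;;)
    {
        const std::string::size_type pos =
            text.find_first_of(delims, start);
        if (pos == std::string::npos)
        {
            // Text after the last delimiter: keep it unless it is an
            // empty trailing token that the caller asked to drop.
            if (start < text.size() || keepTrailing)
                tokens.push_back(text.substr(start));
            break;
        }
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

For the input a::b: with a colon delimiter, SplitLikeStrtok yields two tokens (a and b), SplitKeepEmpty with keepTrailing set to false yields three (a, an empty token, and b), and with keepTrailing set to true it yields four, the last one empty.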

wxStringTokenizer has two other useful accessors:

CountTokens returns the number of remaining tokens in the string, returning 0 when there are no more tokens.
GetPosition returns the current position of the tokenizer in the original string.

wxRegEx

wxRegEx represents a regular expression. This class provides support for regular expression matching and replacement. wxRegEx is either built on top of the system library (if it is available and has support for POSIX regular expressions, which is the case for most modern Unix variants, including Linux and Mac OS X) or uses the built-in library by Henry Spencer. Regular expressions, as defined by POSIX, come in two variations: extended and basic. The built-in library also adds an advanced mode, which is not available when using the system library.

On platforms where a system library is available, the default is to use the built-in library for Unicode builds, and the system library otherwise. Bear in mind that Unicode is fully supported only by the built-in library. It is possible to override the default when building wxWidgets. When using the system library in Unicode mode, the expressions and data are translated to the default 8-bit encoding before being passed to the library.

Use wxRegEx as you would use any POSIX regular expression processor. Due to the advanced nature and specialized uses of regular expressions, please see the wxWidgets documentation for a complete discussion and API reference.
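For readers who want to experiment with POSIX extended syntax before turning to the wxRegEx reference, the following sketch uses the standard library's std::regex (C++11) with its extended grammar as a stand-in. It shows the two operations named above, matching and replacement; ContainsNumber and MaskNumbers are hypothetical names, and nothing here reflects the wxRegEx API.

```cpp
#include <regex>
#include <string>

// Returns true if text contains a run of digits, using the POSIX
// extended grammar of the standard library's std::regex (C++11).
bool ContainsNumber(const std::string& text)
{
    static const std::regex digits("[0-9]+", std::regex::extended);
    return std::regex_search(text, digits);
}

// Replaces every run of digits in text with replacement.
std::string MaskNumbers(const std::string& text,
                        const std::string& replacement)
{
    static const std::regex digits("[0-9]+", std::regex::extended);
    return std::regex_replace(text, digits, replacement);
}
```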