String Basics

The .NET Framework finally brings a unified string definition to the multiple languages targeted at .NET. A string, as far as the Common Type System (CTS) is concerned, is just an array of Unicode characters. The .NET String class provides several methods that allow for easy comparison, concatenation, formatting, and general manipulation of strings.

Understanding the Immutability of Strings

The String class in .NET is immutable; in other words, the string itself is never modified. When characters or other strings are appended to a string, a new string is created as a result. The original string and the string to append are used to generate an entirely new string. Such immutability can cause a performance degradation for applications in which heavy string manipulation is performed. To avoid this reduction in performance, you should make use of the StringBuilder class, which is discussed in the following section, for all heavy-duty string manipulation.

TIP

Actually, there are very few cases in which you would not want to use a StringBuilder. As a general rule, if you perform more than one string concatenation within a scope block (method, for loop, and so on), or even a single very large concatenation, you should remove the standard concatenation and replace it with the use of a StringBuilder.
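
As a rough illustration of the tip above, the following sketch builds the same comma-separated text twice: once with repeated concatenation and once with a StringBuilder. The class and method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Text;

public static class ConcatSketch
{
    // Repeated concatenation: every += produces a brand-new string object.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            if (i > 0) result += ",";
            result += i;
        }
        return result;
    }

    // Same text, but the characters are appended into one StringBuilder buffer.
    public static string JoinWithBuilder(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) builder.Append(',');
            builder.Append(i);
        }
        return builder.ToString();
    }
}
```

Both methods return identical text (for example, "0,1,2,3" for a count of 4); the difference lies only in how many intermediate strings are allocated along the way.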

Applying Formatting to Strings

After declaring a string, the next task is to format data for presentation. This area of string formatting is not well documented and few examples exist to fully illustrate how rich the string-formatting features are within .NET.

Basic string formatting allows data to be inserted at specific locations within a string. These insertion locations are denoted by placeholders, each containing an ordinal value that corresponds to the position of the item to be inserted. For example, consider inserting an integer value within a string. Listing 4.1 shows how to insert values into a string.

Listing 4.1. Simple String Formatting Example

    using System;

    namespace Listing_4_1 {
        class Class1 {
            static void Main(string[] args) {
                int a = 1;
                int b = 2;
                int c = 3;

                // Each {n} placeholder is replaced by the nth argument after the format string
                string OneItem    = string.Format( "Value of a = {0}", a );
                string TwoItems   = string.Format( "Value of a = {0}, b = {1}", a, b );
                string ThreeItems = string.Format( "Value of a = {0}, b = {1}, c = {2}", a, b, c );

                Console.WriteLine( OneItem );
                Console.WriteLine( TwoItems );
                Console.WriteLine( ThreeItems );
            }
        }
    }

Each placeholder represents the zero-based index of the item in the argument list to insert into the string. This is the most basic type of formatting available and also the most often used.

It is also possible to align values either left or right within a padded region at the insertion point. The padding ensures that the inserted item is at least N characters wide, and the alignment determines whether the inserted string is aligned to the left or right of that area: a positive width right-aligns the value and a negative width left-aligns it. Listing 4.2 demonstrates how to make use of padding and alignment.

Listing 4.2. Padding and Alignment in String Formatting

    using System;

    namespace Listing_4_2 {
        class Class1 {
            static void Main(string[] args) {
                // Positive width: right-aligned; negative width: left-aligned
                string rightAlign = string.Format( "[{0,20}]", "Right Aligned" );
                string leftAlign  = string.Format( "[{0,-20}]", "Left Aligned" );

                Console.WriteLine( rightAlign );
                Console.WriteLine( leftAlign );
            }
        }
    }
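
Alignment can be combined with a format specifier in the same placeholder, in the form {index,width:format}. The sketch below is a minimal example under that assumption; the class and method names are hypothetical, and CultureInfo.InvariantCulture is passed only so that the decimal separator does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class AlignmentSketch
{
    // Left-aligns the name in 10 columns and right-aligns the price,
    // formatted with two decimal places, in 8 columns.
    public static string FormatRow(string name, double price)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "[{0,-10}|{1,8:F2}]",
            name,
            price);
    }
}
```

Calling AlignmentSketch.FormatRow("apple", 3.5) pads "apple" with five trailing spaces and "3.50" with four leading spaces, which keeps the columns lined up when several rows are written one after another.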

Beyond basic insertion and padding of values into a string, string formatting also offers the ability to format data such as currency, dates, and hexadecimal values. The formatting options can be separated into two categories: basic and custom. Both categories apply to numeric values and to date values. Tables 4.1 through 4.4 list the basic and custom formatting options for integers and dates.

Table 4.1. Basic Integer Formatting
Specifier	Type	Format	Input	Output
`c`	Currency	{0:c}	250.25	$250.25
			-250.25	-$250.25
`d`	Decimal (whole number)	{0:d}	250	250
			-250	-250
`e`	Scientific	{0:e}	3.14	3.140000e+000
			-3.14	-3.140000e+000
`f`	Fixed point	{0:f}	3.14	3.14
			-3.14	-3.14
`g`	General	{0:g}	3.14	3.14
			-3.14	-3.14
`n`	Number with commas for thousands	{0:n}	25000	25,000.00
			-25000	-25,000.00
`p`	Percent	{0:p}	.25	25.00%
		{0:p2}	.25555	25.56%
`X`	Uppercase hexadecimal	{0:X}	15	F
`x`	Lowercase hexadecimal	{0:x}	15	f

The output shown assumes the en-US culture; exact results (for example, negative currency and percent patterns) can vary with the culture settings and the runtime version.

Table 4.2. Custom Integer Formatting
Specifier	Type	Format	Input	Output
`0`	Zero placeholder	{0:00.0000}	3.14	03.1400
`#`	Digit placeholder	{0:(#).##}	3.14	(3).14
`.`	Decimal point	{0:0.0}	3.14	3.1
`,`	Thousand separator	{0:0,0}	2500.25	2,500
`,.`	Number scaling	{0:0,.}	2000	2 (scales by 1,000)
`%`	Percent	{0:0%}	25	2500% (multiplies by 100 and adds a percent sign)
`;`	Section separator	{0:positive-format;negative-format;zero-format}

With the exception of the section separator (;), custom integer formatting is fairly self-explanatory. The section separator allows up to three different format specifications, and the one that is applied depends on the value being formatted: the first applies to positive values, the second to negative values, and the third to zero. For instance, if you want negative floating-point values to appear in parentheses, the following formatting could be used:

 string result = string.Format("{0:$##,###.00;$(##,###.00);$-.--}", amount);
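
To make the three sections easier to see, here is a smaller sketch that uses a simplified format string rather than the one above. The class name, method name, and the quoted literal 'zero' are assumptions made for this example; CultureInfo.InvariantCulture is used so that the separators are predictable.

```csharp
using System;
using System.Globalization;

public static class SectionSketch
{
    // Section 1 formats positive values, section 2 wraps negative values in
    // parentheses, and section 3 replaces zero with the quoted literal text.
    public static string FormatAmount(decimal amount)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0:#,##0.00;(#,##0.00);'zero'}",
            amount);
    }
}
```

With this format, FormatAmount(1234.5m) yields "1,234.50", FormatAmount(-1234.5m) yields "(1,234.50)", and FormatAmount(0m) yields "zero".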

The next common data type for formatting is the DateTime struct within .NET. There are many options when it comes to formatting dates, and Tables 4.3 and 4.4 list the various formatting specifiers and outcomes for date formatting.

Table 4.3. Basic Date Formatting
Specifier	Description	Format	Result Using `System.DateTime.Now`
`d`	Short date	{0:d}	4/17/2004
`D`	Long date	{0:D}	Saturday, April 17, 2004
`t`	Short time	{0:t}	11:50 AM
`T`	Long time	{0:T}	11:50:30 AM
`f`	Full date and time	{0:f}	Saturday, April 17, 2004 11:51 AM
`F`	Long full date and time	{0:F}	Saturday, April 17, 2004 11:51:45 AM
`g`	Default date and time	{0:g}	4/17/2004 11:53 AM
`G`	Long default date and time	{0:G}	4/17/2004 11:53:45 AM
`M` or `m`	Month day	{0:M}	April 17
`R` or `r`	RFC1123 date string	{0:r}	Sat, 17 Apr 2004 11:55:17 GMT
`s`	Sortable date string ISO 8601	{0:s}	2004-04-17T11:56:22
`u`	Universal sortable date pattern	{0:u}	2004-04-17 11:58:11Z
`U`	Universal full date and time (value converted to UTC)	{0:U}	Saturday, April 17, 2004 3:58:32 PM
`Y` or `y`	Year month pattern	{0:Y}	April, 2004

Table 4.4. Custom Date Formatting
Specifier	Description	Format
`d`	Displays the day of the month as a number (1-31)	{0:%d}
`dd`	Displays the day of the month with a leading zero for single-digit days	{0:dd}
`ddd`	Displays the abbreviated day of the week	{0:ddd}
`dddd`	Displays the full name of the day of the week	{0:dddd}
`f,ff,fff,ffff...`	Displays seconds fractions in one or more digits	{0:%f} or {0:ff}
`g` or `gg`	Displays the era, such as B.C. or A.D.	{0:%g}
`h`	Displays the hour from 1-12	{0:%h}
`hh`	Displays the hour from 1-12 with leading zero	{0:hh}
`H`	Displays the hour in 24-hour format, 0-23	{0:%H}
`HH`	Displays the hour in 24-hour format, 0-23, with leading zero for single-digit hours	{0:HH}
`m`	Displays the minute as an integer	{0:%m}
`mm`	Displays the minute as an integer with leading zero for single-digit minute values	{0:mm}
`M`	Displays the month as an integer	{0:%M}
`MM`	Displays the month as an integer with leading zero for single-digit month values	{0:MM}
`MMM`	Displays the abbreviated month name	{0:MMM}
`MMMM`	Displays the full name of the month	{0:MMMM}
`s`	Displays the seconds as an integer	{0:%s}
`ss`	Displays the seconds as an integer with a leading zero for single-digit second values	{0:ss}
`t`	Displays the first character of A.M. or P.M.	{0:%t}
`tt`	Displays the full A.M. or P.M.	{0:tt}
`y`	Displays two-digit year, with no preceding 0 for values 0-9	{0:%y}
`yy`	Displays two-digit year	{0:yy}
`yyyy`	Displays four-digit year	{0:yyyy}
`zz`	Displays the time zone offset in whole hours	{0:zz}
`:`	Time separator	{0:hh:mm:ss}
`/`	Date separator	{0:MM/dd/yyyy}

When a custom specifier is used on its own as the entire format string, it must be prefixed with % (for example, {0:%d}); otherwise a single character is interpreted as one of the basic specifiers from Table 4.3.
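
The following sketch applies a few of these specifiers to a fixed date so that the results are repeatable. The helper class and method are hypothetical; the invariant culture is an assumption chosen so that day and month names come out in English regardless of the machine's settings.

```csharp
using System;
using System.Globalization;

public static class DateFormatSketch
{
    // Formats a date with the given pattern using the invariant culture.
    public static string Format(string pattern, DateTime value)
    {
        return value.ToString(pattern, CultureInfo.InvariantCulture);
    }

    // A fixed sample value: April 17, 2004, 11:50:30 in the morning.
    public static DateTime Sample()
    {
        return new DateTime(2004, 4, 17, 11, 50, 30);
    }
}
```

For the sample value, the pattern "yyyy-MM-dd HH:mm:ss" gives "2004-04-17 11:50:30", "dddd, MMMM d" gives "Saturday, April 17", and "h:mm tt" gives "11:50 AM". Note the difference between "d", which is treated as the basic short date specifier and gives "04/17/2004" under the invariant culture, and "%d", which is the custom day-of-month specifier and gives "17".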

Using Escape Sequences

It is often necessary to include special characters in a string, such as a tab, a newline, or even the \ character itself. To insert such characters, use the escape character (\), which tells the C# compiler to treat the character that follows as part of an escape sequence (for example, \t for a tab or \n for a newline). To insert the escape character itself, escape it with another escape character. The following code illustrates this:

 string escapeMe = string.Format( "C:\\SAMS\\Code" );

With the escape character in place, the value of escapeMe is "C:\SAMS\Code".

FORMATTING NOTES

If you don't want to use the double-backslash (\\) syntax, C# provides a special shortcut. Preceding a string literal with the @ symbol creates a verbatim string literal, in which backslashes are taken literally rather than treated as escape characters, enabling you to write code that looks like this:

 string myFile = @"C:\SAMS\Code\File.txt";

As a special note, the { and } characters can also cause difficulty when you attempt to use them in a string that contains format placeholders. To display a brace character itself, double it: use {{ for { and }} for }. This is necessary only when the string is passed to a formatting method, as in the following:

 string myString = string.Format( "{{x}} = {0}", x );

Otherwise, if no other formatting is taking place, just use a single { character.
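
The escaping rules in this section can be checked with a short sketch. The class and method names below are hypothetical and serve only to illustrate the three cases discussed: an escaped backslash, a verbatim literal, and doubled braces in a format string.

```csharp
using System;

public static class EscapeSketch
{
    // A regular literal: each \\ in the source becomes one backslash in the string.
    public static string RegularPath()
    {
        return "C:\\SAMS\\Code";
    }

    // A verbatim literal: backslashes are taken as written.
    public static string VerbatimPath()
    {
        return @"C:\SAMS\Code";
    }

    // Doubled braces produce literal braces; {0} is still a placeholder.
    public static string Braces(int x)
    {
        return string.Format("{{x}} = {0}", x);
    }
}
```

RegularPath and VerbatimPath return the same 12-character string, and Braces(5) returns "{x} = 5".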

Locating Substrings

One of the most common string-processing requirements is the locating of substrings within a string. The System.String class provides several methods for locating substrings and each method in turn provides several overloaded versions of itself. Table 4.5 details the methods for locating substrings.

Table 4.5. Substring Methods of the `System.String` Class
Method	Description
`EndsWith`	Used to determine whether a string ends with a specific substring. Returns true or false.
`IndexOf`	Returns the first index (zero-based) location of the supplied substring or character. Returns -1 if the substring is not found.
`IndexOfAny`	Returns the first index (zero-based) location of any character in the supplied array of characters. Returns -1 if none of the characters is found.
`LastIndexOf`	Returns the last index of the specified substring or character. Returns -1 if the substring is not found.
`LastIndexOfAny`	Returns the last index of any character in the supplied array of characters. Returns -1 if none of the characters is found.
`StartsWith`	Used to determine whether a string starts with a specific substring. Returns true or false.
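
A brief sketch of the methods in Table 4.5 follows. The sample text and the wrapper methods are hypothetical; StringComparison.Ordinal is passed to the string-based searches as an assumption, so that matching does not depend on culture-specific comparison rules.

```csharp
using System;

public static class SearchSketch
{
    // Hypothetical sample text searched by every method below.
    public const string Sample = "The quick brown fox";

    // First zero-based index of a substring, or -1 when it is absent.
    public static int First(string value)
    {
        return Sample.IndexOf(value, StringComparison.Ordinal);
    }

    // First index at which any of the candidate characters occurs.
    public static int FirstOfAny(params char[] candidates)
    {
        return Sample.IndexOfAny(candidates);
    }

    // Last index of a single character.
    public static int Last(char value)
    {
        return Sample.LastIndexOf(value);
    }

    // True when the sample both starts with prefix and ends with suffix.
    public static bool HasPrefixAndSuffix(string prefix, string suffix)
    {
        return Sample.StartsWith(prefix, StringComparison.Ordinal)
            && Sample.EndsWith(suffix, StringComparison.Ordinal);
    }
}
```

For this sample, First("quick") is 4, First("cat") is -1, FirstOfAny('x', 'q') is 4 (the 'q' comes before the 'x'), and Last('o') is 17.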

Adding Padding

Just as with format specifiers and padding, the String class provides a set of padding methods that pad a string with a space or specified character. Padding can be used to pad spaces or characters to the left or right of the target string. The code in Listing 4.3 shows how to pad a string to 20 characters in length with leading spaces.

Listing 4.3. 20 Characters Wide String Left Padded with Spaces

    string leftPadded = "Left Padded";
    // The first line is a ruler that makes the column positions visible
    Console.WriteLine( "123456789*123456789*123456789*" );
    Console.WriteLine( leftPadded.PadLeft( 20, ' ' ) );

The output of the code in Listing 4.3 is as follows:

    123456789*123456789*123456789*
             Left Padded
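
Padding is not limited to spaces. The sketch below, with hypothetical class and method names, pads on the left with zeros and on the right with dots using the PadLeft and PadRight overloads that accept a padding character.

```csharp
using System;
using System.Globalization;

public static class PadSketch
{
    // Pads the digits of a non-negative number with leading zeros.
    // (A negative number would end up with zeros before its minus sign.)
    public static string ZeroPad(int number, int width)
    {
        return number.ToString(CultureInfo.InvariantCulture).PadLeft(width, '0');
    }

    // Pads a label on the right with dots, as in a table of contents.
    public static string DotLeader(string label, int width)
    {
        return label.PadRight(width, '.');
    }
}
```

ZeroPad(42, 5) returns "00042" and DotLeader("Name", 8) returns "Name....". If the string is already at least as long as the requested width, both padding methods return it unchanged.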

Trimming Characters

Sometimes it is necessary to remove characters from a string, and this is the purpose of trimming. The Trim method allows for the removal of white space or specified characters from both the start and the end of a string. By default, the Trim method removes leading and trailing white space from a string.

In addition, there are two other trimming methods. TrimStart removes white space or a list of specified characters from the beginning of a string. TrimEnd removes white space or a list of specified characters from the end of a string.

You can access the Trim method and others like it on any string variable, as shown here:

 string myTrimmedString = myString.Trim();
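
The three trimming methods can be compared side by side in a small sketch. The class and method names are hypothetical, and the asterisk is just an arbitrary character chosen for the example.

```csharp
using System;

public static class TrimSketch
{
    // Removes leading and trailing white space.
    public static string TrimBoth(string value)
    {
        return value.Trim();
    }

    // Removes asterisks from the start of the string only.
    public static string TrimLeadingStars(string value)
    {
        return value.TrimStart('*');
    }

    // Removes asterisks from the end of the string only.
    public static string TrimTrailingStars(string value)
    {
        return value.TrimEnd('*');
    }
}
```

TrimBoth("  hello  ") returns "hello", TrimLeadingStars("***hello***") returns "hello***", and TrimTrailingStars("***hello***") returns "***hello". In every case the original string is left untouched and a new string is returned.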

Replacing Characters

To replace characters or substrings in a string, use the Replace method. For instance, to remove display formatting from a phone number such as (919) 555-1212, the following code can be used:

    string phoneNumber = "(919) 555-1212";
    // Each Replace call returns a new string, so the calls can be chained
    string fixedPhoneNumber = phoneNumber.Replace( "(", "" )
                                         .Replace( ")", "" )
                                         .Replace( "-", "" )
                                         .Replace( " ", "" );
    Console.WriteLine( fixedPhoneNumber );

Notice how the Replace method is used. Each call to Replace returns a new string rather than modifying the original, which is why the calls are chained: each call operates on the result of the previous one until all unwanted characters have been removed.

REPLACING WITH EMPTY STRINGS

When you want to remove a character by replacing it with nothing, you must use the string version of Replace with an empty string ("") rather than the empty character '' notation; an empty character literal is not valid C#, and the compiler reports it as an error.

Splitting Strings

String splitting comes in handy for parsing comma-separated values or any other string with known separator characters. The Split method requires nothing more than a character parameter that denotes where to split the string. The result of this operation is an array of strings in which each element is a substring of the original string. To split a comma-separated list such as apple,orange,banana, merely invoke the Split method, passing in the comma as the split token. The following code demonstrates the result:

    string fruit = "apple,orange,banana";
    string[] fruits = fruit.Split( ',' );
    foreach (string fruitName in fruits)
        Console.WriteLine( fruitName );

    //Result
    //fruits[0] -> apple
    //fruits[1] -> orange
    //fruits[2] -> banana
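
Real input is often messier than the example above, with stray spaces or empty entries. The following sketch shows one possible way to handle that, assuming the Split overload that accepts StringSplitOptions is available in your version of the framework; the class and method names are hypothetical.

```csharp
using System;

public static class SplitSketch
{
    // Splits on commas, drops empty entries, and trims white space
    // from each remaining item.
    public static string[] ParseList(string csv)
    {
        string[] parts = csv.Split(
            new char[] { ',' },
            StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }
}
```

ParseList("apple, orange,,banana") returns three items: "apple", "orange", and "banana". Note that an entry consisting only of spaces is not considered empty by Split, so it would survive as an empty string after trimming.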

Modifying Case

The last two major methods of the String class involve changing the case of a string. The case can be changed to uppercase or lowercase and results in a new string of the specified case. Remember that strings are immutable and any action that modifies a string results in a new string. Therefore, the following takes place:

    string attemptToUpper = "attempt to upper";
    attemptToUpper.ToUpper( );
    //attemptToUpper is still all lower case because the result was discarded

To see the effect of the ToUpper() method, the result string has to be assigned to a variable. The following illustrates the proper use of ToUpper():

    string allLower = "all lower";
    string ALL_UPPER = allLower.ToUpper( );
    //ALL_UPPER -> "ALL LOWER"

The `StringBuilder`

To improve performance, the StringBuilder class is designed to manage an array of characters via direct manipulation. Such an implementation eliminates the need to constantly allocate new strings. This improves performance by saving the garbage collector from tracking and reclaiming small chunks of memory, as would be the case using standard string functions and concatenation. The StringBuilder class is located in the System.Text namespace.

Appending Values

The most basic use of the StringBuilder class is to perform string concatenation, which is the process of building a result string from various other strings and values until the final string is complete. The StringBuilder class provides an Append method. The Append method is used to append values to the end of the current string. Values can be integer, boolean, char, string, DateTime, and a list of others. In fact, the Append method has 19 overloads in order to accommodate any value you need to append to a string.
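
Because each Append overload returns the same StringBuilder instance, calls can be chained. The sketch below is a minimal illustration with a hypothetical class and method; it appends a string, an integer, a Boolean, and a character in turn.

```csharp
using System;
using System.Text;

public static class AppendSketch
{
    // Builds a status line from values of several different types.
    public static string BuildStatus(int itemCount, bool done)
    {
        StringBuilder builder = new StringBuilder();
        builder.Append("Items: ")
               .Append(itemCount)
               .Append(", done: ")
               .Append(done)
               .Append('!');
        return builder.ToString();
    }
}
```

BuildStatus(3, true) returns "Items: 3, done: True!". No intermediate strings are created for the partial results; the final string is produced only when ToString is called.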

Using `AppendFormat`

In addition to appending values to the current string, StringBuilder also provides the ability to append formatted strings. The format specifiers are the same specifiers listed in the previous section. The AppendFormat method is provided in order to avoid calls to string.Format(...) and the unnecessary creation of additional strings.
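
As a small sketch of AppendFormat, the following hypothetical helper writes name/value pairs with the value shown as two-digit uppercase hexadecimal. The class name, method name, and output layout are assumptions made for the example; the invariant culture is passed so that the formatting does not vary by machine.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AppendFormatSketch
{
    // Appends one "name=0xNN;" entry per pair directly into the builder.
    public static string BuildReport(string[] names, int[] values)
    {
        StringBuilder report = new StringBuilder();
        for (int i = 0; i < names.Length; i++)
        {
            report.AppendFormat(
                CultureInfo.InvariantCulture,
                "{0}=0x{1:X2};",
                names[i],
                values[i]);
        }
        return report.ToString();
    }
}
```

For the names "red" and "green" with the values 255 and 8, BuildReport returns "red=0xFF;green=0x08;".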

Inserting Strings

The insertion of strings is another useful method provided by the StringBuilder class. The Insert method takes two parameters. The first parameter specifies the zero-based index at which to begin the insertion. The second parameter is the value to insert at the specified location. Similar to the Append method, the Insert method provides 18 different overloads in order to support various data types for insertion into the string. Listing 4.4 shows the usage of the Insert method.

Listing 4.4. Using the `Insert` Method to Create a SQL Statement

    using System;
    using System.Text;

    namespace Listing_4_4 {
        class Class1 {
            [STAThread]
            static void Main(string[] args) {
                StringBuilder stmtBuilder = new StringBuilder( "SELECT FROM MYTABLE" );

                Console.Write( "Enter Columns to select from MYTABLE: " );
                string columns = Console.ReadLine( );   //FirstName, LastName

                //insert the column names after "SELECT " (index 7)
                stmtBuilder.Insert( 7, columns );

                //insert a space after the column names
                stmtBuilder.Insert( 7 + columns.Length, " " );

                //SELECT FirstName, LastName FROM MYTABLE
                Console.WriteLine( stmtBuilder.ToString( ) );
            }
        }
    }

Replacing Strings and Characters

You might run across a need to generate strings based on templates where certain tokens (substrings) are later replaced with values. In fact, this is how Visual Studio .NET works. There is a template file from which each project is created. The new source file that is created is generated from a template and various tokens are replaced based on the type of project, the name of the project, and other options. You can achieve this same effect using the Replace method provided by StringBuilder.

Using the Replace method, it is possible to create template strings, such as SQL statements, and replace tokens with actual values as demonstrated by the following code:

    //fieldList and tableName are strings supplied by the surrounding code
    StringBuilder selectStmtTemplate = new StringBuilder();
    selectStmtTemplate.Append( "SELECT $FIELDS FROM $TABLE" );
    selectStmtTemplate.Replace( "$FIELDS", fieldList );
    selectStmtTemplate.Replace( "$TABLE", tableName );

Removing Substrings

The Remove method allows for sections of the underlying string to be completely removed from the StringBuilder object. The Remove method takes two parameters. The first parameter specifies the zero-based index of the position denoting the starting point. The second parameter specifies the length or number of characters to remove.
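
A minimal sketch of Remove follows; the class and method names are hypothetical. It copies the text into a StringBuilder, removes the requested range, and returns the result.

```csharp
using System;
using System.Text;

public static class RemoveSketch
{
    // Removes 'length' characters beginning at the zero-based index 'start'.
    public static string RemoveRange(string text, int start, int length)
    {
        StringBuilder builder = new StringBuilder(text);
        builder.Remove(start, length);
        return builder.ToString();
    }
}
```

RemoveRange("Hello cruel world", 5, 6) removes the six characters " cruel" that begin at index 5 and returns "Hello world".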