Processing Strings with the Inline Assembler | Visual C++ Optimization with Assembly Code

The inline assembler can be used for processing strings. Despite the fact that the C++ .NET development environment has powerful string processing procedures, the use of the inline assembler appears to be effective in this case as well. It is often required that string variables be processed in a specific manner, and implementation of such processing with standard procedures is cumbersome and slow. First, we will discuss the most widely used types of strings and methods of converting them.

Like most high-level programming languages, C++ .NET makes wide use of null-terminated strings. Many functions for manipulating such strings are available in this development environment. Optimization of processing such strings with assembly procedures was discussed in Chapters 2 and 3.

However, C++ .NET uses other string types as well. The difficulty of manipulating null-terminated strings led Microsoft developers to create the CString class, which became very popular among programmers. A CString string is a variable-length sequence of characters. The characters can be either 16-bit (the Unicode encoding) or 8-bit (the ANSI encoding). String manipulations are performed with the methods of the CString class. This class has powerful functions whose capabilities exceed those of standard C functions such as strcat or strcpy.

To initialize a CString object, declare it with an initial value:

 CString s = "This is a CString string";

You can assign the value of one CString object to another object:

 CString s1 = "This is a test string";
 CString s2 = s1;

In this piece of code, the contents of s1 are copied to s2. To concatenate two or more strings, you can use the + or += operators:

 CString s1 = "String 1";
 CString s2 = "String 2";
 s1 += " is concatenated";
 CString sres = s1 + " with " + s2;

The result of this piece of code is the following string:

 String 1 is concatenated with String 2

To manipulate individual elements of a CString string, use the GetAt and SetAt functions of this class. The first element of a string always has index zero. For example, to get the character that has index 3 in the s1 string whose value is STRING 1, execute the following statement:

 s1.GetAt (3)

The same result can be achieved with the [] operator. In that case, the string element is accessed like an array element:

 s1[3]

The result of this operation is the 'I' character. To put the 'g' character at the position with index 5 (the sixth element) in the same string, use the statement

 s1.SetAt(5, 'g')
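Zero-based indexing can be checked outside an MFC project as well. The following is a minimal sketch that uses std::string as a stand-in for CString (an assumption made only so that the example compiles with any C++ compiler); the helper name with_char_at is hypothetical and is not part of the CString class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `s` in which the character at
// zero-based position `index` is replaced by `ch`. It plays the role that
// SetAt plays for a CString, but works on std::string.
std::string with_char_at(std::string s, std::size_t index, char ch) {
    assert(index < s.size());  // the index must refer to an existing character
    s[index] = ch;             // operator[] uses zero-based positions
    return s;
}
```

With the value STRING 1, the character at index 3 is 'I', and writing 'g' at index 5 yields STRINg 1.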

The most powerful function of the CString class is Format. It converts data of other types to text and is similar to the standard functions sprintf and wsprintf. In the previous examples, we used this function to output the elements of an array to an edit control. Here is a small piece of code:

 for (int cnt = 0; cnt < size_i1; cnt++)
 {
   s1.Format("%d", i1[cnt]);
   s_Src = s_Src + " " + s1;
 }

This code builds a text representation of the integer array i1 in s_Src, a CString variable associated with an edit control, so the text appears in that control. The auxiliary variable s1, also of the CString type, holds each array element converted to text. The statement

 s_Src = s_Src + " " + s1;

is already familiar: it appends a space and the converted element to s_Src, which accumulates the text to be displayed.
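The same convert-and-append pattern can be sketched in standard C++. This is an illustration under stated assumptions, not the book's code: std::snprintf stands in for CString::Format, std::string stands in for CString, and the function name join_ints is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: converts each element of `values` to decimal text and
// appends it to the result, preceded by a space, just as the loop above
// appends " " + s1 to s_Src.
std::string join_ints(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    std::string out;
    char buf[16];  // enough for a 32-bit int: sign, 10 digits, terminator
    for (std::size_t i = 0; i < count; ++i) {
        std::snprintf(buf, sizeof buf, "%d", values[i]);  // like Format("%d", ...)
        out += " ";
        out += buf;
    }
    return out;
}
```

Because the space is appended before each element, the resulting text starts with a space, exactly as the content of s_Src does in the loop above.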

As you can see, the CString class significantly simplifies work with strings (although we looked at only a few of its features). How can you manipulate CString objects with the inline assembler?

It is best to illustrate this with an example. Consider the following task: Suppose you need to replace all spaces in a CString string with '+' characters and display the result.

To implement the task, develop a dialog-based application and place three edit controls, three static text labels, and a button on its main form. Associate the s1 variable of the CString type with the Edit1 control, the s2 variable of the CString type with the Edit2 control, and the length_s1 integer variable with the Edit3 control.

The original string with spaces is entered into the Edit1 control, the Edit2 control displays the result of replacing spaces with '+' characters, and the Edit3 control displays the string length.

Now, look at a fragment of the C++ .NET code that processes the string (Listing 10.43).

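Listing 10.43: An OnBnClicked handler that processes a CString string with C++ .NET statements

 void CReplacecharinStringDlg::OnBnClickedButton1()
 {
   // TODO: Add your control notification handler code here
   UpdateData(TRUE);                 // transfer the control contents to the variables
   LPSTR lps2;
   s_Len = strlen((LPCTSTR)s1);      // length of the source string
   s2 = s1;
   lps2 = s2.GetBuffer(128);         // writable pointer to the characters of s2
   for (int cnt = 0; cnt < s_Len; cnt++)
   {
     if (*lps2 == ' ') *lps2 = '+';  // replace each space with a plus
     lps2++;
   }
   s2.ReleaseBuffer();               // release the buffer before s2 is used again
   UpdateData(FALSE);                // transfer the variables back to the controls
 }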

As you know, to access an arbitrary element of a string or array, you have to know the address and the size of the array and the type of its elements. For null-terminated strings, the address of a string is the address of its first element. The string elements are accessed by indexing from the string address.

To access the elements of a CString string, use the GetBuffer function and pass it the minimum buffer size as a parameter. In this case, 128 will be enough. This function returns a pointer to the buffer, allowing you to work with individual elements in the same fashion as in common string processing functions. If you use the following statements:

 ...
 LPSTR lps2;
 ...
 lps2 = s2.GetBuffer(128);
 ...

you will get the address of the string buffer. Now, you have to determine the string length. This is simple: Use the classic strlen function and store the result in the s_Len variable:

 s_Len = strlen((LPCTSTR)s1);

Now, search for spaces in the string buffer and replace them with '+' characters. This can be done with a for loop. After you finish all manipulations, release the buffer by calling ReleaseBuffer (the parentheses are required; without them no call is made):

 s2.ReleaseBuffer();

The window of the application is shown in Fig. 10.33.

Fig. 10.33: Window of an application that substitutes spaces with plus characters in a CString string
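The replacement loop itself does not depend on MFC. A minimal sketch of the same technique in standard C++ follows, assuming a plain char buffer in place of the pointer returned by GetBuffer; the names replace_spaces, replace_spaces_cstr, and replace_spaces_copy are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Pointer-walking version shaped like the loop in Listing 10.43:
// visit `len` characters and overwrite every space with '+'.
void replace_spaces(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    for (std::size_t cnt = 0; cnt < len; ++cnt, ++buf) {
        if (*buf == ' ') *buf = '+';
    }
}

// Convenience wrapper for a null-terminated string: the length is
// obtained with strlen, as in the handler.
void replace_spaces_cstr(char* s) {
    assert(s != nullptr);
    replace_spaces(s, std::strlen(s));
}

// The same result with the standard algorithm, working on a copy.
std::string replace_spaces_copy(std::string s) {
    std::replace(s.begin(), s.end(), ' ', '+');
    return s;
}
```

The first function is the form that maps most directly onto an assembly loop, because it already works with a raw address and an explicit character count.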

You can optimize the previous program by replacing the for loop with an assembly procedure. The source code of the procedure (we will name it replaceChar ) is shown in Listing 10.44.

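Listing 10.44: An assembly function that looks for and replaces characters in a CString string

 void CReplaceCharinCStringwithBASMDlg::replaceChar(char* ps1, int ls1)
 {
   _asm {
         mov   EDI, ps1               ; EDI = address of the string buffer
         mov   ECX, ls1               ; ECX = number of characters to scan
         jecxz ex                     ; skip the loop for an empty string
         cld                          ; scan forward (EDI is incremented)
         mov   AL, ' '                ; character to look for
       next:
         scasb                        ; compare AL with [EDI], then advance EDI
         je    change
       cont:
         loop  next                   ; decrement ECX, repeat while ECX != 0
         jmp   ex
       change:
         mov   byte ptr [EDI-1], '+'  ; the matched space is one byte behind EDI
         jmp   cont
       ex:
   };
 }

This procedure takes the address of a string buffer and the string length as parameters. The buffer address is loaded into the EDI register, and the string length into the ECX register. The search uses the scasb string instruction, which compares the contents of the AL register (the space character) with the byte at the address in EDI and then, because the direction flag was cleared with cld, increments EDI by one. The loop instruction repeats the scan once per character, so the number of iterations equals the string length. Because EDI already points past the byte that was just compared, a space that has been found is located at EDI-1 and is replaced with the '+' character by the following instruction: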

 mov byte ptr [EDI-1], '+'
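The reason for the offset of one byte is easier to see when the register behavior is written out in C++. The following is a sketch under stated assumptions, not a translation produced by the compiler: the pointer p stands in for EDI, the counter n stands in for ECX, and the function name scan_and_replace is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Models the scasb/loop pair: compare, advance the pointer, and only then
// patch the byte that was compared, which now lies one position behind.
void scan_and_replace(char* buf, std::size_t len, char target, char repl) {
    assert(buf != nullptr || len == 0);
    char* p = buf;        // plays the role of EDI
    std::size_t n = len;  // plays the role of ECX
    while (n != 0) {      // an empty string performs no iterations
        bool match = (*p == target);  // scasb: compare AL with [EDI]
        ++p;                          // scasb: EDI moves to the next byte
        if (match) {
            p[-1] = repl;             // the compared byte is at p - 1
        }
        --n;                          // loop: decrement the counter
    }
}
```

Writing through p instead of p[-1] after the increment would modify the character that follows each space, which is the error the offset in the assembly instruction avoids.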

The source code of the OnBnClicked handler after the changes are made is shown in Listing 10.45.

Listing 10.45: The use of an assembly procedure for searching for and replacing characters in a CString string

 void CReplaceCharinCStringwithBASMDlg::OnBnClickedButton1()
 {
   // TODO: Add your control notification handler code here
   UpdateData(TRUE);                 // transfer the control contents to the variables
   LPSTR lps2;
   length_s1 = strlen((LPCTSTR)s1);  // length of the source string
   s2 = s1;
   lps2 = s2.GetBuffer(128);         // writable pointer to the characters of s2
   replaceChar(lps2, length_s1);     // the assembly procedure replaces the spaces
   s2.ReleaseBuffer();               // release the buffer before s2 is used again
   UpdateData(FALSE);                // transfer the variables back to the controls
 }

Processing strings with the inline assembler is especially advantageous when specific manipulations with string elements are required, or when the string processing algorithm is complicated. For such tasks, code developed only in C++ often becomes excessively complicated and slow. A wise combination of C++ and the assembler is the best solution in this case.

We looked at the features of the Visual C++ .NET 2003 inline assembler that assist you in developing effective applications. Attention was given to techniques of using the inline assembler in practice when working with various data types. We emphasize that the material presented here does not exhaust the possibilities of modern data processing technologies such as MMX and SSE, but it creates a good basis for future work in this direction.