Processing Text Strings with Program Code


As you learned in the preceding exercises, you can quickly open, edit, and save text files to disk with the TextBox control and a handful of well-chosen program statements. Visual Basic also provides a number of powerful statements and functions specifically designed for processing the textual elements in your programs. In this section, you'll learn how to extract useful information from a text string, copy a list of strings into an array, and sort a list of strings.

An extremely useful skill to develop when working with textual elements is the ability to sort a list of strings. The basic concepts in sorting are simple. You draw up a list of items to sort, and then compare the items one by one until the list is sorted in ascending or descending alphabetical order.

In Visual Basic, you compare one item with another by using the same relational operators that you use to compare numeric values. The tricky part (which sometimes provokes long-winded discussion among computer scientists) is the specific sorting algorithm you use to compare elements in a list. We won't get into the advantages and disadvantages of different sorting algorithms in this chapter. (The bone of contention is usually speed, which makes a difference only when several thousand items are sorted.) Instead, we'll explore how the basic string comparisons are made in a sort. Along the way, you'll learn the skills necessary to sort your own text boxes, list boxes, files, and databases.

Processing Strings by Using Methods and Keywords

The most common task you've accomplished so far with strings is concatenating them by using the concatenation operator (&). For example, the following program statement concatenates three literal string expressions and assigns the result “Bring on the circus!” to the string variable Slogan:

Dim Slogan As String
Slogan = "Bring" & " on the " & "circus!"

You can also concatenate and manipulate strings by using methods in the String class of the .NET Framework library. For example, the String.Concat method allows equivalent string concatenation by using this syntax:

Dim Slogan As String
Slogan = String.Concat("Bring", " on the ", "circus!")

Visual Basic 2005 features two methods for string concatenation and many other string-processing tasks: You can use operators and functions from earlier versions of Visual Basic (Mid, UCase, LCase, and so on), or you can use newer methods from the .NET Framework (Substring, ToUpper, ToLower, and so on). There's no real “penalty” for using either string-processing technique, although the older methods exist primarily for compatibility purposes. (By providing both methods, Microsoft hopes to welcome upgraders and let them learn new features at their own pace.) In the rest of this chapter, I'll focus on the newer string-processing functions from the .NET Framework String class. However, you can use either string-processing method or a combination of both.
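To see the two styles side by side, here's a minimal sketch that runs the same two operations with the older functions and the newer String methods. The module name and the variable names are made up for illustration. One difference to watch for is that Mid counts characters from 1, but Substring counts from 0.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module OldAndNewStringFunctions
    Sub Main()
        Dim Slogan As String = "Bring on the circus!"

        ' Older Visual Basic functions
        Dim OldUpper As String = UCase(Slogan)
        Dim OldPiece As String = Mid(Slogan, 1, 5)      ' Mid counts from 1

        ' Newer .NET Framework String methods
        Dim NewUpper As String = Slogan.ToUpper()
        Dim NewPiece As String = Slogan.Substring(0, 5) ' Substring counts from 0

        Console.WriteLine(OldUpper = NewUpper)   ' True
        Console.WriteLine(OldPiece = NewPiece)   ' True (both hold "Bring")
    End Sub
End Module
```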

The table here lists several of the .NET Framework methods that appear in subsequent exercises and their close equivalents in the Visual Basic programming language. The fourth column in the table provides sample code for the methods in the String class of the .NET Framework.

| .NET Framework method | Visual Basic function | Description | .NET Framework example |
|---|---|---|---|
| ToUpper | UCase | Changes letters in a string to uppercase. | Dim Name, NewName As String : Name = "Kim" : NewName = Name.ToUpper 'NewName = "KIM" |
| ToLower | LCase | Changes letters in a string to lowercase. | Dim Name, NewName As String : Name = "Kim" : NewName = Name.ToLower 'NewName = "kim" |
| Length | Len | Determines the number of characters in a string. | Dim River As String : Dim Size As Short : River = "Mississippi" : Size = River.Length 'Size = 11 |
| Substring | Mid | Returns a fixed number of characters in a string from a given starting point. (Note: The first element in a string has an index of 0.) | Dim Cols, Middle As String : Cols = "First Second Third" : Middle = Cols.Substring(6, 6) 'Middle = "Second" |
| IndexOf | InStr | Finds the starting point of one string within a larger string. | Dim Name As String : Dim Start As Short : Name = "Abraham" : Start = Name.IndexOf("h") 'Start = 4 |
| Trim | Trim | Removes leading and trailing spaces from a string. | Dim Spacey, Trimmed As String : Spacey = "   Hello   " : Trimmed = Spacey.Trim 'Trimmed = "Hello" |
| Remove | | Removes characters from the middle of a string. | Dim RawStr, CleanStr As String : RawStr = "Hello333 there!" : CleanStr = RawStr.Remove(5, 3) 'CleanStr = "Hello there!" |
| Insert | | Adds characters to the middle of a string. | Dim Oldstr, Newstr As String : Oldstr = "Hi Felix" : Newstr = Oldstr.Insert(3, "there ") 'Newstr = "Hi there Felix" |
| | StrComp | Compares strings and disregards case differences. | Dim str1 As String = "Soccer" : Dim str2 As String = "SOCCER" : Dim Match As Short : Match = StrComp(str1, str2, CompareMethod.Text) 'Match = 0 [strings match] |
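Several of these methods are often used together to pull a piece of information out of a longer string. The following is a minimal sketch, not code from the sample projects: FirstWord is a hypothetical helper that trims a string, finds the first space with IndexOf, and returns everything before it with Substring.

```vbnet
Imports System

Module ExtractWords
    ' Hypothetical helper: returns the first word in Source.
    Function FirstWord(ByVal Source As String) As String
        Dim Cleaned As String = Source.Trim()          ' drop leading and trailing spaces
        Dim SpacePos As Integer = Cleaned.IndexOf(" ") ' -1 means no space was found
        If SpacePos = -1 Then
            Return Cleaned                             ' the whole string is one word
        End If
        Return Cleaned.Substring(0, SpacePos)          ' characters before the first space
    End Function

    Sub Main()
        Console.WriteLine(FirstWord("   First Second Third   "))  ' First
        Console.WriteLine(FirstWord("Mississippi"))               ' Mississippi
    End Sub
End Module
```

IndexOf returns -1 when the search string isn't found, so the helper checks for that value before calling Substring.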

Sorting Text

Before Visual Basic can compare one character with another in a sort, it must convert each character into a number by using a translation table called the ASCII character set (also called the ANSI character set). ASCII is an acronym for American Standard Code for Information Interchange. Each of the basic symbols that you can display on your computer has a different ASCII code. These codes include the basic set of “typewriter” characters (codes 32 through 126) and special “control” characters, such as tab, linefeed, and carriage return (codes 0 through 31, plus code 127 for delete). For example, the lowercase letter “a” corresponds to the ASCII code 97, and the uppercase letter “A” corresponds to the ASCII code 65. As a result, Visual Basic treats these two characters quite differently when sorting or performing other comparisons.

In the 1980s, IBM extended ASCII with codes 128 through 255, which represent accented, Greek, and graphic characters, as well as miscellaneous symbols. ASCII and these additional characters and symbols are typically known as the IBM extended character set.

TIP
To see a table of the codes in the ASCII character set, search for “Chr, ChrW functions” in the Visual Studio online Help, and then click ASCII Character Codes in the Other Resources section near the end of the article.

The ASCII character set is still the most important numeric code for beginning programmers to learn, but it isn't the only character set. As the market for computers and application software has become more global, a more comprehensive standard for character representation called Unicode has emerged. Visual Basic stores each character as a 16-bit Unicode value, so a single Char can hold any of 65,536 codes, which is plenty of space to represent the traditional symbols in the ASCII character set plus most (written) international languages and symbols. (The full Unicode standard defines more than a million code points; characters beyond the first 65,536 are stored as pairs of 16-bit values.) A standards body maintains the Unicode character set and adds symbols to it periodically. Microsoft Windows NT, Microsoft Windows 2000, Microsoft Windows XP, Microsoft Windows Server 2003, and Visual Studio have been specifically designed to manage ASCII and Unicode character sets. (For more information about the relationship between Unicode, ASCII, and Visual Basic data types, see “Working with Specific Data Types” in Chapter 5, “Visual Basic Variables and Formulas, and the .NET Framework.”)

In the following sections, you'll learn more about using the ASCII character set to process strings in your programs. As your applications become more sophisticated and you start planning for the global distribution of your software, you'll need to learn more about Unicode and other international settings.

Working with ASCII Codes

To determine the ASCII code of a particular letter, you can use the Visual Basic Asc function. For example, the following program statement assigns the number 122 (the ASCII code for the lowercase letter “z”) to the AscCode short integer variable:

Dim AscCode As Short
AscCode = Asc("z")

Conversely, you can convert an ASCII code to a letter with the Chr function. For example, this program statement assigns the letter “z” to the letter character variable:

Dim letter As Char
letter = Chr(122)

The same result could also be achieved if you used the AscCode variable just declared as shown here:

letter = Chr(AscCode)
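Because Asc and Chr are mirror images of each other, you can loop through a range of codes and build a string from the results. Here's a small sketch along those lines; the module and variable names are made up for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module AsciiCodes
    Sub Main()
        Dim Code As Integer
        Dim Alphabet As String = ""

        ' Walk the codes from "a" (97) to "z" (122) and convert each back to a letter
        For Code = Asc("a") To Asc("z")
            Alphabet &= Chr(Code)
        Next Code

        Console.WriteLine(Alphabet)             ' abcdefghijklmnopqrstuvwxyz
        Console.WriteLine(Asc("A"))             ' 65
        Console.WriteLine(Asc("a") - Asc("A"))  ' 32
    End Sub
End Module
```

The last line shows that each lowercase letter sits exactly 32 codes above its uppercase partner, which is why uppercase and lowercase text sort into separate groups.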

How can you compare one text string or ASCII code with another? You simply use one of the six relational operators Visual Basic supplies for working with textual and numeric elements. These relational operators are shown in the following table.

| Operator | Meaning |
|---|---|
| <> | Not equal |
| = | Equal |
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |

A character is “greater than” another character if its ASCII code is higher. For example, the ASCII value of the letter “B” is greater than the ASCII value of the letter “A,” so the expression

"A" < "B"

is true, and the expression

"A" > "B"

is false.

When comparing two strings that each contain more than one character, Visual Basic begins by comparing the first character in the first string with the first character in the second string and then proceeds character by character through the strings until it finds a difference. For example, the strings Mike and Michael match through their first two characters and differ at the third (“k” and “c”). Because the ASCII value of “k” is greater than that of “c,” the expression

"Mike" > "Michael"

is true.

If no differences are found between the strings, they are equal. If two strings are equal through several characters but one of the strings continues and the other one ends, the longer string is greater than the shorter string. For example, the expression

"AAAAA" > "AAA"

is true.
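You can check these comparisons yourself with a short console program. This sketch assumes the default Option Compare Binary setting (stated explicitly at the top), which compares strings by their character codes; with Option Compare Text, the results for mixed-case strings would differ.

```vbnet
Option Compare Binary
Imports System

Module CompareStrings
    Sub Main()
        Console.WriteLine("A" < "B")            ' True
        Console.WriteLine("A" > "B")            ' False
        Console.WriteLine("Mike" > "Michael")   ' True: "k" is greater than "c"
        Console.WriteLine("AAAAA" > "AAA")      ' True: the longer string is greater
        Console.WriteLine("apple" > "Zebra")    ' True: "a" (97) is greater than "Z" (90)
    End Sub
End Module
```

The last comparison is the kind of result that surprises people in a sort: every uppercase letter comes before every lowercase letter.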

Sorting Strings in a Text Box

The following exercise demonstrates how you can use relational operators and several string methods and functions to sort lines of text in a text box. The program is a revision of the Quick Note utility and features an Open command that opens an existing file and a Close command that closes the file. There's also a Sort Text command on the File menu that you can use to sort the text currently displayed in the text box.

Because the entire contents of a text box are stored in one string, the program must first break that long string into smaller individual strings. These strings can then be sorted by using the ShellSort Sub procedure, a sorting routine based on an algorithm created by Donald Shell in 1959. To simplify these tasks, I created a module that defines a dynamic string array to hold each of the lines in the text box. I also placed the ShellSort Sub procedure in the module so that I can call it from any event procedure in the project. (For more about using modules, see Chapter 10, “Creating Modules and Procedures.”) Although you learned how to use the powerful Array.Sort method in Chapter 11, “Using Arrays to Manage Numeric and String Data,” the ShellSort procedure is a more flexible and customizable tool. Building the routine from scratch also gives you a little more experience with processing textual values—an important learning goal of this chapter.
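The ShellSort procedure in the project's module isn't reproduced in this section. To give you a feel for the technique, here's a minimal sketch of a Shell sort for a string array. The name ShellSortSketch and the simple halving gap sequence are my assumptions for illustration, and the code may differ from the module in the Sort Text project.

```vbnet
Option Compare Binary
Imports System

Module SortSketch
    ' Sorts Items in place, in ascending order, with a Shell sort
    ' that halves the gap on each pass (an assumed gap sequence).
    Sub ShellSortSketch(ByVal Items() As String)
        Dim Gap As Integer = Items.Length \ 2
        Dim i, j As Integer
        Dim Temp As String

        Do While Gap > 0
            For i = Gap To Items.Length - 1
                Temp = Items(i)
                j = i
                ' Shift greater elements up by Gap positions
                Do While j >= Gap AndAlso Items(j - Gap) > Temp
                    Items(j) = Items(j - Gap)
                    j -= Gap
                Loop
                Items(j) = Temp
            Next i
            Gap \= 2
        Loop
    End Sub

    Sub Main()
        Dim Words() As String = {"Zebra", "Gorilla", "Moon", "Banana", "Apple", "Turtle"}
        ShellSortSketch(Words)
        Console.WriteLine(String.Join(", ", Words))
        ' Apple, Banana, Gorilla, Moon, Turtle, Zebra
    End Sub
End Module
```

The only string-specific part is the comparison Items(j - Gap) > Temp, which uses the same relational operator you saw earlier. Changing > to < would sort the array in descending order.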

Another interesting aspect of this program is the routine that determines the number of lines in the text box object. No existing Visual Basic function computes this value automatically. I wanted the program to be able to sort a text box of any size line by line. To accomplish this, I created the code that follows. It uses the Substring method to examine one letter at a time in the text box object and then uses the Chr function to search for the carriage return character, ASCII code 13, at the end of each line. (Note in particular how the Substring method is used as part of the Text property of the txtNote object. The String class automatically provides this method, and many others, for any properties or variables that are declared as the String type.)

Dim ln, curline, letter As String
Dim i, charsInFile, lineCount As Short
'determine number of lines in text box object (txtNote)
lineCount = 0 'this variable holds total number of lines
charsInFile = txtNote.Text.Length 'get total characters
For i = 0 To charsInFile - 1 'move one char at a time
    letter = txtNote.Text.Substring(i, 1) 'get letter
    If letter = Chr(13) Then 'if carriage ret found
        lineCount += 1 'go to next line (add to count)
        i += 1 'skip linefeed char (typically follows cr on PC)
    End If
Next i

The total number of lines in the text box is assigned to the lineCount short integer variable. I use this value a little later to dimension a dynamic array in the program to hold each individual text string. The resulting array of strings then gets passed to the ShellSort Sub procedure for sorting, and ShellSort returns the string array in alphabetical order. After the string array is sorted, I can simply copy it back to the text box by using a For loop.
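The step that copies each line into the array isn't listed here, so the following is only a sketch of how it might work. TextToLines is a hypothetical function, not the code in the Sort Text project. It counts carriage returns, sizes the array with ReDim, and then builds each array element one character at a time.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LinesSketch
    ' Hypothetical helper: copies each line of Source into a string array.
    ' Only lines that end with a carriage return are counted and copied.
    Function TextToLines(ByVal Source As String) As String()
        Dim i, LineCount, CurLine As Integer
        Dim Letter As String
        Dim Lines() As String

        ' First pass: count the carriage returns
        LineCount = 0
        For i = 0 To Source.Length - 1
            If Source.Substring(i, 1) = Chr(13) Then LineCount += 1
        Next i

        ' ReDim takes the highest index, so subtract 1 from the count
        ReDim Lines(LineCount - 1)

        ' Second pass: add each character to the current line
        CurLine = 0
        For i = 0 To Source.Length - 1
            Letter = Source.Substring(i, 1)
            If Letter = Chr(13) Then
                CurLine += 1                       ' line finished; move to the next element
            ElseIf Letter <> Chr(10) AndAlso CurLine < LineCount Then
                Lines(CurLine) &= Letter           ' skip linefeeds, keep everything else
            End If
        Next i
        Return Lines
    End Function

    Sub Main()
        Dim Lines() As String = TextToLines("Zebra" & vbCrLf & "Apple" & vbCrLf)
        Console.WriteLine(Lines.Length)   ' 2
        Console.WriteLine(Lines(0))       ' Zebra
    End Sub
End Module
```

In this sketch, text that follows the final carriage return is never counted as a line, so it doesn't reach the array. That's the reason a routine like this needs the last line to end with Enter.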

Run the Sort Text program

  1. Open the Sort Text project located in the c:\vb05sbs\chap13\sort text folder.

  2. Click the Start Debugging button to run the program.

  3. Type the following text, or some text of your own, in the text box:

    Zebra

    Gorilla

    Moon

    Banana

    Apple

    Turtle

    Be sure to press Enter after you type “Turtle” (or your own last line) so that Visual Basic can calculate the number of lines correctly.

  4. Click the Sort Text command on the File menu.

    The text you typed is sorted and redisplayed in the text box. If you typed the sample text, the lines now appear in this order: Apple, Banana, Gorilla, Moon, Turtle, Zebra.

  5. Click the Open command on the File menu, and open the abc.txt file in the c:\vb05sbs\chap13 folder.

    The abc.txt file contains 36 lines of text. Each line begins with either a letter or a number from 1 through 10.

  6. Click the Sort Text command on the File menu to sort the contents of the abc.txt file.

    The Sort Text program sorts the file in ascending order and displays the sorted list of lines in the text box.

  7. Scroll through the file to see the results of the alphabetical sort.

    Notice that although the alphabetical portion of the sort ran perfectly, the sort produced a strange result for one of the numeric entries—the line beginning with the number 10 appears second in the list rather than tenth. What's happening here is that Visual Basic read the 1 and the 0 in the number 10 as two independent characters, not as a number. Because we're comparing the ASCII codes of these strings from left to right, the program produces a purely alphabetical sort. If you want to sort only numbers with this program, you need to prohibit textual input, modify the code so that the numeric input is stored in numeric variables, and then compare the numeric variables instead of strings.
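The difference between the two kinds of comparison is easy to see in a few lines of code. This is a sketch only; SortAsNumbers is a hypothetical helper, it assumes every entry is a valid whole number, and it uses the Array.Sort method on an Integer array rather than the ShellSort procedure.

```vbnet
Option Compare Binary
Imports System

Module NumericSortSketch
    ' Hypothetical helper: converts each entry to an Integer and sorts the numbers.
    ' Integer.Parse throws an exception if an entry isn't a valid number.
    Function SortAsNumbers(ByVal Entries() As String) As Integer()
        Dim Numbers(Entries.Length - 1) As Integer
        Dim i As Integer
        For i = 0 To Entries.Length - 1
            Numbers(i) = Integer.Parse(Entries(i))
        Next i
        Array.Sort(Numbers)
        Return Numbers
    End Function

    Sub Main()
        ' As strings, "10" comes before "9" because "1" (code 49) is less than "9" (code 57)
        Console.WriteLine("10" < "9")                                ' True
        ' As numbers, 10 is greater than 9
        Console.WriteLine(Integer.Parse("10") < Integer.Parse("9"))  ' False

        Dim Entries() As String = {"10", "9", "2", "1"}
        Dim Sorted() As Integer = SortAsNumbers(Entries)
        Dim i As Integer
        For i = 0 To Sorted.Length - 1
            Console.Write(Sorted(i) & " ")
        Next i
        Console.WriteLine()                                          ' 1 2 9 10
    End Sub
End Module
```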



Microsoft Visual Basic 2005 Step by Step