Manipulating Text | Microsoft Office Excel 2003 Programming Inside Out (Inside Out (Microsoft))

< Day Day Up >

Although you might normally think of Excel as a number crunching financial application, it's actually quite a versatile program. Not only can you perform myriad calculations on your numerical data, but you can also handle any text that comes along with that data. Whether you want to perform a simple task such as displaying a welcome message after a user opens a workbook or import data from a text file into an Excel worksheet, you can do so using the text-handling procedures available to you in Excel VBA and as worksheet formulas.

Determining if the Value in a Cell Is Text

The first step in manipulating a text value without generating an error is determining whether the variable or cell value you want to work with is, in fact, a text value. You can determine whether the value in a cell, a variable, or the value typed into an input box, is text by processing the string with the ISTEXT function. For example, you can use the ISTEXT function to guard against data entry errors, such as when someone types the wrong sort of data into a cell or a UserForm. This sort of problem often occurs after you've changed a procedure or put a new form into use-the data entry folks (and it could be you) get so used to typing a customer's identification code first that they forget they're supposed to start with the company name now.

ISTEXT is shown in all capital letters because it is one of many worksheet functions you can call using VBA code. For more information on calling worksheet functions such as ISTEXT in your VBA code, see the Inside Out titled "Is a Function Built into VBA or Not?" later in this chapter.

You can also use the ISTEXT function to ensure that data imported from an external source is formatted as you expected it to be before you run a set of procedures. If you've ever run a text-processing routine on non-text data, you know precisely the type of chaos a little checking can prevent. You can also use ISTEXT as a basic function when you might not be sure precisely what sort of text data you'll receive, but so long as you do know it's text you can write a procedure to cycle through the non-empty cells in a worksheet and perform at least this rudimentary check.

So what do you do if the data you want to work with as a string is actually a number? In that case, you can use the STR function to represent the number as a string. The STR function's syntax is minimal: STR(number), where number is the variable name or address of the cell that contains the number you're changing to text.

For more information on validating cell data, see "Getting Data Entry Right the First Time" on page 187.

ISTEXT is not the only function available in the IS family; Table 9-1 lists the worksheet functions you can use to determine whether a value fits a given category.

Table 9-1: The IS Family of Functions
Function	Returns True If This Condition Is Met
ISBLANK	The value refers to an empty cell.
ISERR	The value refers to any error value except #N/A (value not available).
ISERROR	The value refers to any error value (#N/A, #VALUE!, #REF!, #DIV/0!, #NUM!, #NAME?, or #NULL!).
ISLOGICAL	The value refers to a logical value.
ISNA	The value refers to the #N/A (value not available) error value.
ISNONTEXT	The value refers to any item that is not text. (Note that this function returns TRUE if value refers to a blank cell.)
ISNUMBER	The value refers to a number.
ISREF	The value refers to a reference.
ISTEXT	The value refers to text.

When you're ready to write your string data to a cell, you need to be sure the cell is prepared to accept text data. For example, if the cells are formatted using the General number format (the default) and you try to write a string that appears to be a number (for example, 0000097239) to the cell, the leading zeros will be deleted. You can ensure Excel will treat your input as a string by changing the cell's number format to Text. You would perform this action in the Excel interface by clicking Format, Cells, displaying the Number tab page, and clicking Text. You can do the same thing using the Range object's NumberFormat property.

For example, if you wanted to change the number format of the active cell (which is considered a range in this context) to Text, you would use the following line of code:

ActiveCell.NumberFormat = "@"

You can change the number format of a range to any of the values displayed in the list of Custom number formats available in the Format Cells dialog box (shown in Figure 9-1). If you're not sure which format to use, assign the format to a cell using the Format Cells dialog box, and then click Custom to display the code. Be sure to enclose the code in quotes!

click to expand
Figure 9-1: You can change the format of your cell to any of the formats in the Custom list.

Preparing String Data for Processing

Once you've determined that the data you're about to perform text operations on is, in fact, text, you can take additional steps to ensure that the data will look its best when you process it. There are two functions you can use to process your data: CLEAN and TRIM. The CLEAN function strips out any nonprinting characters from a string. Nonprinting characters are also known as control characters, because they're usually entered by pressing the Ctrl key while typing another key sequence. Nonprinting characters don't often show up in text files or worksheets, but if you import data from another program into Excel you might find them sneaking in as interpretations of formatting or data structure instructions that weren't stripped out when the original data was saved.

Inside Out
Is a Function Built into VBA or Not?

Remember that the default behavior of the Visual Basic Editor is to display the required arguments for a function and the available properties, methods, and events of an object after you've typed the name of the object and a trailing period. You can usually figure out whether you need to use the Application.WorksheetFunction object by trying to type the function in without the object and seeing if a ToolTip listing the required arguments appears. If the ToolTip appears, you can use the function as a function; if not, try typing Application.WorksheetFunction. (there is a period after WorksheetFunction there) and see if the worksheet function you want to use appears in the list of available properties for the object.

click to expand

Note

The following line consists entirely of nonprinting characters:

The TRIM function is similar to the CLEAN function in that both functions get rid of unwanted characters, but the TRIM function strips away unwanted white space (spaces, tabs, carriage returns, line breaks, and so on) before the first alphanumeric character and after the last alphanumeric character in the string. The TRIM function also strips away all but one space between words. Again, these extraneous characters can sneak in as an artifact of untranslated formatting instructions, or the extra space might have been added because the fields in the file had fixed lengths and the originating program padded the string with spaces to make up the difference.

Because CLEAN and TRIM are useful worksheet functions, the fine minds at Microsoft decided to let you use them in Excel VBA. You can call the CLEAN and TRIM functions, plus a host of other useful worksheet functions, by adding the name of the desired function as a property of the Application.WorksheetFunction object. Such calls would look like this:

ActiveCell.Value = Application.WorksheetFunction.Clean(ActiveCell.Value)
ActiveCell.Value = Application.WorksheetFunction.Trim(ActiveCell.Value)

Determining the Number of Characters in a String

Another of the basic text-processing operations that you can perform in Excel VBA is to determine the number of characters in a string, which you do using the LEN function. And, just as the ISTEXT function is useful for validating data, you can use the LEN function to ensure the data you're inputting, importing, or exporting is of the expected length. One example of an occasion when determining that data is of a particular length is if you're typing in International Standard Book Numbers (ISBNs), which are used to identify books. An ISBN is exactly 10 characters long, so if you or a colleague is typing in the title, the authors, the ISBN, and the price of every book in your company's library, you should make sure the ISBNs are of the correct length. Although ISBNs are one example when verifying the length of worksheet or UserForm data would come in handy, the applications are nearly infinite. If your order or customer numbers are all of a specified length, if your product codes are eight characters long but your customer codes are nine characters long, or if you want to make sure no stray digits were mistakenly added to a phone number, you can use the LEN function to verify that your rules are being followed.

The following code verifies the value in the active cell to ensure the product code contained in the cell is exactly 10 characters in length:

If LEN(ActiveCell.Value) <> 10 Then
    MsgBox ("The product code in this cell is not of the required length.")
    ActiveCell.Value = "Error"
End If

If you work with older database management systems, or with a database that has set character lengths for each of its fields, you should be sure to add a validation rule to a column so that you can ensure every entry was read in correctly. And, although many databases are small enough that you can waste a bit of storage by allocating more space than is strictly necessary to hold the field's value, it's a good idea to limit the size of every field (with the possible exception of a comment field) to the minimum possible number of characters.

Another good use for the LEN function is to guarantee that the passwords your colleagues assign to workbooks and worksheets are of a minimum length. As mentioned in Chapter 7, the Excel password protection scheme won't prevent your data from being compromised, but you can make an attacker's job much more time-consuming by assigning longer passwords. An Excel password can be up to 15 characters in length, but it's a good idea to require users to use passwords that are at least 8 characters long.

Inside Out
When to Validate, and When to Use an If…Then Statement

When you're working with values that need to be a certain length, you have a choice of methods to make certain everything falls into line. Way back in Chapter 8, you learned how to use the Range object's Validation property to establish criteria that a cell's value must meet before being accepted. You can set the Validation object's Type parameter to xlValidateTextLength to have Excel check a cell's value to ensure it is of the proper length. Of course, you can use an If…Then rule to the same effect. So why would you choose one method over the other?

You should use a Range object's Validation property when You want to create a single rule that is easily expressed using the Validation property's parameters.
You want the input box to have a specific title and to be grouped with the validation rules.
You want to display a specific type of message box (information, warning, or stop) and have the behavior (whether to move to the next cell or not) programmed as part of the message box type.
You want to have all the criteria stored in a single object.
You should use an If…Then…Else statement when You want to have multiple criteria and find that using multiple If...Then…Else constructions is easier than using the Validation object's Modify method.
You want to have conditional criteria that change depending on a set of circumstances. For example, the maximum credit limit an employee can assign to a customer could vary by employee.

Concatenating Text from Two or More Cells or Variables

Some of the procedures you've encountered so far in this book have generated message boxes using the text from one or more variables or worksheet cells as part of the message box's prompt. The authors of this book admit to playing a bit fast and loose with the order of topics, but now is the time to bring everything up to date by showing you how to add text from a cell, a variable, or a literal into a single output. You use the & operator.

There is a potential trap here, mainly because many readers will be familiar with the ampersand character, &, as the equivalent of the word and. Also, if you have previous programming experience, you might have used the & operator to indicate a logical "and" in expressions such as

If ((Range("C5").Value >= 1000) & (Range("D5")<=10)) Then…

Don't fall into that trap! The VBA concatenation operator & is not the same as the logical And operator, the latter of which is spelled out as the word And. The previous If condition statement is properly written as

If ((Range("C5").Value >= 1000) And (Range("D5")<=10)) Then…

The concatenation operator is fairly straightforward to use. For example, you could use the concatenation operator in conjunction with the LEN function described earlier to indicate why the data typed into a cell is invalid.

Public Sub VerifyLength()
If Len(ActiveCell.Value) <> 10 Then
    MsgBox ("The product code entered is " & LEN(ActiveCell.Value) & _ " characters, not 10.")
    ActiveCell.Value = ""
End If
End Sub

The LEN function and the & operator are also useful if you need to add characters to a cell value or a variable so the text is the expected length for export to a program that requires fixed-length data. To add characters to the beginning or end of a string, you use the REPT function in combination with the & operator. The REPT function has the following syntax:

Application.WorksheetFunction.REPT(string, times)

The string parameter provides the string to be repeated, and times indicates the number of times the character should be repeated. For example, if you worked for a fast-growing company that used a variable-length order code to track orders, you might need to change the 5 character code to a 10-character code. That's no problem in Excel-all you need to do is repeat a zero at the front of each order code to bring the length up to 10 characters. The following procedure checks the length of the order code string in the cells the user identifies and adds enough x's to make the string 10 characters long:

Public Sub MakeTen()
 
Dim strFirst, strLast, strAllCells, strPadding, strContents As String
Dim intPadding As Integer
 
strFirst = InputBox("Enter the address of the first cell.")
strLast = InputBox("Enter the address of the last cell.")
strAllCells = strFirst & ":" & strLast
 
For Each MyCell In Range(strAllCells).Cells
 
If Len(MyCell.Value) < 10 Then
    Range(MyCell.Address).Select
    strContents = MyCell.Value
    
    intPadding = 10 - Len(MyCell.Value)
    strPadding = Application.WorksheetFunction.Rept("0", intPadding)
    MyCell.NumberFormat = "@"
    MyCell.Value = strPadding & strContents
End If
    
    MyCell.NumberFormat = "@"
    
Next MyCell
 
End Sub

Caution

You need to make sure that the order code you're changing is stored as a string, not as a number. If you add a string of zeros to the beginning of a cell value that Excel translates as a number (which includes a cell with a General format), Excel will discard the zeros as meaningless. For a worksheet cell, change the cell's format to Text.

Returning the First or Last Several Characters from a String

When you work with spreadsheet data, it's likely that you'll find patterns in the data, perhaps patterns that you yourself program in. Although it's certainly possible that every character in a string will serve a known purpose, you might just need to read in the first or last few characters of a string to derive the information that you need for a particular task. For example, if the first five digits of a book's ISBN tell you the book's publisher (and they do), you could read those digits into memory, look up the publisher's identity in a database or worksheet table, and write the data into another cell programmatically.

To return the first or last several characters in a string, you use the Left function, which returns characters from the beginning of a string, or the Right function, which returns characters from the end of the string. The syntax of the two functions, with the exception of the function name, of course, is identical.

Left(string, length)
Right(string, length)

For these functions, string is the variable or range that contains the string you want to process and length is the number of characters you want to return. As an example, consider a worksheet where the items included in orders placed with The Garden Company are stored as a collection of worksheet rows.

click to expand

Because The Garden Company uses a consistent naming system for its products, you can create a procedure to determine the category of each product ordered. In this naming system, the first two characters of an item's identification code indicate to which category the item belongs (TL for Tools, SP for Supplies, and FN for Furniture). So, rather than require a user to enter the category, the procedure could do it for them.

Note

The data in the worksheet has all uppercase letters, so the entries in the procedure's Case statements look for uppercase-only category codes.

Public Sub NoteCategory()
 
Dim MyCell As Range
Dim strFirst, strLast, strAllCells, strCategory As String
 
strFirst = InputBox("Enter the address of the first cell in the OrderItem column.")
strLast = InputBox("Enter the address of the last cell in the OrderItem column.")
strAllCells = strFirst & ":" & strLast
 
For Each MyCell In Range(strAllCells).Cells
 
    Range(MyCell.Address).Select
    strCategory = Left(MyCell.Value, 2)
 
    Select Case strCategory
    Case "TL"
        ActiveCell.Offset(0, 1).Value = "Tools"
    Case "FN"
        ActiveCell.Offset(0, 1).Value = "Furniture"
    Case "SP"
        ActiveCell.Offset(0, 1).Value = "Supplies"
    Case Else 
        ActiveCell.Offset(0, 1).Value = "Error"
    End Select
 
Next MyCell
 
End Sub

For more information on and a practical example of looking up values from existing lists, including those stored in databases, see Chapter 22: 'Excel and ADO Data Sources.'

Returning Characters from Arbitrary Positions in a String

Life is so much easier when you know what's coming, and working with spreadsheet data is no exception. Well-ordered data streaming in from outside sources is one of the little joys in an Excel programmer's life, because it means you can reach into the data string and pull out what you need. One such example in the United States is the Vehicle Identification Number (VIN). VINs are 17 characters long and encode all of the pertinent information about a car: the make, the model, the color, the year manufactured, the plant where the car was manufactured, and so on. When you have a known data structure, you can use the MID function to pull out just the characters you need for a procedure.

Unlike the Left and Right functions, which pull data from the beginning or end of a string, the MID function pulls a set number of characters from the body of the string. The MID function's syntax is similar to both the Left and Right functions, with the only difference being that you define the position of the first character to return and the number of characters to be returned.

MID(string, start, length)

To pull characters in positions 4 through 8 (a total of five characters) from the value in cell D5, you would use the following code (which assumes you created the variables strCode and strDetails earlier):

strCode = Range("D5").Value
strDetails = MID(strCode, 4, 5)

Finding a String Within Another String

You might have read the heading for this section and wondered why in the world someone would want to find a string within another string. In the world of genetics, you could search for a specific protein sequence to locate a chromosome, but if you're doing that you most likely won't be working through an Excel worksheet to find that substring. Instead, you might have received rows of data that, through no fault of your own, were imported in Excel as a single cell per row. What's worse, the data fields aren't of a fixed length, so you can't use the MID function without a bit of tweaking. However, even if you are ever unfortunate enough to see data such as OI1800230IT7801CI486SPFX2D in a single worksheet cell, you can still find a way to read it if you're clever.

Rather than keep you in suspense, you should know that the data actually breaks out this way: OrderID OI1800230, Item IT7801, CustomerID CI486, Shipping FedEx Second Day. But how to you find that out? The method is equal parts cleverness and skill. The cleverness comes from marking the beginning of each field with a distinct code. In the example string just shown, the first seven characters represent the OrderID, OI1800230. The OrderID begins with the letters OI, which you can assume for the purposes of this example won't occur anywhere else in the string. The same marking technique is used to call out the Item number (IT), the CustomerID (CI), and the Shipping method (SP).

Note

You could guard against any stray occurrences of the marker sequences by putting brackets around the marker, such as <OI> or <IT>. It's the same sort of markup system used in the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML), and it works well for other systems that don't need to encode brackets as values.

When it comes to locating strings within other strings, having skill is operationally defined as knowing about the SEARCH and FIND functions. Both functions return the number of the character within a string at which a specific character or text string is found, but there are minor but important differences in how the functions operate. Here are the functions' syntaxes:

SEARCH(find_text, within_text, start_num)
FIND(find_text, within_text, start_num)

find_text is the text you want to find. If you use the SEARCH function, you can use the wildcard characters, question mark (?) and asterisk (*), in find_text. A question mark matches any single character, whereas an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character. The SEARCH function isn't case-sensitive, so searching for e will also find E. The FIND function is case-sensitive and doesn't allow wildcards-you can think of it as a more precise version of the SEARCH function. With FIND, searching for e will not find E.

within_text is the text in which you want to search for find_text.
start_num is the character number in within_text at which you want to start searching. If you leave this parameter blank, the search will start from the beginning of the string (that is, start_num = 1).

The benefit of the SEARCH and FIND functions really comes to the fore when you combine them with the MID function. Once you've used the SEARCH or FIND function to locate the start of two consecutive fields, you can use the MID function to draw in the part of the string you want. There is one more subtlety of which you need to be aware. Although the MID function is part of the standard VBA package, the SEARCH and FIND functions are not, so you'll once again need to use the Application.WorksheetFunction object to call the functions, as in the following example:

Application.WorksheetFunction.Search("IT", ActiveCell.Value)

Note

The VBA function INSTR also returns the position of the character where a string begins within another string, but the function is the equivalent of the FIND function in that the INSTR function is case-sensitive.

If you reconsider the nightmare scenario where order item records were imported incorrectly, you could use the SEARCH and the MID functions to find the beginning and the end of each field's values and write the values into a cell, as in the following procedure:

Public Sub SeparateValues()
 
Dim MyCell As Range
Dim intIT, intCI, intSP As Integer
Dim strFirst, strLast, strAllCells As String
 
strFirst = InputBox("Enter the address of the first cell.")
strLast = InputBox("Enter the address of the last cell.")
strAllCells = strFirst & ":" & strLast
 
For Each MyCell In Range(strAllCells).Cells
 
    Range(MyCell.Address).Select
 
    intIT = Application.WorksheetFunction.Search("IT", MyCell.Value)
    intCI = Application.WorksheetFunction.Search("CI", MyCell.Value)
    intSP = Application.WorksheetFunction.Search("SP", MyCell.Value)
 
    ActiveCell.Offset(0, 2).Value = Mid(MyCell.Value, 1, intIT - 1)
    ActiveCell.Offset(0, 3).Value = Mid(MyCell.Value, intIT, intCI - intIT)
    ActiveCell.Offset(0, 4).Value = Mid(MyCell.Value, intCI, intSP - intCI)
    ActiveCell.Offset(0, 5).Value = Mid(MyCell.Value, intSP)
 
Next MyCell
 
End Sub

This procedure would take the strings in cells in the range entered by a user (A2:A21 in this example) and write the component values into the cells to the right, as shown in Figure 9-2.

click to expand
Figure 9-2: The SEARCH and MID functions, among others, let you reach into lengthy text strings to extract the interesting parts.

Note

If you are working with data where each field or cell is of a known length, you can dispense with the calculations to determine where each substring starts.

Inside Out
Text Processing and Browser Cookies

Running a complex corporate Web site means a lot of work for the administrators and, more often than not, a substantial cash outlay on the part of a company to get everything looking and performing just so. Part of the administrator's job, like it or not, is justifying the money and time being spent on the site. And if your site doesn't use a lot of programming, and you're the person who administers the site without doing a lot of programming, you probably live in dread of being asked to process the information stored in cookies that your Web server places on your visitors' computers. Although programming cookies is beyond the scope of this book, you might find a cookie with a single string where a user's activities are encoded in much the same manner as the order strings shown earlier in the chapter. For example, the text of a cookie might be the string UI007589TM37900.77589, where 007589 is the visitor's user identification and 37900.77589 is the date/time string representing the user's last visit (which, in this case, was 10/6/2003 6:37 P.M.).

< Day Day Up >