This section contains three additional examples of megaformulas. These examples provide a thorough introduction to applying the megaformula technique for streamlining a variety of tasks, including cleaning up a list of names by removing middle names and initials, returning the position of the last space character in a string, determining whether a credit card number is valid, and generating a list of random names.
Consider a worksheet with a column of names, like the one shown in Figure 20-2. Suppose you have a worksheet with thousands of such names, and you need to remove all the middle names and middle initials from the names. Editing the cells manually would take hours, and you're not up to writing a VBA macro, so that leaves using a formula-based solution. Notice that not all the names have a middle name or a middle initial, which makes the task a bit trickier. Although this is not a difficult task, it normally involves several intermediate formulas.
Figure 20-2: The goal is to remove the middle name or middle initial from each name.
Figure 20-3 shows the results of the more conventional solution, which requires six intermediate formulas, as shown in Table 20-2. The names are in column A; column H displays the end result. Columns B–G hold the intermediate formulas.
On the CD | You can access the workbook for removing middle names and initials on the companion CD-ROM. The filename is no middle names.xlsx. |
Figure 20-3: Removing the middle names and initials requires six intermediate formulas.
Cell | Intermediate Formula | What It Does |
---|---|---|
B1 | =TRIM(A1) | Removes excess spaces |
C1 | =FIND(" ",B1) | Locates the first space |
D1 | =FIND(" ",B1,C1+1) | Locates the second space, if any |
E1 | =IFERROR(D1,C1) | Uses the first space if no second space exists |
F1 | =LEFT(B1,C1-1) | Extracts the first name |
G1 | =RIGHT(B1,LEN(B1)-E1) | Extracts the last name |
H1 | =F1&" "&G1 | Concatenates the two names |
Note that cell E1 uses the IFERROR function, which is available only in Excel 2007. For compatibility with earlier versions, use this formula:
=IF(ISERROR(D1),C1,D1)
Note | Notice that the result isn't perfect. For example, it will not work if the cell contains only one name (for example, Enya). And, this method also fails if a name has two middle names (such as John Jacob Robert Smith). That occurs because the formula simply searches for the second space character in the name. In this example, the megaformula returns John Robert Smith. Later in this chapter, I present an array formula method to identify the last space character in a string. |
With a bit of work, you can eliminate all the intermediate formulas and replace them with a single megaformula. You do so by creating all the intermediate formulas and then editing the final result formula (in this case, the formula in column H) by replacing each cell reference with a copy of the formula in the cell referred to. Fortunately, you can use the clip- board to copy and paste. (See the sidebar, "Copying Text from a Formula," earlier in this chapter.) Keep repeating this process until cell H1 contains nothing but references to cell A1. You end up with the following megaformula in one cell:
=LEFT(TRIM(A1),FIND(" ",TRIM(A1))-1)&" "&RIGHT (TRIM(A1),LEN(TRIM(A1))-IFERROR(FIND(" ",TRIM(A1), FIND(" ",TRIM(A1))+1),FIND(" ",TRIM(A1))))
When you're satisfied that the megaformula works, you can delete the columns that hold the intermediate formulas because they are no longer used.
If you're still not clear about this process, take a look at the step-by-step procedure:
Examine the formula in H1. This formula contains two cell references (F1 and G1):
=F1&" "&G1
Activate cell G1 and copy the contents of the formula (without the equal sign) to the Clipboard.
Activate cell H1 and replace the reference to cell G1 with the Clipboard contents. Now cell H1 contains the following formula:
=F1&" "&RIGHT(B1,LEN(B1)-E1)
Activate cell F1 and copy the contents of the formula (without the equal sign) to the Clipboard.
Activate cell H1 and replace the reference to cell F1 with the Clipboard contents. Now the formula in cell H1 is as follows:
=LEFT(B1,C1-1)&" "&RIGHT(B1,LEN(B1)-E1)
Now cell H1 contains references to three cells (B1, C1, and E1). The formulas in those cells will replace each of the three references.
Replace the reference to cell E1 with the formula in E1. The result is
=LEFT(B1,C1-1)&" "&RIGHT(B1,LEN(B1)-IFERROR(D1,C1))
Replace the reference to cell D1 with the formula in D1. The formula now looks like this:
=LEFT(B1,C1-1)&" "&RIGHT(B1,LEN(B1)-IFERROR (FIND(" ",B1,C1+1),C1))
The formula has three references to cell C1. Replace all three of those references to cell C1 with the formula contained in cell C1. The formula in cell H1 is as follows:
=LEFT(B1,FIND(" ",B1)-1)&" "&RIGHT(B1,LEN(B1)-IFERROR (FIND(" ",B1,FIND(" ",B1)+1),FIND(" ",B1)))
Finally, replace the seven references to cell B1 with the formula in cell B1. The result is
=LEFT(TRIM(A1),FIND(" ",TRIM(A1))-1)&" "&RIGHT (TRIM(A1),LEN(TRIM(A1))-IFERROR(FIND(" ",TRIM(A1), FIND(" ",TRIM(A1))+1),FIND(" ",TRIM(A1))))
Notice that the formula in cell H1 now contains references only to cell A1. The megaformula is complete, and it performs exactly the same tasks as all the intermediate formulas (which you can now delete).
This megaformula uses the Excel 2007 IFERROR function. A comparable formula that's compatible with previous versions is
=LEFT(TRIM(A1),FIND(" ",TRIM(A1),1)-1)&" "&RIGHT (TRIM(A1),LEN(TRIM(A1))-IF(ISERROR(FIND(" ",TRIM(A1), FIND(" ",TRIM(A1),1)+1)),FIND(" ",TRIM(A1),1),FIND(" ",TRIM (A1),FIND(" ",TRIM(A1),1)+1)))
Interestingly, the formula that uses IFERROR contains 52 fewer characters.
Because a megaformula is so complex, you may think that using one slows down recalculation. Actually, that's not the case. As a test, I created a workbook that used the megaformula 150,000 times. Then I created another workbook that used six intermediate formulas to compute the 150,000 results. I compared the results in terms of calculation time and file size; see Table 20-3.
Method | Recalculation Time (Seconds) | File Size |
---|---|---|
Intermediate formulas | 6.3 | 11.1MB |
Megaformula | 4.2 | 2.6MB |
Of course, the actual results will vary depending on processor speed and the amount of memory installed.
As you can see, using a megaformula in this case resulted in faster recalculations as well as a much smaller workbook.
On the CD | The two test workbooks I used are available on the companion CD-ROM. The filenames are time test intermediate.xlsx and time test megaformula.xlsx. To perform your own time tests, change the name in cell A1 and start your stopwatch (your cell phone probably has one). Keep your eye on the status bar, which indicates when the calculation is finished. |
As previously noted, the "remove middle name" example presented earlier contains a flaw: To identify the last name, the formula searches for the second space character. A better solution is to search for the last space character. Unfortunately, Excel doesn't provide any simple way to locate the position of the first occurrence of a character from the end of a string. The example in this section solves that problem and describes a way to determine the position of the first occurrence of a specific character going backward from the end of a text string.
Cross Ref | This technique involves arrays, so you might want to review the material in Part IV to familiarize yourself with this topic. |
This example describes how to create a megaformula that returns the character position of the last space character in a string. You can, of course, modify the formula to work with any other character.
The general plan is to create an array of characters in the string but in reverse order. After that array is created, you can use the MATCH function to locate the first space character in the array.
Refer to Figure 20-4, which shows the results of the intermediate formulas. Cell A1 contains an arbitrary name, which happens to comprise 12 characters. The range B1:B12 contains the following array formula:
{=ROW(INDIRECT("1:"&LEN(A1)))}
Figure 20-4: These intermediate formulas will eventually be converted to a single megaformula.
On the CD | This example, named position of last space.xlsx, is available on the companion CD-ROM. |
You enter this multicell array formula into the entire B1:B12 range by selecting the range, typing the formula, and pressing Ctrl+Shift+Enter. Don't type the curly brackets. Excel adds the curly brackets to indicate an array formula. This formula returns an array of 12 consecutive integers.
The range C1:C12 contains the following array formula:
{=LEN(A1)+1-B1:B12}
This formula essentially reverses the integers generated in column B.
The range D1:D12 contains the following array formula:
{=MID(A1,C1:C12,1)}
This formula uses the MID function to extract the individual characters in cell A1. The MID function uses the array in C1:C12 as its second argument. The result is an array of the name's characters in reverse order.
The formula in cell E1 is as follows:
=MATCH(" ",D1:D12,0)
This formula, which is not an array formula, uses the MATCH function to return the position of the first space character in the range D1:D12. In the example shown in Figure 20-4, the formula returns 6, which means that the first space character is six characters from the end of the text in A1.
The formula in cell F1 is
=LEN(A1)+1-E1
This formula returns the character position of the last space in the string.
You may wonder how all of these formulas can possibly be combined into a single formula. Keep reading for the answer.
At this point, cell F1 contains the result you're looking for. The challenge is consolidating all of those intermediate formulas into a single formula. The goal is to produce a formula that contains only references to cell A1. These steps will get you to that goal:
The formula in cell F1 contains a reference to cell E1. Replace that reference with the text of the formula in cell E1. As a result, the formula in cell F1 becomes
=LEN(A1)+1-MATCH(" ",D1:D12,0)
Now the formula contains a reference to D1:D12. This range contains a single array formula. Replacing the reference to D1:D12 with the array formula results in the following array formula in cell F1:
{=LEN(A1)+1-MATCH(" ",MID(A1,C1:C12,1),0)}
Note | Because an array formula replaced the reference in cell F1, you now must enter the formula in F1 as an array formula (enter by pressing Ctrl+Shift+Enter). |
Now the formula in cell F1 contains a reference to C1:C12, which also contains an array formula. Replace the reference to C1:C12 with the array formula in C1:C12 to get this array formula in cell F1:
{=LEN(A1)+1-MATCH(" ",MID(A1,LEN(A1)+1-B1:B12,1),0)}
Next, replace the reference to B1:B12 with the array formula in B1:B12. The result is
{=LEN(A1)+1-MATCH(" ",MID(A1,LEN(A1)+1-ROW(INDIRECT ("1:"&LEN(A1))),1),0)}
Now the array formula in cell F1 refers only to cell A1, which is exactly what you want. The megaformula does the job, and you can delete all the intermediate formulas.
Note | Although you use a 12-digit value and arrays stored in 12-row ranges to create the formula, the final formula does not use any of these range references. Consequently, the megaformula works with a value of any length. |
Figure 20-5 shows a worksheet with names in column A. Column B contains the megaformula developed in the previous section. Column C contains a formula that extracts the characters beginning after the last space, which represents the last name of the name in column A.
Figure 20-5: Column B contains a megaformula that returns the character position of the last space of the name in column A.
Cell C1, for example, contains this formula:
=RIGHT(A1,LEN(A1)-B1)
If you like, you can eliminate the formulas in column B and create a specialized formula that returns the last name. To do so, substitute the formula in B1 for the reference to B1 in the formula. The result is the following array formula:
{=RIGHT(A1,LEN(A1)-(LEN(A1)+1-MATCH(" ",MID(A1,LEN(A1)+ 1-ROW(INDIRECT("1:"&LEN(A1))),1),0)))}
Note | You must insert parentheses around the formula text copied from cell B1. Without the parentheses, the formula does not evaluate correctly. |
Many people are not aware that you can determine the validity of a credit card number by using a relatively complex algorithm to analyze the digits of the number. In addition, you can determine the type of credit card by examining the initial digits and the length of the number. Table 20-4 shows information about four major credit cards.
Credit Card | Prefix Digits | Total Digits |
---|---|---|
MasterCard | 51–55 | 16 |
Visa | 4 | 13 or 16 |
American Express | 34 or 37 | 15 |
Discover | 6011 | 16 |
Note | Validity, as used here, means whether the credit card number itself is a valid number as determined by the following steps. This technique, of course, cannot determine whether the number represents an actual credit card account. |
You can test the validity of a credit card account number by processing its checksum digits. All account numbers used in major credit cards use a Mod 10 check-digit algorithm. The general process is as follows:
Add leading zeros to the account number to make the total number of digits equal 16.
Beginning with the first digit, double the value of alternate digits of the account number. If the result is a two-digit number, add the two digits together.
Add the eight values generated in Step 2 to the sum of the skipped digits of the original number.
If the sum obtained in Step 3 is evenly divisible by 10, the number is a valid credit card number.
The example in this section describes a megaformula that determines whether a credit card number is a valid number.
Figure 20-6 shows a worksheet set up to analyze a credit card number and determine its validity. This workbook uses quite a few formulas to make the determination.
Figure 20-6: The formulas in this worksheet determine the validity of a credit card number.
On the CD | You can access the credit card number validation workbook on the companion CD-ROM. The file is named credit card validation.xlsx. |
In this worksheet, the credit card number is entered in cell F1, with no spaces or hyphens. The formula in cell F2 follows. This formula appends leading zeros, if necessary, to make the card number exactly 16 digits long. The other formulas use the string in cell F2.
=REPT("0",16-LEN(F1))&F1
Caution | When entering a credit card number that contains more than 15 digits, you must be careful that Excel does not round the number to 15 digits. You can precede the number with an apostrophe or preformat the cell as Text (using Home Number Number Format Text). |
Column A contains a series of integers from 1–16, each representing the digit positions of the credit card.
Column B contains formulas that extract each digit from cell F2. For example, the formula in cell B5 is as follows:
=MID($F$2,A5,1)
Column C contains the multipliers for each digit: alternating 2s and 1s.
Column D contains formulas that multiply the digit in column B by the multiplier in column
C. For example, the formula in cell D5 is
=B5*C5
Column E contains formulas that sum the digits displayed in column D. A single digit value in column D is returned directly. For two-digit values, the sum of the digits is displayed in Column E. For example, if column D displays 12, the formula in column E returns 3: that is, 1 + 2. The formula that accomplishes this is as follows:
=INT((D5/10)+MOD((D5),10))
Cell E21 contains a simple SUM formula to add the values in column E:
=SUM(E5:E20)
The formula in cell G1, which follows, calculates the remainder when cell E21 is divided by 10. If the remainder is 0, the card number is valid, and the formula displays VALID. Otherwise, the formula displays INVALID.
=IF(MOD(E21,10)=0,"VALID","INVALID")
The megaformula that performs all of these calculations will be an array formula because the intermediary formulas occupy multiple rows.
First, you need to convert all the formulas to array formulas. Note that columns A and C consist of values, not formulas. To use the values in a megaformula, they must be generated by formulas-more specifically, array formulas.
Enter the following array formula into the range A5:A20. This array formula returns a series of 16 consecutive integers:
{=ROW(INDIRECT("1:16"))}
For column B, select B5:B20 and enter the following array formula, which extracts the digits from the credit card number:
{=MID($F$2,A5:A20,1)}
Next, column C requires an array formula that generates alternating values of 2 and 1. Such a formula, entered into the range C5:C20, is shown here:
{=(MOD(ROW(INDIRECT("1:16")),2)+1)}
For column D, select D5:D20 and enter the following array formula:
{=B5:B20*C5:C20}
Finally, select E5:E20 and enter this array formula:
{=INT((D5:D20/10)+MOD((D5:D20),10))}
Now the worksheet contains five columns of 16 rows, but only five actual formulas (which are multicell array formulas).
To create the megaformula for this task, start with cell G1, which is the cell that has the final result. The original formula in G1 is
=IF(MOD(E21,10)=0,"VALID","INVALID")
First, replace the reference to cell E21 with the formula in E21. Doing so results in the following formula in cell G1:
=IF(MOD(SUM(E5:E20),10)=0,"VALID","INVALID")
Next, replace the reference to E5:E20 with the array formula contained in that range. Now the formula becomes an array formula, so you must enter it by pressing Ctrl+Shift+Enter. After the replacement, the formula in G1 is as follows:
{=IF(MOD(SUM(INT((D5:D20/10)+MOD((D5:D20),10))),10)=0, "VALID","INVALID")}
Replace the two references to range D5:D20 with the array formula contained in D5:20. Doing so results in the following array formula in cell G1:
{=IF(MOD(SUM(INT((B5:B20*C5:C20/10)+MOD((B5:B20*C5:C20), 10))),10)=0,"VALID","INVALID")}
Next, replace the references to cell C5:C20 with the array formula in C5:C20. Note that you must have a set of parentheses around the copied formula text. The result is as follows:
{=IF(MOD(SUM(INT((B5:B20*(MOD(ROW(INDIRECT("1:16")), 2)+1)/10)+MOD((B5:B20*(MOD(ROW(INDIRECT("1:16")),2)+1)), 10))),10)=0,"VALID","INVALID")}
Replacing the references to B5:B20 with the array formula contained in B5:B20 yields the following:
{=IF(MOD(SUM(INT((MID($F$2,A5:A20,1)*(MOD(ROW(INDIRECT ("1:16")),2)+1)/10)+MOD((MID($F$2,A5:A20,1)*(MOD(ROW (INDIRECT("1:16")),2)+1)),10))),10) =0,"VALID","INVALID")}
Substitute the array formula in range A5:A20 for the references to that range. The resulting array formula is as follows:
{=IF(MOD(SUM(INT((MID($F$2,ROW(INDIRECT("1:16")),1)*(MOD(ROW (INDIRECT("1:16")),2)+1)/10)+MOD((MID($F$2,ROW(INDIRECT ("1:16")),1)*(MOD(ROW(INDIRECT("1:16")),2)+1)),10))),10)=0, "VALID","INVALID")}
Finally, substitute the formula in cell F2 for the two references to cell F2. After making the substitutions, the formula is as follows:
{=IF(MOD(SUM(INT((MID(REPT("0",16-LEN(F1))&F1, ROW(INDIRECT("1:16")),1)*(MOD(ROW(INDIRECT("1:16")),2)+1)/ 10)+MOD((MID(REPT("0",16-EN(F1))&F1,ROW(INDIRECT("1:16")), 1)*(MOD(ROW(INDIRECT("1:16")),2)+1)),10))),10)=0,"VALID", "INVALID")}
You can delete the now superfluous intermediate formulas. The final megaformula, a mere 229 characters in length, does the work of 51 intermediary formulas!
The final example is a useful application that generates random names. It uses three name lists, compiled by the U.S. Census Bureau: 4,275 female first names; 1,219 male first names; and 18,839 last names. The names are sorted by frequency of occurrence. The megaformula selects random names such that more frequently occurring names have a higher probability of being selected. Therefore, if you create a list of random names, they will appear to be somewhat realistic. (Common names will appear more often than uncommon names.)
Figure 20-7 shows the workbook. Cells B7 and B8 contains values that determine the probability that the random name is a male as well as the probability that the random name contains a middle initial. The randomly generated names begin in cell A11.
Figure 20-7: This workbook uses a megaformula to generate realistic random names.
On the CD | This workbook, named name generator.xlsx, is available on the companion CD-ROM. |
The megaformula is as follows (the workbook uses several names):
=IF(RAND()<=PctMale,INDEX(MaleNames,MATCH(RAND(), MaleProbability,-1)),INDEX(FemaleNames,MATCH(RAND(), FemaleProbability,-1)))&IF(RAND()<=PctMiddle," "& INDEX(MiddleInitials,MATCH(RAND(),MiddleProbability,-1))& ".","")&" "&INDEX(LastNames,MATCH(RAND(),LastProbability,-1))
I don't list the intermediate formulas here, but you can examine them by opening the file on the CD-ROM.