There are two exceptionally valuable approaches to improving the performance of your applications. The first is to write refactored code. Refactoring is a process that originated in the Smalltalk and academic programming communities, and that emphasizes clear, concise programming as well as automated testing. When code has been thoroughly refactored, it is easy to understand and much easier to modify for future uses. Unfortunately, it requires a great deal of discipline on the part of the software developer. The definitive reference on this technique is Refactoring: Improving the Design of Existing Code by Martin Fowler et al. (Addison-Wesley). The second approach is one of algorithmic choice. Although a thorough discussion of algorithmic analysis is well beyond the scope of this text, it's important to realize that there are many different approaches to the same problem, and some are much faster than others. As an example, we'll look at different approaches to searching a string for a specific pattern.
You could say these two basic principles boil down to "write good code using fast algorithms." This is true, but my experience has shown that people often write bad code using slow algorithms, for a variety of reasons.
There are basically two kinds of performance improvements that come from modifying application source code: evolutionary and revolutionary. An evolutionary change is one that is accomplished without a significant alteration to the algorithms that provide a foundation for the application. For example, changing a multithreaded application's locking strategy to relieve contention on a few key locks is typically an evolutionary change. A revolutionary change comes about from changing the underlying algorithm. The difference is that evolutionary changes tend to give you performance speedups of a few times at most -- it's very dependent on how the original application was written -- whereas revolutionary changes can improve application performance by orders of magnitude. Revolutionary changes also tend to be much harder to implement. Clearly, algorithmic change has a huge potential for performance improvement.

So, how does refactoring fit into the sketch we've drawn of software development? The basic idea is that whenever you write a line of code, it should be refactored. If you are rewriting pieces of legacy code, those should be refactored as well. Refactoring takes a lot of time and understanding at first, but it rapidly becomes second nature. As the code becomes more highly refactored, it becomes much easier to identify and implement performance improvements. Because the code is clear, compact, and easy to understand, interpreting profiling output becomes straightforward, and implementing algorithmic change becomes vastly easier and much faster. I can't overemphasize the importance of refactoring in application development and tuning. With that philosophy out of the way, let's move on to a discussion of string searching as an illustration of the impact of algorithmic selection.

8.1.1 String Searching Algorithms

Let's say that we have a string of text, perhaps "here is a piece of text which we wish to search." We want to find the word "wish" in this string. Henceforth, we'll call the string we are searching S, or the source string, and the string we are searching for T, or the target string. How can we approach this problem? Let's look at a few different approaches.

8.1.1.1 Algorithm 1: naive searching

As a first approach, let's look for T in S by comparing the first character in T to the first character in S, and then following these rules:

1. If the characters match, compare the next character of T to the next character of S.
2. If we reach the end of T, every character has matched, and we have found T in S.
3. If we reach the end of S, there is no match.
4. If the characters do not match, move the starting point in S forward by one character and begin again with the first character of T.
Repeat until either condition 2 or 3 is true. For example, let's say we want to find the string "cad" (T) in the string "cabcadchad" (S). The progress of our algorithm is shown in Table 8-1.

Table 8-1. Naive search algorithm

Step   Alignment        Result
1      cabcadchad       "c" and "a" match, but "d" does not match "b" (3 comparisons)
       cad
2      cabcadchad       "c" does not match "a" (1 comparison)
        cad
3      cabcadchad       "c" does not match "b" (1 comparison)
         cad
4      cabcadchad       "c," "a," and "d" all match; T is found (3 comparisons)
          cad
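To make the procedure concrete, here is a minimal C sketch of the naive search. The function name naive_search and its convention of returning the index of the first match (or -1 when there is none) are illustrative choices, not part of any standard library:

    #include <string.h>

    /* Return the index of the first occurrence of t in s, or -1 if t
       does not appear in s. */
    int naive_search(const char *s, const char *t)
    {
        size_t slen = strlen(s), tlen = strlen(t);

        if (tlen > slen)
            return -1;

        for (size_t start = 0; start <= slen - tlen; start++) {
            size_t i = 0;
            while (i < tlen && s[start + i] == t[i])  /* rule 1: advance on a match */
                i++;
            if (i == tlen)                            /* rule 2: all of T matched */
                return (int)start;
            /* rule 4: mismatch -- slide the starting point forward by one */
        }
        return -1;                                    /* rule 3: ran out of S */
    }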
In the worst case, this naive searching algorithm requires a number of comparisons equal to the length of the pattern multiplied by the number of possible starting points: t x (s - t). For example, using a 48-character string as S and a 4-character string as T, we would require 4 x (48 - 4) = 176 comparisons in the worst case. This gives the naive search algorithm a complexity of O(st), where s is the length of S and t is the length of T. In our efforts to improve performance, we think carefully about this method for searching, and decide we'd like a more efficient one.

8.1.1.2 Algorithm 2: Knuth-Morris-Pratt searching

If the naive algorithm fails to find a match for a character, it advances the starting point forward one character. Unfortunately, this throws away all the information obtained by matching so far. For example, let's say we have "abababcd" as T and "ababababc..." as the source string (it's not important what follows the "c," as long as it causes a mismatch). When the match first fails (comparing the "c" in T against the "a" in the seventh position of S), the naive algorithm would have us advance one character in S and start checking again. Because we know that the first six characters of T were matched, advancing by one cannot work. However, shifting the pattern by two gives us another possible match. We know that the first four characters already match, so we can continue the search from the same place in S. This means we never have to look at characters in the source text more than once. The Knuth-Morris-Pratt (KMP) algorithm has complexity O(s + t). In general, KMP is fast exactly where the naive algorithm is slow -- when S and T both contain repeated patterns of characters. It is particularly effective when the alphabet is small, such as in genetic sequences (which use only four characters: G, C, A, and T) or bit patterns.
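The heart of KMP is a precomputed "fail" table over T that records, for each prefix of T, the length of the longest proper prefix that is also a suffix of it; this is exactly the knowledge that let us shift by two rather than one in the example above. The following C sketch is one way to implement it (kmp_search and fail are names chosen for this example):

    #include <stdlib.h>
    #include <string.h>

    /* Return the index of the first occurrence of t in s, or -1 if t does
       not appear in s (or if memory for the fail array cannot be allocated). */
    int kmp_search(const char *s, const char *t)
    {
        size_t slen = strlen(s), tlen = strlen(t);
        if (tlen == 0)
            return 0;

        /* fail[i] is the length of the longest proper prefix of t[0..i]
           that is also a suffix of it; it tells us how much of T still
           matches after a mismatch, so we never back up in S. */
        size_t *fail = malloc(tlen * sizeof *fail);
        if (fail == NULL)
            return -1;
        fail[0] = 0;
        for (size_t i = 1, k = 0; i < tlen; i++) {
            while (k > 0 && t[i] != t[k])
                k = fail[k - 1];
            if (t[i] == t[k])
                k++;
            fail[i] = k;
        }

        /* Scan S left to right, reusing previous matches on failure. */
        for (size_t i = 0, k = 0; i < slen; i++) {
            while (k > 0 && s[i] != t[k])
                k = fail[k - 1];
            if (s[i] == t[k])
                k++;
            if (k == tlen) {                  /* matched all of T */
                free(fail);
                return (int)(i - tlen + 1);
            }
        }
        free(fail);
        return -1;
    }

Because the index into S never moves backward, the two loops together perform O(s + t) character comparisons.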
8.1.1.3 Algorithm 3: Boyer-Moore searching

The last algorithm we come to in our search for fast, simple string matching is the Boyer-Moore algorithm, first published in 1977. It uses a different approach, together with two techniques for improving the amount by which the pattern is shifted in the event of a failed match, to provide much better performance. The change of approach is to attempt to match T from right to left, rather than left to right, while still moving T left to right across S. For example, if T is "wish" and S is "dish," we would successfully match "h," "s," and "i" before failing when "w" is compared to "d."

The first technique for improving pattern shifting in the event of a mismatch is to move the pattern along so that a match is made with the character in S that we are inspecting, if possible. For example, let's return to our sample string and find the word "wish" in it:

S: here is a piece of text which we wish to search
T: wish

The "h" at the end of T does not match the "e" in S above it, and no "e" appears anywhere in T, so no match can contain that "e." Therefore, we can move the pattern along past the "e" -- four places, in fact, the width of T. Now we have:

S: here is a piece of text which we wish to search
T:     wish

The space does not match anything in T, so we can advance four characters again:

S: here is a piece of text which we wish to search
T:         wish

The match has failed again, but this time against an "i," which occurs in the pattern. We shift T so that the "i" in S and the "i" in T are aligned:

S: here is a piece of text which we wish to search
T:           wish

There is no "c" in T, so we shift T over by four. Eventually, by repeating this algorithm, we will find the occurrence of T in S. This match occurs at the 34th position in S, but we require only 15 comparisons. Either the naive method or the KMP algorithm would have required at least 37 comparisons -- a huge performance benefit. In general, the longer the pattern, the fewer the comparisons.

The second mechanism used by the Boyer-Moore algorithm is essentially an adaptation of the "fail array" technique used by KMP to decide how far to slide T along S, but adapted to the right-to-left search method. In general, the Boyer-Moore algorithm provides exceedingly fast searching when compared to the naive algorithm or the KMP algorithm, particularly for long patterns in a text with a large alphabet; in the best case, its complexity is O(s/t).
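As a sketch, the following C function implements only the first of the two techniques, the "bad character" shift; dropping the second, KMP-like mechanism yields what is usually called the Boyer-Moore-Horspool simplification. The name bm_search and the table layout are illustrative choices:

    #include <limits.h>
    #include <string.h>

    /* Simplified Boyer-Moore search using only the bad-character shift
       (Horspool's variant): return the index of the first occurrence of
       t in s, or -1 if t does not appear in s. */
    int bm_search(const char *s, const char *t)
    {
        size_t slen = strlen(s), tlen = strlen(t);
        size_t shift[UCHAR_MAX + 1];

        if (tlen == 0)
            return 0;
        if (tlen > slen)
            return -1;

        /* A character that never occurs in T lets us shift by the full
           pattern width, as with the "e" in the example above. */
        for (size_t c = 0; c <= UCHAR_MAX; c++)
            shift[c] = tlen;
        /* A character that does occur in T shifts just far enough to
           align its rightmost occurrence (final position excluded). */
        for (size_t i = 0; i + 1 < tlen; i++)
            shift[(unsigned char)t[i]] = tlen - 1 - i;

        for (size_t pos = 0; pos <= slen - tlen; ) {
            size_t i = tlen;
            while (i > 0 && s[pos + i - 1] == t[i - 1])  /* right to left */
                i--;
            if (i == 0)
                return (int)pos;
            pos += shift[(unsigned char)s[pos + tlen - 1]];
        }
        return -1;
    }

The full Boyer-Moore algorithm would add the second, KMP-like table and take the larger of the two possible shifts at each mismatch.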
8.1.2 Caveats in Optimization

There are three pitfalls to watch for in code tuning: diminishing returns, the development-to-use ratio, and functional equivalence.

The pitfall of diminishing returns is illustrated by a program that exhibits the characteristics shown in Table 8-2.

Table 8-2. Timing characteristics of a sample program

Function     Percentage of total runtime
baz          45%
foo          14%
all others   41%

If we improve the performance of baz by 20%, the runtime of the program decreases by 9%, because baz accounts for 45% of the total running time. If we improve the performance of foo by the same amount (20%), however, the runtime of the program improves by only 2.8%. Tuning efforts should therefore be concentrated where the improvements will have the most effect. This usually means resolving bottlenecks in critical sections of code; fixing bottlenecks in noncritical code sections is not likely to help overall performance much.

Another thing to watch out for is the ratio of development time to runtime. In many fields, a program may be so specialized that it is run only a few times. In these cases, the time spent by developers profiling, tuning, and testing the code may well exceed the time saved by the optimization.

The last major caveat of code optimization is functional equivalence. Not all optimizations guarantee bitwise-exact results. For example, instead of dividing two numbers, you can multiply one by the inverse (reciprocal) of the other. This can be a much faster operation, but in some cases the results will not be identical. One instance where this is true is when the divisor is extremely small. [2] When the reciprocal of this very small number is computed, it will overflow and produce an infinity. You must be careful to check the results of your program after each optimization, using representative input sets, to avoid these sorts of errors.
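Here is a minimal C sketch of this failure mode, assuming IEEE 754 double-precision arithmetic; the values of a and b are chosen only to make the overflow visible:

    #include <stdio.h>

    int main(void)
    {
        double a = 1e-5;
        double b = 1e-310;       /* a subnormal double; 1/b exceeds DBL_MAX */

        double exact  = a / b;           /* about 1e305: large, but finite */
        double approx = a * (1.0 / b);   /* 1/b overflows to infinity, so  */
                                         /* the "optimized" result is inf  */

        printf("a / b       = %g\n", exact);
        printf("a * (1 / b) = %g\n", approx);
        return 0;
    }

Dividing directly yields a large but finite quotient; rewriting the division as a multiplication by the reciprocal collapses it to infinity.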