Cure

There are two parts to curing premature optimization. The first is knowing what to do when inheriting code early in a project that has been optimized too soon. If this is not tackled immediately, time will be wasted over the course of development until the cure is implemented. Wasted time translates directly into wasted money and a spiral of further problems, so be sure to confront these issues as soon as possible. The second part is knowing what to do when the optimization stage finally arrives. Without this knowledge, the optimizations will be misdirected and cause problems very similar to those of premature optimization.

When to Refactor

Someone has already taken the development time to write an optimized version of an algorithm, so why would there ever be a reason to rewrite it with a less optimal algorithm? For the answer to this, we look back to the symptoms of premature optimization. If the optimization is isolated and does not need to be changed, then there is very little reason to replace it. However, if the optimization is intertwined with code that requires a change, the difficulty in modification might be a reason for refactoring. In addition, this might bring future benefits if more changes become necessary.

Once the decision to refactor is made, the following steps should be taken:

  1. Isolate the current algorithm from the rest of the code, reducing the other code’s dependencies on the algorithm as much as possible. Refactor in small steps; this might require introducing additional abstractions in the interface to the algorithm.

  2. Test to ensure that the changes have not broken the application.

  3. Replace the current algorithm with a more readable algorithm.

  4. Test.

  5. Review the changes to make sure they have improved the readability and robustness of the code; preferably, have another programmer who has not worked on it review it as well.

If proper tests are not yet in place to verify that the changes have not broken the code, now is a good time to add them. Do this before refactoring, so that the current code provides regression results against which the new code’s functionality can be verified. A minimal sketch of the isolation and replacement steps follows.
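As a rough sketch of steps 1 and 3, assume a hypothetical prematurely optimized ID lookup: the algorithm is first hidden behind a narrow interface so the rest of the code no longer depends on its internals, and a more readable implementation is then swapped in behind that same interface and verified against the regression tests.

#include <algorithm>
#include <vector>

// Narrow interface that the rest of the code depends on, introduced in
// step 1 to isolate the algorithm from its callers.
class IdLookup
{
public:
    virtual ~IdLookup() {}
    virtual bool Contains(int id) const = 0;
};

// Readable replacement for the original hand-optimized search (step 3).
// A sorted vector and std::binary_search state the intent directly.
class SortedIdLookup : public IdLookup
{
public:
    explicit SortedIdLookup(const std::vector<int>& ids)
        : m_ids(ids)
    {
        std::sort(m_ids.begin(), m_ids.end());
    }

    bool Contains(int id) const
    {
        return std::binary_search(m_ids.begin(), m_ids.end(), id);
    }

private:
    std::vector<int> m_ids;
};

Because callers see only IdLookup, either implementation can be dropped back in later if profiling eventually shows the readable version is genuinely too slow.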

Profile, Profile, Profile

At some point, it will become necessary to optimize the application. For most projects, this should be near the end of development. However, for real-time applications and a few other types of applications, it might be necessary to perform some small optimizations during development so that the application runs well enough for testing and evaluation. Both types of optimization use the same techniques, but optimizations carried out earlier in the development cycle should stop as soon as the application returns to acceptable performance for further development, whereas final optimizations continue until the target performance required for deployment is reached.

The most important tool required for performing these optimizations is the profiler. A profiler might be anything from small timing reports embedded in the code to a full separate application such as Intel VTune Performance Analyzer (Figure 1.11). The more profiling tools available, the easier it will be to get the data necessary for identifying the slowest code. An extremely useful feature to look for when considering commercial profiling tools is the level of support for call graph profiling. Call graph profiling allows the viewing of the entire call tree for profiling data and, as discussed later, is essential for determining the best optimizations. Another important consideration for real-time applications is the performance impact of the profiling on the application. This is particularly important when evaluating the performance of an application that requires constant user input. If the application becomes unusable due to poor performance, any profiling data collected must be suspect. This is another reason why a range of profiling tools is useful, allowing different types of profiling for different parts of the application.

Figure 1.11: Intel VTune Performance Analyzer.
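
At the simple end of that spectrum, a small timing report embedded in the code is often enough to flag a slow region. The following is a minimal sketch of such a report, not tied to any particular profiling product: a scoped timer that prints how long a block took when the scope exits.

#include <chrono>
#include <cstdio>

// Minimal embedded timing report: construct at the top of a scope, and
// the elapsed time is printed when the scope is left.
class ScopedTimer
{
public:
    explicit ScopedTimer(const char* label)
        : m_label(label), m_start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        using namespace std::chrono;
        const long long us = duration_cast<microseconds>(
            steady_clock::now() - m_start).count();
        std::printf("%s: %lld us\n", m_label, us);
    }

private:
    const char* m_label;
    std::chrono::steady_clock::time_point m_start;
};

void UpdateWorld()
{
    ScopedTimer timer("UpdateWorld");  // hypothetical function being measured
    // ... work to be measured ...
}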

Profiling and optimization are something of an art, but that does not mean they should be treated as a chaotic process of random changes and uninformed guessing. There is a definite advantage to following a structured approach. What follows is a general plan for profiling and optimization that should be customized to meet the particular needs of your project.

First, use a profiling tool to determine which high-level module in your application is taking the most time. If the application is real-time, with or without user interaction, there are several considerations to take into account when performing this step. Real-time applications are generally profiled on a frame-by-frame basis, and there are really two goals when optimizing this type of application: average frame time and frame time consistency. Tradeoffs might be needed to avoid spikes, which are as noticeable as, if not more noticeable than, a slower average frame rate. In addition, user interaction introduces multiple scenarios that need to be optimized. When this is the case, carefully separate and prioritize each scenario and tackle them one at a time. You might find that optimizing one scenario improves the performance of another, but do not attempt to optimize both at once.
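
As a sketch of what average frame time and frame time consistency can look like in code (the class and threshold here are hypothetical), a frame-based application can accumulate per-frame timings and report both the mean and the worst spikes:

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Collects per-frame times in milliseconds and reports the average along
// with a simple consistency measure: the worst frame and the number of
// frames that exceeded a spike threshold.
class FrameStats
{
public:
    void AddFrame(double frameMs) { m_frames.push_back(frameMs); }

    void Report(double spikeThresholdMs) const
    {
        if (m_frames.empty())
            return;

        double total = 0.0;
        double worst = 0.0;
        int spikes = 0;
        for (std::size_t i = 0; i < m_frames.size(); ++i)
        {
            total += m_frames[i];
            worst = std::max(worst, m_frames[i]);
            if (m_frames[i] > spikeThresholdMs)
                ++spikes;
        }

        std::printf("frames: %u  avg: %.2f ms  worst: %.2f ms  spikes over %.1f ms: %d\n",
            static_cast<unsigned>(m_frames.size()), total / m_frames.size(),
            worst, spikeThresholdMs, spikes);
    }

private:
    std::vector<double> m_frames;
};

A change that lowers the average but raises the spike count may still be the wrong tradeoff for a real-time application.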

Once the performance concerns have been narrowed to a particular module, create a repeatable set of inputs to this module so results can be measured deterministically. While this might require minor changes to the application, the elimination of random results will easily pay for this in development time. For real-time applications, you might need to write special test cases to ensure this repeatability. Once this repeatable test harness is in place, use more detailed profiling tools to determine which function is taking the most time. If you have call graph profiling available, here is where it should be used. Be sure to record this information for future reference.

Now that a particular function has been found to be taking the most time, the optimization that will be applied to improve performance must be determined. Here is where a mistake is often made. Instead of starting by attempting to optimize the function itself, start by attempting to optimize how often the function is called. A proper high-level optimization can greatly reduce the number of calls to the function, often making low-level optimization of the function unnecessary. This is where call graph profiling is most valuable, allowing you to see which call tree is most responsible for the time taken by the function in question (Figure 1.12). Start looking for an optimization at the highest possible level, and work down to lower levels only after determining that no optimization is possible at the current level.

Figure 1.12: Browsing the call graph in Intel VTune Performance Analyzer can give more information on the source of expensive calls, providing extra context for deciding where to make optimizations.
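
As an illustration of optimizing how often a function is called rather than the function itself, suppose the call graph shows a hypothetical expensive query, ComputeVisibility, being invoked many times per frame with the same argument from different call sites. A sketch of a high-level fix is to cache its results for the frame rather than micro-optimize its body:

#include <map>

// Hypothetical expensive call identified by the profiler; the body here
// is only a stub standing in for the real work.
double ComputeVisibility(int objectId)
{
    // ... expensive occlusion or line-of-sight query ...
    return objectId * 0.001;
}

// High-level optimization: cache results for the current frame so the
// expensive function runs at most once per object per frame, no matter
// how many call sites ask for the value.
class VisibilityCache
{
public:
    void NewFrame() { m_cache.clear(); }

    double GetVisibility(int objectId)
    {
        std::map<int, double>::const_iterator it = m_cache.find(objectId);
        if (it != m_cache.end())
            return it->second;

        const double value = ComputeVisibility(objectId);
        m_cache[objectId] = value;
        return value;
    }

private:
    std::map<int, double> m_cache;
};

If the call count drops far enough, the low-level cost of ComputeVisibility may no longer matter at all.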

Once a single optimization has been chosen, implement the optimization. Then, run the exact same test scenario that was used to find the function to optimize. Compare the results and ensure that the benefit of the optimization was worth the tradeoff in readability and robustness. If it was not substantial enough to warrant the changes, revert to the original code and search for another optimization.

Repeat this process until you have achieved the desired performance target. Be sure to profile after every code change during this process; otherwise, unexpected results might lead you to improper or useless optimizations. Do not be afraid to try several algorithms for the same optimization, but test each separately and compare the results of each with no other changes made to determine which one should be kept as the final optimization.
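
When trying several algorithms for the same optimization, a small harness like the following sketch (the names are hypothetical) keeps the comparison honest: each candidate is timed over the same repeatable input, with no other changes made between runs.

#include <chrono>
#include <cstdio>
#include <vector>

// Times one candidate implementation over the same repeatable input so
// that the measured difference reflects only the algorithm under test.
template <typename Candidate>
double TimeCandidateMs(const char* name, Candidate candidate,
                       const std::vector<int>& input)
{
    using namespace std::chrono;
    const steady_clock::time_point start = steady_clock::now();
    candidate(input);
    const double ms = duration<double, std::milli>(steady_clock::now() - start).count();
    std::printf("%s: %.3f ms\n", name, ms);
    return ms;
}

Each candidate would be run against the recorded test scenario described earlier, and the losers reverted rather than left in the code.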

Importance of Testing

The plethora of books on how to perform proper testing is an indication of the importance placed on it. Several new methodologies, such as Extreme Programming, and many older methodologies rely on testing for their success. Testing encompasses everything from unit tests, which test individual classes and functions, to acceptance tests, which test whether the application meets the customer’s criteria. One other very important set of tests occurs at the module and application level: integration tests, which check the interaction between functions, classes, and modules. Some methodologies use only a portion of the possible tests; others go overboard and require every single element to be tested. The best solution is a balance that aims to create tests for likely points of failure without wasting development time on tests that will never fail.

There are two very important aspects of testing as it relates to optimization. The first important set of tests is the normal tests used for development stability. These include the full range of test types normally used (Table 1.2), so they should already be available by the time the optimization phase is reached. After every change made to improve performance, these tests should be executed. If the tests pass, confidence will be high that the optimization has not broken existing code. This is extremely important at the end of the development cycle, when optimization should be performed, because even minor mistakes can cause large problems and critical delays.

Table 1.2: Testing Types

Testing Type   | Description                                                                                                            | Scope                                  | Automation
---------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------
Unit           | Testing to ensure that code obeys its contract and does what is expected.                                               | Single unit (class/function/etc.)      | Full
Integration    | Testing to ensure that code obeys its contract and does what is expected.                                               | Multiple units                         | Full
Error handling | Testing to ensure that errors are handled gracefully, including under heavy system load and full resource constraints.  | Single unit/multiple units/application | Full
User data      | Testing with real customer data to ensure that the application works under real-world conditions.                       | Application                            | Partial
Usability      | Testing to ensure that the application is usable by the customer, largely user interface testing.                       | Application                            | Minimal

Performance profiling tests are the other important set of tests. Often, a potential optimization has no effect, or even a negative effect, on performance. It is therefore necessary to profile after every change, comparing the performance with and without the change. If the change impairs performance, revert to the old code. Even if the change has no effect on performance, revert anyway, because any change presents a risk to the stability of the application, and it would be unwise to take such a risk unless the gain is sufficient.

Creating profiling tests is easy for non-interactive applications, but a little bit more work is required to create repeatable tests for interactive applications. Figure 1.13 shows a basic setup for creating a profiling test that will work with an interactive application. While the application is running, the input data is collected. Once the conditions that the programmer wants to test are achieved, the data can be used to simulate the run repeatedly under similar conditions. Special consideration must often be taken in certain sections of the code to facilitate this type of testing. For instance, any randomizers must allow a repeatable seed to be provided so that random numbers remain identical across test runs. Simulation must also be able to function with timing information separate from the real-world timing. This allows changes to be made that affect the real-world timing while allowing the test runs to remain as identical as possible.

Figure 1.13: Profiling test setup for an interactive application. Inputs are stored and then played back to achieve runs that differ in as few ways as possible.
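
A minimal sketch of the record-and-playback idea in Figure 1.13, using hypothetical types: input events are captured during a live run and then replayed with a fixed random seed and a fixed timestep, so successive profiling runs differ in as little as possible besides the code change being measured.

#include <cstdint>
#include <vector>

// Hypothetical input event; a real application would record whatever its
// input layer produces (key presses, mouse movement, network messages).
struct InputEvent
{
    int frame;
    int key;
};

// Stub simulation standing in for the real application code.
struct Simulation
{
    void Reseed(uint32_t /*seed*/) {}            // all randomizers take an explicit seed
    void ApplyInput(const InputEvent& /*e*/) {}  // feed one recorded event
    void Step(double /*dtSeconds*/) {}           // advance with test-controlled timing
};

// Replays a recorded session deterministically: same seed, same inputs,
// same fixed timestep decoupled from wall-clock time.
void ReplaySession(Simulation& sim, const std::vector<InputEvent>& recorded,
                   int frameCount, uint32_t seed)
{
    sim.Reseed(seed);
    std::vector<InputEvent>::size_type next = 0;
    for (int frame = 0; frame < frameCount; ++frame)
    {
        while (next < recorded.size() && recorded[next].frame == frame)
            sim.ApplyInput(recorded[next++]);

        sim.Step(1.0 / 60.0);
    }
}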

Notice that both cases stress the importance of performing the tests after every change. This deserves special note, as it can be tempting to make several changes at once and then run the tests, often because the testing process is time consuming. Testing time can be reduced by ensuring that the tests are fully automated; essentially, a single action should launch the entire testing suite and run it to completion. The testing suite can also be broken into smaller suites if it can be guaranteed that certain modules do not interact. With reasonably efficient tests, running them after every change will save a considerable amount of work when a particular change breaks the application: the cause will be evident immediately, and tracking down the cause is often the most time-consuming part of the debugging process.

Special Case: Libraries

Special consideration must be given to the optimization of libraries meant for external use. Unlike an application that goes directly to the end user, a library is an intermediate step toward the final application. This leaves the library developer with less information about the end use of the library, making optimization a trickier business. There are still methods for handling this case in a systematic fashion, reducing the chance that library users will come back complaining that the library makes it impossible for them to optimize their applications.

The first step is to research the most likely uses of the library. If potential buyers are already known, consulting with them about how they intend to use the library can be of great use. The more detail you can get about what functionality they want and the framework within which they intend to use it, the easier it will be to optimize the library properly. If no potential buyers are available at the time, which could be a bad sign for the salability of the product anyway, the library development team must make an educated guess at the frameworks within which the library will be used. Once this information has been gathered, test cases must be created that represent several simplified versions of the expected uses of the library. This is a balancing act between the accuracy of the test cases and the development time used to create them. If proper development techniques were used, some of these test cases might already be available from the team’s normal testing procedures.

The next step is to set up an automated profiling suite that includes all of the test cases. It is very important to consider every test case for any one optimization; otherwise, only a small percentage of the library users will be happy with the performance. As in the optimization of a normal application, choose the change with the highest performance impact first, but this time weigh that impact across all test cases. Once a change is made, profile all of the test cases again, and revert if any of them is negatively impacted. It is important to make optimizations that benefit as many potential library users as possible, and even more important to avoid optimizations that help only one user to the detriment of the others.
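
A sketch of the kind of automated profiling suite described here, with hypothetical names: every representative test case is timed after a change and compared against the baseline recorded before it, and the change is rejected if any case regresses.

#include <cstdio>
#include <vector>

// One representative use of the library, built from research into how
// customers intend to use it; run() returns elapsed time in milliseconds.
struct TestCase
{
    const char* name;
    double (*run)();
};

// Profiles every test case and compares against baseline timings taken
// before the change. The change is kept only if no case got slower
// beyond a small tolerance.
bool ChangeIsAcceptable(const std::vector<TestCase>& cases,
                        const std::vector<double>& baselineMs,
                        double toleranceMs)
{
    bool acceptable = true;
    for (std::vector<TestCase>::size_type i = 0; i < cases.size(); ++i)
    {
        const double ms = cases[i].run();
        std::printf("%s: %.3f ms (baseline %.3f ms)\n",
                    cases[i].name, ms, baselineMs[i]);
        if (ms > baselineMs[i] + toleranceMs)
            acceptable = false;  // this use of the library regressed; revert
    }
    return acceptable;
}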


