Prevention




Preventing Complexification is about choosing the correct algorithm for the job. Let us look at some techniques that can help with this decision.

K.I.S.S.

When considering which algorithm to choose for a given situation, great weight should be given to the simplicity of the algorithm. This is known as the K.I.S.S. principle, or Keep It Simple, Stupid. By simple, we are referring to the readability and understandability of the algorithm. These qualities are directly proportional to the maintainability of the algorithm.

To better understand why this simplicity is important, consider the opposite direction. Each added level of complexity either eliminates a group of programmers from maintaining that code or, at the very least, forces them to spend more development time maintaining it. This does more than increase the development time for that algorithm alone. The programmers who are capable of maintaining the algorithm become a bottleneck for the team any time the algorithm must be changed. As more complex algorithms are introduced, it becomes increasingly likely that these bottlenecks will cost multiple programmers development time.
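As a brief illustration of this tradeoff (a hypothetical example, not taken from any particular codebase), compare two ways of counting the set bits in an integer. Both are correct, but only the first can be verified at a glance by every programmer on the team:

#include <cstdint>

// Simple version: count the set bits with a loop that reads exactly
// like its description.  Any programmer can verify or modify it.
unsigned int CountBitsSimple(std::uint32_t value)
{
    unsigned int count = 0;
    while(value != 0)
    {
        count += value & 1u;
        value >>= 1;
    }
    return count;
}

// Complex version: a well-known parallel bit-counting trick.  It is
// faster, but a maintainer must rederive the masks and shifts before
// trusting any change to it.
unsigned int CountBitsComplex(std::uint32_t value)
{
    value = value - ((value >> 1) & 0x55555555u);
    value = (value & 0x33333333u) + ((value >> 2) & 0x33333333u);
    value = (value + (value >> 4)) & 0x0F0F0F0Fu;
    return (value * 0x01010101u) >> 24;
}

The second version is a legitimate choice when profiling shows the bit count to be a hot spot, but choosing it by default trades readability for performance that might never be needed.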

Know the Options

Before you can presume to choose the simplest algorithm, you must know what algorithms are available that fit your requirements. The good news is that many of the simpler algorithms are widely known. If you are considering an algorithm that is common knowledge, you increase its maintainability even if it is not the simplest. It is therefore important to understand which algorithms are well known within the computer science community. Consulting with less experienced programmers, particularly those fresh out of college, is a good way to determine whether an algorithm is common knowledge.
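Sorting and searching are common knowledge precisely because the standard library already names them. The following minimal sketch (the function and data here are hypothetical) leans on these well-known algorithms rather than hand-rolling a sort and a search:

#include <algorithm>
#include <string>
#include <vector>

// Sort a list of names and check for membership using algorithms that
// are common knowledge and documented everywhere.
bool ContainsName(std::vector<std::string> &names, const std::string &target)
{
    std::sort(names.begin(), names.end());                     // well-known O(n log n) sort
    return std::binary_search(names.begin(), names.end(), target);
}

Any programmer who reads this can consult a standard reference for the exact behavior, which is not true of a custom sorting routine.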

When we talked about K.I.S.S., you were encouraged to choose the simplest algorithm. It might seem contradictory to now recommend choosing the most commonly known algorithm. This is not as much a contradiction as it might first appear, because the algorithms that find wide adoption are generally those that are simplest to understand. This is a form of natural selection among algorithms, which supports the idea that simple algorithms are the most beneficial for software development. The longer an algorithm has been around to survive the selection process, the more likely it is to be simple and well known. However, as with natural selection, some algorithms that were once extremely effective and popular might have become obsolete with changes in the environment. In software development, this environment is the hardware, which sees constant improvement that inevitably affects the software run on it. Although these algorithms will eventually disappear, it takes time for evolution to remove them, and remnants can sometimes remain indefinitely.

There will be many times when a common knowledge algorithm does not suit the requirements of the given problem. If this is the case, the first step should be to reevaluate the requirements to determine whether they can be relaxed to accommodate one of the simple and well-known algorithms. If they cannot, it is time to do some research. Even if you already know an algorithm that will work, it is a good idea to search for a couple of alternatives. Once you have a selection of algorithms, make sure each one meets the requirements. Next, consider their readability, understandability, and maintainability; these should be the factors given the greatest weight in deciding which algorithm to use.

Start Small

This is a good point to step back a bit and consider the topic of requirements that we mentioned previously. The choice of algorithm is most constrained by the absolute requirements of the problem. Whereas the rest of the criteria for deciding which algorithm to use are relative and only affect the ordering, algorithms that do not meet the requirements are rejected completely.

Because of this rigidity, it is important to control the level of detail in the requirements so as not to eliminate too many candidate algorithms. If the requirements are too strict, only complex algorithms are likely to be left to choose from, along with the long-term development problems they bring. You should therefore strive to provide the smallest and least restrictive set of requirements possible during the initial phase of the project. The requirements can always be tightened as new information is provided or discovered, and the resulting changes are much easier because they progress from a simpler algorithm to a more complex one.

To understand why going from a simpler algorithm to a more complex algorithm is beneficial to development time, it is best to look at it from a risk management standpoint. Imagine that you start with the simpler algorithm, which is quicker to implement than the more complex one. A few minor modifications are required at first, and they are performed quickly thanks to the maintainability of the simple algorithm. Now one of two things could happen: the algorithm could be sufficient and ship with the application, or stricter requirements could force its replacement with a more complex algorithm. Now take the opposite scenario and start with the complex algorithm. This takes more time to implement, and then more time is used up when minor modifications are required. Because of its complexity, a senior programmer must be involved, which costs more development time. Once again, two possibilities arise: the application ships with the complex algorithm, or you discover that the simpler algorithm performs better or allows operations that the complex one does not, and you must replace the complex algorithm with the simpler one. Comparing these two scenarios, you can see that on average the first will cost less development time and therefore less money. Even its worst case is not much longer than the best case of the second scenario. Given that the first option carries less risk, it is clearly the preferred choice.
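As a sketch of this approach (the class and its members are hypothetical), consider a lookup table that begins with the simplest algorithm satisfying the initial requirements. Because the linear search is hidden behind a small interface, it could later be replaced by a map or hash table without disturbing any callers if the requirements tighten:

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A score table that starts with the simplest algorithm that meets the
// current requirements: a linear search over a small vector.  If the
// requirements later tighten (much larger data, stricter lookup times),
// the internals can be swapped for a more complex container without
// changing the public interface.
class ScoreTable
{
public:
    void Add(const std::string &name, int score)
    {
        m_entries.push_back(std::make_pair(name, score));
    }

    // Returns true and fills out_score if the name is present.
    bool Find(const std::string &name, int &out_score) const
    {
        for(std::size_t i = 0; i < m_entries.size(); ++i)
        {
            if(m_entries[i].first == name)
            {
                out_score = m_entries[i].second;
                return true;
            }
        }
        return false;
    }

private:
    std::vector< std::pair<std::string, int> > m_entries;
};

If profiling later shows the lookup to be too slow, only the internals of Find and the member container need to change; the risk of that change is contained.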

The Right Language

In theory, every Turing complete language, meaning one that contains both conditionals and loops, can execute any algorithm that can be executed in another Turing complete language. In practice, however, the work of writing an algorithm varies across languages. The choice of language can change the apparent complexity of an algorithm, just as the use of external libraries can reduce it.

To demonstrate the impact of the language used to implement an algorithm, let us look at two methods for adding two vectors. The first method uses standard C++ code:

// Desc:    Generic efficient vector template.
// Input:   i_size - Size of vector.
//          t_ValueType - Type of vector elements.
template <unsigned int i_size = 4, typename t_ValueType = double>
class t_Vector
{
public:
      // Desc:   Provide access to value type.
      typedef t_ValueType t_Type;

      // Desc:   Provide access to size.
      static const unsigned int k_size;

      // ...

      // Desc:    Add vector element by element.
      // Input:   i_rhs - Vector to add to this vector.
      // Output:  Reference to self.
      t_Vector &operator+=(const t_Vector &i_rhs)
      {
            for(unsigned int l_index = 0; l_index < k_size; ++l_index)
            {
                  m_value[l_index] += i_rhs.m_value[l_index];
            }
            return(*this);
      }

private:
      // ...

      // Desc:   Array of values that form this vector.
      t_Type m_value[i_size];
};

The second method, which is accomplished through C++ template meta-programming, is much more complex:

// Desc:    Generic efficient vector template.
// Input:   i_size - Size of vector.
//          t_ValueType - Type of vector elements.
template <unsigned int i_size = 4, typename t_ValueType = double>
class t_Vector
{
public:
      // Desc:   Provide access to value type.
      typedef t_ValueType t_Type;

      // Desc:   Provide access to size.
      static const unsigned int k_size;

      // ...

      // Desc:    Add vector element by element.
      // Input:   i_rhs - Vector to add to this vector.
      // Output:  Reference to self.
      t_Vector &operator+=(const t_Vector &i_rhs)
      {
            t_Add<i_size - 1>::m_Add(*this, i_rhs);
            return(*this);
      }

private:
      // ...

      // Desc:   Array of values that form this vector.
      t_Type m_value[i_size];

public:
      // ...

      // Desc:   Add elements at index and
      //         recurse index-1.
      template <unsigned int i_index> struct t_Add
      {
            // Input:   io_lhs - Vector to add
            //          element from and into.
            //          i_rhs - Vector to add
            //          element from.
            static inline void m_Add(
                  t_Vector<i_size, t_Type> &io_lhs,
                  const t_Vector<i_size, t_Type> &i_rhs)
            {
                  io_lhs[i_index] += i_rhs[i_index];
                  t_Add<i_index - 1>::m_Add(io_lhs, i_rhs);
            }
      };

      // Desc:   Terminate recursion and add elements
      //         at index 0.
      template <> struct t_Add<0>
      {
            // Input:   io_lhs - Vector to add
            //          element from and into.
            //          i_rhs - Vector to add
            //          element from.
            static inline void m_Add(
                  t_Vector<i_size, t_Type> &io_lhs,
                  const t_Vector<i_size, t_Type> &i_rhs)
            {
                  io_lhs[0] += i_rhs[0];
            }
      };

      // ...
};

It accomplishes the same work as the first method, except that the loop is unrolled at compile time. To accomplish this, it can use only the limited programming environment available for executing code at compile time. Although this can improve the performance of the final application, using this special template language is advisable only if the extra performance is required. As a side note, see [Alexandrescu01] for more information on C++ template meta-programming.

While it is not always possible to choose the language in which the implementation will be done, when it is, consideration should be given to the language in which the algorithm can be expressed in the simplest, most readable terms. Choosing among different languages is becoming a more viable solution with the introduction of .NET, which simplifies the integration of multiple languages in one application. More detail on this is provided in Chapter 3, “NIH Syndrome,” but the benefits apply here as well.

For example, some languages such as Perl support regular expressions. If you plan to write an algorithm for processing text, there is a good chance that it can be expressed more succinctly using regular expressions. It therefore makes sense to use one of these languages when implementing the algorithm or, at the very least, to use a library that supports regular expressions; which choice is preferable depends on the circumstances. While this can make the code less complex, it only makes the code more readable if the programmer understands the language and syntax of the implementation. If you suspect that a programmer who does not understand them might need to read the code, proper documentation and comments are essential. It can also be useful to point the reader to a good source for further information, such as [Friedl02] for regular expressions.
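As a rough sketch, here is the same idea expressed in C++ using the standard <regex> library (a later addition to the language than the environment described here; Boost.Regex or a scripting language such as Perl would serve the same purpose, and the pattern and input are invented for illustration). The expression states the text format directly instead of burying it in loops and index bookkeeping:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Extract the key and value from lines of the form "key = value".
    const std::regex line_pattern("^\\s*(\\w+)\\s*=\\s*(\\w+)\\s*$");
    const std::string line = "timeout = 30";

    std::smatch match;
    if(std::regex_match(line, match, line_pattern))
    {
        std::cout << "key: " << match[1] << ", value: " << match[2] << "\n";
    }
    return 0;
}

The pattern is only readable to someone who knows regular expression syntax, which is why the accompanying documentation and a reference such as [Friedl02] matter.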

Try Explaining That

If you have any doubt about whether the algorithm you have chosen is too complex, it is a good idea to enlist the aid of an uninvolved third party. Start by explaining the problem and asking which algorithms the other programmer believes would fit the situation. Do not reveal which algorithm you plan to use until after you have obtained the other programmer’s suggestions. At this point, you might already discover that a simpler algorithm exists that you could be using.

If you are still convinced that you have chosen the right algorithm, proceed to explain the algorithm to another programmer. This will give you a better idea of the level of complexity of the algorithm and, in addition, will provide a guideline for the level of documentation that will be required for the algorithm to be maintainable. In this case, you should enlist the aid of a less experienced programmer if you want to get the best idea of how hard the algorithm is to understand.

Refactoring for Prevention

Most of the time, we talk about refactoring in the context of curing an illness, but refactoring is also a necessary part of the normal development process because of the natural decay of code quality caused by time and unavoidable changes in requirements. Refactoring can be particularly useful in preventing Complexification in the interactions between code units. Because the excess complexity arises as a byproduct of the normal development process, it is reasonable to expect that refactoring will be required to keep it under control.

For most languages, reducing the complexity of interactions between code units through refactoring means reducing the number of publicly accessible points of interaction on each encapsulated code unit. Because the number of possible interactions increases substantially with each available point of interaction, removing even a few of them can greatly reduce the complexity of an application. The primary means of reducing these public access points is moving data and methods to more appropriate and less accessible locations in the code. Occasionally, code might turn out to be obsolete as well, allowing its removal altogether.
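A small, hypothetical before-and-after sketch shows the idea; the class names and members are invented for illustration:

// Before: every field is a public point of interaction, so code anywhere
// in the application can create new dependencies on each of them.
class PlayerBefore
{
public:
    int m_health;
    int m_armor;
    int m_shield;
};

// After: the data has been moved to a less accessible location and the
// public interface is reduced to the one operation other code actually
// needs, shrinking the number of possible interactions.
class PlayerAfter
{
public:
    PlayerAfter(int health, int armor, int shield)
        : m_health(health), m_armor(armor), m_shield(shield) {}

    // Apply incoming damage, letting armor and shield absorb part of it.
    void ApplyDamage(int amount)
    {
        const int absorbed = (m_armor + m_shield) / 2;
        const int remaining = amount > absorbed ? amount - absorbed : 0;
        m_health -= remaining;
    }

private:
    int m_health;
    int m_armor;
    int m_shield;
};

Each field removed from the public interface is one less point of interaction that the rest of the application can come to depend on.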


