Chapter 12: Conclusion




Overview

Whether you have read through each individual illness or just a few that sounded interesting to you, you now have the techniques and information necessary to improve your personal programming abilities. There are certainly more illnesses to be discovered and analyzed, but before we delve into the future of preventing and curing programmer illnesses, we will examine several common concepts and techniques that can help with most types of illness.

Common Techniques

The following concepts and techniques can help in the prevention and cure of almost every programmer illness.

Language: The Gilded Cage

gild·ed \'gil-dəd\ [ME gilden, fr. OE gyldan; akin to OE gold gold] … 2b: to give an attractive but often deceptive appearance to …

The Merriam Webster Collegiate Dictionary, Merriam Webster, 1997.

Language permeates our lives and allows us to communicate with each other and our descendants. Circumstances and history have created many different human languages to deal with the needs of people who use the language. The complexity of our languages is one of the most important contributors to the advances the human race has been able to make.

Nevertheless, language is not without its faults. Many tend to forget or overlook the fact that to go from thought to speech forces us to apply the constraints of the language we are speaking to the thoughts we want to voice. This traps our ideas in a cage of our own making, and we can only break them free through the evolution of our language to handle new concepts.

Take, for example, the Chinese word chi, which has equivalents in several other Asian languages and is summarized by the following definition:

Chi (qi, ch’i, ki in Japanese): Energy, subtle life force, internal energy, internal power. Manifested energy that empowers something to work and function. This concept underlies Chinese, Japanese, and Korean culture, in which the world is perceived not purely in terms of physical matter but also in terms of invisible energy. [Frantzis98]

First, even if this definition were sufficient, it would still be a lengthy replacement for the single word chi used in the Chinese language. However, notice that the definition mentions that the concept of chi underlies Chinese culture. The word is directly tied into the culture and history of the Chinese people and therefore has a much richer meaning that cannot be fully translated into English. The true meaning and purpose of the word is thus inseparable from the system from which it originated.

This connection between language and the system from which it originates also acts as feedback that reinforces the culture that generated it. Part of the difficulty English-speaking people have in understanding the word chi is the lack of cultural context, along with the rigid thought patterns that prevent them from grasping its full meaning. These concepts apply not only to human language but also to programming languages.

Programming languages provide benefits and suffer from faults similar to those of human language. They allow us to communicate with computers and other devices containing microprocessors in a manner that would otherwise be impossible. Without them, constructing software for even the simplest devices would be so time consuming that it would not be a viable business except for a few special applications.

However, programming languages impose even tighter constraints upon the ideas we want to communicate to the computer. Because computers place more rigid requirements upon communication, languages must apply strict constraints and policies if the device is to understand the information. The chain of communication therefore runs from the domain of the problem to be solved, to the human programmers who must communicate the problem and solution ideas in human language, to the implementation in the programming language the computer requires to understand and perform the requested problem solving. This chain requires at least two translations from the initial domain, with all the constraints that this implies. It is not difficult to see that much can be lost on the path to developing software.

The constraints of the language are not the only problem; translation between domains and languages also introduces the possibility of errors. The likelihood of mistakes increases with the complexity of the required translation. Some of these mistakes are simple human error, unavoidable given the complexity of the translation, but others are caused by the programmer's lack of understanding of the programming language. This lack of understanding can easily lead the programmer to solutions that are more complex and therefore more error-prone.

Over time, the constraints of the language will affect the problem-solving techniques that the programmer uses. This trend further constrains the programmer to the solutions possible in the language that dominates his programming experience. Even as the programming language evolves or new programming languages are adopted, the temptation to fall back on known practices is hard to resist, even if those practices cause numerous problems.

What this means to you as a programmer is that two main forces compete in the decision of which language to use for a particular problem. The first factor is which language is most appropriate for solving the problem. This is determined primarily by the complexity of translating the problem from human language and design representation into the programming language. The simpler this translation, the less time is spent performing it and the fewer errors must later be fixed.

The second factor is the programmer's level of competence with the chosen language. Even if the translation is simpler in a particular language, there is no benefit to using that language if the programmer is unable to make the simpler translation. In fact, it might be a detriment as the programmer struggles with the language and introduces errors that would not have happened in a language with which he was more comfortable. This mismatch most often occurs when a manager assigns the programmer both the problem and the language to use, because a programmer choosing the language for himself would naturally favor one he is competent in, even if the simpler translation did not factor into the decision.

Throughout most of the illnesses, the main techniques were kept as language-independent as possible to encourage the use of the appropriate language for the given task with the programmers available. To take full advantage of multiple programming languages, new languages must be learned and explored by both the individual programmer and the colleagues with whom the programmer works. This knowledge provides an extra benefit aside from widening the choice of languages available for solving a problem. Understanding multiple languages can also loosen the constraints that a particular language builds up around the programmer's problem-solving habits. This can even lead to discovering that some techniques from a new language translate well to a language already in use, techniques that had simply been overlooked by the dominant problem-solving paradigm of the older language's programmers.

Risk Management

Despite our desire to make it so, software development will never be a deterministic process. Mistakes, unexpected events, and other outside forces will work to prevent any plan from completing as written. We all know this, yet so often we choose to ignore this knowledge in favor of a plan that looks optimal on paper. The problem with the optimal plan for any software project is that it often requires everything to go as expected. When a part of the plan fails, the intricate dependencies that are required for the plan to be optimal fall apart and drag the project down into a quagmire from which it might never recover.

Instead of creating a plan that rests on a fine edge, create a plan that is meant to minimize the risks involved in mistakes and changes. This does not mean playing it safe and only developing products that have been done before and are guaranteed to succeed. Such projects are just as likely to encounter random events that cause failure, or they can simply stagnate as others pass the old technology by with more advanced techniques. Managing project risk means balancing the risks against the benefits, on occasion choosing a riskier plan if the payoff is worth it and sticking to the safer plan when the risky one would not pay off sufficiently.

Testing

One of the most misunderstood and misused of the software engineering techniques is testing. The two primary misuses of testing are too much or none at all. Programmers will often swing from one of these extremes to the other. Perhaps the last project was a debugging nightmare as last-minute changes broke the build in numerous ways that were difficult to track. This project, everything will be tested down to the finest detail. On the other hand, maybe the last project was so bogged down by testing that everything was delivered late or with missing features. This project does not need the extra overhead that burdened the last project.

Of course, if the last project was completed, then you might just decide to keep the status quo. After all, the other extreme only changes the problems encountered without reducing them. This line of thinking often arises from talking with programmers who have worked on the other side of the extreme. If it is not broken, do not fix it.

However, the process is broken. Countless hours are wasted debugging on the one side, and countless hours are wasted writing unused test cases on the other side. Therefore, we should search for a solution to this situation that will retrieve these lost hours. Consider the optimal amount of testing that a project could hope to achieve. A test case would be written for each instance where the code is broken upon writing, or broken in the future, and only for these instances. Unless you have a crystal ball tucked away in your office and the knowledge to make it work, achieving this optimal solution is unlikely to happen.

Instead, we again bring into focus the ideas of risk management. The value of a test case is determined by the amount of time that the test would take to create versus the amount of time that is lost in debugging and other testing when the test case is not created. Obviously, neither of these values is known when the decision to write a test must be made. You must therefore estimate what these values would be based on past experience and your knowledge of the workings of the code.

Estimating the amount of time that the test will take to write is the easier of the two tasks, and is identical to estimating the time it would take you to implement any well-defined piece of code. The more difficult estimation is the amount of time that will be lost to debugging. This value is made up of two parts: the amount of time that a failure would take to analyze and fix, and the probability that another programmer will break the code in the future. In fact, you might even be the one to break the code later, but it is best not to consider this because it is more likely to bias your answer toward too little testing. Weighting the debugging time by the probability of failure gives the appropriate value to consider against the time to write the test. This idea can be more succinctly expressed by the following rule:

  • Tests should be written to catch the most likely points of major failure.

One particular software methodology in which this rule can be of great assistance in balancing the time spent writing tests against their benefit is test-driven development. In this methodology, the tests are written first, and then the code to make the tests pass is added. At first, just enough code is added to allow the test to run and fail, proving that the test can fail. Then the correct functionality is added until the test passes completely. Writing the tests with consideration for the most likely points of failure provides twice the benefit for minimal work, because the failure cases are guarded against both during the initial development of the tested code and later. This initial safeguarding does not happen when the test is written after the code.

However, even if you use this methodology, do not ignore the need to write tests after the code is already developed. When an error occurs, write a test for it because errors tend to happen in the same place more often than they do in new places. This new test can then correctly safeguard against this error in the future, even though it was not available to save the debugging labor the first time through the coding.

How to Test

Once you are convinced that testing is an important part of the software development process, you will become curious about what to test and how to test it. There is plenty of literature available, although much of the literature is for large projects or full testing departments. Here we are more concerned about testing as it applies to what an individual programmer can accomplish. A good set of guidelines on testing can be found in [Hunt00]. To place this in context with the prevention and curing of illness, let us look at a summary of the testing process as it applies to one or only a few programmers.

Testing types can be divided into three main categories:

  • Unit testing

  • Integration testing

  • System testing

This is a simplification of the types given in [Hunt00] so we can look at the applicability of these types to the individual programmer.

Unit testing is the testing of a single code unit such as a module, class, or file. This level is best handled by one or two programmers familiar with the code and its goals. Having a second programmer, who did not write the code to be tested, write the tests can be an excellent way to prevent the common habit of writing tests that subconsciously avoid the breaking points of the code. Writing the tests before writing the code is another approach that helps prevent writing tests that avoid the points of failure. The tests are directed at the responsibilities of the single unit. To aid in the maintenance and applicability of the tests, they should be kept with the code they are testing, in either the same file or the same directory. Naming conventions that attach test as a prefix or suffix to test cases, and to test cases only, can also be useful, particularly for automated testing and other automated tools.
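As a sketch of what such a unit test might look like, the Stack class below is a hypothetical stand-in for the unit under test, and the test_ prefix follows the naming convention mentioned above so that automated runners can discover the cases:

```python
class Stack:
    """The hypothetical unit under test."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

# The tests exercise only this unit's responsibilities, including
# its documented failure mode, not its interactions with other code.
def test_push_then_pop_returns_last_item():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.pop() == 2

def test_pop_on_empty_stack_raises():
    s = Stack()
    try:
        s.pop()
        assert False, "expected IndexError"
    except IndexError:
        pass

test_push_then_pop_returns_last_item()
test_pop_on_empty_stack_raises()
```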

Integration testing is the testing of the interactions between separate modules, classes, or files. This level involves anywhere from a single programmer to the entire team, but is still generally the responsibility of an individual programmer to actually write the tests. It is important to focus on the interactions between the modules and avoid regressing to unit testing that might only focus on an individual feature from only one of the modules.
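As a sketch, an integration test between two hypothetical units, a Parser and a Store, each of which would also carry its own unit tests; this test deliberately exercises only their interaction:

```python
class Parser:
    """Hypothetical unit one: turns a line into a key/value pair."""
    def parse(self, line):
        key, _, value = line.partition("=")
        return key.strip(), value.strip()

class Store:
    """Hypothetical unit two: indexes key/value pairs."""
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

def test_parsed_records_are_retrievable():
    # The interaction under test: the output of one unit feeding
    # the other, not either unit's features in isolation.
    parser, store = Parser(), Store()
    key, value = parser.parse(" color = blue ")
    store.add(key, value)
    assert store.get("color") == "blue"

test_parsed_records_are_retrievable()
```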

System testing is the testing of the entire application, including its usability and adherence to the customer’s requirements and expectations. This form of testing is most often carried out by a person or group other than the programmers, and therefore only the results are of primary concern to the programmers. However, it is still useful to know the basics of the testing techniques being used in order to reveal flaws in the testing or in the reporting of results. This can aid you in the end by providing better information at an earlier date to eliminate errors from the application. Many of these tests are not written in code, especially given the need to test and evaluate the user interface and other customer-relevant factors.

When testing at the unit and integration level, the goal and therefore method of testing can again be divided into three primary types, which could also be thought of as the three Rs for easy recall:

  • Result testing

  • Resource testing

  • Regression testing

Result testing is generally the first testing written, and aims to verify that the unit of code achieves the goals it was created to reach. This form of testing should also verify that the code handles errors as expected, once again producing the proper result according to the requirements. Passing these tests is a necessary condition for certifying that the code works as expected; however, it does not necessarily indicate that the code is complete.

Resource testing tests whether the code performs within the limits of the resources available to it. This includes testing performance, memory usage, and I/O handling. Resource testing should perform these tests both under normal operating circumstances and under heavy load. In the case of heavy load, the code should either continue to operate or fail gracefully. Performance testing tends to be the most heavily emphasized of this type of testing, but the other tests should receive sufficient attention as well. This is particularly important on resource-constrained architectures, such as embedded systems.
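A sketch of a resource test that checks time and memory rather than results; the work function and the budgets here are illustrative assumptions that would, in practice, come from the project's requirements:

```python
import time
import tracemalloc

def work(n):
    """Hypothetical unit under test: some computation over n items."""
    return sum(i * i for i in range(n))

def test_work_stays_within_budget_under_load():
    tracemalloc.start()
    start = time.perf_counter()
    work(200_000)                      # the heavy-load case
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Budgets are assumptions standing in for real requirements.
    assert elapsed < 5.0, "too slow under load"
    assert peak < 50 * 1024 * 1024, "too much memory under load"

test_work_stays_within_budget_under_load()
```

Because timings vary between machines, budgets for such tests are usually set with generous margins so the test catches regressions rather than ordinary noise.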

Finally, regression testing occurs after a unit of code has passed result testing. The results from previous tests can be used to verify that the code continues to operate as expected. This is particularly useful for refactoring, where the goal is to improve code readability and maintainability without affecting the results. It is also very important for performance and other resource tuning, as their goal is also to change only the resource usage of the code without affecting the results.

This is only a brief overview of the process of testing, and you are encouraged to find out more. Libraries are available for most of the common programming languages to assist in testing; for example, Java has JUnit, and C++ has CppTest. Using such a library is important because it removes part of the tedious, error-prone work of testing, making testing much more useful. Regardless, testing is extremely important in whatever manner you choose to perform it.

Refactoring

Refactoring is another important technique that is essential to enhancing your skills as a programmer. Without the proper knowledge of refactoring, it becomes easy to spend too much time in the design stage trying to compensate for every possible change. The other option is to give up on the design entirely and end up with chaos as design changes are requested later in the project. To avoid these fates it is important to understand the concept of refactoring as well as learning and creating techniques for easier refactoring.

The ideas and benefits behind refactoring are best described in [Fowler00]. This book not only outlines the concepts of refactoring, but also provides a large number of systematic instructions and examples for performing many common refactorings. With this excellent reference already available, there is no reason to cover refactoring in detail here. Instead, a brief description of the essence of refactoring, and of why it matters in the prevention, and even more so in the curing, of programmer illnesses, covers the factors relevant to the current discussion.

The basic idea of refactoring is to provide a systematic means of making changes to existing code for the purposes of fixing errors, understanding the code, and adapting to changing requirements. Programming illnesses are a major source of errors that can be fixed by proper refactoring. The benefit gained by fixing these problems in a more orderly fashion is the prevention of further errors caused by mistakes made while attempting to perform the required adjustments. In addition, proper refactoring can also improve the readability and maintainability of the code. This makes future changes and error fixes easier to accomplish.

Refactoring also provides enough confidence in the ability of the programmer to adapt to change that the programmer can move forward once a reasonable level of design is achieved. This prevents analysis paralysis, the tendency to become stuck in the analysis and design stage trying to anticipate all possible outcomes. However, a word of warning is important here. While a small disciplined team can achieve major success with minimal design and refactoring as advocated by Extreme Programming, not all projects are suitable for this approach. Large teams or less skilled programmers require a more complete design, even if they understand the basics of refactoring. This reflects the greater complexity of interactions on large projects, or the lesser skill available for handling those interactions. Therefore, use refactoring as a tool to prevent overdesign and to fix errors, but do not use it as a crutch to excuse poor design and sloppy coding.
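As a small sketch in the spirit of the Extract Function refactoring cataloged in [Fowler00], a parsing step is pulled out of a report function without changing its behavior; the code and names are hypothetical, and the regression check at the end compares old and new versions on the same inputs:

```python
# Before: one function mixing parsing and formatting.
def report_before(raw):
    total = 0
    for part in raw.split(","):
        total += int(part.strip())
    return "total=" + str(total)

# After: the parsing step extracted into its own well-named helper,
# improving readability without affecting the results.
def parse_values(raw):
    return [int(part.strip()) for part in raw.split(",")]

def report_after(raw):
    return "total=" + str(sum(parse_values(raw)))

# The regression check: both versions must agree on the same inputs.
for sample in ["1, 2, 3", "10,20", "7"]:
    assert report_before(sample) == report_after(sample)
```

Running the existing result tests (here condensed into the comparison loop) before and after each small step is what makes the change systematic rather than a rewrite on faith.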

Toward the Programmer’s DSM-IV

In 1952, the American Psychiatric Association published the first edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-I), based on a classification system for mental illnesses created by Emil Kraepelin (1856–1926). The hope was that the disorders could be grouped to determine the proper treatment and to understand the course of a disorder if left untreated. Unfortunately, the DSM-I did not fully live up to these expectations. However, through a series of revisions in 1968 (DSM-II), 1980 (DSM-III), 1987 (DSM-III-R), and 1994 (DSM-IV), the reliability and validity of the manual have increased toward meeting these expectations. [Sue94]

A similar aim can be seen for books and discussions about the mental oddities that cause so many problems for programmers of all types. Just as with the DSM-IV, the goal of this book is to improve the classification and treatment of common programming problems caused by incorrect or misinformed thinking by programmers. Similarly, this book stresses that these problems have several aspects, including the project environment in addition to the thought processes of the programmer. Although not as clinically laid out as the five-axis evaluation method of the DSM-IV, there is still an attempt to stress the importance of the environment and the interrelationship of the various illnesses when it comes to diagnosing, preventing, and curing programming illnesses.

This book represents only a starting point for a more solid classification of the problems that programmers experience and a structured approach to curing these problems. This is not the first book to approach software development from the point of view of common problems and their solutions [Brown98], but it does provide an emphasis on the mistakes made by the individual programmer and a more complete approach to the thinking behind these mistakes. This is necessary to overcome the cause of the problem rather than continually fixing the results. Just as with the DSM, the reliability of the diagnosis of symptoms must be confirmed or refined. The validity of the proposed solutions must also be confirmed or improved to meet the needs of all programmers.

Community

In order to further both your knowledge of programming and the overall collection of programming knowledge, it is important to participate in the software development community. This will allow the industry as a whole to continue to grow and become one that can last for a long time to come. Discussion about the topics in this book is only one of the many venues that need exploration and discussion from programmers in all corners of the world.

There are a plethora of ways to learn and contribute to the software development community, including books, magazines, mailing lists, newsgroups, Web sites, and conferences. Just as you might find answers to questions that would otherwise take you a great deal of time to puzzle through, you might also be able to quickly answer questions that others are having difficulty solving. Most importantly, contributing the knowledge that can only be gained by experience allows the overall knowledge of the community to increase in ways that would not be possible without open communication.

A common objection that comes up when discussing participation in the community is proprietary information. This is a valid concern, although not to the degree that it is often taken. It is, however, unlikely that everything you do will fall under the umbrella of secret information. Much of the code that is written to support an application is easy to recreate and does not represent a business advantage. The primary advantage comes from the integration of the separate parts into a whole package, along with proprietary assets and a few key sections of code that contain proprietary algorithms. There is generally still plenty of knowledge and techniques gained that can be shared without giving up the competitive advantage, and in return, this encourages others to contribute information that you can use to increase your development speed and quality.






Preventative Programming Techniques: Avoid and Correct Common Mistakes (Charles River Media Programming)
ISBN: 1584502576
Year: 2002
Pages: 121
Authors: Brian Hawkins