Practice 1. Ruthless Testing
Ruthless testing means that developers do everything in their power to ensure that their software is as heavily tested as possible before it reaches people, whether those people are QA or customers.
Ruthless testing is one of the most crucial practices of sustainable development; if there is one practice to focus on, this is it. Ruthless testing is primarily about automated testing. Developers should not do any of the following:
All of the above would be a waste of time, effort, and money.
Developers have access to a testing resource that is excellent at repeatedly performing mind-numbingly dull tasks: a computer! Tests can be written as code, which keeps software development a coding exercise: completing a feature means producing more code, often with more code dedicated to testing the feature than to implementing it. Developers still have to do some user-level testing, just as they do today. The difference is that as developers add and change functionality and fix defects, they put safeguards in place in the form of automated tests. These automated tests are then run as often as possible to give developers confidence that their product behaves as expected and keeps behaving that way over time.
There are multiple types, or levels, of testing that can (and should) be utilized in ruthless testing, as shown in Figure 5-3. Each type varies in effectiveness, as described in more detail in the following sections.
Figure 5-3. The different types, or levels, of tests that should be utilized in ruthless testing. The types of tests are organized by how aware they are of user tasks and code implementation. User-awareness indicates how close the tests are to replicating exact tasks that users perform. Code-awareness indicates the degree of knowledge the tests have of the actual implementation of the product.
Test automation is a critical part of ruthless testing. Code-aware tests have the greatest potential for automation and provide the greatest return for the least amount of effort. The user-aware tests of usability, user verification, and visual validation are rarely automated because each of these tests requires human effort for the design of the tests and careful observation and analysis of the test results. However, while code-aware tests can determine whether the code is working as expected, they do not necessarily ensure that the product works as users expect. Hence, ruthless testing requires a strategy that mixes both code-level and user-level tests.
Unit Tests: Test-Driven Development
Unit tests are the lowest level of tests you can create for software because they are written with explicit knowledge of the software's classes, methods, and functions. They essentially consist of calling methods or functions to ensure that the results are as intended. Test-driven development means writing the unit tests for new code before writing the code and then ensuring that the tests always pass whenever new changes are made [Astels 2003] [Beck 2002a] [Husted and Massol 2003] [Link and Frolich 2003]. In test-driven development, tests are run every time the developer builds the software. The result of the build is either a green light (all tests passed) or a red light (a test failed).
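As a sketch of this red/green cycle, here is a minimal example using Python's unittest module. The PriceTotal class is hypothetical; in true test-driven development, the test case below would be written first, and the class implemented only until the light goes green.

```python
import unittest

# Hypothetical unit under test: a small price accumulator.
class PriceTotal:
    def __init__(self):
        self._total = 0

    def add(self, cents):
        if cents < 0:
            raise ValueError("price cannot be negative")
        self._total += cents

    def total(self):
        return self._total

# Unit tests call the class's methods directly and check the results.
class PriceTotalTest(unittest.TestCase):
    def test_starts_at_zero(self):
        self.assertEqual(PriceTotal().total(), 0)

    def test_accumulates(self):
        t = PriceTotal()
        t.add(150)
        t.add(250)
        self.assertEqual(t.total(), 400)

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            PriceTotal().add(-1)

# Run the suite as part of every build: green light or red light.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PriceTotalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```

Running the suite on every build is what turns these tests into a safety net rather than a one-time check.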
Test-driven development is a powerful way to develop software. It should not be viewed as optional, and it is a crucial part of achieving sustainability. Some of the benefits of test-driven development are:
Tip: Use Mock Objects!
If you deal with databases, network interfaces, user interfaces, or external applications you should learn about Mock Objects [Thomas and Hunt 2002] [http://www.mockobjects.com]. Mock objects are essentially a design and testing technique that lets you simulate an interface so that you can test your code in isolation while having control over the types of events and errors that reach your code.
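As one concrete sketch of the technique, Python's standard unittest.mock module provides ready-made mock objects. The fetch_with_retry function below is hypothetical; the point is that the mock stands in for a real network connection, so the test controls exactly which errors reach the code under test.

```python
from unittest.mock import Mock

# Hypothetical code under test: retry a flaky network fetch.
def fetch_with_retry(connection, url, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return connection.get(url)
        except IOError as err:
            last_error = err
    raise last_error

# The mock simulates the network interface: the first call fails with a
# timeout, the second succeeds. No real network is involved.
conn = Mock()
conn.get.side_effect = [IOError("timeout"), "payload"]

result = fetch_with_retry(conn, "http://example.com/data")
assert result == "payload"
assert conn.get.call_count == 2  # verify the retry actually happened
```

Because the mock scripts the error sequence, the retry path is tested deterministically, something that is nearly impossible against a real network.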
Tip: Be Creative When Implementing Automated Testing
Automated tests shouldn't always mean manually specifying calls, expected return values, and error conditions as in JUnit. In many cases, it is possible to write a program to generate tests for you. For example, suppose you have a piece of software (a method, class, or component) that accepts a string of flags, just like a typical UNIX utility (e.g., grep, awk, sed). You could write a program that calls the code with the flags in all possible orders. This type of test is extremely thorough and bound to pay off. I have seen cases where tests were added in this way to libraries that had been in use for years without any tests, and serious problems were uncovered; in some cases this fixed problems that had been outstanding for a long time and were thought to be unreproducible!
Integration Tests
Integration tests test the interface exposed by a collection of classes that are linked or grouped into a library/module/component. Unlike unit tests, they do not understand the implementation of the library. Hence, integration tests can be a useful mechanism to document and test the expected behavior of a library.
Figure 5-4 illustrates the difference between unit and integration tests. In this example, the unit tests for each class in the library test the class's public and private interfaces, while the library's integration tests only test the collective public interface of the library. Hence, integration tests can be considered optional because they are a subset of the collective unit tests. However, integration tests can still be useful because they are created and maintained separately from the unit tests, especially if the programmers who depend on the library use the integration tests to document how their code expects the library to behave. The tests don't need to be complete; they only need to test the expected behavior. The two situations where integration tests are most useful are when a library is used in more than one application (or program) and when a third-party library is used.
Figure 5-4. Integration tests are different from unit tests. In this example, a library called Omega is made up of three classes. Each class has its own unit tests, and the library has a separate set of integration tests.
One example of when to use integration tests is when you are dealing with a third-party library or any library that the product imports in a precompiled form. What you need is a test harness and a set of tests that realistically represent how you are using that library, the methods you call, and the behavior you expect when you call those methods. You often don't need to test all the interfaces exposed by the library, just the ones you depend on. Then, as new versions of the library are shipped to you, all you need to do is plug the new version into your test harness and run the tests. The more portions of your software architecture that you can treat as third-party libraries and test in this way, the more robust the interfaces your architecture will have. Having sound interfaces will give you a great deal of flexibility in the future so that at any time you could choose to replace an implementation of one of these interfaces without disrupting the entire system. This is a huge benefit and speaks to the need to have well-designed interfaces that are easily tested.
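A minimal sketch of such a harness, using Python's standard json module to stand in for a precompiled third-party library: the tests cover only the calls and behavior the product depends on, and when a new version of the library ships, the same suite is simply rerun against it.

```python
import json
import unittest

# Integration tests for the slice of a third-party library we actually
# use. They know nothing about the library's implementation; they only
# pin down the behavior our code depends on.
class JsonLibraryIntegrationTest(unittest.TestCase):
    def test_round_trip(self):
        # Our code relies on dumps/loads being lossless for plain data.
        record = {"id": 7, "tags": ["a", "b"]}
        self.assertEqual(json.loads(json.dumps(record)), record)

    def test_rejects_malformed_input(self):
        # Our error handling depends on malformed input raising an error.
        with self.assertRaises(ValueError):
            json.loads("{not json}")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    JsonLibraryIntegrationTest
)
assert unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Note that the suite deliberately leaves the rest of the library's interface untested; it is a contract for our usage, not a complete test of the library.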
System Tests
System tests are automated tests that cover the functioning of the complete, integrated system or product. System tests know nothing about the structure of the software, only about what the product is intended to do from the user's standpoint. System tests are intended to mimic the actions a user will take, such as mouse movements and keyboard input, so that these inputs can be fed into the application in the desired order and with the desired timing.
System tests are the most difficult type of test to implement well. This is because in order to be useful over the long term, the system test capabilities must be designed into the software. Also, it is exceptionally difficult to add system test capabilities to existing software. For new software, the extra design effort is worth it since well-thought-out system tests can be extremely effective and can significantly contribute to the productivity of the development team by reducing the dependence on people to manually test the software and also by providing an additional mechanism to prevent defects reaching people (i.e., QA and customers).
Design testability into your product. Too many teams rely on external test harnesses or manual testing by people and learn to regret not investing the extra effort. The complexities of modern computer applications make it difficult to create automated system tests. Modern applications contain complexities such as multiple threads of execution, clients and servers that exchange unsynchronized data, multiple input devices, and user interfaces with hundreds of interface elements. The result is highly nondeterministic behavior that, without built-in testability, will lead to unreproducible problems.
Record and Playback
Because of the complexity of automating system-level testing, most software organizations choose to leave system testing to their QA organization (i.e., people). However, this is a wasted opportunity because most software products, especially new ones, could greatly benefit from a record and playback architecture.
Many applications have a built-in logging capability. As the application runs, it records the actions it is taking in a log file. Log files can be useful when debugging a problem. In most cases, however, logging is only intended for the programmers. Imagine if it were possible to read the log file back into the application and have the application perform all the operations recorded in the log file in the same sequence and at the same time to produce the same result as when the recording was made. That is record and playback. With such an architecture, recreating problems and debugging them becomes almost trivial [Ronsse et al 2003].
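The essence of the architecture can be sketched in a few lines; the Recorder and CounterApp names below are illustrative, not from any real system. The key design point is that every input event flows through a single dispatch path, so a recorded log replayed through that same path reproduces the same state.

```python
# Minimal record/playback sketch. All input events funnel through one
# dispatch point; record mode appends each event to a log, and playback
# feeds the same log back through the identical code path.
class Recorder:
    def __init__(self):
        self.log = []

    def dispatch(self, app, event):
        self.log.append(event)   # record the event...
        app.handle(event)        # ...then handle it normally

    def playback(self, app):
        for event in self.log:   # replay in the original sequence
            app.handle(event)

# Stand-in application whose state depends on the event sequence.
class CounterApp:
    def __init__(self):
        self.value = 0

    def handle(self, event):
        if event["type"] == "increment":
            self.value += event["amount"]

recorder = Recorder()
live = CounterApp()
recorder.dispatch(live, {"type": "increment", "amount": 2})
recorder.dispatch(live, {"type": "increment", "amount": 3})

# Replaying the log into a fresh instance reproduces the same state,
# which is what makes a recorded defect reproducible at will.
replayed = CounterApp()
recorder.playback(replayed)
assert replayed.value == live.value == 5
```

A real implementation would also record timing and external inputs (network data, timers, random seeds) so that nondeterministic runs replay faithfully; this sketch shows only the sequencing half of the idea.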
Record and playback is equally effective in deterministic and nondeterministic applications. One domain of highly nondeterministic applications where record/playback is widely employed is the computer gaming industry. Game companies invest a great deal of effort in creating special-purpose game engines. These engines take the game terrain, levels, characters, props, etc. and render them in real time in response to the inputs of one or more users. Game engines are complex pieces of software that must consider multiple inputs (such as in a multiplayer game) where characters in the game can interact with each other and the contents of the scene, and these interactions have a notion of time, space, and sequence; interactions differ depending on where a character is relative to other characters at any given time. Because of these complexities, many game companies have by necessity designed record/playback into their game engines so they can quickly diagnose problems that without record/playback would be virtually impossible to reproduce and fix. In some 3D game engines the playback even allows the use of a completely different camera viewpoint, which is like being a neutral third-party observer to the game play. Many useful articles about these game engines can be found on game developer web sites such as http://www.gamasutra.com.
Visual Validation Testing
Visual applications (those that produce an image or movie, or allow the user to interact with their data in real time) are the most complex to produce automated tests for. There are many cases where a person must look at the visual output or interact with the program in some way (such as in a computer game) to ensure that the application actually produces the outputs that it should. For example, in a 2D graphics application, the only feasible way to ensure that a circle created by the program is really a circle, especially if it is anti-aliased, is to look at it.
It is important to keep the amount of visual verification required to a minimum. This is only possible by automating the code-aware tests as much as possible, so that people can focus on verification rather than, say, testing for regressions. This can be restated as: use people's testing time wisely; don't ask people to perform tests that a computer could do.
It should be noted that some aspects of visual validation can also be automated. If the visual output can be captured as images, a database of correct images (as determined by a person) can be used to ensure that images created with the new version of the software are the same as the correct images, or differ only where they should.
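A sketch of that golden-image approach, with raw byte buffers standing in for real image files and exact-match digests standing in for a pixel comparison. The names here are illustrative; a production system would load real images and usually allow a per-pixel tolerance rather than requiring an exact match.

```python
import hashlib

# Digest of an image's pixel data; identical images hash identically.
def image_digest(pixels: bytes) -> str:
    return hashlib.sha256(pixels).hexdigest()

# Database of known-good renderings, approved by a person.
golden = {"circle_test": image_digest(b"\x00\xff" * 512)}

def validate(test_name, rendered_pixels, golden_db):
    """Return True if the new rendering matches the approved image."""
    return golden_db.get(test_name) == image_digest(rendered_pixels)

# A matching rendering passes; any pixel change is flagged for a
# person to review (and, if intentional, to approve as the new golden).
assert validate("circle_test", b"\x00\xff" * 512, golden)
assert not validate("circle_test", b"\x01\xff" * 512, golden)
```

When a rendering change is intentional, the workflow is simply to have a person approve the new image and update the golden database, so the human effort is spent only on images that actually changed.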
Performance Testing
Every piece of software should have some performance goals. Performance tests ensure that your application achieves or exceeds the desired performance characteristics. Performance can be measured in any or all of unit tests (e.g., to test the performance of frequently called methods or functions), integration tests, or system tests. The most common way to do performance testing is to use system timers in your tests and either log the results so they can be compared or graphed against previous test runs, or create a test failure if the performance is not within some expected range. Alternatively, there are quite a few commercial and open source performance testing applications available. My current favorite is Shark, a tool available from Apple for development on OS X.
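The timer-based approach can be sketched as follows; the function being timed and the 100 ms budget are both illustrative, and in practice the budget would come from the product's stated performance goals.

```python
import time

# Hypothetical frequently called function whose speed we care about.
def sum_of_squares(n):
    return sum(i * i for i in range(n))

# Time the call with a high-resolution timer and fail the test if it
# exceeds the budget. Logging `elapsed` across runs would also allow
# trends to be graphed over time.
start = time.perf_counter()
result = sum_of_squares(100_000)
elapsed = time.perf_counter() - start

assert result == sum(i * i for i in range(100_000))  # still correct
assert elapsed < 0.1, f"performance regression: {elapsed:.3f}s"
```

One caveat worth noting: wall-clock budgets make such tests sensitive to the machine they run on, so teams often log and compare trends rather than enforcing a single hard threshold.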
Resource Usage Testing
Resource usage testing ensures that your application uses acceptable amounts of resources such as memory, CPU time, disk bandwidth, and network bandwidth. Resource usage tests can be valuable for catching problems early and without a person having to explicitly monitor them. Each project will have its own set of resources that are more important to monitor than others. Some of the factors are:
Regression Tests
Regression tests are tests written to prevent existing features or bug fixes from being broken when new code changes are made. They aren't a different kind of test, because these tests should be built into unit, integration, and system tests. However, they are included here as a reminder: Be sure to add tests when you fix a defect; there is nothing more frustrating than fixing a defect more than once!
Usability Testing
Usability testing is different from all the other types of testing already discussed because it cannot be automated. However, for any application with a user interface, usability testing is a critical part of sustainable development and should not be considered optional unless the interface is brain-dead simple, and even then I'd still recommend it.
Usability testing involves observing real users in realistic conditions using the product in a controlled environment. In a usability test [Rubin 1994] [Dumas and Redish 1999], care must be taken not to guide or bias users. Users are asked to perform a predefined task and think out loud while they use the system, with more than one person observing and taking notes. The reason several people should observe the tests is to guard against bias and to ensure that issues that arise are not overlooked. When done well, usability testing will quickly uncover interface and interaction flaws that normally wouldn't be uncovered until after the product ships. Hence, doing usability testing as early as possible is definitely a flavor of being ruthless.
User Verification Testing
User verification testing is typically the last type of testing that is performed before a product is installed and actually used by customers. It isn't necessary for all products and will depend on the customer. For example, government agencies often demand verification testing, as do customers who use products where a product failure could result in loss of life, injury, or damage. This testing involves real users performing real work on the software in conditions that are as close as possible to how the end users will use the product, often in parallel with the live system. For applications such as mission-critical software, this testing could take weeks to months due to the high cost of failure after the software is installed.
Resources Required for Ruthless Testing
Ruthless testing is only effective if tests are easy to run and results can be obtained and analyzed quickly. Ruthless testing may require additional computing resources and effort, depending on the size and complexity of the product. Extra investment is required when it is impossible for the team to quickly and easily run and analyze tests on its own. You may also need to invest in some centralized or shared fast computers, although an investment in distributed testing might be advisable, where spare CPU cycles are used to run tests, especially at night. Where many tests are being run, some custom programs might help simplify the presentation and analysis of results.