Section 6.6. The Difficult Parts of Testing | Practical Development Environments

6.6. The Difficult Parts of Testing

There are two ways to write error-free programs; only the third one works.

Alan Perlis

"Epigrams in Programming" ACM SIGPLAN, September 1982

Personally, I find that testing a piece of software is full of activities that somehow turn out to be harder than I feel they ought to be. This section looks at some of these difficult parts of testing.

6.6.1. Faults of Omission

No matter how systematic and thorough you've been, there are always missing tests and the bugs that they never caught. These missing tests often seem so obvious after they are noticed. Of course, there are plenty of these faults of omission in the source code too, and some are found by the tests that you did remember to write and run.^[3]

^[3] According to Brian Marick at http://www.testing.com/writings/omissions.html, 20% to 50% of bugs in shipped products are faults of omission: source code that needed to be written, but never was.

A helpful habit when thinking of tests is to test not only for what the product should do, but also that it hasn't done anything that it shouldn't do. For example, when you add a value to a complex data structure, consider not just whether that value is present, but also whether any extra values were added or other structures were changed.
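
As a minimal sketch of this idea, the Python check below takes a snapshot of a data structure, adds one value, and then confirms both that the value arrived and that nothing else moved. The function add_user and the registry layout are hypothetical, invented only for illustration.

```python
import copy


def add_user(registry, group, name):
    """Hypothetical function under test: add a name to one group."""
    registry.setdefault(group, []).append(name)


def check_add_user_has_no_side_effects():
    registry = {"admins": ["ann"], "guests": ["bob"]}
    before = copy.deepcopy(registry)  # snapshot taken before the call

    add_user(registry, "admins", "cy")

    # What the product should do: the new value is present.
    assert "cy" in registry["admins"]
    # What it should not do: add extra values to the same structure...
    assert registry["admins"] == before["admins"] + ["cy"]
    # ...or change any other structure.
    others_before = {k: v for k, v in before.items() if k != "admins"}
    others_after = {k: v for k, v in registry.items() if k != "admins"}
    assert others_after == others_before


check_add_user_has_no_side_effects()
```

The deep copy matters here: a shallow copy would share the inner lists with the original, so an unwanted change to another group would go unnoticed.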

It takes a particularly doubting outlook about software products to be able to consistently imagine conditions that the designers and developers of a product overlooked. Great testers can be psychologically exhausting for a team, since they can always find something wrong. Still, a good bug can be as surprising as seeing how someone cracked your product, and it can generate the same sneaky admiration in developers for the technical insight shown.

There is an interesting quote from Donald Knuth, who has written and tested plenty of complex software products, as well as having written the classic The Art of Computer Programming series of books. In a paper describing his experiences testing the typesetting program TeX, he wrote:

I get into the meanest, nastiest frame of mind I can manage, and I write the nastiest [testing] code I can think of. Then I turn around and embed that in even nastier constructions that are nearly obscene. (D. E. Knuth, "The Errors of TeX," Software Practice & Experience 19, no. 7 [1989]: 625-626)

Which at least has passion to recommend it. The next sentence is quoted less frequently, though it has some serious implications for anyone else maintaining his tests:

The resulting test program is so crazy that I could not possibly explain to anybody else what it is supposed to do.

6.6.2. Capturing Output

Many tests display text as messages while they are running, and the final result of the test may be part of this output, which then has to be searched for by some tool or other. (Other tests are more silent and simply indicate the test's status by their exit code.) One of the hardest parts of creating a good test environment is making sure that all the text output can be captured in a synchronized and complete way on many platforms.

Most operating systems support the idea of input, output, and error streams for programs, though in very different ways at their lowest levels. By default, there is an input stream for the program to receive data on, an output stream on which text appears, and an error stream for displaying error messages. Other streams can be defined for raw file-based input and output. Each of these streams can be redirected to and from files, or merged together. Some recurring problems when using this model are:

Merged output: If two threads in a program write to the same output stream at the same time, the text is mangled, with parts of both messages appearing together. The program has to provide a way to synchronize access to the stream. Windows locks its files so that only one source can write to them at a time.
Buffered output: If you are using buffered streams, then to improve performance no data is written until a certain number of bytes have accumulated in the buffer. However, if your program stops unexpectedly without flushing the output buffer, you may lose the critical messages that would tell you why the program stopped.
Child processes: If one process starts another process, then arranging for the output from the child process to be redirected along with the output of the parent process is prone to error.
Complex code: Neither Windows nor Unix primitive operations in the area of input and output are particularly easy to use correctly.
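
The first two problems in this list can be illustrated with a small Python sketch. It assumes a test harness in which several threads share one text stream; the SynchronizedWriter class is a hypothetical name, not part of any library. A lock keeps each message whole, and flushing after every message limits what is lost if the program stops unexpectedly.

```python
import io
import threading


class SynchronizedWriter:
    """Wrap a text stream so each message is written whole, then flushed."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_line(self, message):
        with self._lock:  # only one thread writes at a time
            self._stream.write(message + "\n")
            self._stream.flush()  # do not leave the message in a buffer


def write_from_threads(stream, threads=4, lines_per_thread=50):
    """Have several threads write numbered lines through one writer."""
    writer = SynchronizedWriter(stream)

    def worker(number):
        for i in range(lines_per_thread):
            writer.write_line(f"thread {number} line {i}")

    pool = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()


buffer = io.StringIO()
write_from_threads(buffer)
lines = buffer.getvalue().splitlines()
```

Flushing on every message costs some performance, so this trade-off is more suitable for test logs than for high-volume product output.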

This is a good area in which to look for working examples in any test environment.
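
For the child-process case, here is a minimal sketch using Python's standard subprocess module. The helper name run_and_capture and the tiny child program are assumptions made up for this example. The error stream is merged into the output stream so that the messages stay in a single capture, and the exit code is returned alongside the text.

```python
import subprocess
import sys


def run_and_capture(argv, timeout=60):
    """Run a child process and return (exit_code, combined_output)."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge the error stream into the output
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout


# Hypothetical child: writes to both streams, then exits with status 3.
child_code = (
    "import sys\n"
    "print('starting test')\n"
    "print('something failed', file=sys.stderr)\n"
    "sys.exit(3)\n"
)

# The -u option asks the child Python interpreter not to buffer its streams.
exit_code, output = run_and_capture([sys.executable, "-u", "-c", child_code])
```

A child written in another language would need its own way of disabling or flushing buffers; the -u option applies only to Python children.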

6.6.3. Using Multiple Machines

Running tests on multiple machines creates a whole level of difficulty beyond testing on a single machine. Just the time taken to run these tests manually can quickly become overwhelming. I recall trying to test an application that used multicast without a lab full of test machines or any way to reliably run commands remotely on the machines that we did have. We ended up using everyone's desktop machines on the weekend and ran from one machine to another to start and stop the tests!

The difficulties here can be broken down into three parts: sending commands to remote machines, starting and stopping the tests, and collecting the resulting data.

6.6.3.1. Sending commands

For fewer than about 16 machines, especially if they are all running the same operating system, a KVM (keyboard, video, mouse) switch, a shared monitor, and a programmable keyboard can take you a surprisingly long way. Many KVM switches allow you to change between machines by using a short sequence of cryptic keystrokes. First, create an executable script to run the test on one machine and name the script using a single letter; then program the keyboard to change machines using a single function key. To run the script on all the machines, simply peck away at the two keys until everything is running. VNC (http://www.realvnc.com) is a program that lets you see the desktops of different machines, and you can do a similar thing with VNC screens. None of this is very elegant, but sometimes it's all that's needed.

A more general way of sending commands to remote machines is to use rsh or its more secure descendant, ssh. Two open source tools that can run commands remotely are fanout (http://www.stearns.org/fanout/README.html) and BitCluster (from BitMover, the maker of the BitKeeper SCM tool described in Section 4.6.5), which is available as an alpha release from http://www.bitmover.com/bitcluster.
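
If neither tool suits, a small script can do the fan-out itself. The Python sketch below is one possible shape, assuming that the ssh client is installed and that key-based logins to every host are already set up. The function names are hypothetical. The runner parameter exists so that the fan-out logic can be exercised without any real machines.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Build the argument list for running one command on one host."""
    # BatchMode=yes makes ssh fail rather than prompt for a password.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_via_ssh(host, command, timeout=300):
    """Run the command on the host; return (exit_code, combined_output)."""
    done = subprocess.run(
        ssh_argv(host, command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return done.returncode, done.stdout


def run_everywhere(hosts, command, runner=run_via_ssh, max_workers=16):
    """Run the command on every host in parallel.

    Returns a dict mapping each host to its (exit_code, output) pair.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

A call such as run_everywhere(["test01", "test02"], "./run_tests.sh") would then start the same script on both machines at roughly the same time; the host names and script name here are placeholders.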

Using CORBA ORBs is another way to communicate with remote machines. Applications can send command strings to multiple machines using different languages and on different platforms. You do still have to write the code that actually executes the commands on each platform, and most ORBs don't support multicast, so commands are sent to each machine in turn.

6.6.3.2. Starting the tests

Once you have a way to send commands to multiple machines, you'll need to administer the machines so that they all have the desired version of the operating systems and the correct copies of the test programs. Doing this can take a fair amount of time in itself. Now you need a master test-control tool to send the correct commands to each machine. You may want to be able to take snapshots of the test results during the tests in order to see what's happening on each machine. Without regular peeks at each machine and good logfiles with synchronized timestamps on each machine, it's often hard to work out why a test or product crashed on only a few machines out of many. Some of the tests may also require more input from the master machine after some time has passed. In this case, you'll need a way for each machine to indicate that it's waiting and is ready for more commands.
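
One way to get comparable logfiles is to have every machine stamp its log lines in UTC and include its own name. The sketch below uses Python's standard logging module; the function name make_test_logger and the WAITING message convention are assumptions for illustration, not an established protocol.

```python
import io
import logging
import socket
import time


def make_test_logger(stream, host=None):
    """Return a logger that writes UTC-stamped lines tagged with the host."""
    host = host or socket.gethostname()
    formatter = logging.Formatter(
        fmt="%(asctime)s.%(msecs)03dZ %(name)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    formatter.converter = time.gmtime  # format timestamps in UTC, not local time

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger(host)  # the logger name doubles as the host tag
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log_output = io.StringIO()
log = make_test_logger(log_output, host="testbox7")
log.info("test phase 1 finished")
# Hypothetical convention: tell the master that this machine is ready.
log.info("WAITING for more commands")
```

Timestamps written this way are only as good as the machine's clock, so they still depend on the clocks being synchronized.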

6.6.3.3. Collecting the data

Once the tests have finished on every machine, the amount of data to be processed becomes an issue. Copying all the results that are on the test machines back to a central location can overload that one machine's resources and slow down the network. If affecting the local network while the tests are running is not a concern, then you could copy results during the tests or use a single file server mounted remotely on each of the test machines.

One good idea is to preprocess the results while the data is still on the test machines and then copy just the summaries to the central location, where they can be assembled into the report for the whole test.
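
A sketch of that preprocessing step might look like the following. It assumes a made-up result format of one "STATUS test_name" line per test, which a real test harness would replace with its own format.

```python
import json
from collections import Counter


def summarize(result_lines):
    """Reduce raw 'STATUS test_name' lines to counts and a list of failures."""
    counts = Counter()
    failures = []
    for line in result_lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        status, name = parts
        counts[status] += 1
        if status == "FAIL":
            failures.append(name.strip())
    return {"counts": dict(counts), "failures": failures}


raw = ["PASS login", "FAIL upload large file\n", "PASS logout", ""]
summary = summarize(raw)
summary_text = json.dumps(summary, sort_keys=True)  # small enough to copy
```

Only summary_text needs to travel over the network; the raw results can stay on the test machine until someone needs to investigate a failure.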

To ensure that your collected data is meaningful, you should make sure that the clocks on all the machines involved in the tests are synchronized (for example, by using ntp on Unix, or the Windows synchronization client or Tardis on Windows machines). You might also need to postprocess results that come from different time zones and watch out for daylight saving time changes that occur on different days in different countries.
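
As a sketch of that postprocessing, the Python below converts timestamps to UTC before ordering events from different machines. It assumes that each machine recorded ISO 8601 timestamps that include a UTC offset; the event tuples and host names are invented for the example.

```python
from datetime import datetime, timezone


def to_utc(stamp):
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {stamp!r}")
    return parsed.astimezone(timezone.utc)


def merge_events(events):
    """Order (timestamp, host, message) tuples by their time in UTC."""
    converted = [(to_utc(stamp), host, message) for stamp, host, message in events]
    return sorted(converted, key=lambda event: event[0])


events = [
    ("2005-03-27T03:30:00-05:00", "boston", "test started"),
    ("2005-03-27T09:00:00+01:00", "london", "test started"),
]
merged = merge_events(events)
```

Rejecting timestamps without an offset is a deliberate choice in this sketch: guessing the zone afterward is exactly where daylight saving time mistakes creep in.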

One interesting project that shows what can be done with a distributed test environment is SmartFrog (http://sourceforge.net/projects/smartfrog), a framework developed by Hewlett-Packard and released as open source software.

6.6.4. Only a Developer Can Do That!

There always seem to be some bugs where the only person who can test the fix is the developer who made the fix. This is irritating, since it was usually a tester who noticed the problem and filed the bug. One example of this sort of bug is in stress testing, where to overload the product, a developer may need to change the values of array sizes, file sizes, or connection speeds, and these changes have to be made in the product's source code.

For instance, suppose that the product crashed during some long-running tests, and it was eventually determined that this was due to the rest of the internal network being overloaded for more than 15 minutes. A bug is filed, and the assigned developer believes she has a fix. It's more than a little inconvenient to overload the internal network again, so she makes changes in the source code that affect the way the product accesses the network, in order to simulate the extended network overload. How is a tester supposed to confirm that the bug is fixed? Without simulating the entire network, the closest anyone can come to confirmation is to have the developer explain the changes and provide test versions of the product with and without the changes.

The most common example of bugs that only developers can test occurs when part of the product has no API, and that part is where the bug is. If the bug needs to be tested by someone other than a developer, then the best solution is to add a test API to that part of the product. It's quite likely to be similar to the debugging code added by the developers anyway. Of course, you have to be careful that the test APIs are not enabled when the product is shipped, and that they are not relied on by developers and testers during the ordinary use of the product.
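
A minimal sketch of such a guarded test API follows. Everything in it is hypothetical: the NetworkClient class, the PRODUCT_ENABLE_TEST_HOOKS environment variable, and the choice of an environment variable as the switch. A real product might prefer a build-time flag so that the hook is absent from shipped binaries altogether.

```python
import os


class NetworkClient:
    """Hypothetical product component with a test-only hook."""

    def __init__(self, environ=os.environ):
        # The hook works only when the switch is explicitly turned on.
        self._test_hooks_enabled = environ.get("PRODUCT_ENABLE_TEST_HOOKS") == "1"
        self._simulate_overload = False

    def test_simulate_overload(self, enabled):
        """Test API: make send() behave as if the network were overloaded."""
        if not self._test_hooks_enabled:
            raise PermissionError("test hooks are disabled in this build")
        self._simulate_overload = enabled

    def send(self, payload):
        if self._simulate_overload:
            raise TimeoutError("simulated network overload")
        return len(payload)  # stand-in for the real network write
```

With a hook like this, a tester can reproduce the overload condition without the developer's private source changes, and an attempt to use the hook in a normal run fails loudly instead of silently changing behavior.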

One way to get around this whole problem is to make sure that the testers on the project are comfortable inspecting source code, can build and execute the unit tests, and can add hooks into the product for easier testing. If the project's testers are not comfortable with these activities, then at least make sure that developers and testers are using the product in the same way, so you can avoid the equally irritating opposite of this section's title: "Only a tester can do that!"

6.6.5. Accessibility Testing

Testing a product to make sure that it can be used by people with different needs from the developers is always hard to do automatically. So much of what a customer can or cannot use is hard to embed in tests, particularly in automated tests. The classic example is making sure that information on web pages can be read by people with vision problems. Since accessibility is now required by law in various countries, this is an area that would really benefit from better automated testing tools.

My personal frustration is with the use of color alone to add information to a report or web page. Along with up to 10% of all men, I was born red/green color-blind, so some common color combinations look identical to me.^[4] This doesn't mean that all colors appear the same, or that I can't tell what red looks like (to me). It does mean that certain colors that appear quite different to my wife look the same to me, just as some people can't tell two close musical notes apart. Some practical examples are that red chalk on a green blackboard is hard to see, and maps and diagrams that have regions with similar colors are pretty much useless to me.

^[4] Color-blindness is also known as "color variance" or "Daltonism."

Perhaps you've seen products that change tiny little images from green to red to tell you when something has gone wrong? These products just don't get purchased by color-blind people, because we can't see any of the changes. Products that have considered this issue will allow you to configure the colors used in different parts of the product and may even provide some different color schemes or skins. (Skins are not there just to make the product look cool.)

As this issue has become better understood by web designers, web sites have been created with suggestions of color combinations to avoid, or even to show you what your web site looks like to a color-blind person. One good site as a starting point for more information is http://more.btexact.com/people/rigdence. There is also a wonderful tool at http://www.vischeck.com/vischeck/vischeckURL.php that simulates how web pages appear to color-blind people.
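
One narrow check that can be automated is whether two colors differ in brightness as well as in hue. The sketch below computes the contrast ratio defined in the W3C's WCAG 2.x guidelines. This is a different technique from the simulations offered by the sites above, and it is not a model of color blindness; it only flags pairs, such as a typical red and green, whose luminance is nearly the same. The threshold to enforce is left as a choice for the project.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    if c <= 0.03928:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with channels 0 to 255."""
    r, g, b = (_linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(first, second):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(first), relative_luminance(second)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# A status icon that changes from green to red differs very little in luminance.
red, green = (255, 0, 0), (0, 128, 0)
status_icon_ratio = contrast_ratio(red, green)
```

A ratio close to 1 means that the two colors are about equally bright, so a reader who cannot separate their hues has little else to go on. A test could fail any status indicator whose two states fall below a chosen ratio, or that has no shape or text difference to back up the color.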