6.4. A Regression Test Script

As we've seen, Python provides interfaces to a variety of system services, along with tools for adding others. Example 6-5 shows some commonly used services in action. It implements a simple regression-test system by running a command-line program with a set of given input files and comparing the output of each run to the prior run's results. This script was adapted from an automated testing system I wrote to catch errors introduced by changes in program source files; in a big system, you might not know when a fix is really a bug in disguise.

Example 6-5. PP3E\System\Filetools\regtest.py

 #!/usr/local/bin/python
 import os, sys, time                        # get system, python services
 from glob import glob                       # filename expansion

 print 'RegTest start.'
 print 'user:', os.environ['USER']           # environment variables
 print 'path:', os.getcwd( )                 # current directory
 print 'time:', time.asctime( ), '\n'

 program = sys.argv[1]                       # two command-line args
 testdir = sys.argv[2]

 for test in glob(testdir + '/*.in'):        # for all matching input files
     if not os.path.exists('%s.out' % test):
         # no prior results
         os.system('%s < %s > %s.out 2>&1' % (program, test, test))
         print 'GENERATED:', test
     else:
         # backup, run, compare
         os.rename(test + '.out', test + '.out.bkp')
         os.system('%s < %s > %s.out 2>&1' % (program, test, test))
         os.system('diff %s.out %s.out.bkp > %s.diffs' % ((test,)*3) )
         if os.path.getsize(test + '.diffs') == 0:
             print 'PASSED:', test
             os.remove(test + '.diffs')
         else:
             print 'FAILED:', test, '(see %s.diffs)' % test

 print 'RegTest done:', time.asctime( )

Some of this script is Unix biased. For instance, the 2>&1 syntax used to redirect stderr works on Unix but not on all flavors of Windows, and the diff command line spawned is a Unix utility (the fc command does similar work on Windows). You'll need to tweak such code a bit to run this script on some platforms.
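If neither diff nor fc is convenient on your platform, one possible tweak (not part of the book's script) is to do the comparison in Python itself with the standard library's filecmp module. The following is only a sketch of that idea, and the compare_results helper name is invented here for illustration:

 # Sketch: compare output files in Python with the standard filecmp module,
 # instead of spawning a platform-specific diff or fc command.
 # compare_results is a made-up helper name, used for illustration only.
 import filecmp

 def compare_results(newfile, bkpfile):
     # shallow=False compares file contents, not just os.stat information
     return filecmp.cmp(newfile, bkpfile, shallow=False)

 if __name__ == '__main__':
     test = 'test1/t1.in'                          # hypothetical test file
     if compare_results(test + '.out', test + '.out.bkp'):
         print 'PASSED:', test
     else:
         print 'FAILED:', test

One difference from the script above: filecmp only reports whether the files match, so you would lose the .diffs files that record what changed unless you generate them some other way (the standard difflib module is one option).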

Also, given the improvements made to the os module's popen calls as of Python 2.0, these calls are now a more portable way to redirect streams in a script like this, and an alternative to shell command redirection syntax (see the subprocess module, mentioned near the end of the prior chapter, for another way to control process streams).
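For instance, the run-and-redirect step could be coded along the following lines with the subprocess module instead of shell redirection syntax. This is only a sketch of that alternative, and run_one_test is a name invented here, not part of the original script:

 # Sketch: redirect the spawned program's streams with subprocess rather
 # than with the shell's "< file > file 2>&1" redirection syntax.
 # run_one_test is a made-up helper name, used for illustration only.
 import subprocess

 def run_one_test(program, test):
     infile  = open(test)                          # canned input file
     outfile = open(test + '.out', 'w')            # capture stdout and stderr
     try:
         subprocess.call(program, stdin=infile, stdout=outfile,
                         stderr=subprocess.STDOUT, shell=True)
     finally:
         infile.close()
         outfile.close()

 if __name__ == '__main__':
     run_one_test('shrubbery', 'test1/t1.in')      # hypothetical program, input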

But this script's basic operation is straightforward: for each filename with an .in suffix in the test directory, this script runs the program named on the command line and looks for deviations in its results. This is an easy way to spot changes (called regressions) in the behavior of programs spawned from the shell. The real secret of this script's success is in the filenames used to record test information; within a given test directory testdir:

  • testdir/test.in files represent standard input sources for program runs.

  • testdir/test.in.out files represent the output generated for each input file.

  • testdir/test.in.out.bkp files are backups of prior .in.out result files.

  • testdir/test.in.diffs files record regressions: differences between the current and prior output files.

Output and difference files are generated in the test directory with distinct suffixes. For example, if we have an executable program or script called shrubbery and a test directory called test1 containing a set of .in input files, a typical run of the tester might look something like this:

 % regtest.py shrubbery test1
 RegTest start.
 user: mark
 path: /home/mark/stuff/python/testing
 time: Mon Feb 26 21:13:20 1996

 FAILED: test1/t1.in (see test1/t1.in.diffs)
 PASSED: test1/t2.in
 FAILED: test1/t3.in (see test1/t3.in.diffs)
 RegTest done: Mon Feb 26 21:13:27 1996

Here, shrubbery is run three times for the three .in canned input files, and the results of each run are compared to the output generated for these three inputs the last time testing was conducted. Such a Python script might be launched once a day (e.g., from a cron job on Unix) to automatically spot deviations caused by recent source code changes.

We've already met system interfaces used by this script; most are fairly standard Unix calls, and they are not very Python specific. In fact, much of what happens when we run this script occurs in programs spawned by os.system calls. This script is really just a driver; because it is completely independent of both the program to be tested and the inputs it will read, we can add new test cases on the fly by dropping a new input file in a test directory.

So given that this script just drives other programs with standard Unix-like calls, why use Python here instead of something like C? First, the equivalent program in C would be much longer: it would need to declare variables, handle data structures, and more. In C, all external services exist in a single global scope (the linker's scope); in Python, they are partitioned into module namespaces (os, sys, etc.) to avoid name clashes. And unlike C, the Python code can be run immediately, without compiling and linking, so changes can be tested much more quickly. Moreover, with just a little extra work, we could make this script run on Windows too. As you can probably tell by now, Python excels when it comes to portability and productivity.

Because of such benefits, automated testing is a very common role for Python scripts. If you are interested in using Python for testing, be sure to see Python's web site (http://www.python.org) for other available tools. In particular, the PyUnit (a.k.a. unittest) and doctest standard library modules provide testing frameworks for Python programmers. In a nutshell, here's what each does:


PyUnit

An object-oriented framework that specifies test cases, expected results, and test suites; subclasses provide test methods that assert expected results.


doctest

Parses out and reruns tests from an interactive session log that is pasted into a module's docstrings; the logs give both the test calls and their expected results (a minimal sketch of both tools follows).
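For reference, here is a minimal sketch that exercises both styles in one file; the prune function and its tests are invented for illustration and are not drawn from the book's examples:

 # Minimal sketch of both testing styles; the prune function and its
 # tests are hypothetical examples, not code from this book.
 import unittest

 def prune(branches):
     """
     Trim a branch count, never going below zero.

     >>> prune(3)
     2
     >>> prune(0)
     0
     """
     return max(branches - 1, 0)

 class PruneTest(unittest.TestCase):               # PyUnit: a TestCase subclass
     def test_positive(self):
         self.assertEqual(prune(3), 2)
     def test_floor(self):
         self.assertEqual(prune(0), 0)

 if __name__ == '__main__':
     import doctest
     doctest.testmod()                             # rerun docstring session logs
     unittest.main()                               # run the TestCase methods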

See the Python library manual for more details and the Vaults of Parnassus and PyPI web sites for additional testing toolkits in the third-party domain.

Testing Gone Bad?

Once we learn about sending email from Python scripts in Chapter 14, you might also want to augment this script to automatically send out email when regularly run tests fail. That way, you don't even need to remember to check results. Of course, you could go further still.

One company I worked for added sound effects to compiler test scripts; you got an audible round of applause if no regressions were found and an entirely different noise otherwise. (See playfile.py at the end of this chapter for hints.)

Another company in my development past ran a nightly test script that automatically isolated the source code file check-in that triggered a test regression and sent a nasty email to the guilty party (and her supervisor). Nobody expects the Spanish Inquisition!




