Data Testing

The simplest view of software is to divide its world into two parts: the data (or its domain) and the program. The data is the keyboard input, mouse clicks, disk files, printouts, and so on. The program is the executable flow, transitions, logic, and computations. A common approach to software testing is to divide up the test work along the same lines.

When you perform software testing on the data, you're checking that information the user inputs, results that he receives, and any interim results internal to the software are handled correctly.

Examples of data would be

The words you type into a word processor
The numbers entered into a spreadsheet
The number of shots you have remaining in your space game
The picture printed by your photo software
The backup files stored on your floppy disk
The data being sent by your modem over the phone lines

The amount of data handled by even the simplest programs can be overwhelming. Remember all the possibilities of input data for performing simple addition on a calculator? Consider a word processor, a missile guidance system, or a stock trading program. The trick (if you can call it that) to making any of these testable is to intelligently reduce the test cases by equivalence partitioning based on a few key concepts: boundary conditions, sub-boundary conditions, nulls, and bad data.

Boundary Conditions

The best way to describe boundary condition testing is shown in Figure 5.5. If you can safely and confidently walk along the edge of a cliff without falling off, you can almost certainly walk in the middle of a field. If software can operate on the edge of its capabilities, it will almost certainly operate well under normal conditions.

Figure 5.5. A software boundary is much like the edge of a cliff.

Boundary conditions are special because programming, by its nature, is susceptible to problems at its edges. Software is very binary: something is either true or it isn't. If an operation is performed on a range of numbers, odds are the programmer got it right for the vast majority of the numbers in the middle, but maybe made a mistake at the edges. Listing 5.1 shows how a boundary condition problem can make its way into a very simple program.

Listing 5.1. A Simple BASIC Program Demonstrating a Boundary Condition Bug

Rem Create a 10 element integer array
Rem Initialize each element to -1
Dim data(10) As Integer
Dim i As Integer
For i = 1 To 10
    data(i) = -1
Next i
End

The purpose of this code is to create a 10-element array and initialize each element of the array to -1. It looks fairly simple. An array (data) of 10 integers and a counter (i) are created. A For loop runs from 1 to 10, and each element of the array from 1 to 10 is assigned a value of -1. Where's the boundary problem?

In most BASIC scripts, when an array is dimensioned with a stated range (in this case, Dim data(10) As Integer), the first element created is 0, not 1. This program actually creates a data array of 11 elements, from data(0) to data(10). The program loops from 1 to 10 and initializes those values of the array to -1, but since the first element of our array is data(0), it doesn't get initialized. When the program completes, the array values look like this:

`data(0)` = 0	`data(6)` = -1
`data(1)` = -1	`data(7)` = -1
`data(2)` = -1	`data(8)` = -1
`data(3)` = -1	`data(9)` = -1
`data(4)` = -1	`data(10)` = -1
`data(5)` = -1

Notice that data(0)'s value is 0, not -1. If the same programmer later forgot how this data array was initialized, or a different programmer wasn't aware of it, he might use the first element of the array, data(0), thinking it was set to -1. Problems such as this are very common and, in large complex software, can result in very nasty bugs.
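The same slip is easy to reproduce outside BASIC. The following is a minimal Python sketch, not a translation of any particular BASIC dialect: it assumes an 11-slot list standing in for data(0) through data(10), and the function name init_array is made up for illustration.

```python
def init_array():
    # Stand-in for Dim data(10): 11 slots, data[0] through data[10], all 0.
    data = [0] * 11
    # Same loop bounds as the BASIC listing: it starts at 1,
    # so data[0] is never touched.
    for i in range(1, 11):
        data[i] = -1
    return data


data = init_array()
print(data[0], data[1], data[10])  # prints: 0 -1 -1
```

A test that checks only the middle of the array would pass. Checking the first and last elements is what exposes the uninitialized slot.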

Types of Boundary Conditions

Now it's time to open your mind and really think about what constitutes a boundary. Beginning testers often don't realize how many boundaries a given set of data can have. Usually there are a few obvious ones, but if you dig deeper you'll find the more obscure, interesting, and often bug-prone boundaries.

NOTE

Boundary conditions are those situations at the edge of the planned operational limits of the software.

When you're presented with a software test problem that involves identifying boundaries, look for the following types:

Numeric	Speed
Character	Location
Position	Size
Quantity

And, think about the following characteristics of those types:

First/Last	Min/Max
Start/Finish	Over/Under
Empty/Full	Shortest/Longest
Slowest/Fastest	Soonest/Latest
Largest/Smallest	Highest/Lowest
Next-To/Farthest-From

These are not by any means definitive lists. They cover many of the possible boundary conditions, but each software testing problem is different and may involve very different data with its own unique boundaries.

TIP

If you have a choice of what data you're going to include in your equivalence partition, choose data that lies on the boundary.

Testing the Boundary Edges

What you've learned so far is that you need to create equivalence partitions of the different data sets that your software operates on. Since software is susceptible to bugs at the boundaries, if you're choosing what data to include in your equivalence partition, you'll find more bugs if you choose data from the boundaries.

But testing the data points just at the edge of the boundary line isn't usually sufficient. As the words to the "Hokey Pokey" imply ("Put your right hand in, put your right hand out, put your right hand in, and you shake it all about…"), it's a good idea to test on both sides of the boundary, to shake things up a bit.

You'll find the most bugs if you create two equivalence partitions. The first should contain data that you would expect to work properly: values that are the last one or two valid points inside the boundary. The second partition should contain data that you would expect to cause an error: the one or two invalid points outside the boundary.

TIP

When presented with a boundary condition, always test the valid data just inside the boundary, test the last possible valid data, and test the invalid data just outside the boundary.

Testing outside the boundary is usually as simple as adding one, or a bit more, to the maximum value and subtracting one, or a bit more, from the minimum value. For example:

First-1/Last+1
Start-1/Finish+1
Less than Empty/More than Full
Even Slower/Even Faster
Largest+1/Smallest-1
Min-1/Max+1
Just Over/Just Under
Even Shorter/Longer
Even Sooner/Later
Highest+1/Lowest-1

Look at a few examples so you can start thinking about all the boundary possibilities:

If a text entry field allows 1 to 255 characters, try entering 1 character and 255 characters as the valid partition. You might also try 254 characters as a valid choice. Enter 0 and 256 characters as the invalid partitions.
If a program reads and writes to a CD-R, try saving a file that's very small, maybe with one entry. Save a file that's very large, just at the limit for what the disc holds. Also try saving an empty file and a file that's too large to fit on the disc.
If a program allows you to print multiple pages onto a single page, try printing just one (the standard case) and try printing the most pages that it allows. If you can, try printing zero pages and one more than it allows.
Maybe the software has a data-entry field for a 9-digit ZIP code. Try 00000-0000, the simplest and smallest. Try entering 99999-9999 as the largest. Try entering one more or one less digit than what's allowed.
If you're testing a flight simulator, try flying right at ground level and at the maximum allowed height for your plane. Try flying below ground level and below sea level as well as into outer space.
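For a simple numeric range, picking these values can be automated. The sketch below assumes an inclusive integer range; the helper name boundary_values is hypothetical and is not part of any testing library.

```python
def boundary_values(minimum, maximum):
    """Return (valid, invalid) test values around an inclusive integer range."""
    candidates = {minimum, minimum + 1, maximum - 1, maximum}
    # Keep only candidates that really lie inside the range
    # (this matters when the range is very narrow).
    valid = sorted(v for v in candidates if minimum <= v <= maximum)
    # One step outside each edge forms the invalid partition.
    invalid = [minimum - 1, maximum + 1]
    return valid, invalid


# The text entry field from the first example: 1 to 255 characters.
valid_lengths, invalid_lengths = boundary_values(1, 255)
print(valid_lengths)    # [1, 2, 254, 255]
print(invalid_lengths)  # [0, 256]
```

The same idea applies to the other examples, but the "one more" and "one less" steps have to be expressed in whatever unit the boundary uses: pages, bytes, or feet of altitude.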

Since you can't test everything, performing equivalence partitioning around boundary conditions, such as in these examples, to create your test cases is critical. It's the most effective way to reduce the amount of testing you need to perform.

NOTE

It's vitally important that you continually look for boundaries in every piece of software you work with. The more you look, the more boundaries you'll discover, and the more bugs you'll find.

NOTE

Buffer overruns are caused by boundary condition bugs. They are one of the most common causes of software security issues. Chapter 13, "Testing for Software Security," discusses the specific situations that cause buffer overruns and how you can test for them.

Sub-Boundary Conditions

The normal boundary conditions just discussed are the most obvious to find. They're the ones defined in the specification or evident when using the software. Some boundaries, though, that are internal to the software aren't necessarily apparent to an end user but still need to be checked by the software tester. These are known as sub-boundary conditions or internal boundary conditions.

These boundaries don't require that you be a programmer or that you be able to read the raw code that you're testing, but they do require a bit of general knowledge about how software works. Two examples are powers-of-two and the ASCII table. The software that you're testing can have many others, so you should talk with your team's programmers to see if they can offer suggestions for other sub-boundary conditions that you should check.

Powers-of-Two

Computers and software are based on binary numbers: bits representing 0s and 1s, bytes made up of 8 bits, words (on 32-bit systems) made up of 4 bytes, and so on. Table 5.1 shows the common powers-of-two units and their equivalent values.

Table 5.1. Software Powers-of-Two
Term	Range or Value
Bit	0 or 1
Nibble	0-15
Byte	0-255
Word	0-4,294,967,295
Kilo	1,024
Mega	1,048,576
Giga	1,073,741,824
Tera	1,099,511,627,776

The ranges and values shown in Table 5.1 are critical values to treat as boundary conditions. You likely won't see them specified in a requirements document unless the software presents the same range to the user. Often, though, they're used internally by the software and are invisible, unless of course they create a situation for a bug.
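The values in Table 5.1 follow directly from the number of bits involved, so they are easy to regenerate when you need them. This is a small sketch; the function name unsigned_range is made up for illustration, and the word size of 32 bits is the one the text assumes.

```python
def unsigned_range(bits):
    """Smallest and largest unsigned values that fit in the given number of bits."""
    return 0, 2 ** bits - 1


# Ranges: bit, nibble, byte, and 32-bit word.
for name, bits in [("Bit", 1), ("Nibble", 4), ("Byte", 8), ("Word", 32)]:
    low, high = unsigned_range(bits)
    print(f"{name}: {low} to {high:,}")

# Single values: each prefix is a power of two.
for name, exponent in [("Kilo", 10), ("Mega", 20), ("Giga", 30), ("Tera", 40)]:
    print(f"{name}: {2 ** exponent:,}")
```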

AN EXAMPLE OF POWERS-OF-TWO

An example of how powers-of-two come into play is with communications software. Bandwidth, or the transfer capacity of your information, is always limited. There's always a need to send and receive information faster than what's possible. For this reason, software engineers try to pack as much data into communications strings as they can.

One way they do this is to compress the information into the smallest units possible, send the most common information in these small units, and then expand to the next size units as necessary.

Suppose that a communications protocol supports 256 commands. The software could send the most common 15 commands encoded into a small nibble of data. For the 16th through 256th commands, the software could then switch over to send the commands encoded into the longer bytes.

The software user knows only that he can issue 256 commands; he doesn't know that the software is performing special calculations and different operations on the nibble/byte boundary.
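To see why the nibble/byte switch is a boundary worth testing, here is one way such an encoding could work. It is purely a hypothetical sketch: the text does not describe the real protocol, so the escape value 0xF and the names encode_command and decode_command are assumptions. Commands are numbered from 0 here, so the "16th command" in the text is number 15.

```python
ESCAPE = 0xF  # hypothetical marker nibble meaning "a full byte follows"


def encode_command(command):
    """Encode a command number (0-255) as a list of 4-bit values."""
    if not 0 <= command <= 255:
        raise ValueError("command must be in 0..255")
    if command < ESCAPE:
        # The 15 most common commands fit in a single nibble.
        return [command]
    # Everything else: escape nibble, then the byte as two nibbles.
    return [ESCAPE, command >> 4, command & 0xF]


def decode_command(nibbles):
    """Inverse of encode_command."""
    if nibbles[0] != ESCAPE:
        return nibbles[0]
    return (nibbles[1] << 4) | nibbles[2]


for command in (14, 15, 16):
    print(command, encode_command(command))
```

Commands 14 and 15 look like neighbors to the user, yet under this scheme they take different code paths. That is exactly the kind of place where an off-by-one mistake would hide.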

When you create your equivalence partitions, consider whether powers-of-two boundary conditions need to be included in your partition. For example, if your software accepts a range of numbers from 1 to 1000, you've learned to include in your valid partition 1 and 1000, maybe 2 and 999. To cover any possible powers-of-two sub-boundaries, also include the nibble boundaries of 14, 15, and 16, and the byte boundaries of 254, 255, and 256.
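The selection just described can be written down as a short routine. This is a sketch under assumptions: partition_values is a hypothetical helper, and the default sub-boundaries (15 and 255) are only the nibble and byte limits from Table 5.1; your software may have others.

```python
def partition_values(minimum, maximum, sub_boundaries=(15, 255)):
    """Valid test values for an inclusive range, including sub-boundary neighbors."""
    # The ordinary boundary values at each end of the range.
    values = {minimum, minimum + 1, maximum - 1, maximum}
    # One below, on, and one above each internal (sub-boundary) limit.
    for edge in sub_boundaries:
        values.update({edge - 1, edge, edge + 1})
    # Drop anything that falls outside the valid range.
    return sorted(v for v in values if minimum <= v <= maximum)


print(partition_values(1, 1000))
# [1, 2, 14, 15, 16, 254, 255, 256, 999, 1000]
```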

ASCII Table

Another common sub-boundary condition is the ASCII character table. Table 5.2 is a partial listing of the ASCII table.

Table 5.2. A Partial ASCII Table of Values
Character	ASCII Value	Character	ASCII Value
Null	0	B	66
Space	32	Y	89
/	47	Z	90
0	48	[	91
1	49	`	96
2	50	a	97
9	57	b	98
:	58	y	121
@	64	z	122
A	65	{	123

Notice that Table 5.2 is not a nice, contiguous list. 0 through 9 are assigned to ASCII values 48 through 57. The slash character, /, falls before 0. The colon, :, comes after 9. The uppercase letters A through Z go from 65 to 90. The lowercase letters span 97 to 122. All these cases represent sub-boundary conditions.

If you're testing software that performs text entry or text conversion, you'd be very wise to reference a copy of the ASCII table and consider its boundary conditions when you define what values to include in your data partitions. For example, if you are testing a text box that accepts only the characters A-Z and a-z, you should include in your invalid partition the values just below and above those in the ASCII table: @, [, `, and {.
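The neighbors of any ASCII range can be looked up with Python's built-in ord and chr. In the sketch below, neighbors and is_ascii_letter are hypothetical helpers; is_ascii_letter stands in for whatever validation the text box under test actually performs.

```python
def neighbors(first, last):
    """Characters immediately before and after an inclusive ASCII range."""
    return chr(ord(first) - 1), chr(ord(last) + 1)


def is_ascii_letter(ch):
    # Stand-in for the validation under test: accept only A-Z and a-z.
    return "A" <= ch <= "Z" or "a" <= ch <= "z"


# Invalid partition: the characters just outside each letter range.
invalid = [*neighbors("A", "Z"), *neighbors("a", "z")]
print(invalid)  # ['@', '[', '`', '{']

# Valid partition: the first and last character of each range.
valid = ["A", "Z", "a", "z"]
print(all(is_ascii_letter(ch) for ch in valid))        # True
print(any(is_ascii_letter(ch) for ch in invalid))      # False
```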

ASCII AND UNICODE

Although ASCII is still very popular as the common means for software to represent character data, it's being replaced by a new standard called Unicode. Unicode was developed by the Unicode Consortium in 1991 to solve ASCII's problem of not being able to represent all characters in all written languages.

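Standard ASCII, using only 7 bits, defines just 128 characters, and even its 8-bit extensions can represent only 256. Unicode, which was originally designed around 16 bits, could represent 65,536 characters, and its code space has since been extended beyond that. To date, more than 39,000 characters have been assigned, with more than 21,000 being used for Chinese ideographs.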

Default, Empty, Blank, Null, Zero, and None

Another source of bugs that may seem obvious is when the software requests an entry (say, in a text box) but rather than type the correct information, the user types nothing. He may just press Enter. This situation is often overlooked in the specification or forgotten by the programmer but is a case that typically happens in real life.

Well-behaved software will handle this situation. It will usually default to the lowest valid boundary limit or to some reasonable value in the middle of the valid partition, or return an error.

The Windows Paint Attributes dialog box (see Figure 5.6) normally places default values in the Width and Height text fields. If the user accidentally or purposely deletes them so that the fields are blank and then clicks OK, what happens?

Figure 5.6. The Windows Paint Attributes dialog box with the Width and Height text fields blanked out.

Ideally, the software would handle this by defaulting to some valid width and height. If it didn't do that, some error should be returned, which is exactly what you get (see Figure 5.7). The error "Bitmaps must be greater than one pixel on a side" isn't the most descriptive one ever written, but that's another topic.

Figure 5.7. The error message returned if Enter is pressed with the Width and Height text fields blanked out.

TIP

Always consider creating an equivalence partition that handles the default, empty, blank, null, zero, or none conditions.

You should create a separate equivalence partition for these values rather than lump them into the valid cases or the invalid cases because the software usually handles them differently. It's likely that in this default case, a different software path is followed than if the user typed 0 or -1 as invalid values. Since you expect different operation of the software, they should be in their own partition.
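One way to keep the three partitions apart while planning test cases is to classify each raw entry before deciding what result to expect. The following is only a sketch: classify_dimension is a hypothetical helper, and the minimum of 1 is an assumption rather than the documented limit of any real dialog box.

```python
def classify_dimension(text, minimum=1):
    """Sort a raw text-box entry into 'empty', 'valid', or 'invalid'."""
    stripped = text.strip()
    if stripped == "":
        # Blank or whitespace-only: its own partition, not just "invalid".
        return "empty"
    try:
        value = int(stripped)
    except ValueError:
        # Not a whole number at all.
        return "invalid"
    return "valid" if value >= minimum else "invalid"


for entry in ["", "   ", "0", "-1", "512"]:
    print(repr(entry), classify_dimension(entry))
```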

Invalid, Wrong, Incorrect, and Garbage Data

The final type of data testing is garbage data. This is where you test-to-fail. You've already proven that the software works as it should by testing-to-pass with boundary testing, sub-boundary testing, and default testing. Now it's time to throw the trash at it.

Software testing purists might argue that this isn't necessary, that if you've tested everything discussed so far you've proven the software will work. In the real world, however, there's nothing wrong with seeing if the software will handle whatever a user can do to it.

If you consider that software today can sell hundreds of millions of copies, it's conceivable that some percentage of the users will use the software incorrectly. If that results in a crash or data loss, users won't blame themselves; they will blame the software. If the software doesn't do what they expect, it has a bug. Period.

So, with invalid, wrong, incorrect, and garbage data testing, have some fun. If the software wants numbers, give it letters. If it accepts only positive numbers, enter negative numbers. If it's date sensitive, see if it'll work correctly on the year 3000. Pretend to have "fat fingers" and press multiple keys at a time.

There are no real rules for this testing other than to try to break the software. Be creative. Be devious. Have fun.
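Even this kind of testing can be given a little structure. The sketch below throws a handful of garbage strings at a function and reports anything other than a clean rejection. Both parse_positive_int (the stand-in for the code under test) and find_crashes are hypothetical names, and the garbage list is only a starting point.

```python
def parse_positive_int(text):
    """Hypothetical function under test: accept only positive whole numbers."""
    value = int(text)  # raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError("value must be positive")
    return value


GARBAGE = ["abc", "-5", "", "  ", "12.5", "1e9", "12ab", "\x00", "9" * 400]


def find_crashes(func, inputs):
    """Return (input, exception name) pairs for anything but a clean ValueError."""
    failures = []
    for raw in inputs:
        try:
            func(raw)
        except ValueError:
            pass  # rejected cleanly, which is the behavior we want
        except Exception as exc:
            failures.append((raw, type(exc).__name__))
    return failures


print(find_crashes(parse_positive_int, GARBAGE))  # []
```

An empty list means every piece of garbage was either accepted or rejected cleanly. Any entry in the list is an input that made the code fail in a way it wasn't designed to.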
