The Engineering Approach

People's perception of software engineers seems to be based on programmers, who are regularly called software engineers. However, programmers may not be engineers, may not act like engineers, and may not conduct business as engineers do. The traditional branches of engineering (civil, mechanical, electrical, and chemical) have a licensing process that engineers must go through to become professional engineers. This process helps ensure that an acceptable level of knowledge and competence exists in the profession. There is no such certification in software engineering.

According to Webster's New World Dictionary, engineering is "(a) the science concerned with putting scientific knowledge to practical uses... (b) the planning, designing, construction, or management of machinery, roads, bridges, buildings, etc. (c) the act of maneuvering or managing...."

Engineers are scientists who apply science to solve problems; we are the practical folks in the sweaty hats. There is art in the definition of engineering as well.

Notice the reference in the dictionary definition to management. Management is, according to Webster's New World Dictionary, "the act, art, or manner of managing, or handling, controlling, directing ...."

The practice of engineering is applied science (application of the bodies of knowledge of the various natural sciences), supplemented as necessary by art (know-how built up and handed down from past experience).

The know-how built up and handed down from past experience is also called engineering practice. In civil engineering, engineering practice refers to a body of knowledge, methods, and rules of thumb that consist of accepted techniques for solving problems and conducting business.

Accountability and Performance

The reason for the existence of this body of knowledge, engineering practice, is that engineers are accountable. If a structure fails, the engineer is likely the one who will be held responsible.

Nobody knows everything, and mistakes will happen despite the best possible preparation. Engineers must show that they performed their duties to the best of their abilities and in accordance with accepted standards. This is called performance. The engineer's defense will be based on demonstrating that he or she followed accepted engineering practice.

Engineering practice has some fundamental rules that are of particular interest and value to software engineers and testers:

  • State the methods followed and why.

  • State your assumptions.

  • Apply adequate factors of safety.

  • Always get a second opinion.

Each of these rules is described in the paragraphs that follow.

Stating the Methods Followed and Why

The International Organization for Standardization's ISO 9001/EN quality management and quality assurance standards are famous for demanding that software developers "say what they do, and do what they say." But this is only one of the rules that engineers must follow in order to justify and defend what they did or what they intend to do. Most of the content of this book is the "say what I do" part of this requirement. In civil engineering, it is more a case of "say what you do and prove you did it."

Scientific method uses measurements to establish fact, so a discussion of engineering methods would not be complete without a discussion of the metrics used to support the methods. I will formally introduce the metrics used to support the test methods in this book in the next two chapters.

Stating Your Assumptions

No one knows everything or has all the answers. To solve problems in the absence of a complete set of facts, we make assumptions. Assumptions are frequently wrong. The way to mitigate the effects of incorrect assumptions is to publish all assumptions for as wide a review as possible. This increases the chances that someone will spot an incorrect assumption and refute or correct it. If an engineer makes an incorrect assumption but publishes it along with all the other assumptions about the project, and no one challenges or refutes it, the engineer will have a defensible position in the event of a failure: the engineer performed in an acceptable and professional manner.

Our software systems have become so complex that no one can accurately predict all the ramifications of making a change to one. What we do not know for sure, we assume to be this way or that. Assumptions need to be made to fill in the gaps between the known facts. And those assumptions need to be published so that others have the opportunity to refute them and to plan for them. For example, a common assumption is that the test system will be available 100 percent of the time during the test effort. If the system goes down, or if there are resource problems because other groups need access, the loss of system availability can cause a significant impact on the test schedule.

How to Recognize Assumptions

Learning to recognize assumptions takes time and practice. We make assumptions constantly, and it takes a conscious effort to identify them. The technique I use to recognize an assumption is to first try to identify all the things that I am depending on, and then to put the words "it is assumed that" in front of all those dependencies. If the dependency statement sounds more reasonable with the assumption clause, it goes in the test agreement as an assumption. For example, the statement "The test system will behave as it has in the past" becomes "It is assumed that the test system will behave as it has in the past." System support personnel can then confirm or modify this statement during review of the test agreement. When considered in this way, it becomes clear very quickly just how much we take for granted, like the existence of gravity.
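The technique above is mechanical enough to sketch in a few lines of code. This is a minimal illustration, not a tool from the book: the dependency statements below are hypothetical examples of what a tester might list before drafting a test agreement.

```python
# Sketch of the assumption-recognition technique: list everything you are
# depending on, then restate each dependency with "It is assumed that" in
# front. If the restated sentence sounds reasonable, it belongs in the
# test agreement. The dependencies here are illustrative only.
dependencies = [
    "the test system will behave as it has in the past",
    "the test environment will be available for the duration of the test effort",
    "bugs will be fixed within the agreed turnaround times",
]

assumptions = [f"It is assumed that {d}." for d in dependencies]

for assumption in assumptions:
    print(assumption)
```

Each printed statement can then go into the test agreement for reviewers, such as system support personnel, to confirm or refute.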

One of the most frightening mistakes I ever made in estimating a test effort was to assume that "bugs found during testing would be fixed within published [previously agreed upon] turnaround times so that testing could continue." For example, a showstopper would be fixed as fast as possible, meaning within hours, and a serious bug would be fixed in a day.

Unbeknownst to the test group, the development manager had dictated that "developers would finish writing all the code before they fixed any bugs." The bug fixes necessary for testing to continue never materialized. The developers never really finished writing the code, so they never fixed any bugs. All my test agreements now state this assumption explicitly.

Types of Assumptions

The following are some examples of typical assumptions for a software test effort.

Scope and Type of Testing
  • The test effort will conduct system, integration, and function testing.

  • All unit testing will be conducted by development.

Environments That Will Be Tested
  • The environments defined in the requirements are the only environments that the test effort will be responsible for verifying and validating.

Environment State(s)
  • All operating system software will be installed before testing is begun.

System Behavior
  • The system is stable.

  • The system will behave in the same way it has in the past.

System Requirements and Specifications
  • The system requirements and specifications are complete and up-to-date.

Test Environment Availability
  • The test environment will accurately model the real-world environment.

  • The test environment will be available at all times for the duration of the test effort.

Bug Fixes
  • Bugs found during testing will be fixed within published turnaround times according to priority.

Apply Adequate Factors of Safety

We have already discussed that we use methods and take measurements in order to make predictions. No prediction is perfect. A factor of safety is a metric: the measure of how far wrong a past prediction was, applied to a current prediction to make it more accurate, or safer. Engineers adjust their predictions to cope with this uncertainty by applying factors of safety. We will discuss this metric now because it is part of engineering practice in general, rather than software testing in particular.

Demands placed on a design can extend far beyond the original purpose. The engineer is accountable for the integrity of the product. Even if the product is put to uses that were never imagined, there is still a performance requirement. When an engineer designs a bridge, every component and every design specification has a factor of safety applied to it. Say, for example, that the design specification states, "the bridge will be able to carry the load of tractor trailers, all loaded to capacity, parked end-to-end on the bridge deck, during a flood." The engineer would calculate all the loads produced by all those trucks and the flood, and she or he would then multiply that load by a factor of safety, generally 2, and design the bridge to hold double the original required load. This is why bridges very seldom collapse even though they must survive all manner of loads that were never anticipated by the engineer who designed them. I have seen people drive across bridges when the floodwaters were so high that the bridge was completely under water, even though common sense dictates that such a situation is very risky.

Factors of safety are not widely used in commercial software development. If a network architect has a requirement for a switch that can handle 49 simultaneous transactions, the network architect will likely buy the switch that is advertised as capable of handling 50 simultaneous transactions. That same architect will be surprised when the switch fails in real-time operation as the load approaches 40 simultaneous transactions. The reason the system failed is important. But from a reliability standpoint, the failure could have been avoided if a factor of safety had been included in the system design.
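The switch example can be restated as a one-line calculation. The numbers (a required load of 49 transactions, a switch rated for 50) come from the text; the factor of 2 follows the bridge example above and is an illustrative choice, not a published standard for network hardware.

```python
# The switch example, with a factor of safety applied to the design load.
required_load = 49      # simultaneous transactions the design must handle
safety_factor = 2       # illustrative factor, following the bridge example

design_capacity = required_load * safety_factor
print(design_capacity)  # 98 -- specify a switch rated near 100 transactions,
                        # not one rated for exactly 50
```

With that margin, a switch that degrades as the load approaches 80 percent of its rating still meets the original requirement of 49.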

In safety-critical software, factors of safety are more commonly implemented using redundant and fault-tolerant systems rather than by expanding design capacity.

How to Determine a Factor of Safety

In this book we use factors of safety primarily in the test estimation process to help get an accurate time line for the test effort. In most branches of engineering, there are established values for factors of safety for many applications. I am not aware of any established factors of safety in software engineering test estimation. The only project management approach commonly used in software development is to allow a few "slack days" in the schedule to absorb overruns and unforeseen events. I disagree with this practice because it is often arbitrary; the time is not budgeted where it will be needed in the schedule, and the amount of time allotted has nothing to do with the real risks in the project.

Factors of safety should be determined based on the error in the previous estimation and then adjusted as needed. Even if a process does not use measurements to arrive at an estimate, a factor of safety can be established for future similar estimates. For example, a test effort was estimated to require 14 weeks. In reality, 21 weeks were required to complete the test effort. The estimate was low by a factor of:

21/14 = 1.5

When the next test estimation effort takes place, if the same or similar methods are used to make the estimate, even if it is based on an I-feel-lucky guess, multiply the new estimate by a factor of 1.5, and you will get an estimate that has been adjusted to be in keeping with reality.
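The calculation above can be sketched as follows. The figures (a 14-week estimate, a 21-week actual) come from the text; the function names are mine, chosen for illustration.

```python
# Sketch of the factor-of-safety adjustment for test estimates.

def factor_of_safety(estimated, actual):
    """How far wrong the previous estimate was."""
    return actual / estimated

def adjusted_estimate(new_estimate, factor):
    """Apply the factor learned from past estimates to a new one."""
    return new_estimate * factor

fos = factor_of_safety(estimated=14, actual=21)  # 21/14 = 1.5
print(fos)                                       # 1.5

# A new 10-week estimate, made with the same or similar methods,
# adjusted by the same factor:
print(adjusted_estimate(10, fos))                # 15.0
```

As the text notes, the factor should then be recalculated after each effort and adjusted as needed, so it tracks how the estimator's accuracy changes over time.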

Not all factors of safety are determined analytically. One of my favorite managers, Gary, was a software development manager I worked with for three years at Prodigy. He took every project estimate I gave him and multiplied it by his own factor of safety. While it was normally between 1.5 and 2, it was sometimes as high as 3.5. His estimates were always right, and our projects were always completed on time and on or under budget. Eventually I asked Gary how he knew the correct factor of safety and whether he used the same factor on everyone's estimates. He told me that each person needed his or her own individual factor of safety. The adjustments he applied to estimates were based not on calculation but on his experience. In my case, he was correcting for the fact that I tended to push myself harder than he liked in order to meet deadlines and that, as a manager, he had information about project dependencies that I did not.

Note 

It does not matter how a factor of safety is determined; using one improves estimates.

No one knows everything, and no method is perfect. There is no shame in producing an estimate that is initially inaccurate, only in knowingly leaving it unadjusted. Recognizing deficiencies and correcting for them before they can become problems is the goal. Factors of safety adjust estimates to accommodate unknowns.

It has been my experience that management in software houses resists factors of safety. They want to hear a shorter time estimate, not a longer time estimate. I have had good success persuading management to use factors of safety by consistently calculating the adjusted time and making it visible. It is my job to supply the information that management uses to make decisions. If management chooses to ignore my recommendations, that is their prerogative. If management selects the shorter time and if we fail to meet it, I only need to point out the adjusted time estimate to make my point. Over time, I have convinced many managers that a more accurate estimate is preferable even if it is less palatable.

Always Get a Second Opinion

No good reason exists for working without a safety net. Inspection and formal reviews are the most productive way we know of today to remove defects. Part of the reason is that inspectors bring an outside perspective to the process; the rest is simple human factors. My first testing mentor, Lawrence, pointed out that inspection and review work because people who do not know anything about the project are likely to find many mistakes missed by the people in the project. In addition, these techniques owe much of their success to human nature: when you are expecting company, you generally clean the house before they come over.

Having someone to check your work is very important. If you cannot get anyone to check your work, publish the fact clearly in print. This disclaimer not only protects you, but it warns those reading your work.

The Adversarial Approach versus the Team Approach

When I joined my first software project in 1985, testers worked in isolated autonomous groups separate from the development groups, and the two groups communicated mostly in writing. It was normal for the relationship between developers and software testers to become adversarial from time to time. Boris Beizer and Glenford Myers (and lots of others), experts in software testing at the time, wrote about this fact. Dr. Beizer even dedicated a chapter in one of his books to how test managers can defend and protect their testers.

Another characteristic of the time was the way the value of a test was measured. In his wonderful book, The Art of Software Testing, Glenford Myers writes, "A good test is one which finds bugs." While I agree there is merit in this thought, it suggests that a test that does not find bugs does not tell us anything useful, and that is not true. The goal of finding bugs is important, but when it becomes the sole focus of a test effort, it creates a negative imbalance in the perspective of the project personnel such that only the negative aspects of a software system are valued by the testers. This tight focus on the negative aspects of a software system is sure to cause resentment on the part of development, which can lead to an adversarial relationship.

This idea that the only valuable test is one that finds a bug was a product of the time. At that time, most software testing was conducted by engineers, or people with a science background. Testing in the traditional engineering disciplines is conducted by stressing a component until it breaks. The expected outcome is always failure. In materials testing, for instance, a newly designed steel beam is placed in a machine that will apply various loads to the beam. The tester uses very precise instruments to measure the reaction of the beam to each load. The load is increased until ultimately the beam fails. The actual load at which the beam failed is compared to the theoretical ultimate loading for the beam. The ideal situation is that the calculated ultimate load agrees closely with the actual ultimate load. This means that the predicted ultimate load was correct. Concurrence between actual and predicted behavior gives the engineers increased confidence that the other predicted behaviors of the beam will correlate closely with its actual behavior in the real world.

The traditional engineering approach to testing is not always a good fit with the needs of software testing. While some parts of the software environment can be tested to an ultimate load (for instance, the maximum number of bytes per second that can be transmitted in a certain communications environment), the concept is meaningless for most software modules. A software module is not like a steel beam. With the level of sophistication that exists in today's software systems, the software module should never fail.

The best correlation for a load failure in software is data boundary testing, which is probably why it is the most productive test technique used today. But even if a data value falls outside the expected range, the software should still process it as an exception or error. As software systems become more and more robust, it becomes harder and harder to force a load failure. Where a tester used to be able to cause telephone calls to be dropped by overloading a telephone switching computer with too many simultaneous calls, now in most situations, the callers whose calls cannot be routed immediately hear an announcement to that effect.
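The boundary idea above can be made concrete with a small sketch. The routine and its valid range are hypothetical, invented for illustration: the point is that values outside the range are handled as errors, never as uncontrolled failures, and that boundary testing exercises the edges and one step past them.

```python
# A minimal sketch of data boundary testing, under hypothetical assumptions:
# a routine that accepts percentages from 0 to 100 and rejects anything
# outside that range as an exception rather than failing.

def set_volume(percent):
    if not 0 <= percent <= 100:
        raise ValueError(f"volume out of range: {percent}")
    return percent

# Boundary-value cases: the edges themselves, and one step past each edge.
for value in (0, 100):          # on the boundary: must be accepted
    assert set_volume(value) == value

for value in (-1, 101):         # past the boundary: must be a handled
    try:                        # exception, never an uncontrolled failure
        set_volume(value)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{value} should have been rejected")

print("boundary cases behave as expected")
```

Note that the "failure" being probed here is a controlled rejection; the robust software the text describes never simply breaks at the boundary.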

Normally my goal as a tester is not to "break the product." My goal is to perform verification and validation on the product. My job isn't just to verify that the product does what it is supposed to do; an automation tool can be trained to do that. I must also determine whether it is doing the right things in the right way.

Judging the merit of something must be accomplished by weighing its positive aspects against its negative aspects. When testers become too intent on finding bugs, they can lose track of how well the system works as a whole. They can lose sight of the goal of satisfying and delighting the customer. I have seen too many test efforts that found lots of bugs but never tested the real-life scenarios that mattered to the customers; the testers didn't even know what those scenarios were.

I don't agree with the people who say that human testers will try to "not find bugs." Every tester I know loves to find bugs; we relish them. We show them off to each other: "Hey wanna see something really cool? I can make the display monitor turn into lots of little flashing squares!" The real issue is how we handle the communications about those bugs between testers and developers and between developers and testers.

The adversarial approach that promotes aggressive behavior does not produce satisfactory results in today's mixed male, female, and multicultural workplace. A team approach does produce good results while maintaining good morale in the workplace. After all, we are all on the same team; our real adversaries are the bugs.

Persuasion is an art. One of the chief tools of persuasion is argument. The word argument has a negative connotation, being linked to confrontation and adversarial situations, but it is part of the definition of the word validation. The answer to the question "Does the product do the right thing?" requires subjective judgment. Persuasion will be accomplished by argument. The quality of the tester's argument is determined by how successful it is in convincing others of its merit. The best case or most convincing argument is made through objective measurements, but measurements alone are not always sufficient to make a successful argument.



Software Testing Fundamentals: Methods and Metrics
ISBN: 047143020X
Year: 2005
Pages: 132
