It is undisputed that measurement is crucial to the progress of all sciences. Scientific progress is made through observations and generalizations based on data and measurements, the derivation of theories as a result, and in turn the confirmation or refutation of theories via hypothesis testing based on further empirical data. As an example, consider the proposition "the more rigorously the front end of the software development process is executed, the better the quality at the back end." To confirm or refute this proposition, we first need to define the key concepts. For example, we define "the software development process" and distinguish the process steps and activities of the front end from those of the back end. Assume that after the requirements-gathering process, our development process consists of the following phases:
Integration is the development phase during which various parts and components are integrated to form one complete software product. Usually after integration the product is under formal change control. Specifically, after integration every change of the software must have a specific reason (e.g., to fix a bug uncovered during testing) and must be documented and tracked. Therefore, we may want to use integration as the cutoff point: The design, coding, debugging, and integration phases are classified as the front end of the development process and the formal machine testing and early customer trials constitute the back end.
We then define rigorous implementation both in the general sense and in specific terms as they relate to the front end of the development process. Assuming the development process has been formally documented, we may define rigorous implementation as total adherence to the process: Whatever is described in the process documentation that needs to be executed, we execute. However, this general definition is not sufficient for our purpose, which is to gather data to test our proposition. We need to specify the indicator(s) of the definition and to make it (them) operational. For example, suppose the process documentation says all designs and code should be inspected. One operational definition of rigorous implementation may be inspection coverage expressed in terms of the percentage of the estimated lines of code (LOC) or of the function points (FP) that are actually inspected. Another indicator of good reviews and inspections could be the scoring of each inspection by the inspectors at the end of the inspection, based on a set of criteria. We may want to operationally use a five-point Likert scale to denote the degree of effectiveness (e.g., 5 = very effective, 4 = effective, 3 = somewhat effective, 2 = not effective, 1 = poor inspection). There may also be other indicators.
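As a minimal sketch of how these two indicators might be computed, consider the following Python fragment. The function names and sample figures are hypothetical, chosen only for illustration. Because the Likert scores are ordinal, the sketch summarizes them with the median rather than the mean; that choice is an assumption, not something prescribed by the process.

```python
from statistics import median

# Labels for the five-point Likert scale described in the text.
LIKERT_LABELS = {
    5: "very effective",
    4: "effective",
    3: "somewhat effective",
    2: "not effective",
    1: "poor inspection",
}


def inspection_coverage(inspected_loc: int, estimated_loc: int) -> float:
    """Return the percentage of the estimated LOC that was actually inspected."""
    if estimated_loc <= 0:
        raise ValueError("estimated_loc must be positive")
    return 100.0 * inspected_loc / estimated_loc


def typical_effectiveness(scores: list[int]) -> float:
    """Return the median Likert score given by the inspectors."""
    for score in scores:
        if score not in LIKERT_LABELS:
            raise ValueError(f"not a valid Likert score: {score}")
    return median(scores)


# Hypothetical figures for one component (illustration only).
coverage = inspection_coverage(inspected_loc=8_500, estimated_loc=10_000)
effectiveness = typical_effectiveness([4, 5, 3, 4])

print(f"inspection coverage: {coverage:.1f}%")
print(f"median effectiveness score: {effectiveness}")
```

The same coverage function could take function points instead of LOC, provided the numerator and denominator use the same unit.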
In addition to design, design reviews, code implementation, and code inspections, development testing is part of our definition of the front end of the development process. We also need to operationally define "rigorous execution" of this test. Two indicators that could be used are the percent coverage in terms of instructions executed (as measured by some test coverage measurement tools) and the defect rate expressed in terms of number of defects removed per thousand lines of source code (KLOC) or per function point.
Likewise, we need to operationally define "quality at the back end" and decide which measurement indicators to use. For the sake of simplicity let us use defects found per KLOC (or defects per function point) during formal machine testing as the indicator of back-end quality. From these metrics, we can formulate several testable hypotheses such as the following:
With the hypotheses formulated, we can set out to gather data and test them. We also need to determine the unit of analysis for our measurement and data. In this case, it could be the project or, for a large project, the component. If we are able to collect enough data points to form a reasonable sample size (e.g., 35 projects or components), we can perform statistical analysis to test the hypotheses. We can classify projects or components into several groups according to the independent variable of each hypothesis, then compare the outcome of the dependent variable (defect rate during formal machine testing) across the groups. We can conduct simple correlation analysis, or we can perform more sophisticated statistical analyses. If the hypotheses are substantiated by the data, we confirm the proposition. If they are rejected, we refute the proposition. If we have doubts or unanswered questions during the process (e.g., Are our indicators valid? Are our data reliable? Are there other variables we need to control when we conduct the analysis for hypothesis testing?), then perhaps more research is needed. However, if the hypotheses, and hence the proposition, are confirmed, we can use the knowledge thus gained and act accordingly to improve our software development quality.
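The grouping and simple correlation analysis described above might be sketched as follows. The data are hypothetical and far too few for a real test; an actual analysis would need a larger sample (such as the 35 data points mentioned above) and a significance test. The threshold of 80% used to split the groups is likewise an arbitrary assumption.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, one tuple per component:
# (inspection coverage in %, defects per KLOC found in formal machine testing)
components = [
    (95, 1.0), (90, 1.5), (85, 2.0),
    (70, 3.0), (60, 3.5), (50, 4.0),
]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def group_means(data: list[tuple[float, float]], threshold: float) -> tuple[float, float]:
    """Split by the independent variable and return the mean defect rate per group."""
    high = [defects for cov, defects in data if cov >= threshold]
    low = [defects for cov, defects in data if cov < threshold]
    return mean(high), mean(low)


coverage = [c for c, _ in components]
defect_rate = [d for _, d in components]

r = pearson_r(coverage, defect_rate)
high_mean, low_mean = group_means(components, threshold=80)

print(f"correlation between coverage and back-end defect rate: {r:.3f}")
print(f"mean defects/KLOC, high-coverage group: {high_mean}")
print(f"mean defects/KLOC, low-coverage group:  {low_mean}")
```

In this made-up sample, a negative correlation and a lower mean defect rate in the high-coverage group would be consistent with the proposition. They would not by themselves confirm it, since other variables are not controlled.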
The example demonstrates the importance of measurement and data. Measurement and data really drive the progress of science and engineering. Without empirical verification by data and measurement, theories and propositions remain abstract. The example also illustrates that from theory to testable hypothesis, and likewise from concepts to measurement, there are several steps with levels of abstraction. Simply put, a theory consists of one or more propositional statements that describe the relationships among concepts, usually expressed in terms of cause and effect. From each proposition, one or more empirical hypotheses can be derived. The concepts are then formally defined and operationalized. The operationalization process produces metrics and indicators for which data can be collected. The hypotheses thus can be tested empirically. A hierarchy from theory to hypothesis and from concept to measurement indicators is illustrated in Figure 3.1.
Figure 3.1. Abstraction Hierarchy
The building blocks of theory are concepts and definitions. In a theoretical definition a concept is defined in terms of other concepts that are already well understood. In the deductive logic system, certain concepts would be taken as undefined; they are the primitives. All other concepts would be defined in terms of the primitive concepts. For example, the concepts of point and line may be used as undefined and the concepts of triangle or rectangle can then be defined based on these primitives.
Operational definitions, in contrast, are definitions that spell out the metrics and the procedures to be used to obtain data. An operational definition of "body weight" would indicate how the weight of a person is to be measured, the instrument to be used, and the measurement unit to record the results. An operational definition of "software product defect rate" would indicate the formula for defect rate, the defect to be measured (numerator), the denominator (e.g., lines of code count, function point), how to measure, and so forth.
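To make the idea concrete, an operational definition of defect rate could be written down as a small Python class that states the numerator, the denominator, and the formula explicitly. This is only a sketch: the class name, field names, and sample figures are hypothetical, and a real definition would also have to specify which defects count and how lines of code are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectRate:
    """One possible operational definition of 'software product defect rate'."""

    defects: int   # numerator: number of defects counted (counting rules assumed)
    size_loc: int  # denominator: product size in lines of code (counting rules assumed)

    def per_kloc(self) -> float:
        """Formula: defects divided by size in thousands of lines of code."""
        if self.size_loc <= 0:
            raise ValueError("size_loc must be positive")
        return self.defects / (self.size_loc / 1000)


# Hypothetical figures: 30 defects found in a 20,000-line product.
rate = DefectRate(defects=30, size_loc=20_000).per_kloc()
print(f"defect rate: {rate} defects per KLOC")
```

Swapping the denominator for function points would give a different operational definition of the same concept.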