Current Metrics and Models Technology


The best treatment of current software metrics and models is Software Measurement: A Visualization Toolkit for Project Control and Process Measurement,[12] by Simmons, Ellis, Fujihara, and Kuo. It comes with a CD-ROM containing the Project Attribute Monitoring and Prediction Associate (PAMPA) measurement and analysis software tools. The book begins with Halstead's software science from 1977 and brings the field up to date to 1997, updating the metrics and models with later research and experience. The updated metrics are grouped by size, effort, development time, productivity, quality, reliability, verification, and usability.

Size metrics begin with Halstead's volume, now measured in source lines of code (SLOC), and add structure metrics such as the number of unconditional branches, the depth of control-loop nesting, and module fan-in and fan-out. The newly added rework attributes describe the size of additions, deletions, and changes made between versions; combined, they measure the turmoil in the developing product. The authors have also added a new measure of code functionality smaller than the program or module, called a chunk: a single integral piece of code, such as a function, subroutine, script, macro, procedure, object, or method. Volume measures are now made on functionally distinct chunks rather than on larger-scale aggregates such as programs or components. Tools are provided that allow the designer to aggregate chunks into larger units and even predict the number of function points or object points. Furthermore, because most software products are not developed from scratch but reuse existing code chunks with known quality characteristics, the toolkit allows the prediction of equivalent volume using one of four different algorithms (or all four, if desired) taken from recent software science literature. A new volume measure called unique SLOC counts new code on a per-chunk basis, so unique SLOC can be calculated for a developing version of the product.
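To make the rework attributes concrete, here is a minimal sketch of how turmoil and unique SLOC might be tallied per chunk. The ChunkDelta structure and its field names are invented for illustration and are not part of the PAMPA toolkit:

```python
from dataclasses import dataclass

@dataclass
class ChunkDelta:
    """Hypothetical per-chunk rework record between two versions."""
    name: str          # chunk name: a function, subroutine, method, etc.
    added_sloc: int    # source lines added since the previous version
    deleted_sloc: int  # source lines deleted since the previous version
    changed_sloc: int  # source lines modified since the previous version

def turmoil(deltas):
    """Turmoil: combined size of additions, deletions, and changes between versions."""
    return sum(d.added_sloc + d.deleted_sloc + d.changed_sloc for d in deltas)

def unique_sloc(deltas):
    """Unique SLOC: new code in this version, counted chunk by chunk."""
    return sum(d.added_sloc for d in deltas)

deltas = [
    ChunkDelta("parse_order", added_sloc=40, deleted_sloc=5, changed_sloc=12),
    ChunkDelta("post_invoice", added_sloc=0, deleted_sloc=2, changed_sloc=8),
]
print(turmoil(deltas))      # 67
print(unique_sloc(deltas))  # 40
```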

Naturally, volume measures are the major input for effort metrics. Recent research adds 17 different dominators, grouped into five categories, which can have serious effort-magnifying effects. The categories into which dominators fall are project, product, organization, suppliers, and customers. For example, potential dominators in the product category include the amount of documentation needed, programming language, complexity, and type of application. In the organization category, they include the number of people, communications, and personnel turnover. The customer category includes user interface complexity and requirements volatility, which are negative influences, but then the dominators are essentially all negative. Their name signifies that their presence may have an effort-expansion effect as large as a factor of 10; when their influence is favorable, they generally have a much smaller positive effect. The toolkit provides a range of effort-prediction and cost-forecasting algorithms based on theoretical, historical/experiential, statistical, and even composite models.
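The effort-magnifying idea can be illustrated with a small sketch that scales a nominal estimate by dominator multipliers. The function name and factor values below are invented for the example and do not reproduce the book's algorithms:

```python
# Illustrative only: scale a nominal effort estimate by dominator multipliers.
# A multiplier of 1.0 is neutral; an unfavorable dominator can approach 10.0,
# while a favorable one yields only a modest reduction. All values are invented.
def adjusted_effort(nominal_person_months, dominator_factors):
    effort = nominal_person_months
    for factor in dominator_factors.values():
        effort *= factor
    return effort

factors = {
    "requirements_volatility": 1.8,  # customer category, unfavorable
    "personnel_turnover": 1.3,       # organization category, unfavorable
    "programming_language": 0.9,     # product category, slightly favorable
}
print(round(adjusted_effort(100, factors), 1))  # 210.6 person-months
```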

The third measure category is development time, which is derived from effort, which in turn is derived from size or volume. The only new independent variable here is schedule. Given the resources available to the project manager, the toolkit calculates the overall minimum development time and then allows the user to vary or reallocate resources to do more tasks in parallel. However, the system very realistically warns of cost runaways if the user tries to reduce development time by more than 15% below the forecast minimum.
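A minimal sketch of such a compression check follows, assuming a COCOMO-style cube-root relation between effort and minimum development time purely to make the 15% rule concrete; the coefficient and exponent are illustrative and are not the toolkit's actual model:

```python
# Sketch of a schedule-compression warning, not the toolkit's actual model.
# The cube-root relation between effort and minimum development time is a
# common COCOMO-style rule of thumb, used here only to make the check concrete.
def minimum_development_time(effort_person_months, coefficient=2.5, exponent=1 / 3):
    return coefficient * effort_person_months ** exponent

def check_schedule(requested_months, effort_person_months, max_compression=0.15):
    t_min = minimum_development_time(effort_person_months)
    if requested_months < (1.0 - max_compression) * t_min:
        print(f"Warning: {requested_months:.1f} months is more than 15% below the "
              f"forecast minimum of {t_min:.1f} months; expect a cost runaway.")
    else:
        print(f"{requested_months:.1f} months is within 15% of the forecast "
              f"minimum of {t_min:.1f} months.")

check_schedule(10, effort_person_months=210.6)  # triggers the warning (minimum is about 14.9)
```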

Because effort is essentially volume divided by productivity, you can see that productivity is inversely related to effort. A new set of cost drivers enters as independent variables, unfortunately having mostly negative influences. When cost drivers begin to vary significantly from nominal values, you should take action to bring them back into acceptable ranges. A productivity forecast provides the natural objective function with which to do this.
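A small numeric illustration of that inverse relationship, with invented values:

```python
# Effort is essentially volume divided by productivity, so a drop in
# productivity inflates effort proportionally. Values are invented.
def effort_person_months(volume_sloc, productivity_sloc_per_person_month):
    return volume_sloc / productivity_sloc_per_person_month

print(effort_person_months(50_000, 500))  # 100.0 person-months at nominal productivity
print(effort_person_months(50_000, 400))  # 125.0 person-months after a 20% productivity drop
```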

The quality metrics advocated in Simmons et al. are dependent on the last three metric sets: reliability, verification, and usability. Usability is a product's fitness for use. This metric depends on the product's intended features, their verified functionality, and their reliability in use. Simply stated, it means that all promises were fulfilled, no negative consequences were encountered, and the customer was delighted. This deceptively simple trio masks multiple subjective psychometric evaluations plus a few performance-based factors such as learnability, relearnability, and efficiency. Much has been written about measures of these factors. To sell software, vendors develop and add more features. New features contain unique SLOC, and new code means new opportunities to introduce bugs. As might be expected, a large measure of customer dissatisfaction results from new features that don't work, whether because of actual defects or merely unmet user expectations. The only thing in the world increasing faster than computer performance is end-user expectations. A product whose features cannot be validated, or that is delivered late or at a higher-than-expected price, has a quality problem. Feature validation demands that features be described clearly enough to rule out misunderstanding and that metrics for their measurement be identified.

The last point in the quality triangle is reliability, which may be characterized by defect potential, defect removal efficiency, and delivered defects. The largest opportunity for software defects to occur is in the interfaces between modules, programs, and components, and with databases. Although the number of interfaces in an application is proportional to the program's size, it varies by application type, programming language, style, and many other factors. One estimate indicates that 70% or more of software reliability problems are in interfaces. Aside from the occurrence of errors or defects, and their number (if any), the major metric for quality is the mean time between their occurrences. Whether you record time to failure, time intervals between failures, cumulative failures in a given time period, or failures experienced in a given time interval, the basic metric of reliability is time.
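As a worked illustration (not the book's formulas), delivered defects can be related to defect potential and removal efficiency, and mean time between failures can be estimated from recorded failure times; all numbers below are invented:

```python
# Illustrative only: delivered defects from defect potential and removal
# efficiency, and mean time between failures from failure timestamps.
def delivered_defects(defect_potential, removal_efficiency):
    # e.g., 500 potential defects removed with 95% efficiency leaves 25
    return defect_potential * (1.0 - removal_efficiency)

def mean_time_between_failures(failure_times_hours):
    # Failure timestamps measured from the start of operation, in hours.
    intervals = [b - a for a, b in zip(failure_times_hours, failure_times_hours[1:])]
    return sum(intervals) / len(intervals)

print(delivered_defects(500, 0.95))                      # 25.0
print(mean_time_between_failures([120, 300, 540, 900]))  # 260.0 hours
```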

Chapter 2 defined software quality as the degree to which a system, component, or process meets specified requirements, and customer or user needs and requirements. It also introduced software dependability, which includes reliability, safety, security, and availability. Our definitions incorporate Kan's multifactor software quality definition[13] and our definition of trustworthy software. The latter includes the ability to meet customer trust as well as unstated and even unanticipated needs, including the emphasis by Simmons et al. on features.[14]



