
8.3 Safety requirements

Safety requirements are both obvious and subtle. Some, like the need for a dose-limiting function on a radiation system, are obvious. Others, like the dangers posed by the rod lifter, are less intuitive. As part of any requirements gathering and analysis effort, the quality practitioner should be involved in anticipating and recording potential safety issues.

Some safety requirements will be imposed from outside the project. Medical, nuclear, space, traffic (air, rail, sea, and highway), defense, and other agencies have rules, regulations, standards, guidelines, and required practices that become safety requirements. Other application areas, however, may involve safety issues that no outside authority has imposed.

No standard would have precluded the interaction between the computer and the rod lifter, nor would standards have prevented the alarm error. In those cases, experienced quality practitioners responsible for finding defects were able to look at the situation from a how-might-it-not-be-safe bias rather than the more common this-will-work-just-fine view. In that way, quality practitioners can be of great value in seeking out potential safety failures during requirements gathering and analysis and during design review. IEEE Standard 730 lays out the required contents of a software quality assurance plan (see Appendix B). Its Section 15, entitled "Risk Management," describes the quality system procedures that deal with safety issues.

IEEE Standard 830 (see Appendix D) calls for the inclusion of safety requirements in Section 2.5 (constraints), part k (safety and security considerations).

Finally, IEEE Standard 1228 (see Appendix K) describes the required content for a software safety plan. For safety-critical software, this plan defines the specific roles, resources, responsibilities, and activities to be used in software development.

Requirements for safety typically arise in all parts of the SLC:

  • Requirements gathering, analysis, and expression (recognize hazards);

  • Design (discover hazards);

  • Coding (avoid hazards);

  • Testing (probe for hazards);

  • Operation (react to hazards).

8.3.1 Requirements period

During the requirements effort, clear hazards to be encountered or caused by the software are recognized and stated in the requirements documentation. This is the period in which situations that must not be allowed, or that must be addressed, are identified; it could be called the "software must not cause or allow harm" period. Any known or anticipated hazards are spelled out in the requirements and in the safety plan.

An example from the space shuttle software might be "All computers must be in synchronism prior to engine ignition." In that case, it is recognized that the software must take action to determine whether or not the various computers are in sync.
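
To make the example concrete, a minimal sketch of how such a requirement might be expressed as a prelaunch check follows. It is purely illustrative and rests on invented details: the computer names, the frame-count reports, and the notion that "in synchronism" means the counts agree within one frame are assumptions for this sketch, not taken from the actual shuttle software.

    # Illustrative sketch only; not actual flight software.
    # Assumes each redundant computer reports a frame counter and that
    # "in synchronism" means all counters agree within one frame.

    REQUIRED_COMPUTERS = ("GPC-1", "GPC-2", "GPC-3", "GPC-4")

    def computers_in_sync(frame_counts, tolerance=1):
        """True only if every required computer reported a frame count
        and all reported counts agree within the given tolerance."""
        try:
            counts = [frame_counts[name] for name in REQUIRED_COMPUTERS]
        except KeyError:
            return False      # a missing report is treated as out of sync
        return max(counts) - min(counts) <= tolerance

    def engine_ignition_permitted(frame_counts):
        # The stated safety requirement: no ignition unless in sync.
        return computers_in_sync(frame_counts)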

During the requirements period, quality practitioners find themselves asking what-if questions. These practitioners will often be testers who are knowledgeable about the concepts of the system or experienced in other safety-critical systems or software.

8.3.2 Design period

As design proceeds, hidden hazards may be exposed and discovered. This might be called the "software could have caused or allowed harm" period.

Continuing the space shuttle example, it might be discovered that not all computers are linked and capable of being synchronized. While this may be intentional, it might also be discovered that one or more computers must be linked. New or additional software may be needed to accomplish the required linking.

In another example, a design review could determine that, under certain processing loads, low-priority tasks could be interrupted by higher-priority tasks to the extent that the low-priority tasks are never completed and the system will fail.
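
The starvation scenario can be demonstrated in a few lines. The strict-priority scheduler below is a hypothetical simplification, not taken from any particular system; it always runs the highest-priority ready task, so a steady stream of high-priority work keeps the low-priority task from ever running.

    import heapq

    # Hypothetical illustration of task starvation under a strict-priority
    # scheduler: lower numbers run first, so continual priority-1 arrivals
    # keep the priority-4 task waiting indefinitely.

    ready = []                                  # heap of (priority, name)
    heapq.heappush(ready, (4, "housekeeping"))  # the low-priority task

    for tick in range(10):
        heapq.heappush(ready, (1, f"sensor-burst-{tick}"))  # high-priority load
        priority, name = heapq.heappop(ready)               # run highest priority
        print(f"tick {tick}: running {name}")

    # After every tick the priority-4 task is still queued; it has starved.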

During the design period, the quality practitioner is a reviewer and challenger of the design against the requirements and continues to ask, what if?

8.3.3 Coding period

Some of the most costly failures of critical and safety-critical software have occurred during the coding period. The Mars lander crash, the space probe that went to the sun, and the weapon test failure were all caused by simple, yet undetected, coding errors. These errors could all have been detected by properly phrased what-if questions. What if the data description is wrong and uses miles instead of kilometers? What if the looping is miscoded and creates a variable assignment rather than a loop? What if the priority interrupt rules are violated and the code stores priority-four data in the priority-one field? Unfortunately, none of those questions were asked.
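
The first of those what-if questions can be made concrete by carrying the unit along with the value instead of passing a bare number, so a miles-versus-kilometers mix-up fails loudly instead of silently. The sketch below is illustrative only; the Distance class and its unit names are invented for this example and are not drawn from any of the failed systems.

    from dataclasses import dataclass

    # Illustrative only: a value that knows its own unit cannot be
    # silently combined with a value expressed in a different unit.

    KM_PER_MILE = 1.609344

    @dataclass(frozen=True)
    class Distance:
        value: float
        unit: str                     # "km" or "mi"

        def in_km(self) -> float:
            if self.unit == "km":
                return self.value
            if self.unit == "mi":
                return self.value * KM_PER_MILE
            raise ValueError(f"unknown unit: {self.unit}")

    def closing_range_km(a: Distance, b: Distance) -> float:
        # Both operands are converted explicitly; a bare float is
        # rejected because it has no in_km() method.
        return a.in_km() - b.in_km()

    print(closing_range_km(Distance(120.0, "mi"), Distance(100.0, "km")))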

8.3.4 Testing period

Testing of safety-critical software must not only evaluate the software against its requirements, but must also look for previously unidentified hazards. This is the "software would have caused or allowed harm" period.

Presuming that the computers that should be linked have been identified and that those not required to be linked are also known, the quality practitioner should test the linkages and synchronization, deliberately introduce missing synchronizations and improper or missing linkages, and verify that the proper alarms are raised for each failure case.
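
A few directed test cases of this kind might look like the following sketch. It reuses the hypothetical engine_ignition_permitted check sketched in Section 8.3.1; the module name it is imported from is likewise invented, and a fuller test would also assert that the proper alarm is raised.

    import unittest

    # Hypothetical test sketch, building on the Section 8.3.1 example.
    # prelaunch_checks is an assumed module name for that sketch.
    from prelaunch_checks import engine_ignition_permitted

    class MissingLinkageTest(unittest.TestCase):
        def test_missing_computer_blocks_ignition(self):
            # GPC-4 never reports: the linkage is missing.
            reports = {"GPC-1": 100, "GPC-2": 100, "GPC-3": 100}
            self.assertFalse(engine_ignition_permitted(reports))

        def test_out_of_sync_computer_blocks_ignition(self):
            # GPC-4 reports, but its frame count lags by three frames.
            reports = {"GPC-1": 100, "GPC-2": 100, "GPC-3": 100, "GPC-4": 97}
            self.assertFalse(engine_ignition_permitted(reports))

    if __name__ == "__main__":
        unittest.main()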

During the testing period, anomalous conditions should be presented to the software to probe for unanticipated situations that could expose additional hazards in the software or system.
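
Anomalous situations can also be generated mechanically. The probe below is a generic robustness sketch, not a technique prescribed by the text; it feeds the same hypothetical ignition check randomly malformed reports and confirms that ignition is never permitted when a computer is missing.

    import random

    # Generic robustness probe (illustrative). prelaunch_checks is the
    # assumed module name for the Section 8.3.1 sketch.
    from prelaunch_checks import engine_ignition_permitted

    NAMES = ["GPC-1", "GPC-2", "GPC-3", "GPC-4"]

    def random_reports(rng):
        # Randomly drop computers and inject out-of-range frame counts.
        present = rng.sample(NAMES, rng.randint(0, 4))
        return {name: rng.choice([0, -1, 10**9, rng.randint(0, 200)])
                for name in present}

    rng = random.Random(8)          # fixed seed keeps the probe repeatable
    for _ in range(1000):
        reports = random_reports(rng)
        if len(reports) < len(NAMES) and engine_ignition_permitted(reports):
            raise AssertionError(f"ignition allowed with missing computer: {reports}")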

8.3.5 Operational period

The operational period is when as-yet unexposed hazards, and defenses against them, can be found. This is the "software nearly did or did cause or allow harm" period. It is hoped, though not realistically expected, that all hazards to be detected, avoided, corrected, reduced, or otherwise treated by the software will have been discovered, considered, and managed.

As has happened more than once on the space shuttle program, lack of synchronism among computers can cause launch delays.

The quality assurance practitioner must constantly review problem and anomaly reports, looking for situations that could become hazards as well as those that already have been encountered. During the previous periods, the quality practitioner concentrated on quality control (defect detection and correction) in the software and system. Quality control, along with the quality assurance practitioner's defect prevention activities, continues in the operational period, but the emphasis has clearly shifted. The operational period is the source of the majority of the lessons learned that can be applied to subsequent system and software projects.


