Fault-tree analysis produces a graphic display of an event together with the factors that contribute to it, and in turn the factors that contribute to each of those. It is primarily used to analyze complex problems with several causes. We present here a simplified version of the method as described in Root Cause Analysis: A Tool for Total Quality Management by P. F. Wilson, L. D. Dell, and G. F. Anderson [WDA93].
This formalism is an effective way to analyze problems that involve multiple processes, multiple users, multiple platforms, and sequences of events that occur over longer periods of time. It is probably overkill for problems that involve only a single process, single user, single platform, and very short periods of time.
The following symbols are used to construct the fault tree:
Rectangle: Rectangles symbolize events or causal factors. Place the description of the event inside the rectangle.
Semicircle: Semicircles symbolize a logical AND connection. All of the conditions that feed it must be true for this part of the tree to be a cause of the problem. This symbol is used to represent AND gates in electronics.
Tri-arc: Polygons created from three arcs of equal length symbolize a logical OR connection. If any one of the conditions that feed it is true, then this part of the tree could be a cause of the problem. This symbol is used to represent OR gates in electronics.
Triangle: Triangles represent sections of the fault tree that have been described separately. A section is described separately either because it is repeated in the tree or because of space requirements. Place an identifier in the triangle.
Ellipse: Ellipses symbolize conditions or constraints on a logical connection. Place the description of the condition inside the ellipse.
Diamond: Diamonds symbolize a terminal event whose cause hasn’t been determined. Place the description of the event inside the diamond.
As with cause-and-event charts, describe all events with a simple sentence that contains an active verb; describe conditions with a simple sentence that contains a being verb.
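If you want to keep a fault tree in a program rather than on paper, the symbols above map naturally onto a small data structure. The following is a minimal sketch, not part of the method in [WDA93]. The names Kind, Node, and add are hypothetical, and the example events are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Kind(Enum):
    """One member per fault-tree symbol (names are illustrative)."""
    EVENT = "rectangle"        # event or causal factor
    AND = "semicircle"         # all inputs must be true
    OR = "tri-arc"             # any one input is sufficient
    TRANSFER = "triangle"      # section described separately
    CONDITION = "ellipse"      # condition or constraint on a connection
    UNDEVELOPED = "diamond"    # terminal event, cause not determined


@dataclass
class Node:
    kind: Kind
    label: str = ""            # description placed inside the symbol
    inputs: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child that feeds this node and return the child."""
        self.inputs.append(child)
        return child


# Hypothetical example: a symptom fed by an OR connector with two inputs.
root = Node(Kind.EVENT, "Program prints wrong total")
gate = root.add(Node(Kind.OR))
gate.add(Node(Kind.EVENT, "Parser drops last record"))
gate.add(Node(Kind.UNDEVELOPED, "Database returns stale row"))
```

Events carry an active verb and conditions carry a being verb, as in the labeling rule above.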
Begin constructing the fault tree by putting the defect symptom in an event rectangle. Connect this rectangle to a logical OR connector. The inputs to this OR should reflect the independent centers of control in the system. The following sets of control centers are useful:
User, client, server
Several clients and a server
Continue constructing the fault tree by adding in causes that are relevant to each particular type of control center. Consult the earlier parts of this chapter for relevant lists of causes.
Continue branching the tree by putting conditions or causal factors under the relevant nodes of the tree. Where all of the conditions or causal factors are required for an event to happen, connect them with an AND connector. Where any one of them is sufficient for an event to happen, connect them with an OR connector.
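The AND and OR connectors behave like their Boolean counterparts, so a branch can be checked mechanically against the facts you know. The sketch below assumes a simple tuple representation of (kind, label, inputs); the helper names leaf, AND, OR, and could_cause, and the example scenario, are hypothetical.

```python
def leaf(label):
    return ("leaf", label, [])


def AND(label, *inputs):
    return ("and", label, list(inputs))


def OR(label, *inputs):
    return ("or", label, list(inputs))


def could_cause(node, facts):
    """Return True if this part of the tree could be a cause, given the facts.

    facts maps leaf labels to True or False; unknown leaves count as False.
    """
    kind, label, inputs = node
    if kind == "leaf":
        return facts.get(label, False)
    results = [could_cause(child, facts) for child in inputs]
    if kind == "and":
        return all(results)   # every input is required
    if kind == "or":
        return any(results)   # any one input is sufficient
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical symptom with one AND branch and one single-cause branch.
symptom = OR(
    "Report shows stale totals",
    AND(
        "Client displays cached page",
        leaf("Browser cache is enabled"),
        leaf("Page is missing an expiry header"),
    ),
    leaf("Server skips the nightly recalculation"),
)

facts = {
    "Browser cache is enabled": True,
    "Page is missing an expiry header": False,
    "Server skips the nightly recalculation": False,
}
client_branch_possible = could_cause(symptom, facts)
```

With these facts the AND branch fails because only one of its two conditions holds, and the server branch is ruled out, so client_branch_possible is False. Changing either the expiry-header fact or the server fact to True makes the symptom explainable.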
Once you have constructed the tree, you must validate it. You do this by following each path through the tree and evaluating whether that path fits the facts known about the defect. Start at each of the leaves, and trace a path to the root. Consider whether the combination of conditions and causes attributed to the particular control center would be sufficient to cause the observed symptoms. If a subset of the path seems to recur as a possibility through several control centers, this is a good candidate for a root-cause explanation.
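The leaf-to-root tracing can also be sketched in code. The example below is one possible approach, under simplifying assumptions: the tree is a nested (label, children) tuple, the children of the root stand for the control centers, and a "recurring subset" is reduced to a single node label that appears under more than one control center. The names leaf_paths and recurring_causes and the example tree are hypothetical.

```python
def leaf_paths(node, trail=()):
    """Yield every path as a tuple of labels ordered from leaf to root."""
    label, children = node
    trail = (label,) + trail
    if not children:
        yield trail
    for child in children:
        yield from leaf_paths(child, trail)


def recurring_causes(tree):
    """Return labels that appear beneath more than one control center."""
    centers_by_label = {}
    for path in leaf_paths(tree):
        if len(path) < 3:
            continue              # no cause below a control center
        center = path[-2]         # the root's child on this path
        for label in path[:-2]:   # everything below the control center
            centers_by_label.setdefault(label, set()).add(center)
    return sorted(
        label for label, centers in centers_by_label.items()
        if len(centers) > 1
    )


# Hypothetical tree: root symptom, two control centers, causes below each.
tree = ("Order total is wrong", [
    ("Client sends bad request", [
        ("Cache holds stale price", []),
        ("User enters wrong quantity", []),
    ]),
    ("Server computes bad total", [
        ("Cache holds stale price", []),
        ("Tax table is out of date", []),
    ]),
])

paths = list(leaf_paths(tree))
candidates = recurring_causes(tree)
```

Here paths holds four leaf-to-root paths to review against the known facts, and candidates contains only "Cache holds stale price", the one cause that shows up under both control centers. Judging whether each path actually fits the facts remains a manual step.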