RELIABILITY AND QUALITY | Six Sigma and Beyond: Design for Six Sigma, Volume VI

Customers and product engineers frequently use the terms reliability and quality interchangeably. Ultimately, the customer defines quality. Customers want products that meet or exceed their needs and expectations, at a cost that represents value. This expectation of performance must be met throughout the customer's expected life for the particular product. Quality is usually recognized as a more encompassing term including reliability. Some quality characteristics are:

Psychological

Taste
Beauty, style
Status

Technological

Hardness
Vibration
Noise
Materials (bearings, belts, hoses, etc.)

Time-oriented

Reliability
Maintainability

Contractual

Warranty

Ethical

Honesty of repair shop
Experience and integrity of sales force

PRODUCT DEFECTS

Quality defects are defined as those that can be located by conventional inspection techniques. (Note: for legal reasons, it is better to identify these defects as nonconformances.) Reliability defects are defined as those that require some stress applied over time to develop into detectable defects.

What causes product failure over time? Some possibilities are:

Design
Manufacturing
Packaging
Shipping
Storage
Sales
Installation
Maintenance
Customer duty cycle

CUSTOMER SATISFACTION

The ultimate goal of a product is to satisfy a customer from all aspects of cost, performance, reliability, and maintainability. The customer trades off these parameters when making a decision to buy a product. Assuming that we are designing a product for a certain market segment, cost is determined within limits. The tradeoffs are as follows:

Performance parameters are the designed-in system capabilities such as acceleration, top speed, rate of metal removal, gain, ability to carry a 5-ton payload up a 40 degree grade without overheating, and so on.
The reliability of equipment expresses the length of failure-free time that can be expected from the equipment. Higher levels of reliability mean less failure of the equipment and consequently less downtime and loss of use. Although we will attach reliability numbers to products, it should be recognized that the customer's perspective interprets reliability as the ability of a product to perform its intended function for a given period of time without failure. This concept of failure-free operation is becoming more and more fixed in the mind of the customer. This is true whether the customer is purchasing an automobile, a machine tool, a computer system, a refrigerator, or an automatic coffee maker.
Maintainability is defined as the probability that a failed system is restored to operable condition in a specified amount of downtime.
Availability is the probability that at any time, the system is either operating satisfactorily or is ready to be operated on demand, when used under stated conditions. Availability might also be looked at as the ability of equipment, under combined aspects of its reliability, maintainability, and maintenance support, to perform its required function at a stated instant of time. This availability includes the built-in equipment features as well as the maintenance support function. Availability combines reliability and maintainability into one measure. There are different kinds of availability that are calculated in different ways; see Von Alven (1964) and ANSI/IEEE (1988). The most popular availabilities are achieved availability and inherent availability.
1. Achieved availability includes all diagnostic, repair, administrative, and logistic times. This availability is dependent on the maintenance support system. Achieved availability can be calculated as
  
  A = Operating Time/(Operating Time + Unscheduled Time)
2. Inherent availability only includes operating time and active repair time, addressing the built-in capabilities of the equipment. Inherent availability is calculated as
  
  A = MTBF/(MTBF + MTTR)
  
  where MTBF = mean time between failures and MTTR = mean time-to-repair, and the MTTR is for the active repair time.
Active repair time is that portion of downtime when the technicians are working on the system to repair the failure situation. It must be understood that the different availabilities are defined for various time-states of the system.
Serviceability is the ease with which machinery and equipment can be repaired. Here repair includes diagnosis of the fault, replacement of the necessary parts, tryout, and bringing the equipment back on line. Serviceability is somewhat qualitative and addresses the ease by which the equipment, as designed, can be diagnosed and repaired. It involves factors such as accessibility to test points, ease of removal of the failed components, and ease of bringing the system back on line.
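
The two availability formulas given above can be worked through in a few lines. The following is a minimal sketch, not a prescribed method: the function names and all of the numbers are hypothetical, chosen only to show how the same machine can score differently on achieved and inherent availability.

```python
def achieved_availability(operating_time, unscheduled_time):
    """A = operating time / (operating time + unscheduled downtime)."""
    return operating_time / (operating_time + unscheduled_time)


def inherent_availability(mtbf, mttr):
    """A = MTBF / (MTBF + MTTR), where MTTR covers active repair time only."""
    return mtbf / (mtbf + mttr)


# Hypothetical figures for one machine over a quarter (all in hours).
operating_hours = 1900.0
unscheduled_hours = 100.0    # diagnostic, repair, administrative, logistic time
failures = 10
active_repair_hours = 40.0   # hands-on repair time only

mtbf = operating_hours / failures      # mean time between failures
mttr = active_repair_hours / failures  # mean active repair time per failure

achieved = achieved_availability(operating_hours, unscheduled_hours)
inherent = inherent_availability(mtbf, mttr)

print(f"Achieved availability: {achieved:.3f}")
print(f"Inherent availability: {inherent:.3f}")
```

With these assumed figures the inherent value comes out higher than the achieved value, because the administrative and logistic delays counted in unscheduled time are excluded from active repair time.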

PRODUCT LIFE AND FAILURE RATE

Let us assume that we have released a population of products to the marketplace. The failure rate is observed as the products age. The shape of the failure rate is referred to as a bathtub curve (see Figure 7.1). Here we have overemphasized the different parts of the curve for illustration.

Figure 7.1: Bathtub curve.

This bathtub curve has three distinct regions:

Infant mortality period: During the infant mortality period the population exhibits a high failure rate, decreasing rapidly as the weaker products fail. Some manufacturers provide a "burn-in" period for their products to help eliminate infant mortality failures. Generally, infant mortality is associated with manufacturing issues. Examples are:
- Poor welds
- Contamination
- Improper installation
- And so on
Useful life period: During this period the population of products exhibits a relatively low and constant failure rate. It is explained using the stress-strength interference model for reliability. This model identifies the stress distribution that represents the combined stressors acting on a system at some point in time. The strength distribution represents the piece-to-piece variability of components in the field. The interference area, where the two distributions overlap, is indicative of a potential failure when stresses exceed the strength of a component. In other words, any failure in this period is a function of the designed-in reliability. Examples are:
- Low safety factors
- Abuse
- Misapplication
- Product variability
- And so on
Wear out period: At the onset of wear out, the failure rate starts to increase rapidly. When the failure rate becomes high, replacement or major repair must be performed if the product is to be left in service. Wear out is due to a number of forces such as:
- Frictional wear
- Chemical change
- Maintenance practices
- Fatigue
- Corrosion or oxidation
- And so on
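
The stress-strength model described for the useful life period can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that stress and strength are independent and normally distributed; the text does not specify any distribution, and the function name and numbers are hypothetical. Under that assumption the margin (strength minus stress) is also normal, and reliability is the probability that the margin is positive.

```python
from math import sqrt
from statistics import NormalDist


def stress_strength_reliability(mean_strength, sd_strength, mean_stress, sd_stress):
    """P(strength > stress), assuming independent normal stress and strength."""
    margin_mean = mean_strength - mean_stress
    margin_sd = sqrt(sd_strength ** 2 + sd_stress ** 2)
    # Standard normal CDF evaluated at the safety margin in standard deviations.
    return NormalDist().cdf(margin_mean / margin_sd)


# Hypothetical component: strength 500 +/- 40, applied stress 350 +/- 30 (same units).
reliability = stress_strength_reliability(500.0, 40.0, 350.0, 30.0)
failure_probability = 1.0 - reliability  # area where stress exceeds strength

print(f"Reliability:         {reliability:.5f}")
print(f"Failure probability: {failure_probability:.5f}")
```

Raising the mean strength (a higher safety factor) or reducing either standard deviation (less product variability or less abuse) shrinks the overlap, which matches the examples listed for this period.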

In conjunction with the bathtub curve there are two more items of concern. The first one is the hazard rate (or the instantaneous failure rate) and the second, the ROCOF plot.

The hazard rate is the probability that the product will fail in the next interval of time (or distance or cycles), given that it has survived up to that time. For example, there is a one in twenty chance that it will crack, break, bend, or fail to function in the next month. Typically, hazard rate is shown as

$$h(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$$

where h(t) = hazard rate; f(t) = probability density function [PDF: $f(t) = \lambda e^{-\lambda t}$]; F(t) = cumulative distribution function [CDF: $F(t) = 1 - e^{-\lambda t}$]; and R(t) = reliability at time t [$R(t) = 1 - F(t) = 1 - (1 - e^{-\lambda t}) = e^{-\lambda t}$]. For this exponential model the hazard rate reduces to the constant $\lambda$, which corresponds to the flat useful life region of the bathtub curve.
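
As a check on these definitions, the following minimal sketch evaluates the hazard rate as the ratio of the density to the reliability for the exponential model, with failure rate λ written as `lam`. The value of `lam` and the function names are hypothetical. It also computes the conditional probability of failing in the next interval, which is the quantity behind a statement such as "a one in twenty chance in the next month".

```python
from math import exp


def pdf(t, lam):
    """Exponential probability density function f(t)."""
    return lam * exp(-lam * t)


def cdf(t, lam):
    """Exponential cumulative distribution function F(t)."""
    return 1.0 - exp(-lam * t)


def reliability(t, lam):
    """R(t) = 1 - F(t)."""
    return 1.0 - cdf(t, lam)


def hazard(t, lam):
    """h(t) = f(t) / R(t)."""
    return pdf(t, lam) / reliability(t, lam)


def conditional_failure_probability(t, dt, lam):
    """P(fail in (t, t + dt] | survived to t) = 1 - R(t + dt) / R(t)."""
    return 1.0 - reliability(t + dt, lam) / reliability(t, lam)


lam = 0.05  # hypothetical failure rate, failures per month
for t in (1.0, 12.0, 36.0):
    print(f"t={t:5.1f}  h(t)={hazard(t, lam):.4f}  "
          f"P(fail next month)={conditional_failure_probability(t, 1.0, lam):.4f}")
```

The hazard rate stays at `lam` regardless of age, and the one-month conditional failure probability is close to, but slightly below, 0.05 because it equals 1 - e^(-lam) rather than lam itself.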

The Rate of Change of Failure or Rate of Change of Occurrence of Failure (ROCOF), on the other hand, is a visual tool that helps the engineer analyze situations where a large amount of data has accumulated over time. Essentially, its purpose is the same as that of the reliability bathtub curve, that is, to understand the life stages of a product or process and take the appropriate action. A typical ROCOF plot (for a warranty item) will display an early (decreasing rate) and useful life (constant rate) performance. If wear out is detected, it should be investigated. Knowing what is happening to a product from one region of the bathtub curve to the next helps the engineer specify what failed hardware to collect and aids with calibrating the severity of development tests.

If the number of failures is small, the ROCOF plot approach may be difficult to interpret. When that happens, it is recommended that a "smoothing" approach be taken. The typical smoothing methodology is to use log paper for the plotting. Obviously, many more ways and more advanced techniques exist. It must be noted here that most statistical software provides this smoothing as an option for the data under consideration. See Volume III for more details on smoothing.
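
To illustrate the idea, the sketch below computes a simple per-interval failure rate from warranty claim counts and then smooths it. The smoother here is a centered moving average, used only as a stand-in; it is one of many possible smoothing methods and is not the log-paper approach mentioned above. The claim counts, the fleet size, the function names, and the assumption of a constant number of units in service are all hypothetical.

```python
def rocof(failures_per_interval, units_in_service):
    """Failures per unit per interval, assuming a constant number of units in service."""
    return [count / units_in_service for count in failures_per_interval]


def moving_average(values, window=3):
    """Centered moving average; the window is shortened at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        smoothed.append(sum(values[start:stop]) / (stop - start))
    return smoothed


# Hypothetical monthly warranty claims for a fleet of 1000 units.
monthly_claims = [30, 18, 12, 9, 8, 9, 7, 8, 9, 8, 8, 9]
raw_rates = rocof(monthly_claims, units_in_service=1000)
smoothed_rates = moving_average(raw_rates, window=3)

for month, (raw, smooth) in enumerate(zip(raw_rates, smoothed_rates), start=1):
    print(f"month {month:2d}: raw={raw:.4f}  smoothed={smooth:.4f}")
```

In this made-up data the rate falls over the first few months and then levels off, which is the early-life and useful-life pattern described for a typical warranty item. A sustained rise at the end of such a series would be the wear-out signal that calls for investigation.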