The purpose of performing a reliability test is to answer the question, "Does the item meet or exceed the specified minimum reliability requirement?" Reliability testing is used to:
Determine whether the system conforms to the specified, quantitative reliability requirements
Evaluate the system's expected performance in the warranty period and its compliance to the useful life targets as defined by corporate policy
Compare performance of the system to the goal that was established earlier
Monitor and validate reliability growth
Determine design actions based on the outcomes of the test
In addition to their other uses, the outcomes of reliability testing are used as a basis for design qualification and acceptance. Reliability testing should be a natural extension of the analytical reliability models, so that test results clarify and verify the predicted results in the customer's environment.
A reliability test is effectively a "sampling" test in that it involves a sample of objects selected from a "population." From the sample data, some statement(s) are made about the population parameter(s). In reliability testing, as in any sampling test:
The sample is assumed to be representative of the population.
The characteristics of the sample (e.g., sample mean) are assumed to be an estimate of the true value of the population characteristics (e.g., population mean).
A key factor in reliability test planning is choosing the proper sample size. Most of the activity in determining sample size is involved with either:
Achieving the desired confidence that the test results give the correct information
Reducing the risk that the test results will give the wrong information
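As an illustration of how sample size and confidence are linked, the sketch below uses the zero-failure (success-run) relation n = ln(1 - C) / ln(R). This relation is not given in the text; it is a common planning shortcut that assumes every unit is tested for one full life with no failures allowed. The function name and the example values are hypothetical.

```python
from math import ceil, log


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Units that must all survive to demonstrate `reliability` at `confidence`.

    Assumes a pass/fail (binomial) test with zero failures allowed:
    reliability ** n <= 1 - confidence, solved for the smallest integer n.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(reliability))


# Hypothetical targets: 90% reliability demonstrated with 90% confidence,
# and 95% reliability demonstrated with 90% confidence.
n_90_90 = zero_failure_sample_size(0.90, 0.90)
n_95_90 = zero_failure_sample_size(0.95, 0.90)
print(n_90_90, n_95_90)
```

Raising either the reliability target or the required confidence increases the sample size, which is the trade-off the two bullets above describe.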
Before hardware is available, simulation and analysis should be used to find design weaknesses. Reliability testing should begin as soon as hardware is available for testing. Ideally, much of the reliability testing will occur "on the bench" with the testing of individual components. There is good reason for this: the effect of a failure on schedule and cost increases as the program progresses. The later in the process that a failure is found and corrected, the more the correction costs and the less time there is to make it. Some key points to remember regarding test planning:
Develop the reliability test plan early in the design phase.
Update the plan as requirements are added.
Run the formal reliability testing according to the predetermined procedure. This is to ensure that results are not contaminated by development testing or procedural issues.
Develop the test plan in order to get the maximum information with the fewest resources possible.
Increase test efficiency by understanding stress/strength and acceleration factor relationships. This may require accelerated testing, such as AST (Accelerated Stress Test), which will increase the information gained from a test program.
Make sure your test plan shows the relationship between development testing and reliability testing. While all data contribute to the overall knowledge about a system, other functional development testing is an opportunity to gain insight into the reliability performance of your product.
Note | A "control sample" should be maintained as a reference throughout the reliability testing process. Control samples should not be subjected to any stresses other than the normal parametric and functional testing. |
When preparing the test plan, keep these objectives in mind:
Test with regard to production intent. Make sure the sample that is tested is representative of the system that the customer will receive. This means that the test unit is representative of the final product in all areas including materials (metals, fasteners, weight), processes (machining, casting, heat treat), and procedures (assembly, service, repair). Of course, consider that these elements may change or that they may not be known. However, use the same production intent to the extent known at the time of the test plan.
Determine performance parameters before testing is started. It is often more important in reliability evaluations to monitor the percentage change in a parameter rather than the performance to specification.
Duplicate/simulate the full range of the customer stresses and environments. This includes testing to the 95th percentile customer. (For most organizations this percentile is the default. Make sure you identify what is the exact percentile for your organization.)
Quantify failures as they relate to the system being tested. A failure results when a system does not perform to customer expectations, even if there is no actual broken part.
Remember,
Customer requirements include the specifications and requirements of internal customers and regulatory agencies as well as the ultimate purchaser.
You should structure testing to identify hardware interface issues as they relate to the system being tested.
Sudden-death testing allows you to obtain test data quickly and reduces the number of test fixtures required. It can be used on samples as small as 15 or as large as 40 or more. Sudden-death testing reduces testing time in cases where the lower quartile (lower 25%) of a life distribution is considerably lower than the upper quartile (upper 25%). The philosophy of sudden-death testing is to test small groups of samples to a first failure only and use the data to determine the Weibull distribution of the component. The method is as follows:
1. Choose a sample size that can be divided into three or more groups of equal size, and treat each group as if it were an individual assembly.
2. Test all items in each group concurrently until the first failure occurs in that group. Testing of the remaining units in the group stops as soon as the first unit fails, hence the name "sudden death."
3. Record the time to first failure in each group.
4. Rank the times to failure in ascending order.
5. Assign a median rank to each failure, using a sample size equal to the number of groups. Median rank charts are used for this purpose.
6. Plot the times to failure vs. median ranks on Weibull paper.
7. Draw the best-fit line. (Fit the line by eye or use a regression model.) This line is the sudden-death line.
8. Determine the life at which 50% of the first failures are likely to occur (B50 life) by drawing a horizontal line from the 50% level to the sudden-death line. Drop a vertical line from this point down.
9. Find the median rank for the first failure when the sample size equals the number of items in each group. Again, refer to the median rank charts. Draw a horizontal line from this value until it intersects the vertical line drawn in the previous step.
10. Draw a line parallel to the sudden-death line through the intersection point from step 9. This line is called the population line and represents the Weibull distribution of the population.
Sudden-death testing is a good method to use to determine the failure distribution of the component. (Note: Each Weibull distribution should include only failures that share a common failure mechanism. Care must be taken to determine the true root cause of all failures. Failure must be related to the stresses applied during the test.)
Assume you have a sample of 40 parts from the same production run available for testing purposes. The parts are divided into five groups of eight parts as shown below:
Group 1 | 1 2 3 4 5 6 7 8 |
Group 2 | 1 2 3 4 5 6 7 8 |
Group 3 | 1 2 3 4 5 6 7 8 |
Group 4 | 1 2 3 4 5 6 7 8 |
Group 5 | 1 2 3 4 5 6 7 8 |
All parts in each group are put on test simultaneously. The test proceeds until any one part in each group fails. At that time, testing stops on all parts in that group.
In the test, we experience the following first failures in each group:
Group 1 | Part #3 fails at 120 hours |
Group 2 | Part #4 fails at 65 hours |
Group 3 | Part #1 fails at 155 hours |
Group 4 | Part #5 fails at 300 hours |
Group 5 | Part #7 fails at 200 hours |
Failure data are arranged in ascending hours to failure, and their median ranks are determined based on a sample size of N = 5. (There are five failures, one in each of five groups.) The chart in Table 7.1 illustrates the data. The median rank percentage for each failure is derived from the median rank (Table 7.2) for five samples.
Failure Order Number | Life Hours | Median Ranks, % |
---|---|---|
1 | 65 | 12.95 |
2 | 120 | 31.38 |
3 | 155 | 50.00 |
4 | 200 | 68.62 |
5 | 300 | 87.06 |
Rank Order \ Sample Size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 50.0 | 29.3 | 20.6 | 15.9 | 12.9 | 10.9 | 9.4 | 8.3 | 7.4 | 6.7 |
2 | | 70.7 | 50.0 | 38.6 | 31.4 | 26.4 | 22.8 | 20.1 | 18.0 | 16.2 |
3 | | | 79.4 | 61.4 | 50.0 | 42.1 | 36.4 | 32.1 | 28.6 | 25.9 |
4 | | | | 84.1 | 68.6 | 57.9 | 50.0 | 44.0 | 39.3 | 35.5 |
5 | | | | | 87.1 | 73.9 | 63.6 | 56.0 | 50.0 | 45.2 |
6 | | | | | | 89.1 | 77.2 | 67.9 | 60.7 | 54.8 |
7 | | | | | | | 90.6 | 79.9 | 71.4 | 64.5 |
8 | | | | | | | | 91.7 | 82.0 | 74.1 |
9 | | | | | | | | | 92.6 | 83.8 |
10 | | | | | | | | | | 93.3 |
Rank Order \ Sample Size | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 6.1 | 5.6 | 5.2 | 4.8 | 4.5 | 4.2 | 4.0 | 3.8 | 3.6 | 3.4 |
2 | 14.8 | 13.6 | 12.6 | 11.7 | 10.9 | 10.3 | 9.7 | 9.2 | 8.7 | 8.3 |
3 | 23.6 | 21.7 | 20.0 | 18.6 | 17.4 | 16.4 | 15.4 | 14.6 | 13.8 | 13.1 |
4 | 32.4 | 29.8 | 27.5 | 25.6 | 23.9 | 22.5 | 21.2 | 20.0 | 19.0 | 18.1 |
5 | 41.2 | 37.9 | 35.0 | 32.6 | 30.4 | 28.6 | 26.9 | 25.5 | 24.2 | 23.0 |
6 | 50.0 | 46.0 | 42.5 | 39.5 | 37.0 | 34.7 | 32.7 | 30.9 | 29.3 | 27.9 |
7 | 58.8 | 54.0 | 50.0 | 46.5 | 43.5 | 40.8 | 38.5 | 36.4 | 34.5 | 32.8 |
8 | 67.6 | 62.1 | 57.5 | 53.5 | 50.0 | 46.9 | 44.2 | 41.8 | 39.7 | 37.7 |
9 | 76.4 | 70.2 | 65.0 | 60.5 | 56.5 | 53.1 | 50.0 | 47.3 | 44.8 | 42.6 |
10 | 85.2 | 78.3 | 72.5 | 67.4 | 63.0 | 59.2 | 55.8 | 52.7 | 50.0 | 47.5 |
11 | 93.9 | 86.4 | 80.0 | 74.4 | 69.5 | 65.3 | 61.5 | 58.2 | 55.2 | 52.5 |
12 | | 94.4 | 87.4 | 81.4 | 76.1 | 71.4 | 67.3 | 63.6 | 60.3 | 57.4 |
13 | | | 94.8 | 88.3 | 82.6 | 77.5 | 73.1 | 69.1 | 65.5 | 62.3 |
14 | | | | 95.2 | 89.1 | 83.6 | 78.8 | 74.5 | 70.7 | 67.2 |
15 | | | | | 95.5 | 89.7 | 84.6 | 80.0 | 75.8 | 72.1 |
16 | | | | | | 95.8 | 90.3 | 85.4 | 81.0 | 77.0 |
17 | | | | | | | 96.0 | 90.8 | 86.2 | 81.9 |
18 | | | | | | | | 96.2 | 91.3 | 86.9 |
19 | | | | | | | | | 96.4 | 91.7 |
20 | | | | | | | | | | 96.6 |
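When a median rank chart is not at hand, the tabulated values can be computed directly. The sketch below is one way to do it, not necessarily how the chart was produced: it treats the median rank of the i-th failure in a sample of n as the cumulative fraction failed p at which the probability of seeing at least i failures equals 0.5, and solves for p by bisection. The function name `median_rank` is hypothetical.

```python
from math import comb


def median_rank(i: int, n: int, tol: float = 1e-10) -> float:
    """Median rank (as a fraction) of the i-th ordered failure in a sample of n.

    Solves P(at least i of n units have failed) = 0.5 for the cumulative
    fraction failed p, using bisection on the binomial tail probability.
    """
    if not 1 <= i <= n:
        raise ValueError("rank order must satisfy 1 <= i <= n")

    def tail(p: float) -> float:
        # Probability that i or more of the n units have failed.
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail(mid) < 0.5:
            lo = mid  # too few failures expected: the median rank is higher
        else:
            hi = mid
    return (lo + hi) / 2


# Median ranks in percent for the five first failures (sample size 5),
# and for the first failure in a group of eight.
ranks_n5 = [round(100 * median_rank(i, 5), 1) for i in range(1, 6)]
first_of_8 = round(100 * median_rank(1, 8), 1)
print(ranks_n5, first_of_8)
```

Rounded to one decimal place, the results can be compared against the sample-size-5 column and the first entry of the sample-size-8 column of the median rank table.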
If the life hours and median ranks of the five failures are plotted on Weibull paper, the resulting line is called the sudden-death line. The sudden-death line has the form of the cumulative distribution that would result if five assemblies failed, but it actually represents five measurements of the first failure in a group of eight drawn from the population. The median life point on the sudden-death line (the point at which 50% of the first failures occur) therefore corresponds to the median rank for the first failure in a sample of eight, which is 8.30%. The population line is drawn parallel to the sudden-death line through a point plotted at 8.30% and at the median life to first failure as determined above. This estimate of the population's minimum life is just as reliable as the one that would have been obtained if all 40 parts were tested to failure.
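The graphical construction can also be carried out numerically. The sketch below is a minimal version under stated assumptions: it uses the five failure times and median ranks from the example, replaces the best-fit line on Weibull paper with an ordinary least-squares fit of ln(-ln(1 - F)) against ln(t), and uses the closed form 1 - 0.5^(1/8) for the median rank of the first failure in eight (about 8.30%). All variable and function names are hypothetical, and the choice of regression direction is an assumption, so results may differ slightly from a line fitted by eye.

```python
from math import exp, log

# First-failure times (hours) and their median ranks from the example (n = 5).
hours = [65, 120, 155, 200, 300]
median_ranks = [0.1295, 0.3138, 0.5000, 0.6862, 0.8706]
group_size = 8  # parts per group

# Weibull paper linearizes the CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
x = [log(t) for t in hours]
y = [log(-log(1.0 - f)) for f in median_ranks]

# Least-squares fit of y on x gives the sudden-death line.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - beta * x_bar
eta_sudden_death = exp(-intercept / beta)  # characteristic life of first failures

# Life at which 50% of the first failures occur on the sudden-death line.
median_first_failure = eta_sudden_death * log(2.0) ** (1.0 / beta)

# Median rank of the first failure in a group of eight (about 8.30%).
first_rank = 1.0 - 0.5 ** (1.0 / group_size)

# Population line: same slope, passing through (median_first_failure, first_rank).
eta_population = median_first_failure / (-log(1.0 - first_rank)) ** (1.0 / beta)


def population_life(fraction_failed: float) -> float:
    """Life (hours) by which the given fraction of the population has failed."""
    return eta_population * (-log(1.0 - fraction_failed)) ** (1.0 / beta)


print(f"slope (beta): {beta:.2f}")
print(f"median life to first failure: {median_first_failure:.0f} h")
print(f"population characteristic life: {eta_population:.0f} h")
print(f"population B10 life: {population_life(0.10):.0f} h")
```

Because the population line keeps the slope of the sudden-death line, shifting it amounts to scaling the characteristic life by the group size raised to the power 1/beta.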
Accelerated testing is another approach that may be used to reduce the total test time required. Accelerated testing requires stressing the product to levels that are more severe than normal. The results that are obtained at the accelerated stress levels are compared to those at the design stress or normal operating conditions. We will look at examples of this comparison during this section.
We use accelerated testing to:
Generate failures, especially in components that have long life under normal conditions
Obtain information that relates to life under normal conditions
Determine design/technology limits of the hardware
Accelerated testing shortens the time needed to obtain failures, for example by:
Compressing cycle time by reducing or eliminating idle time in the normal operating cycle
Overstressing
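For the cycle-compression case, the time saved follows from simple arithmetic. The sketch below is illustrative only: the function name and the duty-cycle numbers are hypothetical, and it assumes that idle time contributes nothing to the failure mechanism, which does not hold for time-dependent mechanisms such as aging or corrosion.

```python
def cycle_compression_factor(
    active_time: float, idle_time: float, idle_time_on_test: float = 0.0
) -> float:
    """Ratio of normal cycle duration to compressed test cycle duration.

    Assumes only the active portion of the cycle accumulates damage, so
    removing idle time does not change the life measured in cycles.
    """
    if active_time <= 0 or idle_time < 0 or idle_time_on_test < 0:
        raise ValueError("active_time must be positive; idle times must be non-negative")
    normal_cycle = active_time + idle_time
    test_cycle = active_time + idle_time_on_test
    return normal_cycle / test_cycle


# Hypothetical duty cycle: 30 s active and 270 s idle in service,
# with the idle period cut to 30 s on the test stand.
factor = cycle_compression_factor(active_time=30.0, idle_time=270.0, idle_time_on_test=30.0)
print(factor)
```

With these hypothetical numbers, each hour on the test stand covers the same number of operating cycles as `factor` hours in service.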
There are some pitfalls in using accelerated testing:
Accelerated testing can cause failure modes that are not representative.
If there is little correlation to "real" use, such as aging, thermal cycling, and corrosion, then it will be difficult to determine how accelerated testing affects these types of failure modes.