Design Failure Mode & Effects Analysis (DFMEA)
Design Failure Mode and Effects Analysis (DFMEA) was first envisioned as a way to consider potential failure modes and their causes, and it was first applied in rocket science. Rocket development in the 1950s did not initially go well: the complexity and difficulty of the task resulted in many catastrophic failures.
Root Cause Analysis (RCA) was used to investigate these failures but often produced inconclusive results; rocket failures are frequently explosive, leaving little evidence of the root cause behind. Design FMEA gave the rocket scientists a platform to prevent failure rather than merely investigate it.
A similar platform is used today in many industries to identify risks, take countermeasures and prevent failures. DFMEA has had a profound impact, improving the safety and performance of products we use every day.
What is Design Failure Mode and Effects Analysis (DFMEA)
DFMEA is a methodical approach used for identifying potential risks introduced in a new or changed design of a product/service. The Design FMEA initially identifies design functions, failure modes and their effects on the customer with corresponding severity ranking / danger of the effect.
Then, causes and their mechanisms of the failure mode are identified. High probability causes, indicated by the occurrence ranking, may drive action to prevent or reduce the cause’s impact on the failure mode.
The detection ranking highlights the ability of specific tests to confirm the failure mode/causes are eliminated. The DFMEA also tracks improvements through Risk Priority Number (RPN) reductions. By comparing the before and after RPN, a history of improvement and risk mitigation can be chronicled.
Why Perform Design Failure Mode and Effects Analysis (DFMEA)
On new or changed designs, risk stands in for failure. It is good practice to identify risks on a program as early as possible: early risk identification provides the greatest opportunity for verified mitigation prior to program launch.
Risks are identified on designs, which if left unattended, could result in failure. The DFMEA is applied when:
- There is a new design with new content
- There is a current design with modifications, which also may include changes due to past failure
- There is a current design being used in a new environment or change in duty cycle (no physical change made to design)
How to Perform Design Failure Mode and Effects Analysis (DFMEA)
There are five primary sections of the Design FMEA. Each section has a distinct purpose and a different focus. The DFMEA is completed in sections at different times within the design timeline of the project, not all at once. The Design FMEA form is completed in the following sequence:
DFMEA Section 1
Item / Function
The Item / Function column permits the Design Engineer (DE) to describe the item that is being analyzed. The item can be a complete system, subsystem or component. The function is the “Verb – Noun” that describes what the item does. There may be many functions for any one item.
Requirement
The requirements, or measurements, of the function are described in the second column. The requirements are either provided by a document or derived through a process known as Quality Function Deployment (QFD). Each requirement must be measurable and should have test methods defined. If requirements are poorly written or nonexistent, design work may be wasted. The first opportunity for recommended action may be to investigate and clarify the requirements to prevent wasted design activity.
Failure Mode
Failure Modes are the anti-functions, or requirements not being met. There are five types of Failure Modes:
- Full Failure
- Partial Failure
- Intermittent Failure
- Degraded Failure
- Unintentional Failure
Effects of Failure
The effects of a failure on multiple customers are listed in this column. Many effects could be possible for any one failure mode. All effects should appear in the same cell or grouped next to the corresponding failure mode.
Severity
The Severity of each effect is selected based on the impact or danger to the end user / customer. The severity ranking typically ranges from 1 to 10, where:
- 1: No discernible effect
- 2-4: Annoyance or squeak and rattle; visual defects which do not affect function
- 5-6: Degradation or loss of a secondary function of the item studied
- 7-8: Degradation or loss of the primary function of the item studied
- 9-10: Regulatory and / or Safety implications
The highest severity is chosen from the many potential effects and placed in the Severity Column. Actions may be identified to change the design direction on any failure mode with an effect of failure ranked 9 or 10. If a recommended action is identified, it is placed in the Recommended Actions column of the DFMEA.
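The severity bands above can be captured in a small lookup. This is an illustrative sketch only; the function name and the rank-1 "no discernible effect" convention are assumptions, not part of any specific FMEA standard cited here.

```python
def severity_band(rank: int) -> str:
    """Map a 1-10 severity ranking to its DFMEA band description."""
    if not 1 <= rank <= 10:
        raise ValueError("severity ranking must be between 1 and 10")
    if rank >= 9:
        return "Regulatory and/or safety implications"
    if rank >= 7:
        return "Degradation or loss of primary function"
    if rank >= 5:
        return "Degradation or loss of secondary function"
    if rank >= 2:
        return "Annoyance or cosmetic defect; function unaffected"
    # A ranking of 1 (no discernible effect) is a common convention,
    # assumed here for completeness.
    return "No discernible effect"
```

A review rule can then key off this band, for example routing any ranking of 9 or 10 to the Recommended Actions column.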
Classification
Classification refers to the type of characteristics indicated by the risk. Many types of special characteristics exist in different industries.
These special characteristics typically require additional work: design error proofing, process error proofing, process variation reduction (optimized Cpk), or mistake proofing. The Classification column designates where the characteristics may be identified for Process FMEA collaboration.
DFMEA Section 2
Potential Causes / Mechanisms of Failure
Causes are defined for each Failure Mode and should be determined at the physics level. At the component level, causes can relate to material properties, geometry, dimensions, interfaces with other components and other energies which could inhibit the function.
These can be derived from pre-work documents such as Boundary (or Block) Diagrams, Parameter (P) Diagrams and Interface Analysis. Causes at the system level are cascaded as failure modes in more detailed analysis.
Geometry and dimensions are cascaded (waterfall) into special characteristics, which can be transferred to the Process FMEA. Use of words like bad, poor, defective, and failed should be avoided as they do not define the cause with enough detail to make risk calculations for mitigation.
Examples of causes are:
- Material properties (inadequate strength, lubricity, viscosity, etc.)
- Material geometry (inadequate position, flatness, parallelism, etc.)
- Tolerances or stack ups
- Interfaces with mating components
- Physical attachment / clearance
- Energy transfers (heat, vibration, peak loads, etc.)
- Material flow or exchange (gas, liquid)
- Data exchanges (signals, commands, timing, etc.)
Current Design Controls Prevention
The prevention strategy used by an engineering team when planning / completing a design has the benefit of lowering occurrence or probability. The stronger the prevention, the stronger the evidence that the potential cause can be eliminated by design.
The use of verified design standards, proven technology (with similar stresses applied), and computer-aided engineering (CAE) are typical Prevention Controls.
Occurrence
The Occurrence ranking is an estimate based on known data or, where data are lacking, on engineering judgment. Occurrence rankings follow the logic below:
- 1: Causes prevented through use of a known design standard
- 2: Identical or similar design with no history of failure
  - This ranking is often used improperly. Knowledge of the stresses in the new application and a sufficient sample of products to establish history are required before selecting this value.
- 3-4: Isolated failures
  - Some confusion may occur when trying to quantify “isolated”
- 5-6: Occasional failures have been experienced in the field or in development / verification testing
- 7-9: New design with no history (based on a current technology)
- 10: New design with no experience with the technology
Actions may be directed against causes of failure with a high occurrence. Special attention must be placed on items with a Severity of 9 or 10. These severity rankings must be examined to assure that due diligence has been satisfied.
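The due-diligence rule above can be sketched as a simple predicate. This is a hypothetical illustration: the function name and the occurrence threshold of 7 are assumptions chosen for the example, not values prescribed by the text.

```python
def needs_action_review(severity: int, occurrence: int,
                        high_occurrence: int = 7) -> bool:
    """Flag a cause for action review per the guidance above."""
    # Severity 9-10 always demands due-diligence review,
    # regardless of how unlikely the cause is.
    if severity >= 9:
        return True
    # High-occurrence causes are also action candidates; the
    # threshold of 7 is an illustrative assumption, not a rule.
    return occurrence >= high_occurrence
```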
DFMEA Section 3
Current Design Controls Detection
The activities conducted to verify design safety and performance are placed in the Current Design Controls Detection column. The tests and evaluations intended to prove the design is capable are aligned to the causes and failure modes identified with the highest risks.
Specific tests must be identified when risks are in the highest severity range (9-10) or the high criticality, non-safety combinations. Examples of Design Controls Detection are:
- Design Reviews
- Verification Test Methods
- Bogey Test to 1 Life
- Test to Failure
- Degradation Testing
Detection Rankings
Detection Rankings are assigned to each test based on the type of test / evaluation technique with respect to the time it is performed. It is ideal to perform tests (on high risk items) as early in the design process as is possible.
Testing after tools are completed is called Product Validation (PV) and is used to supplement Design Verification (DV) tests. PV tests may be used to save test time and resources on low risk items.
There is often more than one test/evaluation technique per Cause-Failure Mode combination. Listing all in one cell and applying a detection ranking for each is the best practice. The lowest of the detection rankings is then placed in the detection column.
Typical Detection Rankings can be found below:
- 1: Failure prevented through Design Solution, Design Standard, Standard Materials, etc.
- 2: Use of Computer Aided Engineering (CAE) highly correlated to real world user/stress profiles
- 3: Test to Failure with measurement of output tracking degradation (performed before Design Freeze (DV))
- 4: Test to Failure (DV)
- 5: Bogey Test, test to pass to 1 life and suspend the test (DV)
- 6: Test to Failure with measurement of output tracking degradation (performed after Design Freeze (PV))
- 7: Test to Failure (PV)
- 8: Bogey Test, test to pass to 1 life and suspend the test (PV)
- 9: Use of CAE, but not yet correlated to real world stress profiles
- 10: Cannot evaluate, no test available or current tests do not excite the cause / failure mode
Actions may be necessary to improve testing capability. The test improvement will address the weakness in the test strategy. The actions are placed in the Recommended Actions Column.
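The rule that the lowest detection ranking in a cell is the one entered in the Detection column can be sketched as follows; the function name and the fallback value are illustrative assumptions.

```python
def cell_detection_ranking(test_rankings: list[int]) -> int:
    """Combine the detection rankings of all tests listed in one
    cause-failure mode cell: the lowest (best) value is the one
    entered in the Detection column."""
    if not test_rankings:
        # No test available corresponds to the worst ranking (10).
        return 10
    return min(test_rankings)
```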
DFMEA Section 4
Risk Priority Number (RPN)
The Risk Priority Number (RPN) is the product of the three previously selected rankings: Severity * Occurrence * Detection. RPN thresholds must not be used to determine the need for action, mainly for two reasons:
- Design engineers may game the rankings simply to get below the specified threshold
  - This behavior does not improve or address risk. There is no RPN value above which an action must be taken or below which a team is excused from one.
- “Relative risk” is not always represented by RPN
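The RPN product itself is straightforward to compute. A minimal sketch, with the range check added as an assumption since all three rankings run from 1 to 10:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: Severity * Occurrence * Detection
    (1 to 1000). Per the text above, RPN must not be compared
    against a threshold to decide whether to act."""
    for ranking in (severity, occurrence, detection):
        if not 1 <= ranking <= 10:
            raise ValueError("each ranking must be between 1 and 10")
    return severity * occurrence * detection
```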
Recommended Actions
The Recommended Actions column is the location within the Design FMEA where all potential improvements are recorded. Completed actions are the purpose of the DFMEA. Each action must be detailed enough to make sense if it stood alone in a risk register or actions list.
Actions are directed against one of the rankings previously assigned. The objectives are as follows:
- Eliminate Failure Modes with a Severity 9 or 10
- Lower Occurrence on Causes by error proofing, reducing variation or mistake proofing
- Lower Detection on specific test improvements
Responsibility and Target Completion Date
Enter the name of the responsible person and the date by which the action should be completed. A milestone name can substitute for a date if a timeline shows the linkage between the date and the selected milestone.
DFMEA Section 5
Actions Taken and Completion Date
List the Actions Taken, or reference the test report which documents the results. The Design FMEA should result in actions which bring higher-risk items to an acceptable level of risk. It is important to note that acceptable risk is the goal: mitigating high risk down to lower risk is the primary objective.
Re-Rank RPN
The new (re-ranked) RPN should be compared with the original RPN. A reduction in this value is desirable. The residual risk may still be too high after actions have been taken. If this is the case, a new action line would be developed. This is repeated until an acceptable residual risk has been obtained.
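The before/after comparison can be chronicled per DFMEA line with a small record. This is a hypothetical sketch; the class name, field names, and the example rankings are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class RpnHistory:
    """Before/after RPN for one DFMEA line, chronicling mitigation."""
    original: int   # Severity * Occurrence * Detection before actions
    reranked: int   # the same product after actions were taken

    def reduction(self) -> int:
        # A positive reduction documents successful risk mitigation;
        # if residual risk is still too high, a new action line is
        # added and the cycle repeats.
        return self.original - self.reranked

# Hypothetical line item: actions improved occurrence and detection
# (severity stays at 8, since the effect itself is unchanged).
line = RpnHistory(original=8 * 6 * 7, reranked=8 * 3 * 4)
```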