Design of high-reliability hardware

For high-reliability design, we strongly advise following two concurrent approaches. A qualitative approach reduces systematic failures caused by design errors, misinterpreted requirements, and general human error. Concurrently, a quantitative approach assesses compliance with reliability specifications in order to reduce the occurrence of random failures. The qualitative approach takes the form of a simple quality assurance plan, in other words a development cycle plan.

Development Plan

For high reliability development, we recommend following this generic development cycle, which should be adapted to meet your specific requirements. The cycle below guides you through the major steps that we believe should be present for high reliability PCB hardware designs.

The steps below focus on electronics board design. If your design includes gateware and software, you should adapt the cycle to your specifics. Also note that some of the reports mentioned in this procedure can be combined into a single document.

  1. Kick-Off Meeting. This is the official start of the project. A kick-off review should be organised to ensure that all requirements and all necessary information are available. At this stage, the team shall describe all project management aspects, including:

    • Risk management,
    • Quality management,
    • Obsolescence management,
    • Operation and maintenance management,
    • Configuration management, if needed.

    It is also recommended to maintain a Requirement Verification Matrix (RVM), which is used to:

    • Identify the intended compliance of each requirement with the specification,
    • Identify how each requirement will be demonstrated,
    • Identify the risk level of each requirement.
  2. Preliminary Design Review (PDR). A PDR takes place to validate the initial design and the configuration of the prototypes. The preliminary design should be documented with the following deliverables:

    • Functional Specification Document
    • Validation Plan or an Acceptance Test Procedure
    • Preliminary design files such as:
      • Preliminary Bill of Materials,
      • Additional initial studies such as EMC, thermal analysis, safety and testability.
  3. Conceptual Design Review (CDR). After a successful PDR, the next phase shall start based on the solutions shown and agreed at the PDR. For the Conceptual Design Review, the design team should release:

    • Specification files including:
      • Function Analysis,
      • Architecture,
      • Technical Specification,
      • Draft schematics of your board,
      • Interface Communication Document (ICD).
    • Preliminary design files:
      • RAMS analysis (see section below)
      • EMI/EMC studies
      • Thermal studies
      • Operation and maintenance documentation
    • Preliminary implementation file:
      • Bill of Materials
    • Updates to the:
      • Requirement Verification Matrix
      • Risk analysis
  4. Final Design Review (FDR). At the final design review, the development team shall present all results of its design, including mechanical, electrical, electronic and RAMS aspects. The risk of each requirement shall be assessed. If required, testing activities shall be conducted to provide enough confidence in the proposed design. The design and its configuration are then frozen, and the designers shall not undertake any modification without consensus. At this stage, we recommend having the following:

    • Final design investigation (including RAMS and Operation and Maintenance documentation),
    • Design files including:
      • Drawings, final schematics, 3D models, Interface Communication Document (ICD),
      • Design Justification Document
      • Bill of Materials / Design Tree / Configuration,
    • Test documentation including:
      • Acceptance and Qualification Test Procedures.
    • Update documentation such as:
      • Requirement Verification Matrix
      • Risk analysis and reduction plan
  5. Test Readiness Review. The test readiness review takes place to ensure that the produced prototypes are compliant with the requirements and ready to undergo qualification at system level. The prototypes used for testing shall be fully compliant with the configuration validated at CDR and FDR. At this stage, we recommend having the following:

    • Complete definition file,
    • Acceptance Test Reports.

    If a test fails, a comprehensive report detailing the issue must be prepared. A failure analysis should be carried out, and a Failure Analysis Report must be issued. Should a design modification be necessary, it must be thoroughly documented, with all associated impacts clearly outlined. Additionally, any rework must be reported prior to its commencement.

  6. Product Qualification Review (PQR). Once all qualification tests have been performed and reported, a PQR shall be held to validate the qualification and the compliance of the design with the initial requirements. After a successful PQR, the complete design is qualified, and the product definition and all technical files are frozen. If the product failed during qualification testing, the technical files must have been updated before the review. No change can be accepted after the Product Qualification Review.
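
The Requirement Verification Matrix introduced at the kick-off and updated at the later reviews can be kept as a simple structured table. The sketch below is only one possible layout: the field names, the identifier scheme, the verification method categories and the three example requirements are all hypothetical and should be adapted to your project.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    """How a requirement will be demonstrated (illustrative categories)."""
    TEST = "test"
    ANALYSIS = "analysis"
    INSPECTION = "inspection"


@dataclass
class Requirement:
    req_id: str             # hypothetical identifier scheme
    text: str               # requirement statement
    compliant: bool         # intended compliance with the specification
    method: Method          # how compliance will be demonstrated
    risk: str               # "low", "medium" or "high"
    verified: bool = False  # set to True once the evidence exists


def open_items(rvm):
    """Return the requirements not yet verified, highest risk first."""
    order = {"high": 0, "medium": 1, "low": 2}
    pending = [r for r in rvm if not r.verified]
    return sorted(pending, key=lambda r: order[r.risk])


# Example entries (invented for illustration only).
rvm = [
    Requirement("REQ-001", "Operate from 0 to 50 degC", True, Method.TEST, "medium"),
    Requirement("REQ-002", "Board MTBF above target", True, Method.ANALYSIS, "high"),
    Requirement("REQ-003", "Connector type per ICD", True, Method.INSPECTION, "low",
                verified=True),
]

for r in open_items(rvm):
    print(r.req_id, r.method.value, r.risk)
```

Sorting the open items by risk gives each design review a ready-made agenda: the riskiest unverified requirements are discussed first.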

Reliability, Availability, Maintainability and Safety (RAMS) Requirements

The following tasks describe the Reliability, Availability, Maintainability and Safety activities to be conducted concurrently with the project, mainly during the design and development phases. This is the quantitative approach, which ensures a rigorous dependability study throughout the development process.

Step 1: Understanding your Reliability Requirements

Your hazard and risk analysis determines the essential reliability and safety measures required to mitigate risks associated with electronic systems.

Step 2: Probability of Dangerous Failure per Hour (PFH) Calculation

Calculate the PFH, which expresses the average frequency of dangerous failures per hour of operation. This step is crucial for assessing the reliability of a system in continuous use.
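
As a minimal sketch of a PFH calculation, the example below assumes the simplest case: a single channel with no redundancy and constant failure rates. Under that assumption a common simplification (used, for example, in IEC 61508-6 for a single-channel architecture) takes the PFH to be approximately the dangerous undetected failure rate, that is, the dangerous failure rate multiplied by one minus the diagnostic coverage. The function name and the numerical values are hypothetical; redundant architectures need more elaborate formulas.

```python
FIT = 1e-9  # 1 FIT = one failure per 1e9 device-hours


def pfh_single_channel(lambda_d_fit, diagnostic_coverage):
    """Approximate PFH (per hour) for one channel without redundancy.

    lambda_d_fit: total dangerous failure rate of the channel, in FIT
    diagnostic_coverage: fraction of dangerous failures detected (0 to 1)
    """
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic coverage must be between 0 and 1")
    # Only dangerous failures that the diagnostics miss contribute.
    lambda_du = lambda_d_fit * FIT * (1.0 - diagnostic_coverage)
    return lambda_du


# Hypothetical channel: 500 FIT dangerous failure rate, 90 % coverage.
pfh = pfh_single_channel(lambda_d_fit=500.0, diagnostic_coverage=0.9)
print(f"PFH = {pfh:.2e} per hour")
```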

Step 3: Evaluating Architectural Constraints

Next, evaluate the architectural constraints by:

  • Calculating the Safe Failure Fraction (SFF): This calculation gives the proportion of failures that are either safe or dangerous but detected by diagnostics, relative to the total number of failures.
  • Considering Hardware Fault Tolerance (HFT): Assess the system's ability to continue functioning correctly in the event of hardware faults.
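
The two quantities above can be sketched in a few lines. This example takes the SFF as the share of the total failure rate that is either safe or dangerous but detected, and it assumes an M-out-of-N voting architecture, for which the hardware fault tolerance is N minus M. The function names and the failure rates are hypothetical.

```python
def safe_failure_fraction(lambda_s, lambda_dd, lambda_du):
    """SFF from failure rates expressed in the same unit (for example FIT).

    lambda_s:  safe failures
    lambda_dd: dangerous failures detected by diagnostics
    lambda_du: dangerous failures not detected
    """
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


def hardware_fault_tolerance(m, n):
    """HFT of an M-out-of-N architecture: faults tolerated before loss of function."""
    if not 1 <= m <= n:
        raise ValueError("expected 1 <= m <= n")
    return n - m


# Hypothetical rates in FIT: 400 safe, 450 dangerous detected, 50 undetected.
sff = safe_failure_fraction(400.0, 450.0, 50.0)
print(f"SFF = {sff:.1%}")
print("HFT of 1oo2:", hardware_fault_tolerance(1, 2))
print("HFT of 2oo3:", hardware_fault_tolerance(2, 3))
```

A single channel (1oo1) has an HFT of 0, so any single dangerous fault can defeat the function; redundancy raises the HFT at the cost of more hardware.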

Step 4: Failure Rate Prediction

Use FIDES or another reliability prediction standard to predict the failure rate. This methodology provides an evidence-based forecast of potential failures, aiding in understanding the system's reliability.
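
Once each component has a predicted failure rate, the board-level figure is often obtained by a simple roll-up. The sketch below does not implement FIDES or any other prediction standard: it assumes the per-component rates are already available, that any component failure fails the board (a series model), and that failure rates are constant, so that the MTBF is the inverse of the total rate. The parts and the FIT values are invented for illustration.

```python
FIT_PER_HOUR = 1e-9  # 1 FIT = 1e-9 failures per hour

# Hypothetical bill of materials: part -> (quantity, failure rate of one part in FIT)
bom_fit = {
    "ceramic capacitor": (40, 0.5),
    "resistor": (60, 0.2),
    "voltage regulator": (2, 15.0),
    "microcontroller": (1, 50.0),
}


def board_failure_rate_fit(bom):
    """Sum the component failure rates (series model, rates in FIT)."""
    return sum(qty * fit for qty, fit in bom.values())


total_fit = board_failure_rate_fit(bom_fit)
mtbf_hours = 1.0 / (total_fit * FIT_PER_HOUR)

print(f"Board failure rate: {total_fit:.1f} FIT")
print(f"MTBF: {mtbf_hours:.3e} hours")
```

Listing the contribution of each line of the bill of materials also shows which parts dominate the total and are therefore worth derating or replacing.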

Step 5: Conducting FMEDA and FTA

  • Failure Modes, Effects and Diagnostic Analysis (FMEDA): Perform FMEDA to identify potential failure modes and their impact on the operation of your system.
  • Fault Tree Analysis (FTA): Use FTA to map out the pathways that could lead to system failures, providing a graphical representation of the combinations of component failures that could result in system-level failures.
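
To illustrate how a fault tree combines component failures into a system-level failure, the sketch below evaluates a very small tree with AND and OR gates. It assumes the basic events are statistically independent, which real designs must justify (common-cause failures break this assumption). The tree structure, the event names and the probabilities are hypothetical.

```python
from math import prod


def and_gate(*probs):
    """Output event occurs only if all input events occur (independent inputs)."""
    return prod(probs)


def or_gate(*probs):
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities over the mission time.
p_power_supply = 1e-4
p_channel_a = 1e-3
p_channel_b = 1e-3

# Top event: the power supply fails, OR both redundant channels fail.
p_both_channels = and_gate(p_channel_a, p_channel_b)
p_top = or_gate(p_power_supply, p_both_channels)

print(f"Both channels fail: {p_both_channels:.2e}")
print(f"Top event:          {p_top:.4e}")
```

In this invented example the non-redundant power supply dominates the top event, which is the kind of insight the graphical tree is meant to expose.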

Step 6: Prototype Testing and Validation

Test the final qualification prototype to validate that the hardware meets reliability requirements. This step verifies that the safety measures and reliability calculations accurately reflect the operational reality of the system.

Step 7: Maintenance Plan

Develop a comprehensive maintenance plan that ensures the reliability and availability of the system are maintained throughout its operational life. This includes scheduling regular maintenance activities, defining preventive and corrective actions, and establishing procedures for monitoring the system's performance over time.
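
One way to relate a maintenance plan to availability is the usual steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below applies it under the assumption of constant failure and repair rates and ignores planned downtime; the MTBF and repair time are hypothetical values.

```python
HOURS_PER_YEAR = 8760.0


def steady_state_availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is operational in the long run."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("expected mtbf_hours > 0 and mttr_hours >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: 50 000 h between failures, 8 h to repair.
availability = steady_state_availability(50_000.0, 8.0)
downtime_hours = (1.0 - availability) * HOURS_PER_YEAR

print(f"Availability: {availability:.5%}")
print(f"Expected unplanned downtime: {downtime_hours:.2f} hours per year")
```

Because the repair time appears directly in the formula, spares provisioning and clear corrective procedures improve availability even when the failure rate itself cannot be reduced.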

If you would like to see more detail and an example of use of this methodology, you can read this article about its use for the radiation monitoring safety system at CERN.