This guide targets failure modes arising from various hazards, which contribute to many incidents. It aims to prevent many of these failure modes by providing guidelines for defending against the hazards.
Studies of the sources of critical accidents in the operation of human-made systems indicate that most are commonly attributed to errors made by human operators.
Common studies on resilience engineering focus on the analysis of incidents, stressing management's role and responsibility in preventing misfortune through safety culture (e.g., Hollnagel et al., 2006). In contrast, this guide focuses on aspects of system design. Specifically, it is about preventing design mistakes.
The typical attitude toward incidents is emotion driven: instead of considering how to prevent the next incident, people look for someone to blame. Following an incident or an accident, the stakeholders typically focus on accountability issues, looking for "bad apples" (Dekker, 2007) instead of on improving safety.
This guide assumes that it is the responsibility of the system engineers to prevent mishaps, and it is about achieving this goal. It is the developer's responsibility to reduce the likelihood and mitigate the costs of all hazards, and to prevent predictable incidents, including those typically regarded as operator errors. Risky situations should be expected, and the design should include means to mitigate these risks.
Analysis of operational failures is possible when it is based on models of incident generation ... . These models describe typical ways in which hazards are generated and develop. The concepts underlying the methodology presented here are illustrated in the extended Swiss cheese model ... . Key models used here are
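The intuition behind the Swiss cheese model can be illustrated with a small numeric sketch (a hypothetical example, not taken from the guide): each defense layer blocks a hazard except when a "hole" in it fails, and an accident occurs only when holes in all layers line up. Assuming independent layers, the accident probability is the product of the per-layer failure probabilities.

```python
# Hypothetical illustration of the Swiss cheese model: an accident
# occurs only when every defense layer fails at the same time.

def accident_probability(layer_failure_probs):
    """Probability that independent holes in all layers line up."""
    p = 1.0
    for fail_prob in layer_failure_probs:
        p *= fail_prob
    return p

# Three independent defense layers, each with a 10% chance of failing:
p = accident_probability([0.1, 0.1, 0.1])
print(round(p, 6))  # adding layers multiplies down the accident risk
```

The independence assumption is the model's well-known weakness: a common-cause hazard (e.g., a shared design mistake) can open aligned holes in several layers at once, which is why the guide stresses defending the defenses themselves.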
More than 2000 years ago, the Roman philosopher Cicero (Wiki ...) already observed that "to err is human". The 18th-century English writer Alexander Pope (Wiki ...) added "to forgive, divine", suggesting that the term "error" is accountability biased. This term is used extensively in investigations to justify diverting the discussion from costly investment in resilience assurance to cheap and convenient personnel changes.
Statistics about the sources of accidents indicate that most are commonly attributed to the human factor (statistics). This guide focuses on preventing errors typically attributed to the human operators, and on the sources of such errors: psychomotor limitations result in slips, and information-related problems result in confusion.
Analysis of these incidents indicates that most of them involve the operator's difficulty in handling exceptional situations; operator errors are typically due to such difficulties.
Models of system resilience enable the definition and design of defense layers, including methods for hazard prevention and error protection, applied both proactively and reactively.
The resilience models include representations of the user's and operator's behavior, such as the ways they perceive and understand the operational procedures and the system's behavior. Resilience-oriented design enables the prevention of predictable interaction flaws, typically attributed to operator errors.
Any defense added to the system introduces new hazards, called threats. The challenge is to defend the system against these new hazards, to evaluate the costs of the various defense options, and to weigh the threats that each option introduces.
Rules about system operation in normal and exceptional conditions, integrated in a behavior knowledge base, enable automated detection of hazards and unexpected events.
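A minimal sketch of such rule-based detection (all names here are hypothetical; the guide does not prescribe an implementation): each rule in the knowledge base encodes an expectation about normal operation, and an event that violates a rule is flagged as a potential hazard for alarming.

```python
# Hypothetical sketch of rule-based hazard detection: rules encode
# expected normal behavior; events violating a rule are flagged.

def make_range_rule(signal, low, high):
    """Rule: the named signal must stay within [low, high]."""
    def check(event):
        value = event.get(signal)
        return value is None or low <= value <= high
    return check

# A tiny behavior knowledge base for normal operating conditions:
rules = {
    "temperature in range": make_range_rule("temperature", 10, 90),
    "pressure in range": make_range_rule("pressure", 1.0, 5.0),
}

def detect_hazards(event, rules):
    """Return the names of all rules the event violates."""
    return [name for name, check in rules.items() if not check(event)]

print(detect_hazards({"temperature": 95, "pressure": 3.0}, rules))
# -> ['temperature in range']
```

In a real system the rule set would also cover exceptional operating modes, so that behavior which is normal in one mode can be flagged as unexpected in another.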
A dedicated architecture must be designed to support these features: consistency assurance, hazard detection, and alarming.
System resilience develops in cycles, starting with proactive assurance and followed by reactive assurance. Proactive assurance is about designing the protection layers; reactive assurance is about learning from incidents. Small cycles allow fine-tuning of the operational rules at the development site (as part of alpha testing), and large cycles are used to learn from real incidents at the customer site.
Updated on 05 Apr 2017.