Recent studies attribute system failure to difficulties in handling exceptional situations. Resilient systems are those that recover from faults and exceptional situations without significant cost. This guide therefore focuses on ways to avoid exceptional situations, to identify them, and to recover when they occur.
This guide proposes guidelines for coping with failures from various sources (the operator, hardware, software, context ...).
System failure is sometimes attributed to the people who designed the system. However, this is not the case when the operation is error-prone because the limitations of the human operators were disregarded. This guide focuses on preventing situations in which operators are liable to err, and on helping operators recover from risky situations.
Updated on 02 Apr 2016.