[QUOTE=Cosmic Relief]
I’m a little confused about the correct use of the term “single point of failure”.
My understanding of SPOF is a system where a maximum of one component can cause a catastrophic failure. This is a good thing, because it minimizes the probability of a catastrophic failure. SPOF is a characteristic of a system, not a component.
In the computer business, people universally use SPOF as a term to describe any component that can bring the entire system down. To them, a SPOF is a critical component that is not redundant. This definition is apparently so entrenched that I was unable to google anything other than this definition.
I think the computer people have it wrong. And ordinarily I couldn’t give a flip about terminology wars, except I believe SPOF has a specific and unambiguous meaning that is being lost through misuse.
What say you?
[/QUOTE]
I’m a mechanical engineer working in aerospace (rocket boosters and support systems); our definition of single point of failure, as used in reliability work and FMECA (Failure Modes, Effects, and Criticality Analysis), is definitely the second. SPOFs are to be avoided whenever possible: even if the incidence (or occurrence) of the failure is low, the severity is high, because it means an instant failure of a subsystem and potentially of the entire system in some mode. (Occurrence and severity are two of the three general categories used to rate risk level; the third is detectability or inspection, i.e. your ability to detect or identify a failure before it becomes critical to operation.) SPOFs are always high risk, and can only be mitigated by a combination of high design margins, proof and acceptance testing, and inspection. Although you would like to get rid of every single point of failure in a system, especially one like an aircraft or rocket booster where system failure means mission failure, loss of payload, and potential hazard, it simply isn’t possible to design a complex system without some kind of SPOF, as Balthizar notes.
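As a rough illustration of how the three ratings (severity, occurrence, detectability) can combine into a single risk figure, here is a minimal sketch of the risk priority number used in many FMEA worksheets. The 1 to 10 scales, the function name, and the example ratings are all assumptions made for illustration; this is one common convention and not necessarily the scoring scheme used on any particular program.

```python
def risk_priority_number(severity, occurrence, detection):
    """Multiply three 1-10 ratings into a single risk figure (higher is worse).

    Assumed convention: detection is rated so that 10 means the failure
    is nearly impossible to catch before it becomes critical.
    """
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Hypothetical ratings: same occurrence and detectability, different severity.
spof_item = risk_priority_number(severity=10, occurrence=2, detection=7)
redundant_item = risk_priority_number(severity=4, occurrence=2, detection=7)
print(spof_item, redundant_item)
```

With these made-up ratings, the single-point item scores well above the redundant one even though it is no more likely to occur, which is the sense in which a low incidence does not rescue a high-severity failure.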
I have to admit that I don’t really understand the o.p.’s first definition: “…a maximum of one component can cause a catastrophic failure. This is a good thing, because it minimizes the probability of a catastrophic failure…” I think he means either a system in which only one component can fail (???), or one in which the failure of one component does not impact the operation of the others, either because all components are redundant or because the function of an individual component is not critical to the rest of the system. In reliability engineering terms this would be called a “fully redundant system” (or dual redundant, triple redundant, et cetera). In rocket boosters, flight termination systems (ordnance designed to stop the motor from thrusting and/or break it into pieces that pose a minimal hazard to people and structures) are always at least dual redundant; that is, they have two separate antennas, triggering systems, ordnance lines, et cetera, so that the failure of any component in one system doesn’t impact the reliability of the other.
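To put rough numbers on why dual redundancy helps, here is a minimal sketch that assumes every component fails independently. The function names and the 1% per-component failure probability are made up for illustration and are not real hardware figures. Within one channel, every component is a single point of failure for that channel; the two channels together fail only if both do.

```python
def series_failure(probabilities):
    """Failure probability of a chain that needs every component to work."""
    all_work = 1.0
    for p in probabilities:
        all_work *= 1.0 - p
    return 1.0 - all_work


def parallel_failure(probabilities):
    """Failure probability of redundant channels, where all must fail."""
    all_fail = 1.0
    for p in probabilities:
        all_fail *= p
    return all_fail


# Hypothetical channel: antenna, triggering system, ordnance line at 1% each.
single_channel = series_failure([0.01, 0.01, 0.01])
dual_redundant = parallel_failure([single_channel, single_channel])
print(f"single channel: {single_channel:.6f}")
print(f"dual redundant: {dual_redundant:.6f}")
```

Under these assumptions a single channel fails about 3% of the time, while the dual redundant pair fails a little under 0.1% of the time. The independence assumption carries the whole result: anything the two channels share (a common power source, say) would be a single point of failure that this arithmetic does not capture.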
Stranger