NASA, or: How To Continue Beating Your Head Against A Brick Wall

From this article:
[Former Discovery commander Norm Thagard] said it’s unlikely that Discovery would be seriously damaged, because it took more than 100 shuttle flights before damage from debris led to a disaster such as the Columbia break-up.

::sigh::

Now, this guy is an astronaut, not a failure and reliability engineer; perhaps his statement does not reflect the reasoning processes of NASA Shuttle management, et cetera…

But this is exactly the sort of justification that was used to pooh-pooh concerns over exhaust gas blowthrough and O-ring erosion that led to the destruction of Challenger. This is the same reasoning that led engineers to conclude that pieces of insulation falling from the External Tank and punching holes in the leading edge wing panels were not worrisome. Roasted Irish babies on a popsicle stick, the fact that the system hasn’t catastrophically failed from this damage to date does not ensure that it will not fail; it doesn’t decrease the probability of failure on any individual flight; on its lonesome, it tells us absolutely nothing except that we’ve been lucky so far.

Certainly, tiles have been falling off the Orbiter since April 12th, 1981; it’s been accepted (for better or for worse) that this is going to happen. So far, it hasn’t happened to such an extent or in such a way as to result in catastrophe. All that data does, though, is give us a knowledge of what hasn’t happened yet. Furthermore, there is no metric by which to graduate between “success” and “failure”, i.e. we don’t know where “acceptable” damage ends and “critical” damage begins; in short, the data gives us no function by which to estimate the probability of failure on an individual flight. And if it is a potentially critical issue, the fact that it hasn’t failed in the last 113 flights just means that you haven’t hit the extent of the failure envelope yet. Averaged over time, it just means that you are getting closer to a failure event.

You fuckers need to learn about Gambler’s Fallacy. Take a basic probability and statistics course.

Stranger

Thagard’s comments by themselves do sound like the “gambler’s fallacy”, but the point here is that the type of damage observed on Discovery from this flight is consistent with the type of damage that has been seen on shuttles that have successfully returned from space. The critical areas such as the RCC panels on the leading edge of the wing and the nose are OK, and damage to those areas is what caused the catastrophic failure of Columbia.

I’ve come to the sad conclusion that a large contingent of our astronaut corps basically wants to fly, and to hell with anything else, including sound economics or even self-preservation. To call the STS an unsafe boondoggle would be charitable. It never was safe, it never will be safe, the design is so basically flawed, and so hideously complex, that it’s a testbed every single flight. Forget statistics. This is more about something akin to Rumsfeld’s famous list of “unknown unknowns”. Throw in the inviolate Murphy’s Law, and it’s really just a (short) matter of time before another seven people get blown up. The Shuttle program is one built upon keeping, at best, one step ahead of a host of failures, any one of them potentially lethal; and major steps are only taken when catastrophic failure inevitably occurs. Any of the thousands of components that made their debut on the Shuttle might completely break tomorrow for all anybody knows, and it’s been nothing short of a miracle that more people haven’t been killed up to now. The “Can Do” attitude of our manned space program is nothing but PR and the ability of NASA managers to keep their heads firmly buried in the sand whilst engineers bite their nails to bloody stumps. Do any of these ineffectual bureaucratic half-wits pay the deserved price when another seven people and another billion dollars go up in flames? Fuck no. So why change?

But we need to keep flying the Shuttle because we need to service the ISS, which was designed for the Shuttle, which desperately needed something to do when it proved to be a hideously profligate means of attaining 1960’s-level launch capacities into LEO, which simply can’t go on looking so completely and expensively useless because it might, you know, damage our National Prestige.

Coming soon to an Applebee’s near you.

There is an important distinction here between the actual probability of failure and the estimated probability of failure. With all due respect to Dr. Feynman, I don’t think he appreciated this point. The actual probability of failure can only be determined once all the shuttles are flown. In the here and now, with data and models of varying reliability, one can only estimate the probability of failure.

Gambling typically involves random outcomes of some process that is finite and known, e.g. flipping a coin. In that case, it is known with very high confidence that there are only two possible outcomes and that each is equally likely provided certain basic standards that can be stated a priori are met in the flipping process. Said differently, the data and models are impeccable and only a very, very unlikely observation would call them into question. In such a case, there is essentially no knowledge gained by observing a few flips of a coin. If a coin comes up heads three times in a row, the probability of a fourth head is still 50%.
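For anyone who wants to check the coin claim rather than take it on faith, here is a minimal sketch that enumerates every equally likely sequence of flips. It assumes a fair coin with independent flips, which is exactly the "impeccable model" case described above; the function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_next_head_given_streak(streak_len):
    """P(next flip is heads | first `streak_len` flips were all heads).

    Assumes a fair coin and independent flips, so every sequence of
    streak_len + 1 flips is equally likely.
    """
    sequences = list(product("HT", repeat=streak_len + 1))
    # Keep only the sequences that start with the observed streak of heads.
    with_streak = [s for s in sequences if all(f == "H" for f in s[:streak_len])]
    # Of those, count the ones whose final flip is also heads.
    next_is_head = [s for s in with_streak if s[-1] == "H"]
    return Fraction(len(next_is_head), len(with_streak))


print(prob_next_head_given_streak(3))  # 1/2
```

The streak length makes no difference to the answer, which is the whole point: when the model is known to be right, past outcomes carry no information about the next one.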

The kinds of risk assessments they do for the Shuttle involve much less reliable data and models, and all they can do is generate estimates that are in some sense conservative. Every Shuttle flight provides more information on the reliability of the data and models, and at a fixed confidence level, this can be taken into account to affect the estimated probability of failure.

To use the gambling analogy, suppose you flipped a coin and it came up heads 100 times in a row. 100 heads in a row is so unlikely that you would be justified in suspecting that the coin is loaded, or has a magnet in it, or something, and upon examining the coin you would probably find grounds for revising your estimate of the probability of another head upward from 50%. This is the kind of thing that happens in risk assessments. Now, whether 100 successful Shuttle flights is enough to change the reliability estimate for a failure mode that ought to be very unlikely is another question. But that is just a quantitative difference. The basic concept, that new information can help you improve your estimated reliability, is sound.
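The loaded-coin argument can be sketched as a simple two-hypothesis Bayesian update. The numbers here are assumptions picked purely for illustration: a one-in-a-million prior that the coin is loaded, and a loaded coin that lands heads 99% of the time. Nothing about them comes from any real risk assessment.

```python
def posterior_loaded(n_heads, prior_loaded=1e-6, p_heads_if_loaded=0.99):
    """Posterior probability that the coin is loaded after n_heads heads in a row.

    prior_loaded and p_heads_if_loaded are hypothetical illustration values.
    """
    likelihood_fair = 0.5 ** n_heads
    likelihood_loaded = p_heads_if_loaded ** n_heads
    numerator = prior_loaded * likelihood_loaded
    return numerator / (numerator + (1.0 - prior_loaded) * likelihood_fair)


def prob_next_head(n_heads, prior_loaded=1e-6, p_heads_if_loaded=0.99):
    """Estimated chance of another head, averaged over both hypotheses."""
    post = posterior_loaded(n_heads, prior_loaded, p_heads_if_loaded)
    return post * p_heads_if_loaded + (1.0 - post) * 0.5


for n in (3, 20, 100):
    print(n, prob_next_head(n))
```

After three heads the estimate barely moves from 50%; after a hundred it sits near the loaded coin's rate. The catch for the Shuttle is that the answer depends heavily on the prior and on the alternative hypotheses you bother to write down, and those are precisely the things nobody knows well.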

The real problem is that the more complicated the system is, the more opportunities you have to fool yourself by using biased data or models. Another source of bias is for the positive outcomes to be unconsciously weighted more heavily than the negative ones. If you do that, you can fly forever and your estimated reliability is always going to go up, up, up, until that one bad day. It’s an open question whether meaningful risk assessments can ever be made for something as complex as the Shuttle.

What Hyperelastic said. Most definitely *not* an example of the Gambler’s Fallacy. In fact, given the complexity of the system, the *only* way to get a decent estimate of whether it will fail is whether the system *has* failed. It’s the same reasoning that drives medical experiments. Because the biological system of a human is so complex, you can’t possibly know what a drug’s going to do to people until you give it to some people. As the number of people you’ve given it to goes up your confidence in its safety goes up.
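To put a rough number on "confidence goes up as trials go up": if the true per-trial failure probability is p, the chance of n clean trials in a row is (1 - p)^n, so the largest p still consistent with a clean record at a given confidence level is 1 - (1 - confidence)^(1/n). A minimal sketch, assuming independent and identical trials (a big assumption for drug patients, and a bigger one for Shuttle flights):

```python
def upper_bound_failure_prob(n_successes, confidence=0.95):
    """Largest per-trial failure probability consistent with n_successes
    failure-free trials at the given confidence level.

    Assumes every trial is independent with the same failure probability.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_successes)


for n in (10, 100, 1000):
    print(n, round(upper_bound_failure_prob(n), 4))
```

The bound shrinks roughly like 3/n at 95% confidence, so a hundred clean trials only rules out failure rates above about 3%. It improves with more data, but slowly.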

I will certainly buy that with 2 out of 113 missions ending catastrophically the shuttle appears to be pretty unsafe. I don’t see a clear solution to the problem though. Are we sure that this is because of poor design and not the fact that manned space flight is inherently unsafe? Maybe a 2% failure rate is as good as we can expect with our current technology. Like any new technology it seems to me there will be failures as the design is improved. And failures on these missions are always going to be catastrophic. Are we prepared to scrap manned spaceflight altogether?
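For a sense of how firm that 2% figure is, here is a sketch of a Wilson score interval on 2 failures in 113 flights. It treats every flight as an independent trial with the same failure probability, which is an assumption and probably a generous one; z = 1.96 is the usual value for roughly 95% confidence.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumes independent trials with a common failure probability.
    """
    p_hat = failures / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials ** 2))
    ) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(2, 113)
print(f"point estimate {2 / 113:.2%}, interval {low:.2%} to {high:.2%}")
```

The interval runs from about half a percent to several percent, so the data are consistent with the real rate being noticeably better or considerably worse than 2%.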

Now they’ve grounded the rest of the fleet pending an investigation.

I was under the impression that Atlantis was supposed to be flight ready in the event Discovery was damaged.

Does anybody have the failure rates for other launch vehicles? I know when I was following the Globalstar launches the failure rate for those launches was higher than 2%.

All I know is that I think it is time (probably long past time) that NASA starts seriously thinking about what will replace the Space Shuttles. They’ve been around for a long time, they’ve probably done as good a job as they can be expected to do- and while I don’t have enough information/inclination to make a judgement on whether grounding all flights “until they figure out what the problem is” is the right call, grounding all flights for a further length of time seems to increase the risk of age related failures in wiring or something.

I think NASA needs to seriously consider whether risk of death is acceptable. If it isn’t- then manned space exploration is not possible. If risking death is acceptable- (and I could be persuaded that it is) there are still some issues with how much risk is acceptable and how risk is measured and assessed.

My probably unanswerable question (that I’d really like an answer to anyway):

Would the degree to which loss of human life (or the risk of loss of human life) is acceptable be influenced by whether the attempted space exploration is government supported or privately financed?

NASA had what is called “STS-300”, a rescue mission, all planned out before yesterday’s launch. The rescue mission would go up with four astronauts, no cargo, and a bunch of empty seats, and come back with 11 astronauts.

Today, a camera was used to survey the wing leading edges for damage; preliminary results indicate no major damage. Tomorrow morning, they’ll position Discovery with its bottom facing the ISS so a photographic survey can be made of the underside. If they find damage, a decision will have to be made whether to bring Discovery back as-is or to rescue the astronauts with STS-300. The tile repair techniques are not ready on this flight.

The trouble is that all the orbiters have the same insulation. So, if they find damage, is it riskier to launch STS-300 and risk having the same damage occur, which means you have eleven astronauts stranded on the ISS and no rescue mission ready, or do you try to bring Discovery down as-is, hoping that the damage isn’t serious enough to cause another Columbia disaster?

And people say engineering is boring.

The problem, as you’ve stated, is that the complexity of the overall system defies making any assertions of probable failure of any individual component or subsystem; by the time that we know a system has failed (to the extent of causing a failure of the entire vehicle) it generally means that we’re picking up the pieces. My complaint was that the astronaut quoted in the OP was using exactly that sort of misplaced logic in his assertion. Mind you, this may not be the methodology NASA is currently using to assess failure probability, but historically, this is the justification (“Hey, it hasn’t failed yet”) used to ignore blatant, glaring problems such as the O-ring failures.

Most industries have gone to using a method called Failure Modes and Effects Analysis (FMEA), which as an engineer or technical person I’m sure you’re at least passingly familiar with, in which a component or subsystem is evaluated both on its estimated likelihood of failure and what result the various failure modes would have upon the function of the component and downstream systems. In the case of the O-rings, application of the method would have provided the obvious result, to wit: any degradation of the O-ring is a negative, and failure of the O-ring to seal propellant escaping toward the main fuel tank could result in catastrophic failure! That would have mandated a solution; and indeed, both NASA and Morton-Thiokol engineers railed for recognition of the problem, only to be dismissed by higher-ups (and the administrative culture in general) by using the same reasoning as above: “It hasn’t failed yet, and so is less likely to fail in the future.”
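For those who haven't seen one, a common FMEA bookkeeping device is the risk priority number: each failure mode gets a 1-10 rating for severity, occurrence, and difficulty of detection, and the product is used to rank what to fix first. A minimal sketch; the failure modes and every rating below are invented for illustration and are not from any actual NASA or Thiokol worksheet.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (loss of vehicle and crew)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (always caught beforehand) to 10 (undetectable)

    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("cosmetic panel scuff", severity=2, occurrence=6, detection=2),
    FailureMode("joint seal erosion", severity=10, occurrence=4, detection=7),
]
for mode in rank_by_rpn(modes):
    print(mode.name, mode.rpn())
```

Note that nothing in the ranking lets a clean flight history talk the severity rating down: a mode that can destroy the vehicle stays a 10 no matter how many times you got away with it.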

The falling tiles issue has been known from the very beginning, and I have to assume that, at least in the early days of the program, some very sincere and dedicated analysis was done to assure everyone involved that the loss of a few square inches of tile here and there was unlikely to cause a catastrophic failure. That’s a perfectly legitimate analysis, even if done on the basis of some guesswork and incomplete data, and in any case there is no fix for that problem that wouldn’t result in a major redesign or scrapping of the entire vessel. But the bald assertion that an out-of-design condition that hasn’t resulted in failure yet should offer confidence of future integrity is arse-backwards logic, which often seems to be the only kind of rationale that NASA developers and managers are capable of.

Spaceflight is a risky endeavor, and it isn’t likely to get much safer with foreseeable evolutions of technology, but let’s not lie about the risks. Especially not to ourselves.

Stranger

Risk is our business.

Logical answer; you send Discovery down unmanned (or with a single volunteer) to see what effect the damage has on the re-entry and landing capability. If it’s okay, no problem. If not, you…uh…well…

Anybody got a towel?

Yeah, this is the kind of ulcer-inducing excitement I could do without. Static test firings of motors are bad enough. (Gee, I wonder if that induced resonance we predicted would fade out is actually going to break an oxidizer line and blow the whole motor?) I’d really hate to be the guy responsible for saying, “Yeah, well, your orbiter’s got a hole in the wing, but we don’t really think it’s going to be a problem for you, so…happy landings!” :eek:

It looked so much cooler in the brochure…

Stranger

And in his Senate chambers, John H. Glenn mutters under his breath “You candyasses…”

So “STS-300” uses Atlantis? I wonder what shape that thing is in, because I’d imagine they would have used the most flight ready shuttle for the mission.

Still, I think it was in poor taste for NASA to wake the astronauts this morning by playing “It’s Raining Men.”

Wouldn’t it be possible for the Russians to send up a few spacecraft to rescue the “marooned” astronauts (like in that movie, IIRC)? Why are Russian capabilities seemingly so rarely considered in matters like this?

Soyuz capsules can only take three people per. To effect a rescue of hypothetically-stranded Shuttle astronauts would take three launches, and odds are good that they don’t have three completed (or nearly completed) boosters and capsules sitting around at any one time, never mind the logistical difficulties of three launches in rapid sequence. The Soyuz booster is at least partially assembled on-pad, not fully integrated in a launch assembly building like Apollo and the Shuttle.

This is why the ISS has been constrained, for now and the foreseeable future, to have a maximum crew of only three; any more and they wouldn’t be able to make an escape.

Stranger

Lemme see if I got this straight. The fuel sensor problem they fixed by putting some electrical tape over the warning light and even though Columbia was brought down by a chunk of foam that broke off the main fuel tank that tank is STILL covered with foam with nothing holding the foam (note that it’s fucking FOAM!) on except some stickum and the will of God and neither has been holding up his end all that well lately. Is that the situation in a nutshell?

Is it in fact possible to de-orbit and land the shuttle without at least two crew? As far as I know, only atmospheric gliding tests (with the Enterprise test craft) were done without a crew aboard, and all the orbital flights to date have had at least two pilots to fly the thing. (I may very easily be wrong.)