Challenger disaster question about O-rings

Watching the new documentary on Netflix, I have to ask:

Did no one think to test the O-rings’ response to cold weather in the decade and a half between the authorization of the Shuttle in 1972 and the Challenger flight? They seemed genuinely surprised by Feynman’s demonstration.
It’s clear that the contractors had their concerns, but those seemed to be along the lines of “they become less reliable as conditions become colder,” without any empirical data on performance at various starting temperatures. Surely it couldn’t have been too difficult to do an experiment comparing temperature and resilience?

Has anyone come up with a satisfactory answer as to why the Shuttle did not explode on the launch pad rather than halfway through the SRB burn, since the O-rings failed almost immediately after launch? I know the hypothesis is that aluminium oxide residue plugged the gap initially until being dislodged by wind shear. But the Shuttle experienced vibrations throughout launch (enough that Apollo veterans on their first flight felt it would break up), and surely the continued burning would deposit more?

Yes, they knew about the O-ring issue long before the launch; exhaustive testing was performed, and warnings from the supplier not to launch were ignored.

The rumor going around at the time was that Reagan wanted to have a live broadcast with the crew, IIRC during the State of the Union address, so his people overruled the decision not to launch. I doubt this was ever proven.

They certainly knew that the O-rings weren’t completely reliable, but it’s not clear to me how much everyone recognised that temperature was a major factor. The famous warning letter, for instance, doesn’t have much to say about temperature as such - just that the engineer considered the rings to have general reliability problems, which were likely to get people killed.

I learned in engineering school that the proper response to a situation like this is NOT to go into a detailed analysis of the issue but to issue a clear warning that failure is imminent, which is basically what this letter does. These are not entry-level engineers saying that something bad may happen and here are my calculations to prove it. These are expert-level engineers whose testimony is admissible in court. If they tell you a bridge is falling, you get out of the way; you don’t review data.

Just this week, Scott Manley posted a YouTube video about the SRBs’ engineering that mentions the fixes made after Challenger.

My recollection is that engineers did warn of the possibility of failure, but their bosses overruled them. Perhaps their warnings weren’t loud enough, but not everyone will argue with their boss.

Wikipedia has a pretty good account of events from liftoff to disintegration:

The vehicle didn’t explode on the launch pad because a hot plume of escaping gases needs enough size and time to do damage. The initial leak was small and plugged itself a couple of seconds after liftoff, before it had time to make anything terribly catastrophic happen.

Pretty much any flight experiences some turbulence, but according to the account on Wikipedia, the wind shear beginning at about T+37 was uniquely violent that day, worse than on any previous flight. The leak visibly manifested again at around T+59, and it was a big enough leak to lower the SRB’s internal pressure. That’s a BIG leak, and it was blasting right on the main tank and on the SRB’s connection to it. It took only about five seconds to burn a hole in the LH2 tank, and another eight or so for the aft SRB attachment to let go.

I don’t recall if it was that video or some other one I watched recently, but it mentioned that the reusable SRB casings were tested for roundness by measuring their diameter in several different directions - but that this is not actually a dependable test for circularity, since it’s quite possible for a shape to be very much non-circular yet measure the same diameter in every direction.
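That’s the classic curve-of-constant-width trap. Here’s a quick sketch (my own illustration, not from the video) using a Reuleaux triangle: its caliper “diameter” comes out the same in every direction, even though it’s nowhere near circular.

```python
import math

# Vertices of an equilateral triangle with side length 1 (the Reuleaux width).
V = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def boundary_points(n_per_arc=200):
    """Sample the Reuleaux triangle boundary: three circular arcs,
    each centred on one vertex and joining the other two."""
    pts = []
    for i in range(3):
        cx, cy = V[i]
        a, b = V[(i + 1) % 3], V[(i + 2) % 3]
        th_a = math.atan2(a[1] - cy, a[0] - cx)
        th_b = math.atan2(b[1] - cy, b[0] - cx)
        d = (th_b - th_a) % (2 * math.pi)   # each arc spans 60 degrees
        if d > math.pi:
            d -= 2 * math.pi
        for k in range(n_per_arc + 1):
            th = th_a + d * k / n_per_arc
            pts.append((cx + math.cos(th), cy + math.sin(th)))
    return pts

def width(pts, theta):
    """Caliper width of the shape measured in direction theta."""
    proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in pts]
    return max(proj) - min(proj)

pts = boundary_points()
widths = [width(pts, math.pi * k / 180) for k in range(180)]
print(min(widths), max(widths))   # both ~1.0: same "diameter" every direction

# ...yet it is not a circle: distance from the centre to the boundary varies.
cx = sum(x for x, y in pts) / len(pts)
cy = sum(y for x, y in pts) / len(pts)
radii = [math.hypot(x - cx, y - cy) for x, y in pts]
print(min(radii), max(radii))     # ~0.42 vs ~0.58
```

So a casing could pass every diameter check and still be out of round - which is why you need a genuine roundness measurement, not a set of caliper readings.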

Lots of people knew that the O-rings were a problem. Other people at that meeting knew as well: Feynman eventually realized that Kutyna had set him up to be the one to reveal the information about the effect of cold on the O-rings, and that Sally Ride had told Kutyna.

Per Kutyna’s account:

One day [early in the investigation] Sally Ride and I were walking together. She was on my right side and was looking straight ahead. She opened up her notebook and with her left hand, still looking straight ahead, gave me a piece of paper. Didn’t say a single word. I look at the piece of paper. It’s a NASA document. It’s got two columns on it. The first column is temperature, the second column is resiliency of O-rings as a function of temperature. It shows that they get stiff when it gets cold. Sally and I were really good buddies. She figured she could trust me to give me that piece of paper and not implicate her or the people at NASA who gave it to her, because they could all get fired. I wondered how I could introduce this information Sally had given me. So I had Feynman at my house for dinner.

Perhaps they did. But I think you’re missing the point by just a little: it may not be (really isn’t) necessary to know the precise failure rate of a component under a given set of conditions if it’s already been established that it’s failing more often than the design assumed it would. Simply that it fails much more frequently than it’s supposed to is enough - or at least it should have been in this instance. Ignoring that increased failure rate, whether precisely quantified or not, took things from inherently dangerous to unsafe.

In fact, there is no acceptable failure rate when it comes to safety systems. Even airbags in cars are expected to work 100% of the time, and if they don’t, the suppliers are pushed to perform better or lose the business. For high-value systems like the Shuttle or fighter aircraft, there is a requirement to have redundant systems in place. IIRC, there were to be two O-rings on each seal in case one failed.

Yeah. As Feynman described it in his report:

If a bridge is built to withstand a certain load without the beams permanently deforming, cracking, or breaking, it may be designed for the materials used to actually stand up under three times the load. This “safety factor” is to allow for uncertain excesses of load, or unknown extra loads, or weaknesses in the material that might have unexpected flaws, etc. If now the expected load comes on to the new bridge and a crack appears in a beam, this is a failure of the design. There was no safety factor at all; even though the bridge did not actually collapse because the crack went only one-third of the way through the beam. The O-rings of the Solid Rocket Boosters were not designed to erode. Erosion was a clue that something was wrong. Erosion was not something from which safety can be inferred.

As I understand it, there had been burn-throughs in the O-rings before, on other flights (in cold weather), with solid rocket gasses escaping. But since the shuttle didn’t explode those times, the event was determined not to be as severe as believed. In other words, “it happened before, and nothing happened, so we can keep doing it.” Those durn engineers are just overcautious.

Until something DID happen.

Ehhhh… I’m not sure that’s true from an engineering standpoint. From a liability standpoint, sure. From a public policy standpoint, say what you’ve got to say. But from a realistic design perspective? If you think the odds of failure are ever zero, you’re fooling yourself.

The reliability expectation for the shuttle from the get-go was that they would catastrophically lose 1 in 100 flights. That was NASA’s going-in expectation. They also expected, back when all this was on the drawing board, to fly a few hundred flights with a much larger fleet flown far more often. Said another way, they were planning from the get-go to lose more than one shuttle, and probably more than two, over the life of the program. That’s called “acceptable risk”.

“Zero defect” is for TV dramas and product liability lawyers, not for the real world.

In reality the shuttle was far less reliable than the 1% failure rate it was designed for. The good news was that, due to high operating costs & low budgets, they also flew it a lot less than the original program design called for. So they still got their two catastrophic accidents, just divided over far fewer flights than they expected.

Why less reliable than planned? Largely because, as discussed in the various reports, NASA got lucky early then got complacent. Whether that complacency was suppliers lying to protect revenue and to CYA or bureaucrats lying to protect budgets and schedules and to CYA is a separate topic.

You plan for zero incidents. To do otherwise sets you up for lawsuits, yes, but it’s not just talk. You do risk assessments to minimize liability, and if there is a failure you take countermeasures to prevent a recurrence of the same failure mode. It’s not fooling yourself; it’s playing it smart. The space program is full of smart people, and they failed in the Challenger incident by ignoring the warning signs and the safeguards they had set up. People were the cause of the failure, not the system itself.

There were two catastrophic failures out of 135 missions. That’s not “far less reliable” than the expected 1% failure rate. It might not even be any less reliable at all than the expected 1%.
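For what it’s worth, here’s a quick back-of-the-envelope binomial check (my own numbers, assuming independent flights - a simplification) of how surprising two or more losses in 135 flights would be if the true per-flight loss probability really were 1%:

```python
from math import comb

n, p = 135, 0.01   # flights flown, assumed per-flight loss probability

def pmf(k):
    """Binomial probability of exactly k losses in n independent flights."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_two_or_more = 1 - pmf(0) - pmf(1)
print(f"expected losses: {n * p:.2f}")
print(f"P(>=2 losses):   {p_two_or_more:.3f}")   # roughly 0.39
```

Under those assumptions, two losses in 135 flights is roughly a 39% outcome - i.e., entirely consistent with a 1-in-100 design expectation.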

Are you forgetting Apollo 8? Or just counting the shuttle?

I of course meant Apollo 1. And I should have checked that there were 135 shuttle flights.

Still, Columbia was more of an accident; Challenger was humans being overconfident, thinking they could force their will on science.

I think everyone in this thread has made it clear that we’re talking about the Space Shuttle.

And any engineer who plans for zero incidents should be fired for incompetence. Let’s say you have some critical part that will fail one time in 1000. Well, you say, 1 in 1000 is greater than zero, and thus unacceptable. So you build in a backup. That doesn’t make your failure rate zero - it makes it 1 in a million, which is still greater than zero. And that’s only in the extremely unrealistic case that the backup is just as reliable as the primary but also completely independent of it, neither of which is likely to be true (and the more you get of one, the less you tend to get of the other).
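Sketching that arithmetic out - the 1-in-1000 part and the 5% common-cause fraction below are purely illustrative assumptions, not Shuttle data:

```python
# Redundancy arithmetic: illustrative numbers only, not real Shuttle figures.
p_primary = 1e-3   # assumed per-demand failure probability of the part
p_backup = 1e-3    # backup assumed identical to the primary

# Perfectly independent backup: the system fails only if BOTH fail together.
p_independent = p_primary * p_backup   # 1 in a million -- still not zero

# More realistic: some fraction of failures (a "beta factor") strikes both
# units at once -- shared environment, shared design flaw, shared cold snap.
beta = 0.05
p_system = beta * p_primary + ((1 - beta) * p_primary) ** 2

print(f"independent backup: {p_independent:.1e}")
print(f"with common cause:  {p_system:.1e}")   # dominated by the shared term
```

With even a 5% common-cause fraction, the shared-failure term dominates: the system lands around 1 in 20,000 rather than 1 in a million. That’s exactly the Challenger situation - both O-rings on a joint sat in the same cold air.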