FAA starts Boeing 737 Max test flights

I appreciate your insights on these issues, and thanks for the update – interesting stuff. But I’d like to better understand the part that I quoted with respect to the stuff that’s been widely publicized about the MAX that I note below.

I can see how bad AOA information can be problematic on any airplane (or bad airspeed information, which led to a series of cascading events that brought down AF447), but the common understanding is that the specific problem with the MAX originated with the larger and heavier CFM LEAP-1B engines that replaced the CFM 56-7 used on the NG. These new engines had to be mounted higher and further forward, changing the thrust line and handling characteristics in some conditions, which was the whole reason the MCAS was introduced. It was the implementation of the MCAS that was the issue – its aggressive nose-down trim behavior if it thought the AOA was too great, exacerbated by its dependence on a single AOA sensor.

This is probably an overly simplistic description, but is it not basically correct in its general outline? The 737-NG didn’t have or need an MCAS, and MCAS was effectively what drove the Lion Air and Ethiopian MAXs into the ground. The problem was further exacerbated AIUI by the fact that MCAS was never documented in the flight manual (in fact, it originally was, but was then removed, leaving a mention of it only in the glossary with no explanation).

I have no doubt that the MAX is now a fine airplane in the tradition of the previous 737 line, which IIRC were among the safest aircraft ever flown. And it’s quite probable that better-trained, more experienced pilots could have managed the situation that those two ill-fated MAXs encountered. But surely you’d have to acknowledge that the original MAX with the original MCAS design posed an unusual and needless risk far beyond just “cockpit confusion” because of a sensor failure. If not, what am I getting wrong? Just trying to learn here, from someone in a position to understand this better than anyone else.

No, you’ve got the highlights exactly right.

In the NG or equally in the original or now modified MAX, a single AOA failure (hard-over actually, where the thing goes off-scale high or low, not just “a little out of tolerance”) produces noisy false warnings, misleading instrument readings, and plenty of opportunity for cockpit confusion.

The original MAX added one more thing to the mix. For an off-scale high failure, the original MAX would, via MCAS, mistakenly crank in big doses of nose down trim repeatedly and would not quit mistakenly trying until disabled.

Which meant that from the pilot POV, what had been a pretty darn confusing failure in the NG added runaway nose down trim to the mix in the original MAX. Which difference took a basically stable flight situation albeit with confusing instrument readings and made it into a very unstable flight situation on top of the confusing instrument readings.

Which super-confusing very time-critical situation duly got away from those two crews, but not from a different Lion Air crew who had had the same malfunction in the same jet the day before.

The modified MAX fixed that; MCAS will ignore off-scale readings, and if it does mistakenly trigger for an out-of-tolerance reading, it won’t do so repeatedly nor to excess. Thereby all but neutering the runaway instability. Leaving us with a failure scenario substantially the same as that of the NG.

As to why MCAS?

Every airliner with wing-mounted engines has the issue that as you raise the nose higher and higher versus the airflow, those engines get more and more sideways to the airflow. In a sense, the incoming air gets “under” the engines and helps to push the nose higher and higher. Which could, at least in theory, lead to a runaway instability.

By regulation, every airplane has to be designed so that as the nose gets higher, it has to “feel” heavier to the pilots. It can’t begin to feel lighter as the situation gets more extreme. And it darn sure can’t get so extreme that the airplane starts to pitch up on its own as more and more airflow gets under the forward-mounted engines.

In the case of the NG, the natural aerodynamics of wing shape, engine shape, CG, tail size, etc. mean the airplane meets that pitch feel design standard. Barely. And even then the NG has some other gizmos to make it physically harder for pilots to inadvertently pitch up into a stall. IOW, the pitch “feel” is already artificially enhanced on the NG. As it is on many traditional airliners.

In the case of the MAX, the engines being a bit bigger and a bit farther forward on an otherwise almost identical wing & fuselage meant the MAX did not naturally quite meet the design standard. The extra pitch-up tendency due to the different engines ate all the tiny margin the NG had, and then some. And the limitations of the feel enhancement systems on the NG meant they couldn’t directly be cranked up a smidgen to make up the shortfall.

So MCAS was invented to give that little extra nudge at the right time to make the feel meet the standard. Actually, MCAS wasn’t invented for the 737; it was borrowed.

It turns out every 767 ever flown has the same MCAS augmentation to meet the standard. Because the 767 aerodynamics don’t quite meet the standard without some help from the pitch trim. And never did from Day 1.

The big difference between the 767 & 737 is the 767 has triple redundant everything with some majority voting and cross-sensor averaging, whereas the 737, whose design origins are 15-20 years earlier, has dual independent everything with the pilots meant as the one and only tie breaker for any disagreements.
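
For the software-minded, here is a toy Python sketch of why that difference matters. It is only an illustration: the function names, the disagreement threshold, and the sample angles are all made up, and nothing here is taken from actual Boeing avionics.

```python
from statistics import median

# Hypothetical disagreement threshold in degrees, for illustration only.
DISAGREE_LIMIT_DEG = 5.0


def triplex_select(a, b, c):
    """Mid-value select: with three sensors, one wild reading is simply outvoted."""
    return median([a, b, c])


def duplex_compare(left, right):
    """With two sensors a disagreement can be detected, but not resolved.

    Returns (value, disagree). value is None when the sensors disagree,
    which leaves the tie-break to the pilots.
    """
    if abs(left - right) > DISAGREE_LIMIT_DEG:
        return None, True
    return (left + right) / 2, False


# One sensor fails hard-over high (74.5) while the true angle is about 5 degrees.
print(triplex_select(5.0, 74.5, 4.5))  # 5.0
print(duplex_compare(5.0, 74.5))       # (None, True)
print(duplex_compare(5.0, 4.0))        # (4.5, False)
```

With three inputs the bad value never reaches anything downstream. With two, the best the system can do is notice the mismatch and hand the problem to a human.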

The full implications of that different redundancy environment weren’t fully appreciated by the various departments at both Boeing and FAA involved in the change. So MCAS was released with a vulnerability to a single point failure that was misjudged as minor but proved in practice to be severe.

The later Boeing 777 & 787 and all Airbus products have the same underlying concern and have equivalent fixes to the unhelpful natural aerodynamics. But those airplanes being fly-by-wire from beginning to end, the “fix” may have a name in the internal software documentation, but it isn’t a separate black box in the belly. It’s just one more parameter that goes into the software’s decision logic and results as seen by the pilots.

Seems like there’s a small chance of a problem with non-standard autonomous ballast attempting to attach itself to a wing of the airplane…

Thanks very much for the very detailed backgrounder. Good to know that I had the basic picture right, though I had no idea about the other stuff. As a computer science guy, I’m familiar with triple redundancy and majority voting but I only knew about its use in spacecraft, not avionics. It’s appalling to me that something as powerful as MCAS could have been triggered by just a single bad sensor. And even disagreement between the dual sensors (of which MCAS only cared about one) was only flagged to the pilots if the airplane had the “disagree” warning light, which was apparently optional, and the Lion Air plane didn’t have it (not sure about Ethiopian). I read that here in Canada, the major MAX operators (WestJet and Air Canada) had all ordered the “disagree” warning option. I also believe it will now be standard.

This is not quite right. And as a CS guy you’ll love the backstory. Or at least recognize it instantly & grimace.

There is a disagree warning. That was never intended to be optional.

As a separate matter there are AOA display gauges on the pilots’ screens. Which are an optional add-on that rather few carriers ordered. This was/is true on NG and on MAX. Neither Lion Air nor Ethiopian had the gauges.

Due to a bug in the software build script, not ordering the gauge on a MAX had the effect of inadvertently disabling the warning too. Oops.
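
I have no idea what the actual build logic looked like, but the general shape of that kind of bug is familiar enough. A purely hypothetical Python sketch, with invented function and feature names, of a mandatory feature getting accidentally tied to an optional one:

```python
def build_features_buggy(aoa_gauge_ordered):
    """Hypothetical buggy configuration: the warning rides along with the option."""
    features = set()
    if aoa_gauge_ordered:
        features.add("aoa_gauge")
        # Bug: the mandatory warning is enabled only inside the optional branch.
        features.add("aoa_disagree_warning")
    return features


def build_features_intended(aoa_gauge_ordered):
    """Hypothetical intended configuration: warning always on, gauge optional."""
    features = {"aoa_disagree_warning"}
    if aoa_gauge_ordered:
        features.add("aoa_gauge")
    return features


# A customer who did not order the gauge silently loses the warning too.
print(build_features_buggy(False))     # set()
print(build_features_intended(False))  # {'aoa_disagree_warning'}
```

Both versions behave identically for any customer who ordered the gauge, so testing only that configuration would never reveal the difference.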

Both the MCAS runaway after single AOA failure and the inadvertent lack of AOA disagree warnings without gauges were known bugs that were already in work to be fixed in a later software release before the first MAX went down. They were judged to be so unlikely to occur in service before the normal release schedule that they didn’t warrant being rushed out as an out-of-band patch with all the extra regression risks attendant thereto.

Murphy did not die in WWII. He’s alive and well.

And yes, the AOA display gauges are now standard. You can’t not have them on MAX. I don’t know for sure about NG retrofit. Ours have the gauges already.

Yikes! That sounds like absolutely horrible software quality management, especially if no AOA gauges was a common configuration (but even if it wasn’t)! Heads should roll in their software testing group. It’s like they got their testing policies from Microsoft!

Again, thanks for the detailed info. It’s amazing how often (in fact, almost always) the general media gets the technical details wrong.

And of course the accidents themselves would not have happened except for another two glitches, this time in maintenance.

NGs and MAXes have the same AOA sensor. It’s essentially an external weathervane attached to an internal multi-bit Gray-encoding wheel that outputs a digital value corresponding to the vane angle.
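
For anyone who hasn't met Gray code: consecutive positions differ in exactly one bit, so a wheel caught between two positions can only be off by one step rather than reading garbage. A small Python sketch of the standard conversion follows. The 12-bit width and the angle scaling are assumptions for illustration, not the real sensor's specification.

```python
def binary_to_gray(n):
    """Standard reflected binary (Gray) code for a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse conversion: XOR together g and all of its right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_to_angle(g, bits=12):
    """Map a Gray-coded wheel position to degrees (bit width is an assumption)."""
    return gray_to_binary(g) * 360.0 / (1 << bits)


# The first eight positions: each differs from its neighbour in exactly one bit.
for position in range(8):
    print(position, format(binary_to_gray(position), "03b"))
# 0 000
# 1 001
# 2 011
# 3 010
# 4 110
# 5 111
# 6 101
# 7 100
```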

Years before the first MAX flew, an AOA sensor was removed from an NG somewhere on Earth and sent out to be repaired, recalibrated, and recertified as a good usable part. The work was done by a shoddy shop in Miami.

The vane assembly attaches to the encoder assembly via a splined shaft. Per the repair procedures there is a go / no-go tester that is connected to the finished part which is then cycled through its range of motion while the tester confirms the output is as expected.

On this particular part, the vane assembly was attached to the encoder assembly way wrong; 60 degrees = 1/6th of a turn out of whack. Then the operator used a tester set to the wrong setting, one used for a different model of AOA sensor. This wasn’t noticed because the procedural instructions didn’t specify what setting to use; it just said “appropriate.” And that shop always left the tester set appropriately for these type vanes. Except this once. And the tech didn’t notice the difference. The fact the controls on the tester were unergonomic as all hell didn’t help.

Anyhow, the utterly defective out of calibration sensor assembly passed the defectively conducted test and was shipped to a parts broker. Where it eventually went to Lion Air and sat in their warehouse for a couple years.

Then one fateful night a shiny new MAX had a problem that the maintenance folks thought might be AOA related. So they decided to change the sensor. They happened to pull the defective one from stores and install it overnight in the midst of a driving tropical rainstorm.

The procedures call for a post-installation calibration test with the airplane avionics that would have caught the defective sensor. Doing the self-test would have had a man on a ladder in the driving rain for 10 minutes carefully holding the vane at various settings while the avionics compared the readouts with expectations.

That procedure was documented as having been accomplished. The physical evidence suggests it was not.

The next morning they took off with the long-defective sensor installed.

For want of a nail. …
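
As a postscript for the software folks: the range-of-motion check described above, whether on the shop tester or on the airplane, boils down to comparing readouts against expected values within a tolerance. A minimal Python sketch, where the tolerance, the test points, and the function names are all invented for illustration:

```python
# Hypothetical tolerance in degrees; the real procedure's limits are not given here.
TOLERANCE_DEG = 1.0


def check_sensor(read_angle, test_points, tolerance=TOLERANCE_DEG):
    """Return the (expected, actual) pairs that fall outside the tolerance."""
    failures = []
    for expected in test_points:
        actual = read_angle(expected)
        if abs(actual - expected) > tolerance:
            failures.append((expected, actual))
    return failures


def good_sensor(vane_angle):
    """Stand-in for a correctly assembled sensor."""
    return vane_angle


def misassembled_sensor(vane_angle):
    """Stand-in for a vane attached 60 degrees off on the splined shaft."""
    return vane_angle + 60.0


points = [-20.0, 0.0, 20.0]
print(check_sensor(good_sensor, points))          # []
print(check_sensor(misassembled_sensor, points))  # [(-20.0, 40.0), (0.0, 60.0), (20.0, 80.0)]
```

A constant offset that large fails at every test point, which is why the check only misses it if it is run against the wrong expectations or not run at all.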

Wow! :astonished: That’s quite a story!

I remember reading about the Lion Air plane having its AOA sensor replaced before the flight due to problems apparently experienced on the previous flight. I assumed – as probably everyone did – that it was replaced with a new and properly calibrated sensor. It was never explained how or why after this maintenance, they encountered this exact AOA-triggered MCAS issue! Just, wow!

Yes, that is extraordinary - thanks for writing it up, I wasn’t aware of these details. I know I’ve mentioned this before (in fact, Google shows me it was as recently as September!) but it reminds me of the BAe 146 that lost a windscreen due to the incorrect bolts being used - a series of low-probability coincidences that added up to a significant probability of a fatal accident. There are many other aviation examples, of course.

Nitpick: It was a BAC-111 (Unless there were two such incidents).

No, you’re quite right - I was thinking of Flight 5390, which did indeed involve a BAC-111 - thanks for the correction. I don’t know how I confused the two aircraft - I guess to the casual observer (i.e. me) their cockpit and tail designs are somewhat similar. But even I should have noticed the completely different number of engines, and their positioning (as well as the high/low wing difference)!

Well - you know what they say … You’ve seen one weird British aeroplane, you’ve seen them all. :wink:

The 1940-1980 British aero design aesthetic was … idiosyncratic to say the least.

Pretty much every aviation accident is like this. It takes a half-dozen or more adverse coincidences to all occur on the same plane on the same day to the same people.

That Lion Air airplane successfully flew one entire revenue flight with the defective AOA unit installed and hence the MCAS malfunction going the whole time. The difference was that first set of pilots didn’t get buffaloed. They worked the problem more or less as Boeing had assumed every pilot would, and flew successfully to their destination. They thought it was a minor enough deal that although they did write up the malfunction for maintenance, they did not make a major crisis report to management.

The second crew quickly lost control of the same identically malfunctioning airplane because they got too confused too quickly and let it run away with them.

Thanks for the detailed history.
That Lion Air story is one for the books. A long list of relatively minor failures and missed opportunities for correction.

Darn near every accident is that way. Darn few are just one big cause.

The Ethiopian accident will be very interesting. The local government has been very close-hold on stuff. There’s some concern there will be a lot of cover-up since their equivalent of the FAA, the Air Force, and the airline are all divisions of the same government ministry. Lots of opportunity for CYA there.

The final report should appear within the next 6 months.