How accurate are flight simulators?

I’m talking about professional flight simulators used by professional pilots for training, to avoid the expense and danger of using real planes.

They seem like a wonderful invention. They rely on accurately simulating real flight as much as possible – realistic cockpit, realistic motion, realistic ambiance. If they can do that, the outcome of a simulation (landing safely, crashing, whatever) should match real life.

So how do we know that the simulator is emulating real life? What if a pilot configures the simulator in a way never before attempted? How do we know that the simulator’s software emulates the real thing?

In the US, the FAA certifies flight simulators: Flight Simulation Training Device Qualification Guidance | Federal Aviation Administration

Pilots actually encountered the flight control issues with the 737MAX in the simulator, but that was largely ignored.

A friend is a Delta Airbus 350 first officer, formerly a 747 FO. When Delta decommissioned the 747, she spent time studying and then time on the simulator in Atlanta. Her first flight was a regular passenger flight with a more experienced A350 captain sitting in the other seat.

Yes, it was the 737MAX that I was thinking about, along with other aircraft.

It seems to me that if the software used in an actual plane is transferred to the simulator (is that the way it works?), that flaws in the software, like the ones in the MAX, would be exposed. If so, do the software designers try their inventions out in simulators? While not the only tests they should do, it seems like a good alternate method of debugging.

There are different levels of simulators. Some are not certified for “zero flight time” which means you don’t get your type rating until you’ve done some handling (without passengers) in the real thing. My Dash 8 type rating was finished off with circuits in a real airplane. My comments below refer to level D simulators. They can be used to give a pilot a full type rating without ever needing to touch a real airplane.

They are very good but have limitations. In terms of aerodynamics, they won’t accurately simulate conditions they haven’t been given data for. After AF447 crashed there was a push for more “edge of the envelope” scenarios to be practiced in the sim. The problem is the sim can’t replicate being held in a deep stall (for example) accurately if the manufacturer hasn’t flight tested in those conditions to get that data in the first place.

There can be other things that aren’t simulated very well. The BAe146 sim that I used to be subjected to was horrible to land; apparently it didn’t simulate ground effect. The actual aircraft was very easy to land by comparison.

Something that has come out of the B737MAX crashes and subsequent groundings is that the B737NG sim does not accurately replicate the physical force required to move the manual trim wheel when the stabiliser is a long way out of trim.

In general I wouldn’t trust a simulator for scenarios it hasn’t been programmed for. Some unusual combination of system failures may or may not reflect the real thing. They are great when used to practice procedures for typical failures and for learning how the airplane flies within normal flight parameters.

Here’s a little blurb about Boeing’s e-cab engineering sim for the B737MAX: 737 MAX. Note that it is different from a simulator used for training.

Thanks for the link, RP. It does look like a Boeing propaganda film, however.

Here’s my angle. I have considerable experience designing and building test equipment; devices/software that attempt to determine if the final product will perform as intended. (Granted, my hands-on experience is from years ago, and on missile guidance systems, not airplanes. But there must be some design concepts that are common across the years and the hardware.)

I once had a significant argument with my fellow engineers. They were designing a new, high-speed, specialized CPU, and once they got the rudiments working – clock, crude I/O, etc. – they wanted to use the CPU to test the more advanced functions of the same device. I argued that this was a bad idea, and that we should use an external, even consumer-grade, CPU to run the tests to avoid multiplying unforeseen flaws. I won this argument, and I don’t regret making it.

To apply that to the 737 sim, if I had been involved with the testing, I would have used a simulator, and devised as many event conflicts as were mathematically possible. What if sensor A says the speed is too high, but sensor B says the angle of attack is too low? What if the direction sensor says the plane is flying backwards or upside down?

Don’t laugh at the last one…there is a buoy out in Lake Michigan that sometimes reports wave heights of 3 feet, followed by wave heights of 255 ft, then 3 feet again. Obviously some bits in the sensor are wonky and are getting randomly flipped, and it won’t cause any grief here. But how would a plane’s computer be smart enough to know that the sudden change and the large value are unreasonable unless there was some logic to address this unforeseen, unexpected situation? And if you don’t try to find those situations early, heaven help us.

My first jet type rating was in a simulator and I never flew the real plane until afterward. I was relieved at how much easier the actual aircraft was to land. To a lesser or greater degree, I’d say that was true for each of my subsequent type ratings.

In fact, the FAA mandated a change to the software in the simulator for one of the planes I trained in, which actually made it MORE difficult, to the point where I felt it was totally unrealistic. My partner remarked that in her entire career she had never “red screened” a simulator until then. If you had even a few degrees of bank angle in on touchdown it would simulate a crash. Absolutely not true in the real aircraft.

Also difficult to reproduce the true feel of braking. Add to it that the rigging and condition of each individual aircraft is slightly different. So in the simulator braking feels mostly smooth, where in real life brakes sometimes grab or pull.

All this to say, I don’t expect any simulator to be a true simulacrum of reality. But the full-motion sims we use in the airlines and charter world are excellent training tools. The improvements I’d like to see are more in curriculum and required maneuvers.

Yes. I am surprised the link is still up to be honest, not the best PR.

All perfectly reasonable. I have no idea about the actual process used, but yes, they’d need to do a risk analysis of conceivable failures. A full-on flight sim like the e-cab might not necessarily be the most appropriate tool; I would think that most things would have been simulated as pure software before ever being loaded onto boxes fitted to a flight sim.

As I understand it, the issue with the MAX wasn’t that they didn’t know, or hadn’t thought of, what would happen if an AoA sensor failed, but that they assumed pilots would respond in a certain way and they didn’t.

Yeah I agree. Typically the problem is resolved by having multiple sensors. If you have just two sensors and they disagree, then both sensors’ indications are disregarded and the resulting problem is handed to the pilot (good luck!). If you have three sensors and one disagrees with the other two, then you can disregard the outlier and keep the system operational. Of course sometimes it is the outlier that is correct–no system is perfect. There could probably be more done to disregard a single sensor’s value if it seems unreasonable compared to previous values from the same sensor, but that only works for obviously unbelievable values, e.g., your wave buoy. You still need to protect against subtle failures, and so you still need either multiple sensors or to ensure the failure modes aren’t catastrophic. What if the buoy’s wave height readings very gradually increased from 5 to 255 feet? You need more buoys to validate the data.
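To make the voting idea concrete, here’s a rough Python sketch. It’s purely an illustration with a made-up tolerance, not anything resembling real avionics logic:

```python
def vote(readings, tolerance=2.0):
    """Toy redundancy voting for 2 or 3 sensor readings.

    Returns (value, valid). Purely an illustration of the idea above,
    with an invented tolerance -- not real avionics code.
    """
    if len(readings) == 2:
        a, b = readings
        if abs(a - b) <= tolerance:
            return (a + b) / 2, True
        # Two sensors disagree: no way to tell which is right, so
        # declare the data invalid and hand the problem to the pilot.
        return None, False
    if len(readings) == 3:
        low, mid, high = sorted(readings)
        # With three sensors the middle value is robust to one outlier.
        if (mid - low) <= tolerance or (high - mid) <= tolerance:
            return mid, True
        return None, False  # all three disagree with each other
    raise ValueError("expected 2 or 3 readings")


# One angle-of-attack vane stuck at a bogus value:
print(vote([5.1, 5.3, 74.0]))   # (5.3, True) -- outlier ignored
print(vote([5.1, 74.0]))        # (None, False) -- pilot's problem now
```

The real comparisons involve filtering, rate limits, and validity flags from the sensors themselves, but the basic “believe the two that agree” logic really is that simple.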

This is a fascinating discussion.

If I can ask a further question, how much do the simulators simulate environmental factors like pissing rain and lightning, or give you the G-forcy feel of heavy braking?

Which brings me back to the testing…shouldn’t the situation where an AoA sensor has failed be written up in the manual? Or are there too many unlikely situations like this, enough to fill too big a book? If so, what is the solution? Certainly we can’t ignore the one-in-a-million chance that this would happen, since there are millions of chances over time.

Which brings me to my next point, “reasonableness” analysis. Humans are pretty good at this; computers are ignorant unless programmed to handle it.

Reasonableness testing, or the lack of it, is IMHO the cause of, or at least a huge contributing factor to, the recent 737MAX, Air France 447, and Qantas 72 disasters, just to name some off the top of my head. In each of these cases, an accurate analysis of the “reasonableness” of the situation might have changed the outcome. In some of these examples, humans made a poor assessment of the reasonableness; in others, the computer did (usually because the computer didn’t analyze it at all).

In my example of the wave height, a reasonableness test would ask: is a wave height of 3 ft, consistent over the last few hours, likely to rise to 255 ft in one hour and then subside again an hour later, in the absence of any other contributing factors (passing ships, sensor malfunction, etc.), in a location that hasn’t had waves greater than 50 ft in 10 centuries? Isn’t a short or open in the sensor, resulting in all 1s (255 decimal), a much more reasonable explanation? Should we at least flag the data with an asterisk, or examine possible contributing factors?
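If I were to sketch that test in code, it might look something like this. The limits are invented for illustration; the point is only that a 3 ft → 255 ft → 3 ft sequence should at least earn an asterisk:

```python
def plausible(new_reading_ft, previous_reading_ft,
              max_ever_observed_ft=50.0,    # nothing bigger recorded here in centuries (assumed)
              max_change_per_hour_ft=10.0):  # assumed plausible hourly change
    """Toy reasonableness test for an hourly wave-height report."""
    if not (0.0 <= new_reading_ft <= max_ever_observed_ft):
        return False   # outside anything ever observed at this location
    if abs(new_reading_ft - previous_reading_ft) > max_change_per_hour_ft:
        return False   # jumped implausibly fast since the last report
    return True


print(plausible(255.0, 3.0))   # False -- flag it; almost certainly a sensor fault
print(plausible(3.2, 3.0))     # True
```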

ETA: Ninja’d by Richard Pearse on sensor voting.

How do aircraft computers detect spurious sensor readings? Triple modular redundancy, mostly. Given your stated background, I’m surprised you’re unfamiliar with it.

If you’ve got three angle-of-attack (AOA) sensors, their readings will match one another within some envelope under most circumstances. The computer is constantly comparing readings from the three, which is sometimes called “voting.” If one sensor suddenly diverges from the other two, the computer trusts the pair that match each other.

You can also use two sensors and defer to the pilot when the sensors disagree (e.g., by disabling automated systems like autopilot and/or MCAS). The 737 has only two AOA sensors, but obviously MCAS remained in effect when they disagreed.

Worse, while both a “disagree” indicator and a readout of the two sensors were available, Boeing sold them as options. IMO, that was a great example of MBA-style revenue optimization infringing on good engineering judgement.

So yeah, there is “some logic” available to detect bad sensor readings, and that logic is exceedingly common in aerospace engineering. It wasn’t applied properly in the case of the 737 Max, of course.

Do you seriously think that no one tries to find problematic corner cases, or is that just an unintended consequence of your phrasing? There’s an entire engineering subdiscipline called failure mode & effects analysis (FMEA) dedicated to “trying to find those situations early.” My entire field “tries,” and hard.

FMEA is an imperfect process even when done by the book, but it’s among the best tools we have to catch these things. And triple modular redundancy isn’t perfect either, but it’s really quite good. Modern air travel is absurdly safe.

Boeing did a lot of reckless things as they tried to minimize training requirements for their new 737. The “disagree” light shouldn’t have been an option and MCAS probably should have disengaged upon disagreement. The soft-pedaling of the new flight dynamics (and especially the soft-pedaling of even the existence of MCAS) was egregious. But Boeing’s poor engineering practices in this case don’t imply that no one has ever considered how to detect and respond to bad sensor data.

I have been in the Alaska Airlines 737 flight simulators on boy scout tours several times. One of the things that is not touched upon yet is the incredible cost of the flight simulators! Once, they turned on the pitch and tilt (but no feedback) hydraulic lifts to simulate landing for one of the groups, but every other time it has just been a static system. Each simulator sits 15’ in the air and has special bay doors to allow it to be installed in the building. Some boy once asked how much and though the “instructor” didn’t know the actual cost, it was easily in the 8 figure range. And Alaska has 6 on site. So companies aren’t buying something for $20+M which isn’t completely authentic and realistic!

Given my stated background, I am quite familiar with the concept.

What you are describing is a simple comparison between redundant sensors that are expected to provide similar information. Granted, it requires fuzzy logic, since no two sensors will agree to the last decimal place of accuracy, but given a range, they can be considered to be in sufficient agreement not to trigger an exception.

What I was talking about is disagreement across different sensor data, which may or may not be contradictory. Some discrepancies are of no consequence, but some are substantial. To illustrate, in Qantas 72, the initial indication was “pitch too high” simultaneous with overspeed indications. While this is possible, it would be unlikely to appear as a sudden, instantaneous change when both values had been normal one second earlier.

In the case of Air France 447, about 2 minutes or less after the initial alarm, all sensors and computers were operating normally. In this case, the anomaly was the pilot (Bonin) commanding “nose extremely high” even though normal flight parameters were in place. A reasonableness test would have flagged the pilot’s action as an exception (why command nose high at the already maximum altitude of a routine flight, which would cause a stall?), and perhaps saved 228 lives.

Cool.

That’s not what fuzzy logic is, but okay.

“Normal flight parameters” were emphatically not in place at that point—AF 447 had kicked itself out of normal law and into alternate law 2. And when the airspeed indicator came back, the stall warning sounded immediately. AIUI, the pilots had no way to know they could now trust the reported airspeed. I don’t understand why anyone would call that situation “normal flight parameters.”

Very quickly, AF 447’s actual airspeed fell below the A330 approach speed with full flaps. Those are normal flight parameters for a brick, not a plane.

AF 447 was a complicated disaster fueled by bad sensor data, human/computer interaction problems, dueling control inputs (from both the left and right seats) and poor cockpit communication. If a “reasonableness test” would have cut through all that, then great. But given the pilots’ disorientation, the flaky airspeed data and profound confusion about who was flying the plane, I don’t see that flagging one pilot’s nose-up command as unreasonable would change much.

Besides, Airbus’ normal and alternate flight control laws already reject or attenuate “unreasonable” inputs; I’m not sure how that’s different from what you’ve proposed. I must admit that your “reasonableness test” sounds a lot like “hindsight” to me.

Maybe I’m mistaken; it sounds like you think your reasonableness test idea is a novel and even life-saving invention. So how is it different from normal and alternate control laws? How would it have saved AF 447?

Most simulators are built on hexapods (AKA Stewart platforms). These are essentially robotic actuators with six degrees of freedom: they can move linearly fore/aft, up/down and side-to-side; they can also pitch, yaw and roll. Their range of motion is obviously finite, but you can get lots of G-forcy sensations by combining these motions.

I’m just speculating, but heavy braking might involve pitching the “nose” of the simulator downward a bit while the screens show a level attitude. Then gravity would feel like deceleration, and what you see on the screens would back that up.
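If that speculation is right, the geometry is simple enough to sanity-check: to fake a steady deceleration a with gravity alone, you tilt the cab nose-down by roughly θ = arctan(a/g). A quick back-of-the-envelope script, with numbers chosen only to show the scale:

```python
import math

# Tilt-coordination: tilt the cab nose-down so gravity mimics deceleration.
# theta = arctan(a / g), where a is the deceleration we want the pilot to feel.
for braking_g in (0.1, 0.3, 0.5):   # deceleration as a fraction of g
    tilt_deg = math.degrees(math.atan(braking_g))
    print(f"{braking_g:.1f} g of 'braking' needs about {tilt_deg:.0f} deg of nose-down tilt")
```

So a gentle braking cue only needs a few degrees of tilt, while really heavy, sustained braking would demand an angle steep enough that I doubt either the platform or the illusion could hold it for long.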

Fuzzy logic is…fuzzy. But never mind; we’re OK. :slight_smile:

I’m trying to apply a general concept (reasonableness test) to multiple situations. It may not be that universal.

RE Air France 447, if there was a reasonableness test routine operating at the time the pitot tubes came back online, the software (as I envision it) would compare the current operating parameters (highly normal) with the (highly abnormal) side stick position (nose up). Now we’re entering the speculative world, but maybe an indication that [conditions are normal, but the copilot has his side-stick in an abnormal position] would be a clue to the other two pilots that something needed to be done?
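In rough pseudo-code terms, the check I’m imagining might look something like this. It’s a cartoon of the real problem; every threshold and name below is invented for illustration, and real flight-control laws are far more involved:

```python
def sidestick_reasonable(altitude_ft, cruise_alt_ft, airspeed_valid,
                         pitch_deg, stick_aft_fraction):
    """Toy 'reasonableness' check: flag a sustained nose-up command at
    cruise altitude while the other parameters look normal.
    All thresholds and names are made up for illustration.
    """
    near_ceiling = altitude_ft > 0.95 * cruise_alt_ft    # close to maximum altitude
    unusual_pitch = pitch_deg > 10.0                      # nose-up well beyond cruise attitude
    full_back_stick = stick_aft_fraction > 0.8            # stick held most of the way aft
    if airspeed_valid and near_ceiling and (unusual_pitch or full_back_stick):
        return "FLAG: sustained nose-up command inconsistent with cruise conditions"
    return "ok"


# Cruise near 38,000 ft, valid airspeed, 12 deg nose-up, stick held near full aft:
print(sidestick_reasonable(37500, 38000, True, 12.0, 0.9))
```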

My reasonableness test is an attempt to make computer rigidity adapt to human knowledge; both have their strengths. We know that relying on computers 100% can be fatal; relying on humans 100% can be, too. Why not combine the best attributes of each?