So I work at a company that deploys embedded systems. These systems are produced in millions of units. They fail sometimes. And it costs inordinate amounts of time to hunt down the faults - often several months, where several well compensated developers are working on the problem basically full time. (even worse, when the faults get really serious, executives will schedule daily meetings on the problem, that can drag on for months, and that can’t be cheap)
So I read how SpaceX uses this pattern. It’s a pretty simple idea.
a. Divide your software into isolated modules. (they must be isolated in memory, OS isolation is optimal)
b. Each module cannot retain any state, except for modules who’s role is to do nothing but.
c. Each module can only communicate with other modules through “messages”. (they can be implemented efficient ways but the message framework must guarantee messages can’t be corrupted by the sender)
d. Each module must be completely deterministic - a given set of input messages has one and only one correct set of output messages. This notably means you have to set the FPU to IEEE mode, etc…
Why do this? Because the message passing framework has another purpose. Any messages can be copied as they are sent. Or, later, reinjected. This means if a system fails when it’s running, and you have it in debug mode saving all input messages (hardware has to be specced fast enough to make this feasible), bugs can always be reproduced. (due to the determinism property)
Almost of the system can also be unit tested as well, except for very low level code that’s role is to touch hardware registers but should use the messaging framework…
A second major advantage is that unit testing is a pain in the ass in low level languages. If the system uses messages, you can build a test framework where the tests are written in a scripting language, such as Ruby or Python. Furthermore, you can do better than just plain unit tests - you can originally implement each algorithm in Python/Ruby/Matlab. (due to using a message passing framework that is language agnostic, there is no issue with part of the codebase already being in a lower level language).
Then, once the algorithm is proven correct enough, reimplement the algorithm in a faster programming language, and test it against a massive saved pile of input messages and insure it gives the same results. Readily automatable as a regression test as well.
Memory is separate from business logic. This would also prevent a great many of the memory corruption bugs from low level programming, because to save any information, you have to pass a message to some other system to do it. That other system can be written properly to not leak memory…
Maybe I need to work at a different company. Where I work at, it’s all about right now, and rarely about any sort of method that might prevent the current software crisis.