The video posted by Stranger is excellent, but you have to have an appetite for some hairy technical details or your eyes will glaze over. It’s really clearly explained, but you do need some tolerance for references like “buffer overflow” rushing past with an assumption that you know what he’s talking about.
If that’s too technical, here’s the very, very simple version.
What is this software? Crowdstrike — or more specifically, Crowdstrike Falcon, but I’ll just call it Crowdstrike for simplicity — is security software that defends computers from attack. It’s like a guard dog at your gate. If something bad tries to get into the computer, it’s supposed to protect the machine. Lots and lots of companies around the world use this software to defend their computers. It’s one of the standards in the field, and prior to this was (generally) well regarded.
What went wrong? Defensive programs like this get updates all the time. Security is not a static effort, like a big fence. The capabilities of the bad guys are constantly evolving, so the protective software needs to be regularly improved to match. In this case, an update was sent out, which is normal. However, it turns out that one of the update files for Windows (and only Windows; Linux, for example, was safe) was broken in a very weird way. (The video linked above explains exactly how weird.) So when Crowdstrike tried to load the file, it crashed and took Windows down with it. Worse, the same crash happened again every time the computer tried to restart, so the computer was unable to start up.
Why would this prevent the computer from starting up? Defensive programs like this have to run at a very deep layer on the computer; you will hear technical terms like “privileged” and “kernel,” but what this means in simple terms is that the software lives and operates at a fundamental level in order to protect the machine. This is because viruses and other bad things have become very good at exploiting holes in computer design to insert themselves at the same very deep levels, which makes them very hard to remove. So the defensive software has to run at the same deep layer if it’s going to protect the computer, and that means it’s one of the first things that gets launched when a computer is started. Unfortunately, software running at that layer has no safety net. If the defensive software crashes with a stupid error like this during startup, Windows can’t just shrug it off and carry on, so the computer never gets as far as anything else.
Why did this break so much of the internet? Because computer systems have become so heavily reliant on one another. Even if a specific computer system isn’t using Crowdstrike, if it’s critically dependent on another system that does use Crowdstrike, then that first system becomes useless. Consider the point-of-sale experience, where you swipe your credit card at a restaurant or grocery store or gas station. Maybe that local system doesn’t use Crowdstrike, and is still working fine. But if, after you swipe your card, the restaurant’s computer tries to contact a card-processing system that does use Crowdstrike and is broken, the restaurant can’t clear your payment. Multiply this by the thousands of systems that lean on one another to do stuff, and anywhere you get a broken link in the chain, the whole thing locks up.
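For anyone who likes to see the idea spelled out, here is a rough sketch of that chain reaction in Python. Everything in it is made up for illustration: the system names, who depends on whom, and the find_stalled helper are all assumptions, not a description of any real payment network. The only point is that one broken system stalls everything that leans on it, directly or indirectly.

```python
# Hypothetical systems, invented for illustration only.
# Each system lists the other systems it cannot work without.
DEPENDS_ON = {
    "restaurant_till": ["card_processor"],
    "gas_pump": ["card_processor"],
    "delivery_app": ["restaurant_till"],
    "card_processor": ["bank_gateway"],
    "bank_gateway": [],
    "library_catalog": [],
}


def find_stalled(depends_on, broken):
    """Return every system that is broken or relies on something that is."""
    stalled = set(broken)
    changed = True
    # Keep sweeping until a full pass adds nothing new, so that
    # indirect dependencies (A needs B, B needs C) are caught too.
    while changed:
        changed = False
        for system, needs in depends_on.items():
            if system not in stalled and any(n in stalled for n in needs):
                stalled.add(system)
                changed = True
    return stalled


if __name__ == "__main__":
    broken = {"card_processor"}  # the one system that crashed
    knocked_out = find_stalled(DEPENDS_ON, broken) - broken
    print(sorted(knocked_out))
    # prints ['delivery_app', 'gas_pump', 'restaurant_till']
```

Note that in this toy example only the card processor actually crashed. The till, the pump, and the delivery app are all healthy machines that simply have nobody to talk to, while the bank gateway and the library catalog, which don't depend on the broken system, carry on as normal.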
How did this bad update file get sent out? Didn’t they test it? Why don’t they do phased rollouts by region? Those are the million-dollar questions right now, and lots and lots of angry people are standing at Crowdstrike’s front door demanding answers.