Crowdstrike failure - How? Why?

One thing I remember being told about at some point, but have never actually experienced, is that it used to be a Big Deal to generate Position-Independent Code, where no absolute memory addresses were allowed, just offsets.

Allegedly, if you weren’t making position-independent code, there was a process at load time that went through the machine code and adjusted the addresses so they were right for wherever the binary happened to be loaded in memory.

Virtual addresses cured the need for that for user-space code, but maybe it’s still a thing for kernel code?

The only reason I think of it is that it could bork memory accesses if applied (or not applied) inappropriately, and in different ways on different runs/setups depending on where the stuff gets loaded. Especially if the offending .sys file is a “stealth binary”, as the Dave’s Garage guy describes.
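For anyone who hasn’t seen one: that load-time fix-up is usually called relocation. Here’s a rough Python sketch. It assumes a toy image held in a bytearray, a made-up preferred base address, and a made-up list of offsets where absolute 32-bit addresses live. Real formats (PE base relocations, ELF relocation entries) carry more detail, and none of the names here come from them.

```python
import struct

# Hypothetical: the address the image was linked to run at.
PREFERRED_BASE = 0x10000000

def relocate(image: bytearray, fixup_offsets: list[int], actual_base: int) -> None:
    """Patch each absolute 32-bit address in place so it is right for actual_base."""
    delta = actual_base - PREFERRED_BASE
    for off in fixup_offsets:
        (addr,) = struct.unpack_from("<I", image, off)
        # Wrap to 32 bits, as a 32-bit loader would.
        struct.pack_into("<I", image, off, (addr + delta) & 0xFFFFFFFF)

# An 8-byte "image": the first word is a pointer to offset 0x40 inside the image
# (as linked for PREFERRED_BASE); the second word is plain data, not an address.
image = bytearray(struct.pack("<II", PREFERRED_BASE + 0x40, 0xDEADBEEF))
relocate(image, fixup_offsets=[0], actual_base=0x20000000)
print(hex(struct.unpack_from("<I", image, 0)[0]))
```

The loader adds the same `delta` to every listed address, and `delta` depends on where the image lands. If it skips an entry, or patches a word that isn’t really an address, the result is wrong by an amount that changes from setup to setup.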

So actually the null pointer is a red herring. It’s not a null pointer; a better analysis of the crash dump shows there is a null check happening. Instead, it’s reading past the end of an array, retrieving some uninitialized memory and treating it as a pointer. A pretty classic C/C++ mistake, not quite as bad as not checking for null but still pretty bad. And definitely something a good static analysis tool could catch.

My (revised) guess is that there is a count somewhere in that file that is wrong in the bad version. So it’s providing, say, 100 rules but claiming there are 105, and the C++ code never checks how many rules it actually has.
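That guess can be made concrete with a minimal Python simulation. It assumes a made-up file layout (a 4-byte rule count followed by 4-byte entries), which has nothing to do with the real, unpublished channel-file format. Because Python raises an error when you index a list past its end, the sketch models memory as one contiguous byte buffer with leftover junk after the real data, which is how C sees it. The unchecked reader trusts the header. The checked reader compares the header against how much data is actually there.

```python
import struct

ENTRY_SIZE = 4  # hypothetical: each rule is one 32-bit value

def build_file(rules: list[int], claimed_count: int) -> bytes:
    """Header claims claimed_count rules, but only len(rules) are written."""
    return struct.pack("<I", claimed_count) + b"".join(struct.pack("<I", r) for r in rules)

def read_rules_unchecked(memory: bytes) -> list[int]:
    # Trusts the header count, like C code indexing an array with no bound check.
    (count,) = struct.unpack_from("<I", memory, 0)
    return [struct.unpack_from("<I", memory, 4 + i * ENTRY_SIZE)[0] for i in range(count)]

def read_rules_checked(data: bytes) -> list[int]:
    # Rejects the file if the header promises more entries than the data holds.
    (count,) = struct.unpack_from("<I", data, 0)
    available = (len(data) - 4) // ENTRY_SIZE
    if count > available:
        raise ValueError(f"header claims {count} rules, file holds {available}")
    return [struct.unpack_from("<I", data, 4 + i * ENTRY_SIZE)[0] for i in range(count)]

good_rules = list(range(100))
bad_file = build_file(good_rules, claimed_count=105)
# Simulate whatever happens to sit in memory right after the file's data.
memory = bad_file + b"\xAA" * 64

rules = read_rules_unchecked(memory)
print(len(rules), hex(rules[-1]))  # the last five "rules" are junk bytes
```

The checked version only works if the parser knows the real size of the buffer it was handed. That is why C code usually passes a length alongside the pointer and never relies on a count read from the data itself.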

I’m not confident that I’m technically qualified/knowledgeable enough to ask any questions that comply with all the modnotes, but here are a couple anyway:

  1. How does anyone push a (ideally global) patch to a shit-ton of bricked computers?

  2. Even though I’ve never actually read the entirety of an EULA, I expect that a standard clause in those indemnifies the provider against liability for damage to the users’ systems (or at least disclaims it). Is such a clause going to protect Crowdstrike?

You don’t, of course. That’s why corporate IT departments are having employees bring in their computers to be fixed. If you’re referring to the initial push, that was the cause of the bricking, and hence happened before it.

Thank you. I suspected it might be that flavor of clusterfuck. But a fix that immense seems costly enough to demand that my second question be considered (and on reflection, it’s perhaps more appropriate to IMHO).

If you’re talking about the “fix” patch, they say that on some systems, rebooting multiple times works. That’s because the process that downloads updates gets launched at the same time as Crowdstrike Falcon itself, and sometimes, by chance, it finishes before Falcon gets to the “brick everything” part of its execution.
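The “reboot until it works” advice is a race, and the odds compound. Suppose each boot independently gives the updater a probability p of winning. The chance of at least one win in n boots is then 1 - (1 - p)^n. Here is a sketch of that arithmetic. The value of p is purely made up, since nobody outside Crowdstrike knows the real figure, and real boots may not be independent trials.

```python
def chance_fixed_within(n_reboots: int, p_win: float) -> float:
    """Probability the updater wins the race at least once in n_reboots boots,
    assuming each boot is an independent trial with success probability p_win."""
    return 1.0 - (1.0 - p_win) ** n_reboots

for n in (1, 5, 15):
    # p_win = 0.1 is a hypothetical illustration, not a measured value.
    print(n, round(chance_fixed_within(n, 0.1), 3))
```

With p = 0.1, fifteen reboots get you to roughly 79%. A machine where the updater almost never wins can stay stuck indefinitely, which fits with some machines needing hands-on fixes.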

And they are the lucky ones:

Position-independent code is about removing absolute locations to jump to or call functions at, so jump and call instructions are always relative to some offset. This makes relocating code easy, and generally enables a lot of sensible and useful management. It doesn’t help with bad data pointers, which is where null pointers and the like cause issues.
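Here is why relative calls survive being moved. On x86, a near CALL with a 32-bit operand (opcode E8, 5 bytes long) stores its target as a displacement from the address of the next instruction. The sketch below computes that displacement and shows it stays the same when caller and callee move together, while an absolute address would change. The addresses are arbitrary example values.

```python
CALL_REL32_LEN = 5  # E8 opcode byte + 4-byte displacement

def rel32_displacement(call_addr: int, target_addr: int) -> int:
    """Displacement encoded in a CALL rel32 at call_addr that calls target_addr."""
    disp = target_addr - (call_addr + CALL_REL32_LEN)
    if not -(2**31) <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return disp

call_site, function = 0x1000, 0x1800  # hypothetical offsets within one module
shift = 0x7FF0_0000                   # wherever the loader happens to put the module

before = rel32_displacement(call_site, function)
after = rel32_displacement(call_site + shift, function + shift)
print(before == after)                       # same encoded bytes, no patching needed
print(hex(function), hex(function + shift))  # an absolute address would have to change
```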

Modern operating systems mostly require PIC, but for different reasons. Randomising the location of code, especially the libraries that are loaded with a program, is one tactic that makes some exploits difficult if not impossible. Typically, libraries are loaded as a shared, read-only mapping of the library file. That makes for a lot of efficiency, as the raw data is only present in memory once, no matter how many processes use it. So the old-school loader that ran over the code fixing up all the offsets is a bad idea: patching the code would give each process its own modified copy and throw that sharing away.
Libraries will often provide a jump table at their head, so there is a fixed entry point relative to the library code that user code initially calls into. That then redirects to the actual code. So a linked executable is given the entry points, and is then independent of the internals.
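Here is a minimal sketch of that indirection. Callers only ever know a slot number in a fixed table, and the library is free to move or rewrite the code behind each slot. The function names and slot layout are hypothetical. Real systems do this in various ways, often with the loader filling in the table, and this only illustrates the idea.

```python
from typing import Callable

# Library internals: free to be renamed, reordered, or rewritten between versions.
def _open_v2(name: str) -> str:
    return f"opened {name}"

def _close_v2(name: str) -> str:
    return f"closed {name}"

# The published interface: slot numbers whose meaning must never change.
SLOT_OPEN, SLOT_CLOSE = 0, 1
JUMP_TABLE: list[Callable[[str], str]] = [_open_v2, _close_v2]

def call_library(slot: int, arg: str) -> str:
    # The caller only knows the slot; the table redirects to the real code.
    return JUMP_TABLE[slot](arg)

print(call_library(SLOT_OPEN, "config.dat"))
```

Replacing `JUMP_TABLE[SLOT_OPEN]` with a new implementation changes what callers get without touching any caller, which is the independence from internals described above.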

There is a lot of speculation about the internals of the Crowdstrike Falcon failure. Just what is in the configuration files is a deep mystery.

That’s impossible to answer at this point. We won’t know until the dozens if not hundreds of lawsuits work their way through the process. And given the fragmentary nature of the American court system, let alone other legal frameworks around the world, it’s unlikely that there would be one single consistent result in every single suit anyway.

But it should be noted that there are different flavors of “protect” here. As I said in another thread, even if you stipulate that the clause will allow Crowdstrike to prevail in every one of those lawsuits, the sheer number of them means the cost of defense will be staggering. There’s no indemnifying clause in the world that allows a company like this to handwave all the prospective lawsuits directly into the cornfield.

This is uncharted territory, in terms of scale and operational and financial impact. I don’t think any prior examples give us a reliable basis for predicting future events here.

FWIW, a rando commenter at Krebs on Security provided some of the language:

THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE.

No, I have no idea what the legal implications of such language are: see Cervaise’s post above.

Yeah, the EULA will almost certainly disclaim consequential losses, because that’s pretty much boilerplate text for any EULA, but these things are not immune to legal challenge.

A possibly-relevant analogy: most supermarkets have a sign in the car park that says something like ‘cars are parked at owner’s risk; the company is not responsible for loss or damage to vehicles, howsoever caused’, but if an employee of the store drives a pallet truck into your car, that sign is not worth the paint used to make it.

That stuff is a pretty clear prohibition on certain uses. It says do not use this software at all in these cases. As in, we won’t sell it to you, and if you buy it and use it in a critical situation, you are breaching the agreement. Someone who did install the software in a safety-critical system, where it then caused damage, would be entirely on the hook for that themselves. And probably the subject of criminal action if the outcomes were bad enough.

It is very common to see statements for hardware that say you need a sign off from the company CEO before you can use a product in a safety critical application.

So this language has no bearing on companies that bought the software for its intended use and suffered loss. Those companies bought the system to perform its advertised function. You can’t add language to a EULA that says the product is not fit for purpose and expect a court to give it any weight. But you can say what it must not be used for.

The old shrink-wrap EULAs of days gone by, where vendors disclaimed any responsibility even if the software manifestly did not work as advertised, are long gone.
The various FOSS licenses are perhaps the exception. You never paid, so there is no contract. But this, as far as I know, has never been tested. Typically, legal action follows the money, and going after some poor lone open source contributor is not going to pay anyone’s lawyers. So it remains an unanswered question.

The thing that gets me about these kiosks and signs is: what do they even need Crowdstrike for at all?

Even taking for granted that they need to be running Windows, and that they need to be connected to the open internet, these computers only ever need to run one thing. The security policy shouldn’t be, “download new configuration every day to check to see if any newly-created malicious programs are running on this device and terminate those”. It should be, “Terminate any program that isn’t the one thing that is supposed to be running on this device”.

Something like that would be both more secure and wouldn’t be susceptible to being borked by an update.
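The difference between the two policies fits in a few lines. The process names below are hypothetical. A real allowlist would also match on something harder to spoof than a name, such as a code signature or file hash.

```python
# Hypothetical names, for illustration only.
ALLOWED = {"signage_player.exe"}                  # allowlist: the one thing that should run
KNOWN_BAD = {"ransomware_2023.exe", "miner.exe"}  # denylist: needs constant updates

def denylist_kills(running: set[str]) -> set[str]:
    """Kill only what the latest threat feed has heard of."""
    return running & KNOWN_BAD

def allowlist_kills(running: set[str]) -> set[str]:
    """Kill everything not explicitly expected."""
    return running - ALLOWED

running = {"signage_player.exe", "brand_new_malware.exe"}
print(denylist_kills(running))   # set(): the new malware isn't in the feed yet
print(allowlist_kills(running))  # {'brand_new_malware.exe'}
```

The catch is that every legitimate change also lands in `allowlist_kills` until someone updates `ALLOWED`. That includes an app update that spawns a helper process and a new OS service.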

And if the malware somehow corrupts that guardian process, how will it ever get uncorrupted?

Sure it is susceptible. What happens when the whitelisted software updates, and now it spawns some extra processes that are blocked? What happens when the OS updates, and now has some new processes that need to run?

I’m not trying to pick on you, I’m just saying this stuff is hard. If it was easy, then people would be doing it that way.

Security stuff can be extremely difficult because you need the ability to push extremely quick updates to deal with emerging threats, while also having very low-level access, where a bug can cause major problems.

I’m not trying to defend Crowdstrike, but I am offering a defense for the IT departments that chose to use it. They’re working in a hostile environment, where the attackers only have to win once. They choose to use something supposedly tested and created by experts to help them defend their computers. They may have picked the wrong company, or they may have picked the right company who went rotten, or just made a mistake.

Microsoft has identified the source of the problem:

It’s the EU’s fault!

See, the EU doesn’t let Microsoft be a full monopoly, as God and Bill Gates intended.

EU anti-competition law forces Microsoft to contract out some of its functions to other software companies, preventing poor Microsoft from having complete quality control over its own software.

But for that nasty EU law « Boo! Hiss! », Microsoft wouldn’t have been forced to allow Crowdstrike access to its kernel (that sounds vaguely pornographic…), and none of this would have happened.

I would consider kiosks and signs as a prime target of a certain class of hacker. The ability to display whatever took your fancy on such a visible surface would be irresistible. So hardening them would be good practice.

Having your sign displaying porn would only be the start of possible nasties.
True bad actors could subvert things in much more subtle and damaging ways.

They don’t. They could be running some form of Unix on a Raspberry Pi. There is no need for a billboard to require a full OS like Windows.

If there is actually a computer hidden behind that monitor in the video provided by @griffin1977, then that is quite possibly the worst design of a billboard/kiosk that has ever existed. Surely the computer that drives that billboard is located somewhere that is a bit more accessible.

I doubt that these two guys are fixing the Crowdstrike problem.

Yes.

And yes.