Intel chips have security design flaw

Between the two, I’d judge Meltdown the more serious for cloud providers like Amazon, MS, etc., and the more serious for big users of cloud services. Meltdown offers the possibility that some random user of AWS or Azure might do a complete memory dump of all your virtualized machines, and mitigation is going to increase costs / reduce speed / cut productivity…

You can’t do anything personally to mitigate that, because you have no control over what other people are doing in their virtual/cloud machines. The providers will have to mitigate, and they will have to mitigate for the worst case, where the impact is higher than normal.

Spectre won’t get out of virtual machines. It is more difficult to fix, but it will be exploited by ordinary viruses or web-site exploits. 99% of the risk will be mitigated by web-browser patches, and in the normal run of things an exploit won’t work correctly unless you are running the specific target OS version for that specific exploit.

Or perhaps I’ve misunderstood.

Are you serious? Is the problem that bad?
I have zero understanding of how computers work, but I’ve been joking for the past 20 years that some day, somewhere, somehow, somebody is going to create a computer virus that will drive the whole world back to the stone age.
Is this now a realistic possibility?

Meltdown is quick and easy to fix, albeit at a significant performance cost for some I/O-intensive workloads. Intel “Haswell” CPUs and later already have PCID, which limits the performance hit.

Spectre is far more serious. It allows a user process running within a VM to break out of it and access data in the hypervisor, which it can then trick into passing it data from other VMs. IOW, on a big virtualized server running separate instances of SQL Server or Oracle, each in its own VM, Spectre can cross VM boundaries and access in-memory user data from other VMs. On a Mac, if you are running Parallels Desktop or VMware Fusion, Spectre theoretically allows a Windows app to break out of that VM and access data in the host OS or from other Mac apps.

Meltdown is a single well-defined behavioral characteristic of Intel and certain ARM CPUs. There currently seems little doubt that when it’s fixed in software via kernel page-table isolation, the fix is reliable and total.

By contrast Spectre is more of a general method, of which two examples are currently known. It is much more difficult to mitigate and even when done, there is less confidence it’s totally fixed. The broad nature of Spectre is obvious from the fact it affects most CPUs designed within the past 20 years – from many different vendors. It’s possible other Spectre variants will be discovered.

The performance cost of total mitigation can also be compared. Meltdown requires OS kernel page-table isolation (not patching apps), which incurs a variable performance cost from essentially zero to maybe 30% in extreme cases. The only way to achieve similar confidence in a Spectre fix would be to disable branch prediction and instruction speculation. That would probably have a 5x or more performance hit – the CPU would essentially be unusable. Thus Spectre fixes to date have been more like patching holes in a leaking dike. It might not be totally fixable without tearing the whole thing down and building a better one – in CPU terms, totally rearchitecting CPU design, something like an Itanium Mark II. In that case all software would have to be recompiled or rewritten to run on the new architecture.

Daniel Gruss, one of the researchers who discovered these issues, described the differences this way: “Think of a Star Wars movie where someone wants to steal money. Spectre is like a Jedi mind trick: you make someone else give you their money, this happens so quick that they don’t realize what they’re doing.”

“Meltdown just grabs the money very quickly like a pickpocket. The Jedi mind trick is of course more difficult to do, but also harder to mitigate.”

I doubt it’s that bad. Off the top of my head I can think of a few possible fixes, like checking privileges before caching or invalidating a speculative cache line after the fact. It may not be easy, but it’s doable. It will require some changes in the CPU design, but it doesn’t sound like a complete rewrite of computer theory just yet.

The original assessment from US-CERT, a US government cybersecurity group, said Spectre could not be reliably fixed in software or microcode and total replacement of all CPUs was the only true solution. They have since walked back that statement, but it shows how this is not a wild, unfounded possibility.

Engineers are very clever at fixing problems and it’s starting to look like maybe Spectre can be contained via some short-term steps. Mid-term Intel can probably add some hardware “crutches” to current x86 design to aid this mitigation. However even this might take two years. The latest Intel CPU is Ice Lake which has “taped out” and is ready for manufacturing, which will happen late in 2018. It has the same architecture as prior Intel CPUs and hence is just as susceptible to Spectre.

To make a CPU hardware change – not an architectural change, just add registers or modes – takes a couple of years for design, validation, testing and manufacturing. So until that point the only possible mitigation steps are software and microcode patches, and only for systems which can receive those. There are lots of older PCs which probably cannot receive new microcode, plus who knows how many mobile and embedded devices that cannot.

This doesn’t mean it’s a “back to the stone age” situation. It’s starting to look like maybe Spectre can be mitigated within certain limits, and maybe it will be 99% successful. However, CPUs are supposed to be essentially perfect from a hardware security and calculation standpoint. It is an uneasy feeling to have CPUs that have even a 1%, or even a 0.001%, chance of producing incorrect results from either a calculation or security standpoint.

OTOH – even aside from Meltdown/Spectre – there has generally been a feeling among some in the CPU architectural community that superscalar “out of order” designs are evolving down a blind alley and something totally different is needed. Whether that is something like Itanium Mark II or the so-called “Mill CPU” is unclear: https://millcomputing.com/

However, if totally fixing Spectre long term without lots of patches and crutches requires a new architecture, maybe it’s time for something new. Apple Macs have converted to an entirely new CPU twice (Motorola 68k to PowerPC to Intel) and obviously the company and computers are still around.

Spectre’s not going to be fixed in a year. I’m not sure that it’s even going to be fixed in 3-4 years, because it’s not something that can simply be patched quickly. Two major architectural methods for increasing performance, caches and speculative execution, are a security risk when combined. A chip that does away with one is going to be much slower. A chip that includes both is going to be very hard to make secure. And that’s just one side-channel. There are probably others.
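To make that caches-plus-speculation interaction concrete, here is the shape of the bounds-check-bypass gadget from the original Spectre variant-1 write-up. The names and the 512-byte stride are illustrative; this is just the code pattern, not a working attack:

```c
#include <stddef.h>
#include <stdint.h>

/* Canonical Spectre variant-1 gadget shape. Architecturally this code is
 * safe; microarchitecturally, a mispredicted branch can run the second
 * load with an out-of-bounds x, leaving a secret-dependent cache line
 * behind for an attacker to detect by timing. */
#define STRIDE 512                  /* spread probe entries across cache lines */

uint8_t array1[16] = {1, 2, 3, 4};
size_t  array1_size = 16;
uint8_t array2[256 * STRIDE];       /* the attacker's timing "probe" array */

uint8_t victim(size_t x) {
    if (x < array1_size)                    /* branch the CPU may speculate past */
        return array2[array1[x] * STRIDE];  /* address depends on array1[x] */
    return 0;
}
```

Architecturally an out-of-bounds `x` just returns 0; the problem is entirely in the cache footprint the rolled-back speculative load leaves behind.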

The problem is that there’s nothing to switch to. There’s no modern high-performance architecture that isn’t vulnerable to this sort of thing. Apple switched between existing commercially-available architectures. Designing something new to remove this flaw is going to take a while.

That’s not exactly correct: the Itanium 9700 is in production right now, and it’s unaffected by Spectre (at least to my knowledge): Itanium - Wikipedia

That would not be ideal, and it’s obviously not intended for desktop use, but it shows there’s a difference between having “nothing to switch to” on both desktop and server, and having a high-performance but non-ideal server-oriented CPU that’s already in production.

Even if there were a more conventional design already in production which wasn’t affected by Spectre – say if IBM POWER CPUs were not affected – the immediate “availability” would make little difference. Most software is Intel x86-based, so it won’t run on a POWER8 CPU. All that software would have to be rewritten for POWER CPUs, and that’s a lot more than just recompiling. If that took eight years, and if designing a new-architecture CPU took two years after which you port all the software, that’s only a difference of eight years vs ten years.

There’s a good argument that conventional superscalar CPUs with speculative “out of order” execution are in an evolutionary blind alley, and that regardless of Spectre, a totally new approach was needed anyway. That may still not happen, but the CPU architectural community will now probably give it a closer look.

Resolving Spectre requires compromises in performance:

1/ disable speculative execution
this costs in performance, but you do gain back significant silicon.
the question is whether you can use this additional silicon to ramp up clock speeds to compensate

2/ enforce page table security
speculative execution currently skips full page-table permission checks
the answer is to enforce page-table security during speculative execution.
however, this may well be as slow as not performing speculative execution at all, and it requires more silicon

3/ make cache access constant-time
the “side-channel” is measuring the time it takes to access a cache hit vs a cache miss.
if you make all cache requests take the same length of time, you eliminate the side-channel leak
of course, this makes the memory cache pointless, and losing the cache slows performance
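A toy simulation of why option 3 works (the hit/miss “timings” here are made-up numbers, not real hardware behavior): with variable latency the attacker recovers which line the victim touched; with constant latency the probe learns nothing.

```c
#include <stdbool.h>
#include <string.h>

/* Toy cache-timing model: a hit "costs" 1, a miss 100. */
static bool cached[256];

static int timed_access(int line, bool constant_time) {
    if (constant_time)
        return 100;                 /* option 3: every access costs the same */
    return cached[line] ? 1 : 100;  /* real caches: hits are visibly faster */
}

/* Attacker probe: time all 256 lines; a fast one betrays the victim's
 * access. Returns -1 when timings are indistinguishable. */
static int probe(bool constant_time) {
    for (int i = 0; i < 256; i++)
        if (timed_access(i, constant_time) < 100)
            return i;
    return -1;
}

/* Demo: the victim touches line `secret`, then the attacker probes. */
int demo(int secret, bool constant_time) {
    memset(cached, 0, sizeof cached);
    cached[secret] = true;
    return probe(constant_time);
}
```

`demo(42, false)` recovers 42; `demo(42, true)` recovers nothing – which is exactly the trade-off: the side channel disappears along with the benefit of having a cache at all.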

And in the most unfortunately named product since AYDS category

I’m in IT, but a lot of this stuff is way too far down in the technical detail for me. What I wanted to ask was:
If there were some sort of per-thread encryption going on – that is, anything that gets written to memory (including onboard CPU caches) is only intelligible to the process performing operations on it (so if anything else got hold of it, it wouldn’t matter) – would that work? Is it even possible?
I’ve a feeling that the answer is something like: yes, but it would prevent any other process from efficiently predicting or speculating anything; or yes, but the encryption overhead would be intolerable.

Not really. At some point the data has to be in a readable form, or at least simply convertible to a readable form, and at that point it can be copied. Anyone can look at the instructions that a program performs, so you can’t hide a secret key inside the executable, and if other processes can read memory, you can’t hide it there either.

There may be some kind of secure hardware thing that could be used for this, but I don’t know enough about that to speculate (ha!)

But you can do some things in software that help.

Here’s a good article on what WebKit (the engine that runs Safari and several other web browsers) is doing to limit Spectre.

They point out that because processors will speculate across branches, you can’t use branches to enforce security. But processors won’t speculate across some other operations, so you can enforce security with mechanisms the CPU won’t speculate past, like masking or modifying pointers.
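The masking idea is easy to sketch: for a power-of-two-sized array, AND the index with size-1 instead of (or in addition to) branching on it, so even a speculative access with a wild index stays in bounds. A minimal sketch with illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256              /* must be a power of two for masking */
static uint8_t table[TABLE_SIZE];

/* Branch-based bounds check: the CPU can speculate past the comparison
 * and perform an out-of-bounds read before the misprediction rolls back. */
uint8_t read_branchy(size_t i) {
    return (i < TABLE_SIZE) ? table[i] : 0;
}

/* Mask-based bounds check: the AND executes even under speculation, so
 * the access can never leave the table, predicted correctly or not. */
uint8_t read_masked(size_t i) {
    return table[i & (TABLE_SIZE - 1)];
}
```

Both functions behave identically for in-bounds indices; the difference is purely in what the CPU can be tricked into doing speculatively.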

One method is to “poison” pointers, by modifying the value of the pointer so it points somewhere invalid, then modify it back to the right thing right before using it. Which is like what you’re suggesting. To be clear, anyone who was able to read memory would still be able to figure out where the pointer went. It’s not secret because the code and the data aren’t secret. But since the CPU won’t speculate past the pointer modification, if someone tries to use Spectre to have the CPU speculatively read where the pointer points, it won’t read anything useful.
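A minimal sketch of that poisoning idea (the XOR key here is an arbitrary illustrative constant; WebKit’s actual scheme mixes per-type values into the key):

```c
#include <stdint.h>

/* Poisoned pointers: stored XORed with a key, un-XORed just before use.
 * The XOR is not a branch, so speculation cannot skip it: a speculative,
 * type-confused access through the raw stored value dereferences a bogus
 * address instead of real data. */
static const uintptr_t POISON_KEY = 0x5bd1e995UL;   /* illustrative key */

uintptr_t poison(void *p)            { return (uintptr_t)p ^ POISON_KEY; }
void     *unpoison(uintptr_t stored) { return (void *)(stored ^ POISON_KEY); }
```

So for some `int v = 7;`, `*(int *)unpoison(poison(&v))` yields 7 again, while the stored value `poison(&v)` points nowhere useful on its own – which matches the point above: the scheme isn’t secrecy, it’s defeating speculation.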

Several people have mentioned that Itanium is not affected by Spectre. Does anyone have a cite for that?

Itanium has both speculative execution and caches, so it seems like it could be vulnerable.

The z14 (mainframe) has some level of low-level encryption (an encryption unit on the core, covering data in cache, etc.), but IBM has not made any public statements yet about Spectre and the z14. There was an article on some website that stated the z14 was not impacted, but they have since retracted it, saying something like “we wrote something that we can’t actually verify, so we shouldn’t have written it”.

This all leads me to believe that even the z14 is vulnerable.

Some IBM Z-series machines run Linux, which was where the need for patches was first announced: https://access.redhat.com/security/vulnerabilities/speculativeexecution

However it was also confirmed by IBM (“…vulnerability impacting all microprocessors, including…IBM POWER family…”) : IBM Security Bulletins - IBM Support

I think Z-series encryption is mainly for disk storage, not processor L1 cache memory or TLB cache (which is what Spectre attacks). I cannot imagine any encryption system fast enough to work on CPU instruction or TLB cache.

Some references state Itanium’s use of speculative execution and branch prediction is not done at run time by hardware. Rather, for a branch the compiler statically encodes instructions which are literally tagged as “speculative”. There are also instructions encoded to handle the “roll back” case for failed speculation.

However other (more credible) references state that although the speculation and branch prediction can be heavily influenced by software “hints” encoded by the compiler, it nonetheless has dynamic branch prediction hardware.

You raise a good point. I have seen several statements (maybe I even made some myself) that Itanium is immune to Spectre, but I don’t think we know for sure.

I’m not an expert and just read the redbook because I was interested, but it does say the crypto coprocessor stores results directly into L1 cache, and it has some other verbiage about keys being protected from kernel and user. So while it’s not encrypting all operations, it still seems like a sufficiently low level of operation to have a chance of being fairly secure.

I don’t think it protects against Spectre; otherwise Red Hat would not list IBM System Z as vulnerable: https://access.redhat.com/security/vulnerabilities/speculativeexecution

Most of IBM’s documentation about “pervasive encryption” is focused on encrypting all data on disk, not in main memory. Even if memory data and code were encrypted, they must obviously be decrypted for the CPU to execute the code and successfully manipulate the data.

The Spectre vulnerability exists because speculative execution perturbs the TLB and cache in a way that lets an app deduce the cache line and, indirectly, the actual contents of memory which was speculatively loaded then later rolled back: Project Zero: Reading privileged memory with a side-channel
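That recovery step can be sketched as a simulation (no real speculation happens here, and the names are illustrative): the rolled-back speculative load warms exactly one “page” of the attacker’s probe array, and scanning for the hot page recovers the secret byte.

```c
#include <stdint.h>
#include <string.h>

/* hot[i] models "page i of the attacker's probe array is now cached". */
static uint8_t hot[256];

/* Stand-in for the rolled-back speculative load: architecturally it has
 * no effect, but it warms probe page number `secret` in the cache. */
void speculative_load(uint8_t secret) {
    memset(hot, 0, sizeof hot);
    hot[secret] = 1;            /* models touching probe_array[secret * page_size] */
}

/* Recovery: the one hot page reveals the speculatively read byte. */
int recover(void) {
    for (int i = 0; i < 256; i++)
        if (hot[i])
            return i;
    return -1;
}
```

The point of the simulation: nothing secret is ever copied anywhere an attacker can read directly – the secret is reconstructed entirely from *which* cache line got faster.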

Even if the Z14 used RAM encryption, this would have to be decrypted “in flight” to actually run instructions and access data. During that window Spectre could still happen.

I’m sure mainframe shops using System Z will get more guidance from this, but I don’t currently see how encryption would prevent Spectre.

And the hits just keep coming. NVIDIA has announced that some of their GPUs are vulnerable and is issuing patches for them.

GPUs.

Also, there’s a flawed patch for some Ubuntu systems that is hosing people’s computers.

Let’s be careful out there.

Yeah, I don’t see encryption working here. Even if the caches are encrypted (which would be no small engineering task and would probably imply some major performance hits), the values in registers can’t be, and Spectre uses the value in a register to load/store from a particular address.