My late brother disliked the C language and hated having to program in it. He was a systems analyst who set up a secure link between a major NY bank and the Fed during the mid-90s. His beef with C was that it didn’t include bounds checking, which meant that if a string was declared at, say, 256 bytes and you wrote 5,000 bytes to it, the language would accept it and overwrite whatever was there. Whenever he had to program in C, he would write his own arrays and strings with built-in bounds checks. Yes, it was slower and took up more space, but it was also safer. Nowadays, neither space nor time is a real consideration.
From the minimal explanations I got from the news reports, it seems the problem is that a routine that was supposed to read a few bytes to see if a link was still live actually allowed many more bytes to be read, possibly reaching into the password file. Is this explanation correct? And is it just another manifestation of the bounds-check problem?
Correct, that’s exactly what the attack is: the protocol is designed to echo back whatever you send it. You send it 1 byte of data but claim in the header that your packet contains 64 KB of data. The server sends back your original byte, plus up to 64 KB of memory contents. Here’s a good summary of the bug.
Your point is valid, but this is different. If you tell memcpy() to copy X bytes from A to B, the compiler has no way to know whether X is sensible. There is no buffer overrun; B can fit X bytes just fine. But you are trusting that the user was truthful when he declared A to be X bytes long. If he wasn’t, the memcpy() will sweep in some stuff that exceeds the bounds of A.
Pedro: but the real point here is that if they had been using the standard malloc(), it might have done address randomization, memory zeroing, guard pages, and other good security practices. They’d have gotten those security benefits “for free”.
But instead they wrote their own that isn’t nearly as good as the version that ships with the OS. Basically, they expended more effort to come up with an inferior result. All because some OS, some time back in history, was “slow”.
I don’t see how address space randomization would help. Address space randomization protects against attacks that exploit a known memory layout (like return-to-libc attacks). The heartbleed exploit doesn’t depend on a known memory layout; it just throws darts at a wall until it gets something juicy.
Memory zeroing is only partially effective. It obviously can’t protect live data. Something like a username/password combo would have a relatively short lifetime, but statistics say you’re still going to leak some usernames and passwords. And the performance cost is so bad that you would never turn this on by default. At best it could have mitigated the issue in the window between the fix being released and administrators actually restarting their openssl-using services.
Guard pages are a nice idea but the performance impact is brutal. OpenBSD’s own paper describing the feature shows that while many benchmarks (which probably don’t malloc much) show no performance impact, others showed a 50-100% performance penalty. I don’t believe that any OS outside of OpenBSD has used that approach.
The last time that I checked, both Linux and Windows fall under the “terrible malloc() performance” umbrella. This is why Google wrote tcmalloc, Facebook hired the jemalloc developer and Firefox imported jemalloc into their browser. This isn’t some theoretical issue on 1995-era AIX. It is a real problem today.
It’s only different if you’re overly literal with the meanings of “subscript” and “array”.
Implicitly, any random chunk of heap is an array of bytes. The fact that memcpy doesn’t do any check on its hidden iterator is exactly what the quote is about.
And I can attest… the most literal reading of the original quote (for instance, in some broken FORTRAN IV installations) can suffer exactly the same runtime behavior as heartbleed’s unchecked read: the ability to read beyond the notional end of the data structure.
I still think your point is without substance for this particular case. If you want to deprecate memcpy as unsafe, then it’s no longer C. Otherwise, no compiler check in the world is going to detect this, and Kernighan didn’t propose any either. A and B are just memory addresses; the compiler has no idea what was allocated there dynamically at run time.