Yes, VERY picky. It’s insane to, for example, worry about the risk of an accidental SHA collision. (A collision due to a deliberate attack is a different matter, but irrelevant to this discussion.) Since SHA uses a 160-bit checksum, the risk of a SHA collision is about 2[sup]-160[/sup]. By comparison, if we assume a Chicxulub-scale asteroid impacts the earth about every 100 million years, the risk of THREE Chicxulub asteroids striking the earth within the same MINUTE is vastly greater than the risk of a SHA collision.
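A back-of-the-envelope check of that comparison, assuming (my numbers, for illustration) one Chicxulub-scale impact per 100 million years and independent impacts:

```python
# Rough arithmetic behind the asteroid-vs-SHA comparison above.
MINUTES_PER_YEAR = 365 * 24 * 60                 # 525,600
p_impact_per_minute = 1 / (100e6 * MINUTES_PER_YEAR)

# Probability that three independent impacts all land in one given minute:
p_three_in_a_minute = p_impact_per_minute ** 3   # ~7e-42

# Probability that two random 160-bit hashes collide:
p_sha_collision = 2 ** -160                      # ~7e-49

print(f"three impacts in one minute: {p_three_in_a_minute:.1e}")
print(f"accidental SHA collision:    {p_sha_collision:.1e}")
```

The three-asteroids-in-a-minute event comes out roughly ten million times more likely than the accidental collision.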
I did say it was insane.
I used to teach this stuff to grad students. The point was to drive home the reality that there was no “perfect” solution, and that they should always realise this - and to engineer in a manner that understood and worked with the risks and probabilities. It isn’t just about updates to firmware, the problem turns up all over the place in distributed systems. And you get people who claim to have an algorithmic solution. Sometimes a cryptographic/checksum solution isn’t viable, and you need to work with what you have.
In the case of firmware, a strong checksum is a good answer. But you do see people who don’t understand the deep nature of the problem still come up with algorithmic “solutions”, and thus don’t realise what an acceptable solution even looks like.
FWIW, Amazon’s tablet/Fire TV OS update images are verified using RSA signatures.
For some tablets a few years ago there was a flaw in the bootloader that allowed you to install a self-signed update.
This is a common problem with encryption. While the principle might be safe, the implementation often screws something up.
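A toy illustration of the signature check being discussed, using textbook-sized RSA numbers (never use values like these in practice). The bootloader flaw mentioned above belongs to a whole class of implementation bugs where the math is fine but the device verifies against the wrong key, e.g. one shipped inside the update itself rather than the vendor key baked into ROM:

```python
# Toy RSA signature verify. Key values are the classic textbook example
# (n = 61 * 53); purely illustrative, hopelessly insecure.
import hashlib

N, E, D = 3233, 17, 2753   # modulus, public exponent, private exponent

def digest(data: bytes) -> int:
    # Reduce the hash mod N so it fits the toy modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    return pow(digest(data), D, N)           # vendor side, private key

def verify(data: bytes, sig: int) -> bool:
    return pow(sig, E, N) == digest(data)    # device side, public key

update = b"firmware image"
sig = sign(update)
print(verify(update, sig))                   # True
print(verify(update, (sig + 1) % N))         # False: corrupted signature
```

The principle is exactly this check; the screw-ups happen around it, in where `E` and `N` come from and whether the result is actually enforced.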
Devices can be bricked in all sorts of ways and there’s a myriad of ways to undo it. Some Android phones can be fixed using a “fast boot cable” you can get off of eBay hooked up to a PC. But manufacturers have gotten trickier since this can be used to root a phone.
I wonder what sort of special cable/software places like Amazon use to factory reset returned devices for resale. And they resell a lot of them. (I bricked my Fire TV Stick a year ago. Eventually a fix was posted online. Took some work but got it working again. And rooted!)
Sounds to me like the problem was not the concept, but the implementation. In some older computers the first-level boot code was no more than half a dozen instructions entered through the console switches. It was a very inefficient loader and its only purpose was to download a real bootloader, which then proceeded to do the real job of booting the OS. It sounds like this company cluelessly put far too much functionality into the first-level boot, and then on top of that failed to test it exhaustively. This kind of boot code is your “lifeboat” code and absolutely has to work right every time – so you want to keep it simple and test the hell out of it.
The obvious and fundamental reason devices usually get bricked is that they end up in a state where there’s no functional code left with the ability to load anything. Fully operational firmware may have the flexibility to load from lots of different sources, but all you need for a fallback is a very simple piece of ROM-resident boot code that cannot be damaged that loads from some basic device, and it may not need to load anything more than some very rudimentary firmware whose only function is to be able to upgrade to real firmware – the kind of incremental loading that is the original meaning of “bootstrapping”.
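The two-stage idea above can be sketched in a few lines. Everything here is hypothetical (function names, the string return values standing in for jump targets); the point is just how little the ROM stage has to know:

```python
# Sketch of a minimal ROM-resident fallback: verify the main image's
# stored checksum, jump to it if good, otherwise drop into a rudimentary
# recovery loader whose only job is to accept a fresh image.
import hashlib

def image_valid(image: bytes, expected_sha256: str) -> bool:
    """The ROM stage's only 'smart' job: check the stored digest."""
    return hashlib.sha256(image).hexdigest() == expected_sha256

def rom_boot(main_image: bytes, main_digest: str) -> str:
    if image_valid(main_image, main_digest):
        return "boot_main"        # hand off to the full firmware
    return "enter_recovery"       # bare-bones loader: reflash and retry

good = b"\x00" * 64               # stand-in firmware image
digest = hashlib.sha256(good).hexdigest()
print(rom_boot(good, digest))     # boot_main
print(rom_boot(good[:-1], digest))  # enter_recovery (truncated flash)
```

Because the ROM stage is tiny and never rewritten, a botched upgrade lands you in the recovery branch instead of a brick.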
Of course, in the age of “cheap and quick” and throwaway devices, it’s often tempting to just design a single piece of firmware and dispense with the complexity, and if the upgrade fails and the device is bricked, bonus! – you tell the customer to buy a new device!
I get what you’re saying about the checksum being used to verify the actual final firmware installed image and then, in conjunction with the two-image ping-pong, committing to the new install or reverting to the previous one.
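The ping-pong scheme being described can be sketched as a small state machine. The slot names, the “pending” flag, and the trial-boot commit step are my assumptions for illustration, not any particular vendor’s design:

```python
# Minimal two-slot (A/B) update sketch: install to the standby slot only
# after verification, trial-boot it, and commit or roll back.
class ABSlots:
    def __init__(self):
        self.active, self.standby = "A", "B"
        self.pending = False                 # new image staged, unconfirmed

    def install(self, image_ok: bool) -> None:
        """Write the new image to the standby slot; switch only if it verifies."""
        if not image_ok:
            return                           # bad checksum: never switch
        self.active, self.standby = self.standby, self.active
        self.pending = True                  # next boot is a trial boot

    def boot_result(self, came_up: bool) -> str:
        """Commit on a successful trial boot, otherwise revert."""
        if not self.pending:
            return f"normal boot from {self.active}"
        self.pending = False
        if came_up:
            return f"committed {self.active}"
        self.active, self.standby = self.standby, self.active
        return f"rolled back to {self.active}"

slots = ABSlots()
slots.install(image_ok=True)                 # verified image goes to slot B
print(slots.boot_result(came_up=False))      # trial boot fails: back to A
```

Note the old image is never overwritten until the new one has both verified and actually booted.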
The problem, however, goes beyond the possibility of a checksum yielding a false verification. It is essentially the problem that the checksum is a proxy for correct operation, not proof of correct operation. For instance, it doesn’t guard against a bug that renders valid release firmware, properly installed, inoperable in a particular situation, or a bug in the switchover process itself from the old to the new. The evidence for that is your own example that I just finished quoting above. If there is one universal truth in the world of software and firmware alike, a universal truth especially well known to developers of fault-tolerant systems and mission-critical systems in space and avionics, it’s this: shit happens.
We should distinguish between hard brick and soft brick: A hard brick is where the hardware is uselessly wedged, such that there is no hope of fixing it, no reset available, and you might as well get the tin snips to strip off the chips you want to reuse, assuming such a gauche thing is even contemplated. That’s a failure in any reasonable sense, because it’s equivalent to physical destruction of the hardware.
A soft brick is when the firmware can’t do anything useful, but reset is possible, so there’s the possibility of loading new firmware. That’s actually a good failure mode: The device is broken, but it won’t do any damage to itself or anything else, and can potentially be rescued by someone with the right knowledge and tools.
Of course, in context, it might be the case that nobody with that knowledge and toolkit will be available, so you might as well hard brick and save everyone the trouble.
Or a bug which causes the firmware to enter a state where it will never halt, when fed a certain input. If only there were a way to prove there were none of those bugs in our code in a finite amount of time.
For IBM’s mainframes, the dividing date for “modern technology” was 1970 or earlier.
While most System/360 models stored their microcode in ROM, e.g. Balanced Capacitor Read-Only Storage, most System/370 models read their microcode from floppy disks at power-on (“IMPL”) and loaded it to semiconductor RAM. I think there was even some firmware in writable core storage for some S/360 models, both low-end (Model 25) and high-end (Model 85).
Yes, of course. The purpose of the checksum is simply to ensure that the software installed on the device is the same software that was developed and tested; that it’s not corrupted or truncated. It certainly does not ensure that the software is bug free. But the issue of software reliability is orthogonal to the method of updating software. Software that was installed years ago and has run fine since then can suddenly fail when confronted with a situation that was not foreseen and/or tested, even if no updates have occurred. If you know of a method to guarantee that software has no bugs, I’ll gladly pay you $10 for it.
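The distinction in that paragraph fits in a few lines of code. The image bytes and digest here are made up; the point is what the check does and does not prove:

```python
# A checksum proves "this is the image we shipped", not "this image works".
import hashlib

released = b"firmware v2.1 image bytes"     # hypothetical release image
published_digest = hashlib.sha256(released).hexdigest()

# Corruption or truncation in transit IS caught:
truncated = released[:-1]
assert hashlib.sha256(truncated).hexdigest() != published_digest

# But a buggy release verifies perfectly; the digest says nothing
# about the behavior of the code inside it:
assert hashlib.sha256(released).hexdigest() == published_digest
```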
For consumer devices, this may be an unrealistic requirement. For example, many Roku devices have no removable storage – no USB slot, no SD card slot, no serial port, etc. The only way to install software is over the network, and I think we’re agreed that putting a network stack in the boot rom is problematic at best. To add hardware support for a removable device SOLELY for the purpose of recovering from a brick condition that very rarely if ever occurs would add so much cost to the bill of materials that it’s just not defensible. It’s far cheaper to simply replace bricked units, when and if they occur.