Could the 1980s reverse engineer a modern iPhone?

Something I like to point out a lot about computer science: it is common to imagine that most of what we see is the result of modern invention. In computing it really isn’t. The fundamentals behind most of what is done were worked out in the 60’s and 70’s.

One of the surprises for anyone who could decode the software on an iPhone is that the OS is a Unix variant. Indeed, perhaps the biggest surprise to any 80’s computer engineer would be that here, 30 to 40 years hence, the two dominant operating systems are basically Unix and VMS. They would feel right at home. The languages have their roots in the 70’s too (in particular Smalltalk, and of course C). The user interfaces we use would be quite familiar (although annoyingly stupid and cluttered with cutesy rubbish). October 1988 saw the commercial introduction of the NeXT computer. The basic operating system design (Mach + BSD Unix) is what runs MacOS and iOS. The user interface is so close to MacOS that a Mac user from today can sit in front of a NeXT and marvel. Sure, the screen is black and white (as in two colours, black or white, no grey) and the system is not exactly fast. But it is totally usable. NeXTStep provided an integrated development environment and interface builder that many would find quite familiar in concept. And this was not an ultra-high-end research workstation; it was merely an expensive personal computer. I know individuals that bought them.

That is not to say there are no significant technical advances in the software. But back in the 80’s we were not pushing around paper tape and punch cards.
What I think would most surprise those of us who were working back then about an iPhone is that silicon had that much life in it: that it remained the dominant technology, that it is able to go as fast as it does, and most importantly that the scaling of the basic VLSI technology kept going as long as it did. Moore didn’t think it would.
An iPhone is science fiction. I keep remembering a SciFi book that included what is for all intents and purposes an iPhone-like device as a plot element. (The usual protagonist-wakes-in-the-future story.) Indeed, the modern iPhone significantly out-does the story’s PDA-like device, size included.

The big deal in an iPhone isn’t the software. It is how the damned thing was made. There are lots of clues about how it can be made, and about what the basic elements of semiconductor design at this scale look like and how they work. The gap to how you actually make them is not at all small. But there is a lot of very serious technology that can be gleaned.

The Commodore Amiga in 1986 could do 4096-color graphics at TV resolution or better: near-photographic color in a desktop home computer format. The XGA format for the IBM PC was available about 1990, 800x600 in 32K colors. So the graphics capabilities to do most of what an iPhone could do were not mind-boggling. Only the size and crisp resolution (and flat screen) would be mind-boggling.

Sure, but things that we take for granted, like being monoplanes, or the traditional tail arrangement, or flaps/slats, etc… would be revelations to someone of the earlier era.

While I doubt they could duplicate any smartphone with the technology of the 1980s, I also do not doubt that the engineers and scientists of the day could learn a LOT of interesting things from it that might advance tech quite a bit in that era. And probably in ways we wouldn’t imagine; it might be as silly as how connectors work, or buttons, or something equally mundane. It wouldn’t necessarily have to be anything sexy.

One problem with “decoding the software” is the extreme complexity of code generation today vs the 1980s. In the 1980s high level languages were compiled (often with optimization) to a simple binary executable.

Today the compilation and code generation process is very complex, often using many layers, intermediate code representations, and sometimes even low-level virtual machines (LLVM).

I don’t know if the latest XCode IDE uses “basic block working set optimization” but Microsoft’s products do. This takes the final emitted machine code, slices it into blocks, then reorders those blocks using profiling data obtained from monitoring program flow. This groups together execution hot spots to better favor CPU and data caches. Code paths through the resultant binary are spaghetti-like from a human perspective and the final emitted code almost impossible to debug without symbolic assistance from the compiler: https://www.microsoft.com/windows/cse/bit_projects.mspx
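
For anyone who hasn’t run into it, here is a minimal sketch in C of the core idea only: sort basic blocks by how hot the profiler saw them so the frequently executed ones get packed together. The block names and counts are made up, and this is an illustration of the concept, not Microsoft’s actual BBT tooling.

```c
/* Toy sketch of profile-guided basic block reordering -- not Microsoft's
 * actual BBT tool, just the idea: sort blocks by observed execution count
 * so hot blocks end up packed together in the emitted image. */
#include <stdio.h>
#include <stdlib.h>

struct block {
    const char *name;      /* label of the basic block                */
    unsigned long hits;    /* execution count gathered from profiling */
};

static int by_hits_desc(const void *a, const void *b)
{
    const struct block *x = a, *y = b;
    return (y->hits > x->hits) - (y->hits < x->hits);
}

int main(void)
{
    /* Hypothetical profile data for one function's blocks. */
    struct block blocks[] = {
        { "entry",         100000 },
        { "error_path",         3 },
        { "loop_body",    2500000 },
        { "loop_exit",     100000 },
        { "rare_cleanup",       1 },
    };
    size_t n = sizeof blocks / sizeof blocks[0];

    /* Hot blocks first: they now share cache lines and pages, while the
     * cold error paths get pushed to the end of the image.  The original
     * control flow is preserved by patching in extra jumps. */
    qsort(blocks, n, sizeof blocks[0], by_hits_desc);

    for (size_t i = 0; i < n; i++)
        printf("%zu: %s (%lu hits)\n", i, blocks[i].name, blocks[i].hits);
    return 0;
}
```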

In the early 1980s, an executable could be disassembled, i.e. converted back to assembly language, and in some cases even decompiled. Lower-level CPUs in the early 1980s did not have multiple acceleration layers, e.g. CPU L1 instruction cache, L1 data cache, L2, L3, etc. They did not use highly speculative out-of-order execution.

In the late 1980s you could use a software kernel-mode debugger like SoftICE to set breakpoints and trace disassembled code execution of a compiled binary without having the source code. The relationship between emitted machine instructions and a lower-level language like C was more straightforward. You cannot do that today (see SoftICE - Wikipedia).
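
As an illustration of how direct that relationship was, consider a toy function. The assembly in the comment is hand-written, 16-bit-x86-flavoured guesswork at what a typical small-model 1980s compiler might emit, not the output of any specific toolchain.

```c
/* Illustrative only: the kind of direct C-to-machine-code mapping a
 * 1980s reverse engineer could expect.  The assembly in the comment is
 * hand-written and only representative, not real compiler output. */
#include <stdio.h>

int add_tax(int price, int tax)
{
    /* A small-model 1980s C compiler might emit something close to:
     *
     *   push bp
     *   mov  bp, sp
     *   mov  ax, [bp+4]   ; price
     *   add  ax, [bp+6]   ; tax
     *   pop  bp
     *   ret
     *
     * One C statement, a handful of instructions, no inlining and no
     * block reordering -- easy to disassemble and even to decompile.
     */
    return price + tax;
}

int main(void)
{
    printf("%d\n", add_tax(100, 7));   /* prints 107 */
    return 0;
}
```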

Today the CPU execution environment is so complex that looking at the static code stream does not remotely reflect the instruction execution order. The static instruction stream is many levels removed from the original source code, whatever that was.

Today it would be almost impossible to decompile an instruction stream back to the original source code – even IF you knew the exact CPU. In the OP scenario, the A10 CPU in the iPhone and its instruction set had not yet been invented. So decompiling or even disassembling a binary executable which uses a yet-to-be-invented instruction set seems nearly impossible.

That of course assumes the iPhone was not encrypted and the bus signals could be captured using 1980s technology, which is also impossible.

Rather than today’s iPhone 7, if the original 2007 iPhone was sent back in time just 10 years to 1997, they might have had a better chance. It was not encrypted, the bus signals and CPU were a lot slower, fabrication was less advanced, and debugging instrumentation (i.e. logic analyzers) in 1997 was a lot better than in the 1980s. I believe some form of the ARM instruction set existed in 1997 so that might have been recognizable.

There are a few misconceptions here.

LLVM is an intermediate representation used before the final machine code emission. It would be unlikely that LLVM appears in the final run-time environment. It isn’t a virtual machine in the sense that we use VMs to virtualise the running hardware. The presence of LLVM in the development environment would be transparent to someone looking at the final binary. (Apple’s tool chain does use LLVM.) The idea of LLVM-like intermediate representations is not new. gcc used an intermediate representation (although it became so cluttered with special hacks and machine-specific information that it was hardly as processor agnostic as they made it out to be - one of my students in the early 90’s had an email exchange with RMS where he owned up to this.)
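
As a rough picture of what an intermediate representation is, take a trivial function. The “IR” in the comment is a made-up three-address form, only in the general spirit of LLVM IR or gcc’s internals, and the point is that none of it survives into the shipped binary.

```c
/* Sketch of what an intermediate representation is.  The "IR" in the
 * comment is a made-up three-address form, just to show that it is a
 * compiler-side artefact that never reaches the shipped binary. */
#include <stdio.h>

int scale(int x)
{
    /* A front end might lower this to something like:
     *
     *   t1 = mul 3, x
     *   t2 = add t1, 1
     *   ret t2
     *
     * The back end then maps t1/t2 onto real registers and emits
     * machine instructions; someone staring at the final ARM code only
     * ever sees the result of that last step. */
    return 3 * x + 1;
}

int main(void)
{
    printf("%d\n", scale(4));   /* prints 13 */
    return 0;
}
```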

We used to do this sort of thing in the 80’s. There isn’t anything new about the idea at all.

Debugging is a pain; you don’t have an easy time relating it to the source code. But there is nothing intrinsically difficult. You quickly get to understand the paradigm. I used to do this as part of my day job about 5 years ago. I wrote a system that automatically went in and optimised out a lot of the mess that was used to glue the run-time together. There are only a few simple ways the compiler would do the binding together, and the code patterns were trivially found.
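
As a flavour of the sort of pattern involved, here is a generic sketch of dispatch-table glue in C. It is not Apple’s actual Objective-C runtime - just the load-table, load-function-pointer, indirect-call shape that keeps turning up in compiled object-oriented code and is easy to spot once you have seen it.

```c
/* Generic sketch of run-time "glue": load a table pointer, load a
 * function pointer out of it, call indirect.  Not any vendor's actual
 * dispatch code, just the recurring shape a reverse engineer learns. */
#include <stdio.h>

struct vtable { void (*describe)(void *self); };
struct object { const struct vtable *vt; int value; };

static void describe_int(void *self)
{
    struct object *o = self;
    printf("object holding %d\n", o->value);
}

static const struct vtable int_vtable = { describe_int };

int main(void)
{
    struct object o = { &int_vtable, 42 };

    /* The dispatch site.  In a disassembly this is always the same
     * three-step shape: load o.vt, load vt->describe, call indirect.
     * Once you have seen it once, you can spot it everywhere. */
    o.vt->describe(&o);
    return 0;
}
```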

Still can. I used to do it.

None of this changes the code. That is the entire point of caches and dynamic code re-order. The code that is run is agnostic to the presence of such features. The same code will run identically on CPUs without caches or speculative code execution. If you compile code for a machine with speculative re-order you sometimes have to avoid certain instruction sequences, but that isn’t a problem for someone deciphering the code. The presence of caches is important when you have real-time code - but that usually means code is loaded into non-caching memory addresses - something that might puzzle someone decoding the system for a while - but they would probably guess what was happening and why. Caches were in use on machines in the 60’s. The ideas are not new.

It isn’t quite as straightforward, but it isn’t terrible either. It makes production code debugging harder and requires the use of higher level tools to get the productivity up, but there are only so many ways of making a language run-time work. Most of them had been worked out in the 70’s. (Smalltalk-80: Bits of History, Words of Advice is a great resource.)

It doesn’t matter. The entire point of code reorder is what we term serial consistency. The final result of the execution of the code on the CPU, as viewed externally from outside the CPU, must be identical to that of the same code executed on a CPU with no reorder. Internally, instructions can be executed in a varying order, but the writing back of results to memory must always remain correct.
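
A trivial illustration of that guarantee (toy C, nothing CPU-specific): the two statements below are independent, so the hardware is free to overlap or reorder them internally, but the values that finally land in memory are exactly what a strictly in-order machine would produce.

```c
/* Toy illustration of the externally visible guarantee: however the CPU
 * schedules these independent operations internally, the architectural
 * result is the same as strict program order. */
#include <stdio.h>

int main(void)
{
    int x = 6, y = 7;
    int a = x * y;            /* may execute after the next line internally */
    int b = x + y;            /* ...or before it; nobody outside can tell   */
    printf("%d %d\n", a, b);  /* always prints "42 13"                      */
    return 0;
}
```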

Very difficult verging on the impossible. However the first ARM was not that far away. The A10 is a superset.

The running phone isn’t encrypted. But so much of the phone is inside the one die that the stuff that gets out of the die is a long way from the important encrypted component. The general running of the phone is, in principle, visible unencrypted - if you did have the tools. The secure enclave operation never gets off the die. There is zero chance that could be broken.

Design of the Acorn RISC Machine started in October 1983.

Resolution and colours aren’t what is mind-boggling. The SGI Reality Engine had proper real-time 3D graphics capability - what we now take for granted as a GPU. (Indeed, NVIDIA picked up a lot of ex-SGI people.) The market for the Reality Engines was mostly things like flight simulators. There is no way you could have played the 3D games you can on an iPhone on a Commodore Amiga or PC. You would be struggling just to display a screen-shot.

[QUOTE=joema]
…“basic block working set optimization”…This takes the final emitted machine code, slices it into blocks, then reorders those blocks using profiling data obtained from monitoring program flow. This groups together execution hot spots to better favor CPU and data caches.

[/QUOTE]

You’re saying you did ***post-link optimization*** – where the final linked binary was cut up into little chunks, reordered according to statistics from monitored execution profiling, and spliced together with JUMP instructions? Most of the research papers I see on post-link optimization are from the 1990s. This 1999 IBM research paper summarizes some methods: https://www.research.ibm.com/haifa/projects/systems/cot/fdpr/papers/fddo2.pdf

The implication for anyone trying to reverse-engineer a compiled binary is that post-link optimization greatly increases the difficulty – even if it wasn’t encrypted as on the iPhone 7.

[QUOTE=joema]
…Lower-level CPUs in the early 1980s did not have multiple acceleration layers, e.g, CPU L1 instruction cache, L1 data cache, L2, L3…

[/QUOTE]

Certain optimization techniques do change the binary code layout, sometimes greatly (see above). Far from being agnostic to such features, some of the optimization techniques specifically target those features. This in turn transforms the binary code into such unintelligible spaghetti that it’s impossible to decipher. This is why we must often request a non-optimized binary for debugging (which still has symbols). Unfortunately the 1980’s scientist cannot request this from the iPhone 7 and the production binaries are not linked with symbol support.

You are talking about a situation where you have extensive manufacturer documentation for that instruction set, CPU architecture and code generation, and possibly including a binary that is linked with symbolic debug information. The OP scenario is where nothing is known about CPU or instruction set, it’s not linked with debug symbols, and likely has extreme optimization applied to the emitted code – including possibly post-link optimization.

It’s not the idea of caches that would be new in the 1980s; rather, they complicate understanding what the CPU is doing. This is even true for the actual designers of the CPU, much less someone trying to reverse-engineer it using 30-year-old technology. For a popular-level account of this, see the Pulitzer prize-winning book Soul of a New Machine by Tracy Kidder. However this is likely a moot point, since 1980s scientists could never monitor the A10 CPU address/data bus: it is all internal to the chip package.

The A10 chip package does not use external RAM. There is no physical, accessible memory data/address bus that goes off-chip. I believe the program code on external flash memory remains encrypted until after it has been fetched (in encrypted form) over the I/O bus and is inside the CPU/RAM package. It is virtually impossible to attack or decrypt, even using the most advanced surface-mount workstation – in 2017.

So it’s not like a PC with a separate CPU, memory chips and big, easily accessible I/O bus lines. It is mostly on a single super-integrated SoC – CPU and RAM combined. The only I/O that takes place to the flash memory and ROM is encrypted.

There was speculation the earlier iPhone 5 could be attacked (using 2016 technology) by using a focused ion beam to drill into the CPU and expose the UID. I don’t think that would work on an iPhone 7, which uses the Secure Enclave. Obviously none of that would be remotely possible using 1980s technology.

For the OP scenario to have a chance of working it would probably have to be an original iPhone from 2007 and sent back just 10 years to 1997, not to the 1980s.

Ah, no, I missed the post-link bit. We did this in the linker with linker-specific directives. The result is much the same.

Layout yes, actual operation, no. You are talking about the difference between making it easy to perform day-to-day debugging operations versus reverse engineering. Post-link optimisation is a speed hump compared to the other problems described. You end up with a pile of jumps linking slabs of code together. Identifying entries into the slabs is often close to impossible, but tracing through and finding the source of those jumps is not so hard. Again, you end up finding recognisable patterns.
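
To make the “slabs of code linked by jumps” picture concrete, here is a toy C function written with gotos to mimic the shape a post-link optimiser leaves behind. The hot/cold split and the names are made up; a real optimiser does this to the machine code, not the source.

```c
/* Illustrative only: what hot/cold block splitting does to control flow.
 * A post-link optimiser rearranges the machine code; the same shape is
 * mimicked here with gotos so the "slabs linked by jumps" are visible. */
#include <stdio.h>

int process(int n)
{
    int total = 0, i = 0;

hot_loop:                       /* hot slab: packed near the entry */
    if (i >= n) goto done;
    if (n > 1000) goto cold_overflow_check;   /* rarely taken */
    total += i++;
    goto hot_loop;

done:
    return total;

cold_overflow_check:            /* cold slab: pushed far away in the image */
    fprintf(stderr, "suspiciously large n=%d\n", n);
    return -1;
}

int main(void)
{
    printf("%d\n", process(10));   /* prints 45 */
    return 0;
}
```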

You are jumping from the code to the internals of the CPU. Of course the CPU is more complicated, but the point I am making is that the code it executes and the results of that execution are agnostic to the presence of the cache.

It’s a great book, but essentially has no technical content at all. You might be referring to the “missing AND gate”. What caused them problems was (trying hard to remember here) that it was the first time any of them had worked on a virtual memory architecture. Not that that matters; the ideas are very similar.

Package yes, but it isn’t all on one die. I do doubt that it could be done if they only have one iPhone to work with. They would need to kill a few (lots really) before they got it worked out. And it would probably need custom silicon to make the probes in order to cope with the speeds and impedance problems. But it isn’t as desperately impossible as trying to get to the internals of a die.

Be careful to distinguish between off-chip and off-package here. There is a massive difference. Between the dies in the package the signals are vastly more accessible (even if that accessibility is still at the level of science fiction). Those signals are buffered and run a much more defined protocol than you would see on-chip.

I’m not sure how much is encrypted here. I agree - if they do keep the main code in ROM encrypted, it is so ridiculously difficult as to be impossible.

Overall, I do agree: the difficulties in accessing the code are as good as insurmountable. But I would not say that people working on a slab of exposed code would not be able to decipher it. Especially as in the OP’s scenario there would have been an extant progenitor CPU with the basic form of the ISA available. Any RISC ISA is going to be a lot easier than if they had been presented with an x86, the modern forms of which are just dire, with kludge after kludge in the ISA to add more and more features. Not that the ARM is any sort of poster child for a clean ISA - indeed it is arguably even more cluttered up with junk. But a highly limited set of basic instructions and addressing modes is a good start.

But going back to my original point.
Mostly what I am suggesting is that the software isn’t actually the important thing, and that even success in decoding it would lead to rather unimpressive outcomes. The software technology isn’t that big an advance. Mostly what we do now is create ever more software that just does more stuff. But the actual ideas that matter were, to a large extent, already around. We have much better tools to engineer the creation of much more stuff, but that stuff isn’t, on the inside, much more impressive. In the 80’s the core technologies that drive most of our existing systems were either in existence or under development. Since then we have just got better at making it go faster and making it do more stupid things.

… Using ever fewer dev man-hours per stupid thing and ever less-skilled devs at the outer levels of the complexity onion.

Most excellent last few posts guys. Thanks. I’ve been out just long enough now that I’m watching the tail lights of my late 2000’s expertise disappear into the fog of the irrelevant past, replaced by an impenetrable wall of new and meaningless (to me) acronyms and pest practices du jour.

I thought you were a military pilot and a laundromat owner. You were in tech as well?

Every few years I get to change careers whether I want to or not. The two you mention were quite a while ago.

My undergrad degree was in CS and I’ve worked as a dev off and on from COBOL & BAL on 360s & 370s through DEC & HP minis to CP/M pre-PCs to Win16, Win32, and finally .NET on WinServ. My last IT job was CTO of a 30-person ISV doing homeland security-related web apps on the whole MSFT enterprise stack. Got out just as the virtualization / public cloud architecture appeared on the scene.

You should look into what’s being done in virtualization, if you remember how VM worked on IBM systems back in the 1960s and 1970s. It’s interesting to see old concepts re-worked to solve modern problems and, of course, it’s a lot easier to try it out now that modern PCs have recapitulated so much of the design of classic mainframes that not-especially-current Intel CPUs have dedicated virtualization hardware.

You know what would be interesting to drop into 1980? Ungar’s paper about generational garbage collection, which he published in 1984. See, the problem with GC prior to Ungar was the old mark-and-sweep algorithm, which tended to induce large pauses while the GC ran to free up RAM. Generational GC exploits the high infant mortality rate of newly created objects to reduce or eliminate those pauses, leading to more responsive software in GC’d runtimes. Getting that into the right research labs (and, therefore, the right early workstation software) a few years early might have had some interesting knock-on effects in software design.
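
To make the idea concrete, here’s a minimal sketch in C of just the generational notion (not Ungar’s actual Generation Scavenging algorithm): allocate in a small nursery, and on a minor collection touch only the nursery, promoting the few survivors to the old generation. The “marked” flag stands in for real root tracing, and the sizes are made up.

```c
/* Minimal sketch of the generational idea, not Ungar's actual algorithm:
 * allocate in a small nursery, and on a minor collection scan only the
 * nursery, promoting the few survivors to the old generation.  Most
 * objects die young, so most collections never touch the old space and
 * the pauses stay short.  Root tracing is reduced to a toy flag here. */
#include <stdio.h>
#include <stdlib.h>

#define NURSERY_SIZE 8

struct obj { int marked; int payload; };

static struct obj *nursery[NURSERY_SIZE];
static int nursery_used;

static struct obj *old_gen[1024];
static int old_used;

static void minor_gc(void)
{
    int survivors = 0;
    for (int i = 0; i < nursery_used; i++) {
        if (nursery[i]->marked) {               /* reachable from a root */
            old_gen[old_used++] = nursery[i];   /* promote */
            survivors++;
        } else {
            free(nursery[i]);                   /* died young: reclaim */
        }
    }
    printf("minor gc: %d promoted, %d freed\n",
           survivors, nursery_used - survivors);
    nursery_used = 0;
}

static struct obj *gc_alloc(int payload)
{
    if (nursery_used == NURSERY_SIZE)
        minor_gc();                             /* short pause, nursery only */
    struct obj *o = malloc(sizeof *o);
    o->marked = 0;
    o->payload = payload;
    nursery[nursery_used++] = o;
    return o;
}

int main(void)
{
    for (int i = 0; i < 40; i++) {
        struct obj *o = gc_alloc(i);
        if (i % 10 == 0)
            o->marked = 1;                      /* pretend a root keeps this one */
    }
    minor_gc();
    printf("old generation now holds %d objects\n", old_used);
    return 0;
}
```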

We used HyperV and/or VMS in the last product I worked on. Each dev had a pretty hefty real box with HyperV running a package of about a dozen virts representing various configs of our products.

I was referring to the cloud-scale virtualization of “click a web UI and 3 more web front ends and a new SQL loadsharing instance just magically appear.” That stuff, and all the magical provisioning & configging infrastructure was just a gleam in Marketing’s eyes as I departed.
Back in the Olden Tymes I recall we’d sometimes run IBM’s VM. Which could drag the perf on that 370 right into the shitter. OS/VS1&2 were already famous for using a lot of CPU cycles and core (RAM was the next gen’s term) for housekeeping, leaving not so much for actual workload.

Putting VM and 3 or 5 OS/VS instances in there really stank up the place.

Such fun.

Late add:
You’re right about GCs. The .Net GC is one of the beefier implementations of the generational approach. An interesting question to me is how closely coupled GC is to object orientation. I started out in the “structured programming” paradigm (albeit more honored in the breach in the pre-structured languages such as assembler & COBOL).

OO was a key enabling technology to climb the scale / abstraction / reusability ladder. But it wasn’t foreordained we’d invent it in the way we did at the time we did. Thinking about GC in the context of a non-object environment is fun. In that *out of beer and pizza at 2am in the dorm* sense of “fun”.

Interesting thought. The ideas in Ungar’s paper are pretty key for much that has come since. Various incremental GCs and also parallel, and distributed GC, and some soft real time GC. (Takes me back a bit, in a previous life I was a PI in a research group that was working on these things. We did some interesting stuff. Sadly a lot of it has leaked out of my ears in the years since.)

OTOH, mainstream languages seem to try very hard to avoid GC. C++11 had automatic memory management listed in the penultimate draft of the standard, then dropped it (C++ is not an OO language, despite the protestations of the various weenies that use it), and Python still relies mostly on reference counting. But Java implementations do use it, and some are very aggressive at making use of good GC.
Then you can look at things like the Lisp machines - tagged memory so you could see the pointers. The big problem is of course pointer identification. Any language implementation where you can’t find the pointers by just looking at the object is doomed. IMHO.
(No real link between OO and GC at all. OO needs it, but so does almost any advanced language paradigm.)
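
For what it’s worth, the tagged-word trick looks roughly like this in C. The tag layout is made up for illustration - it isn’t any particular Lisp machine’s format - but it shows how a collector can find the pointers just by looking at the words.

```c
/* Sketch of the tagged-word idea: steal the low bit of every word so a
 * collector can tell an immediate integer from a pointer just by looking
 * at it.  The exact tag layout here is invented for illustration. */
#include <stdio.h>
#include <stdint.h>

typedef uintptr_t value;            /* one tagged machine word */

#define IS_INT(v)     ((v) & 1)               /* low bit set => integer   */
#define MAKE_INT(n)   (((value)(n) << 1) | 1)
#define INT_VALUE(v)  ((intptr_t)(v) >> 1)
#define MAKE_PTR(p)   ((value)(p))            /* pointers are word-aligned */
#define AS_PTR(v)     ((void *)(v))

int main(void)
{
    int cell = 99;                  /* pretend this is a heap object */
    value a = MAKE_INT(-7);
    value b = MAKE_PTR(&cell);

    /* A GC scanning a stack of 'value' words can find every pointer
     * without any help from the compiler: */
    value roots[] = { a, b };
    for (int i = 0; i < 2; i++) {
        if (IS_INT(roots[i]))
            printf("immediate integer %ld\n", (long)INT_VALUE(roots[i]));
        else
            printf("pointer to object holding %d\n", *(int *)AS_PTR(roots[i]));
    }
    return 0;
}
```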

There is a fair amount of technical content in the book, particularly about the complications multi-level caches created, which the logic analyzers were blind to. They were brilliant people when I worked with them, but detailed inspection of CPU operation was only possible because of the low integration level, exposed wire-wrapped nature and slow clock rates.

Today you don’t generally need low-level intra-CPU documentation since almost no programmers work at the assembler level. However with an iPhone 7 sent back to 1980, assembly/machine instructions are all you have, yet there would be no documentation about the instruction set, CPU behavior and logical subsystem layout, etc.

In the early 1980s I was confident of my ability to reverse-engineer a competing design since the small-scale and medium-scale integration chips had published specs, and clock rates were very low by today’s standards – about 5 MHz for both the 11/780 and MV/8000. Computers of that era had exposed wire-wrapped backplanes which facilitated probe access. A typical setup looked something like this: http://static.righto.com/images/alto/control_board_logic_probe.jpg
http://ed-thelen.org/RestoreAlto/LogicAnalyserInputs.jpg
http://static.righto.com/images/alto/backplane_logic_probes-w600.jpg

Back then it was straightforward to monitor and decode each microinstruction, which gave visibility beneath the instruction level.

None of that applies today – the A10 CPU in an iPhone 7 would essentially be an indecipherable black box to engineers from the early 1980s. Even if they could obtain physical test points, the logic analyzers of that era could no more read 2 GHz bus signals than Pasteur’s magnifying glass could read DNA code.

Even experts today at competitive reverse engineering such as Chipworks.com are unsure of many functions on the A10 CPU – and they do this for a living and obviously have experience with contemporary CPU designs and state-of-the-art instrumentation: http://cdn.wccftech.com/wp-content/uploads/2016/09/Revised_A10_die-840x560.png

So a 1980s engineer could not get access to the bus signals, couldn’t read them if he could, and could not get access to the flash storage which is encrypted.

If that was extracted and decrypted for him, it might slightly increase the chance he could deduce something from it, but that’s not the OP scenario. He wouldn’t know what’s code, what’s data, does it use a stack, what the instruction set is, etc. That’s like saying if reams of design documentation were sent back in time attached to the iPhone, then figuring out useful information would be easier. Yes, but in that case they wouldn’t even need the iPhone, since the information is what’s valuable and in the OP scenario the iPhone itself was the only way to extract information.

A computer engineer educated in the 1970s would generally be expecting an extrapolation of then-known architectural trends if inspecting a future CPU. In the 1970s the trend was toward heavily microcoded, ever-more-complex instruction sets, culminating in HLL (High Level Language) machines, typified by the IBM Future Systems Project, AS/400, Data General FHP (which was modeled on the Burroughs B1700):
https://people.cs.clemson.edu/~mark/fhp.html

A high priority in those days was closing the “semantic gap” between machine instructions and HLLs, ideally achieved by implementing high level language statements directly in microcode, with no compile phase whatsoever. This thinking can be seen in this 1982 paper “The Execution of High Level Languages”: http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1054&context=compsciwp

It is likely a CPU engineer educated in the 1970s and examining an iPhone 7 in the early 1980s would be thinking along those then-current architectural trends. Unfortunately that is exactly the opposite of CPU design priorities over the past several decades, as implemented in the A10 CPU. It’s true the IBM 801 (generally considered the world’s first RISC computer) was operational in 1980, but this was an experimental prototype used for IBM-internal research.

Yeah, the semantic gap was something of a dead end. My favourite paper on this was: What we have learned from the PDP-11, what we have learned from the VAX and Alpha.

I would disagree that all designers/engineers would be thinking only about CISC. Patterson published his seminal paper on the case for RISC in 1980, and as I pointed out, the guys that designed the ARM started work on it in 1983. The 801 caused quite a stir, and I really doubt any competent researcher of the time was unaware of it. Pyramid was formed in 1981, and shipped its first RISC machine in 1983. Pyramid also caused quite a stir. Their machines were clearly beating the big boys. Work started at DEC in 1982 on PRISM, which became the basis of Alpha.

The Intel iAPX 432 should have warned people.

If you really did have an iPhone back in the early 80’s, you would want the smart people working on it, and they would have been well aware that this RISC thing was looking pretty much like the future already.

(I still agree that there would be little to no hope of actually reverse engineering the phone, but I do think these guys would have a pretty good idea what they were looking at. Enough to get seriously inspired.)