Whatever happened to non-von Neumann computers?

Back in the late 80’s/early 90’s, the von Neumann bottleneck[sup]*[/sup] was seen as a Big Problem holding back the development of better, faster computers. Researchers were coming up with new, non-von Neumann architectures weekly, each touted as the Future of Computing. 15 years on, my laptop is loads faster than all their little toys and is, AFAIK, still JvN in concept. What happened (or didn’t happen)?

*by which I think they meant sharing address space between instructions and data, thereby forcing both to be accessed through the bottleneck of the same data bus.
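
As an illustration of that footnote, here is a minimal C sketch, assuming an ordinary desktop system where it compiles and runs as written (the cast of a function pointer to void* is technically implementation-defined, but works on typical compilers):

```c
#include <stdio.h>

int a_variable = 42;

int a_function(void) { return a_variable; }

int main(void) {
    printf("data lives at         %p\n", (void *)&a_variable);
    printf("instructions live at  %p\n", (void *)a_function);
    /* Same address space, same bus: every instruction fetch competes
     * with every data access, i.e. the "von Neumann bottleneck". */
    return a_function() == 42 ? 0 : 1;
}
```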

Today is your lucky day, ticker. The call to come up with an alternative to the Von Neumann architecture was first made by John Backus in a speech at the ACM national conference in 1977, at which yours truly was present. (His speech later appeared in the 8/1978 CACM.) So my knowledge of this is non-trivial. In short:

It’s total and complete BS.

Note that Backus originally suggested a functional programming/data flow model but what most people proposed later were parallel models. In short, they were still just plain old Von Neumann architectures on the inside, but possibly a whole lot of such devices. No real change whatsoever.

So it eventually died the death of all not-well-thought-out ideas.

The only real innovation in computing concepts in the last 30+ years has been Object Oriented Programming. That actually achieves some of the goals that Backus wanted but it still uses the classic machine architecture. (And many of the basics of OOP also pre-date Backus’s speech.)

Well, “little toys” is a bit more contemptuous than necessary.

What happened is mostly Moore’s Law and economies of scale. Computers such as the Connection Machine have to be built pretty much by hand with custom parts galore. That takes a long time. So even if you end up with something that’s vastly better than the PCs in existence when you started, you’re competing with the current generation of processors. And all your expensive custom parts belong to the previous generation of fabrication technology. So your machine might be a tad faster, but at the same time it’s much, much more expensive.

It also turns out to be non-trivial to program unconventional machines. Not all problems can be decomposed to take advantage of high degrees of parallelism. And (outside of PhD theses) there aren’t a lot of tools to help programmers. Not to mention that many of the problems that require high degrees of parallelism can be attacked with special-purpose hardware – I’m thinking graphics cards here.

So basically, cheap and easy beats massively expensive and hard to use.
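
To make the decomposition point above concrete, here is a small C sketch; the OpenMP pragma is only an assumption about tooling (it needs a compiler invoked with something like -fopenmp, and is silently ignored otherwise):

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) a[i] = i * 0.001;

    /* Easy: each element is independent, so the work splits across cores. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * a[i];

    /* Hard: b[i] depends on b[i-1], so the iterations must run in order
     * (at least without restructuring the algorithm). */
    b[0] = a[0];
    for (int i = 1; i < N; i++)
        b[i] = 0.5 * b[i - 1] + a[i];

    printf("%f %f\n", sum, b[N - 1]);
    return 0;
}
```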
Well, I wasn't at the talk, but I had to present this paper in one of my classes one year, so I did read it many a time. I don't think it's total BS, although I do remember at the time not understanding why so many people thought it was such a seminal piece of work. Probably the best thing you can say about [Backus' paper](http://www.stanford.edu/class/cs242/readings/backus.pdf) is that it describes a language which, if implemented, would compile nicely on massively parallel machines. The downside is that there'd probably be about three programmers in the world who could program in it effectively. (It's been 25 years since I read it, but I remember finding the paper mighty tough sledding.)
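
For a rough flavor of what Backus was after, here is a C transliteration of his "combining forms" idea (apply-to-all plus insert/reduce); the helper names are made up for illustration, and the loops hiding inside them are exactly what FP was meant to abstract away:

```c
#include <stdio.h>

#define N 4

/* apply-to-all over two vectors (roughly Backus's "alpha", pairwise) */
static void map2(double out[N], const double a[N], const double b[N],
                 double (*f)(double, double)) {
    for (int i = 0; i < N; i++) out[i] = f(a[i], b[i]);
}

/* insert/reduce (roughly Backus's "/" combining form) */
static double reduce(const double v[N], double (*f)(double, double),
                     double unit) {
    double acc = unit;
    for (int i = 0; i < N; i++) acc = f(acc, v[i]);
    return acc;
}

static double mul(double x, double y) { return x * y; }
static double add(double x, double y) { return x + y; }

int main(void) {
    double a[N] = {1, 2, 3, 4}, b[N] = {5, 6, 7, 8}, tmp[N];

    /* inner product = (insert +) of (apply-to-all *): no loop variable
     * appears at this level, which is the property that makes the form
     * easy to parallelize in principle. */
    map2(tmp, a, b, mul);
    printf("%g\n", reduce(tmp, add, 0.0)); /* prints 70 */
    return 0;
}
```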

Note that TCM and such are just a lot of Von Neumann machines in parallel. Claiming it to be non-VN was just hype.

It is true that a lot of stuff concerning parallelism is doomed to failure, since only a very, very tiny percentage of great programmers can program “in parallel”, and those who can aren’t hired by software houses. Ergo, MS OSes.

[QUOTE=ftg]
Note that TCM and such are just a lot of Von Neumann machines in parallel. Claiming it to be non-VN was just hype.

[/QUOTE]

Ah, yes-and-no. Backus’ point was that conventional programming languages *encouraged* algorithms that were inherently non-parallel because of their focus on iteration and sequential operations on variables. Given that you have a lot of machines in parallel in the Connection Machine, you do avoid the bottleneck of having exactly one address being accessed at any given time.

Arguably, some of the vector commands that have been incorporated into computer hardware (e.g. Altivec and whatever the Intel equivalent is) could be considered a step towards non-Von Neumann operators. (Although I’d have to actually look at how they’re implemented before I’d argue that point very strenuously.) And graphics cards, whether they are currently implemented using VN architecture or not, could certainly be implemented with custom hardware that operates directly on memory cells without recourse to a processor.
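
For reference, the Intel equivalent of AltiVec is the SSE family. A minimal sketch using SSE intrinsics, assuming an x86 compiler that provides <xmmintrin.h>:

```c
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);      /* load 4 floats at once */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);   /* 4 additions in one operation */
    _mm_storeu_ps(c, vc);

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]); /* 11 22 33 44 */
    return 0;
}
```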

<slight hijack>
Anyone care to comment on quantum computation?

From what I know of it, it has the potential to vastly improve processing speeds. But I don’t think there are any generic models for how to program a quantum computer. (That is, some sort of quantum Turing machine.)

Over the history of quantum computation, there have been exactly two algorithms published that take advantage of QC’s capabilities. Don’t hold your breath for the revolution.

So are the vector architectures built by Cray and Convex in the 70s and 80s (and 90s) considered von Neumannian? They were very powerful for their time.

Still, Moore’s law surpassed them, and even what’s left of Convex takes advantage of multiple cheap processors for speed.

I think improvements in caching have done a lot to reduce the “bottleneck” of Von Neumann architectures. Something like a Pentium will cache several instructions with a single fetch. Branch prediction also allows for longer prefetch queues, which allows the instruction fetching to be much more efficient than that of machines of even a decade or so ago. Branch prediction is one of the hot topics in processor architectures these days. Intel goes out of their way not to give out too many details of exactly how their branch prediction algorithm works, but it’s safe to say that it’s one of the keys to their processors’ performance.
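
As a toy illustration of why branch prediction matters, here is a small C experiment: the same loop over the same values usually runs much faster once the data are sorted, because the branch becomes predictable. Results vary a lot with compiler and optimization level (an optimizer may turn the branch into a conditional move and hide the effect), so treat this as a sketch rather than a benchmark:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

static long long count_big(const int *v) {
    long long sum = 0;
    for (int pass = 0; pass < 100; pass++)
        for (int i = 0; i < N; i++)
            if (v[i] >= 128) sum += v[i];   /* hard to predict on random data */
    return sum;
}

int main(void) {
    static int v[N];
    for (int i = 0; i < N; i++) v[i] = rand() % 256;

    clock_t t0 = clock();
    long long s1 = count_big(v);            /* unsorted: frequent mispredicts */
    clock_t t1 = clock();

    qsort(v, N, sizeof v[0], cmp);
    clock_t t2 = clock();
    long long s2 = count_big(v);            /* sorted: branch is predictable */
    clock_t t3 = clock();

    printf("unsorted: %lld in %.2fs\n", s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("sorted:   %lld in %.2fs\n", s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    return 0;
}
```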

The popularity of desktop computers has also done a lot to kill certain ideas in supercomputers. Desktop processors are getting rather powerful (your typical desktop PC has almost as much power as a Cray supercomputer of the 80’s), and it’s getting cheaper to build a supercomputer out of parallel Pentiums than it is to get the same processing power out of custom CPU designs. It’s more a question of economics than technical feasibility.

Non-Von Neumann architectures are used in some microcontrollers. The disadvantage of not having a Von Neumann architecture becomes fairly obvious when you try and program one of these beasts. You’ll run out of code or data space first, with plenty of space left over in the other. However, for microcontrollers, which typically need a fairly small code space, this architecture works pretty well.

There are applications such as vector processing where you don’t really need a Von Neumann architecture. You’ll find non-VN architectures in all sorts of signal processing applications (have a look at DSPs, for example). For general purpose computing though, it’s pretty much all Von Neumann these days.

On the other hand, one of those two algorithms is for factoring numbers efficiently, which capability would negate a great deal of current security measures. So that might in itself qualify as a revolution.

There was some commercial use of non-Von Neumann architecture.

The one I am most familiar with was the DSP56000 series from Motorola. These puppies had separate data and program memory, each with a dedicated bus. The 56000 managed to be pretty damn fast for its clock rate, just by being able to pull in data and instructions simultaneously.

Ahhh. Audio processing code written in Assembler - nothing like it to fry the brain.
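
For the curious, here is the kind of inner loop the 56000 family was built for, written in plain C purely for illustration; on a dual-bus Harvard DSP, the coefficient fetch and the sample fetch can happen in the same cycle as the multiply-accumulate, which is where the speed-for-the-clock-rate came from:

```c
#include <stdio.h>

#define TAPS 8

/* One output sample of a FIR filter: a chain of multiply-accumulates. */
static double fir(const double coeff[TAPS], const double delay[TAPS]) {
    double acc = 0.0;
    for (int k = 0; k < TAPS; k++)
        acc += coeff[k] * delay[k];   /* one MAC per tap */
    return acc;
}

int main(void) {
    double coeff[TAPS] = {0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1};
    double delay[TAPS] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%g\n", fir(coeff, delay)); /* prints 9 */
    return 0;
}
```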

The first single-chip microprocessor, the Intel 4004, had a separation between code address space (usually in ROM) and data address space (in RAM). That is, not only was it obviously impossible to rewrite the ROM, there was also no way to execute an opcode out of data memory.

So, was the first chip a non-von Neumann design?

It’s probably a bit debatable. While technically it’s not a Von Neumann architecture, since it does have separate code and data space, it does have a lot of things in common with a Von Neumann architecture, e.g. the shared data bus. You could almost make the argument that the separation of RAM and ROM in the 4004 was nothing more than the common separation of RAM and ROM in the memory space of most computers.

Intel for a while was calling their Pentium chips “RISC” processors, because architecturally they had a lot in common with RISC machines (pipelining, for example). By the same reasoning, since the 4004 has so much in common with a typical Von Neumann architecture, you could say that it is a Von Neumann-type machine.

However, if you want my vote, I say no, it’s not, simply because there is no way to get data into the code space or code into the data space. One of the key features of a Von Neumann machine is that code and data are the same thing, just memory. You can’t execute data in a 4004, and you can’t do data manipulation with code.
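
To illustrate the "code and data are just memory" point, here is a hedged C sketch, assuming x86-64 Linux (the mmap flags and the machine-code bytes are specific to that setup): a few bytes are stored as data and then executed as code, something a 4004 simply has no way to do:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 machine code for: mov eax, 42 ; ret */
    unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    /* Ask the OS for a page we may both write and execute. */
    void *page = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    memcpy(page, code, sizeof code);          /* written as data      */
    int (*fn)(void) = (int (*)(void))page;    /* executed as code     */
    printf("%d\n", fn());                     /* prints 42 */

    munmap(page, sizeof code);
    return 0;
}
```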

I agree that the usual presence of opcodes in ROM has nothing to do with the vN-ness of the architecture. By the same token, the shared data bus is also of highly debatable relevance to this discussion.

Which is laughable, of course, but given how complex supposedly RISC chips (Alpha, SPARC, etc.) have become lately, it isn’t that surprising.*

Other than the separation of code and data, what is non-vN-ness defined by? Are SIMD designs closer to what the non-vN advocates have in mind for their ideal processor?

And this is exactly where I stand, as well. If you cannot write self-modifying code, you cannot call it a vN design.

By that measure, nearly all of the idealized machines or virtual machines described or assumed by programming language standards are non-vN designs, regardless of the near-extinction of that paradigm in hardware design circles.

*RISC has become almost completely debased as a descriptive term. For my money, RISC chips don’t have microcode, or (to put it another way) the microcode is the ISA. If you have to translate fetched opcodes before you can process them in your pipeline, your chip is a CISC design. Of course, I don’t know of any general-purpose CPUs which still fit that criterion. This page has an interesting overview of the history of architectures in general with an eye to the differences between RISC and CISC.

I don’t know what the formal definition is, but a typical Von Neumann machine has a common address and data bus for the instructions and data. Architecturally, the RAM/ROM and data lines on the 4004 look a lot like a typical microprocessor with the address decoder built in, like an 80188 for example (which is definitely a Von Neumann architecture).

That doesn’t change my vote though, just presenting the other side, as I see it.