What's the most expensive computer in history?

Interesting that whether we’re looking at the scale of a chip, a supercomputer, or the Internet, the interconnects are often the most important aspect. Chip makers often brag about feature size, but as that scaling slows down, they may start talking about how many metal layers their process has instead. The number of metal layers seems to top out at around 30 now. Are there disadvantages to piling on more metal layers?

Yeah, interconnects are one of those things that are independent of scale. They’re a problem at both micrometer and megameter scale.

More metal layers means more wafer processing, which means higher production cost. I’m not a semiconductor engineer (I just happen to be in the vicinity of them), but I suspect another problem is that you hit diminishing returns. You can’t stack too many layers before wire length, and with it resistance, becomes a problem again. You also have to get power and ground down to the base layer, which takes up area. At a certain point it doesn’t matter how much room you have for wiring, because using that extra space will slow you down. You have to make the wiring more efficient in the first place, with a good layout (short interconnects) and so on.

Essentially there’s an area/volume thing going on, where the wiring takes up volume but the endpoints all have to go to a limited area (the base layer), so you end up being limited by area.
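To put a very rough number on the wire-length point: here’s a back-of-the-envelope sketch in Python, using made-up but plausible wire parameters (not any particular process). The thing to notice is that distributed RC delay grows with the square of wire length, which is why you can’t just route everything through the extra upper layers.

```python
# Rough Elmore delay of a distributed on-chip wire: delay ~ 0.5 * r * c * L^2.
# All parameters are assumed ballpark values, not a specific process node.

RHO_CU = 1.7e-8      # copper resistivity, ohm*m
WIDTH = 100e-9       # assumed wire width, m
THICKNESS = 200e-9   # assumed wire thickness, m
CAP_PER_M = 0.2e-9   # assumed wire capacitance, F/m (~0.2 fF/um)

r_per_m = RHO_CU / (WIDTH * THICKNESS)   # resistance per meter of wire

for length_um in (100, 1_000, 10_000):   # 0.1 mm, 1 mm, 1 cm
    L = length_um * 1e-6
    delay = 0.5 * r_per_m * CAP_PER_M * L**2
    print(f"{length_um:>6} um wire: ~{delay * 1e12:.0f} ps")
```

Doubling the length quadruples the delay, so past a millimeter or so you need repeaters, and those have to connect back down to transistors on the base layer anyway.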

Somehow I’m also reminded of the skyscraper problem, where practical skyscrapers can only get so tall because eventually the ground floor consists entirely of elevator shafts. Skyscrapers can speed up their elevators but computer chips can’t do much about the speed of electrical signals.

I’m surprised that the layers all connect back to the base layer. I thought (based on nothing but my gut) that it was hierarchical, like streets, where each building/transistor has a link to a street/lower layer, which connects to a road, which connects to a highway.

Any idea how much Cerebras’ chips will cost? I wonder how much a fab designed to produce chips like that on a large scale would cost.

Their webpage says it’s 56x the size of the largest GPU, with 3,000x more on-chip memory and 10,000x the memory bandwidth. It seems like they used the real estate to bring the memory that would normally sit in VRAM onto the chip as cache instead. For highly serial work, you could end up with 1 core and 10 layers of cache.
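If you take those ratios at face value and multiply them against a plausible “largest GPU” baseline (the baseline figures below are my guesses, roughly V100-class; only the 56x/3,000x/10,000x ratios come from their page), the absolute numbers come out something like this:

```python
# Turn Cerebras' quoted ratios into rough absolute numbers.
# Baseline "largest GPU" figures are assumptions (roughly V100-class);
# only the 56x / 3000x / 10,000x ratios come from their page.

gpu_die_mm2 = 815        # assumed GPU die area, mm^2
gpu_sram_mb = 6          # assumed GPU on-chip SRAM (L2-ish), MB
gpu_mem_bw_gbs = 900     # assumed GPU memory bandwidth, GB/s

wafer_die_mm2 = 56 * gpu_die_mm2               # ~45,600 mm^2
wafer_sram_gb = 3000 * gpu_sram_mb / 1024      # ~18 GB on chip
wafer_bw_pbs = 10_000 * gpu_mem_bw_gbs / 1e6   # ~9 PB/s

print(f"die area:      ~{wafer_die_mm2:,.0f} mm^2")
print(f"on-chip mem:   ~{wafer_sram_gb:.0f} GB")
print(f"mem bandwidth: ~{wafer_bw_pbs:.0f} PB/s")
```

In other words, a wafer-sized die whose working set lives in on-chip SRAM rather than external DRAM, which is basically the same reading as above.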

Would an FPGA allow you to turn a chip from being optimized for serial work to being optimized for parallel work? Or turn memory into processor cores?

Hmm, ok, I’m not talking to myself, I’m just having difficulty with the new buttons.

Don’t tell Microsoft: their Azure cloud offers FPGA-based neural networks.

I call it “the cost of communication,” and it gets worse as you scale up the computer. It’s not just the bandwidth, it’s the latency and the power.

An interesting (to me) side note: I talked to the co-chair of the first DARPA exascale study after the fact. Their job was to develop an architectural path to an exaflops (1E18 floating point operations per second) machine. After several weeks of meetings and discussion, they managed to produce what looked like a feasible machine requiring “only” about 60 MW (the DARPA goal was 20 MW). That weekend, the chair was taking a day off when he got a call from one of the panel members. They had not accounted for the power in the interconnections (not on-chip, but everywhere else). So they got back together to agree on the calculations, and the internal communications, best case, added about 150 MW!
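The 150 MW stops being surprising once you multiply energy-per-bit by aggregate traffic. A toy version of that arithmetic, with my own illustrative numbers rather than the study’s actual assumptions:

```python
# Why off-chip communication dominates the power budget at exascale.
# Illustrative numbers only -- not the DARPA study's actual assumptions.

flops = 1e18             # exaflops target
bytes_per_flop = 1.0     # assumed average off-chip traffic per flop
energy_per_bit = 20e-12  # assumed off-chip interconnect energy, J/bit (~20 pJ)

traffic_bits_per_s = flops * bytes_per_flop * 8
comm_power_mw = traffic_bits_per_s * energy_per_bit / 1e6

print(f"interconnect power: ~{comm_power_mw:.0f} MW")   # ~160 MW
```

Even if you assume only a fraction of a byte moved off-chip per flop, the communication power swamps the 20 MW goal before you’ve spent a single watt on arithmetic.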

Doesn’t really contradict what I said; they’re using Stratix 10 FPGAs, which get their math horsepower (10 teraflops) from fixed DSP blocks. It’s effectively a nice hybrid approach, but the programmable part is just changing the layout of the fabric, not doing any work itself.

It’s interesting no doubt, though they’d still get better efficiency with an ASIC. On the other hand, the reconfigurability means they can better iterate on new techniques, which might give a better overall payoff for the time being.

More interesting to me, though, is programming the FPGA to itself resemble a brain, with noisy signals, random-ish interconnects, and asynchronous operation, as compared to the usual approach, which is basically just multiplying giant matrices: high precision and highly ordered.

You might be interested in this emerging technology: p-bits, from Purdue. I was listening to a webinar from the group this morning, and they are actually starting to look at FPGA-like approaches using magnetic tunnel junctions (MTJs) added to the usual CMOS.

Hmm…that didn’t do the nice block embed. Here’s a YouTube video from the group.
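For a feel of what a p-bit actually does: as I understand the group’s published model, each bit fluctuates as m_i = sgn(tanh(I_i) + noise), where I_i is a weighted sum of the other bits’ states, so a network of them samples low-energy configurations rather than computing deterministically. A minimal sketch, with a made-up two-bit toy network (not their hardware or code):

```python
# Minimal p-bit sketch following the update rule from the Purdue group's
# papers (as I understand it): m_i = sgn(tanh(I_i) + uniform noise),
# with I_i = I0 * (h_i + sum_j J_ij * m_j).
# The two-bit network below is a made-up toy, not their hardware.

import numpy as np

rng = np.random.default_rng(0)

J = np.array([[0.0, 1.0],   # positive coupling: the two p-bits prefer to agree
              [1.0, 0.0]])
h = np.zeros(2)             # no bias
I0 = 2.0                    # coupling strength ("inverse temperature")
m = rng.choice([-1.0, 1.0], size=2)

counts = {}
for _ in range(10_000):
    i = rng.integers(2)                           # asynchronous: update one p-bit
    I_i = I0 * (h[i] + J[i] @ m)
    m[i] = np.sign(np.tanh(I_i) + rng.uniform(-1, 1))
    state = tuple(m)
    counts[state] = counts.get(state, 0) + 1

for state, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(state, n)   # (+1, +1) and (-1, -1) should dominate
```

It’s noisy, asynchronous, and the interconnect (the J matrix) is doing essentially all of the computing, which is why it lines up with the brain-like FPGA idea above.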