We probably have not yet identified all the neurotransmitters. We certainly don’t understand what all the brain cells do. Not long ago we thought the most abundant cell type, the glia, just acted as support and glue. Nope.
A basic introduction to some recent trends in neurology (from 2022 Economist Technology Quarterly). See “View All Chapters”. Might be paywalled.
Not sure how this applies, since IIRC the typical chip factory will process an entire wafer (8" diameter?) at a time and then slice it up. Back in the '80s there was a startup, Trilogy(?), that planned to put a whole IBM 370-series machine on a full wafer.
I assume that however this is done, multiple chips are made on a wafer by exposing one area at a time, so something like the Trilogy system would have required interconnects between specific areas of the wafer.
IME it is most frequently used to connect multiple independent chips together to maximize the speed between them. It eliminates the overhead of the packaging and pins. This is called Package on Package (PoP).
A phone PoP will stack RAM and flash on top of the SoC, so everything looks like it is in a single chip. The SoC itself will have 12 or more processor cores.
The main bottleneck in an SoC’s NPU is memory bandwidth (particularly with these huge GenAI models), so stacking the memory helps with AI performance even if the AI compute itself is not being stacked.
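A back-of-envelope illustration of that bandwidth ceiling. All the numbers here (model size, DRAM bandwidth, NPU TOPS) are made-up but plausible assumptions, not specs for any real SoC:

```python
# Why memory bandwidth caps GenAI token throughput on an SoC NPU.
# All numbers below are illustrative assumptions.

params = 7e9            # 7B-parameter model (assumed)
bytes_per_param = 1     # int8 quantization (assumed)
model_bytes = params * bytes_per_param

dram_bw = 50e9          # 50 GB/s LPDDR-class bandwidth (assumed)
npu_tops = 40e12        # 40 int8 TOPS of NPU compute (assumed)

# Token generation streams essentially the whole model from DRAM once per
# token, so bandwidth sets a hard ceiling regardless of compute:
tokens_per_s_bw = dram_bw / model_bytes

# Compute ceiling: roughly 2 ops per parameter per token (multiply + add):
tokens_per_s_compute = npu_tops / (2 * params)

print(f"bandwidth-bound ceiling: {tokens_per_s_bw:.1f} tokens/s")
print(f"compute-bound ceiling:   {tokens_per_s_compute:.0f} tokens/s")
```

With these assumed numbers the NPU could in principle do thousands of tokens per second, but DRAM bandwidth holds it to single digits, which is why shortening the path to memory pays off.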
There are other advantages too – the chips can be fabricated with different processes, decouple yields, etc.
This is another interesting high-level link that talks about the Through-Silicon-Via (TSV) which is just one of the methods in @DSeid’s link:
This isn’t my area of expertise and it changes year-to-year. I’m sure my description is a few years behind.
Modern, and even legacy, fabs are 300 mm (aka 12"). It is something of a sore point that they topped out there, despite lots of hope for ever-bigger wafers. 450 mm is a bit like gallium arsenide: tomorrow's technology, always has been, always will be.
I remember that there were lots of problems with attempts at wafer scale integration. Heat control and uneven heating of the wafer was one.
One of the big problems is that memory (at least DRAM) uses enough of a different process that it isn’t compatible with logic on the same die. So memory is going to be a different wafer. This messes with new architectures that try to put processing inside the memory. You can use less dense memory, and there are processes that support this. But the density is a lot lower. So it is hard to make designs that make sense.
The big dog in fabs is throughput. The eyewateringly expensive bits of kit are the machines that expose the wafers aka steppers. In order to maximise throughput you want the stepper to be exposing every die on the wafer with the same mask. There really isn’t any useful way of making a stepper use different masks for different locations. The tolerances are so mindbendingly small they have enough trouble making it work with only one mask. And what constitutes a mask nowadays is not your father’s mask.
Maybe you could string together a set of steppers, but the efficiency would likely still be ruinous. At $100M a throw, and the next generation looking like $250M each, steppers make for very focussed thinking about efficiencies.
Modern AI systems, even those that operate as a biomimetic design, basically just encode the layout and connections of the bio unit as a data structure and then run code that serially updates the emulated system. You can add parallel processing units, but this is just divide and conquer of the same basic idea. It is no different to how a legion of other simulation systems work. It is efficient because we can make the memory system that contains the system representation ridiculously dense, due to our ability to make DRAM so dense, and because of how fast we can make the processing units rip through it.
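The "encode the wiring as a data structure, then serially update it" scheme can be sketched in a few lines. The network size, random weights, and tanh update rule here are all illustrative assumptions, not any particular system:

```python
# Minimal sketch of serial emulation: the network's connectivity lives in
# memory as a data structure (here a dense weight matrix), and a processor
# rips through it to update every unit's state each step.
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                      # number of emulated units (assumed)
W = rng.normal(scale=0.1, size=(n, n))        # connection strengths (the "wiring")
state = rng.normal(size=n)                    # current activation of each unit

def step(state, W):
    # One update pass: each unit's next value is a nonlinear function
    # of its weighted inputs. (tanh chosen purely for illustration.)
    return np.tanh(W @ state)

for _ in range(10):
    state = step(state, W)
```

The point is that all the "biology" is just data; the speed comes from dense memory and a fast serial (or divide-and-conquer parallel) sweep over it.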
But working out how to make silicon look like actual brain processing is a way different matter. It may simply never catch up with brute force emulation. Very optimised emulation is the likely middle ground absent any fabrication breakthroughs.
Those 3D chips are extremely dissimilar to the 3D structure of a brain. They just take two or more completely 2D chips and stack them with data connections between them. Those are stacks of pancakes, while brains are a bowl of spaghetti.
That is seriously impressive.
One thing is that they appear to be replicating the same core at each spot on the wafer, which wasn’t something that made sense eons ago. That probably helps manage a lot of the difficulties. Would be interesting to know what the yield is. One can be sure that there is lots of redundancy and mechanisms to route around failed bits.
Still many interesting questions about the design. The whole shebang only has 44 GB of memory. That may be a limit for some things. About 50 kB per processor. Hard to know.
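For what it's worth, the two figures above are self-consistent; dividing them gives the implied core count (just arithmetic on the numbers in the post):

```python
# Sanity-checking the per-core memory figure from the numbers above.
total_mem = 44e9        # 44 GB total on-wafer memory (figure from the post)
per_core = 50e3         # ~50 kB per processor (figure from the post)

implied_cores = total_mem / per_core
print(f"implied core count: {implied_cores:,.0f}")   # ~880,000 cores
```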
Trilogy, the company mentioned above, was founded by Gene Amdahl, who was chief architect of the IBM 360 and the founder of the company that bore his name. I followed this since I sometimes went to meetings about an attempt to do WSI when I was at Bell Labs. It failed, and they sold the technology to Alcoa of all people. Trilogy failed also.
I’ve been reviewing papers about how to test TSVs for easily 15 years, but it has been in the last 5 that I’ve seen stuff from industry about it.
I know wafer yields - I owned this information for the last company I worked for. It’ll never be even close to 100% for any kind of aggressive process. I don’t remember ever seeing a totally yielding wafer. One company from about 15 years ago was making a chip with 100 processors on it, with many spares, just like memories have spare rows and columns.
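A quick sketch of why those spares matter so much. The per-core yield and spare count here are made-up illustrations, and it assumes independent per-core defects (real defect models account for clustering and are messier):

```python
# Yield with redundancy: a chip is good if at least `needed` of its
# `total` cores are defect-free, assuming independent per-core defects.
from math import comb

def chip_yield(needed, total, p_core_good):
    """P(at least `needed` of `total` cores work), binomial model."""
    return sum(
        comb(total, k) * p_core_good**k * (1 - p_core_good)**(total - k)
        for k in range(needed, total + 1)
    )

p = 0.99  # assumed 99% per-core yield
no_spares  = chip_yield(100, 100, p)   # all 100 cores must work
ten_spares = chip_yield(100, 110, p)   # any 100 of 110 must work

print(f"no spares:  {no_spares:.3f}")
print(f"10 spares:  {ten_spares:.6f}")
```

Even at 99% per-core yield, requiring all 100 cores to work kills most chips, while a handful of spares (routed in like memory's spare rows and columns) pushes chip yield to nearly 100%.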
BTW, there are large amounts of memory on a processor, in the caches. Looking at the die photo of one chip I worked on, about 30% of it is the L3 data cache, shared across six CPU cores. Standalone memories are a lot cheaper to design and fab. A lot of the slowdown is from interconnect, and 3D packaging is designed to cut that down by eliminating the buffering needed for chips mounted on boards.
Yeah, I should have been clearer that I was referring to DRAM. Caches are a huge component of designs, but they are built from the same logic process as the rest of the processor.
I do remember IBM being very proud of a halfway process that put a form of DRAM on the processor die. But that was decades ago and I have no idea what happened to it.