An end to Moore's law

Huh? Memory is now cheaper than it’s ever been. Not wasting man-hours optimizing a piece of code to death just to save a couple of MBs isn’t laziness, it’s a good business decision. And the fact that Word documents open only as quickly now as they did ten years ago doesn’t say much: Word documents are vastly more complex than they were ten years ago, and you only open them a few times a day. What’s the point of spending man-hours optimizing that?

Moore’s law was only ever about the number of devices on a chip - it was one of his colleagues who realised that this had a close implication for speed. Back then every transistor was precious, and it was a matter of how to make a processor work at all, often by very clever tricks that used a minimum amount of logic. Adding more logic meant that more could be done in a single clock tick, and the processor went faster - even without any increase in clock speed.

Looking at current processors you see a range of things that make them go fast. The most critical is cache memory. A reasonably sized L1 cache with optimised performance is what makes things fast. If you look at the history of processors, in a very broad sense, much has not changed. The number of clock cycles it takes to get to the first level of memory has stayed of much the same order: a few cycles. And the size of that memory has not changed much either: some number of kilobytes. All that has happened is that the processor has got much faster. But the actual size of the processor has remained pretty static for some years now. Increases in device count on the die have been put into bigger and deeper caches, and into duplicating cores. There have been quite a few times when a generation change in silicon process brought about a shrink of the core design - but no other design changes - and the extra area freed up by the shrink was simply filled up with more cache. The smaller process clocked faster, which got you some of the speed improvement, and the increase in cache size got you the rest.
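To make that concrete, here is a toy C sketch (the array size and stride are arbitrary assumptions picked to blow past the caches, not numbers from any real part) that does the same number of loads two ways: sequentially, which the caches and prefetchers handle beautifully, and with a page-sized stride, which defeats them. On typical desktop hardware the strided pass comes out several times slower even though the "work" is identical - the difference is how much of each cache line actually gets used.

```c
/* Minimal sketch: same number of loads, very different cache behaviour.
 * The 64 MB array and 4 KB stride are arbitrary illustrative choices. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (64 * 1024 * 1024 / sizeof(long))  /* far bigger than any cache */
#define STRIDE (4096 / sizeof(long))              /* jump a page at a time */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    long *a = malloc(N * sizeof(long));
    long long sum = 0;
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (long)i;

    double t0 = seconds();
    for (size_t i = 0; i < N; i++)            /* sequential: cache and prefetch friendly */
        sum += a[i];
    double t1 = seconds();

    for (size_t s = 0; s < STRIDE; s++)       /* strided: one cache line used per miss */
        for (size_t i = s; i < N; i += STRIDE)
            sum += a[i];
    double t2 = seconds();

    printf("sequential %.3fs  strided %.3fs  (sum %lld)\n", t1 - t0, t2 - t1, sum);
    free(a);
    return 0;
}
```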

The cost of going to main memory for a data item is on the order of a few hundred clock cycles. A modern processor core can execute an astounding amount of work in that time. With four ALUs, plus the SIMD ALUs running, you might get a peak of nearly a thousand operations completed. So the cost of a cache miss that goes to main memory is very significant. Indeed, if you compare it to machines of the 70’s, it isn’t too far off the cost of a disk access in relative terms.
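If you want to see that "few hundred cycles" number yourself, a pointer-chasing loop is the classic trick: every load depends on the previous one, so the out-of-order machinery can’t hide the latency. This is only a sketch - the buffer size is an arbitrary assumption chosen to be far larger than the caches - but multiplying the nanoseconds-per-load it prints by your clock frequency should land somewhere in that few-hundred-cycle range.

```c
/* Latency sketch: dependent loads through a shuffled ring of indices, so each
 * access must wait for the previous one to come back from memory.
 * The 256 MB buffer size is an arbitrary illustrative assumption. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)   /* 32M nodes * 8 bytes = 256 MB, far past the caches */

int main(void) {
    size_t *next = malloc(N * sizeof(size_t));
    if (!next) return 1;

    /* Build a single random cycle (Sattolo's algorithm) so the walk visits every slot. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;              /* j in [0, i) */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; i++)                  /* one dependent load per step */
        p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per dependent load (p=%zu)\n", ns / N, p);
    free(next);
    return 0;
}
```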

As word processors go, Microsoft Word is extremely inefficient. It takes twice as long, sometimes more, for me to open MS Word as OpenOffice. And I’ve installed about a dozen OpenOffice plugins which give it much more capability than standard MS Word.

Word is simply inefficient, and it’s not the core of what I’m saying anyway; there seems to be a mindset that efficiency isn’t all that important because inefficient programs are counteracted by more powerful systems. That’s bad, because it assumes that 1) systems will always be more powerful, 2) people want/need/can afford more powerful systems, etc.

Or not.

There are lots of exciting things that we can only do on supercomputers. I personally wish Moore would change his law from 18 months to about 3 months.

What about optical computing as a way of getting single cores above the 4 GHz mark? Is that branch of research going anywhere?

Eponysterical!

That limits how small you can make electrical components, but not how small you can (theoretically) make computers.

We already know of some exotic particles that may be more suited to building incredibly small machines than electrons, and, really, I don’t know if there’s any reason to assume that quarks and neutrinos are really fundamental particles, or if there are still smaller particles underneath.

I’m not claiming that we will be able to build computers using potentially unknown technologies and undiscovered physics, but claiming that there’s a fundamental limit for something 75 years in the future seems a lot like saying you’d never be able to build a television that fits in your pocket because you can’t make vacuum tubes small enough, or claiming that a city can never grow above a few million inhabitants because you can’t haul enough food in by horse.

Optical computing may provide a way to get clock rates higher. I’m not a silicon designer, but my understanding is that the limits are some combination of switching speed, power consumption (and heat dissipation), and total path length.

Switching speed: Semiconductors don’t switch instantaneously. You can generally make things run faster by applying more power, but that starts to cause problems with interference and heat.

Power consumption/Heat dissipation: Processors generate a lot of heat. The faster you run those components, the more heat you generate, and there’s a limit to how much heat you can actually dissipate before the chips start to melt.

Path length: At 4 GHz, the speed of light limits your maximum path to about 7 cm. In actuality it’s shorter than that because of component delays, but let’s ignore that for now. Chips are commonly a centimeter or two across, but paths don’t go in a straight line. The more the speed increases, the more likely it is that you can’t actually get a signal from one part of the chip to another in one clock cycle. You can certainly live with that using a variety of tradeoffs, but those tradeoffs start to look a lot like using multiple independent cores anyway. As the speed goes up you can make the chips smaller, but then you increase the concentration of heat and start to run into more problems with powerful electrical signals bleeding into other parts of the chip.
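(For the arithmetic: light covers roughly 3×10^8 m/s, and one 4 GHz clock period is 0.25 ns, so the distance per cycle is 3×10^8 / 4×10^9 ≈ 0.075 m, i.e. about 7.5 cm - and signals on real on-chip wires propagate a good deal slower than light in vacuum, which is where the "shorter in actuality" comes from.)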

I don’t know enough about optics to know how well different approaches can solve those problems, but I’m pretty sure that it’s being heavily researched (and that it comes with problems of its own).

Bah, the internet is a fad.

Exactly! We can go on and on about the limitations based on what we know, but major technological advances will be driven by the things we don’t even know that we don’t know yet. Ben Franklin was not sending up kites with keys because he really wanted to have an iPod. Rather, he was performing an experiment that combined with the work of many other people to define, understand and harness electricity. Only once we knew enough about it could we start talking about building iPods.

Let’s not forget about the 50-year-old Scientific American article in which they postulate, someday, a computer small enough to fit into a single room!

As for Moore’s Law, you have to interpret it much like population growth curves. We can say that the human population doubles every 25 years, but no one interprets that to mean “forever” or “to infinity.”

Even if clock speeds aren’t going up, feature sizes are going down on schedule, with people from Intel talking about 8 nm. If there is a delay, it is due to economic not technological factors. So, we’ve got some time yet.

Multicore designs have advantages besides speed. You can put your transistors into cache, which is being done; you can put them into bigger word sizes, which was done; you can put them into more cores; or you can put them into increasingly complex processor pipelines to get speed through better prediction methods. (If you can predict which way a branch is going to be taken, you can prefetch along that path and save a wait for memory.) But more complexity adds to design and debug time, requires more designers, and risks missing the market window - which has happened more than once. Designing one core and replicating it is much simpler, and reduces time to market, which is all-important.
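To see what that parenthetical about prediction is worth in practice, here is a toy C benchmark. The array size and the 128 threshold are arbitrary illustrative choices, and you should compile it without heavy optimisation, or the compiler may quietly turn the branch into a conditional move and hide the effect. The same loop over the same bytes runs much faster once the data is sorted, purely because the branch becomes predictable.

```c
/* Sketch of why branch prediction matters: identical work, but the branch is
 * essentially random over shuffled input and near-perfectly predictable over
 * sorted input. Sizes and threshold are arbitrary assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static int cmp(const void *a, const void *b) {
    return (*(const unsigned char *)a > *(const unsigned char *)b) -
           (*(const unsigned char *)a < *(const unsigned char *)b);
}

static double sum_if_big(const unsigned char *v) {
    struct timespec t0, t1;
    long long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        if (v[i] >= 128)          /* the branch the predictor has to guess */
            sum += v[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("sum=%lld  ", sum);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void) {
    unsigned char *v = malloc(N);
    if (!v) return 1;
    srand(1);
    for (int i = 0; i < N; i++) v[i] = (unsigned char)(rand() & 0xff);

    printf("random: %.3fs\n", sum_if_big(v));   /* branch taken ~50% of the time, at random */
    qsort(v, N, 1, cmp);
    printf("sorted: %.3fs\n", sum_if_big(v));   /* branch almost always predicted correctly */
    free(v);
    return 0;
}
```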

We have a long way to go. I know of one processor (I’m not sure if it ever came out; the person I knew working for the company left) which had 100 cores. This will happen.

The old model for the use of parallel computers was to increase the speed of one massive computation. That’s not how the average person uses them today. It is a lot easier to allocate processes across multiple cores than to split one process. This is nothing new - when I was at Intel 13 years ago I had a two-CPU HP workstation. I got one CPU; the other was used to run big simulation jobs spread across the network.
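For what it’s worth, that style of parallelism needs almost nothing from the programmer - you just launch independent jobs and the OS scheduler spreads them across the cores. A minimal POSIX sketch, where the busy-loop "job" is a made-up stand-in for something like one of those simulation runs:

```c
/* Throughput parallelism sketch: run several completely independent jobs and
 * let the OS scheduler spread them across cores. Nothing here splits a single
 * computation; the busy loop is an illustrative stand-in for real work. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define NJOBS 4

static void do_independent_job(int id) {
    volatile unsigned long x = 0;            /* stand-in for a real workload */
    for (unsigned long i = 0; i < 500000000UL; i++)
        x += i;
    printf("job %d done (%lu)\n", id, (unsigned long)x);
}

int main(void) {
    for (int i = 0; i < NJOBS; i++) {
        if (fork() == 0) {                   /* child: one job per process */
            do_independent_job(i);
            _exit(0);
        }
    }
    for (int i = 0; i < NJOBS; i++)          /* parent: wait for all children */
        wait(NULL);
    return 0;
}
```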

It just occurred to me.

The first company that invests a bunch of time and effort on a chip with XYZ-nanometer features and miscalculates/overestimates the probability of success will likely be in deep financial/marketing doo doo.

Here’s how it works. You start with test chips, which are designed to have a variety of cells that will be used in the library, and also a variety of measurement devices that let you measure speed, etc. They are fairly simple, and are used to help refine the process. Of course, a lot of the assumptions about the process have been verified in the lab long before this. After that, most fabs start with memories, since again they are relatively simple and the shrink is usually really useful in making them faster and cheaper, with more capacity. After that, microprocessors and graphics units are process drivers, because they benefit more from a new process than ASICs do. If for some reason a new process wouldn’t work, we’d know about it long before a product is committed. Not that there isn’t lots of pain in getting it to work.

An interesting point about the chip manufacturing process versus the chip design. I was talking to a guy at Intel a while ago, and he said that for them, the manufacturing process and the design of chips are totally separate. The process guys work in a totally different part of the company, and once they have a new process working, they simply write the design rules. The chip designers receive these design rules pretty much as revealed truth - commandments etched into stone - and must design according to them.

This division is very likely the result of some very painful and extraordinarily expensive mistakes in the past.

Large numbers of cores on the die have been imminent for a while now. But Sun (Oracle) have killed the Rock (16 cores) and Intel have pretty much killed Larrabee. Sun still have 8-core chips. So we are currently left with GPGPUs, which don’t quite count just yet. Of course, the Cell had 9 cores four years ago. Personally, I’d like to see a CM-2 on a chip.

If you are referring to Tilera, I believe the 100 core chip comes out in Jan or Feb. They currently have a 64 core chip.

No, someone else, who didn’t have any product out at the time I heard the talk. Interesting, though. Lots and lots of cores are coming, though I don’t quite see the utility for the home user.

Examples? The FDIV bug had nothing to do with manufacturing, and the problem with one major line of Intel parts came from making some bad choices during a die diet.

I think your friend’s view of the division is a bit simplistic. First, the design is mostly done in RTL, and even circuit designers work more or less at the gate level. The interface comes in two places - the library design team, and the product engineers who work closely with the design team. DFM rules these days are pretty complicated, but the everyday designer doesn’t have to worry about them all that much. No one can keep track of functionality and via sizes at the same time.

Intel’s real strength is manufacturing, which they do better than anyone. A friend of mine who is a professor did a sabbatical at an Intel fab, and I asked him if the fab people were pretty happy. He said that even they felt like second class citizens compared to the designers. It used to be an absolute requirement to have run a fab before getting really high in the ranks, but that seems to be no longer true.

Nah–the problem is that if you can figure out how fast your quantum computer is running, you won’t know where the hell it is.

I wasn’t referring to that, nor to any design flaw. The really nasty issues were the ones you never heard of. Chips that had appalling yield rates. Chips that were taped out and didn’t work - not just subtle bugs, but didn’t work at all. This sort of thing was more the subject of rumor, and a long time ago. Very, very expensive mistakes. Mistakes that can sink a company. So utterly rigid design-for-manufacturing rules, where the logic designer has no wiggle room at all, are going to be a good start.

Depends greatly upon the part being designed. RTL is pretty much the basic level for most designs. Heck, many chips are directly synthesised from RTL. Intel designs at the gate level for processors. They optimise down to the transistor geometry level with automatic tools.

It is these I was referring to as set in stone.

Intel seems to be going a little odd of late. It used to be that their corporate philosophy was that they did fab better than anyone, and that chip designs were simply a way of selling what the fabs did. Now they seem to be viewing x86 as their core competency. This is going to end in tears.