Why has CPU speed growth slowed down so drastically?

Last I checked you can’t set processor affinity on a Mac, which sucks for performance because the scheduler is constantly shifting processes to different processors/cores, with the overhead of re-warming the caches each time.
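
For contrast, here is what pinning a process looks like where the OS does expose it: a minimal Python sketch using os.sched_setaffinity, which exists on Linux but not on macOS (the choice of core 0 is arbitrary, just for illustration).

```python
import os

# Pin the current process to core 0 only. os.sched_setaffinity exists on
# Linux; macOS has no public equivalent, which is the complaint above.
# Core 0 is an arbitrary choice for illustration.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})          # pid 0 = "this process"
    print("now restricted to cores:", os.sched_getaffinity(0))
else:
    print("no affinity control exposed on this platform (e.g., macOS)")
```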

Other misc. points:

  1. Heat is a primary limiter on CPU speeds
  2. 64-bit is not “faster” than 32-bit; for lots of code it’s slower, but it allows addressing more memory.

This is one of the biggest reasons, and one that most people overlook.

Be careful of thinking of CPU speed in terms of MHz or GHz.

For example, these benchmarks show that the new Core i7 940 at 2.93 GHz completes a single-threaded task (SuperPI) in 16% less time than a relatively new Core 2 Duo E8600 at 3.33 GHz. This is likely due to better memory bandwidth on the Core i7.

The point is that system bottlenecks can vary depending on the type of processing task, and that clock speed is a small part of processor performance.

I don’t think this is correct so I’ll try to clarify:

If you run two or more processes that can each saturate a single processor, then a dual-core processor will be twice as fast as its single-core equivalent in overall execution speed. In your example each of the three processes would get the equivalent of 2/3 of a core to execute on, and they would all finish at the same time, assuming they require the same number of cycles to complete.

The hitch for dual-core CPUs is that a single process not designed to be multi-threaded can never use more than the equivalent of one core.
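
A quick way to see both halves of that (three CPU-bound processes scaling across cores, versus one process stuck on a single core) is something like this Python sketch; the worker count and loop size are arbitrary:

```python
import time
from multiprocessing import Pool

def burn(n):
    """CPU-bound busy work: no I/O, so it can saturate one core."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    N = 10_000_000

    # One process: limited to (at most) one core's worth of throughput,
    # no matter how many cores the machine has.
    start = time.perf_counter()
    burn(N); burn(N); burn(N)
    print("serial:  ", time.perf_counter() - start)

    # Three worker processes: on a dual core, each effectively gets
    # ~2/3 of a core and the wall-clock time drops to roughly half.
    start = time.perf_counter()
    with Pool(processes=3) as pool:
        pool.map(burn, [N, N, N])
    print("parallel:", time.perf_counter() - start)
```
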
I believe we are still seeing exponential growth in the potential transistor count on integrated circuits, but instead of just getting faster, processors today are also getting wider (more cores) and cheaper.

Here is an interesting article at Ars [circa 2003] on Moore’s Law that touches on the subject a bit.

The limitations he foresaw affecting Moore’s Law (around 2013): limits on achievable power densities, reduced demand for the fastest processors, and difficulties integrating varied components on a single die.

If it were cost-effective to develop chips with faster clock speeds, Intel and the other vendors would do it. The problem is that they have hit a wall. Besides the problems in developing new processes for smaller and faster transistors, the speed of light is a major problem. When you are clocking the processor at 3 GHz, it takes a significant amount of time, relative to the clock period, for signals to travel from one area of the chip to another. This means that different parts of the chip effectively operate in different time zones. We have left the realm of nanoseconds and are well into the realm of picoseconds. Light travels about 3 mm in 10 ps, and the actual speed of electrical signals is even slower. This greatly complicates circuit design. You can think of it as a bunch of small islands operating at the system clock rate, communicating with each other via high-latency links. If you want to send information from point A to point B, it may take many clock cycles for it to travel across the chip.
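
A back-of-envelope version of that argument, where the ~20 mm die size is just an assumed figure for illustration:

```python
# Back-of-envelope: how far can a signal get in one clock cycle?
clock_hz   = 3e9                      # 3 GHz
period_s   = 1 / clock_hz             # ~333 ps per cycle
c_mm_per_s = 3e11                     # speed of light ~ 0.3 mm per ps

print(period_s * 1e12, "ps per cycle")             # ~333 ps
print(period_s * c_mm_per_s, "mm at light speed")  # ~100 mm in vacuum

# Real on-chip wires are RC-limited and far slower than light, so even a
# fraction of a ~20 mm die (an assumed size) can eat a whole cycle, which
# is why long global signals end up pipelined over several clocks.
```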

By the way (and largely on topic), what was a nice home computer running at, back at the beginning of 2000?

I’ve always built my own;

At the start of 2000 I remember I had an Athlon T-Bird @ 1.4 GHz. Pretty decent for the time.

Kicked off so much heat though. I had a largish heatsink/fan on that sucker. I remember affectionately referring to the CPU / Heatsink as “The Blow Dryer”

Whoops, I’m misremembering. I got the T-Bird at the end of 2000. Not even sure they were out in early 2000.

Early 2000 I had a PIII @ 900 MHz.

2000… I had an AMD K6-III/400 overclocked to nearly 500 MHz or something like that, along with an appropriate amount of RAM… I can’t remember for the life of me how much, though.

The early Athlons were out, and so were the Celerons and PIIIs, and the P4s came out toward the end of the year. But the K6-2s and -3s, along with the PII/PIII, were the more common ones in most houses, because processors were still outrageously expensive compared to the rest of the components.

The slow-down, as many others have noted, is in CPU design. Software has never driven CPU design (very few people wrote software for CPUs that weren’t on the market yet); instead, new software has been released to take advantage of the latest hardware. The slowing growth of processor speed is starting to worry academic circles. Without faster CPUs people won’t upgrade their PCs. Without new PCs people won’t buy new versions of software, leading to stagnation of the computer industry. A fair amount of research is trying to find ways to make multi-core programming easier.

Incidentally, the computer industry is still following (the original) Moore’s Law. Moore noted that the number of transistors on a chip was doubling roughly every two years. The latest CPUs continue this trend. It’s just harder to take advantage of all that power.
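
A doubling every two years compounds faster than intuition suggests; a trivial sketch, where the starting transistor count is just an illustrative round number rather than any real chip’s spec:

```python
# "Doubling roughly every two years" compounds quickly.
# The starting count is an illustrative round number, not a real chip spec.
transistors = 700_000_000
for year in range(2008, 2019, 2):
    print(year, f"{transistors:,}")
    transistors *= 2        # one doubling per two-year step
```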

This is it - or close to it.

Power is the real issue. The faster you compute, the more power you need. Supplying the power, and removing the heat, is getting harder and harder. In the good old days of 1-micron designs, when the clock was off, transistors would draw almost no power. Today, in order to get speeds up, transistors leak like crazy. Duplicating cores and reducing single-thread performance is an easy way to cut power. Also, companies with lots of servers are paying tons for electricity, especially as electricity prices increase.
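
The first-order reason chasing clock speed got so expensive is the standard CMOS dynamic-power relation, P ≈ C·V²·f, combined with the fact that higher clocks usually need higher voltage. A sketch with made-up numbers:

```python
# First-order CMOS dynamic power: P ~ C * V^2 * f (leakage not included).
# All numbers below are made up purely to show the scaling.
def dynamic_power(c_farads, volts, freq_hz):
    return c_farads * volts**2 * freq_hz

base  = dynamic_power(1e-9, 1.0, 3.0e9)          # 3 GHz at 1.0 V
fast  = dynamic_power(1e-9, 1.2, 4.5e9)          # 1.5x clock, needs more voltage
twice = 2 * dynamic_power(1e-9, 0.9, 3.0e9)      # two cores at slightly lower voltage

print(f"{fast / base:.2f}x power for 1.5x clock")   # ~2.2x
print(f"{twice / base:.2f}x power for 2x cores")    # ~1.6x for ~2x throughput
```

That is more or less the trade the industry made: two slightly slower cores buy roughly twice the throughput for far less power than one much faster core.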

The reason we haven’t gone to smaller process nodes as quickly is more economics than technology. Each fab gets more expensive, and with greater integration there are fewer chips that need to be made at that node. When I was doing ASICs we had four or five different ones in a system; now we have one or two, because some of the functionality gets put on the processor and more gets put on a single ASIC. Mask costs are enormous too, so there is more use of slightly slower FPGAs now.

We will hit the wall eventually, but we’re nowhere near it now, so that is not the reason.

The fundamental problem in processor architecture is how to use all the transistors you have available. One solution is to increase cache sizes, since that improves performance and is fairly simple. There is only so much you can do with longer pipelines and improved speculation. (I’ll define these if anyone cares.) Duplicating cores works really well, since you get good performance with simpler cores and the increased complexity is only in keeping them from stepping on each other’s toes.

What software takes advantage of features in such detail that this would be an issue? The word today is backwards compatibility in almost all families - x86, Sparc, and Power. The only attempt at a really new instruction set, Itanium, has failed pretty badly, hurt in no small measure by how badly the first Itanics released emulated the x86 architecture.

Parallel programming was already a hot topic when I was in grad school 30 years ago. There are very few applications that need it. Most use is made by assigning threads to different processors, making the parallelism take place at a high level. Even fairly old servers have lots of processors in them.

This doesn’t really complicate circuit design (which is the term commonly used for the design of the library cells you use to build a processor), but it does complicate floorplanning and layout. Most signals don’t go very far. Global signals have things called repeaters to clean them up, and are known to take more than one clock cycle to get to their destination. Sometimes there are multicycle paths locally that take two clocks to make it from one flop to another. Mostly you want to register the signals to make the timing cleaner.

Often the versions of a CPU with different speeds are exactly the same. The separation comes from a process called speed binning, done during test. You find the fastest clock at which the CPU works, and then put it into the fastest possible bin. Faster bins get sold for more money, so there is a big incentive to find the paths that are slowing down the processor and fix them. The very last person working on the design of the original Pentium was doing this, years and years after it was introduced.
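
Speed binning really is just bucketing by measured maximum clock; a toy sketch, with invented bin frequencies:

```python
# Toy speed-binning: every die is the same design. Test finds the fastest
# clock it passes at, and it gets sold as the highest bin it clears.
# The bin frequencies below are invented for illustration.
BINS_GHZ = [3.2, 3.0, 2.8, 2.66]    # highest (priciest) bin first

def bin_for(measured_fmax_ghz):
    for b in BINS_GHZ:
        if measured_fmax_ghz >= b:
            return b
    return None                      # fails even the slowest bin: scrap

for fmax in (3.31, 2.95, 2.71, 2.5):
    print(f"die tests at {fmax} GHz -> sold as {bin_for(fmax)} GHz")
```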

Clock tree layout for processors is incredibly complicated. Almost all the CPU runs on one clock domain, with the clock internally generated. There are other clock domains, mostly for I/O. Dealing with many clock domains is a big pain. When I was at Bell Labs we had telecom chips with zillions of them. Processors are actually simpler than ASICs in this respect.

I think a big factor here is that you’re talking about notebooks. Notebooks have major issues with cooling and power usage that desktop systems don’t, and obviously cooling systems have certain very concrete limits on their efficiency.

In airborne radar, we spend a lot of time figuring out how to distribute the processing to multiple processors. A big problem we face (with CPUs which don’t share RAM) is corner turning; e.g., remapping data from the time domain into the frequency domain after an FFT and distributing that to other nodes. Historically, this required the software designers to be experts in the problem domain, and we (humans, not my company) still haven’t developed tools that automatically solve this problem for us. It’s annoying to have 10 CPUs waiting for one other CPU to finish so they can move on to the next stage, and when you can see a couple of dozen CPUs all running at 95% loading, it’s a thing of beauty. Of course, with the move to superscalar processors with all sorts of memory caching, we can’t feel comfortable with 95% loading – you should really shoot for something like 75%. Mine is a minority opinion, however, and others feel comfortable above 85% loading. But I just don’t trust these newfangled memory controllers yet.
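
For anyone wondering what a corner turn looks like concretely: after the per-row FFT, the data cube gets transposed so the next stage can own whole columns, and on distributed-memory hardware that transpose is an all-to-all exchange. A toy single-machine NumPy sketch (the array sizes are invented):

```python
import numpy as np

# Toy "corner turn": radar data laid out as (range_bins, pulses).
# Sizes are invented; a real system splits this across many processors.
rng = np.random.default_rng(0)
data = rng.standard_normal((512, 128))          # range bins x pulses

# Stage 1: FFT along the pulse axis (each node owns a block of rows).
spectrum = np.fft.fft(data, axis=1)

# Corner turn: transpose so the next stage can own blocks of columns.
# On distributed-memory hardware this is an all-to-all exchange, which is
# exactly the expensive step described above.
turned = np.ascontiguousarray(spectrum.T)       # now pulses/Doppler x range bins

# Stage 2 would now be distributed over rows of `turned`.
chunks = np.array_split(turned, 4)              # e.g., hand one chunk per node
print([c.shape for c in chunks])
```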

Also, regarding desktop software that was written to be single-threaded: any time it makes an OS call, if it’s a multiprocessor-aware OS, the OS can run that function on a different CPU, so even those programs can benefit from multiple CPUs. As far as the program’s thread is concerned, it just made an OS call; it didn’t necessarily have to set up semaphores and do complex memory management – it just gets its answer really fast and goes on its way. (Of course, this is highly dependent on the particular function that’s being performed.)

Mobile processors are even more power aware, and I suspect that they have a lot more power management logic built in. Most processors have clock gating logic that shuts down parts of the CPU not being used, since clocking things, even when nothing useful is happening, uses power.

My company doesn’t make processors for laptops, so the stuff I was talking about is for processors going into high performance servers.

I don’t think so. The only thing restraining demand for CPU horsepower is the limitation of the mythical ‘average customer system’ that developers are coding for. Games, operating systems, web browsers, antivirus, whatever - if the most common platform in use were an 8-core 6 GHz machine with 16 GB of RAM, you can bet your bottom dollar that there would be plenty of software out there capable of pegging all the meters in everyday use.
And games drive niche demand for CPU - everyday demand is driven by the likes of Microsoft, Adobe and Norton - and that unholy triumvirate can crush any CPU.

This is why I love the Dope. Thanks, guys. A lot of different answers, and I guess the truth is some combination of them.

So I’m curious, now. How much faster (as a ballpark figure) would you say a 1.9 GHz CPU of today is than the 1.9 GHz Athlon XP 2500 I bought about five years ago? Would that figure change if we’re talking about desktops?

See GPGPU for more about the concept. Essentially, the way graphics processing units are designed makes them more efficient at certain types of calculations, so in the future you may be able to do something like encode a video on your GPU faster than your CPU could do it. You may be able to do it now, actually – I remember a beta video encoder that came out shortly after the general release of NVidia’s CUDA that was supposed to make GeForce 8-series cards faster than a Core 2 Quad.
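
The kind of calculation GPUs are good at is the embarrassingly data-parallel kind: the same small operation applied independently to millions of elements. A CPU-side NumPy sketch of that shape, not actual GPU code; the array size and the gamma value are arbitrary:

```python
import numpy as np

# The shape of work GPUs like: one small, independent operation per element,
# with no dependencies between elements. This runs on the CPU; it only shows
# the pattern that maps well onto thousands of GPU threads.
pixels = np.random.rand(1_000_000).astype(np.float32)   # arbitrary size

# Per-element transform, e.g. a gamma adjustment in a video encoder's
# color pipeline: every element is computed independently of the others.
adjusted = np.clip(pixels ** 2.2, 0.0, 1.0)

print(adjusted.shape, adjusted.dtype)
```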

There’s no such thing as “overclocking mode”. As I understand it, the main reason something like a Thunderbird K7 ran way hotter than a Thoroughbred core is that the larger lithographic process required a higher voltage to push those signals, generating more heat.

Generally, as the CPU die process shrinks and our CPUs get faster, the operating voltages and temperatures are coming down. It’s significantly easier to cool a 3 GHz 45 nm Core 2 Duo (they come with a dinky little aluminum heatsink) than a 1.4 GHz T-Bird (180 nm), which put out enough heat to double as an oven.

Clock speed doesn’t correlate as strongly with actual performance as you might think. One of the goals in designing the Pentium 4 was essentially to get really impressive clock speeds out of it. People might assume a 3 GHz P4 was faster than a 2 GHz Athlon T-Bred B when it wasn’t. They did different amounts of work per clock cycle, had different memory bandwidths and latencies, etc. Even factoring out the dual-core aspect, my 45 nm Penryn Core 2 Duo at 3 GHz would run circles around a 3 GHz P4. And early rumors about Intel’s Nehalem architecture were that it could be 30% faster at the same clock speed than Penryn, although I think that may be under ideal multithreaded conditions.
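
The quick way to see why the clock number misleads: performance is roughly instructions-per-clock times clock rate, and the P4 traded IPC away for clock. The IPC figures here are invented purely for illustration:

```python
# Rough model: throughput ~ IPC * clock. The IPC numbers are invented
# solely to illustrate how a 3 GHz part can lose to a 2 GHz part.
def throughput(ipc, clock_ghz):
    return ipc * clock_ghz               # "billions of instructions per second"

p4     = throughput(0.8, 3.0)            # long pipeline, lower IPC (assumed)
athlon = throughput(1.4, 2.0)            # shorter pipeline, higher IPC (assumed)

print(f"P4-style:     {p4:.1f}")         # 2.4
print(f"Athlon-style: {athlon:.1f}")     # 2.8 -- slower clock, faster chip
```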

I wouldn’t be surprised if demand for better CPUs has been dropping. Gaming has been a big driver of hardware development traditionally - but over the years games have become more and more dependent on the GPU rather than the CPU. Another issue is that most games are developed for consoles simultaneously with the PC, which means the technology for game development is still stuck in 2004. There are some PC games still trying to push the hardware edge, but it’s not the driving force it used to be.

Still, practical processing power has been improving substantially. A lot of the expectations set for raw clock rate numbers were skewed by the quirky and crappy architecture of the Pentium 4. If you graphed out the actual number of calculations per second that the average consumer CPU put out over the years, I doubt you’d see a significant slowdown in growth rate.

A comparison chart is here.

I don’t think so. If they can make a faster chip, or a program which does something new or different, they will create the demand. Microsoft is continually bringing out things which nobody really asked for and which many do not want, but they keep creating new products and the demand for them. The new tabletop computer and other future OSes will need way more processing power than we have today. We may think we do not “need” all those new features, but Microsoft is telling us we will “need” them, and most people in 15 years will definitely say they need them.