Are microprocessors approaching a speed limit?

If I understand correctly, increasing the frequency means making the transistors flip on & off faster but at some point, they won’t have enough energy to flip fast enough and you’ll start getting errors, right?

At that point, you need to provide them with more power by increasing the voltage, right?

Ceteris paribus, what’s the relationship between frequency, voltage and heat? Does the required voltage increase at the square of the frequency? Does the heat increase linearly or at the square of the voltage?

Can you give me examples and tell me what those examples have in common that makes them need good single thread performance?

Can you give me examples and tell me what those examples have in common that makes their performance not level off as more processors are added?

Real brogrammers code in 0s and 1s.

What applications are you writing that are so CPU intensive? I’m asking because it seems like more and more these days, there are fewer truly CPU intensive applications left.

I second that you should try to get some degree of multi-threading into your program; there aren't many problems that can't use it at all.

But if you want the absolute fastest single core performance, get an Intel Core i7-4790K and spend some money on serious cooling. It's been overclocked to 5.5 GHz with liquid cooling and 6.4 GHz with liquid nitrogen cooling.

I do AI work in sequential decision making. Some of our best methods do a large number of guided random searches per decision to make a choice. But note "guided random": the "guide" is that the second random search relies on information gathered from the first, the third uses the results of the first two to inform its search, and so on. In practice, you get no benefit, and often outright degradation, by relaxing the constraints so that each search only takes "finished" previous computations into account, even if you increase the absolute number of computations done. You may get some benefit by running 8 separate searches in parallel and allowing them to vote on the answer (called "bagging"), but not enough to be worth a damn. Maybe it would be if you had 3000 cores.

The actual math involved isn’t actually complex enough to benefit from multiple cores either.
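
For concreteness, here is a toy sketch of the contrast (a made-up bandit-style problem, nothing like our real system): a guided search where each new random sample is steered by the statistics gathered so far, versus "bagging", where several searches split the same sample budget and vote on the answer.

```python
import random
import statistics

def noisy_value(action, rng):
    """Stand-in evaluation: the true value is the action index, plus noise."""
    return action + rng.gauss(0, 2.0)

def guided_search(actions, n_samples, rng):
    """Each new sample is biased toward whichever action currently looks best."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for a in actions:                        # one initial sample per action
        totals[a] += noisy_value(a, rng)
        counts[a] += 1
    for _ in range(max(0, n_samples - len(actions))):
        best = max(actions, key=lambda a: totals[a] / counts[a])
        a = best if rng.random() < 0.7 else rng.choice(actions)
        totals[a] += noisy_value(a, rng)
        counts[a] += 1
    return max(actions, key=lambda a: totals[a] / counts[a])

def bagged_search(actions, n_samples, n_workers, rng):
    """n_workers searches share the budget, then vote on the answer.
    (Run sequentially here just to illustrate the voting idea.)"""
    votes = [guided_search(actions, n_samples // n_workers, rng)
             for _ in range(n_workers)]
    return statistics.mode(votes)

if __name__ == "__main__":
    rng = random.Random(0)
    actions = list(range(10))                # action 9 is truly best
    print("one guided search:", guided_search(actions, 400, rng))
    print("8 bagged searches:", bagged_search(actions, 400, 8, rng))
```

Each vote in the bagged version is based on far fewer samples, which is the intuition for why it tends to be weaker than spending the whole budget on one guided search.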

Almost all serious mathematical calculation is essentially nonparallelizable. That’s hardly surprising, of course, since all the algorithms were invented by conscious reasoning – which is a strictly serial process. (There are of course many things done on the consumer end that are trivially parallelizable because the components have no serious interaction, such as a Web browser updating its page contents at the same time as a movie app displays the next frame.)

There may well be highly parallel algorithms for doing nontrivial mathematical calculations – e.g. solving differential or integral equations, doing simulations with long-range interactions, et cetera – and indeed one might suppose they must exist, in principle, since the human brain appears to do things for which we lack parallel algorithms very fast with a ridiculously slow “clock” speed (measured in kHz at best). It must use some kind of parallel computation. Unfortunately, despite an enormous amount of effort over the past 50 years (parallel computing having been “the future” for about that long) there have been few genuine advances. It’s a very, very, very hard problem.

The one exception I might argue is in the field of AI, where we have an actual working model to study (inside our heads). Here, there have been some nice advances in methods of inherently parallel computation that apply to things other than trivially parallelizable tasks. This is also true for subfields of AI, like machine vision or natural speech processing. But for something like solving Newton's equations or the Schroedinger equation for a few thousand interacting degrees of freedom – we have no genuine parallel algorithms. The world is still waiting for the mind that can conceive them.

It is not so much the transistors flipping but that the input signal doesn't have enough time to get to the opposite value. Those square waves you see in timing diagrams don't exist in real life. If you strobe the output of a gate when the input signal is at a value between 0 and 1, you'll get an output value between 0 and 1 also, and what the next stage sees is very implementation dependent.
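
To put a crude number on that, here's a first-order toy model (my own illustration, with made-up constants): treat the driven node as an RC circuit charging toward Vdd. If the clock period is too short relative to the time constant, the node never reaches a clean "1" before it gets sampled.

```python
import math

VDD = 1.0          # supply voltage in volts (assumed)
RC  = 50e-12       # assumed effective time constant of the driven node: 50 ps

def settled_fraction(freq_hz):
    """Fraction of the full voltage swing reached within half a clock period,
    for a node charging as V(t) = VDD * (1 - exp(-t / RC))."""
    half_period = 0.5 / freq_hz
    return 1.0 - math.exp(-half_period / RC)

for ghz in (1, 2, 4, 8, 16):
    frac = settled_fraction(ghz * 1e9)
    print(f"{ghz:2d} GHz: node reaches {frac * VDD:.3f} V of {VDD} V")
```

With these made-up numbers the node settles essentially completely at 1 GHz but only gets partway up the swing at 8 or 16 GHz, which is the "doesn't have enough time" problem in miniature.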

I got the impression that the person I mentioned was doing heavy arithmetic calculations where each step was dependent on the one before. While logic and circuit simulation is theoretically parallelizable by splitting the circuit across processors, in practice the communication overhead makes it impractical. We do simulations in parallel by distributing input sets over machines, which is gross parallelization. We've got thousands of processors to run our simulations.

I’ve on heard about it but here is a reference. If you are interest search for super-linear speedup.

Nah. Only if you have to enter the boot instructions for a PDP-11 from the control panel switches, which I did many times when I TAed the PDP-11 assembly language class.

To a good approximation, light travels at a rate of one foot per nanosecond.
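
Which puts a hard ceiling on how far a signal can possibly get per clock cycle (back-of-the-envelope arithmetic of my own, using the vacuum speed of light; real signals in wires are slower):

```python
C_M_PER_S = 299_792_458        # speed of light in a vacuum, m/s

for ghz in (1, 3, 5, 10):
    period_s = 1.0 / (ghz * 1e9)
    cm = C_M_PER_S * period_s * 100
    print(f"{ghz:2d} GHz: at most {cm:.1f} cm of travel per clock cycle")
```

At 5 GHz that's about 6 cm per cycle in the best case, which is one reason getting a clock (and data) across a large chip or board on time gets harder as frequencies climb.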

Well, some things can be multi-threaded, but that just isn't going to happen anytime soon.

For my work we use a program that converts large files from a proprietary data format to a comma-separated value (CSV) format. The proprietary program that does this conversion is single-threaded. As the years have gone by, the data files that need to be converted keep getting larger and larger (for other, unrelated reasons), but single-threaded performance in my last three machines has not increased much. The result is that it takes longer and longer to convert the data. At this point it takes like half an hour for each one.

Could it be made multi-threaded? I am sure it could be. All I need is to give the people who wrote the program like $200,000, and I am sure they will be happy to re-write the thing. The problem is that we don't have that kind of money lying around.

Is my example an industry-shaking problem? No. There are probably only a few hundred people in the world who even use it, but it sure sucks for me!

It would be awesome if chips did keep getting faster at serial tasks (like in the good old days!), but what can I do? :frowning:

Is the conversion dependent on what was in the file before? If not, you can split the source file into pieces, run each on a different thread or core, and combine at the end. Voilà, cheap parallelism.
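
Something like this, as a minimal sketch (the file format, the "|" delimiter and convert_line() are hypothetical stand-ins, not the actual proprietary converter being discussed):

```python
import csv
from multiprocessing import Pool

def convert_line(raw_line):
    """Stand-in for converting one record of the proprietary format to CSV fields."""
    return raw_line.strip().split("|")       # pretend '|' is the input delimiter

def convert_file(in_path, out_path, workers=4):
    with open(in_path) as f:
        lines = f.readlines()                # fine for a sketch; stream for huge files
    with Pool(workers) as pool:              # convert chunks of lines on separate cores
        rows = pool.map(convert_line, lines, chunksize=10_000)
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    convert_file("input.dat", "output.csv")
```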

I parse and load files into a database. I do it sequentially now since it runs under cron in the middle of the night, but I could trivially distribute each file conversion to our compute ranch. I generate web pages for each chunk of data - that I do distribute because it is simple. The code waits for them to all finish and then goes on to the next step.

Lots of parallelism can be done simply. I’ll do it for you for only $50K after I retire. :smiley:

Is there some physical law that limits the thermal efficiency of a transistor?

O.P. here. I originally started this thread because I saw an advertisement on TV with Jim Parsons. Intel was advertising their next generation of processors that use far less power. If those new processors came with a substantial increase in single-thread throughput, it might be time to get a new PC. They don’t. I won’t.

Thanks for that. Informative. I haven’t actually done anything with hardware since I made a digital clock for one lab project out of 74-series chips in the early 80’s. My textbook for architecture (for programmers) went through building a very simplistic 4-bit computer out of gates and flip-flops. The last time I looked seriously at assembler was the 286 days. Times have changed.

Yes, I think the last news I read was IBM saying "Wow! We managed to actually make something at 7nm", so it's not yet production.

But I think the majority of progress, like many other things in life, is a series of overlaid "S-curves" (sigmoid functions?). People on the rising part of the curve of any phenomenon project as if the growth would keep increasing exponentially or polynomially, but eventually limiting factors catch up and the progress in that regard flatlines… until a major development opens up progress in a different direction.

I think we are seeing that with microprocessors. As I said, speed seems to have hit a practical limit and we'll only see minor increases; density is reaching a limit for heat dissipation; fabrication limits have slowed how small the devices are getting.

Of course, if a completely different technology leaps into the breach, then all bets are off again. We see this with solid-state replacing spinning drives. I'm surprised that a full computer on a chip is not yet a thing (I mean the whole thing: SSD, RAM, processor and video) to speed up the interaction. This would certainly simplify the elementary computers, such as tablets and cellphones.

There are some tricks you can do to decrease power consumption. Lowering the voltage and going slower is one. Keeping it cold is another. We test at both 0 °C and 95 °C, and the power consumption is much higher at 95 °C. But a lot of it is shutting off parts of the processor that aren't being used at the moment through complex clock-gating schemes. Even high-power processors do that, but it is more important for mobile ones. Intel hasn't traditionally been good at this.
I've run panels on this subject but I'm not directly involved in it. Put our processor in your pocket and you'd best head straight for the Burn Unit.

They don’t teach this stuff except in logic design classes and mostly not even there. Most papers on this comes from industry.

That’s my understanding also.

We’re dealing with heat okay, though heat sinks these days are huge. Like I said, it is mostly economics. I’m going to be retired before I have to deal with the next process node so I don’t care. Running out of steam might be a good thing. If fabs no longer have to spend billions on the next node, and progress is made through architecture and packaging, chip prices might come down, both from less capital investment and from higher yields as the process matures.

Cost of a chip depends strongly on yield, and yield depends strongly on chip size. It might be cheaper to have a smaller processor and really small, cheap chip sets, especially if you can use 2.5D or 3D packaging to decrease the size of their footprint. So it depends.

Moving charge from a high voltage to a lower voltage releases energy. A 1 is typically a node at a high voltage. That node will have some capacitance associated with it, which determines how much charge is needed to get it to a high enough voltage to represent a 1. When the node switches to holding a 0, that charge moves to a lower-voltage area and energy is released as heat. By making the transistors and the metal wires that connect them smaller, you can reduce the capacitance of the nodes and thus reduce the energy used for each transition from 1 to 0.

It is a little more complicated than the above, as some of the energy is dissipated when charging from 0 to 1 and some when discharging from 1 to 0. But basically it is the amount of current used times the voltage that the chip runs at. Each new process node tends to reduce the feature size, reducing the current. The voltage needed to run the transistors hasn't changed much for a few years; most things are running at around 1 to 1.2 volts.
This is the first-order model for power used in digital circuits. There are other effects that come in if you take into account that transistors may not turn all the way off (leakage) and that they take some time to switch from on to off, allowing more current to flow than is needed to just charge and discharge the nodes.
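
In other words, the usual first-order rule of thumb is P ≈ alpha · C · V² · f (activity factor times switched capacitance times voltage squared times frequency). A quick sketch with made-up ballpark numbers, just to show how the terms scale against each other:

```python
ALPHA = 0.2        # activity factor: fraction of nodes switching per cycle (assumed)
C     = 1e-8       # total switched capacitance in farads (assumed, ~10 nF)

def dynamic_power(volts, freq_hz, alpha=ALPHA, cap=C):
    """First-order dynamic power: alpha * C * V^2 * f."""
    return alpha * cap * volts**2 * freq_hz

print(f"1.0 V @ 3 GHz: {dynamic_power(1.0, 3e9):.1f} W")
# Raising the clock usually also requires raising the voltage, so power grows
# faster than linearly with clock speed:
print(f"1.2 V @ 4 GHz: {dynamic_power(1.2, 4e9):.1f} W")
print(f"0.9 V @ 2 GHz: {dynamic_power(0.9, 2e9):.1f} W")
```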

Well, only if you're doing a single calculation. But I'm not aware of any cases (except maybe calculating extra-large prime numbers) where you would run through a single math formula only once and have it take any measurable amount of time on a human scale. Generally, you're performing the one formula over large quantities of input data.

So you split up the input data into chunks and calculate those in parallel on different threads.
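
A generic sketch of that pattern (the formula and data here are just placeholders): split the input into chunks and map the same function over them with a process pool.

```python
import math
from concurrent.futures import ProcessPoolExecutor

def apply_formula(chunk):
    """The 'one formula' applied to every item in a chunk (placeholder math)."""
    return [math.sqrt(x) * math.log1p(x) for x in chunk]

def parallel_map(data, n_chunks=8):
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    results = []
    with ProcessPoolExecutor() as pool:      # each chunk runs on its own process
        for part in pool.map(apply_formula, chunks):
            results.extend(part)             # results come back in input order
    return results

if __name__ == "__main__":
    out = parallel_map(list(range(1, 1_000_001)))
    print(len(out), out[:3])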

That's exactly the sort of task that should be very, very easy to multi-thread. Try to contact other customers of the program and, as a group, ask the vendor to make it multi-threaded. Otherwise, proprietary file formats are decoded by third parties all the time; they usually aren't encrypted and are often close to standard formats that are very easy to work out with some educated guesses.

Again, get a pool of users together and offer a price for getting it done on freelancer.com. There are plenty of decent coders on there who could do a task like this.

A lot of the actual work of a program's execution is sequential.
Multi-threading can speed up the preprocessing required to get the pertinent data into the registers for the sequential operation. Multiple cores can divide up the housekeeping and routing of data. A result from one core can be passed directly to another core, instead of going out to memory and then back. Consider that a game program will have intensive math for vectors and such, but the graphics processor may need more than just the raw result. So that result can be passed to another core (register) for further processing, while the original core gets and processes more vector data. This would take advantage of pipelines: one core getting from a pipe, the other sending out a pipe. A crude explanation. The human brain is very multicore, with different portions processing different input. Modern computers also have many support processors that take load off the main core or cores, doing a huge amount of pre- and post-processing that used to be done by the CPU.
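
A very rough sketch of that hand-off idea, using queues between two worker threads as the "pipes" (the stages and data are invented for illustration, not any real game engine):

```python
import threading
import queue

raw_q, vec_q = queue.Queue(), queue.Queue()

def vector_stage():
    """First 'core': do the vector math, pass each result straight down the pipe."""
    while (item := raw_q.get()) is not None:
        vec_q.put((item[0] * 2, item[1] * 2))   # stand-in vector computation
    vec_q.put(None)                             # tell the next stage we're done

def postprocess_stage(results):
    """Second 'core': further processing on what the first stage produced."""
    while (v := vec_q.get()) is not None:
        results.append(v[0] + v[1])

if __name__ == "__main__":
    results = []
    t1 = threading.Thread(target=vector_stage)
    t2 = threading.Thread(target=postprocess_stage, args=(results,))
    t1.start(); t2.start()
    for point in [(1, 2), (3, 4), (5, 6)]:
        raw_q.put(point)
    raw_q.put(None)                             # end of input
    t1.join(); t2.join()
    print(results)
```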

To the original question.
Speed increases are very slow at the most basic level. Shortening paths, inventive data handling, more cores, etc., are the most apparent sources of speed increases. But the basic speed of transistors and memory cells can stall for various periods of time. Making a large increase in speed, with all the outer things remaining the same, requires revolutionary changes in the speed of transistor switching and other such base-level devices.

I’d clarify that as “performance is getting better, even though clock speeds are not [greatly] increasing”.

The Pentium 4 was Intel’s last hurrah in the Megahertz Wars. They claimed speeds would reach 10GHz at some point, but the fastest part was 3.8GHz, IIRC.

After that, they started working on efficiency (getting more done per clock cycle). Extra cores is definitely one way of doing that.

AFAIK, the Xeon X5698 is still the MHz King of Intel x86 CPUs (factory specified base speed, no turbo boost / overclocking / liquid nitrogen / etc.). Out-of-the-box it ran 2 cores at 4.40GHz (it supposedly has the full 4 cores in there, but 2 turned off - the extra cache on those cores is usable, however). It was also dual-socket compatible. The stated market for this part was high-frequency trading. I believe there were only 2 qualified systems for it - the Dell R710 and one box from HP. At least on the R710, having an X5698 meant that the 6 chassis fans were running full-speed (32000 RPM+) all the time.

The i7-4790K mentioned earlier in this thread has a base frequency of 4.00GHz (turbo boost to 4.40) and is a one-socket part. Of course, you do get 4 cores in the package.

The other reason that processors aren't advancing past the 4-5 GHz level is that there is very little demand for it. Even a 2 GHz Intel i3 is fast enough for most consumers and office workers.

Increasing core count is a cheaper, easier thing to achieve in terms of return for money spent on semiconductor R&D than further increasing clock speed. Most tasks that most people want to do, e.g. playing and encoding music and video, web browsing and gaming, can benefit from multiple cores. Most high-end business applications, e.g. file and database servers, are also multi-threaded.

So the pool of people that really need higher clock speeds is just too small to pay for the research involved.