Worse, finding a way to make one CPU (now called a “core”) perform meaningfully better, even with a massive increase in transistor count, is ALSO now a problem.
Clock speed improved toward 4 GHz, and the (average) instructions per cycle improved from 0.05 to 0.1 to 0.5 to 1 to 1.5… But now both are blocked…
E.g., the next megabyte of cache doesn’t add much to performance.
Thus we move to more cores… and, of course, to making more use of the hardware, e.g. by doing graphics co-processor work too, eliminating main-memory-to-co-processor transfers, and even getting graphics work done ahead of time, as in speculative execution of the code… (e.g. speculatively calculating both the constant-velocity case AND the predicted change, if that occurs, so both are ready).
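Here’s a toy version of that “compute both outcomes on spec” idea, with made-up names and numbers, just to show the shape of it:

```python
# A toy sketch of speculative evaluation: work out the next position for
# both the no-change case and the predicted change, then keep whichever
# one the real input confirms. All names and values are illustrative.
def predict_positions(pos: float, vel: float, predicted_vel: float, dt: float):
    unchanged = pos + vel * dt           # assume velocity stays constant
    changed = pos + predicted_vel * dt   # assume the predicted change happens
    return unchanged, changed

unchanged, changed = predict_positions(pos=10.0, vel=2.0, predicted_vel=-1.0, dt=0.016)

# Later, once the real event is known, just pick the precomputed answer:
event_happened = True  # stand-in for the actual input
next_pos = changed if event_happened else unchanged
print(next_pos)
```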
People keep parroting this tripe, but it’s counter to reality. Ever since Vista, newer versions of Windows have run better than their predecessors, and Windows has been “slimmed down” to the point that it runs on phones.
I know what you mean and agree with the spirit of your comment, but trust me: the demand for faster hardware is far from over. We software types keep adding layers of gunk, and the damn users keep asking for nifty new features.
Just try starting work on an open-source project, even a seemingly simple one, and count the number of huge open-source libraries that get pulled in before you can get started. We even have systems to help pull all that gunk together for us, making it easier for us to pile crud on top of crud! (OK, a lot of it is really good crud, and I’d hate to have to reinvent all those wheels!)
The future is devices that anticipate what you want and/or learn from what you do, and it’ll take many more layers of gunk and serious CPU horsepower to do it even passably well. (And it’ll make our devices even more infuriating, since they’ll often guess wrong, and be less predictable. Yay! But once we get used to them, we’ll groan if we have to do without them.)
There is no end to the need for processing horsepower. You can call that Learjeff’s Law. It’s closely related to Micawber’s corollary, which is that no amount of money is enough.
I agree, especially if we ignore that bit of tripe called Metro.
I’ve had that same impression several times in the last 35 years or so. As it turns out, I was wrong each time. I was using the same faulty logic that climate science deniers and bad spouses use: “But what have you done for me LATELY?”
I wouldn’t be surprised to find that gaps between significant improvements get longer as time goes on, since it will take even bigger revolutions in technology to maintain the trend.
I also remember papers on limits to Moore’s law based on rather pure physics, back in '78 or '80, that predicted it couldn’t possibly last more than 20 years. Those papers now remind me of ones discussed by Arthur C Clarke from highly regarded pre-War scientists who “proved” that orbital rocketry was impossible, based on specific impulse, thermodynamics, and payload ratios.
Seriously. My background is in scientific programming, so I’m used to writing stuff “close to the metal” as it were. I’m always amazed when I do more traditional programming how everything rests on abstraction layer built upon compatibility layer upon abstraction layer. Generally it seems to work well enough, but it does kinda make me feel like I’m building on top of a teetering tower of chairs, each balanced precariously on top of the one below it.
But I still think what I wrote five years ago in the post you cited has held up (and it sounds like you agree). While the need for better hardware hasn’t stopped, it’s slowed down enough, at least for desktop computers, that most people need to upgrade far less frequently than in the past to get good performance on new software.
Indeed, if my memory is correct, I think I’m writing this on the same computer I wrote that post from 2011 (plus or minus a new video-card maybe).
Massive single core performance improvements would require not just clock speed improvements, but architectural improvements. But in most cases you get more by getting the performance improvement through multiple cores rather than one increasingly complex and hard to design and debug core. Plus you can pump out the next generation much faster if you reuse the core from the last generation with minor tweaks.
I realize this is 4 years old, but we still don’t have 1000-core processors, not counting GPUs. I know someone who worked at a startup doing a 100-core processor - it died.
Unless you have a very specific application, you are going to get very minimal speedup with this many cores - most will be idle for want of useful work.
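Here’s a rough Amdahl’s-law sketch of why that happens. The 5% serial fraction is just a number I picked for illustration, not a measurement of any real workload:

```python
# Amdahl's law: speedup is capped by the serial fraction of the work.
# The 5% serial fraction is an illustrative assumption.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup for a workload with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (2, 4, 16, 100, 1000):
    print(f"{cores:5d} cores -> {amdahl_speedup(0.05, cores):5.1f}x speedup")

# Roughly: 1000 cores give only ~20x, because the serial 5% dominates;
# the rest of the machine sits idle for want of useful work.
```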
One big reason for the slowdown not mentioned here is money. New fabs are very expensive, and we haven’t paid off the old ones yet. Since there aren’t a lot of places doing 20 nm now, they can afford to hold off and amortize their big investment over more years.
I’ll throw in another thing to consider. With the explosion of mobile devices, raw power isn’t the main concern; it’s “CPU power / energy use”. E.g. a CPU with half the performance but a quarter of the energy use is better than a full-power CPU. The massive server farms that Google and other large internet sites run have similar requirements.
My guess is that if you measure, say, FLOPS/watt, we’ve easily stayed on course with Moore.
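A quick back-of-the-envelope version of that trade-off (the performance and wattage figures are made up, purely to show the arithmetic):

```python
# Rough perf-per-watt comparison; numbers are illustrative, not real chips.
full_perf, full_watts = 100.0, 40.0   # hypothetical full-power CPU
half_perf, half_watts = 50.0, 10.0    # half the performance, 1/4 the energy

print(full_perf / full_watts)   # 2.5 units of work per watt
print(half_perf / half_watts)   # 5.0 units of work per watt -> 2x better
```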
We should remember that Moore’s Law isn’t a law, it’s an observation. It has almost had the force of a law because people designed their roadmaps assuming it was true and that their competitors would follow it, so it was a self-fulfilling prophecy in a sense. But all it takes for it to slow down is for people in industry meetings to talk about how doing the next node in 18 months doesn’t make any sense. Which I’ve heard.
In fact the International Technology Roadmap for Semiconductors (ITRS) is where we try to figure out what is coming.
The days of drooling over a new, faster chip are nearly over. For domestic applications we are not seeing the demand for higher chip speeds; stuff is moving to the cloud, mobiles, etc.
Gamers drool over the next graphics card or the roll-out of fibre to the house, not the next CPU chip.
Come with me to the corporate world and even we don’t look at individual chip power as much as we did; virtualisation is taking care of business.
My experience is that over time the updates to the installed software (Windows, Office, etc) demand more of the machine. The software gets bigger while its container stays the same, until I note significant slowdowns in the performance. I don’t mean that the new version of the OS runs more slowly on my machine, but rather the update to the existing OS runs more slowly on my machine.
It’s been my experience since at least 2004; I believe I am on my third computer since that time.
As an old embedded systems software engineer, I sneer at your concept of “close to the metal”, having written routines to multiply, divide, fixed-point trig, and handle interrupts, in assembly code.
But as someone currently writing Java code, I totally agree!
It’s slowed down a bit in the laptop market, while keeping up in the mobile device market. Just a marketing shift of focus. Furthermore, there has been a dip in core speed advance while shifting from 32-bit to 64-bit processors. That’s a bigger change than most people realize (both in terms of achieved speed and transistor requirements). I suspect we’re still following Moore’s Law regarding transistor count. The fact that it isn’t translating into clock rates should be no surprise.
The computers I’ve been given to use at work get replaced every 2-3 years, and I always notice the improvement, though I admit it’s been a bit less noticeable the last couple of times. Meanwhile, we generally keep our home computer for 6 to 8 years, and I use a number of tricks to keep it from drowning in the morass of bloatware (e.g., ADD MEMORY, reinstall the OS). I keep cars until they fall apart too, and they’ve been lasting longer lately!
That’s true today but it will eventually be false, thanks to the desire for more intelligence. I suspect the highly scaled cores will be simpler, though. I recently worked on highly scaled cores, for processing packets in high speed internet aggregation routers for a major vendor. Those cores were definitely simpler than x86 cores, and were optimized for a different work model than a typical personal computer. That may be the case for super smart devices of the future too. Meanwhile, we have continued to improve GPUs at a rapid pace, though perhaps only gamers notice it.
Excellent point!
Twice as fast but there are twice as many threads.
Virtualization will increase the demand for speed (and as alluded to above, operations per joule).
Slightly generalized, this has been my experience since about 1978. It’s what programmers do: we keep adding cruft until the damn things seem too slow. Then we look for opportunities to optimize and fix the worst of it. Then new hardware comes along and bails us out, and we write whole new layers of fancy cruft, and the cycle continues.
Hah - a high level language! Real men write in microcode. I’m an old microprogrammer, which is a pretty dead field these days, but I saw a paper from Intel yesterday which makes me think they are still microcoding x86 chips.
We made multiple-core processors for this kind of application. The problem is that single-thread performance sucks. I was at a conference yesterday and was at a table with someone from nVidia, and we talked about the application of GPUs for parallelizing the stuff we do. They are SIMD machines, of course. A lot of the issues sound exactly the same as for the Illiac IV. (Getting data routed, etc.) However, the processors are getting more general purpose, and so more useful.
Some very specific applications can see super-linear speed up, but they are rare.
An example is a search problem. When you have a choice about which branch of the search tree to take, there are several heuristics to guide you. If you assign processors to each, you can stop when the first one finds the answer.
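Here’s a minimal sketch of that “race the heuristics” idea. The two “heuristics” are stand-ins that just scan a made-up search space in different orders; the point is only the stop-at-first-answer structure:

```python
# Race several search strategies in parallel and keep whichever finishes
# first. The heuristics here are stand-ins (different scan orders over a
# toy search space), not real game-tree heuristics.
import concurrent.futures as cf

TARGET = 977_351  # the "solution" we are searching for (made up)

def search(order: str) -> str:
    space = range(1_000_000)
    candidates = reversed(space) if order == "high-first" else space
    for candidate in candidates:
        if candidate == TARGET:
            return f"{order} found {candidate}"
    return f"{order} found nothing"

if __name__ == "__main__":
    with cf.ProcessPoolExecutor() as pool:
        futures = [pool.submit(search, h) for h in ("low-first", "high-first")]
        # Take the first heuristic that returns an answer; ignore the rest.
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        print(next(iter(done)).result())
```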
On the other hand there is an application where the workload is easy to assign to different machines, but all of them have to get done. The different tasks can be very fast or very slow. We found that the slow ones dominate the run time, even with dynamic scheduling. (You can’t easily estimate run times.) So you saturate relatively quickly.
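A toy simulation of that straggler effect, with a made-up heavy-tailed distribution of task times and simple first-free-worker dynamic scheduling; on most runs the single longest task, not the average, sets the finish time:

```python
# Dynamic scheduling hands each task to whichever worker frees up first,
# but the makespan is still bounded below by the longest single task.
# The task-time distribution here is invented (heavy-tailed) for illustration.
import heapq
import random

random.seed(1)
tasks = [random.paretovariate(1.2) for _ in range(200)]  # a few very slow ones
workers = 16

# Greedy dynamic scheduling: a min-heap of worker finish times.
finish_times = [0.0] * workers
heapq.heapify(finish_times)
for t in tasks:
    earliest = heapq.heappop(finish_times)
    heapq.heappush(finish_times, earliest + t)

print(f"ideal (perfect split): {sum(tasks) / workers:8.2f}")
print(f"actual makespan:       {max(finish_times):8.2f}")
print(f"longest single task:   {max(tasks):8.2f}")
```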
I remember one process control device coder showing me a program he was debugging that used a linear approximation for a quadratic equation. 2^2=2 for sufficiently small values of 2.
There are a number of issues with the future of Moore’s law. Device density and technology is one; the science is another.
As we reached the limits of photolithography, for example, other technologies have allowed chip layouts with much smaller elements.
The speed of light (and hence, of electricity) is an issue - as clock cycles get shorter, a pulse from logic gates cannot travel far enough within a cycle to reach the target gates/transistors from the source; to some extent, computerized design can help minimize the path a signal needs to take. But generally, this is what is stalling microprocessor speeds, just as it stalled minicomputers like VAXes two decades ago, and mainframes three decades ago.
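A back-of-the-envelope calculation of that limit, assuming signals propagate at roughly half the speed of light (a rough rule of thumb for real interconnect, not a measured figure):

```python
# How far a signal can travel in one clock cycle, assuming propagation at
# roughly half the speed of light (a rough rule of thumb, not a measurement).
C = 299_792_458.0          # speed of light, m/s
PROPAGATION_FACTOR = 0.5   # assumed fraction of c for real interconnect

for ghz in (1, 3, 5):
    cycle_s = 1.0 / (ghz * 1e9)
    reach_cm = C * PROPAGATION_FACTOR * cycle_s * 100
    print(f"{ghz} GHz: one cycle = {cycle_s * 1e12:.0f} ps, "
          f"signal reach ~ {reach_cm:.1f} cm")

# At a few GHz a signal only covers a few centimetres per cycle, which is
# on the order of chip and package dimensions.
```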
The solution for microprocessors was the same as for mini- and mainframe computers - more cores. Open the Windows Task Manager (or the Linux equivalent) and see how many concurrent processes are running simultaneously. Many of these do not need to reference other tasks very often, so parallel processing improves speed… but only to a limit. The days of the Apple ][, when the processor regularly dropped whatever else it was doing to guide the floppy disk, are long gone.
There may be a future in which massively parallel programming solves some of the problems of processor limitations, but that requires a much more intensive design technology. Such tasks as analyzing options for chess moves, or analyzing language patterns for speech recognition, can be done by processing a huge number of choices in parallel. In these sorts of applications, a high processor count is advantageous.
Again, the speed of the threads can be improved by pipelining and prefetching, where the processor attempts to predict and pre-fetch memory contents while other processing is happening, to ensure the processor is not waiting as often on memory.
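Here’s a rough way to see the prefetcher (and cache) at work from user code: summing the same array through a sequential index versus a randomly permuted one. The exact gap varies a lot by machine, but the sequential pass is typically much faster:

```python
# Summing an array sequentially vs. in a random order. The sequential pass
# lets the hardware prefetcher stream memory ahead of the CPU; the random
# pass defeats it (and the caches). Exact timings vary by machine.
import time
import numpy as np

a = np.arange(20_000_000, dtype=np.int64)
seq_idx = np.arange(a.size)
rnd_idx = np.random.permutation(a.size)

def timed_sum(idx):
    start = time.perf_counter()
    total = a[idx].sum()
    return total, time.perf_counter() - start

_, t_seq = timed_sum(seq_idx)
_, t_rnd = timed_sum(rnd_idx)
print(f"sequential: {t_seq:.3f}s   random: {t_rnd:.3f}s")
```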
All these tricks have kept processors seemingly improving even as basic CPU speed hits a wall. In situations as varied as financial markets and technology, a logistic S-curve is often mistaken for an asymptotically climbing curve by those on the steep ascending part of the curve. Moore’s Law is a huge number of overlapping S-curves, each one a different trick. The question is, can we keep pulling these rabbits out of the hat?