How do the advances in CPUs and GPUs compare?

In another thread, there was a general trend of people upgrading their GPUs but not their CPUs.

So, how have the advancements in GPUs and CPUs differed from each other? How about their memory?

I’ve heard that CPU performance improves by about 10% per generation (tick or tock). Is that about right? How about GPUs?
If advancements have come faster for GPUs, how come?
How are CPUs and GPUs likely to keep evolving?

Graphics processing is inherently parallelizable, as it relies on performing the same, usually relatively simple, operations on large numbers of pixels. Parallelizing general programming tasks is much harder. It is therefore easier to improve the performance of a GPU just by adding more cores.
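The “parallelizing general tasks is harder” point is usually quantified with Amdahl’s law: if a fraction p of the work can run in parallel on n cores, the best-case speedup is 1 / ((1 − p) + p / n). A minimal Python sketch; the two parallel fractions are illustrative assumptions, not measurements of any real workload:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction  # the part that never gets faster
    return 1.0 / (serial + parallel_fraction / cores)


# Illustrative workloads: half-serial general code vs. almost fully parallel per-pixel shading
for label, p in [("general-purpose code", 0.50), ("per-pixel shading", 0.999)]:
    for cores in (4, 64, 2048):
        print(f"{label:22s} {cores:5d} cores -> {amdahl_speedup(p, cores):7.1f}x")
```

With p = 0.5, even 2048 cores stay just under 2x; with p = 0.999 the same core count gives roughly 670x, which is why piling on cores pays off for graphics but not for most general code.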

However, that was a one-off jump, from a CPU (whether one core or four) to the GPU’s massively parallel set of processors. The basic issue for the GPU now is the same as for the CPU: increasing transistor counts while using the same or less volume, power, and fab cost. Back to Moore’s law there.

The massively parallel processor set has intimately shared registers, so data can flow from one core to the next in a few cycles rather than, say, as a network packet, a file on disk, or an IPC buffer.

I don’t know if you can really compare CPUs and GPUs in terms of advancement.

First, desktop CPUs have been around a LOT longer than desktop GPUs.

Second, historically, desktop CPUs have generally taken advantage of other higher-end processor advancements, instead of pushing that envelope themselves.

Third, a lot of the ground broken for CPUs also applies to GPUs: the fab processes used to make modern GPUs didn’t spring out of Zeus’ forehead; they were engineered for CPUs.

So if it looks like GPUs are advancing faster than CPUs, that’s probably misleading, in that Intel desktop CPUs have been evolving since around 1979, while 3D desktop GPUs, for the most part, date back to the late 1990s, with things like the 3dfx Voodoo series. (I know there were 2D acceleration functions prior to that, but they weren’t really GPUs.)

So the CPUs had a nearly 20-year head start on the GPUs.

All that said, GPUs and CPUs do inherently different jobs. CPUs are generalists: they can do anything, but not necessarily as fast, as they’re not optimized for any single job. GPUs are specialists: they run a narrower set of tasks massively in parallel and are designed specifically to perform those tasks as fast as possible.

All else being equal, it’s likely that your system is more GPU-bound than CPU-bound these days, so it pays to upgrade your GPU instead of your CPU. For example, I have a PC with an AMD FX-6300, and I can’t really get it to top out much above 30% utilization. That implies my GPU is probably the bottleneck: it can take information from the rest of the PC and process it only so fast, and if CPU utilization is that low, the CPU is probably waiting on the GPU.

So it would probably benefit me to get a newer GPU and get the whole system moving faster.
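The reasoning above can be sketched with a simplified two-stage model: the CPU prepares frame N+1 while the GPU renders frame N, so the slower stage sets the frame rate and the faster one spends the rest of each frame waiting. The frame times below are hypothetical, picked so the CPU is about 30% busy. One assumption worth flagging: an overall utilization figure averages across cores, so on a six-core chip a single fully loaded thread shows up as only about 17%, and a low total alone does not prove the GPU is the limit.

```python
from dataclasses import dataclass


@dataclass
class FrameTimes:
    cpu_ms: float  # time the CPU needs to prepare one frame (game logic, draw calls)
    gpu_ms: float  # time the GPU needs to render one frame


def pipelined_frame_ms(t: FrameTimes) -> float:
    """With CPU and GPU work overlapped, the slower stage sets the frame time."""
    return max(t.cpu_ms, t.gpu_ms)


def cpu_busy_fraction(t: FrameTimes) -> float:
    """Share of each frame the CPU spends working rather than waiting.
    Assumes CPU work is spread evenly; real utilization averages all cores."""
    return t.cpu_ms / pipelined_frame_ms(t)


def bottleneck(t: FrameTimes) -> str:
    return "GPU" if t.gpu_ms > t.cpu_ms else "CPU"


# Hypothetical numbers: same CPU, then a GPU that renders twice as fast
old = FrameTimes(cpu_ms=10.0, gpu_ms=33.3)
new_gpu = FrameTimes(cpu_ms=10.0, gpu_ms=16.7)

for name, t in [("old GPU", old), ("new GPU", new_gpu)]:
    fps = 1000.0 / pipelined_frame_ms(t)
    print(f"{name}: {bottleneck(t)}-bound, CPU busy {cpu_busy_fraction(t):.0%}, {fps:.0f} fps")
```

In this model, halving the GPU time roughly doubles the frame rate and the CPU’s busy share, until the CPU becomes the slower stage.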

First of all, some form of graphics chip or circuit has been around as long as we’ve had CPUs; read HERE and HERE.

Basically every PC, laptop, console, or mobile device (Windows, Mac, Android, Linux, Unix, etc.) that connects to a monitor or display has a graphics processor, chip, or circuit in some form; otherwise it couldn’t display any image on the monitor.

Graphics chips can be either dedicated or integrated; in both cases their job is the same: displaying images on your display.
The main differences are performance and/or requirements.

On the other hand, the main job of a CPU is different from the work of a GPU, so their architecture and development differ.
Some CPUs even have a graphics chip integrated into them; this is usually called an integrated GPU (iGPU) these days.

A GPU is very bad at running an OS, Excel, or a web browser, but very good at calculating graphics.
GPUs need to render graphics.

A CPU is good at running an OS, Excel, or a web browser, and can calculate basic graphics.
CPUs need to run your PC, phone, car, TV, radio, etc.

Due to this, the development is different.

Here is a very simplified version.

CPU:
step 1: getting faster - hotter - using more power - adding new features
step 2: running cooler - using less power - not or only slightly faster - adding new features
step 3: go to step 1

GPU:
step 1: faster, new features, getting hotter/cooler, using more/less/same power
step 2: modify, enhance, tweak
step 3: go to step 1

What bottlenecks tend to get in the way of adding more cores?

What parts of graphics processing are less amenable to parallelization?

I’m not sure you’re correct there: when the PC first came out there was the Hercules graphics card. Even the Apple ][ had video cards available. Now, if you mean 3D acceleration in the consumer market, then yes, you’ve got to look to 3dfx in the mid 90s.

Cost, size, power usage, heat dissipation.

That’s only true for a few types of software that are heavily dependent on GPUs, and therefore benefit more from GPU upgrade than CPU. Mostly games. It’s not really an indication of which technology is advancing faster. There are other types of software that don’t benefit at all from upgrading the GPU, and you’re better off upgrading the disk drive, memory, CPU, network, etc.

There has been some interplay for a long time as well. The SGI RealityEngine design was partly based upon Intel’s i860 CPU. Whereas earlier and later SGI systems used more custom designs, the i860 had some interesting capabilities that lent themselves to graphics processing. So the geometry engine was a card covered in i860 chips (8 or 12) that formed part of the graphics pipe. Not quite consumer graphics, however, unless you had a spare quarter million.

Is memory bandwidth one of those? As in: it’s not worth adding more cores if they can’t be supplied with data from the VRAM.

How much of a difference is HBM likely to make and what kind of throughput could it scale to?

Memory bandwidth can be a real issue for multi-cored CPUs. My brother does reservoir modelling and one of his applications is single-threaded but is so memory-intensive that there’s no benefit to running more than one instance on the same PC, no matter the number of cores. Cache contention is also part of this.

Keeping cores fed is a huge issue and one of the key bottlenecks for adding additional cores. It’s why IBM and Intel keep adding so much cache to their multi-core CPUs; without it the cores would be underutilized.
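One common way to reason about keeping cores fed is the roofline model: attainable throughput is the lesser of peak compute and arithmetic intensity (FLOPs per byte moved from memory) times memory bandwidth. Adding cores raises the compute roof but leaves the bandwidth roof where it is. A minimal sketch; the per-core and bandwidth figures are hypothetical round numbers, not any specific chip:

```python
def attainable_gflops(cores: int, gflops_per_core: float,
                      bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory, whichever is lower."""
    compute_roof = cores * gflops_per_core        # grows with core count
    memory_roof = bandwidth_gb_s * flops_per_byte  # shared by all cores, does not grow
    return min(compute_roof, memory_roof)


BANDWIDTH_GB_S = 60.0   # hypothetical shared memory bandwidth
GFLOPS_PER_CORE = 30.0  # hypothetical per-core peak

# A streaming kernel at ~0.1 FLOP/byte (memory-bound) vs. a dense kernel at 10 FLOP/byte
for intensity in (0.1, 10.0):
    row = [attainable_gflops(c, GFLOPS_PER_CORE, BANDWIDTH_GB_S, intensity)
           for c in (1, 2, 4, 8, 16)]
    print(f"{intensity:5.1f} FLOP/byte:", row)
```

At 0.1 FLOP/byte every core count gives the same ~6 GFLOP/s, the memory-bound case in miniature; at 10 FLOP/byte throughput keeps scaling with cores until the memory roof is reached. Large caches help by raising the effective FLOPs per byte fetched from DRAM.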

Cache issues are one of the serious problems in parallel code, but they shouldn’t matter if they are separate instances. What you do need to do is use one of the processor affinity commands to tell the OS to keep the process on one particular core. Of course, you still need to avoid sharing level 3 caches, so you get to the point where you can only run one instance per socket.
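On Linux, affinity can be set from the shell or programmatically. A minimal Python sketch using the standard library’s `os.sched_getaffinity` and `os.sched_setaffinity` (Linux only). The helper name `pin_to_one_core` is hypothetical, and pinning to the lowest-numbered allowed CPU is an arbitrary choice for illustration:

```python
import os
from typing import Set


def pin_to_one_core(pid: int = 0) -> Set[int]:
    """Restrict a process (pid 0 = the calling process) to a single CPU.
    Picks the lowest-numbered CPU it is currently allowed on. Linux only."""
    allowed = os.sched_getaffinity(pid)
    target = min(allowed)
    os.sched_setaffinity(pid, {target})
    return os.sched_getaffinity(pid)  # read back to confirm the new mask


if __name__ == "__main__":
    print("now restricted to CPU(s):", pin_to_one_core())
```

The shell equivalent is along the lines of `taskset -c 0 ./reservoir_model`, where `reservoir_model` is a hypothetical program name. Choosing one core per socket, so that instances don’t share an L3 cache, still requires knowing the machine’s topology, e.g. from `lscpu`.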

This is a common problem with quite a few codes. Cache matters more than anything else, and you see systems where there is only one thread per socket on some very large machines. If you look at the relative areas on the chips devoted to CPU cores and cache, this isn’t all that unreasonable anyway.

However I struggle to see why a reservoir modelling code could not be parallelised without too much difficulty. I assume it is an old and probably not well supported code.

Memory bandwidth or memory capacity? Bandwidth is how much data the CPU can read from memory per second while capacity is the amount of memory available. I would be very surprised to learn of a single-threaded program that could max out the memory bandwidth of a modern system. The amount of memory bandwidth available on modern Intel systems is immense – from memory, Haswell has somewhere in the area of 100GB/s of bandwidth (with 4 memory channels populated). Unless his application is doing a huge amount of DMA, I have a hard time believing that a single core could even read from memory at that rate, let alone do something useful with the result.
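The arithmetic behind both halves of that argument can be sketched. Peak DRAM bandwidth is channels × bytes per transfer × transfer rate; assuming quad-channel DDR4-2133 with 64-bit channels (a Haswell-E/EP style setup), that works out to roughly 68 GB/s, in the same ballpark as the figure quoted from memory. How fast a single core can pull data is bounded by Little’s law: the cache-line misses it can keep in flight times the line size, divided by memory latency. The 10 outstanding misses and 80 ns latency below are assumed round numbers, not measurements:

```python
def peak_bandwidth_gb_s(channels: int, bus_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def single_core_bandwidth_gb_s(outstanding_misses: int, line_bytes: int,
                               latency_ns: float) -> float:
    """Little's law: sustained bandwidth = bytes in flight / latency."""
    return outstanding_misses * line_bytes / latency_ns  # bytes per ns == GB/s


# Assumed: quad-channel DDR4-2133, 64-bit channels
print(f"socket peak: {peak_bandwidth_gb_s(4, 64, 2133):.1f} GB/s")
# Assumed: ~10 outstanding 64-byte line misses per core, ~80 ns DRAM latency
print(f"one core:    {single_core_bandwidth_gb_s(10, 64, 80.0):.1f} GB/s")
```

Under those assumptions one core tops out around 8 GB/s, a small fraction of the socket’s theoretical peak, which supports the doubt that a single thread could saturate it.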

Any reason that rigidbody/softbody/fluid physics and global illumination used to be done on the CPU but now trend towards being done by the GPU? In other words, if it isn’t difficult to do physics through parallelized cores, how come it’s still not well-established at the consumer level?

A good analogy is the relationship between CPUs and math co-processors during the 1980s and early 1990s. The math co-processors of that era specialized in floating-point arithmetic but not graphics per se. I think some games were written to take advantage of them to some extent. (Wasn’t DOOM one of them?)

Beginning with Intel’s Pentium (the original one, the one with the infamous bug), floating-point co-processors were merged into the CPU itself for all releases. The 486 (just prior to the Pentium) included a built-in co-processor as an option (the DX model).

There are some non-graphics applications that can take advantage of a GPU, Folding@Home being a notable example. It is, however, difficult to write software to do this. It has to be written the right way so that it can submit streamlined operations to the GPU that take advantage of the GPU’s strengths. If you just have a big list of miscellaneous computations that you need computed, the CPU (being a generalist) can handle that most efficiently.
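To illustrate what “streamlined operations” means: GPU programming models such as CUDA have you write one small function (a kernel) that runs once per element index, each invocation doing the same work on different data. A pure-Python sketch of that shape; `launch` and `saxpy_kernel` are hypothetical names, and here the invocations run one after another, whereas a GPU would run them in parallel:

```python
from typing import Callable, List


def launch(kernel: Callable[..., None], n: int, *args) -> None:
    """Stand-in for a GPU kernel launch: run `kernel` once per index 0..n-1.
    Each call must be independent of the others, which is what lets real
    hardware run them all at once; here they simply run in a loop."""
    for i in range(n):
        kernel(i, *args)


def saxpy_kernel(i: int, a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Same arithmetic for every i, no branching, no dependence on other elements."""
    out[i] = a * x[i] + y[i]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

A big list of miscellaneous computations doesn’t fit this mould, because there is no single kernel that every element shares.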

Right. Most of the old graphics cards define basic interfaces: available resolutions, color palettes, enough memory to store the graphics data, a sufficient clock speed to generate an acceptable framerate, and not a lot else. The modern GPU includes circuitry to perform complex 3D operations (e.g. linear algebra and floating-point math) quickly. The old graphics cards didn’t do that.
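As an illustration of the linear algebra involved: projecting a 3D vertex onto the screen means multiplying a homogeneous coordinate (x, y, z, 1) by a 4×4 matrix and then dividing by w, repeated for every vertex of every frame. A minimal sketch with a bare pinhole projection; the focal length `f` is an illustrative parameter, and the depth handling is deliberately simplified rather than taken from any particular graphics API:

```python
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]


def mat_vec(m: Matrix4, v: Sequence[float]) -> List[float]:
    """4x4 matrix times a 4-component column vector."""
    return [sum(m[row][k] * v[k] for k in range(4)) for row in range(4)]


def perspective(f: float) -> Matrix4:
    """Minimal pinhole projection, camera at the origin looking down +z.
    The last row copies z into w so the divide below shrinks distant points."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(m: Matrix4, x: float, y: float, z: float) -> Tuple[float, float]:
    cx, cy, _cz, w = mat_vec(m, (x, y, z, 1.0))  # homogeneous coordinates
    return cx / w, cy / w                         # perspective divide


p = perspective(1.0)
print(project(p, 2.0, 4.0, 2.0), project(p, 2.0, 4.0, 4.0))  # same point, twice as far away
```

Each vertex is transformed independently with identical arithmetic, which is exactly the kind of work GPU hardware is built to do in bulk.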

The Hercules card is notable because it was a “hack” on IBM’s Monochrome Display Adapter (MDA) standard and let you display black-and-white graphics on a basic MDA monitor (which originally wasn’t supposed to be used for graphics).
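The “enough memory to store the graphics data” requirement for a plain framebuffer card is simple arithmetic: width × height × bits per pixel / 8. The Hercules card’s monochrome graphics mode was 720×348 at 1 bit per pixel. The helper name below is hypothetical:

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full screen image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


hercules = framebuffer_bytes(720, 348, 1)  # monochrome graphics mode
print(hercules, "bytes; fits in 32 KiB:", hercules <= 32 * 1024)

# For comparison: one 1920x1080 frame at 32 bits per pixel
print(framebuffer_bytes(1920, 1080, 32), "bytes")
```

A card like this mostly needs to hold and scan out that buffer; it does no arithmetic on the pixels itself.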

Does anyone have a good explanation for why GTX 970s aren’t recommended for GPGPU stuff? I understand there’s some weird architectural stuff with them. I’m taking a Deep Neural Net course, and on the list of recommended cards for doing computations at home instead of in the lab, they listed the Titan, 980 Ti, 980, and 960. Apparently there’s something off about the 970.

E: It’s a memory size issue: the card doesn’t perform well if more than 3.5GB of memory is used. Google is my friend.

Of those, only the original Titan (not the Titan X) does double-precision arithmetic at a useful speed; the others run it at a small fraction of their single-precision rate.

Depends on what sort of physics you are doing. Things that are embarrassingly parallel (i.e. things where you can trivially subdivide the problem into parts that don’t depend upon one another) are always easy. Then it depends upon how local the interactions are, or how well you can approximate non-local interactions. Finite element analysis is reasonably straightforward. Any sort of simulation or approximation where you are only concerned with the effect of your immediate neighbours on your behaviour is similarly reasonably straightforward. (You may need tricks like red-black alternation, but this is all very easy.) Once you get into physical approximations that are less local, it all gets much harder. And it depends greatly upon how much of an approximation you can get away with. Some systems are horribly sensitive, and you can get results that are essentially garbage.
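Red-black alternation can be sketched on the 2D Laplace equation (steady-state heat flow on a plate). Colour the grid points like a checkerboard: every red point’s four neighbours are black and vice versa, so all red points can be updated at once, then all black points. The grid size, boundary values, and the helper name `red_black_sweeps` are illustrative; this plain-Python version runs the updates sequentially, but within each colour the order doesn’t matter:

```python
from typing import List

Grid = List[List[float]]


def red_black_sweeps(grid: Grid, sweeps: int) -> Grid:
    """In-place red-black Gauss-Seidel for the 5-point Laplace stencil.
    Boundary rows/columns are held fixed; each interior point is replaced
    by the average of its four neighbours."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" (i + j even), 1 = "black" (i + j odd)
            # Every update in this pass reads only points of the other colour,
            # so these loop iterations are independent and could run in parallel.
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 == colour:
                        grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                             + grid[i][j - 1] + grid[i][j + 1])
    return grid


n = 9
plate = [[0.0] * n for _ in range(n)]
plate[0] = [1.0] * n  # top edge held at 1, other edges held at 0
red_black_sweeps(plate, 500)
print(round(plate[n // 2][n // 2], 4))
```

By symmetry the centre converges to a quarter of the hot edge’s value. The trick only works because each update looks at immediate neighbours; a less local interaction would break the clean two-colour split.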

The thing with current GPUs is that it’s all about the memory footprint of the problem. GPUs go fast because they do limited and highly predictable things to data that has very clear and limited access patterns. You don’t go gallivanting all over a large address space in unstructured and difficult-to-predict ways. So you get systems with highly tailored memory architectures, instructions that are always in cache (if there even needs to be such a thing), and an ISA geared to doing lots of repetitive operations in sequence as it ploughs through its dataset. This model fits the highly regular model of graphics rendering, and also fits the highly regular, simple physical models of some physics and engineering problems.
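One concrete form of “clear and limited access patterns” is memory coalescing. A simplified model, assumed here rather than taken from any vendor’s specification: a group of 32 threads each loads one 4-byte value, and the hardware fetches whole 128-byte segments. Neighbouring threads reading neighbouring addresses need one fetch; scattered reads can need 32. A sketch that counts segments touched:

```python
import random

WARP_SIZE = 32       # threads issuing a load together (assumed)
ELEMENT_BYTES = 4    # one 32-bit float per thread
SEGMENT_BYTES = 128  # granularity of a memory transaction (simplified model)


def segments_touched(addresses) -> int:
    """Number of distinct memory segments a group of loads falls into."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def strided(stride: int):
    """Thread t reads element t * stride."""
    return [t * stride * ELEMENT_BYTES for t in range(WARP_SIZE)]


def scattered(seed: int = 0, array_len: int = 1 << 20):
    """Each thread reads a random element of a large array."""
    rng = random.Random(seed)
    return [rng.randrange(array_len) * ELEMENT_BYTES for _ in range(WARP_SIZE)]


useful = WARP_SIZE * ELEMENT_BYTES
for name, addrs in [("stride 1", strided(1)), ("stride 2", strided(2)),
                    ("stride 32", strided(32)), ("random", scattered())]:
    n_seg = segments_touched(addrs)
    print(f"{name:9s}: {n_seg:2d} segment(s), {n_seg * SEGMENT_BYTES} bytes fetched for {useful} used")
```

Under this model the same 128 useful bytes cost anywhere from one to 32 memory transactions depending only on the access pattern.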

For some physics problems the trick is in working out a suitable representation of the problem where the mathematics can be expressed neatly in, say, a multi-dimensional lattice or other reasonably regular form, and represented so that time can be made to progress by mutating data living within this space. An astonishing amount of work can also be expressed as PDEs, and the numerical solution of PDEs has been grist for the mill for many decades. For instance, to beat general relativity into a shape suitable for numerical computation, one can use the ADM formalism.
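“Time progressing by mutating data on a lattice” looks like this for about the simplest PDE, the 1D heat equation u_t = α u_xx, discretised with the explicit forward-time, centred-space (FTCS) scheme. Each new value depends only on its immediate neighbours at the previous step, so every point can be updated in parallel. The scheme is only stable when r = αΔt/Δx² ≤ 1/2; the helper name and the values used are illustrative:

```python
from typing import List


def heat_step(u: List[float], r: float) -> List[float]:
    """One explicit (FTCS) time step of u_t = alpha * u_xx with fixed end values.
    r = alpha * dt / dx**2; the scheme is stable only for r <= 0.5."""
    if r > 0.5:
        raise ValueError("r > 0.5: the explicit scheme is unstable")
    new = u[:]  # endpoints are copied unchanged (fixed boundary values)
    for i in range(1, len(u) - 1):
        # Reads only the previous step's values, so every i is independent
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


u = [0.0, 0.0, 1.0, 0.0, 0.0]  # a spike of heat in the middle of a rod
for step in range(1, 4):
    u = heat_step(u, 0.25)
    print(step, u)
```

The stability limit is one example of the sensitivity mentioned earlier: push r past 0.5 and the numbers oscillate and blow up instead of diffusing.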

But, some physics is just plain horrid, and lacks the needed locality, stability, or other attributes that makes it easy to parallelise.