GPU vs. CPU?

Graphics Processing Units (GPUs) seem to be a thing these days. Evidently they are optimized for processing certain kinds of data (i.e. video graphics). As it happens, this specialization is well-suited to Bitcoin mining.

Some questions:

What are the differences between a GPU and a CPU?

If a GPU can crunch through calculations so much better than a CPU, what makes them poorly-suited for the mundane tasks that a CPU performs in my desktop PC? If I’m running a gigantic discrete-time numerical simulation in an Excel spreadsheet, why would the CPU be able to do it better than a GPU (if the GPU could do it at all)?

What is it about Bitcoin mining that makes GPUs such a good match for the job?

GPUs do one (type of) thing really well. CPUs do everything sort of okay.

It just so happens that the types of calculations GPUs are optimized for are also useful for mining bitcoins.

A GPU expects a steady stream of consistently formatted data from one source. The CPU handles all types of data streams from a variety of sources, arriving at varying speeds, with varying levels of urgency.

It’s sort of like the difference between an electrician and a handyman.

This will be a gross oversimplification, but: A CPU is good at performing hundreds of different operations on a small set of data, in sequence. A GPU is good at performing hundreds of copies of the same operation on pieces of a large set of data all at once (in parallel).

A single CPU core executes a single program at a time, running on a single set of data. CPUs are optimized to do this very fast, and use all kinds of clever tricks to do so, regardless of what the program does, how it is structured, how its data is structured, etc. If you have multiple CPU cores (e.g. 2-4 in a modern desktop), each runs a separate program totally independently. And if you have more programs running on your computer than cores available, the CPU cores divide their time between each program, still running just one at a time.

A GPU also executes a single program at a time, but it can perform the operations defined by that program in parallel on many different sets of data at once. A GPU might have 100-200 individual “compute units”. Each compute unit individually is much simpler and slower than a CPU core, and the compute units must also operate in lockstep: they can’t run different programs simultaneously, or even different paths of execution within the same program, e.g. two different branches of an if statement. However, if your program and data are structured in such a way that you can process 100-200 different “streams” or segments of your data identically and in parallel, they provide an enormous speedup.
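
To make that concrete, here is a minimal sketch in CUDA (the names and sizes are invented for illustration): the same "double every element" job written once as an ordinary sequential CPU loop and once as a GPU kernel, where thousands of threads each handle one element and groups of threads execute the same instructions in lockstep. It compiles with nvcc.

```cpp
// Minimal sketch (invented names/sizes): the same "double every element" job
// as a sequential CPU loop and as a CUDA kernel where each thread handles
// one element, with all threads in a warp running the same instructions.
#include <cstdio>
#include <cuda_runtime.h>

// CPU version: one core walks the array element by element.
void scale_cpu(float* data, int n) {
    for (int i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

// GPU version: every thread scales exactly one element. Heavy if/else
// divergence inside a warp would serialize the threads and hurt performance.
__global__ void scale_gpu(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;                        // ~1 million floats
    float* d;
    cudaMallocManaged(&d, n * sizeof(float));     // memory visible to CPU and GPU
    for (int i = 0; i < n; ++i) d[i] = 1.0f;

    scale_gpu<<<(n + 255) / 256, 256>>>(d, n);    // 4096 blocks of 256 threads
    cudaDeviceSynchronize();

    printf("d[0] = %f\n", d[0]);                  // expect 2.0
    cudaFree(d);
    return 0;
}
```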

This architecture (unsurprisingly) is very well-suited for graphics work, and certain computational tasks (Bitcoin “mining” is simply repeatedly calculating a SHA hash with new random data until you find a hash with a sufficient number of leading zeroes). But GPUs are not suited for general-purpose computation, primarily because of the limitations on flow control and branching: it is very rare in general purpose computation to have to perform the same set of operations identically on hundreds of different elements of data. Most general purpose computation involves performing hundreds of different operations in linear order on a small set of data, which is what a CPU is good for.
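
For a flavour of why mining parallelizes so well, here is a rough sketch of the search pattern only. The hash function below is a made-up stand-in, not real double SHA-256 (which would be far too long to paste here); the point is just that every thread can test a different candidate nonce independently.

```cpp
// Rough sketch of the mining search pattern. fake_hash is a made-up 32-bit
// mixer standing in for real double SHA-256; each thread tests a different
// nonce and checks for "enough leading zero bits".
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

__device__ uint32_t fake_hash(uint32_t block_data, uint32_t nonce) {
    uint32_t h = block_data ^ (nonce * 2654435761u);   // NOT SHA-256, just a mixer
    h ^= h >> 16;  h *= 2246822519u;  h ^= h >> 13;
    return h;
}

__global__ void mine(uint32_t block_data, uint32_t zero_bits, uint32_t* found_nonce) {
    uint32_t nonce = blockIdx.x * blockDim.x + threadIdx.x;
    // "Enough leading zeroes" is the same as "hash below a threshold".
    if (fake_hash(block_data, nonce) < (1u << (32 - zero_bits)))
        atomicMin(found_nonce, nonce);                 // remember the smallest winner
}

int main() {
    uint32_t* found;
    cudaMallocManaged(&found, sizeof(uint32_t));
    *found = UINT32_MAX;
    mine<<<4096, 256>>>(0xdeadbeefu, 12, found);       // ~1M nonces in one launch
    cudaDeviceSynchronize();
    printf("winning nonce: %u\n", *found);
    cudaFree(found);
    return 0;
}
```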

Also, programs still have to be specially written for GPUs: there are not any mature tools that will automatically port a program written for a general-purpose CPU into optimized GPU code.

A discrete-time numerical simulation, depending on the specific application, is actually probably extremely well-suited to execution on a GPU, but Excel in general certainly is not. You would have to write the simulation in a GPU-specific language in order for it to run at all, and careful structuring of the program and the layout of its data to avoid unnecessary loops, branching, etc. would be required to get optimal performance.
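
As a rough illustration of what “written in a GPU-specific language” might look like (the problem and names here are invented for the example, not anyone’s real simulation), here is a toy explicit time step for a 1-D diffusion problem. Every cell gets the identical update from its neighbours, which is exactly the lockstep pattern a GPU wants; the time steps themselves still have to run one after another.

```cpp
// Toy sketch: explicit time-stepping of 1-D heat diffusion. Each launch
// updates every cell in parallel from its neighbours; steps run in sequence.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void diffuse_step(const float* t_old, float* t_new, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)   // interior cells only; the ends stay fixed
        t_new[i] = t_old[i] + alpha * (t_old[i - 1] - 2.0f * t_old[i] + t_old[i + 1]);
}

int main() {
    const int n = 1 << 16;
    float *a, *b;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = b[i] = (i == n / 2) ? 100.0f : 0.0f;

    for (int step = 0; step < 1000; ++step) {        // time marches sequentially...
        diffuse_step<<<(n + 255) / 256, 256>>>(a, b, n, 0.25f);
        float* tmp = a; a = b; b = tmp;              // ...but each step is parallel
    }
    cudaDeviceSynchronize();
    printf("centre temperature: %f\n", a[n / 2]);
    cudaFree(a); cudaFree(b);
    return 0;
}
```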

It should be mentioned that many tasks in scientific research are also well-suited to GPU calculation, and that GPUs driven by the video game market have therefore sped up certain types of research considerably.

In fact, most of the R&D cost for new GPUs is paid for by their application in financial modelling, whose practitioners are early adopters of the new models. Actual graphics cards are a secondary market. :)

Hm, I didn’t know about the financial modeling. I just know that there’s not nearly enough money in pure science to pay for development of the computers we use.

That’s why so many people who used to be in pure science end up in financial modelling. :)

Indeed, my question arose because of a discussion with a friend who informed me that a major engine manufacturer (Caterpillar?) recently developed an engine, all the way from concept to production, without fabricating any prototypes. All of their development work was done in a computer, the bulk of which focuses on what happens in the combustion chamber. This is an immensely complex problem involving the simultaneous calculation of air movement, diesel fuel spray properties (droplet size/speed distributions), fuel evaporation, mixing, combustion reactions, heat transfer, intake/exhaust processes, moving boundaries, and so on. A couple of decades ago this kind of work was done on supercomputers (and wasn’t very accurate), but my friend told me that Caterpillar used GPUs to achieve their feat.

Figured as much. I just threw that out there as an example of something I might do at home that could potentially bog down a decent CPU.

Wiki says that GPUs have very many cores, which enables them to run parallel processes. Apparently, graphics are easy to parallelize. CPUs have one or a small number of cores. I guess that makes CPUs more efficient at running processes that need to be run sequentially, but GPUs are better at processes that can be run simultaneously.

Here’s one extremely technical drawback of GPUs that most people don’t realize: they lag in precision. This isn’t inherent in GPUs, but is a reality of most modern ones.

Modern CPUs often have what’s known as “software” floating point units – this means that when you multiply decimals like 1.23 and 4.5219, the chip doesn’t have a dedicated circuit for doing this. Rather, it has a program that emulates what a floating point circuit would do. This makes CPUs cheaper and more flexible. As long as the right program is implemented on your CPU, it can multiply 16-bit (half precision), 32-bit (single precision), 64-bit (double precision), or any other floating point number. In practice, most CPUs can multiply 32-bit and 64-bit floating point numbers at the same speed.

Floating point precision is a bit technical and hard to describe, but just know that the more bits used to represent the number, the more accurate your math becomes.

One big difference in modern GPUs is that, unlike modern CPUs, they have dedicated floating point units (FPUs). This makes them much faster at floating point arithmetic, but it forces them to choose a bit size. Since 32-bit units are a bit cheaper and GPUs have to be filled with the things, they tend to choose 32-bit. They have some 64-bit units, but not as many. Most figures I’ve seen rate it at about eight 32-bit units to every one 64-bit unit.

This is changing: some of the very newest models cut it down to six or even four to one. But in general, if you’re using a GPU for calculations and you’re really focused on speed, you want to convert to 32-bit floating point. This bites a lot of novice graphics programmers, who slavishly use 64-bit numbers because they’re more accurate and just as fast on CPUs, and shoot themselves in the foot. In scientific computing, precision is frequently (but not always) important enough that people stick to doubles anyway, but sometimes optimizations can be made where some of the data is allowed to be lower precision to take advantage of the greater speed.
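
Here is a small sketch of the precision point only (not a benchmark; the throughput ratios above depend entirely on the particular card): summing the same series in 32-bit and 64-bit floating point gives visibly different answers once the running total dwarfs the individual terms.

```cpp
// Sketch of the precision trade-off: sum the same series in 32-bit and
// 64-bit floating point on the GPU. Once the running total is large compared
// to the terms, the float version starts dropping digits that double keeps.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_float(float* out, int n) {
    float s = 0.0f;
    for (int i = 1; i <= n; ++i) s += 1.0f / i;   // harmonic series, single precision
    *out = s;
}

__global__ void sum_double(double* out, int n) {
    double s = 0.0;
    for (int i = 1; i <= n; ++i) s += 1.0 / i;    // same series, double precision
    *out = s;
}

int main() {
    const int n = 1000000;
    float* f;  double* d;
    cudaMallocManaged(&f, sizeof(float));
    cudaMallocManaged(&d, sizeof(double));
    sum_float<<<1, 1>>>(f, n);                    // single thread on purpose:
    sum_double<<<1, 1>>>(d, n);                   // this is about precision, not speed
    cudaDeviceSynchronize();
    printf("float : %.7f\ndouble: %.7f\n", *f, *d);  // they disagree after a few digits
    cudaFree(f); cudaFree(d);
    return 0;
}
```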

Well, yes and no. The cycles per instruction on a GPU may be higher, a compromise that allows so many cores to run at once.

No; if you weren’t locked into a specific design, you could model N CPUs as a single CPU running N times as fast.

The two issues with using the GPU for an application:

  • A very large program like Excel is going to spend much of its time running on a single core; only very limited sections would be parallelized.

  • It takes a lot of data, a lot of I/O, which has to squeeze through a bottleneck and easily overwhelms the tiny caches in the GPU, making them about as useful as if they didn’t exist (a rough sketch of that transfer cost follows below).

Well, OK: the main CPU is designed to run the application … the GPU is designed to run N tight loops.
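
Here is a rough sketch of how you might measure that bottleneck, assuming CUDA and made-up sizes: time how long it takes just to push a large buffer across the PCIe bus versus how long a trivial kernel spends touching it once it is there. The copy usually dwarfs the kernel unless the computation per byte is substantial.

```cpp
// Rough sketch of the I/O point: moving data between system memory and the
// GPU goes over the PCIe bus, which is far slower than the GPU's own memory,
// so a computation has to be heavy enough to pay back the transfer.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void add_one(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;      // trivial work per byte transferred
}

int main() {
    const int n = 1 << 26;        // 256 MB of floats
    std::vector<float> host(n, 0.0f);
    float* dev;
    cudaMalloc(&dev, n * sizeof(float));

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    add_one<<<(n + 255) / 256, 256>>>(dev, n);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float copy_ms, kernel_ms;
    cudaEventElapsedTime(&copy_ms, t0, t1);
    cudaEventElapsedTime(&kernel_ms, t1, t2);
    printf("copy: %.1f ms, kernel: %.1f ms\n", copy_ms, kernel_ms);
    cudaFree(dev);
    return 0;
}
```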

We’ve been using them in geophysical seismic processing for a few years now. There are a number of compute-intensive operations that can be parallelized easily and huge volumes of data to process.

One drawback is that compared to CPUs, there’s a limited amount of available memory for the same amount of computation (or so say the programmers). This isn’t always a problem, but it’s something we need to keep in mind in our work.

That’s apparently being addressed somewhat with new hardware from the various vendors. The data I/O issue mentioned above is apparently also a very real bottleneck that the programmers get creative about avoiding.

In fact, some modern supercomputers are based, in part or completely, on GPUs.* And as I understand it, those old supercomputers were closer in function to a GPU than to a general-purpose computer in the first place; they were made to perform the same sorts of mathematical calculations.

*Example

Wait, what? First you talk about GPUs, then CPUs. I’m pretty sure what you’re talking about isn’t true for mainstream CPUs.

The big difference between CPUs and GPUs is that GPUs are good at handling tasks that are massively parallel and follow a straight, predictable path. CPUs, on the other hand, spend much of their time running code that can branch in two directions at many points.

It’s a bit like the difference between painting a house and painting an oil painting. The basic idea is the same, but in the first case you just plan what to do and then execute, where you can easily split the job many ways so many people can work at the same time. In the second case, everything depends on everything else so you can’t just plan and execute, there is a constant feedback loop between the two.

What I said was misleading, but true. CPUs have hardware “FPUs” in that each core can handle, in hardware, addition, multiplication, and subtraction (possibly division). Everything else is handled by a software library.

GPUs, on the other hand, have more in their FPU hardware per core than just the basic operations. They can actually hardware multiply matrices, for instance. While it’s true that GPUs are massively parallel compared to CPUs, that’s only the very surface of differences. A GPU is not just a ton of CPU cores (except in limited cases like the Haswell line of Intel HD graphics cards).

Interesting. Had no idea.

I thought CPUs had matrix operations for a long time now (e.g. Intel MMX instruction set).

Cite?

Another key difference is memory access. GPUs do not give each of their processors the same fine-grained control when accessing (reading and writing) memory that the cores of a multi-core CPU have.

If you have a program that is very parallelizable, it may not gain from running on a GPU if the memory access patterns are random for either reads or writes. The GPU is most efficient when the data structures can be read and written in adjacent blocks (there is some ability to get around this but performance drops).
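
To illustrate (a sketch, with invented sizes and stride): two kernels that copy the same array, one where neighbouring threads read neighbouring elements (coalesced access) and one where each thread strides far away from its neighbours. They do identical arithmetic, but on most GPUs the strided version comes out markedly slower.

```cpp
// Sketch of the access-pattern point: same copy, two memory patterns.
// The buffers are left uninitialized because only the access pattern and
// the timing matter here, not the values.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];                // adjacent threads hit adjacent addresses
}

__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (i * stride) % n;             // scatter each thread far from its neighbours
        out[j] = in[j];
    }
}

int main() {
    const int n = 1 << 24;                    // 16M floats
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    cudaEvent_t a, b, c;
    cudaEventCreate(&a); cudaEventCreate(&b); cudaEventCreate(&c);

    cudaEventRecord(a);
    copy_coalesced<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaEventRecord(b);
    copy_strided<<<(n + 255) / 256, 256>>>(in, out, n, 33);
    cudaEventRecord(c);
    cudaEventSynchronize(c);

    float ms_coalesced, ms_strided;
    cudaEventElapsedTime(&ms_coalesced, a, b);
    cudaEventElapsedTime(&ms_strided, b, c);
    printf("coalesced: %.2f ms, strided: %.2f ms\n", ms_coalesced, ms_strided);
    cudaFree(in); cudaFree(out);
    return 0;
}
```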