Sure, you can run SLI with 2 or even 4 GPUs, but for whatever reason it only works well in some specific games, while in others it gives an extremely underwhelming result or even makes the game run slower than it would on a single GPU.
How come we don’t instead have a single GPU that has double the physical size, double the transistors, double the CUDA cores and so on? Is there something preventing graphics card manufacturers from making the GPU physically larger? (Not including MOBO and power consumption, which aren’t unsolvable problems.)
Planes can have a single engine, and if they need more power they are made with 2 identical engines, since it would take decades to research a single engine that is as powerful as two. Couldn’t we…and didn’t we actually already have this same logic in GPUs? I think some really old Radeon cards were like this, the 5970/90 or something like that.
Mainly because there’s not much of a market for it. You can sell either 10 million GTX1080s, 9 million of which are used in single configuration and 1 million are used in SLI or you can sell 9 million GTX1080s and 500K GTX2080s. But the problem is, those GTX2080s are going to be much more expensive to produce because you don’t get the economies of scale so you destroy your profit margin.
If you look at the enterprise market, there’s a lot more exotic packing of huge amounts of power onto single boards because enterprises are willing to pay the $10K a card necessary to make the economics work out.
edit: The other thing to understand is that games are solidly targeted towards the mass end of the market so you rapidly rise into diminishing returns as you add GPU power. There are theoretical games that could take advantage of all that power of high end GPUs but those won’t be written for another couple of years when it makes economic sense to do so. Until then, you’re just stuck with incrementally higher frame rates and slightly nicer shadows. Contrast that with computer vision or machine learning or any other enterprise use of GPUs where adding GPU power contributes directly to the bottom line.
The bigger the chip, the more expensive it is to produce, for one thing. Imagine a wafer with 10 flaws on it, each landing on a different die. If you produce 100 size-x GPUs, you have to toss 10 of them, or 10 percent. If you produce 50 size-2x GPUs, you have to toss 10 of them, or 20 percent. (Going in the opposite direction, if you produce 200 size-0.5x chips, you have to toss 10 of them, or 5 percent.) The cost of all the bad chips is spread out over the good ones.
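To replay that arithmetic with other numbers, here is a minimal Python sketch. It assumes, as the example does, that every flaw ruins a different die; the function name `scrap_fraction` is made up for illustration.

```python
def scrap_fraction(dies_per_wafer, flaws_per_wafer):
    """Fraction of dies thrown away, assuming each flaw kills a different die."""
    # You cannot scrap more dies than the wafer holds.
    bad_dies = min(flaws_per_wafer, dies_per_wafer)
    return bad_dies / dies_per_wafer


# Relative die size and how many such dies fit on the example wafer.
for relative_size, dies in [(0.5, 200), (1, 100), (2, 50)]:
    lost = scrap_fraction(dies, flaws_per_wafer=10)
    print(f"size {relative_size}x: {dies} dies, {lost:.0%} scrapped")
```

The same 10 flaws cost a larger share of the wafer as the dies get bigger, which is the whole point of the example.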
The chance of a die including a defect rises with its area, so the chance that a chip is defective rises dramatically as the chip gets bigger. That said, in designs with multiple identical sub-units, which covers many modern CPUs and GPUs, it is now common to simply disable the dead sub-unit and sell the chip at a lower price point. Indeed, many of the chips sold with fewer processing units than the top-of-the-line part are just ones where not all the units worked.
However, as noted above, other things get in the way, chiefly heat. And that is a big problem. Differential heating can go as far as to fracture the die. But just getting the heat out of a large chip is a really serious problem. The energy density of a chip is ridiculous, and it is something of a miracle of modern engineering that things work at all.
The problem of scaling up is filled with pitfalls. Even if you could make a die twice the area economically, you still won’t necessarily get double the performance. The chip is a 2D surface, and the interconnects scale linearly with the chip’s edge length, whilst the number of processing units goes up as the square. Even on a single chip you get difficulties in inter-processor communication as the system grows. Going off chip is always bad in that the communication slows right down, although you do have some freedom to build an interconnect topology that can fit into a 3D volume. (However, you don’t see this as a constraint until you reach really big compute systems.)
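One common first-order way to put numbers on this is the Poisson yield model, where the chance that a die has no defect is exp(-area × defect density). That model, the defect density, and the unit sizes below are assumptions for illustration, not figures from any real process, and the function names are hypothetical. The second function sketches the "disable the dead sub-unit" trick, assuming defects hit sub-units independently and nothing else on the chip can fail.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Chance that an area of silicon contains no defect (Poisson model)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def yield_with_spares(units, needed, unit_area_cm2, defects_per_cm2):
    """Chance that at least `needed` of `units` identical sub-units work."""
    p = poisson_yield(unit_area_cm2, defects_per_cm2)
    # Binomial sum over every acceptable count of working sub-units.
    return sum(
        math.comb(units, k) * p**k * (1 - p) ** (units - k)
        for k in range(needed, units + 1)
    )


density = 0.1  # assumed defects per square centimetre, purely illustrative
print("2.5 cm^2 die:", round(poisson_yield(2.5, density), 3))
print("5.0 cm^2 die:", round(poisson_yield(5.0, density), 3))
print("16 units, all must work:", round(yield_with_spares(16, 16, 0.3, density), 3))
print("16 units, 14 must work: ", round(yield_with_spares(16, 14, 0.3, density), 3))
```

Under this model, doubling the area squares the yield (a number below 1), and allowing a couple of dead units recovers a lot of otherwise scrapped chips.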
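The edge-versus-area point can be sketched in a few lines of Python. This assumes a square die, with processing units proportional to area and off-chip connections proportional to perimeter; the function name and the two die areas are hypothetical.

```python
import math


def edge_per_area(area_mm2):
    """Millimetres of die edge available per square millimetre of die."""
    side = math.sqrt(area_mm2)   # square die assumed
    perimeter = 4 * side
    return perimeter / area_mm2


small = edge_per_area(250.0)
large = edge_per_area(500.0)
print(f"250 mm^2 die: {small:.3f} mm of edge per mm^2")
print(f"500 mm^2 die: {large:.3f} mm of edge per mm^2")
print(f"ratio large/small: {large / small:.3f}")
```

Doubling the area only grows the edge by a factor of the square root of two, so each processing unit ends up with less edge to talk through.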
Back in the day you could get very large, very expensive scalable graphics systems with lots of interconnection. Something like an SGI Onyx with InfiniteReality graphics. Compared to modern high-end GPUs its performance was pretty lame, but in its day it was little short of jaw-dropping. Not only could you combine multiple boxes, those boxes were built out of multiple pipelines, themselves built from multiple chips. You could spend many hundreds of thousands of dollars on a single machine.
I have two sitting here at home. I use them as side tables.
The Onyx4 was the last of the breed, an absolute top-end machine, and cost more than my house did at the time. It just sits in the corner now.
This is kind of what GPU companies do as they release new chips. As they’ve reduced feature size they’ve packed more and more execution units into newer chips. But as has been said, the more you put in there, the more power you need to supply the GPU with, and the more heat you have to move out of the chip. Top-end cards are already approaching 250 watts of power draw, and almost all of that is turned into heat which must be moved elsewhere. And there’s a limit to how much heat a graphics card cooling system can transfer when confined to one or two expansion slots.
A fabrication plant is incredibly expensive: Samsung built one that cost over $14 billion. The economics only works because of making huge numbers of chips. Anything that causes the yield to drop (like making the die twice as big) causes the entire model to collapse.
They also can’t just make the feature size smaller and cram more transistors on the same size die. That is limited by the current state of fabrication technology. Eventually GPUs will be made at 7 nanometers which will allow putting a lot more transistors on the same die size, but that must await fabrication physics.
I think the old Intel “4 core” Q6600 CPU was actually two dual-core dies packaged in one chip carrier. This hurt economics and cost, but if it’s a huge-volume product (as was the Q6600), you can recoup that by making lots of them.
A specialized, multi-die GPU would not be high volume so would be very expensive. Multi-chip modules are usually viewed as expensive, niche products, and the goal (from a manufacturing standpoint) is to replace them ASAP with single-die versions.
Since the multiple dies must communicate somehow, this also raises programming issues and efficiency issues regarding on-chip cache and bus bandwidth. E.g., does the API require extensions to work with the multiple dies, what about performance when threads on one die must rapidly communicate with threads on another die, etc.
This also raises economic issues. E.g., with SLI you only buy the 2nd card if you want it. The manufacturer doesn’t make anything different; it’s just two cards. With a multi-chip module, it’s a discrete specialized semiconductor item. The manufacturer must tool up hyper-expensive fabrication to make that specialized part, then every purchaser of that item is burdened with the cost.
In what kinds of situations would a gaming graphics card made with two 250 mm² chips have slowdowns that one 500 mm² chip wouldn’t have? I get the idea that on-chip interconnects are much faster and more power/heat efficient than off-chip interconnects, but isn’t the whole idea of a GPU that you’ll dispatch embarrassingly parallel tasks to many independent threads, so that you increase throughput by increasing mass rather than speed? If so, what does it matter if those independent threads aren’t on the same chip? If an example could be provided using GPUs as gaming tech, that might be easier to grasp.
There is also a limitation on die (chip) size due to the photolithography equipment (the equipment that prints the circuit patterns onto the silicon wafer). There is a hard limit on the exposure field size which depends on the particular equipment (stepper) being used. The ones we use have a maximum field of around 25mm square.
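As a rough check of what that field size means for "just double the die", here is a small Python sketch. It reads "25mm square" as 25 mm on a side and assumes a square die that must fit in a single exposure; the function name and the die areas tried are hypothetical.

```python
import math

FIELD_SIDE_MM = 25.0  # maximum exposure field side, as quoted above


def fits_in_field(die_area_mm2, field_side_mm=FIELD_SIDE_MM):
    """True if a square die of this area fits inside one exposure field."""
    die_side_mm = math.sqrt(die_area_mm2)
    return die_side_mm <= field_side_mm


for area in (250.0, 500.0, 625.0, 1000.0):
    side = math.sqrt(area)
    print(f"{area:6.0f} mm^2 die, {side:5.1f} mm side, fits: {fits_in_field(area)}")
```

With that field, the ceiling is 625 mm², so a die already near 500 mm² cannot simply be doubled.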
Keep in mind that the computations are not completely independent. They have to talk to each other. A shading/position change in one area affects other areas. And it’s not just local changes. A change in lighting affects a large area of the image.
The bottleneck in a lot of “naturally” parallel computations is data flow.
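A toy Python model of that data-flow bottleneck: split the compute evenly across chips, then add the time to move shared data over the link between them. Every number here (frame work, shared megabytes, link speed) is invented for illustration, and the model ignores overlap of compute and transfer; `frame_time_ms` is a hypothetical name.

```python
def frame_time_ms(work_ms, chips, shared_mb, link_gb_per_s):
    """Frame time when work splits evenly but shared data crosses a link."""
    compute_ms = work_ms / chips
    if chips == 1:
        return compute_ms  # everything stays on one die, no link traffic
    # 1 GB/s equals 1 MB per millisecond (decimal units).
    transfer_ms = shared_mb / link_gb_per_s
    return compute_ms + transfer_ms


single = frame_time_ms(16.0, chips=1, shared_mb=0.0, link_gb_per_s=8.0)
light = frame_time_ms(16.0, chips=2, shared_mb=8.0, link_gb_per_s=8.0)
heavy = frame_time_ms(16.0, chips=2, shared_mb=128.0, link_gb_per_s=8.0)
print("one chip:              ", single, "ms")
print("two chips, little data:", light, "ms")
print("two chips, lots of data:", heavy, "ms")
```

When little data has to cross, two chips nearly halve the frame time; when a lot does, the second chip can make the frame slower than a single chip would.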
If the driver has to address both GPUs separately and do the scheduling/dispatch of work between the two, then the driver itself can be the source of unwanted slow-downs. ISTR the early implementation of AMD’s (when it was still ATI) Crossfire being prone to this.