What are the main components of computer chip costs?

To be fair, top-end GPUs today are more like 600 mm[sup]2[/sup], so we’re really only talking about 170 W/cm[sup]2[/sup]. That’s still a lot of heat to remove from a small location.
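The arithmetic behind that figure, assuming the roughly 1 kW chip discussed earlier in the thread (the exact wattage is an assumption here) spread over a 600 mm[sup]2[/sup] die:

```python
# Rough sketch: power density for an assumed ~1 kW chip on a 600 mm^2 die.
die_area_mm2 = 600
die_area_cm2 = die_area_mm2 / 100   # 6 cm^2
power_w = 1000                      # assumed figure from earlier in the thread

print(power_w / die_area_cm2)       # ~167 W/cm^2
```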

I looked into Fluorinert for immersion system cooling a while back, but it’s just ridiculously expensive. Like $1k/liter, IIRC. I guess if you’re buying a Cray you can afford it!

I think you’re right that getting power onto the chip is going to be your next hardest problem. If only you could run them at a higher voltage. A kilowatt is one thing; a kiloamp is just absurd. I know Intel has looked into on-die power regulators but I don’t think the input voltage is very high.

Intel has no major competitors, so it has no incentive to bring out faster or cheaper CPUs, or to bring costs down.


Intel CPUs are way overpriced compared to AMD CPUs. They're priced far above what they cost to make.

A 2015/2016 CPU is not two or three times faster than a 2009/2010 CPU, yet it's way more costly.

Cite?

What does “package” refer to here?

Do I have the following right:
A bigger die can take more wattage. Bigger dies depend largely on getting better yields.

So, for a given microarchitecture, manufacturing process and node, the two main ways to get more performance are to use a bigger die or use better cooling, both of which have exponential cost curves.
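One way to see why die-size cost grows faster than linearly is the classic Poisson yield model, where yield falls exponentially with area. The defect density and cost-per-area numbers below are made up for illustration, not real fab figures:

```python
import math

# Illustrative sketch of the classic Poisson yield model: Y = exp(-D * A).
# D (defects/cm^2) and cost_per_cm2 are assumed values, not real fab data.
def cost_per_good_die(area_cm2, defect_density=0.5, cost_per_cm2=100.0):
    yield_fraction = math.exp(-defect_density * area_cm2)   # fraction of good dies
    return cost_per_cm2 * area_cm2 / yield_fraction         # amortize bad dies

# Doubling die area more than doubles the cost per good die:
print(cost_per_good_die(1.0))   # ~164.9
print(cost_per_good_die(2.0))   # ~543.7
```

This is why "bigger die" sits on a steep cost curve: you pay for more silicon *and* a larger fraction of it is defective.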

A bigger die for the same design will mean a range of changes to the design, and it won’t be a simple tradeoff. Indeed, “node” and “manufacturing process” really dictate the design rules the designers follow - which include all the spacing rules. So, for the same feature size in the design rules, you would just be leaving bigger gaps between things, which would mean that the propagation time across the chip may rise - slowing down your ability to clock it faster. OTOH, capacitive effects between elements will drop, and that can allow faster propagation. Both of these are massive oversimplifications of the problems; I’m just trying to show that there isn’t a simple answer to die size versus speed. In reality such changes would be messy, as you are changing the design rules, and it isn’t the same node you started with.

You could just spread sub-units around the die, but thermal effects are a significant problem. Localised heating on the die can stress the substrate - in principle even crack it. So just spreading subunits out isn’t necessarily going to be a simple answer either.

The limits are power per unit area, not so much the entire die. But nothing is simple.

Package refers to the thing you buy: the die, bonded to a carrier and encapsulated, with pins or bumps to connect to the motherboard. Diamond is suggested due to its ridiculous thermal conductivity. If you deposit a thin layer of crystalline silicon on a slab of diamond, you can build a die with very good thermal conductivity to the package. Expense and all sorts of evil problems are in the way, but it has enough merit to be taken seriously.

Price is set by supply and demand, not by manufacturing costs.

The “package” is the material the die is embedded in so that it can be used on a circuit board. It’s what you see when you actually buy an IC. The IC itself is a very thin sliver of Si. Leads on the “package” are attached to pads on the IC in order to connect the IC, electrically, to the outside world.

Having watched a talk by Jem Davies, VP of technology at ARM:
So, there is plenty of room on the silicon to add more transistors, the problem is that if we added more transistors*, powering those additional transistors would generate too much additional heat, correct?
Setting aside the increased power input and heat output, is adding transistors costly in itself?

Chip comparisons:
Looking at the 900 series of Nvidia GPUs**, I see that the Titan X and 980 Ti (both based on GM200, with a 601 mm[sup]2[/sup] die) have a TDP of 250 W. The 970 and 980 (GM204, 398 mm[sup]2[/sup] die) have 145 W and 165 W TDPs, respectively. The 950 and 960 (GM206, 227 mm[sup]2[/sup]) have 90 W and 120 W TDPs, respectively.

How come both GM200 chips have the same TDP but not the GM204 and GM206?

The 950 and 980 have TDP/die-size ratios that are proportionally about the same as the TDP/die-size ratio of the 980 Ti and Titan X. How come the 960 has a TDP/die-size ratio that’s nearly 30% larger, and the 970 one that’s roughly 13% lower?
What explains the different core configuration ratios between the 900 series chips? Intuitively, I would think the designers would strive to maintain the same ratios of shader processors, texture mapping units and render output units between chips. What’s happening there?
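The ratios in question can be worked out directly from the numbers in the post (TDP in watts, die size in mm[sup]2[/sup]):

```python
# Sketch: TDP per unit die area for the Maxwell chips listed above,
# using the figures from the post (TDP in W, die area in mm^2).
chips = {
    "Titan X (GM200)": (250, 601),
    "980 Ti (GM200)":  (250, 601),
    "980 (GM204)":     (165, 398),
    "970 (GM204)":     (145, 398),
    "960 (GM206)":     (120, 227),
    "950 (GM206)":     (90, 227),
}
for name, (tdp_w, area_mm2) in chips.items():
    print(f"{name}: {tdp_w / area_mm2:.3f} W/mm^2")
```

The GM200 parts and the 980 land around 0.42 W/mm[sup]2[/sup], the 950 slightly below, the 960 well above, and the 970 below.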

*At the current node and architecture

**List of Nvidia graphics processing units - Wikipedia

Liquid cooling isn’t that expensive, and it would be even cheaper if it shipped from a major PC manufacturer. I’m curious why that hasn’t been done yet.

Do factory-installed liquid coolers count? Those are starting to appear on GPUs. When they’re no longer novelties, perhaps the added price of factory-installed water cooling will go from $100 to $50.

For CPUs, it seems the manufacturers figure that those interested in overclocking, overvolting and water cooling will want to pick their own WC. I have no idea if you and I are in the minority in being ok with standard CPU WCs.
If designers like Nvidia limit the voltage increase (on pain of voiding the warranty), that would severely reduce the gains of water cooling though. Oh, happy day when I can take my GTX 970 to 1.4 volts.

OK, I didn’t know some GPUs were now shipping with liquid cooling. That’s interesting.

Boxx Technologies has prebuilt water cooled systems but they’re not really a major manufacturer.

Alienware has some water-cooled prebuilt models as well. Cool!

Apple had a line of water-cooled machines.
They generally performed well, but when they leaked, they often ruined the entire computer (MoBo and power supply).

Water cooling is one of those “great in principle” things.

I can only speak very vaguely here. Binning occurs across several different axes which are not completely independent. As said before, you can bin based on defective subunits. You can also bin on clocks as well as voltage.

These different axes are related to each other, though–one chip might not quite hit the clocks that another chip hits unless you bump the voltage on it. Of course that uses more power; enough that it might be a good product for desktop but not for mobile.

Efficient chips–ones that hit high clocks without raising the voltage too much–are more valuable than others, so you want to charge a premium price. So you siphon off these golden chips and sell them at the high end.

A particular chip might have a marginal subunit. You might sell that chip at full clocks with the defective unit disabled, or at lowered clocks with all units enabled, or at full clocks and config but with a bumped voltage. Which strategy you take depends on the particulars.
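The binning logic described above could be caricatured as a decision function. Everything here is made up for illustration (the thresholds, SKU names, and clock figures are not real NVIDIA binning criteria):

```python
# Purely illustrative sketch of the binning tradeoffs described above.
# All thresholds and bin names are invented; real binning is far messier.
def assign_bin(defective_units, stable_clock_mhz, needed_voltage):
    if defective_units == 0 and needed_voltage <= 1.05:
        return "flagship"                 # "golden" chip: full config, low voltage
    if defective_units == 0:
        return "desktop"                  # full config, but needs a voltage bump
    if stable_clock_mhz >= 1000:
        return "cut-down, full clocks"    # disable the marginal unit, keep clocks
    return "cut-down, lower clocks"       # disable the unit and drop clocks too

print(assign_bin(0, 1100, 1.00))   # flagship
print(assign_bin(1, 1050, 1.10))   # cut-down, full clocks
```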

Another factor is that there are some thresholds when it comes to TDP. NVIDIA products have 0, 1, or 2 power connectors (in various combinations of 6- vs. 8-pin). At the high end, the cards have a 6+8-pin combination, which gives a total of 300 W, and if you want to play it safe then 250 W is a good number. To go much higher you would need 8+8 (like the Fury X) or 3 connectors. For intermediate segments, you might want to have just one 6-pin connector, which implies a maximum 150 W TDP, so you’ll see some thresholding around that value.
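The arithmetic behind those thresholds comes from the PCIe spec limits: 75 W from the slot, 75 W per 6-pin connector, 150 W per 8-pin connector.

```python
# Sketch of the PCIe power-budget arithmetic behind the TDP thresholds above.
# Per-spec limits: slot = 75 W, each 6-pin = 75 W, each 8-pin = 150 W.
def board_power_limit(six_pin=0, eight_pin=0):
    return 75 + six_pin * 75 + eight_pin * 150

print(board_power_limit(six_pin=1))               # 150 -- one 6-pin midrange card
print(board_power_limit(six_pin=1, eight_pin=1))  # 300 -- typical high-end 6+8
print(board_power_limit(eight_pin=2))             # 375 -- 8+8, Fury X territory
```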

Overall, it’s very complicated and the product positioning when it comes to TDP is very dependent on the yield distribution of your chips and the target market segment.

Thanks for the info.

So, when EVGA provides 6+8 pins or Gigabyte provides 8+8 pins for the GTX 970, what the dickens do they intend the user to do, run a 970 at 250W+ of heat? That seems like it would require changing the BIOS and presumably voiding the warranty. Yet they provide pins whose only purpose is to go far beyond the 970’s normal TDP.

Not toward you in particular but generally:
When I asked: “So, there is plenty of room on the silicon to add more transistors, the problem is that if we added more transistors*, powering those additional transistors would generate too much additional heat, correct?” I really did want to know if I got that right because I very well may not.

Also, for the same microarchitecture, will halving the node about halve the power and heat?

Quite interesting. It only makes sense on the top tier, though. The impression I get is that a good GPU air cooler adds about $50 over the reference design, whereas a GPU water cooler adds about $100. Is that about right?

I’m really not sure how much higher one can typically push the clocks and voltage on a water cooler vs. a decent air cooler. If it (hypothetically) allows you to get 10% more performance, paying an extra $50 to get 10% more performance out of a $300 GPU isn’t worth it. On a $600 GPU, it is worth it.
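Put as back-of-envelope numbers (using the hypothetical $50 cooler and 10% gain above):

```python
# Back-of-envelope version of the value argument above: relative cost of a
# hypothetical $50 water cooler vs. an assumed 10% performance gain.
def cost_increase_pct(gpu_price, cooler_cost=50):
    return 100 * cooler_cost / gpu_price

print(cost_increase_pct(300))  # ~16.7% more money for ~10% more performance
print(cost_increase_pct(600))  # ~8.3% more money for ~10% more performance
```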

Perhaps in a few years, some GPUs will come without coolers and you’ll install your own like on some unlocked i7s.

Water coolers have improved since the old days of 2003. Anecdotally, my 5-year-old $50 Corsair H50 has yet to spill a drop, and Newegg has it rated 5/5 by 1,725 customer reviews.

No idea, but it could be just for the illusion of needing that much power. An ignorant user might look at the inputs and decide to buy that brand because it seems like it can go to high power levels. Like having a giant exhaust on a 4-banger shitbox Civic.

I don’t have a better answer than “it depends”. At the high end, there isn’t more room for extra transistors. And you can’t supply much more power, since that implies a higher voltage, and too high a voltage reduces lifetime.

At the midrange, you could add more transistors, but that’s more area and thus higher cost. Perhaps more importantly, the higher power means higher board costs–power regulators, heatsink, etc.

Sadly, it’s way more complicated than that. Some new nodes have (historically) hardly improved per-transistor power at all. Others do rather well. 16FF is a big step for users of TSMC’s fabs. The FinFET transistor architecture (which Intel has been using for a while) is a big improvement in power. I dunno about half, but it’s a good step.

Even the node dimension, “28 nanometer”, “16 nanometer”, etc. has very little relation to actual size these days. It’s put there as a general guideline for transistor scaling but in no way can you say that you get a (28/16)[sup]2[/sup]=3x improvement. No way to know without looking at the details. Among other things, the analog chip components–specifically, the high-speed serial interconnects for memory, PCI-e, etc.–have not scaled well in a while. There’s almost zero improvement in dimensions or power use for these parts of the chip.
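For what it’s worth, the naive "dimensional" scaling being warned against works out like this (treating the node names as literal feature sizes, which they aren’t):

```python
# The naive scaling the post warns against: treating "28 nm" and "16 nm"
# as literal feature sizes and squaring the ratio for an area/density gain.
naive_density_gain = (28 / 16) ** 2

print(round(naive_density_gain, 2))  # 3.06 -- not achieved in practice
```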