What's wrong with AMD?

I’m not a chip designer so I can only speak in basic principles.

Only clock the chip as high as you need to. For instance, suppose you are locked to 60 Hz but the chip will render at 90 Hz flat-out. Assuming frame rate scales roughly linearly with clock speed, you then only need to run the chip at 2/3 of its full clock to hit your target.

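Strongly related to this: only use as much voltage as you need. Chips need more voltage as they are clocked higher, so if you can reduce the clocks, you can cut the voltage too. Dynamic power scales roughly with frequency times voltage squared, so the voltage cut saves even more power than the clock cut does.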
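To put rough numbers on the clock and voltage points, here is a back-of-the-envelope sketch using the usual first-order model for dynamic power, P = C * V^2 * f. The voltages are made up for illustration and are not taken from any real GPU, and the model ignores leakage entirely.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order CMOS dynamic power: P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency

# Hypothetical, normalized operating points: only the ratios matter here.
C = 1.0  # normalized switched capacitance

full = dynamic_power(C, voltage=1.00, frequency=1.0)            # flat out (the 90 Hz case)
clock_only = dynamic_power(C, voltage=1.00, frequency=2 / 3)    # 2/3 clock, same voltage
clock_and_volt = dynamic_power(C, voltage=0.85, frequency=2 / 3)  # assumed 0.85 V at the lower clock

print(f"clock cut only:      {clock_only / full:.0%} of full power")
print(f"clock + voltage cut: {clock_and_volt / full:.0%} of full power")
```

Because the voltage term is squared, even a modest voltage drop on top of the clock reduction takes a big extra bite out of the power.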

Only clock the parts of the chip doing work. For instance, GPUs do both math ops and texture ops. But if you’re doing math-heavy work, the texture units are idle. Their clock should be gated off, or, even better, they should be powered off completely (this can be hard).

Just as a very general principle, don’t be wasteful in the work you do. Don’t use 32 bits when 16 bits is enough. Be smart about memory locality–better caches mean fewer memory transactions. Etc.
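As a quick illustration of the 32-bit versus 16-bit point, here is a sketch that compares the size of one full-screen buffer at each precision. The 1920x1080 RGBA render target is a hypothetical example, and real GPUs have many other buffers and formats in play.

```python
import struct

# Hypothetical RGBA render target at 1080p.
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 4

# struct format "f" is a 32-bit float, "e" is a 16-bit half float.
bytes_fp32 = WIDTH * HEIGHT * CHANNELS * struct.calcsize("f")
bytes_fp16 = WIDTH * HEIGHT * CHANNELS * struct.calcsize("e")

print(f"32-bit buffer: {bytes_fp32 / 2**20:.1f} MiB")
print(f"16-bit buffer: {bytes_fp16 / 2**20:.1f} MiB")
```

Every read or write of that buffer moves half as many bytes at 16 bits, and moving bytes to and from memory costs power.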

Semi-related: use fixed-function hardware appropriately. Fixed-function is more efficient than general-purpose. So things like triangle rasterization are best done with fixed hardware. Of course it has the downside that it only does one thing, and if you rarely do that thing, you’d better make sure it doesn’t draw power when you aren’t using it. General-purpose HW is more flexible and will have better occupancy, meaning it spends less of its time sitting idle. There’s no hard-and-fast rule here, so you just have to analyze the situation and pick the appropriate method.

Memory power use is also important. GPUs use sophisticated (lossless) memory compression to reduce traffic and increase performance. A lot of traffic is highly compressible. For instance, the first thing that happens in a frame is usually a memory clear to all zeroes, and a big chunk of zeroes compresses very well. The actual compression schemes are much more clever than just this, of course.
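To see how well a cleared buffer compresses, here is a small sketch using zlib as a stand-in. Real GPUs use dedicated hardware schemes, not zlib, so this only illustrates the general idea that zeroes are nearly free to store and move while noisy data is not. The 1 MiB buffer size is arbitrary.

```python
import os
import zlib

SIZE = 1 << 20  # 1 MiB, arbitrary buffer size for illustration

cleared = bytes(SIZE)     # all zeroes, like a freshly cleared render target
noisy = os.urandom(SIZE)  # random bytes, close to incompressible

cleared_packed = zlib.compress(cleared)
noisy_packed = zlib.compress(noisy)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(cleared_packed) == cleared

print(f"cleared buffer: {SIZE} -> {len(cleared_packed)} bytes")
print(f"noisy buffer:   {SIZE} -> {len(noisy_packed)} bytes")
```

The cleared buffer shrinks to a tiny fraction of its size, while the random one does not shrink at all.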

In a very hand-waving sense, you want a focus on elegant hardware instead of brute force. It’s easy to throw a bunch of math units on a chip, but done poorly they’ll sit idle a lot of the time and just burn power. It’s worth spending some of that chip area on caches, compressors, clock gating units, and so on. The bullet-point numbers on the GPU will go down but the efficiency will make up for it.