In computing, energy is mainly expended when gates change state.
It seems self-evident that the lowest possible energy consumption is one Planck constant (h) of energy per bitwise state change; it doesn’t matter whether the computer stores its bits as electrical charges, electric fields, photons, or tiny balls being lifted and lowered in some Lilliputian abacus-like device.
Naturally there are a huge number of gates changing state on a chip in any given instruction cycle, but for a fundamental 64-bit instruction, say:
move $[memory address], #0A0AFFDA0A0AFFDAh
then the bare minimum energy consumption would be to change ONLY the bits in that memory cell. I believe this generalizes: the efficiency limit for any instruction is set by the number of bits that change, for example:
NOP
XOR RAX, RAX
might consume just one quantum energy packet each, since for NOP there is no actual state change other than the program counter, and the same holds for the XOR if RAX was already 0, rising to 64 packets in the worst case of all bits being set.
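To make the counting concrete, here is a quick Python sketch (my own toy model, not a measurement): it computes the Hamming distance between old and new register values as a proxy for the minimum number of state changes an instruction must cause.

    def bits_flipped(old, new):
        """Hamming distance between two 64-bit register values."""
        return bin((old ^ new) & (2**64 - 1)).count("1")

    # XOR RAX, RAX always produces 0, so the cost depends on the old value:
    print(bits_flipped(0x0000000000000000, 0))  # 0  -- RAX was already zero
    print(bits_flipped(0xFFFFFFFFFFFFFFFF, 0))  # 64 -- worst case, all bits set
    # The immediate from the move example above flips 34 bits into a zeroed cell:
    print(bits_flipped(0x0000000000000000, 0x0A0AFFDA0A0AFFDA))  # 34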
All this assumes a direct “hard wired” logic path so efficient that only those bits change and nothing else. That is not practical, in the same sense that a Carnot cycle is not practical, but it is a useful theoretical lower bound.
With some real numbers: 60 MFLOPS/watt seems fairly typical of consumer-grade hardware, and some experimental projects by IBM and others are reaching 15 GFLOPS/watt.
Assuming a single floating-point operation (FLOP) changes/updates 64 bits (every bit of a double-precision float), the absolute lower bound is (64 × 6.62607015e−34)^−1, which evaluates to 2.36e+31 FLOPS/watt. This is 21 orders of magnitude better than our current best of 1.5e+10 FLOPS/watt.
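The arithmetic is easy to sanity-check in a few lines of Python (the constant is the standard CODATA value of h; treating it as joules per state change is the assumption from above):

    import math

    h = 6.62607015e-34      # Planck constant (used here as J per state change)
    bits_per_flop = 64      # one double-precision value fully rewritten
    best_today = 1.5e10     # 15 GFLOPS/watt, the experimental figure above

    planck_bound = 1 / (bits_per_flop * h)
    print(f"{planck_bound:.2e} FLOPS/watt")        # 2.36e+31
    print(math.log10(planck_bound / best_today))   # ~21.2 orders of magnitude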
In any practical implementation I’d expect at least a few million gates to change state on each instruction, so you might consider us a mere 15 orders of magnitude away at present. But a computer that is fully hardwired for each instruction (i.e., no pipelines, microcode, etc.) would be the most efficient from an energy-consumption standpoint.
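Plugging a whole-chip gate count into the same formula shows how that overhead shrinks the gap (the gate counts here are my guesses, purely illustrative; the higher end is what yields the ~15 orders quoted above):

    import math

    # Same Planck-based bound, but charging for every switching gate,
    # not just the 64 architecturally visible bits:
    for gates in (2e6, 6e7):  # "a few million" up to tens of millions
        bound = 1 / (gates * 6.62607015e-34)
        print(f"{gates:.0e} gates: {bound:.1e} FLOPS/watt, "
              f"gap ~{math.log10(bound / 1.5e10):.0f} orders")
    # 2e+06 gates: 7.5e+26 FLOPS/watt, gap ~17 orders
    # 6e+07 gates: 2.5e+25 FLOPS/watt, gap ~15 orders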
Landauer (linked above) predicted a much higher consumption per bit operation: kT ln 2 joules for every bit erased. Using his equation I get 5.57e+18 FLOPS/watt if the operations are carried out at room temperature. This puts our best technology around 8 orders of magnitude away from the theoretical lower bound.
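And the Landauer figure for comparison (T = 293 K is my reading of “room temperature”; the quoted 5.57e+18 comes out of exactly this calculation):

    import math

    k = 1.380649e-23                 # Boltzmann constant, J/K
    T = 293.0                        # ~20 C room temperature (my assumption)
    e_bit = k * T * math.log(2)      # Landauer limit: ~2.80e-21 J per bit
    landauer_bound = 1 / (64 * e_bit)

    print(f"{landauer_bound:.2e} FLOPS/watt")      # 5.57e+18
    print(math.log10(landauer_bound / 1.5e10))     # ~8.6 orders of magnitude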