There are a few things to unpack here.
Saying "a 1 GHz signal" versus "a 100 MHz signal" does not give enough information to say anything about their comparative information-carrying capabilities.
As above, Shannon-Hartley says that the information-carrying capacity is determined by the bandwidth and the signal-to-noise ratio. We have neither when all we are given is the frequency of what we assume is just the carrier.
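For reference, the Shannon-Hartley capacity is C = B·log2(1 + S/N). A quick sketch, with an illustrative bandwidth and SNR that are not part of the question:

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical example: 100 kHz of bandwidth at 20 dB SNR.
snr = 10 ** (20 / 10)               # 20 dB -> 100 in linear terms
c = shannon_capacity(100e3, snr)
print(f"{c / 1e3:.0f} kbit/s")      # ~666 kbit/s
```

Note that the carrier frequency appears nowhere in the formula; only bandwidth and SNR do.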
Being painfully pedantic, we might assume that the bandwidth is thus zero, and hence the capacity of either channel is also zero.
In actual use, bandwidth is typically allocated to services as a fraction of the carrier frequency. So say a service is allocated a band equal to 0.1% of its carrier frequency. A 100 MHz service might get a bandwidth of 100 kHz; a 1 GHz service might then be expected to get a 1 MHz bandwidth. So instantly we see that, all else being equal, we might expect 1 GHz services to carry 10 times the information. But in reality that is just because we are carving the spectrum up using fractions of the carrier, not fixed frequency ranges. In the end that is mostly the answer.
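The arithmetic above can be sketched directly. The 0.1% fraction and the 20 dB SNR here are illustrative assumptions, not properties of any real allocation scheme:

```python
import math

def capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

FRACTION = 0.001   # hypothetical 0.1% fractional allocation
SNR = 100          # assume the same 20 dB SNR for both services

for carrier_hz in (100e6, 1e9):
    bw = carrier_hz * FRACTION
    print(f"{carrier_hz/1e6:.0f} MHz carrier -> {bw/1e3:.0f} kHz band -> "
          f"{capacity(bw, SNR)/1e3:.0f} kbit/s")
```

With equal SNR, the tenfold bandwidth gives exactly a tenfold capacity; the carrier frequency matters only through the allocation convention.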
But we didn’t discuss signal to noise. Noise comes from a whole range of sources. Thermal noise in your receiver is often dominant, but our universe is intrinsically noisy. Even if you douse the entire shebang in liquid helium, there are still remaining noise sources. This places a fundamental floor under your information-carrying capability. If you can push more energy into your transmission, you get a wider margin over the noise. But that takes effort, and real-life problems, such as causing interference to other services, are never far away. So real-life services are always power limited, even if they have limitless power available to them. Portable devices clearly have it worse.
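To put a number on that thermal floor: noise power in a bandwidth B is kTB, which at room temperature works out to the familiar -174 dBm/Hz. A small sketch (room temperature assumed; the bandwidths are arbitrary examples):

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

def thermal_noise_dbm(bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Thermal noise power N = k*T*B, expressed in dBm."""
    noise_watts = K_BOLTZMANN * temp_k * bandwidth_hz
    return 10 * math.log10(noise_watts / 1e-3)

print(f"{thermal_noise_dbm(1):.1f} dBm in 1 Hz")      # ~ -174 dBm
print(f"{thermal_noise_dbm(1e6):.1f} dBm in 1 MHz")   # ~ -114 dBm
```

Every decade of extra bandwidth raises the noise floor by 10 dB, which is part of why wider channels need more transmit power to hold the same SNR.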
So eventually you get your two hard limits: signal to noise and bandwidth. These are the iron-clad, no-such-thing-as-a-free-lunch type of limits.
How close you get to the theoretical limit is a matter of technology. A simple binary on/off modulation only exploits the top 6 dB of your signal-to-noise ratio. And you can only modulate your signal so quickly, as modulating the carrier spreads energy out around the carrier frequency, occupying your restricted bandwidth. So if you have a 100 kHz bandwidth, you might only be able to make 100,000 encoding changes to the signal per second. You need to squeeze your ability to modulate the signal into your bandwidth aggressively.
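The gap between simple modulation and the theoretical limit is easy to quantify. A sketch, assuming the 100 kHz band above, a hypothetical 20 dB SNR, and the rough rule of thumb of one symbol per hertz:

```python
import math

bandwidth_hz = 100e3                 # the 100 kHz example band
snr = 10 ** (20 / 10)                # assumed 20 dB SNR (illustrative)

shannon_limit = bandwidth_hz * math.log2(1 + snr)
ook_rate = bandwidth_hz * 1          # on/off keying: 1 bit per symbol

print(f"Shannon limit: {shannon_limit/1e3:.0f} kbit/s")  # ~666 kbit/s
print(f"Simple on/off: {ook_rate/1e3:.0f} kbit/s")       # 100 kbit/s
```

At this SNR, on/off keying leaves most of the channel's capacity on the table, which is the gap the denser modulation schemes below are designed to close.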
Once you realise that you can encode both amplitude and phase information, you swiftly realise that you are effectively encoding locations on the x-y plane (or the complex plane, depending on how you want to dissect the mathematics). Minimally you have four quadrants, so one modulation change can encode not two but four states: two bits per transition. Add two amplitude levels and you can put four points into each quadrant, giving 16 values per transition, and thus 4 bits per transition. This is the basis of QAM. The better your signal to noise, the more levels you can cram in, and the more information can be encoded. As above, if you can dynamically switch encodings you can ride the signal to noise to get the best utilisation of the channel. If you divide your allocated bandwidth into lots of sub-bands, you can adapt the modulation per sub-band, and thus squeeze your way around narrow lumps of interference, pushing the envelope at every corner.
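The quadrant-counting argument can be made concrete. This sketch builds square QAM constellations as points on the complex plane and counts bits per transition (the helper name is mine, not a standard API):

```python
import math
from itertools import product

def square_qam(m: int) -> list[complex]:
    """Return an m-point square QAM constellation as complex points."""
    side = int(math.isqrt(m))
    assert side * side == m, "m must be a perfect square"
    # Symmetric amplitude levels, e.g. [-3, -1, 1, 3] for 16-QAM.
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(i, q) for i, q in product(levels, levels)]

for m in (4, 16, 64):
    points = square_qam(m)
    print(f"{m}-QAM: {len(points)} points, {int(math.log2(m))} bits/symbol")
```

4-QAM is the four-quadrant case (2 bits per transition); 16-QAM puts four points in each quadrant (4 bits), exactly as the counting above suggests, and each doubling of levels per axis buys 2 more bits per transition at the cost of needing a better SNR.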
As to the energy-in-the-photons bit: yes, this is part of the intrinsic noise floor. But it is more complex than defining a fundamental quantisation limit below which information does not exist. The arrival of photons is stochastic, which means there is a distribution of arrivals over time, and, without going too deep into the weeds, this allows for information below the level of quantisation. It ends up being folded into the noise floor of the Shannon-Hartley theorem and doesn’t change things.