50% of the introduction of that wiki article should not be devoted to a discussion of bytes. Its prominence in the article is surely due to some editor getting very excited by the concept and amplifying it there. Also, “bytes” is not the unit you’d ever use; “bits” is. A byte is a construct stemming from computer engineering and has nothing fundamental about it.
Thermodynamic entropy measures (the natural logarithm of) the number of microstates accessible to a system whose other macroscopic thermodynamic observables are fixed. Conventionally, the entropy has a factor of the Boltzmann constant in it, but if you divide that out you have literally the log of the number of states the system could be in.
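In symbols, this is the standard Boltzmann form, with Ω the count of accessible microstates:

```latex
S = k_B \ln \Omega
\qquad\Longrightarrow\qquad
\frac{S}{k_B} = \ln \Omega
```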
If you divide the (fundamental) entropy by log(2) – the natural log of 2 – you will get the number of bits that offer the same number of microstates. Ten bits have 1024 possible microstates. In information theory, entropy is conventionally defined with base-2 logarithms, and one would say that the entropy of a random 10-bit system is, well, 10. A thermodynamic system with 1024 possible microstates would have entropy log_e(1024) ~= 6.93.
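A quick numerical check of that conversion, as a minimal Python sketch:

```python
import math

omega = 1024                          # microstates of a random 10-bit system
print(math.log2(omega))               # 10.0 bits (information-theoretic entropy)
print(math.log(omega))                # ~6.93 nats (dimensionless thermodynamic entropy)
print(math.log(omega) / math.log(2))  # dividing by ln(2) recovers 10.0 bits
```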
Tallying information and tallying states are both just tallying, so you certainly can talk about thermodynamic entropy in terms of bits. There are some applications where this could be worthwhile, but they are pretty niche.
Another specific complaint about that wiki page relates to the statement “If a system is challenged with a small amount of energy, then β describes the amount the system will randomize.” This sentence is mostly noise. “Challenged with a small amount of energy” doesn’t mean anything, and a system doesn’t “randomize” in the way they are trying to analogize.
To wit: Consider a system in information theory, say a string of four letters. If these four letters are to be drawn at random from the 26 letters of the alphabet, then you will need ceiling(log2(26^4)) = 19 bits to identify one string uniquely from another. Or, removing the “ceiling” function to just talk about the information content and not how you might represent that information, this system has log2(26^4) ~= 18.80 bits of information in it.
Consider the same system, except now say we know that each string of four letters will be a random valid English word, of which Google tells me there are 3996 (Scrabble dictionary). It will take only 12 bits to uniquely identify all the strings, as there are log2(3996) ~= 11.96 bits of information here.
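The arithmetic for both cases, sketched in Python (the 3996 word count is just the Scrabble figure quoted above):

```python
import math

# Four letters drawn uniformly from a 26-letter alphabet
n_random = 26 ** 4                     # 456,976 possible strings
print(math.log2(n_random))             # ~18.80 bits of information
print(math.ceil(math.log2(n_random)))  # 19 bits to represent one string

# Four letters constrained to form a valid English word
n_words = 3996                         # the Scrabble-dictionary count above
print(math.log2(n_words))              # ~11.96 bits of information
print(math.ceil(math.log2(n_words)))   # 12 bits to represent one word
```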
In information theory, then, an increase in entropy represents an increase in randomness. If we allow more randomness in our four-letter strings, the entropy is higher as there is more information contained. (If that last part sounds backwards, it’s because we colloquially think of “random” as “no (useful) information”, but in information theory, random things have the most information. It takes more information (bits) to convey random stuff than organized stuff.)
Switching to thermodynamic systems: If you add energy to or remove energy from a system, the number of accessible states may change. By and large, though, the number of states changes by having completely different states available, not by randomizing anything. Since you are still counting states, you could analogize this with information theory and say that the system is more random if there are more states available, but for most systems that isn’t going to be productive. The system is just…different. It’s in a different macrostate and thus different microstates are relevant. This isn’t what you would normally call “randomizing”.
Analogy: Take a committee whose members are 5 random female US senators. There are about 66k ways to form such a committee. If we change the rules such that now the committee must consist of 5 random male US senators, we suddenly have about 16M ways to form the committee. Entropy-wise we’ve gone up by about 50% (the log of the state count rises from ~11.1 to ~16.6), but I wouldn’t say we’ve randomized the committee. I’d say we’ve just changed the nature of the system, and the new system has completely different states available to it.
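The counts check out assuming a Senate split of 26 women and 74 men, which I’m inferring from the ~66k and ~16M figures; a quick Python sketch:

```python
import math

# Assumed split (inferred from the figures above): 26 female, 74 male senators
female_committees = math.comb(26, 5)   # 65,780  (~66k)
male_committees = math.comb(74, 5)     # 16,108,764  (~16M)

# Log of the state count goes up by ~50%
print(math.log(male_committees) / math.log(female_committees))  # ~1.50
```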
Returning to thermodynamic beta itself, the best you could do to get bits (not bytes!) into a verbal definition of the units is “the change in the number of bits needed to enumerate the available microstates of a system when a unit of energy is added to the system”. Trying to define the reciprocal (temperature) units as in the OP, you’d need something like “the change in the energy of a system required to change the number of bits needed to enumerate the available microstates of a system by one.” These are both rather tortured, especially the second one since the number of microstates is not generally an independent variable that one has any way to adjust, and in neither would I introduce the word “randomize”.
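For reference, the standard relation those definitions are verbalizing, with Ω the number of accessible microstates; the bits-per-energy form is just the same derivative divided by ln 2:

```latex
\beta = \frac{\partial \ln \Omega}{\partial E} = \frac{1}{k_B T},
\qquad
\frac{\beta}{\ln 2} = \frac{\partial \log_2 \Omega}{\partial E}
\quad \text{(bits per unit of energy)}
```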