Don’t ask me why I was thinking about this in the shower. I thought it would be interesting to compare the average distributions of 1s and 0s in man-made uses of binary to natural occurrences. I have a feeling it will be very near 50% for computer code, but who knows until you do an actual count. The only natural occurrence I can think of off the top of my head is the electrochemical impulses used in neural communication, but there are bound to be lots more.
Maybe no one has ever measured this because it’s pretty useless information itself =)
If you’re only studying binary integers, then 1s are going to have a slight advantage, since every binary integer other than 0 starts with a 1.
Beyond that, my gut instinct is that you’re correct: the distribution among numbers which appear in software will be close to 50-50. But there may be surprises. I recently learned about Benford’s law, which says that numbers drawn from “real-life” datasets are much more likely to have 1 as their leading digit than any other digit, no matter what the base. Roughly, this is because “real-life” numbers aren’t random: they’re much more likely to be products of other numbers, which skews the distribution. Benford’s law only talks about the leading digit, so it doesn’t have anything to say about base-two numbers (other than providing another argument as to why binary is an excellent choice of base for floating-point arithmetic), but I wouldn’t be surprised to learn that there are other similar effects on the distribution of the trailing digits.
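If you want to see the product effect for yourself, here’s a rough Python sketch (the ranges, factor count, and trial count are arbitrary choices, just for illustration): it multiplies a handful of uniformly drawn factors and tallies the leading decimal digit of each product.

    import math
    import random
    from collections import Counter

    # Multiply a few uniformly drawn factors and tally the leading decimal
    # digit of each product. The ranges, factor count, and trial count are
    # arbitrary; the point is just that the leading digits skew toward 1.
    trials = 100_000
    counts = Counter()
    for _ in range(trials):
        product = 1.0
        for _ in range(5):
            product *= random.uniform(1, 1000)
        leading = int(product // 10 ** math.floor(math.log10(product)))
        counts[leading] += 1

    for digit in range(1, 10):
        print(digit, round(counts[digit] / trials, 3))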
Orbifold, you brought up exactly what I was thinking of when I read this thread. Of course, in binary the Benford probabilities say that 1 shows up as the first digit every time, but that doesn’t help much with the other places. For what it’s worth, in a base-3 system 1 shows up as the first digit about 2/3 of the time. In our base-10 system it shows up as the first digit just under 1/3 of the time (30.1%).
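For anyone who wants the exact figures, the general Benford formula gives the probability of leading digit d in base b as log_b(1 + 1/d), and a couple of lines of Python reproduce the numbers above:

    import math

    # Benford probability that the leading digit is d in base b: log_b(1 + 1/d)
    def benford(d, base):
        return math.log(1 + 1 / d, base)

    print(benford(1, 3))    # ~0.631, i.e. roughly 2/3 in base 3
    print(benford(1, 10))   # ~0.301, the 30.1% figure in base 10
    print(benford(1, 2))    # 1.0 -- in binary the leading digit is always 1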
On a computer, where memory is allocated in bytes, binary numbers appear left-padded with zeros to fill out the space they’re given. That is, if an integer is a 4-byte value, it’s going to be a 32-digit binary number regardless of its value, so even if the value is 1 (binary 1), that’s 31 zeros and a single one. If you’re going to do statistics on the frequency of zeros and ones, you’d have to start out by saying whether you’re going to ignore leading zeros or not.
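To make that concrete, here’s a rough sketch of that kind of count, treating every value as a fixed 32-bit field (the sample values are just made up):

    # Count 1s and 0s in fixed-width 32-bit representations, leading zeros
    # included. The sample values are arbitrary; small values drag the ratio
    # toward 0 because of all the padding.
    values = [1, 7, 42, 1000, 123_456_789]
    ones = sum(bin(v).count("1") for v in values)
    total_bits = 32 * len(values)
    zeros = total_bits - ones
    print(f"ones: {ones}, zeros: {zeros}, fraction of 1s: {ones / total_bits:.2f}")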
There would be a bias toward leading ones even if the numbers aren’t determined as products. If you assume that most “natural” numbers are drawn from some uniform distribution starting at 1 (or 0), then you’ll also see a bias towards leading ones. For instance, if you have a distribution from 1 to 999, you’ll get about the same number of each leading digit. But if you have a distribution from 1 to 500, you’ll see many more 1s, 2s, 3s, and 4s than you will 5s, 6s, 7s, 8s, or 9s. In a distribution from 1 to 300, everything will be penalized except for 1s and 2s, and from 1 to 200, over half of your numbers (111 choices out of 200) will start with 1. One is the only digit which can never be penalized in this way, no matter what your distribution: at worst, it’ll be just as likely as any other.
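You can check the 111-out-of-200 figure (and the general penalty pattern) with a few lines of Python:

    from collections import Counter

    # Tally leading decimal digits over a uniform range 1..n.
    def leading_digit_counts(n):
        return Counter(int(str(k)[0]) for k in range(1, n + 1))

    print(leading_digit_counts(200)[1])  # 111 of the numbers 1..200 start with 1
    print(leading_digit_counts(500))     # 1-4 get 111 each; 5 gets 12; 6-9 get 11 each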
On the OP’s question, though, the complication is that most binary sequences in nature aren’t really number sequences. If I’m looking at the state of a particular neuron as a function of time, for instance, I could say that I’m going to call neuron firing = 1, neuron resting = 0, and get a number sequence that way. But I could just as well call neuron firing = 0, and neuron resting = 1. Neither way is really more valid than the other, so I won’t really be able to say that nature likes 1 better than 0, or vice versa.
If you’re talking about scanning through the contents of a computer’s RAM at some predetermined point and counting 1s and 0s, you’ll find a whole lot more 0s. I did this once when I was writing some diagnostic software. Obviously it will depend on whether the instruction set for the particular processor you’re using is 1-loaded or 0-loaded, but the data leans heavily toward 0 for several reasons, most of them related to the 0-padding used to fill out whole words.
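For what it’s worth, the counting part is trivial; something along these lines over a raw dump does it (the buffer here is just stand-in data, not a real dump):

    # Count set bits in a raw byte buffer, e.g. a memory dump read from a file.
    # The buffer below is stand-in data for illustration only.
    buffer = bytes([0x00, 0x00, 0x41, 0xFF, 0x07, 0x00, 0x20, 0x00])
    ones = sum(bin(b).count("1") for b in buffer)
    zeros = 8 * len(buffer) - ones
    print(f"{ones} ones vs {zeros} zeros")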
I can think of a few reasons for a bias when scanning a computer’s memory.
First, if the hardware or the operating system initializes unused memory to all zeros (or all ones), then there’s a pretty big bias to start.
Second, there’s the issue of padding with zeros to fill out byte or word boundaries (this includes data and addresses: a 64-bit processor will have a lot of zeros in most of its addresses, I’m guessing).
And then there’s the initial bit of standard seven-bit ASCII characters stored in eight-bit bytes, which will always be zero (though that’s perhaps balanced by the fact that letters almost all have the 6th bit set to 1).
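A quick way to see both of those ASCII effects on a piece of text (the sample string is arbitrary):

    # Bit balance of plain ASCII text stored one character per byte: the top
    # bit of every byte is 0, while letters also carry the 0x40 bit.
    # The sample string is arbitrary.
    text = "The quick brown fox jumps over the lazy dog".encode("ascii")
    ones = sum(bin(b).count("1") for b in text)
    zeros = 8 * len(text) - ones
    print(f"{ones} ones, {zeros} zeros ({ones / (8 * len(text)):.0%} ones)")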
Depends on what you consider the binary expansion. I’d say that the 0s win handily, since all binary representations of integers start with an infinite string of leading zeroes.
Then again, if you move to real numbers they all end with an infinite tail of 1s…
Ah, it sounded like you were saying that all the reals ended in infinite strings of ones. And of course integers don’t necessarily end that way; that’s just one of two equally good ways to write them.