Please ID this mathematical/statistical phenomenon: small numerals more common

In everyday numerical data (football scores, phone numbers, bookkeeping entries, etc.), it may seem intuitive to assume that the distribution of digits (0, 1, 2, 3… 9) is uniform - that there are equal numbers of each numeral.

But there is an observation that lower-value digits occur more frequently than higher-value digits.

In other words, 0 occurs more often than 1, 1 more often than 2, 2 more often than 3… you get the idea.

What is this phenomenon called?

Is this true? I’ve never noticed this. I could guess that if you’re talking about scores and such, it makes sense that lower numbers would be more common than higher numbers simply because you have to go through the lower numbers to get to the higher numbers.

You might be thinking of Benford’s Law?

It applies to the leading digits in lists of numbers from such sources. Called the “law of anomalous numbers” or “Benford’s law”:

http://en.wikipedia.org/wiki/Benford's_law
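For anyone who wants the actual numbers: the usual statement of the law is that a leading digit d (1 through 9; a leading digit is never 0) turns up with probability log10(1 + 1/d). Here is a minimal sketch that just tabulates that formula. The function name is my own, not from any library.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

So roughly 30% of leading digits should be 1, falling to under 5% for 9.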

It really isn’t that startling when you realize what’s going on - it’s a consequence of the way we represent numbers.

This has been used to justify speculative run-ups such as when the Dow headed toward 10,000. The theory was that the Dow would likely increase to at least 10,000 to maintain the frequency of low digits. Just sounds like the Gambler’s Fallacy to me. It was also predicted that crossing the 15,000 mark would take the longest time because it was a shorter distance to go down and get back to the lower digits again than to go up to get there.

Yes. Thank you.

Football scores, like other arbitrary scoring systems or systems in which data is not continuous, shouldn’t count. Soccer games often have a 1 on the losing or winning side because it’s goddamn hard to get higher scores; hence fewer nines. Tennis, with a different base, is only scored in 15s, the game sets figures are only 0-4, and the final scores will be different depending on the universe of players with different skills.

An interesting fact I learned some months ago reading SDMB is that financial numbers from Greece were found to be fictional because they violated Benford’s Law.

Phone numbers, mentioned in the OP, won’t follow Benford’s law. They aren’t assigned sequentially, but rather in blocks. Street addresses are typically assigned sequentially, more-or-less, so should follow the rule.

You also need a sufficient spread of numbers for the law to hold. Human adult height, measured in inches, will have a huge number of sixes and sevens for the leading digit, but not much in the way of ones.
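The height example is easy to simulate. This is only a sketch: the mean of 67 inches and standard deviation of 4 inches are numbers I picked for illustration, not real survey data, and leading_digit is a helper written for this post.

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero finite number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

random.seed(0)  # fixed seed so the run is repeatable
# Assumption for illustration only: heights ~ normal(mean 67 in, sd 4 in).
heights = [random.gauss(67, 4) for _ in range(10_000)]
counts = Counter(leading_digit(h) for h in heights)

for d in range(1, 10):
    print(d, counts[d])
```

Nearly everything lands on 6 or 7 and nothing at all on 1, because the data covers only a fraction of one order of magnitude.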

An interesting exception lies in common street names. Wikipedia says:

It is easy to understand why there are more "6th Street"s than "7th Street"s. It is because every town with seven numbered streets will have a “6th Street”, but towns with only six numbered streets will not have a “7th Street”.

The anomaly is “1st Street”. Why are there so few? The simple answer I’ve heard is that so many of them take “Main Street” (or similar) as the default, or actually begin as “1st Street” but later got renamed as “JFK Boulevard” or whatever.

One I wonder about, is Bozeman, MT, which has a 3rd avenue (and 4th, 5th, etc. up through 20-something), but which does not have a 1st or a 2nd. The street that would be 2nd is Grand Avenue, and the street that would be 1st is Willson Avenue [sic, named after a local philanthropist, not the president]. Now, I could understand if it originally went Grand - 2nd - 3rd - etc., and 2nd was later renamed after Willson, but it’s the other way around.

Try looking for Sixth Avenue in NYC. They rename the streets but everybody still gives out addresses by the old name.

Also, how often do you see the 2nd National Bank, or the 2nd Church of Christ?

The crucial thing is that the series be open-ended. My address is three digits, one of my sons has a two-digit address and the other one’s is four digits. Moreover, owing to the peculiarities of King County, WA, the latter son’s neighbor has a five-digit number (they are on a curve and the one house is slightly less than 45 deg from the horizontal and the other slightly greater).

Phone numbers don’t work because they are all exactly 7 digits (or all are 10 digits if you include the area code). One way to justify Benford’s law is that it is the only distribution of leading digits that could be scale invariant. Take a list of numbers that satisfies Benford’s law and multiply all the numbers by 7; the new list will also satisfy it. No other distribution has this property. Greece’s financial data would have given different distributions if denominated in drachmas (or dollars) than in euros, and that should have been a red flag that the figures were invented.
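The multiply-by-7 claim can be tried out directly. A rough sketch, assuming we stand in for "a list that satisfies Benford’s law" with random numbers spread evenly on a log scale over six orders of magnitude; the helper names are mine.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero finite number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_shares(values):
    """Share of values with leading digit 1..9, as a list of nine floats."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

random.seed(1)  # fixed seed so the run is repeatable
# Assumed stand-in data: log-uniform between 1 and 1,000,000.
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]
scaled = [7 * v for v in data]  # same quantities in a different "currency"

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for d, (a, b, c) in enumerate(zip(digit_shares(data), digit_shares(scaled), benford), start=1):
    print(d, f"{a:.3f}", f"{b:.3f}", f"{c:.3f}")
```

The three columns (original, multiplied by 7, Benford’s formula) should agree to within sampling noise. Swap in a list of uniformly distributed numbers instead and the first two columns stop agreeing with each other.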

Well, there’s Fifth-Third bank…

People say this, but why should I expect the data to be scale invariant? Why should I expect Greece’s financial data to have the same distribution of numerical representations in drachmas as in dollars as in Euros? That seems a ridiculous expectation.

That’s not really a good test of the data, though. You’re only looking at a subset of the data, and it’s a data set that isn’t even a good test to begin with because it is sequential and arbitrary.

Benford’s law would predict that 1st, 10th - 19th, 100th - 199th, 1000th - 1999th are the most common. Again, it might not hold true for streets even in that situation, but it doesn’t really matter that 1st, taken alone, is not individually the most common.

I believe it’s because of exponential growth, which looks uniform on a logarithmic scale.

If you have money in a bank account, it takes x time to double from 1 to 2. It takes x time to go from 2 to 4 and then 4 to 8. This exponential growth is scale invariant.

If I start at 5, it takes x time to get to 10. And it still takes x time to go from 10 to 20, so we’re still spending much more time with numbers that start with 1. No matter what you multiply by, you’ll always get more numbers starting with 1 in a sequence that grows exponentially.
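The bank-account argument above can be sketched in a few lines. The 1% growth per period and the 5000 periods are arbitrary choices of mine; the starting balance of 5 is the one from the example.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero finite number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

balance = 5.0   # starting balance, as in the example
rate = 0.01     # hypothetical growth per period
periods = 5000  # long enough to pass through many orders of magnitude

counts = Counter()
for _ in range(periods):
    counts[leading_digit(balance)] += 1  # record the digit we are "sitting on"
    balance *= 1 + rate

# Observed share of periods per leading digit, next to Benford's formula.
for d in range(1, 10):
    print(d, f"{counts[d] / periods:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

The balance spends close to 30% of its time with a leading 1, regardless of where it started.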

Not all financial numbers are directly controlled by exponential growth, but the experts tell me the pattern holds. I’m willing to believe them, especially since this is a statistical analysis that allows for the occasional exception. (To rephrase that for clarity: if we say Greece’s numbers look fake, we can only say so with 95% or 99% certainty, or whatever confidence level we choose. We will never be 100% sure they’re fake because there’s always the chance that some set of true numbers does defy Benford’s law.)
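To make the confidence-level point concrete, here is one common way to run such a check, a chi-square goodness-of-fit test. I don’t know what test was actually applied to Greece’s figures, so treat this as a sketch only. Both sets of digit counts below are invented to exercise the function, and the cutoffs are the standard chi-square critical values for 8 degrees of freedom.

```python
import math

def benford_chi_square(observed_counts):
    """Chi-square statistic against Benford's law.

    observed_counts[i] is the number of values whose leading digit is i + 1.
    """
    total = sum(observed_counts)
    stat = 0.0
    for d, observed in enumerate(observed_counts, start=1):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed - expected) ** 2 / expected
    return stat

# Critical values for 9 digit categories, i.e. 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.507
CRITICAL_1_PERCENT = 20.090

# Hypothetical counts, made up for this example (1000 values each).
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
uniform_like = [111, 111, 111, 111, 111, 111, 111, 111, 112]

for name, counts in [("benford_like", benford_like), ("uniform_like", uniform_like)]:
    stat = benford_chi_square(counts)
    print(name, f"{stat:.1f}", "suspicious at 1%" if stat > CRITICAL_1_PERCENT else "no red flag")
```

A statistic above the cutoff only means the data would be unlikely if Benford’s law held; it is never proof of fakery.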

Yes, it’s because a lot of kinds of data (including financial data) are based on some kind of exponential growth. It does not depend on being base 10: it will work in any base. (In the extreme case, with base 2, all numbers start with 1 because you ignore leading zeros.)
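To put numbers on the any-base claim, here is a minimal sketch assuming the usual generalization of the formula to base b: leading digit d (from 1 to b-1) has probability log_b(1 + 1/d). The function name is made up for this post.

```python
import math

def benford_probability_base(d: int, base: int) -> float:
    """Expected share of leading digit d (1..base-1) when numbers are written in `base`."""
    return math.log(1 + 1 / d, base)

# Base 2: the only possible leading digit is 1, so its share is 100%.
print(benford_probability_base(1, 2))

# Base 16: fifteen possible leading digits, still skewed toward the small ones.
for d in range(1, 16):
    print(format(d, "X"), f"{benford_probability_base(d, 16):.3f}")
```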

One way to look at it is this way: for any number n, you expect the same frequency for the interval (n,2n) as for (2n,4n), (3n,6n), (4n,8n) and (5n,10n). So, if the probability of having leading digit 1 is P, then P is also the probability of:

  • leading digit 2 or 3
  • leading digit 3, 4 or 5
  • leading digit 4, 5, 6 or 7
  • leading digit 5, 6, 7, 8 or 9.
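The list above is easy to check against the standard formula, P(d) = log10(1 + 1/d). A quick sketch (the helper name is mine): each group of digits should add up to the same probability as a leading 1, namely log10(2).

```python
import math

def p_leading(digits):
    """Combined Benford probability that the leading digit is one of `digits`."""
    return sum(math.log10(1 + 1 / d) for d in digits)

p1 = p_leading([1])
groups = [[2, 3], [3, 4, 5], [4, 5, 6, 7], [5, 6, 7, 8, 9]]

print([1], round(p1, 6))
for g in groups:
    print(g, round(p_leading(g), 6))
```

Each group corresponds to an interval whose upper end is twice its lower end, which is why every one of them comes out to log10(2), about 0.30103.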

Didn’t get this. If you’re saying that only tourists and postmen and people who work between the forties and 72nd on sixth say that stupid name, you’re absolutely correct.

Next thing you know they’ll be calling the Triboro RFK bridge.

Strictly speaking, Benford’s law doesn’t apply to log-uniform distributions, but to log-normal distributions. That is, if you take all your numbers, and take the logs of all of them, the logs of the numbers would be distributed along the familiar Gaussian bell curve.