Please ID this mathematical/statistical phenomenon: small numerals more common

F.U.Shakespeare · August 10, 2012, 2:44am

In everyday numerical data (football scores, phone numbers, bookkeeping entries, etc.), it may seem intuitive to assume that the distribution of digits (0, 1, 2, 3… 9) is uniform - that there are equal numbers of each numeral.

But there is an observation that lower-value digits occur more frequently than higher-value digits.

In other words, 0 occurs more often than 1, 1 more often than 2, 2 more often than 3… you get the idea.

What is this phenomenon called?

audiobottle · August 10, 2012, 2:53am

Is this true? I’ve never noticed this. I could guess that if you’re talking about scores and such, it makes sense that lower numbers would be more common than higher numbers simply because you have to go through the lower numbers to get to the higher numbers.

Bytegeist · August 10, 2012, 3:00am

You might be thinking of Benford’s Law?

yabob · August 10, 2012, 3:01am

It applies to the leading digits in lists of numbers from such sources. It is called the “law of anomalous numbers” or “Benford’s law”.

It really isn’t that startling when you realize what’s going on - it’s a consequence of the way we represent numbers.
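
For reference, the usual statement of Benford's law is that a leading digit d turns up with probability log10(1 + 1/d). A minimal Python sketch of that formula follows; the function name `benford_prob` is only illustrative and does not come from any library.

```python
import math

def benford_prob(d: int) -> float:
    """Benford's-law probability that the leading decimal digit equals d (1-9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

if __name__ == "__main__":
    # Print the expected share of each leading digit.
    for d in range(1, 10):
        print(d, f"{benford_prob(d):.1%}")
```

Digit 1 comes out at about 30.1% and digit 9 at about 4.6%, and the nine shares add up to 1.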

TriPolar · August 10, 2012, 3:15am

This has been used to justify speculative run ups such as when the Dow headed toward 10,000. The theory was that the Dow would likely increase to at least 10,000 to maintain the frequency of low digits. Just sounds like the Gambler’s Fallacy to me. It was also predicted that crossing the 15,000 mark would take the longest time because it was shorter distance to go down and get back to the lower digits again than to go up to get there.

F.U.Shakespeare · August 10, 2012, 3:27am

Yes. Thank you.

Leo_Bloom · August 10, 2012, 5:29am

Football scores, like other arbitrary scoring systems or systems in which data is not continuous, shouldn’t count. Soccer games often have a 1 on one side, on the losing or winning end, because it’s goddamn hard to get higher scores; hence fewer nines. Tennis, with a different base, is only scored in 15s, the game sets figures are only 0-4, and the final scores will be different depending on the universe of players with different skills.

septimus · August 10, 2012, 6:37am

An interesting fact I learned some months ago reading SDMB, is that financial numbers from Greece were found to be fictional because they violated Benford’s Law.

ZenBeam · August 10, 2012, 11:43am

Phone numbers, mentioned in the OP, won’t follow Benford’s law. They aren’t assigned sequentially, but rather in blocks. Street addresses are typically assigned sequentially, more-or-less, so should follow the rule.

You also need a sufficient spread of numbers for the law to hold. Human adult height, measured in inches, will have a huge number of sixes and sevens for the leading digit, but not much in the way of ones.
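
The height example can be sketched numerically. The code below assumes, purely for illustration, that adult height in inches is roughly normal with mean 67 and standard deviation 4 (made-up round numbers, not measured data), and computes how much probability falls on each leading digit.

```python
from statistics import NormalDist

# Assumed, illustrative parameters: adult height ~ Normal(mean 67 in, sd 4 in).
heights = NormalDist(mu=67, sigma=4)

def leading_digit_share(dist: NormalDist, digit: int) -> float:
    """Probability that a value in [1, 1000) has the given leading decimal digit."""
    total = 0.0
    for scale in (1, 10, 100):  # ranges such as [6, 7), [60, 70), [600, 700)
        low, high = digit * scale, (digit + 1) * scale
        total += dist.cdf(high) - dist.cdf(low)
    return total

shares = {d: leading_digit_share(heights, d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, share in shares.items():
        print(d, f"{share:.4f}")
```

Under those assumptions nearly all of the probability lands on leading digits 6 and 7, and essentially none on 1, because the values span far less than a factor of ten.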

Keeve · August 10, 2012, 2:18pm

An interesting exception lies in common street names. Wikipedia says:

It is easy to understand why there are more "6th Street"s than "7th Street"s. It is because every town with seven numbered streets will have a “6th Street”, but towns with only six numbered streets will not have a “7th Street”.

The anomaly is “1st Street”. Why are there so few? The simple answer I’ve heard is that so many of them take “Main Street” (or similar) as the default, or actually begin as “1st Street” but later got renamed as “JFK Boulevard” or whatever.

Chronos · August 10, 2012, 4:56pm

One I wonder about, is Bozeman, MT, which has a 3rd avenue (and 4th, 5th, etc. up through 20-something), but which does not have a 1st or a 2nd. The street that would be 2nd is Grand Avenue, and the street that would be 1st is Willson Avenue [sic, named after a local philanthropist, not the president]. Now, I could understand if it originally went Grand - 2nd - 3rd - etc., and 2nd was later renamed after Willson, but it’s the other way around.

TriPolar · August 10, 2012, 5:18pm

Try looking for Sixth Avenue in NYC. They rename the streets but everybody still gives out addresses by the old name.

Also, how often do you see the 2nd National Bank, or the 2nd Church of Christ?

Hari_Seldon · August 10, 2012, 11:29pm

The crucial thing is that the series be open-ended. My address is three digits, one of my sons has a two-digit address and the other one’s is four digits. Moreover, owing to the peculiarities of King County, WA, the latter son’s neighbor has a five-digit number (they are on a curve and the one house is slightly less than 45 deg from the horizontal and the other slightly greater).

Phone numbers don’t work because they are all exactly 7 digits (or all are 10 digits if you include the area code). One way to justify Benford’s law is that it is the only distribution of leading digits that is scale invariant. Take a list of numbers that satisfies Benford’s law and multiply all the numbers by 7; the new list will also satisfy it. No other distribution has this property. Greece’s financial data would have given different distributions if denominated in drachmas (or dollars) than in euros, and that should have been a red flag that the figures were invented.
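
The multiply-by-7 claim is easy to try out. The sketch below uses the powers of 2 as a stand-in for a Benford-like list (my choice of example, not something from this thread) and the integers 1..999 as a list with evenly spread leading digits, then compares leading-digit shares before and after scaling.

```python
from collections import Counter

def leading_digit_freqs(values):
    """Return {digit: share} for the leading decimal digit of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Powers of 2 are a common example of a sequence that closely follows Benford's law.
benford_like = [2 ** n for n in range(1, 1001)]
# Evenly spread integers have equal shares of each leading digit.
uniform_like = list(range(1, 1000))

if __name__ == "__main__":
    for name, data in (("powers of 2", benford_like), ("1..999", uniform_like)):
        before = leading_digit_freqs(data)
        after = leading_digit_freqs([7 * v for v in data])
        print(name, f"digit-1 share before: {before[1]:.3f}, after x7: {after[1]:.3f}")
```

The powers of 2 keep a digit-1 share close to 30% after scaling, while the 1..999 list moves from about 11% to about 16%.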

ZenBeam · August 11, 2012, 12:08am

Well, there’s Fifth-Third bank…

Indistinguishable · August 11, 2012, 12:12am

People say this, but why should I expect the data to be scale invariant? Why should I expect Greece’s financial data to have the same distribution of numerical representations in drachmas as in dollars as in Euros? That seems a ridiculous expectation.

dracoi · August 11, 2012, 12:53am

That’s not really a good test of the data, though. You’re only looking at a subset of the data, and it’s a data set that isn’t even a good test to begin with because it is sequential and arbitrary.

Benford’s law would predict that 1st, 10th - 19th, 100th - 199th, 1000th - 1999th are the most common. Again, it might not hold true for streets even in that situation, but it doesn’t really matter that 1st, taken alone, is not individually the most common.

dracoi · August 11, 2012, 1:00am

I believe it’s because of exponential growth.

If you have money in a bank account, it takes x time to double from 1 to 2. It takes x time to go from 2 to 4 and then 4 to 8. This exponential growth is scale invariant.

If I start at 5, it takes x time to get to 10. And it still takes x time to go from 10 to 20, so we’re still spending much more time with numbers that start with 1. No matter what you multiply by, you’ll always get more numbers starting with 1 in a sequence that grows exponentially.

Not all financial numbers are directly controlled by exponential growth, but the experts tell me the pattern holds. I’m willing to believe them, especially since this is a statistical analysis that allows for the occasional exception. (To rephrase that for clarity: if we say Greece’s numbers look fake, we can only say so with 95% or 99% certainty, or whatever confidence level we choose. We will never be 100% sure they’re fake because there’s always the chance that some set of true numbers does defy Benford’s law.)
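
The bank-account argument can be sketched in a few lines. The growth rate of 1% per step and the six factors of ten are arbitrary assumptions for illustration, and the helper names are hypothetical.

```python
def leading_digit(x: float) -> int:
    """Leading decimal digit of a positive number, read off scientific notation."""
    return int(f"{x:e}"[0])

def share_of_steps_with_digit(start: float, digit: int,
                              rate: float = 0.01, decades: int = 6) -> float:
    """Grow `start` by `rate` per step across `decades` factors of ten and
    return the fraction of steps whose balance has the given leading digit."""
    balance, limit = start, start * 10 ** decades
    hits = steps = 0
    while balance < limit:
        hits += leading_digit(balance) == digit  # True counts as 1
        steps += 1
        balance *= 1 + rate
    return hits / steps

if __name__ == "__main__":
    for start in (1.0, 5.0):
        print(start, f"{share_of_steps_with_digit(start, 1):.3f}")
```

Whether the balance starts at 1 or at 5, roughly 30% of the steps are spent on a balance that begins with 1.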

Giles · August 11, 2012, 1:09am

Yes, it’s because a lot of kinds of data (including financial data) are based on some kind of exponential growth. It does not depend on being base 10: it will work in any base. (In the extreme case, with base 2, all numbers start with 1 because you ignore leading zeros.)
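
The base-independence point can be written down by generalizing the usual formula to log_b(1 + 1/d) for a base b. This is a sketch with a hypothetical function name, not code from any library.

```python
import math

def benford_prob_base(d: int, base: int) -> float:
    """Benford probability of leading digit d (1 .. base-1) for numbers written in `base`."""
    if base < 2 or not 1 <= d < base:
        raise ValueError("need base >= 2 and 1 <= d < base")
    return math.log(1 + 1 / d, base)

if __name__ == "__main__":
    for base in (2, 8, 10, 16):
        print(base, [round(benford_prob_base(d, base), 3) for d in range(1, base)])
```

In base 2 the only possible leading digit is 1 and its probability comes out as 1, matching the extreme case described above.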

One way to look at it is this way: for any number n, you expect the same frequency for the interval (n,2n) as for (2n,4n), (3n,6n), (4n,8n) and (5n,10n). So, if the probability of having leading digit 1 is P, then P is also the probability of:

leading digit 2 or 3
leading digit 3, 4 or 5
leading digit 4, 5, 6 or 7
leading digit 5, 6, 7, 8 or 9.
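
These five equal probabilities can be checked against the usual Benford formula P(D = k) = log10(1 + 1/k), which is assumed here rather than derived in this thread. Writing D for the leading digit, the sum over each group telescopes:

```latex
\[
P(d \le D \le 2d-1)
  = \sum_{k=d}^{2d-1} \log_{10}\!\left(\frac{k+1}{k}\right)
  = \log_{10}\!\left(\frac{2d}{d}\right)
  = \log_{10} 2 \approx 0.301,
  \qquad d = 1, 2, 3, 4, 5.
\]
```

So each of the listed groups carries the same probability as leading digit 1 alone.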

Leo_Bloom · August 11, 2012, 5:06am

Didn’t get this. If you’re saying that only tourists and postmen and people who work between the forties and 72nd on sixth say that stupid name, you’re absolutely correct.

Next thing you know they’ll be calling the Triboro RFK bridge.

Chronos · August 11, 2012, 2:52pm

Strictly speaking, Benford’s law doesn’t apply to log-uniform distributions, but to log-normal distributions. That is, if you take all your numbers, and take the logs of all of them, the logs of the numbers would be distributed along the familiar Gaussian bell curve.
