Please ID this mathematical/statistical phenomenon: small numerals more common

Why do you say that? Benford’s law is quite literally the statement that numbers have logarithms modulo log(10) which are uniformly distributed.
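
To spell out the equivalence: x has leading digit d exactly when the base-10 logarithm of x, taken mod 1, lands in [log10(d), log10(d+1)); if those residues are uniformly distributed, that interval has probability log10(d+1) − log10(d) = log10(1 + 1/d), which for d = 1 is the familiar 30.1%.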

Now, as it turns out, data which is normally distributed (and thus not at all uniformly distributed) becomes very nearly uniformly distributed when you take its residue modulo a value which is not very high in comparison to its standard deviation. (In other words, the fractional component of normally distributed data is approximately uniformly distributed so long as the variance isn’t too low*).
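
Here’s a quick numerical sketch of that (numpy; the mean, sigmas, and bin count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

for sigma in [0.1, 0.5, 2.0]:
    x = rng.normal(loc=3.7, scale=sigma, size=100_000)
    frac = x % 1.0  # residue modulo 1
    # compare the histogram of the fractional part against the flat density 1
    hist, _ = np.histogram(frac, bins=10, range=(0, 1), density=True)
    print(sigma, round(abs(hist - 1.0).max(), 3))
```

Once the modulus (1 here) is small next to the standard deviation, the deviation from uniform collapses toward sampling noise.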

So log-uniform data satisfies Benford’s law exactly (though the condition needed for satisfying Benford’s law is not quite full log-uniformity; there’s a modulus involved). And log-normal data does not satisfy Benford’s law exactly, but Benford’s law will provide a good approximation if the standard deviation of the logarithm is significantly larger than log(10).
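
A sketch of that approximation in action (numpy again; sigma below is the standard deviation of the natural log, so log(10) ≈ 2.3 is the yardstick):

```python
import numpy as np

rng = np.random.default_rng(0)
digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)  # Benford's predicted first-digit frequencies

for sigma in [0.3, 3.0]:  # std dev of ln(x), to be compared with log(10) ~ 2.3
    x = rng.lognormal(mean=0.0, sigma=sigma, size=200_000)
    lead = (x / 10.0 ** np.floor(np.log10(x))).astype(int)  # first digit of each sample
    freq = np.bincount(lead, minlength=10)[1:10] / len(x)
    print(sigma, round(abs(freq - benford).max(), 4))
```

The narrow distribution misses badly; the wide one matches Benford to within sampling error.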

[*: An excellent account of this fact via Fourier theory (and its applicability to explaining Benford’s Law) can be found in Chapter 34 of The Scientist and Engineer’s Guide to Digital Signal Processing, available here: http://www.dmae.upm.es/Webperson

In short, a random variable’s residue modulo some period is described by a periodic probability density function, whose Fourier series is given by the values of the random variable’s characteristic function at the multiples of the period. Thus, it will be uniformly distributed just in case the original variable’s characteristic function is zero at the nonzero multiples of the period.

In particular, if a random variable is normally distributed, then its characteristic function is Gaussian centered at 0, with dispersion parameter inversely proportional to the standard deviation (remember, we’re looking at the characteristic function, not the density function). Thus, so long as the standard deviation is reasonably large relative to the sampling period, the characteristic function decays so quickly that its values at the nonzero multiples of the sampling period will be nearly zero, and so the random variable’s residue modulo the sampling period will be nearly uniformly distributed.]
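
To put rough numbers on that decay (a small sketch; the period and sigmas are arbitrary choices): a normal variable’s characteristic function has magnitude exp(−σ²t²/2), so at the nonzero multiples of 2π/T:

```python
import numpy as np

T = 1.0  # the period we're reducing modulo
k = np.arange(1, 4)  # first few nonzero multiples

for sigma in [0.2, 0.5, 1.0]:
    # |characteristic function| at t = 2*pi*k/T, for a normal variable
    mag = np.exp(-0.5 * (sigma * 2 * np.pi * k / T) ** 2)
    print(sigma, mag)
```

Already at σ = T, the k = 1 coefficient is exp(−2π²) ≈ 3 × 10⁻⁹, so the wrapped density is uniform to within a few parts per billion.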

I crack up whenever I pass one of those. (Toronto) I picture them always having to explain it: “No, we’re the fifth Third Bank. Do you see? It’s like there are quite a few Third Banks, and we’re the fifth one to open. Let me put it another way…”

Or rather, I should say, if the standard deviation of the logarithm is large relative to log(10) [i.e., the standard deviation of the base 10 logarithm is large]. My previous wording implied that it specifically mattered whether this ratio was larger than 1. The larger the ratio, the better the approximation, but, as always, there’s no specific cut-off point…

But on the other hand, with a uniform distribution (of any sort, log or otherwise), you have to specify what range you’re uniform over (and if the answer is “over an infinite range”, then you should be prepared to explain the nonstandard mathematical framework you’re using to make that legal). And you could certainly contrive distributions which are log-uniform over some range but which do not follow Benford’s law.

Like I said, it’s the logarithms modulo log(10) which need to be uniformly distributed [over the finite range from 0 to log(10)]. No nonstandardness needed.

But I see your point that log-uniformity over an arbitrary finite range isn’t sufficient; if a log-uniform distribution over some range is to satisfy Benford’s law, that range needs to have length a multiple of log(10). [Or possibly an infinite range, using only a finitely additive concept of distribution.]

Still, Benford’s law is quite literally a claim that logarithms modulo log(10) are uniformly distributed.

ignore this post

I keep writing this thing, and then not being sure, and backing off… perhaps someday

Well, whatever. I’ll write it again, and then you can help me puzzle through whether it’s flawed:

And if data actually satisfied Benford’s law not just in base ten but in arbitrary bases (as would be expected if it were actually an intrinsic natural law and not somehow a cultural artifact), then its logarithm would have to be uniform modulo log(b) for arbitrarily large b, and thus uniform simpliciter over an infinite range, mathematical warts and all.

I’ll buy it.

Whoops, the full link was meant to be to here.

I didn’t say you should expect it. It just happens to be true. I once read a paper giving a strong argument why it should be so. I think you are a mathematician, so I will summarize the argument. If you plot, against n, the number of numbers up to n that begin with 1 (or 2, or 3, …), you obviously get a sawtooth, and the ratio to n does not approach a limit. So you do the obvious thing: you apply Cesàro summation to smooth it. It still doesn’t approach a limit, but it gives a distribution much closer to Benford’s law. So do it again: apply Cesàro summation to the second sequence. Much smoother, really good enough for Benford’s law, but it still doesn’t converge. Do it again. And again. Take the limit of the iterated Cesàro sums as the number of summations goes to infinity. That limit is precisely Benford’s law.
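
Here’s a quick numerical sketch of that procedure, if I’ve reconstructed it correctly (N and the number of smoothing rounds are arbitrary):

```python
import math

N = 10**6

def leading_digit(n: int) -> int:
    # strip digits until one remains
    while n >= 10:
        n //= 10
    return n

def cesaro(seq):
    # running averages: out[n] = (seq[0] + ... + seq[n]) / (n + 1)
    out, total = [], 0.0
    for i, s in enumerate(seq, start=1):
        total += s
        out.append(total / i)
    return out

# indicator that n has leading digit 1, for n = 1 .. N
a = [1.0 if leading_digit(n) == 1 else 0.0 for n in range(1, N + 1)]

p = cesaro(a)  # the sawtooth: proportion of numbers up to n with leading digit 1
print(0, p[-1])
for k in range(1, 5):
    p = cesaro(p)  # smooth the previous sequence again
    print(k, p[-1])
print("Benford:", math.log10(2))  # 0.30103..., the target for leading digit 1
```

Each pass flattens the sawtooth further, and the printed values settle toward log10(2).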

So it ultimately comes down to the fact that if you write a random list of numbers with no upper bound (or limit on the number of digits), then you likely get a Benford-type distribution, and the more numbers you write down, the likelier it is. At any rate, as an empirical law it works. Greece was cooking the books, and the distribution was unlikely.

Well since you’re practically begging us to find a flaw… :slight_smile:

I don’t have a problem with what you wrote logically, but I wonder if it’s useful. Much real data will satisfy Benford’s law in base 10, but most will not satisfy it in an arbitrarily large base. The height of trees measured in feet should satisfy Benford’s law, for example, but if you choose a large enough base, say base 1,000,000,000, it won’t. But that doesn’t mean tree heights are cultural artifacts.
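
A sketch of that point (numpy; the lognormal stand-in for the data, spanning a few orders of magnitude, is my invention):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in data: sigma = 2.5 is the std dev of ln(x), a spread of a few orders of magnitude
x = rng.lognormal(mean=3.0, sigma=2.5, size=200_000)

for b in [10.0, 1e9]:
    # Benford in base b <=> log_b(x) mod 1 is uniform on [0, 1)
    frac = (np.log(x) / np.log(b)) % 1.0
    hist, _ = np.histogram(frac, bins=10, range=(0, 1), density=True)
    print(b, round(abs(hist - 1.0).max(), 3))  # 0 would be a perfect fit
```

The same data passes the base-10 test but flunks base 10⁹: its logs in that base are bunched into a small fraction of one period.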

The base b implicitly sets a minimum spread for the data of at least a factor of b, and preferably several factors of b.

But most of those aren’t normal distributions.

Phone number area codes always had the middle digit as 0 or 1 (until recently), and the bigger cities had lower numbers, which dialed quicker (NYC=212, LA=213, Chi=312). So area codes tend to have more small digits. Then the exchanges (the middle 3 digits) were also originally given out starting with lower digits (but not 0 or 1). So the first areas to get phone service (usually the downtown business districts) tend to use lower digits in their numbers. Plus the many 1-800 numbers. Also, people/businesses can often choose the numbers they get, and they tend to go for lower digits.

And house addresses are given out semi-sequentially (usually jumps of 4 or 6 between each house on the same side of the street). But they usually start over at each block, and there are usually 10-12 houses on each side, so the last 2 digits usually don’t go past 50 or 60. So more of the smaller digits.

And even in accounting: 0 and 5 are more common than other digits in prices, money transfers, etc. Plus many items are priced at $x.99, which, with a few cents of sales tax added, becomes something ending in small digits.

So several of the examples given are NOT normally distributed. In fact, I’d guess that most numbers used in “everyday numerical data” are not normally distributed, but have some intended pattern behind them.

I specifically said phone numbers would not satisfy Benford’s law, so I’m not sure what you’re arguing about here.

I had thought about the jumps of (typically around here) six in addresses, but decided it wasn’t relevant. Benford’s law applies to the most-significant digit(s), not to the least-significant digit.

I could have made clear that I was thinking of street addresses over the entire country. Within a single address numbering system, it might fail, but over many systems, with different maximum address numbers, I expect it will hold. Presumably there’s a distribution of what that maximum is. There will be some largest value which may skew things a bit, but then for smaller maximum values, I’d expect a smooth distribution, with different cities and towns covering a large range. But the skipping of four or six numbers between addresses, or using even on one side of the street and odd on the other won’t matter.