Is it possible to apply Bedford’s law to a set of numbers that range from 1.00 to 15.00 (accurate to two decimal places), and how?
Sorry, that’s Benford’s law, not Bedford’s.
The set of numbers I want to test with Benford’s law is the set of results from an exam that I believe was rigged.
Benford’s Law applies only to log-normal distributions, which yours probably is not. The really interesting part of Benford’s Law isn’t the law itself, but the observation that log-normal distributions (or at least, distributions close enough to log-normal that it makes a good approximation) are surprisingly common.
In your case, you would want to start by constructing some model for what you’d expect the distribution to be to begin with. What’s your set of data? Is it the set of scores from the various students who took the exam? Those are usually bimodal, with one hump corresponding to the students who studied, and another hump corresponding to the students who didn’t. A simple model would thus have about five parameters that you’d have to find.
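As a rough sketch of what I mean, you could fit the two humps as a two-component Gaussian mixture (two means, two spreads, one mixing weight, so five parameters). The scores below are made up just to stand in for the real exam results:

```python
# Sketch: fit a two-hump (bimodal) model to exam scores with a
# 2-component Gaussian mixture. Five fitted parameters: two means,
# two variances, and one mixing weight.
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder data standing in for the real exam results (1.00 to 15.00).
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.normal(5.0, 1.5, 150),    # "didn't study" hump (made-up parameters)
    rng.normal(11.0, 1.8, 250),   # "studied" hump (made-up parameters)
]).clip(1.0, 15.0)

gm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
print("means:   ", gm.means_.ravel())
print("std devs:", np.sqrt(gm.covariances_).ravel())
print("weights: ", gm.weights_)
```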
There’s no reason you can’t apply the Benford probabilities to any interval. I’ve been told that people use them to look for bogus “random” data, because people fabricating data tend to make numbers beginning with the higher digits too common, while the Benford probabilities give the real rate of incidence.
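For reference, the Benford first-digit probabilities are just log10(1 + 1/d) for digit d = 1 through 9; a couple of lines will generate them:

```python
# Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9.
import math

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}: {p:.3%}")   # 1: 30.103%, 2: 17.609%, ... 9: 4.576%
```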
If it bothers you that part of your range is covered by the numbers 1 to 9, and part by numbers beginning with “1”, so you think that your results will be skewed, consider this: the Benford probabilities are unchanged by multiplication. In other words, it doesn’t matter whether you’re cataloguing, say, the lengths of rivers as measured in miles, or in kilometers. Or in feet, for that matter. So multiply all your numbers by 1/1.5 = 0.666666667 before you take the first digit and “bin” it.
One way you can see that the Benford first-digit probabilities are insensitive to such multiplication is to view the numbers as if they’re on a circular slide rule, which is a linear slide rule with the 1 at the low end joined to the 10 at the high end. The probability that a number will begin with a particular digit is proportional to the width of that digit’s interval on the slide rule, so “1” shows up as the first digit almost 1/3 of the time (actually 30.103% of the time, since the log base 10 of 2 is 0.30103). If you view numbers generated at random as occupying, with equal probability, any angle along that circular slide rule, then the probability of a number starting with digit n is the width of the interval from n to n + 1 on the rule, which is log10(n + 1) - log10(n) = log10(1 + 1/n). Multiplying all the numbers by any constant is equivalent to simply rotating them relative to the slide rule (just as multiplication on a circular slide rule is equivalent to rotating one circular scale relative to another).
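If you want to see that numerically, here’s a quick sketch: it draws numbers spread log-uniformly over several decades, multiplies everything by an arbitrary constant (1.609, as in miles to kilometers), and shows that the first-digit frequencies stay put and match the Benford values. The sample size and the constant are just illustrative choices:

```python
# Demonstrate that Benford first-digit frequencies are unchanged by
# multiplying every number by a constant (rotating the circular slide rule).
import math
from collections import Counter
import numpy as np

def first_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

rng = np.random.default_rng(1)
data = 10 ** rng.uniform(0, 5, 100_000)   # log-uniform over five decades
scaled = data * 1.609                     # e.g. miles -> kilometers

for label, xs in [("original", data), ("scaled  ", scaled)]:
    counts = Counter(first_digit(x) for x in xs)
    freqs = [counts[d] / len(xs) for d in range(1, 10)]
    print(label, [f"{f:.3f}" for f in freqs])

print("Benford ", [f"{math.log10(1 + 1/d):.3f}" for d in range(1, 10)])
```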
Dividing your numbers by 1.5 puts them all in the range of roughly 0.67 to 10, and they ought to fill about one revolution of that slide rule.
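Here’s roughly what that rescale-and-bin step looks like in code; the handful of scores is made-up placeholder data standing in for your real results:

```python
# Sketch of the suggested procedure: divide every score by 1.5 so the
# range 1.00-15.00 maps into roughly 0.67-10, then tally the first
# significant digits.
from collections import Counter

def first_digit(x: float) -> int:
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

scores = [1.25, 7.50, 9.80, 10.40, 12.75, 14.90]   # placeholder scores
rescaled = [s / 1.5 for s in scores]
counts = Counter(first_digit(s) for s in rescaled)
print(dict(sorted(counts.items())))
```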
It needs to be over several orders of magnitude. 1 to 15 is only one. Plus 1/3 of those numbers start with 1.
I don’t believe it needs to be over several orders of magnitude, as long as there are a lot of numbers. See my comments above about how to deal with all the numbers that begin with “1”.
The law only applies to numbers that are “scale free” (which is somewhat more general than lognormal, but that’s the best known case). That assumption would most likely not apply to test scores.
The more orders of magnitude the number set spans, the better it works. When you’re talking about numbers between 1 and 15, barely more than one order of magnitude, no, it’s really not going to work.
Yes, thinking overnight, it occurs to me that over a range of 1 to 15, assuming equal probability for all outcomes, you’ll have about 6 chances out of 14 for 1 to be the first digit (roughly 43%), and about 1 out of 14 (roughly 7%) for each of the other digits to be first. The prevalence of the 1 is the mechanism behind Benford’s law just starting to kick in, but you’re going to have equal probabilities for all the other digits. You really do need numbers spanning several orders of magnitude to see the proper Benford probabilities. There’s a neat chart of this in the Scientific American article on Benford probabilities (“The Peculiar Distribution of First Digits”, R. Raimi, Scientific American, Dec 1969, pp. 109-119) that shows the probabilities of the first digits as the number of digits increases.
If you really DO only have numbers from 1 to 15, then, assuming you have equal probabilities of each occurring, you’ll have a kind of “Mini-Benford Law”, with “1” as the first digit roughly 43% of the time (more often than the true Benford probability of 30.1%!), and all the other digits at an even 7% or so.
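Purely to check that arithmetic, assuming scores spread uniformly over 1.00 to 15.00:

```python
# First-digit probabilities for numbers uniform on [1, 15]:
# digit 1 covers [1, 2) plus [10, 15], width 6; digits 2-9 cover width 1 each;
# total width is 14.
widths = {1: (2 - 1) + (15 - 10)}
widths.update({d: 1 for d in range(2, 10)})
total = 15 - 1
for d in range(1, 10):
    print(f"{d}: {widths[d] / total:.1%}")   # 1: 42.9%, others: 7.1%
```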
How does this look?
I did it on one set of grades and got these results:
Digit 1: count 49
Digit 3: count 2
Digit 4: count 5
Digit 5: count 24
Digit 6: count 39
Digit 7: count 82
Digit 8: count 116
Digit 9: count 77
They are from a set of almost 400 grades. There isn’t a single occurrence of the digit 2, which is weird.
Assuming equal probability for all outcomes is pretty much the antithesis of Benford’s law. Chronos’s reply, suggesting to make a model of what you expect the distribution to be, is the way you’d have to approach this. With that, you could look for differences between what the model gives you and what your data has.
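For the mechanics of that comparison, a chi-square goodness-of-fit test would do. In this sketch the expected proportions are the plain Benford values, used only as a placeholder; you’d swap in whatever proportions your model actually predicts:

```python
# Sketch of a goodness-of-fit check for the observed first-digit counts.
# The expected proportions below are the plain Benford values, used only as
# a placeholder; a proper check would use proportions from a model of how
# the exam scores should actually be distributed.
import math
from scipy.stats import chisquare

observed = [49, 0, 2, 5, 24, 39, 82, 116, 77]   # digits 1..9, from the post
total = sum(observed)                           # 394 grades
expected = [total * math.log10(1 + 1 / d) for d in range(1, 10)]

stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
```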
In the OP, one thing that might stand out is the distribution of the digit after the decimal point. People aren’t that good at making up random numbers. You’d need enough data for meaningful frequencies to start showing up.
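Something like this would tally the digit after the decimal point (the scores are placeholder data, and the string formatting is just to sidestep floating-point rounding):

```python
# Tally the first digit after the decimal point for scores given to two
# decimal places, e.g. 12.37 -> 3.
from collections import Counter

scores = [12.37, 9.50, 7.25, 11.80, 14.05, 8.75]   # placeholder data
decimal_digits = Counter(f"{s:.2f}".split(".")[1][0] for s in scores)
print(dict(sorted(decimal_digits.items())))
```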
Are most of those 1s from scores of 10 to 15? What does that look like if you plot 10 - 15 separately, instead of lumped into 1?
To me, assuming that almost all of those 1s are from scores of 10 and above, that looks like a nice modal distribution, which is what you’d expect.
ETA: The classic counter-example for Benford’s law is adult human heights, which has a similar humped distribution. Given in meters or centimeters, the initial digit is almost always a 1 and maybe a few 2s. Few people are shorter than 1 meter, or taller than 2 meters. Given in inches, most will start with 6 or 7, with a few 5s and 8s, and only rarely anything else.
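A quick simulation makes that concrete; the mean of 1.7 m and standard deviation of 0.1 m are rough assumed figures, not real survey values:

```python
# First digits of simulated adult heights in meters vs. inches.
from collections import Counter
import numpy as np

def first_digit(x: float) -> int:
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

rng = np.random.default_rng(2)
meters = rng.normal(1.7, 0.1, 10_000)   # assumed mean/sd for adult height
inches = meters * 39.37

print("meters:", dict(sorted(Counter(first_digit(h) for h in meters).items())))
print("inches:", dict(sorted(Counter(first_digit(h) for h in inches).items())))
```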
Human height is always going to be 1 digit (meters) or two digits (inches)… whichever unit you use, the range of human height is going to keep it within one order of magnitude. Of course you’re using the same unit for everyone, regardless of which unit you use. Use millimeters and it will have what, like 4 digits. Maybe there’s a unit that’s borderline, where some people are in the 90s and some people are in the 100s, so then you have two orders of magnitude. But like I said before, it doesn’t work well with only two. I can’t imagine a unit that will span 3 orders of magnitude or more. Human height doesn’t vary that much.
Human height isn’t going to be a Benford-law-type distribution. For a given population subset (like males aged 25 with a similar genetic background), you’re likely going to get a Gaussian distribution, with the classic bell-shaped curve having a characteristic mean and standard deviation. The Central Limit Theorem dictates that you’re going to get that sort of thing from most random groups of similar objects.
Benford’s law gives you the probability of the first digit being a given value, and cases like this probably won’t work well, especially if your measurement units are comparable to the standard deviation. You MIGHT see a Benford distribution of first digits if you measured in units much smaller than the standard deviation, say, in microns. Or in thousandths of an inch.
Although why you’d want to know the value of the first digits in that situation isn’t clear to me. It would tell you if people were cheating and making up the distribution. But it would also give you “false” results if people were taking the data in more normal units (inches, or cm) and converting them to microns afterwards.
Measuring human height in meters doesn’t mean you only use a single digit. I’m not 2 meters tall, I’m 1.8, or 1.80, meters tall.
Yes, and neither apparently is the OP’s distribution.