It strikes me as odd that 7 of the digits have a maximum number of repeats of 8 and the other 3 have 9. Why is it so consistent? My working assumption was that, given the randomness of pi, there should be more variability than this.
Well, I do know what random means. I’m not surprised that there are strings of repeated numbers, I’m just surprised that the maximum number of repetitions is so consistent across the 10 digits. I expected a wider variation, that’s all.
The tool you are using seems to search the first 200 million digits of pi. Given a larger number of digits, you would presumably get longer strings of 1s, 2s, etc.
In 2x10[sup]8[/sup] random digits, it seems unsurprising that you would find most strings of length 8, and some (maybe 20%) strings of length 9.
When we say that a sequence of numbers is “random”, we mean that it is not possible to predict the next number in the sequence, even given the previous numbers. When we say that a sequence of random numbers is “uniform” (and the digits of pi are indeed uniformly random), we mean that each digit has an equal chance of occurring. If each decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) has an equal chance of occurring, then each pair of digits (00, 01, 02, …, 98, 99) also has an equal chance of occurring, and so on with triples of digits, and tens of digits. Thus in any sufficiently large sequence of random numbers (say, the first 200 million digits of pi), we would expect just as many occurrences of 0000000000 as we would 1111111111 or 2222222222, or for that matter 1234567890 or 1234123412. And that is exactly what you found with your little experiment.
First of all, while it is strongly suspected that pi is normal (that is to say, equal probability for all strings of a given length), it’s not known. Second of all, the digits of pi are most certainly not random, since we can predict them with 100% accuracy. Third, the property that every digit has equal probability does not imply that every pair of digits has equal probability: Consider, for instance, the number 0.1234567890123456789012…
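To make the third point concrete, here is a minimal sketch that counts single digits and adjacent pairs in a long prefix of 0.1234567890123… The function name `digit_and_pair_counts` and the prefix length are illustrative choices, not anything from the thread.

```python
from collections import Counter

def digit_and_pair_counts(digits):
    """Count each single digit and each overlapping pair of adjacent digits."""
    singles = Counter(digits)
    pairs = Counter(digits[i:i + 2] for i in range(len(digits) - 1))
    return singles, pairs

# First 10,000 digits after the decimal point of 0.12345678901234567890...
digits = "1234567890" * 1000
singles, pairs = digit_and_pair_counts(digits)

print(sorted(singles.items()))  # every digit appears equally often
print(len(pairs))               # only 10 of the 100 possible pairs ever occur
```

Every digit has frequency 1/10 here, yet a pair such as 11 or 13 never appears, so equal digit frequencies alone do not give equal pair frequencies.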
Yes, that’s true of any naturally occurring math constant, but all the evidence we have so far points towards the digits of pi being normal.
Only from a known starting point. If I give you some subsequence of decimal digits of pi, but don’t tell you where in the sequence it starts, you can’t predict the next digit.
That number is not random and thus does not have the property in question.
It might be best not to speak of particular concrete numbers/digit-sequences as random (after all, every number X has a very non-random property: that of being exactly equal to X), and only to speak of distributions of numbers as random.
Then again, in particular contexts, this kind of speech might be alright. It all depends. As with so many things, there’s no one-size-fits-all notion of “random”.
The odds of rolling a ten-sided die eight times and ending up with a particular eight-digit pattern are 1 in 100,000,000 (10[sup]8[/sup]). The odds of getting a particular nine-digit pattern in nine rolls are 1 in 1,000,000,000 (10[sup]9[/sup]). Since the site you are using has 200,000,000 digits of pi recorded, it makes sense that it would indeed fall closer to 10[sup]8[/sup] than 10[sup]9[/sup] in terms of results.
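As a rough check on that estimate, the expected number of places where a particular pattern starts in a stretch of uniformly random digits is the number of starting positions divided by 10 to the pattern length. The sketch below assumes the digits behave like independent uniform draws (not proven for pi); `expected_occurrences` is a made-up helper name.

```python
def expected_occurrences(pattern_length, n_digits=200_000_000):
    """Expected number of starting positions of one fixed pattern in
    n_digits independent, uniformly random decimal digits."""
    positions = n_digits - pattern_length + 1
    return positions / 10 ** pattern_length

for length in (8, 9, 10):
    print(length, expected_occurrences(length))
```

Under that assumption a given 8-digit pattern is expected about twice in 200 million digits, and a given 9-digit pattern about 0.2 times.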
I think the OP is perhaps comparing the results to what he’d get if he sampled one digit 83 times. There, if you had such a flat distribution, you’d be suspicious.
For each digit, you’d expect a streak of 8 repeated digits to occur once in a 100-million-digit section of pi*, with a 1/10 chance of a nine-digit streak, a 1/100 chance of a 10-digit streak, and so forth.
You’ve got ten digits, each of which you’d expect to have a streak of 8 digits, so you’d expect one of those digits to have a nine-digit streak, with only a 1/10 chance of a longer streak. So for 100 million digits, you’d expect about nine streaks of 8 digits and one of 9 digits, with small chances of longer than 9 or shorter than 8. With 200 million digits, you’d expect a couple of streaks of 9 digits, but still small chances of 10 or more (about 20 percent) or less than 8 (pretty small).
So three streaks of 9 digits and seven of 8 is not unexpected for 200 million digits.
Assuming that you searched 200 million digits, the number X of runs of 8 repeated digits should average 2 for each digit. So it should follow a Poisson law with an average of 2.
(Reminder: the Poisson law with average m is P(x, m) = exp(-m) m^x / x!)
From this, we can compute that the probability of 0 runs of 8 digits (and therefore of the longest run being 7 or less) is 0.135 (P(0, 2)) for each digit.
Similarly, the number of occurrences of 9 repetitions follows a Poisson law with an average of 0.2. The probability of at least one such occurrence is 0.18 (1 - P(0, 0.2)).
A similar computation tells us that the probability of the longest repetition being 6 is 2*10^-9, and the probability of it being 10 is 0.02. If we rule out the case ‘6’, we obtain:
For each digit, the longest run of repeats can be:
7 with a probability of 0.14
8 with a probability of 0.68
9 with a probability of 0.16
10 with a probability of 0.02
Your observation with 10 samples was:
7 with a frequency of 0
8 with a frequency of 0.7
9 with a frequency of 0.3
10 with a frequency of 0
It seems to me that this matches quite nicely. The proper test for comparing these would be a chi-square, but you don’t have enough samples to do it. Pity that there are only 10 digits!
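The Poisson estimates in this post can be reproduced with a few lines. This is only a sketch under the same assumption (for each digit, the number of runs of a given length in 200 million digits is Poisson with mean 2*10^8 / 10^length); the names `p_no_run` and `p_longest_is` are illustrative.

```python
import math

N_DIGITS = 200_000_000  # number of digits searched, as stated in the thread

def p_no_run(length, n_digits=N_DIGITS):
    """P(a given digit has no run of `length` or more repeats),
    i.e. the Poisson probability of zero events with mean n / 10^length."""
    return math.exp(-n_digits / 10 ** length)

def p_longest_is(length, n_digits=N_DIGITS):
    """P(longest run is exactly `length`):
    no run of length+1 or more, but at least one run of `length`."""
    return p_no_run(length + 1, n_digits) - p_no_run(length, n_digits)

for k in range(6, 11):
    print(k, p_longest_is(k))
print("10 or more:", 1 - p_no_run(10))
```

Note that "longest is exactly 10" and "longest is 10 or more" differ slightly; the 0.02 quoted above corresponds to the latter.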
I tried coding this up. To avoid running hundreds of millions of cases, I made the approximation that for every string of 1 million digits, there will be a string 6 digits long exactly once for each of the digits 0 to 9, so I have 200 occurrences for each digit. For each of those, I then assumed there would be a 1/10 chance the next digit following the string would be the same, giving a 7 digit string. That would happen about 20 times for each digit. Then for those cases, I did the same, to get the number of 8 digit strings, and so forth.
I ran this 100,000 times, and kept track of how often I got strings of each length. What I found was that there was always a string 8 digits long. About 27 percent of the time, there was at least one digit whose maximum length string of repeats was only 7 digits long. About 86 percent of the time, there was at least one string of repeats at least 9 digits long. About 18 percent of the time there was a string of repeats at least 10 digits, and about 2 percent of the time there was a string of repeats at least 11 digits long. I didn’t keep track of repeats of 12 or more digits separately. I never had a case where the maximum length of repeats was only 6.
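For anyone who wants to try the approximation described above, here is one possible reconstruction in Python. It is not the code actually used for this post: the function names, the seed, and the default of 1,000 trials (chosen to keep it quick) are all assumptions, and its output need not reproduce the percentages quoted here.

```python
import random

def longest_run(rng, starts=200, base_len=6, p_extend=0.1):
    """Longest run for one digit: begin with `starts` runs of `base_len`
    and repeatedly keep each run with probability `p_extend`
    (the next digit matching), lengthening the survivors by one."""
    count, length = starts, base_len
    while True:
        # How many of the current runs are followed by the same digit again?
        survivors = sum(rng.random() < p_extend for _ in range(count))
        if survivors == 0:
            return length
        count, length = survivors, length + 1

def simulate(trials=1000, seed=0):
    """Fraction of trials in which each event happens across the ten digits."""
    rng = random.Random(seed)
    tally = {"min_le_7": 0, "max_ge_9": 0, "max_ge_10": 0, "max_ge_11": 0}
    for _ in range(trials):
        maxima = [longest_run(rng) for _ in range(10)]
        tally["min_le_7"] += min(maxima) <= 7   # some digit tops out at 7 or less
        tally["max_ge_9"] += max(maxima) >= 9   # some digit reaches 9 or more
        tally["max_ge_10"] += max(maxima) >= 10
        tally["max_ge_11"] += max(maxima) >= 11
    return {name: hits / trials for name, hits in tally.items()}

print(simulate())
```

Raising `trials` tightens the estimates at the cost of run time; the cascade only ever draws a few hundred random numbers per digit, which is what makes this cheaper than scanning 200 million digits.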
If you assume the number of times a maximum length string of repeats is only 7 is independent* of the number of times a maximum length string of repeats is 10 or more, then about 60 percent of the time, you’ll get maximum repeat lengths of only 8 or 9, with no other lengths represented. So the possibility the OP was wondering about is more likely to occur than not, although not by a wide margin.
Most of the remaining times, you’ll get either a string of repeats 7 long or 10 long, with a few percent chance of both a 7 and a 10, or chance of a string of 11 or longer.
* Not strictly correct, but a more accurate approximation than assuming the two possibilities are non-intersecting. It’s not a huge difference anyway.
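The arithmetic behind the "about 60 percent" figure and the footnote can be written out as a small sketch. It simply takes the 27 percent and 18 percent figures quoted in the post as inputs; `only_8_or_9` is a hypothetical helper name.

```python
def only_8_or_9(p_some_7, p_some_10_plus):
    """Chance that neither a 7-only digit nor a run of 10+ shows up,
    under two different simplifying assumptions."""
    # Treat the two events as independent: multiply the complements.
    independent = (1 - p_some_7) * (1 - p_some_10_plus)
    # Treat the two events as never happening together: subtract both.
    non_intersecting = 1 - p_some_7 - p_some_10_plus
    return independent, non_intersecting

independent, non_intersecting = only_8_or_9(0.27, 0.18)
print(round(independent, 2), round(non_intersecting, 2))
```

With those inputs the two assumptions give roughly 0.60 and 0.55, which is the small difference the footnote refers to.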