Repeated numbers in pi

There’s not much to add in terms of math on this one, but I remember very clearly something a stats professor once said: “‘Random’ is a name given to a specific kind of pattern.” Randomness makes things unpredictable at the small scale, such as what number is coming up next, but any phenomenon that is truly random fits some very specific expectations over a large enough number of samples. Any amount of non-randomness will become very obvious.

Well, I wouldn’t go so far as that; ‘pattern’ is kind of a loaded word in this context. A better way to explain it is to use an example.
If you make up numbers, say
0 8
1 12
2 8
3 6
4 0
5 0
6 453
7 5
8 7
9 8

as the number of repeated strings found for each digit, then the sequence is not random by definition, because you can make accurate predictions: for instance, a string of fours or fives will never be found, and strings of sixes will come up very often.
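To put a rough number on how lopsided that made-up table is, here is a minimal sketch. It uses a Pearson chi-square statistic against equal expected counts, which is my choice of test and not something from the posts above. The function name and the constant name are made up for illustration; the critical value is the standard table figure for 9 degrees of freedom at the 0.1 percent level.

```python
# Made-up counts from the example above: digit -> number of repeat strings.
observed = {0: 8, 1: 12, 2: 8, 3: 6, 4: 0, 5: 0, 6: 453, 7: 5, 8: 7, 9: 8}


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


# Standard table value: chi-square, 9 degrees of freedom, 0.1% upper tail.
CRITICAL_9DF_0_001 = 27.877

stat = chi_square_uniform(observed)
print(f"chi-square = {stat:.1f}, critical value = {CRITICAL_9DF_0_001}")
print("consistent with uniform" if stat < CRITICAL_9DF_0_001 else "not uniform")
```

With these counts the statistic comes out far above the critical value, which is the "very obvious" non-randomness in numerical form.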

The thing people often don’t get about statistics is they don’t tell you what WILL happen. They only tell you what is LIKELY to happen.

And just because there is virtually no possibility of it happening doesn’t mean it can’t happen.

For many that’s a tough thing to grasp.

So it may seem unlikely or not random, but in random events a seemingly non-random order can occur. All it means is that if it DOES happen, you need to check your data to make sure it’s correct.

I was playing around with this some more. Instead of always assuming 200 million digits, I let that vary and then worked out the chance of getting only two different maximum lengths of strings of repeats (which seems to be what made the OP wonder). The result varies periodically every factor of 10, so for example the results for 40 million and 400 million digits are the same. I was wondering if the OP just happened to pick a number of digits to search through that tended to have only two different lengths, and if other choices would have more variation.

It turns out the lowest chance of getting only two different lengths is at 90 million (or 900 million) digits, at 50.2 percent. The highest chance is 70.8 percent, at 300 million digits. Those percentages seem to be accurate to within 0.5 percent. So the OP using 200 million isn’t particularly special, and for any number of digits, more likely than not you’ll only find two different lengths of repeats.
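For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of check involved. It is not the calculation behind the percentages above. It is a brute-force simulation on random digits, the function names are made up, and it assumes "only two different lengths" means exactly two distinct values among the ten longest runs. The sizes are kept tiny so it finishes quickly, so don't expect it to reproduce the figures quoted for hundreds of millions of digits.

```python
import random
from itertools import groupby


def longest_runs(digits):
    """Map each digit character to the length of its longest run of repeats."""
    best = {}
    for digit, group in groupby(digits):
        length = sum(1 for _ in group)
        if length > best.get(digit, 0):
            best[digit] = length
    return best


def distinct_max_lengths(digits):
    """Number of different longest-run lengths across the digits 0-9."""
    best = longest_runs(digits)
    # A digit that never appears counts as a longest run of 0.
    return len({best.get(d, 0) for d in "0123456789"})


def estimate_two_lengths(n_digits, trials, seed=0):
    """Fraction of random digit strings with exactly two distinct maxima."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        digits = "".join(rng.choices("0123456789", k=n_digits))
        if distinct_max_lengths(digits) == 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Small sizes so it runs quickly; far below the digit counts in the post.
    print(estimate_two_lengths(n_digits=100_000, trials=20))
```

To run it on real digits of pi instead, pass a string of digits (without the decimal point) to distinct_max_lengths.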

It’s suspicious that the minimum and maximum results are so close to 0.5 and sqrt(0.5) (about 0.707), and that the maximum occurs close to sqrt(10) * 10^N (about 3.16 * 10^N). All the differences are within the likely errors of the approximation and the randomness of the results.