Is a random number generator still random if it is streaky?

I participate in an online game which makes use of a computer-run random number generator to determine the chances of various things happening in the game world.

While the following is somewhat anecdotal, I can say I rarely find anyone in-game who has not experienced it. A few have actually logged numbers and done the math, and here is what they found:

Overall, with a statistically large enough sample size, the random number generator does produce results that trend very close to the stated percentage chance advertised (so a 50/50 chance when number crunched with real examples comes out to 49.7/50.3 after 1000 attempts). All well and good and pretty much what we would expect from a random number generator.

However, there are some shockingly unlikely streaks in the datasets collected so far. Certainly one might expect some unlikely runs (say 20 heads in a row on a coin flip) but in 1000 attempts one would not expect to see those 20 in a row appear several times. Here is some of the data from one person (others have done this too and their results are roughly the same):

—Data—
Marketing missions executed: 1543 (95% courier/5% kill missions)
Number of Kill missions: 79
Number of courier missions: 1464
Percentage of kill missions: 5.12%
Percentage of courier missions: 94.88%

Most anomalous streaks:

  1. 8 kill missions in a row
    Chance: 0.00000000390625%
    Frequency: 2x

  2. 5 kill missions in a row
    Chance: 0.00003125%
    Frequency: 5x

  3. 96 courier missions in a row
    Chance: 0.73%
    Frequency: 1x

AND

Attempts: 900 (50/50 chance of success)
Successes: 467
Failures: 433
Percentage of successes: 51.9%
Percentage of failures: 48.1%

Most anomalous streaks:

  1. 16 failures in a row
    Chance: 0.0000439%
    Frequency: 3x

  2. 8 failures in a row
    Chance: 0.287%
    Frequency: 5x

  3. 6 successes in a row
    Chance: 1.95%
    Frequency: 8x
—End Data—

So, is the above simply to be expected and normal? I might expect to see something like that occasionally but the odds seem distinctly against such streaks occurring repeatedly to the same person (and many others see this too). Is the random number generator glitchy even if overall it provides the percentages it claims to?

Truly random cannot be truly average.

That doesn’t mean your generator is perfectly random, though. If none of the “anomalies” had occurred, it would definitely not be random. If “too many” occurred it might be non random, but then again, it might not. There is no Goldilocks answer. Just right would be just wrong.

Tris

I’m a little confused by the chance numbers in your second part (with the 50/50 success/failure probability). How did you calculate them?

I honestly do not know since they are not my numbers. I copied the data from a thread on the forums for this game (linked below as source in the quote). To save you from reading that if you do not want to I’ll post a bit of his thinking (the numbers I posted came from post #33 on the second page of that thread). Also note that while this thread is several months old there have been several other threads on this topic and one that currently looks to be epic in length where math is being tossed about. I’ll link those too if anyone really cares but by and large they are along the lines of this.

Right, the numbers given in that quote are what I would’ve expected (8 failures in a row as about 0.39%, 16 failures as about 0.0015%), rather than what’s in the OP.
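For anyone who wants to check these figures, here is a minimal Python sketch that recomputes the chance of a streak of N in exactly N tries. It assumes independent trials with the stated odds, and the function name streak_probability is just an illustrative choice, not something taken from the linked thread.

```python
def streak_probability(p, k):
    """Chance that k independent trials, each with probability p, all go the same way."""
    return p ** k


# (label, per-trial probability, streak length) taken from the OP's data
cases = [
    ("8 kill missions in a row (5% each)", 0.05, 8),
    ("5 kill missions in a row (5% each)", 0.05, 5),
    ("96 courier missions in a row (95% each)", 0.95, 96),
    ("16 failures in a row (50% each)", 0.5, 16),
    ("8 failures in a row (50% each)", 0.5, 8),
    ("6 successes in a row (50% each)", 0.5, 6),
]

for label, p, k in cases:
    # Multiply by 100 to express the result as a percentage, as in the OP.
    print(f"{label}: {100 * streak_probability(p, k):.12f}%")
```

Under that assumption the 95/5 figures match what was posted, while the 50/50 figures come out near 0.0015%, 0.39% and 1.56%, which is the discrepancy being pointed out here.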

Ah, one of my favorite pet topics. Streaks SHOULD be expected in a random number generator, assuming these are independent events. We have these intuitions that streaks shouldn’t happen in randomly generated sets, but they’re bound to happen. If you flip a fair coin and it’s come up heads 10x in a row, the chance it will come up heads the 11th time is still 50%.

Here’s an example I think really helps. Take a piece of paper and a pencil and try to make a random distribution of dots. Chances are you’ll want (at some level) to see the dots reasonably spread out, and that is not random. A truly random set will very likely have some areas with a high concentration of dots and other areas with only a few dots, and the locations of those areas are themselves random. Another experiment: write down a random sequence of numbers. Chances are, especially if you’re pulling them from your head, that you’ll have very few clusters, but if you pull a random sequence from random.org, you’ll see several more clusters.

Now, streaks are also expected in a random number generator that ISN’T truly random, and the only way to tell the two apart is to actually run statistical analyses of how it behaves over various intervals. Bottom line, the VAST majority of random number generators are not truly random, as true randomness is extraordinarily difficult to achieve, but most simulate it well enough that, combined with the other variables, it doesn’t matter for most applications.

And, as for statistical anomalies… well, those are expected as well. For instance, if you flip a coin a million times, the chance that it comes up heads every time is enormously small, but it IS possible even with a fair coin. Now say you keep flipping: the probability of seeing a million-head sequence somewhere approaches 1 as the number of flips approaches infinity. IOW, statistical anomalies, even ones of exceedingly low probability, WILL eventually happen over a large enough sample.

Of course, some of those streaks seem a little improbable to occur multiple times in such a small set. Is it possible there are other dependencies that are not accounted for?

Just a WAG here on my part but perhaps he is calculating not the chances of one occurrence but of that same thing occurring three times (or whatever it was) in that sample set.

Either way I guess that would be another way for me to pose the question. That streaks happen in a random series I get. But what are the chances of the streak re-occurring after a given number of tries? And how often can we say repeating streaks is fine and the random number generator can be considered to be truly random (as far as any computer generated number can be said to be random…no need for that discussion here)?

For instance, if we ran a series of 100 tries with a 50/50 chance of a 1 or a 0 coming up I doubt we’d call the generator random if it produces fifty 1’s then fifty 0’s even though it averaged to a 50% chance overall (I know it could but very unlikely and particularly if we run the test many times and see it re-occur with some regularity).
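One standard way to put a number on this intuition is the Wald-Wolfowitz runs test, which compares the number of runs in a 0/1 sequence with what independent draws would give. The sketch below is only an illustration of that test (the function name runs_test_z is made up here); it is not how the game or anyone in the thread checked the generator.

```python
import math


def runs_test_z(bits):
    """Z-score of the observed number of runs in a list of 0s and 1s.

    Values far below 0 mean too few runs (too streaky);
    values far above 0 mean too many runs (too alternating).
    """
    n1 = sum(1 for b in bits if b == 1)
    n0 = len(bits) - n1
    n = n0 + n1
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one 0 and one 1")
    # A new run starts at the first element and at every change of value.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2 * n1 * n0 / n + 1
    variance = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(variance)


fifty_then_fifty = [1] * 50 + [0] * 50
print(runs_test_z(fifty_then_fifty))
```

For fifty 1s followed by fifty 0s there are only 2 runs where about 51 would be expected, so the z-score lands almost ten standard deviations below zero, even though the overall average is exactly 50%.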

In a random (i.e., incompressible) binary string of length n, you should expect to see runs of length log_2(n). log_2(1543) = 10.6, so as long as you’re splitting data into two categories, you should expect to see 11 of the same category in a row at least once.
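This rule of thumb is easy to try out. The following is a rough sketch, assuming fair and independent 0/1 draws from Python's own pseudo-random generator; the helper names longest_run and average_longest_run are illustrative only.

```python
import math
import random


def longest_run(bits):
    """Length of the longest run of identical consecutive values."""
    best = current = 0
    previous = object()  # sentinel that never equals a real value
    for b in bits:
        current = current + 1 if b == previous else 1
        previous = b
        best = max(best, current)
    return best


def average_longest_run(n, trials, seed=0):
    """Average longest run over several random 0/1 strings of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bits = [rng.getrandbits(1) for _ in range(n)]
        total += longest_run(bits)
    return total / trials


n = 1543
print("log2(n):", math.log2(n))
print("average longest run:", average_longest_run(n, trials=200))
```

The simulated average should land in the same neighborhood as log_2(n), though note this only applies to 50/50 splits, not to the 95/5 mission data.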

Didn’t von Neumann say, “Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin”?

Cool…interesting to know how that is calculated.

However, note that in the sample set with 1543 tries the chances were not 50/50 but rather 95/5, so in that light the streaks with the 5% chance are more improbable.

So when assessing a computerized random number generator, how is it judged “good enough”, understanding it will never be totally random? I agree that for most purposes very nearly random is fine, but still there should be some criteria to determine when it suffices for the task at hand.

Well, really no random number generator is “random”, at least not one run by a computer (that I’ve seen). Generally how it works is that it takes a number from the microprocessor’s clock, puts it into an algorithm, shaves off a few bits and spits out your number.
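As a rough illustration of that recipe, here is a sketch of a linear congruential generator seeded from the clock. This is an assumption for the sake of example: the game's actual generator is unknown, the class name ClockSeededLCG is made up, and the constants are those of one well-known 32-bit generator (the "quick and dirty" one from Numerical Recipes).

```python
import time

# Constants of a well-known 32-bit linear congruential generator.
# Used purely as an illustration, not as the game's actual algorithm.
MULTIPLIER = 1664525
INCREMENT = 1013904223
MODULUS = 2 ** 32


class ClockSeededLCG:
    def __init__(self, seed=None):
        # "Take a number from the clock": use the current time if no seed is given.
        if seed is None:
            seed = time.time_ns()
        self.state = seed % MODULUS

    def next_bits(self):
        # "Put it into an algorithm": one multiply-and-add step modulo 2**32.
        self.state = (MULTIPLIER * self.state + INCREMENT) % MODULUS
        # "Shave off a few bits": drop the weak low 16 bits, keep the high 16.
        return self.state >> 16

    def roll(self, low, high):
        """Integer in [low, high]; the modulo introduces a slight bias."""
        return low + self.next_bits() % (high - low + 1)


rng = ClockSeededLCG()
print([rng.roll(1, 100) for _ in range(10)])
```

The same seed always produces the same sequence, which is exactly why such generators are called pseudo-random.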

http://www.thedryeraseboard.com/compsci/algorithms/randomnumbers/

It’s not a long page so I’ll just quote it

So this is the expanded version of above.

However, in the spirit of your OP and assuming we have a mythical random number generator, streaks (or what appear to be streaks) are to be expected occasionally in a truly random system (where potentially millions of people are asking for something random every second or minute).

Let me actually rearticulate that last point: there are tons of people making random number requests in a row. What you think of as you asking the server 20 times for some number is really you making requests 1, 7, 12, 27, 93, etc., so you’re not getting a complete picture of the results.

If you have 1000 numbers in a row, where the choices are 1 or 0, there is a .5^10 chance that any given stretch of 10 will be a streak. But the part that you’re missing is that there are about 990 places in the sequence where such a streak could start.
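The effect of having that many chances can be computed exactly rather than guessed at. The sketch below steps through the trials one at a time, tracking the current streak length, to get the probability of at least one run of k successes somewhere in n trials. It assumes independent trials with a fixed per-trial chance p, and the function name prob_run_at_least is an illustrative choice.

```python
def prob_run_at_least(n, k, p):
    """P(at least one run of k or more consecutive successes in n independent trials)."""
    # state[j] = probability that no run of k has happened yet and the
    # current streak of successes has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability that a run of k has already been completed
    for _ in range(n):
        new_state = [0.0] * k
        for j, mass in enumerate(state):
            if mass == 0.0:
                continue
            new_state[0] += mass * (1 - p)  # a failure resets the streak
            if j + 1 == k:
                hit += mass * p  # a success completes a run of k
            else:
                new_state[j + 1] += mass * p  # a success extends the streak
        state = new_state
    return hit


print("10 ones in 1000 fair flips:", prob_run_at_least(1000, 10, 0.5))
print("8 kills in 1543 missions at 5%:", prob_run_at_least(1543, 8, 0.05))
print("16 failures in 900 tries at 50%:", prob_run_at_least(900, 16, 0.5))
```

For the coin-flip case the answer is far larger than .5^10, while for the 5% kill missions it stays tiny even with 1543 tries.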

The OP is right, those streaks are not what would be expected from a truly random process. The probabilities quoted are correct for getting a streak of length N in exactly N tries. For example, for 8 kill missions in a row, if you run 8 missions, the probability of all 8 being kill missions is 0.00000000390625%. For 1543 missions, the probability is higher, but certainly not more than 1543 times higher, which would be 6.0273E-06 percent. Since a run that length happened not once, but twice, the RNG is almost certainly flawed.
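The upper bound used in that argument is simple to reproduce. A minimal sketch, assuming independent missions with a 5% kill chance: the chance of a streak somewhere can be no more than the number of possible starting positions times the chance of a streak at one position. The function name union_bound_percent is hypothetical.

```python
def union_bound_percent(n_trials, streak_len, p):
    """Upper bound, in percent, on the chance of a streak of streak_len in n_trials."""
    # A streak of length k can only start at positions 1 .. n - k + 1.
    starts = n_trials - streak_len + 1
    return min(1.0, starts * p ** streak_len) * 100


# Bound using the actual number of starting positions (1536 of them).
print(union_bound_percent(1543, 8, 0.05))

# Slightly looser bound quoted in the post: 1543 times the single-streak chance.
print(1543 * 0.05 ** 8 * 100)
```

Both figures come out around 6E-06 percent, so either way a single occurrence would be extraordinary for an independent 5% event, let alone two.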

Rereading it, you and he are right. All I know is that in many games, certain quests will be given more “priority” than others.

I.E. Given a “random” number between 1 and 100:
1-60 will be “kill quests”
61-80 will be delivery quests
81-100 will be something else (if applicable)
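In code, a weighted table like that might look something like the sketch below. The ranges are the made-up ones from this post, not the game's real odds, and quest_type is a hypothetical name.

```python
import random


def quest_type(roll):
    """Map a roll from 1 to 100 onto a quest type using the ranges above."""
    if not 1 <= roll <= 100:
        raise ValueError("roll must be between 1 and 100")
    if roll <= 60:
        return "kill"
    if roll <= 80:
        return "delivery"
    return "other"


# Each roll is uniform, but the outcomes are not equally likely.
rolls = [random.randint(1, 100) for _ in range(10)]
print([(roll, quest_type(roll)) for roll in rolls])
```

Every number from 1 to 100 is equally likely, yet 60 of them produce a kill quest, so long kill streaks would be unremarkable under this table.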

It may be because the developers consider one type more fun than the other. Even so, I wouldn’t say they’re not “random”. Take one of those wheels with a spinner in the middle (think Twister). There’s simply a larger “area” for one type than for another. It’s random, it’s just not a perfect 1 in x chance.

To use an example from D&D: against an armor class of 16, you still roll a d20, and assuming no bonuses you simply have a window of 5 numbers in which you’ll hit. No one would argue these dice rolls aren’t random; one outcome just has a larger window than the other, which allows long streaks (when reduced to “did” or “did not” hit) to occur more frequently.
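The d20 arithmetic can be written out as a quick sketch, assuming a plain roll with no bonuses that hits on 16 or higher. The function names hit_chance and miss_streak_chance are illustrative.

```python
from fractions import Fraction


def hit_chance(armor_class, sides=20):
    """Chance that a single die roll meets or beats the armor class."""
    hits = sum(1 for roll in range(1, sides + 1) if roll >= armor_class)
    return Fraction(hits, sides)


def miss_streak_chance(armor_class, streak_len, sides=20):
    """Chance that streak_len consecutive independent rolls all miss."""
    return (1 - hit_chance(armor_class, sides)) ** streak_len


print(hit_chance(16))             # window of 5 numbers out of 20
print(miss_streak_chance(16, 5))  # five misses in a row
```

With a 1 in 4 chance to hit, five misses in a row happens a little under a quarter of the time, so uneven odds alone make streaks of the likelier outcome common.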

There is an interesting thing which happens with judging a random number generator to be flawed or not. I mean, if the OP told us the entire sequence of successes and failures he observed in part 2, we could say “Oh, that was an event of probability 1/2^900. Incredibly unlikely!”. But we could say that no matter what the sequence was…

As it happens, we find some kinds of events more significant than others, as indicated by our willingness to make certain inductive inferences (but not others); that is to say, in our mind, our prior probability distribution, so to speak, for the sequence of numbers output by the generator is such as that observing a long streak of successes does increase our confidence in a following success, or other such things (when we observe the sorts of patterns that strike us as significant, we begin to assume that the generator has been engineered in such a way as that it will follow those patterns, rather than engineered in such a way as that it should behave ‘randomly’). This allows us to say “Oh, yes, given the observations so far, I am no longer willing to model this generator’s activity as ‘random’ (i.e., given by independent draws from a Bernoulli distribution).” But it’s not quite as simple as saying “Oh, something happened that, had this been a random distribution, was extraordinarily unlikely”, because, well, no matter what happens over many trials, the result is one with extraordinarily low probability of being given by a random distribution.

Or, as Feynman put it: “I had the most remarkable experience this evening. While coming in here, I saw license plate ANZ 912. (Calculate for me, please, the odds that of all the license plates in the state of Washington I should happen to see ANZ 912.)”

I tried coding this up in MATLAB and ran some simulations with P(kill) = 0.05 and 1543 events to see how long a streak I would get. In a hundred runs, the maximum streak was 2 in 85 runs, 3 in 14 runs, and 1 in one run. I never got a streak of 4.
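The original MATLAB code was not posted, so the following is only a Python sketch of the same kind of experiment: 100 runs of 1543 independent missions with a 5% kill chance, recording the longest kill streak in each run. The function names and the seed are arbitrary choices.

```python
import random
from collections import Counter


def longest_kill_streak(n_missions, p_kill, rng):
    """Longest run of consecutive kill missions in one simulated log."""
    best = current = 0
    for _ in range(n_missions):
        if rng.random() < p_kill:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def simulate(runs=100, n_missions=1543, p_kill=0.05, seed=0):
    """Count how often each maximum streak length occurs across many runs."""
    rng = random.Random(seed)
    return Counter(longest_kill_streak(n_missions, p_kill, rng) for _ in range(runs))


counts = simulate()
for streak_len in sorted(counts):
    print(f"maximum streak of {streak_len}: {counts[streak_len]} runs")
```

Exact counts depend on the seed, but the bulk of runs should top out at a streak of 2 or 3, nowhere near 5 or 8.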

Doing a sanity check, that’s roughly what you’d expect. Out of 1543 missions, you’d expect about 77 kill missions. Given 77 kill missions, you’d expect the next mission to be a kill mission about 4 times. Given those 4 pairs, you’d expect about a 20 percent chance that one of them would be followed by another kill mission. This isn’t the right way to estimate the probability, but rather an overestimate, so getting a maximum streak of 3 in 14 runs out of a hundred is believable.

Indistinguishable, the problem you’re describing isn’t really relevant here, any more than if I said I flipped a coin and got heads 100 times out of 100, and was asking if you thought my coin was fair. Sometimes RNGs really are broken.

Of course they sometimes are, and I even outlined on what basis we can take a generator’s output and use it to deduce “This probably isn’t random”, using our usual inductive logic. I’m just pointing out the curiosity that it’s not enough to simply say “Oh, that output which I observed is one which, had this been random, would have low probability of occurring”, in and of itself.

Since he saw it, wouldn’t it be 1? Am I missing the joke here? Stats makes my head hurt, but I try to force myself to read about it.

I’d say you’re kind of ultra-getting it, actually. Feynman makes the comment in the middle of a discussion of the meaninglessness of “calculating probabilities after the fact”. Well, I wouldn’t quite say it’s meaningless, but there is a danger in misunderstanding or overstating the usefulness of doing so.

Conditioned on the knowledge that Feynman would see ANZ 912, then, of course, the probability that Feynman would see ANZ 912 is 1. However, without conditioning on that knowledge (i.e., looking at the “prior” probability distribution), the probability of seeing ANZ 912 is ridiculously low. But the fact that he did end up seeing ANZ 912 doesn’t, in itself, cause us to say “Oh, that can’t be a random license plate” or any such thing. For whatever reason, it’s not the particular kind of low-probability event which causes belief revision; it doesn’t strike us as significant.