Targeting the chi-squared test actually gets closer to your statement above than physical dice would.
This goes back to the Monte Carlo fallacy again: “rolling a fair die with the PRNG as close to uniform as possible” is not the same as being random, but we expect that it is.
Under a true random system, even after rolling snake eyes for a million throws, the next throw has the exact same chance of hitting snake eyes as the previous throws did.
You may expect random output to follow a normal distribution, and in general that is a fairly safe assumption, but in a truly random system that doesn’t have to happen. A normal distribution is the best guess for the distribution of a random process, but that in no way guarantees the distribution is normal; it is just a basis for a guess.
Note that dice are a bad example, as a single die has 6 equally likely faces and will probably not fit a normal distribution well, and even though a normal-looking distribution would be reasonable evidence of randomness, it doesn’t demonstrate randomness objectively. We use tools like the central limit theorem because proving a process to be random is very hard, and as far as I know it is still an unsolved problem.
By using the chi-squared test we have something that we can model, and use as a safe approximation of true randomness while really not impacting the payout or the chance of payout in a meaningful fashion.
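To make the chi-squared idea concrete, here is a minimal sketch in Python (not from the thread; `chi_squared_uniform` is an illustrative name). It computes the goodness-of-fit statistic for simulated die rolls against a uniform expectation and compares it to the standard 5% critical value for 5 degrees of freedom, which is about 11.07:

```python
import random

def chi_squared_uniform(rolls, faces=6):
    """Chi-squared goodness-of-fit statistic against a uniform die."""
    counts = [0] * faces
    for r in rolls:
        counts[r - 1] += 1
    expected = len(rolls) / faces
    return sum((c - expected) ** 2 / expected for c in counts)

# Simulate 60,000 rolls of a fair die with Python's PRNG.
rolls = [random.randint(1, 6) for _ in range(60_000)]
stat = chi_squared_uniform(rolls)

# Critical value for df = 5 at the 5% level is about 11.07;
# a fair die should fall below it most of the time.
print(stat)
```

A heavily biased die would blow well past the critical value, which is exactly the kind of deviation a regulator’s test is meant to catch.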
But to be clear once again: these progressive slots are popular because people prefer to play games with large potential wins, and when people play such games they are far too optimistic about their chances of winning. That is the ONLY reason to be outraged about this.
If people only wanted to play games like blackjack and video poker, the casinos would be full of them, because the house doesn’t make money due to “luck” and they have no need to “cheat” at this level to do so.
A chi-squared test against a normal distribution doesn’t favor the house, or the player. The entire way the game is structured favors the house.
I don’t see what the Monte Carlo fallacy has to do with that statement. When I say “as close to uniform as possible” I mean, “indistinguishable from a theoretical uniform entropy source” as if we could grab a uniform distribution from the aether and draw from it. This includes nasty things like really long runs, or seeds that have crazy clumping, and every other property of a theoretically truly uniformly distributed source of entropy.
I’m not sure why you’re hung up on the normal distribution either, it’s simply the most common distribution for most processes and events we observe in nature to resemble.
You’re right that given a true source of entropy we don’t actually know its distribution, we can only, in a bayesian sense, estimate that it’s similar to a specific distribution we’ve defined. That is, a truly random source of entropy is only with some degree of certainty normally or uniformly or whatever else distributed, but that doesn’t invalidate the notion of “mathematically pure” distributions, even if we can’t actually produce one.
Because the chi-squared test ensures that PRNGs produce a normal distribution, and it is easy to run and fairly easy to implement.
And you haven’t explained, outside of the gambler’s fallacy, why this is unfair. You have a very similar, if still highly unlikely, chance of hitting the jackpot in a game that you should already know is going to make you lose money almost always.
It is simply a way to try to closely approximate true random numbers and to verify that the approximation is happening.
It’s not unfair? When did I say that? I’m saying that for the programmer it’s much easier to work with weighting odds when you can roll uniformly, because otherwise what you expect to happen won’t if you have an improperly characterized distribution. This is true in everything random. If you want something to happen 80% of the time, you need to know what the distribution of your (P)RNG actually looks like.
That can be many different distributions, weighted in many ways, because you can convert between samples from a lot of finite distributions. I’m talking about uniform because uniform is usually how PRNGs work, but if it were normally distributed (not technically possible because of the finite bit width, but let’s say for argument) it wouldn’t matter, because you can discretize normal samples to create a finite uniform sample anyway. You just have to know what you’re dealing with.
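The “80% of the time” point above can be sketched in a few lines of Python (illustrative names, not anything from the thread): mapping a uniform sample to a weighted two-outcome event only works if the source really is uniform.

```python
import random

def weighted_outcome(u, p_win=0.8):
    """Map a uniform sample u in [0, 1) to a two-outcome event.

    This only gives the intended 80/20 split if u really is
    uniform; a skewed source would silently bias the odds.
    """
    return "win" if u < p_win else "lose"

trials = [weighted_outcome(random.random()) for _ in range(100_000)]
print(trials.count("win") / len(trials))  # roughly 0.8
```

If the underlying generator were, say, bell-shaped instead of flat, the same threshold would produce a very different win rate, which is the “improperly characterized distribution” problem in miniature.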
I think he is talking about “quasi”-random numbers that may be very non-random. I don’t think these are remotely appropriate for games of chance because, well, they are not random (or even pseudo-random), so I was amazed (well, not too amazed) to hear that the state and casinos don’t care, so long as they get the punters’ money.
By definition, low-discrepancy sequences will not pass a chi-squared test, as they do not follow a normal distribution.
It will also reject some PRNGs that are quite popular, both bad ones and good ones, because it is sensitive to noise.
This typically doesn’t impact the casual player’s winnings, but it can introduce patterns that people who are looking for them can use to win a lot of money, which is actually a bad thing for both casual players and the house, because it will reduce the perks and thus the experience.
In above-board legal gaming there isn’t much incentive for the house to cheat, but there is a huge risk from professional cheats.
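Low-discrepancy sequences are easy to demonstrate. A minimal sketch (assuming the base-2 van der Corput sequence as the example): over 4,096 terms the points fill the bins *perfectly* evenly, so the chi-squared statistic comes out exactly zero, which is itself a giveaway that the sequence is too regular to be random.

```python
def van_der_corput(n, base=2):
    """n-th term of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while n:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

# Bin 4096 terms into 16 cells: every cell gets exactly 256 points,
# so the chi-squared statistic is 0 -- "too uniform" for real randomness.
N, bins = 4096, 16
counts = [0] * bins
for i in range(N):
    counts[int(van_der_corput(i) * bins)] += 1
expected = N / bins
stat = sum((c - expected) ** 2 / expected for c in counts)
print(counts, stat)
```

A genuinely random uniform source would show natural fluctuation between bins; a statistic of exactly zero is the kind of pattern a sharp-eyed professional could exploit.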
Okay, I seriously, legitimately don’t understand. Please explain what the normal distribution has to do with this.
Every PRNG I’ve ever used has aimed for emulating a uniform distribution, not normal. Certainly, Salsa, ChaCha, Mersenne Twister, Isaac, and Xorshift all are typically implemented with generic “next_value” functions which return a number uniformly distributed between 0.0 and 1.0 floating point (or 0 and MAX_INT). Of course, it’s not truly uniformly random, they’re vulnerable to birthday attacks and such (with the better ones like the block ciphers requiring a lot of time to compute the attack provided a large enough block), but their quality in the short term is similar to observing an actual uniform distribution, with some being better at it than others.
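The “next_value returning a uniform float” shape mentioned above can be sketched with one of the listed generators. A minimal, illustrative implementation of Marsaglia’s xorshift64 step (the 13/7/17 shift triple), mapping the 64-bit state to a float in [0, 1):

```python
def xorshift64(state):
    """One step of Marsaglia's xorshift64; returns (new_state, value)."""
    state ^= (state << 13) & 0xFFFFFFFFFFFFFFFF  # keep it to 64 bits
    state ^= state >> 7
    state ^= (state << 17) & 0xFFFFFFFFFFFFFFFF
    return state, state / 2**64  # map to a float in [0, 1)

state = 0x9E3779B97F4A7C15  # any nonzero seed works
samples = []
for _ in range(5):
    state, u = xorshift64(state)
    samples.append(u)
print(samples)  # values spread across [0, 1)
```

The output looks uniform over the short term, exactly as described, even though the whole sequence is a deterministic function of the seed.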
I seriously have absolutely no clue what point you’ve been trying to make against me this whole time.
That really depends on whether whoever conducts the test knows what they are doing. Example (putting aside the question of generating normal distributions for now): my special secret electromechanical black box supposedly simulates the roll of a die. The brochure says something about a “standard chi-squared test for goodness of fit”. No problem; I pull 6000000 values from the box and come up with the following table:
Hmm? Oh, that’s from the old SDMB avatar extension. In online use avatars are pictures that accompany profiles, and the browser mod parsed that line to find the URL for the picture, and I didn’t want one so I set it to none.
In a scientific context, this is not advised. Link to online manual of Stata, a statistical program: [INDENT] Do not set the seed too often
We cannot emphasize this enough: Do not set the seed too often.
To see why this is such a bad idea, consider the limiting case: You set the seed, draw one pseudorandom number, reset the seed, draw again, and so continue. The pseudorandom numbers you obtain will be nothing more than the seeds you run through a mathematical function. The results you obtain will not pass for random unless the seeds you choose pass for random. If you already had such numbers, why are you even bothering to use the pseudorandom-number generator?
The definition of too often is more than once per problem.
If you are running a simulation of 10,000 replications, set the seed at the start of the simulation and do not reset it until the 10,000th replication is finished. The pseudorandom-number generators provided by Stata have long periods. The longer you go between setting the seed, the more random-like are the numbers produced. [/INDENT] They continue: [INDENT]
There is another reason you might be tempted to set the seed more than once per problem. It sometimes happens that you run a simulation, let’s say for 5,000 replications, and then you decide you should have run it for 10,000 replications. Instead of running all 10,000 replications afresh, you decide to save time by running another 5,000 replications and then combining those results with your previous 5,000 results. That is okay. We at StataCorp do this kind of thing. If you do this, it is important that you set the seed especially well, particularly if you repeat this process to add yet another 5,000 replications. It is also important that in each run there be a large enough number of replications, which is to say, thousands of them.
Even so, do not do this: You want 500,000 replications. To obtain them, you run in batches of 1,000, setting the seed 500 times. Unless you have a truly random source for the seeds, it is unlikely you can produce a patternless sequence of 500 seeds. The fact that you ran 1,000 replications in between choosing the seeds does not mitigate the requirement that there be no pattern to the seeds you set. [/INDENT]
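The limiting case the manual describes (reseed, draw one number, reseed again) is easy to demonstrate in Python with the standard library; the function name here is just illustrative:

```python
import random

# Reseeding before every draw makes the "random" numbers a pure
# function of the seed sequence: same seeds in, same outputs out.
def draw_with_reseed(seed):
    random.seed(seed)
    return random.random()

run1 = [draw_with_reseed(s) for s in range(5)]
run2 = [draw_with_reseed(s) for s in range(5)]
print(run1 == run2)  # True: no randomness beyond the seeds themselves
```

The two runs are identical because nothing random is left: the output is exactly the seeds pushed through a fixed mathematical function, which is the manual’s point.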
There’s more detail on the forum about using a clock to set the random seed:
[INDENT]Closing remarks
So what was wrong with Allan’s original suggestion?
Allan based the seed on the time of day. Let’s say Allan gets to the office around the same time every day. Let’s assume Allan runs simulations around the same time on days he runs them. Perhaps he starts them right after lunch, or just before going home. Allan is now drawing seeds in close proximity to each other. He is trusting H() to jumble that for him. Moreover, he is drawing from such a reduced set that over a period of time, Allan is likely to choose the same seed!
That’s why I offered an improvement based on the date and time.
The seeds are still ordered, however, so I’m still trusting H().
In my daily set-the-seed suggestion, I just used the number of days
since January 1, 1960. I am really trusting to the design of H().
Can’t I give you a good, deterministic way to set the seed, no matter
how complicated? No. It’s the deterministic part that’s the problem.
Complication has nothing to do with it. Complication may confuse you,
but it does not confuse the universe. Here’s one way to generate seeds
for those of you that want a process. I warn you, it’s random: [/INDENT] TBC! https://www.stata.com/statalist/archive/2010-08/msg00834.html
IIRC the PRNG used in Windows will not re-seed more often than every 100 ms. There is also an upper bound on the amount of pseudo-random data output before a reseed is called for.
I don’t know exactly what randomness sources are used, but an interactive system, as opposed to a fruit machine, has many more unpredictable sources besides the timer, such as mouse movement, keyboard clicks, disk access, and network access, so there should be enough to distill a few bits for the occasional reseeding.
On the other hand, it’s possible that you do have a true random number generator (based on radioactive decay or whatever), but that its throughput is much lower than you need for your application. In that case, best practice is probably to use your true RNG to seed your PRNG, and then re-seed as often as your true RNG allows.
…provided your RNG samples without replacement, either actually or practically.
I mean, rolling a single die is a decent RNG in terms of randomness, but you don’t want your seed to select among only six possible random series.
The way Stata handles this is by making a “state” available. The state identifies a particular random seed as well as the location in the succeeding pseudo-random vector. For Stata’s old PRNG (KISS), the state was a hash of maybe 80 characters, IIRC. For the new one (Mersenne Twister) it’s something like 5,000 characters. Anyway, even with KISS a budding physicist could set the seed when she was born and have plenty to last an entire lifetime of simulations 24/7. As for the period length of the Mersenne Twister, it is just ludicrous.
Right, if your original source of randomness is dice, then you wouldn’t want to use one die roll to seed your generator. You’d want to use enough dice rolls to span the space of possible seeds. So you’d keep on rolling dice in the background until you have enough, then re-seed with all of those rolls, then start accumulating rolls again for the next re-seeding while you use the output of the PRNG for whatever you’re using it for.
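A minimal sketch of that accumulation idea in Python (`dice_to_seed` is an illustrative name; the `random.randint` call stands in for real physical rolls): treat each roll as a base-6 digit, and collect enough rolls to span a 64-bit seed space, since log2(6^25) is about 64.6 bits.

```python
import random

def dice_to_seed(rolls):
    """Pack a sequence of die rolls (1-6) into one integer seed,
    treating them as digits of a base-6 number."""
    seed = 0
    for r in rolls:
        seed = seed * 6 + (r - 1)
    return seed

# ~25 rolls of a d6 give log2(6**25) ~ 64 bits of seed material.
physical_rolls = [random.randint(1, 6) for _ in range(25)]  # stand-in for real dice
seed = dice_to_seed(physical_rolls)
rng = random.Random(seed)
print(rng.random())
```

One roll alone would leave only six possible output streams; twenty-five rolls make the seed effectively unguessable, which is the point of accumulating in the background between reseeds.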
To clarify, I never correct people when they assume I am a ‘she’, because I personally don’t think it matters what gender a poster is, and most people who are a ‘she’ are often assumed to be a ‘he’.
I am concerned that there will be a perception that I am making that claim, and that is important to other people, so I will clarify that I am male.
Thank you for trying to correct this, and I think our intent is the same, which is to respect the individual and help us all avoid hurtful assumptions.
Really, you can have a generator that produces any distribution at all, as long as you know exactly what that distribution is. But that’s by far easiest for a uniform distribution.
Oh, I think I was confused about who DPRK was referring to and thought they were trying to clarify my position. I thought they were referring to me, then got confused by “avatar”, rationalized it to be about the line in my profile, and, well…
I mean, the issue with modern computers is you’re limited to finite, discrete-valued distributions. I guess with bignum/Real packages you could defeat this, but I struggle to think of how you’d actually generate relevant values. At some point you have to sit down and generate bits of randomness. Maybe some sort of Markovian model for generating symbolic expressions?
Which isn’t to say you can’t get anything resembling a normal distribution (or otherwise) on a computer, obviously there are packages that emulate any number of distributions, just… not the true versions.