If at all? Does it vary by programming language?
Using what algorithm? For what purpose?
Generally, the answer is “never” – in most common algorithms, the sequence will retain its “randomish” character until the sequence plays out. On the other hand, pseudo-random sequences usually repeat after some number of iterations, so if you want a REALLY large number of them, you’d have to reseed.
If you’re depending on your numbers being unpredictable (casino games, for example, or cryptography), you’ll likely not be satisfied with a purely algorithmic generator, no matter how often you reseed.
On Edit: And no, the programming language doesn’t matter, except inasmuch as that might determine what algorithm is “standard” for that language. The algorithm produces the same sequence for the same seed, regardless of what language it’s implemented in.
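For instance, here’s a minimal C sketch of that determinism (the seed value 42 is arbitrary): seeding twice with the same value reproduces the same draws.

[code]
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Seed, draw three numbers... */
    srand(42);
    for (int i = 0; i < 3; i++)
        printf("%d ", rand());
    printf("\n");

    /* ...then reseed with the same value: the draws repeat exactly. */
    srand(42);
    for (int i = 0; i < 3; i++)
        printf("%d ", rand());
    printf("\n");

    return 0;
}
[/code]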
Well, in this instance, I was trying to determine, under certain conditions, how often a particular number would come up, given a million random draws. There was a significant statistical difference (about 4%) between seeding it before each draw and never seeding it. However, there was no statistical difference between never seeding it and seeding it every 10,000 draws.
From what I understood (seemingly mistakenly from your post… my last class on the subject was way too long ago), the seed selects a list of pseudo-random numbers, and the generator simply goes down that list with each draw. Eventually, I was thinking, that list has to run out, and then probably repeat itself. So if the list is 10,000 numbers long, for example, then running the million-draw test without reseeding would just be repeating the run 100 times.
I can’t answer your question, but I can recommend a text: Numerical Recipes in C has a long and excellent chapter on pseudo-random number generation and the pitfalls to be avoided. A decent PR generator will not repeat the sequence for the entire integer space over which it operates, so a 32-bit generator will not repeat for 4.3 billion draws. A badly done generator will repeat far more often. The book mentioned above gives several examples of “good” generators, and I have used the underlying algorithms to code generators in assembly.
If you’re seeding before each draw, you’re not really using a pseudo-random number generator at all–you’re just using the seed as the source of randomness. This is fine if you’re seeding from a truly random source (such as a photon detector or Geiger counter attached to the computer). But if you’re seeding from the system clock (which is typically the case), then you lose all randomness unless the time between draws is both random and greater than the clock’s resolution.
For example, if you’re seeding before each draw using the number of seconds elapsed since midnight, and you’re drawing 100 times per second, then you’ll end up with clumps of 100 draws with the same value.
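Here’s a rough C sketch of that failure mode, assuming the common pattern of seeding from time(NULL), which has one-second resolution:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    /* Reseeding from a one-second clock before every draw: every draw
       within the same second gets the same seed, and therefore the
       same "random" number. */
    for (int i = 0; i < 10; i++) {
        srand((unsigned) time(NULL));
        printf("%d\n", rand());  /* same value until the clock ticks over */
    }
    return 0;
}
[/code]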
First, it sounds like this is a problem that can be solved exactly with basic probability rather than approximated by simulation. You don’t have to simulate a coin flip (or actually flip it) a million times to find out that the chance of heads is 0.5.
You have the basic concept correct, but there is not really an a priori list. Because pseudo-random number generators are deterministic, though, the output is equivalent to a list. And the “list” may have a finite size before it repeats (I am no expert on pseudo-random algorithms, so I don’t know whether they really repeat, or how long the sequences get before they repeat).
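One way to see the “list” behavior is with a toy linear congruential generator. The constants below are deliberately tiny so the whole cycle fits on screen; real generators use far larger moduli.

[code]
#include <stdio.h>

/* Toy LCG: next = (5*state + 3) mod 16.  With modulus 16, the entire
   "list" is only 16 numbers long before it starts over. */
static unsigned lcg_state;

static unsigned lcg_next(void)
{
    lcg_state = (5 * lcg_state + 3) % 16;
    return lcg_state;
}

int main(void)
{
    lcg_state = 7;                 /* the seed picks where in the cycle we start */
    for (int i = 0; i < 20; i++)
        printf("%u ", lcg_next()); /* after 16 draws, the sequence repeats */
    printf("\n");
    return 0;
}
[/code]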
Gah! Seeding before every draw?! That’s a huge no-no.
The fact that you found a statistical difference between two different methods of using the pseudorandom number generator is a huge hint that at least one of them is bad. The only practical way to test the quality of pseudorandom number sequences is comparing them via various statistical tests. A good pseudorandom sequence will be statistically similar to a truly random sequence. The fact that seeding it every 10,000 draws was statistically similar should reassure you.
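As one example of the kind of statistical test meant here, a quick chi-square check of whether rand() fills ten buckets uniformly (just a sketch, not a substitute for a real test suite; note that rand() % BUCKETS itself adds a tiny modulo bias unless RAND_MAX+1 is divisible by the bucket count):

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUCKETS 10
#define DRAWS   1000000L

int main(void)
{
    long counts[BUCKETS] = {0};
    srand((unsigned) time(NULL));          /* seed once */

    for (long i = 0; i < DRAWS; i++)
        counts[rand() % BUCKETS]++;

    /* Chi-square statistic: sum of (observed - expected)^2 / expected.
       For 9 degrees of freedom, values well above ~21.7 (the 99th
       percentile) suggest the output is not uniform. */
    double expected = (double) DRAWS / BUCKETS;
    double chi2 = 0.0;
    for (int b = 0; b < BUCKETS; b++) {
        double d = counts[b] - expected;
        chi2 += d * d / expected;
    }
    printf("chi-square = %f\n", chi2);
    return 0;
}
[/code]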
In general, know the period of your pseudorandom sequence. The documentation should tell you; if it doesn’t, it’s not a good-quality generator. A good system rand() function will have a period of at least 2[sup]32[/sup]. If you get a generator from the peer-reviewed literature, you’ll find generators with periods like 2[sup]160[/sup] or longer.
It’s usually better to seed a generator once. If you need so many pseudorandom numbers that you’re using a significant fraction of the generator’s period, you need a better generator.
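Putting that together, the usual pattern looks like this: seed exactly once at startup, then draw as many numbers as you need. (The target value and range below are placeholders for whatever the simulation is counting.)

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand((unsigned) time(NULL));   /* seed ONCE, at program start */

    long hits = 0;
    for (long i = 0; i < 1000000L; i++) {
        int draw = rand() % 100;    /* one draw per iteration, no reseeding */
        if (draw == 42)             /* count how often one value comes up */
            hits++;
    }
    printf("hits: %ld (%.4f%%)\n", hits, 100.0 * hits / 1000000.0);
    return 0;
}
[/code]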
Yeah yeah, I know. :smack: I just wrote the program kinda quick, and I knew I was going to have to come back and rearrange how I seeded it, but I wanted to make sure all of the mechanics worked first. Additionally, I was curious as to what difference it would make between seeding correctly and incorrectly. One thing that I found interesting was that the percentage of hits was steadily increasing, instead of settling on a constant.
Well, the basic problem can be solved with mathematics, but the larger issue takes into consideration human choice, which is beyond my ability to solve mathematically (a friend of mine is going to give it a try), so I created a simulation that would give me a pretty good idea instead. (The problem itself would be for another thread, I think.)
My statistics professor said that with a sample group of 31 you’ll know how the data’s going to turn out (neither I nor my whole class understood how, or totally believed him, to tell the truth), but I tried a thousand. But as I mentioned above, the hit percentage was increasing, so I tried a million before deciding to go back and fix the blasted seeding issue.
The simplest hypothesis tests for small samples tend to involve the t distribution with n-1 degrees of freedom, where n is the sample size. With a sample of 31 you have 30 degrees of freedom, and at that point the difference between the t distribution and the normal distribution is negligible, so you just use the normal approximation.
Of course, the question of how many data points you need before you believe your conclusion depends on what exactly your conclusion is. If your conclusion is that so-and-so exists, or is possible, then you only need one data point, if it’s a positive one. If your conclusion is that something happens 99.999% of the time, then you need more than 100,000 data points.