Doesn’t quite work. You need to pick from all the ranges. Consider just two numbers, with N=4. If the first number is 1 or 4, then we are left with a single 3-long range. 3 possibilities each, or 6 total. If the first number is 2 or 3, then we have a 1-long and a 2-long range. If we pick from the 2-long ranges, then we get 2 possibilities each, or 4 total. All told, we have 10 possible choices.
But in fact there are 6 distinct outcomes: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4). There’s no way to distribute those 6 outcomes evenly across the original 10 choices; some will happen more often than others. The method of picking needs to produce a multiple of the actual number of outcomes. If the second pick can choose from any of the 3 empty slots, then we have 4*3=12 choices, which works fine (the factor of 2 comes from the fact that we can pick the numbers in either order).
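To make that counting concrete, here is a small sketch (C++, with names of my own choosing) that enumerates every choice the “pick from the empty slots” method can make for N = 4 and confirms that each of the 12 ordered pairs comes up exactly once:

#include <cstdio>

// Enumerate all 4*3 = 12 (first pick, second pick) combinations for N = 4.
// The second pick is drawn from 1..N-1 and shifted up past the first pick,
// so it can never collide with it.
int main() {
    const int N = 4;
    int counts[N + 1][N + 1] = {};             // counts[a][b] for ordered pair (a, b)
    for (int r1 = 1; r1 <= N; ++r1) {          // first pick: 1..N
        for (int r2 = 1; r2 <= N - 1; ++r2) {  // second pick: 1..N-1
            int b = r2 + (r2 >= r1);           // branch-free shift past r1
            ++counts[r1][b];
        }
    }
    for (int a = 1; a <= N; ++a)
        for (int b = 1; b <= N; ++b)
            if (counts[a][b])
                printf("(%d, %d): %d\n", a, b, counts[a][b]);
}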
Things that go through one’s head in the middle of the night. That isn’t quite right. It would be true if the three numbers were “labelled” in some way, e.g. a green, a red, and a blue one. But since they are not otherwise distinguishable, the count has to be divided by 6: C(N,3) = N*(N-1)*(N-2)/6. But the “multiple of” requirement still holds, so N*(N-1)*(N-2) falls into line.
I can’t imagine a method to solve this problem where the product of the ranges of random choices gets below that. Something of a challenge there, maybe.
As to the “it’s always a power of two” thing. Suppose you wanted a number from 0 to 2. (Range size 3.) One thing you could (foolishly) do is make 3 random calls, add them together, and then mod the result by 3. Not a good idea since, for example, 0 is underrepresented. But that can be fixed. The real trick is doing this with far fewer than N calls. Log N is easy enough. (All numbers are sums of powers of two.) But to get to a constant requires a little more thought.
In any case, the “it’s always a power of two” claim falls apart once multiple random calls are combined intelligently.
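As a tiny illustration of the underrepresentation (my own reading of the scheme, using single random bits as the calls): enumerate all 2^3 equally likely outcomes of three bit-flips, sum them, and reduce mod 3. The value 0 turns up only 2 times in 8, while 1 and 2 turn up 3 times each:

#include <cstdio>

// Sum three independent random bits and reduce mod 3. All 8 bit patterns
// are equally likely, so counting them gives the exact output distribution.
int main() {
    int counts[3] = {};
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            for (int c = 0; c <= 1; ++c)
                ++counts[(a + b + c) % 3];
    for (int r = 0; r < 3; ++r)
        printf("%d occurs %d times out of 8\n", r, counts[r]);
}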
In addition to the nonuniform distribution issue that @Dr.Strangelove brought up, it appears your method requires branching to decide which range to use.
This also seems that it would require branching. Any time you need a test of any sort to see if the code did what you want, that implies a branch to me.
This would not meet the requirement of 3 guaranteed unique numbers without needing to check them (and thus adding a branch).
Again, you’d have to test them to see which numbers are unique. And with a random range as small as 1-5, the chances of not getting 3 unique numbers are significant: only 5*4*3 of the 5^3 triples, i.e. 60/125 = 48%, are all distinct, so a repeat is actually more likely than not.
I have a feeling that it’s going to be very, very difficult to rigorously enforce “no branching”, since with no branching at all, it’s almost impossible to even have an algorithm, and a lot of “individual steps” will have branching hidden in their implementation. For instance, a modulo operator might be written with branches.
Okay, I think I’ve found what I was looking for earlier:
Suppose we’ve already picked the first two random integers as I described earlier. Now add the following lines to generate the third random integer:
R = floor(random(N-2)) + 1
R = R + ((R >= RandomInteger1) or (R >= RandomInteger2))
RandomInteger3 = R + ((R >= RandomInteger1) and (R >= RandomInteger2))
For an explanation of the logic, suppose N = 10 and the first two random integers turned out to be 4 and 7.
This leaves 8 (N-2) possibilities for the third random integer, so begin by picking a random number R from 1 to N-2.
If R is below both 4 and 7 (i.e. 1, 2, or 3), take RandomInteger3 to be R itself.
Otherwise (i.e. in the case that R is >= at least one of them), add 1 to R. (So, 4 → 5, 5 → 6, … 8 → 9.)
If the result is now strictly between 4 and 7 (i.e. 5 or 6), this is our RandomInteger3. Otherwise (i.e. R is >= both 4 and 7), add one more, so that 7 → 8, 8 → 9, and 9 → 10.
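Putting the three picks together, here is a sketch of the whole scheme in C++. The random1toN helper is a stand-in of my own for whatever uniform source is actually in use (I use <random> here just to make it runnable), so the branch-free claim applies only to the combining arithmetic, not to the library internals. Note the use of | and & rather than || and &&, since the short-circuit operators imply branches:

#include <random>

std::mt19937 gen{std::random_device{}()};

// Stand-in uniform source: an integer in 1..n.
int random1toN(int n) {
    return std::uniform_int_distribution<int>{1, n}(gen);
}

// Pick three distinct integers in 1..N without comparing-and-retrying.
void pickThreeUnique(int N, int& r1, int& r2, int& r3) {
    r1 = random1toN(N);
    r2 = random1toN(N - 1);
    r2 += (r2 >= r1);                  // shift the second pick past the first
    int r = random1toN(N - 2);
    r += (r >= r1) | (r >= r2);        // shift past the smaller of the two
    r3 = r + ((r >= r1) & (r >= r2));  // then past the larger
}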
There’s no reason for a mod operation to have a branch unless you’re on an architecture with no divide instruction. Still, you have a point.
If we’re trying to be precise, we can come up with a virtual machine with instructions that we simply define to be branch-free. For instance, to implement mod, we might have (first argument is the destination):
DIV d, a, b
MUL d, d, b
SUB d, a, d
That’s the same as d = (a mod b) = (a - (a / b * b)).
And of course we can simply create an instruction to build the random numbers in the first place. Say, one that produces a result from 0 to range-1:
RNG dest, range
For convenience, I would add a simple macro substitution language, to define common sequences. Say:
macro MOD dest, arg1, arg2
DIV dest, arg1, arg2
MUL dest, dest, arg2
SUB dest, arg1, dest
end
Yeah, that’s another reasonable approach. It’s acting as a kind of sorting step: increment if the new number is >= the smaller of the existing numbers, and then increment again if it’s >= the larger. It could be extended further:
R4 = R4 + ((R4 >= R1) or (R4 >= R2) or (R4 >= R3))
R4 = R4 + (((R4 >= R1) and (R4 >= R2)) or ((R4 >= R1) and (R4 >= R3)) or ((R4 >= R2) and (R4 >= R3)))
R4 = R4 + ((R4 >= R1) and (R4 >= R2) and (R4 >= R3))
Obviously, this starts to get unwieldy.
However, one can use sorting networks instead. These are fixed sequences of conditional swaps (i.e., sort two values) that can achieve a total sort in O(N*log N) steps (actually it’s more like O(N*log^2 N) in practice, but close enough). Using this method, sort all the existing values between each new pick, then perform:
Rn = Rn + (Rn >= R1)
Rn = Rn + (Rn >= R2)
…
Rn = Rn + (Rn >= Rn-1)
There’s no need for the extra ands and ors since we know R1…Rn-1 is already monotonically increasing.
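A sketch of both pieces (C++, names are mine): compareExchange is the branch-free “sort two values” primitive a sorting network is built from, and insertRank is the chain of shifts above, valid once the existing picks are in increasing order. The loop in insertRank runs a fixed number of times for a given k, so it can be fully unrolled into exactly the Rn lines shown:

#include <algorithm>

// Conditional swap: after the call, a <= b. std::min/std::max on integers
// typically compile to conditional moves rather than branches, though that
// is a compiler/architecture matter, not a language guarantee.
inline void compareExchange(int& a, int& b) {
    int lo = std::min(a, b);
    int hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Given existing picks r[0] < r[1] < ... < r[k-1] (already sorted by the
// network), shift a fresh pick x drawn from 1..N-k past each in turn.
inline int insertRank(const int* r, int k, int x) {
    for (int i = 0; i < k; ++i)
        x += (x >= r[i]);  // the comparison yields 0 or 1
    return x;
}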
Just to be mean: there is a difference between the inbuilt pseudo-random number generators and a true stochastic random process generating numbers. All the inbuilt RNG functions maintain internal state, and within our (incomplete) mathematical understanding of their operation they will never provide two identical consecutive numbers. A figure of merit is how many calls it takes to get them to loop around. These tend to be big. Really big.
A common implementation tactic would mean that if the RNG function returned two consecutive identical values, it would have managed to stick itself in an infinite loop of identical outputs. It isn’t impossible, but unlikely, where unlikely is of the order of waiting for the stars to go cold. If your program starts with a known seed for the RNG this won’t happen once tested, but if the seed is built from some semi-random component (like the bottom bits of the system clock, and other junk), just maybe.
If this was a homework question, and a student pointed this out, and simply called the RNG three times I would not give them zero. I would however deduct something for being a smartarse.
That is not true in general. It was only ever true for some early, poorly designed PRNGs (which, admittedly, might still be alive and kicking in some places). While it’s true that the internal state vector is designed to never produce the same state twice in a row (since, as you say, that would imply it is stuck in a loop), only a small portion of the internal state is ever returned as the pseudorandom value, and that portion most definitely can repeat. It would fail all kinds of PRNG quality tests if that did not happen.
Also, modern CPUs have true physical RNG hardware, generally driven by thermal noise. They still go through an algorithmic “whitening” process to reduce bias and other effects, but it is true randomness. And like any random values, these can (and must) occasionally produce duplicate values.
That raises an interesting issue. There are in some sense two sorts of random number generators: truly stochastic ones, which have particular value in cryptography, and pseudo-random number generators, which have particular value in simulation and numerical work. For the latter you really want deterministic PRNGs. Otherwise testing and reproducibility become a serious issue. Being able to set the seed for a simulation run is really important.
Right: these physical RNGs are used only for cryptographic purposes, where it’s really important that the number is unpredictable. But as you say, for Monte Carlo simulation and the like, unpredictability is a negative trait, and you want reproducibility instead. Modern PRNGs are perfectly fine for these latter purposes, where you want the basic statistical properties of random numbers but still desire determinism.
In any case, programmers always have a choice as to which to use. C++ offers std::random_device, which gives you a physical RNG when available and can report an estimate of its entropy. On the other hand, there’s also std::mt19937, a Mersenne Twister engine, which is a high quality PRNG, excellent for Monte Carlo work but useless for cryptography. If there is no physical RNG available, there are ways to simulate one by using other available entropy sources, such as the timestamps of the last hundred keystrokes. That still requires a whitening process, but with conservative assumptions it makes a decent source for, say, generating encryption keys.
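A quick sketch of both sources in use (seed and constants are arbitrary):

#include <iostream>
#include <random>

int main() {
    // Nondeterministic source, where the platform provides one. entropy()
    // returns only an estimate, and some implementations report 0.
    std::random_device rd;
    std::cout << "one device value: " << rd()
              << " (entropy estimate: " << rd.entropy() << ")\n";

    // Deterministic Mersenne Twister: a fixed seed reproduces the same
    // sequence on every run, which is what you want for simulation.
    std::mt19937 sim(12345);
    std::uniform_int_distribution<int> die(1, 6);
    for (int i = 0; i < 5; ++i)
        std::cout << die(sim) << ' ';
    std::cout << '\n';
}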
And as a small example of what I mean by “whitening”:
Suppose you have a coin which produces truly random, independent results, but which has an unknown bias. That is to say, it might flip 60% heads and 40% tails, but aside from that the flips are totally uncorrelated. How do you turn this into a flat distribution?
One method is to always flip the coin twice in a row and keep only the HT and TH results, using the first flip of each pair as the result. If you got HH or TT, try again.
This works because the odds of those are p(1-p) and (1-p)p, which are of course the same number, regardless of the value of p. This would be a slow method if p were close to 1, say 0.999, but then, such a coin would not have much entropy to it in the first place, and you’d want to flip it many times to produce a good result.
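In code, the trick (commonly known as the von Neumann extractor) looks like this. The biased coin is simulated here with a fixed p of my choosing; the retry loop is inherent to the method, since you cannot bound in advance how many biased flips one fair bit will cost:

#include <random>

std::mt19937 rng(42);

// A coin that comes up heads with (unknown, to the consumer) probability p.
bool biasedFlip(double p = 0.6) {
    return std::bernoulli_distribution(p)(rng);
}

// Flip twice; keep only HT and TH, which occur with equal probability
// p*(1-p) each, so the returned bit is fair regardless of p.
bool fairBit() {
    for (;;) {
        bool a = biasedFlip();
        bool b = biasedFlip();
        if (a != b)
            return a;  // HT -> heads, TH -> tails
    }
}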
Actual whitening algorithms are more sophisticated, but it’s the same basic idea. In some cases, you might want more random bits than the hardware can produce. In this case, you spread the entropy too thinly for true randomness, but the results are still quite good. In a way, it’s like a PRNG that gets re-seeded on a regular basis with a true random value. You could only predict short runs at most.
Setting aside the interesting discussion of the last few posts about RNG vs PRNG, even a long-cycle and hence practically non-repeating PRNG will still present a problem.
When we collapse the 64- or 128- or whatever-bit binary random number coming out of the PRNG to the algorithm’s required integer range [1, N] for smallish N, a lot of consecutive duplicates will appear.
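A rough demonstration of the effect (engine and constants chosen arbitrarily): the raw 64-bit outputs essentially never repeat back to back, but after reduction to 1..5, roughly one output in five equals its predecessor:

#include <iostream>
#include <random>

int main() {
    std::mt19937_64 gen(2024);
    const int trials = 1000000;
    int duplicates = 0;
    long long prev = -1;
    for (int i = 0; i < trials; ++i) {
        long long v = gen() % 5 + 1;  // crude reduction to 1..5 (the slight
                                      // modulo bias doesn't matter here)
        duplicates += (v == prev);
        prev = v;
    }
    std::cout << duplicates << " consecutive duplicates in "
              << trials << " draws\n";  // expect roughly trials / 5
}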
Or, for perhaps a more familiar example, the FreeCell program on your computer. It has 16 million puzzles to choose from, or some such, and if you call up your friend and say “Puzzle number 1768549 is a particularly difficult one; give it a try”, and they enter in that puzzle number, they’ll see exactly the same layout of cards that you do. The game isn’t storing 16 million different shuffles in a big data table somewhere: It has a shuffling algorithm of some sort that uses a pseudorandom number generator, and the ID number of the puzzle is the seed used by that PRNG.
When No Man’s Sky advertised that they had 18,446,744,073,709,551,616 different planets in their universe, that just meant their random seed was a 64-bit number, with 2^64 possible values.
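The shape of the trick, in a toy version (the real FreeCell deal uses its own specific generator and card encoding; this just shows that seed = puzzle ID implies identical deals):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// "Deal" a 52-card deck deterministically from a puzzle number.
std::vector<int> dealPuzzle(std::uint64_t puzzleNumber) {
    std::vector<int> deck(52);
    std::iota(deck.begin(), deck.end(), 0);  // cards 0..51
    std::mt19937_64 gen(puzzleNumber);       // the puzzle ID is just the seed
    std::shuffle(deck.begin(), deck.end(), gen);
    return deck;
}

int main() {
    // Same number in, same layout out (given the same library implementation).
    std::cout << (dealPuzzle(1768549) == dealPuzzle(1768549)
                      ? "identical deal\n" : "bug!\n");
}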