Original Pure Math Question

Say you have a random bit generator, a flipped coin, for instance, which has a 0.5 chance of generating a 0 and a 0.5 chance of generating a 1. Now say you’re not satisfied with bits; you want to generate a single random integer from the set {0, 1, 2} such that any of the three possibilities has an equal chance of being generated. One way of doing this is to generate two random bits, and to act accordingly:

Two bits - result
00 - generate 0
01 - generate 1
10 - generate 2
11 - repeat

Do you see how that works? If you get a 11, you generate two more bits until you get a combination that isn’t 11, and go with that. Now then, for lack of a better term, let’s call this means of generating a random ternary digit Method Alpha. One problem with Method Alpha is that it’s possible that you’ll be generating bits for quite some time (i.e., you keep getting 1s from your bit generator). However, that isn’t very likely. I went and did the math for this, and if I did it right, the expected (average) value for the number of bits you’ll have to generate is 8/3. I’ll consider one Method more efficient than another if its expected number of bits generated is less.
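Method Alpha is easy to check by simulation; here is a quick sketch (Python, names mine) that reproduces the 8/3 figure empirically:

```python
import random

def method_alpha_base3():
    """Generate a uniform digit from {0, 1, 2} via Method Alpha.

    Returns (digit, flips): the digit and how many bits were consumed.
    """
    flips = 0
    while True:
        two_bits = 2 * random.getrandbits(1) + random.getrandbits(1)
        flips += 2
        if two_bits < 3:       # 00, 01, 10 -> accept as 0, 1, 2
            return two_bits, flips
        # 11 -> throw the pair away and repeat

random.seed(1)
trials = [method_alpha_base3() for _ in range(100_000)]
print(sum(f for _, f in trials) / len(trials))   # close to 8/3 = 2.667
```

The acceptance probability per round is 3/4, so the expected number of rounds is 4/3 and the expected flips are 2 × 4/3 = 8/3, matching the simulation.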

Question the First: Is there any Method of generating a single random ternary digit which is more efficient than Method Alpha?

Question the Second: Is there any Method of generating a single random ternary digit for which the possible number of bits generated has an upper limit?

Now, I’m betting the answers to those two questions are No and No, so I decided to make this more interesting. You can use Method Alpha to generate a random digit of base N, but you may have to generate more than two random bits at a time. For instance, for base 5, use the following table:

000 - generate 0
001 - generate 1
010 - generate 2
011 - generate 3
100 - generate 4
101 - repeat
110 - repeat
111 - repeat

The number of bits you have to generate at a time, k, is ceil(log[sub]2[/sub]N), where ceil represents the least-integer-greater-than-or-equal-to function. Empirically, I have found that the expected number of bits for Method Alpha, for any given N, is k×2[sup]k[/sup]/N, where k is defined as above.
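That formula isn’t just empirical: each k-bit round succeeds with probability N/2[sup]k[/sup], so the expected number of rounds is 2[sup]k[/sup]/N, giving k×2[sup]k[/sup]/N flips on average. A one-function check (Python, names mine):

```python
from math import ceil, log2

def alpha_expected_flips(n_base: int) -> float:
    """Expected bits consumed by Method Alpha for one base-N digit.

    Each round flips k = ceil(log2(N)) bits and succeeds with
    probability N / 2**k, so the expected flip count is k * 2**k / N.
    """
    k = ceil(log2(n_base))
    return k * 2**k / n_base

for n in (3, 5, 7, 71):
    print(n, alpha_expected_flips(n))   # 8/3, 4.8, 24/7, ~12.62
```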

Question the Third: Is there any Method of generating a single random base N digit, where N is any prime number, that is more efficient than Method Alpha?

Thanks in advance for your many enlightening posts.

When does the professor need this? I’d like to do a bit of research.

I suspect the answers to all three questions are “no.” There are ways of generating sequences of ternary or whatever random digits which approach 100 percent (I believe) efficiency when generating many such digits. Look at the section on “Arithmetic Coding” in Numerical Recipes (2nd Ed.) for a solution (and a nice explanatory graphic) for the situation where there are five “digits”, and where each has a different probability of occurrence.

Even with this algorithm, I think the answer to Q-the-2 is still “no.”

Yes, there is occasionally a method more efficient than Method Alpha. There are probably a lot of situations where this works, but here is the one I worked out.

Say you wanted to generate a random digit from 0-4 (0, 1, 2, 3, or 4)

Using method alpha:

000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = repeat
110 = repeat
111 = repeat

Average number of “flips” is exactly 4.8 (24/5).

Using a modified version of method alpha:

flip four coins

0000 = 0
0001 = 0
0010 = 0
0011 = 1
0100 = 1
0101 = 1
0110 = 2
0111 = 2
1000 = 2
1001 = 3
1010 = 3
1011 = 3
1100 = 4
1101 = 4
1110 = 4
1111 = repeat

Average number of “flips” is 64/15, which is less than 4.3.

This is due to the fact that even though you are flipping more coins, there is much less chance of having to repeat.
There is 5.3% chance of exceeding 9 flips with method alpha. There is only a 0.4% chance of exceeding 8 flips with the modified alpha.
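Both sets of figures can be computed exactly with the same little calculation (flips per round divided by acceptance probability); a sketch, names mine:

```python
from fractions import Fraction

def expected_flips(bits_per_round: int, accepted: int) -> Fraction:
    """Exact expected flips for a scheme that flips `bits_per_round` coins
    per round and accepts `accepted` of the 2**bits_per_round outcomes."""
    p_accept = Fraction(accepted, 2**bits_per_round)
    return bits_per_round / p_accept

print(expected_flips(3, 5))           # Method Alpha, base 5: 24/5 = 4.8
print(expected_flips(4, 15))          # modified method: 64/15, about 4.27
print(float(Fraction(3, 8) ** 3))     # P(Alpha exceeds 9 flips), about 5.3%
print(float(Fraction(1, 16) ** 2))    # P(modified exceeds 8 flips), about 0.4%
```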

ZenBeam, by 100% efficiency, I assume you mean that the expected value of “bits-per-N-base-digit” is log[sub]2[/sub]N, which would be the theoretical lower limit. And yes, you’re absolutely right. For a large number of digits, you get arbitrarily close to 100% efficiency. The easiest (probably not best) way to do this would be to make the bits you generate successive digits in a base-2, er, um, decimal (e.g. 0.0110100010001111010010…) and convert that number to base N. Then you get a string of digits in base N, and the more digits you make, the closer you get to 100% efficiency. But that’s simple. This is one instance when a single iteration is more complicated than an arbitrarily large number of them.

Lance, that’s awesome! Do you have an algorithm for determining the ideal k for any given N? That is, the number of bits-at-a-time that will give you the most efficient algorithm in Method Lance? If not, I’ll try to derive one.

Also, I stipulated that N had to be prime, because I assume that the best way of generating an N-base digit for composite N would be to generate one for each of N’s prime factors, using the most efficient means possible. If that makes sense, I wonder if I was wrong in that assumption, thereby posing…

Question the Fourth: Is there an algorithm for generating a single random p×q-base digit, for which the expected value of bits is less than the sum of the expected values for generating a p-base digit and a q-base digit, for any p, q?

Given Lance Turbo’s answer, I can answer Question the Fourth immediately. Assuming, for the moment, that his method for Base 5 is optimal (and I don’t see how you could do better), then the algorithm for 15 is shorter than the sum of the algorithms for 5 and 3, since the alg for 15 is the same length as for 5.

I’ve been doing a little time wasting and here is what I’ve come up with.

If you wanted to generate a random number from 0 to 70 inclusive, (71 possible numbers)…

Method Alpha would have you flip 7 coins, you’d have to repeat 57 times out of every 128 attempts for an average of about 12.6 total flips per random number generated.

Method Lance would have you flip 8 coins, you’d have to repeat 43 times out of every 256 attempts for an average of about 9.6 total flips per random number generated.

Method I can’t believe I took the time to figure this out would have you flip 9 coins, you’d have to repeat only 15 times out of every 512 attempts for an average of under 9.3 total flips per random number generated.

I haven’t had any luck with an algorithm to determine the best method for a given number N.

Here are some results I’ve come up with. Maybe they’ll help someone develop an algorithm for determining the optimal procedure for generating a random integer in the range 0 to (p-1) by inspecting p.

If n is an integer such that 2[sup]n[/sup] >= p > 2[sup]n-1[/sup] and we flip n times, repeating if necessary, we have Method Alpha.

If we determine n for a given p, then using n+1 flips is Method Lance, which I will call here Method Beta. Similarly, flipping n+2 times is Method Gamma, n+3 flips is Method Delta, etc.

If we define 2[sup]n[/sup] = ap + b, a>=1, b>=0, b<p, then it is possible to derive expressions for the expected number of flips for any method.

Exp[sub]p,n[/sub] = (n * 2[sup]n[/sup]) / ap. This is the convenient computational form.

Substituting for ap and dividing by 2[sup]n[/sup] gives Exp = n/(1 - b/2[sup]n[/sup]).

This shows that the smaller the remainder b, ie, the closer p is to a power of 2, the smaller is Exp. When p = power of 2, Exp is of course n, since n flips generate all possibilities with no waste.
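These expressions are easy to put to work. Here is a sketch (Python, names mine; a = floor(2[sup]m[/sup]/p) as defined above) that computes Exp for each method and picks the best among Alpha, Beta, Gamma, and so on:

```python
from fractions import Fraction

def exp_flips(p: int, m: int) -> Fraction:
    """jcgmoi's formula: with m flips per round, a*p of the 2**m outcomes
    are accepted, a = floor(2**m / p); Exp = m * 2**m / (a * p)."""
    a = 2**m // p
    return Fraction(m * 2**m, a * p)

def best_simple_method(p: int, window: int = 6):
    """Scan Alpha (n flips), Beta (n+1), Gamma (n+2), ... over a fixed
    window and return (method index, Exp); 0 = Alpha, 1 = Beta, etc.
    Ignores factor splits, and the window size is a guess."""
    n = (p - 1).bit_length()               # smallest n with 2**n >= p
    exps = {m: exp_flips(p, m) for m in range(n, n + window)}
    best = min(exps, key=exps.get)
    return best - n, exps[best]

print(best_simple_method(5))    # (1, Fraction(64, 15)): Beta, as Lance found
print(best_simple_method(71))   # (2, ...): Gamma
print(best_simple_method(17))   # Beta and Gamma tie at 128/17; min keeps Beta
```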

For primes < 100, Alpha is optimal for 3,7,11,13,23,29,31,43,47,53,59,61,89,97.

Beta is optimal for 5,19,37,41,79,83.

Gamma is optimal for 67,71,73.

For 17 use Beta or Gamma! Both give Exp = 7.53.

For 15 = 3 * 5 (Alpha and Beta) Gamma is optimal.

For 21 = 3 * 7 (Alpha and Alpha) Beta is optimal.

For 35 = 5 * 7 (Beta and Alpha) Gamma is optimal.

For 49 = 7 * 7 (Alpha and Alpha) Gamma is optimal.

For 85 = 5 * 17 (Beta and Beta or Gamma) Beta is optimal.

Alpha is optimal for 1001,2001,4001,7001,8001.

Beta is optimal for 5001.

Gamma is optimal for 3001,6001,9001.

Finally, for p = 4097, Delta is optimal.

This kicks ass!

Several mistakes here I think. I’ll point 'em out, you let me know if I’m right.

15
Exp 15 Alpha < Exp 15 Gamma < Exp 5 Beta + Exp 3 Alpha

Best method for 15 would be straight up alpha.

21
Exp 21 method beta = Exp 3 alpha + Exp 7 alpha

35
Exp 5 beta + Exp 7 alpha < Exp 35 gamma

49
Exp 7 alpha + Exp 7 alpha < Exp 49 alpha < Exp 49 gamma

Good work though. We can use your equation to make a step by step process to determine the best method for generating a random base N digit.

This will come in handy the next time I’m on an airplane with 66 other people, and the plane is going down, and there is only one parachute, and all we have is loose change. I’ll be able to come to the rescue by assigning everyone a set of seven numbers from 1 to 512, and then flipping a coin 9 times to determine who gets the parachute. I’ll have to do this pretty quickly because the plane is going down, so I’m glad I know the most efficient method. If I get lucky I’ll generate my lucky number and have the right to the parachute. At this point however, I’ll likely be thrown out of the plane sans parachute, because it will be obvious to the other passengers that my contrived method for efficiency was merely a ploy to cheat the other passengers.

Algorithm

The most efficient method of generating a base N digit.

  1. Solve 2[sup]n[/sup] >= N > 2[sup]n-1[/sup] for n.

  2. If 2[sup]n[/sup] = N, you’re done. Flip n coins once and you’ll always have the answer.

  3. Use the “jcgmoi kick ass equation” (JKAE) to determine the expected alpha method number of coin flips:

Exp[sub]a[/sub] = (n * 2[sup]n[/sup])/N

  4. If the expected number is less than n+1, proceed to step 8.

  5. Use the JKAE to determine the expected beta method number of flips. Let B be the greatest integer such that 2[sup]n+1[/sup] >= B*N. Then:

Exp[sub]b[/sub] = ((n+1) * 2[sup]n+1[/sup])/(B*N)

  6. If Exp[sub]a[/sub] or Exp[sub]b[/sub] < n+2, proceed to step 8.

  7. Use the JKAE to determine Exp[sub]c[/sub]. You should be able to derive the Exp[sub]c[/sub] equation based on the info above. You should also be able to determine whether or not you need to go on to Exp[sub]d[/sub], Exp[sub]e[/sub], etc., or go to step 8.

  8. For every set of factors of N, determine the most efficient method for each factor and add these together. If this total is less than the lowest of Exp[sub]a[/sub], Exp[sub]b[/sub], Exp[sub]c[/sub], etc., then using the lowest factor total is the way to go. An example of this would be generating a base 30 digit. Exp[sub]a[/sub](30) is 5.333. Exp(2) + Exp[sub]a[/sub](3) + Exp[sub]b[/sub](5) = 7.933. However, Exp(2) + Exp[sub]a[/sub](15) = 5.267. This is why it is important to examine all factor combinations.

There you have it.
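Here’s a compact sketch of the search the steps above describe (Python, names mine). One caveat, flagged in a comment: it scans only a fixed number of rounds past n, and there is no proof yet that any fixed window always suffices.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def best_exp(n_base: int) -> Fraction:
    """Minimum expected flips over the Greek-letter methods and all ways
    of splitting N into factors generated separately (step 8)."""
    if n_base == 1:
        return Fraction(0)
    n = (n_base - 1).bit_length()          # smallest n with 2**n >= n_base
    # Greek-letter methods; the +8 window is an unproven guess.
    best = min(Fraction(m * 2**m, (2**m // n_base) * n_base)
               for m in range(n, n + 8))
    # Factor splits: a base-d digit plus a base-(N/d) digit.
    for d in range(2, int(n_base**0.5) + 1):
        if n_base % d == 0:
            best = min(best, best_exp(d) + best_exp(n_base // d))
    return best

print(best_exp(30))   # 79/15, about 5.267: the 2 x 15 split wins
```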

Please point out the errors in this.

LT

There’s a better way for 5, and it generalizes. The key is to both stop as soon as you have enough information and to not throw any random numbers away.

The ‘alpha’ method treats 101, 110, and 111 identically, thereby throwing away a perfectly good trinary random number, while the ‘lance’ method doesn’t stop after 3, in spite of there being the possibility of doing so.

The best method for 5 is:

000=0
001=1
010=2
011=3
100=4
1010=0
1011=1
1100=2
1101=3
1110=4
1111=repeat

combining alpha’s possibility of stopping after 3 steps with method lance’s high probability of stopping after 4.

Average number of flips is exactly 18/5 = 3.6.

Generically, the best way is to flip until you have >= N equiprobable outcomes, then assign N of those to numbers. Keep the remainder as a random number (in the above case, a random number from 1-3) and flip until you again have >= N equiprobable outcomes. (One flip and you have 6 equiprobable outcomes, so assign five to numbers and one to continuing to flip.)

There was a thread on rec.puzzles a year or so ago about using any combination of ‘D&D dice’ (d4, d6, d8, d10, d12, d20) to simulate a d23, for example. That was more complicated, but analogous to this problem.

Tim

Lance cited the equation I derived and said:

Thanks very much. I agree, with a fervor that can only be understood by one who has tried to figure some of those values by hand. You get a number, you check it, it doesn’t match, your fingers cramp, your eyes cross, you try again…you get the picture, I’m sure.

Lance said, in reference to several examples I offered:

Oh, you’re right. Two big errors on my part. First, I didn’t consider the factor problem when I calced those answers. Second, on looking back over my notes I see I didn’t even transcribe the correct optimal method for some of the values.

On to your algorithm. Here I have one big concern, and to me it’s key. You correctly note that if

(1) Exp[sub]p,n + k[/sub] < n + k + 1, you can go to step 8, since

   Exp[sub]p,n + k + 1[/sub] >= n + k + 1.

You just find the least value among the (k + 1) you’ve generated and then apply your factor logic.

But where is the guarantee that condition (1) will ever occur? Isn’t it conceivable that you could keep on generating new values for Exp without ever zeroing in to the extent (1) implies? Without a proof here your algorithm is floating.

Also, I suspect(hope) that a proof would show how to ‘look’ at the range p so you could reach more directly to values of a and b. By ‘looking’ I’m not sure what I mean but I’ve been thinking vaguely about such approaches as expressing p in binary or p’s complement from 2[sup]n[/sup] in binary or even using Euclid’s algorithm on a and b. If I come up with something I’ll be sure to post.

The answer to Question the Fourth is yes for all p, q where neither p nor q is a power of 2. I’ll demonstrate with the smallest example, 9.

For a base 3 digit, I agree that method alpha is ideal.

For a base 9 digit:
After 4 throws, there is a 9/16 chance of generating the desired base 9 random number, else you generate a base 7 random number.
After 5 throws (one more throw), combine the base 7 number with the new base 2 number to give a base 14 number. This gives you a 9/14 chance of stopping, else you’re left with a base 5 number.
After 6 throws, use the base 5 and the new base 2 to generate a base 10 number. You have a 9/10 chance to stop, else start the whole process over.

To sum up:
P(stop after 4 throws or less): 9/16
P(stop after 5 throws or less): 27/32
P(stop after 6 throws or less): 63/64

Versus, for two 3’s:
P(stop after 4 throws or less): 9/16
P(stop after 5 throws or less): 18/32 (still 9/16)
P(stop after 6 throws or less): 54/64 (27/32)

(the two 3 case will never catch up with the above method for 9.)

Tim

Wow! I actually needed to know the answer to this for a commercial piece of software I’m writing :slight_smile: Now I don’t have to keep generating random digits with Method Alpha.

Um, does that mean it’s not a pure math question? :slight_smile:

Thanks guys!

Arjuna34

You guys are clearly way better at this than I am. Thanks for taking this post so seriously. :slight_smile:

Lance, one minor note that I think will make your algorithm a little more efficient. I think you may have noticed it, but just didn’t say it. In Step 8 you say that “it is important to examine all factor combinations.” But I don’t think that’s totally true. Specifically, if there is a factor of 2 involved, you can eliminate it immediately. So, if I let the un-subscripted Exp() function denote the lowest value of the Exp function for any Method, then Exp(30) is necessarily Exp(15) + 1, and in general, Exp(2N) is necessarily Exp(N) + 1.

Quoth jcgmoi:

I think that you’re misreading Lance Turbo’s algorithm, though I’m not sure. I think he’s saying, to use your notation, that if:

(1) Exp[sub]p,n+j[/sub] < n + k + 1 for some 0 <= j <= k, you can go to step 8.

Now, this condition will eventually be met because, if nothing else, Exp[sub]p,n[/sub] has a set, finite value, which n + k + 1 will eventually exceed for some k. Is that clear? I hope so.

And (Tim), I think, has really hit on the best, most efficient Method. I’m working on an algorithm to get the expected value of flips with his Method (I don’t even know what to call it). But in the meantime, I’ve managed to compute it for 5 and 9. I’m pretty sure that Exp(N) will always be rational, but so far, there doesn’t seem to be any pattern to it:

Exp(2) = 1
Exp(3) = 8/3
Exp(4) = 2
Exp(5) = 18/5 (By (Tim)'s Method)
Exp(6) = 11/3 (Exp(2) + Exp(3))
Exp(7) = 24/7 (By Method Alpha)
Exp(8) = 3
Exp(9) = 14/3 (By (Tim)'s Method)
Exp(10) = 23/5 (Exp(2) + Exp(5))

and so on. By the way, some of these things are a pain to work out. I guess you can sympathize, jcgmoi. Incidentally, in computing Exp’s, you’re bound to come across something like this:

Summation over n = 0…infinity of (n / x[sup]n[/sup])

Empirically, I’ve found that this equals x/(x-1)², and while this formula is blindingly simple, I’ve never seen it before. Having no idea how to prove it for the general case, I just wanted someone here to verify it for me before I assume it’s true for all x, and derive an incorrect formula from it.

I’m really impressed with these posts. SDMB’ers are the best. Thanks, you guys.

PS: I think this stopped being a pure math question when Lance Turbo gave that oh-so-useful application involving the plane with the 67 passengers.
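Those Exp values can be computed exactly. For (Tim)’s Method, the number of carried-over states follows a fixed chain (1, 2, 4, … with reductions), and since the chain must eventually revisit a state, the expectation satisfies a simple linear equation. A sketch (Python, names mine) that reproduces the whole table above:

```python
from fractions import Fraction

def tim_exact_expected(n_base: int) -> Fraction:
    """Exact expected flips for the keep-all-entropy method."""
    chain, q = [], []            # carried state-counts, continue-probabilities
    m = 1
    while m not in chain:
        chain.append(m)
        m2 = 2 * m
        if m2 >= n_base:
            leftover = m2 % n_base
            q.append(Fraction(leftover, m2))   # chance we must keep flipping
            m = leftover if leftover else 1
        else:
            q.append(Fraction(1))              # too few states to decide yet
            m = m2
    j = chain.index(m)           # the chain re-enters here: a cycle
    # Around the cycle, E = C + Q*E, so E = C / (1 - Q).
    C, Q = Fraction(0), Fraction(1)
    for i in range(j, len(chain)):
        C += Q
        Q *= q[i]
    e = C / (1 - Q)
    for i in range(j - 1, -1, -1):   # walk back to the initial state
        e = 1 + q[i] * e
    return e

for n in range(2, 11):
    print(n, tim_exact_expected(n))   # matches Exp(2)..Exp(10) above
```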

Nevermind about that summation formula question. I figured it out, and boy was it trivial, compared with all this other stuff. :rolleyes: Isn’t it great how inspiration seems to hit you out of nowhere?
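(For anyone following along, the trivial route is differentiating the geometric series:)

```latex
\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}, \quad |r| < 1
\;\;\Longrightarrow\;\;
\sum_{n=1}^{\infty} n r^{n-1} = \frac{1}{(1-r)^2}
\;\;\Longrightarrow\;\;
\sum_{n=0}^{\infty} n r^n = \frac{r}{(1-r)^2}.
\qquad\text{With } r = \tfrac1x:\qquad
\sum_{n=0}^{\infty} \frac{n}{x^n}
  = \frac{1/x}{\left(1 - \tfrac1x\right)^2}
  = \frac{x}{(x-1)^2}, \qquad |x| > 1.
```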

I’ve been looking over (Tim)'s Method some more, and came to a rather interesting realization. (Tim) showed us examples of his Method for N = 5 and N = 9, which are fine, but they’re a little deceiving, and here’s why. If N = 5, then the possible number of flips, before having to repeat, is 3 or 4. If N = 9, it’s 4, 5, or 6. I tried it on N = 11, but it didn’t work out exactly the same. For N = 11, the possible numbers of flips before repeating are 4, 6, 7, 8, and 10. 5 and 9 simply will not show up. That’s kind of odd, I thought, but when I looked at how this happens, I saw something interesting. You all know that fractions are represented as repeating decimals, even in binary. So, if I wanted to write 1/3 in binary, I can write it like this:

0.01010101010101…

or, for the sake of brevity, like this:

0.01…

Look at what happens when you take 1/N:

1/5 = 0.0011…
1/9 = 0.000111…
1/11 = 0.0001011101…
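These expansions (and the repeat length) are quick to compute by long division; a sketch (Python, names mine):

```python
def reciprocal_binary_digits(n: int):
    """Binary digits of 1/n up to the point where a remainder repeats.

    Returns (digits, period): e.g. n = 5 gives ([0, 0, 1, 1], 4).
    """
    digits, seen = [], {}
    r = 1
    while r not in seen:
        seen[r] = len(digits)
        r *= 2
        digits.append(r // n)
        r %= n
    return digits, len(digits) - seen[r]

for n in (5, 9, 11):
    digits, period = reciprocal_binary_digits(n)
    ones = [i + 1 for i, d in enumerate(digits) if d == 1]
    print(n, digits, period, ones)   # the 1-positions are the possible flip counts
```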

If the jth decimal is a 1, then j appears as a possible number of flips. If the jth decimal is a 0, then j doesn’t. And the number of digits before the thing starts repeating is exactly the number of flips you do before you start repeating. When you think about it all, it makes a bit of sense. So, using this idea, I worked out a formula for Exp(N), but I had to make a couple of definitions:

Let d = d(N) = the number of decimals in the binary expansion of 1/N before it starts repeating. Eg: d(11) = 10.

Let f[sub]N[/sub](i) be defined for i = 1…d as the ith digit in the binary expansion of 1/N. Eg: f[sub]5[/sub](1) = f[sub]5[/sub](2) = 0, and f[sub]5[/sub](3) = f[sub]5[/sub](4) = 1.

Then, we have the following formula:

Exp(N) = (2[sup]d[/sup]N / (2[sup]d[/sup] - 1)[sup]2[/sup]) × Summation over i = 1…d of (f[sub]N[/sub](i) × (d + (2[sup]d[/sup] - 1)i) / 2[sup]i[/sup])

That’s a little hard to read, so I made an image of it.

At any rate, I’m not very satisfied with this formula, if only because d(N) and f[sub]N[/sub] are not well-behaved functions, unless I’m missing something. So, if anyone sees any way to make this at all prettier, that’d be great. What fun this is turning out to be!
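The formula does check out numerically. Here’s a sketch (Python, names mine) that evaluates it for odd N, where 1/N is purely periodic, using exact fractions:

```python
from fractions import Fraction

def achernar_exp(n: int) -> Fraction:
    """Evaluate Exp(N) = (2^d N / (2^d - 1)^2) *
    sum_{i=1..d} f_N(i) * (d + (2^d - 1)*i) / 2^i,
    with f_N(i) the ith binary digit of 1/N and d its period.
    Assumes N odd, so the expansion is purely periodic."""
    digits, r = [], 1
    while True:
        r *= 2
        digits.append(r // n)
        r %= n
        if r == 1:
            break
    d = len(digits)
    s = sum(Fraction(f * (d + (2**d - 1) * i), 2**i)
            for i, f in enumerate(digits, start=1))
    return Fraction(2**d * n, (2**d - 1)**2) * s

for n in (3, 5, 7, 9):
    print(n, achernar_exp(n))   # 8/3, 18/5, 24/7, 14/3
```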

Well, Achernar, I really like the fractional expansion idea. I think it might be made to work. On the other hand, I had an idea on a somewhat different tangent:

(If you thought your equations were hard to read, watch me try it in pure text without subscripts or superscripts!)

Let k be a number of coin flips thus far.

Let [x] represent the step function, the greatest integer less than or equal to x.

The maximum number of possible outcomes is 2^k. To maximize your probability of stopping, you’ll use up as many blocks of N outcomes as can fit among the 2^k possibilities.

The number of times N can fit into 2^k = [2^k / N].

The probability you’ve determined the random number on or before the kth flip is P(<=k) = [2^k/N]*N / 2^k

The probability that you determine the random number exactly on the kth flip

P(k) = P(<=k) - P(<=(k-1)) = [2^k/N]*N/2^k - [2^(k-1)/N]*N/2^(k-1)

Expected number of flips equals:

Summation k= 1->infinity of k*P(k)

Now, I haven’t done infinite series math in far too long, and the use of the step function makes this quite difficult. I think this is a well-defined summation, but I can’t simplify it.
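The summation is well-defined, and even without simplifying it you can evaluate it numerically; the neglected tail shrinks geometrically, so truncating is safe. A sketch (Python, names mine):

```python
def tim_series_exp(n: int, k_max: int = 60) -> float:
    """Numerically evaluate Tim's series sum_k k * P(k), where
    P(<=k) = floor(2^k / n) * n / 2^k, truncated at k_max flips."""
    def p_le(k: int) -> float:
        return (2**k // n) * n / 2**k
    return sum(k * (p_le(k) - p_le(k - 1)) for k in range(1, k_max + 1))

print(tim_series_exp(5))   # about 3.6   = 18/5
print(tim_series_exp(9))   # about 4.667 = 14/3
```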

Tim

I don’t have anything important to add, but…

The best parts of trying to solve this are:

  1. It makes my work day go by so much faster.

  2. None of my coworkers can possibly have any idea what I’m working on.

  3. It looks damn important with all the subscripts and superscripts and sigma notation.

  4. Therefore, it looks like I’m busting my ass working on something that is way too complicated to explain.

I might get employee of the month!!!

Tim, your idea is good in that it’s stated more deterministically than mine is, but I think if you look closely, you’ll see that our two formulas are not that different after all. I did one or two elementary manipulations on your formula, and got this just marginally less messy sucker here. You don’t have to know anything about infinite series to do it. Just basic algebraic manipulations. Now, with this thing, take a look at the expression in the parentheses. If it’s not familiar, try working out a few terms for various values of k and N. I think you’ll see that it’s nothing other than f[sub]N[/sub](k). While it’s not particularly apparent from your model, this is a periodic function, and if you take into account the fact that it’s periodic, let d equal its period, and cut down the summation so that it’s finite, then you’ll manage to wind up with my result, already posted.

The one problem that I see with your formula (and if we could just get around this one thing…) is that while the step function is relatively well-behaved, some strange combinations of the step function, such as f[sub]N[/sub], are not, and so you can’t predict their behavior over long intervals, like, say, an infinite set of integers. And so, unless you recognize that it’s periodic, it’s impossible to use your formula actually to get a value, because of its infinite nature. Of course, I never even studied infinite series in that great of detail to begin with, so I could be missing something here, too.