Predicting central tendency and standard dev from loaded dice?

I scraped by with a C in grad level stats, so I’ve got a feeling the answer to this may be over my head. But here goes anyway…

Let’s say I’m rolling n 6-sided dice. These aren’t fair dice; all n are loaded in the same way. Loading would be, for example, like this:

1=>1, 2=>5, 3=>1, 4=>1, 5=>1, 6=>1
This would roll a 2 half of the time and have an even distribution of 1, 3, 4, 5, and 6.

With these parameters, I know how to work out the curve in two ways:

  1. Make a table of all possible outcomes, and then count 'em up. Quickly becomes unwieldy with too many parameters.
  2. Actually roll based on these parameters a large number of times, say 100,000. Accurate results unless I’m looking for a really weird skew, but my computer’s already about to revolt.
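For reference, option 2 could look roughly like the minimal Python sketch below. The function name `simulate_sums`, the choice of 10 dice, and the seed are made up for illustration; only the weights and the 100,000 trials come from the description above.

```python
import random
from statistics import mean, pstdev

faces = [1, 2, 3, 4, 5, 6]
weights = [1, 5, 1, 1, 1, 1]  # relative weights from the loading example

def simulate_sums(n_dice, trials, seed=None):
    """Roll n_dice loaded dice `trials` times and return the list of sums."""
    rng = random.Random(seed)
    return [sum(rng.choices(faces, weights=weights, k=n_dice)) for _ in range(trials)]

# Hypothetical run: 10 dice, 100,000 trials, fixed seed so the run is repeatable
sums = simulate_sums(n_dice=10, trials=100_000, seed=1)
print(mean(sums), pstdev(sums))  # estimated mean and standard deviation of the sum
```

The estimates wobble from run to run, which is the main drawback compared with a closed-form answer.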

I’m looking for a third option… which would be to enter the numbers into a function which would return central tendency and standard dev. Any ideas or anything I can read up on would be much appreciated.

Thx!

I don’t see why you can’t just use the standard formulae for expectation value and standard deviation. For your example, the expected value received from one die roll is

1 * 1/10 + 2 * 1/2 + 3 * 1/10 + 4 * 1/10 + 5 * 1/10 + 6 * 1/10 = 29/10

and the expected value from n dice rolls is then n * 29/10. The standard deviation for one die roll can be determined pretty much the same way:

sigma[sup]2[/sup] = (1 - 29/10)[sup]2[/sup] * 1/10 + (2 - 29/10)[sup]2[/sup] * 1/2 + (3 - 29/10)[sup]2[/sup] * 1/10 + (4 - 29/10)[sup]2[/sup] * 1/10 + (5 - 29/10)[sup]2[/sup] * 1/10 + (6 - 29/10)[sup]2[/sup] * 1/10 = 229/100

Note, though, that it’s the squares of the deviations that add up when you do n rolls: if the variance for one die roll is sigma[sup]2[/sup], the variance for n dice rolls is n*sigma[sup]2[/sup] (not (n*sigma[sup]2[/sup].)
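If you want this as a function you can feed numbers into, here is a minimal sketch in Python. It assumes the rolls are independent and that the loading is given as relative weights, as in your example; the names `die_mean_and_variance` and `sum_stats` are just ones I picked for illustration.

```python
from fractions import Fraction
from math import sqrt

# Relative weights from the question: face 2 is five times as likely as each other face.
weights = {1: 1, 2: 5, 3: 1, 4: 1, 5: 1, 6: 1}

def die_mean_and_variance(weights):
    """Return (mean, variance) of a single roll as exact fractions."""
    total = sum(weights.values())
    probs = {face: Fraction(w, total) for face, w in weights.items()}
    mean = sum(face * p for face, p in probs.items())
    variance = sum((face - mean) ** 2 * p for face, p in probs.items())
    return mean, variance

def sum_stats(weights, n):
    """Return (mean, variance, standard deviation) for the sum of n independent rolls."""
    mean, variance = die_mean_and_variance(weights)
    # Means add and variances add; the standard deviation is taken last.
    return n * mean, n * variance, sqrt(n * variance)

mean, variance = die_mean_and_variance(weights)
print(mean, variance)  # 29/10 229/100
print(sum_stats(weights, 10))
```

Using `Fraction` keeps the single-die results exact, so they can be compared directly with the hand calculation; only the final square root is a float.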

The central tendency is easy. Assuming you want the mean of a single die (since the mean is the measure of central tendency most often paired with standard deviation), it’s just the sum over sides of (the probability of landing on a given side * the value of that side). So for your example, your probabilities are .1 each for 1, 3, 4, 5, and 6, and .5 for 2, for a mean of 1.9 + 1.0, or 2.9 (a bit less than a fair die’s mean of 3.5, since this one is biased towards a lower number). If you want the mean of the sum of multiple dice, you just add the means together. So roll ten of these things and add them all together, and the mean will be 29.

For standard deviation, it gets a little bit uglier. First, we need to find the mean, as above. Then, for each side, we find the difference of its value from the mean, and square it. Then we multiply that result by the probability of that side, and add all those up. This gives us something called the variance, and the standard deviation is the square root of the variance. In mathematical terms,

variance = Sum(i=1…6) (i-mean)^2 * P(i)

(where P(i) is the probability of side i coming up). In this case, I get a variance of 2.29 (compared to a fair die’s variance of 2.917; lower because the outcomes are more clumped around one value).

If you’re rolling multiple dice and adding (or subtracting) them, then the variances (not the standard deviations) all add up; that holds even when you subtract, although the means then subtract. Once you’ve found the variance for the whole thing, you can take the square root to find the standard deviation for the whole thing. So for instance, if you’re rolling 10 of these 2-biased dice, you’ll get a mean of 29, a variance of 22.9, and a standard deviation of about 4.785. And when you roll a lot of dice (in practice, it only needs to be about 3 or 4 of them) and add them together, you get something that’s very close to a Gaussian distribution. So your final distribution will be very close to a Gaussian at 29 ± 4.785.
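To check the ten-dice figures and the Gaussian claim without simulating, you can build the exact distribution of the sum one die at a time. This is only a sketch, assuming independent rolls; the names `sum_distribution` and `mean_and_sd` are made up for illustration.

```python
from math import exp, pi, sqrt

# Probabilities from the question: 2 comes up half the time, every other face 1/10.
probs = {1: 0.1, 2: 0.5, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}

def sum_distribution(probs, n):
    """Exact distribution of the sum of n independent rolls, built one die at a time."""
    dist = {0: 1.0}
    for _ in range(n):
        new_dist = {}
        for total, p_total in dist.items():
            for face, p_face in probs.items():
                key = total + face
                new_dist[key] = new_dist.get(key, 0.0) + p_total * p_face
        dist = new_dist
    return dist

def mean_and_sd(dist):
    """Mean and standard deviation of a distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, sqrt(variance)

dist = sum_distribution(probs, 10)
mean, sd = mean_and_sd(dist)
print(round(mean, 4), round(sd, 4))

# Compare exact probabilities with the Gaussian density at a few totals
for total in range(20, 41, 5):
    gauss = exp(-((total - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))
    print(total, round(dist[total], 4), round(gauss, 4))
```

The table grows by only five entries per die, so this stays cheap where listing every individual outcome (6^n of them) would not.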

Oh, not relevant to what you asked, but a useful fact anyway: If you have a fair die, then the variance formula simplifies considerably. A fair die of s sides has a variance of (s^2 - 1)/12. Thus, for instance, a 6-sided die has a variance of 35/12, and a 20-sided die has a variance of 399/12.
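If you’d rather not take the shortcut on faith, a quick check against the definition is easy. This is a small sketch assuming faces numbered 1 through s; the function name `fair_die_variance` is just for illustration.

```python
from fractions import Fraction

def fair_die_variance(sides):
    """Variance of a fair die with faces 1..sides, computed directly from the definition."""
    p = Fraction(1, sides)
    mean = sum(face * p for face in range(1, sides + 1))
    return sum((face - mean) ** 2 * p for face in range(1, sides + 1))

# Compare the direct calculation with the (s^2 - 1)/12 shortcut for common dice
for sides in (4, 6, 8, 12, 20):
    direct = fair_die_variance(sides)
    shortcut = Fraction(sides ** 2 - 1, 12)
    print(sides, direct, shortcut, direct == shortcut)
```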

That’s supposed to be n*sigma in the parentheses, right?

Gah, yes, I omitted a closing parenthesis. It should read: “if the variance for one die roll is sigma[sup]2[/sup], the variance for n dice rolls is n*sigma[sup]2[/sup] (not (n*sigma)[sup]2[/sup].)”