binomial distribution

I am trying to learn a little statistics, just for fun.

The equation related to binomial distribution is:

b(x; n, P) = nCx * P^x * (1 - P)^(n - x)

The nCx has me stumped. Where does it come from, and is anything really multiplied by it, or is it some type of convention just to make the equation less confusing?

/

It’s shorthand for “n choose x”, which counts all the possible combinations of events giving x successes. The same term can be written more explicitly using factorials: nCx = n! / (x! * (n - x)!)

For example, in the simplest case, where you flip two coins (n = 2 and x = 1), you can get one head from HT or TH, so there are two possible ways to get exactly one head.
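If it helps to see that in code, here is a minimal Python sketch of the coin example. The helper name n_choose_x is made up for illustration; math.comb is the standard-library version (Python 3.8 or later).

```python
from itertools import product
from math import comb, factorial


def n_choose_x(n, x):
    """Binomial coefficient from the factorial formula n! / (x! * (n - x)!)."""
    return factorial(n) // (factorial(x) * factorial(n - x))


# List every outcome of two coin flips and keep those with exactly one head.
one_head = ["".join(flips) for flips in product("HT", repeat=2) if flips.count("H") == 1]

print(one_head)          # the two arrangements, HT and TH
print(n_choose_x(2, 1))  # 2
print(comb(2, 1))        # 2, same value from the standard library
```

Counting the arrangements by hand and using the factorial formula give the same number, which is all nCx is.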

Do you mean just the notation? “nCx” is an older way of writing the binomial coefficient, meaning “number of ways to choose x objects from a collection of n”.

To see why the factor is there, consider a specific sequence of trials, where “pass” has probability p (the P in your formula), and “fail” has probability (1-p). Say (let’s set n = 10 for concreteness):

pass-fail-fail-pass-pass-pass-fail-fail-fail-pass

It is easy to calculate the probability of this specific sequence of passes and fails. You just multiply by p for every pass, and (1-p) for every fail. In this case:

P(that sequence) = p * (1-p) * (1-p) * p * p * p * (1-p) * (1-p) * (1-p) * p = p^5*(1-p)^5
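A quick way to check that product is to let Python do the multiplying. This is only a sketch: p = 0.3 is an arbitrary value picked for illustration, not something from the problem.

```python
from math import isclose, prod

p = 0.3  # assumed pass probability, chosen only for illustration
sequence = "pass-fail-fail-pass-pass-pass-fail-fail-fail-pass".split("-")

# Multiply by p for every pass and by (1 - p) for every fail.
sequence_probability = prod(p if outcome == "pass" else 1 - p for outcome in sequence)

passes = sequence.count("pass")
fails = sequence.count("fail")
print(passes, fails)  # 5 5
print(isclose(sequence_probability, p**passes * (1 - p)**fails))  # True
```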

But the objective is not to find the probability of that specific sequence, but to find the probability of any sequence that has that number of passes (and hence that number of fails). To find this, we note two things:

(1) The probability of any specific sequence with 5 passes and 5 fails is exactly the same. I.e. the probability does not depend on the specific arrangement of passes and fails.
(2) There are 10C5 possible sequences with 5 passes and 5 fails. I.e. there are, by definition, 10C5 ways to “pick” the passes, after which the rest are fails.

Because of (1), to get the probability of all sequences with 5 passes and 5 fails, we can multiply the probability of any one such sequence by the number of such sequences (which is 10C5, by (2)). This gives 10C5 * p^5 * (1-p)^5, which is the original formula with n = 10 and x = 5.
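If you want to convince yourself numerically, the argument can be checked by brute force: list all 2^10 pass/fail sequences, keep the ones with exactly 5 passes, and add up their probabilities. A rough Python sketch, again assuming an arbitrary p = 0.3 and a made-up helper name:

```python
from itertools import product
from math import comb, isclose

n, x = 10, 5
p = 0.3  # assumed pass probability, chosen only for illustration


def sequence_probability(seq, p):
    """Probability of one specific sequence; True means pass, False means fail."""
    prob = 1.0
    for passed in seq:
        prob *= p if passed else 1 - p
    return prob


# Every sequence of n trials that contains exactly x passes.
matching = [seq for seq in product([True, False], repeat=n) if sum(seq) == x]

brute_force = sum(sequence_probability(seq, p) for seq in matching)
formula = comb(n, x) * p**x * (1 - p) ** (n - x)

print(len(matching), comb(n, x))      # 252 252
print(isclose(brute_force, formula))  # True
```

The enumeration finds 252 sequences, which is 10C5, and the summed probability agrees with the formula.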

Yes; it denotes the number of combinations (as in “permutations and combinations”): the number of ways to choose x items, irrespective of order, if you have n to choose from. A different notation for the same thing is to write the n above the x (no fraction bar) inside parentheses, which is the way the formula is written in the Wikipedia article.

leahcim did a good job explaining why the formula works. If you’re not familiar with permutations and combinations, you probably ought to study up on them before trying to understand the binomial distribution; see here or here for a basic explanation.

ask the Professor. I understand his paper on this topic had a lengthy European vogue.