In simple terms, what is a probability distribution? Extra Q: Determining probabilities

Can you explain what the heck a probability distribution is in simple language? I know most mathematical symbols and notation, but as soon as you start talking about differential calculus (at least in the technical numeric form) you’ve lost me.

Namely, what does a probability distribution do? Is it similar to significance testing (e.g. the t-test: can it help establish relationships, etc.)? Why do we use it?

And while we’re on the subject, I get that when we have to calculate the probability of one event or another occurring, the probabilities of each are added together. For example, a fair coin being tossed and coming up heads or tails has a probability of 1, because heads and tails are each 50/50, or 1/2. Therefore

1/2 + 1/2 = 1.

Which I get, because you add the probabilities together, since either event could occur in theory (though obviously not both at once).

However, why is it that when we have two independent events, e.g. two different fair coins being tossed and coming up heads or tails, we multiply the probabilities together to determine the probability of getting heads with coin 1 and heads with coin 2? i.e.

1/2 (likelihood of obtaining heads with coin 1) x 1/2 (likelihood of obtaining heads with coin 2) = 1/4

How does this work and why does it work?

I’m not sure exactly what you are referring to by probability distribution. The simplest answer is that it is the distribution of probabilities across all the possible results, which gets interesting when not all results are equally probable. For instance, when you roll two dice, there are more ways of rolling a seven (six ways) than a two (one way), so if you plot the probability of getting each total, you will see a shape that rises to a peak at seven and falls off again. Different shapes have different names, but that’s it at the most basic. I apologize if this is too basic. A distribution is not itself a significance test, although significance tests are built on distributions (the t-test, for instance, uses the t-distribution).
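To make the "more ways of rolling a seven" point concrete, here is a small Python sketch (assuming two fair six-sided dice) that counts the 36 equally likely pairs and prints a crude text plot of how many pairs give each total:

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Crude text "plot": one '#' per way of rolling each total.
for total in range(2, 13):
    print(f"{total:2d} {'#' * ways[total]}")
```

The bars grow from one '#' at 2 to six at 7 and shrink back to one at 12; dividing each count by 36 turns the counts into probabilities.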

For your second question: when you want the probability that one of several possible outcomes of a single experiment happens (such as rolling either a 1 or a 2 on one die), you add, because those outcomes can’t happen together. If you are talking about combinations of results in two or more independent trials, you multiply. If you sum the probabilities of all the outcomes of the first trial, you get 1; ditto for the second trial. If you added the two trials’ totals together you’d get 2, which is more than 1, whereas multiplying gives 1 x 1 = 1. Since the probabilities of all possible combined outcomes must sum to 1, multiplying is the rule that works.
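The add-versus-multiply argument can be checked with exact fractions. This sketch assumes fair six-sided dice and uses Python's standard `fractions.Fraction` so nothing is lost to rounding:

```python
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}

# One roll: 1 and 2 can't both happen, so their probabilities add.
p_one_or_two = die[1] + die[2]
print(p_one_or_two)  # 1/3

# Two independent rolls: each (first, second) pair gets the product.
joint = {(a, b): die[a] * die[b] for a, b in product(die, repeat=2)}

print(sum(die.values()))                       # 1 (all outcomes of one roll)
print(sum(joint.values()))                     # 1 (all 36 pairs)
print(sum(die.values()) + sum(die.values()))   # 2: adding across trials overshoots
```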

To try to be simple and brief about it, a probability distribution is basically a list of all possible outcomes (usually numerical) of a probability experiment, together with the probability of each one. For example, if your experiment is flipping a (fair) coin, the two possible outcomes are Heads and Tails, each with a probability of 1/2. Or if your experiment is rolling a pair of (fair six-sided) dice and looking at the total, the possible outcomes are the whole numbers 2 through 12, with various probabilities (1/36 to roll a 2, 5/36 to roll an 8, etc.).
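The "list of outcomes together with their probabilities" description maps directly onto a dictionary. A minimal Python sketch, assuming a fair coin and two fair dice as in the examples above:

```python
from fractions import Fraction
from itertools import product

# A discrete distribution as a mapping: outcome -> probability.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Total of two fair dice: each of the 36 ordered pairs contributes 1/36.
dice_total = {}
for a, b in product(range(1, 7), repeat=2):
    dice_total[a + b] = dice_total.get(a + b, 0) + Fraction(1, 36)

print(dice_total[2], dice_total[8])  # 1/36 5/36
```

Both dictionaries list every possible outcome, and their values add up to 1.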

Probability distributions can be discrete or continuous. In a discrete probability distribution, there are only finitely many possible outcomes, so you can list them all separately (as in the coin and dice examples). With a continuous probability distribution, you can get any number anywhere along a whole continuum, or range of values, for example if you randomly select a person to see how tall they are, or how much they weigh. Since the result is not necessarily a whole number (depending on how precisely you measure), there’s no way to individually list all the possibilities.
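For a continuous quantity such as height, probabilities are attached to ranges rather than to individually listed values. The sketch below assumes, purely for illustration, that heights follow a normal distribution with mean 170 cm and standard deviation 10 cm (made-up numbers, not data), and uses Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Assumed model for illustration only: heights in cm ~ Normal(mean 170, sd 10).
heights = NormalDist(mu=170, sigma=10)

# Probability of landing in a range = difference of the cumulative distribution function.
p_range = heights.cdf(175) - heights.cdf(165)
print(f"P(165 < height < 175) = {p_range:.3f}")  # 0.383

# Shrinking the range toward a single value drives the probability toward 0.
for width in (10, 1, 0.1, 0.001):
    p = heights.cdf(170 + width / 2) - heights.cdf(170 - width / 2)
    print(f"width {width}: {p:.6f}")
```

The shrinking-width loop shows why a continuous distribution can’t be written as a list: any single exact value has probability 0.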

One way to answer your last question might be to say that, if you flip a pair of coins over and over again a whole bunch of times, the first coin will come up heads about 1/2 of those times. The second coin doesn’t care what the first one did, so among the flips where the first coin came up heads, the second coin will also come up heads about 1/2 of the time. That makes both heads about 1/2 of 1/2, or 1/2 x 1/2 = 1/4, of the total number of flips.
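The "flip them a whole bunch of times" argument can be simulated. This Python sketch assumes fair coins modeled with the standard `random` module (seeded so runs repeat) and counts how often the second coin shows heads among the flips where the first one did:

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000

first_heads = 0
both_heads = 0
for _ in range(trials):
    coin1 = rng.random() < 0.5  # True means heads
    coin2 = rng.random() < 0.5
    if coin1:
        first_heads += 1
        if coin2:
            both_heads += 1

print(first_heads / trials)       # close to 1/2
print(both_heads / first_heads)   # close to 1/2: coin 2 ignores coin 1
print(both_heads / trials)        # close to 1/4
```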

Or another way to look at it is to point out that there are 2 possible outcomes for the first coin, and 2 possible outcomes for the second coin, so there are 2 x 2 = 4 possible outcomes for how the first and second coins together could come up. (They are HH, HT, TH, TT.) Since these four are equally likely, the probability of any one of them (e.g. HH) is 1/4.
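The counting argument can be reproduced by listing the outcomes with Python's `itertools.product`, assuming (as the post does) that all combinations are equally likely:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT as tuples
p_hh = Fraction(outcomes.count(("H", "H")), len(outcomes))
print(len(outcomes), p_hh)  # 4 1/4
```

Changing `repeat=2` to `repeat=3` lists the 2 x 2 x 2 = 8 outcomes for three coins, giving 1/8 for all heads.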

Hope this helps.

Thudlow Boink’s got a good description of distribution. But in the mention of discrete vs. continuous one point needs correction.

A discrete distribution does not need to have finitely many outcomes. It’s just that the values the outcomes take on are separate, isolated points rather than a continuous range.

As examples of continuous distributions, take the distance a baseball is hit (measured to infinite precision) or the amount of time between raindrops hitting you on the head (again, measured to infinite precision). These are random values, and given that precision, there are infinitely many outcomes all along a line.

For an infinite but discrete distribution, consider the time between raindrops measured only to the nearest second, or baseball hits measured to the nearest foot.
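The raindrop and baseball examples would need a model before they could be computed, so here is a swapped-in standard example (not from the post) of a discrete distribution with infinitely many outcomes: the number of fair-coin flips needed to get the first head (the geometric distribution). Getting the first head on flip k means k - 1 tails and then a head, with probability (1/2)^k:

```python
from fractions import Fraction

def p_first_head_on(k: int) -> Fraction:
    """P(first head appears on flip k) for a fair coin: k-1 tails, then a head."""
    return Fraction(1, 2) ** k

# Outcomes 1, 2, 3, ... never run out, yet each one is a separate, listable value.
partial = sum(p_first_head_on(k) for k in range(1, 21))
print(partial)      # 1048575/1048576
print(1 - partial)  # 1/1048576: the probability left for k > 20
```

Taking more terms shrinks the remainder further; the full infinite sum is exactly 1.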

In mathematical terms, the number of outcomes for a discrete distribution is finite or countably infinite, while for a continuous distribution it is uncountably infinite.

Before this thread gets hijacked to St. Petersburg:

Note that the distribution must cover all outcomes. Since probabilities are expressed as fractions of 1, the sum of all the probabilities in the distribution must be 1*. If it weren’t, there would be some leftover probability p’, which would be the chance that your result was not any of the outcomes already included. That is itself a distinct outcome (with probability p’) and becomes part of the distribution.

* For continuous distributions, the integral of the probability density function over all values must equal 1.
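Both halves of this point (listed probabilities summing to 1, and a density integrating to 1) can be checked numerically. In this Python sketch, `leftover` is a hypothetical helper for computing p’, and the exponential density with rate 2 is an assumed model for waiting times, not something from the thread:

```python
import math
from fractions import Fraction

# Discrete case: the listed probabilities must add to 1.
def leftover(dist):
    """Hypothetical helper: probability p' not covered by the listed outcomes."""
    total = sum(dist.values())
    if total > 1:
        raise ValueError("probabilities sum to more than 1")
    return 1 - total

partial_die = {face: Fraction(1, 6) for face in range(1, 6)}  # face 6 left out
p_prime = leftover(partial_die)
complete = {**partial_die, "anything else": p_prime}
print(p_prime, sum(complete.values()))  # 1/6 1

# Continuous case: the density must integrate to 1.
def exp_density(x, rate=2.0):
    """Exponential density (an assumed model for waiting times between raindrops)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Density is 0 below x = 0, and the mass beyond x = 20 is about e**-40, so [0, 20] suffices.
area = integrate(exp_density, 0.0, 20.0)
print(round(area, 6))  # 1.0
```

Note that `exp_density(0.0)` is 2.0: a density value can exceed 1, because only the area under it over a range is a probability.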

In the most abstract sense, a probability distribution is any function p, defined on subsets of a set S, that satisfies the Kolmogorov axioms.

The Kolmogorov axioms can be paraphrased like this:

  1. For any subset E, 0 ≤ p(E) ≤ 1.
  2. p(S) = 1, i.e. the probability that the outcome is somewhere in S is 1. (For a discrete S, this is the same as saying that p({s}) added up over every individual element s of S gives 1.)
  3. If A and B are subsets of S which have no elements in common, p(A U B) = p(A) + p(B), and this relationship can be extended to any finite number of subsets.

Actually, axiom 3 can be generalized to countably infinite collections of subsets with no elements in common, but if you don’t like differential calculus, you wouldn’t like that either.
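For a finite S, the axioms can be checked mechanically by listing every subset. This Python sketch uses the usual formal statements (every p(E) between 0 and 1, p(S) = 1, additivity for disjoint events) and a hypothetical loaded coin, to show that the outcomes need not be equally likely:

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded coin: the axioms don't require equally likely outcomes.
S = frozenset({"H", "T"})
point = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

def p(event):
    """Probability of an event (a subset of S): sum of its outcomes' probabilities."""
    return sum((point[s] for s in event), Fraction(0))

# All subsets of S: {}, {H}, {T}, {H, T}.
events = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))]

assert all(0 <= p(E) <= 1 for E in events)   # every probability lies in [0, 1]
assert p(S) == 1                              # the whole of S has probability 1
assert all(p(A | B) == p(A) + p(B)            # additivity for disjoint events
           for A in events for B in events if not A & B)
print(len(events), "events checked")  # 4 events checked
```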