Let's say you have a random letter generator, producing the letters A, B, or C infinitely. In this infinitely long string, what is the probability of getting fewer than 10 C's?
It seems clear that the probability of getting any single number of C's - whether 4 C's or 5 C's or 100 C's - is zero. In addition, the probability of getting infinite C's is zero. However, the probability of the outcome falling in a subset of an infinite set seems as though it should be calculable.
For example, here's a subset of an infinite set that is calculable: Assuming that IQ is normally distributed, the probability that someone has an IQ between -1 and 1 standard deviations from the mean is ~68.27%. However, since IQ is normally distributed it goes on infinitely in both directions - so the probability of a person having an IQ exactly equal to the mean is 1/∞, or zero.
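That 68.27% figure can be reproduced from the error function; here's a minimal Python sketch (the function names are mine, not from any standard library beyond `math`):

```python
import math

def std_normal_cdf(x: float) -> float:
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability of falling within one standard deviation of the mean
p_within_1sd = std_normal_cdf(1.0) - std_normal_cdf(-1.0)
print(round(p_within_1sd, 4))  # ~0.6827

# A single exact value is a zero-width interval, so its probability is 0
p_exact_mean = std_normal_cdf(0.0) - std_normal_cdf(0.0)
print(p_exact_mean)
```

The zero-width interval is the whole point: an interval of values has positive area under the curve, while any single point has none.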
How would one calculate this on my random letter generator?
Bit of a complicated question - please ask if it needs more clarification.
For n > 10, the probability of getting fewer than 10 Cs in a string of length n is the sum of [sub]n[/sub]C[sub]k[/sub](1/3)[sup]k[/sup](2/3)[sup]n - k[/sup] for 0 ≤ k ≤ 9. As far as I know there’s no simple closed form for that, but the probability you’re looking for is the limit of that value as n goes to infinity, which Maple tells me is 0.
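That limit can also be seen numerically; here's a Python sketch of the sum above (the function name is mine), showing the probability collapsing toward 0 as n grows:

```python
from math import comb

def p_fewer_than_10_cs(n: int) -> float:
    # Sum of nCk * (1/3)^k * (2/3)^(n-k) over k = 0..9
    return sum(comb(n, k) * (1 / 3) ** k * (2 / 3) ** (n - k)
               for k in range(10))

for n in (30, 100, 300, 1000):
    print(n, p_fewer_than_10_cs(n))
```

By n = 1000 the value is already astronomically small, consistent with the limit of 0.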
Are you referring to your question about the probability of being at the mean in a normal distribution? I’m afraid that you are confusing a discrete distribution like the binomial (the basis for the formula you see) and a continuous distribution like the normal. You compute cumulative probabilities in a discrete distribution by summing, but you have to integrate to obtain cumulative probabilities in a continuous distribution. When your range of integration is over a single value (like the mean in this instance), the result will be zero.
BTW, you can solve your original question by using the normal distribution. Since the probability of getting a C is 1/3, for a sufficiently large number of trials of your random letter generator, you can use the normal distribution to approximate the binomial. But as in the binomial formulation, as the number of trials approaches infinity, the probability in question will approach zero.
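A quick Python sketch of that approximation (the names are mine; the 0.5 continuity correction is the usual textbook refinement, not something stated in this thread):

```python
import math
from math import comb

def binom_cdf(n: int, k: int, p: float) -> float:
    # Exact P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(n: int, k: int, p: float) -> float:
    # Normal approximation to the binomial, with continuity correction
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(fewer than 10 Cs) in a string of 60 letters, p(C) = 1/3
n, p = 60, 1 / 3
exact = binom_cdf(n, 9, p)
approx = normal_approx_cdf(n, 9, p)
print(exact, approx)  # both small, and close to each other
```

As n grows, both the exact binomial sum and the normal approximation head to zero together.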
This is the basis of intuition for calculus, where an infinite sum of ever-smaller terms can add up to a finite number.
Imagine taking the area of a circle by slicing it into progressively smaller slices. As the number of slices approaches infinity, the area of each slice approaches 0, but the total area still remains the same number.
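The circle-slicing picture can be made concrete: approximate each of n wedges by a triangle with apex angle 2π/n, so each slice's area shrinks to 0 while the total tends to πr². A Python sketch (names mine):

```python
import math

def circle_area_by_wedges(r: float, n: int) -> float:
    # Approximate each of n wedges by a triangle with apex angle 2*pi/n
    wedge_area = 0.5 * r * r * math.sin(2 * math.pi / n)
    return n * wedge_area

for n in (6, 100, 10000):
    print(n, circle_area_by_wedges(1.0, n))
# each wedge's area -> 0, but the total -> pi * r^2
```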
More precisely, for any finite N, the probability of getting more than N C’s is one. The probability of getting an infinite number of C’s is not really a well-stated question, since infinity is not a number.
For example, in a sample of N the probability of getting between 0.4N and 0.5N C’s can be calculated. This probability drops to zero as N gets large (as the distribution of the number of C’s centers around N/3 more and more tightly). Now as N gets large, all numbers between 0.4N and 0.5N go to infinity, so the probability of getting that “particular infinity” (or group of infinities, if you prefer) also goes to zero.
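A Python sketch of that calculation (the function name is mine; p(C) = 1/3 follows the earlier posts):

```python
from math import comb

def p_c_fraction_between(n: int, lo: float, hi: float, p: float = 1 / 3) -> float:
    # P(lo*n <= number of Cs <= hi*n) for a string of n letters
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(int(lo * n), int(hi * n) + 1))

for n in (30, 100, 300, 1000):
    print(n, p_c_fraction_between(n, 0.4, 0.5))
```

Since the distribution concentrates around n/3, any window that stays bounded away from the 1/3 fraction gets squeezed to probability zero.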
I’d like to point out that the OP didn’t specify the distribution of A, B, and C. One perfectly valid way to select from that set is to have a 60% chance of A, a 40% chance of B, and a 0% chance of C. In that case, you’d be sure to get no C at all in your output.
However, this doesn’t actually matter, for the specific problem as posed. Given an infinite string of letters, and any distribution at all on the alphabet of available letters, and any finite number N > 0, the probability will be exactly 0 that you will get exactly N Cs. Either each letter has a 0 chance to be C, in which case you’ll get 0 Cs, or each letter has some chance greater than 0 of being a C, in which case you’ll get an infinite number of Cs. The only way you’ll ever get a finite but nonzero number of Cs is either through an infinitely-improbable event, or if the letters are not independent (which would violate the common understanding of “random”).
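A Python sketch of both points (names mine): for any fixed finite count of Cs, the probability vanishes as the string grows, while a 0% chance of C makes zero Cs certain:

```python
from math import comb

def p_exactly(n_letters: int, n_cs: int, p_c: float) -> float:
    # Probability of exactly n_cs Cs among n_letters independent letters
    return comb(n_letters, n_cs) * p_c**n_cs * (1 - p_c) ** (n_letters - n_cs)

# Fixed finite count (here 5 Cs): the probability shrinks toward 0
# as the string grows, for any per-letter chance of C strictly between 0 and 1
for n in (50, 200, 1000):
    print(n, p_exactly(n, 5, 1 / 3))

# Degenerate distribution: C has a 0% chance, so 0 Cs is certain
print(p_exactly(10, 0, 0.0))  # 1.0
```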
If N is the number of Cs in a string, then the event where you get an infinite number of Cs is the complement of (N = 0) ∪ (N = 1) ∪ (N = 2) ∪ (N = 3) ∪ … In this case, the number of Cs is countably infinite, and every single statistician I’ve ever known would use the terminology I did.
There’s no such thing as an infinitely improbable event. The minimum probability for any event is 0, and there are no infinitesimal probabilities (at least in standard probability theory). Furthermore, unless some letter in the alphabet has probability 1, every infinitely long string over that alphabet has probability 0.
I thought that “infinitely improbable” was just a synonym for “probability 0”. And while any given infinite string has probability 0, we can categorize the strings, and assign (potentially) nonzero probabilities to the categories.
Right, I understand that the normal distribution is a continuous distribution. I don’t understand why a subset of that distribution, the range from -1 standard deviation to 1 standard deviation, has a probability that is not zero even though the normal distribution extends infinitely in both directions and is continuous.
What is the difference between a normal distribution and my random letter generator in terms of the probability of a given subset? I would assume we can all agree that a section of a normal distribution has a non-zero probability, as long as that section covers a range of values. This apparently is not true in my infinite string.
Because even though it extends infinitely in both directions, the tails are sufficiently skinny - or more to the point, get even skinnier fast enough - that the area under the tails is finite and less than 1. Just because the bounds of an integral are infinite doesn’t mean that the result of the integral is infinite. In fact, the integral over the whole real line must be finite (it must equal 1) for it to be a probability distribution at all.
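A numerical sketch of that point (the midpoint-rule integrator is a crude routine of my own, not from the thread): the area within one standard deviation is finite and nonzero, and the total area comes out to 1 even though the bounds are infinite. The tails are cut off at ±20, beyond which the density is negligibly small:

```python
import math

def normal_pdf(x: float) -> float:
    # Standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, a: float, b: float, steps: int = 20000) -> float:
    # Crude midpoint-rule numerical integration
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

middle = midpoint_integral(normal_pdf, -1.0, 1.0)
tails = (midpoint_integral(normal_pdf, -20.0, -1.0)
         + midpoint_integral(normal_pdf, 1.0, 20.0))
print(round(middle, 4))          # ~0.6827
print(round(middle + tails, 4))  # ~1.0
```

The skinny tails are exactly why the full integral converges: the density dies off faster than any polynomial.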