I’ll admit that without doing the math my logic followed Sage Rat’s. Every hit increases the probability of future hits and every miss increases the probability of future misses. So the overall trend should approach either all hits or all misses - and ninety-eight attempts seems like it should be enough to establish one of those patterns. Given that the 99th ball was a hit, my ballpark estimate (pun intended) was that the percentage odds of the 100th ball being a hit would be in the high nineties.
But I didn’t work it out beyond this guesstimate. So I was surprised to see the real probability was much lower.
I will agree with TroutMan’s application and declare 2/3rds to be the winner.
It would appear that my intuition was correct - numbers near 1.0 are far more likely than numbers near 0.0 - but that distribution of probabilities is still gradual enough that it drags it down to 2/3rds.
Code and output follow:
    import std.algorithm : sort;
    import std.random;
    import std.stdio;

    void main() {
        enum TEST_COUNT = 100_000;
        uint totalMadeIt = 0;      // relevant runs whose final throw was a hit
        uint totalRelevant = 0;    // runs whose second-to-last throw was a hit
        uint[double] distribution; // final hit proportion -> number of runs

        trials: for (uint L = 1; L <= TEST_COUNT; L++) {
            bool madeIt = false;
            uint hits = 1;   // guaranteed initial hit
            uint misses = 1; // guaranteed initial miss
            double odd = 0;

            // 10_000 throws per run; the second-to-last throw is number 9_999
            for (uint L2 = 1; L2 <= 10_000; L2++) {
                // dice() returns 0 with weight `misses` and 1 with weight `hits`
                auto result = dice(misses, hits);
                if (result == 0) {
                    if (L2 == 9_999) {
                        // Second-to-last throw missed: skip this run entirely
                        continue trials;
                    }
                    ++misses;
                    madeIt = false;
                }
                else if (result == 1) {
                    ++hits;
                    madeIt = true;
                }
                else {
                    throw new Exception("The API does not operate according to how its documentation describes");
                }
            }

            totalRelevant++;
            if (madeIt) {
                totalMadeIt++;
            }
            odd = cast(double)hits / cast(double)(hits + misses);
            distribution[odd]++;
        }

        writefln("This is the answer: %s / %s = %s", totalMadeIt, totalRelevant, cast(double)totalMadeIt / cast(double)totalRelevant);

        // Print the observed hit proportions in ascending order
        double[] allOdds = distribution.keys;
        sort(allOdds);
        foreach (odd; allOdds) {
            writefln("%s -> %s", odd, distribution[odd]);
        }
    }
Conditioned on the 99th throw being a hit, the number of hits h up through the 99th throw has probability precisely proportional to h - 1, the number of hits among the first 98 throws; specifically, the probability of h hits [where h is one of the values 2, 3, …, 98] is (h - 1) * 2/(97 * 98) = (h - 1)/4753. In terms of the hit proportion x = h/99, that is (99x - 1)/4753, which is close to x/49 except for the smallest values of x.
I see that Sage Rat’s simulation actually outputs the resulting hit proportion up through the 102nd throw (the last of 100 throws after the guaranteed hit on throw #1 and guaranteed miss on throw #2), conditioned on the penultimate throw being a hit. Same thing: conditioned on the penultimate throw being a hit, the number of hits h up through the 102nd throw has probability precisely proportional to h - 1; specifically, the probability of the hit proportion being x = h/102 [where h is one of the values 2, 3, …, 101] is (h - 1) * 2/(100 * 101) = (h - 1)/5050, which is close to x/50.5 except for the smallest values of x.
Thus, the values on the right at the end of his post are approximately (102 * left - 1) * 50153/5050, or roughly the values on the left times 50153/50.5 [with jitter because the simulation is only a sampling, rather than an exact calculation].
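For anyone who wants to compare the sampled counts against exact values, here is a rough sketch that computes the conditional distribution by stepping through the throws one at a time instead of sampling. It assumes the 100-throw variant of the simulation (start at 1 hit and 1 miss, discard runs that miss the second-to-last throw); `finalHitDistribution` is my own hypothetical helper, not something from the posted code. By my working the result comes out to (h - 1)/5050 for a final hit count h.

```d
import std.stdio;

/// Exact distribution of the final hit count (index = hits, counting the
/// starting hit), given a hit on the second-to-last of `throws` throws.
/// Hypothetical helper; assumes throws >= 2.
double[] finalHitDistribution(uint throws)
{
    auto prob = new double[](throws + 2);
    prob[] = 0.0;
    prob[1] = 1.0;                       // start at hits = 1, misses = 1
    foreach (t; 1 .. throws + 1)
    {
        auto next = new double[](throws + 2);
        next[] = 0.0;
        immutable double total = t + 1;  // hits + misses before throw t
        foreach (h; 1 .. t + 1)
        {
            immutable pHit = h / total;
            next[h + 1] += prob[h] * pHit;
            // A miss on the second-to-last throw makes the run irrelevant.
            if (t != throws - 1)
                next[h] += prob[h] * (1.0 - pHit);
        }
        prob = next;
    }
    double kept = 0.0;
    foreach (p; prob)
        kept += p;
    prob[] /= kept;                      // renormalise over the relevant runs
    return prob;
}

void main()
{
    auto dist = finalHitDistribution(100);
    foreach (h; [2, 51, 101])
        writefln("hits = %s: exact %.6f, (h - 1)/5050 = %.6f",
                 h, dist[h], (h - 1) / 5050.0);
}
```

Multiplying each entry by the number of relevant runs gives the count the simulation should hover around for that hit total.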
Whoops, I had changed the code to go to 10,000 instead of 100, just to verify that the result was consistent regardless of the number of throws. The output is from the 100 variant; to match it, the code should be running L2 to 100 and the miss check should be for 99.
If that’s in response to my remark that this code runs through the 102nd throw (including guaranteed hit throw #1 and guaranteed miss throw #2), with the miss check on the 101st throw, that remark was in response to the fact that hits + misses = 100 + 2 = 102 at the end of the 100-throw run of your code, not 100, since hits and misses are both initialized to 1 before any iterations of the inner loop.
(This can also be seen in the fact that the possible hit proportions output at the end were all fractions of the form whole number/102, rather than whole number/100. The only such fraction missing between 1/102 and 101/102 is 1/102, which cannot come up in the relevant runs: the guaranteed hit on throw #1 plus the required hit on the penultimate throw put the final hit count at 2 or more.)
On edit: Eh, I don’t think the last post was actually in response to that remark of mine. Still, I’ll let this post stand.
After N free throws, there are (N-1) possible values for the number of successes. In other words, after 47 throws, the number of successes is somewhere between 1 and 46. Surprisingly, all of these have equal probability. The odds of 3 successful free throws = 1/46. The odds of 46 successful free throws is also 1/46.
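The uniformity can be checked by induction on N. A sketch, writing P_N(k) (my notation, not the poster's) for the probability of k successes after N throws, where the first two throws are the fixed make and miss:

```latex
\begin{align*}
P_2(1) &= 1, \\
P_{N+1}(k) &= P_N(k-1)\,\frac{k-1}{N} + P_N(k)\,\frac{N-k}{N} \\
           &= \frac{1}{N-1}\cdot\frac{(k-1) + (N-k)}{N} = \frac{1}{N},
           \qquad 1 \le k \le N.
\end{align*}
```

At the two ends (k = 1 and k = N) one of the terms refers to an impossible count, but it is multiplied by zero, so the same arithmetic goes through.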
Now, if the 48th throw is successful, it is more likely that the number of successes so far is high. In other words, if #48 is a HIT, the number of hits so far is probably better than 50-50. Hence, throw #49 is also more likely to be successful.
The answer is that at any point in the game (after three throws), the odds are 2/3 that two consecutive throws will be the same: a hit has a 2/3 probability of being followed by another hit, and a miss has a 2/3 probability of being followed by another miss.
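The 2/3 figure can be worked out directly from the uniform distribution described above. A sketch, assuming the number of successes k after N throws is uniform on 1, …, N-1, and writing H_j (my notation) for "throw j is a hit":

```latex
\begin{align*}
P(H_{N+1}) &= \sum_{k=1}^{N-1} \frac{1}{N-1}\cdot\frac{k}{N} = \frac{1}{2}, \\
P(H_{N+1} \cap H_{N+2}) &= \sum_{k=1}^{N-1} \frac{1}{N-1}\cdot\frac{k}{N}\cdot\frac{k+1}{N+1}
  = \frac{1}{N-1}\cdot\frac{(N-1)N(N+1)/3}{N(N+1)} = \frac{1}{3}, \\
P(H_{N+2} \mid H_{N+1}) &= \frac{1/3}{1/2} = \frac{2}{3}.
\end{align*}
```

Swapping the roles of hits and misses gives the same 2/3 for a miss following a miss.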
I had a slightly different solution to the puzzle. I submitted it to 538.com last night, but I’ll admit that I’m not 100% sure about it, so if anyone can poke a hole in it, let me know.
[spoiler]Suppose that, at some point, the player has taken n shots and made m of them. His “internal probability” is therefore m/n. On average, how will this probability change after he takes shot n+1? The possible outcomes of this shot are that he makes the shot, in which case his new internal probability is (m+1)/(n+1); this happens with probability m/n. Or, he misses the shot, in which case his new internal probability is m/(n+1); this happens with probability (n-m)/n. The expected value of his internal probability after shot n+1 is therefore
(m/n) * (m+1)/(n+1) + ((n-m)/n) * m/(n+1) = (m[sup]2[/sup] + m + n m - m[sup]2[/sup])/(n(n+1)) = m/n.
In other words, each shot does not (on average) change the player’s internal probability. When the coach walks back into the gym, he should therefore expect that the player’s internal probability is still 1/2; shot #99 then brings the expected internal probability up to 2/3.[/spoiler]
Just using your simple-sounding explanation, I think your last sentence is wrong. If the coach walks into the gym and thinks that the player’s probability is still 1/2, and then sees shot #99 go in, the “obvious” answer is then that the probability would now be 50/99 or something like that.
I agree with MaxTheVool that your last step is fallacious (unless supported by further evidence than you noted). You are correct that, prior to the player taking the 99th shot, the coach has an expected value for the player’s “internal success probability” of 1/2. Which is also 2/4 and 3/6 and 4/8, and, most naturally expressed for the situation, 49/98. (And you are also correct in your implicit assertion that, at any time, the expected value of the “internal success probability” matches the probability of sinking the next shot). But why are we to say that, after the player sinks the 99th shot, the coach then revises their expected value for the player’s “internal success probability” to 2/3, rather than to 3/5 or 4/7 or 5/9 or, most naturally, to 50/99?
What we actually need to keep track of, to follow this kind of reasoning, is more than just the “expected value” (i.e., probabilistic mean value); we need to keep track of the full distribution. And what we find is that, prior to the player taking the 99th shot, the coach’s estimate of the player’s “internal success probability” is uniformly distributed over 1/98, 2/98, 3/98, …, 97/98. Put another way, as far as the coach knows, the number of hits prior to the 99th shot is uniformly distributed over 1, 2, …, 97. When the player sinks the 99th shot, these cease to be uniformly probable possibilities; each candidate value for the “internal success probability” prior to the 99th shot becomes more or less probable in proportion to its value. So the new expected value for the number of hits prior to the 99th shot becomes the weighted average of 1 with weight proportional to 1, 2 with weight proportional to 2, etc., through 97 with weight proportional to 97. This weighted average (i.e., 1[sup]2[/sup] + 2[sup]2[/sup] + … + 97[sup]2[/sup] divided by 1 + 2 + … + 97) comes out to 65. So, including the 99th shot, the coach has an expected value of 65 + 1 = 66 shots sunk so far, and thus an expected value for the “internal success probability” of 66/99 = 2/3 (and, as noted before, the expected value of the “internal success probability” will indeed be the probability of success on the next shot).
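The weighted average is easy to check by brute force. A minimal sketch, using exact integer arithmetic; `weightedHits` is a hypothetical helper name of my own, and the argument 98 is the number of shots taken before the one the coach sees:

```d
import std.stdio;

/// Sum of k^2 and sum of k for k = 1 .. n - 1, i.e. the numerator and
/// denominator of the average of k weighted in proportion to k.
/// Hypothetical helper, returned as [numerator, denominator].
ulong[2] weightedHits(ulong n)
{
    ulong sumSquares = 0;
    ulong sum = 0;
    foreach (k; 1 .. n)
    {
        sumSquares += k * k;
        sum += k;
    }
    return [sumSquares, sum];
}

void main()
{
    auto r = weightedHits(98);          // hits before shot 99 range over 1 .. 97
    immutable hitsBefore = r[0] / r[1]; // exact here: the division leaves no remainder
    writefln("expected hits before shot 99: %s / %s = %s", r[0], r[1], hitsBefore);
    writefln("expected internal probability after shot 99: %s / 99", hitsBefore + 1);
}
```

The same helper with a different argument gives the corresponding figure for a coach who walks in at some other point in the session.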
And I’ll continue to note that recognizing the order-invariance of the relevant probability distribution is in many ways the simplest yet most powerful approach, allowing us to easily tackle questions like “Given that the player made shots #5 and #20 and missed shot #9, how probable is it that they made shot #4 and missed shot #13 and will miss shot #28?”.
(Answer: this is equivalent to asking “Given that the player made the first two shots and then missed the next one (after the initial fixed make and miss), how probable is it that they make the next shot and then miss the two after that?”. The answer to which is 3/5 * 2/6 * 3/7.)
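The product above, and the order-invariance it relies on, can be sketched in a few lines. This assumes the same rule as the rest of the thread (each shot goes in with probability hits/(hits + misses), counts updated after every shot); `runProbability` is a hypothetical helper of mine.

```d
import std.stdio;

/// Probability of a given run of outcomes (true = hit), starting from
/// `hits` and `misses` and updating the counts after every shot.
/// Hypothetical helper for illustration.
double runProbability(uint hits, uint misses, const bool[] outcomes)
{
    double p = 1.0;
    foreach (made; outcomes)
    {
        immutable double total = hits + misses;
        if (made)
        {
            p *= hits / total;
            ++hits;
        }
        else
        {
            p *= misses / total;
            ++misses;
        }
    }
    return p;
}

void main()
{
    // Two makes and a miss on top of the fixed make and miss: 3 hits, 2 misses.
    writefln("hit, miss, miss: %s", runProbability(3, 2, [true, false, false]));
    // The same outcomes in a different order have the same probability.
    writefln("miss, hit, miss: %s", runProbability(3, 2, [false, true, false]));
}
```

Both calls multiply out to 18/210, which is why the shot numbers in the question can be shuffled into whatever order is most convenient.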