I’m not sure where infinity comes into this, actually. If infinity is a real problem, we can use a finite maximum number of humans possible. Humans take about 10^28 protons to make (within an order of magnitude), so a universe made up entirely of humans could have (about) 10^51 humans. The universe will have stars until about 10^14 years from now (if it continues to expand), so that’s about 10^12 human lifetimes. So at a very rough estimate, there can’t be more humans in the life of the universe than 10^63. Huge (outrageously big, even), but not infinite.
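Spelling that arithmetic out (a trivial sketch in Python, using the same order-of-magnitude figures as above):

```python
# Rough upper bound on how many humans the universe could ever contain.
# All inputs are the order-of-magnitude estimates from the paragraph above.
humans_at_once       = 10**51   # a universe made entirely of humans, at ~10^28 protons each
stellar_era_years    = 10**14   # roughly how long the universe keeps making stars
human_lifetime_years = 10**2    # ~100 years per lifetime

consecutive_lifetimes = stellar_era_years // human_lifetime_years   # ~10^12
max_humans_ever       = humans_at_once * consecutive_lifetimes      # ~10^63

print(f"max humans ever ~ {max_humans_ever:.0e}")   # enormous, but not infinite
```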
You’re absolutely correct and it’s funny you used protons, because I was thinking the same. At our current rate of technological progress, it’s hard to see how we find a way to live without protons. Their estimated lifespan is on the large side, but it’s finite. Someday the protons will go away, and we can’t keep existing after that. Most likely we have the same issue with stars.
So from that perspective alone, there’s no “infinity problem”. It’s true you can’t divide by infinity, but that’s only an issue for calculating simple probability on a completely blind random set with no priors. That’s the weakest possible assumption here. Imagining the human population as a sequence is perfectly valid, which means it’s best understood as a random continuous sequence. Furthermore, we have some prior information vis-a-vis the total number of human births, which means our models can consider priors.
Simple probability can’t exploit any of that information, but Bayesian probability was made for things like that. The denominators cancel out in that equation, so it doesn’t matter if they go to infinity. And again that’s not some weird innovation invented to predict the end of the world, it’s been known for centuries and is routinely used to make forecasts with everyday real-world importance.
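If it helps to see what that looks like concretely, here’s a minimal sketch, assuming the standard vague (scale-invariant) prior on the total and a uniform chance of finding yourself anywhere in the sequence; with those assumptions the normalizing denominators drop out and the 50% window has a simple closed form:

```python
# Minimal sketch of the Bayesian calculation under two explicit assumptions:
#   1. vague, scale-invariant prior on the total:      p(N) ~ 1/N
#   2. uniform likelihood of your position n within N: p(n|N) = 1/N  for N >= n
# The posterior is p(N|n) ~ 1/N^2 for N >= n, so its tail is P(N >= x | n) = n/x.
# No infinite denominator ever needs to be evaluated: it cancels in the ratio.

def credible_interval(n, lo_q=0.25, hi_q=0.75):
    """(lo_q, hi_q) credible interval for the total N, given that n have been seen."""
    # P(N <= N_q | n) = q  =>  n / N_q = 1 - q  =>  N_q = n / (1 - q)
    return n / (1 - lo_q), n / (1 - hi_q)

print(credible_interval(100))   # (133.3..., 400.0): the 50% window for a count of 100
```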
Thank you for taking the time. I appreciate it. In particular, the explanation of terms and why they apply.
Infinity matters because counting only goes one way. This is the reason this formula does not work.
You have counted 100 of something. You may not be finished. According to this formula, there is a 50% probability that the actual count is 133 to 400. Now let’s walk through this. If you have counted to 100, 100 is a valid part of the solution set of all numbers 100 and above, up to infinity. 100 may be part of a solution to all of these counts.
This formula is big on picking the numbers it does because it wants the count to fall in the “middle”, the 25-75% range. Let’s look at those ranges. Let’s call anything above 75% a Very Big number, 51-75% a Big number, 26-50% a Small number, and 25% and below a Very Small number.
At first 100 is the biggest number of all. But as you start walking up the solution sets, 100 falls out of the Very Big range at 133. So there are 33 solutions where 100 is a Very Big number.
Once you get to 200, 100 isn’t even an above average number. There are 67 sets where 100 is a Big number. Bell curve maybe?
It takes up to 400 for 100 to move into the Very Small category. So it has 200 sets where it is a Small number. No bell curve. In fact, 100 spends twice as much time as a Small number as it does as a Very Big and Big number combined.
100 will spend the rest of eternity, forever, as a Very Small number. And it is a valid part of the answer to ALL of these solution sets. It takes until 1,000 for 100 to fall into the bottom 10% of all numbers. Again, the journey is slowing, 300 to get from 100% to 25% and 600 to get from 25% to 10%. 9,000 more to get to 1%, and it stays there forever. Since we have infinity as an option, multiplying these numbers doesn’t change anything. Any defined number will spend more time as a Very Small number, even in the bottom 1%, than it does anywhere else.
So that’s why the 25% to 75% range is fallacious. Because we have a data point partway through the count and are looking for the total, the instances where our data point is “too big” are far fewer than those where it is “too small.”
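To make that bucket-walk concrete, here’s a small enumeration of the counting being described, up to an arbitrary cap (the whole point of the argument is that the last bucket keeps growing as the cap does):

```python
# Enumerates the bucket argument above: for an observed count of 100, classify
# 100's percentile position (100 / N) for every candidate total N up to a cap.
observed = 100
cap = 10_000   # arbitrary; "Very Small" grows without bound as the cap increases

buckets = {"Very Big": 0, "Big": 0, "Small": 0, "Very Small": 0}
for total in range(observed, cap + 1):
    frac = observed / total
    if frac > 0.75:
        buckets["Very Big"] += 1
    elif frac > 0.50:
        buckets["Big"] += 1
    elif frac > 0.25:
        buckets["Small"] += 1
    else:
        buckets["Very Small"] += 1

print(buckets)   # ~33 Very Big, ~67 Big, ~200 Small, Very Small: everything else
```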
This is wrong because you have misplaced the observer here. The observer is not “100”. What is the observer? It’s expressed in this statement: “once you get to 200”. Calculations are only valid when you get to the 200 observation point. It makes no sense to anchor the journey solely from the perspective of some number in the past, because this is Bayesian inference: it considers both the prior probability and the most up-to-date observation.
Wrong, again, for the 10th time.
The formula doesn’t “pick” what it “wants”. This is Bayesian inference, which is a well-established and uncontroversial mathematical model that has been reliably used for centuries in real-world applications. The 50% midpoint is the peak probability for a random continuous sequence. You can read the formal mathematical treatment here to understand why nothing is being “picked”.
If you don’t want to engage with understanding Bayesian inference, the principle of mediocrity can help you understand what’s going on. I’m indifferent which one you pick, but you seem to be consistently sidestepping the Bayesian aspect because it would contradict your intuition. I would encourage you: walk through the Bayesian inference, or simply embrace mediocrity. Your choice!
There are no bell curves anywhere here. This curve is a symmetric beta distribution because it’s derived from Bayesian inference. It peaks at 50% because that’s the midpoint of the sequence from the observer’s point of view. Probabilities are distributed from the observer’s point of view. Once the observer moves from 100 to 200, everything changes, because we are considering new information about the sequence. You can’t do that with simple probability. This is why we have Bayesian inference!
No. The 25%-75% range is derived from Bayesian calculation. You believe infinity is a problem because you’ve chosen the wrong probability model, you’ve assumed the wrong probability distribution, you’re applying the wrong point of view, and you’re using the wrong calculation.
The vulnerable point of the “Doomsday Argument” isn’t the math. Its main problems are:
- confidence in the priors (are we sure there have been 100 billion humans, that we’re adding ~134 million a year, that we’re really 200k years old?)
- intuition of the theory: what does “50% probability in 760 years” mean in real terms? Do we reach exactly 0 births in the year 2784, and is that the exact year? And you’re telling me it could be a lot sooner? Not super accessible, not obvious how to use it, and easy for laymen to mistake a probability for a certainty and to anchor on a specific endpoint.
- resistance to implications: if you look closer at the distribution, the probability that it’s earlier than 760 years is disturbingly high. It begins to overlap with a lot of plausibly calculable risks (runaway climate change, a giant asteroid, nuclear war, etc). When Bayesian inference agrees with other models based on actual physical evidence, the case gets… disturbingly stronger.
But we shouldn’t be repeating the falsehood that Bayes theorem “assumes 50%”. If you understand Bayes theorem, then you understand why the distribution peaks at 50%.
If you don’t understand Bayes theorem, then you have to either assume the midpoint at 50%, or you have to assume that Bayes was wrong (which will be news to the numerous multi-billion dollar industries that depend on it). Whichever one you pick, that’s a “you” problem, not an issue with anyone’s math.
I don’t think we know that? As far as I know, proton decay is theoretical at this point, and I rather get the impression that the theories which predicted it have fallen out of favour recently?
Your theories are inapplicable to counting problems because you don’t understand the concept of undefined upper bounds.
I invite you to conduct an experiment in counting. Take two friends. The first friend picks something at random for you to count. Now there are infinite things for you to count, so this could be difficult. And there’s the issue of time. If you’re counting Fran Dreschers that appeared on the sitcom The Nanny, your count will be very short. If you’re counting the number of water droplets in Lake Superior, it’s going to take a lot longer. But none of you know this, you or your two friends. You know nothing about what you are actually counting.
So let’s assume the first friend can pick something to count out of infinity. The task for the second friend is to pick out a random point for you to stop your count. This second task is impossible for your friend to complete. Because you can’t pick a random number from 1 to ?. It’s impossible. It’s impossible to pick a random number, or the middle, when your endpoint is undefined.
If you want to count something, you need to count it. Period. If you have no other definition. There are many areas where people know something about what they are counting (election estimates, etc.), but those require knowledge of the thing being counted.
There are ways to guess the number of humans that will ever live, and they might be close. But those require knowledge about what humans actually are, their environment, etc. This number trick will never work on a blind count because of the unknown upper bound issue.
I would rather not. I would rather see your mathematical work for the situation you’re proposing. Then we can be sure you’re not overfitting your thought experiment to the types of calculations you understand.
“I would rather not.” LOL.
Another problem is that, knowing nothing of the upper bound, a “stop counting” order can easily take place after the counting has long since finished.
I want to know how many people were in the rock group The Beatles. I don’t really have any insight into the answer, so I want you to stop counting at 1,000 and we can try to apply this formula you keep talking about. So I come back at 1,000 and you’ve finished your count a long time ago. I could have issued a stop order at 2, but I don’t really know anything about how many there were. For me to issue a stop order at 2, I have to actually know something about the endpoint. I don’t. So I picked a random number, and it was well after the count had concluded. It’s always possible to pick a higher number than is within the actual determined range; that’s the way numbers work. They are infinite. The count is still valid and it’s an actual count; it would only have been an estimate if I had happened to pick a stop point before the count was completed.
I also don’t know of a lot of formulas that discount earlier observations. Why have the stopping point at 1,000,000 instead of 10,000? Either is valid at the time it occurs. How does the person counting the number 10,000 know that they are such a low number, so early in the count, when they are using the same formula and it is telling them that there will be 3,333 to 30,000 more with a 50% confidence level?
sigh
THIS is why the Empire banished those nerds to Terminus!
The show appears to differ from the book a bit in that Seldon (or his AI ghost) seems to be taking a more active role in collapsing the Empire so something better can form. Show Seldon also seems to rely more on “chosen ones” and predicting the actions of individuals than the book, which seems to go against the whole fucking point of “psychohistory”!
I’ve done a bit of data science work over the course of my career and one of the core tenets of this sort of thing is being able to backtest your model so you can see how well it’s predicting stuff. Not make it so complicated that no human in the universe can reverse engineer it so everyone just assumes it is “correct”.
Actually, it works if you do the Berlin Wall thing. If you test, for each year between 1961 and 1989, whether the Berlin Wall comes down between (current year + 1/3 × the wall’s age so far) and (current year + 3 × the wall’s age so far), knowing the actual outcome, then for the middle 14 out of 28 years (50%) the formula works.
The problem is if you are trying to predict the end of humanity or the fall of the Berlin Wall, you don’t actually KNOW if you are in the middle half. You can assume you are, but you only have a 50/50 chance of being correct.
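For anyone who wants to check that backtest, here’s a quick sketch (I’m taking 1961 as the year the wall went up and 1989 as the year it came down; the exact hit count shifts by a year or so depending on how you treat the endpoints, but it lands at roughly half):

```python
# Backtest of the 50% rule on the Berlin Wall (built 1961, fell 1989).
# For each observation year, the rule says the wall falls somewhere between
# age/3 and 3*age years in the future. Count how often 1989 lands in that window.
BUILT, FELL = 1961, 1989

hits = 0
years = range(BUILT + 1, FELL + 1)          # observe in each year 1962..1989
for year in years:
    age = year - BUILT                      # how long the wall has stood so far
    remaining = FELL - year                 # how long it actually had left
    if age / 3 <= remaining <= 3 * age:     # inside the 50% window?
        hits += 1

print(f"{hits} of {len(years)} observation years land in the window")  # roughly half
```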

I want to know how many people were in the rock group The Beatles. I don’t really have any insight into the answer, so I want you to stop counting at 1,000 and we can try to apply this formula you keep talking about.
Why would you try to apply Bayesian inference to a counting problem with no priors? That makes no sense at all. Of course it fails, that’s the wrong use of the tool.
But! Could we use it to talk about the duration of the Beatles? Sure. It’s a sequence. We have priors. Let’s see what happens in different formulations of that:
First, the “Berlin Wall” formulation. I observe the Beatles in 1966. The Beatles are 6 years old. There’s a 50% chance that the Beatles will have 0 members in the next 2-18 years. What was the actual outcome? They publicly broke up in 1970, legally dissolved in 1974. Wow, both those outcomes fell in the range, what are the odds of that? 50%.
Second, the “Doomsday” formulation. I observe the Beatles in 1966. By this time, the highest number in the Beatle sequence is 6. Bayesian inference suggests there’s a 50% chance that the total number of Beatles tops out at 12 or fewer. Since the Beatles were founded in 1960, the Beatle growth rate is on average 1 per year. So there’s a 50% chance that the Beatle growth rate becomes 0 by 1972.
When did the Beatle growth rate officially become zero? It’s complicated, there are actually multiple good answers for this.
If you strictly limit to the “Fab 4” lineup, that was finalized by 1962, so it was already 0 growth by 1966. This didn’t fall in the forecasted range. But was that really the end?
There are 3 different people who are commonly called the “fifth Beatle”. One of these was Billy Preston (actually he was the 3rd of the “fifth Beatles”, and the 7th of all known Beatles). Was he really a Beatle?
He appears on “Let It Be” (the album, the iconic rooftop concert, and the feature film of that concert) as an instrumentalist, playing both accompanying and solo parts. He’s a good fit for the definition of a Beatle, meaning the last Beatle was added in 1969. This forecast is also in range, only 3 years off the midpoint.
Another definition of “end of Beatles” is when they officially became finite. In 1966, they could have conceivably continued growing (they sort of did). But there was a point when we learned the growth rate was officially finite. Again that was either 1970 (de facto) or 1974 (de jure), when we formally knew there would be no more Beatles. Approximating that by averaging them, we get an average official end date of 1972. Our forecast? What do you know, it was 1972. Bullseye.
Another formulation: In 1966 the average age of living Beatles was about 24 years. In what time range is there a 50% chance that the average Beatle will be dead? That would be 8-72 years. What actually happened? 14 years later they dropped to 75% living. 34 years later they dropped to 50%. Half of 72 is… 36. Only 2 years off from 34. That’s only 5.8% off; it’s a good forecast.
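If it helps, the arithmetic behind those formulations is the same simple window every time; here’s a compact sketch using the 1966 inputs quoted above:

```python
# The 50% rule applied to the Beatles formulations above (observation year 1966).
def fifty_pct_window(observed_so_far):
    """50% window for how much more is to come, given how much there has been."""
    return observed_so_far / 3, observed_so_far * 3

# Band age: 6 years old in 1966 -> 2 to 18 more years -> ends somewhere in 1968-1984.
print(fifty_pct_window(6))     # (2.0, 18.0)

# Head count: 6 Beatles so far -> even odds the total tops out at 2 * 6 = 12;
# at ~1 new Beatle per year since 1960, growth hits zero around 1972.
print(2 * 6)                   # 12

# Average age of a living Beatle: ~24 in 1966 -> 8 to 72 more years before
# the "average Beatle" is dead.
print(fifty_pct_window(24))    # (8.0, 72.0)
```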
So most of these Bayesian forecasts were close to right, or exactly right. We had one failure. But failed trials in probability don’t invalidate the model. You can’t say a 50% forecast is “wrong” in the 50% times it doesn’t hit. If you flip a coin 10 times and it comes up 100% heads, that doesn’t mean 50% was a bad forecast! You witnessed a 1/1024 event, which is lucky, but not like lottery lucky. Far rarer things happen around you every day.
The only way you can say a model is wrong is if it doesn’t converge. We can’t re-run the experiment in all Beatle-populated universes. But we did test 4 different definitions of the problem. In 3/4 of those scenarios, the actual outcome was either exact or close. 75% is not a bad fit at all.
Honestly that surprised me somewhat. I thought a sample sequence of 7 might be too small. But it worked surprisingly well.

Actually, it works if you do the Berlin Wall thing. If you test, for each year between 1961 and 1989, whether the Berlin Wall comes down between (current year + 1/3 × the wall’s age so far) and (current year + 3 × the wall’s age so far), knowing the actual outcome, then for the middle 14 out of 28 years (50%) the formula works.
The problem is if you are trying to predict the end of humanity or the fall of the Berlin Wall, you don’t actually KNOW if you are in the middle half. You can assume you are, but you only have a 50/50 chance of being correct.
Actually, that prediction is impossible with this formula.
The formula only works backward looking. If you already know the answer, then you can look backward. You can say that “for the answer 200, the formula works 50% of the time for all preceding numbers.” That is true for any number.
Here’s what does not work. You cannot count 50 of something and know that 50% of the time your total will come to between 67 and 200. That’s the part that isn’t true. This formula is backward looking, but it’s absolutely useless as a forecaster. Because even counts that go into the trillions and quadrillions are at 50 at one point. To forecast, you need to know the distribution of counted things. How many counts actually wind up between 67 and 200. Which is unknowable. This formula cannot forecast.

The formula only works backward looking. If you already know the answer, then you can look backward.
This doesn’t even have enough meaning to qualify as wrong. It’s incoherent.
This formula cannot forecast.
The entire field of mathematics will love to read your paper about how Bayesian probability is wrong.
Well, this is again cherry picking. Do the same for The Stones. Or the Eagles, or Bob Dylan, or any number of bands still touring today that started in the ’50s or ’60s. The formula doesn’t start to become correct until the bands are already older than anyone would have predicted they’d last when they formed.
The problem that keeps cropping up is that the formula, while correct and useful, has very little predictive power, and so as soon as you try to apply it to something known, it’s often more accurate to just forecast based on what you know.
For instance, if the Stones as a band are 60 years old, how long would you expect them to continue playing? The formula says there’s a 50% chance of them playing for another 20-180 years.
I’d use better information and say that there’s a 95% chance that the actual number is lower than the lower bound of the formula’s answer. I know the Stones won’t last another 50 years because all the members will be dead. It likely won’t last another 10 years, despite having been around for 60.
The predictive power of this formula is low because it deals with completely unknown things, and there’s just very little information to go on. How long the thing has lasted is just about all you’ve got, and the formula is the best you can do in that situation. But the best you can do is still really marginal.
Or put another way: If I told you there were three unknown things behind a wall and asked you to guess how long they will last, all you could do is pull numbers out of thin air and hope.
But if I tell you how long each one has already existed, you can now use the formula to estimate, and it will crush the random guesses. You’d still be wrong a lot, but if we play that competition out 100 times, I’d bet the person with the formula and knowledge of how old each object already is will win a large majority of the time.
However, if a third person was given no information other than what each object actually was, they’d probably outperform the other two just by applying their knowledge of the objects.
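That competition is easy to simulate if you’re willing to pin down a few assumptions. In the sketch below the assumptions are mine, purely for illustration: the hidden objects’ lifespans are drawn log-uniformly over a wide range, the blind guesser pulls a remaining-lifetime guess out of thin air from that same range, and the formula user guesses “it’ll last about as long again as it already has” (the rule’s 50/50 point). Scoring is whoever is closer on a log scale:

```python
# Toy simulation of the "things behind a wall" competition described above.
# ASSUMPTIONS (mine, for illustration): true lifespans are log-uniform on
# [1, 10_000] time units; we observe each object at a uniformly random point
# in its life; the blind guesser draws from the same log-uniform range.
import math
import random

random.seed(0)
LOW, HIGH = 1.0, 10_000.0

def log_uniform():
    return math.exp(random.uniform(math.log(LOW), math.log(HIGH)))

def log_error(guess, truth):
    return abs(math.log(guess) - math.log(truth))

trials, formula_wins = 100_000, 0
for _ in range(trials):
    lifespan = log_uniform()
    age = lifespan * random.random()        # where in its life we happen to look
    remaining = lifespan - age

    blind_guess = log_uniform()             # knows nothing, picks out of thin air
    formula_guess = age                     # "about as long again as it's lasted"

    if log_error(formula_guess, remaining) < log_error(blind_guess, remaining):
        formula_wins += 1

print(f"formula user wins {formula_wins / trials:.0%} of head-to-head guesses")
```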

Infinity matters because counting only goes one way. This is the reason this formula does not work.
You have counted 100 of something. You may not be finished. According to this formula, there is a 50% probability that the actual count is 133 to 400. Now let’s walk through this. If you have counted to 100, 100 is a valid part of the solution set of all numbers 100 and above, up to infinity. 100 may be part of a solution to all of these counts.
Thank you. Before I address the rest of your discussion, can you answer my other question? I’d like to know if you agree that a range of values can be useful and that 50 +/- 25 is 50% of the range between 0 and 100 (in the random number generator case that you proposed).
For everybody else, I’ve been thinking of examples where statistical predictions can be made even when infinity is on the table:
-
Flip a fair coin until heads comes up. How many times will you flip the coin? There’s a finite probability of any non-negative integer at all - but the average number of flips is 2. If you make a bet of $1 that the flipping will end in 2 or fewer flips, with a payout of $1.01 if you’re right (so you end up with $2.01 if you win and $0 if you lose), the odds are in your favor, even though there’s a decent chance that the flipping will take five or more flips to end. The paradoxical part is that even after 10 flips of tails, the correct estimate for how many more flips it will take to finish is still 2.
-
You can make that flipping scenario more complicated by adding memory to the process. Now the game ends when you get 10 heads total. The average duration of the game is 20 flips, but it could take many many more flips to complete. Again, if after 10 (or 17, or 23) flips you have had no heads so far, the expected number of future flips is still 20, but in this case, because there is memory (the system has “states”), at any point after which you’ve already had 5 heads, the expected number of flips to finish the game goes down (I just find this interesting) - but infinity is still on the table (when I ran this simulation a million times, sometimes it took more than 50 flips to end).
All this assumes a fair coin.
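A quick simulation sketch of both games, for anyone who wants to poke at the memorylessness claim (fair coin assumed, as stated):

```python
# Simulates the two coin games above with a fair coin.
import random

random.seed(1)

def run_game(target_heads):
    """Flip a fair coin until target_heads heads appear; return the list of outcomes."""
    flips, heads = [], 0
    while heads < target_heads:
        h = random.random() < 0.5   # True = heads
        flips.append(h)
        heads += h
    return flips

TRIALS = 100_000

# Game 1: flip until the first head; average length is 2 flips.
print(sum(len(run_game(1)) for _ in range(TRIALS)) / TRIALS)    # ~2.0

# Game 2: flip until 10 heads total; average length is 20 flips.
print(sum(len(run_game(10)) for _ in range(TRIALS)) / TRIALS)   # ~20.0

# Memorylessness: among Game 2 runs whose first 10 flips were ALL tails
# (about 1 run in 1024), the average number of ADDITIONAL flips is still ~20.
extra, kept = 0, 0
for _ in range(500_000):
    game = run_game(10)
    if not any(game[:10]):
        kept += 1
        extra += len(game) - 10
print(extra / kept)                                             # ~20 again
```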
Now, I don’t accept Gott’s argument: the reason is that unlike the building case, I think we know too much about humanity to use a perfectly uninformative prior. A human race with only 1000 members is way closer to extinction than one with 8 billion (some threats that would be existential for a 1000-member species would not be a threat at all to 8 billion people (as a whole)). This is kind of like the “memory” case above, which suggests an interesting way to analyze the Bayesian argument for a building.
Think of the Berlin Wall or some other building as flipping a coin with fixed but unknown odds of coming up heads, every year; if the coin comes up heads the building is destroyed that year. If the building has existed for X years, that gives us some information about the odds of heads, which in turn gives us some idea of how much longer the building is likely to exist. If your initial estimate is low (because your initial estimate was made when the building was young, so your estimate of the yearly probability of collapse has a range that includes a fair chance of near-future destruction), and you come back later, you can update your estimate to reflect the new information, increasing your estimate of future lifespan. This is a good thing - you want new information to be able to change your mind in a sensible way.
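Here’s a minimal sketch of that model, assuming (my choice, purely for illustration) a flat Beta(1, 1) prior on the unknown yearly destruction probability. If I’ve done the algebra right, after the building has stood X years the chance it stands at least k more works out to (X+1)/(X+1+k), which puts the 50% window at roughly “age/3 to 3×age more years”, i.e. essentially the rule being argued about upthread, and the estimate does get more optimistic every time you come back and find the building still standing:

```python
# "Building = yearly coin flip with unknown odds" model, with a flat Beta(1, 1)
# prior on the yearly destruction probability p (an assumption made here for
# illustration). Surviving X years updates the prior to Beta(1, 1 + X), and the
# posterior-predictive chance of standing at least k more years integrates to
# (X + 1) / (X + 1 + k).

def survival_prob(k_more, years_stood):
    """P(stands at least k_more additional years | has already stood years_stood)."""
    return (years_stood + 1) / (years_stood + 1 + k_more)

for age in (1, 5, 10, 28):
    # median remaining lifespan: smallest k whose survival probability drops to 1/2
    k = 1
    while survival_prob(k, age) > 0.5:
        k += 1
    lo, hi = (age + 1) / 3, 3 * (age + 1)   # where survival crosses 75% and 25%
    print(f"stood {age:>2} yrs -> median ~{k} more yrs, 50% window ~({lo:.1f}, {hi:.1f}) yrs")
```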

Well, this is again cherry picking.
I didn’t specify the Beatles. I didn’t even expect it to work, I just tried it and was surprised how well it works.

For instance, if the Stones as a band are 60 years old, how long would you expect them to continue playing? The formula says there’s a 50% chance of them playing for another 20-180 years.
Well sure. And a 25% chance it will be less than 20 years, which intuitively is very likely. But “Stones as a band” isn’t a great basis for defining this kind of inference. You could eliminate the upper bound with better information. Or could you? The Stones have had a lot of lineup changes in their history. This could continue. It’s hard to think they’ll go on without Mick Jagger, for instance, but they’ve gone on longer than they have any reason to. Maybe they go on changing composition forever. It’s not likely but it’s a possibility, one that decreases over time.
I’d use better information and say that there’s a 95% chance that the actual number is lower than the lower bound of the formula’s answer. I know the Stones won’t last another 50 years because all the members will be dead. It likely won’t last another 10 years, despite having been around for 60.
Of course, absolutely. If you have more complete information, why not choose the approach that exploits it? Nobody’s saying Bayesian inference is always the best tool, or that it provides certainty. It provides what it provides. When all you have is a little knowledge about the priors and sequence, a vague but rigorously determined forecast is a big improvement on nothing. That’s all we’re saying here.
I’m agreeing with you. I’m saying that people who are having trouble with the concept are applying it to situations where there is better information, and therefore the formula seems ‘stupid’. The whole point, though, is that it is useful if there is no better information.
So bringing up examples of things we know well is bound to run into the argument that the formula doesn’t work well, because knowing what we do, there are better ways to determine the potential lifespan of something.

I’m agreeing with you.
Oh ok. I’m not used to that.
In this thread we’ve had a lot of people imagining backtests with failed trials and saying “see, it doesn’t work” so I’m vigilant to anything that sounds like that. You don’t lose a coin toss and then say “I guess probability is bullshit since this trial didn’t yield my choice.” A 50% chance is exactly a 50% chance. A distribution is a distribution. I’m repeating this for others, I know you already know.

The problem that keeps cropping up is that the formula, while correct and useful, has very little predictive power, and so as soon as you try to apply it to something known, it’s often more accurate to just forecast based on what you know.
For instance, if the Stones as a band are 60 years old, how long would you expect them to continue playing? The formula says there’s a 50% chance of them playing for another 20-180 years.
I’d use better information and say that there’s a 95% chance that the actual number is lower than the lower bound of the formula’s answer. I know the Stones won’t last another 50 years because all the members will be dead. It likely won’t last another 10 years, despite having been around for 60.
The predictive power of this formula is low because it deals with completely unknown things, and there’s just very little information to go on. How long the thing has lasted is just about all you’ve got, and the formula is the best you can do in that situation. But the best you can do is still really marginal.
By way of agreement I guess:
There are all sorts of problems where the temptation is to throw up your hands and say, “I don’t even know how to begin to answer that question.” The odds that humanity will perish given that we survive an additional 10 or 20 years are one of them. I make this a conditional probability, since we can say something about the international environment over the next few years. But 10 or 20 years out? Not so much, at least before the Universal Emperor gives us access to Seldon’s dataset. That said, as I argued above, even in this context I think you can improve on the baseline, though it would require a lot of additional work.
The formula is quick. The iron triangle of data consists of three descriptors: fast, cheap, and accurate. You get to choose two. The formula is fast and cheap, with wide error.