I've developed an algorithm that predicts the future of mankind...and it doesn't look good

I was thinking you found a hole here, but I realize it’s just another case of choosing the right tool for the right job. A species of only 1000 organisms is rare. It’s special, not mediocre. It’s probably fragile, not durable, and fragility carries more information. Maybe it just suffered a catastrophe, maybe it’s over-fitted to one niche. Whatever it is, you’ll likely know the special reasons why they’re so fragile. What are our vulnerabilities, how often do we face them, how lethal are they? If this is an information-rich scenario like that, then Bayes isn’t the right tool, because it can’t consider all of that information. You should pick the method that can.

It seems like birthrate stability is a dimension that would need considering here. If they’re at 1000 but they’ve been at 1000 for a million years with an annual birthrate of 10, they’re not fragile or special, they’re durable and mediocre. Clearly they face no special calamity or vulnerability, so there’s not much to know about risks. It’s a low-information situation where only durability is known, so this method becomes the best forecasting tool under those assumptions.

The problem is that we have no way of knowing whether a particular number is a good guess. That is where the 50% confidence line is highly misleading.

Suppose we have 100 counting exercises. They are all the same: counting the number of pennies in a dollar. So you count one penny, stop, and estimate that there are one to three more pennies left to count. This proves to be false when you get to five pennies, so that guess was wrong. But the guesses you made at two, three and four pennies are still active. So you go on making guesses at every stop. Eventually you get to 100 pennies and 100 guesses. The count is over. About 25 guesses have already been discarded as too low, leaving about 75 guesses still on the table, and about 50 of the 100 were right as far as the range they specified. So the method works 50% of the time. It doesn’t matter that it’s all the same count. The distribution of what you are counting doesn’t matter. As long as you are counting a closed set, you will get this result every time.
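The bookkeeping above can be checked with a short simulation. This is a minimal sketch, assuming the “one to three more” rule means each guess puts the total between 4/3 and 4 times the current count (the usual 50% interval for this kind of estimate):

```python
# At each count n, the running guess is that the total lies between
# (4/3)*n and 4*n. In integers: total <= 4*n <= 3*total.
# Tally how the 100 guesses fare once the true total (100 pennies) is known.

TOTAL = 100  # pennies in a dollar

correct = too_low = too_high = 0
for n in range(1, TOTAL + 1):
    if 4 * n < TOTAL:
        too_low += 1       # upper bound already passed during the count
    elif 4 * n > 3 * TOTAL:
        too_high += 1      # lower bound (4/3)*n overshoots the total
    else:
        correct += 1

print(correct, too_low, too_high)  # 51 24 25: about half the guesses are right
```

The split is 51/24/25 rather than an exact 50/25/25 because of the integer endpoints, but the 50% bookkeeping holds regardless of what is being counted, exactly as described.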

But for this group of data, the initial guess, made when you have counted 1, is wrong every time. THAT guess has no guarantee of being 50% accurate. Only the total pool of guesses does, and you add to it every time you go up another number.

So suppose we’re still in the first 1% of the total number of humans, and the total winds up being 500 quintillion or something like that. The guess at this point, that the population will roughly double, is going to be followed by “double or nothing” additional “bets,” since the question of how many is still open. Eventually the real number will be determined, by counting, and only then can you collect on the quintillions of bets you still have on the table.

So for the method to work, you have to front-load with many incorrect guesses. And you can’t collect on any of your correct guesses until the count is actually completed. From a gambling perspective, the house will love you, since you are handing them money which they can invest while you wait to get back to the break-even point. From a life perspective, people typically aren’t going to listen to your lowball guesses forever. This formula is a cute math exercise with zero applicability, mainly because the method requires lowball early guesses in order to cover all possibilities in a brute-force fashion.

Now in any data set it’s possible you could be given a number of low counts, but that’s not something we know without direct knowledge of the data. And combinatorially, there are always going to be more possible groups (combinations of discrete items) than discrete items themselves. So low guesses aren’t statistically sound in that sense.

Why do you think this is different from a simple probability exercise where you throw a 100-sided die and predict there’s a 1% chance it will come up 47? If you throw it 110 times and it doesn’t hit 47, was your 1% forecast wrong, will you forever be known as the sucker who made 110 bets for nothing? Do you think that’s how this works?
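The die analogy is easy to check numerically; missing 47 on all 110 throws of a fair 100-sided die is roughly a one-in-three outcome:

```python
# Probability that a fair 100-sided die never shows 47 in 110 independent throws.
p_miss_all = (99 / 100) ** 110
print(round(p_miss_all, 3))  # 0.331, i.e. about a 1-in-3 chance of "110 bets for nothing"
```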

I would encourage you to visit a casino, bring your best long-game strategy that loses big in the beginning and then cleans them out at the end, and observe exactly how that goes for you. Books and movies have been made about it. The house definitely will not love you, and the consequences are not pleasant.

To answer your other question, the house wins because they are holding your money, and they get to keep the 25% of bets that have already failed before the answer is finally declared. They win by holding your money. I’m postulating a fictional casino or lottery that doesn’t keep more on top of that.

This whole formula has ZERO to do with probability. Nothing. Do not use it that way. It is not predictive in any way. It is an exercise in book balancing.

The formula indeed makes predictions at every turn. They will balance. 50% of the predictions will be correct, just like the confidence level says.

Suppose the number is a googolplex. For what seems like forever, that first 25%, a quarter of the way to a googolplex, you’re going to get losing bets. Then it turns and you get winning bets until three quarters of the way to a googolplex. Then more losing bets, too high now instead of too low, until the count of a googolplex is reached. It all balances; your confidence levels balance. Half of them are right, just like you said.
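That balancing act does not depend on the size of the total. A small sketch, using the same 4/3-to-4x guessing rule as above, shows the correct guesses are always roughly the middle half of the count, whatever the total:

```python
# Fraction of the running guesses [(4/3)*n, 4*n] that contain the true total.
# Integer form of the containment test: total <= 4*n <= 3*total.

def correct_fraction(total: int) -> float:
    hits = sum(1 for n in range(1, total + 1) if total <= 4 * n <= 3 * total)
    return hits / total

for total in (100, 10_000, 1_000_000):
    print(total, round(correct_fraction(total), 4))
```

Whether the total is 100 or a googolplex, only the guesses in the middle half (from a quarter to three quarters of the way through the count) ever come out right, which is why the first quarter of any count is nothing but losing bets.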

But, it is not unusual, or weird, or lucky, to go forever and have losing bets. Because it’s just counting. There are always bigger numbers. There are no constraints to having a large number. So when you say you have a 50% confidence level at any point, you really need to explain that this range you are coming up with really doesn’t mean anything. It’s not even a guess. Please do not pass it off as a guess. It really has nothing to do with the actual number. It’s all just accounting, balancing the books.

The book balancing has NOTHING to do with how much is in each count. Everything you count could be in the trillions and above. Then all of the lower numbers will be wrong every single time. The formula does not care; it balances for every number. Fifty percent of the guesses in every count will come out correct, whatever the numbers involved. You may be counting things where a particular guess is NEVER correct, because the number is too low or too high. That is not weird or unusual or lucky, and it does not defy any odds. It just means you haven’t had anything to count that is that size. This formula does not substitute for counting.

To say things another way:

An algorithm saying there is a 50% chance of chicanery in the medium future may seem daunting in the absence of other information.

But knowing Trump has a 50% chance of being reëlected and has been frank about his plans to weaken institutions, kowtow to foreign despots, “be a dictator for a day” and subvert democracy… that doesn’t seem daunting at all. It would have ten years ago.

Since simple probability seems to be the only tool you want to talk about, let’s talk about the cumulative odds of hitting a set of numbers on a 10-sided fair die (not a sequence). I’ll demonstrate that your favorite tool is also only cumulatively correct (it depends on “bookkeeping,” as you phrased it). I’ll also demonstrate that over large trials, it’s less dependable than using a distribution.

Pick 5 numbers for your set. It doesn’t matter which ones. I’ve a 50% chance of being right. I throw once and I miss. Is that surprising? Not really; I knew there was a 50% chance of being wrong.

The odds were 1/2, so is that a guarantee I won’t miss on the second trial? No, the chance of missing both trials is 0.5^2 = 25%. So I’m less likely to fail twice running, but there’s no guarantee.

Throw it 10 times, is it a lock? No, there’s still a 1/2^10 chance I’ll miss every time. A 1/1024 chance of missing is pretty good! But still not perfect.

The more trials I run, the more I improve my odds over 50%, but there’s still a non-zero chance I’ll fail all 10 trials. Would you say that simple probability was “wrong”? No, of course not. My forecast was 100% sound; it’s just probability, not certainty. If I keep running trials, my success rate will approach 50%.
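The arithmetic here is just repeated halving; a short check, assuming independent throws of a fair d10 with a 5-number target set:

```python
# Chance of missing the 5-number set on every one of k independent throws:
# each throw misses with probability 1/2, so all k miss with (1/2)**k.
for k in (1, 2, 10):
    print(k, 0.5 ** k)  # 1 -> 0.5, 2 -> 0.25, 10 -> 0.0009765625 (= 1/1024)
```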

As you can see, simple probability is no different from a distribution-based approach. Simple probability doesn’t converge on “correctness” until you perform the “bookkeeping” exercise of averaging repeat trials. By contrast, you admitted and proved that a distribution-based approach will reach 100% correctness at the end. Simple probability will approach 100% certainty the more trials you make, but it never actually gets there, so for practical purposes, it’s not happening.

Ok, so I’ve reflected on this thread. I want to address what I see as the resistance points to accepting the conclusion.

First, this is not a prediction, and the dates or number of years are not hard numbers. This is a probability assessment. A 50% probability of the end of human births in 720 years doesn’t guarantee that outcome. It just suggests it is reasonably possible.

Also, what jumps out is the seeming disconnect between human beings having existed for 200,000 years and the idea that we could be gone in 720. I think that is a mental hurdle that makes accepting the results difficult.

However, this appears to be straightforward Bayesian statistics, which I admit I don’t know much about. But I feel like if the conclusion is erroneous, the problem is not in the Bayesian method, but in the way the problem is being framed: human lives vs. years.

Finally, it is hard to think the human race will stop breeding in 720 years, but the analysis says nothing about why that occurs. It could be the annihilation of humanity, or it could be that advancements in medicine and technology lead to an end to aging as we know it, and to the transformation of our species into cyborg and then synthetic bodies.

Hey, it’s wild speculation, but it’s not a pure impossibility.

Also consider the number of global crises humanity faces, and start thinking about the probability that any one of them, or even a collective set, could do us in.

For example, with the war in Ukraine, the Israel/Palestine war, and elevated tensions with China over Taiwan and other issues, it feels like we are on the verge of a conventional WW3 breaking out. And that could easily go nuclear. And even if it manages to stay conventional, the improvements in weaponry and technology in the past 80 years suggest that a global war would be far deadlier and more destructive than either of the previous ones.

So our species may be pretty robust, but we’re also incredibly capable of overcoming that robustness with our ingenuity.

I would agree. In fact the problem is really inherent in the question of the OP, before you ever get to the answer: So you’ve developed an algorithm that predicts the future of mankind. It fixes an end date which, in historically relative terms, is distressingly close. But in terms of human lifetimes, it’s over 30 generations away.

How well has humanity done in solving for issues forecast to happen in ~32 years? If we’re not good at that, how will we be any better solving for events not expected for 32 generations? If that’s the case, what’s the point even thinking about it? Or, could we come up with a definition of “the end” that would include shorter-timeframe events that might not kill everyone, but might be near enough to invest in avoiding? I don’t know the answer to either of those.

The human species is adaptable. During the Paleolithic era humans lived where the ice never melted and where the rain never fell. I think some humans will survive any earth wide catastrophe, regardless of how many humans are killed by the catastrophe.

In the short run I see problems emerging from human population growth, global warming, and the danger of nuclear war.

It’s also based on a total count of humans, not years IIRC. So in theory we could also last thousands of years with a more modest population of a few million or so.

Or we could be gone in a hundred years once we run out of fossil fuels and destroy the climate.

Or nuclear war could start next week.

Or we merge with our AI overlords in a few centuries and become something other than traditional “humans”.

It’s a probability assessment based on a simplified model with little additional information. Using the same model, I can expect a 50% chance of living to be between 66 and 200 years old. Factually correct, but not particularly useful.
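For the curious, the quoted lifespan range falls out of the same 4/3-to-4x rule, assuming a current age of 50 (the post doesn’t state the poster’s age; that number is a back-solved assumption):

```python
# Gott-style 50% interval: given the elapsed portion of some duration,
# there is a 50% chance the total lies between 4/3 and 4 times it.

def gott_50(elapsed: float) -> tuple[float, float]:
    return (4 / 3) * elapsed, 4 * elapsed

low, high = gott_50(50)     # assumed current age of 50 years
print(int(low), int(high))  # 66 200, matching the quoted range
```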

Since the model comes up with the same number regardless of what you are counting, I would say “no information” is more accurate. Plus the probability at any one point is more like “50% with an error of ±50%” aka random or no information at all.

Even if we have counted billions of something, we may still be in the first 1% of that particular count. No problem for the model, since it will eventually reach the region where it produces correct results. Of course, you won’t know that until after you actually complete the count. I can’t recommend anyone ever use this formula for anything, due to the false sense of precision it provides. Look at your actual data; you’ll do better with that.
