You can’t talk about the null hypothesis in isolation - what is the hypothesis here? The reason many people have a problem with a sequence of all heads, say, is that they have an implicit hypothesis that a “random” sequence will contain a mixture of heads and tails. And the reason you can’t say “I thought the coin was fair, but I tossed 50 heads in a row” is that this creates a hypothesis from the data, rather than checking a hypothesis against data. For example, you can’t give 1,000 people an ESP test and then point at the one who scored significantly better than chance as proof that this person has ESP.
About the only situation where I can see this scenario making sense is if you are told that a certain coin is rigged, and you test the hypothesis that the coin is actually fair. I’ll reiterate that you can’t test the generic hypothesis that it is rigged, because any result can be explained by some rigging or other. As Shagnasty mentions, there are physical constraints too - practically speaking you can’t build a coin that produces one specific predetermined sequence, so certain tests for randomness might be good enough to invalidate a rigging hypothesis.
Big problem - you are never proving, or even trying to prove, that the null hypothesis is true, only that the null hypothesis can explain the data with a certain probability. If a test for the coin being biased all-heads fails, that does not mean the coin is fair, just that there is a reasonably high chance that a fair coin could produce the results of the test. You don’t prove the null hypothesis, you only “disprove” (term used very loosely) the hypothesis.
Say you think the coin is biased all heads, and you get all tails. That disproves the hypothesis, but it neither proves nor disproves the hypothesis that the coin is fair. And you can’t use this data to prove the hypothesis that the coin is biased all tails; you need to run an additional experiment with that hypothesis. If you get all tails again, then you can compute the p-value. Even if you don’t get all tails, you haven’t proved the null hypothesis - as you say, the coin might be biased 75% tails.
Well, of course, some random sequences do and some random sequences don’t.
Well, if you first formulate the hypothesis “The coin is fair”, based on no data yet, and then toss the coin and get 50 heads, does that make the hypothesis unlikely? How does it do that? For the data to make the hypothesis unlikely would be for P(coin is fair | it came up 50 heads after I hypothesized it to be fair) to be low, but as I’ve been pointing out, we have no grounds on which to calculate/approximate P(coin is fair | it came up 50 heads after I hypothesized it to be fair), only grounds on which to calculate/approximate the largely irrelevant converse conditional probability (which will be 2^(-50), naturally, but of no real relevance).
If you can test for being fair, then you can test for being rigged; the probability that it’s rigged is just one minus the probability that it’s fair. Finding out information about one is just the same as finding out information about the other.
The obvious reason to believe those physical constraints is an inductive argument based on past observed correlations extrapolated into causal laws; but this is precisely the sort of argument the OP is expressing skepticism about, and thus looking for further, non-circular support for.
I understand entirely, completely, and totally that you are not going for 100% certainty watertight mathematical proof, or anything of the sort. You are just attempting to give probabilistic argument. My point is that the experiments you propose don’t even do as much probabilistically as you seem to think they do; observing 50 heads in a row tells us very little about the probability of the coin being fair, at least without some extreme background assumptions. Just because P(B|A) is very low, it does not follow that P(A|B) is very low. The connection between the two is very loose.
Indeed, it would be reasonable to assume that the coin is as likely to be pro-heads rigged as it is to be pro-tails rigged, right? And, pushing the symmetry all the way, that a rigged coin is as likely to be rigged for any one particular sequence as for any other, with separate flips of a fair coin remaining a memoryless, independent process? In which case, P(50 heads | the coin is fair) = P(50 heads | the coin is not fair) = P(50 heads at all, whether or not the coin is fair) = 2^(-50). But, in that case, observing 50 heads tells us absolutely nothing about the probability of the coin being fair; if event B is as likely in the event of A as it is in the event of (NOT A), then observing B tells us nothing, absolutely nothing, about the probability of A holding.
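To make that concrete, here’s a quick sketch in Python (the 1/3 prior is an arbitrary stand-in - Bayes’ rule is the only real content):

```python
from fractions import Fraction

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' rule: P(A | B) from a prior on A and the two likelihoods."""
    joint = prior_a * p_b_given_a
    return joint / (joint + (1 - prior_a) * p_b_given_not_a)

# Whatever prior you start with, equal likelihoods leave it unchanged:
prior_fair = Fraction(1, 3)     # arbitrary prior P(coin is fair)
p_seq = Fraction(1, 2**50)      # P(50 heads | fair) = P(50 heads | rigged)
print(posterior(prior_fair, p_seq, p_seq) == prior_fair)   # True
```

The 2^(-50) cancels out of numerator and denominator, which is exactly the point: a tiny likelihood moves the posterior not at all when the rival hypothesis assigns the data the same tiny likelihood.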
My point is that the p-value gets misunderstood. The p-value is not the probability of the null hypothesis being true; it’s far, far from any such thing. So a low p-value does not indicate a low probability of the null hypothesis being true, and a high p-value does not indicate a high probability of the null hypothesis being true. At the extreme end of out-and-out falsification, you do get a p-value of 0 and a probability of 0 for the null hypothesis being true, but that’s the only necessary correspondence between the two. In confirmatory cases, it’s just about impossible to extract any information about the probability of the hypothesis being true from the p-value. (Cites: Wikipedia on frequent misunderstandings of p-values; it also could be useful to read the section entitled “The Permanent Illusion” on page two of Cohen94, particularly to see where the implicit probabilistic argument in most misapplications of null hypothesis significance testing goes wrong)
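A quick simulation makes the gap vivid. The setup is invented for illustration (a 50/50 mix of fair coins and mildly biased ones, an exact two-sided binomial test): among the coins rejected at p < 0.05, the fraction that are actually fair is set by the base rates and the test’s power, not by the 0.05 threshold.

```python
import math
import random

random.seed(1)
N = 100
PROBS = [math.comb(N, k) / 2**N for k in range(N + 1)]  # P(k heads | fair)

def p_value(heads):
    """Exact two-sided binomial p-value against 'the coin is fair'."""
    return sum(p for p in PROBS if p <= PROBS[heads] + 1e-12)

fair_rejected = biased_rejected = 0
for _ in range(2000):
    is_fair = random.random() < 0.5        # half the coins are fair
    p_heads = 0.5 if is_fair else 0.55     # the rest are mildly biased
    heads = sum(random.random() < p_heads for _ in range(N))
    if p_value(heads) < 0.05:
        if is_fair:
            fair_rejected += 1
        else:
            biased_rejected += 1

# Among the rejections, the share of true nulls is nowhere near 0.05:
frac_true_null = fair_rejected / (fair_rejected + biased_rejected)
print(round(frac_true_null, 2))
```

With a weak bias like 55%, roughly a fifth of the “significant” results come from fair coins - the p-value threshold alone can’t tell you that number.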
Listen, even forget all this talk about null hypotheses and p-values.
The thrust (or at least one major aspect) of the OP’s skepticism is this question: Why should we assume events in the future will resemble past observations?
Without this assumption, all the significance testing in the world is useless. And if your response to the skeptical question is “Well, because that’s the way things have generally worked out so far [with predictions based on observed patterns having generally been highly accurate]”, well, I hope you can see the unsatisfactoriness of this (circular) response.
You have to dive off into hard-core philosophy from there. Don’t stop until you are questioning existence itself because that is where that road leads.
You are correct that science is made up of some circular reasoning, but that great circle encompasses everything we have ever found using the scientific method. It is possible to break out of the circle somewhat, though. Einstein did it, although that still doesn’t invalidate Newtonian physics for most things we use it for.
Exactly. The fallacy is that some people have incorrect assumptions about random sequences.
Actually, there is a large literature on randomness testing for random number generators. I haven’t followed it since I read Knuth decades ago, but here is one link. The purpose of randomness testing is to see if it is possible to predict the next number in a sequence based on previous ones. That’s not the same as computing the probability of a given sequence.
If someone fixed a coin to produce a given sequence, there would be no way of telling whether it was random or not, assuming it didn’t repeat. If, however, the coin was fixed to be heads 70% of the time, you’d be correct in predicting heads 70% of the time, and it would fail the randomness test. Of course you still have to give the probability that the observed behavior was due to chance - you can never say with 100% certainty that a test failed.
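As a sketch of that (the 70% figure and the sample size are arbitrary choices here):

```python
import random

random.seed(0)

# A coin rigged to come up heads 70% of the time: the predictor
# "always call heads" is right about 70% of the time, a hit rate
# no fair coin can sustain over a long run.
flips = [random.random() < 0.7 for _ in range(1000)]   # True = heads
hit_rate = sum(flips) / len(flips)   # "predict heads" hits on each head
print(round(hit_rate, 2))
```

The probability that a fair coin lets any fixed predictor hit at ~70% over 1000 tosses is astronomically small, which is the sense in which the rigged coin fails the randomness test - probabilistically, never with certainty.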
Indeed. If the hypothesis were that the world works by coincidence, in other words is random, we can apply something like a randomness test. To pass it, you shouldn’t be able to predict the result of an action. This is so absurd, based on our experience, that no skeptic proposes to do this. I’m giving him a break, and saying that I accept coincidence as a null hypothesis, and causality as a hypothesis, which is like assuming a rigged coin as the hypothesis. I agree with you that P(A|B) being low doesn’t mean P(B|A) is low, but say that we can show the low probability of coincidence independently from both directions. I had my doubts on the relevance of the rigged coin as the null hypothesis (and still do) but randomness testing handles the problem quite well.
We’re testing to see if the coin is truly memoryless. In some sense a coin rigged to always come down heads has a memory. Anyhow, as I mentioned above, the true test is whether you can predict the next toss. The mechanism by which you can is unimportant.
If you think there is an equal chance of the coin being rigged to be all heads or all tails, then your hypothesis would be all heads || all tails, and you can easily compute the chance that an observed result can be explained by chance.
Let’s say your hypothesis is that the coin is rigged to come up all heads. You toss 50 times, and get all heads. The probability that this result comes from a fair coin is 2**-50, and by most tests you’d accept the hypothesis. Now, let’s say your hypothesis is that the coin is fair. I have a problem figuring out how to compute the probability that 50 heads will be seen under the null hypothesis that the coin is rigged. How is it rigged? How do you even compute p in this case? Like you say, the probability is not that the null hypothesis is true, but that the results can be explained by the null hypothesis. That I don’t get. Randomness tests, however, will come up with a very low probability that the results are random - how low depends on the test. That seems a better way of computing this side.
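The easy direction, in numbers (just the arithmetic from the two hypotheses above):

```python
# p-value for the rig hypothesis "all heads" when the data are 50 heads:
# the chance that a fair coin produces that exact sequence.
p_exact_sequence = 0.5 ** 50
# For the composite hypothesis "all heads || all tails", the matching
# event under the fair-coin null is "all one face", twice as likely:
p_all_one_face = 2 * p_exact_sequence
print(p_exact_sequence)   # ≈ 8.9e-16
print(p_all_one_face)     # ≈ 1.8e-15
```

The hard direction is exactly the one with no numbers: “50 heads under the null that the coin is rigged” has no computable probability until you say how it is rigged.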
I’m sorry I’m repeating myself - I’m working this out too.
I have no argument with the rest of your post. We don’t really say a hypothesis is wrong, we just say that we reject it if the probability of the results being explained by chance is high enough. The level used is a convention, and not a law of nature. And it indeed has nothing at all to do with the correctness of the null hypothesis.
My answer is that you don’t assume this, you continue to test it. It is not always true. Magical thinking involves the hypothesis that breaking a mirror or walking under a ladder leads to bad luck. Most of us test this, and find that it doesn’t seem to be true, and then abandon it. For other things, we have tested them so much that we can’t tell them from an assumption. If we lived in a world where magic worked, we’d probably have very different expectations.
Yeah, I’m philosophizing. That’s what the OP was doing as well. That’s what any response to the OP’s question must be. If you want to say “All that philosophizing is just intellectual masturbation”, well, you’re free to feel that way, but that’s not engaging with the OP’s concerns; it’s just being content to ignore them.
I don’t know what you’re saying here; what circle was it that Einstein broke out of? [Not the circle of inductive reasoning.] But I suspect this would end up being more a tangent than relevant to the OP.
Newton assumed (as did everybody) that speeds are absolute, space and time are two separate things, and mass is constant for any object. That works for almost anything we need to do on earth because relative speeds don’t cause much error. However, Einstein found that time references will shift for people who are moving at different velocities, so that their entire world moves at a pace that is different than others’ (the twin paradox: a twin on earth ages faster than a twin in a spaceship because their worlds don’t share the same time reference). Mass also grows as an object approaches c (the speed of light).
I am saying that countless trials of Newtonian physics seemed reasonable to everyone, and yet that isn’t the way the universe truly works. There is something much less intuitive and unseen that has an effect on everything, yet we can’t see it until we move to much bigger experiments. The Global Positioning System depends on relativity, for example, and that understanding is needed unless we just apply some correction factor to what we see and call it a day.
I’m aware of all that; I just didn’t see it as breaking out of circular reasoning. But alright, I suppose you could say so. Newton used inductive reasoning to establish his views, Einstein broke out of those views, inductive reasoning has a circular aspect to it. I guess you could say something like that, but it’s not as though Einstein broke out of the “great circle” of inductive reasoning. He just happened to have the benefit of various observations (Michelson-Morley experiment, say) which were unavailable to Newton. But he was still very much working within the paradigm of “If patterns have held for a long time, they are probably laws and not mere coincidences”.
Sure, you can test an output sequence for “randomness” in a computational complexity sense, or in a Kolmogorov complexity/incompressibility sense, or things like that. [Well, kinda… I have things to say here too, but it might become too much of a bog, so I’ll let them slide for now]. But, how do you apply that to the OP?
I mean, the OP’s question is essentially this: After how many straight heads do you move from “The coin is more likely to be fair than to be rigged for all heads” to “The coin is more likely to be rigged for all heads than to be fair”? And that’s a very tough question. It can’t be answered without first making some very strong assumptions about the probability distribution of possible coin behaviors, assumptions which are hard to find noncircular justification for.
I mean, you could conceivably make the math work out. Tightening up your notion of randomness as unpredictability into, say, randomness as resistance to simple description (i.e., Kolmogorov complexity), we could fix a particular computational model M, and say behaviors which are expressible by short programs in the language of M are more probable than behaviors which are only expressible by long programs in the language of M. [This gives us “coin rigged to always come up heads” as a priori more probable than “coin rigged to come up heads the first 8000 times, then tails the next 8000 times, then heads on every prime-numbered instance after that…”, as we’d intuitively want]. Punching it all up a bit, you could get to the point where, yes, after so-and-so many heads come up, it is indeed more probable that the coin is heads-rigged than that it’s fair. This is the sort of model which would validate inductive probabilistic inference. But the question still remains, what justifies that assumption we made, that easy-to-describe behavior is more likely than difficult-to-describe behavior? And various questions around that remain too.
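For instance, here’s a toy version of that assumption (every number in it is an illustrative choice, not anything forced on us): give the short-program behavior “always heads” a small but nonzero prior, and there is then a definite number of straight heads after which it overtakes “fair”.

```python
from fractions import Fraction

prior_fair = Fraction(999, 1000)        # illustrative prior on "fair"
prior_rigged_heads = Fraction(1, 1000)  # small prior on "always heads"

def odds_rigged_over_fair(n_heads):
    # The rig predicts each head with probability 1, the fair coin with
    # probability 1/2, so each observed head doubles the odds for the rig.
    return (prior_rigged_heads / prior_fair) * Fraction(2) ** n_heads

n = 0
while odds_rigged_over_fair(n) <= 1:
    n += 1
print(n)   # with these priors, 10 straight heads tip the balance
```

Which shows both halves of the point: once you grant a prior like this, inductive inference goes through mechanically; but nothing in the data itself tells you the priors had to be 999-to-1 rather than anything else.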
Like I said, the crux of the OP is the question “When are we justified in concluding that such-and-such a pattern we’ve observed holding X many times will continue to hold?”. Now, you can make up this or that rule for it, and hopefully do so coherently [you’ll need to do something better than “Anything which works out 8000 times without fail is, with 94% probability, a valid law”; your rule must at least give us some guidance as to which patterns we can legitimately extrapolate and which ones are simply spurious]. But, implicit in asking the question is asking of any response “Ok, well, why should I accept that particular rule? What privileges that particular approach to inductive inference? If I try to put forth an argument using it, and my friend says ‘No, no, you don’t have enough experiments yet, not nearly enough; you think you have enough, but you are so wrong. Besides, you shouldn’t be extrapolating that pattern, but rather this slightly different one, and furthermore …’, what can I say to him?” And that’s a tough nut to noncircularly crack.
I guess there’s one aspect of the discussion which was mentioned a bit in the OP, and which Sophistry and Illusion rightly noted as Hume’s view, which I’ve somewhat ignored. Which is the view that causality is simply the same thing as always (or almost always) holding correlation; that there’s no difference between the two.
The reason I’ve not spent much time discussing that point in particular is because it segues right into, without resolving, the issue I find more interesting, the slightly different question of “Ok, well, if they’re no different, then how can we tell which patterns to predict to continue into the future and which to dismiss as just the results of coincidence?” That’s really where I think the meat of the discussion lies, and so that’s what I’ve spent my time talking about (or, well, around).
IMO the main flaw in the argument that causation cannot be discerned from infinite coincidence is that, if it is true, no argument can be asserted at all. The words you are reading in this post, for example, are not determinative of the question either way, because the words themselves cannot be shown to be the result of a proven cause (i.e. my logical thinking). Therefore there is no way to prove whether or not there is a difference between causation and infinite coincidence, since the concept of proof relies exclusively on cause and effect (e.g. all those logic syllogisms look to be logically consistent and therefore always true, but how do we know that by anything other than “well, they’ve always been true up to now”).
Thus, if you take the extreme view that there is no difference–i.e. that the only way to prove cause & effect is to do an infinite number of physical trials–you do not have the tools to prove that your own view or anything else is true; the argument is therefore a non-starter, so why bother asking the question?
The question points out the flaw of thinking that physical trial-and-error itself is the only basis for inferring cause. To me, the development of a cause-and-effect linkage is key; for example, if I develop a cause-and-effect mechanism that fills in the blanks between “switch on” and “light on” (e.g. “switch on” -> “electricity applied to wire” -> “electricity applied to light” -> “light turns on”), I verify each step in this chain every time I attempt the “turn light on” experiment. Thus, my X trials are really 4X in size.
An enterprising thinker could then make the case that this cause->effect chain itself could be stretched out to infinity (e.g. if space is infinitely divisible, the cause->effect relationship must move through an infinite number of divisions before reaching the conclusion), and thus a single trial would be enough to verify an infinite number of trials, barring us from chalking it all up to coincidence.
Of course, there is a problem with this argument: We still need to verify the cause->effect mechanism we developed to explain the behavior, and since that involves an infinite number of steps, it can never be done. In effect we’ve moved the problem back a level, but now it is a question of verifying mental constructions (i.e. notions of cause and effect) rather than physical constructions (switch and light). The question then is one of metaphysics and epistemology rather than physics, and we know there are ways of dealing with infinite processes that are wholly abstractions (e.g. the answers to Zeno’s paradoxes).
I’m not saying it’s simple, but I believe it’s feasible for a person to develop criteria by which they are convinced of a cause->effect relationship, because such relationships are wholly mental models, and so are not subject to the completely finite nature of known physical phenomena. If you insist that a physical solution alone is required, you are putting such constraints on the problem that it cannot be argued one way or another, since these same constraints must also apply to our arguments themselves.
There’s nothing particularly special about just having an infinite number of trials… an infinite number of trials can leave as much room for uncertainty/as low a probability of causality as finitely many trials can. The word “infinite” is a red herring.