Just to clarify, I have no fondness for hidden variables, as such; quite the opposite, I am happiest to discuss only the ultimately predicted observations as the fundamental entities. But I don’t believe an epistemological account of wavefunction collapse requires the wavefunction to be taken as merely representing uncertainty about underlying hidden variables with definite values [just as I don’t believe an epistemological account of probability distribution collapse can only be carried out in the specific context of taking the probability distribution to represent uncertainty about underlying hidden variables with definite values]. While I’m at it, I think most alternative accounts of wavefunction collapse are bogged down in a morass of confusion [e.g., it certainly doesn’t matter to the physics whether a photon bounces off and enters your eye or not; that’s a confused reading of the “observation” terminology].
Just to clarify something else, I, er, have no fondness for hidden variables, as such, and, quite the opposite, etc., etc., am not claiming that there actually are underlying hidden variables with definite values, etc., etc. But I think much popular discussion about superpositions is glib and doesn’t actually emphasize any non-classical phenomena.
Anyway, all the more reason to stress the observational content of the manner in which quantum mechanics departs from the classical.
Well, let’s walk through an illustration of Bell’s theorem, and then decide what the consequences are.
We can set up a situation where there are two machines, located at different points in spacetime, outside of each other’s lightcones (basically, you can think of the setup as two machines at different locations in space, but with everything happening at the same time), with the following property: each machine can be fed an angle as input, and will then produce one bit of output (let’s call this “red” or “blue”). The interesting thing, observed empirically over many runs of this experiment, is that, if both machines are given the same angle as input, they always produce the same output. More generally, again as observed empirically over many runs of this experiment, out of all the situations where the difference between the two machines’ input angles is theta, the proportion of times where both machines output “red” is about cos^2(theta)/2, the proportion where both output “blue” is also about cos^2(theta)/2, the proportion where Machine 1 outputs “red” and Machine 2 outputs “blue” is about sin^2(theta)/2, and the proportion where Machine 1 outputs “blue” and Machine 2 outputs “red” is about sin^2(theta)/2. That is, about cos^2(theta) of the time, the machines produce the same output.
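For concreteness, here is a minimal Python sketch that samples runs of this experiment using exactly the joint statistics just described, and checks that the fraction of matching outputs comes out near cos^2(theta). The function names are just illustrative, and note that the code generates the joint statistics centrally; it reproduces the observed correlations, not any mechanism by which two separated machines could produce them.

```python
import math
import random

def run_pair(theta):
    """Sample one run of the two-machine experiment, given the difference
    theta (in radians) between the two input angles. Returns a pair of
    outputs, each "red" or "blue", drawn with the stated joint probabilities:
    cos^2(theta)/2 for each matching pair, sin^2(theta)/2 for each mismatch."""
    c, s = math.cos(theta) ** 2 / 2, math.sin(theta) ** 2 / 2
    outcomes = [("red", "red"), ("blue", "blue"), ("red", "blue"), ("blue", "red")]
    return random.choices(outcomes, [c, c, s, s])[0]

def agreement_rate(theta, runs=100_000):
    """Estimate the proportion of runs on which both machines agree."""
    return sum(a == b for a, b in (run_pair(theta) for _ in range(runs))) / runs

theta = math.radians(120)
print(agreement_rate(theta))    # roughly 0.25
print(math.cos(theta) ** 2)     # cos^2(120 degrees) = 0.25
```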
In particular, let us simplify things even further, and only concern ourselves with three angles, each separated by 120 degrees. Then the setup is that each machine has three buttons (let’s call them A, B, and C), exactly one of which is pressed, after which a single bit of output is produced; if the same button is pressed on both machines, then they always produce the same output. However, if different buttons are pressed, then only about a quarter of the time do they produce the same output (this is just the previous statistics with the angle difference restricted to 0 or 120 degrees, since cos^2(120 degrees) = 1/4).
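And a similar sketch of the restricted three-button version (again only reproducing the observed statistics; the outputs are generated centrally, which is exactly what spacelike-separated machines could not literally do): buttons are chosen independently and uniformly, outputs agree with probability 1 on matching buttons and 1/4 otherwise, and the tallies come out as described.

```python
import random

BUTTONS = ["A", "B", "C"]

def run_once():
    """One run: each machine gets a uniformly random button; outputs agree
    with probability 1 if the buttons match and probability 1/4 otherwise."""
    x, y = random.choice(BUTTONS), random.choice(BUTTONS)
    agree_prob = 1.0 if x == y else 0.25
    first = random.choice(["red", "blue"])
    second = first if random.random() < agree_prob else ("blue" if first == "red" else "red")
    return (x, first), (y, second)

same = same_agree = diff = diff_agree = 0
for _ in range(100_000):
    (x, a), (y, b) = run_once()
    if x == y:
        same += 1
        same_agree += (a == b)
    else:
        diff += 1
        diff_agree += (a == b)

print(same_agree / same)   # about 1.0
print(diff_agree / diff)   # about 0.25
```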
So what? Well, now let’s think about modelling this with the particular mathematical construct of a probability distribution (on the space of all 36 possible complete input-and-output configurations; i.e., an assignment of a number >= 0 to each of these, with the total summing to 1). Which distribution? Well, let’s discuss various properties such a distribution could have. It could be the case that each of the nine particular input-configurations is equiprobable (i.e., the machines’ inputs are chosen independently at uniform random); we’ll say that a distribution with this property “doesn’t predict the inputs”. It could be the case that the events at Machine 1 are independent of those at Machine 2, in the sense that the probability of each complete input-and-output configuration is simply the product of the probabilities of the corresponding Machine 1 and Machine 2 configurations; we’ll say that a distribution with this property “fixes the common causes”. Finally, it could be the case that the distribution “satisfies the empirical condition”, in the sense that, whenever the two inputs match, the probability of the outputs differing is zero, and, whenever the two inputs differ, the probability of the outputs matching is a quarter of the probability of the corresponding input-configuration.
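To pin these definitions down, here is one possible encoding in Python, assuming a distribution is represented as a dict assigning a probability to each of the 36 complete configurations; the helper names are mine, not anything standard.

```python
from itertools import product

INPUTS, OUTPUTS = ["A", "B", "C"], ["red", "blue"]
MACHINE_CONFIGS = list(product(INPUTS, OUTPUTS))            # (input, output) for one machine
CONFIGS = list(product(MACHINE_CONFIGS, MACHINE_CONFIGS))   # the 36 complete configurations

def close(x, y, tol=1e-9):
    return abs(x - y) < tol

def input_prob(P, x, y):
    """Total probability that Machine 1 has input x and Machine 2 has input y."""
    return sum(P[c] for c in CONFIGS if (c[0][0], c[1][0]) == (x, y))

def doesnt_predict_inputs(P):
    """Each of the nine input-configurations has probability 1/9."""
    return all(close(input_prob(P, x, y), 1 / 9) for x in INPUTS for y in INPUTS)

def fixes_common_causes(P):
    """Each complete configuration's probability is the product of the two
    machines' marginal probabilities for their halves of it."""
    m1 = {a: sum(P[(a, b)] for b in MACHINE_CONFIGS) for a in MACHINE_CONFIGS}
    m2 = {b: sum(P[(a, b)] for a in MACHINE_CONFIGS) for b in MACHINE_CONFIGS}
    return all(close(P[(a, b)], m1[a] * m2[b]) for a, b in CONFIGS)

def satisfies_empirical_condition(P):
    """Matching inputs never give differing outputs; mismatched inputs give
    matching outputs with a quarter of that input-configuration's probability."""
    for x, y in product(INPUTS, INPUTS):
        agree = sum(P[c] for c in CONFIGS
                    if (c[0][0], c[1][0]) == (x, y) and c[0][1] == c[1][1])
        target = input_prob(P, x, y) if x == y else input_prob(P, x, y) / 4
        if not close(agree, target):
            return False
    return True
```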
Again, so what? Well, the payoff is that we can now prove an interesting fact. We’ll say that a probability distribution is “local” if it is a weighted average of distributions which fix the common causes and don’t predict the inputs (in mathematical jargon, a “mixture”, or convex combination, of such distributions). The interesting fact is that no local distribution satisfies the empirical condition.
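Before the proof, a sketch of what one concrete local distribution looks like, under the same kind of encoding as above: a 50/50 mixture of two components in which both machines deterministically output a fixed color per input (the particular assignments here are arbitrary). Its agreement rate on mismatched inputs comes out to 1/3 rather than the 1/4 the empirical condition demands, and the fact just stated says that no choice of weights and components can satisfy the empirical condition.

```python
from itertools import product

INPUTS, COLORS = ["A", "B", "C"], ["red", "blue"]
PAIRS = list(product(product(INPUTS, COLORS), product(INPUTS, COLORS)))

def common_cause_component(assignment):
    """A distribution that fixes the common causes and doesn't predict the
    inputs: inputs uniform over the nine pairs, and each machine
    deterministically outputs assignment[its input]."""
    return {((x, a), (y, b)): (1 / 9) * (a == assignment[x]) * (b == assignment[y])
            for (x, a), (y, b) in PAIRS}

def mixture(weighted_components):
    """A 'local' distribution: a weighted average of such components."""
    return {c: sum(w * P[c] for w, P in weighted_components) for c in PAIRS}

local = mixture([(0.5, common_cause_component({"A": "red", "B": "red", "C": "blue"})),
                 (0.5, common_cause_component({"A": "blue", "B": "red", "C": "blue"}))])

# Agreement rate on mismatched inputs (the empirical condition would demand 1/4):
mismatched = [c for c in PAIRS if c[0][0] != c[1][0]]
agree = sum(local[c] for c in mismatched if c[0][1] == c[1][1]) / sum(local[c] for c in mismatched)
print(agree)   # 0.333..., not 0.25
```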
Proof: Suppose, for contradiction, that some local distribution Q satisfies the empirical condition, and write Q as a weighted average of distributions which fix the common causes and don’t predict the inputs. By the empirical condition, Q assigns probability 0 to any complete configuration in which the inputs match but the outputs differ; and since each such probability is a weighted average of the corresponding non-negative probabilities under the component distributions, every component given positive weight must also assign these configurations probability 0. So let P be any such component. First, we will show that, under P, each input (A, B, or C) must make either an output of blue almost certain or an output of red almost certain (in the sense of the conditional probability being 1). This is because 0 = P(Machine 1 outputs red and Machine 2 outputs blue and Machine 1 has input A and Machine 2 has input A) = P(Machine 1 outputs red and has input A) * P(Machine 2 outputs blue and has input A) [this last step following from the independence defining fixing the common causes]. Thus, either P(Machine 1 outputs red and has input A) or P(Machine 2 outputs blue and has input A) is equal to 0. Symmetrically, either P(Machine 1 outputs blue and has input A) or P(Machine 2 outputs red and has input A) is equal to 0. But P(Machine 1 outputs red and has input A) + P(Machine 1 outputs blue and has input A) = P(Machine 1 has input A) = 1/3, and similarly for Machine 2; it follows that either P(Machine 1 outputs red and has input A) = P(Machine 2 outputs red and has input A) = 0, or the same with blue substituted in for red. And symmetrically for inputs B and C. Accordingly, under P, each input has a corresponding color which it makes almost certain.
Next, we will show that this means P(the two machines give the same output) is at least 5/9. For each of A, B, and C there is one color which it makes almost certain as output; this color has to be the same for at least two of the three inputs. Accordingly, P(the two machines give the same output) = P(the two machines give the same output and have the same input) + P(the two machines give the same output and have different inputs) = P(the two machines have the same input) + P(the two machines have differing inputs with matching assigned colors) [the first term because matching inputs send both machines to the same almost-certain color] = 1/3 + P(the two machines have differing inputs with matching assigned colors) >= 1/3 + 2/3 * (2 - 1)/3 = 5/9 [for the last inequality: with probability at least 2/3, Machine 1’s input is one of the two inputs sharing an assigned color, and, conditional on that, with probability at least (2 - 1)/3, Machine 2’s input is the other of those two].
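As a quick sanity check of this step before finishing the proof, here is a short enumeration (same illustrative style as the sketches above) of all eight ways of assigning an almost-certain color to each of the three inputs, shared by both machines as the previous step forces, confirming that the resulting agreement probability is never below 5/9.

```python
from itertools import product

INPUTS, COLORS = ["A", "B", "C"], ["red", "blue"]

def agreement(assignment):
    """Probability that the machines agree, when inputs are uniform over the
    nine pairs and both machines output assignment[input] with certainty."""
    return sum(assignment[x] == assignment[y] for x, y in product(INPUTS, INPUTS)) / 9

# All eight ways of assigning an almost-certain color to each of A, B, and C.
assignments = [dict(zip(INPUTS, colors)) for colors in product(COLORS, repeat=3)]
print(min(agreement(a) for a in assignments))   # 5/9 ~= 0.5555
```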
Finally, since the above held for every positively-weighted component of Q, we can conclude that Q(the two machines give the same output) is a weighted average of values which are at least as large as 5/9, and thus is at least as large as 5/9 itself. On the other hand, Q(both machines have the same input) = 1/3, again by taking the weighted average of values which are all equal to 1/3 (as each component doesn’t predict the inputs). But the empirical condition tells us that Q(the two machines give the same output) = Q(both machines have the same input) + 1/4 * (1 - Q(both machines have the same input)) = 1/3 + 1/4 * 2/3 = 1/2, which is less than 5/9. This contradiction completes the proof.
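To close the loop, here is a sketch of the empirically observed distribution itself (the quantum statistics of the three-button setup, with uniformly chosen inputs), confirming that it is a perfectly good probability distribution which does satisfy the empirical condition, with overall agreement probability 1/2, strictly below the 5/9 that, by the theorem, every local distribution must have.

```python
from itertools import product

INPUTS, COLORS = ["A", "B", "C"], ["red", "blue"]
PAIRS = list(product(product(INPUTS, COLORS), product(INPUTS, COLORS)))

def empirical(config):
    """The observed statistics: inputs uniform over the nine pairs; on matching
    inputs the outputs always agree (each color with probability 1/2); on
    mismatched inputs each agreeing output pair has probability 1/8 and each
    disagreeing output pair probability 3/8."""
    (x, a), (y, b) = config
    if x == y:
        out = 1 / 2 if a == b else 0.0
    else:
        out = 1 / 8 if a == b else 3 / 8
    return (1 / 9) * out

Q = {c: empirical(c) for c in PAIRS}
agree = sum(p for ((x, a), (y, b)), p in Q.items() if a == b)
agree_diff = sum(p for ((x, a), (y, b)), p in Q.items() if a == b and x != y)
print(sum(Q.values()))        # about 1.0: a genuine probability distribution
print(agree_diff / (6 / 9))   # about 0.25: the empirical condition on mismatched inputs
print(agree)                  # 0.5, strictly less than 5/9 ~= 0.5555
```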
Discussion of its implications can now follow.