In order to appreciate Bell’s theorem, we first need to have a look at the concept of correlation in good old classical physics. So consider the following example: there are two shoeboxes, a red ball, and a green ball. If we put one ball in each box, with equal probability for either arrangement, and then distribute the boxes (say, you keep one here, I take one with me to Mars), we share a classically correlated state. The correlation simply consists in the fact that as soon as either of us opens their box, they will immediately know what colour ball the other has.
The question that Bell’s theorem now assesses is whether all correlations, and particularly those of quantum theory, can be explained in terms of such shoeboxes. The coloured balls are, in this example, the local hidden variables: they are local because whatever I do to my shoebox will not affect yours; they are hidden because until we open the box (perform a measurement), we don’t know their values (but they always possess a definite value). The terminology is somewhat unfortunate: anything we ever see is the value of such a ‘hidden variable’; their hiddenness only applies when, in a sense, nobody’s looking.
So, to recap, the question we have before us is: can all correlations be explained with parameters that 1. do not influence one another instantaneously, and 2. always have a well-defined value?
Bell’s theorem answers this question in the negative. The reason it is held in such high regard is that at first sight, this doesn’t seem to be a question that permits of a definite answer at all, and additionally, it’s a question about the foundations of our world—it’s been characterized as ‘a piece of metaphysics decided in the laboratory’.
Now, let’s have a look at how, exactly, this feat is performed. The basic strategy is to formulate a combination of correlations, that is, of shoe-box experiment outcomes, that, if it is to be explained by parameters of the type discussed above, is limited in overall value. How does one do this?
Well, once more, there are two shoeboxes; let’s call them A and B. However, this time, we can check whatever is in the shoebox with regard to two different properties, say, colour and weight. We are, however, only allowed to check one or the other of these properties, not both at once (yes, this has, in quantum mechanics, to do with the uncertainty principle).
Now, let’s distribute a great many shoeboxes between A and B. An experiment will consist in both A and B checking one of their shoeboxes with respect to one of the properties of the balls inside—so A might check colour, B weight, or both might check colour, and so on. There are, thus, four possible checks that could be performed. Indicating a colour check with an index ‘c’, and a weight check with an index ‘w’, these are simply: a[sub]c[/sub]b[sub]c[/sub], a[sub]c[/sub]b[sub]w[/sub], a[sub]w[/sub]b[sub]c[/sub], and a[sub]w[/sub]b[sub]w[/sub].
In order to ease our notation somewhat, we will denote outcomes of the individual experiments by the values +1 and -1, for instance: green = +1, red = -1, heavy = +1, light = -1. A joint outcome is simply the product of both outcomes, so if the experiment a[sub]c[/sub]b[sub]w[/sub] (that is, A measures colour and B measures weight) is performed, and A obtains green = +1, while B obtains light = -1, we will note down a -1 for the total experiment. Now consider the quantity
C = a[sub]c[/sub]b[sub]c[/sub] + a[sub]c[/sub]b[sub]w[/sub] + a[sub]w[/sub]b[sub]c[/sub] - a[sub]w[/sub]b[sub]w[/sub].
It is simple (and I really mean simple, not mathematician-simple meaning that there exists a finite number of operations leading to the desired conclusion) to show that C ≤ 2. Consider that
C = a[sub]c[/sub](b[sub]c[/sub] + b[sub]w[/sub]) + a[sub]w[/sub](b[sub]c[/sub] - b[sub]w[/sub]).
This is just a simple reordering. But then, you can just try all possible combinations: whenever (b[sub]c[/sub] + b[sub]w[/sub]) = ±2 (i.e. b[sub]c[/sub] = b[sub]w[/sub]), (b[sub]c[/sub] - b[sub]w[/sub]) = 0; and vice versa, whenever (b[sub]c[/sub] - b[sub]w[/sub]) = ±2 (i.e. b[sub]c[/sub] = -b[sub]w[/sub]), (b[sub]c[/sub] + b[sub]w[/sub]) = 0. Either way, only one of the two brackets survives, and it is multiplied by a[sub]c[/sub] or a[sub]w[/sub], each of which is ±1; so C = ±2, and in particular C never exceeds 2. This will remain true if we perform our experiment many times, and average over the results; in any single experiment, of course, we only have access to one of the terms.
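If you would rather let a computer do the trying, the 'try all possible combinations' step can be sketched in a few lines of Python. The function name `chsh` is just my label for the quantity C; nothing else is assumed beyond every property having a definite value of +1 or -1.

```python
from itertools import product

def chsh(a_c, a_w, b_c, b_w):
    """C = a_c*b_c + a_c*b_w + a_w*b_c - a_w*b_w for one set of definite values."""
    return a_c * b_c + a_c * b_w + a_w * b_c - a_w * b_w

# All 2**4 = 16 ways of giving each of the four properties a definite value.
values = [chsh(*assignment) for assignment in product((+1, -1), repeat=4)]

print(sorted(set(values)))  # the distinct values C can take: [-2, 2]
print(max(values))          # the largest of them: 2
```

Since every single assignment gives C = +2 or C = -2, any average over many assignments has to lie between -2 and +2 as well.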
Thus, whenever we perform our experiments, we will find for C a value that is no greater than 2. But note that we have made certain assumptions here: most notably, that A’s choice of measurement can’t affect B’s outcome (and vice versa); for if that were not the case, then whenever A measures c, things at B’s side could sort themselves such that both b[sub]c[/sub] and b[sub]w[/sub] turn up +1, and whenever A measures w, they could conspire such that b[sub]c[/sub] = +1 and b[sub]w[/sub] = -1. This is the assumption of locality.
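To see how much damage such a conspiracy would do, here is a toy sketch of exactly the scenario just described. The function names are mine, and the additional assumption that A always obtains +1 is made purely to keep the example short.

```python
def a_outcome(a_setting):
    """A's result; assumed to be +1 for both checks, purely for illustration."""
    return +1

def b_outcome(a_setting, b_setting):
    """B's result, which is allowed to depend on A's choice (this breaks locality)."""
    if a_setting == "c":
        return +1                          # b_c = b_w = +1 when A checks colour
    return +1 if b_setting == "c" else -1  # b_c = +1, b_w = -1 when A checks weight

def conspiracy_chsh():
    """C = a_c*b_c + a_c*b_w + a_w*b_c - a_w*b_w for the conspiratorial boxes."""
    terms = [("c", "c", +1), ("c", "w", +1), ("w", "c", +1), ("w", "w", -1)]
    return sum(sign * a_outcome(a) * b_outcome(a, b) for a, b, sign in terms)

print(conspiracy_chsh())  # 4
```

So without locality, nothing stops C from reaching 4, the largest value four terms of size 1 can add up to.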
We have furthermore assumed that there always is a fixed value for any given property, i.e. that we could, in principle, obtain all the values for a[sub]c[/sub], a[sub]w[/sub], b[sub]c[/sub] and b[sub]w[/sub]; this is necessary in order to even speak about the quantity C intelligibly for a single experiment (in which we only ascertain one pair of these values).
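The interplay of the two assumptions can be played through in a small simulation: every pair of boxes carries all four values, but each run only reveals one product, and C is pieced together from the four averages afterwards. This is a rough sketch, not a model of any real experiment; the two ways of filling the boxes (`random_hidden` and `aligned_hidden`) and all other names are made up for illustration.

```python
import random

SETTINGS = [("c", "c"), ("c", "w"), ("w", "c"), ("w", "w")]

def estimate_chsh(draw_hidden, runs=100_000, seed=0):
    """Estimate C from runs in which only one pair of properties is checked."""
    rng = random.Random(seed)
    sums = {s: 0 for s in SETTINGS}
    counts = {s: 0 for s in SETTINGS}
    for _ in range(runs):
        a, b = draw_hidden(rng)           # definite values for all four properties
        a_s, b_s = rng.choice(SETTINGS)   # but only one check per side is allowed
        sums[(a_s, b_s)] += a[a_s] * b[b_s]
        counts[(a_s, b_s)] += 1
    avg = {s: sums[s] / counts[s] for s in SETTINGS}
    return avg[("c", "c")] + avg[("c", "w")] + avg[("w", "c")] - avg[("w", "w")]

def random_hidden(rng):
    """Boxes filled completely at random: no correlation at all."""
    a = {"c": rng.choice((+1, -1)), "w": rng.choice((+1, -1))}
    b = {"c": rng.choice((+1, -1)), "w": rng.choice((+1, -1))}
    return a, b

def aligned_hidden(rng):
    """All four properties share one random sign: this reaches the bound."""
    v = rng.choice((+1, -1))
    return {"c": v, "w": v}, {"c": v, "w": v}

print(estimate_chsh(random_hidden))   # close to 0, up to statistical noise
print(estimate_chsh(aligned_hidden))  # 2.0
```

The second filling hits C = 2 exactly, because every product is +1 in every run; any other local filling can only do worse, apart from small fluctuations that come from estimating averages with a finite number of runs.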
And now, for the punchline: it is possible to produce a quantum mechanical setup of such ‘boxes’ such that the classical bound on the correlations is violated. The boxes are, commonly, something like two electrons in a special (‘entangled’) state, and it’s not their colour and weight, but rather, their spin along a certain axis (which is always either +1 or -1) that is checked. Both A and B have two axes along which to check the spin, but otherwise, they perform just the same kind of protocol as above: take one of their electrons (‘shoeboxes’), measure the spin along either axis, note down the value, and continue; later, then, A and B meet, and multiply their values for corresponding pairs.
Now, for Bell’s theorem: if the above assumptions are valid, no system could produce a value of C > 2; however, in quantum mechanics, it is possible to obtain a greater value (equal to C = 2√2, in fact; why this is not the theoretically possible maximum value of C = 4 is a very deep question on which much research is being performed). Hence, one of those assumptions—locality or value definiteness—must be wrong. And there you have it.
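For those who want to see the number 2√2 come out, here is a minimal sketch. It relies on one ingredient that I have not derived in this post: for two spins in the entangled 'singlet' state, quantum mechanics predicts that the average product of the outcomes is -cos(θ), where θ is the angle between A's and B's measurement axes. The particular angles below are one choice (my own, for illustration) that reaches the maximum.

```python
import math

def singlet_correlation(angle_a, angle_b):
    """Average product of outcomes for a spin singlet (standard QM prediction)."""
    return -math.cos(angle_a - angle_b)

def quantum_chsh(a_c, a_w, b_c, b_w):
    """C built from the four averages; the arguments are axis angles in radians."""
    return (singlet_correlation(a_c, b_c) + singlet_correlation(a_c, b_w)
            + singlet_correlation(a_w, b_c) - singlet_correlation(a_w, b_w))

# Axis angles standing in for the 'colour' and 'weight' checks (illustrative choice).
A_C, A_W = 0.0, -math.pi / 2
B_C, B_W = 3 * math.pi / 4, -3 * math.pi / 4

C = quantum_chsh(A_C, A_W, B_C, B_W)
print(C, 2 * math.sqrt(2))  # both approximately 2.828
```

Each of the first three averages comes out as +√2/2 and the last as -√2/2, so C = 4 · √2/2 = 2√2, comfortably above the bound of 2.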
Just parenthetically, I would like to note that there are other assumptions you can make in order to derive similar results. For instance, non-contextuality (the outcome of one measurement is independent of what other measurements are performed simultaneously) together with value definiteness yields the Kochen-Specker theorem, while ‘macroscopic realism’ (the assumption that a given object is always in one of the states available to it) together with measurement nondisturbance yields the Leggett-Garg inequalities.