Basic Born rule confusion

Suppose you have N independent particles prepared in state a|0> + b|1>, where N is a large integer and a and b are positive real numbers with |a|[sup]2[/sup] + |b|[sup]2[/sup] = 1. (Yeah, I know, a and b can be negative or complex, but for what I want to do now, let’s suppose they’re not).

The state of the system is described by the tensor product (a|0> + b|1>)[sup]N[/sup]. Expanding this out, we find a weight attached to each bitstring of length N; in particular, by the same reasoning familiar from classical probability, we should find that almost all of the weight is attached to bitstrings whose ratio of 0s to 1s is very close to a : b.

Which would seem to suggest that, if you have a large number of independent particles in state a|0> + b|1>, and then measure their states, you will almost certainly find a |0> to |1> ratio very close to a : b. But doesn’t the Born rule say it should be very close to a[sup]2[/sup] : b[sup]2[/sup] instead?

What am I understanding incorrectly here? (I’m sure the resolution of this apparent conflict is a very basic issue familiar to everyone who actually works with this stuff. Yet I can’t seem to readily find answers just by reading around.)

Oh, nevermind, I think I see. Even though, in some sense, almost all of the weight is attached to a : b strings, it’s also true that, in the same sense, almost all of the SQUARED weight is attached to a[sup]2[/sup] : b[sup]2[/sup] strings. Tricky, that. (Is that right? I think so but I may just be confusing myself further…)
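[Side note: this concentration claim is easy to check numerically. Here is a minimal Python sketch, with hypothetical values a = sqrt(0.2), b = sqrt(0.8) and N = 1000, showing that the summed amplitude over strings with k ones peaks near the a : b ratio while the summed squared amplitude peaks near a[sup]2[/sup] : b[sup]2[/sup].]

```python
from math import comb, sqrt

# Hypothetical illustration values: a|0> + b|1> with |a|^2 + |b|^2 = 1.
a, b = sqrt(0.2), sqrt(0.8)
N = 1000

# For each count k of |1>s there are C(N,k) strings, each with
# amplitude a^(N-k) * b^k.
amp_mass  = [comb(N, k) * a**(N - k) * b**k      for k in range(N + 1)]
prob_mass = [comb(N, k) * (a**(N - k) * b**k)**2 for k in range(N + 1)]

k_amp  = max(range(N + 1), key=lambda k: amp_mass[k])
k_prob = max(range(N + 1), key=lambda k: prob_mass[k])

print(k_amp / N)   # close to b/(a+b) = 2/3, the "a : b" ratio
print(k_prob / N)  # close to |b|^2 = 0.8, the "a^2 : b^2" ratio
```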

sighs. I think I only ever embarrass myself when I ask physics questions…

Your mistake is here:

The (potentially) complex amplitudes attached to orthogonal basis vectors are closer to sqrt(probabilities) than probabilities. A state a|0>+b|1> has probability |a|[sup]2[/sup], not a, of being measured as |0> (in the {|0>,|1>} basis). Your experiment, viewing each of the N particles in its own fixed basis, is essentially classical; and each particle is a biased coin with probability |a|[sup]2[/sup] of coming up |0>.

To think about it in another way: Consider the subspace of all terms in the expansion with m |0>s and n |1>s (where m + n = N); by the binomial theorem there are C(N,n) of these terms, each with weight a[sup]m[/sup]b[sup]n[/sup]. These terms are mutually orthogonal states, so when you take the norm they don’t interfere, and so the length of the projection onto this subspace is just sqrt(C(N,n)) * a[sup]m[/sup]b[sup]n[/sup].
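[For the skeptical, this projection-length formula can be verified by brute force for a small N. A sketch in Python, with hypothetical values a = 0.6, b = 0.8:]

```python
from itertools import product
from math import comb, sqrt, isclose

a, b = 0.6, 0.8   # positive reals with a^2 + b^2 = 1 (illustration values)
N = 5

# Amplitude of each length-N bitstring in the expanded tensor product.
amps = {bits: a**(N - sum(bits)) * b**sum(bits)
        for bits in product((0, 1), repeat=N)}

for n in range(N + 1):
    # Length of the projection onto the "exactly n ones" subspace:
    # sqrt of the sum of squared amplitudes over those orthogonal terms.
    proj_len = sqrt(sum(amp**2 for bits, amp in amps.items()
                        if sum(bits) == n))
    assert isclose(proj_len, sqrt(comb(N, n)) * a**(N - n) * b**n)

print("projection lengths match sqrt(C(N,n)) * a^m * b^n")
```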

I was aware of that already. I wasn’t proposing that the “weights” were to be interpreted directly as probabilities, but was indeed supposing their squares to be the probabilities. But it seemed to me that because the vast majority of the weight was attached to bitstrings whose ratios of 0s to 1s is very close to a : b, the same would also be true of the vast majority of the squared weight. I think my mistake was in that implication.

Ah, I see. In that context your first reply makes more sense to me.

Actually, I ended up realizing that my main mistake was this: I mistakenly assigned the sum of the amplitudes of every particular string with A many |0>s and B many |1>s to be the amplitude of “There are A many |0>s and B many |1>s”. But, instead, I should have summed the probabilities of the former to get the probability of the latter. I’m not sure if it even makes sense for the latter to have an amplitude.

So, I guess I have two questions now…

One airy-fairy but really vital question for me to understand anything about the bridge between the math (Hilbert spaces, probability theory, Fourier theory, etc.) and the actual real world (electrons, detectors, experiments, etc.): What sorts of things have probability amplitudes?

And one more concrete question: Under what circumstances can one calculate the amplitude of “A or B” (in some sense of “or”) as the sum of the amplitudes of A and of B? [Of course, if A and B can be observationally distinguished, one must sum probabilities instead of amplitudes, and I gather this accounts for the different results in the two-slit experiment with and without a detector, but while this criterion tells me some instances in which I should definitely be adding probabilities, it still leaves me unclear on when I should be adding amplitudes]

At least, I think this was my mistake… It seems to be what saying “Almost all of the amplitude is attached to bitstrings whose ratio is A:B” amounts to doing. But I suppose this is just a different way of saying the same thing I already said about my mistake, that even though the sum of the amplitudes is concentrated in one region, the sum of their squares may be concentrated somewhere else. So maybe I don’t have any new insight into what my mistake was.

Yes, I think this is your mistake. Each different sequence of 1s and 0s represents a different result of a quantum mechanical measurement. The number of 1s and 0s observed is not itself the result of a quantum mechanical measurement, and as you say it makes no sense to talk about its probability amplitude.

I’m a little confused about this actually; it seems like if you work the maths through there’s no difference between the classical and the quantum (as you would expect in this situation, where each state is separable).

It’s probably clear to you from my earlier replies that it’s not clear to me exactly what your question is; so what I write here will probably end up being completely obvious and irrelevant to your problem. But I’ll try anyway. [In this post I’m only trying to answer your concrete question, though I hope when you have an answer to it you will have at least a partial answer to your more theoretical question too.]

Probabilities of orthogonal states add; amplitudes of equal states add. This is a consequence of the more general statement that the probability of measuring a state |psi> to lie in a subspace defined by the projector P is just <psi|P|psi>. The point about adding probabilities when A and B can be observationally distinguished can be thought of as a special case of this rule, by considering the detector as a second quantum system, entangled with the system you’re interested in. The detector is in the state |a> when the system is in the “A” state and |b> when the system is “B”, where |a> and |b> are orthogonal quantum states. (Here I’m describing “measurements” as entanglements rather than projections, which may seem like cheating–at any rate it doesn’t give any nice resolution to the problem of wavefunction collapse, it just postpones the reckoning. If you like you can think of these extra measurement-apparatus states as just saving up all of the information to be measured until the end of the experiment, at which point the actual projective measurement occurs.)

In the two-slit experiment with no which-path detector, the interference pattern arises because the states passing through the two slits are equal, up to a phase depending on the path length; when the path-length phases differ by a multiple of 2pi you get constructive interference. When you add a detector, you can think of it as another two-dimensional Hilbert space adjoined to the system, with a |0>[sub]det[/sub] indicating the detection of the particle along one path and |1>[sub]det[/sub] a detection along the other path. In this case the state for the particle following one path is |psi>[sub]sys[/sub]|0>[sub]det[/sub], while for the other path it’s |psi’>[sub]sys[/sub]|1>[sub]det[/sub] (assuming an ideal detector). Since |0>[sub]det[/sub] and |1>[sub]det[/sub] are orthogonal, these are orthogonal states and there’s no interference; the probabilities add, not the amplitudes.

In your original question, you described the state explicitly as a sum over orthonormal basis elements. The probability of finding the system in a subspace described by some subset of these basis elements is just the sum of the individual probabilities: The projectors you’re interested in are of the form
P_k = sum[sub]{all n with Hamming weight k}[/sub] |n><n|
and when you compute <psi|P_k|psi> you just get the sum of the squared magnitudes of all terms with exactly k |1>s.
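[As a concrete check of this projector rule, here is a small numpy sketch, with hypothetical values a = 0.6, b = 0.8 and N = 4: building (a|0> + b|1>)[sup]N[/sup] explicitly and computing <psi|P_k|psi> recovers the binomial probabilities C(N,k)(|a|[sup]2[/sup])[sup]N-k[/sup](|b|[sup]2[/sup])[sup]k[/sup].]

```python
import numpy as np
from math import comb

a, b = 0.6, 0.8   # illustration values, a^2 + b^2 = 1
N = 4

single = np.array([a, b])
psi = single
for _ in range(N - 1):
    psi = np.kron(psi, single)   # (a|0> + b|1>)^(tensor N), length 2^N

for k in range(N + 1):
    # Diagonal projector P_k onto basis states |n> with Hamming weight k.
    diag = np.array([bin(n).count("1") == k for n in range(2**N)],
                    dtype=float)
    prob = psi.conj() @ (diag * psi)   # <psi|P_k|psi>
    assert np.isclose(prob, comb(N, k) * a**(2 * (N - k)) * b**(2 * k))

print("Born-rule probabilities match the classical binomial distribution")
```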

To see “interference,” where the amplitudes add instead of the probabilities, you can perform a quantum Fourier transform on your states; the QFT on an N-qubit state is the unitary operator defined as
F|n> = 2[sup]-N/2[/sup] sum[sub]m[/sub] exp(2 pi i mn/2[sup]N[/sup]) |m>
(i.e., the Fourier series on the amplitudes; here |n> means the state of N qubits given by the binary representation of n, with m and n running from 0 to 2[sup]N[/sup]-1). If you started with the equally-weighted sum
|psi> = 2[sup]-N/2[/sup] (|0…0> + … + |1…1>)
then the QFT would be
F|psi> = |0…0> ;
the equal amplitudes from all of the 2[sup]N[/sup] states have all “interfered constructively” at the state |0…0> (its coefficient is 2[sup]-N[/sup] + … + 2[sup]-N[/sup] = 1), and destructively at the other states (their coefficients are 2[sup]-N[/sup](1 + exp(2 pi i m/2[sup]N[/sup]) + … + exp(2 pi i m(2[sup]N[/sup]-1)/2[sup]N[/sup])) = 0 for m ≠ 0).
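[This constructive/destructive interference is easy to check numerically. A numpy sketch for N = 3 qubits (dimension D = 2[sup]N[/sup] = 8), building the QFT matrix and applying it to the uniform superposition:]

```python
import numpy as np

N = 3
D = 2**N   # dimension of the N-qubit Hilbert space

# QFT matrix: F[m, n] = exp(2*pi*i*m*n/D) / sqrt(D).
m, n = np.meshgrid(np.arange(D), np.arange(D), indexing="ij")
F = np.exp(2j * np.pi * m * n / D) / np.sqrt(D)

# Equally weighted sum of all 2^N basis states.
psi = np.full(D, 1 / np.sqrt(D))

out = F @ psi
# All the amplitude interferes constructively at |0...0>:
print(np.round(np.abs(out), 10))   # 1 at index 0, ~0 everywhere else
```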

That’s understandable; I wasn’t very clear in explaining my confusions. At any rate, what my question originally was I think I’ve gotten a fine handle on now; the questions I have now are no longer the question I was originally struggling with. The only questions I have now are the questions from post #6. But let me explain what my problem originally was and the resolution I reached so as to put it to rest:

Imagine you have 100[sup]3[/sup] “red” possibilities and one “blue” possibility, but the amplitude of the blue possibility is 100[sup]2[/sup] times as large as the amplitude of each red possibility (with all the amplitudes being positive reals).

Over 99% of the total amplitude is contained in “red”. Of course, amplitude isn’t the same thing as probability, but amplitude tracks probability (in the sense that the latter is simply the square of the former), so you might think that, if almost all of the amplitude is contained in some region, then so is almost all of the probability. I naively thought that. At least, in the limit as “almost all” approaches 100%.

But it isn’t true… In this case, even though over 99% of the total amplitude is contained in “red”, it’s nonetheless the case that over 99% of the total probability is contained in “blue”. (And the percentages can be made arbitrarily closer to 1 by simply replacing the “100” in the first paragraph with higher values). The amplitude can concentrate in one region while the probability concentrates in another, in this sense, in contrast to my naive presumption.
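[Spelling out the red/blue arithmetic in a few lines of Python, with the red amplitudes set to 1 on an arbitrary unnormalized scale:]

```python
n_red = 100**3      # number of "red" possibilities
amp_red = 1.0       # amplitude of each red possibility (arbitrary scale)
amp_blue = 100**2   # the single blue possibility's amplitude

# Amplitude share: red dominates.
total_amp = n_red * amp_red + amp_blue
red_amp_share = n_red * amp_red / total_amp

# Probability (squared-amplitude) share: blue dominates.
total_prob = n_red * amp_red**2 + amp_blue**2
blue_prob_share = amp_blue**2 / total_prob

print(red_amp_share, blue_prob_share)   # both equal 100/101, about 0.9901
```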

Why was I considering such things? Well, I was thinking about the fact that, in probability theory, one can recover the probability of an event’s occurrence in a single trial from the probability distribution of the results of many independent trials; the probability distribution will become more and more tightly concentrated into the region whose frequency of event occurrences matches the single-trial probability.

The mathematics of tensoring probability distributions and of tensoring amplitude distributions are, of course, exactly the same. Which suggested a paradox. Tensoring an amplitude distribution with itself over and over would result in the amplitude becoming concentrated into the region whose frequency of event occurrences matched the single-trial amplitude. So (I found myself puzzling over) it would seem that almost certainly, the observed frequency of event occurrences would match the amplitude, and not the squared amplitude.

But the resolution I’ve now realized is as outlined above: the amplitude of a large tensor power of a|0> + b|1> does indeed become concentrated around the strings whose |0> to |1> ratio is around a : b, but at the same time, the squared amplitude becomes concentrated around those whose |0> to |1> ratio is a[sup]2[/sup] : b[sup]2[/sup]. The amplitude and the squared amplitude don’t have to become concentrated in the same region, even though they track each other in some sense. That was my entire confusion. It doesn’t even have anything to do with quantum mechanics, per se; I was tripping over a mathematical point. The integral of X can be dominated by the contributions from one region while the integral of X[sup]2[/sup] can be dominated by the contributions from another region.

(But if the region in which X concentrates consists of only one (or infinitesimally close to one) possibility, then X[sup]2[/sup] does have to concentrate in the same region (and vice versa), and this caused me further confusion, as I kept reducing from the space describing all length-N strings to the space describing only the number of |0>s and |1>s within such a string, adding up amplitudes to get the “amplitude” of the region in which the number of |0>s and |1>s was whatever. Even after I’d realized what the correct resolution of the paradox would have to be, this kind of mistaken reasoning kept throwing me off of it for a while; my last two posts came when I realized that I was making this mistake as well, and while I was struggling to figure out how it played in with my previous mistake)

So, in short, my original confusion was, embarrassingly enough, actually a mathematical confusion, not a physics confusion. But my questions now actually are questions about the bridge from mathematics to physics…

Thanks for the explanations; the perspective of considering detectors as simply another quantum system entangled with everything else is particularly helpful. With that, it all makes sense; the states the detector distinguishes are orthogonal and thus add probability automatically as you noted, etc.

That having been said, I think maybe the main question underlying my “concrete” question was also an airy-fairy one… Specifically, how, in general, do I tell what the Hilbert space I’m dealing with in a physical problem is? Given a physical situation in the real world, what process do I use to figure out the relevant Hilbert space, without having to rely on somebody else telling me what it is? [Answering this question might go hand-in-hand with answering the other airy-fairy question]

:::shakes head:::

You talkin funny talk. Me no get funny talk. Make head all buzzy. Nap now. Seepy.

And what the hell is “airy-fairy”?

The short answer, I guess, is “Pure quantum states have probability amplitudes.” The problem is that in the real world it seems to be difficult to maintain a coherent quantum state; stray interactions tend to produce couplings that entangle the system with the rest of the universe, which you can describe formally in pretty much the same way as with the “measurement” entanglement I described earlier. Once the system is entangled messily with the environment, its coherent probability amplitudes cease to be relevant in the system description, and only the probabilities are important. (This is, as you probably already know, the phenomenon of “decoherence.”) But I’m not sure I’m answering the question you’re asking.

The relevant Hilbert space depends on what you’re trying to figure out, of course. Theoretical models like the harmonic oscillator have a discrete infinite-dimensional Hilbert space, and models like the hydrogen atom have one of those for the bound states along with a free continuum Hilbert space above that. But usually you’re not interested in most of those states; maybe you’re primarily interested in, say, the ground state and one particular excited state reachable from the ground state by photon absorption, or a pair of low-lying metastable states. For quantum computation, for example, you’d typically choose two of the states of the system to be the |0> and |1> states, with states outside this subspace not describing a legal state of the computer (though the other states might be used temporarily to perform a desired quantum operation). [In principle you could choose any two states as |0> and |1>; in practice maintaining coherence is hard enough that experimenters work hard at choosing these states to make their lives easier.]

Usually all the possible states of a system will be described by a boundary value problem, i.e. a wave equation with a set of boundary conditions; the Hilbert space represents the set of all possible solutions (with the usual rules for vector addition and scalar multiplication), together with a complete metric induced by the inner product.

Usually we assume that the Hilbert space represents position space and is infinite-dimensional, though we also generally assume that the space has countably many dimensions (as any more dimensions cause severe problems).

As Omphaloskeptic says, though, often you’re not interested in all the states, and so you might only be interested in a particular subspace. You may also be interested in a particular observable, and so perform a transform such that that observable becomes the identity operator.

So in your example of |0> and |1> we’re working in a 2-D subspace of the state space which is transformed so that the observable “1 or 0” is the identity operator.

Though I’m a little rusty on all this, so hopefully Omphaloskeptic will confirm this is correct.

I’m not sure I still follow exactly how you arrived at your mistake. My confusion was simply that the probability amplitude of a sequence with m 0s and n 1s is simply a[sup]m[/sup]b[sup]n[/sup], and it’s pretty easy to prove that |a[sup]m[/sup]b[sup]n[/sup]|[sup]2[/sup] = (|a|[sup]2[/sup])[sup]m[/sup](|b|[sup]2[/sup])[sup]n[/sup] for all complex a and b where m and n are integers, which is exactly the same as if we hadn’t brought QM into it in the first place.

Obviously probability amplitudes are usually not real, and even where they are they don’t sum to 1 for a normalized wavefunction (except, for example, for an instantly repeated measurement where you already know the outcome).
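[The identity |a[sup]m[/sup]b[sup]n[/sup]|[sup]2[/sup] = (|a|[sup]2[/sup])[sup]m[/sup](|b|[sup]2[/sup])[sup]n[/sup] is easy to verify numerically for arbitrary complex a and b; a quick check with made-up values:]

```python
from math import isclose

a, b = 0.3 + 0.4j, 0.6 - 0.2j   # arbitrary complex amplitudes
m, n = 5, 7

lhs = abs(a**m * b**n)**2              # squared magnitude of the product
rhs = (abs(a)**2)**m * (abs(b)**2)**n  # product of the individual probabilities

print(isclose(lhs, rhs, rel_tol=1e-9))   # True
```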