Math question: numbers on the infinite line

Let’s say that we can sample uniformly from -∞ to ∞. We first sample two numbers, A and B, from this distribution. We then sample a third number from this distribution and call it C. The question, then, is: what is the probability that C lies between A and B?

There are several ways to approach this, and they give contradictory results. Math experts, can you explain what the correct answer is? While you’re at it, can you help me explain the flaws, if any, in the three different lines of reasoning below?

1. We look at this from the perspective of A and B. Let’s say that A < B; they form an interval [A,B]. The chance that C, sampled from the infinite line, falls between [A,B] is 0. So the answer is 0.

2. We look at this from the perspective of C. The chance that A < C is 1/2, and the chance that B > C is also 1/2. So the chance that A < C < B is 1/4. The chance that B < C < A is also 1/4, so the probability that C lies between A and B is 1/2.

3. If we arrange A, B, and C in increasing order, then there are 6 arrangements that can happen. Of those 6 ways, 2 ways will have C in the middle, so the chance that C lies between A and B is 1/3.

Thanks for the help in advance. By the way, this is not homework. I’m merely thinking of this out of interest.
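(For concreteness, here is a quick simulation I tried. There is no way to sample the whole line directly, so this uses uniform draws from [-N, N] for a large N as a stand-in; that truncation is itself an assumption, not part of the question.)

```python
import random

def middle_fraction(n_trials=100_000, N=1e9, seed=0):
    """Estimate P(C lies strictly between A and B) when A, B, C are
    drawn independently and uniformly from the truncated line [-N, N]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        a = rng.uniform(-N, N)
        b = rng.uniform(-N, N)
        c = rng.uniform(-N, N)
        if min(a, b) < c < max(a, b):
            hits += 1
    return hits / n_trials

print(middle_fraction())  # ≈ 1/3, for any choice of N
```

Under this truncated reading the answer comes out near 1/3 regardless of N, which seems to favor reasoning #3, but of course that may just reflect the choice of truncation.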

This argument doesn’t really work, because the probability regarding C’s relationship with A and the probability regarding C’s relationship with B are not independent, so you can’t just multiply them like that.

As for the main issue…

Make an assumption:

Conclude that the assumption was incorrect. You can’t have a uniform distribution over an infinite interval.

Yup. Here is a somewhat detailed discussion of the problems of taking a random sample from the set of integers.

As **Sofis** has pointed out, #2 is incorrect.

#3 is incorrect as well. The fact that there are 6 arrangements doesn’t mean those 6 arrangements are equally likely.

The correct answer is #1 – the probability is 0.

But how could they be anything but equally likely? A, B and C are independent values chosen from the same distribution. What would give any of them a preference to being higher or lower than the others?

Thanks for the response. Apparently the assumption that we can sample uniformly over the infinite line is incorrect. However, I’m quite puzzled over the reason why #2 is incorrect. I am sampling A, B, and C independently from the same (admittedly nonexistent) distribution. So the relationship between A and C should be independent of the relationship between C and B, shouldn’t it?

There isn’t one correct answer, because there isn’t a uniquely formalized question here; there are multiple different answers available to you, depending on what you want to take “sample uniformly” to mean. Yes, there is a standard jargon reading of that phrase which will lead to saying, for example, that there is no such thing as uniform sampling on the infinite number line, but that doesn’t mean you have to read the phrase the same way.

What you have done here is really quite interesting and intelligent: you’ve teased out a number of properties one might want to include in a concept of uniform sampling in this context, and shown how they contradict each other; you can’t have them all at once. This may lead you to say “Ok, there is no such thing as uniform sampling in the sense I want, as shown by precisely this contradiction”, or it may lead you to consider other, modified notions of what uniform sampling could be, retaining some naively desired properties while jettisoning others. And in different contexts, for different applications, you may find different such notions to be of use to you. I can certainly think of contexts in which the reasonings presented in 1) and 3) are relevant, for example. (Don’t believe me that it’s reasonable to consider different notions of what uniform sampling is? Sure it is; it’s exactly the same as how, e.g., in one uncontroversial sense, it’s perfectly reasonable to say you can’t uniformly sample from the natural numbers, and in another uncontroversial sense, it’s perfectly reasonable to say a uniformly sampled natural number has 50% probability of being even)

So, I encourage you not to simply give up investigating the question, but rather to continue thinking about the various assumptions that go into the different kinds of reasoning, and what the consequences of those assumptions are, how one might formalize the various notions of uniformity involved in the different arguments, comparing and contrasting them, etc.

No, because once you have established one of the relationships, you have extra information. Suppose the distribution is uniform on [0, 1] instead. prob(C < A) = 0.5. Given that C < A, what is prob(C > 0.99)? prob(C < 0.01)? Knowing that C is less than another sample from the same distribution increases the probability that C is in the lower parts of the distribution, which affects the probability of it being greater than or less than another sample from the same distribution (i.e. B). And conversely for C > A.
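The dependence is easy to check directly on the [0, 1] example. This is my own sketch, just to put numbers on it: conditioning on C < A pushes C toward the low end, so “C < A” and “C < B” turn out to be positively correlated.

```python
import random

rng = random.Random(1)
n = 200_000
less_count = 0      # trials with C < A
both = 0            # trials with C < A and also C < B
low_given_less = 0  # trials with C < A and C < 0.01

for _ in range(n):
    a, b, c = rng.random(), rng.random(), rng.random()
    if c < a:
        less_count += 1
        both += c < b
        low_given_less += c < 0.01

print(less_count / n)               # P(C < A) ≈ 0.5
print(both / less_count)            # P(C < B | C < A) ≈ 2/3, not 1/2
print(low_given_less / less_count)  # P(C < 0.01 | C < A) ≈ 0.02, double the unconditional 0.01
```

The exact conditional values (2/3 and 0.0199) follow from the fact that, given C < A, the density of C on [0, 1] is 2(1 − c).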

If you are willing to assume that, for any given value of C, the probability of a random number being greater than C is the same as that of it being less than C, then the OP’s argument is fine. E.g., fixing C at 0.5 by fiat and drawing A and B independently and uniformly from [0, 1], it is indeed true that “A < C” and “C < B” are independent, that the probability C is between A and B is 1/2, etc.

Of course, the difficulty, like you say, is in the assumption that a random number is equally likely to be above or below C for any given value of C. But it’s not impossible to make sense of this assumption either. (E.g., by considering the expected value of f(x) for x drawn “uniformly” from the number line to be the limit, as N goes to infinity, of the expected value of f(x) as x is drawn uniformly from [-N, N], and then extrapolating (in an order-dependent way!) to the expected value of f(x, y, z, …) for functions of multiple arguments to be drawn from the number line, and from there reconstituting probabilities as the expected values of indicator functions in the usual way. Then, if C is drawn “first”, while A and B are drawn later, it will indeed be the case that C has probability 1/2 of being between A and B. Of course, in many situations, this will not be the notion of uniform sampling one is interested in, but in some, it may well be. As I said, I emphasize the plurality of available concepts, rather than premature standardization for standardization’s sake)

JOOC what limit do you get when you do this??

The reasoning here seems suspect. Thinking about infinities is tricky, but it seems to me there are two possibilities: either a random number can be an infinite distance from zero, or a random number has to be a finite distance from zero.

Now in the first case, A and B could be infinite distances from zero and, by extension, could be an infinite distance apart from each other.

In the second case, A and B are both a finite distance from zero so there is a finite distance between them. But the same limit applies to C - it also is a finite distance from zero. So the infinite endpoints really don’t say anything about the locations of A, B, and C.

Why, do you get a different one?

Let’s first draw C, then draw A, then draw B (remember, the methodology being analyzed here is order-dependent). So let f(C, A, B) be 1 if C is in between A and B, and 0 otherwise.

Note that, for any given (fixed) values of C and A, the limit, as N goes to infinity, of the integral of f(C, A, B)/(2N) dB from B = -N to B = N is 1/2 (so long as C and A are distinct; on the other hand, if C and A are equal, the integrand is constantly zero, and so the limit is 0 as well).

Thus, E[f(C, A, B) | C, A] = 1/2 if C is distinct from A and 0 otherwise, from which we conclude that E[f(C, A, B) | C] = constantly 1/2, from which we conclude that E[f(C, A, B)] = 1/2, which is to say, the probability of C being between A and B is 1/2, on this particular formalization of the concept…
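As a numerical sanity check on this limit (my own sketch, not part of the argument above): hold C fixed at any finite value, draw A and B uniformly from [-N, N] for a large N, and the fraction of trials with C in the middle is already essentially 1/2.

```python
import random

def p_between_given_C(C=17.0, N=1e8, trials=100_000, seed=2):
    """With C fixed first, estimate P(min(A,B) < C < max(A,B)) when
    A and B are then drawn uniformly from [-N, N]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.uniform(-N, N)
        b = rng.uniform(-N, N)
        hits += min(a, b) < C < max(a, b)
    return hits / trials

print(p_between_given_C())  # ≈ 1/2 once N is much larger than |C|
```

For finite N the exact value is (N² − C²)/(2N²), which tends to 1/2 as N → ∞, matching the computation above.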

On the other hand, if one first draws A and B, and then draws C, as specified in the OP, then this methodology will yield a probability of 0. This exactly mirrors the difference between the reasoning in 2) and the reasoning in 1) of the OP.

On the other hand, all this order-dependence is a bit ugly, and so this concept of uniform sampling, while interesting, is not my favorite to think about. My favorite would be an order-independent one justifying the symmetry argument of 3) (e.g., the one wherein probabilities are taken via the limit as the three variables are simultaneously uniformly drawn from [-N, N]^3), and thus producing the probability of 1/3. But never mind my aesthetic preferences; the different concepts are all out there, whether we talk about them or not.

I’m going to argue that any formalization that lets you define independent random variables A, B and C that are uniformly distributed on the real line and gives a probability other than 1/3 to the event that min(A, B) < C < max(A, B) is unreasonable. This will not be easy going; you genuinely need a bit of measure theory to talk about any non-trivial probabilistic notions, and this is no exception.

Before I get there, I want to show a fact about ordinary random variables. For any independent and identically distributed random variables (A, B, C), we have that P(min(A, B) < C < max(A, B)) = 1/3. This follows by symmetry–all permutations are equally likely–but some readers might object, so let me give another argument. It’s easy to show this when A, B and C are all uniformly distributed on [0, 1]. Any other distribution has a quantile function Q, and the triple (Q(A), Q(B), Q(C)) is independently and identically distributed according to that distribution. Quantile functions are monotone, so they preserve orders, and the claim holds.
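That distribution-independence is easy to spot-check numerically. Here is a quick sketch (the choice of exponential draws is mine, purely as an example of a non-uniform distribution):

```python
import random

rng = random.Random(3)
trials = 100_000
hits = 0
for _ in range(trials):
    # i.i.d. draws from an exponential distribution with rate 1
    a = rng.expovariate(1.0)
    b = rng.expovariate(1.0)
    c = rng.expovariate(1.0)
    hits += min(a, b) < c < max(a, b)

print(hits / trials)  # ≈ 1/3, as for any i.i.d. continuous distribution
```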

From the standpoint of probability theory–not applications, theory–almost nothing changes once you move from probability measures to finite measures, and not much more changes once you move from finite measures to sigma-finite measures. In particular, Bayes’ theorem still holds, and this allows us to use improper priors without causing any mathematical headaches. I want to emphasize that these are actually used in practice, and they give methods that often work well on real problems. That doesn’t mean they’re not tough as hell to interpret, but that’s a very different problem.

So, how do we define random variables that follow improper distributions? In general, it can be complicated, but for uniform distributions it’s easy. We take our space to be the product space formed from three copies of the real line with the Borel sigma algebra and Lebesgue measure. For those who are a little less comfortable with the measure theory, this means that our sample space consists of ordered triples of real numbers, our events of interest are pretty much anything you can imagine, and the measure of a set is simply its volume.

We define random variables A, B and C which are respectively the first, second and third coordinates. It’s not too hard to check that these are independent and identically distributed, and their densities with respect to Lebesgue measure are the product of a constant and the indicator function of a set–in this case, the real line–so they follow a uniform distribution.

Because the total mass of our space is infinite, we can’t really talk about probabilities, but we can talk about the possible orderings of A, B and C. We define the event E = {(x, y, z) : x < y < z}, and we note that the entire space is the disjoint union of images of E under permutation of the coordinates (plus some sets of measure zero, but who cares?). These permutations are measure preserving maps, and so in a very precise sense, every permutation has the same total mass. That’s why I’d require any reasonable formalization to give probability 1/3 to the event in the OP.

Now, can we actually assign probabilities here? I don’t see how to do it. The most obvious technique is to truncate and use some limit theorem for integration. The difficulty with this approach is that traditional truncation arguments have all the integrals taken against the same measure, and that doesn’t work here because a sequence of probability measures can’t converge to an infinite measure. There may be some other trick you can use, but I can’t think of what it might be.

I care, unless you also stipulate some limit on the number of such sets. All that R[sup]3[/sup] is, anyway, is some sets of measure zero (a very great many of them).

The sets I have in mind are triples where at least two of the coordinates are equal. There’s no harm in ignoring them.

Yeah, I know, I just enjoy the rare occasion when I can quibble to a mathematician about lack of rigor.

This would support the OP’s 3), but note that exactly analogous arguments can be given in support of the OP’s argument 1) and 2). [See below]

But it is in fact easy-going! Your argument is “Let’s consider volume-preserving transformations to be probability-preserving transformations; note that permutation is a volume-preserving transformation. Thus, probabilities should be symmetric under permutation”. The nitty-gritty details of the Lebesgue measure are of no actual relevance (nor ought we in this context prematurely standardize on such a low-level “implementation” of the volume “interface”); the argument has nothing to do with those. It is much higher-level.

I would look at it as that you’re not really giving another argument apart from appealing to symmetry; rather, you’re augmenting the symmetry argument itself by giving a slight bit of extra justification for the symmetry principle to begin with (essentially, by saying that since finite volume (i.e., measure in proportion to [0, 1]) is symmetric under permutation, so should be probability (i.e., measure proportional to the entire line)).

Ok. This supports the OP’s 3). Now let’s do 1) and 2) in the exact same way:

1): The map which sends (A, B, C) to (A, B, C + B - A) is measure-preserving (in geometric terminology, shearing is volume-preserving). But the image under this map of {(A, B, C) | C < A} is {(A, B, C) | C < A} U {(A, B, C) | C = A} U {(A, B, C) | C is between A and B}, yielding the equation P(C < A) = P(C < A) + P(C = A) + P(C is between A and B). Setting P(C = A) to zero in accordance with zero measure, and then cancelling out the P(C < A), we find that P(C is between A and B) = 0.

2): The map M which sends (A, B, C) to (C - (A - C), B, C) is measure-preserving (in geometric terminology, shearing combined with reflection is volume-preserving); so is the map N which sends (A, B, C) to (A, C - (B - C), C), and thus so is the composite map MN.

But the image of the set {(A, B, C) | A < C & B < C} under M, N, and MN is {(A, B, C) | C < A & B < C}, {(A, B, C) | A < C & C < B}, and {(A, B, C) | C < A & C < B}, respectively. Giving these disjoint events equal probabilities, and noting that the complement of their union has measure zero, we find that each has probability 1/4; thus, combining the middle two, we find that P(C is between A and B) = 1/2.
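Just to double-check the volume-preservation claims, all three maps are linear, so it suffices that their matrices have determinant of magnitude 1. A quick numpy sketch (the matrix encodings, acting on column vectors (A, B, C), are mine):

```python
import numpy as np

# Shear from argument 1): (A, B, C) -> (A, B, C + B - A)
shear = np.array([[ 1, 0, 0],
                  [ 0, 1, 0],
                  [-1, 1, 1]])

# Maps from argument 2):
# M: (A, B, C) -> (2C - A, B, C)    N: (A, B, C) -> (A, 2C - B, C)
M = np.array([[-1, 0, 2],
              [ 0, 1, 0],
              [ 0, 0, 1]])
N = np.array([[ 1, 0, 0],
              [ 0, -1, 2],
              [ 0, 0, 1]])

for name, T in [("shear", shear), ("M", M), ("N", N), ("MN", M @ N)]:
    print(name, abs(np.linalg.det(T)))  # each ≈ 1.0: volume-preserving
```

The shear and the composite MN have determinant +1; M and N alone have determinant −1 (shear plus reflection), but reflections preserve volume too, so all four are measure-preserving.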

(I want to emphasize again that I do not consider the above to be different arguments from those given in the OP; rather, I consider them particular augmentations of the OP’s own arguments, establishing a little further the grounds from which one might draw the symmetry principles implicitly appealed to in each case)

Minor typo, but the parentheticals should of course read “(i.e., measure in proportion to the unit cube)” and “(i.e., measure in proportion to the entirety of R[sup]3[/sup])”.