Collatz Conjecture Proof?

I was thinking about the Collatz Conjecture and it seemed like it should be provably false, but I’m not sure how one computes probabilities on infinites.

Basically, if instead of caring about discrete numbers, we just look at the Conjecture as something more like a Markov Chain where node A always goes to B, 100% of the time, and B either goes to itself 50% of the time or back to A 50% of the time, then yes obviously we would expect the general trend be that we’re usually in B.

Now we could further say that there are further factors, e.g. the square number super highway to hell, which we should possibly consider as further nodes off A and B, which are going to have skewed probabilities of being triggered depending on how large our current position (p[sub]n[/sub]) on the number line is, but there are only so many (a finite number of) integer sequences which would be pertinent to the Conjecture due to how the machine that’s been strapped to the number line works.

Now while the Collatz Machine, including all of the nodes for all of the pertinent integer sequences, might skew heavily towards moving in a linear direction on the number line, the fact would remain that there is a finite set of nodes and they have no internal logic to make sure that they continuously avoid going back to node A. Outside of the integer sequences that would have a mathematical relationship to the simple, limited gearwork of the machine, there is no mechanism to prevent the machine from moving up just as often as it moves down.

And then once we accept that all integer sequences form larger and larger gaps as we move higher on the number line, and the ratio of “interesting numbers” to “boring numbers” becomes an infinitesimal, eventually there must be some point where our bonus states beyond A and B go away and can be stricken out as irrelevant.

At that point, we can safely say that this is purely a probability question of whether the math can never loop back perfectly on itself. And given that our initial position (p[sub]0[/sub]) had an infinite range and so, effectively, an infinite number of iterations that it can run using the simple AB version of the Markov Chain, the probability of it looping back to its original position should be non-zero.

While a I’ll grant that this is pseudo math, I feel like that should be the result of any actual math?

No. This is just, as you put it, pseudo-math.

Too far removed to turn it into actual math?

As you note, part of the problem is the way you’re treating probabilities of infinite numbers. When dealing with probabilities among infinite sets, it’s entirely possible to have events with probability 0 that are still possible. For example, if I select a real number between 0 and 1, the probability that I select 1/2 (or any other number in that range) is zero. After all, if it wasn’t zero, it would have to be that same value for all numbers between 0 and 1, and that would mean that the total probability of all possible events would have to be infinite.

However, the fact that the probability of selecting 1/2 from between 0 and 1 is zero does not mean that the number 1/2 does not exist. Your argument seems to be based on making a similar sort of argument: even if there is zero probability of selecting a number whose Collatz sequence doesn’t return to 1, that wouldn’t imply that no such number exists.

I’m no mathematician but I can see two possible issues in your reasoning:

Firstly

There are an infinite number of cardinals so there are an infinite number of sequences.
Secondly

For a mathematical proof all numbers are relevant. It only takes one contradictory number to disprove the Conjecture.

Obligatory xkcd link

There’s really nothing there. It just doesn’t work that way. For an example of a sort of argument for this vague area of problem that does work, see some of the work in ergodic theory related to the Green-Tao theorem and similar relations.

Even if you prove this way that the hailstone (I had always heard it called the hailstone conjecture) falls to the ground with probability 1, it does not follow that it always does. Probability doesn’t work that way on infinite sets.

This is a counter-proof. I don’t believe that the Conjecture is true. The Conjecture argues that there’s some mathematical pattern that forces the device to fall to 1. I was arguing that the behavior of the machine is effectively random and the sequences that could make its behavior less random are mostly at the small end of the number line. With large enough numbers, though, it’s just random chance where it goes and there’s nothing to prevent it from looping back on itself.

There is indeed nothing to stop it from looping. Or increasing indefinitely. In fact, I am about to suggest it “should”. It seems to me that the easiest way to view the question is as a function from odd natural numbers to odd natural numbers defined by f(n) is the largest odd divisor of 3n+1. Or equivalently, 3n+1 divided by the largest power of 2 that divides it. Assuming randomness, we would expect f(n) to be (3n+1)/2 half the time, (3n+1)/4 a 1/4 of the time, (3n+1)/8 an 1/8 of the time. Adding up the resultant infinite series suggests that the expected value of f(n) is n + 1/3. Of course, that is not an integer, but it suggests that we should expect f(n) to be slightly larger than n, on average and therefore increase indefinitely. Of course, there is no known example where this happens. As far as anyone knows, it always falls to 1 eventually.

I began to wonder what would happen if we replace 3n+1 by 3n-1. Then the expected value of the corresponding function would be n - 1/3 so maybe that one should be expected to fall to 1 and stay there. So I tried a few sequences and here is the result. The very first number I tried was 37 and I got 37,55,41,61,91,17,25,37, after which it would continue to cycle with a period of 7. Next, I tried 33 and got 33,49,73,109,163,61, after which it joined the preceding cycle. Then I tried 35 and got 35,57,85,127,95,71,107,5,7,5 and then continually alternated between 5 and 7. Make of that what you will, but there is an obvious conjecture here. Note incidentally that no multiple of 3 can appear in any of these sequences except as the initial term.

Someone who can program can try a lot more examples. I chose the first at random, the others nearby.

The heuristic probabilistic argument supports the conjecture - successive odd terms decrease on average, see here:

http://www.cecm.sfu.ca/organics/papers/lagarias/paper/html/node3.html

I just don’t see where that infinite product is coming from. He starts off with the same estimate I did and then goes elsewhere with it. I think my computation is straightforward.

BTW, I made a numerical error in one of the examples.

I have had some time to think further along my original lines.

The Collatz-like formula generator

The Collatz conjecture assumes the existence of a loop which passes through 1 that all numbers will eventually attach themselves to, given enough time. One form of disproof of the conjecture would be to find a loop which does not pass through 1.

A loop can be represented as a conjunction of all of the steps within it, wrapped around each other.

The only known loop is for the number 1. It starts at 1 and ends at 1. The first step is to multiply by 3 and add 1 (3x +1), giving us 4. The second step divides by 2. Showing the two steps in conjunction to each other, this can be represented like (3x+1) / 2. If you set x to 1 and calculate the result of this formula, you’ll arrive at the same result as the Collatz conjecture does for the number 1 after it’s second step (i.e. you’ll get the number 2). The third step is another division by 2 (((3x + 1) / 2) / 2) which brings us back to 1.

We can represent the entire loop as a formula of the steps from the initial x to the eventual output which is back at the same number.

At heart, any loop that the rules of the conjecture might give rise to would simply be a case where a function like 3(x / 2) + 1 is equal to x. The output of the function is the input of the function.

We can view the collatz conjecture as a series of permutations of two functions:

a(x) = x / 2
b(x) = 3x +1

We can simply wrap a() around the output of b() or vice versa, a() around itself, and b() around itself and continue repeating that as many times as we wish in any combination that we wish.

More technically, we might say that we have some value n where n is the sum of g (the total number of a() functions that we have decided to throw in) and h (the total number of b()). Once we have decided how many, we decide what type at each step. Within the space of a certain n, a certain h, and a certain g, there are only so many permutations of a() and b() functions before, either, we arrive at the number 1 or find the entry point to some other loop.

What type, and what order to examine, we can represent that choice as an ordered list like:

abaabbabbba

Which end to start, left or right be as it may, this list describes the order that we have decided to wrap the functions in. If we start from the right (as the innermost function), for example, then we would be saying that we have chosen to solve the following function for x equals some constant value:

x = a(b(a(a(b(b(a(b(b(b(a(x)))))))))))

True Collatz formulas

If we were to choose an n (that is, a total number of a/b functions to permute around each other) and then decide the ratio of g to h to test within n, we’re going to generate a bunch of formulas that are Collatz-like but aren’t necessarily true Collatz formulas.

A ‘true’ Collatz formula would be something like:

(x / 2) / 2

It’s true because:

  1. There is some whole number greater than 0 that we can choose as the initial x where, for every a() or b() in our series, the a() or b() that we have chosen is the matching function that Collatz would have chosen given the input that has been passed to it.
  2. For that same initial x, the final output will be either 1 or the initial x.

In this case, we’re talking about the number 4. If we input the number 4, the output will be the number 1 and, further, at every step in the process the parity of the number is correct. When it is 4, from the Collatz conjecture, we expect it to be decided by 2. That’s what the formula does. Following that we expect another division by 2 and again that’s what it does. (x / 2) / 2 is a correct formulaic representation of the path that a number takes on its way to 1 or itself.

But let’s examine rule #2, to address some assumptions.

The real options for the Collatz conjecture where - unlike our permutation system - there is no finite set of steps that it is guaranteed to take before it eventually lands on 1 or, if the conjecture is wrong, finds an alternate loop, or finds a way to diverge and expand infinitely.

Plausibly, Collatz can diverge in reality, for some giant value. By nominating a thing a ‘true’ Collatz formula I am not meaning to imply that we are discussing all Collatz eventualities. We are merely discussing those cases where there is a bounded (non-divergent) case where the sequence either comes to the value 1 or the initial x and so the ensuing loop can be represented as a discrete formula. Any divergent Collatz sequence has no formula. It is a ‘true’ formula because it does correspond to a Collatz loop or path to 1/x following the official rules, not because it is representative of all Collatz sequences.

There are many fake Collatz-like formulas that are not ‘true’. For example, the function:

3x + 1

If we trace the course of things through the Collatz sequence until we hit 1 or the initial x, we will never see that formula emerge. If we try to solve for 1, the result is zero. That breaks all of our rules since they all require an initial x that is a whole number greater than 0. We’ve just determined that the only input we can give to find 1 is the number 0. Similarly, if we try to solve for x itself, we get the result -1/2. Again, this is an invalid initial input.

Rule 2 allows us to very easily rule out the grand majority of Collatz-like formulas and determine that they are not true Collatz formulas.

Sets of true Collatz and Collatz-like formulas

We can talk about Collatz formulas in terms of sets. Firstly we have the set D where D contains all positive whole numbers. There is a set C of true Collatz formulas.

Thirdly, we have the set P, where P is the set of all permutation identities (a, aa, ab, bbaaabaab, etc.). Lastly, we have the set F of all Collatz-like formulas which we can generate through our above mechanism.

There is a one-to-one matchup between C and D. For the number 4, for example, there is exactly one true Collatz formula ((x / 2) / 2 or ‘aa’). I’m sure that there’s some better way to say this in mathematical terms but, as a programmer, I would say that C is a dictionary which can be accessed and always return a non-null result for any value in D. We could reference a particular true Collatz function through a statement akin to C[4].

Similarly, there is a one-to-one making between P and F. F can be accessed like F[aa].

These are all equivalent:

C[4] = (x/2) / 2 = F[aa]

F is a superset of C.

Collatz-like formulas as linear equations

If we convert anything in C to a linear equation (e.g., y = (x / 2) / 2) and graph it, we’ll see that all known examples intersect the line y = x at a non-negative x and y coordinate. C[4] is equivalent to the formula y = 1/4 x. C[5] is equivalent to 3/8x + 1/8. C[4] intersects at (0,0). C[5] intersects at (0.2,0.2).

This is purely a factor of the above rules for a true Collatz formula. All known formulas in C can be solved for 1 not x but the answer will always be either 1 or x by definition.

The linear form of a formula in C, intersected with y = x is equivalent to solving a Collatz-like formula for x. E.g.:

x = (x / 2) / 2
x = (((3x +1) / 2) / 2) / 2

The known entries in C can all be solved in linear form by intersecting with the line y = 1. This is equivalent to solving for 1.

1 = (x / 2) / 2
1 = (((3x +1) / 2) / 2) / 2

For all known examples, solving for the intersection with y = 1 allows us to determine the correct accessor into C for that formula; the x coordinate will be that value. (E.g., 4 for the first of the two above.)

To intersect with the line y = 1, the slope of the line cannot be zero. If you reduce a formula in C to a linear equation it can never be zero. It can also never be negative. It is always a line with a positive slope that insects y = 1 at a positive location.

These linear equations also always either intersect the y axis at zero or at some positive location. If the formula consists only of a’s then the resulting formula can only go through the origin. If there are any b’s then they can shift the intersection with the y axis positively via the +1 that is part of the function. Any a’s in the mix will divide the distance along the y axis towards the origin but never past it.

For a positive slope line to intersect y = 1 while traveling through the y axis at or above zero it must intersect the y axis between 0 and 1.

Since the intersections that we are looking for are either y = 1, where x >= 1 or y = x, where x >= 1, any line which solves the first, has a positive slope, and intersects the y axis between 0 and 1 will intersect y = x between 0 <= x <= 1.

Any formula in C, if it exists, where there is a loop that goes back to itself instead of to 1, by definition, intersects y = x at a location where x & y are both > 1.

All formulas in C, represented as a linear equation, intersect y =x at a non-negative y coordinate.

This all tells us that any formula in F which, converted to a linear equation, has a slope greater than 1 cannot be a a ‘true’ Collatz formula. We can immediately disregard these.

Any formula in F with only a’s can be ignored.

There are some further quick checks that we can do to eliminate items from F if we want to know whether they might be a true Collatz formula but let’s leave it there for now.

Searching F

One type of disproof of the Collatz conjecture requires, simply, finding some initial x that loops back on itself.

Checking all x has proved to be prohibitive.

Searching x is not particularly efficient. You must repeat the Collatz sequence until either coming to 1 or, if you have enough memory to store it, find a value that you have seen before in previous tests that came to a 1.

This could quickly become prohibitive in terms of hardware memory space. But failure to do so means that you are spending the grand majority of your time zig-zagging through a space that had already been searched. There are, some optimizations (recognizing powers of 2, stopping if you drop under the range of all numbers checked so far, etc.) but, plausibly, it would be faster to search F.

Every formula in F, as a linear equation, will be of the following form (assuming I haven’t messed something up - it’s worth double-checking!)

3^h / 2^g x + (3^h - 1) / 2

You can simply crank through all values of h and g checking the slope and y axis intercept to rule out the grand majority of n (where n is the total number of steps in the possible Collatz loop). While, granted, this isn’t testing anything to do with Collatz, it would allow you to get to some fairly large values of n fairly quickly.

Every value of n that you rule out rules out that entire zig-zag through the number line. You’re not ruling out a single number, you’re ruling out an entire set of movements, repeated through infinity.

Having found a linear equation for h and g that would intercept y = x, you know immediately if you care or not. If the solution is not whole, you can move on. If it is whole then you finally have to perform a heavier check and determine whether any permutation is legal (does the parity of the number match at every step). But, until that point, you should be able to cruise past large swathes of space.

So… We’ll see if I have time to code something up.

It’s not a proof nor disproof, but it might be an optimization.

And…to ask a question since this is GQ: is this a pathway of thought that someone has already gone down? How does one search for something like this?

Here’s the thing: You’re making probabilistic arguments that there almost certainly exists a counterexample to the Collatz conjecture. But those exact same arguments would also conclude that there almost certainly exists a counterexample somewhere in the first billion billion numbers. Except, we’ve tested it, and there doesn’t. Even though it seems like there ought to be a counterexample somewhere in there, there isn’t. That’s what makes the Collatz conjecture so interesting.

Any chance that you could review my previous comment?

You have to try and see it from the reviewer’s point of view. The brain keeps reacting “Oh, that’s never going to work. That’s a fishy point.” etc. So trying to parse thru all this … “stuff” … is very taxing compared to something that doesn’t have these issues. Going thru something like this isn’t “normal” reviewing.

Basically, it sure looks like a completely hopeless approach. So why would someone bother to put in not just effort but extra effort to figure out where all the mistakes are?

There are “fluke” numbers in Math. Take
808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000.

What’s so special about it? It’s the order of the largest sporadic simple group, a.k.a. the Monster group. There are only 26 sporadic groups. The second largest is about 20 orders of magnitude smaller .What are the chances at establishing the existence of this Monster group via probability methods?

The review request was for the new question, not for the math nor the original statement.

As I understand it, The Collatz conjecture is to find out whether or not, starting with an initial integer, you indeed end up with the 4-2-1 loop; maybe you end up with another loop, or you spiral out to infinity.

As Sage Rat stated there is a series of permutations of two functions:

a=x/2
b=3x+1

What, if anything, new can we learn if, starting with 1, we work backward by reversing the functions?

Starting with 1, sprout two new branches:

a’=2x
b’=3/x-1

Yeah, people grow the “tree” from the bottom all the time. But your second one should be (x-1)/3 , and you can only use that one when the result is an odd integer.

Ah, forgot to reverse order of operations.
PEMDAS
to
SADMEP.

Btw, I figure, since we’re reversing, you don’t use one function OR the other depending on whether or not that particular integer is even or odd, you branch BOTH functions to generate BOTH even and odd