I have had some time to think further along my original lines.
The Collatz-like formula generator
The Collatz conjecture assumes the existence of a loop which passes through 1 that all numbers will eventually attach themselves to, given enough time. One form of disproof of the conjecture would be to find a loop which does not pass through 1.
A loop can be represented as a conjunction of all of the steps within it, wrapped around each other.
The only known loop is for the number 1. It starts at 1 and ends at 1. The first step is to multiply by 3 and add 1 (3x +1), giving us 4. The second step divides by 2. Showing the two steps in conjunction to each other, this can be represented like (3x+1) / 2. If you set x to 1 and calculate the result of this formula, you’ll arrive at the same result as the Collatz conjecture does for the number 1 after it’s second step (i.e. you’ll get the number 2). The third step is another division by 2 (((3x + 1) / 2) / 2) which brings us back to 1.
We can represent the entire loop as a formula of the steps from the initial x to the eventual output which is back at the same number.
At heart, any loop that the rules of the conjecture might give rise to would simply be a case where a function like 3(x / 2) + 1 is equal to x. The output of the function is the input of the function.
We can view the collatz conjecture as a series of permutations of two functions:
a(x) = x / 2
b(x) = 3x +1
We can simply wrap a() around the output of b() or vice versa, a() around itself, and b() around itself and continue repeating that as many times as we wish in any combination that we wish.
More technically, we might say that we have some value n where n is the sum of g (the total number of a() functions that we have decided to throw in) and h (the total number of b()). Once we have decided how many, we decide what type at each step. Within the space of a certain n, a certain h, and a certain g, there are only so many permutations of a() and b() functions before, either, we arrive at the number 1 or find the entry point to some other loop.
What type, and what order to examine, we can represent that choice as an ordered list like:
abaabbabbba
Which end to start, left or right be as it may, this list describes the order that we have decided to wrap the functions in. If we start from the right (as the innermost function), for example, then we would be saying that we have chosen to solve the following function for x equals some constant value:
x = a(b(a(a(b(b(a(b(b(b(a(x)))))))))))
True Collatz formulas
If we were to choose an n (that is, a total number of a/b functions to permute around each other) and then decide the ratio of g to h to test within n, we’re going to generate a bunch of formulas that are Collatz-like but aren’t necessarily true Collatz formulas.
A ‘true’ Collatz formula would be something like:
(x / 2) / 2
It’s true because:
- There is some whole number greater than 0 that we can choose as the initial x where, for every a() or b() in our series, the a() or b() that we have chosen is the matching function that Collatz would have chosen given the input that has been passed to it.
- For that same initial x, the final output will be either 1 or the initial x.
In this case, we’re talking about the number 4. If we input the number 4, the output will be the number 1 and, further, at every step in the process the parity of the number is correct. When it is 4, from the Collatz conjecture, we expect it to be decided by 2. That’s what the formula does. Following that we expect another division by 2 and again that’s what it does. (x / 2) / 2 is a correct formulaic representation of the path that a number takes on its way to 1 or itself.
But let’s examine rule #2, to address some assumptions.
The real options for the Collatz conjecture where - unlike our permutation system - there is no finite set of steps that it is guaranteed to take before it eventually lands on 1 or, if the conjecture is wrong, finds an alternate loop, or finds a way to diverge and expand infinitely.
Plausibly, Collatz can diverge in reality, for some giant value. By nominating a thing a ‘true’ Collatz formula I am not meaning to imply that we are discussing all Collatz eventualities. We are merely discussing those cases where there is a bounded (non-divergent) case where the sequence either comes to the value 1 or the initial x and so the ensuing loop can be represented as a discrete formula. Any divergent Collatz sequence has no formula. It is a ‘true’ formula because it does correspond to a Collatz loop or path to 1/x following the official rules, not because it is representative of all Collatz sequences.
There are many fake Collatz-like formulas that are not ‘true’. For example, the function:
3x + 1
If we trace the course of things through the Collatz sequence until we hit 1 or the initial x, we will never see that formula emerge. If we try to solve for 1, the result is zero. That breaks all of our rules since they all require an initial x that is a whole number greater than 0. We’ve just determined that the only input we can give to find 1 is the number 0. Similarly, if we try to solve for x itself, we get the result -1/2. Again, this is an invalid initial input.
Rule 2 allows us to very easily rule out the grand majority of Collatz-like formulas and determine that they are not true Collatz formulas.
Sets of true Collatz and Collatz-like formulas
We can talk about Collatz formulas in terms of sets. Firstly we have the set D where D contains all positive whole numbers. There is a set C of true Collatz formulas.
Thirdly, we have the set P, where P is the set of all permutation identities (a, aa, ab, bbaaabaab, etc.). Lastly, we have the set F of all Collatz-like formulas which we can generate through our above mechanism.
There is a one-to-one matchup between C and D. For the number 4, for example, there is exactly one true Collatz formula ((x / 2) / 2 or ‘aa’). I’m sure that there’s some better way to say this in mathematical terms but, as a programmer, I would say that C is a dictionary which can be accessed and always return a non-null result for any value in D. We could reference a particular true Collatz function through a statement akin to C[4].
Similarly, there is a one-to-one making between P and F. F can be accessed like F[aa].
These are all equivalent:
C[4] = (x/2) / 2 = F[aa]
F is a superset of C.
Collatz-like formulas as linear equations
If we convert anything in C to a linear equation (e.g., y = (x / 2) / 2) and graph it, we’ll see that all known examples intersect the line y = x at a non-negative x and y coordinate. C[4] is equivalent to the formula y = 1/4 x. C[5] is equivalent to 3/8x + 1/8. C[4] intersects at (0,0). C[5] intersects at (0.2,0.2).
This is purely a factor of the above rules for a true Collatz formula. All known formulas in C can be solved for 1 not x but the answer will always be either 1 or x by definition.
The linear form of a formula in C, intersected with y = x is equivalent to solving a Collatz-like formula for x. E.g.:
x = (x / 2) / 2
x = (((3x +1) / 2) / 2) / 2
The known entries in C can all be solved in linear form by intersecting with the line y = 1. This is equivalent to solving for 1.
1 = (x / 2) / 2
1 = (((3x +1) / 2) / 2) / 2
For all known examples, solving for the intersection with y = 1 allows us to determine the correct accessor into C for that formula; the x coordinate will be that value. (E.g., 4 for the first of the two above.)
To intersect with the line y = 1, the slope of the line cannot be zero. If you reduce a formula in C to a linear equation it can never be zero. It can also never be negative. It is always a line with a positive slope that insects y = 1 at a positive location.
These linear equations also always either intersect the y axis at zero or at some positive location. If the formula consists only of a’s then the resulting formula can only go through the origin. If there are any b’s then they can shift the intersection with the y axis positively via the +1 that is part of the function. Any a’s in the mix will divide the distance along the y axis towards the origin but never past it.
For a positive slope line to intersect y = 1 while traveling through the y axis at or above zero it must intersect the y axis between 0 and 1.
Since the intersections that we are looking for are either y = 1, where x >= 1 or y = x, where x >= 1, any line which solves the first, has a positive slope, and intersects the y axis between 0 and 1 will intersect y = x between 0 <= x <= 1.
Any formula in C, if it exists, where there is a loop that goes back to itself instead of to 1, by definition, intersects y = x at a location where x & y are both > 1.
All formulas in C, represented as a linear equation, intersect y =x at a non-negative y coordinate.
This all tells us that any formula in F which, converted to a linear equation, has a slope greater than 1 cannot be a a ‘true’ Collatz formula. We can immediately disregard these.
Any formula in F with only a’s can be ignored.
There are some further quick checks that we can do to eliminate items from F if we want to know whether they might be a true Collatz formula but let’s leave it there for now.
Searching F
One type of disproof of the Collatz conjecture requires, simply, finding some initial x that loops back on itself.
Checking all x has proved to be prohibitive.
Searching x is not particularly efficient. You must repeat the Collatz sequence until either coming to 1 or, if you have enough memory to store it, find a value that you have seen before in previous tests that came to a 1.
This could quickly become prohibitive in terms of hardware memory space. But failure to do so means that you are spending the grand majority of your time zig-zagging through a space that had already been searched. There are, some optimizations (recognizing powers of 2, stopping if you drop under the range of all numbers checked so far, etc.) but, plausibly, it would be faster to search F.
Every formula in F, as a linear equation, will be of the following form (assuming I haven’t messed something up - it’s worth double-checking!)
3^h / 2^g x + (3^h - 1) / 2
You can simply crank through all values of h and g checking the slope and y axis intercept to rule out the grand majority of n (where n is the total number of steps in the possible Collatz loop). While, granted, this isn’t testing anything to do with Collatz, it would allow you to get to some fairly large values of n fairly quickly.
Every value of n that you rule out rules out that entire zig-zag through the number line. You’re not ruling out a single number, you’re ruling out an entire set of movements, repeated through infinity.
Having found a linear equation for h and g that would intercept y = x, you know immediately if you care or not. If the solution is not whole, you can move on. If it is whole then you finally have to perform a heavier check and determine whether any permutation is legal (does the parity of the number match at every step). But, until that point, you should be able to cruise past large swathes of space.
So… We’ll see if I have time to code something up.
It’s not a proof nor disproof, but it might be an optimization.
And…to ask a question since this is GQ: is this a pathway of thought that someone has already gone down? How does one search for something like this?