# The identity hash

Is there a string which, when hashed with md5, returns itself? Same for SHA-1 and all the other hash functions. I tried googling for it but I couldn’t find anything.

What a fascinating question (to which I don’t have an answer)

Thinking about a practical way to discover such a thing - it would be fairly easy to set up a brute-force test on a random seed string and keep feeding the result back into the input, keeping track of the results - One other possibly interesting scenario that might arise here is to get stuck in a loop - where a small set of results lead to each other in turn, then back to the first one.

MD5 produces a 128 bit result so you’d have to take at each possible 128 value, hash it, and compare the input to the output. Assuming a single computer doing 1000 checks/second, it would approx 10.8e28 years to find all possible collisions.

Mangetout: You’ve just given the naïve ‘solution’ to the Halting Problem (to wit: run it and see what happens). Turing is laughing up his sleeve at you.

Shalmanese: Perfect hash functions (a mathematical abstraction real hash functions attempt to emulate) are allowed to have fixed points, as I understand it, and knowing about them tells you no more about the function than knowing any other arbitrary input-output pair. The magic of a perfect hash function is that every input from the arbitrary-sized domain always has the same output within the fixed range, but given an output value it’s still impossible to calculate any input values that map to it. Caching arbitrary numbers of input-output pairs is not even helpful when the input value you desire isn’t already in your cache. Needless to say we still haven’t produced such a beast, but we can use the abstraction to prove weaknesses in other parts of the system ("Even if the hash function were perfect, it still breaks… "). Finding an easy way to answer this for real hash functions would involve breaking those functions, because otherwise the search (as per Mangetout) may never end.

If you’re Googling, use the phrase “fixed point” rather than identity (as Derleth alluded to).

The brute-force search will halt eventually, assuming you’re methodical about it. The longest possible cycle is 2[sup]128[/sup]. If there are any MD5 cycles, they’re all limited to that — and, there can only be a finite number of them in total.

Still, I don’t think I’ll wait around for the brute-force search to finish. Or even make much of a start.

Likewise, you can brute-force the halting problem, for a computer only slightly smaller than the one you’re using. You just can’t use a computer to solve its own halting problem.

In case it wasn’t apparent, I wasn’t proposing a brute force search as any kind of efficient answer to the OP’s question - merely a thought experiment.

I think I’m right in saying that all possible cycles cannot add to more than 2[sup]128[/sup] steps either - or some of them would intersect (and would therefore be part of the same cycle)

Note that there is also a ‘fixed point attack’ on hash functions, which doesn’t apply here:

It’s a break in the function, but it isn’t what you’re looking for here.

Mangetout: OK, you’re right. In my defense, though, you did say ‘practical’. I don’t think waiting around for the generation of all possible 128-bit numbers is practical given the current methods of designing computer hardware.

Sure - no problem - I was thinking about how to do it. I wasn’t thinking of doing it though.

Since the range is finite, everything must eventually get stuck in a loop.
That loop might be kind of large.

Can’t help wondering if there would be any kind of pattern to the distribution and layout of the loops…

I’ve been thinking about this. Picture a bunch of rooted trees in the graph theoretical sense where all the edge are directed toward the root. The trees can come in all shapes and sizes. Now attach each root node to a directed cycle. There are a bunch of these cycles in many lengths to choose from.

Your hash is one such picture like this, but I’m not sure which one, and I don’t expect any additional structure than what I’ve given to be apparent, but I’m willing to be wrong about that.