Cryptographic Zero Knowledge Proof Question

Alice and Bob both have a secret 4000 digit number. They want to know if they both have the same number but at the same time, to not let the other person know what number they have if the numbers differ. The solution seems fairly straightforward, Alice computes a hash digest and gives the first half of the digest to Bob, Bob computes a hash digest and gives the second half to Alice. Both can verify the other person’s matches. But if they don’t match, neither can figure out what the other person’s number is. However, this relies on 4000 digit numbers being impractical to brute force. What if it was a number between 1 and 100 instead?

Assuming that there are no trusted 3rd parties is there a better protocol that could still solve this problem?

Yes, a hash on a short piece of data can be brute-forced quite easily - credit card PINs would be a good example of where it isn’t really effective.

Isn’t it also possible for hashes from non-matching numbers to collide by accident? (assuming the hash is shorter than the original data).

Alice decides a random 128 bit value and sends it to Bob.
Bob decides a random 128 bit value and sends it to Alice.
Both of them concatenate their number to the value and create a hash, sending the hash back to the other person.
Alice concatenates her random value with her number and hashes it and sees if it matches what Bob sent.
Bob concatenates his random value with his number and hashes it and sees if it matches what Alice sent.

Hm, no, my method doesn’t work I think. I’ll respond again when my brain has booted up fully.

http://icanhascheezburger.files.wordpress.com/2008/07/funny-pictures-cat-cannot-brain-today.jpg

Okay…

Alice creates a hash h(x) and sends it to Bob.
Bob creates a hash g(h(x)) and publishes it.
Bob creates a hash g(y) and sends it to Alice.
Alice creates a hash h(g(y)) and publishes it.

Now assuming there exists a hash function such that h(g(x)) = g(h(x)), they could compare equality.

The question is, does such a hash function exist?

How about Alice picks a long random number, calculates the hash of the concatenation of her secret with the long random number, then sends both the result of the hash and the long random number to Bob. This allows Bob to repeat the calculation and verify that Alice has the same secret as he does. Then he does the same thing in the other direction, allowing Alice to verify the match for herself.

Note, however, that this does not actually let Bob know for sure that he’s talking to Alice, or vice versa. Once Alice sends out the random number and the result of the hash, someone else could pick it up and impersonate her. You’d have to add more information in order to overcome this.

I would think a simple step one of any process would be to compare the (randomly computed) nth digit of both numbers. If they aren’t the same, you’ve saved an awful lot of hashing and time.

“Does your number start with a 3? No? OK then…”

Why the distinction between sending and publishing? Certainly, taking h and g to be the same function, such a (pair of) hash function(s) exists, but I’m guessing that’s not what you want.

KneadToKnow has proffered a fine suggestion. Repeatedly compare randomly selected digits. You’ll give away those randomly selected digits, but no matter what, you’ll be giving away some amount of (randomly selected) information anyway. ntucker’s suggestion is good as well, and probably more to your taste, insofar as that the information revealed, while still information-theoretically just as strong (in order to achieve the same confidence), will be harder to put to any use (one assuming direct use to be aligned with reading off particular digits rather than reading off particular hashes…).

Though ntucker’s method should be modified a bit… when Alice generates a random string (call it the salt), she should just send the salt to Bob; Bob then sends back the hash of (his secret + the salt). Alice compares this to her own, and gains confidence that Bob’s secret matches hers. This can then be run again with Alice and Bob switched.

In particular, this ensures that, when Alice is trying to verify that Bob’s secret matches hers, she can be sure Bob isn’t cheating by avoiding use of random numbers, since she’s the one sending the random numbers for him to use. [On the other hand, this does allow Alice to extract targeted information from Bob, in the unwieldy form of the hash of his secret with a particular salt of her choice.]

Alternatively, one could have a trusted third party source of random numbers. At any rate, the basic idea is good.

The problem is easy to solve when the number are large enough to be infeasible to brute force. The question is, if the number are small enough such that brute forcing is trivial, is there any way to confirm equality without leaking information?

Indistinguishable: In a two person scenario, sending and publishing are identical but this scheme could generalize to n participants in which case, there is a distinction. Also, h(x) and g(x) must necessarily be different functions otherwise the security breaks down. Security is preserved by Alice not knowing g(x) and Bob not knowing h(x).

ntucker’s scheme would not work since if Alice got a result back that did not match hers, she could simply iterate through all of Bob’s choices until she found one that matched her result and she would then know Bob’s number.

I’m convinced the double hashing approach should be the correct one as long as there exists a hashing function such that h(g(x)) = g(h(x)).

But if brute-forcing is trivial, doesn’t your method break down as well? Bob is given a hash of Alice’s secret x, Alice is given a hash of Bob’s secret y, and the whole world is given another hash of Bob’s x and of Alice’s y. Shouldn’t they both be able to brute-force everything, then?

[I’m a little confused as to what your x and y represent, but I assume the x in line 1 is different from the x in line 2, and similarly for the ys in lines 3 and 4, the four different quantities being the separate halves of the two secrets]

I mean, to put it another way, if brute-forcing to undo a hash (modulo collisions) is trivial, then what’s the point of hashing in the first place?

This assumes that Alice and Bob trust each other. But if they trusted each other, one could just transmit the whole number to the other and be done with it. But what if Bob doesn’t have the same number, but wants Alice to think that he does?

Alice: “Is digit number 1734 a 6?”
Bob: (quickly writes down a 6 on a piece of scratch paper) “Why yes, yes it is.”.

I don’t think there is a classical solution to this problem, with true zero knowledge. Consider that if there exists a solution to the 4000 digit version of the problem, there must also be a solution for the single-bit version of the problem, and I don’t think there is (though I can’t rigourously prove that there isn’t). However, I think there might be an answer in quantum cryptography, something along the lines of Alice and Bob getting an entangled pair of electrons, and arranging “If your digit is 1, measure the X component of spin, and if your digit is 0, measure the Y component of spin”.

Well, for the single-bit version of the problem, the solution is trivial, if we take the precise amount of information that we want Alice and Bob to be able to learn to be the answer to the question “Is our number the same?”. They can just transmit their bit to each other, since that’s effectively the same as simply having an oracle tell them the answer to that question.

More generally, of course, the amount of information in the secret is larger than the amount of information in the answer to that question, so things get trickier.

As for KneadToKnow’s method, it shouldn’t be that Alice asks “Is digit 1734 a 6?”, but, rather, Alice picks a random number (say, 1734) and then asks “What’s your digit 1734?”. She then asks enough of these questions to convince herself that Bob has the same number [of course, he might just have the same digits at those particular locations; there has to be some reason to suppose that his having the same digits at many locations is evidence that he has the same number overall, which wouldn’t be the case if they both just picked random numbers and then went through the protocol, but might be the case in many other situations; certainly, if Bob were to feel entitled somehow to make the claim that he had the same number as Alice, he would be able to run through this protocol to demonstrate it, without Alice having to give away anything, and without Bob having to give away much], without, at any point, giving away her own digits at those locations.

Well, I suppose, if Bob were really feeling entitled to make the claim that he had the same number, he’d just transmit the number to prove it. But KtN’s method could be of some use if Bob wants to engage in damage control in the possibility that he actually does not have the same number. Which, I think, is the sort of thing being looked for. The key, though, is that there has to be some reason to believe that a large streak of hits is due to more than chance.

(Ideally, Alice and Bob would take alternating turns regarding who was asking for and who was receiving digits, and could/would stop the protocol as soon as they received a mistaken digit. You could also hash the secret first and then only exchange digits from the hash; information-theoretically, ignoring collisions, it’s all the same, but this can be preferable for reasons like those I said before)

FTR, the notion I mentioned was not intended to be iterated over and over, but simply done once, in an attempt to quickly eliminate the vast majority of non-matching numbers. At the absolute most, I would send it through two iterations. It just makes sense to me to spend a cycle (or two) of processing time up front in acknowledegment of the low odds of two numbers matching, which only get lower as the numbers get bigger.

Yeah, that occurred to me, too, some time after posting that. It’s one thing to know that the two numbers differ; it’s another to know in which bit (or bits) they differ. I still have a hunch, though, that the quantum solution is better than the optimum classical solution.

How about this: Alice and Bob both can both read English. Alice chooses some random portion of text from a random book, a paragraph or so. She uses hash functions to convert her number into a 256 bit AES key and uses that to encrypt the text. She sends the ciphertext to Bob. Bob uses the same process as Alice to convert his number into an AES key. Bob uses it to decrypt the ciphertext. If he can read the text then he knows their keys are the same, if he can’t he knows the keys are different. He could brute-force the ciphertext until he gets something readable but that would just give him the hash, not Alice’s original value.