Two public key cryptography questions

Specifically, I have a question about the RSA method. I read The Code Book a while back, which gave a pretty good explanation of how RSA and Diffie-Hellman-Merkle exchanges work, but there was one detail that seemed noticeably missing to me.

In a nutshell, a person’s public key is the product of two very large prime numbers. The two prime factors are then used as the private key. The private key can be derived from the public key, but only by factoring the public key, which is so computationally expensive that for a very large number, it’s practically impossible (we hope). Likewise, the encryption process is a one-way function that is very nearly impossible to reverse without knowing the two prime numbers that were used to create the public key. Ok, that’s all great.

First question:
All of the descriptions I’ve read say something like “Alice simply picks two giant prime numbers and multiplies them”. But how do you “simply pick two giant prime numbers”? I’m not a mathematician, so maybe I’m missing something obvious, but it seems like the process for checking for “primeness” would be virtually identical to the process for prime factorization. How can you quickly generate a very large prime number without also being able to quickly factor very large numbers?

Second (unrelated) question:
The distribution of primes is, on a large scale, predictable. That is, with our present understanding we can’t predict exactly where the next prime will occur, but the density of primes near a number N follows a simple pattern: roughly 1/ln(N). In short, there are infinitely many primes, but they get more spread out the higher we go. In other words, primes are more frequent between 1 and 100 than between 200,000,000,000 and 200,000,000,100.
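(If you want to see that density claim concretely, here’s a quick sketch in Python, entirely my own; it just counts primes by trial division in the two intervals mentioned above and compares the counts to the 1/ln(N) estimate.)

```python
# A quick empirical check of the 1/ln(N) claim.  Trial division is plenty
# fast for numbers of this size; it is nowhere near fast enough for
# cryptographic-sized numbers.
from math import isqrt, log

def is_prime(n: int) -> bool:
    """Deterministic trial-division primality test (fine up to ~10^12 or so)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

for lo, width in [(1, 100), (200_000_000_000, 100)]:
    actual = sum(1 for n in range(lo, lo + width) if is_prime(n))
    estimate = width / log(lo + width)   # prime number theorem: density ~ 1/ln(N)
    print(f"[{lo}, {lo + width}): {actual} primes, ~{estimate:.1f} expected")
```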

This made me think about the future of public key cryptography. Computers are getting faster, but it still takes longer and longer to find each new prime as the numbers grow. At the same time, we keep increasing the key length (i.e. using larger pairs of primes) to stay out of reach of the faster computers’ ability to factor large numbers. But is it an equilibrium? Will there come a time at which we can’t generate new primes fast enough to keep our heads above water?

I’m trying to imagine how a very large organization would go about trying to crack RSA. You could go about it directly by trying to factor the public key and derive the private key for decryption. But what if you precompiled multiplication tables for all known primes? Then you wouldn’t have to do factorization at all - you could just look up the public key in a database and see what its factors were.

It would be a monumental undertaking - I doubt it’s even possible, if only because of the storage space required. Every new prime would add as many new entries to the database as there are primes already in it. And searching that much data, even if it were indexed, might take longer than just doing the factoring. But traditionally, precompiling data tables like that has been used to drastically speed up cracking times. I haven’t even tried to calculate how many possible keys there would be, so I may be way off-base here. Maybe the size of the problem is still so large that even an organization like the NSA can’t do what I’m talking about, but I’m curious whether this will ever be a problem. Will we reach a point at which we can calculate all the new key pairs a newly discovered prime provides before the next prime is found?

Ok, forget about the idea of precalculating a table of keys and their corresponding factors. I finally found someone who bothered to do the calculations, and that would be essentially impossible. Just to store all of the unfactored 200-digit (decimal) numbers would require ~5.985x10^203 bits of storage. If you could store things at a density of 100GB per millionth of a gram, the total mass required for the storage would be about 10^56 times greater than Chandrasekhar’s limit, and quite possibly greater than the mass of our galaxy. Regardless, it would collapse into a black hole. Phooey.

Granted, my idea would only store those numbers that were products of two primes, but on the other hand, RSA key lengths can easily be 4096 bits with existing technology, which is quite a bit more than 200 decimal digits. Even if you had a fabled quantum computer, you’d probably still be better off using it to factor numbers instead.
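(For the bit-vs-digit comparison, a quick check: a 4096-bit modulus runs to roughly 1,233 decimal digits.)

```python
# A 4096-bit number n satisfies 2**4095 <= n < 2**4096, so it has
# floor(4096 * log10(2)) = 1233 or 1234 decimal digits.
from math import log10
print(round(4096 * log10(2)))   # -> 1233
```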

The rest of the question still remains, though. Will we eventually be able to factor large numbers faster than we can generate new primes?

No. If you can factorise a number, you can determine whether or not it is prime. On the other hand, there are tests that will tell you a number is not prime without giving you any specific factors. The primality problem is strictly contained within the factorisation problem.
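A concrete illustration of that last point (my own sketch, not anything from the thread): Fermat’s little theorem says a^(n-1) ≡ 1 (mod n) whenever n is prime and a is not a multiple of n, so a single modular exponentiation can prove a number composite without revealing anything about its factors.

```python
# Proving compositeness without producing a factor (illustrative sketch).
# Fermat's little theorem: if n is prime and a is not a multiple of n,
# then a**(n-1) % n == 1.  A base that breaks this rule proves n is
# composite, yet it tells you nothing about what n's factors are.
n = 221   # 221 = 13 * 17, but the test below never discovers that
a = 2
if pow(a, n - 1, n) != 1:
    print(f"{n} is definitely composite (witness a={a}); factors still unknown")
else:
    print(f"{n} might be prime (base {a} is inconclusive)")
```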

Most (all real-world?) RSA implementations do not generate “real” prime numbers but numbers that are prime with an insanely high probability. This can be done much faster than generating provable primes with a mathematically rigorous primality test, let alone by factoring. So you might be using a method that gives you a corrupt key once every one million years, but you take that risk in exchange for a huge key that will work in practice and is extremely hard to factor (and factoring has to be exact, or else the result is completely meaningless).

This isn’t quite right. In RSA, for example, you have to perform a calculation using the primes to generate both the encryption and decryption keys. Then the primes are discarded. They’re used in generating the key pair, but they aren’t themselves part of the key pair. In RSA, you pick two primes, p and q. You calculate n=pq. Then you randomly choose an encryption key e which meets certain requirements and calculate a decryption key d using the extended Euclidean algorithm. The public key is e and n and the private key is d. The individual primes are never used again. If that’s already clear and you just simplified your OP, ignore my digression.
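To make that concrete, here’s a toy sketch of exactly those steps in Python, with deliberately tiny primes (real keys use primes hundreds of digits long, and real implementations add padding and other safeguards this leaves out):

```python
# Toy RSA key generation following the steps above -- tiny primes, no padding,
# for illustration only.
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

# 1. Pick two primes p and q (toy-sized here).
p, q = 61, 53
n = p * q                    # 3233, part of both keys
phi = (p - 1) * (q - 1)      # 3120, used only during key generation

# 2. Choose an encryption exponent e with gcd(e, phi) == 1.
e = 17

# 3. Compute d, the inverse of e mod phi, via the extended Euclidean algorithm.
g, x, _ = extended_gcd(e, phi)
assert g == 1                # e and phi must be coprime
d = x % phi                  # 2753

# 4. p, q and phi can now be discarded: (e, n) is public, (d, n) is private.
message = 65
ciphertext = pow(message, e, n)
print(pow(ciphertext, d, n))  # -> 65, the original message back
```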

As kellner said, there are a lot of algorithms for generating probable primes. For example, PGP (an implementation of RSA) uses four Fermat tests to check the primality of a randomly chosen candidate, and a candidate which passes all four tests has only a 1 in 10^52 chance of being a false positive (i.e. not actually prime).

I haven’t thoroughly reviewed this site so I can’t vouch for its accuracy, but it seems to have a good description of the key generation in PGP and addresses some of your questions.

I’d recommend Bruce Schneier’s book “Applied Cryptography” if you’re interested in the details of these implementations.

The test for primality of a large target number more or less goes like this:

Pick a random number and do a certain calculation on the target. All primes will pass; composites will pass with at most a 50/50 probability. If it fails, chuck the target, get a new one, and start over. If it passes, repeat the test cycle over and over until you are semi-happy. Different people suggest different numbers of cycles, but 100-200 ought to do it. (It all depends on how paranoid you are.)
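Here’s that loop sketched in Python. The description above doesn’t name the “certain calculation”, so this uses a Miller–Rabin round, one common choice; for that particular test a composite slips past a single round with probability at most 1/4, so 50/50 is a conservative bound.

```python
import random

def miller_rabin_round(n: int) -> bool:
    """One probabilistic round: True = 'passes' (maybe prime), False = composite."""
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    a = random.randrange(2, n - 1)      # the random base for this round
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(r - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False                        # a is a witness: n is definitely composite

def find_probable_prime(bits: int, rounds: int = 100) -> int:
    """Keep drawing odd candidates; return one that passes `rounds` rounds."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1  # right size, odd
        if all(miller_rabin_round(candidate) for _ in range(rounds)):
            return candidate

print(find_probable_prime(512))
```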

About 10 years ago, a complementary test was developed: all composites pass it, and primes get caught (i.e. proven prime) with a certain probability.

If you run both tests together, eventually, one of them will tell you for sure if it’s prime or composite. The only variability is how long it takes. Extremely high odds of getting a reliable answer in a very short time.

You have to be careful in comparing value vs number of bits. Some people use n for both and they end up all turned around.

For 2048-bit numbers, about 1 in 1419 numbers is prime. So the primality check will go through a lot of bad candidates. But half of the composites fail after the first test cycle, another 1/4 fail after 2 test cycles, etc. Since this is a one-time setup for generating keys (as opposed to something you do every time you want to encrypt), that’s not too bad. Still, you’re going to run through 1500+ test cycles in total.
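Where the 1 in 1419 figure comes from, as a last sketch (real key generators also skip even numbers and sieve out small factors first, which cuts the candidate count well below this):

```python
from math import log

bits = 2048
# Prime number theorem: the density of primes near 2**2048 is about
# 1 / ln(2**2048) = 1 / (2048 * ln 2).
one_in = bits * log(2)
print(f"roughly 1 in {one_in:.1f} {bits}-bit numbers is prime")

# So a key generator expects to draw on the order of 1400 candidates
# (about 700 if it only tries odd ones); most are rejected after a round
# or two, and only the final survivor gets the full 100-200 rounds.
print(f"expected candidates: ~{one_in:.0f} (or ~{one_in / 2:.0f} odd-only)")
```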