Do computers have a monstrous file of prime numbers to pick from?
First of all, you will have noticed the program makes you bang on the keyboard, etc., to generate random bits, so you can tell it is not just deterministically picking numbers from a table.
Second, after the program picks a random number, it does not necessarily verify that it is actually prime! It is faster to run a few pseudoprime tests, leaving a possible (but negligible in practice) probability that the generated number is not prime.
The program will pick a number within some range and run a test on it. There is more than one test it can run: the Miller–Rabin primality test, the Fermat primality test, or the Solovay–Strassen primality test. If the number fails, the program tries again with a new candidate.
If something more certain is needed it can use the AKS primality test.
Basically, they don’t try to pick a random prime. They pick a random number, and test if it’s prime, and if it’s not, repeat until it is.
Even at the large values used for encryption, primes aren’t all that rare, especially if you start with a technique that filters out small factors (like, if you start with a number that’s not a multiple of 2, 3, or 5, that’s already eliminated 11/15 of all possible numbers).
It’s actually not all that hard to deterministically test with 100% certainty that a number is prime, but it’s even faster to use one of the probabilistic tests, which tells you the number is probably prime (with a chance of failure less than 1 in 10^k, for whatever large k you choose), so those are usually used instead.
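Here’s a minimal sketch of that generate-and-test loop in Python (the function names and the choice of 40 rounds are my own illustration; real crypto libraries are considerably more careful):

```python
import secrets

def is_probably_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test.

    A composite n survives one round with probability at most 1/4,
    so 40 rounds leave a failure chance below 4**-40.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2      # random base in [2, n-2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                       # definitely composite
    return True                                # probably prime

def random_prime(bits):
    """Pick random odd numbers of the given size until one tests prime."""
    while True:
        candidate = secrets.randbits(bits)
        candidate |= 1                 # make it odd
        candidate |= 1 << (bits - 1)   # force the full bit length
        if is_probably_prime(candidate):
            return candidate
```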
The more interesting question is how encryption techniques pick random numbers at all. Computers themselves can’t generate truly random numbers. Pseudorandom numbers (doing things like applying a convoluted function to the system time at the moment the function is called) are good enough for many purposes, like games, but not good enough for high-stakes encryption, because if an attacker can figure out what “random” numbers you picked, the whole scheme falls apart. So you need some way of getting a good source of true randomness. For a home user, you can usually find enough entropy in things like human input and tiny fluctuations in the processor temperature, but enterprise systems can end up doing things like pointing a webcam at a big shelf full of lava lamps.
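To make that concrete, here’s the difference in Python between a clock-seeded PRNG and the OS entropy pool (the timestamp is just an example value):

```python
import random
import secrets

# A PRNG seeded from the clock is fully reproducible: anyone who can
# guess the seed can replay every "random" value you generated.
rng = random.Random(1699999999)        # e.g. a Unix timestamp
print(rng.getrandbits(128))            # same output on every run

# For keys, draw from the OS entropy pool instead (keyboard/mouse
# timing, interrupt jitter, hardware noise sources, etc.).
print(secrets.randbits(128))           # unpredictable output
```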
I recently read this story about how someone recovered his lost Bitcoin wallet by leveraging that kind of vulnerability:
He forgot his password, but it was discovered that the tool he used to generate the password was dependent on the clock for randomness:
They fiddled with the clock and brute-forced password guesses based on likely characters, and eventually were able to regenerate the original password and unlock the wallet.
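The thread doesn’t give the tool’s actual code, but the general shape of that attack looks something like this hypothetical sketch (password_from_seed and the check callback are stand-ins I made up for illustration):

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def password_from_seed(seed, length=12):
    # Hypothetical stand-in for the flawed generator: it seeded a
    # PRNG with the current time and drew characters from it.
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def brute_force(check, start_time, end_time):
    # Walk the clock over the window when the password was created.
    for t in range(start_time, end_time):
        candidate = password_from_seed(t)
        if check(candidate):        # e.g. try unlocking the wallet
            return candidate
    return None
```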
And it’s easy to pick a number that’s not a multiple of 2, 3, or 5, just by creating your number in the form 30*n+1 (where n is some number created as DPRK suggests). For example, 543341 is a random number I just created by mashing the keys. It’s not prime. But 543341*30+1 = 16300231 actually is prime (which means I’m pretty lucky, but not all that lucky)
Well, you’d be better off picking 30n+m, where m is one of 1, 7, 11, 13, 17, 19, 23, or 29. You want a technique that can potentially hit all prime numbers in some range, all equally likely.
Of course, if you’re doing this by computer, you could pick a number modulo 210 that is guaranteed to not be a multiple of 2, 3, 5, or 7, or a number modulo 2310 that’s guaranteed to not be a multiple of 2, 3, 5, 7, or 11, and so on. I’m not sure where the optimal cutoff point is, here: each prime added to your exclusion list is only a small decrease in the chance of hitting a composite, but on the other hand, the list of usable residues modulo 2310 is also fairly small (there are only 480 of them).
True - I was oversimplifying. 30n plus m, where m is 1 or one of the primes from 7 to 29, is better.
P.S. I recently learned that the median 2nd largest prime factor of any integer is 37, which is just cool
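For what it’s worth, here’s a sketch in Python of that residue trick, generalized to any small modulus (wheel_candidate is my own illustrative name, and the size arithmetic is only rough):

```python
import secrets
from math import gcd

def wheel_candidate(bits, modulus=30):
    """Random number of roughly `bits` bits that shares no factor
    with `modulus` (so for 30: not divisible by 2, 3, or 5)."""
    residues = [r for r in range(1, modulus) if gcd(r, modulus) == 1]
    # For modulus 30 this is [1, 7, 11, 13, 17, 19, 23, 29].
    n = secrets.randbits(bits - modulus.bit_length())
    return n * modulus + secrets.choice(residues)
```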
I looked through the prime-generating code in a widely-used library. The code does include a list of primes less than 5000, and it checks divisibility by all of those. Then it does a Fermat test, Miller–Rabin, etc.
To generate a prime of a certain size, it picks a random number of that many bits, sets the low-order bit and one or two of the high-order bits to 1, and starts there.
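Putting those pieces together, a sketch of that kind of pipeline might look like this (not the library’s actual code; is_probably_prime is the Miller–Rabin routine sketched earlier, and I’ve truncated the trial-division list for brevity):

```python
import secrets

# Trial-division primes; the library reportedly uses all primes < 5000.
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def candidate_prime(bits):
    while True:
        n = secrets.randbits(bits)
        n |= 1                        # set the low bit: must be odd
        n |= 1 << (bits - 1)          # set the top bit: full size
        n |= 1 << (bits - 2)          # second-highest bit too, so the
                                      # product of two such primes has
                                      # exactly 2*bits bits
        if any(n % p == 0 for p in SMALL_PRIMES):
            continue                  # cheap filter before the real test
        if is_probably_prime(n):      # Miller-Rabin, sketched earlier
            return n
```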
Related question: if we try to do public key cryptography, of the kind that relies on the product of two primes, but we use probabilistic primality tests and one of our primes is in fact slightly composite, what would be the outcome? Would the resulting encrypted text be impossible to decrypt? Would it be trivial to decrypt even without the keys? Something else?
This demonstrates an interesting issue that keeps the spooks in the CIA and NSA up at night. The NSA (allegedly) collects all sorts of encrypted data and stores it for future reference, should they somehow find the key. So if some math whiz or AI figures out how to crack a code, then suddenly all previous encryption using that key (or technique) can be broken - much like the Venona project did over the years.
IIRC from an old lecture on the basic technique, essentially it relies on transforming a message using arithmetic modulo the product of two primes. So presumably if one number is not prime, it just makes it easier to crack the encryption.
The effect of using a composite where a prime is needed depends on which algorithm is being used. For RSA, the most common cryptosystem, I’m pretty sure that using a composite will break the algorithm, in the sense that encrypting a message and then decrypting the result will not always give you the original message.
The various proofs of correctness of RSA depend on the primality of the inputs (usually called p and q). For example, the Wikipedia article shows a proof using Fermat’s little theorem to replace m^(p-1) mod p with 1. But this equality does not hold if p is not prime.
For example, pick p = 37 (prime) and q = 77 (not prime) and the encryption exponent e = 5. Then
n = 37 × 77 = 2849
λ(n) = lcm(36, 76) = 684
d = e^(-1) mod 684 = 137
Trying to encrypt 200 we get E(200) = 200^5 mod n = 912.
But decrypting 912 does not give 200 back: D(912) = 912^137 mod n = 1495.
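You can verify that arithmetic yourself with Python’s built-in modular exponentiation (pow with three arguments; the modular-inverse form needs Python 3.8+):

```python
p, q, e = 37, 77, 5            # q = 7 * 11 is not prime
n = p * q                      # 2849
lam = 684                      # lcm(p - 1, q - 1), computed as if q were prime
d = pow(e, -1, lam)            # 137 (modular inverse, Python 3.8+)

c = pow(200, e, n)             # encrypt: 912
m = pow(c, d, n)               # "decrypt": 1495, not 200 -- RSA breaks
print(c, m)                    # 912 1495
```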
Perhaps it will “probably” give the original message, because if it didn’t, that would imply a better primality test: if you don’t get the original back, then you didn’t have a prime to begin with.
Sorry for bringing the debate down a notch (or 16) … but why is it important that the numbers in the encryption algorithm are prime numbers?
And how do encryption algorithms work in the first place (explain it to me like I am 15)?
Thanks in advance
I am no expert, but I have invoked encryption in a cross-platform Java-C# project (two quite different languages, though there are similarities).
The prime number thing is because recovering the primes from their product - factoring - is hard. The person sending “knows” the prime (they don’t, but…). The person receiving “knows” the prime (they don’t, but…)
What they both do know is how to make use of the prime numbers in the same way.
I hate to link to LinkedIn, which is hardly an authoritative source, but here we go:
For those who - like me, a software developer who uses cryptography on a daily basis - have some nervous distrust of how it all works:
It is magic.
(Somewhat tongue in cheek, but “magic” still seems appropriate)
Different algorithms use primes in different ways. Many of them rely on the fact that multiplying two primes to get a composite number is easy, but factoring a composite number into primes is hard. scudsucker’s link is a reasonable basic introduction but if you really want to understand the details of how RSA works, you could start with the Wikipedia article.
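If it helps demystify the magic a little, here’s a toy RSA round trip with deliberately tiny numbers (the classic p = 61, q = 53 textbook example; real keys are 2048 bits or more):

```python
# Toy RSA -- fine for seeing the mechanics, hopelessly insecure
# for real use. Needs Python 3.9+ for math.lcm.
from math import lcm

p, q = 61, 53                  # two (small) primes
n = p * q                      # 3233: the public modulus
lam = lcm(p - 1, q - 1)        # 780
e = 17                         # public exponent, coprime to lam
d = pow(e, -1, lam)            # 413: the private exponent

message = 42
ciphertext = pow(message, e, n)      # anyone can do this with (n, e)
recovered = pow(ciphertext, d, n)    # only the holder of d can undo it
assert recovered == message
```

Multiplying p and q is instant; recovering them from n = 3233 is easy here but, at 2048-bit sizes, is exactly the factoring problem the scheme rests on.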
This is why perfect forward secrecy is important. When PFS is used (such as for talking to this very website) cracking the private key (the secret key on the server) does not give access to any previous encrypted data, because that is all encrypted with unique session keys. Cracking any individual session key only gives access to encrypted data from that session, and no others. I think for web servers session keys are typically rotated every 1-24 hours.
What I wonder is why they don’t use pi or e to generate random numbers. Both numbers’ digits at any given place in the infinite sequence look random, so taking all the digits from place, say, 14,927 to 15,386 gives you a random 460-digit number (or any other length you want, preferably a much larger one). If this is still deemed too unsafe because the digits of pi and e are publicly known, you could take every second digit, discarding the others, or take them in the sequence a, c, d, b, a+4=e, c+4=g, d+4=h, b+4=f, a+8=i… over and over, and then, if you wish, even read the result backwards. It would still be truly random, IMO. If you put the pseudo-randomness into where to start and end (the 14,927 to 15,386 in my example), the rest of the resulting sequence would be random. So why not proceed that way?
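Just to make the proposal concrete, here’s what it looks like in code, using the mpmath library to produce digits of pi (this only illustrates the idea; it says nothing about whether those digits are safe to use as key material):

```python
from mpmath import mp, nstr

def pi_digits(start, end):
    """Decimal digits of pi from position `start` to `end`,
    1-indexed, counting after the decimal point."""
    mp.dps = end + 10                   # working precision, with margin
    s = nstr(mp.pi, end + 5)            # "3.14159..." as a string
    after_point = s.split(".")[1]
    return after_point[start - 1:end]

chunk = pi_digits(14927, 15386)         # the range from the example above
print(len(chunk), chunk[:20])           # 460 digits
```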
I use the paint mixing analogy to try to explain it.