How do encryption algorithms pick prime numbers?

The challenge would be that the program would have to have a random way of coming up with the starting and ending positions of pi. If a hacker can figure out how the program picked the positions 14,927 and 15,386, then the hacker can figure out which digits of pi the program used. If the program uses something like the system clock to pick those positions, then a hacker can set the clock back to that point in time and produce the same 14,927 and 15,386 values that the program used initially.

I don’t want this to be a hijack, but it’s related and I’m extremely curious.

If you know the “seed” number to a random number generator, and you put in that same seed number today and a year later (and the program is the same), it will generate the same “random” numbers in the same sequence both times, right? Or would it treat the same seed number differently? Or am I missing something extremely fundamental =)

You (by definition!!) cannot use any algorithm to generate random numbers. Pi, e, doesn’t matter. That is why a random number is chosen using random bits, coming from having the user bang on the keyboard, jiggle the mouse, take pictures of lava lamps, or any other unpredictable sources that are used to accumulate “entropy”.

Once you have a source of random numbers, you are ready to go on and generate random primes, RSA keys, or whatever you need.

This statement is false as there is nothing random about the digits of pi, nor AFAIK is it proven that it is even a normal number. There are ways of cryptographically “stretching” a short random key to make it longer, (not involving pi however).

Using a well-known value like pi or e would make it vastly easier for an attacker to reproduce the random number. Say your algorithm to generate 100 random digits is to pick 100 consecutive digits from the first billion digits of pi. Now the attacker only needs to test a billion possible values: 109 possibilities instead of 10100 as they would need to do for a truly random 100 digits. Adding in some ad hoc mixing like using every K-th digit only multiplies the number of possible values by a small constant.

ETA Note that only 100 trillion (1014) digits of pi are known. Most cryptographic application need random numbers that are much longer than 14 digits, so picking your random number from digits of pi greatly reduces the search space.

Yes that is correct. An algorithmic random number generator will produce the same output given the same input. That’s why if an algorithmic RNG is used for cryptographic purposes, the input must be high-quality random data, coming from a non-algorithmic source.

Wait, so it’ll actually spend random bits to generate a number that might even be a multiple of 3, and hence have to be discarded? That sounds incredibly wasteful, to me.

Oh, there are certainly “better” primality tests, in the sense that they’re guaranteed to work, while still being relatively quick and easy to perform. They’re just not as quick as the almost-guaranteed ones, which matters when you’re just discarding numbers until you find one that works (you might need to test many numbers). I suppose that if you really wanted to be sure, then once you got a number that passes all of the probabilistic tests, you could then use one of the definite tests on the number. But realistically, it won’t matter, because you could keep running those “almost definite” tests for the lifetime of the Universe on every computer ever made, and still never see them fail.

This, I think, warrants its own post. The most important concept is the idea of everyone having two keys, a private key and a public key. If you want to do cryptography, you first randomly generate a private key, and tell it to nobody. No, not even them. Then, you do some easy math on your private key, and use it to generate a public key, and tell that to everybody. Now, anyone who wants to send you a message can use your public key (which they know) to encrypt it, and you can use your private key (which you know) to decrypt it, but since nobody else has your private key, nobody else can decrypt it.

There are some things that are necessary, in order for this to work. First, the private keys have to be truly random, because if anyone can guess your private key, they can read everything meant for you (and the fact that the correct private key will produce readable messages, while an incorrect one produces gobbledegook, makes it very easy to test if your guess is correct).

Second, it has to be (relatively) easy to generate the public key, given the private key. You only have to do this step once per user, so it’d probably be OK if you needed to run a PC for an hour to generate it, or the like, but it can’t be a “takes longer than the age of the Universe” sort of thing. As it happens, in most of the standard techniques, it is actually easy enough to do it by hand, even.

Third, it needs to be really, really hard to generate the private key, given the public key. It will always be technically possible, but this step can and should be “longer than the age of the Universe” hard.

Fourth, of course, the public key needs to be usable to actually encrypt messages, and the private key needs to be usable to reverse that encryption.

None of this inherently involves primes. But the first technique that was developed that does meet all of these requirements, the RSA algorithm, happens to use them (though there have been other methods developed since then that work other ways). In RSA, the private key is (mostly) a pair of large prime numbers, and the public key is (mostly) the product of those two large prime numbers. Multiplying two numbers together, even large numbers, is easy. You can do it by hand, and a computer can do it quickly. But there’s no known good method for finding the factors of a number, if the factors are all large.

As an added benefit, by the way, you can also use the RSA technique the other way around: I can use my private key to encrypt a message, and then anyone in the world can use my public key to decrypt the same message, and thus be certain that the message was actually sent by me (or at least, by someone who knows my private key, but that should only be me). I don’t think that all of the other methods have this property, but don’t quote me on that.

Why is it so wasteful? If the number happens to be a multiple of 3 (or 5, or…), add 2 and consider that number. Only after trying 10000 or so times does it re-roll and start again at a random point.

PS they could (but, apparently, do not?) mark off, say if the candidate happens to be divisible by 3, the values p+6, etc. to not check for divisibility by small primes. I suppose the authors consider this not worth the effort?

If we can radically improve the performance, keep in mind that all this is open source :slight_smile: Remember, also, that we do not want just any prime, we want a random prime of a specified-in-advance magnitude (number of bits).

They could do that, but then they’re not getting all of the possible primes in their chosen range with equal probability. It’s a small weakness, true, but it’s also an entirely avoidable one.

Oh, sure, that part is easy. My 30n+m method, where m = 1, 7, 11, 13, 17, 19, 23, or 29, can do that easily, with appropriate choice of n, while still picking all primes with equal probability.

Most public key systems allow signature generation by encrypting with the private key. McEliece was originally thought to not allow signatures, but the Niederreiter variant of McEliece does. I seem to remember that there are some others public key encryption schemes that don’t allow signing but can’t recall them at the moment.

Looking at a different library (not the one in the crypto library) in which they do care about ensuring a uniform distribution of primes, but not necessarily about efficiency, it really does choose a random integer each time and tests it.

I asked it to come up with a 2048-bit prime and it did not take too long to output

32195036273566555202323673919075528613846258971353293149673646142136505548368686091280860756013625541093429947363149852423042234843267843391178349089494666313743312143824993885967811522793203504437192257907981461639336004079655033712193347306477181827157070253931268571078008407720054657932390287126151150701005263844857055095086641904712854521623850736215054933891053647261500920096243404246736515605528185297867444135749230527542057580463348975902644896837361661635260167484941867737610149355523681457889767465491215560636555383604054947163146698057823252486707589649694424791125594983171856136278286782302573056769

I am not sure how many random bits were consumed, though.

Pretty far off there. The core inner problem is factoring, not prime testing. If you generate two large primes and multiply them, it’s (often) hard given the product to find the original primes. But given the two primes it’s easy to find the product. That sort of asymmetry is basic to cryptography…

As noted, you can find large numbers that are “good enough” pseudo-primes with modest effort and if you really put the pedal down you can be quite sure.

People who work in coding up any program that relies on crypto security have to have a really, really, amazing depth of knowledge on the subject. For RSA this includes all sorts of ranges and tests to avoid known attacks. (And of course who knows how many currently known but secretly held, or future attacks there might be.)

Sadly, there are far too many self-taught amateurs coding things our there and the result is amazing holes all over the place.

The current rec. for key length is 2048 bits. This is long enough for security for the next several years but not next several decades. It also requires enough computation that actually using all RSA for communications is impractical. Instead, when it is used, it’s only for an initial key exchange to start a faster system. It’s really not a mega-dominant crypto system like many people think.

One of my profs expressed it as: What is the probability that Fermat’s little theorem is wrong and in the 350 years since it was proved, not a single mathematician has noticed the error? The probability that this number is not prime is less than that.

Yeah…people on my team get frustrated at times, when I insist that none of us have the knowledge and expertise to make real product security decisions. I insist that we involve our security team. People without the proper training simply don’t know what they don’t know in this space.

I recall when the internet and its obsession with encryption was just taking off, there were a number of products sold that were very quickly broken. IIRC, one was a database of “questionable sites”. The person writing the article explaining this pointed out how incredibly simple the encryption was, easily broken, and made the point that too many programmers think they are clever than they are (surprise!). That without a good grounding in number theory and primes and cryptography, they do not realize all the holes in their clever self-made schemes.

I’m sure I’m misunderstanding something, but do you mean that the product of those large prime numbers is the key?

The public key, that everyone knows, is the product. The private key, that only you know, is the two individual numbers. This makes it easy to find the public key given the private key, but hard to find the private key given the public key.

Thanks for that - I had assumed (mistakenly) that the word “key”, with “key” being in the singular, meant that likewise, would be comprised of only one number.

That’s the case with traditional encryption (i.e. every form known before (IIRC) the 1970s).

The traditional systems are still important – public-key systems generally use a traditional encryption algorithm to encrypt the bulk of the data using a randomly-generated key, and use the public-key system only to encrypt the key so that the recipient can securely obtain it and decrypt the message. It’s done that way because there are secure (as best we can tell) traditional algorithms that use much less computer power to encrypt a given amount of data than any known public-key system.

Technically both the public and the private key have two “numbers”. Conventionally (e, n) for the public key and (d, n) for the private key. Where n = pq is the product of the two big primes.

But to a certain degree you are allowed to pick either e or d arbitrarily and then compute the other, so a lot of implementations always pick e=65537 as the non-modulus part of the public key, which makes the situation seem less symmetric than it is.