I found this article very interesting. As I understand it, the thrust of the piece is that the NSA has been in cahoots with at least two chip makers (Intel and Via) with the goal of making the output of those companies’ random number generator seeds somewhat ‘less than random’ i.e. “lessening their entropy”. The idea being, of course, that the folks at the NSA being aware of the nature of the seeds’ relatively low entropy would thereby be able to exploit it - essentially a back door to the RNGs.
My question is not about the NSA or the legality (or morality) of its operations.
Rather, I am a curious to learn why anyone who is serious about their privacy, and wishes to encrypt their secrets such that they are, effectively, unbreakable using current (and foreseeable future) software/hardware would choose to use any source other than their own “locally-generated” data to seed and operate their RNG.
Phrased differently, given that the output of RNGs depends critically on their seeds, the absolutely pivotal, essential nature of the seed’s randomness seems obvious. So, why would you ever use someone else’s seeds for your RNG-generating algorithms? Why not use seeds generated in and by your own local environment (e.g. some function of the heat produced from your computer itself, or even better, seeds derived from the radioactive decay of some locally-held radioisotope)?
Is seed production as alluded to in the preceding paragraph not easy to do? Expensive? Difficult to maintain? What is it that prevents companies, organizations and the like from generating their own seeds given the huge risks inherent in using those produced by others (as the linked article neatly reveals)?
Unless you use radioactive decay, there will never be any truly random seed (or random number.)
In addition, a lot of people have no clear idea about this kind of thing at all. “password” is still an incredibly commonplace password. (A radio show I was listening to mentioned that “letmein” was also used an awful lot.)
I used to base my pseudo-random-number seeds on my PC’s millisecond clock, until I was advised this isn’t as good as it might seem. How was I to know? The science is advancing right under our feet!
That is in fact how HW RNGs work–they use some circuit which experiences random variation from thermal fluctuations. There are several varieties but here is Intel’s approach.
The problem is that because the design is closed, there’s no way to verify that results haven’t been messed with somehow. RNGs require digital conditioning to remove bias, among other things. It’s here that there is a conceivable backdoor.
Sure, someone could build their own RNG. It’s actually incredibly easy. But most companies just don’t have the inclination or expertise. So like anything else they pay for someone else to provide the service. And again like anything else there’s the risk of a violation in trust. After the leaks, no one trusts HW RNGs any more, so the software makers are adapting.
As I understand it, developing a true random number generator is very difficult. Proving that it is truly random is tough as well. Showing a given set of numbers isn’t random is pretty easy though. The point is that hardware is the best way to generate a random number, and doing that efficiently and provably takes a lot of effort. If Intel is going to do all the work, why not use it? Now that NSA may have gotten into the act, people aren’t willing to trust it. But just doing it by yourself is tough.
But if you know where the sequence starts, the numbers are completely predictable, hence not random. That’s why they’re pseudo-random number generators. Bad seeds cost the Montreal casino over $600,000 in a hard-earned lesson about randomness.
There’s been some disagreement in the response about whether RNGs are “easy” to build or not, and also about whether the seed or the algorithm is key. I will appeal to authority and quote the Intel engineers from the article linked above (and again below in this post) who say it’s the seed that counts: “Researchers have managed to devise pseudorandom-number generators that are considered cryptographically secure. But you must still start them off using a special seed value; otherwise, they’ll always generate the same list of numbers. And for that seed, you really want something that’s impossible to predict.”
One of the comments on that article, the comment already two months old, was interesting: “NSA have been colluding with Intel to develop this RNG. So don’t use any output from it or they’ll break your encryption. Never trust a closed design, closed source black box for your random numbers.”
I can’t say understood the techniques described in the article but, even from my position of ignorance, ISTM they are doing a lot of ad hoc ‘correcting’ of the randomly generated signal. That seems like an invitation for either introducing some non-random element or, what actually did happen, allowing someone to deliberately screw with the process.
The seed is more important than the algorithm. With an ideal seed, it wouldn’t matter what algorithm you used, as long as it wasn’t idiotic or deliberately sabotaged. With an ideal algorithm, though, a predictable seed will still lead to predictable results.
The catch is that improving your algorithm is usually a lot cheaper and easier than improving your seed. Better algorithms are just software, and can be run fairly quickly, but better seeds usually means custom hardware, which is generally pretty slow.
You make both a valid point and an invalid point. True random number generators were once hard to imagine but anyone can buy them or even make one these days. You can buy radioactive decay detectors that simply plug into any computer via a USB port for a few hundred to a few thousand dollars if you want your own equipment or you can use a web service that does it for you. Given no hijinks or NSA involvement, those solutions will give truly random seeds.
The practical problem is that most people don’t care enough about true random numbers to use those solutions. Pseudo-random numbers were almost always good enough for most practical purposes until the U.S. government and NSA decided they could dedicate massive resources to cracking whatever key they want to. Most security solutions still depend on pseudo-random seeds and those are crackable especially if they are intentionally predictable through a conspiracy like is being described (that is an example of a true conspiracy BTW; a few of them really are true and the NSA is guilty as they come).
When I was growing in the 1980’s, people used to tell stories about the lengths the Soviet Union KGB would go to to spy on their own people. Now, we have it to courtesy of the NSA. I haven’t heard from Russian comedian Yakov Smirnoff in a while but he isn’t going to have to make new jokes. “In Soviet Russia, TV watches You!” is now “In free America, computer watches You!”.
Still, anyone that is tech savvy and wants to go through the effort can beat any of the NSA’s cracking efforts. The strong defenses are unbeatable through known technology as long as you have a true random number generator and a trusted algorithm.
One significant factor is the throughput you need. A gigabit/second RNG is fairly difficult. A few hundred bits/sec are easy, though, and good enough to use as a seed. A Zener diode, resistor, and ADC accomplishes this easily.
The postprocessing is often called “whitening” and can eat into your throughput. Suppose your RNG is biased and produces 1 bits 2/3 of the time, and 0 bits 1/3 of the time. Obviously that’s not acceptable, but if you instead take a bunch of bits and XOR them together, you can asymptotically approach an even distribution. Of course that reduces your throughput, but if all you need is a 256-bit seed on program startup, you can afford to waste some bits.
Intel vaguely describes their whitening algorithm–they say that they split each 512-bit chunk into two 256-bit chunks, then apply some algorithm to combine the two. I dunno if they describe the algorithm in any more detail somewhere else, but even if they did, the design is still closed and hence impossible to verify.
My point stands: a SEED does not need to be RANDOM, just different. As long as you can guarantee you aren’t re-using seeds (ever or often) then you won’t need to worry about how random the seed is, the PRNG algorithm provides the randomness.
Unless I am mistaken, most of the posts in this thread misunderstand the nature of the “backdoor.”
A random sequence f[sub]t[/sub] = f[sub]t[/sub](seed, PRNG) depends on a pair (seed, PRNG) with the PRNG published and the seed a one-time random throwaway. If the seed is exposed to eavesdropping, the entire sequence is exposed, but that’s not how the exploit in question works.
Instead the exploit observes the random sequence f[sub]t[/sub], knows what PRNG is, and exploits the computational weaknesses of PRNG to deduce the seed. This means that knowing some past values of f[sub]t[/sub] (t < T), allows one to know its future values as well (t > T).
I think.
What the NSA did was to encourage use of a PRNG for which they knew how to do the reverse computations needed to recover the seed. I am certainly not an expert on cryptology, but IIUC a simple way to make such reverse computation more difficult is to simply run two different PRNG’s with different seeds and XOR their results together. Lack of this and other safeguards led experts to deduce that Dual EC_DRBG was deliberately crippled from the beginning; the fact that NSA can decrypt these streams is probably “old news.”
This is sort of true, but easily misunderstood as well. There are all sorts of evil subtleties.
In cryptography you always assume that the eavesdropper knows all the algorithms in use. Thus you assume that the various government agencies know what PRNG you are using, and what encryption.
If you know what the PRNG is, you can guarantee to always get exactly the same sequence of numbers if you know the seed. Thus the PRNG adds exactly zero additional security. The use of a PRNG comes about in communication because of mundane issues. However the number that the PRNG is seeded with is the cryptographic key. The entire security of the communication depends upon the seed being secure. If you can guess the seed, you break the encryption. If the search space of the seed is artificially limited it becomes viable to break the encryption.
The constraints on the seed are clear - never ever reuse them. There are attacks that can recover the seed if two separate messages use the same seed. The seeds must be impossible to guess. Yes they must be different, but there are only two choices in choosing a new seed. A process that is deterministic to some extent, or a process that no determinism, and is truly random.
That gets us back to the first assumption. We assume that the eavesdropper knows all our algorithms. So, and way of choosing a new seed that is in any way deterministic is by definition algorithmic (there is some form of expression that captures the deterministic component). The eavsdropper knows any such expression, and thus is able to use this knowledge to limit the search space of seeds. The only way to prevent such an attack it to choose a truley random seed. This is expressed as the amount of entropy in the seed. A perfect seed choice has as much entropy as the maximum information content of the seed. That is really hard. So we at least set limits on the minimum amount of entropy we will accept in a seed.
So getting back to the accusation. If the process implemented in the chips artificially limited the space of random numbers generated, you have a clear backdoor. Such a backdoor could be very hard to detect.
The problem with random numbers and psuedo-random numbers is that in order to prove them non-random you need to find an autocorrelation mechanism. Simply put, given the history of previous numbers can you make a prediction about the next number to be generated that is better than pure chance. For psuedo-random numbers generators the answer is always yes - but actually finding the algorithm is arbitrarily hard. Once you find it, there is no argument. But finding it can be as hard as breaking the encryption. What passes for a true random number source could be perverted in a manner that is essentially impossible to detect. The sun would go cold before you managed to generate enough numbers and search all the possible algorithms. But, if you knew the algorithm, you could prove it was perverted instantly. Reverse engineering the chip might get you there, but that isn’t easy. A stochastic perversion, one that simply provides a few billion possible answers that the eavesdropper knew to search across would be as good as a deterministic algorithm in the chip as a backdoor, but even harder to reverse out and prove the existence of.
That depends on how often you re-seed your PRNG. If you’re really concerned about security, then you don’t produce any more bits of output than you input in your seed. In this case, it doesn’t matter if someone can use a known output to determine the seed used to produce that output, because you’re not going to be using the same seed again. It’s only when you get greedy and run the PRNG repeatedly, using its previous output as the seed for the next run, that this becomes an issue. Of course, true randomness is expensive, and you often don’t need nearly that much security, so sometimes this is justified… Just make sure you know the risks, and can accept them.
But, isn’t it also true that changing the seed introduces the possibility of reducing the randomness of the generated sequence?
The more frequently you change the seed, the more you are shifting your reliance on pseudo-randomness from the PRNG algorithm to your seed algorithm.
At the extreme, if you changed the seed for every new PRNG number needed, you’ve just eliminated the PRNG algorithm and created a new PRNG algorithm that is probably not nearly as random.
There are also commercially available solutions that use single-photons randomly being transmitted or reflected by a beam splitter, such as ID Quantique’s QUANTIS.
If your seed algorithm is truly random then there is no need for a PRNG, you already have a real RNG.
So the assumption is that the use of a PRNG is because you don’t have a truly random sequence of numbers.
If that is the case, then the PRNG is mathematically constructed to be substantially close to (but not actually) random.
Because there was a lot of thought and testing that went into the PRNG to confirm randomness, if you start monkeying with the seed frequently and non-randomly, the result could be a reduction in the randomness achieved by the PRNG.
At the extreme, someone non-randomly “seeding” their PRNG for every new number generated has possibly just eliminated most of the value of the PRNG.