Is it inherently impossible to generate truly random numbers using computers?

This quote makes it sound like a near law. Is it, or have methodologies progressed since then?

I don’t know that it’s ever been impossible. You could simply measure line noise on a sound card output, something thermal, or anything else that is unpredictable, and base your algorithm on that.

After reading von Neumann’s quote, I notice that he words it differently than you did, and he’s probably correct. If I understand what he’s trying to say, you could word your question to be “Is it inherently impossible to generate truly random numbers using software?”, in which case I can’t think of a good way. Keep in mind that many people use the internal clock to pass a seed value to their pseudo-random number generator, and I think that you could debate whether that’s truly software based.

The operative phrase is “based solely on deterministic computation”. The quote is still correct. There are sources of randomness in modern computers, which can be used to generate truly random numbers. See the Wikipedia article on /dev/random.
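For what it's worth, pulling bytes from that pool takes only a couple of lines. A rough sketch, assuming a Unix-like system (os.urandom is the portable way to get the same kind of OS-gathered randomness):

```python
import os

# Read directly from the kernel's entropy pool (Unix-like systems only)
with open("/dev/random", "rb") as f:
    raw = f.read(16)

# Portable equivalent, works on Windows as well
portable = os.urandom(16)

print(raw.hex())
print(portable.hex())
```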

The objective/limitation is to have something that is purely arithmetical, and not based on measuring random physical phenomena. Plus, I’m not sure how truly “random” sound card static would be. I suspect it would have certain subtle biases if truly put to the test.

It’s possible to sample noise from somewhere, but that isn’t what von Neumann was talking about. Von Neumann was saying that it’s impossible to take a deterministic computation and expect it to produce a random output, because it will always produce the exact same result. That much is true and will always be true.

Taking some random number and putting it through a deterministic computation will give you a random number. That’s what people are talking about when they mention sound card noise or measuring the timing between keystrokes: Both of those things are, to the computer, numbers, and they can be considered random for most purposes. However, and this is the main point, the input is not deterministic, so any computation based on that input cannot be deterministic. Von Neumann’s statement doesn’t apply in this case.

In that case, it is indeed impossible. Any arithmetic algorithm will, given the same input, always produce the same output. That output may be completely indistinguishable from true randomness by any statistical test you can think of, but it will not be random in the sense of being unpredictable, because anybody who knows the algorithm and its input values can produce the same output.

Sure, but you wouldn’t copy the output from the sound card straight into your cryptographic application or whatever you need true randomness for. There exist all kinds of interesting techniques to get lots of high-quality randomness from a poor-quality input source.

As a simple example, let’s say that you have a source of nondeterministic bits, but 80% of them are 1 and only 20% are 0. But what you want is 0 and 1 with a 50/50 distribution. The answer is simple: sample two bits in a row; if they both have the same value, discard them and sample another two. If their values are 0 and 1, consider your input bit to be 0. If their values are 1 and 0, consider your input bit to be 1. Similar techniques exist to get rid of other kinds of bias.
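Here's a rough sketch of that trick, with the biased source simulated rather than read from real hardware, purely as an illustration:

```python
import random

def biased_bit():
    # Stand-in for a real noisy source: returns 1 about 80% of the time
    return 1 if random.random() < 0.8 else 0

def debiased_bit():
    # Von Neumann's trick: sample bits in pairs and keep only the unequal pairs
    while True:
        a, b = biased_bit(), biased_bit()
        if (a, b) == (0, 1):
            return 0
        if (a, b) == (1, 0):
            return 1

sample = [debiased_bit() for _ in range(10000)]
print(sum(sample) / len(sample))   # close to 0.5 despite the 80/20 input
```

The price is that you throw away a lot of input bits, but as long as the source bits are independent, what survives is unbiased.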

By the way, another point: any deterministic pseudo-random number generator will eventually start repeating itself. The maximum obtainable period depends on the amount of memory available to the algorithm to store its internal state.

Now, even with a small amount of state you may well be able to generate many terabytes of data before it starts repeating, and that data may well be perfectly random by any statistical test you can come up with. But if you care about the principle of the thing, then obviously something which just repeats the same (long) sequence of numbers over and over again, cannot be said to be truly random.
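To make that concrete, here is a toy generator with only 16 bits of state, with parameters chosen so it reaches the maximum possible period of 65,536 steps. Real generators have vastly more state, so this is just the principle in miniature:

```python
# Toy linear congruential generator with 16 bits of internal state.
# With full-period parameters it cycles after exactly 2**16 = 65536 outputs.
def lcg16(seed):
    state = seed % 65536
    while True:
        state = (state * 25173 + 13849) % 65536
        yield state

gen = lcg16(42)
first = [next(gen) for _ in range(65536)]
second = [next(gen) for _ in range(65536)]
print(first == second)   # True: the whole sequence repeats verbatim
```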

As the above posters have said, it is inherently impossible to generate truly random numbers using computers. But, the good news is that you can get close enough to randomness that it makes no significant difference. Using a variable seed number as an input is a common method, as mentioned in the above posts.

Actually, there are no truly random numbers outside of computers either. For example, consider rolling a pair of dice at a craps table. If you were to exactly replicate all aspects of the throw (i.e. the same dice, held and thrown exactly the same way with the same force and spin, with the room at the same temperature, the A/C blowing at exactly the same speed, the stars aligned exactly the same, etc…) you would get exactly the same outcome. In reality, there are just too many minor variables that influence the outcome to exactly replicate them, and the results are for all intents and purposes random.

The same is true with computer-generated random numbers. The difference is that the environment inside the computer is far easier to replicate, so the possibility of getting identical results is much higher (actually 100% if the computer is running correctly). So, a variable input or seed number is used to induce variability in the output. If the seed number is suitably “random”, this will result in suitably random-looking output.
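A tiny illustration of that, using Python's built-in generator purely as a stand-in for any deterministic PRNG:

```python
import random

random.seed(12345)
run_a = [random.random() for _ in range(3)]

random.seed(12345)
run_b = [random.random() for _ in range(3)]

print(run_a == run_b)     # True: same seed, identical "random" output

random.seed()             # seed from the OS / current time instead
print([random.random() for _ in range(3)])   # now varies from run to run
```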

For large scale applications (for instance running numerous simulations based on random effects), a large set of seed numbers is typically used. This set will be tested using various statistical techniques to ensure it does not have any identifiable patterns and that the values are nicely distributed across the range of possible values. If not, it may be cleaned up before use (post #7 describes one such “cleaning” technique). Often these sets are generated by observing some low-level physical phenomena in the environment and converting the values into usable numbers (for example, the sound card noise previously mentioned). There are even ways of “doubling up” and using one seed number (such as the computer clock time) to select a second seed number from the cleaned list.
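As a sketch of that last “doubling up” idea (the seed values here are made up on the spot; a real set would come from a pre-generated, statistically tested file):

```python
import random
import time

# Hypothetical pre-tested seed list; in practice this would be loaded from a cleaned file
clean_seeds = [982451653, 122949829, 715225741, 334214467, 868012009]

# First "seed": the clock picks which entry of the cleaned list to use
index = int(time.time()) % len(clean_seeds)

# Second seed: the selected value actually initialises the generator
random.seed(clean_seeds[index])
print([random.random() for _ in range(3)])
```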

So, the computer by itself cannot generate a random number, but with a little external help it can get close enough not to matter for all but the most demanding applications.

Hope that helps a bit. :slight_smile:

Wrong, by the standard that has been used in this thread, and this misunderstanding invalidates a lot of your post: it’s possible to connect a device to a computer that contains a radioactive material and a Geiger counter, and have it measure the time elapsed between decay events, i.e. between each instance of that material throwing off radiation of some kind. That is absolutely, 100% random as long as the source material is still radioactive.

(It biases downwards over time predictably as the material becomes progressively less radioactive, but it’s possible to compensate for that as long as you keep track of how old the sample is and how many half-lives it’s lived through.)

I built a random number generator from a radiation detector (Geiger counter) with a computer interface and a radiation test source. The computer measured the time intervals in between decay events. I’d argue that the results were “truly random”.
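In case anyone is curious how the intervals become usable bits, here is a simplified sketch of one common comparison scheme, with the intervals simulated by an exponential distribution since I can't attach the detector to the post:

```python
import random

def decay_intervals():
    # Stand-in for the real detector: exponentially distributed gaps between
    # decay events, which is how a memoryless process like radioactive decay behaves
    while True:
        yield random.expovariate(1.0)

def bits_from_intervals(intervals):
    # Take the intervals in pairs: shorter-then-longer gives a 0, longer-then-shorter a 1.
    # Exactly equal pairs (vanishingly rare with real-valued timings) are discarded.
    while True:
        t1, t2 = next(intervals), next(intervals)
        if t1 < t2:
            yield 0
        elif t1 > t2:
            yield 1

gen = bits_from_intervals(decay_intervals())
print([next(gen) for _ in range(32)])
```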

The site Random.org claims to produce true random numbers. It also contains discussions of randomness similar to what’s been discussed here.

Another good site is HotBits.

I don’t see why it’s so hard to get a computer to return a random number.

Beg to differ. That would be true if we actually lived in a Newtonian universe…
But we don’t. As others have mentioned, radioactive decay is truly random. It can’t be predicted. It’s not just that there are “too many minor variables”; it would not happen the same way again even starting from the exact same conditions. Particles just spontaneously decay. Any time there is enough energy available, particles (with their anti-particles) can spontaneously appear, changing the outcome of any experiment that happens to be taking place. Alternately, “virtual” particles can spontaneously appear (with their anti-particles, “borrowing” energy that isn’t there), interfere with other particles, and then self-annihilate again.

On the macro scale, we can come close to robot-tosses-coin-same-way-every-time-gets-same-result, but on the quantum scale? Not a chance.

http://en.wikipedia.org/wiki/Bell%27s_theorem
“No physical theory of local hidden variables can ever reproduce all of the predictions of quantum mechanics.”

OK, so you cannot make an algorithm of which the output is truly random, without feeding it random input from some external source. But you can certainly make an algorithm which produces a stream of numbers which is statistically indistinguishable from true randomness.

Which is fine if you are writing a game, or a simulation of some natural process. In that case, it is sufficient that the output of the RNG behaves as if it were truly random, and we know how to do that. In fact, for such purposes, having an RNG which is “random but deterministic” is actually convenient, because repeatable programs are much easier to debug.

(However, do not use the RNG that is part of the standard library that came with your programming language. Those usually suck.)

But if you want to use your random numbers for cryptography, that’s not good enough. Then, it is not sufficient that your numbers appear random to a statistical test. It must be impossible to analyse the output stream and thereby deduce the internal state of the algorithm, even for an expert mathematician armed with enormous amounts of computing power, even when the algorithm is known.

There exists an encryption algorithm called the one-time pad. It is the simplest algorithm in all of cryptography; it is also the only one that has been mathematically proven to be unbreakable.

Basically, it works like this: I generate a huge amount of random data, and burn two copies to DVD. This is the key. I give you one of the copies and keep the other. Then, if you want to send me a message, you XOR every bit of the message with one bit of the key. As long as the DVDs do not fall into the wrong hands and we never re-use key bits, this is absolutely safe even against an infinite amount of computing power. The reason for that is that if somebody intercepts your encrypted message and tries to guess at its contents, they can always come up with a key which would have produced that message. And because every possible key is equally likely, they will never know which message you actually sent.
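In code the whole scheme is almost nothing. A quick sketch, with the pad pulled from the operating system purely for illustration:

```python
import os

message = b"ATTACK AT DAWN"
key = os.urandom(len(message))      # the pre-shared pad: as long as the message, never reused

# Encryption and decryption are the same operation: XOR each byte with a key byte
ciphertext = bytes(m ^ k for m, k in zip(message, key))
recovered = bytes(c ^ k for c, k in zip(ciphertext, key))

print(ciphertext.hex())
print(recovered)                    # b'ATTACK AT DAWN'
```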

However, here’s the catch: this only works if the random key is really random! The fact that an OTP requires a megabyte of key data for every megabyte you want to transmit, and the key can never be reused, makes it rather cumbersome for use in the real world. So, a lot of different people have come up with this clever idea: instead of pre-sharing a huge amount of randomly generated data, you just pre-share the seed value for a RNG, and then use the random data stream from that RNG as your OTP.

Such an approach can certainly work, and it may well be as secure as any other encryption algorithm using a key of the same size as your seed value. However, when you do this, you no longer have an OTP, and the mathematical proof which guarantees the unbreakability of the OTP no longer holds! That is because with a deterministic PRNG it is no longer the case that every possible keystream is equally likely, and therefore, for a given ciphertext, there will in practice be only one legible message which could have produced it.
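To make the difference concrete, here is that “clever idea” as a sketch. The pad is now completely determined by a small seed, which is exactly why the OTP proof no longer applies (Python's default Mersenne Twister and the toy seed are stand-ins for illustration only; don't use anything like this for real):

```python
import random

def keystream(seed, n):
    # Deterministic "pad": expand a small shared seed into n pseudo-random bytes
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

message = b"ATTACK AT DAWN"
seed = 1234                          # the only secret actually shared

pad = keystream(seed, len(message))
ciphertext = bytes(m ^ k for m, k in zip(message, pad))

# The receiver regenerates the identical pad from the same seed...
recovered = bytes(c ^ k for c, k in zip(ciphertext, keystream(seed, len(message))))
print(recovered)

# ...but the pad is fully determined by the seed, so only as many distinct pads exist
# as there are seeds -- not every possible plaintext is equally likely anymore.
```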

There are lots of snake-oil crypto software vendors who make claims like “our software is provably uncrackable because it is based on the OTP, but it is much easier to use because you need only a short password to access your data!” Avoid such companies like the plague, because if they don’t understand the basic principles of how an OTP works then it is extremely unlikely that they know how to design secure software.

Nothing intelligent to add, but I remember reading this link a while back, which I found somewhat shocking.

http://www.boallen.com/random-numbers.html

I like the radioactivity-based methods. But what sort of a random number are we talking about? A uniformly distributed one? Gaussian? If you count the decay events in a given time span, you will get an integer, and if you do it many times, the set of integers you get will be Poisson distributed. Obviously they can’t have a Gaussian distribution, because for one thing you will never get negative counts.

Derleth, how are the pauses between successive events distributed? They can’t have a Gaussian distribution either, which is easy to verify by noting that they also can’t be negative. What are the requirements for a transform that turns random numbers having one distribution into random numbers having another, without spoiling the randomness in any sense?
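For illustration, here's a quick simulation of the sort of thing I'd expect: exponentially distributed gaps between events, Poisson-distributed counts per window, and the standard inverse-CDF trick for mapping one distribution onto another without spoiling the randomness. The rate is made up, so treat this as a sketch rather than a claim about any particular detector:

```python
import math
import random

rate = 5.0                                  # assumed average decays per second

# Gaps between events in a memoryless process are exponentially distributed
intervals = [random.expovariate(rate) for _ in range(100000)]

# Count how many events fall into each one-second window: the counts are Poisson(rate)
counts = []
t, window_end, n = 0.0, 1.0, 0
for gap in intervals:
    t += gap
    while t >= window_end:                  # close any windows the event skipped past
        counts.append(n)
        n = 0
        window_end += 1.0
    n += 1
print(sum(counts) / len(counts))            # close to 5.0, the Poisson mean

# Inverse-CDF trick: pushing each interval through its own cumulative distribution
# function yields values uniform on (0, 1), still just as random as the inputs
uniforms = [1.0 - math.exp(-rate * gap) for gap in intervals]
print(min(uniforms), max(uniforms), sum(uniforms) / len(uniforms))   # mean around 0.5
```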

Bonus question: how many years before somebody figures out there is actually something nonrandom about radioactivity events and their timing?

I don’t have as much a response as a question. What applications apart from cryptography actually require a truly random number generator? For example, say I am interested in performing a Monte Carlo sensitivity analysis of some function with ten input variables to determine which variable has the most influence on the final result. Using a pseudo-random number generator such as RAND() in Excel (where a uniform distribution is desired), you can still get an answer based on millions of iterations, wherein you can be reasonably assured that all possible combinations are tested. Assuming it is designed correctly, how much more accurate could this type of analysis be with truly random input vs. pseudo-random input?

What types of analysis require truly random input?
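Just to show the shape of the kind of analysis I mean, here's a toy version with a made-up three-input function and uniform inputs, Python standing in for Excel's RAND():

```python
import random

def model(x1, x2, x3):
    # Made-up function standing in for the real analysis
    return 3.0 * x1 + 0.5 * x2 ** 2 + 0.1 * x3

random.seed(0)                               # pseudo-random, but statistically fine here
samples = [(random.uniform(0, 1), random.uniform(0, 1), random.uniform(0, 1))
           for _ in range(100000)]
outputs = [model(*s) for s in samples]

def corr(xs, ys):
    # Crude sensitivity measure: correlation of one input with the output
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

for i in range(3):
    print("x%d correlation with output: %.3f" % (i + 1, corr([s[i] for s in samples], outputs)))
```

x1 dominates and x3 barely matters, and a pseudo-random generator is more than good enough to show that.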

I wondered something similar for a long time… but more along the lines of this:

Either there is no such thing as randomness, or randomness doesn’t really mean what we think it means.