How do we define randomness? If I generate a ‘random’ number between 0 and 1, how can I tell whether my algorithm for generating the number is doing so randomly? If I try it 100 times and each time it generates 1, it would seem that it was not random, but the probability of it generating 100 consecutive 1s is the same as generating any other (seemingly more random) series of numbers.
I’ve had a couple of ideas about this:
1. It depends how guessable the next number to be generated is. This seems logical, but it would depend on how good at guessing you are.
2. It depends how quickly the average of the generated numbers converges to 0.5 (as we generate more of them). I like this one slightly more because we can compare it to other converging series, but the series will not always converge at the same rate. (A rough check of this idea is sketched below.)
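To make idea 2 concrete, here is a minimal Python sketch (Python’s built-in random module is just a stand-in generator here, and the sample sizes are arbitrary) that watches how fast the running average drifts toward 0.5. It only probes that one property, not randomness as a whole.

```
import random

def running_mean_gap(generator, n=100_000, checkpoints=(100, 1_000, 10_000, 100_000)):
    """Track |running average - 0.5| at a few checkpoints."""
    total = 0.0
    gaps = {}
    for i in range(1, n + 1):
        total += generator()
        if i in checkpoints:
            gaps[i] = abs(total / i - 0.5)
    return gaps

# The gaps should shrink roughly like 1/sqrt(n) for a decent uniform generator.
print(running_mean_gap(random.random))
```

As noted above, though, the rate of convergence varies from run to run, so a single run like this proves very little.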
Not really, I think that randomness does exist (for example, which atom of a radioactive substance decays first is random). But how can we tell if an algorithm for generating random numbers is a good algorithm? How can we tell whether a known series of numbers (like pi) is random?
There is no good algorithm for generating random numbers. It was proven in the 19th century by Charles Babbage that any deterministic approach could never produce truly random numbers. The randomness of a sequence of numbers is determined by statistical analysis.
However, the OP had the gist of it: it has to do with predictability.
If you can’t predict the next number (or when all predictions succeed at the same rate) then that number is random.
What’s nifty is that “real information” is also unpredictable!
Suppose you have a movie reviewer who hates movies. He gives EVERY movie a “two thumbs down.” You can always predict what he will say. Perfect predictability…and no information. You don’t know any more after you’ve read his review than you knew before it.
But suppose that you have a good reviewer, who rates movies fairly. If, in this summer season of blockbusters, there are about as many good flicks as stinkers, then he will have about as many raves as pans. You can’t predict what he’ll say about any given movie, and thus the reviews have the appearance of “randomness.” But they convey real information.
This pops up a lot in data compression schemes: for instance, you know that, 99% of the time (well, less after Sept 11, 2001) the letter “u” will follow the letter “q.” Thus, the letter combination “qu” conveys very, very little more information than the letter “q” alone. So you can save space on your memory system by storing “qu” as “q” and then restoring the “u” later.
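As a toy illustration of that point, here is how the squeeze-and-restore trick might look in Python. The assumption (stated in the comments) is that every “q” in the text is followed by “u”, so words like “Iraq” would break this naive version.

```
def squeeze(text):
    # Drop the nearly-certain "u" after "q"; it carries almost no information.
    return text.replace("qu", "q")

def restore(text):
    # Put the "u" back after every "q" (assumes no bare "q" in the original).
    return text.replace("q", "qu")

original = "the queen quietly questioned the quartermaster"
packed = squeeze(original)
assert restore(packed) == original
print(len(original), "->", len(packed))
```

Real compressors generalize this idea statistically instead of hard-coding one letter pair, but the principle is the same: predictable symbols are cheap to store.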
Anything generated from a deterministic formula is, by definition, pseudo-random. There are some “cryptographically-strong” pseudo-random generators that are considered good enough for generating crypto keys and such. There are also a number of different statistical tests such as Chi-square for evaluating the quality of a random sequence.
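For the curious, here is a rough sketch of one such test in Python: the chi-square statistic on digit frequencies. The digits and sample size are just illustrative assumptions; for ten equally likely digits the statistic should hover around 9 (the degrees of freedom), and a huge value suggests the sequence is badly non-uniform.

```
import random

def chi_square_digits(digits):
    # Compare observed digit counts against the uniform expectation.
    n = len(digits)
    expected = n / 10
    counts = [0] * 10
    for d in digits:
        counts[d] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

sample = [random.randrange(10) for _ in range(10_000)]
print(chi_square_digits(sample))
```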
Check out Donald Knuth’s “Art of Computer Programming, Vol 2: Seminumerical Algorithms”. Knuth is pretty much the standard for this kind of thing, so it’s a good reference. Bruce Schneier’s “Applied Cryptography” also covers several algorithms.
An algorithm cannot generate random numbers (by definition, IIRC). Algorithms produce pseudorandom numbers. Pseudorandom numbers are usually used to simulate real-world data for testing software, or for computer gambling. A pseudorandom number generator often tries to use an algorithm that cannot practically be reverse-engineered (i.e., you can’t guess the next number), or mixes in other random-like events (e.g., the millisecond at which a user presses a key, or radio static). But the purpose is not to be truly random; it is to generate a sequence that approximates randomness for practical purposes. That is, the distribution of the results over time should approach a truly random distribution.
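As an example of just how deterministic these things are, here is about the simplest pseudorandom scheme there is, a linear congruential generator, sketched in Python. The multiplier and increment are the constants often quoted in Numerical Recipes, used here purely for illustration, not as a recommendation. Seed it the same way twice and you get the identical “random” stream.

```
class LCG:
    """A 32-bit linear congruential generator (illustrative constants)."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_float(self):
        # state <- (a * state + c) mod 2^32, then scale into [0, 1).
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state / 2**32

gen = LCG(seed=42)
print([round(gen.next_float(), 4) for _ in range(5)])
```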
I have confined my post to a topic that is pretty objective. Philosophical discussions about randomness will easily end up in GD.
P.S. I was tracking that thread that Q.E.D. linked to, but despite much interesting conversation there was never a conclusive definition of randomness.
Randomness is defined differently by different people. Ask a Physicist, a Mathematician, a Statistician and a Computer Scientist and you will probably get 4 or more different answers. Statisticians, for example, focus on whether a sequence of numbers satisfies certain statistical tests.
But the OP brings to mind certain things of interest to one group of Computer Scientists. First you talk about an algorithm, which means you are generating pseudo-random numbers. Secondly, you indicate that the predictability of the next number is of interest. This is exactly the core thought in the development of Algorithmic Information Theory, first proposed by Church and developed fully by Kolmogorov.
In short, if there is a computer program that generates your infinite sequence of numbers, it is not random (in this context). E.g., the digits of pi are most definitely not random.
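A hand-wavy way to see the idea without the heavy theory: use a compressor as a very loose stand-in for “the shortest program that outputs the sequence”. True Kolmogorov complexity is uncomputable, so this is only a sketch, but it shows the contrast between a sequence with a short description and one without.

```
import os
import zlib

# A long but highly describable sequence: "print '0123456789' a thousand times".
predictable = ("0123456789" * 1000).encode()

# Ten thousand bytes from the operating system's entropy source.
noisy = os.urandom(10_000)

print(len(zlib.compress(predictable)))  # compresses to a tiny fraction
print(len(zlib.compress(noisy)))        # stays close to 10,000 bytes
```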
As to your convergence point, this doesn’t matter in Algorithmic Information Theory (it can be calculated, but it’s a non-issue). To Statisticians, it is probably a very big deal.
“Anyone attempting to produce random numbers by purely arithmetic means is, of course, in a state of sin.”
–John Von Neumann
So you get semantic confusions based on different meanings of the word “random”.
For example, suppose I produce an ideal list of random numbers. It is so random that no one else would be able to guess or recreate this list in a trillion jillion years. But I publish this list in the Topeka Mining Quarterly, and the US cryptographers use it for all secret battlefield transmissions, cuz no one reads the TMQ.
But, the Other Side has a secret cell of spies in Topeka. Why? Well, they just chose Topeka at random. Maybe they like the fresh air girls who wear gingham.
Anyway, one day as their computers are trying to decrypt US codes, they apply the list published in the Topeka Mining Quarterly. To their surprise, it works and all our secrets are revealed! Our battle groups get smushed the next day.
The Other Side shake their heads and say “What were they thinking? They used a public-domain list! Why didn’t they use random numbers instead?”
Nothing in the definition of “random” has anything to do with keeping the numbers secret. This is more an issue with the implementation details of your crypto scheme. I guess you could play semantic games and say that once you create a list of random numbers, the ones in that sequence are no longer random because, given one, you can predict the next. In that case, the only random number is the next one to come out of your ideal generator, not the ones you’ve already got.
A Physicist, a Mathematician, a Statistician and a Computer Scientist all go duck hunting. A duck flies by. The computer scientist couldn’t get a shot off, so he paused to figure out how to reboot his gun. The physicist calculated the trajectory of the duck and predicted its path of flight, but dropped a digit and the shot went a foot in front of the duck. The mathematician determined that the duck was flying in a Cinci parabola, determined the Y for the next X but shifted the decimal point and the shot went a foot behind the duck. The statistician jumped up and yelled, “Good work boys, you got him!”
You can’t make a truly random-generating algorithm, but you can build a device to generate random numbers. Some of these use a tiny radioactive source. The actual decay time is randomly distributed, in a way. It isn’t generated by any code. I understand that such devices have been built and used for just this purpose.
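Software can’t reproduce the physics, but here is a hedged simulation of how such a device is often read out: compare pairs of intervals between decay events and turn each comparison into one bit. In the sketch below, random.expovariate merely stands in for the Geiger counter; in a real device the unpredictability comes from the detector, not from any code.

```
import random

def decay_bits(n_bits, rate=1.0):
    """Simulate extracting bits by comparing successive decay intervals."""
    bits = []
    while len(bits) < n_bits:
        t1 = random.expovariate(rate)  # simulated interval between decays
        t2 = random.expovariate(rate)
        if t1 != t2:                   # discard ties, keep the comparison as a bit
            bits.append(1 if t1 > t2 else 0)
    return bits

print(decay_bits(16))
```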
There are certainly ways to test the randomness of a set of numbers. The average value of random numbers between 0 and 1 should be 0.5, in the long run. There shouldn’t be any correlations between successive numbers, or between every other number, and so on. I suspect you can find such tests in books like Press et al.’s Numerical Recipes, or in Knuth’s The Art of Computer Programming. For the practical purposes of most folks, many of those mathematical random number generators are pretty good. IBM, I believe, used to publish books of “random numbers” that passed several statistical tests back in the 1950s and 1960s.
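As a quick sketch of the correlation test just mentioned, here is the lag-1 sample autocorrelation of a sequence, run against Python’s built-in generator purely as an example; for a decent uniform generator it should hover near zero, alongside a mean near 0.5.

```
import random

def lag1_autocorrelation(xs):
    # Correlation between each number and the one that follows it.
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

xs = [random.random() for _ in range(100_000)]
print(sum(xs) / len(xs))          # should be close to 0.5
print(lag1_autocorrelation(xs))   # should be close to 0.0
```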
Unfortunately the project isn’t up right now, but one of the coolest random number generators was (will be?) Lavarand. They scanned several lava lamps to generate a random image, which could be used to generate random numbers (since the flow in the lamp is chaotic, it’s as good as random can be).
Linux has two different facilities for generating randomness from user input (plus one more traditional pseudorandom number generator). The important one, known as /dev/random, generates actual unpredictable random numbers by comparing the timing of user inputs (keypresses, mouse clicks, etc.). In order to ensure that its input is not duplicable, it discards most of the significant figures from the information it gathers, thereby accumulating only a small amount of randomness at a time. Since most applications need a lot of fairly good random input, rather than a tiny amount of very secure input, there is a second device, /dev/urandom. This is just a traditional pseudorandom algorithm, but it works by acting on a few digits of /dev/random’s output to produce a much longer output stream. For most applications, this is quite random enough, and much less of a finite resource.
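As a small illustration (the device file is Linux/Unix only; os.urandom is the portable route to the same kernel facility), reading a few bytes looks like this:

```
import os

# Portable wrapper around the operating system's randomness source.
print(os.urandom(16).hex())

# Reading the device file directly (Linux/Unix only).
with open("/dev/urandom", "rb") as f:
    print(f.read(16).hex())
```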
I don’t think you’ll find a satisfactory definition of randomness. I took a class in probability (a math dept class, not a statistics class) and the best they came up with was a definition of a random experiment. That is, an experiment for which the outcome cannot be predicted with certainty.
I think randomness should really be considered a mathematical primitive, like “line” or “length”. Try to define length for me. Everybody takes for granted that you can describe a length with a number, but nobody ever defines the process. Similarly, you can quantify randomness with a probability, or a probability distribution.
And there’s where some chances for misunderstanding can start. Many people will say that something’s “not random” simply because it doesn’t obey the kind of probability distribution that is expected. For example, if a gambler rolls a couple of dice and starts getting, say, twice as many double sixes as expected (if they were fair dice), he’ll claim the dice “aren’t random”. I don’t think this is quite the right thing to say.
Similarly, when people ask whether a string of numbers are random or not, I think what they’re really asking is whether it’s reasonable to believe the numbers were produced by a process with a uniform (or some other) distribution.
Another thing people do is to try to determine what the next number in a sequence is based upon knowledge of the prior numbers. If they can “break the code” then they’ll say the sequence isn’t random. But, if you don’t know anything about the process that produced the sequence, how can you really be sure the next number will fit your code? I could write down a thousand digits of pi, then stick my social security number in there someplace. If I started reading the numbers, you might say “hey, those aren’t random, they’re digits of pi!”. However, the next number you predict might be wrong because my SSN is coming up.
They are not random in the sense that you can generate the next digit through an algorithm, but they are random in the sense that you cannot predict what the next digit is by looking at all the digits you have now.
To me, this view is too simplistic. Claude Shannon’s Information Theory is interesting but not complete.
For example, any conclusion about the movie reviewer is necessarily inductive. There is no telling whether he will like the next movie or not. If he doesn’t, your induction gets a bit stronger. But if he does, your induction goes out the window.