# Question on Randomness

Let’s say I have a list of 100 different values. I need to pick 50 of those values at random.

I can easily write a small formula to give me 50 values chosen at random (well, based on some seed somewhere) by the computer, or I can have someone else randomly pick the values.
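For reference, here is a minimal sketch of what the "small formula" could look like in C, assuming the 100 values sit in an array. It uses a partial Fisher-Yates shuffle on top of the standard `rand()`. The names `rand_below` and `pick_random_subset` are made up for this illustration, and the quality of the result is only as good as `rand()` itself.

```c
#include <stdlib.h>

/* Return a value in 0..n-1 with equal probability, assuming rand() itself
   is uniform over 0..RAND_MAX. Requires 0 < n <= RAND_MAX + 1.
   (Hypothetical helper, not part of any library.) */
static size_t rand_below(size_t n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL;
    unsigned long limit = range - range % n;  /* largest multiple of n */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);                     /* discard the uneven tail */
    return (size_t)(r % n);
}

/* Partial Fisher-Yates shuffle: afterwards items[0..k-1] hold k distinct
   entries drawn from the n entries of items. Requires k <= n. */
void pick_random_subset(int *items, size_t n, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        size_t j = i + rand_below(n - i);     /* index in i..n-1 */
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}
```

To use it, seed once with `srand()`, fill an array with the 100 values, and call `pick_random_subset(items, 100, 50)`; the first 50 slots are then the picks. The same seed always reproduces the same picks.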

What are the differences between these two methods? Is there any reason why I might choose one method over the other?

oops, mods, this was meant to go in GQ, I would appreciate a shove in that general direction.

Red bean tango tissue?

Sorry. The trouble with having a human pick 50 numbers is that he’s likely to try to space out the values in an effort at being “fair”. Truly random numbers will cluster in some places and have large gaps at others.
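One way to put a number on "clustering" for the 50-of-100 case: count how many picks have their immediate neighbour (v and v+1) also picked. Each of the n-1 neighbouring pairs is fully chosen with probability k(k-1)/(n(n-1)), so the expected count is k(k-1)/n, which is 24.5 for 50 out of 100. The sketch below computes both; the function names are invented for illustration, and a low count is only a hint of deliberate spacing, not proof.

```c
#include <stddef.h>

/* Count neighbouring picks (values differing by exactly 1) in a list
   that is already sorted in ascending order and has no duplicates. */
size_t count_adjacent_pairs(const int *sorted, size_t k)
{
    size_t pairs = 0;
    for (size_t i = 1; i < k; i++) {
        if (sorted[i] - sorted[i - 1] == 1)
            pairs++;
    }
    return pairs;
}

/* Expected number of such pairs when k distinct values are drawn
   uniformly at random from n consecutive integers: k(k-1)/n. */
double expected_adjacent_pairs(size_t n, size_t k)
{
    if (k == 0)
        return 0.0;
    return (double)k * (double)(k - 1) / (double)n;
}
```

A person who spaces picks out "fairly" would tend to land well under the expected figure.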

So there is an inherent bias?

Wouldn’t it be possible for the “random” computer generated list to match whatever a human would pick out, even if that is spaced out rather than clustered?

If I were to give you both lists, could you, could anyone tell which one was created by the computer and which one was created by a person?

Neither method is good if you want true randomness.

Computer random number generators can simulate random numbers fairly well for most purposes. But they all have exceptional cases where they break down and the underlying pattern of the numbers shows through.

And humans are simply horrible random number generators.

It depends on the length of the list. If it’s only 50 numbers long … maybe, depending on how clever the human was. If it’s 500 numbers long, absolutely.

I could give it a shot. I’d first look at the distribution of the numbers, especially the later numbers, since I assume a person might be thinking “57! Wait, did I say 56 before? Better pick something far away… 81!”

This sounds like a cool experiment. I’ll post both the lists once I’m done with it and see if most people can guess which one was created by me and which one was created by the PC.

“Truly” random is an almost impossible (perhaps literally impossible) thing to achieve with a computer.

However, as mentioned, they can get pretty darn close and are good enough for most any purpose.

There is no way a human will be remotely random if they consciously choose the numbers.

However, I wonder if they can be considered random to an outside observer who knows nothing of the person choosing to be able to make any predictions on what numbers will be chosen.

Does unpredictability = random?

Generally when we’re looking for a good random number generator we want more than just unpredictability. We also want each possible outcome to have the same probability of occurring, and we want the results of future trials to be independent of the results of past trials.

No - unknown factors or uncomprehended complexity can also cause unpredictability. Which, uh, is actually how computers usually generate “random” numbers. (Technically they’re pseudorandom numbers, based on an unknown, but nonrandom, seed or seeds.)
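To make the "nonrandom seed" point concrete, here is a tiny generator built on the sample `rand()` implementation printed in the C standard (multiplier 1103515245, increment 12345). The `tiny_rng` wrapper and function names are invented for this sketch, and real library generators may work quite differently. Everything it outputs is fixed by the seed.

```c
/* State for one generator; the seed is the only input. */
typedef struct {
    unsigned long state;
} tiny_rng;

void tiny_seed(tiny_rng *g, unsigned long seed)
{
    g->state = seed;
}

/* Advance the state and return a value in 0..32767, following the
   sample implementation shown in the C standard. */
unsigned tiny_next(tiny_rng *g)
{
    g->state = g->state * 1103515245UL + 12345UL;
    return (unsigned)(g->state / 65536UL) % 32768U;
}
```

Two generators seeded with the same number produce identical sequences forever, so the output is only unpredictable to someone who does not know the seed.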

Humans cannot pick numbers at random. For only 50 numbers, I would expect a computer pseudo-random number generator to be perfectly adequate. There are a number of web sites that will provide lists of random numbers. Here’s one: http://www.random.org/ (actually the first one I googled). There was one I recall reading about that uses a lava lamp. One of the ones I googled used (or claimed to use) atmospheric noise. If I were playing poker and wanted to take some action half the time at random, I would cast a surreptitious glance at my watch and if the second hand were between 0 and 30, do it, and if between 30 and 60, not do it. (Why surreptitious? Well, mechanical aids are generally banned.) That’s not truly random, but it is the best you can do, AFAIK.

I just read a blog posting on this.

In short: Look at the two pictures. The left one is what humans tend to think randomness is, the right one (with more clusters and gaps) is what randomness actually is.

Also, there are better (computer) random number generators than Microsoft’s “rand” function. I ran a test on rand: I generated 47 million random numbers between 1 and 47 inclusive, so you would expect each integer to have 1,000,000 occurrences. What I found was that, while the count for each integer was pretty close to 1,000,000, the lower half of the range had more counts above 1 million and the upper half of the range had more counts below 1 million. This was consistent over several iterations of the test.
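A test like this can be summarised with a chi-square statistic instead of eyeballing 47 counts. The sketch below is an illustration only (it is not the code used for the test described above, and the function names are invented). For 47 equally likely outcomes the statistic has 46 degrees of freedom, so a fair generator typically gives values somewhere around 46, and much larger values suggest unevenness. The second helper counts how many bins in a range exceed the expected count, which is the "lower half versus upper half" comparison.

```c
#include <stddef.h>

/* Chi-square statistic of observed counts against a uniform expectation:
   sum over bins of (observed - expected)^2 / expected. Requires k > 0
   and at least one observation. */
double chi_square_uniform(const unsigned long *counts, size_t k)
{
    unsigned long total = 0;
    for (size_t i = 0; i < k; i++)
        total += counts[i];

    double expected = (double)total / (double)k;
    double stat = 0.0;
    for (size_t i = 0; i < k; i++) {
        double diff = (double)counts[i] - expected;
        stat += diff * diff / expected;
    }
    return stat;
}

/* Number of bins in counts[from..to-1] that are above the expected count. */
size_t bins_above(const unsigned long *counts, size_t from, size_t to,
                  double expected)
{
    size_t above = 0;
    for (size_t i = from; i < to; i++) {
        if ((double)counts[i] > expected)
            above++;
    }
    return above;
}
```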

I found a better random number generator on the web, but don’t have it here with me at work. If you want to know what it is, send me a message.

J.

I don’t doubt that any system default rand function is poor, but one has to be careful about how it’s tested. In the case given here, there’s a good way and a poor way of testing. (I’m not saying you did it poorly, but I don’t want other less experienced readers to do poor testing.)

The poor way is something like

    r = (rand() % 47) + 1;

and the good way is something like this

    r = floor((rand() / (RAND_MAX + 1.0)) * 47.0) + 1;

I’m assuming the rand() function is the typical function that returns an integer between 0 and RAND_MAX, inclusive. The poor way takes the modulus by 47 (that is, returns the remainder after dividing by 47). The problem is that the number of possible values, RAND_MAX + 1, is not a multiple of 47; RAND_MAX is usually one less than a power of two. Imagine it was 63. Do you see the problem? 47-63 get mapped to 0-16, and so the lower numbers get hit more often.

The good way avoids this by doing the extra work to properly scale the random integer.
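The RAND_MAX = 63 thought experiment can be checked by brute force: feed every possible raw value through each mapping and tally the results. This is a sketch with invented function names, and the scaled version here truncates to 0..n-1 rather than 1..n. One caveat it makes visible: since 64 is not a multiple of 47, scaling cannot make the counts exactly equal either. It spreads the doubled outputs across the whole range instead of piling them at the low end. Discarding raw values at or above the largest multiple of 47 and drawing again is the usual way to remove the unevenness entirely.

```c
/* Tally how often each result 0..n-1 appears when every raw value
   0..raw_max is reduced with the modulus. counts must hold n entries.
   Intended for small raw_max only. */
void tally_modulo(unsigned raw_max, unsigned n, unsigned *counts)
{
    for (unsigned i = 0; i < n; i++)
        counts[i] = 0;
    for (unsigned v = 0; v <= raw_max; v++)
        counts[v % n]++;
}

/* Same tally for the scaling approach: v / (raw_max + 1) * n, truncated. */
void tally_scaled(unsigned raw_max, unsigned n, unsigned *counts)
{
    for (unsigned i = 0; i < n; i++)
        counts[i] = 0;
    for (unsigned v = 0; v <= raw_max; v++) {
        unsigned bin = (unsigned)((v / (raw_max + 1.0)) * n);
        counts[bin]++;
    }
}
```

With `raw_max = 63` and `n = 47`, the modulo tally gives 2 for results 0 to 16 and 1 for results 17 to 46, while the scaled tally still contains 17 doubled results, but scattered throughout the range.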

From what I understand, there is computer software to catch white collar criminals by looking for randomness. If you were to make a series of legitimate monetary transactions, they would have a more or less random pattern. If you were to fake the transactions, you’d be careful to make them look random, and that very care means they won’t be.

Ah yes, that’s Benford’s Law. Deviations from it are an excellent way of catching naively faked data.
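For anyone curious what such a check looks like: Benford's Law says the leading digit d of many real-world figures appears with probability log10(1 + 1/d), which is about 30.1% for 1 and falls to about 4.6% for 9. The sketch below tallies leading digits so they can be compared with those proportions. The table holds the rounded values, the names are invented for this example, and actual fraud-detection software is certainly more involved than this.

```c
#include <stddef.h>

/* Rounded Benford proportions log10(1 + 1/d) for d = 1..9; index 0 unused. */
static const double benford_expected[10] = {
    0.0, 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046
};

/* First decimal digit of an amount (returns 0 only for the amount 0). */
int leading_digit(unsigned long amount)
{
    while (amount >= 10)
        amount /= 10;
    return (int)amount;
}

/* Count how many amounts start with each digit; counts[d] is the tally
   for leading digit d. */
void tally_leading_digits(const unsigned long *amounts, size_t n,
                          unsigned long counts[10])
{
    for (int d = 0; d < 10; d++)
        counts[d] = 0;
    for (size_t i = 0; i < n; i++)
        counts[leading_digit(amounts[i])]++;
}
```

Dividing each `counts[d]` by the number of amounts and comparing against `benford_expected[d]` shows where a data set departs from the expected shape. Naively invented figures tend to use the leading digits far more evenly.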

With a little work, sure, you could probably write a program to create lists that look like human-generated ‘random’ number lists.
But why would you want to?

Very small randomness in this sample: It’s obviously a right hand “bash the keys” method with a left hand “A” thrown in.

These almost always have a period to them – for instance, in yours the A shows up every 5ish characters (or would if it went on longer).

You also bash in a circle, switching between counterclockwise and clockwise, punctuated by the “A”.

It’s a pretty low entropy signal, as opposed to vbcrvj35b6uhuo5mor8p, which has a bit more entropy.

Given a big enough set of your keyboard bashing, a one time pad of plaintext plus your random string would be pretty easy (for someone with expertise and motivation) to break.

I’m game for this.