Several probabilities questions

Enderw24 · November 26, 2012, 10:43pm

I’m playing around with Random.Org and a few questions popped into my head.
I created a random list of 10,000 numbers from a list from 1-10,000. Since there were repeats, it was obvious that the random number generator didn’t eliminate numbers once they were used.

I plug the list of numbers into an Excel spreadsheet column and put them in numerical order. Then I compare that list of numbers to their row numbers and note the difference between them.

For instance, if the numbers 1-8 were chosen at random for 1,5,8,2,6,4,5,4.
In numerical order they’d be
1
2
4
4
5
5
6
8

And you’d note that #1, 2, 4, 5, and 8 were the 1st, 2nd, 4th, 5th and 8th numbers.

I did this twice. Once 202/10,000 matched. The second time 141/10,000 numbers matched.

First question is what’s the expected number of matches in a list of 10,000? How would you even go about figuring that out?

Second, I also looked at how far off the number was from the row number. I subtracted the row# from the actual# and came up with one range of -99 through 34 and another with a range of -91 through 27.
What would the expected range be over the long term of doing this experiment?

OldGuy · November 26, 2012, 11:55pm

I can’t answer your question off the top of my head, but what you’re looking for for the first question is “order statistics.”

Chronos · November 27, 2012, 1:42am

I don’t have the math handy to back me up, but I would expect both to scale as the square root of the sample size. Thus, for instance, if you picked a million numbers up to a million, you’d have a thousand-ish of them in the right place, and the ones that were out of place could be out by as much as a thousand-ish places.

CookingWithGas · November 27, 2012, 3:00am

I don’t know the answer to the questions but

it wouldn’t be a random number generator if it did.

Lance_Turbo · November 27, 2012, 4:02am

The general solution “expected number of matches” when drawing n number from 1…n with replacement is:

Sum for 0 <= k <= n-1 (n!/(k! n^(n-k)))

When n = 10,000 Python tells me the expectation is 124.99912186808118.

Enderw24 · November 27, 2012, 4:52am

Interesting. So you’d believe that the number of matches should roughly equal the total range away from 0? My ranges were around 120-130 for both. Both heavily skewed in one direction, though I imagine that was just a quirk of two samples and it would even itself out over time.

Sorry, what’s k?

ultrafilter · November 27, 2012, 5:00am

k is the index of summation.

Buck_Godot · November 27, 2012, 6:28pm

To Answer the second question, consider dividing all the number on your list by N so that all of your data lies between 0 and 1. Then looking at the largest deviation is exactly the same a performing aKolmogorov–Smirnov test. for repeated runs the number of negative or positive ranges should be symmetric, but within a run of 10,000 there may by chance be a tendency one way or another.

Overall you expect a maximal absolute difference of about 0.83Sqrt(N) with 95% or repeated runs falling between 0.48Sqrt(N) and 1.48*Sqrt(N)

Topic		Replies	Views
What is the expected frequency distribution in this scenario? Factual Questions	16	838	September 21, 2018
Expected frequency distibution of randomly selected numbers. Factual Questions	6	1101	May 11, 2000
Random Number Generation Question Factual Questions	22	3813	May 23, 2012
Math Dopers: A little help? Factual Questions	21	1222	July 1, 2004
Question on Randomness Factual Questions	23	1941	April 11, 2009

Several probabilities questions

Related topics