The Straight Dope

  #1  
Old 05-22-2012, 01:49 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Random Number Generation Question

We have a procedure for testing samples by assigning numbers and then testing the sample number randomly generated.

Now the process for assigning and generating numbers is very long for each number. If we have 20 samples we end up testing 16. So to save time I randomly generate the numbers for the 4 I'm not testing. My boss hates it when I do this since it isn't SOP, but he doesn't make me change the way I do it; he just complains.

Is the probability math different by negatively selecting instead of positively selecting the random numbers?
  #2  
Old 05-22-2012, 02:00 PM
CJJ* CJJ* is offline
Guest
 
Join Date: Nov 2004
Do you mean that you have samples numbered 1 thru 20, then pick 4 of those numbers at random as the ones to exclude, rather than picking 16 of those 20 as the ones to include? If so, there is no difference; the 16 tested samples are just as random for each method.
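
A quick way to convince a skeptical boss is to simulate both procedures and compare how often each sample ends up tested. The sketch below is only an illustration: the function names are made up for this example, and it uses Python's built-in generator rather than the company tool discussed in the thread.

```python
import random
from collections import Counter


def pick_by_inclusion(rng, n=20, k=16):
    """Draw the k sample numbers to test directly."""
    return frozenset(rng.sample(range(1, n + 1), k))


def pick_by_exclusion(rng, n=20, k=16):
    """Draw the n - k sample numbers to skip, then test everything else."""
    skipped = set(rng.sample(range(1, n + 1), n - k))
    return frozenset(range(1, n + 1)) - skipped


def inclusion_rates(picker, trials=20000, seed=0):
    """Estimate, for each sample number, how often it ends up tested."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(picker(rng))
    return {s: counts[s] / trials for s in range(1, 21)}


if __name__ == "__main__":
    by_inclusion = inclusion_rates(pick_by_inclusion)
    by_exclusion = inclusion_rates(pick_by_exclusion)
    for s in range(1, 21):
        print(s, round(by_inclusion[s], 3), round(by_exclusion[s], 3))
```

With either method every sample should come out tested close to 16/20 = 80% of the time, which is what "no difference" means in practice.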

Now, if the random number generator is slightly biased--say, for example, it produces the number X at a higher-than-expected rate than pure chance--I wonder which method would be better? My instinct says choosing only four numbers would be better because it's fewer opportunities for the bias to occur, but that's just a guess
  #3  
Old 05-22-2012, 02:09 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Quote:
Originally Posted by CJJ* View Post
Do you mean that you have samples numbered 1 thru 20, then pick 4 of those numbers at random as the ones to exclude, rather than picking 16 of those 20 as the ones to include? If so, there is no difference; the 16 tested samples are just as random for each method.
Yes.

Quote:
Now, if the random number generator is slightly biased--say, for example, it produces the number X at a higher-than-expected rate than pure chance--I wonder which method would be better? My instinct says choosing only four numbers would be better because it's fewer opportunities for the bias to occur, but that's just a guess
There is no bias. Purely random.

Thanks, this is what I thought, and once I do the math for my boss, I get a free lunch. I just wanted to double check before opening my mouth.
  #4  
Old 05-22-2012, 02:56 PM
zoid zoid is offline
Charter Member
 
Join Date: Sep 2001
Location: Chicago Il
Posts: 5,496
Quote:
Originally Posted by RetroVertigo View Post
Yes.



There is no bias. Purely random.

Thanks, this is what I thought, and once I do the math for my boss, I get a free lunch. I just wanted to double check before opening my mouth.
Purely random is probably not true; that's almost impossible to achieve.
  #5  
Old 05-22-2012, 02:58 PM
Chronos Chronos is offline
Charter Member
 
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 47,954
If you want to express it mathematically, the statement you're looking for is that (20 choose 4) = (20 choose 16).
  #6  
Old 05-22-2012, 04:11 PM
The Hamster King The Hamster King is offline
Charter Member
 
Join Date: Jun 2000
Location: Los Angeles
Posts: 8,749
Quote:
Originally Posted by RetroVertigo View Post
Now the process for assigning and generating numbers is very long for each number.
Now I'm curious what this process is.

Something tells me you're not rolling a 20-sided die.
  #7  
Old 05-22-2012, 04:14 PM
CJJ* CJJ* is offline
Guest
 
Join Date: Nov 2004
Quote:
Originally Posted by Chronos View Post
If you want to express it mathematically, the statement you're looking for is that (20 choose 4) = (20 choose 16).
To amplify a bit, "Number of ways Y objects can be chosen from X objects"--written here as "(X choose Y)"--can be shown to be X!/[Y!(X-Y)!] (X! = "X factorial" = 1*2*3*...*(X-1)*X). With that formula, it's easy to see the equality.
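
For anyone who would rather check the arithmetic than take it on faith, here is a small sketch of that formula. The helper name `choose` is just for this example; `math.comb` is the standard-library equivalent (Python 3.8 and later).

```python
from math import comb, factorial


def choose(x, y):
    """Number of ways to choose y objects from x: x! / (y! * (x - y)!)."""
    return factorial(x) // (factorial(y) * factorial(x - y))


if __name__ == "__main__":
    print(choose(20, 4), choose(20, 16))  # both print 4845
    print(comb(20, 4) == choose(20, 4))   # True
```

Swapping y for x - y just swaps the two factors in the denominator, so the value cannot change.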
  #8  
Old 05-22-2012, 04:53 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Quote:
Originally Posted by zoid View Post
Purely random is probably not true; that's almost impossible to achieve.
I meant that there was no human bias in choosing the numbers, but on re-reading CJJ*'s post, that is not what they were talking about.


Quote:
Originally Posted by The Hamster King View Post
Now I'm curious what this process is.

Something tells me you're not rolling a 20-sided die.
Nothing complicated; it is more like internally generated paperwork that must be filled out each time we use the generator, which ensures quality control. We use a company-wide program (kind of a beefed-up Excel) with an equation probably similar to Random.org (only no one really has access to it for me to verify). The annoying part is filling out the paperwork to go along with it. It easily takes five minutes per sample to finish.

So thanks to you wonderful people, I now have an extra hour of my day not filling in boxes with control numbers. Thank you!
  #9  
Old 05-22-2012, 05:03 PM
The Hamster King The Hamster King is offline
Charter Member
 
Join Date: Jun 2000
Location: Los Angeles
Posts: 8,749
Quote:
Originally Posted by RetroVertigo View Post
Nothing complicated; it is more like internally generated paperwork that must be filled out each time we use the generator, which ensures quality control. We use a company-wide program (kind of a beefed-up Excel) with an equation probably similar to Random.org (only no one really has access to it for me to verify). The annoying part is filling out the paperwork to go along with it. It easily takes five minutes per sample to finish.
LOL ... you should just roll a D20. It would be faster and more random than whatever numbers the official pseudo-random generator is spitting out.

(FWIW ... random.org doesn't use an equation. Their numbers come from sampling noisy physical systems.)
  #10  
Old 05-22-2012, 05:06 PM
Dr. Love Dr. Love is offline
Charter Member
 
Join Date: Mar 2002
Posts: 925
Quote:
Originally Posted by CJJ* View Post
Now, if the random number generator is slightly biased--say, for example, it produces the number X at a higher-than-expected rate than pure chance--I wonder which method would be better? My instinct says choosing only four numbers would be better because it's fewer opportunities for the bias to occur, but that's just a guess
If you're worried about bias or the effects of covariance from your random number generator, you can choose your entire sample with a single random number. Take your random number between 0 and 1, multiply by (20 choose 4) = 4845, and take the floor (or ceiling). Now you just have to transform this integer into a set of 4 numbers. The first (19 choose 3) = 969 will contain 1. The next (18 choose 3) = 816 will contain 2, but not 1. Of those that contain 2 but not 1, the first (17 choose 2) = 136 will contain 3, and the next (16 choose 2) = 120 will contain 4 but not 3, and so on.
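
The block-counting idea above can be sketched in a few lines. This is one possible reading of the scheme, not a tested procedure: the names `unrank_combination` and `samples_to_skip` are invented here, and it assumes the floor variant with indices running from 0 to 4844.

```python
from math import comb


def unrank_combination(index, n=20, k=4):
    """Return the index-th k-subset of 1..n in lexicographic order (index from 0)."""
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range")
    chosen = []
    candidate = 1
    while k > 0:
        # How many of the remaining subsets have `candidate` as their
        # smallest element: 969 for candidate 1, then 816 for 2, and so on.
        block = comb(n - candidate, k - 1)
        if index < block:
            chosen.append(candidate)
            k -= 1
        else:
            index -= block
        candidate += 1
    return chosen


def samples_to_skip(u, n=20, k=4):
    """Map one random number u in [0, 1) to k sample numbers."""
    total = comb(n, k)
    index = min(int(u * total), total - 1)  # floor, guarded against u == 1.0
    return unrank_combination(index, n, k)


if __name__ == "__main__":
    print(unrank_combination(0))     # [1, 2, 3, 4]
    print(unrank_combination(969))   # [2, 3, 4, 5]
    print(unrank_combination(4844))  # [17, 18, 19, 20]
```

Index 969 is the first set without a 1, which matches the "first 969 contain 1" step described above.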

Last edited by Dr. Love; 05-22-2012 at 05:07 PM.
  #11  
Old 05-22-2012, 05:09 PM
Dr. Love Dr. Love is offline
Charter Member
 
Join Date: Mar 2002
Posts: 925
Quote:
Originally Posted by The Hamster King View Post
LOL ... you should just roll a D20. It would be faster and more random than whatever numbers the official pseudo-random generator is spitting out.
That depends on how you're generating numbers. Trying to get 16 unique results in a row with a D20 might take a while.
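
To put a rough number on "a while": if duplicates are simply rerolled (an assumption about how the die would be used), the expected number of rolls is a short sum. The function name `expected_rolls` is made up for this sketch.

```python
from fractions import Fraction


def expected_rolls(distinct_needed, sides=20):
    """Average number of rolls of a fair die until `distinct_needed` different faces have appeared."""
    # Once j distinct faces have been seen, a roll shows a new face with
    # probability (sides - j) / sides, so the wait averages sides / (sides - j).
    return sum(Fraction(sides, sides - j) for j in range(distinct_needed))


if __name__ == "__main__":
    print(float(expected_rolls(16)))  # roughly 30 rolls
    print(float(expected_rolls(4)))   # roughly 4.3 rolls
```

So 16 distinct results cost about 30 rolls on average, while 4 distinct results cost only slightly more than 4.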
  #12  
Old 05-22-2012, 05:13 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
Quote:
Originally Posted by RetroVertigo View Post
Nothing complicated; it is more like internally generated paperwork that must be filled out each time we use the generator, which ensures quality control. We use a company-wide program (kind of a beefed-up Excel) with an equation probably similar to Random.org (only no one really has access to it for me to verify). The annoying part is filling out the paperwork to go along with it. It easily takes five minutes per sample to finish.
I'd be willing to bet a lot of money that this is not a good random number generator. You'd almost certainly be better off using something that contains open source random number generation algorithms.
  #13  
Old 05-22-2012, 05:31 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Quote:
Originally Posted by The Hamster King View Post
LOL ... you should just roll a D20. It would be faster and more random than whatever numbers the official pseudo-random generator is spitting out.

(FWIW ... random.org doesn't use an equation. Their numbers come from sampling noisy physical systems.)
I didn't know that, but I was just using them as an example.

Quote:
Originally Posted by ultrafilter View Post
I'd be willing to bet a lot of money that this is not a good random number generator. You'd almost certainly be better off using something that contains open source random number generation algorithms.
I do not have access to the equation/algorithm, so I can't speak to it. Whatever we use, it meets standards for my company, which is a fairly large multi-national corporation, so I'm in no position to change it (no matter how stupid it is).
  #14  
Old 05-22-2012, 05:44 PM
Indistinguishable Indistinguishable is online now
Guest
 
Join Date: Apr 2007
Quote:
Originally Posted by CJJ* View Post
To amplify a bit, "Number of ways Y objects can be chosen from X objects"--written here as "(X choose Y)"--can be shown to be X!/[Y!(X-Y)!] (X! = "X factorial" = 1*2*3*...*(X-1)*X). With that formula, it's easy to see the equality.
It's easier to understand the equality just from the observation in the OP... if you choose a set of objects to include, that amounts to a corresponding choice of objects to exclude.

Last edited by Indistinguishable; 05-22-2012 at 05:45 PM.
  #15  
Old 05-22-2012, 06:11 PM
The Hamster King The Hamster King is offline
Charter Member
 
Join Date: Jun 2000
Location: Los Angeles
Posts: 8,749
Quote:
Originally Posted by Dr. Love View Post
That depends on how you're generating numbers. Trying to get 16 unique results in a row with a D20 might take a while.
Oho ... but he only needs to get 4 unique results in a row, which is much easier!

Assuming that the Official Corporate Random Number Generator can be set to spit out a number between 1 and 4845, the OP could create a look-up table with all the ways to pick 16 things from 20, and ask the OCRNG to generate an index into the table. Then he only needs to file the paperwork for one random number instead of four.

Of course his boss's head would probably explode ... .
  #16  
Old 05-22-2012, 07:03 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Quote:
Originally Posted by The Hamster King View Post


Of course his boss's head would probably explode ... .
It wouldn't take much. You should've seen the emails that went back and forth between department heads when it was discovered that I was helping an intern with homework by doing Bradford assays (for those who never took a biochemistry class, it's a method that is probably only done in classrooms now, and has been around forever) in our lab. His problem wasn't me actually fucking around on the clock, but that we were following the professor's procedure instead of the company-dictated way to do things.

Other than that he is a really nice guy, but very old school in how he does stuff.
  #17  
Old 05-22-2012, 07:12 PM
Absolute Absolute is offline
There are no absolutes.
Charter Member
 
Join Date: Apr 2000
Location: In flight
Posts: 3,669
Just curious - what is the rationale behind excluding 4 samples out of 20 from the test, as opposed to excluding 10 or 15? It does not seem like you save that much time compared to just testing all of them, nor gain all that much certainty compared to testing a smaller subset.

Last edited by Absolute; 05-22-2012 at 07:12 PM.
  #18  
Old 05-22-2012, 07:32 PM
RetroVertigo RetroVertigo is offline
Guest
 
Join Date: Aug 2005
Quote:
Originally Posted by Absolute View Post
Just curious - what is the rationale behind excluding 4 samples out of 20 from the test, as opposed to excluding 10 or 15? It does not seem like you save that much time compared to just testing all of them, nor gain all that much certainty compared to testing a smaller subset.
I have no clue. We always keep 80% of the prepared batch, and the other four in this case go off to various QC departments. I don't know the reasoning behind why those numbers are chosen.
  #19  
Old 05-22-2012, 09:55 PM
Chronos Chronos is offline
Charter Member
 
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 47,954
Quote:
LOL ... you should just roll a D20. It would be faster and more random than whatever numbers the official pseudo-random generator is spitting out.
Yes, if it's a good-quality d20. Many polyhedral dice are horribly biased.
  #20  
Old 05-22-2012, 10:04 PM
D18 D18 is offline
Guest
 
Join Date: Jan 2001
Relevant Dilbert.
  #21  
Old 05-23-2012, 12:07 AM
szabrocki szabrocki is offline
Guest
 
Join Date: Feb 2006
I did this in the 70's

All random generators need a seed number. I use the voltage coming into the power supply on a server.

Say it is 230 volts. It is really 230.XXXXXX volts. That XXXXXX constantly changes in microseconds.

Use the decimal (or part of it) as the seed number, and that is about as random as you can get, as it changes every moment.
  #22  
Old 05-23-2012, 11:33 AM
septimus septimus is offline
Guest
 
Join Date: Dec 2009
Quote:
Originally Posted by szabrocki View Post
All random generators need a seed number. I use the voltage coming into the power supply on a server.
To each his own. My great-uncle used the U.S. Treasury daily balance.
  #23  
Old 05-23-2012, 12:08 PM
Chronos Chronos is offline
Charter Member
 
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 47,954
It depends on how much entropy you need, and how quickly you burn through it. You can also get entropy from user input devices like keyboards and mice, or from a microphone, or from the least-significant bits of the processor temperature, all things which the typical computer has access to. If you really want to get serious about it, you get a special piece of hardware that has a sample of some radioactive material in it, and use the timing of the decays to generate entropy.
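
On an ordinary computer the operating system already pools entropy from sources like these, and Python exposes it through `random.SystemRandom`, which draws on `os.urandom`. A minimal sketch, with a hypothetical function name and no claim about what the company tool does:

```python
import random


def pick_samples_to_skip(n=20, skip=4):
    """Pick `skip` distinct sample numbers from 1..n using OS-provided entropy."""
    rng = random.SystemRandom()  # no seed needed; backed by os.urandom
    return sorted(rng.sample(range(1, n + 1), skip))


if __name__ == "__main__":
    print(pick_samples_to_skip())  # four distinct numbers between 1 and 20
```

Because nothing is seeded, the result cannot be reproduced later, which may or may not suit a paperwork trail.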
All times are GMT -5. The time now is 02:32 AM.

Copyright © 2013 Sun-Times Media, LLC.