I’m designing a test that draws 10 random questions out of a bank of 40. What is the chance I’ll get repeat draws?
I need to answer this in case the project managers bring it up. There might be a way to avoid repeats, but if the chance is low enough, I might not have to bother.
I’m going to leave it to the smarter people to do the math, but I’m quite sure that the probability is unacceptably high. This is a variation on the “same birthday” problem, where one has to calculate the probability that, given a number of randomly-chosen people, two or more share the same birthday. It is quite surprising how few people are required to make it more likely than not that two DO share the same birthday.
Again, I’m not a statistician and someone is welcome to show that I’m wrong.
The chance of the 2nd draw being different from the 1st is 39/40. Given that, the chance of the 3rd draw also being different from the first 2 is 38/40, etc. So the answer is 39/40 × 38/40 × 37/40 × … × 31/40 ≈ 0.293. So there’s only a 29.3% chance of there being no duplicates.
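For anyone who wants to check that product, here is a minimal sketch in Python. It assumes each draw is independent and uniform over the whole bank (i.e. drawing with replacement); the function name `prob_no_repeat` is made up for illustration.

```python
from fractions import Fraction

def prob_no_repeat(draws: int, bank: int) -> Fraction:
    """Chance that `draws` independent uniform picks from `bank` items are all different."""
    p = Fraction(1)
    for k in range(draws):
        # The (k+1)-th pick must avoid the k distinct items already drawn.
        p *= Fraction(bank - k, bank)
    return p

p = prob_no_repeat(10, 40)
print(f"no repeat: {float(p):.3f}")               # no repeat: 0.293
print(f"at least one repeat: {float(1 - p):.3f}")  # at least one repeat: 0.707
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a float, so rounding does not creep into the product.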
My view: You have one chance out of forty to draw any given question. However, you actually have ten chances out of forty to draw the same given question because you have ten shots at it, so I view it as one chance out of four to draw the same question twice.
I’m sorry, but “views” don’t matter in mathematics. (Conjectures can be important.) Stating a view, particularly after the correct answer and an explanation of same have been given, is less than productive. If you want to argue a previous answer is wrong, go ahead.
It isn’t that difficult to select without getting a duplicate. Assign a random number to each of the 40 questions, sort by the random number, and pick the 10 questions with the smallest random numbers.
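The sort-by-random-number idea could be sketched like this in Python. The function name `pick_by_random_keys` and the placeholder question labels are assumptions for illustration, not anything from the actual test software.

```python
import random

def pick_by_random_keys(questions, k, rng=random):
    """Give every question a random number, sort, and keep the k smallest."""
    # Pair each random key with the question's index so ties never compare questions.
    keyed = [(rng.random(), i) for i in range(len(questions))]
    keyed.sort()
    return [questions[i] for _, i in keyed[:k]]

bank = [f"Q{n}" for n in range(1, 41)]  # hypothetical 40-question bank
test_questions = pick_by_random_keys(bank, 10)
print(test_questions)
```

Because each question gets exactly one key, no question can appear twice. In Python specifically, `random.sample(bank, 10)` from the standard library gives the same kind of result in one call.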
That happens to give an answer close to the correct one, but only by chance. Change the original numbers away from 10 out of 40 and there are a few others where you luck out, but most of the time your method gives a horribly wrong answer to the question posed.
Say he was picking 15 questions out of 30. By your method he has 15 shots at making a 1/30 draw, for a probability of 50%. The real answer is that there’s a 98.6% chance of a repeat.
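If you don't trust the arithmetic, a quick simulation gives an independent check. This is only a sketch: it assumes draws with replacement, and the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random

def estimate_repeat_chance(draws, bank, trials=20_000, seed=0):
    """Monte Carlo estimate of the chance that at least one question repeats."""
    rng = random.Random(seed)
    repeats = 0
    for _ in range(trials):
        picks = [rng.randrange(bank) for _ in range(draws)]
        # Fewer distinct values than draws means something came up twice.
        if len(set(picks)) < draws:
            repeats += 1
    return repeats / trials

for draws, bank in [(10, 40), (15, 30)]:
    rule_of_thumb = draws / bank  # the "ten shots at one in forty" estimate
    simulated = estimate_repeat_chance(draws, bank)
    print(f"{draws} of {bank}: rule of thumb {rule_of_thumb:.3f}, simulated {simulated:.3f}")
```

The simulated figures should land near the exact answers (about 0.707 and 0.986), well away from the draws-divided-by-bank estimate in both cases.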
Actually even in this case Jasmine’s method gives a horribly wrong answer. Jasmine says the probability of drawing a duplicate is 25%. The correct answer, following scr4’s calculation (1 − 0.293), is 70.7%.
:smack: Yup. I messed that one up. I just thought “29 and 25 are pretty close, I wonder how much you have to change the conditions for that guesstimate to be truly awful”, and failed to properly comprehend what they were stating.
It’s already been pointed out that this is wrong, but I think it’s worth saying a bit more about why it’s wrong (since this sort of reasoning, though incorrect, is common).
The error is thinking you have “ten chances out of forty to draw the same given question.” But what you really have to consider is, not 10 chances of drawing any one particular question, but 10 draws, each of which could match any of the other 9.
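One way to put numbers on that point, assuming independent uniform draws: with 10 draws there are 45 different pairs of draws that could match, not 10. Written out (the pair count is an expected value, not a probability, so it is allowed to exceed 1):

```latex
% Number of pairs of draws that could coincide
\binom{10}{2} = 45

% Each pair matches with probability 1/40, so the expected number of matching pairs is
\mathbb{E}[\text{matching pairs}] = \binom{10}{2} \cdot \frac{1}{40} = \frac{45}{40} = 1.125

% Exact probability of at least one repeat
P(\text{repeat}) = 1 - \prod_{k=1}^{9} \frac{40 - k}{40} \approx 0.707
```

Expecting more than one matching pair on average fits a repeat being more likely than not.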
What that doesn’t do is select every 10-element subset of questions with equal probability, unlike, say, mcgato’s algorithm or Fisher–Knuth–Yates’s algorithm. So it may be worth it to specify what the results should be before selecting an algorithm.
What makes this a bad method though? Assuming that the division into the 10 banks really is random. Yes, after the banks are created then some combinations become impossible. But why would that matter, if you’re creating a new series of random banks every time?
Maybe I did not understand Knowed Out’s proposal to use banks. If you already know a method of creating a series of random banks, why not create 4 banks of 10 each instead, and let the first bank be the set of questions? And how would you create a series of random banks more simply than performing a random shuffle anyway?
That would be OK, but it sounded like the OP was talking about a fixed set of 10 banks with 4 questions each. That definitely would rule out many possible combinations.
Though that’s not necessarily a bad thing, depending on the goal of this randomization. If these are test questions, each bank could consist of 4 questions in the same category or subject. Then you’re guaranteed to get 1 question out of every category/subject.
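A rough sketch of the fixed-banks approach, assuming 10 banks of 4 questions each. The bank contents and the name `one_per_bank` are invented for illustration. It also counts how many distinct tests each scheme can produce, which shows how much the fixed banks rule out.

```python
import math
import random

def one_per_bank(banks, rng=random):
    """Pick exactly one question from each bank, so every category is covered."""
    return [rng.choice(bank) for bank in banks]

# Hypothetical layout: bank b holds questions "b-0" .. "b-3".
banks = [[f"{b}-{q}" for q in range(4)] for b in range(10)]
print(one_per_bank(banks))

with_banks = 4 ** 10               # one of 4 choices in each of 10 banks
any_subset = math.comb(40, 10)     # any 10 of the 40 questions
print(with_banks, any_subset)      # 1048576 847660528
```

So fixed banks can produce roughly one in 800 of the possible 10-question sets. Whether that is a problem depends on whether one-per-category coverage is the actual goal.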
Is it possible for you to just set it up so that the questions are drawn in sequence, and when a question is drawn, it is marked as picked and can’t be picked again? Think of it like drawing numbers out of a hat and discarding them as you go.
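The numbers-out-of-a-hat idea might look something like this in Python. It is a sketch only: `draw_from_hat` is a made-up name, and the real test tool may not let you manage the pool this way.

```python
import random

def draw_from_hat(questions, k, rng=random):
    """Draw k questions one at a time, removing each from the pool once picked."""
    if k > len(questions):
        raise ValueError("cannot draw more questions than the bank holds")
    hat = list(questions)  # copy, so the original bank is left untouched
    drawn = []
    for _ in range(k):
        idx = rng.randrange(len(hat))
        drawn.append(hat.pop(idx))  # popped questions cannot be drawn again
    return drawn

print(draw_from_hat(range(1, 41), 10))
```

Each draw is uniform over whatever is still in the hat, so repeats are impossible by construction.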
OK, I get it. I was thinking that the banks were randomly created, you were thinking the banks were arbitrarily created, but we don’t know for sure. Although as you say, if you have a method to randomly sort 40 questions into 10 banks of 4, then you’ve got a method to randomly sort 40 questions into 4 banks of 10, so why not just use the method directly?
And it matters whether this is a one-off process or you need to use your choosing method over and over again. If you’re doing it multiple times and don’t mix the banks every time, then you’ll eliminate lots of possible combinations. But that might not matter either, depending on what the randomization is for.