Probability question

Knowed_Out · June 15, 2018, 2:33pm

I’m designing a test that draws 10 random questions out of a bank of 40. What is the chance I’ll get repeat draws?

I need to answer this in case the project managers bring it up. There might be a way to avoid repeats, but if the chance is low enough, I might not have to bother.

ZonexandScout · June 15, 2018, 2:51pm

I’m going to leave it to the smarter people to do the math, but I’m quite sure that the probability is unacceptably high. This is a variation on the “same birthday” problem, where one has to calculate the probability that, given a number of randomly chosen people, two or more share the same birthday. It is quite surprising how few people are required to make it more likely than not that two DO share the same birthday.

Again, I’m not a statistician and someone is welcome to show that I’m wrong.

scr4 · June 15, 2018, 3:01pm

Agree with ZonexandScout.

The chance of the 2nd draw being different from the 1st is 39/40. Given that, the chance of the 3rd draw also being different from the first 2 is 38/40, etc. So the answer is 39/40 × 38/40 × 37/40 × … × 31/40 ≈ 0.293. So there’s only a 29.3% chance of there being no duplicates.
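For anyone who wants to check that arithmetic, here is a minimal Python sketch of the same product, generalized to any bank size and number of draws. It assumes each draw is independent and uniform over the whole bank (i.e. drawing with replacement, which is the situation in the original question); the function name `p_no_repeat` is just made up for illustration.

```python
from math import prod

def p_no_repeat(bank_size: int, draws: int) -> float:
    """Chance that `draws` independent picks (with replacement) from
    `bank_size` equally likely questions are all different."""
    # Draw i (counting from 0) must avoid the i questions already drawn,
    # so it succeeds with probability (bank_size - i) / bank_size.
    return prod((bank_size - i) / bank_size for i in range(draws))

p = p_no_repeat(40, 10)
print(f"no duplicates: {p:.3f}")      # 0.293
print(f"at least one:  {1 - p:.3f}")  # 0.707
```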

Jasmine · June 15, 2018, 3:10pm

My view: You have one chance out of forty to draw any given question. However, you actually have ten chances out of forty to draw the same given question because you have ten shots at it, so I view it as one chance out of four to draw the same question twice.

OldGuy · June 15, 2018, 3:19pm

I’m sorry, but “views” don’t matter in mathematics. (Conjectures can be important.) Stating a view, particularly after the correct answer and an explanation of it have been given, is less than productive. If you want to argue a previous answer is wrong, go ahead.

scr4 · June 15, 2018, 3:22pm

p.s. For the Birthday Problem, the answer is >50% if you have 23 or more people. Nice demonstration for teaching statistics in a classroom.
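The 23 figure is easy to reproduce with the same kind of product. A minimal Python sketch, assuming 365 equally likely birthdays and ignoring February 29 (the function name is illustrative):

```python
from math import prod

def p_shared_birthday(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday,
    assuming `days` equally likely birthdays."""
    p_all_different = prod((days - i) / days for i in range(people))
    return 1 - p_all_different

# Smallest group size where a shared birthday is more likely than not.
n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 3))  # 23 0.507
```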

mcgato · June 15, 2018, 3:34pm

It isn’t that difficult to select without getting a duplicate. Assign a random number to all 40 questions, sort by the random number, and pick the 10 smallest random numbered questions.
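A minimal Python sketch of that random-key approach, assuming the questions can simply be held in a list (the function name and the numbered bank are hypothetical, for illustration only). Ties between random keys are possible in principle but vanishingly unlikely with floating-point keys.

```python
import random

def pick_by_random_keys(questions, k, rng=random):
    """Tag every question with a random number, sort by that
    number, and keep the k questions with the smallest numbers."""
    keyed = [(rng.random(), q) for q in questions]
    # Sort on the random key only, so questions themselves are never compared.
    keyed.sort(key=lambda pair: pair[0])
    return [q for _, q in keyed[:k]]

bank = list(range(1, 41))  # hypothetical bank: questions numbered 1..40
picked = pick_by_random_keys(bank, 10)
print(picked)  # 10 distinct question numbers; varies from run to run
```

Because each question appears in the list exactly once, the 10 picks can never contain a repeat.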

naita · June 15, 2018, 3:37pm

That happens to give an answer close to the correct one, but only by chance. Move the numbers away from 10 out of 40 and there are a few other cases where you luck out, but most of the time your method gives a horribly wrong answer to the question posed.

Say he was picking 15 questions out of 30. By your method he has 15 shots at making a 1/30 draw, for a probability of 50%. The real answer is that there’s a 98.6% chance of a repeat.
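A quick Python sketch comparing the “k shots at a 1/n draw” estimate with the exact repeat probability for a few cases. It assumes draws with replacement from n equally likely questions; both function names are illustrative.

```python
from math import prod

def naive_estimate(bank_size: int, draws: int) -> float:
    # The "k shots at a 1/n draw" reasoning (incorrect).
    return draws / bank_size

def exact_repeat(bank_size: int, draws: int) -> float:
    # 1 minus the chance that every draw avoids all earlier ones.
    return 1 - prod((bank_size - i) / bank_size for i in range(draws))

for n, k in [(40, 10), (30, 15), (365, 23)]:
    print(f"{k} of {n}: naive {naive_estimate(n, k):.3f}, "
          f"exact {exact_repeat(n, k):.3f}")
```

For 15 of 30 this gives 0.500 for the naive estimate versus 0.986 exact, matching the figures above.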

Lance_Turbo · June 15, 2018, 3:57pm

Since the question has been answered, here’s one way to get 10 truly random numbers out of 40 without duplicates…

random.org

Knowed_Out · June 15, 2018, 3:58pm

Lovely. Looks like I have to add more coding. Thanks, all.

markn_1 · June 15, 2018, 4:01pm

Actually, even in this case Jasmine’s method gives a horribly wrong answer. Jasmine says the probability of drawing a duplicate is 25%. The correct answer, from scr4’s calculation, is 70.7%.
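If the formula doesn’t convince the project managers, a simulation might. A minimal Python sketch, assuming each of the 10 draws is independent and uniform over the 40 questions; the trial count and seed are arbitrary choices.

```python
import random

def simulate_repeat_rate(bank_size=40, draws=10, trials=50_000, seed=1):
    """Estimate how often `draws` independent picks (with replacement)
    from `bank_size` questions contain at least one repeat."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        picks = [rng.randrange(bank_size) for _ in range(draws)]
        if len(set(picks)) < draws:  # some question came up twice
            hits += 1
    return hits / trials

print(simulate_repeat_rate())  # should land close to 0.707
```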

Knowed_Out · June 15, 2018, 5:07pm

I think I’m just going to divide each bank of 40 into 10 banks of 4 each, and have one draw from each of the 10 banks.
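A minimal Python sketch of that plan, assuming (hypothetically) that the banks are consecutive blocks of 4 in the question list and that the list length divides evenly into the number of banks. Because the banks don’t overlap, repeats are impossible.

```python
import random

def draw_one_per_bank(questions, bank_count=10, rng=random):
    """Split the questions into fixed, non-overlapping banks
    (consecutive blocks here) and draw one question from each bank."""
    bank_size = len(questions) // bank_count
    banks = [questions[i * bank_size:(i + 1) * bank_size]
             for i in range(bank_count)]
    return [rng.choice(b) for b in banks]

bank = [f"Q{i}" for i in range(1, 41)]  # hypothetical question labels
print(draw_one_per_bank(bank))  # one of Q1-Q4, one of Q5-Q8, and so on
```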

naita · June 15, 2018, 8:56pm

:smack: Yup. I messed that one up. I just thought “29 and 25 are pretty close, I wonder how much you have to change the conditions for that guesstimate to be truly awful”, and failed to properly comprehend what they were stating.

Thudlow_Boink · June 15, 2018, 9:03pm

It’s already been pointed out that this is wrong, but I think it’s worth saying a bit more about why it’s wrong (since this sort of reasoning, though incorrect, is common).

The error is thinking you have “ten chances out of forty to draw the same given question.” What you really have to consider is not 10 chances of drawing one particular question, but 10 chances of drawing a question that matches any of the other questions drawn.
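To make that concrete, here is a small Python sketch that tracks, draw by draw, the chance that the new question matches one already drawn. It assumes draws with replacement from 40 equally likely questions. The chance grows with every draw, which is exactly what the “ten chances out of forty” reasoning misses.

```python
bank_size, draws = 40, 10
p_still_clean = 1.0  # chance of no repeat among the draws so far
for i in range(1, draws + 1):
    # Given no repeat yet, draw i collides with one of the i - 1 earlier questions.
    p_match = (i - 1) / bank_size
    p_still_clean *= 1 - p_match
    print(f"draw {i:2d}: P(matches an earlier draw) = {p_match:.3f}, "
          f"P(no repeat so far) = {p_still_clean:.3f}")
```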

DPRK · June 15, 2018, 9:09pm

What that doesn’t do is select every 10-element subset of questions with equal probability, unlike, say, mcgato’s algorithm or the Fisher–Yates (Knuth) shuffle. So it may be worth specifying what the results should be before selecting an algorithm.
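For comparison, a minimal Python sketch of a partial Fisher–Yates shuffle, which picks every 10-question subset with equal probability (assuming the underlying random number generator is uniform). The function name is illustrative only; in practice Python’s standard `random.sample(population, k)` already returns k distinct items.

```python
import random

def sample_fisher_yates(items, k, rng=random):
    """Pick k distinct items uniformly at random with a partial
    Fisher-Yates shuffle: only the first k positions get shuffled."""
    pool = list(items)  # copy so the caller's bank is left untouched
    n = len(pool)
    for i in range(k):
        j = rng.randrange(i, n)  # uniform over positions not yet chosen
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]

print(sample_fisher_yates(range(1, 41), 10))
```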

Lemur866 · June 15, 2018, 9:36pm

What makes this a bad method, though, assuming the division into the 10 banks really is random? Yes, after the banks are created, some combinations become impossible. But why would that matter if you’re creating a new series of random banks every time?

DPRK · June 15, 2018, 9:47pm

Maybe I did not understand Knowed_Out’s proposal to use banks. If you already know a method of creating a series of random banks, why not create 4 banks of 10 each instead, and let the first bank be the set of questions? And how would you create a series of random banks more simply than performing a random shuffle anyway?

scr4 · June 15, 2018, 9:54pm

That would be OK, but it sounded like the OP was talking about a fixed set of 10 banks with 4 questions each. That definitely would rule out many possible combinations.
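To put a number on “many possible combinations”, a short Python sketch counting the 10-question sets each scheme can produce, assuming fixed banks of 4 with one pick per bank:

```python
from math import comb

any_subset = comb(40, 10)  # every 10-question set is possible
one_per_bank = 4 ** 10     # fixed 10 banks of 4, one pick from each
print(f"{any_subset:,} vs {one_per_bank:,}")
# 847,660,528 vs 1,048,576
print(f"fraction reachable: {one_per_bank / any_subset:.2%}")
```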

Though that’s not necessarily a bad thing, depending on the goal of this randomization. If these are test questions, each bank could consist of 4 questions in the same category or subject. Then you’re guaranteed to get 1 question out of every category/subject.

Delayed_Reflex · June 15, 2018, 9:57pm

Is it possible for you to just set it up so that the questions are drawn in sequence, and when a question is drawn, it is marked as picked and can’t be picked again? Think of it like drawing numbers out of a hat and discarding them as you go.
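That “hat” approach is only a few lines. A minimal Python sketch, with an illustrative function name, where each drawn question is removed from the pool so it can’t come up again:

```python
import random

def draw_without_replacement(questions, k, rng=random):
    """Draw a question, set it aside, and draw the next one
    only from the questions still left in the hat."""
    hat = list(questions)
    drawn = []
    for _ in range(k):
        drawn.append(hat.pop(rng.randrange(len(hat))))
    return drawn

print(draw_without_replacement(range(1, 41), 10))
```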

Lemur866 · June 15, 2018, 10:00pm

OK, I get it. I was thinking that the banks were randomly created, you were thinking the banks were arbitrarily created, but we don’t know for sure. Although as you say, if you have a method to randomly sort 40 questions into 10 banks of 4, then you’ve got a method to randomly sort 40 questions into 4 banks of 10, so why not just use the method directly?

And it matters whether this is a one-off process or you need to use your choosing method over and over again. If you’re doing it multiple times and don’t mix the banks every time, then you’ll eliminate lots of possible combinations. But that might not matter either, if covering every combination isn’t important.
