I am very rusty at statistics. I am trying to work out how to calculate the required sample size for a given confidence level and margin of error (the half-width of the confidence interval). I know the population size, and the variable is binary. My statistics text covers sample size for estimating the mean of a normally distributed variable in a population (like average scores for third graders taking standardized tests), but not my situation: because the variable is binary rather than continuous, I don't see how that normal-distribution approach carries over.
The scenario is this: how many values must be sampled from a database to estimate its error rate, that is, the proportion of values that contain an error? A value either has an error or it doesn't; we don't count multiple errors within the same data value.
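If I'm piecing it back together correctly, that error rate is just a proportion, so I think the underlying model would be something like the following (the notation is mine, and please correct me if this is off):

$$
\hat{p} = \frac{X}{n}, \qquad X \sim \mathrm{Binomial}(n,\, p), \qquad \text{CI: } \hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
$$

where $X$ is the number of sampled values that have an error, $n$ is the sample size, and $z$ is the critical value for the chosen confidence level.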
As an example, let's take a population of 10 million values, a 99% confidence level, and a margin of error of ±5%.
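To show how far I've gotten on my own, here is a rough sketch (in Python) of the calculation I suspect applies: the standard sample-size formula for a proportion, with a finite population correction. The worst-case assumption p = 0.5 and all of the variable names are mine, and I'm not confident this is the right approach, which is really what I'm asking.

```python
import math
from statistics import NormalDist

# My example numbers: N = 10 million database values, 99% confidence,
# margin of error of +/-5%. p = 0.5 is (I believe) the most conservative
# choice when the true error rate is unknown.
N = 10_000_000   # population size
confidence = 0.99
e = 0.05         # margin of error (half-width of the interval)
p = 0.5          # assumed proportion of values with errors (worst case)

# Two-sided critical value from the standard normal (about 2.576 for 99%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Sample size for an effectively infinite population.
n0 = (z ** 2) * p * (1 - p) / e ** 2

# Finite population correction, which I think barely matters here
# because N is so large relative to n0.
n = n0 / (1 + (n0 - 1) / N)

print(f"z = {z:.3f}, n0 = {math.ceil(n0)}, n with FPC = {math.ceil(n)}")
# I get roughly 664 either way -- is that the right ballpark?
```

If that's correct, the finite population correction seems almost irrelevant at N = 10 million, but I'd like confirmation that this is even the right formula for my case.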
I learned this stuff in school but never used it again. A link to a site with a good explanation would be enough if you know of one; all I could find by searching was material similar to what's in my textbook.