I am very rusty at statistics. I am trying to work out how to calculate the required sample size for a given confidence level and margin of error (the half-width of the confidence interval). I know the population size, and the variable is binary. My statistics text covers sample size for estimating the mean of a normally distributed variable in a population (like average scores for third graders taking standardized tests), but not my situation: because the variable is binary rather than continuous, I don't see how that normal-distribution approach carries over.
The scenario is this: how many values must be sampled from a database to estimate its error rate, that is, the proportion of values that contain an error? A value either has an error or it doesn't; we don't count multiple errors within the same data value.
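If I'm piecing it back together correctly, that error rate is just a proportion, so I think the underlying model would be something like the following (the notation is mine, and please correct me if this is off):

$$
\hat{p} = \frac{X}{n}, \qquad X \sim \mathrm{Binomial}(n,\, p), \qquad \text{CI: } \hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
$$

where $X$ is the number of sampled values that have an error, $n$ is the sample size, and $z$ is the critical value for the chosen confidence level.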
As an example, let's take a population of 10 million values, a 99% confidence level, and a margin of error of ±5%.
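To show how far I've gotten on my own, here is a rough sketch (in Python) of the calculation I suspect applies: the standard sample-size formula for a proportion, with a finite population correction. The worst-case assumption p = 0.5 and all of the variable names are mine, and I'm not confident this is the right approach, which is really what I'm asking.

```python
import math
from statistics import NormalDist

# My example numbers: N = 10 million database values, 99% confidence,
# margin of error of +/-5%. p = 0.5 is (I believe) the most conservative
# choice when the true error rate is unknown.
N = 10_000_000   # population size
confidence = 0.99
e = 0.05         # margin of error (half-width of the interval)
p = 0.5          # assumed proportion of values with errors (worst case)

# Two-sided critical value from the standard normal (about 2.576 for 99%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Sample size for an effectively infinite population.
n0 = (z ** 2) * p * (1 - p) / e ** 2

# Finite population correction, which I think barely matters here
# because N is so large relative to n0.
n = n0 / (1 + (n0 - 1) / N)

print(f"z = {z:.3f}, n0 = {math.ceil(n0)}, n with FPC = {math.ceil(n)}")
# I get roughly 664 either way -- is that the right ballpark?
```

If that's correct, the finite population correction seems almost irrelevant at N = 10 million, but I'd like confirmation that this is even the right formula for my case.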
I learned this stuff in school but never used it again. A link to a site with a good explanation would be enough if you know of one; all I could find by searching was material similar to what's in my textbook.