How do I calculate a really simple (and stupid) confidence interval?

Ok, I’m completely at a loss here. I was trying to make a joke about a statistical survey (read as: doomed from the start) that was very small, and thus wildly inaccurate, but I can’t seem to figure out how to calculate the confidence interval for my “survey.”

My survey size is 2. Myself, and the first person to walk through my door after I discovered a certain fact about the company we work for. He did not previously know the fact either.

The population size is 250, the approximate number of people working at my company.

I want to know the percentage of people who work at my company who were not yet aware of this fact, to a 95% confidence level. I expect that the range will be huge, but I have no idea how to calculate it. I found several formulae and online calculators, but they all seem to break down on such a small survey. Little help?

The standard deviation is too large in small samples for confidence limits to be calculated.

Your data simply cannot be used… or it generates only obvious conclusions:

  1. 0.4% (i.e., only you) know the fact.
  2. 99.6% (i.e., only your coworker didn’t) know the fact.
  3. Something, anything, in the middle.

What’s the fact?

I think there’s a slight confusion (ambiguity in how I phrased the question): I’m saying that neither of us knew the fact. Or perhaps I should phrase it as “neither of us were told the fact on starting our employment here.” So, with that definition, the standard deviation of the sample is zero: we both answered the same way.

The fact is a particular of how our health plan covers eye exams. Not particularly interesting, except to those who could have saved money using it and didn’t know about it.

I suppose that I could say that the lower limit is 2/250 people not knowing this, and the upper is all 250 people, but it seems like there ought to be some statistical math involved somewhere (even if it is small enough to reduce to zero in this case).

Obviously, if I polled three people, the chances of a meaningful answer are still small, but if I polled 20, then the confidence interval ought to be tighter than just 20/250 to 250/250, right? So, what’s the term in the calculation that drops away for a sample of 2 but is important for one of 20?

I don’t know how useful this will be, but my tables for small-sample statistics give the following.

90% confidence level. Sample size of 5. With 0 successes in the sample the success rate for an infinite population is 0 to 41%.

95% confidence level. Sample size of 10. The success rate would be 0 to 31%.
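For what it’s worth, numbers in this ballpark can be reproduced with the exact (Clopper-Pearson) binomial interval, which has a simple closed form when the sample contains zero successes. This is only a sketch assuming an effectively infinite population, and a given printed table may differ by a few percentage points depending on the exact method it uses:

```python
# Exact (Clopper-Pearson) two-sided interval for a proportion when 0 of n
# trials "succeed": the lower limit is 0 and the upper limit has a closed form.
def zero_success_interval(n, confidence):
    alpha = 1.0 - confidence
    upper = 1.0 - (alpha / 2.0) ** (1.0 / n)   # solves P(X = 0 | p) = alpha/2
    return 0.0, upper

for n, conf in [(5, 0.90), (10, 0.95), (2, 0.95)]:
    lo, hi = zero_success_interval(n, conf)
    print(f"n = {n:2d}, {conf:.0%} interval: {lo:.0%} to {hi:.0%}")
```

For n = 2 with zero successes, the same formula gives roughly 0 to 84%, which is about as uninformative as you would expect.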

Obviously the upper bound is 250, that is, everyone is ignorant. For a lower bound, you want an n such that n/250 * (n-1)/249 = 0.05, which works out to n = 56.29 ~ 56. So if only 56 people are ignorant of the fact, there is only about a 5% chance that you two would both be ignorant (assuming independence, blah blah).

Thus, your confidence interval is roughly [22.5%, 100%].
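For anyone who wants to check that arithmetic, here is a minimal Python sketch of the same calculation, under the same assumptions (a population of 250, both sampled people ignorant, sampling without replacement, and a one-sided 5% cutoff for the lower bound):

```python
import math

POPULATION = 250   # company size
ALPHA = 0.05       # one-sided 5% cutoff -> 95% lower confidence bound

# Solve n/250 * (n-1)/249 = ALPHA for n, i.e. n^2 - n - ALPHA*250*249 = 0.
c = ALPHA * POPULATION * (POPULATION - 1)
n = (1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0

def prob_both_ignorant(k):
    """Chance that two people drawn without replacement are both among the
    k ignorant people in the population."""
    return (k / POPULATION) * ((k - 1) / (POPULATION - 1))

print(f"n = {n:.2f}  ->  lower bound about {n / POPULATION:.1%}")
print(f"P(both ignorant | 56 ignorant) = {prob_both_ignorant(56):.3f}")
print(f"P(both ignorant | 57 ignorant) = {prob_both_ignorant(57):.3f}")
```

The cutoff falls between 56 and 57 ignorant people, so the lower limit of the interval comes out around 22.5%.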

Thanks Shalmanese, that was exactly the kind of calculation I was looking for.

Since you do not have a random sample, you cannot even think in terms of a formal inference. Additionally, since your “sample size” is only 2, the only possible estimates for your sample proportion are 0, .50, or 1.00. The standard errors for these sample proportions are, respectively:

sd_p = sqrt(p*(1-p)/n) = sqrt(0*1/2) = 0
sd_p = sqrt(p*(1-p)/n) = sqrt(.5*.5/2) = .5/sqrt(2)
sd_p = sqrt(p*(1-p)/n) = sqrt(1*0/2) = 0
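A quick numerical check of those three values (just a sketch of the same sqrt(p*(1-p)/n) formula with n = 2):

```python
import math

n = 2
for p in (0.0, 0.5, 1.0):
    se = math.sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    print(f"p = {p:.2f}: sd_p = {se:.3f}")   # 0.000, 0.354 (= .5/sqrt(2)), 0.000
```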

At n=2, the usual stochastic properties of the sample proportion do not apply.
You need a proper random sample to use confidence estimation, and that excludes convenience sampling.

Even the conditionally exact, permutation-based methods fall short at n=2, even if you did manage to draw a proper random sample.

cerberus, I’m afraid I don’t understand completely. Are you saying that Shalmanese’s calculations are incorrect, or are you just adding further information about the statistics?

I realize that my “sample” was not random. But let’s assume for a moment that I really did take a very small random sample. There should still be a calculable 95% interval, even if it is very wide, right?

It would be highly uninformative … think about n=2:

There are three possibilities: 0 of 2, 1 of 2, 2 of 2. In the extreme cases, you get a lack of variation in the sample, hence the zero standard error. In the middle case, you get an interval centered on p=.50.

The methodology for n=2 is utterly, completely useless, because the minimum resolution of a sample of 2 is 50%.
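To make that concrete, here is a sketch of what the usual normal-approximation (Wald) interval does at n = 2 for each of the three possible outcomes (clipping at 0 and 1):

```python
import math

n, z = 2, 1.96   # sample size and the 95% normal quantile

for successes in range(n + 1):
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    lo, hi = max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)
    print(f"{successes} of {n}: p_hat = {p_hat:.2f}, interval [{lo:.2f}, {hi:.2f}]")
```

Two of the three cases collapse to a zero-width interval and the middle one spans essentially the whole range, which is exactly the uselessness being described.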

This is computing a confidence interval post-experimentation. You already have the result. Your calculations assume normality, which is not valid in this case. With a sample size of 2, you can go all the way back to first principles and work it out from there.
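Working it out from first principles is short enough to do by brute force. Here is a sketch, under the assumption that the two respondents really were a random draw from the 250 employees: for every possible number of ignorant people k, compute the chance that two random draws are both ignorant, and keep the values of k that the observed data do not rule out at the 5% level.

```python
POPULATION = 250
ALPHA = 0.05

def p_data(k):
    """Chance that two people drawn at random (without replacement) are both
    among the k ignorant employees."""
    return (k / POPULATION) * ((k - 1) / (POPULATION - 1))

# Every hypothetical count k that the observed data (2 of 2 ignorant) do not
# rule out at the 5% level.
plausible = [k for k in range(POPULATION + 1) if p_data(k) >= ALPHA]
print(f"plausible counts: {plausible[0]} to {plausible[-1]} "
      f"({plausible[0] / POPULATION:.1%} to {plausible[-1] / POPULATION:.1%})")
```

The scan keeps everything from 57 up to 250 people, i.e. roughly 23% to 100%, which agrees with the earlier calculation to within one person.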

The approach fails, since it is not based on a random sample from a well-defined population. And again, even if you did have an appropriate random sample, the only available point estimates for the proportion based on a sample of 2 are: 0, .50, 1.00. Useless.