Stats 101 question - significance and binomials.

Chessic_Sense · January 19, 2012, 3:23pm

I have a population of people and they have two discrete characteristics. Let’s say they’re either Virginian or not, and either tall or not tall. I need to know if tall people should validly be suspected of being Virginians.

So we’ve taken a sample. Unfortunately, it’s not random: subjects volunteered, and many were Virginians. We know that there’s a correlation between being Virginian and being tall.

But is it significant or not? I vaguely recall doing a binomial test, but I’m not sure if that’s called for here. I also remember something about putting the numbers in a 2x2 square and doing some math to figure it out. I can’t google it because I don’t know what it’s called.

Note: We’re not really interested in Virginians as a whole…just those that volunteer for these sorts of things. So we’re cool with our self-selected sample, right? I mean, we only care about predicting the characteristics of future volunteers.

Buck_Godot · January 19, 2012, 3:43pm

A binomial test is used to determine whether a given population deviates from a known probability, for example determining whether a coin is fair by checking whether the proportion of heads is 50%. In your case you have no known probability, but want to see whether tallness is significantly more common in one group than in another. For this, the two recommended tests are the chi-squared test and Fisher’s exact test. In general Fisher’s exact test is better, but the chi-squared test is easier to calculate if you have a large sample size. Here is an online calculator for both tests
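For anyone who would rather compute these than use a calculator, here is a minimal sketch of both tests on a 2x2 table using only the Python standard library. The function names and the counts in the usage lines are made up for illustration, and the chi-squared version has no continuity correction.

```python
from math import comb, erfc, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible top-left cell, margins fixed.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts: rows are Virginian / non-Virginian,
# columns are tall / not tall.
print(fisher_exact_two_sided(4, 296, 1, 199))
print(chi_squared_2x2(4, 296, 1, 199))
```

With cell counts this small the two p-values can disagree noticeably, which is the usual reason to prefer the exact test.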

Buck_Godot · January 19, 2012, 3:50pm

ETA: These tests are conditioned on the number of cases in each of the two categories, so even if you are biased for Virginians or for tall people (say your survey comes from a big and tall store in Richmond), the test will work, provided you aren’t biased for tall (or short) Virginians.
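A quick way to see why oversampling one whole group is harmless: the odds ratio of a 2x2 table does not change when an entire row is multiplied by a constant. The sketch below uses made-up counts and a hypothetical helper name; it illustrates the idea only and is not a proof.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Hypothetical counts: rows are Virginian / non-Virginian,
# columns are tall / not tall.
balanced = odds_ratio(4, 96, 1, 99)

# Same tall rate among Virginians, but ten times as many Virginians sampled.
oversampled = odds_ratio(40, 960, 1, 99)

print(balanced, oversampled)
```

If the sample instead over-recruits one specific cell, such as tall Virginians, only that cell grows and the odds ratio shifts, which is the case the caveat above warns about.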

cerberus · January 19, 2012, 4:36pm

The basic theory typically requires random sampling from a clearly defined population.

Chessic_Sense · January 19, 2012, 4:36pm

We’re skewed, but we’re not sure if we’re biased. We’re definitely biased for Virginians, but we don’t know about tall people. The bar is set pretty high. Only about 1% of people are “tall” by our metric. Is that a problem?

Cerberus - While we’re examining a self-selected group of Americans, we’re only interested in survey respondents. We don’t care about everybody else. We don’t care if Virginians are tall, just if our survey respondents are (and thus, will be) tall Virginians. So in that sense, I think we’re actually sampling 100% of the defined population. Right?

nivlac · January 19, 2012, 6:47pm

Let’s say you define some cut-off height as tall or not tall. This could be the mean height of some larger population from which the Virginians (Vs) were selected. From the V sample, count how many are tall based on your pre-defined tall definition. You can then calculate the probability that the observed number of tall Vs is as large as it is under the null hypothesis that the Vs are no different (or no taller, for a one-sided test) than the general population. This p-value can be calculated using the binomial distribution. Then you can either reject or not reject the null hypothesis based on the p-value (reject if the p-value is smaller than your significance level, etc.). That’s one way to do it.
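As a rough sketch of that calculation, the one-sided p-value is the upper tail of a binomial distribution. This assumes the population rate of tallness (p0 below) is known exactly; the 1% figure comes from earlier in the thread, while the sample size and count are invented for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0). Fine for modest n."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical sample: 300 Virginians, 8 of them tall, population rate 1%.
p_value = binomial_upper_tail(8, 300, 0.01)
print(p_value)
```

Reject the null hypothesis if the printed p-value falls below the significance level chosen in advance.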

Buck_Godot · January 19, 2012, 6:51pm

None of this should be a problem from a theoretical point of view, although with only 1% of cases being tall, you will need either a large sample size or a very large effect in order to detect a difference.

In general if you have N total samples, and p < 50% are Tall and q < 50% are non-Virginian, then the size of the effect you are able to detect will be more or less proportional to pqN.
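To get a feel for how thin the data gets when only about 1% of people are tall, here is a small sketch that computes the expected cell counts under independence. The counts are hypothetical. A common rule of thumb (not something stated in this thread) is that the chi-squared approximation becomes unreliable when any expected count is below about 5.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts under independence for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]

# Hypothetical sample of 500: rows are Virginian / non-Virginian,
# columns are tall / not tall, with 1% tall overall.
table = expected_counts(4, 296, 1, 199)
smallest = min(min(row) for row in table)
print(table, smallest)
```

When the smallest expected count is this low, Fisher’s exact test is the safer choice, and only a larger sample will make a moderate difference detectable.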

Buck_Godot · January 19, 2012, 6:58pm

This would require assuming that your estimate for that percentage in the null population was exactly correct. Since you are likely estimating it from the finite set of non-Vs in your sample, it will not be completely accurate, and you need to take this extra variability into account in your test. To do it right you really need to use either the chi-squared or Fisher test.

Buck_Godot · January 19, 2012, 7:05pm

Oops, forgot a square root. It should have been sqrt(Npq).

cerberus · January 20, 2012, 9:47pm

No. The original question was about the relationship of two populations: Tall People and Virginians. The leap you want to take is to address the question with a sample.

Random samples allow for the incorporation of sampling variation; non-random sampling does not.

You can certainly address the issue in terms of the single sample used, but the answer only applies to that sample.
