You can have a look at my stats analysis here in post 86, but I don’t really know what I’m doing. It seems that dopers tend not to be born on Mondays for some reason, and there’s under a 10% chance of seeing a deviation this big by chance alone.
But I don’t really know what I’m talking about, so I’d appreciate y’all having a look at that thread.
Note that the above analysis assumes there is no particular reason for the deviation; I would expect an analysis looking specifically for a Monday deviation to be statistically significant.
So, while I did an analysis just checking whether the deviation from a uniform distribution was unusual, or at least I think I did, maybe you could also check whether assuming Monday is different does something.
Oh and just did the maths for the current 255 values, see:
Chi squared equals 10.808 with 6 degrees of freedom.
The two-tailed P value equals 0.0945
By conventional criteria, this difference is considered to be not quite statistically significant.
The P value answers this question: If the theory that generated the expected values were correct, what is the probability of observing such a large discrepancy (or larger) between observed and expected values? A small P value is evidence that the data are not sampled from the distribution you expected.
Review your data:
Row #  Category  Observed  Expected #     Expected %
1      a         35        36.4285714268  14.286%
2      b         20        36.4285714268  14.286%
3      c         38        36.4285714268  14.286%
4      d         40        36.4285714268  14.286%
5      e         42        36.4285714268  14.286%
6      f         45        36.4285714268  14.286%
7      g         35        36.4285714268  14.286%
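For anyone who wants to check the calculator output by hand, here is a minimal Python sketch that recomputes the statistic from the seven observed counts. The helper name chi2_sf_even_df is my own, and it uses a closed-form upper tail that only holds for an even number of degrees of freedom (6 here), so treat it as a sanity check rather than a general-purpose routine.

```python
import math

# Observed counts for categories a..g, copied from the table above.
observed = [35, 20, 38, 40, 42, 45, 35]

n = sum(observed)                      # 255 responses
expected = n / len(observed)           # uniform expectation per day
df = len(observed) - 1                 # 6 degrees of freedom

# Pearson chi-squared statistic: sum of (O - E)^2 / E.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable.

    Closed form valid only for even df:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    """
    if df % 2 != 0:
        raise ValueError("closed form only valid for even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


p_value = chi2_sf_even_df(chi_sq, df)
print(f"chi-squared = {chi_sq:.3f}, df = {df}, p = {p_value:.4f}")
```

This is an omnibus test: it asks whether the seven counts together look uniform, not whether one particular day is low.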
Er… this looks like a false positive. You’re taking the lowest score out of 7 and saying that it is outside the 90% confidence interval. That doesn’t sound remarkable. If there were 10 choices, for example, you would expect about one of them to fall outside that interval by chance alone.
I’ll leave it to somebody else to see whether I’m messing up 1-tail/2-tail testing.
If you make the claim that Mondays are significantly underrepresented, then test it, and the test confirms your claim, you are in a fairly strong position.
If you poll data, and then take the lowest represented day, you are in a weak position.
Do the same thing with license plates. If you can write down the license plate of the first white pickup truck you will see later that day, and then later test this (and succeed), you might make a case for having a gift.
Seeing the license plate of the first white pickup truck you come across and then writing it down: not so special.
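To put rough numbers on the difference between calling your shot and picking the lowest day afterwards, here is a small simulation sketch. It assumes births are uniform over 7 days and uses the thread's 255 responses and the low count of 20; the choice of index 1 as the "called" day, the seed, and the trial count are arbitrary assumptions of mine.

```python
import random

N_BIRTHS = 255    # responses in the poll
N_DAYS = 7
LOW_COUNT = 20    # the lowest observed count in the thread's data
CHOSEN_DAY = 1    # assumption: the day named *before* looking at the data


def simulate(trials, seed=0):
    """Return (pre-specified rate, post hoc rate) under uniform births.

    pre-specified: the one day named in advance has <= LOW_COUNT births.
    post hoc:      whichever day turns out lowest has <= LOW_COUNT births.
    """
    rng = random.Random(seed)
    pre_hits = 0
    post_hits = 0
    for _ in range(trials):
        counts = [0] * N_DAYS
        for day in rng.choices(range(N_DAYS), k=N_BIRTHS):
            counts[day] += 1
        if counts[CHOSEN_DAY] <= LOW_COUNT:
            pre_hits += 1
        if min(counts) <= LOW_COUNT:
            post_hits += 1
    return pre_hits / trials, post_hits / trials


pre_rate, post_rate = simulate(20000)
print(f"day named in advance: {pre_rate:.4f}")
print(f"lowest day, picked afterwards: {post_rate:.4f}")
```

The post hoc rate can never be lower than the pre-specified one, because any run where the named day is that low also counts as a run where the lowest day is that low.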
I’m just curious about possible reasons why. It is definitely a self-selecting sample, but I would think even in a self-selecting sample, something like day of birth should be random. I can’t see any reason to think it’s not, and am curious what those reasons could be, even if they don’t matter.
That said, I think it’s just statistical noise.
I would think that having a self-selected respondent group would only matter if there was some correlation between why they were self-selecting and the choices in the poll. If people born on Mondays were less likely to respond to polls, for example. That doesn’t seem likely to me, though.
I suspect people having Caesarians are skewing the numbers, which may partly explain it.
That’s not a random sample from the entire population - speaking about significance is meaningless. Your results might still accurately reflect the distribution in the population, sure, but we have no reasonable way of establishing the confidence with which we can make that assertion.
Hang on. Doesn’t that just depend on what your question is? Obviously with a non-random sample of Dopers, we can’t make any inferences about the larger population of Dopers, Americans, or English-speaking-people-on-the-internet. But we can still say whether or not this (non-random, probably biased) sample of birth days is consistent with the naive hypothesis that birth days should be uniformly distributed.
That’s what the Chi-square test in post #2 tells us, yes? I.e. if we sample from uniformly distributed birth days, we’d see similar deviation from expectation in about 1 in 10 trials (P = .09). Which is a bit uncommon, but it’s common enough that you can’t make strong conclusions about this particular data set.
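The "about 1 in 10 trials" reading can be checked directly by brute force. Below is a sketch that draws 255 birth days uniformly at random many times and counts how often the chi-squared statistic is at least as large as the one from the observed data. The function names, seed, and trial count are my own choices, and a simulated fraction will only approximate the tabulated P value.

```python
import random

# Observed counts for categories a..g from the thread.
OBSERVED = [35, 20, 38, 40, 42, 45, 35]


def chi_squared(counts):
    """Pearson chi-squared statistic against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def monte_carlo_p(observed, trials, seed=0):
    """Fraction of uniform samples at least as extreme as the observed data."""
    rng = random.Random(seed)
    n, k = sum(observed), len(observed)
    threshold = chi_squared(observed)
    hits = 0
    for _ in range(trials):
        counts = [0] * k
        for day in rng.choices(range(k), k=n):
            counts[day] += 1
        if chi_squared(counts) >= threshold:
            hits += 1
    return hits / trials


mc_p = monte_carlo_p(OBSERVED, 5000)
print(f"simulated P = {mc_p:.3f}")
```

Note that this only answers the narrow question of whether these 255 counts look like draws from a uniform distribution; it says nothing about how the respondents were selected.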
And I’m pretty sure there aren’t inappropriate multiple comparisons here. That’d be the case if the OP tested whether the number of births on each day individually deviated from 1/7.
Disclaimer: I am not a statistician, but as a biologist I know just enough stats to be dangerous…
You can measure the chi-squared distance between the uniform distribution and the observed distribution, but all the theory about the sampling distribution under the null requires that your sample be random. It’s math, and so what matters here is what you know, not what you can believe.
If the day of the week were truly random, and you selected in some non-random way that was independent of the day of birth, how can that skew the polling?
I can believe that if the day of the week is not random, then the self-selecting could skew the results. e.g., if dopers tend to be wealthier than average, wealthier people are more likely to have C-sections, and C-sections are done mostly on Tuesday to Friday, then you’ll get an enhanced disparity, not representative of the general population. But that’s only because the day of the week wasn’t random to begin with.
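Here is a toy calculation of that argument. Every number in it is made up for illustration: the C-section share, the response rates, and the assumption that scheduled C-sections fall evenly on Tuesday to Friday while all other births are uniform over the week. The function name sample_day_shares is hypothetical too.

```python
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
SCHEDULED_DAYS = ("Tue", "Wed", "Thu", "Fri")  # assumed C-section days


def sample_day_shares(csection_share, keep_spontaneous, keep_csection):
    """Day-of-week shares among respondents in a toy model.

    csection_share:   fraction of all births that are scheduled C-sections
    keep_spontaneous: response rate of people born spontaneously
    keep_csection:    response rate of people born by C-section
    """
    shares = {}
    for day in WEEKDAYS:
        spontaneous = (1 - csection_share) / 7 * keep_spontaneous
        scheduled = csection_share / 4 if day in SCHEDULED_DAYS else 0.0
        shares[day] = spontaneous + scheduled * keep_csection
    total = sum(shares.values())
    return {day: share / total for day, share in shares.items()}


# Births uniform over the week: no selection effect can skew the days.
uniform_case = sample_day_shares(0.0, 0.3, 0.6)

# Non-uniform births, selection unrelated to C-sections:
# the sample simply mirrors the population.
independent_case = sample_day_shares(0.2, 0.3, 0.3)

# Non-uniform births, C-section babies over-represented among respondents:
# the Tuesday-to-Friday bump is exaggerated and Monday shrinks further.
biased_case = sample_day_shares(0.2, 0.3, 0.6)

for name, case in [("uniform", uniform_case),
                   ("independent", independent_case),
                   ("biased", biased_case)]:
    print(name, {day: round(share, 3) for day, share in case.items()})
```

In this toy model the skew only appears in the third case, where the births were non-uniform to begin with and the selection is tied to the thing that makes them non-uniform.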