In reading a medical journal article, I came across the following statement:
Mean age of girls was 13.56 ± 0.79, and of the boys was 14.73 ± 1.02 (p = 0.032).
My question: What does the ‘p’ refer to?
Many thanks,
mmm
p is the probability of seeing a result like this by chance alone. It’s saying that “if the null hypothesis is true, then a result at least this extreme would only occur 3.2% of the time.”
My guess is that it is testing whether there is a statistically significant difference between boys and girls. Thus it is testing whether the means are significantly different, and it gets a p-value of 0.032, which would be significant at the 95% confidence level.
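To make that concrete, here’s a rough sketch (in Python, using scipy) of the kind of two-sample test that could produce a number like that. The article doesn’t report sample sizes or say whether the ± values are standard deviations or standard errors, so the n = 12 per group and the standard-deviation reading below are assumptions purely for illustration:

```python
# Hypothetical sketch of a two-sample (Welch's) t-test built from the reported
# summary statistics. The sample sizes are NOT given in the article; n = 12 per
# group is an assumption, and the "±" values are treated as standard deviations.
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=13.56, std1=0.79, nobs1=12,   # girls (assumed n)
    mean2=14.73, std2=1.02, nobs2=12,   # boys (assumed n)
    equal_var=False,                    # Welch's t-test, no equal-variance assumption
)
print(result.statistic, result.pvalue)  # p-value for the null of equal mean ages
```

Note that the p-value you get depends heavily on the assumed sample sizes, which is one reason the quoted sentence alone doesn’t tell you much.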
Can you give us more info about the study? It’s not common for age to be a dependent variable. Did they, say, look at a school’s band and see if maybe older kids or younger kids tend to join the band? I can’t think of many other scenarios where you’d measure age as a dependent variable.
So…the lower the p value, the greater chance the results are reliable?
What sort of p value is considered acceptable? That is, what is the threshold between valuable info and questionable results?
It is a published study on types of headaches experienced by adolescents. The participants were followed up for four consecutive years.
There’s no hard and fast answer to that. A value of 5% might be acceptable in many cases, but not in others (e.g. a DNA match in a murder trial).
I wouldn’t phrase it that way. Think of it as a measure of plausibility of the null hypothesis. The lower the p-value, the harder it is to believe that the null hypothesis is true.
In general, people use p = .05 as a cutoff for “significance”. But just claiming that every test with a p-value less than a certain value gives you valuable information is wrong-headed to say the least. What you want to do really depends on the problem at hand, and it’s much harder than the average user of statistics believes to write down a set of instructions that will always give meaningful results.
The common convention is that p < 0.05 indicates a statistically significant result. Loosely, that means that if nothing real were going on, you’d see a result like this less than 5% of the time, so it’s probably due to something interesting rather than just a random fluke. Admittedly, it’s a fairly arbitrary line to draw.
In the context of this study, it means that you can be fairly confident in the result that girls get headaches about one year before boys, on average.
Of course, p < 0.05 isn’t irrefutable proof of anything. Roughly one in twenty “significant” results at the p = 0.05 level will be due to chance alone. Also, researchers can cherry-pick significant results out of a mass of random data (consciously or unconsciously). Or there could be some methodological error by the researchers – maybe they didn’t do a truly random survey of adolescents.
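As a quick sanity check of the “one in twenty” point, here’s a small simulation (the sample size and number of trials are arbitrary choices, just for illustration): even when both groups come from exactly the same distribution, about 5% of tests come out “significant” at p < 0.05.

```python
# Simulate many studies where the null hypothesis is TRUE (no real difference)
# and count how often a t-test still reports p < 0.05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials = 10_000
false_positives = 0
for _ in range(n_trials):
    a = rng.normal(0, 1, size=30)  # both groups drawn from the same distribution
    b = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(false_positives / n_trials)  # prints something close to 0.05
```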
P-values only indicate statistical significance; it is up to us as experts in our particular field (medicine, law, engineering, etc.) to determine whether there is practical significance.
In hypothesis testing, the general framework is binary: you either reject a null hypothesis or fail-to-reject it (a lot of books say “accept the null hypothesis” but that’s too close semantically to saying the null is true, which we can never conclude from a test). Nowadays, most journals don’t report “reject/fail-to-reject” but rather the p-value. But incorporating p-values into the hypothesis framework is simple: just compare the reported p-value to your personal threshold, with the idea that smaller p-values imply stronger (statistical) evidence against the null hypothesis.
Typical choices of cutoff values are 0.05, 0.1, and 0.01 (roughly in that order). Why these levels? More convention than anything else, similar to why we in the U.S. drive on the right side of the road. There is nothing magic about these cutoffs.
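For what it’s worth, the reject/fail-to-reject step itself is purely mechanical once you’ve picked a cutoff; a minimal sketch:

```python
# A minimal sketch of the reject / fail-to-reject decision described above,
# using the conventional (but arbitrary) cutoffs.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a reported p-value to a chosen significance level."""
    if p_value < alpha:
        return f"reject the null hypothesis at the {alpha} level"
    return f"fail to reject the null hypothesis at the {alpha} level"

print(decide(0.032, alpha=0.05))  # reject
print(decide(0.032, alpha=0.01))  # fail to reject
```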
You’ve got to tell us more about what the article said. What hypothesis was it trying to prove? All you’ve told us was that it was about headaches in adolescents. What claim did the article make about headaches in adolescents? It can’t have been about the difference between the ages of the boys and girls. The two age ranges you give overlap, so nothing could have been proved about the relative ages with a probability of .032.
Yes it could be. “Overlapping” does not preclude a difference. In fact, normal distributions with different means will always overlap. There *is* a crude rule of thumb that overlapping standard error bars suggest no significant difference, but that’s really only good for checking a gut impression after squinting at a graph for a few seconds. In this case we don’t even know if the plus or minus is reporting standard error or standard deviation.
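To see this concretely, here’s a quick simulation (treating the ± values as standard deviations and assuming 50 per group, neither of which we actually know): the simulated age ranges overlap, yet the difference in means is wildly significant.

```python
# Illustration that "overlapping" ranges do not rule out a significant
# difference in means. The group sizes and the standard-deviation reading
# of the "±" values are assumptions for this sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
girls = rng.normal(13.56, 0.79, size=50)
boys = rng.normal(14.73, 1.02, size=50)

print(girls.min(), girls.max())  # the two observed age ranges overlap...
print(boys.min(), boys.max())
print(stats.ttest_ind(girls, boys, equal_var=False).pvalue)  # ...yet p is tiny
```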
The part I quoted was just discussing the population of the study.
Here is a more complete version:
I am merely curious as to what the ‘p’ represents.
mmm
That’s fine, and everyone understands that, and it has been explained as well as it can be. The problem is that a naked p value means almost nothing. It’s meaningful only in context. Some of my professors would argue that it’s also meaningless unless you report what statistical test was used to generate the p value, along with all the other values that were used in that specific test.
Mention should also be made here of Bayesian statistics, which is basically just a way of quantifying the idea that extraordinary claims call for extraordinary evidence. If you find a result that would only be produced by the null hypothesis with a 1% probability, does that mean that you can conclude that the null hypothesis is false? Not necessarily, depending on what the null hypothesis is: It could just mean that you got lucky (or unlucky, depending on how you look at it). You need to have some prior estimate of the probabilities to compare to.
For instance, suppose I want to know if a coin is fair, and flip it eight times in a row, and it comes up heads every time. Well, that certainly tells me that something unusual happened. But is the unusual thing that happened just that I saw some lucky coin flips, or is the unusual thing that I have a two-headed coin? Well, in general circulation, two-headed coins are far, far rarer than legitimate coins, probably less than 1 trick coin per 10,000 legit coins. So it’s far more reasonable for me to conclude that I got lucky with a fair coin, than that I have a trick coin.
On the other hand, if I’m at a magicians’ convention and I find a coin, and flip it the same number of times and get all heads, then I’m working with a different prior. Trick coins are probably pretty common at magicians’ conventions, perhaps as much as 1 in 50 or so. Even though my p-value is the same, what I’m comparing it to is different. Now, it’s much more reasonable to assume that I have a trick coin, rather than that I just got a fluke set of flips.
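Plugging the coin example into Bayes’ rule makes the point numerically. The priors here (1 in 10,000 in general circulation, 1 in 50 at a magicians’ convention) are just the rough guesses from the post above, not real data:

```python
# Posterior probability that the coin is two-headed, given n heads in a row,
# for two different prior beliefs about how common trick coins are.
def prob_trick_coin(prior_trick: float, n_heads: int) -> float:
    """Bayes' rule: P(trick | all heads) from the prior and the likelihoods."""
    like_trick = 1.0            # a two-headed coin always lands heads
    like_fair = 0.5 ** n_heads  # a fair coin does so with probability (1/2)^n
    numerator = prior_trick * like_trick
    denominator = numerator + (1 - prior_trick) * like_fair
    return numerator / denominator

print(prob_trick_coin(1 / 10_000, 8))  # ~0.025: still probably a fair coin
print(prob_trick_coin(1 / 50, 8))      # ~0.84: now probably a trick coin
```

Same data, same “p-value”, but the sensible conclusion flips once the prior changes.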
Since this is the context, they are telling the reader that there was a statistically significant difference in ages between the boys and girls. That may or may not introduce a confound into the study (that is, any claim made about a difference between boys and girls may actually be attributable to a difference between younger and older children).
Does the “(p = 0.032)” part belong in the sentence it’s in? It sounds like they’re saying “For girls, the average age in the study group is 13.56, and the standard deviation of the ages is 0.79, and for boys, the average age in the study group is 14.73, and the standard deviation of the ages is 1.02. But there’s a 3.2% chance that’s not correct.”
Assuming they know basic math, there shouldn’t be any uncertainty in the averages or standard deviations of the people who make up the study group.
No, it definitely doesn’t mean anything like that.
No. It’s not a measure of uncertainty in the measurements. It’s a statement about probability. It’s saying that if there were no real difference between the groups, a difference in means this large would have come about through chance alone only 3.2% of the time.