A question regarding metastudies

Something I’ve been wondering about lately.

Suppose we have 10 studies on the difference in reaction times between men and women when performing a certain task. 5 of them find a statistically significant difference (a p-value less than .05), and 5 of them don’t.

Very naively, I might think that since these studies are independent, I could use the binomial distribution to find the probability that 5 of them would show a significant difference, given some assumed probability of each study finding one. If there is no real difference, each study should have only about a 5% chance of a false positive, and the chance of 5 or more false positives out of 10 independent studies is tiny. By that analysis, having 5 of 10 studies show significant differences is strong evidence that there really is a difference.
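Here is a quick sketch of the binomial calculation the question has in mind, assuming each study has only a 5% chance of reporting a significant result if the null hypothesis is true:

```python
from scipy.stats import binom

# Under the null hypothesis of no true difference, each study has
# (at most) a 5% chance of reporting p < .05 by luck alone.
n_studies = 10
alpha = 0.05

# Probability of seeing 5 or more "significant" results out of 10
# if every one of them would have to be a false positive.
p_five_or_more = binom.sf(4, n_studies, alpha)  # sf(4) = P(X >= 5)
print(f"P(>=5 significant of 10 under the null) = {p_five_or_more:.2e}")
# Roughly 6e-05, which is why the naive argument looks so compelling.
```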

Is this a valid analysis of the problem? If not, what’s wrong with it?

Meta-analysis requires that you take effect size into account. That is, when you combine results, it's not just the direction of the difference or the number of times it appears across studies, but the magnitude of the difference that tells you whether you have a consistent finding.
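As a minimal sketch of what combining on effect size looks like, assume each study reports a standardized mean difference (Cohen's d) and its standard error; the numbers below are made up purely for illustration, and the pooling shown is a simple fixed-effect (inverse-variance) combination:

```python
import numpy as np

# Hypothetical standardized mean differences and standard errors
# from ten studies -- made-up numbers, for illustration only.
d  = np.array([0.40, 0.35, 0.50, 0.45, 0.38, 0.20, 0.15, 0.30, 0.10, 0.25])
se = np.array([0.15, 0.14, 0.18, 0.16, 0.15, 0.20, 0.22, 0.19, 0.25, 0.21])

# Fixed-effect (inverse-variance) pooling: studies with smaller
# standard errors get proportionally more weight.
w = 1.0 / se**2
d_pooled = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

print(f"pooled d = {d_pooled:.3f}")
print(f"95% CI   = ({d_pooled - 1.96*se_pooled:.3f}, "
      f"{d_pooled + 1.96*se_pooled:.3f})")
```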

So if I had five studies with p < .05 and five with .05 < p < .1, that would be evidence of consistency, but if the other five had p > .5, there'd be no such evidence, right? That makes sense.
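One way to put a number on that intuition is to combine the p-values directly, for example with Fisher's method (scipy's combine_pvalues). The two sets of p-values below are made up to match the scenario described above, and the method assumes the studies are independent:

```python
from scipy.stats import combine_pvalues

# Made-up p-values: five "significant" studies plus five near-misses.
pvals_consistent = [0.01, 0.02, 0.03, 0.04, 0.045, 0.06, 0.07, 0.08, 0.09, 0.095]

# Same five significant studies, but the others are nowhere close.
pvals_mixed = [0.01, 0.02, 0.03, 0.04, 0.045, 0.55, 0.65, 0.75, 0.85, 0.95]

# Fisher's method combines independent p-values into a single
# chi-squared test of the joint null hypothesis.
for label, pvals in [("consistent", pvals_consistent), ("mixed", pvals_mixed)]:
    stat, p_combined = combine_pvalues(pvals, method="fisher")
    print(f"{label:10s} combined p = {p_combined:.2e}")
# The consistent set combines to far stronger evidence than the mixed one.
```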

Individual significance tests are influenced by sample size, though, which is why you can't rely on the size of p alone. The effect size statistics used in meta-analysis control for sample size.
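A small simulation makes the point: with an assumed true effect of d = 0.3 (a made-up value), the p-value shrinks as the sample grows, while the estimated effect size stays roughly the same:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(n_per_group, true_diff=0.3):
    """Simulate one two-group study with a fixed true effect (d = 0.3)."""
    men   = rng.normal(0.0,       1.0, n_per_group)
    women = rng.normal(true_diff, 1.0, n_per_group)
    _, p = stats.ttest_ind(women, men)
    # Cohen's d from the pooled standard deviation.
    pooled_sd = np.sqrt((men.var(ddof=1) + women.var(ddof=1)) / 2)
    d = (women.mean() - men.mean()) / pooled_sd
    return p, d

for n in (20, 80, 320):
    p, d = one_study(n)
    print(f"n per group = {n:4d}   p = {p:.4f}   d = {d:.2f}")
# The estimated d hovers around 0.3 at every sample size;
# only the p-value changes dramatically with n.
```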

If the tests were run under exactly the same conditions, you could in principle pool the results into one larger analysis with correspondingly more power. But the practical problem with meta-analyses is that no two studies are exactly alike, so someone has to compare one study's apples with another's oranges. Then you have the problem of publication bias, where studies with a positive outcome are more likely to be published. This would tend to show an effect where none exists. That's why meta-analyses are usually not very strong evidence.
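The publication-bias point is easy to see in a simulation. The sketch below (arbitrary sizes and a seed chosen for illustration) generates many small studies of a true null effect, "publishes" only those reaching p < .05, and shows that a naive meta-analysis of the published subset sees a sizeable average effect anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate many small studies of a TRUE NULL effect (no difference),
# then "publish" only those that happen to reach p < .05.
published_d = []
for _ in range(2000):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        published_d.append((b.mean() - a.mean()) / pooled_sd)

# A meta-analysis of only the published studies sees a substantial
# average |effect| even though the true effect is exactly zero.
print(f"published studies:   {len(published_d)}")
print(f"mean |d| among them: {np.mean(np.abs(published_d)):.2f}")
```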