Yet another statistics question: combining tests of significance

I’ve been doing some reading about meta-analysis techniques that combine tests of significance from multiple studies, all of which test the same null hypothesis. Fisher’s combined probability test is perhaps the earliest such technique, but it is more sensitive to studies with low p-values than to those with high p-values. Stouffer’s Z-transform, on the other hand, is symmetrically sensitive to low and high p-values. However, it treats all the studies being combined as though they have the same power.
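To make the two methods concrete, here is a minimal sketch of both, assuming k independent one-sided p-values. Fisher sums −2 ln p, which follows a chi-squared distribution with 2k degrees of freedom under the null; Stouffer converts each p-value to a Z-score and averages them (the example p-values are made up):

```python
import math
from scipy.stats import chi2, norm

def fisher_combine(pvals):
    # Fisher: X = -2 * sum(ln p_i) ~ chi-squared with 2k d.o.f. under H0.
    x = -2.0 * sum(math.log(p) for p in pvals)
    return chi2.sf(x, df=2 * len(pvals))

def stouffer_combine(pvals):
    # Stouffer: Z = sum(z_i) / sqrt(k), with z_i = Phi^{-1}(1 - p_i),
    # is standard normal under H0.
    z = sum(norm.isf(p) for p in pvals) / math.sqrt(len(pvals))
    return norm.sf(z)

pvals = [0.01, 0.20, 0.40]   # hypothetical study p-values
print(fisher_combine(pvals))
print(stouffer_combine(pvals))
```

Note the asymmetry in Fisher’s statistic: a single very small p-value contributes a large −2 ln p term, while a p-value near 1 contributes almost nothing, whereas Stouffer’s Z-scores are symmetric about p = 0.5.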

The weighted Z-method allows each test to be given a weighting factor, so that the studies exert unequal influence on the final combined p-value. In reading about weighting, it seems there is no consensus on what the weighting factor should be. To quote a few references: “to be decided upon by the investigator”, “based on elegance, internal validity, and ecological validity [of the various studies]”, “arbitrary”, “weight each by its degrees of freedom”, “ideally…weighted proportional to the inverse of its error variance”, and “calculated from the sample sizes of each study”.
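For reference, the weighted Z-method (Liptak) combines the Z-scores as Z = Σ wᵢzᵢ / √(Σ wᵢ²), which is standard normal under the null for any fixed choice of weights. A sketch, using the common convention wᵢ = √nᵢ as one illustrative choice (the sample sizes here are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

def weighted_z(pvals, weights):
    # Liptak's weighted Z: Z = sum(w_i * z_i) / sqrt(sum(w_i^2)).
    # The normalization keeps Z standard normal under H0 for any weights.
    z = sum(w * norm.isf(p) for p, w in zip(pvals, weights))
    return norm.sf(z / sqrt(sum(w * w for w in weights)))

pvals = [0.01, 0.20, 0.40]
ns = [100, 25, 25]                    # hypothetical sample sizes
w_sqrt_n = [sqrt(n) for n in ns]      # one common convention: w_i = sqrt(n_i)
print(weighted_z(pvals, w_sqrt_n))
print(weighted_z(pvals, [1, 1, 1]))   # equal weights recover plain Stouffer
```

Here the smallest p-value comes from the largest study, so up-weighting it makes the combined p-value smaller than the unweighted Stouffer result; with a different arrangement of sample sizes the same weights could just as easily make it larger.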

So, are there some guiding principles that would allow one to make an informed decision about which weighting factor to choose for a particular meta-analysis? Are some weighting schemes more ‘conservative’ than others? If I want to weight studies by n/s^2 rather than by n, for example, how might I justify that choice (or the reverse)? Does one perform an unweighted test of combined significance first, and then decide whether to introduce weighting?

This is the one I’ve seen done most often in physics. The idea is that, if you fit a function to a data set, and then generalize the function to add free parameters, you’ll always improve the fit. For instance, you’ll always get a better fit with a higher-order polynomial than with a lower-order one. But the features in the data which make the higher-order fit better are likely to be just noise, so you don’t want to use the higher-order fit unless it’s really justified. A higher-order polynomial has more free parameters, which leaves fewer residual degrees of freedom, so normalizing chi-squared by the number of degrees of freedom penalizes the higher-order fit: you won’t prefer the higher order unless it’s really justified.
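A toy illustration of that point, assuming truly linear data with known Gaussian noise: raw chi-squared can only decrease as polynomial terms are added, but dividing by the residual degrees of freedom (N minus the number of fit parameters) removes most of that spurious advantage:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
sigma = 0.1
y = 2.0 * x + 1.0 + rng.normal(0, sigma, x.size)  # truly linear data + noise

def chi2_fit(deg):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    chi2 = np.sum((resid / sigma) ** 2)
    dof = x.size - (deg + 1)          # residual degrees of freedom
    return chi2, chi2 / dof           # raw and reduced chi-squared

for deg in (1, 5):
    chi2, red = chi2_fit(deg)
    print(f"degree {deg}: chi2 = {chi2:.2f}, chi2/dof = {red:.2f}")
```

The degree-5 fit always has the smaller raw chi-squared, but its reduced chi-squared gains little (or even worsens), because the extra terms are fitting noise while the denominator shrinks.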

Thanks, Chronos.