I’ve been reading about meta-analysis techniques that combine tests of significance from multiple studies, all of which test the same null hypothesis. Fisher’s combined probability test is perhaps the first such technique, but it is more sensitive to studies with low p-values than to studies with high p-values. Stouffer’s Z-transform, on the other hand, is symmetrically sensitive to studies with low and high p-values. However, it gives every study equal influence, as though all the studies being combined had the same power.
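To make the asymmetry concrete, here is a minimal sketch of both methods using only the Python standard library. It assumes one-sided p-values strictly between 0 and 1, with small values counting as evidence against the null. The function names and the example p-values are made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_combined_p(pvalues):
    """Fisher: X = -2 * sum(ln p_i), chi-square with 2k df under the null."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even df = 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combined_p(pvalues):
    """Stouffer: Z = sum(z_i) / sqrt(k), with z_i = Phi^{-1}(1 - p_i)."""
    k = len(pvalues)
    z = sum(STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(k)
    return 1.0 - STD_NORMAL.cdf(z)


# Hypothetical pair: one very small and one very large p-value.
pvals = [0.001, 0.999]
print(fisher_combined_p(pvals))    # roughly 0.008: the small p-value dominates
print(stouffer_combined_p(pvals))  # roughly 0.5: the two z-scores cancel
```

With this pair, Fisher's method still reports a small combined p-value, while Stouffer's method treats the two studies as cancelling each other out.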
The weighted Z-method allows tests to be given a weighting factor so that they have an unequal influence on the final combined p-value. In reading about weighting, it seems that there is no consensus as to what the weighting factor should be. To quote a few references: “to be decided upon by the investigator”, “based on elegance, internal validity, and ecological validity [of the various studies]”, “arbitrary”, “weight each by its degrees of freedom”, “ideally…weighted proportional to the inverse of its error variance”, and “calculated from the sample sizes of each study”.
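For reference, the weighted Z-method as I understand it computes Z_w = sum(w_i * z_i) / sqrt(sum(w_i^2)). The sketch below is one way to write that in standard-library Python; it assumes one-sided p-values strictly between 0 and 1 and non-negative weights that are not all zero. The function name, p-values and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def weighted_z_combined_p(pvalues, weights):
    """Weighted Z-method: Z_w = sum(w_i * z_i) / sqrt(sum(w_i ** 2))."""
    if len(pvalues) != len(weights):
        raise ValueError("need exactly one weight per p-value")
    nd = NormalDist()
    # Convert each one-sided p-value to an upper-tail z-score.
    zs = [nd.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, zs))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_w = numerator / denominator
    # Under the null, Z_w is standard normal for any fixed weights.
    return 1.0 - nd.cdf(z_w)


# Hypothetical studies, weighted by sample size.
pvals = [0.01, 0.20, 0.04]
sample_sizes = [30, 250, 60]
print(weighted_z_combined_p(pvals, sample_sizes))

# Equal weights reduce to the unweighted Stouffer test.
print(weighted_z_combined_p(pvals, [1, 1, 1]))
```

Only the relative sizes of the weights matter here: multiplying every weight by the same positive constant leaves Z_w unchanged.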
So, are there any guiding principles that would allow one to make an informed decision about which weighting factor to choose for a particular meta-analysis? Are some weighting schemes more ‘conservative’ than others? If I want to weight studies by n/s^2 instead of n, for example (or vice versa), how might I justify the choice? Does one perform an unweighted test of combined significance first, and then decide whether to apply weighting?
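To show why the choice matters to me, here is a small sketch that runs the same set of studies through three weighting schemes: unweighted, n, and n/s^2. All numbers are invented for illustration, and the helper `combined_p` is my own shorthand for the weighted Z-method with one-sided p-values, not a library function.

```python
import math
from statistics import NormalDist

ND = NormalDist()


def combined_p(pvalues, weights):
    """Weighted Z combined p-value for one-sided p-values in (0, 1)."""
    zs = [ND.inv_cdf(1.0 - p) for p in pvalues]
    z_w = sum(w * z for w, z in zip(weights, zs)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - ND.cdf(z_w)


# Hypothetical studies: p-value, sample size n, and sample variance s^2.
pvals = [0.04, 0.20, 0.01]
n = [20, 200, 50]
s2 = [4.0, 1.0, 25.0]

schemes = {
    "unweighted": [1.0] * len(pvals),
    "n": [float(ni) for ni in n],
    "n / s^2": [ni / vi for ni, vi in zip(n, s2)],
}

results = {name: combined_p(pvals, w) for name, w in schemes.items()}
for name, p in results.items():
    print(f"{name:>10}: combined p = {p:.4f}")
```

In this made-up example the large, low-variance study has the weakest individual result, so the more weight it receives, the larger the combined p-value becomes. The three schemes can therefore land on different sides of a conventional threshold.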