Education Coordinating Group

Executive Summary

Campbell Collaboration Statistical Analysis Policy Brief

Systematic reviews of the effects of interventions and relations among variables often rely on statistical summaries of the results of primary studies. Because Campbell Collaboration (C2) systematic reviewers are likely to face a variety of statistical issues in conducting reviews, this policy brief attempts to:

  1. identify the key issues that are confronted by C2 systematic reviewers who want to synthesize the results of studies statistically,
  2. outline possible ways that statistical procedures might be used, and
  3. provide exemplars of how these methods might be used.

In this brief we address six key issues concerning statistical analysis, and make proposals for C2 policies for each. A summary of the issues and our proposals follows.

When conducting a research synthesis, is it ever appropriate for a C2 reviewer to do a review without statistically integrating the results of studies? If yes, what are the characteristics of the literature that make this permissible?

Proposal: Study findings should be represented as effect sizes (i.e., indices of treatment impact or relationship strength) in C2 reviews whenever the studies being summarized present quantitative findings. Statistical integration should be used in a C2 review (or any part of a C2 review) only where a summary conclusion from at least two studies is desired, the studies and effect sizes are sufficiently similar to justify integration, and the number of studies is sufficient to support the analysis used in that statistical integration.
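As one illustration of what "representing a finding as an effect size" can mean in practice, the sketch below computes a standardized mean difference with the usual small-sample correction (Hedges' g) and its approximate sampling variance from group means, standard deviations, and sample sizes. The brief does not prescribe this index; the function name and the input numbers are hypothetical and chosen only for illustration.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, variance of g) for two independent groups.

    Illustrative helper, not a C2-mandated procedure.
    """
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Large-sample approximation to the variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    # Small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    return j * d, j**2 * var_d

# Made-up summary statistics for a single hypothetical study.
g, var_g = hedges_g(mean_t=10, mean_c=8, sd_t=4, sd_c=4, n_t=50, n_c=50)
print(round(g, 3), round(var_g, 4))
```

Other indices (correlations, odds ratios) would follow the same pattern: one estimate and one sampling variance per finding.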

When statistical integration is used in a C2 review, are there certain statistical procedures that should routinely be carried out? If so, what are they?

Proposal: Statistical summaries of average effects and variation in effects should be computed (and reported) using fixed-effects analyses, random-effects analyses, or both. The specific statistics used will depend on whether the review is aimed at (a) estimating a mean effect across studies, (b) examining the variation in effect-size estimates across studies, or (c) fitting a model of effect-size variation.
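A minimal sketch of such summaries follows, assuming each study contributes one effect-size estimate and its sampling variance. It computes an inverse-variance fixed-effects mean, the homogeneity statistic Q, and a random-effects mean using the DerSimonian-Laird moment estimator of between-study variance. That estimator is one common choice, not one named in this brief, and the data are invented.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean, its standard error, and Q."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    # Q measures variation in effects beyond sampling error (df = k - 1).
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    return mean, se, q

def random_effects(effects, variances):
    """Random-effects mean, standard error, and tau^2 (DerSimonian-Laird)."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    _, _, q = fixed_effect(effects, variances)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    total = sum(w_star)
    mean = sum(w * y for w, y in zip(w_star, effects)) / total
    return mean, math.sqrt(1.0 / total), tau2

# Invented effect sizes and variances for three hypothetical studies.
effects = [0.0, 0.4, 0.8]
variances = [0.04, 0.04, 0.04]
print(fixed_effect(effects, variances))
print(random_effects(effects, variances))
```

Reporting both sets of results shows how much the summary depends on the assumption that the studies share a single underlying effect.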

When systematic reviews retrieve and code characteristics of statistical analyses, what characteristics of the analyses should routinely be coded, and, if possible, examined for their impact on the outcomes of studies?

Proposal: Reviewers should code (a) characteristics of the statistical analyses used in the primary study and (b) details about the computations used for the effect size derived from that study. C2 takes the position that it is important to document specific statistical procedures and methods for computing an effect size just as it is important to code study design differences. Coding of statistical procedures allows the use of sensitivity analyses as a method for examining how differences in statistical methods of studies or effect-size computations influence the results of the systematic review.

Should multiple (nonindependent) effect-size estimates from the same study ever be used in a C2 synthesis?

Proposal: Reviewers should not ignore dependence among study outcomes. They should use some procedure to deal with dependence, describing and giving a justification for that procedure, even if it is ad hoc. Simple approaches such as dropping or combining outcomes or using sensitivity analyses may make sense if the amount of dependent data is small. More sophisticated analyses may be needed if multivariate data are prevalent in the review. In such cases the reviewer must assess the similarity of studies and availability of reliable information on the extent of dependence.
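One of the simple approaches mentioned above, combining outcomes within a study, can be sketched as follows. The function averages a study's effect sizes and computes the variance of that average under an assumed common correlation r among the outcomes. The value of r is rarely reported, so it is treated here as an explicit assumption and varied in a small sensitivity analysis. The function name and data are hypothetical.

```python
import math

def composite_effect(effects, variances, r):
    """Average dependent effect sizes from one study.

    Assumes every pair of outcomes shares the same correlation r
    (an assumption made for this sketch, not a value from the brief).
    """
    m = len(effects)
    mean = sum(effects) / m
    # Variance of a mean of correlated estimates:
    # (1/m)^2 * (sum of variances + sum of all pairwise covariances).
    cov_sum = sum(
        r * math.sqrt(variances[i] * variances[j])
        for i in range(m)
        for j in range(m)
        if i != j
    )
    var = (sum(variances) + cov_sum) / m**2
    return mean, var

# Two invented outcomes from the same hypothetical study.
effects = [0.2, 0.4]
variances = [0.05, 0.05]

# Sensitivity analysis: how the composite variance moves with r.
for r in (0.0, 0.5, 1.0):
    print(r, composite_effect(effects, variances, r))
```

Treating the outcomes as independent (r = 0) gives the smallest variance and therefore overstates precision when the outcomes are in fact positively correlated.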

Should C2 have a role in advancing cross-design synthesis methods (e.g., propensity scoring and alternatives)? What must be considered if or when reviewers combine estimates of effect from randomized trials with estimates of effect based on other designs, such as surveillance systems, passive observational studies, etc.?

Proposal: In some syntheses, results from subsets of studies that use different designs will not be comparable. In such cases reviewers should not summarize across the designs, but rather should report both sets of results separately. In other cases where effects are more comparable, the reviewer may wish to summarize across designs as well as provide separate results by design. Assumptions underlying such comparisons should be made explicit, and the reviewer should critically examine the data for the possibility of design-related differences in effects. Further, when such comparisons are made, the type of design should be tested as a moderator variable and separate results should be reported.
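Testing design type as a moderator can be sketched as a fixed-effects subgroup analysis: pool the effects within each design, then compute a between-groups statistic (often written Q_between) that is compared with a chi-square distribution on (number of designs - 1) degrees of freedom. The fixed-effects weighting, the design labels, and the data below are assumptions of this sketch; a reviewer might instead prefer a mixed-effects model.

```python
def pooled(effects, variances):
    """Inverse-variance weighted mean and total weight for one group."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, total

def q_between(groups):
    """groups maps a design label to (effects, variances).

    Returns per-design summaries, Q_between, and its degrees of freedom.
    """
    summaries = {label: pooled(e, v) for label, (e, v) in groups.items()}
    total_weight = sum(w for _, w in summaries.values())
    grand_mean = sum(m * w for m, w in summaries.values()) / total_weight
    # Weighted squared deviations of design means from the grand mean.
    q_b = sum(w * (m - grand_mean) ** 2 for m, w in summaries.values())
    return summaries, q_b, len(groups) - 1

# Invented effect sizes and variances, grouped by hypothetical design labels.
groups = {
    "randomized": ([0.2, 0.2], [0.02, 0.02]),
    "observational": ([0.5, 0.5], [0.02, 0.02]),
}
summaries, q_b, df = q_between(groups)
print(summaries)  # separate results by design
print(q_b, df)    # compare q_b with the chi-square critical value for df
```

With one degree of freedom the 5% critical value is about 3.84, so a larger Q_between would suggest that effects differ by design and that the separate results are the more defensible summary.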

Furthermore, while the primary focus of C2 is on matters directly related to research cumulation, the study and careful application of methods of cross-design synthesis are consistent with the goals of C2.

What should be the role of C2’s Social, Psychological, Educational and Criminological Trials Register (SPECTR) in supporting or informing the statistical research that might be done in the Campbell context?

Proposal: The Steering Committee should endorse the use of SPECTR for research on normative methodological and reporting practice in relevant research domains, improving information for imputation in effect-size computation, and studying associations between synthesis methods and results.
