The importance of understanding what effect you are measuring
A few years ago, I read about a randomized controlled trial that claimed to show that running fewer days a week was better for improving running performance. This finding seems rather unlikely, and indeed it is almost certainly not true. The effect the researchers found was probably not about the frequency of training at all.
Here is what they did. They randomly assigned runners either to a treatment group, which followed a three-day-a-week running programme, or to a control group, which continued their usual training. Runners in the control group averaged five runs a week. After a couple of months, the researchers measured each group's improvement in running performance. Those who had trained three days a week achieved a significantly larger improvement than those who had run five days a week.
But that doesn’t show that running less is better, because the frequency of training was not the only difference between the two groups. The other difference was that the three-day-a-week group were following a programme of structured training. Most runners don’t vary their pace enough in training: they run their long runs too fast, their short runs too slow, and don’t run intervals. There are huge benefits to structured training, which includes a weekly long slow run, a shorter, faster tempo run, and repeat sets of fast running (intervals). The study was almost certainly really measuring the benefits of structured training. Had the control group followed the same structured programme, filling the other two days with their usual running or some cross-training, any difference found would likely have favoured the control group - just piling on the miles helps both speed and stamina.
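The confound can be made concrete with a small simulation. The effect sizes below are entirely made up for illustration - they are not estimates from the study - but they show how a trial that varies structure and frequency together can make lower frequency look beneficial, while the comparison that holds structure constant points the other way.

```python
import random

random.seed(0)

# Hypothetical, illustrative effect sizes (not taken from any study):
# improvement = benefit of structured training + benefit per weekly run + noise
STRUCTURE_BENEFIT = 4.0  # structured training adds ~4 points of improvement
PER_RUN_BENEFIT = 0.5    # each weekly run adds ~0.5 points

def mean_improvement(n, runs_per_week, structured):
    """Average simulated improvement for a group of n runners."""
    total = 0.0
    for _ in range(n):
        total += ((STRUCTURE_BENEFIT if structured else 0.0)
                  + PER_RUN_BENEFIT * runs_per_week
                  + random.gauss(0, 1))
    return total / n

# The trial's actual comparison: structure and frequency differ together.
treatment = mean_improvement(500, 3, structured=True)    # 3 runs, structured
control = mean_improvement(500, 5, structured=False)     # 5 runs, usual training

# The comparison that would isolate frequency: both groups structured.
fair_control = mean_improvement(500, 5, structured=True)

print(f"treatment vs usual training:      {treatment - control:+.2f}")
print(f"treatment vs structured 5-a-week: {treatment - fair_control:+.2f}")
```

Under these assumed numbers, the first difference is positive (fewer runs "wins") only because the structure benefit is bundled into the treatment arm; once both arms are structured, the extra mileage favours the five-day group.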
The lesson here is that the researchers were not making the right comparison to test what they wanted to test. If they wanted to test the effect of the frequency of training, they should have made sure that other aspects of the training regime were the same between the two groups.
This example has a direct analogy with studies in which a new programme is compared to the standard of care. That is, the new intervention is compared not to doing nothing but to the existing services participants are already receiving. This approach is standard practice in clinical trials and increasingly common in evaluations of social programmes. Any effect found may stem not from the design of the new programme but from the extra resources it brings and the accompanying focus on implementation fidelity. The growth of implementation science has highlighted that things are often not done as they are meant to be. So if the control group gets the existing standard of care, but that standard of care is underfunded and poorly implemented, the comparison cannot tell us whether the new programme works. It may well be that existing practice, properly funded and supported to be implemented correctly, would work just as well.
There is a growing trend in several countries toward contracting out social service delivery to NGOs to implement licensed interventions, also called branded programmes. But we do not really know from the existing evidence base whether these programmes are better than well-funded, properly implemented existing practice.
There is an important lesson here for the design of both primary studies and systematic reviews. We need to pay more attention to, and report, what is going on in the control group. We need to assess costs and implementation for both the treatment and the control interventions to know what effect we are really measuring. Only if costs per client and implementation fidelity are the same in treatment and control (or across all treatment arms in A/B designs) can we be sure that what is being measured is indeed a programme effect. Unfortunately, few studies do this. And until we do, we will continue wasting money and wasting lives.