What is effective and for whom: the importance of looking beyond averages and chasing the tail

I have recently been in Australia, where a key social policy issue is children at risk. What are effective programmes to keep these children in the home environment while ensuring they are not held back by their disadvantaged background?

There is an increasing demand for rigorous evidence of what works. But what do we mean when we say a programme or policy ‘works’? Failing to unpack what we mean by ‘works’ can have devastating human consequences.

We are likely to say a programme ‘works’ when we have a study showing a statistically significant effect size. If we equate a significant effect size with an effective intervention, we are making several mistakes: relying on single-study evidence, confusing statistical with practical significance, and focusing solely on the first moment (the average treatment effect) instead of also considering the second moment (the spread of effects, which helps identify cases where the intervention is ineffective or even harmful).

Home visitation programmes targeted at families with children at risk are a common approach. Evaluations of these programmes use measures of family functioning as the outcome, and usually report the average treatment effect. If the distribution of effects is well behaved (as it should be in a large sample), the mean and the median coincide, as in the figure here, which shows the distribution of effect sizes.

Figure: Distribution of effects

In my figure, the programme ‘works’ for 84 per cent of families, and the children in those families. That is, they experience an improvement in the outcome. For the other 16 per cent it doesn’t work, and may actually be harmful. So, in this case a conventional test of statistical significance would conclude that the programme doesn’t work at all, as it does so whenever 5 per cent or more of the distribution is in the red tail. Perhaps the programme does work, but our sample was too small to detect it – an underpowered study.
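The 84/16 split in the figure is what you would see if individual effects were normally distributed with a mean about one standard deviation above zero. A minimal sketch of that arithmetic, using illustrative numbers rather than results from any actual evaluation:

```python
from statistics import NormalDist

# Illustrative assumption: individual treatment effects follow a normal
# distribution whose mean sits one standard deviation above zero.
effects = NormalDist(mu=1.0, sigma=1.0)

share_harmed = effects.cdf(0)       # fraction in the harmful ("red") tail
share_helped = 1 - share_harmed     # fraction experiencing an improvement

print(f"helped: {share_helped:.0%}")  # 84%
print(f"harmed: {share_harmed:.0%}")  # 16%
```

The same average treatment effect (one standard deviation) is compatible with very different tail shares: widen the spread and the harmed fraction grows even though the mean is unchanged, which is exactly why the first moment alone is not a complete statement of the evidence.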

When we do find a statistically significant effect, we are still leaving these 5 per cent in the home environment at increased risk, since for these children family functioning worsens because of the programme. Five per cent sounds low, but it is 50 children for every 1,000 exposed to the programme. In extreme cases, a child left in the home environment because they are in a programme which ‘works’ dies through neglect or is killed.

The public and political response to such tragic circumstances is, ‘This programme doesn’t work. Close the programme, and sack the people responsible.’ But the programme does work for the majority, and we are worsening their situation if the programme is stopped.

There are three responses to such circumstances.

The first is educating policy makers, politicians, the public and the media about the nature of evidence. There needs to be an understanding that the average treatment effect is just that: an average. As shown in the picture above, there is a distribution around that average. For some children the programme helps a lot, for some just a little, and there may be others whom it will harm.

Secondly, the job of researchers and knowledge brokers is to give the evidence to policy makers and allow them to decide what to do. Stating the average treatment effect is not a complete statement of the evidence. Would that politicians demanded ‘Show me the second moment’. We need to do better at how we state the evidence. For example, ‘Overall this programme is beneficial. We expect the large majority of children will be able to remain in the home environment safely, being healthy and educated because of this programme. But about 15 per cent of children won’t benefit, and some may actually be harmed by the programme.’ Whether the harm to those children is worth the greater good to the majority is a political decision, not one for the researchers, and certainly not one to be hidden by a focus on the first moment.

One can imagine politicians wanting to follow a ‘do no harm’ principle. The left-hand tail of the distribution of effect sizes will ALWAYS cross the vertical axis. So ‘do no harm’ may be interpreted as no more than x per cent of the distribution being in that tail. Again, policy makers need to fill in the x, not researchers.
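Under the same illustrative normal model used above, a chosen tolerance x translates directly into a minimum average effect: the mean must sit at least z at the (1 − x) quantile standard deviations above zero for no more than x per cent of children to fall in the harmful tail. A hedged sketch (the model and numbers are assumptions for illustration, not a policy rule):

```python
from statistics import NormalDist

def required_mean_effect(x: float) -> float:
    """Minimum mean effect (in standard-deviation units) so that at most
    a fraction x of a normal distribution of effects lies below zero.
    Illustrative model only; x is the policy maker's tolerance, not ours."""
    return NormalDist().inv_cdf(1 - x)

print(f"{required_mean_effect(0.05):.2f}")  # 1.64 SDs for a 5% tail
print(f"{required_mean_effect(0.01):.2f}")  # 2.33 SDs for a 1% tail
```

The point of the exercise is the division of labour: researchers can report the whole distribution and this kind of mapping; choosing the value of x is the political decision.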

Finally, and most importantly, we need to chase the tail. We would like to be able to identify the children in the ‘red for danger’ tail of the distribution so that they are NOT exposed to the programme. That is, we need a screening tool for case workers. Identifying these children requires larger studies (i.e. larger samples), more studies and better evidence synthesis. Larger studies have the sample size to look more closely at the contexts in which programmes are effective or not. More studies allow us to contrast programme effectiveness across different contexts and populations in high-quality evidence synthesis. Without such analysis we cannot answer the ‘will it work here’ question, and that ignorance puts lives at risk.

And here comes the challenge for meaningful knowledge translation. We do not expect caseworkers to drive to meet a new case family after leafing through the latest issue of Journal of Evidence-Informed Social Work looking for tables with the appropriate subgroup analysis. Rather, we hope that they have an evidence-based practice checklist for ‘diagnosing’ the family, and so allocating it to a particular ‘treatment’. We are not there yet, but we should be headed that way if research is to make a difference.
