By Howard White
This article is the first of three in a series on "Reflections on the evidence architecture". In the next two articles I will explore the misuse and underuse of evidence.
It is very fashionable now to speak of using evidence and of evidence-based programmes. But in practice evidence is under-used and misused, so billions of dollars continue to be wasted on poorly designed and ineffective programmes.
There is a need for much more systematic use of evidence throughout the project, programme and policy cycle, and when considering new practices. The evidence-driven project cycle (shown in the figure below) sets out how evidence can be used.
Use of evidence begins at the design stage, which draws on four different types of evidence:
- quantitative data on prevalence, to show that the issue to be addressed is a cause for concern;
- research into why it is a problem;
- formative research assessing the likely underlying causes of the problem, which the intervention needs to address; and
- the global evidence base on what has worked elsewhere to tackle the problem.
For example, teenage pregnancy rates are around one-third in much of sub-Saharan Africa. Births to young mothers are bad for the health of both mother and child: the mother misses out on schooling, and the child is more likely to suffer irreversible cognitive deficiencies. At around 30% prevalence, teenage pregnancy is both harmful and common, so it is clearly a priority issue. Qualitative research points to a range of contributing factors: cultural factors such as "sexual cleansing" (requiring girls to have sex after their first period) and traditional dance ceremonies, such as disco matanga in parts of Kenya, which provide many opportunities for sex; abusive relationships with teachers and relatives (some of whom may be motivated by the belief that sex with a virgin can cure HIV/AIDS); and poverty, which drives girls to transactional sex. "Poor parenting" is also suggested as a factor, especially if mothers themselves are selling sex as a livelihood strategy.
Unfortunately, the evidence shows most programmes to tackle teenage pregnancy to be ineffective, with the exception of multi-component programmes, which have a small effect. It makes sense that multiple components are required: community interventions are needed to address cultural norms, girls need to be empowered to make better choices and to resist peer pressure from boys and men, channels need to be found for girls to report abuse, and contraception needs to be available for those who are sexually active. But this evidence is mostly from the United States, so an updated review of the global evidence is needed. Even then, the fact remains that most programmes to tackle teenage pregnancy in Africa are not being evaluated.
Whatever programme is adopted needs to be tested. Testing is integral to the evidence-based project cycle. Evidence-based policy is not a blueprint approach. We do not say, "It worked in Amsterdam, so let's do it in Accra." Rather, we say, "It worked there, so let's try it and test it here."
Testing goes through three stages: formative evaluation, efficacy trials and effectiveness trials.
Formative evaluation tries out the intervention on a very small scale, possibly just one community, to see if there are problems of implementability or of acceptability to the beneficiary population. It is not unusual for externally designed programmes to run into such problems. As captured in the funnel of attrition, participation rates are often much lower than expected, for reasons such as failure to work with community leaders, or simply failure to inform intended beneficiaries about the programme or persuade them of its benefits. Many programmes suffer from very low take-up rates. For example, some subsidized, or even free, health micro-insurance schemes have had take-up rates of less than 1% in Pakistan and the Philippines. In such cases, money is not the binding constraint. The constraint that needs to be addressed may instead be the lack of facilities, or of staff and drugs at facilities.
In other cases, the intended technology may not work under field conditions. The International Initiative for Impact Evaluation (3ie) funded a study of an intervention to support compliance in self-administered TB treatment. The treatment group were given a stick to urinate on after taking the medication. If they had taken the medication, a code would appear which they could text in to get extra phone credit. But the sticks didn't work, so the intervention was redesigned as an mHealth SMS reminder intervention. A formative evaluation would have caught this problem before the evaluation went to scale.
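The funnel of attrition mentioned above can be illustrated with simple arithmetic: if only a fraction of the eligible population gets through each step between being targeted and actually participating, those fractions multiply. The sketch below is illustrative only. The stage names and rates are hypothetical, not figures from any study, and `funnel_of_attrition` is a helper name made up for this example.

```python
def funnel_of_attrition(eligible, stage_rates):
    """Return (stage, people remaining) after applying each stage's pass-through rate."""
    remaining = []
    current = float(eligible)
    for stage, rate in stage_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate for {stage!r} must be between 0 and 1")
        current *= rate  # only this share of people continues to the next stage
        remaining.append((stage, current))
    return remaining


# Hypothetical stages and rates, chosen only to show how quickly losses compound.
hypothetical_stages = [
    ("hears about the programme", 0.5),
    ("decides to enrol", 0.4),
    ("attends regularly", 0.5),
]
eligible = 1000

funnel = funnel_of_attrition(eligible, hypothetical_stages)
for stage, n in funnel:
    print(f"{stage}: {n:.0f} of {eligible}")

overall_take_up = funnel[-1][1] / eligible
print(f"Overall take-up: {overall_take_up:.0%}")
```

With these made-up rates, none of which looks alarming on its own, only one in ten of the intended beneficiaries ends up participating. A formative evaluation is a chance to measure such rates before going to scale.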
Efficacy and effectiveness trials
If the programme passes the formative evaluation, the next stage is the efficacy trial.
Efficacy trials test for impact, preferably with a randomised controlled trial (RCT), conducting the programme with as near to perfect fidelity as can be achieved, including reaching the planned target population. The purpose of the efficacy trial is to test whether the programme works under conditions as near to ideal as possible. To be approved for general use, drugs have to pass efficacy trials. Development programmes should be subject to the same discipline. The vast majority of development impact evaluations, especially RCTs, are in fact efficacy trials. Implementation is often overseen by the research team, sometimes with graduate students in the field for the duration. Efficacy studies can show whether the intervention works under these ideal conditions, but not whether it will work in practice when implemented through regular government or NGO channels.
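As a rough illustration of what "testing for impact" with an RCT involves, one of the simplest estimators compares mean outcomes between randomly assigned treatment and control groups. The sketch below is an assumption-laden toy, not the analysis used in any study mentioned here. The data are simulated, the "true" effect of 5 points is invented, and `difference_in_means` is a hypothetical helper name.

```python
import math
import random
import statistics


def difference_in_means(treated, control):
    """Estimate impact as the gap in mean outcomes, with an unpooled standard error."""
    effect = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return effect, se


random.seed(0)  # fixed seed so the simulation is repeatable

# Randomly assign 200 hypothetical participants: half to treatment, half to control.
units = list(range(200))
random.shuffle(units)
treatment_ids = set(units[:100])

# Simulated outcomes (e.g. a test score); the 5-point effect is made up for illustration.
outcomes = {
    u: random.gauss(50, 10) + (5 if u in treatment_ids else 0)
    for u in units
}
treated = [outcomes[u] for u in units if u in treatment_ids]
control = [outcomes[u] for u in units if u not in treatment_ids]

effect, se = difference_in_means(treated, control)
ci_low, ci_high = effect - 1.96 * se, effect + 1.96 * se  # approximate 95% interval
print(f"Estimated impact: {effect:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because assignment is random, the control group stands in for what would have happened to the treatment group without the programme. Real trials typically add refinements, such as clustering and covariate adjustment, that this sketch leaves out.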
If the efficacy trial shows the programme to work, it can be taken to scale. And then it should be evaluated again: this is the effectiveness trial. It is common to find that programmes which have been shown to work in efficacy studies don't work at scale. A well-known example is that of contract teachers in Kenya, which improved test scores in the efficacy trial but not when taken to scale by government.
Similarly, our review of farmer field schools found they were effective in changing farmer behaviour and in increasing yields and farm income. But the positive effects came from pilot programmes. Evaluations of large-scale farmer field school programmes did not find the same effect.
As the programme moves to different contexts or populations, testing should continue, and A/B testing can be used to improve the design. Process evaluations are also built into the cycle. Throughout, learning is iterative: it may be necessary to repeat a stage, or even to go back one or more steps.
The evidence from all these studies contributes to the global body of evidence which should be summarized in systematic reviews. This brings us back to the top of the cycle: consulting the global evidence base to see which programmes are effective in tackling priority problems.
The above is the ideal. In my next two blog posts I will explore the misuse and underuse of evidence.
To contribute to the global debate on these issues, join us at the What Works Global Summit 2019 in Mexico City this October.