How to Fix Flaws in Evaluation? Use Evaluation Standards

I started working on evaluations back in the early ’90s, evaluating programme aid. That is money given to developing country governments which is not attached to any specific expenditure. I had done a few of these when I was asked to evaluate Swedish programme aid in Zambia. The outline for the Sida evaluation report included a chapter on the evaluation method. I hadn’t been asked for that before. I’ll just go and evaluate it, I thought.

Whilst I did have a method, I had never had to write it down. Having to do so got me more interested in the methods being used to evaluate the programme aid than the evaluation findings themselves. I convened a workshop to which I invited bilateral agencies to present how they evaluated programme aid. The most striking finding from looking at these presentations – which are collected in a special issue of the IDS Bulletin – is that they mostly didn’t have an explicit method either.

Jump forward nearly 30 years to when I published a blog on 10 common flaws in evaluation – one of which is ‘methods-free evaluations’. How to fix these flaws? It turns out it’s not sufficient to simply require a methods section in the evaluation report. Most have that. But they describe only how they will collect the data, or give an inadequate description of methods. ‘Mixed methods’ is not a method! Which methods will you be mixing?

Many evaluators – and that includes me – have never had any formal training in evaluation. Evaluation is not a profession, like being a lawyer or an accountant – in which there is a recognized qualification required to practice. And it is full of different evaluation approaches with different names. Indeed, the same name is used to mean different things by different people. But that’s a subject for another blog.

I think the answer is better use of evaluation standards. And we already have these hiding in plain sight.

In the mid-2000s I started hearing about something called a systematic review. I was to become heavily engaged in the production and use of systematic reviews, first at 3ie and then at the Campbell Collaboration. This involved a steep learning curve. One of the things I learned about was critical appraisal, which assesses the conduct and reporting of a study to determine how confident we should be in the study findings. Studies may be badly designed, may run into bad luck – such as high attrition – or have inadequate reporting (again, attrition, which is often poorly reported in quantitative studies). In most reviews the majority of included evaluations are rated as being of low or medium confidence. It is rare to find an evaluation for which we can have high confidence in its findings.

I started to ask myself, why aren’t the evaluations being done better? Evaluators should look at our critical appraisal tools – the list of items used to assess confidence in findings – and make sure they do those things. The penny dropped. A critical appraisal tool and an evaluation standard are two sides of the same coin.

There are evaluation standards. And like most things, health is ahead of the crowd. Most notable is CONSORT, which stands for Consolidated Standards of Reporting Trials. In the early days of 3ie one of our first staff members, Ron Bose, developed an adaptation of CONSORT for studies of development interventions which we published in the Journal of Development Effectiveness. We might as well have published pages from the Delhi telephone directory for all the attention it got. Ron was too far ahead of his time. But now, I hope, the time has come. And we have tools suitable for that purpose. Critical appraisal tools are available for all different types of study design – see the collection curated by SURE. At CEDIL we produced one for mixed methods impact evaluations – including saying which methods! And we have a new tool developed for small n impact evaluations.

With Hugh Waddington and Hikari Umezawa, we recently conducted a meta-evaluation of evaluations of support to civil society advocacy. These are mostly small n impact evaluations. Ten years ago I wrote a paper with Daniel Phillips about methods and biases in such evaluations. That paper had informed the guidance of the Dutch Ministry of Foreign Affairs on how to conduct such studies. So, we were asked to see how they are doing.

For our meta-evaluation we developed a new critical appraisal checklist to assess how study teams dealt with possible sources of bias. For example, one bias is the ‘people like us bias’, in which evaluators speak to development agencies and government but are steered away from possibly dissenting voices. So does the evaluation report who they spoke to and how they identified them? Evaluations often ignore the role of other agencies, or domestic factors, in explaining observed changes. So one item in the assessment tool is ‘Does the evaluation articulate alternative causal hypotheses, including the role of contextual/external factors, such as social/cultural setting, political or economic trends, and parallel interventions or other stakeholder actions, that may influence outcomes?’.

If evaluators using small n impact evaluation methods like process tracing were to ensure that their evaluations satisfied the items in our assessment tool, then we could have far more confidence in their study findings.

There is a caveat. I much prefer evaluations with a strong narrative. I try to write mine as a story along the causal chain. Using a checklist may lead to the same problem as evaluation reports written in response to an over-specified terms of reference. The report goes item by item answering questions in the ToR, but you cannot see the wood for the trees. There is no big picture. But using a critical appraisal tool does not mean writing the report with the same structure. You can write a report with a strong narrative. But make sure the items in the checklist are covered.

Want to know more? Our meta-evaluation is here, and you can hear Hikari speak about it at an event next week. Register here.