Without Replication, Should Program Evaluation Findings Be Suspect as Research Findings Currently Are?
Replication of research -- the reproducibility of findings -- is a methodological safeguard and a hallmark of research, universally lauded by scientists to justify their craft. Yet, as we continue to learn with greater certainty, it is a principle not much put into practice. Claims about research findings may be more likely to be false than true. Scientific studies are tainted by poor study design, sloppy and often self-serving data analysis, and miscalculation -- problems that replication of the studies and reproduction of the results would largely correct. The problem is that this is rarely done.
The continuing work of John Ioannidis at Stanford
University, Brian Nosek at the University of Virginia, and others shows that
much research is not and cannot be replicated. Almost a decade ago in these
pages ("Courts Have No Business Doing Research Studies," Made2Measure, October 15, 2007), I
highlighted a 2005 paper by Ioannidis titled “Why Most Published Research
Findings Are False” that caused a stir in the scientific community and prompted
many scientists and consumers of research
to begin questioning whether we can trust evidence produced by research
studies. Today, with more than $80 million in funding for a "research integrity" initiative from the Laura and John Arnold Foundation, science critics and reformers like Ioannidis and Nosek have a solid platform from which to question the culture of science that produces studies that cannot be reproduced.
Can program
evaluation be questioned as well? Program evaluations
are assessments of changes in the
well-being (status or condition) of individuals, households, communities or
firms that can be attributed to a project, program or process, along with the
systematic determination of their quality, value or merit. Rooted in the tradition of behavioral and social research, does program evaluation -- especially impact evaluation that relies on randomized controlled trials -- exist in a culture of research like the one Ioannidis, Nosek, and others describe, a culture that does not support replication and reproducibility of results? In my experience,
program evaluations in the area of justice and the rule of law are one-off
affairs funded by donors who are seldom, if ever, prompted to support
replication of the results.
For several years, I have called for a bigger space for performance measurement and management (PMM) in the international development toolkit for justice and the rule of law, relative to program evaluation and global indicators. I argue that justice institutions and justice systems that take responsibility for measuring and managing their own performance in delivering justice through PMM, rather than relying on external assessments by third parties, as is typical of program evaluation and global indicators, are more likely to succeed and to gain legitimacy, trust and confidence in the eyes of those they serve.
Replication or reproducibility of results highlights a critical design difference between PMM and program evaluation or evaluation research. Replication means repeating the performance measurement or the evaluation research to corroborate the results and to safeguard against overgeneralizations and other false claims. Unlike program evaluation, repeated measurements -- that is, replication of results on a regular and continuous basis, ideally in real time or near-real time -- are part of the required methodology of PMM.
I would like to believe that my suspicion that program evaluation findings lack replicability and reproducibility strengthens my argument for giving PMM more space in the toolkit of international development. Of course, my suspicions remain just that until "program evaluation integrity" studies, akin to the research integrity studies funded by the Arnold Foundation, confirm them.
© Copyright CourtMetrics 2017. All rights reserved.