Wednesday, March 15, 2017

Without Replication, Should Program Evaluation Findings Be Suspect as Research Findings Currently Are?

Replication of research -- the reproducibility of findings -- is a methodological safeguard and hallmark of research, universally lauded by scientists to justify their craft. As we are continuing to learn with more certainty, it is a theory not much put into practice. Claims about research findings may be more likely to be false than true. Scientific studies are tainted by poor study design, sloppy and often self-serving data analysis, and miscalculation -- problems that replication of the studies and duplication of the results would largely correct. Again, the problem is that it’s not done.

The continuing work of John Ioannidis at Stanford University, Brian Nosek at the University of Virginia, and others shows that much research is not and cannot be replicated. Almost a decade ago in these pages (Courts Have No Business Doing Research Studies, Made2Measure, October 15, 2007), I highlighted a 2005 paper by Ioannidis titled “Why Most Published Research Findings Are False” that caused a stir in the scientific community and prompted many scientists and consumers of research to begin questioning whether we can trust evidence produced by research studies. Today, with more than $80 million in funding for a “research integrity” initiative from the Laura and John Arnold Foundation, science critics and reformers like Ioannidis and Nosek have been given a solid platform to question the culture of science that produces studies that can’t be reproduced.

Can program evaluation be questioned as well? Program evaluations are assessments of changes in the well-being (status or condition) of individuals, households, communities or firms that can be attributed to a project, program or process, along with the systematic determination of their quality, value or merit. Rooted in the tradition of behavioral and social research, does program evaluation -- especially impact evaluation that relies on randomized controlled trials -- exist in a culture of research similar to the one described by Ioannidis, Nosek, and others, a culture that does not support replication and reproducibility of results? In my experience, program evaluations in the area of justice and the rule of law are one-off affairs funded by donors who are seldom, if ever, prompted to support replication of the results.

For several years, I have called for a bigger space for performance measurement and management (PMM) in the toolkit of international development of justice and the rule of law relative to program evaluation and global indicators. I argue that justice institutions and justice systems that take responsibility for measuring and managing their own performance in delivering justice using PMM, rather than relying on external assessments done by third parties as is typical of program evaluation and global indicators, are likely to have more success and gain more legitimacy, trust and confidence in the eyes of those they serve.

Replication or reproducibility of results highlights a critical design difference between PMM and program evaluation or evaluation research. Basically, replication means repeating the performance measurement or evaluation research to corroborate the results and to safeguard against overgeneralizations and other false claims. In contrast with program evaluation, repeated measurements -- i.e., replication of results on a regular and continuous basis, ideally in real time or near-real time -- are part of the required methodology of PMM.

I’d like to believe that my suspicion of a lack of replicability and reproducibility of program evaluation findings strengthens my argument for more space in the toolkit of international development for PMM. Of course, my suspicions are just that until “program evaluation integrity” studies, like the research integrity studies funded by the Arnold Foundation, confirm those suspicions.

© Copyright CourtMetrics 2017. All rights reserved.


Monday, February 20, 2017

How To Be Heard by Policymakers

The design of international development is ill-suited for our fast-paced world. It is not unusual for aid programs to take five or more years from blueprint to start-up and another five years for results to be reported, and even more time for the results to be “translated” into policy. 

How Scientists Should Act

Writing in the February 10, 2017 issue of Science, Erik Stokstad summarizes the message of Paul Cairney, a political scientist at the University of Stirling in the UK and author of the book The Politics of Evidence-Based Policy Making. Cairney’s message is for those scientists who want their findings to find their way into policy:

Data does not speak for itself.  Scientists should be “sifters, synthesizers, and analyzers” to make the evidence “speak.” Cairney repeats the common refrain of policy-makers: “I don’t have the time to consider all the information. How do I decide?”

Policymaking is disorderly. Scientists need to dispense with the notion that policymaking is an orderly process. It is anything but. This should not be a justification for scientists to avoid getting involved.

Publishing the results is not enough. I have written in these pages and elsewhere about the three development phases and requirements of effective performance measurement and management (PMM): the “right” performance measures to yield the relevant data; the “right” distribution system for getting the results to the right people in the right way and at the right time (preferably in real-time or near real-time); and processes that encourage the “right” use of the performance data. Similarly, Cairney says, scientists who want their evidence to influence policy must be persistent and must find the right networks and the “right moment” to get their results to policymakers.

Pick your battles. Scientists should realize that results that matter are likely to be controversial and disputed. PMM, like most scientific endeavors, is not just a diagnostic exercise but, more or less, an exercise of power and control. Cairney counsels scientists to avoid areas where emotions are high and to think of other ways to engage, presenting technical information in accessible and persuasive language that recognizes “an audience’s pre-existing concerns, values, and biases.”

Be patient. I have been frustrated by the slow pace at which justice systems around the world have embraced PMM, despite what I view as sound principles and strong evidence of its merits. Cairney says that we should have patience, and lots of it. It takes two or three decades for profound changes to be made, even in areas like smoking and cancer where strong evidence of cause and effect has been developed.

New Techniques and Tools

Cairney’s advice is certainly wise in terms of how scientists should behave and position themselves to influence policymaking. But what about technocratic changes and new scientific tools that scientists might use in the design and dissemination of their findings to influence policymaking?  

In a talk last week, on February 16 at William & Mary, Caroly Shumway, Director of the Center for Development Research at the United States Agency for International Development (USAID), Chief Scientist for the Global Development Lab at USAID, and Senior Science Advisor to the Administrator of USAID, discussed how USAID is using science and technology to transform international development efforts. She mentioned one such tool: rapid feedback. According to the USAID website, Rapid Feedback MERL (the acronym referring to monitoring, evaluation, research and learning) is a “collaborative approach to learning and adapting. Improved data capture and compressed feedback loops provide decision-makers with timely, actionable evidence. Design and implementation decisions can be optimized to maximize chances of impact and improve prospects for long-term success.”

Why not require scientists to hew to standards of real-time or near real-time reporting for “compressed feedback loops,” standards that are de rigueur in business and much of the private sector (think of the Dow or sports reporting) and are emerging in PMM in the public sector? Why should scientists who profess an interest in shaping policy adhere to rigid standards for the timing of reporting that are defined by scientific designs divorced from the demands of policymaking? Should those designs not be flexible enough to meet those standards? Why should even researchers relying on randomized controlled trials be precluded from providing rapid feedback in real-time?

Consider, for example, a researcher who is required to report in real-time his or her evidence that his or her research is producing data that might not be reproducible for various reasons. Would not such rapid feedback benefit both policymaking and good science?

© Copyright CourtMetrics 2017. All rights reserved.


Tuesday, February 07, 2017

Politicians in Greece Make Data the Enemy

In a January 23, 2017 post (Right Use and Politics in Performance Measurement and Management), I argued that practitioners of performance measurement and management (PMM) must not ignore the reality of politics if they want to ensure the “right use” of PMM. They do so at peril of coming into the cross-hairs of foes who would “shoot the messenger” rather than consider data that they don’t like.

The lesson that performance measurement data is not above politics came into full international view today on the front page of the Wall Street Journal, in an article written by Marcus Walker under the title Greeks Make Data The Enemy: Facing resurgent debt crisis, politicians indulge in conspiracy theories involving former statistics chief.

Andreas Georgiou, an American-trained economist and Greek citizen who moved from the U.S. and became Greece’s first independent head of statistics in 2010, stands accused by his foes of manipulating the country’s deficit figures in a plot to force austerity measures imposed by the European Union (EU) and the International Monetary Fund (where Georgiou worked before he took the chief statistician job in Greece). He could face multiple trials and imprisonment under the charges even though Greek investigators and prosecutors concluded that he committed no crimes and was merely applying EU accounting rules.

EU representatives worry that Greek statistics will become “a political plaything,” as well they should. The sad story of Mr. Georgiou, who moved back to the U.S. in August 2015 without seeking to extend his five-year term with the Greek government, underscores the fact that PMM is not only a diagnostic exercise but also an instrument of power and control.

© Copyright CourtMetrics 2017. All rights reserved.


Wednesday, January 25, 2017

“As Is” Adoption of the UN Sustainable Development Goals (SDGs) Without Correction Is a Mistake

On September 25, 2015, the United Nations General Assembly adopted the Sustainable Development Goals (SDGs), officially known as “Transforming Our World: The 2030 Agenda for Sustainable Development.” The 2030 Agenda was hailed by then-UN Secretary-General Ban Ki-moon as nothing less than “a defining moment in human history.” Many critics, on the other hand, argued that the details of the SDGs -- not necessarily their grand ambitions -- do not bear close scrutiny.

Leading up to the adoption of the SDGs, the prolonged debate about the goals the world set for 2030 had been heated, fraught with seemingly endless consultations. Nonetheless, in a surprise to many if not most informed observers, the sprawling package of SDGs, including 17 overarching goals and a mind-boggling 169 associated targets, was adopted virtually unchanged from that proposed on August 12, 2014, by the Open Working Group of the UN General Assembly on SDGs.

The SDGs Are Not SMART

In an article in the current issue of the William & Mary Policy Review, and at a conference sponsored by the United Nations Development Programme (UNDP) last month in Tashkent, Uzbekistan, focused on Goal 16 of the SDGs, the “justice and peace” goal, I joined scholars and commentators who have criticized the SDGs as sprawling and misconceived, difficult to understand, unmeasurable and unmanageable in their present formulation. I pointed out that the sprawling package of 17 goals, 169 targets or sub-goals, and 230 indicators of success simply does not meet the “SMART” criteria for goal setting, i.e., most are not specific, measurable, attainable, relevant, and time-bound. I argued that the SDGs need to be made SMART in order for them to make a positive impact on sustainable development by 2030 comparable to that of the narrower predecessor Millennium Development Goals (MDGs), which expired at the end of 2015. I concluded that this can be achieved by taking three courses of action that promise to result in a cohesive framework with a limited set of indicators that constitute a balanced scorecard to assess progress toward justice outcomes:

1) formulate detailed operational definitions and instructions for the provisional indicators and associated targets;

2) streamline the proposed provisional indicators to a more limited number of measures, i.e., a vital few; and,

3) ensure that countries and their statistical offices and performance measurement departments take ownership of the framework of indicators.

Adoption “As Is” Without Correction

I assumed that the flaws in the current formulation of the SDGs would be self-evident to countries and stakeholders at the global, national and subnational levels as they prepare to implement the SDGs. Indeed, it is difficult to see how nations can, without serious revision of most of the 17 goals, 169 targets (sub-goals), and 230 tentative indicators, incorporate the SDGs into their national planning processes, policies and strategies, as envisioned by the UN. I expected UN member nations to make the SDGs SMART before they adopted them. That is, I did not expect implementation initiatives to adopt the sprawling package of the SDGs “as is” before making them SMART, akin to the costly mistake of automating a seriously flawed manual process such as case management “as is” without correcting the flaws prior to automation. Alas, there are some alarming signs that such mistakes are happening.

For example, with the support of UNDP, Albania is embarking on a comprehensive implementation of the SDGs in line with its national and subnational strategic plans (including its National Strategy for Development and Integration 2015-2020 and European Union integration agenda). While it is difficult to know precisely how many flaws of the SDGs were identified and corrected before integration into Albania’s national and subnational strategic frameworks, the evidence suggests that the SDGs were largely adopted “as is” without much correction.

This kind of use of the SDGs “as is” is not restricted to countries and their public institutions. For example, a recent article by Valerie L. Karr, Jacob Sims, Callie Brusegaard and Ashley Coates (researchers at the University of Massachusetts Boston’s School for Global Inclusion and Social Development and AidData, a research lab at the College of William and Mary which tracks who is funding what, where, and to what effect in order to inform development policy) reported on a study designed to test whether the World Bank’s efforts to align its work with the SDGs in one area -- the inclusion of people with disabilities in development efforts -- are effectively translated into inclusive development projects on the ground. The researchers combed the World Bank online project database and identified projects that involved people with disabilities using an analytical tool that produced concrete examples of World Bank initiatives from 2009 to 2015 corresponding with the SDGs. What is telling is that, as with the efforts in Albania, the researchers did not seem to have made any effort to ascertain whether the SDGs as articulated were sufficiently SMART to align with the World Bank’s efforts which, in contrast, were carefully screened for relevance.

Opportunities and Risks

Countries seeking to implement the SDGs face both opportunities and risks. On the one hand, collaboration and cooperation with United Nations agencies such as UNDP and other donors, partners, and stakeholders hold out the promise both of international support and assistance for national and local sustainable development efforts and of elevating a country’s stature and standing in the global community. In what might be characterized as a “bandwagon effect,” it may be tempting for countries and organizations at the national and global level to hop on board without knowing in advance where the bandwagon is heading.

On the other hand, efforts to integrate and coordinate the sprawling package of the global agenda of the SDGs -- including 17 goals, 169 targets or sub-goals, and 230 tentative performance indicators -- with national and subnational strategic frameworks are fraught with daunting difficulties and significant risks that stand in the way of meaningful implementation of the SDGs. In Albania, for example, national and subnational frameworks include over 50 strategies, national plans and policy documents. Simply mapping the elements of these frameworks against the ill-defined SDGs, which the Economist declared were a “mess” and possibly “worse than useless,” without first making them SMART risks being a meaningless, wasteful exercise. It also may mire Albania in the kind of prolonged debate, disagreements, and bureaucratic infighting over special interests that plagued the long, drawn-out process of formulating the sprawling package of the SDGs.

© Copyright CourtMetrics 2017. All rights reserved.


Monday, January 23, 2017

Right Use and Politics in Performance Measurement and Management

In previous posts here (see Ensuring the Right Use of Performance Data: A Cautionary Tale from Health Care, June 26, 2012) and other writings, I have urged the broadening of the scope of inquiry about performance measurement and management (PMM) beyond the “right measures” and the “right delivery” of the information provided by the measures (for example, by such mechanisms as real-time performance dashboards) to the politics of the “right use” of that information. Trained in the social sciences, scholars and practitioners of PMM may think they can exclude politics from their models, thinking that it sullies the discipline of PMM or that politics is the business of other fields. This is a mistake, especially for international development. The necessity of considering politics is argued by, among many international scholars, Francis Fukuyama in the first chapter of his 2011 book, The Origins of Political Order.

The requirement of “right use” recognizes that PMM -- like other technologies of knowledge production and governance such as program evaluation and global indicators -- is not just a diagnostic exercise devoid of politics. Knowledge-power theory teaches us that PMM and other knowledge production is an exercise of power and control, a lesson central to the research and writing of Kevin Davis, Benedict Kingsbury, Sally Engle Merry, Angelina Fisher and their colleagues at New York University’s Institute for International Law and Justice. Much of the resistance to the “right” use of PMM to increase efficiency, improve effectiveness, further public accountability, and achieve transparency stems from fear that a third party deemed to lack legitimate authority is using PMM to usurp power and control from the organization or institution whose performance is being measured.

To ignore the reality of such politics is to render PMM useless, i.e., data derived from the right measures delivered in the right way but resisted and not used are literally useless. As the Nobel laureate Joseph Stiglitz put it at the American Economic Association conference in Chicago this month (as reported in the January 14 Economist), economists need to pay attention not just to what is theoretically possible but “what is likely to happen given how the political system works.”

© Copyright CourtMetrics 2017. All rights reserved.


Thursday, May 12, 2016

Incentive: The Missing Ingredient in Performance Measurement and Management (PMM) in Courts

Woody Allen is said to have once quipped: “I was in a warm bed and, all of a sudden, I found myself in the middle of your strategic plan.” What will it take for courts and other justice institutions to get out of their warm beds and embrace performance measurement and management (PMM)? What are the incentives?

Business Incentives Do Not (Yet) Exist for Courts

For private sector organizations, PMM is an imperative, an essential business evaluation tool that is a matter of survival. In the long term, if profits are insufficient to cover expenses, companies will surely soon be out of business. In the short term, if cash flow does not cover employee salaries, they will close their doors sooner. Other than net profit and cash flow, critical measures for businesses include return on investment, market share, customer satisfaction and loyalty, and employee engagement. For businesses, moving the needle on these measures in the right direction provides both an incentive and a tool for improvement. Success in one area can prompt focus on doing better in other areas.

For courts and other justice institutions, such incentives do not exist. While some courts have been closed or placed into receivership (e.g., the Detroit Recorder’s Court in the 1980s), such occurrences are rare exceptions that prove the rule that survival is not an everyday worry for courts.

Parallels in Health Care

In previous posts I have explored innovative financial incentives for PMM for courts (e.g., gainsharing, a type of profit-sharing system used by local governments and at least one court). And, like many of my colleagues, because hospitals and doctors, and courts and judges, are much alike, I have looked to health care for ideas (e.g., “never events” in court administration).

In a recent op-ed in the Wall Street Journal, Ezekiel J. Emanuel, Chairman of the Department of Medical Ethics and Health Policy at the University of Pennsylvania, describes an innovative pilot program, Independence at Home, that merits scrutiny by court leaders and managers. The program is part of a movement in health care to reward providers based on quality, not quantity, of care.

Dr. Emanuel begins by describing a wheelchair-bound 87-year-old patient in the program, Luberta Whitfield, who suffered a stroke that left her right side paralyzed a few years ago. She has emphysema and diabetes, is dependent on oxygen, and recently tore the right rotator cuff on her good arm. The program gives the sickest Medicare patients, like Ms. Whitfield, primary care right in their homes. Since it launched in 2012, it has succeeded in delivering high-quality care at lower costs than traditional Medicare. Thanks to the program, Ms. Whitfield still lives in her own home. Here’s how the program works.

Patients who qualify for Independence at Home need to have been hospitalized in the past year, suffer from two or more chronic conditions, require help with daily tasks, and have needed services such as a stay in a skilled nursing facility within the last year. These are the types of patients that are the key to saving money; they make up 6% of Medicare patients but account for nearly 30% of Medicare’s cost. According to an analysis by the Centers for Medicare and Medicaid Services (CMS) cited by Dr. Emanuel, these patients are so sick that 23% die each year and each accounts for $45,000 in annual Medicare spending. He contends that the program could save Medicare tens of billions over ten years.

Once in the program, patients receive coordinated primary care focused on keeping them healthy, in their homes, and out of the hospital. Emanuel characterizes the care they receive as “concierge care for the sickest – not the richest.” Now here’s the intriguing part that may be of interest to court administration.

Physician groups that join the program and bid to provide the Independence at Home services have financial incentives in the form of bonuses to keep patients out of the hospital, which saves money, while still meeting Medicare’s quality standards. Bonuses are given only after the total costs for their patients’ care are reduced for two consecutive years. If they fail to achieve these reductions, they cannot share in the savings.

In a June 2015 press release, the CMS announced good results for the first performance year of the Independence at Home demonstration, including both higher quality care and lower Medicare expenditures. The CMS analysis found that the 17 physician groups in the program saved an average of $3,070 per beneficiary in the care of 8,400 Medicare beneficiaries in the program’s first year, for a total of more than $25 million in savings, while delivering high-quality health care at home in accordance with six quality measures (e.g., fewer hospital readmissions within 30 days). CMS announced that it would award incentive payments of $11.7 million to nine of the participating physician practice groups.
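
As a rough check on these figures, the average savings and the number of beneficiaries can simply be multiplied out. The short Python sketch below uses only the numbers quoted above; the variable names are mine, and it rests on two assumptions: that the $3,070 average is per beneficiary, and that the incentive payments come out of the gross savings.

```python
# Figures quoted above from the June 2015 CMS press release.
avg_savings_per_beneficiary = 3_070  # dollars; ASSUMED to be a per-beneficiary average
beneficiaries = 8_400
incentive_payments = 11_700_000      # dollars awarded to nine practice groups

# Gross savings implied by the average and the head count.
total_savings = avg_savings_per_beneficiary * beneficiaries

# ASSUMPTION: incentive payments are paid out of the gross savings.
net_after_incentives = total_savings - incentive_payments

print(f"Total savings: ${total_savings:,}")
print(f"Savings net of incentive payments: ${net_after_incentives:,}")
```

The product is consistent with the “more than $25 million” reported, and under the stated assumption a sizable share of the savings remains after the incentive payments.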

Can This Work for Courts?

Critical features of the Independence at Home pilot project are its focus on the quality of care, not quantity, and its dependence on measurable outcomes supported by rigorous PMM. As I noted in my previous posts on gainsharing, notwithstanding questions of legality and opposition on philosophical or political grounds (e.g., court excellence is mandated by law and, therefore, should not be supported by financial incentives), the success of this CMS demonstration project bears close watching as a model for courts. Incentive payments could be triggered, for example, by sustained reductions in cost per case, a relatively underused court performance measure that is part of both the CourTools and the Global Measures of Court Performance, achieved without loss of quality in accordance with stringent standards and criteria for various case types.
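
To make the idea of such a trigger concrete, here is a minimal Python sketch of one way it might be expressed. Everything in it is hypothetical: the function name, the two-year default (borrowed from the Independence at Home rule described above), and the yes/no treatment of quality standards are my illustrative assumptions, not features of CourTools, the Global Measures, or any existing court program.

```python
def bonus_eligible(cost_per_case, quality_met, years_required=2):
    """Hypothetical trigger: True only if cost per case fell in each of the
    last `years_required` years AND quality standards were met in each.

    cost_per_case: yearly values, oldest first.
    quality_met:   yearly booleans aligned with cost_per_case.
    """
    if len(cost_per_case) != len(quality_met):
        raise ValueError("cost_per_case and quality_met must be the same length")
    if len(cost_per_case) < years_required + 1:
        return False  # not enough history to show a sustained reduction
    start = len(cost_per_case) - years_required
    for i in range(start, len(cost_per_case)):
        # Each year must beat the year before it without loss of quality.
        if cost_per_case[i] >= cost_per_case[i - 1] or not quality_met[i]:
            return False
    return True


# Illustrative, made-up figures (dollars per case).
print(bonus_eligible([500, 480, 450], [True, True, True]))
print(bonus_eligible([500, 480, 450], [True, True, False]))
```

The point of the sketch is the design choice, not the numbers: a cost reduction that coincides with a lapse in quality earns nothing, which mirrors the program’s insistence on quality, not quantity.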

As my colleagues Victor (“Gene”) Flango and Tom Clarke suggest in their book, Reimagining Courts: A Design for the Twenty-First Century, courts need to be reimagined and transformed. They should innovate continuously. The gap between government’s information technology, including that of courts, and that of the private sector seems not to be shrinking but widening. People expect to access government services and assess their quality as easily as they look up a restaurant on Yelp or Google. Incentives for good performance outcomes, the modus operandi in the private sector, need to find their way into court administration as they are slowly making their way into health care.

© Copyright CourtMetrics 2016. All rights reserved.


Friday, April 29, 2016

Advancing Performance Measurement and Management (PMM) in the Justice Sector

Who else is doing PMM where? How is it working out for them? Answering these two questions will advance performance measurement and management initiatives more than any effort to date.

For many years, I’ve been in the business of convincing (OK, mostly just trying to convince) courts and other justice institutions to develop the political will and capacity to measure and manage their performance in an effective, accountable, and transparent manner. I used to think that widespread buy-in by the justice sector would surely follow the development of well-conceived models, from the Trial Court Performance Standards in the late 1980s and early 1990s, to the CourTools ten years ago, to the Global Measures of Court Performance in the last few years. But buy-in for PMM certainly has not been overwhelming. Instead, at best, it has been a slow slog for advocates of PMM.

Who Else is Doing PMM Where?

In a recent article in the William & Mary Policy Review (Volume 7, Number 1), I suggest a way of speeding up this slog. (The article should be available on the Journal’s website and on HeinOnline shortly.) The PMM that is taking place today in justice systems throughout the world, relatively limited though it may be, needs to be documented and made visible and known in order to be used, I wrote. That is, knowledge production should be accompanied by knowledge transfer. Unfortunately, this is not taking place at an effective speed and extent, largely because the institutions and countries actually engaged in PMM at the local levels understandably are not much in the business of disseminating and promoting their PMM beyond their jurisdictions and borders. Unlike research firms, universities, justice-related organizations, and donors, they lack incentives to promote their work.

Over the last several months, for example, I learned only through personal contacts with the principals that the Victoria courts in Australia had adopted the Global Measures of Court Performance and that the High Court in Lahore, Pakistan, currently is including seven of the eleven measures of the Global Measures in its new case management system. I’m trying to follow up on the latter as I write this post. This type of word-of-mouth, hit-or-miss, anecdotal transfer of knowledge will not accomplish the trick of PMM knowledge transfer.

To address the problem, my colleagues and I at the Institute for the Theory and Practice of International Relations at the College of William & Mary last year launched the Justice Measurement Visibility (JMV) Project, a project that aims to identify successful PMM throughout the world focused on the Global Measures of Court Performance (which is part of the International Framework for Court Excellence developed by the International Consortium for Court Excellence). For those interested in adopting or adapting the Global Measures, we hope the project will answer the inevitable question for which today, unfortunately, we have only an unsatisfactory answer, “Who else is doing this today?”

How Is It Working Out for Them?

Working with the Courts and Tribunal Academy of Victoria University in Australia in February and March this year, I ran into unexpected headwinds of resistance to PMM, mostly coming from judges. In several venues, I presented the idea of PMM in a way that I was convinced at the time would receive enthusiastic support. I tried with moderate success at best to address confounding questions and counter a few verbal bullets point-by-point. For example, I responded to the criticism that the very idea of public accountability for court performance and transparency is antithetical to the principles of judicial independence and separation of powers by arguing that public accountability driven by a system of performance measurement and management can and will strengthen, not weaken, judicial independence and the institutional integrity of courts.

Reflecting on the experience with these presentations, I had an epiphany of sorts, the sudden and striking realization that advocates of PMM, including me, were not practicing what we preach, namely that we should measure results that matter, and count what counts. Shouldn’t we be looking at the results of PMM itself in the same way? Yes, I realized we were failing to address not only the question, “Who is doing PMM where?” but also the more important follow-up second question, “How Is It Working Out for Them?”

How could I have missed this? Would not resistance to PMM dissolve and the naysayers be silenced if we could only answer these two questions clearly and succinctly? Suppose we could say, for example, that yes, two-thirds of the justice institutions and justice systems using PMM throughout the world have improved their performance and that, moreover, three of them have very much in common with the questioner’s own.

A new, ambitious project, which will be joined with the JMV Project at William & Mary, will address this second question. The project, which does not yet have a name, is in the proof-of-concept stage. I will describe progress in future posts here. Comments would be welcomed.

© Copyright CourtMetrics 2016. All rights reserved.