Creating Positive Change with Data and Evaluation – Part II: Incentive Alignment


Brian Beachkofski leads Third Sector’s efforts in data, evaluation, and modeling, including developing and executing strategy, capturing best practices, and providing training.

This post is based on a recent speech I gave at the Harvard University Center on Education Policy Research for a meeting of the Proving Ground partners. Proving Ground is an initiative committed to helping education agencies meet their practical needs by making evidence cheaper, faster, and easier to use. Their goal is to make evidence-gathering and evidence-use an intuitive part of how education agencies conduct their daily work.

This is part two of a three-part series.

Data is now so common that it is subdivided into big data, streaming data, small data, personal data, and countless other terms. Given that prevalence, we already have data that can be used to learn how to improve the social sector. It’s not a question of capability: analysis approaches are mature enough to provide insight, yet most data isn’t even used to improve operations. What it comes down to is government office incentives and risk calculus.

When I worked in the Pentagon, the incentive issues became obvious when I was putting together a portion of the Air Force budget.

Every office had an incentive to say its challenge was bigger than ever, even if that overstated the demand. The way to grow your budget is to show that your need is greater than ever. This also assumes that every office director wants to grow their office’s budget. No one wants to show that they’ve made progress, because if you do, your budget will get cut. Back in the Pentagon, there were very few times when an organization came to us and said, “we have made great progress on our work and have extra money we’d like to repurpose to address new issues” or “we’ve fixed the issue and you can go ahead and cut our budget.”

So, if organizations generally want to grow, and growth is driven by increasing need, organizations have little incentive to demonstrate lower demand for services.

That’s an admittedly wildly cynical view of how government works. The weaker form of the theory is that programs are making an impact, but offices claim that the need would have grown even faster without the work they did. This does happen from time to time. For example, it is unclear what the poverty rate would be without the anti-poverty programs discussed in the last post, and it is impossible and unethical to ever test those claims by suspending the programs while people are still in need.

But there are ways to test the counterfactual. How can we determine what would have happened without a particular program? Natural experiments happen. For example, some programs have enrollment size restrictions and use a lottery to provide benefits; the applicants who lose the lottery show what happens without the program. Other natural experiments arise when there is an eligibility cliff, where people on one side of a metric are fully eligible for benefits but those just on the other side are ineligible. Comparing those just above the threshold to those just below can isolate the impact of the benefit. But even in these examples, we rarely see government offices look to rigorously examine the counterfactual outcomes.
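To make the two natural experiments concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the record layout, the `bandwidth` parameter, and the choice that people below the threshold are the eligible ones are all hypothetical, and a real evaluation would add significance testing and checks that the two groups are otherwise comparable.

```python
from statistics import mean


def lottery_effect(winner_outcomes, loser_outcomes):
    """Estimate program impact from an enrollment lottery.

    Lottery losers stand in for the counterfactual: what happens to
    similar applicants who did not receive the benefit.
    """
    return mean(winner_outcomes) - mean(loser_outcomes)


def cliff_effect(records, threshold, bandwidth):
    """Estimate impact at an eligibility cliff.

    records   -- list of (score, outcome) pairs (hypothetical layout)
    threshold -- assumption: people with score < threshold are eligible
    bandwidth -- how close to the threshold a person must be to count
    """
    just_eligible = [
        outcome for score, outcome in records
        if threshold - bandwidth <= score < threshold
    ]
    just_ineligible = [
        outcome for score, outcome in records
        if threshold <= score <= threshold + bandwidth
    ]
    if not just_eligible or not just_ineligible:
        raise ValueError("need people on both sides of the threshold")
    # People this close to the cutoff are assumed to be similar apart
    # from eligibility, so the gap in outcomes approximates the impact.
    return mean(just_eligible) - mean(just_ineligible)
```

In both cases the estimate is just a difference in average outcomes. What makes it credible is how the comparison group was formed, not the arithmetic.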

But the counterfactual is only half of an impact measure. There are also incentives not to rigorously measure the outcomes of those served. Back in the budget office, there was another common justification for a budget: “The ROI is 10:1, so we can make 10x impact on your funding.” This was so common that most budget requests gave an ROI, often one that would make venture funds drool. The incentive is to make this number as large as possible without being patently absurd. Funding agencies rarely question the details of the ROI calculation, and sometimes do not even understand how it was calculated. Any metric that is not understood but determines funding is going to be manipulated by those seeking funding. If the ROI has been exaggerated, that office then has no incentive to use its data to measure the true impact. This is especially true when that office would be the first to measure impact, because its measured results would then be compared to everyone else’s overly generous estimates.

Take this example. In the education space, companies selling materials have the same incentive as those touting high ROIs. Claims of impact are as large as possible, are generally not comparable to other metrics, and can influence purchasing decisions. Moreover, once a purchasing decision is made, the actual impact is not measured or used for future purchasing decisions.

Pay for Success (PFS) aims to fix these incentive issues by explicitly measuring against a counterfactual, evaluating the program impact, and connecting payment to that impact. Since payment depends on meeting pre-determined levels of impact, the incentive to exaggerate claims is moderated. Because the counterfactual results are subtracted from the project’s impact, correctly characterizing the need is part of the project design. PFS can remove the incentives to exaggerate, to make unrealistic claims of impact, or to ignore the real counterfactual outcomes. PFS will require a cultural change, but the benefits of creating incentives to learn and cite facts are worth it.
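The payment logic described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular PFS contract is written: the function names and the tiered payment schedule are hypothetical, and real projects define their own outcome metrics, thresholds, and amounts.

```python
def net_impact(program_outcome, counterfactual_outcome):
    """Impact is the outcome for those served minus the counterfactual."""
    return program_outcome - counterfactual_outcome


def success_payment(impact, payment_schedule):
    """Return the payment earned for a measured impact.

    payment_schedule -- list of (minimum_impact, payment) pairs agreed
                        before the project starts (hypothetical values).
    The payer owes the amount tied to the highest threshold met, and
    nothing if no threshold is met.
    """
    earned = 0
    for minimum_impact, payment in sorted(payment_schedule):
        if impact >= minimum_impact:
            earned = payment
    return earned


# Hypothetical schedule: impact measured in percentage points.
schedule = [(5, 100_000), (10, 250_000)]
payment = success_payment(net_impact(72, 65), schedule)
```

Because the counterfactual is subtracted before any threshold is checked, overstating the need does not raise the payment, and overstating the expected impact only sets a bar the project then has to clear.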