What works philanthropy: building evidence for transformational change

An interview with Jon Baron, Vice President of Evidence-Based Policy at the Laura and John Arnold Foundation, on how philanthropy can provide guidelines to private funders and public policies through impact assessment


The Laura and John Arnold Foundation was established in 2008 on its founders’ belief that philanthropy should be transformational and should seek to solve persistent problems in society through innovation. The young philanthropists set out their “Philosophy of Philanthropy” to “incentivize bold, creative thinking and effort, with the goal of igniting a renaissance of new ideas and approaches applied to persistent problems”.

With that goal in mind, in the last six years the Foundation has granted over $944 million, allocating almost $247 million to the “evidence-based policy and innovation” area. As a matter of fact, how can you produce transformational change and seek to solve large complex social problems without solid evaluations able to identify effective – and ineffective – solutions?

We talked with Jon Baron, Vice President of Evidence-Based Policy at the Laura and John Arnold Foundation and Leap Ambassador, about how the Foundation uses rigorous experimental impact assessments to create knowledge and help direct private resources and public funding toward “what works”.


Could you give us an overview of the Foundation’s philosophy of philanthropy and how it translates into practice?

The starting point is that many social programs funded by either philanthropists or governments unfortunately do not produce the hoped-for improvements in people’s lives, compared to usual services in the community. That’s true for many initiatives in education, job-training, crime prevention, healthcare delivery and so on. We know this based on findings from the most rigorous program evaluations – what are called Randomized Controlled Trials (RCTs).

The frequent finding of weak or no positive effects also applies in other fields such as medicine, where RCTs are done all the time, and in business: Google and Microsoft in the United States have conducted 13,000 RCTs on different business strategies in the past few years and 80-90% of them have found no significant effects. The overall lesson is that finding strategies that do work – in medicine, business or social programs – is challenging.

There are important exceptions to this pattern, however – some programs have been rigorously evaluated and found to produce meaningful positive effects. An example in the United States is the Nurse-Family Partnership, a program where specially trained nurses regularly visit young, first-time, low-income moms-to-be, starting early in the pregnancy and continuing through the child’s second birthday. The nurse advises the woman on parenting, nutrition, avoiding smoking and drinking during pregnancy, etc. The program has been evaluated in three well-conducted RCTs in the U.S. (and two in Europe – one in the Netherlands, one in the UK) and proved to be successful in producing positive outcomes such as a large reduction in child abuse and neglect, and improvements in educational and other important outcomes for children of the most vulnerable mothers.

A lot of what we do in the Evidence-Based Policy department I am in charge of is to fund randomized controlled trials aimed at building the body of programs with rigorous evidence of important positive effects. My division is currently funding over 40 RCTs on a wide range of topics (job-training, education, early childhood, care delivery, etc.) with the objective of building evidence through rigorous evaluation. Other parts of our foundation are working with multiple state and local governments to try to get them to advance evidence-based decision-making – that is, to build rigorous evidence and use it in policy decisions in order to improve government performance.


Finding strategies that do work in medicine, business or social programs is challenging. We work to build rigorous evidence about what works and to advance its use in public policies in order to focus funding on programs that demonstrate positive findings


How is this approach to evaluation mirrored in your funding choices?

We recognize that finding programs that have a meaningful effect is challenging, more challenging than most people think. That’s why we’re very selective about what we choose to evaluate, focusing on highly promising programs, because otherwise we could end up funding a wide range of studies that are rigorous but produce many disappointing findings of small or no effects. So we try to focus our evaluation efforts on programs where there is prior promising evidence – not necessarily an RCT but a quasi-experiment or a more preliminary kind of study – that suggests a meaningful effect is plausible.

In other words, we are trying to build on promising findings that have emerged in the literature, on what’s already known, in order to maximize our chances of success in finding programs that are effective. We fund a lot of replication studies for instance: where there is a promising finding we want to see if it works in other areas with a larger and more definitive evaluation.

We provide funding to grantees to carry out the study and we partner with them: before we award a grant, we make sure that the study is well designed; that the research team is capable of carrying out a study like this; and that it is measuring outcomes that are of policy and practice importance. We also work with grantees to make sure the study is implemented in a rigorous way.

And we want results to be shared whether they are disappointing or positive: we make that part of the grant agreement and we work closely with our grantees to make sure that the study is carried out and reported with integrity. We want to be sure that the study results are reported in a way that is no more or less encouraging than the actual findings. We emphasize that with our grantees – we are just trying to get the truth.


We recognize that finding programs that have a meaningful effect is challenging, more challenging than most people think. Spending money the old-fashioned way without evidence generally doesn’t solve problems


There’s a lot of controversy over two main issues when it comes to RCT studies: high costs and “ethics”, as performing such an analysis means excluding part of the population from a program. What is your perspective?

Those issues are real in some cases but often they can be overcome. Let me give you an example that covers both the ethics and cost side. One of the studies that we are funding is an evaluation of a program called “Bottom Line”. It’s an initiative that operates in the U.S. in a number of different cities with the goal of providing one-on-one counselling to low-income high-school students who are academically proficient but would be first-generation college students (i.e. their parents have never gone to college). For example, the counselling helps them choose a good, affordable college that is appropriate for them academically, and provides support throughout the college years to help them complete their degree.

The program is oversubscribed: it does not have sufficient funding to be able to serve every student who is interested. So, in order to do the evaluation ethically, the program shifted from a first-come-first-served admission process to a randomized lottery for determining who can join the program: in other words, random assignment determines who enters the program (the treatment group) and who does not (the control group). It was ethical because they did not have the resources to serve everybody anyhow.

Also, this study has been done at a low cost. All of the main outcomes (who gets into college, who stays in college, who graduates) are being measured over a seven-year period using administrative data, i.e. data collected by somebody else: there is a database in the U.S. called the National Student Clearinghouse that covers 98% of U.S. college and university students. The Bottom Line RCT is producing positive findings so far: at the two-year follow-up, the program was found to produce a 10 percentage point increase in continuous enrolment in the first two years of college, in comparison to the control group. This is a large study – 2,400 students in the sample, 3 different sites, and outcomes measurement over a seven-year period – yet the whole study is being done for less than $200,000, because outcomes can be measured with administrative data and there is no need to locate all 2,400 people and survey them to find out whether they are still in college.

A lot of the RCTs we fund are reasonably low cost; they generally range from about $100,000 to $1 million, with a few cases up to $3 million. Clearly you can’t do a low cost RCT in every situation but in many areas it is possible.
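The admission lottery described above for an oversubscribed program can be illustrated with a minimal Python sketch. It is not the procedure Bottom Line actually used; the function name `lottery_assign`, the `seed` parameter, and the applicant labels are hypothetical and serve only to show how random assignment splits applicants into a treatment group and a control group.

```python
import random


def lottery_assign(applicants, n_slots, seed=None):
    """Randomly split applicants into (treatment, control).

    Hypothetical helper for illustration: `n_slots` is the number of
    places the program can fund; everyone else forms the control group.
    """
    pool = list(applicants)
    if not 0 <= n_slots <= len(pool):
        raise ValueError("n_slots must be between 0 and the number of applicants")
    # A dedicated generator keeps the draw reproducible when a seed is given.
    rng = random.Random(seed)
    rng.shuffle(pool)
    treatment = pool[:n_slots]  # offered a place in the program
    control = pool[n_slots:]    # not offered a place; outcomes still tracked
    return treatment, control


if __name__ == "__main__":
    # Made-up applicant labels, not real study data.
    students = [f"student_{i}" for i in range(10)]
    treatment, control = lottery_assign(students, n_slots=4, seed=2024)
    print("treatment:", treatment)
    print("control:", control)
```

Because every applicant has the same chance of being drawn, later differences in outcomes between the two groups can be attributed to the program rather than to who applied first.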


When it comes to smaller organizations, how do you think they can embed impact assessment in their operations? Are there other valid methods in your perspective?

We must acknowledge that it does not make sense to do an RCT for many programs, for a variety of reasons: sometimes the program is too preliminary, or not serving enough people to do a rigorous evaluation, or sometimes it would be too expensive for the stage of development of the program.

There are different types of earlier stage evaluations that philanthropists can foster. The first one is to verify whether the program is operating as it was designed to before carrying out any impact evaluation. For example, for a job-training program, you would investigate aspects such as “do people show up? Do they complete the program? Is the training delivered consistently with the protocol? Are the key elements being taught?” etc.

Then, one can perform an impact evaluation of different types. You can do it with a matched comparison group of non-participants (a quasi-experiment): you don’t use random assignment, but you see whether participants are doing better than, similar to, or worse than the comparison group. Or, you can do a small randomized trial with a restricted sample (like 60 individuals) to see whether you get a sizable short-term effect (in the job-training example, “are people getting an initial job?”).

This way you can measure the short-term effect to build preliminary evidence. In our grantmaking process, we generally look for this type of prior promising evidence before we will fund a major RCT.

In this progression of impact assessment, philanthropy can focus, even with limited resources, on the earlier stage. It is nonetheless important to recognize that promising evidence from those preliminary studies often does not hold up when more definitive randomized evaluations are performed: it is a good place to start and provides good reason to invest in more definitive studies, but it is not yet sufficiently reliable to say if the program is actually effective.


If you don’t have sufficient resources for large RCTs you can focus on preliminary impact evaluations to build promising evidence


Does the SROI method have a role in your evaluations?

It does for a lot of studies that look at socio-economic return. Of course you must have a good impact evaluation, a credible finding – an RCT or something very close to that – to do a social return analysis. Such an analysis is only valid if there is reliable evidence of impact on something (such as high-school graduation, reduced crime, employment, earnings etc.): if you don’t have that credible evidence on impact, you don’t have any credible evidence on social return. We would generally fund either benefit-cost analysis or social return analysis as a last stage after there is strong evidence of program impact.


Which are the main results you have achieved when it comes to public policies?

One of the concepts we have proposed is that government social spending programs be set up with evidence as the defining principle in who gets the funding. So we have proposed a tiered-evidence approach to government social spending that has been enacted into law and policy in a number of program areas over the past 10 years, such as early childhood, K-12 education, teen pregnancy prevention, and foster care.

The general concept is: the largest grants go towards programs that have strong evidence of effectiveness based on RCTs. They get a large grant and then there is a requirement for a randomized replication study to see whether the earlier positive findings can be reproduced. Smaller grants are provided to more innovative programs, along with a rigorous evaluation to determine whether they work or not: if they do work, they can move into that top tier and become eligible for the largest grant; if not, the funding is discontinued.




