
American Journal of Evaluation
Volume 29, Number 2, June 2008, 195-214
© 2008 American Evaluation Association
DOI: 10.1177/1098214008318657
http://aje.sagepub.com hosted at http://online.sagepub.com

Reducing World Poverty by Improving Evaluation of Development Aid

Paul Clements
Thomaz Chianca
Ryoh Sasaki
Western Michigan University

This article argues that given its structural conditions, development aid bears a particularly heavy
burden of learning and accountability. Unfortunately, however, the organization of evaluation
guarantees that evaluations will be inconsistent and it creates incentives for positive bias. This
article presents evidence from organizational studies of aid agencies, from the public choice
literature, from eight development projects in Africa, and from one in India, that demonstrates
positive bias and inconsistency in monitoring and evaluation. It proposes that the evaluation
function should be professionalized through an approach titled “monitoring and evaluation for
cost effectiveness,” and it compares this approach to the World Bank’s results-based monitoring
and evaluation, the Development Assistance Committee’s five evaluation criteria, and evaluations
based on randomized trials. This article explains the analytics of the proposed approach and
suggests directions for further research.

Keywords: development assistance; cost effectiveness; evaluation bias; professionalism

After more than 60 years of international development aid, its effectiveness at improving
economic and social conditions and reducing poverty is still unclear. This article argues
that weaknesses in aid are substantially due to a flawed incentive structure in aid management
and to associated failures in learning and accountability. One aspect of this incentive
structure—the mix of political and developmental motives driving aid—is inherent in the
structure of aid. This “political problem” can be ameliorated but not removed. The other main
element in the flawed incentive structure, however, arises (a) from the flow of resources from
taxpayers or contributors through to intended beneficiaries and (b) from the inherent difficulty
and long-term nature of aid investments. The accountability chain for development managers
is weak, incentives for legitimization conflict with those for effectiveness, and the development
task is a very hard one. Let’s call these (a) the “principal-agent problem” and (b) the “learning
challenge.” They can be addressed, and the key to addressing both of them, we argue, is better
evaluation. In particular, the key is routine evaluations that estimate and explain each inter-
vention’s impacts and cost effectiveness on the basis of criteria and judgments that are con-
sistent between evaluations.
Paul Clements, Associate Professor of Political Science, Political Science Department, 1903 W. Michigan Avenue, Kalamazoo, MI 49008-5346; e-mail: clements@wmich.edu.

Authors’ Note: We are grateful for significant comments and feedback, which we believe have led to major improvements in the article, from several anonymous reviewers and from the journal’s editor, Robin Lin Miller.

The task of this article, therefore, is threefold. First, we present evidence of the limited overall impacts of aid and discuss interactions between the political problem, the principal-agent problem, and the learning challenge. We argue that although there are many excellent
individual evaluations of development programs, the political problem and the principal-agent
problem together lead to evaluations that on average are weak and positively biased. Second,
we present evidence of weak and biased evaluation from the macroperspective, from the eval-
uation literature, and from a series of case studies in Africa and India. The case studies are
helpful not only to substantiate our argument but also to illustrate various ways in which pos-
itive bias is played out. Third, we present a proposal for the central element of a solution to
the principal-agent problem and the learning challenge. For both learning and accountability,
it is important that evaluations should estimate the total impacts that can be attributed to a
development intervention and also estimate the intervention’s cost effectiveness. Such esti-
mates can be informed by a detailed knowledge of other similar programs and of the dynamics
of the sector in which the intervention takes place. Also, evaluations informed by this kind of
knowledge are likely to be more helpful for program managers. To reduce positive bias, to
enhance the consistency of evaluations, and to establish appropriate knowledge of evaluation
technique and of the various development sectors, we propose that a particular kind of eval-
uation association should be established. Finally, we compare our proposal with several
recent reforms in development evaluation.

Limited Overall Impacts of Aid and Problems Arising From the Structure of Aid

The gist of our argument is that aid could be greatly improved if aid professionals adopted
a routine and effective orientation to cost effectiveness, and this is what our proposal supports.
It is important to appreciate the limited effectiveness of aid over the past 60 years. Many
econometric studies on the relationship between aid and economic growth and/or other devel-
opment goals have found either no statistically significant impacts or limited and qualified
impacts. These include, to mention only a few, Mosley, Hudson, & Horrell (1987), Boone
(1996), Easterly (2003), and Burnside and Dollar (2004). Thus, two senior World Bank
officials note:

Meta-analysis of ninety-seven different studies on the impact of aid on growth, drawing on three
different approaches used in the literature, concluded that at best there appears to be a small
positive, but insignificant, impact of aid on growth. . . . Much (though not all) aid has also been
wasted on poorly conceived and executed projects and programs, often fettered by debatable
conditionality. (Bourguignon & Sundberg, 2007, pp. 1-2)

Taking a different approach, Schaefer (2002) notes that for the 97 countries America has
aided from 1980 to 2000 for which data are available, with total official aid to these countries
exceeding $144 billion in real terms, real average per capita gross domestic product (GDP)
declined during the period from $1,076 to $994.
One line of explanation for the poor results of aid that we note in passing involves
recipient countries’ economic policies. Some studies find stronger positive effects from aid in
countries with policies that neoclassical economists consider “good” in the sense that they
support efficient allocation of resources through the price mechanism (e.g., low inflation, low
budget deficits, low barriers to trade). Other studies, however, do not find these effects, while
the strongest predictor of growth, the level of investment (Rodrik, 1999), is harder to manip-
ulate. Although we agree that getting governments of low-income countries to adopt eco-
nomic strategies that support investment and macroeconomic stability can be helpful to
development, it can be difficult and should be conceived as complementary to development programs and projects on the ground.
Another line of explanation that has received increasing attention in recent years involves the
incentives facing aid professionals. Neither the taxpayers or private contributors who provide
the resources for aid nor the expected beneficiaries (mainly the poor in low-income countries)
are in a position to hold those who allocate and manage aid resources effectively to account
(Wenar, 2006, pp. 10, 16-17). However, everyone involved in allocating and managing aid has
incentives that compete against the incentive to enhance the well-being of the poor. On one
hand, this breaks the accountability feedback loop (Svensson, 2006, p. 119). For example, there
is significant evidence that promotions of program managers at official development agencies
of the United States (Clements, 1995, pp. 584-585) and Sweden (Svensson, 2006, p. 119)
normally are not influenced by the performance of the projects on which they have worked (see
also Martens, 2002a, p. 13). On the other hand, the lack of performance-based incentives also
makes it less likely that the great challenges will be overcome in learning what approaches are
likely to work better in specific contexts and in applying that knowledge.
The development literature routinely notes that aid is driven by political as well as devel-
opmental goals and that these goals sometimes conflict. Donor governments often use aid to
gain influence with recipient country governments (Horta, 2006, p. 5). Also, organizational
analyses of aid agencies beginning with Tendler’s 1975 classic have found that economic
analysis of projects often functions more to justify decisions that have already been made than
to support the ongoing allocation of resources (pp. 94-95). Personnel of official development
agencies often face incentives to “move money” to fulfill spending targets, or a so-called
“project approval culture,” undermining the quality of project designs (Clements, 1993,
pp. 1634, 1639-1642; Clements, 1996; Horta, 2006, p. 9; Nelson, 1995; Tendler, 1975; World
Bank, 1992). As Svensson (2006) notes, “when managers are not held accountable for
impacts, other objectives (like spending the budget) become more important” (p. 121).
Recently, economists and scholars in the public choice tradition have begun to analyze
incentive structures in the delivery of aid. These studies are based on the assumption of rational
utility maximization—that the choices people make are oriented to maximizing the satisfac-
tion of their interests—although some take it that most development professionals include
among their interests the achievement of development goals. The basic idea is that decisions
within aid agencies reflect the incentive environments in which these agencies evolve. First,
these agencies lack both a market test (such as the private sector faces) and effective democ-
ratic oversight (such as many government agencies face). Second, as aid professionals com-
pete for resources, objective information on program impacts can be a liability as well as an
asset. Hence, there may be a low demand for good evaluation. As Pritchett (2002) concludes,
based on a formal model drawing from his experience at the World Bank,

the model shows how ‘advocates’ of particular issues or solutions—the public action equivalent
of entrepreneurs—have incentives to under invest in knowledge creation because having credible
estimates of the impact of their preferred program may undermine their ability to mobilize political
(budgetary) support. (p. 251)

The most thorough analysis is Martens, Mummert, Murrell, and Seabright’s (2002) The
Institutional Economics of Foreign Aid. Although they do not cite Tendler, they reach similar
findings: “Careers are often built on demonstrating good performance in more easily moni-
torable tasks, such as ‘committing and spending budgets’” (Martens, 2002a, p. 20). The chapter
on evaluation assumes that efforts of implementing agencies depend on donor agencies’
ability to monitor their performance (Martens, 2002b, p. 163). For donor agencies, however,
and for politicians who approve aid funding, less accurate evaluations reduce their vulnera-
bility to critics and the chance of losing tax payers’ support (pp. 171-172):

[E]valuation plays a key role in establishing a political performance equilibrium for foreign aid
programmes, though that equilibrium is not necessarily efficient in terms of satisfying taxpayers’
objectives of genuine wealth transfers. Evaluation serves to maintain this political equilibrium
and is, in itself, unable to enhance aid performance. Manipulation of the quality of evaluation
reports helps politicians to drive an informational wedge between the objectives of taxpayers and
aid suppliers, thereby maximizing their votes but reducing satisfaction for both interest groups.
(Martens, 2002b, p. 155)

In their 2006 paper on “Evaluation Bias and Incentive Structures in Bi- and Multilateral Aid
Agencies,” Michaelowa and Borrmann find similar incentives to use evaluations to enhance
legitimacy. They also extend Martens’s analysis by examining the incentives for evaluators to
collude with project managers: “The evaluator maximizes his utility via the determination of
the degree of objectivity (or bias) of the evaluation result” (p. 319). Because most evaluators
depend on future contracts from the same donor agency, managers’ incentives for positive
bias translate into similar incentives for evaluators (p. 320).
Of course, incentives are not the only factors influencing the organizational cultures of
development agencies. Most aid professionals have a sincere desire to help the poor to
improve their conditions. It takes only a little reflection to see that aid agencies need effective
evaluation to accomplish this, for learning if not for accountability, so the will to do good
translates naturally into demand for evaluation. This benign imperative finds traction through-
out an organization, and as most aid agencies grow and mature, they establish an evaluation
function. However, although “everyone” is committed to evaluation in general (some more
than others), incentives for positive bias arise in the course of the struggle for resources and
the defense of reputation.
We call this flawed incentive structure the “principal-agent problem” because, taking tax-
payers as the principals, their agents, development professionals, do not have strong incen-
tives to use aid resources to maximize development impacts. Given also that politicians who
provide the aid budgets often have political aims that compete with development aims and
that they have incentives not to demand strong evaluation, one might expect evaluation stan-
dards to be weak. However, the learning challenges in development are greater than those in
most professional arenas. Institutional contexts for aid are difficult and diverse, and building
sustainable improvements with and for the poor is very difficult. Although learning should be
based on estimates of program impacts, the impacts of many development programs extend
for many years past the end of the program, so they are hard to estimate. Given the incentive
environment we have described, phrases like “under invest in knowledge creation,” “less
accurate reports,” and “weak evaluation standards” all indicate positive bias, undermining
both learning and accountability. But the need for learning and accountability in development
aid is particularly great. It is plausible, therefore, that these weaknesses contribute signifi-
cantly to aid’s disappointing impacts.

Evidence of Weak and Positively Biased Evaluation

Applied to evaluation, the terms weak and positively biased refer to different kinds of prob-
lems. Weakness (in our context) may suggest an inadequate methodology, a focus on inputs
and outputs rather than outcomes and impacts, and failure to achieve coherence in assessing
the program’s results and their causes. An evaluation is positively biased, by contrast, when
its conclusions are more positive than its evidence warrants. Given the incentive structure, one
would expect both weakness and positive bias to be widespread in aid evaluation. Weak eval-
uations, however, are likely to be de facto positively biased, if not by making unwarranted
positive claims then by failing to ask the relevant questions. A program may have many pos-
itive outcomes but still have limited impacts. When managers frame the evaluation narrowly
or evaluators fail to ask hard questions, it is easier to reach positive conclusions, but such eval-
uations fail to address the learning challenge both for the program and for the broader devel-
opment community. By contrast, an evaluation that is analytically strong but positively biased
may be very helpful for local learning and even somewhat helpful for the broader community.
It may correctly identify program characteristics associated with stronger and weaker out-
comes. Positive bias overstates the worth of these outcomes, however, supporting compla-
cency within the program and undermining cost effectiveness in the community.
We have noted that the cumulative impacts of aid at the country level are less than robust.
Development agencies’ summaries of their own evaluations, however, almost always indicate
that the majority of their programs succeed. There is a gap between macrolevel assessments of
aid effectiveness, based on cross-country econometric studies or on simple averages, and
microlevel assessments, based on aggregations of evaluations. The classic overall assessment of
aid, Does Aid Work? by Cassen et al. (1986), answers that yes, on the whole aid does work, but
it is based mainly on evaluations from the donor agencies themselves. Following the global
pattern, the World Bank has found outcomes from 60% of its agricultural projects in Africa from
1991 to 2006 to be satisfactory, while the overall performance of African agriculture declined
substantially (World Bank, 2007, pp. xxiv, xxv). At the same time, the Bank finds that

M&E [monitoring and evaluation] at the project level has been of limited value in answering
fundamental questions about outcome, impact, and efficiency, such as who benefited, which
crops received support and how, what has been the comparative cost effectiveness, and to what
can one attribute gains. (World Bank, 2007, p. xxviii)

Several African countries have received more than 10% of their GDP in aid over decades in
which their GDP growth has been very low or negative, but the aid agencies find the majority
of their projects to be successful.
In terms of formal logic, there is no necessary inconsistency between successful projects
and lack of economic growth—it is only necessary that conditions would have been even
worse without the aid. However, in addition to the incentives discussed above, the prepon-
derance of microlevel evidence also indicates that many aid evaluations are weak or positively
biased. For example, Picciotto, a former director-general of evaluation at the World Bank,
asserts that the record of international evaluation has been “dismal” (Bollen, Paxton, &
Morishima, 2005, p. 190). Svensson (2006) asserts that “to the extent that evaluations are
handled by the aid agency itself, which is typically the case, it will be subject to attempts at
manipulations” (p. 123). Based on a survey of members of InterAction, the main U.S. coali-
tion of international nongovernmental organizations, Chianca (2007) finds that only about a
quarter have developed evaluation policies and standards and less than 10% subject their eval-
uations to meta-evaluations or carry out any regular synthesis of evaluation findings (p. 129).
Bollen et al. (2005) assess more than two dozen evaluations of democracy and governance
programs of the U.S. Agency for International Development (USAID). They find “a lack of
methodological accuracy and inappropriate coverage of important information about the impact
of assistance interventions” (Bollen et al., 2005, p. 199). Their assessment, they conclude,

appears to match the assessment of other researchers who have reviewed the evaluations of inter-
ventions by other international development agencies. The typical evaluation lacks an appropriate
research design, measures of inputs and outputs, and controls for confounding variables to justify
sound assessments of whether an intervention accomplished its goals. (Bollen et al., 2005, p. 202)

In his study on the institutional economics of foreign aid, Svensson (2006) considers educa-
tion spending by many donors in Uganda and Tanzania. He finds that in Uganda in 1991-1995
only 13% of mainly donor funded capitation grants were reaching schools: “[T]he donor com-
munity had no idea (and had done little to find out) about its impacts” (p. 126). In Tanzania,
donors did not know the organizational arrangements for spending their grants or the levels
of disbursements, and in one of the largest programs (on school books), only about 20% of
funds disbursed were reaching schools (p. 127). Svensson concludes that “even in a priority
sector like education, donors have limited knowledge of the actual impact of the program they
are financing and the intended beneficiaries are passive players at best” (p. 124).
On the other side of the ledger, Michaelowa and Borrmann (2006) find the evaluation
system of the World Bank to be in many respects a model. They assert that “institutional pro-
visions ensure that OED [the Bank’s Operations Evaluation Department, recently renamed
the Independent Evaluation Group] is truly independent within the Bank” (p. 321), because
its director-general reports to the executive board, representatives of the countries that fund
the bank, rather than to the bank’s president, and because “he is engaged on a fixed term con-
tract which can be renewed only once and may not be employed anywhere else in the Bank
thereafter” (Michaelowa & Borrmann, 2006, p. 321). They also find that average success rates
for projects from the Implementation Completion Reports (ICRs) by project managers do not
greatly exceed scores from ex-post Project Performance Assessment Reviews (PPARs), eval-
uations carried out usually 2 to 5 years later by OED. Michaelowa and Borrmann also note,
however, that OED personnel normally come from and return to operational positions within
the World Bank, introducing a possible incentive for bias, and these authors do not directly
assess particular ICRs or PPARs.

Eight World Bank and USAID Projects in Africa

More direct evidence on World Bank and also USAID evaluations is found in Clements’s
1999 report on his study of informational standards in the management of development assis-
tance (see also Clements, 1996). His sample included eight projects in Africa, four funded by
the World Bank and four by USAID. Projects were selected, on the basis of a page or less of identifying information per project that included no information on results, to give a range of well-advanced or completed projects in an area that could be covered with available resources. Desk reviews were conducted first, based on documents available in the United States, and then Clements spent 11 months in Kenya, Uganda, and Malawi studying the projects in the field. The
study was designed to assess informational standards overall—not final evaluations per se—and
five of the eight projects were not yet completed at the time of the study (four because of
extensions). Nevertheless, a pattern of inconsistency and positive bias was quite evident.
Final evaluations for the three completed projects, USAID’s Cooperative Agriculture and
Agribusiness Support Project in Uganda and On-Farm Grain Storage Project in Kenya and the
World Bank’s Water Supply and Sanitation Rehabilitation Project in Uganda, all had conclu-
sions representing strong positive bias. A main component of the $40 million agricultural
cooperatives project, a set of now bankrupt cooperative shops importing agricultural inputs, had
been found to be highly corrupt (Beijuka, 1993), and Clements found evidence of significant
corruption in project loans through the cooperative bank. At the project’s conclusion, it had
not established any profitable agribusinesses. Nevertheless, its final evaluation found it to
have been “successful in achieving many of its objectives” (Roof, Wiebe, & Trail, 1994, p. vii).
Plans for the Kenya grain storage project and the Uganda water and sanitation project had
estimated economic rates of return (ERRs) of 24% and 20%, respectively, and at their con-
clusions, they reestimated returns at 24% to 34% and 18%. Because any ERR over 10% is
quite respectable, these represented very favorable results. Unfortunately, they were based on
incorrect methodologies. The evaluator of the grain storage project found that very few farm-
ers had independently adopted the grain store that the project had designed and promoted, so
he conjured up a plan for a massive follow-on project. On this basis, he estimated that
by 2001, 200,000 to 500,000 farmers would have adopted the recommended granary
(Kariungi, 1989). No follow-on project was carried out, but his 24% to 34% ERR was
reported uncritically for the project in its subsequent documents. Supervision reports for the
Uganda water and sanitation project had noted that the government was not allowing the
water company to raise its prices along with inflation, many government and private con-
sumers paid their rates late or not at all, and the company’s personnel were being paid salaries
below the poverty line. Nevertheless, the completion report cited an 18% ERR on the grounds
that construction targets were 90% achieved (mainly by foreign contractors). The project plan
had estimated economic returns based on water sales from 1988 to 2004, but the completion
report failed to update estimates of future water sales attributable to the project.
Among the projects not completed at the time of the study, none was developing an ade-
quate basis for estimating impacts at its conclusion. Supervision reports for the World Bank’s
(jointly managed) Third and Fourth Population Projects in Kenya indicated that the project
was succeeding based on the country’s declining total fertility rates—but the project had little
impact on these rates. USAID had a much more effective Family Planning Services and
Support Project in Kenya, but although Clements found it to be highly cost effective at reduc-
ing poverty, it was not generating a basis in evidence that would support an account of its
impacts when it concluded. USAID’s Promoting Health Interventions in Child Survival
Project in Malawi had run into major structural problems, leading to a complete redesign after
a third midterm evaluation; it was not generating data on its impacts on child health.
Similarly, the World Bank’s Malawi Infrastructure Project was not generating data on impacts
from construction outputs well below original targets. The World Bank’s Southwest Regional
Agriculture Rehabilitation Project in Uganda had encountered serious management problems,
and its major road-building component was just getting started when the project was sched-
uled to end. It had not collected data that would allow assessments of its agricultural inputs
or agricultural extension components, but it had done traffic surveys before and after its minor
local road-building component. Although road construction was more than 60% below target
(763 km vs. 2000 km), traffic on the upgraded roads was more than double the original esti-
mates, suggesting that (depending on maintenance) economic targets for this component
might be met.
The World Bank had estimated ERRs for various components of the Malawi Infrastructure
Project and the Southwest Region Agriculture Rehabilitation Project, as had USAID for its
Malawi child health project, but with the exception of the traffic survey-based estimates for
the local roads noted earlier, these estimates were not sustained by ongoing data collection.
When the projects were completed, there would be little basis for estimating the projects’
impacts and cost effectiveness.
From eight blindly selected projects, not one had an adequate foundation for assessing its
economic returns or other impacts. Not only was there strong positive bias in the economic
returns re-estimated for the two completed projects, but the biases were based on incorrect
methodologies. The evaluation for the completed Uganda cooperatives project, another project
which would appropriately have been subject to economic analysis, was also positively biased.
Altogether, this indicates that the responsible project managers did not expect the evaluations
to be reviewed critically. Although most of the project plans included ERR estimates, none
developed an M&E system that would sustain an appropriate assessment under this standard.

The Sodic Lands Reclamation Project in India

These African projects suggest a commitment to carrying out evaluations but inadequate
rigor and consistency to support the kind of learning that could enhance the development
community’s cost effectiveness. Another case illustrating positive bias and associated incen-
tives is the first Uttar Pradesh (UP) Sodic Lands Reclamation Project in India. The ex-post
evaluation for this World Bank–funded project was selected for a graduate class on M&E of
international development projects based on its methodological sophistication. Although most
of the evaluations that this class studied had conclusions that seemed positively biased, in this
case, the evaluation’s methodological rigor and otherwise high analytic quality put the bias in
particularly sharp relief. The authors of the present article subsequently read the project’s
completion report and found it positively biased as well. It turned out that a second, much
larger sodic lands reclamation project had been initiated under the same management before
the first was completed. The completion report for the original project included evidence that
indicated great threats to the sustainability of the project’s impacts, but we will see that its
cost–benefit analysis assumed these threats away. If the evaluators had taken proper account
of these threats, the project’s ERR would have been much lower, raising serious questions
about the design of the second sodic lands project. The failure to do so poignantly illustrates
the failure of the evaluation function to sustain appropriate learning and accountability.
The first sodic lands project was carried out from 1993 to 2001 with an initial budget of
US$80 million. It was designed to restore fertility to some 45,000 hectares of agricultural land
affected by a build-up of salts caused by decades of poor drainage. When drainage is blocked,
whether naturally or by roads and canals, surface water accumulates and evaporates. The
evaporating water leaves sodium ions that form electrochemical bonds with clay particles in
the soil, creating toxic salts. Sodification has left barren an estimated 1.25 million of the 17
million hectares of farmland in the Indian state of UP and a further 1.25 million hectares with
reduced yields (World Bank, 2001b).
In previous decades, the state and central governments had conducted several reclamation
projects. Their success had been limited, however, because of inadequate farmer participation,
weak institutional support, and weak coordination among stakeholders (World Bank,
Operations Evaluation Department, 2004, p. 2). Reflecting this experience, this project
included a particular focus on community participation, institutional development, and stake-
holder coordination. Because the reclamation technology itself was not new, these “soft”
components were considered particularly important to the project’s long-term success. UP is
India’s third poorest state, and 75% of UP’s population of 160 million are employed in agri-
culture with an annual per capita income under $200 (World Bank, 1998). With 95% of
landowners in UP’s sodic areas having holdings of less than one hectare (World Bank,
Operations Evaluation Department, 2004, p. 2), the World Bank had designated the project as
a Program of Targeted Intervention, signaling a focus on reducing poverty.
The project was implemented by the UP Land Development Corporation (UP Bhumi Sudhar
Nigam, UPBSN) under the state Department of Agriculture. Its objectives were as follows:
1. to develop models for improved agricultural production and environmental protection through large-scale reclamation of sodic lands,
2. to strengthen local institutions to manage such schemes, and
3. to contribute to poverty alleviation (World Bank, Operations Evaluation Department, 2004, p. 2).

Table 1
Project Components and Their Costs (costs in US$ million)

Land reclamation: Provision of an effective drainage network; application of gypsum, irrigation development, and support for the establishment of food and tree crops on privately owned land and forest tree species on community land. Appraisal: 64.80; Actual: 66.91
Institutional development: Strengthening of the project implementing agency, the Uttar Pradesh Bhumi Sudhar Nigam, to provide comprehensive coordination for project activities, business management and reporting, supporting nongovernmental organizations to assist in mobilizing and organizing the communities in the project area. Also strengthening of the Remote Sensing Application Center, responsible for site identification and selection and for monitoring of changes in soil and groundwater environments. Appraisal: 7.05; Actual: 10.84
Agriculture development and technology dissemination: Demonstration of reclamation models for the production of crops, fruit trees, and forestry species on sodic lands, nursery development for fruit tree seedling production, extension support involving motivational campaigns, production of publicity material using mass communication techniques. Appraisal: 4.70; Actual: 23.41
Reclamation technology development and special studies: Adaptive research to improve existing reclamation technology, diversification of cropping systems, and development of methods for preventing further expansion of sodicity. Appraisal: 3.60; Actual: 2.55
Total project cost: Appraisal: 80.2; Actual: 103.7 (World Bank share: 54.7 in both)

Source: World Bank, Operations Evaluation Department, 2004, p. 4.

Table 1 presents the project’s four main components and estimated and actual expenses. At
81% of the original budget, the main component was land reclamation. (The fivefold increase
in expenses for agriculture development and technology dissemination reflects farmers’ costs
for second-year crop production from reclaimed land, which were counted as farmers’ contri-
bution to the project [World Bank, 2001a, p. 9].) Planned and actual expenses by source are pre-
sented in Table 2. Because of the increase in the quantity of land reclaimed (from 45,000 ha in
the plan to 68,400 ha at completion) and due to greater than expected cost sharing, the benefi-
ciaries’ contribution increased from 15% to 36%. Note, however, that beneficiary contributions
were mainly in kind, consisting primarily of work on their own land. Financing by the World
Bank did not change, and the contribution from the Government of UP was less than expected.
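As a check, these shares follow directly from the totals in Table 2: 12.0/80.2 is roughly 15% of planned costs at appraisal, while 37.1/103.7 is roughly 36% of actual expenditures.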
Land was reclaimed mainly by improving drainage and then flushing out the salts. The
project also applied acidic gypsum to neutralize the alkaline salts, and it provided boreholes
and pumps, which could be used for irrigation after the salts were flushed out. Most of the
work was carried out by farmers organized into Site Implementation Committees (SICs) and
Water User Groups (WUGs). The SICs were village-level bodies responsible for

i. distributing inputs for land reclamation such as seeds, fertilizers, gypsum, boreholes, and pump sets;
ii. allocating work for conducting link drains;
iii. land classification;
iv. maintaining the drainage system; and
v. conflict resolution (World Bank, Operations Evaluation Department, 2004, p. 49).

Table 2
Project Financing by Source

Financing Source      Appraisal Estimate            Actual Expenditures
                      US$ Million    Percentage     US$ Million    Percentage
Beneficiaries         12.0 (in kind)     15         37.1 (in kind)     36
Government of UP      13.5               17         11.9               11
World Bank (IDA)      54.7               68         54.7               53
Total                 80.2              100         103.7             100

Source: World Bank, 2001a, p. 10.
Note: UP = Uttar Pradesh; IDA = International Development Association.

The WUGs were informal groups of 10 to 15 farmers who held four to five contiguous
hectares of land. A borehole was drilled on the land of one of the farmers, with a pump set
that could be owned by the individual or by the group. The WUGs were expected to share
water to flush out the salts and for irrigation, and WUG members were expected to work as a
team. A WUG was an operational unit of the village’s SIC. Given that UP is part of India’s
“Hindu heartland,” where village society is still largely organized on the basis of caste, this
highly participatory project design would present particular management challenges.
As an incentive for participation and in an effort to enhance sustainability, smallholder
farmers were granted land titles by the government. Tenure security was intended to increase
the incentive for farmers to invest in ongoing maintenance of land reclaimed under the pro-
ject. Because of the project, 62,800 farmers were given plots of reclaimed land, and of these,
22,600 were formerly landless laborers.

Project M&E

The Indian Institute of Management Lucknow (IIML) was the external M&E agency for
the project. The IIML developed a computerized management information system, but it is not clear from available reports what other specific tasks it completed during the project.
The World Bank developed four reports associated with its management and evaluation over
the life of the project: (a) 1993 Staff Appraisal Report (SAR), (b) 1997 Midterm Review, (c)
2001 ICR, and (d) 2004 PPAR.
The SAR estimated that the project would yield a Net Present Value of $36.5 million at a
discount rate of 12% and an ERR of 23%. Benefits included (a) increased production of wheat
and rice, (b) increased horticultural production, and (c) increased income from livestock and
off-farm employment (World Bank, 2001a, p. 20). The ERR was calculated for a 20-year life-
time of project benefits. Once more land than anticipated was reclaimed in the project’s ini-
tial years, the target coverage was extended from 45,000 to 68,000 hectares. Financial
contributions from the Bank and the Government, however, as noted in Table 2, were not
increased (World Bank, 2001a, p. 2).
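The SAR’s figures rest on standard discounted cash flow arithmetic: the Net Present Value discounts a stream of annual net benefits at 12%, and the ERR is the discount rate at which that NPV reaches zero. The following sketch is purely illustrative, using a hypothetical 20-year net benefit stream rather than the SAR’s actual appraisal data and a simple bisection search for the ERR:

# Illustrative only: a hypothetical net benefit stream (costs negative,
# benefits positive), not the SAR's actual appraisal figures.
def npv(rate, flows):
    # Net present value of annual flows, with flows[0] occurring in year 0.
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

def err(flows, low=-0.99, high=10.0, tol=1e-6):
    # Economic rate of return: the discount rate at which NPV reaches zero,
    # found by bisection (assumes NPV changes sign once on [low, high]).
    while high - low > tol:
        mid = (low + high) / 2
        low, high = (mid, high) if npv(mid, flows) > 0 else (low, mid)
    return (low + high) / 2

# Hypothetical stream: three years of investment costs followed by seventeen
# years of incremental agricultural benefits net of recurrent costs (US$ million).
flows = [-30.0, -30.0, -20.0] + [14.0] * 17
print(f"NPV at 12%: {npv(0.12, flows):.1f}")
print(f"ERR: {err(flows):.1%}")

On this logic, an appraisal ERR of 23% says only that the projected benefit stream would just cover costs if discounted at 23% a year; the estimate is no better than the projections behind it.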
The ICR, a self-evaluation by project staff, was conducted at the time the loan was fully
disbursed and formally closed. Based on the increased area reclaimed, it re-estimates the ERR
to be 28% with a Net Present Value of $24.5 million. The report notes that this calculation
includes only benefits from increased wheat and rice production. If other benefits had been
included, such as from horticultural and livestock production and from off-farm employment,
the ERR would have been even higher. The new estimate was, however, based on three assumptions: (a) weaknesses in maintenance of drains and in provision of extension services would be overcome, (b) farmers would continue to develop technical and managerial skills, and (c) yield levels would remain constant, with two crops a year from each unit of land (World Bank, 2001a, p. 20). The report notes that if rice and wheat yields were to decline by 12%, the ERR would fall to zero. This situation would be triggered if routine maintenance of drains was neglected, leading to resodification.

Table 3
Principal Ratings From the Project’s Implementation Completion Report and Project Performance Assessment Report

Criteria Assessed                   ICR (2001)      PPAR (2004)
Outcome                             Satisfactory    Moderately satisfactory
Sustainability                      Likely          Unlikely
Institutional development impact    Substantial     Substantial
Bank performance                    Satisfactory    Satisfactory
Borrower performance                Satisfactory    Satisfactory

Source: World Bank, Operations Evaluation Department, 2004, p. iii.
Note: ICR = Implementation Completion Report; PPAR = Project Performance Assessment Review.

The ICR was reviewed by the World Bank’s OED to “independently” examine its main
findings. Although the routine ICR review confirmed all the ratings provided by project staff
in their self-evaluation (World Bank, Operations Evaluation Department, 2004, p. iii), the
ERR estimate in the ICR is not based on the project’s actual results. It is based on conditions
that the project had been expected to fulfill but that by 2001 had not been achieved. The ERR
at project completion would have been considerably lower if it were based on what the pro-
ject had actually accomplished.
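The sensitivity of such an estimate to its assumptions can be checked mechanically by recomputing the rate of return under alternative yield scenarios. The fragment below is again only a sketch with invented flows (it does not reproduce the ICR’s own finding that a 12% yield decline drives the ERR to zero), but it illustrates the kind of scenario analysis an independent review could run:

# Illustrative only: hypothetical flows, not the ICR's cost-benefit tables.
def irr(flows, low=-0.99, high=10.0, tol=1e-6):
    npv = lambda r: sum(f / (1 + r) ** t for t, f in enumerate(flows))
    while high - low > tol:  # bisection on the discount rate
        mid = (low + high) / 2
        low, high = (mid, high) if npv(mid) > 0 else (low, mid)
    return (low + high) / 2

costs = [-30.0, -30.0, -20.0]               # hypothetical investment phase
for decline in (0.00, 0.06, 0.12):          # alternative yield assumptions
    benefits = [14.0 * (1 - decline)] * 17  # hypothetical benefit stream
    print(f"yield decline {decline:.0%}: ERR = {irr(costs + benefits):.1%}")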
The Project Performance Assessment Report, conducted 3 years after the loan was offi-
cially closed, was based on document reviews, a visit to the project site by a mission from the
OED, and a series of studies undertaken by the Center for Development Economics, Delhi
School of Economics (India), in 30 villages of UP’s Raebareli District. These studies included
a survey of 1,200 households, 60 key informant interviews, and 60 community focus groups.
Eleven of the 30 villages were selected from areas not covered by the project to serve as a
comparison group. They had similar geographic, socioeconomic, and cultural characteristics
to the 19 beneficiary villages.
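This design supports a simple with/without comparison of outcomes between project and comparison villages. As a purely illustrative sketch (invented village-level figures and a deliberately bare-bones estimator, not the analysis actually conducted for the PPAR):

# Illustrative only: hypothetical village-level outcomes (say, paddy yield in
# tonnes per hectare), not data from the PPAR's household survey.
project_villages = [2.9, 3.1, 2.7, 3.3, 3.0, 2.8]  # 19 villages in the study
comparison_villages = [2.4, 2.6, 2.3, 2.7, 2.5]    # 11 villages in the study

def mean(xs):
    return sum(xs) / len(xs)

# A naive with/without estimate of the project's effect: the difference in mean
# outcomes, valid only insofar as the comparison villages approximate what the
# project villages would have looked like without the intervention.
effect = mean(project_villages) - mean(comparison_villages)
print(f"estimated effect: {effect:.2f} tonnes per hectare")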
The OED gave positive ratings for the project’s main dimensions, with the exception of
sustainability. Table 3 presents the principal ratings from the ICR and PPAR assessments. For
the criteria of relevance and efficacy, the PPAR gave ratings of “substantial,” while it found
the project’s efficiency to be “modest” given that project benefits were unlikely to be sus-
tained. The PPAR does not re-estimate the project’s ERR, so the ERR of 28% from the ICR
would presumably remain as the final rating in the World Bank databases. The evaluators did
note, however, that the assumptions made by the ICR in calculating the ERR were too opti-
mistic, especially in regard to maintenance of drainage, farmers’ ongoing technical and man-
agerial development, and maintenance of future crop yields (World Bank, Operations
Evaluation Department, 2004, p. 14). The PPAR identifies three serious threats to project sus-
tainability: (a) inadequate attention to drainage issues, including limited incentives for the
farmers to continue working as a group and taking responsibility for field and link drain main-
tenance; (b) lack of institutional coordination among governmental agencies to provide the
necessary attention to soil fertility issues; and (c) limited interests of beneficiaries in contin-
uing to participate in the village-level organizations created by the project (World Bank,
Operations Evaluation Department, 2004, pp. 17-19).
In estimating the ERR at project completion, evaluators had used the amount of land
reclaimed as the main evidence of project impacts. For farmers, however, participating in the
project was a way to gain access to land titles and to free or highly subsidized seeds, fertiliz-
ers, and pump sets. The majority of beneficiaries, with the most degraded “Class C” land,
received on average $709 worth of agricultural inputs and support per hectare, while their
labor contribution was valued at $269 per hectare (World Bank, Operations Evaluation
Department, 2004, p. 15). In this context, the amount of land reclaimed is not a reliable
indicator for project impacts—increased agricultural production and reduced poverty.
The PPAR found considerable evidence that project benefits were unlikely to be sustained.
Almost half of survey respondents considered the operations of the main drains to be bad,
while more than a third found the drains to be deteriorating. Focus groups and observations
by the evaluators also indicated that maintenance of the drains was poor. The Irrigation
Department was responsible for maintaining the main drains, but it was unable to access the
funds. The PPAR acknowledged that the problem of lack of resources for drainage would be
even worse at the end of the second sodic lands reclamation project. The PPAR also found
that few farmers knew that drain maintenance was the Irrigation Department’s responsibility,
so they were unlikely to pressure government for improvements. Also, almost 95% of survey
respondents did not identify effective drainage as essential to avoid sodicity (World Bank,
Operations Evaluation Department, 2004, p. 17).
On the farmers’ side, the village groups were expected to maintain the project’s impacts.
The PPAR found, however, that in most cases, the SICs and WUGs had been created too
quickly. Even though efforts were made to strengthen these organizations, less than 40% of
respondents in project villages knew of any WUG in their village. Focus groups showed that
the villagers had no ownership of the WUGs and that the few existing WUGs had limited
connections with other community institutions. Apparently, the farmers were not stimulated
to do their part or convinced about the importance of maintaining the link and field drains.
Focus group interviews found that many small farmers were already starting to replow the
land dedicated to drains to increase their cultivable area (World Bank, Operations Evaluation
Department, 2004, pp. 18-19).
With both the Irrigation Department’s main drains and the farmers’ link drains deteriorat-
ing, the sodicity of the land was likely to increase again. Also, the original improvement in
the quality of the soil was marginal in some areas. Most barren sodic soils have a pH above
10. Based on 5 to 6 years of soil data, the project’s Annual Soil Monitoring Study had found
that 60% of 216 monitored plots had minimal reduction in pH. They remained highly alka-
line with pH between 9.5 and 10, while a pH of 7 is neutral and a pH of 5.5 to 6.5 is satis-
factory for most crops (World Fertilizer Use Manual, n.d.). Twenty-five percent of plots
showed no improvement in pH, and the remaining 15% showed improvement initially, but the
quality started deteriorating in later years (World Bank, Operations Evaluation Department,
2004, p. 9, Note 11). The PPAR also demonstrates that the project had failed to create ade-
quate awareness among communities and government agencies about technical aspects of
managing sodicity (World Bank, Operations Evaluation Department, 2004, pp. 8-10).
The “unlikely” rating for project sustainability in the ex-post evaluation, 3 years after project
completion, indicates that the project’s ERR would collapse, probably to zero or below. It is
remarkable that despite this rating, the PPAR nevertheless finds the project outcome “moder-
ately satisfactory,” institutional development impacts “substantial,” and both bank and borrower
performance “satisfactory.” Regarding institutional development, the project had strengthened the implementing agencies but not the village organizations that would assure the continuity of
outcomes. Communities were put in a position of dependency and lacked the power and tech-
nical capacity to make decisions. The project failed to raise farmers’ awareness of the importance
of soil sodicity, and farmers never thought about the project as a long-term initiative in which
they should take the lead. The project also fell short in assuring equity in participation—the
more affluent members of the communities (from higher castes) dominated the SICs and
WUGs, getting, consequently, most of the agricultural inputs supplied by the project. Hence, the
project ended up reinforcing the status quo (World Bank, Operations Evaluation Department,
2004, pp. 10-13).
We find, in sum, that conclusions from both the ICR and the PPAR are positively biased
but in a curious way. In both cases, the report itself notes factors that undermine the positive
conclusions but then concludes positively regardless. The completion report reaches a highly
favorable ERR but qualifies it with illegitimate assumptions. The PPAR reaches positive con-
clusions about the project outcome and bank and borrower performance despite finding
impacts unsustainable. It produces evidence that indicates a highly unfavorable ERR, but it
does not recalculate the ERR. Data standards are quite high, but the data do not support the
conclusions. For the development community, the World Bank, and the Government of India,
the key question is whether the project turned out to be a cost-effective allocation of
resources. It appears that an objective reading of the data at project completion—and certainly
3 years later—would have indicated that it was not. The actual reports, however, shine positively
on the World Bank and on the implementing agency. It appears unlikely that the second sodic
lands project, initiated in 1998 with a $300 million budget, was in a position to learn the
lessons of the first.
It is notable that the flaws in the World Bank (as well as USAID) evaluations discussed
here are straightforward analytic errors. This casts doubt on the effective independence and/or
rigor of the Bank’s OED. Evaluators ought not to be able to get away with such shoddy prac-
tice, and they ought to know that they could not get away with it. The apparent prevalence of
these “errors” suggests that they are not mistakes but that they reflect the incentive environ-
ment governing evaluation.

M&E for Cost Effectiveness

Evaluation of development assistance should help to address the enormous management challenges involved in maximizing aid’s cost effectiveness at reducing poverty. The goal is to
bring a combination of information and motivation to bear on each program decision that is
likely to lead to a cost-effective choice. Of course, each situation is unique, but each situation
also bears resemblances to past development scenarios. The best basis for deciding what is
likely to be most cost effective in a given situation is a synthesis of what turned out to be more
and less cost effective in previous scenarios that resembled the present in relevant dimensions.
A program evaluation should serve at least three purposes. First, it should support program
managers (or stakeholders more broadly) in deciding what to do next; call this “local learn-
ing.” Second, it should estimate the program’s cost effectiveness and the reasons for it so
people designing and managing other programs can learn from this experience; call this
“community learning.” Third, it should provide a basis for holding managers accountable for
their choices in terms of their consequences for cost effectively reducing poverty. This
accountability function, in turn, can motivate managers to study and to use the information
that evaluations supply.

Now all three of these functions—local learning, learning by the development community,
and accountability—are best served by evaluations that estimate a program’s cost effective-
ness and explain its causes. The case for local learning is least apparent, because in this con-
text, an evaluation can be very helpful by identifying reasons for weaker or stronger
performance (and therefore, implicitly, directions for improvement) even without anchoring
the analysis in an estimate of cost effectiveness. If it can provide such an anchor, however, the
evaluation provides a stronger basis for interpreting its judgments and any recommendations.
For example, the ICR and PPAR for the first sodic lands project are both analytically strong
and may have provided useful lessons for managers of this project and other similar projects.
The value of their lessons is undermined, however, by their unwarranted positive ratings. If
both had included reliable ERRs, managers would have been in a better position to interpret
their other findings.
While in principle estimates of cost effectiveness are central to local learning, community
learning, and accountability, their value depends on their precision, validity, and consistency.
The ERR estimates in the evaluations discussed above were very weak. A program completion
evaluation typically needs to estimate impacts not only over the program’s lifetime but often
for 5 to 25 future years. Such estimates cannot be scientific measurements; they are inevitably
based on judgments. These judgments, however, are not merely a matter of taste. Estimating
future impacts is an objective problem subject to systematic investigation. Unfortunately, it is
a problem that the development community has not systematically addressed.
To address the technical challenges in estimating program impacts, to protect evaluations
from positive bias, to ensure that impact estimates are consistent with one another, and to
make evaluations more immediately helpful, we propose that evaluation for development
assistance should be professionalized. A professional association should establish conven-
tions and standards particularly for estimating program impacts (although also for other
aspects of evaluation) and ensure that its members apply these conventions and standards con-
sistently and correctly. Because technical and incentive problems in program evaluation are
broadly similar to those in accountancy, associations of accountants and their standards (e.g.,
the American Institute of Certified Public Accountants and Generally Accepted Accounting
Principles) could serve as a model (Clements, 2005, p. 31; Wenar, 2006, pp. 17-22).
Many of the lessons about designing and managing programs and about estimating their
impacts and cost effectiveness are likely to accrue on a sectoral basis (e.g., within sectors such
as agricultural extension, microfinance, and family planning). Moreover, an evaluator who is
steeped in the literature and experience of the sector is likely to be better at estimating and
explaining the cost effectiveness of a program in that sector and also at making recommen-
dations. We propose that an association should start with two or three sectors, develop the
appropriate principles and conventions, and train a corps of evaluators in their application.
Impact estimates should thus be rendered reliable and consistent initially within the selected
sectors and then gradually between sectors.1
The association would have a guidebook, an exam, a stamp, a standards committee, and an
online repository. The guidebook would explain the association’s approach to estimating impacts
and cost effectiveness and present its conventions and protocols. The exam would control entry
to the association based on adequate mastery of the guidebook. When a member stamped a com-
pleted evaluation with the association’s stamp, this would guarantee that the evaluation is con-
sistent with the approach and conventions of the guidebook. The standards committee would
initially review all evaluations by members of the association to ensure consistency with the
guidebook. A serious violation would be grounds for expulsion from the association, and this is
the rule that establishes the members’ independence. “Errors” such as those in evaluations for
the Uganda water project, the Kenya grain storage project, and the India sodic lands ICR would
constitute grounds for expulsion. All completed evaluations would be indexed and included in
the repository. Evaluator members would be responsible for ensuring the consistency of their
judgments (e.g., about future impacts of completed projects) with those of other similar evalua-
tions, and over time, these would be refined and codified in the guidebook.
Although we have only reviewed final evaluations of four projects, the problems with their
impact estimates are consistent with what one would expect in light of the incentives facing
evaluators and the present organization of aid evaluation. To professionalize the evaluation
function, however, would enhance the professionalism and integrity of the development com-
munity overall. To clarify the nature and limits of our proposal, it may be helpful to compare
it with other contemporary efforts in development evaluation.

The World Bank’s Results-Based M&E


Results-based M&E is a holistic approach to M&E beginning in the planning stage. As
represented in the World Bank’s Ten Steps to a Results-Based Monitoring and Evaluation
System: A Handbook for Development Practitioners (Kusek & Rist, 2004), it emphasizes
developing a consensus among management stakeholders to support the overall M&E
process, developing key performance indicators and baseline data, monitoring for results, and
incorporating evaluation findings in ongoing management. This approach is intended to apply
not only to traditional development programs and projects but also to sector-wide approaches
and to policy-level interventions.
Cost–benefit analysis requires judgments about the quantity of the flow of benefits attrib-
utable to an intervention compared to the situation one would expect if the intervention had
not taken place. These judgments become increasingly difficult to make as the intervention represents
a smaller factor within its system and as other factors have greater and less measurable influ-
ences on the flow of benefits. Thus, policy interventions, such as structural adjustment
programs, are notoriously hard to evaluate because of the many intervening factors and the
difficulty of constructing a counterfactual. In these cases, and for smaller high-level inter-
ventions (e.g., training senior civil servants), the results-based approach represents a sensible
compromise on formal cost–benefit analysis. Development programs are often weak in start-
ing off with appropriate indicators and baseline data; our proposed approach depends on
something like what Kusek and Rist propose for establishing a program's M&E system. In addition
to selecting indicators and physical targets, however, program plans should also include ex ante
estimates of ERRs or of other measures of cost effectiveness. Indeed, it is ironic that although
ERR analysis is a form of results-based evaluation in which the World Bank has been a leader
since the 1960s, Kusek and Rist portray results-based M&E as a recent innovation.
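To make the ERR calculation concrete, consider a minimal sketch in Python; the cost and benefit figures are purely hypothetical. The ERR is the discount rate at which the net present value (NPV) of an intervention's economic costs and benefits falls to zero.

def npv(rate, net_benefits):
    # Net present value of a stream of annual net benefits (year 0 first).
    return sum(b / (1 + rate) ** t for t, b in enumerate(net_benefits))

def err(net_benefits, lo=-0.99, hi=10.0, tol=1e-6):
    # Find the discount rate at which NPV = 0 by bisection; assumes a
    # conventional stream (costs up front, benefits later), so NPV falls
    # as the rate rises.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, net_benefits) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical project: $100 of costs in year 0, benefits over five years.
flows = [-100, 20, 30, 35, 35, 30]
print(f"ERR = {err(flows):.1%}")  # roughly 14% for this stream

The arithmetic itself is straightforward; the evaluator's real work lies in estimating the benefit stream attributable to the intervention, which is precisely where judgment and consistency are required.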
The disadvantage of results-based M&E is that it does not establish the worth of program
results. A program that reaches all its timid targets may be less cost effective than one that fails
to reach ambitious goals. Indeed, a results-based evaluation regime establishes incentives for
program planners to select targets that are easier to reach. A results-based regime also does not
yield consistent units for assessing similar programs, and for programs that happen to select
indicators with the same units, it does not support the consistent application of measurement
tools or interpretation of results.

The Development Assistance Committee's (DAC) Evaluation Criteria

The most widely adopted aid evaluation framework over the last decade has been the eval-
uation criteria of the DAC of the Organization for Economic Cooperation and Development.
Under the DAC criteria, a program is assessed in terms of its (a) relevance, (b) effectiveness,
(c) efficiency, (d) impact, and (e) sustainability (DAC, 1991, p. 5). The World Bank changes
“effectiveness” to “efficacy,” substitutes “institutional development impact” for “impact,” and
adds as sixth and seventh criteria “bank performance” and “borrower performance.”
Under the DAC criteria, “relevance” usually refers to the consistency of a project’s goals with
government and donor goals, and “effectiveness” refers to the achievement of objectives along
the lines of results-based M&E. “Efficiency” can be interpreted in terms of a thorough-going
analysis with comparable units such as ERR or else merely as achieving program objectives in
an efficient manner. The fourth and fifth criteria, “impact” and “sustainability,” are somewhat
self-explanatory.
Note that when “efficiency” is interpreted in terms of ERR analysis, insofar as program
impacts can be assessed in economic terms, it largely subsumes the criteria of impact and sus-
tainability. An ERR compares the quantity of program benefits to costs over the entire life of
the costs and benefits. Benefits are simply economic impacts, so ERR clearly covers the
fourth DAC criterion. It is possible to have a sustainable but small stream of benefits—a sus-
tainable trickle—but in this case, the criterion of sustainability is not appropriate for assess-
ing the program’s worth. It is also possible to have a great rush of benefits that may not be
sustainable but that greatly exceed costs; here too sustainability is a misleading criterion.
Where sustainability is an appropriate criterion—where substantial long-term impacts are
expected and “needed”—it is covered under ERR, and where it is inappropriate, it is better
for it to be replaced by ERR.
The DAC criteria may provide a more nuanced assessment than the single criterion of cost
effectiveness, but if efficiency is not assessed in terms of overall cost effectiveness, the DAC
criteria exhibit the same shortcomings as results-based M&E. We propose that the third DAC
criterion, “efficiency,” should be replaced by “cost effectiveness” (ERR or a close substitute)
and that the rating for this criterion should serve as a ceiling for the ratings of effectiveness,
impact, sustainability, and donor and borrower performance. This approach combines the
nuance of the DAC criteria with the rigor and consistency of M&E for cost effectiveness.
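A minimal sketch of this ceiling rule follows; the six-point rating scale and the particular ratings are illustrative assumptions rather than an established convention.

RATING_SCALE = [
    "highly unsatisfactory", "unsatisfactory", "moderately unsatisfactory",
    "moderately satisfactory", "satisfactory", "highly satisfactory",
]

def apply_cost_effectiveness_ceiling(ratings):
    # Cap every criterion's rating at the cost-effectiveness rating.
    ceiling = RATING_SCALE.index(ratings["cost effectiveness"])
    return {criterion: RATING_SCALE[min(RATING_SCALE.index(r), ceiling)]
            for criterion, r in ratings.items()}

ratings = {
    "relevance": "highly satisfactory",
    "effectiveness": "satisfactory",
    "cost effectiveness": "moderately satisfactory",  # replaces "efficiency"
    "impact": "satisfactory",
    "sustainability": "satisfactory",
}
print(apply_cost_effectiveness_ceiling(ratings))
# Every other criterion is capped at "moderately satisfactory" because the
# program's cost effectiveness is only moderate.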

Evaluations Based on Randomized Trials


A trend that is gaining strength in aid evaluation is the increasing use of evaluations that employ
randomized trials.2 Evaluations using these designs can estimate impacts with statistically
specifiable confidence because randomization permits measurement of both the program
effect and the counterfactual. Of course, randomized trials can only estimate impacts that
have already occurred, and most development programs are designed to produce impacts for
some years after the program budget is spent. In these cases, estimates of program impacts at
project completion must include an element of judgment. Also, it is often a separate task to
determine what aspects of the program’s design and implementation are responsible for the
measured impacts, and many evaluations with randomized trials fail to produce evidence in
units that permit comparisons of cost effectiveness with other kinds of programs.
Increasing evaluations with randomized trials and implementing M&E for cost effective-
ness should be viewed as complementary. It may be realistic to hope that aid evaluations with
randomized trials may increase from less than 1% of the total to perhaps 2% or 3% in com-
ing decades. This would increase information about what works in development, providing
valuable points of reference for impact estimates by members of the evaluation association
that this article proposes. Although evaluations with randomized trials produce a small
number of highly reliable data points, M&E for cost effectiveness reformulates them (as nec-
essary) in terms of cost effectiveness, extends their lessons across the development commu-
nity, and applies them to raising general standards of accountability.

Analytics of M&E for Cost Effectiveness

The essential features of M&E for cost effectiveness are that it yields a reliable and con-
sistent estimate of each intervention’s cost effectiveness and an explanation for how these
results were achieved. Explanations of results should be up to the standard of a thorough logic
model, that is, they should be the most plausible explanations given the evidence. These features
allow it to strengthen learning and accountability locally and across the community. The key steps in the evaluation are
estimating total impacts attributable to the intervention over the life of these impacts and
translating these impact estimates into units of cost effectiveness. Note that total impacts are
not only intended impacts—they include negative and positive side effects—and that the eval-
uator needs to identify any serious ethical lapses in project implementation even if impacts
are strong.
Although it is important for impact estimates to be both consistent and accurate, the pro-
posed evaluation association would focus on consistency first. Consider the evaluations dis-
cussed in this article. The question is how to estimate total impacts from recovered sodic
lands, a rehabilitated water and sanitation system, an innovative granary and advice on
postharvest treatment of maize, and (supposedly) rehabilitated agricultural cooperative insti-
tutions. In each case, estimates rely on information the program or project has developed over
its lifetime—this normally is not under the evaluator’s control. Rather, the association would
ensure that evaluators apply consistent conventions and algorithms in estimating impacts
based on information that is available or that the evaluator could establish. For example, the
evaluator would have criteria for assessing the strength of village institutions or of a national
water company and would adjust impact estimates accordingly. In due course, the evaluator
would know of evaluations of perhaps a dozen similar programs with diverse outcomes.
Comparisons between the present program, in its various dimensions, and these similar
programs would contribute to impact estimates. Where information is weaker, the evaluator
might estimate that impacts fall within a broader range (e.g., ERR between 5% and 15%
depending on x, y, and z). The evaluator would aim to make an impact estimate that is con-
sistent with those for similar programs and reasonable given the evidence. Accuracy would
improve over time as better baselines and monitoring systems are established and as experi-
ence accumulates.
To translate impact estimates into units of cost effectiveness, one needs to assign a
value to each unit. For example, when a poor person’s income is increased by $1, standard
ERR analysis assigns it a value of $1, but if the objective is to favor income gains to the
poor, one might assign it a value of $2. In the development community, health interven-
tions are increasingly assessed in terms of gains in disability adjusted life years (DALYs;
see World Health Organization, n.d.). To translate a DALY score into a cost-effectiveness
score, one needs a value for each DALY. Because a threshold of $100 per DALY is often
used to assess health programs in low-income countries, one might assign one DALY an
impact value of $100. One might assign a year of good quality primary education a value
that reflects both the economic contribution to human capital and the value to the student
of being better educated (Clements, 1995). It is essential that each kind (unit) of impact is
assigned the same value in each evaluation; one role for the proposed association would
be to establish impact values and to ensure that they are applied consistently. (On the cost
side as well, any nonmonetary costs need to be incorporated consistently.) As the associ-
ation would proceed on a sectoral basis, impact values for a given sector could be assigned
when the association enters it.
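As a hypothetical worked example of this translation, the sketch below assigns each kind of impact the fixed per-unit values discussed above ($1 per dollar of income gained, $100 per DALY averted) and computes a simple benefit-cost ratio; the project quantities and cost are invented for illustration.

IMPACT_VALUES = {
    "income_gain_usd": 1.0,   # each $1 of beneficiary income valued at $1
    "daly_averted": 100.0,    # each DALY averted valued at $100
}

def total_impact_value(impacts):
    # Convert impact quantities into a single dollar-valued total.
    return sum(IMPACT_VALUES[kind] * qty for kind, qty in impacts.items())

project_cost = 250_000  # hypothetical total economic cost
impacts = {"income_gain_usd": 180_000, "daly_averted": 1_200}

benefits = total_impact_value(impacts)  # 180,000 + 120,000 = 300,000
print(f"benefit-cost ratio = {benefits / project_cost:.2f}")  # 1.20

What matters for consistency is less the particular values chosen than that the same per-unit values are applied in every evaluation, which is the role proposed here for the association.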

Conclusion

Although cost–benefit analysis has been one of the main frameworks for monitoring and
evaluating development projects, organizational arrangements for project evaluation (particu-
larly at project completion) have left the analysis vulnerable to positive bias and inconsis-
tency. All the evaluations of World Bank and USAID projects that we have reviewed have
demonstrated significant positive bias, and there has been no institutional mechanism to make
impact estimates consistent. The development community should try to reduce world poverty
as much as it can with available resources. The evidence suggests, however, that learning and
accountability with respect to this goal, at least in the World Bank and USAID, have been
endemically weak. Experience in the development community does not translate into lessons, or
not nearly as well as it could, and managers persist with weak programs even when
prospects for success are dim. The other side of this poor practice is an important opportu-
nity: The development community could build a much stronger sensibility for cost-effective
choices into its organizational cultures, and this would substantially enhance the profession-
alism of the development enterprise.
As with any management innovation, M&E for cost effectiveness should be assessed in
terms of its likely costs and benefits. To have independent and consistent evaluations by
trained evaluators at the completion of most programs and projects, and to improve baselines
and monitoring systems, could raise the cost of the average project by perhaps two or three
percentage points. This would be on top of resources presently devoted to M&E, which our
analysis indicates often are not well spent. This could be expected to lead to the more rapid
identification of successful approaches and of the reasons for their success, and it would
alter the incentive environments facing project managers. The managers of the UP sodic
lands projects, for example, might have postponed starting the second project until the first
had been evaluated, and then, they might either have worked harder to build sustainable vil-
lage institutions to manage drainage or have reconsidered the project design altogether.
World Bank supervisors of the Uganda water and sanitation project might have insisted that
the government adhere to its commitment to raise water prices along with inflation so the
water company’s employees could have been paid decent wages. Managers of USAID’s
Uganda cooperatives project might have halted the project when serious corruption was dis-
covered. To get an idea of relative proportions, it seems plausible to expect that the proposed
reforms could eliminate at least 30% of poorly designed projects (because of better com-
munity learning) and lead to improvements in the management of the average project
(because of better learning and better accountability) that would increase rates of return by
5%. More thoroughgoing independent evaluations would also be more likely to identify the
few cases of serious corruption and to instigate appropriate legal action. Altogether, such
gains would repay the investment in M&E several-fold.
We have noted that the selection of projects considered in this article is not representative
of the development community overall. The extent of inconsistency and bias in evaluations of
development programs and projects needs to be studied more widely. It should be noted, how-
ever, that the project-level evidence we have considered is consistent with our structural
argument. Given the incentive structure and the lack of institutional support for consistency
in the evaluation of donor-funded development projects, it would be surprising if positive bias
and inconsistency were not widespread.

Notes
1. The Active Learning Network for Accountability and Performance in Humanitarian Action is perhaps the closest
approximation to what we propose (Active Learning Network for Accountability and Performance in Humanitarian
Action, n.d.). Sources for evaluation principles include, inter alia, the norms and standards for evaluation in the United
Nations system (United Nations Evaluation Group, 2005a, 2005b) and the Program Evaluation Standards (American
Evaluation Association, n.d.).
2. The Poverty Action Lab at the Massachusetts Institute of Technology (Abdul Latif Jameel Poverty Action
Lab, n.d.) has demonstrated that evaluations with randomized trials can be applied more widely to development prob-
lems than many had expected, and the Evaluation Gap Working Group is promoting an International Council to cat-
alyze impact evaluations based on this methodology (Center for Global Development, n.d.).

References
Abdul Latif Jameel Poverty Action Lab. (n.d.). Abdul Latif Jameel Poverty Action Lab. Retrieved April 13, 2008,
from http://www.povertyactionlab.com/
Active Learning Network for Accountability and Performance in Humanitarian Action. (n.d.). ALNAP: Active
Learning Network for Accountability and Performance in Humanitarian Action. Retrieved April 13, 2008, from
http://www.odi.org.uk/ALNAP/
American Evaluation Association. (n.d.). The Program Evaluation Standards. Retrieved April 13, 2008, from
http://www.eval.org/evaluationdocuments/progeval.html
Beijuka, J. K. (1993). Financial and operational review of Uganda Cooperative Central Union Limited: Draft final
report. Kampala, Uganda: U.S. Agency for International Development.
Bollen, K., Paxton, P., & Morishima, R. (2005). Assessing international evaluations: An example from USAID’s
Democracy and Governance Program. American Journal of Evaluation, 26(2), 189-203.
Boone, P. (1996). Politics and the effectiveness of foreign aid. European Economic Review, 40(2), 289-329.
Bourguignon, F., & Sundberg, M. (2007). Aid effectiveness—Opening the black box. The World Bank. Retrieved
March 6, 2007, from http://siteresources.worldbank.org/DEC/Resources/Aid-Effectiveness-MS-FB.pdf
Burnside, C., & Dollar, D. (2004). Aid, policies, and growth: Revisiting the evidence (World Bank Policy Research
Paper 3251). Washington, DC: The World Bank.
Cassen, R., & Associates. (1986). Does aid work? (1st ed.). Oxford, UK: Oxford University Press.
Center for Global Development. (n.d.). When will we ever learn? Closing the evaluation gap. Retrieved April 13,
2008, from http://www.cgdev.org/section/initiatives/_active/evalgap
Chianca, T. (2007). International aid evaluation: An analysis and policy proposals. Unpublished PhD dissertation,
Western Michigan University, Kalamazoo.
Clements, P. (1993). An approach to poverty alleviation for large international development agencies. World
Development, 21(10), 1633-1646.
Clements, P. (1995). A poverty oriented cost-benefit approach to the analysis of development projects. World
Development, 23(4), 577-592.
Clements, P. (1996). Development as if impact mattered: A comparative organizational analysis of USAID, the World
Bank and CARE based on case studies of projects in Africa. Unpublished PhD dissertation, Princeton University,
Princeton, NJ.
Clements, P. (1999). Informational standards in development agency management. World Development, 27(8), 1359-1381.
Clements, P. (2005). Monitoring and evaluation for cost-effectiveness in development management. Journal of
Multidisciplinary Evaluation, 1(2), 11-30.
Development Assistance Committee. (1991). Principles for evaluation of development assistance (Document
Reference Number OECD/GD(91)208). Retrieved December 1, 2005, from http://www.oecd.org/document/22/
0,2340,en_2649_34435_2086550_1_1_1_1,00.html
Easterly, W. (2003). Can foreign aid buy growth? The Journal of Economic Perspectives, 17(3), 23-48.
Horta, K. (2006). The World Bank’s decade for Africa: A new dawn for development aid? Yale Journal of International
Affairs, 1(2), 4-23.
Kariungi, F. T. (1989). National economic savings through adoption of improved grain storage structures and storage
management. Nairobi, Kenya: U.S. Agency for International Development.
Kusek, J. Z., & Rist, R. C. (2004). Ten steps to a results-based monitoring and evaluation system. Washington, DC:
The World Bank.
Martens, B. (2002a). Introduction. In B. Martens, U. Mummert, P. Murrell, & P. Seabright (Eds.), The institutional
economics of foreign aid (pp. 1-33). Cambridge, UK: Cambridge University Press.
Martens, B. (2002b). The role of evaluation in foreign aid programmes. In B. Martens, U. Mummert, P. Murrell, &
P. Seabright (Eds.), The institutional economics of foreign aid (pp. 154-177). Cambridge, UK: Cambridge
University Press.
Martens, B., Mummert, U., Murrell, P., & Seabright, P. (2002). The institutional economics of foreign aid. Cambridge,
UK: Cambridge University Press.
Michaelowa, K., & Borrmann, A. (2006). Evaluation bias and incentive structures in bi- and multilateral aid agencies.
Review of Development Economics, 10(2), 313-329.
Mosley, P., Hudson, J., & Horrell, S. (1987). Aid, the public sector and the market in less developed countries. The
Economic Journal, 97(387), 616-641.
Nelson, P. J. (1995). The World Bank and non-governmental organizations: The limits of apolitical development.
New York: St. Martin’s Press.
Pritchett, L. (2002). It pays to be ignorant: A simple political economy of rigorous program evaluation. Policy
Reform, 5(4), 251-269.
Rodrik, D. (1999). The new global economy and developing countries: Making openness work (Policy Essay No. 24).
Washington, DC: Overseas Development Council.
Roof, J., Wiebe, K., & Trail, T. (1994). Final evaluation report: Cooperative Agriculture and Agribusiness Support
Project (617–0111). Kampala, Uganda: U.S. Agency for International Development.
Schaefer, B. D. (2002). The millennium challenge account: An opportunity to advance development (Heritage
Lecture #753). Retrieved December 1, 2007, from www.heritage.org/research/tradeandforeignaid/hl753.cfm
Svensson, J. (2006). The institutional economics of foreign aid. Swedish Economic Policy Review, 13, 115-137.
Tendler, J. (1975). Inside foreign aid. Baltimore: Johns Hopkins University Press.
United Nations Evaluation Group. (2005a). Norms for evaluation in the UN system. Retrieved December 1, 2005,
from: http://www.uneval.org/docs/ACFFC9F.pdf
United Nations Evaluation Group. (2005b). Standards for evaluation in the UN system. Retrieved December 1, 2005,
from http://www.uneval.org/docs/ACFFCA1.pdf
Wenar, L. (2006). Accountability in international development aid. Ethics & International Affairs, 20(1), 1-23.
World Bank. (1992). Effective implementation: Key to development impact (The Wapenhans Report). Washington,
DC: Author.
World Bank. (1998). India—Second Uttar Pradesh Sodic Lands Reclamation Project (project information document).
Washington, DC: Author.
World Bank. (2001a). Implementation completion report, India, Uttar Pradesh Sodic Lands Reclamation Project
(Report No. 22886). Washington, DC: The World Bank, Rural Development Sector Unit, South Asia Region.
World Bank. (2001b). India’s Uttar Pradesh Sodic Lands Reclamation Project I. Land resource management.
Washington, DC: Author. Retrieved December 1, 2005, from http://lnweb18.worldbank.org/ESSD/ardext.nsf/
17ByDocName/IndiasUttarPradeshSodicLandsReclamationProject1
World Bank. (2007). World Bank assistance to agriculture in Sub-Saharan Africa: An IEG review. Independent
Evaluation Group. Washington, DC: Author.
World Bank, Operations Evaluation Department. (2004). Project performance assessment report, India, Uttar
Pradesh Sodic Lands Reclamation Project (Credit 2510) (Report No. 29124). Washington, DC: The World Bank.
World Health Organization. (n.d.). Disability adjusted life years (DALY). Retrieved April 13, 2008, from
http://www.who.int/healthinfo/boddaly/en/index.html
World Fertilizer Use Manual. (n.d.). Retrieved October 1, 2006, from http://www.fertilizer.org/ifa/publicat/html/
pubman/introd5.htm
