
Shaky Methods, Shaky Motives: A Critique of the National Council on Teacher Quality's Review of Teacher Preparation Programs

Journal of Teacher Education
2014, Vol. 65(1) 63-77
© 2013 American Association of Colleges for Teacher Education
Reprints and permissions: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0022487113503872
jte.sagepub.com

Edward J. Fuller1

1Penn State University, University Park, PA, USA

Corresponding Author:
Edward J. Fuller, Penn State University, 204D Rackley Bldg., University Park, PA 16802, USA.
Email: ejf20@psu.edu

Abstract
The National Council on Teacher Quality's (NCTQ) recent review of university-based teacher preparation programs concluded that the vast majority of such programs were inadequately preparing the nation's teachers. The study, however, has a number of serious flaws, including a narrow focus on inputs, lack of a strong research base, missing standards, omitted research, incorrect application of research findings, poor methodology, exclusion of alternative certification programs, failure to conduct member checks, and failure to use existing evidence to validate the report's rankings. All of these issues render the NCTQ report less than useful in efforts to understand and improve teacher preparation programs in the United States. The article also suggests alternative pathways NCTQ could have undertaken to work with programs to actually improve teacher preparation. The article concludes by noting that the shaky methods used by NCTQ suggest shaky motives, such that the true motives of NCTQ for producing the report must be questioned.
Keywords
preservice education, educational policy, education reform

Introduction
Recent headlines and leading remarks about U.S. teacher preparation programs proclaim "Teacher prep programs get failing marks" (Sanchez, 2013), "University programs that train U.S. teachers get mediocre marks in first-ever ratings" (Layton, 2013), and "The nation's teacher-training programs do not adequately prepare would-be educators for the classroom, even as they produce almost triple the number of graduates needed" (Elliot, 2013).
As readers of the Journal of Teacher Education are likely aware, these remarks stem from the recently released Teacher Prep Review by the National Council on Teacher Quality (NCTQ, 2013b). Partnering with U.S. News & World Report, NCTQ released its evaluation of university-based teacher education programs in the United States based on 18 standards developed by NCTQ. The study is the latest entry in a long history of critiques of U.S. teacher preparation programs (Zeichner & Liston, 1990).
Critics of traditional teacher preparation have used the report as evidence that teacher preparation in the United States is broken and that we need to fix the system, either by radically changing traditional university-based programs or by abandoning them in favor of alternative programs. For example, Arthur Levine (2013) wrote,

The NCTQ described a field in disarray with low admission standards, a crazy quilt of varying and inconsistent programs, and disagreement on issues as basic as how to prepare teachers or what skills and knowledge they need to be effective. The report found few excellent teacher-education programs, and many more that were failing. Most were rated as mediocre or poor.

While the NCTQ critique is not substantially different from previous calls for reform, the critique comes at a time of increased belief that traditional preparation programs and public schools have failed and that we need to "end the effective monopoly that education schools have on teacher training. Policymakers must foster a robust marketplace of providers from which schools and school districts can choose candidates" (Kamras & Rotherham, 2007).
Not surprisingly, even before the release of the report, NCTQ's effort raised a number of concerns among teacher preparation programs, such that most programs refused to participate in the effort. Indeed, only 10% of the more than 1,100 programs identified by NCTQ fully participated (American
Association of Colleges of Teacher Education [AACTE], 2013). Moreover, NCTQ's conclusions generated myriad critiques from a wide variety of individuals and organizations around the country, many of which are posted on the AACTE website.
While many of these critiques raised excellent points, the
majority were subject specific (e.g., reading), state specific,
or university specific.1 The critiques generally did not, then,
provide a complete critique of the NCTQ review. More
importantly, few commentators provided an in-depth critique of either the rationale for the approach taken by NCTQ
or the methodology used by NCTQ. Finally, none of the critiques examined the relationship between star rankings and
outcomes for particular sets of preparation programs.
This commentary addresses some of the shortcomings in
the other reviews in four ways. First, this critique covers the
breadth of issues from a more holistic perspective rather than
from the perspective of a particular subject area organization, state, or university. Second, this commentary provides
an in-depth review of the NCTQ standards, rationale for the
NCTQ methods, and the methods themselves. Third, this
commentary uses data from Texas to examine some of the
key concerns with the NCTQ review. Finally, this review
uses data from Texas and Washington to examine the relationship between star rankings and preparation program outcomes, including certification test passing rates and graduate
value-added estimates for reading and mathematics.

Purpose
The purpose of this commentary is to examine the effort by
NCTQ to evaluate, judge, and rank university-based teacher
preparation programs using a one- to four-star system. This
commentary is important for those in the field of teacher
preparation for two primary reasons. First, the NCTQ review will be repeated in future years, and those seeking to attack and dismantle university-based preparation programs will use the reports as evidence of the poor quality of such programs, as shown above. Those in colleges of education, particularly in teacher preparation programs, need to be acutely aware of the report's details and the problems with the report so that they can engage effectively with others in a thoughtful and educated manner. Indeed, I contend that being
thoughtful and educated manner. Indeed, I contend being
knowledgeable about the political happenings in our field is
part of the job duties of a professor. In particular, such
knowledge is necessary to thoughtfully discuss the issue
with the media and policymakers at all levels, including
those at your own university. This is an important role for
faculty that has traditionally been largely ignored but is
increasingly important given the unrelenting attacks on education in the mainstream media. Finally, despite the flaws of
the NCTQ report, it does accurately document the paucity of
research examining the association between what happens
in preparation programs and outcomes such as teacher
placement, teacher retention, teacher sense of self-efficacy, licensure, certification scores, the quality of graduates' teaching, and K-12 student outcomes.
My comments are separated into six major sections: (a) methods, (b) personal background and perspective, (c) rationale for study design, (d) problems with NCTQ methodology, (e) suggestions, and (f) conclusions. However, before delving into these sections, I briefly describe the methods for this commentary and discuss my own experience in
teaching and teacher preparation.

Method
This commentary is based on my own analysis of the NCTQ
report as well as a number of other critiques of the report. My
own analysis was initially posted as a blog the day before the
report was released and was based on the many problems
with the past NCTQ reports. Subsequent to the release of the
study, I expanded my critique based on the details of the
report. Finally, for this commentary, I read a number of critiques of the NCTQ report from numerous organizations and
scholars in the field.
While this review encompasses the major critiques made
by others, it also includes my own unique critiques from my
experiences in the field as a researcher. Thus, most of my
unique contribution appears in the critique of the NCTQ
methodology and in the critique concerning the exclusion of
alternative preparation programs. My qualifications for making such critiques are presented below.

Personal Background and Perspective


I graduated from a traditional undergraduate, university-based preparation program at The University of Texas at Austin. I subsequently taught for 3.5 years and then completed a master's degree in educational administration with the intention of becoming a principal. However, I chose to pursue my PhD at The University of Texas at Austin in education policy. After working as an evaluation and research specialist at the University of Texas at Austin, I was hired as the Director of Research at the State Board for Educator Certification (SBEC), the state agency responsible for the certification of teachers and accreditation of teacher preparation programs. As Director of Research, I analyzed data on some outcomes of teacher preparation programs, including certification scores, production of graduates, placement rates, and retention rates. Subsequent to leaving SBEC, I worked at the University of Texas at Austin and evaluated a number of preparation programs around the state. I was also hired by the Center for Research, Evaluation and Advancement of Teacher Education (CREATE) to provide such data to a consortium of teacher preparation programs across the state. As a proponent of providing greater information on preparation programs to prospective students of teacher preparation programs, I assisted Texas legislators in crafting a bill that created a consumer report card for all
teacher preparation programs in Texas, including alternative certification programs (ACPs). Thus, I have experience in working with teacher preparation program data and creating report cards on such programs.
Finally, I am a strong proponent of thoughtfully collecting data and carefully analyzing such data as a means to provide useful feedback to preparation program personnel,
make available information to prospective preparation program students, and hold preparation programs accountable.
Yet, I cannot emphasize enough how careful such efforts
need to be because collecting and appropriately analyzing
such data is terribly complex and requires highly skilled
researchers with deep knowledge of preparation programs.
My commitment to these ideals is evidenced by my aforementioned activities in Texas.

Critiques of the NCTQ Report


This section of this commentary focuses on two overarching
critiques of the NCTQ report: Rationale for Study Design
and Report Methodology. Within each of these sections, I
address multiple problems with the report.

Critique 1: Rationale for Study Design


This section focuses on the rationale for the study design. My
critique is focused on four issues in particular: the study's
focus on inputs rather than outcomes, the lack of a research
foundation for the standards chosen by NCTQ, omitted
research and standards, and incorrect application of research
findings in developing the study.
Focus on inputs. As many commentators have pointed out,
one major criticism of the report is the almost unilateral
focus on inputs and the lack of any serious consideration of
outcomes. While inputs are certainly important to preparation programs and quality preparation, the outcomes are
what truly differentiate quality preparation from poor preparation. Outcomes that could have been examined include
teacher placement rates, teacher longevity in the profession,
actual behaviors of teachers in the classroom, and the effect
of teachers on various student outcomes.
NCTQ claims the barriers to assessing outcomes are simply too large to overcome without the investment of a substantial amount of money. In fact, I would agree with NCTQ
on this point: assessing the outcomes of teacher preparation
programs would be quite costly and difficult. For example,
analyzing outcomes such as placement, retention, and impact
on student test scores would require states to collect and
make available detailed data in a number of areas such as
teacher characteristics and prior experiences; teacher production, placement, and retention; the link between test
scores and students; the link between students, their teachers,
and the preparation programs of the teachers; a wide variety
of school characteristics; and the characteristics of the

principal. Most states do not collect a sufficient amount of these data, and some states that do collect such data either do not know how to use them or simply do not want to use them.
Furthermore, even if NCTQ had obtained access to the
data, Wineburg (2006) contends that using such data would
be problematic. Specifically, she argues:
Profound methodological problems occur when linking
individual teacher actions with subsequent pupil performance,
including substantial intervening variables, questions about
appropriate measures of student learning, issues regarding the
lack of test standardization between school and districts, and
problems in the mechanics of tracking candidates and accessing
data. (p. 52)

Her cautions seem warranted given recent efforts to examine the effectiveness of preparation programs in increasing student test scores. For example, in their exploratory effort to examine the effectiveness of graduates from Florida teacher preparation programs in increasing student test scores, Mihaly, McCaffrey, Sass, and Lockwood (2012) concluded that making such comparisons is quite difficult for numerous reasons, one of which is the need to control for unobserved characteristics of the schools in which graduates teach. While variables about observed characteristics such as student demographics, school size, and overall test performance are typically available to researchers, unobserved characteristics such as community support, school climate, quality of mentoring provided, and teacher collegiality are not. To control for the unobserved characteristics, researchers use school fixed effects, a method that compares a teacher's student growth scores with those of other teachers in the same school rather than with all teachers across the state. For fixed effects to work properly in the case of evaluating preparation programs, teachers from the various programs must be employed in the same schools. Mihaly et al. (2012) found this not to be the case for all of the programs. Furthermore, they found that schools that did employ teachers from multiple programs were substantially different in terms of student demographics and other contextual factors; thus, conclusions about the efficacy of preparation programs would be limited to those schools. Ultimately, the authors concluded that any efforts to compare, judge, or rank teacher preparation programs based on graduates' effectiveness in increasing test scores will be inaccurate. Thus, even if NCTQ included outcome measures in the rankings, the outcome measures could very well be inaccurate.2
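To make the fixed-effects approach concrete, the sketch below shows one way such a comparison might be estimated. This is an illustrative sketch only, not the specification used by Mihaly et al. (2012); the data file and column names (growth, program, school) are hypothetical.

```python
# Hypothetical sketch of a school fixed-effects comparison of
# preparation programs. Assumes a teacher-level file with columns:
# growth (teacher's mean student growth score), program (preparation
# program attended), and school (campus identifier).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("teacher_growth.csv")  # hypothetical data file

# C(school) absorbs school fixed effects, so each program coefficient
# reflects comparisons among teachers within the same school.
model = smf.ols("growth ~ C(program) + C(school)", data=df).fit()
print(model.params.filter(like="program"))
```

Note that if a program's graduates never share a school with graduates of other programs, the program's coefficient cannot be separated from the school effects, which is precisely the identification problem Mihaly et al. (2012) document.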
Given the extreme difficulty in assessing outcomes described above, excluding outcome measures was a reasonable decision by NCTQ. What is terribly troubling, however, is that NCTQ decided to make the quantum leap from inputs to quality preparation. Essentially, NCTQ claimed that it can assess the quality of a preparation program's teachers based only on inputs that are measured almost solely by a review of
syllabi of some, but not all, courses taken by students in a program.3 This would, in fact, be reasonable only if a large body of research had fully established causal linkages between the inputs included in the study and important outcomes such as those described above. Yet, this is simply not the case: there is not enough research evidence to make such a leap (Coggshall, Bivona, & Reschly, 2012), particularly when considering the scant research base linking outcomes for core subject area teachers from different preparation programs across different school levels and contexts. This issue is addressed in the next section.
An alternative pathway available to NCTQ, one that seems not to be an option for many of those pushing the current brand of education reforms, would have been to invest in high-quality research linking inputs and processes with important outcomes before evaluating and judging programs. More specifically, NCTQ could have chosen to limit the standards to only those with a robust research base while concomitantly investing in such research. NCTQ and the funders behind the NCTQ report could have invested in a number of studies focusing on one or two particular standards and then incorporated the findings into the standards. While some might argue that taking this pathway would extend the time frame for the improvement of teacher preparation programs to unacceptable lengths, others would certainly argue that getting the standards right is more important than setting them quickly. Furthermore, preparation programs are unlikely to be able to initiate and fully implement changes in a short time period anyway. Perhaps focusing on a small number of standards in the initial year and building on those standards would actually increase the probability of preparation programs adopting the standards.
Standards lack a solid research base. The second major critique of the NCTQ effort is the failure to ground the standards on a solid research base. This is not entirely the fault of
NCTQ as the research base connecting inputs to outcomes is
simply not robust enough to develop a clear set of standards
upon which programs could be ranked or held accountable.
Some may claim this is the fault of researchers, but lack of
access to data, the incredible complexity in conducting such
studies, and inadequate funding to support such studies seriously impede the development of a solid research base.
Given the paucity of research in this area, even NCTQ
admits their evaluation standards are not based on an extensive literature base. Indeed, NCTQ states,
[Our] standards were developed over five years of study and are
the result of contributions made by leading thinkers and
practitioners from not just all over the nation, but also all over
the world. To the extent that we can, we rely on research to
guide our standards. However, the field of teacher education is
not well-studied.

Note that the word "researcher" is not included in this description. NCTQ relied on thinkers and practitioners, but not researchers. While thinkers and practitioners can undoubtedly provide useful insight, researchers are critical to such standard setting. In fact, many beliefs based on common sense turn out to be incorrect after research examines an issue.
In the full report,4 NCTQ provides a difficult-to-interpret graph about the sources of support for the various standards. The most striking revelation of the graph is that high-quality research was only a very small source for the development and adoption of the standards.
Even the "research consensus" portion of the graph, however, is quite misleading, as will be explained below. To their credit, NCTQ does provide additional documentation for each standard by providing the number of research studies supporting each standard in separate documents located on their website at http://www.nctq.org/teacherPrep/ourApproach/standards/. For each standard, NCTQ (2013a) classified the research in two stages: "first considering design strength relative to several variables common to research designs, and second, considering whether student effects (as measured by external, standardized assessments) were considered" (p. 2). More detailed descriptions of stronger and weaker designs as defined by NCTQ are included in the appendix.
Using the tables provided by NCTQ for each standard, I created Table 1, which includes the number and percentage of studies for each standard within the four possible categories created by NCTQ. As shown in Table 1, only 9 of the 18 standards (just 50%) rely on more than one study classified as having a strong design and a focus on student test scores. Astonishingly, 7 of the 18 standards did not have a single study classified as having a strong design and a focus on student test scores. Only three standards (selection criteria, elementary mathematics, and high school content) had five or more such studies. Thus, I would argue only three standards have enough studies to create some sort of consensus that a particular standard is associated with positive student outcomes.

Table 1. Number and Percentage of Studies Per Standard by Strength of Methods and Examination of Student Outcomes. Cells show the number of studies, with row percentages in parentheses.

Standard | Strong design, outcomes | Strong design, no outcomes | Weak design, outcomes | Weak design, no outcomes | Total studies
Selection criteria | 6 (46.2) | 6 (46.2) | 0 (0.0) | 1 (7.7) | 13
Early reading | 2 (9.5) | 1 (4.8) | 1 (4.8) | 17 (81.0) | 21
English language learners | 0 (0.0) | 1 (33.3) | 0 (0.0) | 2 (66.7) | 3
Struggling readers | 2 (16.7) | 0 (0.0) | 1 (8.3) | 9 (75.0) | 12
CC elementary mathematics | 5 (14.3) | 6 (17.1) | 0 (0.0) | 24 (68.6) | 35
CC elementary content | 2 (13.3) | 2 (13.3) | 0 (0.0) | 11 (73.3) | 15
CC middle school content | 3 (33.3) | 2 (22.2) | 0 (0.0) | 4 (44.4) | 9
CC high school content | 5 (35.7) | 2 (14.3) | 0 (0.0) | 7 (50.0) | 14
Special education | 0 (0.0) | 1 (16.7) | 0 (0.0) | 5 (83.3) | 6
Classroom management | 2 (9.1) | 2 (9.1) | 0 (0.0) | 18 (81.8) | 22
Assessment and data | 0 (0.0) | 2 (7.4) | 8 (29.6) | 17 (63.0) | 27
Equity | 2 (5.3) | 1 (2.6) | 0 (0.0) | 35 (92.1) | 38
Student Teaching 1 | 0 (0.0) | 1 (5.6) | 0 (0.0) | 17 (94.4) | 18
Student Teaching 2 | 1 (6.7) | 0 (0.0) | 0 (0.0) | 14 (93.3) | 15
Secondary methods | 1 (10.0) | 0 (0.0) | 0 (0.0) | 9 (90.0) | 10
Instruction design for special education | 0 (0.0) | 1 (6.7) | 0 (0.0) | 14 (93.3) | 15
Outcomes | 0 (NA) | 0 (NA) | 0 (NA) | 0 (NA) | 0
Evidence of effectiveness | 0 (NA) | 0 (NA) | 0 (NA) | 0 (NA) | 0
Total | 31 (11.4) | 28 (10.3) | 10 (3.7) | 204 (74.7) | 273

Note. CC = Common Core; NA = not applicable.
Even this is misleading in two ways. First, NCTQ does not provide any connection between the listed research studies and the individual indicators within each standard, the core subject areas included in the study (elementary reading and mathematics, English language arts, mathematics, science, and social studies), or the school levels addressed (elementary schools, middle schools, and high schools). Thus, while a standard may have a few supportive research studies, we do not know how well research supports the actual indicators used by NCTQ or whether the research supports the use of those indicators across the various subject areas and school levels. For example, while the research provided by NCTQ on secondary content provides some limited evidence of the importance of subject matter knowledge in improving student achievement, NCTQ uses the evidence to adopt an indicator that measures whether a graduate has at least 30 hr of content courses or a major in the field. The research cited by NCTQ, however, does not support the adoption of this indicator in English language arts or social studies.


More disturbingly, NCTQ cites a study by Goldhaber (2007) as being supportive of the NCTQ high school content standard even though the study examined only elementary teachers. Thus, NCTQ simply generalized the findings to other grade levels despite no evidence that such a generalization was warranted.
Second, NCTQ does not provide a table that lists the findings of the studies. A more transparent approach would be to provide documentation of which findings were statistically significant and which were not and, if statistically significant, the direction of the association (positive or negative). This would shed some light on the conflicting evidence of the studies cited. For example, the report cites a study by Monk (1994) finding that subject-specific pedagogy courses influence student outcomes as evidence for examining whether programs offer subject-specific pedagogy courses. Yet, elsewhere in the document, NCTQ completely ignores Monk's finding of diminishing returns to teacher effectiveness after 5 secondary mathematics courses by recommending at least 10 courses or a major in mathematics. A similar example of selective research reporting exists with respect to the student teaching standard. NCTQ cites Boyd, Grossman, Lankford, Loeb, and Wyckoff (2009) as support for the adoption of an indicator requiring that the student teacher be provided written feedback at least 5 times. Yet, the very same study found that the existence of a capstone project has the same impact. Why did NCTQ select one finding to include and not the other?
And why does the methodology section of the report indicate that information on capstone projects was collected, yet the existence of a capstone project was not included in the study? An even more startling example is NCTQ citing a Harris and Sass (2011) study as providing strong support for requiring content instruction at all three schooling levels (elementary schools, middle schools, and high schools) despite the fact that Harris and Sass state, "There is no evidence that teachers' pre-service (undergraduate) training or college entrance exam scores are related to productivity" (p. 798). This not only directly contradicts the standards that were supposedly supported by the authors' study, but directly contradicts the entire thesis of the NCTQ review! Ultimately, examples such as this suggest NCTQ simply cherry-picked the research results that supported their preconceived notions of what quality preparation entails.
Omitted research, missing standards, and narrowly defined standards. While the research base in this area is relatively thin, NCTQ clearly did not even read all of the pertinent literature and utilize the results from the extant literature in adopting standards. For example, Eduventures (2013) contended that a number of inputs that do not appear in the NCTQ study have at least some research evidence establishing a link between the preparation activity and student outcomes. These areas include the quality of instruction provided in teacher preparation and content courses; the provision of student support services; mentoring and induction provided by the program; and the length of the clinical experience required of students.
Similarly, the Literacy Research Panel argued in two different documents (Dooley et al., 2013; Pearson & Goatley,
2013) that the NCTQ standards have important gaps with
respect to reading and appear to randomly apply standards to
the various school levels. Pearson and Goatley (2013)
pointed out that the NCTQ standards ignore the following
areas: speaking and listening; writing; the role of texts in
learning; grouping of students for instruction; motivation
and engagement for learning; and metacognition.
Furthermore, a number of reviewers have argued that the
standards lack any mention of the need to address diversity
issues within the instructional framework of teacher preparation programs (Dooley et al., 2013; Montano, 2013; Pearson
& Goatley, 2013). For example, Dooley et al. (2013) stated,
NCTQ's singular focus on the five elements of reading is neither broad nor deep, nor is it helpful for preparing teachers for diverse classrooms. Any report that talks of students as though they're all alike, as the NCTQ review does, neglects the reality of today's diverse classrooms.

Perhaps NCTQ would argue there is no solid research base that examines the association between specific preparation program practices and effects for subpopulations of students, but this certainly did not stop NCTQ in other areas. While not specifically focusing on the actual practices of specific programs, Goldhaber and Liddle (2011) found that some programs were more effective in improving the test scores of economically disadvantaged students than other programs. This suggests that program characteristics or practices may, in fact, be associated with greater effectiveness with historically disadvantaged populations.
An additional omission is the failure of NCTQ to have a standard on the instruction of English language learner students and struggling readers at the secondary level. NCTQ does have standards in both of these areas at the elementary level, but the standards do not apply to secondary preparation programs. In pointing out this contradiction, Pearson and Goatley (2013) noted that "English learners are virtually everywhere, and they are not just in elementary schools! Struggling readers are a fact of life in secondary schools, too."
Perhaps most importantly, the standards for reading/English
and mathematics instruction are not grounded in the research
that is clearly evident in the standards developed by the
National Council of Teachers of English or the National
Council of Teachers of Mathematics. It is unclear why NCTQ
believes they know more about reading and mathematics
instruction than actual experts. In fact, the NCTQ review provides no evidence that the authors even read the various content
area standards developed by experts in the field. The point here
is that NCTQ simply did not consider all available research.
Incorrect application of research findings. Given how the NCTQ report utilizes research studies to adopt standards against which programs are judged, one could easily argue that the NCTQ authors do not understand how research
should be utilized. Most scholars in the field of teacher preparation would readily agree that there is some high-quality
research that examines the link between teacher preparation
practices and program outcomes, but that we also need much
more research to make definitive conclusions about best
practices. Using research to identify potential best practices
and using research to rank and grade institutions are two
totally different uses of research. When scholars conduct
research, they are looking for patterns in the data in an effort
to ascertain whether a certain characteristic of teacher preparation programs is associated with improved teacher practice
or student outcomes. Such research is extremely useful and
should not be discounted. Such research, however, is correlational rather than causal. Moreover, there is a certain
amount of variation in such findings such that one could not
reasonably conclude from the research that every teacher
preparation program that used a particular strategy was high-performing
or that every teacher preparation program that did not use a
particular strategy was low-performing. Researchers conclude that teacher preparation programs tend to have better
outcomes if they use a particular strategy, but that some programs that use the strategy are low-performing and some that
don't use the strategy are high-performing. Only after systematic replication of such findings across varying contexts
would the adoption of standards be considered reasonable
(Stanovich & Stanovich, 2003). Even then, ranking programs based on the standards would be problematic without
examining the actual effectiveness of the programs.
NCTQ, however, completely misuses the research by
contending that every program must use a certain strategy.
That is simply not what research says or what researchers
would advocate in terms of how the data should be used.
Indeed, there is widespread consensus that research should
not be used this way, which is why researchers are loath to rank or grade programs: they know rankings will be inaccurate and potentially cause harm to good people who run
effective programs and give undue recognition to ineffective
programs.
Summary. The above evidence calls into serious question the
validity of the NCTQ standards. Unfortunately, the evidence
base is simply not robust enough to support adopting all of
the standards, while research ignored by NCTQ would support the adoption of standards not included in the NCTQ
report. Ultimately, one must question the motives of NCTQ
to adopt and rank programs based on a very shaky foundation underlying the standards. While NCTQ may believe that
ranking programs will prod policymakers to adopt accountability systems based on the NCTQ standards and the adoption of such accountability mechanisms will improve the
preparation of teachers, there is no evidence that such systems will improve programs, especially absent the financial support to engage in improvement efforts. Under such a scenario, programs may invest substantial time, money, and effort to meet standards that additional research may determine not to be associated with important outcomes.

Critique 2: Problems With NCTQ Methodology


In this section, I first review, then critique, the methodology
used by NCTQ.
Overview of NCTQ methodology. NCTQ states that their review was based on 11 different data sources: syllabi, required textbooks, institutional catalogs, student teaching handbooks, student teaching evaluation forms, capstone project guidelines, state regulations, institution-district correspondence, graduate and employer surveys, state data on institutional performance, and institutional demographic data. The first phase of the study included two methods: reviewing institutional websites and asking programs to provide information. The second phase of the study also included two methods: a content analysis of the syllabi as well as of ancillary materials and the required readings.
The primary data source was syllabi, although the other 10 types of information were collected if available. Data were validated "for accuracy by a team of trained general analysts" (NCTQ, 2013b, p. 82). From the syllabi, the analysts examined the topics taught and the textbooks used. Finally, according to NCTQ, the analysts and reviewers are experts in the field. The methodology section in the full report provides a more detailed description of this process.
Critique of the NCTQ methodology. There are a number of critiques of the NCTQ methodology. These critiques include
the use of syllabi as an indicator of content, insufficient data
and response rate, exclusion of ACPs, and failure to adequately document the ranking methodology.
Use of syllabi as an indicator of content and quality. As mentioned above, much of the NCTQ report is based on a review
of syllabi in required courses. NCTQ claims that using syllabi to assess course content is an accepted research practice and that syllabi are likely to overestimate the coverage
of content during a class; thus, using syllabi is a generous
method for ascertaining the enacted curriculum.
NCTQ is correct in stating that many studies on teacher
preparation use the review of syllabi as a research method.
Such studies, in fact, have been published in peer-reviewed
journals. For example, Pugach and Blanton (2012) used
such methodology in their article published in the Journal of
Teacher Education. The authors, however, did not rank programs, advocate for the adoption of standards based on their
findings, or even mention the programs by name. The purpose of authors using such methods in the research arena is
simply to investigate and inform the conversation, not to
suggest programs meet a standard suggested by their findings. The standard of evidence should be much greater
when identifying programs, adopting standards, and ranking
programs.

Even though some researchers use syllabi reviews to assess course content, the prevalence of a strategy does not make the strategy appropriate, particularly when making high-stakes decisions about programs such as assigning them labels regarding effectiveness. NCTQ provides no research on the degree to which syllabi accurately reflect course content, most likely because there is no easily identifiable research that addresses this issue. My own exhaustive search on the topic, in fact, did not result in the identification of any studies directly assessing whether syllabi provide an accurate indication of the content covered in a course. In fact, NCTQ's own audit panel of experts concluded that NCTQ should "[study] how accurately reading syllabi reflect the actual content of classroom instruction" (NCTQ Audit Panel, 2013).
While there is no research on whether a review of syllabi
accurately assesses course content, there is a strategy to
ensure a greater degree of accuracy when depicting course
content. Indeed, such studies often include member
checks, providing an opportunity for participants to review
the findings and correct any inaccurate information (Lincoln
& Guba, 1985; Patton, 2002). The purpose of such member
checks is to increase the accuracy of the information gathered as a means to improving the validity of the inferences
drawn from the study (Lincoln & Guba, 1985; Patton, 2002).
Interestingly, NCTQ conducted member checks in prior
reports, but not for the most current report. The failure to
conduct member checks resulted in a large number of factual
errors in the report. Indeed, Darling-Hammond (2013) stated,
It is clear as reports come in from programs that NCTQ staff
made serious mistakes in its reviews of nearly every institution.
Because they refused to check the data, or even share it, with
institutions ahead of time, they published badly flawed
information without the fundamental concerns for accuracy that
any serious research enterprise would insist upon.

Thus, NCTQ abandoned the one strategy that was necessary to provide credibility to their use of syllabi review.
In addition, as any teacher knows well, engaging students in learning particular content is by no means any
guarantee that students learn the content and are able to
apply the content in real-world situations. NCTQ did not
ascertain such data, even cursory data such as licensure/
certification test scores. NCTQ could have also supplemented their data by asking for information about the credentials and qualifications of instructors. While this is no
guarantee of effective teaching, additional information
would be useful in determining the potential quality of
instruction provided.
Insufficient data and response rate. As researchers are well aware, efforts to generalize conclusions about the universe of programs depend on collecting a sufficient amount of data from a large enough percentage of programs. NCTQ clearly did not collect a sufficient amount of data from the
universe of programs, nor did they make any effort at all to compare the characteristics of programs providing complete data with the characteristics of all programs. As noted above, only 10% of the more than 1,100 programs identified by NCTQ fully participated (AACTE, 2013). As shown in Table 2, the percentage of programs for which NCTQ was able to collect data was quite low for almost all of the standards. The table, in fact, reveals that data were collected for less than one-half of programs for 50% of the elementary standards and 60% of the secondary standards. Such a low collection and participation rate certainly calls into serious question the validity and generalizability of the inferences made by NCTQ. Undoubtedly, an article submitted to the Journal of Teacher Education would never be accepted for publication given such low rates. Yet, few reports on the NCTQ review even mentioned this as being problematic.

Table 2. Number and Percentage of University-Based Preparation Programs Scored by NCTQ.

Elementary standard | Scored | % scored
Selection criteria | 1,175 | 100.0
Early reading | 609 | 51.8
CC elementary mathematics | 712 | 60.6
CC elementary content | 1,175 | 100.0
Student teaching | 659 | 56.1
English language learners | 527 | 44.9
Struggling readers | 621 | 52.9
Classroom management | 420 | 35.7
Lesson planning | 335 | 28.5
Assessment and data | 337 | 28.7
Outcomes | 496 | 42.2
Evidence of effectiveness | 1 | 0.1

Secondary standard | Scored | % scored
Selection criteria | 1,146 | 100.0
CC high school content | 1,121 | 97.8
CC middle school content | 1,146 | 100.0
Student teaching | 619 | 54.0
Classroom management | 420 | 36.6
Lesson planning | 333 | 29.1
Assessment and data | 321 | 28.0
Secondary methods | 665 | 58.0
Outcomes | 497 | 43.4
Evidence of effectiveness | 0 | 0.0

Note. NCTQ = National Council on Teacher Quality; CC = Common Core.
Exclusion of alternative certification programs. NCTQ also ignores the increasing relevance of alternative providers of teachers such as Teach for America. In Texas, more teachers are routinely produced by ACPs than by traditional undergraduate programs. In California, a substantial percentage of newly minted teachers are from ACPs. The same is true for other states and metro areas around the country. NCTQ notes that 20% of the teachers produced in the U.S. graduate from ACPs, which were excluded from the analysis. This is a substantial proportion of the teachers and students in U.S. schools. Given the importance of these programs, particularly in large states such as Texas, California, New York, and Florida, one has to question why NCTQ excluded such programs from the study.
Because I have extensive experience in working with
Texas data, particularly as Director of Research for the state
agency tasked with overseeing teacher preparation and licensure, I examined the Texas data on teachers and teacher preparation programs to provide contextual information about the
impact of excluding ACPs.5

With respect to production, 55% of the individuals obtaining an initial teaching certificate from an in-state teacher
preparation program in Texas from 2003 to 2010 were from
ACPs.6 In comparison, only 40% of newly certified individuals were from traditional university-based undergraduate
programs. Moreover, since 2008, 33% of all newly certified
teachers were from privately managed programs that tend to
have very low grade point average (GPA) requirements (or
none at all) and, in some cases, provide no preservice hours
prior to the person entering the classroom despite state regulations that require such hours (Vigdor & Fuller, 2012).
Furthermore, individuals from the privately managed ACPs tend to be much more likely to fail content certification tests. For example, Table 3 documents the number of certification test-takers, the number of test-takers passing the test on the first attempt, and the percentage passing on the first attempt for selected Texas Examination of Educator Standards (TExES) certification tests in the 2012 academic year. For four of the six secondary content tests, individuals from university-based programs (including university-based ACPs) had passing rates more than 20 percentage points greater than individuals from privately managed alternative programs. There were large differences for the other tests as well.

Table 3. Number of Test-Takers, Number of Test-Takers Passing on Initial Attempt, and Percentage of Test-Takers Passing on Initial Attempt for Selected TExES Certification Tests in Texas (2012). Cells show passed/takers (% passing).

Test | University-based programs(a) | AC programs, all(b) | AC programs (not private) | AC programs (private)
Generalist EC-6 | 3,610/5,234 (69.0) | 2,164/3,978 (54.4) | 649/1,045 (62.1) | 1,515/2,933 (51.7)
Generalist 4-8 | 296/335 (88.4) | 1,619/1,994 (81.2) | 353/398 (88.7) | 1,266/1,596 (79.3)
All generalist | 3,906/5,569 (70.1) | 3,783/5,972 (63.3) | 1,002/1,443 (69.4) | 2,781/4,529 (61.4)
English 4-8 | 164/177 (92.7) | 169/238 (71.0) | 32/34 (94.1) | 137/204 (67.2)
Math 4-8 | 534/634 (84.2) | 222/350 (63.4) | 34/44 (77.3) | 188/306 (61.4)
Science 4-8 | 96/138 (69.6) | 135/217 (62.2) | 34/51 (66.7) | 101/166 (60.8)
Mathematics 8-12 | 434/464 (93.5) | 322/434 (74.2) | 65/75 (86.7) | 257/359 (71.6)
English 8-12 | 312/416 (75.0) | 304/585 (52.0) | 68/101 (67.3) | 236/484 (48.8)
Science 8-12 | 92/134 (68.7) | 234/439 (53.3) | 87/118 (73.7) | 147/321 (45.8)

Note. TExES = Texas Examination of Educator Standards; AC = alternative certification.
a. University-based programs include all programs housed at universities, including traditional undergraduate programs, postbaccalaureate programs, and alternative certification programs. State data do not differentiate between the types of program at each university.
b. AC programs include the following types of programs: school district, regional Education Service Center, privately managed programs, and community colleges. "Not private" AC programs include district, regional Education Service Center, and community college programs. Private AC programs include nonprofit and for-profit entities not associated with the aforementioned entities.
Such results cannot be explained by basic differences in teacher demographics (race/ethnicity, sex, or age). Indeed, using logistic regression analysis, Vigdor and Fuller (2012) examined individual certification scores on the TExES tests administered from 2003 to 2007 by type of certification program. The regression analysis was based on the following model:

ln(P / (1 − P)) = α + β1(PC) + β2(PT) + τt,

where P = the probability of failing the certification test, α = a constant, PC = personal characteristics (sex is female, race/ethnicity is White, the interaction of sex and race/ethnicity, age, and age squared), PT = program type (private ACP, other ACP, charter, out-of-state, and certification by exam), and τt = a year fixed effect. In nonmathematical terms, this equation reads as follows: an individual's odds of failing a content certification exam are influenced by that individual's personal characteristics and the type of preparation program attended. Two program types were used as independent variables: private ACPs and other ACPs (district, regional education service center, and community college). In addition, binary variables indicate whether a person was entering from out-of-state or seeking certification by passing the exam after receiving certification in another area. The reference category was individuals from university-based programs. Unfortunately, the data did not differentiate between university-based undergraduate, postbaccalaureate, and alternative programs.

Table 4. Odds Ratios and p-Values for Logistic Regression Analysis of Individuals Failing a Texas Certification Examination, 2003-2007.

Examination | Other ACP: Exp(B) | p | ACP (private): Exp(B) | p
Generalist EC-4 | 1.044 | .114 | 1.551 | .000
English 4-8 | 1.078 | .567 | 1.439 | .004
Math 4-8 | 0.995 | .943 | 1.287 | .001
Science 4-8 | 0.919 | .254 | 1.323 | .001
Generalist 4-8 | 0.880 | .149 | 1.648 | .000
English 8-12 | 1.258 | .002 | 1.548 | .000
Math 8-12 | 0.969 | .598 | 1.171 | .022
Science 8-12 | 0.833 | .030 | 1.176 | .063

Note. ACP = alternative certification program. Odds ratios are relative to individuals from university-based programs, the reference category.
As shown in Table 4, the odds of an individual7 from a
privately managed ACP failing various content certification
tests were statistically significantly greater than for individuals from university-based programs. Most disturbing about
these findings is that such teachers are allowed to instruct
students despite not having passed a content exam. In fact,
based on my own analysis of data and my prior experience
working for the state, Texas often allows individuals from
ACPs to instruct students for as many as 3 years without having passed a content examination.
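To illustrate the mechanics behind Table 4, the sketch below shows one way a failure model of this general form might be estimated and its odds ratios recovered. This is a hedged illustration, not Vigdor and Fuller's (2012) actual code; the data file and column names are hypothetical.

```python
# Hypothetical sketch of the failure model:
# ln(P / (1 - P)) = alpha + B1(PC) + B2(PT) + year fixed effects.
# Assumes an attempt-level file with columns: failed (0/1), female (0/1),
# white (0/1), age, program_type (e.g., 'university', 'private_acp',
# 'other_acp'), and year.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("texes_attempts.csv")  # hypothetical data file
df["age_sq"] = df["age"] ** 2

# Treatment coding sets university-based programs as the reference
# category; C(year) supplies the year fixed effects.
formula = ("failed ~ female + white + female:white + age + age_sq"
           " + C(program_type, Treatment(reference='university'))"
           " + C(year)")
result = smf.logit(formula, data=df).fit()

# Exponentiated coefficients are the Exp(B) odds ratios in Table 4.
print(np.exp(result.params))
```

To see what an odds ratio of this size means in practice: if the reference probability of failing were, say, 30% (odds of 0.30/0.70 = 0.429), the Exp(B) of 1.551 for private ACPs on the Generalist EC-4 exam would imply odds of 0.429 x 1.551 = 0.665, or roughly a 40% probability of failing.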
The evidence on these programs is so troubling that even
the President of NCTQ, Kate Walsh, perceived the private
ACPs in Texas to be substandard. In an email conversation I
had a few years ago with Ms. Walsh, she stated,
[Your study of the placement of teachers from alternative
certification programs into schools by the percentage of poor
and minority students enrolled in the school] very much jives
with the data that Jennifer Pressley collected in Illinois, if those
alt route paths are as awful as you and I both think or know they
are . . . the poorest schools are getting these teachers, no question.
(K. Walsh, personal communication, November 30, 2011)

So, the President of NCTQ believes that many of the ACPs are awful, but these programs are excluded from the analysis. Why exclude programs that NCTQ believes are awful, ones that, according to their own responses to a Texas Education Agency survey of teacher preparation programs, provided zero hours of preservice training and allowed individuals to enter with less than a 2.0 GPA? Why exclude programs whose graduates are more likely to fail Texas certification exams? Doesn't this exclusion simply give a pass to potentially some of the very worst programs in the country, some of which produce more than 1,000 teachers per year?
This exclusion calls into question the very intent of the NCTQ effort. Indeed, a legitimate question to ask NCTQ is whether the purpose of the report is to reduce reliance on university-based programs and privatize teacher preparation. If they want to do that, we have the aforementioned evidence from Texas as to how that might work out.
Relationship between NCTQ stars and program outcomes. Despite NCTQ's strong critique that too few programs assess their own effectiveness, NCTQ failed to assess their own findings by examining the relationship between their rankings and available evidence on outcomes. One outcome that is often available from state education agencies is the percentage of graduates passing licensure/certification examinations on the first attempt or on the most recent attempt. Texas has long provided information on initial passing rates as part of the Accountability System for Educator Preparation adopted in 1993. Using these data, I compared the initial preparation program passing rates on the generalist examination for those seeking certification for early childhood through the sixth grade in 2010 with the number of NCTQ stars awarded.
While certification scores are undoubtedly an imperfect
indicator of program quality, the percentage of graduates
passing a content certification examination on the initial
attempt certainly indicates the program has rigorous selection criteria and/or ensures graduates have had adequate
access to content instruction during their preparation. On
stronger footing would be the assertion that a program with a low passing rate is probably not a high-quality program, regardless of the particular strategies used by the program. The
most important point is that the inclusion of this metric
would be an outcome measure rather than simply another
input measure.
As shown in Figure 1, the initial passing rates for programs vary wildly across the NCTQ star rankings. For example, a program receiving zero stars had an 81.6% passing rate, while one of the programs receiving two NCTQ stars had a passing rate of only 38.9%. NCTQ would have us believe the first program is substantially worse than the second program, so much worse, in fact, that NCTQ issued a consumer alert about the first program. Similarly, a program that received 1.5 stars had a passing rate of 93.5%, while another program that also received 1.5 stars had a passing rate of only 16.7%. NCTQ would have us believe the two programs have the same level of quality with respect to preparing elementary teachers. A simple calculation of the correlation between the number of stars and the percentage passing revealed a coefficient of .178, which was not statistically significant. Again, while certification scores are an imperfect indicator of quality, I would argue that access to the percentage of students passing certification tests would lead most people, including prospective students and principals, to reach different conclusions about the programs than NCTQ.
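The star-versus-passing-rate check described above is simple to run. The sketch below illustrates it with placeholder values (the pairs mentioned in the text plus invented ones), not the actual Texas program data.

```python
# Hypothetical sketch: correlating NCTQ star ratings with initial
# passing rates. Values are illustrative placeholders, not the full
# Texas data set analyzed in the text.
from scipy.stats import pearsonr

stars =      [0.0,  1.5,  1.5,  2.0,  0.5,  2.5,  3.0,  1.0]
pass_rates = [81.6, 93.5, 16.7, 38.9, 55.0, 75.0, 88.0, 60.0]

r, p_value = pearsonr(stars, pass_rates)
# A valid ranking system should produce a strong positive correlation.
print(f"r = {r:.3f}, p = {p_value:.3f}")
```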
Furthermore, NCTQ could have used the results from
Goldhaber and Liddle (2011)8 to examine the relationship
between the NCTQ stars and student value-added scores for
individual preparation programs in Washington. If they had
done so, they would have discovered that both the University
of Washington-Bothell and Eastern Washington earned one
star from NCTQ, but the value-added scores for the
University of Washington-Bothell were statistically significantly greater than for Eastern Washington in both reading
and mathematics. The difference was particularly large in
reading. If we simply accept the NCTQ review at face value,
we would assume the two programs are of equal quality
when methods to assess effectiveness specifically endorsed
by NCTQ clearly show the two programs are not equally
effective. Again, these results cast serious doubt on the
NCTQ ranking system, even when using a method endorsed
by NCTQ. A number of other programs in Washington also
received a one-star ranking from NCTQ, but there are statistically significant differences between many of the programs.
Thus, NCTQ could have used readily available data to examine the validity of the inferences made from their rankings and found serious issues with their ranking system. Any organization serious about assessing program quality would have undertaken such an effort and reexamined their methodology in light of the findings. The failure of NCTQ to do so calls into question their ultimate intent in publishing the rankings. Indeed, if their intent was to improve preparation programs, one would assume that they would have done due diligence in ensuring their rankings have some relationship to outcomes.

[Figure 1. Percentage of graduates passing the TExES EC-6 certification exam on the initial attempt, by NCTQ star rating (0 to 3.5 stars). Note. TExES = Texas Examination of Educator Standards; EC = early childhood; NCTQ = National Council on Teacher Quality.]

Summary. Thus, there are at least three major issues with the methodology used by NCTQ. Most troublesome is the failure of NCTQ to examine the relationship between their rankings and important preparation program outcomes. There are certainly other issues that have been mentioned by the many other individuals who have critiqued the study. Ultimately, all of the methodological issues cast serious doubt on the findings by NCTQ. Indeed, given the seriousness of the issues, the findings of the report should be ignored by the public and policymakers.

Suggestions for Improvement of Such Efforts

If the NCTQ standards and methodology are flawed, what should be done? If an organization was actually serious about
improving teacher preparation in this country, they could
adopt a number of strategies to create a relevant report card.
First, and foremost, the organization would need to work
with preparation programs, not against them. A sense of collaboration and common purpose is likely to be more fruitful
than throwing stones.
Second, the organization should create working groups to
review the relevant research in each of the subject areas at
each school level. NCTQ appears to have convened one
working group. No one group of individuals has the overarching knowledge or expertise to create standards for all
subject areas across all school levels and for all types of
students.
Third, the group should arrive at consensus about the
research-based indicators in each area. Only indicators
with a strong research foundation should be included.
Supplementary indicators based on the views of experts in
each of the areas could be adopted as well, but used as
supplementary information rather than as a component of
the ranking system.
Fourth, the group should implement pilot studies to ensure
the quality of their standards and that the standards measure
what they are intended to measure. Thus, as shown above,
the group could determine if rankings in a particular standard
were appropriately correlated with various outcome measures. Of course, as discussed below, the group would ideally rely on input, process, and outcome measures in a
ranking system. Given the current state of data collection and
accessibility, this cannot be done in every state.
More specifically with respect to the type of data to be
collected, I would advocate for the collection of a wide range
of data on the inputs, processes, and outcomes of preparation
programs. Based on my own experience in this area and my
review of the research, particularly Darling-Hammond (2006), I would advocate that the data described in Table 5 be collected.

Table 5. Proposed Variables for Evaluation of Teacher Preparation Programs.

Outcome variables
  Impact on student outcomes
  Placement rates of graduates
  Retention rates of graduates
  Knowledge and skills of graduates
    Licensure/certification scores
    Performance assessments
    Portfolios
  Teaching ability of graduates
    Observations by principal
    Observations by cooperating teacher
    Observations by supervising teacher
    Teacher self-report on effectiveness
    Teacher self-report on self-efficacy
  Quality of preparation
    Perceptions of graduates
    Perceptions of principal

Process variables
  Effective instruction
  Number of required courses
  Number of required clinical hours
  Quality of mentoring
  Change in content knowledge
  Change in pedagogical knowledge

Input variables
  Qualifications of instructors
  Class size
  Supervisor-student teacher ratio
  Course content
  Coherency of courses
  Number of teachers per mentor
Equally important as data collection is data analysis. Many of the outcomes are influenced by factors outside
the control of the program. Thus, appropriate statistical
methodologies would need to be used to accurately assess
outcomes for individual programs.
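As one illustration of the kind of statistical adjustment meant here, the sketch below uses entirely synthetic data to regress student scores on prior achievement and a context factor alongside program indicators, so that the program coefficients approximate differences net of measured factors outside the programs' control. This is a deliberate simplification of real value-added models, not an endorsement of any particular specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 1500
df = pd.DataFrame({
    "prior_score": rng.normal(size=n),               # student's prior achievement
    "low_income": rng.integers(0, 2, size=n),        # context outside program control
    "program": rng.choice(["A", "B", "C"], size=n),  # teacher's preparation program
})
true_effect = df["program"].map({"A": 0.00, "B": 0.05, "C": 0.10})
df["score"] = (0.7 * df["prior_score"] - 0.2 * df["low_income"]
               + true_effect + rng.normal(scale=0.5, size=n))

# The coefficients on the program dummies estimate each program's
# difference from program A, net of the measured controls.
model = smf.ols("score ~ prior_score + low_income + C(program)", data=df).fit()
print(model.params.filter(like="program"))
```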
My list is certainly not exhaustive and not all of the variables are substantiated by a peer-reviewed body of literature.
It does, however, provide ideas for those engaged in efforts
to gather and analyze data on teacher preparation programs
with the intent of improving practice. Data points, however,
regardless of how they are collected, simply do not provide
enough information to make high-stakes decisions about
teacher preparation programs.
Again, I come back to the Texas case because I know it
quite well. Texas was the first state to adopt an educator preparation accountability system. The system was based purely on data
and almost entirely on the passing rates of graduates on the
state certification exams. A few programs were cited as unacceptable and in need of improvement, and all of those programs, to the best of my knowledge, responded appropriately and increased their passing rates. After the explosion of privately managed programs that began in 2003, complaints from teachers prepared in such programs and from principals employing their graduates grew more pronounced each year. Partially in response to these complaints,
the Texas state legislature passed a bill that created a
Consumer Report Card for all teacher preparation programs
in Texas that included a wealth of information such as
entrance requirements, placement rates, retention rates, and
other data on programs.
Ultimately, in addition to implementing the state-mandated
consumer report card, the state also started making state-mandated site visits to programs to conduct audits. These
audits must occur at least once every 5 years. While largely
focused on compliance with state statutes, the audits provided a much more in-depth assessment of the behaviors of teacher preparation programs. For example, some of the audits found that alternative certification programs (ACPs) were admitting too many entrants with less than the minimum GPA (a very low bar of 2.0). The same state audit also found that three of the eight largest ACPs had allowed interns to become teachers without having obtained the probationary certificate required for a noncompleter to teach in a public school
(State Auditor's Office, 2008). The point is that site visits are a necessary component of properly judging the quality of preparation programs; simple quantitative data are not sufficient by themselves.

Conclusion
As shown above, there are a number of very serious problems
with the NCTQ report. These issues range from the rationale
for the review's standards to various methodological problems. Myriad other problems with the review exist that are
well documented by others elsewhere.9 Not mentioned previously is the issue of applying the same set of standards across
all certification areas at all levels and holding all areas and
levels accountable to the same standards. Should research
focus on the effective practices specific to each certification
area and level and then identify the commonalities across all
programs? Or, alternatively, should a set of generic standards
that apply to all certification areas and levels serve as the
focus of research that examines the association between the
standards and outcomes? The answers to these questions are
certainly not clear. However, without sufficient evidence in
all certification areas and levels, NCTQ has established a
common set of standards that apply to all areas and levels.
This risks losing the important differences in effective practice across areas and levels.
Finally, and most disturbingly, the star ranking system
does not even appear to be associated with program outcomes such as licensure/certification test passing rates or the aggregate value-added scores in reading or mathematics of
programs. NCTQ could have chosen to ensure some semblance of a correlation between their star system and outcomes using publicly available data from various states, yet
they chose not to. NCTQ's refusal to even attempt to validate
their own effort gives substantial support to those who
believe NCTQ has absolutely no intention of helping traditional university-based programs and has every intention of
destroying such programs and replacing them with a market-based system of providers.
As I have shown above, Texas went down that route and
the results were not pretty. Given that their existence relied upon students enrolling in their programs, privately managed alternative programs admitted individuals with less than a 2.0 undergraduate GPA. These same programs, not
surprisingly, had abysmally low passing rates on the state
certification examinations. The programs even allowed
uncertified individuals to enter the classroom and instruct
students. Does NCTQ really believe that a Wild West free-market system will increase the quality of the preparation of
teachers and improve student outcomes?
If NCTQ wants to truly help improve student outcomes
by improving teacher preparation, they should stop using
incredibly weak methods, unsubstantiated standards, and
unethical evaluation strategies to shame programs and start
working with programs to build a stronger research base and
information system that can be used by programs to improve
practice. Yes, teacher preparation certainly has room for
improvement, but throwing rocks from a glass house is not
helpful to anyone but NCTQ and the organizations funding
the NCTQ study.
Given the very shaky foundation upon which the NCTQ
review was built and the shaky motives of NCTQ in conducting the review, the entire review should be discounted by educators, policymakers, and the public. If NCTQ were truly
interested in improving all teacher preparation programs, there
are certainly different pathways that could have been chosen.
For example, NCTQ could have invested resources to
conduct high-quality studies examining the associations of inputs and processes with outcomes. Validity studies could have been conducted in states with easily accessible outcome data, such as Louisiana, Florida, North Carolina,
Washington, and Texas. Furthermore, NCTQ could have
chosen to create state working groups to discuss the different
details of available data so that NCTQ employees would not
misinterpret the data, and to use member checks to ensure reported data were accurate. NCTQ could have chosen to
include all programs, not just university-based programs. A
completely different pathway could have been to simply report the findings from the study and, instead of assigning stars, argue that programs and institutions like NCTQ
should work together to improve data collection and analysis
as a means to improve program outcomes.
In the end, NCTQ chose the pathway that rejected the
voices of those educators highly committed to improving
teacher preparation and chose to highlight their own voices

and agenda instead. This has damaged any sense of partnership between teacher preparation programs and NCTQ.
As such, funding should be provided to organizations truly
committed to the improvement of teacher preparation rather than to those that care mostly about their own level of influence.

Appendix
Classification of Research Studies
The National Council on Teacher Quality (NCTQ) provides the
following information about the classification of research
studies into strong or weak designs:
Studies with stronger design use some sort of control or
comparison group in an experiment, natural or otherwise, or use
a multiple regression for evaluation. These studies have a sample
size of 100 or more unless the subjects involved are not
individuals (e.g., teacher preparation programs) in which case
the minimum sample size was determined based on the context
of the study and the nature of the subjects. In the case of
experiments, the number of subjects in each of the treatment and
control groups had to total 100 or more to classify the relevant
study as having strong design. In cases in which dyadic groups
were analyzed, 50 participants constituted the minimum sample
size for categorization as having strong design.
Studies with weaker design have no comparison or control,
are often simply case studies with potential selection bias and
rely on survey or otherwise qualitative data. These studies have
a sample size of fewer than 100.
Some studies with control groups were categorized as having
weak design when the control group was inappropriately selected
or the study did not provide enough details about the control
group to rule out significant differences between the treatment
and control groups.
In the case of studies that had both strong and weak characteristics,
categorization was determined by whether the research would
be useful for teacher educators, teacher education program
administrators and/or policymakers. If it seemed potentially
useful, it was categorized as strong design.
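Read literally, the sample-size portion of these rules amounts to a simple decision procedure. The Python sketch below transcribes only that portion, under the assumption that the reviewer can supply the relevant facts about each study; NCTQ's final "potentially useful" criterion is a subjective judgment that cannot be codified and is deliberately left out.

```python
def classify_design(has_comparison_group: bool,
                    sample_size: int,
                    dyadic_groups: bool = False,
                    control_group_adequate: bool = True) -> str:
    """Literal transcription of NCTQ's stated strong/weak thresholds.
    The subjective 'potentially useful' tie-breaker is not codified here."""
    if not has_comparison_group:
        return "weak"                       # no control or comparison group
    if not control_group_adequate:
        return "weak"                       # control group poorly selected or described
    minimum = 50 if dyadic_groups else 100  # dyadic studies use a lower threshold
    return "strong" if sample_size >= minimum else "weak"

print(classify_design(has_comparison_group=True, sample_size=120))  # strong
```

Notably, writing the rules out this way exposes how much of the classification turns on judgment calls (adequacy of the control group, the usefulness tie-breaker) rather than on the stated numeric thresholds.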

Declaration of Conflicting Interests


The author(s) declared no potential conflicts of interest with respect
to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes
1. For examples of these critiques, see http://aacte.org/resources/
nctq-usnwr-review/responses-to-2013-nctq-us-news-a-worldreport-review.html
2. Goldhaber and Liddle (2011) found that the inclusion of
school fixed effects did not substantially alter the rankings of preparation programs in Washington. However, Washington is a much smaller and more geographically compact state than Florida. Furthermore, the models used in the two studies were different, which could account for the contrasting findings.
3. National Council on Teacher Quality (NCTQ) claims that syllabi
from all required courses were requested. However, clearly not
all courses were reviewed. This is highly problematic as NCTQ
may conclude that a particular topic is not taught when, in fact,
that topic was taught in a course that was not even reviewed.
4. See Figure 38 on page 77 in the NCTQ report.
5. The findings presented above will be included in a paper to be
submitted to the Journal of Teacher Education in the coming
months.
6. This is based on my analysis of all individuals (194,008) obtaining initial certification in Texas from in-state teacher preparation programs or through school district emergency permits from
2003 through 2010.
7. Only teachers with an identifiable preparation program were
included in the analysis. Thus, test-takers with no identified program were excluded.
8. Ironically, NCTQ cites a version of this article identified as
Goldhaber, D., et al., "Assessing Teacher Preparation in Washington State Based on Student Achievement" (paper presented at the Association for Public Policy Analysis & Management
conference) as evidence that programs differ in terms of their
impact on student achievement. Yet, they failed to use the information in the paper to check the validity of their own findings.
9. For examples of these critiques, see http://aacte.org/resources/
nctq-usnwr-review/responses-to-2013-nctq-us-news-a-worldreport-review.html

References
American Association of Colleges of Teacher Education. (2013,
June 18). NCTQ review of nation's education schools deceives,
misinforms public. Washington, DC: Author. Retrieved from
http://aacte.org/news-room/press-releases/nctq-review-ofnations-education-schools-deceives-misinforms-public.html
Boyd, D. J., Grossman, P. L., Lankford, H., Loeb, S., & Wyckoff,
J. (2009). Teacher preparation and student achievement.
Educational Evaluation and Policy Analysis, 31(4), 416-440.
Coggshall, J. G., Bivona, L., & Reschly, D. J. (2012). Evaluating
the effectiveness of teacher preparation programs for support
and accountability. Washington, DC: National Comprehensive
Center for Teacher Quality.
Darling-Hammond, L. (2006). Assessing teacher education: The usefulness of multiple measures for assessing program outcomes.
Journal of Teacher Education, 57(2), 120-138.
Darling-Hammond, L. (2013, June 19). Why the NCTQ teacher
prep ratings are nonsense. Palo Alto, CA: Stanford Center for
Opportunity Policy in Education.
Dooley, C. M., Meyer, C., Ikpeze, C., O'Byrne, I., Kletzien, S., Smith-Burke, T., . . . Dennis, D. (2013). LRA response to the
NCTQ Review of Teacher Education Programs. Retrieved
from http://www.literacyresearchassociation.org/pdf/LRA%20
Response%20to%20NCTQ.pdf
Eduventures. (2013, June 18). A review and critique of the National
Council on Teacher Quality (NCTQ) methodology to rate schools of
education. Retrieved from http://www.eduventures.com/2013/06/areview-and-critique-of-the-national-council-on-teacher-qualitynctq-methodology-to-rate-schools-of-education/

Elliot, P. (2013, June 18). Too many teachers, too little quality.
Yahoo News. Retrieved from http://news.yahoo.com/reporttoo-many-teachers-too-little-quality-040423815.html
Goldhaber, D. (2007). Everyone's doing it, but what does teacher
testing tell us about teacher effectiveness? Journal of Human
Resources, 42(4), 765-794.
Goldhaber, D., & Liddle, S. (2011). The gateway to the profession: Assessing teacher preparation programs based on student achievement (Working Paper No. 2011-2.0). Seattle, WA:
Center for Education Data and Research.
Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality, and student achievement. Journal of Public Economics,
95(7), 798-812.
Kamras, J., & Rotherham, A. (2007). America's teaching crisis.
Democracy. Retrieved from http://www.democracyjournal.
org/5/6535.php?page=all
Layton, L. (2013, June 18). University programs that train
U.S. teachers get mediocre marks in first-ever ratings. The
Washington Post. Retrieved from http://www.washingtonpost.com/local/education/university-programs-that-train-usteachers-get-mediocre-marks-in-first-ever-ratings/2013/06/17/
ab99d64a-d75b-11e2-a016-92547bf094cc_story.html
Levine, A. (2013, June 21). Fixing how we train U.S. teachers. The
Hechinger Report. Retrieved from http://hechingerreport.org/
content/fixing-how-we-train-u-s-teachers_12449/
Lincoln, Y. S., & Guba, E. G. (1985). Establishing trustworthiness.
In Y. S. Lincoln & E. G. Guba (Eds.), Naturalistic inquiry (pp.
289-331). Newbury Park, CA: SAGE.
Mihaly, K., McCaffrey, D., Sass, T., & Lockwood, J. R. (2012).
Where you come from or where you go? Distinguishing
between school quality and the effectiveness of teacher preparation program graduates (CALDER Working Paper No. 63).
Washington, DC: CALDER and American Institutes for
Research.
Monk, D. H. (1994). Subject area preparation of secondary
mathematics and science teachers and student achievement.
Economics of Education Review, 13, 125-145.
Montano, T. (2013, June 28). Debunking NCTQ's teacher prep review. California Teachers Association. Retrieved from
http://www.calitics.com/showDiary.do;jsessionid=78C550C8
45B509AFEDB0BA9C1A9DB64E?diaryId=15104
National Council on Teacher Quality. (2013a). Standards.
Washington, DC: Author. Retrieved from http://www.nctq.org/
teacherPrep/ourApproach/standards/
National Council on Teacher Quality. (2013b). Teacher prep
review. Washington, DC: Author.
National Council on Teacher Quality Audit Panel. (2013). Audit
panel statement on the NCTQ teacher prep review. Washington,
DC: National Council on Teacher Quality. Retrieved from
http://nctq.org/dmsView.do?id=2181
Patton, M. Q. (2002). Qualitative research and evaluation methods
(3rd ed.). Thousand Oaks, CA: SAGE.
Pearson, P. D., & Goatley, V. (2013, July 2). Response to the NCTQ
teacher education report. Newark, DE: International Reading
Association. Retrieved from http://www.reading.org/general/
Publications/blog/LRP/literacy-research-panel/2013/07/02/
response-to-the-nctq-teacher-education-report
Pugach, M. C., & Blanton, L. P. (2012). Enacting diversity in dual
certification programs. Journal of Teacher Education, 63(4),
254-267.

Sanchez, C. (2013, June 18). Study: Teacher prep programs get failing marks. National Public Radio. Retrieved from http://www.
npr.org/2013/06/18/192765776/study-teacher-prep-programsget-failing-marks
Stanovich, P. J., & Stanovich, K. E. (2003). Using research and
reason in education: How teachers can use scientifically
based research to make curricular and instructional decisions.
Portsmouth, NH: RMC Research Corporation.
State Auditor's Office. (2008). An audit report on the Texas Education Agency's oversight of alternative teacher certification
programs. Austin, TX: Author.
Vigdor, J., & Fuller, E. J. (2012). Examining teacher quality in
Texas. Unpublished expert witness report for Texas school
finance court case: Texas taxpayer and student fairness
Coalition v. Robert Scott and State of Texas.

Wineburg, M. S. (2006). Evidence in teacher preparation: Establishing a framework for accountability. Journal of
Teacher Education, 57(1), 51-64.
Zeichner, K., & Liston, D. (1990). Traditions of reform in U.S.
teacher education. Journal of Teacher Education, 41(2),
3-20.

Author Biography
Edward J. Fuller is an associate professor in the Department of
Educational Administration at Penn State University. He also
serves as the director for the Center for Evaluation and Education
Policy Analysis and associate director of policy for the University
Council for Educational Administration.
