A constellation of interest groups, including education schools, powerful foundations like Carnegie and Rockefeller, and a variety of professional associations, has been urging states to adopt stricter licensing and certification requirements for teachers. They not only share a deep-seated belief that certified teachers are better teachers; they also claim that their views are supported by a comprehensive body of research. Academics like Stanford University professor Linda Darling-Hammond and associations like the National Council for the Accreditation of Teacher Education (NCATE) and the National Commission on Teaching & America’s Future (NCTAF) routinely refer to the “dozens” or “hundreds” of studies showing that certified teachers outperform their uncertified peers. For instance, from an NCATE newsletter published in 1999: “Over 100 studies show that qualified teachers outperform those with little or no preparation in helping students learn.” And from NCTAF’s 1997 report Doing What Matters Most: Investing in Quality Teaching, authored by Darling-Hammond: “More than 200 studies contradict the long-standing myths that ‘anyone can teach’ and that ‘teachers are born and not made.’”
Such statements lend a patina of scientific objectivity to the rules and regulations sought by these organizations. After all, who could oppose tighter licensing requirements if they were shown to lead to higher student achievement? Moreover, the use of phrases like “hundreds of studies show” makes it seem as though the question is closed, the debate over. Who could doubt findings that have been replicated “hundreds” of times? Checking the veracity of such statements would be a painstaking task, and why doubt them when they’re made by respected scholars? Thus there has been no comprehensive effort by scholars in the field to drill down through these layers of evidence to discover what, if anything, they say about the merits of certifying teachers.
One of the primary goals of the Baltimore-based Abell Foundation, where I am a senior policy analyst, is to improve the public schools that serve poor children. For years my colleagues and I have witnessed the frustration of principals who were prevented by the state department of education from hiring promising teaching candidates because they hadn’t met the requirements for certification. We wondered whether the training provided by schools of education was so valuable that every state should require it, as each currently does. So we began a search for the evidence linking certification with higher student achievement.
The Maryland Department of Education got us started with 12 documents that the state believes demonstrate the value of certification. We were immediately struck by the paucity of evidence found in these 12 documents and decided to keep digging. Finding many of the studies cited by advocates proved difficult. We often had to track down the author in order to obtain studies that no longer are, or never were, available through a university library. A few unpublished studies ultimately proved impossible to find, but we eventually looked at every published study, some going as far back as the 1940s, that supporters of teacher certification cite. We also retrieved many unpublished studies, mostly doctoral dissertations. Even though these dissertations had not undergone the peer-review process that most academic fields consider a prerequisite, we were willing to consider any evidence. In the end, we closely examined the findings of well over 200 studies, literature reviews, meta-analyses, and advocacy pieces.
We focused on research that examines how the various attributes of teachers affect student achievement, counting as legitimate evidence only those studies that used this measure of teachers’ effectiveness. Other measures of teachers’ effectiveness, such as principals’ or other supervisors’ ratings of their teachers’ performance, are highly unreliable, and studies using such measures are often characterized by poor research design.
Certification carries a high price, both financially and in the burdens it imposes on would-be teachers. About one-third of the coursework required of a prospective teacher in college is under the purview of a school of education. Certification requirements also dictate how states will appropriate millions of tax dollars to universities and colleges, and they determine the structure of incentive programs so critical to school districts in times of teacher shortages. Access to the profession is littered with obstacles for some of the most promising candidates: middle-aged professionals who wish to switch careers and talented college graduates who didn’t major in education. Even in states that appear to offer a bypass around certification, these alternative programs are structured so that retrofitting (taking education coursework while teaching) is almost always required. To justify making the path to the teaching profession so difficult, certification needs to yield significant benefits in terms of student learning. Anything less, and states ought to question whether education schools should be the predominant path to becoming a teacher, rather than just one of many.
The body of research on the effects of teacher certification is astonishingly poor. Some of the most oft-cited studies had such serious flaws that no properly trained researcher would take them seriously. Early in our project we were amazed by the extent to which research is misquoted, misinterpreted, and misrepresented. Several researchers complained that their work, though widely cited, did not appear ever to have been read, considering the conclusions that were reached.
The examples presented here illustrate the seven main ways in which the supporters of teacher certification have twisted and turned the research findings to suit their policy purposes. The entire analysis of more than 200 studies and citations to the studies listed here can be found in the full study and in the unabridged web version of this essay, at www.educationnext.org.
–Research that seems to support teacher certification is selectively cited, while research that does not is overlooked.
In the 1997 NCTAF report Doing What Matters Most, Darling-Hammond describes the findings of a study (Ferguson 1991) that examined various characteristics of teachers and their relationships to student achievement. The study demonstrated that certain attributes play an important role in raising student achievement. Darling-Hammond cites this study as proof of the paramount importance of a teacher’s possessing knowledge about “teaching and learning.” As a result, she urges states to invest in more formal teacher education in order to ensure that teachers obtain this knowledge. She claims that Ferguson found that “every additional dollar spent on more highly qualified teachers netted greater increases in student achievement than did less instructionally focused uses of school resources.”
Yet Ferguson’s study did not show that a teacher’s pedagogical training had an effect on student achievement. It was teachers’ verbal ability that had a very large effect. At no point did Ferguson suggest that his research argues for states investing in more formal teacher education, though he did try to figure out how money could be used to attract teachers with higher SAT scores.
To measure teachers’ verbal ability, Ferguson used a literacy test that was administered in 1986 to all public school teachers in Texas. He found a surprisingly large correlation between how well teachers did on this relatively easy test (the pass rate was 97 percent) and their students’ achievement on a standardized test. Ferguson distinguishes this literacy test from tests like the National Teacher Exam (NTE), because it did not test pedagogical knowledge, a distinction that Darling-Hammond ignores. He specifically notes that previous research has found “scarce and weak” evidence of a relationship between tests of pedagogical knowledge and student achievement. Darling-Hammond wrongly states that the Texas test was a licensing test (actually it was a recertification test given to already hired teachers) and “that it tested both basic skills and teaching knowledge.”
Why does Darling-Hammond feel justified in claiming that the test proves the importance of teaching knowledge? In fact, the Texas test did contain a section entitled “professional knowledge.” However, Ferguson (and, indeed, anyone who ever took the test) understood this section as a test of basic vocabulary. In a series of ten multiple-choice questions, teachers were asked to pick the right definition of such phrases as “standardized tests,” “classroom management,” and “certification.” It is certainly incongruous to suggest, as Darling-Hammond does, that knowing the definitions of these basic terms provides proof that prospective teachers benefit from sitting through 30 credit hours of education coursework or that states should invest their resources in even more formal training.
Darling-Hammond cites another study (Greenwald, Hedges, and Laine 1996) in support of her contention that states need to increase their investments in teacher education. She asserts that the study found that “spending on teacher education swamped other variables as the most productive investment for schools.”
Yet, after reviewing 60 studies, Greenwald, Hedges, and Laine’s firmest and most significant conclusion was that verbal ability had by far the most significant effect on student achievement. In a separate e-mail, one of the authors confirmed this finding: “Teacher ability (which was generally measured as teacher’s verbal ability),” Hedges wrote, “seems to show the strongest and most replicable effect on achievement.” However, Darling-Hammond, as she so often does, overlooks this central finding. Her reference to spending on teacher education was based on a tangential exercise the authors engaged in, in which they speculated on how school resources could be best invested in order to improve student achievement, including what would happen if more money were invested in teacher education.
But Darling-Hammond never acknowledges a key failing of this exercise. The authors stated that they were unable to come up with particular ways in which school districts could spend money to improve the average verbal ability of their teachers (though other researchers such as Ferguson and Manski have suggested that higher teacher salaries might do so), so they left out possible ways that money might be spent to raise verbal ability. Teacher education was only able to “swamp” all other spending variables because the authors couldn’t figure out how to improve the most important teacher attribute.
–Analyses are padded with imprecise measures in order to conceal the lack of evidence in support of certification.
Darling-Hammond, in Teaching and Knowledge: Policy Issues Posed by Alternative Certification for Teachers (1992), asserts that there are “consistently positive relationships between student achievement in science and the teacher’s background in both education courses and science courses,” and she cites four studies in support of this statement. In fact, not one of the four studies looked at education coursework as an independent variable. Of course, if a researcher only reports that the combination of education coursework and science coursework improves student achievement, it is impossible for the reader to discern if both or only one of the variables were responsible for the effect.
Davis (1964) studied the effect of teachers’ science coursework and their participation in National Science Foundation summer institutes. It did not study the effect of teachers’ coursework in education.
Taylor (1957) found an effect only when education coursework was combined with science coursework. No effect from education coursework was found when the variable was isolated.
Druva and Anderson (1983) found that science coursework improved student achievement significantly, but that education coursework had no effect when the variable was isolated.
Perkes (1967) can be considered a stand-off at best. Students of teachers with more education coursework under their belts scored higher on a test of higher-order thinking, but lower on a science achievement test.
Another example: in Teacher Quality and Student Achievement: A Review of State Policy Evidence (1999), Darling-Hammond reviews what the research says about the relationship between student achievement and many different teacher variables, including teachers’ general academic ability, intelligence, subject-matter knowledge, pedagogical knowledge, experience, and certification status. Her intention is to systematically prove her central thesis, that “measures of teacher preparation and certification are by far the strongest correlates of student achievement.” While conceding that teachers’ knowledge of their subject is important, she claims that the findings relating subject-matter knowledge to teacher effectiveness “are not as strong and consistent as one might suppose.” She states that five studies have found “no consistent relationship between the subject-matter tests of the National Teachers’ Exam and teacher performance as measured by student outcomes or supervisory ratings.”
This is misleading. Not one of the five studies found a negative relationship between the NTE subject-matter tests and student outcomes. First, it is important to understand the distinction between the Core Battery portion of the NTE and the subject-matter portion of the NTE, both of which have been replaced by the Praxis exams. The Core Battery was a test of teachers’ basic skills and knowledge of pedagogy. The subject-matter portion was a test of a teacher’s knowledge of the subject area that he or she was going to teach.
Andrews, Blackmon, and Mackey (1980) found a positive relationship between teachers’ scores on the NTE English and elementary subject-matter tests and supervisors’ ratings. The only negative relationship the authors found was between teachers’ scores on the NTE physical education and special-education tests and supervisors’ ratings of their performance.
Ayers and Qualls (1979) examined the relationship between teachers’ scores on the NTE and how their students rated their performance. But students’ ratings are extremely unreliable; students may rate a teacher who gives them no homework higher than a teacher who challenges them.
Quirk, Witten, and Weinberg (1973), a literature review on the effect of the NTE, found only one study that looked at the relationship between the NTE and student achievement, and it was written in 1947. Though too old to be considered very relevant, it reported a positive relationship between the teachers’ scores on the NTE and student achievement.
Haney, Madaus, and Kreitzer (1987) is another literature review, recycling the same findings found in Ayers and Qualls and in Quirk, Witten, and Weinberg. (One of the authors, George Madaus, told us that he was not aware of any research showing a negative correlation between the NTE subject-matter test and student achievement.)
Summers and Wolfe (1977) was the only study of the five to explore the relationship between teachers’ performance on NTE subject-matter tests and student achievement. They found a largely positive effect. (The negative effect to which Darling-Hammond refers was probably what Summers and Wolfe noted as the “perversely” negative relationship between 6th grade teachers’ scores on the NTE Core Battery, a test of pedagogy and basic skills, and their students’ achievement.)
–Researchers focus on variables that are poor measures of the qualities they are interested in, sometimes ignoring variables that are better measures.
In a 1985 literature review that is still extensively cited, Evertson, Hawley, and Zlotnik argued that the concern with whether teachers know their subject matter is overblown. They cited three studies that supposedly found that there is “no or [a] negative relationship between teacher knowledge (as measured by GPA and standardized tests) and student achievement.”
One of these three studies is a 1977 dissertation by Eisenberg. It’s true that Eisenberg failed to find a correlation between teachers’ GPAs and their students’ achievement, but GPA is considered a fairly crude measure of a teacher’s subject-matter knowledge; teachers presumably take courses in college other than those in their subject area, courses that might drag their GPAs down. Evertson, Hawley, and Zlotnik neglect to report that Eisenberg found a significantly positive effect from a better measure of teachers’ knowledge of mathematics: their knowledge of algebraic concepts and postgraduate coursework in calculus. The other two studies that Evertson, Hawley, and Zlotnik cited, both unpublished, were similarly misreported.
Instead of using standardized measures of student achievement, advocates sometimes design their own assessments in order to prove certification’s value. Ashton and Crocker (1983) cited Copley’s 1975 dissertation as evidence that beginning teachers are better prepared by virtue of having taken coursework in pedagogy and educational methods. The assessment Copley used was a survey designed by doctoral students in education. It asked principals to rate their new teachers on factors that are unrelated to student achievement, such as how often they participated in professional meetings, if they have a good disposition, and if they have a sense of humor.
Sometimes, when a study doesn’t use measures of student achievement, advocates simply say it does. Denton and Lacina’s 1984 study is cited repeatedly by advocates of certification because it was supposed to have found a positive relationship between formal teacher preparation and student achievement (Evertson, Hawley, and Zlotnik 1985; Darling-Hammond 1999). Yet Denton and Lacina never even looked at student achievement; their study measured only the morale of student teachers and the ratings of student teachers by their supervisors.
–Research is cited that is old and often irretrievable.
The fact that research is relatively old does not automatically make it irrelevant. Yet older studies should be regarded skeptically. There are a number of reasons: 1) student achievement probably wasn’t used as the measure of teacher effectiveness; 2) before the advent of the modern computer, in the mid-1960s, some of the more sophisticated analyses were not feasible; 3) the structure and makeup of schools change, making the findings less applicable to the current situation; 4) most important, older studies may not control for critical variables, such as students’ backgrounds or past achievement.
In Reforming Teacher Preparation and Licensing: Debating the Evidence (2000), Darling-Hammond cites numerous studies to support the statement: “Knowledge about teaching and learning shows even stronger relationships to teaching effectiveness than subject-matter knowledge.” However, only four of the studies were published after 1980; the remaining 12 were published between 1950 and 1979.
–Research that has not been subjected to peer review is treated without skepticism, and there is a particularly heavy reliance on unpublished dissertations.
The process of peer review, having researchers’ fellow experts review a study before it is deemed worthy of publication, is practiced in all fields of serious scientific study. Yet many assertions about the benefits of teacher certification are largely, if not exclusively, dependent on the evidence provided in unpublished dissertations, papers delivered at conferences but never published, or articles published in the many education journals, like Phi Delta Kappan, that are not “refereed” (they don’t have a system of blind peer review).
For instance, Ashton and Crocker (1987) cite numerous studies on teacher preparation to support their conclusion that coursework in education makes teachers more effective than coursework in their subject matter does. They claim that 9 of the 14 studies they found showed that subject-matter coursework made no difference. It took a careful read of the footnotes to discover that all but two of these nine studies were dissertations, unpublished and unavailable to scrutinize because they ranged from 25 to 45 years old. The two published studies were equally problematic. Rothman, Welch, and Walberg (1969) studied only 35 teachers who were accepted into an elite project developed by the Harvard Physics Project. Generalizing the findings to the greater population of teachers is inappropriate. The other study (Perkes 1967) produced mixed results: students whose teachers took more subject-matter coursework reported higher scores on an achievement test, but lower scores on the STEP, a test of higher-order thinking. Another example: Druva and Anderson (1983) reported a largely positive link between education coursework and “successful teaching,” but 54 of the 65 studies reviewed were dissertations or unpublished articles, all written between 1966 and 1975.
The difficulty of tracking down some of these studies is worth noting. We tried to find one frequently cited unpublished paper, delivered at a 1990 American Educational Research Association (AERA) conference in Boston, which was written by Gomez and Grobe. It was not available from the archives of the AERA and could not be located through the services of a university library. Even the authors no longer had a copy of the paper, and none of the researchers who cited the study was able or willing to produce the report after numerous requests. The authors’ recall of their findings did not correspond to Darling-Hammond’s presentation of their findings.
–Studies that support teacher certification routinely violate basic principles of sound statistical analysis that other academic disciplines take for granted; methodological errors go unchallenged.
First, studies often do not control for variables that are critical to understanding student performance. For example, Darling-Hammond, in NCTAF’s 1997 report Doing What Matters Most, asserts, “Students will achieve at higher levels and are less likely to drop out when taught by certified teachers.” She supports this claim with three studies, none of which controlled for students’ family backgrounds. Studies of teachers’ influence on student achievement need to include controls for students’ socioeconomic background, as this variable appears more important than any other variable in determining student achievement. (When citing these studies a second time in the 1999 report Teacher Quality and Student Achievement: A Review of State Policy Evidence, Darling-Hammond acknowledged this fact. It would have been more appropriate not to mention these studies at all.)
Second, conclusions are sometimes based on sample groups that do not mirror the general population. The work of Edward Begle, a respected mathematician, is one of eight studies cited by Evertson, Hawley, and Zlotnik (1985) as showing that little evidence exists to support increasing teachers’ knowledge of their subjects beyond what is typically required for certification. It’s true that Begle (1972) found that the number of mathematics courses taken by a teacher stopped making much of a difference at a certain point. But he points out a critical limitation of these findings that others disregard, including Evertson, Hawley, and Zlotnik: the teachers in the study were an elite group of science teachers who had been accepted by the National Science Foundation Summer Institute. They also felt comfortable enough with their knowledge of mathematics to volunteer to take the math test used for the study. It should come as no surprise that variation in the number of math courses taken by teachers in this group would not lead to different outcomes in student achievement. Differences in the number of math courses taken by teachers in the larger population of science teachers could still lead to variation in student achievement.
Third, conclusions are sometimes based on samples that are simply too small to produce results that are reliable or that can be generalized to the larger population. A study that included only a few teachers cannot yield reliable findings that apply to a larger population. Here is a sampling of studies on certification and the number of teachers that were involved: Bullough, Knowles, and Crow, 3 teachers; Davis, 29 teachers; Eisenberg, 28 teachers; Grossman, 3 teachers; Hall, 38 teachers; Hice, 40 teachers; Howe, 51 teachers; Hurst, 55 teachers; Lins, 27 teachers; Perkes, 32 teachers; McNeil, 38 teachers; Rothman, Welch, and Walberg, 35 teachers; Thoman, 29 teachers.
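To give a rough sense of why samples this small are a problem, the sketch below computes an approximate 95 percent confidence interval for a correlation using the standard Fisher z-transformation. The observed correlation of 0.3 is purely a hypothetical value chosen for illustration; it is not taken from any of the studies listed above, and only the sample sizes come from that list.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    observed in a sample of n teachers, using the Fisher z-transformation.
    Returns None when n <= 3, where the approximation is undefined."""
    if n <= 3:
        return None
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# ASSUMED_R is a hypothetical observed correlation, not a reported result.
ASSUMED_R = 0.3
for n in (3, 29, 55):                # sample sizes drawn from the list above
    print(n, fisher_ci(ASSUMED_R, n))
```

Under these assumptions, the interval for a 29-teacher study includes zero, and a 3-teacher study supports no interval at all; the interval narrows as the number of teachers grows.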
Fourth, some studies suffer from a serious statistical error known as aggregation bias that can lead to misinterpretations. For instance, in 1999, Darling-Hammond published a widely publicized study of the relationship between student performance on the 4th and 8th grade National Assessment of Educational Progress (NAEP) and the certification status of each state’s teachers. She found that states with higher student achievement also employed a greater percentage of certified teachers. The problem is the inability to match students with their teachers when using data that are aggregated at the state level. States differ in ways that make drawing conclusions about causation quite troublesome. For instance, states with higher percentages of certified teachers may also have strong accountability systems that focus their schools on student achievement. Or the families in states with higher percentages of certified teachers may place more emphasis on education, thereby raising student achievement. In short, many variables besides teacher certification might explain why scores were higher in some states than in others, but Darling-Hammond was able to control for very few of them.
In the study, Darling-Hammond acknowledges the likely distortions in her analysis: “Aggregating data to the state level produces different results than one would find if one looked at similar kinds of data at the individual student, teacher, school, or district level.” However, even the concession that the findings probably aren’t accurate doesn’t stop her from maintaining that the data are still useful “for the purposes of assessing broad policy influences at the state level.” In effect she is stating that the data, accurate or not, should be sufficient for making public policy.
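The mechanics of aggregation bias can be shown with a toy example. The sketch below builds five invented states in which a certified teacher’s students score exactly the same as an uncertified teacher’s students in the same state, while an unmeasured state-level factor (labeled `level` here) raises both the share of certified teachers and the average score. Every number is hypothetical and constructed only to illustrate the point; none of it is drawn from the NAEP or from Darling-Hammond’s data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def build_states():
    """Five hypothetical states of 10 teachers each, as (certified, score).
    'level' is an invented state-level factor that raises both the number
    of certified teachers and every classroom's score."""
    states = []
    for level in range(5):
        baseline = 200 + 10 * level      # same score for every teacher
        n_certified = 5 + level          # 5 to 9 of the 10 teachers
        states.append([(i < n_certified, baseline) for i in range(10)])
    return states

def state_level_correlation(states):
    """Correlation between share certified and mean score across states."""
    shares = [sum(c for c, _ in t) / len(t) for t in states]
    means = [sum(s for _, s in t) / len(t) for t in states]
    return pearson(shares, means)

def within_state_gaps(states):
    """Certified minus uncertified mean score, computed inside each state."""
    gaps = []
    for t in states:
        cert = [s for c, s in t if c]
        uncert = [s for c, s in t if not c]
        gaps.append(sum(cert) / len(cert) - sum(uncert) / len(uncert))
    return gaps

states = build_states()
print(state_level_correlation(states))   # strong state-level association
print(within_state_gaps(states))         # no gap inside any state
```

In this constructed case the state-level correlation is essentially perfect even though certification makes no difference in any individual classroom, which is why student-to-teacher matching matters before drawing causal conclusions.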
The Abell Foundation study is not the first to uncover the lack of rigorous and legitimate evidence on teacher certification. Charged by the U.S. Department of Education with combing the existing research on teacher preparation and subjecting it to scientific standards used in other fields, scholars with the Center for the Study of Teaching and Policy at the University of Washington (Wilson, Floden, and Ferrini-Mundy 2001) eliminated all but 57 studies written in the past 20 years. However, most of these 57 studies were “interpretive” case studies involving only a few teachers. The actual number of longitudinal or quasi-longitudinal studies that controlled for students’ backgrounds and used student achievement as the measure of whether teacher preparation made a difference was far smaller. Under the scholars’ own standard, only six studies containing any evidence that teacher certification is effective were left standing, a fact that was omitted in Wilson, Floden, and Ferrini-Mundy’s text.
Supporters of certification attribute the low student achievement in the nation’s poorest school districts at least partially to the high number of uncertified teachers working in these districts. However, this view remains unsupported by sound research. Moreover, the even lower number of certified teachers employed by the nation’s inner-city parochial schools and elite private schools would appear to discount this theory. If certification were crucial, certainly these schools would privilege certified teachers in order to satisfy their tuition-paying families. The theory that teacher certification leads to high-quality teaching is based more on what we think ought to be true (shouldn’t coursework in pedagogy and educational methods create better teachers? Shouldn’t teachers have to go through education school, just as lawyers go to law school and doctors go to medical school?) than on controlled experimentation. It is a leap of faith taken without the benefit of supporting evidence. The evidence, it turns out, is astonishingly deficient.
Reduced to their essence, the existing routes to teacher certification consist of little more than counting course titles on the college transcripts of prospective teachers. The process is incapable of providing any insight into an individual’s ability, intellectual curiosity, creativity, affinity for children, and instructional skills. Acting as a very crude proxy for teacher quality, the process is incapable of identifying significant, justifiable reasons for denying uncertified candidates access to the profession. A highly able candidate who does not take a required course is no more allowed to teach than the candidate who is poorly educated and unable to pass the teacher’s examination.
Determining who is qualified to teach is a task fraught with ambiguity and nuance, far more difficult than the mechanical process of counting a teacher’s coursework suggests. Given the faulty principles on which certification is based, it is not surprising that its value cannot be proven. Regulatory policy cannot supplant the need for human judgment.
-Kate Walsh is a senior policy analyst at the Baltimore-based Abell Foundation.