For as long as there have been tests, there has been cheating. Consider the so-called cribbing garment, an undergarment that was worn by examinees during the administration of civil-service examinations in China more than 1,000 years ago. On the fabric, examinees would meticulously inscribe 722 essay responses to likely exam questions. Until recently, the phenomenon of cheating had been limited mainly to test takers. Thus efforts to ensure test security and the reliability of results focused mainly on detecting and preventing cheating by students. But the increasing use of tests to assess the performance of not just students but also teachers, principals, schools, and the education system as a whole has engendered a growing trend: that of educators themselves attempting to subvert accountability systems by artificially inflating student test scores. In short, the problem of cheating has spread from the examinees to the examiners.
Cheating by educators comes in many forms, ranging from the subtle coaching of students to the overt manipulation of test results. For instance, a colleague of mine tells of a principal who would begin each morning’s announcements with a greeting to students, such as, “Good morning students, and salutations! Do you know what a salutation is? It means ‘greeting,’ like the greeting you see at the beginning of a letter.” Students learned the meaning of words like “salutation” from the principal’s daily announcements; they probably never learned that his choice of words like “salutation” was made with the vocabulary section of the state-mandated, norm-referenced test in hand. A recent U.S. News & World Report article described a case in Ohio, where one educator is accused of physically moving a student’s pencil-holding hand to the correct answer on a multiple-choice question. Another widely reported case involved a principal in Potomac, Maryland, who stepped down amidst charges that she had gone through students’ test booklets in the classroom and called them up to change or elaborate on their answers.
This isn’t just a recent phenomenon. More than a decade ago, at an evening reception following a conference for school district superintendents in one midwestern state, I happened upon a conversation among several superintendents who, with cocktails in hand, were chuckling and winking about how their quality-control procedures for student testing involved “pre-screening the kids’ answer sheets for stray marks.” What was so funny—I found out later from one of the superintendents—was that stray marks included things like wrong answers. Wink, wink. This practice apparently continues. In Texas, where the accountability system is particularly rigorous, 11 school districts were investigated for an unusually high number of erasures on students’ answer sheets in 1999.
Despite their headline-grabbing nature, such blatant cases of cheating are probably rare. There is usually more subtlety involved, such as when a teacher prods a student to review his or her answer: “Why don’t you take another look at what you wrote down for number 17?” Some examiners cheat by failing to monitor their students properly while proctoring an exam. Others cheat by omission, such as when a teacher reminds students who are likely to attain low scores that it would be okay for them to be absent on the day of the test. A more sophisticated version of cheating by omission occurred in the Austin, Texas, school district in 1999. School administrators entered incorrect student-identification numbers on the answer sheets of low-scoring students, which invalidated their scores and thus raised the schools’ average performance. In states that force schools to give absent students a score of zero for reporting purposes (which eliminates the incentive to encourage them to stay home), all students are encouraged to attend school on test day. But some students are afforded “testing disability accommodations” such as an individual aide, reader, extra time, or other assistance that isn’t usually a part of the student’s educational experience.
Perhaps the largest cheating scandal to date involved teachers and principals in the New York City school district. In December 1999, Edward Stancik, the district’s special commissioner of investigation, released an exhaustive study that found that cheating by 12 educators was “so egregious that their employment must be terminated and they should be barred from future work with the [Board of Education].” For instance, one teacher would have her students first write their answers on a piece of scrap paper. She would then correct their answers before they bubbled in their official answer sheets. The report named another 40 educators who were recommended for disciplinary action; 35 of them had engaged in actions that the investigators judged serious enough to warrant potential termination. The report concluded that the school district had known about the extensive cheating by educators “for years,” and that “educators were not held fully liable for their misconduct.” A follow-up report named another ten educators who had engaged in seriously inappropriate behaviors during testing in New York City, some of them so blatant—such as writing answers to test questions on the chalk board—that immediate termination of employment was recommended.
Some opponents of testing see cheating by educators as a reason to abandon high-stakes accountability systems. For instance, last year Alfie Kohn, a prominent critic of testing, told Congressional Quarterly, “The real cheating going on in education reform is by those who are cheating students out of an education by turning schools into giant test-prep centers.” These critics view cheating as the natural, and not so reprehensible, result of placing undue emphasis on a metric—standardized tests—that yields an incomplete picture of both student and teacher performance. Some even view cheating as a kind of civil disobedience. But, to collect useful information about educational progress, testing is an indispensable if imperfect tool. And if tests are to yield useful information, their validity must be ensured. The answer to cheating is not to abandon accountability. The answer is to limit if not eliminate the cheating. What follows is a survey of what we know about the extent of cheating and some proposals to guard against the subversion of accountability systems.
Cheating can be defined as any action that violates the rules for administering a test. Commercial test publishers produce carefully scripted directions and clear guidelines for administering their tests. The guidelines lay out, in detail, all the actions that would compromise the test results. Similar instructions and rules accompany the customized tests that form the bedrock of many state accountability systems. Some states have strongly worded professional codes of ethics that explicitly define the responsibilities and boundaries associated with mandated testing. What constitutes cheating may even be codified in state law. For instance, the Ohio Revised Code proscribes “any practice that results solely in raising scores or performance levels on a specific assessment instrument without simultaneously increasing the student’s achievement level as measured by tasks and/or instruments designed to assess the same content domain.” The law provides for the termination of an offender’s employment, the suspension of an educator’s license for a violation, and the charging of an offender with a misdemeanor.
National organizations representing various professional associations have also developed standards for educators who administer standardized tests. For example, the American Federation of Teachers and the National Education Association helped to produce the Standards for Teacher Competence in Educational Assessment of Students, which require that teachers “should be skilled in recognizing unethical, illegal, or otherwise inappropriate assessment methods and uses of assessment information.” The most explicit statements regarding cheating can be found in the Standards for Educational and Psychological Testing, a product of collaboration among professional organizations in the fields of education and psychology. Among other guidelines, the standards explain that those involved in educational testing programs must “ensure that test preparation activities and materials provided to students will not adversely affect the validity of test score inferences” and “maintain the integrity of test results by eliminating practices designed to raise test scores without improving students’ real knowledge, skills, or abilities in the area tested.” In short, educators cannot plead ignorance; there has been no dissemination problem regarding what constitutes cheating. Professional codes of ethics that cover virtually every profession whose members work in school settings and state laws that govern those who have a license or credential in the field of education contain strict guidelines for administering tests. Anyone who is connected with testing in American education knows—or should know—how to conduct assessments that yield accurate and credible results.
The Consequences of Cheating
When cheating occurs, testing yields inaccurate information about individual students. The error is compounded when this information is then used for any educational purpose, and specific students wind up paying the price. One student may not receive the remedial instruction in reading that she needs. Another student may be incorrectly assigned to a special program for gifted and talented students that has a limited number of slots. Another may receive a scholarship that should have gone to one of his peers. And yet another may receive a diploma without having learned those minimum skills deemed necessary for success in college or on the job.
From the perspective of educational policy-making, the same invalidities that yield misleading test scores at the individual level also serve to muddle the interpretation of group test performance. Policy-makers and educational administrators have increasingly come to rely on group data to inform their decisions on staffing, curricula, professional development, and teacher-credentialing requirements and to measure the effectiveness of educational reforms. By distorting test results, cheating can lead to ill-advised initiatives, improperly focused resources, and inaccurate conclusions about the course of education reform. Given the confluence of achievement gains—in states ranging from Texas to North Carolina to New York—with the pervasive reports of cheating by educators, it is entirely reasonable to question how much of the former can be attributed to the latter.
Though no one has attempted it, in this article or elsewhere to my knowledge, the costs of cheating probably could be measured in dollars and cents. What cannot be measured are the effects of cheating at more fundamental levels. For example, when students learn that their teachers or principals cheat, what is the effect of this kind of role modeling? While fallen professional athletes might be able to say, “Don’t look at me as a role model, I am just an athlete doing a job,” educators cannot: a significant aspect of their job is the modeling of appropriate social and ethical behavior. Also, how might educator cheating affect students’ attitudes toward tests or their motivation to excel? How might it affect their attitudes toward education, their trust or cynicism with respect to other institutions, or their propensity to cheat in other contexts?
Research on Cheating
Shocking anecdotes don’t tell us much about how serious the cheating problem actually is. Just how prevalent is cheating by educators? Only a few studies have directly asked educators whether they have engaged in what have come to be referred to euphemistically as “inappropriate test administration practices.” The most common avenue of research is to poll educators regarding their general perceptions of cheating in their schools. One such study asked 3rd, 6th, 8th, and 10th grade teachers in North Carolina to report how frequently they had witnessed certain inappropriate practices. Of those polled, 35 percent said they had engaged personally in such practices or were aware of others’ unethical actions. The teachers reported that their colleagues engaged in a range of inappropriate practices two to ten times more frequently than they had. The practices included giving extra time on timed tests, changing students’ answers, suggesting answers to students, and directly teaching specific portions of a test. More flagrant examples included teachers’ giving their students dictionaries and thesauruses for use on a state-mandated writing test. One teacher revealed that she checked students’ answer sheets “to be sure that her students answered as they had been taught.” Other teachers reported using more subtle strategies, such as “a nod of approval, a smile, and calling attention to a given answer,” to enhance their students’ performance. In another study, of teachers who were drawn from two large school districts, 32 percent of the teachers surveyed reported allowing students to practice on old forms of standardized tests for two or more weeks.
A study initiated to investigate suspected cheating in the Chicago public schools included 40 schools. Of the 40 schools, 17 served as “control” schools, which were compared with 23 “suspect” schools that had exhibited irregularities in the performances of their 7th and 8th grade students on the Iowa Tests of Basic Skills (ITBS). The irregularities consisted of unusual patterns of score increases in previous years, unnecessarily large orders of blank answer sheets for the test, and high percentages of erasures on students’ answer sheets. The researchers readministered the ITBS under more controlled conditions and found that, even after accounting for students’ reduced level of motivation on the retesting, the “suspect” schools clearly did worse than the “control” schools on the retest. The researchers concluded that they might have underestimated how much cheating was going on at some schools. A study of cheating in the Memphis school district revealed extensive cheating on the California Achievement Test, including a teacher who displayed correctly filled-in answer sheets on the walls of her classroom.
The Perceptions of Educators
The most troubling stream of research on cheating concerns the attitudes of educators toward cheating. They seem increasingly indifferent toward cheating, and there appears to be a growing sense that cheating is a justifiable response to externally mandated tests.
Several studies have attempted to investigate educators’ perceptions of cheating. In a 1992 study, 74 pre-service teachers were asked to judge the appropriateness of certain behaviors. Only 1 percent thought that either changing answers on a student’s answer sheet or giving hints or clues during testing was appropriate, and only 3 percent agreed with the idea that allowing more time than allotted for a test was acceptable. But 8 percent thought that practicing on actual test items was okay; 23 percent judged the rephrasing or rewording of questions acceptable; and 38 percent thought that practice on an alternative test form was appropriate.
The beliefs of pre-service teachers appear to translate into actual practices when they enter the classroom. In 1991, a large sample of 3rd, 5th, and 6th grade teachers in two school districts was asked to describe the extent to which they believed specific cheating behaviors were practiced by teachers in their schools. On the positive side, a majority of respondents (shown in Figure 1) said that, for all of the behaviors listed but one, they occurred rarely or never. Equally noticeable, however, is that in several cases, 15 percent or more of the respondents reported that a behavior occurred “frequently” or “often.” A full 23 percent of the teachers said that they thought teachers often gave hints to children who were having difficulty. Twenty percent said they thought teachers “frequently” or “often” gave students extra time to finish their tests. Likewise, 20 percent said they thought teachers “frequently” or “often” gave students practice on passages that were highly similar to those used on the test.
A 1991 survey examined perceptions about two specific kinds of “test preparation” practices: having students practice for a state-mandated, norm-referenced test using another form of the same test or having students practice on the actual test to be used. The survey polled six groups of educators, including teachers, principals, superintendents, and school board members in California. The results, shown in Figure 2, reveal fairly broad acceptance of these behaviors, even among board members. For instance, 36 percent of teachers in California thought it appropriate to practice with current test forms.
What Can Be Done?
What can be done to address the problem of cheating? At some point, we will need to reconceptualize testing entirely. We must find more effective ways to link, consistently and directly, successful test performance to student effort and effective instruction. If poor performance were accompanied by sufficient diagnostic information about a student’s weaknesses, then all concerned might view identification and remediation of those weaknesses as more beneficial than cheating. Such an initiative will require changing much of the status quo in curriculum, instruction, and assessment. But there are less far-reaching, more pragmatic actions that can be taken immediately. The following list provides a start.
Dissemination. It has been said that we more often stand in need of being reminded than we do of education. As mentioned earlier, every large-scale testing program provides a description of appropriate test-administration procedures; state regulations define the boundaries of legal conduct for test administrators; and education-related associations have produced guidelines for sound testing practice. Nonetheless, those caught cheating often protest that they did not know the behavior was wrong. If only as a reminder, every implementation of high-stakes tests should be accompanied by dissemination of clear guidelines regarding appropriate testing practices. Such reminders should be clearly worded, pilot-tested to refine the meaning that educators take from the guidelines, and distributed and signed by all who handle testing materials.
Procedures. Some minor procedural changes would hinder the ability of educators to cheat. For example, bar-coding or other methods of identifying testing materials plus a system of tracking testing materials would be easy to implement. Federal Express and United Parcel Service know the location of every package at any given time and can reconstruct precisely the hands that a package has passed through. The same tracking can be used for testing materials. Other simple steps would include the sealing of cartons and bundles of testing materials; delaying the delivery of testing materials to schools until just before test administration; and, once delivered, requiring that materials be maintained securely by a named person who is responsible for them.
“Truth in testing.” States with so-called truth-in-testing laws should reconsider the relative benefits of those laws. They often require that the content of state-mandated tests be disclosed following the administration of a test. They have the best of intentions, but the unforeseen consequence of such laws has been an increase in educators’ use of previous versions of tests for classroom practice, resulting in further narrowing of instruction. Moreover, the economic costs to “truth in testing” states have been staggering. Disclosing each year’s tests renders them useless, making it necessary to develop entirely new monitoring instruments one or more times each year.
Scaling Back. The expansion of testing and accountability systems has elicited two reactionary responses to the concurrent rise in cheating: 1) that large-scale testing for accountability be abandoned; or 2) that testing for accountability rely more heavily on constructed-response formats that, ostensibly, would be less prone to corruption. For instance, it is more difficult to forge or coach a student’s answer to an essay question or a science experiment than to alter a bubbled-in response or to provide the key to a multiple-choice item.
The difficulty with these reactions is that they fail to address the core issues. High-stakes pupil testing arose in the 1970s in reaction to the complaints of some business leaders—along the lines of, “We are getting high school graduates who have a diploma, but can’t read or write!” As UCLA Professor of Education James Popham observed at the time: “Minimum competency testing programs . . . have been installed in so many states as a way of halting what is perceived as a continuing devaluation of the high school diploma.” The public perception was that the gatekeepers were leaving the gates wide open. Perhaps a widespread misunderstanding of the relationship between self-esteem and achievement was to blame. Educators understandably wanted all students to have the personal esteem associated with high achievement. But awarding higher grades in order to boost self-esteem and stimulate further achievement too often had neither effect. The sense that grades weren’t accurate measures of achievement led to the imposition of externally developed and administered tests.
Thus the obvious error in calls to return to the past is that such a strategy only returns American education to the situation that caused accountability tests to be introduced in the first place. Moreover, though current tests are susceptible to cheating, the solution of returning to measures and procedures that are even more easily manipulated is unthinkable.
Nevertheless, we should consider limiting the amount of testing for accountability. Some tests serve solely an instructional function; they serve our need for solid diagnostic information on students in order to make informed decisions about their educational programs. When these tests are used also for accountability purposes, the expanded incentives to cheat can corrupt the information we have on student progress. Likewise, not all tests—especially those designed for purposes of decision-making—need have instructional value. If we clarify the purpose of each test, we can minimize the scope of mandated accountability tests, the time required for their administration, and the opportunities for cheating.
Consequences. In conjunction with limiting opportunities for cheating, we must revise the procedures that are used to unearth cheating and the penalties that are handed down. Many tests are currently administered behind closed classroom doors with little independent oversight; there are strong disincentives for educational personnel to report cheating; and, in most jurisdictions, the responsibility for investigating cheating rests with school personnel who have an inherent conflict of interest in ferreting out inappropriately high student achievement. Revised procedures should include: 1) random sampling and oversight of test sites; 2) increased protections for whistle-blowers; 3) stiffer penalties for cheaters, including permanent disqualification from teaching within a state and more coordinated sharing among the states of information regarding educators who have had their licenses revoked; and 4) assignment of investigative responsibilities to an independent authority.
A Qualified Boon
As we are learning, accountability is not an unqualified boon to American education. Nascent accountability systems have been difficult to implement and have had some undesirable consequences. For instance, in some situations the use of tests as the primary accountability mechanism has resulted in an extreme narrowing of what students are taught. When so much is at stake, educators tend to limit their instruction to the content that is covered on a mandated test. Moreover, some educators perceive the imposition of an externally mandated test as an inappropriate intrusion into an area of professional practice and discretion. They believe that their knowledge of a student’s true ability far exceeds whatever information can be gleaned from a single test; thus the idea that a single test should not be used against any student (for example, to deny grade-to-grade promotion or a high school diploma) is widespread among teachers. Here the same moral reasoning that many current teachers learned in their education courses during the 1970s may come into play: the ends do justify the means; cheating is a justified response to a system that punishes students and teachers on the basis of incomplete information.
But cheating itself leads to inaccurate information and misguided decisions about students. It signals that students have learned the skills we want them to, when in fact they haven’t. It threatens the values that we hope to impart to students via those we have charged with their education. It leads to mistaken conclusions about the efficacy and pace of needed educational reforms. Even if cheating is limited to a minority of educators, as it most likely is, its effects are devastating. It is no more justifiable than telling a sick patient that he is well and then sending him on his way.
–Gregory J. Cizek is an associate professor of educational measurement and evaluation at the University of North Carolina at Chapel Hill and the author of Cheating on Tests: How to Do It, Detect It, and Prevent It (1999).