Can We Predict Who Will Be a Great Teacher? An Interview with Allison Atteberry
As part of Bellwether’s forthcoming work on teacher preparation, we explored how much we know about teachers at various stages of their careers. How much do we know about a teacher before they enter the classroom? What about after they’ve been teaching a few years? Is any of this information strong enough to act on?
These sorts of questions led us to a paper by Allison Atteberry, Susanna Loeb, and James Wyckoff that looked at how well a teacher’s early-career performance predicted her effectiveness in subsequent years. I spoke with Atteberry, an Assistant Professor in the Research and Evaluation Methodology program at the University of Colorado Boulder, about the findings. What follows is a lightly edited transcript of our conversation.
Chad Aldeman: Your paper is titled, “Do First Impressions Matter? Improvement in Early Career Teacher Effectiveness.” For someone who hasn’t read the paper, what’s the answer: Do first impressions matter?
Allison Atteberry: What we mean when we say “first impressions” is how teachers seem to be performing in terms of value-added right as they start in the profession, that crucial first and second year. Ultimately what we find, particularly in math, is that these first impressions are important and are predictive of how teachers will do in the longer term as they move into years 3, 4, and 5.
The short answer is we can know, with some probability, what’s going on with teachers based on their early career performance in terms of how they’re going to be doing on value-added in future years.
Can you say something about what a normal trajectory would look like? How much can we expect teachers to improve in their first few years on the job?
That’s been a topic of a lot of research over the years, and findings across studies have been fairly consistent on this. We see pretty dramatic returns to experience on the order of 15-20 percent of a standard deviation on test scores over the first five years. That varies somewhat study to study, but generally speaking that’s the most notable growth period for teachers. Over this period, teachers go from being somewhat below-average to somewhat above-average. That’s the normal trajectory.
Do all teachers improve, or do some improve faster than others?
In some ways this has become the most interesting question. We wanted to go above and beyond what had been done before in looking at the average growth patterns and see what happens across the full spectrum of teachers.
In fact, most teachers do not follow the average pattern. Some may come into the profession relatively low-performing relative to their peers and stay low-performing. There are also teachers who start low-performing who rocket up in their first few years, but that’s not the modal pattern. In general there is a correlation between where you start and where you are a few years down the road.
Generally speaking, there’s a lot we can learn about the conditions for why some teachers find opportunities or supports that allow them to make considerable growth, while others do not. That’s not the focus of this paper, but it’s the next step after this paper.
If someone struggles in their first year of teaching, what are the chances they’ll be significantly better a few years later?
In this paper, we take teachers and separate them into quintiles based on their initial performance. One thing we do is ask, for teachers in the bottom 20 percent of initial performance, what percent are in higher quintiles in later years? What we find is that in math, of the teachers starting out in the bottom quintile, only about 5 percent of them end up in the top quintile, whereas about 60 percent of this group ends up in the bottom two quintiles in subsequent years.
Another way to make this same point is to think about comparing initially low-performing teachers to an average brand-new teacher. For example in math, two-thirds of the initially lowest performing quintile continue to perform below the mean performance of a brand new teacher even 3-5 years after entry. Thus, the future performance of more than two-thirds of the initially lowest performing quintile does not rise to match the performance of a typical new teacher.
This suggests some lack of movement. Not to say it never happens—it can, and it’s worth studying why—but that’s not the modal pattern.
Now that’s for math. The patterns are not quite as consistent when you look at English Language Arts. You don’t see quite as much of a tendency for early-career low-performing teachers to necessarily stay in the bottom two quintiles. We do see a little more movement. Another way to say that is that initial value-added scores are a little less predictive in ELA.
If a district actively identified and acted on early-career high- and low-performers, what would be the consequences or the trade-offs they’d be making? So, for example, if districts dismissed initially low-performing teachers or actively retained high-performers, what would happen over time, or what would happen under that sort of policy?
At the end of the day, this question has to be answered by the district, because the consequences really depend on what policy lever you have in mind; there are likely many productive ways a district could use this information aside from dismissals. But say you are talking about the possibility of not retaining pre-tenure teachers who repeatedly appear amongst the lowest performing during their first two years on the job. Our research shows that these teachers have a pretty low probability of moving into the top two quintiles in future years. However, even though the prediction of future performance based on early-career performance would be right most of the time, the district would also need to confront the fact that they’d definitely get it wrong some of the time. That is, if value-added scores were the sole criterion used to make this decision, you’re going to dismiss a small number of teachers who would have become good or even possibly great. The district would have to acknowledge that they’d open themselves up to making these kinds of Type I errors. And if you’re talking about dismissal, that’s obviously a high-stakes decision, it impacts teachers’ lives a lot, and in the long term it could affect some potential teachers’ willingness to enter the profession in the first place.
On the flip side, since we do find this early-career information is relatively predictive of future performance, one could view the situation through another lens. If you have the ability to predict with confidence that a certain teacher is unlikely to serve students well in the future, but the district doesn’t act on that information, the district will continue to assign students to this teacher even though a problem could have been identified early on. Further, because of what we know about the unequal sorting of teachers across schools, newer and struggling teachers are more likely to be assigned to historically underserved student populations. So not acting has potential costs for both student achievement and student equity.
However, the story really changes if, instead of tying early career performance to dismissals, you focus on a policy intervention such as targeted supports. When it comes to PD, districts often blanket-provide these services to teachers regardless of need. But if a district could use this early-career information to identify struggling teachers, then resources like mentoring, PD, coaching, or other supports could be more efficiently targeted to those who could benefit the most. It might be worthwhile to use this sort of information to identify teachers who need extra help to make meaningful improvement and get on a better trajectory.
Going back to this information about what we know about teachers and when, could you talk about the relative value of different pieces of information that we know about a teacher at different stages of their career? What do we know before a teacher enters the classroom, and how does that compare to what we know after 1 year or 2 years or multiple years?
If you look at the kinds of things we observe about teachers prior to stepping into the classroom (things like their licensure credential scores, their SAT score, the competitiveness of the undergraduate institution they attended, whether they did TFA or some other alternative certification pathway), it turns out that this set of things is not terribly predictive of how teachers will perform in their mid-career. Furthermore, some newer research by Dan Goldhaber and colleagues has suggested that what teacher preparation program a teacher attends is not strongly related to on-the-job performance. That’s a bit disappointing, because we’d love to be able to do a better job of selecting and training teachers to be predictably great once they start in the classroom. But that mostly hasn’t been borne out in the research.
That said, there has been a little bit of recent research suggesting some districts are doing things in the hiring process that might be more predictive when selecting teachers. For instance, Brian Jacob, Jonah Rockoff, and Eric Taylor have a draft paper on Washington, DC schools that looks at undergraduate GPA, as well as performance on a “mock” lesson that applicants are required to submit. Both of those turned out to be predictive of future teaching performance. So it’s not to say that we don’t have any room for improvement there. We do. But most of the research suggests that, at least right now, there’s not as much as we’d like to be able to do on the pre-career period.
However, once teachers enter the classroom, every year we start collecting information about how all their kids are doing in terms of student achievement, and a lot of districts have other evaluation systems that involve coaching or observational protocols. That information seems like it could be really fruitful for having a better sense of what might be coming down the line for a teacher and how they might serve students going forward.
Now, this is all within a pretty limited context of thinking about teacher performance in terms of value-added on student test scores, and that could be missing a lot about what makes a teacher great. So this is not to say that this is the perfect system by any means.
You’ve already talked about some of the policy implications of Type I versus Type II errors and different personnel decisions a district might make based on the types of information they have and when they have it, but are there any other policy implications you’d want people to take away from this work?
There are two things that come up for me right away. One is that a potential downside to this is that people might think it would be very easy to identify teachers who are initially low-performing and dismiss them quickly. That could have unintended consequences for who might go into the profession and for how people feel about being evaluated so quickly based on measures they might not entirely trust.
We know there’s sometimes a lack of trust in value-added measures in general, so the idea of basing such a high-leverage personnel decision on them could come with some greater costs to the profession. That’s worth acknowledging, and it’s worth thinking about how teachers might respond to a policy like that.
The other thing that’s an obvious connection here is thinking about the recent debate over teacher tenure policy which has been in the courts. People have been debating whether we decide too soon whether teachers should be granted tenure and whether that actually has some particularly harmful effects for students who are already historically underserved. Our research suggests that there is some promise in the first years of starting to get some predictive information about how teachers are performing. That information is imperfect and won’t always lead to the right decision, but it suggests that it might take a longer period of time than just a single year to really know with a sufficient amount of certainty whether to dismiss or retain a teacher.
Anything else you’d want people to take away from this body of research?
One other thing is that when thinking about what we can do to help shape a productive teaching workforce, there’s this dichotomy between trying to do things before teachers get into the classroom around their training and selecting, versus things we can do to intervene after they enter the workforce. They both have different advantages. Some strategies might have lower cost and be easier to implement. But they may only help with certain teachers or certain groups of students. Either way, we still have to do the hard work of learning how to help people who are dedicated to kids improve their practice. I think that’s one of the most fruitful avenues for future research.
– Chad Aldeman
This first appeared on Ahead of the Heard.