Averaging in Education

It’s how you use it that counts

By Samuel T. Moulton, 09/07/2016


In his recent book The End of Average, Harvard neuroscientist Todd Rose argues that designing education for the average student is fundamentally misguided, because there are no average students (see “The Not-So-Golden Mean,” book reviews, Spring 2017). He deploys a central analogy from the history of aeronautical design: In the 1940s, well-trained U.S. Air Force pilots began mysteriously crashing their mechanically sound planes. It turned out that the new, more powerful jets were hard to control because of their one-size-fits-all cockpit design. Although the Air Force had used the average body measurements of its pilots to construct the cockpit, there were literally no pilots who fell within the average range on 10 key dimensions. To prevent future crashes, engineers scrapped the standard cockpit and replaced it with an adjustable one that could accommodate the “jagged profile” of individual pilots. Applying this example to education, Rose argues against a standardized education system designed for the average student and in favor of personalized learning designed for individual students.

How Many Students Are Average?

Inspired by Rose’s example, my colleagues and I at Panorama Education decided to ask the question, How many students are average? We analyzed data collected from an entire high school on nine measures of social-emotional learning (SEL): teacher-student relationships, self-efficacy, grit, growth mindset, social awareness, self-management, sense of belonging, school safety, and emotion regulation. Using the same statistical method as in the Air Force example—which defined average as being between the 35th and 65th percentiles on all variables—we found that only one out of the 1,142 students in the school was average across all nine SEL measures.

Conversely, we found that 63 students were average on zero of the nine measures. In other words, students in this school were 63 times more likely to be average on nothing than average on everything. This ratio is typical of the other school datasets we have examined. And because the SEL variables were all normally distributed, these findings have nothing to do with well-known problems of using averages for data that don’t fit a bell curve. Just as the military shouldn’t expect to find average pilots in its cockpits, schools shouldn’t expect to find average students in their classrooms. If anything, they should expect to find just the opposite.
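The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than Panorama's actual analysis: the function names are hypothetical, the data are synthetic, and the nine scores are assumed to be independent and normally distributed, which real SEL measures are not.

```python
import random

def percentile_band(values, low=0.35, high=0.65):
    """Return the (lower, upper) cutoffs of the middle band for one measure."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(low * last)], ordered[int(high * last)]

def count_average_measures(rows, low=0.35, high=0.65):
    """For each student (one row), count the measures that fall inside the band."""
    n_measures = len(rows[0])
    bands = [percentile_band([row[j] for row in rows], low, high)
             for j in range(n_measures)]
    return [sum(lo <= row[j] <= hi for j, (lo, hi) in enumerate(bands))
            for row in rows]

# Synthetic stand-in data (an assumption): 1,142 students by 9 independent,
# normally distributed scores. Real SEL measures are correlated, so the
# counts from real data will differ.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(1142)]

counts = count_average_measures(rows)
on_all = counts.count(9)   # students "average" on every measure
on_none = counts.count(0)  # students "average" on no measure
print(f"average on all nine: {on_all}, average on none: {on_none}")
```

Under the independence assumption, the chance of landing in the middle 30 percent on all nine measures is 0.3 to the ninth power (about 0.00002), while the chance of landing there on none is 0.7 to the ninth power (about 0.04), so "average on nothing" should be far more common than "average on everything" even before looking at real students.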

Application Matters

Based on these examples and analyses, it’s easy to conclude that averaging is bad. But another historical example of averaging suggests that things are not so simple. In 1907, Sir Francis Galton analyzed data from a county fair competition in which 800 participants guessed the weight of a slaughtered ox. Galton discovered that the average weight estimate (1,197 pounds) almost perfectly matched the actual weight of the dead ox (1,198 pounds). Moreover, the average estimate was better than all (or nearly all — it’s impossible to tell from the original article) of the individual estimates. The aphorism that two heads are better than one turns out to be true for much more than just ox estimates.
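To see why the crowd's average can beat most of its members, a small simulation may help. The sketch below does not use Galton's data: it invents 800 guesses whose errors are assumed to be independent and unbiased, with an arbitrary spread of 75 pounds, and the function name is hypothetical.

```python
import random
import statistics

def crowd_vs_individuals(true_value, guesses):
    """Return the mean guess and the share of individuals it outperforms."""
    crowd_estimate = statistics.fmean(guesses)
    crowd_error = abs(crowd_estimate - true_value)
    beaten = sum(abs(g - true_value) > crowd_error for g in guesses)
    return crowd_estimate, beaten / len(guesses)

# Simulated guesses (an assumption, not the 1907 records): each guess is the
# true weight plus independent, unbiased noise.
rng = random.Random(1907)
true_weight = 1198
guesses = [rng.gauss(true_weight, 75) for _ in range(800)]

estimate, share_beaten = crowd_vs_individuals(true_weight, guesses)
print(f"crowd estimate: {estimate:.1f} pounds")
print(f"share of individual guesses it beats: {share_beaten:.1%}")
```

When individual errors point in different directions, they largely cancel in the mean. If the guessers shared a common bias, the cancellation would be weaker, which is one reason the result depends on how the average is applied.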

Why is averaging across individual variation a disastrous strategy for designing cockpits but a winning strategy for estimating ox weight? How can the average both describe nobody and beat everybody?

The answer lies not in the statistic itself but in its application. When used to increase precision, understand group differences, measure intervention effects, or characterize populations, averaging across individual measurements can be extremely useful. But when used to characterize individuals or design individual experiences, averaging can lead us astray.

In the field of education, we shouldn’t think of the “average student,” however it is defined statistically (for example, using the mean, median, mode, or a percentile range), as more than a useful abstraction that allows us to summarize student data. As Todd Rose argues now and educational psychologist Lee Cronbach argued 60 years ago, education systems should be designed for individual students, and educators should avoid a “mean mindset.” But averaging is vital, too: It provides us insight into teacher efficacy; school performance; trends over time; group differences; and proposed policy, programmatic, or pedagogical interventions. We need averages to test hypotheses, measure success, and communicate data.

Some Better Ways to Average

How can educators and education researchers average more wisely? One way is through language, by replacing the phrase “the average student” with “on average, students.” Whereas the former personifies a statistic, the latter separates people from statistics.

Another way is through data reporting and visualization, by presenting individual-level data (for example, in histograms and scatterplots) rather than only averaged data (for example, in bar or line plots).

Finally, whenever we present an average, we should also convey how much individuals diverge from that average; such information offers necessary context.
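One simple way to follow this advice is to report a few measures of spread next to every mean. The helper below is a minimal sketch; its name and the particular statistics it returns (standard deviation, quartiles, and range) are illustrative choices, not a prescribed reporting standard.

```python
import statistics

def summarize(values):
    """Return the mean together with measures of how far individuals diverge."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }

# Two made-up classrooms with the same mean but very different spread.
uniform_class = [3, 3, 3, 3, 3]
varied_class = [1, 2, 3, 4, 5]

for name, scores in [("uniform", uniform_class), ("varied", varied_class)]:
    s = summarize(scores)
    print(f"{name}: mean={s['mean']:.1f}, sd={s['sd']:.2f}, "
          f"range={s['min']}-{s['max']}")
```

Both classrooms have a mean of 3, so a report of the average alone would make them look identical; the standard deviation and range show that they are not.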

A Case Study: Online Bullying

As an example, consider three descriptions of students’ self-reports of online bullying, based on Panorama data. Figure 1 shows the least sophisticated (and probably the most common) presentation. After reviewing the data as presented in Figure 1, the school’s principal would likely conclude that online bullying is only a “slight” problem at the school and focus his or her attention on other issues. But is this really the best course of action based on the underlying data?


A slightly more sophisticated analysis might include subgroups, as shown in Figure 2, and lead to a different outcome: The school’s principal might still conclude that the school is relatively safe but would probably not dismiss the issue of online bullying altogether. He or she might wonder why girls report more online bullying than boys do, and what the school could do to reduce this gender gap. But is this really the story these data tell?


Consider the third description, in Figure 3, which avoids a mean mindset altogether. It most fully represents the data and, in doing so, tells a very different story. This presentation doesn’t just emphasize individuals over averages: It reveals important realities that the others conceal. Although the likelihood of online bullying is low on average, lots of students are effectively sounding an alarm about their experiences, one that goes unheard in the first two presentations but comes across clearly in the third. From Figure 3, a principal would likely conclude that the problem of online bullying should be confronted rather than dismissed. And although there is a gender difference in online bullying, it’s relatively small compared with the differences among individuals. In addressing online bullying, a principal would be more likely to treat students as individuals first and as boys or girls second, and not view bullying as just a “girl problem.”
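Two numbers capture much of what the third presentation adds: the share of students giving high responses, and the size of the group gap relative to the spread among individuals. The sketch below uses a standardized mean difference (Cohen's d) for the second, which is a substitute chosen here for illustration and not necessarily how the figures were built. The responses are invented, and the 1-to-5 scale is an assumption.

```python
import statistics

def standardized_gap(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd

def share_at_or_above(values, threshold):
    """Fraction of responses at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)

# Hypothetical responses on an assumed 1-5 scale (1 = never, 5 = very often).
girls = [1] * 60 + [2] * 20 + [3] * 8 + [4] * 7 + [5] * 5
boys = [1] * 70 + [2] * 15 + [3] * 7 + [4] * 5 + [5] * 3
everyone = girls + boys

print(f"overall mean: {statistics.fmean(everyone):.2f}")
print(f"share answering 4 or 5: {share_at_or_above(everyone, 4):.1%}")
print(f"gender gap in pooled SD units: {standardized_gap(girls, boys):.2f}")
```

In data shaped like this, the overall mean sits near the bottom of the scale while a meaningful minority still reports frequent bullying, and the gap between groups is a fraction of one standard deviation.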


A Case Study: GPAs and Growth Mindset

Consider another example from the same dataset in which high school students’ cumulative grade point averages (GPAs) are related to their scores on Panorama’s Growth Mindset scale, which measures how much students believe they can change their intelligence, behavior, and other factors central to their school performance. Figure 4a typifies a mean-centric approach, and Figure 4b shows a better approach. Both presentations reveal that a growth mindset is associated with higher grades, but only Figure 4b reveals individual variability. Consumers of Figure 4a are more likely to overstate the relationship between GPA and a growth mindset, and consumers of Figure 4b are more likely to appreciate that mindset is neither a limiting factor nor a major determinant of academic success.
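The reason a plot of group means can make a relationship look stronger than it is can be shown with a short calculation. The sketch below assumes, for illustration only, that the mean-centric view reports average GPA at each mindset score; the data are simulated, the effect size and noise level are arbitrary, and the function names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def bin_means(xs, ys):
    """Average y within each distinct x value, as a plot of means would."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]

# Simulated students (an assumption): mindset score 1-5 and a GPA that rises
# slightly with mindset but varies widely between individuals.
rng = random.Random(4)
mindset = [rng.randint(1, 5) for _ in range(1000)]
gpa = [min(4.0, max(0.0, 2.5 + 0.15 * m + rng.gauss(0, 0.6))) for m in mindset]

levels, mean_gpa = bin_means(mindset, gpa)
print(f"correlation across individual students: {pearson_r(mindset, gpa):.2f}")
print(f"correlation across group means: {pearson_r(levels, mean_gpa):.2f}")
```

Averaging within each mindset level removes the variation among students at that level, so the means line up far more neatly than the individuals do. A reader who sees only the means has no way to tell how loose the underlying relationship is.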


Getting the Whole Story

Averages can enlighten as much as they can obscure. We should neither call for the end of the average nor mindlessly adopt a mean mindset. In some applications, including those in education, averaging can obscure important individual differences, lead to one-size-fits-no-one designs, and value sameness over uniqueness. But in other applications, averaging can usefully aggregate data, make predictions, reveal group differences or intervention effects, and efficiently communicate evidence. As educators and researchers, we can avoid throwing the baby out with the bathwater by investigating, analyzing, and communicating averages as well as individual differences.

Samuel T. Moulton is Research Director at Panorama Education in Boston, MA, and Research Associate in the Department of Psychology, Harvard University.
