Rigorous Preschool Research Illuminates Policy (and Why the Heckman Equation May Not Compute)
I have previously written about the extent to which advocates for greater public investment in center-based programs for young children, including some education researchers, misrepresent the quality and relevance of the research they cite to support their claims.  Their behavior is captured by the metaphor attributed to 19th century Scottish novelist, Andrew Lang, about people who “use statistics as a drunken man uses lamp-posts, for support rather than for illumination.” 
On the positive side of the ledger, I can think of no other area of education in which policymaking and discussion rely as much on appeals to evidence. That the evidence is often weak, misleading, or irrelevant is a serious problem, but at least the action is on the right field of play. Leaning on the lamp-post for support may be a way station to using it for illumination. We have arrived at that way station with early education and childcare. The challenge now is to create a supply of new, high-quality evidence to address unanswered questions and to help consumers of existing research separate the wheat from the chaff.
Tennessee pre-K follow-up. One important recent contribution to the pool of high quality evidence is the follow-up of the Tennessee Voluntary State Pre-K program (TVPK) using the full randomized sample. 
The TVPK is a current, scaled-up, state-funded pre-K program in Tennessee that is offered to four-year-olds from low-income families and other high-risk children in that age group. It is a full school-day program that provides a licensed teacher and aide in every classroom. It meets 9 of the 10 standards for preschool program quality promulgated by the National Institute for Early Education Research, which means that it is one of the most highly rated state programs in the country.
Research on the program has been described in a previous Evidence Speaks report, and elsewhere.  The findings as previously reported are that the program produces strong effects at the end of the pre-K year favoring children in the pre-K program compared to the control group (e.g., pre-K children compared to controls know more letters of the alphabet), but that the effects switch sign to favor the control group as the children are followed into the early elementary school grades (e.g., control group children compared to pre-K children do better on tests of math skills).
Confidence in these previously reported findings has been appropriately tempered by a limitation in the research design caused by an imbalance in the proportion of parents randomly assigned to the two conditions of the study who gave written permission for their children to be tested by the evaluation team as they progressed through elementary school. The researchers responsible for the evaluation responded to this threat to their random assignment design by creating treatment and control groups through statistical matching of subjects whose parents had given permission for follow-up testing. It is this subsample of children who were tested in the early elementary grades and for whom results have been previously reported. Such a matched group design is weaker than a random assignment design in its ability to support strong causal conclusions because it doesn’t eliminate the possibility that the two groups differed at the outset of the study on variables that were not measured and therefore not included in the matching algorithm. The prominent economist, James Heckman, has dismissed the reported findings as “incompetent” based on the way data were analyzed.
The most recent findings from the study as reported a few weeks ago are based on the full original sample of children as randomized to treatment and control conditions at the outset of the study rather than the previously studied subsample of children whose parents gave permission for follow-up testing by the evaluation team. This was possible because the children originally randomized were subject to the state tests required of all children in public schools that are administered for the first time at the end of third grade.
Using the state test data and the full randomized sample, the evaluators report negative impacts for reading, math, and science scores at the end of third grade for children assigned to TVPK.  The negative impacts on math and science are statistically significant and substantive: children randomly assigned as preschoolers to TVPK had lost ground to their peers who had randomly not been offered admission to the pre-K program. The loss was equal to about 15 percent of the expected gap in test scores between black and white students at that age. In other words, children who were given the opportunity to attend TVPK were, on average, harmed by the experience in terms of their academic skills in elementary school. The reasons for this are not clear. One speculation is that the TVPK program was developmentally inappropriate, i.e., too much like kindergarten or first grade. 
There is no longer a methodological escape hatch for people who want to dismiss the results of the evaluation. It and the national Head Start Impact Study are the only two large-sample studies in the literature that have applied a random assignment design to modern scaled-up pre-K programs and followed children’s progress through school. Both show sizable positive effects for four-year-olds at the end of the pre-K year, but these effects have either diminished to zero by the end of the kindergarten year and stayed there in later grades (Head Start) or actually turned negative (Tennessee). Advocates of greater public investment in state pre-K programs are beginning to incorporate these results into their thinking. It is only very high quality research that can force such a reappraisal.
Recent findings based on systematic reviews. The early childhood research community, to its credit, has begun to come to grips with the mixed signals about longer term benefits that are being sent by the totality of the modern research literature on the impact of pre-K programs. An important case in point is a recent consensus statement by a panel of 10 distinguished researchers on what can be concluded from research on large scale pre-K programs:
Convincing evidence on the longer-term impacts of scaled-up pre-k programs on academic outcomes and school progress is sparse, precluding broad conclusions. The evidence that does exist often shows that pre-k-induced improvements in learning are detectable during elementary school, but studies also reveal null or negative longer-term impacts for some programs. 
In a similar vein, three other early childhood researchers, including the economist Greg Duncan, a titan of the field, recently wrote about the fleeting effects of pre-K in the Washington Post.  The headline captures their conclusions:
Preschool can provide a boost, but the gains can fade surprisingly fast. What children typically learn are skills they would pick up anyway.
This finding, just as was the case for the previously noted consensus statement, was driven by a comprehensive review of evidence based on articulated standards for quality and relevance. This is exactly what policymakers and the interested public need if their goal is to ground early childhood programs and practice on conclusions derived from solid knowledge, both of what we know and what we don’t know.
The Heckman Equation may not compute. James Heckman’s most recent empirical work and outreach on early childhood programs  has drawn a lot of attention in the media. For example, the Washington Post headline is “Why your children’s day care may determine how wealthy they become.”  For the New York Times, it is “How Child Care Enriches Mothers, and Especially the Sons They Raise.”  The headline writers are getting their take from Heckman through the advocacy materials on his website, The Heckman Equation.
At the broadest level, the conclusion tendered by journalists reporting on Heckman’s work and Heckman himself is that the U.S. needs to invest heavily in very expensive center-based child care for children from birth to kindergarten. And, we are told, the taxpayer will get 7.3 dollars back for every dollar spent in implementing Heckman’s policy recommendations.
A reasonable case can be made for providing much higher levels of public investment to support the childcare needs of lower-income families. Indeed, I’ve argued just that based on the pressing need of working families for that service, their inability to afford it, and the positive effects on all members of low-income families of having extra income.  But the research on longer-term outcomes for children from such investments is sparse, as is much needed research on the differential impacts of different ways of delivering such childcare supports, e.g., should we let families choose what they want or should there be standardized, government-run, center-based care?
It would be important and valuable if Heckman’s recent work filled in some of the large gaps in knowledge on this topic and pointed to the critical research that still needs to be carried out to increase the likelihood that new public investments will achieve their desired result. Unfortunately, the underlying research on which Heckman stands has serious problems in external and internal validity.
External validity is the extent to which research findings support generalization to other situations. Heckman’s generalizations are to present public childcare policies, but his data are from the Abecedarian project, an intensive birth-to-five early childhood intervention program delivered to roughly 50 children who were judged at birth to be at risk of mental retardation. All the children were from very low-income families, and nearly all were black. The program was designed and carried out by faculty and staff at the University of North Carolina at Chapel Hill in the 1970s.
Abecedarian was a full-day, year-round program with a very low teacher-child ratio (1:3 for infants and 1:6 for five-year-olds), and unusually qualified staff and management. Children received their primary medical care on site through staff pediatricians, nurses, and physical therapists. There was a defined curriculum intended to foster language and cognitive skills and thereby increase IQ. Many efforts were made to involve families through voluntary programs. Supportive social services were provided to families facing problems with housing, food, and transportation. The annual direct expenditure per participant has been reported as roughly $19,000 in present dollars. 
The Abecedarian program is not childcare. It is not a viable model for childcare. There is nothing now available to parents called childcare or daycare that is even grossly similar to Abecedarian in the program that is delivered, the characteristics and social circumstances of the children and families that are served, the teachers and staff who are employed, the age at which children are initially enrolled (6 weeks), the continuity of enrollment from infancy to 5 years, the delivery of on-site primary health care, program leadership and management, or costs. As a program from which one can generalize results with confidence to present day public policies on childcare, Abecedarian fails abysmally.
Internal validity is the degree to which the design and analysis of an evaluation of a program’s impact can support causal conclusions about whether the program worked. A study of the impact of a social program that has high internal validity is designed and analyzed such that there is confidence that the groups being compared differ only in their experience of the treatment program. This is best achieved through random assignment to treatment conditions, which assures (within the margin of error associated with the sample size) that differences between groups of subjects other than their experience of treatment are evened out by the laws of chance.
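The qualifier "within the margin of error associated with the sample size" can be illustrated with a small simulation. The sketch below is not drawn from any of the studies discussed here. The function name `baseline_gap_spread`, the standard-normal "trait," and the group sizes are all hypothetical choices made only for illustration. It repeatedly assigns families at random to two equal groups and records how far apart the groups happen to be on a pre-existing trait before any treatment is delivered.

```python
import random
import statistics


def baseline_gap_spread(n_per_group, n_trials=1000, seed=0):
    """Spread (std. dev. across trials) of the chance gap between two
    randomly assigned groups on a trait measured before treatment."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical pre-existing trait (e.g., family engagement), mean 0, SD 1.
        trait = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
        rng.shuffle(trait)  # random assignment: first half treatment, rest control
        treat, control = trait[:n_per_group], trait[n_per_group:]
        gaps.append(statistics.mean(treat) - statistics.mean(control))
    return statistics.pstdev(gaps)


if __name__ == "__main__":
    for n in (30, 60, 600):
        print(n, round(baseline_gap_spread(n), 3))
```

Under these assumptions the chance gap between groups centers on zero and narrows as the groups get larger, which is the sense in which randomization evens out differences. It also suggests why small randomized samples leave more room for chance imbalance than large ones.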
We’ve seen with the experience of the TVPK evaluation, as described previously, how important random assignment and internal validity are to the influence research findings can exercise on pre-K policy; and, as exemplified by Heckman’s criticism of the subsample analysis as “incompetent,” how vociferous the criticism will be of studies and analytic approaches that challenge dominant policy positions while falling short of the gold standard of methodological quality.
Abecedarian was designed as a randomized trial. The original evaluators randomly assigned 120 families to the Abecedarian treatment or to a no treatment control condition. Heckman’s analyses and conclusions depend on the integrity of that random assignment.
Unfortunately, there were compromises in the random assignment protocol that are likely to have pushed upwards the estimates of the program’s impact. Specifically, 12 percent of the families assigned to the treatment group declined participation in the study before the study began, whereas only 2 percent of the families in the control group declined data collection on their children. This is a big problem for internal validity because the parents who refused participation in the Abecedarian program were likely less willing or able to commit to a long-term early childhood intervention for their children than the overall population of families who were recruited into the study. Having these reluctant families out of the treatment condition but still in the control condition raises serious questions about whether the two groups that have been followed longitudinally in the years since were equivalent at the outset of the study. Further, the children in the families declining treatment were never tested and information on the families is limited. This means that it is impossible to control statistically for observable differences in the children who were lost at the point of randomization.
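To see how an imbalance of this kind can manufacture an apparent program effect, consider a minimal simulation sketch. It is an illustration under stated assumptions, not an estimate of the actual bias in Abecedarian. It assumes, hypothetically, that the 120 families were split evenly between conditions, that a single unmeasured "engagement" trait influences child outcomes, that the treatment families who decline are the least engaged, that control families who decline do so at random, and that the program itself has no effect at all. The function name and parameters are invented for this example.

```python
import random
import statistics


def simulate_attrition_bias(n_families=120, treat_decline=0.12,
                            control_decline=0.02, true_effect=0.0,
                            n_trials=2000, seed=0):
    """Average estimated effect across simulated trials when the least
    engaged treatment families decline after randomization."""
    rng = random.Random(seed)
    half = n_families // 2  # assumption: equal-sized groups
    estimates = []
    for _ in range(n_trials):
        # Hypothetical unmeasured family engagement, mean 0, SD 1.
        engagement = [rng.gauss(0, 1) for _ in range(n_families)]
        rng.shuffle(engagement)  # random assignment to conditions
        treat, control = engagement[:half], engagement[half:]
        # Assumption: treatment decliners are the least engaged families.
        treat_kept = sorted(treat)[round(treat_decline * half):]
        # Assumption: control decliners are unrelated to engagement.
        control_kept = control[round(control_decline * half):]
        # Outcome = engagement + program effect + noise.
        y_t = [e + true_effect + rng.gauss(0, 1) for e in treat_kept]
        y_c = [e + rng.gauss(0, 1) for e in control_kept]
        estimates.append(statistics.mean(y_t) - statistics.mean(y_c))
    return statistics.mean(estimates)


if __name__ == "__main__":
    print("with differential refusal:", round(simulate_attrition_bias(), 3))
    print("with no refusal:", round(simulate_attrition_bias(
        treat_decline=0.0, control_decline=0.0), 3))
```

In this setup the comparison with no refusals averages out near zero, as it should when the program does nothing, while the comparison with differential refusal favors the treatment group. The size of that gap depends entirely on the assumed link between refusal and engagement, which is precisely what cannot be checked in the real data because the declining families were never tested.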
Compounding this problem, Heckman reports something that, to the best of my knowledge, has not previously been reported about the sample selection process for Abecedarian: that the “program officers recruited additional subjects who were added to the program before the subjects were 6 months old. Our calculations indicate that there were eight replacements. We cannot distinguish in the data the subjects who were initially randomized from the replacement children and there is no documentation on how these subjects were recruited.”
Like the imbalance in decline-to-participate rates between families assigned to the control and treatment groups, the inclusion in the treatment group of children who were not randomly assigned to it nullifies the assumption, otherwise justified by random assignment, that the treatment and control children were equivalent at the outset of the study. As with the decline-to-participate problem, there is no way to control for this statistically, in this case because the identities of the children who got into the treatment group through the undocumented back door are not known.
In short, Abecedarian is not a child care intervention, nor a realistic model for one. It is a hothouse university-based program from nearly a half century ago for a few dozen children from very challenging circumstances who were deemed to be at risk of mental retardation. Its relevance to present day policies on child care for the general population is uncertain, at best. Even ignoring this failure of external validity, the reported results from Abecedarian favoring the treatment participants are in doubt because the evaluation of Abecedarian’s impacts on participants is seriously compromised by a large imbalance in decline-to-participate rates between those assigned to the treatment and the control conditions, and by the presence in the treatment group of an appreciable proportion of children who were not randomized into the condition. Thus, we have an evaluation with significant issues in internal and external validity that cannot be used confidently as the basis for conclusions about the policy directions this country should take in child care.
Summary and conclusions
Public policy discussions and decisionmaking on the early education and care of young children have come to rely heavily on appeals to evidence. That is a good thing, but too often the evidence in play has been methodologically weak, or of low relevance, or too ambiguous in outcomes to support the conclusions drawn from it. That situation is improving due to a greater appreciation of and reliance on research that is of high quality and high relevance as opposed to evidence that is selected and utilized primarily because it provides support for established points of view and policy preferences.
The U.S. needs more, and more effective, investments in the early education and care of young children. The TVPK evaluation and recent conclusions from prominent early education researchers on the fleeting or uncertain effects of preschool on children’s progress through later schooling send strong signals about how much we need to learn about how to do this productively.
This does not mean that new investments and approaches should wait until definitive research is in hand. I, for one, am very much in favor of spending a lot more now. But the recommendations I’ve made about how to do that, which are predicated on the value of family financial support and the necessity for heavily subsidized childcare for low-income families, are tentative.  I don’t know that I’m right. I’m providing my best guesses, as informed by evidence but far from compelled by it. That is the best that can be said for almost every policy prescription that is put forward in education and social policy.
This has clear implications for how government should proceed in addressing any policy proposals on early education and care—mine, Heckman’s, anyone else’s. Go forward with promising ideas with a public acknowledgement of uncertainty and an approach designed to learn from error. Don’t place big and irrevocable bets on conclusions and recommendations that are far out in front of what a careful reading of the underlying evidence can support. Very few policy prescriptions are slam dunks, even those that seem to have good research behind them. In the early education and care of children, just as in the rest of social policy, we need to be a learning society, prepared to try new approaches to address pressing problems and to learn systematically from trial and error in their implementation.
— Grover J. “Russ” Whitehurst
Russ Whitehurst is a Senior Fellow in the Center on Children and Families in the Economic Studies program at the Brookings Institution.
This post originally appeared as part of Evidence Speaks, a weekly series of reports and notes by a standing panel of researchers under the editorship of Russ Whitehurst.
The author(s) were not paid by any entity outside of Brookings to write this particular article and did not receive financial support from or serve in a leadership position with any entity whose political or financial interests could be affected by this article.
1. https://www.brookings.edu/research/does-pre-k-work-it-depends-how-picky-you-are/; https://www.brookings.edu/research/more-dubious-pre-k-science/; https://www.brookings.edu/research/do-we-already-have-universal-preschool/; https://www.brookings.edu/research/whitehurst-testimony-on-early-childhood-education-to-the-house-committee-on-education-and-the-workforce/
15. https://www.brookings.edu/wp-content/uploads/2017/03/es_20170309_whitehurst_evidence_speaks3.pdf; https://www.washingtonpost.com/opinions/this-policy-would-help-poor-kids-more-than-universal-pre-k-does/2016/07/28/3512bb14-420f-11e6-88d0-6adee48be8bc_story.html?utm_term=.94fc21a07eb0; https://www.brookings.edu/wp-content/uploads/2016/07/Family-support3.pdf
17. These percentages are based on the seven families refusing participation in the treatment group following randomization as reported by the original study authors in https://www.jstor.org/stable/1131410?seq=1#page_scan_tab_contents rather than the smaller number of four such families used by Heckman and his co-authors in https://cehd.uchicago.edu/wp-content/uploads/2017/01/abc_comprehensivecba_appendix-pub.pdf.
19. https://www.brookings.edu/wp-content/uploads/2017/03/es_20170309_whitehurst_evidence_speaks3.pdf; https://www.washingtonpost.com/opinions/this-policy-would-help-poor-kids-more-than-universal-pre-k-does/2016/07/28/3512bb14-420f-11e6-88d0-6adee48be8bc_story.html?utm_term=.94fc21a07eb0; https://www.brookings.edu/wp-content/uploads/2016/07/Family-support3.pdf