Rigorous Trials and Tests Should Precede Adoption of School Reforms

Paul Peterson
January 22, 1999
The Chronicle of Higher Education

Few doubt that education in our inner cities is in desperate need of improvement. Decades after the civil-rights movement began, equal educational opportunity remains more a slogan than a reality. Minority-group students in U.S. elementary and secondary schools continue to learn much less than their white peers, as measured by a wide variety of tests of student achievement, such as the SAT and the National Assessment of Educational Progress. Students who learn less in school are less likely to attend college, are likely to earn significantly less money later in life, and are less likely to establish a stable family life. Almost everyone agrees that something needs to be done. Yet there is little consensus on what to do.

Nostrums and ideological agendas abound in the absence of solid data on what will help to close this gap. Phonics, better-prepared teachers, instruction solely in English, single-sex schools, and the elimination of tracking students according to ability levels have all been peddled as answers. But no one has ever adequately tested any of those reforms, or most of the others that have been proposed. We simply do not have reliable evidence indicating that any of them do or would succeed in improving minority students' scores.

If such reforms were new drugs or medical procedures, the Food and Drug Administration would reject every one of them for lack of adequate supporting documentation. Not one of the reforms has been tested in a well-designed, large-scale, scientifically conducted experiment, though most of the proposed reforms are amenable to such trials.

Few people would deny the value of experiments in the physical sciences. The theories advanced by scientists from Newton to Einstein have been accepted only because they survived experimental tests. And even powerful theories have been modified when experimental evidence contradicted them. Of course, scientific experiments generally require the researcher to alter one condition or factor, while keeping all others constant. When humans are involved, this kind of experiment is harder to perform -- both because ethical considerations preclude research that might harm the subjects, and because it is virtually impossible to hold all but one human factor constant. Humans are too complex and too variable for us to be sure that all subjects are comparable in every way except one.

The solution is the randomized field trial, which has become the staple of medical research. In this sort of experiment, a reasonably large number of individuals are randomly assigned to one of two groups; then one group is exposed to the factor under investigation (say, a new drug), while the other -- the control group -- is not. If the experiment includes enough subjects, we can assume that the two groups, on average, are similar, except that one is exposed to the experimental condition. Thus, we can attribute significant differences in average outcomes between the two groups to exposure to the factor under investigation.
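The logic of random assignment can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a model of any real trial: the function name `run_trial`, the normally distributed baseline, the number of subjects, and the effect of 0.25 are all invented for illustration.

```python
import random
import statistics


def run_trial(n_subjects=10_000, true_effect=0.25, seed=0):
    """Simulate a two-group randomized trial; return the difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_subjects):
        # Each subject has a baseline the researcher cannot observe or hold constant.
        baseline = rng.gauss(0.0, 1.0)
        # A coin flip, not any trait of the subject, decides the group.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)  # exposed to the factor
        else:
            control.append(baseline)  # control group: not exposed
    return statistics.mean(treated) - statistics.mean(control)


estimate = run_trial()
print(f"estimated effect: {estimate:.3f} (simulated true effect: 0.25)")
```

Because chance alone decides who is exposed, the unobserved baselines average out across the two groups when the groups are large, and the difference in means lands close to the simulated effect. Calling `run_trial(n_subjects=50)` with several different seeds shows how much noisier the estimate becomes when a trial is small.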

Today, it is impossible to market a new medical product without demonstrating, by means of a randomized field trial, that the product both is effective and does not cause side effects that would make the cure worse than the disease. To be sure, such randomized trials do not ensure that every medical product is safe for everyone; some of the unfortunate side effects of Viagra, for instance, became clear only after thousands of men -- a far greater number than was practical for a field trial -- had used the product. But few of us would want to return to the days before R.F.T.'s, when doctors routinely used treatments that didn't work at all, or that did more harm than good.

Unfortunately, very few educational reforms have been tested in an R.F.T. In fact, according to the Harvard statistician Fred Mosteller and his colleagues, such experiments make up less than 1 per cent of published educational research. One notable exception is a Tennessee experiment on the effects of reducing class size. Legislators in Tennessee had debated whether reducing class size would enhance students' learning, and they ultimately authorized a large-scale R.F.T. to examine the question.

In the experiment, researchers found that first graders in classes of approximately 15 children had scores in reading and math on the Stanford Achievement Test that were 0.2 to 0.3 of a standard deviation higher than the scores of first graders in classes of approximately 25 children -- a difference that is generally regarded as quite substantial. (The difference between the average test scores of black and white students at all educational levels is now approximately one standard deviation.) If similar gains could be achieved for blacks in subsequent grades, making sure that black children are in small classes conceivably could bring the average scores of African-American students up to the average level of those currently achieved by white students.
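A difference "of a standard deviation" is a gap in average scores divided by the spread of scores among students. The sketch below shows one common way to compute such a figure, using a pooled standard deviation. The function name `standardized_difference` and the two lists of scores are made up for illustration; they are not data from the Tennessee experiment, and the study's own calculation may have differed.

```python
import statistics


def standardized_difference(treated_scores, control_scores):
    """Difference in mean scores, divided by the pooled standard deviation."""
    n_t, n_c = len(treated_scores), len(control_scores)
    var_t = statistics.variance(treated_scores)  # sample variance
    var_c = statistics.variance(control_scores)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    mean_gap = statistics.mean(treated_scores) - statistics.mean(control_scores)
    return mean_gap / pooled_sd


# Hypothetical scores, invented purely to exercise the function.
small_classes = [52, 55, 61, 48, 58, 63, 50, 57]
regular_classes = [50, 53, 59, 46, 56, 61, 48, 55]

effect = standardized_difference(small_classes, regular_classes)
print(f"standardized difference: {effect:.2f}")
```

Expressing results this way lets gains measured on different tests be compared with one another, and with a benchmark such as the one-standard-deviation gap cited above.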

However, the trial results indicated that continuing the small class sizes into the second and third grades only maintained -- but did not increase -- the size of the effect achieved in first grade. It may be especially important for students to learn the fundamentals of reading and math in small groups; class size may be less critical for more advanced material.

The debate over class size has diminished since the completion of this research. Many observers attribute Congress's recent passage of a law giving states $1.1-billion to hire more teachers, and thus enable districts to reduce class size in public elementary schools, at least in part to the persuasive results of the Tennessee study.

One of the most controversial of all educational reforms -- providing vouchers or scholarships to enable inner-city students to attend private schools -- has become the subject of a randomized field trial. My colleagues and I recently collected data from more than 1,400 students from low-income families in New York City. Half of the students were among those chosen by lottery from thousands of applicants to receive scholarships from the School Choice Scholarships Foundation to attend the participating private school of their choice. We selected the other students in our study at random from the applicants who did not win scholarships.

When we compared students using vouchers to attend private schools with the appropriate control group, we found that the average scores on the Iowa Tests of Basic Skills in grades four and five increased 0.18 of a standard deviation in reading, and 0.23 in math. Second-grade results were of similar magnitude, but we detected no significant effects in third grade. We will continue the study for two more years to see whether the students who stay in private schools maintain or increase these gains in scores. No one characteristic of the private schools seems to account for the differences; perhaps it is a combination of factors, or the better matching of students and schools that accompanies greater choice.

We must be careful not to make unsupported generalizations based on either the class-size or the school-choice study. A reform that works in one state may not succeed in another. For instance, reducing class size may not make any difference if we do not have a large supply of capable teachers. Allowing parents to choose their children's schools may help only if enough private schools are available. But at least prototypical examples of these educational reforms have been subjected to rigorous tests, allowing public debate over their use to proceed in a more informed way.

Other educational reforms may have larger effects on students' test scores. Unfortunately, nobody is conducting the necessary research to find out if this is so, or even to learn which solutions hold the most promise.

Admittedly, randomized experiments are not always possible. It would not be ethical to assign individuals randomly to groups, then ask the members of one group to smoke and the members of the second group to refrain. As a result, the only way to estimate the relationship between smoking and cancer is to take into account statistically all the other factors that might cause cancer. Unfortunately, it is easy to attack these kinds of statistical studies on the grounds that the analysis did not include some factor or other -- one of the reasons tobacco companies were able to argue against the obvious dangers of their products for as long as they did.
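Why an overlooked factor matters can be seen in a simulation. The following is a hedged sketch, with every number and name (`naive_difference`, the hidden factor, the 80 per cent and 20 per cent exposure rates) assumed for illustration. The simulated exposure has no effect at all on the outcome, yet a simple comparison of exposed and unexposed people suggests otherwise whenever a hidden factor influences both who is exposed and how they fare.

```python
import random
import statistics


def naive_difference(randomized, n=20_000, seed=1):
    """Mean outcome of exposed minus unexposed; the exposure itself does nothing."""
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        hidden = rng.gauss(0.0, 1.0)  # a factor the analyst failed to measure
        if randomized:
            # Exposure decided by coin flip, unrelated to the hidden factor.
            is_exposed = rng.random() < 0.5
        else:
            # Self-selection: people high on the hidden factor are exposed more often.
            is_exposed = rng.random() < (0.8 if hidden > 0 else 0.2)
        # The outcome depends on the hidden factor and noise, never on exposure.
        outcome = hidden + rng.gauss(0.0, 1.0)
        (exposed if is_exposed else unexposed).append(outcome)
    return statistics.mean(exposed) - statistics.mean(unexposed)


print(f"observational comparison: {naive_difference(randomized=False):.2f}")
print(f"randomized comparison:    {naive_difference(randomized=True):.2f}")
```

In the observational version the hidden factor produces a sizable apparent difference; under random assignment the difference falls close to zero. A statistical study can adjust only for the factors it has measured, which is why critics can always suggest that one more has been left out.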

Similar issues arise in education. For example, a number of statistical studies indicate that neither the amount spent per pupil nor the professional qualifications of teachers (the courses they have taken and the degrees they have earned) have much effect on students' learning. But in the absence of randomized experiments, no one really knows whether these findings are correct.

Certainly, not all R.F.T.'s produce clear results. The experiments may be designed badly, the researchers may not collect the relevant data, or the subjects who can be studied may differ too much from ordinary people for the results to be useful. We cannot ignore these issues, but the answer is not to forgo random experiments. Instead, as in medicine, we need to conduct more of them in different places, with varying kinds of students -- older and younger, middle-class and poor, urban and rural, and so forth.

But this brings up the matter of cost, perhaps the biggest obstacle to widespread use of R.F.T.'s. In medical research, pharmaceutical companies are willing to spend the millions of dollars required for R.F.T.'s, because the F.D.A. won't approve new drugs without such experiments; the companies can't make a profit on their drugs without paying for the tests.

In the case of vitamins and other products that cannot be patented, companies do not have the financial incentives to conduct tests, so few randomized field tests are conducted.

The same problem applies in education. If an R.F.T. shows that reducing class size, allowing parents to choose which schools their children attend, or any other reform is effective, other schools can readily copy the reform. With little opportunity for anyone to obtain a patent that can yield large profits, there are few incentives for anyone to conduct R.F.T.'s.

Several steps need to be taken. First, major foundations need to demonstrate the power of R.F.T.'s by helping to finance evaluations of widely discussed innovations. Second, state departments of education should routinely sponsor R.F.T.'s when they consider introducing an educational innovation.

Third, Congress should require the Department of Education's Office of Educational Research and Improvement to devote a significant proportion of its funds to R.F.T.'s. This requirement could be included in legislation to reauthorize the office, now being considered by Congress. The office has provided important information on the state of U.S. education by sponsoring longitudinal studies of educational trends. But, as a National Academy of Sciences report has pointed out, too much of the office's budget is being frittered away on surveys of educational practices and the budgets of a plethora of educational labs and centers whose work has seldom, if ever, had a major educational impact.

Finally, just as the F.D.A. certifies prescription drugs only after they have been tested in field trials, so the Department of Education should provide its stamp of approval on educational practices only after they have survived R.F.T.'s. Such a decision, by itself, would transform the world of education research and practice.

Yes, this would be a costly undertaking, involving hundreds of millions of dollars -- perhaps even more. But education is a large and vital sector of our society, in which we now spend more than $300-billion, often on practices that have not been systematically evaluated. So the eventual benefits would far outweigh the costs.

Some will object that children should not be subject to experimentation. But we are experimenting on them now, when we subject them to reforms that we implement before careful study, relying instead on the hunches or ideological preferences of legislators or members of school boards, or the theories of prominent educators. Advances in science have always depended on disciplined inquiry. In education, just as in medicine, we need hard facts, not untested guesses.