Standardized tests, an American obsession

With No Child Left Behind, there is an increasing focus on standardized testing as a means of assessment. This is not a new phenomenon, nor is it unique to the public school system. At every turn, there is a test waiting to be taken, whether for school, college or the workplace.

Some have claimed that our public school system’s obsession with test scores is part of a larger scheme for the central government to take over education and institute a “cradle-to-grave” planned economy. While these tendencies certainly exist, overemphasizing them ignores a deeper problem at the heart of our culture: why we accept, and even embrace, education plans that lean heavily on test scores and accountability. It also overlooks the fact that American parents overwhelmingly support these tests. Within the American psyche is a basic, albeit flawed, equation that states: “test scores = ability.”

Ironically, actual performance is routinely overlooked in favor of test scores when predicting a candidate’s subject mastery, readiness for higher education or ability to perform in the workforce.

What do these tests actually measure?

There are two issues in testing which every test designer, administrator and evaluator must take into consideration: validity and reliability. Validity is the degree to which a test measures what it is intended to measure. Reliability is the degree to which its results stay consistent when the test is repeated. Standardized tests have the allure of perceived validity because of their clear format and easily quantifiable scores. But how valid are they, really?

Take this item from New York’s practice test:

The year 1999 was a big one for the Williams sisters. In February, Serena won her first pro singles championship. In March, the sisters met for the first time in a tournament final. Venus won. And at doubles tennis, the Williams girls could not seem to lose that year.

The story says that in 1999, the sisters could not seem to lose at doubles tennis. This probably means when they played:

A. two matches in one day
B. against each other
C. with two balls at once
D. as partners

Is this test measuring reading skills or tennis knowledge? A good reader could probably figure out the answer, but the child who knows the rules of tennis has a decided advantage.

And beyond isolated exam items, is the story much different? What are these tests really measuring? Peter Sacks, in his book Standardized Minds, concludes that “scoring high on standardized tests is a good predictor of one’s ability to score high on standardized tests.” Research has not been able to correlate achievement on these tests with any future success in school or work. Some other facts pointed out in his book:

  • There is almost no relationship between scores obtained on the GRE and performance in graduate school.
  • A student’s high school record is the best predictor of early college success. Adding the results of the SAT to this adds little to the prediction of college performance.
  • There is a strong correlation between standardized test scores and the socioeconomic status of the parents. “The data is so strong in this regard that one could make a good guess about a child’s standardized test scores by simply looking at how many degrees her parents have and what kind of car they drive.”

And with little or no measurable benefit to students or to predicting future success, these tests have been allowed to shape instruction. According to educational researcher Bruce C. Bowers:

However, the main purpose of standardized testing is to sort large numbers of students in as efficient a manner as possible. This limited goal, quite naturally, gives rise to short-answer, multiple-choice questions. When tests are constructed in this manner, active skills, such as writing, speaking, acting, drawing, constructing, repairing, or any of a number of other skills that can and should be taught in schools are automatically relegated to second-class status.

But we ignore the research and continue to test in the same flawed manner.

