Norm-Referenced Test

Norm-referenced refers to standardized tests that are designed to compare and rank test takers in relation to one another. Norm-referenced tests report whether test takers performed better or worse than a hypothetical average student, which is determined by comparing scores against the performance results of a statistically selected group of test takers, typically of the same age or grade level, who have already taken the exam.

Calculating norm-referenced scores is called the “norming process,” and the comparison group is known as the “norming group.” Norming groups typically comprise only a small subset of previous test takers, not all or even most previous test takers. Test developers use a variety of statistical methods to select norming groups, interpret raw scores, and determine performance levels.

Norm-referenced scores are generally reported as a percentage or percentile ranking. For example, a student who scores in the seventieth percentile performed as well as or better than seventy percent of other test takers of the same age or grade level, and thirty percent of students performed better (as determined by norming-group scores).
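
The percentile arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not any test publisher's actual method: the function name percentile_rank, the "at or below" definition, and the ten norming-group scores are all assumptions made up for the example, and real norming processes use much larger samples and may define percentile rank somewhat differently.

```python
from bisect import bisect_right


def percentile_rank(raw_score, norming_scores):
    """Return the percentage of norming-group scores at or below raw_score.

    Assumption: "as well as or better than" is read as "at or below".
    """
    ordered = sorted(norming_scores)
    # bisect_right counts how many sorted scores are <= raw_score.
    at_or_below = bisect_right(ordered, raw_score)
    return 100.0 * at_or_below / len(ordered)


# Hypothetical norming group of ten raw scores (invented for illustration).
norming_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 38]

# A raw score of 27 is at or above 7 of the 10 norming scores.
print(percentile_rank(27, norming_group))  # 70.0
```

In this sketch, a raw score of 27 lands in the seventieth percentile because seven of the ten norming-group scores are at or below it, and the remaining three (thirty percent) are higher.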

Norm-referenced tests often use a multiple-choice format, though some include open-ended, short-answer questions. They are usually based on some form of national standards, not locally determined standards or curricula. IQ tests are among the best-known norm-referenced tests, as are developmental-screening tests, which are used to identify learning disabilities in young children or determine eligibility for special-education services. Major norm-referenced tests include the California Achievement Test, Iowa Test of Basic Skills, Stanford Achievement Test, and TerraNova.

The following are a few representative examples of how norm-referenced tests and scores may be used:

  • To determine a young child’s readiness for preschool or kindergarten. These tests may be designed to measure oral-language ability, visual-motor skills, and cognitive and social development.
  • To evaluate basic reading, writing, and math skills. Test results may be used for a wide variety of purposes, such as measuring academic progress, making course assignments, determining readiness for grade promotion, or identifying the need for additional academic support.
  • To identify specific learning disabilities, such as autism, dyslexia, or nonverbal learning disability, or to determine eligibility for special-education services.
  • To make program-eligibility or college-admissions decisions (in these cases, norm-referenced scores are generally evaluated alongside other information about a student). Scores on SAT or ACT exams are a common example.

Norm-Referenced vs. Criterion-Referenced Tests

Norm-referenced tests are specifically designed to rank test takers on a “bell curve,” or a distribution of scores that resembles, when graphed, the outline of a bell: a small percentage of students performing well, most performing near the average, and a small percentage performing poorly. To produce a bell curve each time, test questions are carefully designed to accentuate performance differences among test takers, not to determine if students have achieved specified learning standards, learned certain material, or acquired specific skills and knowledge. Tests that measure performance against a fixed set of standards or criteria are called criterion-referenced tests.

Criterion-referenced test results are often based on the number of correct answers provided by students, and scores might be expressed as a percentage of the total possible number of correct answers. On a norm-referenced exam, however, the score would reflect how many more or fewer correct answers a student gave in comparison to other students. Hypothetically, if all the students who took a norm-referenced test performed poorly, the least-poor results would rank students in the highest percentile. Similarly, if all students performed extraordinarily well, the least-strong performance would rank students in the lowest percentile.
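
The hypothetical above can be illustrated with a short Python sketch that scores the same raw results both ways. Everything here is assumed for illustration: the function names, the 50-item test length, and the two five-student groups are invented, and each student is ranked against classmates rather than against a separate norming group, which simplifies how real norm-referenced tests work.

```python
def percent_correct(correct, total_items):
    """Criterion-style score: share of items answered correctly."""
    return 100.0 * correct / total_items


def percentile_rank(raw_score, group_scores):
    """Norm-style score: share of the group scoring at or below raw_score."""
    at_or_below = sum(1 for s in group_scores if s <= raw_score)
    return 100.0 * at_or_below / len(group_scores)


TOTAL_ITEMS = 50  # hypothetical test length

# Invented raw scores (number of correct answers) for two groups.
weak_group = [8, 10, 12, 14, 20]
strong_group = [45, 47, 48, 49, 50]

best_of_weak = max(weak_group)        # least-poor result in a weak group
worst_of_strong = min(strong_group)   # least-strong result in a strong group

print(percent_correct(best_of_weak, TOTAL_ITEMS))      # 40.0
print(percentile_rank(best_of_weak, weak_group))       # 100.0
print(percent_correct(worst_of_strong, TOTAL_ITEMS))   # 90.0
print(percentile_rank(worst_of_strong, strong_group))  # 20.0
```

The student who answered only 40 percent of the items correctly tops the weak group, while the student who answered 90 percent correctly sits at the bottom of the strong group. The percent-correct score describes the student's own answers; the percentile rank describes only where the student stands relative to others.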

It should be noted that norm-referenced tests cannot measure the learning achievement or progress of an entire group of students, but only the relative performance of individuals within a group. For this reason, criterion-referenced tests are used to measure whole-group performance.

Reform

Norm-referenced tests have historically been used to make distinctions among students, often for the purposes of course placement, program eligibility, or school admissions. Yet because norm-referenced tests are designed to rank student performance on a relative scale (i.e., in relation to the performance of other students), norm-referenced testing has been abandoned by many schools and states in favor of criterion-referenced tests, which measure student performance in relation to a common set of fixed criteria or standards.

It should be noted that norm-referenced tests are typically not the form of standardized test widely used to comply with state or federal policies—such as the No Child Left Behind Act—that are intended to measure school performance, close “achievement gaps,” or hold schools accountable for improving student learning results. In most cases, criterion-referenced tests are used for these purposes because the goal is to determine whether schools are successfully teaching students what they are expected to learn.

Similarly, the assessments being developed to measure student achievement of the Common Core State Standards are also criterion-referenced exams. However, some test developers promote their norm-referenced exams—for example, the TerraNova Common Core—as a way for teachers to “benchmark” learning progress and determine if students are on track to perform well on Common Core–based assessments.

Debate

While norm-referenced tests are not the focus of ongoing national debates about “high-stakes testing,” they are nonetheless the object of much debate. The essential disagreement is between those who view norm-referenced tests as objective, valid, and fair measures of student performance, and those who believe that relying on relative performance results is inaccurate, unhelpful, and unfair, especially when making important educational decisions for students. While part of the debate centers on whether or not it is ethically appropriate, or even educationally useful, to evaluate individual student learning in relation to other students (rather than evaluating individual performance in relation to fixed and known criteria), much of the debate is also focused on whether there is a general overreliance on standardized-test scores in the United States, and whether a single test, no matter what its design, should be used—in exclusion of other measures—to evaluate school or student performance.

Perceived performance on a standardized test can potentially be manipulated, regardless of whether a test is norm-referenced or criterion-referenced. For example, if a large number of students are performing poorly on a test, the performance criteria (i.e., the bar for what is considered “passing” or “proficient”) could be lowered to “improve” perceived performance, even if students are not learning more or performing better than past test takers. Similarly, if a standardized test administered in eleventh grade uses proficiency standards that are considered to be equivalent to eighth-grade learning expectations, it will appear that students are performing well, when in fact the test has not measured learning achievement at a level appropriate to their age or grade. For this reason, it is important to investigate the criteria used to determine “proficiency” on any given test, particularly when a test is considered “high stakes,” since there is greater motivation to manipulate perceived test performance when results are tied to sanctions, funding reductions, public embarrassment, or other negative consequences.
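
The effect of moving the proficiency bar can be shown with a small Python sketch. The scores, the two cut scores, and the function name proficiency_rate are hypothetical values chosen for illustration, not figures from any real test.

```python
def proficiency_rate(scores, cut_score):
    """Return the percentage of scores at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut_score)
    return 100.0 * passing / len(scores)


# Invented scale scores for ten students; the scores never change below.
scores = [42, 48, 51, 55, 58, 61, 64, 70, 76, 83]

print(proficiency_rate(scores, 60))  # 50.0 with the original bar
print(proficiency_rate(scores, 50))  # 80.0 after the bar is lowered
```

The reported proficiency rate rises from 50 percent to 80 percent even though no student's score has changed, which is why the cut score itself needs scrutiny.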

The following are representative of the kinds of arguments typically made by proponents of norm-referenced testing:

  • Norm-referenced tests are relatively inexpensive to develop, simple to administer, and easy to score. As long as the results are used alongside other measures of performance, they can provide valuable information about student learning.
  • The quality of norm-referenced tests is usually high because they are developed by testing experts, piloted, and revised before they are used with students, and they are dependable and stable for what they are designed to measure.
  • Norm-referenced tests can help differentiate students and identify those who may have specific educational needs or deficits that require specialized assistance or learning environments.
  • The tests are an objective evaluation method that can decrease bias or favoritism when making educational decisions. If there are limited places in a gifted and talented program, for example, one transparent way to make the decision is to give every student the same test and allow the highest-scoring students to gain entry.

The following are representative of the kinds of arguments typically made by critics of norm-referenced testing:

  • Although testing experts and test developers warn that major educational decisions should not be made on the basis of a single test score, norm-referenced scores are often misused in schools when making critical educational decisions, such as grade promotion or retention, which can have potentially harmful consequences for some students and student groups.
  • Norm-referenced tests encourage teachers to view students in terms of a bell curve, which can lead them to lower academic expectations for certain groups of students, particularly special-needs students, English-language learners, or minority groups. And when academic expectations are consistently lowered year after year, students in these groups may never catch up to their peers, creating a self-fulfilling prophecy. For a related discussion, see high expectations.
  • Multiple-choice tests—the dominant norm-referenced format—are better suited to measuring remembered facts than more complex forms of thinking. Consequently, norm-referenced tests promote rote learning and memorization in schools over more sophisticated cognitive skills, such as writing, critical reading, analytical thinking, problem solving, or creativity.
  • Overreliance on norm-referenced test results can lead to inadvertent discrimination against minority groups and low-income student populations, both of which tend to face more educational obstacles than non-minority students from higher-income households. For example, many educators have argued that the overuse of norm-referenced testing has resulted in a significant overrepresentation of minority students in special-education programs. On the other hand, using norm-referenced scores to determine placement in gifted and talented programs, or other “enriched” learning opportunities, leads to the underrepresentation of minority and lower-income students in these programs. Similarly, students from higher-income households may have an unfair advantage in the college-admissions process because they can afford expensive test-preparation services.
  • An overreliance on norm-referenced test scores undervalues important achievements, skills, and abilities in favor of the narrower set of skills measured by the tests.