A standardized test is any form of test that (1) requires all test takers to answer the same questions, or a selection of questions from a common bank of questions, in the same way, and that (2) is scored in a “standard” or consistent manner, which makes it possible to compare the relative performance of individual students or groups of students. While different types of tests and assessments may be “standardized” in this way, the term is primarily associated with large-scale tests administered to large populations of students, such as a multiple-choice test given to all the eighth-grade public-school students in a particular state.
In addition to the familiar multiple-choice format, standardized tests can include true-false questions, short-answer questions, essay questions, or a mix of question types. While standardized tests were traditionally presented on paper and completed using pencils, and many still are, they are increasingly being administered on computers connected to online programs (for a related discussion, see computer-adaptive test). While standardized tests may come in a variety of forms, multiple-choice and true-false formats are widely used for large-scale testing situations because computers can score them quickly, consistently, and inexpensively. In contrast, open-ended essay questions need to be scored by humans using a common set of guidelines or rubrics to promote consistent evaluations from essay to essay—a less efficient and more time-intensive and costly option that is also considered to be more subjective. (Computerized systems designed to replace human scoring are currently being developed by a variety of companies; while these systems are still in their infancy, they are nevertheless becoming the object of growing national debate.)
While standardized tests are a major source of debate in the United States, many test experts and educators consider them to be a fair and objective method of assessing the academic achievement of students, mainly because the standardized format, coupled with computerized scoring, reduces the potential for favoritism, bias, or subjective evaluations. On the other hand, subjective human judgment enters into the testing process at various stages, for example in the selection and presentation of questions, or in the subject matter and phrasing of both questions and answers. Subjectivity also enters into the process when test developers set passing scores, a decision that can affect how many students pass or fail, or how many achieve a level of performance considered to be “proficient.” For more detailed discussions of these issues, see measurement error, test accommodations, test bias, and score inflation.
Standardized tests may be used for a wide variety of educational purposes. For example, they may be used to determine a young child’s readiness for kindergarten, identify students who need special-education services or specialized academic support, place students in different academic programs or course levels, or award diplomas and other educational certificates. The following are a few representative examples of the most common forms of standardized test:
- Achievement tests are designed to measure the knowledge and skills students learned in school or to determine the academic progress they have made over a period of time. The tests may also be used to evaluate the effectiveness of schools and teachers, or to identify the appropriate academic placement for a student (i.e., what courses or programs may be deemed most suitable, or what forms of academic support the student may need). Achievement tests are “backward-looking” in that they measure how well students have learned what they were expected to learn.
- Aptitude tests attempt to predict a student’s ability to succeed in an intellectual or physical endeavor by, for example, evaluating mathematical ability, language proficiency, abstract reasoning, motor coordination, or musical talent. Aptitude tests are “forward-looking” in that they typically attempt to forecast or predict how well students will do in a future educational or career setting. Aptitude tests are often a source of debate, since many question their predictive accuracy and value.
- College-admissions tests are used in the process of deciding which students will be admitted to a collegiate program. While there is a great deal of debate about the accuracy and utility of college-admissions tests, and many institutions of higher education no longer require applicants to take them, the tests are used as indicators of intellectual and academic potential, and some may consider them predictive of how well an applicant will do in a postsecondary program.
- International-comparison tests are administered periodically to representative samples of students in a number of countries, including the United States, for the purposes of monitoring achievement trends in individual countries and comparing educational performance across countries. A few widely used examples of international-comparison tests include the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS).
- Psychological tests, including IQ tests, are used to measure a person’s cognitive abilities and mental, emotional, developmental, and social characteristics. Trained professionals, such as school psychologists, typically administer the tests, which may require students to perform a series of tasks or solve a set of problems. Psychological tests are often used to identify students with learning disabilities or other special needs that would qualify them for specialized services.
Following a wide variety of state and federal laws, policies, and regulations aimed at improving school and teacher performance, standardized achievement tests have become an increasingly prominent part of public schooling in the United States. When focused on reforming schools and improving student achievement, standardized tests are used in a few primary ways:
- To hold schools and educators accountable for educational results and student performance. In this case, test scores are used as a measure of effectiveness, and low scores may trigger a variety of consequences for schools and teachers. For a more detailed discussion, see high-stakes test.
- To evaluate whether students have learned what they are expected to learn, such as whether they have met state learning standards. In this case, test scores are seen as a representative indicator of student achievement.
- To identify gaps in student learning and academic progress. In this case, test scores may be used, along with other information about students, to diagnose learning needs so that educators can provide appropriate services, instruction, or academic support.
- To identify achievement gaps among different student groups, including students of color, students who are not proficient in English, students from low-income households, and students with physical or learning disabilities. In this case, exposing and highlighting achievement gaps may be seen as an essential first step in the effort to educate all students well, which can lead to greater public awareness and changes in educational policies and programs.
- To determine whether educational policies are working as intended. In this case, elected officials and education policy makers may rely on standardized-test results to determine whether their laws and policies are working or not, or to compare educational performance from school to school or state to state. They may also use the results to persuade the public and other elected officials that their policies are in the best interest of children and society.
While debates about standardized testing are wide-ranging, nuanced, and sometimes emotionally charged, many debates tend to be focused on the ways in which the tests are used, and whether they present reliable or unreliable evaluations of student learning, rather than on whether standardized testing is inherently good or bad (although there is certainly debate on this topic as well). Most test developers and testing experts, for example, caution against using standardized-test scores as an exclusive measure of educational performance, although many would also contend that test scores can be a valuable indicator of performance if used appropriately and judiciously. Generally speaking, standardized testing is more likely to become an object of debate and controversy when test scores are used to make consequential decisions about educational policies, schools, teachers, and students. The tests are less likely to be contentious when they are used to diagnose learning needs and provide students with better services—although the line separating these two purposes is notoriously fuzzy in practice (thus, the ongoing debates).
While an exhaustive discussion of standardized-testing debates is beyond the scope of this resource, the following questions will illustrate a few of the major issues commonly discussed and debated in the United States:
- Are numerical scores on a standardized test misleading indicators of student learning, since standardized tests can only evaluate a narrow range of achievement using inherently limited methods? Or do the scores provide accurate, objective, and useful evidence of school, teacher, or student performance? (Standardized tests don’t measure everything students are expected to learn in school. A test with 50 multiple-choice questions, for example, can’t possibly measure all the knowledge and skills a student was taught, or is expected to learn, in a particular subject area, which is one reason why some educators and experts caution against using standardized-test scores as the only indicator of educational performance and success.)
- Are standardized tests fair to all students because every student takes the same test and is evaluated in the same way? Or do the tests have inherent biases that may disadvantage certain groups, such as students of color, students who are unfamiliar with American cultural conventions, students who are not proficient in English, or students with disabilities that may affect their performance?
- Is the use of standardized tests providing valuable information that educators and school leaders can use to improve instructional quality? Or is the pervasive overuse of testing actually taking up valuable instructional time that could be better spent teaching students more content and skills?
- Do the benefits of standardized testing—consistent data on school and student performance that can be used to inform efforts to improve schools and teaching—outweigh the costs—the money spent on developing the tests and analyzing the results, the instructional time teachers spend prepping students, or the time students spend taking the test?
- Do math and reading test scores, for example, provide a full and accurate picture of school, teacher, and student performance? Or do standardized tests focus too narrowly on a few academic subjects?
- Does the narrow range of academic content evaluated by standardized tests cause teachers to focus too much on test preparation and a few academic subjects (a practice known as “teaching to the test”) at the expense of other worthwhile educational pursuits, such as art, music, health, physical education, or 21st century skills?
- Do standardized tests, and the consequences attached to low scores, hold schools, educators, and students to higher standards and improve the quality of public education? Or do the tests create conditions that undermine effective education, such as cheating, unhealthy forms of competition, or unjustly negative perceptions of public schooling?
- Should some of the most important decisions in public education, such as whether to reduce or increase school funding or fire teachers and principals, be made entirely or primarily on the basis of test scores? Or are standardized-test scores, which could potentially be misleading or inaccurate, too limited a measure to use as a basis for such consequential decisions?
The Glossary of Education Reform by Great Schools Partnership is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.