A high-stakes test is any test used to make important decisions about students, educators, schools, or districts, most commonly for the purpose of accountability—i.e., the attempt by federal, state, or local government agencies and school administrators to ensure that students are enrolled in effective schools and being taught by effective teachers. In general, “high stakes” means that test scores are used to determine punishments (such as sanctions, penalties, funding reductions, negative publicity), accolades (awards, public celebration, positive publicity), advancement (grade promotion or graduation for students), or compensation (salary increases or bonuses for administrators and teachers).
High Stakes vs. Low Stakes
A “low-stakes test” would be used to measure academic achievement, identify learning problems, or inform instructional adjustments, among other purposes. What distinguishes a high-stakes test from a low-stakes test is not its form (how the test is designed) but its function (how the results are used). For example, if test results are used to determine an important outcome, such as whether a student receives a high school diploma, the test would be considered a high-stakes test regardless of whether it’s a multiple-choice exam, an oral exam, or an essay exam. Low-stakes tests generally carry no significant or public consequences—the results typically matter far more to an individual teacher or student than to anyone else—and scores are not used to burnish or tarnish the reputation of teachers or schools.
While high-stakes tests come in many forms and may be used for a wide variety of purposes, the following list provides a brief overview of a few representative applications of high-stakes testing:
- Students: Test results may be used to determine whether students advance to the next grade level or whether they receive a diploma. For example, a growing number of states require students to pass a reading test to advance from third grade to fourth grade, while others require students to pass a test to graduate from high school.
- Educators: Test results may be used in the job-performance evaluations of teachers or to determine professional compensation. For example, in recent years more school reformers, elected officials, and policy makers have been calling for teacher pay (including bonuses), as well as hiring, firing, and tenure decisions, to be partly based on student test scores. For a related discussion, see value-added measures.
- Schools: Test results may be used to trigger penalties for schools, including negative public ratings, the replacement of staff members, or even closure. For example, some federal and state policies require that test results be used to impose a variety of consequences, such as firing or transferring some or all of a school’s administrators and faculty, or forcing a school to pay for additional services and transportation costs for students. In addition, standardized-test scores are increasingly being used, along with other measures, in various state and independent efforts to assign A–F letter grades to schools.
As a school-reform mechanism, the use of high-stakes testing generally rests on the belief that the promise of rewards or the threat of punishment will motivate educators to improve school performance, teaching effectiveness, and student achievement. By attaching rewards and punishments to test scores, the reasoning goes, students, teachers, and school administrators will take the tests seriously, make personal or organizational changes, and put in the necessary effort to improve scores. (It should be noted that this rationale is among the most contentious issues in education today, and research on human motivation suggests that such incentives and punishments may not work as intended. For a more detailed discussion, see Debate below.)
Another major rationale for high-stakes testing is that scores can be used to hold schools and teachers accountable for providing a high-quality education to all students, including student groups that may have historically underperformed academically or been underserved by schools, such as students of color, students with special needs, students from low-income households, and students who live in high-poverty communities or troubled urban areas. In this case, high-stakes testing is related to the concept of equity (fairness in education, access to learning opportunities, and greater equality in educational achievement, attainment, and benefits), and the strategy is broadly motivated by the desire to close learning gaps, achievement gaps, and opportunity gaps.
Another common feature of high-stakes testing is the public reporting of test results. While individual student scores remain confidential, average or aggregate test scores for schools, districts, and states are commonly reported in public forums, and they tend to receive widespread attention from parents, the media, and the general public. Given that schools are public institutions supported by tax revenues, the public reporting of test results is generally motivated by the belief that school performance should be transparent and publicly known, policies and government agencies should regulate schools and ensure quality, and parents and the public have the right to know when a school is underperforming and should have the opportunity to advocate for improvements.
While high-stakes tests are used for a wide variety of purposes, the following descriptions provide a few representative examples of the major ways in which high-stakes testing is used to influence the performance of students, educators, and schools:
- School and school-system reform: Perhaps the most widely discussed example of high-stakes testing is the 2001 federal law commonly known as the No Child Left Behind Act, which is considered one of the most far-reaching efforts to use high-stakes tests to drive the improvement of schools, teaching quality, and student achievement. In brief, the law (as it was originally designed, at least) mandates that each state develop learning standards and standardized tests to track school performance. The tests are administered at multiple grade levels to measure how well students are meeting the standards. The law also requires that test results be tracked and reported separately for different “subgroups” of students, such as minority groups, students from low-income households, students with special needs, and students with limited proficiency in English. By publicly reporting the test scores achieved by different schools and student groups, and by tying those scores to rewards, penalties, and funding, the law aims to improve schools deemed to be underperforming, as determined by test scores, and to close long-standing achievement gaps. To hold schools accountable for improvement, schools and districts have to report test results for a variety of student subgroups, such as African American and Hispanic students, and not just average results, which could mask differing levels of achievement among student groups. Schools and districts are then required to show that they are making progress toward “proficiency,” which is defined by each state and measured by state-developed standardized tests. Failing to achieve proficiency can trigger a variety of penalties, sanctions, and potential funding consequences, including the stigma of being labeled a “low-performing school.” For each year that a school fails to meet a state’s benchmarks for improvement, the stakes are raised and sanctions may become more severe.
Ultimately, a low-performing school might be closed, converted to a charter school, put under the management of a private company, or taken over by a state department of education, among other possible outcomes. While the No Child Left Behind Act is one of the most controversial and contentious educational policies in recent history, and the technicalities of the legislation are highly complex and still evolving (for example, the U.S. Department of Education recently granted states the option of submitting proposals that, if approved, would waive certain provisions in the law), it is one widely discussed example of high-stakes testing being used to influence school, teacher, and student performance.
- High school graduation: Generally speaking, widespread concerns that high school graduates lack the education and skills necessary to succeed in college, modern workplaces, and adult life have motivated calls for high-stakes graduation tests. The basic rationale is that diplomas should represent readiness for postsecondary learning and careers, and that students should not be allowed to earn a diploma if they haven’t acquired sufficient skills and knowledge. Citing complaints and evidence (such as surveys of college instructors and employers) that many public-school graduates are not well prepared, some reformers, elected officials, and policy makers have sought to base the awarding of high school diplomas on test results, which advocates perceive to be sufficient proof that students are prepared to move on. Many states have passed policies related to graduation tests, and some states have tried different approaches to address concerns about the fairness of such exams. For example, some states allow students who fail a test to retake it multiple times, and some allow students with disabilities to demonstrate proficiency in alternate ways. For related discussions, see measurement error, test accommodations, and test bias.
- Grade promotion: To help ensure that students are not simply moved on from one grade level to the next without acquiring the skills they will need to succeed academically, test results are sometimes used to determine whether students will be promoted in their education. Reading ability in the elementary grades, for example, is often the focus of these score-based promotion policies.
- Teacher evaluation: Student test results are being factored into teacher evaluations as part of a wide-ranging effort to reward “effective” teachers and to either support or penalize “ineffective” teachers. In recent years, many states have changed teacher-evaluation policies and systems to make student test scores a “significant” factor in the evaluation process. As a result, test scores can influence decisions related to compensation, tenure, hiring, and firing.
Debate
High-stakes testing is one of the most controversial and contentious issues in education today, and the technicalities of the debates are both highly complex and continually evolving. While a comprehensive discussion of this debate is beyond the scope of this resource, the following brief descriptions provide an illustrative overview of the major arguments commonly made for and against high-stakes testing.
Proponents of high-stakes testing may argue that the practice:
- Holds teachers accountable for ensuring that all students learn what they are expected to learn. While no single test can measure whether students have achieved all state learning standards (standardized tests can measure only a fraction of these standards), test scores are nevertheless one method used to determine whether students are learning at a high level.
- Motivates students to work harder, learn more, and take the tests more seriously, which can promote higher student achievement.
- Establishes high expectations for both educators and students, which can help reverse the cycles of low educational expectations, achievement, and attainment that have historically disadvantaged some student groups, particularly students of color, and that have characterized some schools in poorer communities or more troubled urban areas.
- Reveals areas of educational need that can be targeted for reform and improvement, such as programs for students who may be underperforming academically or being underserved by schools.
- Provides easily understandable information about school and student performance—in the form of numerical test scores—that reformers, educational leaders, elected officials, and policy makers can use to develop new laws, regulations, and school-improvement strategies.
- Gives parents, employers, colleges, and others more confidence that students are learning at a high level or that high school graduates have acquired the skills they will need to succeed in adulthood.
Opponents of high-stakes testing may argue that the practice:
- Forces educators to “teach to the test”—i.e., to focus instruction on the topics that are most likely to be tested, or to spend valuable instructional time prepping students for tests rather than teaching them knowledge and skills that may be more important.
- Promotes a more “narrow” academic program in schools, since administrators and teachers may neglect or reduce instruction in untested but still important subject areas such as art, health, music, physical education, or social studies.
- May contribute to higher, or even much higher, rates of cheating among educators and students, including coordinated, large-scale cheating schemes perpetrated by school administrators and teachers who are looking to avoid the sanctions and punishments that result from poor test performance. Systematically changing test answers, displaying correct answers for students while they are taking the test, and disproportionately targeting historically low-performing students for expulsion are just a few examples taken from recent scandals.
- Has been correlated in some research studies with increased failure rates, lower graduation rates, and higher dropout rates, particularly for minority groups, students from low-income households, students with special needs, and students with limited proficiency in English.
- May diminish the overall quality of teaching and learning for the same disadvantaged students who are the intended beneficiaries of high-stakes testing. Because of strong pressure on schools and teachers to improve test results and avoid score-based penalties, students of color and students from lower-income households and communities may be more likely to receive narrowly focused, test-preparation-heavy instruction instead of an engaging, challenging, well-rounded academic program.
- Exacerbates negative stereotypes about the intelligence and academic ability of minority students, who may worry so much about confirming negative racial stereotypes that they underperform on important exams (a phenomenon generally known as “stereotype threat”). And if such students perform so poorly that they fail a high-stakes graduation test, the failure will limit their opportunities in higher education or future employment, which perpetuates and reinforces the conditions that give rise to stereotypes.