Value-added measures, or growth measures, are used to estimate or quantify how much of a positive (or negative) effect individual teachers have on student learning during the course of a given school year. To produce the estimates, value-added measures typically use sophisticated statistical algorithms and standardized-test results, combined with other information about students, to determine a “value-added score” for a teacher. School administrators may then use the score, usually in combination with classroom observations and other information about a teacher, to make decisions about tenure, compensation, or employment. Student growth measures are a related—but distinct—method of using student test scores to quantify academic achievement and growth, and they may also be used in the evaluation of teacher job performance (see discussion below).
Value-added measures employ mathematical algorithms in an attempt to isolate an individual teacher’s contribution to student learning from all the other factors that can influence academic achievement and progress—e.g., individual student ability, family income levels, the educational attainment of parents, or the influence of peer groups. If, for example, teacher effectiveness were determined simply by looking at student test scores at the end of a school year, then teachers with the most highly motivated students from the most educated households would likely get much higher ratings than teachers whose students have troubled home lives, significant learning disabilities, or limited English-language proficiency. In reality, the latter teachers could be more skilled and effective than the former, but the test scores of a student population that faces significant learning challenges might not accurately reflect those teachers’ abilities.
By comparing a teacher’s effect on student learning only against that of other teachers working with similar types of students, value-added systems attempt to avoid comparisons that would be perceived as unfair (although the fairness and reliability of such calculations are matters of ongoing dispute). The measurements typically attempt to quantify how much more (or less) student achievement improved than would be expected based on past test scores and personal and demographic factors. While value-added measures use many different algorithms and statistical methods to gauge teacher effectiveness, most of them consider the past test scores of a teacher’s students.
The following highly simplified example will serve to illustrate the general process: To obtain a value-added score for a sixth-grade teacher, the fourth- and fifth-grade test scores of every student in the class might be collected. A mathematical formula would factor in the test-score data alongside a variety of other information about the students, such as whether the students are on special-education plans or whether their parents dropped out of school, completed high school, or earned a college degree. The formula would generate projected sixth-grade test scores for each student, and then the sixth-grade scores would be compared to the predicted scores after students take the test. The teacher’s value-added score would be based on the average difference between the actual scores earned by students and the predicted scores. (Here’s another way of phrasing it: value-added measures consider the test-score trajectory of the students in a given teacher’s class, at the time they arrived in the class, while also controlling for non-teacher factors, to determine whether the teacher caused the trajectory to increase, decrease, or stay the same.)
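The simplified process above can be sketched in code. The prediction rule and all test scores below are invented for illustration only—real value-added systems use far more elaborate statistical models and many additional covariates—but the final step is the same: the teacher’s score is the average gap between students’ actual and predicted results.

```python
# Hypothetical illustration of the simplified value-added process.
# The prediction rule and all scores are invented for demonstration;
# actual systems rely on much more sophisticated statistical models.

def predict_score(grade4, grade5):
    # Toy projection: assume next year's score continues the student's
    # recent trajectory (last score plus the year-over-year gain).
    return grade5 + (grade5 - grade4)

def value_added_score(students):
    # students: list of (grade4, grade5, actual_grade6) tuples.
    # The teacher's score is the average difference between actual
    # sixth-grade scores and the projected scores.
    residuals = [actual - predict_score(g4, g5) for g4, g5, actual in students]
    return sum(residuals) / len(residuals)

classroom = [
    (70, 74, 80),  # outperformed the projection (predicted 78)
    (85, 83, 81),  # matched the projection (predicted 81)
    (60, 66, 73),  # slightly above the projection (predicted 72)
]
print(value_added_score(classroom))  # → 1.0: one point above projections, on average
```

A positive score suggests the class, on average, grew more than projected; a negative score suggests it grew less. Everything hinges on how credible the prediction step is, which is where the controversies discussed later arise.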
Although the terms student-growth measures and student-growth percentiles are sometimes used interchangeably with value-added measures, the two approaches are technically quite different. Student-growth measures compare the relative change in a student’s performance on a specific test with the performance of all other students on that same test. The scores of all students are used to create an “index of student growth” and to identify a median achievement score that can be used as a point of comparison for all student scores—i.e., some students will show growth that is greater than the median, while others will show growth that is lower than the median. In contrast with value-added measures, student-growth measures do not attempt to control for outside factors that may influence a student’s relative improvement on a test, such as individual ability, family income, or the educational attainment of parents, for example.
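The growth-percentile idea can also be illustrated with a minimal sketch. The cohort scores and the simple ranking rule below are invented assumptions, not any state’s actual methodology; the point is only that each student’s growth is ranked against the growth of all test takers, with no adjustment for outside factors.

```python
# Hypothetical sketch of a student-growth comparison using invented
# scores. "Growth" here is simply the year-over-year score change;
# each student is then ranked against all other test takers.

def growth_percentiles(scores):
    # scores: dict mapping student -> (prior_score, current_score)
    growth = {s: cur - prior for s, (prior, cur) in scores.items()}
    ordered = sorted(growth.values())
    n = len(ordered)
    # Percentile = percentage of students with strictly lower growth.
    return {s: 100 * sum(v < g for v in ordered) // n for s, g in growth.items()}

cohort = {
    "A": (70, 78),  # growth  8
    "B": (82, 84),  # growth  2
    "C": (65, 70),  # growth  5
    "D": (90, 89),  # growth -1
}
print(growth_percentiles(cohort))  # → {'A': 75, 'B': 25, 'C': 50, 'D': 0}
```

In this toy cohort, student C sits at the median and serves as the point of comparison: A grew more than the median, B and D grew less. Nothing in the calculation controls for ability, income, or parental education, which is the key difference from value-added measures.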
Ongoing concerns about inadequate school performance, low standardized-test scores, and persistent achievement gaps, among other issues, have led school leaders, education reformers, elected officials, and policy makers to focus on improving the effectiveness of teachers in public schools. The basic rationale is that if schools can identify the most highly skilled teachers, improve the skills of lower-performing teachers, or remove ineffective teachers from the classroom, schools will be able to realize rapid improvements in test scores, graduation rates, and other indicators of performance.
In pursuit of this goal, some educators, researchers, and reformers have gravitated toward the idea that the most efficient or objective way to identify effective and ineffective teachers is to use student test scores as a primary indicator—and value-added measures are an extension of this general view. Consequently, educational experts, researchers, and statisticians have tried to create mathematical or more “scientific” models to isolate the impact of individual teachers from the numerous personal, situational, cultural, and familial factors that might influence student achievement and test performance. And many recent reform proposals—by the federal government, state legislatures, districts, and national reform organizations—have sought to connect value-added scores and student-growth percentiles to a variety of rewards and punishments, from increased school funding and teacher bonuses to penalties, poor performance ratings, and negative publicity.
The use of test-based, value-added measures in teacher-performance evaluations and compensation decisions is one of the most controversial and contentious issues in public education today. Many teachers unions, for example, have vigorously fought against policies and proposals that would make student test scores a factor in job-performance evaluations, and some have even gone on strike to protest certain proposals. Some educators argue that the practice is inherently unfair—because a wide variety of factors unrelated to teaching ability can influence test scores—and that the strategy will ultimately be self-defeating because it will drive good teachers away from working in the most high-need schools, or with the most high-need students, out of fear that they will receive poor ratings, be penalized with lower salaries, or even lose their jobs. Despite this resistance, some states have adopted, or are in the process of adopting, legislation and policies that will require a certain percentage of a teacher’s job-performance evaluation to be based on value-added scores and student-growth percentiles, with percentages that range from five or ten percent to as much as fifty percent. Yet, as with other applications of high-stakes testing, many educators, researchers, and experts have cautioned against using value-added ratings in isolation from other information to make important decisions about teachers.
The following are a few representative arguments that may be made by proponents of value-added measures:
- The need to improve school performance and educational results requires that the best teachers be identified and matched with the most needy students. Value-added measures offer an objective and consistent way to measure teacher effectiveness on a large scale.
- Current approaches to teacher evaluation have failed to distinguish effective from ineffective teachers. Many job-performance evaluations are highly subjective or flawed, which is why most teachers receive positive evaluations, even in cases where students are clearly underperforming. Value-added measures, even if they are not perfect, represent an improvement over existing systems.
- Struggling students cannot afford to spend years of schooling getting poor instruction from ineffective teachers—they will only fall further and further behind. The least-effective teachers must be given a reasonable opportunity to improve, and if they don’t they must be removed and replaced with better teachers. Value-added scores provide an objective measure for making these difficult decisions.
- Value-added measures are a fairer way to evaluate teachers, and the impact they have on their students, than considering student test scores or achievement levels in isolation from other influencing factors.
- The best teachers deserve to be identified by objective measures so they can be acknowledged and rewarded.
- Concerns about value-added measures are overstated, since some studies have shown them to be more accurate than other accepted and widely used teacher-evaluation methods, such as principal evaluations.
The following are a few representative arguments that may be made by critics of value-added measures:
- Value-added measures are ethically questionable, unproven, and not yet ready for real-world application. Evidence suggests that there is a significant risk that the measures will misidentify effective teachers as ineffective and ineffective teachers as effective. The potential consequences do not justify the risk.
- Research shows that out-of-school factors could account for up to eighty percent of the variation in student test scores, so it’s highly doubtful that value-added algorithms can accurately and reliably isolate the effect that an individual teacher has on student learning. The contributing issues are just too complex to be reduced to a single mathematical formula. Consequently, teachers are still liable to be blamed for factors that are beyond their control.
- The student-performance and testing data used in value-added calculations may be flawed or inaccurate, even if the value-added algorithm is considered sound. Since data can easily become corrupted by numerous factors (see measurement error), it’s ethically questionable to compensate or fire teachers on the basis of numbers that could be inaccurate or misleading.
- Basing teacher evaluations on test data is another high-stakes use of test results, and the method will likely contribute to or exacerbate the same problems associated with high-stakes testing, including cheating and teaching students only the narrow range of material evaluated on tests.
- Value-added scores for individual teachers can vary widely from year to year: the same teacher may be rated excellent one year and ineffective the next, even when his or her teaching strategies remain consistent. This volatility suggests that value-added measures can be imprecise or misrepresentative, with potentially significant consequences for teachers.
- Student-growth measures should not be used to rate teachers because they do not attempt to control for other influences on student achievement.