Disaggregated data refers to numerical or non-numerical information that has been (1) collected from multiple sources and/or on multiple measures, variables, or individuals; (2) compiled into aggregate data (i.e., summaries of data), typically for the purposes of public reporting or statistical analysis; and then (3) broken down into component parts or smaller units of data. For example, information about whether individual students graduated from high school can be compiled and summarized into a single graduation rate for a school or a graduating class, and annual graduation rates for individual schools can then be aggregated into graduation rates for districts, states, and countries. Graduation rates can then be disaggregated to show, for example, the percentage of male and female students, or white and non-white students, who graduated. Generally speaking, data is disaggregated for the purpose of revealing underlying trends, patterns, or insights that would not be observable in aggregated data sets, such as disparities in standardized-test scores or enrollment patterns across different categories of students.
While most disaggregated education data is numerical, it’s both possible and common to disaggregate non-numeric information. For example, educators, students, and parents in a school district may be surveyed on a topic, and the information and comments from those surveys could then be aggregated into a report that shows what the three groups—educators, students, and parents—collectively think and feel about the issue. The compiled information could then be disaggregated and reported for each distinct group to compare differences in how educators, students, and parents perceive the issue. Information collected during polls, interviews, and focus groups can be aggregated and disaggregated in a similar fashion.
To further illustrate the concept of disaggregated data and how it may be used in public education, consider a school with an enrollment of 500 students, which means the school maintains 500 student records, each of which contains a wide variety of information about the enrolled student, such as first and last name, home address, date of birth, racial or ethnic identification, date and period of enrollment, courses taken and completed, course grades earned, and test scores (the information collected and maintained on individual students is often called student-level data, among other terms). Once or twice a year, the school district may be required to submit student-enrollment reports to its state department of education. Each school in the district will then compile a report that documents the number of students currently enrolled in the school and in each grade level, which requires administrators to summarize data from all their individual student records to produce the enrollment reports. The district now has aggregate enrollment information about the students attending its schools. Over the next five years, the school district could use these annual reports to analyze increases or declines in district-wide enrollment, enrollment at each school, or enrollment at each grade level. The district could not, however, determine whether there have been increases or declines in the enrollment of white and non-white students based on the aggregate data it received from its schools. To produce a report showing distinct enrollment trends for different races and ethnicities, the district's schools would need to disaggregate the enrollment information by racial and ethnic subgroups.
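The enrollment example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and group labels are hypothetical, and a real student information system would hold far more detail. The sketch shows the same student-level records being summarized into an aggregate count and then broken down by grade level and by racial or ethnic subgroup.

```python
from collections import Counter

# Hypothetical student-level records; field names and values are
# illustrative only and do not come from any real data system.
students = [
    {"name": "Student A", "grade": 9, "race_ethnicity": "White"},
    {"name": "Student B", "grade": 9, "race_ethnicity": "Hispanic"},
    {"name": "Student C", "grade": 10, "race_ethnicity": "White"},
    {"name": "Student D", "grade": 10, "race_ethnicity": "Black"},
    {"name": "Student E", "grade": 10, "race_ethnicity": "Hispanic"},
]


def enrollment_by(records, field):
    """Count how many records share each distinct value of `field`."""
    return Counter(record[field] for record in records)


# Aggregate: one summary number for the whole school.
total_enrollment = len(students)

# Disaggregated: the same records broken down by a chosen characteristic.
by_grade = enrollment_by(students, "grade")
by_group = enrollment_by(students, "race_ethnicity")

print("Total enrollment:", total_enrollment)
print("By grade level:", dict(by_grade))
print("By race/ethnicity:", dict(by_group))
```

Note that a breakdown by subgroup is possible only because the student-level records still carry the relevant field. If a school reported just the grade-level counts, the district could not reconstruct the subgroup counts from them.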
Aggregated vs. Disaggregated Data
To aggregate data is to compile and summarize data; to disaggregate data is to break down aggregated data into component parts or smaller units of data. While this distinction between aggregated and disaggregated data may appear straightforward, there is a nuance worth discussing here: much of the “disaggregated” data in education is actually data that has been technically aggregated, at some level, from records maintained on individual students. For example, graduation rates are widely considered to be “aggregate data,” while graduation rates reported for different subgroups of students (say, for students of different races and ethnicities) are typically considered to be “disaggregated data.” Yet to produce reports that disaggregate graduation rates by race and ethnicity, data on individual students actually has to be “aggregated” to produce summary graduation rates for different racial subgroups. Most likely, this distinction between aggregated and disaggregated data arose because, historically, only aggregated data on school-wide, district-wide, or statewide educational performance was readily or publicly available. When investigating or reporting on topics such as aggregate data or disaggregated data, it is important to determine precisely how the terms are being used in a particular context.
Before the early 2000s, most state education agencies and districts only collected aggregate data on students enrolled in public schools. Today, however, all 50 states in the United States have state-level systems that collect and maintain student-level data, not just aggregate records, which allows state education agencies to produce both aggregate and disaggregated reports on school and student performance (public-school districts typically collect student-level data from schools, and states collect student-level data from districts).
While aggregate data such as high school graduation rates or average test scores can yield a variety of important insights, a significant number of school leaders, researchers, education reformers, and policy makers have advocated in recent years for the importance of disaggregating data to expose underlying trends and issues such as achievement gaps, opportunity gaps, learning gaps, and other inequities in the public-education system. If, for example, the only graduation data available are annual rates for individual schools, this aggregate data may hide significant disparities in graduation rates for students from low-income households, students of color, students with disabilities, or students who are not proficient in the English language. It’s possible for a school’s aggregate graduation rate to appear strong overall (say, 90 percent), but when the data are disaggregated for different groups of students, the disaggregation may reveal, for example, that more than 50 percent of the African American and Hispanic students in the school fail to graduate.
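The arithmetic behind this kind of hidden disparity can be shown with a small worked example. The numbers below are invented for illustration (they are not drawn from any real school), and the group names are placeholders. They are chosen so that the overall rate comes out to 90 percent while the smaller group graduates at well under 50 percent.

```python
# Hypothetical graduating cohort of 200 students; all counts are invented
# for illustration and the group names are placeholders.
cohort = {
    "group_a": {"students": 180, "graduates": 172},
    "group_b": {"students": 20, "graduates": 8},
}


def graduation_rate(graduates, students):
    """Return the graduation rate as a percentage."""
    return 100 * graduates / students


# Aggregate rate: pool every student before dividing.
total_students = sum(g["students"] for g in cohort.values())
total_graduates = sum(g["graduates"] for g in cohort.values())
overall_rate = graduation_rate(total_graduates, total_students)

# Disaggregated rates: compute the same measure separately for each group.
rates_by_group = {
    name: graduation_rate(g["graduates"], g["students"])
    for name, g in cohort.items()
}

print(f"Overall: {overall_rate:.1f}%")
for name, rate in rates_by_group.items():
    print(f"{name}: {rate:.1f}%")
```

Because the smaller group accounts for only 20 of the 200 students, its 40 percent graduation rate barely moves the school-wide figure, which is why the disparity is invisible in the aggregate number.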
When data are disaggregated, educators also have more detailed information about the educational performance and learning needs of certain groups of students, which allows them to design more appropriate or effective educational experiences and academic support. For example, disaggregated data may help school leaders and educators to direct limited resources—such as funding, staff time, or social services—where they are needed most (i.e., to those groups of students who are the furthest behind, struggling the most academically, or at greatest risk of dropping out).
Generally speaking, the main purpose of collecting and reporting both aggregated and disaggregated data is to provide useful information about the performance of public schools and students to those who are monitoring schools or working to improve them. While both forms of data are essential to understanding how the public-education system is working, aggregate-data reports are generally limited to the identification of broader trends and patterns in education, while disaggregated data are more useful for diagnosing deeper underlying problems such as disparities in educational performance among different student groups.
In public education, aggregate data have been widely collected and publicly reported for decades, and for the most part the use of aggregate data has been less controversial than the use of student-level data, primarily because aggregate data present far fewer concerns about student privacy than the collection, sharing, and use of data and personal information about specific students.
Although student safety, privacy, and confidentiality are more serious concerns with student-level data and personal information, disaggregated data may, in some cases, indirectly reveal the identities of specific students even when the data seemingly contains no personally identifiable information—i.e., information that might, directly or indirectly, reveal the identity or personal information of specific students. In some rural schools, for example, the minority student population may be very small—perhaps only one or two students of color in the entire school. If state or district records contain, say, test scores or proficiency levels for various racial subgroups, the identity of individual African American, Hispanic, or Asian students could be inadvertently revealed even though the disaggregated data are otherwise “anonymous” (by looking at the data, those who are familiar with the school, or who know who the minority students are, may be able to deduce which students earned which test scores, for example). For this reason, states, districts, and schools may mask or suppress (i.e., not publicly report or share) certain data when subgroups are small enough to potentially connect otherwise anonymous data to specific students.
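A minimal sketch of this kind of masking follows. It assumes a simple rule in which any subgroup below a minimum size is withheld from the published report. The threshold of 10, the field names, and the subgroup labels are all hypothetical; actual suppression rules vary by state and district and are often more involved than this.

```python
# Hypothetical minimum subgroup size; real thresholds are set by each
# state or district and are not specified here.
MIN_GROUP_SIZE = 10


def suppress_small_groups(results, min_size=MIN_GROUP_SIZE):
    """Return a publishable copy of `results` with small subgroups masked."""
    published = {}
    for group, row in results.items():
        if row["n"] < min_size:
            # Withhold both the count and the outcome for small subgroups.
            published[group] = {
                "n": None,
                "percent_proficient": None,
                "suppressed": True,
            }
        else:
            published[group] = {**row, "suppressed": False}
    return published


# Invented disaggregated results for one school.
results = {
    "subgroup_1": {"n": 140, "percent_proficient": 62.0},
    "subgroup_2": {"n": 2, "percent_proficient": 50.0},
}

report = suppress_small_groups(results)
for group, row in report.items():
    print(group, row)
```

Masking a single subgroup may not be sufficient on its own: if the school-wide total and all other subgroups are published, the withheld value can sometimes be recovered by subtraction, so additional figures may need to be masked as well.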
For a more detailed discussion of related debates, see personally identifiable information.
The Glossary of Education Reform by Great Schools Partnership is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.