Archive for July, 2015

Student Subgroup

LAST UPDATED:

In education, student subgroup generally refers to any group of students who share similar characteristics, such as gender identification, racial or ethnic identification, socioeconomic status, physical or learning disabilities, language abilities, or school-assigned classifications (e.g., special-education students). While “student subgroup” may be applied informally to any number of locally defined groups of students, the term typically refers to specific categories of students defined in federal and state legislation (and related rules and regulations) or used in data-collection processes, public reporting, research studies, statistical analyses, and other formal governmental or academic mechanisms employed to track the educational performance and attainment of particular groups of students.

In the United States, however, the term student subgroup is predominantly associated with a specific set of federally defined student subgroups for which public-education data are collected and reported by schools, districts, and state education agencies in accordance with requirements outlined in the 2002 No Child Left Behind Act. The law requires states to publish annual public reports on the educational performance of students across several distinct subgroup classifications outlined in Section 1111 of the Elementary and Secondary Education Act: economically disadvantaged students, students from major racial and ethnic groups, students with disabilities, and students with limited English proficiency.

Because the student subgroups widely used in public education and related data reporting are typically determined by legislation and regulatory guidance, it should be noted that student subgroups are (1) subject to regular modification or redefinition when applicable laws, rules, or regulations change, and are (2) defined in complex technical documentation that may be difficult to parse and interpret, even for specialists in the field. For these reasons, it is important to determine precisely how a student subgroup is being used or defined—and why—when investigating or reporting on the topic.

The following section provides a brief overview of a few of the most common student subgroups used in public education:

  • Gender Subgroups: The two gender subgroups widely used in public education are male and female. While historically these student subgroups have not been controversial, growing awareness of and sensitivity to students identifying as transgender poses potential complications for this approach to subgroup classification.
  • Racial and Ethnic Subgroups: When the law was originally passed, the No Child Left Behind Act required states to report data for the following racial and ethnic subgroups: (1) African American or Black, (2) American Indian or Alaska Native, (3) Asian or Pacific Islander, (4) Hispanic, and (5) White. Following changes in federal reporting guidelines for racial and ethnic data in 2007, a new subgroup of “two or more races” was introduced, among other modifications. Students who identify themselves as being of more than one major racial or ethnic group are now reported as part of this subgroup by state education agencies (and cannot be counted as part of any other racial or ethnic subgroup). In addition to the racial and ethnic subgroups required by the No Child Left Behind Act and used for official federal reporting, some states or school districts may choose to collect and report education data on other racial and ethnic subgroups that are large enough to report on meaningfully. For example, some districts or states may report Filipino, Puerto Rican, or Hmong students separately, typically because they enroll large numbers of these students and want to track and monitor the achievement of these groups.
  • Students with Disabilities Subgroup: Any student with an Individualized Education Program, as defined by the Individuals with Disabilities Education Act, is reported in the “students with disabilities” subgroup. Students are counted as part of this subgroup for the entire time they are receiving special-education services in a public school and may continue to be counted for up to two years after exiting a special-education program.
  • Students with Limited English Proficiency Subgroup: Students who are classified by their school as “limited English proficient,” often abbreviated as LEP, are reported in this subgroup. In general, districts and schools will use English-language tests or other forms of assessment to determine whether students are proficient in the English language. Students who have been designated as limited English proficient may continue to be counted in this subgroup for up to two years after they are deemed proficient in English. For a more detailed discussion of this topic, see English-language learner, long-term English learner, and the U.S. Department of Education’s guidance on limited English proficient students.
  • Economically Disadvantaged Subgroup: Historically, schools, districts, and governmental agencies have defined students as “economically disadvantaged” based on their eligibility to receive free or reduced-price lunch under the National School Lunch Program. In light of recent changes to the administrative guidelines for the program, however, which may result in more schools providing all students with free lunches regardless of eligibility, schools, districts, and state education agencies may have to consider alternative mechanisms to monitor economically disadvantaged student populations in the future.
  • Migrant Subgroup: Students are assigned “migrant status” when a parent or guardian’s principal means of livelihood is migratory work, typically in the agricultural or fishing industries. Migrant students move frequently from one school district to another as their parent or guardian obtains temporary or seasonal employment. The U.S. Department of Education’s Migrant Education Program oversees the relevant regulations and definitions for this student subgroup.
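To make the counting rules above concrete, here is a minimal Python sketch that assigns one student record to reporting subgroups. It is illustrative only: the `StudentRecord` fields, the subgroup labels, and the `GRACE_YEARS` window are hypothetical stand-ins, and the sketch applies only the rules described above (a student reporting more than one race is counted only in “two or more races,” and former special-education and limited-English-proficient students may be counted for up to two years after exiting). Actual federal and state rules are more detailed and have changed over time.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical window: former students may be counted for up to two years after exit.
GRACE_YEARS = 2


@dataclass
class StudentRecord:
    """Simplified, hypothetical student record; real data systems differ."""
    races: Set[str] = field(default_factory=set)  # self-identified racial/ethnic groups
    has_iep: bool = False                          # currently has an IEP
    years_since_iep_exit: Optional[int] = None     # None if never exited special education
    is_lep: bool = False                           # currently classified as LEP
    years_since_lep_exit: Optional[int] = None     # None if never reclassified as proficient
    economically_disadvantaged: bool = False


def counts_in_window(current: bool, years_since_exit: Optional[int]) -> bool:
    """True if the student is currently served or exited within GRACE_YEARS."""
    if current:
        return True
    return years_since_exit is not None and years_since_exit <= GRACE_YEARS


def reporting_subgroups(s: StudentRecord) -> Set[str]:
    groups: Set[str] = set()
    # More than one race: counted only in "Two or more races", not in each race.
    if len(s.races) > 1:
        groups.add("Two or more races")
    elif len(s.races) == 1:
        groups.add(next(iter(s.races)))
    if counts_in_window(s.has_iep, s.years_since_iep_exit):
        groups.add("Students with disabilities")
    if counts_in_window(s.is_lep, s.years_since_lep_exit):
        groups.add("Limited English proficient")
    if s.economically_disadvantaged:
        groups.add("Economically disadvantaged")
    return groups


# Example: a multiracial student who exited special education one year ago.
student = StudentRecord(races={"Asian", "White"}, years_since_iep_exit=1)
print(sorted(reporting_subgroups(student)))
# ['Students with disabilities', 'Two or more races']
```

Note that one student can belong to several subgroups at once (for example, a racial subgroup and the economically disadvantaged subgroup), which is why the function returns a set rather than a single label.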

Reform

Before the No Child Left Behind Act became law in 2002, most state education agencies and school districts only collected aggregate data on students enrolled in public schools—i.e., data on the overall performance of all students in a given school or district. Today, however, all 50 states in the United States have systems that collect and maintain student-level data, not just aggregate records, which allows state education agencies to produce both aggregate and disaggregated reports on school and student performance. Specifically, states can now report on the academic achievement and educational attainment across the major student subgroups described above.

While data such as high school graduation rates or average test scores can yield a variety of important insights, a significant number of school leaders, researchers, education reformers, and policy makers have advocated in recent years for collecting, tracking, and monitoring data on student subgroups to expose underlying trends and issues such as achievement gaps, opportunity gaps, learning gaps, and other inequities in the public-education system. If, for example, the only graduation information available is the annual rate for each school, those figures may hide significant disparities in graduation rates for students from low-income households, students of color, students with disabilities, or students who are not proficient in the English language. It’s possible for a school’s graduation rate to appear strong overall (say, 90 percent), but when the data are disaggregated for different student subgroups, the results may reveal, for example, that more than 50 percent of the African American and Hispanic students in the school fail to graduate, or that only 25 percent of English-language learners earn a diploma.
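The arithmetic behind this kind of example can be sketched in a few lines of Python. The cohort below is invented purely for illustration (two hypothetical subgroups, “Subgroup A” and “Subgroup B”), and the function name is our own; the point is only that one school-wide rate can sit on top of very different subgroup rates.

```python
from collections import defaultdict


def graduation_rates(cohort):
    """cohort: iterable of (subgroup, graduated) pairs, one per student.

    Returns the overall rate and a dict of rates by subgroup.
    """
    totals = defaultdict(int)
    graduates = defaultdict(int)
    for subgroup, graduated in cohort:
        totals[subgroup] += 1
        graduates[subgroup] += int(graduated)
    overall = sum(graduates.values()) / sum(totals.values())
    by_subgroup = {g: graduates[g] / totals[g] for g in totals}
    return overall, by_subgroup


# Invented cohort of 200 students: 180 in Subgroup A, 20 in Subgroup B.
cohort = (
    [("Subgroup A", True)] * 172 + [("Subgroup A", False)] * 8
    + [("Subgroup B", True)] * 8 + [("Subgroup B", False)] * 12
)
overall, by_subgroup = graduation_rates(cohort)
print(f"overall: {overall:.0%}")  # overall: 90%
# Prints "Subgroup A: 96%" and then "Subgroup B: 40%".
for group, rate in sorted(by_subgroup.items()):
    print(f"{group}: {rate:.0%}")
```

Because the smaller subgroup contributes only 20 of the 200 students, its 40 percent rate barely moves the 90 percent school-wide figure.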

When data are reported for different student subgroups, educators also have more detailed information about the educational performance and learning needs of specific groups of students, which allows them to design more appropriate or effective educational experiences and academic support. For example, student-subgroup data may help school leaders and educators to direct limited resources—such as funding, staff time, or social services—where they are needed most (i.e., to those groups of students who are the furthest behind, struggling the most academically, or at greatest risk of dropping out).

Generally speaking, the primary purpose of collecting and reporting data on different student subgroups is to provide useful information about the performance of public schools and students to those who are monitoring public schools or working to improve them. While both aggregate and subgroup data are essential to understanding how the public-education system is working, district-level or school-level reports (i.e., aggregate data) are generally limited to the identification of broader trends and patterns in education, while subgroup data is used to identify deeper underlying problems—specifically, disparities in educational performance and attainment across different student groups.

Debate

While the use of student subgroups is generally not the subject of significant debate in public education (most educators, school leaders, policy makers, and reformers typically support the practice), the act of classifying and sorting individuals into broad groups tends to give rise to some level of debate or controversy. For example, a student’s gender, racial, or ethnic identification may not easily fit into or be accurately described by existing student subgroups, and consequently discussion, debate, or dispute may arise when students identify as transgender or mixed race.

In addition, social stigma associated with poverty, disability, language ability, or citizenship status—and the broader political and societal debates about these issues—may also intersect in a variety of ways with the definition, classification, and public reporting of student subgroups in education. For example, given the culturally sensitive and often ideologically contentious nature of the peripheral issues raised by the participation of non-English-speaking students in the American public-education system—including politicized debates related to citizenship status, English primacy, immigration reform, and social-services eligibility for non-citizens—it is perhaps unsurprising that students who are not proficient in English, and the instructional methods used to educate them, can become a source of debate (e.g., a significant number of states have adopted “English as the official language” statutes, and citizen referendums have passed in other states prohibiting instruction in Spanish or other languages except in special cases—see dual-language education for a related discussion).

De-identified Data

In education, de-identified data generally refers to data from which all personally identifiable information has been removed, i.e., data about individual students, teachers, or administrators that has been rendered anonymous by stripping out any information that would allow people to determine an individual’s identity. Common forms of personally identifiable information include first and last names, home addresses, Social Security numbers, and other types of information that may reveal, intentionally or inadvertently, an individual’s identity in a given set of data. The primary reason for “de-identifying” data is to protect the privacy or identity of the individuals associated with the data.

De-identified data are commonly used for research purposes in education. For example, a state education agency might hire an organization or university to study the results or impact of an educational policy, such as a recent expansion of state-subsidized pre-kindergarten programs. The researchers would then request the data they need to conduct the study (e.g., records showing the number of students enrolled in pre-kindergarten programs over a ten-year period), and the education agency would assemble the necessary datasets. Before releasing the data files to the researchers, however, the agency would use a “de-identification process” to prevent individual identities from being revealed in the information provided to the external researchers. In many cases, the education agency and the research organization will also sign a formal agreement specifying how the data can be used and how files need to be disposed of once the study has been completed.

Data may also be de-identified when an education agency, district, or school shares information with external organizations and individuals not authorized to access or view personal information—for example, consultants and companies under contract to provide specialized services to districts and schools.

It is important to note that some datasets may indirectly reveal the identities of specific students or individuals even when the data seemingly contains no personally identifiable information. For example, some small, rural schools have very small minority student populations—perhaps only one or two students of color in the entire school. If state or school records contain, say, test scores or graduation rates for various racial subgroups, the identity of individual African American, Hispanic, or Asian students could inadvertently be revealed even though the data are otherwise “anonymous.” For this reason, states and schools may not publicly report or share certain data when subgroups are small enough to potentially connect otherwise anonymous data to specific students.

The most common strategies for de-identifying data are deleting all personal information in a data file and either “suppressing” or “masking” a selection of data so that the remaining information cannot be used to identify individuals. For more detailed discussions, see data masking and data suppression.
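As a minimal sketch of the first step (deleting personal information from a data file), the Python function below drops a list of direct identifiers from each record. The field names are hypothetical; a real de-identification process would also apply suppression or masking, because, as noted above, removing names and numbers alone may not prevent indirect identification.

```python
# Hypothetical names of fields that directly identify a student.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "home_address", "ssn", "date_of_birth"}


def strip_direct_identifiers(records):
    """Return copies of the records with direct identifiers removed."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


records = [
    {"first_name": "Ana", "last_name": "Diaz", "ssn": "000-00-0000",
     "grade": 3, "pre_k_enrolled": True, "score": 412},
]
print(strip_direct_identifiers(records))
# [{'grade': 3, 'pre_k_enrolled': True, 'score': 412}]
```

The function builds new dictionaries rather than deleting keys in place, so the agency's original records are left untouched.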

In addition, some de-identified datasets may contain what are often called “re-identification codes,” i.e., random numbers assigned to individual records that have otherwise been stripped of personally identifiable information. Re-identification codes might allow researchers, for example, to match two anonymous datasets when conducting a study. Say a state education agency provides a set of data files to researchers who are studying whether a specific program resulted in academic gains for students. While conducting the study, the researchers determine that an additional year of data is needed to complete their analysis. The education agency may then use the re-identification codes to “identify” the students in the original dataset (while still masking their personal identities) and link those student records to the same students in the new dataset.
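The linking step described above can be sketched as follows. This is a hypothetical Python illustration, not a description of any agency's system: the agency keeps a private code book that maps its internal student IDs to random codes, replaces the IDs with those codes in every extract it releases, and reuses the same code for the same student, so two extracts can be joined on the code without revealing anyone's identity. The class and function names are our own.

```python
import secrets


class CodeBook:
    """Kept by the agency only: maps internal student IDs to random codes."""

    def __init__(self):
        self._codes = {}
        self._used = set()

    def code_for(self, student_id):
        if student_id not in self._codes:
            # Random rather than derived from the ID, so it cannot be reversed
            # without access to the code book.
            code = secrets.token_hex(8)
            while code in self._used:
                code = secrets.token_hex(8)
            self._used.add(code)
            self._codes[student_id] = code
        return self._codes[student_id]


def release(rows, book):
    """Swap the internal ID for the re-identification code before sharing."""
    return [
        {**{k: v for k, v in row.items() if k != "student_id"},
         "code": book.code_for(row["student_id"])}
        for row in rows
    ]


def link(first, second):
    """Pair records from two released extracts that share a code."""
    by_code = {row["code"]: row for row in second}
    return [(row, by_code[row["code"]]) for row in first if row["code"] in by_code]


book = CodeBook()
year1 = release([{"student_id": "S1", "score": 400}, {"student_id": "S2", "score": 380}], book)
year2 = release([{"student_id": "S2", "score": 410}, {"student_id": "S1", "score": 420}], book)
for before, after in link(year1, year2):
    print(before["score"], "->", after["score"])  # 400 -> 420, then 380 -> 410
```

The security of the scheme rests on the code book never leaving the agency; anyone holding it can map codes back to students.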

Data Suppression

In education, data suppression refers to the process of withholding or removing selected information, most commonly in public reports and datasets, to protect the identities, privacy, and personal information of individual students, teachers, or administrators. Data suppression is used whenever there is a chance that the information contained in a publicly available report could be used to reveal or infer the identities of specific individuals.

Data Suppression vs. Data Masking
The terms “data suppression” and “data masking” refer to similar yet distinct processes—although in some cases the terms may be used interchangeably. When data are suppressed, the information is entirely removed or deleted, most commonly in files and reports that are publicly shared. When data are masked, the information is concealed from view or encrypted in a file, but the masked data remains encoded in the file or database and can be accessed (or “re-identified”) by those with the proper authorization codes or passwords. For a more detailed discussion, see data masking.

When sharing data publicly, or with third parties such as contractors or researchers, state education agencies and school districts are generally required to take steps to protect individual privacy. In addition to suppressing data that will directly reveal the identity of individuals, such as names and social-security numbers, education agencies will also modify datasets—e.g., by “suppressing” selected information—that may indirectly reveal the identities of specific students even when the data seemingly contains no personally identifiable information.

For example, some small, rural schools have very small minority student populations—perhaps only one or two students of color in the entire school. If state or school records contain, say, test scores or graduation rates for various racial subgroups, the identity of individual African American, Hispanic, or Asian students could be inadvertently revealed even though the data are otherwise “anonymous” (by looking at the data, those who are familiar with the school, or who know who the minority students are, may be able to deduce which students earned which test scores, for example). For this reason, states, districts, and schools may suppress—i.e., not publicly report or share—certain data when subgroups are small enough to potentially connect otherwise anonymous data to specific students.

Suppression may also be needed when reporting percentages. If a report shows that 100 percent, or zero percent, of students in a particular grade at a school scored at a certain level on a test, for example, any readers familiar with the school will have learned personal information about individual students.

State education agencies and districts will typically have policies on data suppression that outline what types of data need to be suppressed in specific situations. For example, a policy may require the suppression of data in public reports when any subgroup includes fewer than five students. Most public data reports will explain why certain data have been suppressed.
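A minimal Python sketch of such a policy, assuming the example threshold above (suppress any subgroup with fewer than five students) plus the 0 percent and 100 percent rule described earlier. The function name and data layout are hypothetical. Real policies are often stricter; for example, they may also suppress a second value so that a hidden one cannot be recovered by subtracting the reported subgroups from a total, which this sketch does not attempt.

```python
from typing import Dict, Optional, Tuple

MIN_GROUP_SIZE = 5  # example threshold: suppress subgroups with fewer than five students


def suppress(results: Dict[str, Tuple[int, int]],
             min_size: int = MIN_GROUP_SIZE) -> Dict[str, Optional[float]]:
    """results maps subgroup -> (students proficient, students tested).

    Returns the percent proficient, or None where the value is suppressed.
    """
    report = {}
    for group, (proficient, tested) in results.items():
        if tested < min_size:
            report[group] = None  # group too small: results could point to individuals
        elif proficient in (0, tested):
            report[group] = None  # 0% or 100% discloses every student's outcome
        else:
            report[group] = round(100 * proficient / tested, 1)
    return report


print(suppress({"All students": (40, 60), "Group X": (1, 2),
                "Group Y": (12, 12), "Group Z": (5, 10)}))
# {'All students': 66.7, 'Group X': None, 'Group Y': None, 'Group Z': 50.0}
```

Checking the group size first also avoids dividing by zero when a subgroup has no tested students.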

For related discussions, see de-identified data, personally identifiable information, student-level data, and unique student identifier.

Data Masking

In education, data masking refers to the process of concealing or encrypting selected information, most commonly in school-performance reports and datasets prepared by state education agencies and school districts, to protect the identity and privacy of individual students, teachers, or administrators. Data masking is used when reports containing information that could potentially be used to infer or reveal the identities of specific individuals are shared with third parties, such as academics, researchers, or consultants, who are not authorized to access secure or private information.

Data Masking vs. Data Suppression
The terms “data masking” and “data suppression” refer to similar yet distinct processes—although in some cases the terms may be used interchangeably. When data are masked, the information is concealed from view or encrypted in a file, but the masked data remains encoded in the file or database and can be accessed (or “re-identified”) by those with the proper authorization codes or passwords. When data are suppressed, the information is entirely removed or deleted, most commonly in files and reports that are publicly shared. For a more detailed discussion, see data suppression.

Data masking is frequently used in research scenarios. For example, a state education agency might hire an organization or university to study the results or impact of educational policy—say, a recent expansion of state-subsidized pre-kindergarten programs. The researchers would then request the data they need to conduct the study (e.g., records showing the number of students enrolled in pre-kindergarten programs over a ten-year period), and the education agency would then assemble the necessary datasets. Before releasing files to the researchers, however, the agency would “mask” selected information—such as the first and last names of students—to prevent individual identities from being revealed in the information provided to the external researcher. Data may also be masked when education agencies, districts, or schools share information with any other external organizations or individuals not authorized to access or view personal information—for example, consultants and companies under contract to provide specialized services.

While the specific methods of data masking can be highly technical, the basic technique will be familiar to most people: credit-card statements that show only part of an account number, with the remaining digits replaced by Xs, and online password fields that display each character as a small dot are both common examples of data masking. While the companies masking account numbers and passwords know what the Xs or dots represent, masking or encrypting the information provides a layer of security against identity theft, fraud, and other abuses of customer information.
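The partial masking described here (showing only the last few characters of a number) can be sketched in a few lines of Python. This is a display-level illustration with a hypothetical function name: the full value stays in the source system, and the sketch does not show encryption, which requires a proper cryptographic library and key management.

```python
def mask(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace all but the last `visible` characters with mask_char."""
    visible = max(visible, 0)
    if len(value) <= visible:
        # Too short to reveal any part safely: mask the whole value.
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


print(mask("4111222233334444"))     # XXXXXXXXXXXX4444
print(mask("S1234567", visible=2))  # XXXXXX67
```

Masking short values entirely is a deliberate choice: showing the "last four" of a four-character value would display it in full.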

For related discussions, see de-identified data, personally identifiable information, student-level data, and unique student identifier.

Disaggregated Data

Disaggregated data refers to numerical or non-numerical information that has been (1) collected from multiple sources and/or on multiple measures, variables, or individuals; (2) compiled into aggregate data (i.e., summaries of data), typically for the purposes of public reporting or statistical analysis; and then (3) broken down into component parts or smaller units of data. For example, information about whether individual students graduated from high school can be compiled and summarized into a single graduation rate for a school or a graduating class, and annual graduation rates for individual schools can then be aggregated into graduation rates for districts, states, and countries. Graduation rates can then be disaggregated to show, for example, the percentage of male and female students, or white and non-white students, who graduated. Generally speaking, data is disaggregated to reveal underlying trends, patterns, or insights that would not be observable in aggregated data sets, such as disparities in standardized-test scores or enrollment patterns across different categories of students.
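One practical detail of aggregating school rates into a district rate is worth spelling out, assuming each rate is computed from a cohort count: rates combine by pooling the underlying counts, not by averaging the rates. Writing G_s for the number of graduates and N_s for the cohort size at school s, with school rate R_s = G_s / N_s (our notation, not an official formula), the district rate is a cohort-weighted average of the school rates:

```latex
R_{\text{district}}
  = \frac{\sum_s G_s}{\sum_s N_s}
  = \sum_s \frac{N_s}{\sum_t N_t}\, R_s
```

For example, a school where 90 of 100 students graduate (90 percent) and a school where 30 of 50 graduate (60 percent) give a district rate of 120/150 = 80 percent, not the simple average of 75 percent.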

While most disaggregated education data is numerical, it’s both possible and common to disaggregate non-numeric information. For example, educators, students, and parents in a school district may be surveyed on a topic, and the information and comments from those surveys could then be aggregated into a report that shows what the three groups—educators, students, and parents—collectively think and feel about the issue. The compiled information could then be disaggregated and reported for each distinct group to compare differences in how educators, students, and parents perceive the issue. Information collected during polls, interviews, and focus groups can be aggregated and disaggregated in a similar fashion.

To further illustrate the concept of disaggregated data and how it may be used in public education, consider a school with an enrollment of 500 students. The school maintains 500 student records, each of which contains a wide variety of information about an enrolled student, such as first and last name, home address, date of birth, racial or ethnic identification, date and period of enrollment, courses taken and completed, course grades earned, and test scores (the information collected and maintained on individual students is often called student-level data, among other terms). Once or twice a year, the school district may be required to submit student-enrollment reports to its state department of education. Each school in the district will then compile a report that documents the number of students currently enrolled in the school and in each grade level, which requires administrators to summarize data from all their individual student records. The district now has aggregate enrollment information about the students attending its schools. Over the next five years, the district could use these annual reports to analyze increases or declines in district-wide enrollment, enrollment at each school, or enrollment at each grade level. The district could not, however, determine whether there have been increases or declines in the enrollment of white and non-white students based on the aggregate data it received from its schools. To produce a report showing distinct enrollment trends for different races and ethnicities, the district’s schools would need to disaggregate the enrollment information by racial and ethnic subgroup.
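The distinction in this example can be sketched in Python with invented student-level records of (grade level, racial or ethnic subgroup). Counting by grade alone yields the aggregate enrollment report; counting by grade and subgroup yields the disaggregated one. The record layout and data are hypothetical.

```python
from collections import Counter

# Invented student-level records: (grade level, racial/ethnic subgroup).
records = [
    (9, "White"), (9, "Hispanic"), (9, "White"),
    (10, "African American or Black"), (10, "White"),
]

# Aggregate report: enrollment per grade level.
by_grade = Counter(grade for grade, _ in records)

# Disaggregated report: enrollment per grade level and subgroup.
by_grade_and_subgroup = Counter(records)

print(by_grade)                              # Counter({9: 3, 10: 2})
print(by_grade_and_subgroup[(9, "White")])   # 2

# The disaggregated counts roll back up to the aggregate ones, but the
# reverse is impossible: by_grade alone says nothing about subgroups.
rolled_up = Counter()
for (grade, _), n in by_grade_and_subgroup.items():
    rolled_up[grade] += n
assert rolled_up == by_grade
```

This mirrors the district's situation: once schools have submitted only per-grade totals, the subgroup breakdown cannot be recovered without going back to the student-level records.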

Aggregated vs. Disaggregated Data

To aggregate data is to compile and summarize data; to disaggregate data is to break down aggregated data into component parts or smaller units of data. While this distinction between aggregated and disaggregated data may appear straightforward, there is a nuance worth discussing here: a lot of “disaggregated” data in education is actually data that has been technically aggregated, at some level, from records maintained on individual students. For example, graduation rates are widely considered to be “aggregate data,” while graduation rates reported for different subgroups of students (say, for students of different races and ethnicities) are typically considered to be “disaggregated data.” Yet to produce reports that disaggregate graduation rates by race and ethnicity, data on individual students actually has to be “aggregated” to produce summary graduation rates for different racial subgroups. Most likely, this distinction between aggregated and disaggregated data arose because, historically, only aggregated data on school-wide, district-wide, or statewide educational performance was readily or publicly available. When investigating or reporting on topics such as aggregate data or disaggregated data, it is important to determine precisely how the terms are being used in a particular context.

Reform

Before the early 2000s, most state education agencies and districts only collected aggregate data on students enrolled in public schools. Today, however, all 50 states in the United States have state-level systems that collect and maintain student-level data, not just aggregate records, which allows state education agencies to produce both aggregate and disaggregated reports on school and student performance (public-school districts typically collect student-level data from schools, and states collect student-level data from districts).

While aggregate data such as high school graduation rates or average test scores can yield a variety of important insights, a significant number of school leaders, researchers, education reformers, and policy makers have advocated in recent years for disaggregating data to expose underlying trends and issues such as achievement gaps, opportunity gaps, learning gaps, and other inequities in the public-education system. If, for example, the only graduation data available are annual rates for individual schools, this aggregate data may hide significant disparities in graduation rates for students from low-income households, students of color, students with disabilities, or students who are not proficient in the English language. It’s possible for a school’s aggregate graduation rate to appear strong overall (say, 90 percent), but when the data are disaggregated for different groups of students, the disaggregation may reveal, for example, that more than 50 percent of the African American and Hispanic students in the school fail to graduate.

When data are disaggregated, educators also have more detailed information about the educational performance and learning needs of certain groups of students, which allows them to design more appropriate or effective educational experiences and academic support. For example, disaggregated data may help school leaders and educators to direct limited resources—such as funding, staff time, or social services—where they are needed most (i.e., to those groups of students who are the furthest behind, struggling the most academically, or at greatest risk of dropping out).

Generally speaking, the main purpose of collecting and reporting both aggregated and disaggregated data is to provide useful information about the performance of public schools and students to those who are monitoring schools or working to improve them. While both forms of data are essential to understanding how the public-education system is working, aggregate-data reports are generally limited to the identification of broader trends and patterns in education, while disaggregated data are more useful for diagnosing deeper underlying problems such as disparities in educational performance among different student groups.

Debate

In public education, aggregate data have been widely collected and publicly reported for decades, and their use has generally been far less controversial than the collection, sharing, and use of data and personal information about specific students, primarily because aggregate data present far fewer concerns about student privacy.

Although student safety, privacy, and confidentiality are more serious concerns with student-level data and personal information, disaggregated data may, in some cases, indirectly reveal the identities of specific students even when the data seemingly contains no personally identifiable information—i.e., information that might, directly or indirectly, reveal the identity or personal information of specific students. In some rural schools, for example, the minority student population may be very small—perhaps only one or two students of color in the entire school. If state or district records contain, say, test scores or proficiency levels for various racial subgroups, the identity of individual African American, Hispanic, or Asian students could be inadvertently revealed even though the disaggregated data are otherwise “anonymous” (by looking at the data, those who are familiar with the school, or who know who the minority students are, may be able to deduce which students earned which test scores, for example). For this reason, states, districts, and schools may mask or suppress (i.e., not publicly report or share) certain data when subgroups are small enough to potentially connect otherwise anonymous data to specific students.

For a more detailed discussion of related debates, see personally identifiable information.

Unique Student Identifier

A unique student identifier is typically a number or code assigned to students enrolled in public schools that allows state education agencies, districts, schools, collegiate institutions, researchers, and others to monitor, track, organize, and transfer student records more efficiently and reliably. In the United States, state education agencies assign a randomly generated series of numbers and/or letters to individual students, with each student assigned one unique identifier. One of the primary advantages of a unique student identifier is that it is used in place of a student’s name or other personal information that may compromise the privacy or reveal the identity of the student. For privacy-related reasons, Social Security numbers are generally not used as unique student identifiers in the United States, and several states have even passed laws that explicitly forbid the use of Social Security numbers as unique student identifiers.

It should be noted that different terms may be used when referring to unique student identifiers in education, including statewide student identifier, student identification number, or student ID, among others. Because these terms may or may not be used synonymously in certain technical contexts, it is important to determine precisely how the term is being defined when investigating or reporting on student data.

In addition to protecting student privacy, unique student identifiers are used to improve the quality, accuracy, and reliability of student data. When students are assigned unique identifiers, a wide variety of educational records maintained by different educational agencies, schools, or programs (from report cards and test scores to disciplinary records, school-attendance data, learning-disability assessments, and special-education plans) can be reliably associated with individual students in the vast, complex data systems that track information on tens or hundreds of thousands of students enrolled in public schools in a given state.

Once a unique student identifier has been assigned, it remains attached to the student as long as he or she is enrolled in public school. The same unique identifier is used if a student transfers from one school district to another in a state, and it will remain in use if a student moves out of state for a period of time and then returns to the state. Because unique student identifiers are generated for use in a specific state’s educational data system, students will be assigned a different identifier when they enroll in a different state’s public-education system. The United States does not use unique student identifiers at the national or federal level.

In some states, young children enrolled in state-subsidized prekindergarten programs may be assigned unique student identifiers that will stay with them when they enroll in public school, while other states may not yet assign unique student identifiers to preschool students for any number of reasons (e.g., they may not have data systems capable of assigning and tracking identifiers for preschool students, or they may lack the necessary staffing or funding). Postsecondary institutions, both public and private colleges and universities, have historically used Social Security numbers as their unique student identifiers. Some state agencies of higher education, however, do use public-school student identifiers in collegiate records (a practice that facilitates the transfer of student data between public schools and postsecondary institutions), although these identifiers can be used only for students who graduated from in-state public high schools.

Unique student identifiers are often considered essential for the effective management of student-level data in longitudinal data systems—i.e., data systems that are used to track information over long periods of time, such as years or even decades. Because data related to an individual student may be stored in multiple data systems across multiple districts, schools, and state agencies, unique student identifiers are seen as the most accurate way to link individual student records across all the different data systems tracking students over multiple years. The use of unique student identifiers can also, for example, improve the speed with which transcripts and other records are transferred among schools, in addition to other benefits.
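The record-linking idea can be sketched in Python with a few invented extracts from separate systems (enrollment, assessment, and attendance) that share a hypothetical statewide identifier. Real longitudinal data systems perform this kind of match in databases at far larger scale; the function name `longitudinal_record`, the identifier, and all records below are assumptions for illustration.

```python
# Hypothetical extracts from three separate systems, each keyed by the same
# (invented) statewide identifier rather than by student name.
enrollment = {"K7Q2M9X1AB": {"school": "Lincoln Elementary", "grade": 4}}
assessments = [
    {"student_id": "K7Q2M9X1AB", "year": 2013, "math_score": 412},
    {"student_id": "K7Q2M9X1AB", "year": 2014, "math_score": 438},
]
attendance = [
    {"student_id": "K7Q2M9X1AB", "year": 2013, "days_absent": 6},
    {"student_id": "K7Q2M9X1AB", "year": 2014, "days_absent": 3},
]

def longitudinal_record(student_id):
    """Assemble one student's multi-year history from separate systems
    by matching on the shared identifier."""
    history = {}
    for row in assessments:
        if row["student_id"] == student_id:
            history.setdefault(row["year"], {})["math_score"] = row["math_score"]
    for row in attendance:
        if row["student_id"] == student_id:
            history.setdefault(row["year"], {})["days_absent"] = row["days_absent"]
    return {"current": enrollment.get(student_id), "by_year": history}

print(longitudinal_record("K7Q2M9X1AB"))
```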

When working with large sets of data from multiple schools or across multiple academic years, maintaining data quality, accuracy, and reliability is an enormous challenge. Unique student identifiers can improve data quality by ensuring that individual students are consistently identified in a wide variety of databases, files, or reports. For example, districts and schools may inadvertently record a student’s name differently: Tommy Smith may have been enrolled at his previous school as Thomas E. Smith. Or there may be multiple Tommy Smiths enrolled in the same school and grade level at the same time.
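A small Python sketch, using invented records and hypothetical identifiers, shows why matching on names is unreliable in exactly the two ways described above: grouping by name splits one student into two entries and merges two different students into one, while grouping by identifier does neither.

```python
from collections import defaultdict

# Invented records: the same student appears under two spellings, and two
# different students share the name "Tommy Smith".
records = [
    {"student_id": "A1B2C3D4E5", "name": "Thomas E. Smith", "school": "Previous School"},
    {"student_id": "A1B2C3D4E5", "name": "Tommy Smith", "school": "Current School"},
    {"student_id": "Z9Y8X7W6V5", "name": "Tommy Smith", "school": "Current School"},
]

def group_by(records, key):
    """List each record's school under the chosen matching key."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["school"])
    return dict(groups)

print(group_by(records, "name"))
# {'Thomas E. Smith': ['Previous School'], 'Tommy Smith': ['Current School', 'Current School']}
print(group_by(records, "student_id"))
# {'A1B2C3D4E5': ['Previous School', 'Current School'], 'Z9Y8X7W6V5': ['Current School']}
```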

Debate

Although unique student identifiers can improve the accuracy and reliability of student data, and facilitate the reliable transfer of data across different systems, they also raise concerns about student privacy. For a more detailed discussion of privacy concerns and related debates, see personally identifiable information.

Student-Level Data

In education, student-level data refers to any information that educators, schools, districts, and state agencies collect on individual students, including data such as personal information (e.g., a student’s age, gender, race, place of residence), enrollment information (e.g., the school a student attends, a student’s current grade level and years of attendance, the number of days a student was absent), academic information (e.g., the courses a student completed, the test scores and grades a student earned, the academic requirements a student has fulfilled), and various other forms of data collected and used by educators and educational institutions (e.g., information related to disciplinary problems, learning disabilities, medical and health issues, etc.). It should be noted that an increasing number of organizations, institutions, or companies may also collect or have access to student-level data on public-school students, typically as a part of a contract for services or a research study conducted in collaboration with schools, districts, or state education agencies.

It should be noted that a wide variety of terms may be used when referring to student-level data in education, including individual-level data, individual student-level data, student unit-level data, unit-record data, student unit-record data, record-level data, and record-level student data, among others. Because these terms may or may not be used synonymously in certain technical contexts, it is important to determine precisely how the term is being defined when investigating or reporting on student data.

Increasingly, new educational technologies are expanding what counts as “student-level data.” Educational software and online learning programs, for example, can collect a huge amount of information and metadata about the students who use them, including information that was impossible to track before the advent of sophisticated technologies and analytical tools, such as the geographic location of the computer a student is using or the amount of time it took a student to answer certain questions or solve certain problems. Many online learning programs routinely collect hundreds or even thousands of distinct data points while students are using the systems; these data may then be used for any number of educational or non-educational purposes (e.g., to improve the software, modify the questions or problems students see, study how children and youth learn, or market the product to potential buyers).
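As an illustration of how a data point such as time-to-answer can be derived, the Python sketch below computes it from a hypothetical interaction log. The event names, field names, identifier, and timestamps are invented; actual learning platforms define their own logging formats.

```python
from datetime import datetime

# Hypothetical interaction log of the kind an online learning program might
# record; all field names and values are invented for illustration.
events = [
    {"student_id": "K7Q2M9X1AB", "question": 1, "event": "shown", "time": "2015-07-01T10:00:00"},
    {"student_id": "K7Q2M9X1AB", "question": 1, "event": "answered", "time": "2015-07-01T10:00:42"},
    {"student_id": "K7Q2M9X1AB", "question": 2, "event": "shown", "time": "2015-07-01T10:00:45"},
    {"student_id": "K7Q2M9X1AB", "question": 2, "event": "answered", "time": "2015-07-01T10:02:05"},
]

def seconds_per_question(events):
    """Derive time-to-answer for each (student, question) pair from the
    'shown' and 'answered' timestamps."""
    shown = {}
    durations = {}
    for e in events:
        key = (e["student_id"], e["question"])
        t = datetime.fromisoformat(e["time"])
        if e["event"] == "shown":
            shown[key] = t
        elif e["event"] == "answered" and key in shown:
            durations[key] = (t - shown[key]).total_seconds()
    return durations

print(seconds_per_question(events))
# {('K7Q2M9X1AB', 1): 42.0, ('K7Q2M9X1AB', 2): 80.0}
```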

Reform

Student-level data is collected and used for a wide variety of purposes, and it intersects with efforts to improve schools and educational systems in a number of ways—too many to comprehensively describe here. To cite just a few representative examples, however, student-level data may be used to:

  • Maintain more robust, accurate, and comprehensive student records for educators, students, graduates, parents, collegiate institutions, employers, and others who may need or request the information.
  • Inform or improve the instructional process by giving teachers and other educators and specialists information about the distinct learning needs, academic progress, and educational achievements of specific students.
  • Inform or improve various student-support strategies or systems, which may include any number of academic, behavioral, mental-health, health, or social services that students may need or access.
  • Improve the accuracy and reliability of aggregate educational data—such as graduation, dropout, or enrollment rates reported for schools, districts, and states—that originates from individual data collected on a large number of students (for a related discussion, see unique student identifier).
  • Track trends in the educational performance of individual students or educational systems over time using information such as school-completion data or standardized-test scores, for example.
  • Identify problems or weaknesses in the educational performance of students, teachers, schools, or districts for the purpose of improving academic achievement, teaching effectiveness, or educational results.
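The point about aggregate data can be illustrated with a small Python sketch using invented cohort records in which one student appears in files from two schools after transferring. Counting records distorts the rate, while counting unique identifiers does not. The rule that a student with any graduation record counts as a graduate is a simplifying assumption; it is not any official graduation-rate formula.

```python
# Invented student-level cohort records. Student "B" transferred and
# appears in files from two schools.
cohort = [
    {"student_id": "A", "school": "North High", "graduated": True},
    {"student_id": "B", "school": "North High", "graduated": False},
    {"student_id": "B", "school": "South High", "graduated": True},
    {"student_id": "C", "school": "South High", "graduated": False},
]

def graduation_rate(records):
    """Percent of unique students who graduated. Each student counts once;
    any graduation record marks the student as a graduate (simplification)."""
    outcome = {}
    for r in records:
        sid = r["student_id"]
        outcome[sid] = outcome.get(sid, False) or r["graduated"]
    return 100 * sum(outcome.values()) / len(outcome)

naive = 100 * sum(r["graduated"] for r in cohort) / len(cohort)
print(naive)                    # 50.0 (student B counted twice)
print(graduation_rate(cohort))  # 66.66666666666667 (three students, two graduates)
```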

Before the advent of technologies and software applications that allow schools, districts, and state agencies to collect an array of highly detailed data on individual students, student-level data was generally limited to teacher grade books, report cards, school transcripts, attendance files, and other administrative records maintained by schools and districts. Because this information was largely or entirely paper-based—and therefore difficult, time consuming, and costly to collect, organize, or analyze—it limited the ability of educators, researchers, and others to use student-level data to diagnose education problems, track trends in performance over time, or improve the effectiveness of schools or teaching, for example. Advances in educational software, computing technologies, internet access, and innovations such as cloud-based data storage and “big data” analytics have fueled a dramatic increase in the collection and use of student-level data in recent years.

Since at least the early 2000s, some districts and state education agencies have been using large-scale data systems capable of collecting, archiving, and generating reports on a vast array of student-level data originating from multiple sources, ranging from schools to standardized tests. As technological advances make the collection of data on individual students more efficient, inexpensive, and potentially valuable to the educational process, an increasingly large, diverse, and ever more complex body of student-level data is being collected, archived, analyzed, and used at all levels of the educational system and by a growing number of researchers, institutions, organizations, and companies.

It should be noted that if student-level data is being collected for reasons other than maintaining academic records for students and their families, it is almost certainly being used, in some form, to reform or improve schools and education systems—even if the purpose is merely to provide more accurate, useful, and detailed information about performance to those working to improve schools.

While much of the discussion about student-level data in public education is focused on the large-scale collection of personal data on individual students—and on the potential applications and possible abuses of that information—the term also encompasses any information that teachers and other educators or specialists may use during the process of educating individual students. For example, teachers may keep journals, logs, or other records detailing the distinct learning needs or progress of individual students—information that may or may not be shared with colleagues and administrators or formally reported to state education agencies and other entities outside of the school. Personal learning plans, for example, are one of the many possible methods that educators might use to collect data on individual students. Early warning systems—usually databases of academic, attendance, and disciplinary information that educators use to identify and monitor students who are struggling academically or in danger of dropping out of school or not graduating on time—are another example.
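An early warning system’s flagging logic might look something like the Python sketch below, which checks attendance, disciplinary, and academic fields of the kind named above. The thresholds (an attendance rate below 90 percent, any suspension, any course failure), the field names, and the sample roster are hypothetical; actual systems choose their own indicators and cutoffs.

```python
# Hypothetical cutoffs; real early warning systems set their own.
THRESHOLDS = {
    "attendance_rate": 0.90,  # flag if below
    "suspensions": 1,         # flag if at or above
    "course_failures": 1,     # flag if at or above
}

def warning_flags(student):
    """Return the list of indicators on which a student is flagged."""
    flags = []
    if student["attendance_rate"] < THRESHOLDS["attendance_rate"]:
        flags.append("attendance")
    if student["suspensions"] >= THRESHOLDS["suspensions"]:
        flags.append("behavior")
    if student["course_failures"] >= THRESHOLDS["course_failures"]:
        flags.append("course performance")
    return flags

# Invented roster of two students.
roster = [
    {"student_id": "A", "attendance_rate": 0.97, "suspensions": 0, "course_failures": 0},
    {"student_id": "B", "attendance_rate": 0.84, "suspensions": 0, "course_failures": 2},
]
for s in roster:
    print(s["student_id"], warning_flags(s))
# A []
# B ['attendance', 'course performance']
```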

Debate

Teachers and schools have always collected and maintained records of student-level data, but the transition from paper-based systems to digital systems, and from small-scale data collection by schools to large-scale data collection by state agencies and private companies, has given rise to numerous debates about student-level data and student privacy. For this reason, most debates related to the collection, storage, and use of student-level data are connected to concerns about student privacy.

For a more in-depth discussion of debates related to student-level data, see personally identifiable information.