What Do We Know About Student Growth Models?

Written by: Adrienne Hu

Primary Source: Green & Write

As part of the No Child Left Behind (NCLB) waiver, states are to explain how they will factor student test scores into teacher evaluations. As a result, many states have incorporated the use of student growth data into state policy related to teacher evaluation and have even specified the weight of these data in a teacher’s summative evaluation rating.

Currently, there are two major student growth models: the value-added model (VAM), and the Colorado Growth Model, which is also widely known as the Student Growth Percentile (SGP) model. VAM is the most controversial and widely discussed student growth measure in teacher evaluation conversations (e.g., see here and here). While SGPs have received less attention in the past, the SGP model has been gaining attention and support among the educational community and policymakers. Many states, such as Michigan and Arizona, have incorporated SGPs into their accountability frameworks, but not necessarily into their teacher evaluation systems (please see a previous Green & Write post discussing the use of SGPs in Michigan here).

At a paper session at the 2015 Annual American Educational Research Association (AERA) Conference, Audrey Amrein-Beardsley and Margarita Pivovarova from Arizona State University presented two papers, one on VAMs and one on SGPs. This week’s post will highlight some of their findings regarding these two student growth measures and discuss future directions for both research and conversation around teacher evaluation.

Image courtesy of Renato Ganoza.

The Validity and Reliability of SGPs

Amrein-Beardsley and Pivovarova note that the SGP model has not been studied as much as VAMs. Most validity studies of the SGP model compare and contrast SGPs with VAMs, and have found that the two models differ in many respects and do not correlate highly with each other. As more schools and districts use SGPs in their accountability frameworks, the researchers saw a need to examine SGPs in relation to other measures of teacher quality (validity), and to examine the consistency of teachers’ median SGPs across years (reliability, or inter-temporal consistency). Their sample and data come from the Arizona Ready For Rigor project, in which teachers were observed four to six times over three school years, and the median SGP of each teacher’s class was also collected. The preliminary conclusions are as follows:

  • The median SGP is not stable for the same teacher over time (reliability is low)
  • Agreement between the median SGP of teachers and their observations is low (validity is low)
  • A teacher’s past median SGP has low predictive power for that teacher’s future median SGP when controlling for grade, subject (English Language Arts and Math), school, and district fixed effects
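
To make the reliability idea concrete, here is a minimal sketch of how inter-temporal consistency could be checked: compute each teacher’s median SGP in two years, then correlate the two sets of medians. The teacher names, years, and percentile values below are invented purely for illustration. They are not data from the study, and the researchers’ actual analysis (which included fixed effects) is not reproduced here.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical student growth percentiles per teacher and year.
# All values are made up for illustration; none come from the study.
sgps = {
    "teacher_a": {2013: [35, 50, 62, 48, 71], 2014: [40, 66, 58, 30, 45]},
    "teacher_b": {2013: [80, 55, 67, 72, 60], 2014: [52, 49, 75, 38, 61]},
    "teacher_c": {2013: [20, 44, 31, 58, 39], 2014: [63, 47, 70, 55, 41]},
    "teacher_d": {2013: [51, 49, 77, 64, 33], 2014: [46, 59, 28, 53, 68]},
}


def median_sgp(student_percentiles):
    """A teacher's score for a year: the median of the students' SGPs."""
    return median(student_percentiles)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One median SGP per teacher for each year, in the same teacher order.
year1 = [median_sgp(by_year[2013]) for by_year in sgps.values()]
year2 = [median_sgp(by_year[2014]) for by_year in sgps.values()]

# A correlation near 1 would suggest stable scores across years;
# a value near 0 (or negative) would suggest low reliability.
r = pearson(year1, year2)
print(f"Year-to-year correlation of median SGPs: {r:.2f}")
```

A simple correlation like this is only a first look. It ignores differences in grade, subject, school, and district, which the researchers accounted for in their own analysis.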

In sum, SGP models seem to have issues similar to those of VAMs, which have been criticized for their low correlation with observation measures as well as their volatility across years for the same teacher (e.g., see here and here).

How Do People in the Field View VAMs?

The other paper that these two researchers presented is not about the measurement properties of VAMs, but an investigation of experts’ views on this methodology. They limited their review to four prestigious AERA journals in the field of education and sampled 28 papers written by 67 authors. They found that only the authors’ field of expertise was significantly related to their stances and views on VAMs. In other words, there is a clear divide between educational researchers/statisticians and economists/econometricians on VAMs. Specifically, only authors in education have written about 1) the issue of non-random sorting of students to teachers; 2) the appropriateness of using VAMs to make causal interpretations; and 3) small class size and regression to the mean problems. Authors in economics and econometrics tend to ignore these issues or to accept the messiness of real-world education as a given. Research by economists and econometricians thus focuses more on model specifications and other technical aspects of improving VAMs.

Another highlight from Amrein-Beardsley and Pivovarova’s findings is the extent to which the sampled authors espouse the use of VAMs in teacher evaluations. Most economists/econometricians (77%) favored using VAMs for accountability purposes, while the rest (23%) had some concerns but were pro-VAM overall. In contrast, the majority (84%) of the authors in education or educational statistics expressed some concerns about VAMs even though they favored using them for accountability purposes, the remaining 16% were strongly against VAMs, and no sampled author in education embraced VAMs without reservations.

SGP Model Versus VAM

Although SGP models face the same issue of non-random sorting of students to teachers as VAMs, the SGP is used merely as a way to quantify the change in student achievement rather than to estimate student learning. Most importantly, the SGP does not attribute students’ change in achievement to teacher effects. Therefore, it does not claim that individual teachers “cause” student growth. Since the causal interpretation of VAMs is probably the biggest concern of educational researchers/statisticians, they are more likely to support SGPs over VAMs. Moreover, administrators and policymakers might also favor the SGP because it is non-proprietary, intuitive, and easier for teachers and administrators to understand. However, the presentation discussed in this post has informed us of the undesirable properties of the SGP as a measure, despite the advantages mentioned above. In particular, because the SGP does not claim that teaching causes learning, what does it mean when it is included in a teacher’s evaluation rating? In this sense, SGPs are more interpretable when used for school and district accountability, which is how some states currently use them.

In closing, both student growth models have issues to address before they can be implemented and used effectively, and more research is needed to provide empirical evidence of their uses in different contexts and for different accountability purposes. It is unlikely that the NCLB rewrite will include specific student growth measures as a mandate. We shall see whether either of these student growth models will become more prevalent in state policy around teacher evaluation or whether both will fade out of the education world with time.


Contact Adrienne Hu: husihua@msu.edu

Adrienne Hu
Sihua (Adrienne) is a doctoral candidate in the Program in Mathematics Education (PRIME). She is currently on the research team of the Algebra Teaching Study and the Study of Social Network and Ambitious Math Instruction. Her research interests include pre-service teachers’ statistical knowledge for teaching, classroom observation instruments, and the teaching and learning of contextualized algebraic tasks.