Written by: Paul Rubin

Primary Source: OR in an OB World

Disclaimers:

- This is a post about statistics versus decision analytics, not a prescription for improving the educational system in the United States (or anywhere else, for that matter).
- tl;dr.

The genesis of today’s post is a blog entry I read on Spartan Ideas titled “Is Michigan Turning Away Good Teachers?” (Spartan Ideas is a “metablog”, curated by our library, that reposts other blogs by members of the Michigan State University community. The original post can be found here.) The focus of that post is on changes to the certification examination that would-be teachers in Michigan are required to pass. I’ll quote a couple of key passages here, but invite you to read the full post to get the entire context:

Research has found that only about 8% of the differences in student achievement can be attributed to teachers and only 3% of that can be attributed to the combined impact of teachers’ certification, ACT/SAT scores, degrees, and experience.

…

Because teachers’ examination scores have been found to be only weak predictors of their impact on student learning, an assessment that has a low pass rate by design may prevent some who would be effective teachers from obtaining a teaching certificate, a concern that is supported by research.

(The link in the quote is a 2002 article in Education Next by Dan Goldhaber, senior research associate at the Urban Institute.)

My first reaction to the “weak” connection between teacher characteristics and learning outcomes is that it sounded like bad news for people on all sides of the current debates about educational reform. On the one hand, to the “blame the teacher” crowd (who like to attribute perceived problems in the public education system to poor or bad teachers, teacher tenure etc.), one might say that if teacher quality explains “only” 8% of variance in learning outcomes, quit picking on them and look elsewhere. On the other hand, to people (often affiliated with teacher unions) pushing for better compensation, better working conditions etc., one might point out that those are generally incentives to recruit and retain better teachers; so if teacher quality explains “only” 8% of variance in learning outcomes, perhaps those dollars are better spent elsewhere (upgrading schools, improving neighborhood economic conditions, …).

What struck me second about the original post was the use of the phrases “only about” and “weak predictors”. This seems to me to relate to a difference between statistics, as it is commonly taught (and used), and what some people now refer to as “decision analytics”. In my experience, the primary focus of statistics (and its sibling “data analytics”) is to identify patterns and explain things (along with making predictions). That makes measures such as correlation and percentage of dependent variance explained relevant. In contrast, decision analytics emphasizes changing things. Where are we now, where do we want to be, which levers can we pull to help us get there, how much should we pull each, and what will it cost us to do so? That perspective may put more emphasis on measures of location (such as means), and on which input factors provide us “leverage” (in the Archimedean sense of the term, not the regression sense), than on measures of dispersion (variance).

It is common, at least in the social sciences, to categorize predictors as “strong” or “weak” according to how much variation in the dependent variable they predict. This is the statistics perspective. I understand the attractiveness of this, particularly when the point of the model is to “explain” what happens in the dependent variable. At the same time, I think this categorization can be a bit dangerous from a decision analytics standpoint.

Fair warning: I’m about to start oversimplifying things, for the sake of clarity (and to reduce how much typing I need to do). Suppose that I have a unidimensional measure \(L\) of learning outcomes and a unidimensional measure \(T\) of teacher quality. Suppose further that I posit a linear model (since I’m all about simplicity today) of the form

\(L = \alpha + \beta T + \epsilon\)

with \(\epsilon\) the “random noise” component (the aggregation of all things not related to teacher quality). Let’s assume that \(T\) and \(\epsilon\) are independent of each other, which gives me the usual (for regression) decomposition of variances:

\(\sigma_L^2 = \beta^2 \sigma_T^2 + \sigma_\epsilon^2.\)

From the research cited above, we expect to find \(\beta^2 \sigma_T^2\) to be about 8% of \(\sigma_L^2\).
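To make the decomposition concrete, here is a small simulation (all numbers are my own, chosen purely for illustration, not taken from the cited research) that picks \(\beta\) and \(\sigma_\epsilon\) so that teacher quality accounts for roughly 8% of the variance in \(L\):

```python
import numpy as np

# Illustrative simulation: choose beta and the noise scale so that
# beta^2 * sigma_T^2 is ~8% of sigma_L^2. All parameter values are invented.
rng = np.random.default_rng(42)

n = 100_000
alpha, beta = 50.0, 2.0
sigma_T = 1.0

# Solve beta^2*sigma_T^2 / (beta^2*sigma_T^2 + sigma_eps^2) = 0.08 for sigma_eps.
target_share = 0.08
sigma_eps = np.sqrt(beta**2 * sigma_T**2 * (1 - target_share) / target_share)

T = rng.normal(0.0, sigma_T, n)
eps = rng.normal(0.0, sigma_eps, n)
L = alpha + beta * T + eps

explained_share = (beta**2 * sigma_T**2) / np.var(L)
print(round(explained_share, 3))  # close to 0.08
```

Note that the 8% figure constrains only the *ratio* of variances; it says nothing by itself about how large a shift in mean \(L\) a feasible change in \(T\) could buy.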

Tell a decision analyst that the goal is to “improve learning”, and questions along the following lines should arise:

- How do we measure “learning”? (Assume that’s our \(L\) here.)
- What is our goal (achieve the biggest bang on a fixed budget, achieve a fixed target at minimal cost, …)?
- Is the goal expressed in terms of mean result, median result, achievement by students at some fractile of the learning distribution (e.g., boost the bottom quartile of \(L\) to some level), or something else (e.g., beat those pesky Taiwanese kids on international math tests)? Reducing variance in \(L\), or the range of \(L\), could be a goal, but I doubt it would be many people’s first choice, since a uniform level of mediocrity would achieve it.
- What are our levers? Teacher quality (our \(T\)) would seem to be one. Improving other measures of school quality (infrastructure, information technology) might be another. We might also look at improving socioeconomic factors, either at the school (more free lunches or even breakfasts, more after-school activities, more security on the routes to and from the schools) or elsewhere (safer neighborhoods, better food security, more/better jobs, programs to support two-parent households, …).
- How much progress toward our goal do we get from feasible changes to each of those levers?
- What does it cost us to move those levers?

The (presumably regression-based) models in the research cited earlier address the penultimate question, the connection between levers and outcomes. They may not, however, directly address cost/benefit calculations, and focusing on percentage of variance explained may cause our hypothetical decision analyst to focus on the wrong levers. Socioeconomic factors may well account for more variance in learning outcomes than anything else, but the cost of nudging that lever might be enormous and the return on the investment we can afford might be very modest. In contrast, teacher quality might be easier to control, and investing in it might yield more “bang for the buck”, despite the seemingly low 8% variance explained tag hung on it.
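A toy calculation makes the point. Suppose (with entirely invented numbers) we have three levers, each with an estimated effect on mean \(L\) per unit of movement, a feasible range, and a cost per unit. The lever with the biggest per-unit effect need not be the best buy:

```python
# Hypothetical levers; every number here is invented for illustration.
# name: (effect on mean L per unit moved, max feasible change, cost per unit)
levers = {
    "teacher quality":   (2.0, 1.5, 10.0),
    "socioeconomic":     (5.0, 0.2, 80.0),
    "school facilities": (1.0, 2.0, 15.0),
}

for name, (effect, max_delta, unit_cost) in levers.items():
    gain = effect * max_delta     # attainable improvement in mean L
    cost = unit_cost * max_delta  # cost of moving the lever that far
    print(f"{name}: gain {gain:.1f} at cost {cost:.1f} "
          f"(bang per buck {gain / cost:.3f})")
```

In this made-up scenario the socioeconomic lever has the largest per-unit effect but is expensive and barely movable, so teacher quality delivers the most improvement per dollar. The arithmetic is trivial; the point is which quantities it uses (effects, feasible ranges, costs) and which it never consults (percentage of variance explained).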

In my simplified version of the regression model, \(\Delta L = \beta \Delta T\). The same ingredients that lead to the estimate of 8% variance explained also allow us to take an educated guess whether \(\beta\) is really zero (teacher quality does not impact learning; what we’re seeing in the data is a random effect) and to estimate a confidence interval \([\beta_L, \beta_U]\) for \(\beta\). Assuming that \(\beta_L > 0\), so that we are somewhat confident teacher quality relates positively to learning outcomes, and assuming for convenience that our goals are expressed in terms of mean learning outcome, a decision analyst should focus on identifying ways to increase \(T\) (and, in the process, a plausible range of attainable values for \(\Delta T\)), the benefit of the outcome \(\Delta L\) for any attainable \(\Delta T\), and the cost of that \(\Delta T\).
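The coexistence of a small \(R^2\) with a \(\beta\) that is confidently positive is easy to demonstrate. Sketching it with simulated data (again, invented parameter values) and an ordinary least-squares fit:

```python
import numpy as np
from scipy import stats

# Simulate L = alpha + beta*T + eps with ~8% of variance explained
# (parameter values invented for illustration), then fit by OLS.
rng = np.random.default_rng(0)
n = 5_000
beta_true = 2.0
T = rng.normal(0.0, 1.0, n)
L = 50.0 + beta_true * T + rng.normal(0.0, np.sqrt(46.0), n)

fit = stats.linregress(T, L)
ci_low = fit.slope - 1.96 * fit.stderr   # approximate 95% CI for beta
ci_high = fit.slope + 1.96 * fit.stderr

print(f"R^2 = {fit.rvalue**2:.3f}")              # small: a "weak predictor"
print(f"beta in [{ci_low:.2f}, {ci_high:.2f}]")  # yet clearly positive
```

With \(n\) this large, the confidence interval sits well above zero even though \(R^2\) is tiny, which is exactly the situation where the “weak predictor” label can mislead a decision maker: \(\Delta L = \beta\,\Delta T\) is a real, estimable effect.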

Stricter teacher certification exams may be a way to increase teacher quality. Assuming that certification requirements do in fact improve teacher quality (which is a separate statistical assessment), and assuming that we do not want to increase class sizes or turn students away (and therefore need to maintain approximately the same size teaching work force), imposing tighter certification standards will likely result in indirect costs (increasing salaries or benefits to attract and retain the better teachers, creating recruiting programs to draw more qualified people into the profession, …). As with the connection between teacher quality and learning outcomes, the connection between certification standards and teacher quality may be weak in the statistical sense (small amount of variance explained), but our hypothetical analyst still needs to assess the costs and impacts to see if it is a cost-effective lever to pull.

So, to recap the point I started out intending to make (which may have gotten lost in the above), explaining variance is a useful statistical concept but decision analysis should be more about cost-effective ways to move the mean/median/whatever.

And now I feel that I should take a pledge to avoid the word “assuming” for at least a week … assuming I can remember to keep the pledge.
