Written by: Paul Rubin
Primary Source: OR in an OB World
Back in 2011, when I was still teaching, I cobbled together some R code to demonstrate stepwise regression using F-tests for variable significance. It was a bit unrefined, not intended for production work, and a few recent comments on that post raised some issues with it. So I’ve worked up a new and (slightly) improved version of it.
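Each step of an F-test-driven stepwise search compares two nested models, one with and one without a candidate variable, using a partial F-test. The sketch below shows that comparison. It is written in Python rather than R and is not the notebook's code. The names `partial_f_test`, `f_sf` and `betainc_reg` are hypothetical. The F upper-tail probability is computed by hand from the regularized incomplete beta function, so the block needs only the standard library. In R, the same comparison comes from `anova()` on two nested `lm` fits, or from `add1()`/`drop1()` with `test = "F"`. In Python you would normally call `scipy.stats.f.sf` instead of the hand-rolled helper.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued-fraction part of the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def partial_f_test(rss_reduced, rss_full, n, p_full, q=1):
    """F = ((RSS_reduced - RSS_full) / q) / (RSS_full / (n - p_full)).

    n is the number of observations, p_full the number of coefficients in the
    full model (intercept included), q the number of terms added.
    Returns (F statistic, p-value).
    """
    df_resid = n - p_full
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
    return f_stat, f_sf(f_stat, q, df_resid)


f_stat, p = partial_f_test(rss_reduced=120.0, rss_full=100.0, n=30, p_full=3)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```

In a stepwise search, a candidate would be added when its p-value falls below alpha-to-enter. A variable already in the model would be dropped when its p-value exceeds alpha-to-leave.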
The new version is provided in an R notebook that contains both the stepwise function itself and some demonstration code using it. It does not require any R libraries besides the “base” and “stats” packages. There is at least one butt-ugly hack in it that would keep me from being hired in any sort of programming job, but so far it has passed all the tests I’ve thrown at it. If you run into issues with it, feel free to use the comment section below to let me know. I’m no longer teaching, though, so be warned that maintenance on this is not my highest priority.
The updated function has a few new features:
- it returns the final model (as an lm object), which I didn’t bother to do in the earlier version;
- you can specify the initial and full models as either formulas (y~x+z) or strings (“y~x+z”), i.e., quotes are strictly optional; and
- as with the lm function, it has an optional data = … argument that allows you to specify a data frame.
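The following is a rough Python analogue of the "formula or string" idea, using hypothetical names and not the notebook's code. It accepts a model either as a string like "y ~ x + z" or as an already-parsed (response, terms) pair. It then lists the terms that appear in the full model but not in the initial one, which is presumably the pool a forward step would try adding. The sketch handles only plain additive terms. Interactions, transformations and the `.` shorthand of R formulas are deliberately out of scope. In R itself, `as.formula()` accepts either a character string or an existing formula, which is one simple way to make the quotes optional.

```python
def normalize_model(spec):
    """Accept 'y ~ x + z' or an already-parsed ('y', ['x', 'z']) pair."""
    if isinstance(spec, str):
        lhs, sep, rhs = spec.partition("~")
        if not sep:
            raise ValueError(f"not a model formula: {spec!r}")
        terms = [t.strip() for t in rhs.split("+") if t.strip()]
        # Treat an intercept-only right-hand side ("y ~ 1") as having no predictors.
        terms = [t for t in terms if t != "1"]
        return lhs.strip(), terms
    response, terms = spec
    return response, list(terms)


def candidate_terms(initial, full):
    """Terms in the full model that are not in the initial model."""
    resp_i, terms_i = normalize_model(initial)
    resp_f, terms_f = normalize_model(full)
    if resp_i != resp_f:
        raise ValueError("initial and full models must share a response")
    return [t for t in terms_f if t not in terms_i]


# Quotes optional in spirit: a string and a parsed pair can be mixed.
print(candidate_terms("y ~ x", ("y", ["x", "z", "w"])))
```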
There are also a few bug fixes:
- if you set alpha-to-enter greater than alpha-to-leave, which could throw the function into an infinite loop, the function will now crab at you and return NA;
- if you try to fit a model with more parameters than you have observations, the function will now crab at you and return NA; and
- the function no longer gets confused (I think) if you pick variable/column names that clash with variable names used inside the function.
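The first two checks are easy to mimic. The Python sketch below uses hypothetical names and is not the notebook's code. It collects the complaints rather than printing them, so a caller could return None (Python's nearest stand-in for NA) whenever the list is non-empty. It also rejects a full model with exactly as many parameters as observations, because the F-test needs at least one residual degree of freedom. That is slightly stricter than the rule as stated above and is an assumption of this sketch. The small simulation shows why alpha-to-enter above alpha-to-leave is dangerous. With a single candidate, the same nested comparison decides both entry and removal, so a p-value between the two thresholds makes the variable flip in and out indefinitely.

```python
def check_stepwise_inputs(alpha_enter, alpha_leave, n_obs, n_params_full):
    """Return a list of problems with the inputs; an empty list means OK."""
    problems = []
    if alpha_enter > alpha_leave:
        problems.append("alpha-to-enter exceeds alpha-to-leave (possible endless loop)")
    # Assumption: require n - p >= 1 so the F-test has a residual degree of freedom.
    if n_params_full >= n_obs:
        problems.append("full model has too many parameters for the data")
    return problems


def simulate_single_candidate(p_value, alpha_enter, alpha_leave, max_steps=10):
    """Toy stepwise loop with one candidate whose p-value never changes."""
    in_model = False
    history = []
    for _ in range(max_steps):
        if not in_model and p_value < alpha_enter:
            in_model = True
            history.append("enter")
        elif in_model and p_value > alpha_leave:
            in_model = False
            history.append("leave")
        else:
            break  # nothing to add or drop: the search has settled
    return history


# alpha-to-enter 0.10 > alpha-to-leave 0.05, candidate p-value 0.08:
print(check_stepwise_inputs(0.10, 0.05, n_obs=30, n_params_full=3))
print(simulate_single_candidate(0.08, alpha_enter=0.10, alpha_leave=0.05))
# With the thresholds the right way round, the candidate is simply never added.
print(simulate_single_candidate(0.08, alpha_enter=0.05, alpha_leave=0.10))
```

Here `max_steps` only caps the toy loop so the oscillating case terminates.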
As always, the code is provided under a Creative Commons license, as-is, with no warranty express or implied; your mileage may vary.