Parsing Months in R

Written by: Paul Rubin

Primary Source: OR in an OB World

As part of a recent analytics project, I needed to convert strings containing (English) names of months to the corresponding cardinal values (1 for January, …, 12 for December). The strings came from a CSV file, and were translated by R to a factor when the file was read. The factor had more than 12 levels: to the literal-minded (which includes R), “August” and “August ” (the latter with a trailing space) are different months.
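To make the level problem concrete, here is a minimal sketch in base R. The strings are invented for illustration and are not taken from the project's data.

```r
# Hypothetical month strings, as they might arrive from a CSV file.
raw <- c("August", "August ", "august", "Aug.")

# Each distinct string becomes its own factor level.
f <- factor(raw)
nlevels(f)   # 4 levels for what is really one month

# Trimming whitespace and lower-casing merges the first three variants,
# but the abbreviation is still treated as something different.
cleaned <- tolower(trimws(as.character(f)))
unique(cleaned)
```

Simple string cleanup handles the stray spaces and capitalization, but abbreviations still need separate treatment.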

So I wanted a solution that was moderately robust with respect to extra spaces, capitalization, and abbreviation. A Google search turned up several solutions involving string manipulation, none of which entirely appealed to me. So I rolled my own, which I’m posting here. As usual, the code is licensed under a Creative Commons license (see the right-hand margin for details).

A few notes about the code:

  • I used the lubridate package to provide a function (month()) for extracting the month index from a date object. I know that some people dislike loading packages they don’t absolutely need (memory consumption, namespace clashes, …). I find the lubridate::month() function pleasantly robust, but if you want to avoid loading lubridate, I suggest you try one of the other methods posted on the Web.
  • My code loads the magrittr package so that I can “pipeline” commands. If you load a package (such as dplyr) that in turn loads magrittr, you’re covered. If you prefer the pipeR package, a minimal amount of tweaking should produce a version that works with pipeR. If you just want to avoid loading anything, the same logic will work; you just need to change the piping into nested function calls.
  • I make no claim that this is the most efficient, most robust or most elegant solution. It just seems to work for me.

The code includes a small example of its use.

#
# Load libraries.
#
library(lubridate)
library(magrittr)
#
# Function monthIndex converts English-language string
# representations of a month name to the equivalent
# cardinal value (1 for January, ..., 12 for December).
#
# Argument:
#   x  a character vector, or object that can be
#      coerced to a character vector
#
# Value:
#   a numeric vector of the same length as x,
#   containing the ordinals of the months named
#   in x (NA if the entry in x cannot be deciphered)
monthIndex <- function(x) {
  x                          %>%
    # strip any periods
    gsub("\\.", "", .)       %>%
    # turn it into a full date string
    paste0(" 1, 2001")       %>%
    # turn the full string into a date
    as.Date("%t%B %d, %Y")   %>%
    # extract the month as an integer
    month
}
#
# Unit test.
#
x
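For readers who would rather not load any packages, the same steps can be sketched in base R without the pipe. This is only a sketch under stated assumptions: the name monthIndexBase and the sample inputs are hypothetical, and format(..., "%m") stands in for lubridate::month(). Month-name parsing with %B depends on the LC_TIME locale, so English names are only expected to parse in an English (or C) locale.

```r
# Base-R sketch of the same logic, without magrittr or lubridate.
# monthIndexBase is a hypothetical name, not part of the original code.
monthIndexBase <- function(x) {
  # as.character() handles factors; gsub() strips any periods
  cleaned <- gsub("\\.", "", as.character(x))
  # append a dummy day and year, then parse the result as a date;
  # %t absorbs leading whitespace, %B accepts full or abbreviated names
  dates <- as.Date(paste0(cleaned, " 1, 2001"), format = "%t%B %d, %Y")
  # format(..., "%m") gives "01" through "12" (NA if parsing failed)
  as.integer(format(dates, "%m"))
}

# Illustrative inputs: full name, abbreviation with a period,
# trailing space, leading space with capitals, and an unparseable entry.
samples <- c("March", "Aug.", "dec ", " JANUARY", "not a month")
monthIndexBase(samples)
```

Entries that cannot be parsed come back as NA rather than raising an error, matching the behavior described in the function's header comment.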