AHA 2015: Data for Historical Research

Written by: Thomas Padilla

Primary Source: Thomas Padilla

At Kalani Craig’s invitation I had the great pleasure of joining a rockstar cast of instructors for the AHA Getting Started in Digital History workshop series. During my workshop, “Data for Historical Research” (slides below), I cast a pretty wide net.

The general purpose was to:

  1. recast common historical objects of inquiry (audio, text, video, images) as data
  2. define data
  3. highlight the affordances that data offer for extending a research question
  4. talk about “object essentialism” and what it might cause us to miss
  5. discuss structured vs. unstructured data
  6. speak truth about the work that goes into cleaning and preparing data
  7. discuss use cases
  8. talk about the relationship between disciplinary knowledge and technical knowledge
  9. give resources galore


As always, one of the best parts of teaching is the questions you get during and after.
I'm going to gloss a couple below.

  • Doesn’t there seem to be continuity between data prep and cleaning in digital scholarship and how Historians have long gathered resources, organized them, and utilized them for analog research?

I would say enthusiastically yes! A great deal of continuity. Whether explicit or implicit, disciplinary training imparts a high degree of technical fluency for working with a wide array of different sources, organizing them, and making sense of them. If I were to highlight one important distinction, I would say that cleaning and prepping data for a digital project means meeting formatting requirements that differ depending on the methods and tools you want to apply. So if you wanted to do a bit of topic modeling as well as some network analysis on a set of letters, chances are you are going to need to do different prep work for each. Knowing what this prep work will entail before you gather your data is essential. Best to save as much sanity as possible.
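To make the "different prep work for each" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the letter records, the field names, the tiny stopword list, and the two helper functions are invented, and real topic modeling or network tools may expect other formats. The same three letters become token lists for one method and an edge list for the other.

```python
import re
from collections import Counter

# Hypothetical letter records; names, fields, and text are invented for illustration.
letters = [
    {"sender": "A. Writer", "recipient": "B. Poet",
     "text": "The garden is lovely this spring."},
    {"sender": "B. Poet", "recipient": "A. Writer",
     "text": "Spring rain ruined the garden party."},
    {"sender": "A. Writer", "recipient": "B. Poet",
     "text": "The party can wait for better weather."},
]

# A deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "is", "this", "for", "can"}


def to_token_lists(records):
    """Topic-modeling style prep: one list of lowercase, stopword-free tokens per letter."""
    return [
        [w for w in re.findall(r"[a-z]+", r["text"].lower()) if w not in STOPWORDS]
        for r in records
    ]


def to_edge_list(records):
    """Network style prep: (sender, recipient, letter_count) rows; the text is ignored."""
    counts = Counter((r["sender"], r["recipient"]) for r in records)
    return [(s, t, n) for (s, t), n in sorted(counts.items())]


documents = to_token_lists(letters)
edges = to_edge_list(letters)
```

Note that the first function throws away who wrote to whom, while the second throws away what was said. That is why it pays to know which methods you plan to use before deciding what to capture from each letter.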

On the note of preserving sanity: when embarking on the data prep phase of a digital project, it’s best to take small steps. For example, if you want to map a correspondence network of 19th-century writers, start by prepping the smallest set of letters that still represents the full range of features you want to capture. Say, 10 letters. Map them.

Did it work? Yes? Awesome.

Didn’t work? Sads.

Honestly, it is better to realize that your formatting is not working with 10 letters than to get to the end of a line of 400 letters and have to reverse engineer where you went wrong and why the dang method and tool aren’t working for you. Also, personal plug: talk to a librarian. They are always happy to help.
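The small-steps advice above can be sketched as a quick check that runs on a handful of records before you commit to the full set. This is only an illustration: the required fields, the sample records, and the `check_sample` helper are hypothetical, and what counts as a "problem" will depend on the tool you are feeding.

```python
# Hypothetical schema: the fields a mapping or network tool might need for each letter.
REQUIRED_FIELDS = ("sender", "recipient", "date")


def check_sample(records, sample_size=10):
    """Return (index, problem) pairs for the first `sample_size` records."""
    problems = []
    for i, record in enumerate(records[:sample_size]):
        for field in REQUIRED_FIELDS:
            value = record.get(field) or ""
            # Treat absent, None, or whitespace-only values as missing.
            if not str(value).strip():
                problems.append((i, f"missing {field}"))
    return problems


# Invented sample records; the second one lacks a recipient.
sample = [
    {"sender": "A. Writer", "recipient": "B. Poet", "date": "1848-03-02"},
    {"sender": "B. Poet", "recipient": "", "date": "1848-03-19"},
]

issues = check_sample(sample)
for index, problem in issues:
    print(f"letter {index}: {problem}")
```

An empty result on the sample is a reasonable signal to keep going. A non-empty one tells you exactly which letters to fix while the pile is still small.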

  • How do you work with people who don’t think of their work as digital in nature?

This happens a lot. Whether or not someone thinks of their work as digital, in the best cases they are curious enough to seek me out and ask about some thing they heard about, or to request my definition of DH. I find the best way to make this interaction beneficial is to ask a lot of questions geared toward what this person is currently researching and what types of research materials they work with. Once you have the research questions in hand, it is easier to frame various digital approaches, and the methods they encapsulate, in a manner that makes sense. Optimally this familiarizes an approach like network analysis in such a way that it can be more readily seen as a way of extending a research question rather than as some sparkly spaghetti visualization thing.

Word to the wise, if you are seeking to learn more about DH I’d advise that you lead with your research questions. This will lead to a more productive exchange for all parties!

Thomas Padilla
Thomas Padilla is Digital Humanities Librarian at Michigan State University Libraries. Prior to his move to Michigan he was at the University of Illinois at Urbana-Champaign, working at the Scholarly Commons and the Preservation Unit of the University Library. Prior to that he was at the Library of Congress doing digital preservation outreach and education. Thomas maintains diverse interests in digital humanities, digital preservation, data curation, archives, History, and interdisciplinarity. His work and projects often map to these areas of interest.