Metadata = MetaGold?

Written by: Thomas Padilla

Primary Source: Thomas Padilla

If there is one thing that most libraries, archives, and museums have in bulk, it’s collections metadata. It’s the data that describes content in the collections. This stuff is added, updated, and augmented nearly every day. Over time and at sufficient scale it can give insight into the contours of a particular field – author gender distribution, co-citation networks, single vs. multi-author publication, subject area development, physical media size, language of expression, location of publication, and so forth.

As a Digital Humanist working in a library, I often think about how the years of work put into metadata might be turned to productive Digital Humanities use. To this end, I’ve captured a couple of metadata research use cases that inspire me. At the moment I’m attempting to make good on this inspiration by exploring catalog metadata from MSU’s Comic Book Collection (rumored to be the largest collection in North America). For a Comic Book Nerd/Historian/Librarian/Digital Humanist, it’s kinda like a dream come true.

Anyway … if you have a favorite example of working with metadata, please share in the comments below.

Women in Libraries (@benmschmidt)
Sketches out a method for determining the proportion of works written by women in a library collection. Once gender is assigned to works, female/male authorship can be visualized as a distribution across LC Class. Gender representation across authored works can also be mapped according to place of publication, publisher, etc.
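
To make the aggregation step a bit more concrete, here is a minimal Python sketch. It assumes gender has already been assigned to each record (the hard part, which the linked post deals with). The field names `lc_class` and `gender` and the sample rows are invented for illustration and are not taken from that project.

```python
from collections import Counter, defaultdict

# Hypothetical records: an LC class letter plus an already-assigned author gender.
records = [
    {"lc_class": "P", "gender": "female"},
    {"lc_class": "P", "gender": "male"},
    {"lc_class": "P", "gender": "female"},
    {"lc_class": "Q", "gender": "male"},
    {"lc_class": "Q", "gender": "male"},
    {"lc_class": "Q", "gender": "unknown"},
]

def female_share_by_class(rows):
    """Return {lc_class: share of works by women} among works with a known gender."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["lc_class"]][row["gender"]] += 1
    shares = {}
    for lc_class, c in counts.items():
        known = c["female"] + c["male"]
        # Skip classes where no work has a known gender, to avoid dividing by zero.
        if known:
            shares[lc_class] = c["female"] / known
    return shares

shares = female_share_by_class(records)
```

Leaving "unknown" out of the denominator is a design choice; depending on the question, it may be more honest to report unknowns as their own category.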

Analyzing Historical History Dissertations (@lincolnmullen)
Working with ETDs, the author draws upon various MARC fields to indicate changing gender participation in the discipline of history over time, change in page count for dissertations, page counts by university, and location where dissertations were produced.

Call Numbers (@benmschmidt)
The author charts where a term occurs most densely, e.g. in which LC classes is a word like ‘evolution’ most prevalent, and how does that prevalence change within classes over time?
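
A rough Python sketch of this kind of count might look like the following. It is an assumption-laden toy: it searches titles only, uses made-up catalog rows, and groups years into 50-year buckets, none of which is claimed to match the original analysis.

```python
from collections import defaultdict

# Hypothetical catalog rows: (LC class, publication year, title).
catalog = [
    ("QH", 1905, "The Evolution of Species"),
    ("QH", 1908, "Cell Biology"),
    ("HM", 1905, "Social Evolution"),
    ("HM", 1955, "Modern Society"),
    ("QH", 1955, "Evolution and Genetics"),
    ("QH", 1958, "Evolution in Action"),
]

def term_prevalence(rows, term, bucket=50):
    """Share of titles containing `term`, keyed by (LC class, start of year bucket)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lc_class, year, title in rows:
        key = (lc_class, year - year % bucket)
        totals[key] += 1
        # Simple case-insensitive whole-word match; real text needs better tokenizing.
        if term.lower() in title.lower().split():
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

prevalence = term_prevalence(catalog, "evolution")
```

Reporting a share rather than a raw count keeps large classes from dominating simply because they hold more books.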

The Library Project
Two visualizations that draw on catalog data to represent interdisciplinarity on the one hand and usage data on the other. The first visualization represents circulation data by subject area. The second visualization represents top 25 most interdisciplinary subject areas and the books that sit at the intersection between subject areas – in essence serving as a discovery tool oriented toward finding texts that span multiple subject areas.

Charting Former Owners of Penn’s Codex Manuscripts (@MitchFraas)
Special collections example – drawing from the MARC 700 (added entry, personal name) and 561 (ownership and custodial history) fields, a network of relationships is visualized between manuscripts and their former owners. In effect this allows visualization and analysis of provenance chains.
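
As a minimal sketch of the underlying data structure, the Python below builds manuscript-to-owner edges and then looks up which manuscripts each owner once held. The manuscript IDs and owner names are placeholders, and the sketch assumes the owner names have already been extracted and cleaned from the catalog records, which is likely the bulk of the real work.

```python
from collections import defaultdict

# Hypothetical data: manuscript id -> former owners, as they might be
# pulled from the 700 and 561 fields after cleanup.
manuscripts = {
    "ms_001": ["Owner A", "Owner B"],
    "ms_002": ["Owner B", "Owner C"],
    "ms_003": ["Owner A", "Owner B", "Owner C"],
}

def build_edges(records):
    """Return (manuscript, owner) edges of a two-mode provenance network."""
    return [(ms, owner) for ms, owners in records.items() for owner in owners]

def manuscripts_by_owner(edges):
    """Index the edges by owner so shared provenance is easy to spot."""
    holdings = defaultdict(set)
    for ms, owner in edges:
        holdings[owner].add(ms)
    return dict(holdings)

edges = build_edges(manuscripts)
holdings = manuscripts_by_owner(edges)
```

An edge list like this can be handed to most network visualization tools; owners that appear on many edges are the ones through whose hands many manuscripts passed.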

Tate Gallery Collection Metadata (@Tate)
The Tate Gallery released metadata for 70,000 artworks in their collection, and fascinating things are happening: new interfaces, interesting visualizations, and some portals that provide new ways to interact with the collection.

Thomas Padilla
Thomas Padilla is Digital Humanities Librarian at Michigan State University Libraries. Prior to his move to Michigan he was at the University of Illinois at Urbana-Champaign working at the Scholarly Commons and the Preservation Unit of the University Library. Prior to that he was at the Library of Congress doing digital preservation outreach and education. Thomas maintains diverse interests in digital humanities, digital preservation, data curation, archives, history, and interdisciplinarity. His work and projects often map to these areas of interest.