Written by: Julia Frankosky
Primary Source: Digital Scholarship Collaborative Sandbox
In August 2014, the MSU Libraries purchased the text files of the Congressional Record and its predecessors, the Annals of Congress, the Register of Debates, and the Congressional Globe, for the years 1789–1918. These text files can be downloaded from on campus and used by our researchers for text analysis. Wanting to utilize this new addition but being completely unfamiliar with text analysis, I consulted with two of our DH librarians, Devin Higgins and Thomas Padilla, who recommended I try the tool Voyant (among some other tools).
Voyant is an easy-to-use, free, web-based text analysis tool. Users can upload their own text files, provide a URL that points to text, copy and paste text into a text box, or open one of two collections that have been uploaded by others (currently the Humanist Listserv Archives and Shakespeare’s plays). Once text is added, users can “reveal” their texts to produce Cirrus word clouds, view summaries of the words used in the text (number of words used, the most frequently used words, peaks in frequency of usage, etc.), and access a corpus reader.
I uploaded 68 files comprising the textual data of the Congressional Record for 1915 in order to produce a Cirrus word cloud to illustrate the primary events and discussions occurring in Congress. War raged in Europe, but the U.S. had yet to enter as of 1915; I assumed that the war would be a primary topic for discussion, but was curious as to what other major issues were being discussed.
One thing to keep in mind is that this is a computer program, so when it produces a word cloud, the most commonly used words that are displayed tend to be function words (determiners, prepositions, auxiliary verbs, etc.), which don’t really add much to understanding a text.
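To see why function words crowd out everything else, here is a minimal Python sketch of a raw frequency count, roughly the kind of tally a word cloud is drawn from. This is not Voyant's own code; the `top_words` helper and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent words in text, ignoring case."""
    # Lowercase the text and pull out runs of letters (apostrophes allowed).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Hypothetical sample sentence, not taken from the Congressional Record.
sample = "The gentleman from Ohio asks that the bill for the navy be read to the House."
print(top_words(sample, 3))
```

Even in one short sentence, "the" outnumbers every topical word, and the effect only grows across a full year of proceedings.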
Luckily, it’s very easy to exclude these types of words using the built-in Taporware list, a large library of stop words. With just a push of a button, you start to see a more useful word cloud.
There are still many words that dominate the cloud yet add little to illustrating major topics in the Congressional Record. In addition to using the Taporware list of stop words, you can also add your own words to exclude from scanning. I added about 30 of my own words (such as Mr., gentleman, sir, say, and so on) and was able to produce a word cloud that was more illustrative of the discussion.
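The same idea can be sketched outside of Voyant in a few lines of Python. This is only an illustration of stop-word filtering, not how Voyant implements it: `BASE_STOP_WORDS` is a tiny stand-in for a real stop-word list, and the `word_counts` helper and sample sentence are hypothetical.

```python
import re
from collections import Counter

# A tiny stand-in for a full stop-word list (real lists hold hundreds of words).
BASE_STOP_WORDS = {"the", "of", "to", "and", "a", "in", "that", "is", "for", "on", "be", "i"}

def word_counts(text, extra_stop_words=()):
    """Count words in text, skipping base and user-supplied stop words."""
    # Normalize custom words the same way as the text: lowercase, letters only,
    # so that "Mr." matches the token "mr".
    extra = {re.sub(r"[^a-z]", "", w.lower()) for w in extra_stop_words}
    stop = BASE_STOP_WORDS | extra
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop)

# Hypothetical sample sentence, not taken from the Congressional Record.
sample = "Mr. Speaker, I say to the gentleman that the pension bill is a bill for the navy."
custom = ["Mr.", "gentleman", "sir", "say"]
print(word_counts(sample, custom).most_common(3))
```

With both lists applied, only topical words such as "bill", "pension", and "navy" are left to be counted.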
It could still use more tweaking to get rid of additional words that don’t add much, but major themes are now more obvious. In addition to the war, you can see that pensions, appropriations, money, and the navy were topics that were often mentioned in Congress in 1915. If I were interested in any of these topics in particular, I could click on the word and Voyant would display all sentences in which it occurs, giving me an idea of the context.
This visual representation of the text allowed me to quickly gain an understanding of common topics being discussed in Congress in 1915 without having to read 68 days’ worth of Congressional Record proceedings. This can be a great way to introduce researchers, especially students, to the wealth of information contained in the Congressional Record (or any other large chunk of text) in a non-intimidating manner.
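For readers curious what such a lookup involves, here is a rough Python sketch that pulls out every sentence containing a given word. It is an assumption-laden illustration rather than Voyant's method: the `sentences_with` helper is hypothetical, the sample text is invented, and the sentence splitting is deliberately naive (abbreviations such as "Mr." would trip it up).

```python
import re

def sentences_with(text, word):
    """Return the sentences in text that contain word as a whole word."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Match the word on word boundaries, ignoring case.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample text, not taken from the Congressional Record.
sample = ("The pension bill was read a second time. "
          "The navy appropriation was debated at length. "
          "Several members spoke on pensions for veterans.")
for sentence in sentences_with(sample, "pension"):
    print(sentence)
```

Because the match is on whole words, searching for "pension" skips the sentence that only contains "pensions"; a real tool might also group related word forms.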
- Seeing through the Congressional Record with Voyant - December 16, 2014