Written by: Josh Rosenberg
Primary Source: Joshua M. Rosenberg – March 11, 2017
The Internet Archive’s Television News Archive is a cool way to search closed captions from TV shows.
Here’s a bit more information on it:
In collaboration with the Internet Archive’s Television News Archive, GDELT’s Television Explorer allows you to keyword search the closed captioning streams of the Archive’s 6 years of American television news and explore macro-level trends in how America’s television news is shaping the conversation around key societal issues.
There’s an easy way to access the archive in R, via the awesome newsflash package. Since I am visiting my brother and father in Colorado, we decided to check how often rock climbing is mentioned on TV news stations, specifically affiliates of ABC, CBS, FOX, NBC, and PBS.
We annotated the plot with two events (my brother knows about them, not me): the 60 Minutes special on Alex Honnold and the first ascent of the Dawn Wall.
While it looks like rock climbing is being mentioned more, this may be partly because there is more news over time. To check, we would need to turn the number of mentions into a rate, such as mentions per some number of words or per hour of news.
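Here is a minimal sketch of that rate calculation in base R. The data frame is made up for illustration: the `hours` column (hours of captioned news indexed each week) is an assumption, not something the query in this post returns, so you would need to find a suitable denominator yourself.

```r
# Hypothetical weekly data; none of these numbers come from the archive.
weekly <- data.frame(
  week     = as.Date(c("2012-01-01", "2014-01-05", "2016-01-03")),
  mentions = c(10, 20, 30),
  # Assumed denominator: hours of captioned news indexed that week
  hours    = c(500, 1000, 2000)
)

# Convert raw counts into mentions per 100 hours of news
weekly$rate_per_100h <- weekly$mentions / weekly$hours * 100

print(weekly)
```

In this made-up example the raw count triples while the rate stays flat and then falls, which is exactly the situation a raw-count plot would hide.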
What else could this be useful for? Well, in education, discussion of policy issues and curricular standards could be worth a look.
Code (in R)
# newsflash queries the TV archive; tidyverse supplies dplyr and ggplot2;
# hrbrthemes supplies theme_ipsum_rc(). lubridate and ggthemes are called via ::
library(newsflash)
library(tidyverse)
library(hrbrthemes)

# Search closed captions across affiliate stations (ABC, CBS, FOX, NBC, and PBS)
climb <- query_tv("rock climbing", filter_network = "AFFNETALL")

# Dates of the two annotated events (t1, t2) and, one month earlier,
# the x positions for their text labels (t1i, t2i)
t1 <- lubridate::ymd_hms("2012-05-30 00:00:00", tz = "UTC")
t2 <- lubridate::ymd_hms("2016-01-12 00:00:00", tz = "UTC")
t1i <- lubridate::ymd_hms("2012-04-30 00:00:00", tz = "UTC")
t2i <- lubridate::ymd_hms("2015-12-12 00:00:00", tz = "UTC")

# Round each timeline entry to the nearest week so mentions stack into weekly bars
climb$timeline$date_w <- lubridate::round_date(climb$timeline$date_start, unit = "week")

mutate(climb$timeline, date_start = as.Date(date_w)) %>%
  ggplot(aes(date_start, value)) +
  geom_col() +
  scale_x_date(name = NULL, expand = c(0, 0)) +
  ggthemes::scale_fill_tableau(name = NULL) +
  labs(title = "Timeline") +
  theme(legend.position = "bottom") +
  theme(axis.text.x = element_text(hjust = c(0, 0.5, 0.5, 0.5, 0.5, 0.5))) +
  # ggtitle() replaces the "Timeline" title set above
  ggtitle("Rock Climbing on Affiliate TV Stations for ABC, CBS, FOX, NBC, and PBS") +
  ylab("Number of Mentions") +
  # Vertical lines mark the two events
  geom_vline(xintercept = as.numeric(as.Date(t1)), color = "#cd2626", alpha = .4) +
  geom_vline(xintercept = as.numeric(as.Date(t2)), color = "#cd2626", alpha = .4) +
  # Rotated labels sit just to the left of each line
  annotate("text", x = as.Date(t1i), y = 45, label = "60 Minutes Special on Alex Honnold", angle = 90, family = "Roboto Condensed") +
  annotate("text", x = as.Date(t2i), y = 45, label = "First Ascent of Dawn Wall", angle = 90, family = "Roboto Condensed") +
  labs(caption = "Data from the Internet Archive and GDELT Television Explorer (http://television.gdeltproject.org/cgi-bin/iatv_ftxtsearch/iatv_ftxtsearch).") +
  # theme_ipsum_rc() is a complete theme, so adding it last resets the theme() tweaks above
  theme_ipsum_rc(grid = "XY")