The Internet Archive’s Television News Archive and Newsflash

Written by: Josh Rosenberg

Primary Source:  Joshua M. Rosenberg – March 11, 2017

The Internet Archive’s Television News Archive is a cool way to search closed captions from TV shows.

Here’s a bit more information on it:

In collaboration with the Internet Archive’s Television News Archive, GDELT’s Television Explorer allows you to keyword search the closed captioning streams of the Archive’s 6 years of American television news and explore macro-level trends in how America’s television news is shaping the conversation around key societal issues.

There’s an easy way to access the archive in R, via the awesome Newsflash package. Since I am visiting my brother and father in Colorado, we thought we would check how often rock climbing is mentioned on TV news stations (specifically, affiliates of ABC, CBS, FOX, NBC, and PBS):


We annotated the plot with two events (my brother knows about them, not me): the 60 Minutes special on Alex Honnold and the first ascent of the Dawn Wall.

While it looks like rock climbing is being mentioned more over time, this might be due in part to there simply being more news over time. To check, we would need to turn the number of mentions into a rate, such as mentions per some number of words or per hour of news.
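As a rough illustration of that rate idea, here is a minimal sketch in base R. Everything in it is made up for the example: the `mentions` and `airtime` data frames, their column names, and the numbers are hypothetical stand-ins. In particular, I am assuming the hours of news per week would have to come from somewhere other than the query used in this post.

```r
# Hypothetical weekly mention counts (stand-in for something like climb$timeline).
mentions <- data.frame(
  week  = as.Date(c("2015-01-04", "2015-01-11", "2015-01-18")),
  value = c(12, 30, 9)
)

# Hypothetical hours of monitored news per week (an assumption; not part of
# the query results used in this post).
airtime <- data.frame(
  week  = as.Date(c("2015-01-04", "2015-01-11", "2015-01-18")),
  hours = c(400, 500, 450)
)

# Match the two tables by week, then express mentions per 100 hours of news.
rates <- merge(mentions, airtime, by = "week")
rates$per_100_hours <- 100 * rates$value / rates$hours

rates
```

Plotting `per_100_hours` instead of the raw `value` would show whether rock climbing takes up a growing share of the news, rather than just growing along with the amount of news.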

What else could this be useful for? Well, in education, discussion of policy issues and curricular standards could be worth a look.

Thanks to hrbrmstr for the package. The code below is heavily adapted from the Newsflash example.

Code (in R)


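library(newsflash)  # query_tv()
library(dplyr)      # mutate() and %>%
library(ggplot2)    # plotting

# Search closed captions for "rock climbing" across all affiliate networks
climb <- query_tv("rock climbing", filter_network = "AFFNETALL")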

t1 <- lubridate::ymd_hms("2012-05-30 00:00:00", tz = "UTC")
t2 <- lubridate::ymd_hms("2016-01-12 00:00:00", tz = "UTC")

t1i <- lubridate::ymd_hms("2012-04-30 00:00:00", tz = "UTC")
t2i <- lubridate::ymd_hms("2015-12-12 00:00:00", tz = "UTC")

climb$timeline$date_w <- lubridate::round_date(climb$timeline$date_start, unit = "week")

mutate(climb$timeline, date_start=as.Date(date_w)) %>% 
    ggplot(aes(date_start, value)) +
    geom_col() +
    scale_x_date(name=NULL, expand=c(0,0)) +
    ggthemes::scale_fill_tableau(name=NULL) +
    labs(title="Timeline") +
    theme(legend.position="bottom") +
    theme(axis.text.x=element_text(hjust=c(0, 0.5, 0.5, 0.5, 0.5, 0.5))) +
    ggtitle("Rock Climbing on Affiliate TV Stations for ABC, CBS, FOX, NBC, and PBS") +
    ylab("Number of Mentions") +
    geom_vline(xintercept = as.numeric(as.Date(t1)), color = "#cd2626", alpha = .4) +
    geom_vline(xintercept = as.numeric(as.Date(t2)), color = "#cd2626", alpha = .4) + 
    annotate("text", x = as.Date(t1i), y = 45, label = "60 Minutes Special on Alex Honnold", angle = 90, family = "Roboto Condensed") +
    annotate("text", x = as.Date(t2i), y = 45, label = "First Ascent of Dawn Wall", angle = 90, family = "Roboto Condensed") +
    labs(caption = "Data from the Internet Archive and GDELT Television Explorer (")
Joshua M. Rosenberg is a Ph.D. student in the Educational Psychology and Educational Technology program at Michigan State University. In his research, Joshua focuses on how social and cultural factors affect teaching and learning with technologies, in order to better understand and design learning environments that support learning for all students. Joshua currently serves as the associate chair for the Technological Pedagogical Content Knowledge (TPACK) Special Interest Group in the Society for Information Technology and Teacher Education. Joshua was previously a high school science teacher, and holds degrees in education (M.A.) and biology (B.S.).