Written by: Josh Rosenberg
Primary Source: Joshua M. Rosenberg – March 11, 2017
The Internet Archive’s Television News Archive is a cool way to search closed captions from TV shows.
Here’s a bit more information on it:
In collaboration with the Internet Archive’s Television News Archive, GDELT’s Television Explorer allows you to keyword search the closed captioning streams of the Archive’s 6 years of American television news and explore macro-level trends in how America’s television news is shaping the conversation around key societal issues.
There’s an easy way to access the archive in R, via the awesome newsflash package. Since I am visiting my brother and father in Colorado, we decided to check how often rock climbing is mentioned on TV news, specifically on affiliate stations for ABC, CBS, FOX, NBC, and PBS.
We annotated the plot with two events (my brother knows about them, not me): a 60 Minutes special on Alex Honnold and the first ascent of the Dawn Wall.
While it looks like rock climbing is being mentioned more over time, that may partly reflect more news being archived over time. To check, we would need to turn the number of mentions into a rate, such as mentions per some number of words or per hour of news.
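Here is a minimal sketch of that rate calculation. The numbers are made up for illustration, and the `news_hours` column is an assumption: the query in this post returns mention counts only, so a total like hours of monitored news would have to come from somewhere else.

```r
# Hypothetical data: neither the mention counts nor the airtime totals
# come from the archive; they only illustrate the calculation.
weekly <- data.frame(
  week       = as.Date(c("2012-01-02", "2014-01-06", "2016-01-04")),
  mentions   = c(10, 20, 30),
  news_hours = c(1000, 2500, 3000)  # assumed hours of monitored news per week
)

# Mentions per 1,000 hours of news
weekly$rate_per_1000h <- weekly$mentions / weekly$news_hours * 1000

weekly
```

In this made-up example the raw counts triple while the rate stays roughly flat, which is the pattern we would want to rule out before saying rock climbing is getting more coverage.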
What else could this be useful for? Well, in education, discussion of policy issues and curricular standards could be worth a look.
Code (in R)
library(newsflash)
library(tidyverse)
library(hrbrthemes)

# Search closed captions for "rock climbing" on all affiliate networks
climb <- query_tv("rock climbing", filter_network = "AFFNETALL")

# Dates of the two annotated events (vertical lines)
t1 <- lubridate::ymd_hms("2012-05-30 00:00:00", tz = "UTC")
t2 <- lubridate::ymd_hms("2016-01-12 00:00:00", tz = "UTC")

# Label positions, one month to the left of each line
t1i <- lubridate::ymd_hms("2012-04-30 00:00:00", tz = "UTC")
t2i <- lubridate::ymd_hms("2015-12-12 00:00:00", tz = "UTC")

# Round each timeline entry to the nearest week
climb$timeline$date_w <- lubridate::round_date(climb$timeline$date_start, unit = "week")

mutate(climb$timeline, date_start = as.Date(date_w)) %>%
  ggplot(aes(date_start, value)) +
  geom_col() +
  scale_x_date(name = NULL, expand = c(0, 0)) +
  ggthemes::scale_fill_tableau(name = NULL) +
  labs(title = "Timeline") +
  theme(legend.position = "bottom") +
  theme(axis.text.x = element_text(hjust = c(0, 0.5, 0.5, 0.5, 0.5, 0.5))) +
  ggtitle("Rock Climbing on Affiliate TV Stations for ABC, CBS, FOX, NBC, and PBS") +
  ylab("Number of Mentions") +
  # Vertical lines marking the two events
  geom_vline(xintercept = as.numeric(as.Date(t1)), color = "#cd2626", alpha = .4) +
  geom_vline(xintercept = as.numeric(as.Date(t2)), color = "#cd2626", alpha = .4) +
  # Rotated text labels next to each line
  annotate("text", x = as.Date(t1i), y = 45,
           label = "60 Minutes Special on Alex Honnold",
           angle = 90, family = "Roboto Condensed") +
  annotate("text", x = as.Date(t2i), y = 45,
           label = "First Ascent of Dawn Wall",
           angle = 90, family = "Roboto Condensed") +
  labs(caption = "Data from the Internet Archive and GDELT Television Explorer (http://television.gdeltproject.org/cgi-bin/iatv_ftxtsearch/iatv_ftxtsearch).") +
  theme_ipsum_rc(grid = "XY")
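The plot code rounds each date to a week and leaves it to geom_col() to stack rows that share a week. If you want the weekly totals as numbers, you can sum them explicitly. This is a rough sketch in base R on a small made-up data frame: the `date_start` and `value` column names mirror the timeline used above, but the values are invented, and `cut()` bins into weeks starting on Monday rather than rounding to the nearest week.

```r
# Hypothetical stand-in for climb$timeline (invented values)
timeline <- data.frame(
  date_start = as.Date(c("2017-01-02", "2017-01-04", "2017-01-09", "2017-01-13")),
  value      = c(3, 2, 5, 1)
)

# Assign each date to the Monday that starts its week
timeline$week <- as.Date(cut(timeline$date_start, breaks = "week"))

# Sum the mentions within each week
weekly_totals <- aggregate(value ~ week, data = timeline, FUN = sum)

weekly_totals
```

A table like this is also the natural starting point for the rate calculation, since each weekly total can be divided by that week's amount of news.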