Written by: Josh Rosenberg
Primary Source: Joshua M. Rosenberg, March 5, 2016
I came across this post on how to scrape data from Facebook pages for statistical analysis, and was motivated to give it a try. After thinking about which pages (including pages for educational organizations and communities) would be interesting to analyze, I looked at interactions with the Facebook pages of the 2016 United States presidential candidates. The files referenced in the post use Python; you could certainly do the same in R, but I had been looking for a chance to try out Python.
After scraping the data, I used the powerful ggplot2 package in R to create this plot of the total number of interactions (likes, comments, and shares) with the pages of each of the six presidential candidates remaining as of the current date.
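To make "total number of interactions" concrete, here is a minimal Python sketch of the aggregation step: add up likes, comments, and shares for each post, then sum per page. This is not the code from the files mentioned in this post. The field names (`candidate`, `likes`, `comments`, `shares`), the placeholder page names, and the numbers are all made up for illustration.

```python
from collections import Counter

# Hypothetical rows, one per scraped post. The field names and values are
# illustrative assumptions, not the columns or data of the real scraped files.
posts = [
    {"candidate": "Candidate A", "likes": 120, "comments": 30, "shares": 10},
    {"candidate": "Candidate B", "likes": 200, "comments": 50, "shares": 25},
    {"candidate": "Candidate A", "likes": 80, "comments": 20, "shares": 5},
]


def total_interactions(posts):
    """Sum likes + comments + shares across all posts, grouped by candidate."""
    totals = Counter()
    for post in posts:
        totals[post["candidate"]] += (
            post["likes"] + post["comments"] + post["shares"]
        )
    return dict(totals)


totals = total_interactions(posts)
for candidate, total in sorted(totals.items()):
    print(candidate, total)
```

To plot interactions over time rather than as one total per page, the same loop could key the counter on a (candidate, date) pair, assuming each scraped post also carries a timestamp.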
First, I’d like to emphasize that this is not cutting-edge – FiveThirtyEight, for example, is doing much more sophisticated analyses using Facebook.
It is, though, pretty interesting to see the spike in interactions with Donald Trump’s page and the slow but steady rise of interactions with Hillary Clinton’s page (and Bernie Sanders’). It is also interesting to think that this data from Facebook pages is publicly accessible (although data from individual users is, of course, not).
My broader interest in Facebook data is educational. For example, we’ve examined State Educational Twitter Hashtags (e.g., #miched). While we loved working with data from Twitter, we wondered whether Facebook might provide another venue for research. The answer, for now at least, is that Facebook data are much less rich than data from Twitter. That makes sense: the majority of Twitter users’ profiles and timelines are public (anyone can read a tweet someone posts), whereas the majority of Facebook users’ profiles are private.
The files (Python and R for the plot) are available here.