My Digital Vote projects
A Texas Experiment in Social Media Election Forecasts: The 2018 Texas house races
Roberto Cerina and Ray Duch
Nuffield College, University of Oxford
Can digital traces replace random-digit dialing? As part of the My Digital Vote project, we follow the digital traces of Texas Facebook users to predict the outcome of the Texas congressional district elections. Our predicted district-level vote shares, which use no polling data as input, are essentially the same as those produced from aggregated conventional polling data. There are 36 congressional districts in Texas. Thirty-three of them are not very competitive, and in these easy cases our predictions match the prevailing polling wisdom. The other three are less clear-cut. We agree with the FiveThirtyEight forecast for District 32: they consider it a closer race, but still call it for the Republican. FiveThirtyEight rates District 7 a complete toss-up, while we call it for the Republican. District 23 is a toss-up for us, while FiveThirtyEight calls it for the Republican. The Congressional District map of Texas summarizes our results.
In this Texas experiment a small sample of Facebook profiles produced high-frequency estimates of district-level vote share of comparable quality to state-of-the-art survey-based models.
This Texas experiment was initiated in May 2018, motivated by recent high-profile polling failures [1,2]. It asks whether behavior on social media can be used to forecast election results accurately. Could low-cost monitoring of social media increase the frequency with which partisan preferences are observed, widen the geographical scope of these observations, and reduce potential bias by leveraging revealed rather than stated preferences?
Some technical details
To produce the forecasts, we treat public, explicit measures of candidate support on Facebook (such as likes, loves, or explicitly positive comments) as proxies for voting intention. We then match users who made these public shows of candidate support to records in the public Texas Voter Registration file. The resulting data are treated as a survey. The district-level vote shares are then calculated using standard Multilevel Regression and Post-stratification (MRP) techniques [3,4]. The main difference is that we do not limit ourselves to parametric models; instead we use a Random Forest based Probability Machine [5]. We refer to this new approach simply as MLPs: Machine Learning and Post-stratification.
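The post-stratification step can be illustrated with a minimal sketch. It assumes that some model (for example, a probability machine) has already produced a predicted probability of Republican support for each demographic cell, and that the number of registered voters in each cell is known from the voter file. The function name, the tuple layout, and the numbers are hypothetical and chosen only for illustration; they are not the project's code or data.

```python
from collections import defaultdict


def poststratify(cells):
    """Aggregate cell-level predictions into district-level vote shares.

    `cells` is an iterable of (district, n_voters, p_republican) tuples:
    one entry per demographic cell within a district, where `n_voters`
    is the cell's population count and `p_republican` is the model's
    predicted probability of Republican support for that cell.
    """
    weighted = defaultdict(float)  # sum of n_voters * p_republican per district
    totals = defaultdict(float)    # sum of n_voters per district
    for district, n_voters, p_rep in cells:
        weighted[district] += n_voters * p_rep
        totals[district] += n_voters
    # District share = population-weighted average of the cell predictions.
    return {d: weighted[d] / totals[d] for d in totals if totals[d] > 0}


# Made-up illustrative cells (NOT project data).
cells = [
    ("TX-07", 1000, 0.60),
    ("TX-07", 3000, 0.40),
    ("TX-23", 2000, 0.50),
]
shares = poststratify(cells)
```

Because each cell is weighted by its count in the target population rather than by its frequency in the sample, a non-representative sample of Facebook users can still yield district-level estimates, provided the cell-level predictions are reasonable.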
Since we do not have the election results yet (the election is on Tuesday, November 6th), we benchmark against the forecast from fivethirtyeight.com. The FiveThirtyEight classic model is based on the latest public opinion polls, long-term trends in aggregate voting behavior and polling errors, and correlations across similar districts nationwide. It turns out that observing the behavior of about 6,000 registered voters active on Facebook can provide a measure of support as accurate as one that leverages thousands of opinion polls, each with a sample size of roughly 1,000 or more.
Figure 2 compares the estimates of Republican two-party vote share from the Digital Vote project with the FiveThirtyEight forecast for the week before election day. Our Digital Vote forecasts, particularly for the least competitive races, tend to gravitate closer to 0.5 than the FiveThirtyEight forecasts do. This reflects a noisier signal from the social media sample. Note that the uncertainty intervals around our predictions are narrower than those of FiveThirtyEight, which reflects the very different estimation methods employed here.
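For readers unfamiliar with the quantity being compared, the sketch below shows how a Republican two-party vote share is conventionally computed (Republican support divided by Republican plus Democratic support) and one simple way to summarize the gap between two sets of forecasts. The function names and the choice of mean absolute difference as a summary are assumptions made for illustration, not the comparison method used in the figure.

```python
def two_party_share(rep, dem):
    """Republican share of the two-party vote: rep / (rep + dem)."""
    total = rep + dem
    if total <= 0:
        raise ValueError("rep + dem must be positive")
    return rep / total


def mean_abs_diff(ours, benchmark):
    """Mean absolute difference over the districts present in both forecasts.

    Both arguments map a district label to a two-party vote share in [0, 1].
    """
    common = sorted(set(ours) & set(benchmark))
    if not common:
        raise ValueError("no districts in common")
    return sum(abs(ours[d] - benchmark[d]) for d in common) / len(common)
```

A share above 0.5 corresponds to a Republican lead, and a share at or very near 0.5 to a toss-up.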
As mentioned earlier, only three Texas Congressional Districts are expected to be in contention. Our estimates agree with the FiveThirtyEight forecast in District 32 (close but Republican). We disagree in District 7, which we call for the Republican, and in District 23, which we call a toss-up.
[1] Sturgis, P., et al. (2016). Report of the Inquiry into the 2015 British general election opinion polls.
[2] Kennedy, C., Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., … & Saad, L. (2018). An evaluation of the 2016 election polls in the United States. Public Opinion Quarterly, 82(1), 1–33.
[3] Lauderdale, B. E., Bailey, D., Blumenau, Y. J., & Rivers, D. (2017). Model-Based Pre-Election Polling for National and Sub-National Outcomes in the US and UK. Working paper.
[4] Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31(3), 980–991.
[5] Malley, J. D., Kruppa, J., Dasgupta, A., Malley, K. G., & Ziegler, A. (2012). Probability machines. Methods of Information in Medicine, 51(01), 74–81.