As Nate Cohn outlined in The New York Times on Thursday, the latter three error sources are more likely to undercount Democrats than Republicans. For example, Democrats are more likely than Republicans to have a cell phone with an area code different from where they currently live (as do all three authors of this article), which results in coverage error, since such individuals cannot be included in state-level polls. Cohn notes that among cell-phone-only adults, those whose area code does not match where they live lean Democratic by 14 points, whereas those whose area code does match lean Democratic by 8 points. As an example of non-response and survey error, Cohn notes that Hispanics who are uncomfortable taking a poll in English are more likely to vote Democratic than demographically similar Hispanics who are comfortable doing so.
Thus, we expect the actual polling errors to be larger than the stated errors, and moreover, we expect polling results to favor the Republicans. This pattern is strikingly apparent when we plot the observed differences between poll predictions and actual election outcomes for the 2012 Senate races…
How much do these overly optimistic forecasts matter? First, the theoretical 3 percentage point margin of error is already substantial, and puts nearly every competitive race within that range. Second, when you add in the unaccounted-for errors, election outcomes in contested races are simply far less certain than the stated margins suggest, and coverage and non-response errors will likely only get worse each cycle. Third, while aggregating a bunch of polls for each election reduces the variance, it does not eliminate the bias, so these overconfident predictions pose a problem for aggregate forecasts as well. In short, those fancy models that show probability of victory are only as good as their ingredients, and if the polls are wrong, the poll aggregations will be wrong as well.
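The first and third points can be illustrated with a rough simulation. This is only a sketch under assumptions that are ours for illustration, not figures from the article: a hypothetical race where the candidate's true share is 50 percent, every poll shares the same 2-point bias against that candidate, and each poll samples 1,000 respondents. The function names and constants below are likewise invented for the example.

```python
import math
import random
import statistics

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def simulate_poll(true_share, bias, n, rng):
    """One poll: each respondent backs the candidate with probability true_share + bias."""
    p = true_share + bias
    supporters = sum(1 for _ in range(n) if rng.random() < p)
    return supporters / n

def average_of_polls(true_share, bias, n, k, rng):
    """Simple aggregate: the unweighted mean of k independent polls with the same bias."""
    return sum(simulate_poll(true_share, bias, n, rng) for _ in range(k)) / k

# All values below are hypothetical, chosen only to illustrate the argument.
TRUE_SHARE = 0.50   # candidate's actual share of the vote
BIAS = -0.02        # shared coverage/non-response bias in every poll
N = 1000            # respondents per poll
K = 10              # polls per aggregate
TRIALS = 100        # repetitions of the experiment

rng = random.Random(0)  # fixed seed so the run is repeatable
single_errors = [simulate_poll(TRUE_SHARE, BIAS, N, rng) - TRUE_SHARE
                 for _ in range(TRIALS)]
avg_errors = [average_of_polls(TRUE_SHARE, BIAS, N, K, rng) - TRUE_SHARE
              for _ in range(TRIALS)]

print(f"stated margin of error, n={N}: {margin_of_error(N):.3f}")
print(f"single poll:   mean error {statistics.mean(single_errors):+.3f}, "
      f"spread {statistics.stdev(single_errors):.3f}")
print(f"{K}-poll average: mean error {statistics.mean(avg_errors):+.3f}, "
      f"spread {statistics.stdev(avg_errors):.3f}")
```

Under these assumptions, the spread of the averaged polls shrinks relative to a single poll, but the mean error stays near the assumed 2-point bias: averaging tightens the estimate around the wrong number.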
—David Rothschild, Sharad Goel, and Houshmand Shirani-Mehr
Hidden Errors and Overconfident Pollsters