ERROR.

Mark Blumenthal has a comprehensive response to a London Review of Books piece that suggests, among other things, that the problem with primary polling this year has been too-small sample sizes. This implies a pretty fundamental misunderstanding of what sampling is on the part of the author, David Runciman, but it got me thinking in general about how we conceptualize error in public opinion polling and how it might relate to the scattershot nature of this year's polls.

First, as Runciman correctly points out, there have been plenty of election-eve polls this year that were nowhere near the mark -- apparently more than the 5% that the standard 95% margin-of-error framework allows for. Zogby famously had Obama winning California by six, and multiple concurrent fieldings of head-to-head match-ups are frequently quite disparate. Indeed, Gallup has gotten dissonant findings from their own two concurrent polls at several points this year. Something is going on with the polls, and it goes far beyond the hue and cry over the New Hampshire "debacle" that was addressed ad nauseam at AAPOR this year.
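(For reference, here's a quick back-of-the-envelope sketch of the standard 95% margin-of-error calculation -- my own illustration, not Runciman's or Blumenthal's -- which shows why sample size alone can't be the story: sampling error is already priced into the reported MOE, and it shrinks only with the square root of the sample size.)

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Standard 95% margin of error for a simple random sample,
    using the most conservative assumption p = 0.5."""
    return z * math.sqrt(p * (1 - p) / n)

# Typical poll sizes: error shrinks slowly as samples get bigger.
for n in (400, 600, 1000, 2000):
    print(f"n = {n:4d}: +/- {100 * margin_of_error(n):.1f} points")
# n =  400: +/- 4.9 points
# n =  600: +/- 4.0 points
# n = 1000: +/- 3.1 points
# n = 2000: +/- 2.2 points
```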

I think there are a couple of major issues that could easily be addressed just in the way polls are handled in the media. The first is something that Keith Olbermann is already doing, and that he's calling the "Keith Number" because nobody else is bothering to follow his lead. This number is the stated margin of error plus the percentage of undecided voters in the sample; so, a poll with Obama leading McCain 47-44 with an MOE of +/-3 would have a Keith Number of 12 (the 9% undecided plus the 3-point MOE). Putting aside for the moment that it should be 15, since the MOE moves in both directions, this is a pretty stark change in the way poll stories are framed. When most polls are reported, undecided voters don't exist, and neither do supporters of third parties, unless and until they make enough noise to force their candidate into the polling instrument. Undecideds are a huge part of the story of why polling has erred so much to the Obama side this year -- Democratic primary voters who decided on the last day tended to support Clinton, and those people would've been undecided when the polling was conducted.
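To make the arithmetic concrete, here's a minimal sketch of the Keith Number using the example above (my own illustration, not Olbermann's); the second function counts the MOE in both directions, per the quibble about 15.

```python
def keith_number(candidate_shares, moe):
    """Stated margin of error plus the share of undecided respondents."""
    undecided = 100 - sum(candidate_shares)
    return moe + undecided

def keith_number_full_band(candidate_shares, moe):
    """Same idea, but counting the MOE in both directions (2 * MOE)."""
    undecided = 100 - sum(candidate_shares)
    return 2 * moe + undecided

# Obama 47, McCain 44, MOE +/- 3
print(keith_number([47, 44], 3))            # 12
print(keith_number_full_band([47, 44], 3))  # 15
```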

Another problem is how strength of support is measured. Some polls include leaners -- that is, soft supporters of one candidate or another -- in the same category as strong supporters. But leaners, quite obviously, are much more likely to switch candidates or wind up not voting than are strong supporters, making their inclusion another important source of potential error. This, too, is something that could be clarified by the media.
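As an illustration -- with invented numbers, not from any actual poll -- here's how much softness a headline horse-race figure can hide once leaners are folded in with strong supporters:

```python
# Hypothetical breakdown (illustrative numbers only, not from any real poll):
poll = {
    "obama_strong": 38, "obama_lean": 9,
    "mccain_strong": 36, "mccain_lean": 8,
    "undecided": 9,
}

# The headline numbers fold leaners in with strong supporters...
obama = poll["obama_strong"] + poll["obama_lean"]      # 47
mccain = poll["mccain_strong"] + poll["mccain_lean"]   # 44

# ...but the share of the sample that could plausibly move is much larger
# than the reported undecideds alone.
movable = poll["obama_lean"] + poll["mccain_lean"] + poll["undecided"]
print(obama, mccain, movable)  # 47 44 26
```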

But polling error goes deeper than that. How is it that polls conducted at the same time, purporting to measure the same thing, can be so different? Something that's almost never acknowledged in the reporting of poll results is the impact of question wording and question order. The order of names within a question can matter, whether the question includes individuals' titles or party affiliation can matter, whether the question is built around the word "vote" or "support" can matter, etc. Can this alone explain the wildly divergent results we've seen in some races? Of course not, but this and other methodological factors -- such as live vs. automated interviewers -- contribute some error in places that are often kept in shadow.

When we compare polls not to other polls but to actual election results, the situation is complicated further. For example, Obama won the Missouri primary by about 11,000 votes out of over 827,000 cast. What is the likelihood that 11,000 Missourians thought about voting in that primary, but ultimately decided not to? To take an even more extreme example, what is the likelihood that 538 Floridians wanted to vote for Al Gore in 2000, but got side-tracked on Election Day and never made it to the polls? Close elections are toss-ups for reasons that are anything but political and may not even have anything to do with individual voters -- bad weather, traffic jams, etc.

Given all this potential error, the way we discuss poll results is incredibly wrong-headed. While journalists give a nod to the margin of error, swings within it -- particularly swings in which the "lead" changes -- are treated as real events. Political scientists are guilty of this as well, as they try to construct predictive models that account for unaccountably close elections, which for all intents and purposes are ties from a data perspective. What I think is clear from this year's polls is that a) we have a media problem, and b) we have a polling problem. I was glad to see AAPOR talking a lot about it at this year's conference, but I'm somewhat concerned that the focus was so acutely on New Hampshire and journalist education, and not on working towards a set of best practices for public opinion measurement. What's within the power of pollsters is to sample better and measure better, and those ought to be the first steps.
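On the "real events" point above: a rough sketch (standard sampling-error arithmetic, with invented poll figures, my own illustration) shows how easily a lead change between two polls can sit entirely inside sampling noise.

```python
import math

def margin_se(p_a, p_b, n):
    """Standard error of the lead (p_a - p_b) within a single poll,
    treating responses as a multinomial sample."""
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)

def swing_is_noise(poll1, poll2, z=1.96):
    """Rough check: is the change in the lead between two independent polls
    within what sampling error alone could produce?"""
    (a1, b1, n1), (a2, b2, n2) = poll1, poll2
    swing = (a2 - b2) - (a1 - b1)
    se = math.sqrt(margin_se(a1, b1, n1) ** 2 + margin_se(a2, b2, n2) ** 2)
    return abs(swing) < z * se

# Hypothetical: a 3-point lead "flips" to a 2-point deficit between two
# polls of 800 respondents each -- indistinguishable from noise.
print(swing_is_noise((0.47, 0.44, 800), (0.45, 0.47, 800)))  # True
```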

Posted by Aaron S. Veenstra ::: 2008:05:31:16:37