Mon 29 Sep 2008

A few weeks back I was really starting to tire of the constant political punditry. Around the same time someone introduced me to two sites that have now become my first stop for political updates. They're both sites that use a variety of sophisticated statistics to aggregate the avalanche of polls that are coming out on a daily basis. The result, I think, is a picture of what really matters in the election, which voters are really important, which states are truly the battlegrounds.

One site, the Princeton Election Consortium (PEC), is run by Sam Wang, a biophysicist and neuroscientist at Princeton. It focuses on a meta-analysis that gives us a snapshot of how the election stands right now; it doesn't make predictions. One of the most interesting parts of the site is a graph that charts the median electoral count for Obama since April. Sam has carefully marked some of the events that seemed to significantly turn public opinion. I also note that, for the first time since the convention, the 95% confidence interval for the estimator is above the 270 electoral vote line Obama would need to win.
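
For the curious, here is a rough sketch of how a snapshot like that median and its 95% band could be computed. This is my own illustration, not PEC's code. It assumes each state is decided independently with a known win probability, and every number in `STATES` is made up (a lump of 200 "safe" votes plus five invented swing states).

```python
# Hypothetical inputs: (electoral votes, probability the candidate carries the state).
# Invented for illustration only; these are not real polling numbers.
STATES = [(200, 1.0), (27, 0.55), (21, 0.70), (20, 0.60), (15, 0.45), (13, 0.50)]


def ev_distribution(states):
    """Exact probability of each electoral-vote total, assuming independent states."""
    dist = {0: 1.0}
    for ev, p in states:
        nxt = {}
        for total, prob in dist.items():
            if p > 0:  # branch where the candidate wins this state
                nxt[total + ev] = nxt.get(total + ev, 0.0) + prob * p
            if p < 1:  # branch where the candidate loses this state
                nxt[total] = nxt.get(total, 0.0) + prob * (1 - p)
        dist = nxt
    return dist


def quantile(dist, q):
    """Smallest total whose cumulative probability reaches q."""
    cum = 0.0
    for total in sorted(dist):
        cum += dist[total]
        if cum >= q:
            return total
    return max(dist)  # guard against floating-point shortfall


if __name__ == "__main__":
    dist = ev_distribution(STATES)
    print("median:", quantile(dist, 0.5))
    print("95% band:", quantile(dist, 0.025), "to", quantile(dist, 0.975))
```

If the lower end of that band lands above 270, you get the kind of picture described above: even the pessimistic edge of the estimate is a win.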

The other great site out there is FiveThirtyEight.com, which is run by Nate Silver. Nate is actually a sports statistician by trade, and by a lot of people's estimates, he's completely revolutionized the way sports statistics are calculated and used. See the bio of him from Newsweek. 538 (which, incidentally, is the total number of electoral votes out there) differs from the PEC in that it actively tries to make a prediction about what the outcome will be on election day based on current and past information. I don't pretend to understand the statistics, but 538's approach involves, among other things, simulating election results under a variety of parameters. So, for example, right now Obama wins 80.5% of the simulations. More detail on what makes 538 different is available here.
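
To make the idea of "winning X% of the simulations" concrete, here is a toy Monte Carlo sketch. It is not 538's model, which I don't claim to understand. It simply assumes each state is an independent draw with a fixed win probability, and the states, vote counts, and probabilities in `STATES` are all invented for illustration (any votes not listed are assumed safe for the opponent).

```python
import random

# Hypothetical toy map: name -> (electoral votes, probability the candidate wins).
# Invented for illustration only; these are not real polling numbers.
STATES = {
    "Safe bloc": (200, 1.0),
    "State A": (27, 0.55),
    "State B": (21, 0.70),
    "State C": (20, 0.60),
    "State D": (15, 0.45),
    "State E": (13, 0.50),
}
VOTES_TO_WIN = 270


def simulate_once(rng):
    """Play out one election: each state is won independently with its probability."""
    return sum(ev for ev, p in STATES.values() if rng.random() < p)


def win_share(n_sims=10_000, seed=538):
    """Fraction of simulated elections in which the candidate reaches 270."""
    rng = random.Random(seed)
    wins = sum(simulate_once(rng) >= VOTES_TO_WIN for _ in range(n_sims))
    return wins / n_sims


if __name__ == "__main__":
    print(f"Candidate wins {win_share():.1%} of simulations")
```

A real model would presumably let states move together (a bad night in one swing state tends to mean a bad night in others) and would account for the time left until election day; this sketch does neither.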

I love geeking out on the stats of these two sites, even though I don't pretend to understand it all. It gives me a basis for judgment beyond the whims of a particular commentator. Don't get me wrong, both guys are liberals, but they're transparent about it. And they give us all the raw data so we can draw our own conclusions from the models.

Still, I'm taking it all with a grain of salt. In the past, when I've had to teach about statistics I've tried to debunk the popular notion that statistics are objective. I've said that you can't divorce the mathematics of a statistical test (e.g. regression) from the researcher who chooses its inputs and output and who uses it in service of a particular question. A statistic might be objective, but only until you try to say anything about it or use it for a particular purpose.

PEC and 538 are certainly open to this bias and, to be fair, both Wang and Silver know it. PEC's FAQ page includes the question 'Why should I believe the Meta-Analysis? In 2004, didn't it predict a narrow Kerry victory?' To which Sam Wang responds (touche!), oops, I goofed. Turns out he included an assumption in his model that was influenced by his particular political leaning. Without that assumption, his method predicted the final result spot on. How do we know this won't happen again?

**Update: I accidentally referred to the author at FiveThirtyEight.com as Nate Quinn. Actually, his name is Nate Silver. I mixed him up with the site's other major contributor, Sean Quinn. Apologies!**

You know it won't happen again because my methods are purposely kept simple enough to be transparent. There is a bit of an entry barrier to the code itself (MATLAB and Python), but the method should be well enough documented that people can understand it.

There is always the danger of bias on the part of the analyst. The difficulty is in making good choices. As always I am attentive to readers.

Nate Silver's methods aren't perfect but he is in general an honest broker. His site provides a prediction, as opposed to my site, which provides a current snapshot. Recently he assigned undecided voters in an uneven split. This is similar to my error in 2004. However, the election does not look to be headed for a close outcome so I think he will not suffer a penalty.

Sam Wang

election.princeton.edu.

That's a fair point. Transparency is one of the best arguments in favor of open source software, for example. I tend to use R, an open source statistical package, partly because I like the confidence I get from knowing anyone, anywhere can check out the code. SPSS, STATA, etc. are developed by truly smart people, but 'many eyes make all bugs shallow.'

Still, it seems to me it's very hard to know which assumptions are problematic except in retrospect. You had great historical reasons for weighting undecided voters as you did in 2004.

What I'd really like to see, I guess, is some conservative Republicans who run sites similar to PEC and 538. Anyone know of any?
