Nate Silver taught numbers how to fuck.: ontd

cut_piece in ontd_political

Nov 10, 2008 10:22

Monday, November 10, 2008
Franken's Odds of Winning Recount May Be Long -- or Short

Votes counted in Minnesota's senate race: 2,833,089

Votes separating Norm Coleman and Al Franken: 221

Determining a candidate's odds of winning a recount is a function of three parameters. The first parameter is the margin separating the leading and trailing candidates. In Minnesota, this margin is apparently 221 votes -- although it has changed several times since results first came in on Tuesday night (it was originally more than 700), and it may change again before results are finalized this week. But let's assume that 221 is the correct number for the time being.

The second parameter is what I call the Correctable Error Rate (CER). This is the percentage of ballots that were not counted originally, but which will be counted given a hand recount.

The third parameter is the percentage of recounted ballots which are resolved for the trailing candidate -- in this case, Al Franken. It might seem natural to assume that this number is 50.0%, but there is good reason to think that it might not be. More on this in a moment.

But for now, let's get back to estimating that other parameter, the Correctable Error Rate. There are essentially two reasons why a vote might be missed in a machine count. The first is if the voter undervotes the ballot, and the second is if he overvotes it.

An overvote occurs when a machine -- in this case, Minnesota's optical ballot scanners -- registers a vote for two or more candidates in a given race. When this occurs, the machine throws both votes out, meaning that no vote is recorded in that race. An overvote is always -- or almost always -- unintentional. It may occur, for instance, when a voter initially selects one candidate and then crosses his name out before picking the other one (see example from the Minnesota Secretary of State below). It might also occur if, say, a voter fully fills in the oval beside one candidate, but then leaves a stray pen mark beside another candidate's name.

An undervote is just the opposite -- it occurs when the machine is unable to record a vote for any candidate in that race. This is probably the more common error, and may occur if the voter fails to follow the ballot's instructions in any number of ways, such as by placing an 'X' by the candidate's name rather than filling in his oval, or using his own pen or pencil rather than the one provided to him. Unlike an overvote, however, an undervote may oftentimes be intentional -- the voter may simply skip a race that he is not interested in.

The Associated Press has reported that there were approximately 25,000 ballots -- or about 0.9 percent of the total cast in Minnesota -- in which a vote was recorded for the presidency but not for Minnesota's senate race. This figure might be either too high or too low as an estimate of the true error rate in Minnesota. On the one hand, in many or perhaps even most of these cases, the voter may have left the senate race blank intentionally. On the other hand, this total is not inclusive of certain other types of errors, such as when the voter undervoted both the presidency and the senate race (as might occur when the voter was systematically making the same error in all the races on his ballot), or when the machine recorded a vote, but did so for the wrong candidate (this particular error should be fairly rare, but may happen occasionally).

For what it is worth, a 0.9 percent error rate would be fairly consistent with other studies of optical scanning systems, which are considered among the more reliable voting technologies (they are almost certainly the most reliable fully auditable voting system). These error rates are relatively low, in part, because most optical scanning systems can quickly read a ballot before it is handed to the poll worker, alerting a voter to potential overvotes or undervotes -- a process known as 'precinct scan'. In Minnesota, the vast majority of counties have such precinct scanning systems, but they may be applied inconsistently -- it appears that in most precincts, for instance, the machines were programmed to alert the voter to an overvote, but not to an undervote. If a precinct scan check is not applied, or the poll worker is too busy or distracted to alert the voter, error rates using optical scanning systems may be at least twice as high.

Still, I would guess that 0.9 percent is toward the higher end of the plausible range for what I am calling the Correctable Error Rate -- the fraction of ballots that will be resolved differently when recounted by hand than when initially counted by machine. Many undervotes, as mentioned above, may be intentional. Among those that aren't, moreover, the voter's intent might not be sufficiently easy to determine even upon a hand recount. I would guess that somewhere between 7,500 and 25,000 ballots (or about 0.25 percent to 0.90 percent of the total vote) will actually be reclassified during the hand recount. Moreover, about 15 percent of these votes will be counted for third-party candidate Dean Barkley, rendering them essentially meaningless.

If the Correctable Error Rate in fact falls somewhere in this range, then Franken's chances of winning a recount are not very strong -- provided that a misclassified ballot is equally likely to favor Franken or Coleman. By using a binomial distribution, we can estimate Franken's chances of gaining at least 221 votes given various CERs:

Correctable      Odds of Franken
Error Rate       Winning Recount*
=================================
0.10%             0.00%
0.25%             0.24%
0.50%             2.27%
0.75%             5.14%
0.90%             6.93%
1.00%             8.01%
1.50%            12.52%
2.00%            16.04%
3.00%            21.00%
* Assuming equal distribution of Franken, Coleman errors.

If, for instance, 25,000 votes or about 0.9 percent of the total are reclassified during the recount, then Franken's odds of winning are only about 7 percent. If only 0.5 percent of the total vote is reclassified, then his odds of winning are not much more than 2 percent.
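
For readers who want to check the arithmetic, here is a minimal Python sketch of the binomial calculation described above. It is not the spreadsheet behind the table. It assumes that the number of contested ballots is the CER times the total vote, that the 15 percent going to Dean Barkley are dropped, and that Franken must net at least 221 votes from what remains. The function and constant names are hypothetical, and the output should land in the same neighborhood as the table without matching it digit for digit.

```python
import math

TOTAL_VOTES = 2_833_089      # votes counted in the senate race (from the article)
DEFICIT = 221                # votes separating Coleman and Franken (from the article)
THIRD_PARTY_SHARE = 0.15     # share of reclassified ballots going to Barkley (from the article)


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def recount_win_probability(cer, franken_share=0.5, deficit=DEFICIT):
    """Chance that Franken nets at least `deficit` votes from corrected ballots.

    Assumption of this sketch: every corrected two-party ballot goes to
    Franken with probability `franken_share`, independently of the others.
    """
    n = round(TOTAL_VOTES * cer * (1 - THIRD_PARTY_SHARE))
    # If Franken gets k of the n ballots, Coleman gets n - k, so the net
    # gain is 2k - n. We need 2k - n >= deficit.
    k_min = (n + deficit + 1) // 2
    if k_min > n:
        return 0.0
    return sum(math.exp(log_binom_pmf(k, n, franken_share))
               for k in range(k_min, n + 1))


for cer in (0.0025, 0.005, 0.009):
    print(f"CER {cer:.2%}: {recount_win_probability(cer):.2%}")
```

Passing a different `franken_share` (say 0.51) shows how sensitive the answer is to even a small tilt in how the corrected ballots break.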

Until now, however, we have been assuming that ballot tabulation errors are equally likely to favor Franken and Coleman -- but this is probably not the case. Why not? There is substantial evidence that undervotes and overvotes are significantly more common among what we might call vulnerable voters -- in particular, minorities, elderly voters, low-income and low-education voters, and first-time voters. A 2001 study for the House Committee on Government Reform found that undervoted ballots were more than twice as common in minority-heavy, low-income precincts as in predominately white, upper-income precincts -- even when using the relatively reliable, precinct-based optical scanning system that Minnesota uses. (The discrepancies are significantly higher when using less reliable technologies like punch cards.)

How might these demographics play out in Minnesota? According to exit polls, elderly voters split their votes almost exactly evenly between Franken and Coleman (Coleman's strength came from middle-aged voters, not older or younger ones). There was little relationship, moreover, between education levels and voter preferences.

Among other groups of vulnerable voters, however, Franken significantly outperformed Coleman. Franken led by 15 points among voters making $50,000 or less, while Coleman led by 3 among voters making between $50,000 and $100,000, and by 16 among voters making $100,000 or more. Coleman won white voters by 3 points, but Franken won among minorities by 40 points. And while there is no direct evidence of this in the exit polls, it is likely that Franken performed significantly better than Coleman among first-time voters.

Assume that minorities are 50% more likely than white voters to have undervoted the ballot; this is arguably a conservative assumption. If this is the case, then about 51.0% of reclassified ballots (excluding those cast for third parties) are likely to be resolved in Franken's favor. Alternatively, suppose that voters making $50,000 or less are 50% more likely than wealthier voters to have undervoted the ballot. In this case, 51.3% of reclassified ballots would go to Franken. This might not seem like a big deal, but as you'll see in a moment, it makes a huge amount of difference.
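
The reweighting behind figures like these can be sketched in a few lines of Python. This is an illustration of the idea, not the article's actual calculation: the article does not say what share of the electorate each group makes up, so `ASSUMED_MINORITY_SHARE` below is a purely hypothetical placeholder, as are the function names and the 0.85 two-party total (borrowed from the 15 percent Barkley figure). Because of those assumptions, the result need not match the 51.0% quoted above.

```python
def two_party_share(margin, two_party_total=0.85):
    """Franken's share of the Franken-plus-Coleman vote in a group.

    `margin` is Franken minus Coleman as a fraction (0.40 means +40 points).
    `two_party_total` is an assumption: the fraction of the group's vote
    that went to either of the two major candidates.
    """
    return 0.5 + margin / (2 * two_party_total)


def franken_share_of_errors(group_size, margin_in_group, margin_elsewhere,
                            error_multiplier=1.5):
    """Franken's expected share of corrected two-party ballots.

    Ballots from the group are weighted `error_multiplier` times as heavily
    as everyone else's, reflecting a higher undervote rate in that group.
    """
    in_group = two_party_share(margin_in_group)
    elsewhere = two_party_share(margin_elsewhere)
    weight_group = group_size * error_multiplier
    weight_else = 1 - group_size
    return ((weight_group * in_group + weight_else * elsewhere)
            / (weight_group + weight_else))


# Hypothetical electorate share, NOT taken from the article or the exit polls.
ASSUMED_MINORITY_SHARE = 0.10

# Margins from the article: Franken +40 among minorities, Coleman +3 among whites.
share = franken_share_of_errors(ASSUMED_MINORITY_SHARE, 0.40, -0.03)
print(f"Franken's share of corrected ballots: {share:.1%}")
```

Setting `error_multiplier=1.0` recovers the unweighted share, which makes it easy to see how much of the tilt comes from the higher error rate alone.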

If, over the long run, we expect Franken to win 51% of corrected ballots, his odds of winning the recount may be quite strong -- in fact, he may be the prohibitive favorite depending on the number of recounted ballots:

Correctable      Odds of Franken
Error Rate       Winning Recount*
=================================
0.10%              0.02%
0.25%             10.51%
0.50%             58.67%
0.75%             86.23%
0.90%             93.35%
1.00%             95.93%
1.50%             99.67%
2.00%             99.97%
3.00%            100.00%
* Assuming 51% of corrected ballots resolved for Franken over long-run.

Let me go ahead and give you an entire matrix's worth of data given various assumptions about the Correctable Error Rate and the fraction of correctable errors resolved in Franken's favor -- the numbers in the table represent Franken's odds of winning the recount:

The values in bright yellow represent the ones that I consider to have stemmed from the most reasonable assumptions -- that is, a relatively low CER, but a slight majority of corrected ballots being resolved in Franken's favor. As you can see, this is not very helpful -- given different sets of "reasonable" assumptions, Franken is anywhere from the prohibitive underdog in the recount to the prohibitive favorite! The average value contained within the yellow region, however, is 44.3 percent, which is pretty close to where things are trading on Intrade right now.

A couple of additional notes before we close out. Firstly, it's very important that Franken's deficit is down to 221 votes, rather than the 700 or so that it appeared to be originally. Suppose that the Correctable Error Rate is 0.75%, and that Franken wins 50.5% of corrected ballots; we have him winning the recount 39.3% of the time under these assumptions. If, however, Franken had to make up 700 votes rather than 221, his win percentage under these assumptions would be just 0.008 percent -- about a 13,000-to-1 longshot.
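
A quick way to see why the size of the deficit matters so much is to swap the binomial for a normal approximation and look at how many standard deviations Franken has to climb. The sketch below does that. It is a rough cross-check rather than the article's method, it assumes the 15 percent of corrected ballots going to Barkley are set aside, and its names are hypothetical. Expect answers close to, but not identical with, the figures quoted above.

```python
import math

TOTAL_VOTES = 2_833_089      # from the article
THIRD_PARTY_SHARE = 0.15     # Barkley's share of reclassified ballots (from the article)


def normal_win_probability(deficit, cer, franken_share):
    """Normal approximation to the chance of netting at least `deficit` votes."""
    n = TOTAL_VOTES * cer * (1 - THIRD_PARTY_SHARE)
    # Each corrected ballot moves the margin by +1 (Franken) or -1 (Coleman).
    mean_gain = n * (2 * franken_share - 1)
    sd_gain = 2 * math.sqrt(n * franken_share * (1 - franken_share))
    z = (deficit - mean_gain) / sd_gain
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2))


for deficit in (221, 700):
    p = normal_win_probability(deficit, cer=0.0075, franken_share=0.505)
    print(f"deficit {deficit}: {p:.4%}")
```

Under these assumptions the expected net gain is the same in both cases; only the distance to the target changes, and tail probabilities fall off very quickly as that distance grows.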

Secondly, in this article we have been thinking of ballot tabulation errors as essentially discrete and random events -- that is, there is no intrinsic relationship between your likelihood of having your vote miscounted and that of the person standing in line in front of you. There may be a separate class of errors, however, which we might call malfunctions: those which, presumably because of faulty technology, might affect a large number of ballots at once. If ballot tabulation errors are not independent of one another but instead are "clustered", then the odds for the trailing candidate to prevail in a recount may be higher than implied by the charts above.

-- Nate Silver at 6:58 AM

Source

this is very long but i found it really interesting :)

al franken, minnesota, nate silver taught numbers how to fuck