Wednesday, October 7, 2009

Predicting Presidential Elections from Biographical Information

Why crunch a bunch of numbers via regression to forecast a presidential election when the candidates' biographical data seemingly gets you closer to the actual results? I don't know. This won't put number crunchers out of business (good, I didn't waste 2008 after all!), but the findings from a study by Armstrong and Graefe do shed light on an interesting new avenue by which election outcomes can be predicted. Here's how they constructed their model:
"We created a list of 49 cues from biographical information about candidates that were expected to have an influence on the election outcome. Then, we estimated whether a cue has a positive or negative influence on the election outcome. ... We distinguished two types of cues: (1) Yes / no cues record whether a candidate shows a certain characteristic or not. (2) More / less cues are more complex as they also incorporate information about the relative value of the cue for the candidates that run against each other in a particular election. In general, the candidate who achieved a more favorable value on a cue was assigned a value of 1 and 0 otherwise. For more information on the coding see Appendix 1. Finally, the sum of cue values for each candidate in a particular election determined his PollyBio index score (PB)."
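The scoring step in that passage is simple enough to sketch. Below is a minimal Python illustration of the "code each cue 1 or 0, then sum" logic. The cue names (college_degree, military_service, taller) and their values are made up for the example. This is not Armstrong and Graefe's actual 49-cue coding from their Appendix 1, and the tie handling is my own assumption.

```python
# A minimal sketch of a PollyBio-style index. The cue names and values are
# hypothetical; this is not Armstrong and Graefe's actual 49-cue coding.
from typing import Dict, Tuple

# Each candidate maps a cue name to 1 (favorable) or 0 (otherwise).
Cues = Dict[str, int]


def more_less(value_a: float, value_b: float,
              higher_is_better: bool = True) -> Tuple[int, int]:
    """Code a 'more / less' cue: 1 for the candidate with the more
    favorable value, 0 for the other. A tie gives both candidates 0
    (an assumption, not taken from the paper)."""
    if value_a == value_b:
        return 0, 0
    a_better = (value_a > value_b) == higher_is_better
    return (1, 0) if a_better else (0, 1)


def pb_score(cues: Cues) -> int:
    """The index score is simply the sum of a candidate's cue values."""
    return sum(cues.values())


def predict_winner(candidates: Dict[str, Cues]) -> str:
    """Predict the candidate with the highest index score.
    Ties go to whichever candidate is listed first (an assumption)."""
    return max(candidates, key=lambda name: pb_score(candidates[name]))


# Hypothetical two-candidate race; heights in centimeters feed a more/less cue.
taller_a, taller_b = more_less(185, 178)
candidates = {
    "Candidate A": {"college_degree": 1, "military_service": 1, "taller": taller_a},
    "Candidate B": {"college_degree": 1, "military_service": 0, "taller": taller_b},
}

for name, cues in candidates.items():
    print(name, pb_score(cues))
print("Predicted winner:", predict_winner(candidates))
```

In this made-up race, Candidate A sums to 3 and Candidate B to 1, so the sketch picks A. With real data you would swap in the full set of coded cues for each pair of candidates.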
And what did that yield? Out of the 28 elections between 1900 and 2008, the candidate with the highest PB index score won 25 times (see below).

Source: Armstrong and Graefe (2009). "Predicting Elections from Biographical Information about Candidates"

My first thought was, "I'll bet they missed the close ones." After all, those are the elections most forecasting models have the hardest time predicting. But that wasn't necessarily the case here. The Armstrong and Graefe model missed 1948 (Truman), 1976 (Carter) and 1992 (Clinton), and on the first two it had company from other noted forecasting models. The only miss that really stands out is Clinton's election in 1992.
"PollyBio failed in predicting the correct winner for the three elections in 1948, 1976, and 1992, in each of which an incumbent president was running. A look at the data helps to explain the failure for these three elections. Gerald Ford in 1976 and George Bush in 1992, who were both wrongly predicted to win, had particularly strong biographies. For our set of ‘yes / no’ cues, which did not include relative measures between candidates (like height, intelligence, or attractiveness), Ford and Bush achieved the highest score of all 56 candidates in our sample (together with Theodore Roosevelt in 1904 and William McKinley in 1900). By comparison, Harry Truman, who PollyBio failed in predicting to win the 1948 election, scored particularly low on the same set of cues. Being the only U.S. president after 1897 who did not earn a college degree, Truman achieved the lowest score of all incumbents in the sample. Among all candidates, only three achieved a lower score."
What was the common theme? A switch in power from one party to the other? The winners were all Democrats, Southern Democrats at that (fine, Missouri's a border state). No, neither of those was it. All three elections involved incumbents, and the model seems to do better in open-seat races than in those where an incumbent is running.

So why wait for Election Day in 2012? Start comparing the bios of the prospective Republican candidates against Obama's now. Who stacks up best? (My guess is Romney or Gingrich.) Then again, it is a race that involves an incumbent, which is exactly where the model has stumbled.

Hat tip to Political Wire for the link.
