Thursday, September 22, 2016

Predicting the 2016 presidential election

My "data programming" class, UW CSE 160, is a popular introductory computer science class.  Its examples and assignments are taken from the real world and use datasets from science, engineering, business, etc. -- not from abstract math, puzzles, or programming itself, such as computing the Fibonacci sequence or implementing a linked list.  These real-world examples are more compelling to students, and they better prepare students for the programming they will do in the future.  The class's assignments can also be integrated into an existing class without fully adopting the methodology of real-world examples.

One particularly popular assignment asks students to predict the outcome of the 2012 election, based on polling data.  Preceding the 2012 election, many political pundits, working from their gut feel, predicted a Romney win or said the election was "too close to call".  Nate Silver of the website FiveThirtyEight had been predicting an Obama win for months, and he correctly predicted the outcome of every state.

The key to Silver's approach is to combine different polls using a weighted average. In a normal average, each data point contributes equally to the result. In a weighted average, some data points contribute more than others. Silver examined how well each polling organization had predicted previous elections, and then weighted each organization's polls according to its accuracy: polls from more biased pollsters had less effect on the weighted average.
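To make the difference between the two kinds of average concrete, here is a minimal Python sketch. It is not code from the assignment and not Silver's actual model: the function name, the poll numbers, and the weights are all made up for illustration, and the weights simply stand in for "how much we trust this pollster".

```python
def weighted_average(values, weights):
    """Return the weighted average of values, using the given weights.

    Hypothetical helper for illustration; not part of the assignment code.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Each value counts in proportion to its weight.
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up poll results: one candidate's share of the vote, in percent.
poll_results = [52.0, 48.0, 50.0]

# Made-up weights: the first pollster is assumed to have the best track
# record, so its poll counts three times as much as each of the others.
pollster_weights = [3.0, 1.0, 1.0]

# A normal average is a weighted average in which every weight is equal.
plain = weighted_average(poll_results, [1.0, 1.0, 1.0])
weighted = weighted_average(poll_results, pollster_weights)

print("Normal average:  ", plain)
print("Weighted average:", weighted)
```

With these invented numbers, the weighted average is pulled toward the first poll, because that poll carries the largest weight.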

The concepts are simple enough that beginning programmers can complete the assignment successfully after just 3 or 4 weeks of instruction.  The assignment is interesting enough to be assigned later in the term, too.  Since most of the code is provided and students just have to implement 10 functions, this assignment also gives students practice in the critical skill of reading code.

When this assignment was first handed out in January 2013, the 2012 election was a fresh memory.  Now it may seem dated to students. Therefore, you could update the assignment to use polling data for the 2016 election.

Doing so requires collecting and cleaning polling data.  You can find information about how we collected and cleaned data for the 2008 and 2012 elections in the file https://courses.cs.washington.edu/courses/cse140/13wi/homework/hw3/raw-data.zip. (Students:  this doesn't give any hints about how to solve the assignment.)

If you adapt the Python programs in that zip file and collect polling information about the 2016 election, please share the fruits of your labor by emailing me.  Other instructors and their students will appreciate it!
