June 18 / Statistics

Occasionally in political science, we run into problems in which we have a small dataset and a large array of possible predictors. Choosing a parsimonious model can be difficult. When theory-based model selection is out of the question, automated variable selection allows us to estimate the probability that each predictor is included in the “true model.” We can then use these estimates to prune our model. Here, I describe a Bayesian ordered probit regression model with stochastic search variable selection (SSVS).
Read More: Ordered Probit Regression with Variable Selection

February 8 / R & Shiny

This app uses Monte Carlo simulation to estimate the probability of landing on any given space in Monopoly. The slider allows you to choose which turn (how many dice rolls) you are interested in. Sometimes the initial image doesn’t load, so move the slider around a bit until the pictures appear. This particular simulation assumes the players never stay in jail and always leave on the next turn, either by paying or by using a Get Out of Jail Free card.
Read More: MCopoly: a Monopoly Simulator
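
To give a feel for the Monte Carlo idea behind the Monopoly app, here is a minimal sketch in Python (the app itself is built in R and Shiny, and this is not its code). It makes some loud simplifications that are my assumptions, not the app's rules: Chance and Community Chest cards and the doubles rules are ignored, and the function names `simulate_position` and `landing_probabilities` are made up for illustration. The one rule it keeps from the description above is that a player sent to jail leaves on the very next roll.

```python
import random
from collections import Counter

BOARD_SIZE = 40   # spaces on a standard board, with Go as space 0
JAIL = 10         # the Jail / Just Visiting space
GO_TO_JAIL = 30   # landing here moves the player to JAIL


def simulate_position(turn, rng):
    """Return the space occupied after `turn` dice rolls, starting from Go.

    Simplified sketch: no Chance/Community Chest cards, no doubles rules.
    """
    position = 0
    for _ in range(turn):
        roll = rng.randint(1, 6) + rng.randint(1, 6)  # sum of two dice
        position = (position + roll) % BOARD_SIZE
        if position == GO_TO_JAIL:
            # The player is moved to jail and simply rolls again next turn,
            # matching the "never stay in jail" assumption.
            position = JAIL
    return position


def landing_probabilities(turn, n_games=100_000, seed=0):
    """Estimate the probability of being on each space after `turn` rolls."""
    rng = random.Random(seed)
    counts = Counter(simulate_position(turn, rng) for _ in range(n_games))
    return [counts[space] / n_games for space in range(BOARD_SIZE)]
```

For example, `landing_probabilities(10)[JAIL]` estimates the chance of sitting on the jail space after ten rolls under these simplified rules. Raising `n_games` tightens the estimate at the cost of run time.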

January 23 / Data Analysis

Last year, my brother and I began a project that required collecting lots and lots of tweets to analyze. So far, we’ve collected over 9.5 million geo-located tweets from roughly 20 US cities. Here’s how we did it.

Read More: Persistent Tweet Collection in Python

November 8 / R & Shiny

Lately I’ve been playing Blendoku on my phone. This is a very pretty game, the object of which is to fill in a bunch of missing colors. It looks sort of like a crossword puzzle with several colored squares and even more black squares. The player’s job is to fill in all of the black squares with the colors that best blend the gaps between colorful squares. The screenshot below shows an almost completed game.

If we wanted to make our own random levels of Blendoku, how would we do it? I haven’t rebuilt the entire game (yet), but I have figured out how to make the underlying color gradients. Here’s how to do it in R.

Read More: Blendoku & Color Interpolation

October 2 / Data Analysis

A few months ago, I decided it would be fun to do some predictive modeling of the quality of upcoming Hollywood films. There’s tons of data out there, but some of it can be hard to find. As part of that project, I wrote a short R script to scrape some data from Rotten Tomatoes. Feed the function an actor’s name, and it will return all of their film and TV work along with corresponding Tomatometer scores, release years, and a few other things. So, we can use this to find out whether anybody’s career is in a real decline (or on an upswing): for example, Charlie Sheen’s or M. Night Shyamalan’s.

Read More: Rotten Tomatoes Data in R