Another weekend side project: Spatial voter modeling in Haskell

cdsmith · July 5, 2023, 7:58am

Well, it was another holiday weekend in the U.S., so here’s another Haskell project I picked up. This one wasn’t planned; just started with some random conversations, which led to writing some code, and refining it… and I now have a reasonably realistic statistical model of voting behavior, as well as analysis tools for visualizing and analyzing election systems and methodologies and their practical effects.

The basic idea is to mode voters and candidates in an election as existing in a fairly high-dimensional vector space (100 dimensions) of political opinions and group affinities, and then define a probability distribution that describes which opinions and affiliations voters have in each simulation. The model chosen is a Mixture of Zipf Gaussians, which models voters as belonging to a collection of overlapping subpopulations, each of which has their own size, mean political positions, different degrees of variation along different axes, etc., though with variance decaying globally in each additional dimension. The result looks relatively realistic based on observed experience.

Using this model, we can answer some key questions about election systems through Monte Carlo simulation. Questions like: How often do different possible election methodologies - plurality, instant runoff, Smith/Condorcet, range voting, Borda count, STAR, etc., agree with each other about who wins an election? When they disagree, how can we characterize the kinds of decisions that each makes? How often can we find a Condorcet winner, and is the lack of Condorcet winners merely a theoretical problem or does it come up in practice? What is the impact of tactical voting: how effective is it in various methodologies, and how does it affect the character of the results?

Interestingly, a Monte Carlo simulation of elections using a spatial MoZG model tuned to reflect realistic of voting populations yields quite different results than well-known attempts to ask the same questions that rely on less realistic and nuanced models such as the better-known IC and IAC models. For example, while the common but unrealistic IC and IAC models predict the likelihood of a Condorcet winner to be approximately 0 or 1/e, respectively, simulation here suggests they almost always exist, failing to emerge only about 1% of the time.

I will likely write up the actual results of the investigation at some point, but for now I’m just sharing the code in case anyone finds it interesting.

turion · July 6, 2023, 2:26pm

This is awesome! Can you comment on the technical aspects a bit? Like:

Did you find good libraries to do the numerics?
Did you find any useful libraries for Monte Carlo and stochastic modelling? If yes, how did they fail to be useful to you? (It seems like you wrote random sampling for a number of distributions from scratch)
Was Haskell performant for your application?
What did you need to do to arrive at a good performance? I’m seeing a few strictness annotations, UNPACKs, etc., are these simply good practice or were you forced to put them in? If the latter, how did you figure out that you need them? (Profiling, …)
Did you ever have issues with Double precision?

I’m asking so many questions because I have all these questions about my stochastic modelling application

cdsmith · July 6, 2023, 7:43pm

I’m afraid I won’t have the best answers for you, as this was really more of a weekend diversion than a serious project, but I shall do my best.

I did not look for libraries to do numerics, or to Monte Carlo simulation. It honestly took me less time to write the 15 lines of sampling code myself than to hunt for a library to do it for me. I had to look up the formula for Box-Muller transform on Wikipedia, but that took 30 seconds. If I were to find myself wanting to incorporate more complex statistics, I’d probably look for something.
Haskell was performant enough. Maybe less so when I decided to try sampling way too many voters just to make sure it wouldn’t change the character of the results, but I just went out for dinner and it was done when I got back!
I definitely did a bit of profiling, because it’s dead simple. (cabal run --enable-profiling exe:foo -- +RTS -p… yeah, it’s a bit to memorize, but then you know it) I can’t recall the specific reasons I added those things. I often find it’s likely enough I’ll want to make scalar fields strict and unboxed that I just go ahead and do it.
I may well have issues with Double precision, but if so, I am not yet aware of them.