Computer For Compiling Haskell Programs

Folks, I have recently graduated from working on small Haskell codebases to one that is big enough that compilation (and re-compilation) times are getting painful.

If I wanted to just throw money at the problem, and I was willing to give up essentially everything else, what kind of specs should I be looking for to get the fastest compilation times per dollar?

I’ll be upfront that the hardware layer is a complete black box to me; while interesting, there are other things I’d rather learn first.

P.S. I understand that the answer is “it depends”, but would like any guidance. Thank you.

4 Likes

You’ll want lots of cores, lots of memory, and a fast disk. Practically, for me that means something with at least 8 cores, about 32 GB of RAM, and an SSD. If money is really no object I would go for a Threadripper with as much fast RAM as you want along with an NVMe SSD (perhaps a few in RAID). If you’re looking for an actual pick list, I’ve found both Logical Increments and PCPartPicker to be useful even though they generally cater to gamers.

3 Likes

One thing to note on top of what @taylorfausak said is that, sadly, our tooling (and compiler) does not parallelise well (for now). Thus, while multiple cores (>8) are a good idea, make sure you also get high single-core frequencies for the dreaded bottleneck situations where the rest of your build depends on a single module (or package).

For CI workloads, high single-core frequency has always beaten high core counts. The 3700X isn’t bad. The i9-9900 isn’t too bad a choice either; how further security mitigations might impact those chips and their performance, I can’t tell.

I can also report that lots of RAM and high-performance IO (e.g. NVMe drives) do help.

Make sure to pass the -j flags to GHC and cabal/stack/…
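For concreteness, a minimal sketch of what that can look like (the 8s are just an assumption for an 8-core machine; tune them to your CPU):

    # build up to 8 packages at once, and let GHC compile up to 8 modules at once
    cabal build -j8 --ghc-options=-j8
    # the same idea with stack
    stack build -j8 --ghc-options=-j8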

6 Likes

It is not hardware related, and you may already be well aware of this, but if compilation time is getting too long, make sure you’re not using a command line like stack build --file-watch for general day-to-day development, but instead using something that has fast recompile times, like ghcid.
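As a rough sketch of that workflow (the package name is a placeholder; point it at whatever cabal repl or stack repl target you normally use):

    # re-typecheck on every save, without a full rebuild
    ghcid --command "cabal repl my-package"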

There are a lot of blog posts that explain how to make the best use of ghcid for various tasks. For example, I wrote a short blog post on how to use ghcid for web development.

I’m not very familiar with them, but I imagine that other tools like HLS have similar speeds to ghcid.

2 Likes

Thanks Taylor! I found a machine by System76 that is just around $2k. Seems like a decent upgrade from my current setup, but not something I can swing on a whim at that price.

  • i9-10900K (10 cores, 20 threads, 3.7 GHz base up to 5.3 GHz boost)
  • 32GB memory @ 3200 MHz
  • 1TB NVMe drive

2 Likes

I looked into the same question a while ago and concluded that instead of buying a new computer I would just hire a cloud VM whenever I need it. A 32 GB, 8-core machine can be had for a day’s work (8 hours) at a cost of about $2 [1].

The unit economics on this are interesting. It’s not obviously the right solution for everybody. It’s worth bearing the following in mind:

  • $2k can buy your proposed System76, or more than 4 years of working-day VM usage at current rates (at roughly $2 per 8-hour day and ~250 working days a year, that’s about $500 a year).
  • Buying your own machine is a fixed upfront cost which you can’t change if your plans change later.
  • Cloud costs are likely to decrease with time.
  • To make best use of a VM you will need familiarity with remote access technologies such as X forwarding (see the sketch after this list).
  • For the cloud solution to be cost effective you will need to develop some way of persisting state so that you can release the VM when you’re not using it.
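A minimal sketch of the remote access side, assuming an SSH-reachable VM (the host name and user are made up):

    # log in with X forwarding, so remote GUI programs display locally
    ssh -X dev@vm.example.com
    # or keep a persistent terminal session that survives disconnects
    ssh -t dev@vm.example.com tmux new -A -s haskell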

[1] For example https://www.linode.com/pricing/. I don’t have any affiliation with Linode but I am a happy customer.

6 Likes

That’s an interesting approach. Are you actually using that method in your day-to-day job?

I’m not using this approach in my day-to-day job, but I use a related approach: I RDP into a powerful machine owned by my employer. The approaches share an acknowledgement that with modern internet speeds it is unnecessary to have a powerful machine locally.

1 Like

I also do this, primarily for Linux though, as my daily driver is a macOS machine. I found latency to be the biggest issue, but mostly because I was running Linux systems in Europe and using them from Southeast Asia.

Visual Studio Code’s remote integration is surprisingly good: you hardly notice you are on a foreign machine, and it feels mostly local (except that no noisy fans spin up). Emacs should be able to work equally well via TRAMP.

The most basic setup would be mosh + tmux, if a shell is all you want.
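A minimal sketch of that setup (the host name is a placeholder):

    # mosh tolerates flaky connections and roaming; tmux keeps the session alive on the server
    mosh dev@build.example.com -- tmux new -A -s dev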

Relatively capable machines can be had for ~€60/month or ~€700/year.

Again, this requires a stable (and fast, if you want to transfer a lot between local and remote), ideally low-latency connection to the datacenter where the remote machine is located.

4 Likes

I have been pondering such a setup myself, also to be able to program on my newly acquired e-ink display tablet, especially now that summer is coming.

Since I am not programming 24 hours a day, are there good options where you save money (and resources) by using a very-fast-to-spin-up cloud machine charged by the minute, or is that not worth the bother?

(What’s mostly holding me back so far is just decade-old habits of managing windows with xmonad rather than tmux/screen.)

3 Likes

are there good options where you save money (and resources) by using a very-fast-to-spin-up cloud machine charged by the minute, or is that not worth the bother?

I sometimes do that. With providers like DigitalOcean or Hetzner, server snapshots are much cheaper than running servers. What I do is re-create a server from the snapshot when I want to program (deleting the snapshot), and when I’m finished I take another snapshot (deleting the server).

I even wrote a small command line utility to automate the process a bit.
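For Hetzner specifically, that cycle can be scripted with their hcloud CLI. A rough sketch, assuming a single snapshot and a server named dev (the name and server type are made up):

    # morning: create the server from the latest snapshot, then drop the snapshot
    hcloud server create --name dev --type cx31 --image <snapshot-id>
    hcloud image delete <snapshot-id>
    # evening: snapshot the server, then delete it so you stop paying for compute
    hcloud server create-image --type snapshot dev
    hcloud server delete dev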

Startup times are not immediate, however; in my experience they take a few minutes.

Why can we not just have a distributed operating system yet that just takes care of this, so we don’t have to keep thinking about all these solutions?

1 Like

I would say that reducing compiler memory consumption would be a social justice action and would help to increase Haskell adoption. Not everybody has the money to buy an i9 with 32 GB of RAM and a 1 TB SSD to get reasonable compile times.
Sorry for stating the obvious.

8 Likes

How big is “big”, in terms of code size?

A relatively small app I’m working on is about 5k lines. I develop using a REPL, then run GitLab’s CI to compile & deploy. Uncached this takes about 40 minutes; with caching, about 6-8.

I do want to warn that the approach of using cloud providers will often leave you with quite a low clock speed. Desktop CPUs and server CPUs are optimized differently, and high single-threaded performance is one of the most generally beneficial things for compile times.

In the best case for the cloud provider, the desktop CPU will likely be 25% faster in critical tasks, and often far more than that (a desktop CPU being more than twice as fast as a cloud CPU in single-threaded cases is common). It takes care to get a Haskell project to parallelize its compiles much, and even that doesn’t remove the influence of single-threaded performance.

I’d take care with your approach.

8 Likes

I tried to do this as well, but the experience was not much faster. I tried both AWS EC2 instances and DigitalOcean droplets.

2 Likes

I keep coming back to this topic…

Just had a look at the Hetzner cloud offerings, and they offer Storage Volumes that you can attach to temporary servers. If you keep your /home on one of these, would that make turning the machines off and on faster? Or are these storage volumes too slow to use as the $PWD during development?

It might help, yes, as the size of the snapshot does seem to influence how long it takes to restart the server.

Some data: I had 39G of total disk usage (34G of it in /home) and restarting the server took about 5 minutes.

I did some cleanup, reduced disk usage to 17G total (also invoked fstrim as recommended here), and now the server starts in around 3 minutes.

So perhaps keeping most of your data in a volume and keeping the server small would work. One might even dispense with snapshots altogether, instead re-creating the server from scratch each time and mounting the volume. I don’t know how much overhead volumes introduce though.
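A rough sketch of that volume variant with the hcloud CLI (names, sizes, server type, and location are assumptions; the device path is how Hetzner volumes typically appear):

    # one-time: create a persistent volume for /home
    hcloud volume create --name dev-home --size 50 --location nbg1
    # each session: create a throwaway server and attach the volume
    hcloud server create --name dev --type cx31 --image ubuntu-22.04 --location nbg1
    hcloud volume attach --server dev dev-home
    # on the server, mount it (Hetzner volumes show up under /dev/disk/by-id)
    mount -o discard,defaults /dev/disk/by-id/scsi-0HC_Volume_* /home
    # when done: detach the volume and delete only the server
    hcloud volume detach dev-home
    hcloud server delete dev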

Why can we not just have a distributed operating system yet that just takes care of this, so we don’t have to keep thinking about all these solutions?

There are development-environment-in-the-cloud services like GitHub Codespaces. I would like to hear about the experience of using them for Haskell, although they only seem to be available for “organizations using GitHub Team or GitHub Enterprise Cloud”.

Azure has a similar offering.

1 Like