Haskell Foundation DevOps Weekly Log, 2022-07-29

Welcome to week 11 of the devops log!

This week saw real results from my work to reduce spurious failures: jobs can now be safely retried. Automatically retrying the impacted jobs allowed Thursday morning's head.hackage pipelines (lint, perf) to eventually succeed! That incident also gave me more useful data, since there were several failures in a row.

Also on the subject of spurious failures,

  • @simonpj highlighted another failure type: T16916 is a particularly flaky test. I found that it happens regularly, but less frequently than other failures, so I will deal with it later.
  • I got access to more runners to inspect runner-related failures.
  • @angerman and I began collaborating on “signal 9” failures.

Finally, there were also a few devops-y developments outside of spurious failures. I configured marge-bot to restart on failure, as I discussed in my last devops log. I was also alerted to two potential issues with the GitLab server itself. First, disk usage is growing quickly, which may be related to my head.hackage residency profiling. Second, and unrelated to anything else, GitLab reports that more repositories are “failing their repository check”, which needs a deeper look.

So next I’ll be looking into these GitLab server problems, before continuing with the “signal 9” failures and other spurious failures!



Curious about the pros and cons of self-hosting GitLab.

Well, it wouldn’t be my first choice.

Back when GHC was small(er) and commercial “code-hosting” was more of a luxury, it probably made more sense to self-host. As for the commercial options nowadays: considering the size that Glasgow Haskell Central has now grown to, finding affordable hosts that can cope with the current and future workloads will only get more challenging.

I for one wouldn’t be expecting a change away from self-hosting in the near future…

Oh, I think self-hosting CI runners is a defensible option, since we need to know what’s happening all the way down to the metal anyway when running performance tests. I’m less certain about running our own GitLab server, however. And I’d still think hard about how many self-hosted runners we need.

Remember that the opportunity cost of running our own infrastructure is all the other development work that doesn’t get done to support Haskell and its ecosystem: work that is paid for with donations, sponsorships from Haskell users, and volunteers’ own time. My own todo list is pretty long. Tinkering with a huge, complicated Ruby application doesn’t give us a lot of leverage.

Anyway, it’s not my decision at this point, and I wouldn’t make any hasty changes even if it were. :slight_smile: