Sorry for skipping last week’s log, but thanks to everyone who spoke with me at Zurihac! I had fun talking and meeting people, and even managed to do a little hacking.
Since the last update, I have finished everything for adding residency profiling to nightly head.hackage jobs except testing that the results actually show up in the database. Although that took most of my time, I have also begun shifting to my main focus area: CI stability.
While the residency profiling work was mostly “code complete” before Zurihac, I still had to spend a lot of time on it. Primarily, I was learning how to deploy changes to the GitLab server. Keeping such a system running is a herculean task, and it’s no wonder it has been a timesink for the GHC team.
Luckily, the server is maintained in a NixOS configuration, meaning even as a newbie I can discover the current state of the system and follow recent changes. This really minimizes the amount of folklore that must be passed down from admin to admin!
Not everything on the server is rigidly defined, however. There is still work done directly on the server, and there are git submodules that can accidentally acquire local commits that don’t get pushed upstream, and oh hey it turns out the repo actually defines multiple servers, and work is done directly on those servers as well, so watch out for conflicts…
This is all totally normal, though. The time spent easing onboarding for newbies needs to be weighed against how often someone new actually joins, and the rigid definition of systems needs to be balanced against ease and speed of deployment. As long as the team is small, the current set of tradeoffs is reasonable.
Anyway, here’s another boring list of links summarizing my code changes that, by its breadth alone, gives some insight into the complexity of devops:
- Only capture eventlogs on nightly run (!236) · Merge requests · Glasgow Haskell Compiler / head.hackage · GitLab
- Use the new name for the fedora33 jobs (!235) · Merge requests · Glasgow Haskell Compiler / head.hackage · GitLab
- Enable event logging while compiling head.hackage (!230) · Merge requests · Glasgow Haskell Compiler / head.hackage · GitLab
- T11829: Use <stdexcept> instead of <exception> (!8475) · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
- Enable eventlogs on nightly perf job (!8468) · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
- Import eventlogs (!1) · Merge requests · Glasgow Haskell Compiler / ghc-perf-import · GitLab
- Catch a more appropriate error (03d174a0) · Commits · Glasgow Haskell Compiler / ghc-perf-import · GitLab
- (Plus commits to the private server-configuration repository)
Ironically, the task to add residency profiling isn’t my main focus, and it took a lot longer than anybody expected. But I think that’s great. I got to experience GHC CI as a user. I already have much better insight into the assemblage of servers, repos, pipelines, jobs, and workflows that give rise to the emergent phenomenon known as “GHC CI”. Plus, observant readers will note that some of the commits listed above are fixing CI failures I ran into!
I can now turn my focus completely towards making CI less painful. The meta issue about intermittent failures continues to bear fruit, and I’ve started to characterize the kinds of failures people (including me) experience. We already have an idea how to solve one whole class of problems, but I’ll leave that for next time, when I have harder data…