DevOps Weekly Log, 2024-02-07

Hello! It’s another weekly log dominated by CI issues.

Over the last week, CI runners kept running out of disk space because of:

  • unbounded disk usage by Docker images, which are all pretty big
  • unbounded disk usage by repository clones on Darwin runners, which are also pretty big
  • a Nix store that was not being cleaned up on one server
  • unbounded disk usage by Rosetta 2’s OAH cache on macOS

Furthermore, various CI runners kept falling offline because:

  • gitlab-runner freezes on some Darwin machines
  • gitlab-runner does not start on boot on some Darwin machines

Solutions (or perhaps “hacks”) have been deployed for some of these problems, but deploying those hacks has caused issues of its own. Some of those issues have been fixed. More work is desperately needed in this area.

Besides babysitting runners, I spent some time investigating spurious failures, in particular #24407 (see the issue triage notes below).

Finally, in the last couple of days I had to take some time out to square up my business taxes and other official matters.

Now that I’m done with those chores and CI is more stable, I will return to the Stackage migration. Not much progress was made recently because of the emergent CI issues, but it’s still the number one thing I want to finish! The last remaining operational change is switching the haddocks on stackage.org over entirely to a new storage bucket. This move is partially complete: one process still needs to start writing to the new bucket, and another still needs to start reading from it. The underlying apps have already been updated to support the new bucket, so what remains is updating the automation to use the new apps and the new bucket configuration data.

GHC Issue Triage Notes


See you next time!
