Welcome to another weekly log!
In the last week, my time was spent on ops incidents and bug fixes:
- Worked on new provisioning automation for GHC CI runners on Equinix Metal (ci-config merge request !16, “Draft: use zfs on ghc-ci-x86”)
- Fixed a trio of bugs in my retry bot:
  - False positives for “out of disk space” failures caused by erroneously matching on legitimate test output
  - False positives for “out of memory” failures with the same root cause
  - False positives for a few types of failures because of a logic bug in my code
- Fixed an outage of the GitLab server by restarting it. Apparently the server had run out of disk space earlier in the weekend; although space had since been recovered, that initial outage was probably the cause of the ongoing problems.
- Identified and mitigated a couple of new spurious failure types
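To illustrate the kind of false positive mentioned in the retry bot item, here is a simplified sketch in Python. It is not the bot's actual code: the log excerpts, the `T123:` test-harness prefix, and both matcher functions are hypothetical, made up only to show how a bare substring match can trip on legitimate test output while a stricter, line-anchored pattern does not.

```python
import re

# Hypothetical log from a passing job: a test merely *mentions* the error text.
PASSING_LOG = """\
Running test suite
T123: expected stderr: "No space left on device"
T123: PASS
"""

# Hypothetical log from a job that really ran out of disk space.
FAILING_LOG = """\
Running test suite
cp: error writing 'out.tar': No space left on device
"""

MESSAGE = "No space left on device"


def naive_match(log: str) -> bool:
    """Flag the job if the message appears anywhere in the log."""
    return MESSAGE in log


# Assumed stricter rule: the message must end a line, and the line must not
# start with the (hypothetical) test-harness prefix such as "T123:".
STRICT_PATTERN = re.compile(
    r"^(?!T\d+:).*" + re.escape(MESSAGE) + r"$",
    re.MULTILINE,
)


def strict_match(log: str) -> bool:
    """Flag the job only when a non-test line ends with the message."""
    return STRICT_PATTERN.search(log) is not None


if __name__ == "__main__":
    print("naive, passing job: ", naive_match(PASSING_LOG))
    print("strict, passing job:", strict_match(PASSING_LOG))
    print("strict, failing job:", strict_match(FAILING_LOG))
```

The naive matcher flags both logs, while the stricter one flags only the job that really failed. A real job log is messier than this, so any such pattern would need checking against actual failures before being trusted.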
Coming up, I intend to stop focusing on fixing these spurious failures piecemeal and start focusing on scaling up the effort. Right now, if you want to do something about a spurious failure on GitLab, you ping me on your merge request and cross your fingers that something gets done about it. I’d like to think I have a pretty good track record, but it would be a bad use of Foundation resources if I never found a way to put more power in the hands of contributors. That will be my focus for the coming months.
See ya!