Haskell Foundation DevOps Weekly Update, 2022-10-10

Welcome to Week 16!

This weekly update is late because I went to MuniHac last week and forgot to send one early. :slight_smile: It was nice to meet everyone who was there!

Last week, I focused on reducing the causes of spurious failures in GHC CI.

  • One simple case is when GitLab reports a runner system failure. These failures are now tracked, and the affected jobs are restarted automatically.
  • I began separating certain fragile tests out of the head.hackage pipeline.
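As a rough illustration of the first bullet, the choice of whether to restart a job can be sketched as a pure function over job records shaped like those returned by GitLab's Jobs API (the `status` and `failure_reason` fields). This is a hypothetical sketch, not the actual GHC CI tooling. The `MAX_RETRIES` cap and the `retries` counter are assumptions made for illustration.

```python
# Hypothetical sketch: decide whether a failed CI job should be restarted.
# Job records mimic the GitLab Jobs API fields `status` and `failure_reason`;
# the retry cap and the `retries` bookkeeping field are assumptions.

MAX_RETRIES = 2  # assumed cap so a persistently broken runner can't loop forever


def should_retry(job: dict) -> bool:
    """Return True only for jobs that failed because the runner itself broke."""
    return (
        job.get("status") == "failed"
        and job.get("failure_reason") == "runner_system_failure"
        and job.get("retries", 0) < MAX_RETRIES
    )


def jobs_to_restart(jobs: list[dict]) -> list[int]:
    """Collect the IDs of jobs that should be restarted."""
    return [job["id"] for job in jobs if should_retry(job)]


if __name__ == "__main__":
    jobs = [
        {"id": 1, "status": "failed", "failure_reason": "runner_system_failure"},
        {"id": 2, "status": "failed", "failure_reason": "script_failure"},
        {"id": 3, "status": "success"},
        {"id": 4, "status": "failed", "failure_reason": "runner_system_failure", "retries": 2},
    ]
    print(jobs_to_restart(jobs))  # [1]
```

A `script_failure` is deliberately left alone, since it may point at a real bug rather than infrastructure trouble. Actually restarting a job would then be a call to GitLab's job retry endpoint.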

Certain GHC tests rely on external libraries. Those were moved out of the GHC tree and added to head.hackage (tests/ghc-tests) so that they would always have up-to-date libraries to use. Unfortunately, they are still fragile. Since we don’t have access to the same fragility framework that is in GHC, I’m just going to run the tests in a separate job that will be allowed to fail.

Sweeping tests under the rug by allowing them to fail is not fun, but doing so means that more people can finally get in the door and down the hall, so it’s a necessary evil. Once CI is actually green, we can look at those tests in more detail. Maybe there are real GHC bugs to be squashed!

This week, I will simply continue reducing the causes of spurious failures in GHC CI!


So the tests were relocated, but the fragility framework wasn't ported out along with them… Is the framework too GHC-centric?

Is the framework GHC-centric? Yes.

Is it too GHC-centric? I don’t know. There might be something in there that could be spun off into a standalone service. (Or maybe such a service exists, and GHC could somehow be refactored to use it.)

The tools that exist in GHC don’t just allow you to mark a test as fragile; they also record the results into a database so you can observe how fragile each test is. There is no good view onto that data yet, but building such visibility into grafana.haskell.org is one of the jobs I’ll do once CI is stabilized.
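As a rough idea of what such a view could start from, here is a per-test failure rate computed over recorded runs. The record format, `(test_name, passed)` pairs, is a hypothetical stand-in, not GHC's actual database schema. Treating "fragile" as "failure rate strictly between 0 and 1" is likewise an illustrative assumption.

```python
from collections import defaultdict

# Hypothetical record format: (test name, passed?) for each CI run of a test.
Result = tuple[str, bool]


def failure_rates(results: list[Result]) -> dict[str, float]:
    """Fraction of recorded runs in which each test failed."""
    runs: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for name, passed in results:
        runs[name] += 1
        if not passed:
            failures[name] += 1
    return {name: failures[name] / runs[name] for name in runs}


def fragile_tests(results: list[Result]) -> list[str]:
    """Tests that sometimes pass and sometimes fail (0 < rate < 1), worst first."""
    rates = failure_rates(results)
    flaky = [name for name, rate in rates.items() if 0.0 < rate < 1.0]
    return sorted(flaky, key=lambda name: rates[name], reverse=True)


if __name__ == "__main__":
    sample = [
        ("T1", True), ("T1", False),                # fails half the time
        ("T2", False), ("T2", False),               # always fails: broken, not fragile
        ("T3", True),                               # always passes
        ("T4", False), ("T4", True), ("T4", False), # fails two runs out of three
    ]
    print(fragile_tests(sample))  # ['T4', 'T1']
```

A test that fails on every run is excluded on purpose: that is an ordinary failure rather than fragility, and it belongs in a different report.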