Hello, welcome to week 9.
This week I moved into phase 2 of CI failure reporting. Following on last week’s CI failure dashboard, I’m now working on a system to backfill new errors as they are categorized.
Monday and Tuesday were spent exploring problems and strategizing. Unfortunately my powers of concentration were low for a few days.
I did discover, however, that two of the most common kinds of errors originate on the runner systems. I can’t dig into them until I get access. I’ve notified the people who can give me access (@bgamari, @angerman) and I should notify them agai—oh wait I just did.
By Wednesday I had cleared my cognitive roadblock. I wrote a summary email of the status of tracking spurious failures. Read it here: Tracking intermittently failing CI jobs.
On Wednesday I was momentarily occupied with what I thought could be a simple fix for a failing CI job. (It was not simple. But I got an issue out of it: nofib fails to run in CI (#21859).)
Later on Wednesday I was able to set out my plan for coding the backfill system.
On Thursday and Friday, I coded.
The backfill system has two halves. The first half loads all jobs and their logs into a local sqlite database. (That part is done now.) The second half will create tables of errors based on the jobs and their metadata. From there, they can be loaded into the dashboard’s database. Once that part is also done, I will return to categorizing and diagnosing errors.
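To make the two halves concrete, here is a minimal sketch of how they could fit together. It is an illustration, not the actual backfill code: the table names, columns, and the idea of matching error categories with regular expressions are all assumptions, and fetching jobs from GitLab is left out entirely.

```python
import re
import sqlite3

# Hypothetical schema; the real tables are not described in this post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    log    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS job_errors (
    job_id   INTEGER NOT NULL REFERENCES jobs(job_id),
    category TEXT NOT NULL,
    PRIMARY KEY (job_id, category)
);
"""


def load_jobs(conn, jobs):
    """First half: store each (job_id, name, log) tuple locally.

    Jobs that are already stored are left untouched.
    """
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (job_id, name, log) VALUES (?, ?, ?)",
        jobs,
    )
    conn.commit()


def backfill_errors(conn, patterns):
    """Second half: scan every stored log for each known error category.

    `patterns` maps a category name to a regex (an assumed representation).
    Re-running after adding a category only adds the missing rows, because
    the primary key plus INSERT OR IGNORE skips pairs already recorded.
    Returns the number of (job, category) matches found in this pass.
    """
    compiled = {cat: re.compile(rx) for cat, rx in patterns.items()}
    rows = conn.execute("SELECT job_id, log FROM jobs").fetchall()
    matches = [
        (job_id, cat)
        for job_id, log in rows
        for cat, rx in compiled.items()
        if rx.search(log)
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO job_errors (job_id, category) VALUES (?, ?)",
        matches,
    )
    conn.commit()
    return len(matches)
```

The point of keeping logs locally is that a newly categorized error only needs another call to `backfill_errors` with the extended pattern set, rather than re-downloading every job.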
That’s all for now!
P.S. Early notice: I will be away for 4-5 weeks starting around August 15.