Separation of Building and Testing in CI - any known best practices?

Hi,

As both a full-time and a hobbyist Haskeller, I run CI for many Haskell monorepos (mainly with GitHub Actions, though that is of little relevance to the discussion that follows).

Because we want to check buildability and the status of the test suites separately, our CI has two phases: Build and Test.

In the Build phase, as the name suggests, we build the entire project (mainly with cabal) and upload the executables and test-suite binaries as artifacts.
Then, in the Test phase, we download the artifacts from the Build phase and run all the test suites.
We use artifacts rather than a cache to guarantee consistency between what was built and what gets tested. We cannot use a cache here - if several commits run through CI in parallel, the cache can be “contaminated” by other commits, which breaks the consistency between the Build and Test phases.

To achieve this, we often write shell scripts that use cabal-plan to extract the relevant executables and test suites, record the list of tests, and upload everything as an artifact in the Build phase.
Then, in the Test phase, we download that artifact, unpack it, and run the tests.
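
For concreteness, a rough sketch of the collection step in the Build phase (the artifact directory layout and the grep-based filter are simplifications of what we actually do; actions/upload-artifact then uploads the directory):

```bash
#!/usr/bin/env bash
# Build phase: gather test-suite binaries from the cabal build plan into a
# directory that is uploaded as a CI artifact afterwards.
set -euo pipefail

ARTIFACT_DIR=test-artifacts
mkdir -p "$ARTIFACT_DIR"

# cabal-plan list-bins prints "<component> <path>" lines; keep test components.
cabal-plan list-bins | grep ':test:' |
while read -r component path; do
  if [ -f "$path" ]; then           # skip components that were not built locally
    cp "$path" "$ARTIFACT_DIR/"
    echo "$component" >> "$ARTIFACT_DIR/tests.txt"
  fi
done
```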

For ease of writing the bash script, we often end up running the tests serially, which means that not all test suites are executed when one of them fails part-way through.

Yes, we could be more careful: generate more structured test information, configure test jobs dynamically based on it, set continue-on-error correctly, allow concurrency, and so on. This can certainly be done.
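
For what it’s worth, even the plain serial case can avoid the short-circuit with a few extra lines - a minimal sketch, assuming the artifact layout from the collection sketch above:

```bash
#!/usr/bin/env bash
# Test phase: run every collected test binary, remember failures, and fail the
# job only at the very end, so later suites still get executed.
set -uo pipefail    # deliberately no -e: failures are handled explicitly

status=0
for exe in test-artifacts/*; do
  [ -x "$exe" ] || continue         # skip metadata files such as tests.txt
  echo "=== running $(basename "$exe") ==="
  "$exe" || { echo "FAILED: $exe"; status=1; }
done
exit "$status"
```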

Here is the question: is there any prior work for this purpose, i.e. running test executables from artifacts rather than from builder-specific caches?

I also have another question. While cabal-plan is quite useful, its list-bins command lists the binaries/tests/benchmarks of non-local packages (i.e. packages not specified as package: targets in the project file), even with the --hide-global, --hide-builtin, and --hide-setup options.
To filter out such non-local targets, we resort to ad-hoc hacks with grep and/or existence tests such as [ -f ${TARGET} ].
It would be good to know whether there is a more systematic way to achieve this; for reference, the current hack is sketched below.
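
The current ad-hoc filter looks roughly like this (the dist-newstyle path check is exactly the kind of heuristic hack mentioned above, nothing principled):

```bash
#!/usr/bin/env bash
# Ad-hoc filtering of cabal-plan output: keep only components whose binary
# actually exists under this checkout's dist-newstyle, i.e. local targets.
set -euo pipefail

cabal-plan list-bins | grep -E ':(test|exe):' |
while read -r component target; do
  if [ -f "$target" ] && [[ "$target" == *"/dist-newstyle/"* ]]; then
    echo "$component $target"
  fi
done
```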

Thank you in advance!


Instead of thinking in terms of build artifacts, I would also consider caching the build results. The cache can then be used to speed up the test phase.

For caching, there is the full cache that GitHub itself provides, but also a more involved setup with an incremental cache like Nix. That way, not everything is rebuilt between each new build either.


Hi @hellwolf,

Instead of thinking in terms of build artifacts, I would also consider caching the build results.

Thank you for your suggestion!
Indeed, we used a cache for testing with GitLab CI at first. But it was rather tedious to configure the cache keys so that they would not be contaminated by other parallel jobs, while still sharing the build results with those jobs to save build time.

While writing this reply, however, I think I have come up with one viable solution, at least with the recent cache mechanism of GitHub Actions.
We can include the commit hash as the last piece of the cache key.
In the Build phase, we specify the keys without the commit hash as restore keys. In the Test phase, on the other hand, we use the actions/cache/restore action alone, without any restore keys and with the fail-on-cache-miss option enabled. That way the Test phase is guaranteed to use the corresponding cache. This might increase cache quota consumption, but that doesn’t matter, at least for public repositories. I haven’t tested this yet, but I will give it a try. Thank you for your suggestion!

The cache can then be used to speed up the test phase.

On the other hand, this part does not quite apply - we don’t build anything in the Test phase, and we would still have to retrieve and decompress the equivalent result from the cache instead of from artifacts.

For caching, there is the full cache that GitHub itself provides, but also a more involved setup with an incremental cache like Nix. That way, not everything is rebuilt between each new build either.

I’ve certainly heard of Nix, but I couldn’t use it on my laptop due to lack of disk space. Still, I could use Nix in the CI workflow, so I will investigate that possibility. Thanks for the pointer!

Whether or not a cache is involved, it would be good to know if there is any test runner that doesn’t rely on intermediate build caches but only on test executables and metadata. That would help us run test suites in a clean-room setting.

I don’t really understand the question. It seems you already know what to do?

What’s the challenge with collecting exit codes in a loop without short-circuiting the test script?


Yes, there is no challenge. I haven’t done it simply because the current setup just works and I lack the spare time.

What I wanted to know was twofold:

  1. Has anyone implemented a tool that implements exactly this logic? (Surely I could write one myself, but I would rather use an existing one if it is already out there.)
  2. Is there any known best practice for using cabal-plan in a CI workflow?

Specific to GitHub Actions? I doubt it.

The only project doing anything remotely like that is haskell-ci (https://github.com/haskell-CI/haskell-ci): scripts and instructions for using CI services (e.g. Travis CI or AppVeyor) with multiple GHC configurations.

That may be one place to implement such logic, perhaps?


… And as a Haskeller, I really don’t want to write or copy-paste a gigantic bash script every time I create a new repo, and then tweak the cabal-plan filter to match the project-specific configuration. I would rather implement it as a tool written in Haskell, so I was curious whether one already exists.

Specific to GitHub Actions? I doubt it.

Well, not necessarily. I think the tool can be implemented in a CI-backend-agnostic manner. It could provide two separate features:

  1. Collector: collects the test executables, serialises the required metadata into e.g. JSON, and compresses everything into a single file.
  2. Runner: given the Collector’s output, runs the test binaries appropriately (perhaps in parallel).

This way, all that is needed for each distinct CI service is to store/upload and restore/download the Collector output appropriately.
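
As a rough illustration of the Runner side only (the tests.json format is entirely hypothetical here - the real tool would define its own metadata and be written in Haskell):

```bash
#!/usr/bin/env bash
# Hypothetical Runner sketch: read the Collector's metadata (assumed to be a
# JSON array of {name, path} objects) and run every binary without stopping at
# the first failure.
set -uo pipefail

status=0
while read -r name path; do
  echo "=== $name ==="
  "$path" || status=1
done < <(jq -r '.[] | "\(.name) \(.path)"' tests.json)
exit "$status"
```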

The only project doing anything remotely like that is haskell-ci (https://github.com/haskell-CI/haskell-ci): scripts and instructions for using CI services (e.g. Travis CI or AppVeyor) with multiple GHC configurations.

That may be one place to implement such logic, perhaps?

Thank you for the suggestion! I will take a look and see whether I can contribute there.


I’d likely use this tool if it were available.

Even better if integrated with haskell-ci, which is how I already set up CI for my projects. Perhaps doing it in an external tool would work too, but the haskell-ci source probably has existing code for talking about CI things, and it’d be easier to augment it?

Let us know how it goes!


I gave the cache-based approach inspired by @hellwolf a try here.

TL;DR: it is certainly possible, but it is far more complicated than the artifact-based approach (at least with the GitHub cache alone).

The basic strategy is to include the commit hash in the final cache key and require an exact match in the Test phase.
However, several points have to be considered carefully to get everything working as expected:

  • If you include file hashes in the cache keys in order to share a common cache across builds (e.g. to share the cache of the cabal store based on the contents of cabal.project.freeze and the package.yamls), you must consistently use the hash computed at the appropriate point (see the sketch after this list).
    • For example, if you use hashFiles('**/*.hs') in a cache key, the hash must be computed right after checkout, because new .hs files can be generated under dist-newstyle and/or a source-repository-package can pull in extra Haskell files.
    • We must also make sure to use exactly the same key in the Build and Test phases - I used the workflow’s job output mechanism to ensure the Test phase uses the same key.
  • We must call cabal with the same configuration in both the Build and Test phases - otherwise cabal test will needlessly recompile some parts.
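
To illustrate the first point, a minimal sketch of what “compute the hash once, right after checkout” can look like as a shell step (the file set and output name are assumptions; in the actual workflow I used GitHub’s hashFiles expression and job outputs, but the idea is the same):

```bash
#!/usr/bin/env bash
# Very first step after actions/checkout, before anything can generate files
# under dist-newstyle or pull in source-repository-package sources.
set -euo pipefail

# Hash all tracked Haskell sources plus the files pinning the dependencies.
src_hash=$(git ls-files -z -- '*.hs' '*.cabal' 'cabal.project*' \
             | xargs -0 sha256sum | sha256sum | cut -d' ' -f1)

# Expose it as a step/job output so Build and Test derive the exact same key,
# e.g. "build-${src_hash}-${GITHUB_SHA}".
echo "src-hash=${src_hash}" >> "$GITHUB_OUTPUT"
```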

That being said, since we must keep the Build and Test phases consistent, it is far more lightweight to use the artifacts mechanism than caches. This is, of course, specific to the GitHub caching mechanism. Perhaps Nix can offer smarter caching, or haskell-ci could do this better. So please let me know if anyone is doing this kind of thing with a caching mechanism.


I don’t understand the hesitancy with caching. With an appropriate cache key, there wouldn’t be any cross-contamination between builds.

If you’re concerned about reliable builds, you want to use stack instead of cabal. Then your cache key can simply be the hash of the stack.yaml file and all the package.yaml or .cabal files in your project. If those files stay the same, you’re guaranteed to have the same dependencies. (I only ever use the cache for third-party deps; I find caching the actual project to be more difficult.)

If you’re writing Haskell tests, then stack test or cabal test will run all the test suites and exit if any fail. If you’re using another test framework or some bespoke test framework, then use whatever runner that framework provides. At that point, it’s not a Haskell-specific question.

I’m not sure why there’s such a concern with using the exact same test executables that were built in the build phase. We did this at my old company, and our script worked okay, but I don’t think it’s worth it. If the tests and binaries are all built from the same source code, it doesn’t matter whether it’s the exact same binary. Yes, there’s a risk that some errant bit flip causes a build-phase artifact to behave nontrivially differently from a test-phase artifact, but IMO that risk is so small and unlikely that it doesn’t make sense to design an entire CI around it.

That’s an odd statement. You can use a cabal.project with a frozen index state and even use the Stackage LTS constraints. You can then use the hash of that plus your .cabal files.
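
For illustration, one common way to set that up (the snapshot name is just a placeholder; the index-state pin goes in cabal.project):

```bash
#!/usr/bin/env bash
# Pin the build plan to a Stackage snapshot by using its constraint set as the
# freeze file; hashing cabal.project* then yields a stable cache key.
set -euo pipefail

SNAPSHOT=lts-21.25   # placeholder snapshot
curl -fsSL "https://www.stackage.org/${SNAPSHOT}/cabal.config" \
  > cabal.project.freeze

# The downloaded file also pins the compiler via a with-compiler line; drop it
# if CI installs GHC separately.
sed -i '/^with-compiler:/d' cabal.project.freeze
```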


I think this is really an innocent misunderstanding due to tooling silos.

I am relatively new to the community, and I do hear people say that cabal is missing features that stack has. For me this is confusing, since I started with cabal and it looks fine to me. I don’t think I am alone in this confusion.

I think providing an example Haskell project that mimics a real-world engineering project (bundling dev environment setup, tests, CI setup, etc.) could bridge this gap within the community.

I think the OP’s question is evidence that an authoritative, kept-up-to-date set of example projects maintained by the community would really lower the barrier to entry to the Haskell ecosystem.


Hi @brandonchinn178, thank you for chiming in!

I’m afraid you are missing the main point: we DO cache build results to speed up builds. The point is that caches are not reliable for sharing binaries between Build and Test jobs in CI, at least not without nontrivial complication and careful thought.

The point is that working out an appropriate key that just works is not as easy as one might expect. The final result in my preceding post on the cache-based approach is more complicated than the artifact-based approach; there are far more things to consider to get everything working.

I’m afraid I can’t follow your point here. As for the files that should go into the cache key, I think there is no difference between (Nix-style) cabal and stack. After all, the scenario remains almost the same - we just consider cabal.project and cabal.project.freeze instead of stack.yaml. Then your statement “If those files stay the same, you’re guaranteed to have the same dependencies” holds almost equally for cabal. Cabal can still install packages not listed in the freeze file, but they are a small portion of the dependencies (because we use the plan downloaded directly from Stackage) and we can add them by hand - and in a large stack-based project there are certainly cases where one needs extra-deps in stack.yaml as well. Anyway, I can’t see any crucial difference between the two as far as the caching strategy is concerned.

Yes, “cache dependencies only” should just work in such a case. However, for a monorepo with dozens of subpackages whose production (optimised) build takes a few hours, caching the local build results is mandatory. Otherwise we would have to wait hours every time we add new features or bugfixes to battle-test against the production inputs.
And, in the context of this topic, your argument sounds rather confusing to me - how can we avoid caching local packages on the one hand and simultaneously use the caching mechanism to share the build results of the local test suites on the other?

(Rather off-topic disclaimer: what follows may sound rather adversarial towards stack, but it concerns only running it in CI. I do love stack for what it has brought to the Haskell world. Stackage wouldn’t exist without stack, and the maturity of cabal-install today is deeply inspired by stack. I really, deeply appreciate it.)

Actually, I (and the company I’ve been working for) had used stack for many years and decided to switch to cabal last year. This was, indeed, the result of considering CI performance. With our codebase, cabal-install takes less cache quota and causes fewer seemingly unnecessary rebuilds than stack, at least in our collective and my personal experience. In addition, building a stack-based project in CI needs a little attention to timestamps to get caching right: we must either check out the full history (depth 0) or restore timestamps manually before the build (see the sketch below). These factors convinced us to use cabal-install, at least on the CI side. And the transition from stack to cabal was not a hard job, especially for the CI jobs; the main logic of the CI didn’t change when moving from stack to cabal. We still miss a stack ide targets equivalent in cabal, but we can work around that with cabal-plan or the like.
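
A minimal sketch of the manual timestamp restoration (this per-file loop is slow on big repos; a dedicated tool such as git-restore-mtime does the same much faster):

```bash
#!/usr/bin/env bash
# Reset each tracked file's mtime to its last commit time, so a fresh checkout
# doesn't look "newer" than the cached build products.
set -euo pipefail

git ls-files -z | while IFS= read -r -d '' f; do
  ts=$(git log -1 --format=%cI -- "$f")
  if [ -n "$ts" ]; then
    touch -d "$ts" "$f"
  fi
done
```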

If the source code is the same, then generally yes - that covers the case where the cache “contamination” happens between runs of the same commit hash. But in some cases a build from different source code CAN share the same build cache, and this is sometimes the intended behaviour, at least in the Build phase. After all, in the Build phase we need caches precisely to reduce the build time between non-identical, modified versions of the code; otherwise the cache makes no sense (other than eating up the cache quota :slight_smile:).

But this is obviously not the case for testing: we must test binaries built from exactly the same source code. And indeed, in our experience, there were cases where cache pollution caused false-positive or false-negative results in CI. At that time we were using GitLab, which didn’t offer as much flexibility as today’s GitHub, so we decided to use artifacts instead. And even though GitHub today provides a powerful cache mechanism, as my (partial) PoC above shows, it clearly complicates the logic.

So your claim:

…is missing my entire point. Sorry for my unclear writing. The intention, as I’ve been trying to clarify in this comment, is that it is not as easy as it first seems to configure the cache mechanism so that it won’t deliver binaries from another run with different source code. It’s possible, but the complication seems not worth it compared to the simplicity of the artifact-based approach. And my “include the commit hash in the cache key” approach has another drawback: it can consume an unnecessary amount of storage, as it forces the CI to save the build results every time.
But yes, there might be another clever, shiny way to achieve consistent test-artifact sharing with a caching mechanism that I’m not aware of. If you have any ideas, I’d be really happy to hear them. Thanks!

Sorry, I think I got confused talking about “caching” for sharing between the build and test steps. That’s not typically what caching is for. GitHub Actions and CircleCI, for example, have separate mechanisms for caching and for persisting artifacts. To share between jobs in a pipeline/workflow, you should upload artifacts. To share between workflows for performance reasons, use the cache.

Yes, cabal does have a freeze file, and you can download the stack snapshot as the freeze file. Technically, though, that doesn’t guarantee a consistent build in one specific scenario: if you add a dep on a package that’s not in the snapshot, cabal will resolve it dynamically, while stack will error out saying “add it to extra-deps”. But if you have a check that all packages used (and all packages your packages use recursively) are in the freeze file (or don’t care), then cabal works fine.

Side note: I am a bit surprised that your production build takes so long; our monorepo had ~30 packages and didn’t take that long. But no matter. Yes, you could key a cache on all source files in a package, which means having a separate cache per package, which probably doesn’t scale well. If you share one cache for all the packages, be sure to keep the cache from snowballing. Typically what I do is add date +%B to the cache key to invalidate the cache every month.
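
A tiny sketch of that monthly rotation as a step script (the output name is an assumption; the value then gets interpolated into the actions/cache key):

```bash
#!/usr/bin/env bash
# Emit the current month as a step output; including it in the cache key makes
# the cache start from scratch once a month instead of growing forever.
set -euo pipefail
echo "cache-month=$(date +%B)" >> "$GITHUB_OUTPUT"
```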

That is fine if you freeze the index state.


This is exactly what we are doing at present, and I stated it in the very first post. I’m happy to see you agree with my whole point!

…And I remembered that my main point was not to propose artifacts as the means for a separated CI (sorry for the clutter…). The main question of the topic is whether there is an existing tool to support such a practice. As the answer seems to be “no”, I will look into a possible solution that is more than a bash script. If I make progress, I’ll report back.

Yes, apologies. I completely misunderstood your original post (and possibly got confused with some of the comments).

Yes, I don’t believe an existing tool would help with running test executables in a directory. At an old company, we just wrote a small Python script that found exes in a directory and ran them. We used stack ide targets to find the exes; if cabal doesn’t have that feature, you could probably write a Haskell script that uses the Cabal-syntax library to parse the cabal files and find the test suites.
