GitHub cache action

I am trying to speed up CI on GitHub with caching, but my Stack caches are too big. Is there a way to be more specific about the cached paths, so that the cache size goes down but build times stay good?

      - name: Use Cache
        uses: actions/cache@v4
        with:
          key: ${{ matrix.os }}_${{ matrix.resolver }}
          path: |
            ~/.stack
            ./.stack-work
            ./*/.stack-work

Perhaps splitting up the ~/.stack path, but I am unsure what that should actually look like.

Is your project publicly available? If you link to it, I can better understand your configuration.

Your code includes matrix.resolver, which I assume is a Stackage lts-* / nightly-* / ghc-* string. This makes me wonder how you are installing GHC and Stack. Perhaps you are not using haskell-actions/setup and are instead installing Stack and then letting Stack install GHC? Stack installs GHC into ~/.stack, so this may be a significant factor in the cache size. If you are testing nightly (which points to the latest nightly snapshot), then you may be accumulating every version of GHC that you have ever tested in the corresponding cache.

I use stack-${{ matrix.ghc }}.yaml files instead. The matrix specifies the GHC versions to test. Each stack-${{ matrix.ghc }}.yaml file has the Stack configuration, specifying the resolver.

Example stack-9.6.6.yaml file:

resolver: lts-22.28

packages:
  ...

Example stack-9.8.2.yaml file:

resolver: nightly-2024-07-09

packages:
  ...

Example stack-9.10.1.yaml file:

resolver: ghc-9.10.1

packages:
  ...

extra-deps:
  ...

In the workflow, you can then use haskell-actions/setup to install both GHC and Stack. Every stack command used must include the --system-ghc option to prevent Stack from installing GHC into ~/.stack.
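
As a rough illustration of the setup described above, a job could look something like the sketch below. The job name, runner, and matrix values are assumptions made for the example (the GHC versions simply mirror the stack-*.yaml files shown earlier); adjust them to your project.

```yaml
# Sketch only: job name, runner, and matrix values are illustrative assumptions.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        ghc: ['9.6.6', '9.8.2', '9.10.1']
    steps:
      - uses: actions/checkout@v4

      # haskell-actions/setup installs GHC and Stack, so Stack does not
      # need to download a GHC of its own into ~/.stack.
      - name: Setup Haskell
        uses: haskell-actions/setup@v2
        with:
          ghc-version: ${{ matrix.ghc }}
          enable-stack: true

      # --system-ghc tells Stack to use the GHC found on PATH;
      # --stack-yaml selects the per-GHC configuration file.
      - name: Build and test
        run: stack test --system-ghc --stack-yaml stack-${{ matrix.ghc }}.yaml
```

Every other stack invocation in the job (for example a separate `stack build --only-dependencies` step) would need the same two options.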

I am not sure what you mean by “splitting up the ~/.stack path.” Perhaps you want to cache ~/.stack separately from the .stack-work directories? You can do this using multiple cache steps. Separate caches must of course have different keys.
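
A minimal sketch of two separate cache steps follows. The key prefixes and the files passed to hashFiles are assumptions chosen for the example, not a recommendation from any official documentation; the matrix variables are the ones from your snippet.

```yaml
      # Sketch only: key prefixes and hashed files are illustrative assumptions.
      # Cache for the Stack root (downloaded packages, built snapshot dependencies).
      - name: Cache ~/.stack
        uses: actions/cache@v4
        with:
          path: ~/.stack
          key: stack-root-${{ matrix.os }}-${{ matrix.resolver }}-${{ hashFiles('stack.yaml', '**/package.yaml', '**/*.cabal') }}
          restore-keys: |
            stack-root-${{ matrix.os }}-${{ matrix.resolver }}-

      # Cache for the project-local build artifacts, under a different key.
      - name: Cache .stack-work
        uses: actions/cache@v4
        with:
          path: |
            .stack-work
            */.stack-work
          key: stack-work-${{ matrix.os }}-${{ matrix.resolver }}-${{ github.sha }}
          restore-keys: |
            stack-work-${{ matrix.os }}-${{ matrix.resolver }}-
```

Keeping the two apart means a change to the project source only produces a new .stack-work cache, while the larger ~/.stack cache is reused until the dependencies change.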

Here are the things I cache. There may be simpler/better ways…

If you want to (temporarily) insert a debug step that shows the size of your ~/.stack, here is a script that attempts to provide a useful summary:

#!/usr/bin/env bash
# shellcheck disable=SC2088

if [ ! -d "${HOME}/.stack" ] ; then
  echo '~/.stack not found'
  exit 0
fi

echo '## programs'
if [ -d "${HOME}/.stack/programs" ] ; then
  find ~/.stack/programs -mindepth 2 -maxdepth 2 -type d -print0 \
    | xargs --null du --human-readable --summarize
else
  echo '(not found)'
fi

echo
echo '## snapshots'
if [ -d "${HOME}/.stack/snapshots" ] ; then
  du --human-readable --summarize ~/.stack/snapshots
else
  echo '(not found)'
fi

echo
echo '## setup-exe-cache'
if [ -d "${HOME}/.stack/setup-exe-cache" ] ; then
  du --human-readable --summarize ~/.stack/setup-exe-cache
else
  echo '(not found)'
fi

echo
echo '## ~/.stack total'
du --human-readable --summarize ~/.stack

Note that the “programs” section shows the size of each program directory. It does not include archives. When using --system-ghc, none should be installed here.
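
To run it from a workflow, a temporary step along these lines should do. The script path is a hypothetical location in your repository, and the step is restricted to Linux runners because the script relies on GNU long options for du and xargs.

```yaml
      # Temporary debug step; remove it once the cache size is understood.
      # .github/scripts/stack-size.sh is a hypothetical path for the script above.
      - name: Show ~/.stack size
        if: runner.os == 'Linux'
        run: bash .github/scripts/stack-size.sh
```

Placing it after the build step shows what would end up in the cache at the end of the job.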

Thank you for pointing out that I can use haskell-actions to install GHC instead of letting Stack install it! That will solve my size problem and make the cache include patterns very simple.

Thanks!

I checked out your CI configuration and learned something new. You are using the Stack --snapshot option to override the resolver configured in the stack.yaml file. This allows you to test many snapshots without having to clutter your project with many stack-*.yaml files. Nice!
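
For anyone else reading, my understanding of that approach is roughly the sketch below. The snapshot and GHC pairs are taken from my example files above; the job name and runner are assumptions, and this is not a copy of the actual workflow.

```yaml
# Sketch only: job name and runner are illustrative assumptions.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - snapshot: lts-22.28
            ghc: '9.6.6'
          - snapshot: nightly-2024-07-09
            ghc: '9.8.2'
    steps:
      - uses: actions/checkout@v4

      - name: Setup Haskell
        uses: haskell-actions/setup@v2
        with:
          ghc-version: ${{ matrix.ghc }}
          enable-stack: true

      # --snapshot overrides the snapshot configured in stack.yaml,
      # so a single stack.yaml serves every matrix entry.
      - name: Build and test
        run: stack test --system-ghc --snapshot ${{ matrix.snapshot }}
```

The GHC version in each matrix entry has to match the compiler that the snapshot expects, otherwise Stack will reject the system GHC.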

I may be able to do this to clean up some of my projects, but many of my projects require different configuration for different snapshots. For example, some configuration requires extra-deps that pin specific versions of packages that are not in the snapshot. Also, optparse-applicative has different dependencies depending on the version, and Cabal can handle this automatically while Stack requires manual configuration.

Thanks for sharing!

Glad we both learned something useful!

Another option is to use haskell-ci. But use it from GitHub, because the released version isn’t generating compatible CI configs now that GitHub has bumped their glibc requirement.

I am unsure whether your second sentence means that it works or that it does not work on GitHub.

What I mean is that the version of Haskell-CI on Hackage isn’t new enough. Every version is intended to generate working CI configurations; I am just telling you not to use v0.18.1 (the latest on Hackage), because it won’t actually generate something that works. Instead, you should pull haskell-ci from GitHub and use it to generate a GitHub CI configuration (which will work, since the master version of Haskell-CI was updated to handle the glibc bump). Your CI configuration should then work for a couple of years. When it stops working, you can regenerate it with an even newer version of Haskell-CI.

Please see my comment here: Blessed recipe how to use stack on github actions, in particular caching? · Issue #5754 · commercialhaskell/stack · GitHub

Perhaps splitting up the ~/.stack path, but I am unsure what that should actually look like.

Yes, exactly :+1: The linked post shows how.

Wish I had this when I was figuring it out.

So, currently I am caching

          path: |
            ${{ steps.setup-haskell-stack.outputs.stack-root }}
            .stack-work
            */.stack-work

but according to issue 5754 I should change the stack root line to be

 ${{ steps.setup-haskell-stack.outputs.stack-root }}/pantry
 ${{ steps.setup-haskell-stack.outputs.stack-root }}/snapshots

Reference to steps.setup-haskell-stack.outputs.stack-root

Workflow file

but according to issue 5754 I should change

No.

You should have 4 separate caches.

Please read the post. There are snippets after the TL;DR.

The complexity of this warrants its own action or a change to stack to support easy caching of workflows.

[…] a change to stack to support easy caching of workflows.

But is it just stack, or cabal (the library) which stack uses? Moreover, is this a problem for the 'Hub, or is the 'Lab also affected?

It’s not a problem… it’s a tradeoff. Want quick-n-dirty simple cache? Plop in entire ~/.stack plus .stack-work to one pot; brace for issues some ~years later (depending on project velocity).

Want fast rebuilds in a mostly correct, long-term reliable setup? Alas, brace for complexity.

… Remember, cache invalidation is one of the hardest problems of computer science :joy:

My post is explicitly stack centric. CI caching with cabal v1-build & cabal v2-build each will be mostly different; haskell-ci may be of help there.

Re: GitLab — please read.

I’ve done quite a few CI/CD pipelines for haskell, on CircleCI, BitBucket Pipelines, GitHub Actions… even Jenkins.

In all cases, the syntax (yaml, groovy, whatnot) is of course different; but the underlying issues, ideas, and principles are the same. So yes, on GitLab too.

I thought a lot about this…

The complexity of this warrants its own action or a change to stack to support easy caching of workflows.

Still undecided though.

See, the problem space is riddled with nuance & vast enough that trying to invent a universal “higher-order YAML” to solve it all — seems utterly futile. The universality aspect is bound to multiply complexity, too. I just don’t see how to make it happen.

Set up CI in a given repo? Sure, easy, hold my beer. Concoct a universal CI for any Haskell repo? Hell no.

Perhaps having it explicitly in the workflow yaml, tailored for each specific case, is better in terms of complexity. There are far fewer abstractions to decode than a universal action must have (“build targets? cabal flags? monorepos? native dependencies?? backpack mixins?? vendored private source repos?! custom ghc builds?!?” etc etc). With the steps “inlined”, the reader sees direct stack build calls instead of yet another layer of bespoke YAML-embedded DSL.


In terms of stack support: yes, a “build plan” feature would help resolve one minor correctness issue in that setup. Yet, complexity would remain; it won’t solve the need for 3–4 separate caches.

…much like what seems to happen with people’s .cabal directories:

Hence my post.