Hackage errors: invalid hash S-5170 InvalidChunkHeaders

Hi.

We are constantly seeing two different errors on CI and I really would like to understand and fix them.

First one is this:

Selected mirror https://hackage.haskell.org/
Downloading root
Waiting to acquire cache lock on /Users/runner/.stack/pantry/hackage/hackage-security-lock
Acquired cache lock on /Users/runner/.stack/pantry/hackage/hackage-security-lock
Released cache lock on /Users/runner/.stack/pantry/hackage/hackage-security-lock
Selected mirror https://hackage.haskell.org/
Downloading timestamp
Downloading snapshot
Downloading mirrors
Cannot update index (no local copy)
Downloading index
Verification error: Invalid hash for <repo>/01-index.tar.gz
...
Invalid hash for <repo>/01-index.tar.gz
  Invalid hash for <repo>/01-index.tar.gz
  Invalid hash for <repo>/01-index.tar.gz
  Invalid hash for <repo>/01-index.tar.gz
  Invalid hash for <repo>/01-index.tar.gz

(reported at Cache causing invalid hash for 01-index · Issue #1366 · haskell/hackage-server · GitHub)

Usually the error is indeed reported for the 01-index.tar.gz file, but sometimes we also see an error about snapshot.json.

The second one is an error from Stack, but it also seems to be related to the Hackage infrastructure:

2025-08-19 06:14:00.219502: [debug] Downloading archive from https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz
HttpExceptionRequest Request {
  host                 = "hackage.haskell.org"
  port                 = 443
  secure               = True
  requestHeaders       = [("User-Agent","Haskell pantry package")]
  path                 = "/package/proto-lens-runtime-0.7.0.4.tar.gz"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
 InvalidChunkHeaders

(reported at stack unpack: error S-5170 -- InvalidChunkHeaders · Issue #145 · commercialhaskell/pantry · GitHub)

Both problems are transient, usually retrying the failed jobs makes the workflow succeed (we are using GitHub Actions).
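As a stopgap, a crude retry wrapper around the failing step keeps the noise down; a minimal bash sketch (the command and retry count are placeholders, not our actual CI configuration):

# Retry a flaky step a few times with a short pause in between.
# "stack build" is just a placeholder for whichever step fails on CI.
for attempt in 1 2 3; do
    if stack build; then
        exit 0
    fi
    echo "attempt ${attempt} failed, retrying..." >&2
    sleep 30
done
echo "all attempts failed" >&2
exit 1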

Maybe these problems are related; probably both have something to do with the CDN?

A few questions:

What is the difference between 00-index.tar.gz and 01-index.tar.gz and when is one used instead of the other? How does the verification work? And what could be a reason for a failure?

Are others also seeing this on a regular basis?

I was able to reproduce it with the following steps, run in a loop (a sketch of the loop follows the list):

  1. rm -rf ~/.stack/pantry/
  2. stack --resolver ../snapshot.yaml unpack HUnit-1.6.2.0 QuickCheck-2.14.3 …
  3. rm -rf *
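A minimal sketch of that loop in bash (the snapshot path is the one from step 2, and the package list is abbreviated in the same way):

failures=0
for run in $(seq 1 100); do
    echo "### run ${run}"
    rm -rf ~/.stack/pantry/
    # Package list abbreviated; the real invocation unpacks more packages.
    if ! stack --resolver ../snapshot.yaml unpack HUnit-1.6.2.0 QuickCheck-2.14.3; then
        failures=$((failures + 1))
        echo "!! run ${run} failed" >&2
    fi
    # Remove the unpacked package directories again (step 3).
    rm -rf ./*
done
echo "failed runs: ${failures}"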

Out of 100 runs, I got error S-5170 4 times: 3 times it was caused by a VerificationErrorLoop resulting from an Invalid hash error for 01-index.tar.gz, and once by an InvalidChunkHeaders error.

Within a single run, a few VerificationErrors for 01-index.tar.gz were also reported, but apparently on the 4th try the file could be verified successfully and the run eventually succeeded.

Thanks!

3 Likes

The InvalidChunkHeaders error comes from http-client and indicates that the body of the response is corrupted one way or another.

I tried to reproduce the issue directly with http-client like this:

module Main (main) where

import Control.Monad
import Network.HTTP.Types
import Network.HTTP.Client
import Network.HTTP.Client.TLS

main :: IO ()
main = do
  mgr <- newTlsManager
  req <- parseRequest
    "https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz"
  forM_ [1..100] $ \i -> do
    print i
    resp <- httpLbs req mgr
    unless (statusCode (responseStatus resp) == 200) $
      print resp

It worked fine for me. Could you please try it as well?

ADDED: Note that it's making 100 requests and might generate a lot of network traffic.

2 Likes

Hi Yuras,

I tried your script a few times, but it didn't reproduce the issue for me. That might be because HTTP 301 responses are cached (which makes it much less likely to hit the problem).

Note that when using curl, one can see that the first response is an HTTP 301 redirect which indeed uses chunked encoding (which I think is quite weird, since there is no body).
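Something along these lines shows the headers of that first response without following the redirect (whether Transfer-Encoding: chunked appears may vary with how the particular response is delivered):

$ curl -sS -D- -o /dev/null \
    -H 'User-Agent: Haskell pantry package' \
    https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz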

So, I used the following program inside a loop:

module Main (main) where

import Control.Monad
import Network.HTTP.Types
import Network.HTTP.Client
import Network.HTTP.Client.TLS
import System.Exit
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Char8 as C

main :: IO ()
main = do
  mgr <- newTlsManager
  initReq <- parseRequest
    "https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz"
  let req = initReq { requestHeaders = [(hUserAgent, C.pack "Haskell pantry package")]}
  withResponseHistory req mgr $ \ resp -> do
    let redirects = hrRedirects resp
    forM_ redirects $ \ (_rreq, rresp) -> do
      print rresp
    let finalResp = hrFinalResponse resp
    bss <- brConsume $ responseBody finalResp
    unless (statusCode (responseStatus finalResp) == 200) $ do
      print $ finalResp { responseBody = L.fromChunks bss }
      exitFailure

I ran it 200 times and it failed just once:

## 33
!! FAILED !! 33
HttpReqSingle.hs: HttpExceptionRequest Request {
  host                 = "hackage.haskell.org"
  port                 = 443
  secure               = True
  requestHeaders       = [("User-Agent","Haskell pantry package")]
  path                 = "/package/proto-lens-runtime-0.7.0.4.tar.gz"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
 InvalidChunkHeaders

1 Like

Oh, I could also reproduce it with curl!

Running curl -so /dev/null -w '%{http_code}\n' https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz in a loop, it failed after a few tries:

curl: (56) chunk hex-length char not a hex digit: 0xd

So that indeed indicates a problem with the server setup… (or server implementation?)

4 Likes

OK, then let's assume there is nothing wrong with http-client. But feel free to ping me if you find any issue on the http-client side.

I tried the curl thing, but was not able to reproduce the issue after 100 requests. It doesn't prove anything of course, but the issue might be between the server and you, e.g. a misbehaving proxy.

1 Like

Yes, I also just managed to trigger it once, and I have run it ~1000 times now… Edit: I hit it again locally, and I also just created a workflow for GitHub Actions here. It took 565 tries to hit the problem.

Well, it happens not only for me but also on our CI systems, which live in a completely different region / network environment. And with TLS ruling out some MITM attack, it seems more likely that the problem is on the server side (behind wherever TLS is terminated) and not specific to me.

1 Like

Also, I could reproduce this using http instead of https. Here’s a trace of a run with curl:

<= Recv header, 39 bytes (0x27)
0000: 58 2d 54 69 6d 65 72 3a 20 53 31 37 35 37 35 39 X-Timer: S175759
0010: 31 31 30 31 2e 35 31 34 33 34 38 2c 56 53 30 2c 1101.514348,VS0,
0020: 56 45 31 34 38 0d 0a                            VE148..
<= Recv header, 28 bytes (0x1c)
0000: 74 72 61 6e 73 66 65 72 2d 65 6e 63 6f 64 69 6e transfer-encodin
0010: 67 3a 20 63 68 75 6e 6b 65 64 0d 0a             g: chunked..
<= Recv header, 2 bytes (0x2)
0000: 0d 0a                                           ..
<= Recv data, 5 bytes (0x5)
0000: 0d 0a 31 0d 0a                                  ..1..

As you can see, the last part is not a valid chunk at all. The server usually sends an empty terminating chunk (30 0d 0a 0d 0a), but sometimes it sends this instead. It almost looks like some sort of off-by-one error with wrong array indexing…
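For comparison, the correct terminating chunk of a chunked body is the ASCII digit 0 followed by two CRLF pairs, which is exactly those five bytes:

$ printf '0\r\n\r\n' | hexdump --canonical
00000000  30 0d 0a 0d 0a                                    |0....|
00000005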

Meanwhile, I tried to reproduce it with a local hackage-server instance, but that seems to work just fine.

1 Like

This is very useful info, please continue investigating if you can! I have been hearing reports of this error for a long time. Finally having a curl reproducer is a big step forward. I’m gonna try to replicate it.

2 Likes

I got curl to fail (after 2000 and 5000 iterations), but I didn’t get much good data out of the failure. I didn’t get any error message. I’ll have to keep trying.

@claudio, what did you use to get that output with the byte info? Some combination of curl --raw and hexdump maybe?

$ curl --raw --fail-with-body https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz  | hexdump --canonical
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     5    0     5    0     0      8      0 --:--:-- --:--:-- --:--:--     8
00000000  30 0d 0a 0d 0a                                    |0....|
00000005

Edit: also, does your local hackage-server even use chunked encoding? I can access the origin server behind the CDN for hackage.haskell.org, and it doesn't look like it's using it.

Edit 2: Confirmed again after 1253 iterations. Here is the last good response plus the final, invalid response:

00000000  30 0d 0a 0d 0a                                    |0....|
00000005
00000000  0d 0a 31 0d 0a                                    |..1..|
00000005

(And I forgot -S, which is why I wasn't seeing curl's error message.)

2 Likes

Nice that you could reproduce it. :+1:

You can use --trace with curl in order to get the hex data output; you can either write it to a file or use - (stdout) or % (stderr) as the target.
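For example, tracing to stdout while discarding the body (without -L this only fetches the interesting 301 response):

$ curl --trace - -o /dev/null \
    https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz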

Yes, my local hackage-server does use chunked transfer encoding (maybe you could use --http1.1 with curl when you request the origin server?).

I have just opened a PR to prevent it from doing that and just send an empty body for this specific redirect. See Set Content-Length for permanent redirection by avdv · Pull Request #1431 · haskell/hackage-server · GitHub

1 Like

Ah, the "origin" is, of course, using nginx as a reverse proxy, so I'm not hitting the bare hackage-server service. At any rate, I don't see any chunked encoding from the origin, so somehow it seems like this error is coming from the CDN itself. Or at least that's where the evidence is pointing. Note I'm not saying Fastly has a screaming obvious bug that manifests every ~1000 requests; I think they would have caught that if so. Maybe it's something in our configuration, or the interaction. The hunt continues.

EDIT: just adding the one-liner I used to repro, slightly expanded for readability:

(
rm -f hackage-cnt;
set -o pipefail;
while curl -sS --raw --fail-with-body \
    https://hackage.haskell.org/package/proto-lens-runtime-0.7.0.4.tar.gz \
    | hexdump --canonical;
do
    echo -n . >> hackage-cnt;
    sleep 0.1;
done;
wc hackage-cnt
)
1 Like

I have been thinking about this, and this may be a dumb question, but are you sure you're hitting the right endpoint on the origin server? If the request uses HTTP/1.1, it should always use chunked encoding:

$ curl -D- -s --http1.1 http://localhost:8080/package/proto-lens-runtime-0.7.0.4.tar.gz -o /dev/null -X GET -H User-Agent:'Haskell pantry package' -Hcache-control:no-cache
HTTP/1.1 301 Moved Permanently
Transfer-Encoding: chunked
Connection: Keep-Alive
Content-Type: text/plain; charset=UTF-8
Date: Thu, 18 Sep 2025 08:27:16 GMT
Location: /package/proto-lens-runtime-0.7.0.4/proto-lens-runtime-0.7.0.4.tar.gz
Server: Happstack/7.9.2.1

Nginx should just proxy the response without change, usually. Apparently it does so when the request flies in via Fastly… Could you post the curl response / trace to the origin server perhaps?

1 Like

I’ve asked to see the nginx config to see if anything funny is happening there.

I’ve also looked at the Fastly config and saw one funny thing that I’m asking about. It’s a peculiar config setting that could put us far enough off the beaten path to be hitting some weird edge case.

As far as I know, I’m hitting the origin correctly. You can do it, too. It’s not a big secret, but I’m not gonna write out the origin domain name explicitly. (I asked an LLM and it was able to figure it out from another post on this Discourse. :slight_smile: )

From another angle: I've done some code spelunking, and /package/:tarball has been served from a module called "LegacyRedirects" since at least 2010. (I was curious where the 301 was actually coming from.) The 301 is the only thing that has the funny chunked transfer encoding, so I'm gonna look into whatever part of Stack is using that legacy URL in the first place.

3 Likes

Regarding "what part of Stack is using …", perhaps it is Pantry.Hackage.getHackageTarball from the pantry package?

If it is, that part of Pantry dates from 2018:

1 Like

I ran a few more tests related to the Invalid hash errors. It seems these are solely due to some inconsistent information being cached (probably by Fastly).

I added some debug output, and apparently whenever 01-index.tar.gz changes, there can be a time span where the information in the snapshot.json file is fresh but the cached tarball is stale, which leads to a file length mismatch:

# LHS size is from the remote, RHS size is from the trusted info

mismatched file length: FileLength {fileLength = 130613598} /= FileLength {fileLength = 130614301}

mismatched file length: FileLength {fileLength = 130614301} /= FileLength {fileLength = 130615977}

As illustrated by the output above, the tarball was updated to a new size (130614301), but the (CDN) server still served the old tarball. The same holds for the second line.

If the CDN gets into this condition, it can take quite a long time (in my testing it took over a minute) until the information is consistent again, probably up to the 300 seconds configured as the cache's max-age.
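A rough way to watch for this window from the outside, assuming the snapshot.json layout used by hackage-security (with a literal "<repo>/01-index.tar.gz" key) and that the CDN answers a HEAD request with a Content-Length, would be:

# Length that the freshly fetched snapshot.json claims for the index.
expected=$(curl -sS https://hackage.haskell.org/snapshot.json \
    | jq -r '.signed.meta."<repo>/01-index.tar.gz".length')
# Length the CDN actually serves for the tarball right now.
actual=$(curl -sSI https://hackage.haskell.org/01-index.tar.gz \
    | tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }')
echo "snapshot says ${expected}, server serves ${actual}"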

3 Likes

Oh, that's a good catch. Not sure what we can do about it… we sort of need the CDN turned up high to keep our bandwidth under control, as our new hosting forces us to limit usage. Perhaps there's some varnish setting to ensure the two files are coupled?

1 Like

A hypothesis from a similar issue reported to me by herbert could be that range queries do not force the 01-index to be refetched by the cache server, only normal queries do. That could lead to the 01-index not being refreshed as rapidly as the metadata, among other problems.

Alternatively, starting to fetch a late range could trigger a refresh of the whole file, but then the range query itself could time out before the correct part of the file is reached.
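For context, a range query against the index looks like this from the client side (the byte range is arbitrary; a 206 response means the range was honoured, and Content-Range shows the total size the cache currently believes the object has):

$ curl -sS -D- -o /dev/null -r 130000000-130000099 \
    https://hackage.haskell.org/01-index.tar.gz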

Maybe we just need to decide that varnish's range support still isn't adequate for our use case and turn it off…

Or we could try the chunk-caching trick described here, or something else: Caching partial objects with varnish 4.0

1 Like

I see. But couldn't the cost of a shorter TTL be mitigated by making the caching proxy revalidate conditionally using the ETag? How often does a snapshot / index get updated on average?

Of course, that would not really solve the problem either, just make it less likely to hit.
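To illustrate what such conditional revalidation looks like on the wire, assuming the server sends an ETag for snapshot.json (a 304 Not Modified means the cached copy is still current):

# Grab the current ETag ...
etag=$(curl -sSI https://hackage.haskell.org/snapshot.json \
    | tr -d '\r' | awk 'tolower($1) == "etag:" { print $2 }')
# ... and ask again conditionally; an unchanged file yields 304 Not Modified.
curl -sS -D- -o /dev/null -H "If-None-Match: ${etag}" \
    https://hackage.haskell.org/snapshot.json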

As for a varnish setting to couple the two files: after asking an LLM and reading up a bit on it, there does not seem to be one. But one can purge a cached entity with an HTTP request.
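For example, Fastly accepts a single-URL purge via the PURGE method (depending on the service configuration this may require an authentication token):

# Evict the cached copy of the index from the CDN; add a
# "Fastly-Key: <token>" header if the service requires authenticated purges.
curl -X PURGE https://hackage.haskell.org/01-index.tar.gz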

I noticed that the snapshot.json file has a max-age of 60 seconds, whereas the tarball has 300 seconds. So the snapshot file is indeed fetched more frequently.

Also note that I just tested with downloading whole files, not ranges.