How do I clone Hackage?

So, I want to get the source code of all packages on Hackage at once, ideally along with metadata such as download counts and reverse dependencies. Is there a handy tool that would do this for me? And how much disk space would it take in the end: 1 GB, 10 GB, 100 GB?

Running a local instance of Hackage is not my goal at this time, though it would be nice to have.


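cabal list --simple will print out every package in your local copy of the Hackage index, one "name version" line per version (run cabal update first so the index is current).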

So

cabal list --simple | sed 's/ /-/'

will turn each line into a package identifier of the form name-version, and then

cabal list --simple | sed 's/ /-/' | xargs cabal fetch

would download every tarball into cabal's package cache, while

cabal list --simple | sed 's/ /-/' | xargs cabal get

would download them and unpack each one into its own name-version directory under the current directory.
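As a sketch, the pipeline above can be wrapped in two small shell functions. This assumes, as the sed step already does, that cabal list --simple prints one "name version" pair per line. The function names to_pkgids and fetch_all_sources, the default directory hackage-src, and the batch size of 50 are all made up for illustration.

```shell
# Hypothetical helper: turn "name version" lines (as printed by
# `cabal list --simple`) into "name-version" package identifiers.
# -n plus the p flag prints only lines where a space was replaced,
# so blank lines are dropped.
to_pkgids() {
  sed -n 's/ /-/p'
}

# Hypothetical helper: download and unpack every package version in the
# local index into directory $1 (default: hackage-src).
# The ( ... ) body runs in a subshell, so `set` and `cd` do not leak
# into the calling shell.
fetch_all_sources() (
  set -euo pipefail
  dest="${1:-hackage-src}"
  cabal update                    # refresh the local package index first
  mkdir -p "$dest"
  cd "$dest"
  # xargs -n 50: call `cabal get` with at most 50 identifiers at a time
  cabal list --simple | to_pkgids | xargs -n 50 cabal get
)
```

Source the file and run fetch_all_sources (or fetch_all_sources some/dir). Keep in mind that this fetches every version of every package, not just the latest one.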


You can use this small program that I designed exactly for this purpose:

It’s a component of https://hackage-search.serokell.io/ that downloads package sources from Hackage.


This is the tool the Hackage backup mirrors use to clone all relevant information: hackage-mirror-tool (GitHub: haskell-hvr/hackage-mirror-tool).

These mirrors contain essentially only source code and package-relevant metadata, not the data needed to drive the web UI, so they are very close to what you want.

Note that the package-relevant metadata (the .cabal files and their revisions) already exists on your machine as well, in the 01-index.tar that cabal downloads from Hackage when you run cabal update.
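If you only need that metadata, a rough sketch like the following inspects the index without downloading anything else. It assumes the path to 01-index.tar is passed as the first argument (where cabal keeps it depends on your cabal version and configuration, commonly somewhere under ~/.cabal/packages/hackage.haskell.org/ or ~/.cache/cabal/packages/hackage.haskell.org/). It also assumes, as I understand the index format, that .cabal entries are laid out as name/version/name.cabal and that each metadata revision is appended as another entry with the same path. The function names are hypothetical.

```shell
# Hypothetical helpers for inspecting a local copy of 01-index.tar.
# Assumes .cabal entries are stored as <name>/<version>/<name>.cabal.

# Print every "name/version" pair that has a .cabal file in the index.
# Revisions repeat the same path, so sort -u collapses them to one line.
index_versions() {
  tar -tf "$1" |
    awk -F/ 'NF == 3 && $3 ~ /\.cabal$/ { print $1 "/" $2 }' |
    LC_ALL=C sort -u
}

# Print "count path" for each .cabal file, most-revised first.
# A count of 1 means the file has no revisions after the original upload.
index_revision_counts() {
  tar -tf "$1" |
    awk -F/ 'NF == 3 && $3 ~ /\.cabal$/' |
    LC_ALL=C sort |
    uniq -c |
    sort -rn |
    awk '{ print $1, $2 }'      # normalise uniq -c's leading whitespace
}
```

For example, index_versions path/to/01-index.tar | wc -l counts the package versions in the index. Reverse dependencies would still require parsing the build-depends fields of the .cabal files themselves.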

That said, neither this nor any other mirror tool I am aware of will fetch or clone download counts. Those would need to be scraped separately.
