Executable size

I’m evaluating Haskell for writing simple CLI tools. I have (almost) no experience with Haskell in production.

As for the code, Haskell just rocks. Well, I may be biased, since I love FP, and Haskell syntax is very clean (compared to OCaml or Go). The compiled executable, though, is very big.

I wrote a simple app to download movies from a streaming platform (using yt-dlp). All it does is: (1) check the available audio and video formats, (2) ask the user which formats should be downloaded, (3) download the files, (4) merge them into the final mp4 using ffmpeg. Nothing complex.

The compiled binary, after stripping (and using the -split-sections GHC option), is 9.4MB. I have written the same app in both Go and OCaml, achieving binary sizes of 1.2MB and 2.8MB, respectively. I have also noticed that the Haskell binary is linked to the libgmp library, while the OCaml binary is not. (I find it strange, since this library shouldn’t be used at all in such a simple app…)

As for the Go executable, I got rid of the fmt package, which adds a lot to the size, so this 1.2MB is already somewhat size-optimized.

Is there any sane way to limit executable size in Haskell? (Well, I could have “invented” my own ExceptT monad and all the needed lift functions instead of using mtl, but that doesn’t seem like a sane solution…)

Or, perhaps, the size is not a problem at all, since further growth of the source code won’t increase the executable size (too much)?

8 Likes

Unfortunately, executable sizes in Haskell are significant.

I have also noticed that the Haskell binary is linked to the libgmp library

Not sure how OCaml handles big integers, but Haskell uses libgmp for arbitrary-precision integers. It’s tied to base whether you like it or not, just as libpthread gets linked even if you don’t fork any threads. You can easily get executables of a couple hundred MB with Haskell.

One thing you might try to do is to link executables dynamically, but if you want a static binary, there isn’t much you can do. I guess that in general GHC doesn’t take many steps to reduce sizes, because it isn’t generally an issue for the community.

4 Likes

It won’t change the in-memory footprint (AFAIK), but running upx on a statically compiled binary reduced its size significantly. If I recall correctly, the compressed binary was about 10% of the original size.

6 Likes

The GHC user’s guide lists these things:

11.3. Smaller: producing a program that is smaller

Decrease the “go-for-it” threshold for unfolding smallish expressions. Give a -funfolding-use-threshold=0 option for the extreme case. (“Only unfoldings with zero cost should proceed.”) Warning: except in certain specialised cases (like Happy parsers) this is likely to actually increase the size of your program, because unfolding generally enables extra simplifying optimisations to be performed.

Avoid Prelude.Read.

Use strip on your executables.

7 Likes

Which platform and architecture are you targeting?

3 Likes

Thank you for ALL your answers!
I didn’t expect such interest.

As for the platform, my main target is GNU/Linux.

2 Likes

Would “runghc” be an acceptable quasi-scripting alternative for this use case?

1 Like

So you would package an entire GHC installation, which is around 2 GB, with your program?

1 Like

Probably not :) I guess it’s only for local scripting, so I am probably missing the mark here.

FWIW, for comparison, here is the shipped size of a more common scripting language:

$ rpm -q --queryformat "%{NAME} %{SIZE}\n" python3-libs python3
python3-libs 33170288
python3 33316
1 Like

Quasi-scripting? Yes. Rely on runghc? Not sure. What if I need some external dependencies, currently managed with cabal?

For Bash / Python / Golang I don’t need to worry about external dependencies, since they are already on the system and/or in the standard library. For Haskell (and OCaml), I do. That’s not a problem, as long as I can compile my app on my machine and then distribute the binary to other machines.

1 Like

Using stack (example: tiny-games-hs/hackage/pong2/pong2.hs at main · haskell-game/tiny-games-hs · GitHub):

#!/usr/bin/env -S stack script --resolver lts-20 --package ansi-terminal-game

Using cabal run:

#!/usr/bin/env -S cabal run --index-state=2023-03-05T09:21:17Z
{- cabal:
build-depends: base, ansi-terminal-game ==1.8.1.0
ghc-options:   -threaded
-}
import Terminal.Game;main=playGameS(Game 20(10,10,1,1,10,0)l d e)>>=finish
e(x,y,a,b,z,s)=x<2&&(y<z||y>z+8);l _(x,y,a,b,z,s)e=(x+a,y+b,f 79 x a,
 f 21 y b, min 15$max 2$z+case e of{KeyPress 'w'-> -1;KeyPress 's'->1;_->0},
 s+if x<2then 1 else 0);d r(x,y,dx,dy,z,s)=mergePlanes(blankPlane 80 24)
 [((1,1),box 80 23 '█'),((2,1),blankPlane 79 21),((z,1),box 1 8 '▕'),((y,x),
 box 1 1 '⬤'),((24,60),stringPlane(show s))]
f m x a=if x<3 then 1 else if x>m then -1 else a
finish (_,_,_,_,_,s)=putStrLn$unwords["You scored",show s,"points!\n"]
-- ^10 ------------------------------------------------------------------ 80> --
{- hackage-10-80/pong (gergoerdi)

-}

But I digress from the original post.

3 Likes

Since you mentioned CLI tools in the plural, the busybox scheme of having multiple programs in one executable works perfectly fine in Haskell. Create multiple symlinks to your Haskell executable and then use getProgName to get the name of the symlink used to invoke your exe.
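A minimal sketch of that dispatch, using only base. The tool names hs-fetch and hs-merge and the Tool type are made up for illustration; the real tools would replace the runTool bodies.

```haskell
module Main (main) where

import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | The tools bundled into the single executable (hypothetical names).
data Tool = Fetch | Merge
  deriving (Eq, Show)

-- | Map the name the program was invoked under to a bundled tool.
selectTool :: String -> Maybe Tool
selectTool "hs-fetch" = Just Fetch
selectTool "hs-merge" = Just Merge
selectTool _          = Nothing

-- | Placeholder implementations; each just echoes its arguments.
runTool :: Tool -> [String] -> IO ()
runTool Fetch args = putStrLn ("fetch: " ++ unwords args)
runTool Merge args = putStrLn ("merge: " ++ unwords args)

main :: IO ()
main = do
  name <- getProgName   -- name of the symlink used to invoke the binary
  args <- getArgs
  case selectTool name of
    Just tool -> runTool tool args
    Nothing   -> do
      hPutStrLn stderr ("unknown tool name: " ++ name)
      exitFailure
```

Assuming the compiled binary is called multitool, something like `ln -s multitool hs-fetch` followed by `./hs-fetch some-url` would take the Fetch branch. The runtime and library overhead is then paid once for all the tools.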

6 Likes

You can also try building your binary with the -dynamic flag, which links all of your dependencies dynamically. Another benefit: if you have multiple Haskell command-line binaries, they will share the same .so files in memory, which reduces memory use.

Your app probably doesn’t compile with MicroHs (yet), but MicroHs binaries are 10-100x smaller than GHC binaries. And 10x slower.

Give it a try, and report any problems (probably packages that don’t compile).

2 Likes

On Windows 11, an executable that does nothing (main = pure ()) has (stripped):

> stack --snapshot ghc-8.10.7 build
> (dir .stack-work\install\b0f6abbe\bin\*.exe).Length
1259520

> stack --snapshot ghc-9.0.2 build
> (dir .stack-work\install\d5d85271\bin\*.exe).Length
6004224

> stack --snapshot ghc-9.12.2 build
> (dir .stack-work\install\a69d16be\bin\*.exe).Length
6904320

EDIT3: Or, with ‘pure GHC’ (the default runtime is single-threaded):

> stack --snapshot ghc-8.10.7 ghc -- Main.hs
> stack --snapshot ghc-8.10.7 exec -- strip Main.exe
> (dir Main.exe).Length
1216000

> stack --snapshot ghc-9.12.2 ghc -- Main.hs
> stack --snapshot ghc-9.12.2 exec -- strip Main.exe
> (dir Main.exe).Length
6801408

EDIT1: On Ubuntu 24.04.3 LTS (via WSL 2) (executable named testNullSize):

$ stack --snapshot ghc-8.10.7 build
$ stat -c %s "$(stack --snapshot ghc-8.10.7 path --local-install-root)/bin/testNullSize"
796992

$ stack --snapshot ghc-9.12.2 build
$ stat -c %s "$(stack --snapshot ghc-9.12.2 path --local-install-root)/bin/testNullSize"
950016

EDIT2: On Windows 11, I looked at what GHC was doing during its linking phase (with stack build --ghc-options -v), comparing 8.10.7 and 9.0.2. I can see that GHC is pulling in (i.e. -l) different versions of things and additional things, but I understand that those additional things should not be radically increasing the size of the executable file because they are system libraries that are, essentially, dynamically linked:

8.10.7                  9.0.2
HSbase-4.14.3.0         HSbase-4.15.1.0
HSinteger-gmp-1.0.3.0   n/a
n/a                     HSghc-bignum-1.1
HSghc-prim-0.6.1        HSghc-prim-0.7.0
HSrts_thr               HSrts-1.0.2_thr
n/a                     ws2_32
n/a                     ole32
n/a                     rpcrt4
n/a                     ntdll

With MicroHs: 195464

I apologize for the large size, but the runtime system takes up 195368 on its own. So for larger programs it looks less severe. E.g., the MicroHs compiler itself takes 24.4MB when compiled with GHC, but only 0.72MB when compiled with mhs.

5 Likes

It seems that the size of GHC compiled executables has greatly increased over the years.

I think it could be a good idea to add regression tests for executable sizes in GHC CI. Then this size increase would not have been so invisible.
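For an individual project, a rough sketch of such a check might look like the following. This is not GHC's testsuite mechanism, just a small standalone program that fails when a built executable exceeds a byte budget; the usage string and the names withinBudget and fileSize are made up for illustration.

```haskell
module Main (main) where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (IOMode (ReadMode), hFileSize, withBinaryFile)

-- | True when the measured size is within the budget (both in bytes).
withinBudget :: Integer -> Integer -> Bool
withinBudget budget size = size <= budget

-- | Size of a file in bytes, using only base.
fileSize :: FilePath -> IO Integer
fileSize path = withBinaryFile path ReadMode hFileSize

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Expect exactly: FILE MAX_BYTES, where MAX_BYTES parses as an integer.
    [path, budgetStr] | [(budget, "")] <- reads budgetStr -> do
      size <- fileSize path
      putStrLn (path ++ ": " ++ show size ++ " bytes (budget " ++ show budget ++ ")")
      -- A non-zero exit code makes the CI job fail on a size regression.
      unless (withinBudget budget size) exitFailure
    _ -> do
      putStrLn "usage: size-check FILE MAX_BYTES"
      exitFailure
```

Run after the build step with the path to the executable and a budget in bytes; an oversized binary then shows up as a failed job instead of going unnoticed.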

2 Likes

Is it possible to link the GHC/MicroHs runtime dynamically but statically link the dependencies? This could save a lot of executable size while still keeping the binaries relatively shareable among different systems.

As of about a year ago GHC does include this: testsuite: Add mechanism to collect generic metrics (!11612) · Merge requests · Glasgow Haskell Compiler / GHC · GitLab

1 Like

Yes, it would be possible to link the MHS runtime dynamically. But I wanted to avoid the complexity of it. So you pay 200k overhead for every executable.

1 Like