Where can I learn more about the interactions of ghc, gcc, the FFI, and the linker?

We are in the process of splitting pandoc into smaller packages. E.g., it is now possible to compile pandoc without Lua support. Doing this, I noticed that the output becomes a lot smaller when excluding Lua, but only when linking dynamically. The effect on static binaries is almost negligible.

Effect of excluding Lua from pandoc (GHC 9.2.3 on Linux):

  • dynamic linking: -40%
  • static linking: -2%

Steps to reproduce (use --flag -lua in the call to cabal to toggle the inclusion of Lua)

git clone https://github.com/jgm/pandoc && cd pandoc
cabal build pandoc-cli --constraint='lua -export-dynamic'
find dist-newstyle -type f -name pandoc \
    -exec strip '{}' \; \
    -exec ls -l '{}' \;

# check the binary size, then compare with
cabal build pandoc-cli --constraint='lua +export-dynamic'
find dist-newstyle -type f -name pandoc \
    -exec strip '{}' \; \
    -exec ls -l '{}' \;

This is counter-intuitive to me, as I would have expected a statically linked binary to be bigger. Can someone help me shed some light on this? Or, better yet, could someone point me to some resources that’d allow me to learn more about the issue myself?


How large are the absolute file size differences?

Tried split sections?

https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html#ghc-flag--split-sections

Numbers on the size difference:

Lua   linking   pandoc size
yes   dynamic   190 MB
yes   static    113 MB
no    dynamic   111 MB
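
As a quick sanity check on the percentage quoted earlier, the relative change between two rows can be computed directly. This is a small sketch; `pct_change` is a helper name made up for this example, and the two arguments are the sizes in MB from the "dynamic" rows above.

```shell
#!/bin/sh
# Relative size change in percent, from a "before" and an "after" size.
# pct_change is an illustrative helper, not part of any tool mentioned here.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Lua included (190 MB) versus Lua excluded (111 MB), both "dynamic".
pct_change 190 111
```

That comes out at roughly -42%, in line with the "-40%" figure from the first post.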

This is my cabal.project.local for pandoc:

flags: +embed_data_files
tests: True
optimization: 1
test-show-details: direct
test-options: -j4
library-stripping: True
split-sections: True

The lua package’s export-dynamic does not control dynamic or static linking, but it controls whether the lua library symbols should be exported from the lua haskell library.

It makes sense that this will make the lua haskell library bigger: if you don’t export them, the linker can garbage collect unused functions and features.
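
The same mechanism can be seen outside of Haskell. The following is a minimal sketch, assuming a Linux system with a C compiler available as `cc` and GNU binutils; the file and function names are made up for illustration, and nothing here touches the pandoc build.

```shell
#!/bin/sh
# Sketch: an unused function survives linking once -Wl,-E is on the link line.
# main.c, unused_helper, small and big are illustrative names only.
set -eu

demo_export_dynamic() {
    dir=$(mktemp -d)
    cat > "$dir/main.c" <<'EOF'
int unused_helper(void) { return 42; }  /* never called */
int main(void) { return 0; }
EOF
    # Put every function in its own section so the linker can drop unused ones.
    cc -ffunction-sections -c "$dir/main.c" -o "$dir/main.o"
    # Same object file, linked without and with -Wl,-E (--export-dynamic).
    cc -Wl,--gc-sections        "$dir/main.o" -o "$dir/small"
    cc -Wl,--gc-sections -Wl,-E "$dir/main.o" -o "$dir/big"
    for exe in small big; do
        if nm "$dir/$exe" | grep -q ' unused_helper$'; then
            echo "$exe: unused_helper kept"
        else
            echo "$exe: unused_helper dropped"
        fi
    done
    rm -rf "$dir"
}

# Run only where the toolchain is present.
if command -v cc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1; then
    demo_export_dynamic
fi
```

With GNU ld one would expect the helper to be dropped from `small` and kept in `big`, since exported symbols count as "used" for section garbage collection. Other linkers may behave differently.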


(True to form, the pandoc maintainer produces a nicely formatted table :sweat_smile:)


Thanks for looking into this, what you say makes a lot of sense. The thing I still don’t understand is why a linker would add more than 70 MB for a 2 MB library. Bit excessive, isn’t it?

I did not fancy building pandoc, but I looked at a smaller reproducer: cabal init and then adding the lua package.

Small exe:

$ cabal build --constraint='lua -export-dynamic' --builddir=dist-small
...
Linking .../src/pandoc-lua-size/dist-small/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size ...

$ ls -lah dist-small/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size
-rwxr-xr-x 1 adam users 3.4M Oct 22 11:41 dist-small/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size

$ readelf --syms dist-small/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size | grep contains

Symbol table '.dynsym' contains 239 entries:
Symbol table '.symtab' contains 5213 entries:

Big exe:

$ cabal build --constraint='lua +export-dynamic' --builddir=dist-big
...
Linking .../src/pandoc-lua-size/dist-big/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size ...

$ ls -lah dist-big/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size
-rwxr-xr-x 1 adam users 14M Oct 22 11:41 dist-big/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size

$ readelf --syms dist-big/build/x86_64-linux/ghc-9.2.2/pandoc-lua-size-0.1.0.0/x/pandoc-lua-size/build/pandoc-lua-size/pandoc-lua-size | grep contains

Symbol table '.dynsym' contains 32641 entries:
Symbol table '.symtab' contains 36147 entries:

So we can see that the exe with +export-dynamic contains a lot more symbols and if we inspect the readelf output we see that it exports a lot of Haskell symbols.
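
To compare the two builds without reading the readelf output by eye, the counts can be pulled out with a small filter. This is only a sketch: `sym_count` is a made-up helper name, and it assumes the "Symbol table '...' contains N entries:" line format shown above.

```shell
#!/bin/sh
# Print the entry count of one symbol table, reading `readelf --syms`
# output on stdin. sym_count is an illustrative helper name.
sym_count() {
    awk -v want="'$1'" \
        '$1 == "Symbol" && $2 == "table" && $3 == want { print $5 }'
}

# Example with the header lines from the small exe above.
printf '%s\n' \
    "Symbol table '.dynsym' contains 239 entries:" \
    "Symbol table '.symtab' contains 5213 entries:" |
    sym_count .dynsym
```

Against a real binary the call would look like `readelf --syms path/to/exe | sym_count .dynsym`, where the path is whatever cabal printed in its "Linking" line.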

If we add --verbose to the cabal build, copy out the final ghc command and add -v -fforce-recomp to it we can see that lua's -Wl,-E is present in the final link line:

$ cabal build --constraint='lua +export-dynamic' --builddir=dist-big --verbose
...
Linking...
.../.ghcup/bin/ghc --make ...

$ .../.ghcup/bin/ghc --make ... -v -fforce-recomp
...
*** Linker:
gcc ... -Wl,-E ...
...

So we can make a good guess that the reason the lua +export-dynamic pandoc exe grows so much is that it exports (and therefore includes) a big chunk of all the Haskell dependencies that were not needed.


lua says this about the export-dynamic flag:

Add all symbols to dynamic symbol table; disabling this will make it possible to create fully static binaries, but renders loading of dynamic C libraries impossible.

I think this must mean loading dynamic C libraries that use the Lua runtime, and the intent is only to export the symbols from the Lua C library. (The pandoc issue “Fix linux binary build process”, #3986 in jgm/pandoc on GitHub, confirms that this is the case.)

I tried making the exports more specific by using the --dynamic-list linker feature, but I’m afraid I was not able to conjure up the right combination of cabal/ghc/gcc/ld flags.
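
For reference, this is roughly what the idea looks like at the plain C level, leaving cabal and ghc out of the picture. It is a sketch under assumptions: a Linux system with `cc` and GNU binutils, and made-up names (`demo.c`, `lua_demo_api`, `other_helper`, `lua.list`). The `lua_*` and `luaL_*` patterns follow the prefixes of the Lua C API. How to get such a flag through cabal and ghc onto the final link line is exactly the part that is still unsolved here.

```shell
#!/bin/sh
# Sketch: export only Lua-like symbols with --dynamic-list instead of -Wl,-E.
# demo.c, lua_demo_api, other_helper and lua.list are illustrative names only.
set -eu

demo_dynamic_list() {
    dir=$(mktemp -d)
    cat > "$dir/demo.c" <<'EOF'
int lua_demo_api(void) { return 1; }   /* stands in for the Lua C API */
int other_helper(void) { return 2; }   /* stands in for everything else */
int main(void) { return lua_demo_api() + other_helper(); }
EOF
    # Dynamic list: glob patterns naming the symbols to export.
    cat > "$dir/lua.list" <<'EOF'
{
  lua_*;
  luaL_*;
};
EOF
    cc "$dir/demo.c" -Wl,--dynamic-list="$dir/lua.list" -o "$dir/demo"
    # Show which of the two functions ended up in the dynamic symbol table.
    nm -D --defined-only "$dir/demo" | awk '{ print $NF }' |
        grep -E '^(lua_demo_api|other_helper)$' || true
    rm -rf "$dir"
}

# Run only where the toolchain is present.
if command -v cc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1; then
    demo_dynamic_list
fi
```

With GNU ld one would expect only `lua_demo_api` to be listed, meaning the rest of the program stays out of `.dynsym`.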