Embedding files in WASM

I’m writing an application in Haskell that compiles both to native code and to WASM, using GHC’s WASM backend. I would like to embed a directory of text files in the binary; is there a way to do that? I thought of using Template Haskell plus something like file-embed, but as far as I can tell TH doesn’t work on WASM. Is there any other way to do this, or can I get TH to work? (Using TH would actually solve some other issues too, but I’ve worked around the ones I’ve encountered so far, so it’s fine if it doesn’t work.)

You can encode the files (e.g. by string-escaping or base64-encoding them) and embed them as string literals. It should be pretty easy to write a script that does that for you. If you want it to happen automatically on every build, you can use a custom Setup.hs.

Surprisingly, I can’t easily find any packages that escape strings such that they are valid Haskell string literals. So, base64 encoding is probably your best bet if you want something off the shelf.

I thought show on a String already escapes it into a valid string literal?

You’re right, I forgot about that.
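A minimal sketch of such a script, written in Haskell and relying on show for the escaping. The script name GenEmbedded, the generated module name Embedded and the binding embeddedFiles are all hypothetical. It assumes the directory and filepath packages (both ship with GHC) and base >= 4.15 for the strict readFile', and it only picks up regular files directly inside the input directory (no recursion).

```haskell
import Control.Monad (filterM)
import Data.List (sort)
import System.Directory (doesFileExist, listDirectory)
import System.Environment (getArgs)
import System.FilePath ((</>))
import System.IO (readFile')

-- Render a Haskell module exposing the files as (name, contents) pairs.
-- show on a (String, String) tuple yields properly escaped string literals.
renderModule :: [(FilePath, String)] -> String
renderModule files = unlines (header ++ entries)
  where
    header =
      [ "module Embedded (embeddedFiles) where"
      , ""
      , "embeddedFiles :: [(FilePath, String)]"
      , "embeddedFiles ="
      ]
    entries
      | null files = ["  []"]
      | otherwise = zipWith entry [0 :: Int ..] files ++ ["  ]"]
    entry i file = (if i == 0 then "  [ " else "  , ") ++ show file

-- Usage: runghc GenEmbedded.hs <input-dir> <output-file>
-- Only regular files directly inside <input-dir> are embedded (no recursion).
main :: IO ()
main = do
  args <- getArgs
  case args of
    [dir, out] -> do
      names <- sort <$> listDirectory dir
      fileNames <- filterM (doesFileExist . (dir </>)) names
      -- readFile' reads strictly, so no file handles stay open
      contents <- mapM (readFile' . (dir </>)) fileNames
      writeFile out (renderModule (zip fileNames contents))
    _ -> ioError (userError "usage: GenEmbedded <input-dir> <output-file>")
```

The generated module is plain Haskell source with no TH, so it compiles the same way for the native and the WASM target; the script itself runs on the host, e.g. as a pre-build step.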

You can use wizer for this. Define something like

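import Control.DeepSeq (rnf)
import Control.Exception (evaluate)
import Data.ByteString (ByteString)
import qualified Data.FileEmbed
import System.IO.Unsafe (unsafePerformIO)

-- Read from /embed (mapped by wizer) the first time this is forced.
embeddedDir :: [(FilePath, ByteString)]
embeddedDir = unsafePerformIO $ Data.FileEmbed.getDir "/embed"
{-# NOINLINE embeddedDir #-}

foreign export ccall loadEmbeddedDir :: IO ()

-- Fully force embeddedDir so the file contents end up in the wizer snapshot.
loadEmbeddedDir :: IO ()
loadEmbeddedDir = evaluate (rnf embeddedDir)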

and add c-sources: init.c to your .cabal file, where init.c has content like

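#include "Rts.h"

// Generated by GHC for the foreign export in module Main.
#include "Main_stub.h"

// Run by wizer at build time; the memory state afterwards is snapshotted.
__attribute__((export_name("wizer.initialize"))) void __wizer_initialize(void) {
  hs_init(NULL, NULL);  // start the Haskell RTS
  loadEmbeddedDir();    // foreign-exported from Haskell: reads and forces embeddedDir
  hs_perform_gc();      // run GCs before the snapshot is taken
  hs_perform_gc();
  rts_clearMemory();    // clear unused RTS memory before the snapshot
}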

and then run

wizer --allow-wasi --wasm-bulk-memory true \
      $(wasm32-wasi-cabal list-bin my-exe) -o wized.wasm \
      --mapdir /embed::/path/to/dir/to/embed

The resulting wized.wasm can then use the top-level embeddedDir binding. The files were read once during pre-initialization, so the directory is not needed at run time.

See Glasgow Haskell Compiler / ghc-wasm-meta · GitLab for more details.
In particular, you can also do arbitrary precomputations on the embedded data, depending on your use case.
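As an illustration of such a precomputation, here is a minimal sketch that indexes the embedded files by path and counts their lines. The Index type, buildIndex and lookupFile are hypothetical names, and the per-file line count is just an example of work that can be moved to pre-initialization.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map

-- Hypothetical precomputed structure: file path -> (line count, contents).
type Index = Map.Map FilePath (Int, ByteString)

-- Build the index from the (path, contents) pairs returned by getDir.
buildIndex :: [(FilePath, ByteString)] -> Index
buildIndex files =
  Map.fromList [ (path, (BC.count '\n' body, body)) | (path, body) <- files ]

-- Look up a file's contents by its path relative to the embedded directory.
lookupFile :: FilePath -> Index -> Maybe ByteString
lookupFile path idx = snd <$> Map.lookup path idx
```

In the wizer setup, one would then add a top-level embeddedIndex = buildIndex embeddedDir (with a NOINLINE pragma) and force it in loadEmbeddedDir, e.g. with evaluate (rnf embeddedIndex), so the finished Map is part of the snapshot instead of being rebuilt at startup.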

We also use this approach in Ormolu Live in order to embed and parse the operator fixity database.


But indeed, TH would be more convenient; see #24376: Template Haskell not working with WASM backend · Issues · Glasgow Haskell Compiler / GHC · GitLab for the tracking ticket.


This seems promising. How would I extend that to native, or would I just use TH in targets where it’s supported?

Using TH where it’s supported is the way to go (e.g. via CPP and #ifdef wasm32_HOST_ARCH).
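A sketch of what that CPP switch could look like, assuming the definitions live in a separate module (hypothetically named Embedded) and the files sit in a directory called embed relative to the package root (also hypothetical). Because the foreign export moves out of Main, init.c would include "Embedded_stub.h" instead of "Main_stub.h". It also assumes that enabling TemplateHaskell without using any splice is harmless on the WASM target; if it is not, the extension could be enabled only for native builds via a cabal conditional.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE TemplateHaskell #-}
module Embedded (embeddedDir) where

import Data.ByteString (ByteString)

#if defined(wasm32_HOST_ARCH)
import Control.DeepSeq (rnf)
import Control.Exception (evaluate)
import Data.FileEmbed (getDir)
import System.IO.Unsafe (unsafePerformIO)

-- WASM: read from the directory that wizer maps to /embed at pre-init time.
embeddedDir :: [(FilePath, ByteString)]
embeddedDir = unsafePerformIO (getDir "/embed")
{-# NOINLINE embeddedDir #-}

foreign export ccall loadEmbeddedDir :: IO ()

loadEmbeddedDir :: IO ()
loadEmbeddedDir = evaluate (rnf embeddedDir)
#else
import Data.FileEmbed (embedDir)

-- Native: embed the directory at compile time with Template Haskell.
embeddedDir :: [(FilePath, ByteString)]
embeddedDir = $(embedDir "embed")
#endif
```

Code using the files just imports Embedded and sees the same embeddedDir on both targets; only the WASM build additionally needs the init.c and wizer steps.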

Hmm, this seems to work when running in the browser, but not in wasmtime; I get this error:

Error: failed to run main module `w.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0xdb309 - wasm-embed-test.wasm!StgRun
           1: 0xbc4d5 - wasm-embed-test.wasm!scheduleWaitThread
           2: 0xaf5a9 - wasm-embed-test.wasm!rts_evalLazyIO
           3: 0xb2027 - wasm-embed-test.wasm!hs_main
           4: 0x8dff - wasm-embed-test.wasm!main
           5: 0xfdcd5 - wasm-embed-test.wasm!__main_void
           6: 0x8695 - wasm-embed-test.wasm!_start
    2: wasm trap: undefined element: out of bounds table access

Here is all my code: gist

After some more investigation, it looks like it works when it’s used outside of main (with -no-hs-main) and run in the browser, but it doesn’t work when used in main, neither in wasmtime nor in the browser.

I think the problem here is that by default, the GHC WASM backend creates a WASM command module (see here for more context), which in particular has a _start function that initializes the Haskell RTS and runs main. However, the Haskell RTS was already initialized via wizer, so you get an error.

You can circumvent that by compiling as a reactor instead and exporting main as _start manually; see the latest revision here: Revisions · wasm wizer embed file test · GitHub

This worked, thanks!