GHC WebAssembly Weekly Update, 2023-03-08

Hi all, the GHC WebAssembly weekly update is back!

Starting this week, I’m trying to change the format a little bit. Instead of a single huge bullet-pointed list, the update now consists of subsections with paragraphs. I hope the new format is easier to read and provides better explanations of my work.

As usual, you’re more than welcome to ask questions should you find anything vague in these weekly updates! One challenge in technical writing is that I don’t know what readers don’t know, but I still hope to do a better job at this in the future.

Previous update:

A ghc-prim bugfix

I fixed a bug in ghc-prim that caused incorrect runtime results of casInt64Array# in unregisterised 32-bit builds (!10044). The bug was discovered by the testsuite case AtomicPrimops, which loops infinitely in the unregisterised build. The fix is annoyingly simple, though it took me a few days of digging into the offending module’s assembly code to pin down the exact scene of the crime.
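In case the primop is unfamiliar: casInt64Array# implements compare-and-swap, which compares an array element against an expected value, stores a new value if they match, and in every case returns the value it observed before the operation. Here is a tiny reference model of those semantics in Python (illustrative only; the real primop performs this as a single atomic operation):

```python
def cas(arr, i, expected, new):
    # Reference model of compare-and-swap semantics: compare arr[i]
    # against `expected`; on a match, store `new`; always return the
    # value observed before the operation. (The real primop does all
    # of this atomically.)
    old = arr[i]
    if old == expected:
        arr[i] = new
    return old

buf = [42]
r1 = cas(buf, 0, 42, 7)    # match: stores 7, returns the old value 42
r2 = cas(buf, 0, 42, 99)   # mismatch: leaves 7 in place, returns 7
```

The “incorrect runtime result” in the bug was exactly a deviation from these semantics in unregisterised 32-bit builds.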

Rest assured, I’ve officially run out of codegen bugs to fix for the time being. The remaining failures in the testsuite are trivial ones that are either expected to be broken or just need a bit of testsuite driver enhancement to handle the cross-compilation scenario.

The short-term goal is to get the testsuite actually running on CI. Once I reach this milestone by the end of Q1, I will proceed to implement more exciting user-facing features: the JavaScript FFI and Template Haskell support.

Testsuite driver debugging horror

I looked into a testsuite driver issue that blocks running the entire GHC testsuite against the wasm backend on CI (#22889). I thought I’d fixed it with some previous refactorings (!9919); it turns out I was too naive.

It’s never ever fun to debug a legacy Python codebase with multi-threading logic and random livelocks. I can assure you I’d rather debug ten codegen bugs in my backend than this one particular bug.

So far I’ve managed to use gdb to inspect a debug build of CPython, and found that the place where it gets stuck at runtime is Python’s builtin ThreadPoolExecutor.

My next plan is a deeper refactoring of the driver: get rid of the multi-threading logic completely and use asyncio/coroutines for concurrent execution of test cases.
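That plan, in minimal sketch form (the names and structure here are illustrative, not the actual driver code): a semaphore bounds how many test cases run at once, and asyncio.gather collects results in order, which is roughly what the ThreadPoolExecutor does today, minus the threads.

```python
import asyncio

async def run_all(testcases, capacity=4):
    # Bound concurrency with a semaphore instead of a thread pool.
    sem = asyncio.Semaphore(capacity)

    async def run_one(tc):
        async with sem:  # at most `capacity` cases in flight
            return await tc()

    # asyncio.gather preserves the order of its arguments.
    return await asyncio.gather(*(run_one(tc) for tc in testcases))

def make_case(i):
    # Hypothetical test case; a real one would spawn a test subprocess.
    async def tc():
        await asyncio.sleep(0)
        return i * i
    return tc

results = asyncio.run(run_all([make_case(i) for i in range(5)]))
```

With a single event loop scheduling coroutines, there are no OS threads left to livelock.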

Experiment with wasm simd128

The upcoming Safari 16.4 release comes with wasm simd128 support, after which all major browser engines plus wasmtime will support this wasm feature out of the box. So I experimented with enabling simd128 in our wasi-sdk sysroot and our default C/C++ compile flags.

Even though the wasm backend doesn’t support the GHC simd primops yet, some clang optimizations like loop unrolling can generate wasm simd128 opcodes, which helps with performance.

The result of my experiment: no testsuite regressions, it just works! So I’m very tempted to enable simd128 by default once Safari 16.4 is released.

One caveat: the bindists shipped in ghc-wasm-meta will produce wasm modules that are accepted by fewer wasm runtimes, since some runtimes still don’t have good simd128 support yet.

Dear all, if you use the GHC wasm backend and run the output wasm modules in environments other than one of these runtimes:

  • V8 (Chrome/Edge/deno/nodejs/cloudflare)
  • SpiderMonkey (Firefox)
  • JavaScriptCore (Safari/bun)
  • Wasmtime (fastly)
  • WasmEdge

then it would be very nice if you could leave a comment here mentioning your use case.

Misc other work

  • With help from @nomeata and his example code that runs a Haskell wasm module on fastly’s platform, I corrected the custom import code example. The example serves as a template for calling custom import functions from Haskell or C.
  • A patch to wasi-libc (#401) to slightly improve dlmalloc code size & performance by replacing C implementations of clz/ctz with builtin opcodes.
  • A PR to wizer (#69) that optionally enables wasm simd128 support.
  • I reported a V8 core dump (#13753) triggered by a wasm backend output module when wasm tail calls are enabled. It turned out to be a nodejs embedding issue (#46777), seemingly a misuse of V8 fast calls for uvwasi.
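On the wasi-libc patch above: clz/ctz count the leading/trailing zero bits of a word, and wasm has dedicated i32.clz/i32.ctz instructions, so the C loop implementations can be replaced outright. A quick Python reference model of the two operations:

```python
def clz32(x):
    # Count leading zero bits of a 32-bit value; by convention
    # (matching wasm's i32.clz) clz32(0) == 32.
    assert 0 <= x < 1 << 32
    for i in range(31, -1, -1):
        if x >> i & 1:
            return 31 - i
    return 32

def ctz32(x):
    # Count trailing zero bits; ctz32(0) == 32, matching i32.ctz.
    assert 0 <= x < 1 << 32
    for i in range(32):
        if x >> i & 1:
            return i
    return 32
```

In C these correspond to __builtin_clz/__builtin_ctz, which clang can lower directly to the wasm opcodes, hence the code size and performance win.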