I want so badly to advertise the fancy solutions, but if I were starting a company with my own money, my honest answer would be to use the battle-tested hslua
library for the scripting layer in a Haskell app. Now that’s boring and off-topic, so here are a couple of fancy solutions for serializing Haskell functions to disk:
Static pointers
Static pointers are a mechanism for assigning unique fingerprints to Haskell functions. These fingerprints are 128-bit hashes that can be serialized, hence the name “static pointer”: no code is actually serialized or dynamically loaded.
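Here’s a minimal sketch of the workflow (the function names are mine; the API is GHC.StaticPtr from base):

```haskell
{-# LANGUAGE StaticPointers #-}

import GHC.StaticPtr

-- A top-level function with no free variables; eligible for `static`.
double :: Int -> Int
double x = x * 2

doublePtr :: StaticPtr (Int -> Int)
doublePtr = static double

main :: IO ()
main = do
  -- The 128-bit fingerprint is the only thing you'd serialize.
  print (staticKey doublePtr)
  -- Dereferencing recovers the function on the receiving end.
  print (deRefStaticPtr doublePtr 21)
```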
Caveats:
- It does not address dynamic code loading at all. Say you have deployed your backend and want to load a user-written Haskell script: you need to insert the user’s script into your project, recompile the monolith, redeploy, and restart.
- You can only assign static pointers to functions without free variables, not to arbitrary function closures at runtime. So you need special combinators that compose existing static pointers to emulate serializable “function closures” (see the sketch after this list). It feels awkward tbh.
- You can only assign static pointers to monomorphic functions without constraints, so working with type classes is another pain point.
- The fingerprint logic is very ad hoc, and you should assume that once you recompile your project, any fingerprints you’ve previously serialized have likely been invalidated.
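To illustrate the closure-emulation point above: the combinators boil down to pairing a static function with serializable arguments. A hand-rolled sketch (this Closure type is mine; in practice you’d reach for something like the distributed-closure package):

```haskell
{-# LANGUAGE StaticPointers #-}

import GHC.StaticPtr

-- A poor man's serializable closure: a static function paired with a
-- plain argument. Only the fingerprint and the argument would cross
-- the wire; a real version also needs a Binary constraint on `a` and
-- unsafeLookupStaticPtr on the receiving end.
data Closure a b = Closure (StaticPtr (a -> b)) a

runClosure :: Closure a b -> b
runClosure (Closure f x) = deRefStaticPtr f x

addN :: Int -> Int -> Int
addN n x = n + x

-- "Capturing" 5 means pairing the static pointer with the value 5;
-- no free variable is captured at runtime.
add5 :: Closure Int (Int -> Int)
add5 = Closure (static addN) 5

main :: IO ()
main = print (runClosure add5 10) -- 15
```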
Using ghci linker
There’s a well-known post on Simon Marlow’s blog that explains the idea. It works for your use case as long as you’re willing to dig into GHC internals, but you’re also completely on your own to guarantee ABI stability. Even if the user-pluggable function has a very primitive type like ByteString -> IO ByteString, as long as that function refers to anything provided by your backend, it’s prone to ABI breakage whenever you recompile your backend.
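For reference, the core of the trick looks roughly like this, using the GHCi.ObjLink API from the ghci package; symbol-name mangling (z-encoding plus a _closure suffix) and proper error handling are glossed over:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (addrToAny#)
import GHC.Ptr (Ptr (..))
import GHCi.ObjLink

-- Load a compiled object file and cast one of its exported closures to
-- the type we expect. Nothing checks that the type (or the ABI) actually
-- matches; get it wrong and you segfault.
loadPlugin :: FilePath -> String -> IO a
loadPlugin objPath symbolName = do
  initObjLinker RetainCAFs
  loadObj objPath
  resolved <- resolveObjs
  if not resolved
    then error "relocation failed"
    else do
      mptr <- lookupSymbol symbolName
      case mptr of
        Nothing -> error ("symbol not found: " ++ symbolName)
        Just (Ptr addr) -> case addrToAny# addr of
          (# hval #) -> pure hval
```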
Using ghci interpreter
This is much more heavyweight than the above: your backend now depends on the entirety of the GHC API as well as the global package database. The backend sets up a GHC API session, consumes the user-written Haskell script as a string, then loads that string using the same evaluation mechanism as ghci.
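If you want to prototype this quickly, the hint library wraps all of that session plumbing; a minimal sketch, assuming the user script is a plain expression of a known monomorphic type:

```haskell
import Language.Haskell.Interpreter

-- Evaluate a user-supplied expression at a known monomorphic type,
-- using the same in-process evaluation mechanism as ghci.
evalUserFn :: String -> IO (Either InterpreterError (Int -> Int))
evalUserFn src = runInterpreter $ do
  setImports ["Prelude"]
  interpret src (as :: Int -> Int)

main :: IO ()
main = do
  result <- evalUserFn "\\x -> x * x + 1"
  either print (\f -> print (f 6)) result -- prints 37
```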
A slow but steady memory leak awaits, thanks to CAF retention, the FastString table, etc. If your backend can tolerate periodic restarts, though, this is the quickest way to get going.
One common caveat of using the ghci linker or stuffing a whole ghci interpreter into your backend is the question of how far you trust the user’s script. The script, whether in compiled shared object form or ghci bytecode form, lives in the same address space as your entire backend, and there’s a ton of dirty business a malicious user Haskell script can do. Even with Safe Haskell and a very carefully restricted prelude, it’s trivial to write a script that allocates a ton of memory as a naive DoS attack.
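To make that last point concrete, here’s a pure expression that passes any Safe Haskell / restricted-prelude gate yet happily eats memory:

```haskell
-- Pure, Safe-Haskell-friendly, and a naive memory bomb: demanding both
-- sum and length keeps the entire billion-element list live at once.
bomb :: (Integer, Int)
bomb = let xs = [1 .. 10 ^ 9 :: Integer] in (sum xs, length xs)
```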
Using ghc wasm backend
Use the ghc wasm backend to compile user Haskell scripts to wasm modules, then run those modules inside your Haskell backend. RPC between the wasm side and the host side can be achieved by specifying a set of operations the wasm side may perform on the host side (a guest-side sketch follows at the end of this section). Since this is a shameless self-plug, allow me to at least highlight the pros before getting to the cons:
- Proper sandboxing: the wasm module may only perform side effects through capabilities granted by the host. It’s also trivial to enforce resource constraints; both memory consumption and execution time can be limited, so you can consume arbitrary user input with more confidence.
- Once the wasm/host interfaces are properly defined, the same wasm modules can be deployed to a fleet of different host runners across different hardware and operating systems. The ABI stability issue of the other solutions doesn’t exist here.
- The wasm module’s execution state can be snapshotted and restored far more easily than a native binary’s.
The main caveats that come to mind:
- It’s still early days and the backend hasn’t seen enough seed users yet. The bus factor is essentially 1 at the moment, and Template Haskell support is still missing.
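To give a feel for the guest side, here’s a minimal sketch of a user script built as a wasm “reactor” module with the GHC wasm backend. The module name, export name, and the Int-based interface are all placeholders I made up; a realistic interface would shuttle bytes through wasm linear memory, and the host would grant capabilities by providing imports:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

module UserScript where

-- The guest's sole entry point. The host instantiates the module and
-- may call this export; no other side effect is reachable unless the
-- host explicitly wires it in as an import.
userMain :: Int -> Int
userMain x = x * 2 + 1

foreign export ccall "user_main" userMain :: Int -> Int
```

Something like wasm32-wasi-ghc UserScript.hs -optl-mexec-model=reactor -optl-Wl,--export=user_main should produce a module any wasm engine can instantiate; check the ghc-wasm-meta docs for the exact flags.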