Serializing Haskell functions to disk

Every data type you define gets a Typeable instance, and I believe all built-in types do too, but existentially or universally quantified types do not.

3 Likes

You could fork hell, which is 1.2k lines of code in a single file and gets you a mini-Haskell for scripting. It’s pretty easy to configure which types and which primitives are supported, both monomorphic and polymorphic. It has some basic type classes like Eq, Ord, Show and Monad, but you can’t write your own within the object language.

I kept it intentionally one file to artificially limit the size of the implementation, and to make it easy for someone to fork and re-use for another purpose.

You could generate an untyped AST in haskell-src-exts format, which is easy to serialize to/from disk. [The subset of] Haskell syntax is stable over time, unlike some binary format, which is likely to lead you into trouble. GHC’s API and general infrastructure is also fast-moving and wobbly, so I would never use it in a deployed app. If you’re always planning on generating code that has type annotations, then you probably don’t need type inference and could cut out that whole block of code, bringing it down to ~500 lines.
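To make the AST-on-disk idea concrete, here’s a minimal sketch assuming the haskell-src-exts package: parse an expression into an untyped AST, then pretty-print it back to source text, which is the stable on-disk representation being suggested.

```haskell
import Language.Haskell.Exts (fromParseResult, parseExp, prettyPrint)

-- Parse an expression into haskell-src-exts' untyped AST, then render
-- it back to source text; the printed form is what you'd store on disk.
main :: IO ()
main = do
  let ast = fromParseResult (parseExp "\\x -> x + 1")
  writeFile "expr.hs" (prettyPrint ast)
  roundtripped <- readFile "expr.hs"
  putStrLn roundtripped
```

Reading it back is just `parseExp` again, so the serialization format is ordinary Haskell source rather than a version-fragile binary encoding.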

Performance-wise, a basic fib implementation slightly outperforms GHCi, which isn’t a brag or a rigorous benchmark, but it does indicate that its performance isn’t bad. That makes sense: the evaluator would fit on a napkin and doesn’t really do anything.

One small note: if performance or distribution becomes a problem, you can generate Haskell code from a Hell AST and compile it with GHC (via haskell-src-meta, which parses source to template-haskell abstract syntax), since Hell doesn’t “add” any features that Haskell doesn’t already have, and you could output WASM down the line if needed. That adds the burden of depending on the GHC toolchain, but it’s a path forward nonetheless.

3 Likes

More to the point, do things like Vector a → a have a Typeable instance? I suspect we’re talking about trainable models in the machine-learning sense here.

That’s a great practical example, and it shows what I mean: the function and the vector are Typeable, but the polymorphic a is slightly problematic. Instead you’d have to use Typeable a => Vector a -> a.
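Here’s a small sketch of that point, using a plain list instead of Vector to avoid the dependency: with a Typeable constraint, the “polymorphic” element type can be reified to a runtime TypeRep, which is what any serializer would need to inspect.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Typeable (Proxy (..), Typeable, typeRep)

-- The Typeable constraint lets us recover the element type at runtime,
-- even though 'a' is polymorphic at the definition site.
elemType :: forall a. Typeable a => [a] -> String
elemType _ = show (typeRep (Proxy :: Proxy a))

main :: IO ()
main = do
  putStrLn (elemType ([1, 2, 3] :: [Int]))       -- prints "Int"
  print (typeRep (Proxy :: Proxy (Int -> Bool))) -- monomorphic function types work too
```

Without the constraint, `elemType` simply doesn’t typecheck: there is no dictionary carrying the runtime representation of `a`.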

1 Like

Based on the description of the problem, I would personally look into defunctionalizing your Strategy. I.e. instead of serializing the function, serialize whatever data you used to construct that function. After all, you’re not reading Haskell code from an input field in the UI or from a database; serialize the thing you actually read, or maybe an intermediate stage between the raw input and the eventual Strategy.

Independent of the language used, serializing functions is very messy conceptually (except maybe for languages like C where functions can’t have closures).

I think of defunctionalization as sort of an incremental way to implement a DSL. Initially there’s only one Strategy, so your DSL’s grammar is isomorphic to (), i.e. that one Strategy is the only thing that’s expressible:

theOnlyStrategy :: UTCTime -> Bid
theOnlyStrategy = the . winning . formula

data DSLv1 = TheStrategy

toStrategy :: DSLv1 -> Strategy
toStrategy TheStrategy = theOnlyStrategy

Say you add 2 more Strategys in the next release, one parameterized over an integer and another over a boolean. Your DSL is then represented as S1 | S2 Integer | S3 Bool, you get a simple migration from the first version of the DSL, and you write a new toStrategy from DSLv2:

migrate :: DSLv1 -> DSLv2
migrate TheStrategy = S1

toStrategy :: DSLv2 -> Strategy
toStrategy S1 = theUsedToBeOnlyStrategy
toStrategy (S2 i) = ...
toStrategy (S3 b) = ...

This would give you the following benefits:

  • You have complete introspection into serialized strategies, you can display them, design domain-specific UIs for modifying them, statically analyze them and even translate them to an SMT solver and prove properties about them.
  • New releases can improve existing strategies or fix bugs in them.
  • It’s just simple Haskell, no need to worry about unspeakable horrors involved in serializing code + runtime closures.
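A nice consequence of that last point: because the defunctionalized DSL is plain data, persistence is trivial. A sketch, reusing the DSLv2 type from above — the stock Show/Read instances already give you a working on-disk format (swap in binary or aeson for anything performance- or schema-sensitive):

```haskell
data DSLv2 = S1 | S2 Integer | S3 Bool
  deriving (Show, Read, Eq)

-- Plain data serializes for free via the derived Show/Read instances;
-- no closures or runtime environments are involved.
saveStrategy :: FilePath -> DSLv2 -> IO ()
saveStrategy path = writeFile path . show

loadStrategy :: FilePath -> IO DSLv2
loadStrategy path = fmap read (readFile path)
```

The round-trip law `read (show x) == x` holds for any such plain data type, which is exactly the property you can’t get for a function value with a captured environment.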
9 Likes

That seems to be the best approach for now. Thank you!

1 Like

Also from an ML angle: if the strategy belongs to a parametric family of functions (e.g. it’s a polynomial of a fixed degree), you only need to serialize the coefficients, so the closure only needs to exist at runtime (converting the UTCTime to seconds before doing arithmetic on it):

eval :: DSLv3 -> (UTCTime -> Double)
eval (S3 i1 i2 i3) utc = i3 + i2 * t + i1 * t ** 2
  where t = realToFrac (utcTimeToPOSIXSeconds utc) -- from Data.Time.Clock.POSIX
3 Likes

Potentially of interest:

1 Like

At what point do you figure you’re implementing a Lisp in Haskell?
Might as well use Common Lisp with Coalton.

Bartosz has this art-school-teacher vibe: “lambdas are closures, mmmkay?”

1 Like

I think that’s the C++ programmer in Bartosz speaking. When he says “lambdas are not named functions”, he means named functions in the C++ sense, where they can only be top-level. And when he says lambdas are closures, he means it in the sense that C++ lambdas get turned into a closure object if they have a non-empty capture clause. So I think that bit is him talking to his inner C++ programmer: he can’t just pass around function pointers (which are effectively StaticPtr (a -> b) in Haskell land).
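The StaticPtr analogy can be sketched with GHC’s StaticPointers extension. Note that `static` only accepts closed expressions — no captured environment — which is precisely the “plain function pointer” restriction being described:

```haskell
{-# LANGUAGE StaticPointers #-}

import GHC.StaticPtr (StaticPtr, deRefStaticPtr, staticKey)

-- 'static' rejects expressions with free local variables, so no runtime
-- environment is captured; the pointer is a stable key into the binary.
incPtr :: StaticPtr (Int -> Int)
incPtr = static (\x -> x + 1)

main :: IO ()
main = do
  print (deRefStaticPtr incPtr 41) -- dereference and apply: prints 42
  print (staticKey incPtr)         -- the serializable fingerprint
```

That `staticKey` fingerprint is what distributed frameworks ship over the wire instead of the function itself, sidestepping closure serialization entirely.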