I am working on a project that compiles an embedded expression DSL into a circuit-like graph structure. Expressions are evaluated by a feed-forward pass over the circuit, so if you want to run one on multiple sets of inputs, ideally you would compile the DSL expression only once.
I have binary codecs for the circuit types, so building an evaluation executable is easy enough: run the compiler as a one-time setup step, write the output circuit to a file, and have the evaluator deserialize that file. You can even use Template Haskell (TH) to bake the file into the executable.
However, when I tried compile-time evaluation via TH `lift` (which seems more natural to me, as it doesn’t deserialize on every evaluation), I noticed that compilation was extremely slow and consumed an ungodly amount of memory. I’m not savvy enough to profile this static evaluation, but I did spend a lot of time with the profiler when developing the compiler. I can definitely tell that whatever GHC does to statically evaluate the compile expression is far more intensive than running the compile phase separately, to the point of consuming all the memory on my machine for large circuits that compile easily using the other method.
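For concreteness, here is a minimal sketch of the pattern I mean, not my actual code. The `circuit` value and its `[(Int, [Int])]` shape are hypothetical stand-ins for the real circuit type and the compiler output. The list is computed while GHC compiles the module, and `lift` turns the result into an expression that is spliced into the program, so GHC still has to typecheck and compile that generated expression like any other source code.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH.Syntax (lift)

-- Hypothetical stand-in for a compiled circuit:
-- each entry is (gate id, ids of the wires feeding that gate).
-- The list comprehension runs at compile time inside the splice;
-- `lift` converts the resulting value into an expression that
-- rebuilds it, and that expression is spliced in here.
circuit :: [(Int, [Int])]
circuit = $(lift [ (i, [i - 1, i - 2]) | i <- [2 .. 6 :: Int] ])

main :: IO ()
main = print circuit
```

Only Prelude functions and built-in types are used inside the splice, so this fits in a single module. With a user-defined circuit type and compiler function, the stage restriction means they would need to live in a separate module that is imported here, and the type would need a `Lift` instance.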
I don’t know a lot about TH and even less about what the static evaluation is doing, and it’s possible that I’m just doing something wrong. Does anyone know if there are caveats or “gotchas” that come with `lift`? Is there a guide to avoiding them? Does anyone have any idea what profiling techniques I can use in this instance to figure out what GHC is doing?