I’m looking for ways to translate modern Haskell code into relatively modern, portable C code.
I previously experimented with the old -fvia-C option from unregisterised GHC builds, but it no longer seems to work with modern versions of GHC.
I also tried a different pipeline: compiling Haskell to LLVM IR using GHC, and then using the LLVM C backend (LLVM CBE) to generate C code. However, the LLVM IR produced by GHC follows Haskell’s runtime and evaluation conventions, which do not map cleanly to the assumptions made by LLVM CBE when generating C.
As a result, I have not been able to obtain a working Haskell → C translation pipeline using this approach.
Does anyone know of any viable approaches or experimental tools that could help achieve Haskell → C translation in a reasonably portable way?
Any pointers or suggestions would be greatly appreciated.
GHC’s “unregisterised” C backend should still work. However, you don’t pass the -fvia-C option to GHC; GHC itself needs to be built for the C backend. This means you need to build GHC yourself.
The -fvia-C option was used for the “registerised” C backend that existed in GHC <= 7.0.
If compatibility with GHC is not important, you might want to try MicroHs, which bootstraps with a C compiler according to its documentation.
Does it specifically need to be compiled to C, or would it be workable for it to be callable from C instead? The FFI can be used to make Haskell functions available to C, not only the other way around.
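As a rough illustration of the FFI route mentioned above, here is a minimal sketch of exporting a Haskell function so that C code can call it. The function `sumTo` and the exported symbol name `sum_to_c` are hypothetical names chosen for this example. The `main` is only there so the file also runs on its own when compiled with ghc.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CInt)

-- Hypothetical example function: sum of the integers 1..n.
sumTo :: CInt -> CInt
sumTo n = sum [1 .. n]

-- Make sumTo visible to C under the (assumed) symbol name sum_to_c.
foreign export ccall "sum_to_c" sumTo :: CInt -> CInt

-- Exercise the function from Haskell so the file is runnable by itself.
main :: IO ()
main = print (sumTo 10)
```

When a module containing a foreign export is compiled, GHC also generates a `_stub.h` header declaring the exported symbol. A C caller has to initialise the Haskell runtime with hs_init before the first call and shut it down with hs_exit afterwards. Note that this only makes the Haskell code callable from C; it still links against the GHC runtime and does not produce C source for the program.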
Thanks a lot for all the detailed answers — I really appreciate the insights.
To clarify my use case a bit more: my goal is not portability per se, but rather to use C as an intermediate representation step in a toolchain. In particular, I am interested in leveraging existing C-based obfuscation tooling (e.g. Tigress) as part of a pipeline for transforming Haskell programs. After that, I would like to further lower the result to LLVM IR and analyze it, especially to reconstruct data-dependency graphs using LLVM passes.
I did try MicroHs as suggested, but the generated C code appears to represent the Haskell runtime rather than the original program structure. It essentially produces a low-level, runtime-like representation (a large table), which makes it difficult to recover meaningful data-dependency information at the LLVM level.
Regarding NASA’s Copilot, I understand it can compile a subset of Haskell-like stream programs to C99, but it seems to target a very specific domain (embedded, real-time stream processing), so it would not be applicable to general-purpose modern Haskell code.
Thanks again for all the suggestions — they were very helpful in clarifying the landscape of existing approaches.
What exactly do you need from the obfuscation tooling?
The only Haskell decompiler I know of only works if you have the debug symbols for the program. Also, the generated assembly is extremely hard to follow due to all the inlining and laziness.
If the only concern is reverse engineering, all you need to do to prevent it for at least a decade is ensure that no debug symbols end up in the final binary.
A good Haskell-to-C compiler won't output C code that reflects the structure of the Haskell program. Haskell needs extensive optimization to perform well on traditional CPUs (CPUs not optimized for lazy functional languages) or when compiled to a language such as C (even if your CPU were optimized for Haskell, you could not take advantage of that from C).
To clarify my goal: I am not using obfuscation for security or reverse engineering purposes. Instead, I use it as a way to generate multiple structurally different program variants that still implement the same original algorithm.
Starting from Haskell, the idea is to obtain different semantically equivalent implementations, and then generate different data-dependency graphs from these variants after lowering through LLVM IR.
The objective is simply to increase the diversity of data-dependency graphs corresponding to the same algorithm, not to analyze obfuscation or reverse engineering.
I do understand that going through C introduces a lot of complexity and may heavily distort the structure, especially given how Haskell is compiled.
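To make the notion of "structurally different but semantically equivalent variants" concrete, here is a small sketch written directly in Haskell. The dot-product functions are hypothetical examples, not taken from any real project. Keep in mind that GHC's optimizer may normalize such variants into very similar code, so source-level diversity does not guarantee diverse dependency graphs further down the pipeline.

```haskell
module Main where

import Data.List (foldl')

-- Variant A: direct structural recursion; the addition happens after the recursive call.
dotRec :: [Int] -> [Int] -> Int
dotRec (x:xs) (y:ys) = x * y + dotRec xs ys
dotRec _ _ = 0

-- Variant B: strict left fold over the zipped pairs.
dotFold :: [Int] -> [Int] -> Int
dotFold xs ys = foldl' (\acc (x, y) -> acc + x * y) 0 (zip xs ys)

-- Variant C: explicit accumulator loop.
dotLoop :: [Int] -> [Int] -> Int
dotLoop = go 0
  where
    go acc (x:xs) (y:ys) = go (acc + x * y) xs ys
    go acc _ _ = acc

-- All three variants should agree on every input.
main :: IO ()
main = do
  let xs = [1, 2, 3]
      ys = [4, 5, 6]
  print (dotRec xs ys, dotFold xs ys, dotLoop xs ys)
```

All three compute the same result, but the order in which partial sums are combined differs between the recursive variant and the two accumulating ones.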
GHC compiles through several intermediate representations:
- Haskell AST: what is left after parsing is complete. Type checking happens here.
- Core: simpler than the AST but still typed. Many Haskell-specific optimizations are performed here. Additional type checking happens here too, but I don't remember how necessary it is. I have heard that it is performed to catch incorrectly coded optimizations, or a faulty transformation from the AST, that would otherwise result in incorrect code.
- STG: an untyped representation originally designed to represent code in a way that enables efficient compilation of non-strict code for CPUs not designed for that paradigm.
- Cmm: a traditional low-level IR. Strictness and laziness are no longer explicit, just as in any common low-level IR such as LLVM.
I remember other Haskell compilers being similar. Idris2 (a dependently typed language descended from Haskell) also has a compiler with similar IRs.
It seems you would want to perform those transformations in Core or STG, if using GHC.
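For anyone who wants to look at these intermediate representations before deciding where to transform, here is a minimal sketch of a module together with the GHC dump flags that print Core, STG and Cmm. The function `sumSquares` is a hypothetical example. The STG flag shown is the one used by recent GHC versions, and older releases spell it differently (for example -ddump-stg), so treat the exact command line as an assumption to check against your compiler's user guide.

```haskell
-- Hypothetical module for inspecting GHC's intermediate representations.
-- Compile with, for example:
--   ghc -O1 -ddump-simpl -ddump-stg-final -ddump-cmm -ddump-to-file -dsuppress-all Main.hs
-- -ddump-simpl prints optimized Core, -ddump-stg-final prints STG,
-- -ddump-cmm prints Cmm, and -ddump-to-file writes each dump to a file
-- next to the source instead of to standard output.
module Main where

-- Sum of the squares 1^2 + 2^2 + ... + n^2, written as an accumulator loop.
sumSquares :: Int -> Int
sumSquares n = go 0 1
  where
    go acc i
      | i > n = acc
      | otherwise = go (acc + i * i) (i + 1)

main :: IO ()
main = print (sumSquares 10)
```

Comparing the Core and STG dumps for a small function like this gives a feel for how much of the source structure survives at each stage.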