I was reading about the Koka language (for its effect system specifically) and came across the fact that it does not have a traditional garbage collector. Rather, the memory management of Koka programs is done automatically through what is called Perceus.
Is there a fundamental reason this is impossible/undesirable to implement in GHC, or is this just a matter of putting engineering effort into it? Sadly, I don’t have access to the publication, which may already explain why/why not.
My understanding is that generational garbage collectors are very good at processing a bunch of “garbage” with minimal effort. Instead of deallocating every single thing by hand (one free per object), you move the objects you keep (usually far fewer than the ones you throw away) and re-assign the previous zone to be overwritten when allocating newer, fresher objects. Suddenly you’ve cut the number of calls to free and it’s no longer a bottleneck.
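As a toy sketch of the copying idea (not GHC’s actual collector — the heap representation and all names here are invented for illustration), the survivors are computed from the roots and copied out wholesale, and the old space is reclaimed in a single step rather than freed object by object:

```haskell
import qualified Data.Set as Set

type Addr = Int

-- a heap object: some payload plus the addresses it points to
data Obj = Obj { payload :: String, refs :: [Addr] }

-- addresses transitively reachable from the roots
reachable :: [(Addr, Obj)] -> [Addr] -> Set.Set Addr
reachable heap = go Set.empty
  where
    go seen [] = seen
    go seen (a : rest)
      | a `Set.member` seen = go seen rest
      | otherwise =
          go (Set.insert a seen) (maybe [] refs (lookup a heap) ++ rest)

-- "collect" by copying only the live objects into a fresh to-space;
-- the entire from-space is then recycled for new allocation, so no
-- per-object free ever happens
evacuate :: [(Addr, Obj)] -> [Addr] -> [(Addr, Obj)]
evacuate heap roots = [ (a, o) | (a, o) <- heap, a `Set.member` live ]
  where live = reachable heap roots
```

The cost of a collection is proportional to the live data, not to the garbage — which is exactly why generational collectors thrive when most objects die young.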
Reference counting as implemented in Koka or Lean requires a heap that is acyclic, or at least a heap that can be partitioned into a DAG of compact regions, each of which is collected as a unit. Lazy evaluation, and more generally mutability, is fundamentally at odds with that, because it introduces cycles into the heap.
If you want Lean’s RC, you have to give up on thunk update. E.g.,
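(The elided snippet here is presumably the classic knot-tied stream, which relies on thunk update to become a cyclic, constant-space structure:)

```haskell
ones :: [Int]
ones = 1 : ones
-- after the CAF is forced, its thunk is updated in place so the tail
-- points back at the cons cell itself: a one-cell cycle in the heap
```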
will no longer operate in constant space, but rather needs to allocate each cons cell anew.
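Concretely, without thunk update the stream behaves as if it were produced by a generator that rebuilds its cell on every demand (a sketch; onesGen is my hypothetical name):

```haskell
-- no thunk update means no sharing: consuming n elements of this
-- stream allocates n fresh cons cells instead of revisiting one
onesGen :: () -> [Int]
onesGen () = 1 : onesGen ()
```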
Of course, in this case you could simply allocate ones into a compact region and maintain the DAG-of-regions invariant. But in general it is difficult to predict statically how big a region must become to eliminate all cycles. You could declare the whole heap one big region, but then you can only free it at the end of the program, i.e., you lose precision/space safety.
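GHC already ships something along these lines: compact regions, exposed through the ghc-compact package. A minimal use (assuming the GHC.Compact API as shipped with recent GHCs):

```haskell
import GHC.Compact (compact, getCompact)

main :: IO ()
main = do
  -- copy a structure into its own region, evaluating it as it goes;
  -- the region has no outgoing pointers and is freed as one unit when
  -- the Compact handle becomes unreachable
  region <- compact (map (* 2) [1 .. 100 :: Int])
  print (sum (getCompact region))
```

Note that compact regions come with exactly the restrictions discussed above: the data must be fully evaluated and may not contain functions, so they sidestep rather than solve the thunk-update problem.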
Is it worth giving up on laziness and unrestricted mutability to get a simple and efficient GC algorithm? Perhaps it is! Time will tell, and we will surely have a good comparison in a few years’ time.
Aside from the issues with laziness and mutation, it’s not clear to me that Haskell would benefit much from a radical change in memory-management technology. Much of the value of Haskell comes from the existing ecosystem of code, and that code has been written assuming the performance tradeoffs of a traditional tracing GC.
Part of using Lean 4 effectively is ensuring that data structures are unique to allow mutation instead of copying, and that has a pervasive effect on the style of performance-sensitive code, which is different from the way one would write performance-sensitive Haskell.
I’m not longing for a different memory management mechanism in Haskell, and I certainly am not suggesting that real engineering efforts should be made in this direction.
This is more of a question about the differences between Koka and Haskell that allow Koka to do without a tracing garbage collector while still sharing a lot of DNA with Haskell.
FYI: Elton Pinto and Daan Leijen (one of the authors of the Perceus algorithm) will present a paper on “Exploring Perceus For OCaml” at the ML workshop this year. There is also a master’s thesis on this topic by Elton Pinto – you might find some answers there.