Hécate's Crackpot Ideas: A compilation server for GHC

With the recent focus on improving parallel compilation at the module level, with semaphores (and the problems of having cabal-install and GHC having shared custody for semaphore management across multiple libc implementations), I have been broadening my horizons on what a compiler should look like.

Moreover, speaking with Alan Zimmerman during the Haskell Ecosystem Workshop (co-located with ZuriHac 2024) pointed me in the direction of C#'s compiler, Roslyn, and the paradigm of query-based compilers. Chatting with Moritz Angermann and Andrea Bedini was also very eye-opening!

The idea would be the following: an alternative driver for the compilation pipeline would be implemented to act as a daemon that takes requests (much like a normal server), with queries such as:

  • Compile this module to native code
  • Fetch the type of this declaration
  • Completions for members of a record
  • Docstring for an identifier
  • etc.

And the command-line tool ghc would be a client for this daemon.
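As a sketch of what the daemon's request vocabulary could look like, the queries above might be modelled as a plain sum type shared by the ghc client and the daemon. All names here are illustrative, not an existing GHC API:

```haskell
-- Hypothetical request type for the daemon protocol; a design sketch,
-- not anything that exists in GHC today.
data Request
  = CompileModule FilePath       -- compile this module to native code
  | TypeOf String                -- fetch the type of a declaration
  | RecordCompletions String     -- completions for members of a record
  | DocFor String                -- docstring for an identifier
  deriving (Show, Eq)

-- A trivial wire rendering, standing in for a real serialisation format.
renderRequest :: Request -> String
renderRequest (CompileModule fp)    = "compile " ++ fp
renderRequest (TypeOf ident)        = "type-of " ++ ident
renderRequest (RecordCompletions r) = "complete " ++ r
renderRequest (DocFor ident)        = "doc " ++ ident

main :: IO ()
main = putStrLn (renderRequest (CompileModule "Main.hs"))
```

The point is that batch compilation becomes just one request among many, next to the IDE-flavoured ones.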
One big advantage would be that ghc-daemon would be in charge of scheduling the building of modules across the capabilities of a machine. An interesting design question would be “How many daemons should live on your system?”. For now I envision one daemon per user, so that we avoid the problems of having a daemon that can read and write everywhere on the filesystem. :slight_smile:
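To make the scheduling point concrete, here is a minimal, base-only sketch of how a daemon could bound concurrent module builds to the machine's capabilities with an in-process semaphore, instead of tools coordinating an on-disk one. The function names are illustrative:

```haskell
import Control.Concurrent
import Control.Concurrent.QSem
import Control.Exception (bracket_)
import Control.Monad (forM_)

-- Stand-in for invoking the real compilation pipeline on one module.
buildModule :: QSem -> MVar [String] -> String -> IO ()
buildModule sem results m =
  bracket_ (waitQSem sem) (signalQSem sem) $
    modifyMVar_ results (pure . (m :))

main :: IO ()
main = do
  caps    <- getNumCapabilities   -- schedule across the machine's cores
  sem     <- newQSem caps         -- at most `caps` builds in flight
  results <- newMVar []
  done    <- newEmptyMVar
  let mods = ["A", "B", "C", "D"]
  forM_ mods $ \m -> forkIO (buildModule sem results m >> putMVar done ())
  forM_ mods $ \_ -> takeMVar done
  built <- readMVar results
  print (length built)
```

Because the semaphore lives inside one process, there is a single owner for it, which sidesteps the shared-custody issue entirely.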

cabal-install and ghc wouldn’t have to coordinate through semaphores on disk for such a thing. cabal-install would call ghc that would send the compilation order to the daemon (or start the daemon if it is not started) and get information about the build in return.
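The socket-path convention below is purely an assumption for illustration; nothing like it exists in GHC today. A per-user daemon could be addressed through a user-private Unix socket whose path the ghc client derives deterministically:

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- Hypothetical convention: one daemon per user, addressed by a socket
-- in the user's runtime directory.
socketFor :: Maybe FilePath -> Maybe String -> FilePath
socketFor runtimeDir user =
  fromMaybe "/tmp" runtimeDir
    ++ "/ghc-daemon-" ++ fromMaybe "unknown" user ++ ".sock"

main :: IO ()
main = do
  dir  <- lookupEnv "XDG_RUNTIME_DIR"
  user <- lookupEnv "USER"
  putStrLn (socketFor dir user)
```

If connecting to that socket fails, the client would spawn the daemon and retry, so cabal-install never has to manage the daemon's lifecycle itself.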

I have been prototyping in my spare time with a toy compiler in order to get a feeling for how things should fit together. I’d be interested to chat with folks who are interested!

Now you may think “I know just the thing that does that”, and you would be right to think about HLS! It’s not an entirely new idea nor is it unheard of.
I haven’t been talking about distributed compilation, but that is something available today with the external interpreter.

Here is a very simple diagram to illustrate things a bit:

So yeah, let’s bring the client-server compiler architecture to a new level!


References (please do read / watch them, they explain things better than I would in a forum thread):

PS: I realise after having written this that the subject was brought up by @brandonchinn178 in 2023 in “GHC build server for optimizing one-shot compilation”.

39 Likes

As a clarification, I indeed talk about both the concepts of query-based architecture and compilation server, but what interests me the most is the server, for its scheduling capabilities. The query-based model is an architecture that goes along with it quite nicely, but I’m not married to the idea.

2 Likes

I expect Alexis King’s talk on Racket’s tooling vs GHC’s is very worthwhile: https://www.youtube.com/watch?v=H0ATppFmt2o

4 Likes

Thank you for this link, an eye-opening talk indeed!

1 Like

There has been some work on this idea, but I don’t know of any end-product of it: Implementing a compilation server

I think this is a very cool idea in principle. I have always had some questions about how this is going to work across tools, though.

Both a query-oriented compiler and a language server are (IMO) going to be backed by an online build system. But the language server really wants to handle the full build graph so that you can get fine-grained invalidation etc. So how does this work?
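To make “backed by an online build system” concrete, here is a toy memoised query store. It is purely illustrative; real systems (Shake, hls-graph, rustc’s query engine) also track dependencies between queries so invalidation propagates through the build graph:

```haskell
import qualified Data.Map.Strict as Map
import Data.IORef

-- Run a query: return the cached result if present, otherwise compute
-- it and remember it. The cache is the "online" part of the build.
runQuery :: IORef (Map.Map String String) -> String -> IO String -> IO String
runQuery cacheRef key compute = do
  cached <- Map.lookup key <$> readIORef cacheRef
  case cached of
    Just v  -> pure v                           -- cache hit: no recompute
    Nothing -> do
      v <- compute                              -- cache miss: run the query
      modifyIORef' cacheRef (Map.insert key v)
      pure v

-- Fine-grained invalidation: when an input changes, drop only the
-- queries derived from it.
invalidate :: IORef (Map.Map String String) -> String -> IO ()
invalidate cacheRef key = modifyIORef' cacheRef (Map.delete key)

main :: IO ()
main = do
  ref <- newIORef Map.empty
  t1  <- runQuery ref "typeOf:foo" (pure "Int -> Int")
  t2  <- runQuery ref "typeOf:foo" (error "should not recompute")
  putStrLn (t1 ++ " / " ++ t2)
```

The open question is exactly the one above: which side of the compiler/language-server boundary owns this cache.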

The language server uses the compiler as a library? Do they use the same language for build rules? Does the compiler library just expose the pieces needed to write the build rules?

Or do they communicate over a process boundary? Then you need a complicated protocol and there will be lots of serialisation of large objects. Or you need to push all the logic for the queries the language server wants into the compiler, which leads to…

Or should they just be merged? That would be simplest in many ways and I suspect is where newer language implementations will go. But it would be very hard for HLS… its dependency footprint is massive! Hard to avoid if you want to pull in plugins that use other tools as libraries…

I think there are many variants of this that can probably work, but I don’t know if people have done this in practice. E.g. rustc is query-oriented, but I believe rust-analyzer doesn’t use its query system?

So: interesting but I want to know the design details!

3 Likes

@Kleidukos I am very happy that you decided to bring your idea here! At least architecturally speaking, HLS seems to be well placed: it controls the GHC pipeline and it is event-driven with an RPC interface.

Maybe the (unfortunate, I might say) tight coupling between GHC and HLS has a silver lining: given that HLS has to link against GHC to understand code exactly as GHC would, it is fair to say that HLS is the compiler. Guess how HLS figures out how to set up a GHC session for a cabal or stack project? By pretending to be the compiler! :slight_smile:

HLS also has a powerful plugin system that can streamline a proof-of-concept, so you could try writing a plugin to run the rest of the GHC pipeline (AFAIK: the code-generation part).

This would also have an interesting consequence: no more waiting for an HLS release to support a new GHC version. HLS is GHC, and there could be only one binary to install! (Of course I am dreaming, but why not.)

2 Likes

Let me point out that, while these are great and fundamental questions to ponder, the reality of today is the first one. HLS does use GHC as a library, and IMHO the evolution of this idea could follow a (gentle) strangler pattern: wrap as much of GHC as you need to offer both the batch and the interactive interfaces. The answers to your questions might well come as by-products: e.g. you will get a first implementation of the batch interface as HLS build rules, which might be coarse at first but could get refined later on.

2 Likes

I would love to see tooling like this moved forward. This would be a huge improvement for tools like haskell-language-server but also for alternative build frontends like Buck2 or Bazel or Nix.

At Mercury, our ~10,000 module / 1.2 MLOC backend monolith has made haskell-language-server unusable, as outlined in our recent post “Announcing ghciwatch 1.0” on the Mercury engineering blog:

For projects that large, HLS’s performance starts to break down: on my 2022 Mac Studio (M1 Ultra, 20 cores, 64GB memory), HLS takes a full 32 minutes to load the project. After it’s loaded everything, making changes to project files will start reloads that take at least 1.5 minutes (changes to modules nested deeper in the dependency graph can take as long as 5 or 10 minutes to reload, if HLS is able to reload at all). HLS is also very memory intensive, consuming 50GB of RAM after loading the project and climbing from there.

10 Likes