The Montreal Haskell Compiler

There seems to be nothing there. Perhaps a bit early to announce?

Also, mhs is already taken by MicroHs, so I’d suggest using a different abbreviation.

Note: I’m not involved in the project. I saw it and thought it was cool that there seems to be a renaissance of new Haskell implementations, so I’ll be watching this space and posted it in case other people want to do the same or follow along. Sorry if it was misleading.

Hi guys! It is indeed very early; I only created the org and repo a couple of days ago. This is something I’ve wanted to do for a while. I’d been putting it off, but decided to finally make something “official” after several people expressed their interest in a new Haskell compiler and core ecosystem to me.

Now that it’s been posted about here, I’d be very happy to continue this thread to answer any questions about MHC.

I’ll write a follow-up comment later today where I’ll go over the motivations and high-level plan for the project.

Sounds exciting! Looking forward to your follow-up.

Making a standard Haskell compiler is not particularly hard compared to compilers for many other languages. But the modern Haskell ecosystem doesn’t use just standard Haskell; it relies on GHC, which supports a richer and more complex language. GHC extends standard Haskell with a wide range of powerful features, including extensions like GADTs, linear types, and Template Haskell. On top of that, it supports compiler plugins like Liquid Haskell, which hook into GHC’s internal representations, and provides advanced runtime features such as Software Transactional Memory (STM), which are tied closely to GHC’s runtime system and its low-level intermediate language, Cmm.

Because of this, porting modern Haskell packages to a compiler that only supports standard Haskell is almost as much work as porting them to a similar language, like Idris 2 or Lean 4. In that light, building a new Haskell compiler from scratch seems less practical than either improving GHC or targeting another language entirely.

I’m currently working on GHC for GSoC, and there are many areas where it can be improved.

Putting aside the fact that not every project has practicality as its sole motivator, I also can’t help but think that this line of reasoning is a bit backwards. In my opinion, the modern Haskell ecosystem relies on GHC-specific features because GHC has been the only compiler we have had around for a long time now, not the other way around. There was no real incentive to propagate these advancements from GHC to Haskell-the-spec, because GHC was the only thing people had anyway.

I believe that competition breeds standardization. In the long run, having multiple competing Haskell compilers would motivate both compiler implementors and library authors to converge on a standard (if I may be so bold, a new Haskell Report) that makes these now-GHC-specific bits available for every compiler to enjoy. In my experience, tooling that does not make any assumptions about the underlying implementation tends to be more robust. So I believe that moving to a future with multiple viable Haskell compilers would benefit everyone, including GHC users.

I like diversity!

Yes, there is real value in people collaborating in a single giant project like GHC – we can achieve so much more together than we can individually.

But that very reach and scale makes GHC a bit ponderous and not very agile. There is a real role for other implementations to explore different parts of the design space. For example:

  • Lennart’s MicroHs is incredibly small compared to GHC, while still implementing many (but far from all) of its language extensions; it bootstraps in a flash, and compiles Haskell to very small binaries. They may not run as fast as GHC-compiled code, but sometimes that doesn’t matter.
  • Artin is building a typechecker for Haskell that corresponds line-for-line with formally expressed typing rules. GHC’s typechecker is far more complicated and far less comprehensible, and entirely lacks formal typing rules – even though it does more.

Maybe the Montreal Haskell Compiler will also have an “angle” that will enrich us all.

I’ve decided to break my longer follow-up into multiple comments, partly due to time constraints on my end, but also because there’s just a lot of ground to cover. Like I said, this is something I’ve been thinking about for a long time.

Here’s the first part. By happy coincidence, the timing and topic of this one will line up well with @simonpj’s kind and motivational comment.

1. At a Cosmological Scale

MHC takes inspiration from a bunch of ideas I’ve encountered, all of which I interpret as different expressions of the same core theme: that there is no “one true Haskell.” Instead, I see Haskell as an ongoing research project, one with many branches and simultaneous directions. This is also one of the ways that I read the “avoid success at all costs” slogan: we want to succeed both as a language and as a community, not as a single product.

Along those lines, MHC is fundamentally an experiment, intended to test new ways of implementing the Haskell language (or language family, if you end up interpreting the remainder of my comments in that way) and its ecosystem. I hope that it will flourish and lead us in exciting new directions. But it may not, and I think that would be an equally valuable learning opportunity for all of us.

As for specific plans, for now I’ll start with a softball. Many of the key parts that go into compilers are useful for a lot more than just the compiler itself. I’d like to disentangle some of these parts, and (ideally) turn them into their own packages. That way users can depend on and even extend each package individually.

Much more to come…

I appreciate the heads up about the potential naming conflict.

I’d been planning on using mhc, which I hope isn’t taken yet! Maybe this can serve as a semi-official reservation of that name?

While I’m on the topic of the name, it is indeed referring to my home city of Montreal, Quebec. Wanted to get that out there before anybody has to ask.

Would QHC or MQC be just as relevant?

I now realise that I hadn’t noticed that mhc and mhs are different, so I guess mhc is still free, albeit potentially confusing to inattentive people like me.

Thank you for this. Apologies again for the premature share, but it’s not every day that you come across a nascent Haskell implementation that doesn’t already have a lot of its concepts codified.

Excited to see how this develops.

No need to apologize. I’m very happy to see so many people interested in my project.

I have a bad habit of overthinking when left alone. Getting the project in front of a wider audience enables others to contribute their expertise, experience, and constructive criticisms. I appreciate people dedicating time to discussing my project.

Honestly I wasn’t sure how the suggestion of a new Haskell compiler would be received. I was afraid people would be dismissive of it. Especially considering how many ambitious things I have in store for MHC.

Receiving so much interest so early, and from people directly involved with GHC no less, has been invigorating for me.

That’s totally understandable.

As a bit of personal lore: one of my closest friends/colleagues is very dyslexic, and he makes these sorts of substitution mistakes often. So I’m sympathetic to that sort of thing.

For now I’ll continue using mhc since I’m used to it by now. But if many people find it confusing, I’ll take suggestions for changing the name. I’ll create an issue in the GitHub repo for people to share their name suggestions.

Here is the issue for name suggestions. Please also share any conflicts you find.

What would those be? Compared to GHC’s technological choices:

  • alex for the lexer and happy for the parser (as opposed to e.g. parser combinators)
  • unique identifiers for names (as opposed to e.g. de Bruijn indices)
  • type-checking undesugared expressions against desugared types
  • metavariables as mutable references, removed by zonking (as opposed to e.g. an explicit, pure typing context)
  • System FC as the Core language (as opposed to other lambda calculi)
  • evaluation model given by STG (as opposed to a different formalism)
  • codegen with NCG or LLVM, an RTS written in C

What do you intend to replicate and where do you envision a different approach?
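For readers less familiar with the fourth bullet, here is a minimal sketch of what “metavariables as mutable references, removed by zonking” means. It uses a toy type language, and all the names (`Type`, `newMeta`, `solve`, `zonk`) are hypothetical illustrations, not GHC’s actual API: an unsolved metavariable is an `IORef` holding `Nothing`, unification fills it in, and zonking walks a type replacing every solved metavariable with its solution.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | A toy type language. 'TMeta' is a unification variable ("metavariable")
-- stored in a mutable cell: Nothing while unsolved, Just t once solved.
data Type
  = TInt
  | TBool
  | TFun Type Type
  | TMeta (IORef (Maybe Type))
  deriving Eq -- IORef's Eq instance compares references, not contents

-- | Create a fresh, unsolved metavariable.
newMeta :: IO Type
newMeta = TMeta <$> newIORef Nothing

-- | What unification would do once it learns a solution: write it in place.
solve :: Type -> Type -> IO ()
solve (TMeta ref) t = writeIORef ref (Just t)
solve _ _ = error "solve: not a metavariable"

-- | Zonking: replace every solved metavariable by its (zonked) solution.
-- Unsolved metavariables are left where they are.
zonk :: Type -> IO Type
zonk TInt = pure TInt
zonk TBool = pure TBool
zonk (TFun a b) = TFun <$> zonk a <*> zonk b
zonk t@(TMeta ref) = do
  contents <- readIORef ref
  case contents of
    Nothing -> pure t
    Just solution -> do
      z <- zonk solution
      writeIORef ref (Just z) -- cache the fully zonked result for later lookups
      pure z
```

The alternative named in the same bullet, an explicit pure typing context, would typically thread a substitution through the checker instead and apply it at the end, with no mutable cells involved.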

Perhaps monhc would be a better name then? If you’re francophone, it has the additional benefit of being your Haskell compiler.

(to be clear, there’s nothing wrong with reserving mhc either; I hope I see the day when the 26 letters of the alphabet are insufficient for all the Haskell compilers)

I hadn’t planned to get into that until a few comments later in my big chain. But your question(s) is/are so tempting that I’m going to dive into some of the technical plans now, just for you 😄

Without further ado…

2. The Guts

2.1. Parsing (and lexing, kinda)

I’ve settled on “grammar combinators” as my weapon of choice for MHC’s parser implementation. Grammar combinators are a bit like parser combinators: the idea is that complex grammars (respectively parsers) are built out of simpler existing ones, using a small-ish set of higher-order functions to combine things. And both have the advantage of keeping everything inside Haskell; there’s no bespoke macro language or funky pre-processing to define your grammar and get a working parser for it.

The main differences between grammar and parser combinators that are relevant to our discussion here are:

  • Context-free grammars are strictly Applicative/Alternative. Monad is too powerful: it lets later parsing depend on earlier results, which can express languages that are not context-free.
  • The grammar HOAS (higher-order abstract syntax) isn’t a parser on its own. It needs to be interpreted through some external routine.
  • There are many choices of parsing routine. Specifically, we can use LR(k) parsers to avoid some of the backtracking that parser combinator implementations would need to do for LR(k) languages.
  • Down the line, we can generate performant static parser code from the grammar HOAS, to achieve the same overall performance as Happy/Bison/Yacc-generated parsers.
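To make the Applicative/Alternative point concrete, here is a minimal sketch of a grammar as a plain data structure, with the interpreter kept separate from it. All names (`Grammar`, `runGrammar`, `parseAll`) are hypothetical. This is a simple first-order deep embedding rather than the HOAS encoding used by Hanjiru or the Earley package, and the interpreter is naive backtracking rather than Earley or LR.

```haskell
{-# LANGUAGE GADTs #-}
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

-- | A toy grammar AST (hypothetical, not Hanjiru's encoding). It only
-- describes a language; it does not parse anything by itself.
data Grammar t a where
  Pure  :: a -> Grammar t a
  Token :: (t -> Maybe a) -> Grammar t a -- match exactly one token
  Ap    :: Grammar t (b -> a) -> Grammar t b -> Grammar t a
  Empty :: Grammar t a
  Alt   :: Grammar t a -> Grammar t a -> Grammar t a

instance Functor (Grammar t) where
  fmap f g = Ap (Pure f) g

instance Applicative (Grammar t) where
  pure  = Pure
  (<*>) = Ap

instance Alternative (Grammar t) where
  empty = Empty
  (<|>) = Alt
-- Deliberately no Monad instance: the grammar's shape can never depend
-- on values parsed earlier.

-- | One possible interpreter: naive backtracking ("list of successes").
-- Other interpreters (Earley, LR) could run over the same 'Grammar' value.
runGrammar :: Grammar t a -> [t] -> [(a, [t])]
runGrammar (Pure x) ts = [(x, ts)]
runGrammar (Token f) (t : ts) = [(x, ts) | Just x <- [f t]]
runGrammar (Token _) [] = []
runGrammar (Ap gf gx) ts =
  [(f x, ts'') | (f, ts') <- runGrammar gf ts, (x, ts'') <- runGrammar gx ts']
runGrammar Empty _ = []
runGrammar (Alt g h) ts = runGrammar g ts ++ runGrammar h ts

-- | Keep only the parses that consume the whole input.
parseAll :: Grammar t a -> [t] -> [a]
parseAll g ts = [x | (x, []) <- runGrammar g ts]

-- Example grammar: right-recursive sums of single digits, e.g. "1+2+3".
digit :: Grammar Char Int
digit = Token (\c -> if isDigit c then Just (digitToInt c) else Nothing)

sym :: Char -> Grammar Char Char
sym c = Token (\t -> if t == c then Just t else Nothing)

expr :: Grammar Char Int
expr = ((+) <$> digit <* sym '+' <*> expr) <|> digit
```

Because recursion here is just Haskell laziness, a left-recursive rule such as `expr = (+) <$> expr <* sym '+' <*> digit` would loop in this interpreter; handling left recursion is one of the things explicit rule encodings and Earley/LR interpreters are good at.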

This is the first part of MHC that I’m building out, for various reasons. If you want, you can see how that’s going in the MHC repo. My implementation is heavily inspired by Olle Fredriksson’s Earley parser package.

My immediate goals for the Hanjiru grammar/parser system are:

  1. Implement parsing algorithms (Earley, LR(0), LALR(1)) as interpreters on the HOAS grammar encoding. I intend for this first batch of interpreters to be slower than Olle’s, but much easier to read and maintain (hence the hanjiru-slow branch name).

  2. Define some simple demo grammars for people, me included, to experiment with and build off of. One of these will probably be a very bare-bones take on Haskell syntax.

  3. Figure out a nice way to do Happy-style user-defined monadic post-processing on parser outputs.

I haven’t decided on the lexer yet. So far, Hanjiru is agnostic about the format of its input. The only requirement is that it has a list-like structure (i.e. implements uncons somehow). If anybody has lexer suggestions, I encourage you to open an issue in the MHC repo.
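As an illustration of “list-like, implements uncons somehow”, one possible shape is a small class with an associated token type. `TokenStream`, `Token`, and `spanTokens` are hypothetical names for this sketch, not Hanjiru’s interface. The list instance reuses `Data.List.uncons` from base, and the `Seq` instance (from containers) shows that a non-list container can qualify too.

```haskell
{-# LANGUAGE TypeFamilies #-}
import qualified Data.List as List
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, ViewL (..), viewl)

-- | Hypothetical input interface: anything we can take one token off.
class TokenStream s where
  type Token s
  unconsToken :: s -> Maybe (Token s, s)

-- Plain lists: 'Data.List.uncons' already has exactly this shape.
instance TokenStream [a] where
  type Token [a] = a
  unconsToken = List.uncons

-- Sequences: peel off the leftmost element.
instance TokenStream (Seq a) where
  type Token (Seq a) = a
  unconsToken s = case viewl s of
    EmptyL -> Nothing
    x :< rest -> Just (x, rest)

-- | Written once against the class: split off the longest prefix of tokens
-- satisfying a predicate (e.g. the digits of a numeric literal).
spanTokens :: TokenStream s => (Token s -> Bool) -> s -> ([Token s], s)
spanTokens p s = case unconsToken s of
  Just (t, rest) | p t ->
    let (ts, rest') = spanTokens p rest in (t : ts, rest')
  _ -> ([], s)
```

With an interface like this, a separate lexer could produce a list (or sequence) of token values, and the parser would consume it the same way it consumes raw characters.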

As for why I chose to come up with my own parsing infrastructure rather than sticking with Happy: Hanjiru’s grammar definitions are first-class citizens in Haskell, meaning they’re:

  • fully typed,
  • modular,
  • easily re-usable,
  • able to be documented with Haddock, and indexed on Hackage.
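The “modular, easily re-usable” point can be sketched without knowing Hanjiru’s combinator names, which the post doesn’t show. Assuming grammars are ordinary `Applicative`/`Alternative` values, a documented fragment like the hypothetical `sepBy1` below can be written once against those classes. Base’s `ReadP` is used here purely as a stand-in instance so that the example runs.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- | @sepBy1 item sep@ matches one or more @item@s separated by @sep@.
-- It only needs 'Alternative', so it works with any grammar or parser type
-- that has that instance, and its Haddock docs are written once.
sepBy1 :: Alternative f => f a -> f sep -> f [a]
sepBy1 item sep = (:) <$> item <*> many (sep *> item)

-- | @bracketed open close body@ matches @body@ between two delimiters.
bracketed :: Applicative f => f open -> f close -> f a -> f a
bracketed open close body = open *> body <* close

-- Instantiating the fragments: a simplified export list like "(foo,bar)".
exportList :: ReadP [String]
exportList = bracketed (char '(') (char ')') (sepBy1 (munch1 isAlpha) (char ','))

-- | All parses that consume the entire input.
parseExports :: String -> [[String]]
parseExports s = [xs | (xs, "") <- readP_to_S (exportList <* eof) s]
```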

Also, I feel that Happy’s implementation has too many glaring problems to be saved. The whole code-base smells like it belongs a decade or more in the past. I can’t even get it to build from source on my NixOS system.

I have nothing but respect for the work that the Happy maintainers do. But I feel it’s time to come up with something new.

2.2. Core Language(s)?

This question cuts deep into the heart of why I decided to create MHC in the first place. To make a very long story short, the ideal outcome for MHC is to have a design so modular and flexible that a user can open up the compiler and drop in their own IRs, type checker, backend, whatever, all without looking at a single line of my code.

That’s the absolute most ideal outcome. I don’t know whether it’s something that can be fully realized. In either case, it’ll at least serve as a central guiding principle.
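As a rough illustration of “drop in their own IRs, type checker, backend”, one common way to get this kind of pluggability in Haskell is to make each stage a typed pass and compose passes like functions. The `Pass` type and the toy stages below are hypothetical and say nothing about how MHC will actually be structured; the point is only that a stage can be swapped for another with the same input and output types without touching the rest.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..), (>>>))
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- | A hypothetical compiler pass from representation @a@ to @b@,
-- possibly failing (or doing other effects) in @m@.
newtype Pass m a b = Pass { runPass :: a -> m b }

-- Passes compose like functions, so a pipeline is just a chain of passes.
instance Monad m => Category (Pass m) where
  id = Pass pure
  Pass g . Pass f = Pass (f >=> g)

type Compile = Either String

-- Toy stages standing in for a lexer, an IR builder, and a backend.
lexer :: Pass Compile String [String]
lexer = Pass (\src -> Right (words src))

toIR :: Pass Compile [String] [Int]
toIR = Pass (traverse parseWord)
  where
    parseWord w = maybe (Left ("not a number: " ++ w)) Right (readMaybe w)

sumBackend, productBackend :: Pass Compile [Int] Int
sumBackend = Pass (\xs -> Right (sum xs))
productBackend = Pass (\xs -> Right (product xs))

-- | The front of the pipeline is fixed; the backend is a drop-in parameter.
pipeline :: Pass Compile [Int] Int -> Pass Compile String Int
pipeline backend = lexer >>> toIR >>> backend
```

Swapping an IR in the middle would change the types on either side of it, so the type checker would point at exactly the neighbouring passes that need adapting.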

2.3. The MHC Experiment

The experimental aspect of MHC is largely intended to answer the following questions:

  1. Is a compiler this flexible even possible?
  2. On a meta level, will prioritizing modularity and extensibility so much help other ambitious Haskell compiler projects if they use us as a platform?

These other projects include Linear Haskell, Dependent Haskell, and Clash. Right now, the GHC monolith presents a major technical hurdle for them. Any change can quickly balloon into an implementation nightmare spanning the entire code-base.

There are also major social barriers to overcome with a monolithic compiler. Even small changes and improvements can get trapped in endless debate cycles. These things are absolutely worth discussing. But it puts a serious damper on our ability to implement, test, and adopt changes that many of us are already in love with.

2.4. The Compiler is a Big Bag!

My vision for MHC is really that it will be a big bag of compilery things. Right now I only have a rough idea of what some of them will be. Ultimately, people should be able to pull their favourite parts out of the bag, and put them together into whatever they like. They should also be able to come up with their own compiler bits that others can try out. And if people like the new bits, they can go in the bag too.

Of course, not everyone wants to build their own compiler from scratch, no matter how nice we make the pieces. So a major priority for me is the development and maintenance of a good Default Pipeline that covers most use-cases out of the box. I can’t see MHC succeeding without that.

I think that bagging MHC like this would be helpful in addressing the technical and social problems that we have with monolithic compilers.

Bagging lets contributors control the scope of their contributions. Prototypes can pare down their dependencies to only the absolute essentials. Enthusiasts can build their perfect pipelines, even including features that are still far from making it into the Default. And by the time we do start thinking about adding something to the Default Pipeline, we’ll (hopefully) all have a better idea of what that feature is like in practice.
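The “Default Pipeline plus parts from the bag” idea could look something like a record of replaceable stages with a sensible default, where users override only the fields they care about. Everything below (`Pipeline`, `defaultPipeline`, `constantFold`, `foldingPipeline`) is a hypothetical toy, not MHC’s design.

```haskell
-- | A toy IR standing in for whatever a real pipeline passes around.
data Expr = Lit Int | Add Expr Expr
  deriving (Eq, Show)

-- | A hypothetical pipeline: one field per replaceable part from the bag.
data Pipeline = Pipeline
  { optimise :: Expr -> Expr
  , emit     :: Expr -> String
  }

-- | The Default Pipeline: no optimisation, pretty-printing backend.
defaultPipeline :: Pipeline
defaultPipeline = Pipeline { optimise = id, emit = pretty }

pretty :: Expr -> String
pretty (Lit n) = show n
pretty (Add a b) = "(" ++ pretty a ++ " + " ++ pretty b ++ ")"

-- | A part someone might contribute to the bag: constant folding.
constantFold :: Expr -> Expr
constantFold (Add a b) =
  case (constantFold a, constantFold b) of
    (Lit x, Lit y) -> Lit (x + y)
    (a', b') -> Add a' b'
constantFold e = e

-- | An enthusiast's pipeline: the Default, with one part swapped out.
foldingPipeline :: Pipeline
foldingPipeline = defaultPipeline { optimise = constantFold }

runPipeline :: Pipeline -> Expr -> String
runPipeline p = emit p . optimise p
```

Record update keeps the override local: `foldingPipeline` only mentions the part it changes, and it picks up any later improvements to the other default parts automatically.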

2.5. Actually Implementing It…

Following a suggestion from @Kleidukos, MHC will start off as not much more than a pretty frontend bolted onto a wrapped GHC core. Happily, MHC is already intended to be an amorphous bag. That frees us up to implement it one piece at a time. Anything we haven’t implemented yet, we can rely on GHC to help us with.

I feel that also gives MHC something of a roadmap. What’s left to do? Whatever we’re still dependent on GHC for!

Like @simonpj, I want MHC and GHC to be friends. Competing is one of the goals. But so is collaborating. I don’t want one compiler to “win” against the other. I hope that GHC will support us when we need them. And that with their support, we’ll become a testing ground for new ideas, and share insights and lessons that GHC can also benefit from.

Also, before anybody starts suspecting anything…

If you want some free real estate, cross-compiling Haskell as easily as Go does would be huge.
