Single-compiler vs. multi-compiler (i.e. implementation-defined vs. standardized) languages

I have been watching some CppCon 2022 videos, and one of the topics was what the C++ community could learn from the Rust community. I think some of the points are relevant to the Haskell community too, especially being more beginner-friendly and putting more thought into developer ergonomics.

But I wanted to reflect on one thing which I think is the elephant in the room that no one there seems to highlight: Rust and most of the recent darling languages are single-compiler projects. C++ certainly was not. Neither was Haskell; but since Haskell2010 we have been happy to just go along with the GHC20XX way anyhow.

I wonder what the Haskell Foundation is thinking about this nowadays, and also what the general thinking within the current community is.

Edit:

@david-christiansen's response from the perspective of the HF: Single-compiler vs. multi-compiler (i.e. implementation-defined vs. standardized) languages - #6 by david-christiansen

@david-christiansen has some interesting thoughts about this. Hopefully he’ll be along to share.

I think there have never been serious competitors to GHC (that survived for more than 5 years). Some history from A History of Haskell: being lazy with class:

  • hbc was the first Haskell compiler, but:

    … since the compiler was written in LML it was more or less doomed to dwindle.

  • nhc was built for the specific purpose of being bootstrapped on personal machines which at the time had only 2MB of memory. I guess Moore’s law (or the equivalent for RAM) is what killed nhc.

  • Then there’s Yale Haskell, but it compiled via an existing Common Lisp compiler so:

    While performance within the strict subset of Haskell was comparable with other systems, there was a factor of 3 to 5 in lazy code that could not be overcome due to the limitations of the Lisp back end. For this reason, in addition to the lack of funding to pursue further research in this direction, the Yale Haskell implementation was abandoned circa 1995.

  • Finally,

    All the compilers described so far were projects begun in the early or mid ’90s, and it had begun to seem that Haskell was such a dauntingly large language that no further implementations would emerge. However, in the last five years [2003-2007] several new Haskell implementation projects have been started.

    Those are: Helium, UHC/EHC, jhc, and yhc. To this day, none has caught up with GHC. In fact, most have already been abandoned.

I would attribute the fact that we had standards to the language having been designed by academics. In contrast, C and C++ have standards because there was a real need for multiple compilers for different platforms.

I think now it is much more impactful to focus on making GHC more modular such that it becomes easier to experiment with the compiler again. See:

  • The revitalized GRIN compiler project, which is (among other things) a new back end for GHC for whole program optimization.
  • Standard Chartered is planning to use GHC as a front end for their Mu dialect of Haskell.
  • The Asterius and GHCJS compilers are also being integrated into GHC.

If I were to start a project like Helium now (a Haskell compiler with a focus on better error messages), I would make it an alternative front end for GHC instead of a whole new compiler.

8 Likes

Not a Haskell Foundation representative but my views are simple:

  • It’s better to have a single barely working compiler than two non-working ones.

Haskell has very limited volunteering resources. Of course, people are free to do whatever they want. But in my view, improvements to GHC will benefit every single Haskell developer while improvements to another Haskell compiler will benefit only a few.

8 Likes

There is work underway to implement another Rust compiler:


Wow: what a “glowing endorsement” for the current [2022 Oct] Haskell situation…


While A History of Haskell is being quoted:

    …and usually with one implementation at each site.

So how many of those languages survived the arrival of Haskell? With the possible exception of Miranda(R)…almost none, at least as far as I know.

As for Miranda(R) back then:

Now Haskell is in a similar predicament: relatively widely used, but with only one implementation. Yes, there are the obvious differences, like the lack of a trademark and having an open-source implementation, but the majority of Haskell’s other predecessors had those too (sometimes as a result of the aforementioned restrictions ;-).

I have more opinions than ideas about how to avoid it, but:

…much sooner than any of us expected.

In his Turing Award call for liberation, John Backus introduced the phrase “von Neumann bottleneck”. Right now, Haskell seems to be stuck in the “glorious GHC bottleneck” - whether it can be similarly liberated remains to be seen…

Hopefully he’ll be along to share.

Of course! Sorry, this has been kicking around in my head for a while, so it’ll be a long post.

First off, I don’t think that “single compiler” or “multiple compiler” is quite the most useful way to think about the issue. Rather, I’d want to think in terms of “implementation-defined languages” vs “standardized languages”. In other words, if there’s a piece of paper describing the language and it disagrees with the compiler, which one is buggy?

Haskell started off as clearly being a standardized language. It arose from a milieu in which there were a variety of lazy functional languages, all of which had quite similar capabilities, but none of which were compatible. The lack of compatibility led to all kinds of annoying work porting libraries from one to the next. The initial push was to find a kind of common core language that all of these projects could implement, but as committees full of researchers are wont to do, they ended up innovating nonetheless, leading to things like monadic IO and type classes.

The Haskell language report was never fully formal the way that something like The Definition of Standard ML is. Nonetheless, even if it wasn’t specific enough to mathematically prove that a compiler had a bug, reasonable people could read it and use it.

A related discussion is the tension between prescriptive and descriptive approaches to human language. While there is no official standard for English, there are a number of respected bodies that have standardized particular versions of the language. Other languages indeed have defining bodies, like Dansk Sprognævn for Danish. These standardized languages exist in tension with the community of speakers - the standard mostly codifies linguistic changes that have already become popular in the community, but the standard is also used to do things like red underlines from spellcheckers and friendly notes about grammar from co-authors. The feedback mechanisms are complex and tied up with the distribution of power in the community of speakers. In reality, this is the relationship between most standard descriptions of programming languages and their implementations as well - innovations begin in implementations and then flow back to the standards, and powerful implementations have an easier time getting things into the standard (see, for instance, EME in HTML5).

This doesn’t happen in Haskell anymore. I’m not taking a position here either way on whether it should, just pointing out that it doesn’t. Today, Haskell 2010 does not describe any usable implementation, and divergence from Haskell 2010 is not considered to be a bug. For instance, Haskell 2010 indicates that fail is a part of Monad, and does not require a Functor or Applicative superclass.

One important difference between implementation-defined languages, in which there is a single canonical implementation that serves as a spec for alternative implementations (e.g. Racket or Rust) and standardized languages (e.g. Scheme or C++) is that the relationship between the compiler and the tooling is different. For Racket or Rust, tooling should generally support racket or rustc, and other compilers should additionally present similar pragmatics if they’d like to integrate into the tool ecosystem. For Scheme or C++, tools like geiser or cmake treat the compiler as a kind of pluggable component that may require some portability shims but ideally won’t. These days, implementation-defined languages are frequently treating the language, the build system, the documentation tools, and other important parts of the system as an integrated whole to be developed in concert with a view towards giving the user an easy time of things.

Standardized languages, by paying the cost of worse integration, gain many benefits. Multiple implementations means that each can specialize (e.g. interactive environment vs compiler speed vs generated code speed vs platform support vs static analysis). It also means that no one implementor controls the language, so it helps maintain an alignment of interests between users and implementors, because the implementors can’t “go rogue” as easily. It also allows a more deliberative approach to language evolution, because competing interests have a way of sharpening arguments and ideas.

I think that, in Haskell, we have the social dynamics of a standardized language. Because other implementations are possible (at least in our minds), we maintain a notional separation between GHC, GHCUp, Cabal, Stack, Haddock, HLS, etc. There is no “Haskell project” the way there is a Rust project or a Racket project, but rather there’s a variety of projects making useful Haskell things. The upside of this is that we can integrate new compilers, which can be a wonderful thing (e.g. the SML and Common Lisp worlds get lots of value from this, and it used to be common in Haskell to use Hugs for interactive development and GHC for building fast binaries). The downside of this is that it becomes harder to address cross-cutting concerns, we end up with more brittle integrations, and we risk more community splits. And social dynamics tend to be reflected in software architecture as well.

On the other hand, I also think it’s highly unlikely that another useful Haskell compiler will come into existence today (barring a couple of specialized use cases like Helium). A big part of the value of a programming language comes from network effects, and being compatible with all the great things on Hackage will require GHC-compatible implementations of things like Template Haskell, generics, GADTs, higher-rank polymorphism, etc. I think it’s much more likely that the future of Haskell compiler development lies in improvements to GHC. The benefits of a standardized language are a bit moot if only one implementation seems realistic.

This is just my thoughts about our situation - no specific recommendation of a path forward is in this post! That’s up to the various Haskell projects out there to think about and coordinate on.

TL;DR:

  • Haskell is formally a standards-defined language, but is now an implementation-defined language in all but name (there is no maintained implementation of the most recent standard).
  • Our social organizations and project structuring are based on the assumptions of a standards-defined language, which has costs and benefits. We pay the costs but don’t get most of the benefits anymore.
16 Likes

You’ve said it implicitly, but I want to make it more explicit.

I think GHC needs competition, and urgently. The community can’t challenge the course of GHC much, IMO. A new implementation can (and that may very well be a fork).

Whether that leads to another compiler usable in production or not may not be that important.

I don’t think that I said this implicitly, actually. I worked pretty hard when writing that comment to refrain from suggesting particular courses of action, because I think it’s useful to separate a discussion of how the world is from a discussion of how it should be, and likewise how we get there. Agreeing on the facts is an important prelude to agreeing on the direction, and mixing the discussions can make motivated reasoning more tempting as well as making it harder for people with different values to reach agreement.

Thank you for your ideas, but they’re not necessarily mine.

Now, do I have ideas about where to go and how to get there? Yes! But I think that with the job that I have, I should listen much more than I speak about these kinds of things.

10 Likes

I’m aware you didn’t suggest this, but you described the effects.

It also allows a more deliberative approach to language evolution, because competing interests have a way of sharpening arguments and ideas.

I’m suggesting that a good course of action is having more competition, just so that we can benefit from the effects of competition.

I’ll also highlight that competition doesn’t exclude collaboration, so people who are constantly scared of splitting scarce resources (compiler devs) seem to be missing the point that there will very likely be cross-pollination, and that GHC doesn’t seem to attract a lot of new devs either.

2 Likes

Thank you for your suggestions!

I wrote my analysis of the facts of the matter precisely to serve as a basis for this kind of discussion, so of course I’m happy that you’re using them. I did write it in terms of tradeoffs, however, and I notice that you’re only selecting the pluses and not the minuses :slight_smile:

2 Likes

I’m not sure what you mean. The downsides come with it anyway. Some of them are more anxiety than reality, in my opinion (as described above), and the benefits outweigh them overall.

Forking GHC 8.10.7, freezing the surface language, backporting M1 NCG and focussing on performance, compilation speed and stability will attract a lot of attention from industry and the community.

Popular libraries will very likely make sure to stay compatible. Library authors today are already very wary of new language features, since they have to support a wide range of GHC versions, and CPPing your external API is usually a disaster.
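As a purely hypothetical illustration of that last point (the module name, function, and version cut-off below are mine, not from any real library), this is the kind of version-conditional API that spanning several GHC releases tends to force on maintainers:

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical compatibility module; the names and the cut-off are illustrative only.
module Compat (parseOrFail) where

#if __GLASGOW_HASKELL__ >= 808
-- GHC 8.8 removed `fail` from Monad, so the exported signature itself changes.
parseOrFail :: MonadFail m => String -> m Int
#else
parseOrFail :: Monad m => String -> m Int
#endif
parseOrFail s =
  case reads s of
    [(n, "")] -> pure n
    _         -> fail ("not an Int: " ++ s)
```

Multiply that by every exported signature that touches a changed class, and the appeal of a frozen surface language becomes clear.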

Cabal probably wouldn’t need much adjustment. GHCup will support it instantly, HLS will likely follow, since it’s a one-time cost (given the stability focus).

Network effects will decide what the community values more and this will be properly visible.

My 2 cents here.

Forking GHC 8.10.7, freezing the surface language, backporting M1 NCG and focussing on performance, compilation speed and stability will attract a lot of attention from industry and the community.

I would be interested in understanding more what could make this a sustainable stream of efforts.

(a) Is it from academic interest and funding? I think people there are lured toward more theoretical subjects, and this seems too “pragmatic”…
(b) Is there industrial interest, and why?
(c) Is it self-funded by a driven individual or group who is determined to see it happen?

I think in a world where there are ample initiatives/distractions around, it’s critical to analyze the social dynamics too.

A lot of alternative compilers have already been mentioned, like Hugs, uhc, nhc, Helium etc. But I want to mention one additional compiler: haskell-src-exts. Well, technically it isn’t a compiler, it is “only” a parser, but it was the foundation of a lot of great compiler-like Haskell tooling. We had renamers, mutation testing frameworks, source-to-source supercompilers, linters, refactoring tools etc. (The list of reverse dependencies of haskell-src-exts is here). One of the ideas was to have a suite of libraries that tooling can build on, the haskell-suite.
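For anyone who never used it, a sketch of why haskell-src-exts was such a convenient foundation: a complete, standalone parse-and-pretty-print tool was only a handful of lines (the program below is my own minimal example, not taken from the haskell-suite):

```haskell
-- Minimal haskell-src-exts client: parse a Haskell file and pretty-print its AST.
-- Linters, refactoring tools and the like started from roughly this much code.
import Language.Haskell.Exts (ParseResult (..), parseModule, prettyPrint)
import System.Environment (getArgs)

main :: IO ()
main = do
  [path] <- getArgs                    -- e.g. ./dump-ast Example.hs
  src <- readFile path
  case parseModule src of
    ParseOk ast         -> putStrLn (prettyPrint ast)
    ParseFailed loc err -> putStrLn ("Parse error at " ++ show loc ++ ": " ++ err)
```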

Among all this tooling that was developed, most became unmaintained, some, like hlint, invested the effort and switched to the ghc-lib-parser, but none of those still relying on haskell-src-exts work reliably on all of modern GHC-Haskell code.

In my opinion, we need a good replacement for the haskell-src-exts ecosystem more than we need another “Haskell source code to machine instructions” compiler. Nowadays it is already possible to use the ghc-lib-parser directly, but it is still much more inconvenient than what haskell-src-exts provided. Writing tooling against any of the later AST representations, like renamed or typechecked Haskell code, directly on top of the GHC API is both extremely difficult and hard to maintain.

If we don’t have the necessary engineering power to maintain “just” another Haskell parsing library, I don’t see how another Haskell compiler which is sufficiently feature-rich to support modern Haskell libraries could be sustained.

But to end on a very optimistic note: I am looking with great enthusiasm at a lot of the effort that people are spending in making GHC more modular. Similarly, the splitting of cabal into separate libraries is making it easier to develop tooling for cabal files. I really liked David’s Haskell Symposium keynote about focusing on tooling in the next decade, and I think a more library-oriented approach to our compiling infrastructure is the way to get there.

6 Likes

Facebook is running a GHC fork afaik (or a heavily patched one).

Standard Chartered has its own (incompatible) Mu compiler.

Industry has enough resources, if they really want to use them. To me it just seems there are no combined efforts, so they either do their own thing or stick to GHC and accept the shortcomings wrt their priorities.

A company that employs 50+ Haskell devs should do anything it can to improve compilation time, because it directly correlates with developer productivity. Interactivity and short feedback loops are immensely under-prioritized, in my opinion.

I think the HF could very well contact major industry users and get feedback about their priorities wrt the Haskell compiler.

Otherwise we’re just discussing this ad hoc for the 10th time with no interesting data.

1 Like

Interestingly, they are trying to switch to GHC for their front end (parsing & type checking) because they don’t want to maintain a full compiler stack themselves.

6 Likes

Yes, but the reason an alternative on-par compiler doesn’t exist is not lack of resources.

I’m not sure SCB has a lot of interest in the new language features of GHC 9.x (at least I don’t, as a Mu developer). Mu doesn’t even support all of 8.10.7’s language extensions.

So if such a GHC 8.10.7 fork existed and was alive and well, maybe that would be a much more interesting base/target (maintaining patches, backends or depending on the GHC API is much less work if the compiler prioritizes stability… ask HLS devs).

That’s ofc my opinion.

2 Likes

I had the same thoughts before. But GHC 9.2 has RTS features powerful enough to make pre-9.2 compilers look quite modest (a rough usage sketch follows the list):

  • Info table profiling: this can, for example, point to the specific source line that contains a space leak. It’s now impossible to imagine how we lived before that. Blog post for more info.
  • The -Fd option for returning memory to the OS. Before that option, Haskell programs could consume 4x (!!!) more memory than they actually used. Also a blog post with more explanation.
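A rough usage sketch covering both features, assuming GHC 9.2 (the toy program is mine, and the exact flag spellings are worth double-checking against the user’s guide for your GHC version):

```haskell
-- Toy program with a lazy left fold, so the heap profile has something to show.
--
-- Info table profiling (GHC 9.2+), roughly:
--   ghc -finfo-table-map -fdistinct-constructor-tables -eventlog Leaky.hs
--   ./Leaky +RTS -hi -l
-- then render the eventlog (e.g. with eventlog2html) to see which source lines
-- the retained closures come from.
--
-- Returning memory to the OS (GHC 9.2+), roughly:
--   ./Leaky +RTS -Fd4
-- where -Fd tunes the rate at which unused heap is returned (-Fd0 disables returning it).
module Main where

main :: IO ()
main = print (foldl (+) 0 [1 .. 10000000 :: Integer])
```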

Improvements to the RTS come along with breaking changes and language changes. Unfortunately, they’re not decoupled enough at this stage, so if you want to benefit from ground-breaking observability and performance improvements, you need to swallow a bitter pill and pay the upgrade cost.

On the brighter side, such runtime behaviour improvements justify the upgrade cost quite well.

6 Likes

…and now a growing number of Haskell users have made the judgement that the “upgrade cost” is too high, otherwise they probably wouldn’t still be using version 8.x.y - if it’s not being used and it’s expensive, is it really there? Furthermore, a variety of new features have already been transferred to older GHC versions.

However if there was another Haskell compiler, that process could go in both directions: patches which mend problems in the other compiler could also prove useful in GHC.

Is it really growing? I remember 7.10 was quite popular for a long time even after GHC 8 had already been released.

Which breaking changes really are still blocking now that DeepSubsumption is back? I can only find:

  • 8.8 removes the fail method from the Monad class
  • 9.0 makes TH splices separation points for constraint solving passes
  • 9.0 simplifies subsumption, but the old behavior can be restored by using DeepSubsumption (a small example of the breakage follows this list)
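For concreteness, here is a small self-contained sketch of the simplified-subsumption breakage and the usual fix (my own illustration with arbitrary names, not taken from the thread):

```haskell
{-# LANGUAGE RankNTypes #-}
-- {-# LANGUAGE DeepSubsumption #-}   -- enabling this restores the old behavior

module Subsumption where

expectsNested :: (Int -> forall b. b -> b) -> Char
expectsNested _ = 'x'

polyFirst :: forall b. Int -> b -> b
polyFirst _ x = x

-- Rejected under simplified subsumption, because GHC no longer eta-expands
-- implicitly to move the forall past the `Int ->` argument:
--
--   broken :: Char
--   broken = expectsNested polyFirst
--
-- The manual eta-expansion, accepted by every GHC version:
fixed :: Char
fixed = expectsNested (\n -> polyFirst n)
```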

All other breaking changes seem unlikely to bite in practice. But I guess they could still occur somewhere in the transitive closure of all your projects’ dependencies.

That is also good to note: making the compiler stable is one thing, but you also need to deal with the ecosystem. Is it really possible to stabilize a large enough part of the ecosystem?

Also, while making that list of breaking changes I thought that it would be very helpful if we had a comprehensive list of breaking changes for all GHC versions and perhaps also for other popular libraries. Every breaking change could include example error messages and migration strategies. That sounds a lot like the error messages index to me. Maybe this is a good idea for HF @david-christiansen or the stability working group?

Forking at 8.10 would give us another Eta. A laudable initiative, but it’ll fall behind too fast, feature-wise, to sustain interest.

3 Likes