I compare GHC to LLVM. LLVM obviously has bugs (quite a lot, actually), but it’s used by nearly everyone and it’s bootstrappable, so it’s easy to backport a fix, bootstrap from the fixed version, and check that the new version is reproducible. LLVM has a strong community, so releases are tested heavily before going public. GHC is the opposite. GHC and Haskell have much smaller communities, and GHC isn’t bootstrappable, so I can’t even imagine how they fix bugs or how they can be sure that a recent compiler binary isn’t corrupted by a 10-year-old bug. For a long time there were no stable versions, so it’s common to see projects pinned to a specific GHC version and specific package versions, because managing all the deps without an LTS is impossible. And it looks like GHC is still managed a little bit like an academic project, without strict checks before release; otherwise I can’t understand how issues like #26711 are possible.
GHC has, how do you say, a good test suite, which is why most bugs get caught. I would also say that most compiler bugs take the form of a theoretically correct program getting rejected, rather than a program producing an incorrect result. In short, I think the test suite and the wealth of Hackage packages are the evidence for GHC being trustworthy.
Please don’t compare LLVM with GHC; their underlying economics are on completely different scales.
I think the comparison to LLVM is pretty interesting actually!
LLVM receives much more funding and has many more active contributors than GHC. And yet issues like #190540 still happen, where any program that used musttail or the tailcc calling convention with byval parameters (i.e. essentially any non-trivial program that used musttail or tailcc) was totally miscompiled under LLVM 22.1.
This isn’t to say that there isn’t room for improvement in GHC’s processes (or LLVM’s), but I don’t think GHC is unique here in any way. Compilers are hard.
Yeah, that’s interesting if you’re familiar with both GHC and LLVM, but not necessarily if you’re familiar with neither.
Is “very” a trustworthy and robust answer?
I suppose that the notions of trustworthiness and robustness of software are currently very vaguely defined. Common proxies are:
- Number of automated tests.
- Amount of human scrutiny received (correlates with size of community, anticorrelates with size and complexity of the code).
- Severity of bugs that have been discovered (correlates with number of automated tests, anticorrelates with human scrutiny).
One issue with individual bugs is that testing can only show the presence of bugs, never their absence. Discovering severe bugs may be due to increased testing, not due to poorer code.
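
To make that last point concrete, here is a minimal sketch in Python. It is not taken from GHC or LLVM: the "optimization" and the function names `reference`, `optimized`, and `failing_inputs` are hypothetical, made up for illustration. It models a peephole rewrite of `(x * 2) / 2` to `x`, which is only valid if the multiplication cannot overflow. A small hand-picked test suite passes, while an exhaustive check over all 8-bit inputs finds the bug.

```python
def reference(x: int) -> int:
    """Evaluate (x * 2) / 2 step by step in 8-bit unsigned arithmetic."""
    doubled = (x * 2) & 0xFF  # the multiplication wraps around at 256
    return doubled // 2


def optimized(x: int) -> int:
    """Hypothetical peephole rewrite: (x * 2) / 2  ==>  x.

    This is wrong whenever x * 2 overflows 8 bits.
    """
    return x


def failing_inputs(inputs):
    """Return the inputs on which the rewrite disagrees with the reference."""
    return [x for x in inputs if reference(x) != optimized(x)]


# A plausible-looking hand-written test suite: every case passes.
small_suite = [0, 1, 2, 10, 100, 127]
print(failing_inputs(small_suite))          # []

# Exhaustive testing over the whole 8-bit domain exposes the bug.
print(failing_inputs(range(256))[:3])       # [128, 129, 130]
```

The passing small suite says nothing about the inputs it never tried, and the bug only shows up once testing is extended. More testing made the bug visible; it did not make the code worse.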