[Call for Ideas] Forming a Technical Agenda

A Faster Compiler

We need to talk about build times.

Anyone wanting to kick the tyres on Haskell could plausibly try out its most famous application
by setting it to build its second most famous application: ghc building pandoc.

Initially things would go rather well, especially if they have a decent workstation. They would
observe the enormous stack of dependencies, which GHC would duly tear through until it got to the
application itself, at which point everything would slow to a trickle for 15+ minutes. They would
conclude that these Haskellers can configure make to run in parallel, but that is clearly as far as
Haskell goes with parallel execution: Haskellers will talk you to death with all of their fancy
papers but can’t put them into practice.

If our Haskell-curious developer were to make some enquiries of people using Haskell in anger on
non-trivial code bases, they would quickly discover that build times are a real issue.

And this is all about to get worse. Disposable, fanless productivity laptops are being shipped with
8 cores, and the rest of the industry will follow suit; top-end workstations are being built with
64 cores (soon to be many more), with mid-range prosumer workstations getting tens of cores.

We cannot have our flagship, critical app be both slow and (out of the box) single-threaded.

I am told that one issue is the difficulty of effective load balancing when cabal/stack and GHC are
each doing their own thing. Surely there is some low-hanging fruit here.
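
To make the load-balancing point concrete, here is a sketch of the oversubscription problem, assuming a hypothetical 8-core machine; both settings below are real cabal.project fields, but nothing coordinates them:

    -- cabal.project (sketch)
    jobs: 8             -- cabal-install builds up to 8 packages at once
    package *
      ghc-options: -j8  -- ...and each GHC invocation may compile 8 modules
                        -- at once, so up to 64 jobs contend for 8 cores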

13 Likes

My perception of the Foundation is that it can tackle things that have been blocking language evolution for a long time. I would add some big tickets:

  • unify ghcup/cabal/stack/stackage
  • fix the Num hierarchy (see the first sketch after this list). To do this we need functional dependencies in base.
  • module-named (or namespaced) re-exports, so I can export ‘Map.fromList’, say, rather than make the user of a library add containers as a dependency (see the second sketch after this list). Python does this right, as an example.
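
On the Num point, a minimal sketch of why the hierarchy is too coarse (V2 is a hypothetical type, not from any library): addition and subtraction are natural, but Num forces (*), abs and signum on us as well.

    data V2 = V2 Double Double

    instance Num V2 where
      V2 a b + V2 c d = V2 (a + c) (b + d)
      V2 a b - V2 c d = V2 (a - c) (b - d)
      fromInteger n   = V2 (fromInteger n) (fromInteger n)
      (*)             = error "no canonical product on V2"  -- the class demands it anyway
      abs             = error "no canonical abs on V2"
      signum          = error "no canonical signum on V2"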
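And on re-exports, what we can write today dumps names into scope unqualified; there is no way to export them under a module prefix:

    -- Re-exporting a module exposes its names *unqualified*:
    module MyLib (module Data.Map.Strict) where
    import Data.Map.Strict
    -- Users of MyLib now see fromList, insert, ... bare (clashing with
    -- Prelude); nothing lets a library export them as Map.fromList.
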
4 Likes

I’m using Haskell to write some commercial software, and I’m also still a beginner, let’s say an industry beginner. I would appreciate:

  • fix historic design mistake(s), e.g. String as [Char] (see the first sketch after this list).
  • simplify, or allow managed code simplification (and enforce it), by implementing some option to turn on Simple Haskell or Boring Haskell, or an option to support a user-defined Haskell/GHC subset. E.g. -fhaskell_profile=<file>, where the file defines the supported extensions, so the user gets a hard error on any attempt to add or use an extension the company does not support. Or something like that (the second sketch after this list shows a stopgap that works today).
  • focus on quality of implementation. The GHC runtime (RTS) screams for attention here. A lot has been done already, but Haskell/GHC is still miles away from industry languages like Java. I’m thinking about GC and its scalability, debugging, runtime behaviour observability, etc.
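
On the String point, the historic mistake in miniature: String is a lazy linked list of Char, so each character costs several machine words and every operation is an O(n) list walk, which is why industrial code reaches for packed Text instead. A tiny sketch:

    import qualified Data.Text as T

    slow :: String            -- type String = [Char]: a linked list
    slow = replicate 1000000 'x'

    fast :: T.Text            -- packed, contiguous representation
    fast = T.replicate 1000000 (T.singleton 'x')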
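Nothing like -fhaskell_profile exists in GHC today, but as a stopgap HLint can already enforce a per-project extension allowlist in CI; a sketch (the approved set is just an example):

    # .hlint.yaml (sketch): fail the lint when code uses an extension
    # outside the approved set
    - extensions:
      - default: false
      - name: [OverloadedStrings, LambdaCase, DeriveGeneric]
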
2 Likes

Paying down tech debt in Cabal would be my personal choice: triaging issues and PRs, taking care of rough edges and bugs. I’m not sure it should be in the Foundation’s mission to have a vision for Cabal’s future, but clearing the path a bit so others can have an easier job of it does seem appropriate, and can be scoped or time-boxed more easily.

6 Likes

There are a lot of really good ideas here. Full disclosure: I’ve been involved in drawing up the first technical agenda during the HF’s formation.

One point I’d like to reiterate, which I think is important: we have lots of people who work on interesting problems, sometimes even with financial support behind them. We should also respect the effort people have put into their projects. I would hate to see the HF jump onto projects, take them over (likely unintentionally), and then just demotivate those who started them and drive them out.

I’d rather see the HF take on issues that are too large for any single person or company to take on, or that need large and broad consensus across the larger community.

For example, I know that Obsidian Systems is going to work on GHC performance; Hasura with Well-Typed as well; and @rae has an open position for an intern to look at performance at Tweag too. If the HF could provide some coordination support (if needed) and help align targets, that would likely be very helpful and well-spent resources, without needing to find someone to work on this directly for the HF.

In general I believe the Haskell Foundation should support existing work, but more importantly use its position to tackle issues that need a neutral party to form consensus, or that are too large for any single entity to even try to tackle because of how much social lubricant is needed to make them work.

One item that hasn’t been mentioned yet is fixing Template Haskell. This is a rather big issue: it will span many projects and people, and it will need broad consensus and a design for how to fix it. There are lots of ideas floating around, but someone would need to helm this and organise it.

Why do we need to fix Template Haskell? TH’s ability to run arbitrary IO at compile time is a serious security concern. Build a project that somehow pulls down a dependency that wipes your hard drive? Exfiltrates your ssh keys? Something else? I don’t know how anyone feels comfortable running arbitrary Haskell code with TH on their machines without aggressively sandboxing the environment in which the code is compiled.

The core problem is the whole idea that TH can execute arbitrary IO (file access, running processes, …). TH is also a major reason why cross-compilation is so painful, though that is not limited to its IO capabilities.
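
A minimal sketch of the concern (reading /etc/hostname stands in for something nastier, such as an ssh key):

    {-# LANGUAGE TemplateHaskell #-}
    module Splice where

    import Language.Haskell.TH (stringE)
    import Language.Haskell.TH.Syntax (runIO)

    -- This compiles today: the splice runs arbitrary IO on the build machine,
    -- so merely *building* a package can read (or write) any file it likes.
    hostname :: String
    hostname = $(runIO (readFile "/etc/hostname") >>= stringE)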

15 Likes

Yeah, I see people piling on in this thread with their favourite issues. I think it’s very important that the HF not go laser-focused on any one issue or another, but instead look at the overall functioning of “the machine”, with all its disparate volunteers, techniques, etc.

For example:

  • A lot of people want various things from base / the standard library, including more stability. I think a lot of these goals are irreconcilable in a single library, and this, not social/political issues, is why we’ve seen endless discussion of this stuff that goes nowhere. I want to see base broken up, and lots of it decoupled from GHC, so we can actually get ecosystem-level experimentation and yet also more stability. A fine-grained base avoids the collateral damage of fast-evolving or competing core infrastructure making life miserable for things that just use the uncontroversial bits.

    The “alternative preludes” don’t, I think, do the topic justice because (a) they are so big as to be geared towards end applications rather than the fine-grained libraries that are our ecosystem’s bread and butter, and (b) since they must use base, they are forced to paper over issues rather than extricate them properly.

  • Upgrading being hard, I think, directly relates to base being too big. base’s ever-increasing major version creates a ripple of busywork throughout the Cabal-verse (a sketch of the ripple appears after this list). A finer-grained base would “free” the highly stable portions from the ever-churning bits.

    (Fun fact: @snoyberg himself has written that, I paraphrase, Rust doesn’t really need a Stack. He ascribed that to a culture of less breakage; I point to Rust’s std never having had a breaking change in particular. I think that’s an over-correction that will cause std to accumulate mistakes and gotchas faster than base has, but perhaps it is the lesser evil. Breaking up base is analogous to Rust’s core vs std, and with versioning it should allow for the goldilocks sweet spot between Rust’s “never break” and Haskell today’s “always break”.)

  • Many projects have extremely slow CI that sucks up valuable time. For me the gold standard is correct-by-construction caching for bug-free incrementality, and only Nix provides that at the moment. I don’t think there is appetite for yet another GHC build system, but I hope to knock out all the weirdness that makes a bespoke build system required in the first place. cabal build ghc and stack build ghc are perfectly possible if we move the goalposts and accept building the compiler alone, with rts, base, etc. built as separate steps.

    Getting core libraries off extremely slow Travis is also important.
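
For concreteness, the base ripple mentioned in the second bullet: under the PVP, virtually every package pins a major upper bound on base, so each major bump forces an ecosystem-wide wave of bound relaxations, even for packages that only touch the stable bits.

    -- nearly every .cabal file carries something like this, and must be
    -- revised on each major base bump even if nothing it uses has changed
    build-depends: base >=4.14 && <4.15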

So yeah, few of these things directly solve the problems raised, especially in that they are focused on expert/core-volunteer pain points. But I really do believe that if we first increase the productivity with which we can fix everyone else’s issues, the diversion will pay for itself really quickly. The challenge is thus building the trust to pursue these intermediate milestones.


I also want to get ahead of any sentiment that “Haskell is not popular and other languages are because they do these things better”. I’ve been involved with the Rust community for many years (though less so recently), and I’ve dealt with the build systems / workflows of many languages as a Nix maintainer. A lot of tooling and build systems are bad or worse across the board. So I don’t think we should feel shame or envy working on this stuff: it’s not that we’re playing catch-up, it’s that there is an opportunity to crush the competition :slight_smile: .

lowest hanging fruit != areas where we are most behind!

2 Likes

A lot of TH can get away with no IO, and Haskell being Haskell, it should be possible to express that. Other TH, however, does need IO (though host IO, not target IO, as we get with cross-compilation today). Setup.hs is almost always used for evil, and thus I don’t think we can do much better there. Nix, however, shows that sandboxing builds is totally feasible, so I think we should just do that and call it a day.

Doing something major about Cabal (e.g. getting rid of Setup.hs, rethinking the Cabal / cabal-install boundary or whether it should exist at all, etc.) would be an excellent way to “cut the Gordian knot”. But those moves are very politically delicate, and I think not something the HF should tackle until it has more momentum.

However, I do think in the meantime Cabal should be studied for just how mind-bogglingly complex it is. Cabal, more than anything else, witnesses just how dramatically complexity can accrue. I think treating it as a cautionary tale can well inform how we deal with issues elsewhere, and doing so would also prepare the HF for when the beast can finally be confronted.

Yes, I think everyone agrees strongly about that. The HF should focus on things that aren’t already being done. It would be counter-productive to duplicate, or (worse) supplant existing efforts. Rather we should seek to enhance and support them. “Thank you for doing X; how can we help make you more productive?”.

“Is anyone already doing this?” should be a question we ask about every proposed idea.

4 Likes

[Disclaimer: I am on the Technical Agenda Task Force]

The Technical Agenda Task Force is acutely aware of this issue of trampling on the toes of folks who are already working on stuff; arguably this is incorporated into the name: our object of focus is the agenda, not necessarily making stuff. Current thinking is that if an issue is important enough (most likely of strategic importance), then we ought to consider adopting it even if others are working on parts of the problem. In that case the TATF should be careful not to disrupt existing efforts, for all the reasons you describe, but we might want to make enquiries about what is being done and identify complementary areas to tackle, take a watching brief, render assistance, or, where appropriate, help with coordination between the different efforts.

2 Likes

In general I believe the Haskell Foundation should support existing work, but more importantly use its position to tackle issues that need a neutral party to form consensus, or that are too large for any single entity to even try to tackle because of how much social lubricant is needed to make them work.

I agree wholeheartedly with this statement. I should add that I am sceptical about the HF taking on large technical projects (a cabal/stack replacement, say), or projects whose goals are not tightly defined, e.g. an all-new all-singing, all-dancing Prelude that will fix all of its problems. Note, I am not saying we should not try to make things better, but I would want to look closely at Grand Unifying Projects that seek to determine the answer to Life, the Universe and Everything.

1 Like

Here are a few ideas from other language foundations:

  • Could HF set aside a budget to bring under-privileged contributors and maintainers out to conferences or pay to get their work published?
  • Concerning adoption, could the HF set aside a budget to give user groups and community organizers funds to rent spaces and buy pizza? That would enable us to run more Haskell user groups and potentially gain more contributors.
  • Would the HF be willing to put together a training program to teach people how to contribute to GHC or the core libraries? Something like the eudyptula challenge or similar?
  • I also agree with a lot of the other suggestions that the core library could be refined and expanded. Supporting common data structures such as vectors, hash maps, queues, etc., and the new linear types feature would be awesome DX, making Haskell easy to reach for and requiring less setup/tooling.
  • Supporting the DTF and other community efforts with a recognition site where contributors can opt in to receive badges and awards for their contributions?
5 Likes

I want to see base broken up, and lots of it decoupled from GHC, so we can actually get ecosystem-level experimentation and yet also more stability.

Hear hear! But how exactly would Haskell Foundation help accomplish this task? Pay a developer? Establish a bounty? Since base is really a creature of GHC, perhaps this should start as a GHC initiative first?

Now that I mention it, a bounty program could actually make a lot of sense.

2 Likes

I think the main obstacle is the current prohibition of orphan instances, which linearizes base’s modules in a rather contorted way. I have started a wiki page here: https://gitlab.haskell.org/ghc/ghc/-/wikis/Rehabilitating-Orphans-with-Order-theory. I don’t think the problem is terribly hard, but I do think it deserves an academic paper before anything gets merged. This would mean mustering the research side of the GHC community for the sorts of problems that have normally been dealt with by the non-researcher side, which (to me at least) is an exciting opportunity but also a non-negligible challenge.
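
For readers unfamiliar with the problem, here it is in miniature (package and module names hypothetical): an instance may live only in the package defining its class or its type, which is what forces base’s modules into a fixed order.

    module Pretty where             -- in package pretty-class
    class Pretty a where pretty :: a -> String

    module Payload where            -- in package payload-types
    data Payload = Payload

    module Pretty.Payload where     -- in a third package: this instance is
    import Pretty                   -- an orphan, so today convention pushes
    import Payload                  -- it back into one of the two packages
    instance Pretty Payload where   -- above, fixing their dependency order
      pretty _ = "Payload"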

To me the big question is whether to plough right into that, or to first focus on CI to free up more capacity for a new research project. I would normally say yes, CI first (capital projects to cut operating costs before capital projects for other purposes), but the solution for CI I trust most, doubling down on Nix everywhere, is highly politically fraught, perhaps more so than greenfield research about orphan instances with radical implications for how libraries are organized!

Compiler performance. I’m not sure if there is a need here for tooling, or if it’s just a matter of lots of work. Possibly this could also mean tools to profile our own builds in order to speed them up (e.g. identifying bottlenecks in parallelism).

Anything HLS needs from GHC’s side seems pretty high-leverage, especially anything that helps HLS on large codebases.

Better documentation for libraries would be great. Not sure if there is a facilitation role here or if it’s pure work.

Mac support, whenever issues come up. Not sure what the facilitation approach is here; maybe giving GHC developers Macs, or more Mac CI.

2 Likes

Both of them actually :confused:

I often find myself in the situation of having to recommend books and introductory material on Haskell to students. On the Haskell website https://www.haskell.org/documentation/ there is a list of beginner books. A quick check shows that among these books only three, “Real World Haskell”, “Learn You a Haskell” and “Developing Web Applications with Haskell and Yesod”, are freely available (and only in an HTML version, not in the typeset PDF form).

While we have a well-stocked university library which our students can use, I still prefer to recommend material that is freely available on the internet. Maybe the Haskell Foundation could try to contact the authors and publishers to see whether it is possible to make some of these books available as PDFs. Some of these books have been available for several years, and by now they are usually not particularly profitable for the publishers or authors anymore.

Several of the books have been published by Cambridge University Press, a non-profit publisher which is open to allowing freely available PDF versions of its books (cf. the preface to Tom Leinster’s book on Basic Category Theory: https://arxiv.org/pdf/1612.09375.pdf).
There is also precedent for companies sponsoring the open-access availability of books published by for-profit publishers, and several other programming languages have books that are simultaneously available in print and for free online.

Many thanks again to everyone responsible for setting up the Haskell foundation and making the joy of programming in Haskell available to more people :slight_smile:

10 Likes

I also agree that there’s a lot of work that can be done on documentation, learning resources and the haskell.org website. But I’m under the impression that this thread is about more technical issues? Maybe it’s better to have a separate thread on this when the foundation is ready to tackle it.

2 Likes

Hey all,

I wanted to capture some initial thoughts around pain points we have been observing over the past several months (years, perhaps) through our fairly extensive industrial use of Haskell at Well and Soostone. I’ll post some high-level descriptions here, but can certainly elaborate further if helpful. As a disclaimer, I feel strongly only about the first couple of points; I’m capturing the rest here for discussion’s sake, and readily admit I couldn’t claim they’re the most important priorities at the level of the ecosystem.

Finally, please note that some of these comments come from an environment where we champion the use of Haskell in competition with more mainstream alternatives like JavaScript/NodeJS/TypeScript/etc., and where practical results are what drive/justify the use of Haskell.

Faster compilation

To set a bold target, we need to reduce our compile times to something like 10% of what they are today. The amount of productivity lost to this one particular pain point is astronomical, to put it mildly. That said, any and all improvements would be most welcome, even if just a small percentage.

Perhaps a better way to say this is to make GHC compilation speed one of our primary, top-level objectives for Haskell adoption.

To illustrate the point further by offering a trade-off: I would much rather delay progress on advanced typing features for the sake of compiler speedups, or at least to ensure that they don’t cause further speed regressions. This is given, of course, that Haskell already provides an amazingly expressive programming model unmatched elsewhere.

If helpful, here’s another mental model: when GHC compile times slow down by 10% for the benefit of a new advanced feature, industrial users immediately feel a multi-million-dollar hit to their productivity/happiness/QoL, in return for a modest offsetting gain only in the use cases that benefit from the new advanced feature.

Lower memory usage

This is less important than compilation speed, but still worth mentioning. We’re currently unable to work on our 500+ module project on a 16GB RAM laptop. The immediate solution is of course to get larger machines, which is what we’re having to do, but it isn’t doing us any favors in arguments against, say, JavaScript.

Add to this HLS, ghcid, the REPL, etc., and you really need a heavyweight machine to work effectively on a growing commercial codebase.

Mac support needs to get better

We’re still running into Mach-O linker errors on Mac on large projects and having to jump through hoops to avoid them. In some corporate setups (especially under regulation), developers are forced onto a single platform (e.g. Mac), so ensuring proper “industrial” compatibility is pretty important.

An answer for security scanning / static analysis needs

Big corps are increasingly mandating that their vendors use static analysis tools for cybersecurity and other compliance reasons. We should think about improving our answer here.

A very clean, extensible Nix project setup/skeleton

I know this is a contentious point, as we have a multitude of preferences in our ecosystem, but I’m of the opinion that reducing the time-to-prototype for a full-stack application setup would be a big win for increasing Haskell’s industrial adoption.

  • Reduce energy needed to spin up a new project to 0
  • Have clear path to transforming into a serious project
  • Must be easily extensible to add non-Haskell dependencies; python,
    databases, tensorflow, whatever.

A clear, easy path to modern DevOps style deployments

Manual management of VMs with SSH access is a non-starter in serious corporate setups: it immediately fails compliance audits and a variety of needed certifications. Having an easy answer for productionalization via something like Terraform, containers, GitLab-runner-style CI/CD deployments, etc., would go a long way, so folks don’t have to reinvent the wheel themselves.

13 Likes

tl;dr I think the Haskell Foundation should focus on low-hanging fruit (namely documentation and guidelines) for most of the problems outlined in @ozataman’s original post, rather than focusing its effort primarily on technical improvements to GHC.

Additionally I believe that the Haskell Foundation should strive to be as unopinionated as possible in its focus and recommendations, and that separate “Working Groups” should be established to address concerns for specific areas (e.g. DevOps, Nix, etc.).


I want to preface my response by saying that I think all of the Haskell-specific points in this post are worth thinking on and improving in GHC; I’m taking the time to write this up for two reasons:

  1. So that other potential industrial Haskell users can chart a path for themselves that avoids these problems without having to wait for GHC to implement technical improvements
  2. To encourage the folks working on a technical agenda to focus on what I believe to be the “lowest hanging fruit” for us as a community: establishing a set of best practices, design patterns, and general frameworks for architecting large Haskell applications that avoid these sorts of pitfalls

Faster Compilation

tl;dr We should always strive to improve GHC’s performance, but for most users I posit that the slowdowns people have experienced can be immediately mitigated by a proper set of best practices.

I’m going to make the, perhaps bold, assertion that GHC’s performance as a compiler is actually mostly fine, and that the vast majority of compile-time regressions users encounter result from writing code and pulling in dependencies that rely on the increasingly advanced language features GHC incorporates.

I’m not suggesting that compilation time isn’t a good metric to target for improvement (regression-testing compiler performance between releases is a good thing), but that for most codebases I have experience with, the issue is more often that Haskell developers tend to reach for extremely baroque language features to accomplish common tasks.

In other words: Haskell programmers should learn from C++ developers and establish some guidelines for (a toy illustration follows the list):

  • practical “novelty budgets” within common classes of Haskell codebases
  • the performance implications of certain language features which often tend to make up these novelty budgets
  • playbooks for recovering from situations where unmitigated technical debt has “blown” a project’s novelty budget
    • I imagine there are plenty of industrial users who have ended up in this situation and might be able to offer some insight
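
To make “novelty budget” concrete, here is the flavour of code that is cheap to write but disproportionately expensive to compile (a toy example, not from any codebase): GHC must unfold the type family once per list element, at every use site.

    {-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
    module Budget where

    import GHC.TypeLits (Nat, type (+))

    -- Each use of Sum makes the constraint solver walk the whole list;
    -- stack enough of this in signatures and compile times balloon.
    type family Sum (xs :: [Nat]) :: Nat where
      Sum '[]       = 0
      Sum (x ': xs) = x + Sum xs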

Lower Memory Usage

tl;dr As with compilation speed, GHC’s resource consumption is very likely to be “adequate” so long as developers avoid the features that are known to trip the compiler up, unless those features are absolutely necessary.

I feel the same way about this as I do about the section above: there are probably many areas where GHC can be improved on a technical level, but the vast majority of memory bloat in most industrial codebases will almost certainly come from people using advanced/complex language features.

I feel fairly safe in making the claim that GHC (and much of the associated tooling) should be fine dealing with projects upwards of 1000 modules and 100k LoC without much issue as long as the project is well-architected.

Likewise (although you didn’t mention it), the runtime footprint of Haskell applications can be extremely small, yet many industrial codebases find themselves consuming vast amounts of memory or burning countless CPU cycles.
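
For instance, the most common runtime-footprint bug is not an advanced feature at all but ordinary laziness; a minimal sketch:

    import Data.List (foldl')

    -- Lazy foldl builds ten million thunks before forcing any of them;
    -- the strict foldl' runs in constant space.
    leaky, fine :: Int
    leaky = foldl  (+) 0 [1 .. 10000000]
    fine  = foldl' (+) 0 [1 .. 10000000]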

The key point, in my mind, is for us to understand what sorts of constructs/behaviors cause GHC’s resource consumption to balloon (both at compile-time and at runtime), and to avoid them as much as possible.

Mac Support Needs to Get Better

As far as I’m aware, linker errors on macOS these days are wholly confined to Nix build setups.

If there are mitigations on the GHC side of things that can improve this, we should certainly pursue them; however, I don’t think it’s worth prioritizing, as I’ve not heard of any such errors from stack or cabal-install users in recent memory.

A Very Clean, Extensible, Nix Project Setup/Skeleton

tl;dr Industrial users of Haskell and Nix should establish a “Nix Working Group”, but the Haskell Foundation (in general) should not concern itself with advocating for/addressing problems with Nix in any official capacity.

I don’t think that the Haskell community should get into the habit of evangelizing Nix, especially in industrial environments.

Nix is extremely complex, arguably more so than Haskell itself; Haskell users are already much more likely to recommend Nix than users of other languages are, and I worry that this gives the impression that Nix is the One True Way to develop and deploy Haskell applications.

A Clear, Easy Path to Modern DevOps-Style Deployments

tl;dr I feel very strongly that the Haskell Foundation should strive to be as unopinionated as possible in advocating for build/deployment/ops solutions so that we can maximize Haskell’s potential for industrial adoption.

Much like with Nix, I think that while it might be useful to establish a “DevOps Working Group”, it shouldn’t be something the Haskell Foundation directly concerns itself with, beyond making sure that Haskell tooling conforms to whatever the state of industrial “best practices” is for artifact generation.

Realistically, I think that there are orders of magnitude more resources available for DevOps best practices than for Haskell these days; we shouldn’t expend effort in an area where any competent company should be expected to educate itself accordingly.

I think, as I’ve probably expressed in some of my other comments, the Haskell community does itself a disservice by tying itself so strongly to Nix. I get the impression that many community members feel that the “ideal” solution is something involving nix-copy-closure and a bunch of NixOS machines. While this may be nice for an organization that’s bought into Nix/NixOS, it’s extremely non-standard in the realm of industrial application deployment.

7 Likes