[Call for Ideas] Forming a Technical Agenda

bgamari · February 17, 2021, 12:52am

Hi all,

With the Haskell Foundation’s board and executive director established and fundraising going well, it is time to pin down an agenda of technical priorities which the Foundation’s resources should support. In this thread I am asking for your help in brainstorming ideas for this agenda.

Since its introduction, the Haskell Foundation has been framed primarily as an enabling institution. The Foundation’s role is to facilitate, not supplant, existing community efforts; while the Foundation can fund new projects, this should be the option of last resort.

In particular, we want to focus on projects which:

- facilitate the commercial adoption of Haskell and support the broader open source Haskell ecosystem;
- would benefit from financial or technical resources that serve as a force multiplier for project development;
- in the case of projects that require developer time, have clear, user-benefiting milestones that are achievable within six months.

The goal of this thread is to collect a set of moderately concrete projects which we can then prioritize and craft into an agenda for the Foundation’s first-year activities. Please reply with your own suggestions for discussion.

bgamari · February 17, 2021, 12:53am

To help start discussion, below I give a non-comprehensive list of themes and projects that are (in my opinion) consistent with the goals of the Foundation. These are offered only in the spirit of kindling discussion; you are encouraged not to simply respond in agreement but rather to reply with your own ideas!

Haskell Language Server: Users today expect robust, easy-to-use IDE support. The Haskell Language Server has made great strides towards this goal, but much remains to be done. We should consider what can be done to push the HLS forward and ensure that it is packaged in an easily consumable form. Relevant projects might include contributing technical resources (e.g. CI capacity) and supporting developers to coordinate releases and technical planning.
Addressing Windows papercuts: Windows support has recently been a persistent problem. Between compiler bugs, native toolchain issues, and packaging troubles, there are numerous sharp edges that a prospective Haskell user might encounter on Windows. Nevertheless, Windows remains the most popular end-user operating system and as such has unique commercial relevance. Work might include:
- Triage open GHC tickets, and work to resolve them in order of end-user relevance
- Work to improve robustness and compile-time performance on Windows by migrating GHC to an LLVM-based native toolchain
- Advertise the improved support and actively seek out remaining pain points
Document and streamline profiling and performance analysis practices: GHC/Haskell has a wealth of tools for understanding runtime performance. However, these tools often do not work cleanly together and there are few resources that describe how these tools can be used to address concrete performance problems. As performance debugging is an important part of the development process, this is a significant hurdle to commercial adoption.

The Rust community does an excellent job of documenting these practices in an approachable manner (e.g. see the Rust Performance Book). We would be well-advised to follow their model and facilitate/sponsor the writing of a comprehensive guide describing common Haskell performance problems, relevant tools, and their application.

Possible topics include:
- Characterising memory usage with GHC’s built-in heap profiler and eventlog2html
- Locating thunk leaks with ghc-debug
- Understanding runtime characteristics with the eventlog, ThreadScope, and ghc-events
- Diagnosing, fixing, and avoiding common sources of long compile times
- Integrating production monitoring as mentioned below
Develop best practices for production telemetry: GHC’s eventlog exposes a wealth of information about the runtime behavior of Haskell workloads. Recent GHC development opens the door to moving the integrating this information
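As a taste of what the performance guide proposed above might cover, here is a minimal sketch (assuming only `base`; the function names are illustrative) of a classic space leak and its fix. The `SCC` pragmas name cost centres for GHC's profiler and are ignored in ordinary builds. Compiling with `ghc -prof -fprof-auto` and running the program with `+RTS -hc -RTS` writes a `.hp` heap profile that `hp2ps` can render; for `meanLeaky` that profile would be expected to grow with the input, while `meanStrict` should stay roughly flat.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Leaky version: 'xs' is traversed twice (once by 'sum', once by 'length'),
-- so every list cell stays live until both traversals have finished.
meanLeaky :: [Double] -> Double
meanLeaky xs = {-# SCC "meanLeaky" #-} sum xs / fromIntegral (length xs)

-- Single-pass version: a strict left fold carries the running sum and count
-- together, so each cell can be garbage-collected as soon as it is consumed.
meanStrict :: [Double] -> Double
meanStrict xs = {-# SCC "meanStrict" #-} total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    -- The bang patterns force the previous sum and count at every step,
    -- so no chain of (+) thunks builds up inside the tuple.
    step (!s, !n) x = (s + x, n + 1)
```

Both functions return the same answer; the difference is only in how much of the list must be retained while computing it, which is exactly the kind of problem a heap profile makes visible.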

rebeccaskinner · February 17, 2021, 3:42am

A couple of ideas that I’ve had bouncing around my head for a while that might be worth considering:

General improvements to the security of Haskell and the Haskell infrastructure

Evaluating the security of Hackage, and looking at what opportunities there might be to improve security there (requiring package signing? 2FA? security scanning of uploaded packages?)
Documenting best practices for securely managing Haskell dependencies, for example running an internal Hackage mirror, or how best to make use of existing cabal security facilities. See the recent example of the supply chain attack.
Is there anything we could, or should, do to mitigate risks caused by Setup.hs or Template Haskell being able to perform arbitrary IO? There might be some opportunity there, either with tooling improvements or documentation.
Evaluate how well Haskell is supported by existing static analysis and security scanning tools, and make recommendations for organizations that have requirements for static analysis (there’s not much we can do here for commercial products, but it might generally be good to have this on the radar).

Consider expanding the core libraries
Not much specific to contribute here, but I wonder if it would be worth trying to reach more of a “batteries included” state in the core libraries, either by looking at libraries that we might want to promote, or by considering what deficiencies exist in the library ecosystem and soliciting the community to build libraries to fill those gaps.

snoyberg · February 17, 2021, 4:05am

Cleaning up base and the other core libraries, including things like
- Fixing unnecessary laziness
- Avoiding partial functions
- Providing better data structures, such as using ByteString/Text instead of String or Vector instead of lists, in APIs
Redesign the GHC warning system to reduce the maintenance burden. In my experience, 95% of the changes I make for new versions of GHC are “add some CPP or other hack to avoid an unused import warning because something has been re-exported from elsewhere.” We could either modify the warnings themselves, or change how we maintain libraries.
More controversial: add some kind of streaming data concept into base, probably based on stream fusion.
Quality of life improvement: anything that can speed up compilation time is a Good Thing
Policy direction: define better rules for GHC and library backwards compatibility. Optimize them so that it’s practical for people to maintain their libraries, to avoid attrition.
Ultimately: some serious discussion around what the future of ghcup/cabal/stack/stackage looks like together.
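Two of the items above, partial functions and unnecessary laziness, in miniature. This is a sketch assuming only `base`; the function names are made up for illustration and are not proposed APIs.

```haskell
import Data.List (foldl')
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Partial: 'head' throws an exception at runtime on an empty list.
firstPartial :: [a] -> a
firstPartial = head

-- Total alternative 1: make the empty case explicit in the result type.
firstMaybe :: [a] -> Maybe a
firstMaybe []      = Nothing
firstMaybe (x : _) = Just x

-- Total alternative 2: require a non-empty list in the argument type,
-- so the caller has to establish non-emptiness up front.
firstNonEmpty :: NonEmpty a -> a
firstNonEmpty = NE.head

-- Unnecessary laziness: Prelude 'foldl' can build a chain of (+) thunks
-- as long as the list before any addition is actually performed.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- The strict variant forces the accumulator at each step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Without optimisation, `sumLazy` on a long list builds the whole thunk chain before evaluating it; GHC's strictness analysis often rescues this at `-O`, which is part of why the problem is easy to miss until it bites.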

adamse · February 17, 2021, 7:19am

Replace String = [Char] with a UTF-8-based string type.
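For contrast with `String = [Char]`, a minimal sketch using the `text` and `bytestring` packages, which ship with GHC but are not part of `base`. It shows that a `String` is a linked list with one cell per code point, while the UTF-8 encoding of the same text is measured in bytes. Which internal encoding `Data.Text` itself uses depends on the version of `text`; the names below are illustrative.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "naïve", with the 'ï' (U+00EF) written as a decimal escape.
-- As a 'String' this is a linked list of five Chars.
greetingString :: String
greetingString = "na\239ve"

-- 'Text' is a packed representation of the same five code points.
greetingText :: T.Text
greetingText = T.pack greetingString

-- Encoding to UTF-8 gives a 'ByteString' whose length counts bytes,
-- not code points: U+00EF takes two bytes in UTF-8.
greetingUtf8 :: BS.ByteString
greetingUtf8 = TE.encodeUtf8 greetingText
```

Here `length greetingString` and `T.length greetingText` are both 5, while `BS.length greetingUtf8` is 6. A UTF-8-based string type in `base` would store those six bytes directly instead of a list of boxed characters.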

chris · February 17, 2021, 1:19pm

A Faster Compiler

We need to talk about build times.

Anyone wanting to kick the tyres on Haskell could plausibly try out its most famous application,
setting it to build its second most famous application: i.e., GHC building pandoc.

Initially things would go rather well, especially if they have a decent workstation. They would observe
the enormous stack of dependencies, which it would duly tear through, until they got to the application itself, at
which point it would all slow down to a trickle for 15+ minutes. They would conclude that these Haskellers can
configure make to run in parallel, but clearly that is as far as Haskell goes in parallel
execution. Haskellers will talk you to death with all of the fancy papers but can’t put it into
practice.

If our Haskell-curious developer were to make some enquiries of people using Haskell in anger on
non-trivial code bases they would quickly discover that build times are a real issue.

And this is all about to get worse. Disposable, fanless productivity laptops are being shipped with
8 cores and the rest of the industry will follow suit, while top-end workstations are being built
with 64 cores, soon to be many more, with mid-range prosumer workstations getting tens of cores.

We cannot have our flagship, critical app being both slow and (out of the box) single-threaded.

I am told that one issue is the difficulty of load balancing effectively when
cabal/stack and GHC are each doing their own thing. Surely there is some low-hanging fruit here.

tonyday567 · February 17, 2021, 8:27pm

My perception of the Foundation is that it can tackle things that have been blocking language evolution for a long time. I would add some big-ticket items:

unify ghcup/cabal/stack/stackage
fix the Num hierarchy. To do this we need functional dependencies in base.
module-named (or namespaced) re-exports, so that I can export, say, ‘Map.fromList’, rather than making users of my library add containers as a dependency. Python is one example of a language that does this right.
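One way to read the Num-hierarchy point, sketched under loud assumptions: the classes `Additive` and `Scalable` and the operators `^+^` and `*^` are hypothetical and do not exist in `base`. Splitting addition out of `Num` lets a vector type be added without pretending to support `abs` or `signum`, and the functional dependency `v -> s` on the scaling class is what lets GHC work out the scalar type from the vector type.

```haskell
{-# LANGUAGE FunctionalDependencies #-}

infixl 6 ^+^
infixl 7 *^

-- Hypothetical class: addition split out of 'Num', so a type such as a 2D
-- vector can be added without also implementing (*), abs, signum, etc.
class Additive a where
  zero  :: a
  (^+^) :: a -> a -> a

-- Hypothetical class for scalar multiplication. The functional dependency
-- 'v -> s' states that the vector type determines its scalar type, so GHC
-- can infer 's' at use sites (e.g. for a bare numeric literal).
class Additive v => Scalable s v | v -> s where
  (*^) :: s -> v -> v

-- A toy vector type used to exercise the classes.
data V2 = V2 Double Double
  deriving (Eq, Show)

instance Additive V2 where
  zero = V2 0 0
  V2 a b ^+^ V2 c d = V2 (a + c) (b + d)

instance Scalable Double V2 where
  k *^ V2 a b = V2 (k * a) (k * b)
```

With the dependency in place, `2 *^ V2 1 2` type-checks and evaluates to `V2 2.0 4.0`; without `| v -> s`, GHC cannot determine the type of the literal `2` from the vector type alone and rejects the expression.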

kgardas · February 17, 2021, 9:13pm

I am using Haskell to write some commercial software, and I am still just a beginner, let’s say an industry beginner. I would appreciate:

fix historic design mistake(s), e.g. String as [Char] etc.
simplify, or allow managed code simplification (and enforce it), either by implementing an option to turn on Simple Haskell or Boring Haskell, or an option to support a user-defined Haskell/GHC subset. E.g. a -fhaskell_profile= option pointing at a file that defines the supported extensions, so that the user gets a hard error when trying to add or use an extension the company does not support. Or something like that.
focus on quality of implementation. The GHC runtime/RTS screams for attention here. A lot has been done already, but Haskell/GHC is still miles away from industry languages like Java. I’m thinking about GC and its scalability, debugging, runtime behavior observability, etc.
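The `-fhaskell_profile=` option suggested above is hypothetical; GHC has no such flag. As a rough sketch of the check it would perform, here is a standalone pass over source text that compares `LANGUAGE` pragmas against an allow-list. Everything in it (`allowedExtensions`, `languageExtensions`, `disallowedExtensions`, the helpers) is an illustrative name, and a real implementation would need to hook into GHC or the build tool to see every way an extension can be enabled.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf, isSuffixOf)

-- Hypothetical team profile: the only extensions this codebase may enable.
allowedExtensions :: [String]
allowedExtensions = ["OverloadedStrings", "ScopedTypeVariables", "LambdaCase"]

-- Collect extension names from single-line {-# LANGUAGE ... #-} pragmas.
-- Simplifications: multi-line pragmas, lower-case "language", and extensions
-- enabled from a .cabal file or via -X flags are not detected.
languageExtensions :: String -> [String]
languageExtensions = concatMap fromLine . lines
  where
    open = "{-# LANGUAGE"
    close = "#-}"
    fromLine l
      | open `isPrefixOf` t && close `isSuffixOf` t =
          filter (not . null) (map trim (splitOnComma body))
      | otherwise = []
      where
        t = trim l
        -- The text between the opening "{-# LANGUAGE" and the closing "#-}".
        body = take (length t - length open - length close) (drop (length open) t)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split a string on commas, keeping empty fields.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Extensions used in a source file that the profile does not allow.
disallowedExtensions :: [String] -> String -> [String]
disallowedExtensions allowed = filter (`notElem` allowed) . languageExtensions
```

For a file containing `{-# LANGUAGE OverloadedStrings, GADTs #-}` and `{-# LANGUAGE LambdaCase #-}`, `disallowedExtensions allowedExtensions` reports `["GADTs"]`, which a build could then turn into a hard error.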

chreekat · February 17, 2021, 10:39pm

Paying down tech debt in Cabal would be my personal choice. Triaging issues and PRs, taking care of rough edges and bugs. I’m not sure if it should be in the Foundation’s mission to have a vision for Cabal’s future, but clearing the path a bit so others can have an easier job of it does seem appropriate, and can be scoped or time-boxed more easily.

angerman · February 18, 2021, 2:44am

There are a lot of really good ideas here. Full disclosure: I’ve been involved in drawing up the first technical agenda during the HF’s formation.

One point I’d like to reiterate, because I think it is important: we have lots of people who work on interesting problems, some of them even with financial support behind them. We should also respect the effort people have put into those projects. I would hate to see the HF jump onto projects, take them over (likely unintentionally), and then demotivate those who started them and drive them out.

I’d rather see the HF take on issues that are too large for any single person or company to take on, or that need large and broad consensus across the larger community.

For example, I know that Obsidian Systems is going to work on GHC performance, as is Hasura together with Well-Typed, and @rae has an open position at Tweag for an intern to look at performance as well. If the HF could provide some coordination support (if needed) and help align targets, that would likely be very helpful and a good use of resources, without needing to find someone to work on this directly for the HF.

In general, I believe the Haskell Foundation should support existing work, but more generally use its position to tackle issues that need a neutral party to form consensus, or that are too large for any single entity to even try to tackle because of the amount of social lubricant needed to make them work.

One item that hasn’t been mentioned yet is fixing Template Haskell. This is a rather big issue: it is going to span many projects and people, and it will need broad consensus on a design for how to fix it. There are lots of ideas floating around, but someone would need to helm this effort and organise it.

Why do we need to fix Template Haskell? TH’s ability to run arbitrary IO at compile time is a serious security concern. Build a project that somehow pulls down a dependency that wipes your hard drive? Exfiltrates your SSH keys? Something else? I don’t know how anyone feels comfortable running arbitrary Haskell code with TH on their machines without aggressively sandboxing the environment in which the code is compiled.

The concern is the whole idea that TH can execute arbitrary IO (file access, running processes, …). TH is also a major reason why cross-compilation is so painful, though that is not limited to its IO capabilities.
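To make the concern about compile-time IO concrete, a minimal sketch assuming GHC with the `template-haskell` package that ships with it (`builtBy` is an illustrative name). `runIO` is the Template Haskell function that lifts an `IO` action into the `Q` monad in which splices run.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Data.Maybe (fromMaybe)
import Language.Haskell.TH (runIO, stringE)
import System.Environment (lookupEnv)

-- The splice below runs while this module is being *compiled*.
-- 'runIO' lifts any IO action into the Q monad; a splice in a dependency
-- could just as well read ~/.ssh, open a socket, or spawn a process.
-- This one only reads the build machine's USER environment variable and
-- embeds the result in the program as a string literal.
builtBy :: String
builtBy = $(runIO (lookupEnv "USER") >>= stringE . fromMaybe "unknown")
```

Because the splice runs during compilation, the resulting binary reports the user who built it, whoever later runs it; a malicious splice in a dependency would run with the same privileges as the build itself.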

Ericson2314 · February 18, 2021, 3:26am

Yeah, I see people piling on in this thread with their favorite issues. I think it’s very important that the HF not just get laser-focused on any one issue or other, but instead look at the overall functioning of “the machine”, with all its disparate volunteers, techniques, etc.

For example:

A lot of people want various things from base / the standard library, including more stability. I think a lot of these goals are irreconcilable in a single library, and this, not social/political issues, is why we’ve seen endless discussion of this stuff that goes nowhere. I want to see base broken up and lots of it decoupled from GHC so we can actually see ecosystem-level experimentation, and yet also more stability. A fine-grained base would avoid collateral damage from fast-evolving or competing core infrastructure making life miserable for code that just uses the uncontroversial bits.

I don’t think the “alternative preludes” do the topic justice, because a) they are so big that they are geared towards end applications rather than the fine-grained libraries that are our ecosystem’s bread and butter, and b) since they must use base, they are forced to paper over issues rather than extricate them properly.
I think the difficulty of upgrading relates directly to base being too big. Each of base’s ever-increasing major versions creates a ripple of busywork throughout the Cabal-verse. A finer-grained base would “free” the highly stable portions from the ever-churning bits.

(Fun fact: @snoyberg himself has written that (I paraphrase) Rust doesn’t really need a Stack. He ascribed that to a culture of less breakage; in particular, I point to Rust’s std never having had a breaking change. I think that’s an over-correction that will cause std to accumulate mistakes and gotchas faster than base has, but it is perhaps the lesser evil. Breaking up base is analogous to Rust’s core vs std split, and I think with versioning it should allow for a Goldilocks sweet spot between Rust’s “never break” and today’s Haskell “always break”.)
Many projects have extremely slow CI that sucks up valuable time. For me the gold standard is correct-by-construction caching for bug-free incrementality, and only Nix provides that at the moment. I don’t think there is appetite for yet another GHC build system, but I hope to knock out all the weirdness that makes a bespoke build system necessary in the first place. cabal build ghc and stack build ghc are perfectly possible if we move the goalposts and accept building the compiler alone, and build rts, base, etc. as separate steps.

Getting core libraries off extremely slow Travis is also important.

So yeah, few of these things directly solve the problems raised, especially since they focus on expert/core-volunteer pain points. But I really do believe that if we first increase the productivity with which we can fix everyone else’s issues, the diversion will pay for itself really quickly. The challenge is thus building the trust to pursue these intermediate milestones.

I also want to get ahead of any sentiment that “Haskell is not popular and other languages are because they do these things better”. I’ve been involved with the Rust community for many years (though less recently), and I’ve dealt with the build systems / workflows of many languages as a Nix maintainer. A lot of tooling and build systems are just as bad or worse across the board. So I don’t think we should feel shame or envy when working on this stuff: we are not doing it because we’re playing catch-up, but because there is an opportunity to crush the competition.

lowest hanging fruit != areas where we are most behind!

Ericson2314 · February 18, 2021, 3:50am

A lot of TH can get away with no IO, and Haskell being Haskell should make that possible. Other TH, however, does need IO (though host IO, not target IO, as we get with cross-compilation today). Setup.hs is almost always used for evil, and thus I don’t think we can do much better. Nix, however, shows that sandboxing builds is totally feasible, so I think we should just do that and call it a day.

Doing something major about Cabal (e.g. getting rid of Setup.hs, rethinking the Cabal / cabal-install boundary or whether it should exist at all, etc.) would be an excellent way to “cut the Gordian knot”. But those are very politically delicate, and I think not something the HF should tackle until it has more momentum.

However, I do think that in the meantime Cabal should be studied for just how mind-bogglingly complex it is. More than anything else, Cabal shows just how dramatically complexity can accrue. I think treating it as a cautionary tale can inform how we deal with issues elsewhere, and would also prepare the HF for when the beast can finally be confronted.

simonpj · February 18, 2021, 9:52am

Yes, I think everyone agrees strongly about that. The HF should focus on things that aren’t already being done. It would be counter-productive to duplicate, or (worse) supplant existing efforts. Rather we should seek to enhance and support them. “Thank you for doing X; how can we help make you more productive?”.

“Is anyone already doing this?” should be a question we ask about every proposed idea.

chris · February 18, 2021, 2:56pm

[Disclaimer: I am on the Technical Agenda Task Force]

The Technical Agenda Task Force is acutely aware of this issue of trampling on the toes of folks who are already working on stuff, and arguably this has been incorporated into the name – our object of focus is the agenda, not necessarily making stuff. Current thinking is that if an issue is important enough (most likely of strategic importance), then we ought to consider adopting it even if others are working on parts of the problem. In that case the TATF should be careful not to disrupt existing efforts, for all the reasons you describe; instead we might make enquiries about what is being done and identify complementary areas to tackle, take a watching brief, render assistance, or, where appropriate, help with coordination between different efforts.

chris · February 18, 2021, 3:08pm

Quoting angerman: “In general, I believe the Haskell Foundation should support existing work, but more generally use its position to tackle issues that need a neutral party to form consensus, or that are too large for any single entity to even try to tackle because of the amount of social lubricant needed to make them work.”

I agree wholeheartedly with this statement. I should add that I am sceptical about the HF taking on large technical projects (e.g. a cabal/stack replacement), or projects where the goals are not tightly defined – e.g., an all-new singing and dancing Prelude that will fix all of its problems. Note that I am not saying we should not try to make things better, but I would want to look closely at Grand Unifying Projects that seek to determine the answer to Life, the Universe and Everything.

agentultra · February 18, 2021, 3:41pm

Here are a few ideas from other language foundations:

Could the HF set aside a budget to bring under-privileged contributors and maintainers out to conferences, or pay to get their work published?
Concerning adoption, could the HF set aside a budget to give user groups and community organizers funds to rent spaces and buy pizza? That would enable us to run more Haskell user groups and potentially gain more contributors.
Would the HF be willing to put together a training program to teach people how to contribute to GHC or the core libraries? Something like the Eudyptula Challenge or similar?
I also agree with a lot of the other suggestions that the core libraries could be refined and expanded. Supporting common data structures such as vectors, hash maps, queues, etc., and the new linear types feature would be awesome DX, making Haskell easy to reach for and requiring less setup/tooling.
Supporting the DTF and other community efforts with a recognition site where contributors can opt in to receive badges and awards for their contributions?

blamario · February 18, 2021, 5:54pm

Quoting Ericson2314: “I want to see base broken up and lots of it decoupled from GHC so we can actually see ecosystem-level experimentation, and yet also more stability.”

Hear hear! But how exactly would the Haskell Foundation help accomplish this task? Pay a developer? Establish a bounty? Since base is really a creature of GHC, perhaps this should start as a GHC initiative first?

Now that I mention it, a bounty program could actually make a lot of sense.

Ericson2314 · February 18, 2021, 6:20pm

I think the main obstacle is that the current prohibition of orphan instances linearizes base’s modules in a rather contorted way. I have started a wiki page here https://gitlab.haskell.org/ghc/ghc/-/wikis/Rehabilitating-Orphans-with-Order-theory. I don’t think the problem is terribly hard, but I think it does deserve an academic paper before anything gets merged. This would mean mustering the research side of the GHC community for the sorts of problems that have normally been dealt with by the non-researcher side, which (to me at least) is an exciting opportunity but also a non-negligible challenge.
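Some background on the orphan-instance point, as a minimal sketch assuming only `base` (`Both` is a hypothetical name). An instance is an orphan when it is declared in a module that defines neither its class nor its type. Keeping every instance non-orphan therefore means it must sit next to its class or next to its type, so one of those two modules has to import the other; across a library as large as base, those forced imports are the ordering constraints described above.

```haskell
-- 'Semigroup' (the class) and 'Bool' (the type) are both defined in base,
-- which gives 'Bool' no Semigroup instance (it offers the 'All' and 'Any'
-- newtypes instead). Declaring one here, in a module that defines neither
-- the class nor the type, makes this an orphan instance; GHC reports it
-- when the -Worphans warning is enabled.
instance Semigroup Bool where
  (<>) = (&&)

-- The usual non-orphan alternative: define a newtype in this module and give
-- the instance to that instead (base's Data.Monoid.All does the same thing).
newtype Both = Both Bool
  deriving (Eq, Show)

instance Semigroup Both where
  Both a <> Both b = Both (a && b)
```

The newtype route avoids the orphan, but only by introducing a new type; it cannot help when two modules inside base itself need to meet in an instance, which is where the ordering problem bites.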

Ericson2314 · February 18, 2021, 6:28pm

To me the big question is whether to plow right into that, or first focus on CI to free up more capacity for a new research project. I would normally say yes, CI first — capital projects to cut operating costs before capital projects for other purposes — but the solution for CI I trust most — doubling down on Nix everywhere — is highly politically fraught, perhaps more so than greenfield research about orphan instances with radical implications for how libraries are organized!

MaxGabriel · February 19, 2021, 1:26am

Compiler performance. I’m not sure whether there is a need for tooling here, or whether it’s just a matter of lots of work. Possibly this could also include tools to profile our own builds to speed them up (e.g. to identify bottlenecks in parallelism).

Anything HLS needs from GHC’s side seems pretty high leverage, especially anything that helps HLS on large codebases.

Better documentation for libraries would be great. Not sure if there is a facilitation role here or it’s pure work.

Mac support, whenever issues come up. Not sure what the facilitation approach is here: maybe giving GHC developers Macs or more Mac CI.
