Bringing Data.Text into `base`: What is the next step?

tristanC · September 8, 2022, 4:10pm

Sorry I misread your first positive changes regarding text not being readily available. I thought there was something special about bytestring that makes it more available than text without being part of base.

Bodigrim · September 8, 2022, 6:53pm

The immediate technical challenge is the dependency list of text-2.0.1 (see its dependencies page on Hackage). Do we expect to fold every (transitive) dependency into base?

I think making cabal init add text to build-depends for new projects would give most of the benefits at almost zero cost.

sullyj3 · September 9, 2022, 12:18am

A modern language should have a proper packed textual type in its standard library. This is a matter of straightforward pragmatism. Since Haskell doesn’t currently have that, frankly I’d be embarrassed to recommend it as a language useful for serious work. I just don’t understand the opposition. Can you imagine if the Rust people were talking about splitting String out of std into a separate library? I’d assume they’d lost their minds.

hasufell · September 9, 2022, 1:00am

I was not aware that cabal init supports choosing alternative Preludes interactively through mixins. Did I miss something? Or did you misread my comment?

hasufell · September 9, 2022, 1:07am

Why not all boot libraries?

jaror · September 9, 2022, 8:44am

Not all, but I’d be in favor of these: base, binary, bytestring, containers, deepseq, directory, exceptions, filepath, mtl, process, stm, text, time, and transformers.

In particular I wouldn’t add array and parsec because they aren’t the most popular libraries in their domain any more. Instead we could add vector and megaparsec, but I personally think vector is too complicated and megaparsec is too specific to always be included. Personally I prefer contiguous (and primitive) for arrays, but that package isn’t very popular yet (and it currently has quite a large amount of transitive dependencies).

Kleidukos · September 9, 2022, 10:05am

Misread your comment indeed.

coot · September 9, 2022, 11:28am

If we move text and possibly other libraries directly to base, this will affect the development process, so I quite like the idea of having a standard library which wraps multiple libraries. Does anybody know how haddock presents re-exports? I think it does the right thing (although I am not 100% sure); if it does, then standard would be a sensible step forward. We’d only need to move text to base if we want to change the representation of String at the same time, but that might be quite difficult anyway right now. Anyway, I’d be happy with either choice.

Bodigrim · September 9, 2022, 8:26pm

This makes a lot of sense as well. It’s a bit dumb that a vanilla ghci session has access to bytestring, text and containers, but cabal repl after cabal init suddenly pretends to know nothing about boot libraries.
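To make the mismatch concrete, here is a minimal sketch (the helper name `shout` is made up for illustration). Since text ships with GHC, a file like this loads in a plain ghci session, whereas in a project freshly created by cabal init the import is rejected as coming from a hidden package until text is listed in build-depends.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Illustrative helper; the name is hypothetical.
shout :: T.Text -> T.Text
shout = T.toUpper

main :: IO ()
main = TIO.putStrLn (shout "hello")
```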

One more avenue to explore is to split text into text-type, providing only data Text with instances, and text proper, providing everything else. In this case it’s likely that only text-type should remain a boot package, because the only other consumers of text are Cabal and parsec, both for superficial reasons. If the resulting text-type appears slim enough, in future one could discuss merging it into base.

Just to be clear: I’m not going to put my efforts into any of this, so treat it as cheap talk.

romes · September 10, 2022, 9:26am

I have a take on a different solution RE: adding core libraries at cabal init, but it’ll be up to cabal developers to know how feasible it is.

(1) the problem I see with the current “just base” setup is that when starting a project, I don’t know which libraries I’ll need. By the time I need text, I import Data.Text, but then I have to go and add text to the package’s dependencies. Then the same thing happens for import Data.Set, etc… This cost varies across domains, and while for me it’s repetitive boilerplate work, I can see it being more problematic when e.g. teaching Haskell!

(2) the problem I see with the solution to “add all core libraries at cabal init” is that I want my dependencies to match the things I’m using. If we start adding them all by default, I would eventually prune the dependencies of the project before e.g. publishing (I wouldn’t want my haddock dependencies list to be unnecessarily large). If the list is big enough, I can imagine I’d just comment them all out and re-add them as the compiler demands, since I wouldn’t remember which ones are being used, and I would end up doing the boilerplate work I was trying to avoid in the first place.

My proposed solution that addresses (1) and (2) is a mechanism we’re already familiar with implemented in cabal:

Previously, if I had a cabal project with a module A and imported a new module B from it:

module A where

import B

...

When compiled it would fail unless I added other-modules: B to the project.

However, this was fixed and now we have it quite nice. If we compile A without adding B to other-modules, we’ll get a warning:

<no location info>: warning: [-Wmissing-home-modules]
    These modules are needed for compilation but not listed in your .cabal file's other-modules:
        B

And this exact mechanic is what I think might be a good solution for our case.

The solution concretely:

I say (if possible) it would be great if I could from anywhere in my project import Data.Text and import Data.Map – everything would work – and I would simply get a warning:

<unlisted package info>: warning: [-Wunlisted-core-package]
    These core packages are needed for compilation but not listed in your .cabal file's dependencies:
        text,
        containers

This doesn’t address the issue of a standard library which doesn’t require extra imports for things like a Map and a good textual representation. It doesn’t address the fact base functions still can’t rely on Text. But I think it’s a nice compromise between not having any help from cabal and adding all core libraries on init.
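As a rough illustration of the check such a warning would need, here is a sketch using only base. The `coreModules` table and the function name are assumptions made for this example; neither cabal nor GHC exposes anything by these names, and a real implementation would consult the installed package database rather than a hard-coded list.

```haskell
import Data.List (nub, sort)
import Data.Maybe (mapMaybe)

-- Hypothetical lookup table: module name -> core package that provides it.
-- A real implementation would query the installed package database instead.
coreModules :: [(String, String)]
coreModules =
  [ ("Data.Text", "text")
  , ("Data.Text.IO", "text")
  , ("Data.Map", "containers")
  , ("Data.Set", "containers")
  , ("Data.ByteString", "bytestring")
  ]

-- Core packages required by the imports but missing from the declared
-- dependencies, sorted and without duplicates.
unlistedCorePackages :: [String] -> [String] -> [String]
unlistedCorePackages declared imports =
  sort . nub $
    [ pkg
    | pkg <- mapMaybe (`lookup` coreModules) imports
    , pkg `notElem` declared
    ]

main :: IO ()
main = do
  let declared = ["base"]
      imports  = ["Data.Text", "Data.Map", "Data.List"]
  -- Print one unlisted core package per line.
  mapM_ putStrLn (unlistedCorePackages declared imports)
```

Modules that are absent from the table (such as Data.List from base) are simply ignored, so only packages the build would have to expose implicitly end up in the warning.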

RE: @rae’s standard: It’s a good idea, but I would say it has to be backed by the Haskell Foundation or ship with GHC. Otherwise, isn’t it just another alternative prelude?

gdifolco · September 11, 2022, 8:43am

While I can understand the necessity of requiring qualified imports for third-party libraries to alleviate a lack of extensibility, it should not be the default design whenever they are part of the same library.

gdifolco · September 11, 2022, 8:45am

You’ll couple all subsequent libraries to the underlying compiler implementation, which may cause maintenance issues in the long run.

ChShersh · September 11, 2022, 10:45am

For the purpose of understanding the pros and cons of this decision better, it could be helpful to look at history. In 2007, when GHC 6.8.1 was released, packages array, bytestring, directory and many others were split from base into separate packages.

GHC 6.8.1

I haven’t found the discussion with the reasons for the split. But I thought it would be interesting to know why this happened. Otherwise, it looks like we’re just going in circles: first we split things, then we move them back. Lots of time is wasted and we stay in the same spot.

Of course, circumstances and limitations may change and the situation in 2022 might be different from 2007. In that case, it would be interesting to know what has changed.

atravers · September 11, 2022, 11:14am

1.4. Release notes for version 6.8.1 (section 1.4.4) - well spotted, @ChShersh!

…since 6.8.1, have there been any other transfers of code away from base into other boot libraries? If not, then comparing it to the current version of base could help to determine how things have (apparently) “gone awry”.

This information may not be able to help in salvaging base, but it could help to avoid making (most of) the same mistakes, if it’s impossible to practically separate GHC and base and the two have to be rebuilt.

tomjaguarpaw · September 11, 2022, 1:47pm

“there’s only two ways I know of to make money: bundling and unbundling.”

atravers · September 11, 2022, 3:11pm

jackdk · September 11, 2022, 10:03pm

How much pain would be removed if cabal offered to edit your .cabal file to add the missing package build-depends if it sees you trying to import a module from a package you aren’t depending on?

$ cabal build
[snip...]
app/Main.hs:3:1: error:
    Could not load module ‘Data.Text’
    It is a member of the hidden package ‘text-1.2.5.0’.
  |
3 | import Data.Text
  | ^^^^^^^^^^^^^^^^

Automated fixes are available. Would you like to:

1. Do nothing
2. Add `build-depends: text ^>=1.2.5.0` to the target `app`
-- One of these for each `common` stanza imported by the target being built
3. Add `build-depends: text ^>=1.2.5.0` to the common stanza `deps`

Enter choice [1]: 2
Successfully updated `foo.cabal`; attempting rebuild.

Perhaps only show these hints if cabal is being run interactively, so that other build tools don’t get tripped up by a subprocess waiting for input?
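One way to approximate the “only when interactive” condition is to check whether the standard handles are attached to a terminal, using `hIsTerminalDevice` from System.IO in base. The sketch below is an assumption about how such a gate could look; `FixMode` and `fixMode` are hypothetical names and not part of cabal.

```haskell
import System.IO (hIsTerminalDevice, stdin, stdout)

-- Hypothetical modes for the proposed behaviour; not part of cabal.
data FixMode = OfferFix | ReportOnly
  deriving (Eq, Show)

-- Offer the prompt only when both input and output are terminals, so a
-- build tool running cabal as a subprocess never blocks on a question.
fixMode :: Bool -> Bool -> FixMode
fixMode stdinIsTty stdoutIsTty
  | stdinIsTty && stdoutIsTty = OfferFix
  | otherwise                 = ReportOnly

main :: IO ()
main = do
  inTty  <- hIsTerminalDevice stdin
  outTty <- hIsTerminalDevice stdout
  print (fixMode inTty outTty)
```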

santiweight · September 11, 2022, 10:17pm

I believe that much of the frustration in this thread would be solved by (either of):

1. Adding better default packages to cabal init (with an optional standard package)
2. Having HLS suggest adding new packages when importing from a hidden package

There is already progress for (2) in HLS, and I encourage people to send some emojis/love to the relevant PR: https://github.com/haskell/haskell-language-server/pull/2954

(1) seems like a good thing to do. Please add feedback/support on the new issue in Cabal.

Let’s push for either of these solutions. We can always find the right solution after we pick the low-hanging fruit.

Also: a much hotter point of cabal frustration is module discovery. I encourage everyone to read the thread here and add their thoughts to the conversation so that progress happens!

santiweight · September 11, 2022, 10:43pm

This is not generally a job for cabal (which afaik does not know about type checking errors that come from GHC). But it’s certainly reasonable for HLS to handle.

hasufell · September 12, 2022, 4:21am

I don’t understand what this means. Boot libraries are already coupled to GHC versions: see the version history page on the GHC wiki (GitLab).
