Bringing Data.Text into `base`: What is the next step?

Indeed, I don’t expect us to change encoding, but the example serves as a caution against moving things into base: it ties upgrades to GHC. That doesn’t seem desirable.

Perhaps you can elaborate on the positive aspects of moving text into base. What would we achieve by it that we can’t already do?

Sure, let me amend my initial post. :slight_smile:

Thanks. So I agree with 2 (New APIs inside of base can adopt Text). That would be nice. But I think 1 should be solved by different means. If it’s difficult to use text in a Cabal project, make it easier! That shouldn’t require radically restructuring base libraries.

For reference:

  1. We change the culture towards a legitimisation of Text as part of the basic toolkit of the Haskeller, especially in Cabal projects where text isn’t readily available unless it is added manually to the dependencies.

> If it’s difficult to use text in a Cabal project, make it easier! That shouldn’t require radically restructuring base libraries.

Are you thinking about a solution specifically to make it easier? I know of Cabal mixins but they would probably need base to re-export text (and thus depend on it). Or we can have something radical like a std library that re-exports the Core Libraries (with sensible PVP bounds), and have cabal-install amend its default template.

Regarding a radical restructuring of the base library, I’m afraid I don’t quite understand what you mean: this discussion is about having Data.Text.* in base. But perhaps you’re seeing something that I do not?

> Are you thinking about a solution specifically to make it easier?

I don’t have any particular solution in mind, but it shouldn’t be beyond the wit of humanity to come up with some reasonable solution, less drastic than including Data.Text in base.

To me, including Data.Text in base is a radical restructuring, because it prevents the evolution of Data.Text as distinct from GHC.

I have a simple proposal: let there be a new package, `standard`, which includes and re-exports modules from base and text (and possibly a few more). We teach cabal to either default to including `standard` as a dependency or ask users about it during the interactive init. Problem solved?
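To make the idea concrete, here is a minimal sketch of what the entry module of such a package might look like. The module name `Standard` and its export list are assumptions for illustration, not an existing package: it re-exports the Prelude from base plus the `Text` type and the `pack`/`unpack` conversions from text, which do not clash with Prelude names.

```haskell
-- Standard.hs: hypothetical entry module of the proposed `standard` package.
-- A project depending only on `standard` would get the Prelude plus packed
-- text without listing `text` in its own build-depends.
module Standard
  ( -- Everything the implicit Prelude normally provides (from base).
    module Prelude
    -- The packed text type and conversions (from text). Only names that do
    -- not clash with the Prelude are re-exported unqualified here.
  , Text
  , pack
  , unpack
  ) where

import Data.Text (Text, pack, unpack)
import Prelude
```

A downstream module could then simply `import Standard`; the re-exported Prelude names refer to the same entities as the implicit Prelude, so they do not clash. For whole modules such as `Data.Text`, a real package would more likely use the `reexported-modules` field of its Cabal library stanza, so users could keep writing `import qualified Data.Text as T`.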

It doesn’t sound appealing to hardcode a third-party library name into cabal and treat it specially. Then this should be a generic feature that supports arbitrary alternative preludes, and that’s a more complicated design space.

I don’t mean to grill you on that topic, but do you have anything that is not covered by Cabal mixins today? :slight_smile:

While this is a technically valid solution, I think that from a PL design perspective, not only having but maintaining the textual type in a separate library outside of the lowest common denominator (in our case, base) really sends out the wrong message.
I do not know of any other language that has decided to do such a thing in favour of a data structure that has a useless API and cannot even give me the number of graphemes in a character string vs. the number of code points.
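To illustrate the code point vs. grapheme distinction with today’s libraries: both `String` and `Data.Text` count code points (`Char`s), so a letter written with a combining accent counts as two. A minimal sketch; `decomposedE` is just an illustrative name.

```haskell
import qualified Data.Text as T

-- "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
-- one user-perceived character (grapheme), but two code points.
decomposedE :: T.Text
decomposedE = T.pack "e\x301"

main :: IO ()
main = do
  -- T.length counts code points (Chars), not graphemes.
  print (T.length decomposedE)
  -- String behaves the same way, since it is just [Char].
  print (length (T.unpack decomposedE))
```

Both lines print `2`, even though the string renders as a single character.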

(rant: And to be frank, I am quite fed up with this Haskell Exceptionalism that is so pervasive in our culture, to the point of justifying the worst design decisions of the last millennium as if rectifying them were an intolerable attack on our very being. I’m not only speaking of what this discussion brings up; it is a general sentiment that strikes me when I read similar community debates.)

What about bytestring? It seems like both text and bytestring are already provided with GHC, according to the Haskell Hierarchical Libraries page. Why can Cabal projects use bytestring but not text?

This page lists the libraries shipped with GHC distributions (both text and bytestring are listed there) but this is not something that is reflected in Cabal projects, no.

Also I don’t understand your last sentence. Of course Cabal projects can use bytestring and text.

Sorry, I misread the first of your positive changes, about text not being readily available. I thought there was something special about bytestring that made it more available than text without being part of base.

The immediate technical challenge is the list of dependencies of text-2.0.1 on Hackage. Do we expect to fold every (transitive) dependency into base?

I think making cabal init add text to build-depends for new projects would give most of the benefits at almost zero cost.

A modern language should have a proper packed textual type in its standard library. This is a matter of straightforward pragmatism. Since Haskell doesn’t currently have that, frankly I’d be embarrassed to recommend it as a language useful for serious work. I just don’t understand the opposition. Can you imagine if the Rust people were talking about splitting String out of std into a separate library? I’d assume they’d lost their minds.

I was not aware that cabal init supports choosing alternative Preludes interactively through mixins. Did I miss something? Or you misread my comment?

Why not all boot libraries?

Not all, but I’d be in favor of these: base, binary, bytestring, containers, deepseq, directory, exceptions, filepath, mtl, process, stm, text, time, and transformers.

In particular, I wouldn’t add array and parsec, because they aren’t the most popular libraries in their domain any more. Instead we could add vector and megaparsec, but I personally think vector is too complicated and megaparsec is too specific to always be included. Personally I prefer contiguous (and primitive) for arrays, but that package isn’t very popular yet (and it currently has quite a large number of transitive dependencies).

Misread your comment indeed. :slight_smile:

If we move text and possibly other libraries directly into base, this will affect their development process, so I quite like the idea of having a `standard` library that wraps multiple libraries. Does anybody know how Haddock presents re-exports? I think it does the right thing (although I am not 100% sure); if it does, then `standard` would be a sensible step forward. We’d only need to move text into base if we wanted to change the representation of String at the same time, but that might be quite difficult right now anyway. Anyway, I’d be happy with either choice.

This makes a lot of sense as well. It’s a bit dumb that a vanilla ghci session has access to bytestring, text and containers, but cabal repl after cabal init suddenly pretends to know nothing about boot libraries.


One more avenue to explore is to split text into text-type, providing only data Text with instances, and text proper, providing everything else. In this case it’s likely that only text-type should remain a boot package, because the only other consumers of text are Cabal and parsec, both for superficial reasons. If the resulting text-type appears slim enough, in future one could discuss merging it into base.
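As a rough illustration of why some consumers would only need the type: the hypothetical record below uses `Text` purely as a field type and relies only on its instances (`Eq`, `Show`, and `IsString` via `OverloadedStrings`). Under the proposed split, a library like this could in principle depend on a slim `text-type` package alone; that package does not exist, so the sketch imports the type from `Data.Text` as one would today. `PackageInfo` and its fields are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Today the type comes from the text package; under the proposed split it
-- would come from a (hypothetical, not yet existing) text-type package.
import Data.Text (Text)

-- A made-up record that merely stores Text: no Data.Text functions are
-- called, only the instances that ship with the type are used.
data PackageInfo = PackageInfo
  { pkgName     :: Text
  , pkgSynopsis :: Text
  }
  deriving (Eq, Show)

-- String literals become Text through the IsString instance.
example :: PackageInfo
example = PackageInfo
  { pkgName     = "text"
  , pkgSynopsis = "Packed Unicode text"
  }

main :: IO ()
main = do
  print (pkgName example == "text") -- uses Eq Text
  print example                     -- uses Show Text
```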


Just to be clear: I’m not going to put my effort into any of this, so treat it as cheap talk :wink:
