The Quest to Completely Eradicate `String` Awkwardness

I see lots of enthusiasm and interest in this thread. May I urge that it be channeled into practical developments and not just dreaming about how strings could have looked in an imaginary ecosystem? I’m sorry for the reality check, but String and the functions that work with it are to remain in base, and String cannot be substituted by Text or [Text] under the hood automagically.

I see lots of proposed solutions in this thread, but I don’t know what problem they solve. If you want to use Text, go and use it. What exactly makes it inconvenient?

In my experience… mainly, the fact that Text and String overlap almost exactly 1:1 in their usecases, except, as mentioned, Text is just a better String. Secondly, OverloadedStrings is annoying to say the least, and while it, in combination with another extension, seems to help when it comes to defaulting string literals to Text, it’s not something you see often. It’s also still a bit more inconvenient to use Text than String because base is so String-oriented, so you’re forced to implement some of the basic features yourself, or use a library like text-show or text-display for something that should have been included in base or core (or is already included, just for the worse version). Of course, if I had to pick between using Text or String right now, I would exclusively use String because it’s just way more convenient, and using Text as it is right now just adds some annoying overhead. And yet, I would still like to get the benefits that Text would give if I had used it. I think the point of this discussion is to get a better idea, so that we can direct this fairyland story into knowing how or what would be a good way to simplify the ecosystem…
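
To make the friction described above concrete: `show` in base returns a String, so code that works in Text tends to carry a small conversion helper. This is only a minimal sketch, assuming the `text` package is available; the names `tshow`, `report` and `printReport` are hypothetical, and libraries such as text-show provide more efficient equivalents.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: render any Show-able value as Text,
-- going through the String that base's 'show' produces.
tshow :: Show a => a -> Text
tshow = T.pack . show

-- Build a Text message. Without OverloadedStrings the literal
-- has to be packed explicitly.
report :: Int -> Text
report n = T.pack "count = " <> tshow n

-- Print the message with the Text-based putStrLn from Data.Text.IO.
printReport :: Int -> IO ()
printReport = TIO.putStrLn . report
```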

This statement is almost true, because every String (unless it contains surrogate characters!) can be replaced by a lazy Text, where each chunk contains exactly one Char. But I assume this is not what you mean?
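
The one-Char-per-chunk encoding mentioned above can be sketched as follows. This is an illustration under the assumption that the `text` package is available; `oneCharPerChunk` and `chunkLengths` are hypothetical names. The caveat about surrogates shows up because `Data.Text.singleton` replaces invalid scalar values with U+FFFD.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Turn a String into a lazy Text in which every chunk holds exactly one Char.
-- Surrogate code points do not survive: T.singleton replaces them with U+FFFD.
oneCharPerChunk :: String -> TL.Text
oneCharPerChunk = TL.fromChunks . map T.singleton

-- Inspect the chunk structure: one length per strict chunk.
chunkLengths :: TL.Text -> [Int]
chunkLengths = map T.length . TL.toChunks
```

For a String without surrogates, `TL.unpack . oneCharPerChunk` gives back the original list, which is the sense in which the replacement is "almost" always possible.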

If the statement was meant to say that String can be replaced by strict Text, then it’s false: you can easily face major performance penalties if you do so.

For instance, tasty uses String. Never been an issue.

How exactly is [Data.Text.Text] better than Data.Text.Lazy.Text? They are almost isomorphic (chunks of lazy text are guaranteed to be nonempty) and the existing implementation of lazy Text is indeed built upon the implementation of strict Text, so I don’t see who is to win and what is to be saved. Mind you, one would still need to wrap [Text] in a newtype to provide instances.
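
A rough sketch of what such a wrapper might look like, and of why it ends up close to lazy Text anyway. The `Chunks` newtype and both conversion functions are hypothetical names introduced for illustration; only `fromChunks` and `toChunks` come from the `text` package.

```haskell
import Data.Text (Text)
import qualified Data.Text.Lazy as TL

-- Hypothetical wrapper: a bare [Text] would inherit the list instances,
-- so a newtype is needed to give it string-like ones.
newtype Chunks = Chunks [Text]

-- Convert to lazy Text. fromChunks drops empty chunks, which is the
-- "almost" in "almost isomorphic".
toLazy :: Chunks -> TL.Text
toLazy (Chunks cs) = TL.fromChunks cs

-- Convert back; every chunk returned by toChunks is nonempty.
fromLazy :: TL.Text -> Chunks
fromLazy = Chunks . TL.toChunks

-- Equality should compare the characters, not the chunk boundaries,
-- so this sketch delegates to the lazy Text instance.
instance Eq Chunks where
  a == b = toLazy a == toLazy b
```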

I remember alternative preludes being all the rage around 2016. Pretty much all of them are abandoned or on life support in 2024, causing great pain to the projects that embraced them. Alternative preludes do not stand the test of time.

As for text I would strongly advise against moving the entire package into base. But we can move data Text and its instances.

You can define (:+) as below and use it instead of (:):

{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Data.Text (Text, cons, uncons)

pattern (:+) :: Char -> Text -> Text
pattern x :+ xs <- (uncons -> Just (x, xs)) where
  x :+ xs = cons x xs
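
For illustration, here is how the pattern could be used in a function that matches on the first character, much like `(c : rest)` on a String. The `capitalize` function and the `infixr 5` fixity are assumptions added for this sketch, not part of the suggestion above. Keep in mind that `cons` on strict Text copies the whole value, so building long texts one character at a time this way is costly.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Data.Char (toUpper)
import Data.Text (Text, cons, uncons)
import qualified Data.Text as T

-- Bidirectional pattern synonym: matches via uncons, builds via cons.
pattern (:+) :: Char -> Text -> Text
pattern x :+ xs <- (uncons -> Just (x, xs)) where
  x :+ xs = cons x xs

-- Assumed fixity, mirroring that of (:) on lists.
infixr 5 :+

-- Hypothetical example: upper-case the first character, if there is one.
capitalize :: Text -> Text
capitalize (c :+ rest) = toUpper c :+ rest
capitalize t           = t  -- empty Text falls through unchanged
```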

If Text works better for your usecases, just use it.

Why is it annoying? You can put it into a Cabal file to apply for all modules (or even into the global Cabal config, if you fancy so) and you can put it into .ghci to apply for all GHCi sessions.
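
As a small sketch of what the extension buys you, assuming the `text` package: with OverloadedStrings enabled, string literals can be used directly as Text. Here the extension is switched on per module with a pragma; listing it under `default-extensions` in the Cabal file, or running `:set -XOverloadedStrings` in .ghci, would make that line unnecessary. The names `greeting` and `shout` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- The type signature makes this literal a Text; no T.pack is needed.
greeting :: Text
greeting = "hello"

-- Literals also work inline next to Text functions.
shout :: Text -> Text
shout t = T.toUpper t <> "!"
```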

But you don’t have to pick one. Use whatever is appropriate and approachable in each specific instance.

It would be more convenient if we picked Text over the current String and somehow replaced String with it, while making it still act like String but with the benefits of Text?

One proposed motivation for the Backpack module system back in the day was to overcome the Text vs String impasse by letting packages depend on an abstract signature which could be filled with either one.

From Edward Z. Yang’s thesis:

But Backpack didn’t catch on for a variety of reasons.

Can you elaborate on where you would draw the line and why?

Just strict data Text and the API used by its instances, entirely ignoring Data.Text.Lazy, encompasses code currently living in 15 different modules in text. Here, courtesy of calligraphy, is a graph of those dependencies:

A complex dependency graph

(Link to SVG, if the above doesn’t render.)

Is extracting this specific bundle of code more appealing than absorbing text wholesale?

calligraphy sounds cool! Unfortunately I don’t see any image, I just see literally “A complex dependency graph” (which I guess is the alt text). Is something wrong my end, or is that what everyone sees?

Sorry about that; does the link I just added work better for you?

(Direct SVG uploading is disabled on this Discourse; I just tried a PNG fallback but the server force-shrinks it down to an unusable size.)

Yeah, the link works, thanks!

Could you please remove instance Binary Text (which, if data Text is moved into base, would go into binary itself) and other instances for classes outside of base, and rerun the analysis?

If the supporting code for instance Binary Text goes into binary, and the supporting code for instance Lift Text goes into template-haskell, we are down to 9 modules and this graph:

A less complex dependency graph

(Link to SVG)

But the supporting code for instance Lift Text, for example, includes the function Data.Text.Array.copyFromPointer. Is that function really going to live in template-haskell? Seems like a very weird place to import it from. Similarly, is Data.Text.Array.newPinned going to live in binary simply because encodeUtf8, and thus instance Binary Text, uses it?

I would have assumed that even if those instances go to their classes’ packages, the Text-related functions they use would go to base, which would make the first graph more reflective of how text is to be chopped up.

It can be whittled down further (fusion can go, Data.Text.Array can go), but yes, selected functions from 9 modules are infinitely easier to put into base than 55 modules in their full glory.

The instance is to be rewritten without them, using raw GHC.Exts, and the functions with the rest of Data.Text.Array will continue to live in the text package.

Same: reimplement the instance using GHC.Exts, keep Data.Text.Array in text intact. All of them are just thin wrappers.

So just to be clear, your vision is to expose data Text, its instances, (maybe?) pack and unpack, and nothing else? And have redundant implementations of any machinery beneath those exports in base and text?

Correct, only data Text and instances, and export it from Prelude.

Hmm: similar to how the standard Prelude re-exports a select group of entities from the Data.List module - it worked back then, so it would be very annoying for something like it to fail now…

What about the constructor? It has to be exported from somewhere in order for text to make use of it, but it doesn’t seem like a safe thing to export from Prelude. It should be exported from the implementation module, right?

What’s the correct naming convention for that module? GHC.Text? Something with Internal in it somewhere?

Yeah, good point.

I’d suggest Data.Text.Type.

Text depends on ByteArray, which depends on various things that use Prelude, which gets in the way of exporting Text in Prelude.

Should I:

  1. Move data ByteArray to a new Data.Array.Byte.Type module and move its instances to their class modules
  2. Make those various things depend on specific GHC.Internal.* modules instead of Prelude
  3. Something else I haven’t thought of?

Data.Array.Byte already depends on a bunch of GHC.Internal.* modules, so option (2) seems the easiest.

Just as a side note: base uses this property (surrogate range) for FilePath roundtripping (PEP-383).

So we wouldn’t be able to change the definition of Char, unless we also move OsPath to base.

Well, yes. But I’d also say that time has proven that base is the last place to receive a redesign of historical mistakes.

So where does that leave us?

Sure, but:

  • String is a bad default
  • and yet it is deeply ingrained in our standard library
  • it is a constant educational challenge

So what we’re facing here isn’t really lack of better alternatives, but sh*tty defaults.

And defaults do matter.
