I see lots of enthusiasm and interest in this thread. May I urge it to be channeled into practical developments and not just dreaming about how strings could have looked in an imaginary ecosystem? I’m sorry for the reality check, but String and functions to work with it are to remain in base, and String cannot be substituted by Text or [Text] under the hood automagically.
I see lots of proposed solutions in this thread, but I don’t know what problem they solve. If you want to use Text, go and use it. What exactly makes it inconvenient?
In my experience… mainly, the fact that Text and String overlap almost exactly 1:1 in their use cases, except that, as mentioned, Text is just a better String. Secondly, OverloadedStrings is annoying to say the least, and while it and a combination of another extension seem to help when it comes to defaulting string literals to Text, it’s not something you see often, and it’s still a bit more inconvenient to use Text than String because base is so String-oriented, so you’re forced to implement some of the basic features yourself, or use a library like text-show or text-display for something that should have been included in base or core (or is already included, just for the worse version). Of course, if I had to pick between using Text or String right now, I would exclusively use String because it’s just way more convenient and using Text as it is right now just adds some annoying overhead. And yet, I would still like to get the benefits that Text would give if I had used it. I think the point of this discussion is to get a better idea, so that we can direct this fairyland story into knowing how or what would be a good way to simplify the ecosystem…
This statement is almost true, because every String (unless it contains surrogate characters!) can be replaced by a lazy Text, where each chunk contains exactly one Char. But I assume this is not what you mean?
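To make the "one Char per chunk" reading concrete, here is a minimal sketch. The helper name stringToLazyText is made up for illustration; fromChunks, singleton, take and unpack are the ordinary functions from the text package.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- | Hypothetical helper: one strict chunk per Char, produced on demand.
stringToLazyText :: String -> TL.Text
stringToLazyText = TL.fromChunks . map T.singleton

-- Works on an infinite String, because chunks are only built as consumed.
firstFive :: TL.Text
firstFive = TL.take 5 (stringToLazyText (cycle "ab"))

-- Surrogate code points cannot be stored in Text; singleton substitutes
-- U+FFFD for them, so the round trip is lossy for such Strings.
surrogateRoundTrip :: String
surrogateRoundTrip = TL.unpack (stringToLazyText "\xD800")
```

This keeps the laziness of String but gives up the compact representation that motivates Text in the first place, which is presumably why it is not what was meant.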
If the statement was to mean that String can be replaced by strict Text then it’s false, you can easily face major performance penalties if you do so.
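One illustrative case (a sketch, not a benchmark): prepending a character. The function names are hypothetical; the cost notes follow the complexity stated in the text documentation for cons, and actual behaviour can differ when fusion applies.

```haskell
import qualified Data.Text as T

-- Prepending to a String is O(1) and shares the existing tail.
prependString :: Char -> String -> String
prependString = (:)

-- Prepending to a strict Text allocates and copies a new array, so it is
-- O(n) per call; building n characters one cons at a time is O(n^2).
buildByCons :: Int -> T.Text
buildByCons n = foldr T.cons T.empty (replicate n 'x')
```

Code written in the usual list style (cons, pattern match, recurse) can therefore get slower, not faster, after a mechanical switch to strict Text.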
For instance, tasty uses String. Never been an issue.
How exactly is [Data.Text.Text] better than Data.Text.Lazy.Text? They are almost isomorphic (chunks of lazy text are guaranteed to be nonempty) and the existing implementation of lazy Text is indeed built upon the implementation of strict Text, so I don’t see who is to win and what is to be saved. Mind you that one would still need to wrap [Text] into a newtype to provide instances.
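A rough sketch of that near-isomorphism, assuming a hypothetical newtype called Chunks; only fromChunks and toChunks come from the text package.

```haskell
import Data.String (IsString (..))
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- | Hypothetical wrapper: a bare [Text] needs a newtype before it can
-- carry string-like instances of its own.
newtype Chunks = Chunks [T.Text]

-- Equality must ignore chunk boundaries, unlike the list instance.
instance Eq Chunks where
  Chunks xs == Chunks ys = T.concat xs == T.concat ys

instance IsString Chunks where
  fromString s = Chunks [T.pack s]

-- fromChunks drops empty chunks, which is the only information lost.
toLazy :: Chunks -> TL.Text
toLazy (Chunks cs) = TL.fromChunks cs

fromLazy :: TL.Text -> Chunks
fromLazy = Chunks . TL.toChunks
```

The wrapper ends up reimplementing what Data.Text.Lazy already provides, minus the nonempty-chunk invariant.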
I remember alternative preludes being all the rage around 2016. Pretty much all of them are abandoned or on life support in 2024, causing great pain to projects that embraced them. Alternative preludes do not stand the test of time.
As for text I would strongly advise against moving the entire package into base. But we can move data Text and its instances.
You can define (:+) as below and use it instead of (:):
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Data.Text (Text, cons, uncons)

pattern (:+) :: Char -> Text -> Text
pattern x :+ xs <- (uncons -> Just (x, xs)) where
  x :+ xs = cons x xs
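For illustration, a self-contained sketch of how such a synonym might be used in a function definition. It repeats the definition so that it compiles on its own; the fixity declaration and the capitalise function are additions for this example only.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

import Data.Char (toUpper)
import Data.Text (Text)
import qualified Data.Text as T

pattern (:+) :: Char -> Text -> Text
pattern x :+ xs <- (T.uncons -> Just (x, xs)) where
  x :+ xs = T.cons x xs

-- Same fixity as (:), so chains nest to the right.
infixr 5 :+

-- | Hypothetical example: upper-case the first character, if there is one.
capitalise :: Text -> Text
capitalise (c :+ rest) = toUpper c :+ rest
capitalise t = t
```

The match reads like the list version, though each use of (:+) as a constructor still pays the cost of cons on Text.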
If Text works better for your usecases, just use it.
Why is it annoying? You can put it into a Cabal file to apply for all modules (or even into the global Cabal config, if you fancy so) and you can put it into .ghci to apply for all GHCi sessions.
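As a sketch of the options mentioned: the per-module pragma is shown as code, and the Cabal and .ghci equivalents are shown as comments. The names greeting and shout are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Alternatives to the per-module pragma above (either one suffices):
--   in the package's .cabal file:  default-extensions: OverloadedStrings
--   in a .ghci file:               :set -XOverloadedStrings

import Data.Text (Text)
import qualified Data.Text as T

-- With the extension on, a string literal is elaborated via fromString,
-- so it can be used directly at type Text.
greeting :: Text
greeting = "hello"

shout :: Text -> Text
shout t = T.toUpper t <> "!"
```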
But you don’t have to pick one. Use whatever is appropriate and approachable in each specific instance.
it would be more convenient if we picked Text over the current String and somehow replaced String with it while attempting to make it still act like String but with the benefit of Text?
Can you elaborate on where you would draw the line and why?
Just strict data Text and the API used by its instances, entirely ignoring Data.Text.Lazy, encompasses code currently living in 15 different modules in text. Here, courtesy of calligraphy, is a graph of those dependencies:
calligraphy sounds cool! Unfortunately I don’t see any image, I just see literally “A complex dependency graph” (which I guess is the alt text). Is something wrong on my end, or is that what everyone sees?
Could you please remove instance Binary Text (which, if data Text is moved into base, would go into binary itself) and other instances for classes outside of base and rerun the analysis?
If the supporting code for instance Binary Text goes into binary, and the supporting code for instance Lift Text goes into template-haskell, we are down to 9 modules and this graph:
But the supporting code for instance Lift Text, for example, includes the function Data.Text.Array.copyFromPointer. Is that function really going to live in template-haskell? Seems like a very weird place to import it from. Similarly, is Data.Text.Array.newPinned going to live in binary simply because encodeUtf8, and thus instance Binary Text, uses it?
I would have assumed that even if those instances go to their classes’ packages, the Text-related functions they use would go to base, which would make the first graph more reflective of how text is to be chopped up.
It can be whittled down further (fusion can go, Data.Text.Array can go), but yes, selected functions from 9 modules are infinitely easier to put into base than 55 modules in their full glory.
The instance is to be rewritten without them, using raw GHC.Exts, and the functions with the rest of Data.Text.Array will continue to live in the text package.
Same: reimplement the instance using GHC.Exts, keep Data.Text.Array in text intact. All of them are just thin wrappers.
So just to be clear, your vision is to expose data Text, its instances, (maybe?) pack and unpack, and nothing else? And have redundant implementations of any machinery beneath those exports in base and text?
What about the constructor? It has to be exported from somewhere in order for text to make use of it, but it doesn’t seem like a safe thing to export from Prelude. It should be exported from the implementation module, right?
What’s the correct naming convention for that module? GHC.Text? Something with Internal in it somewhere?