The Quest to Completely Eradicate `String` Awkwardness

Also a daily reminder that moving things into base is not free, because it’s not (yet) reinstallable. Shipping bugfixes becomes slower and changing API requires a CLC proposal.


One more thing I haven’t seen discussed yet is that String is defined in the Haskell Report:

Regarding that note:

  • 30 Foreign.Marshal
# ghci
GHCi, version 9.8.2: https://www.haskell.org/ghc/  :? for help
ghci> import Foreign.Marshal
ghci> :t unsafeLocalState

<interactive>:1:1: error: [GHC-88464]
    Variable not in scope: unsafeLocalState
ghci> 

…so is adherence to the Haskell 2010 Report still an “absolute” requirement? If not, then type String = [Char] can also be moved out to its own library, possibly along with (the String-based versions of) Read and Show.

The requirements are ad-hoc and depend on the current CLC. As long as I am on it, I’m likely to vote against proposals that violate the standard. I know at least a couple of other CLC members are much more lenient on this point.

I think one very big impediment to getting rid of String is that you simply can’t stop people using String. String is just [Char]. Char is never going away. [] is never going away. You can’t stop people putting Chars inside []s. It’s too easy to use to eradicate.
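
To make the point concrete, here is a minimal sketch using only base. The names viaLiteral, viaList and shout are made up for illustration. Because String is a plain type synonym, a literal and a hand-built list of Chars are the same value, and every list function applies without conversion:

```haskell
import Data.Char (toUpper)

-- A String literal and an explicit list of Chars are the same value,
-- because String is only a synonym for [Char].
viaLiteral :: String
viaLiteral = "hi"

viaList :: [Char]
viaList = ['h', 'i']

-- Any list function works on String directly; no conversion step exists
-- that a library could deprecate or intercept.
shout :: String -> String
shout = map toUpper
```

Here viaLiteral == viaList holds, and shout accepts either of them.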

I think efforts are better spent making sure Text is easier to use, particularly, making sure the preferred API of every popular library uses Text instead of String. Something else that would help in this direction would be a putative extension StringLiteralsAreText. OverloadedStrings is too general to be ergonomic, in my opinion.
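
As a rough illustration of why OverloadedStrings is so general, the sketch below gives the same literal syntax three different types. The Name newtype and its instance are hypothetical and exist only for this example; Text comes from the text package.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical newtype, only here to show that any IsString instance
-- can absorb a string literal.
newtype Name = Name Text
  deriving (Eq, Show)

instance IsString Name where
  fromString = Name . T.pack

-- The same literal syntax at three different types; the annotation
-- (or the surrounding context) decides which one is meant.
asText :: Text
asText = "hello"

asString :: String
asString = "hello"

asName :: Name
asName = "alice"
```

When neither an annotation nor the context fixes the type, the compiler has to fall back on defaulting or report an ambiguity, which is the ergonomic cost a Text-only literal extension would avoid.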

To expand on this, there is a text data structure that has relatively good asymptotic complexity for all sensible operations (at worst a log factor slower than Text):

Why don’t we make that the default? Do we have benchmarks showing the space and time overhead compared to Text?

Exactly!

Core libraries or others have no obligation to support String/FilePath forever.

The sad part is that these historical mistakes keep taking up the namespace.

I feel people have too high expectations of base. Does it need a redesign? Most certainly. But between Haskell Report adherence and the lack of good migration strategies… I don’t see that happening.

So these days I’m thinking: maybe the way forward is to shrink base as much as possible, because everything outside of base is:

  • not subject to those pesky CLC proposals 😉
  • reinstallable
  • irrelevant to the Haskell report

Alternative preludes can then achieve more opinionated interfaces and the community can vote with their feet.

That still leaves a bitter aftertaste in my mouth though when I have to explain to newcomers that:

  • String was a mistake
  • lazy IO (so… half of base) is unsafe

Sometimes you don’t get a second shot at an implementation/interface, as this discussion reveals… maybe that also explains why the CLC is sometimes overly conservative.

Another (early) internal-implementation option is a “juxtaposed-subnode” arrangement (there have been other terms). For lists, the idea is that the tail/rest of the list sits next to the cons node, rather than in some other part of the heap (accessible via a second reference). Lists would then be more like short arrays, with either a nil node at the end or an indirection to another shared list (with singly-used lists being automatically appended during GC).

But this arrangement could have benefits for all structured Haskell values with at least one lifted component or field: it isn’t list-specific.

So would this bring [Char] closer to parity with lazy Text?


If so, is abstract monadic I/O an abject failure?


Then you can tell the SWI-Prolog folks that they also got it wrong:

(…while the rest of us watch from a safe distance ;-)

At what point is it more beneficial to discuss moving text and primitive and deepseq and many other “core libraries” into the compiler as well? base is already being absorbed in 9.10, to GHC.Internal.* modules. Perhaps a good move is to merge everything into the compiler libraries, and only then export from base? It might permit a smoother migration trajectory than changing base outright, allowing for better integration from first steps. It would also permit libraries to have a smaller dependency footprint, by having base re-export these core libraries and giving them broader accessibility and discoverability. It’s evident that not everyone knows about Hoogle!

Doesn’t ExtendedDefaultRules cover that?

Nope, the point of it is to use ghci defaulting rules in normal modules!

Reading the docs, it does seem like it only applies to a few classes, but I’ve definitely used default (Text) with OverloadedStrings before, 🤷

Thank you for writing that down. Is there a migration plan for replacing String pattern matching with Text? It looks like we would need new view patterns, for example to match a text starting with a given character, or am I missing something?

Also, looking at Data.Text.Read, is there a reason why the error message is a String? Perhaps the eradication should start there 🙂

Unless I’m misunderstanding what you’re saying, we can already use uncons with ViewPatterns to get that sort of behaviour. Though it will always be more verbose than for String.
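
A minimal sketch of the uncons approach, assuming the text package; capitalize and isComment are made-up names for illustration:

```haskell
{-# LANGUAGE ViewPatterns #-}

import Data.Char (toUpper)
import Data.Text (Text)
import qualified Data.Text as T

-- String version for comparison:
--   capitalize (c:rest) = toUpper c : rest
--   capitalize []       = []
capitalize :: Text -> Text
capitalize (T.uncons -> Just (c, rest)) = T.cons (toUpper c) rest
capitalize t = t

-- Matching a text that starts with a given character.
isComment :: Text -> Bool
isComment (T.uncons -> Just ('#', _)) = True
isComment _ = False
```

Each clause has to spell out the T.uncons view, which is where the extra verbosity relative to (c:rest) comes from.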

Ha, that is a bit ironic. Maybe MonadFail compatibility?
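
For context, the readers in Data.Text.Read have the shape Text -> Either String (a, Text). A caller who wants a Text error can convert at the boundary. The sketch below does that with a hypothetical wrapper, decimalT, which is not part of the library:

```haskell
import Data.Bifunctor (first)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- Hypothetical wrapper: same behaviour as TR.decimal, but the error
-- message is packed into Text instead of being left as a String.
decimalT :: Text -> Either Text (Int, Text)
decimalT = first T.pack . TR.decimal
```

On success the result carries the parsed number and the unconsumed remainder; on failure the Left now holds Text.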

Oops, I meant pattern synonyms, similar to the ones from Data.Sequence. That is, to provide a simple migration path for the String users that rely on the list pattern to match strings.
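
A sketch of what such pattern synonyms could look like, modelled on the Empty and (:<|) patterns from Data.Sequence. The synonyms are defined locally here and are an assumption for illustration, not a claim about what the text package exports:

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

import Data.Text (Text)
import qualified Data.Text as T

infixr 5 :<

-- Locally defined, hypothetical synonyms that mimic [] and (:) for Text.
pattern Empty :: Text
pattern Empty <- (T.null -> True)
  where
    Empty = T.empty

pattern (:<) :: Char -> Text -> Text
pattern c :< rest <- (T.uncons -> Just (c, rest))
  where
    c :< rest = T.cons c rest

{-# COMPLETE Empty, (:<) #-}

-- Reads almost like the String version with (' ' : rest).
dropLeadingSpaces :: Text -> Text
dropLeadingSpaces (' ' :< rest) = dropLeadingSpaces rest
dropLeadingSpaces t = t
```

Because the synonyms are bidirectional, 'h' :< 'i' :< Empty also builds a Text, so code written against the list constructors would mostly need its constructor names changed.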

It’s on the way.

So once base is reduced to one or more modules which merely re-export from other modules, such as GHC.Internal.*:

  1. Move String into its own library somewhere, whose contents are then re-exported via base.

  2. Re-export Text via base, with deprecation-alerts added in that String library.

  3. Then stop re-exporting String from base altogether.

  4. Those deprecation-alerts can then be removed from the String library.

In that way, an effort to replace String with Text can help to make base (as it is now) smaller, rather than adding to it: nicely-done, mixphix!


So if an experienced Haskeller can be confused in that way, spare a thought for new arrivals - perhaps boring old SML-style:

explode :: Text -> [Char]
implode :: [Char] -> Text

would be a simpler option for educational purposes?
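
With today’s text package, those two functions would be thin aliases for unpack and pack. A minimal sketch, keeping the SML names from above:

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- SML-style names for the existing conversions in Data.Text.
explode :: Text -> [Char]
explode = T.unpack

implode :: [Char] -> Text
implode = T.pack

-- A list-style algorithm applied to Text by going through [Char].
reverseViaList :: Text -> Text
reverseViaList = implode . reverse . explode
```

Each conversion copies the whole string, so this suits teaching and small inputs better than hot paths.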

Maybe not with these names… that’s giving me PHP flashbacks…

Alright. I’m sorry for taking such a long time to write another message, but I felt I had to absorb everything that everyone has been saying. I’m extremely thankful for all the interactions, and I’d like to make a couple of points in response:

First off, perhaps in the long run it is more user friendly to eradicate Text rather than String. From everything mentioned thus far, it feels like people are imagining that, in a hypothetical world where we have all carried out this herculean task, all code would be littered with Text instead of String. I think that this wouldn’t be a good experience for beginners, as the type is called String just about everywhere else. Of course, I don’t mean that we should just remove Text and its semantics, but more like keep String’s name and replace its internals with Text…? This sounds kind of tough, but maybe it’s easier than I think?

The issues that come with doing something like that, as people mentioned, come in a lot of forms.
Firstly… one of the things at the heart of the problem is that, as someone previously mentioned, a String is ultimately just a [Char], and this is perhaps easier to abstract over than a monomorphic Text?

I’m not quite sure how one would solve this… I think it would be best if, outwardly, String still acted similarly to [Char], but it would actually be invalid to say type String = [Char], as the internals would be closer to Text. I am really quite unsure about this, as I don’t know the internals well. All I can say is that it would be more convenient if we picked Text over the current String and somehow replaced String with it, while attempting to make it still act like String but with the benefits of Text? To me, this sounds like the premise of a really bad piece of fiction, but maybe people can think things up to make it a reality…

In case people complain about the special-casing part and not the outrageous piece of fiction that I’ve written, I think it’s totally fair to do some fairly special things when it comes to String. If nothing else, it would be considerably more convenient if a string type could get some use out of typeclasses like Foldable and the others. Perhaps we can create some sort of StringContainer type that could be used if we don’t want to rely on Char for anything… though, maybe it could be made to work with Text?

I think this will help with what @tomjaguarpaw previously mentioned, about being unable to stop people putting Chars inside []. I think that all we have to do is lie and make it so that [Char] is really an alias to the new String, or something similar?

I think this will also help with pattern matching, as it would now be simpler?
I’m really sorry if this all sounds really idiotic, but it sounds like the best case scenario, so I’m interested in what we would need to sacrifice to reach this…

Secondly… as previously mentioned, there’s the issue of whether we should pick lazy or strict variants of Text. There have also been recommendations to have Text be strict and use [Text] for laziness, I believe.

I’m not really too picky about this part, as I think it would be frankly equivalent in either situation… though, I suppose it would perhaps be better to go with the lazy text variant; I don’t see much benefit in having strict text be the default, and I feel like a “default” Text not breaking the expectations that come from String would be without much fault. As I said though, I don’t really have any strong feelings here.
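
For readers weighing the two options, a small sketch of how the lazy and strict types in the text package relate. The value names are made up for illustration. A lazy Text can be assembled from strict chunks and can represent an unbounded stream, much as a String can:

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- A lazy Text built from a list of strict chunks.
strictChunks :: [T.Text]
strictChunks = [T.pack "lazy ", T.pack "text"]

assembled :: TL.Text
assembled = TL.fromChunks strictChunks

-- Laziness permits infinite values, as with String: only the demanded
-- prefix of the cycled text is ever produced.
firstFive :: TL.Text
firstFive = TL.take 5 (TL.cycle (TL.pack "ab"))
```

TL.toStrict assembled collapses the chunks into one strict Text, which is the point where the whole value has to be in memory at once.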

Once again, I’m really sorry if this is sounding outrageous…

SWI-Prolog recently made a similar change:

(In ISO Prolog, a double-quoted string literal is converted to a list of character-codes: What type is a single quoted string? - Help! - SWI-Prolog .)

It’s not idiotic nor outrageous. Two things that come to mind:

  • String (or rather [Char]) has the unfortunate property of being pattern-matched quite heavily, so a lot of people’s code – like (x:xs) – will become invalid suddenly

  • There have also been recommendations to have Text be strict and use [Text] for laziness, I believe.

No idea who recommended it, but it’s not good advice; a dedicated streaming library should be used (conduit, streamly, etc.).

I’d prefer the other way round: let GHC store all data in a serialized way, as demonstrated by the Gibbon project. Then String is stored as efficiently as Text, and we can drop Text and all of its flavors.