The Quest to Completely Eradicate `String` Awkwardness

Either that or it would require Rust-style editions, as opposed to whatever this was supposed to be:

…so both are probably just as unlikely to happen.

Also, a daily reminder that moving things into base is not free, because base is not (yet) reinstallable: shipping bugfixes becomes slower, and changing the API requires a CLC proposal.


On another note, which I don’t see discussed yet: String is defined in the Haskell Report:


Regarding that note:

  • 30 Foreign.Marshal
# ghci
GHCi, version 9.8.2: https://www.haskell.org/ghc/  :? for help
ghci> import Foreign.Marshal
ghci> :t unsafeLocalState

<interactive>:1:1: error: [GHC-88464]
    Variable not in scope: unsafeLocalState
ghci> 

…so is adherence to the Haskell 2010 Report still an “absolute” requirement? If not, then type String = [Char] can also be moved out to its own library, possibly along with (the String-based versions of) Read and Show.

The requirements are ad-hoc and depend on the current CLC. As long as I am on it, I’m likely to vote against proposals that violate the standard. I know at least a couple of other CLC members are much more lenient when it comes to it.


I think one very big impediment to getting rid of String is that you simply can’t stop people using String. String is just [Char]. Char is never going away. [] is never going away. You can’t stop people putting Chars inside []s. It’s too easy to use to eradicate.

I think efforts are better spent making sure Text is easier to use, particularly, making sure the preferred API of every popular library uses Text instead of String. Something else that would help in this direction would be a putative extension StringLiteralsAreText. OverloadedStrings is too general to be ergonomic, in my opinion.
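For comparison, here is a minimal sketch of what the `OverloadedStrings` route looks like today, assuming only the `text` package. String literals become `IsString`-polymorphic, so they can be passed straight to `Text`-first functions; the same syntax still works where a `String` is expected, which is exactly the generality being criticised. The names `defaultName`, `greet` and `legacyGreet` are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- With OverloadedStrings this literal is elaborated to fromString "world" :: Text.
defaultName :: Text
defaultName = "world"

-- A Text-first API: callers can pass literals directly, no T.pack needed.
greet :: Text -> Text
greet name = T.concat ["Hello, ", name, "!"]

-- The very same literal syntax also produces a String when one is expected,
-- so the type of a literal always depends on its context.
legacyGreet :: String -> String
legacyGreet name = "Hello, " ++ name ++ "!"
```

A hypothetical `StringLiteralsAreText` would presumably fix the literal's type to `Text` instead of leaving it to inference.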


Isn’t advocating (strict) Text premature optimisation?

I love String. The API is great (it’s in the Prelude, you can use all the list functions without imports or name clashes, you can pattern match on it, and it’s lazy), and most of the time it’s quick enough. So in my opinion Strings are not only “good enough” but actually better than Text. The only advantage of Text is a potential performance gain that I usually can’t see or don’t care about.
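To make those conveniences concrete, a small sketch using only `base`: because `String` is just `[Char]`, list syntax, pattern matching and the Prelude's list functions all apply unchanged, and laziness lets you work with an infinite `String`. The helper names are illustrative.

```haskell
import Data.Char (isDigit, toUpper)

-- String is a synonym for [Char], so ordinary list syntax builds one.
hi :: String
hi = ['h', 'i']

-- Pattern matching on the list structure directly, no extra imports.
stripHash :: String -> String
stripHash ('#' : rest) = rest
stripHash s = s

-- Any Prelude list function applies: map, filter, takeWhile, ...
shout :: String -> String
shout = map toUpper

leadingDigits :: String -> String
leadingDigits = takeWhile isDigit

-- Laziness: only the demanded prefix of an infinite String is ever built.
firstFive :: String
firstFive = take 5 (cycle "ab")
```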

When I encounter a performance issue, I switch to Text, but even then I am not sure that strict Text is the best choice. When talking about Text, people then talk about Builder or lists of strict Text; isn’t that what lazy Text is (a list of strict Text)? Lazy Text seems to me the best of both worlds: why would I waste time copying two strings to concatenate them when I can just keep both in memory?
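On the representation question: in the `text` package, lazy `Text` is structurally close to a lazily built list of strict `Text` chunks, and appending relinks chunks rather than copying characters. A minimal sketch, assuming the `text` package; it only illustrates the representation and says nothing about which variant is faster.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TB

-- A lazy Text is a lazily built sequence of strict Text chunks.
twoChunks :: TL.Text
twoChunks = TL.fromChunks ["hello, ", "world"]

-- Appending lazy Texts rebuilds the chunk spine of the left operand,
-- reusing the existing strict chunks instead of copying their characters.
joined :: TL.Text
joined = TL.append twoChunks (TL.fromStrict "!")

-- A Builder accumulates pieces and materialises them on demand, here as lazy Text.
built :: TL.Text
built = TB.toLazyText (TB.fromText "hello, " <> TB.fromString "world" <> TB.singleton '!')

-- Converting to strict Text copies everything into a single contiguous array.
flattened :: T.Text
flattened = TL.toStrict joined
```

Equality on lazy `Text` compares contents, not chunk boundaries, so `built` and `joined` compare equal even if they are chunked differently.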

Do we actually have evidence that strict Text performs better than lazy Text?

In an ideal world, the compiler would be able to map String internally to lazy Text, or at least to allocate a whole string literal in one chunk.


To expand on this, there is a text data structure that has relatively good asymptotic complexity for all sensible operations (at worst a log factor slower than Text):

Why don’t we make that the default? Do we have benchmarks showing the space and time overhead compared to Text?


Exactly!

Core libraries or others have no obligation to support String/FilePath forever.

The sad part about this is that they are taking up the namespace as historical mistakes.

I feel people have too high expectations of base. Does it need a redesign? Most certainly. But between Haskell Report adherence and the lack of good migration strategies… I don’t see that happening.

So these days I’m thinking: maybe the way forward is to shrink base as much as possible, because everything outside of base is:

  • not subject to those pesky CLC proposals 😉
  • reinstallable
  • irrelevant to the Haskell report

Alternative preludes can then achieve more opinionated interfaces and the community can vote with their feet.

That still leaves a bitter aftertaste in my mouth though when I have to explain to newcomers that:

  • String was a mistake
  • lazy IO (so… half of base) is unsafe

Sometimes you don’t get a second shot at an implementation or interface, as this discussion reveals… maybe that also explains why the CLC is sometimes overly conservative.


Another (early) internal-implementation option is a “juxtaposed-subnode” arrangement (other terms have been used). For lists, the idea is that the tail of the list sits next to the cons node, rather than in some other part of the heap (reached via a second reference). Lists would then be more like short arrays, ending either in a nil node or in an indirection to another shared list (with singly-used lists being appended automatically during GC).

But this arrangement could have benefits for all structured Haskell values with at least one lifted component or field: it isn’t list-specific.

So would this bring [Char] closer to parity with lazy Text?


If so, is abstract monadic I/O an abject failure?


Then you can tell the SWI-Prolog folks that they also got it wrong:

(…while the rest of us watch from a safe distance ;-)

There is already a defaulting mechanism for Num (via the default keyword, as in default (Int, Double)). Could something similar be done for overloaded strings (a defaultString)?
It could be part of the OverloadedStrings extension.

At what point does it become more beneficial to discuss moving text, primitive, deepseq and many other “core libraries” into the compiler as well? In 9.10, base is already being absorbed into GHC.Internal.* modules. Perhaps a good move is to merge everything into the compiler libraries, and only then export from base? It might permit a smoother migration trajectory than changing base outright, allowing for better integration from the first steps. It would also let libraries have a smaller dependency footprint, since base would re-export these core libraries, giving them broader accessibility and discoverability. It’s evident that not everyone knows about Hoogle!

Doesn’t ExtendedDefaultRules cover that?


Possibly, but from what I understand from reading the docs, it only works within GHCi and only for some classes (and IsString is not one of them). Please correct me if I am wrong.

Nope, the point of it is to use ghci defaulting rules in normal modules!

Reading the docs, it does seem like it only applies to a few classes, but I’ve definitely used default (Text) with OverloadedStrings before. 🤷
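As far as I understand the GHC User's Guide section on `OverloadedStrings`, that works because, with the extension on, a `default` declaration may list `IsString` instances as well as `Num` instances, and string literals take part in defaulting. A minimal sketch, assuming GHC and the `text` package:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- With OverloadedStrings on, IsString instances are accepted in a default
-- declaration. The declaration replaces the whole default list, so Integer
-- and Double are repeated here to keep ordinary numeric defaulting.
default (Integer, Double, Text)

-- This literal is constrained only by Show and IsString. Integer and Double
-- are not IsString instances, so defaulting moves on and picks Text.
rendered :: String
rendered = show "hi"

-- Numeric defaulting still works: both the base and the exponent default to Integer.
big :: String
big = show (2 ^ 70)
```

Without the `default` line, the first literal would typically be rejected as ambiguous, since neither `Integer` nor `Double` is an `IsString` instance.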

Thank you for writing that down. Is there a migration plan for replacing String pattern matching with Text? It looks like we would need new view patterns, for example to match a Text starting with a given character, or am I missing something?

Also, looking at Data.Text.Read, is there a reason why the error message is a String? Perhaps the eradication should start there 🙂

Unless I’m misunderstanding what you’re saying, we can already use uncons with ViewPatterns to get that sort of behaviour, though it will always be more verbose than for String.
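A minimal sketch of that `uncons` approach, assuming the `text` package, next to the equivalent `String` match. The function names are illustrative.

```haskell
{-# LANGUAGE OverloadedStrings, ViewPatterns #-}
import Data.Text (Text)
import qualified Data.Text as T

-- String: the list constructors can be matched directly.
tagNameS :: String -> Maybe String
tagNameS ('#' : rest) = Just rest
tagNameS _ = Nothing

-- Text: the same match goes through T.uncons in a view pattern.
tagNameT :: Text -> Maybe Text
tagNameT (T.uncons -> Just ('#', rest)) = Just rest
tagNameT _ = Nothing
```

For this particular case `T.stripPrefix "#"` would also do; the view pattern matters more once several character cases need to be distinguished.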

Ha, that is a bit ironic. Maybe MonadFail compatibility?
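Whatever the original motivation, one practical consequence is that `Data.Text.Read` parsers have type `Text -> Either String (a, Text)`, so their error message plugs straight into `fail :: String -> m a`. A minimal sketch, assuming the `text` package; `readIntM` is a hypothetical helper, not a library function.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- The Left branch of a Data.Text.Read parser carries a String message,
-- which can be handed to MonadFail's fail unchanged.
readIntM :: MonadFail m => Text -> m Int
readIntM t = case TR.signed TR.decimal t of
  Left err -> fail err
  Right (n, rest)
    | T.null rest -> pure n
    | otherwise   -> fail ("trailing input: " ++ T.unpack rest)
```

In `Maybe`, `fail` discards the message and yields `Nothing`; in `IO` it throws an exception carrying the message.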

Oops, I meant pattern synonyms, similar to the ones from Data.Sequence, to provide a simple migration path for String users who rely on list patterns to match strings.
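Such pattern synonyms can already be defined in user code. A minimal sketch modelled on Data.Sequence's `Empty` and `(:<|)`, assuming the `text` package; the names `Nil` and `(:<)` are hypothetical and are not the `text` package's API.

```haskell
{-# LANGUAGE OverloadedStrings, PatternSynonyms, ViewPatterns #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical "empty" pattern: matches via T.null, builds with T.empty.
pattern Nil :: Text
pattern Nil <- (T.null -> True)
  where
    Nil = T.empty

-- Hypothetical "cons" pattern: matches via T.uncons, builds with T.cons.
infixr 5 :<
pattern (:<) :: Char -> Text -> Text
pattern c :< cs <- (T.uncons -> Just (c, cs))
  where
    c :< cs = T.cons c cs

-- Tell the exhaustiveness checker that these two patterns cover every Text.
{-# COMPLETE Nil, (:<) #-}

-- String-style recursion ports almost verbatim.
countHashes :: Text -> Int
countHashes Nil = 0
countHashes ('#' :< rest) = 1 + countHashes rest
countHashes (_ :< rest) = countHashes rest
```

Note that each `(:<)` match performs a `T.uncons` and each `(:<)` construction a `T.cons`, which copies the text, so this eases migration rather than matching the list version's cost model.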


It’s on the way.


So once base is reduced to one or more modules which merely re-export from other modules, such as GHC.Internal.*:

  1. Move String into its own library somewhere, whose contents are then re-exported via base.

  2. Re-export Text via base, with deprecation warnings added in that String library.

  3. Then stop re-exporting String from base altogether.

  4. Those deprecation warnings can then be removed from the String library.

In that way, an effort to replace String with Text can help make base (as it is now) smaller, rather than adding to it: nicely done, mixphix!


So if an experienced Haskeller can be confused in that way, spare a thought for new arrivals. Perhaps boring old SML-style:

explode :: Text -> [Char]
implode :: [Char] -> Text

would be a simpler option for educational purposes?
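For reference, a minimal sketch of those SML-style names, assuming the `text` package: they are just aliases for the existing `T.unpack` and `T.pack`.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- SML-style aliases; in the text package these are T.unpack and T.pack.
explode :: Text -> [Char]
explode = T.unpack

implode :: [Char] -> Text
implode = T.pack
```

One caveat for teaching: `implode` is not an exact inverse of `explode` on every `[Char]`, because `T.pack` replaces surrogate code points (which `Text` cannot store) with U+FFFD.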

Maybe not with these names… that’s giving me PHP flashbacks…