The Quest to Completely Eradicate `String` Awkwardness

Some time ago, this was written, presumably as a passing joke:

…but if things really are (already?) so bad:


  • data String a ...
    
    unitString :: a -> String a
    unitString x = ...
    
    bindString :: String a -> (a -> String b) -> String b
    bindString str k = ...
    
          ⋮
    
  • data Byte ...
    type ByteString = String Byte
          ⋮
    
  • data OS ...
    type OSString = String OS
          ⋮
    
  • data Char ...
    type Text = String Char
          ⋮
    
  • data Bin ...
    type BinString = String Bin
          ⋮
    

Or can some simplicity be salvaged from the morass of “stringy” types?
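
The parametric `String a` idea above can be made concrete in a few lines. This is a minimal sketch that assumes, hypothetically, that `String a` is just a newtype over lists. Under that assumption `unitString` and `bindString` are the list monad's `return` and `>>=`. `Word8` stands in for `Byte`, and the Prelude's own `Char` is reused rather than redefined:

```haskell
import Prelude hiding (String)
import Data.Word (Word8)

-- Hypothetical "string of anything": a newtype over ordinary lists.
newtype String a = String { unString :: [a] }
  deriving (Eq, Show)

unitString :: a -> String a
unitString x = String [x]

-- Replace every element by a whole string, then flatten (list-monad bind).
bindString :: String a -> (a -> String b) -> String b
bindString (String xs) k = String (concatMap (unString . k) xs)

instance Functor String where
  fmap f (String xs) = String (map f xs)

instance Applicative String where
  pure = unitString
  fs <*> xs = bindString fs (\f -> fmap f xs)

instance Monad String where
  (>>=) = bindString

-- Specialisations from the post (Word8 stands in for Byte).
type ByteString = String Word8
type Text       = String Char
```

For example, `bindString (String "ab") (\c -> String [c, c])` evaluates to `String {unString = "aabb"}`.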

Before dreaming of eradicating String, someone would have to accomplish the much more mundane task of putting the Text datatype into base. Yet I don’t see a queue of volunteers.

Mechanically, that doesn’t sound difficult. Out of idle curiosity, what additional work would one be volunteering for, beyond submitting an MR that moves the code from one library to the other and adds aliases/deprecation warnings?

It would be easier to comment on the additional work if someone first described exactly how they would approach the task. Moving ByteArray from primitive into base is the closest precedent and can serve as a model.

If that happened, the breakage would be monumental; hence it’s not going to happen.

Either that or it would require Rust-style editions, as opposed to whatever this was supposed to be:

…so both are probably just as unlikely to happen.

Also, a daily reminder that moving things into base is not free, because base is not (yet) reinstallable: shipping bugfixes becomes slower, and changing the API requires a CLC proposal.


On another note, which I haven’t seen discussed yet: String is defined in the Haskell Report:

Regarding that note:

  • 30 Foreign.Marshal
# ghci
GHCi, version 9.8.2: https://www.haskell.org/ghc/  :? for help
ghci> import Foreign.Marshal
ghci> :t unsafeLocalState

<interactive>:1:1: error: [GHC-88464]
    Variable not in scope: unsafeLocalState
ghci> 

…so is adherence to the Haskell 2010 Report still an “absolute” requirement? If not, then type String = [Char] can also be moved out to its own library, possibly along with (the String-based versions of) Read and Show.
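
For context, GHC’s base has not dropped the Report’s `unsafeLocalState`. It is exported from `Foreign.Marshal.Unsafe` rather than `Foreign.Marshal`. Below is a small sketch of the kind of use the Report allows: a pure function wrapping IO that only touches locally allocated memory. The name `doubleViaBuffer` is hypothetical:

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Unsafe (unsafeLocalState)
import Foreign.Storable (peek, poke)

-- Only local allocation and pointer operations on that local memory happen
-- inside the IO action, which is the restricted usage the Report permits.
doubleViaBuffer :: Int -> Int
doubleViaBuffer n = unsafeLocalState $ alloca $ \ptr -> do
  poke ptr (n * 2)
  peek ptr
```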

The requirements are ad hoc and depend on the current CLC. As long as I am on it, I’m likely to vote against proposals that violate the standard. I know at least a couple of other CLC members who are much more lenient about it.

I think one very big impediment to getting rid of String is that you simply can’t stop people using String. String is just [Char]. Char is never going away. [] is never going away. You can’t stop people putting Chars inside []s. It’s too easy to use to eradicate.
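
To illustrate the point: `String` is only a type synonym. Every list function and list pattern applies to it directly, and a `[Char]` can be passed wherever a `String` is expected. A tiny sketch (the function names are illustrative only):

```haskell
-- In the Prelude, `type String = [Char]`, so these signatures are interchangeable.
shout :: [Char] -> String
shout s = s ++ "!"

-- Ordinary list patterns work on String without any extension.
isGreeting :: String -> Bool
isGreeting ('h' : 'i' : _) = True
isGreeting _               = False
```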

I think efforts are better spent making sure Text is easier to use, particularly by making sure the preferred API of every popular library uses Text instead of String. Something else that would help in this direction would be a putative StringLiteralsAreText extension. OverloadedStrings is too general to be ergonomic, in my opinion.

Isn’t advocating (strict) Text premature optimisation?

I love String: the API is great (it’s in the Prelude, all list functions work without imports or name clashes, it supports pattern matching, and it’s lazy), and most of the time it’s fast enough. So in my opinion Strings are not only “good enough” but actually better than Text. The only advantage of Text is a potential performance gain that I usually can’t see or don’t care about.

When I encounter a performance issue, I switch to Text, but then I’m not even sure that strict Text is the best choice. When people talk about Text, they then talk about Builder or lists of strict Text; isn’t that what Lazy.Text is (a list of strict Text chunks)? Lazy Text seems to me the best of both worlds (why would I waste time copying two strings to concatenate them when I can just keep both in memory?).
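
Lazy `Text` is indeed, internally, a lazy list of strict chunks. Its `append` links the left argument’s chunks onto the right argument instead of copying characters, whereas strict `append` allocates a new buffer and copies both inputs. The `text` package exposes the chunk structure through `toChunks`/`fromChunks`, so the difference can be observed in a quick sketch (the binding names are illustrative):

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Lazy append keeps the two strict chunks as they are (no character copying).
lazyJoined :: TL.Text
lazyJoined = TL.fromStrict (T.pack "foo") <> TL.fromStrict (T.pack "bar")

-- Strict append allocates one new buffer and copies both inputs into it.
strictJoined :: T.Text
strictJoined = T.pack "foo" <> T.pack "bar"

-- A Builder accumulates pieces and produces lazy Text in large chunks.
built :: TL.Text
built = B.toLazyText (B.fromString "foo" <> B.singleton '-' <> B.fromText (T.pack "bar"))
```

This only shows the representations; it does not by itself answer which variant is faster for a given workload.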

Do we actually have evidence that strict Text performs better than lazy Text?

In an ideal world, the compiler would be able to map String internally to Lazy.Text, or at least allocate a whole string literal in one chunk.

To expand on this, there is a text data structure that has relatively good asymptotic complexity for all sensible operations (at worst a log factor slower than Text):

Why don’t we make that the default? Do we have benchmarks showing the space and time overhead compared to Text?
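
The structure referred to above isn’t reproduced here. As a rough stand-in, `Data.Sequence` from containers is a finger tree with the same flavour of guarantees: O(log(min(i, n − i))) indexing and splitting and O(log(min(n, m))) concatenation. The sketch below builds a hypothetical `Rope` on it. The type and helper names are illustrative only, and one `Char` per element carries a large constant-factor overhead that a real rope would avoid by storing packed chunks:

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (><))
import qualified Data.Sequence as Seq

-- Hypothetical rope type: a finger tree holding one Char per element.
newtype Rope = Rope (Seq Char)
  deriving (Eq, Show)

ropeFromString :: String -> Rope
ropeFromString = Rope . Seq.fromList

ropeToString :: Rope -> String
ropeToString (Rope s) = toList s

-- O(1): Seq caches its size.
ropeLength :: Rope -> Int
ropeLength (Rope s) = Seq.length s

-- O(log(min(n, m))) concatenation.
ropeAppend :: Rope -> Rope -> Rope
ropeAppend (Rope a) (Rope b) = Rope (a >< b)

-- O(log(min(i, n - i))) random access; Nothing when out of bounds.
ropeIndex :: Int -> Rope -> Maybe Char
ropeIndex i (Rope s) = Seq.lookup i s

-- Split at position i and splice in new text: a few logarithmic operations.
ropeInsert :: Int -> Rope -> Rope -> Rope
ropeInsert i (Rope new) (Rope s) =
  let (before, after) = Seq.splitAt i s
  in  Rope (before >< new >< after)
```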

Exactly!

Core libraries or others have no obligation to support String/FilePath forever.

The sad part is that these historical mistakes still take up the namespace.

I feel people have too-high expectations of base. Does it need a redesign? Most certainly. But between Haskell Report adherence and the lack of good migration strategies… I don’t see that happening.

So these days I’m thinking: maybe the way forward is to shrink base as much as possible, because everything outside of base is:

  • not subject to those pesky CLC proposals 😉
  • reinstallable
  • irrelevant to the Haskell report

Alternative preludes can then offer more opinionated interfaces, and the community can vote with its feet.

That still leaves a bitter aftertaste in my mouth, though, when I have to explain to newcomers that:

  • String was a mistake
  • lazy IO (so… half of base) is unsafe

Sometimes you don’t get a second shot at an implementation or interface, as this discussion reveals… maybe that also explains why the CLC is sometimes overly conservative.

Another (early) internal-implementation option is a “juxtaposed-subnode” arrangement (other terms have been used). For lists, the idea is that the tail of the list sits right next to the cons node, rather than in some other part of the heap (reachable via a second reference). Lists would then be more like short arrays, ending either in a nil node or in an indirection to another shared list (with singly-used lists being automatically appended during GC).

But this arrangement could have benefits for all structured Haskell values with at least one lifted component or field: it isn’t list-specific.

So would this bring [Char] closer to parity with lazy Text?


If so, is abstract monadic I/O an abject failure?


Then you can tell the SWI-Prolog folks that they also got it wrong:

(…while the rest of us watch from a safe distance ;-)

There is already a defaulting mechanism for Num (via the default keyword, as in default (Int, Double)). Maybe something similar could be done for overloaded strings (a defaultString?).
It could be part of the OverloadedStrings extension.

At what point does it become more beneficial to discuss moving text, primitive, deepseq and many other “core libraries” into the compiler as well? As of 9.10, base is already being absorbed into GHC.Internal.* modules. Perhaps a good move is to merge everything into the compiler libraries, and only then export it from base? That might permit a smoother migration trajectory than changing base outright, allowing for better integration from the first steps. It would also let libraries have a smaller dependency footprint, with base re-exporting these core libraries and giving them broader accessibility and discoverability. It’s evident that not everyone knows about Hoogle!

Doesn’t ExtendedDefaultRules cover that?

Possibly, but from what I understand from reading the docs, it only works within ghci and only for some classes (and IsString is not one of them). Please correct me if I am wrong.

Nope, the point of it is to use ghci defaulting rules in normal modules!

Reading the docs, it does seem like it only applies to a few classes, but I’ve definitely used default (Text) with OverloadedStrings before 🤷
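
For reference, the GHC user guide’s section on overloaded string literals says that, with OverloadedStrings, a `default` declaration may list `IsString` instances. Defaulting then also resolves ambiguous `IsString` constraints. Below is a minimal module-level sketch, assuming the `text` package is available. `Integer` and `Double` are kept in the list so that numeric defaulting behaves as usual:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Candidates are tried left to right: Integer and Double are not IsString
-- instances, so an ambiguous string literal defaults to Text.
default (Integer, Double, Text)

-- On its own, `show "hello"` leaves (Show a, IsString a) ambiguous;
-- the default declaration resolves `a` to Text.
shown :: String
shown = show "hello"

-- A literal whose type is already fixed by context needs no defaulting.
greeting :: Text
greeting = T.toUpper "hello"
```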

Thank you for writing that down. Is there a migration plan for replacing String pattern matching with Text? It looks like we would need new view patterns, for example to match a Text starting with a given character; or am I missing something?
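
One existing tool for this is PatternSynonyms. A bidirectional synonym built on `T.uncons`, which is O(1), lets Text be matched much like a list. This is a minimal sketch: the `:<` and `Empty` names are hypothetical and are not exported by the `text` package:

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Matches a non-empty Text as (first character, rest); also builds one.
pattern (:<) :: Char -> Text -> Text
pattern c :< rest <- (T.uncons -> Just (c, rest))
  where c :< rest = T.cons c rest

infixr 5 :<

-- Matches (and builds) the empty Text.
pattern Empty :: Text
pattern Empty <- (T.null -> True)
  where Empty = T.empty

-- Tells the coverage checker that these two patterns are exhaustive.
{-# COMPLETE Empty, (:<) #-}

-- Mirrors the String idiom  stripComment ('#' : rest) = ...
stripComment :: Text -> Maybe Text
stripComment ('#' :< rest) = Just (T.strip rest)
stripComment _             = Nothing

countVowels :: Text -> Int
countVowels Empty = 0
countVowels (c :< rest)
  | c `elem` "aeiou" = 1 + countVowels rest
  | otherwise        = countVowels rest
```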

Also, looking at Data.Text.Read, is there a reason why the error message is a String? Perhaps the eradication should start there 🙂
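
To make the observation concrete: in the `text` package, `Data.Text.Read` defines `type Reader a = Text -> Either String (a, Text)`. So even a fully Text-based parse reports failures as a `String`. A small sketch follows; `parsePort` is a hypothetical helper:

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (decimal)

-- decimal :: Integral a => Reader a, i.e. Text -> Either String (a, Text).
-- A success returns the value plus the unconsumed Text; a failure is a String.
parsePort :: Text -> Either String Int
parsePort t = case decimal t of
  Right (n, rest)
    | T.null rest -> Right n
    | otherwise   -> Left ("trailing input: " ++ T.unpack rest)
  Left err        -> Left err
```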