The Quest to Completely Eradicate `String` Awkwardness

Per Wikipedia at least, UTF-8 didn’t rise to dominance until the mid/late '00s. Yet another fun example of Haskell being old.

If only people would treat this sort of thing as fun instead of complaining so much. We can deal with it - it’s just programming :slight_smile:


Actually I was thinking about a future Haskell Report, but that would be years away - I would be starting small with something like Hugs, and slowly switch its sources over to an abstract String/Text while removing Char, to get a better idea of where the problems are lurking…


Redefining String goes against the Haskell Report, and so does dropping Char.

Language editions when :frowning:


If you mean language editions like what’s possible in Rust…I would suggest waiting for the second Rust compiler to be fully operational: it too will need to support language editions, which would then provide a point of comparison with the existing compiler. A Haskell language-edition extension could then use those two implementations of editions as points of reference.

You mean when a certain language edition is enabled, it would also swap base and other libraries (because it can’t be cabal flags)?

I’m having a bit of a hard time imagining how that would work overall.


I doubt the idea I describe below is worth the complexity/effort. There will imo always be different types covering different needs: builders, human text, string-like byte sequences, and pinned vs unpinned. So I don’t believe one can eradicate the awkwardness completely. Merely reduce it!

But with that disclaimer out of the way:


I was thinking more of swapping out what some names map to rather than whole packages. Something along the lines of -XFancyStrings, with fundamental compiler support inspired by Backpack, which would mean:

  • String literals are Text or String depending on whether -XFancyStrings is enabled.
  • There is a mechanism in place for libraries to provide different implementations for a given name depending on the language edition.
  • This can be used to map names from Data.String and other relevant modules to implementations/types using either [Char] or Text.
  • For example, definitions in Data.String would mostly map to one of Data.String.Legacy or Data.String.Text, depending on the edition used.
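
For the literal part, the closest thing that exists today is GHC's OverloadedStrings extension, where a literal takes whatever IsString type the context asks for. To be clear, -XFancyStrings and the Data.String.Legacy / Data.String.Text modules above are hypothetical; the sketch below only uses OverloadedStrings and the text package, and the names greeting, asLegacy and asText are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString)
import qualified Data.Text as T

-- With OverloadedStrings a literal has type `IsString a => a`,
-- so one piece of source text can become either representation.
greeting :: IsString s => s
greeting = "hello"

-- The same literal, resolved to the list-of-Char representation.
asLegacy :: String
asLegacy = greeting

-- The same literal, resolved to Text.
asText :: T.Text
asText = greeting
```

Loaded in GHCi, `T.unpack asText == asLegacy` holds. The difference from the idea above is that here the choice is made per use site by type inference, not once per module by an edition flag.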

That should mean:

  • A code base would compile without change if:
    • Strings are treated as opaque
    • The code base uses the same edition for all modules.
    • Dependencies provide an interface for the given edition.
  • Code that accesses string internals can be compatible with both editions by providing different implementations for each edition.
  • Code compiled using the new edition can use packages that only have an old-edition interface by explicitly converting arguments. (Or maybe GHC/HLS could even do this for us if desired.)
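
As a rough approximation of what "different implementations per edition" could look like with today's tools, here is a sketch using CPP. Everything named here is an assumption for illustration: the FANCY_STRINGS macro stands in for the hypothetical edition flag, and Str, shout, toLegacy, fromLegacy, User and rename are invented names. A real edition mechanism would presumably do this selection in the compiler rather than in the preprocessor.

```haskell
{-# LANGUAGE CPP #-}

-- FANCY_STRINGS is a hypothetical macro standing in for the edition flag.
#ifdef FANCY_STRINGS
import qualified Data.Text as T
#else
import Data.Char (toUpper)
#endif

-- Code that touches string internals needs one implementation per edition.
shout :: Str -> Str
-- Explicit conversions for talking to packages that only speak [Char].
toLegacy :: Str -> String
fromLegacy :: String -> Str

#ifdef FANCY_STRINGS
type Str = T.Text
shout = T.toUpper
toLegacy = T.unpack
fromLegacy = T.pack
#else
type Str = String
shout = map toUpper
toLegacy = id
fromLegacy = id
#endif

-- Opaque use: the string is only stored and passed along,
-- so this code is identical under both editions.
data User = User { userName :: Str }

rename :: Str -> User -> User
rename newName user = user { userName = newName }
```

Only shout and the two conversions differ between the branches; User and rename never look inside the string. One caveat: T.toUpper and map toUpper can disagree on some non-ASCII input, which is exactly the kind of difference a per-edition implementation would have to own.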

In theory that would allow everyone to use an opaque String type in a forward-compatible way, and once all their dependencies are up to date they can just switch to the new edition themselves.

But as I said, I’m not convinced it’s worth it.

Of course, if your code base does that, you would simply have to either change it or remain on the old language edition (potentially introducing explicit conversions if your dependencies drop the old-edition interface).

But I still think the vast majority of string (or FilePath) use is opaque if you look at it on a per-function level. In all those functions where you simply pass a String along, or store it in some data type, nothing would need to change. Which is why I think language editions could greatly reduce the pain of an ecosystem-wide transition from String to Text, if it ever happened. They still can’t make all code that worked with String magically work with Text.