It would only really make sense in the context of packages that provide all the necessary functions around said newtypes, so 90% of the text package would still exist in that sense (though renaming it utf8 would make more sense).
Also, I don’t like long names and wouldn’t want to specify the encoding every single time I talk about the type. So while I would want this to be roughly Bytes (encoding :: Type), I’d still want to type only Bytes and have GHC route the encoding behind my back.
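A rough sketch of what I mean (my own illustration - the tags and names here are invented, not an existing API): a phantom type parameter carries the encoding, and a default synonym spares me from spelling it out most of the time:

```haskell
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)
import Data.Word (Word8)

-- Hypothetical encoding tags (assumptions, not a real API):
data UTF8
data Latin1

-- A byte sequence tagged with its encoding at the type level.
newtype Bytes (encoding :: Type) = Bytes [Word8]

-- A default synonym, so one usually types just Bytes':
type Bytes' = Bytes UTF8

-- Functions that don't care about the encoding stay polymorphic:
byteLength :: Bytes encoding -> Int
byteLength (Bytes ws) = length ws

main :: IO ()
main = print (byteLength (Bytes [0x68, 0x69] :: Bytes'))
```

The encoding would then only surface at the boundaries where it actually matters (decoding, transcoding), not in every signature.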
Perhaps the “original mistake” in Haskell’s design (for this ongoing saga, anyway) wasn’t the definition type String = [Char], but having a Char type at all in Haskell, particularly as Unicode breaks the ASCII-era mapping of single letters (as seen on ASCII keyboards of that era) to machine-precision integers.
If String were abstract, then the encoding of the various individual Unicode glyphs (or “letters”, grapheme clusters, whatever) would be an internal implementation matter. This would have allowed the shortest possible String to be just one Unicode glyph:
"ते" :: String
which would be far more intuitive to a native writer of Devanagari than typing out:
['त', 'े'] :: [Char]
…especially considering that the computer was invented to serve our purposes.
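For the record, today’s code-point-based types really do see two units there - a quick check with the text package, which (like [Char]) counts code points rather than glyphs:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- One glyph to a Devanagari reader, two code points to the machine:
te :: T.Text
te = "ते"

main :: IO ()
main = print (T.length te, length (T.unpack te))  -- both count code points: (2,2)
```

An abstract, glyph-granular String would report 1 here instead.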
From 1990 back to the present - since someone here had an adverse recollection upon seeing those SML functions:
stretch :: Text -> [Text]
scrunch :: [Text] -> Text
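Rough Haskell equivalents over today’s Text, just to fix ideas - note that this sketch works at code-point granularity, whereas an abstract glyph-based Text would stretch into one element per glyph:

```haskell
import qualified Data.Text as T

-- Split a Text into its individual (code-point) pieces:
stretch :: T.Text -> [T.Text]
stretch = map T.singleton . T.unpack

-- Reassemble the pieces into one Text:
scrunch :: [T.Text] -> T.Text
scrunch = T.concat

main :: IO ()
main = print (scrunch (stretch (T.pack "abc")))
```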
So forget String - at least that could eventually be changed to type String = Text. It’s Char that has to go (unless the Unicode Consortium suddenly decides that characters are necessary; a rather unlikely occurrence). Moreover, the decision to introduce a separate string type in SWI-Prolog means that a Haskell implementation of an abstract Text could reuse recent research from another declarative language…
100%. Historically speaking, programming languages designed for text processing (Icon, SNOBOL, OmniMark, …) always had string as the basic type. The Char type was a crutch used by system-programming languages to provide at least some text processing ability.
Actually I was thinking about a future Haskell Report, but that would be years away - I would be starting small with something like Hugs, and slowly switch its sources over to an abstract String/Text while removing Char, to get a better idea of where the problems are lurking…
If you mean language editions like what’s possible in Rust…I would suggest waiting for the second Rust compiler to be fully operational: it too will need to support language editions, which would then provide a point of comparison with the existing compiler. A Haskell language-edition extension could then use those two implementations of editions as points of reference.
I doubt the idea I describe below is worth the complexity/effort. There will imo always be different types covering the different needs of builders, human text, string-like byte sequences, and pinned vs. unpinned memory. So I don’t believe one can eradicate the awkwardness completely - merely reduce it!
But with that disclaimer out of the way:
I was thinking more of swapping out what some names map to, rather than whole packages. Something along the lines of a -XFancyStrings extension with fundamental compiler support inspired by Backpack, which would mean:
String literals are Text or String depending on -XFancyStrings.
There is a mechanism in place for libraries to provide different implementations for a given name dependent on the language edition.
This can be used to map names from Data.String and other relevant modules to implementations/types using either [Char] or Text.
For example definitions in Data.String would mostly map to one of Data.String.Legacy or Data.String.Text depending on the edition used.
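For comparison, the closest mechanism that exists today is OverloadedStrings, which already makes the literal itself polymorphic (via fromString), even though it doesn’t remap whole modules the way the hypothetical -XFancyStrings would:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- With OverloadedStrings, every literal is elaborated as
-- fromString "...", so the same spelling serves both types:
greetString :: String
greetString = "hello"

greetText :: T.Text
greetText = "hello"

main :: IO ()
main = putStrLn (greetString ++ " / " ++ T.unpack greetText)
```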
That should mean:
A code base would compile without change if:
Strings are treated as opaque
The code base uses the same edition for all modules.
Dependencies provide an interface for the given edition.
Code accessing string internals can be made compatible with both editions by providing a different implementation for each edition.
Code compiled using the new edition can use packages that only have an old-edition interface by explicitly converting arguments. (Or maybe GHC/HLS could even do this for us if desired.)
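Such an explicit conversion at the boundary would look much like what we already write today - a sketch with invented names:

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- Pretend this comes from a dependency still on the old edition:
legacyShout :: String -> String
legacyShout = map toUpper

-- A new-edition caller converts explicitly at the boundary:
shout :: T.Text -> T.Text
shout = T.pack . legacyShout . T.unpack

main :: IO ()
main = putStrLn (T.unpack (shout (T.pack "hi")))
```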
In theory that would allow everyone to use an opaque String type in a forward-compatible way, and once all their dependencies are up to date they can just switch to the new edition themselves.
Unless you do nothing with them, Strings are rarely treated as opaque. Strings being lists, pretty much everything done to them (split, map, combine, etc.) is done with list functions from the Prelude. In fact, that’s even the point of using Strings: to use them as lists (the opposite of opaque).
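For example (my own illustration), a typical split/combine pipeline on String is pure list code, while its Text counterpart has to swap in Text’s own functions throughout:

```haskell
import Data.List (intercalate)
import qualified Data.Text as T

-- On String, split / combine are ordinary list operations:
hyphenate :: String -> String
hyphenate = intercalate "-" . words

-- The Text version must use Text's own counterparts:
hyphenateText :: T.Text -> T.Text
hyphenateText = T.intercalate (T.pack "-") . T.words

main :: IO ()
main = putStrLn (hyphenate "not at all opaque")
```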
Of course if your code base does that you would simply have to either change it or remain on the old language edition (potentially introducing explicit conversions if your dependencies drop the old-version interface).
But I still think the vast majority of String (or FilePath) use is opaque if you look at it on a per-function level. And in all those functions where you simply pass a String along, or store it in some data type, nothing would need to change. Which is why I think language editions could greatly reduce the pain of an ecosystem-wide transition from String to Text, if it ever happened - but they still can’t make all code that worked with String magically work with Text.