Originally titled "The Quest to Completely Eradicate `String`"
I have been recently reminded of this thread: Informal discussion about the progression of `base`:
These are some of the most awkward parts when it comes to handling strings in Haskell right now: `Show`, `String` vs `Text`, `OverloadedStrings`, and string interpolation
And it is kind of something that I want to rehash right now for no particular reason. It feels like these days, we’re making a lot of changes to how Haskell works by default and violating some past expectations, so hopefully people are more agreeable when it comes to some things I mention here. A lot of this has been spoken of over and over but simply never implemented… yet!
My feeling is that the issue with strings is something people have been working around for a while now, and all the solutions people come up with have been bandages that work around the main issue at best.
The whole way the Haskell ecosystem deals with strings feels like one of the most awkward and flimsy of any language I’ve dealt with, which is really annoying.
One of the first issues people deal with is whether to use `String` or `Text`. It is always better to use `Text`, but you’d usually see people avoid it because it isn’t in `base`, and also because it is just more convenient to work with `String`s, especially without `OverloadedStrings`… and even that extension is not a panacea, as it’s known to introduce some issues when it comes to type inference in particular. Unfortunately, educational material mostly uses `String`. I don’t know of many well-known projects that use `String` instead of `Text`, personally, but I know that there is a fracture in terms of which to use, because `String` is more convenient for being in `base`, despite `Text` being the better solution (though more awkward because some things are built around `String`s, and because of `OverloadedStrings`).
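To make the inference complaint a bit more concrete, here is a minimal sketch, assuming the `text` package is available (it ships with GHC). The names `greeting`, `greetingLength`, and `legacy` are made up for illustration, and the commented-out definition is the kind of thing I’d expect GHC to reject as ambiguous once `OverloadedStrings` is on.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- With a type signature, the literal is read as Text without any T.pack.
greeting :: Text
greeting = "hello"

-- T.length is monomorphic in Text, so the literal's type is never in doubt.
greetingLength :: Int
greetingLength = T.length greeting

-- Crossing back into String-based APIs needs an explicit conversion.
legacy :: String
legacy = T.unpack greeting

-- The polymorphic Prelude 'length' gives the literal nothing to pin its type
-- to: it could be any Foldable container with an IsString instance.
-- Uncommenting this is expected to fail with an ambiguous type error:
--
-- ambiguous :: Int
-- ambiguous = length "hello"
```

Without the extension, the same `length "hello"` is fine because the literal can only be a `String`, which is exactly the sort of trade-off that makes the extension feel like a bandage.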
That covers the last 3 things in the title, but what about `Show`? I believe it’s fairly well known in the community that `Show` is an annoying typeclass. It doesn’t really match how similar functionality works in other languages, like Rust’s `Display` and `Debug`. That is because it’s intended more as a counterpart to `Read`… And then there is the issue of `Show` revolving around `String`, rather than using the better option of `Text`.
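As a rough illustration of the difference, here is a sketch where `Show` and `Read` round-trip a value in Haskell source syntax, next to a user-facing class that returns `Text`. The `Display` class and the `UserName` type here are hypothetical and defined on the spot; nothing like this class exists in `base` today.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical example type.
newtype UserName = UserName Text
  deriving (Show, Read, Eq)

-- Hypothetical user-facing class, loosely modelled on Rust's Display.
class Display a where
  display :: a -> Text

instance Display UserName where
  display (UserName name) = name

ada :: UserName
ada = UserName (T.pack "ada")

-- Show produces something Read can parse back: constructor, quotes and all.
debugForm :: String
debugForm = show ada

-- The round trip is what Show is designed for.
roundTripped :: UserName
roundTripped = read debugForm

-- Display produces what you would actually want to print for a user.
userForm :: Text
userForm = display ada
```

Here `debugForm` comes out as `UserName "ada"` with the quotes included, while `userForm` is just `ada`, which is roughly the `Debug` versus `Display` split.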
There are a few different workarounds for `Show`, like `text-show`, and, for example, `pretty-simple` for pretty-printing. But the need to rely on these libraries for what I would say is behavior that’s simple enough to build into `base` (or core libraries) by default is really irritating, and it causes people to reach for the more ad-hoc, less reliable solutions, which further fractures usage.
In ancient times (2 years ago), there was some great discussion about merging `Data.Text` into `base`, which I totally agree with, and while it would probably help a lot with the `String`/`Text`/`OverloadedStrings` issue… the mechanics of `Show` would need to be re-discussed IMO, including whether we should remove it, change its behavior, or add entirely different typeclasses like `Display` and `Pretty` for user-facing output and pretty printing respectively. That being said, `OverloadedStrings` wouldn’t be required anymore for `Text`, but it could be useful for bytestrings, and it feels like from there we’d kind of fall into the same issues we get today, although less so, since I presume there would be fewer inference issues.
[NOTE: The post effectively ends there. The next part kind of involves some incoherent ramblings that I’ll have to edit later… If you want to ignore this, you can skip until the next note you see]
Recently, there has been a little bit more discussion on the delightful proposal that intends to add convenient string interpolation to Haskell, and in that discussion I feel there have been some issues with how exactly they want to design it so that it works in a generic way for all the different forms of strings. Personally, I think part of this stems from `Show` and how it’s simply not doing what people need it to do.
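For context, this is roughly what building a message looks like today without interpolation, as a small sketch assuming `OverloadedStrings` and the `text` package. The function names are made up, and this says nothing about what the proposal’s syntax actually looks like.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Manual concatenation: the Int has to detour through Show and String
-- before it can be appended to Text.
inboxSummary :: Text -> Int -> Text
inboxSummary name unread =
  "Hello, " <> name <> "! You have " <> T.pack (show unread) <> " new messages."

-- The pitfall: using 'show' on a value that is already textual renders it
-- as Haskell source syntax, so the quotes end up in the output.
quotedGreeting :: Text -> Text
quotedGreeting name = "Hello, " <> T.pack (show name) <> "!"
```

So `show` is the right tool for the `Int` but the wrong one for the `Text`, and any generic interpolation design has to decide which behavior it wants for each type.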
Nevertheless, there’s also the issue of whether to distinguish a normal string literal from an interpolated one, and if we need to distinguish an interpolated one… I think we have some nice things to add, but I’m not sure.
This is (well, barely, really) a bit of an odd solution to how we would do things without `OverloadedStrings`. I don’t particularly have a good syntactical suggestion for this, and I would probably have to ask people to recommend one, but distinguishing string literals would not only allow us to distinguish a normal string from an interpolated one, it would also allow us to distinguish string literals that represent different types. So for example, you could write `b"hello world"` to represent a strict `ByteString`. Of course, that’s just syntax I came up with on the fly, but I think that it’ll help make errors a little easier to understand than with `OverloadedStrings`. Specifically, I’m imagining this in a world where `String` was removed from `base` entirely and replaced with `Text`, in which case the default string literal would simply evaluate to `Text`. I know that one thing that should be thought about is whether to provide literals for all the different forms of bytestrings, like short, lazy, strict, etc. I suppose at that point it gets a little too much, and the idea starts to sound really stupid. Uh, I guess you can ignore it, if so…
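The closest thing I can sketch in today’s Haskell is an ordinary function applied to a `String` literal, which at least keeps the target type visible at the use site. The helpers `b` and `bl` below are hypothetical names I’m inventing to mimic the prefix idea, assuming the `bytestring` package; they are not an existing API.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Hypothetical stand-in for a b"..." literal: a strict ByteString.
-- Char8.pack truncates characters above '\255', so keep this to ASCII.
b :: String -> BS.ByteString
b = BC.pack

-- Hypothetical stand-in for a lazy-bytestring literal prefix.
bl :: String -> BL.ByteString
bl = BLC.pack

-- No OverloadedStrings here, so each literal is a plain String and the
-- helper name alone says which type you end up with.
helloStrict :: BS.ByteString
helloStrict = b "hello world"

helloLazy :: BL.ByteString
helloLazy = bl "hello world"
```

The obvious downside compared to a real literal syntax is that the conversion happens at runtime from a `String`, and you already need one helper per bytestring flavor, which is the "a little too much" problem in miniature.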
Nevertheless, the reason I brought this up isn’t exactly to theorize about some badly thought out form of string literal, but to show that I think it’s an example where, if `Show` was a little bit better and if people were to do away with `String` entirely, it could have been made a little easier. Or maybe not, now that I think about it. So perhaps just ignore me mentioning that PR at all other than saying that it’s delightful… it would definitely be a very nice QoL addition to Haskell…
While the main topic of this post is strings and how we handle them, I also feel like some part of the issue has to do with the fact that `base` is kind of in lockstep with the compiler. There has been some discussion about separating it from the compiler, but I’m not quite sure how helpful that is. It feels like another part of the issue is that typeclasses are kind of difficult to deal with if you change them across versions… it turns out some module shenanigans (via `cabal`?) are a possible solution, but I have never seen them touted or recommended…
[NOTE: The rant ends here!]
Sorry for this being such a messy, stream-of-consciousness type of post… To make my intentions clear: I think the best way forward for the Haskell ecosystem is to erase `String` everywhere, not just in `base`, and just start using `Text` instead! It’s probably one of the most difficult things to do in Haskell right now, but I think it’s necessary, and it feels like it has been for a while… it seems to me like delaying it will only make the issue worse in the future.