Text variant for getEnv

Hello,

getEnv returns a String, so as someone who bases my internals on Text, I generally just pack/unpack and carry on… but this time I was wondering if there was an existing Text variant. That question quickly morphed into: why isn’t there a getEnv variant that returns Text?

I attempted to answer my own question, but only came up with partial answers that were about other aspects of the String/Text problem.

I also briefly thought about adding my own and sending in a PR, but couldn’t tell whether raising the discussion with the Core Libraries Committee would be a waste of my time.

Feedback is appreciated, thank you!


The getEnv function comes from System.Environment. Since that is in the base package I don’t think it can depend on the text package.

I’ve enjoyed using Envy as a nicer interface around environment variables. If you just want Text support, env-extra looks promising.
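If all you need is Text at the boundary, a thin wrapper over `lookupEnv` from System.Environment is often enough. This is a minimal sketch assuming the text package is available; the helper names `lookupEnvText` and `getEnvTextDefault` are hypothetical and not part of base, envy, or env-extra.

```haskell
import qualified Data.Text as T
import System.Environment (lookupEnv)

-- Hypothetical helper (not in base): look up an environment variable
-- and convert the String that base returns into Text.
lookupEnvText :: String -> IO (Maybe T.Text)
lookupEnvText name = fmap T.pack <$> lookupEnv name

-- Hypothetical helper: unlike getEnv, this returns a default value
-- instead of throwing an exception when the variable is unset.
getEnvTextDefault :: T.Text -> String -> IO T.Text
getEnvTextDefault def name = maybe def T.pack <$> lookupEnv name
```

Building on `lookupEnv` rather than `getEnv` keeps the "variable is unset" case in the type instead of in an IO exception.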


Good point. So I read up on that (“why isn’t text in base in Haskell”), and I see how that discussion goes nowhere.

Also, it seems the issue is more about replacing String with a ByteString / Vector variant of some kind (that I’m not familiar with), not Text.

It is unfortunate that the Haskell ecosystem cannot figure out how to let this change percolate through.

This leads me to a new question: some threads I’ve read suggest that each of these types (String, Text, ByteString, etc.) has its place, so if I understand correctly, that means String has its place too. When is it actually appropriate to use String in code you care about?


Thank you for the references!

This is a pretty complicated discussion on its own! I’ve written a fair amount of Haskell using both String and Text, but I honestly can’t think of a time where the String type was actually what I wanted! The reasons I used it in practice don’t have anything to do with the type itself:

  1. It’s the default, and I don’t need to depend on a library for it. It’s still hard to depend on libraries outside a full cabal project (i.e. in runhaskell-style scripts). That’s a problem on its own!
  2. A lot of existing libraries and APIs use String and don’t always have clear alternatives.
  3. OverloadedStrings needs to be enabled manually and can sometimes cause ambiguous type errors in code that was fine without it.
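To illustrate point 3: with OverloadedStrings a literal has type `IsString a => a`, so its concrete type has to come from context. In a compiled module something like `print "hello"` can then be rejected as ambiguous, and the usual fix is a type signature or an inline annotation. A minimal sketch, assuming the text package; the names `greeting` and `shownGreeting` are just for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- A top-level signature tells GHC which IsString instance to use.
greeting :: T.Text
greeting = "hello"

-- `show` alone does not fix the literal's type, so annotate it inline;
-- without the annotation this could be reported as ambiguous.
shownGreeting :: String
shownGreeting = show ("hello" :: T.Text)
```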

None of these are great reasons! But changing a core type in a language with a lot of existing code is really hard. There’s a delicate balance between having the language stagnate and having too much churn for users and library authors.


The main advantage of String over Text is that String is a simple inductive algebraic data type while Text is opaque. So with Text you always have to use the provided library functions to manipulate it and that can be quite bothersome.
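A small example of the difference: the same "capitalise the first character" function written once by pattern matching on String, and once through Text’s library API (`T.uncons` and `T.cons`). The function names are illustrative only.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- String is just [Char], so ordinary pattern matching on (:) and [] works.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- Text is opaque, so the same logic has to go through library functions.
capitaliseText :: T.Text -> T.Text
capitaliseText t = case T.uncons t of
  Nothing        -> t
  Just (c, rest) -> T.cons (toUpper c) rest
```

Note that for strict Text, `T.cons` copies the whole value (it is O(n)), whereas `(:)` on a String is O(1).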

And Text itself is also not ideal for several reasons:

  • Text uses UTF-16 while most of the rest of the world uses UTF-8, so Text is usually less space-efficient and you often have to copy it when interacting with the outside world. There is some movement towards text-utf8, but it is going slowly and it is uncertain whether it will ever replace the current Text implementation. One of the problematic areas is fusion: it can make single operations slower, but it can eliminate intermediate allocations if, for example, you map a function over a Text several times.
  • Text is unpinned, which means that you need to copy it to be able to pass it to safe foreign functions. An example of when this limits performance is with the pcre2 library.
  • Text has some overhead, which gave rise to the text-short package; it claims that Text is not suitable for short strings (I personally don’t think the overhead is that large), but I would argue that Text is not suitable for long strings either. I would recommend storing larger text in a rope, for example the one defined in yi-rope. Ropes provide much faster operations for combining and splitting text: O(log n) for most operations instead of O(n).
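The copying mentioned in the first two points happens at the boundaries. A minimal sketch, assuming the text and bytestring packages: `encodeUtf8` allocates a new ByteString, `decodeUtf8'` validates incoming bytes, and `useAsCString` copies once more into a NUL-terminated buffer for C. The helper names `toUtf8`, `fromUtf8` and `withTextCString` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)
import Foreign.C.String (CString)

-- Outgoing: encodeUtf8 builds a fresh ByteString, i.e. a copy of the data.
toUtf8 :: T.Text -> BS.ByteString
toUtf8 = TE.encodeUtf8

-- Incoming: decodeUtf8' checks that the bytes are valid UTF-8 and
-- returns Left on bad input instead of throwing.
fromUtf8 :: BS.ByteString -> Either UnicodeException T.Text
fromUtf8 = TE.decodeUtf8'

-- Handing Text to C: encode to bytes, then useAsCString copies again
-- into a NUL-terminated buffer that lives for the duration of the callback.
withTextCString :: T.Text -> (CString -> IO a) -> IO a
withTextCString t = BS.useAsCString (toUtf8 t)
```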

OK, so I will attempt to summarize what I’ve taken away from this discussion:

  • @Tikhon beautifully summarizes my experience; I’d guess many others have had the same experience. I wonder how many have already gone down this road and given up somewhere else along the path.
  • While we work a lot with the Text datatype, ByteString is a more appropriate replacement for String in most situations (particularly getEnv and related functions interacting with the environment - those values are not best represented/handled with Text). My topic/question here should have focused on ByteString, not Text. Thanks to @jaror (and several others that spoke to me privately about this) for helping me understand the details.
  • It would probably be useful to add some helper functions that convert to Text where needed, but we should be talking more about replacing String.
  • There is no good use for String, and its continued existence is inexplicable except by inertia (and I would guess politics?).
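For the getEnv case specifically, a ByteString-based lookup already exists on POSIX systems in the unix package (`System.Posix.Env.ByteString.getEnv`). A minimal sketch, assuming a POSIX platform with the unix, bytestring and text packages: read the raw bytes first, and decode to Text only where text is actually needed. The helper names `lookupEnvBytes` and `lookupEnvUtf8` are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified System.Posix.Env.ByteString as PosixEnv

-- Read a variable as raw bytes (POSIX only). No decoding happens,
-- so arbitrary byte sequences come back unchanged.
lookupEnvBytes :: BC.ByteString -> IO (Maybe BC.ByteString)
lookupEnvBytes = PosixEnv.getEnv

-- Decode to Text only if the bytes are valid UTF-8.
-- Both "unset" and "not valid UTF-8" map to Nothing in this sketch.
lookupEnvUtf8 :: BC.ByteString -> IO (Maybe T.Text)
lookupEnvUtf8 name = do
  mbs <- PosixEnv.getEnv name
  pure $ case fmap TE.decodeUtf8' mbs of
    Just (Right t) -> Just t
    _              -> Nothing
```

A real API would probably want to distinguish the two failure cases, for example by returning `Maybe (Either UnicodeException Text)`.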

As a side note, I have tried to figure out where to make a proper “proposal” for such a change, but I couldn’t locate it and gave up (more than once). In particular, I couldn’t figure out whether I should be reaching out to the base maintainers, to GHC, or to someone else, or where the correct place to put that proposal is.

Please LMK if you think there is a better next step than "Proposing we deprecate use of String, to be replaced by ByteString".

One last question: Am I just walking into a political dust storm? Am I crazy to advocate for change here? Is this a waste of time?

Thanks for the feedback and discussion on this 🙂

One thing that has not been mentioned yet is backpack. Supporting multiple string types was one of the motivating use-cases. And it has been implemented in the backpack-str packages. I think we should switch to a modular system using backpack instead of switching to a particular string implementation.


That sure sounds reasonable, though I would also ask (b/c I’m not familiar with backpack):

  • what are its known faults? are those acceptable here?
  • will the modularity (and being able to switch backends so to speak) degrade the UX in some way? or how ergonomic is it in the real-world?
  • are there other options to consider beyond backpack and bytestring?

thanks for the discussion