HF Tech Proposal #1: UTF-8 Encoded Text

That is not what I meant. Backpack is a language feature similar to type classes, but at the module level. So I definitely wouldn’t call it an alternative prelude.

backpack-str is a Haskell package which provides a specification (similar to a type class) of which functions are required for a string-like type and it provides implementations (similar to instances) of that specification for several types.

An alternative prelude could (and probably should) be developed to show the advantages and disadvantages of the Backpack approach before a definite choice is made to switch the actual base package to it. I think such an alternative prelude could be developed right now; it does not need anyone’s approval.

Hi, one of the authors of Backpack here.

Why has Backpack not taken off? The biggest reason is no one is working full time on it. Since finishing my PhD, I’m being paid to work on a different open source project (PyTorch), and between that and raising a new baby, I don’t really have time beyond helping answer people’s questions about Backpack in threads like this :slightly_smiling_face:

But let’s say that we did have someone pushing Backpack (something I’d love to see! I still believe in the project). What are the things they’d have to do first? Here’s my ordered list:

  1. Implement Stack support for Backpack (this is the number one reason why regular libraries can’t easily go off and start using Backpack: unless the use is entirely internal, it locks you out of the Stackage ecosystem)
  2. Introduce Backpack support to GHC’s build system (since the most important library you might want to parametrize is base, which is built as part of GHC’s bootstrap process)
  3. Fix some of the major outstanding Backpack bugs; specifically https://github.com/haskell/cabal/issues/6835 is the most important one, https://github.com/haskell/cabal/issues/5434 is not too bad but pretty annoying

These are pretty hefty projects. But if you want to replace String in base, I don’t really see any other route!

Thank you for the response @ezyang!

Could you explain why this is the only route worth considering? Or what about the alternatives falls short and is unacceptable?

To me, changing String in base is subject to the following constraints:

  1. Existing Haskell code must not break; at the very least, you have to go through a deprecation cycle where both the old and new forms are available (but personally, I think that String-based base will always have a place in the language)
  2. There must be exactly one canonical implementation of base; it’s not acceptable to copy paste base into base-string and base-text because now all further development on base has to be replicated in two locations

This puts you squarely in the space of Backpack-like solutions.

This puts you squarely in the space of Backpack-like solutions.

That’s true only if you define “Backpack-like” in a tautological way. Here’s an alternative route to dropping String, which to me doesn’t look much like Backpack:

  1. Add more type classes to base in the vein of Semigroup, Monoid, and IsString. It could be something like StringLike, my own TextualMonoid, or MonoTraversable. Whatever the classes are, String and Text would be their instances.
  2. Extend the default declaration to default the new classes to String.
  3. Abstract the types of all base functions that currently operate on String to accept any StringLike instance. Make sure the performance doesn’t suffer in concrete applications.
  4. Extend the default declaration so that Text can override String on user demand.
  5. Wait for the abstraction to propagate outward. Fix performance regressions as they happen.
  6. Switch to Text by default in base.
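
To make steps 1 and 3 concrete, here is a minimal sketch. Everything in it is hypothetical: the `StringLike` class, its `fromStr`/`toStr` methods, and the `putStrLnS`/`wordsS` functions are names invented for illustration, and `Packed` is only a stand-in for Text so that the sketch depends on nothing but base. A real design would want native methods per type rather than round-tripping through String.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class: the smallest interface a string-like type must offer.
class StringLike s where
  fromStr :: String -> s
  toStr   :: s -> String

-- String itself is an instance (an instance on [Char] needs FlexibleInstances).
instance StringLike [Char] where
  fromStr = id
  toStr   = id

-- Stand-in for Text, so the sketch needs only base.
newtype Packed = Packed String
  deriving (Eq, Show)

instance StringLike Packed where
  fromStr = Packed
  toStr (Packed s) = s

-- Step 3: base functions abstracted over the string type.
putStrLnS :: StringLike s => s -> IO ()
putStrLnS = putStrLn . toStr

-- Converts through String for brevity; this is where performance would suffer.
wordsS :: StringLike s => s -> [s]
wordsS = map fromStr . words . toStr
```

Existing call sites such as `wordsS "a b"` still type-check because a plain literal fixes the type to String. Ambiguity only appears once the argument is itself polymorphic, which is where the default declarations of steps 2 and 4 come in.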

The benefit of this approach is that it remains squarely in the language and libraries. It doesn’t depend on an extra-linguistic package-level mechanism (i.e., Backpack), so the end result is less complex. The process to get that end result may be more painful, and performance may never be quite as good.

Typeclasses also introduce complexity in their own way. They obscure error messages and are yet another hurdle for beginners to pass when they start learning Haskell.

The complexity that Backpack adds will be for library writers, not for end-users. Libraries can expose their default APIs, e.g. String and Text, as normal fully instantiated Haskell packages.

Having both module signatures and typeclasses does add complexity.

And yet, my feeling is that sometimes module signatures can be the better solution:

  • For types that are not the “core” types of an API, but nevertheless used by most functions. Things like the strings used in error messages (possibly nested inside error wrapper types or exceptions). A typeclass solution would complicate the signatures of most functions for something that is a secondary detail.
  • For types that are central to your API, but that aren’t likely to vary through a single client’s use of it. For example, parsers: in a given program, you aren’t likely to use more than one stream type with a parser library. Rather than typeclass-heavy signatures, clients would prefer to have functions already specialized to some stream type.

base fits those two cases in relation to String.

I don’t think you are contradicting @ezyang’s characterisation. Note in particular point 1: existing Haskell code must not break. A type class based solution will break existing code (consider print (fromString "Hello")). That’s not to say a type class based solution wouldn’t be the better solution, just that it wouldn’t match @ezyang’s requirements.

I agree, and this especially hurts because even beginners would run into it. The reason type signatures would be simpler with Backpack is exactly because Backpack is outside the type system, which is why I call it extra-linguistic. I suspect this means that advanced users will find it constraining, but that may be a good trade-off versus scaring off beginners with complicated type signatures.

The majority of parser libraries in use have an abstract input type; I have written some myself. Even the venerable parsec has an input type parameter. I think the historical evidence is pretty clear in this case.

That’s exactly the reason I added the default extensions to my list, so the existing code doesn’t break.

As a general clarification to no one in particular, the reason I pointed out the typeclass-based solution roadmap was simply to push back on the claim that Backpack (or “Backpack-like”) is the only way forward. It’s not. It may very well be the best way forward, but I’m not qualified to judge that.

That’s exactly the reason I added the default extensions to my list, so the existing code doesn’t break.

Ah interesting! I’d be interested in seeing a worked example of how this can be made backwards compatible.

Is there anywhere I can read about what’s required for this point? I’m interested in putting some work into this, but I don’t know enough about the intersection of Backpack and building GHC to know what’s actually missing here. I don’t use Stack, so I’d probably do more harm than good on that point, but I’d love to help make Backpack applicable to base.

Actually your example already works even with OverloadedStrings, try it:

{-# Language OverloadedStrings #-}
import Data.String
main = print (fromString "Hello, World!")

The reason it works is that OverloadedStrings doesn’t merely generalize the type of string literals. It also adds a built-in default declaration that makes all ambiguous types default to String when possible.

The simplest imaginable way this transition from String ends up happening would be by enabling the OverloadedStrings extension and flipping the switch in that built-in default rule from String to Text. Of course this would be horribly irresponsible, and I’m not advocating for it. What I’m advocating for is exposing this built-in default rule and putting it in users’ hands, so they can switch away from String at their own pace. I put together a GHC proposal to that effect.
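
For the IsString class specifically, an ordinary `default` declaration can already redirect this rule within a single module, as far as I understand the GHC user guide. The sketch below assumes that behaviour. `Packed` is a made-up stand-in for Text so the example needs only base, and `shown` is just an illustrative name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Stand-in for Text; hypothetical, used only to keep the sketch base-only.
newtype Packed = Packed String
  deriving (Show)

instance IsString Packed where
  fromString = Packed

-- Replace the module's default list: ambiguous IsString types now pick Packed.
default (Packed)

-- The literal's type is ambiguous here (Show a, IsString a), so defaulting
-- decides it. With the declaration above it resolves to Packed, not String.
shown :: String
shown = show "hi"
```

This only covers ambiguities where every constraint is a standard class or IsString, and the declaration cannot be exported from a library.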

I think a straightforward first step is to make a base-indef that uses the signature from backpack-str instead of String everywhere.

I would also add proper integration with Hackage to the list of things that need to be done. There was a post recently on reddit where the author of the Raaz cryptography package ran into issues with backpack on Hackage.

I guess I should have tested my example :smiley: How about the below?

Basically, I never use OverloadedStrings because every time I do my code becomes full of ambiguous type variable errors. If your GHC proposal can resolve that problem and base can be polymorphic in string type without breaking existing code then that’s great!

{-# Language OverloadedStrings #-}
{-# Language FlexibleInstances #-}
{-# Options_Ghc -Wall #-}

class Foo a where foo :: a -> ()
instance Foo [Char] where foo = const ()

main :: IO ()
main = print (foo "Hello, World!")
    • Ambiguous type variable ‘a0’ arising from a use of ‘foo’
      prevents the constraint ‘(Foo a0)’ from being solved.
...
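
For comparison, here is a sketch of the workaround available today: pin the literal’s type with an annotation at the call site. The name `annotated` is mine; the class and instance are the ones from the example above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}

class Foo a where foo :: a -> ()
instance Foo [Char] where foo = const ()

-- The annotation fixes the literal to String, so there is nothing left to default.
annotated :: ()
annotated = foo ("Hello, World!" :: String)
```

It compiles, but it is exactly the kind of noise that makes OverloadedStrings unpleasant to turn on in an existing code base.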

You found one of the holes I’m trying to patch. The trouble with the built-in default that comes with OverloadedStrings is that it resolves only the ambiguities that involve the IsString class. It doesn’t do anything for your Foo. To resolve that, I proposed the extension NamedDefaults so you could declare

default Foo ([Char])

to resolve the ambiguities involving Foo. That’s not the end of it though, because if Foo is declared in a library module we need a mechanism to export the default declarations alongside the class.

Doesn’t that then require adding default declarations to every class?

You will also need default declarations that feel wrong to me, like default Foldable ([]) for this code:

{-# LANGUAGE OverloadedStrings #-}
main = putStrLn (foldr (:) [] "Hello, World!")

Edit: I was a bit confused, this problem above is not about breaking existing code but rather about writing new code with OverloadedStrings. I did find an example that does break if base gets overloaded, namely:

{-# LANGUAGE FlexibleInstances #-}

class Foo a where
  foo :: a
instance Foo [Char] where
  foo = "Hello, World"

main = putStrLn foo

This code will break unless it is changed if putStrLn gets an overloaded type like putStrLn :: StringLike a => a -> IO (), even if there is a default StringLike (String) declaration exported by the Prelude.
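
A minimal sketch of that breakage, simulated with today’s GHC. `StringLike`, `toStr` and `putStrLnS` are hypothetical stand-ins for the overloaded class and an overloaded putStrLn; nothing like them exists in base.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class that an overloaded base would use.
class StringLike a where
  toStr :: a -> String

instance StringLike [Char] where
  toStr = id

-- Hypothetical overloaded replacement for putStrLn.
putStrLnS :: StringLike a => a -> IO ()
putStrLnS = putStrLn . toStr

class Foo a where
  foo :: a

instance Foo [Char] where
  foo = "Hello, World"

-- Rejected if uncommented: the constraints (StringLike a, Foo a) leave `a`
-- ambiguous, and Foo is not a class the defaulting rules know about.
-- broken = putStrLnS foo

-- The call site has to change to compile again.
fixed :: IO ()
fixed = putStrLnS (foo :: String)
```

With the monomorphic putStrLn, the argument type String is what selects the Foo instance. Once the function is overloaded, that information is gone.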

Most of them I’m afraid. Not quite every class, Functor and Semigroup for example would be fine.

Your first example already demonstrates a compatibility break, but I don’t think it’s as bad as what we already went through with FTP. Come to think of it, a default Foldable ([]) declaration would have come in useful back then.

EDIT: On the bright side, the FTP transition is proof that ambiguous type errors are not the end of the world. So we don’t really need to find and default every possible class, only the commonly used ones.

How would the typeclass defaulting mechanism handle datatypes in base that have String fields? For example, many of the exceptions from Control.Exception, like ErrorCall.

Not too well, to be honest. If one were determined to completely eradicate String from base, I suppose one could replace every String field with an existential type. But that would probably bring more problems than it would solve. Personally I’d rather leave ErrorCall and error :: String -> a alone, they’re not suffering from any performance issues.
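
To illustrate what the existential route might look like, here is a rough sketch. `ErrorCallLike`, `StringLike`, `toStr` and `message` are invented names; this is not how ErrorCall is defined in base.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}

import Control.Exception (Exception)

-- Hypothetical class, as in the roadmap above.
class StringLike a where
  toStr :: a -> String

instance StringLike [Char] where
  toStr = id

-- The String field is replaced by "some StringLike type", hidden from callers.
data ErrorCallLike = forall s. StringLike s => ErrorCallLike s

-- The payload can only be observed through the class methods.
message :: ErrorCallLike -> String
message (ErrorCallLike s) = toStr s

instance Show ErrorCallLike where
  show = message

instance Exception ErrorCallLike
```

The cost shows up right away: a handler can no longer pattern-match on a String payload, and anything beyond the class methods (equality, ordering, deriving) has to be routed through a conversion.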

This is interesting, but also a bit out of my ability/experience (apologies if my question / suggestion is naive or ridiculous), however I am wondering why the “simplest” case I can see doesn’t seem to fit: while not fancy, couldn’t we add what we need to base for the migration, provide the types/functions to make the conversion graceful, and update as we go? Some packages will have versions where a change is breaking and requires an update to downstream code. We have deprecation cycles for this reason.

AFAICT, it’s ok to break base in this way, as it’s part of addressing the wart, and GHC seems to do this. Is it unreasonable to skip the typeclass and polymorphism and follow a simpler, more direct approach?