GHC String Interpolation Survey Open!

I was tempted by this argument but I think I reject it. String interpolation is also going to be one of the first things a newbie hits, and I worry that a newbie will try to use this feature to write an apparently-simple program, mistype something, and get a big error about missing instances involving builders and probably MPTCs. At which point, our hypothetical new Haskeller will frisbee the laptop out the window in frustration, which would be a great loss.

A design that provides clean code for newbies but incomprehensible errors when you do the slightest thing wrong is a massive double-edged sword. (I base this on personal experience with “if it typechecks, it definitely works”-type libraries that use advanced GHC features, where I failed to produce a working program or resigned myself to doing things in very bad ways to get anything at all.)

Whatever we design here and bake into GHC needs to fail gracefully in a newbie’s hands. IMHO, this includes at least the following situations:

  • Syntactic errors: missing }, writing #foo instead of #{foo} (or whatever splice character we choose), writing #{foo (} or other parse errors in the splice;
  • Interpolating values of types with missing instances;
  • Interpolating where the output is an IsString t => t (i.e., underconstrained for some reason);
  • Interpolating a value of a type where there are instances Interpolate Foo String but not Interpolate Foo Text (or vice-versa, or involving lazy/strict Text, or…).

That makes me lean towards simplicity at the expense of expressive power and being the complete string-templating solution for all cases.

This makes me think that interpolating only to String might even be the best option? This removes the MPTC, so you can define a single-parameter class Interpolate. You could then define an interaction with -XOverloadedStrings that inserts fromString into the results of an interpolated overloaded string, ideally in a way that avoids materialising the entire string as individual Chars along the way.

Desugaring example:

{-# LANGUAGE OverloadedStrings #-}
t :: Text
t = s"Name: ${name}, Age: ${show age}"

-- Could desugar to this. Unsure if it's better to have all the string chunks converted to `Text` ASAP, or whether it's best to consume the entire string and allocate a single `Text`.
t = mconcat
  [ fromString "Name: "
  , fromString name
  , fromString ", Age: "
  , fromString (show age)
  ]
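
To get a feel for the String-only variant, here is a minimal sketch with a manually desugared interpolation. The class Interpolate, its interpolateList method, the Label newtype and greeting are all names made up for this illustration; none of this is the proposal's actual API. It deliberately builds a complete String before calling fromString, so it shows exactly the materialisation cost mentioned above rather than avoiding it.

```haskell
import Data.List (intercalate)
import Data.String (IsString (..))

-- Hypothetical single-parameter class: a splice always renders to String.
class Interpolate a where
  interpolate :: a -> String
  -- Same trick as showList in Show: it lets String ([Char]) have its own
  -- behaviour without FlexibleInstances or overlapping instances.
  interpolateList :: [a] -> String
  interpolateList xs = "[" ++ intercalate "," (map interpolate xs) ++ "]"

instance Interpolate Char where
  interpolate c = [c]
  interpolateList = id -- a String is spliced verbatim, without quotes

instance Interpolate Int where
  interpolate = show

instance Interpolate a => Interpolate [a] where
  interpolate = interpolateList

-- Stand-in for a string-like target such as Text, kept in base for the sketch.
newtype Label = Label String
  deriving (Eq, Show)

instance IsString Label where
  fromString = Label

-- Manual desugaring of  s"Name: ${name}, Age: ${age}"  under this design:
-- every chunk becomes a String, then a single fromString picks the target.
greeting :: IsString s => String -> Int -> s
greeting name age =
  fromString (concat ["Name: ", interpolate name, ", Age: ", interpolate age])
```

With only one class parameter, a missing instance is reported as "No instance for Interpolate Foo", with no target type involved in the error.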

@BurningWitness

  1. In addition to “Ew Template Haskell”, the only way to implement string interpolation with quasiquoters is with haskell-src-meta, because the compiler doesn’t provide an easy way to convert the string "show name" to the expression show name. And haskell-src-meta is a heavy dependency. So it’s also a high cost to ask users to include it in all their projects.
  2. Manually interpolating strings or using printf is still not satisfactory to me. If you have a multiline template, you either break it up or lose the context of where in the string a given variable is injected:
emailBody =
    s"""
    Hello ${name},

    Your package "${package}" has outdated dependencies on Hackage.
    Please update "${package}".

    Sincerely,
    Hackage.org
    """

emailBody2 =
    """
    Hello """ <> name <> """,

    Your package \"""" <> package <> """" has outdated dependencies on Hackage.
    Please update \"""" <> package <> """".

    Sincerely,
    Hackage.org
    """

emailBody3 =
    printf
      """
      Hello %s,

      Your package "%s" has outdated dependencies on Hackage.
      Please update "%s".

      Sincerely,
      Hackage.org
      """
      name
      package
      package

I find string interpolation to be the only ergonomic way to express these kinds of programs.

@VitWW Please respond in the survey :slight_smile:

@jackdk “Whatever we design here and bake into GHC needs to fail gracefully in a newbie’s hands” - I agree, but I don’t see why that implies the rest of your comment. We have all of these issues already with Num, Monad, or OverloadedStrings. And as an aside, just as there’s a subset of developers who are adamantly against using OverloadedStrings (personally, I’d never use OverloadedLists), I’d imagine there’d be a similar subset who would never use StringInterpolation, but IMO I don’t think that means we shouldn’t have it.

I can see the appeal of an approach that ignores complex use-cases like SQL and focuses only on interpolating to String-like types (string/text/bytestring). And it would certainly be better than the status quo. But I think keeping the feature general allows users to be innovative with the kinds of APIs they can provide, especially since it’s not lightweight to implement outside of a native compiler extension (see above comment about haskell-src-meta). Yes, we’ll probably get some cursed APIs out of this, and maybe it adds some more footguns for newbies, but I personally value the potential for innovation here over being conservative about the kinds of expressions we want to allow.

It’s fine enough for a prototype, IMHO, since string-interpolate already depends on it and people seem to like and use that.

Num is widely acknowledged as a poor design, and probably shouldn’t be taken as licence to ship confusing things. I think that a more accurate comparison would be Foldable. It makes things much better for veterans but more confusing for newbies. (I say this as a course tutor for a first-year/freshman Haskell course, where handwaving typeclasses before students had mastered recursion made things harder for the weaker students.) The Foldable/Traversable Proposal (FTP) was before my time, but IMHO we should have made the default a bit more newbie-friendly while still making it easier for advanced users to pick up the power. This might have looked like keeping the monomorphic list-consuming functions in Data.List (and re-exporting these through Prelude), and providing polymorphic ones in Data.Foldable for alternate preludes to set up.

It seems to me that it would be very valuable to allow newbies to opt into a form of string interpolation that’s going to fail comprehensibly in the common use cases. I lead a team of Haskellers at work, and part of that is helping decide very consciously how far we’re going to go with our Haskell. While we’re not a “simple Haskell” shop by any means (we use lens, there are a few GADTs around, etc.), we definitely have to consider the teachability of our chosen dialect and make sure that each approved extension carries its weight.

If I can’t explain to a newbie an error message that comes from a reasonable-looking misuse of StringInterpolation, then I can’t teach it to newbies nor expect them to use it without at least an initial handle on all the features upon which it depends. We see this problem with lens: before people really get their heads around it and understand how to read and respond to the type errors it throws up, it is mysterious and frustrating to use. I think string interpolation needs to be simple enough to not have this property.

Is this the real crux of your objection to shipping a prototype library to explore the design space for interpolations? If so, I recommend pausing this process and putting up a GHC proposal for exposing its parser in a way that’s going to be convenient for TH use. A convenient way to parse snippets of Haskell source into an ExpQ/DecQ/PatQ or whatever would be fantastic.

IME, the more bells and whistles I add to a design in the name of “future extensibility” and “potential for innovation”, the more I find that I’ve overdesigned the thing I’m trying to build. Then I end up having to trim back the design to extend it in the ways I actually need, or to make the design comprehensible to others. If you’re shipping a new extension to GHC, you won’t have that luxury and it will stick around approximately forever. I want your proposal to succeed, by which I mean “is enabled by many users” not “is landed in GHC and released”. I think the best way to do that is to seriously study the use cases that it’s built for, and make sure that new Haskellers can pick it up and run with it. The survey has but a couple of short examples but I think it’s worth enumerating more and seeing how they behave under the different proposed schemes. These use cases should, IMHO, include failures: lexical failures where the interpolations are malformed, failures where the interpolations refer to the wrong things, failures where the result type of the string is ambiguous, failures where the extension is disabled (will it just report “variable not in scope: s”?), etc.

Good string interpolation has the potential to be a great QoL improvement, and I commend you for taking it on. But it also has the potential to be a great newbie-confuser, and I would really rather that didn’t happen.

Also, why is the interpolation character s? I could understand i or f, but s surprises me.

Issue tracked here: #20862: Provide basic QuasiQuoters in template-haskell? · Issues · Glasgow Haskell Compiler / GHC · GitLab

Perhaps my comment went a bit off the rails. I think I’d summarize my comment as “I don’t think string interpolation should be special cased for string-like types”. But thinking about it more, perhaps I’m over-indexing on the cool stuff you can do with JavaScript template literals, and forgetting that f-strings in Python and normal backticks in JavaScript are only strings, so maybe that is useful enough.

The flip side of this is that once the extension is in GHC, it’s hard to extend it without another extension.

FWIW most of these would be the same for all the different combinations. Lexing is the same for all the options, behavior when the extension is disabled is the same…

s is what Scala uses. I decided against f because if the extension is disabled, f"..." parses as a function call, and f is a really common name for a function, but s isn’t. i could work, but I find it just as bespoke as s.

That’s because you want to reference functions from inside the string literal, which I don’t think is necessary. As I wrote in #11, you can move the references out of the literal. It’s only slightly more bulky, but otherwise should be the same as string interpolation with a bunch of let-bindings outside.

Indeed, that’s why I’d like to draw a line between structured multiline strings (HTML/SQL), and short unstructured ones (logs, errors that kill the application). For the former keeping track of context is critical, so references should take up as little space as possible, i.e. you’d want to let-bind everything. For the latter the extra overhead of " <> foo _ <> " doesn’t really matter.

That’s not correct. PyF, for example, uses the GHC API to parse antiquotes, not haskell-src-meta.

And I think implementing string interpolation as a library first is a very good idea. It allows us to try out the Interpolate type class and see how well the pieces come together in practice. Hopefully it will show pain points. It’s a way to check whether SQL interpolation could work or whether it runs into problems:

  • Just how bad are the error messages when the string is polymorphic?
  • How painful is defining instances? Does it require defining Interpolate Text, Interpolate String, Interpolate Lazy.Text, etc.?
  • What about polymorphic data types? Would instances which call interpolate break, for example, SQL interpolation?

Even as a toy library it will give at least some experience on writing and using it and will allow people to try it out.

Another conspicuously missing thing is formatting options. Those are absolutely necessary for working with floating point. For example, cos 1 = 0.5403023058681398, but most of the time one wants control over how many digits are shown. There’s also width specification for poor man’s ASCII tables.
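
For reference, this is roughly what base already offers for both needs today, outside of any interpolation syntax. It is only an illustration of the kind of control a format specifier would have to provide; the function names threeDigits, threeDigits' and tableRow are made up for this example.

```haskell
import Numeric (showFFloat)
import Text.Printf (printf)

-- Fixed number of decimals with a printf-style specifier.
threeDigits :: Double -> String
threeDigits = printf "%.3f"

-- The same thing without printf, using Numeric.showFFloat.
threeDigits' :: Double -> String
threeDigits' x = showFFloat (Just 3) x ""

-- Width specifiers for a poor man's ASCII table:
-- a left-aligned 10-character label, then an 8-character number column.
tableRow :: String -> Double -> String
tableRow label x = printf "%-10s|%8.3f" label x
```

Here threeDigits (cos 1) gives "0.540", and tableRow "cos 1" (cos 1) gives "cos 1     |   0.540".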

Regarding haskell-src-meta: There is ghc-hs-meta that does the same thing using the GHC parser, so it has a much lighter dependency footprint (as ghc is already pre-installed). It was initially extracted from PyF. I became a co-maintainer some time ago, but I have only updated it for new GHC versions and haven’t had time to make bigger changes so far (i.e. ghc-hs-meta doesn’t yet cover the whole expression AST).

Yep, this seems like a big open question, particularly in the “multiple target types” version of the design. Many of these instances will be similar, and so there will be a gut instinct to define something like an overlappable (Show a, IsString s) => Interpolate a s instance, but that’s a can of worms (not least for the SQL or HTML use case, or any others where naive concatenation introduces security flaws).

You could probably work around this by providing a newtype for use with -XDerivingVia but that needs hammering out and testing.
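
A rough sketch of what that opt-in might look like, assuming a two-parameter class with the target type first (so that a deriving clause, which fills in the last parameter, can name the source type). The class shape, the ViaShow newtype and the Age example are all assumptions for illustration, and this has not been hammered out or tested against the SQL/HTML concerns.

```haskell
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.String (IsString (..))

-- Hypothetical MPTC: 's' is the type interpolated into, 'a' the spliced type.
class Interpolate s a where
  interpolate :: a -> s

-- Opt-in wrapper: "render me with Show, into any IsString target".
-- There is deliberately no catch-all (Show a, IsString s) instance on bare 'a'.
newtype ViaShow a = ViaShow a

instance (Show a, IsString s) => Interpolate s (ViaShow a) where
  interpolate (ViaShow x) = fromString (show x)

-- A library author chooses, per type and per target, to reuse Show.
newtype Age = Age Int
  deriving stock (Show)
  deriving (Interpolate String) via (ViaShow Age)
```

Because the Show-based behaviour is attached to ViaShow rather than to every type, a security-sensitive target (a hypothetical SqlQuery, say) simply would not get an instance unless someone writes or derives one on purpose. Note that each target still needs its own deriving clause, so this does not by itself solve the number of instances.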

There are many rough-cut libraries one could imagine to test aspects of this design:

  1. Provide just the typeclasses and work with manually desugared expressions, to test type inference of desugared expressions and the developer experience providing types that work with the interpolation machinery;
  2. Provide a TH quasiquoter that can test the syntactic expansion;
  3. Provide a GHC plugin that can test the syntactic expansion, by defining a function s :: Buildable s => String -> s with s = error "Did you forget to include the GHC plugin?". Use the GHC plugin to find applications of s to string literals and replace them with the desugared expression.
  4. Implementations of №2 and №3 can be made significantly simpler by allowing only variables in splices, and not arbitrary expressions. This is by far the most common use case, and will let users get a taste for the ergonomics.

I think interpolation is an easy feature to want in the abstract, but the devil is in the details. The MPTC design of class Interpolate will lead directly to an m×n instances problem if we’re not careful (m types to interpolate × n types to interpolate into). Such verbosity will make it less likely that library developers define all the necessary instances to make interpolation actually useful for application developers, severely limiting the value of shipping the extension at all.

@tomjaguarpaw is correct that the design space is vast. @brandonchinn178 mentions JS template literals, which makes me wonder if there’s a variant of JS-style tagged template literals that could work here. Make interpolation affect argument passing instead. An interpolation function could be a function of two arguments:

  • [Either String Int], a list of literal strings or indices into the argument set; and
  • HList xs, a heterogeneous list of the values that appear in splices.

Then you can return whatever type you want, choose to use Show or not, and each interpolator can do things its own way instead of having to design the core machinery for every potential use case.

-- I'm going to invent a really ugly syntax here, inspired by -XQuasiQuotes
-- Given:
message :: Text
message = <text|Hello ${name}. You are ${age} years old.>

name :: Text
name = "J. Random Hacker"

age :: Int
age = 42

-- Desugars to:
message = text
  [ Left "Hello "
  , Right 0
  , Left ". You are "
  , Right 1
  , Left " years old."
  ]
  (name :. age :. HNil)
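
To check that the shape above can actually be consumed, here is a small self-contained sketch of one possible interpolator written against the desugared form. HList and (:.) are defined locally, and RenderAll and showInterp are names invented for this example; this particular interpolator chooses to use Show for every splice, which is only one of the options an interpolator could take.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

-- A minimal heterogeneous list, indexed by the types of its elements.
data HList (xs :: [Type]) where
  HNil :: HList '[]
  (:.) :: x -> HList xs -> HList (x ': xs)

infixr 5 :.

-- Hypothetical helper class: render every spliced value with Show.
class RenderAll (xs :: [Type]) where
  renderAll :: HList xs -> [String]

instance RenderAll '[] where
  renderAll HNil = []

instance (Show x, RenderAll xs) => RenderAll (x ': xs) where
  renderAll (x :. xs) = show x : renderAll xs

-- An interpolator in the two-argument style: literal chunks or indices,
-- plus the spliced values. (!!) is partial; a real desugaring would only
-- ever generate indices that are in range.
showInterp :: RenderAll xs => [Either String Int] -> HList xs -> String
showInterp chunks args = concatMap piece chunks
  where
    rendered = renderAll args
    piece (Left lit) = lit
    piece (Right i) = rendered !! i
```

Because this interpolator goes through Show, a String splice keeps its quotes: applying it to the chunks above with ("J. Random Hacker" :: String) :. (42 :: Int) :. HNil yields Hello "J. Random Hacker". You are 42 years old.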

I think the upside of getting interpolation right is large, but the downsides of getting it wrong (impractical levels of ambiguity, permits insecure string building, unclear which interpolation targets should be defined, confusing the typechecker, confusing newbies) are also large. Experimentation is actually pretty cheap here, especially for suggestion №1 and the restricted forms of №2 and №3, so I think it’s really worth doing that and not just saying “she’ll be right”.

I guess what I’m saying is: I personally think good interpolation will be good for the GHC dialect of the Haskell language, and its community. But I think we’re being asked to vote on alternatives too soon, without a chance to hold the tools in our hands.

I get the sense that there is a split between people who want as simple a design as possible and those who want as expressive a design as possible. A good compromise might be to add two extensions:

  • StringInterpolation: just the minimal explicit design
  • OverloadedStringInterpolation: the most popular non-explicit design

I think that would hedge us well against the design risks. I think for either camp the worst case is that the “wrong” design choice is taken and then nobody from their camp will use this extension. By having two extensions, we can easily alleviate this.

Considering the size of the design space, personally I would be tempted to get the first one out quickly and spend some more time figuring out the exact details of the second.

There’s also #24966: Antiquotation/splices for quasiquoters · Issues · Glasgow Haskell Compiler / GHC · GitLab. You made the original issue quite a while ago and it has stalled, but I’m very much hoping to put some time into progressing this stuff at some point this year. I think this and better error messages are the two big things holding quasiquotes back.

Are these the HTML and SQL examples from your proposal? I wasn’t sure about these. Of course they are just sketches rather than full libraries, but I think this sort of thing has a fundamental flaw in that object language parsing errors must be delayed to runtime. I think that would make a string interpolation based library struggle to displace existing packages, and so they aren’t great for justifying the design. But if they are analogous to things that you can do in JS that would make more sense, since I think JS people care less about spotting errors at compile-time.

Aside: the HTML and SQL examples actually suggest a different design to me, namely one where foo"..." works like a quasiquoter so you have a function foo :: [Either String Exp] -> Q Exp in scope and that gets run at compile-time.
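
As a sketch of what such a compile-time function could look like, here is the simplest possible one: it just concatenates the chunks with mconcat. The name concatInterp is hypothetical, and a real HTML or SQL version would presumably parse the literal chunks here and report malformed input at compile time instead of passing them through.

```haskell
import Language.Haskell.TH (Exp (..), Lit (..), Q, mkName, runQ)

-- Hypothetical compile-time interpolator of the shape
--   foo :: [Either String Exp] -> Q Exp
-- Left chunks are the literal pieces, Right chunks the spliced expressions.
concatInterp :: [Either String Exp] -> Q Exp
concatInterp chunks =
  pure (AppE (VarE (mkName "mconcat")) (ListE (map chunk chunks)))
  where
    chunk (Left lit) = LitE (StringL lit) -- literal text becomes a string literal
    chunk (Right e) = e                   -- spliced expressions are used as-is
```

The generated expression can be inspected without any new syntax, for example with runQ (concatInterp [Left "Hello ", Right (VarE (mkName "name"))]) in GHCi.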

Lots of languages manage to have string interpolation though. It is only as vast as we want it to be.

However I agree that there are many solutions to choose from, and maybe the solution is to not choose at all and let the user decide. Why not use a different “string specifier” for each use case:

  • s for “string”: everything is String, so there are no class or type inference issues.
    s"${age}" complains that age is not of type String; one must write s"${show age}".

  • i for interpolating strings. No builder, but i"${name} is ${age} years old" expands to mconcat [interpolate name, " is ", interpolate age, " years old"].

  • ss uses show for non-String values (and possibly fromString for string-like types).
    ss"${name} is ${age} years old" expands to mconcat [name, " is ", show age, " years old"].

  • sl could return a list, sb or ib a builder, etc.

Please note that when using interpolate, ambiguity can always be resolved with an explicit conversion to String.

Finally, another option is to use different symbols for interpolation and raw arguments, maybe ${...} and #{...}, or $... and ${...}.

$age would expand to show age, but show would be needed in ${show (age + 1)}.

etc … The design space is vast indeed.

I like the idea of implementing this kind of like QualifiedDo - GHC would provide string interpolation syntax, desugaring, and semantics that are easy to understand using the StringInterpolation extension. The default semantics should have good type inference and present good error messages for beginning users. Then, I would want the ability to provide custom semantics via something like QualifiedStringInterpretation to support additional expressivity - overloaded strings, SQL escaping, builders, whatever. You just need to import the definitions used in desugaring from a qualified module, and use the module name in the interpretation syntax - s"a ${x} b" becomes MyInterp.s"a ${x} b", and desugaring is identical except definitions from MyInterp are used instead of ones from Prelude.

Thank you for this work. I had not found how to vote on the survey, so here is my opinion with a bit of justification and discussion.

A1. B1. C1. As simple as possible.

I think that everything else can be added later using a source plugin (and GHC can even propose a new kind of plugin, “interpolation plugin” which would be called only on the interpolation string for efficiency):

  • Implicit interpolation can be added using a GHC plugin: just walk the interpolation chunks and wrap all the expressions in the required interpolate calls.
  • Builder too, if users want performance.
  • The different Interpolate class, or overlapping instances, … all of that can be experimented with a source plugin in external libraries and maybe in a few months we’ll have a reference plugin which is widely adopted and could be moved into main GHC.

All of that can be done without introducing interpolation in GHC syntax; it can be done right now as a source plugin (either stealing the quasiquote syntax, or using multiline strings from 9.12). I recently tried to reimplement PyF as a source plugin, and except for performance (e.g. walking the HsExpr GhcPs AST to look for strings to replace, hence why I suggest a more targeted “interpolation string plugin”), everything is really smooth: PyF as a source plugin by guibou · Pull Request #146 · guibou/PyF · GitHub.

So considering that everything can be refined later as external libraries (or refined directly in GHC), I’m in favor of doing the simplest solution right now. Adding features (such as the explicit interpolation) is always easy, removing features seems a bit more difficult.

Hi everyone, thanks a ton for the feedback. One thing I heard a lot of people mention is the desire for a prototype that people could build/run locally. So I implemented a prototype with 5 options discussed in this thread. See other post for details: GHC String Interpolation - Prototypes
