GHC String Interpolation Survey Open!

Hello! I started a proposal for adding string interpolation to GHC almost exactly 2 years ago, and after getting a working prototype, I have compiled a doc and survey for some of the options discussed in the thread. Please read the doc and vote in the survey! We would really like to see where the community lands on some of these questions.

Proposal: https://github.com/ghc-proposals/ghc-proposals/pull/570
Doc + Survey: (Notion link)

19 Likes

B3:


s"Name: ${name}, Age: ${show age}"
-- Desugars to:
--   fromBuilder $
--     toBuilder "Name: "
--     <> toBuilder name
--     <> toBuilder ", Age: "
--     <> toBuilder (interpolate age)

I don’t get this example. Why is show gone?

Unfortunately, the survey thing keeps insisting that I need to enable JavaScript despite enabling JavaScript, so here’s my $0.02 on the matter.

A few additional costs that I can’t find mentioned in the proposal:

  • There is an N x M problem, where you have N interpolation targets, and M types that can be interpolated
  • The multi-parameter typeclass approach will likely require library authors to introduce orphans to avoid excessive dependencies. E.g., an SQL library will want to provide Interpolate instances for its SQL query type, but because these have to be parametrized on the argument type as well, the library must either depend on everything it may want to interpolate into queries, or such instances must be orphans.
  • Because the interpolation mechanism is parametrized over both the target and the arguments, type inference will not work when either is polymorphic, so in nontrivial code, types will often have to be explicitly annotated to disambiguate. E.g., "${23}" needs a type annotation to decide which type to use for the literal 23 (see the sketch after this list).
  • The lexer, as described, deviates significantly from what users coming from languages like Bash would expect - ${"foo}"} and ${foo { bar = 23 }} and ${23 {- comment -}} should definitely be valid and work as expected. The only reason they are not is that it would make the lexer more difficult to write - but IMO that’s not a valid excuse. These things work fine in Bash, they work fine in quasiquotes and splices, so having them not work in string interpolation would be inconsistent, and surprising in a bad way.
  • Defaulting to Show is questionable IMO, because of Show’s ill-defined semantics. This is a whole other rabbit hole, but in a nutshell, there doesn’t seem to be any consensus on what exactly show is supposed to mean, so how can we possibly be sure that show is the right thing to use?
  • At the same time, the intended meaning of “interpolating into a string” isn’t clear either. What does it mean to interpolate an integer into a string? What does the integer represent? What does the string represent? We don’t know, but how exactly we do the interpolation depends on the answers to these questions. The existing solutions are a bit clearer on this, or else leave the interpretation up to the user: printf has its own typeclass that defines specifically what it means to use a value as a printf argument, independently from any existing conversions to string; Template Haskell based solutions are domain-specific, and use whichever mechanism is appropriate for the domain (e.g., one that builds HTML strings will typically provide some kind of ToHTML typeclass, one that builds SQL queries will provide a ToSQL typeclass, etc.).
  • I honestly see very little benefit over the status quo. The Template Haskell solutions offer very similar APIs, but since they are implemented as libraries, they can be made as generic or as specific as the use case demands, and they don’t require any machinery in the compiler that’s not already there. The current downsides of this approach (Template Haskell being problematic for all sorts of reasons, having to parse Haskell source with a separate parser that might disagree with the compiler itself, etc.) are real, but IMO the way to deal with this is to fix those problems, rather than adding more complexity to the compiler to work around them. And people are already working on fixing those things - Template Haskell is slowly moving towards a more principled approach, which should hopefully eventually solve most of the practical issues; work is underway to formalize the various APIs GHC exposes; and tapping into GHC’s actual parsing machinery from user code should eventually become practical. People who reject Template Haskell out of fundamental principles, rather than current practical issues, are a tiny minority. I personally don’t know anyone who rejects metaprogramming as a whole; people who reject Template Haskell do so because it messes up cross-compilation, because it’s untyped, because of the staging restriction, or because of any number of other complaints - but not because it’s metaprogramming. That suggests that if all those practical complaints were addressed, the Template Haskell solutions we already have would be perfectly acceptable for the overwhelming majority of Haskell users.
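For concreteness, here is roughly the shape of the multi-parameter design being discussed (class and instance names are hypothetical; the actual proposal may differ), which shows both the N x M instance blow-up and the ambiguity problem:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Hypothetical two-parameter class: 'a' is the interpolated value,
-- 'target' is the thing being built.
class Interpolate a target where
  interpolate :: a -> target

-- N targets x M argument types means roughly N * M instances like these:
instance Interpolate Int    String where interpolate = show
instance Interpolate String String where interpolate = id

-- With both parameters open, s"${23}" elaborates to roughly
-- (interpolate 23), and GHC cannot pick a numeric type for 23 or a
-- target type without an annotation.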
6 Likes

I once had a problem with the PyF interpolator: How are bytestrings encoded? · Issue #134 · guibou/PyF · GitHub
I opened that issue because this was a bug I didn’t notice until I saw �’s in the deployed program. Maybe I should have foreseen it, treating those bytestrings as if they were text, but the ease of using PyF elsewhere kind of lulled me into believing it handled things correctly and I wouldn’t have to actually think :wink: So basically, if conversions are implicit, it should be hard to shoot yourself in the foot in this way. If it’s hard to make it fool-proof, I would go for explicit. But I don’t know if it’s hard or easy to make this kind of thing fool-proof, so I don’t know what to answer in the survey :slight_smile:

Reposting my survey comments for the sake of more discussion:

  • I think these features are going to be tremendously valuable for the new Haskell programmer who is getting started. Thank you for tackling this.

  • I think any design that leads to SQL injection via implicit interpolation should be rejected, which probably means rejecting implicit interpolation without a separate builder, as well as anything involving an {-# OVERLAPPABLE #-} instance based on Show.

  • I think it might be cleaner and potentially valuable to desugar to mconcat instead of chained <>-applications, for monoids/builders with an efficient mconcat (see the sketch after this list).

  • I think most options here are relying too heavily on magic typeclasses to wire into GHC. In particular, I worry that the MPTC with no fundep will lead to poor inference and confused users (particularly when interpolating to an underconstrained overloaded string). Just comprehending the purpose of the Interpolate MPTC and Builder class will be a bit confusing (I base this on the number of people who bounce off regex-base's pluggable interface). This might be a particularly frustrating experience for new Haskell programmers, as "Hello, ${name}!" is one of the things such a programmer wants to print to the screen.

  • If we end up with more implicit machinery, I think it will be valuable to make class Buildable very clearly connected to the interpolation machinery, either by ensuring it’s buried in module GHC.Foo.Interpolate, named class InterpolateBuilder, or similar. I have often seen classes get defined to set up some narrow type-directed machinery, and then get misused for all sorts of other things. The usual offender is a ToFoo class that starts out as a tool for narrow conversions, but grows increasingly elaborate instances.
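For reference, here are the two desugarings side by side, using the hypothetical toBuilder/fromBuilder names from the B3 example above (a sketch, not the proposal's actual output):

-- Chained <>, as in the B3 example:
--   fromBuilder
--     (  toBuilder "Name: "  <> toBuilder name
--     <> toBuilder ", Age: " <> toBuilder (show age) )
--
-- Via mconcat, which some builders implement more efficiently than
-- repeated <>:
--   fromBuilder
--     (mconcat [ toBuilder "Name: ",  toBuilder name
--              , toBuilder ", Age: ", toBuilder (show age) ])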

6 Likes

Also, because it wasn’t obvious to me at the time I did the survey:

If you do explicit interpolation (possibly to a Builder that’s selected by the target type), you can probably pick up a lot of the power of package formatting, by only defining Interpolate instances that take formatting arguments.
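A hedged sketch of what that could look like, assuming a hypothetical Interpolate instance from lazy-text Builder to the target type; the formatting calls below are from the formatting package:

import Data.Text.Lazy.Builder (Builder)
import Formatting (bprint, fixed, hex)

-- formatting's combinators already produce Builders...
piChunk :: Builder
piChunk = bprint (fixed 2) (pi :: Double)   -- renders as "3.14"

-- ...so with an instance like Interpolate Builder Text in scope, they
-- could be used directly inside a splice (hypothetical surface syntax):
--   s"pi = ${bprint (fixed 2) pi}, mask = ${bprint hex (255 :: Int)}"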

I think I still prefer formatting as the most Haskelly solution to the string formatting problem, but I maintain my opinion that an understandable formatting syntax will give newbies a better time. Much of that benefit is lost if a simple error in a format string turns into an explosion of type errors referencing advanced Haskell features.

It would also be good if any typeclasses that eventually go into GHC are published to Hackage somehow, so that people can define the necessary instances without locking themselves to the latest and greatest GHC.

1 Like

I’m concerned about the size of the design space: it is vast. I strongly doubt we are going to be able to converge on a good design in the abstract. I believe with a design space of this complexity, the only way to clean out all the dark corners and smooth off all the rough edges is to try out some implementations in practice.

Sadly, the GHC proposal process doesn’t lend itself to this strategy. But with good reason: once an extension is in GHC it had better be decent because it will be nigh-impossible to remove it. A mis-designed extension can cause a great deal of damage, especially if it is widely used, as I expect string interpolation to be. I think we should be extremely cautious to not choose the wrong design here.

My suggestion is to implement a library containing some of the points in the design space as quasiquoters, promote the library so people use it in practice over the duration of approximately a year or two, and then determine which is the one that works best in practice.

I know that implementing interpolators as quasiquoters essentially defeats the whole purpose of the proposal, but doing so would just be a stepping stone to getting the most appropriate one into GHC as an extension, enshrining it for good.

14 Likes

I agree, this feels like an admission that the compiler cannot do any better, so the solution to this must be bolting a convenience extension onto GHC.


It should be possible to do something like the following in the language already, right?

sqlquery
  """
  SELECT ${x}
    FROM ${y}
   WHERE ${z};
  """
  [ "x" := (_ :: SqlBuilder)
  , "y" :=  _
  , "z" :=  _
  ]

(and, yes, I would love it if I could move the overhead and error reporting in any such function to compilation time without the ugliness of Template Haskell, but this extension would never solve that either)

I have got something like this working, and I have to say, I really don’t like SQL as an example here.

It is just not good practice to build SQL by concatenating strings.
Because of that, it invites cleverness into the string-concatenation solution that maybe just doesn’t belong there. And whatever cleverness sneaks in, it still can’t hold a candle to the TH-based solution. And if it can, then it wasn’t a string-concatenation solution, but (hopefully) better TH in disguise.

The proposal gives better examples in its motivation section: “printing out logs, rendering exceptions, generating code, pretty-printing”. Use cases that don’t resurrect demons of the early PHP days.

3 Likes

ooh boy. As somewhat expected, lots of comments :sweat_smile: I won’t be able to respond to all of them, but here are some brief responses:

  • Yes, the example under B3 was wrong, I’ve corrected it
  • The proposal in the PR isn’t completely updated yet, so don’t read too much into details like the description of the lexer or which strings are valid. I actually got a much better implementation working in my prototype.
  • In general, this survey is meant to gather feedback on the syntax of such a feature, if it were to get accepted. The committee could, of course, reject the proposal regardless of the options we pick here. So if your position is of the form “we should just not do it”, this survey is not for you :slightly_smiling_face: Update: the survey has been updated to allow specifying whether you want the feature at all.
  • @BurningWitness - yes that’s already possible, but if you read the Motivations section of the proposal, it explains why this isn’t satisfactory
  • @jackdk @jeukshi - Perhaps I didn’t make the example clear, but the point of the SQL example is that it’s safer than just concatenating strings and avoids injection attacks. For example, with implicit interpolation, you could define interpolate s as SqlQuery "?" [SqlString s], so any interpolated value would be passed as a bound parameter and automatically escaped (see the sketch below).
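To make that last point concrete, here is a minimal sketch with made-up SqlQuery/SqlValue types (real SQL libraries, and the final shape of the class, may well differ):

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

data SqlValue = SqlString String
data SqlQuery = SqlQuery String [SqlValue]  -- query text with '?' holes, plus bound parameters

instance Semigroup SqlQuery where
  SqlQuery q1 p1 <> SqlQuery q2 p2 = SqlQuery (q1 <> q2) (p1 <> p2)

class Interpolate a target where
  interpolate :: a -> target

-- Interpolated values become bound parameters, never raw query text,
-- so the database driver escapes them.
instance Interpolate String SqlQuery where
  interpolate s = SqlQuery "?" [SqlString s]

-- s"SELECT * FROM users WHERE name = ${name}" would then build roughly:
--   SqlQuery "SELECT * FROM users WHERE name = " [] <> interpolate name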
5 Likes

In addition to the things that this proposal will allow us to express, we should also ask about what it shouldn’t allow us to express.

I feel like OverloadedStrings is an extension that can easily be misused. For instance, you could write an instance like instance Read a => IsString a where .... Something like that would be bad because quoted text is no longer being used to represent something “string-y”, and it would make understanding code a lot harder. You now have to wonder whether there is something non-trivial (or partial!) going on in an instance.
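Spelled out, the kind of instance I have in mind would be something like the following (a deliberately bad example, not something anyone should write):

{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
import Data.String (IsString (..))

-- Overlaps every other IsString instance, and is partial: every quoted
-- literal now runs a Read parser that may crash at runtime.
instance Read a => IsString a where
  fromString = read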

I think string interpolation has the potential to have issues like this too. The more polymorphic we make the interface, the more scope there is for unexpected behaviour from string interpolation.

The example with SQL injection is already an example of that for me. To me, it feels like a disadvantage that string interpolation is allowing us to write things like this. It feels like it is being used as a means of syntactic extension, and in that case, I wish it was done explicitly using Quasiquotes, which as a reader of the code would make that explicit.

Can you elaborate on your point about SQL injection? In my earlier comment, I’ve already explained how this would make SQL queries safer and less prone to injection.

I just meant the example where you showed that we could avoid SQL injection by replacing the params with ?, etc. I’m not challenging the idea that something like that is helpful for avoiding injection errors.

My point is that it makes the code non-trivial to understand. String interpolation is not just being used as a simple way to glue bits of strings together, but rather it’s doing non-trivial work based on a typeclass. And that means that you should be quite suspicious of any code that uses string interpolation.

So, the more powerful this feature is, the more uncertain a user will be with what a particular instance means. This is a general trade-off with typeclass-y interfaces, they tend to make code easier to write, but harder to understand.

3 Likes

Got it. Yeah, I agree, it makes the feature more powerful and prone to misuse, but IMO that’s an advantage.

I don’t see much uncertainty here, because when you see string interpolation in the code, you usually know exactly what type is being built (string/sqlquery/whatever), and the interpolation rules for that type should generally be consistent (e.g. if you’re writing an Interpolate Foo SqlQuery instance, you should know to handle SQL injection attacks). I don’t see this as any different from typeclass laws. Sure, you can implement whatever monad instance you want and make do blocks have surprising behavior, but no one would do that, and if they did, the reader would recognize that this do block has that type, so it has the semantics of that monad.

But by all means, vote for the explicit option in the survey!

3 Likes

Good observation, indeed there are cases where we want find-and-replace, but not concatenation.

I agree with the underlying concerns, I disagree on the pain points:

  • Manually interpolating Text strings sucks because working with Text sucks. Data.ByteString.Builder doesn’t do any serialization typeclass shenanigans and is properly efficient; I love it and use it for logging and error reporting in the game I’m writing. The only [giant] problem with it is that it can’t build anything but ByteStrings, which means I can’t [safely] use it for anything except binary serialization.

  • I would absolutely love it if there were some way to do compile-time programming in Haskell without Template Haskell. All of it is already possible with merely $$(), [|| ||] and fail, it just happens to be a monstrous non-reusable eyesore with staging restrictions.

  • If compile-time programming were easy, I imagine printf would be trivial to implement safely in Haskell, it’s simply a conversion from a string literal to a function. Not that I’d expect it to be the only solution to this problem.

From my perspective a string interpolation extension would trim down a 70-character logging line to, maybe, 50 characters, and that’s provided someone already did all the dirty work of writing the underlying library that defines all the conversions.

1 Like

If someone prefers the explicit option, they don’t really need string interpolation.

String interpolation is about implicitness, not about maximum flexibility.
So, IMHO the most wanted feature is:

  1. A2 - implicit
  2. B3 - add Builder, interpolate to s

[…] the way to deal with this is to fix those problems, rather than adding more complexity to the compiler to work around them

100% agree. With easier QuasiQuotes there would be no reason to use a built-in string interpolation mechanism; instead, let each library provide what it thinks is the “right thing” (e.g., regex literals are quite different from SQL code).

In case this proposal moves forward I would suggest to use the most minimal approach. More complex use cases can be delegated to libraries.

I was tempted by this argument but I think I reject it. String interpolation is also going to be one of the first things a newbie hits, and I worry that a newbie will try to use this feature to write an apparently-simple program, mistype something, and get a big error about missing instances involving builders and probably MPTCs. At which point, our hypothetical new Haskeller will frisbee the laptop out the window in frustration — a great loss.

A design that provides clean code for newbies but incomprehensible errors when you do the slightest thing wrong is a massive double-edged sword. (I base this on personal experience with “if it typechecks, it definitely works”-type libraries that use advanced GHC features, where I failed to produce a working program or resigned myself to doing things in very bad ways to get anything at all.)

Whatever we design here and bake into GHC needs to fail gracefully in a newbie’s hands. IMHO, this includes at least the following situations:

  • Syntactic errors: missing }, writing #foo instead of #{foo} (or whatever splice character we chose), writing #{foo (} or other parse errors in the splice;
  • Interpolating values of types with missing instances;
  • Interpolating where the output is an IsString t => t (i.e., underconstrained for some reason)
  • Interpolating a value of a type where there are instances Interpolate Foo String but not Interpolate Foo Text (or vice-versa, or involving lazy/strict Text, or…)

That makes me lean towards simplicity at the expense of expressive power and being the complete string-templating solution for all cases.

This makes me think that interpolating only to String might even be the best option? This removes the MPTC, and then you can define a single-parameter class Interpolate. You could then define an interaction with -XOverloadedStrings that inserts fromString into the result of an interpolated overloaded string, ideally in a way that avoids materialising the entire string as individual Chars along the way.
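A hedged sketch of that single-parameter shape (hypothetical names):

{-# LANGUAGE FlexibleInstances #-}

-- One parameter: everything interpolates to String; the final target
-- type is handled separately via OverloadedStrings/fromString.
class Interpolate a where
  interpolate :: a -> String

instance Interpolate String where interpolate = id
instance Interpolate Int    where interpolate = show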

Desugaring example:

{-# LANGUAGE OverloadedStrings #-}
t :: Text
t = s"Name: ${name}, Age: ${show age}"

-- Could desugar to this. Unsure if it's better to have all the string
-- chunks converted to `Text` ASAP, or whether it's best to consume the
-- entire string and allocate a single `Text`.
t = mconcat
  [ fromString "Name: "
  , fromString name
  , fromString ", Age: "
  , fromString (show age)
  ]
3 Likes

@BurningWitness

  1. In addition to “Ew Template Haskell”, the only way to implement string interpolation with quasiquoters is with haskell-src-meta, because the compiler doesn’t provide an easy way to convert the string "show name" to the expression show name. And haskell-src-meta is a heavy dependency. So it’s also a high cost to ask users to include it in all their projects.
  2. Manually interpolating strings or using printf is still not satisfactory to me. If you have a multiline template, you either break it up or lose the context of where in the string a given variable is injected:
emailBody =
    s"""
    Hello ${name},

    Your package "${package}" has outdated dependencies on Hackage.
    Please update "${package}".

    Sincerely,
    Hackage.org
    """

emailBody2 =
    """
    Hello """ <> name <> """,

    Your package \"""" <> package <> """" has outdated dependencies on Hackage.
    Please update \"""" <> package <> """".

    Sincerely,
    Hackage.org
    """

emailBody3 =
    printf
      """
      Hello %s,

      Your package "%s" has outdated dependencies on Hackage.
      Please update "%s".

      Sincerely,
      Hackage.org
      """
      name
      package
      package

I find string interpolation to be the only ergonomic way to express these kinds of programs.

@VitWW Please respond in the survey :slight_smile:

@jackdk “Whatever we design here and bake into GHC needs to fail gracefully in a newbie’s hands” - I agree, but I don’t see why that implies the rest of your comment. We have all of these issues already with Num, Monad, or OverloadedStrings. And as an aside, just as there’s a subset of developers who are adamantly against using OverloadedStrings (personally, I’d never use OverloadedLists), I’d imagine there’d be a similar subset who would never use StringInterpolation, but IMO I don’t think that means we shouldn’t have it.

I can see the appeal of an approach that ignores complex use-cases like SQL and focuses only on interpolating to String-like types (string/text/bytestring). And it would certainly be better than the status quo. But I think keeping the feature general allows users to be innovative with the kinds of APIs they can provide, especially since it’s not lightweight to implement outside of a native compiler extension (see above comment about haskell-src-meta). Yes, we’ll probably get some cursed APIs out of this, and maybe it adds some more footguns for newbies, but I personally value the potential for innovation here over being conservative about the kinds of expressions we want to allow.

2 Likes

It’s fine enough for a prototype, IMHO, since string-interpolate already depends on it and people seem to like and use that.

Num is widely acknowledged as a poor design, and probably shouldn’t be taken as licence to ship confusing things. I think that a more accurate comparison would be Foldable. It makes things much better for veterans but more confusing for newbies. (I say this as a course tutor for a first-year/freshman Haskell course, where handwaving typeclasses before students had mastered recursion made things harder for the weaker students.) The Foldable/Traversable Proposal (FTP) was before my time, but IMHO we should have made the default a bit more newbie-friendly while still making it easier for advanced users to pick up the power. This might have looked like keeping the monomorphic list-consuming functions in Data.List (and re-exporting them through Prelude), and providing polymorphic ones in Data.Foldable for alternate preludes to re-export.

It seems to me that it would be very valuable to allow newbies to opt into a form of string interpolation that’s going to fail comprehensibly in the common use cases. I lead a team of Haskellers at work, and part of that is helping decide very consciously how far we’re going to go with our Haskell. While we’re not a “simple Haskell” shop by any means (we use lens, there are a few GADTs around, etc.), we definitely have to consider the teachability of our chosen dialect and make sure that each approved extension carries its weight.

If I can’t explain to a newbie an error message that comes from a reasonable-looking misuse of StringInterpolation, then I can’t teach it to newbies nor expect them to use it without at least an initial handle on all the features upon which it depends. We see this problem with lens: before people really get their heads around it and understand how to read and respond to the type errors it throws up, it is mysterious and frustrating to use. I think string interpolation needs to be simple enough to not have this property.

Is this the real crux of your objection to shipping a prototype library to explore the design space for interpolations? If so, I recommend pausing this process and putting up a GHC proposal for exposing its parser in a way that’s going to be convenient for TH use. A convenient way to parse snippets of Haskell source into an ExpQ/DecQ/PatQ or whatever would be fantastic.

IME, the more bells and whistles I add to a design in the name of “future extensibility” and “potential for innovation”, the more I find that I’ve overdesigned the thing I’m trying to build. Then I end up having to trim back the design to extend it in the ways I actually need, or to make the design comprehensible to others. If you’re shipping a new extension to GHC, you won’t have that luxury and it will stick around approximately forever. I want your proposal to succeed, by which I mean “is enabled by many users”, not “is landed in GHC and released”. I think the best way to do that is to seriously study the use cases that it’s built for, and make sure that new Haskellers can pick it up and run with it. The survey has only a couple of short examples, but I think it’s worth enumerating more and seeing how they behave under the different proposed schemes. These use cases should, IMHO, include failures: lexical failures where the interpolations are malformed, failures where the interpolations refer to the wrong things, failures where the result type of the string is ambiguous, failures where the extension is disabled (will it just report “variable not in scope: s”?), etc.

Good string interpolation has the potential to be a great QoL improvement, and I commend you for taking it on. But it also has the potential to be a great newbie-confuser, and I would really rather that didn’t happen.

Also, why is the interpolation character s? I could understand i or f, but s surprises me.

4 Likes