GHC String Interpolation Survey Open!

Hello! I started a proposal for adding string interpolation to GHC almost exactly 2 years ago, and after getting a working prototype, I have compiled a doc and survey for some of the options discussed in the thread. Please read the doc and vote in the survey! We would really like to see where the community lands on some of these questions.

Proposal: https://github.com/ghc-proposals/ghc-proposals/pull/570
Doc + Survey: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.

19 Likes

B3:


s"Name: ${name}, Age: ${show age}"
-- Desugars to:
--   fromBuilder $
--     toBuilder "Name: "
--     <> toBuilder name
--     <> toBuilder ", Age: "
--     <> toBuilder (interpolate age)

I don’t get this example. Why show is gone?

Unfortunately, the survey thing keeps insisting that I need to enable JavaScript despite enabling JavaScript, so here’s my $0.02 on the matter.

A few additional costs that I can’t find mentioned in the proposal:

  • There is an N x M problem, where you have N interpolation targets, and M types that can be interpolated
  • The MultiParamTypeclass approach will likely require library authors to introduce orphans to avoid excessive dependencies. E.g., an SQL library will want to provide Interpolate instances for its SQL query type, but because these have to be parametrized on the argument type as well, the library must either depend on anything that it may want to interpolate into queries, or such instances must be orphans.
  • Because the interpolation mechanism is parametrized over both the target and the arguments, type inference will not work when either is polymorphic, so in nontrivial code, types will often have to be explicitly annotated to disambiguate. E.g.: "${23}" needs a type annotation to decide which type to use for the literal 23`.
  • The lexer, as described, deviates significantly from what users coming from languages like Bash would expect - ${"foo} and ${foo { bar = 23 }} and ${23 {- comment -}} should definitely be valid and work as expected. The only reason they are not is because it would make the Lexer more difficult to write - but IMO that’s not a valid excuse. These things work fine in Bash, they work fine in quasiquotes and splices, so having them not work in string interpolation would be inconsistent, and surprising in a bad way.
  • Defaulting to Show is questionable IMO, because of Show’s ill-defined semantics. This is a whole other rabbit hole, but in a nutshell, there doesn’t seem to be any consensus on what exactly show is supposed to mean, so how can we possibly be sure that show is the right thing to use?
  • At the same time, the intended meaning of “interpolating into a string” isn’t clear either. What does it mean to interpolate an integer into a string? What does the integer represent? What does the string represent? We don’t know, but how exactly we do the interpolation depends on the answers to these questions. The existing solutions are a bit clearer on this, or else leave the interpretation up to the user: printf has its own typeclass that defines specifically what it means to use a value as a printf argument, independently from any existing conversions to string; Template Haskell based solutions are domain-specific, and use whichever mechanism is appropriate for the domain (e.g., one that builds HTML strings will typically provide some kind of ToHTML typeclass, one that builds SQL queries will provide a ToSQL typeclass, etc.).
  • I honestly see very little benefit over the status quo. The Template Haskell solutions offer very similar APIs, but since they are implemented as libraries, they can be made as generic or as specific as the use case demands, and they don’t require any machinery in the compiler that’s not already there. The current downsides of this approach (Template Haskell being problematic for all sorts of reasons, having to parse Haskell source with a separate parser that might disagree with the compiler itself, etc.) are real, but IMO the way to deal with this is to fix those problems, rather than adding more complexity to the compiler to work around them. And people are already working on fixing those things - Template Haskell is slowly moving towards a more principled approach, which should hopefully eventually solve most of the practical issues; work is underway to formalize the various APIs GHC exposes, and tapping into GHC’s actual parsing machinery from user code should eventually become practical. People who reject Template Haskell out of fundamental principles, rather than current practical issues, are a tiny minority - I personally don’t know anyone who rejects metaprogramming as a whole, people who reject Template Haskell do so because it messes up cross-compilation, because it’s untyped, because of the staging restriction, or because of any number of other complaints - but not because it’s metaprogramming, which suggests that if all those other practical complaints were addressed, the Template Haskell solutions we already have would be perfectly acceptable for the overwhelming majority of Haskell users.
6 Likes

I once had a problem with the PyF interpolator: How are bytestrings encoded? · Issue #134 · guibou/PyF · GitHub
I opened that issue because this was a bug I didn’t notice until I saw �’s in the deployed program. Maybe I should have foreseen it, treating those bytestrings as if they were text, but the ease of using PyF elsewhere kind of lulled me into believing it handled things correctly and I wouldn’t have to actually think :wink: So basically, if conversions are implicit, it should be hard to shoot yourself in the foot in this way. If it’s hard to make it fool-proof, I would go for explicit. But I don’t know if it’s hard or easy to make this kind of thing fool-proof, so I don’t know what to answer in the survey :slight_smile:

First thanks for the work your have been doing on multiline strings and now on string interpolation.

My first reaction was, actually we don’t really need string interpolation. Multiline string wasn’t just syntaxic sugar, as it moves the string concatenation from runtime to compile time.
This is not the case for string interpolation it is pure syntaxic sugar evaluated at runtime. However it is handy and having it baked in means there is no extra dependency neither choice paralysis. So I welcome it.

However,

A naive implementation of string interpolation would convert s"a ${x} b" to something like “a " <> … <> " b”

There is another naive implementation which is mconcat ["a", ..., "b"]. This solves the builder problem buy letting the implementation of mconcat deal with the performance/optimisation issue.

It could also be less naive and desugar using Left and Right as in B1 option.
B1 becomes ]

makeQuery split"SELECT * FROM user WHERE name = ${name} AND age = ${age}"

to

 makeQuery
     [ Left "SELECT * FROM user WHERE name = "
     , Right $ interpolate name
     , Left " AND age = "
     , Right $ interpolate age
     ]

No need for special typeclass, just plain old list.
Also, interpolate could be passed as an argument using a special syntax, as in

makeQuery  interpolate\"SELECT * FROM user WHERE name = ${name} AND age = ${age}".
2 Likes

Reposting my survey comments for the sake of more discussion:

  • I think these features are going to be tremendously valuable for the new Haskell programmer who is getting started. Thank you for tackling this.

  • I think any design that leads to SQL injection via implicit interpolation should be rejected, which probably means implicit interpolation without a separate builder, and anything involving an {-# OVERLAPPABLE #-} instance involving Show.

  • I think it might be cleaner and potentially valuable to desugar to mconcat instead of chained <>-applications, for monoids/builders with an efficient mconcat.

  • I think most options here are relying too heavily on magic typeclasses to wire into GHC. In particular, I worry that the MPTC with no fundep will to lead to poor inference and confused users (particularly when interpolating to an underconstrained overloaded string). Just comprehending the purpose of the Interpolate MPTC and Builder class will be a bit confusing (I base this off the number of people who bounce off regex-base's pluggable interface.) This might be a particularly frustrating experience for new Haskell programmers, as "Hello, ${name}!" is one of the things such a programmer wants to print to the screen.

  • If we end up with more implicit machinery, I think it will be valuable to make class Buildable very clearly connected to the interpolation machinery, either by ensuring it’s buried in module GHC.Foo.Interpolate, named class InterpolateBuilder, or similar. I have often seen classes get defined to set up some narrow type-directed machinery, and then get misused for all sorts of other things. The usual offender is a ToFoo class that starts out as a tool for narrow conversions, but grows increasingly elaborate instances.

5 Likes

Don’t know if it has been discussed but I would prefer (if possible) string interpolation to be part of multistring (using """...""") syntax rather than string prefix (ala s"...").

3 Likes

Also, because it wasn’t obvious to me at the time I did the survey:

If you do explicit interpolation (possibly to a Builder that’s selected by the target type), you can probably pick up a lot of the power of package formatting, by only defining Interpolate instances that take formatting arguments.

I think I still prefer formatting as the most Haskelly solution to the string formatting problem, but I keep my opinion that understandable formatting syntax will make newbies have a better time. Much of these benefits are lost if a simple error in a format string turns into an explosion of type errors referencing advanced Haskell features.

It would also be good if any typeclasses that eventually go into GHC are published to Hackage somehow, so that people can define the necessary instances without locking themselves to the latest and greatest GHC.

In B3

class Interpolate a s where
interpolate :: a → s

{----- User code -----}

s"Name: {name}, Age: {show age}"
– Desugars to:
– fromBuilder $
– toBuilder "Name: "
– <> toBuilder name
– <> toBuilder ", Age: "
– <> toBuilder (interpolate age)

Shoudn’t it be toBuilder (interpolate name). In the case of building some sql query even though name is a string, it is a parameter and might need to be replaced by '?' or be sanitized etc …

Also, it might be usefull for interpolate to have access to the raw string, it will become (interpolate "name" name).

Could an option None of the above be added to the survey ?

I’m concerned about the size of the design space: it is vast. I strongly doubt we are going to be able to converge on a good design in the abstract. I believe with a design space of this complexity, the only way to clean out all the dark corners and smooth off all the rough edges is to try out some implementations in practice.

Sadly, the GHC proposal process doesn’t lend itself to this strategy. But with good reason: once an extension is in GHC it had better be decent because it will be nigh-impossible to remove it. A mis-designed extension can cause a great deal of damage, especially if it is widely used, as I expect string interpolation to be. I think we should be extremely cautious to not choose the wrong design here.

My suggestion is to implement a library containing some of the points in the design space as quasiquoters, promote the library so people use it in practice over the duration of approximately a year or two, and then determine which is the one that works best in practice.

I know that implementing interpolators as quasiquoters essentially defeats the whole purpose of the proposal, but doing so would just be a stepping stone to getting the most appropriate one into GHC as an extension, enshrining it for good.

12 Likes

I agree, this feels like an admission that the compiler cannot do any better, so the solution to this must be bolting a convenience extension onto GHC.


It should be possible to do something like the following in the language already, right?

sqlquery
  """
  SELECT ${x}
    FROM ${y}
   WHERE ${z};
  """
  [ "x" := (_ :: SqlBuilder)
  , "y" :=  _
  , "z" :=  _
  ]

(and, yes, I would love it if I could move the overhead and error reporting in any such function to compilation time without the ugliness of Template Haskell, but this extension would never solve that either)

I have got something like this working, and I have to say, I really don’t like SQL as an example here.

It is just not a good practice to concatenate strings as SQL.
Because of the above, it invites cleverness to the string concatenating solution, which maybe just doesn’t belong there. And whatever cleverness sneaks in, it still can’t hold a candle to the TH based solution. And if it can, then it wasn’t a string concatenating solution, but (hopefully) better TH in disguise.

The proposal goes with better examples in the motivation section: “printing out logs, rendering exceptions, generating code, pretty-printing”. Use cases that don’t resurrect demons of the early PHP days.

ooh boy. As somewhat expected, lots of comments :sweat_smile: I won’t be able to respond to all of them, but here are some brief responses:

  • Yes, the example under B3 was wrong, I’ve corrected it
  • The proposal in the PR isn’t completely updated yet, so don’t read too much into it, such as the description of the lexer or what strings are valid or not. I actually got a much better implementation working in my prototype
  • In general, this survey is meant to gather feedback on the syntax of such a feature, if it were to get accepted. The committee could, of course, reject the proposal, regardless of the options we pick here. So if your position is of the form “we should just not do it”, this survey is not for you :slightly_smiling_face: Update: Survey has been updated to allow specifying whether you want the feature
  • @BurningWitness - yes that’s already possible, but if you read the Motivations section of the proposal, it explains why this isn’t satisfactory
  • @jackdk @jeukshi - Perhaps I didn’t make the example clear, but the point of the SQL example is that it’s safer than just concatenating strings and avoids injection attacks. For example, with implicit interpolation, you could define interpolate as SqlQuery "?" [SqlString s] and any interpolated value will be automatically escaped
4 Likes

In addition to the things that this proposal will allow us to express, we should also ask about what it shouldn’t allow us to express.

I feel like OverloadedStrings is an extension that can easily be missused. For instance you could write an instance that was instance Read a => IsString a where .... Something like that would be bad because quoted text is no longer being used to represent something “string-y” and it would make understanding code a lot harder. You now have to wonder whether there is something non-trivial (or partial!) going on in an instance.

I think string interpolation has the potential to have issues like this too. The more polymorphic we make the interface, the more scope there is for unexpected behaviour from string interpolation.

The example with SQL injection is already an example of that for me. To me, it feels like a disadvantage that string interpolation is allowing us to write things like this. It feels like it is being used as a means of syntactic extension, and in that case, I wish it was done explicitly using Quasiquotes, which as a reader of the code would make that explicit.

Can you elaborate on your point about SQL injection? In my earlier comment, I’ve already explained how this would make SQL queries safer and less prone to injection.

I just meant the example where you showed that we could avoid SQL injection by replacing the params with ?, etc. I’m not challenging the idea that something like that is helpful for avoiding injection errors.

My point is that it makes the code non-trivial to understand. String interpolation is not just being used as a simple way to glue bits of strings together, but rather it’s doing non-trivial work based on a typeclass. And that means that you should be quite suspicious of any code that uses string interpolation.

So, the more powerful this feature is, the more uncertain a user will be with what a particular instance means. This is a general trade-off with typeclass-y interfaces, they tend to make code easier to write, but harder to understand.

1 Like

Got it. Yeah, I agree, it makes the feature more powerful and prone to misuse, but IMO that’s an advantage.

I don’t see much uncertainty here because when you see string interpolation in the code, you usually know exactly what type is being built (string/sqlquery/whatever), and the interpolation rules for that type should generally be consistent (e.g. if youre writing an Interpolate Foo SqlQuery instance, you should know to handle sql injection attacks). I don’t see this as any different than typeclass laws. Sure, you can implement whatever monad instance you want and make do blocks have surprising behavior, but no one would do that, and if they did, the reader would recognize that this do block has that type, so it has the semantics of that monad

But by all means, vote for the explicit option in the survey!

2 Likes

Good observation, indeed there are cases where we want find-and-replace, but not concatenation.

I agree with the underlying concerns, I disagree on the pain points:

  • Manually interpolating Text strings sucks because working with Text sucks. Data.ByteString.Builder doesn’t do any serialization typeclass shenanigans and is properly efficient, I love it and use for logging and error reporting in the game I’m writing. The only [giant] problem with it is that it can’t build anything but ByteStrings, which means I can’t [safely] use it for anything except binary serialization.

  • I would absolutely love it if there were some way to do compile-time programming in Haskell without Template Haskell. All of it is already possible with merely $$(), [|| ||] and fail, it just happens to be a monstrous non-reusable eyesore with staging restrictions.

  • If compile-time programming were easy, I imagine printf would be trivial to implement safely in Haskell, it’s simply a conversion from a string literal to a function. Not that I’d expect it to be the only solution to this problem.

From my perspective a string interpolation extension would trim down a 70-character logging line to, maybe, 50 characters, and that’s provided someone already did all the dirty work of writing the underlying library that defines all the conversions.

If someone prefer Explicitly (s)he doesn’t require string interpolation.

String interpolation is about Implicitly and not about maximum flexibility.
So, imho most wanted feature is

  1. A2 - implicit
  2. B3 - add Bulder, interpolate to s

[…] the way to deal with this is to fix those problems, rather than adding more complexity to the compiler to work around them

100% agree. With easier QuasiQuotes there is no reason to use a built-in string interpolation mechanism, instead let each library provide what it thinks it’s the “right thing” (ex. Regex literals are quite different from SQL code).

In case this proposal moves forward I would suggest to use the most minimal approach. More complex use cases can be delegated to libraries.