After the positive reception of my talk at HIW this week, and since discussion on the proposal has died down, this is the final call for review before I submit it to the committee. Thanks!
Call me crazy, but I think the absence of convenient string interpolation syntax in Haskell is actually a good thing. Hear me out.
Naive string interpolation, that is, injecting generic string representations of arbitrary Haskell values into strings, is rarely conceptually correct - in almost all situations, the resulting strings are supposed to adhere to some kind of structural rules or constraints, i.e., some kind of grammar, and the injected values are supposed to be confined to some sort of “value” concept within that grammar. Naive string interpolation cannot ensure this - there’s no way to make sure the injected string does not “escape” the grammar, changing the structure of the template rather than just filling a blank within it.
This is not a theoretical concern. Naive string interpolation causes bugs, security disasters, and other pains all the time in languages that have it. Just some examples off the top of my head:
Anyone who has ever done anything nontrivial in Bash can testify that dealing with whitespace in variables correctly is unreasonably difficult, especially when those variables are interpolated into strings (which is extremely common in Bash scripts).
XSS is still running rampant in PHP, a language that has string interpolation baked into the fabric of the language.
SQLi is still a major problem in any language that has convenient string interpolation syntax, because interpolation is marginally more convenient than the proper solution, parametrized queries.
I am actually hard pressed to think of an example where naive string interpolation is morally correct - even the classic “greeter” program (ask user for their name, greet them with "Hello, ${name}") is technically wrong, which is easily demonstrated by providing names that contain things like backspaces, newlines, carriage returns, terminal control sequences, etc.
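To make the “escaping the grammar” failure concrete, here is a minimal sketch in plain Haskell, using (<>) in place of any interpolation syntax. The names greetNaive, greetEscaped, and escapeHtml are made up for illustration, and the escaping covers only a handful of characters, so treat it as a demonstration rather than a complete HTML encoder.

```haskell
-- Hypothetical helpers for illustration; none of this comes from the proposal.

-- Naive templating: the name is spliced into the markup verbatim,
-- so a name containing tags changes the structure of the document.
greetNaive :: String -> String
greetNaive name = "<p>Hello, " <> name <> "</p>"

-- Escape the characters that are significant in HTML text and attribute values.
escapeHtml :: String -> String
escapeHtml = concatMap escapeChar
  where
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '&'  = "&amp;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]

-- Grammar-aware templating: whatever the name contains,
-- it can only ever end up as text content inside the paragraph.
greetEscaped :: String -> String
greetEscaped name = "<p>Hello, " <> escapeHtml name <> "</p>"
```

With the name `<script>alert(1)</script>`, greetNaive returns markup that contains a real script element, while greetEscaped returns `<p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>`. The two functions look almost identical at the call site, which is the core of the concern.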
I strongly agree that we should avoid unsafe interpolations. The current proposal lacks a way to have the interpolation produce compile-time errors, besides checking just the types of the interpolated values. Haskell’s type system is generally not strong enough to enforce the kinds of guarantees you’d want (e.g. this string that represents a name should not contain any newlines).
I would say the solution to that is to use quasiquotes so that you can check the grammar at compile-time with a user-defined parser. What we really need, then, for better tooling support is a standardized way to interpolate Haskell values into quasiquoters.
Just standardized syntax could be enough, but it can be a bit annoying if people could break tooling by writing a non-conforming quasiquoter. So perhaps the standard should be enforced by the compiler, but that would require some more design work.
And I don’t think quasiquotes should be dismissed so easily as in the proposal, because Template Haskell has improved a lot recently and quasiquoters in particular avoid (or could in the near future avoid) most of the important issues with TH. I guess the proof is in the pudding so I should see for myself if I can write a nicer string interpolation library with quasiquotes.
We kind of have a de-facto-standardized way of interpolating Haskell values into quasiquoters (pull in haskell-src-exts and run antiquotation contents through its Haskell parser), it’s just not very nice and has some issues.
Have you read the actual proposal? It doesn’t work for arbitrary Haskell values, it uses a typeclass to render the value.
The vast majority of your comments are not Haskell-specific, and yet almost every other mainstream language has string interpolation, so I’m not sure your concerns hold up. People are generally aware of what string interpolation can do and what you shouldn’t do with it, and that knowledge is transferable from other languages.
Things like SQL injection could be mitigated with a SQL string interpolator, as I described in the proposal, which allows ergonomic query parameterization using interpolation syntax while escaping interpolated values.
The proposal has a pretty good story about using qualified interpolators to allow safe interpolation into strings with restricted grammars (e.g. SQL). It’s true that users might ignore that and use naive interpolation to construct unsafe queries, but they might also use (++) to construct bad queries…
On the quasiquotes point, it would be useful to have a built-in way to have interpolation/antiquotation inside quasiquotes (see #24966). I see that as complementary to this proposal, although it does suggest using $(...) in interpolated strings for consistency with antiquotation inside TH quotes, whereas the proposal currently uses ${...}.
Data.ByteString.Builder does manual interpolation better; the only reason it’s not a proper replacement is poor typing. If Builder could be UTF8 on the type level, you’d get a perfect chain of
Just to add to this: I’m a big fan of TH and I want to see it become better. Yet, even if we implemented antiquotes for quasiquotes, there would still be many people who refuse to use TH for principled reasons, e.g., lack of (sufficiently good) cross compilation support or sandboxing.
While I think these objections could be resolved, there’s no guarantee that it will happen any time soon. So, it would be great to have StringInterpolation, which doesn’t depend on TH, that anyone can use.
I think there need to be more Show-like classes, for pretty printing and for user-facing output. For example, Pretty would ensure readable lists, while UserFacing printing for Data.Map.Map would not include a fromList. What you call Interpolate seems to be this UserFacing class, but it is a separate issue, and it could be implemented without any GHC extensions. I think adding those classes would be a good first step.
Yes, that’s a good observation. I opened a CLC discussion for adding a relevant typeclass like Display. But I like ChickenProp’s comment saying that having a class specifically for Interpolate is nicer than a general class, because it’s narrowly scoped and doesn’t claim to represent “the definitive way to render a type to String”, when there may be several reasonable renderings (e.g. for floats).
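As a rough sketch of what a narrowly scoped rendering class buys over Show, consider the following. The class shape, the Name newtype, and the describe function are assumptions made for illustration; the proposal’s actual Interpolate class may differ in name, methods, and instances (a real one would presumably have an instance for String itself).

```haskell
-- Hypothetical class: "how should this value look when spliced into a string?"
class Interpolate a where
  interpolate :: a -> String

-- For numbers and booleans, reusing Show is a reasonable choice here.
instance Interpolate Int where
  interpolate = show

instance Interpolate Bool where
  interpolate = show

-- Text-like values are inserted as-is, without the quotes Show would add.
-- A newtype keeps the sketch free of language extensions.
newtype Name = Name String

instance Interpolate Name where
  interpolate (Name n) = n

-- Hand-written stand-in for something like s"${name} is ${age}".
describe :: Name -> Int -> String
describe name age = interpolate name <> " is " <> interpolate age
```

Here `describe (Name "Ada") 36` gives `Ada is 36`, whereas going through show on the underlying string would give `"Ada" is 36` with the quotes included. A type with no Interpolate instance simply cannot be spliced in, which is the sense in which the mechanism does not accept arbitrary values.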
Have you read the actual proposal? It doesn’t work for arbitrary Haskell values, it uses a typeclass to render the value.
I have read the proposal, yes. What I mean by “arbitrary Haskell values” here is that the template is basically just a string, and nothing in the toolchain knows anything about any potential structural constraints of that template string. Any value for which a typeclass instance for string interpolation exists can be injected into a template string at any point, regardless of what that string represents.
In other words, the problem is that we’re constructing strings from a template and data within a particular domain (e.g., SQL, HTML, a greeter program’s output, etc.), but the mechanism we’re using for that is completely unaware of the structure.
People are generally aware of what string interpolation can do, and what you shouldn’t do with them
And yet my experience is that people are not as aware of the issue as you’d expect.
Ah, that part actually went slightly past me. SQL is probably not the best example here, because what you really ought to do is send the parameters separately from the query string and not do any interpolation at all, but I get the idea, and for things like HTML, it sounds like a decent approach. I guess it would hinge on whether the correct solution (using those domain-aware interpolators) would be more obvious and more ergonomic than the incorrect one (using naive string interpolation) in practice.
If you look at the proposal, that’s exactly what happens:
let name = "Robert'; DROP TABLE auth; --"
print SQL.s"SELECT * FROM users WHERE name = ${name}"
-- SqlQuery
-- { sqlText = "SELECT * FROM users WHERE name = ?"
-- , sqlValues = [SqlString "Robert'; DROP TABLE auth; --"]
-- }
As you can see from the SQL example, you can certainly create custom interpolators that are domain specific and aware of the structure.
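One way such a domain-aware interpolator could be built is with a Monoid that accumulates query text and parameters side by side. The sketch below is hand-written and does not reproduce the proposal’s desugaring: the SqlQuery and SqlValue types mirror the output shown above, while lit, param, and userByName are hypothetical names.

```haskell
-- Hypothetical types mirroring the output shown in the thread.
data SqlValue = SqlString String | SqlInt Int
  deriving (Show, Eq)

data SqlQuery = SqlQuery
  { sqlText   :: String
  , sqlValues :: [SqlValue]
  } deriving (Show, Eq)

-- Concatenating two fragments joins their text and their parameter lists.
instance Semigroup SqlQuery where
  SqlQuery t1 v1 <> SqlQuery t2 v2 = SqlQuery (t1 <> t2) (v1 <> v2)

instance Monoid SqlQuery where
  mempty = SqlQuery "" []

-- A literal chunk of the template: contributes SQL text, no parameters.
lit :: String -> SqlQuery
lit s = SqlQuery s []

-- An interpolated value: contributes a "?" placeholder and one parameter.
-- The value itself never becomes part of the query text.
param :: SqlValue -> SqlQuery
param v = SqlQuery "?" [v]

-- Roughly what SQL.s"SELECT * FROM users WHERE name = ${name}" might build.
userByName :: String -> SqlQuery
userByName name =
  lit "SELECT * FROM users WHERE name = " <> param (SqlString name)
```

Calling `userByName "Robert'; DROP TABLE auth; --"` yields the query text `SELECT * FROM users WHERE name = ?` with the hostile string carried separately in sqlValues, so the parameters are still sent apart from the query string even though the call site reads like interpolation.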
But I still don’t understand your general point about string interpolation. There’s literally no difference between "My name is " <> name and s"My name is ${name}". If name has weird characters like backspace or control chars, they would end up in the output in either case; it’s not specific to string interpolation.
But regardless of how you feel about the concept in theory, it’s had much success in almost every other mainstream language, so it’s clearly popular in practice.
The difference between "My name is " <> name and s"My name is ${name}" is that the first one looks and feels much clunkier, and I guess my point is that that’s a good thing, because it is clunky, and if you ever find yourself in a situation where you do this kind of thing a lot, you should find a proper solution to your problem, one that takes structural constraints etc. into account. I feel that the presence of a readily available, convenient, readable string interpolation feature would incentivize the clunky solution rather than making it look and feel as clunky as it is. I’ve always loved Haskell for naturally nudging me towards the “morally correct” solution, and making bad code uncomfortable to write. I’m not sure “it’s a very popular feature in other languages” is necessarily a good metric for “we should do this”.
However, I think I’ve been a bit too hasty reading the proposal, and skimmed over some essential parts. If custom interpolators are in fact not just possible, but end up becoming standard practice, then this could actually be an opportunity to get string interpolation right for once. I think I might actually like that a lot.
Normal s"..." with overloaded strings (uses default Interpolate class, concats with String, final fromString)
Basic.s"..." an interpolator shipped from GHC that doesn’t do any implicit interpolation, just concats
Data.Text.s"..." would use the default Interpolate class that concats with Text and rewrite rules for performance
Same for LazyText and Builder
The first two bullets are free; text gets them out of the box. The third bullet needs to be implemented by text. Text could also choose to implement other interpolators if it wants, e.g. an interpolator using its own TextInterpolate class. But TextInterpolate should be unnecessary; my benchmark shows that rewrite rules with the default Interpolate class are equally performant.
If it’s not clear what the difference between the first and third bullet is, it’s essentially the difference between:
Text.pack ("age: " <> show age)
Text.pack "age: " <> (Text.pack . show) age
Yes of course, it’s not exported from base and as such APIs defined in base can’t use it. But were Text exported from base, I’d be interested in having Display in base, just like Rust’s stdlib does.