Prefer do notation over Applicative operators when assembling records

13 Likes
getPerson :: IO Person
getPerson = Person <$> getLine <*> getLine

A couple of things struck me about this code. (Neither has to do with using Applicative operators vs positional inline calls.)

  • Command line dialogue to gather record content? Presumably real-life applications have many fields in a record. Wouldn’t you throw up a form for data entry, with helpful prompts and drop-down fields?

  • getLine straight into a data structure? Wouldn’t you first validate the content is a plausible name? (Or a number or a Bool, whatever.)

You said yourself that these two points don’t address the arguments in the article. You could swap IO for a validation Applicative such as validation-selective. The code snippets are demonstrating a point about records; why can’t we just accept that they’re slightly contrived?
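For instance, here’s a minimal, hand-rolled sketch (all names hypothetical, a simplified stand-in for what validation-selective provides) of the same record assembly in an error-accumulating validation Applicative:

-- A toy Validation applicative: <*> accumulates errors via Semigroup
data Validation e a = Failure e | Success a

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)  -- errors accumulate
  Failure e  <*> _          = Failure e
  _          <*> Failure e  = Failure e
  Success f  <*> Success a  = Success (f a)

data Person = Person { firstName :: String, lastName :: String }

validatePerson :: String -> String -> Validation [String] Person
validatePerson fn ln =
  Person <$> nonBlank "first name" fn <*> nonBlank "last name" ln
  where
    nonBlank label s
      | null s    = Failure ["missing " <> label]
      | otherwise = Success s

The record-assembly code is character-for-character the same as the IO version; only the Applicative changed.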

8 Likes

The code overall is returning IO something, and using functions of type IO something (getLine, putStrLn). It wouldn’t have occurred to me (monads are one of the big features of Haskell) to make anything applicative in the first place. So that straw man is not just “slightly contrived”, it’s wholly artificial.

The title says “assembling records”; the advice seems to be deeper than that. I think the point is:

  • Even when grabbing a few arguments from the command line, it’s worth declaring a record type to put them in; because
  • you can name the fields; then
  • use NamedFieldPuns and RecordWildCards to assemble the record when returning from IO.
  • That gives you flexibility and modularity to prompt for, fetch and validate/process each field individually within the do block.

So the ‘Ergonomics’ (i.e. modularity) and ‘Order insensitivity’ points are there to justify using named fields rather than positional constructors. That point gets made only in the ‘Caveats’; it’d have more force appearing at the top, under “Instead of doing this”, with the straw man being a data declaration without named fields.

I think the example would go over better if the fields were values you’d typically grab at the command line – perhaps a number/run index and a file name.
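Something like this minimal sketch (the type and field names are my own invention), prompting for and validating each field individually before assembling the record by name:

{-# LANGUAGE RecordWildCards #-}

import Text.Read (readMaybe)

data RunConfig = RunConfig { runIndex :: Int, inputFile :: FilePath }

getRunConfig :: IO RunConfig
getRunConfig = do
    putStrLn "Run index?"
    idxLine <- getLine
    runIndex <- case readMaybe idxLine of
        Just n  -> pure n
        Nothing -> fail ("not a number: " <> idxLine)
    putStrLn "Input file?"
    inputFile <- getLine
    pure RunConfig {..}  -- fields picked up by name, order-insensitive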

The example is not contrived because the entire thesis is:

When constructing a record in some Applicative, prefer Record Syntax over Applicative Style (with ApplicativeDo, if necessary).

The choice of Applicative is irrelevant to this claim. IO is probably the most familiar Applicative / easiest to motivate, so it is a reasonable choice.

But this pattern shows up all the time for other Applicatives (e.g. optparse-applicative, aeson).
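For example, here’s a hedged sketch of the same pattern in optparse-applicative’s Parser (the record and flag names are made up), where Parser is an Applicative but not a Monad, so ApplicativeDo is required:

{-# LANGUAGE ApplicativeDo, RecordWildCards #-}

import Options.Applicative

data Opts = Opts { host :: String, port :: Int }

optsParser :: Parser Opts
optsParser = do
    host <- strOption (long "host" <> metavar "HOST")
    port <- option auto (long "port" <> metavar "PORT")
    pure Opts {..}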

On topic: I agree, nice examples. For those who do not like RecordWildCards, it is easy to adapt to NamedFieldPuns:

{-# LANGUAGE NamedFieldPuns #-}

getPerson :: IO Person
getPerson = do
    firstName <- getLine
    lastName <- getLine
    return $ Person {
        firstName,
        lastName
    }
4 Likes

IMO this fails to mention some arguments for the opposite side:

Conciseness. Applicative style is more concise, especially when the actions producing the arguments are syntactically and semantically simple. SomeRecord <$> getOneThing <*> getOtherThing is a one-liner, and perfectly readable; the do alternative is considerably longer without really adding a lot of readability:

do
    oneThing <- getOneThing
    otherThing <- getOtherThing
    return $ SomeRecord oneThing otherThing

Four lines, and one extra level of indirection (via local variables), compared to that one-liner.

Obviously this ignores the potential benefit of using record syntax to explicitly name those fields - what can I say, it’s not a straightforward answer…

Mirroring pure code. More often than not, code evolves from pure constructs (SomeRecord oneThing otherThing) to effectful constructs (monadic or applicative); and it is often desirable or useful to mirror the structure of the pure construct in effectful code. Applicative syntax, IMO, does a pretty good job at mirroring function application with “positional” arguments (but of course if the original pure code used record syntax, it’s better to mirror that, and use record syntax with do notation in the effectful version).
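To illustrate (the type and the stand-in actions are invented here; SomeRecord and the getters are from the example above):

data SomeRecord = SomeRecord { oneThing :: String, otherThing :: Int }

-- the pure construction...
mkSomeRecord :: String -> Int -> SomeRecord
mkSomeRecord = SomeRecord

-- ...and an effectful version that mirrors its positional shape:
getSomeRecord :: IO SomeRecord
getSomeRecord = SomeRecord <$> getOneThing <*> getOtherThing
  where
    getOneThing   = getLine             -- stand-ins for real actions
    getOtherThing = length <$> getLine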

Avoiding redundant names. IMO, explicitly naming things when those names don’t contribute to understanding the code is just as bad as failing to name things when they could. Unnecessary explicit names clutter the reader’s working memory, waste screen real estate, and risk being (or becoming, as the code changes: “name rot”, akin to documentation rot) inappropriate, misleading, or straight up wrong.

It’s just a shame that we can’t combine the two approaches, using record syntax with applicative prefix operators or something like that:

SomeRecord
    { oneThing <=> getOneThing
    , otherThing <=> getOtherThing
    }

This would combine the conciseness of applicative syntax with the explicit naming of record syntax, while still mirroring what it would look like in pure code. (I picked <=> as a symbol for a hypothetical “applicative record field assignment” quasi-operator, but I’m not overly opinionated about the choice of symbol here.)

4 Likes

It would be nice to be able to weave an Applicative into a constructor, but consider that this only applies to decoding, not encoding. So you’re still left to solve the problem of

encode Foo {..} =
  object
    ( pair "this" fooThis
   <> pair "that" fooThat
    )

Perhaps some notion of a single-constructor record could yield nicer answers here.

encode foo =
  object
    ( pair "this" foo.this
   <> pair "that" foo.that
    )

decode =
  object $ do
    foo.this <- pair "this" text
    foo.that <- pair "that" text
    pure foo
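For contrast, here’s a hedged sketch of how this pair looks with aeson’s actual API today (field names invented), where the decoding side is exactly the do-plus-record pattern from the article:

{-# LANGUAGE OverloadedStrings, RecordWildCards #-}

import Data.Aeson

data Foo = Foo { fooThis :: String, fooThat :: String }

instance ToJSON Foo where
  toJSON Foo {..} = object ["this" .= fooThis, "that" .= fooThat]

instance FromJSON Foo where
  parseJSON = withObject "Foo" $ \o -> do
    fooThis <- o .: "this"
    fooThat <- o .: "that"
    pure Foo {..}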
1 Like

This would be an excellent case for something like the InlineBindings proposal (Pull Request #64 · ghc-proposals/ghc-proposals · GitHub):

do SomeRecord
    { oneThing = (<- getOneThing)
    , otherThing = (<- getOtherThing)
    }
3 Likes

We’ve seen several threads recently talking about ‘Boolean blindness’ or ‘Maybe blindness’. If your function has a bunch of arguments all the same type (like names), there’s positional blindness: the only way to tell firstName from lastName, or inputFile from outputFile, is that one is the first getLine and the first argument to the applicative – or did the original programmer think of it the other way round? And the get might be 50 LoC before the usage.

IOW explicit names are always necessary and always contribute to understanding the code. And I hope your workplace’s coding standards require long_and_meaningful_names.

(Just mentioning some arguments for the opposite side :wink: )

1 Like

I don’t think I agree with this as a blanket statement. Yes, long and meaningful names can help, but they come at a cost, and they require manual diligence to make sure they remain correct. Those are fairly big downsides in practice.

Suppose, for example, we have a map function, and we want to give its arguments descriptive names. Great, so let’s define it as map functionToMapOverList listToBeMappedOver, and its type as map :: (inputElement -> outputElement) -> [inputElement] -> [outputElement].

But now we generalize that function to arbitrary functors (basically, fmap), so our type is now map :: Functor functorToOperateIn => (inputElement -> outputElement) -> functorToOperateIn inputElement -> functorToOperateIn outputElement. Nothing rings an alarm bell as we compile this, even though our arguments are still named functionToMapOverList and listToBeMappedOver, which is now incorrect: functorToOperateIn is no longer required to be []; it can be anything, Maybe, IO, Either String, Tree, whatever.

And meanwhile, all the information we have encoded in those argument names and type variables is actually redundant; we’ve already stated it in the type.

We don’t need to name the functor type functorToOperateIn, because it’s already evident from the structure. We don’t need to name the function argument, because it’s already evident from the structure that it is an argument to map, and from its type that it must be a function that maps input values to output values. We don’t need to name the list (or functor value) either, because, again, its role is already evident from its type and position. Due to the nature of Haskell’s syntax, we still have to name those things, because that’s the only way we can refer to them; but given that any descriptive name we could come up with would be redundant, it’s actually better to pick a meaningless name, just meaningful enough to be easy to remember. And hence:

map :: (a -> b) -> [a] -> [b] - "map takes a function from some type a to some other type b, and a list of values of type a, and it returns a list of values of type b. Given this structure, the only sensible thing map can do is produce a list of values that correspond to the values in the input list, with the given function applied to them."

And:

map f xs = ...: we just name the first argument f, for “function”, which is easy to remember; and we name the second argument xs, as per the convention that the first general-purpose term-level variable we use will be x, but if it’s some kind of list-like collection, we will call it xs, as a quasi-plural form of x, to indicate that it being a collection is relevant. And that’s all the information we need to encode in those variable names - anything more would just eat up screen real estate and actually make the code harder to work with, for a reasonably experienced developer at least.


When you do need descriptive names, that usually means your structure and types aren’t descriptive enough by themselves. In a language like Java or Python, this is often inevitable: their type systems are neither rigid nor expressive enough to capture as much meaning as Haskell’s type system can, and structural constraints tend to be much weaker too, not least because of a lack of control over side effects.

And this, IMO, is the real explanation why Haskell culture gravitates so much towards short, seemingly cryptic names like f, x, a, b - it’s not because of Math, it’s not because we’re feeling smug and superior, it’s because there is genuinely less of a need for long descriptive names.

That’s not to say things don’t backfire - using short concise names like these is only feasible as long as you deliver on the promise of “self-documenting” code through structure and types - if you don’t, then you need long descriptive names just like anyone else. This is not good:

f :: String -> String -> String -> String -> String -> String
f a b c d e = ...

However, as a Haskeller, your first impulse in such a situation should not be “I need more descriptive names for those arguments”, but rather, “how can I make those types more descriptive”. E.g.:

f :: Protocol -> Hostname -> Port -> Path -> Query -> URL
f proto host port path q = ...

We’ve picked slightly longer names for the arguments here; they’re still pretty short and cryptic, but that’s fine - the types now clearly define what they do, and the names’ main purpose is just to be easy to remember, not to actually document anything. And we haven’t even given f a descriptive name - in a real codebase we probably would, but what it does is already evident from the type. And, best of all, we don’t even need to remember the order of the arguments: if we get them wrong, the compiler will tell us, provided those are all newtypes and not type aliases. It’ll just say “hey, you put the path after the hostname, but you need to put a port there”, and you go “oh, yeah, right”. And it will do so regardless of how you name those variables.
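A quick sketch of that last point (names hypothetical): with distinct newtypes, swapping two arguments is a type error no matter what the local variables are called.

newtype Hostname = Hostname String
newtype Port     = Port Int
newtype Path     = Path String

renderLocation :: Hostname -> Port -> Path -> String
renderLocation (Hostname h) (Port p) (Path d) = h <> ":" <> show p <> d

-- renderLocation (Port 80) (Hostname "example.com") (Path "/x")
--   is rejected by the compiler: a Port is not a Hostname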

3 Likes

The lazy pattern matching gymnastics you have to do with ApplicativeDo in order to avoid really obscure compiler errors make it probably one of the more unergonomic extensions, and I would strongly advocate against it in any production codebase.
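For readers who haven’t hit this, a sketch of the gymnastics being referred to, as I understand them: a strict (non-variable) pattern bind makes GHC fall back to Monad desugaring, so with a type that is only an Applicative you must make the match lazy.

{-# LANGUAGE ApplicativeDo #-}

-- Compiles for any Applicative only because of the lazy (~) pattern;
-- without the ~, GHC desugars monadically and demands a Monad instance.
sumPair :: Applicative f => f (Int, Int) -> f Int
sumPair action = do
    ~(x, y) <- action
    pure (x + y)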

3 Likes

I didn’t make it as a blanket statement. I started with “If your function has a bunch of arguments all the same type …”

map/fmap doesn’t have a bunch of arguments, but only two; and each is a different type.

newtypes are another useful way to give descriptive names. I’ve nothing against them, and would indeed prefer that approach; String gets used far too much. But if the supplier of your library didn’t go to that extra bother, what are you gonna do?

(Even with newtypes, what if your program is juggling/comparing results from several Querys on the same Host?)

Fair point.

As always, there is no silver bullet. I do think that one should keep in mind that you have more tools in your arsenal than just “long and descriptive names”, and that “long and descriptive names” should not be your first choice.

IMO: types > structure > conventions > names > structured doc comments > freeform doc comments.

If you have a program that juggles several queries, and their respective roles matter, then I’d first look for a suitable convention - e.g., if one query gets appended to the other, then it makes sense to accept them in the same left-to-right order as <> and similar operators would (or, ideally, provide an actual Semigroup instance). If the semantics are something like “use this one if some condition holds, else the other one”, then accept them in that order. If it’s “use this query, but if a part is missing, take it from this other query”, then I’d accept the default one first, and the one that overrides second. These conventions are all idiomatic in Haskell, and most Haskellers would expect these orderings.

OTOH, if those two (or more) queries have no natural ordering to them, but it still matters which is which, e.g. because you’re going to fetch two different resources (say, an HTML document and a CSS stylesheet), then you could think about how to express this in the types. E.g., you could tag your queries with phantom types that indicate the type of the expected response: instead of Query -> Query -> IO (HtmlDocument, Stylesheet), you get Query HtmlDocument -> Query Stylesheet -> IO (HtmlDocument, Stylesheet), and your query function itself is no longer FromHTTP a => Query -> IO a, but FromHTTP a => Query a -> IO a. If that’s not an option, then yes, descriptive names are the best choice.
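A rough sketch of that phantom-type tagging (all names hypothetical):

-- the type parameter is a phantom: it tags the query with its
-- expected response type without affecting the runtime value
newtype Query a = Query String

data HtmlDocument = HtmlDocument String
data Stylesheet   = Stylesheet String

class FromHTTP a where
  fromHTTP :: String -> a

runQuery :: FromHTTP a => Query a -> IO a
runQuery (Query q) = fromHTTP <$> performRequest q
  where performRequest = pure  -- stand-in for a real HTTP client call

fetchBoth :: Query HtmlDocument -> Query Stylesheet
          -> IO (HtmlDocument, Stylesheet)
fetchBoth qDoc qCss = (,) <$> runQuery qDoc <*> runQuery qCss
-- swapping qDoc and qCss no longer type-checks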

But that’s a far stretch from “always use long and descriptive names”, or even “default to long and descriptive names”.

People often assume that long names are free and can’t hurt; but they aren’t free, and they can hurt maintainability a lot.

2 Likes