```haskell
getPerson :: IO Person
getPerson = Person <$> getLine <*> getLine
```
A coupla things struck me about this code. (Neither to do with using Applicative vs positional inline calls.)
- Command-line dialogue to gather record content? Presumably real-life applications have many fields in a record. Wouldn't you throw up a form for data entry, with helpful prompts and drop-down fields?
- `getLine` straight into a data structure? Wouldn't you first validate that the content is a plausible name? (Or a number or a `Bool`, whatever.)
You said yourself these two points are not addressing the arguments in the article. You could swap `IO` for `validation-selective`. The code snippets are demonstrating a point about records; why can't we just accept that they're slightly contrived?
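For instance, the same shape with the `Validation` applicative from `validation-selective` (a sketch; the validator is made up):

```haskell
import Data.Char (isAlpha)
import Validation (Validation (..))  -- from the validation-selective package

data Person = Person { firstName :: String, lastName :: String }

-- Hypothetical validator: a plausible name is non-empty and alphabetic.
validName :: String -> Validation [String] String
validName s
  | not (null s) && all isAlpha s = Success s
  | otherwise                     = Failure ["not a plausible name: " ++ s]

-- Same applicative shape as the IO version, different Applicative.
mkPerson :: String -> String -> Validation [String] Person
mkPerson f l = Person <$> validName f <*> validName l
```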
The code overall is returning `IO something`, and using functions of type `IO something` (`getLine`, `putStrLn`). It wouldn't have occurred to me (monads being one of the big features of Haskell) to make anything applicative in the first place. So that straw man is not just "slightly contrived", it's wholly artificial.
The title says "assembling records"; the advice seems to be deeper than that. I think the point is:
- Even when grabbing a few arguments from the command line, it's worth declaring a record type to put them in; because
- you can name the fields; then
- use `NamedFieldPuns` and `RecordWildCards` to assemble the record when `return`ing from `IO`.
- That gives you flexibility and modularity to prompt for, fetch and validate/process each field individually within the `do` block (sketched below).
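A minimal sketch of that last point (prompts and field names made up, validation elided):

```haskell
{-# LANGUAGE RecordWildCards #-}

data Person = Person
  { firstName :: String
  , lastName  :: String
  }

getPerson :: IO Person
getPerson = do
  putStrLn "First name?"
  firstName <- getLine   -- room to validate/process this field right here
  putStrLn "Last name?"
  lastName  <- getLine
  return Person {..}     -- RecordWildCards fills the fields from scope
```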
So the "Ergonomics" (i.e. modularity) and "Order insensitivity" points are to justify using named fields rather than positional constructors. That point gets made only in the "Caveats"; whereas it'd have more force appearing at the top, "Instead of doing this", with the straw man being a `data` decl without named fields.
I think the example would go over better if the fields were values you'd typically grab at the command line - perhaps a number/run index and a file name.
The example is not contrived, because the entire thesis is:

> When constructing a record in some `Applicative`, prefer Record Syntax over Applicative Style (with `ApplicativeDo`, if necessary).
The choice of `Applicative` is irrelevant to this claim. `IO` is probably the most familiar `Applicative` and the easiest to motivate, so it is a reasonable choice.
But this pattern shows up all the time for other `Applicative`s (e.g. `optparse-applicative`, `aeson`).
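For instance, `optparse-applicative`'s `Parser` is an `Applicative` but not a `Monad`, so `ApplicativeDo` is exactly what makes record syntax available there. A sketch with made-up options:

```haskell
{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE NamedFieldPuns #-}

import Options.Applicative

-- Hypothetical options record for the sketch.
data Opts = Opts
  { inputFile :: FilePath
  , verbose   :: Bool
  }

optsParser :: Parser Opts
optsParser = do
  inputFile <- strOption (long "input" <> metavar "FILE")
  verbose   <- switch (long "verbose")
  pure Opts { inputFile, verbose }
```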
On topic: I agree, nice examples. For those who do not like `RecordWildCards`, it is easy to adapt to `NamedFieldPuns`:
```haskell
{-# LANGUAGE NamedFieldPuns #-}

getPerson :: IO Person
getPerson = do
  firstName <- getLine
  lastName  <- getLine
  return $ Person
    { firstName
    , lastName
    }
```
IMO this fails to mention some arguments for the opposite side:
**Conciseness.** Applicative style is more concise, especially when the actions producing the arguments are syntactically and semantically simple. `SomeRecord <$> getOneThing <*> getOtherThing` is a one-liner, and perfectly readable; the `do` alternative is considerably longer without really adding a lot of readability:
```haskell
do
  oneThing   <- getOneThing
  otherThing <- getOtherThing
  return $ SomeRecord oneThing otherThing
```
Four lines, and one extra level of indirection (via local variables), compared to that one-liner.
Obviously this ignores the potential benefit of using record syntax to explicitly name those fields - what can I say, it's not a straightforward answer…
**Mirroring pure code.** More often than not, code evolves from pure constructs (`SomeRecord oneThing otherThing`) to effectful constructs (monadic or applicative); and it is often desirable or useful to mirror the structure of the pure construct in effectful code. Applicative syntax, IMO, does a pretty good job at mirroring function application with "positional" arguments (but of course if the original pure code used record syntax, it's better to mirror that, and use record syntax with `do` notation in the effectful version).
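For instance (stand-in types, just to make the sketch compile):

```haskell
type OneThing   = String
type OtherThing = String
data SomeRecord = SomeRecord OneThing OtherThing

-- Pure construction:
mkSomeRecord :: OneThing -> OtherThing -> SomeRecord
mkSomeRecord oneThing otherThing = SomeRecord oneThing otherThing

-- The effectful version mirrors the positional application structure:
getSomeRecord :: IO SomeRecord
getSomeRecord = SomeRecord <$> getLine <*> getLine
```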
**Avoiding redundant names.** IMO, explicitly naming things when those names don't contribute to understanding the code is just as bad as failing to name things when they could. Unnecessary explicit names clutter the reader's working memory, waste screen real estate, and risk being (or becoming, as the code changes: "name rot", akin to documentation rot) inappropriate, misleading, or straight up wrong.
It's just a shame that we can't combine the two approaches, using record syntax with applicative prefix operators or something like that:

```
SomeRecord
  { oneThing   <=> getOneThing
  , otherThing <=> getOtherThing
  }
```
This would combine the conciseness of applicative syntax with the explicit naming of record syntax, while still mirroring what it would look like in pure code. (I picked `<=>` as a symbol for a hypothetical "applicative record field assignment" quasi-operator, but I'm not overly opinionated about the choice of symbol here.)
It would be nice to be able to weave an `Applicative` into a constructor, but consider that this only applies to decoding, not encoding. So you're still left to solve the problem of

```haskell
encode Foo {..} =
  object
    ( pair "this" fooThis
    <> pair "that" fooThat
    )
```
Perhaps some notion of a single-constructor record could yield nicer answers here.
```haskell
encode foo =
  object
    ( pair "this" foo.this
    <> pair "that" foo.that
    )
```

```
decode =
  object $ do
    foo.this <- pair "this" text
    foo.that <- pair "that" text
    pure foo
```
This would be an excellent case for something like the [InlineBindings proposal by evincarofautumn (ghc-proposals/ghc-proposals#64)](https://github.com/ghc-proposals/ghc-proposals/pull/64):
```
do SomeRecord
     { oneThing   = (<- getOneThing)
     , otherThing = (<- getOtherThing)
     }
```
We've seen several threads recently talking about "Boolean blindness" or "Maybe blindness". If your function has a bunch of arguments all the same type (like names), there's positional blindness: the only way to tell `firstName` vs `lastName`, or `inputFile` vs `outputFile`, is that one is the first `getLine` and the first argument to the applicative - oh, or did the original programmer think it the other way round? The `get` is 50 LOC before the usage.
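A sketch of the difference (`NamedFieldPuns` assumed):

```haskell
{-# LANGUAGE NamedFieldPuns #-}

data Person = Person { firstName :: String, lastName :: String }

-- Positional: both orderings typecheck, and nothing at the use site
-- says which getLine the original programmer meant for which field.
getPersonPositional :: IO Person
getPersonPositional = Person <$> getLine <*> getLine

-- Named: the bind site documents which input line is which.
getPersonNamed :: IO Person
getPersonNamed = do
  firstName <- getLine  -- first line of input is the first name
  lastName  <- getLine
  pure Person { firstName, lastName }
```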
IOW explicit names are always necessary and always contribute to understanding the code. And I hope your workplace's coding standards require long_and_meaningful_names.
(Just mentioning some arguments for the opposite side.)
I don't think I agree with this as a blanket statement. Yes, long and meaningful names can help, but they come at a cost, and they require manual diligence to make sure they remain correct. Those are fairly big downsides in practice.
Suppose, for example, we have a `map` function, and we want to give its arguments descriptive names. Great, so let's define it as `map functionToMapOverList listToBeMappedOver`, and its type as `map :: (inputElement -> outputElement) -> [inputElement] -> [outputElement]`.
But now we generalize that function to arbitrary functors (basically, `fmap`), so our type is now `map :: Functor functorToOperateIn => (inputElement -> outputElement) -> functorToOperateIn inputElement -> functorToOperateIn outputElement` - but there is nothing that rings an alarm bell as we compile this, even though our arguments are still named `functionToMapOverList` and `listToBeMappedOver`, which is now incorrect, because `functorToOperateIn` is no longer required to be `[]`; it can be anything: `Maybe`, `IO`, `Either String`, `Tree`, whatever.
And meanwhile, all the information we have encoded in those argument names and type variables is actually redundant: we've already stated it in the type.
We don't need to name the functor type `functorToOperateIn`, because it's already evident from the structure. We don't need to name the function argument, because it's already evident from the structure that it is an argument to `map`, and from its type that it must be a function that maps input values to output values. We don't need to name the list (or functor value) either, because, again, its role is already evident from its type and position. Due to the nature of Haskell's syntax, we still have to name those things, because that's the only way we can refer to them, but given that any descriptive name we could come up with would be redundant, it's actually better to pick a meaningless name, just meaningful enough to make it easy to remember. And hence:
```haskell
map :: (a -> b) -> [a] -> [b]
```

"`map` takes a function from some type `a` to some other type `b`, and a list of values of type `a`, and it returns a list of values of type `b`. Given this structure, the only sensible thing `map` can do is produce a list of values that correspond to the values in the input list, with the given function applied to them."
And:

```haskell
map f xs = ...
```

We just name the first argument `f`, for "function", which is easy to remember; and we name the second argument `xs`, as per the convention that the first general-purpose term-level variable we use will be `x`, but if it's some kind of list-like collection, we will call it `xs`, as a quasi-plural form of `x`, to indicate that its being a collection is relevant. And that's all the information we need to encode in those variable names - anything more would just eat up screen real estate and actually make the code harder to work with, for a reasonably experienced developer at least.
When you do need descriptive names, that usually means your structure and types aren't descriptive enough by themselves. In a language like Java or Python, this is often inevitable: their type systems are neither rigid nor expressive enough to capture as much meaning as Haskell's type system can, and structural constraints tend to be much weaker too, not least due to a lack of control over side effects.
And this, IMO, is the real explanation why Haskell culture gravitates so much towards short, seemingly cryptic names like `f`, `x`, `a`, `b` - it's not because of Math, it's not because we're feeling smug and superior, it's because there is genuinely less of a need for long descriptive names.
That's not to say things don't backfire - using short concise names like these is only feasible as long as you deliver on the promise of "self-documenting" code through structure and types - if you don't, then you need long descriptive names just like anyone else. This is not good:

```haskell
f :: String -> String -> String -> String -> String -> String
f a b c d e = ...
```
However, as a Haskeller, your first impulse in such a situation should not be "I need more descriptive names for those arguments", but rather, "how can I make those types more descriptive?". E.g.:

```haskell
f :: Protocol -> Hostname -> Port -> Path -> Query -> URL
f proto host port path q = ...
```
We've picked slightly longer names for the arguments here, but they're still pretty short and cryptic, and that's fine - the types now clearly define what they do; the names' main purpose is just to be easy to remember, not to actually document anything. And we haven't even given `f` a descriptive name - in a real codebase, we probably would, but what it does is already evident from the type. And, best of all, we don't even need to remember the order of the arguments: if we get them wrong, the compiler will tell us, provided those are all newtypes and not type aliases. It'll just say "hey, you put the path after the hostname, but you need to put a port there", and you go "oh, yeah, right". And it will do so regardless of how you name those variables.
The lazy-pattern-matching gymnastics you have to do with `ApplicativeDo` in order to avoid really obscure compiler errors make this probably one of the more unergonomic extensions, and I would strongly advocate against it in any production codebase.
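For those who haven't run into it: as I understand it, a refutable pattern in a bind forces the `Monad`/`MonadFail` desugaring, which for an `Applicative`-only type surfaces as a confusing missing-instance error; the workaround is an irrefutable `~` pattern. A sketch using `ZipList` (an `Applicative` that is not a `Monad`):

```haskell
{-# LANGUAGE ApplicativeDo #-}

import Control.Applicative (ZipList (..))

-- A refutable pattern in the bind would demand MonadFail, which
-- ZipList lacks, so this fails to compile:
--
--   broken zs = do
--     Just x <- zs
--     pure (x + 1)

-- The "gymnastics": a lazy match keeps the desugaring Applicative,
-- trading the compile error for a possible runtime one.
stillApplicative :: ZipList (Maybe Int) -> ZipList Int
stillApplicative zs = do
  ~(Just x) <- zs
  pure (x + 1)
```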
I didn't make it as a blanket statement. I started with "If your function has a bunch of arguments all same type …". `map`/`fmap` doesn't have a bunch of arguments, but only two; each is a different type.
`newtype`s are another useful way to give descriptive names. I've nothing against them, and would indeed prefer that approach (`String` gets used too much). But if the supplier of your library didn't go to that extra bother, what are you gonna do?

(Even with `newtype`s, what if your program is juggling/comparing results from several `Query`s on the same `Host`?)
Fair point.
As always, there is no silver bullet. I do think that one should keep in mind that you have more tools in your arsenal than just "long and descriptive names", and that "long and descriptive names" should not be your first choice.
IMO: types > structure > conventions > names > structured doc comments > freeform doc comments.
If you have a program that juggles several queries, and their respective roles matter, then I'd first look for a suitable convention - e.g., if one query gets appended to the other, then it makes sense to accept them in the same left-to-right order as `<>` and similar operators would (or, ideally, provide an actual `Semigroup` instance; see the sketch below). If the semantics are something like "use this one if some condition holds, else the other one", then accept them in that order. If it's "use this query, but if a part is missing, take it from this other query", then I'd accept the default one first, and the one that overrides second. These conventions are all idiomatic in Haskell, and most Haskellers would expect these orderings.
OTOH, if those two (or more) queries have no natural ordering to them, but it still matters which is which, e.g. because you're going to fetch two different resources (say, an HTML document and a CSS stylesheet), then you could think about how you could express this in the types (e.g., you could tag your queries with phantom types that indicate the type of expected response, so instead of `Query -> Query -> IO (HtmlDocument, Stylesheet)`, you get `Query HtmlDocument -> Query Stylesheet -> IO (HtmlDocument, Stylesheet)`, and your query function itself is no longer `FromHTTP a => Query -> IO a`, but `FromHTTP a => Query a -> IO a`); but if that's not an option, then yes, descriptive names are the best choice.
But that's a far cry from "always use long and descriptive names", or even "default to long and descriptive names".
People often assume that long names are free and won't hurt, but they're not, and they can hurt maintainability a lot.