Haskell records in 2025 (Haskell Unfolder #45)

Will be streamed live today, 2025-06-25, at 1830 UTC.

Abstract:
Haskell records as originally designed have had a reputation of being somewhat weird or, at worst, useless. A lot of features and modifications have been proposed over the years to improve the situation. But not all of these were implemented or gained widespread adoption. The result is that the situation now is quite different from what it was in the old days, and additional changes are in the works. But the current state can be a bit confusing. Therefore, in this episode, we are going to look at how to make best use of Haskell records right now, discussing extensions such as DuplicateRecordFields, NoFieldSelectors, OverloadedRecordDot and OverloadedRecordUpdate, and we’ll take a brief look at optics.

22 Likes

Thanks a lot for the great stream. I enjoyed every minute of it, and I learned about some of the extensions, which I’ll consider using from now on.
You mentioned the optics library exactly when I was already searching for ways to do exactly what optics achieves. I was stunned when you mentioned the OverloadedLabels extension, because that’s the GHC user’s guide tab I had open at that very instant.

Please continue with your work, it helped me a lot!

11 Likes

Many thanks for this.

A while back, I changed Stack’s own code (which has lots of records) to use {-# LANGUAGE DuplicateRecordFields #-},
{-# LANGUAGE OverloadedRecordDot #-}. To avoid some compiler warnings about ambiguity (see below), I found I had (in places) to do things like this (extract):

import           Stack.Types.Project ( Project (..) )
import qualified Stack.Types.Project as Project ( Project (..) )
...
  let project :: Project
      project = project'
        { Project.compiler = mcompiler <|> project'.compiler
        , Project.snapshot = fromMaybe project'.snapshot mSnapshot
        }

The messages otherwise were like (GHC 9.8.4):

src\Stack\Config.hs:853:11: warning: [GHC-02256] [-Wambiguous-fields]
    Ambiguous record update with parent type constructor ‘Project’.
    This type-directed disambiguation mechanism will not be supported by -XDuplicateRecordFields in future releases of GHC.
    Consider disambiguating using module qualification instead.
    |
853 |         { compiler = mcompiler <|> project'.compiler
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Haskell Error Index’s advice on that is here:

Do you have any advice yourself in that regard? Is what I am doing idiomatic?

1 Like

The qualified import trick is IMO a terrible workaround. Embrace the fact that, at the moment, optics are the cleanest option for updates.

4 Likes

Actually, the update ambiguity warning is one of the things I had originally wanted to talk about, but the episode was already too long.

I think the warning is a bit unfortunate, and I would personally suggest just ignoring it for the time being. If OverloadedRecordUpdate becomes fully implemented (which I still hope it will), then the now-discouraged syntax will be exactly the syntax to use again. I think it’s a little bit sad to steer users away from this syntax in this transition phase, and I’d very much hope that the feature does not actually get removed before OverloadedRecordUpdate works …

On the other hand, I also agree with @arybczak and that’s also why I “simplified” the update story in the episode. Use optics for updates. For the time being, it’s the best solution, and for nested updates the only practical one (other kinds of lenses are of course ok, too).
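
To make that concrete, here is a minimal sketch (the Person/Address types are made up for illustration, and I’m assuming makeFieldLabelsNoPrefix from optics-th to generate the labels):

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedLabels #-}
{-# LANGUAGE TemplateHaskell #-}

import Optics
import Optics.TH (makeFieldLabelsNoPrefix)

data Address = Address { name :: String, city :: String }
data Person  = Person  { name :: String, address :: Address }

makeFieldLabelsNoPrefix ''Address
makeFieldLabelsNoPrefix ''Person

-- a nested update; the duplicated #name label is resolved by type
rename :: Person -> Person
rename p = set (#address % #name) "new name" p

No qualified imports, no ambiguity warnings, and the nested case is no harder than the flat one.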

6 Likes

I don’t mind the qualified import trick; I think it’s natural to qualify things to resolve ambiguities, before resorting to more sophisticated mechanisms. It does become more awkward when there are multiple fields with the same name in the qualified module, however.

3 Likes

Another way to disambiguate record updates (also kind of a hack, but it can be neatly self-contained):

  let env' = (\GetEnv {..} -> GetEnv {end = p', ..}) env

Snipped from a change I made recently to move a library to non-prefixed fields. It requires RecordWildCards and NamedFieldPuns. It’s not type-directed in this case, as we are telling the compiler about the constructor (twice).

It also suggests a possible way out for a future GHC version:

let var' = var @Constructor { ambiguousField = newValue }

It’s only a minor improvement over qualified imports or the lambda trick above, but I think it would be more usable.

1 Like

Also, with (&) from Data.Function you can make the hack almost record-update shaped:

  let env' = env & (\GetEnv {..} -> GetEnv {end = p', ..})

Wait. What? Oh … GetEnv is a data constructor ‘applied’ [**] to {..}. But with a space in the middle and no surrounding ( ).

Thank you Andres and Edsko, but I found your first five minutes of scene-setting didn’t point out just how weird this bit of syntax is: not like any other language using { name = val, ... } [***]; not like anything else in Haskell. Compare:

\Just   (1, 2, 3)  -> ...  -- two arguments
\Nothing[3, 5, 7]  -> ...  -- two arguments, even if no space between them
\GetEnv { .. }     -> ...  -- one argument

let j = Just    in foo j (1, 2, 3)     -- referential transparency rules!
let n = Nothing in foo n [3, 5, 7]
let g = GetEnv  in foo g { .. }        -- type error

[**] Possibly not the best terminology. Maybe ‘bound with’? Note that update syntax is, if anything, the { ... } reverse-applied to the record expression.

[***] where almost universally, tightfix struct.name is the accessor inside the struct/object.

Using . for compose (and not insisting it be space-surrounded, as discussed in the video around 13:30) was always going to end in tears. There’s a 1999 paper from Jones and Jones proposing some other glyph for compose, so that . is reserved for qualified names and record access.

I am sad that we now have two incompatible Haskells, where tightfix . means either compose (used heavily in Lenses) or record access, with apparently no hope of overloading the syntax for both, even with Type-Directed Name Resolution.
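
A small made-up example of the split, with OverloadedRecordDot switched on:

{-# LANGUAGE OverloadedRecordDot #-}

data Inner = Inner { name :: String }
data Outer = Outer { inner :: Inner }

access :: Outer -> String
access o = o.inner.name     -- tightfix . : record access

compose :: (b -> c) -> (a -> b) -> a -> c
compose f g = f . g         -- spaced . : function composition
                            -- (f.g would instead mean: field g of record f)

The whitespace alone decides which of the two languages you are writing.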

Optics are great for updates.

After the episode I gave records some more thought:


newrec Foo a =
  { name :: String
  , value :: a
  , bar :: Bar
  }
-- new keyword, newrec
-- generates type Foo a, data constructor Foo, and says Foo is a record
-- fields are name, value and bar, there is no function pollution
-- only one data constructor, one type, no need for partial functions

Foo {name = "foo", value = 1, bar = newBar} -- creation

-- getting a value
r {Foo | name}
-- getting a nested value
r {Foo | bar.baz.name}
-- if bar is not a newrec, fail
-- if bar is a newrec, follow the fields
-- no need to disambiguate further, you can follow the record definitions

-- this fails because name is String, and String is not a record
r {Foo | name.bar}
-- this fails because xyz is not a field
r {Foo | xyz.bar}

-- we lost OverloadedRecordDot and HasField in order to disambiguate, maybe we can compensate?
-- returning a tuple with multiple values, can be nested
r {Foo | name, bar, bar.baz.name}
-- returns (name, bar, bar.baz.name)
-- maybe lists or other records/types could be returned

-- other Foo constructor clashes with it? qualify the module
r {M2.Foo | name}

-- multiple updates with nested update
r {Foo | name = "hello", value = 1, bar = newBar, baz.quux.etc = 1}

-- this is illegal, assignment and update? or maybe assign and then update, might confuse user
r {Foo | name = "hello", value = 1, bar = newBar, bar.name = "b"}

-- tired of repeating nesting indexes?
r { Foo
  | name = "hello"
  , baz.a1.b1 = 1
  , baz.a1.b2 = 2
  , baz.a1.b3 = 3
  }
-- same thing
r {Foo | name = "hello", baz.a1 := {b1 = 1, b2 = 2, b3 = 3}}

-- thanks to Foo disambiguating we can find the other data constructors by following the newrecs
-- we search for a Foo in visible newrecs
-- for each field, we got its type, no type checking or inference needed
-- if we need to go deeper, the type of the field must be a visible newrec
-- if it is not, we fail because it is not a record
-- if it is, we got the next data constructor and its fields
-- therefore, ALL the expressions can be written by GHC as nested cases

r {Foo | name = "hello", value = 1, bar = newBar, baz.quux.etc = 1}
=> case r of Foo a1 a2 a3 a4 -> Foo "hello" 1 newBar (a4 {Baz | quux.etc = 1})
=> Foo "hello" 1 newBar (case a4 of Baz b1 b2 -> Baz (b1 {Quux | etc = 1}) b2)
=> Foo "hello" 1 newBar (case a4 of Baz b1 b2 -> Baz (case b1 of Quux c1 -> Quux 1) b2)

r {Foo | name, bar, bar.baz.name}
=> case r of Foo a1 a2 a3 -> case a3 of Bar b1 b2 -> case b2 of Baz c1 -> (a1, a3, c1)

r {Foo | name, bar.baz, quux.awk}
=> case r of Foo a1 a2 a3 -> case a2 of Bar b1 b2 -> case a3 of Quux c1 -> (a1, b2, c1)

What if they worked like this? This could be an extension.

If you wrote that as

newtype Foo a =
  { name :: String
  , value :: a
  , bar :: Bar
  }

you’d be writing Hugs/Trex 1996, or PureScript, where { name :: String, ... } is a first-class stand-alone type, with a corresponding first-class { name = value, ... } expression. But how will it fit for multiple constructors within a type? (Or perhaps those aren’t needed for constructors with many fields?)

Putting | with special meaning inside { } also features in Trex and in PureScript. (And using | chimes with List comprehensions inside [ ].) See that 1999 paper for a coherent overall design (as an example/survey of the design space, not as a concrete proposal today). Why not this for creation and for update:

{ MkFoo | name = "foo", value = 1, bar = newBar} 
-- ^^^^^ helpfully disambiguates if there are multiple constructors all with these fields
{ r :: M2.Foo | name}
-- ^^^^ avoids a set of parens if r were prefixed

Then we’d at least be denoting a sensible whole expression inside the bracketing.

There are already too many Haskell-alike languages with better record syntax. (Even ML, which had sensible syntax before Haskell was a thing.) I think the design of the extensions you cover in the video suffers from too narrow a view. If you’re stuck in quicksand, first stop wriggling.

1 Like

The idea was to have only one constructor for each record, so that the type and data constructor correlate and you can do nested disambiguation starting from one constructor: with this you can get all the nested fields you want and update all the nested fields you want, without having to implement extensible records or use generics or anything special.

So basically we have

newtype Foo a = MkFoo { name :: String, value :: a }

and

newtype Bar a = MkBar { name :: String, value :: a }

which are different, and { name :: String, value :: a } is not a first-class anything.

Looks fine. MkFoo could be the type, too. What’s important is that you know how to desugar into the case.

How did Standard ML tackle deep nested record updates?

Many thanks for your advice.

As a naive user of GHC, what is a little frustrating about GHC-02256 is that GHC is warning that it will become less clever in the future (no doubt for some greater good). I am used to adding things to code to avoid ambiguity, but here it is not ambiguous (GHC 9.8.4 can differentiate).

Despite Stack’s code using lots of records in lots of modules, GHC-02256 only bites unavoidably in ten of its modules.

Stack already uses template-haskell and is built on rio, which provides microlens-based optics. However, Stack uses types that have lots of fields, and having to list out their derived lenses in explicit exports and imports is unattractive. optics-core’s use of overloaded labels is neat. On the other hand, having two alternative approaches to optics in the same project is also unattractive.

So, on balance, in the case of Stack, I’ve decided to continue to follow Haskell Error Index’s advice.

2 Likes

“You have stumbled upon something difficult: Updating a deeply nested record.”

My personal (very opinionated) view is that deeply-nested updates, and indeed deeply-nested records, impose a cost on implementation that the programmer should be made aware of by being forced to use verbose syntax. (BTW Elm shares my opinion; but then it’s usually opinionated.)

There’s a strong theoretic reason for avoiding nesting, known as The Relational Model. Suppose, per that SML example, you have person_name nested inside person_bio nested inside employee. Some of those persons might also be customers (but most customers are not employees). So do you nest a duplicate person_name inside a duplicate person_bio inside customer? Then when they have a birthday you have to update age in two places; when they get married you have to update their status (and possibly lname) in two places.

No: you make person_name a top-level datum; give it a person_id field; hold that _id also in person_bio. I think this deeply-nesting business comes from OOP, where you have overwritable object contents; it’s not appropriate for functional languages.
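
A minimal sketch of that, with Data.Map standing in for the tables (all names hypothetical):

{-# LANGUAGE DuplicateRecordFields #-}

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

newtype PersonId = PersonId Int deriving (Eq, Ord)

data PersonName = PersonName { lname :: String, fname :: String }
data PersonBio  = PersonBio  { age :: Int, married :: Bool }
data Employee   = Employee   { person :: PersonId, salary :: Int }
data Customer   = Customer   { person :: PersonId, balance :: Int }

-- one of the 'tables', keyed by PersonId
type Bios = Map PersonId PersonBio

-- one birthday update, however many roles the person plays
birthday :: PersonId -> Bios -> Bios
birthday = Map.adjust (\b -> b { age = age b + 1 })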

For myself, I don’t use accessor functions; I always use pattern matching; and explicit constructor application for build/update. So RecordWildCards and NamedFieldPuns are useful; NoFieldSelectors keeps me honest; DuplicateRecordFields is essential.
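
In that style, update-by-rebuild looks like this (a made-up Config, just as a sketch):

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE RecordWildCards #-}

data Config = Config { verbose :: Bool, path :: FilePath }

-- no accessor functions: pattern matching to take apart,
-- explicit constructor application to rebuild
enableVerbose :: Config -> Config
enableVerbose Config {..} = Config { verbose = True, .. }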

1 Like

Generally I think the records situation is pretty great now*, but I do agree with you on the record syntax with a space being weird. This is why Fourmolu always removes the space by default, on which I was ultimately swayed by this comment years ago from @mboes, which makes the same point as yours:

{}, () and [] are all symbols that come in pairs, but in Haskell that’s where the similarity stops. Unlike (...) and [...], {...} is never an expression. In fact, it’s always just part of a syntactic construct for expressions, like case x of {...}, do {...}, foo {...}. Record updates bind more tightly than applications. It’s an unfortunate design decision that SPJ is on record saying he regrets, but here we are. So it makes sense to avoid any formatting that would suggest that {...} is an argument passed to some function. Or worse, that it’s an extra argument when appearing in a pattern.
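
Concretely (a toy record, just to show the parse):

data R = R { n :: Int }

f :: R -> Int
f (R m) = m + 1

check :: R -> Int
check r = f r { n = 0 }   -- parses as f (r { n = 0 }), so this is 1;
                          -- it is not (f r) { n = 0 }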

* Apart from OverloadedRecordUpdate being half-finished and that stupid warning as a result, as others have discussed. And the fact that discovering the right set of extensions is tricky for newcomers, with GHC2024 being too conservative for my tastes, but at least the video in OP might help with this.

6 Likes

Following a relational model means carrying a list of employees and a list of persons, and finding each person id from each employee when what I want is the employee names, and flattening all nested non-shared inner records (because if they are shared, they turn into an id) into one level of fields. It simplifies the updating, but my gets are now joins, and I get to explicitly lug around my “tables” when I was just working with employees.
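
Concretely, the get looks something like this (hypothetical types, in the spirit of the tables sketched above):

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Maybe (mapMaybe)

newtype PersonId = PersonId Int deriving (Eq, Ord)
data PersonName = PersonName { lname :: String, fname :: String }
data Employee   = Employee   { person :: PersonId, salary :: Int }

-- the "get" is now a join: one table lookup per employee
employeeLastNames :: Map PersonId PersonName -> [Employee] -> [String]
employeeLastNames names =
  mapMaybe (\e -> lname <$> Map.lookup (person e) names)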

And as you add more and more, and assumptions change, you need to “renormalize the tables”. You had a player with health; now you have an enemy with health; and now you have an entity with health and an id pointing to (player | enemy), lest you risk nesting.

I suppose it is indeed an OOPism to package your data in certain ways to work with it.

If we’re comparing apples with apples, you’re working with (“lugging round”) a list/dataset of employees. Suppose you want to find the employee named “John Doe”: there might be shorthand syntax for employee.person_bio.person_name.lname; that’s really more work for the machine than scanning list/table person_name directly to get an _id.

Sure, schema design isn’t easy; and you need to anticipate changes to requirements not yet surfaced. This example has split the data into three nested layers. Is that design decision not the same as normalising the tables? You get a ‘nose’ for it. Also:

In the Relational Model, joins are the norm; they’re super-easy. Haskell is allegedly a world-class programming language; why are joins so hard? PureScript has a Union class (thanks to @rhendric for pointing that out); in Hugs/Trex it wasn’t too hard to produce a record-append aka merge that is well-typed – using no more than Hindley-Milner + type classes + FunDeps + Overlaps.

GHC can’t keep up with an approach developed in the 1970s, nor with a Haskell that’s gotten no love in the past 20 years?

And yet GHC continues to put effort into nested updates, as a ‘poor imitation’ of OOP. (I don’t think all the effort towards Dependent Typing will help either; though I’d love to hear otherwise.)

1 Like

…and since that alternative for . apparently went nowhere:

Translated into Haskell:

infixr 9 `o`
o :: (b -> c) -> (a -> b) -> a -> c
(f `o` g) x = f (g x)

…since no-one here seems to have any great problem with “infix-quoting” the likes of div, mod, divMod, quot, rem, quotRem, et al.


Since you’ve apparently forgotten how this topic ended:


From FOLDOC:

Enjoy.

1 Like

Ok, ok. Here’s a more purely-inside-Haskell analogue:

  • List append (++) appears to be an inoffensive operation; and since Strings are Lists, you’d innocently build up a longer String by appending pieces of the message.
  • In a language with mutable objects, you’d append by merely poking the nil at the end of the l.h. arg with a pointer to the start of the r.h. arg. We’re not changing any of the cell contents, so no harm done.
  • We all know it ain’t that simple. Haskell doesn’t have mutable objects; (++) iterates down the spine of the l.h. arg (every cons cell) to create/copy a new List differing from the original only in the last node.
  • The performance hit is sufficiently bad that there’s a smarter shows in the Prelude for building up such messages (sketched after this list).
  • Similarly, nested update to a record superficially looks like it only pokes one field value.
  • Realistically, it rebuilds-by-copy the whole structure from the top down. There’s no hope of an equivalent to shows, because Haskell data types are flat/vectors of fields, not linked lists.
  • Do deeply-nested updates negate the efficiency gains from flat vectors of fields?
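
For completeness, the shows trick: build the message as a function String -> String (ShowS) and apply it once at the end, so no piece gets re-copied:

-- naive: each (++) walks and copies the spine of its left argument
slow :: Int -> String
slow n = ("value " ++ show n) ++ " out of range"

-- ShowS-style: compose the pieces, apply to "" once
fast :: Int -> String
fast n = (showString "value " . shows n . showString " out of range") ""
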
1 Like
  • Do deeply-nested updates negate the efficiency gains from flat vectors of fields?

If the compiler can see that the old value(s) aren’t retained it can just perform the update in place. You can ensure that the old value isn’t retained with Linear Haskell.

2 Likes