[RFC] Mutable records as a GHC extension

For clarity I won’t be using the phrase “immutable records” in this topic. Only “ADT products”, “field labels” and “record syntax”.

1. Status quo

Through the years there have been numerous attempts at implementing mutable records separately from GHC (see e.g. this Reddit thread from 2018). All of them have ultimately failed, as Haskell’s type system on its own is too verbose to make such a feature look good and too weak to support it properly.

The need for mutable records has however never left. The community chose to emulate them using ADT products, a task I argue they’re not fit for, leading to several curious outcomes:

  1. Complaints about laziness as a norm.

    The complaints are never about it being hard to use laziness, they’re about not being able to escape it. Everyone is signed up to the notion that they have to put their application state into an ADT product, at which point a single failure to fully evaluate the next edition of their entire state snowballs into a memory leak.

  2. Partial ADT product updates as a feature.

    ADTs cannot be updated in place, they’re constructed anew each time, right? Why do we then use x { a = b } and not C { a = b, .. = x }? I have no idea how to reconcile nested updates in OverloadedRecordUpdate with this, and this is currently the community-consensus solution to proper record syntax.

  3. Allocation cost for copies of ADTs as the cost of doing business.

    Yes, I get it, it’s a bump-allocating garbage collector, allocations are dirt-cheap; no, I shouldn’t have to waste space and time duplicating references to things I’m not interacting with in the moment.

2. IORefs

The closest we can get to mutable records in current Haskell is an ADT product of IORefs. This allows accessing each element independently, with addresses each of the problems outlined above. It’s also obviously a terrible approach, as it requires allocating a number of IORefs linear to the number of values.

The documentation to MutVar#, which backs IORef, says

A MutVar# behaves like a single-element mutable array.

A lot of people know about mutable arrays of values through vector, but in fact GHC supports these directly. There is however no nice way to access such arrays heterogeneously in today’s Haskell.

3. Mutable records

Consider the following module, taking both function names and behaviors from primitive#SmallArray (which in turn took from GHC.Exts):

module Data.Mutable where

  -- Kind of mutable records
  data MutableLayout

  -- Instances for this type class are only created by the compiler.
  class HasMutableField (x :: Symbol) (r :: MutableLayout) a | x r -> a

  data Immutable (r :: MutableLayout)
  index :: HasMutableField x r a => Immutable r -> a

  data Mutable (r :: MutableLayout)
  read :: HasMutableField x r a => Mutable r -> IO a
  write :: HasMutableField x r a => Mutable r -> a -> IO ()     
  thaw :: Immutable r -> IO (Mutable r)
  freeze :: Mutable r -> IO (Immutable r)

  -- More variants, e.g. MutableST, can be added below.

We can then write a declaration like the following, mirroring a declaration of an ADT product with field labels:

{-# LANGUAGE MutableRecords #-}

mutable Con a b =
  Con
    { a :: a
    , b :: !(IntMap b)
    , c :: {-# UNPACK #-} !Int
    , d :: Mutable Foo
    }

This declaration would:

  • Add type Con :: Type -> Type -> MutableLayout to the type-level;

  • Add a constructor Con that serves as a record pattern synonym for
    Immutable (Con a b);

  • Have GHC infer the internal data representation for this mutable record;

  • Create a HasMutableField instance for each field, describing how each field is to be accessed.

Using the constructor and thaw is thus the only way to create a new mutable record, which can be read from and written to in IO, and converted back to immutable using freeze.

Caveats:

  • To support unpacking fields [Im]mutable will have to be backed by a special array that combines [Mutable]ByteArray# and Small[Mutable]Array#. The data layouts for both are rather simple, this shouldn’t be a problem;

  • Using record syntax to partially update an Immutable should not be allowed, as this can be done directly using mutation. This means that x { a = b } can never refer to an Immutable pattern synonym, simplifying implementation;

  • Record pattern synonyms are not a GHC feature at the moment, but I assume making GHC create special synonyms for this occasion is a separate, simpler, topic;

  • Nested record update would require a separate type class
    (e.g. nestedWrite @'["foo", "bar", "baz"]), which could be simplified further with a magic type family that breaks Symbols on periods
    (nestedWrite @"foo.bar.baz");

  • How GHC chooses to store the fields in a mutable record is an implementation detail that should not be exposed to the user. This feature is not here to solve FFI interoperability;

  • These records are not extensible, I’m not opening that can of worms.

And that’s it. Every part of this seems to already exist in GHC today, doesn’t seem like it would require any massive changes, it’s well-contained in a module and an extension.

4. GHC proposal

There’s only one issue here: the claims I’m making in the first section are quite outlandish. Virtually every real-world program out there is structured as described and the conclusion I’m ultimately drawing towards is that the syntax for x { a = b } should be deprecated (it’s in the Haskell98 report, section 3.15.3).

It is thus important for me to see what community’s views on this are and whether mutable records are viewed as a necessary, or at least interesting, feature. Should this gain enough traction I’ll expand it into a GHC proposal.

4 Likes

What are your thoughts on the prior proposal? It’s been a long time so I haven’t re-read it. Wondering how much it is/isn’t aligned with your suggestions.

I assume you’re talking about the Mutable Constructor Fields proposal.

What it wants to do probably makes sense in terms of the garbage collector, but the result is a monstrous notion of a half-mutable ADT with all sorts of weird edge cases. It’d put it in the same category as partial ADT product updates.

I instead want to see a clear separation between ADTs, which are always immutable and by default lazy, and mutable records, which we may well want to be strict by default. The ability to thaw/freeze a mutable record only exists as a convenience for construction and teardown, and should not be extended further.

1 Like

It’s a well written proposal, and I’m interested in hearing from the experts who need all the mutability stuff purely as an interested bystander.

The stuff about g = f { x = a } , however, struck me as a non-sequitur. Sure, it has always been a pain to write for nested types, but the meaning has always been clear to me. “g equals f except/where x equals a”. I’m probably just missing something, but it seems unrelated to introducing mutable records. Your alternative says the same thing in more words and doesn’t seem any clearer. Can you clarify how it relates to the rest of the proposal?

6 Likes

There’s a distinction here between independent values grouped together for ease of access and a well-formed set of interdependent values. In my mind mutable records are for the first case and ADTs are for the second. Mutable records are updated; ADTs are constructed. The syntax should reflect this mental model.

There also other benefits to the alternate formulation, e.g. it would avoid the confusion caused by -Wambiguous-fields (see e.g. topic #9545), and it would allow feeding fields from other types (a better equivalence is
let Foo { .. ] = x in Bar { a = b, .. }).

I agree with your distaste for that syntax. I especially dislike it’s called ‘update’ syntax, because there’s no update going on in the imperative programming sense of overwriting a location. ‘rebuild shorthand’ might be better terminology. I always use the explicit constructor syntax, which works reasonably smoothly with the .. RecordWildcards.

I agree the HasField and friends/‘update’ mechanisms are just too perplexing to be worth using. Indeed auto-naming a function same as a field (as in H98) runs counter to the drive to eliminate punning from other parts of the language.

What’s more perspicuous in the earlier proposal’s data decl is that a mutable structure can appear only within IO (or ST).

I guess for record access specifically the coherent approach would be only have access to field labels once a relevant constructor has been scrutinized, but then that creates a non-type that is “the record inside the constructor”, which can’t be passed to functions. This explains why NamedFieldPuns and RecordWildCards extensions are structured the way they are.

Perhaps it would make sense to think of record syntax as a special type, much like MutableLayout of this proposal, so that it can be repackaged into a separate type if passed to a function, otherwise the reallocation is somehow elided. [After thinking about it some more this would introduce way more problems than it would solve]

Actually, the record doesn’t need a special type, it could be represented by a newtype over the original type with the relevant constructor name on the type level. Have it be only obtainable by pattern matching with a special extension, have getField retrieve the relevant field unsafely.

Anyway, this proposal isn’t about changing the record syntax for ADTs, we shouldn’t focus this hard on that.