Couldn't records be solved entirely at the type level?

1. The problem

I think records are my only great annoyance with Haskell-as-a-language at this point. A decade of work has been poured into figuring out a good solution for extending record syntax on ADTs specifically, which comes with its own host of problems because ADTs are immutable and contain sums. At the same time the language does not bother providing any support for the same kind of updates on mutable datatypes, support which is pretty much indispensable in low-level programming.

2. A solution

Libraries like vinyl take a different approach, describing all of a record’s fields at the type level. The implementation can then be defined over whatever concrete datatype one sees fit.
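
For reference, the core of that encoding looks roughly like this (a simplified sketch of a vinyl-style record; the actual library adds strictness annotations and per-field wrappers):

import Data.Kind (Type)

-- The list of fields lives entirely in the type; the payloads form a
-- heterogeneous spine at the value level.
-- Needs GADTs, DataKinds, PolyKinds and TypeOperators.
data Rec :: (u -> Type) -> [u] -> Type where
  RNil :: Rec f '[]
  (:&) :: f r -> Rec f rs -> Rec f (r ': rs)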

2.1. Single-level

After prototyping a bit and throwing out all the extensibility stuff, it seems like that approach can be reduced to a very lean type-level core:

data Field a = Symbol := a

type Fields a = [Field a]

class Add (k :: Symbol) (v :: a) (xs :: [Field a]) (ys :: [Field a]) | k v xs -> ys

class Find (k :: Symbol) (xs :: [Field a]) (v :: a) | k xs -> v

-- The following two are only needed for implementing records over arrays

class Length (xs :: [Field a]) (n :: Nat) | xs -> n

class Index (k :: Symbol) (xs :: [Field a]) (n :: Nat) | k xs -> n
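
As a concrete illustration of the functional-dependency approach (a minimal sketch of mine, not part of the core above; it needs FlexibleInstances and UndecidableInstances), Find can be implemented HList-style with overlapping instances, and Add can be written the same way:

-- Key found at the head of the list.
instance {-# OVERLAPPING #-} Find k ((k ':= v) ': xs) v

-- Otherwise keep searching in the tail.
instance Find k xs v => Find k (x ': xs) v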

Which then makes it trivial to define, for example, an array record that can be used in both immutable and mutable forms (mimicking Data.Primitive.SmallArray):

newtype MutableSmallArrayRec m (xs :: [Field Type])

newtype SmallArrayRec (xs :: [Field Type])

-- Creation functions are omitted, but there are type-safe ways to do so.

freeze :: Length xs len => MutableSmallArrayRec IO xs -> IO (SmallArrayRec xs)

thaw :: Length xs len => SmallArrayRec xs -> IO (MutableSmallArrayRec IO xs)

assign :: (Find k xs v, Index k xs n) => v -> MutableSmallArrayRec IO xs -> IO ()

access :: (Find k xs v, Index k xs n) => MutableSmallArrayRec IO xs -> IO v

index :: (Find k xs v, Index k xs n) => SmallArrayRec xs -> v

A JWT, whose payload is a JSON object, could be expressed as a type-safe dictionary in a similar manner.
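
For illustration, a claims layout could look like this (using TypeApplications; the field names and the reuse of SmallArrayRec here are purely my own assumption):

type Claims =
  '[ "iss" ':= String
   , "sub" ':= String
   , "exp" ':= Int
   ]

-- Given instances for the concrete list above, reading a claim is just:
subject :: SmallArrayRec Claims -> String
subject = index @"sub"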

It’s even generic enough that it can be used to fix the annoying Storable typeclass:

data Offset = Offset Nat Type

class Allocable struct where
  type SizeOf  struct :: Nat
  type AlignOf struct :: Nat

  type FieldsOf struct :: [Field Offset]

newtype PtrRec (struct :: Type) = PtrRec (Ptr ())

assign
  :: ( Find k (FieldsOf struct) ('Offset offset v)
     , KnownNat offset
     , Storable v
     )
  => v -> PtrRec struct -> IO ()

access
  :: ( Find k (FieldsOf struct) ('Offset offset v)
     , KnownNat offset
     , Storable v
     )
  => PtrRec struct -> IO v
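
As an illustration of how FieldsOf would be populated, here is a hypothetical instance for the C struct v2 { double x; double y; } (the offsets, size and alignment are written out by hand; in practice they would presumably be generated):

data V2  -- empty tag type standing in for the C struct

instance Allocable V2 where
  type SizeOf  V2 = 16
  type AlignOf V2 = 8
  type FieldsOf V2 =
    '[ "x" ':= 'Offset 0 Double
     , "y" ':= 'Offset 8 Double
     ]

-- assign @"y" (3.14 :: Double) :: PtrRec V2 -> IO () then resolves both
-- the byte offset (8) and the element type purely from the type-level list.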

2.2. Nesting

Dot-notation can be added on top completely separately:

class DotNotation (k :: Symbol) (ks :: [Symbol]) | k -> ks

Then custom type classes can be used to chain individual segments.
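
For instance (this chaining class is my own guess at the shape, not something defined in this post), segments over the array record could be threaded like so:

class Access (ks :: [Symbol]) (xs :: [Field Type]) (v :: Type) | ks xs -> v

-- "foo.bar" over a record whose "foo" field is itself a record:
--   Access '["foo", "bar"] xs v
-- holds when Find "foo" xs (SmallArrayRec ys) and Find "bar" ys v hold.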

At the FFI level, array support could be added as well: consider foo.bar[3].honk where "bar" := CArray 'UnknownSize Bar (or perhaps 'KnownSize 4).

3. The issues

3.1. Type families

Recursive type families are atrociously slow, but functional dependencies can be used instead to do the exact same thing, so this doesn’t need to be solved for this to work.

3.2. Lists

The big problem is that fields are described as a type-level list. This means symbol lookup and equality checks are slow, plus duplicates have to be manually pruned on insertion.

Within Haskell’s type-level system as it is, this appears unsolvable (as in “too painful to implement”), but I think a temporary solution could be part of base. Consider a type-level dictionary, somewhere in GHC.TypeFields or similar:

data Fields a -- UTF-8 ordered dictionary from `Symbol` to `a`.

type family Empty :: Fields a

-- Fails on duplicate insertion.
-- Should only allow a subset of characters for coherent nesting.
type family Add (k :: Symbol) (v :: a) (xs :: Fields a) :: Fields a

-- Fails when no such symbol exists
type family Find (k :: Symbol) (xs :: Fields a) :: a

type family Size (xs :: Fields a) :: Nat

-- Fails when no such symbol exists
type family Index (k :: Symbol) (xs :: Fields a) :: Nat
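
For a sense of how these would reduce (my own illustration, not an implementation):

-- Find "bar" (Add "bar" Bool (Add "foo" Char Empty))   reduces to   Bool
-- Size       (Add "bar" Bool (Add "foo" Char Empty))   reduces to   2
-- Add "foo" Int (Add "foo" Char Empty)                 fails: duplicate key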

3.3. Nesting

Slicing dotted notation apart is far more complex than a list traversal, so baked-in type families would be much more efficient than a complex network of symbol conses, unconses and appends.
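
Concretely, the same module could expose something along these lines (my own sketch of a signature, not part of any proposal):

-- Splits a dotted path into its segments, e.g.
--   Split "foo.bar.baz"   reduces to   '["foo", "bar", "baz"]
type family Split (k :: Symbol) :: [Symbol]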

4. Conclusion

This feels like it should’ve been a GHC proposal made in 2018 by someone far more knowledgeable than me. It does not need any new Haskell extensions, only a small handful of helper functions in base, which some day would be subsumed by a more powerful general approach to type-level programming. It can be freely extended outside of base by anyone willing, and it incurs no runtime overhead.

I wish I could make a plugin to see if I’m wrong on this, but I don’t know if an entire type-level dictionary can be implemented that way.


I think your proposal would be clearer if you added examples and compared to the existing OverloadedRecordDot approach.

For example, it is not clear to me whether your proposal allows for anonymous/extensible records. You say you throw out all the extensibility stuff from vinyl, but you do still have an Add type family. This is confusing to me.

Your idea of adding a type level map also sounds a lot like row types to me. Have you seen that proposal: Add Structural Types (Row Polymorphism) by jvanbruegge · Pull Request #180 · ghc-proposals/ghc-proposals · GitHub?


Oh great, so it actually is a thing from 2018.

I forgot that the dictionary would need some way for the user to write it down (lists get their own syntax out of the box). That proposal is thus quite close to what I’m looking for.

I guess I now get to be annoyed that this has been in the hopper for six years with no movement, even though any working prototype would be small, self-contained and immensely useful.

I’m just using Add as a construction mechanism, because using cons would mean figuring out order when comparing lists. I expect type-level dictionaries to be ordered, so this wouldn’t be a problem.
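
To illustrate the ordering problem with the plain list encoding (my own example):

-- These describe the same record but are distinct types, so comparing
-- them requires sorting or permutation reasoning:
--   '[ "a" ':= Int, "b" ':= Bool ]
--   '[ "b" ':= Bool, "a" ':= Int ]
-- An ordered dictionary gives both the same normal form.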

I don’t care about extensibility in records because I’m used to the regular syntax not having it either. Indeed it would be possible to use an existing record as a foundation for a new one, but that’s fluff.

Anonymous records aren’t really a part of the types here; rather, things can be constructed anonymously:

newtype RawRec (total :: Fields Type) (current :: Fields Type)

nil :: RawRec ys Empty

cons :: (Insert k v xs xs', Index k ys n) => v -> RawRec ys xs -> RawRec ys xs'

new :: RawRec xs xs -> Rec xs

They could; the question is what compromises you make in terms of ease of use, performance and maintainability of the implementation. For example, you can make records type-safe by enforcing:

-- new keyword, record, kinda like newtype
record MyRecord1 a = MyRecord1 { name :: String, thing :: a } -- good
record MyRecord2 a = MyRecord2 { name :: String, thing :: a } | Foo { name :: String } -- bad, only one constructor allowed

and enforcing total field initialization when constructing them.
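
For example (my reading of that rule):

-- MyRecord1 { name = "x" }               -- rejected: thing is not initialized
-- MyRecord1 { name = "x", thing = True } -- accepted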

Can you give an example of what using records with your proposal would look like? And can you give an example of how you would ideally imagine using records, and with what notation? Would you allow DuplicateRecordFields, OverloadedRecordDot, OverloadedRecordUpdate?

You should, uh, have a look at PureScript—not as a solution to all your problems (there are several divergences in design from Haskell, most notably based-on-JS-ecosystem and strict-by-default, that prevent it from being a drop-in replacement for most people), but as a working ‘prototype’ (over a decade old at this point) of row types used to do exactly this, with dot syntax on records, primitive classes for type-level row operations, etc.

(And, because the grass is always greener elsewhere, having solved compiler-supported row-type-based product types, I find it nigh-intolerable that PureScript doesn’t yet have good syntax for working with variants, a.k.a. compiler-supported row-type-based sum types. The language maintainer’s work is never done.)


Here’s one for immutable arrays, extrapolating from my topic examples and my previous comment (note that this uses TypeApplications):

type This =
      '{ "foo" := Char
       , "bar" := Bool
       , "baz" := Float
       }

this :: SmallArrayRec _ -- Known to be equal to `This` based on type resolution
this = new
         . cons @"baz" (5 :: Float)
         . cons @"bar" True
         . cons @"foo" 'a'
         $ nil

result :: _ -- Known to be Bool based on type resolution
result = index @"bar" this

All of this is type-safe. The mutable counterpart works exactly the same and can be converted to and from the immutable variant through copying. Furthermore all of this can be implemented using newtypes, so under the hood either variant eagerly inlines to direct array access (so runtime performance shouldn’t be much different from regular records).

Ease of use suffers slightly because RecordWildCards won’t magically work with this, but an extension allowing that could be added later if deemed worthwhile. Maintenance-wise, as stated, all the functions can live in a separate module and none of the datatype implementations even need to be in base, so the only thing to bikeshed here is the type-level record notation.

Duplicate fields are supported with this by default, because symbols don’t have the same restrictions as regular record field names. Similarly there is nothing to overload in terms of notation: any characters can be used for nesting; the dot is assumed because that’s what every other language uses.

Indeed there is knowledge to be gained there, but my main point is that this doesn’t need any complex proposals. The most barebones implementation should slot nicely into the language as is and it should be enough for most of what’s needed from it.

I thus disagree that the grass is always greener because the Haskell lawn currently only features dirt.


Also I don’t see a big need for sum types in this part of the language because ADTs already do a great job at that.

I see how you can map current record notation to it.
How does it look with deeply nested fields, plain updates, deeply nested updates and multiple updates?

type This =
      '{ "foo" := Char
       , "bar" := Bool
       , "baz" := Float
       }

type That =
      '{ "foo" := Char
       , "bar" := Bool
       , "baz" := Float
       }

type Thing =
      '{ "foo" := Char
       , "bar" := Bool
       }

-- this gives ambiguous errors?
f = new
    . cons @"baz" (5 :: Float)
    . cons @"bar" True
    . cons @"foo" 'a'
    $ nil

-- this gives ambiguous errors?
result r = index @"bar" r

For the code snippet, they’re all just types, so you’d get

f :: Rec '{ "bar" := Bool, "baz" := Float, "foo" := Char }

result :: (Find "bar" xs v, Index "bar" xs n) => Rec xs -> v

Single-level updates don’t make sense for the immutable version; you’d need to convert it to a mutable record, apply the changes and convert back. Or, if you’re writing imperative code, you can just keep mutating the mutable version directly.
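
Sketching that round trip with the signatures from my original post (instances for the concrete field list are assumed to exist):

setBar :: Bool -> SmallArrayRec This -> IO (SmallArrayRec This)
setBar b r = do
  mr <- thaw r           -- copy into a mutable record
  assign @"bar" b mr     -- update the field in place
  freeze mr              -- copy the result back out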

Nesting, as said, can be a completely separate thing, because updating "foo.bar.baz" is the same as accessing "foo", then accessing "bar", then updating "baz". This will be different for every single type, e.g. for the FFI nested access merely requires adding offsets together.

You are right, thank you.

I was thinking of composing index, cons and new to return a new immutable record like how it’s done with current records.
Basically reimplementing

rec.foo.bar
rec { foo = a, bar = b }
rec { foo.bar.baz = a, bar = b, foo.quux = c }
rec { foo = a, ..}

in this notation to see how ergonomic it is and the errors it gives in different scenarios.


> laughs maniacally <

2004 HList. (See the pdf from that link.) Originally used FunDeps, now upgraded to use TFs. So not just a proposal but an implementation. I don’t find enough in your sketch to tell whether it’s significantly different to HList.

Yeah. Perhaps there’s a reason those “more knowledgeable” people wouldn’t agree that “this doesn’t need any complex proposals”.

One problem I don’t recall seeing mentioned very often is that if all the fields of a data type are exposed via names, the type isn’t abstract: all those names can then be used to make public duplicates of values with the original data type.


Others have made similar laments, e.g.:

…which I have commented on here:

…with this being my final post there:

I disagree heavily here. That proposal’s goal is to alter ADTs, which are a fundamental part of the language, into something quite outlandish with far-reaching implications. It wouldn’t even solve the issue I have, since binding that monstrosity to an existing chunk of memory would be impossible.

Using mutable things unsafely across threads is a design decision that users can already make. This proposal would merely allow people to use mutable things as records; that’s it.

I’m requesting a single new type-level type with a shred of new syntax that could be reneged upon later if it’s implemented as an extension. It, in my estimation, would allow for things that are otherwise excruciatingly painful to do in the language. If this is too tall of an ask to even entertain in a research language, then I think something is amiss.


Why should it? In a future parallel-by-default Haskell, trying to use any shared mutable data risks either:

  • queueing up all those parallel calls to each have their turn to modify that data, making said parallelism practically useless;

  • or letting all those parallel calls occur in any random order, which incurs non-determinism.

So why add another (mis)feature which only “works” for sequential Haskell?

This entire discussion is about type-level dictionaries; it has nothing to do with the execution semantics of the resulting data implementations. Yes, you would be able to create a mutable array with a type-level dictionary attached, use it across multiple threads with no synchronization and thus get data races; this is in no way different from what currently exists, where you can create a mutable array with no dictionary and do the exact same thing.

Your initial post says

Mutability surely is to do with execution semantics?

All data in Haskell is immutable, and always has been. (Well OK there are MVars and icky low-level stuff using unsafes: they’re not really part of the language/they’re snuck inside IO.)

The “decade of work … poured into figuring out a good solution to extend the record syntax for ADTs” isn’t aimed at changing anything about execution semantics.

So I’m not seeing why this is a ‘problem’ in Haskell. We just don’t do “low-level programming” – except by callouts to low-level interfaces from within IO.

My motivation for creating this topic is that the current record system is insufficient for certain types of problems. I chose to express it through mutable records because I enjoy low-level programming, but if I enjoyed different things I could just as well have done this with immutable arrays (coming from problems with compiling large ADTs, hence the existence of packages like large-records) or pure dictionaries (which at times could use the same record guarantees, e.g. in JWTs or when parsing the Vulkan spec).

On top of that I find the idea of nested record updates for immutable data weird, something I’ve expressed in one of my comments on New notation for records?, so juxtaposing the two makes the most sense in my mind.

And, of course, there’s the fundamental issue that not all C structs are properly convertible to and from ADTs due to potential ambiguities and bit fields, so there simply is no existing solution for that issue.


I take this opportunity to shill inline-c, for lurkers.
And a recent example.
Love this package.


I will most probably never touch that library. Clumsily parsing C (it can’t be done any other way) in Template Haskell and then slapping the words “seamless” and “No FFI” on it is just too much for me.
