About type sharing (mostly) and records (desugaring, mostly)

Damn, that’s depressing. Haskell is good in many areas so I guess I’ll keep going. On the other hand, without anonymous records, I’m forced to use ugly workarounds with “so-called” safe undefined protected by fancy type machinery. Or complex type families constructs to simulate “subtypes” of a type (i.e. switch on or off fields through a Maybe Void or () type). This is verbose and unnecessary. But then I only have very limited time to offer and the areas in which haskell needs help are numerous and growing so… Maybe i’ll end up considering your suggestion, time will tell.

PS: it seems large-anon is doing a great job at filling my needs for now though, after a few experiments I find it very impressive. I’m a bit wary of having to rewrite huge chunks of code when ghc gets a baked in implementation

You could also charge ahead and use your struggles and learning to contribute upstream! GHC evolves and evolves and evolves.

1 Like

GHC has decided to put all its efforts into Dependent Haskell. Having done that, they’re discovering DH needs a great deal more headscratching than Eisenberg’s thesis would suggest. So they’ve delivered … approximately nothing. AFAICT DH won’t help one jot with an anonymous/extensible record design.

Then I don’t think you’ll be faced with rewriting anything for at least ten years. If GHC evolves, it’s so glacial, I’m failing to notice. (Written bitterly by someone who was specifically invited to spec up amendments for a better-principled FunDeps/overlap to support an anonymous/extensible records system – which I was reluctant to do because I expected the effort to be pointless – and … I didn’t even get the courtesy of an acknowledgment.)

Well, you’re in good company:

  • twenty years ago, Robert Ennal’s research into optimistic evaluation looked like it was “going mainstream”…but it didn’t: presumably the promising initial results from his prototype didn’t carry through to production.

  • More recently, Peng Li’s research into lightweight concurrency, while being integrated into Lighthouse (derived from the House OS), could never quite match the performance of the current approach GHC uses: after years of being kept in sync with the regular one, that particular branch has also been abandoned.

Perhaps it’s due to the limits of the dial-up Internet connection I was using in 2003, but I don’t recall Ennals making much of a hullabaloo about his work going nowhere: he just went elsewhere! As for Li, I vaguely remember some unhappiness being expressed somewhere (I could be wrong), but he moved on as well.

They both had invested much more time and effort into their respective research, so I would have thought they would have far more to complain about…but apparently they have chosen to now remain quiet. Why they have made that choice is a question I have no answer to: feel free to ask!

Your repeated grudge against DH in this forum is noticeable to many newbies including me already. Perhaps you could rephrase it to something constructive that we could do something about in another post, instead of poisoning this thread?

1 Like

The large-anon guys are saying that the core generated for HasField instances are quadratic. (Also for instances over type-level lists – which derive ultimately from the HList 2004 work.) HasField is not H98; type-level lists are not H98; HList relied on beyond-H98 features like overlapping instances and FunDeps. So I wouldn’t expect H98-compliant code to suffer the quadratic explosion, hence why it’s not “mentioned in the Report”.

To conform with H98, each field name gets declared only once; so each accessor function is accessing only one record type. So (maybe I’m being too naieve) I’d expect code size to be linear in the number of fields ceteris paribus (as we Economists like to say).

And your sample code hasn’t declared any instances at all – not even ... deriving (Eq, Show). So I don’t think it’s going anywhere near what large-anon is talking about.

OTOH you don’t say what extensions you switched on. So perhaps the code is using beyond-H98 features and they have generated ‘secret’ instances? (And sorry, I can’t read that desugarring, but are you saying the code is quadratic?)

So do we suffer quadratic explosion with extensible/anonymous records? I’m asking/I don’t know. Of course depends what choices we make for the design. I’ll take Hugs/TRex for definiteness – not because I’m claiming it’s “better” or “more suitable for Haskell” than other designs.

An anonymous record type (well ok I’m giving it a type synonym for easy mention)

type Test = Rec( fieldA :: Int, fieldB :: Double, fieldC :: [Int] )    -- is shorthand for
--  Rec( fieldA :: Int | ( fieldB :: Double | ( fieldC :: [Int] | EmptyRow ))  -- nested

type TestMore = Rec( fieldB :: [Test], fieldA :: Test)    -- not your TestMore example
--  Rec( fieldA :: Test | (fieldB :: [Test] | EmptyRow ))

-- an instance would look like (I'm omitting some implementation detail)

instance Eq EmptyRow  where ...              -- base case empty rec

instance (Eq a, Eq rest) => Eq (Rec( fieldA :: a | rest ))  where ...
instance (Eq b, Eq rest) => Eq (Rec( fieldB :: b | rest ))  where ...

So there’s one instance for each distinct field name across all possible field types. There’s no instances for all the possible combos of field names. So linear in the number of field names? I am worried (looking at what the large-anon folk are saying) with that recursive instance Eq rest =>. Because that looks like what happens with type-level lists.

Now here’s the trick with Hugs/Trex: although the types look like they’re nested, the implementation turns the record values into a ‘flat’ vector of fields. What’s more puts the field names in alphabetical order, irrespective of the order of appearance in the program source. That’s what the Rec( ) construct is doing. So this doesn’t suffer the problem with the values corresponding to a type-level list that there’s cons-like constructors all the way down. (Something else the large-anon folk are worried about.)

(meta: I hope this speculation is an example for @travers as to where I’m going/what I’m thinking about. I feel DH has nothing to say here, but please correct if I’m wrong. Oh, and it was you hooked me into this discussion by linking to one of my posts/threads.)

1 Like

I feel DH has nothing to say here, but please correct if I’m wrong.

Hrm:

So just stop the hubbub.


Oh, and it was you hooked me into this discussion by linking to one of my posts/threads.

I’ll ask the moderators if it’s possible to exclude “your posts/threads” from appearing in search results - that way, you being “hooked” into future discussions will be the result of someone seeking out that content intentionally

There is a possibility here for improvement: because they are just another notational addtitive (like do), Haskell 2010’s record syntax - with sufficient cunning - could be redefined in terms of another, more comprehensive record system:

  • Old code using record syntax continues to work as before;

  • New code can make direct use of the new record system.

Any library-backed record implementation which can provide such a redefinition must surely be a potential candidate for a new Haskell addendum (like hierarchical-module name and FFI support were).

That’s a brave claim without an actual proposal in front of us.

That’s what HasField is trying to do – and without even a “new record system”. And it’s hard to retain backwards compatibility: if a module using HasField is imported into a module that doesn’t (or vice versa); if the modules declare same-named fields. And turns out HasField gives this quadratic explosion in instances.

purescript's record decls look superficially like Haskell’s, but with incompatible semantics. Again if a future Haskell were to import a purescript style record into Haskell (or v.v) with same-named fields; the accessor’s instances must get overloaded with different semantics. Now try exporting a function with that accessor call embedded in its definition.

So Hugs.Trex uses distinct syntax for its records. (Which caused Mark Jones consternation at the time.) There’s no attempt to share anything: you can have Trex fields same-named as H98 fields, but you can’t export those Trex accessors; you can’t use the Trex accessors on H98 data structures (type mis-match).

I don’t see anybody talking about merely a library-backed anything. Everybody wants new syntax (or to repurpose existing syntax), with special magic between the syntax and object code. We might arrive at the get code being merely library calls (overloaded per record type); set/update (esp type-changing) seems a lot nastier.

I’m wondering if this talk of quadratic explosion is rather too much ‘the sky is falling’. To take the initial example:

This is generic code. (IOW I’m disagreeing with the authors that’s come from “simple Haskell code”. The original code was short; if you’re gonna write highly abstract code that’s so amorphous, expect to pay for it.) So we don’t know what instance of Functor (Applicative) we’re using; hence all those type arguments. At a usage site, apBaseline appears supplied with known types; the overloading gets resolved; the type arguments get optimised away(?)

Turning to HasField, field access must use a field name that gets taken as a ‘literal’ type-level String; the usage must appear in a context where the data type must be resolvable syntactically (from the appearance of the data constructor, or a type annotation). So there’s no generic code getting planted(?)

The ‘Well Typed’ blog post goes on to consider HList examples. And that seems less tractable, because there’s a HCons constructor all the way down the spine, each appearance at a different type. Hence (I’m not saying anything new here) HList is(/was in 2004) a neat proof-of-concept for a record-alike system; it’s not realistic for ‘industrial strength’ records.

That’s a brave claim [about record syntax…]

Then go and ask the authors of the Haskell 2010 Report about what the intended meaning of this earlier post’s quote (from page 62 of 239 of said document) should be…

That bit’s unchanged since the '98 Report. Although there might be still around some people spatio-temporally contiguous with its authors, I very much doubt they’ll remember the intent at the time. We’d better try to interpret the words as given:

A constructor with associated field labels may still be used as an ordinary constructor; features using labels are simply a shorthand for operations using an underlying positional constructor.

So in purescript, that bit up to the semicolon does not hold: if your constructor is declared with field labels, you can only build values or access components using field labels; there’s no (visible) ‘underlying positional’ anything.

In SML, per my description in that earlier thread records are “not tied to any data type/not needing a data constructor prefix”. No ‘ordinary constructor’/no constructor at all. So again, you can only build or access record types using labels. There is a positional structure under the hood in the implementation; it’s abstracted away/you can’t access it directly from program code.

Hugs.Trex is very similar to SML: there’s no constructor [**] – that’s what it means by ‘anonymous’. (See my example decls above.) Some label fieldC might or might not appear in any particular record type. Even knowing that under the hood records are normalised to alphabetic sequence of field labels doesn’t help without knowing what other labels appear.

[**] Or you could argue there’s the same constructor Rec( ) for all record types. Then you’ll have to say it allows a variadic number of fields, with any combination of field names. So it ai’n’t a H98 constructor.

IOW without an actual proposal for records in front of us, we can’t presume in terms of “just another notational addtitive”. Just like (say) MultiParam Type Classes or FunDeps/Type Families or Scoped Type Variables can’t be ‘translated’ as just another notation for H2010: they express something beyond the semantics of H2010.

The H2010 report doesn’t help with what the meaning of (say) duplicate field labels ‘should be’. It merely says they’re not allowed in H2010 “A label cannot be shared by more than one type in scope.” The report isn’t trying to guess the future of Haskell. Heck it didn’t even describe de facto Haskell as supported by the two main compilers at the time it was published.

…hence the term “record syntax”. Your long list of record systems and their collective unsuitability for use with that syntax only excludes those systems - it isn’t a proof that a record system will never support Haskell 2010/1998 record syntax in a backwards-compatible way.

What that long list does show is just how stubbornly difficult it was (and still is) to find a solution to “the records problem” that’s:

  • suitable for Haskell, now and into the future,

  • and most people can at least tolerate, including people with prior experience with one or more of those systems from your long list.

I’m a lot puzzled by this response:

I mentioned three programming languages, of which two are very firmly in the Haskell stable. Here’s an example long list. (A reasonable sample, but by no means comprehensive. Note Pascal had records from the get-go 1970.) BTW I think records for non-functional languages are entirely suitable to compare. We don’t want mutable data structures; but we surely do want the ability to return a new data structure mostly copied from a previous ‘version’.

purescript's data syntax for declaring record structures is exactly the same syntax as Haskell’s. What it allows is same-named fields appearing in different datatypes – so it’s the semantics that’s different.

SML’s syntax if you’re going to put the record inside a data type with a constructor prefixing then is exactly the same syntax as Haskell’s. Again there’s no restriction on same-named fields. You can also use records stand-alone, again possibly with fields same-named as those inside a data.

Hugs.Trex was forced to use different syntax so that its records could co-exist with H98 data in the same module. I find that a dubious motivation – since Trex field names are in a different namespace vs H98. I think it could be arranged to choose module-wide H98 vs Trex semantics over the same syntax; including export/import of the different datatypes. (But you’d need namespace control to access Trex records from H98 semantics or v.v. – I have something of a prototype achieving that.)

As at early 2000’s a lot of people were voiciferous they could barely tolerate H98 records. A fair few of them were coming from SML. I see some have returned to SML. Anyhoo, many have abandoned Haskell.

That’s talking about design decisions up to the H98 standard.

I agree that as at H98 it would have been unwise to commit to a relatively unexplored design. 25 years have passed and … the field for design is still open. But nobody developed any robust designs, meanwhile a (by now) insurmountable fatberg of legacy code uses the allegedly ‘stopgap’ approach.

I find the emphasis on syntax curiously conservative for Haskell: most Haskell extensions deliberately also exploit additional syntax. Sometimes the old syntax can be treated as a special case of the new (MultiParam Type Classes take in H98 classes as a special case).

Yep all records proposals amount to that. The question is whether to expose the underlying positional access. And the clamour for labelled fields was precisely because that’s bad practice as soon as you have more than a couple of fields.

As a pedantic note, it actually isn’t the same syntax, because in PureScript you can do something like this (source):

author :: { name :: String, interests :: Array String }
author =
    { name: "Phil"
    , interests: ["Functional Programming", "JavaScript"]
    }

A PureScript declaration like data Person = Person { name :: String, interests :: Array String } may superficially look like Haskell, but it’s parsed differently — really it’s the equivalent of Haskell data Person = Person (String, [String]). Consult the differences from Haskell document for more information.

Alright, let’s look at how well your “insufferable-groaning” technique is working…


Other posts and threads by you about Haskell’s record syntax:

  • https://discourse.haskell.org/t/well-typed-blog-anonymous-or-large-records-with-overloadedrecorddot-and-overloadedrecordupdate/5939/2

  • https://discourse.haskell.org/t/haskell-records-compare-standard-ml/5933 (36 posts)

  • https://discourse.haskell.org/t/using-haskell-for-commercial-data-intensive-applications-name-clashes/4595 (4 posts)

  • https://discourse.haskell.org/t/the-evolution-of-ghc/4008/13

Progress towards solution in Haskell you would find satisfactory:

(...none!)

You really should consider finding another technique: your current choice just doesn’t seem to be having the desired result…

Like implementing more sophisticated behaviour?

Or combining records with Overlaps/FunDeps, to build projection and extension methods?

So I don’t just groan. I also document possible designs – as you yourself linked to earlier in the thread.

Hey thanks! I didn’t know that (so the syntax is more like SML’s). I looked at purescript a while back and saw only the more Haskell-like data decl. (Is that anonymous version a recent-ish change?) Yes I knew of the different parsing – I was focusing on the superficially similar syntax giving a different semantics.

Also I thought purescript's records were a thin veneer over Javascript. Is that still the case?

I don’t use PureScript, but I believe it’s always been like this. Insofar as I’m aware, this is precisely because they are just JavaScript records.

I also document possible designs […]

So here’s an idea for another one: since GHC will soon have a JS backend, maybe you can devise a way to bring JS’s record system into Haskell. But as you noted earlier in the thread, designs without working code don’t usually amount to anything (much like insufferable groaning).