Haskell records compare Standard ML

AntC2 · May 6, 2023, 1:37am

Thanks @atravers, I hadn’t come across that page before.

I see it dates to 2015, before discourse, even before GHC’s using github. Haha should I add a nit that there’s too many places to nitpick? And the only person to contribute since 2015 has been yourself.

I see the lack of anonymous records/record-polymorphic field names as not a ‘nit’ but a great big gaping hole that other languages – even other functional languages – even other Haskells – have filled long ago. So I think not material for that Nitpick page.

I was hoping that asking here would prompt somebody to say look there’s this or that package meeting the need.

There is Lenses, which supports record-polymorphic field names, but not ad-hoc/anonymous records; and they need too much declarative machinery/template Haskell to be ‘lightweight’ imo. There’s that whole safari park full of wild operators – if it has to be that prolix, what’s wrong with it?

atravers · May 6, 2023, 4:22pm

which “Haskells” - implementations or “spinoff” languages?
how does each “Haskell” solve the “anonymous records/record-polymorphic field names” problem?

As it happens, some information relevant to this ~~gaping hole of a~~ topic has been widely available since 2007: see slcv's quote from page 16 of 55 in A History of Haskell: Being Lazy With Class by Paul Hudak, John Hughes, Simon Peyton Jones and Philip Wadler.

Presumably this is still the case, as your hope that:

…seems to have brought forth yet again all the same stuff that’s been previously ~~regurgitated~~ rehearsed elsewhere (yes, even lenses and -XOverloadedRecordDot is now old news). But if you really think SML has got it right, some of these could be worth looking at:

Lazy Standard ML from FOLDOC
A Compiler for Lazy ML (Lennart Augustsson)
The Chalmers Lazy-ML Compiler (Lennart Augustsson and Thomas Johnsson)

Enjoy.

AntC2 · May 7, 2023, 7:22am

Thanks, I’ll reorder the points in your post (and look at the Lazy ML links later – although I’d already had a quick look at ML before starting this thread). Mostly I’m going to repeat what’s already been said up-thread …

And @sclv quoted that passage way up-thread. (And I was already very well aware of it.) The part you omitted shows this was describing the position as at 1993, not 2007 when that paper appeared. So nothing had improved in the preceding decade? Not true: Mark P Jones had designed and got working two distinct approaches for records in a Haskell. Indeed both working simultaneously (albeit primitively) before the H98 standard.

In 2007 that was documenting the history (the what). It’s not adequate as an explanation for me (the why). To continue to make that claim two decades further on with all the advances in higher-order/System-F-with-bangles/Dependent Haskell says to me: GHC doesn’t care.

I can’t speak as a compiler implementer nor type theorist. The usual Haskell way is to review “the huge number of” prior designs for feature X; boil them down to some underlying mechanism; express that in the type system. As at 2007 – even as at a decade later – GHC tolerated multiple competing designs in the same feature-space, and that’s probably what would be needed. To this day we still have FunctionalDependencies and TypeFamilies and OverlappingInstances and UndecidableInstances, with none of them withering away.

(Fair question: am I just whinging/asking for the impossible?)

Spinoffs most obviously purescript, which I mentioned. They don’t seem to have wasted time wringing their hands about ‘ideal’ designs.
Haskells: Hugs/Trex, and new improved Hugs/Trex. (I’m not blowing my own trumpet/it’s nothing to be proud of. If even a non-compiler non-propellor-head like me can hack in C++, why can’t GHC do a much better job?)

In purescript it seems to be by a thin veneer over the javascript implementation. (Yes, you might call that cheating, but it does have to mesh with purescript’s type system.)
In Hugs/Trex there’s an extension to the H98 type system, Section 3 of the paper – which today we’d recognise as PolyKinds.
In my improvements, I’ve relied on FunDeps and Overlapping instances – having first slightly liberalised the Hugs implementation of them; but never allowing the porridge of behaviour in GHC.
So (full disclosure) the 1996~2006 Hugs/Trex implementation allowed ‘magic’ auto-instances for record access that an end-user couldn’t write for themselves. By easing the FunDep/Overlap rules – just a little, I’m now not needing such magic.

atravers · May 7, 2023, 8:58am

Oh, the joys of long, winding threads: I’ve removed that duplicated content.

…much like GHC still doesn’t have direct support for recursive import for modules, instead relying on the {-# BOOT #-} workaround - the following quote from this post in that issue:

…would also seem to apply to the current state of affairs regarding records. But then again:

…perhaps you just need to find a group of equally-motivated devs who “feel like” bringing records in Haskell up to your specification.

Hugs was written in C for maximum portability. I too have tried to make it work with its “OO descendant” (primarily for the template support, methinks) but soon went back to C, much like Torvalds & co. did for a particular OS kernel: all the best with your conversion of Hugs away from C!

Functions, Records and Compatibility in the λN Calculus (Laurent Dami).
Label-Selective λ-Calculus Syntax and Confluence (Hassan Aït-Kaci and Jacques Garrigue).

(…there could be other papers; I just don’t have my old OS archives readily available at the moment.)

wiz · May 7, 2023, 11:06am

While we’re at it, can we have a module-record-typeclass unification, like they do in The next 700 module systems paper?
No? Okay, I’ll be going then…

atravers · May 7, 2023, 12:08pm

The experience of the current effort to make GHC more modular along with having read Ben Moseley and Peter Marks’s Out of the Tar Pit now has me pondering this question: should some varieties of types (and the corresponding structured values) be allowed at all?

From the modular-GHC paper:

(page 33 of 59)

Caveat warning Passing several services to a function can look
cumbersome: why not bundle some of them into a single record
(say XYZEnv)? In our experience, there is no one-size-fits-all record
like this and the pattern is only a local maxima, instead, there is
a slightly different record best suited for each function. Thus, the
drive to bundle these services into a single record is exactly how
coupling begins. The sequence of events is:

Services are bundled into records such as XYZEnv.

Most of the functions which input XYZEnv use most of its
services or heavily related and thus have high coherence.

The code base evolves by adding new functions or features;
some functions require more than what is in XYZEnv, but still
require some services that exist in XYZEnv.

This produces an incentive to create new slightly altered
XYZEnv, or expand the existing XYZEnv. Creating more XYZEnvs
is cumbersome, and appears redundant, so the “path of least
resistance” is taken and an existing XYZEnv is extended with
whatever new fields are required for just the new functions.

Now that XYZEnv has grown, there are two secondary effects:
First, coherence is reduced because the number of functions
which use all of the XYZEnv fields is lower. Second, there is
more incentive to pass XYZEnv around because its functionality
has expanded.

And now we arrrive at a vicious cycle; more and more services
are added into XYZEnv because it is conveniently threaded
through many functions. But it is threaded through many
functions because of the many services it provides.

This is how we ended up threading DynFlags and HscEnv every-
where and storing arbitrary values into these records!

…and from Out of the Tar Pit:

(page 52 of 66)

9.2.4 Benefits for Data Abstraction

[…]

We believe that in many cases, un-needed data abstraction actually rep-
resents another common (and serious) cause of complexity. This is for two
reasons:

Subjectivity Firstly the grouping of data items together into larger com-
pound data abstractions is an inherently subjective business […]. Groupings which make sense for one purpose will inevitably differ from
those most natural for other uses, yet the presence of pre-existing data
abstractions all too easily leads to inappropriate reuse.

Data Hiding Secondly, large and heavily structured data abstractions can
seriously erode the benefits of referential transparency […]
This problem occurs both because data abstractions will often cause
un-needed, irrelevant data to be supplied to a function
[which lowers its coherency], and because
the data which does get used (and hence influences the result of a
function) is hidden at the function call site. This hidden and excessive
data leads to problems for testing as well as informal reasoning […]

It would appear that the current difficulties in making GHC more modular had (to some extent) been predicted as far back as 2006. So what was Moseley and Marks’s advice to avoid this problem?

 -- allowed
data Enumeration = Alpha Char | Beta Int | ... | Psi Double | Omega Bool

 -- banned
data Product = TarPot Char Int ... Double Bool

This also appears to be the solution which is being adopted by the modular-GHC developers - from their paper again:

(page 32 of 59)

In the following example, we can see that function foo requires
(and probably uses) two services: Logger and TmpFs.
foo :: Logger -> TmpFs -> ... -> IO ...
In current GHC, many similar functions would have the follow-
ing prototypes instead because services behavior may be configured
via command-line flags and some of their state may be stored in
the session environment:
foo :: DynFlags -> ... -> IO ...
 -- or
foo :: HscEnv -> ... -> IO ...
A significant part of our work is to refactor functions that have
the latter interfaces (DynFlags and HscEnv parameters) into functions
with the former interface (explicit passing of each service).

Perhaps the simplest way to solve the “anonymous records/record-polymorphic field names” problem is to disallow the types which necessitate their existence - product types, particularly those with inordinately-many components (such as HscEnv and DynFlags). But such a restriction is probably best left for yet another “Haskell” to explore…

AntC2 · May 8, 2023, 1:38am

Hmm. Probably all varieties of types have their uses. The trouble is there’s plenty of examples of each type getting pushed beyond its sweet spot as systems evolve. And then the maintenance headache of: it seems a lotta work to convert this tuple to a named data type everywhere, let’s just kludge in one more field …

The first candidate for deprecating would be Lists: almost never are they the right solution. And yet … all that LISP heritage … and all those Beginners’ courses … and StackOverflow IP …

Tuples are ok for a thrown-together quick-and-dirty. But tuples with 20 fields are ridiculous. Nested tuples/pairs are ok-ish. But nested 20 deep?

I’m not convinced a function with 20 arguments is a big improvement over a tuple with 20 fields: positional access is notoriously error-prone. So then you make each argument a distinct newtype – well that would help just as much accessing tuples as accessing multi-arguments.

So now your namespace is peppered with newtype types and newtype constructors – some of those perhaps used only in one function call. newtypes are often a pain because they allow only a single ‘content’ field. Contrast

type Colour = Rec( r :: Int, g :: Int, b :: Int)
cyan = ( r = 0, g = 100, b = 100 ) :: Colour

[I’m using Hugs/Trex syntax, just for something definite. This should not be taken as an endorsement over ‘Brand X’ record design.]

A type synonym, so more lightweight/less namespace pollution
Can have multiple fields (with the slight benefit I can permute the order they appear in).
Self-describing/ad-hoc: the signature on cyan is not needed – it’s there as the usual Haskell belt-and-braces.
So actually, I didn’t need to declare Colour at all: the value with those field names gives all the type security I need.
What’s more – and unlike data – I’m forbidden from using positional access/I must use field names. (Which compile to positional offsets, as usual, so no memory/performance downside.)

The ‘compile to positional offsets’ bit might vary between languages/implementations. It looks to me like purescript has some sort of dynamic typing for records/lookup to a descriptor to get an offset at runtime. SML allows only pre-defined record structures, so that it can keep a handle on offsets.

Yeah. For some value of “inordinately many”. Good luck with defining/getting consensus on that. (And with stopping people evading the limit by just nesting 10 x 10.) When you have an answer, we can limit record types to the same.

Or … ? Support ways for a function to say: I need only two fields out of HscEnv, I don’t care what other stuff is in it/whatever passes it to me [**] can refactor to its heart’s content. Haskell records/with GHC extensions can kinda do that today, but it’s leaky.

[**] Of course it’s not really passing the whole record; it’s passing a pointer to the start of the vector. And other functions accessing HscEnv get the same pointer. What worries me with that “explicit passing of each service” is do we extract many sets of a few fields each out of HscEnv? That seems a lot of thrash on the stack.

So would ‘Modularizing GHC’ be following the approach they’re proposing if Haskell already had a stand-alone/lightweight records system?

AntC2 · May 8, 2023, 2:13am

Oh, I was going to take great exception to something from the Tar Pit:

I know the software industry is as forgetful as it is innovative, but are these people serious? We’ve had criteria for ‘grouping of data items together’ and a whole durned mathematic basis since early 1970’s.

I think what they mean is: if you start your systems design by just throwing data elements together, agglomerating as you expand functionality, you’ll get a mess. Then it’s subjective as to how much effort goes into sorting out the mess ‘properly’ vs just kludging the worst parts – because the customers (and the Sales Department) want functionality and they want it now and stop with all that navel-gazing fiddling about.

Personal story: I’ve been involved with the guts of internal data structure design for two mid-range ERP packages. One was beautifully normalised/a joy to design queries over. The other’s was a whole series of kludges with ‘spare’ fields re-used for purposes they were never intended; with character fields holding both a tag and a stringified number – total nightmare in SQL to extract the subfields and turn them into a format to join to other tables – and I was doing a lot of that because its data integrity was shockingly bad. Guess which ERP went out of business?

atravers · May 8, 2023, 5:33am

AntC2 · May 12, 2023, 10:04am

That relies on being able to design/structure features orthogonally. Nice ideal to aim for but …

Take OverlappingInstances with FunDeps – or with Closed/overlapping TypeFamilies: how to guess what the overloading (instance body) is supposed to do – even if we figure out the types? I had to think that through quite hard to get Hugs/Trex records to play nicely with instances.

If we want Separation of Concerns, I think that means treating (say) record fields ( x :: Float, y :: Float ) orthogonally to ( r :: Int, g :: Int, b :: Int ), so that ColouredPoint can be both a Point and Coloured.

Perhaps that’s a way to handle DynFlags and HscEnv rather than breaking them up into tiny groups of fields. For Hygienic Orthoganality (™) we need the type system to ensure a function touches only x, y. Today’s HasField does some of the job; but it doesn’t stop a routine accessing by old-fashioned pattern matching as well as new-fangled .x.

atravers · May 12, 2023, 11:03am

can we have a module-record-typeclass unification, like they do in …

That relies on being able to design/structure features orthogonally. Nice ideal to aim for but …

…it’s often like vena cavas - some features end up being “superior” to others (and why the expression problem continues to persist).

As for (somehow) using types for e.g modular programming…Claus Reinke investigated whether something like that could work in his thesis:

…a bold decision for that time, considering the growing interest in type systems as popularised by the likes of Miranda^(R), Lazy ML and others, culminating in the appearance of Haskell. It begs the question: 25 years later (to the month!), can something resembling Heinke’s language now be subject to practical type checking?

AntC2 · May 13, 2023, 11:32am

It’s not that I’m ignoring that effort. In fact I’ve been struggling through the ‘Modularizing GHC’ paper. I think I’ve put enough effort into understanding it: it’s the paper to blame if I’ve made this much effort and still it seems incoherent. They don’t like the style of GHC-as-a-language-service. They wouldn’t do it that way if they were starting from scratch. (Also GHC has been going for over 30 years; it’s been through the hands of many maintainers; there have been many different styles.) Frankly: welcome to the real world of successful software artefacts; suck it up.

There’s a long list (Section 3) of gripes about GHC internals. There’s a largely disconnected and abstract/theoretical Section 4 on principles of ‘Domain-Driven Design’.

§4.1 Ubiquitous Language and Type-Driven Design

The ‘Ubiquitous Language’ means the terminology used in comments/variable naming. Why is that in the same section as Type-Driven Design?

GHC also doesn’t fully exploit typedriven design: a lot of functions are partial;

Good luck with finding a total function from source language to object code.

I’m not sure I see any benefit in poking about inside GHC like this. A large part of the emergent design of large-scale, long-lived, multiply-maintained systems is personal taste. You can put a huge amount of distracting effort into reorganising internals without ever delivering concrete improvements in maintainability. (Joel on Software has a story about Borland rewriting Quattro Pro.)

Now there do seem to be some nasties wrt DynFlags and HscEnv. The proper use of those would be to read from the environment/command line/configuration file(s) and fix them for the duration of the compile run before looking at any Haskell source. This is too much of a ‘batch compile’ approach. But it seems sub-parts of DynFlags get overridden for particular sub-purposes (in HLS?). Certainly those fields should be split out. Also working in Visual Studio/HLS, you can change and change back config settings as you go along. I’ve noticed sometimes HLS just gets stuck; you have to abandon and reload.

That’s where the ‘durned mathematical basis’ for grouping data items comes in: separate the ones representing the (fixed) target environment vs the (variable/reconfigurable) compile-time settings. But I don’t think that means going so far as splitting those big (not actually) static conglomerates into zillions of tiny fragments.

What does this have to do with a records system wot we do not 'ave in Haskell? Next post (after getting that rant off my chest) …

AntC2 · May 13, 2023, 12:11pm

Since we’re into making predictions, I predict this’ll replace one problem by a different one. the reason for product types is to put together values that belong together/change at the same time/depend on the same triggers. (This is just tedious old-hat relational theory again: “same triggers” = keys/functional dependencies – that is relational FDs, not Haskell ones.)

domain concepts aren’t always represented at type-level; [section 4.1 again]

If a domain concept is a config flag, you could represent each flag as a type, with each setting as a possible data constructor. But now some flags ‘belong together’: do you really want separate handing for FlexibleInstances, FlexibleContexts, UndecidableInstances, FunctionalDependencies, Overlap*? They’re likely to be needed in all the same places.

And if (groups of) DynFlags get overridden for some sub-purpose but the type system knows only the type of the flag, how to type-safely distinguish the ‘global’ flag from its local override?

The trouble with Haskell records is field handling isn’t “represented at type-level”. Consider some Hugs/Trex:

gbSwap ( g = g, b = b | rest ) = ( g = b, b = g | rest )
-- inferred ::  (a\g, a\b) => Rec (g :: b, b :: c | a) -> Rec (b :: b, g :: c | a)

This type shows gbSwap takes a record argument; returns a record result; touches exactly two fields in the record named g, b; expects the record to contain other fields rest :: a, but doesn’t touch them/just passes them back. (Yes, Trex’ display of that type is unfriendly [**]: I’m not advocating for exactly Trex.)

So the field names participate in the types and in effect reveal which functions read and/or write which fields. This is both more lightweight code and better documenting of the function’s behaviour.

[**] Trex follows H98 in making field names lower-case. But they’re constants, not variables. (Compare HasField represents field names as type-level String.) So that Rec( b :: b, ...) means there’s a field name b of type (variable) b. Note that field names are in a different namespace vs data constructors/variables vs type constructors/variables, so the human is confused, not the compiler. I’d spell field names starting Upper, like other constructors. In rest :: a, rest is a regular data variable, a is its regular type variable, so both correct to be lower-case.)

atravers · May 13, 2023, 2:52pm

Instead, there’s been a “hodge-podge” of “variously-baked” ideas, proposals et al over the past few decades, of which none have been adopted in Haskell (with a few appearing in those other “Haskells”). It’s an exercise in “herding cats”, and considering how well a recent attempt went…just be grateful that you now have -XOverloadedRecordDot:

…this being the legacy of Miranda^(R), which precedes Unicode by approximately a decade. Frankly: welcome to the reality of successful [but not at any cost] software programming languages; suck it up and move it along.

Otherwise, let’s just banish everyone who’s ever “vented unhappy thoughts” about the current situation on a tropical island, and no-one returns until the Haskell “records/named fields” problem is solved. It could even be made into a TV show to fund the endeavour: “Survivor - Fields and Records”…maybe pay-per-view?

AntC2 · May 13, 2023, 10:39pm

Touché. Thanks for the feedback. Yep, that’s pretty much where I am: on my desert island, I’ve modified Hugs – although Trex was already streets ahead of GHC.

No, I don’t use it/would rather it never happened. (Given that we are where we are with dot-as-composition.) -XDuplicateRecordFields I do use. (Wish I could figure out how to implement it in Hugs.) That came with 8.0, so I see no merit in upgrading to GHC v9.

Not true: SML has an actual working and long-standing implementation; Hugs/Trex has an implementation; purescript has an implementation; SPJ has co-authored several "precise, worked-out design"s.

atravers · May 14, 2023, 4:44am

So the answer to the questions in your original post:

…then must surely be something like:

…as opposed to:

Q.E.D.

Topic		Replies	Views
Couldn't records be solved entirely at the type level? Learn	31	2541	June 21, 2024
Using Haskell for commercial/data-intensive applications -- name clashes	3	784	May 27, 2022
Computed Properties for Haskell Records	14	1755	December 2, 2023
[Well-Typed Blog] Anonymous or large records with OverloadedRecordDot and OverloadedRecordUpdate Links	8	1140	March 9, 2023
Haskell records in 2025 (Haskell Unfolder #45) Links	28	1021	July 2, 2025

Haskell records compare Standard ML

Related topics