Haskell records compare Standard ML

SML’s design for records matured around 1986, with a few tweaks in the 1997 standard.

SML’s Record syntax is superficially like Haskell: { year = 2023, month = 3, day = 7} : {year : int, month : int, day : int}. (p13 in the standard; : is how SML spells ::.) Similar syntax as Haskell in pattern matching, binding a pattern (possibly including vars) to a label within a record.

After that the similarity ends. Its records are:

  • First-class/stand-alone/anonymous – that is, not tied to any data type/not needing a data constructor prefix.
  • Labels are therefore global scope/can appear in any record.
  • Labels don’t give rise to selector functions, so need special syntax # to extract by label #month today.
  • Free-standing tuples are treated as shorthand for a record with numeric labels: (2023, 3, 7) ==> {#1 = 2023, #2 = 3, #3 = 7}. (p22 Derived Form) IOW #1, #2 are like fst, snd, but not limited to twoples.

Record syntax arrived in Haskell ~1995, as syntactic sugar in data constructors, to ease the burden of squinting at constructors/patterns with many positionally-defined fields.

Presumably Haskell designers were aware of SML – they at least stole the syntax. There’s a 1999 paper Mark P Jones & SPJ – in the middle of finalising the 1998 standard – proposing Haskell pretty much follow SML/throw out compatibility with the 1998 standard. It mentions a few design difficulties – beyond the incompatibility, but nothing compared to the difficulties we’ve all been suffering for ~25 years.

I can’t find any explanation/rationale for why Haskell’s records are so different vs SML Does anyone know of any refs? Or can give some explanations?

6 Likes

This like many questions is answered (at least briefly) in the “history of haskell” paper

One of the most obvious omissions from early versions of Haskell
was the absence of records, offering named fields. Given that
records are extremely useful in practice, why were they omitted?

The strongest reason seems to have been that there was no obvi-
ous “right” design. There are a huge number of record systems,
variously supporting record extension, concatenation, update, and
polymorphism. All of them have a complicating effect on the type
system (e.g., row polymorphism and/or subtyping), which was al-
ready complicated enough. This extra complexity seemed partic-
ularly undesirable as we became aware that type classes could be
used to encode at least some of the power of records.

By the time the Haskell 1.3 design was under way, in 1993, the user
pressure for named fields in data structures was strong, so the com-
mittee eventually adopted a minimalist design originally suggested
by Mark Jones: record syntax in Haskell 1.3 (and subsequently) is
simply syntactic sugar for equivalent operation on regular algebraic
data types. Neither record-polymorphic operations nor subtyping
are supported.

cf: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/history.pdf

5 Likes

Fwiw, I think a basic anonymous record system without any form of subtyping. Just:

{foo: 1, bar: 2}.bar

and

let foo = 1
     bar = 2
in {..} :: {foo :: Int, bar :: Int}

would be enough to justify a proposal.

We don’t need to have the best record system, and a bare-bones record system is ridiculously powerful on its own merits.

3 Likes

Thanks @sclv, yes I’ve always taken that bit of history as saying “minimalist” H98 was a stopgap design, and that MPJ+SPJ offering an improvement (and lots of other ideas for improvement in the early 2000’s) shows nobody was very happy.

In Hugs at least, there were two competing designs by 1996: both the Haskell standard and TRex (Gaster+MPJ) – which looks like (actually more than) SML. TRex does use type classes to give a powerful variety of records.

That’s the bit that I would have thought was obviously limiting from the outset. (That is, having a field name be record-polymorphic, not necessarily the field’s type.) This isn’t a limitation in SML. Surely any moderately complex data structure will have cross-references amongst record types, and you’ll want to give the one identifier the same name in both places(?)

2 Likes

Thanks, but let me stop you there. (And I wasn’t really trying to design something else, just understand why we are where we are.):

  • From the outset, . in Haskell meant function composition. And spaces around it were not significant. So that syntax (I know you want to use it because OOP and even because SQL) couldn’t/wouldn’t mean field access.
  • SML has #bar {foo: 1, bar: 2} (at least I think it does – someone please correct me if not), where #<name> gives a field selector function, used prefix as with any other function.
  • SML doesn’t have data decls introducing labels, nor { .... } appearing as part of data decls, so MkR {foo = 1, bar = 2} unambiguously means constructor MkR applied to a stand-alone record. Whereas in Haskell the { .... } is part of and binds to the data constructor syntax.

I’m not disagreeing, but I’m pretty sure that after ~25 years of getting nowhere “bare-bones” is not going to be enough. And the lipstick getting smeared on the H98 design is filling up all the possible syntax design space.

I notice for example purescript just doesn’t have anything like the H98 design; and poaches the javascript model – which is syntactically quite similar to SML. (But I think it’s interpreted rather than compiled, so probably has some performance downsides?).

3 Likes

Something like the so called “sharp notation” in SML (see: Programming in Standard ML - Robert Harper, p.49) could also be implemented in Haskell, albeit in reverse order:

class SharpAccessible a where
    (#) :: a -> (a -> b) -> b

data Point = Point
    { x :: Double
    , y :: Double
    }

instance SharpAccessible Point where
    (#) p field = field p

main :: IO ()
main = do
  let p = Point {x = 1, y = 2}
  putStrLn $ show $ p#x

I think using a # instead of a . would also solve the ambiguity problem of using . in function composition. Then the only remaining problem is that the scope of the field accessor functions should be narrowed to the record. But I think we already have the DuplicateRecordFields language extension for that.

Question to all: Are there any problems with such an approach?

Your proposed (#) has the same type as Prelude reverse apply (&).

I’m not seeing how putting an accessor operator as a method in a class helps. And it’d be just a nuisance to overload every data type with fields.

# in general is used almost as a reserved sym in many GHC extensions – see Magic Hash for example: " Enables the use of the hash character ( # ) as an identifier suffix." So your p#x would be parsed as magic hash suffixed to identifier p, then p# applied to x.

Put the code into the Haskell playground, and you’ll see (#) is correctly identified as an infix function. Tested with GHC 8.6.5 to GHC 9.6.1.
The MagicHash extension isn’t part of GHC2021, Haskell2010 and Haskell98. It’s only activated with the flag -fglasgow-exts, but that is deprecated anyway.

Your “only” is incorrect: MagicHash extension is individually enabled. But I was giving that as only one example among many as to why (#) would be a poor choice. Especially since there’s already (&) to do the same job.

If you’re playing around, the acid test would be to declare two types with the same field label. DuplicateRecordFields allows the decls. But won’t resolve the ambiguity in the access.

(Yes, -fglasgow-exts is deprecated, to prefer individually enabling. That " [some] extension isn’t part of GHC2021, Haskell2010 and Haskell98." doesn’t stop it being ‘blessed’ for the longer term.)

Oh and BTW you’re misinterpreting that “sharp notation” in ML: the # isn’t a standalone operator, it’s lexically part of the accessor name – prefixed as much as GHC’s MagicHash is suffixed. On that p.49 #lab is the lexeme; you could potentially in the same scope have a separate identifier lab not denoting the field accessor. It would seem that in all HLLs, accessing fields needs an amount of hardwired magic.

(&) and (#) aren’t really equal:

main :: IO ()
main = do
  let p = Point {x = 1, y = 2} 
  putStrLn $ show $ p#x + p#y     -- this works
  putStrLn $ show $ p&x + p&y     -- this works not
  putStrLn $ show $ (p&x) + (p&y) -- this works too

Maybe it has something to do with the associativity and precedence of the operators. I haven’t really thought about that either.

I also now realize that it is not necessary to define a typeclass for the operator.

You’re right, DuplicateRecordFields does not work as I thought. I can use it to declare multiple records with the same field names. But when I create one of the records, I still get an error.

Correct, in SML the # is not an operator, I expressed it imprecisely.

Pedantic note: (&) is in base but not in Prelude.

(A shame in my opinion; it would be nice to have (&) in more places!)

I’ve been thinking about the associativity and precedence of a possible new operator for record access. Suppose I have a nested record like this:

data X = X {y :: Y}
data Y = Y {z :: Z}
data Z = Z

xyz = X {y = Y {z = Z}}

And a reverse application operator, say (#):

(#) :: a -> (a -> b) -> b
x # f = f x

With this operator one can write the access to the lowest level element of xyz as follows:

z (y xyz) = (y xyz)#z = (xyz#y)#z

This shows that the (#) operator must be left-associative in order to omit the brackets. In addition, it should have the highest precedence. But that is exactly the default setting for infix operators, namely infixl 9. The (&) operator from base, on the other hand, has the fixity infixl 1.

In my opinion, an operator other than . would be better for record access since . is already used in function composition and qualified imports. If there’s a conflict with GHC’s use of #, we could think of a different name, although I like the # since SML also uses it for record access (though not as an operator).

As far as I can see the problem with the records, there are two main problems: First, that the access to record fields is desired as described with the # operator, but with a .. Secondly, that no duplicate field names are possible in different records. Maybe there is a way to improve the DuplicateRecordFields extension so that it does not only can declare duplicate names, but also use these names as accessor functions?

OverloadedRecordDot, and other extensions in section 6.5 of the User Guide. This is all bleeding-edge currently – see at the link to ‘Solving HasField constraints’.

Dot has such a strong precedent for field/component accessing – including OOP and SQL – using anything else would be weird. And it fits naturally with dot for qualified imports/modules.

It’s dot as function composition that’s a terrible idea – especially wanting to not surround it with spaces.

There’s no precedent for using # as field access. SML wouldn’t be a strong precedent anyway amongst Haskellers; SML’s use of # is different to what you’re proposing.

2 Likes

I don’t think the . for function composition is terrible. Composition is usually denoted by f ∘ g in mathematics, so a . fits well. Again, this is elementary Haskell and should not be changed. It may seem strange to use something other than a dot to access record fields, but I believe that with such a simple thing, users would easily understand. Haskell already has significantly different semantics than other languages, then this wouldn’t matter anymore.

I’m confused about what you attempt to solve… Emulate SML syntax in Haskell?

BTW, the linked paper says the sharp notation is bad:

image

In SML the # is not an operator but a notation. It is therefore irrelevant whether sharp notation is discouraged in SML. Using a # was also just a suggestion. By referring to SML, I just wanted to point out that the sharp character in SML is also used to refer to record fields.

Another possible name for such an operator would be -<. This would also describe the reverse application quite well.

The problem that could be solved is that people want to write things like xyz.y.z to access record fields. With such an operator one can write xyz#y#z or xyz -< y -< z instead.

Yes, I’m not following what @jodak is now attempting. The Haskell community has decided to use . as postfix/tightfix field accessor, and that’s already implemented in GHC v9, per my links above. It lives alongside . (non-tightfix) as function composition – rather awkwardly in my opinion.

If you’re asking what I was interested in as o.p., I’m curious why Haskell doesn’t have stand-alone/anonymous records as does SML. It was a question about semantics in the first place, not syntax.

That you can now do (GHC v9), with that dot syntax.

Yeah, not clear what Harper thinks is “unreadable”. The (p:real*real) is a pair (in Haskell terms) without meaningful names for fst, snd. I think he’s complaining about positional access and preferring to use named fields – to avoid inadvertently swapping the x, y coordinates. I think (from the example just after you snip) he’s preferring meaningful field names and/or pattern matching on lhs of a function equation. IOW he’s against any of Haskell’s fst p or x p (H98 style) p.fst or p.x (OverloadedRecordDot) or p#fst or p#x.

Consider a function taking two points as arguments, and calculating the distance between them. Now there’s two xs and two ys.

1 Like

I agree, it’s rather awkward. In my opinion the only way to solve this is to use something other than . for either record access or function composition. I am against changing the function composition operator.

EDIT: Interesting, in Lightweight Extensible Records for Haskell - Mark P Jones, Simon Peyton Jones (1999)p.3 they believe that # would be a good choice for function composition. I can follow their explanations and I’m not fundamentally against. But I still question if this wouldn’t be too big of a breaking change.

I agree with @AntC2 that the period was a wrong choice for function composition. I’m growing more fond of Elm’s <|, |>, <<, >> menagerie the more I use it.

So for everyone who’s annoyed with not being able to:

  • use a period to denote the selection of a component from a structured value;

  • use the proper (mathematical) symbol for function composition rather than a period;

…please go and get that time machine up and running, so the rest of us don’t have to listen to all these “moans and groans”

1 Like