New notation for records?

So, I have been thinking (shocking, I know). Consider:

data Country = Country {name1 :: String, company :: Company} deriving Show
data Company = Company {name2 :: String, boss :: Employee, car :: Car} deriving Show
data Employee = Employee {name3 :: String, age :: Integer, empCar :: Car} deriving Show
data Car = Car {name4 :: String} deriving Show

(Yes, I’m not using duplicate field names because that’s a different discussion.) When working with records, getting the values is never a real pain point, not even for nested records: the named field functions compose elegantly to extract what you want.

f :: Country -> Integer
f = age . boss . company

Updating a deeply nested value is the issue. You have to get to the final record, apply the changes, then recursively apply changes until reaching the first record. It gets verbose fast. It looks even worse when you want to do multiple changes on other deeply nested records.

g :: Country -> Integer -> Country
g r i = let x1 = company r
            x2 = boss x1
        in r { company = x1 { boss = x2 { age = i } } }

You can change it a bit:

g :: Country -> Integer -> Country
g r i = r { company = (company r) { boss = (boss (company r)) { age = i } } }

I looked at this and thought: “could you introduce something that desugars into this?”
I thought of something like this:

r { company ~= { boss ~= { age = i } } }

The ~= should grab the expression to the left of its parent { and replace itself with the function to its left followed by a =. That way the previous expression would be equivalent to:

r { company = (company (r)) { boss = (boss (company (r))) { age = i } } }

Another equivalence:

r { company = (company (r)) { boss = (boss (company (r))) { age = i }
                            , car = (car (company (r))) { name4 = "new" }
                            , name2 = "cmp" }
  , name1 = "ccc" }
-- is this
r { company ~= { boss ~= { age = i }
               , car ~= { name4 = "new" }
               , name2 = "cmp" }
  , name1 = "ccc" }

At this point I noticed the commas and the blocks, and thought of rewriting this:

r { company ~= { boss ~= { age = i }
               , car ~= { name4 = "new" }
               , name2 = "cmp" }
  , name1 = "ccc" }

into this:

  r do
    company do
        boss do
            age = i
        car do
            name4 = "new"
        name2 = "cmp"
    name1 = "ccc"

My first question is: is this even possible?
My second question is: am I insane?
What do you think?

2 Likes

Just to make sure, are you aware of lens?

They are sometimes a bit daunting error-wise, but they certainly solve what you correctly identified as the main problem: updating nested records. E.g.:

AClimb   -> \lp -> lp & speed . _1 .~ (-1)
                      & speed . _2 .~ 0
5 Likes

I wouldn’t give this too much thought - while no-one else wants to admit it, what everyone really wants is C-lassic record notation:

yerf' = yerf.larry.curly.moe.laurel.hardy =: 73

6 Likes

Yes, I am aware of optics, lens, and OverloadedRecordUpdate, and use them. Thank you for your consideration.
I was just playing with records, saw the pattern, and thought that it would be funny if record do-notation would solve the problem.
No need for Template Haskell, typeclasses, polluting record types with other types.
Can you imagine, monad do-notation, applicative do-notation and now record do-notation.
I wonder if it would be more beginner-friendly for newcomers than, say, lenses (probably).

2 Likes

Eh, if type families weren’t slow as molasses, something like get @"foo.bar.baz" could be assembled out of accessors for "foo", "bar" and "baz" separately. At that point the separator can be any character and the language doesn’t need a syntax extension.
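A rough sketch of that kind of assembly, though it cheats by taking a type-level list of field names instead of actually splitting the "foo.bar.baz" Symbol, and uses a type class rather than type families; GetPath and getPath are made up for illustration and simply chain GHC.Records.HasField:

{-# LANGUAGE DataKinds, KindSignatures, TypeOperators, TypeApplications,
             AllowAmbiguousTypes, MultiParamTypeClasses, FlexibleInstances,
             ScopedTypeVariables, UndecidableInstances #-}

import GHC.Records (HasField (..))
import GHC.TypeLits (Symbol)

-- Walk a type-level list of field names, composing the individual accessors.
class GetPath (path :: [Symbol]) s a where
  getPath :: s -> a

instance GetPath '[] s s where
  getPath = id

instance (HasField name s b, GetPath rest b a) => GetPath (name ': rest) s a where
  getPath = getPath @rest . getField @name

-- With the records from the opening post in scope:
bossAge :: Country -> Integer
bossAge = getPath @'["company", "boss", "age"]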

This would then be extendable to the FFI land too: offset into a C struct works completely differently under the hood, but ultimately means the same thing.


From what I gather, it precisely mirrors what the resulting program does: for every record you update you produce a new record with one differing field.

Over the years there have been a lot of discussions about how Haskell needs convenient record notation (implemented a decade ago, of course). I however find the entire idea of nested record updates to be at odds with functional programming as a whole. The argument goes roughly like this: if I want to update foo.bar.baz directly, and it’s by design safe to do so relative to the rest of foo, why don’t I just have baz separately (or at the same level as foo)? Repeat the thought experiment until you no longer need nested record updates.

5 Likes

Not everyone: I do not want nested records. (And there’s nothing secretly-in-my-heart I’m failing to admit to myself.) I’ll illustrate:

This illustrates the classic/naive hierarchical data structure that Commercial IT abandoned in the ’80s. Why does Haskell persevere with that thinking?

Wrong. Join Country to Company to Employee. You now have a flat/wide tuple/record [**]; grab the age from that, or update the age into it.

[**] in which naming the same thing with the same name will make joining much clearer, no clutter of different names for the same thing; no need to RENAME or SELECT-by-name one of the same things.
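A minimal sketch of the flat/wide shape you’d work with after that join, hand-rolled here as a standalone Haskell record (the type and field names are made up, and kept distinct only to stay within plain Haskell 2010; in the poster’s setting this would be a view over the underlying relations):

-- One wide record standing in for the join of Country, Company and Employee.
data CountryCompanyEmployee = CountryCompanyEmployee
  { countryName  :: String
  , companyName  :: String
  , employeeName :: String
  , employeeAge  :: Integer
  } deriving Show

-- Grabbing the age is just a selector; updating it is a single shallow update.
setAge :: Integer -> CountryCompanyEmployee -> CountryCompanyEmployee
setAge i r = r { employeeAge = i }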

Now admittedly update-through-Join can be hazardous. (But not in straightforward cases like this example.) This is a reasonable area for research; you’da thought Haskell would be right at the forefront. But no. GHC is not even at first base.

Addit: IOW what @BurningWitness said. That post arrived as I was typing.

3 Likes

That’s the idea: the current record functionality but with a better syntax, one that’s not intimidating or full of brackets and commas, and that doesn’t require keeping track of references like (boss (company (r))) to the nested records you edit on the way down.

  r do
    CT.company do
        CP.boss do
            B.age = i
        CP.car do
            CR.name = "new"
        CP.name = "cmp"
    CT.name = "ccc"
-- or maybe make the following sugar for the above:
-- the following avoids duplication,
-- but the above makes clear which field belongs to which record if you get
-- your indentation wrong, and will error accordingly
  r CT.do
    company CP.do
        boss B.do
            age = i
        car CR.do
            name = "new"
        name = "cmp"
    name = "ccc"

Here’s the notation with the records imported qualified to avoid name clashes. It works because it desugars to the field functions exactly the way records demand, avoiding ambiguity.

There’s no need to import lens, no need to import anything complicated, it works out of the box, you can do multiple nested updates, you can do whole record replacements on the way down if you wish. No need to overload anything.

I agree. But I had the idea and wanted to share to see if records would be more palatable to write this way.
With do-notation it looks functionally imperative, like monadic do-notation. Which I think is fitting.

I do agree that the do-notation would be very verbose compared to C-notation for this particular case, since you need six lines to reach hardy, but if you plan on doing multiple updates on the way then it’s just one extra line per update.
Or you can sing Baby Shark on the way and put all the do’s on one line instead of dots; that would be funny.

That’s another way too.

Is there anyone doing this research?

By “classic notation”, I meant no other abstractions:

  • no HasThis, HasThat, HasTheOther or HasSomeMore classes or instances;

  • no row, column, diagonal, corner, vertex, height, width, length, size, volume or other obtuse forms of polymorphism;

  • et al!

…the ol’ warhorse didn’t need any of that, and neither did Lazy ML.


So treat records like small DB relations - I think I’ve seen that before:

Alternatively:

2 Likes

If you have what is really a database structure of this-tuple cross-referencing that-tuple cross-referencing t’other-tuple, today’s only workable approach is to put the lot in an external database (could be in-memory) and interact with it via an SQL library. But then Haskell has no merits for that over any commercial language – and indeed Haskell’s laziness gets in the way (as others already mentioned). So I wouldn’t use Haskell – unless there’s some overpowering other reason for using Haskell.

Curiously (ref the discussion at @atravers’ “Alternatively”), SQL was at first designed to be declarative: that’s the only bit of Codd’s notion of an Algebra that survived a bunch of unimaginative IBM Engineers. The core of SQL (ignoring the ‘Procedural’ features) still is declarative – if you squint the right way; and lazy-ish: if your query merely wants to know whether a record EXISTS meeting some condition, it’ll return as soon as it finds one. (See also QUOTA queries.)

If you were trying to get there, I wouldn’t be starting from SQL – which was awful in the 1970s and has just got awfuller down the decades. Regarding laziness for updates: SQL from the beginning was aimed at shared databases, with single-user being an afterthought when powerful-enough workstations came along. It can’t afford updates to be lazy, for fear of exposing an inconsistent state to other users/other threads. Any proposal to include weird behaviour like laziness would come with an avalanche of verbose syntax; and probably its semantics would be only pretending; and the DB companies would implement (something) with the utmost bad grace, if at all. I’d be starting from some version of a Relational Algebra.

PureScript is Haskell-alike and at least has stand-alone records not tied to a particular datatype. Its records are implemented as JavaScript objects under the hood, so are limited by that. They do claim to be “extensible”, but that term is open to interpretation. JavaScript has strict semantics; PureScript follows suit, for conscious/deliberate reasons.

"extensible"

Given there was a Haskell with extensible records before 1998 (more powerful than ML, as I understand it); and then there was HList 2004 (which has been upgraded over the years to keep step with fancy type-acrobatics), you’da thought there would be continuing research.

There is a dude tinkering away with Hugs/Trex. It needed quite a bit of hacking with OverlappingInstances and FunDeps before there was enough structure to manipulate Trex records (as opposed to polishing them nicely and putting them on a shelf); that post reports enough for an algebra over Trex records:

r == (r `proj` r2) `app` (r `remv` r2)    -- for all Trex records r, r2

Those three operations are sufficient to build a [Natural] Join; indeed they are expressively complete for any operation over ‘labelled tuples’, provided they’re type-compatible.

In-memory solutions can use dictionaries instead of tables for most of the functionality (Elm conference video on it) and some dictionary types can be spine-lazy (e.g. Lazy types in radix-tree). So I’d say less of a “no merits” and more of an “untrodden design paths” situation.

Then that “dude” has a fundamental problem to solve first: making that extended Trex work without functional dependencies. Why? Because functional dependencies make the Haskell type system Turing-complete, which is why most other Haskell implementors dislike them so vehemently.
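For readers wondering what computing with functional dependencies looks like, here is the classic toy example (nothing to do with Trex itself): addition on type-level Peano naturals, where the fundep a b -> c makes the type checker compute the result.

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

data Z     -- zero
data S n   -- successor

-- The fundep  a b -> c  tells GHC to compute c from a and b.
class Add a b c | a b -> c
instance              Add Z     b b
instance Add a b c => Add (S a) b (S c)

-- e.g. a wanted constraint  Add (S Z) (S Z) c  forces  c ~ S (S Z).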

At some point in the future, functional dependencies will no longer exist in Haskell, if it stays primarily a functional language. So if that extended Trex system hasn’t been migrated away from them at that point, it will be irrelevant; a novelty some “dude” once managed to cobble together for an early Haskell interpreter…

What makes me even more suspicious about this supposed need for named fields/records is that - at least in Haskell - no-one has expressed a similar need for functions to have named parameters, such as:

class Integral a where
   div { numerator :: a, denominator :: a } :: a
                       	⋮ 
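
For comparison, the usual approximation in today’s Haskell is to pass a record of named arguments; a sketch (DivArgs and divNamed are made up for illustration):

data DivArgs a = DivArgs { numerator :: a, denominator :: a }

divNamed :: Integral a => DivArgs a -> a
divNamed DivArgs { numerator = n, denominator = d } = n `div` d

-- usage: divNamed DivArgs { numerator = 7, denominator = 2 }   ==> 3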

So what exactly is the difference here?

Functions rarely have a bunch of arguments of the same type, so it’s not all that useful. Contrast it with, for example, decoding, where an on-disk structure can be an array of ten 32-bit integers where each one means a different thing.


I think an extensible record solution similar to vinyl wouldn’t need any additional abstractions whatsoever, just a properly lazy type family arrangement, which GHC currently does not have. This would then allow records to use dictionaries, lists and arrays (both mutable and immutable) on the data level, which would ultimately solve all of the record woes, not just the aforementioned tip-of-the-iceberg notation issue.
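For context, the core of a vinyl-style record is tiny. A toy sketch (not vinyl’s actual module layout): a heterogeneous list indexed by a type-level list of fields, so “which fields exist” is ordinary type-level data for such a type family arrangement to compute over.

{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeOperators #-}

import Data.Kind (Type)

-- A record is a heterogeneous list indexed by the type-level list rs.
data Rec (f :: k -> Type) (rs :: [k]) where
  RNil :: Rec f '[]
  (:&) :: f r -> Rec f rs -> Rec f (r ': rs)
infixr 7 :&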

With something like that in place the question of whether Haskell’s ADTs really need named fields would become quite a bit more interesting.

2 Likes

Functions rarely have a bunch of arguments of the same type, so it’s not all that useful.

Start reading from:

https://discourse.haskell.org/t/haskell-records-compare-standard-ml/5933/26

…to the end of that thread.


But one of the often-regurgitated arguments against laziness is that an erroneous expression can escape from a suitable error-processing context, the common example being exceptions lurking in (lazy) structured data. Having a similar form of laziness at the type level risks incurring the same problem: type errors going undetected for varying lengths of time, until someone (much later) refactors the associated code, at which time they are confronted with said (and unexpected) type error, which is contrary to the stated objective of type-checking in Haskell: to catch type errors early.

The type-checking analogue of lenient evaluation could be another option, like what the functional logic language Verse uses…if some heuristic can be found to balance early error detection with permitting more programs to successfully pass type-checking.

Yes, with some combinators sprinkled in. The question we should ask is: What monad do record updates live in? It could be this:

import Control.Lens.Setter (set,over)
import Control.Lens.Type (Lens')
import Data.Monoid (Endo(..))

-- (,) (Endo s) is a monad because Endo s is a monoid
type Update s = (Endo s,())

runUpdate :: s -> Update s -> s
runUpdate s (Endo f,_) = f s

set_ :: Lens' s a -> a -> Update s
set_ l a = (Endo (set l a),())

over_ :: Lens' s a -> Update a -> Update s
over_ l (Endo f,_) = (Endo (over l f),())

Then your example could, provided the appropriate lenses, be written like this:

runUpdate r $ do
    over_ company $ do
        over_ boss $ do
            set_ age i
        over_ car $ do
            set_ name4 "new"
        set_ name2 "cmp"
    set_ name1 "ccc"

Perhaps someone adept could write a quasi-quoter that translates lens = value into set_ lens value and also introduces an over_ whenever lens do is seen inside the quasi-quoted region.

Remark: This trick to use a monad where we’re actually just building a monoidal value is e.g. used in the blaze libraries so that we can use do-notation to build markup.
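
A tiny self-contained illustration of that trick (not from blaze): the Monad instance for (,) w with Monoid w just mappends the first components, so do-notation accumulates a monoidal value.

demo :: (String, ())
demo = do
  ("hello ", ())
  ("world", ())
-- demo == ("hello world", ())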

4 Likes

Your timing couldn’t be worse: this last week. But do note named parameters are not necessarily the same thing as records with labelled fields – and indeed see @goldfirere’s detailed description of O’Caml, which seems to have three (at least) similar-but-different mechanisms.

I’m going to react violently against this claim. But really it’s off-topic for record notation. Could someone with an ‘Admin wrench’ please split off the posts onto a new thread.

(The stuff on named arguments is better moved there too IMO.)

Also someone had better tell the folk using GHC 9.n’s HasField etc. features that they’re moribund (allegedly), since HasField relies heavily on FunDeps.

Is there something wrong with the type system being Turing-complete? Oleg has exploited that to good effect. Yes Haskell’s type system is unsound; as I understand it, that’s in addition to being T-c; and we seem to have lived with it ok for decades.

That’s very cool and reduces the amount of typing. Using lenses is cheating though; look how clean it looks anyway:

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens.TH
import Control.Lens.Setter (set)

data Country = Country { _countryName :: String, _countryCompany :: Company } deriving (Show)
data Company = Company { _companyName :: String, _companyBoss :: Employee, _companyCar :: Car } deriving (Show)
data Employee = Employee { _employeeName :: String, _employeeAge :: Integer, _employeeCar :: Car } deriving (Show)
data Car = Car { _carName :: String } deriving (Show)

makeLenses ''Country
makeLenses ''Company
makeLenses ''Employee
makeLenses ''Car

example :: Integer -> Country -> Country
example i = set (countryCompany . companyBoss . employeeAge) i
          . set (countryCompany . companyCar . carName) "new"
          . set (countryCompany . companyName) "cmp"
          . set (countryName) "ccc"

main :: IO ()
main = do
    let car1 = Car { _carName = "car1" }
    let car2 = Car { _carName = "car2" }
    let boss = Employee { _employeeName = "Carl", _employeeAge = 30, _employeeCar = car2 }
    let comp = Company { _companyName = "Acme", _companyBoss = boss, _companyCar = car1 }
    let ctry = Country { _countryName = "Scotland", _countryCompany = comp }
    print ctry
    print $ example 420 ctry

My original intention was to see if it could be made into an extension of GHC so that the do’s desugar into plain record notation, so you don’t have to type all those brackets and nested records referring to the previous one. It wouldn’t involve lenses.

Not that lenses aren’t very cool.

1 Like

Indeed, OP might not like this topic splintering into subthreads. Let us all try to stay on topic!

Hooray!

This could be just what’s needed to help Haskell’s record/named-fields system “out of the rut crater”, by establishing a common notation for both regular and constructor functions in Haskell. All commentary about (the lack of/adding) “true records” to Haskell can continue there:

https://github.com/ghc-proposals/ghc-proposals/discussions/654

…and never be seen here on this Discourse again, until a solution that works in GHC is found.

(For those who are wondering: I don’t have a G!tHub account :-)