Computed Properties for Haskell Records

Today, on a whim, I wrote this small post about computed properties for record types in Haskell.
That is, calling a function that computes a property of a record from that record’s fields as though the function were a (computed) property of the record, using dot syntax:

TL;DR:

{-# LANGUAGE GHC2021, OverloadedRecordDot, DataKinds #-}

import GHC.Records

data User = User
  { birthYear :: Int
  , name :: String
  }

-- "age" is not a real field of User; user.age is routed through this instance.
instance HasField "age" User Int where getField = age

age :: User -> Int
age u = 2023 - u.birthYear

user :: User
user = User{birthYear=1995, name="Robert"}

main = print user.age
7 Likes

Thanks @romes, yes, it was always part of the Overloaded Record Dot [ancient history] line of thought that you could make up ‘virtual’ fields for a record. That’s mimicking SQL table.field, and especially OOP object.extractor – where the called method might be doing arbitrary computation.

You’ll see amongst that ancient history examples like concatenating and pretty-formatting firstName ++ " " ++ lastName to give a virtual fullName field.
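
In today’s syntax, that old example might look something like this sketch (Person, firstName, lastName are hypothetical names, not from any of those proposals):

{-# LANGUAGE GHC2021, OverloadedRecordDot, DataKinds #-}

import GHC.Records

data Person = Person
  { firstName :: String
  , lastName  :: String
  }

-- fullName is a 'virtual' field: computed on each access, never stored.
instance HasField "fullName" Person String where
  getField p = p.firstName ++ " " ++ p.lastName

main :: IO ()
main = putStrLn ada.fullName
  where ada = Person "Ada" "Lovelace"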

Some of those proposals went further: tight-binding postfix . as reverse function apply, not specific to record.field – so

foo . bar $ quux bas.flub === foo (bar (quux (flub bas)))

(The first . must be the familiar function composition because it’s space-surrounded. Note that tightfix bas.flub binds tighter than the quux function application.)
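
For comparison, the loose, space-surrounded reverse apply already exists today as Data.Function.(&). A sketch of the same pipeline without the tightfix sugar, with toy stand-ins for the names above:

import Data.Function ((&))

flub, quux, bar, foo :: Int -> Int
flub = (+ 1); quux = (* 2); bar = subtract 3; foo = abs

bas :: Int
bas = 5

-- Both expressions evaluate foo (bar (quux (flub bas))):
main :: IO ()
main = print (foo . bar $ quux (flub bas), bas & flub & quux & bar & foo)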

Beware that the Lenses approach always uses . to mean function composition, and it’s a particular idiom for deeply-nested fields to compose an accessor in tightfix style. (If you switch on OverloadedRecordDot, beware that it steals the tightfix function-composition syntax, which makes Lensistas grumpy.)
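
For instance, the lens idiom looks something like this sketch (assuming the lens package and its makeLenses Template Haskell helper):

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens

data Point  = Point  { _x :: Double, _y :: Double }
data Square = Square { _topLeft :: Point, _bottomRight :: Point }

makeLenses ''Point
makeLenses ''Square

-- The '.' is ordinary function composition of two lenses; the squiggly
-- (^.) operator does the actual field access:
cornerX :: Square -> Double
cornerX sq = sq ^. topLeft . x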

3 Likes

Ok, thanks. I’ve taken out that specific part of my comment/it wasn’t material. (Lenses always did mess with my head.)

I associate this idea with Matt Parsons, who I think has written about it a few times, although the only thing I can find right now is Stealing Impl from Rust.

4 Likes

Speaking of user-defined HasField instances, I think it would be neat if OverloadedRecordDot were extended to support types as field identifiers.

I can currently write

{-# LANGUAGE GHC2021, DerivingStrategies #-}

import Data.Function ((&))
import GHC.Records

newtype Address = Address String
  deriving newtype Show

data User = User { address :: Address, name :: String }
  deriving stock Show

-- The field identifier here is the *type* Address rather than a
-- type-level string; HasField is poly-kinded, so this is accepted.
instance HasField Address User Address where
  getField User{address} = address

someUser :: User
someUser = User { address = Address "someplace", name = "somename" }

foo :: Address
foo = someUser & getField @Address

but I would like to be able to write

foo :: Address
foo = someUser.Address

The Go language has a feature similar to this, called anonymous fields.

That seemed like a good idea. Do you remember why it was rejected? Was it because it would potentially be incompatible with lens code?

1 Like

Not specifically: using . like that is incompatible with Lenses, whatever meaning you give it. [**]

(Vaguely.) It’s to do with name resolution for the field: note that you might import this module/its data decl into some other module that isn’t using OverloadedRecordDot and therefore isn’t aware of HasField. Or this module might import a data decl from somewhere not using it. The data decl doesn’t include ... deriving HasField, so there’s no telling from the data decl how this field is going to be used.

Name resolution detects the tight . early in the compilation pipeline, tries to figure out which data decl the field comes from, and creates a virtual HasField instance on the fly. For this reason there are all sorts of syntactic restrictions around using the ., to give name resolution a hope of succeeding. (For example, you can’t put an expression after the .: it has to be a bare field name.)

If tight-. were merely sugar for reverse apply, instance resolution couldn’t kick in yet, and type inference would probably get stuck, not knowing the type of the resulting expression.

[**] Although … note that Lenses use . between piled-up/nested field names (or the lenses arising therefrom). They don’t use . between the outermost container and the lens(es) – instead there’s that zoo of squiggly operators, depending on whether you’re merely accessing the underlying field or ‘updating’ it in some form. So a very clever name-resolution algorithm might still be able to detect when tight-. is being used for composition versus field access versus reverse apply. But then the different uses have different binding precedences. I guess it’s all too hard – especially to explain to perplexed programmers.

1 Like

That seems a spectacularly fragile idea. As soon as you have a data decl with more than one field of the same type, it’s ambiguous. (If someone’s amending the data decl by adding a field, are they going to go check whether somewhere there’s a usage relying on the field type being unique?) And note type resolution/instance selection comes after name resolution in the usual compiler pipeline. (OverloadedRecordDot has special handling at that stage, as I vaguely recalled above.)

I’m keen to allow upper-case field names (which would have to use NoFieldSelectors), and which I think would work with OverloadedRecordDot, or of course just the { ... } syntax. I think of field names more like constructors than functions.

Syntactic sugar for f x (in our case x.f) shouldn’t be any harder to infer than f x itself. I am even more perplexed now.

I have two quibbles with your post. First,

data Square
  = Square
    { topLeft     :: Point
    , bottomRight :: Point
    }

is a fine way to define a Rectangle data type. But that’s just an aside.

What really bothers me about the virtual record fields via OverloadedRecordDot idea is that they’d have very different semantics from the actual record fields. Consider

data Polygon = Polygon {
   vertices :: [Point],
   perimeter :: Double}

polygon :: [Point] -> Polygon
polygon vs = Polygon vs (calculatePerimeter vs)

instance HasField "surfaceArea" Polygon Double
  where getField = calculateArea

The calculatePerimeter and calculateArea functions both have linear complexity. The fake field polygon.surfaceArea will exact that cost every time it’s accessed; the proper field polygon.perimeter only the first time, after which the forced thunk is shared.
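
To make the difference observable, here’s a sketch using Debug.Trace, with toy definitions of Point and the two calculations (not the ones above); run it without optimizations, since -O might CSE the repeated accesses:

{-# LANGUAGE GHC2021, OverloadedRecordDot, DataKinds #-}

import Debug.Trace (trace)
import GHC.Records

type Point = (Double, Double)

calculatePerimeter :: [Point] -> Double
calculatePerimeter vs = trace "perimeter computed" $
  sum (zipWith dist vs (drop 1 vs ++ take 1 vs))
  where dist (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

calculateArea :: [Point] -> Double  -- shoelace formula
calculateArea vs = trace "area computed" $
  abs (sum (zipWith cross vs (drop 1 vs ++ take 1 vs))) / 2
  where cross (x1, y1) (x2, y2) = x1 * y2 - x2 * y1

data Polygon = Polygon { vertices :: [Point], perimeter :: Double }

instance HasField "surfaceArea" Polygon Double where
  getField p = calculateArea p.vertices

main :: IO ()
main = do
  let vs = [(0, 0), (4, 0), (4, 3)]
      p  = Polygon vs (calculatePerimeter vs)
  print (p.perimeter, p.perimeter)     -- "perimeter computed" prints once
  print (p.surfaceArea, p.surfaceArea) -- "area computed" prints twice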

I’d like to see properly designed computed fields added to Haskell, because this would play to its lazy strength for a change. But this ain’t it.

3 Likes

Well, don’t shoot the messenger! I’m trying to remember some very intricate/overlapping decisions from more than 7 years ago. GHC tries to make each extension orthogonal, so it can’t know/doesn’t try to restrict what else might be going on in the code. I’ve mounted the tape from the archive; let me see … Some points:

  • x.f.g.h should be supported, to access within deeply-nested records. But instance resolution for h depends on instance resolution for g, which depends on …
  • Your local module might be importing the data decl, but have a local decl for function f. The usual way to resolve the name clash would be a module prefix: x.ModuleF.f – does anybody like that? Can you read it as "the f field from module ModuleF applied to record x"? Contrast: the design idea was that dot-suffix field access should be syntactically lightweight.
  • The instances for HasField are weird (as SPJ says somewhere). The field name (type-level String) doesn’t appear as an argument to getField (it’s a phantom type). The usual way to resolve that at the call site would be something like getField @"age" x. But a) we don’t want to force @ calls on a user merely accessing a field; b) (again) how do we keep the dot-suffix syntactically lightweight?
  • If dot-suffixing were trying to be more polymorphic, I could alias the field accessor function: nameShadow = name. The type of that is merely :: User -> String. Now how do I get from x.nameShadow to @"name"?

So the decision [**] was: .name always means field access, irrespective of what other names are in scope; and it always gets implemented as getField :: HasField "name" r a => r -> a, where r is filled in from the usage site, and the HasField instance’s FunDep determines a.
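
To make that concrete, roughly what the elaboration looks like (a sketch, not GHC’s exact internals):

{-# LANGUAGE GHC2021, DataKinds #-}

import GHC.Records (HasField (getField))

-- u.name desugars to (roughly) getField @"name" u: the field name is a
-- phantom type-level Symbol supplied by type application, and the
-- functional dependency determines the result type a from "name" and r.
getName :: HasField "name" r a => r -> a
getName = getField @"name"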

[**] It’s very difficult at design time to anticipate all the gotchas, and all the ways other extensions might overlap/interfere. There was a strong feeling (GHC had more of a culture of experimenting back then) for providing merely NoFieldSelectors ASAP, so as to free up the namespace and let a thousand flowers bloom. That never happened.

So when the HasField stuff eventually arrived, I was greatly disappointed. It seems to me it has a very low power-to-weight ratio – especially considering how much syntax it steals. I tried it. I didn’t like it. I’ve never used it in real code. I wish it were just withdrawn before more people take it up and GHC gets (another) backwards-compatibility/legacy problem.

BTW Haskellers have been arguing about records since … oh, before the H98 Language Standard was published. What went in there was considered a stopgap at the time. Even so, it was a braindead decision to tie a field name to only a single record type. There was a swirl of possible redesigns 2000~2002, including a very obvious and fully implemented proposal, TRex, from Hugs. Nothing happened. When I took an interest in Haskell ~2010 it was still a hot topic; still we were almost … nearly … on the verge of something better.

Frankly, I’ve given up hoping. If you’re writing an application in which record structures and flexible accessing are important, don’t use Haskell.

1 Like

The o.p. didn’t propose such a dubious design for Polygons [**] – and anyway Square was only an easy-to-understand example to illustrate the technique. Your example is bogus.

Yeah, good luck with specifying what a ‘proper design’ is. (I think you haven’t shown one in your post.)

Edit: At a guess, you seem to be taking an overly OOP/procedural approach in expecting the calculated field to be ‘stored’ inside the record. A Haskell term mySquare.area is just some expression (after desugaring), and as such is susceptible to specialisation/inlining and then CSE [Common Subexpression Elimination]. It won’t necessarily get evaluated twice. (And if it needs evaluating only once, you’ve gained nothing by storing it; if it doesn’t need evaluating at all (because laziness), you’ve wasted storing a thunk in the record.)

[**] Isn’t that design in fact under-specified: given (say) eight points, there are multiple ways to draw a perimeter around them – if we allow concave/self-intersecting arrangements – even if we know the magnitude of the perimeter – and you appear to be claiming perimeter is a calculated field.

This one seems pretty easy, actually. Please steal this for a proposal if you like.

Extend the grammar production fielddecl:

fielddecl → vars :: (type | ! atype)
+         | var = exp

This production defines a ‘computed field’ for its associated constructor. This is equivalent to defining an ordinary field with the type of exp, except for the following differences:

  • The RHS of a computed field type-checks as if every field name of that constructor, computed or not, is in scope and has the type of the field (not the field’s getter function). All other names are expected to refer to imported or top-level identifiers. Note that circular references between computed fields are possible.

  • Prior to type-checking, uses of the constructor in an expression are replaced with lambdas that accept only the non-computed fields of the constructor (in declaration order) and inline the RHS of any computed fields. As an example:

data Polygon = Polygon
  { vertices :: [Point]
  , perimeter = calculatePerimeter vertices :: Double
  , surfaceArea = calculateArea vertices :: Double
  }

foo = Polygon [a, b, c]

foo is rewritten to

foo = (\vertices -> let
    perimeter = calculatePerimeter vertices
    surfaceArea = calculateArea vertices
  in Polygon vertices perimeter surfaceArea) [a, b, c]

(The let allows computed fields to refer to each other and themselves. Any identifiers used in the RHS that are not field names must be inlined hygienically; a local identifier must not shadow a top-level identifier referenced by a computed field.)

Labeled constructions (Polygon { vertices = ... }) are rewritten analogously. Missing fields that are not computed are still initialized to undefined, per the Haskell 2010 Report. It is an error to provide the value of a computed field here.

  • Uses of the constructor in a pattern are replaced with patterns that contain wildcards for all computed field positions. Labeled patterns are unaffected.

  • Labeled updates (foo { vertices = vertices foo ++ ... }) are translated per the Haskell 2010 Report as if only non-computed fields exist, and then constructors are rewritten as above; see the sketch following this list. It is an error to provide the value of a computed field here.
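
A sketch of that last rule, reusing the Polygon example (Polygon in the rewritten output below is the underlying full-arity constructor, as in the rewrite above):

-- Under the proposal, bar = foo { vertices = vs' } first translates
-- Report-style, as if only the non-computed field exists, and then the
-- construction is rewritten as before:
bar = case foo of
  Polygon _oldVertices _ _ ->
    (\vertices -> let
        perimeter = calculatePerimeter vertices
        surfaceArea = calculateArea vertices
      in Polygon vertices perimeter surfaceArea) vs'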

1 Like

Data constructors can be partially applied – just like everything else. So what’s the type of Polygon [a, b, c]? Why wouldn’t it be :: Double -> Double -> Polygon, like any regular data constructor?

Suppose I return the bare Polygon constructor as the result of an expression/function (and suppose type Polygon has other constructors, with differing numbers of computed fields). What’s to be the type of the return? Note the type can’t encode which constructor got returned.

So with a partially applied constructor, how do I deliberately leave a computed field undefined – because, for example, I know I haven’t supplied enough in the ‘ordinary fields’ to support the calculation? (If the runtime tries to calculate, it’ll throw an error – unlike putting undefined in fields of a H98 data type.)

Emm? Wouldn’t you want to bind a var to the computed positions, so you can grab the result of the computation? Indeed that might be a Maybe/Either/List/etc, so you’d want a full pattern match.

As per my “Edit:” to the post you’re responding to (perhaps it arrived after you started on your sketch), I’m a lot less than convinced that storing the calculated field inside the data type will gain you much/often. It feels a lot like you’re trying to bring an OOP/strict-language idiom into Haskell.

The type would be Polygon. The user-visible Polygon constructor should act like the desugared Polygon constructor minus the computed fields. That’s the design goal.

Computed fields are associated with a specific constructor, just like other fields. So if you return bare Polygon, it has type [Point] -> Polygon. If you return another constructor bare, it has the type that that constructor would have if you ignore its computed fields.
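
In proposal syntax, a sketch of that (Shape and its constructors are hypothetical):

data Shape
  = Circle { radius :: Double, area = pi * radius * radius :: Double }
  | Rect { width :: Double, height :: Double }

-- Ignoring computed fields:
--   Circle :: Double -> Shape
--   Rect   :: Double -> Double -> Shape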

What difference would that make? If you never use the computed field, it’s just a thunk. If you use it and it never forces the value of an undefined field, you get a valid result. If you use it and it does force the value of an undefined field, the result is the same as what would happen if you called a function that computes the same thing. But the advantage of a computed field over a function is that any sharing of the input value will also share the result of the function, without having to use some sort of external memoization.

Maybe in a Polygon{..}-style wildcard pattern (I didn’t include that in the proposal but I could see it going either way). But if you match on Polygon v, I don’t think it’d be expected to introduce vars for the computed fields.

You can always match on p@Polygon{} | Just x <- p.computedMaybeThing.

If you want to look at it that way, sure. But benefiting from sharing data is an existing Haskell idiom for both performance and making practical use of lazy infinite structures, and it covers more cases than CSE does (and GHC is a little unpredictable about when it applies CSE anyway, I believe).

1 Like

Or heck, maybe it’s easier to express all this with pattern synonyms: translate

data Polygon = Polygon
  { vertices :: [Point]
  , perimeter = calculatePerimeter vertices :: Double
  , surfaceArea = calculateArea vertices :: Double
  }

to

{-# LANGUAGE PatternSynonyms #-}

data Polygon = Polygon_internal
  { vertices :: [Point]
  , perimeter :: Double
  , surfaceArea :: Double
  }

pattern Polygon vertices <- Polygon_internal vertices _ _
  where
  Polygon vertices = let
      perimeter = calculatePerimeter vertices
      surfaceArea = calculateArea vertices
    in Polygon_internal vertices perimeter surfaceArea

and don’t expose Polygon_internal. That should do it. (Add some pragmas for pattern completeness, and probably a few other things to deal with pattern synonyms’ rough edges, but that’s the basic rewrite taken care of with existing language features, at least.)
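
For reference, the completeness pragma mentioned would be something like:

-- Tells the exhaustiveness checker that matching on Polygon alone is total:
{-# COMPLETE Polygon #-}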

1 Like

My first thought was: of course it is not an anti-pattern, it’s been designed for that exact reason.
Since reading all the comments on this post, I have changed my mind: this is definitely an anti-pattern.

First, there is no benefit to using user.age over age user, unless you can use it to update things, or you are already using user.name, which is not idiomatic so far.
If it is to “ease” the transition for people coming from an OOP background, I doubt it will be effective.

Then, it hides the fact that age can be time-consuming and will be recalculated every time. As @rhendric pointed out, the whole thing could be “enhanced” with something along the lines of

[quote=“rhendric, post:14, topic:8237”]
Extend the grammar production fielddecl:

fielddecl → vars :: (type | ! atype)
+         | var = exp

This production defines a ‘computed field’ for its associated constructor. This is equivalent to defining an ordinary field with the type of exp, except for the following differences:
[/quote]

But this would make what some already think is overcomplicated (*) even more complicated.

However, I think there are legitimate uses for fields and tight binding.
For example, for people who want to use records as ML-style modules (for dependency injection or mocking …), it makes sense to be able to write mod.f instead of Mod.f, without having to bother with parentheses; a sketch follows.
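
Something like this sketch of the records-as-modules style (Logger and its fields are hypothetical names):

{-# LANGUAGE GHC2021, OverloadedRecordDot #-}

-- A record standing in for an ML-style module signature:
data Logger = Logger
  { logInfo  :: String -> IO ()
  , logError :: String -> IO ()
  }

-- Real and mock implementations are plain values, swappable for testing:
consoleLogger :: Logger
consoleLogger = Logger putStrLn (putStrLn . ("ERROR: " ++))

run :: Logger -> IO ()
run logger = do
  logger.logInfo "starting"   -- mod.f, with no parentheses needed
  logger.logError "oops"

main :: IO ()
main = run consoleLogger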

After reflecting on this:

If we see in x.f that f “belongs” to x (as Maybe belongs to Data.Maybe), then it makes sense that the tight binding makes the type inference unidirectional: you need to know x before choosing f.
Therefore x.f is more than f x; it also adds the functional dependency x -> "f" -> f.

Something where HasField excels is “lifting” nested fields up. This can be used to rework @blamario’s problem (I have had similar problems and came up with the following approach). Instead of storing the computed values in Polygon itself, or extending GHC to allow computed fields, one can reverse the problem and write this instead:

data Polygon = Polygon
  { vertices :: [Point]
  }

data PolygonDenorm = PolygonDenorm
  { polygon :: Polygon
  , perimeter :: Double
  , surface :: Double
  }

Denorm stands for denormalized. It is now clear that you have a “pure” Polygon which hasn’t been spoiled with optimization concerns, and a denormalized version of a polygon which keeps a thunk for each calculated value that you want to keep in “cache”.

Defining a vertices field on PolygonDenorm will let you use it as if it were a Polygon:

instance HasField "vertices" PolygonDenorm [Point] where getField = vertices . polygon
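
With OverloadedRecordDot on, code that reads a polygon’s vertices then works unchanged on the denormalized type (a sketch; edgeCount is a hypothetical name):

edgeCount :: PolygonDenorm -> Int
edgeCount pd = length pd.vertices  -- goes through the HasField instance above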

You can then use all the smart constructor/pattern synonym techniques to prevent people from updating a polygon (to make sure that the “computed” fields stay in sync).

Example

data PolygonDenorm = PolygonDenormPrivate -- Don't export this constructor
  { polygon :: Polygon
  , perimeter :: Double
  , surface :: Double
  }

-- | Smart constructor
polygonDenorm :: [Point] -> PolygonDenorm
polygonDenorm vertices = PolygonDenormPrivate (Polygon vertices) (calculatePerimeter vertices) (calculateSurface vertices)

-- | Read-only: matching is allowed, but construction goes via polygonDenorm
pattern PolygonDenorm :: [Point] -> Double -> Double -> PolygonDenorm
pattern PolygonDenorm vertices p s <- PolygonDenormPrivate (Polygon vertices) p s

etc …

(*)

1 Like

Quite the opposite actually. I like Haskell’s laziness and I’d like to see it expanded in scope rather than restricted as with the StrictData and Strict “extensions”. And I’m not looking at OOP as an aspirational goal, but at attribute grammars.

That’s what I was considering as well, @rhendric, but I don’t think it would be very useful on its own.

The biggest issue is that you’d have to declare all computed fields at once. That means the feature would be usable only by the original author of the data type declaration. You couldn’t import a data type from a different module or library and add a computed field to it.

To overcome that problem we’d have to add some form of record subtyping, which would of course be a much bigger project.

2 Likes