Today, I wrote on a whim this small post about computed properties for record types in Haskell.
That is, calling a function that computes a property of a record from the fields of that record, using the dot syntax, as though that function were a (computed) property of the record:
TL;DR:
{-# LANGUAGE GHC2021, OverloadedRecordDot, DataKinds #-}
import GHC.Records
data User = User
  { birthYear :: Int
  , name :: String
  }
instance HasField "age" User Int where getField = age
age :: User -> Int
age u = 2023 - u.birthYear
user :: User
user = User{birthYear=1995, name="Robert"}
main = print user.age
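For reference, the tight dot just desugars to getField at the type-level field name, so the last line is equivalent to:

main = print (getField @"age" user)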
Thanks @romes, yes it was always part of the Overloaded Record Dot [ancient history] line of thought that you could make up "virtual" fields for a record. That's mimicking SQL table.field, and especially OOP object.extractor, where the called method might be doing arbitrary computation.
You'll see amongst that ancient history examples like concatenating and pretty-formatting firstName ++ " " ++ lastName to give a virtual fullName field.
Some of those proposals went further: tight-binding postfix . as reverse function application, not specific to record.field. So, for example (a reconstruction, taking f, quux, bas, flub as arbitrary in-scope names):
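f . quux bas.flub    -- parses as: f . (quux (flub bas))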
(The first . must be familiar function composition because it's space-surrounded. Note that tightfix bas.flub binds tighter than the quux function application.)
Beware that the Lenses approach always uses . to mean function composition, and it has a particular idiom of building an accessor for deeply-nested fields in tightfix style. (If you switch on OverloadedRecordDot, beware that it steals the tightfix function-composition syntax, which makes Lensistas grumpy.)
I associate this idea with Matt Parsons, who I think has written about it a few times, although the only thing I can find right now is Stealing Impl from Rust.
Speaking about user-defined HasField instances, I think it would be neat if OverloadedRecordDot were extended to support types as field identifiers.
I can currently write
{-# LANGUAGE GHC2021, DerivingStrategies #-}
import Data.Function ((&))
import GHC.Records

newtype Address = Address String
  deriving newtype Show

data User = User { address :: Address, name :: String }
  deriving stock Show

instance HasField Address User Address where
  getField User{address} = address

someUser :: User
someUser = User { address = Address "someplace", name = "somename" }

foo :: Address
foo = someUser & getField @Address
but I would like to be able to write
foo :: Address
foo = someUser.Address
The Go language has a feature similar to this, called anonymous fields.
Not specifically: using . like that is incompatible with Lenses, whatever it means. [**]
(Vaguely.) It's to do with name resolution for the field: note that you might import this module/its data decl into some other module that isn't using OverloadedRecordDot and therefore isn't aware of HasField. Or this module might import a data decl from somewhere not using it. The data doesn't include ... deriving HasField, so there's no telling from the data how this field is going to be used.
Name resolution detects the tight . early in the compilation pipeline, tries to figure out which data decl the field comes from, and creates a virtual HasField instance on the fly. For this reason there's all sorts of syntactic restrictions around using the ., to give name resolution a hope of succeeding. (For example, you can't put an expression after the .: it has to be a bare field name.)
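Concretely, the sorts of cases it has to keep apart:

-- with OverloadedRecordDot enabled:
a1 = user.age         -- tight dot, bare field name: field access
a2 = show . length    -- space-surrounded dot: still function composition
-- a3 = user.(age)    -- rejected: an expression may not follow the tight dot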
If tight-. were merely sugar for reverse apply, instance resolution couldn't kick in yet, and type inference would probably get stuck, not knowing the type of the resulting expression.
[**] Although ... note that Lenses uses . between piled-up/nested field names (the lenses arising therefrom). It doesn't use . between the outermost container and the lens(es); instead it has that zoo of squiggly operators, depending on whether you're merely accessing the underlying field or "updating" it in some form. So a very clever name-resolution algorithm might still be able to detect when tight-. is being used for compose versus field access versus reverse apply. But then the different uses have different binding precedence. I guess it's all too hard, to explain to perplexed programmers especially.
That seems a spectacularly fragile idea. As soon as you have a data with more than one field of the same type, it's ambiguous. (If someone's amending the data decl by adding a field, are they going to go check whether somewhere there's a usage relying on the unique field type?) And note type resolution/instance selection comes after name resolution in the usual compiler pipeline. (OverloadedRecordDot has special handling at that stage, as I vaguely recalled above.)
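For instance, if the data decl instead had two Address fields:

data User = User { home :: Address, work :: Address }
-- someUser.Address: is that home or work?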
I'm keen to allow upper-case field names (which would require NoFieldSelectors), and I think it would work with OverloadedRecordDot or of course just the { ... } syntax. I think of field names more like constructors than functions.
data Square
  = Square
  { TopLeft :: Point
  , BottomRight :: Point
  }
is a fine way to define a Rectangle data type. But that's just an aside.
What really bothers me about the virtual record fields via OverloadedRecordDot idea is that they'd have very different semantics from the actual record fields. Consider
data Polygon = Polygon
  { vertices :: [Point]
  , perimeter :: Double
  }

polygon :: [Point] -> Polygon
polygon vs = Polygon vs (calculatePerimeter vs)

instance HasField "surfaceArea" Polygon Double where
  getField = calculateArea
The calculatePerimeter and calculateArea functions both have linear complexity. The fake field polygon.surfaceArea will exact that cost every time it's accessed; the proper field access polygon.perimeter only the first time.
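Concretely (with a, b, c standing for some Points):

p = polygon [a, b, c]
x = p.perimeter + p.perimeter       -- calculatePerimeter ran once, at construction
y = p.surfaceArea + p.surfaceArea   -- calculateArea runs on each access (barring optimisations)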
I'd like to see properly designed computed fields added to Haskell, because this would play to its lazy strength for a change. But this ain't it.
Well, don't shoot the messenger! I'm trying to remember some very intricate/overlapping decisions from more than 7 years ago. GHC tries to make each extension orthogonal, so it can't know/doesn't try to restrict what else might be going on in the code. I've mounted the tape from the archive; let me see ... Some points:
x.f.g.h should be supported, to access within deeply-nested records. But instance resolution for h depends on instance resolution for g, which depends on ...
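That is, the chain desugars to nested getField calls, each needing its own instance solved:

-- x.f.g.h becomes:
getField @"h" (getField @"g" (getField @"f" x))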
Your local module might be importing the data decl, but have a local decl for a function f. The usual way to resolve the name clash would be a module prefix: x.ModuleF.f; does anybody like that? Can you read it as "the f field from module ModuleF applied to record x"? Contrast: the design idea was that dot-suffix field access should be syntactically lightweight.
The instances for HasField are weird (as SPJ says somewhere). The field name (a type-level String) doesn't appear as an argument to getField (it's a phantom type). The usual way to resolve that at the call site would be something like getField @"age" x. But a) we don't want to force @ applications on a user merely accessing a field; b) (again) how do we keep the dot-suffix syntactically lightweight?
If dot-suffixing were trying to be more polymorphic, I could define nameShadow = name, the field accessor function. Its type is merely :: User -> String. Now how do I get from x.nameShadow to @"name"?
So the decision [**] was: .name always means field access, irrespective of what other names are in scope; and it always gets implemented as getField :: HasField "name" r a => r -> a, where r is filled in from the usage site, and there's a FunDep to a from the HasField instance.
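For reference, the class as it ships in GHC.Records; the field name x is a phantom type parameter, and the functional dependency determines a from x and r:

class HasField x r a | x r -> a where
  getField :: r -> a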
[**] It's very difficult at design time to anticipate all the gotchas, and all the ways other extensions might overlap/interfere. There was a strong feeling (and given GHC had more of a culture of experimenting back then) for providing merely NoFieldSelectors ASAP, so as to free up the namespace and let a thousand flowers bloom. That never happened.
So when the HasField stuff eventually arrived, I was greatly disappointed. It seems to me it has a very low power-to-weight ratio, especially considering how much syntax it steals. I tried it. I didn't like it. I've never used it in real code. I wish it were just withdrawn before more people take it up and then GHC has (another) backwards compatibility/legacy problem.
BTW Haskellers have been arguing about records since ... oh, before the H98 Language Standard was published. What went in there was considered a stopgap at the time. Even so, it was a braindead decision to tie the field name to only a single record type. There was a swirl of possible redesigns 2000~2002, including a very obvious and fully implemented proposal, TRex from Hugs. Nothing happened. When I took an interest in Haskell ~2010 it was still a hot topic; still we were almost ... nearly ... on the verge of something better.
Frankly, I've given up hoping. If you're writing an application in which record structures and flexible accessing are important, don't use Haskell.
The o.p. didn't propose such a dubious design for Polygons [**], and anyway Square was only an easy-to-understand example to illustrate the technique. Your example is bogus.
Yeah, good luck with specifying what a "proper design" is. (I think you haven't shown one in your post.)
Edit: At a guess, you seem to be taking an overly OOP/procedural approach, expecting the calculated field to be "stored" inside the record. A Haskell term mySquare.area is just some expression (after desugaring), and as such is susceptible to specialisation/inlining then CSE [Common Subexpression Elimination]. It won't necessarily get evaluated twice. (And in case it needs evaluating only once, you've gained nothing; if it doesn't need evaluating at all (because laziness), you've wasted storing a thunk in the record.)
[**] Isn't that design in fact under-specified: given (say) eight points, there's multiple ways to draw a perimeter around them, if we allow concave/self-intersecting arrangements, even if we know the magnitude of the perimeter; and you appear to be claiming perimeter is a calculated field.
This one seems pretty easy, actually. Please steal this for a proposal if you like.
Extend the grammar production fielddecl:
fielddecl → vars :: (type | ! atype)
+ | var = exp
This production defines a "computed field" for its associated constructor. This is equivalent to defining an ordinary field with the type of exp, except for the following differences:
The RHS of a computed field type-checks as if every field name of that constructor, computed or not, is in scope and has the type of the field (not the field's getter function). All other names are expected to refer to imported or top-level identifiers. Note that circular references between computed fields are possible.
Prior to type-checking, uses of the constructor in an expression are replaced with lambdas that accept only the non-computed fields of the constructor (in declaration order) and inline the RHS of any computed fields. As an example:
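A sketch, using the upthread Polygon with perimeter and surfaceArea as computed fields (Polygon_internal stands for the underlying, full-arity constructor):

-- proposed syntax:
data Polygon = Polygon
  { vertices :: [Point]
  , perimeter = calculatePerimeter vertices
  , surfaceArea = calculateArea vertices
  }

-- an expression occurrence of Polygon would be rewritten to:
\vertices ->
  let perimeter   = calculatePerimeter vertices
      surfaceArea = calculateArea vertices
  in  Polygon_internal vertices perimeter surfaceArea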
(The let allows computed fields to refer to each other and themselves. Any identifiers used in the RHS that are not field names must be inlined hygienically; a local identifier must not shadow a top-level identifier referenced by a computed field.)
Labeled constructions (Polygon { vertices = ... }) are rewritten analogously. Missing fields that are not computed are still initialized to undefined, per the Haskell 2010 Report. It is an error to provide the value of a computed field here.
Uses of the constructor in a pattern are replaced with patterns that contain wildcards for all computed field positions. Labeled patterns are unaffected.
Labeled updates (foo { vertices = vertices foo ++ ... }) are translated per the Haskell 2010 Report as if only non-computed fields exist, and then constructors are rewritten as above. It is an error to provide the value of a computed field here.
Data constructors can be partially applied, just like everything. So what's the type of Polygon [a, b, c]? Why wouldn't it be :: Double -> Double -> Polygon, like any regular data constructor?
Suppose I return a bare Polygon constructor as the result of an expression/function (and suppose type Polygon has other constructors, with differing numbers of computed fields). What's to be the type of the return? Note the type can't encode which constructor got returned.
So with a partially applied constructor, how do I deliberately leave a computed field undefined, because, for example, I know I haven't supplied enough in the "ordinary fields" to support the calculation? (If runtime tries to calculate, it'll throw an error, unlike putting undefined in fields in a H98 data type.)
Emm? Wouldn't you want to bind a var to the computed positions, so you can grab the result of the computation? Indeed that might be a Maybe/Either/List/etc, so you'd want a full pattern match.
As per my "Edit:" to the post you're responding to (perhaps it arrived after you started into your sketch), I'm a lot less than convinced that storing the calculated field inside the data type will gain you much/often. It feels a lot like you're trying to bring an OOP/strict-language idiom into Haskell.
The type would be Polygon. The user-visible Polygon constructor should act like the desugared Polygon constructor minus the computed fields. That's the design goal.
Computed fields are associated with a specific constructor, just like other fields. So if you return bare Polygon, it has type [Point] -> Polygon. If you return another constructor bare, it has the type that that constructor would have if you ignore its computed fields.
What difference would that make? If you never use the computed field, it's just a thunk. If you use it and it never forces the value of an undefined field, you get a valid result. If you use it and it does force the value of an undefined field, the result is the same as what would happen if you called a function that computes the same thing. But the advantage of a computed field over a function is that any sharing of the input value will also share the result of the function, without having to use some sort of external memoization.
Maybe in a Polygon{..}-style wildcard pattern (I didn't include that in the proposal but I could see it going either way). But if you match on Polygon v, I don't think it'd be expected to introduce vars for the computed fields.
You can always match on p@Polygon{} | Just x <- p.computedMaybeThing.
If you want to look at it that way, sure. But benefiting from sharing data is an existing Haskell idiom for both performance and making practical use of lazy infinite structures, and it covers more cases than CSE does (and GHC is a little unpredictable about when it applies CSE anyway, I believe).
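Roughly, with existing features (a sketch; calculatePerimeter and calculateArea as upthread):

{-# LANGUAGE PatternSynonyms #-}

data Polygon = Polygon_internal
  { vertices :: [Point]
  , perimeter :: Double
  , surfaceArea :: Double
  }

-- Explicitly bidirectional: constructing computes the fields, matching skips them.
pattern Polygon :: [Point] -> Polygon
pattern Polygon vs <- Polygon_internal vs _ _
  where Polygon vs = Polygon_internal vs (calculatePerimeter vs) (calculateArea vs)

{-# COMPLETE Polygon #-}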
and don't expose Polygon_internal. That should do it. (Add some pragmas for pattern completeness, and probably a few other things having to do with pattern synonyms' rough edges, but that's the basic rewrite taken care of with existing language features at least.)
My first thought was, of course it is not an anti-pattern, it's been designed for that exact reason.
Since reading all the comments on this post, I have changed my mind: this is definitely an anti-pattern.
First, there is no benefit to using user.age over (age user), unless you can use it to update things, or you are already using user.name, which is not idiomatic so far.
If it is to "ease" the transition for people coming from an OOP background, I doubt it will be effective.
Then, it hides the fact that age can be time-consuming and will be recalculated every time. As @rhendric pointed out, the whole thing could be "enhanced" with something along the lines of
[quote="rhendric, post:14, topic:8237"]
Extend the grammar production fielddecl:
fielddecl → vars :: (type | ! atype)
+ | var = exp
This production defines a "computed field" for its associated constructor. This is equivalent to defining an ordinary field with the type of exp, except for the following differences:
[/quote]
But this will make what some already think is overcomplicated (*) even more complicated.
However, I think there are legitimate uses for fields and tight binding.
For example, for people who want to use records as ML modules (for dependency injection or mocking ...), it makes sense to be able to write mod.f instead of Mod.f, and this without having to bother with parentheses.
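For instance (a made-up Logger record standing in for a module):

{-# LANGUAGE OverloadedRecordDot #-}

data Logger = Logger
  { logInfo :: String -> IO ()
  , logError :: String -> IO ()
  }

run :: Logger -> IO ()
run m = do
  m.logInfo "starting"   -- reads like a qualified Mod.f; no parentheses needed
  m.logError "oops"      -- vs. logError m "oops" without the extension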
After reflecting on the explanation above:
If we see in x.f that f "belongs" to x (as Maybe belongs to Data.Maybe), then it makes sense that the tight binding makes type inference unidirectional: you need to know x before choosing f.
Therefore x.f is more than (f x); it also adds the functional dependency x -> "f" -> f.
Something where HasField excels is "lifting" a nested field up. This can be used to rewrite @blamario's problem (I have had similar problems and came up with the following approach). Instead of writing the HasField "surfaceArea" Polygon instance from the original post, or extending GHC to allow computed fields in the data declaration, one can reverse the problem and write this instead:
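A sketch (assuming the "pure" Polygon carries only its vertices; PolygonDenorm is defined in the example below):

{-# LANGUAGE GHC2021, DataKinds, OverloadedRecordDot #-}
import GHC.Records

data Polygon = Polygon { vertices :: [Point] }

-- Lift the nested field up: a PolygonDenorm can then be used
-- as if it were a Polygon wherever .vertices is wanted.
instance HasField "vertices" PolygonDenorm [Point] where
  getField d = d.polygon.vertices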
Denorm stands for denormalized. It is now clear that you have a "pure" Polygon which hasn't been spoiled with optimization concerns, and a denormalized version of a polygon which keeps a thunk for each calculated value that you want to keep "cached".
Defining a vertices field on PolygonDenorm, as sketched above, will let you use it as if it were a Polygon.
You can then use all the smart constructor/pattern synonym techniques to prevent people from updating a polygon (to make sure that the "computed" fields stay in sync).
Example
data PolygonDenorm = PolygonDenormPrivate -- Don't export this constructor
  { polygon :: Polygon
  , perimeter :: Double
  , surface :: Double
  }

-- | Smart constructor
polygonDenorm :: [Point] -> PolygonDenorm
polygonDenorm vertices =
  PolygonDenormPrivate (Polygon vertices) (calculatePerimeter vertices) (calculateSurface vertices)

-- requires PatternSynonyms; one-way: matching only, no construction
pattern PolygonDenorm :: Polygon -> Double -> Double -> PolygonDenorm
pattern PolygonDenorm polygon p s <- PolygonDenormPrivate polygon p s
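Usage then looks like this (p1, p2, p3 standing for some Points):

poly :: PolygonDenorm
poly = polygonDenorm [p1, p2, p3]

-- cached at construction, no recomputation on access:
cost = poly.perimeter + poly.surface
-- and vertex access goes through the lifted instance:
vs = poly.vertices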
Quite the opposite actually. I like Haskell's laziness and I'd like to see it expanded in scope rather than restricted, as with the StrictData and Strict "extensions". And I'm not looking at OOP as an aspirational goal, but at attribute grammars.
That's what I was considering as well, @rhendric, but I don't think it would be very useful on its own.
The biggest issue is that you'd have to declare all computed fields at once. That means the feature would be usable only by the original author of the data type declaration. You couldn't import a data type from a different module or library and add a computed field to it.
To overcome that problem we'd have to add some form of record subtyping, which would of course be a much bigger project.