Intro
I’ve noticed a common problem cropping up in some of the projects I’ve worked on, regarding heavily parameterised datatypes with several type aliases for different use-cases.
I’ve also had the concept of a solution percolating around in my head for a while now, so I wanted to get it out and see if anyone has any thoughts on it.
Context
A common pattern in large Haskell codebases is product types that are fully parameterised, for example:
data User' id username password birthday friends = MkUser
{ userId :: id
, userUserName :: username
, userPassword :: password
, userBirthday :: birthday
, userFriends :: friends
}
deriving (Show, Eq, Generic)
We would then define type aliases for different situations:
type User =
User'
UserId
UserName
()
Day
[UserName]
type UserCreate =
User'
()
UserName
PlainTextPassword
Day
[UserName]
type UserPatch =
User'
()
(Maybe UserName)
(Maybe PlainTextPassword)
(Maybe Day)
(Maybe [UserName])
Why?
There are several reasons for doing this, rather than writing a bunch of independent data declarations:
-
We can re-use code, for example:
countUserFriends :: User' id username password birthday [UserName] -> Int
countUserFriends = length . userFriends
Then we can use countUserFriends with User, UserCreate etc.
-
All the type aliases can share instances. For example, if we use deriveJSON, the generated JSON instances will apply to User, UserCreate, UserPatch etc.:
deriveJSON (defaultOptions { omitNothingFields = True }) ''User'
ghci> encode (MkUser () "michaelscott69" "1234" (YearMonthDay 1965 March 15) [] :: UserCreate)
{ "userUserName": "michaelscott69", "userPassword": "1234", "userBirthday": "1965-03-15", "userFriends": [] }
ghci> encode (MkUser () Nothing Nothing Nothing (Just ["dwightschruteARM"]) :: UserPatch)
{ "userFriends": ["dwightschruteARM"] }
Note that omitNothingFields will only omit fields for parameterised types with aeson >= 2.2.
-
We can easily convert between the aliases:
makeUser :: UserCreate -> IO User
makeUser userCreate = do
  newId <- genId
  pure $ userCreate { userId = newId, userPassword = () }
-
It makes it very easy to use database libraries like Opaleye:
makeAdaptorAndInstanceInferrable' ''User'
type UserSql =
  User'
    (Field PGUserId)
    (Field PGUserName)
    (Field PGPasswordHash)
    (Field SqlDate)
    () -- many-to-many relationship handled by a separate table
userTable :: Table UserSql UserSql
userTable = table "users" . pMkUser $ MkUser
  { userId = tableField "id"
  , userUserName = tableField "username"
  , userPassword = tableField "password_hash"
  , userBirthday = tableField "birthday"
  , userFriends = pure ()
  }
insertUser :: Connection -> UserCreate -> IO User
insertUser conn userCreate = do
  newId <- genId
  passwordHash <- hashPassword $ userPassword userCreate
  let userToInsert = userCreate { userId = newId, userPassword = passwordHash, userFriends = () }
      userSql :: UserSql = toFields userToInsert
  runInsert conn $ Insert userTable [userSql] rCount Nothing
  pure $ userCreate { userId = newId, userPassword = () }
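To make the first and third points concrete, here is a small self-contained sketch that should load in GHCi. The newtypes `UserId`, `UserName` and `PlainTextPassword` are hypothetical stand-ins for the real domain types, the example data is made up, and `makeUser` takes the new id as an argument instead of calling the (undefined) `genId`. The friends element type is also left polymorphic, so `countUserFriends` works for any alias whose friends field is a list.

```haskell
-- A minimal sketch of the fully parameterised pattern.
-- UserId, UserName and PlainTextPassword are hypothetical stand-ins.
import Data.Time.Calendar (Day, fromGregorian)

newtype UserId = UserId Int deriving (Show, Eq)
newtype UserName = UserName String deriving (Show, Eq)
newtype PlainTextPassword = PlainTextPassword String deriving (Show, Eq)

data User' id username password birthday friends = MkUser
  { userId       :: id
  , userUserName :: username
  , userPassword :: password
  , userBirthday :: birthday
  , userFriends  :: friends
  }
  deriving (Show, Eq)

type User       = User' UserId UserName () Day [UserName]
type UserCreate = User' () UserName PlainTextPassword Day [UserName]

-- Polymorphic in everything except the friends list, so it works
-- for User, UserCreate and any other alias with a list of friends.
countUserFriends :: User' id username password birthday [friend] -> Int
countUserFriends = length . userFriends

-- Type-changing record update: `id` goes from () to UserId and
-- `password` goes from PlainTextPassword to (). The new id is passed
-- in rather than generated, to keep the sketch pure.
makeUser :: UserId -> UserCreate -> User
makeUser newId userCreate = userCreate { userId = newId, userPassword = () }

-- Example value; the data is made up.
alice :: UserCreate
alice = MkUser
  { userId       = ()
  , userUserName = UserName "alice"
  , userPassword = PlainTextPassword "hunter2"
  , userBirthday = fromGregorian 1990 1 1
  , userFriends  = [UserName "bob", UserName "carol"]
  }
```

The update in `makeUser` is accepted because the type variables `id` and `password` each appear in only one field, and both of those fields are updated.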
Problems
However, there are some drawbacks to this approach. For one thing, as the domain grows, the list of type parameters can become very long and difficult to work with:
data User' id username password birthday friends paperSold bossesSleptWith favouriteResort ......
countUserFriends :: User' id username password birthday [UserName] paperSold bossesSleptWith favouriteResort ...... -> Int
countUserFriends = length . userFriends
Moreover, every general function like countUserFriends, which only needs a subset of the parameters to be specified, has to be updated every time we add a parameter to User'. Of course, GHC is nice enough to tell us exactly where we need to do this, but it’s still a pain, and it’s not totally error-proof if we accidentally get the parameters mixed up.
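As a partial workaround that already exists in GHC (not part of the proposal), a read-only function like countUserFriends can constrain just the field it reads using `HasField` from `GHC.Records`, so its signature never mentions the other parameters. A minimal sketch, with a simplified `User'` and made-up example values:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}
-- Sketch: constrain only the field a function reads, via HasField.
import GHC.Records (HasField (getField))

data User' id username password birthday friends = MkUser
  { userId       :: id
  , userUserName :: username
  , userPassword :: password
  , userBirthday :: birthday
  , userFriends  :: friends
  }

-- No mention of the other parameters: adding a new parameter
-- to User' leaves this signature untouched.
countUserFriends :: HasField "userFriends" r [friend] => r -> Int
countUserFriends = length . getField @"userFriends"

-- Made-up example value with plain String stand-ins.
exampleUser :: User' Int String () String [String]
exampleUser = MkUser 1 "michael" () "1965-03-15" ["dwight", "jim"]
```

This only helps with accessors: `HasField` in base provides `getField` but no setter, so construction and type-changing updates still need the full parameter list or an alias.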
Another issue is that it becomes harder to use ad-hoc variants of User'. In the Opaleye snippet above, let’s say we wanted to create an ephemeral type alias (or just an inline type annotation) to represent the argument to toFields. This would be very similar to User, with one or two differences, but we still need to recreate the whole parameter list:
type UserToInsert =
User'
UserId
UserName
PasswordHash
Day
()
- let userToInsert = userCreate { userId = newId, userPassword = passwordHash, userFriends = () }
+ let userToInsert :: UserToInsert = userCreate { userId = newId, userPassword = passwordHash, userFriends = () }
Again, every time we change User', User or UserSql, we’ll also have to update UserToInsert.
One DRY solution to this is to use partially-parameterised types:
type User'' password friends =
User'
UserId
UserName
password
Day
friends
type User = User'' () [UserName]
type UserToInsert = User'' PasswordHash ()
The problem, again, is that as the domain grows and we need more and more different variants of User'', the intersection of all their parameters becomes smaller and smaller, until we end up with a chaotic situation like:
type User'''' id username password friends =
User'
id
username
password
Day
friends
type User''' id password friends = User'''' id UserName password friends
type User'' password friends = User''' UserId password friends
type User = User'' () [UserName]
type UserCreate = User''' () PlainTextPassword [UserName]
type UserToInsert = User'' PasswordHash ()
type UserPatch = User' () (Maybe UserName) (Maybe PlainTextPassword) (Maybe Day) (Maybe [UserName])
This quickly becomes unmaintainable.
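One way to at least catch mixed-up parameters in a chain like this is to pin each alias to its fully expanded form with a compile-time equality check using `:~:` from `Data.Type.Equality`. A sketch, with empty hypothetical stand-ins for the domain types:

```haskell
{-# LANGUAGE TypeOperators #-}
-- Sketch: pin each alias in the chain to its full expansion.
-- The domain types are empty hypothetical stand-ins.
import Data.Time.Calendar (Day)
import Data.Type.Equality ((:~:) (Refl))

data UserId
data UserName
data PlainTextPassword
data PasswordHash

data User' id username password birthday friends =
  MkUser id username password birthday friends

-- The alias chain from above.
type User'''' id username password friends = User' id username password Day friends
type User''' id password friends = User'''' id UserName password friends
type User'' password friends = User''' UserId password friends
type User = User'' () [UserName]
type UserCreate = User''' () PlainTextPassword [UserName]
type UserToInsert = User'' PasswordHash ()

-- Each definition only typechecks if the alias expands to exactly
-- the type on the right-hand side of :~:.
userCheck :: User :~: User' UserId UserName () Day [UserName]
userCheck = Refl

userCreateCheck :: UserCreate :~: User' () UserName PlainTextPassword Day [UserName]
userCreateCheck = Refl

userToInsertCheck :: UserToInsert :~: User' UserId UserName PasswordHash Day ()
userToInsertCheck = Refl
```

These checks are yet another list to keep in sync, so they mitigate the maintenance problem rather than solve it.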
What can we do about it?
I think that a lot of these issues will go away once we have native first-class row types in Haskell. In the meantime, I imagine a language extension that would solve some, if not all, of the problems I’ve highlighted.
I’m picturing standard value-level Haskell record syntax, lifted to the level of type parameters.
Explicitly, type parameters in data declarations would now have names, with (optional) kind annotations:
data User'
{ id :: Type
, username :: Type
, password :: Type
, birthday :: Type
, friends :: Type
} = MkUser
{ userId :: id
, userUserName :: username
, userPassword :: password
, userBirthday :: birthday
, userFriends :: friends
}
Type aliases could now (optionally) be written using record constructor syntax:
type User =
User'
{ id = UserId
, username = UserName
, password = ()
, birthday = Day
, friends = [UserName]
}
Types could now also be defined using record update syntax, with the exact same semantics as value-level record update:
type UserCreate =
User -- note this refers to the type alias, not the data declaration
{ id = ()
, password = PlainTextPassword
}
In theory, we could also have some kind of record accessor syntax:
type UserPatch =
User -- note this refers to the type alias, not the data declaration
{ id = ()
, username = Maybe (username User)
, password = Maybe (password User)
, birthday = Maybe (birthday User)
, friends = Maybe (friends User)
}
Of course, type aliases could still be written using the normal syntax:
type UserSql =
User'
(Field PGUserId)
(Field PGUserName)
(Field PGPasswordHash)
(Field SqlDate)
()
Somehow (maybe maybe), we could imagine a kind of record pattern matching at the type level:
countUserFriends :: User' { friends = [UserName] } -> Int
countUserFriends = length . userFriends
Maybe even inline record updates, with bindings?
addUserId :: user@(User' {}) -> IO (user { id = UserId })
addUserId u = do
  newId <- genId
  pure $ u { userId = newId }
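For comparison, the closest thing expressible today is a function that names every parameter of User' and relies on a type-changing record update. A sketch that abstracts over the monad and takes the id generator as an argument, since `genId` is hypothetical:

```haskell
-- Sketch of addUserId in today's Haskell: every parameter of User'
-- is named, and a type-changing record update swaps the id type.
-- The id generator is an argument because genId is hypothetical.
data User' id username password birthday friends = MkUser
  { userId       :: id
  , userUserName :: username
  , userPassword :: password
  , userBirthday :: birthday
  , userFriends  :: friends
  }

newtype UserId = UserId Int deriving (Show, Eq)

addUserId
  :: Monad m
  => m UserId
  -> User' oldId username password birthday friends
  -> m (User' UserId username password birthday friends)
addUserId genId u = do
  newId <- genId
  pure $ u { userId = newId }
```

This is the bookkeeping the proposed `user@(User' {})` binding would remove: every new parameter of User' has to be added to this signature as well.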
Caveats
I’ve never worked on GHC before. I have absolutely no idea how easy or even possible any of this would be to implement. I would imagine that the more basic aspects would be syntactic sugar, but maybe others would be more serious changes.
I’m also not sure how any of this ties in with ongoing GHC development, such as Dependent Haskell, Linear types, or row types.
Questions for the reader
I hope I’ve communicated this clearly enough. Maybe there’s an existing solution that I just don’t know about, or some best practices I can use to avoid the problems I outlined?
Otherwise, it’d be interesting to know if anyone’s tried implementing this before; if not, whether people think it would be a good idea.
Anyway, thanks for reading!