Haskell Extension idea: QualifiedData

I’ve bouncing this idea around in my head for a while now, and after playing with some imaginary code, I thought I’d share it with the community to get some feedback. I guess you might consider this a preliminary proposal to see if its even worth investing more effort into the idea or creating an official proposal.


:warning:
A disclaimer beforehand, I felt reluctant to even share this because I didn’t want this idea to potentially derail Richard’s Local Modules proposal, but I’ve become quite discouraged from reading all of the pushback against the proposal along with other proposals of similar nature (first class modules, Namespaces, etc).

Not that this pushback is bad, many opponents of these proposals brought up some valid points. But I’ve noticed the general sentiment with Haskell proposals is that the smaller and less invasive they are, the better chance they have of succeeding. So with that in mind, my preliminary proposal follows!


The Problem / Motivation

Duplicate record fields and data constructors. I want to write code like this:

data Person = Person
  { name :: Text, ... }

data Company = Company
  { name :: Text, ... }

-- Or
data Breakfast = Eggs | Banana

data Lunch = Sandwich | Banana

But I can’t without getting complaints about name or Banana already being used.


Workarounds

For records, there’s DuplicateRecordFields, but this doesn’t help with record update syntax

As suggested in the convenience thread RecordWildCards and OverloadedRecordUpdate can also remedy the update issues, but OverloadedRecordUpdate is experimental currently, and I personally do not like the syntax of RecordWildCards - it feels cluttered to me, especially if I have to do something like:

modifyCompany Payload -> Company -> Company 
modifyCompany payload oldCompany@Company{..} = 

-- I.e I want to refer to company as a whole value + with the Record Wildcard

For duplicate data constructors, the only workaround is to either physically create separate modules for each data type, or to add some prefix to your type such as:

data Breakfast = BEggs | BBanana

data Lunch = LSandwich | LBanana

This workaround also works for record fields, but this solution is very unsatisfying.
Now one might “its just syntax, get over it”, but the point of this extension is to make haskell a more convenient language to use. We need to focus on convenience if we hope to attract a greater audience, and I believe my suggestion is a pretty boring and standard in relation to other languages.


The Proposal

According to GHC Documentation

… Given a datatype definition

data Foo = MkFoo { bar :: Int, baz :: String }

The following will be available:

  1. the type constructor Foo;
  2. the data constructor MkFoo;
  3. the fields bar and baz for record construction, update, and pattern matching; and
  4. the selector functions bar :: Foo -> Int and baz :: Foo -> String.

With this in mind, we add a language extension QualifiedData which sticks the declarations of 2-4 in some sort of implicit module given the same name as the type itself; in the above case: Foo.

With this in mind, we add a language extension QualifiedData which changes the above rules to:

The following will be available:

  1. the type constructor Foo; (No change)
  2. An local module Foo;
  3. the data constructor MkFoo within module Foo;
  4. the fields bar and baz for record construction, update, and pattern matching;
    4b. the fields of rule 4 are also created within module Foo; and
  5. the selector functions bar :: Foo -> Int and baz :: Foo -> String are created within module Foo

This means (when QualifiedData is enabled)

  1. The data constructor for foo would be Foo.MkFoo
  2. To construct a MkFoo, you could say:
    Foo.MkFoo { Foo.bar = 12, Foo.baz = "Baz" } or
    Foo.MkFoo { bar = 12, baz = 12 }
  3. To update a record foo with a value of MkFoo, you say: foo { Foo.bar = 12, Foo.baz = "baz" }
  4. Record selectors would take the form Foo.bar someFoo

This would be applied to all data declarations for any module the extension was enabled in.
(I don’t think theres much point in using this for newtype on further reflection)

The benefits:

  1. It automatically solves both duplicate record fields and duplicate data constructors without requiring any other extensions. (I think… I could be wrong :sweat_smile:)

  2. It makes field selectors less problematic because they no longer pollute the general term namespace. I know some people don’t feel fond of field selectors, but I have found them handy on occasion to use, such as when doing some type of fmap.

  3. It doesn’t require a change in any library code out in the wild. Everyone who doesn’t want it writes haskell code the same as usual. Everyone who does want it gets to all of their data types behind an implicit module without bothering others.


Caveats

I’ve thought of one caveat to point 3. The following code is currently illegal in haskell today

data Breakfast = Eggs | Banana

data Lunch = Sandwich | Banana

This gives the error Multiple declarations of Banana. Using QualifiedData should prevent this error from being raised, but how do consumers of this module deal with this without requiring them to enable QualifiedData?

One workaround I can think of here is to conditionally add a prefix to the data constructors so that they become something like:

data Breakfast = BreakfastEggs | BreakfastBanana

data Lunch = LunchSandwich | LunchBanana

Is this even a possible workaround? I don’t know enough about the implementation of GHC to know if this can even be done.

Edit: Actually. the more I think about this, the more I realize this isn’t a valid workaround. Even if it was, it’d probably create confusion. I’ll think about this point a bit more. Suggestions are welcome as well


Other notes and comparisons:

Compared to the other proposals such as Local Modules, First Class Modules, or Namespaces, this proposal is far less powerful. But one might consider this an advantage since it avoids lot of the pitfalls and edge cases of the other proposals. We dont have to worry about the various ways exporting submodules could impact users and how this interacts with using/not-using local modules.

What I like about the Local Modules proposal over this one is that it feels like a more general purpose and more composable solution. But I hope (maybe naively) that it might be possible to implement this in such a way that it lays the ground work for a more proper Local modules implementation, and also reaps us some benefits along the way.


Some Example code

When I was first playing with this idea, I wrote the following imaginary code. I leave it here just so you can see the extension in action

Some plain old file: SomeModule.hs

module SomeModule where

data FooType
  = Foo
  | Bar
  | Baz

data Person
  = Person { name :: Text, age :: Int }

And a main module Main.hs where QualifiedData is used

{-# LANGUAGE QualifiedData #-}

module Main where

import SomeModule (FooType(..), Person(..))

showFooType :: FooType -> String
showFooType f = case f of
  FooType.Foo -> "foo"
  FooType.Bar -> "bar"
  FooType.Baz -> "baz"

createPerson :: Text -> Int -> Person
createPerson name age = 
  Person { Person.name = name, Person.age = age }

testPerson :: Person -> String
testPerson p = 
  p { Person.name = "hello" }

What I hope to gain from this post is some general feedback, yays and nays! And perhaps guidance on how this proposal might be refined if refinement is needed and people are generally in favor of this in general

Thanks for taking the time to read this!

5 Likes

Thanks for thinking about this! I think it might well be possible to carve out a subset of Local Modules like this that makes sense on its own but also lays the groundwork for the full-blown Local Modules in the future. And I agree it could provide a nice solution to the problems of duplicate record fields. However, I worry it may still be complex to fully specify.

For example, how does it interact with qualified imports? If I say import qualified SomeModule then presumably SomeModule.FooType.Foo refers to the Foo data constructor of FooType exported from SomeModule, but what if there is also a module called SomeModule.FooType?

3 Likes

I thought about that during my brain storming and the best I could come up with is to just qualify the module to a different name. E.g: import SomeModule qualified as OtherModule. Otherwise not fixing it would result in some type of ambiguous name error.

Hmm, currently the way Haskell works, it does some type of module merging logic.

import Foo.Bar qualified
import Bar.Baz qualified as Foo.Bar

Seems to work fine with no errors of any sort. I don’t really understand what’s happening there but I wonder if that same behavior can be applied here? But maybe just giving some ambiguity error is still the better option?

Right now, unless I’m mistaken, a name is always (PascalID.)*(camelID|PascalID), and the prefix is always a reference to a module, however it has been imported. This proposal would complicate that, and while we can defer any ambiguity errors that arise in a similar way we currently disambiguate between modules imported with the same name, I’m not keen on having to explain exactly what the new structure of a name is to learners. At least with the local modules proposal the prefix is still a module; there’s just a new way to declare modules.

I would not have this concern if some syntax that was not TypeName.accessor/TypeName.Constructor was chosen for this feature. It does seem to me like a different sort of thing than pulling a name from a module namespace, and perhaps it deserves a different way to represent it rather than further overloading poor ..

It strikes me that that the import trick (see also here) for working with duplicate record fields could be considered a primitive version of this proposal: it carefully imports different subsets of definitions of a module and gives them different aliases, which usually correspond to datatype names.

import Types as Person (Person(..))
import Types as Company (Company(..))
1 Like

Well my hope is that this proposal would be implemented essentially as a subset of the local module proposal. I.e: Declaring a type like data FooType = ... would be declaring a local module implicitly. They just wouldn’t have the ability to explicitly create local modules in the meantime.

The same behavior was outlined in the local modules proposal too. See Motivation under point #2.

If it turned out that we didn’t want this to be implemented as a local module, I wouldn’t be opposed to using a different syntax instead of the . though. The dot just feels natural as many other languages handle enums and union types the same way, e.g: F#, C#, Dart, Typescript, etc. Rust notably uses Foo::Bar syntax, but I’ve seen a lot of pushback on that one; I don’t particularly care for it myself.


Yea, this is exactly how I currently deal with this situation for duplicate record fields. It’s quite annoying. It’d be nice if this happened automatically. For duplicate data constructors, I’ve opted to use the typename-prefix workaround :persevere:

You mean a syntax like

import Types as X dataaliases

Which would have the effect of importing

import Types as X
import Types as X.Person (Person(..))
import Types as X.Company (Company(..))

that is, a submodule alias for every datatype that we import from X?

For duplicate data constructors

Perhaps we could have DuplicateConstructorNames extension analogous to DuplicateRecordFields.

1 Like

Haha no I didn’t mean that specifically but that’s an interesting thought on its own! I think that type of proposal is essentially just syntactic sugar, its even less invasive than what I’ve written up here. Good idea! One might also imagine some type of syntax where you choose what data types are aliased
e.g:

import Types(qualified Person(..), qualified Company(..))

Maybe a request to make aliases for all data types could look like

import Types(qualified)

What I meant though was basically what I’ve specified in this proposal, which would basically cause these aliases to happen automatically as well, only as some type of implicit submodule instead of just a module alias.

Wouldn’t this be unambiguous even without the field qualifiers, i.e. Foo.MkFoo { bar = 12, baz = "Baz" }? I realize this is not quite forward-compatible with local modules you’re hoping for, but since the main motivation is the ergonomics of records…

Ah yeah you’re right. Record update would require the qualified reference but record construction wouldn’t. Thanks for chiming in with that.

So I guess the declarations from point 3 referenced above should be both created unqualified and also qualified. So they’ll be more ergonomic for record construction but the qualified version can be used for record update. I’ve updated the original spec I gave to include these points more explicitly. Thanks for the help!

Ok, lets try this again with a 2nd variation of this proposal. This one is based on this comment

The motivation/problem section are the same as the initial post in this thread.

The Proposal

We add a language extension QualifiedData which has the effect of automatically silently importing all Data types under a qualified alias.

So for example, for the following module code:

module SomeModule where

data FooType = Foo | Bar | Baz

data Person
  = Person { name :: Text, age :: Int }

data Company
  = Company { name :: String, website :: String }

Importing the above module with QualifiedData like so:

{-# LANGUAGE QualifiedData #-}
module Main where

import SomeModule
import qualified SomeModule as SM

would be equivalent to:

{-# LANGUAGE QualifiedData #-}
module Main where

import SomeModule
import qualified SomeModule as FooType (FooType(..))
import qualified SomeModule as Person (Person(..))
import qualified SomeModule as Company (Company(..))

import qualified SomeModule as SM
import qualified SomeModule as SM.FooType (FooType(..))
import qualified SomeModule as SM.Person (Person(..))
import qualified SomeModule as SM.Company (Company(..))

The benefits:

  1. We get a more ergonomic way to disambiguate/use duplicate record fields
  2. With the addition of some type of DuplicateDataConstructors extension, we get the same benefits as #1 for data constructors
  3. It makes disambiguation of field selectors more ergonomic. (Same as original point 2)
  4. It doesn’t require a change in any library code out in the wild. Everyone who doesn’t want it writes haskell code the same as usual. Everyone who does want it gets to all of their data types behind an implicit module without bothering others. (Same as original point 3)
  5. It minimizes the noise/length of imports
  6. This change is completely syntactic sugar, therefore there is no new behavior to accommodate for, and no new machinery that needs to be created to do this. Nothing is happening in this extension that users can’t already do manually, it just automates it, making it much less annoying.
  7. (Authors opinion): It brings us much closer to the experience of other languages, where enum-like types can be “dotted” into to look at its members.
  8. Because this extension isn’t invasive, it’s likely easy to deprecate if something much better comes along later.

As always, any feedback on this idea is greatly appreciated. Thanks for reading!

4 Likes

I like this more than the original proposal, I think!

How would it interact with more complicated imports?

import SomeModule hiding (FooType)
  • FooType is not in scope, but Foo, etc. are. Is the import ... as FooType (FooType(..)) generated?
import SomeModule (Foo, Bar)
  • Does this generate import ... as FooType (Foo, Bar)?
import SomeModule hiding (Foo, Bar)
  • I suppose this might generate import ... as FooType (Baz)?

What about classes? Their members can also be imported via the (..) syntax; would it make sense to support augmenting

import Data.Bifunctor

with

import qualified Data.Bifunctor as Bifunctor (Bifunctor(..))

too?

Hmm, those are all interesting questions - thanks for bringing them up!

After giving this some thought, I think the conclusion I’ve come to is that, unless the entirety of a type has been hidden, e.g: hiding (FooType(..)), then the extra qualified import would be created.


This isn’t actually valid syntax (atleast from my testing it didn’t work), but I’m guessing the example would more correctly be

import SomeModule (FooType(Foo, Bar))

Which following the above rule: Yes, this would generate the separate import as you inquired above.


Yep, sounds correct to me.


Yea this sounds totally reasonable and seems to fit well into the system so I don’t see why not! I guess we might change the name of the extension to QualifiedEntities or QualfiiedImportItems (or whatever) in that case.

Thanks for asking those questions, that helped clarify this proposal further!

1 Like

Oh yes, my mistake! I was originally thinking about the similar, but actually legal, version where it’s fields instead of constructors being imported individually:

import SomeModule (website)

Does that generate a qualified import?

And then what about

import SomeModule (name)

Does that generate two qualified imports?

In this case, I think the answer is no. I think the rule here would be that the qualified imports are only created if the type itself is actually referenced when importing, i.e: It won’t trigger if you merely import field selector functions.

E.g:

import SomeModule (website) -- No qualified import created

import SomeModule (Company(website)) -- Qualified import for Company *is* created

General Rules/Spec

In general, I think this leads the specification to the following rules:

  1. Unspecified imports automatically create qualified imports for all data types and classes

  2. When using an import spec, qualified imports are only created if the type is explicitly mentioned,
    e.g: import FooMod (Foo(..)) or import FooMod (Foo(Foo, Bar, fooField, barField)). In this case, the qualified import will match the import spec for that type.
    But import FooMod (fooField, barField) will not generate qualified imports

  3. When importing with hiding, the qualified import will not be generated only if the type is explicitly omitted in its entirety, e.g: import SomeModule hiding (Foo(..)). Any unmentioned data or classes will have qualified imports generated.

  4. When import hiding only partially excludes components of a type, that type will still be imported qualified minus whichever fields were hidden

e.g:

import SomeModule hiding (Company(website))

becomes

import SomeModule hiding(Company(website))
import SomeModule qualified as Company (Company(name))

Alternative Rules

Although maybe the above rules are more complicated than they need to be. One might argue that within the qualified imports, access to everything should always be granted. Meaning

Rule 2 becomes:
Any time a type is mentioned at all in an import list, a full qualified import is generated,
e.g:

import SomeModule (Company(website))

becomes:

import SomeModule (Company(website))
import SomeModule qualified as Company (Company(..))`

Rule 4 becomes:
4. When import hiding only partially excludes components of a type, that type will still be imported qualified in its entirety

e.g:

import SomeModule hiding (Company(website))

becomes

import SomeModule hiding(Company(website))
import SomeModule qualified as Company (Company(..))

I’m not sure which is better between the first set of rules and this alternative set, but the alternative set is probably much simpler to implement. It might actually be the more desirable option as well


Duplicate Imports

One last thought I had: I don’t think we need to worry about the case of partially duplicated imports

import SomeModule (Person (name))
import SomeModule (Person (age))

I think in this case, maybe two qualified imports would be created like so:

import SomeModule (Person (name))
import SomeModule (Person (age))

import SomeModule qualified as Person (Person(name))
import SomeModule qualified as Person (Person(age))

But this doesn’t seem to be a problem since the modules would just merge. Also, code formatters like ormoulu (and maybe fourmolu) automatically combine the original 2 imports into a single one anyways

I’ve realized that this 2nd version of the proposal has one fatal flaw - the behavior only works when importing a module, it doesn’t work in the module where data and classes are declared. This creates an inconsistent experience at best, but ultimately doesn’t really address the problem in its entirety. In that sense, I think the first proposal (being a subset of Richard’s original proposal) is definitely the better way to do this. I guess I’ll have to sit and think on it more

1 Like