Using Haskell for commercial/data-intensive applications -- name clashes

Are you reading/writing ‘wide’, flat records from/to a database, or from/to other applications in an interchange format such as JSON? How do you declare the data structures in Haskell?

Are you running into the familiar difficulty with Haskell data/record labelling: you want to use the same label name in many different record types? Do you want to use the same name for data constructors in different data types?

(This discussion was triggered by a proposal around same-named data constructors.)

We’re all familiar with the challenge, so I’ll recap briefly:

  • Some fields you want to give generic labels such as id, name, status. Other modules might be doing the same. You can at least use module prefixing to resolve ambiguities, but that gets verbose. Or give longer labels studentName, facultyName – also verbose.
  • Some fields of different types within the same module you want to give exactly the same name because they denote exactly the same field. studentId on a course enrollment, studentId on a class attendance or exam result. Those data structures are defined in the same module, so module prefixing won’t help.
  • Prefix the data type name to the field label? That rapidly gets verbose and ugly.
  • Other languages use a dot suffix format Enrollment.studentId (where Enrollment is a type) or thisEnroll.studentId or this.studentId (where this is an object/variable). This also can get verbose, but your editor/IDE at least might recognise the dot format and give helpful prompts.
  • In Haskell those dot lexical formats are already used: for module prefixing; or . as function composition.
  • We’ve had DuplicateRecordFields for some time, which at least doesn’t barf at the point of declaration. And DisambiguateRecordFields, with increased sophistication at 9.2. OverloadedRecordFields is about to land. (That’s an initial step, more work to come.)
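
To make the same-module case concrete, here is a minimal sketch, assuming GHC 9.2 or later with the DuplicateRecordFields and OverloadedRecordDot extensions. The record types and helper functions are hypothetical, made up for illustration only:

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedRecordDot #-}

-- Hypothetical record types: both carry a field named studentId,
-- and both are declared in the same module.
data Enrollment = Enrollment
  { studentId  :: Int
  , courseCode :: String
  }

data Attendance = Attendance
  { studentId :: Int
  , present   :: Bool
  }

-- With record dot syntax the field is resolved from the record's type,
-- so no prefixing of the label is needed.
enrolledStudent :: Enrollment -> Int
enrolledStudent e = e.studentId

-- Same label, different record type.
attendedStudent :: Attendance -> Maybe Int
attendedStudent a = if a.present then Just a.studentId else Nothing
```

This only covers field selection; it says nothing about record update or about using studentId as a plain selector function, which is where the ambiguity tends to bite.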

There’s a similar (though perhaps less challenging) naming clash with data constructors:

  • Different modules by chance using the same tags for enumerations: Oct, Dec for months vs Oct, Dec for radixes. (Module prefixing should help here.)
  • Do you name your constructors the same as the data type they wrap? data Region = City City | State State | ... In H98 the constructors live in a different namespace to type names, but new swishy Haskell wants to collapse those namespaces; imports/exports aren’t so well aware of the distinction; and other developer tools don’t know about Haskell namespaces at all.
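
A small sketch of the Region pattern from the last bullet, in plain H98 style. The wrapped types and their constructors (MkCity, MkState) are my own assumptions, chosen so that the data constructors do not clash with each other inside one module:

```haskell
-- Hypothetical wrapped types. Their own constructors need distinct names,
-- because the data constructors City and State are taken by Region below.
newtype City  = MkCity String
newtype State = MkState String

-- In each alternative the first City/State is a data constructor and the
-- second is a type; H98 keeps the two in separate namespaces.
data Region = City City | State State

describe :: Region -> String
describe (City (MkCity n))   = "city " ++ n
describe (State (MkState n)) = "state " ++ n
```

In type signatures City always means the type here; with DataKinds switched on, the promoted constructor would have to be written 'City to tell the two apart.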

If you look back at the history, the H98 design for ADTs/records was always a stopgap. There was an alternative design already developed ~1996. There were many papers/proposals in the early 2000s, so nobody was proud of it even then. But no proposal ‘stuck’, so from ~2012, people started putting lipstick on the pig. There are by now many layers of lipstick.

As I see it, the reason this is a long-standing mess in Haskell is lack of support for ‘Row types’. Row types don’t have ‘private’ labels/all labels are global; the label name alone doesn’t tell anything about the field’s type or the structure it’s a field within. From a Row type you produce either a Sum aka Variant or Product aka anonymous Record.

In nearly every other part of Haskell design, a proposal is not entertained unless it has a sound theoretical basis. (That’s why Overlapping Instances or FunDeps don’t get much love at GHC HQ.) For records/rows there are theoretically sound approaches; PureScript, for example, never went near H98-style records, and already has a Variants module built on better-founded rows/records. But Haskell/GHC records stumble on with the lipstick.

So how do you cope with the naming clashes today?

That already seems iffy to me: City is already a type; with DataKinds in play, constructor City also becomes a type (constructor).

Does your code do this? Are you planning never to use DataKinds?

We use optics + OverloadedLabels, at work.

We mangle the fields like you suggest (e.g. studentId rather than id), or we use abbreviated type names, like sId, or use module prefixes.
Anonymous records would be fantastic.
You can kind of fake that with classy lenses, for example a HasId type class with a single function that says a type has an id accessor, but that’s incredibly verbose (and type errors become worse).
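
For anyone who hasn’t seen the pattern, here is a rough sketch of a HasId class. It is hand-rolled with base only rather than generated by a lens library, and the record types plus the names idL, viewId and setId are hypothetical:

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Hypothetical records with mangled (prefixed) field names.
data Student = Student { studentKey :: Int, studentName :: String }
  deriving (Eq, Show)

data Course = Course { courseKey :: Int, courseTitle :: String }
  deriving (Eq, Show)

-- "This type has an id": a single van Laarhoven-style lens per type.
class HasId s where
  idL :: Functor f => (Int -> f Int) -> s -> f s

instance HasId Student where
  idL f s = (\k -> s { studentKey = k }) <$> f (studentKey s)

instance HasId Course where
  idL f c = (\k -> c { courseKey = k }) <$> f (courseKey c)

-- Read the id of anything that has one.
viewId :: HasId s => s -> Int
viewId = getConst . idL Const

-- Replace the id of anything that has one.
setId :: HasId s => Int -> s -> s
setId k = runIdentity . idL (const (Identity k))
```

Every shared label needs its own class and one instance per record type, which is where the verbosity comes from.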