`type data` `data` reification

TypeData declarations arrived in GHC 9.6 (March 2023), aimed at building on DataKinds aka data type promotion

… which means that a single data declaration introduces both a type with value constructors, and a kind with type constructors. In some cases this alleviates the need for duplicated declarations (e.g., Bool ), however, in many common cases using promotion leads to clutter.

That is, the cluttering of namespaces with data constructors intended never to be used in terms.

I’m trying to follow the proposal and discussion from early 2018. It’s hard to cast memory back to what was and wasn’t in GHC at the time. There’s

What I notice there is TypeRepr, explained in the module top-comments:

  -- In addition, we provide a value-level reification of the type
  -- indices that can be examined by pattern matching, called 'TypeRepr'.

So there is something to be used in terms?? As well as the type data constructors to be used as types and the type data (emm) type to be used as kind type-of-type. The canonical style of declaration is given in the Proposed Change:

  type data Universe = Character | Number | Boolean
   
  data Type u where
    Character :: Type Character
    Number    :: Type Number
    Boolean   :: Type Boolean

This corresponds (I think) to a technique in the HList paper (2004 – so waaay before any promoting was allowed [**] – Section 4 Type-level naturals)

   class HNat n
   data HZero; instance HNat HZero
   data HSucc n; instance HNat n => HNat (HSucc n)
   hZero :: HZero; hZero = ⊥
   hSucc :: HNat n => n -> HSucc n; hSucc _ = ⊥
   hPred :: HNat n => HSucc n -> n; hPred _ = ⊥

   -- example usage
   hLookupByHNat hZero (id .*. HNil) ...
   -- "Numeral-based access operations for heterogeneous collections"
  1. What does it gain with an EmptyDataDecl then making hZero = ⊥ to give a term-level name, as opposed to (say) data HZero = MkHZero to give a term-level name?
  2. Although type data constructor (type) Number has only value , GADT Number :: Type Number is a legitimate non-bottom value. Then term name Number can be used in pattern matching as well as a function parameter(?)
  3. Is there a more general design pattern here? To have three layers of naming:
    • type-of-type Universe
    • type Number
    • term Number (I’m carefully following the example here; I can see that duplicate name won’t work in a unified namespace; so NumberProxy??)
  4. Then in the type data decl should there be the ability to declare all three layers together?

[**] Then class HNat is ‘poor person’s promotion’ to be able to group the Peano constructors/constrain them from other type-level constructors.

The use of EmptyDataDecls like this was a common pre-DataKinds trick to create type-level “data constructors”. I guess the author used instead of an actual constructor to emphasize that HZero was only intended to be used as a type-level construct, not to provide information for use at runtime. Then hZero fulfills the function of Proxy, allowing a “type argument” to be passed.

Of course this isn’t hugely safe, e.g. if your caller gives you a HZero argument nothing stops you eliminating it using EmptyCase, and you need tricks like HNat to regain some level of “kind checking”. Hence the motivation for doing this more robustly with DataKinds.

I’d be careful here: the type data constructor Number is not a “type” in the sense of “a thing that classifies values”. Thus saying that ⊥ :: Number is a kind error, just as ⊥ :: Maybe is a kind error.

The word “type” is sometimes used to mean “a thing that uses the type-level namespace”, but I think it’s clearer to use phrases such as “type constructor” or “type-level expression” for those. Thus the type data constructor Number is not a type, but it is a type-level expression. Similarly Maybe is not (strictly) a type, but it is a type constructor.

One way to think about this is to observe that the Type datatype defined here is really the “singleton type” corresponding to Universe. Given a (type-level) datatype, the singleton construction gives a mechanical way to construct a corresponding datatype for use at the term level, such that pattern matching on the singleton corresponds to dependent pattern matching.

Maybe. One can certainly automate generation of the singleton type from the original, e.g. as in the singletons package. But that’s really a workaround for being unable to use dependent pattern matching directly.

From a dependent types perspective, there should ideally be one type that can be used at both levels. Much as in the pre-DataKinds days we used EmptyDataDecls to fake DataKinds, today we can use singletons to fake dependent types, subject to various awkward constraints such as needing two datatypes for what is conceptually a single type. Perhaps in the future such hacks may become unnecessary.

1 Like

Does the TypeData extension make the reflection package superfluous? See this question. That is, does it become easier to get our hands on functions with types like

Proxy 'True -> Bool

We have Maybe :: Type -> Type, so yes that’s not a type simpliciter. Is Maybe Number a type; or is it ill-formed (ill-kinded because Number :: Universe)?

Should your sentence read “the type data constructor Number is not a Type”? (By lower-case ‘type’ usually I mean anything that can come after a :: in a term’s annotation, or as an ‘argument’ to function arrow ->.)

Thank you Adam for a comprehensive reply. “unnecessary” when we have (I’m guessing)

hLookupByHNat :: forall (l :: HList). forall (n :: HNat) -> l -> hElemByHNat n l
--                                    ^^^^^^ dependent quant

-- example usage
hLookupByHNat HZero (id .*. HNil) ...
--            ^^^^^ type, not term

in which (l :: HList) is a heterogeneous list, so the type of its nth element depends on the n so gets calculated by type-level function hElemByHNat (that I invented out of thin air). This dependency is expressed in the original FunDep in hLookupByHNat’s method decl.

And to call the function, I didn’t need a term-level hZero. OTOH (l :: HList) is needed both for its (element) type(s) to calculate the result type from hLookupByHNat and for its term-level value to yield the nth term. So I’m not sure that signature is happy.

I’m afraid I still don’t grok what the singletons package is all about. It was presented as a temporary work-around, so I decided to skip it and wait for the proper approach.

Not really. TypeData allows one to define data constructors that can only be used at the type level, not at the term level. But for mixing levels one still needs singletons / dependent types.

A proxy doesn’t carry any runtime information, so forall (b :: Bool) . Proxy b -> Bool isn’t able to return different values at different types. That’s where some form of singleton is needed (sometimes expressed using a class constraint, e.g. forall (b :: Bool) . SingI b => Proxy b -> Bool). With visible dependent quantification (in recent GHCs) this can be simplified to forall (b :: Bool) -> SingI b => Bool. Full dependent types in the future may allow foreach (b :: Bool) -> Bool, where the foreach quantifier indicates that the value is passed at runtime rather than being erased.

Indeed, Maybe Number ill-kinded (whichever Number is being used).

My sentence uses “type” the way I think it should be used. :grin: But the important thing is to recognise that there are two possible meanings and be clear about which one is being used in any particular context. (Your definition is roughly what I mean by “type-level expression”.)

Okay, I can sort of see that. It would violate parametricity.

But how is this different, then? The class constraint is an implicit dictionary parameter. How does that contain more run-time information about b than the Proxy? Is it because the Sing type family provides structured values to pattern-match on? (Sorry, I did not mean to turn this thread into a singletons tutorial.)

Still I think extensions like DataKinds or TypeData ought to be bundled with some of the functionality that singleton’s demote provides. It currently doesn’t because there is no suitable type class in base or GHC lib?

Yes, the dictionary for SingI b is ultimately just a value of type SBool b, which provides both a runtime value isomorphic to Bool and a proof of the equality between the value passed at runtime and the type-level Bool:

data SBool (b :: Bool) where
  SFalse :: SBool False
  STrue  :: SBool True

Ok, thank you; the penny is slowly dropping … To try to explain it to myself:

What I had in mind by “declare all three layers together” is sth like (inventing some wild syntax on the fly)

type data Universe = Character  = CharProxy     -- add a term-level constr
                   | Number     = NumbProxy
                   | Boolean    = BoolProxy

-- thus avoiding the GADT
   data Type u where
     Character :: Type Character
     Number    :: Type Number
     Boolean   :: Type Boolean

Then the XxProxy constrs ensure that pattern matching delivers the right type constructor. (That syntax is aimed to telescope together type data ... with data Character = CharProxy, etc. so that we have term CharProxy :: Character; type constr Character :: Universe.)

But what is to be the type signature for a function that consumes such terms?

  • not foo :: Character -> ... because that’ll consume only CharProxy.
  • not foo :: a -> ... because that’s too general.
  • ?? foo :: (a :: Universe) -> ... —but what do the equations look like?
    • foo CharProxy = ... – inferred :: Character -> ..., so not general enough
    • foo NumbProxy = ... – likewise
    • foo BoolProxy = ... – ditto

So the GADT enables foo :: Type u -> ... such that the equations go

  • foo Character = ... – that is, Type u’s constructor :: Type Character
  • foo Number = ... – … " … :: Type Number
  • foo Boolean = ... – … " … :: Type Boolean

And the pattern matching safely yields the ‘dependent’ type.


I feel I should explain why I’m asking such particular details. I’m hunting for a Haskellish type theory underpinning for a records calculi. Where ‘record’ is a stand-alone/first-class value containing name-value pairs. Haskell GHC doesn’t have those at all; C/C++ structs, Java/script objects, ML/purescript/elm records are along the right lines, but not first-class enough [**]. The name (constructor?) appears in both terms and type expressions:

  • { name = "Jones", grade = 'D' } :: { name :: String, grade :: Char }
    Is the usual sort of syntax; a bit confusing because it’s the value labelled name that’s type String; if name has a type it’s :: a -> Name a/or perhaps the = syntax is smuggling in a constructor (:=) :: n -> a -> Labelled n a; then the n is a phantom type constructor.
  • For these particular (slew of) use cases, I think it makes sense the term-level name is same as the type-level constructor. “I do often use the tycon/datacon punning myself” [SPJ as of the type data discussion – of course he might have changed his mind since].
  • [**] First-class would include methods to project a record on a subset of labels; remove a subset of labels; extend a record with some other record’s names-values.

Just as a note, PureScript allows for all of these things to be implemented as library functions:

project a record on a subset of labels

remove a subset of labels

Both of these are Record.Extra.pick, depending on whether you’re specifying the b row (subset to keep) or the r row (subset to remove).

extend a record with some other record’s names-values

This is Record.merge.

The compiler primitive that both of these implementations use is Prim.Row.Union, which is the wired-in way to say that two rows (a row is a list of label-type pairs) combine to form a third. The compiler will solve for the missing variable given any two.

1 Like

Thanks Ryan (and I was merely explaining my rationale rather than wanting to dive off-topic). I was somewhat aware of what PureScript does. I asked around a while back, and got the impression PureScript records are built over JavaScript, and rather limited to what that will support.

To the q’s at hand, wikip says PureScript doesn’t support GADTs (they can be faked). Then is there some other way to achieve singletons/type reification? Or has PureScript got ahead of GHC in implementing dependent typing?

Also JS is dynamically typed. Does PureScript give static type guarantees the compiler will detect attempts to access a field/name that isn’t in the record?


We’re reifying the field name/label from row type-level expressions to term-level records. I wonder what dragons are hiding in “automatically solved”:

The Prim.Row module is embedded in the PureScript compiler. Unlike Prim , it is not imported implicitly. It contains automatically solved type classes for working with row types.

My “not first-class enough” was partly because PureScript allows duplicate labels – as shown by the “left-biased” and the Nub class. Then putting all three FunDeps on Union is a bit iffy. For example:

  • Both left and right include the same label bleah, left contains one occurrence, right contains 2, one of whose values is same as that in left. Then
  • going by FunDep left right -> union, union will contain two (three?) occurrences. But
  • going by right union -> left, both occurrences in union get stripped out, yielding a left with no bleah.

I think a First-class calculus would deliver

  • r == (r `project` r') `merge` (r `remove` r') for any other row r'. That won’t hold if the original r contains duplicates.

You might have spotted my strikethrough Haskell above. There is a Haskell that “allows for all of these things to be implemented as library functions” (for some value of “all” :wink: ).

That’s Hugs/Trex, after some careful surgery to the Overlap/FunDep behaviour. I suspect Trex, like Prim.Row, hides some not-theoretically-vouchsafed ad-hockery. So my line of questions here is circling back after the development to better understand what might be going on.

I popped in with the PureScript facts because of this:

Even if PureScript the language isn’t right for you, the underlying theory of row polymorphism that it uses might be useful in your hunt.

Correct, this is something PureScript doesn’t have. For singletons-like cases, I’d use the tagless final encoding.

class SBool repr where
  sfalse :: repr False
  strue :: repr True

This might be ‘faking’ it? But it’s safe, and more ergonomic than in Haskell due to PureScript’s support for impredicative polymorphism, which makes it easier to work with types like List (forall repr. SBool repr => repr False).

Yes. Though you can use unsafe functions to do unsafe things, and FFI into JavaScript requires you to correctly type, and occasionally wrap, the interface. But PureScript-to-PureScript, the type system ensures that record access is safe.

This is where the duplicate labels thing is actually useful. (I admit, I was turned off by it when I first was learning the language, but from a theory perspective it simplifies things like Union a lot.) Union just concatenates everything together; the specified result here is that union will contain three occurrences of bleah, which are ordered (the one from left first, then the ones from right). Solving right union -> left is straightforward because of this: it strips off the rightmost two occurrences from union and leaves left with the single leftmost one. It’s all sound. (Rows are semi-ordered lists; order doesn’t matter between different labels, but does for the same label. You can think of them as having a normalized form in which they are stable-sorted by label.)

The price of this simplicity is that not every Record r is a representable record, which is where Nub comes in. Nub is the destructive operation that can only be run one way. It gets involved when it’s time to work with actual records instead of rows, and it’s generally ideal to push it to the ‘edge’ of any type-level computation.

Not sure exactly what you’re getting at with this. At the type level, if r and r' are PureScript-like rows, r `project` r' is just r'r' is already the subset. Type-level merge and remove are both Union, with different unknowns. So translating into constraints, this is like asking if, given Union r' r_ r (where r_ is an intermediate variable representing r `remove` r'), do we have Union r' r_ r? And yes, I dare say we do.

Thanks. Where is that theory written up? Is there something more recent than Daan Leijen’s 2005 ‘Extensible records with scoped labels’? That seems to draw heavily on the for-Haskell Hugs/Trex 1996 Gaster&Jones. From Leijen’s Related Work section:

Gaster and Jones [7, 6] presented an elegant type system for records and variants based on the theory of qualified types [10, 11]. They use strict extension with special lacks predicates to prevent duplicate labels. The predicates correspond to runtime label offsets, and standard evidence translation gives a straightforward compilation scheme with constant time label selection.

When I say “draws heavily”, Leijen’s approach seems almost the same (including same syntax for the ‘rest of the row’ variable) except lacking (ha!) the lacks predicates.

The consequences (what is a “semi-ordered list”?) make me nervous. Trex’s “constant time label selection” is because it keeps the physical representation abstract/it can manage the offsets how it chooses, including for example label/key-hashing.

It seems PureScript’s labels can be any String, provided quote-surrounded.

  • If PostScript’s are Stringy under the hood, are they in the term namespace and the type namespace simultaneously?
  • For example, can labels be named same as in-scope term variables? author = { name: name }
  • It’s disappointing they can’t be TitleCase (except if quoted) because – as I hinted above – I think of fieldnames as more akin to Constructors. author = { Name: name }
  • (Leijen follows Trex in putting row-types as a different kind; labels can appear only within row types. Using the kind system is what prompted my q about type data.)

Well yes, to avoid duplicate labels, I could surround every operation with nub; and I see there’s a Lacks constraint. So I could build what I want within PostScript. Trex infers the Lacks for me:

Hugs.Trex> :type (\{Name = name | r} -> name)
\(Name = name | r) -> name :: a\Name => Rec (Name :: b | a) -> b
                           -- ^^^^^^  row a lacks Name, auto-inferred

(Your later q’s are a bit too in the weeds, I’ll PM.)

I like this paper:

Thanks, I think I’ve (tried to) read that before. It seems not to have anything to do with Purescript (does mention the Leijen paper, and Gaster&Jones). It’s more of a theoretical exercise; “We instantiate our approach in a simple functional language with Hindley-Milner polymorphism, which we call ROSE.” “… we leave questions of efficiency to future work.”

§ 3.1 considers ‘Scoped Records’ – that is, with scoped labels. They have to make “row combination non-commutative” – that is merge/union have left-leaning or right-leaning variants. “We also now have two different forms of containment: given rows ζ1 and ζ3, it can either be the case that ζ1 ⊙ζ2 ∼ ζ3 or the case that ζ2 ⊙ζ1 ∼ ζ3; unlike in the previous section, these are not equivalent.” (‘containment’ is the subtyping relationship). Other operations seem to get more “complicated”. But then they’ve started with an algebra assuming unique labels. I guess if you started by assuming duplicates allowed, imposing uniqueness would seem more complicated.