DatatypeContexts: has something changed since it was marked deprecated?

The DatatypeContexts language extension has been marked deprecated for over ten years.The best summary for why is on Prime (archive link) NoDatatypeContexts – Haskell Prime

It’s widely repeated that this is an anti-feature and should be removed any day now. However, the explanation as to why it does not work, taken from that archive page, does not seem to apply today.

Quoting and abbreviating, the problem is stated to be that given
data Eq a => Foo a = Constr a
you cannot write the following
isEq (Constr x) (Constr y) = x == y
because the compiler doesn’t know that there is an Eq a instance from the datatype context.

This is what i see with GHC 9.0.2:

$ stack ghci --ghci-options="-XDatatypeContexts"
...
-XDatatypeContexts is deprecated: It was widely considered a misfeature, and has been removed from the Haskell language.
...
λ> data Eq a => Foo a = Constr a
λ> isEq (Constr x) (Constr y) = x == y
λ> :t isEq
isEq :: Eq a => Foo a -> Foo a -> Bool

It seems to me the compiler figured out the Eq constraint just fine.

The article goes on to say

In fact, we cannot even write

getVal :: Foo a -> a
getVal (Constr x) = x

And yet

λ> getVal (Constr x) = x
λ> :t getVal
getVal :: Eq a => Foo a -> a

The above was all in a GHCi session but I see the same results if I compile and run a program instead. The problem does occur if you write type signatures for these functions. Was this the same situation a decade ago? Or did the instances not work back then?

2 Likes

You’re abbreviation too much, they say:

One might expect that, with the above Foo type, one could write

isEq :: Foo a -> Foo a -> Bool
isEq (Constr x) (Constr y) = x == y

and the compiler would know that there is an Eq a instance from the datatype context. However, this is not the case. We instead must write

isEq :: Eq a => Foo a -> Foo a -> Bool
isEq (Constr x) (Constr y) = x == y

You’re trying to write it without the type signature which will then infer the type with the constraint, but that is not what they mean. They mean that one might expect the Eq a constraint to be added implicitly to any type signature that mentions Foo a, so that you don’t have to write it manually every time.

And similar with the getVal problem, they’re saying you should be able to give it the signature without the explicit Eq a constraint:

getVal :: Foo a -> a

I can reproduce both these examples on the playground: https://play.haskell.org/saved/3fbv6IRF

3 Likes

You find a more recent discussion of this feature under the name “partial type constructors”. There were two papers, one at ICFPPOPL 2019 and one at Haskell 2022:

https://dl.acm.org/doi/10.1145/3371108

https://dl.acm.org/doi/abs/10.1145/3546189.3549923

Edit: And sorry for just posting those two articles. I don’t remember the exact argumentation and formal system they present. But I can recall that it is very much in the spirit of trying to rehabilitate the DatatypeContexts extension.

2 Likes

There are also recordings of those, see my comment here: Defining the behaviour of a Num instance with other Num types and with itself - #8 by jaror

2 Likes

A decade ago yes it was. But a decade or so before that (1998-1999) GHC behaved differently. This was only discovered late in finalising the '98 report. ‘Contexts on datatype declarations’, starting at SPJ’s opener.

So your getVal example wouldn’t need an Eq constraint – because the Foo a is consumed, not produced. The isEq example does need an Eq constraint – because there’s an explicit ==, not because there’s Foo a consumed.

1 Like

Thanks, so far I’ve just gone through Apoorv’s talk. I can see some reasons this isn’t getting much traction …

  • Asking “Was the AMP a good idea?” [Apoorv’s next-to-next-to-last slide]. Do we really want to go through all that library restructuring again?
  • “dirty details” [next-to-last slide] I’m getting a fishy smell – as in fishhooks.
  • How do we avoid carrying class/constraint dictionaries around the place? (What’s more the (@) is a methodless dictionary/only there to compile-time verify the code, but it carries constraint dictionaries.)
  • Because I think the proposed approach is just plain wrong: a large proportion of those dictionaries/constraints are unnecessary; and the 1999 SPJ mail thread I linked above explains why:

pattern-matching on MkT [SPJ thought of] as eliminating a constraint.
But since the dictionary isn’t stored in the constructor we can’t eliminate it.)

** I therefore PROPOSE that when pattern matching on a constructor,
** any context on the data type declaration is ignored.

With constructor classes (like instance Functor Set, where output Set b wants Ord b), we in general need an Ord dictionary inside the fmap: whatever morphing fmap is applying, we can’t rely on it preserving the relative ordering of elements. OTOH we don’t want an Ord dictionary for the incoming Set a, because fmap isn’t interested in whatever order the incoming elements are in.

I should mention that unlike in 2010 when DatatypeContexts were taken away, we now have PatternSynonyms that can require constraints for building but not matching (with that double Foo a => Bar a => ... business). So a supplier of a datatype can make the real constructors abstract; export only the synonyms. (I wish Set would do that: it does indeed make the constructors abstract; but provides a bunch of functions (with Ord constraints) for building; most access you have to stream to a List; a PattSyn for lookup within a pattern would be lovely.)

And apologies to @jon if I’m hijacking the thread: I think previous answers concluded the q; if you want more discussion around the o.p. I’ll start a new thread.

1 Like

That email thread is interesting. In particular this part by Wadler:

The class constraints on data declarations were a simple way for the user to ask the compiler to enforce this invariant. They have compile-time effect only, no effect whatsoever on run-time (in particular, no dictionaries should be passed).

And in a later mail:

Because my interest is as much in documentation as in constraints on
how the program executes
[…]
Your `redundant pollution’ is exactly the effect I want to achieve!

So it seems to me Wadler wanted basically the behavior that GHC currently has: datatype contexts should introduce no dictionaries on their own, but only require that whenever such data types are used that there is also an explicit “redundant” constraint.

To be honest, I’m starting to come around to that viewpoint. Essentially data type contexts would be a form of machine checked documentation that prevents “silly” types. For example it prevents you from accidentally writing an Functor instance like this:

{-# LANGUAGE DatatypeContexts #-}

data Eq a => Set a = Set [a]

instance Functor Set where
  fmap f (Set xs) = Set (fmap f xs)

Without the constraint it would be totally legal, but obviously you would never want to write this because it does not uphold the otherwise implicit invariant of the set type (e.g. fmap (const 0) (Set [1,2,3]) would produce a malformed set: Set [0,0,0]).

Of course in an ideal world you’d write out the full invariant (with liquid or dependent types), but I think datatype contexts are a useful middle-ground.

Another advantage is that it would be possible to infer nominal role for datatypes with contexts, so you wouldn’t have to use RoleAnnotations in many cases.

This has been my preference too, it makes the documentation more “local/close” to the definition. I was assuming there were no run time costs.

What’s the steel man’s argument against doing it? I have heard that people saying it is anti-pattern in Haskell, but I have never full understood why.

(To take your second point first.) I’m all for machine-checked documentation if it comes for free (or at least for cheap). But contexts don’t come free, and Wadler of all people should know that since he invented TypeClasses. You can look at the last section of the Ingle et all 2022 paper:

Another challenge of our approach is that our elaboration step introduces numerous additional constraints in the types of polymorphic functions. For total type constructors, these additional arguments provide no value, but can still make optimization more difficult. In our current implementation, this causes significant performance loss in some test cases.

My take is, yes, Wadler wanted the same behaviour GHC currently has (and the behaviour Hugs always had) – taking the user-programmers’ point of view. But the only way Haskell compilers know to support a constraint is supplying an extra argument (dictionary) to the function – whether or not that dictionary has any methods; whether or not the code uses any of those methods. And Wadler it seems never considered it from the implementers’ point of view.

SPJ’s approach/GHC-as-was also prevented you writing an instance Functor Set. [**] Because the call to Set constructor on RHS/in an expression context demands a dictionary. Compare (as I said above) those contexts for pattern synonyms.

Note (somewhere on Hackage/I can’t find it just now) there’s a parallel choice of Functor-alikes with a fmap-alike that supports constraints on the result type.

I agree there shouldn’t be a Functor instance for Set: the applied function might produce same-value output elements from different-value inputs; so it would break the Functor law saying the cardinality of the structure mustn’t change.

[**] How can I be so sure about the behaviour of something changed back in 1999? Because I re-implemented that behaviour in Hugs. What’s more it was only about a dozen lines of C++ needed changing – chiefly cutting out where it was planting constraints. The really hard effort was the trial-and-error to figure out which particular dozen lines that was. (And then testing it to blazes, of course.)

Note that I’m not advocating for the new “partial type constructors” approach of implicitly adding a bunch of @ constraints everywhere. I’m advocating for using the DatatypeContexts extension as it is today in GHC. That does not have any run-time cos, it does not even affect run-time behavior in any way*. The data type contexts should just be a way to, at compile time, force users of the type to write the constraint everywhere and make it impossible to write functions like that fmap that do not have the constraint.

* It would mean that you might have to write unused constraints in some cases (e.g. empty :: Ord a => Set a), but GHC should be able to optimise those away. Although even that could be problematic: #20418: The () constraint should perhaps not be optimised. · Issues · Glasgow Haskell Compiler / GHC · GitLab. Is that the point you’re trying to make or am I overcomplicating things?

I just tried it out and this is rejected:

data Set a = Set [a]

pattern Set' :: () => Eq a => [a] -> Set a
pattern Set' xs = Set xs
T.hs:9:23: error:
    • No instance for (Eq a)
        arising from the "provided" constraints claimed by
          the signature of ‘Set'’
      In other words, a successful match on the pattern
        Set xs
      does not provide the constraint (Eq a)
    • In the declaration for pattern synonym ‘Set'’
  |
9 | pattern Set' xs = Set xs
  |                       ^^

You’d have to make Set a GADT to store the Eq inside the Set constructor:

data Set a where 
  Set :: Eq a => [a] -> Set a

And then you don’t even need a pattern synonym. But indeed the Functor Set instance is still rejected because you’d need to get an Eq b constraint from somewhere in the RHS.

Edit: Ah, after rereading I guess you want the opposite:

pattern Set' :: Eq a => () => [a] -> Set a
pattern Set' xs = Set xs

That makes more sense. It puts a constraint on all construction sites, but does not constrain nor give you access to the dictionary when matching. But this still has the same problem as the current DatatypeContexts approach in that you’d have to add unused constraints like empty :: Eq a => Set a (which currently are not always optimised away by GHC).

Yes, well done! I always get bamboozled by which way round they go.

The fmap-alike I was trying to remember above is foldMap :: Monoid m => (a -> m) -> t a -> m, where the t a would be Set a.

Map each element of the structure into a monoid, and combine the results with (<>).

So the recombining would squish out duplicate elements in the result, and sequence in order of the Ord for that type.

Wee-ell maybe: One of the current rules for DatatypeContexts is not to apply the constraint for constructors that don’t mention that type param. (Yeah my description is baffling – even to me. And demonstrated in the docos with NilSet, but not explained. Better shown with an example, let’s try for a BST):

data Ord a => Set a = Tip  |  Node (Set a) a (Set a)

So an empty Set consisting of only a Tip gets no constraint demanded. But, but! For Tip there’s no mention of a only because that’s a H98 style decl. I know really Tip :: Set a (to use GADT style) so its type does mention a.

Then while I was hacking Hugs, I dropped the “that don’t mention that type param” part of the rule. So I get Tip :: Ord a => Set a/no need for a empty separate function.

(Yeah whether the compiler can optimise away the useless constraint is a separate matter.)

Addit:/ re-reading your post: perhaps you’re saying the whole idea of a Ord a constraint for an empty Set is useless(?) I disagree: there’s no point creating a Set in one place in the code, then trying to insert an element in a another piece of code/distant module only to find out the creator omitted to constrain the elements suitably.

(If you want to use Set for a structure that can only be empty or a singleton/so doesn’t need an Ordering, that’s an abuse of a datatype/don’t do that.)

1 Like

No indeed, I meant unused, in the sense that you don’t actually use the dictionary, rather than useless.

Yeah, this is exactly why I think DatatypeContexts is still useful and should perhaps be undeprecated in GHC.

I think the answer you’ll get is again to use PatternSynonyms:

pattern NilSet' :: Eq a => () => Set a
pattern NilSet' = Set []              -- using your version of Set

To be clear: that constrained-empty-Set is not the behaviour you’ll get if we merely restore DatatypeContexts as was. It’d need restoring and some of the complexity removed (if the implementation was anything like Hugs).

(Not sure if I should raise this here or on the Pre-proposal you’ve drafted.)

Note that even that foldMap can’t be used with Ord a => Set a as defined by H98: the t a would need an Ord a constraint even though the instance will only pattern-match on it. Whereas pre-1999 GHC would not require the constraint.

Same applies for this, similar to in the Pre-proposal/goldfirere’s example:

idSet :: Set a -> Set a
idSet x = x                 -- (1)
idSet (Set xs) = (Set xs)   -- (2)
idSet x@(Set xs) = x        -- (3)
getSet (Set xs) = xs        -- (4)

(1) won’t raise a need for a constraint, because no data constructor appears. I disagree with goldfirere here: (1) should not be rejected: whatever built the x – using the data constructor – saw the constraint satisfied, we’ve no need to ask for it again.

(2) will raise a need for a constraint, even though it does the same as (1). Because the data constructor appears on RHS.

(3) will raise a need for a constraint under H98 rules, but didn’t under GHC pre-1999. And I’d say pre-1999 was correct behaviour, same as for (1).

(4) will raise a need for a constraint under H98 rules, but again didn’t under GHC pre-1999. (This is the getVal example from o.p. here.) And I’d say again the pre-1999 behaviour was correct: whatever built the Set xs saw the constraint satisfied. We’ve no need to re-check it – especially since we’re extracting the payload out from the Set.

Note in all cases that if the consumer of the result from these functions wants to apply some method needing a constraint, it’s their responsibility to provide the dictionary in their type. (No different to if they wanted to show the value – which constraint Set doesn’t provide.)

Now it’s not quite true “whatever built the x – using the data constructor – saw the constraint satisfied,”: I could call idSet undefined and (1) would just pass it along; (2) and (4) would give pattern match failure; (3) I’m not quite sure. But this is no worse than with undefineds anyway.

tl;dr I’m already disagreeing with the Pre-proposal. And failure to reach consensus is why DatatypeContexts got deprecated and didn’t come back.

1 Like