After `{-# OVERLAPPING #-}` comes `{-# INCOHERENT #-}`, but why?

Hello Haskell community!
This is my first post so please be patient with me, especially if I use incorrect terminology or oversee something important/ obvious.
Also please excuse me if I motivate my example more extensively than I actually would have to.
So let’s skip to the chase:
I’ve had a lot of fun using Haskell classes recently and it’s one of the features I love the most about the language as it can express so many different semantics. However, working with it, I have had some issues with the fact there is no fine grained control between adding {-# OVERLAPPING #-} and {-# OVERLAPPABLE #-} pragmas and -XIncoherentInstances.
First of all I of course want to note that I didn’t come up with everything in the code examples by myself but have borrowed some of the ideas from existing HList-implementations.
To illustrate my issue I wrote up a little .lhs file to understand my motivation for this and for you to actually try things out.
It can be found here and obviously in the following post:

Say we have the following datatype:

data HTree 
  = HLeaf Symbol Type
  | HNode [HTree] 

We can give it a singleton counter part for the value level by writing the following GADT:

type HSTree :: HTree -> Type
data HSTree as where 
  -- | Make the counterpart of a leaf
  HSLeaf :: forall (label :: Symbol) (typ :: Type). typ -> HSTree ('HLeaf label typ)

  -- | Make the counterpart for an empty node
  HSNode :: forall . HSTree ('HNode '[])
  -- | Insert an HSTree into a node 
  (:+>:) :: forall (root :: [HTree]) (node :: HTree).  
              HSTree node -> HSTree ('HNode root) -> HSTree ('HNode (node ': root))
infixr 5 :+>:

You can now surely build heterogeneous trees like so:

myHTree :: HSTree ('HNode '[ 'HLeaf "firstLeaf" Int, 'HLeaf "secondLeaf" String])
myHTree  = HSLeaf @"firstLeaf" 42 :+>: HSLeaf @"secondLeaf" "foo" :+>: HSNode 

But now you generally also would like to access the leafs and/or the nodes which you would probably want to do by performing a search for a label within the tree. Because of having to know that such a search must succeed for a given HSTree you normally construct a proof that there is such a label with the use of a class and a proof-GADT, we’re now going to show that it’s not possible to create such a thing with the current Haskell instance resolution methods and I’m going to propose a fix to this.

Let’s try to write a function called htreeNode' like so:

htreeNode' :: forall label e tree proxy. proxy label -> HSTree tree -> e
htreeNode' _ _tree = undefined

We first want to create a proof of the element with the label specified being part of the tree and then index the tree by using this proof, let’s first care about the proof:

First we need a type to build our proof with:

data TreeElem (name :: Symbol) (typ :: Type) (tree :: HTree) where 
  -- | Hurray, we found our element!
  Here :: TreeElem name typ ('HLeaf name typ)
  -- | provided we have a proof for the rest of the list for within the node 
  --   to contain an element, we can also construct a proof for it with one 
  --   more element
  Farther :: TreeElem name typ ('HNode lst) -> TreeElem name typ ('HNode (x ': lst))
  -- | provided we have a proof for an element in the node to contain a list 
  --   we can also construct a proof that the node that contains the element 
  --   contains the element
  Deeper :: TreeElem name typ tree -> TreeElem name typ ('HNode (tree ': lst)) 

We now want to write a class that produces such a proof term like so:

class TreeElemOf (name :: Symbol) (typ :: Type) (tree :: HTree) | name tree -> typ where 
  treeElemOf :: TreeElem name typ tree

But before we try to do that, let’s take a step back and see what we’re actually trying to implement by taking another look at the htreeNode function and trying to complete it a bit more with what we’ve done until now:

htreeNode :: 
  forall label e tree. 
  TreeElemOf label e tree => 
  -- given we have a proof that a an element with name `label` and type `e` is in the tree
  HSTree tree -> 
  e 
  -- we can extract that element from the `HSTree`
htreeNode = 
  let 
    -- the proof that that the element is actually in the tree
    proof :: TreeElem label e tree
    proof = treeElemOf @label @e @tree
   in indexWithProof proof 

indexWithProof here takes a term that tells us how to index the tree and the tree that we’re trying to index.
Writing it should be easy:

indexWithProof :: forall label e tree. TreeElem label e tree -> HSTree tree -> e
-- we have already found the element, yey
indexWithProof Here (HSLeaf x) = x
-- the element is actually a bit farther within the node
indexWithProof (Farther rest) (_ :+>: xs) = indexWithProof rest xs
-- the element is deeper in the tree
indexWithProof (Deeper rest) (x :+>: _) = indexWithProof rest x

So let’s actually see that we can extract an element with that function:

secondLeaf :: String
secondLeaf = indexWithProof (Farther (Deeper Here)) myHTree

Evaluating this actually spits out “foo” as expected, you can try it out.

However, there’s one caveat here: we constructed the proof ourselves which isn’t really practical, after all we just want to name the node we want to extract and then get back the element with that name, like in htreeNode, e.g. we want to call htreeNode @"foo" myHTree. However, if you try that out you may notice that GHC complains about a missing instance for TreeElemOf for the HTree.

We will now motivate why we cannot write those and what we’d need to actually do so:

First let’s show that the instances would actually be trivial:

instance TreeElemOf name e ('HLeaf name e) where
  treeElemOf = Here

So far, so good: if we have a Leaf, we found our element.

instance TreeElemOf name e x => TreeElemOf name e ('HNode (x ': xs)) where
  treeElemOf = Deeper treeElemOf
instance TreeElemOf name e ('HNode xs) => TreeElemOf name e ('HNode (x ': xs)) where
  treeElemOf = Farther treeElemOf

And we can actually write both of those instances independently of each other but we cannot write them together;
there is no way to do so as far as I’m aware. You can try out both of them independently and they will work, even
htreeNode, as long as you don’t try to construct proofs of terms which the instance you chose to leave in cannot
create.

The issue with them is that instance resolution doesn’t consider the context of an instance head at all, hence for
the GHC instance resolution mechanism those two instances are actually exactly the same.

There are good reasons to do so because to not make an arbitrary choice while resolving the instance we would
necessarily have to

  • check whether the instance contexts are disjoint.
  • trust the programmer

We all know the latter is not an option but the first might be. I do not know the GHC internals but maybe it may be
possible?

My idea would be to have a syntax like this:

instance family TreeElemOf name e ('HNode (x ': xs)) where 
  TreeElemOf name e ('HNode xs) => treeElemOf = Farther treeElemOf
  TreeElemOf name e x => treeElemOf = Deeper treeElemOf

With the following semantics:

  • the instance whose context is the first one to match the given context is chosen
  • if instance resolution fails for a candidate, the next candidate with a matching context is tried
  • if none of the contexts in the instance family is matched, try a more general instance head (i.e. go on with instance resolution as usual)
  • there can only ever be at most one instance family for each instance head

This would circumvent a couple of problems:

  • the issue that sometimes you don’t know which modules export which instances, which potentially may be harmful because they may overwrite a context unexpectedly
  • the issue that if you have exactly the same instance head but you can’t make sure the contexts are disjoint you would have an arbitrary instance chosen.
  • the issue that with many instances your program will spend a lot of time in instance resolution and may choose an unexpected instance when not limited to a handful of instances to choose from

One issue with this is that we have to do unification and instance resolution in an alternating manner, which currently is not supported by GHC - if instance resolution succeeded, GHC doesn’t go back to it.

Apart from this, I would guess this would range somewhere between {-# OVERLAPPABLE #-} and {-# OVERLAPPING #-} pragmas and -XIncoherentInstances wrt safety, as the programmer has to be careful that they choose the right order of matching contexts but it’s still not completely arbitrary like with -XIncoherentInstances, I think limiting the issues of where the instance family is declared to a closed block also adds safety as it makes overviewing the - possibly risky - instances easier and more maintainable.

Thank you for reading, commenting and telling me what issues might linger within this - not only in general but also wrt - possibly - an actual implementation for GHC.

Because I have seen a reply (which apparently got deleted) proposing something like artificially making one instance more specific than the other adding one more extra instance - I’m pretty sure that doesn’t work because then the instance resolution system will just more eagerly choose going deeper or farther but never both at once, e.g. deeper farther won’t be possible, just Deeper Deeper Deeper … or Farther Farther Farther …

Round 2 :slight_smile: :

class TreeElemOf' (inTree :: Bool) (name :: Symbol) (typ :: Type) (x :: HTree) (tree :: HTree) | name tree -> typ where 
  treeElemOf' :: TreeElem name typ tree

type family InTree name e x where
  InTree name e (HLeaf name e) = True
  InTree name e (HLeaf name' e') = False
  InTree name e (HNode '[]) = False
  InTree name e (HNode (x : xs)) = Or (InTree name e x) (InTree name e (HNode xs))

type family Or b1 b2 where
  Or True True = True
  Or True False = True
  Or False True = True
  Or False False = False

instance TreeElemOf name e x => TreeElemOf' True name e x ('HNode (x ': xs)) where
  treeElemOf' = Deeper treeElemOf
instance TreeElemOf name e (HNode xs) => TreeElemOf' False name e x ('HNode (x ': xs)) where
  treeElemOf' = Farther treeElemOf

instance forall name e x xs. TreeElemOf' (InTree name e x) name e x ('HNode (x : xs)) => TreeElemOf name e ('HNode (x ': xs)) where
  treeElemOf = treeElemOf' @(InTree name e x) @name @e @x

I adapted this approach from Section 5.5 of Effect Handlers in Haskell, Evidently.

1 Like

I’m not sure whether I did something wrong but on this HSTree

otherHTree :: HSTree ('HNode '[ 'HNode '[ 'HLeaf "lowerFirst" Integer, 'HLeaf "lowerSecond" String],'HLeaf "firstLeaf" Int, 'H
Leaf "secondLeaf" String])
otherHTree = (HSLeaf @"lowerFirst" 84 :+>: HSLeaf @"lowerSecond" "bar" :+>: HSNode) :+>: myHTree 

running htreeNode @"lowerSecond" otherHTree gives me no instance for TreeElemOf "lowerSecond" () ('HNode '[]) error (btw I don’t have a clue why it says unit there, should really be a or something but it doesn’t matter because it didn’t find an instance)

It works now, forgot an instance!

Thanks for your investigation!

You need to also give an explicit application for the type, otherwise it defaults to (). I don’t know if that is fixable.

yeah, I mean it’s not wrong, if it can’t find an instance it can also not infer the type of the element to return, right? it’s just weird that it defaults to () (to me, that is).

No, it can’t find an instance because it tries to find one that returns (). This defaulting only happens in GHCi by the way. In a normal source file you will get an ambiguity error, I believe.

makes sense, thanks a lot!

Btw, I would still appreciate some feedback on the general idea of instance families, although there is a way to solve the specific problem in another way.
I also do have to say that I think it’s a bit of an issue that the solution breaks type inference. (using this with OverloadedRecordDot will require a type annotation)

1 Like

Hi @MangoIV I haven’t read your post in detail yet, but as a general warning:

GHC’s implementation of Overlapping Instances and FunDeps is ‘bogus’ – that’s the word SPJ uses in a Note in the compiler code. See long-standing Trac #10675 for further discussion.

(There are further difficulties with ‘orphan instances’ – but those shouldn’t affect you if you have only a single module.)

You can manage the 'bogus’ness with a carefully defined set of instances and overlaps (sometimes needing to create dummy instances). It’s more of an art than a science, sadly.

Before you go suggesting improvements (and there are already heaps of suggestions), bear in mind that in general a class might have more than one FunDep, and that FunDep source positions in instances might overlap for one FunDep, but not for others.

1 Like

So do I understand correctly that before suggesting extensions to the class system you’d prefer if the current issues would be fixed first?

(There are further difficulties with ‘orphan instances’ – but those shouldn’t affect you if you have only a single module.)

I don’t think that would affect my idea at all because the instance family would be put in place instead of an instance and the instance resolution would just continue as normal after finding an instance in the instance family (or not).

I kinda fear that the biggest problem would be that we normally do not go back to instance resolution if a context doesn’t unify. (once again, I do not know the GHC internals but that’s how I understood it from reading the documentation)

I think you’re trying to run before you can walk. I’m still working through your posts, but I see some misconceptions already …

  • All modules export all instances. (You can control which classes get exported and which methods. The compiler must be able to see all instances, for sanity in type solving.)
    The difficulty is if one/some module(s) in your application don’t import all modules that declare instances: when it comes to compile the top-level module, it finds sub-modules compiled with inconsistent sight of instances.
    This is the ‘orphan instances’ problem. Don’t do that.
  • Instance contexts (the bit before => in instance (context, context, ...) => MyClass ... where ... play no part in choosing instances. (That contexts have some effect is a common newbie misunderstanding.) If you have “exactly the same instance head”(s) the program won’t compile. That has always held with all Haskells. Again it’s for type sanity.
  • You want the compiler to examine all possible instances and make sure it doesn’t commit to an instance until it’s definitely excluded all others. I’d be begging the compiler to take all the time it needs. Do you have evidence the compiler is “spending a lot of time in instance resolution”? How would you know that’s what it’s spending its time on?

No you don’t want to limit the instances the compiler has to choose from: you don’t know better than its ability to see the program-wide typing. The best you’ll get is incoherent behaviour and unpredictable hard-to-find errors; the worst you’ll get is segfaults.

As an example of how tricky this is, consider a class to add type-level Nats, trying to exploit three-way type improvement:

class AddNat (a :: Nat) (b :: Nat) (c :: Nat)  | a b -> c, a c -> b, b c -> a

instance AddNat Z b b
-- base case: a is Zero, so b must be same as c; or
--            b is same as c, so a must be Zero
instance ( AddNat a' b c' , ? ) => AddNat (S a') b ?
-- iterative case: a is non-zero, so c must be non-zero
--                 then reduce a to a' and iterate (but also need to reduce c so iterate on c'); or
--                 c is non-zero, so either a or b or both are non-zero

I didn’t think GHC/the power of FunDeps could express that, but it can. (OTOH this is exploiting what comes very close to GHC’s 'bogus’ness.)

1 Like

instance contexts (the bit before => in instance (context, context, ...) => MyClass ... where ... play no part in choosing instances. (That contexts have some effect is a common newbie misunderstanding.) If you have “exactly the same instance head”(s) the program won’t compile. That has always held with all Haskells. Again it’s for type sanity.

Sorry for the misunderstanding here. I am aware that currently the contexts don’t have influence on how an instance is chosen, I just wanted to motivate why I want the “instance family” to be closed.

All modules export all instances. (You can control which classes get exported and which methods. The compiler must be able to see all instances, for sanity in type solving.)
The difficulty is if one/some module(s) in your application don’t import all modules that declare instances: when it comes to compile the top-level module, it finds sub-modules compiled with inconsistent sight of instances.
This is the ‘orphan instances’ problem. Don’t do that.

This also seemed to be a misunderstanding, I was trying to say that knowing that every module you import possibly (with matching on contexts and without closing the “instance families”) could be harmful would be a motivation to make the “instance family” closed.

You want the compiler to examine all possible instances and make sure it doesn’t commit to an instance until it’s definitely excluded all others. I’d be begging the compiler to take all the time it needs. Do you have evidence the compiler is “spending a lot of time in instance resolution”? How would you know that’s what it’s spending its time on?

I was just guessing that it would take quite a lot of time going back to instance resolution every time it would fail unifying and that’s why I was proposing to have a closed “instance family” - so this is is just to compare closed “instance families” vs always going back to instance resolution for equally general instance heads to find a matching context. (while currently equally general instance heads are in all cases rejected)

Yeah for those three I’m guessing I just didn’t express myself clear enough, sorry for that.

No you don’t want to limit the instances the compiler has to choose from: you don’t know better than its ability to see the program-wide typing. The best you’ll get is incoherent behaviour and unpredictable hard-to-find errors; the worst you’ll get is segfaults.

I’m trying to say: What about just one instance head that would not be able to have different instance in the current implementation? Wouldn’t it be sane to choose one of the instance provided as soon as it’s matching a context?

Just to summarize, what I want:

  • making instance resolution depend on contexts in some special cases (“instance family”)
  • making the order of chose instances the choice of the programmer in this case as part of “choice” or casing semantics
  • in the three bulletpoints I was trying to say why my proposal of “instance families” would maybe fit into that gap between incoherent instances and OVERLAPPING/ OVERLAPPABLE

I’m really sorry for the misunderstanding, thank you for taking so much time to read and explain, that’s really kind.

To your last example: I also found it quite interesting that you can simulate (everything?) in type families (but at least associated types) with FunctionalDependencies enabled :smiley:

It seems to me that what’s being asked is that exactly this should be changed

Yes I grokked that. Haven’t had time 'til now to explain why it’s a bad idea. Haskell doesn’t now nor never has [**] [***] tried to do that. Consider if contexts did play a part in instance resolution:

class C a  where ...
instance (D Int b) => C (Int, b)  where ...

class D c d  where ...
instance {-# OVERLAPPABLE #-} (C (Int, b)) => D Int b  where ...
instance D Int Bool  where ...
  1. Should that instance C get accepted? Which instance D should it chain to?
  2. Should that OVERLAPPABLE instance D get accepted?
  3. The answer to 1. depends on the answer to 2. The answer to 2. depends on the answer to 1. ad infinitum.
class F e f  | e -> f, f -> e  where ...
instance F Char Bool  where ...

instance (F Char e) => C (Char, e)  where ...

Should that instance C get accepted? If so, F's FunDep will ‘improve’ e to Bool. But then the instance C's body might not be compatible. Then faced with a [Wanted] C (Char, e), should the compiler defer choosing that instance to see if other type inference will satisfy e ~ Bool – or if not, look for a more general instance C?
a) It might all work out in the end; or it might get stuck; or it might arrive at circular dependencies.
b) Even if it works out, all this investigating what-ifs are liable to need backtracking and will certainly slow down and complicate instance resolution – in what is already a horribly complex part of the compiler.

Just not worth it, because the programmer can always structure their (overlapping) instances to guide resolution – at cost of denser code. [see next post after next]

[**] There’s vestigial evidence that Hugs Haskell used to be a bit fancier with multiple potentially eligible instances “but it is currently broken” – as of 20 years ago.

[**] Some of the authors of Hugs went on to develop ‘Habit’, a Haskell-like language. That has ‘Instance Chains’ – which might be similar to what @MangoIV has in mind, and includes some ability to restrict instance selection from instance contexts. Somewhere there’s a forum thread where Garrett concedes this introduces the sort of ambiguities/indeterminism I’ve outlined. Your compilation will end up in the same kinda place as with UndecidableInstances: compiler looping because it can’t resolve contexts.

[***] The prohibition is documented/spec’d/motivated (poorly IMO) as the last example in section 4.3.2 of the H2010 Language Report (same text as '98 Report).

Hmm maybe. Yes in general type families are very comparable with FunDeps. There’s some annoying differences in corner cases.

  • Firstly, there’s no difference between Type Families vs Associated Types: Assoc Types are merely a syntactic convenience; the decls are hoisted out of the instance to top level.
  • That means when you see a Type Family usage like F Int b where F looks like it’s Assoc’d inside class C, there’s not necessarily any instance C Int b. And perhaps that doesn’t matter, or perhaps it does – you’ll have to go hunting round the code. It’s something like a term-level result being undefined: maybe the program will never try to evaluate it. [Section 3 THE TOTALITY TRAP]
  • For GADTs using a constraint with a FunDep inside a constructor, the compiler might not be able to see the improvement; whereas with a TypeFamily it more usually can. It’s ugly.

Although it’s an art, there are some rules of thumb:

  • Never use {-# INCOHERENT #-}. If the compiler can’t resolve your instances, you can always add more/be more explicit until it can – unless your overall structure is actually incoherent; in which case adding the pragma will just mask the problem until it pops out in some mysterious way long after you’ve forgotten why you wrote the code that way. Your users will curse you.
  • It sometimes speeds up compilation to use {-# INCOHERENT #-} – but only after you’ve proved your instance structure by compiling without. That is, it’s an optimisation, not a design technique. Even so: never use {-# INCOHERENT #-}, because the next person to modify your code won’t know/remember the subtlety, will stick in some random instance and the program is back at underlying incoherence.
  • GHC allows semi-overlapping like ‘the example reported lazily /potentially overlap’. Better compilers don’t allow that. Don’t do that/always make sure your instances overlap in strict substitution order. (See example below.)
  • Always use {-# OVERLAPPABLE #-}; if you see {-# OVERLAPPING/OVERLAPS #-} immediately find the more general instance(s) and make sure it’s OVERLAPPABLE. a) that’s better documentation; b) with OVERLAPPABLE the compiler will try to defer instance resolution until it’s sure there’s no more specific instance – although this isn’t ever-so reliable.
  • Sometimes you’re importing a module with instances that you can’t change; and yet you want to write an OVERLAPPING instance. If you can’t get the author to change their module, expect trouble.

The answer to the example I posed, shows a more general technique:

{-# LANGUAGE  GADTs #-}                            -- to use ~ type equality constraint

class AddNat (a :: Nat) (b :: Nat) (c :: Nat)  | a b -> c, a c -> b, b c -> a

instance AddNat Z b b

instance {-# OVERLAPPABLE #-}
         (a ~ (S a'), c ~ (S c'), AddNat a' b c')   -- explicit type improvement
         => AddNat a b c                            -- most general/irrefutable instance head

You can always write a most general head like that. If nec, add a constraint that uses a helper class/instances to give if/then/else logic for instances.

1 Like

This is all a bit over my head, but I really like this note and want to comment on it:

Unfortunately, a lot of us learning Haskell do this a lot. It’s very easy to do, and hard to accomplish our goals when limiting ourselves to constructs we understand enough to use them without issue.

I think this is (in part) due to the many, many different ways you can use Haskell, or express your intent with Haskell and the ways GHC can compile that expression, as well as the lack of clear guidance on how these constructs can actually be used in practice.

1 Like

Yes, I sympathise. And as a stimulus for learning, it’s valid to imagine a ‘better’ version of some feature, to help think through why the feature works as it does. Unfortunately with Overlaps/FunDeps, the (combo of) features doesn’t ‘work’ always, and there’s plenty of better versions implemented or proposed.

Haskell used to be an experimental language. There’s plenty of experiments that didn’t work out so well and/or people lost interest in polishing them. So they’ve gotten left over, partially because of the tyranny of backwards compatibility: somebody might be using the feature/don’t take it out.

Overlapping Instances are reasonably polished (but not well documented) – except where they interact with FunDeps, which is not documented at all. FunDeps have a strong theoretical background, well documented, which explicitly excludes interaction with Overlaps, but which GHC pretty much ignores.

A big difficulty as you get beyond newb level, is that different code uses different features. So it’s not just which you should write, but also which you need to read/ be aware of to understand others’ code or libraries you want to import.

With FunDeps/Overlaps specifically there’s also

  • Assoc Types which seem at first sight similar to FunDeps, but don’t interact at all with Overlaps; and anyway are really Type Families in disguise, so they get ripped away from the class instances, as I pointed out above.
  • Type Families/Assoc Types appearing in constraints combined with ~ type equality are very similar to FunDeps.
  • But Closed Type Families (typically combined with ~) are what you need to achieve something like Overlaps. They have to be declared standalone, and it sometimes needs clunky workrounds to make the type logic extensible in the sense of adding class instances.
  • Because each of those features has limitations, your contexts often end up needing UndecidableInstances – which sounds scary, but usually isn’t. (My AddNat example needs them, for example.) If it turns out some gnarly combination of instances/contexts is undecidable, it’s a devil of a job to pin down what’s wrong. (There’s again been several proposals to make UndecidableInstances more newb-friendly, but you don’t get a publishable research paper out of that sort of polishing.)

I guess those using Haskell commercially will have ‘house guidelines’ that bless certain extensions/prohibit others – exactly to reduce cognitive load. Then what to do if you want to build on somebody else’s IP by importing a package that adheres to different standards?

(For example I don’t use Lenses – not because there’s anything wrong with them, but because it needs understanding Template Haskell, and there’s just too many options, constructs and operators. I don’t have piles of deeply nested data structures to justify the learning investment.)

Then many of my interventions in GHC are along the lines: just stop making things even more brain-aching – especially for newbs.

1 Like

I’d say the instance family idea may be worth a GHC proposal. It would need some polishing before you submit it there @MangoIV. There’s at least two design problems that would need to be resolved:

  • Your example looks neat only because there’s a single class method. What happens if there are two or more class methods? What happens if there are zero, where do you specify the instance context then? I have a feeling you’ll need two levels of nesting instead of just one as you envisioned.
  • You’ll need to clarify the interaction between the instance family declarations and the regular instance declarations, as in what happens if there are both for a particular class. I’m guessing you’d want at least to disallow overlap between them, because you can always resolve it inside the instance family declaration.