R/haskell - Was simplified subsumption worth it for industry Haskell

So impredicative types might be worth the cost of the simplified subsumption.

Yes, I agree that seems plausible… and that is precisely the thrust of my aforementioned /r/haskell comment. Since it is now topical, I suppose I may as well reproduce it here in full:

I will add my voice to the pile of people saying that I am unhappy about this change. I was skeptical of it when I first saw the proposal, but I did not say anything then, so I cannot complain too loudly… and, frankly, my unhappiness is quite minimal. I think I was right in my assessment at the time that it was not worth wading into the discussion over.

This is not the first time something like this has happened. All the way back in 2010, GHC argued that Let Should Not Be Generalized, which made a similar case that trying to be too clever during type inference was not worth the costs. Instead, the authors argue, it is better to allow the programmer to be explicit. Having said this, there is a big difference between Let Should Not Be Generalized and Simplify Subsumption, namely that the former did not eliminate let generalization in general: it is still performed if neither GADTs nor TypeFamilies is enabled. So the choice to eliminate let generalization was much more explicitly focused on a poor interaction between two type system features, rather than being primarily a way to simplify the compiler’s implementation.

As it happens, I am personally skeptical that the elimination of let generalization was the right choice, but I also acknowledge that the interactions addressed in the paper are genuinely challenging to resolve. I am definitely grateful for GADTs and type families, and all things considered, I am willing to give up let generalization to use them. The point I am trying to make is that my choice is at least clear: I get to pick one or the other. Over time, this choice has effectively served as an informal survey: since programmers overwhelmingly agree to this tradeoff, it abundantly obvious that programmers believe those type system features pay their weight.

Can we say the same about the elimination of higher-rank subsumption? No, we cannot, as there is no choice, and we cannot even really understand yet what we, as users of GHC, have gotten in return. I suspect that if a similar strategy had been taken for this change—that is, if a release introduced an extension SimplifiedSubsumption that was implicitly enabled by ImpredicativeTypes—then programmers would be a lot more willing to consider it. However, that would obviously not provide the same benefit from the perspective of compiler implementors, who now have to maintain both possible scenarios.

With all that in mind, did the committee get this one wrong? Honestly, I don’t think I’m confident enough to take a firm stance one way or the other. Type system design is hard, and maintaining GHC requires an enormous amount of work—the cost of features like these are not free, and we as users of the compiler should have some sympathy for that… not just because we acknowledge the GHC maintainers are human, too, but also because we indirectly benefit from allowing them to focus on other things. That is not to say that this change was unequivocally justified—personally I lean towards saying it was not—but more that I think you can make arguments for both sides, it isn’t a binary choice of whether it was right or wrong, and it is inevitable that sometimes the committee will make mistakes. Perfection is a good goal to aim for, but sometimes we miss. I think this sort of discussion is precisely the sort of healthy reflection that allows us to learn from our missteps, so for that reason, I am glad to see it.

2 Likes

I continue to think we’re mischaracterizing what deep subsumption got wrong: it would invisibly change performance properties of your code when GHC was not clever enough to actually execute what you wrote. Simplified subsumption requires you to be explicit.

Here is an example:

{-# LANGUAGE RankNTypes #-}

module Main where

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

f :: Int -> Int -> forall a. a -> (Int, a)
f x y a = (x+y,a)

g :: Int -> Int -> a -> (Int, a)
g x y a = (x+y,a)

main :: IO ()
main = let [f1, f2, f3] = (map (g (fib 30)) [1,2,3]) in
       print [f1 True, f2 False, f3 True]

This program evaluates fib 30 once. But if I change the call to g in main to use f instead, it evaluates fib 30 three times. (It works with f only with GHC 8!) We can observe this by compiling the program and running it with +RTS -t, and we see the allocations go up by roughly a factor of 3. Note that f and g are identical, except for forall a. appearing at an unexpected place in f's type. Note further that we are taught that types are erased, and that all type abstraction and application is correspondingly erased, too. Yet this forall drastically affects the performance of this program.

What simplified subsumption does is to make the programmer aware that map (f (fib 30)) does not really work: f (fib 30) has type Int -> forall a. a -> (Int, a), while map expects something of type a -> b. Note: no foralls there. GHC 8 does magic under the hood (the magic is called eta expansion) to accept the program; GHC 9 removes this magic.

Perhaps we could decide as a community that we prefer the magic (or maybe some different magic) and avoid manual eta-expansion. Magic is sometimes quite nice! But I do think that, if we started from simplified subsumption and someone suggested deep subsumption, we would be quite skeptical.

Separately, I think we (“we” = the GHC steering committee) erred in making this change without warnings, etc., as a way to aid migration. Or maybe the whole eta-expansion should have been kept but turned into a warning. Or something. I’d be happy to revisit this. But I do think it’s important to set the record straight that this change really does make a possibly disastrous performance effect more explicit.

12 Likes

@rae But see the idea from @xplat that I wrote up in https://github.com/ghc-proposals/ghc-proposals/pull/287#issuecomment-1120461992

I wrote up your example below trying to manually desugar via @xplat’s method

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE RankNTypes #-}
-- Needed because argument to "seq" has polytype
{-# LANGUAGE ImpredicativeTypes #-}

module Main where

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

f :: Int -> Int -> forall a. a -> (Int, a)
f x y a = (x+y,a)

g :: Int -> Int -> a -> (Int, a)
g x y a = (x+y,a)

main :: IO ()
main = let -- f' :: forall a. Int -> a -> (Int, a)
           f' = let -- f0 :: Int -> forall a. a -> (Int, a)
                    f0 = f (fib 30)
                in f0 `seq` \x -> f0 x
           [f1, f2, f3] = (map f' [1,2,3]) in
       print [f1 True, f2 False, f3 True]

it works in GHC 9.0 and 9.2 and has the desired sharing.

I was a bit confused by “8” and “9”, so if it helps others, I believe I deduced this to mean “8 series”, i.e. “8.0 to 8.10”, and “9 series”, ie.“9.0 to 9.2” (so far). In particular, it doesn’t mean “8.0” and “9.0”.

1 Like

A no-longer-secret goal of the GHC modularity work for me at least — not on speaking behalf of my coauthors here, with whom I have not discussed this idea much — is that people should be able to swap out components like the type checker for greenfield experiments. Those experimental components can just panic on any input they don’t know how to handle. In this way, GHC ought to make it really easy to try out proof of concepts for subsets of Haskell.

Combine with better GHC.X.Hackage/head-hackage support, and it should be really easy to try building the whole ecosystem with your prototype, patching things out to “get stuff unstuck” where it makes sense.

Together, I think this should be amazing milestone for enabling more research. No more trade-off between doing weird out-there stuff vs being able to empirically evaluate ones work an extant corpus!

5 Likes

Note that MLF does need a new form of quantification, so full integration would require extra syntax and slightly different types in Core. So, it would need to modify more than just the type checker. But for the prototype we can probably just avoid extra surface syntax and use unsafeCoerce to fit the resulting program in Core.

Or maybe the other types of quantification can be written using the existing qualified types syntax. That is an interesting idea to explore.

Note there are existing plans to stop using Core types in the type checker (Remove NoGhcTc, allow HsType GhcTc, HsDecl GhcTc (#18758) · Issues · Glasgow Haskell Compiler / GHC · GitLab). Without that this is harder, but with it, it should be easier.

I bet with -fexpose-all-unfoldings a lot of those unsafeCoerces could be gotten rid of too, since the instantiation of fancy MLF types gets us back to System F types.

1 Like