https://cs-syd.eu/posts/2023-08-25-ad-hoc-polymorphism-erodes-type-safety
Interesting. See Section 1.3 of GHCâs style guide
So whatâs going on here is that allowListSize
was morally
allowListSize :: Foldable [] => Settings -> Int
allowListSize = length . settingAllowList
and it changed to
allowListSize :: Foldable Maybe => Settings -> Int
allowListSize = length . settingAllowList
but because the type class constraint is automatically resolved we donât get to see this change, become suspicious, and deal with the problem.
Since FTP length
is a particularly common culprit. I suggest using length @[]
(or whatever) wherever you use it!
In private conversations I usually call this the âinvisible diffâ problem, because this concerns the part of the diff that you usually donât see when you review a PR on github or gitlab. But it is a diff which could theoretically be made visible by appropriate tooling: The diff only has to be done on the level of the desugared Core instead of on the textual diff.
If I take your example, then the following two versions of the function are generated during compile time (taken directly from godbolt.org):
-- Before
allowListSize :: Settings -> Int
allowListSize = . (length $fFoldableList) settingAllowList
vs
-- After
allowListSize :: Settings -> Int
allowListSize = . (length $fFoldableMaybe) settingAllowList
It might be quite a bit of work to write this tooling, but it should theoretically be possible. You would only need both versions of the desugared Core available, and run a normal textual diff on that.
Basically, typeclass are not type safe if you refactor by changing a type to a similar type.
But thatâs true for monomorphic code to if you change a type to the (almost) the same type (f a
â f (f a)
).
In the given example if you replace [ClientIdentifier]
by [ [ClientIdentifyer] ]
(instead Maybe [ClientIdentifer]
), you get the same problem, whith or without class type ,@[]
monomorphic function etc âŚ
I remember being beaten by this by the past, and also tuple length
What does type safety mean here?
I only smell the desired outcome that is ârefactoring safeâ⌠not sure you can get that from type system alone, unless using additional machinery that is dependent types (dependent haskell) or gradual types refinement types (liquid haskel)?
Edit: Or we do what all industrial programmers do, write functional test cases if your type system usage is not there yet to guarantee your spec.
Liquid Haskell uses liquid types which are a form of refinement types.
Thanks for correcting, it was my confusion.
Thank you @NorfairKing: food for thought.
Yes, FTP was controversial at the time (and probably still is). For example:
Prelude> length (1, 2)
===> 1
Prelude> length (1, 2, 3)
<interactive>:5:1: error:
* No instance for (Foldable ((,,) Integer Integer))
arising from a use of `length'
But you could define instance Foldable
for all sorts of tuples.
IMO length
should have remained restricted to lists (all those tutorials!); FTP should have introduced a new scaryOverloadableLength
for Foldable
s.
So are there examples of the dangers of refactoring misleadingly failing to throw type errors, other than with length
? The article says
can occur for with every class/trait that has more than one instance/impl.
Ok. Are there classes other than Foldable
with instances for so many generic data container types?
In my years of experience, I developed a rule of thumb of when an abstraction erodes type-safety.
I call typeclasses like Foldable
structure-oblivious because they donât preserve the structure of the argument in their result. Thatâs why theyâre potentially dangerous because if you add more layers on top of your argument you have no way to leverage the type checker to catch potentially wrong behaviour earlier. So you must rely on other tools to verify your code (tests, LiquidHaskell, manual code review, etc.).
Examples of structure-oblivious abstractions:
length :: Foldable f => f a -> Int
show :: Show a => a -> String
toJSON :: ToJSON a => a -> Value
In contrast, some typeclasses are structure-preserving. The type of the structure is preserved in the result. Thatâs why all types propagate. If you change a value from settingAllowList :: [ClientIdentifier]
to settingAllowList :: Maybe [ClientIdentifier]
, functions like fmap fromClientIdentifier settingAllowList
donât compile anymore.
Examples of structure-preserving abstractions:
(<>) :: Semigroup a => a -> a -> a
fmap :: Functor f => (a -> b) -> f a -> f b
I noticed, when thereâs a bug due to a usage of a typeclass, itâs usually because the typeclass is structure-oblivious.
Nice, and I can intuit why this might be the case! Sounds like the beginning of a HLint ruleâŚ
I suppose GHC could warn about any method call which is not fully polymorphic and the definition of which has a given type variable (mentioned in the class head) on the left hand side of the arrow but not on the right hand side. On a superficial reading, that seems to define these structure discarding methods. Is there more to the story?
How does HLint or the compiler determine the difference between
middle of some code somewhere (length tree) more code
where tree
is a particular Foldable
data structure, and
lengthTree :: Tree a -> Int
lengthTree = length
which I infer is the âresponsibleâ way to use this sort of method?
I would be interested to know how many people have actually had a bug caused by this (and of those, in how many cases it was exactly that one Foldable
tuple footgun). I donât think I ever have.
HLint canât. It doesnât have type information. GHC can because it does. GHC case can see that in the first example length
is used at polymorphic type (I presume thatâs what you meant) and lengthTree
is used an monomorphic type.
In fact, stan
already has a rule for this. (And now for your regular PSA that stan
doesnât support GHC 9.x but I have a branch that adds support up to GHC 9.4).
know how many people have actually had a bug caused by this
I have, a few times over the years, I think.
I might be misunderstanding, but isnât the original issue that length
was used on one monomorphic type ([ClientIdentifier]
), and then the code changed so that it was being used on another monomorphic type (Maybe [ClientIdentifier]
) without the programmer realizing? If monomorphic-type-good is the rule, then neither of these would be flagged. If polymorphic-type-good, then how do you ever use a polymorphic function without generating warnings (this is what I was trying to ask in my previous post)?
Oh, I see, you mean how does it tell that lengthTree
is a âgoodâ usage? And itâs more a question about the semantics of âgoodâ usage than the mechanics of how tooling detects it?
(I should add a correction about stan
: I think what it detects is certain âbadâ monomorphic usages, over (,) a
particularly, and probably others.)
Yeah, I guess itâs a question about semantics. I was responding specifically to
And I think either there is more to the story or I just donât understand whatâs being proposed, because sometimes youâre going to want to use a function like length
on a not-fully-polymorphic type, so whatâs the plan for allowing that when itâs âcorrectâ, and what does âcorrectâ even mean?
I think i have been bitten once by the tuple problem, which is why, I think, I am aware of it.
I personnaly donât like FTP : if it was just a shadowing issue I would have prefered a way to do âbulk hiddingâ or something or use a prefix like fmap
.
I am also regularly bitten by readMay
return Nothing
for everything.