Emulating OCaml modules with typeclasses

Hey folks!

After reading Deconstructing Classes, I tried to use the idea to mimic a simple OCaml module functor. I have no OCaml knowledge, but basically I want to implement this:

module type STACK = sig
  type 'a t
  val empty : 'a t
  val push  : 'a -> 'a t -> 'a t
end

module ListStack : STACK = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
end

module MakeSized (S : STACK) : sig
  include STACK with type 'a t = 'a S.t
  val size : 'a t -> int
end = struct
  include S
  let size s = List.length s     (* Don't mind this one... as it relies on S.t being list-based *)
end

I have this now in Haskell:

{-# LANGUAGE DerivingStrategies, TypeFamilies, AllowAmbiguousTypes #-}

import Data.Kind

class Stack (m :: k) where
  type Elem m :: Type

  type Container m :: Type -> Type

  empty :: Container m (Elem m)
  push :: (Elem m) -> (Container m (Elem m)) -> (Container m (Elem m))

data ListStack (a :: Type)

instance Stack (ListStack a) where
  type Elem (ListStack a) = a
  type Container (ListStack a) = []

  empty :: [a]
  empty = []

  push :: a -> [a] -> [a]
  push = (:)

class (Stack m) => SizedStack (m :: k) where
  size :: Container m (Elem m) -> Int

instance SizedStack (ListStack a) where
  size :: [a] -> Int
  size = length

So far, so good. But the issue is I can’t seem to find any way to actually ā€œuseā€ this concoction; I used AllowAmbiguousTypes so that I can use TypeApplications to disambiguate it. But:

stack :: [Int]
stack = empty @(ListStack Int)

Gives:

• Couldn't match type ā€˜Elem m0’ with ā€˜Int’
  Expected: [Int]
    Actual: Container m0 (Elem m0)
  The type variable ā€˜m0’ is ambiguous
• In the expression: empty @(ListStack Int)
  In an equation for ā€˜stack’: stack = empty @(ListStack Int) typecheck(GHC-83865)

And this:

stack :: forall m. (m ~ ListStack Int, Stack m) => Container m (Elem m)
stack = empty @m

Gives:

• Could not deduce ā€˜Stack m0’ arising from a use of ā€˜empty’
  from the context: (m1 ~ ListStack Int, Stack m1)
    bound by the type signature for:
               stack :: forall m1.
                        (m1 ~ ListStack Int, Stack m1) =>
                        Container m1 (Elem m1)
    at C:\Users\saeid\Projects\Haskell\playground\src\Scratch.hs:54:1-71
  The type variable ā€˜m0’ is ambiguous
  Relevant bindings include
    stack :: Container m1 (Elem m1)
• In the expression: empty @m
  In an equation for ā€˜stack’: stack = empty @m typecheck(GHC-39999)
• Couldn't match type ā€˜Elem m0’ with ā€˜Int’
  Expected: Container m1 (Elem m1)
    Actual: Container m0 (Elem m0)
  The type variable ā€˜m0’ is ambiguous
• In the expression: empty @m
  In an equation for ā€˜stack’: stack = empty @m
• Relevant bindings include
    stack :: Container m1 (Elem m1)
      typecheck(GHC-83865)
empty :: forall k (m :: k). Stack m => Container m (Elem m)

Is there a way to make the type checker happy using the class definitions as is, or refactor the types somehow to make it work? Or is it plain impossible?

Any hints or ideas would be greatly appreciated!

The problem is that Stack is kind-polymorphic, so the type of empty is:

ghci> :t empty
empty :: forall k (m :: k). Stack m => Container m (Elem m)

So you need to invoke it like this: empty @_ @(ListStack Int)

Alternatively, you can make Stack and SizedStack not kind-polymorphic by replacing k with Type. I don’t see any reason not to do that.

You may find this interesting: modules-for-lennart

Thanks for that method

I have been looking for a way to define something akin to typeclasses that support more than one implementation per type. Because that is how idris2 interfaces work, they are basically more a more general version of haskell typeclasses made to have the same power as ocaml first class modules

Then wrap (one of) the types in a newtype. And in general, if you prefer the semantics of some other language, don’t abuse Haskell into something unidiomatic.

I hear this complaint from time to time but I don’t understand what the problem is.

The idiomatic solution is to use newtypes. Otherwise, if you really want ā€œnamed instancesā€, add a parameter to the class and use type applications to select an instance.

regarding "why aren’t newtypes good enoughā€:

newtypes are fine in simple cases, but for more complex cases, such as an AST, you can’t just change the type without changing the containing types too. For example, its always fun when you need to deal with changing the json serialization of some data that is deeply nested.

This comes up more often than you might think; for example, you may want to change the way data is generated for an Arbitrary instance embedded in another piece of data.

As an aside, you may find this interesting: Monad Transformers and Effects with Backpack

That’s what coerce is for though, isn’t it?

Coerce will work if you need to change the type at the root of some structure. But, see for example: Haskell Playground

How do I change the way Comment is serialized to JSON inside the BlogPost? I can’t, at least not without making big changes.

I think the common solution is just to write the instance yourself instead of deriving it. It’s not that much work, and certainly easier than trying to override instance resolution.

Sure, I can change how I write the instance, but then I am changing it everywhere. I can’t, say, decide to have commenter emails shown when requested for an admin dashboard, but hidden when requested for public consumption.

This problem pops up all over the place, once you are aware of it.

Typeclasses are wonderfully convenient. Like all? many? pl conveniences, they’re great when they are what you want, but become frustrating when they get in the way of what you want.

Sure, I can change how I write the instance, but then I am changing it everywhere. I can’t, say, decide to have commenter emails shown when requested for an admin dashboard, but hidden when requested for public consumption.

This problem pops up all over the place, once you are aware of it.

Typeclasses are wonderfully convenient. Like all? many? pl conveniences, they’re great when they are what you want, but become frustrating when they get in the way of what you want.

Edit: I wish I had thought of this earlier, but I didn’t want to double post. This is probably the best blog post I know of that helps illuminate the modules vs typeclasses discussion. Its long, I know, but at least it isn’t Impenetrable:

ML modules and Haskell typeclasses are deceptively similar, but they are really different tools for different purposes.

Serialization is one example where I’d say typeclasses are a better fit than modules, because type classes give you the guarantee that there is just a single way that a data type can be serialized and deserialized. It relieves you from worrying about which serialization format was used. Of course sometimes the choice is out of your hands and you need to actually worry about those things, but that should not happen often.

Another example are ordered data structures like search trees. You wouldn’t want to worry about which ordering was used to construct a search tree. I challenge you to write a polymorphic search tree in ML which supports a function like map :: (Ord a, Ord b) => (a -> b) -> SearchTree a -> SearchTree b.

I had a discussion about this on Reddit a while ago

Here’s my conclusion:

I would conclude that ML modules are very useful for defining larger interfaces that abstract over several concrete implementations. And coherent type classes are more useful for small common operations that should work on multiple types but always kind of mean the same thing, i.e. things for which every implementation must satisfy certain restrictive properties/laws, like +, *, <, ==, and also for structures like functors (which are always unique anyway), applicatives and monads (here there might be some choice, but there are usually only a few sensible implementations).

I understand the problem, but I don’t see how this is any different with ML modules?
In OCaml, instead of a ToJSON BlogPost instance, you would have a derived blog_post_to_yojson function that you would still have to rewrite to use your custom Comment serialization, the exact same way you would for the newtype instance in Haskell. There wouldn’t even be any modules involved in OCaml.
Really the only difference would be that usage sites would need to call a different function on BlogPosts instead of wrapping them in a newtype.

Agreed generally. Typeclasses are often great, work well with single logical implementation, etc.

This gets to the root of something that I dislike about a lot of language features: it is easy to write yourself into a corner where you can’t quickly adapt to new requirements.

And that’s what makes the serialization example work. It seems like there should only be a single serialization method, but there actually are valid reasons to want to change it. The monoid for Integer is a great example, but there are a lot: Ord for record types that have multiple ways to render, etc. Even what is probably the simplest example - Equality! is famously hard to get right, because there is often not a single right answer - it depends upon context.

what i’d like is a language like described in https://home.ttic.edu/~dreyer/papers/mtc/main-long.pdf ; you get the flexibility of modules with the defaulting of typeclasses (paper tl;dr, add a new keyword that says ā€œwithin this scope, use these typeclass instancesā€).

idris2 interfaces are just that

Local instances with clear semantics would be a great feature to have.

In fact there is GHC.Exts.withDict that (in theory) does exactly that, but only works with classes that have one method, and, more importantly, has undefined behavior once more than one instance is in scope.

Implicit parameters also provide a simple solution to this problem. But unfortunately their implementation in GHC is not fully reliable for various reasons.

Personally I spent some time on this problem trying all sort of tricks (including compiler plugins) but wasn’t able to find a satisfactory solution. From what I can see it seems impossible without built-in support from GHC.

I don’t agree that there are many valid reasons to change it. You gave an example that was about hiding sensitive information, but I think that is not the job of a serialization function. It is easy and more idiomatic to just create a separate type.

That paper does not address the benefits of global uniqueness of typeclasses at all. I think a much more reasonable approach is detailed in Section 4 of A Non-Reformist Reform for Haskell Modularity. TL;DR different instances can exists separately from each other in different modules, as long as you don’t (possibly transitively) import them both in the same module.

There is a way to locally use a custom instance as described in Coherent explicit dictionary application for Haskell, but it is restricted and great care is taken to ensure no information about the use of this local instance escapes.