Advice about use of typeclasses in my project

Hi everyone! I am new here and a beginning Haskeller.

I have a question. I am writing an app, and a few days ago I asked a question on Stack Overflow. The commenters suggested that my approach, which is based on my experience with Java, is not very functional in style, and that as a beginner I should not define my own typeclasses.

To make my style more functional, I want to ask about a certain part of the code I am trying to write.

Currently I have defined two types of curves on the 2D plane:

-- Y = slope * X + constant
data LinearCurve = LinearCurve { slope :: Double, constant :: Double }

data XYPoint = XYPoint { x :: Double, y :: Double }

-- piecewise linear curve that connects the given points
newtype PieceWiseLinearCurve = DataCurve { points :: [XYPoint] }

data CoordType = X | Y

type Err = String

Later, other types of curves could be added.

What I need is an interpolating function for all different types of curves.

 interpolate :: a -> CoordType -> Double -> Either Err Double 

My approach, based on my past Java experience, would be to create a typeclass and then make an instance for every type that I want to be able to interpolate. But now it seems to me that even this wording is more OOP than functional: thinking of data types as having the ability to do something is a pretty OOP way of thinking, I guess…

Anyway, if I forbid myself from defining typeclasses, I can define a new type

newtype Interpolator = Interpolator (CoordType -> Double -> Either Err Double)

and a bunch of functions that give me this Interpolator based on the data. I guess it would be better to combine all curves into one type

data Curve = LinC LinearCurve
          |  PieceLinC PieceWiseLinearCurve

and write one function

 getInterpolatorFromCurve :: Curve -> Interpolator

But of course, many different things can be used to get an Interpolator. For example, I can write

 getInterpolatorFromPoints :: (XYPoint, XYPoint) -> Interpolator
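
For concreteness, here is a minimal sketch of what two such functions might look like. It assumes that the CoordType argument names the coordinate being computed (Y means "compute Y from a given X"). The names runInterpolator and getInterpolatorFromLinear and the error messages are hypothetical, made up for this illustration.

```haskell
data CoordType = X | Y
type Err = String

data LinearCurve = LinearCurve { slope :: Double, constant :: Double }
data XYPoint = XYPoint { x :: Double, y :: Double }

newtype Interpolator = Interpolator (CoordType -> Double -> Either Err Double)

-- Hypothetical helper: unwrap and apply an Interpolator.
runInterpolator :: Interpolator -> CoordType -> Double -> Either Err Double
runInterpolator (Interpolator f) = f

-- Hypothetical: interpolator for the line Y = slope * X + constant.
getInterpolatorFromLinear :: LinearCurve -> Interpolator
getInterpolatorFromLinear (LinearCurve m c) = Interpolator go
  where
    -- Assumption: the CoordType is the coordinate we want to compute.
    go Y xv = Right (m * xv + c)
    go X yv
      | m == 0    = Left "cannot solve for X: slope is zero"
      | otherwise = Right ((yv - c) / m)

-- Interpolator for the straight line through two bare points.
getInterpolatorFromPoints :: (XYPoint, XYPoint) -> Interpolator
getInterpolatorFromPoints (XYPoint x1 y1, XYPoint x2 y2)
  | x1 == x2  = Interpolator (\_ _ -> Left "points share the same X")
  | otherwise = getInterpolatorFromLinear (LinearCurve m (y1 - m * x1))
  where
    m = (y2 - y1) / (x2 - x1)
```

With these definitions, runInterpolator (getInterpolatorFromLinear (LinearCurve 2 1)) Y 3 evaluates to Right 7.0.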

I can also go to higher dimensions, in which case I need to redefine the interpolator as

data Interpolator a b = Interpolator (b -> a -> Either Err Double)

where b should be the appropriate coordinate type for the given dimension. OK, I have my interpolator type, which can be reused in many different contexts, but I need to clutter my namespace with many functions of the form getInterpolatorFrom???.

Wouldn’t it be better to just use a type class with one function that gives me the interpolator, and have only one name for the task?

The problem is, I do not see why I should not use typeclasses this way. I understand that my only motive is to reduce the number of function names, rather than the more noble motive of implementing polymorphic functions later on (as I can with the Num class, writing a sum function that adds up the numbers in a list without knowing the actual number type it will be used for). But why is it considered bad style? What is the reasoning behind that?

Or is there some other solution I am not seeing?

I think you can combine your piecewise linear curve with the normal linear curve by just adding a boolean to indicate if the ends are extended to infinity or not. And the (XYPoint, XYPoint) example is just a piecewise linear curve with two points. So maybe you can just combine all your curves into one type?

I think 2D interpolation of surfaces is so significantly different that it warrants its own part of the namespace. However, I’m not quite sure what you mean by CoordType, are you interpolating axes separately? Is that really what you want?

I want to interpolate the x coordinate given the y coordinate and vice versa. The CoordType tells me which coordinate I am interpolating.

I do not think the bool idea is a good one in my case. The thing is that I want the linear curve to be implemented with two doubles, for the slope and the constant term, because this makes subsequent work with it easier. The app uses the curves for creating and analyzing a MILP model, and these two cases are quite different in this context.

And the (XYPoint, XYPoint) example is just a piecewise linear curve with two points

It is, but I have a slightly more complicated constructor for my piecewise linear curve. It sorts the list of points, checks that there are at least two points, checks that they are monotone, and returns the curve wrapped in an Either type. Any time I got two bare points, I would need to go through the trouble of creating a piecewise linear curve and using the interpolate function on it. On top of the additional complexity, I am afraid that this would make the very general interpolation of two points dependent on the implementation of the piecewise linear curve, which is supposed to serve more specific needs…
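
To make the description concrete, a constructor along those lines might look like the sketch below. The name mkPieceWiseLinearCurve and the error messages are hypothetical, and "monotone" is assumed here to mean distinct X values plus strictly increasing or strictly decreasing Y values, so that interpolation works in both directions. The real constructor may well check something different.

```haskell
import Data.List (sortOn)

type Err = String

data XYPoint = XYPoint { x :: Double, y :: Double }
  deriving (Eq, Show)

newtype PieceWiseLinearCurve = DataCurve { points :: [XYPoint] }
  deriving (Eq, Show)

-- Hypothetical smart constructor: sorts by X, then validates the points.
mkPieceWiseLinearCurve :: [XYPoint] -> Either Err PieceWiseLinearCurve
mkPieceWiseLinearCurve ps
  | length sorted < 2           = Left "need at least two points"
  | not (strictlyIncreasing xs) = Left "X values must be distinct"
  | not monotoneY               = Left "points are not monotone"
  | otherwise                   = Right (DataCurve sorted)
  where
    sorted = sortOn x ps
    xs = map x sorted
    ys = map y sorted
    -- Consecutive pairs of a list: [a, b, c] gives [(a, b), (b, c)].
    pairs zs = zip zs (drop 1 zs)
    strictlyIncreasing zs = all (uncurry (<)) (pairs zs)
    strictlyDecreasing zs = all (uncurry (>)) (pairs zs)
    -- Assumption: monotone means strictly monotone in Y.
    monotoneY = strictlyIncreasing ys || strictlyDecreasing ys
```

Every caller that only has two bare points would have to go through this validation and handle the Left case, which is the extra trouble mentioned above.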

The thing is, I really do have three different scenarios in which I want to interpolate. Smashing them together does not seem like a good idea to me.

As you’ve figured out, you have a few options here.

Option 1 is to use a sum type:

data Curve
  = Linear Double Double
  | PiecewiseLinear [XYPoint]
  | ...

interpolate :: Curve -> CoordType -> Double -> Either Err Double
interpolate (Linear m b) = ...
interpolate (PiecewiseLinear points) = ...
...

This gives you a single type for curves, so you can easily store a list of curves and operate on them all with maps and filters, etc. It’s also nice if you might want to define additional functions on curves aside from interpolation. They can be ordinary functions with pattern matching.

However, it does not lend itself well to applications where you want type-safety around the type of Curve you’re working with. If, for example, you have a function that should only operate on piecewise-linear curves, what do you do when the Curve passed in is of the wrong type? Throw an error? It also means you need to define all the possible types of curves in one place.
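
Here is a rough sketch of how Option 1 might be filled in, to show the pattern matching at work. It assumes the CoordType names the coordinate being computed and that the points of a PiecewiseLinear curve are already sorted by X. The helper along and the error messages are made up for this illustration.

```haskell
data CoordType = X | Y
type Err = String

data XYPoint = XYPoint { x :: Double, y :: Double }

data Curve
  = Linear Double Double        -- slope, constant
  | PiecewiseLinear [XYPoint]   -- assumed sorted by X

-- Hypothetical helper: interpolate along consecutive points, reading the
-- known coordinate with 'from' and computing the other one with 'to'.
along :: (XYPoint -> Double) -> (XYPoint -> Double)
      -> [XYPoint] -> Double -> Either Err Double
along from to ps v =
  case [ (p, q) | (p, q) <- zip ps (drop 1 ps), inside p q, from p /= from q ] of
    ((p, q) : _) ->
      Right (to p + (v - from p) * (to q - to p) / (from q - from p))
    [] -> Left "value is outside the range of the curve"
  where
    inside p q = min (from p) (from q) <= v && v <= max (from p) (from q)

interpolate :: Curve -> CoordType -> Double -> Either Err Double
interpolate (Linear m b) Y xv = Right (m * xv + b)
interpolate (Linear m b) X yv
  | m == 0    = Left "cannot solve for X: slope is zero"
  | otherwise = Right ((yv - b) / m)
interpolate (PiecewiseLinear ps) Y xv = along x y ps xv
interpolate (PiecewiseLinear ps) X yv = along y x ps yv
```

Because there is a single Curve type, something like map (\c -> interpolate c Y 1) curves works directly on a mixed list of linear and piecewise-linear curves.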

Option 2 is to use a class:

data LinearCurve = LinearCurve Double Double
data PiecewiseLinearCurve = PiecewiseLinearCurve [XYPoint]
...

class Curve c where
  interpolate :: c -> CoordType -> Double -> Either Err Double

instance Curve LinearCurve where ...
instance Curve PiecewiseLinearCurve where ...
...

Here, you would have to work harder to keep a single list of curves, since there is no single Curve type. And every new operation you want on curves needs to become a new type class.

However, you can express which type of curve you have at the type level, so operations on one single type of Curve are type-safe. You can also easily add new types of curves, by just defining their type and a new instance of the Curve class.
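
A small sketch of Option 2, under the same assumption that the CoordType names the coordinate being computed. The TwoPoints type stands in for a second kind of curve, and sampleY is a hypothetical example of a function that works for any instance. Neither comes from the original question.

```haskell
data CoordType = X | Y
type Err = String

data LinearCurve = LinearCurve Double Double   -- slope, constant

-- Hypothetical second curve type: the line through two (x, y) points.
newtype TwoPoints = TwoPoints ((Double, Double), (Double, Double))

class Curve c where
  interpolate :: c -> CoordType -> Double -> Either Err Double

instance Curve LinearCurve where
  interpolate (LinearCurve m b) Y xv = Right (m * xv + b)
  interpolate (LinearCurve m b) X yv
    | m == 0    = Left "cannot solve for X: slope is zero"
    | otherwise = Right ((yv - b) / m)

instance Curve TwoPoints where
  interpolate (TwoPoints ((x1, y1), (x2, y2))) coord v
    | x1 == x2  = Left "points share the same X"
    | otherwise = interpolate (LinearCurve m (y1 - m * x1)) coord v
    where
      m = (y2 - y1) / (x2 - x1)

-- Polymorphic over every Curve instance: compute Y at each given X,
-- stopping at the first error.
sampleY :: Curve c => c -> [Double] -> Either Err [Double]
sampleY c = traverse (interpolate c Y)
```

Here sampleY accepts a LinearCurve or a TwoPoints value, but a list such as [LinearCurve 2 1, TwoPoints ((0, 1), (1, 3))] does not type-check, which is the trade-off described above.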

By the way, the challenge of being able to easily add both new curve types and new functions on curves in a modular way is known as “the expression problem”, and much has been written on it. There are lots of “solutions” to the expression problem, but they are all too complex to use by default unless you really need extensibility along both axes.

So, which to choose? Both are reasonable. I’m unsure where you encountered the advice that using a type class is considered bad style. Haskell programs use type classes quite a lot, and it’s perfectly good style. That said, it is worth taking some care to ensure you’re defining the right abstraction. If there really is one notion of interpolation that you are capturing, then this is fine, and you’ll be able to tell because the general types will tend to just fall into place in a nice way, and you’ll be able to make some general statements about any implementation of the class. (Formally, these are the type class laws.) If you’re shoehorning different concepts into the same word, you’ll find it harder to define the right types or to state properties. Then you might consider backing off and using different names for the different concepts. It’s an acquired skill though, so you have to start somewhere.

Thank you, that was a very helpful reply. The expression problem seems like an interesting topic I should (one day) look closer at.

I’m unsure where you encountered the advice that using a type class is considered bad style

There is a comment under my question on Stack Overflow by dfeuer:

Haskell beginners shouldn’t define their own classes at all. Learn to define functions, and types, and instances. These are the vast majority of actual Haskell code. As you do this, you’ll get a good feel for what makes some classes really useful and others less so. You’ll learn what makes some classes easy to use and others full of booby traps. Then when you find a good reason to actually define your own class, you’ll go through a slew of bad class designs before you get good enough at it that only most of your attempts go badly. Designing good classes is really hard and rarely necessary.

To be fair, I did try to do a pretty crazy thing with those type classes in the body of the question, so maybe that triggered the harsh reply.

I also asked this subsequent question. The answers and replies did not say I should not define my own typeclasses, but they did discourage me from it quite a lot.

I think the question is why you’re defining a class.

  • If the answer is that you want a separation between interface and representation for the one implementation you intend to build, then a class is the wrong way to handle this in Haskell. Instead, you should just define the type and its public API in a module together, and not export the constructors of the type. Now you only have one module to modify if you choose to change the representation of the type and can keep the same public API.

  • If it’s because you want to write polymorphic functions that operate on many different implementation types based on some common abstraction they all implement, then type classes are the right answer to that question. Then it comes down to getting the common abstraction right, and therein lies the rub, as Shakespeare would say. Abstraction is hard, which is why I feel you’ve been warned away from it to start. Haskell gives you many more ways of abstracting than most languages, which is more powerful, but that abstraction is correspondingly more difficult to do well.
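
The first bullet can be sketched as a module with an abstract type. The module name, the function names and the choice of LinearCurve as the example are all assumptions made for illustration. The point is only that the export list names the type without its constructor.

```haskell
module LinearCurve
  ( LinearCurve        -- the type is exported, its constructor is not
  , fromSlopeConstant
  , fromTwoPoints
  , evalAt
  ) where

-- Representation: Y = slope * X + constant. Importers cannot see this,
-- so it can change without breaking them.
data LinearCurve = LinearCurve Double Double

-- Build a curve from its slope and constant term.
fromSlopeConstant :: Double -> Double -> LinearCurve
fromSlopeConstant = LinearCurve

-- Build the line through two (x, y) points; fails for a vertical line.
fromTwoPoints :: (Double, Double) -> (Double, Double) -> Either String LinearCurve
fromTwoPoints (x1, y1) (x2, y2)
  | x1 == x2  = Left "points share the same X"
  | otherwise = Right (LinearCurve m (y1 - m * x1))
  where
    m = (y2 - y1) / (x2 - x1)

-- Compute Y for a given X.
evalAt :: LinearCurve -> Double -> Double
evalAt (LinearCurve m c) xv = m * xv + c
```

Client code can only build and use curves through these three functions, so no class is needed to separate the interface from the representation.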

Makes sense. Thank you.

Another approach could be:

  1. write out functions specific to each case (type of curve)
  2. compare these functions, see if their signatures can be aligned
  3. see if type classes are necessary. A bunch of functions may work just as well.
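
As a rough illustration of these three steps, the sketch below only computes Y from X and uses made-up names (linearY, twoPointY, YAt, sampleAll), so treat it as one possible shape rather than a recommendation.

```haskell
type Err = String

-- Step 1: one plain function per case.
linearY :: Double -> Double -> Double -> Either Err Double
linearY m c xv = Right (m * xv + c)

twoPointY :: (Double, Double) -> (Double, Double) -> Double -> Either Err Double
twoPointY (x1, y1) (x2, y2) xv
  | x1 == x2  = Left "points share the same X"
  | otherwise = Right (y1 + (xv - x1) * (y2 - y1) / (x2 - x1))

-- Step 2: once the case-specific arguments are supplied, both functions
-- have the same shape.
type YAt = Double -> Either Err Double

-- Step 3: generic code can take that function as an argument, so no
-- type class is required here.
sampleAll :: YAt -> [Double] -> Either Err [Double]
sampleAll f xs = traverse f xs
```

For example, sampleAll (linearY 2 1) [0, 1] and sampleAll (twoPointY (0, 0) (2, 4)) [1] both go through the same generic function. If this style gets awkward later, that is a sign a class (or a sum type) might be worth introducing.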