Would it be possible to guarantee specialization/inlining functions based on the callee (as a plugin maybe)?

I usually work with ML/numerical algorithms, and Haskell is more of a hobby. One thing that’s important for Haskell to produce speedy numerical algorithms is inlining/specialization for the basic numerical types. So I wonder:

Do you think there’s a way for a GHC plugin to propagate information from the call site back to the called function across module boundaries? I imagine something like a specially selected set of “magic types” (like Float) where we reverse the flow of information in order to guarantee that any function called at such a type is inlinable and specialized/inlined, ideally recursively. Maybe only for calls originating from specially marked functions, so that the whole compilation doesn’t get bogged down, but I can still guarantee that my numerical routines get extra care and the setup isn’t so delicate.

Is this a good idea? I know this might be naive. AFAIK compilation usually flows the other way round.


Do you think there’s a way for a ghc plugin to propagate information from the callee to called function across module boundaries?

Can you be more precise? Give examples of what you want to achieve. Thanks!

I think it’s something like that. Let’s consider a function sort :: Vector v a => v a -> v a. The generic version is horribly slow, so we always want to use a fully specialized version, and it’s large, so we don’t want to inline it.

So how could we proceed? One way is to add SPECIALIZE pragmas to sort. But then we’d need to add such pragmas to all generic callers of sort, and 1) it’s a lot of work, 2) there’s no good way to find which specializations are missing, and 3) a user who calls sort (or any function which uses sort) with some freshly defined type needs to add all the missing SPECIALIZE pragmas, and as I said there’s no easy way to find them.
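To make the manual approach concrete, here is a minimal sketch of what those pragmas look like. The list round-trip is only a placeholder for a real sorting algorithm, and the specialized types are just examples; note that every (vector type, element type) pair a caller might use needs its own pragma line:

```haskell
import qualified Data.List as L
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- A generic sort over any vector type. The implementation here is a
-- placeholder: convert to a list, sort, convert back.
sort :: (G.Vector v a, Ord a) => v a -> v a
sort = G.fromList . L.sort . G.toList

-- Each SPECIALIZE pragma makes GHC emit a dedicated monomorphic copy
-- of sort in this module.
{-# SPECIALIZE sort :: U.Vector Int    -> U.Vector Int    #-}
{-# SPECIALIZE sort :: U.Vector Double -> U.Vector Double #-}
```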

Right now the only semi-reliably working way of getting specialization is to use INLINE. I think it’s overused quite a bit; see vector-algorithms as an example.

Yes! Sorry if I was unclear. Especially because for numerical algorithms there’s a clear “hot” path where we really care about performance. And it really is all about performance.

But not only is this annoying to do: 1) you don’t realize when specializations are missing, 2) it’s unreliable, as the “willingness” of GHC to do the right thing can change between GHC versions, and 3) it’s a lot of work. You don’t just write the code; you have to babysit these pragmas a lot and figure out what went wrong.

And there are some simple rules to check for, like: Int and Float should never go through a generic, dictionary-passing call in these routines (that’s what I meant with “special types”). So it would be nice to check this automatically and nudge GHC in the right direction.

This looks like implicit template instantiation in C++. That is, your sort function is a kind of template that has to be instantiated (i.e. specialized) by the compiler.

We could have a pragma ALWAYS SPECIALIZE that would force GHC to specialize at call sites. Then we have the same issues as C++ compilers: duplicated code that has to be removed by the linker or shared somehow within a compiler session.

Now if the caller of an ALWAYS SPECIALIZE function is itself polymorphic and uses one of its received dictionaries to call the template function, then the compiler could mark the caller function as ALWAYS SPECIALIZE too, transitively.

There are some cases where the compiler won’t be able to specialize, e.g. if we have a value of type (Vector v a => v a) and call sort on it. In this case the compiler could warn.


Yeah, code bloat might be a problem… The thing is, I was wondering whether I can reverse this flow of information. The current approach goes from abstract definition to specific function call, and it depends on the fragile assumption that I’ve prepared everything beforehand so that GHC can later specialize/inline. When I’m writing a numerical algorithm I really, really care about only a few lines of code. So I want to trace every call originating from those few lines and make them inlinable/specializable; then you could warn if something breaks (cannot be inlined/specialized), so that I can depend on it.

How many different types of vector are you realistically going to use? I’d recommend just committing to storable or unboxed vectors.

There’s also some related discussion on this ghc issue:

In particular:

On the mutable side, the difficulty is PrimMonad. It’s a useful tool, but it’s also a typeclass, so if you ever try to build stuff on top of it, you have to be careful with INLINE and INLINABLE or you end up with dictionary-function calls at every bind, binds that were supposed to compile to no-ops.
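As an illustration of the kind of function being described (a hypothetical helper, not code from the issue): the loop below is a chain of binds in an abstract PrimMonad, and without the INLINABLE pragma a polymorphic caller pays a dictionary call at each of them.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import qualified Data.Vector.Generic.Mutable as M

-- Fill a mutable vector with one value. INLINABLE keeps the unfolding
-- in the interface file so GHC can specialize this at concrete monads
-- (e.g. IO or ST s) and compile the binds down to no-ops.
fill :: (PrimMonad m, M.MVector v a) => v (PrimState m) a -> a -> m ()
fill v x = mapM_ (\i -> M.write v i x) [0 .. M.length v - 1]
{-# INLINABLE fill #-}
```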

Yes, the difference is that I want to reverse the thought process: not start with the abstract definition of the functions/data and prepare it correctly in case I need it later. Since basically everything could also be needed later, the tradeoff is unclear, and it’s sometimes brittle (e.g. I forget the pragma for some function in the chain). Instead I want to start from a pre-defined, small hot code path and then propagate the information backwards, such that everything I need can be inlined/specialized, and I get a warning if not.

I guess the advantage of the numerical code is that I usually know which paths are going to be important and it’s relatively self-contained in a few functions.

The problem is that unboxed vectors require a specialization for each element type. For some Unbox a we don’t know anything useful for specialization: neither the type of the vector (it’s a data family) nor the operations for working on it (they come in as a type class dictionary parameter). And even for storable vectors I think we’d like per-element specializations in order to inline peek/poke. That’s a lot of specializations.


I think this kind of reasoning detracts from the bigger problem. The ‘sort’ example with a Vector is but one obvious use case. What about if I had some general ‘Tree tree => tree a’ data structure, or a ‘Point point => point r’ data structure, or …. Yes, one could pick a particular Vector, Tree, or Point implementation, but to be honest: I don’t want to, because most of the code does not depend on the particular type of Vector, Tree, or Point.

It would be super useful to insist to GHC to specialize/monomorphize some piece of code (or even some data structure), and the current tools for this are unfortunately sorely lacking.

(Edit: more extensive Backpack support could be an alternative, but its usability is currently just too bad.)

We’re talking about numeric and machine learning algorithms, in which case you should care about the performance and thus also the particular types of vectors, trees, and points. Even then, you can still decouple your code by using it only through a minimal interface defined in a separate module.

I asked “Can you be more precise? Give examples of what you want to achieve.” and you replied

But that still falls short of a precise statement of what you want to achieve.

Let me try guessing, based on @Shimuuar’s input.

  • You write a library that exports a type-class-overloaded function
    foo :: C a => [a] -> a -> [a]
    
  • In a client library, the function foo is called at some data type T; presumably there is an instance C T floating around.
  • You would like GHC to make a specialised copy of foo, in the client module, specialised for the data type T.

Is that it?

If so, you can say {-# INLINEABLE foo #-} and you should get exactly that. I regret the keyword – SPECIALISABLE would be better – but that’s what it is right now.
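A sketch of that setup, with Ord standing in for the class C from the example above and a placeholder body for foo:

```haskell
-- Library module: INLINEABLE records foo's full definition in the
-- interface file, so when a client module calls foo at a concrete
-- type T, GHC can generate a specialised copy in the client module
-- automatically, without the library listing every type in advance.
foo :: Ord a => [a] -> a -> [a]
foo xs x = filter (< x) xs
{-# INLINEABLE foo #-}
```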

Does that work for you?


I think the real question is how to push all that work onto the compiler, especially when the functions in question are called several layers deep. Let’s say we have functions sort and median, where median calls sort internally:

sort :: (Vector v a, Ord a) => v a -> v a

median :: (Vector v a, Fractional a, Ord a) => v a -> a

At some point a user wants to call median @UVector @Double. To make sure that call uses the specialized version, they need to add specializations for both median and sort, or make sure those are already in scope. And if (or rather when) the call graph becomes deep and wide, this becomes too difficult to manage manually.
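A sketch of that two-level situation, with placeholder implementations and Data.Vector.Unboxed playing the role of UVector: specializing median alone is not enough, because its internal call to sort would still go through the generic, dictionary-passing version.

```haskell
import qualified Data.List as L
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Placeholder implementation; a real one would use e.g. vector-algorithms.
sort :: (G.Vector v a, Ord a) => v a -> v a
sort = G.fromList . L.sort . G.toList
-- sort needs its own pragma (or an INLINABLE unfolding) for median's
-- specialized copy to call a specialized sort.
{-# SPECIALIZE sort :: U.Vector Double -> U.Vector Double #-}

median :: (G.Vector v a, Fractional a, Ord a) => v a -> a
median v
  | odd n     = s G.! (n `div` 2)
  | otherwise = (s G.! (n `div` 2 - 1) + s G.! (n `div` 2)) / 2
  where
    s = sort v
    n = G.length v
{-# SPECIALIZE median :: U.Vector Double -> Double #-}
```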

The easy way out is to slap INLINE on everything and accept longer compile times and object code bloat. That’s how we ended up with INLINE slapped on sorting functions. They’re large. They have no business being inlined.


You give the impression that in such code abstraction is not needed/possible. I disagree [*]. However, what people are asking for is that the compiler can guarantee that monomorphization/specialization occurs. Other languages have tools for this (for example, template programming in C++). In Haskell/GHC these tools are lacking.

[*] For example CGAL heavily relies on template programming to abstract the types used to implement numbers, points, and things such as side-tests in computational geometry algorithms.


AFAIK -fexpose-all-unfoldings -fspecialise-aggressively achieves this.
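For reference, these flags can be enabled per-component, e.g. in a .cabal file (a sketch; the stanza name depends on your project):

```cabal
library
  ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively
```

Beware that this trades compile time and code size for the guarantee, much like the INLINE-everywhere approach discussed above.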