Performance of fixed points and fusion

Continuing the discussion from Reactimate - A new AFRP library - #14 by turion

I believe the right terminology is “terminal” and “initial” (see this stackoverflow answer). The current MSF type is a direct fixed point (a la Fix):

data FixMSF m a b = FixMSF (a -> m (b, FixMSF m a b))

It could indeed be faster to write it as a greatest fixed point, i.e. the carrier of the terminal coalgebra (a la Nu):

data NuMSF m a b = forall c. NuMSF (c -> a -> m (b, c)) c

You could instead take the least fixed point, i.e. the carrier of the initial algebra (a la Mu):

data MuMSF m a b = MuMSF (forall c. ((a -> m (b, c)) -> c) -> c)

In all of these types you might be able to recognize a common base functor:

data MSFBase m a b f = MSFBase (a -> m (b, f))
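
Concretely (a sketch, assuming the Fix, Mu and Nu types from the data-fix package), the three types above arise, up to isomorphism, as fixed points of this base functor:

import Data.Fix (Fix (..), Mu (..), Nu (..))

-- Roughly: Fix f = f (Fix f)
--          Mu  f = forall c. ((f c -> c) -> c)   -- a value is its fold
--          Nu  f = exists c. (c -> f c, c)       -- a step function plus a seed
type FixMSF' m a b = Fix (MSFBase m a b)  -- isomorphic to FixMSF m a b
type MuMSF'  m a b = Mu  (MSFBase m a b)  -- isomorphic to MuMSF m a b
type NuMSF'  m a b = Nu  (MSFBase m a b)  -- isomorphic to NuMSF m a b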

Both the Nu and the Mu encodings should get you automatic fusion if all goes well and GHC is smart enough. If you’re doing a lot of zip-like functions then you probably want the Nu representation (which is also what the work on stream fusion uses), whereas the fusion that is now in base (foldr/build) uses the Mu representation.
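
To make that concrete, here is a sketch of the two shapes (build really has this type in GHC.Exts; Stream and Step are, up to naming, the types from the stream fusion paper):

{-# LANGUAGE RankNTypes, ExistentialQuantification #-}

-- Mu-style: a list represented by its fold; foldr/build fusion is the rewrite
--   foldr k z (build g)  ==>  g k z
churchList :: (forall b. (a -> b -> b) -> b -> b) -> [a]
churchList g = g (:) []           -- this is exactly GHC.Exts.build

-- Nu-style: a step function plus an existential state, as in stream fusion
data Step a s = Done | Yield a s | Skip s
data Stream a = forall s. Stream (s -> Step a s) s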


I was always puzzled about this; it seems like you can help me 🙂. I know what Fix/Mu/Nu do, but I don’t understand this enigmatic remark in the documentation of Mu, “Least fixed point. Efficient folding.”, and correspondingly in the documentation of Nu, “Greatest fixed point. Efficient unfolding.” What is meant by “efficient” here? Some deep theoretical property of the representation, or some interaction with the optimization heuristics of GHC? A pointer to some place where I can read more would also be helpful.


The Mu data type contains a function that is essentially the fold, so folding is just extracting that function and applying it:

-- Mu from Data.Fix, roughly: newtype Mu f = Mu { unMu :: forall a. (f a -> a) -> a }
foldMu :: (f a -> a) -> Mu f -> a
foldMu f (Mu mk) = mk f
-- and really it could be written like this:
foldMu' :: Mu f -> (f a -> a) -> a
foldMu' = unMu
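
For a concrete example with a made-up list base functor, folding then really is just applying the stored function:

data ListF e r = NilF | ConsF e r

-- Convert an ordinary list to its Church/Mu encoding by baking in foldr:
fromList :: [e] -> Mu (ListF e)
fromList xs = Mu (\alg -> foldr (\e r -> alg (ConsF e r)) (alg NilF) xs)

sumMu :: Num e => Mu (ListF e) -> e
sumMu = foldMu alg
  where
    alg NilF        = 0
    alg (ConsF e r) = e + r

-- sumMu (fromList [1, 2, 3]) == 6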

The Nu data type is similar but opposite: to unfold a Nu we simply store the arguments of the unfold:

-- Nu from Data.Fix, roughly: data Nu f = forall a. Nu (a -> f a) a
unfoldNu :: (a -> f a) -> a -> Nu f
unfoldNu = Nu

But I think this is mostly theoretical. You always need both construction and deconstruction to do useful things with a data type. Making one fast will probably make the other slower.
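
For comparison, a sketch of the other direction for each encoding (helper names are made up, assuming the Data.Fix types above): both need a Functor instance and genuine recursion, and unfolding into Mu only terminates for finite structures, which is exactly the least/greatest distinction:

foldNu :: Functor f => (f a -> a) -> Nu f -> a
foldNu alg (Nu coalg seed) = go seed
  where
    go s = alg (fmap go (coalg s))

-- Only terminates if the unfold does; Mu really is the *least* fixed point.
unfoldMu :: Functor f => (a -> f a) -> a -> Mu f
unfoldMu coalg seed = Mu (\alg -> foldNu alg (Nu coalg seed))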


I thought there was something deeper going on. I understand that Mu is just defined as containing a fold, but I don’t see in what situations that allows me to write more efficient code. That presupposes that I can find an mk argument to the Mu constructor which folds more efficiently than the ordinary cata function on the Fix type. To me this seems analogous to a comparison between Church and Peano numbers. Are Church numerals more efficient at iteration than Peano-encoded numbers just because Church numerals are defined via iteration?

If we specialize the example to lists and the list functor: Can I find an inhabitant of Mu ListF that is more efficient than an implementation which is internally very similar to using cata on Fix ListF, i.e. an ordinary fold? Or is there maybe some set of examples other than lists where Mu f allows for a more efficient representation than Fix f?
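
For concreteness, the kind of conversion I mean (a sketch using Fix and Mu from Data.Fix): any Fix f gives a Mu f by baking in the ordinary cata, so by itself the representation doesn’t fold any faster:

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

toMu :: Functor f => Fix f -> Mu f
toMu x = Mu (\alg -> cata alg x)

-- foldMu alg (toMu x) is then literally cata alg x.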

I always thought that the remark about efficiency has to do with some possibilities for fusing producers and consumers which are not possible in the Fix f representation, like you write in your post above.

(If this gets too off-topic then we can maybe split this into another thread)

The gist is: type MuList = Mu ListF can fuse concatMap but not both arguments of zip, because we may only eliminate one of the arguments.
type NuList = Nu ListF can fuse both arguments of zip but not the general form of concatMap (see 7.2 in https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf). In most cases this can be worked around with Jaro’s work on higher-order rule matching as described in 9.1 of the paper.

By “fuse”, I mean that we won’t have any more ListF constructors than when writing the pipeline manually.
See for example the definition of next_ret immediately before the “Static argument transformation” subsubsection in 7.2. There we see how stream fusion currently fails to eliminate the Yield and Done constructors.
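
As a sketch of the zip side (made-up code, using a plain ListF without stream fusion’s Skip constructor): in the Nu encoding we only combine the two step functions and the two seeds, so no intermediate ListF constructors have to be built:

data ListF e r = NilF | ConsF e r

zipNu :: Nu (ListF a) -> Nu (ListF b) -> Nu (ListF (a, b))
zipNu (Nu stepL sL) (Nu stepR sR) = Nu step (sL, sR)
  where
    step (l, r) = case (stepL l, stepR r) of
      (ConsF a l', ConsF b r') -> ConsF (a, b) (l', r')
      _                        -> NilF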

I think it is mostly a theoretical remark stating which function is easier to implement. But let’s just ask the authors: What does efficient folding/unfolding mean? · Issue #26 · spell-music/data-fix · GitHub

The trouble with concatMap fusion is that it is not possible in general, but a specific use of it which is still quite common can be fused. The specific case that can be fused is essentially when the structure of the output list does not depend on the elements of the input list. For reasons I don’t quite know off the top of my head it is easier to recognize that specific case for foldr/build fusion than it is for stream fusion.
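
To illustrate what I mean by that (made-up examples): in the first pipeline the shape of each inner list is fixed and only its elements depend on the input, while in the second the shape itself is data-dependent:

-- Inner list always has length 3; only the elements depend on x.
fixedShape :: [Int] -> [Int]
fixedShape = concatMap (\x -> [x, x + 1, x + 2])

-- The length of the inner list depends on the element itself: the general,
-- hard-to-fuse case.
dataDependent :: [Int] -> [Int]
dataDependent = concatMap (\x -> [1 .. x])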


I implemented NuMSF as a library now: automaton: Effectful streams and automata in initial encoding

Terminology question: You would not call this the “initial encoding”, and dunai/machines MSF the final encoding?

It turns out to be much faster than the direct fixed point in the typical streaming applications I’ve benchmarked. But the speedup comes mainly from the trick of also allowing a stateless constructor:

data NuMSF m a b where
  Stateful  :: (c -> a -> m (b, c)) -> c -> NuMSF m a b
  Stateless :: (a -> m b) -> NuMSF m a b

Otherwise, functions like arr etc. clutter up the existential type and prevent optimizations.
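
For example, here is a sketch of composition (function names are made up, not necessarily the library’s API): as long as both sides are stateless the composite stays stateless, so lifting a pure function never introduces a dummy state behind the existential:

-- Lift a pure function without any state.
arrNuMSF :: Applicative m => (a -> b) -> NuMSF m a b
arrNuMSF f = Stateless (pure . f)

-- Composes like (.): run the second argument first, then the first.
composeNuMSF :: Monad m => NuMSF m b c -> NuMSF m a b -> NuMSF m a c
composeNuMSF (Stateless g)   (Stateless f)   = Stateless (\a -> f a >>= g)
composeNuMSF (Stateful g t0) (Stateless f)   = Stateful (\t a -> f a >>= g t) t0
composeNuMSF (Stateless g)   (Stateful f s0) = Stateful step s0
  where
    step s a = do
      (b, s') <- f s a
      c <- g b
      pure (c, s')
composeNuMSF (Stateful g t0) (Stateful f s0) = Stateful step (s0, t0)
  where
    step (s, t) a = do
      (b, s') <- f s a
      (c, t') <- g t b
      pure (c, (s', t'))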