Reactimate - A new AFRP library

Hello everyone!

I have been working on a new AFRP (Arrowized Functional Reactive Programming) library for about a month which I wanted to share. It is called reactimate and is heavily inspired by the great Yampa library.

Quick Intro to AFRP

For everyone not familiar with AFRP. The basic idea is to augment normal functions a -> b with some internal state. Let’s call this augmented function Signal a b which transforms a into b like a -> b would. However, not only can Signal a b transform a into b, it can also affect its next execution. If you run Signal a b multiple times (e.g. in a game loop or some simulation), you have access to its output from previous iterations. For example, you can implement feedback, which feeds back the previous output as input.

feedback :: b -> Signal (a, b) b -> Signal a b

Why a new library

While working on something else, I have discovered a neat way to represent Signals and evaluate them efficiently. I assumed that Yampa has some overhead and wanted to get better performance.

Key differences to Yampa

While Yampa is side-effect free, reactimate embraces IO for better performance and more expressive power. Here are some key differences:

  • Allows IO during execution
  • Probably has better performance (I have only done microbenchmarks)
  • Allows resampling of signals at different frequencies and in different threads
  • Experimentally integrates Push-Pull-based FRP (e.g. Events, Dynamics and Behaviors) with AFRP

Utility libraries (reactimate-game, reactimate-physics, reactimate-ldtk)

I have also created some utility libraries which integrate with reactimate. You can do some very basic 2D rendering, get input from keyboard and mouse, simulate 2D physics and load LDtk files (a 2D level designer).

However, many features are still missing or are only implemented in a rudimentary fashion.

Try it out

You can take a look at the reactimate repository and clone the project. I recommend to check out the snake-example in reactimate-game.

There is no Hackage release planned right now since there will still be breaking changes regularly.

12 Likes

This seems similar to bearriver, but then specialized to the IO monad. It is built on dunai’s MSF type which looks like this:

-- | Stepwise, side-effectful 'MSF's without implicit knowledge of time.
--
-- 'MSF's should be applied to streams or executed indefinitely or until they
-- terminate. See 'reactimate' and 'reactimateB' for details. In general,
-- calling the value constructor 'MSF' or the function 'unMSF' is discouraged.
data MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }

(source)

And bearriver adds a notion of time:

-- | Extensible signal function (signal function with a notion of time, but
-- which can be extended with actions).
--
-- Signal function that transforms a signal carrying values of some type 'a'
-- into a signal carrying values of some type 'b'. You can think of it as
-- (Signal a -> Signal b). A signal is, conceptually, a function from 'Time' to
-- value.
type SF m = MSF (ClockInfo m)

-- | Information on the progress of time.
type ClockInfo m = ReaderT DTime m

(source)

Edit: Oh wait, I see your Signal type is just a IO (a -> IO b) and not recursive at all. That’s interesting.

dunai and bearriver also served as inspiration. As you noted, the big difference is the internal representation of Signal which does not use any recursion.

I figured that the recursive definition has quite some overhead and is much harder for GHC to optimize. In particular, inlining the recursive part seems impossible. IO (a -> IO b) is much easier there and I believe there is a good chance that Signal combinators will be properly inlined.

1 Like

Is there any significance to the reuse of the name reactimate?

Not really. I wanted a somewhat creative name related to AFRP which wasn’t already taken on Hackage.

We ran some benchmarks with dunai and bearriver a few versions ago and new versions of GHC were surprisingly good at inlining and optimizing the code (bearriver was faster than Yampa for real applications but slower for smaller benchmarks).

It would be nice if someone could expand the current benchmarks of dunai and bearriver with much larger, representative examples and an easier way to automate the evaluation. That way, it would be easier to justify performance improvements based on data and experiments. I long resisted adding performance optimizations to dunai until we had the right machinery to do it right but, as of the end of last year, the way to create benchmarks is there.

Also worth noting is that Yampa is about 30% faster if you make time a Float instead of a Double.

1 Like

If you don’t use the recursion, I’d be surprised if GHC cannot optimize it away if you use the right combinators and the IO monad in Dunai, perhaps with a small optimization or pragma to help GHC.

1 Like

I haven’t properly looked at this yet, but I’ll just note the name is (possibly confusingly) similar to reanimate, also a Haskell graphics library.

2 Likes

Microbenchmarks suggest that reactimate is much faster:

Countdown benchmark/Yampa                mean 27.80 ms  ( +- 192.7 μs  )
Countdown benchmark/dunai                mean 70.35 ms  ( +- 261.6 μs  )
Countdown benchmark/reactimate           mean 388.0 μs  ( +- 409.2 ns  )

Integrate benchmark/Yampa                mean 98.37 ms  ( +- 229.2 μs  )
Integrate benchmark/reactimate           mean 9.719 ms  ( +- 45.72 μs  )

Chaining (>>>) benchmark/Yampa           mean 26.71 ms  ( +- 407.0 μs  )
Chaining (>>>) benchmark/dunai           mean 75.62 ms  ( +- 3.863 ms  )
Chaining (>>>) benchmark/reactimate      mean 3.703 ms  ( +- 61.67 μs  )

However, my benchmarks are not that extensive and might be misrepresentative. Maybe someone else can do some benchmarks to verify the results. Moreover, benchmarking real applications is certainly needed until a definitive conclusion about performance can be drawn. I think it’s possible to build a drop-in replacement for Yampa based on reactimate similar to how bearriver is built on dunai.

My gut feeling is that the recursive definition introduces some constant overhead even if you do not use it.

reactimate has no such overhead. However, there is a pathological case for reactimate if you switch very often, so you need to be a little bit careful there.

3 Likes

I really like the Signal type, I thought there had to be a way to derive it. Using Cayley f cat a b = f (cat a b) the inner a -> IO b kleisli arrow can be idiomatically lifted over the outer layers.

{-# Language DerivingVia #-}

newtype Signal a b = Signal (Finalizer -> IO (a -> IO b))
  deriving
    (Functor, Applicative, Alternative)
  via Cayley (Kleisli IO Finalizer) (Kleisli IO) a

  deriving
    ( Profunctor, Strong, Costrong, Choice
    , Category, Arrow, ArrowChoice, ArrowLoop, ArrowPlus, ArrowZero
    )
  via Cayley (Kleisli IO Finalizer) (Kleisli IO)

edit: This inspired me to make a PR so I can reuse Cayley for unary deriving as well: Applicative, Alternative instances for Cayley by Icelandjack · Pull Request #109 · ekmett/profunctors · GitHub

5 Likes

This is very interesting. Could you share the code to these?

It’s here: https://github.com/Simre1/reactimate/blob/main/reactimate/bench/Main.hs

You can run them with cabal run reactimate:bench after cloning.

I can speed up the dunai countdown by around 3x by adding some inline annotations on arr, arrM, and morphGS and rewriting morphGS to use a helper go function.

That was just a quick test, so there is probably more to gain.

Edit: Another ~2x speedup by adding an inline pragma to (.) from the Category instance of MSF. Now sped from the original 65ms to around 12ms in total.

Edit: I’m still seeing Core like this:

$s$fCategoryTYPEMSF_$c._$s$w$c. msfCountBench4 msfCountBench2

That’s an application of an optimized version of composition (.) from the Category instance. That is a recursive function so GHC can’t do that much with it. And those two msfCountBench{4,2} are constructing the two MSF’s that correspond to this composition in the benchmark:

    MSF.feedback 
      count
      (arr (\((), !x) -> (x - 1, x - 1)))
    >>> 
      arr (\x -> if x == 0 then Just x else Nothing)

So I think a large part of the remaining slowness is because these two functions cannot be fused.

And of course the other point that should be fused is the application of the reactimate function to the stream.

2 Likes

@jaror We’re aware that inlining and changing some definitions in dunai should give speedups (`dunai`: `morphGS` is probably inefficient · Issue #370 · ivanperez-keera/dunai · GitHub), but what’s really needed is a realistic benchmarking suite to make the right optimizations. That said, I agree that there is a fundamental optimization barrier for the dunai approach because of the recursion.

Another way to get around it is to use the initial encoding instead the final encoding: https://github.com/lexi-lambda/incremental/blob/master/src/Incremental/Fast.hs This way there is no recursion, and GHC can optimize the resulting functions well. I’ve experimented with it a bit and found dramatic speedups. It has the advantage that it is not tied to the IO monad. (But the reactimate framework may be even faster than that, I don’t know.)

@simon I can understand that using IO is good for performance in your framework, and I’m really intrigued. Well done! But I also find the use of IO problematic:

  • It ties you to GHC, e.g. you couldn’t compile a reactive program using Clash.
  • It hides IO under the hood. When I use a Signal a b, how do I know whether someone used arrIO launchMissiles in its definition?
  • It’s hard to reason about determinism. When I write test cases with my signal functions, will they always be executed in the same way? Right now it seems you’re not using any threads in the implementation, but maybe you will some day for parallelisation, and I won’t notice as a user?

These concerns (primarily the last two) are part of what motivated us to develop dunai, and make the monad visible in the type signature. Would that have worked as well for you? E.g.:

newtype Signal m a b = Signal (m (a -> m b))

My impression is that you mainly need m to be IO in two places:

  1. To have effectful signals, i.e. arrM :: (a -> m b) -> Signal m a b. There, the framework would actually profit from generality.
  2. To hide state. I haven’t thought about this in detail, but maybe it’s possible that other monads (ST? StateT? UnliftIO? MonadIO? MonadConc?) provide the API you need to create, read and write stateful variables.

In particular, you could allow more than only IO! For example, you could have your signal in RIO and share resources/handlers between your components. You could have a transformer stack on top of IO, or some effect system.

2 Likes

5 posts were split into a new topic: Performance of fixed points and fusion

You are absolutely right that using IO under the hood not only has upsides but also downsides. Like with most things, there is always a trade-off.

I agree that it would be best if we could forgo the use of IO, but only if it does not hinder usability.

Performance

I wanted to maximize performance, as such using IO was the obvious choice. With IO, I can easily eliminate the recursive definition and unlock inlining across all combinators. Using a concrete monad ensures that no typeclass dictionaries fail to be specialized for the monad operations. IO also allows an efficient and concise implementation of feedback and friends.

It’s not so easy to come up with a good way to hide state in an efficient manner without IO. I think you can use ST for feedback, switch and similar functions, but not StateT. The advantage of IORef/STRef over StateT is that you do not have to collect all the states from different feedbacks into one big superstate. This is also the problem with https://github.com/lexi-lambda/incremental/blob/master/src/Incremental/Fast.hs. When I tried something similar, I did not manage to get good results.

Different monad stack

I decided against allowing a different monad, since it incurs a complexity cost. However, I can also see the advantages here and am not dead set on disallowing them. I need more usage experience to really decide.

Pure signals

If you want a quasi-pure Signal, you can create an interface around Signal where you disallow impure combinators.

newtype PureSignal a b = Signal a b`

feedback :: PureSignal a b
feedback = ...
-- Do not implement `arrIO` or other side effects  for `PureSignal`

Then you can get the guarantees of a pure signal with good performance.

Other benefits of IO

Using IO is not only good for performance. You can signal functions in different threads (resampleInThread in Reactimate.Sampling). I have also managed to integrate parts from push-pull based FRP in Reactimate.Event. It’s possible to deal with events which happen in-between simulations.

Clash

I haven’t really thought about using AFRP with Clash, but that would be pretty interesting. Are there any examples where someone has used dunai or a similar library?

Conclusion

reactimate trades purity for performance. That’s because I mainly intended for it to be used in game loops or other simulations. However, there may be a middle ground where we can get the best of both worlds.

1 Like

What kinds of games are you trying to make?

With vsync off, and without additional optimizations, I’ve clocked Yampa at 800+ FPS. It of course depends on the complexity of the game and your CPU. On mobile, for haskanoid, the game loop used to take 1-2ms and the rendering took the majority of the cycle.

I’m quite surprised by those benchmarks, but I haven’t had the time to run them on my side.

It’s true that performance for most small games is not really a critical factor. I just enjoy writing fast code.

1 Like

Liu, Yallop, et al. have done some excellent work on causal commutative arrows to get arrow computations into an efficient normal form by construction. It’s possible you’ve already seen this research in building reactimate, but if not, their approach might add some additional insight. The specific linked paper shows a couple approaches regarding “superstates” for CCAs, one which appears to be very similar to the approach taken in incremental's Fast.hs (composition tuples up the two substates), and another which uses ST (composition sequences creation of an STRef for each substate). It has been a bit since I deep-dove the paper, but going off memory, their ST approach was appealing because you pay the cost once to initialize all the substate references, as opposed to the tupling approach which required tupling the new state up each step.

CCAs are much more limited in power than reactimate's Signal and they don’t really lend themselves to dynamic switching, but the idea behind them is one of the more fascinating things I’ve come across in the Haskell space. It’s an area that feels like it has a lot of untapped potential.

Isn’t the whole point of using arrows to enable dynamic switching (Edit: compared to first-order applicative-based FRP)? That’s what Evan Czaplicki says in Controlling Time and Space: understanding the many formulations of FRP.