Reactimate - A new AFRP library

If you don’t use the recursion, I’d be surprised if GHC couldn’t optimize it away given the right combinators and the IO monad in Dunai, perhaps with a small optimization or pragma to help GHC along.

1 Like

I haven’t properly looked at this yet, but I’ll just note the name is (possibly confusingly) similar to reanimate, also a Haskell graphics library.

2 Likes

Microbenchmarks suggest that reactimate is much faster:

Countdown benchmark/Yampa                mean 27.80 ms  ( +- 192.7 μs  )
Countdown benchmark/dunai                mean 70.35 ms  ( +- 261.6 μs  )
Countdown benchmark/reactimate           mean 388.0 μs  ( +- 409.2 ns  )

Integrate benchmark/Yampa                mean 98.37 ms  ( +- 229.2 μs  )
Integrate benchmark/reactimate           mean 9.719 ms  ( +- 45.72 μs  )

Chaining (>>>) benchmark/Yampa           mean 26.71 ms  ( +- 407.0 μs  )
Chaining (>>>) benchmark/dunai           mean 75.62 ms  ( +- 3.863 ms  )
Chaining (>>>) benchmark/reactimate      mean 3.703 ms  ( +- 61.67 μs  )

However, my benchmarks are not that extensive and may not be representative. Maybe someone else can run some benchmarks to verify the results. Moreover, benchmarking real applications is certainly needed before a definitive conclusion about performance can be drawn. I think it’s possible to build a drop-in replacement for Yampa based on reactimate, similar to how bearriver is built on dunai.

My gut feeling is that the recursive definition introduces some constant overhead even if you do not use it.

reactimate has no such overhead. However, there is a pathological case for reactimate if you switch very often, so you need to be a little bit careful there.

3 Likes

I really like the Signal type; I thought there had to be a way to derive it. Using Cayley f cat a b = f (cat a b), the inner a -> IO b Kleisli arrow can be idiomatically lifted over the outer layers.

{-# LANGUAGE DerivingVia #-}

import Control.Applicative (Alternative)
import Control.Arrow
import Control.Category (Category)
import Data.Profunctor (Choice, Costrong, Profunctor, Strong)
import Data.Profunctor.Cayley (Cayley (..))

-- Finalizer comes from reactimate; the unary Functor/Applicative/Alternative
-- instances for Cayley come from the PR linked below.
newtype Signal a b = Signal (Finalizer -> IO (a -> IO b))
  deriving
    (Functor, Applicative, Alternative)
  via Cayley (Kleisli IO Finalizer) (Kleisli IO) a
  deriving
    ( Profunctor, Strong, Costrong, Choice
    , Category, Arrow, ArrowChoice, ArrowLoop, ArrowPlus, ArrowZero
    )
  via Cayley (Kleisli IO Finalizer) (Kleisli IO)

edit: This inspired me to make a PR so I can reuse Cayley for unary deriving as well: Applicative, Alternative instances for Cayley by Icelandjack · Pull Request #109 · ekmett/profunctors · GitHub

5 Likes

This is very interesting. Could you share the code for these benchmarks?

It’s here: https://github.com/Simre1/reactimate/blob/main/reactimate/bench/Main.hs

You can run them with cabal run reactimate:bench after cloning.

I can speed up the dunai countdown by around 3x by adding some INLINE annotations on `arr`, `arrM`, and `morphGS`, and by rewriting `morphGS` to use a helper `go` function.

That was just a quick test, so there is probably more to gain.
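
For reference, the rewrite has this shape. The signature is dunai’s `morphGS`; the body is my rough version, so details may differ from what I actually benchmarked:

{-# LANGUAGE RankNTypes #-}

-- dunai's stream function representation:
newtype MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }

-- Tying the recursion through a local `go` leaves the outer binding
-- non-recursive, so GHC can inline it at call sites:
morphGS :: Monad m2
        => (forall c. (a1 -> m1 (b1, c)) -> (a2 -> m2 (b2, c)))
        -> MSF m1 a1 b1
        -> MSF m2 a2 b2
morphGS morph = go
  where
    go msf = MSF $ \a2 -> do
      (b2, msf') <- morph (unMSF msf) a2
      pure (b2, go msf')
{-# INLINE morphGS #-}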

Edit: Another ~2x speedup by adding an INLINE pragma to (.) from the Category instance of MSF. That takes it from the original 65 ms down to around 12 ms in total.

Edit: I’m still seeing Core like this:

$s$fCategoryTYPEMSF_$c._$s$w$c. msfCountBench4 msfCountBench2

That’s an application of an optimized version of the composition (.) from the Category instance. That is a recursive function, so GHC can’t do that much with it. And those two msfCountBench{4,2} construct the two MSFs that correspond to this composition in the benchmark:

    MSF.feedback 
      count
      (arr (\((), !x) -> (x - 1, x - 1)))
    >>> 
      arr (\x -> if x == 0 then Just x else Nothing)

So I think a large part of the remaining slowness is because these two functions cannot be fused.

And of course the other place where fusion is needed is the application of the reactimate function to the stream.

2 Likes

@jaror We’re aware that inlining and changing some definitions in dunai should give speedups (`dunai`: `morphGS` is probably inefficient · Issue #370 · ivanperez-keera/dunai · GitHub), but what’s really needed is a realistic benchmarking suite to make the right optimizations. That said, I agree that there is a fundamental optimization barrier for the dunai approach because of the recursion.

Another way to get around it is to use the initial encoding instead of the final encoding: https://github.com/lexi-lambda/incremental/blob/master/src/Incremental/Fast.hs This way there is no recursion, and GHC can optimize the resulting functions well. I’ve experimented with it a bit and found dramatic speedups. It has the advantage that it is not tied to the IO monad. (But the reactimate framework may be even faster than that, I don’t know.)
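
For illustration, the rough idea looks like this (a simplified sketch, not the actual Fast.hs code): state and transition function are plain, non-recursive data, and composition tuples up the substates:

{-# LANGUAGE ExistentialQuantification #-}

-- The initial encoding: no recursive closure, so composed transition
-- functions can inline and fuse.
data SF m a b = forall s. SF !s (s -> a -> m (b, s))

compose :: Monad m => SF m a b -> SF m b c -> SF m a c
compose (SF s1 f) (SF s2 g) = SF (s1, s2) $ \(t1, t2) a -> do
  (b, t1') <- f t1 a
  (c, t2') <- g t2 b
  pure (c, (t1', t2'))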

@simon I can understand that using IO is good for performance in your framework, and I’m really intrigued. Well done! But I also find the use of IO problematic:

  • It ties you to GHC, e.g. you couldn’t compile a reactive program using Clash.
  • It hides IO under the hood. When I use a Signal a b, how do I know whether someone used arrIO launchMissiles in its definition?
  • It’s hard to reason about determinism. When I write test cases with my signal functions, will they always be executed in the same way? Right now it seems you’re not using any threads in the implementation, but maybe you will some day for parallelisation, and I won’t notice as a user?

These concerns (primarily the last two) are part of what motivated us to develop dunai, and make the monad visible in the type signature. Would that have worked as well for you? E.g.:

newtype Signal m a b = Signal (m (a -> m b))

My impression is that you mainly need m to be IO in two places:

  1. To have effectful signals, i.e. arrM :: (a -> m b) -> Signal m a b. There, the framework would actually profit from generality.
  2. To hide state. I haven’t thought about this in detail, but maybe it’s possible that other monads (ST? StateT? UnliftIO? MonadIO? MonadConc?) provide the API you need to create, read and write stateful variables.

In particular, you could allow more than only IO! For example, you could have your signal in RIO and share resources/handlers between your components. You could have a transformer stack on top of IO, or some effect system.
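
For instance, with that representation, point 1 needs nothing beyond Applicative (a sketch):

newtype Signal m a b = Signal (m (a -> m b))

-- Lifting an effectful function into a signal:
arrM :: Applicative m => (a -> m b) -> Signal m a b
arrM f = Signal (pure f)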

2 Likes

5 posts were split into a new topic: Performance of fixed points and fusion

You are absolutely right that using IO under the hood has not only upsides but also downsides. Like with most things, there is always a trade-off.

I agree that it would be best if we could forgo the use of IO, but only if it does not hinder usability.

Performance

I wanted to maximize performance, so using IO was the obvious choice. With IO, I can easily eliminate the recursive definition and unlock inlining across all combinators. Using a concrete monad ensures that no typeclass dictionaries are left unspecialized for the monad operations. IO also allows an efficient and concise implementation of feedback and friends.

It’s not so easy to come up with a good way to hide state in an efficient manner without IO. I think you can use ST for feedback, switch and similar functions, but not StateT. The advantage of IORef/STRef over StateT is that you do not have to collect all the states from different feedbacks into one big superstate. This is also the problem with https://github.com/lexi-lambda/incremental/blob/master/src/Incremental/Fast.hs. When I tried something similar, I did not manage to get good results.
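
Schematically, feedback with IORefs looks like this (simplified; the real implementation differs in details):

import Data.IORef

-- Signal as defined earlier in the thread; Finalizer comes from reactimate.
newtype Signal a b = Signal (Finalizer -> IO (a -> IO b))

-- Each feedback allocates its own IORef at setup time, so no superstate
-- has to be threaded through the whole network.
feedback :: c -> Signal (a, c) (b, c) -> Signal a b
feedback c0 (Signal make) = Signal $ \fin -> do
  step <- make fin
  ref  <- newIORef c0
  pure $ \a -> do
    c       <- readIORef ref
    (b, c') <- step (a, c)
    writeIORef ref c'
    pure b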

Different monad stack

I decided against allowing a different monad, since it incurs a complexity cost. However, I can also see the advantages and am not dead set on disallowing other monads. I need more usage experience to really decide.

Pure signals

If you want a quasi-pure Signal, you can create an interface around Signal where you disallow impure combinators.

newtype PureSignal a b = PureSignal (Signal a b)

-- Wrap only the pure combinators, for example:
feedback :: c -> PureSignal (a, c) (b, c) -> PureSignal a b
feedback = ...
-- Do not expose `arrIO` or other side-effecting combinators for `PureSignal`.

Then you can get the guarantees of a pure signal with good performance.

Other benefits of IO

Using IO is not only good for performance. You can run signal functions in different threads (`resampleInThread` in `Reactimate.Sampling`). I have also managed to integrate parts of push-pull-based FRP in `Reactimate.Event`. It’s possible to deal with events which happen in between simulations.

Clash

I haven’t really thought about using AFRP with Clash, but that would be pretty interesting. Are there any examples where someone has used dunai or a similar library?

Conclusion

reactimate trades purity for performance. That’s because I mainly intended for it to be used in game loops or other simulations. However, there may be a middle ground where we can get the best of both worlds.

1 Like

What kinds of games are you trying to make?

With vsync off, and without additional optimizations, I’ve clocked Yampa at 800+ FPS. It of course depends on the complexity of the game and your CPU. On mobile, for haskanoid, the game loop used to take 1-2ms and the rendering took the majority of the cycle.

I’m quite surprised by those benchmarks, but I haven’t had the time to run them on my side.

It’s true that performance for most small games is not really a critical factor. I just enjoy writing fast code.

1 Like

Liu, Yallop, et al. have done some excellent work on causal commutative arrows (CCAs) to get arrow computations into an efficient normal form by construction. It’s possible you’ve already seen this research in building reactimate, but if not, their approach might add some additional insight. The linked paper shows a couple of approaches to the “superstate” for CCAs: one that appears very similar to the approach taken in incremental's Fast.hs (composition tuples up the two substates), and another that uses ST (composition sequences the creation of an STRef for each substate). It has been a while since I did a deep dive into the paper, but going off memory, their ST approach was appealing because you pay the cost once to initialize all the substate references, as opposed to the tupling approach, which requires tupling up the new state on each step.
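
Going off memory again, the normal form itself is strikingly small, something like this (a sketch of the paper’s CCNF with the state hidden existentially):

{-# LANGUAGE GADTs #-}

-- Every causal commutative arrow computation normalizes to either a
-- pure function or a single loop over one composite state.
data CCNF a b where
  ArrNF :: (a -> b) -> CCNF a b
  LoopD :: s -> ((a, s) -> (b, s)) -> CCNF a b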

CCAs are much more limited in power than reactimate's Signal and they don’t really lend themselves to dynamic switching, but the idea behind them is one of the more fascinating things I’ve come across in the Haskell space. It’s an area that feels like it has a lot of untapped potential.

Isn’t the whole point of using arrows to enable dynamic switching (Edit: compared to first-order applicative-based FRP)? That’s what Evan Czaplicki says in Controlling Time and Space: understanding the many formulations of FRP.

As I see it, the point of using arrows isn’t to enable dynamic switching. The static nature of the Arrow interface enables optimizing the computation ahead of running it (assuming the concrete type implementing the interface is concerned with this), and arrow-style feedback loops offer very concise management of local state.

The dynamic switch, at least as it is formulated in Yampa, gives a Monad-like interface on top of signal functions:

switch :: SF a (b, Event c) -> (c -> SF a b) -> SF a b

Much like Monad, the dynamic nature of this gives maximum flexibility because the network structure can be reconfigured on the fly. This comes at a cost of not being able to optimize the whole network ahead of time though, which is something a more constrained framework like CCA is able to achieve.
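
For example, with hypothetical game-flavored signal functions:

import FRP.Yampa (SF, Event, switch)

data Input
data Picture
type Score = Int

-- Stubs; only the types matter here.
game :: SF Input (Picture, Event Score)
game = undefined

menu :: Score -> SF Input Picture
menu = undefined

-- Run `game` until it emits an event carrying the final score, then
-- reconfigure the network into a menu built from that score.
flow :: SF Input Picture
flow = switch game menu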

While FRP without dynamic switching is not as powerful as FRP with dynamic switching by definition, there are many useful programs that can be expressed statically, especially when local state is so readily available.

1 Like

Compared to monads, arrows indeed allow more static optimization. However, if you don’t have dynamic switching then you should really compare to applicatives. Which programs can you write using CCAs that you can’t write with simple applicatives?

Programs that can be written using CCAs but not with simple applicatives are those that leverage the extra power the Arrow interface provides over Applicative. When the input of one part of a computation directly depends on the output of a previous part, that computation can’t be expressed with an Applicative alone. It can be expressed in Arrow or Monad contexts, and when both parts of the computation are statically known, Arrow offers just enough power to express it. The benchmarked flute and shepard programs in the linked CCAs Revisited paper are examples that can’t be expressed in an Applicative context, as far as I’m aware.
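
A minimal sketch of that shape, with hypothetical stub signal functions in Yampa style:

import Control.Arrow ((>>>))
import FRP.Yampa (SF)

-- Stubs; only the types matter here.
oscillator :: SF () Double
oscillator = undefined

lowPass :: SF Double Double
lowPass = undefined

-- lowPass consumes the output of oscillator. Applicative over `SF a`
-- only combines signal functions pointwise; this sequencing needs
-- Category's (>>>).
synth :: SF () Double
synth = oscillator >>> lowPass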

1 Like

I’m in fact not aware of any; it’s just on the long list of things I want to do eventually :sweat_smile: In the dynamic switching debate, Clash should fall into the same expressivity category as arrows (although they insist on writing as much as possible with Applicative).

Sounds like a blend between React and reanimate.

Here’s how I would write that shepard example from the paper in a normal applicative FRP style:

shepard :: Time -> Behavior Double
shepard seconds = if seconds <= 0.0
  then pure 0.0
  else
    let f = envLineSeg [800, 100, 100] [4.0, seconds]
        e = envLineSeg [0, 1, 0, 0] [2.0, 2.0, seconds]
        s = osc sineTable 0 f
        r = delayLine 0.5 (shepard (seconds - 0.5))
    in (e * s * 0.1) + r

Of course I guess the real question is how I’d implement these abstractions. That will take some more time. And the paper doesn’t seem to say what envLineSeg is anyway.

Edit:

Actually, it is not so hard to implement it with a free applicative:
{-# LANGUAGE GADTs #-}

import Control.Applicative

type Time = Double

type Vector = []

type Table = Vector Double

data Behavior a where
  EnvLineSeg :: [a] -> [Time] -> Behavior a
  Osc :: Table -> Double -> Behavior Double -> Behavior Double
  DelayLine :: Num a => Time -> Behavior a -> Behavior a -- Num, so sampling before the delay can return 0
  Ap :: Behavior (b -> a) -> Behavior b -> Behavior a
  Pure :: a -> Behavior a

instance Functor Behavior where
  fmap = liftA

instance Applicative Behavior where
  pure = Pure
  (<*>) = Ap

instance Num a => Num (Behavior a) where
  (*) = liftA2 (*)
  (+) = liftA2 (+)
  fromInteger = pure . fromInteger
  abs = fmap abs
  signum = fmap signum
  (-) = liftA2 (-)

instance Fractional a => Fractional (Behavior a) where
  fromRational = pure . fromRational
  (/) = liftA2 (/)

sineTable :: Table
sineTable = undefined

shepard :: Time -> Behavior Double
shepard seconds = if seconds <= 0.0
  then Pure 0.0
  else
    let f = EnvLineSeg [800, 100, 100] [4.0, seconds]
        e = EnvLineSeg [0, 1, 0, 0] [2.0, 2.0, seconds]
        s = Osc sineTable 0 f
        r = DelayLine 0.5 (shepard (seconds - 0.5))
    in (e * s * 0.1) + r

indexOfCumulative :: Double -> [Double] -> Int
indexOfCumulative = undefined

sample :: Time -> Behavior a -> a
sample t (EnvLineSeg xs ts) = xs !! indexOfCumulative t ts
sample t (Osc table what ever) = undefined
sample t (DelayLine d x) = if t < d then 0 else sample (t - d) x
sample _ (Pure x) = x
sample t (Ap f x) = sample t f $ sample t x
1 Like