Parallel property-based testing with a deterministic thread scheduler

stevan · August 8, 2024, 6:10am

https://stevana.github.io/parallel_property-based_testing_with_a_deterministic_thread_scheduler.html

bolt12 · August 8, 2024, 9:29am

It seems you are not familiar with: GitHub - input-output-hk/io-sim: Haskell's IO simulator which closely follows core packages (base, async, stm). (io-sim: A pure simulator for monadic concurrency with STM.). It currently doesn’t support first class thread schedulers but it is deterministic and one can go a long way with random attenuation (delays, errors) in the simulation if done right. We have a lot of success using it at IOG networking team

stevan · August 8, 2024, 9:41am

I’m aware of io-sim, but I’m not aware of any document which describes how it works. It seems to me that it reimplements parts of the run-time system, which seems like a lot of work and error prone.

The approach in this post, due to matklad, seems much simpler (can be explained in full in a relatively a short post). I know that io-sim is more comprehensive in some regards, but not all regards, e.g. it doesn’t support IORefs.

I’ve also never seen a simple example of how to combine io-sim with stateful and/or parallel property-based testing. The traces that io-sim produces and the dynamic logic of quickcheck-dynamic somehow checks is again not explained anywhere (that I’ve seen).

I’d be curious to hear from somebody that understands how io-sim works what the eventual pitfalls are with the approach taken in this post, i.e. what things will be difficult to achieve with this simpler approach.

bolt12 · August 8, 2024, 9:56am

I see, yes you are right, we should definitely write something about how it works in more detail. But in a nutshell it is just a free monad like data structure that has a pure interpretation that follows the IO/GHC run-time system semantics. We have tests that verify the pure implementation against the real behavior so, yes it was quite a piece of work but it is quite robust and correct.

The way it schedules threads and deals with exceptions and what not is not much different very similar to what you describe in your post, but it has the advantage that it also has a notion of time and is pure. So in the end you can write your business logic in io-sim, interpret it in IO for your executable and test it with quickcheck pure-ly, everything with the same code.

I am not familiar with quickcheck-dynamic but I see it was developed by Quviq which also had their part in io-sim by implementing the partial order reduction (POR) feature that can find and swap concurrent threads and find even more possible race conditions. I also see in the README of quickcheck-dynamic reference to io-sim so there might be something there explaining.

stevan · August 8, 2024, 10:42am

I haven’t thought about time in this context, but I’ve simulated time before, so I got some ideas. Will add a to do item about it, thanks!

So in the end you can write your business logic in io-sim, interpret it in IO for your executable and test it with quickcheck pure-ly, everything with the same code.

It would be interesting to see a port of the counter example with increments from the post to your approach. In particular I’m curious to see how you can test different interleavings of threads.

bolt12 · August 8, 2024, 10:59am

It would be interesting to see a port of the counter example with increments from the post to your approach. In particular I’m curious to see how you can test different interleavings of threads.

Your example uses IORef which don’t exist yet in io-sim, however I don’t think it is something one can’t implement Regarding testing different interleaving of threads there are actually 3 approaches:

threadDelays. Since time in io-sim does not pass unless something block either via threadDelay or STM, instrumenting the code with threadDelays to simulate time passing between computations is one way to get different interleavings of threads. Essentially we generate some random delays with Quickcheck which get used randomly in the instrumented code. This is particularly useful in networking code where we want to simulate connections and packet transmission times (and failures!). Note that this is still deterministic. And for each test we might find different interleavings. The nice thing about this is that shrinking will work beautifully.
Either improve io-sim to have first class thread schedulers so the user can have finer grained control (I think dejafu has this) or we could simply add random jitter to every call of threadDelay in the code. Since the free monad data structure is “ours” we can instrument the code on io-sim side (note that this is not done, I am just suggesting it as a way to do it)
Using io-sim-por; this approach has its costs, but if one wants to be sure it searches all the possible race condition search space effectively and tries out all the sensible race conditions, this is the thing to use. io-sim-por will find all concurrent threads that can lead to a race condition, reverse them and see what happens

stevan · August 8, 2024, 11:14am

It should be possible to implement a broken counter with a TVar as well.

bolt12 · August 8, 2024, 11:17am

It should be possible to implement a broken counter with a TVar as well.

Yes, I only meant that the example wouldn’t be 100% equal to yours.

stevan · August 8, 2024, 11:48am

That’s not important.

jasagredo · October 1, 2024, 11:52am

I implemented the example with TVars and created a test-suite with test cases for IO, IOSim and IOSimPOR. The README explains some details on how to interpret the results, and what to expect.

In short, IOSim is deterministic (reproducible) but might not find the race, IOSimPOR is deterministic and will find the race always, IO is non-deterministic and might not find the race.

https://github.com/jasagredo/tvar-counter-iosim

coot · October 1, 2024, 1:19pm

I gave a talk on how IOSimPOR works (it was recorded, but it hasn’t been published yet). My primary motivation was to make it easier for my future self and other io-sim-por contributors to debug bugs in the partial order reduction logic - we found a few bugs, but we haven’t seen any for some time now . It discusses an even simpler example and shows how IOSimPOR finds all the races in that case: https://coot.me/presentations/iosimpor.pdf

stevan · October 2, 2024, 9:06am

Nice, this is the clearest example of what IOSim can do that I’ve seen. You should consider moving this to the README of IOSim itself.

What’s still not clear to me: generating all those delays for IOSim seems quite annoying if the SUT has lots of operations (in this case incrWithDelays only needs two delays, so that’s fine)? Otherwise this approach seems to explore random schedules like in my post?

The IOSimPOR approach on the other hand doesn’t need to generate the delays, as it explores all possible paths (presumably very slow in a bigger SUT)?

Would it not make sense to target the sweet spot where no delays need to be generated and random schedules are explored?

bolt12 · October 2, 2024, 9:32am

generating all those delays for IOSim seems quite annoying if the SUT has lots of operations (in this case incrWithDelays only needs two delays, so that’s fine)? Otherwise this approach seems to explore random schedules like in my post?

Yes this is an advantage and a disadvantage of IOSim. One one hand it is a time based event simulator and that can be useful for your application/use-case. In IOSim time is instantaneous and only passes if there’s a blocking operation that takes time. In a simulation this has to be simulated using thread delays. For example, at IOG networking team we deal with a lot of timeouts and different attenuation so it makes sense, if we want to test our code in simulation, to test if the logic is right and respects the timeouts/fallback mechanisms, if a peer is too slow or eventually disconnects. I don’t think other simulators are time based like this. On the other hand for examples like this one, one has to introduce explicit threadDelays in order to explore random schedules as you pointed out.

The IOSimPOR approach on the other hand doesn’t need to generate the delays, as it explores all possible paths (presumably very slow in a bigger SUT)?

It doesn’t generate all possible paths, it tries to be smart and only identify races that could lead to problems. Yes it is slower than IOSim but not extremely slower as it does not explore all possible paths.

Would it not make sense to target the sweet spot where no delays need to be generated and random schedules are explored?

As I see it there are two possible future work solutions for this. Implement first class scheduler for IOSim, this would be akin to dejafu IIRC and would offer the power and flexibility to the user to be precise about which threads it wants to reorder. The other option is an intermediate step which I actually have a branch on IOSim for a POC, this solution instruments each simulation step with a random threadDelay powered by a random seed, in this way you wouldn’t have to manually add threadDelays and you could still get determinism by using the same seed. However you wouldn’t be able to shrink the seed, for that reason, a first class scheduler would be much better

Topic		Replies	Views
State-based testing with quickcheck-lockstep (Haskell Unfolder #44) Links	1	116	May 14, 2025
Looking for a library like `concurrency` but with file IO operations	13	661	October 22, 2024
The sad state of property-based testing libraries Links	19	2704	July 8, 2024
GHC WebAssembly Weekly Update, 2023-02-08	0	650	February 8, 2023
Ray Tracing in One Weekend Show and Tell	7	3100	August 14, 2024

Parallel property-based testing with a deterministic thread scheduler

Related topics