That’s very nice! I wanted mutation testing in Haskell for years!
What’s the underlying mechanism of mutation? Do you mutate source code and recompile it somehow?
not a FOSS license tho..
There’s a GHC plugin that compiles all the mutations into the same binary so that you still only need to compile once. The mutations are then activated one-by-one at runtime by the mutation engine.
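To make the "compile once, switch at runtime" idea concrete, here is a minimal sketch. It is not the plugin's actual output or API: `MutationId`, `isAdult`, `testSuite` and `survivors` are all hypothetical names, and the mutants are written by hand instead of being generated.

```haskell
-- Hypothetical identifier for a mutation; Nothing means "run the original".
type MutationId = Int

-- One compiled definition holds the original and all of its mutants.
-- Which body runs is decided by a runtime value, not by recompiling.
isAdult :: Maybe MutationId -> Int -> Bool
isAdult active age = case active of
  Just 1 -> age > 18    -- mutant 1: (>=) replaced by (>)
  Just 2 -> age < 18    -- mutant 2: (>=) replaced by (<)
  _      -> age >= 18   -- original behaviour

-- A deliberately weak test suite: it never probes the boundary at 18.
testSuite :: (Int -> Bool) -> Bool
testSuite f = f 30 && not (f 5)

-- The "engine": activate each mutation in turn and keep those
-- for which the test suite still passes.
survivors :: [MutationId]
survivors = [ m | m <- [1, 2], testSuite (isAdult (Just m)) ]
```

Under these assumptions `survivors` evaluates to `[1]`: the boundary mutant goes unnoticed until a test such as `f 18` is added.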
A test suite for test suites, is it?
How can I tell that a mutation
- really exhibits undesired behaviour, and
- is sufficiently distinct from the original that it deserves its own test case?
Especially with AI-generated code, I venture that mutation might actually have a chance of improving the code. Isn’t that what genetic algorithms were about?
The question isn’t whether a mutation is good or bad. The question is: does this mutation cause a change in behavior that is caught by a test?
Why should I care?
The basic assumption seems to be that a test suite must fail if the code is altered semantically.
Suppose there are two data types, A and B, and a function f :: A -> B that is to be tested. Suppose both A and B are finite, with n and m elements respectively. Then there are m^n functions of that type, so f has m^n - 1 possible mutations. A naive test suite, adding one test after another, might need a number of test cases of the same order of magnitude, while a cleverly designed suite needs only about log(m^n) test cases to rule out all undesired mutations. With infinite types and general recursion, even this becomes infeasible.
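The counting can be checked on small types. The sketch below represents a function on a finite domain as its table of outputs (one entry per input, in a fixed order); `allTables` and `survivingMutants` are names made up for this illustration. It treats a "test" as checking the output at one input, which is an assumption about what a test case is.

```haskell
import Control.Monad (replicateM)

-- All functions from an n-element domain into the codomain `cod`,
-- each written as its list of n outputs. There are m^n of them.
allTables :: Int -> [b] -> [[b]]
allTables n cod = replicateM n cod

-- Mutants of `orig` that agree with it at every tested input index,
-- i.e. the mutants that the given point tests fail to rule out.
survivingMutants :: Eq b => [b] -> [b] -> [Int] -> [[b]]
survivingMutants cod orig testedIxs =
  [ t
  | t <- allTables (length orig) cod
  , t /= orig
  , all (\i -> t !! i == orig !! i) testedIxs
  ]
```

For `not :: Bool -> Bool`, written as the table `[True, False]` over the inputs `[False, True]`, `survivingMutants [False, True] [True, False] []` has 2^2 - 1 = 3 elements, testing index 0 leaves one survivor, and testing both indices leaves none. With point tests like these, n tests suffice, which matches log(m^n) if the logarithm is taken to base m.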
Consider the example in the announcement. The tests cover the design space of Int -> Int -> Bool while the semantics seem to be concerned with both integers in a finite range from 0 to some number, e.g. 20. Hence there are infinitely many mutations of canCastFireball that could be distinguished by test cases yet to be written, but that are semantically identical in the design space that the programmer cares about. If a mutation alters the behaviour at level = -42, does that mean we ought to add a test for that?
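As an illustration of that point, here is a sketch with a stand-in definition. The body of `canCastFireball`, the thresholds 5 and 10, and the `mutant` are assumptions invented for this example; the definition in the announcement may well differ.

```haskell
-- Hypothetical stand-in for the function from the announcement.
canCastFireball :: Int -> Int -> Bool
canCastFireball level mana = level >= 5 && mana >= 10

-- A hand-written mutant that differs from the original only at level = -42.
mutant :: Int -> Int -> Bool
mutant level mana = (level >= 5 || level == -42) && mana >= 10

-- The two agree everywhere in the range the programmer cares about (0..20).
agreeOnDesignSpace :: Bool
agreeOnDesignSpace =
  and [ canCastFireball l m == mutant l m | l <- [0 .. 20], m <- [0 .. 20] ]

-- They disagree outside that range, so a test at level = -42 would kill the mutant.
differOutside :: Bool
differOutside = canCastFireball (-42) 10 /= mutant (-42) 10
```

Both `agreeOnDesignSpace` and `differOutside` are `True` here: the mutant is distinguishable in Int -> Int -> Bool but not in the intended range.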
What I am trying to convey is that a spec should pin down the desired behaviour as exactly as possible, and if done right the code could be extracted from it. If a mutation does not fail the tests, it does not necessarily mean the tests are bad. It only means that the desired behaviour is exhibited by a non-singleton set of possibilities. The question is, do my tests let only the desired programs pass?
Mutation tests could indeed be very helpful with that. I would expect a mutation test to nudge the code in a direction specified by a “spec for tests”. Perhaps sydtest does just that, but one cannot tell from the announcement.
From how I read Jello_Raptor’s comments on “Announcing Mutation Testing in Haskell”, the library just surfaces these untested situations for you; it’s up to you to deal with them. The immediate goal here is to improve the test suite, which presumably will eventually improve the quality of the code[0]. I’ve done lots of minor “manual mutation testing” to ensure a new test covers a bugfix, so I do see the value.
[0] It has also been used in attempts to improve the quality of the coder.