As of 2023, this library is largely obsolete: arbitrary test generators with shrinking such as falsify offer much better user experience.
I don’t understand why that is. As great as falsify is, I find it annoying to write generators again and again. Maybe I’m not using generics and deriving-via enough; I find them a bit hard to wrap my head around in that library.
But I find the principles of smallcheck so great: by construction, the smallest test case is tested first. No worries about shrinking. Series of test values can be implemented without boilerplate, using generics.
So why would someone claim that smallcheck has worse user experience? What’s wrong with it?
I’m surprised this happens so quickly. Could it be implementation-specific, because of MonadLogic? What if Series were implemented as a simple list that contains all the values in ascending size? After all, we don’t really need backtracking, so MonadLogic is maybe oversized.
It only needs base as a dependency and works with lists. It runs 10 million tests on a largeish record (8 fields, some of them recursive) in less than half a minute on my laptop.
Can you provide me with a test that explodes in smallcheck? I’ll have a look whether it’s viable in this testing framework.
@turion the problem is not the raw performance of inputs generation, the problem is that for a given depth their number grows exponentially. So with default depth 5 a record with 6 fields will generate roughly 5^6 test cases. And if your property takes two of such records, it’s 5^12, almost a quarter of a billion of them. So at this point you start to micromanage smallcheck tweaking depth here and there (which is already annoying) and end up setting depth of 2 for such test. It’s still at least 2^12 test cases, which is probably okayish, but you are barely testing anything with such depth, right?..
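The arithmetic above is easy to reproduce. Here is a minimal base-only sketch (the `recordsOf` helper is made up for illustration, not SmallCheck's actual API): if each of k record fields enumerates d candidate values, exhaustive testing means d^k combinations.

```haskell
import Control.Monad (replicateM)

-- All combinations of k field values, each field drawn from d candidates.
-- This is what "exhaustive up to depth d" amounts to for a flat record.
recordsOf :: Int -> Int -> [[Int]]
recordsOf d k = replicateM k [1 .. d]

main :: IO ()
main = do
  print (length (recordsOf 5 6))   -- one 6-field record at depth 5: 5^6 = 15625
  print (5 ^ (12 :: Integer))      -- a property over two such records: 5^12
```

At depth 2 the same record gives only 2^6 = 64 combinations, which shows how brutally the depth knob trades coverage for runtime.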
Imagine testing a property on strings. What would you like smallcheck to do? Should it limit itself to enumerating ['a'..] for small depth? In such case it would never generate even something very simple, like a string with a space. Or a digit. Or an upper case letter. quickcheck is so much better: in a matter of microseconds it will throw all kinds of inputs on your test property.
strings don’t seem like a reason to deprecate smallcheck in favor of QC tho lolol. A reason to use QC, yeah. But using sc with strings is just using it wrong..
and you’d get strings, strings separated by spaces, strings separated by spaces and newlines both very early on. If it’s important to have spaces and newlines, then you can simply merge a generator that produces them into the default one.
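Merging a custom generator into the default one could be as simple as a fair interleave of two lists, if series really were plain lists in ascending size. A base-only sketch (all names here are made up, not SmallCheck's API):

```haskell
import Control.Monad (replicateM)

-- Fair interleaving of two enumerations, so values from both show up early.
interleave :: [a] -> [a] -> [a]
interleave (x : xs) ys = x : interleave ys xs
interleave []       ys = ys

-- Default enumeration: lowercase strings of growing length.
defaultStrings :: [String]
defaultStrings = concatMap (\n -> replicateM n ['a' .. 'c']) [0 ..]

-- Extra generator that makes whitespace show up immediately.
withSpaces :: [String]
withSpaces = [" ", "\n", "a b", "a\nb"]

merged :: [String]
merged = interleave withSpaces defaultStrings

main :: IO ()
main = print (take 8 merged)
-- [" ","","\n","a","a b","b","a\nb","c"]
```

The merged enumeration is still deterministic and still roughly ascending in size, but spaces and newlines are among the very first test cases.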
Or if you need longer strings, you can do:
atLeast :: Int -> TestCases String
atLeast n = (<>) <$> replicateM n arbitrary <*> arbitrary
My point is, if you have requirements of some specific data showing up in the generator, then fix the generator. But I don’t see how there is a fundamental limitation to enumerating values.
Ok, here is maybe one thing: The concept of depth is maybe too restrictive. My approach here is to just enumerate in some order ad hoc, without keeping track precisely how deep we are going. If we need a few deeper values early on, just mix them into the generator. Keeping track of depth is not necessary, we can just restrict the number of test cases instead.
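To make that concrete, here is a sketch of "no depth, just a budget": an enumeration is an infinite list in roughly ascending size, and the runner takes the first n values and reports the first failure. Hypothetical names, base only:

```haskell
-- Check a property against the first n enumerated values; return the first
-- (and hence a smallest-first) counterexample, if any.
checkFirst :: Int -> (a -> Bool) -> [a] -> Maybe a
checkFirst n prop = foldr step Nothing . take n
  where
    step x acc = if prop x then acc else Just x

-- Ints in ascending magnitude: 0, 1, -1, 2, -2, ...
ints :: [Int]
ints = 0 : concatMap (\n -> [n, negate n]) [1 ..]

main :: IO ()
main = print (checkFirst 1000 (< 10) ints)
-- Just 10, which is also the minimal counterexample of (< 10)
```

No depth parameter anywhere: the only knob is the number of test cases, and "smallest counterexample first" falls out of the enumeration order.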
How would you know upfront whether it’s important or not? If I definitely know that something is important, I’ll write a unit test to nail it down, no need for quickcheck or smallcheck. The very point of property testing is to get confronted by the unknown. And QuickCheck will just throw at me spaces, newlines, digits, punctuation, upper case, lower case, Unicode and what not. While SmallCheck will require me to write a dozen newtypes with different generators and tweak depth here and there. And all of it for what?
I mean, one can definitely make SmallCheck work. I have fairly extensive test suites using SmallCheck. Guess what? I’ve never seen a case where it would spot a bug, which QuickCheck didn’t.
What would be a reason to use SmallCheck tho lolol?
Not surprising; that’s really not where the value of exhaustive—or rather, simplest first—testing lies. IMO the big win is getting a genuinely minimal counterexample to your property in cases where greedy shrinking just isn’t good enough.
When my domain is small enough? I’d expect most professional developers to be able to make that determination..
QC “works” sometimes, but for certain types of programs that kind of random testing is an easy way to turn master red and randomly annoy everyone you work with.
idk I just read “deprecated in favor of” as meaning the two things are actually fundamentally equivalent, not just preferable as a matter of opinion.
Realistically, shrink is already implemented for all basic types, and then genericShrink gets you 90% of the way there. There are use cases for writing shrink manually, but they are usually in areas which are far beyond SmallCheck’s reach anyway.
Could you give an example?
Perhaps you are seeing something which is not there?.. SmallCheck description does not contain your quote.
Greedy shrinking will naturally fail to shrink any part of a counterexample that must relate to some other part of the counterexample. In the common case it can still shrink around one or two simple interrelations (e.g. two numbers that must be equal) and produce something simple enough. When those interrelations become more complex and plentiful, however, a greedy shrinker stalls out all too soon.
In particular, I’ve seen this when shrinking things that are more code-like than data-like, e.g. a representation of an opaque type by an AST of its operations.
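A toy illustration of that stall (this is a deliberately naive greedy shrinker, not QuickCheck's real one): the property only fails when the two fields of a pair are equal, so every single-field shrink candidate stops failing and the shrinker never moves.

```haskell
-- Naive shrink candidates for an Int (sketch, not QuickCheck's shrink).
shrinkInt :: Int -> [Int]
shrinkInt n
  | n > 0     = [0, n `div` 2, n - 1]
  | otherwise = []

-- Greedy pair shrinking: change one component at a time.
shrinkPair :: (Int, Int) -> [(Int, Int)]
shrinkPair (a, b) =
  [(a', b) | a' <- shrinkInt a] ++ [(a, b') | b' <- shrinkInt b]

-- Repeatedly move to the first still-failing candidate; stop when none fail.
greedy :: ((Int, Int) -> Bool) -> (Int, Int) -> (Int, Int)
greedy fails p =
  case filter fails (shrinkPair p) of
    (q : _) -> greedy fails q
    []      -> p

-- Counterexamples are exactly the pairs (n, n) with n >= 3;
-- the minimal one is (3, 3).
failing :: (Int, Int) -> Bool
failing (a, b) = a == b && a >= 3

main :: IO ()
main = print (greedy failing (100, 100))
-- (100,100): every single-field shrink breaks a == b, so the shrinker stalls
```

A smallest-first enumeration of pairs would instead report (3, 3) directly, with no shrinking at all.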
To counter the previous post, let me just say briefly that I thank you for your maintenance to the library and your detailed responses here. I fully respect your decision to invest your time into projects that you find more rewarding.
The claim that “if a program fails to meet its specification in some cases, it almost always fails in some simple case”
is false. There are many practical cases where a minimal counterexample is not that small or simple. Hybrid algorithms provide many examples.
To be concrete, imagine testing a hybrid sorting algorithm, which starts with divide-and-conquer quicksort but switches to insertion sort once subarrays have fewer than N elements. On modern hardware typical values of N are [15..20] (say, it’s 18 in vector-algorithms). If you try to test it with SmallCheck you would never ever get into testing the quicksort part of it, which could happily remain undefined, because enumerating all arrays up to 18 elements will take too long. But QuickCheck will get there fairly quickly.
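The shape of such an algorithm, as a hedged list-based sketch (the cutoff of 18 follows the vector-algorithms figure mentioned above; the function names are made up): every input of 18 or fewer elements goes straight to insertion sort, so the quicksort branch is untouched by any enumeration that never exceeds that length.

```haskell
import Data.List (insert)

cutoff :: Int
cutoff = 18

insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []

-- Hybrid sort: quicksort-style divide and conquer, falling back to
-- insertion sort below the cutoff. Short inputs never reach the
-- recursive branch, so a bug there is invisible to small test cases.
hybridSort :: Ord a => [a] -> [a]
hybridSort xs
  | length xs <= cutoff = insertionSort xs
hybridSort (p : rest) =
  hybridSort [x | x <- rest, x < p] ++ [p] ++ hybridSort [x | x <- rest, x >= p]

main :: IO ()
main = print (hybridSort [20, 19 .. 1 :: Int])
```

Only the 20-element input above ever executes the quicksort branch; replace that branch with undefined and every test on lists of length <= 18 still passes.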