So, I have a three-layer stack: a PostgreSQL data base, a Haskell back end and a PureScript front end. And there is a fault somewhere in this stack: sometimes items present in the data base do not make it to the front end. A small fortune in programmer hours has been spent searching for the fault, with no progress; no one can figure out even roughly where it is. Our last hope is to property check the whole thing, from end to end, verifying that anything put into the data base can be retrieved by the front end.
We have in place a fixture that creates a new PostgreSQL cluster, runs the back end on it and makes some scripted HTTP requests. We use it to run unit tests such as «if I make this PUT request and then that GET request to the same end point, I should get back the same thing as I put in». In principle it would not be a problem to turn this property into a property check. In practice, tearing down the cluster, building a pristine new one and initializing the data base anew takes seconds, so even a trivial property check run this way would take minutes.
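For concreteness, the property itself is easy to state; the hard part is everything around it. Here is a minimal sketch with QuickCheck, where Item, putItem and getItem are placeholders for our real payload type and scripted HTTP requests:

```haskell
-- A sketch of the round-trip property. Item, putItem and getItem are
-- placeholders for the real payload type and HTTP client code; each case
-- still needs a pristine data base, which is the whole problem.
module RoundTrip where

import Test.QuickCheck (Arbitrary (..), Property)
import Test.QuickCheck.Monadic (assert, monadicIO, run)

-- Placeholder payload type; the real one is whatever the end point accepts.
data Item = Item { itemId :: Int, itemBody :: String }
  deriving (Eq, Show)

instance Arbitrary Item where
  arbitrary = Item <$> arbitrary <*> arbitrary

putItem :: Item -> IO ()          -- PUT against the fixture (placeholder)
putItem = undefined

getItem :: Int -> IO (Maybe Item) -- GET against the fixture (placeholder)
getItem = undefined

-- Whatever we PUT should come back unchanged from a subsequent GET.
prop_roundTrip :: Item -> Property
prop_roundTrip item = monadicIO $ do
  run (putItem item)
  fetched <- run (getItem (itemId item))
  assert (fetched == Just item)
```

Running quickCheck prop_roundTrip fires a hundred PUT/GET pairs at the fixture by default, which is exactly where the cost of rebuilding the cluster for every case bites.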
In theory, we could carefully reset the data base to its initial state after each run of the property check. Given that only a small amount of data is modified, this would take imperceptibly little time. But it is hard to know exactly what should be reset, and it is easy to get wrong if done by hand!
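For what it is worth, the bluntest reset I can think of is to truncate every user table and restart the identity sequences, along the lines of the sketch below (assuming postgresql-simple and that everything lives in the public schema). It also wipes any seed rows the initialization created, which is exactly the «hard to know what should be reset» problem:

```haskell
-- A blunt reset between checks: truncate every table in the public schema
-- and restart the identity sequences. This wipes seed rows created by the
-- initialization code as well, so it restores an *empty* state, not
-- necessarily the initial one.
{-# LANGUAGE OverloadedStrings #-}
module ResetDb where

import Data.List (intercalate)
import Data.String (fromString)
import Database.PostgreSQL.Simple

resetPublicSchema :: Connection -> IO ()
resetPublicSchema conn = do
  tables <- query_ conn
    "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
    :: IO [Only String]
  case map fromOnly tables of
    []    -> pure ()
    names -> do
      -- Naive quoting; good enough for ordinary table names.
      let quoted = map (\n -> "\"" ++ n ++ "\"") names
          stmt   = "TRUNCATE " ++ intercalate ", " quoted
                     ++ " RESTART IDENTITY CASCADE"
      _ <- execute_ conn (fromString stmt)
      pure ()
```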
One trick we do use is SQL transactions: we take a single connexion, start a transaction, perform a check and then roll the transaction back. Unfortunately, even this is not completely safe: it looks as though the sequences that primary keys are drawn from are not reset to their previous values. (Or maybe we are doing something wrong; I am not really a PostgreSQL guru, so I am not too sure of myself.) But even ignoring this problem (it is irrelevant to most checks), there is another, more severe one: transactions guarantee that uncommitted data is visible only inside the same connexion, so there is no way to request it through the HTTP API. This trick is suitable for fuzzing the boundary between the data base and the back end, but nothing beyond that.
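Spelled out, the trick amounts to something like the sketch below (assuming postgresql-simple on the back end; withRollback is just my name for the wrapper, not a library function):

```haskell
-- Run a check inside a transaction on a single connexion and roll it back
-- afterwards, whether the check succeeded or not. Sequence advances are
-- not undone by the rollback, and other connexions never see the
-- uncommitted rows, hence the limitations described above.
module Rollback where

import Control.Exception (bracket_)
import Database.PostgreSQL.Simple (Connection, begin, rollback)

withRollback :: Connection -> IO a -> IO a
withRollback conn = bracket_ (begin conn) (rollback conn)
```

A check then runs as withRollback conn (runCheck conn), which is precisely why it only works when the check talks to the data base over that same connexion rather than through the HTTP API.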
Another thing I have heard of is to selectively drop and re-create exactly the tables affected by a run of the property check. This seems hard to automate, though: how can I tell, in a general way, whether a table has been touched? And even then, how would I re-create just the affected tables when the data base initialization code is an opaque blob?
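The closest I can get to answering «has this table been touched?» is to snapshot the write counters in pg_stat_user_tables before and after a check and diff them, roughly as in the sketch below. But these statistics are updated asynchronously, so I doubt they are reliable immediately after a check, and they say nothing about how to re-create the affected tables:

```haskell
-- A rough attempt at "which tables were touched?": snapshot the write
-- counters from pg_stat_user_tables before and after a check and report
-- the tables whose counters moved. Caveat: these statistics are updated
-- asynchronously, so a just-finished check may not be reflected yet.
{-# LANGUAGE OverloadedStrings #-}
module TouchedTables where

import           Data.Int (Int64)
import qualified Data.Map.Strict as Map
import           Database.PostgreSQL.Simple

type WriteCounts = Map.Map String Int64

writeCounts :: Connection -> IO WriteCounts
writeCounts conn = do
  rows <- query_ conn countsQuery :: IO [(String, Int64)]
  pure (Map.fromList rows)
  where
    countsQuery =
      "SELECT relname, n_tup_ins + n_tup_upd + n_tup_del\
      \ FROM pg_stat_user_tables"

-- Tables whose insert/update/delete counters changed between two snapshots.
touchedTables :: WriteCounts -> WriteCounts -> [String]
touchedTables before after =
  Map.keys (Map.differenceWith keepChanged after before)
  where
    keepChanged a b = if a /= b then Just a else Nothing
```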
I wonder if this problem is already solved. In any case, any advice is welcome!