Generic-diff: generic structural diffs

I’d like to announce a new package candidate: generic-diff: Generic structural diffs

As the name implies, the package enables comparing two values of the same type, and pinpointing exactly where they differ. The main use-case I imagine is in testing, when we want to debug exactly why a test failed, but the derived Show instance of the types we’re working with don’t lend themselves to easy inspection. For example if the output has a lot of parentheses, nesting etc, it’s not always easy to look at two values and spot where the difference is. Much more detailed motivation, with examples, can be found in Generics.Diff.

There is some prior art in this space: gdiff gives us an “Edit Script” which tells us how to transform one value into another (and therefore tells us all the differences between two values). This is much more comprehensive, but does require more boilerplate (Family has to be implemented manually). By comparison, generic-diff will stop at the first difference between two values; however it requires no boilerplate (just instances of Generic and HasDatatypeInfo from generics-sop).

One other nice feature of generic-diff is the ability to pretty-print the diff types (which use sop-core types, and therefore themselves can be quite difficult to read):

{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}

import Generics.Diff
import Generics.Diff.Render

import qualified GHC.Generics as G
import qualified Generics.SOP as SOP

data BinOp = Plus | Minus
  deriving stock (Show, G.Generic)
  deriving anyclass (SOP.Generic, SOP.HasDatatypeInfo, Diff)

data Expr
  = Atom Int
  | Bin {left :: Expr, op :: BinOp, right :: Expr}
  deriving stock (Show, G.Generic)
  deriving anyclass (SOP.Generic, SOP.HasDatatypeInfo, Diff)

expr1, expr2 :: Expr
expr1 = Bin (Atom 1) Plus (Bin (Atom 1) Plus (Atom 1))
expr2 = Bin (Atom 1) Plus (Bin (Atom 1) Plus (Atom 2))
ghci> printDiffResult $ diff expr1 expr2
In field right:
  Both values use constructor Bin but fields don't match
  In field right:
    Both values use constructor Atom but fields don't match
    In field 0 (0-indexed):
      Not equal

The repo can be found here.
Any comments or suggestions are welcome :slight_smile:

21 Likes

Awesome, I can imagine many use cases.

Can the DiffError class give good results on maps? I imagine the internals of e.g. red-black trees of Data.Map are not that interesting for determining where two maps differ, as an insertion somewhere deep in the balanced map can trigger a tree rotation and thereby alter the root element.

(Hash-)Maps would be important so that we can diff large structures such as JSON or even entire spreadsheet documents.

2 Likes

Thanks for the feedback! If you could let me know any of the use cases you thought of, that might help me shape the library to enable certain cases.

The current SOP approach wouldn’t really work well for Data.Map, or in fact any type whose Eq instance is very different from what a derived Eq instance would be, or any type that has some kind of invariant. It’s a useful observation because it highlights an inflexibility of the library to deal with the edge cases. I have an idea for allowing users to extend to these special cases, I’ll have a go and let you know :slight_smile:

Two examples involving maps:

  1. The Object part of aeson’s Value type uses KeyMaps which used to be HashMaps and in more recent versions are actually the red-black trees from containers. I would expect a diff of two Values to yield the most shallow position in the structure that is different, which could be a key in an Object.
  2. The Xlsx type is Generic and its spreadsheets are represented as Maps from (row,column) tuples to cells. I would expect a diff to either exhibit a sheet renaming, a permutation of the sheets or some cell in some sheet that has been altered.