I agree that the current implementation of the discrimination library is not fast enough. But I believe that the fundemental idea is that you can “serialize” any data structure as an array of integers (or bitstring), and this array can be used for both ordering and grouping of items relatively quickly.
My Idea for an algorithm:
- For each element convert to an unboxed array of integers.
- Keep a hashmap (patricia tree for sorting) for each size of list encountered.
- These arrays can be stored unboxed and checked for equality fast in the hashmap.
- Optionally convert back or have original values stored in hashmap.
- profit