Persist is a binary serialization library with two aims:
1. Have a default serialization format that can simply be derived for most types using Generic.
2. Allow a type to have a custom serialization format that matches an external specification.
Most of the improvements in this release were for goal 2. There is now support for backpatching a length value, which avoids a second walk over the data structure. This is useful with tag-length-value formats where the length is a fixed number of bytes. It is also the source of some performance improvements in the default format for the built-in list datatype. There is also support for deserializing these formats with the new getPrefix/unsafeGetPrefix helpers.
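To make the backpatching idea concrete, here is a minimal sketch that does not use persist at all. It writes a 4-byte little-endian element count in front of a list of bytes while traversing the list only once: the count slot is reserved first and filled in after the elements are written. The name `encodeCounted` and the fixed `capacity` argument are assumptions made for illustration; a real library would grow its buffer instead.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.Bits (shiftR)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (pokeByteOff)

-- | Hypothetical helper: encode a list as a 4-byte little-endian count
-- followed by the elements, walking the list exactly once.
-- 'capacity' is the maximum number of elements the scratch buffer can hold.
encodeCounted :: Int -> [Word8] -> IO BS.ByteString
encodeCounted capacity xs = allocaBytes (4 + capacity) fill
  where
    fill :: Ptr Word8 -> IO BS.ByteString
    fill buf = do
      -- Bytes 0..3 are the reserved count slot; elements start at offset 4.
      n <- writeAll 0 xs
      -- Backpatch: now that the count is known, fill in the reserved slot.
      mapM_ (\i -> pokeByteOff buf i (fromIntegral (n `shiftR` (8 * i)) :: Word8)) [0 .. 3]
      -- Copy the used part of the scratch buffer into a ByteString.
      BS.packCStringLen (castPtr buf, 4 + n)
      where
        writeAll :: Int -> [Word8] -> IO Int
        writeAll !n [] = pure n
        writeAll !n (y : ys)
          | n >= capacity = ioError (userError "encodeCounted: capacity exceeded")
          | otherwise = pokeByteOff buf (4 + n) y >> writeAll (n + 1) ys
```

Without the reserved slot, the encoder would have to call `length` on the list first and then walk it again to write the elements.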
The API is similar enough to binary and cereal that it should be easy enough to try out the library.
Includes utilities to match externally specified data formats.
I couldn’t find this in the module haddocks. What does this mean exactly? Could I use this library to handle a binary format like UBJSON? Or is that not what “externally specified” means here?
That is the kind of thing meant by “externally specified.” I intend to use these new features in the cql library to match the cassandra encoding.
For example, all of the numbers in UBJSON are written in big-endian format. You would want to use putBE for them. The default is little-endian if you use the default via Generic.
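As a rough illustration of why the byte order matters, the sketch below uses the Builder from the bytestring package rather than persist's putBE/putLE, so it only shows the resulting bytes, not persist's API. The function names are made up for this example; the `'l'` type marker for a 32-bit integer comes from the UBJSON specification.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int32)
import Data.Word (Word8)

-- | A UBJSON int32 value: the 'l' type marker followed by 4 big-endian bytes.
ubjsonInt32 :: Int32 -> [Word8]
ubjsonInt32 n = BL.unpack (B.toLazyByteString (B.char7 'l' <> B.int32BE n))

-- | The same payload in little-endian order, for comparison (no marker).
littleEndianInt32 :: Int32 -> [Word8]
littleEndianInt32 n = BL.unpack (B.toLazyByteString (B.int32LE n))
```

For the value 258 (0x00000102), the big-endian payload is 00 00 01 02 and the little-endian payload is 02 01 00 00, so a derived little-endian instance would not be readable by a UBJSON decoder.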
The API bits I would consider useful for external formats are:
putLE/putBE
reserveSize/resolveSize*
getPrefix
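For the reading side, the general shape of consuming a length-prefixed region looks something like the sketch below. This is plain bytestring code, not persist's getPrefix, whose exact signature I am not reproducing here; `readLengthLE` and `splitPrefixed` are hypothetical names, and the 4-byte little-endian length is an assumption chosen for the example.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Read a 4-byte little-endian length and return it with the remaining input.
readLengthLE :: BS.ByteString -> Maybe (Int, BS.ByteString)
readLengthLE bs
  | BS.length hdr < 4 = Nothing
  | otherwise = Just (fromIntegral len, rest)
  where
    (hdr, rest) = BS.splitAt 4 bs
    -- Fold from the right so the last byte ends up most significant.
    len = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 hdr :: Word32

-- | Split the input into the length-prefixed body and whatever follows it.
-- A decoder for the body can then run on the first component only.
splitPrefixed :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
splitPrefixed bs = do
  (n, rest) <- readLengthLE bs
  if BS.length rest < n then Nothing else Just (BS.splitAt n rest)
```

Confining the inner decoder to the prefix means a malformed body cannot read past its declared length into the next record.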
There is currently no lookahead or backtracking, so depending on the binary format it may be difficult to make it fit. Lookahead would be relatively easy to implement with the exposed internals.
The defaults rely on Generic, but can be easily overridden. If there were a PR that explained why Generically was useful, I’d be happy to merge and re-release.
@Iceland_jack's linked article makes a compelling case, IMHO:
In my opinion there are two issues here: 1) the reliance on DeriveAnyClass which introduces a footgun for the user and 2) DefaultSignatures force the author to provide only one way to derive instances
It also means that you can use Generically as the newtype for “construct instances in the obvious way, by taking advantage of the structure of the type”, and if you (for example) introduce newtypes for big-endian or little-endian encodings, the overall library surface becomes more ergonomically uniform.
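Here is a small sketch of what deriving via Generically looks like from the user's side. The `Encode` and `GEncode` classes are toy stand-ins invented for this example, not persist's classes, and the generic code only handles product types. It assumes base >= 4.17 (GHC 9.4), where `Generically` is exported from GHC.Generics.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
import Data.Word (Word8)
import GHC.Generics

-- | Toy serialization class (hypothetical, for illustration only).
class Encode a where
  encode :: a -> [Word8]

instance Encode Word8 where
  encode w = [w]

instance Encode Bool where
  encode b = [if b then 1 else 0]

-- | Generic worker over the representation type; sums (:+:) are omitted.
class GEncode f where
  gencode :: f p -> [Word8]

instance GEncode U1 where
  gencode U1 = []

instance (GEncode f, GEncode g) => GEncode (f :*: g) where
  gencode (a :*: b) = gencode a ++ gencode b

instance Encode c => GEncode (K1 i c) where
  gencode (K1 x) = encode x

instance GEncode f => GEncode (M1 i t f) where
  gencode (M1 x) = gencode x

-- | The one place where the generic default lives: an instance for the newtype.
instance (Generic a, GEncode (Rep a)) => Encode (Generically a) where
  encode (Generically x) = gencode (from x)

-- | The user opts in explicitly; no DeriveAnyClass or DefaultSignatures needed.
data Header = Header Word8 Bool
  deriving stock (Generic)
  deriving (Encode) via (Generically Header)
```

Because the strategy is just a newtype, a library could offer other newtypes next to it (for instance, a big-endian variant) and users would pick between them in the same `via` clause.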
Also, have you considered using an indexed monad to track the number of open holes that the user needs to go back and patch? With -XQualifiedDo this is much more ergonomic than it used to be, and you can then prevent the user from running a putter unless the putting code goes back and fills out all the space that it reserves.
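A minimal sketch of the indexed-monad idea, assuming a toy putter over a list of bytes rather than persist's real Put type. All names here (`PutI`, `reserveByte`, `resolveByte`, `runPutI`) are hypothetical. The two type indices count the holes open before and after an action, and `runPutI` only accepts actions that start and end with zero open holes. It uses an explicit `ibind` instead of -XQualifiedDo so that it fits in one file.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
import Data.Word (Word8)

-- | Type-level counter for the number of unresolved holes.
data Nat = Z | S Nat

-- | A putter indexed by the open-hole count before (i) and after (j).
newtype PutI (i :: Nat) (j :: Nat) a = PutI ([Word8] -> (a, [Word8]))

ireturn :: a -> PutI i i a
ireturn x = PutI (\buf -> (x, buf))

ibind :: PutI i j a -> (a -> PutI j k b) -> PutI i k b
ibind (PutI m) f = PutI (\buf -> let (x, buf') = m buf; PutI g = f x in g buf')

-- | Offset of a reserved one-byte slot.
newtype Hole = Hole Int

putByte :: Word8 -> PutI i i ()
putByte w = PutI (\buf -> ((), buf ++ [w]))

-- | Reserving a slot opens a hole: the index goes from i to S i.
reserveByte :: PutI i ('S i) Hole
reserveByte = PutI (\buf -> (Hole (length buf), buf ++ [0]))

-- | Resolving a slot closes a hole: the index goes from S i back to i.
resolveByte :: Hole -> Word8 -> PutI ('S i) i ()
resolveByte (Hole off) w =
  PutI (\buf -> ((), take off buf ++ [w] ++ drop (off + 1) buf))

-- | Only putters with no holes left open can be run.
runPutI :: PutI 'Z 'Z a -> [Word8]
runPutI (PutI f) = snd (f [])

-- | One-byte count prefix followed by the elements, in a single traversal.
lengthPrefixed :: [Word8] -> PutI i i ()
lengthPrefixed xs = reserveByte `ibind` \h -> go h 0 xs
  where
    go :: Hole -> Word8 -> [Word8] -> PutI ('S i) i ()
    go h n [] = resolveByte h n
    go h n (y : ys) = putByte y `ibind` \_ -> go h (n + 1) ys

-- runPutI reserveByte   -- rejected by the type checker: a hole is still open
```

The index only counts holes, so this sketch would not catch resolving the same handle twice while another one is left open; tying each handle to its own use is where linear types would help.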
Indexed Monads or linear types would be useful to track this. Even without the guard rails it’s useful, and it’s unlikely to cause a problem. The code ends up looking like this:
put l = do
  sizeHandle <- reserveSize @Word64
  go sizeHandle 0 l
  where
    go sizeHandle !n [] = resolveSizeLE sizeHandle n
    go sizeHandle !n (x : rest) = put x >> go sizeHandle (n + 1) rest
or something even simpler:
putWithLength x = do
  sizeHandle <- reserveSize @Word32
  putWithoutLength x
  resolveSizeExclusiveLE sizeHandle
You can use the generically package to get a compatibility shim (it re-exports the newtype from base when available, so it’s safe to depend upon unconditionally).
Good call on linear types - I had forgotten about them, but they’d be much easier to use and probably faster to compile than the type-level gymnastics needed to track it all in an indexed monad. And if you’re only targeting GHC >=9.2, you should have workable linear types available to you.