[ANN] Hindsight: Typesafe, Evolvable Event Sourcing

I’m excited to announce Hindsight, an event sourcing library I’ve been working on for a while.

Announcement blog post

GitHub repository

Documentation

Some of you might remember my MuniHac 2025 presentation. I am happy to say that the API is a lot cleaner, and that we now offer extensive documentation and tutorials. While there is still a lot of work to do in key areas (such as observability), I believe the library (or rather, the libraries) is mature enough to be presented to the general community.

There is no Hackage release yet: my plan is to gather some initial feedback from you all before tagging the initial release(s).

Happy hacking!

This looks awesome! I only skimmed the announcement blog post, but I’m really excited about automated migrations. I’ll have to look at how it’s done.

This looks great!

I don’t need an event-sourcing framework at the moment, but this caught my eye:

event versioning a compile-time concern, by separating the definition of an event (identified by a type-level Symbol) from that of its successive payloads. By default, migrations are handled automatically through upcasting of successive versions

I have been looking for such a solution recently. In my case, I just want a stable JSON file format for which I can guarantee backwards compatibility. There is also Vlix/safe-json (automatic versioning of JSON formats for Haskell data types, with backwards compatibility), which offers similar functionality. Do you know how your solution compares to it?

I think you should consider splitting off a library just for this, since it’s useful on its own.

Thanks for the feedback. Splitting the versioning part into a separate library is indeed an interesting idea. I had not considered it, but it shouldn’t be too hard.

I had a cursory look at safe-json. Here is what immediately strikes me:

  • The surface API is different: safe-json ties consecutive versions together via associated type families, and the behavior of migrations depends on runtime evaluation (the SafeJson typeclass has two methods, version and kind, which control how it behaves). Hindsight’s API is almost entirely type-level, save for the definition of the actual migration function.
  • safe-json supports downcasting / reverse migrations. This is not currently something I support, although I do think it would be valuable for a few reasons.
  • Hindsight has a (currently badly advertised) feature: automatic generation of roundtrip and golden tests for all the versions of your events. Just define Arbitrary instances for your payloads, call createGoldenTests "my_event" and voilà! Here is a tasty TestTree for you. GHC will yell at you if you forget to define Arbitrary for your new versions. You can read about it in the Haddock documentation.
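
To make the "almost entirely type-level" point concrete, here is a minimal sketch of the general idea of keying payload types by an event name and a version number. This is not Hindsight's actual API: the names Payload, Latest, Current, upcastV0 and the UserRegistered types are all hypothetical, and the real library may be organised quite differently.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import GHC.TypeLits (Nat, Symbol)

-- Two successive payloads of the same (hypothetical) event.
data UserRegisteredV0 = UserRegisteredV0 { v0Name :: String }
  deriving (Eq, Show)

data UserRegisteredV1 = UserRegisteredV1 { v1Name :: String, v1Email :: Maybe String }
  deriving (Eq, Show)

-- Hypothetical: the event is identified by a type-level Symbol, and each
-- version number maps to its own payload type.
type family Payload (event :: Symbol) (version :: Nat) :: Type where
  Payload "user_registered" 0 = UserRegisteredV0
  Payload "user_registered" 1 = UserRegisteredV1

-- Hypothetical: the latest known version of each event.
type family Latest (event :: Symbol) :: Nat where
  Latest "user_registered" = 1

-- The payload type that handlers would see.
type Current (event :: Symbol) = Payload event (Latest event)

-- The only value-level code is the migration function itself.
upcastV0 :: Payload "user_registered" 0 -> Payload "user_registered" 1
upcastV0 (UserRegisteredV0 name) = UserRegisteredV1 name Nothing

-- Type-checks only because version 1 really is the latest version.
toCurrent :: Payload "user_registered" 0 -> Current "user_registered"
toCurrent = upcastV0
```

In a design along these lines, bumping Latest without adding the matching Payload equation and migration would be caught by the compiler rather than at runtime.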

Hi, creator and maintainer of safe-json here. :slight_smile:

I’m always on the lookout for event sourcing libraries and the like, so I was happy to see another stab at it. I WAS surprised to see almost the same functionality as safe-json included in there.
I do appreciate that it is a bit batteries-included in that sense, but what stuck out to me was the lack of versioning inside the JSON itself. This means that it is completely up to the user to make sure that something expecting the JSON of V2 does NOT get a V1, or even the other way around (though the latter is the reverse migration you mentioned, which isn’t supported).

This would mean the user might need to use safe-json to make sure events in-transit are always parsed correctly, and then also the versioning of hindsight to make sure the event sourcing versioning works correctly.
It’d be great if you wouldn’t need an entire versioning section/API in hindsight if safe-json could pick up the slack, right? It would be a boon to safe-json and less code to manage for hindsight.

The safe-json code was very much styled after the safecopy package’s way of handling versioning. As such, it has some quirks that I also didn’t really like, but we needed a way to version JSON, so I made do.
I’d love to find a way to make safe-json’s API better and if possible also correct at compile-time, so if you could give your opinion and maybe advice in the safe-json GitHub issues, that would be very much appreciated :grin:

Hi there @Vlix!

Thanks for chiming in. :slight_smile:

You’re right that the version and event name are treated as external metadata rather than being included in the JSON. There’s a good reason for this: the storage format for these items depends entirely on the backend. For example, the PostgreSQL store keeps them in dedicated, indexed columns for fast retrieval (it could probably also be done with a single JSONB document and GIN indexes; I just opted to keep things closer to standard SQL). In the filesystem store, however, they are included in the JSON. It really depends on the implementation.

Hindsight’s versioning system makes several design choices that deviate significantly from safe-json, so adopting it would require major changes. I also have some ideas for future developments that might diverge from safe-json’s core mission, so I’m not sure how much we could truly unify the approaches.

That said, I want to keep an open mind. The Haskell ecosystem is small enough that we should consider how to reduce duplication. I’ll try to free up some time to dig deeper into safe-json and articulate my thoughts more clearly so we can open up a discussion. Maybe we can meet each other halfway.

Should I understand this as: the version (and event name) have to be included “next to the event” when in transit? Like, in HTTP/AMQP headers, or some such? Otherwise, the parsing might run into issues when only relying on the FromJSON instances.

What it really means is that the storage backend is responsible for persisting the event version along with the JSON in some way, so that this information is available when the event needs to be parsed. The parsing and automatic upgrade would typically be performed internally by the event store implementation, before the event is delivered to subscription handlers. Does that make sense?

It does make sense, for the storage and versioning part. But I guess the aspect I’m missing in this “user story”, if you will, is that event sourcing isn’t only storage.

Events also get sent from service to service (or to/from clients) and I imagine, for example, an event queue that might have one version of an event already in there, and then when some services start updating, there’ll be a combination of old versions and newer versions in the same queue.
How is the consumer (which we’ll assume has been updated and won’t have to “downgrade” newer versions to older ones) to know which version the next event it will handle has?

E.g., one of the issues I see is that if you handle an older version as if it were a new one, it might parseJSON correctly, because the parser ignores the old field. And the old field might have information that would have been used to fill in other fields, had the event been read as, and upgraded from, the older version.
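
A minimal sketch of that failure mode, using a plain association list as a stand-in for a JSON object so that it runs with base only (an aeson parser that reads an optional field with .:? behaves similarly). The Profile types, field names and parser functions are made up for illustration.

```haskell
-- Stand-in for a JSON object with string values.
type Object = [(String, String)]

data ProfileV1 = ProfileV1 { v1Id :: String, v1Name :: String }
  deriving (Eq, Show)

-- V2 renames "name" to an optional "display_name".
data ProfileV2 = ProfileV2 { v2Id :: String, v2DisplayName :: Maybe String }
  deriving (Eq, Show)

parseV1 :: Object -> Maybe ProfileV1
parseV1 o = ProfileV1 <$> lookup "id" o <*> lookup "name" o

-- Lenient V2 parser: unknown keys are ignored, "display_name" is optional.
parseV2 :: Object -> Maybe ProfileV2
parseV2 o = do
  i <- lookup "id" o
  pure (ProfileV2 i (lookup "display_name" o))

-- Migration that carries the old field over.
upcast :: ProfileV1 -> ProfileV2
upcast (ProfileV1 i n) = ProfileV2 i (Just n)

-- A document written by a V1 producer.
oldDocument :: Object
oldDocument = [("id", "42"), ("name", "Ada")]

-- Parsed as if it were V2: succeeds, but the name is silently lost.
misread :: Maybe ProfileV2
misread = parseV2 oldDocument
-- Just (ProfileV2 {v2Id = "42", v2DisplayName = Nothing})

-- Parsed at its real version and then upgraded: the name survives.
upgraded :: Maybe ProfileV2
upgraded = upcast <$> parseV1 oldDocument
-- Just (ProfileV2 {v2Id = "42", v2DisplayName = Just "Ada"})
```

Both paths succeed, which is exactly the problem: without knowing the version, nothing signals that the first result dropped data.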

Ah, I think I get what you’re getting at. I believe there is a fundamental difference of mental models at play here.

Hindsight is an event sourcing library, not a messaging / streaming layer (à la Kafka or RabbitMQ). Events don’t travel as “bare messages” between services: they are consumed via subscriptions, and never as raw JSON. The deserialization is handled by the subscription mechanism, and informed by the colocated metadata, which is never “lost”. Subscription handlers receive deserialized Haskell values.

More concretely: when you call subscribe, the event store retrieves events along with their metadata (event name, version, timestamps, etc.). The subscription infrastructure uses this metadata to parse the JSON payload at the correct version and automatically upgrade it to the latest version before delivering it to your handler. Your handler receives a fully-typed EventEnvelope with the current payload type - no raw JSON involved.
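
As a rough illustration of that flow (not Hindsight's actual types or functions), the sketch below picks a parser from the stored version and upgrades the result to the latest payload before anything reaches a handler. StoredEvent, OrderPlacedV0/V1, decodeLatest and the default currency are all hypothetical, and an association list stands in for the JSON payload.

```haskell
-- Stand-in for a JSON object with string values.
type Object = [(String, String)]

-- What a store might hand to the subscription machinery: payload plus metadata.
data StoredEvent = StoredEvent
  { eventName    :: String
  , eventVersion :: Int
  , rawPayload   :: Object
  }

data OrderPlacedV0 = OrderPlacedV0 { v0Total :: String }
  deriving (Eq, Show)

data OrderPlacedV1 = OrderPlacedV1 { v1Total :: String, v1Currency :: String }
  deriving (Eq, Show)

field :: String -> Object -> Either String String
field k o = maybe (Left ("missing field: " ++ k)) Right (lookup k o)

parseV0 :: Object -> Either String OrderPlacedV0
parseV0 o = OrderPlacedV0 <$> field "total" o

parseV1 :: Object -> Either String OrderPlacedV1
parseV1 o = OrderPlacedV1 <$> field "total" o <*> field "currency" o

-- Assumed migration: old orders are taken to be in EUR.
upcastV0 :: OrderPlacedV0 -> OrderPlacedV1
upcastV0 (OrderPlacedV0 t) = OrderPlacedV1 t "EUR"

-- Use the metadata to pick the parser, then upgrade to the latest version.
decodeLatest :: StoredEvent -> Either String OrderPlacedV1
decodeLatest ev
  | eventName ev /= "order_placed" = Left ("unknown event: " ++ eventName ev)
  | otherwise = case eventVersion ev of
      0 -> upcastV0 <$> parseV0 (rawPayload ev)
      1 -> parseV1 (rawPayload ev)
      v -> Left ("unsupported version: " ++ show v)
```

With these definitions, decodeLatest (StoredEvent "order_placed" 0 [("total", "10")]) yields Right (OrderPlacedV1 "10" "EUR"), while a stored version of 2 yields Left "unsupported version: 2".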

The store is the authoritative source that mediates all access to events with full version information, not a mere transport pipe. If a subscription encounters an event version it cannot parse, it will error out. Outdated writers are not much of a problem: the events they write will simply be upgraded as needed, even if a newer version is defined elsewhere.

If someone wanted to integrate Hindsight with message queues - say, subscribing to a Hindsight store and then publishing events to RabbitMQ - that would be application-level code. How you serialize events for transport in that scenario (embedding version in JSON like safe-json, using AMQP headers, etc.) would be up to you. But that’s outside Hindsight’s core scope.

Now, whether that’s a concern that Hindsight should address is fully open for discussion! I personally don’t think so, but I am willing to be convinced.

Does that clarify the situation?

Ah, now we got it! I indeed understand now what the goal and application of Hindsight is.
So the JSON instances are only there for committing to and retrieving from storage, right?

I guess my next question would be: why does the library depend on JSON (i.e. aeson) at all? It feels like whether to serialize as JSON or not should depend on the storage mechanism used. Maybe that’s something for a next major version update? The serialization method could be factored out to make it more modular, so that it depends on the storage mechanism, or on what the user prefers.
(i.e. adding an associated type to the EventStore class, like StoreSerializationType backend, or something like that? And a class EventSerialization or something? :person_shrugging: )
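
A minimal sketch of what that suggestion could look like, purely as an illustration: the class shapes, the Plain format, MemoryStore and the helper functions below are hypothetical and do not reflect Hindsight's current EventStore class.

```haskell
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Hypothetical: each backend names the wire format it stores.
class EventStore backend where
  type StoreSerializationType backend :: Type

-- Hypothetical: how a payload is written to and read from a given format.
class EventSerialization format payload where
  encodeEvent :: payload -> format
  decodeEvent :: format -> Maybe payload

-- A backend that needs no JSON at all: it stores Show/Read text.
newtype Plain = Plain String

data MemoryStore

instance EventStore MemoryStore where
  type StoreSerializationType MemoryStore = Plain

data Deposited = Deposited { amount :: Int }
  deriving (Eq, Show, Read)

instance EventSerialization Plain Deposited where
  encodeEvent = Plain . show
  decodeEvent (Plain s) = case reads s of
    [(x, "")] -> Just x
    _         -> Nothing

-- Store-generic helpers: the backend alone decides the format.
encodeFor
  :: (EventStore backend, EventSerialization (StoreSerializationType backend) payload)
  => Proxy backend -> payload -> StoreSerializationType backend
encodeFor _ = encodeEvent

decodeFor
  :: (EventStore backend, EventSerialization (StoreSerializationType backend) payload)
  => Proxy backend -> StoreSerializationType backend -> Maybe payload
decodeFor _ = decodeEvent
```

A JSON-backed store would then pick a JSON value type as its StoreSerializationType and provide the corresponding instances, while payloads used only with the memory store would not need any JSON instances.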

For example, the memory store doesn’t really need an aeson dependency for it to work. It only needs it because the core has integrated it into what an event payload has to be. That should be something you could factor out to be of any type the user would like to use, right? (and would be nice to have the JSON option readily available, and maybe the default if possible)
That way, users can choose how to do their own serialization.

I realize I’m basically making a user story for a feature request, but I feel like the library would benefit from this option. I think then it should also be easier and more obvious in how to use hindsight and safe-json in tandem. :thinking: One is for storage and keeping state, the other more for in-transit and parsing at the edges of your program.