[ANN] hpgsql, a pure Haskell PostgreSQL driver (no libpq)

It’s a pleasure to announce hpgsql, a PostgreSQL driver written in pure Haskell (no libpq), with an API largely inspired by the great postgresql-simple library, but featuring:

  • Usage of PostgreSQL’s binary protocol
  • Query arguments passed via the protocol instead of being escaped into the query string
  • Pipelining
  • Prepared statements
  • Ability to stream query results directly from the socket (not just with cursors)
  • Interruption safety, except for very specific (and documented) edge cases
  • Thread safety, unless specific (and documented) instructions say otherwise
  • A SQL quasiquoter like the one in postgresql-query and hasql-interpolate
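
To illustrate the second bullet: PostgreSQL's extended query protocol lets a client send the SQL text with numbered placeholders ($1, $2, ...) and ship the values separately, so nothing has to be escaped into the query string. The sketch below is a toy model of that split and nothing more. `Fragment`, `Lit`, `Param`, and `toParameterized` are hypothetical names made up for this example; they are not hpgsql's API, and the values are plain strings rather than binary-encoded parameters.

```haskell
module Main where

-- Hypothetical model: a query is a sequence of literal SQL text and values.
data Fragment
  = Lit String   -- literal SQL text, sent as part of the query string
  | Param String -- a value, sent separately (a plain String here for simplicity)

-- Replace each value with a numbered placeholder and collect the values
-- in order, so they never become part of the SQL text.
toParameterized :: [Fragment] -> (String, [String])
toParameterized = go 1
  where
    go :: Int -> [Fragment] -> (String, [String])
    go _ [] = ("", [])
    go n (Lit s : rest) =
      let (sql, ps) = go n rest in (s ++ sql, ps)
    go n (Param p : rest) =
      let (sql, ps) = go (n + 1) rest in ('$' : show n ++ sql, p : ps)

main :: IO ()
main = do
  let (sql, params) =
        toParameterized
          [Lit "UPDATE tbl SET val=", Param "42", Lit " WHERE id=", Param "7"]
  putStrLn sql  -- UPDATE tbl SET val=$1 WHERE id=$2
  print params  -- ["42","7"]
```

A quasiquoter with `#{...}` interpolation could plausibly produce something of this shape at compile time, but that is an assumption about the general technique, not a description of hpgsql's internals.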

Here’s an example of a pipeline mixing streams, prepared statements, and non-prepared statements:

f :: Int -> IO (Stream (Of Aeson.Value) IO ())
f val = do
  (updateTbl :: IO (), aggRes :: IO (Only Int), largeResults) <-
    runPipeline conn $
      (,,)
        <$> pipelineExec_ [sql|UPDATE tbl SET val=#{val}|]
        <*> pipeline1 [sql|SELECT SUM(val) FROM tbl|]
        <*> pipelineSWith
          (rowDecoder @(Vector Int, Vector Text))
          -- We use a prepared statement for the query below
          [sqlPrep|SELECT x, y FROM tbl|]
  updateTbl
  Only total <- aggRes
  Streaming.map Aeson.toJSON <$> largeResults

Also, I am maintaining hpgsql-simple-compat, a fork of postgresql-simple that preserves its API as closely as I could manage, but with internals rewritten to use hpgsql. The idea is to provide a simpler migration path from postgresql-simple: one that allows a smaller initial changeset, followed by migrating queries to hpgsql one at a time. I have even migrated a CLI tool of mine, codd, as an example.

hpgsql-simple-compat is not on Hackage, as I was concerned that duplicating the search space with modules, types, and functions already in postgresql-simple would annoy users. It’s in the “hpgsql-simple-compat” folder in the repository.

Some initial benchmarks show hpgsql can materialize rows from a query in ~38% of the time postgresql-simple takes, and ~70% of the time hasql takes (on my computer, Linux x64, GHC 9.10.3, compiled with -O1). That is roughly 2.6x and 1.4x faster, respectively. Peak memory usage is trickier to analyze, so I welcome people who know more to read the benchmarks page and help me understand the results better. Also, please scrutinize these benchmarks as much as you can.

I want to encourage and welcome contributions, bug reports, questions, suggestions. Nothing’s off the table: what would you want or what do you need from such a library?

I really want to hear from the community, both those eager to switch and potential future users.

Nice, I wrote a very naive pure Haskell implementation (chrisdone-archive/pgsql-simple on GitHub: a mid-level client library for the PostgreSQL database, intended to be fast and easy to use) back in the day, before I knew anything about writing performant Haskell. I’d recently been perusing the protocol documentation again, thinking I’d do a fresh ground-up implementation for fun, try for zero-allocation message handling if possible, and really take advantage of the implementation being in Haskell.

Nice to see someone take a swing at it!

I’m curious, what’s the thread safety situation?

That’s neat, and thanks!

As for thread safety, multiple threads sharing a connection will block until all results from previous queries/pipelines are fully consumed (or until a query errs). There’s a bit more to it than that, though: runPipeline requires the thread that sends a pipeline to also be the thread that consumes the results of all its statements, and the streaming query/pipeline functions require the thread that sends the query/pipeline to also be the thread that consumes the Stream fully (or until an error).
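
The blocking behaviour described above can be modelled with a lock that is held from the moment a query is sent until its results are fully consumed. The sketch below is only a toy model using an MVar from base. `Conn`, `newConn`, `logEvent`, and `queryAndConsume` are hypothetical names invented for this example, and hpgsql's real mechanism may well differ.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar

-- Hypothetical stand-in for a connection: a lock plus an event log.
data Conn = Conn
  { connLock :: MVar ()
  , connLog :: MVar [String]
  }

newConn :: IO Conn
newConn = Conn <$> newMVar () <*> newMVar []

logEvent :: Conn -> String -> IO ()
logEvent c s = modifyMVar_ (connLog c) (pure . (s :))

-- "Send a query" and consume all of its rows while holding the lock.
-- Any other thread calling this on the same Conn blocks until we finish.
queryAndConsume :: Conn -> String -> Int -> IO ()
queryAndConsume c name rows =
  withMVar (connLock c) $ \_ -> do
    logEvent c (name ++ ": start")
    mapM_ (\_ -> threadDelay 1000) [1 .. rows] -- pretend to read rows
    logEvent c (name ++ ": done")

main :: IO ()
main = do
  c <- newConn
  finished <- newEmptyMVar
  _ <- forkIO (queryAndConsume c "A" 5 >> putMVar finished ())
  _ <- forkIO (queryAndConsume c "B" 5 >> putMVar finished ())
  takeMVar finished
  takeMVar finished
  -- Whichever thread wins the lock logs start and done before the other starts.
  readMVar (connLog c) >>= mapM_ putStrLn . reverse
```

Which thread goes first is up to the scheduler, but the start/done pairs never interleave. The model does not capture the extra same-thread requirement for runPipeline and the streaming functions.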

I thought these might be reasonable constraints, although I did try a more relaxed version of runPipeline’s constraints to begin with, but was unable to get something that didn’t run into deadlocks in some situations. I can get into more detail if you’re interested!

This is cool, thank you for creating this!

I have a question about the semantics. Is it always safe to reuse a connection unless I got an IrrecoverableHpgsqlError? The docs are not 100% clear, e.g. this:

It is possible Hpgsql throws a different kind of exception. File a bug report if that happens, and if you know it came from Hpgsql, treat it like a IrrecoverableHpgsqlError.

But most of the time I have no idea where the exception came from. And in withTransactionMode it looks like an IrrecoverableHpgsqlError might get lost if an async exception arrives between tryAny and throw, which (unless I’m misreading the code) will break the promise.

(Also these TODOs are suspicious)

Thank you, Yuras! I think you were in fact the person that taught me a lot about asynchronous exceptions in a postgresql-simple PR review. I learned a lot then, so thanks!

I have a question about the semantics. Is it always safe to reuse a connection unless I got an IrrecoverableHpgsqlError? The docs are not 100% clear, e.g. this:

Oh, a different type of exception would be considered a bug in the library. I did try to wrap all potentially exception-throwing code in a way that turns any exception into an IrrecoverableHpgsqlError, precisely because otherwise it’s impossible to know what to do with the connection.

Since it’s the first release, though, I wouldn’t be surprised if something escaped me. Still, take this as a project goal commitment.

And in withTransactionMode it looks like an IrrecoverableHpgsqlError might get lost if an async exception arrives between tryAny and throw

I think I see what you’re saying. Very nice catch! I created a github issue for this, and will address it as soon as I can.
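
For readers following along, here is a minimal sketch of the kind of window being discussed and one common way to narrow it. It uses only Control.Exception from base. `withTxnSketch` and its begin/commit/rollback arguments are hypothetical names for this example; this is not hpgsql’s withTransactionMode and not necessarily the fix that will be adopted. It also uses try on SomeException, which, unlike tryAny, catches asynchronous exceptions as well.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Exception (SomeException, mask, throwIO, try)
import Data.IORef

-- Hypothetical transaction wrapper. The whole function runs masked, and
-- only the user-supplied body is run with async exceptions restored.
withTxnSketch :: IO () -> IO () -> IO () -> IO a -> IO a
withTxnSketch begin commit rollback body =
  mask $ \restore -> do
    begin
    result <- try (restore body)
    case result of
      Left (e :: SomeException) -> do
        -- Still masked: an async exception can only be delivered here if
        -- rollback performs an interruptible (blocking) operation, so the
        -- original exception is not silently replaced between try and throwIO.
        rollback
        throwIO e
      Right a -> commit >> pure a

main :: IO ()
main = do
  logRef <- newIORef []
  let note s = modifyIORef logRef (s :)
  r <- try (withTxnSketch (note "begin") (note "commit") (note "rollback")
              (throwIO (userError "boom") :: IO ()))
  case r of
    Left (_ :: SomeException) -> putStrLn "original exception was rethrown"
    Right () -> putStrLn "no exception"
  readIORef logRef >>= print . reverse -- ["begin","rollback"]
```

Masking does not make the window disappear entirely: interruptible operations inside rollback can still receive async exceptions, and an exception thrown by rollback itself would replace the original one. A real implementation would need to decide how to handle both cases.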

(Also these TODOs are suspicious)

Yes, there are a few warts I want to address in the codebase. This one is a potential problem “only” when opening new connections, so I thought I could do a first release before addressing them. I did take a lot more care once a connection is opened, and I have stress tests that have helped me catch and hone parts of the code.

Wow, I’m happy to hear this!

a different type of exception would be considered a bug in the library.

I see. This is an interesting approach. Though I think you’ll fight an uphill battle maintaining this promise.

Do you think your project is not big enough to play with linear types and the new Pure Borrow?

I was quite tempted to use linear streams in the public API, because the natural requirement that emerges from postgresql’s protocol is that query results must be consumed fully or until an error.

However, there is one use-case that I think prevents me from doing that: query cancellation.

streamResults <- queryS conn "select ..."
-- Force Stream until some criteria are satisfied, then cancel the query
cancelActiveStatement conn False

IIUC (but I have very little practical experience with linear types!) the user would then have to force the stream to get an exception, or cancelActiveStatement would have to take the stream as an argument, or other similar API choices/changes? And something similar would happen to pipelines.

But you can tell I didn’t explore the space too much.

I also haven’t read about pure borrow yet, so I’ll owe you on that one.

You can use more performance-optimized things with them, so I was interested in asking. I’ll also note that it’s not necessary to rework the high-level logic specifically for linear types; rewriting all the internals is enough, in my opinion.

Ah right, rethinking internals could be super nice. Do you have any ideas? I mean, even if you’re not familiar with the codebase, have you imagined some kind of recv buffer optimization, or anything else, for that matter? I think this would be a great learning experience for me, and I understand putting in effort might not be on your plate, so please feel free to suggest things even at a high level - I can read/learn and see if there’s something that can be done. Of course, if you wish to contribute more directly, I’m also very much open to that.

That is really great. I’ve started working on a pure Haskell PostgreSQL driver myself, but will happily contribute here instead if needed.

PostgREST is considering moving away from hasql in the long run (see “Vendor hasql”, issue #4823 in PostgREST/postgrest on GitHub); vendoring it is supposed to be the first step. But I’ve just proposed possibly skipping that and giving your library a try first.

I would love to take contributions! I did my best to have a good CI pipeline, test suite and tooling (better summarized in the repo’s Readme), so please feel free to create issues and PRs.

I wrote in that thread about hpgsql’s current limitations too, mostly the lack of auth methods and TLS encryption. Thanks for pointing it out to me.