Botan bindings devlog

ApothecaLabs · July 5, 2023, 6:17pm

I’ve made some significant progress over the last few hours. In short order, I have:

Sussed out the M1 macOS issues
Used sdl2 as an example of good C FFI
Created a new botan.cabal that uses pkgconfig-depends: botan-3 >= 3.0.0 instead of embedding botan_all.h and botan_all.cpp via monolithic --amalgamation and the Custom.hs build script
Ditched CPP and #include <botan/ffi.h> in favor of includes: botan/ffi.h in botan.cabal
Got it to print out the Botan 3 version string from FFI!
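For anyone following along, a hello-world of this kind can be sketched roughly as below. It assumes a .cabal stanza like the one described above (pkgconfig-depends: botan-3 >= 3.0.0, includes: botan/ffi.h), so that GHC finds the header and links libbotan-3. botan_version_string and botan_version_major are real functions in botan/ffi.h. The Haskell-side names (versionString, versionMajor) are illustrative only and are not necessarily what the repo uses.

```haskell
import Data.Word (Word32)
import Foreign.C.String (CString, peekCString)

-- const char* botan_version_string(void);
-- Points at a static string owned by Botan, so it must not be freed.
foreign import ccall unsafe "botan/ffi.h botan_version_string"
  botan_version_string :: IO CString

-- uint32_t botan_version_major(void);
foreign import ccall unsafe "botan/ffi.h botan_version_major"
  botan_version_major :: IO Word32

-- Copy Botan's C string into a Haskell String.
versionString :: IO String
versionString = botan_version_string >>= peekCString

-- Major version number as an Int.
versionMajor :: IO Int
versionMajor = fromIntegral <$> botan_version_major
```

Outside of cabal, the include path and -lbotan-3 have to be passed to GHC by hand (for example, taken from pkg-config).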

With this being a good step forward, I will keep moving in this direction.

Question: I would like to keep a devlog of this. I am sure to run into issues where I may need to ask more questions or discuss minutiae, but I don’t wish to pollute this topic. Being a relatively recent refugee from reddit, I am unsure of the expected protocol here: shall I spin off another topic specifically for discussing this effort to create botan-3 bindings? This thread could then be updated with milestones as relevant.

chreekat · July 5, 2023, 6:31pm

Hi @ApothecaLabs ,

Just keep writing! A mod can split this into a new discussion on your behalf. (I flagged it so a mod will see it.)

neil.mayhew · July 5, 2023, 6:31pm

I think it would be helpful to split it off into a new topic, but I for one am following this with interest.

ApothecaLabs · July 5, 2023, 8:07pm

I have created a repo, with the initial hello-world of printing botan’s version string:

I cannot promise that it works on Windows, but I have filled it with my best guess.

Something to note is that it uses the older ccall calling convention instead of the newer capi. Attempting to switch to capi yields some errors, but I did notice that sdl2 also still uses ccall, so I will need to investigate this before too long.
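The difference between the two conventions is easiest to see side by side. The sketch below binds libm's sin both ways. It is a stand-in chosen only because it links everywhere without Botan. With capi, GHC generates a small C stub that #includes the named header and calls the function through it, so the header's real prototype is used (this is also what makes binding to C macros possible). With ccall, GHC calls the symbol directly and trusts whatever type the Haskell declaration states. capi requires the CApiFFI extension.

```haskell
{-# LANGUAGE CApiFFI #-}

import Foreign.C.Types (CDouble (..))

-- ccall: GHC calls the symbol `sin` directly, trusting the Haskell type we wrote.
-- (The header name is ignored for ccall imports.)
foreign import ccall unsafe "math.h sin"
  c_sin_ccall :: CDouble -> CDouble

-- capi: GHC emits a C wrapper that includes math.h and calls sin from there,
-- so the C compiler checks the call against the header's prototype.
foreign import capi unsafe "math.h sin"
  c_sin_capi :: CDouble -> CDouble
```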

ApothecaLabs · July 6, 2023, 5:58pm

Minor update: things continue to go well, and I am mostly working through Botan’s FFI functions in order. Thus far, I have:

Filled out the Botan.Version module, dipping a toe in the water
Completed the Botan.Exception module, to prepare for handling errors.
Begun work on the Botan.Utility module.
I have implemented Botan.Utility.botanConstantTimeCompare. It is probably my first proper / nontrivial use of unsafePerformIO, so I’ll be scrutinizing the internals of a few modules (such as Data.ByteArray) to get a better sense of it vs inlining as I continue.
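A binding of this kind might look like the minimal sketch below. It assumes Botan’s int botan_constant_time_compare(const uint8_t* x, const uint8_t* y, size_t len), which returns 0 when the buffers match. The Haskell name constantTimeEq is hypothetical. The unsafePerformIO is justified only because the C function reads immutable inputs and has no observable side effects, so the result depends on the arguments alone.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr, castPtr)
import System.IO.Unsafe (unsafePerformIO)

-- int botan_constant_time_compare(const uint8_t* x, const uint8_t* y, size_t len);
-- Returns 0 when the buffers are equal, -1 otherwise.
foreign import ccall unsafe "botan/ffi.h botan_constant_time_compare"
  botan_constant_time_compare :: Ptr Word8 -> Ptr Word8 -> CSize -> IO CInt

-- Lengths are compared first, in variable time (lengths are usually not secret);
-- only the contents are compared in constant time, by Botan.
constantTimeEq :: BS.ByteString -> BS.ByteString -> Bool
constantTimeEq x y
  | BS.length x /= BS.length y = False
  | otherwise = unsafePerformIO $
      -- C only reads these immutable buffers, so borrowing them without a copy is safe.
      BSU.unsafeUseAsCStringLen x $ \(px, len) ->
        BSU.unsafeUseAsCString y $ \py -> do
          rc <- botan_constant_time_compare (castPtr px) (castPtr py) (fromIntegral len)
          pure (rc == 0)
```

Since running the call twice would be harmless here, unsafeDupablePerformIO would also be acceptable and slightly cheaper.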

The github isn’t updated with this quite yet, but I will commit and update when I do finish the Botan.Utility module.

Bodigrim · July 6, 2023, 6:42pm

Discourse does not allow me to like every post twice, so let me say that I’m extremely hyped about this development!

ApothecaLabs · July 6, 2023, 9:09pm

Allow me to return the favor - I am thrilled to have drawn interest from others!

It is a wonderful turn of circumstance for me, because it is not incidental that I have an interest in seeing good open-source cryptography bindings in Haskell. Though I am not a professional cryptographer by any means, I have had my fair share of hands-on experience with it, and I know to be careful with a dangerous tool lest I lose a finger or two.

The whole reason I took up Haskell was to figure out how to write some tricky code involving recursion and cryptography, and though I suspect my writing leans more heavily towards prose than formality, I do have a point that I have been steadily working towards in my blog series about recursion.

The next article involves implementing sparse merkle trees in a rather terse / concise way (because the approach is important for a greater discussion), and for that I need good cryptography bindings.

So you see, I am interested in more than just bindings to libraries; there are specific things that I wish to do with them.

ApothecaLabs · July 8, 2023, 6:33pm

The Botan.Version, Botan.Error, and Botan.Utility modules are now functionally complete, and the repo has been updated.

This includes:

Version string, and major / minor / patch numbers
Error codes, exception types, and throwing methods
Constant time equality checking and hex encoding / decoding methods
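Botan’s FFI reports failures as negative integer return codes, and botan_error_description turns a code into text. Below is a minimal sketch of the "throwing methods" idea. It assumes a single hypothetical BotanException type, which may differ from the repo's actual exception types. The helper is applied to botan_hex_encode, which writes exactly 2 * len characters with no NUL terminator.

```haskell
import Control.Exception (Exception, throwIO)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word32, Word8)
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical exception type: the raw error code plus Botan's description of it.
data BotanException = BotanException Int String
  deriving Show

instance Exception BotanException

-- const char* botan_error_description(int err);
foreign import ccall unsafe "botan/ffi.h botan_error_description"
  botan_error_description :: CInt -> IO CString

-- int botan_hex_encode(const uint8_t* x, size_t len, char* out, uint32_t flags);
foreign import ccall unsafe "botan/ffi.h botan_hex_encode"
  botan_hex_encode :: Ptr Word8 -> CSize -> Ptr CChar -> Word32 -> IO CInt

-- Throw for negative codes; pass others through (some calls use 1 as a meaningful result).
throwIfError :: CInt -> IO CInt
throwIfError rc
  | rc < 0 = do
      msg <- botan_error_description rc >>= peekCString
      throwIO (BotanException (fromIntegral rc) msg)
  | otherwise = pure rc

-- BOTAN_FFI_HEX_LOWER_CASE in botan/ffi.h.
hexLowerCase :: Word32
hexLowerCase = 1

hexEncode :: BS.ByteString -> IO BS.ByteString
hexEncode input =
  BSU.unsafeUseAsCStringLen input $ \(inPtr, len) ->
    allocaBytes (2 * len) $ \outPtr -> do
      _ <- throwIfError =<< botan_hex_encode (castPtr inPtr) (fromIntegral len) outPtr hexLowerCase
      -- Copy exactly 2 * len characters out of the temporary buffer.
      BS.packCStringLen (outPtr, 2 * len)
```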

This is giving me a good hands-on idea of how to handle Botan’s buffer / return style, and although I’d like to complete one or two of the crypto modules first, at this point I’m growing fairly confident in my ability to complete the rest of the bindings, with the caveat that I need someone to go over them too to make sure my low-level pointer / unsafePerformIO shenanigans are safe.
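The "buffer / return style" refers to functions shaped like int f(..., uint8_t out[], size_t* out_len). If the buffer is too small (or NULL), Botan stores the required size in *out_len and returns BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE. One way to wrap that is a hypothetical helper, withQueriedOutput, that queries the size first and then allocates. The sketch below uses it with botan_hex_decode.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word8)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr, castPtr, nullPtr)
import Foreign.Storable (peek, poke)

-- int botan_hex_decode(const char* hex_str, size_t in_len, uint8_t* out, size_t* out_len);
foreign import ccall unsafe "botan/ffi.h botan_hex_decode"
  botan_hex_decode :: Ptr CChar -> CSize -> Ptr Word8 -> Ptr CSize -> IO CInt

-- BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE in botan/ffi.h.
insufficientBufferSpace :: CInt
insufficientBufferSpace = -10

-- Two-pass helper: ask for the size with a NULL buffer, then allocate and call again.
-- Returns Left with Botan's error code on failure.
withQueriedOutput :: (Ptr Word8 -> Ptr CSize -> IO CInt) -> IO (Either Int BS.ByteString)
withQueriedOutput f =
  alloca $ \lenPtr -> do
    poke lenPtr 0                      -- pass 1: capacity 0, NULL buffer
    rc1 <- f nullPtr lenPtr
    if rc1 /= insufficientBufferSpace && rc1 /= 0
      then pure (Left (fromIntegral rc1))
      else do
        needed <- fromIntegral <$> peek lenPtr
        allocaBytes needed $ \buf -> do
          poke lenPtr (fromIntegral needed)  -- pass 2: exactly the requested size
          rc2 <- f buf lenPtr
          if rc2 /= 0
            then pure (Left (fromIntegral rc2))
            else do
              written <- fromIntegral <$> peek lenPtr
              Right <$> BS.packCStringLen (castPtr buf, written)

hexDecode :: BS.ByteString -> IO (Either Int BS.ByteString)
hexDecode hex =
  BSU.unsafeUseAsCStringLen hex $ \(hexPtr, len) ->
    withQueriedOutput (botan_hex_decode hexPtr (fromIntegral len))
```

This calls the C function twice, which is fine for deterministic operations such as hex decoding. For anything randomized or stateful, allocate a large enough buffer up front instead.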

My next module target is probably going to be Hash.

Vlix · July 9, 2023, 2:02am

How is Botan 3 in comparison to libsodium?

I’ve seen that someone (Hecate?) was working on libsodium bindings, so might it be a better idea to focus on one library together, or do they serve different purposes?

Ambrose · July 9, 2023, 2:07am

I think C bindings are generally pretty good value. The work is mostly mechanical, and the hard part is already done. FFI just grows Haskell for cheap.

ApothecaLabs · July 9, 2023, 2:30am

Speaking as a practical user of libsodium for many years: libsodium makes algorithm choices for you, which makes for a smaller, lighter, simpler library. You don’t need to make any decisions; e.g., you just have hash :: ByteString -> ByteString. It aims for a simple, friendly, no-mistakes interface at the cost of nuance, which is great for developing new / self-contained systems, but not so great for interacting with something that doesn’t use libsodium.

I haven’t gotten too far into it (after all, I’ve just gotten started with the bindings) but Botan is a bit more heavyweight. It is slightly more state-y, and it offers choices of algorithm and things like full control over incremental hashing, but it still tries to offer a consistent interface, rather than just collecting algorithms in a pile. So far, it seems sensible.
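Here is a minimal sketch that makes "choice of algorithm" and "incremental hashing" concrete. It uses Botan’s hash functions from botan/ffi.h (botan_hash_init, botan_hash_update, botan_hash_output_length, botan_hash_final, botan_hash_destroy) and hashes an input fed in as several chunks. The names hashChunks, check, and toHex are hypothetical. For brevity it uses a bare pointer with bracket rather than a ForeignPtr.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word32, Word8)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek)
import Numeric (showHex)

data HashStruct                    -- opaque botan_hash_struct
type BotanHash = Ptr HashStruct    -- botan_hash_t

foreign import ccall unsafe "botan/ffi.h botan_hash_init"
  botan_hash_init :: Ptr BotanHash -> CString -> Word32 -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_destroy"
  botan_hash_destroy :: BotanHash -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_output_length"
  botan_hash_output_length :: BotanHash -> Ptr CSize -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_update"
  botan_hash_update :: BotanHash -> Ptr Word8 -> CSize -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_final"
  botan_hash_final :: BotanHash -> Ptr Word8 -> IO CInt

-- Turn a non-zero return code into an IOError.
check :: String -> CInt -> IO ()
check what rc
  | rc == 0 = pure ()
  | otherwise = ioError (userError (what ++ " failed with code " ++ show rc))

-- Hash the chunks in order with the named algorithm; the object is destroyed on exit.
hashChunks :: String -> [BS.ByteString] -> IO BS.ByteString
hashChunks algo chunks =
  bracket acquire (\h -> botan_hash_destroy h >> pure ()) $ \h -> do
    mapM_ (update h) chunks
    n <- alloca $ \lenPtr -> do
      check "botan_hash_output_length" =<< botan_hash_output_length h lenPtr
      fromIntegral <$> peek lenPtr
    allocaBytes n $ \out -> do
      check "botan_hash_final" =<< botan_hash_final h out
      BS.packCStringLen (castPtr out, n)
  where
    acquire = withCString algo $ \name -> alloca $ \hp -> do
      check "botan_hash_init" =<< botan_hash_init hp name 0  -- flags must be 0
      peek hp
    update h chunk = BSU.unsafeUseAsCStringLen chunk $ \(p, len) ->
      check "botan_hash_update" =<< botan_hash_update h (castPtr p) (fromIntegral len)

-- Lower-case hex rendering for display.
toHex :: BS.ByteString -> String
toHex = concatMap byte . BS.unpack
  where
    byte w = let s = showHex w "" in if length s == 1 then '0' : s else s
```

Feeding "ab" and then "c" must give the same digest as hashing "abc" in one go, which is the standard SHA-256 test vector.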

ApothecaLabs · July 9, 2023, 4:30am

After a bit of help with some puzzling pointers from some wonderful people, I have the initial pieces of the Botan.Hash module functioning! I learned quite a bit about ForeignPtr today, and put it to good use:

We have:

A hash object can be initialized by algorithm name.
It can be queried for its name (still in CString format), and its digest length in bytes.
It is destroyed automatically when garbage collected.
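The ForeignPtr approach described above might be sketched as follows. This is an assumption about its shape, not the repo's code, and Hash, newHash, hashName, and hashOutputLength are hypothetical names. Foreign.ForeignPtr.newForeignPtr expects a FinalizerPtr, a C function returning void, but botan_hash_destroy returns an int. This sketch sidesteps that mismatch by attaching a Haskell finalizer via Foreign.Concurrent.newForeignPtr.

```haskell
import Data.Word (Word32)
import qualified Foreign.Concurrent as FC
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

data HashStruct                           -- opaque botan_hash_struct
newtype Hash = Hash (ForeignPtr HashStruct)

foreign import ccall unsafe "botan/ffi.h botan_hash_init"
  botan_hash_init :: Ptr (Ptr HashStruct) -> CString -> Word32 -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_destroy"
  botan_hash_destroy :: Ptr HashStruct -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_name"
  botan_hash_name :: Ptr HashStruct -> Ptr CChar -> Ptr CSize -> IO CInt
foreign import ccall unsafe "botan/ffi.h botan_hash_output_length"
  botan_hash_output_length :: Ptr HashStruct -> Ptr CSize -> IO CInt

check :: String -> CInt -> IO ()
check what rc
  | rc == 0 = pure ()
  | otherwise = ioError (userError (what ++ " failed with code " ++ show rc))

-- Create a hash object by algorithm name; Botan frees it once the ForeignPtr is collected.
newHash :: String -> IO Hash
newHash algo = do
  raw <- withCString algo $ \name -> alloca $ \out -> do
    check "botan_hash_init" =<< botan_hash_init out name 0
    peek out
  Hash <$> FC.newForeignPtr raw (botan_hash_destroy raw >> pure ())

-- Botan writes a NUL-terminated name; 64 bytes is assumed to be enough here.
hashName :: Hash -> IO String
hashName (Hash fp) = withForeignPtr fp $ \h ->
  allocaBytes 64 $ \buf -> alloca $ \lenPtr -> do
    poke lenPtr 64
    check "botan_hash_name" =<< botan_hash_name h buf lenPtr
    peekCString buf

-- Digest length in bytes.
hashOutputLength :: Hash -> IO Int
hashOutputLength (Hash fp) = withForeignPtr fp $ \h -> alloca $ \lenPtr -> do
  check "botan_hash_output_length" =<< botan_hash_output_length h lenPtr
  fromIntegral <$> peek lenPtr
```

One caveat from the GHC documentation: finalizers come with no guarantee of promptness, or even that they run at all. Anything that holds secrets should therefore also be destroyed explicitly.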

This places us rather close to having our first-hash milestone!

Kleidukos · July 9, 2023, 7:50am

I do write libsodium bindings, but in the context of the Haskell Cryptography Group, where all cryptography libraries have their place.

Vlix · July 9, 2023, 3:43pm

Ah, thanks for the explanation. Sounds like a good addition, then! I’ll be checking out your progress when you share it!

ApothecaLabs · July 9, 2023, 9:13pm

We have first hash

And a quick sanity check for comparison:

hasufell · July 10, 2023, 3:37am

How do we deal with APIs where the secrets leak into Haskell land, e.g. because that’s how the API is designed?

As long as everything is on the C/C++ side, we can be somewhat confident it’s secure (written by experts and audited).

I think this is a general issue with crypto bindings, especially if they are low-level.

ApothecaLabs · July 10, 2023, 1:36pm

My mind goes about ten different ways when I want to answer this. I have a lot to say about this, because it is an unassumingly hefty question. The important parts are:

The answer to the problem of secrets leaking is that it really is an application-level (or even OS- and machine-level) problem. Libraries can’t solve it. Cryptographic purity is not unlike functional purity: the surrounding context must restrain itself from reaching in, and there is very little the cryptographic primitives themselves can do about it. Only recently have people become willing to make the orders-of-magnitude efficiency tradeoff that doing it properly requires. Some day we’ll yawn as we spin up fully homomorphic encryption sandboxes for everything, but right now our computing ecosystems, from the transistors up, are still built to exploit sharing and shortcuts for efficiency above all else, because our architectures still descend from an era when, fundamentally, that is what it took.

Is this a general issue with crypto bindings? It is.

That’s the thousand-foot perspective, anyway.

More realistically, Haskell’s crypto problem in particular is that it is easy for things to accidentally stay alive through a buildup of thunks. C and C++ are just as vulnerable to forgetting to free something when you handle allocations manually, which, you’ll notice, cryptographic primitives require you to do.

People used to write assembly and punch cards by hand, and threading and graphics by hand; we still do cryptography ‘by hand’, in large part because our systems haven’t been adapted to these needs yet.

Cryptographic operations should be performed as atomically as possible, with everything zeroed and destroyed immediately after it is no longer necessary, rather than relying on the garbage collector.
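One small building block for "zeroed immediately after use" is a scoped buffer that is wiped on exit, even when an exception escapes. Below is a minimal sketch using only base. fillBytes is a memset through the FFI, so GHC will not optimize it away. withScrubbedBuffer and scrubAfter are hypothetical names, and Botan’s FFI also offers botan_scrub_mem for the same job. The sketch does nothing about copies made elsewhere, such as Haskell values derived from the buffer.

```haskell
import Control.Exception (finally)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Run an action, then zero n bytes at p whether the action returns or throws.
scrubAfter :: Ptr Word8 -> Int -> IO a -> IO a
scrubAfter p n act = act `finally` fillBytes p 0 n

-- A temporary buffer that is wiped before it is released.
withScrubbedBuffer :: Int -> (Ptr Word8 -> IO a) -> IO a
withScrubbedBuffer n k = allocaBytes n $ \p -> scrubAfter p n (k p)

-- Demonstration: a "secret" byte is visible inside the scope and zero afterwards.
demo :: IO (Word8, Word8)
demo = allocaBytes 1 $ \p -> do
  inside <- scrubAfter p 1 (pokeByteOff p 0 (0x5A :: Word8) >> peekByteOff p 0)
  after <- peekByteOff p 0
  pure (inside, after)
```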

Long-lived cryptographic data should be kept to a minimum, and should only be things like immediate session keys rather than the passwords used to generate said keys. It is better to ask for a password or cert store access again than to keep it in memory for the lifetime of the app. More and more, the OS is handling this sort of thing. For as long as it is in memory, it should be treated specially, so as to avoid accidentally printing it somewhere.

In part, this is why I was using the memory package, for its ScrubbedBytes type, but I’ll have to re-implement something like it for the Botan bindings.
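One property worth carrying over into such a re-implementation is that secrets should be hard to print by accident. Below is a minimal sketch that assumes a hypothetical Secret newtype, whose constructor a real module would not export. It only guards against printing and does not zero memory.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical wrapper; a real module would export the type but not the constructor.
newtype Secret = Secret BS.ByteString

-- Never reveal the contents, even in debug output or error messages.
instance Show Secret where
  show _ = "<secret>"

-- Only non-sensitive metadata is exposed.
secretLength :: Secret -> Int
secretLength (Secret bs) = BS.length bs

-- Records containing a Secret can still derive Show safely.
data Credentials = Credentials { username :: String, password :: Secret }
  deriving Show
```

Because Credentials derives Show, logging a whole record stays safe: the password field renders as <secret>.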

A lot of the practical answers to this will lie in how we design our higher-level API, which will probably involve constructs akin to withStoredPrivateKey keyRef $ \ key -> ..., and we’ll have to build stores and atomic cryptographic operations to guide the user away from ever handling raw / exposed keys and such.
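A withStoredPrivateKey of this shape could borrow the rank-2 "region" trick from ST. The callback must work for every phantom s, so a Key s cannot be returned from it. Below is a minimal sketch in which KeyRef, KeyStore, Key, newKeyStore, and toyTag are all hypothetical. toyTag is a stand-in for a real signing primitive, not cryptography.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.IORef (IORef, newIORef, readIORef)

-- Callers only ever hold a KeyRef; key material stays inside the store.
newtype KeyRef = KeyRef String
newtype KeyStore = KeyStore (IORef [(String, [Int])])  -- toy key material

-- A key tagged with a phantom region `s`, like ST's state token.
-- A real module would export neither constructor.
newtype Key s = Key [Int]

newKeyStore :: [(String, [Int])] -> IO KeyStore
newKeyStore entries = KeyStore <$> newIORef entries

-- The result type `a` cannot mention `s`, so the key cannot escape the callback.
withStoredPrivateKey :: KeyStore -> KeyRef -> (forall s. Key s -> IO a) -> IO (Maybe a)
withStoredPrivateKey (KeyStore ref) (KeyRef name) k = do
  keys <- readIORef ref
  case lookup name keys of
    Nothing -> pure Nothing
    Just material -> Just <$> k (Key material)

-- Placeholder for a real operation such as signing; NOT cryptography.
toyTag :: Key s -> [Int] -> Int
toyTag (Key k) msg = sum (zipWith (*) k msg)

-- Rejected by the type checker, because `s` would escape:
--   withStoredPrivateKey store ref pure
```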

chreekat · July 10, 2023, 1:59pm

Damn, so it sounds like there’s fertile ground for researching how to implement good memory-hygiene primitives for lazy functional languages!

ApothecaLabs · July 10, 2023, 2:16pm

In my mind, it is something like IO or STM, but a completely separate branch, being unable to perform side effects like print or even unsafePerformIO, and where secrets and keys cannot be returned directly, only their references. Ideally, operations should be provably halting, and should do as little as possible before returning control and the plaintext / result back to the user. Even the foreign call bindings could use this Crypto monad / DSL instead of IO.

This is mostly me summing up my internal model rather than dictating any formal structure, however.
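The STM comparison suggests one concrete encoding: a newtype over IO with no MonadIO instance, so only the primitives a module chooses to export can run inside it. Below is a minimal sketch with hypothetical names (Crypto, runCrypto, SecretRef, newSecret, secretLength, destroySecret). It does not address provable halting. A determined user could also still reach for unsafePerformIO unless something like Safe Haskell forbids it.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Like STM: a newtype over IO with no MonadIO instance, so arbitrary IO
-- (print, file access, ...) cannot be lifted into it from outside this module.
newtype Crypto a = Crypto (IO a)
  deriving (Functor, Applicative, Monad)

runCrypto :: Crypto a -> IO a
runCrypto (Crypto io) = io

-- Secrets are only handed out as references (constructor hidden in a real module).
newtype SecretRef = SecretRef (IORef (Maybe String))

newSecret :: String -> Crypto SecretRef
newSecret s = Crypto (SecretRef <$> newIORef (Just s))

-- Derived, non-secret information may leave the monad.
secretLength :: SecretRef -> Crypto (Maybe Int)
secretLength (SecretRef r) = Crypto (fmap length <$> readIORef r)

-- Drop the secret; later operations see Nothing. (A toy: this does not zero memory.)
destroySecret :: SecretRef -> Crypto ()
destroySecret (SecretRef r) = Crypto (writeIORef r Nothing)
```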

Probie · July 10, 2023, 2:28pm

I’ve always wondered if one can (ab)use a combination of unlifted types and linear types to get some safety guarantees. That would prevent things like accidentally holding onto a key when signing, since the data type holding the signature is unlifted (and therefore there is no thunk that might contain the key), and linearity can be used to make sure that the secrets don’t accidentally go any place they’re not meant to.
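The linear half of that idea can be sketched in GHC 9.0+ with LinearTypes. A callback must consume the Secret exactly once and can only leave behind unrestricted, public results. Everything here is hypothetical. Ur mirrors the type of the same name in linear-base, and encryptOnce is a toy one-time-pad XOR used as a placeholder, not real cryptography. The unlifted-types half is not covered.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LinearTypes #-}

import Data.Bits (xor)
import Data.Word (Word8)

-- Unrestricted wrapper: with LinearTypes on, a plain arrow in GADT syntax
-- declares an unrestricted field, so its contents may be used freely.
data Ur a where
  Ur :: a -> Ur a

-- Hypothetical one-time key. Its field is unrestricted so primitives in this
-- module can read it, but the Secret value itself must be consumed exactly once.
data Secret where
  Secret :: [Word8] -> Secret

-- Toy one-time-pad step (placeholder only): consumes the key.
encryptOnce :: Secret %1 -> [Word8] -> Ur [Word8]
encryptOnce (Secret pad) msg = Ur (zipWith xor pad msg)

-- The only way to obtain a Secret: the callback must use it exactly once
-- and can only hand back results wrapped in Ur.
withSecret :: [Word8] -> (Secret %1 -> Ur b) -> b
withSecret bytes k = case k (Secret bytes) of
  Ur b -> b

-- Rejected by the type checker:
--   withSecret pad (\s -> Ur s)    -- the secret would escape
--   withSecret pad (\s -> Ur ())   -- the secret is silently dropped
--   any callback that passes s to encryptOnce twice (pad reuse)
```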