Botan bindings devlog

Allow me to return the favor - I am thrilled to have drawn interest from others!

It is a wonderful turn of circumstance for me, because it is not incidental that I have an interest in seeing good open-source cryptography bindings in Haskell. Though I am not a professional cryptographer by any means, I have had my fair share of hands-on experience with it, and I know to be careful with a dangerous tool lest I lose a finger or two.

The whole reason I took up Haskell was to figure out how to write some tricky code involving recursion and cryptography, and though I suspect my writing leans more heavily towards prose than formality, I do have a point that I have been steadily working towards in my blog series about recursion.

The next article involves implementing sparse merkle trees in a rather trite / concise way (because the approach is important for a greater discussion), and for that I need good cryptography bindings.

So you see, I am interested in more than just bindings to libraries - there are specific things that I wish to do with it :slight_smile:

5 Likes

The Botan.Version, Botan.Error, and Botan.Utility modules are now functionally complete, and the repo has been updated.


This includes:

  • Version string, and major / minor / patch numbers
  • Error codes, exception types, and throwing methods
  • Constant time equality checking and hex encoding / decoding methods
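For anyone wondering what "constant time equality" means in practice, here is a minimal pure-Haskell sketch of the idea. The name `constantTimeEq` is made up for illustration and is not the binding's API; the bindings delegate to Botan's C routine, and GHC itself makes no timing guarantees, so treat this as an explanation rather than something to ship.

```haskell
import Data.Bits (xor, (.|.))
import qualified Data.ByteString as BS
import Data.List (foldl')
import Data.Word (Word8)

-- | Hypothetical illustration: compare two byte strings without stopping
-- at the first mismatch. Every byte pair is XORed and the results are
-- OR-ed together, so the work done does not depend on where they differ.
-- Length is still compared up front, which reveals only the length.
constantTimeEq :: BS.ByteString -> BS.ByteString -> Bool
constantTimeEq a b
  | BS.length a /= BS.length b = False
  | otherwise = foldl' (.|.) (0 :: Word8) (BS.zipWith xor a b) == 0
```

The point is the shape of the loop: an ordinary `==` may return as soon as it finds a differing byte, which is exactly the timing signal this style of comparison tries to avoid.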

This is giving me a good hands-on idea of how to handle Botan’s buffer / return style, and although I’d like to complete one or two of the crypto modules first, at this point I’m growing fairly confident in my ability to complete the rest of the bindings, with the caveat that I need someone to go over them too to make sure my low-level pointer / unsafePerformIO shenanigans are safe.

My next module target is probably going to be Hash.

5 Likes

How is Botan 3 in comparison to libsodium?

I’ve seen/read someone was working on libsodium bindings (Hecate?) so might it be a better idea to focus on one library together, or do they serve different purposes? :thinking:

1 Like

I think C bindings are generally pretty good value. The work is mostly mechanical, and the hard part is already done. FFI just grows Haskell for cheap.

4 Likes

Speaking as a practical user of libsodium for many years, libsodium makes algorithm choices for you, which makes for a smaller, lighter, simpler library. You don’t need to make any decisions, eg you just have hash :: ByteString -> ByteString. It aims for a simple, friendly, no-mistakes interface, at the cost of nuance - great for developing new / self-contained systems, but not so great for interacting with something that doesn’t use libsodium.

I haven’t gotten too far into it (after all, I’ve just gotten started with the bindings) but Botan is a bit more heavyweight. It is slightly more state-y, and it offers choices of algorithm and things like full control over incremental hashing, but it still tries to offer a consistent interface, rather than just collecting algorithms in a pile. So far, it seems sensible.

2 Likes

After a bit of help with some puzzling pointers from some wonderful people, I have the initial pieces of the Botan.Hash module functioning! I learned quite a bit about ForeignPtr today, and put it to good use:

We have:

  • A hash object can be initialized by algorithm name.
  • It can be queried for its name (still in CString format), and its digest length in bytes.
  • It is destroyed automatically when garbage collected.
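A rough sketch of the ForeignPtr idea, for readers who have not used it: the handle wraps a pointer together with a finalizer that the garbage collector runs once the handle is unreachable. Everything named here (`HashHandle`, `hashInitSketch`, `hashDigestLengthSketch`) is hypothetical, and plain `malloc` / `free` stand in for the C library's constructor and destructor so that the example runs without Botan installed.

```haskell
import Data.Word (Word64)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, malloc)
import Foreign.Storable (peek, poke)

-- | Hypothetical handle. The "object" is just one malloc'd Word64 that
-- stores a digest length, standing in for an opaque C struct.
newtype HashHandle = HashHandle (ForeignPtr Word64)

-- | Stand-in for initializing a hash object.
hashInitSketch :: Int -> IO HashHandle
hashInitSketch digestLen = do
  p <- malloc                          -- stand-in for the C constructor
  poke p (fromIntegral digestLen)
  -- finalizerFree runs when the handle is garbage collected. In real
  -- bindings this would be the library's own destroy function, imported
  -- as a function pointer.
  HashHandle <$> newForeignPtr finalizerFree p

-- | Query the object; withForeignPtr keeps it alive for the duration.
hashDigestLengthSketch :: HashHandle -> IO Int
hashDigestLengthSketch (HashHandle fp) =
  withForeignPtr fp $ \p -> fromIntegral <$> peek p
```

Loaded in GHCi, `hashInitSketch 32 >>= hashDigestLengthSketch` gives back the stored length, and no explicit destroy call appears anywhere in user code.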

This places us rather close to having our first-hash milestone!

5 Likes

I do write libsodium bindings, but in the context of the Haskell Cryptography Group, where all cryptography libraries have their place. :slight_smile:

2 Likes

Ah, thanks for the explanation. Sounds like a good addition, then :grin: I’ll be checking out your progress when you share it!

1 Like

We have first hash :smiley:


And a quick sanity check for comparison:

8 Likes

How do we deal with APIs where the secrets leak into Haskell land, e.g. because that’s how the API is designed?

As long as everything is on the C/C++ side, we can be somewhat confident it’s secure (written by experts and audited).

I think this is a general issue with crypto bindings, especially if they are low-level.

1 Like

My mind goes about ten different ways when I want to answer this. I have a lot to say about this, because it is an unassumingly hefty question. The important parts are:

The answer to the problem of secrets leaking is that it really is an application-level (or even OS- and machine-level) problem. Libraries can’t solve it; cryptographic purity is not unlike functional purity - the surrounding context must restrain itself from reaching in, and there is very little that the cryptographic primitives themselves can do about it. Only recently have people become willing to make the orders-of-magnitude efficiency tradeoff that cryptographic purity requires. Some day we’ll give a yawn as we spin up full homomorphic encryption sandboxes for everything, but right now our computer ecosystems, from the transistors up, are still built to take advantage of sharing and shortcuts for efficiency above all else, because our architectures date from an era where, fundamentally, that is what it took. :grimacing:

It is.


That’s the thousand-foot perspective, anyway.

More realistically, Haskell’s crypto problem in particular is that it is easy for things to accidentally stay alive due to a buildup of thunks; C and C++ are just as vulnerable to forgetting to free something if you’re handling allocations manually, which you’ll notice cryptographic primitives require you to do.

People used to write assembly and punchcards by hand, people used to write threading and graphics by hand, we still do cryptography ‘by hand’, in large part because our systems haven’t been adapted to these needs yet.


Cryptographic operations should be performed as atomically as possible, with everything zeroed and destroyed immediately after it is no longer necessary, rather than relying on garbage collection.

Long-lived cryptographic data should be kept to a minimum, and should only be things like immediate session keys rather than the passwords used to generate said keys. It is better to ask for a password or cert store access again than to keep it in memory for the lifetime of the app. More and more, the OS is handling this sort of thing. For the duration that it is in memory, it should be treated specially, so as to avoid accidentally printing it into something.

In part, this is why I was using memory, for ScrubbedBytes, but I’ll have to re-implement something like it for the botan bindings.


A lot of the practical answers to this will lie in how we design our higher-level API, which will probably involve constructs akin to withStoredPrivateKey keyRef $ \ key -> ..., and we’ll have to build stores and atomic cryptographic operations to guide the user away from ever handling raw / exposed keys and such.
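To make the `with...`-style idea a little more concrete, here is a minimal sketch of a scoped buffer that is zeroed before it is freed, even if the action throws. The name `withScrubbedBuffer` is hypothetical and this is not the eventual API; real bindings would more likely call the library's own scrubbing routine than `fillBytes`.

```haskell
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)

-- | Hypothetical helper: allocate n bytes for secret material, run the
-- action, then overwrite the bytes with zeros and free them. bracket
-- guarantees the cleanup runs on both normal exit and exceptions.
withScrubbedBuffer :: Int -> (Ptr Word8 -> IO a) -> IO a
withScrubbedBuffer n action = bracket (mallocBytes n) release action
  where
    release p = do
      fillBytes p 0 n   -- zero the secret before giving the memory back
      free p
```

The pointer must not escape the callback, and the callback's result must not be the secret itself, or the scrubbing achieves nothing. Enforcing that in the types is the harder design problem.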

7 Likes

Damn, so it sounds like there’s fertile ground for researching how to implement good memory-hygiene primitives for lazy functional languages!

5 Likes

In my mind, it is something like IO or STM, but a completely separate branch, being unable to perform side effects like print or even unsafePerformIO, and where secrets and keys cannot be returned directly, only their references. Ideally, operations should be provably halting, and should do as little as possible before returning control and the plaintext / result back to the user. Even the foreign call bindings could use this Crypto monad / DSL instead of IO.
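A toy sketch of that internal model, with loud caveats: every name here (`Crypto`, `KeyRef`, `importKey`, `xorWithKey`, `runCrypto`) is made up, the XOR "cipher" is a placeholder and not real cryptography, and key bytes still live in ordinary Haskell memory. It only shows the shape: no way to lift arbitrary IO in, and keys handed out as references only.

```haskell
import Data.Bits (xor)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

type KeyStore = IORef [(Int, [Word8])]

-- | Opaque reference to a key; a real module would not export the constructor.
newtype KeyRef = KeyRef Int

-- | Restricted monad: there is deliberately no liftIO, so code written
-- against this interface cannot print or perform other side effects.
newtype Crypto a = Crypto (KeyStore -> IO a)

unCrypto :: Crypto a -> KeyStore -> IO a
unCrypto (Crypto f) = f

instance Functor Crypto where
  fmap f (Crypto g) = Crypto (fmap f . g)

instance Applicative Crypto where
  pure x = Crypto (\_ -> pure x)
  Crypto f <*> Crypto x = Crypto (\s -> f s <*> x s)

instance Monad Crypto where
  Crypto m >>= k = Crypto (\s -> m s >>= \a -> unCrypto (k a) s)

-- | Store key bytes and hand back only a reference.
importKey :: [Word8] -> Crypto KeyRef
importKey bytes = Crypto $ \store -> do
  keys <- readIORef store
  let ident = length keys
  modifyIORef' store ((ident, bytes) :)
  pure (KeyRef ident)

-- | Placeholder operation (repeating-key XOR, NOT secure): uses the key
-- by reference and returns only the result, never the key.
xorWithKey :: KeyRef -> [Word8] -> Crypto [Word8]
xorWithKey (KeyRef ident) msg = Crypto $ \store -> do
  keys <- readIORef store
  case lookup ident keys of
    Nothing  -> ioError (userError "unknown key reference")
    Just []  -> ioError (userError "empty key")
    Just key -> pure (zipWith xor msg (cycle key))

-- | Run a computation against a fresh, private key store.
runCrypto :: Crypto a -> IO a
runCrypto (Crypto f) = newIORef [] >>= f
```

One known hole in this sketch is that a `KeyRef` can leak out of one `runCrypto` and be used in another; an `ST`-style phantom type parameter would be one way to close that.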

This is mostly me summing up my internal model rather than dictating any formal structure, however :slight_smile:

2 Likes

I’ve always wondered if one can (ab)use a combination of unlifted types and linear types to get some safety guarantees. That would prevent things like accidentally holding onto a key when signing, since the data type holding the signature is unlifted (and therefore there is no thunk that might contain the key), and linearity can be used to make sure that the secrets don’t accidentally go any place they’re not meant to.

4 Likes

Linear types are primarily intended for safety and unlifted types are partially intended for more safety, so I think it would count as a use and not an abuse.

3 Likes

A lot of the practical answers to this will lie in how we design our higher-level API, which will probably involve constructs akin to withStoredPrivateKey keyRef $ \ key -> ... , and we’ll have to build stores and atomic cryptographic operations to guide the user away from ever handling raw / exposed keys and such.

@cdepillabout and I keep the aforementioned concerns in mind when we add to or change anything in the password library.
The Password type explicitly has only one way of getting to the actual string of characters, which is the very obviously named unsafeShowPassword function.

I’m also very hesitant to allow a ToJSON instance for these exact reasons.
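The general pattern looks roughly like this. It is a stand-in written for illustration, with hypothetical names (`Secret`, `mkSecret`, `unsafeRevealSecret`), not the password library's actual source.

```haskell
-- | Hypothetical wrapper; a real module would not export the constructor.
newtype Secret = Secret String

mkSecret :: String -> Secret
mkSecret = Secret

-- | Show never reveals the contents, so stray logging or debug output
-- cannot leak the value.
instance Show Secret where
  show _ = "**REDACTED**"

-- | The single, loudly named escape hatch.
unsafeRevealSecret :: Secret -> String
unsafeRevealSecret (Secret s) = s
```

A `ToJSON` instance would be a second, quietly named way out of the wrapper, which is why it is worth being hesitant about.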

If anyone with crypto expertise would like to check out the library and advise on improvements, that’d also be very appreciated.

3 Likes

The repo has been updated with the following:

  • Completed low-level bindings for Botan.Error, Botan.Utility, Botan.Hash modules
  • Added missing Botan.Error error codes, botanErrorDescription function
  • Implemented missing Botan.Utility functions for memory scrubbing, base64 encoding and decoding
  • Implemented missing Botan.Hash functions for copying state, block size
  • Added flags for lower-case hex encoding
  • Removed memory dependency - see issue

It’s creeping towards minimally functional, though low-level it may be - it will clean up nicely with higher-level bindings later. I think my next target may be the RNG module low-level bindings.

5 Likes

Another day, another module! This time, it’s Random number generators!

The repo has of course also been updated, with the following:

  • Added Botan.Random (the RNG interface)
    • The Random random number generator opaque type
    • The RandomType type for specifying the type of Random
    • randomInit and randomInitWith functions to create a Random
    • randomGet and systemRandomGet functions for getting n bytes of random.
    • randomReseed and randomReseedFromRandom reseeding functions
    • randomAddEntropy function for adding your own bytes of entropy
    • NOTE: botan_rng_init_custom function is not implemented. It looks complicated, for now.
  • Switched a few alloc to malloc because long-lived references were being freed*
  • Added pure hashWith convenience method

* If someone could sanity check the ForeignPtr initialization pattern I’m using, I would appreciate it.
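For anyone willing to review, this is the general shape of an out-parameter initialization pattern, as a self-contained sketch. `fakeInit` is a hypothetical stand-in for a C init function that writes the new object's pointer into an out slot and returns an error code; it is not a Botan call, and `newObject` / `writeThenRead` are likewise made up.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CInt)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (alloca, finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- | Hypothetical stand-in for a C init function: allocates the object,
-- writes its address into the out slot, returns 0 on success.
fakeInit :: Ptr (Ptr Word8) -> Int -> IO CInt
fakeInit out size
  | size <= 0 = pure (-1)
  | otherwise = do
      obj <- mallocBytes size
      poke out obj
      pure 0

-- | Only the out slot lives in alloca memory, which is released when the
-- block exits. The object itself is malloc'd, so it outlives the block
-- and is owned by the ForeignPtr from then on.
newObject :: Int -> IO (ForeignPtr Word8)
newObject size = alloca $ \out -> do
  rc <- fakeInit out size
  if rc /= 0
    then ioError (userError ("init failed with code " ++ show rc))
    else peek out >>= newForeignPtr finalizerFree

-- | Use the object after the alloca block has long since exited.
writeThenRead :: ForeignPtr Word8 -> Word8 -> IO Word8
writeThenRead fp byte = withForeignPtr fp $ \p -> poke p byte >> peek p
```

The failure mode to avoid is putting the long-lived object itself inside `alloca`: that memory is gone as soon as the callback returns, and any pointer kept to it dangles.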


If there are any particular modules that you would like me to tackle next, or if you would instead like me to focus a little more on developing some higher-level bindings to Hash / Random, please let me know. Otherwise, I’m following the Botan FFI header which means that message authentication codes aka MAC would be the next module. I’m open to listening to the community on this :slight_smile:

8 Likes

This is great progress! I’m really enjoying these posts!

4 Likes

The sun has risen again. You know what that means. The repo has been updated again.

  • Added Botan.Mac module for Message Authentication Codes
    • Construction of Botan mac type strings (eg, “HMAC(SHA-256)”, “SipHash(2,4)”) is badly documented.
    • Some MACs require nonces, but this is also badly documented.
  • Refined handling of foreign pointers of botan objects / initializers / finalizers
    • Accurate types for opaque struct, pointer, and foreign pointer.
    • Uses FinalizerPtr type
    • Avoids an extra pointer indirection
    • Still fiddling with standardizing the nomenclature, only applied to Botan.Mac
      • Botan.Hash and Botan.Random still need retrofitting
    • Thanks to @glguy for help

Today’s efforts are going to be focused on updating the foreign pointer handling in Botan.Hash and Botan.Random, and getting a better sense of what needs to be standardized - eg, whether I will continue with straight 1:1 bindings, or instead do some encapsulation, considering:

  • Things like the hash / mac / rng type strings: macInitName "HMAC(SHA-256)" is both incredibly simple and incredibly awkward, and needs constants at the very least.
  • Checking for algorithm support is also awkward (read: effectively non-existent), so I need to pick a method of handling it, and standardize that as well.
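One possible direction for the type strings, sketched under assumptions: a small data type rendered to the string form, so that call sites never spell the names by hand. The types and functions here (`HashAlgo`, `MacAlgo`, `hashName`, `macName`) are hypothetical and not in the repo, and the "SHA-512" spelling is my assumption; only "HMAC(SHA-256)" and "SipHash(2,4)" come from the examples above.

```haskell
-- | Hypothetical hash algorithm names; more constructors would be added.
data HashAlgo = SHA256 | SHA512
  deriving (Eq, Show)

-- | Hypothetical MAC descriptions; SipHash carries its two round counts.
data MacAlgo
  = HMAC HashAlgo
  | SipHash Int Int
  deriving (Eq, Show)

-- | Render a hash algorithm as a type string.
hashName :: HashAlgo -> String
hashName SHA256 = "SHA-256"
hashName SHA512 = "SHA-512"   -- spelling assumed, not confirmed here

-- | Render a MAC description as a type string.
macName :: MacAlgo -> String
macName (HMAC h)      = "HMAC(" ++ hashName h ++ ")"
macName (SipHash c d) = "SipHash(" ++ show c ++ "," ++ show d ++ ")"
```

With something like this, `macInitName (macName (HMAC SHA256))` replaces the raw string, and a typo becomes a compile error instead of a runtime "not supported" code.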

So basically a grab-bag of things that I want to take care of before we get enough modules to make them difficult.

Also, by line count, we are almost 1/4 of the way through the botan ffi header! Things do get denser towards the bottom, especially with x509 certs, but still… :partying_face:


@david-christiansen
I’m glad you are! Tough work is so much less of a slog when there are people rooting for you along the way, and the community’s response has really been keeping me going :slight_smile:

6 Likes