Botan bindings devlog

ApothecaLabs · July 25, 2023, 4:21pm

Today’s update is small, but concerns an important thing. Up until now, I’ve been relying on the garbage collector to clean up references at its leisure, but it is good security to destroy sensitive objects as soon as you are done with them. I have also implemented FPE because it was low-hanging fruit.

Changes:

Created immediate destructor functions for objects eg cipherDestroy
Created functions using bracket that perform cleanup immediately (or as immediately as “The resource will be released as soon as possible” means), eg withCipher.
Implemented Botan.FPE module (Format Preserving Encryption), (untested)

The repo has been updated.

ApothecaLabs · July 26, 2023, 7:24pm

Update time!

In contrast to the previous minor update, this one is quite sizable I wanted to get at least the direct bindings / foreign calls done, and on that I feel I succeeded. In fact, I got so much done, that it is now easier to talk about what is left to do.

There are a scattering of functions that are still missing foreign calls and implementations, mostly related to X509, ZFEC, as well as the optional view functions, and a bunch of missing algorithm name constants - but otherwise, the the lowest-level raw bindings are about 95% complete, and the low-level non-idiomatic-Haskell layer is about 80% complete, pending those things.

Changes are:

Implemented loading functions for Diffie-Helmann, DSA, ElGamal, and RSA public and private keys
Implemented most of the remaining functions in Botan.PubKey, (failed) attempts to get view functions working
Added documentation to Botan.Hash, Botan.Random, made Botan.Random more consistent, minor notes and cleanup
Implemented Botan.PubKey.Ed25519 module
Implemented Botan.PubKey.KeyAgreement module
Implemented Botan.PubKey.KeyEncapsulation, allocBytesWith function
Implemented Botan.PubKey.X25519 module
Minor documentation / organization / missing foreign calls
Implemented Botan.SRP6 module
Started implementing Botan.X509 module, implemented foreign calls and some functions
Started implementing Botan.ZFEC (non-exposed, zfecEncode works but zfecDecode causes segfaults)
Added stub modules for Botan.PBKDF, Botan.Scrypt which contain only deprecated-because-generalized functions

There’s still a lot to do, but this library is rapidly approaching a level of completion and consistency that will soon be worth versioning and publishing to hackage, even if it is just the lower-level bindings!

ApothecaLabs · July 29, 2023, 6:27pm

Today’s update is fairly simple but sizeable. It consists mostly of two things - finishing up most of the X509 functions, and a lot of reorganizing under the hood. As part of final prep / shakedown for the first proper version release, I’ve split the library into multiple pieces, to keep raw bindings, low-level, and high-level implementations separate. That way, if you want to use the bindings but disagree with some of the higher-level implementations, you are free to use the lower-level libraries and implement your own!

Changes are:

Finished implementing Botan.X509 (mostly)
Split into multiple libraries
- botan-bindings for raw ffi
- botan-low for low-level, unidiomatic ByteString + IO interface
- botan for high-level idiomatic interface with type safety / referential transparency
Made some constant names a lot friendlier in Botan.Low

Remaining -bindings and -low issues are mostly algorithm constants and a few scattered functions, and a thorough testing to ensure that functions are implemented correctly. I’ve also been working on getting zfecDecode remaining, but it is proving to be finnicky as its output parameters are ill-described and involve either a contiguous 2D array, or an array of arrays, and it isn’t clear what the layout is.

The repo has been updated.

ApothecaLabs · August 1, 2023, 5:18pm

It’s update time again!

Today’s update is a bit of documentation,

botan-bindings has been more or less documented, first pass.
- Each function’s documentation should at least contain its C declaration
- Most functions have documentation cloned from the Botan C FFI header file
  - The correctness of the documentation is sometimes suspect, as some of it
    is taken from or inferred from the C++ documentation.
botan-low has had the module-level documentation added, and a few modules have been documented

In addition, we have the first sneak peek at the botan high-level bindings!

Implemented Botan.Prelude, Botan.Error, Botan.Utility, Botan.Version high-level idiomatic bindings
First stab at implementing Botan.Hash with high-level API design
- Implemented ‘pure / referentially transparent’ versions of hash functions
- Implemented HashSpec data type
- Implemented prototype Hash and IncrementalHash classes with data families
- Can use the HashSpec data type with the generic HashCtx and HashDigest, or can use the data families and get strongly-typed Ctx a and Digest a.
- Implemented MD5 as a hash / data family

The naming is a mess, and any of it may be changed wildly at any moment since I’m still shaking it down, but here’s what I’ve got:

The first nice thing is that there is now a HashSpec datatype for supported hash algorithms, based on Z-Botan's HashType as a reference (and saving me a lot of effort). This is of course far more convenient than remembering algorithm names yourself.

data HashSpec
    -- Cryptographic hashes
    = Blake2b Blake2bSize
    | Keccak1600 Keccak1600Size
    | MD4
    | MD5
    | RIPEMD160
    | SHA160
    | SHA224
    ...
    -- remainder omitted

hashCtxInitWithSpec :: HashSpec -> HashCtx
hashCtxInitWithSpec = hashCtxInit . hashSpecName

hashWithSpec :: HashSpec -> ByteString -> HashDigest
hashWithSpec spec = hashWithHashCtx (hashCtxInitWithSpec spec)

I’m thinking I might push the HashSpec down to botan-low, keeping the monotyped functionality together and leaving the polymorphic types for the higher-level botan, and I’ll be doing the same for other cryptographic primitives in the future after I decide upon a consistent handling, but this is a big leap forward in usability.

The second nice thing is the data families interface / API, which I adapted from a prior experimental project investigating different ways of class-ifying cryptographic primitives. It’s not quite the same as cryptonite, but I’ve found the data families + newtypes approach to be the most effective, albeit at the cost of some up-front boilerplate for the newtypes. It’s nice, because it enables type-applications to control the algorithm selection.

-- The data families and classes

data family Ctx a
data family Digest a

class Hash a where
    hashWithCtx :: Ctx a -> ByteString -> Digest a

class (Hash a) => IncrementalHash a where
    hashInit :: Ctx a
    hashUpdate :: Ctx a -> ByteString -> Ctx a
    hashUpdates :: Ctx a -> [ByteString] -> Ctx a
    hashFinalize :: Ctx a -> Digest a

-- MD5 implemented as an example

data MD5

newtype instance Ctx MD5 = MD5Ctx
    { getMD5Ctx :: HashCtx }

newtype instance Digest MD5 = MD5Digest
    { getMD5ByteString :: ByteString }
    deriving newtype (Eq, Ord)

type MD5Ctx = Ctx MD5
type MD5Digest = Digest MD5

instance Show (Digest MD5) where
    show :: Digest MD5 -> String
    show (MD5Digest bytes) = Text.unpack $ hexEncode bytes Lower

instance Hash MD5 where
    hashWithCtx :: Ctx MD5 -> ByteString -> Digest MD5
    hashWithCtx (MD5Ctx ctx) bytes = MD5Digest $ hashWithHashCtx ctx bytes

instance IncrementalHash MD5 where

    hashInit :: MD5Ctx
    hashInit = MD5Ctx $ hashCtxInit "MD5"

    hashUpdate :: MD5Ctx -> ByteString -> MD5Ctx
    hashUpdate (MD5Ctx ctx) bytes = MD5Ctx $ hashCtxUpdate ctx bytes

    hashUpdates :: MD5Ctx -> [ByteString] -> MD5Ctx
    hashUpdates (MD5Ctx ctx) chunks = MD5Ctx $ hashCtxUpdates ctx chunks

    hashFinalize :: MD5Ctx -> MD5Digest
    hashFinalize (MD5Ctx ctx) = MD5Digest $ hashCtxFinalize ctx

hash :: (IncrementalHash a) => ByteString -> Digest a
hash = hashFinalize . hashUpdate hashInit

It’s still a little in flux, and the Hash and IncrementalHash classes aren’t necessarily set in stone (for instance, the hashInit could belong to Hash or even a separate class entirely) but I think you get the gist. Let me know if you think this is unnecessarily complex compared to cryptonite's simple, non-data-family- Digest a and such - I want to get some design feedback before investing a ton of effort into doing it for all of the primitives.

Also, note that the HashSpec enum doesn’t interfere with our higher-level classy bindings, as the enum is simply a listing of botan-supported algorithms. We are free to define our own cryptographic schemes using the data families, and the high-level API might be split off onto its own in the future, in a sort of mirror symmetry to splitting off botan-bindings in the other direction.

That’s it for now. My next concrete goal is to create -Spec data types for other cryptographic algorithms & primitives, while I play around with the higher-level interfaces. That plus documentation should render the botan-low library more-or-less complete (minus ZFEC), though it may undergo nomenclatural overhaul.

I’ve noticed that my update pace has slowed down a little. This isn’t unexpected - after all, I have been giving updates every day or other for the last month, and while I’ve made it way further than I expected in that time, it has also been a pretty hefty pace to sustain. I feel I need to recoup some energy, and so I might take a bit of a break once I achieve the next major milestone; if this thread gets a bit quiet, don’t worry - I’m not going away, just taking a breather

david-christiansen · August 1, 2023, 5:56pm

Your pace has been astounding - a rest is well-deserved! Thanks again for the posts!

michaelpj · August 2, 2023, 8:57am

A thought: your three different libraries might be a good fit for the multiple public libraries feature in cabal, since I suspect they will probably always be released in lockstep. It also gives you a bit more freedom to rearrange things without having to use more of the Hackage package namespace.

arybczak · August 2, 2023, 12:08pm

The problem is that afaik stack doesn’t support them, so if we want to eventually start using botan in packages that currently use crypton (or make crypton use botan), it’s a no-go.

ApothecaLabs · August 8, 2023, 6:34pm

Hey all, I’m back after a nice break, and I’ve got an update! Its a bit messy - one big commit, because there was a bunch of refactoring, and the intermediate stages weren’t viable.

Here’s what’s changed:

The library hierarchy has been refined slightly, to better reflect each library’s intent.

There is now:
- botan-bindings - Raw FFI bindings
- botan-low - Low-level ByteString / IO interface
- botan - High-level idiomatic interface
- crypto-schemes - High-level backend-agnostic abstract cryptography interface
- crypto-schemes-botan - Botan backend for crypto-schemes
- I am also considering future additional libraries:
  - botanite - Frontend for drop-in cryptonite compatibility (if crypto-schemes proves insufficient)
  - botanium - A libsodium / saltine-like frontend with preselected algorithms for ultra-simple use.
The *.Hash modules are the best example of a complete vertical stack, from botan-bindings all the way to crypto-schemes-botan.
botan-bindings and botan-low are now effectively (though not perfectly) stable:
- there are still a few missing minor functions, and ZFEC is probably still broken.
- Some patterns have been removed from botan-low, just use the BOTAN_... constants from the botan-bindings layer for now.
Some nomenclature has been refined
Context objects like Hash and Cipher have been renamed to reflect their nature
- Eg, Hash → HashCtx, and hashInit → hashCtxInit
- This has been done to facilitate ergonomic naming in higher-level libraries
  - the -Spec naming format has been dropped, eg HashSpec is now just Hash
- Value objects like PrivKey and MP have not been renamed.
botan-low function names were suffixed with IO, typealiases added for algo names.
New modules PubKey.ECDH PubKey.ECDSA and PubKey.SM2 split off from PubKey
Algorithm-specific functions moved from PubKey to algorithm-specific modules
Random was renamed to RNG for consistency with Botan.
The PubKey module has not yet been, but may be renamed PK for consistency in the future, or its submodules moved from PubKey.* to PK.*.
Some lazy IO has been possibly fixed
- Some functions had laziness that I suspect was causing intermittent INSUFFICIENT_BUFFER_SPACE exceptions
- This may have been the cause of earlier ZFEC implementation failures, though the implementation now is probably incorrect due to experimentation.
Data types for algorithms are now available in the botan library
- botan-low init-with-name functions have been renamed to ...InitName...
- botan now provides ...Init functions which take an algorithm data type
- I’ve tried to make it ergonomic, and split off families as sub- data types.
  - Objects can now be initialized with data types like cipherModeInitIO (Cipher (CBC (AES AES256) PKCS7)) instead of with strings like cipherModeInitNameIO "AES-256/CBC/PKCS7"
Support for a bunch of newer (eg, post-quantum) algorithms has been added
- Dilithium
- Kyber
- McEliece
- SPHINCS+ is not working, but I’ll figure out what params it needs.
I am figuring out the ‘padding’ algorithm names (EME and EMSA) though there are still some questions and so naming is awkward. This will be refined

I’ve also been digging more into the Botan source, and have found that the FFI bindings have a few gaps.

The stream cipher capability is not exposed in botan FFI. It turns out that Z-Botan has some custom code in cbits/ that replicates this functionality.
FFI APIs for X.509 are insufficient at the moment. “For whatever reason we expose CRLs as a type, but they are barely usable. You can parse a CRL, but can’t even properly examine it, verify it, etc.”

See: FFI APIs for X.509 are insufficient · Issue #3627 · randombit/botan · GitHub
There is also X509- and TLS- related code in Z-Botan’s cbits/.
Some contexts cannot have their state copied (eg, Cipher*) and so making them referentially transparent is tricky. They do have a ‘clear’ operation, though, so as long as I only expose end-to-end operations and reset them at the end, they can be reused.

* Because things with a nonce… should only be used once…

(Linear types may be of interest here.)
Some algorithm parameters are referred to as just ‘padding’. I believe that there is EME for encryption, and EMSA for signing, but there may be some overlap.
A bunch of functions (especially PK-related) take an unused flags parameter, and we can just pass in 0.
There is no constant for some things, especially default flags.

This is all stuff I’ll be focusing on in the future.

Between the Z-Botan source code, the FFI docs and source code, and the C++ docs and source code, I am slowly figuring it all out. botan-bindings and botan-low are now effectively (though not perfectly) stable, and the botan library is beginning to take shape.

The repo has been updated.

I have enjoyed working on this set of libraries over the past month, having had the temporary luxury of focusing on it more or less full-time, and I would quite like to continue doing so on a more long-term basis.

There is still quite a lot of work to get everything stable and to write unit tests and tutorials and such, and it will still require the effort of several months before everything is production-grade quality.

However, I do not have the runway for that - at least, not without help.

I’ve worked on this for the last month, putting myself out here as a show of good faith. I would like to bring to the community a proposal for what I would like to achieve over the next three months, and discuss the means of funding it. If I have community support, I can continue to focus on this project indefinitely (or at least until it is done), and I would very much like to.

Please, let me know what you think of this, and in the meantime, I will get to writing the proposal and setting up an official project page.

jackdk · August 8, 2023, 11:05pm

It’s a tough one; I faced this dilemma with amazonka as well, where I really did not feel good about dropping another 150+ new service packages onto Hackage. At what point is it fair to say that stack just needs to get with the program? The initial work on multiple public libraries was done for a 2019 GSoC project, and then major improvements came through in cabal 3.8.1.0, released about a year ago. This isn’t a brand-new feature.

ApothecaLabs · August 8, 2023, 11:33pm

I have been following along this particular tangent, considering the recent proliferation of the various layers of libraries I am not sure yet what I will do, but for now it is all in flux still.

ApothecaLabs · August 11, 2023, 2:45pm

Today’s update is minor, but it pleases me greatly. I have fixed the ZFEC implementation, and we now have working forward error correction - that is, we can split data up into n shards such that only k of n shards are required to reconstruct the original data. It is useful in situations where you might lose packets during an unreliable stream of data, or while stored on a distributed network.

The changes are:

Fixed and completed Botan.Low.ZFEC and Botan.ZFEC
Implemented most of Botan.MAC

With ZFEC implemented, all of the botan-low modules are functioning, and with Botan.MAC, we have established the pattern that we will use throughout the rest of botan.

These changes have been pushed to the repo

Jappie · August 15, 2023, 2:42pm

how far is this in comparison to zbotan? I’m curious if I could use this to replace the crypton stuff in mysql-haskell.

ApothecaLabs · August 15, 2023, 5:37pm

It is a little hard to compare them - in some ways it is still behind Z-Botan but in others it has already surpassed it. It depends on what you need, and what you’re willing to put up with.

My bindings expose a slightly larger set of functionality, but it is currently a bit rough to use as I am still working on the ergonomics and there are a few issues here and there*. On the other hand, Z-Botan keeps a lot of functions in IO**, and is a fairly faithful implementation, so there’s not much distance between it and botan-low + botan if you import both.

Right now I’d continue to use what you are using, as a lot of my code isn’t rigorously tested yet and I am still working on adding a purer libsodium-like interface plus crypton-like classes and standalone algorithm data types, but once that is all done there shouldn’t be any major functionality missing compared to Z-Botan or crypton (pending a few C++ shims), and I’d totally say go for it.

* major ones being lack of X509 Cert store, TLS support which I need to shim / patch with C++ and have been experimenting with such for the last few days It does look like mysql-haskell does need the cert store stuff, so you’ll have to wait until I specifically get that in.

** Providing a pure layer over botan poses some challenge because the context objects are not-copy-able, by design, and this interferes with referential transparency unless we basically treat them as linear types. It is something I have spent the last few days furrowing my forehead over.

This also explains why botan only has RNG-as-in-entropy APIs and not deterministic PRNG / DRBG - the one deterministically seedable DRBG is an edge case with a warning and is not exported to the FFI.

To all: I have been experimenting with a few things, and will be publishing another update soon.

ApothecaLabs · August 23, 2023, 6:06pm

Hello all! I’m sorry for the silence, I had a rather unpleasant interaction that ended up wasting a rather lot of my time, and it has taken me a few days more to get back into the swing of things. It is really good to be back on this, instead - and I have quite the update for all of you.

A lot of it is experimental, and I’ve been sort of shaking things to see what’s loose - its hard to design cryptography APIs, and the first draft isn’t always the best candidate, even if it does work. Still, I’ve been doing a bit of a rundown of existing Haskell cryptography APIs, to see what we can use, what we can improve, and what we need to write ourselves.

I’m trying to find a comfortable middle ground between classes that describe how cryptographic systems are constructed (low-level, granular), and how they are used (high-level, abstract), and I have made a fair bit of progress as I continue cataloguing and comparing everything.

Here’s what has changed, broken down by library:

botan-bindings

No changes

botan-low

Minor renaming for consistency / specificity to avoid function role confusion
- May make less verbose again in the future depending on how well library nomenclature aligns
C++ integration tests for fixing some of the issues / missing functions in the Botan FFI
- Not pushed, separate branch since is more proof of concept right now still
- It works so far though.
- Should be able to use this to get FFI for the missing X509 and TLS stuff

botan

Minor renaming for consistency
Work on Botan.Bcrypt, notes on safety
Progress on Botan.Cipher - currently non-functional
Experimenting with interfaces in Botan.Random
Botan.Mac mostly completed, still kind of rough
Rough Botan.HOTP module created
Botan.KDF.kdf function implemented
Work on testing key pair cryptography
SPHINCS+ algorithm removed temporarily because it isn’t available in this version
Added Botan.Padding module that may be moved to crypto-schemes
- Am trying to create a hierarchy, padding is used in a few places but they are separate in botan
Created a small Botan.Prelude.Parser parser class that I think I will need
- Small parser for in-memory data, avoids reliance on big third party libraries.
- Will need for parsing bcrypt-like cryptographic object and schema identifiers
- May move to crypto-schemes (or some crypto-utilities library but I am loathe to split off yet another)

botanium

Created the highest-level botanium library with an interface based on libsodium / saltine
Stubbed out exported classes functions and data types
Implemented Botanium.Auth as a test

crypto-schemes

Began work on developing more abstract cryptography classes using saltine / botanium as a starting point
Added a Crypto.ByteVector data type for sized bytestrings
- Supports a well-defined Bits implementation unlike ByteString
- Acts slightly differently than an Integer due to fixes size and leading zeroes
- May be moved to some sort of crypto-utilities if necessary.
Added Crypto.Scheme class for algorithm / scheme identifiers to allow for bcrypt-like encoding / identification of cryptographic objects (still in progress)
Will be developing classes and data families for things like:
- Keys and Digests
- Hashing
- MAC / Auth code
- Ciphers
- Public Key Encryption / Signing
- Padding

botanite / crypton-botan

I have started investigating the difficulty of using botan or crypto-schemes-botan to back crypton
It does not have 100% 1:1 coverage of everything in crypton, but it appears to cover a fair bit of the exported interface, including the most common use cases.
crypton/ite internals are another matter though.
It does seem feasible, though some type signatures may change depending on whether we continue to rely on the memory package
The goal could be drop-in crypton compatibility, or simply a best-match effort to aid in migrating

Community Proposal

I have started sketching out a community proposal for this project. Right now it is mostly scratch notes of things to mention written in the proposal template, but I can now begin putting together a legible first draft.

We will now continue with our regularly scheduled updates

Kleidukos · August 23, 2023, 6:39pm

That’s great work!

Regarding the proposal, I have one comment for one thing in particular:

We need type classes, for cryptography, to allow for user-defined instances.

Be very, very wary of this. Cryptonite’s type classes are famous for needing frequent type annotations because inference gets very hard very quick. This is why I decided for Sel to use the mighty StrictByteString and leave the conversion to the consumers. It might be seen as bare-bones but eliminates a lot of headaches for everyone.

ApothecaLabs · August 23, 2023, 6:51pm

Thanks! It is really coming together now, isn’t it?

A good warning. I have encountered this irritation myself and it has driven much experimentation to see if I can do better. Right now, I am using data families. It is extra boilerplate compared to type families or phantom types, and there is still the occasional need for a type annotation, but the data families make it a lot easier to infer types, and so far it seems to be an improvement over crypton's phantom types.

Kleidukos · August 23, 2023, 7:28pm

Yeah it’s really nice

Now I’m intrigued.

Vlix · August 24, 2023, 12:09pm

I’d also vote for the crypto library to just give the resulting byte strings, for “layer-4” libraries to then make them into whatever that library needs or what the use case for it is. E.g. like password turning them into base64 and adding extra information for the password hashes.

It seems more natural for the crypto library to just do the crypto stuff. Other libraries can take up the role of improving the developer experience by adding APIs on top of the crypto library.

Kleidukos · August 24, 2023, 12:54pm

For the sake of usefulness I think it’s nice to have hexadecimal encoding and decoding handled for you by the library, however

Vlix · August 24, 2023, 1:11pm

For the sake of usefulness I think it’s nice to have hexadecimal encoding and decoding handled for you by the library, however

You mean as an ease-of-use way of marshalling the bytestring? I can see how that would help with eas-of-use in non-specific situations.

I’d argue it’ll only help the people who want to use that de-/encoding, or people who don’t care about the encoding, because anyone else would still have to import Data.ByteString.BaseXX or something to get what they want.
Though I guess if the library already depends on bytestring-baseXX you can just use that to provide the marshalling for a cheap QoL addition
Because if you only depend on an extra library for the ease-of-use marshalling, maybe that’s an unnecessary dependency? Especially if someone won’t ever use it, since now their library/application (in)directly depends on both e.g. bytestring-baseXX and bytestring-baseYY, while only using one of them.

[ TL;DR: If the library already has byte string encoding functionality in its dependencies, go for it. ]