Botan bindings devlog

BurningWitness · September 20, 2023, 5:57am

ByteString does not have a Bits instance.

Bits is a typeclass for integral types (says so on the tin) and arbitrary chunks of memory don’t fit this definition.

It would result in an ergonomic improvement for low-level binary operations

This is only useful when you need to treat memory chunks as finite fields and the only place I know that uses this is cryptography.

Ideally yes, Integer/Natural/Bool do not belong to the sized bitwise operation class, but these are small inconsistencies and burning bridges over them is too much work. Changes like these should just be scheduled for base-5, if the Committee ever decides to go for it.

As for the issues you’re facing, can’t you just use Botan’s multiple precision integers?

ApothecaLabs · September 21, 2023, 6:23pm

Boy howdy there was quite a response to this all!

I have my nose to the grindstone, and am working on responding to all of @david-christiansen 's feedback - it is taking a bit as I want to answer thoroughly.

@romes thx

@jhenahan deep inhale Congratulations sir, you win the internet.

@hasufell Thanks! I’m not getting paid for it yet, but I’d love to do this on a more permanent basis if I were, hence the budget request. Someone from Mercury did appear to reach out a few weeks ago, but I haven’t heard back from them yet.

@jackdk That seems about right!

@BurningWitness

That is indeed why ByteString does not have a valid implementation of Bits - Bits is concerned with representation or encoding, and necessarily includes signing bits and endianness, which affect binary functions like shift and rotate but not boolean functions like xor and complement. My observation is that we could split off a Boolean / Heyting class from Bits with about as much pain as the Semigroup / Monoid debacle, maybe less.

ApothecaLabs · October 3, 2023, 10:41pm

An update - Bcrypt and things

It’s been a little while since I’ve updated the devlog here; for the most part my activity for the last week or two has been focused on the proposal, and I’m just about done with an updated draft that accounts for all of the feedback - I’ll be posting that soon.

In the meantime, I still have been working on the bindings, and I have a lovely little update that I’ve been meaning to post for a few days.

The primary update is that I have:

Fixed Botan.Low.Bcrypt INSUFFICIENT_BUFFER_SPACE exception
- Botan.Low.Bcrypt unit tests now pass
Made improvements to Botan.Bcrypt for easier use
Fixed trailing NUL in bcrypt digests (may have been a regression)

By going through the C++ source, I discovered that there was an unmentioned size pointer check that was being compiled out in most cases, except for tests. In some cases Botan uses size pointers to return the final length of the output, and sometimes it also uses the same pointer to take the output buffer length as an input. It turns out we need to poke the length of the output buffer here, as it was erroneously succeeding before because we were providing a buffer of adequate size but the check was being compiled out.

Now Botan.Bcrypt should be good to go!

In addition to this, I did some work on public key exchange stuff because it is hanging indefinitely:

Began work on Botan.PubKey.KeyAgreement
- Discovered memory leak / hang in Botan.Low.PubKey.KeyAgreement
- Function hangs until terminated by OS (Ctrl-C signal is blocked)
- safe vs unsafe doesn’t seem to matter
- Occurs in botan_pk_op_key_agreement function
  - This is for DH, ECDH, Curve22519 key exchange, so high priority
- Checked some different versions of GHC to make sure its not a version-specific bug, need to test more thoroughly
- Will be going over C++ and Z-Botan source to figure it out

And now for something a little bit unusual - Merkle Trees!

I’ve been working on this project for a while, and I thought it would be good to show a bit of where this cryptography project is headed, long-term, while I finish updating the proposal pull request.

You see, I didn’t just idly decide to spend my time writing bindings to a cryptography library. I want to be able to express some very particular cryptography constructs, and I want to express them properly - and that means starting with a solid foundation. That foundation is now solid enough to begin building some of those constructs.

One of those constructs is Merkle, an abstraction of the core concept behind sparse merkle trees. A merkle tree is a data structure that hashes objects and combines their digests in order to provide integrity and cryptographic proofs for many pieces of data. Such structure form the basis of many distributed cryptosystems but often implementations are highly-concrete, focused on binary optimization rather than ease of understanding, and take pages upon pages to explain and justify everything. That never sat well with me.

Today, I’m here to show you how to build a sparse merkle tree with just a few lines of code, and we will do it in a way that barely mentions cryptography, and allows us to define arbitrary merkle data structures in an incredibly concise manner.

Are you ready to speedrun some merkle trees? We start with our ideal hash class from earlier:

data family Digest a

class Hash a where
    hash :: ByteString -> Digest a
    digest :: Digest a -> ByteString

A sparse merkle set / map is intimately related to a unital magma algebra, or what you might call a non-associative, non-commutative monoid. As such, we can define it by extending the Hash typeclass to Merkle by adding an empty and append function.

class (Hash a) => Merkle a where
    merkleHash :: ByteString -> Digest a
    merkleEmpty :: Digest a
    merkleAppend :: Digest a -> Digest a -> Digest a

One advantage of this approach is that its conceptual reliance on cryptography is low, an implementation detail. This becomes important in the future when I start blending this with indexed recursion schemes because there are more tight mathematical connections there (but we won’t get to that today we’re getting close).

Conceptually, merkleEmpty = hash "", merkleHash = hash, and merkleAppend a b = hash $ digest a <> digest b. However, a naive implementation such as that actually has some problems with what’s called a “preimage attack”. To make a long explanation short, it gives you the same digest for hashing a plain bytestring as you could get from appending two digests, and we need to be able to distinguish between the two cases (that is, force them to give different results).

We can solve this by adding salts / sigils, and differentiating between hash and merkleHash, and giving safe default implementations.

class (Hash a) => Merkle a where

    merkleHash :: ByteString -> Digest a
    default merkleHash :: (Hash a) => ByteString -> Digest a
    merkleHash = safeMerkleHash

    merkleEmpty :: Digest a
    default merkleEmpty :: (Hash a) => Digest a
    merkleEmpty = safeMerkleEmpty

    merkleAppend :: Digest a -> Digest a -> Digest a
    default merkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
    merkleAppend = safeMerkleAppend

safeMerkleHash :: (Hash a) => ByteString -> Digest a
safeMerkleHash bs = hash $ concat
    [ "#"
    , bs
    , ";"
    ]

safeMerkleEmpty :: (Hash a) => Digest a
safeMerkleEmpty =  hash "" -- Or all zeroes, or Nothing

safeMerkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
safeMerkleAppend a b | a == safeMerkleEmpty = b
safeMerkleAppend a b | b == safeMerkleEmpty = a
safeMerkleAppend a b = hash $ concat
    [ "$"
    , digest a
    , ":"
    , digest b
    , ";"
    ]

Note that because of the sigils, it is impossible to obtain any value from merkleEmpty or merkleAppend that you would from merkleHash, though one must take care to use merkleHash now (I should differentiate Digest and MerkleDigest to emphasize this).

So… what can we do with this unital magma hash / algebra?

Well, the first thing that comes to mind are folds, which we can use to trivially produce a merkle list or hash chain. Digests like these are often used for data integrity in things like torrents and distributed systems.

-- NOTE: Left-associative to match function application
infixl 5 <#>

(<#>) :: (Merkle a) => Digest a -> Digest a -> Digest a
(<#>) = merkleAppend

merkleFoldl :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldl = foldl merkleAppend merkleEmpty

merkleFoldr :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldr = foldr merkleAppend merkleEmpty

We’re just calculating the final digest here, but for real use cases we’d keep all of the hashes.

Now, on to the main thing - sparse merkle trees. Hash lists have been mostly supplanted by merkle trees, especially sparse ones, but they are just binary radix trees, also known as patricia trees. Before we implement our merkle tree, we need a few helper functions.

partitionBit :: (Bits (Digest a)) => Int -> [Digest a] -> ([Digest a],[Digest a])
partitionBit n = List.partition (not . (`testBit` n))

both :: (Bifunctor f) => (a -> b) -> f a a -> f b b
both f = bimap f f

Are you ready? Here we go! 6 lines!

merkleSetDigest :: forall a . (Merkle a) => [Digest a] -> Digest a
merkleSetDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [x] = x
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBit n xs)

It is trivial to obtain a sparse merkle set digest using unital magma hashing. Note that deduplication is unncessary if the inputs are already unique. If we ignore the type declaration and the deduplication, there are really only 4 lines that matter. Concise

So what’s going on? As we traverse the list of elements, we partition them by each bit, which naturally sorts all of the digests, and as we come back up, we perform a bunch of merkleAppends to join them. Note that an empty node is the empty hash, and that appending two empty hashes results in an empty hash, and appending an empty hash to anything results in that thing. This handles all of our sparseness for us automatically, with no extra work.

Now, this just calculates the final digest, what if we wanted to actually keep the entire tree structure?

merkleSetTree :: forall a . (Merkle a) => [Digest a] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleSetTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = (merkleEmpty, [])
    go _ [x] = (x, [])
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry join $ both (go (n - 1)) (partitionBit n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

In this version, we actually return the full merkle tree. Note the return type is effectively (TopDigest, [(ParentDigest, (LeftChildDigest, RightChildDigest))]), and that merkleSetDigest = fst . merkleSetTree. By simple convention, terminal nodes are implicit to the tree, and are not included in the list of nodes since their parent already contains them, and they themselves have no children. As a result, merkleSetTree [] = (merkleEmpty, []).

NOTE: One of the neat things about merkle sets as sets of elements is that they actually form a monoid hash, constructed from the unital magma hash. Since a merkle set is a set, we can construct a merkle set tree from elements in any order and get the same digest. This property becomes extremely useful in the future.

Now, this is pretty neat, but we can do one better. What’s better than a merkle set?

A merkle map:

partitionBitMap :: (Bits (Digest a)) => Int -> [(Digest a,b)] -> ([(Digest a,b)],[(Digest a,b)])
partitionBitMap n = List.partition (\ (k,_) -> not (testBit k n))

merkleMapDigest :: forall a . (Merkle a) => [(Digest a, Digest a)] -> Digest a
merkleMapDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [(k,v)] = merkleAppend k v
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBitMap n xs)

And a version that returns the full merkle map:

merkleMapTree :: forall a . (Merkle a) => [(Digest a, Digest a)] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleMapTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []  = (merkleEmpty, [])
    go _ [x@(k,v)] = let h = merkleAppend k v in (h, [(h,x)])
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry join $ both (go (n - 1)) (partitionBitMap n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

Note that merkleMapDigest = fst . merkleMapTree, and that although the return type appears the same as merkleSetTree, they have slightly different conventions and handling. Note that terminal nodes are now included in the list, because their children are their key and value digest, but to find out if you’ve reached the end, you have to look for the key in the list and not find it.

That’s a bit awkward, but we’ve sacrificed heavily in the name of conciseness here. In the real world we’d probably use a a more complex but also more performant implementation. These super short implementations are here to illustrate the simplicity of the concept.

Finally, observe that merkleSetDigest [a,b,c,...] == merkleMapDigest [(a,merkleEmpty),(b,merkleEmpty),(c,merkleEmpty),...] .

import Crypto.Merkle
import Crypto.Hash.SHA3
digests = fmap (merkleHash @SHA3) [ "fee","fi","fo","fum"]
t0 = merkleSetDigest digests
(t1,_) = merkleSetTree digests
pairs = fmap (\ dg -> (dg, merkleEmpty @SHA3)) digests
t2 = merkleMapDigest pairs
(t3,_) = merkleMapTree pairs
-- All of these should return the same value
t0
t1
t2
t3

This is due to the relationship that can also be observed between the Haskell Set and Map data types, and this is not a coincidence. Producing identical digests shows that our implementation is consistent and logical, and in the future, I will be using these merkle sets and maps to build a cryptographic version of Set and Map that automatically handle proof of integrity, inclusion, exclusion, insertion, and deletion - fun! However, in order to protect against preimage attacks, I will also add salts / sigils to the merkle set and map functions, to differentiate them.

This merkle tree stuff can be found in crypto-schemes and the SHA3 from crypto-schemes-botan to test it.

That was a lot, but I’ve been building up this post for a little while, and I’m glad to finally be posting it. I’ll be writing up a much more thorough blog on this topic in the future

romes · October 4, 2023, 7:59am

Absolutely glorious. I am very much looking forward to your full blog posts on this, all of them.

ApothecaLabs · October 7, 2023, 7:17pm

I have a simple but impactful update today.

Early on in the development of these libraries, I adopted a naming convention for the low-level functions in botan-low to help disambiguate them from the pure or higher-level versions of the same functions in botan. This lead to many functions being named fooCtxVerbIO, and in general, it was a bad idea, but I stuck with it because I didn’t want to spend all my time refactoring nomenclature, and I had other things to focus on.

However, lately, it has been increasingly annoying to work with, as it creates a whole bunch of visual noise for no reason, and so this update reverts that naming experiment:

Removed ‘Ctx’ and ‘IO’ from low-level function names
Function names in botan-low match the canonical botan function
This naming convention should be more or less final
Some individual function names do not match exactly
- Mostly public key encryption functions eg encryptCreate instead of pkOpEncryptCreate
botan now imports botan-low qualified a la import qualified Botan.Low.* as Low to disambiguate
Fixed botan-low tests
Fixed botan
Fixed botanium

Function names in botan are unchanged, meaning that they are still a mess, but will undergo a similar naming refactor / simplification in the future.

The update to the proposal is coming soon, but I took a breather for a few days after @Bodigrim 's advice to not worry or rush I’m glad I did because I finally got this done.

ApothecaLabs · October 12, 2023, 9:24pm

Its a day for another update !

Changes are mostly new unit tests for a few modules, and some benchmarks against crypton!

Added unit test for Botan.Low.MAC
Removed CBC-MAC (unsupported)
Added unit test for Botan.Low.HOTP
Started Botan.HOTP
Added botan-low-bench benchmark for testing botan Bcrypt and SHA3 against crypton

Aside from that, the proposal is in its final draft stage, so if you want to give feedback on it, this is your last chance!

ApothecaLabs · October 19, 2023, 6:23pm

Update time!

The repo has been updated with a few more things:

Added unit tests for Botan.Low.KDF
Improvements to Botan.KDF
Added unit tests for Botan.Low.KeyWrap
Improvements to Botan.KeyWrap
Unit test for Botan.Low.PwdHash
Fixed compilation on Linux due to filename case sensitivity (thanks @arybczak )

Slowly but surely we’re getting unit tests done for all of our modules, and we can see what passes, what fails, and what still needs work - it is quite revealing. We still need unit tests for:

PubKey and PubKey.*
RNG
SRP6
TOTP
Utility
X509

I’ve sort of just been going in alphabetical order, but it is nice to see we’re getting near the end

At some point soon I’ll need to split the unit tests into botan-low-tests for testing that the bindings are bound properly, and botan-tests for more exhaustive algorithm-specific tests including test vectors, and I’ll be further breaking them up into individual test targets by primitive too.

So why unit testing now so important?

While writing botan, I proceeded as a series of steps. First I wrote the FFI bindings, then wrote type-safe functions that compiled and ran. I tested in GHCi to make sure that these functions worked for at least one algorithm, but did not ensure that they worked for all algorithms, as doing exhaustive unit testing requires knowing the algorithm-specific quirks for each algorithm, basically going through them one by one.

As a result, many algorithms would throw botan error codes or have edge cases in behavior, and it wasn’t really feasible yet to build higher-level functions if I didn’t know that they would be valid paths for all algorithms. Now that the core of botan-low is properly implemented, I’ve focused on writing unit tests, and it’s been catching all sorts of things. A recent example of this are the offline SIV and CCM ciphers, which do not support online processing but simply return garbage - this was caught via the unit tests.

As I’ve gone through these tests, I’ve also been cataloguing which algorithms are available by combing through the C++ source, and testing and showing that they work for all algorithms - and documenting which ones don’t. This has lead to a large improvement in the representation of what algorithms are available, effectively shaping our higher-level algorithm data types in botan. It might not be noticeable if you hardcode a specific, commonly-used algorithm, but it has a large impact if there is a choice of algorithms.

Plus, there are good reasons for doing unit tests now, instead of later:

It shows us what is repetitive, unergonomic, and needs to be pushed to a higher level
It exposes missing functions (eg, MAC nonce size function)
It gives us the content for writing tutorials
It confirms / denies documentation
We can apply what we learned to shape botan, eg the structure of our ADTs

Basically, it helps us plan our next move more clearly. There’s enough code now that it is no longer trivial to ensure that changes here don’t break things there, and so unit tests help us keep moving forward.

So what’s next? After working on the proposal for so long, I need a moment to reorient my focus back to the code, but I’ve definitely got some specific goals:

Finishing up these unit tests
Getting to that large botan-bindings PR that has breaking changes
CI stuff and build instructions
Better functions for handling botan-low and botan algorithm names
Cataloguing what’s remaining to get botan-low ready for hackage candidate

Onwards!

ApothecaLabs · October 25, 2023, 4:35pm

Update: Unit Tests and Pubkey fixes

Once again, the repo has been updated, this time with a good number of things.

RNG unit tests
Improvements to SRP6
- Added synonyms for all the bytestring arguments
- Fixed srp6ClientAgree
SRP6 unit tests
- Documented function srp6_group_identifier missing from Botan FFI
TOTP unit tests
Improvements to Utility
- Replaced size calculations with pointer querying to fix InsufficientBufferSpace exception
Utility unit tests
Improvements to PubKey
- Found authoritative source of algorithm names and parameters
- Fixed GOST-34.10
- Removed X25519 (redundant synonym for Curve25519)
- Renamed awkward flag data types
PubKey unit tests
PubKey.Encrypt unit tests
PubKey.Decrypt unit tests
PubKey.Sign unit tests
PubKey.Verify unit tests
Improvements to PubKey.Sign
- ~~Fixed signFinish to use allocBytesQuerying~~
- Fixed signFinish to use upper bound and then trim
Found Botan’s test vectors in the C++ source

Almost all of the basic unit tests are done now, and though there are algorithm combos that fail, things are more or less passing in general. There are still some PubKey functions that need testing:

Key agreement
Key encapsulation
Algorithm-specific key loading functions

X509 certificates also don’t have unit tests yet, but I already know I’m going to need to take a deeper look at them so I’m holding off on that and considering it to be a larger, but important issue.

The big thing today, aside from all of the unit tests, is that pubkey encryption, decryption, signing, verification are now all working for all algorithms, though a bit fiddly because of parameters.

Signing was actually slightly broken, and was in some cases producing signatures that would fail verification. A closer look and some experimentation showed that sizes weren’t correct, which was slightly unexpected behavior given that many other functions allow you to query the size pointer by providing a null pointer for input.

In retrospect I realized that the size was non-deterministic due to the random generator. Querying once and running it again with the queried size would actually result in a different random number being generated internally, and thus the occasional InsufficientBufferSpace exception - which is why I couldn’t use querying for these functions (or any other function with non-deterministic buffers, really).

Armed with this knowledge, I figured out that I needed to both poke a size upper bound into a particular pointer, and read its actual size afterwards, and that pretty much got it working properly. Viola!

The Proposal Has been Approved for Recommendation

In some good news that actually happened last Friday, the TWG committee has voted to approve the proposal for recommendation to the Haskell Foundation.

This is a recommendation and not a commitment, and the Foundation still has to accept (or reject) it, but they are meeting Thursday, and this item is on the agenda! I obviously have high hopes, but I’ve made it this far with everyone’s help and feedback, and either way I’m happy to have had your support.

'Til next time!

ApothecaLabs · October 27, 2023, 7:02pm

Unit tests for botan-low minimally complete (minus X509)

It has been a busy few days, and I am happy to state today that unit tests for botan-low are minimally complete! I divvied up up the remaining unit tests into similar groups, and managed to get through them all, including a few fixes for key agreement and key encapsulation, which were the final pubkey operations.

Added unit tests for privKeyGetField and pubKeyGetField in Botan.Low.PubKeySpec
Added unit tests for PubKey.DH, DSA, ElGamal, and RSA
Added unit tests for PubKey.ECDH, ECDSA, and SM2
Added unit tests for PubKey.Ed25519, X25519
Added fixes and unit tests for PubKey.KeyAgreement
Added fixes and unit tests for PubKey.KeyEncapsulation

Admittedly, there’s still lots of testing improvements to be made, but every primitive should be passing tests for at least one algorithm, and most of them for several or all. Its awkward to use, but it all seems to be working, aside from a few specific things that need a more dedicated focus (which we are getting closer to now that these tests are done).

After writing these tests, I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types. It’s hard to get 100% algorithm combination testing coverage using only string data types, because not everything is compatible, and the tests help us tell what is - so algorithm ADTs are almost necessary to do better testing.

This would be a slight departure from the proposal already, but not a great one, and I believe the proposal process acknowledges this sort of change anyhow. I’m sure it will all shake loose as things develop.

The repo has been updated.

Also I just noticed I goofed on a commit message as I pasted the previous message for format but forgot to edit it - the second commit entitled Added unit tests for PubKey DH, DSA, ElGamal, and RSA should be entitled Added unit tests for PubKey ECDH, ECDSA, and SM2 - pay this no mind

arybczak · October 27, 2023, 11:07pm

I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types.

For the record, I like that botan-low API currently simply mimics botan’s FFI API by accepting strings and I think it should stay this way for mirroring and flexibility purposes.

Higher level API with ADTs should be a layer over it.

Ambrose · October 28, 2023, 4:06am

100%. C embeds in Haskell pretty much perfectly (except CBV structs ), so verbatim C bindings are crucial.

I even just call the module C. Like SDL.GPU.C.

Than the layer on top that abstracts over memory allocation and bracketing and other idioms, I call Simple.

ApothecaLabs · October 28, 2023, 5:29am

In that case, rather than give more responsibility to botan-low, I need to take some away and give it to botan instead.

There is a power-responsibility mismatch as over the last few months, botan-low has accrued a few more responsibilities than I’d like (online vs offline processing, unit testing, size queries) that make it more than just managing buffers. I need the strong typing to be able to make some necessary distinctions between algorithms, but moving those items up into botan would suffice instead of moving the types down.

ApothecaLabs · November 4, 2023, 7:45pm

An update: Starting work on X509 support

This is exciting! My first update after this project’s proposal was accepted for funding!

So what have I been up to? X509 certificates! Support for this functionality in the Botan C FFI is a little sparse, so I’ve been going over everything to see what’s what in particular. There’s actually quite a bit of functionality missing compared to the full Botan C++ library, and its a section of this project that I know I’ll be writing a little bit of C++ which may get contributed upstream. I’ve started adding lots of notes and data types, and X509 is probably going to be my primary focus for a bit while I regroup over some things and plan out some of the next month’s work.

X509 basic unit tests
- Testing against a single simple cert for now
- CRL unit tests pending
Stub data types for X509 Distinguished Names, CRL, Status codes, CA, CRL codes, CS

All in all though, the basic certificate loading and queries seem to be functional, but there is no support for Certificate Revocation Lists aside from checking against pre-existing CRLs that can’t be created or modified, and no support for Certificate Authorities or Stores - not in the FFI, which is why we’ll be getting our hands dirty with C++, to add to the C FFI.

The good thing is that what’s there does work, and that’s nice to check off the list. I’ve been digging through the Botan C++ source to see what else is needed - identifying gaps to fill. TLS is probably also going to need similar work, but I’ve been preparing for this

That’s it for the moment - 'til next time!

ApothecaLabs · November 7, 2023, 5:40pm

Update: Lots of C++

For the last few days, I’ve been working on extending Botan’s FFI to include X509 Certificate Authorities and Certificate Signing Requests. I’m a bit rusty with the C++, but I’m shaking off that rust quite nicely. Modern C++ has some nice things going for it, if you aren’t burdened with a legacy codebase.

I’ve hit a (temporary) wall with what I can achieve in my attempts to extend Botan’s FFI by adding C++ to the Haskell project directly. Now, this isn’t really bad, as it is something I very much expected, but the practical result is that I just need to spend a bit of time setting up build flags and things earlier than expected.

The problem is that Botan does not expose the proper headers to access the Botan_FFI C++ namespace, and as a result, I cannot obtain a reference to the encapsulated C++ objects which I need in order to pass it to the C++ functions that I am newly wrapping / calling.

For example, I have a Haskell PrivKeyPtr / botan_privkey_t and need to access the Botan::Private_Key underneath. I can see botan_privkey_t exposed in the FFI headers, and I have access to Botan::Private_Key via the public C++ headers, but I can only translate between the two if I have access to Botan’s internal Botan_FFI headers, which I don’t have.

-- Haskell
foreign import ccall unsafe hs_botan_x509_ca_create
    :: Ptr X509CAPtr
    -> X509CertPtr
    -> PrivKeyPtr
    -> Ptr CChar
    -> RNGPtr
    -> IO BotanErrorCode

// C FFI
int hs_botan_x509_ca_create(
    hs_botan_x509_ca_t* ca,
    botan_x509_cert_t cert,
    botan_privkey_t key,
    const char* hash_fn,
    botan_rng_t rng);

// C++
X509_CA X509_CA(
    const X509_Certificate &cert,
    const Private_Key &key,
    const std::string &hash_fn,
    RandomNumberGenerator &rng);

Now, this doesn’t stop me from implementing new C structs for wrapping C++ objects, nor does it stop me from using the old ones, but the issue is I can’t mix the two - the opaque botan_struct pointers remain opaque, despite the functions that need that access.

I could just re-implement botan_privkey_t as hs_botan_privkey_t and then be access its Botan::Private_Key for use with hs_botan_x509_ca_create / X509_CA(...), but then I can’t use that new privkey struct in the old functions. I’d have to rewrite the entire FFI from scratch, or at the very least clone the entire botan/ffi directory into cbits which I do not want to do.

The only real option is to fork Botan and extend the FFI from inside Botan C++. However, now you see why this is not really a problem - we were planning on contributing these improvements back upstream anyway, its just that we have to get set up to do that now. Oh bother

As a result of this, I am gating the extended FFI behind the XFFI flag. Anyone who wants to test the experimental FFI can instead clone the experimental botan-upstream fork fork, build and install it from source, and use the XFFI flag to enable the experimental FFI modules. There are some instructions in the README, and it looks mostly like this:

# Clone
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include

I’m still setting up everything, but I will be moving the C++ FFI code from cbits to the botan-upstream fork. This leaves regular users unaffected, and you can continue to use a regular Botan install via your favorite package manager / distribution if you wish.

For the moment the botan-upstream fork is unmodified, so there’s not much point to it yet, but that will be changing quite soon.

That’s all for now!

ApothecaLabs · November 9, 2023, 4:48pm

Great forking success!

So I’ve been doing a lot more tinkering with C++ for the last few days, and have gotten everything set up with the new botan-upstream fork The extended FFI is working, the new C structs now really wrap their C++ counterparts, and the new bindings can all find their symbols which is great. Plus, contributing back upstream to C++ Botan is just a pull request away now, too.

Botan’s FFI tools are reasonably nice to work with, even though they aren’t exposed externally. They have macros and defines for simple tasks, and for safely ‘visiting’ C++ -land from C, and so a lot of it boils down to safely casting between C and C++ as long as the arguments line up. I still have some questions, but for the most part Botan’s FFI follows a consistent formula, which I’ve been able to apply to the new structs and functions.

This has made for some pretty good progress, despite my not having touched C++ for the better part of a decade - I’ve even gotten as far as implementing botan_x509_ca_create, botan_x509_ca_create_padding, and botan_x509_ca_destroy, though it is untested for the moment until I build Botan.Low.X509.CA

Now, it may feel a bit odd to be gallivanting off to C++ -land when there’s still a lot to do in Haskell, but the C++ work to implement better X509 support was both a high-value target, as well as one of the larger “I know this is doable, but I don’t know how hard” tasks. Now I know roughly how hard it will be (and its better to know this early), and I must say I was pleasantly surprised to find that it was less effort than expected so far - I have worked with far worse codebases.

All in all, I’m pretty satisfied with how its going

ApothecaLabs · November 15, 2023, 9:56pm

Update: X509 data types, stub functions, and bindings

This update has been building for a few days now, and I’m glad to get it out. You see, you pull on one thread, and things start to unravel - and I can see why the original FFI authors didn’t want to implement more extensive X509 support.

Basic X509 functionality does exist, but is almost entirely limited to loading read-only objects, and I want to be able to use this library to create, sign, revoke certificates, encode / decode PEM and BER/DER formatted objects - so that means getting my hands dirty and doing it myself

It is a bit tedious: to enable one functionality you must implement this other function which requires these data types in order to create these other ones - and and so on until you’re basically forced to implement everything, together. Luckily, that’s our explicit goal, so its no skin off our back!

I’m mostly mirroring the existing C++ API, and after soaking my brain in the X509 C++ source for a few days, I’ve written C FFI data types and function stubs for almost all of the X509 data structures. When I started, the existing X509 data types were just basic read-only Certificates and CRLs - now, we have data types for:

Distinguished Names
Extensions
Certificates
Certificate Authority
Certificate Signing Requests
Certificate Options
Certificate Stores
Path validation
Supplemental data types

All of the things that you need to generate and sign new certificates! There still are some open questions (namely, returning arrays of things, and some questions of ownership), and I still need to do FFI types and function stubs OCSP support, and there’s some missing functions, but we’re getting there!

One large caveat, is that this is just the stub functions, and now I still have to go through and actually implement them all to call the C++ from C. There’s still a lot of work to do, but I’ve basically started in the middle, between the Haskell and the C++ - and with the bindings set, I can get to work on implementing the Haskell and the C++ together.

On the other hand, I’m basically defining the FFI types, and that’s one of the most important parts of programming. Plus, all of this X509 work is effectively doubling the size of the Botan C FFI (it now accounts for half of the FFI header file!) so I shouldn’t be surprised it’s taking a bit of time - I’ll bet the original FFI wasn’t built in a few weeks either!

These changes have been pushed to the repo and upstream fork.

If you want to check out this recent work you’ll have to clone the experimental botan-upstream fork, build and install it from source, and use the XFFI flag to enable the experimental FFI modules. There are some instructions in the README, or you can follow along here:

# Clone
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include

That’s all for the moment! Next up, I’ll be fleshing out these function stubs to be vending actual objects that we can interact with!

jackdk · November 15, 2023, 9:59pm

I might have missed it, so I’m sorry if you’ve already written this somewhere else, but: what’s the eventual plan for this fork? Are you going to be able to upstream these FFI changes?

ApothecaLabs · November 15, 2023, 10:00pm

Yes, I will (eventually) be submitting these FFI changes as a patch / pull request to the original Botan C++ library repo. It is an open issue that I am happy to contribute to.

ApothecaLabs · November 21, 2023, 5:00pm

Update: X509 Cert Options and Cert Store implementations

Now that I’ve got the X509 FFI functions all declared, it’s time to write implementations, and that’s what today’s update is about. Certificate options and certificate stores are more or less first-draft implemented on the C++ side, with certificate signing requests and certificate authorities on the way next. This means that I am very close to being able to generate new X509 certificates through the FFI, which would be a great chunk of new functionality. If I can achieve this, it will be a meaningful contribution not just for the Haskell library but to the upstream C++ library as well.

So close!

Since this is C++, I’ve run into a few questions of string / object ownership and lifetime - it compiles, and the types are correct, but I’ve not really tested it yet. There’s also a few functions that return a shared_ptr, but the Botan FFI is designed for unique_ptr. In general, I’m not up-to-date on the move / copy / ownership semantics yet, and the heavy use of auto and implicit casting makes it difficult to track what is going on. Another issue is a question string encodings, and whether Botan supports null bytes in X509 Distinguished Names, part of the X509 spec which inherits a lot of data types from ASN1 (which does allow for null bytes). I’m just using cstrings for now, but may have to go back and change some things to be uint8_t arrays instead - no biggie if I do.

I plan on posting an update to Botan’s open issue for the X509 FFI asking for some guidance, and in the mean time I’m forging ahead and I’m hoping to have the rest of the X509 interface at least roughly-implemented by the end of the week, barring the open questions about ownership and lifetimes, and for those I will at least make ready to apply the answers for when I do get them.

The repo and upstream fork have been updated. Let’s keep this train rolling!

While I’m working on that, check out the impact that this work has had on my github contribution activity - it’s neat to look back, and see the history of the project:

botan-commit-activity

I started this project back in July, and have managed to keep a consistent level of activity and momentum going ever since, making sure to get something meaningful done just about every day. Notably, the contributions to the botan-upstream fork do not appear here (yet) because it is a fork, but they will show if the fork’s changes are accepted in a PR - when that happens, this last month will be even greener

ApothecaLabs · November 29, 2023, 7:09pm

Update: More X509 C++ and Haskell over the Holiday

It’s been a bit of a quiet week; I meant to give an update days ago, but the holiday left me worn out, it has been difficult for me to write, so this update is a bit unpolished. I think I am in need of a small break, so I will be taking it a bit easy as I prepare the coming first monthly status update, and I will be trying to maintain a shorter, more frequent update schedule.

X509 Cert Store Bindings and Functions, and More

First off, what have I accomplished since the last update?

C++ Distinguished Name implementations
C++ Certificate Extension function declarations and stubs
- Still have some questions over how I’m going to implement this
C++ Certificate Options seem to be working
C++ Created Certificate_Store_In_SQL::find_key_unique, a unique_ptr version of find_key which produces a shared_ptr that we can’t use.
C++ / Haskell Fixed int botan_x509_cert_store_find_all_certs return type
C++ / Haskell X509 Path Validation names and types fixed
Reorganized Haskell X509 types (now spread across multiple modules)
Haskell Certificate Store bindings and implementations
Haskell Certificate Signing Request implementations
Haskell CRL Entry bindings and implementions
Haskell Certificate Authority implementions
Haskell Path validation implementions

I’ve spent a fair bit of time wrapping my head around STL pointer lifetimes, to which a friend has helpfully supplied the following ‘guide’ (I can’t tell if he is serious, or if it is in jest, or both):

What a monstrosity! C++ is taking a bit of a toll on me and I can feel my sanity slipping away…

Regardless, I’m having to do a bit more than just wrap functions at this point, I’m having to update or define new C++ functions to get the requisite access that I need (I can’t use shared_ptr objects due to the structure of the FFI, but there are still some functions using them so I have to write unique_ptr variants of them, as I haven’t yet determined if it is “safe” to relax the type constraint even though I suspect that it should auto-cast via move-constructors).

Aside from that, there’s a few functions that need to return jagged 2d arrays, which means an array of array sizes plus the actual size of the array, meaning you’re dealing with 3 pointer variables. Then there’s a few functions that return key-value pairs, which require 5 pointer variables (keys, key lengths, vals, val lengths, count).

Ignoring those two particular issues, I’m still missing Haskell bindings on some things (DN, Extensions), and about 30 or so missing C++ function implementations (Cert Authority, Cert Signing Request, CRL Entry, Cert Store) - otherwise things are coming along decently. Everything is at least declared and either bound or implemented in Haskell or C++, and having one side done makes it easier to fill in the other.

Some thoughts I have as I prepare the monthly update.

X509 explodes off into a fractal of complicated dependencies: for example, it is reliant on ASN1, and implementing bindings to ASN1 that is far out of scope and I’d rather do that in Haskell anyway.
Issues / questions over what encoding some things are in - in C++, string just means array of chars, are not cstrings, are not necessarily UTF8
May ignore DN objects for now in favor of passing string-encoded subject and issuer distinguished names
Extensions: singular object vs individual extension objects (which do have getters)
I worry about the ‘query with a null array pointer and we’ll populate the size pointer’ method
- Doesn’t scale to higher arrays (see issues with 2d jagged arrays)
- Potential efficiency / non-determinism issues with generated results
- Eg, the generate_crls method for sql cert stores
- I would prefer an actual size query function.
- May revisit this issue, but it applies to several
Some things are unavailable to me (Sqlite3 store implementation) at this time pending investigation
Learning some lessons on what I’d do differently if I were to start from scratch (but now is not the time for refactoring)

My biggest concern right now is that I know that the X509 stuff is still going to take continued effort, but I can’t let it take up the entire project’s focus. So, in order to make sure that I hit all of my deliverables in the next two months, I have to figure out a limit to what I’m going to focus on, and cut it off at a certain reasonable point.

Hence my thoughts are that the X509 interface I’ve so far declared is sufficient for now, and definitely an improvement over what was there already. I’ve done enough work on the X509 stuff to know that it should make it in as a deliverable, and that’s good. But it also illustrates why some high-value goals (like X509) are optional deliverables - at this point, it is also safe to say that the other optional deliverables may not make it this stage as I still have much to do.

That’s it for now - I’m preparing a more thorough update for next time to mark the end of the first month, see you then!

The repo and upstream fork have been updated.