Botan bindings devlog

ApothecaLabs · September 15, 2023, 10:15pm

It is a property of the SIV and CCM algorithms themselves - I dug into a NIST paper Structural Classification of Authenticated Encryption Schemes to be sure.

They are both ‘MAC-then-Encrypt’ AEAD algorithms, which mean that they must first process the plaintext once to authenticate it, and then a second time to encrypt it. This property is one of the reasons that ‘Encrypt-then-MAC’ AEAD algorithms are usually considered preferable.

The only real error is that botan returns garbage here and does not return an error code / throw an exception if you try anyway

ApothecaLabs · September 19, 2023, 3:06pm

It’s time for another update!

Community proposal submission

The first thing is something to celebrate - I have officially submitted the cryptography community project proposal to the Haskell Foundation - this proposal has been a few weeks in the making, so many thanks to all who provided feedback

What to work on next

Getting the community proposal written and implementing online / incremental processing for ciphers has been my focus for the last few weeks, and now that they’re both done I need to take a moment and figure out what to work on next. I have several options:

Get more unit tests written for botan-low
Incremental processing for other things
Merge the pull request that refactors botan-bindings
Work on the consistency / ergonomics / exported interface for botan-low & botan
Flesh out a proper task list to make these decisions easier

I am probably going to go for unit tests, or merging the pull request, because it is better to get the lower level stuff done first.

Cryptography Abstractions

In the mean time, while I’m deciding that, I like to like to work on the higher-level abstractions a bit to see how practical my lower-level decisions have been. With the recent addition of online processing to the mix, that has got me thinking about two things:

Lazy bytestrings which lead me on to a deeper dive into their internals - very satisfying
Higher level abstractions, which led me to re-sketch out some cryptography classes

The impetus is that existing efforts for cryptography typeclasses seem to focus on the nitty gritty mechanics of cryptography processing. Look at crypton/ite’s HashAlgorithm (or the (Block)Cipher classes), and the functions used to operate on them:

-- crypton/ite's HashAlgorithm class
class HashAlgorithm alg where
    hashBlockSize :: ...
    hashDigestSize :: ...
    hashInternalContextSize :: ...
    hashInternalInit :: ...
    hashInternalUpdate :: ...
    hashInternalFinalize :: ...

It just concretizes how it implemented incremental hashing, rather than abstracting hashing in general. I think we can build a much better typeclass hierarchy for cryptographic primitives, by analyzing the commonalities of the algorithm-specific functions.

You’d expect that a typeclass abstraction of hashing would be something much simpler, less restrictive, like:

-- An ideal Hash typeclass
class Hash alg where
        hash :: ByteString -> Digest alg

We can also just use the same interface but with lazy bytestrings for incremental processsing, and leave how the incremental processing is achieved up to the instance. This is significant because different crypto libraries may expose incremental processing differently (eg API differences), so we mustn’t assume anything about how it performs incremental processing.

-- The only difference is that it takes lazy bytestrings
class (Hash alg) => IncrementalHash alg where
        hash :: LazyByteString -> Digest alg

Then on top of these we can build more typeclasses specific to block sizes and updates and finalizing.

The Cipher typeclasses aren’t much better; the Cipher class actually only concerns itself with initialization, without defining anything about encryption / decryption. The BlockCipher class slightly improves on this, by providing individual functions for each block cipher mode, but that conflates all modes as a requirement, plus that means that the class should be called BlockCipherMode. Furthermore, AEADs and Stream ciphers are just sort of left to implement algorithm-specific encrypt / decrypt functions on their own. Other crypto primitives do not have classes.

Don’t get me wrong, extended typeclasses for block-based and incremental hashing / ciphers is great, but we shouldn’t use that as a base typeclass nor assume that all algorithms are capable of such things. Streaming hashes and ciphers do not have (input) block sizes and can take arbitrarily small inputs, but not all algorithms can do this (eg, block-based hashes and ciphers).

I’ve sketched more out in a file but its mostly just musings and scratchwork - I haven’t even tried to compile it yet - but a consistent interface has developed so we’re getting somewhere, and just about anything would be an improvement over the current situation.

Boolean typeclass proposal

We all know there’s some awkward stuff in base that could be improved, and Bits is one of them - second only to Num in awkwardness, the Bits class is too large, and unnecessarily lumps boolean algebra functions together with bit field functions. As a result, Bits is often not properly implemented, to the point that there is no proper instance for ByteString, and I believe this negatively affects the low-level-binary Haskell ecosystem in general.

This particular pet peeve is something that’s been churning in my mind for a while, and it has only gotten more relevant with the cryptography project. I believe we can fix these things by reworking things a little, and so I’ve started working on a base proposal to split off a Boolean typeclass from Bits, something like:

class (Eq a) => Boolean a where

    complement :: a -> a -- aka not

    and :: a -> a -> a
    or  :: a -> a -> a
    xor :: a -> a -> a

    -- Potentially added to complete the boolean algebra
    false :: a
    true  :: a

class (Boolean a) => Bits a where
    ...

By splitting off a Boolean class, we can now make an instance for ByteString for boolean logic, and it also leaves the Bits class as clearly pertaining to bit fields, with more clearly relevant functions.

It is still very rough, mostly being a bunch of notes, but I think there is a good point here - what do you think?

That’s all for now - til next time!

romes · September 19, 2023, 7:59pm

Amazing work Leo, thanks for all your progress reports, and community proposal!

jhenahan · September 20, 2023, 12:49am

ApothecaLabs:

class (Eq a) => Boolean a where

    complement :: a -> a -- aka not

    and :: a -> a -> a
    or  :: a -> a -> a
    xor :: a -> a -> a

    -- Potentially added to complete the boolean algebra
    false :: a
    true  :: a

What, no {-# MINIMAL nand #-}?

hasufell · September 20, 2023, 2:13am

@ApothecaLabs are you getting paid for this work?

If not, then the HF should either fund you directly or ask possible industry users if they would.

The Haskell Community needs this type of work.

jackdk · September 20, 2023, 2:58am

Is the algebraic class you’re looking for a Heyting Algebra, as seen in PureScript?

BurningWitness · September 20, 2023, 5:57am

ByteString does not have a Bits instance.

Bits is a typeclass for integral types (says so on the tin) and arbitrary chunks of memory don’t fit this definition.

It would result in an ergonomic improvement for low-level binary operations

This is only useful when you need to treat memory chunks as finite fields and the only place I know that uses this is cryptography.

Ideally yes, Integer/Natural/Bool do not belong to the sized bitwise operation class, but these are small inconsistencies and burning bridges over them is too much work. Changes like these should just be scheduled for base-5, if the Committee ever decides to go for it.

As for the issues you’re facing, can’t you just use Botan’s multiple precision integers?

ApothecaLabs · September 21, 2023, 6:23pm

Boy howdy there was quite a response to this all!

I have my nose to the grindstone, and am working on responding to all of @david-christiansen 's feedback - it is taking a bit as I want to answer thoroughly.

@romes thx

@jhenahan deep inhale Congratulations sir, you win the internet.

@hasufell Thanks! I’m not getting paid for it yet, but I’d love to do this on a more permanent basis if I were, hence the budget request. Someone from Mercury did appear to reach out a few weeks ago, but I haven’t heard back from them yet.

@jackdk That seems about right!

@BurningWitness

That is indeed why ByteString does not have a valid implementation of Bits - Bits is concerned with representation or encoding, and necessarily includes signing bits and endianness, which affect binary functions like shift and rotate but not boolean functions like xor and complement. My observation is that we could split off a Boolean / Heyting class from Bits with about as much pain as the Semigroup / Monoid debacle, maybe less.

ApothecaLabs · October 3, 2023, 10:41pm

An update - Bcrypt and things

It’s been a little while since I’ve updated the devlog here; for the most part my activity for the last week or two has been focused on the proposal, and I’m just about done with an updated draft that accounts for all of the feedback - I’ll be posting that soon.

In the meantime, I still have been working on the bindings, and I have a lovely little update that I’ve been meaning to post for a few days.

The primary update is that I have:

Fixed Botan.Low.Bcrypt INSUFFICIENT_BUFFER_SPACE exception
- Botan.Low.Bcrypt unit tests now pass
Made improvements to Botan.Bcrypt for easier use
Fixed trailing NUL in bcrypt digests (may have been a regression)

By going through the C++ source, I discovered that there was an unmentioned size pointer check that was being compiled out in most cases, except for tests. In some cases Botan uses size pointers to return the final length of the output, and sometimes it also uses the same pointer to take the output buffer length as an input. It turns out we need to poke the length of the output buffer here, as it was erroneously succeeding before because we were providing a buffer of adequate size but the check was being compiled out.

Now Botan.Bcrypt should be good to go!

In addition to this, I did some work on public key exchange stuff because it is hanging indefinitely:

Began work on Botan.PubKey.KeyAgreement
- Discovered memory leak / hang in Botan.Low.PubKey.KeyAgreement
- Function hangs until terminated by OS (Ctrl-C signal is blocked)
- safe vs unsafe doesn’t seem to matter
- Occurs in botan_pk_op_key_agreement function
  - This is for DH, ECDH, Curve22519 key exchange, so high priority
- Checked some different versions of GHC to make sure its not a version-specific bug, need to test more thoroughly
- Will be going over C++ and Z-Botan source to figure it out

And now for something a little bit unusual - Merkle Trees!

I’ve been working on this project for a while, and I thought it would be good to show a bit of where this cryptography project is headed, long-term, while I finish updating the proposal pull request.

You see, I didn’t just idly decide to spend my time writing bindings to a cryptography library. I want to be able to express some very particular cryptography constructs, and I want to express them properly - and that means starting with a solid foundation. That foundation is now solid enough to begin building some of those constructs.

One of those constructs is Merkle, an abstraction of the core concept behind sparse merkle trees. A merkle tree is a data structure that hashes objects and combines their digests in order to provide integrity and cryptographic proofs for many pieces of data. Such structure form the basis of many distributed cryptosystems but often implementations are highly-concrete, focused on binary optimization rather than ease of understanding, and take pages upon pages to explain and justify everything. That never sat well with me.

Today, I’m here to show you how to build a sparse merkle tree with just a few lines of code, and we will do it in a way that barely mentions cryptography, and allows us to define arbitrary merkle data structures in an incredibly concise manner.

Are you ready to speedrun some merkle trees? We start with our ideal hash class from earlier:

data family Digest a

class Hash a where
    hash :: ByteString -> Digest a
    digest :: Digest a -> ByteString

A sparse merkle set / map is intimately related to a unital magma algebra, or what you might call a non-associative, non-commutative monoid. As such, we can define it by extending the Hash typeclass to Merkle by adding an empty and append function.

class (Hash a) => Merkle a where
    merkleHash :: ByteString -> Digest a
    merkleEmpty :: Digest a
    merkleAppend :: Digest a -> Digest a -> Digest a

One advantage of this approach is that its conceptual reliance on cryptography is low, an implementation detail. This becomes important in the future when I start blending this with indexed recursion schemes because there are more tight mathematical connections there (but we won’t get to that today we’re getting close).

Conceptually, merkleEmpty = hash "", merkleHash = hash, and merkleAppend a b = hash $ digest a <> digest b. However, a naive implementation such as that actually has some problems with what’s called a “preimage attack”. To make a long explanation short, it gives you the same digest for hashing a plain bytestring as you could get from appending two digests, and we need to be able to distinguish between the two cases (that is, force them to give different results).

We can solve this by adding salts / sigils, and differentiating between hash and merkleHash, and giving safe default implementations.

class (Hash a) => Merkle a where

    merkleHash :: ByteString -> Digest a
    default merkleHash :: (Hash a) => ByteString -> Digest a
    merkleHash = safeMerkleHash

    merkleEmpty :: Digest a
    default merkleEmpty :: (Hash a) => Digest a
    merkleEmpty = safeMerkleEmpty

    merkleAppend :: Digest a -> Digest a -> Digest a
    default merkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
    merkleAppend = safeMerkleAppend

safeMerkleHash :: (Hash a) => ByteString -> Digest a
safeMerkleHash bs = hash $ concat
    [ "#"
    , bs
    , ";"
    ]

safeMerkleEmpty :: (Hash a) => Digest a
safeMerkleEmpty =  hash "" -- Or all zeroes, or Nothing

safeMerkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
safeMerkleAppend a b | a == safeMerkleEmpty = b
safeMerkleAppend a b | b == safeMerkleEmpty = a
safeMerkleAppend a b = hash $ concat
    [ "$"
    , digest a
    , ":"
    , digest b
    , ";"
    ]

Note that because of the sigils, it is impossible to obtain any value from merkleEmpty or merkleAppend that you would from merkleHash, though one must take care to use merkleHash now (I should differentiate Digest and MerkleDigest to emphasize this).

So… what can we do with this unital magma hash / algebra?

Well, the first thing that comes to mind are folds, which we can use to trivially produce a merkle list or hash chain. Digests like these are often used for data integrity in things like torrents and distributed systems.

-- NOTE: Left-associative to match function application
infixl 5 <#>

(<#>) :: (Merkle a) => Digest a -> Digest a -> Digest a
(<#>) = merkleAppend

merkleFoldl :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldl = foldl merkleAppend merkleEmpty

merkleFoldr :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldr = foldr merkleAppend merkleEmpty

We’re just calculating the final digest here, but for real use cases we’d keep all of the hashes.

Now, on to the main thing - sparse merkle trees. Hash lists have been mostly supplanted by merkle trees, especially sparse ones, but they are just binary radix trees, also known as patricia trees. Before we implement our merkle tree, we need a few helper functions.

partitionBit :: (Bits (Digest a)) => Int -> [Digest a] -> ([Digest a],[Digest a])
partitionBit n = List.partition (not . (`testBit` n))

both :: (Bifunctor f) => (a -> b) -> f a a -> f b b
both f = bimap f f

Are you ready? Here we go! 6 lines!

merkleSetDigest :: forall a . (Merkle a) => [Digest a] -> Digest a
merkleSetDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [x] = x
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBit n xs)

It is trivial to obtain a sparse merkle set digest using unital magma hashing. Note that deduplication is unncessary if the inputs are already unique. If we ignore the type declaration and the deduplication, there are really only 4 lines that matter. Concise

So what’s going on? As we traverse the list of elements, we partition them by each bit, which naturally sorts all of the digests, and as we come back up, we perform a bunch of merkleAppends to join them. Note that an empty node is the empty hash, and that appending two empty hashes results in an empty hash, and appending an empty hash to anything results in that thing. This handles all of our sparseness for us automatically, with no extra work.

Now, this just calculates the final digest, what if we wanted to actually keep the entire tree structure?

merkleSetTree :: forall a . (Merkle a) => [Digest a] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleSetTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = (merkleEmpty, [])
    go _ [x] = (x, [])
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry join $ both (go (n - 1)) (partitionBit n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

In this version, we actually return the full merkle tree. Note the return type is effectively (TopDigest, [(ParentDigest, (LeftChildDigest, RightChildDigest))]), and that merkleSetDigest = fst . merkleSetTree. By simple convention, terminal nodes are implicit to the tree, and are not included in the list of nodes since their parent already contains them, and they themselves have no children. As a result, merkleSetTree [] = (merkleEmpty, []).

NOTE: One of the neat things about merkle sets as sets of elements is that they actually form a monoid hash, constructed from the unital magma hash. Since a merkle set is a set, we can construct a merkle set tree from elements in any order and get the same digest. This property becomes extremely useful in the future.

Now, this is pretty neat, but we can do one better. What’s better than a merkle set?

A merkle map:

partitionBitMap :: (Bits (Digest a)) => Int -> [(Digest a,b)] -> ([(Digest a,b)],[(Digest a,b)])
partitionBitMap n = List.partition (\ (k,_) -> not (testBit k n))

merkleMapDigest :: forall a . (Merkle a) => [(Digest a, Digest a)] -> Digest a
merkleMapDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [(k,v)] = merkleAppend k v
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBitMap n xs)

And a version that returns the full merkle map:

merkleMapTree :: forall a . (Merkle a) => [(Digest a, Digest a)] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleMapTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []  = (merkleEmpty, [])
    go _ [x@(k,v)] = let h = merkleAppend k v in (h, [(h,x)])
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry join $ both (go (n - 1)) (partitionBitMap n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

Note that merkleMapDigest = fst . merkleMapTree, and that although the return type appears the same as merkleSetTree, they have slightly different conventions and handling. Note that terminal nodes are now included in the list, because their children are their key and value digest, but to find out if you’ve reached the end, you have to look for the key in the list and not find it.

That’s a bit awkward, but we’ve sacrificed heavily in the name of conciseness here. In the real world we’d probably use a a more complex but also more performant implementation. These super short implementations are here to illustrate the simplicity of the concept.

Finally, observe that merkleSetDigest [a,b,c,...] == merkleMapDigest [(a,merkleEmpty),(b,merkleEmpty),(c,merkleEmpty),...] .

import Crypto.Merkle
import Crypto.Hash.SHA3
digests = fmap (merkleHash @SHA3) [ "fee","fi","fo","fum"]
t0 = merkleSetDigest digests
(t1,_) = merkleSetTree digests
pairs = fmap (\ dg -> (dg, merkleEmpty @SHA3)) digests
t2 = merkleMapDigest pairs
(t3,_) = merkleMapTree pairs
-- All of these should return the same value
t0
t1
t2
t3

This is due to the relationship that can also be observed between the Haskell Set and Map data types, and this is not a coincidence. Producing identical digests shows that our implementation is consistent and logical, and in the future, I will be using these merkle sets and maps to build a cryptographic version of Set and Map that automatically handle proof of integrity, inclusion, exclusion, insertion, and deletion - fun! However, in order to protect against preimage attacks, I will also add salts / sigils to the merkle set and map functions, to differentiate them.

This merkle tree stuff can be found in crypto-schemes and the SHA3 from crypto-schemes-botan to test it.

That was a lot, but I’ve been building up this post for a little while, and I’m glad to finally be posting it. I’ll be writing up a much more thorough blog on this topic in the future

romes · October 4, 2023, 7:59am

Absolutely glorious. I am very much looking forward to your full blog posts on this, all of them.

ApothecaLabs · October 7, 2023, 7:17pm

I have a simple but impactful update today.

Early on in the development of these libraries, I adopted a naming convention for the low-level functions in botan-low to help disambiguate them from the pure or higher-level versions of the same functions in botan. This lead to many functions being named fooCtxVerbIO, and in general, it was a bad idea, but I stuck with it because I didn’t want to spend all my time refactoring nomenclature, and I had other things to focus on.

However, lately, it has been increasingly annoying to work with, as it creates a whole bunch of visual noise for no reason, and so this update reverts that naming experiment:

Removed ‘Ctx’ and ‘IO’ from low-level function names
Function names in botan-low match the canonical botan function
This naming convention should be more or less final
Some individual function names do not match exactly
- Mostly public key encryption functions eg encryptCreate instead of pkOpEncryptCreate
botan now imports botan-low qualified a la import qualified Botan.Low.* as Low to disambiguate
Fixed botan-low tests
Fixed botan
Fixed botanium

Function names in botan are unchanged, meaning that they are still a mess, but will undergo a similar naming refactor / simplification in the future.

The update to the proposal is coming soon, but I took a breather for a few days after @Bodigrim 's advice to not worry or rush I’m glad I did because I finally got this done.

ApothecaLabs · October 12, 2023, 9:24pm

Its a day for another update !

Changes are mostly new unit tests for a few modules, and some benchmarks against crypton!

Added unit test for Botan.Low.MAC
Removed CBC-MAC (unsupported)
Added unit test for Botan.Low.HOTP
Started Botan.HOTP
Added botan-low-bench benchmark for testing botan Bcrypt and SHA3 against crypton

Aside from that, the proposal is in its final draft stage, so if you want to give feedback on it, this is your last chance!

ApothecaLabs · October 19, 2023, 6:23pm

Update time!

The repo has been updated with a few more things:

Added unit tests for Botan.Low.KDF
Improvements to Botan.KDF
Added unit tests for Botan.Low.KeyWrap
Improvements to Botan.KeyWrap
Unit test for Botan.Low.PwdHash
Fixed compilation on Linux due to filename case sensitivity (thanks @arybczak )

Slowly but surely we’re getting unit tests done for all of our modules, and we can see what passes, what fails, and what still needs work - it is quite revealing. We still need unit tests for:

PubKey and PubKey.*
RNG
SRP6
TOTP
Utility
X509

I’ve sort of just been going in alphabetical order, but it is nice to see we’re getting near the end

At some point soon I’ll need to split the unit tests into botan-low-tests for testing that the bindings are bound properly, and botan-tests for more exhaustive algorithm-specific tests including test vectors, and I’ll be further breaking them up into individual test targets by primitive too.

So why unit testing now so important?

While writing botan, I proceeded as a series of steps. First I wrote the FFI bindings, then wrote type-safe functions that compiled and ran. I tested in GHCi to make sure that these functions worked for at least one algorithm, but did not ensure that they worked for all algorithms, as doing exhaustive unit testing requires knowing the algorithm-specific quirks for each algorithm, basically going through them one by one.

As a result, many algorithms would throw botan error codes or have edge cases in behavior, and it wasn’t really feasible yet to build higher-level functions if I didn’t know that they would be valid paths for all algorithms. Now that the core of botan-low is properly implemented, I’ve focused on writing unit tests, and it’s been catching all sorts of things. A recent example of this are the offline SIV and CCM ciphers, which do not support online processing but simply return garbage - this was caught via the unit tests.

As I’ve gone through these tests, I’ve also been cataloguing which algorithms are available by combing through the C++ source, and testing and showing that they work for all algorithms - and documenting which ones don’t. This has lead to a large improvement in the representation of what algorithms are available, effectively shaping our higher-level algorithm data types in botan. It might not be noticeable if you hardcode a specific, commonly-used algorithm, but it has a large impact if there is a choice of algorithms.

Plus, there are good reasons for doing unit tests now, instead of later:

It shows us what is repetitive, unergonomic, and needs to be pushed to a higher level
It exposes missing functions (eg, MAC nonce size function)
It gives us the content for writing tutorials
It confirms / denies documentation
We can apply what we learned to shape botan, eg the structure of our ADTs

Basically, it helps us plan our next move more clearly. There’s enough code now that it is no longer trivial to ensure that changes here don’t break things there, and so unit tests help us keep moving forward.

So what’s next? After working on the proposal for so long, I need a moment to reorient my focus back to the code, but I’ve definitely got some specific goals:

Finishing up these unit tests
Getting to that large botan-bindings PR that has breaking changes
CI stuff and build instructions
Better functions for handling botan-low and botan algorithm names
Cataloguing what’s remaining to get botan-low ready for hackage candidate

Onwards!

ApothecaLabs · October 25, 2023, 4:35pm

Update: Unit Tests and Pubkey fixes

Once again, the repo has been updated, this time with a good number of things.

RNG unit tests
Improvements to SRP6
- Added synonyms for all the bytestring arguments
- Fixed srp6ClientAgree
SRP6 unit tests
- Documented function srp6_group_identifier missing from Botan FFI
TOTP unit tests
Improvements to Utility
- Replaced size calculations with pointer querying to fix InsufficientBufferSpace exception
Utility unit tests
Improvements to PubKey
- Found authoritative source of algorithm names and parameters
- Fixed GOST-34.10
- Removed X25519 (redundant synonym for Curve25519)
- Renamed awkward flag data types
PubKey unit tests
PubKey.Encrypt unit tests
PubKey.Decrypt unit tests
PubKey.Sign unit tests
PubKey.Verify unit tests
Improvements to PubKey.Sign
- ~~Fixed signFinish to use allocBytesQuerying~~
- Fixed signFinish to use upper bound and then trim
Found Botan’s test vectors in the C++ source

Almost all of the basic unit tests are done now, and though there are algorithm combos that fail, things are more or less passing in general. There are still some PubKey functions that need testing:

Key agreement
Key encapsulation
Algorithm-specific key loading functions

X509 certificates also don’t have unit tests yet, but I already know I’m going to need to take a deeper look at them so I’m holding off on that and considering it to be a larger, but important issue.

The big thing today, aside from all of the unit tests, is that pubkey encryption, decryption, signing, verification are now all working for all algorithms, though a bit fiddly because of parameters.

Signing was actually slightly broken, and was in some cases producing signatures that would fail verification. A closer look and some experimentation showed that sizes weren’t correct, which was slightly unexpected behavior given that many other functions allow you to query the size pointer by providing a null pointer for input.

In retrospect I realized that the size was non-deterministic due to the random generator. Querying once and running it again with the queried size would actually result in a different random number being generated internally, and thus the occasional InsufficientBufferSpace exception - which is why I couldn’t use querying for these functions (or any other function with non-deterministic buffers, really).

Armed with this knowledge, I figured out that I needed to both poke a size upper bound into a particular pointer, and read its actual size afterwards, and that pretty much got it working properly. Viola!

The Proposal Has been Approved for Recommendation

In some good news that actually happened last Friday, the TWG committee has voted to approve the proposal for recommendation to the Haskell Foundation.

This is a recommendation and not a commitment, and the Foundation still has to accept (or reject) it, but they are meeting Thursday, and this item is on the agenda! I obviously have high hopes, but I’ve made it this far with everyone’s help and feedback, and either way I’m happy to have had your support.

'Til next time!

ApothecaLabs · October 27, 2023, 7:02pm

Unit tests for botan-low minimally complete (minus X509)

It has been a busy few days, and I am happy to state today that unit tests for botan-low are minimally complete! I divvied up up the remaining unit tests into similar groups, and managed to get through them all, including a few fixes for key agreement and key encapsulation, which were the final pubkey operations.

Added unit tests for privKeyGetField and pubKeyGetField in Botan.Low.PubKeySpec
Added unit tests for PubKey.DH, DSA, ElGamal, and RSA
Added unit tests for PubKey.ECDH, ECDSA, and SM2
Added unit tests for PubKey.Ed25519, X25519
Added fixes and unit tests for PubKey.KeyAgreement
Added fixes and unit tests for PubKey.KeyEncapsulation

Admittedly, there’s still lots of testing improvements to be made, but every primitive should be passing tests for at least one algorithm, and most of them for several or all. Its awkward to use, but it all seems to be working, aside from a few specific things that need a more dedicated focus (which we are getting closer to now that these tests are done).

After writing these tests, I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types. It’s hard to get 100% algorithm combination testing coverage using only string data types, because not everything is compatible, and the tests help us tell what is - so algorithm ADTs are almost necessary to do better testing.

This would be a slight departure from the proposal already, but not a great one, and I believe the proposal process acknowledges this sort of change anyhow. I’m sure it will all shake loose as things develop.

The repo has been updated.

Also I just noticed I goofed on a commit message as I pasted the previous message for format but forgot to edit it - the second commit entitled Added unit tests for PubKey DH, DSA, ElGamal, and RSA should be entitled Added unit tests for PubKey ECDH, ECDSA, and SM2 - pay this no mind

arybczak · October 27, 2023, 11:07pm

I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types.

For the record, I like that botan-low API currently simply mimics botan’s FFI API by accepting strings and I think it should stay this way for mirroring and flexibility purposes.

Higher level API with ADTs should be a layer over it.

Ambrose · October 28, 2023, 4:06am

100%. C embeds in Haskell pretty much perfectly (except CBV structs ), so verbatim C bindings are crucial.

I even just call the module C. Like SDL.GPU.C.

Than the layer on top that abstracts over memory allocation and bracketing and other idioms, I call Simple.

ApothecaLabs · October 28, 2023, 5:29am

In that case, rather than give more responsibility to botan-low, I need to take some away and give it to botan instead.

There is a power-responsibility mismatch as over the last few months, botan-low has accrued a few more responsibilities than I’d like (online vs offline processing, unit testing, size queries) that make it more than just managing buffers. I need the strong typing to be able to make some necessary distinctions between algorithms, but moving those items up into botan would suffice instead of moving the types down.

ApothecaLabs · November 4, 2023, 7:45pm

An update: Starting work on X509 support

This is exciting! My first update after this project’s proposal was accepted for funding!

So what have I been up to? X509 certificates! Support for this functionality in the Botan C FFI is a little sparse, so I’ve been going over everything to see what’s what in particular. There’s actually quite a bit of functionality missing compared to the full Botan C++ library, and its a section of this project that I know I’ll be writing a little bit of C++ which may get contributed upstream. I’ve started adding lots of notes and data types, and X509 is probably going to be my primary focus for a bit while I regroup over some things and plan out some of the next month’s work.

X509 basic unit tests
- Testing against a single simple cert for now
- CRL unit tests pending
Stub data types for X509 Distinguished Names, CRL, Status codes, CA, CRL codes, CS

All in all though, the basic certificate loading and queries seem to be functional, but there is no support for Certificate Revocation Lists aside from checking against pre-existing CRLs that can’t be created or modified, and no support for Certificate Authorities or Stores - not in the FFI, which is why we’ll be getting our hands dirty with C++, to add to the C FFI.

The good thing is that what’s there does work, and that’s nice to check off the list. I’ve been digging through the Botan C++ source to see what else is needed - identifying gaps to fill. TLS is probably also going to need similar work, but I’ve been preparing for this

That’s it for the moment - 'til next time!

ApothecaLabs · November 7, 2023, 5:40pm

Update: Lots of C++

For the last few days, I’ve been working on extending Botan’s FFI to include X509 Certificate Authorities and Certificate Signing Requests. I’m a bit rusty with the C++, but I’m shaking off that rust quite nicely. Modern C++ has some nice things going for it, if you aren’t burdened with a legacy codebase.

I’ve hit a (temporary) wall with what I can achieve in my attempts to extend Botan’s FFI by adding C++ to the Haskell project directly. Now, this isn’t really bad, as it is something I very much expected, but the practical result is that I just need to spend a bit of time setting up build flags and things earlier than expected.

The problem is that Botan does not expose the proper headers to access the Botan_FFI C++ namespace, and as a result, I cannot obtain a reference to the encapsulated C++ objects which I need in order to pass it to the C++ functions that I am newly wrapping / calling.

For example, I have a Haskell PrivKeyPtr / botan_privkey_t and need to access the Botan::Private_Key underneath. I can see botan_privkey_t exposed in the FFI headers, and I have access to Botan::Private_Key via the public C++ headers, but I can only translate between the two if I have access to Botan’s internal Botan_FFI headers, which I don’t have.

-- Haskell
foreign import ccall unsafe hs_botan_x509_ca_create
    :: Ptr X509CAPtr
    -> X509CertPtr
    -> PrivKeyPtr
    -> Ptr CChar
    -> RNGPtr
    -> IO BotanErrorCode

// C FFI
int hs_botan_x509_ca_create(
    hs_botan_x509_ca_t* ca,
    botan_x509_cert_t cert,
    botan_privkey_t key,
    const char* hash_fn,
    botan_rng_t rng);

// C++
X509_CA X509_CA(
    const X509_Certificate &cert,
    const Private_Key &key,
    const std::string &hash_fn,
    RandomNumberGenerator &rng);

Now, this doesn’t stop me from implementing new C structs for wrapping C++ objects, nor does it stop me from using the old ones, but the issue is I can’t mix the two - the opaque botan_struct pointers remain opaque, despite the functions that need that access.

I could just re-implement botan_privkey_t as hs_botan_privkey_t and then be access its Botan::Private_Key for use with hs_botan_x509_ca_create / X509_CA(...), but then I can’t use that new privkey struct in the old functions. I’d have to rewrite the entire FFI from scratch, or at the very least clone the entire botan/ffi directory into cbits which I do not want to do.

The only real option is to fork Botan and extend the FFI from inside Botan C++. However, now you see why this is not really a problem - we were planning on contributing these improvements back upstream anyway, its just that we have to get set up to do that now. Oh bother

As a result of this, I am gating the extended FFI behind the XFFI flag. Anyone who wants to test the experimental FFI can instead clone the experimental botan-upstream fork fork, build and install it from source, and use the XFFI flag to enable the experimental FFI modules. There are some instructions in the README, and it looks mostly like this:

# Clone
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include

I’m still setting up everything, but I will be moving the C++ FFI code from cbits to the botan-upstream fork. This leaves regular users unaffected, and you can continue to use a regular Botan install via your favorite package manager / distribution if you wish.

For the moment the botan-upstream fork is unmodified, so there’s not much point to it yet, but that will be changing quite soon.

That’s all for now!