Botan bindings devlog

Still love following these updates! :heart:

I hope I’ll have time to do a read-through of the codebases before they are finished, to maybe add a pair of fresh eyes focusing on usability and documentation.

1 Like

It’s time for a minor update! This update is light on code, so we’ll go a little heavy on the details instead. :slight_smile:


In our last update, we were implementing online / incremental cipher encryption.

Nominally, everything seemed okay, and for the most part, it was / is, though I’m still working on online decryption in order to verify everything. Since I have already been dealing with a few issues with the Botan FFI (for example, the flaws in botan_cipher_output_length from the previous update), I like to be thorough, and because of that I noticed something odd, which has led to a day or two of investigating. It turns out that output_length may not be the only ill-behaved botan_cipher_* FFI function.

botan_cipher_update is already somewhat awkward to use, in that the same function is used for encrypting and decrypting, as well as updating and finalizing. This is in part due to Botan conflating several structures here (block vs variable-length ciphers, encryption vs authenticated encryption vs AEAD). Well, it turns out that it is a little more than just awkward - it may be incorrectly implemented for certain algorithms, in that online vs offline processing yields ciphertexts of different lengths - but only for SIV and CCM.

I noticed this when I checked for parity between online and offline processing, and found that the online ciphertext was not equal, and was 7 bytes longer. I was alarmed for a while, as I was using SIV as my test case, and so I spent a bit wondering what I was doing wrong before realizing that it was algorithm-specific, and I just happened to have been using the affected algorithm. Naively, this implies that SIV / CCM are broken, but there may be some unknown characteristic of the algorithm that yields a different ciphertext if it is processed in chunks.

Further investigation revealed that the extra ciphertext length is related to the number of processed non-final update chunks - I noticed that 7 regular updates + 1 final update yields 7 extra bytes, so I upped the test message length from 8 * g (ideal granularity) to 16 * g to test, and the result was that the online / incremental ciphertext was now 15 bytes longer than the offline one. I then tested with 4 * g to confirm the formula of n - 1 extra bytes for a message of n * g, where there are n updates of g bytes or less. Since the length estimate does not include this extra length, for large enough messages the buffer overruns and throws a botan exception - but again, only for online processing.
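For reference, the parity check itself is conceptually very simple - something like the sketch below, where the two encryption helpers are hypothetical stand-ins rather than the actual botan-low functions:

import qualified Data.ByteString as BS

-- A minimal sketch of the online / offline parity check. 'encryptOffline' and
-- 'encryptOnline' are hypothetical: one-shot encryption, and encryption in
-- g-byte update chunks plus a final update, respectively.
checkOnlineParity
    :: (BS.ByteString -> IO BS.ByteString)          -- encryptOffline (one shot)
    -> (Int -> BS.ByteString -> IO BS.ByteString)   -- encryptOnline (chunked by g)
    -> Int                                          -- g, the ideal granularity
    -> BS.ByteString                                -- plaintext of length n * g
    -> IO ()
checkOnlineParity encryptOffline encryptOnline g msg = do
    offline <- encryptOffline msg
    online  <- encryptOnline g msg
    -- For SIV / CCM the online ciphertext comes out (n - 1) bytes longer,
    -- where n is the number of update chunks; for other modes they match.
    putStrLn $ "offline length: " <> show (BS.length offline)
    putStrLn $ "online length:  " <> show (BS.length online)
    putStrLn $ "equal: " <> show (offline == online)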

This issue is not a blocker, as it only affects SIV / CCM and the other cipher and AEAD modes are unaffected (and offline decryption works just fine!), but it warrants further investigation. I’m not exactly overjoyed to find that there are more flaws in the FFI, though :confused:

On the other hand, specific to AEAD, we need the tag to validate - we should not use any of the plaintext if the tag is invalid, and we only find that out at the end of processing, which means we must finish processing before we do anything with the result. Therefore online cipher processing may be of lesser value than initially thought*, at least for AEAD / non-stream ciphers. See the usage note for Cipher.finish in the Botan documentation (Cipher Modes — Botan).

* This is due to the FFI requiring attaching the tag to the end of the ciphertext. A ciphertext could be verified before decrypting if the scheme is Encrypt-then-MAC, which would make online processing useful. This is getting down into the weeds though :grin:

Anyway, that’s what I’ve been working through for the last few days.

4 Likes

Another small update, but a good one!

Online cipher processing

At long last, after a week or two of tedious testing, online processing for ciphers is complete! The knowledge learned will hopefully be useful in online processing for other cryptographic primitives, but for now I’m just glad to have these issues resolved:

  • Some algorithms were not consuming the full input buffer, even though I was using input blocks of the ideal size.
    • New, safer buffer estimation functions have been implemented
    • Cipher update now handles remaining input properly
  • Several issues regarding buffer sizes in Botan.Low.Cipher online processing have been fixed, and the resulting functions are suitable for use with any online-capable cipher / aead algorithm
  • It has been discovered that SIV, CCM cipher modes do not support online processing, but Botan does not throw an error if you attempt to use them anyway, instead silently yielding an invalid ciphertext. This has been noted.
  • Unit tests for cipher are complete enough for the moment to move on to another module
    • Some tests (correctly) fail for specific algorithms (online processing for SIV, CCM); these tests will ignore the pertinent algorithms in the future.

Community proposal final draft

After all the feedback, I think I can consider the community proposal to be at its final draft. I will likely be making an official proposal submission to the Haskell Foundation on Monday.

That is all for now! See you next update!

4 Likes

Is the online processing of SIV / CCM not working an FFI thing, or is it just Botan that does not support it?

And follow up question if it’s the latter… do you know if the Botan people know that the current implementation has this issue? :thinking:

2 Likes

It is a property of the SIV and CCM algorithms themselves - I dug into a NIST paper Structural Classification of Authenticated Encryption Schemes to be sure.

They are both ‘MAC-then-Encrypt’ AEAD algorithms, which means that they must first process the plaintext once to authenticate it, and then a second time to encrypt it. This property is one of the reasons that ‘Encrypt-then-MAC’ AEAD algorithms are usually considered preferable.
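Schematically, the difference looks something like this (my own illustration, not any real API - macF and encF stand in for the mode’s internal passes):

import Data.ByteString (ByteString)

-- MAC-then-Encrypt (SIV, CCM): a tag is computed over the *whole* plaintext
-- first, and only then is the plaintext encrypted (in SIV the tag even serves
-- as the synthetic IV), so the ciphertext cannot be produced incrementally.
macThenEncrypt
    :: (ByteString -> ByteString)                  -- macF: tag over the plaintext
    -> (ByteString -> ByteString -> ByteString)    -- encF: encrypt, given the tag / IV
    -> ByteString -> (ByteString, ByteString)
macThenEncrypt macF encF plaintext =
    let tag = macF plaintext            -- pass 1: needs all of the plaintext
    in  (encF tag plaintext, tag)       -- pass 2: cannot start until pass 1 ends

-- Encrypt-then-MAC: ciphertext can be produced chunk by chunk, and the tag
-- over the ciphertext can be updated incrementally as well.
encryptThenMac
    :: (ByteString -> ByteString)       -- encF: encrypt (chunkable)
    -> (ByteString -> ByteString)       -- macF: tag over the ciphertext
    -> ByteString -> (ByteString, ByteString)
encryptThenMac encF macF plaintext =
    let ciphertext = encF plaintext
    in  (ciphertext, macF ciphertext)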

The only real error is that botan returns garbage here and does not return an error code / throw an exception if you try anyway :grin:

2 Likes

It’s time for another update!

Community proposal submission

The first thing is something to celebrate - I have officially submitted the cryptography community project proposal to the Haskell Foundation - this proposal has been a few weeks in the making, so many thanks to all who provided feedback :grin:

What to work on next

Getting the community proposal written and implementing online / incremental processing for ciphers has been my focus for the last few weeks, and now that they’re both done I need to take a moment and figure out what to work on next. I have several options:

  • Get more unit tests written for botan-low
  • Incremental processing for other things
  • Merge the pull request that refactors botan-bindings
  • Work on the consistency / ergonomics / exported interface for botan-low & botan
  • Flesh out a proper task list to make these decisions easier :slight_smile:

I am probably going to go for unit tests, or merging the pull request, because it is better to get the lower level stuff done first.

Cryptography Abstractions

In the meantime, while I’m deciding that, I like to work on the higher-level abstractions a bit to see how practical my lower-level decisions have been. With the recent addition of online processing to the mix, that has got me thinking about two things:

  • Lazy bytestrings, which led me on to a deeper dive into their internals - very satisfying
  • Higher level abstractions, which led me to re-sketch out some cryptography classes

The impetus is that existing efforts for cryptography typeclasses seem to focus on the nitty-gritty mechanics of cryptography processing. Look at crypton/ite’s HashAlgorithm (or the (Block)Cipher classes), and the functions used to operate on them:

-- crypton/ite's HashAlgorithm class
class HashAlgorithm alg where
    hashBlockSize :: ...
    hashDigestSize :: ...
    hashInternalContextSize :: ...
    hashInternalInit :: ...
    hashInternalUpdate :: ...
    hashInternalFinalize :: ...

It just concretizes how it implements incremental hashing, rather than abstracting hashing in general. I think we can build a much better typeclass hierarchy for cryptographic primitives by analyzing the commonalities of the algorithm-specific functions.

You’d expect that a typeclass abstraction of hashing would be something much simpler, less restrictive, like:

-- An ideal Hash typeclass
class Hash alg where
        hash :: ByteString -> Digest alg

We can also just use the same interface but with lazy bytestrings for incremental processing, and leave how the incremental processing is achieved up to the instance. This is significant because different crypto libraries may expose incremental processing differently (eg API differences), so we mustn’t assume anything about how an instance performs incremental processing.

-- Same interface, but over lazy bytestrings; a distinct method name
-- avoids clashing with hash from the Hash superclass
class (Hash alg) => IncrementalHash alg where
        hashLazy :: LazyByteString -> Digest alg

Then on top of these we can build more typeclasses specific to block sizes and updates and finalizing.
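Something like this, maybe (pure speculation on my part, just to sketch the shape - nothing here is final):

import Data.ByteString (ByteString)
import Data.Proxy (Proxy)

-- A speculative layer for algorithms with a meaningful input block size.
class (Hash alg) => BlockHash alg where
    hashBlockSize :: Proxy alg -> Int

-- A speculative layer for explicit init / update / finalize processing.
-- (Associated data family; would need TypeFamilies.)
class (IncrementalHash alg) => MutableHash alg where
    data HashState alg
    hashInit     :: IO (HashState alg)
    hashUpdate   :: HashState alg -> ByteString -> IO ()
    hashFinalize :: HashState alg -> IO (Digest alg)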

The Cipher typeclasses aren’t much better; the Cipher class actually only concerns itself with initialization, without defining anything about encryption / decryption. The BlockCipher class slightly improves on this by providing individual functions for each block cipher mode, but that conflates all of the modes into a single requirement - and it really means the class should be called BlockCipherMode. Furthermore, AEADs and stream ciphers are just sort of left to implement algorithm-specific encrypt / decrypt functions on their own. Other crypto primitives do not have classes at all.

Don’t get me wrong, extended typeclasses for block-based and incremental hashing / ciphers are great, but we shouldn’t use them as base typeclasses nor assume that all algorithms are capable of such things. Streaming hashes and ciphers do not have (input) block sizes and can take arbitrarily small inputs, but not all algorithms can do this (eg, block-based hashes and ciphers).

I’ve sketched more out in a file, but it’s mostly just musings and scratchwork - I haven’t even tried to compile it yet. Still, a consistent interface has developed, so we’re getting somewhere, and just about anything would be an improvement over the current situation.

Boolean typeclass proposal

We all know there’s some awkward stuff in base that could be improved, and Bits is one of them - second only to Num in awkwardness, the Bits class is too large, and unnecessarily lumps boolean algebra functions together with bit field functions. As a result, Bits is often not properly implemented, to the point that there is no proper instance for ByteString, and I believe this negatively affects the low-level-binary Haskell ecosystem in general.

This particular pet peeve is something that’s been churning in my mind for a while, and it has only gotten more relevant with the cryptography project. I believe we can fix these things by reworking things a little, and so I’ve started working on a base proposal to split off a Boolean typeclass from Bits, something like:

class (Eq a) => Boolean a where

    complement :: a -> a -- aka not

    and :: a -> a -> a
    or  :: a -> a -> a
    xor :: a -> a -> a

    -- Potentially added to complete the boolean algebra
    false :: a
    true  :: a

class (Boolean a) => Bits a where
    ...

By splitting off a Boolean class, we can now make an instance for ByteString for boolean logic, and it also leaves the Bits class as clearly pertaining to bit fields, with more clearly relevant functions.
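For example, a ByteString instance could look roughly like this (just my own illustration - zip-wise semantics that truncate to the shorter argument, which is itself a design decision, and it punts on false / true entirely):

import qualified Data.Bits as Bits
import qualified Data.ByteString as BS

-- A rough sketch of a zip-wise Boolean instance for strict ByteStrings,
-- assuming the Boolean class sketched above.
instance Boolean BS.ByteString where
    complement = BS.map Bits.complement
    and a b = BS.pack (BS.zipWith (Bits..&.) a b)
    or  a b = BS.pack (BS.zipWith (Bits..|.) a b)
    xor a b = BS.pack (BS.zipWith Bits.xor a b)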

It is still very rough, mostly being a bunch of notes, but I think there is a good point here - what do you think?


That’s all for now - til next time! :slight_smile:

5 Likes

Amazing work Leo, thanks for all your progress reports, and community proposal!

1 Like

What, no {-# MINIMAL nand #-}? :grinning_face_with_smiling_eyes:

3 Likes

@ApothecaLabs are you getting paid for this work?

If not, then the HF should either fund you directly or ask possible industry users if they would.

The Haskell Community needs this type of work.

5 Likes

Is the algebraic class you’re looking for a Heyting Algebra, as seen in PureScript?

2 Likes

ByteString does not have a Bits instance.

Bits is a typeclass for integral types (says so on the tin) and arbitrary chunks of memory don’t fit this definition.

It would result in an ergonomic improvement for low-level binary operations

This is only useful when you need to treat memory chunks as finite fields and the only place I know that uses this is cryptography.


Ideally yes, Integer/Natural/Bool do not belong to the sized bitwise operation class, but these are small inconsistencies and burning bridges over them is too much work. Changes like these should just be scheduled for base-5, if the Committee ever decides to go for it.

As for the issues you’re facing, can’t you just use Botan’s multiple precision integers?

2 Likes

Boy howdy there was quite a response to this all!

I have my nose to the grindstone, and am working on responding to all of @david-christiansen 's feedback - it is taking a bit as I want to answer thoroughly.

@romes thx :grin:

@jhenahan deep inhale Congratulations sir, you win the internet.

@hasufell Thanks! I’m not getting paid for it yet, but I’d love to do this on a more permanent basis if I were, hence the budget request. Someone from Mercury did appear to reach out a few weeks ago, but I haven’t heard back from them yet.

@jackdk That seems about right!

@BurningWitness

That is indeed why ByteString does not have a valid implementation of Bits - Bits is concerned with representation or encoding, and necessarily includes sign bits and endianness, which affect binary functions like shift and rotate but not boolean functions like xor and complement. My observation is that we could split off a Boolean / Heyting class from Bits with about as much pain as the Semigroup / Monoid debacle, maybe less.

An update - Bcrypt and things

It’s been a little while since I’ve updated the devlog here; for the most part my activity for the last week or two has been focused on the proposal, and I’m just about done with an updated draft that accounts for all of the feedback - I’ll be posting that soon.

In the meantime, I still have been working on the bindings, and I have a lovely little update that I’ve been meaning to post for a few days.

The primary update is that I have:

  • Fixed Botan.Low.Bcrypt INSUFFICIENT_BUFFER_SPACE exception
    • Botan.Low.Bcrypt unit tests now pass
  • Made improvements to Botan.Bcrypt for easier use
  • Fixed trailing NUL in bcrypt digests (may have been a regression)

By going through the C++ source, I discovered that there was an unmentioned size pointer check that was being compiled out in most cases, except for tests. In some cases Botan uses size pointers to return the final length of the output, and sometimes it also uses the same pointer to take the output buffer length as an input. It turns out we need to poke the length of the output buffer here, as it was erroneously succeeding before because we were providing a buffer of adequate size but the check was being compiled out. :confounded:
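To illustrate the pattern (a simplified sketch, not the actual binding code - the real functions have more arguments and proper error handling):

import qualified Data.ByteString as BS
import Data.Word (Word8)
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek, poke)

-- Botan's in/out size-pointer convention: poke the capacity of the output
-- buffer into the pointer before the call, and peek the actual length after.
callWithSizedBuffer :: Int -> (Ptr Word8 -> Ptr CSize -> IO CInt) -> IO BS.ByteString
callWithSizedBuffer capacity call =
    allocaBytes capacity $ \ outPtr ->
        alloca $ \ lenPtr -> do
            poke lenPtr (fromIntegral capacity)  -- tell Botan how much room we have
            _rc <- call outPtr lenPtr            -- (error-code handling elided)
            actual <- peek lenPtr                -- Botan writes back the real length
            BS.packCStringLen (castPtr outPtr, fromIntegral actual)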

Now Botan.Bcrypt should be good to go!

In addition to this, I did some work on the public key exchange functionality, and discovered that it hangs indefinitely:

  • Began work on Botan.PubKey.KeyAgreement
    • Discovered memory leak / hang in Botan.Low.PubKey.KeyAgreement
    • Function hangs until terminated by OS (Ctrl-C signal is blocked)
    • safe vs unsafe doesn’t seem to matter
    • Occurs in botan_pk_op_key_agreement function
      • This is for DH, ECDH, Curve25519 key exchange, so high priority
    • Checked some different versions of GHC to make sure it’s not a version-specific bug, need to test more thoroughly
    • Will be going over C++ and Z-Botan source to figure it out

And now for something a little bit unusual - Merkle Trees!

I’ve been working on this project for a while, and I thought it would be good to show a bit of where this cryptography project is headed, long-term, while I finish updating the proposal pull request.

You see, I didn’t just idly decide to spend my time writing bindings to a cryptography library. I want to be able to express some very particular cryptography constructs, and I want to express them properly - and that means starting with a solid foundation. That foundation is now solid enough to begin building some of those constructs.

One of those constructs is Merkle, an abstraction of the core concept behind sparse merkle trees. A merkle tree is a data structure that hashes objects and combines their digests in order to provide integrity and cryptographic proofs for many pieces of data. Such structures form the basis of many distributed cryptosystems, but implementations are often highly concrete, focused on binary optimization rather than ease of understanding, and take pages upon pages to explain and justify everything. That never sat well with me.

Today, I’m here to show you how to build a sparse merkle tree with just a few lines of code, and we will do it in a way that barely mentions cryptography, and allows us to define arbitrary merkle data structures in an incredibly concise manner.

Are you ready to speedrun some merkle trees? We start with our ideal hash class from earlier:

data family Digest a

class Hash a where
    hash :: ByteString -> Digest a
    digest :: Digest a -> ByteString

A sparse merkle set / map is intimately related to a unital magma algebra, or what you might call a non-associative, non-commutative monoid. As such, we can define it by extending the Hash typeclass to Merkle with an empty and an append function.

class (Hash a) => Merkle a where
    merkleHash :: ByteString -> Digest a
    merkleEmpty :: Digest a
    merkleAppend :: Digest a -> Digest a -> Digest a

One advantage of this approach is that its conceptual reliance on cryptography is low - the hashing is an implementation detail. This becomes important in the future when I start blending this with indexed recursion schemes, because there are tighter mathematical connections there (but we won’t get to that today :grin: we’re getting close).

Conceptually, merkleEmpty = hash "", merkleHash = hash, and merkleAppend a b = hash $ digest a <> digest b. However, a naive implementation such as that actually has some problems with what’s called a “preimage attack”. To make a long explanation short, it gives you the same digest for hashing a plain bytestring as you could get from appending two digests, and we need to be able to distinguish between the two cases (that is, force them to give different results).
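To make the problem concrete, the naive scheme would be the following (naming is just for illustration, building on the Hash class above):

-- The naive (unsafe) definitions: a leaf whose bytes happen to equal
-- (digest a <> digest b) hashes to the same value as the interior node
-- appending a and b, so leaves and interior nodes can be confused.
naiveMerkleHash :: (Hash a) => ByteString -> Digest a
naiveMerkleHash = hash

naiveMerkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
naiveMerkleAppend a b = hash (digest a <> digest b)

-- naiveMerkleHash (digest a <> digest b) == naiveMerkleAppend a b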

We can solve this by adding salts / sigils, and differentiating between hash and merkleHash, and giving safe default implementations.

class (Hash a) => Merkle a where

    merkleHash :: ByteString -> Digest a
    default merkleHash :: (Hash a) => ByteString -> Digest a
    merkleHash = safeMerkleHash

    merkleEmpty :: Digest a
    default merkleEmpty :: (Hash a) => Digest a
    merkleEmpty = safeMerkleEmpty

    merkleAppend :: Digest a -> Digest a -> Digest a
    default merkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
    merkleAppend = safeMerkleAppend

safeMerkleHash :: (Hash a) => ByteString -> Digest a
safeMerkleHash bs = hash $ concat
    [ "#"
    , bs
    , ";"
    ]

safeMerkleEmpty :: (Hash a) => Digest a
safeMerkleEmpty =  hash "" -- Or all zeroes, or Nothing

safeMerkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
safeMerkleAppend a b | a == safeMerkleEmpty = b
safeMerkleAppend a b | b == safeMerkleEmpty = a
safeMerkleAppend a b = hash $ concat
    [ "$"
    , digest a
    , ":"
    , digest b
    , ";"
    ]

Note that because of the sigils, it is impossible to obtain any value from merkleEmpty or merkleAppend that you would from merkleHash, though one must take care to use merkleHash now (I should differentiate Digest and MerkleDigest to emphasize this).

So… what can we do with this unital magma hash / algebra?

Well, the first thing that comes to mind is folds, which we can use to trivially produce a merkle list or hash chain. Digests like these are often used for data integrity in things like torrents and distributed systems.

-- NOTE: Left-associative to match function application
infixl 5 <#>

(<#>) :: (Merkle a) => Digest a -> Digest a -> Digest a
(<#>) = merkleAppend

merkleFoldl :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldl = foldl merkleAppend merkleEmpty

merkleFoldr :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldr = foldr merkleAppend merkleEmpty

We’re just calculating the final digest here, but for real use cases we’d keep all of the hashes.
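If we did want to keep the running digests, it’s just a scanl instead of a foldl - a quick sketch:

-- Keep every intermediate digest of the chain, not just the final one.
merkleScanl :: (Merkle a) => [Digest a] -> [Digest a]
merkleScanl = scanl merkleAppend merkleEmpty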

Now, on to the main thing - sparse merkle trees. Hash lists have been mostly supplanted by merkle trees, especially sparse ones, which are just binary radix trees, also known as patricia trees. Before we implement our merkle tree, we need a few helper functions.

partitionBit :: (Bits (Digest a)) => Int -> [Digest a] -> ([Digest a],[Digest a])
partitionBit n = List.partition (not . (`testBit` n))

both :: (Bifunctor f) => (a -> b) -> f a a -> f b b
both f = bimap f f

Are you ready? Here we go! 6 lines!

merkleSetDigest :: forall a . (Merkle a) => [Digest a] -> Digest a
merkleSetDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [x] = x
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBit n xs)

It is trivial to obtain a sparse merkle set digest using unital magma hashing. Note that deduplication is unnecessary if the inputs are already unique. If we ignore the type declaration and the deduplication, there are really only 4 lines that matter. Concise :grin:

So what’s going on? As we traverse the list of elements, we partition them by each bit, which naturally sorts all of the digests, and as we come back up, we perform a bunch of merkleAppends to join them. Note that an empty node is the empty hash, and that appending two empty hashes results in an empty hash, and appending an empty hash to anything results in that thing. This handles all of our sparseness for us automatically, with no extra work.

Now, this just calculates the final digest, what if we wanted to actually keep the entire tree structure?

merkleSetTree :: forall a . (Merkle a) => [Digest a] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleSetTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = (merkleEmpty, [])
    go _ [x] = (x, [])
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry join $ both (go (n - 1)) (partitionBit n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

In this version, we actually return the full merkle tree. Note the return type is effectively (TopDigest, [(ParentDigest, (LeftChildDigest, RightChildDigest))]), and that merkleSetDigest = fst . merkleSetTree. By simple convention, terminal nodes are implicit to the tree, and are not included in the list of nodes since their parent already contains them, and they themselves have no children. As a result, merkleSetTree [] = (merkleEmpty, []).

NOTE: One of the neat things about merkle sets as sets of elements is that they actually form a monoid hash, constructed from the unital magma hash. Since a merkle set is a set, we can construct a merkle set tree from elements in any order and get the same digest. This property becomes extremely useful in the future.
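As a quick property sketch of that claim (with the constraints spelled out explicitly; only practical for short lists, since it checks every permutation):

import qualified Data.List as List

-- Any permutation of the input digests yields the same set digest.
prop_setOrderIndependent :: (Merkle a, Eq (Digest a)) => [Digest a] -> Bool
prop_setOrderIndependent xs =
    all (\ p -> merkleSetDigest p == merkleSetDigest xs) (List.permutations xs)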

Now, this is pretty neat, but we can do one better. What’s better than a merkle set?

A merkle map:

partitionBitMap :: (Bits (Digest a)) => Int -> [(Digest a,b)] -> ([(Digest a,b)],[(Digest a,b)])
partitionBitMap n = List.partition (\ (k,_) -> not (testBit k n))

merkleMapDigest :: forall a . (Merkle a) => [(Digest a, Digest a)] -> Digest a
merkleMapDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [(k,v)] = merkleAppend k v
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBitMap n xs)

And a version that returns the full merkle map:

merkleMapTree :: forall a . (Merkle a) => [(Digest a, Digest a)] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleMapTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []  = (merkleEmpty, [])
    go _ [x@(k,v)] = let h = merkleAppend k v in (h, [(h,x)])
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry join $ both (go (n - 1)) (partitionBitMap n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))

Note that merkleMapDigest = fst . merkleMapTree, and that although the return type appears the same as merkleSetTree, they have slightly different conventions and handling. Note that terminal nodes are now included in the list, because their children are their key and value digest, but to find out if you’ve reached the end, you have to look for the key in the list and not find it.

That’s a bit awkward, but we’ve sacrificed heavily in the name of conciseness here. In the real world we’d probably use a more complex but also more performant implementation. These super short implementations are here to illustrate the simplicity of the concept.

Finally, observe that merkleSetDigest [a,b,c,...] == merkleMapDigest [(a,merkleEmpty),(b,merkleEmpty),(c,merkleEmpty),...] .

import Crypto.Merkle
import Crypto.Hash.SHA3
digests = fmap (merkleHash @SHA3) [ "fee","fi","fo","fum"]
t0 = merkleSetDigest digests
(t1,_) = merkleSetTree digests
pairs = fmap (\ dg -> (dg, merkleEmpty @SHA3)) digests
t2 = merkleMapDigest pairs
(t3,_) = merkleMapTree pairs
-- All of these should return the same value
t0
t1
t2
t3

This is due to the relationship that can also be observed between the Haskell Set and Map data types, and this is not a coincidence. Producing identical digests shows that our implementation is consistent and logical, and in the future, I will be using these merkle sets and maps to build a cryptographic version of Set and Map that automatically handle proof of integrity, inclusion, exclusion, insertion, and deletion - fun! However, in order to protect against preimage attacks, I will also add salts / sigils to the merkle set and map functions, to differentiate them.

This merkle tree stuff can be found in crypto-schemes, with the SHA3 instance from crypto-schemes-botan to test it.


That was a lot, but I’ve been building up this post for a little while, and I’m glad to finally be posting it. I’ll be writing up a much more thorough blog on this topic in the future :partying_face:

9 Likes

Absolutely glorious. I am very much looking forward to your full blog posts on this, all of them.

1 Like

I have a simple but impactful update today.

Early on in the development of these libraries, I adopted a naming convention for the low-level functions in botan-low to help disambiguate them from the pure or higher-level versions of the same functions in botan. This led to many functions being named fooCtxVerbIO, and in general, it was a bad idea, but I stuck with it because I didn’t want to spend all my time refactoring nomenclature, and I had other things to focus on.

However, lately, it has been increasingly annoying to work with, as it creates a whole bunch of visual noise for no reason, and so this update reverts that naming experiment:

  • Removed ‘Ctx’ and ‘IO’ from low-level function names
  • Function names in botan-low match the canonical botan function
  • This naming convention should be more or less final
  • Some individual function names do not match exactly
    • Mostly public key encryption functions eg encryptCreate instead of pkOpEncryptCreate
  • botan now imports botan-low qualified a la import qualified Botan.Low.* as Low to disambiguate
  • Fixed botan-low tests
  • Fixed botan
  • Fixed botanium

Function names in botan are unchanged, meaning that they are still a mess, but will undergo a similar naming refactor / simplification in the future.


The update to the proposal is coming soon, but I took a breather for a few days after @Bodigrim 's advice to not worry or rush :slight_smile: I’m glad I did because I finally got this done.

6 Likes

It’s a day for another update!

Changes are mostly new unit tests for a few modules, and some benchmarks against crypton!

  • Added unit test for Botan.Low.MAC
  • Removed CBC-MAC (unsupported)
  • Added unit test for Botan.Low.HOTP
  • Started Botan.HOTP
  • Added botan-low-bench benchmark for testing botan Bcrypt and SHA3 against crypton

Aside from that, the proposal is in its final draft stage, so if you want to give feedback on it, this is your last chance!

1 Like

Update time!

The repo has been updated with a few more things:

  • Added unit tests for Botan.Low.KDF
  • Improvements to Botan.KDF
  • Added unit tests for Botan.Low.KeyWrap
  • Improvements to Botan.KeyWrap
  • Unit test for Botan.Low.PwdHash
  • Fixed compilation on Linux due to filename case sensitivity (thanks @arybczak :smiley: )

Slowly but surely we’re getting unit tests done for all of our modules, and we can see what passes, what fails, and what still needs work - it is quite revealing. We still need unit tests for:

  • PubKey and PubKey.*
  • RNG
  • SRP6
  • TOTP
  • Utility
  • X509

I’ve sort of just been going in alphabetical order, but it is nice to see we’re getting near the end :slight_smile:

At some point soon I’ll need to split the unit tests into botan-low-tests for testing that the bindings are bound properly, and botan-tests for more exhaustive algorithm-specific tests including test vectors, and I’ll be further breaking them up into individual test targets by primitive too.

So why is unit testing so important now?

While writing botan, I proceeded in a series of steps. First I wrote the FFI bindings, then wrote type-safe functions that compiled and ran. I tested in GHCi to make sure that these functions worked for at least one algorithm, but did not ensure that they worked for all algorithms, as exhaustive unit testing requires knowing the quirks of each algorithm, basically going through them one by one.

As a result, many algorithms would throw botan error codes or have edge cases in behavior, and it wasn’t really feasible to build higher-level functions if I didn’t know that they would be valid paths for all algorithms. Now that the core of botan-low is properly implemented, I’ve focused on writing unit tests, and it’s been catching all sorts of things. A recent example of this is the SIV and CCM ciphers, which do not support online processing but simply return garbage - this was caught via the unit tests.

As I’ve gone through these tests, I’ve also been cataloguing which algorithms are available by combing through the C++ source, and testing and showing that they work for all algorithms - and documenting which ones don’t. This has led to a large improvement in the representation of what algorithms are available, effectively shaping our higher-level algorithm data types in botan. It might not be noticeable if you hardcode a specific, commonly-used algorithm, but it has a large impact if there is a choice of algorithms.
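The testing pattern itself is nothing fancy - roughly this shape, with the algorithm names and spec body as placeholders rather than the actual test suite:

import Test.Hspec

-- Run the same spec over a list of algorithm name strings, so it becomes
-- obvious which combinations actually work and which ones do not.
hashAlgorithms :: [String]
hashAlgorithms = ["SHA-256", "SHA-3(512)", "BLAKE2b(512)"]

spec :: Spec
spec = describe "hashing" $
    mapM_
        (\ alg -> it ("hashes with " <> alg) $
            pendingWith "call the binding for this algorithm here")
        hashAlgorithms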

Plus, there are good reasons for doing unit tests now, instead of later:

  • It shows us what is repetitive, unergonomic, and needs to be pushed to a higher level
  • It exposes missing functions (eg, MAC nonce size function)
  • It gives us the content for writing tutorials
  • It confirms / denies documentation
  • We can apply what we learned to shape botan, eg the structure of our ADTs

Basically, it helps us plan our next move more clearly. There’s enough code now that it is no longer trivial to ensure that changes here don’t break things there, and so unit tests help us keep moving forward.

So what’s next? After working on the proposal for so long, I need a moment to reorient my focus back to the code, but I’ve definitely got some specific goals:

  • Finishing up these unit tests
  • Getting to that large botan-bindings PR that has breaking changes
  • CI stuff and build instructions
  • Better functions for handling botan-low and botan algorithm names
  • Cataloguing what’s remaining to get botan-low ready for hackage candidate

Onwards!

4 Likes

Update: Unit Tests and Pubkey fixes

Once again, the repo has been updated, this time with a good number of things.

  • RNG unit tests
  • Improvements to SRP6
    • Added synonyms for all the bytestring arguments
    • Fixed srp6ClientAgree
  • SRP6 unit tests
    • Documented function srp6_group_identifier missing from Botan FFI
  • TOTP unit tests
  • Improvements to Utility
    • Replaced size calculations with pointer querying to fix InsufficientBufferSpace exception
  • Utility unit tests
  • Improvements to PubKey
    • Found authoritative source of algorithm names and parameters
    • Fixed GOST-34.10
    • Removed X25519 (redundant synonym for Curve25519)
    • Renamed awkward flag data types
  • PubKey unit tests
  • PubKey.Encrypt unit tests
  • PubKey.Decrypt unit tests
  • PubKey.Sign unit tests
  • PubKey.Verify unit tests
  • Improvements to PubKey.Sign
    • Fixed signFinish to use allocBytesQuerying
    • Fixed signFinish to use upper bound and then trim
  • Found Botan’s test vectors in the C++ source :slight_smile:

Almost all of the basic unit tests are done now, and though there are algorithm combos that fail, things are more or less passing in general. There are still some PubKey functions that need testing:

  • Key agreement
  • Key encapsulation
  • Algorithm-specific key loading functions

X509 certificates also don’t have unit tests yet, but I already know I’m going to need to take a deeper look at them so I’m holding off on that and considering it to be a larger, but important issue.

The big thing today, aside from all of the unit tests, is that pubkey encryption, decryption, signing, verification are now all working for all algorithms, though a bit fiddly because of parameters. :partying_face:

Signing was actually slightly broken, and was in some cases producing signatures that would fail verification. A closer look and some experimentation showed that sizes weren’t correct, which was slightly unexpected behavior given that many other functions allow you to query the size pointer by providing a null pointer for input.

In retrospect I realized that the size was non-deterministic due to the random generator. Querying once and running it again with the queried size would actually result in a different random number being generated internally, and thus the occasional InsufficientBufferSpace exception - which is why I couldn’t use querying for these functions (or any other function with non-deterministic buffers, really).

Armed with this knowledge, I figured out that I needed to both poke a size upper bound into a particular pointer and read its actual size afterwards, and that pretty much got it working properly. Voila!

The Proposal Has been Approved for Recommendation

In some good news that actually happened last Friday, the TWG committee has voted to approve the proposal for recommendation to the Haskell Foundation.

This is a recommendation and not a commitment, and the Foundation still has to accept (or reject) it, but they are meeting Thursday, and this item is on the agenda! I obviously have high hopes, but I’ve made it this far with everyone’s help and feedback, and either way I’m happy to have had your support.

'Til next time!

10 Likes

Unit tests for botan-low minimally complete (minus X509)

It has been a busy few days, and I am happy to state today that unit tests for botan-low are minimally complete! :partying_face: I divvied up the remaining unit tests into similar groups, and managed to get through them all, including a few fixes for key agreement and key encapsulation, which were the final pubkey operations.

  • Added unit tests for privKeyGetField and pubKeyGetField in Botan.Low.PubKeySpec
  • Added unit tests for PubKey.DH, DSA, ElGamal, and RSA
  • Added unit tests for PubKey.ECDH, ECDSA, and SM2
  • Added unit tests for PubKey.Ed25519, X25519
  • Added fixes and unit tests for PubKey.KeyAgreement
  • Added fixes and unit tests for PubKey.KeyEncapsulation

Admittedly, there’s still lots of testing improvements to be made, but every primitive should be passing tests for at least one algorithm, and most of them for several or all. It’s awkward to use, but it all seems to be working, aside from a few specific things that need a more dedicated focus (which we are getting closer to now that these tests are done).

After writing these tests, I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types. It’s hard to get 100% algorithm combination testing coverage using only string data types, because not everything is compatible, and the tests help us tell what is - so algorithm ADTs are almost necessary to do better testing.
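To give a flavor of what I mean (illustrative only - these are not the final data types, though the name strings follow Botan’s conventions as far as I can tell):

-- A tiny sketch of an algorithm ADT plus its botan name, versus passing
-- raw strings around. Constructors and coverage here are illustrative.
data HashSpec
    = SHA256
    | SHA3 Int      -- digest size in bits
    | BLAKE2b Int   -- digest size in bits

hashSpecName :: HashSpec -> String
hashSpecName SHA256      = "SHA-256"
hashSpecName (SHA3 n)    = "SHA-3(" <> show n <> ")"
hashSpecName (BLAKE2b n) = "BLAKE2b(" <> show n <> ")"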

This would be a slight departure from the proposal already, but not a great one, and I believe the proposal process acknowledges this sort of change anyhow. I’m sure it will all shake loose as things develop.

The repo has been updated.


Also, I just noticed I goofed on a commit message - I pasted the previous message for formatting but forgot to edit it. The second commit entitled Added unit tests for PubKey DH, DSA, ElGamal, and RSA should be entitled Added unit tests for PubKey ECDH, ECDSA, and SM2 - pay this no mind :slight_smile:

3 Likes

I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types.

For the record, I like that the botan-low API currently simply mimics botan’s FFI API by accepting strings, and I think it should stay this way for mirroring and flexibility purposes.

Higher level API with ADTs should be a layer over it.

2 Likes