Still love following these updates!
I hope I’ll have time to do a read through of the code bases before they are finished, to maybe add a pair of fresh eyes focusing on usability and documentation.
It's time for a minor update! This update is light on code, so we'll go a little heavy on the details instead.
In our last update, we were implementing online / incremental cipher encryption.
Nominally, everything seemed okay, and for the most part it was / is, though I'm still working on online decryption in order to verify everything. Since I have already been dealing with a few issues with the Botan FFI (for example, the flaws in botan_cipher_output_length from the previous update), I like to be thorough, and because of that I noticed something odd, which has led to a day or two of investigating. It turns out that output_length may not be the only ill-behaved botan_cipher_* FFI function.
botan_cipher_update is already somewhat awkward to use, in that the same function is used for encrypting and decrypting, as well as updating and finalizing. This is in part due to a conflation here of many structures in Botan (block vs variable-length ciphers, encryption vs authenticated encryption vs AEAD). Well, it turns out that it is a little more than just awkward - it may be incorrectly implemented for certain algorithms, in that online vs offline processing yields ciphertext of different lengths - but only for SIV and CCM.
I noticed this when I checked for parity between online and offline processing, and found that the online ciphertext was not equal, being 7 bytes longer. I was alarmed for a while, as I was using SIV as my test case, and so I spent a while wondering what I was doing wrong before realizing that the problem was algorithm-specific - I just happened to have been using an affected algorithm. Naively, this implies that SIV / CCM are broken, but there may be some unknown characteristic of the algorithm that yields a different ciphertext if it is processed in chunks.
Further investigation showed that the extra ciphertext length is related to the number of processed non-final update chunks - I noticed that 7 regular updates + 1 final update yields 7 extra bytes, so I upped the test message length from 8 * g (ideal granularity) to 16 * g, and the result was that the online / incremental ciphertext was now 15 bytes longer than the offline one. I then tested with 4 * g to confirm the formula of n - 1 extra bytes for a message of n * g, where there are n updates of g-bytes-or-less. Since the length estimate does not include this extra length, for large enough messages the buffer overruns and throws a botan exception - but again, only for online processing.
This issue is not a blocker, as it only affects SIV / CCM and the other cipher and AEAD modes are unaffected (and offline decryption works just fine!), but it warrants further investigation. I'm not exactly overjoyed to find that there are more flaws in the FFI, though.
On the other hand, specific to AEAD, we need the tag to validate - we should not use any of the plaintext if the tag is invalid, which we only find out at the end of processing, which means we must finish processing before we do anything with the result. Therefore online cipher processing may be of lesser value than initially thought*, at least for AEAD / non-stream ciphers. See the usage note for Cipher.finish in the Botan Cipher Modes documentation.
* This is due to the FFI requiring the tag to be attached to the end of the ciphertext. A ciphertext could be verified before decrypting if the scheme is Encrypt-then-MAC, and thus find online processing useful. This is getting down into the weeds, though.
Anyway, that’s what I’ve been working through for the last few days.
Another small update, but a good one!
At long last, after a week or two of tedious testing, online processing for ciphers is complete! The knowledge learned will hopefully be useful in online processing for other cryptographic primitives, but for now I'm just glad to have these issues resolved.
After all the feedback, I think I can consider the community proposal to be at its final draft. I will likely be making an official proposal submission to the Haskell Foundation on Monday.
That is all for now! See you next update!
Is the online processing of SIV / CCM not working an FFI thing, or is it just Botan that does not support it?
And follow up question if it’s the latter… do you know if the Botan people know that the current implementation has this issue?
It is a property of the SIV and CCM algorithms themselves - I dug into a NIST paper Structural Classification of Authenticated Encryption Schemes to be sure.
They are both 'MAC-then-Encrypt' AEAD algorithms, which means that they must first process the plaintext once to authenticate it, and then a second time to encrypt it. This property is one of the reasons that 'Encrypt-then-MAC' AEAD algorithms are usually considered preferable.
The only real error is that botan returns garbage here, and does not return an error code / throw an exception if you try anyway.
It’s time for another update!
The first thing is something to celebrate - I have officially submitted the cryptography community project proposal to the Haskell Foundation. This proposal has been a few weeks in the making, so many thanks to all who provided feedback!
Getting the community proposal written and implementing online / incremental processing for ciphers has been my focus for the last few weeks, and now that they’re both done I need to take a moment and figure out what to work on next. I have several options:
- botan-low
- botan-bindings
- botan-low & botan
I am probably going to go for unit tests, or merging the pull request, because it is better to get the lower level stuff done first.
In the meantime, while I'm deciding that, I'd like to work on the higher-level abstractions a bit to see how practical my lower-level decisions have been. With the recent addition of online processing to the mix, that has got me thinking about two things:
The impetus is that existing efforts for cryptography typeclasses seem to focus on the nitty gritty mechanics of cryptography processing. Look at crypton/ite’s HashAlgorithm (or the (Block)Cipher classes), and the functions used to operate on them:
-- crypton/ite's HashAlgorithm class
class HashAlgorithm alg where
    hashBlockSize :: ...
    hashDigestSize :: ...
    hashInternalContextSize :: ...
    hashInternalInit :: ...
    hashInternalUpdate :: ...
    hashInternalFinalize :: ...
It just concretizes how it implements incremental hashing, rather than abstracting hashing in general. I think we can build a much better typeclass hierarchy for cryptographic primitives by analyzing the commonalities of the algorithm-specific functions.
You’d expect that a typeclass abstraction of hashing would be something much simpler, less restrictive, like:
-- An ideal Hash typeclass
class Hash alg where
    hash :: ByteString -> Digest alg
We can also just use the same interface but with lazy bytestrings for incremental processing, and leave how the incremental processing is achieved up to the instance. This is significant because different crypto libraries may expose incremental processing differently (eg API differences), so we mustn't assume anything about how it performs incremental processing.
-- The only difference is that it takes lazy bytestrings
class (Hash alg) => IncrementalHash alg where
    hash :: LazyByteString -> Digest alg
Then on top of these we can build more typeclasses specific to block sizes and updates and finalizing.
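For instance, a rough sketch of what such an extension could look like, building on the hypothetical Hash and Digest definitions above (the names BlockHash, MutableHash, and HashState are illustrative, not from any existing library):

-- Hypothetical extensions layered on top of Hash - purely a sketch
class (Hash alg) => BlockHash alg where
    hashBlockSize :: proxy alg -> Int

class (Hash alg) => MutableHash alg where
    data HashState alg
    hashInit     :: HashState alg
    hashUpdate   :: HashState alg -> ByteString -> HashState alg
    hashFinalize :: HashState alg -> Digest alg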
The Cipher typeclasses aren't much better; the Cipher class actually only concerns itself with initialization, without defining anything about encryption / decryption. The BlockCipher class slightly improves on this by providing individual functions for each block cipher mode, but that conflates all modes as a requirement, and it means the class should really be called BlockCipherMode. Furthermore, AEADs and stream ciphers are just sort of left to implement algorithm-specific encrypt / decrypt functions on their own. Other crypto primitives do not have classes at all.
Don't get me wrong, extended typeclasses for block-based and incremental hashing / ciphers are great, but we shouldn't use them as base typeclasses, nor assume that all algorithms are capable of such things. Streaming hashes and ciphers do not have (input) block sizes and can take arbitrarily small inputs, but not all algorithms can do this (eg, block-based hashes and ciphers).
I've sketched more out in a file, but it's mostly just musings and scratchwork - I haven't even tried to compile it yet - but a consistent interface has developed, so we're getting somewhere, and just about anything would be an improvement over the current situation.
We all know there's some awkward stuff in base that could be improved, and Bits is one of them - second only to Num in awkwardness, the Bits class is too large, and it unnecessarily lumps boolean algebra functions together with bit field functions. As a result, Bits is often not properly implemented, to the point that there is no proper instance for ByteString, and I believe this negatively affects the low-level-binary Haskell ecosystem in general.
This particular pet peeve is something that's been churning in my mind for a while, and it has only gotten more relevant with the cryptography project. I believe we can fix these things by reworking things a little, and so I've started working on a base proposal to split off a Boolean typeclass from Bits, something like:
class (Eq a) => Boolean a where
    complement :: a -> a -- aka not
    and :: a -> a -> a
    or :: a -> a -> a
    xor :: a -> a -> a
    -- Potentially added to complete the boolean algebra
    false :: a
    true :: a

class (Boolean a) => Bits a where
    ...
By splitting off a Boolean class, we can now make an instance for ByteString for boolean logic, and it also leaves the Bits class as clearly pertaining to bit fields, with more clearly relevant functions.
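As a rough illustration of the payoff (purely a sketch, assuming the Boolean class above), an instance for strict ByteStrings could be written with packZipWith from the bytestring package; how to handle mismatched lengths, and what false / true should even mean for variable-length data, are open design questions:

import qualified Data.Bits as Bits
import qualified Data.ByteString as BS

-- Sketch only: mismatched lengths are truncated to the shorter argument,
-- which is one possible policy among several
instance Boolean BS.ByteString where
    complement = BS.map Bits.complement
    and = BS.packZipWith (Bits..&.)
    or  = BS.packZipWith (Bits..|.)
    xor = BS.packZipWith Bits.xor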
It is still very rough, mostly being a bunch of notes, but I think there is a good point here - what do you think?
That’s all for now - til next time!
Amazing work Leo, thanks for all your progress reports, and community proposal!
What, no {-# MINIMAL nand #-}?
@ApothecaLabs are you getting paid for this work?
If not, then the HF should either fund you directly or ask possible industry users if they would.
The Haskell Community needs this type of work.
ByteString does not have a Bits instance. Bits is a typeclass for integral types (says so on the tin), and arbitrary chunks of memory don't fit this definition.
It would result in an ergonomic improvement for low-level binary operations
This is only useful when you need to treat memory chunks as finite fields and the only place I know that uses this is cryptography.
Ideally yes, Integer / Natural / Bool do not belong to the sized bitwise operation class, but these are small inconsistencies and burning bridges over them is too much work. Changes like these should just be scheduled for base-5, if the Committee ever decides to go for it.
As for the issues you’re facing, can’t you just use Botan’s multiple precision integers?
Boy howdy there was quite a response to this all!
I have my nose to the grindstone, and am working on responding to all of @david-christiansen 's feedback - it is taking a bit as I want to answer thoroughly.
@romes thx
@jhenahan deep inhale Congratulations sir, you win the internet.
@hasufell Thanks! I’m not getting paid for it yet, but I’d love to do this on a more permanent basis if I were, hence the budget request. Someone from Mercury did appear to reach out a few weeks ago, but I haven’t heard back from them yet.
@jackdk That seems about right!
That is indeed why ByteString does not have a valid implementation of Bits - Bits is concerned with representation or encoding, and necessarily includes signing bits and endianness, which affect binary functions like shift and rotate but not boolean functions like xor and complement. My observation is that we could split off a Boolean / Heyting class from Bits with about as much pain as the Semigroup / Monoid debacle, maybe less.
It’s been a little while since I’ve updated the devlog here; for the most part my activity for the last week or two has been focused on the proposal, and I’m just about done with an updated draft that accounts for all of the feedback - I’ll be posting that soon.
In the meantime, I still have been working on the bindings, and I have a lovely little update that I’ve been meaning to post for a few days.
The primary update is to Botan.Bcrypt, which I have fixed up for easier use.

By going through the C++ source, I discovered that there was an unmentioned size pointer check that was being compiled out in most cases, except for tests. In some cases Botan uses size pointers to return the final length of the output, and sometimes it also uses the same pointer to take the output buffer length as an input. It turns out we need to poke the length of the output buffer here; it was erroneously succeeding before because we were providing a buffer of adequate size, but the check was being compiled out.
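To sketch the calling pattern (with a made-up operation standing in for the affected botan functions - this is not the actual botan-low code), the fix amounts to poking the output buffer's capacity into the size pointer before the call, and peeking the number of bytes actually written afterwards:

import qualified Data.ByteString as BS
import Data.Word (Word8)
import Foreign (Ptr, alloca, allocaBytes, castPtr, peek, poke)
import Foreign.C.Types (CInt, CSize)

-- Hypothetical helper for any botan_*-style operation that takes an output
-- buffer plus an in / out size pointer (capacity in, bytes written out)
callWithInOutLen
    :: Int                                  -- capacity of the output buffer
    -> (Ptr Word8 -> Ptr CSize -> IO CInt)  -- the operation to call
    -> IO (Either CInt BS.ByteString)
callWithInOutLen capacity op =
    allocaBytes capacity $ \ outPtr ->
        alloca $ \ szPtr -> do
            poke szPtr (fromIntegral capacity)  -- poke the capacity going in
            rc <- op outPtr szPtr
            if rc /= 0
                then pure (Left rc)
                else do
                    written <- fromIntegral <$> peek szPtr  -- actual length coming out
                    Right <$> BS.packCStringLen (castPtr outPtr, written)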
Now Botan.Bcrypt should be good to go!
In addition to this, I did some work on public key exchange stuff, because it is hanging indefinitely:

- Botan.PubKey.KeyAgreement
- Botan.Low.PubKey.KeyAgreement
- safe vs unsafe doesn't seem to matter
- the botan_pk_op_key_agreement function
I’ve been working on this project for a while, and I thought it would be good to show a bit of where this cryptography project is headed, long-term, while I finish updating the proposal pull request.
You see, I didn’t just idly decide to spend my time writing bindings to a cryptography library. I want to be able to express some very particular cryptography constructs, and I want to express them properly - and that means starting with a solid foundation. That foundation is now solid enough to begin building some of those constructs.
One of those constructs is Merkle, an abstraction of the core concept behind sparse merkle trees. A merkle tree is a data structure that hashes objects and combines their digests in order to provide integrity and cryptographic proofs for many pieces of data. Such structures form the basis of many distributed cryptosystems, but implementations are often highly concrete, focused on binary optimization rather than ease of understanding, and they take pages upon pages to explain and justify everything. That never sat well with me.
Today, I’m here to show you how to build a sparse merkle tree with just a few lines of code, and we will do it in a way that barely mentions cryptography, and allows us to define arbitrary merkle data structures in an incredibly concise manner.
Are you ready to speedrun some merkle trees? We start with our ideal hash class from earlier:
data family Digest a

class Hash a where
    hash :: ByteString -> Digest a
    digest :: Digest a -> ByteString
A sparse merkle set / map is intimately related to a unital magma algebra, or what you might call a non-associative, non-commutative monoid. As such, we can define it by extending the Hash typeclass to Merkle by adding an empty and an append function.
class (Hash a) => Merkle a where
    merkleHash :: ByteString -> Digest a
    merkleEmpty :: Digest a
    merkleAppend :: Digest a -> Digest a -> Digest a
One advantage of this approach is that its conceptual reliance on cryptography is low - the hashing is an implementation detail. This becomes important in the future when I start blending this with indexed recursion schemes, because there are tighter mathematical connections there (we won't get to that today, but we're getting close).
Conceptually, merkleEmpty = hash "", merkleHash = hash, and merkleAppend a b = hash $ digest a <> digest b. However, a naive implementation such as that actually has some problems with what's called a "preimage attack". To make a long explanation short, it gives you the same digest for hashing a plain bytestring as you could get from appending two digests, and we need to be able to distinguish between the two cases (that is, force them to give different results).
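To make the problem concrete, here is the naive scheme spelled out (illustration only, using the Hash class from above; naiveHash and naiveAppend are not part of the real interface):

-- The naive definitions, for illustration only
naiveHash :: (Hash a) => ByteString -> Digest a
naiveHash = hash

naiveAppend :: (Hash a) => Digest a -> Digest a -> Digest a
naiveAppend a b = hash (digest a <> digest b)

-- The problem: a crafted leaf collides with an internal node, because
--   naiveHash (digest a <> digest b) == naiveAppend a b
-- so a single "leaf" containing those bytes is indistinguishable from the
-- parent of a and b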
We can solve this by adding salts / sigils, differentiating between hash and merkleHash, and giving safe default implementations.
class (Hash a) => Merkle a where

    merkleHash :: ByteString -> Digest a
    default merkleHash :: (Hash a) => ByteString -> Digest a
    merkleHash = safeMerkleHash

    merkleEmpty :: Digest a
    default merkleEmpty :: (Hash a) => Digest a
    merkleEmpty = safeMerkleEmpty

    merkleAppend :: Digest a -> Digest a -> Digest a
    default merkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
    merkleAppend = safeMerkleAppend

safeMerkleHash :: (Hash a) => ByteString -> Digest a
safeMerkleHash bs = hash $ concat
    [ "#"
    , bs
    , ";"
    ]

safeMerkleEmpty :: (Hash a) => Digest a
safeMerkleEmpty = hash "" -- Or all zeroes, or Nothing

safeMerkleAppend :: (Hash a) => Digest a -> Digest a -> Digest a
safeMerkleAppend a b | a == safeMerkleEmpty = b
safeMerkleAppend a b | b == safeMerkleEmpty = a
safeMerkleAppend a b = hash $ concat
    [ "$"
    , digest a
    , ":"
    , digest b
    , ";"
    ]
Note that because of the sigils, it is impossible to obtain any value from merkleEmpty or merkleAppend that you would from merkleHash, though one must take care to use merkleHash now (I should differentiate Digest and MerkleDigest to emphasize this).
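One possible way to do that differentiation (just a sketch, nothing final) is a newtype wrapper, so the type checker keeps the two kinds of digest from being mixed up:

-- Hypothetical wrapper to keep plain digests and merkle digests apart
newtype MerkleDigest a = MerkleDigest (Digest a)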
So… what can we do with this unital magma hash / algebra? Well, the first thing that comes to mind is folds, which we can use to trivially produce a merkle list or hash chain. Digests like these are often used for data integrity in things like torrents and distributed systems.
-- NOTE: Left-associative to match function application
infixl 5 <#>
(<#>) :: (Merkle a) => Digest a -> Digest a -> Digest a
(<#>) = merkleAppend
merkleFoldl :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldl = foldl merkleAppend merkleEmpty
merkleFoldr :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldr = foldr merkleAppend merkleEmpty
We’re just calculating the final digest here, but for real use cases we’d keep all of the hashes.
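A sketch of what keeping the intermediate links might look like (merkleChain is a hypothetical helper, not part of the classes above):

import Data.Foldable (toList)

-- Produce the whole hash chain rather than just its final digest
merkleChain :: (Merkle a, Foldable f) => f (Digest a) -> [Digest a]
merkleChain = scanl merkleAppend merkleEmpty . toList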
Now, on to the main thing - sparse merkle trees. Hash lists have been mostly supplanted by merkle trees, especially sparse ones, which are really just binary radix trees, also known as patricia trees. Before we implement our merkle tree, we need a few helper functions.
partitionBit :: (Bits (Digest a)) => Int -> [Digest a] -> ([Digest a],[Digest a])
partitionBit n = List.partition (not . (`testBit` n))
both :: (Bifunctor f) => (a -> b) -> f a a -> f b b
both f = bimap f f
Are you ready? Here we go! 6 lines!
merkleSetDigest :: forall a . (Merkle a) => [Digest a] -> Digest a
merkleSetDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [x] = x
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBit n xs)
It is trivial to obtain a sparse merkle set digest using unital magma hashing. Note that deduplication is unnecessary if the inputs are already unique. If we ignore the type declaration and the deduplication, there are really only 4 lines that matter. Concise!
So what's going on? As we traverse the list of elements, we partition them by each bit, which naturally sorts all of the digests, and as we come back up, we perform a bunch of merkleAppends to join them. Note that an empty node is the empty hash, that appending two empty hashes results in an empty hash, and that appending an empty hash to anything results in that thing. This handles all of our sparseness for us automatically, with no extra work.
Now, this just calculates the final digest, what if we wanted to actually keep the entire tree structure?
merkleSetTree :: forall a . (Merkle a) => [Digest a] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleSetTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = (merkleEmpty, [])
    go _ [x] = (x, [])
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs = uncurry join $ both (go (n - 1)) (partitionBit n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))
In this version, we actually return the full merkle tree. Note that the return type is effectively (TopDigest, [(ParentDigest, (LeftChildDigest, RightChildDigest))]), and that merkleSetDigest = fst . merkleSetTree. By simple convention, terminal nodes are implicit to the tree and are not included in the list of nodes, since their parent already contains them and they themselves have no children. As a result, merkleSetTree [] = (merkleEmpty, []).
NOTE: One of the neat things about merkle sets as sets of elements is that they actually form a monoid hash, constructed from the unital magma hash. Since a merkle set is a set, we can construct a merkle set tree from elements in any order and get the same digest. This property becomes extremely useful in the future.
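As a quick sanity check of that claim (a sketch, assuming the definitions above and an Eq instance for digests), the set digest should be invariant under reordering of the inputs:

-- Property sketch: reordering the inputs should not change the set digest
propOrderIndependent :: (Merkle a, Eq (Digest a)) => [Digest a] -> Bool
propOrderIndependent ds = merkleSetDigest ds == merkleSetDigest (reverse ds)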
Now, this is pretty neat, but we can do one better. What’s better than a merkle set?
A merkle map:
partitionBitMap :: (Bits (Digest a)) => Int -> [(Digest a,b)] -> ([(Digest a,b)],[(Digest a,b)])
partitionBitMap n = List.partition (\ (k,_) -> not (testBit k n))

merkleMapDigest :: forall a . (Merkle a) => [(Digest a, Digest a)] -> Digest a
merkleMapDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = merkleEmpty
    go _ [(k,v)] = merkleAppend k v
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry merkleAppend $ both (go (n - 1)) (partitionBitMap n xs)
And a version that returns the full merkle map:
merkleMapTree :: forall a . (Merkle a) => [(Digest a, Digest a)] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleMapTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ [] = (merkleEmpty, [])
    go _ [x@(k,v)] = let h = merkleAppend k v in (h, [(h,x)])
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs = uncurry join $ both (go (n - 1)) (partitionBitMap n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))
Note that merkleMapDigest = fst . merkleMapTree, and that although the return type appears the same as merkleSetTree's, they have slightly different conventions and handling. Terminal nodes are now included in the list, because their children are their key and value digests, but to find out if you've reached the end, you have to look for the key in the list and not find it. That's a bit awkward, but we've sacrificed heavily in the name of conciseness here. In the real world we'd probably use a more complex but also more performant implementation. These super short implementations are here to illustrate the simplicity of the concept.
Finally, observe that merkleSetDigest [a,b,c,...] == merkleMapDigest [(a,merkleEmpty),(b,merkleEmpty),(c,merkleEmpty),...]. For example:
import Crypto.Merkle
import Crypto.Hash.SHA3
digests = fmap (merkleHash @SHA3) [ "fee","fi","fo","fum"]
t0 = merkleSetDigest digests
(t1,_) = merkleSetTree digests
pairs = fmap (\ dg -> (dg, merkleEmpty @SHA3)) digests
t2 = merkleMapDigest pairs
(t3,_) = merkleMapTree pairs
-- All of these should return the same value
t0
t1
t2
t3
This is due to the relationship that can also be observed between the Haskell Set and Map data types, and it is not a coincidence. Producing identical digests shows that our implementation is consistent and logical, and in the future I will be using these merkle sets and maps to build cryptographic versions of Set and Map that automatically handle proofs of integrity, inclusion, exclusion, insertion, and deletion - fun! However, in order to protect against preimage attacks, I will also add salts / sigils to the merkle set and map functions, to differentiate them.
This merkle tree stuff can be found in crypto-schemes, and you can use the SHA3 from crypto-schemes-botan to test it.
That was a lot, but I've been building up this post for a little while, and I'm glad to finally be posting it. I'll be writing up a much more thorough blog post on this topic in the future.
Absolutely glorious. I am very much looking forward to your full blog posts on this, all of them.
I have a simple but impactful update today.
Early on in the development of these libraries, I adopted a naming convention for the low-level functions in botan-low to help disambiguate them from the pure or higher-level versions of the same functions in botan. This led to many functions being named fooCtxVerbIO, and in general it was a bad idea, but I stuck with it because I didn't want to spend all my time refactoring nomenclature, and I had other things to focus on.
However, lately, it has been increasingly annoying to work with, as it creates a whole bunch of visual noise for no reason, and so this update reverts that naming experiment:
- botan-low function names now match the canonical botan function - encryptCreate instead of pkOpEncryptCreate
- botan now imports botan-low qualified, a la import qualified Botan.Low.* as Low, to disambiguate
- botan-low tests
- botan
- botanium
Function names in botan are unchanged, meaning that they are still a mess, but they will undergo a similar naming refactor / simplification in the future.
The update to the proposal is coming soon, but I took a breather for a few days after @Bodigrim 's advice to not worry or rush. I'm glad I did, because I finally got this done.
It's a day for another update!
Changes are mostly new unit tests for a few modules, and some benchmarks against crypton!

- A botan-low-bench benchmark for testing botan Bcrypt and SHA3 against crypton
Aside from that, the proposal is in its final draft stage, so if you want to give feedback on it, this is your last chance!
The repo has been updated with a few more things:
Slowly but surely we’re getting unit tests done for all of our modules, and we can see what passes, what fails, and what still needs work - it is quite revealing. We still need unit tests for:
- PubKey and PubKey.*
- RNG
- SRP6
- TOTP
- Utility
- X509
I've sort of just been going in alphabetical order, but it is nice to see we're getting near the end.
At some point soon I'll need to split the unit tests into botan-low-tests for testing that the bindings are bound properly, and botan-tests for more exhaustive algorithm-specific tests including test vectors, and I'll be further breaking them up into individual test targets by primitive too.
So why is unit testing so important now?
While writing botan, I proceeded as a series of steps. First I wrote the FFI bindings, then wrote type-safe functions that compiled and ran. I tested in GHCi to make sure that these functions worked for at least one algorithm, but did not ensure that they worked for all algorithms, as doing exhaustive unit testing requires knowing the algorithm-specific quirks for each algorithm, basically going through them one by one.
As a result, many algorithms would throw botan error codes or have edge cases in behavior, and it wasn't really feasible yet to build higher-level functions if I didn't know that they would be valid paths for all algorithms. Now that the core of botan-low is properly implemented, I've focused on writing unit tests, and it's been catching all sorts of things. A recent example of this is the SIV and CCM ciphers, which do not support online processing but simply return garbage - this was caught via the unit tests.
As I've gone through these tests, I've also been cataloguing which algorithms are available by combing through the C++ source, testing and showing that they work for all algorithms - and documenting which ones don't. This has led to a large improvement in the representation of what algorithms are available, effectively shaping our higher-level algorithm data types in botan. It might not be noticeable if you hardcode a specific, commonly-used algorithm, but it has a large impact if there is a choice of algorithms.
Plus, there are good reasons for doing unit tests now, instead of later:
- botan, eg the structure of our ADTs

Basically, it helps us plan our next move more clearly. There's enough code now that it is no longer trivial to ensure that changes here don't break things there, and so unit tests help us keep moving forward.
So what’s next? After working on the proposal for so long, I need a moment to reorient my focus back to the code, but I’ve definitely got some specific goals:
- botan-low and botan algorithm names
- botan-low ready for hackage candidate

Onwards!
Once again, the repo has been updated, this time with a good number of things.
Almost all of the basic unit tests are done now, and though there are algorithm combos that fail, things are more or less passing in general. There are still some PubKey functions that need testing.
X509 certificates also don’t have unit tests yet, but I already know I’m going to need to take a deeper look at them so I’m holding off on that and considering it to be a larger, but important issue.
The big thing today, aside from all of the unit tests, is that pubkey encryption, decryption, signing, and verification are now all working for all algorithms, though they are a bit fiddly because of parameters.
Signing was actually slightly broken, and was in some cases producing signatures that would fail verification. A closer look and some experimentation showed that sizes weren’t correct, which was slightly unexpected behavior given that many other functions allow you to query the size pointer by providing a null pointer for input.
In retrospect I realized that the size was non-deterministic due to the random generator. Querying once and running it again with the queried size would actually result in a different random number being generated internally, and thus the occasional InsufficientBufferSpace exception - which is why I couldn’t use querying for these functions (or any other function with non-deterministic buffers, really).
Armed with this knowledge, I figured out that I needed to both poke a size upper bound into a particular pointer and read the actual size afterwards, and that pretty much got it working properly. Voila!
In some good news that actually happened last Friday, the TWG committee has voted to approve the proposal for recommendation to the Haskell Foundation.
This is a recommendation and not a commitment, and the Foundation still has to accept (or reject) it, but they are meeting Thursday, and this item is on the agenda! I obviously have high hopes, but I’ve made it this far with everyone’s help and feedback, and either way I’m happy to have had your support.
'Til next time!
It has been a busy few days, and I am happy to state today that unit tests for botan-low are minimally complete! I divvied up the remaining unit tests into similar groups, and managed to get through them all, including a few fixes for key agreement and key encapsulation, which were the final pubkey operations.
Admittedly, there are still lots of testing improvements to be made, but every primitive should be passing tests for at least one algorithm, and most of them for several or all. It's awkward to use, but it all seems to be working, aside from a few specific things that need a more dedicated focus (which we are getting closer to now that these tests are done).
After writing these tests, I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low more responsible for all of the algorithm data types. It's hard to get 100% algorithm combination testing coverage using only string data types, because not everything is compatible, and the tests help us tell what is - so algorithm ADTs are almost necessary to do better testing.
This would be a slight departure from the proposal already, but not a great one, and I believe the proposal process acknowledges this sort of change anyhow. I’m sure it will all shake loose as things develop.
The repo has been updated.
Also, I just noticed I goofed on a commit message - I pasted the previous message for format but forgot to edit it. The second commit, entitled Added unit tests for PubKey DH, DSA, ElGamal, and RSA, should be entitled Added unit tests for PubKey ECDH, ECDSA, and SM2 - pay this no mind.
I am also thinking about making a change in responsibilities regarding botan-low vs botan - making botan-low be more responsible for all of the algorithm data types.
For the record, I like that the botan-low API currently simply mimics botan's FFI API by accepting strings, and I think it should stay this way for mirroring and flexibility purposes. Higher-level APIs with ADTs should be a layer over it.