@ApothecaLabs are you getting paid for this work?
If not, then the HF should either fund you directly or ask possible industry users if they would.
The Haskell Community needs this type of work.
> `ByteString` does not have a `Bits` instance.

`Bits` is a typeclass for integral types (says so on the tin), and arbitrary chunks of memory don't fit this definition.

> It would result in an ergonomic improvement for low-level binary operations

This is only useful when you need to treat memory chunks as finite fields, and the only place I know that uses this is cryptography.

Ideally, yes: `Integer` / `Natural` / `Bool` do not belong to the sized bitwise operation class, but these are small inconsistencies and burning bridges over them is too much work. Changes like these should just be scheduled for `base-5`, if the Committee ever decides to go for it.
As for the issues you’re facing, can’t you just use Botan’s multiple precision integers?
Boy howdy there was quite a response to this all!
I have my nose to the grindstone, and am working on responding to all of @david-christiansen's feedback - it is taking a bit, as I want to answer thoroughly.
@romes thx
@jhenahan *deep inhale* Congratulations sir, you win the internet.
@hasufell Thanks! I’m not getting paid for it yet, but I’d love to do this on a more permanent basis if I were, hence the budget request. Someone from Mercury did appear to reach out a few weeks ago, but I haven’t heard back from them yet.
@jackdk That seems about right!
That is indeed why `ByteString` does not have a valid implementation of `Bits` - `Bits` is concerned with representation or encoding, and necessarily includes sign bits and endianness, which affect binary functions like `shift` and `rotate` but not boolean functions like `xor` and `complement`. My observation is that we could split off a `Boolean` / `Heyting` class from `Bits` with about as much pain as the `Semigroup` / `Monoid` debacle, maybe less.
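To make that concrete, here is a rough sketch of the shape such a split might take (the class layout and method names are hypothetical, purely illustrative):

```haskell
-- Hypothetical: the purely boolean operations need no notion of sign
-- bits, width, or endianness, so ByteString could lawfully implement them.
class Boolean a where
    (.&.)      :: a -> a -> a
    (.|.)      :: a -> a -> a
    xor        :: a -> a -> a
    complement :: a -> a

-- Bits would then extend Boolean with the representation-dependent
-- operations that ByteString cannot sensibly support.
class (Boolean a) => Bits a where
    shift  :: a -> Int -> a
    rotate :: a -> Int -> a
```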
It’s been a little while since I’ve updated the devlog here; for the most part my activity for the last week or two has been focused on the proposal, and I’m just about done with an updated draft that accounts for all of the feedback - I’ll be posting that soon.
In the meantime, I still have been working on the bindings, and I have a lovely little update that I’ve been meaning to post for a few days.
The primary update is that I have fixed `Botan.Bcrypt` for easier use.

By going through the C++ source, I discovered that there was an unmentioned size-pointer check that was being compiled out in most cases, except for tests. In some cases Botan uses size pointers to return the final length of the output, and sometimes it also uses the same pointer to take the output buffer length as an input. It turns out we need to poke the length of the output buffer here; it was erroneously succeeding before because we were providing a buffer of adequate size, but the check was being compiled out.

Now `Botan.Bcrypt` should be good to go!
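For the curious, the fix boils down to treating the size pointer as in/out. A simplified sketch of the pattern (illustrative names, not the actual botan-low internals):

```haskell
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Run an FFI action that takes an output buffer and an in/out size
-- pointer: poke the buffer length *in* before the call (the check that
-- was being compiled out reads it), then peek the final length *out*.
withSizePtr :: Int -> (Ptr a -> Ptr CSize -> IO CInt) -> IO (CInt, Int)
withSizePtr bufLen act =
    allocaBytes bufLen $ \ outPtr ->
        alloca $ \ szPtr -> do
            poke szPtr (fromIntegral bufLen) -- the previously-missing poke
            rc <- act outPtr szPtr
            finalLen <- peek szPtr
            return (rc, fromIntegral finalLen)
```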
In addition to this, I did some work on public key exchange, because it is hanging indefinitely:

- `Botan.PubKey.KeyAgreement`
- `Botan.Low.PubKey.KeyAgreement`
- `safe` vs `unsafe` doesn't seem to matter
- the `botan_pk_op_key_agreement` function
I’ve been working on this project for a while, and I thought it would be good to show a bit of where this cryptography project is headed, long-term, while I finish updating the proposal pull request.
You see, I didn’t just idly decide to spend my time writing bindings to a cryptography library. I want to be able to express some very particular cryptography constructs, and I want to express them properly - and that means starting with a solid foundation. That foundation is now solid enough to begin building some of those constructs.
One of those constructs is `Merkle`, an abstraction of the core concept behind sparse merkle trees. A merkle tree is a data structure that hashes objects and combines their digests in order to provide integrity and cryptographic proofs for many pieces of data. Such structures form the basis of many distributed cryptosystems, but implementations are often highly concrete, focused on binary optimization rather than ease of understanding, and take pages upon pages to explain and justify everything. That never sat well with me.
Today, I’m here to show you how to build a sparse merkle tree with just a few lines of code, and we will do it in a way that barely mentions cryptography, and allows us to define arbitrary merkle data structures in an incredibly concise manner.
Are you ready to speedrun some merkle trees? We start with our ideal hash class from earlier:
```haskell
data family Digest a

class Hash a where
    hash   :: ByteString -> Digest a
    digest :: Digest a -> ByteString
```
A sparse merkle set / map is intimately related to a unital magma algebra, or what you might call a non-associative, non-commutative monoid. As such, we can define it by extending the `Hash` typeclass to `Merkle`, adding an empty element and an append function.
```haskell
class (Hash a) => Merkle a where
    merkleHash   :: ByteString -> Digest a
    merkleEmpty  :: Digest a
    merkleAppend :: Digest a -> Digest a -> Digest a
```
One advantage of this approach is that its conceptual reliance on cryptography is low - the hashing is an implementation detail. This becomes important in the future, when I start blending this with indexed recursion schemes, because there are tighter mathematical connections there (we won't get to that today, but we're getting close).
Conceptually, `merkleEmpty = hash ""`, `merkleHash = hash`, and `merkleAppend a b = hash $ digest a <> digest b`. However, a naive implementation such as that actually has some problems with what's called a "preimage attack". To make a long explanation short, it gives you the same digest for hashing a plain bytestring as you could get from appending two digests, and we need to be able to distinguish between the two cases (that is, force them to give different results).
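To spell the problem out (a sketch in terms of the `Hash` class from above):

```haskell
-- The naive interior-node hash:
naiveAppend :: (Hash a) => Digest a -> Digest a -> Digest a
naiveAppend a b = hash (digest a <> digest b)

-- The collision: a *leaf* whose content happens to be the concatenation
-- of two digests hashes to the same value as the *interior node* above
-- them, so one can be passed off as the other:
--
--   hash (digest a <> digest b) == naiveAppend a b
```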
We can solve this by adding salts / sigils, differentiating between `hash` and `merkleHash`, and giving safe default implementations.
```haskell
class (Hash a) => Merkle a where

    merkleHash :: ByteString -> Digest a
    default merkleHash :: (Hash a) => ByteString -> Digest a
    merkleHash = safeMerkleHash

    merkleEmpty :: Digest a
    default merkleEmpty :: (Hash a) => Digest a
    merkleEmpty = safeMerkleEmpty

    merkleAppend :: Digest a -> Digest a -> Digest a
    default merkleAppend :: (Hash a, Eq (Digest a)) => Digest a -> Digest a -> Digest a
    merkleAppend = safeMerkleAppend

safeMerkleHash :: (Hash a) => ByteString -> Digest a
safeMerkleHash bs = hash $ concat
    [ "#"
    , bs
    , ";"
    ]

safeMerkleEmpty :: (Hash a) => Digest a
safeMerkleEmpty = hash "" -- Or all zeroes, or Nothing

-- NOTE: the empty checks require an Eq instance for digests
safeMerkleAppend :: (Hash a, Eq (Digest a)) => Digest a -> Digest a -> Digest a
safeMerkleAppend a b | a == safeMerkleEmpty = b
safeMerkleAppend a b | b == safeMerkleEmpty = a
safeMerkleAppend a b = hash $ concat
    [ "$"
    , digest a
    , ":"
    , digest b
    , ";"
    ]
```
Note that because of the sigils, it is impossible to obtain any value from `merkleEmpty` or `merkleAppend` that you would from `merkleHash`, though one must take care to use `merkleHash` now (I should differentiate `Digest` and `MerkleDigest` to emphasize this).
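A newtype would probably be enough to enforce that distinction (hypothetical, not yet in the library):

```haskell
-- Hypothetical: digests produced under the merkle sigils get their own
-- type, so plain Hash digests can't sneak into a merkle structure.
newtype MerkleDigest a = MerkleDigest (Digest a)
```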
So… what can we do with this unital magma hash / algebra?

Well, the first thing that comes to mind is folds, which we can use to trivially produce a merkle list or hash chain. Digests like these are often used for data integrity in things like torrents and distributed systems.
```haskell
-- NOTE: Left-associative to match function application
infixl 5 <#>
(<#>) :: (Merkle a) => Digest a -> Digest a -> Digest a
(<#>) = merkleAppend

merkleFoldl :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldl = foldl merkleAppend merkleEmpty

merkleFoldr :: (Merkle a, Foldable f) => f (Digest a) -> Digest a
merkleFoldr = foldr merkleAppend merkleEmpty
```
We’re just calculating the final digest here, but for real use cases we’d keep all of the hashes.
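For instance, keeping every intermediate digest is just a matter of swapping the fold for a scan (a sketch, assuming the definitions above):

```haskell
-- Every prefix digest of the hash chain, starting from merkleEmpty.
merkleChain :: (Merkle a) => [Digest a] -> [Digest a]
merkleChain = scanl merkleAppend merkleEmpty
```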
Now, on to the main thing - sparse merkle trees. Hash lists have been mostly supplanted by merkle trees, especially sparse ones, which are just binary radix trees, also known as Patricia trees. Before we implement our merkle tree, we need a few helper functions.
```haskell
partitionBit :: (Bits (Digest a)) => Int -> [Digest a] -> ([Digest a], [Digest a])
partitionBit n = List.partition (not . (`testBit` n))

both :: (Bifunctor f) => (a -> b) -> f a a -> f b b
both f = bimap f f
```
Are you ready? Here we go! 6 lines!
```haskell
merkleSetDigest :: forall a . (Merkle a) => [Digest a] -> Digest a
merkleSetDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []  = merkleEmpty
    go _ [x] = x
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs  = uncurry merkleAppend $ both (go (n - 1)) (partitionBit n xs)
```
It is trivial to obtain a sparse merkle set digest using unital magma hashing. Note that deduplication is unnecessary if the inputs are already unique. If we ignore the type declaration and the deduplication, there are really only 4 lines that matter. Concise!
So what's going on? As we traverse the list of elements, we partition them by each bit, which naturally sorts all of the digests, and as we come back up, we perform a bunch of `merkleAppend`s to join them. Note that an empty node is the empty hash, that appending two empty hashes results in an empty hash, and that appending an empty hash to anything results in that thing. This handles all of our sparseness for us automatically, with no extra work.
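Those identities are just the unit laws of the unital magma, and they are easy to check (a sketch; the `Eq` constraint is spelled out in case `Merkle` doesn't already entail it):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

unitLaws :: forall a . (Merkle a, Eq (Digest a)) => Digest a -> Bool
unitLaws x = and
    [ merkleAppend merkleEmpty (merkleEmpty :: Digest a) == merkleEmpty
    , merkleAppend merkleEmpty x == x
    , merkleAppend x merkleEmpty == x
    ]
```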
Now, this just calculates the final digest, what if we wanted to actually keep the entire tree structure?
```haskell
merkleSetTree :: forall a . (Merkle a) => [Digest a] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleSetTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []  = (merkleEmpty, [])
    go _ [x] = (x, [])
    go n (x:x':xs) | x == x' = go n (x:xs) -- Deduplication
    go n xs  = uncurry join $ both (go (n - 1)) (partitionBit n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))
```
In this version, we actually return the full merkle tree. Note that the return type is effectively `(TopDigest, [(ParentDigest, (LeftChildDigest, RightChildDigest))])`, and that `merkleSetDigest = fst . merkleSetTree`. By simple convention, terminal nodes are implicit to the tree, and are not included in the list of nodes, since their parent already contains them and they themselves have no children. As a result, `merkleSetTree [] = (merkleEmpty, [])`.
NOTE: One of the neat things about merkle sets as sets of elements is that they actually form a monoid hash, constructed from the unital magma hash. Since a merkle set is a set, we can construct a merkle set tree from elements in any order and get the same digest. This property becomes extremely useful in the future.
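That order-independence is easy to state as a property test (a QuickCheck-style sketch; the constraints are spelled out in case `Merkle` doesn't already entail them, and keep inputs small since `permutations` is factorial):

```haskell
import Data.Bits (Bits)
import Data.List (permutations)

-- Any permutation of the inputs yields the same set digest.
prop_orderIndependent :: (Merkle a, Eq (Digest a), Bits (Digest a)) => [Digest a] -> Bool
prop_orderIndependent ds =
    all ((== merkleSetDigest ds) . merkleSetDigest) (permutations ds)
```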
Now, this is pretty neat, but we can do one better. What’s better than a merkle set?
A merkle map:
```haskell
partitionBitMap :: (Bits (Digest a)) => Int -> [(Digest a, b)] -> ([(Digest a, b)], [(Digest a, b)])
partitionBitMap n = List.partition (\ (k,_) -> not (testBit k n))

merkleMapDigest :: forall a . (Merkle a) => [(Digest a, Digest a)] -> Digest a
merkleMapDigest = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []      = merkleEmpty
    go _ [(k,v)] = merkleAppend k v
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs      = uncurry merkleAppend $ both (go (n - 1)) (partitionBitMap n xs)
```
And a version that returns the full merkle map:
```haskell
merkleMapTree :: forall a . (Merkle a) => [(Digest a, Digest a)] -> (Digest a, [(Digest a, (Digest a, Digest a))])
merkleMapTree = go (digestSize (Proxy :: Proxy a) - 1) where
    go _ []        = (merkleEmpty, [])
    go _ [x@(k,v)] = let h = merkleAppend k v in (h, [(h,x)])
    go n (x@(k,_):(k',_):xs) | k == k' = go n (x:xs) -- Deduplication on key
    go n xs        = uncurry join $ both (go (n - 1)) (partitionBitMap n xs)
    join (l,ls) (r,rs) = let top = merkleAppend l r in (top, (top, (l,r)) : (ls ++ rs))
```
Note that `merkleMapDigest = fst . merkleMapTree`, and that although the return type appears the same as `merkleSetTree`'s, they have slightly different conventions and handling. Terminal nodes are now included in the list, because their children are their key and value digests, but to find out whether you've reached the end, you have to look for the key in the list and not find it.

That's a bit awkward, but we've sacrificed heavily in the name of conciseness here. In the real world, we'd probably use a more complex but more performant implementation. These super-short implementations are here to illustrate the simplicity of the concept.
Finally, observe that `merkleSetDigest [a,b,c,...] == merkleMapDigest [(a,merkleEmpty),(b,merkleEmpty),(c,merkleEmpty),...]`.
```haskell
import Crypto.Merkle
import Crypto.Hash.SHA3

digests = fmap (merkleHash @SHA3) ["fee","fi","fo","fum"]
t0      = merkleSetDigest digests
(t1,_)  = merkleSetTree digests

pairs   = fmap (\ dg -> (dg, merkleEmpty @SHA3)) digests
t2      = merkleMapDigest pairs
(t3,_)  = merkleMapTree pairs

-- All of these should return the same value
t0
t1
t2
t3
```
This is due to the relationship that can also be observed between the Haskell `Set` and `Map` data types, and this is not a coincidence. Producing identical digests shows that our implementation is consistent and logical, and in the future, I will be using these merkle sets and maps to build cryptographic versions of `Set` and `Map` that automatically handle proofs of integrity, inclusion, exclusion, insertion, and deletion - fun! However, in order to protect against preimage attacks, I will also add salts / sigils to the merkle set and map functions, to differentiate them.
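In the same spirit as `safeMerkleHash`, that might look something like this (the sigil choices here are purely illustrative):

```haskell
-- Hypothetical domain separation for sets vs maps, so that a merkle set
-- and a merkle map can never produce colliding digests.
merkleSetLeaf :: (Merkle a) => Digest a -> Digest a
merkleSetLeaf d = merkleHash ("S" <> digest d)

merkleMapLeaf :: (Merkle a) => Digest a -> Digest a -> Digest a
merkleMapLeaf k v = merkleHash ("M" <> digest k <> digest v)
```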
This merkle tree stuff can be found in `crypto-schemes`, and you can use the `SHA3` implementation from `crypto-schemes-botan` to test it.
That was a lot, but I've been building up this post for a little while, and I'm glad to finally be posting it. I'll be writing up a much more thorough blog post on this topic in the future.
Absolutely glorious. I am very much looking forward to your full blog posts on this, all of them.
I have a simple but impactful update today.
Early on in the development of these libraries, I adopted a naming convention for the low-level functions in `botan-low` to help disambiguate them from the pure or higher-level versions of the same functions in `botan`. This led to many functions being named `fooCtxVerbIO`, and in general it was a bad idea, but I stuck with it because I didn't want to spend all my time refactoring nomenclature, and I had other things to focus on.
However, lately it has been increasingly annoying to work with, as it creates a whole bunch of visual noise for no reason, and so this update reverts that naming experiment:

- Function names in `botan-low` match the canonical botan function - `encryptCreate` instead of `pkOpEncryptCreate`
- `botan` now imports `botan-low` qualified, a la `import qualified Botan.Low.* as Low`, to disambiguate
- The `botan-low` tests, `botan`, and `botanium` have all been updated to match

Function names in `botan` are unchanged, meaning that they are still a mess, but they will undergo a similar naming refactor / simplification in the future.
The update to the proposal is coming soon, but I took a breather for a few days after @Bodigrim's advice to not worry or rush. I'm glad I did, because I finally got this done.
It's a day for another update!

Changes are mostly new unit tests for a few modules, and some benchmarks against `crypton`!

- New `botan-low-bench` benchmark, for testing botan Bcrypt and SHA3 against `crypton`
Aside from that, the proposal is in its final draft stage, so if you want to give feedback on it, this is your last chance!
The repo has been updated with a few more things:
Slowly but surely we're getting unit tests done for all of our modules, and we can see what passes, what fails, and what still needs work - it is quite revealing. We still need unit tests for:

- `PubKey` and `PubKey.*`
- `RNG`
- `SRP6`
- `TOTP`
- `Utility`
- `X509`

I've sort of just been going in alphabetical order, but it is nice to see we're getting near the end.
At some point soon I'll need to split the unit tests into `botan-low-tests` for testing that the bindings are bound properly, and `botan-tests` for more exhaustive algorithm-specific tests including test vectors, and I'll be further breaking them up into individual test targets by primitive too.
So why is unit testing so important now?
While writing `botan`, I proceeded in a series of steps. First I wrote the FFI bindings, then I wrote type-safe functions that compiled and ran. I tested in GHCi to make sure that these functions worked for at least one algorithm, but did not ensure that they worked for all algorithms, as doing exhaustive unit testing requires knowing the quirks of each algorithm, basically going through them one by one.
As a result, many algorithms would throw botan error codes or have edge cases in behavior, and it wasn't really feasible to build higher-level functions yet if I didn't know that they would be valid paths for all algorithms. Now that the core of `botan-low` is properly implemented, I've focused on writing unit tests, and it's been catching all sorts of things. A recent example is the offline SIV and CCM ciphers, which do not support online processing but simply return garbage instead of failing - this was caught via the unit tests.
As I've gone through these tests, I've also been cataloguing which algorithms are available by combing through the C++ source, testing and showing that they work - and documenting which ones don't. This has led to a large improvement in the representation of what algorithms are available, effectively shaping our higher-level algorithm data types in `botan`. It might not be noticeable if you hardcode a specific, commonly-used algorithm, but it has a large impact if there is a choice of algorithms.
Plus, there are good reasons for doing unit tests now, instead of later:

- They inform the design of `botan`, eg the structure of our ADTs

Basically, it helps us plan our next move more clearly. There's enough code now that it is no longer trivial to ensure that changes here don't break things there, and so unit tests help us keep moving forward.
So what's next? After working on the proposal for so long, I need a moment to reorient my focus back to the code, but I've definitely got some specific goals:

- Harmonize the `botan-low` and `botan` algorithm names
- Get `botan-low` ready for a hackage candidate

Onwards!
Once again, the repo has been updated, this time with a good number of things.
Almost all of the basic unit tests are done now, and though there are algorithm combos that fail, things are more or less passing in general. There are still some PubKey functions that need testing.

X509 certificates also don't have unit tests yet, but I already know I'm going to need to take a deeper look at them, so I'm holding off on that and considering it a larger, but important, issue.
The big thing today, aside from all of the unit tests, is that pubkey encryption, decryption, signing, verification are now all working for all algorithms, though a bit fiddly because of parameters.
Signing was actually slightly broken, and was in some cases producing signatures that would fail verification. A closer look and some experimentation showed that sizes weren’t correct, which was slightly unexpected behavior given that many other functions allow you to query the size pointer by providing a null pointer for input.
In retrospect, I realized that the size was non-deterministic due to the random generator: querying once and running it again with the queried size would actually result in a different random number being generated internally, and thus the occasional `InsufficientBufferSpace` exception - which is why I couldn't use querying for these functions (or any other function with non-deterministic buffers, really).

Armed with this knowledge, I figured out that I needed to both poke a size upper bound into a particular pointer and read its actual size afterwards, and that pretty much got it working properly. Voila!
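Concretely, the working pattern looks something like this (a simplified sketch, not the actual botan-low code):

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek, poke)

-- Allocate a generous upper bound, poke it into the size pointer, and
-- trust the *peeked* size afterwards - never a separately-queried size,
-- since the RNG makes each signature's length non-deterministic.
signToBS :: Int -> (Ptr Word8 -> Ptr CSize -> IO CInt) -> IO BS.ByteString
signToBS upperBound act =
    allocaBytes upperBound $ \ out ->
        alloca $ \ szPtr -> do
            poke szPtr (fromIntegral upperBound)
            _rc <- act out szPtr -- real code would check the error code
            actualLen <- peek szPtr
            BS.packCStringLen (castPtr out, fromIntegral actualLen)
```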
In some good news that actually happened last Friday, the TWG committee has voted to approve the proposal for recommendation to the Haskell Foundation.
This is a recommendation and not a commitment, and the Foundation still has to accept (or reject) it, but they are meeting Thursday, and this item is on the agenda! I obviously have high hopes, but I’ve made it this far with everyone’s help and feedback, and either way I’m happy to have had your support.
'Til next time!
It has been a busy few days, and I am happy to state today that unit tests for `botan-low` are minimally complete! I divvied up the remaining unit tests into similar groups, and managed to get through them all, including a few fixes for key agreement and key encapsulation, which were the final pubkey operations.
Admittedly, there are still lots of testing improvements to be made, but every primitive should be passing tests for at least one algorithm, and most of them for several or all. It's awkward to use, but it all seems to be working, aside from a few specific things that need a more dedicated focus (which we are getting closer to now that these tests are done).
After writing these tests, I am also thinking about making a change in responsibilities regarding `botan-low` vs `botan` - making `botan-low` more responsible for all of the algorithm data types. It's hard to get 100% algorithm-combination testing coverage using only string data types, because not everything is compatible, and the tests help us tell what is - so algorithm ADTs are almost necessary to do better testing.
This would be a slight departure from the proposal already, but not a great one, and I believe the proposal process acknowledges this sort of change anyhow. I’m sure it will all shake loose as things develop.
The repo has been updated.
Also, I just noticed I goofed on a commit message - I pasted the previous message for format but forgot to edit it. The second commit, entitled `Added unit tests for PubKey DH, DSA, ElGamal, and RSA`, should be entitled `Added unit tests for PubKey ECDH, ECDSA, and SM2` - pay this no mind.
> I am also thinking about making a change in responsibilities regarding `botan-low` vs `botan` - making `botan-low` be more responsible for all of the algorithm data types.
For the record, I like that the `botan-low` API currently simply mimics botan's FFI API by accepting strings, and I think it should stay this way for mirroring and flexibility purposes. A higher-level API with ADTs should be a layer over it.
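For instance, the kind of layering being suggested might look like this (a hypothetical sketch; the ADT is illustrative, with the rendered strings following Botan's algorithm-name format):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 (ByteString, pack)

-- Hypothetical higher-level ADT in botan, rendered down to the string
-- names that the string-based botan-low / Botan FFI layer accepts.
data HashSpec
    = SHA3_512
    | Blake2b Int -- digest size in bits

hashName :: HashSpec -> ByteString
hashName SHA3_512    = "SHA-3(512)"
hashName (Blake2b n) = pack ("BLAKE2b(" ++ show n ++ ")")
```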
100%. C embeds in Haskell pretty much perfectly (except CBV structs), so verbatim C bindings are crucial.

I even just call the module `C`, like `SDL.GPU.C`.

Then the layer on top, which abstracts over memory allocation and bracketing and other idioms, I call `Simple`.
In that case, rather than give more responsibility to `botan-low`, I need to take some away and give it to `botan` instead.

There is a power-responsibility mismatch: over the last few months, `botan-low` has accrued a few more responsibilities than I'd like (online vs offline processing, unit testing, size queries) that make it more than just managing buffers. I need the strong typing to be able to make some necessary distinctions between algorithms, but moving those items up into `botan` would suffice, instead of moving the types down.
This is exciting! My first update after this project’s proposal was accepted for funding!
So what have I been up to? X509 certificates! Support for this functionality in the Botan C FFI is a little sparse, so I've been going over everything to see what's what. There's actually quite a bit of functionality missing compared to the full Botan C++ library, and it's a section of this project where I know I'll be writing a little bit of C++ that may get contributed upstream. I've started adding lots of notes and data types, and X509 is probably going to be my primary focus for a bit while I regroup over some things and plan out some of the next month's work.
All in all though, the basic certificate loading and queries seem to be functional, but there is no support for Certificate Revocation Lists aside from checking against pre-existing CRLs that can't be created or modified, and no support for Certificate Authorities or Stores - not in the FFI, at least, which is why we'll be getting our hands dirty with C++ to extend the C FFI.
The good thing is that what's there does work, and that's nice to check off the list. I've been digging through the Botan C++ source to see what else is needed - identifying gaps to fill. TLS is probably also going to need similar work, but I've been preparing for this.
That’s it for the moment - 'til next time!
For the last few days, I’ve been working on extending Botan’s FFI to include X509 Certificate Authorities and Certificate Signing Requests. I’m a bit rusty with the C++, but I’m shaking off that rust quite nicely. Modern C++ has some nice things going for it, if you aren’t burdened with a legacy codebase.
I’ve hit a (temporary) wall with what I can achieve in my attempts to extend Botan’s FFI by adding C++ to the Haskell project directly. Now, this isn’t really bad, as it is something I very much expected, but the practical result is that I just need to spend a bit of time setting up build flags and things earlier than expected.
The problem is that Botan does not expose the proper headers to access the `Botan_FFI` C++ namespace, and as a result, I cannot obtain a reference to the encapsulated C++ objects, which I need in order to pass them to the C++ functions that I am newly wrapping / calling.

For example, I have a Haskell `PrivKeyPtr` / `botan_privkey_t` and need to access the `Botan::Private_Key` underneath. I can see `botan_privkey_t` exposed in the FFI headers, and I have access to `Botan::Private_Key` via the public C++ headers, but I can only translate between the two if I have access to Botan's internal `Botan_FFI` headers, which I don't.
```haskell
-- Haskell
foreign import ccall unsafe hs_botan_x509_ca_create
    :: Ptr X509CAPtr
    -> X509CertPtr
    -> PrivKeyPtr
    -> Ptr CChar
    -> RNGPtr
    -> IO BotanErrorCode
```

```c
// C FFI
int hs_botan_x509_ca_create(
    hs_botan_x509_ca_t* ca,
    botan_x509_cert_t cert,
    botan_privkey_t key,
    const char* hash_fn,
    botan_rng_t rng);
```

```cpp
// C++
X509_CA X509_CA(
    const X509_Certificate &cert,
    const Private_Key &key,
    const std::string &hash_fn,
    RandomNumberGenerator &rng);
```
Now, this doesn't stop me from implementing new C structs for wrapping C++ objects, nor does it stop me from using the old ones - the issue is that I can't mix the two. The opaque `botan_struct` pointers remain opaque, despite the new functions needing that access.

I could just re-implement `botan_privkey_t` as `hs_botan_privkey_t` and then be able to access its `Botan::Private_Key` for use with `hs_botan_x509_ca_create` / `X509_CA(...)`, but then I couldn't use that new privkey struct in the old functions. I'd have to rewrite the entire FFI from scratch, or at the very least clone the entire `botan/ffi` directory into `cbits`, which I do not want to do.
The only real option is to fork Botan and extend the FFI from inside Botan C++. However, now you see why this is not really a problem - we were planning on contributing these improvements back upstream anyway; it's just that we have to get set up to do that now. Oh bother.
As a result of this, I am gating the extended FFI behind the `XFFI` flag. Anyone who wants to test the experimental FFI can instead clone the experimental botan-upstream fork, build and install it from source, and use the `XFFI` flag to enable the experimental FFI modules. There are some instructions in the README, and it looks mostly like this:
```sh
# Clone
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include
```
I'm still setting up everything, but I will be moving the C++ FFI code from `cbits` to the `botan-upstream` fork. This leaves regular users unaffected, and you can continue to use a regular Botan install via your favorite package manager / distribution if you wish.

For the moment the `botan-upstream` fork is unmodified, so there's not much point to it yet, but that will be changing quite soon.
That’s all for now!
So I've been doing a lot more tinkering with C++ for the last few days, and have gotten everything set up with the new botan-upstream fork. The extended FFI is working, the new C structs now really wrap their C++ counterparts, and the new bindings can all find their symbols, which is great. Plus, contributing back upstream to C++ Botan is just a pull request away now, too.
Botan’s FFI tools are reasonably nice to work with, even though they aren’t exposed externally. They have macros and defines for simple tasks, and for safely ‘visiting’ C++ -land from C, and so a lot of it boils down to safely casting between C and C++ as long as the arguments line up. I still have some questions, but for the most part Botan’s FFI follows a consistent formula, which I’ve been able to apply to the new structs and functions.
This has made for some pretty good progress, despite my not having touched C++ for the better part of a decade - I've even gotten as far as implementing `botan_x509_ca_create`, `botan_x509_ca_create_padding`, and `botan_x509_ca_destroy`, though they are untested for the moment, until I build `Botan.Low.X509.CA`.
Now, it may feel a bit odd to be gallivanting off to C++-land when there's still a lot to do in Haskell, but the C++ work to implement better X509 support was both a high-value target and one of the larger "I know this is doable, but I don't know how hard" tasks. Now I know roughly how hard it will be (and it's better to know this early), and I must say I was pleasantly surprised to find that it was less effort than expected so far - I have worked with far worse codebases.

All in all, I'm pretty satisfied with how it's going.
This update has been building for a few days now, and I’m glad to get it out. You see, you pull on one thread, and things start to unravel - and I can see why the original FFI authors didn’t want to implement more extensive X509 support.
Basic X509 functionality does exist, but it is almost entirely limited to loading read-only objects, and I want to be able to use this library to create, sign, and revoke certificates, and to encode / decode PEM- and BER/DER-formatted objects - so that means getting my hands dirty and doing it myself.
It is a bit tedious: to enable one functionality, you must implement this other function, which requires these data types in order to create these other ones - and so on, until you're basically forced to implement everything together. Luckily, that's our explicit goal, so it's no skin off our back!
I'm mostly mirroring the existing C++ API, and after soaking my brain in the X509 C++ source for a few days, I've written C FFI data types and function stubs for almost all of the X509 data structures. When I started, the existing X509 data types were just basic read-only Certificates and CRLs - now, we have data types for all of the things that you need to generate and sign new certificates! There still are some open questions (namely, returning arrays of things, and some questions of ownership), I still need to do FFI types and function stubs for OCSP support, and there are some missing functions, but we're getting there!
One large caveat is that these are just the stub functions, and I still have to go through and actually implement them all, calling the C++ from C. There's still a lot of work to do, but I've basically started in the middle, between the Haskell and the C++ - and with the bindings set, I can get to work on implementing the Haskell and the C++ together.

On the other hand, I'm basically defining the FFI types, and that's one of the most important parts of programming. Plus, all of this X509 work is effectively doubling the size of the Botan C FFI (it now accounts for half of the FFI header file!), so I shouldn't be surprised it's taking a bit of time - I'll bet the original FFI wasn't built in a few weeks either!
These changes have been pushed to the repo and upstream fork.
If you want to check out this recent work, you'll have to clone the experimental botan-upstream fork, build and install it from source, and use the `XFFI` flag to enable the experimental FFI modules. There are some instructions in the README, or you can follow along here:
```sh
# Clone
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include
```
That’s all for the moment! Next up, I’ll be fleshing out these function stubs to be vending actual objects that we can interact with!
I might have missed it, so I’m sorry if you’ve already written this somewhere else, but: what’s the eventual plan for this fork? Are you going to be able to upstream these FFI changes?
Yes, I will (eventually) be submitting these FFI changes as a patch / pull request to the original Botan C++ library repo. It is an open issue that I am happy to contribute to.