Unfortunately, as of the latest version of botan, the TLS feature is still not exposed by the C FFI. I think with some gruntwork I could write a C++ shim, but other things are higher priority at the moment.
Botan: Project Recap and Major Ergonomics Changes
I have been a little quiet on the forums, and you know what that means - I’ve been working on something sizeable, and all my energy has gone towards ‘doing’ rather than ‘talking’. I do tend to cycle between focuses; it helps keep things interesting, and stops me from getting too far into the weeds, as I am often wont to do.
This last month’s focus has been botan, and I am happy to report some significant progress. As a quick summary, I’d estimate the refactor to be 40% complete by volume / 60% complete by effort. The RNG, Hash, BlockCipher, Cipher, and MAC modules are more or less done, and the remaining work is split fairly evenly between the PubKey modules and the collection of miscellaneous specialized algorithms ranging from Bcrypt to ZFEC. Completion by volume is lower because there is still plenty to do, but completion by effort is higher because we’re running into fewer and fewer novel requirements, which is both nice and according to plan.
Now I’ve found myself at a good point to stop and do a recap - how we got here, what we’ve done, what we are doing next, and what the future holds. This should be helpful, because this thread has grown quite large, spanning multiple years now, and as some readers have pointed out, it can be a bit difficult to follow what precisely is happening.
So, the recap before details on the refactor progress!
Recap: Some history
Haskell has always been subject to the whims of its contributors, for better and for worse. Ten years ago, a significant number of critical packages were dependent on the work of one person, who in a rather explosive exit abandoned them while simultaneously forbidding their takeover.
This left several rather sizeable holes in the Haskell ecosystem: in particular cryptonite, one of the Haskell ecosystem’s main sources of cryptographic primitives (if not the main source), and one of its dependencies, memory. Nothing went wrong immediately - they were stable packages, after all - but over time Haskell and GHC continued to change and evolve, and these libraries slowly began to suffer from bitrot and aging APIs.
Forks have been made, but without the original author, maintenance is difficult, and the original code is no longer trusted - the tolerance of unverified, handwritten cryptography was predicated on trust in the author, who is now absent.
If you really want to get into it, there are the links already provided, and here is a more recent thread covering some discussion of the topic and how it still affects the ecosystem to this day.
Research for a Proposal
My solution to this was to develop some bindings to a suitable popular cryptography library. In the end, I chose to develop bindings to Botan C++, for 3 reasons:
- It fit our rather specific needs - free and open source, with an appropriate license, good coverage of algorithms, a C FFI, well-supported across multiple architectures, and most importantly, a wide userbase including significant corporate usage, meaning that this library was under active development and could be counted on to receive development for the foreseeable future.
- It was written in C++, which meant I could dive into the source code and trace things out if I needed - and this was important because at the time, Botan was not very well documented, so there was a lot of that (Botan has greatly improved documentation since this project started, and this project has improved alongside it).
- There existed an old set of bindings to an out-of-date version of botan, called z-botan. This was dependent on a custom base with bespoke wrappers around everything, it used custom C++ to bind to Botan 2 rather than 3, and it didn’t build on a modern setup, so using that codebase was out of the question - but it made for a wonderful template, and gave a good estimate of the size of the task of producing new bindings.
These choices were significant factors in the development of the resulting bindings, and I’m glad that I took the time to do the research on various candidates and alternatives before settling on Botan.
Even now, I am pleased to briefly mention that Botan supports configuring compilation for WASM, so some of my recent work this last month has gone towards eg supporting builds of botan-bindings with wasm32-wasi-ghc - not quite ready to show off just yet, but in due time.
Initial project run, botan-bindings and botan-low released
With a well-defined goal, I got to work, and after making some progress, I wrote up a proper proposal, which was accepted and funded.
A major point of this proposal and project is that the resulting library is actually owned by the Haskell Foundation, specifically to stop the problem of ownership of critical libraries threatening the stability of the ecosystem - in other words, I have given up ownership of the codebase and repository in order to ensure this, though I remain a significant contributor. (Don’t worry - this was my idea, and it was always the plan.)
Due to the funding, I was able to work on Botan full-time for several months, and in that time I managed to produce the botan-bindings and botan-low packages, giving us access to a solid and reliable cryptography system.
The low-level bindings were effective and stable, despite an imperative design that resulted from a nearly 1:1 API translation; ergonomics were then intended to be provided as part of a higher-level interface.
So what did it contain?
Well, botan is a comprehensive cryptographic “kitchen sink” - it seeks to present a wide variety of algorithms under a unified interface, allowing the developer to choose what algorithms to employ.
NOTE: This is in comparison to more opinionated libraries like the libsodium / nacl / saltine family, which select a single algorithm for each intended use case - so there’s no configuration, because there aren’t any choices to make.
It of course contains your usual gaggle of suspects:
- Random Number Generators
- Hash
- Block Ciphers
- Cipher
- Message Authentication Codes
- Public Key Encryption & Signing
- Bcrypt
- Key Derivation
- Password Hashing
- One-time passwords
But really, the strength of botan is the sheer number of algorithms it supports - it has all the algorithms that you know and love, and a whole slew of others you’ve probably never heard of. Part of the goal of botan is, if you have botan, you shouldn’t need another cryptographic library, meaning it can serve as a convenient cryptographic foundation for the ecosystem, if we want.
In total, it has 26-ish hash algorithms and 24-ish block cipher algorithms (or more, depending on how you count configurations), and those algorithms then get used as algorithm parameters for higher-level symmetric ciphers and message auth codes, and some of those also have configuration parameters - so you can really customize your cryptographic system if you know what you are doing. It even has post-quantum algorithms which I’ve been eager to get into (I meant to last year, but my health took priority).
It isn’t perfect - specifically, our bindings are limited to the C FFI, and not all features are exposed in the C FFI - chiefly, not all X509 features are available, and the Botan TLS implementation is completely unavailable. There are things that can be done about this - for example writing our own C FFI bindings for the unavailable features - but while desirable, we do have native implementations that rely on the crypton fork, so it is less critical / lower priority.
Unfortunately, sometimes good things must come to an end, and after the initial release, funding ended, and I had to take a step back to find another means of employment - and so began a period of quiet maintenance, during which Joris joined as a contributor, and he has done a wonderful job at keeping the day-to-day tasks managed ever since.
In the meantime, the job I took at a fintech company ended up turning into 60-hour weeks of AI-fueled hell, which began impacting my health, and ultimately I was glad to leave after saving up enough. It was not a healthy environment, and it resulted in my abstaining from any form of software engineering for several months as I recuperated, started down my current journey of good health, and figured out if I even still liked programming.
The Project Returns
When Jose called 6 or so months ago and asked if I was interested in doing some more botan work, I wasn’t sure if I was ready to return to programming just yet. I was still feeling really burnt out from the fintech thing, and I definitely still needed some rest, but I remembered how much I enjoyed working on Botan during the initial project run, and I decided that I really don’t hate programming, just forced vibe coding and middle managers who think that yelling is acceptable behavior.
So, I set some limits - my health comes first, and I would be returning in a partial capacity, rather than full-time - but I would be returning, and this time as a provisional Haskell Foundation member, with funding for 3 months, with the potential to become a full member with funding for the rest of the year (this did happen about a month ago, so hooray for me for that!).
Setting things up took a few months - but that was okay, it let me continue to focus on improving my health, and it gave me time to slowly get back into the project, which was more than a little bit daunting. I hadn’t touched the code in a while, and there was a lot of it, so the first month or so was mostly spent reading my own code and refamiliarizing myself with the bindings. Seeing it with fresh eyes was a good reminder of where I last was in the project - lots of old plans, things blocking those plans, higher-level code waiting to be used once unblocked.
Observations
One of the biggest pain points was the bespoke Make / Remake / mkBindings modules that I had built, in order to maintain a zero-dependency footprint. I had tried to refactor it more than once before, but the resulting sprawl of spaghetti served to occlude how things worked more than it provided code re-use.
It was not a bad idea in theory - the original cryptonite followed the same ethos, one of the reasons for its own longevity, even after the absence of its author. However, the original cryptonite did depend on memory, and with yet another ecosystem discussion thread underway, this time about memory, I realized that instead of writing memory routines by hand for botan (since I couldn’t use memory), I could take a stab at filling that gap as well - to refactor the binding generators AND to provide a replacement for memory.
To do that, 2 functions were most important: allocInit for allocating FFI-capable pinned bytestrings, and withByteArray for accessing their pointers. And so the memalloc library was born, a total overhaul of memory that seeks to provide a much stronger set of memory abstractions.
There’s a whole thread about it, but it is basically a categorization of memory allocations, handles, references, pointers, and arrays, which turns eg withByteArray AKA withPtr :: (ByteArrayAccess ba) => ba -> (Ptr p -> IO b) -> IO b into a more generalized
class (Memory mem, Memorable memo) => WithMemory mem memo where
    withMemory :: memo -> (mem (MemRep memo) -> IO a) -> IO a
memory also separated ByteArrayAccess from ByteArray (Allocation), which really inspired me to split allocators off into their own class that works with Memory types, a la:
class (Memorable (Allocation alr)) => Allocator alr where
    type family Layout alr :: Type
    type family Allocation alr :: Type
    allocInit
        :: alr
        -> Layout alr
        -> (Mem (Allocation alr) (MemRep (Allocation alr)) -> IO ())
        -> IO (Allocation alr)
This is nice, because type WithPtr memo = WithMemory Ptr memo and withPtr = withMemory. It makes using pointers / the C FFI a breeze for any memory-backed data structure, no need to copy or marshal through ByteString - and we can control where and how the memory is allocated, which is a big deal for memory safety and security.
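To make the pointer-access idea concrete, here is a heavily simplified, base-only sketch of a class in the spirit of WithPtr. The class, its single instance choices, and the helper functions are hypothetical stand-ins rather than memalloc’s actual API; memalloc’s real classes are more general, since the memory type and its representation are parameters there.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical, simplified stand-in for a WithPtr-style constraint:
-- "this value is backed by memory we can hand to C as a raw pointer".
class WithPtr a where
  withPtr :: a -> (Ptr Word8 -> IO b) -> IO b

-- A raw pointer trivially gives access to itself.
instance WithPtr (Ptr Word8) where
  withPtr p k = k p

-- A ForeignPtr keeps its memory alive for the duration of the callback.
instance WithPtr (ForeignPtr Word8) where
  withPtr = withForeignPtr

-- Code written against the class works for any memory-backed value,
-- without copying through an intermediate ByteString.
readFirstByte :: WithPtr a => a -> IO Word8
readFirstByte x = withPtr x (\p -> peekByteOff p 0)

-- Allocate a one-byte buffer holding the given value (for demonstration).
singletonBuffer :: Word8 -> IO (ForeignPtr Word8)
singletonBuffer w = do
  fp <- mallocForeignPtrBytes 1
  withForeignPtr fp (\p -> pokeByteOff p 0 w)
  pure fp
```

The same `readFirstByte` works on the ForeignPtr directly and on the raw Ptr obtained inside `withForeignPtr`, which is the kind of uniformity the generalized class is after.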
It isn’t published yet, as I’m still testing it out, but it has made refactoring botan-low significantly easier and cleaner, it has been immediately useful in replacing all that mkBindings junk, and it has helped provide a unified BotanObject class for wrapping opaque pointers. So, having proven its utility, it is due for release as a prerequisite for this refactor.
I do have this tendency to write a lot of extraneous code for far-off future plans, that usually doesn’t make the first cut - an abstraction of an abstraction and so on - but I always do tuck those bits of code away for another day, and so I often ‘displace’ such explorative code to more speculative libraries. It is always nice to see some of it make the cut, and actually get used.
That’s sort of where we are now - or, where we were as of the last few updates, which have been more or less concerned with the integration of memalloc into botan-low under a botan-low-refactor branch - successfully getting rid of Make / Remake / mkBindings without affecting anything else.
Planning
Refactoring low-level code let me clean things up, which eases the burden of maintenance, but it didn’t change the API. What does change the API is being unblocked and looking at your code with fresh eyes - it becomes all the easier to see various flaws that you were blind to before, or unable to deal with.
See, the new memalloc support allows us to supply the allocator as an argument, which lets us use things other than ByteStrings when interacting with the C FFI. However, this would also change the API, which breaks with our original plan, in which botan-low remained 1:1 with the Botan C FFI - like botan-bindings, but with ByteStrings.
In the old plan, allocators would be part of botan(-high), because the allocator is a high-level concept that isn’t part of the Botan C FFI - but we can’t do that, because it would need to be part of the lower-level API first.
So, we can either stay true to the originally planned API, or we can break with it and make a new plan with changes - and since botan-bindings already serves as the 1:1 interface, and since memalloc gives us pointer access without going through ByteStrings, it wasn’t a hard decision - only botan-bindings needs to be 1:1.
This turned out to be the snowball that started an avalanche, as this in turn unblocked a whole host of things that have now made their way into botan-low as part of a significant overhaul of the API.
Ergonomic changes coming down the pipeline
Okay, since botan-low no longer needed to be 1:1, this meant a few things:
- We could clean up and fix some of the oddities / inconsistencies in the botan C FFI API
- We could rename a bunch of things for consistency
- We could bring in a bunch of stuff from botan(-high), like algorithm data types (instead of having to deal with botan’s finicky algorithm name syntax)
Another thing: the first versions of botan-bindings and botan-low were deliberately zero-dependency packages, aside from base and core libraries - that’s right, botan-bindings requires only base, and botan-low adds just bytestring, deepseq, and text. This is intended to help with churn - zero dependencies means minimal surface for breakage, and stability is important for security. It also means that reasoning about the code and refactoring it isn’t difficult - everything you need to know about is right there.
The zero-dependency philosophy is something we are going to keep, but we are going to bend that rule ever so slightly, because we of course have one new dependency - the new memalloc library that has been developed firstly to replace memory with something having a modern API, and secondly to support refactoring botan-low.
memalloc itself follows the same zero-dependency philosophy, except that it includes a few more core libraries as dependencies, which I decided to control via a flag system that I will be bringing back into botan-low to allow selectively including support. This makes it easy to control through a cabal.project stanza, eg:
package *
  flags:
    -- Example: These are all on by default, but we can turn them off
    -use-array
    -use-bytestring
    -use-containers
    -use-ghc
    -use-mtl
    -use-primitive
    -use-text
    -use-vector
Even though they’re not really being used yet, they will be, and we retain the ability to turn support for each dependency on or off, so as to include them safely when they are available. I’ve even done a little prep for better supporting non-GHC compilers, eg by starting to gate off GHC-specific code behind a flag that turns on -DUSE_GHC, checked with #if defined(USE_GHC) and so on.
Once in the project, we get to use our new memalloc library for all of our C FFI needs - it’s very consistent, and as a result, most of the functions in the botan-low-refactor branch (publicly available soon!) are some variation on the form:
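As a rough illustration of the flag-plus-CPP gating described above, here is a minimal sketch. Only the `use-ghc` flag name and the `USE_GHC` macro come from the text; the cabal wiring in the comment (a manual flag that adds `cpp-options: -DUSE_GHC`) is an assumption about how it could be set up, and `compilerPath` is a hypothetical example definition.

```haskell
{-# LANGUAGE CPP #-}

-- Assumed .cabal wiring (a sketch, not necessarily how botan-low does it):
--
--   flag use-ghc
--     default: True
--     manual:  True
--
--   library
--     if flag(use-ghc)
--       cpp-options: -DUSE_GHC
--
-- GHC-specific code is then compiled only when USE_GHC is defined.

-- Hypothetical definition with a GHC-only branch and a portable fallback.
compilerPath :: String
#if defined(USE_GHC)
compilerPath = "GHC-specific implementation"
#else
compilerPath = "portable implementation"
#endif
```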
objectDo :: Object -> arg -> ... -> IO ()
objectDo obj arg ...
    = mask_ $ withObject obj $ \ objPtr ->
        withArg arg $ \ argPtr ->
        -- ...
        throwBotanIfNegative_ $ botan_object_do objPtr
            (ConstPtr argPtr)
            -- ...
where Object is some opaque pointer, nicely managed by the memalloc classes, which provide allocation and pointer access. Sometimes we need an alloca here, or an allocaMany @[CSize,...] there, and if we need to return a result, we use throwBotanIfNegative (no underscore) - all of this is a vast improvement over the first-version Make / Remake / mkBindings modules, which were effective but terribly disorganized and a nightmare for maintenance.
Another benefit of the new memory management and allocator functions is that most botan functions only require a few lines of code. There’s sometimes extra verbiage due to needing to provide the allocator (currently always ByteStringAllocator, but we’re preparing to expose the allocator as an argument), and we currently need to follow up every allocInit ByteStringAllocator $ \ fPtr with a withPtr fPtr $ \ ptr -> ... - for example:
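For readers unfamiliar with the error-code convention: Botan’s C FFI functions return 0 on success and a negative code on failure, and helpers like throwBotanIfNegative_ turn that into a Haskell exception. Below is a self-contained sketch of the idea; `BotanError`, `throwIfNegative_`, `throwIfNegative`, and `fakeFfiCall` are hypothetical stand-ins for botan-low’s actual helpers and a real `botan_*` call.

```haskell
import Control.Exception (Exception, mask_, throwIO, try)
import Foreign.C.Types (CInt)

-- Hypothetical exception carrying the raw FFI return code.
newtype BotanError = BotanError CInt
  deriving Show

instance Exception BotanError

-- Run an FFI action and throw on a negative return code; keep the code otherwise
-- (some FFI calls return a non-negative count or flag).
throwIfNegative :: IO CInt -> IO CInt
throwIfNegative act = do
  rc <- act
  if rc < 0 then throwIO (BotanError rc) else pure rc

-- Same check, discarding the result.
throwIfNegative_ :: IO CInt -> IO ()
throwIfNegative_ act = () <$ throwIfNegative act

-- Stand-in for a foreign call such as botan_object_do; returns the given code.
fakeFfiCall :: CInt -> IO CInt
fakeFfiCall = pure

-- The shape used throughout the refactor: mask async exceptions, call, check.
objectDo :: CInt -> IO ()
objectDo rc = mask_ $ throwIfNegative_ (fakeFfiCall rc)
```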
hashFinalize :: Hash -> IO HashDigest
hashFinalize h = withHash h $ \ hPtr -> do
    sz <- hashGetDigestSize h
    allocInit ByteStringAllocator sz $ \ fPtr -> do
        withPtr fPtr $ \ digestPtr -> do
            throwBotanIfNegative_ $ botan_hash_final hPtr digestPtr
But this isn’t too bad, and it’s only because allocInit yields the top ForeignPtr instead of the base Ptr (this is extremely sensible in-context, it is technically correct, the best kind of correct) that we just need to drill down an extra level - basically, we’re just “missing” a convenience function.
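The “missing” convenience function would fuse the allocation step with pointer access, so the initializer receives the raw Ptr directly. Here is a base-only sketch of that shape using ForeignPtr instead of memalloc’s allocator classes; `allocInitPtr` and `readBytes` are hypothetical names. (bytestring’s internal `create :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString` has essentially this shape for the ByteString case.)

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Hypothetical convenience: allocate n bytes and run the initializer on the
-- raw Ptr, instead of allocInit ... $ \ fPtr -> withPtr fPtr $ \ ptr -> ...
allocInitPtr :: Int -> (Ptr Word8 -> IO ()) -> IO (ForeignPtr Word8)
allocInitPtr n initialize = do
  fptr <- mallocForeignPtrBytes n
  withForeignPtr fptr initialize
  pure fptr

-- Read n bytes back out (for demonstration).
readBytes :: Int -> ForeignPtr Word8 -> IO [Word8]
readBytes n fptr = withForeignPtr fptr (peekArray n)
```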
So, it is a little boiler-platey at the moment, but highly consistent, and now we don’t have dozens of inscrutable functions named things like mkGetBoolCode_csize or mkWithObjectSetterCBytesLen that we have to look up to remember their exact behaviors - though repeat / common cases such as objectGetName and objectGetKeySpec do get a convenience function tucked away in the internal prelude.
Speaking of, the new internal prelude also helps with readability, and all of the refactored modules are much cleaner now. For example, Botan.Low.Hash imports have gone from this:
import Botan.Bindings.Hash
import Botan.Low.Error.Internal
import Botan.Low.Internal.ByteString
import Botan.Low.Internal.String
import Botan.Low.Make
import Botan.Low.Remake
import Data.ByteString (ByteString)
import Foreign.C.Types
import Foreign.ForeignPtr
import Foreign.Ptr
to this:
import Botan.Low.Internal.Prelude
import Botan.Bindings.Hash
The other refactored modules have seen a similar reduction in noise - basically, after declaring Memorable instances and a newtype wrapper, we get straight to implementing the bindings, no fuss - so, all of the memalloc work was really worth it.
I’m still giving memalloc a shakedown - with something as critical as security and cryptography, it really pays to do it right - but seeing how smooth it’s made this refactoring, I have enough confidence in it now to begin the process of releasing it.
There are a few things I need to move over to memalloc, a few other things that need polish first, but I feel confident now that I’ve refactored most of the important modules, and because of that, I have started pulling in all those higher-level goodies I talked about earlier. Some of this will have been mentioned in recent updates, but we’ll go over everything because it is a recap.
Algorithm Data Types
Probably the biggest deal / change in the API is we no longer require the developer / user to initialize an algorithm context via a specially-formatted algorithm name. I did my best to provide patterns and functions to do this for the user, but even so, passing around (byte)string identifiers is terrible.
-- I.e., this
type BlockCipherName = ByteString
pattern
    AES128
  , AES192
  , AES256
  -- ...
  :: BlockCipherName
pattern AES128 = BOTAN_BLOCK_CIPHER_AES_128
pattern AES192 = BOTAN_BLOCK_CIPHER_AES_192
pattern AES256 = BOTAN_BLOCK_CIPHER_AES_256
-- ...
-- Has become
data BlockCipherType
    = AES128
    | AES192
    | AES256
    -- ...
So, now we have proper data types for algorithms, and their parameters too. This introduces a lot of type safety, and we translate to a bytestring name only when we’re initializing a context:
h <- hashInit SHA3
In retrospect, it was kind of dumb to defer that to the high-level library just so the API could stay 1:1 - I’m glad we have chosen a different path.
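A minimal sketch of what “translate to a name only when initializing a context” can look like: the algorithm is an ordinary enumeration, and the Botan name string is produced in exactly one place. The type here is a hypothetical, trimmed-down version with a handful of constructors, and it uses String rather than ByteString to stay base-only; the strings are Botan’s usual spellings for these ciphers.

```haskell
-- Hypothetical, trimmed-down algorithm type; the real one has many more constructors.
data BlockCipherType
  = AES128
  | AES192
  | AES256
  | Blowfish
  | Twofish
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Botan algorithm-name strings are produced only here,
-- right before a context is initialized through the FFI.
blockCipherName :: BlockCipherType -> String
blockCipherName AES128   = "AES-128"
blockCipherName AES192   = "AES-192"
blockCipherName AES256   = "AES-256"
blockCipherName Blowfish = "Blowfish"
blockCipherName Twofish  = "Twofish"

-- Every algorithm is enumerable, which makes exhaustive tests easy.
allBlockCipherNames :: [String]
allBlockCipherNames = map blockCipherName [minBound .. maxBound]
```

A typo in an algorithm name is now a compile error at the constructor rather than a runtime failure from the FFI.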
Key / Nonce / Size specifiers
So, with that out of the way, there’s another common data type - size specifiers. A lot of algorithms have a botan_foo_keyspec function that returns a triple (Int, Int, Int), which describes the range of sizes of valid keys, as (min, max, mod).
Rather than pass around undocumented tuples, I’ve pulled in the SizeSpec type, calling it that instead of KeySpec because we can use it to describe eg nonce sizes as well.
type role SizeSpec phantom
data SizeSpec a
    = SizeEnum Int Int Int  -- NOTE: Is min max mod
    | SizeList [ Int ]      -- INVARIANT: Not empty, in increasing order
    | SizeExact Int
    deriving stock (Eq, Ord, Show)
So now blockCipherKeySpec, cipherKeySpec, macKeySpec, etc. all return one of these, and there’s a phantom type to track where it came from too.
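The key-generation code further down calls a `validSize` check on these specs. Here is a minimal sketch of how such a check could be written. The min/max/multiple-of-mod rule matches, as far as I know, how Botan’s own key-length specification validates lengths. `KeyTag` and `NonceTag` are hypothetical phantom tags, and `deriving stock` is dropped to keep extensions to a minimum.

```haskell
{-# LANGUAGE RoleAnnotations #-}

-- Mirrors the SizeSpec type above.
type role SizeSpec phantom
data SizeSpec a
  = SizeEnum Int Int Int -- min, max, mod (assumes mod >= 1)
  | SizeList [Int]       -- non-empty, increasing
  | SizeExact Int
  deriving (Eq, Ord, Show)

-- Hypothetical phantom tags recording where a spec came from.
data KeyTag
data NonceTag

-- A size is valid if it lies in [min, max] and is a multiple of mod,
-- appears in the list, or matches the exact size.
validSize :: SizeSpec a -> Int -> Bool
validSize (SizeEnum lo hi m) n = lo <= n && n <= hi && n `mod` m == 0
validSize (SizeList ns) n      = n `elem` ns
validSize (SizeExact x) n      = n == x
```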
Static algorithm sizing functions
One of the idiosyncrasies of botan is that, even though these values are statically calculable on a per-algorithm basis, every size query requires a live object - key specs, nonce sizes, block sizes, and digest sizes alike. So, even though we know that SHA3 has a 64-byte digest, we must still initialize a context to ask. This leads to code like this:
> enc <- cipherInit ChaCha20Poly1305 Encrypt -- Potentially slow operation
> (_,keySize,_) <- cipherGetKeyspec enc
> nonceSize <- cipherGetDefaultNonceLength enc
Since we now have algorithm data types I’d rather look this up / calculate it from a table, rather than initialize a potentially sizeable context - so I’ve gone and calculated those values, and now we just need to pass in the algorithm type:
blockCipherBlockSize :: BlockCipherType -> Int
blockCipherBlockSize Blowfish = 8
-- ... more algorithms
blockCipherBlockSize Twofish = 16
blockCipherBlockSize SHACAL2 = 32
blockCipherBlockSize Threefish512 = 64
-- Calculated via a function that is not exported
Splitting context vs algorithm functions
Since we now have some functions that take just the algorithm type instead of a full live context, we need to separate the two. The botan FFI is pretty inconsistent in its nomenclature, but it does use get terminology already in a few places, so, now things are named according to this pattern:
-- Context functions are named with 'get'
hashGetName :: Hash -> IO ByteString
hashGetBlockSize :: Hash -> IO Int
hashGetDigestSize :: Hash -> IO Int
h <- hashInit htype
n <- hashGetName h
bsz <- hashGetBlockSize h
dsz <- hashGetDigestSize h
-- Algorithm functions are shorter
hashName :: HashType -> ByteString
hashDigestSize :: HashType -> Int
hashBlockSize :: HashType -> Int
n = hashName htype
bsz = hashBlockSize htype
dsz = hashDigestSize htype
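Here is a small, self-contained sketch of the algorithm-function side for hashes, with a hypothetical four-constructor HashType. Names follow Botan’s spelling and the sizes are the standard values in bytes. For SHA-3(512), the “block size” is the sponge rate, (1600 - 2·512) / 8 = 72. A natural regression test for the real tables is to compare each entry against the corresponding live-context query, eg hashGetDigestSize.

```haskell
-- Hypothetical, trimmed-down hash type.
data HashType
  = SHA256
  | SHA512
  | SHA3_512
  | BLAKE2b512
  deriving (Show, Eq, Enum, Bounded)

-- Algorithm functions: pure, no live context required.
hashName :: HashType -> String
hashName SHA256     = "SHA-256"
hashName SHA512     = "SHA-512"
hashName SHA3_512   = "SHA-3(512)"
hashName BLAKE2b512 = "BLAKE2b(512)"

-- Output size in bytes.
hashDigestSize :: HashType -> Int
hashDigestSize SHA256     = 32
hashDigestSize SHA512     = 64
hashDigestSize SHA3_512   = 64
hashDigestSize BLAKE2b512 = 64

-- Internal block size in bytes (for SHA-3, the sponge rate).
hashBlockSize :: HashType -> Int
hashBlockSize SHA256     = 64
hashBlockSize SHA512     = 128
hashBlockSize SHA3_512   = 72
hashBlockSize BLAKE2b512 = 128
```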
Also, I’ve tried to be more consistent with naming.
Preferred sizes
Botan doesn’t always provide functions for a default or preferred size. This complicated eg key and nonce generation:
rng <- rngInit "user"
(_,keySize,_) <- blockCipherGetKeyspec blockCipher
key <- rngGet rng keySize
Now, algorithms with keys and nonces have sets of functions like this:
cipherPreferredKeySize :: CipherType -> Int
cipherPreferredKeySize (CBC bc _) = blockCipherPreferredKeySize bc
cipherPreferredKeySize (CFB bc _) = blockCipherPreferredKeySize bc
cipherPreferredKeySize (XTS bc) = 2 * blockCipherPreferredKeySize bc
cipherPreferredKeySize ChaCha20Poly1305 = 32
cipherPreferredKeySize (GCM bc128 _) = blockCipherPreferredKeySize bc128.un
cipherPreferredKeySize (OCB bc128 _) = blockCipherPreferredKeySize bc128.un
cipherPreferredKeySize (EAX bc _) = blockCipherPreferredKeySize bc
cipherPreferredKeySize (SIV bc128) = 2 * blockCipherPreferredKeySize bc128.un
cipherPreferredKeySize (CCM bc128 _ _) = blockCipherPreferredKeySize bc128.un
-- NOTE: Uses the system RNG for now, probably should be UserThreadsafe
generateCipherKey :: CipherType -> Int -> IO (Maybe CipherKey)
generateCipherKey c sz | validSize (cipherKeySpec c) sz = Just <$> systemRNGGet sz
generateCipherKey _ _ = pure Nothing
generatePreferredCipherKey :: CipherType -> IO CipherKey
generatePreferredCipherKey c = systemRNGGet (cipherPreferredKeySize c)
I’ve gone and done my best to research preferred key and nonce-sizes where it wasn’t obvious, and tried to match common standards - but it isn’t perfect, so there’s still eg generateKey vs generatePreferredKey.
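Since the preferred sizes were researched by hand, a cheap safeguard is to check that each one is actually accepted by the algorithm’s own spec. Below is a sketch under stated assumptions: the tiny `Alg` type and both tables are hypothetical. The AES and Twofish key lengths are as I understand Botan specifies them, and the preferred values are chosen for illustration only.

```haskell
-- Simplified size spec (phantom parameter omitted for brevity).
data SizeSpec
  = SizeEnum Int Int Int -- min, max, mod (assumes mod >= 1)
  | SizeList [Int]
  | SizeExact Int

validSize :: SizeSpec -> Int -> Bool
validSize (SizeEnum lo hi m) n = lo <= n && n <= hi && n `mod` m == 0
validSize (SizeList ns) n      = n `elem` ns
validSize (SizeExact x) n      = n == x

-- Hypothetical algorithm type and tables.
data Alg = AES128 | AES256 | Twofish
  deriving (Show, Eq, Enum, Bounded)

keySpec :: Alg -> SizeSpec
keySpec AES128  = SizeExact 16
keySpec AES256  = SizeExact 32
keySpec Twofish = SizeEnum 16 32 8

preferredKeySize :: Alg -> Int
preferredKeySize AES128  = 16
preferredKeySize AES256  = 32
preferredKeySize Twofish = 32

-- Algorithms whose preferred key size is rejected by their own spec (should be empty).
inconsistentAlgs :: [Alg]
inconsistentAlgs =
  [ a | a <- [minBound .. maxBound], not (validSize (keySpec a) (preferredKeySize a)) ]
```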
All in all, this makes generating keys and nonces dead simple, and yet again makes things safer - this is probably my second greatest pain point, the first being the bytestring algorithm names which have also been taken care of!
Easy functions
You can tell where this is going - now that we have functions that take the algorithm type instead of a live context, we can actually hide the context entirely.
So, I’ve gone and done that too:
{-
-- Hashing
> hash SHA3 "Fee fi fo fum"
-}
hash :: HashType -> ByteString -> IO HashDigest
hash htype msg = do
    h <- hashInit htype
    hashUpdateFinalize h msg
{-
-- Block ciphers
> bc = AES256
> k <- generatePreferredBlockCipherKey bc
> ct <- blockEncrypt bc k msg
> pt <- blockDecrypt bc k ct
-}
blockEncrypt :: BlockCipherType -> BlockCipherKey -> ByteString -> IO BlockCipherText
blockEncrypt bcType key msg = do
    bc <- blockCipherInit bcType
    blockCipherSetKey bc key
    blockCipherEncryptBlocks bc msg
blockDecrypt :: BlockCipherType -> BlockCipherKey -> BlockCipherText -> IO ByteString
blockDecrypt bcType key ct = do
    bc <- blockCipherInit bcType
    blockCipherSetKey bc key
    blockCipherDecryptBlocks bc ct
{-
-- Ciphers
> c = ChaCha20Poly1305
> k <- generatePreferredCipherKey c
> n <- generatePreferredCipherNonce c
> ct <- encrypt c k n msg
> pt <- decrypt c k n ct
-}
encrypt :: CipherType -> CipherKey -> CipherNonce -> ByteString -> IO CipherText
encrypt cipherType key nonce msg = do
    c <- cipherInit cipherType CipherEncrypt
    cipherSetKey c key
    cipherStart c nonce
    cipherEncrypt c msg
decrypt :: CipherType -> CipherKey -> CipherNonce -> CipherText -> IO ByteString
decrypt cipherType key nonce ct = do
    c <- cipherInit cipherType CipherDecrypt
    cipherSetKey c key
    cipherStart c nonce
    cipherDecrypt c ct
{-
-- Message authentication codes
> m = HMAC SHA3
> k <- generatePreferredMACKey m
> n <- generatePreferredMACNonce m
> tag <- mac m k n msg
> valid <- auth m k n msg tag
-}
mac :: MACType -> MACKey -> Maybe MACNonce -> ByteString -> IO MACDigest
mac macType key nonce input = do
    m <- macInit macType
    macSetKey m key
    -- traverse_ (from Data.Foldable) sets the nonce only if one was given
    traverse_ (macSetNonce m) nonce
    macUpdate m input
    macFinalize m
auth :: MACType -> MACKey -> Maybe MACNonce -> ByteString -> MACDigest -> IO Bool
auth macType key nonce input dg = do
    dg' <- mac macType key nonce input
    return $ dg' == dg
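One caution about `auth` as sketched: comparing tags with `==` can stop at the first differing byte, which may leak timing information about the expected tag. Botan’s C FFI provides `botan_constant_time_compare` for exactly this. How botan-low exposes it isn’t covered here, so the following is only a pure-Haskell illustration of the technique: XOR every byte pair, OR the results together, and inspect the accumulator once at the end. GHC makes no constant-time guarantees, so treat this as an explanation rather than a replacement for the Botan primitive. `tagsEqual` is a hypothetical name.

```haskell
import Data.Bits (xor, (.|.))
import Data.List (foldl')
import Data.Word (Word8)

-- Compare two tags without exiting early on the first mismatch.
-- Tag lengths are public (a MAC has a fixed output size), so a length
-- mismatch is allowed to return immediately.
tagsEqual :: [Word8] -> [Word8] -> Bool
tagsEqual a b
  | length a /= length b = False
  | otherwise = foldl' (.|.) 0 (zipWith xor a b) == 0
```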
This makes the API so much easier to use, and I’m not quite done yet. I’m probably going to use unsafePerformIO for deterministic ‘easy’ functions, and a MonadRandom for ones requiring an RNG context. There’s a MonadRandom class I’ve got in the speculative wing - think random:RandomGen, except it’s got an associated Distribution type rather than assuming a uniform distribution. That really works for eg key specifiers, which are technically descriptions of the multi-modal distribution of valid keys - so it’s obviously useful here.
Next up: Public Key Encryption
So, that’s where we’re at, and these changes are what’s coming down the pipeline. Getting rid of the mkBindings spaghetti has really unblocked things, and our decision to pull more things into botan-low has really improved the interface, and this in turn has made everything easier. memalloc has been so well-behaved, it has been mostly a straight shot of “pick a module, refactor it”, and one by one they are getting refactored.
There is of course some sort of sequence imposed by the structure of botan - ciphers require block ciphers, macs require hashes, etc. - but those are done now, and so Public Key Encryption is next.
That’s all for now!
Why not Set Int then? Memory?
While min max mod suffices to describe most cases, sizes aren’t required to follow a simple formula, and in that case I still want to be able to fold over the available sizes in order, like I do with SizeEnum via enumFromThenTo. So supporting a list representation as a fallback is natural.
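A minimal sketch of folding over every valid size in increasing order, using `enumFromThenTo` for SizeEnum as described. `specSizes` and `largestSize` are hypothetical names, the phantom parameter is omitted, and the SizeEnum case assumes mod >= 1 and that min is itself a multiple of mod.

```haskell
data SizeSpec
  = SizeEnum Int Int Int -- min, max, mod
  | SizeList [Int]       -- INVARIANT: non-empty, increasing
  | SizeExact Int

-- All valid sizes, smallest first.
specSizes :: SizeSpec -> [Int]
specSizes (SizeEnum lo hi m) = enumFromThenTo lo (lo + m) hi
specSizes (SizeList ns)      = ns
specSizes (SizeExact n)      = [n]

-- An example fold over the sizes: the largest valid size.
largestSize :: SizeSpec -> Int
largestSize = maximum . specSizes
```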
I’m hoping that the generatePreferred functions make interacting directly with the SizeSpec unnecessary.
I was just wondering, given that invariant, Set felt like a better choice as Data.Set automatically enforces that invariant.
Wouldn’t Data.Set enforce neither? Sets can be empty, and aren’t ordered.
Did you mean Data.List.NonEmpty? If so, probably yeah
Thanks for the continual improvement of the Haskell cryptography story!
I suggest unconditionally including boot dependencies of GHC (i.e. ones that always get installed alongside it) to not make the work harder than it needs to be for yourself
That’s probably best. Thanks for the quick reply.
How about:
-- equivalent to `NonEmpty.(:|)`
| SizeList Int [Int] -- ^ SizeList {head} {tail}
Also, thanks for still going at it and writing these lovely reports. It’s always a pleasure to read.
I would also love to see the memalloc library.
Does it have an extensive test suite? It feels like such a fundamental library that it should really be as robust as can be.
And another question: do you think a higher level botan package would even be needed if botan-low will get higher level concept types and convenience functions?
Whew! I needed a few days’ respite after last week’s writing and editing, to keep my health up.
I guess this is less important for botan than it is for memalloc. My rationale here is mostly consistency - it makes sense for memalloc because ideally it shouldn’t depend on eg ghc or bytestring, but rather they should depend on it - perhaps one day
The dependency control flags are all true by default and require turning off manually, so they aren’t unsafe, plus:
- It helps me to organize the code
- I can turn things on and off during feature development
- It helps locate / localize problems
- Better support for non-GHC compilers
But botan does not follow this, it is a genuine downstream dependent, so I suppose it does not have the same needs. Right now I’m just seeing how they feel, but I shall ponder this. The flags do provide value in a holistic context, eg considering it as part of a set of interacting libraries, and they make it easy to control the set consistently.
You’ve sort of hit it dead on - it needs more tests before I’m comfortable declaring ‘release’ or putting it up on ‘hackage’, but I’ll be getting the memalloc repo and the botan-low-refactor branch up soon so everyone can clone, use a cabal.project.local for the local repo, and build it themselves.
Actually, yes! I was wondering myself if I would need to rename botan-low to botan if I’m consolidating / moving all the high-level stuff down to it, but it turns out that it still makes sense to have a high-level botan, because it is an optimal place for per-algorithm data types and cryptographic typeclasses.
This lets botan-low continue to get away with using a simple single eg HashType data type of which eg SHA3 is a constructor, while botan will have a Hash(Algorithm) class, of which eg, SHA3 is a data type that is an instance of Hash(Algorithm).
The cryptographic typeclasses themselves will probably be in a separate cryptographic library explicitly for cryptographic abstractions, of which botan will implement instances - similar to how I pulled out the memory abstractions in memalloc.
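To make the split between botan-low and botan a little more concrete, here is a minimal sketch contrasting the two styles: a single closed sum type (botan-low) versus one data type per algorithm plus a class over them (botan). Every name in it (`HashType`, `HashAlgorithm`, `Sha256`, `Sha3_256`, `hashTypeName`, `algorithmName`) is hypothetical and not the actual API of either package; the identifier strings follow Botan's usual `SHA-256` / `SHA-3(256)` naming.

```haskell
import Data.Proxy (Proxy (..))

-- botan-low style (hypothetical): one closed sum type, each algorithm is a
-- constructor, and parameters ride along as constructor fields
data HashType
  = SHA256
  | SHA3 Int -- output size in bits
  deriving (Show, Eq)

-- Render the identifier string that is handed down to the C FFI
hashTypeName :: HashType -> String
hashTypeName SHA256   = "SHA-256"
hashTypeName (SHA3 n) = "SHA-3(" ++ show n ++ ")"

-- botan style (hypothetical): each algorithm is its own data type, and a
-- class abstracts over them so code can be polymorphic in the algorithm
class HashAlgorithm a where
  hashAlgorithmType :: Proxy a -> HashType
  digestSizeBytes   :: Proxy a -> Int

data Sha256   = Sha256
data Sha3_256 = Sha3_256

instance HashAlgorithm Sha256 where
  hashAlgorithmType _ = SHA256
  digestSizeBytes _   = 32

instance HashAlgorithm Sha3_256 where
  hashAlgorithmType _ = SHA3 256
  digestSizeBytes _   = 32

-- The high-level class can simply delegate to the low-level sum type
algorithmName :: HashAlgorithm a => Proxy a -> String
algorithmName = hashTypeName . hashAlgorithmType
```

The sum type is easy to enumerate and marshal to C, while the per-algorithm types let higher-level code carry the algorithm in the type, eg `algorithmName (Proxy :: Proxy Sha3_256)` yields `"SHA-3(256)"`.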
PubKey progress, Post-quantum support, and Better versioning
Today’s update focuses on two related things - public key infrastructure, and post-quantum algorithms! The former is basically the same thing we’ve been doing with Hash and MAC and Cipher - upgrading it to the new BotanObject, bringing in algorithm data types, adding more convenient interfaces! I won’t bother pasting everything, just the highlights, because there’s a lot.
PubKey improvements
First off, the new data type for public key (PK) algorithms:
data PKType
= RSAType
| SM2Type
| ElGamalType
| DSAType
| ECDSAType
| ECKCDSAType
| ECGDSAType
| GOST_34_10Type
| Ed25519Type
| Ed448Type
| XMSSType
| DHType
| ECDHType
| X25519Type
| X448Type
| DilithiumType
| ML_DSAType
| KyberType
| ML_KEMType
| McElieceType
| ClassicMcElieceType
| FrodoKEMType
| HSS_LMSType
| SphincsPlusType
| SLH_DSAType
deriving stock (Show, Eq, Ord, Enum, Bounded)
You may note that the constructors are suffixed with -Type, unlike our earlier primitives - this is because PKI (public key infrastructure) isn't handled the same way as primitive hashes, ciphers, etc. Those primitives usually combine the algorithm and any parameters into a single identifier, but PK keeps the algorithm and its parameters as separate arguments - thus, we have a separate PKScheme type:
data PKScheme
= RSA Word32
| SM2 ECGroup
| ElGamal DLGroup
| DSA DLGroup
| ECDSA ECGroup
| ECKCDSA ECGroup
| ECGDSA ECGroup
| GOST_34_10 ECGroup
| Ed25519
| Ed448
| XMSS XMSSParams
| DH DLGroup
| ECDH ECGroup
| X25519
| X448
| Dilithium DilithiumMode
| ML_DSA DilithiumMode
| Kyber KyberMode
| ML_KEM KyberMode
| McEliece Word32 Word32 -- n, t -- TODO: Make a McEliece parameter type
| ClassicMcEliece ClassicMcElieceParams
| FrodoKEM FrodoKEMMode
| HSS_LMS ByteString -- TODO: Make a HSS-LMS parameter type
| SphincsPlus SphincsPlusMode
| SLH_DSA SLH_DSAMode
deriving stock (Show, Eq)
There are also data types for DLGroup, ECGroup, and the algorithm-specific parameters (I've omitted the constructors for brevity):
data ECGroup
data DLGroup
data DilithiumMode
data KyberMode
data ClassicMcElieceParams
data FrodoKEMMode
data SphincsPlusMode
data SLH_DSAMode
data XMSSParams
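The reason for keeping the algorithm and its parameters apart is that Botan's C FFI takes them as two separate strings when creating a key, via `botan_privkey_create(key, algo_name, algo_params, rng)`. Below is a minimal sketch, over reduced copies of the types above, of how a scheme might be rendered into that pair. The helper `pkSchemeArgs` is hypothetical, and the strings reflect my understanding of Botan's naming conventions rather than the bindings' actual code.

```haskell
import Data.Word (Word32)

-- Reduced, hypothetical versions of the parameter types
data ECGroup = Secp256r1 | Brainpool256r1 deriving (Show, Eq)
data DLGroup = MODP_IETF_2048 deriving (Show, Eq)

-- Reduced, hypothetical PKScheme: algorithm plus its parameters
data PKScheme
  = RSA Word32    -- modulus size in bits
  | ECDSA ECGroup
  | DH DLGroup
  | Ed25519       -- no parameters
  deriving (Show, Eq)

ecGroupName :: ECGroup -> String
ecGroupName Secp256r1      = "secp256r1"
ecGroupName Brainpool256r1 = "brainpool256r1"

dlGroupName :: DLGroup -> String
dlGroupName MODP_IETF_2048 = "modp/ietf/2048"

-- (algorithm name, algorithm parameters), as botan_privkey_create expects them
pkSchemeArgs :: PKScheme -> (String, String)
pkSchemeArgs (RSA bits)  = ("RSA", show bits)
pkSchemeArgs (ECDSA grp) = ("ECDSA", ecGroupName grp)
pkSchemeArgs (DH grp)    = ("DH", dlGroupName grp)
pkSchemeArgs Ed25519     = ("Ed25519", "")
```

The payoff of typed parameters is that a misspelled or mis-cased group name becomes a compile error instead of a runtime failure from the FFI.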
There is of course a function to get the suggested scheme:
pkSuggestedScheme :: PKType -> PKScheme
pkSuggestedScheme RSAType = RSA 3072
pkSuggestedScheme SM2Type = SM2 Sm2p256v1
pkSuggestedScheme ElGamalType = ElGamal MODP_IETF_2048
pkSuggestedScheme DSAType = DSA DSA_BOTAN_2048
pkSuggestedScheme ECDSAType = ECDSA Secp256r1
pkSuggestedScheme ECKCDSAType = ECKCDSA Secp256r1
pkSuggestedScheme ECGDSAType = ECGDSA Brainpool256r1
pkSuggestedScheme GOST_34_10Type = GOST_34_10 Gost_256A
pkSuggestedScheme Ed25519Type = Ed25519
pkSuggestedScheme Ed448Type = Ed448
pkSuggestedScheme XMSSType = XMSS XMSS_SHA2_10_512
pkSuggestedScheme DHType = DH MODP_IETF_2048
pkSuggestedScheme ECDHType = ECDH Secp256r1
pkSuggestedScheme X25519Type = X25519
pkSuggestedScheme X448Type = X448
pkSuggestedScheme DilithiumType = Dilithium Dilithium6x5
pkSuggestedScheme ML_DSAType = ML_DSA ML_DSA_6x5
pkSuggestedScheme KyberType = Kyber Kyber1024_R3
pkSuggestedScheme ML_KEMType = ML_KEM ML_KEM_768
pkSuggestedScheme McElieceType = McEliece 2960 57
pkSuggestedScheme ClassicMcElieceType = ClassicMcEliece ClassicMcEliece_6960119f
pkSuggestedScheme FrodoKEMType = FrodoKEM FrodoKEM976_SHAKE
pkSuggestedScheme HSS_LMSType = HSS_LMS "SHA-256,HW(10,1)"
pkSuggestedScheme SphincsPlusType = SphincsPlus SphincsPlus_SHA2_128_Small
pkSuggestedScheme SLH_DSAType = SLH_DSA SLH_DSA_SHA2_128_Small
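Because PKType derives Enum and Bounded, the suggested schemes can be checked against their types exhaustively. Here is a minimal sketch over reduced copies of the types, assuming a hypothetical inverse `pkSchemeType :: PKScheme -> PKType` that forgets the parameters (not something the bindings are confirmed to provide):

```haskell
import Data.Word (Word32)

-- Reduced, hypothetical mirrors of PKType / PKScheme
data PKType = RSAType | Ed25519Type | ML_KEMType
  deriving (Show, Eq, Ord, Enum, Bounded)

data KyberMode = ML_KEM_512 | ML_KEM_768 | ML_KEM_1024
  deriving (Show, Eq)

data PKScheme = RSA Word32 | Ed25519 | ML_KEM KyberMode
  deriving (Show, Eq)

pkSuggestedScheme :: PKType -> PKScheme
pkSuggestedScheme RSAType     = RSA 3072
pkSuggestedScheme Ed25519Type = Ed25519
pkSuggestedScheme ML_KEMType  = ML_KEM ML_KEM_768

-- Forget the parameters, keeping only the algorithm
pkSchemeType :: PKScheme -> PKType
pkSchemeType (RSA _)    = RSAType
pkSchemeType Ed25519    = Ed25519Type
pkSchemeType (ML_KEM _) = ML_KEMType

-- Every suggested scheme should belong to the type it was suggested for;
-- Enum/Bounded lets us walk all PKType values at once
suggestedSchemesConsistent :: Bool
suggestedSchemesConsistent =
  all (\t -> pkSchemeType (pkSuggestedScheme t) == t) [minBound .. maxBound]
```

A check like this makes a good test-suite property, since adding a new constructor to PKType automatically brings it under the check.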
Then, generating a key is as simple as:
prk <- generatePrivKey (pkSuggestedScheme SLH_DSAType)
I would like to note that this only covers key generation - unlike the other cryptographic primitives, a given PK algorithm is only associated with key generation, and the various operations of key agreement / key encapsulation / signatures / encryption require their own additional scheme, which will be the subject of the next update.
It's kind of complicated: not every PK algorithm supports every PK operation (in fact, most support only one or two, and you usually need different algorithms for eg key agreement vs signing vs encryption). In the meantime, I have created a table to help disambiguate the various PK algorithms and their uses - I'm still testing, but the preliminary result is this:
ALG KA KEM Sign Encrypt PQ Notes
-----------------------------------------------------------------
- Prime factorization
RSA Yes Yes Yes Yes
- DL Groups
DH Yes
DSA Yes
ELGAMAL Yes
- ECC
ECDH Yes
EC*DSA Yes
ECIES Yes Not FFI supported
SM2 Yes Yes Yes Yes
GOST-34.10 Yes Deprecated
- Named curves
X25519 Yes
X448 Yes
ED25519 Yes
ED448 Yes
- Post-quantum
MCELIECE Yes Yes
FRODOKEM Yes Yes
KYBER Yes Yes
ML_KEM Yes Yes
DILITHIUM Yes Yes
ML_DSA Yes Yes
SPHINCS_PLUS Yes Yes
SLH_DSA Yes Yes
HSS_LMS Yes Yes Stateful
XMSS Yes Yes Stateful
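A table like this maps naturally onto a function. Here is a minimal sketch covering only rows with a single, unambiguous use (RSA and SM2 are deliberately left out); the names `PKOperation`, `pkOperations`, `supports`, and `algorithmsFor` are hypothetical and not part of the bindings:

```haskell
-- The four PK operations from the table's columns
data PKOperation = KeyAgreement | KeyEncapsulation | Signature | Encryption
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Reduced, hypothetical PKType: only rows with a single, unambiguous use
data PKType
  = DHType | DSAType | ElGamalType | ECDHType | ECDSAType
  | X25519Type | Ed25519Type | ML_KEMType | ML_DSAType
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Which operations each algorithm supports
pkOperations :: PKType -> [PKOperation]
pkOperations DHType      = [KeyAgreement]
pkOperations DSAType     = [Signature]
pkOperations ElGamalType = [Encryption]
pkOperations ECDHType    = [KeyAgreement]
pkOperations ECDSAType   = [Signature]
pkOperations X25519Type  = [KeyAgreement]
pkOperations Ed25519Type = [Signature]
pkOperations ML_KEMType  = [KeyEncapsulation]
pkOperations ML_DSAType  = [Signature]

supports :: PKType -> PKOperation -> Bool
supports t op = op `elem` pkOperations t

-- Every algorithm usable for a given operation, in declaration order
algorithmsFor :: PKOperation -> [PKType]
algorithmsFor op = [t | t <- [minBound .. maxBound], t `supports` op]
```

For example, `algorithmsFor KeyAgreement` gives `[DHType, ECDHType, X25519Type]`.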
Post Quantum support
The keen-eyed among you will already have noticed the increased support for post-quantum algorithms, specifically the support for the recently approved FIPS / NIST final selection of post-quantum algorithms (ML-KEM in FIPS 203, ML-DSA in FIPS 204, and SLH-DSA in FIPS 205)!
- Key encapsulation
- McEliece
- ClassicMcEliece
- FrodoKEM
- Kyber
- ML_KEM (NIST approved Kyber)
- Digital signatures
- Dilithium
- ML_DSA (NIST approved Dilithium)
- SphincsPlus
- SLH_DSA (NIST approved Sphincs)
- XMSS (stateful, use with caution)
- HSS_LMS (stateful, use with caution)
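Since several of these come in both a pre-standard and a NIST-approved flavor, a small helper can steer users towards the standardized one and flag the stateful schemes. This is a minimal sketch; `nistCounterpart`, `preferStandard`, and `isStateful` are hypothetical names, and the mappings come straight from the list above:

```haskell
import Data.Maybe (fromMaybe)

-- Reduced, hypothetical PKType holding only the post-quantum algorithms
data PKType
  = McElieceType | ClassicMcElieceType | FrodoKEMType | KyberType | ML_KEMType
  | DilithiumType | ML_DSAType | SphincsPlusType | SLH_DSAType
  | XMSSType | HSS_LMSType
  deriving (Show, Eq, Ord, Enum, Bounded)

-- The NIST-approved counterpart of a pre-standard algorithm, if any
nistCounterpart :: PKType -> Maybe PKType
nistCounterpart KyberType       = Just ML_KEMType
nistCounterpart DilithiumType   = Just ML_DSAType
nistCounterpart SphincsPlusType = Just SLH_DSAType
nistCounterpart _               = Nothing

-- Prefer the standardized algorithm where one exists
preferStandard :: PKType -> PKType
preferStandard t = fromMaybe t (nistCounterpart t)

-- Stateful hash-based signatures: the private key changes with every
-- signature, so reusing a stale copy of the key can break security
isStateful :: PKType -> Bool
isStateful t = t `elem` [XMSSType, HSS_LMSType]
```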
The definitions for their parameter types are not super interesting, aside from my having to spend a few days scouring the C++ source code to find their precise formats and arguments - stuff like inconsistent casing really screws with accurately identifying algorithms - but that's all taken care of now.
I am glossing over a lot here. It was a huge pain in the ass tracking down every algorithm's specific, inconsistently capitalized, capitalization-sensitive algorithm and parameter identifiers, because they only exist as magic strings in the C++ source code! For example, the algorithm is SPHINCS+ with SHA512, but the params actually need to be formatted as SphincsPlus and sha2, and this is just plain not mentioned anywhere! It's fricken terrible! And you don't have to deal with that anymore!
You still need an appropriate version of botan to enable support, however, and that takes us to our final section of the update!
Versioning support
These bindings were originally written against botan 3.2, and in the time since, botan has received several updates - 3.11 is now available - and enough has been added that I now need to decide how to handle versioning support, which turns out to be a problem.
One of the issues with adding support for new algorithms is that I still need to handle botan versions that don't have them yet - and while the lowest-level botan-bindings take bytestring names and parameters as algorithm identifiers, now that we have data types for algorithms, I need some way of telling the user whether a given algorithm is actually available.
Why is it a problem? Well, botan provides the botan/build.h header, which exposes the BOTAN_HAS_<alg> defines, which sounds perfect! Just include it, use CPP and conditional compilation, right? Something like:
{-# LANGUAGE CPP #-}
module Botan.Low.PubKey where
import Botan.Bindings.PubKey
-- Like this
#include <botan/build.h>
-- So I could do things like:
pkTypeIsSupported :: PKType -> Bool
#if defined(BOTAN_HAS_RSA)
pkTypeIsSupported RSAType = True
#endif
pkTypeIsSupported _ = False
Did that work? Nope!
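It fails because build.h contains an indented #define, and GHC runs the C preprocessor in traditional mode, which (as far as I can tell) only recognizes a directive when its # sits in the first column - so this little line here breaks the build: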
#ifndef BOTAN_DLL
   #define BOTAN_DLL __attribute__((visibility("default")))
#endif
If anyone knows how to get GHC to parse this without failing, it would be really nice to be able to just include the file. Otherwise, to fix this we need to either pre-process that file as a pre-build step, or otherwise generate the list of supported BOTAN_HAS_<alg> defines for us to consume, which seems like a lot of work for what is ultimately a fragile bandaid - really, this should be fixed in Botan C++ itself, so I will probably open an issue with them.
Since I can’t include the file directly, for now I have instead resorted to using CApiFFI to import the defines as constants one by one.
{-# LANGUAGE CApiFFI #-}
import Foreign.C.Types (CInt)
-- Since the indented #define doesn't allow us to use the constants directly for
-- conditional compilation, we import the defines as values and let the compiler
-- elide dead branches via constant folding
-- Prime factorization
foreign import capi safe "botan/build.h value BOTAN_HAS_RSA" botan_has_rsa :: CInt
-- DL
foreign import capi safe "botan/build.h value BOTAN_HAS_DL_GROUP" botan_has_dl_group :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_DIFFIE_HELLMAN" botan_has_diffie_hellman :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_DSA" botan_has_dsa :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ELGAMAL" botan_has_elgamal :: CInt
-- ECC
foreign import capi safe "botan/build.h value BOTAN_HAS_ECC_GROUP" botan_has_ecc_group :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ECC_PUBLIC_KEY_CRYPTO" botan_has_ecc_public_key_crypto :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ECDH" botan_has_ecdh :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ECDSA" botan_has_ecdsa :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ECKCDSA" botan_has_eckcdsa :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ECGDSA" botan_has_ecgdsa :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_SM2" botan_has_sm2 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_GOST_34_10_2001" botan_has_gost_34_10_2001 :: CInt
-- Named curves
foreign import capi safe "botan/build.h value BOTAN_HAS_X25519" botan_has_x25519 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_X448" botan_has_x448 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ED25519" botan_has_ed25519 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ED448" botan_has_ed448 :: CInt
-- Post-quantum
foreign import capi safe "botan/build.h value BOTAN_HAS_MCELIECE" botan_has_mceliece :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_CLASSICMCELIECE" botan_has_classicmceliece :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_FRODOKEM" botan_has_frodoKEM :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_KYBER" botan_has_kyber :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_KYBER_90S" botan_has_kyber_90s :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ML_KEM" botan_has_ml_kem :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_DILITHIUM" botan_has_dilithium :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_ML_DSA" botan_has_ml_dsa :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_HSS_LMS" botan_has_hss_lms :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_SPHINCS_PLUS_WITH_SHA2" botan_has_sphincs_plus_with_sha2 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_SPHINCS_PLUS_WITH_SHAKE" botan_has_sphincs_plus_with_shake :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_SLH_DSA_WITH_SHA2" botan_has_slh_dsa_with_sha2 :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_SLH_DSA_WITH_SHAKE" botan_has_slh_dsa_with_shake :: CInt
foreign import capi safe "botan/build.h value BOTAN_HAS_XMSS_RFC8391" botan_has_xmss_rfc8391 :: CInt
-- TODO: Also use has_dl_group, has_ecc_group
pkTypeIsSupported :: PKType -> Bool
pkTypeIsSupported X25519Type = botan_has_x25519 > 0
pkTypeIsSupported X448Type = botan_has_x448 > 0
pkTypeIsSupported RSAType = botan_has_rsa > 0
pkTypeIsSupported McElieceType = botan_has_mceliece > 0
pkTypeIsSupported ClassicMcElieceType = botan_has_classicmceliece > 0
pkTypeIsSupported FrodoKEMType = botan_has_frodoKEM > 0
pkTypeIsSupported KyberType = botan_has_kyber > 0 || botan_has_kyber_90s > 0
pkTypeIsSupported ML_KEMType = botan_has_ml_kem > 0
pkTypeIsSupported DilithiumType = botan_has_dilithium > 0
pkTypeIsSupported ML_DSAType = botan_has_ml_dsa > 0
pkTypeIsSupported HSS_LMSType = botan_has_hss_lms > 0
pkTypeIsSupported SphincsPlusType = botan_has_sphincs_plus_with_sha2 > 0 || botan_has_sphincs_plus_with_shake > 0
pkTypeIsSupported SLH_DSAType = botan_has_slh_dsa_with_sha2 > 0 || botan_has_slh_dsa_with_shake > 0
pkTypeIsSupported XMSSType = botan_has_xmss_rfc8391 > 0
pkTypeIsSupported Ed25519Type = botan_has_ed25519 > 0
pkTypeIsSupported Ed448Type = botan_has_ed448 > 0
pkTypeIsSupported ECDSAType = botan_has_ecc_public_key_crypto > 0 && botan_has_ecdsa > 0
pkTypeIsSupported ECKCDSAType = botan_has_ecc_public_key_crypto > 0 && botan_has_eckcdsa > 0
pkTypeIsSupported ECGDSAType = botan_has_ecc_public_key_crypto > 0 && botan_has_ecgdsa > 0
pkTypeIsSupported SM2Type = botan_has_ecc_public_key_crypto > 0 && botan_has_sm2 > 0
pkTypeIsSupported GOST_34_10Type = botan_has_ecc_public_key_crypto > 0 && botan_has_gost_34_10_2001 > 0
pkTypeIsSupported DHType = botan_has_diffie_hellman > 0
pkTypeIsSupported ECDHType = botan_has_ecdh > 0
pkTypeIsSupported DSAType = botan_has_dsa > 0
pkTypeIsSupported ElGamalType = botan_has_elgamal > 0
This isn’t perfect - instead of conditional compilation, we have to rely on constant folding to eliminate dead branches, which isn’t ideal, but at least now we can check whether algorithms are properly supported without having to attempt generating a key or context and then catching a NOT_IMPLEMENTED exception.
Anyway, I have spent a few days testing against various versions of botan, and it is extremely satisfying to be able to print out and verify which algorithms are supported by a given installation, and to see that change when I switch versions. I intend to do the same for the other modules in the future, adding versioning support across the board.
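Printing out what an installation supports falls out of pkTypeIsSupported plus the derived Enum and Bounded instances. Here is a minimal sketch in which the capi value imports are replaced by hypothetical hard-coded stand-ins (with ML-KEM switched off to mimic an older installation), over a reduced PKType; `supportedPKTypes` and `supportReport` are hypothetical names:

```haskell
import Foreign.C.Types (CInt)

-- Hypothetical stand-ins for the capi value imports; in the real bindings
-- these come from `foreign import capi "botan/build.h value BOTAN_HAS_..."`
botan_has_rsa, botan_has_ed25519, botan_has_ml_kem :: CInt
botan_has_rsa     = 1
botan_has_ed25519 = 1
botan_has_ml_kem  = 0 -- pretend this installation predates ML-KEM

-- Reduced, hypothetical PKType
data PKType = RSAType | Ed25519Type | ML_KEMType
  deriving (Show, Eq, Ord, Enum, Bounded)

pkTypeIsSupported :: PKType -> Bool
pkTypeIsSupported RSAType     = botan_has_rsa > 0
pkTypeIsSupported Ed25519Type = botan_has_ed25519 > 0
pkTypeIsSupported ML_KEMType  = botan_has_ml_kem > 0

-- Every algorithm this installation supports
supportedPKTypes :: [PKType]
supportedPKTypes = filter pkTypeIsSupported [minBound .. maxBound]

-- One line per algorithm, for eyeballing an installation
supportReport :: [String]
supportReport =
  [ show t ++ ": " ++ (if pkTypeIsSupported t then "yes" else "no")
  | t <- [minBound .. maxBound :: PKType] ]
```

Running `mapM_ putStrLn supportReport` prints one yes/no line per algorithm.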
Another thing is that botan 4 is coming, with a current ETA of 2027. It will be the first major version jump we have to handle, so it is better to get on this sooner rather than later. We should be able to absorb any changes thanks to the ergonomics refactor, since we no longer need to stay 1:1 with the bindings. I don't expect huge changes to the FFI, though several algorithms are slated for removal - good thing the versioning support is going in now.
Additional miscellanea
I am also looking into the view functions that were added in 3.5 - they are designed to avoid some of the song-and-dance required to give some algorithms correctly sized buffers, by instead giving you access to the library's buffer and letting you copy it yourself.
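The callback pattern behind these view functions can be sketched without Botan at all. Below, the callback type mirrors Botan's view callback (context pointer, data pointer, length, returning an int) as I understand it, and `mockViewBin` and `viewToByteString` are hypothetical names: `mockViewBin` stands in for the library side, lending its buffer only for the duration of the call, and the caller copies the bytes out before returning. Real bindings would presumably also need a `foreign import ccall "wrapper"` to turn the Haskell callback into a FunPtr.

```haskell
import qualified Data.ByteString as BS
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.C.Types (CInt, CSize)
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- Shape of a view callback, as I understand Botan's:
-- int (*)(void* ctx, const uint8_t* data, size_t len)
type ViewBinFn = Ptr () -> Ptr Word8 -> CSize -> IO CInt

-- Hypothetical library-side stand-in: lends the callback a pointer to its
-- internal buffer, which is only valid while the callback runs
mockViewBin :: BS.ByteString -> Ptr () -> ViewBinFn -> IO CInt
mockViewBin internal ctx callback =
  BS.useAsCStringLen internal $ \(p, n) ->
    callback ctx (castPtr p) (fromIntegral n)

-- Caller side: copy the lent bytes into a ByteString before returning
viewToByteString :: (Ptr () -> ViewBinFn -> IO CInt) -> IO (Maybe BS.ByteString)
viewToByteString viewFn = do
  ref <- newIORef Nothing
  let copyOut :: ViewBinFn
      copyOut _ctx p n = do
        bytes <- BS.packCStringLen (castPtr p, fromIntegral n)
        writeIORef ref (Just bytes)
        pure 0 -- 0 signals success back to the library
  rc <- viewFn nullPtr copyOut
  if rc == 0 then readIORef ref else pure Nothing
```

The appeal is that the caller never has to guess a buffer size up front; the library reports exactly how many bytes it has, and the copy happens once.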
On the main branch, someone opened a PR fixing base64 decoding for 3.12; Joris has merged it, and new minor versions are on their way!
Health
My health has continued to improve, so much that I recently broke a personal record / hit a milestone and was able to walk 12 miles (nearly a half-marathon!) in a single day.
One year ago, I was having to consider getting a cane because of my leg, but this weekend, I was able to jump properly on it for the first time in a very long time. I have only been able to regain my health because this project and your support afford me the time I need to focus on it - I can work on my PT while mulling over problems, and I don't have the stress of a boss clocking my time and forcing me to sit down for 8-10 hours a day.
That means working on this project is a joy that gives me energy, rather than one that drains me - and that keeps these updates coming!
That’s all for now!
Very happy to read about your health improvement. Keep up the great progress!