Botan bindings devlog

ApothecaLabs · November 15, 2023, 9:56pm

Update: X509 data types, stub functions, and bindings

This update has been building for a few days now, and I’m glad to get it out. You see, you pull on one thread, and things start to unravel - and I can see why the original FFI authors didn’t want to implement more extensive X509 support.

Basic X509 functionality does exist, but it is almost entirely limited to loading read-only objects, and I want to be able to use this library to create, sign, and revoke certificates, and to encode / decode PEM and BER/DER formatted objects - so that means getting my hands dirty and doing it myself.

It is a bit tedious: to enable one piece of functionality you must implement this other function, which requires these data types in order to create those other ones - and so on, until you’re basically forced to implement everything together. Luckily, that’s our explicit goal, so it’s no skin off our back!

I’m mostly mirroring the existing C++ API, and after soaking my brain in the X509 C++ source for a few days, I’ve written C FFI data types and function stubs for almost all of the X509 data structures. When I started, the existing X509 data types were just basic read-only Certificates and CRLs - now, we have data types for:

Distinguished Names
Extensions
Certificates
Certificate Authority
Certificate Signing Requests
Certificate Options
Certificate Stores
Path validation
Supplemental data types

All of the things that you need to generate and sign new certificates! There are still some open questions (namely, returning arrays of things, and some questions of ownership), I still need to write the FFI types and function stubs for OCSP support, and there are some missing functions, but we’re getting there!

One large caveat is that these are just the stub functions, and I still have to go through and actually implement them all so that they call the C++ from C. There’s still a lot of work to do, but I’ve basically started in the middle, between the Haskell and the C++ - and with the bindings set, I can get to work on implementing the Haskell and the C++ together.

On the other hand, I’m basically defining the FFI types, and that’s one of the most important parts of programming. Plus, all of this X509 work is effectively doubling the size of the Botan C FFI (it now accounts for half of the FFI header file!) so I shouldn’t be surprised it’s taking a bit of time - I’ll bet the original FFI wasn’t built in a few weeks either!

These changes have been pushed to the repo and upstream fork.

If you want to check out this recent work you’ll have to clone the experimental botan-upstream fork, build and install it from source, and use the XFFI flag to enable the experimental FFI modules. There are some instructions in the README, or you can follow along here:

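# Clone
# BOTAN_CPP, BOTAN_OUT and BOTAN_HASKELL are placeholders for: where to clone
# the C++ fork, where to install it, and your checkout of the Haskell bindings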
git clone https://github.com/apotheca/botan-upstream $BOTAN_CPP

# Build and install C++
cd $BOTAN_CPP
./configure.py --prefix=$BOTAN_OUT
make
make install

# Play around with it
cd $BOTAN_HASKELL
cabal repl botan-low -fXFFI --extra-lib-dirs=$BOTAN_OUT/lib --extra-include-dirs=$BOTAN_OUT/include

That’s all for the moment! Next up, I’ll be fleshing out these function stubs so that they vend actual objects that we can interact with!

jackdk · November 15, 2023, 9:59pm

I might have missed it, so I’m sorry if you’ve already written this somewhere else, but: what’s the eventual plan for this fork? Are you going to be able to upstream these FFI changes?

ApothecaLabs · November 15, 2023, 10:00pm

Yes, I will (eventually) be submitting these FFI changes as a patch / pull request to the original Botan C++ library repo. It is an open issue that I am happy to contribute to.

ApothecaLabs · November 21, 2023, 5:00pm

Update: X509 Cert Options and Cert Store implementations

Now that I’ve got the X509 FFI functions all declared, it’s time to write implementations, and that’s what today’s update is about. Certificate options and certificate stores are more or less first-draft implemented on the C++ side, with certificate signing requests and certificate authorities on the way next. This means that I am very close to being able to generate new X509 certificates through the FFI, which would be a great chunk of new functionality. If I can achieve this, it will be a meaningful contribution not just to the Haskell library but to the upstream C++ library as well.

So close!

Since this is C++, I’ve run into a few questions of string / object ownership and lifetime - it compiles, and the types are correct, but I’ve not really tested it yet. There are also a few functions that return a shared_ptr, but the Botan FFI is designed for unique_ptr. In general, I’m not up-to-date on the move / copy / ownership semantics yet, and the heavy use of auto and implicit casting makes it difficult to track what is going on. Another issue is a question of string encodings, and whether Botan supports null bytes in X509 Distinguished Names, which are part of the X509 spec, and X509 inherits a lot of data types from ASN1 (which does allow for null bytes). I’m just using cstrings for now, but may have to go back and change some things to be uint8_t arrays instead - no biggie if I do.
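To make the cstring-vs-byte-array question concrete, here is a minimal sketch using only base's Foreign.C.String. The attribute value is made up; the point is that a value with an embedded NUL survives a (pointer, length) round trip but is silently truncated when passed as a NUL-terminated C string. On the C side the (pointer, length) form corresponds to a uint8_t array plus a size_t.

```haskell
import Foreign.C.String (peekCString, peekCStringLen, withCString, withCStringLen)

-- A hypothetical DN attribute value with an embedded NUL byte,
-- which some ASN.1 string types permit
valueWithNul :: String
valueWithNul = "ab\0cd"

-- Round-trip through a NUL-terminated C string: the C side only sees
-- the bytes before the first NUL, so "cd" is lost
viaCString :: String -> IO String
viaCString s = withCString s peekCString

-- Round-trip through an explicit (pointer, length) pair: nothing is lost
viaCStringLen :: String -> IO String
viaCStringLen s = withCStringLen s peekCStringLen
```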

I plan on posting an update to Botan’s open issue for the X509 FFI asking for some guidance. In the meantime, I’m forging ahead, and I’m hoping to have the rest of the X509 interface at least roughly implemented by the end of the week, barring the open questions about ownership and lifetimes; for those, I will at least be ready to apply the answers when I get them.

The repo and upstream fork have been updated. Let’s keep this train rolling!

While I’m working on that, check out the impact that this work has had on my GitHub contribution activity - it’s neat to look back and see the history of the project:

(Image: botan-commit-activity, the GitHub contribution graph)

I started this project back in July, and have managed to keep a consistent level of activity and momentum going ever since, making sure to get something meaningful done just about every day. Notably, the contributions to the botan-upstream fork do not appear here (yet) because it is a fork, but they will show up if the fork’s changes are accepted in a PR - when that happens, this last month will be even greener.

ApothecaLabs · November 29, 2023, 7:09pm

Update: More X509 C++ and Haskell over the Holiday

It’s been a bit of a quiet week; I meant to give an update days ago, but the holiday left me worn out and it has been difficult for me to write, so this update is a bit unpolished. I think I am in need of a small break, so I will be taking it a bit easy as I prepare the upcoming first monthly status update, and I will be trying to maintain a shorter, more frequent update schedule.

X509 Cert Store Bindings and Functions, and More

First off, what have I accomplished since the last update?

C++ Distinguished Name implementations
C++ Certificate Extension function declarations and stubs
- Still have some questions over how I’m going to implement this
C++ Certificate Options seem to be working
C++ Created Certificate_Store_In_SQL::find_key_unique, a unique_ptr version of find_key (which returns a shared_ptr that we can’t use)
C++ / Haskell Fixed int botan_x509_cert_store_find_all_certs return type
C++ / Haskell X509 Path Validation names and types fixed
Reorganized Haskell X509 types (now spread across multiple modules)
Haskell Certificate Store bindings and implementations
Haskell Certificate Signing Request implementations
Haskell CRL Entry bindings and implementations
Haskell Certificate Authority implementations
Haskell Path validation implementations

I’ve spent a fair bit of time wrapping my head around STL pointer lifetimes, to which a friend has helpfully supplied the following ‘guide’ (I can’t tell if he is serious, or if it is in jest, or both):

What a monstrosity! C++ is taking a bit of a toll on me and I can feel my sanity slipping away…

Regardless, I’m having to do a bit more than just wrap functions at this point: I’m having to update or define new C++ functions to get the requisite access that I need. I can’t use shared_ptr objects due to the structure of the FFI, but some functions still use them, so I have to write unique_ptr variants of them. I haven’t yet determined if it is “safe” to relax the type constraint, even though I suspect that it should auto-cast via move-constructors.

Aside from that, there are a few functions that need to return jagged 2d arrays, which means the data itself, plus an array of the row sizes, plus the number of rows, so you’re dealing with 3 pointer variables. Then there are a few functions that return key-value pairs, which require 5 pointer variables (keys, key lengths, vals, val lengths, count).
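A minimal sketch of reading those layouts on the Haskell side, using only base. It assumes a jagged array is one flat buffer with the rows laid out back to back, plus a per-row length array, plus a count; key-value pairs are then just two jagged arrays sharing the count. The layout and all function names here are hypothetical, and for simplicity the count is passed as a plain value, whereas the real C functions would write it through a pointer too.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CSize)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr, plusPtr)

-- Read a jagged 2D array: a flat buffer of all rows back to back,
-- an array of per-row lengths, and the number of rows (3 values)
peekJagged :: Ptr Word8 -> Ptr CSize -> CSize -> IO [[Word8]]
peekJagged buf lens count = do
    sizes <- map fromIntegral <$> peekArray (fromIntegral count) lens
    let offsets = scanl (+) 0 sizes
    mapM (\ (off, n) -> peekArray n (buf `plusPtr` off)) (zip offsets sizes)

-- Key-value pairs are two jagged arrays sharing one count:
-- keys, key lengths, values, value lengths, count (5 values)
peekKeyValues
    :: Ptr Word8 -> Ptr CSize -> Ptr Word8 -> Ptr CSize -> CSize
    -> IO [([Word8], [Word8])]
peekKeyValues keys keyLens vals valLens count =
    zip <$> peekJagged keys keyLens count <*> peekJagged vals valLens count

-- The inverse, handy for testing: lay rows out in the same flat format
withJagged :: [[Word8]] -> (Ptr Word8 -> Ptr CSize -> CSize -> IO a) -> IO a
withJagged rows act =
    withArray (concat rows) $ \ buf ->
    withArray (map (fromIntegral . length) rows) $ \ lens ->
    act buf lens (fromIntegral (length rows))
```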

Ignoring those two particular issues, I’m still missing Haskell bindings on some things (DN, Extensions), and about 30 or so missing C++ function implementations (Cert Authority, Cert Signing Request, CRL Entry, Cert Store) - otherwise things are coming along decently. Everything is at least declared and either bound or implemented in Haskell or C++, and having one side done makes it easier to fill in the other.

Some thoughts I have as I prepare the monthly update.

X509 explodes off into a fractal of complicated dependencies: for example, it is reliant on ASN1, and implementing bindings to ASN1 is far out of scope - I’d rather do that in Haskell anyway.
Issues / questions over what encoding some things are in - in C++, a string is just an array of chars: not a cstring, and not necessarily UTF8
May ignore DN objects for now in favor of passing string-encoded subject and issuer distinguished names
Extensions: singular object vs individual extension objects (which do have getters)
I worry about the ‘query with a null array pointer and we’ll populate the size pointer’ method
- Doesn’t scale to higher arrays (see issues with 2d jagged arrays)
- Potential efficiency / non-determinism issues with generated results
- Eg, the generate_crls method for sql cert stores
- I would prefer an actual size query function.
- May revisit this issue, but it applies to several functions
Some things are unavailable to me (Sqlite3 store implementation) at this time pending investigation
Learning some lessons on what I’d do differently if I were to start from scratch (but now is not the time for refactoring)
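Here is a minimal sketch of that null-buffer convention from the Haskell side, with mock C-style functions written in Haskell so that it runs on its own. The value -10 is assumed to match BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE, and growingFn is a made-up stand-in for something like generate_crls whose output changes between calls. It shows the non-determinism worry directly: the second call no longer fits the buffer sized by the first.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (peek, poke)

-- Assumed to mirror BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE
insufficientBufferSpace :: CInt
insufficientBufferSpace = -10

-- Hypothetical C-style output function: always stores the required size
-- in *len, and only writes to out if it is non-null and large enough
type OutFn = Ptr Word8 -> Ptr CSize -> IO CInt

-- The two-call pattern: query the size with a null buffer, allocate, call again.
-- The function runs twice, so anything it generates is generated twice.
queryBytes :: OutFn -> IO (Either CInt [Word8])
queryBytes fn = alloca $ \ lenPtr -> do
    poke lenPtr 0
    rc <- fn nullPtr lenPtr                 -- first call: size query only
    if rc == 0
        then pure (Right [])                -- nothing to return
        else if rc /= insufficientBufferSpace
            then pure (Left rc)             -- a real error
            else do
                n <- fromIntegral <$> peek lenPtr
                allocaArray n $ \ buf -> do
                    rc' <- fn buf lenPtr    -- second call: real buffer
                    if rc' /= 0
                        then pure (Left rc')
                        else do
                            written <- fromIntegral <$> peek lenPtr
                            Right <$> peekArray written buf

-- Mock with stable output: the two-call pattern works
fixedFn :: [Word8] -> OutFn
fixedFn bytes out lenPtr = do
    cap <- fromIntegral <$> peek lenPtr
    poke lenPtr (fromIntegral (length bytes))
    if out == nullPtr || cap < length bytes
        then pure insufficientBufferSpace
        else pokeArray out bytes >> pure 0

-- Mock whose output grows on every call: the second call always fails
growingFn :: IORef Int -> OutFn
growingFn counter out lenPtr = do
    modifyIORef' counter (+ 1)
    n <- readIORef counter
    cap <- fromIntegral <$> peek lenPtr
    poke lenPtr (fromIntegral n)
    if out == nullPtr || cap < n
        then pure insufficientBufferSpace
        else pokeArray out (replicate n 0xAB) >> pure 0
```

A dedicated size query function would avoid running the generating function twice, which is why it is the preferred option above.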

My biggest concern right now is that I know that the X509 stuff is still going to take continued effort, but I can’t let it take up the entire project’s focus. So, in order to make sure that I hit all of my deliverables in the next two months, I have to figure out a limit to what I’m going to focus on, and cut it off at a certain reasonable point.

Hence my thoughts are that the X509 interface I’ve so far declared is sufficient for now, and definitely an improvement over what was there already. I’ve done enough work on the X509 stuff to know that it should make it in as a deliverable, and that’s good. But it also illustrates why some high-value goals (like X509) are optional deliverables - at this point, it is also safe to say that the other optional deliverables may not make it into this stage, as I still have much to do.

That’s it for now - I’m preparing a more thorough update for next time to mark the end of the first month, see you then!

The repo and upstream fork have been updated.

ApothecaLabs · December 11, 2023, 6:49pm

Yikes! It’s been a little while since I updated the devlog; after posting the Monthly Status Report, I had some personally impactful events occur that necessitated my taking a few days off to process. Your patience is appreciated.

Tightening the X509 ship

The big thing of note this update is I’ve sort of worked out the boundaries of what is necessary to implement to get the features that I want. It is a question of functionality vs faithfulness to the full and complete X509 spec, and it can be hard to tell what is X509 spec, and what is opinionated Botan C++ - and Botan certainly has some opinions that don’t really cross the C boundary very well.

At some point it is more sensible to encode things into a known format and then parse them, rather than trying to juggle an FFI; if we cross that point, our time would be better spent on creating / modifying a native X509 implementation while using Botan to supply the cryptographic primitives instead - this would mean integrating botan more tightly with (and making it mutually reliant on) things like asn1-encoding, crypton-x509, crypton-x509-store, and tls.

We aren’t at that point yet; the convenience functions in Botan both limit our choices and make things easier to implement, since there is a limit to what they expose. What we need is to be able to:

Read and verify certificates
Create self-signed certificates (certificate authority)
Create and sign requests and new certificates using a certificate authority
Create and manage certificate revocation lists
Revoke or affirm certificates using revocation lists
Read and write certificates and private keys to and from a certificate store

It turns out that we can get (or already have) quite a lot of this without directly implementing a few things:

Distinguished names are fancy multimaps / key-value lists, and can be represented in an encoded string format rather than constructing a map object and passing it around. However, it is worth noting that there is logic attached to what keys and values are ‘valid’. Ultimately, it may eventually become useful to fully implement, but right now the cost/opportunity balance is against it.
Creating extensions is obviated by the ‘options’ struct, and by implication of supplied arguments in the wrapping functions - in other words, the extension objects are not used directly but are instead created on the fly and immediately consumed. However, this makes querying the finished objects for extensions difficult - they are represented via an abstract class, and holding a pointer to one is slightly complicated, especially when we don’t know which subclass it is (I am looking into how private keys manage it). If I can get the abstract vs subclass stuff working, it may be worth implementing anyway, since there are functions that let us query certificates and other objects for their related extensions.
Path validation has already been ‘handled’ by botan_x509_cert_verify_with_crl, which serves to encapsulate the necessary logic to call Botan::x509_path_validate and return the validity. Like distinguished names, it may be worth implementing fully eventually, but at the moment time is better spent elsewhere.
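To illustrate passing a distinguished name around as an encoded string rather than a map object, here is a hypothetical sketch that flattens an ordered key-value list into one string. It loosely follows RFC 4514-style escaping and is not Botan's own DN string format; the rules about which keys and values are valid are deliberately not modelled.

```haskell
import Data.List (intercalate)

-- A DN as an ordered list of (attribute, value) pairs, so repeated
-- attributes such as several OU entries are preserved (multimap behaviour)
type DN = [(String, String)]

-- Backslash-escape the special characters listed in RFC 4514; its rules for
-- NUL and for a leading '#'/space or trailing space are omitted for brevity
escapeDNValue :: String -> String
escapeDNValue = concatMap escape
  where
    escape c
        | c `elem` ",+\"\\<>;" = ['\\', c]
        | otherwise            = [c]

-- Render the pairs in the order given, separated by commas
renderDN :: DN -> String
renderDN = intercalate "," . map (\ (k, v) -> k ++ "=" ++ escapeDNValue v)
```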

As a result, I may end up temporarily or permanently axing the X509 DN, Extensions, and Path modules. This shrinks our implementation surface a little, and allows us to keep focused on implementing what is necessary to get the desired functionality. So we don’t get everything we wanted, but it’s still a vast improvement on the basic X509 FFI that I started with, and we can focus on completeness later if that direction is still favorable. It is okay to write code that you don’t end up using; it is a problem if you keep that code to your detriment.

This means we can focus on the remaining items for X509:

PEM and DER encoding for more X509 objects
Finishing the Cert Authority and new CRL function C++ implementations
A bunch of small encoding things

Codewise, this update features new functions for creating, querying, and updating certificate revocation lists, which before could only be read from a file. They’re still in progress, but huzzah!

I also got the sqlite3 certificate store working; this necessitates configuring botan using the --with-sqlite3 flag, but it should be working now.

The repo and upstream fork have been updated.

ApothecaLabs · December 14, 2023, 3:47pm

Update - Fresh New README

Today’s update is, I hope, rather refreshing. After staring a bit too long into the abyss that is C++, I decided to do some long-awaited spring cleaning of the repo, and perform some of the non-code-related tasks.

The big thing for this update is a complete rewrite of the project README. It is a vast improvement, so you should definitely check it out!

I took a few cues from this curated awesome-readme list, which was very helpful. I even managed to get navigation and collapsible sections working, so that the README doesn’t get in its own way when you scroll through to the part you want to read. There’s some minor formatting to be done on the collapsibles, but I’ll take care of it soon enough.

It is important to remember that non-coding tasks may be just as meaningful to a project as its coding tasks, and 10+ years of industry experience have taught me that a good README is essential - and I have allowed the README to lay fallow far too long. Good documentation doesn’t just happen, you have to make it happen.

Some of the (platform-specific) build instructions haven’t been tested yet, but now it is looking prim and proper, as befits a well-organized project such as this. I have also split off crypto-schemes into its own repo, due to the narrowing of project scope. Don’t worry, it’s not going away - it’ll be back! I also cleaned up and removed a bunch of non-botan stuff from the repo.

Next up, I’m going to spend the next few days going over building on various platforms, and trying to get the ball rolling on the CI stuff.

I finally got around to setting up a Ko-Fi account for anyone who wishes to donate to help keep this project going, or to just say thank you. It really makes a difference to me and the longevity of this project.

ApothecaLabs · December 22, 2023, 8:00pm

Yee-haw! Major `botan-bindings` CAPI update!

I am absolutely stoked to bring this update, as I’ve been working on it for the last week while checking over everything repeatedly. The major focus of this update is that we have upgraded botan-bindings from using ccall to using capi and the newer CApiFFI.

It is our first significant update to botan-bindings in quite a while (XFFI notwithstanding), and it is made with the goal of stability, to freeze botan-bindings for its first release. In fact, pending a few things, the next update (when I have fixed botan-low and merged CAPI-experiments back into main) will involve a version change from 0.0.1 to 0.0.2, and hopefully it will be stable enough to consider as a candidate for a versioned release on hackage.

Effectively, this accepts @BurningWitness’s PR that I’ve been referencing for the last several days, though there are some significant tweaks and differences, so I merged it by hand and can’t actually merge the PR itself - but it saved me a lot of time and effort. Please thank them for their serious contribution to this project.

This means every module in botan-bindings has been somewhat re-written, albeit with the intended goal of requiring minimal changes downstream because the bound interfaces haven’t actually changed, just our way of wrapping them.

So, specifics:

I’ve upgraded from ccall to capi using the newer CApiFFI.

This gives us much tighter bindings to the C types using CTYPE pragmas
We now get a warning for non-matching Haskell and C types
The bindings are now safe by default, though I may need to revisit specific functions with callbacks in the future to make sure this is appropriate.
I’ve renamed things to be more 1:1 consistent with Botan C FFI for predictability, and because we’ll start standardizing with idiomatic naming at higher levels.
I’ve created a ConstPtr shim for base < 4.18
- This does cause an issue for GHC 9.0.2
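For anyone who hasn't used CApiFFI, here is a minimal, self-contained sketch of capi imports. It binds libc's abs and strlen rather than Botan so that it compiles anywhere. With capi, GHC generates a small C stub that includes the named header and calls the function through its real prototype, so the C compiler sees the actual declaration; that is where the mismatch warnings come from. Imports without an explicit safety annotation are safe, matching the default described above.

```haskell
{-# LANGUAGE CApiFFI #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))

-- The C compiler checks this call against the prototype in stdlib.h
foreign import capi "stdlib.h abs"
    c_abs :: CInt -> CInt

-- Safe by default: no 'unsafe' keyword, so this may run alongside other threads
foreign import capi "string.h strlen"
    c_strlen :: CString -> IO CSize

-- Marshal a Haskell String to a temporary C string and measure it
cStringLength :: String -> IO Int
cStringLength s = fromIntegral <$> withCString s c_strlen
```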

There is now a new method of defining Haskell C data types:

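{-# LANGUAGE DerivingStrategies, GeneralizedNewtypeDeriving #-}
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable)

-- Old method: an empty data type plus a plain type synonym for the pointer
data FooStruct
type FooPtr = Ptr FooStruct

-- New method: CTYPE pragmas name the C header and C type, so capi stubs
-- use the real types declared in botan/ffi.h
data {-# CTYPE "botan/ffi.h" "struct botan_foo_struct" #-} BotanFooStruct
newtype {-# CTYPE "botan/ffi.h" "botan_foo_t" #-} BotanFoo
    = MkBotanFoo { runBotanFoo :: Ptr BotanFooStruct }
        deriving newtype (Eq, Ord, Storable)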

The new method is great because I can generate bindings in botan-low now a la:

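import Control.Exception (mask_)
import Foreign.C.Types (CInt)
import Foreign.ForeignPtr
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))
-- throwErrorIfNegative_ is botan-low's own error-checking helper

-- Given the newtype wrappers and the C destructor, generate the four
-- standard functions for a Botan object type
mkBindings
    ::  (Storable botan)
    =>  (Ptr struct -> botan)
    ->  (botan -> Ptr struct)
    ->  (ForeignPtr struct -> object)
    ->  (object -> ForeignPtr struct)
    ->  FinalizerPtr struct
    ->  (   botan -> IO object
        ,   object -> (botan -> IO a) -> IO a
        ,   object -> IO ()
        ,   (Ptr botan -> IO CInt) -> IO object
        )
mkBindings mkBotan runBotan mkForeign runForeign destroy = bindings where
    bindings = (newObject, withObject, objectDestroy, createObject)
    -- Take ownership of a raw handle, with the C destructor as its finalizer
    newObject botan = do
        foreignPtr <- newForeignPtr destroy (runBotan botan)
        return $ mkForeign foreignPtr
    -- Lend the raw handle to an action while keeping the object alive
    withObject object f = withForeignPtr (runForeign object) (f . mkBotan)
    -- Run the destructor now instead of waiting for the garbage collector
    objectDestroy object = finalizeForeignPtr (runForeign object)
    -- Call a C initializer that writes a new handle through an out-pointer;
    -- mask_ keeps an async exception from landing before the finalizer is attached
    createObject init = mask_ $ alloca $ \ outPtr -> do
        throwErrorIfNegative_ $ init outPtr
        out <- peek outPtr
        newObject out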

I’ll be using this in botan-low as I update it, to more consistently generate the foreign pointer wrappers and functions. Here’s an example implementation of RNG using it:

newtype RNG = MkRNG { getRNGForeignPtr :: ForeignPtr BotanRNGStruct }

newRNG      :: BotanRNG -> IO RNG
withRNG     :: RNG -> (BotanRNG -> IO a) -> IO a
rngDestroy  :: RNG -> IO ()
createRNG   :: (Ptr BotanRNG -> IO CInt) -> IO RNG
(newRNG, withRNG, rngDestroy, createRNG)
    = mkBindings MkBotanRNG runBotanRNG MkRNG getRNGForeignPtr botan_rng_destroy

It’s a big step towards achieving enough stability for a proper release. Unfortunately, all of this breaks botan-low for the moment, and so it exists in its own CAPI-experiments branch for now.

This is because there are now some issues with the older mkInit functions that I made and used back when FooPtr was a type synonym instead of a newtype, and ConstPtr causes issues with existing functions that predate it but now require it due to the updated bindings.

I’m working on fixing all of that next, while performing a similar ‘refactor-and-freeze’ on botan-low.

Update on CI work

Aside from all of that, there’s also @ocramz’s PR, which I think I am going to accept and then probably heavily edit. I don’t like including things in projects that I don’t understand or have a thorough grasp of, so I’ve been reading up heavily on GitHub CI to better understand its workflow syntax, because I’m almost certainly going to go with GitHub Actions / workflows for CI - the bar for entry is very low.

To that end, I’ve gathered the following options:

get-tested (which ocramz’s PR used, and which embeds into a workflow)
haskell-ci (which generated a hard-to-read workflow)
Writing workflows by hand

It turns out that I need some flexibility that I don’t know if haskell-ci affords (I need to be able to edit the workflow, and it’s a generator), so I ended up not considering it as an option.

On the other hand, get-tested does one thing simply and does it well - it generates the GHC / OS test matrix for me, and the PR includes useful steps like caching that I knew I’d need. However, I have other matrix needs - specifically, the library target also needs to be a part of the matrix, so I need more power than get-tested gives me.

In the end I’ll probably write the CI workflow by hand, as it’s only a few lines of code to write our own matrix, but the PR is quite fine for now, as I’ll need a few more days going over GitHub CI workflows to produce something better.

Well, I’m a bit bushed after all of this. The next few days are going to be a bit quiet as I go visiting family, and I hope everyone enjoys the holidays too.

romes · December 22, 2023, 8:11pm

Great work Leo and collaborators! It’s exciting to see this work being done.

Vlix · December 22, 2023, 10:32pm

Props to @BurningWitness and @ocramz for the PRs, and of course to @ApothecaLabs for the continued effort

jackdk · December 23, 2023, 1:42am

That’s a pretty great update. I maintain a small (nearly-irrelevant) package which does FFI bindings. I’m going to have to play with CApiFFI and those CTYPE annotations, and try and modernise it.

ApothecaLabs · December 29, 2023, 2:00am

Holiday Update: Fixing `botan-low` for the new CAPI changes

The last update was a good one, but it broke a few things in botan-low - hence why I’ve been working off of the CAPI-experiments branch. This update is about as big, and once again touches every module in a library - this time botan-low. It’s a bit late, so I’ll keep this short and sweet.

Change log / greatest hits

botan-low now uses the newer better data types and their newtype constructors
The mkBindings generator function has been moved to botan-low and expanded
Initializers now use generated createObject functions
Destructors are now generated
Functions now use ConstPtr where appropriate
FooCtx-style objects have been renamed Foo
withFooPtr-style functions have been renamed withFoo
mkFoo functions in Botan.Low.Make are being re-worked
A suite of more consistent marshalling functions is being developed
“Fixed” unit tests to the point of compiling if the XFFI test modules are deleted and removed from the cabal file (haven’t actually scrutinized the results yet)
botan is still broken

Whew! That covered a lot and I’m still testing everything (in fact, the unit tests are still broken, but that’s because hspec-discover doesn’t respect our if flag(XFFI) and tries to include modules that won’t exist), so this update is still living on the CAPI-experiments branch for the time being.

There’s more work to do on botan-low but I’m aiming to get it release-stable like how botan-bindings effectively is - that’s all for now!

This update has been pushed to the CAPI-experiments branch.

ApothecaLabs · January 3, 2024, 9:43pm

New Year’s First Update - Merging `CAPI-experiments` back into `main`

It’s the first update of the new year! And I bring some good things!

The first big point is the CAPI-experiments branch has become stable enough that I have merged the changes back into main. That means all of the recent CAPI good-ness has now made it back in, effectively freezing botan-bindings minus any additive changes.

Second big point is that I’ve added a bunch of pattern constants for algorithm names to botan-bindings and botan-low to match the data types in botan. This makes the lower-level libraries a tad more useful on their own, and will help guard against stringly-typed errors.

As an example, we might have the following in botan-bindings:

{-# LANGUAGE OverloadedStrings, PatternSynonyms #-}
import Data.String (IsString)

-- String-literal pattern synonyms need Eq (for matching) and IsString
pattern BOTAN_BLOCK_CIPHER_128_AES_128
    ::  (Eq a, IsString a) => a
pattern BOTAN_BLOCK_CIPHER_128_AES_128 = "AES-128"

pattern BOTAN_AEAD_CHACHA20POLY1305
    ::  (Eq a, IsString a) => a
pattern BOTAN_AEAD_CHACHA20POLY1305 = "ChaCha20Poly1305"

pattern BOTAN_AEAD_MODE_GCM
    ::  (Eq a, IsString a) => a
pattern BOTAN_AEAD_MODE_GCM = "GCM"

But express it in botan-low more succinctly:

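-- Assumes PatternSynonyms and OverloadedStrings, the botan-bindings constants
-- above, and botan-low's showBytes helper (a value's show, as a ByteString)
import Data.ByteString (ByteString)

type BlockCipher128Name = ByteString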

pattern AES128
    :: BlockCipher128Name
pattern AES128          = BOTAN_BLOCK_CIPHER_128_AES_128

type AEADName = ByteString

chaCha20Poly1305 :: AEADName
chaCha20Poly1305 = BOTAN_AEAD_CHACHA20POLY1305

gcmMode :: BlockCipher128Name -> AEADName
gcmMode bc = bc // BOTAN_AEAD_MODE_GCM

-- With *extended* algorithm parameters
gcmMode' :: BlockCipher128Name -> Int -> AEADName
gcmMode' bc tagSz = gcmMode bc /$ showBytes tagSz

-- Formatting helpers

infixr 6 //
(//) :: (IsString a, Semigroup a) => a -> a -> a
a // b = a <> "/" <> b

infixr 0 /$
(/$) :: (IsString a, Semigroup a) => a -> a -> a
a /$ b = a <> "(" <> b <> ")"

And even further, give it a proper data type in botan:

data AES
    = AES128
    ...

data BlockCipher128
    = AES AES
    ...

data BlockCipher
    = BlockCipher128 BlockCipher128
    ...

-- NOTE: Current datatype in `botan` does differ slightly
data AEAD
    = ChaCha20Poly1305
    | GCM BlockCipher128 (Maybe Int)
    ...

I’ve tried to strike a balance between safety and exposing functionality by providing constants for algorithm families and modes, without making a constant for every combination. I’ve yet to make the botan data types use the botan-low constants and functions properly, but I’ll be working on that shortly.
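As a hypothetical sketch of where this is heading, the high-level data types could render themselves into Botan algorithm names using the same // and /$ formatting helpers shown in the botan-low excerpt. The constructors here are a simplified guess at botan's types (the real ones differ), and aeadName / blockCipher128Name are made-up names; the output strings follow Botan's naming format, for example AES-128/GCM(16).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString)

-- Simplified, hypothetical versions of the high-level types
data AES = AES128 | AES192 | AES256
    deriving (Eq, Show)

newtype BlockCipher128 = AES AES
    deriving (Eq, Show)

data AEAD
    = ChaCha20Poly1305
    | GCM BlockCipher128 (Maybe Int)   -- optional tag size in bytes
    deriving (Eq, Show)

-- The formatting helpers from botan-low
infixr 6 //
(//) :: (IsString a, Semigroup a) => a -> a -> a
a // b = a <> "/" <> b

infixr 0 /$
(/$) :: (IsString a, Semigroup a) => a -> a -> a
a /$ b = a <> "(" <> b <> ")"

blockCipher128Name :: BlockCipher128 -> String
blockCipher128Name (AES AES128) = "AES-128"
blockCipher128Name (AES AES192) = "AES-192"
blockCipher128Name (AES AES256) = "AES-256"

-- Only the parameters that are actually set end up in the name
aeadName :: AEAD -> String
aeadName ChaCha20Poly1305      = "ChaCha20Poly1305"
aeadName (GCM bc Nothing)      = blockCipher128Name bc // "GCM"
aeadName (GCM bc (Just tagSz)) = (blockCipher128Name bc // "GCM") /$ show tagSz
```

Keeping the rendering in one total function per type means a constructor that is added later cannot silently produce a malformed name; GHC's incomplete-pattern warning points at the missing case.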

NOTE: I haven’t used Ptr "TheConst\0"# because most algorithms have parameters, and we need to glue the parts together for compound algorithms, and that would be thousands of constants if we unrolled that all.

Third big point is that I’ve split the unit tests up into individual targets so that it’s easier to see what’s passing and failing - no longer will cipher unit tests spam everything with test failures! This is great because I can just test everything with:

cabal test botan-low

Or I can test specific things via

cabal test botan-low-hash-tests

I’ve also fixed the unit tests - some were broken by being split into separate targets, but I had to get the pattern constants in first as preparation. However, this also means that tests can no longer use hspec-discover - 1) because it doesn’t respect flags and 2) each file is getting its own test-suite anyway - alas!

Unit tests also now (somewhat) use the new algorithm name constants, though I’m still working on that - I’ve tried to be careful, but I’ll be scrutinizing these changes quite severely as I go over them again a few times in coming days.

Last big point is that I’ve also gotten botan building again after all of the changes in the last month or so that broke it. It had been a while since I last poked around the high-level library, because so much attention has gone towards botan-bindings, botan-low, and botan-upstream, but I was pleasantly surprised by how I had left things.

I still have a few things to do before I can declare botan-low stable enough for versioning like botan-bindings, however - I need to go over everything with a fine-toothed comb:

Find and fix any missing constants
Implement the view-bin and view-str functions
Clean up unit tests especially algorithm test suite generators
- Consolidate the Make and Remake files
The mystery of why certain unit tests started passing
- Probably was the cipher tests because of the simplified set of algorithms that we’re testing against (we’ll leave extensive testing to botan)
Move some of the non-canonical functions from botan-low to botan
- I’m specifically thinking of the cipher online-vs-offline functions and the hash convenience functions
A little bit of nomenclature standardization
- BOTAN_BINDINGS_FOO_VAL vs LowFooVal vs (high-level) val

But aside from those things, botan-low has arrived near some critical level of stability, much like botan-bindings. Now that the lower libraries aren’t going to be changing as much, and the roles of the various library levels are more clear, I’m effectively playing ‘fill in the blanks’, and the majority of my focus can be on getting the high-level bindings in botan to that sort of state too, now.

With all of the preparation we’ve made, I hope that it should go fairly quickly.

All-in-all, I’m really relieved to be getting all of this merged back into main. There’s a lot of modules to juggle, so it’s nice when it all aligns nicely and everything builds as it should - botan-bindings, botan-low, botan, and all of the botan-low-*-tests.

These changes have been merged to main and pushed to the github repo.

That’s it for now, though you should check back soon for the upcoming monthly status report!

ApothecaLabs · January 11, 2024, 11:34pm

Update: Work on `botan` recommences!

Now that the lower-level libraries are more or less stable, I’ve had the pleasure of being able to focus back on the high-level botan library, with its idiomatic and pure-as-possible interface. I have made several large strides in the background while writing the latest monthly update, since I am finally getting to work on a bunch of fun stuff that had to be put off until now.

First off, Botan.RNG has been strongly refined, and use of random generators has been ensmoothened with the introduction of the MonadRandomIO typeclass and the RandomT monad transformer (which is an instance of MonadRandomIO). Getting this right is essential: because so many functions rely on it, it affects the ergonomics of the rest of the library.

Now, there are two ways of accessing randomness:

Directly using an RNG context
Implicit access to an RNG context using MonadRandomIO

Direct usage is the old way, and it pretty much looks like this:

{-# LANGUAGE OverloadedStrings #-}
import Botan.RNG

main = do
    rng <- newRNG Autoseeded
    addEntropyRNG "Fee fi fo fum!" rng
    reseedRNG 32 rng
    x <- getRandomBytesRNG 12 rng
    print x

Implicit access to an RNG through MonadRandomIO is the new way. IO is itself a convenient instance of MonadRandomIO that uses the systemRNG:

{-# LANGUAGE OverloadedStrings #-}
import Botan.RNG

main = do
    addEntropy "Fee fi fo fum!"
    x <- getRandomBytes 12
    print x

It is also possible to use the RandomT transformer or RandomIO monad (currently type aliases for ReaderT RNG and ReaderT RNG IO respectively):

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (liftIO)
import Botan.RNG

main = do
    rng <- newRNG Autoseeded
    flip runRandomIO rng $ do
        addEntropy "Fee fi fo fum!"
        x <- getRandomBytes 12
        liftIO $ print x

I’m not exactly sure how MonadRandomIO will change under the hood (RandomT is probably going to become a newtype, at least), but basically, any function that needs random values or takes an RNG as an argument can now use a MonadRandomIO constraint instead - this should make it really easy to generate keys and whatnot, and reduce the number of arguments in general.

For example, with MonadRandomIO, bcryptGenerate now only takes 2 parameters!

{-# LANGUAGE OverloadedStrings #-}
import Botan.Bcrypt

main = do
    dg <- bcryptGenerate "Fee fi fo fum!" Fast
    print dg
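The post expects RandomT to become a newtype rather than a type alias. Here is a minimal sketch of that direction, under loud assumptions: `StubRNG`, `newStubRNG`, `stubRandomBytes`, `getRNG`, and `runRandomT` are hypothetical names, and the stub RNG simply counts upwards so the example runs without botan (it is not random at all). The `IO` instance backed by the system RNG is left out.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad (replicateM)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT (..), ask)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for a botan RNG context. It only counts upwards so the
-- sketch runs without botan: it is NOT a source of randomness.
newtype StubRNG = StubRNG (IORef Word8)

newStubRNG :: IO StubRNG
newStubRNG = StubRNG <$> newIORef 0

-- Hypothetical: returns the next n counter values as bytes.
stubRandomBytes :: Int -> StubRNG -> IO ByteString
stubRandomBytes n (StubRNG ref) =
    BS.pack <$> replicateM n (atomicModifyIORef' ref (\w -> (w + 1, w)))

-- Any monad that can reach an RNG context and run IO.
class MonadIO m => MonadRandomIO m where
    getRNG :: m StubRNG

-- Callers ask for bytes without passing an RNG around by hand.
getRandomBytes :: MonadRandomIO m => Int -> m ByteString
getRandomBytes n = getRNG >>= liftIO . stubRandomBytes n

-- RandomT as a newtype over ReaderT, instead of a bare type alias.
newtype RandomT m a = RandomT { unRandomT :: ReaderT StubRNG m a }
    deriving (Functor, Applicative, Monad, MonadIO)

instance MonadIO m => MonadRandomIO (RandomT m) where
    getRNG = RandomT ask

runRandomT :: RandomT m a -> StubRNG -> m a
runRandomT = runReaderT . unRandomT
```

In the real library `getRNG` would hand back a botan RNG context and the stub would be replaced by the botan-low call; the design point is only that callers write `getRandomBytes 12` with no RNG argument, and the newtype keeps `ReaderT` out of the public types.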

Secondly, all of the algorithm name constants and functions have been completely vertically integrated, from botan-bindings to botan-low to botan. This makes using a given algorithm much easier and more consistent no matter what level of library you are using.

Thirdly, I’ve created a KeySpec class for a better representation of what key sizes are available for a given primitive. It’s similar to the KeySizeSpecifier from crypton. I’m probably going to rename it.
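As a rough illustration of what a key-size specification can capture, here is a sketch modelled on botan’s C++ `Key_Length_Specification` (a minimum, a maximum, and a required multiple). The `KeySpec` record, `fixedKeySpec`, `validKeySize`, and `keySizes` below are hypothetical and are not the type in `Botan.KeySpec`.

```haskell
-- Hypothetical KeySpec, loosely modelled on botan's C++ Key_Length_Specification:
-- a key is valid if its size is within [min, max] and a multiple of `multiple`.
data KeySpec = KeySpec
    { keySpecMin      :: Int  -- smallest valid key size, in bytes
    , keySpecMax      :: Int  -- largest valid key size, in bytes
    , keySpecMultiple :: Int  -- valid sizes must be a multiple of this (> 0)
    }
    deriving (Eq, Show)

-- A primitive that accepts exactly one key size.
fixedKeySpec :: Int -> KeySpec
fixedKeySpec n = KeySpec n n 1

validKeySize :: KeySpec -> Int -> Bool
validKeySize (KeySpec lo hi m) n = lo <= n && n <= hi && n `mod` m == 0

-- Every valid key size, smallest first.
keySizes :: KeySpec -> [Int]
keySizes spec = filter (validKeySize spec) [keySpecMin spec .. keySpecMax spec]
```

For example, `keySizes (KeySpec 16 32 8)` lists the 16-, 24- and 32-byte sizes of the AES family, while `fixedKeySpec 32` describes a single 32-byte key.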

Fourthly, in the Botan C FFI, you have to initialize a cipher or hash or mac context in order to query its sizes, but this data is constant and initializing a context is not free, so I’ve been writing pure / static versions of key spec / block size / tag length / other size query functions so that I can just say:

let bsz = hashBlockSize MD5

Instead of:

bsz <- do
    ctx <- Low.hashInit (hashName MD5)
    Low.hashBlockSize ctx

This is just overall a much better experience, as we can get algorithm parameters by referencing the algorithm itself instead of referencing an initialized context. It is really helpful for generating keys even if you aren’t looking to use them immediately.
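A minimal sketch of what the pure size queries boil down to: constant tables indexed by the algorithm, with no context allocation. `HashAlg` and its constructors are hypothetical stand-ins for botan’s algorithm types and only four hashes are covered; the numbers are the standard block and digest sizes in bytes.

```haskell
-- Hypothetical algorithm enumeration; botan's real algorithm types differ.
data HashAlg = MD5 | SHA1 | SHA256 | SHA512
    deriving (Eq, Show, Enum, Bounded)

-- Internal block size in bytes: a constant of the algorithm, so no context needed.
hashBlockSize :: HashAlg -> Int
hashBlockSize MD5    = 64
hashBlockSize SHA1   = 64
hashBlockSize SHA256 = 64
hashBlockSize SHA512 = 128

-- Output (digest) size in bytes.
hashDigestSize :: HashAlg -> Int
hashDigestSize MD5    = 16
hashDigestSize SHA1   = 20
hashDigestSize SHA256 = 32
hashDigestSize SHA512 = 64
```

Because these are plain tables, a test suite could cross-check them against the context-based queries in botan-low, such as `Low.hashBlockSize`, so the pure and FFI answers cannot silently drift apart.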

Lastly, CI is working - at least for MacOS. I need to test manual Botan C++ installation for Linux, as I’ve determined that botan 3.x packages haven’t been published yet, but that seems to be the only issue at the moment!

As always, these changes have been pushed to the repo.

tsuraan · January 12, 2024, 9:56pm

Lastly, CI is working - at least for MacOS. I need to test manual Botan C++ installation for Linux, as I’ve determined that botan 3.x packages haven’t been published yet, but that seems to be the only issue at the moment!

Which Linux distro were you looking at? It looks like Gentoo’s shipping botan-3.2.0 right now, at least for amd64 and ppc/ppc64

ApothecaLabs · January 21, 2024, 9:14pm

@tsuraan

I was using the latest Ubuntu (23.10), and it turns out that that’s the culprit - Botan 3.x is too recent to be in the 22.04 LTS and apparently hasn’t made it into 23.10 yet. So Ubuntu needs to follow the manual installation for now; in other words, the CI can be fixed rather simply, and it’s on my todo list!

Update: `botan` lurches to life!

Hello everyone, I’m sorry that I’m a bit late. I’ve been doing a lot of thinking, and it can be a bit difficult to post while that is still underway - I am the epitome of ‘think before you speak’, taken to pathological extremes.

The botan-low interface works, so if you don’t mind putting up with its clunky low-level interface, you can have at it already to get stuff working - but the whole point of the higher-level libraries is a better interface, so it’s worth thinking heavily about.

While stewing about, I’ve been able to do a lot of the rather mechanical but still useful work in getting the high-level botan library up and running - regardless of what interface I eventually choose to express, there are certain things to be done along the way, so I might as well get them out of the way.

As a result, if you’ve gotten used to anything in the highest-level botan library, I’m afraid it’s changed heavily. Some of the modules that existed prior have been reworked for consistency, and the others will be soon enough.

Most of the modules are nominally completed:

Botan.RNG
Botan.Bcrypt
Botan.BlockCipher
Botan.Cipher minus lazy / online processing
Botan.Hash
Botan.KeySpec
Botan.KDF
Botan.MAC
Botan.PubKey
Botan.PubKey.Load
Botan.PubKey.Encrypt
Botan.PubKey.Decrypt
Botan.PubKey.Sign
Botan.PubKey.Verify
Botan.SRP6

Other modules are underway as I investigate the various cruft and minutiae that have piled up. I’ve found some gnarly issues with PubKey.Sign since not every pubkey algo works with every signing algo, and I’m seeing some odd things like PEM signatures only working in unit tests and DER signatures working only in GHCi. I’ll slog through it eventually, but for now, try not to stray from the established path, for here be dragons!

This represents our first stab at an idiomatic interface, so it isn’t great, but it isn’t terrible. I’ve mostly made things pure where I could, MonadIO or MonadRandomIO where I couldn’t. I’ve tried to give everything proper types as it helps me smooth the course and see what idioms need to be applied in a given case, but now that it’s here, I know that I can do better. There are still a lot of other things to get through, but I’ve got a whole checklist and templates to speed all of it up.

If you want to take a look at the sort of increased ergonomics I have in mind, check out the source code to Botan.SRP6 - the exposed interface is mostly a straightforward translation of the botan-low interface, but if you scroll down further, you can see I’ve been building server and client session logic to abstract away a lot of the fiddly bits - I want to do that for all modules, as appropriate!

This of course leaves us with a lot of decisions to make - those pesky things that I’ve been thinking about - so by no means is anything in botan considered stable yet. We have a stable low-level library which works functionally, and now we are trying to get the ergonomics right:

Decision: FooType and Foo vs Foo and MutableFoo
- Foo type and MutableFoo context, vs FooType type and Foo context
- RNG is the exception? Why? (Because the context is not mutating, it is non-deterministic)
- Current decision: Foo type and MutableFoo context; reasoning: the highest-level interface gets the simplest names
Decision:
- Flatten SHA / etc variants into larger types?
- Or define function aliases? Eg sha3 = Cryptohash $ SHA_3 SHA3_512
- Current decision: Undecided
- The more I have to use it in its current state, the more it annoys me…
Decision:
- Terminology: algorithms with default
- Low uses foo for default / argless and foo' with an apostrophe for the function with args
- Might switch to fooDefault for the default, and foo for the function with args
- Current decision: Undecided
Decision:
- Classes for things like HasKeySpec / KeySettable, HasBlockSize, etc
- Current decision: No. It is probably best left for a higher-level classy library, can make botan instances.
Decision:
- Use Enum + Bounded for algos?
- Initial solution: ADT trees + functions (manual solution eugh)
- Current decision: None, defaulting to all of the fooName functions…
Decision:
- Designed for qualified or unqualified import?
- Probably want unqualified import for high level, but what about Mutable?
Decision:
- How to treat nonced MACs (GMAC and Poly1305) vs non-nonced MACs (deterministic to the key+text)?
- We can apply a MonadRandomIO constraint and ~~ignore the difference~~ return the nonce too, but it is unnecessary for most MACs
- Initial solution: data MAC = DeterministicMAC DeterministicMAC | NonceMAC NonceMAC
- Afterthought: setMACNonceIfNeeded :: (MonadRandomIO m) => MutableMAC -> m ()?
- Current solution: Only GMAC is actually nonced (Poly1305 folds it into the key),
  so GMAC gets gmac-specific functions
- Or we could do what nacl / saltine do and make Poly1305 a distinct OneTimeAuth / MAC
- We could then pull out GMAC as non-deterministic MAC similarly
Decision:
- FooSize vs FooLength
- Eg, DigestSize vs DigestLength, BlockSize vs BlockLength
- Initial solution: Undecided, but considering standardizing on ‘Size’
- Currently standardizing on: Size for algorithm components, Length for (plain- and crypt-) texts
Decision:
- Split MutableCipher into MutableEncipher and MutableDecipher?
- Would be consistent with PK encrypt, sign
- Would obviate CipherDirection type
- Current decision: Keeping track of encrypt vs decrypt in the mutable context
Decision:
- Create a NonceSpec data type? Or generalize to SizeSpec / SizeSpecifier
- Current decision: Unconsidered, still using validNonceSize and defaultNonceSize functions
Decision:
- Terminology: validKeySize vs defaultKeySize
- Also for nonces
- Also size vs length (size for elements, length for message / ciphertext?)
  - Size slightly implies (a somewhat) fixed value, whereas length is more instanced
- Leaning towards defaultFooSize :: alg -> Int and validFooSize :: alg -> Int -> Bool
Decision:
- Error handling
  - Some (mutable) functions are failable if not used in the proper order
  - Some (mutable) functions are failable because the specific algorithm lacks support
  - Some (user) errors are not fatal (eg, setFooKey with incorrect key length)
    - Should we catch the error and return a bool?
- How do we express this?
- Current status: Allowing exceptions to be thrown - are exceptions satisfactory here?
Decision:
- Cipher (and other processing algorithms) need to conform to a consistent interface
  but the APIs have differences
- Example: Nonces in ciphers as an argument to cipherStart vs (G)MAC set(G)MACNonce
  - Mostly affects only the mutable interface, in that they may have a required order
    or some other peculiarity that is only visible to the internals
  - The use of a nonce is associated with that specific instance of processing,
    and setNonce is freer than start(WithNonce), which has a specific order of use.
  - We could imagine the other cryptoprocesses as having a no-op start function
- Arguably, the highest-level API should take all of the arguments, such that the implementation’s
  order of application doesn’t matter.
Decision:
- Clarify ‘clear’ vs ‘reset’ consistently
- Some algorithms only have ‘clear’, but others have a more limited ‘reset’ that preserves keys
Decision:
- Push the higher MutableFoo terminology down to botan-low (eg, setFooBar instead of fooSetBar)
- Current status: Pondering, no need for the churn at the moment
Decision:
- Collapse modules together?
  - HOTP + TOTP = OTP
  - No PubKey submodules?
- Specificity vs ease of use
- High-level libraries focus on ease of use, the low-level libraries are quite specific;
  there is benefit to collapsing these modules in botan
- Addendum: Collapsed all the pubkey algorithm-specific modules down to Botan.PubKey.Load
Decision:
- What to call PubKey?
- The module name references Public Key Cryptosystems, of which public keys are a component.
- Possibly rename it PKC or CryptoSystem (or should CryptoSystem be the generalized concept, not just public keys?)
- With the PKC namespace, it may make a bit more sense to move PrivKey and PubKey under it a la:
  - Botan.PKC.PubKey
  - Botan.PKC.PrivKey
  - Botan.PKC.Encrypt
  - Botan.PKC...
Decision:
- Use of MP in APIs?
- Can take Integer instead, eschew Botan.MPI entirely
- Result: Yeah, definitely elide MP entirely in favor of Integer
Decision:
- Elide pointless accessors (such as length queries that have already been used in Botan.Low)
Decision:
- How to represent PubKey types that are only used for specific operations
- That PubKeys require an algo and params causes issues with the current setup for pk operations
- Example: What signing algos are usable is dependent on what pubkey is used
Decision:
- How to deal with the gnarly algorithm hierarchies?
- Break algorithms up into individual data types, and use classes?
  - Currently we end up with stuff like: pkSign rsa (EMSA $ EMSA4 (Cryptohash $ SHA2 SHA512) Nothing) ...
    as opposed to something like pkSign rsa (EMSA4 SHA512)
- Would require cryptographic classes (crypto-schemes) and botan instances
- Decision: not yet
- Much longer scope
Decision:
- Botan’s unified data types are a mixed bag - convenience, but problems too
- functions with keys can fail if an incorrect key is used, eg:
  - mac :: MAC -> MACKey -> ByteString -> Maybe MACDigest
- algorithm-specific keys can be assumed valid and thus we can get rid of the Maybe:
  - sha512hmac :: SHA512MACKey -> ByteString -> MACDigest
- Right now we use exceptions if the key is incorrectly sized, should we keep doing that?
- Or should we convert all of these exceptions to Maybe?
- Or should we expose algorithm-specific functions?
- Current result: Mostly still throwing exceptions
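The trade-off in this last decision can be sketched with an algorithm-specific key whose size is validated once, at construction, and reported through Either rather than an exception. `KeyError`, `Poly1305Key`, `mkPoly1305Key`, and `poly1305KeyBytes` are hypothetical names; the only fact relied on is that Poly1305 keys are 32 bytes.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Hypothetical error type: says what went wrong instead of throwing.
data KeyError = WrongKeyLength { expectedLength :: Int, actualLength :: Int }
    deriving (Eq, Show)

-- Hypothetical algorithm-specific key. Poly1305 keys are 32 bytes, so the size
-- is checked once here; in a real module the constructor would stay unexported.
newtype Poly1305Key = Poly1305Key ByteString
    deriving (Eq, Show)

mkPoly1305Key :: ByteString -> Either KeyError Poly1305Key
mkPoly1305Key bs
    | len == 32 = Right (Poly1305Key bs)
    | otherwise = Left (WrongKeyLength 32 len)
  where
    len = BS.length bs

-- Anything downstream can take a Poly1305Key and assume it is valid,
-- so it needs neither Maybe nor an exception for the key-size case.
poly1305KeyBytes :: Poly1305Key -> ByteString
poly1305KeyBytes (Poly1305Key bs) = bs
```

The failure moves from every call site to the single place where bytes become a key, which is roughly the `mac ... -> Maybe MACDigest` versus `sha512hmac` split described above.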

So yeah, a lot. It is taking shape though, and just re-reading the list of decisions-to-be-made as a whole helps give me a direction. Of course, you can provide any feedback you might have.

As always, the repo has been updated.

Vlix · January 21, 2024, 11:41pm

Wow, there’s lots to still be done it seems

A few things that pop into my head after reading this and looking through the SRP6 and other modules:

Please use newtype for any types that will be function arguments. The worst thing to happen is switching up ByteString arguments and not knowing until runtime that you’re using the password as the salt, or some other mixup.
Exporting SRP6Salt(..) does nothing if it’s a type synonym, right? Might even produce some warnings? It will also export any constructors if you change it to a newtype, which might not be desirable.
As a general rule, I’d probably not export any constructors of types that shouldn’t be directly fiddled with, as I expect that will be the case in a lot of modules for crypto functionality.
I’m not very knowledgeable on all the different permutations/combinations of hashes and algorithms, but having separate functions for separate combinations would make for a more pleasant API, IMHO.
i.e. make a sha512hmac if that provides an API where it is guaranteed to work. This way, you’d also have sections in the documentation for every hash/algorithm, so you can more accurately provide the caveats and other side-notes that come with each hash/algorithm combination.
I see here that the HMAC constructor takes a Hash, but that the comment says it should never be a Checksum, so should it just take a CryptoHash?
Is SipHash 2 4 the only valid SipHash? Might make more sense to just name the constructor SipHash24 then?
I would personally prefer that these functions never throw exceptions; please return Maybe or Either if I have to be ready for them to fail. That said, setting up the API so that non-failing combinations are easy and type-safe should be a priority, I feel.

I have lots of opinions on making good developer experiences with nice APIs, but I just don’t know what all the options are
I’d be happy to go through the botan library with you once it’s more “done” to help with nailing down the API and/or to do some brainstorming in general.

ApothecaLabs · January 23, 2024, 1:47am

Oh there’s always more work to be done, especially now that we have some choice regarding how we should implement things. I’m just glad to have someone else following along as a sanity check, and I think I agree with most if not all of your points - you’ve highlighted a few items that are burning a hole in my todo list.

Decide a few of the right things and it all falls into place soon enough, but it is easier to decide when others share their opinions. Thanks!

ApothecaLabs · January 28, 2024, 9:49pm

A Classy Update

Decisions are like dominos, knock a few over and the rest come tumbling down.

After some feedback, and a lot of playtesting, it has become clear that the algorithm ADT trees are terribly unwieldy, and not at all the sort of interface that I’d envisioned when setting out on this project. In response, I’ve come to a decision:

ADTs were better than raw strings or constant patterns, but now they are getting in the way - expressions like AEAD $ GCM (BlockCipher128 AES_256) 16 and Cryptohash $ SHA3 $ SHA3_512 are awfully frustrating to read and use. I’m (eventually) axing the algorithm ADTs, in favor of a better approach.

I was initially following z-botan's lead which was helpful at first - however, we are not beholden to that format. Additionally, with the need to add support for BOTAN_HAS_ conditional defines for individual algorithms, the ADT approach makes less and less sense.

Instead, I am proposing a classier interface that uses data families to ensure type isolation and inference. Originally, I was planning on working on this interface as a separate cryptography library (originally called crypto-schemes but that sounds too nefarious), and then making botan conform to it in a separate cryptography-botan library. However, at this point it seems more sensible to skip the extra step of a separate library and just implement the conformances in botan itself, while developing cryptography inside of botan to be extracted as a separate library later.

As a result, this update is focused heavily on these new typeclasses:

Botan.BlockCipher.Class
Botan.Cipher.Class
Botan.Hash.Class
Botan.MAC.Class
Botan.OneTimeAuth.Class

The new classes are something like:


{-# LANGUAGE TypeFamilies #-}
import Data.ByteString (ByteString)

data family SecretKey alg
data family Ciphertext alg

class BlockCipher bc where
    blockCipherEncrypt :: SecretKey bc -> ByteString -> Maybe (Ciphertext bc)
    blockCipherDecrypt :: SecretKey bc -> Ciphertext bc -> Maybe ByteString

data family Nonce alg

class Cipher c where
    cipherEncrypt :: SecretKey c -> Nonce c -> ByteString -> Ciphertext c
    cipherDecrypt :: SecretKey c -> Nonce c -> Ciphertext c -> Maybe ByteString

data family Digest alg

class Hash h where
    hash :: ByteString -> Digest h

data family Auth alg

class MAC m where
    auth :: SecretKey m -> ByteString -> Auth m

data family OneTimeAuth alg

class OTA ota where
    oneTimeAuth :: SecretKey ota -> Nonce ota -> ByteString -> OneTimeAuth ota

This isn’t exactly how they are (still being) implemented, but it’s an accurate enough representation. Other algorithms and modules having multiple data families are slightly more complicated to write, but are coming soon, pending some more data family work. I have tried to create proof-of-class implementations of at least one algorithm per class type, to show that it functions as intended:

Botan.BlockCipher.AES
Botan.Cipher.ChaCha20Poly1305
Botan.Hash.SHA3
Botan.MAC.CMAC
Botan.OneTimeAuth.Poly1305

A gold-star example of a relatively finished algorithm module (and the effectiveness of the approach) would be Botan.Hash.SHA3, which we can explore:

:set -XOverloadedStrings -XTypeApplications -XDataKinds
import Botan.Hash.SHA3

It has per-algorithm -level functions:

sha3_512 "Fee fi fo fum!"
-- 03a240a2...

It also has algorithm-family -level functions that can use TypeApplications to select specific variants:

sha3 @512 "Fee fi fo fum!"
-- This produces the same digest as before

Explicit typing also works:

sha3 "Fee fi fo fum!" :: SHA3Digest 512 -- Or SHA3_512Digest

These functions are implemented via a more generic, classy Hash interface which uses the Digest data family to ensure that different algorithms and variants have different types while still being inferred properly.

import Botan.Hash.Class
:i Hash
-- class Hash h where
--     hash :: ByteString -> Digest h
:i Digest
-- data family Digest h

We can allow our hash algorithm to be parametric using hash, while still using type applications or inference to select our specific algorithm:

-- Once more at the class-level
hash @(SHA3 512) "Fee fi fo fum!"
-- Once more with explicit typing
hash "Fee fi fo fum!" :: Digest (SHA3 512)
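To see the data-family mechanism in isolation, here is a self-contained sketch that reuses the `Digest` family and `Hash` class shape from the post, with two toy, non-cryptographic checksums (FNV-1a 32-bit and Adler-32) standing in for SHA-3, which would need botan. `FNV1a32`, `Adler32`, and their digest constructors are hypothetical names; only the class shape comes from the post.

```haskell
{-# LANGUAGE TypeFamilies, TypeApplications, OverloadedStrings #-}

import Data.Bits (shiftL, xor, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- The class shape from the post: the digest type is a data family,
-- so the algorithm is determined by the result type alone.
data family Digest alg

class Hash h where
    hash :: ByteString -> Digest h

-- Two toy algorithm tags (NOT cryptographic), used only to show type isolation.
data FNV1a32
data Adler32

newtype instance Digest FNV1a32 = FNV1a32Digest Word32 deriving (Eq, Show)
newtype instance Digest Adler32 = Adler32Digest Word32 deriving (Eq, Show)

-- FNV-1a, 32-bit: xor in each byte, then multiply by the FNV prime (wraps mod 2^32).
instance Hash FNV1a32 where
    hash = FNV1a32Digest . BS.foldl' step 2166136261
      where
        step h w = (h `xor` fromIntegral w) * 16777619

-- Adler-32: two running sums modulo 65521, packed as (b << 16) | a.
instance Hash Adler32 where
    hash bs = Adler32Digest ((b `shiftL` 16) .|. a)
      where
        (a, b) = BS.foldl' step (1, 0) bs
        step (x, y) w =
            let x' = (x + fromIntegral w) `mod` 65521
                y' = (y + x') `mod` 65521
            in (x', y')
```

`hash @FNV1a32 "a"` and `hash "a" :: Digest Adler32` pick different instances from the same method, and comparing one digest against the other is a type error because each `newtype instance` is a distinct type.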

The other classes work for at least one algorithm, but at the moment it might require a bit of unsafeCoerce to turn bytestrings into keys, while I get better support for that sort of thing underway.

Here’s CMAC AES128:

:set -XOverloadedStrings -XTypeApplications
import Botan.MAC.Class
import Botan.MAC.CMAC
import Botan.BlockCipher.AES
import Botan.RNG
import Unsafe.Coerce
k <- getRandomBytes 16
mac @(CMAC AES128) (unsafeCoerce k) "Fee fi fo fum!"
-- 7989fb40105646e975311785efae3048

And here’s the ChaCha20Poly1305 cipher:

:set -XOverloadedStrings -XTypeApplications
import Botan.RNG
import Botan.Cipher.Class
import Botan.Cipher.ChaCha20Poly1305
import Unsafe.Coerce
k <- getRandomBytes 32
n <- getRandomBytes 12
ct = cipherEncrypt @ChaCha20Poly1305 (unsafeCoerce k) (unsafeCoerce n) "Fee fi fo fum!"
-- 2b0c0e4e332b4214d3c939b0d1af90a89167d914df538f6cdc364371dd8d
pt = cipherDecrypt @ChaCha20Poly1305 (unsafeCoerce k) (unsafeCoerce n) ct
-- Just "Fee fi fo fum!"

Other classes and data families will be quite similar. Notably, we avoid passing around an explicit algorithm witness / proxy, but remain type-injective due to the data families, and only one call site is required for inference to work. It is also clear that this approach will be very amenable to TemplateHaskell in the future. And don’t forget, eventually, these classes will be pulled out into a backend-agnostic cryptography library.

I’m still currently working on some support classes for data families in Botan.Types.Class, such as Encodable and SecretKeyGen and NonceGen which have not yet been applied to the aforementioned cryptography classes but will provide the necessary support to make writing data family instances much easier. If you’ve used saltine or cryptonite, you’ll recognize their influence.
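The post names an Encodable class in Botan.Types.Class but not its methods, so the method names here (`encode`, `decode`) are assumptions. This sketch shows how a length-checked `decode` on a `SecretKey` data-family instance could take the place of `unsafeCoerce`; `ChaCha20Poly1305Key` is a hypothetical constructor, and the 32-byte length matches the key size used in the ChaCha20Poly1305 example.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

data family SecretKey alg

-- Hypothetical shape for an Encodable class; the method names are assumptions.
class Encodable a where
    encode :: a -> ByteString
    decode :: ByteString -> Maybe a

-- Hypothetical algorithm tag and key representation.
data ChaCha20Poly1305

newtype instance SecretKey ChaCha20Poly1305 = ChaCha20Poly1305Key ByteString

-- Decoding checks the length instead of trusting the bytes blindly.
instance Encodable (SecretKey ChaCha20Poly1305) where
    encode (ChaCha20Poly1305Key bs) = bs
    decode bs
        | BS.length bs == 32 = Just (ChaCha20Poly1305Key bs)
        | otherwise          = Nothing
```

With something along these lines, `unsafeCoerce k` becomes a `decode k` that fails visibly on a wrong-length key, and the data family keeps a ChaCha20Poly1305 key from being passed where another algorithm’s key is expected.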

I would like some feedback from the community on this - it does delay publishing to hackage as well as writing tutorials, as things still shift around a bit.

As always, this has been pushed to the repo.

Vlix · January 29, 2024, 11:13pm

I’m wondering if it’d be an OK API if you kept the Enums in their horribly wordy state, but then:

make newtypes for every section that only accepts parts of certain algos/etc
don’t export any way to create that newtype
EXCEPT for a group of pattern synonyms that contain all the valid ones; and
have a {-# COMPLETE #-} pragma to tell GHC the provided patterns are all that are valid
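That suggestion can be sketched with `PatternSynonyms` and a `COMPLETE` pragma. `SHA3Variant`, its patterns, `sha3DigestSize`, and `sha3Name` are hypothetical names; the digest sizes are the standard SHA-3 output sizes, and `sha3Name` assumes botan’s `SHA-3(n)` naming.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- In a real module you would export SHA3Variant abstractly plus the patterns,
-- but NOT the SHA3Variant constructor, so the patterns are the only way in.
newtype SHA3Variant = SHA3Variant Int
    deriving (Eq, Show)

pattern SHA3_224 :: SHA3Variant
pattern SHA3_224 = SHA3Variant 224

pattern SHA3_256 :: SHA3Variant
pattern SHA3_256 = SHA3Variant 256

pattern SHA3_384 :: SHA3Variant
pattern SHA3_384 = SHA3Variant 384

pattern SHA3_512 :: SHA3Variant
pattern SHA3_512 = SHA3Variant 512

-- Tells GHC these four patterns cover every value, so matches on them are exhaustive.
{-# COMPLETE SHA3_224, SHA3_256, SHA3_384, SHA3_512 #-}

-- Digest size in bytes; no incomplete-pattern warning thanks to COMPLETE.
sha3DigestSize :: SHA3Variant -> Int
sha3DigestSize SHA3_224 = 28
sha3DigestSize SHA3_256 = 32
sha3DigestSize SHA3_384 = 48
sha3DigestSize SHA3_512 = 64

-- Internally the wrapped Int is still available for building algorithm names.
sha3Name :: SHA3Variant -> String
sha3Name (SHA3Variant n) = "SHA-3(" ++ show n ++ ")"
```

Users see short names like `SHA3_512` instead of a nested ADT, cannot build an invalid variant such as `SHA3Variant 100` without the hidden constructor, and still get exhaustiveness checking.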