Botan bindings devlog

ApothecaLabs · February 18, 2024, 6:20pm

Update: botan 0.0.1 package candidate imminent

Whew! It’s been an intense couple of days, now that botan-bindings and botan-low are published to Hackage. Additionally, the round 2 funding proposal has been updated to provide more details on the intended trajectory for the next several months.

Of course, now that the excitement has settled down, we are back to focusing on our next goal: getting botan to package candidate status, and soon after to release. Here lies the issue that I’ve been gnawing on for the last few days:

Our initial ADT-based approach to managing algorithms has turned out to be insufficient. Cryptographic typeclasses were something that was in the original proposal, but they were removed to reduce scope; however, they’ve turned out to be necessary in the long term.

It is going to take time to develop the proper solution of typeclasses and data families in order to build per-algorithm gold-standard modules, and I’d like to get botan live before then. Now, there’s still:

1. the problem of the insufficiency of the ADT interface, which must be dealt with before we publish a 0.0.1, and
2. the question of whether to prune the typeclasses / data families / per-algorithm gold-standard modules from said initial release, because they are still being developed and add quite a bit of inertia. I’d rather present something comparable to z-botan first before* going after higher-level targets.

* I’m trying to keep my prioritization straight

I do have a solution to #1, which I feel may suffice. Our issue is thus:

-- We have some set of algorithms for a common operation
data Hash
    = CryptoHash CryptoHash
    | Checksum Checksum

-- We have nested ADTs because some functions require a specific subset of algorithms
data CryptoHash
    = SHA3 SHA3
    | ...

data Checksum
    = Adler32
    | ...

-- Algorithms may themselves have variants, and so the nesting gets deeper
data SHA3
    = SHA3_512
    | ...

-- An algorithm for one operation may be a component in another operation, but this is dangerous:
data MAC
    = HMAC Hash -- Wrong!

-- The component algorithm may require a specific subset:
data MAC
    = HMAC CryptoHash  -- Right!

-- This all turns out to be unwieldy in practice, and leads to ridiculous stacks of wrappers:
hmac_sha3_512 = HMAC $ CryptoHash $ SHA3 $ SHA3_512

Typeclasses are the obvious long-term solution, but will take time. What’s the interim solution? Let’s ask ourselves, “What did z-botan do?”

Well, they flattened the ADT to an enumeration, and threw an exception if an incorrect algorithm was used. I think we can do slightly better…

-- Flatten the ADTs, have newtype wrappers for specific subsets!
data Hash
    = SHA3_512
    | ...
    | Adler32

-- Then, it's up to each subset to define their wrapper and smart constructors
newtype CryptoHash = MkCryptoHash { unCryptoHash :: Hash }

-- We'll use smart constructors instead of just throwing exceptions like z-botan
cryptoHash :: Hash -> Maybe CryptoHash
cryptoHash SHA3_512 = Just $ MkCryptoHash SHA3_512
...
cryptoHash _ = Nothing

unsafeCryptoHash :: Hash -> CryptoHash
unsafeCryptoHash h = fromJust $ cryptoHash h -- fromJust (Data.Maybe) errors on Nothing

newtype Checksum = MkChecksum { unChecksum :: Hash }

checksum :: Hash -> Maybe Checksum
checksum Adler32 = Just $ MkChecksum Adler32
...
checksum _ = Nothing

unsafeChecksum :: Hash -> Checksum
unsafeChecksum h = fromJust $ checksum h -- fromJust (Data.Maybe) errors on Nothing

-- Algorithm families can get the same treatment if necessary
newtype SHA3 = MkSHA3 { unSHA3 :: Hash }

-- This means that we have at most 1 wrapper, from enumeration to specific subset
hmac_sha3_512 = HMAC (unsafeCryptoHash SHA3_512)

It’s not perfect, but it gives us the type safety we need, without impairing the ergonomics too much while we work on the proper typeclasses. A happy medium!
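To make the trade-off concrete, here is a minimal, self-contained sketch of the flattened approach that loads in GHCi. It is an illustration under assumptions, not the actual botan code: the Hash enumeration is cut down to the two algorithms named above, the derived instances are added for demonstration, and the hmac helper is hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- Flat enumeration, abbreviated to the two algorithms mentioned in this post.
data Hash
    = SHA3_512
    | Adler32
    deriving (Eq, Show)

-- Wrapper marking the subset of Hash values that are cryptographic hashes.
newtype CryptoHash = MkCryptoHash { unCryptoHash :: Hash }
    deriving (Eq, Show)

-- Smart constructor: checked conversion that returns Nothing for non-members.
cryptoHash :: Hash -> Maybe CryptoHash
cryptoHash SHA3_512 = Just (MkCryptoHash SHA3_512)
cryptoHash _        = Nothing

-- Partial conversion for call sites where the algorithm is statically known.
-- Uses an explicit error message instead of fromJust, for easier debugging.
unsafeCryptoHash :: Hash -> CryptoHash
unsafeCryptoHash h =
    fromMaybe (error ("not a cryptographic hash: " ++ show h)) (cryptoHash h)

-- HMAC can only be built from a CryptoHash, never from an arbitrary Hash.
data MAC
    = HMAC CryptoHash
    deriving (Eq, Show)

-- Hypothetical helper (not part of botan): build an HMAC descriptor from a
-- plain Hash, failing with Nothing when the hash is not cryptographic.
hmac :: Hash -> Maybe MAC
hmac = fmap HMAC . cryptoHash
```

With these definitions, hmac Adler32 evaluates to Nothing, whereas a z-botan-style design would only reject the checksum at runtime with an exception. The unsafe variant keeps the one-liner ergonomics for cases where the algorithm is a literal.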

There’s no update to the repo quite yet as I’m still doing some cleanup while applying this, but once that is done, the botan 0.0.1.0 package candidate will be going up.

So then, this leaves the question of #2 - to prune or not to prune? Do we elide the in-progress abstractions to keep focus on the core modules?