Botan bindings devlog

For the sake of usefulness I think it’s nice to have hexadecimal encoding and decoding handled for you by the library, however :stuck_out_tongue:

You mean as an ease-of-use way of marshalling the bytestring? I can see how that would help with eas-of-use in non-specific situations.

I’d argue it’ll only help the people who want to use that de-/encoding, or people who don’t care about the encoding, because anyone else would still have to import Data.ByteString.BaseXX or something to get what they want.
Though I guess if the library already depends on bytestring-baseXX you can just use that to provide the marshalling for a cheap QoL addition :man_shrugging:
Because if you only depend on an extra library for the ease-of-use marshalling, maybe that’s an unnecessary dependency? Especially if someone won’t ever use it, since now their library/application (in)directly depends on both e.g. bytestring-baseXX and bytestring-baseYY, while only using one of them.

[ TL;DR: If the library already has byte string encoding functionality in its dependencies, go for it. ]

1 Like

Yes I meant like this: Sel.Hashing.Password

1 Like

Hmmm, yeah. I guess if you’re providing Show instances for hashes you’re going to have some kind encoding in your dependencies anyway. :thinking:

1 Like

Update time!

Today’s update is geared towards improving stability and memory safety with unit tests - and they are already paying off because they’ve already helped me find and fix a few bugs.

  • Removed CAST-256 from ciphers because it is no longer supported
  • Split off 128-bit ciphers to separate type for 128-bit-specific cipher / AEAD modes
  • Fixed name of ceilMod
  • Various notes about crypto object requirements
  • Various improvements to bytestring allocation
  • Fixed cipherCtxValidNonceLengthIO
  • Removed automatic padding from block cipher encrypt / decrypt (was temporary anyway)
  • Attempts to fix Botan.Low.Bcrypt
    • An InsufficientBufferSpaceException occurs during tests, but not with GHCi
  • Various unit tests for modules:
    • Botan.Low.Bcrypt
    • Botan.Low.BlockCipher
    • Botan.Low.Cipher (incomplete)
    • Botan.Low.FPE
    • Botan.Low.Hash
    • Botan.Low.ZFEC
  • Consolidated existing mkGetCString into mkGetBytes to reduce confusion
  • Replaced mkGetName with new mkGetCString for improved consistency

I’ll be working a bit more on the community proposal this evening, as I spent yesterday almost entirely on the unit tests.


Re: fancy ease-of-use marshalling / encoding:

It isn’t exactly cryptography itself, but it is a cryptography-related transformation, so I plan on defining / exposing classes for that sort of thing as its own ‘scheme’ in the interface-focused crypto-schemes to keep the interface unified, but leave enforcing their use up to implementation-focused libraries (such as I plan to in the high-level botanium, but not in botan).

There is a growing list of cryptography-adjacent stuff though that it may be worth splitting off some sort of crypto-utilities to handle them - despite the additional overhead, the library layers are helping shake things down to an appropriate level, as each’s responsibilities become more clear.

2 Likes

That’s… one small step for stability, one giant leap for stability-kind… Today’s update is minor, and consists mostly of three things:

  • Continued working on unit tests and bugfixes for Botan.Low.Cipher
  • Added unit tests and bug fixes for the Botan.Low.MPI BigInt interface
  • The community proposal has been updated with a considerable amount of notes, and the rough draft has been started.

I will be continuing to focus on botan-low unit tests and the community proposal for the remainder of this week.


Small note: Due to the sheer number of algorithm combinations needing testing (easily several thousand tests), the unit tests no longer run to completion and only report a single bulk PASS / FAIL. Neither the number of passing / failing tests nor details of the failures are reported, in the console or in the log. Has anybody else experienced this using hspec over a significant quantity of tests?

3 Likes

Cryptography Community Project: Leg 1 Proposal

Today I am happy to announce that the first draft of the Community Project Proposal has been finished, and it explains in much greater detail what this effort is about. I have been working on it for the last week, and would be satisfied with it as-is, but before I submit it officially, I would like to ask the community for feedback first. Please, take some time to read it, and then give me any feedback, questions, or concerns that you might wish to provide.

No code changes this week since I have been working on the proposal, but now that is out of the way. :slight_smile:

4 Likes

Good proposal in general, though I didn’t see any mention of testing in regards to the botan*-libraries.

Might be a good idea to also include that in the goals of the first leg, just to be explicit?
(I hope any cryptography engineer would assume tests are important, but it’s always nice to have it written down in the “to-do” section, so to speak)

1 Like

I think the proposal is pretty good. Great work!

1 Like

There is mention, but it got condensed down to a single reference in the middle of a paragraph in Deliverables (along with a few other items).

I have adjusted the Deliverables section to increase the prominence of these items:

This paragraph:

Each library should be beta- or / production-ready, and submitted to hackage as a candidate or accepted package. Each library should have documentation, tutorials and examples, tracked issues, and unit tests. A project website will also be created.

Has been changed to:

Each library should be beta- or / production-ready, and submitted to hackage as a candidate or accepted package.

Each library should have:

  • Documentation
  • Tutorials and examples
  • Tracked issues
  • Unit tests

A project website will also be created.

Thank you for pointing this out! :grin:

1 Like

Found this and thought it might be useful to add to the proposal?

I’m also thinking about maybe making a test suite of all the crypto libraries in Haskell, and try to make property tests that check that random input and random salts give the exact same hash in all libraries :thinking: Just to check if there’s any inconsistencies.

Does anyone know of defined “unit tests” for crypto algorithms? I mean in the sense that the paper describing AES/PBKDF/MD5 would give a canonical example of “a correct implementation will result in the bytes Y using input X”?

1 Like

Nice find!

If you click through the tests to an individual algorithm / scheme (example: SHA3), you can find them under Test Vectors - basically, cryptography unit tests for NIST / FIPS -approved algorithms. They’re in a format that needs some parsing first, though.

One of the benefits of this proposal is that such a test suite could be build against crypto-schemes and any backend could then take advantage of it automatically simply by having implemented the crypto-schemes typeclasses. I think that this is an appropriate “layer” for such things, as lower layers’ testing will be more focused on whether bindings and functions are implemented properly, leaving higher layers’ testing to ensure algorithm correctness.

This level of extensive testing is a bit out of scope of this first leg proposal, but is exactly the sort of long-term goals I want to enable.


Even the testing I’m doing right now isn’t quite to that level, though. Even so, I am still having to test against every algorithm because each has their peculiar requirements - in some cases it is a challenge to even write a test that all algorithms pass, owing to constraints on things like input length.

Test-driven development is really helping straighten it all out though - for example I am trying to find the ‘safe path’ for writing higher-level / incremental cipher functions that work for every algorithm, and even though so far each implementation has failed for some specific algorithms, I am steadily making headway because I want to support lazy / incremental cipher processing. Even Z-Botan did not take advantage of incremental cipher processing because of this, instead using one-shot cipher encryption / decryption with no chunking.

2 Likes

I’ve got a good update for everyone today! :grin:

Good progress has been made on Botan.Low.Cipher, which is now almost complete:

  • Implemented offline / one-shot and online / incremental encryption
    • Online encryption should be much more performant since you can stream the data instead of having to load it all at once
    • Used in botan-low unit tests, may be used in or moved up to botan for higher-level functions
    • Still need to implement online / incremental decryption but that will be easy now

This is paired with progress on unit tests to match:

  • Unit tests for Botan.Low.Cipher are almost complete, with tests passing for all algorithms
    • Added unit tests for online and offline encryption
    • Still need to unit test online / incremental decryption
  • Unit tests revealed that botan_cipher_output_length is inaccurate / flawed.
    • It does not properly calculate padding for cbc mode when block-aligned
      • Most padding methods always add at least 1 byte, even if block-aligned
      • This means that block-aligned inputs may require a full block of padding
      • The return value does not include this extra full block when block-aligned
      • The return value does not include tag length
      • Botan source notes that it is supposed to be an upper bound, but it isn’t because of this flaw
    • This flaw can be corrected for with a small overestimate by adding 1 block length + tag length to output length to get a valid buffer size for encryption / decryption.
    • The flawed function may be hidden in favor of exporting the corrected estimate function
    • It may be helpful to differentiate cipher modes with block-aligned input from those modes with variable length input.

Furthermore, the community proposal has been updated based on feedback:

  • Increased prominence of deliverable items (esp unit tests) in proposal
  • Performance benefit section has been added to Problem Statement
  • Test vectors for algorithms (esp CAVP / FIPS / NIST) added to Optional Deliverables

In addition to all of this, I am looking at a pull request from a contributor that refactors much of botan-bindings - it covers a lot of the things that I’ve mentioned in the todo list in the README, which is really quite nice of them.


Community proposal feedback

It’s certainly been a busy week, with some good feedback regarding the community proposal! The items everyone has mentioned have been quite helpful, and I’ve tried to respond and update / improve the proposal in turn.

I’m still looking for any feedback before officially submitting it, so if you haven’t already, please do give it a read, and bring up any thoughts, issues or concerns you might have here.

Thanks to everyone who has already responded here and on the reddit thread! :partying_face:

3 Likes

Still love following these updates! :heart:

I hope I’ll have time to do a read through of the code bases before they are finished, to maybe add a pair of fresh eyes focusing on usability and documentation.

1 Like

Its time for a minor update! This update is light on code, so we’ll go a little heavy on the details instead. :slight_smile:


In our last update, we were implementing online / incremental cipher encryption.

Nominally, everything seemed okay, and for the most part, it was / is, though I’m still working on online decryption in order to verify everything. Since I have already been dealing with a few issues with the Botan FFI (for example, the flaws in botan_cipher_output_length from the previous update), I like to be thorough, and because of that I noticed something odd, which has lead to a day or two of investigating. It turns out that output_length may not be the only ill-behaved botan_cipher_* FFI function.

botan_cipher_update is already somewhat awkward to use, in that the same function is used for encrypting and decrypting, as well as updating and finalizing. This is in part due to a conflation here of many structures in botan (block vs variable length ciphers, encryption vs authenticated encryption vs AEAD). Well, it turns out that it is a little more than just awkward - it may be incorrectly implemented for certain algorithms, in that online vs offline processing yields ciphertext of different lengths - but only for SIV and CCM.

I noticed this when I checked for parity between online and offline processing, and found that the online ciphertext was not equal, and was 7 bytes longer. I was alarmed for a while, as I was using SIV as my test case, and so I spent a bit wondering what I was doing wrong before realizing that it was algorithm-specific, and I just happened to have been using the affected algorithm. Naively, this implies that SIV / CCM are broken, but there may be some unknown characteristic of the algorithm that yields a different ciphertext if it is processed in chunks.

Further investigation has yielded that the extra ciphertext length is related to the number of processed non-final update chunks - I noticed that 7 regular updates + 1 final update yields 7 extra bytes, so I upped the test message length from 8 * g (ideal granularity) to 16 * g to test, and the result was that the online / incremental ciphertext now was 15 bytes longer than the offline. I then tested with 4 * g to confirm the formula of n - 1 extra bytes for a message of n * g where there are n updates of g-bytes-or-less. Since the length estimate does not include this extra length, for large enough messages the buffer overruns and throws a botan exception, but again only for online processing.

This issue is not a blocker, as it only affects SIV / CCM and the are cipher and AEAD modes are unaffected (and offline decryption works just fine!), but it warrants further investigation. I’m not exactly overjoyed to find that there are more flaws in the FFI, though :confused:

On the other hand, specific to AEAD, we need the tag to validate - we should not use any of the plaintext if the tag is invalid, which we only find out at the end of processing, which means we must finishing processing before we do anything with the result. Therefore online cipher processing may be of lesser value than initially thought*, at least for AEAD / non-stream ciphers. See usage note for Cipher.finish Cipher Modes — Botan

* This is due to the FFI requiring attaching the tag to the end of the ciphertext. A ciphertext could be verified before decrypting if the schema is Encrypt-then-MAC, and thus find online processing useful. This is geting down into the weeds though :grin:

Anyway, that’s what I’ve been working through for the last few days.

4 Likes

Another small update! , but a good one.

Online cipher processing

At long last, after a week or two of tedious testing, online processing for ciphers is complete! The knowledge learned will hopefully be useful in online processing for other cryptographic primitives, but for now I’m just glad to have these issues resolved:

  • Some algorithms were not consuming the full input buffer, even though I was using input blocks of the ideal size.
    • New, safer buffer estimation functions have been implemented
    • Cipher update now handles remaining input properly
  • Several issues regarding buffer sizes in Botan.Low.Cipher online processing have been fixed, and the resulting functions are suitable for use with any online-capable cipher / aead algorithm
  • It has been discovered that SIV, CCM cipher modes do not support online processing, but Botan does not throw an error if you attempt to use them anyway, instead silently yielding an invalid ciphertext. This has been noted.
  • Unit tests for cipher are complete enough for the moment to move on to another module
    • Some tests (correctly) fail for specific algorithms (online processing for SIV, CCM); these tests will ignore the pertinent algorithms in the future.

Community proposal final draft

After all the feedback, I think I can consider the community proposal to be at its final draft. I will likely be making an official proposal submission to the Haskell Foundation on Monday.

That is all for now! See you next update!

4 Likes

Is the online processing of SIV / CCM not working an FFI thing, or is it just Botan that does not support it?

And follow up question if it’s the latter… do you know if the Botan people know that the current implementation has this issue? :thinking:

2 Likes

It is a property of the SIV and CCM algorithms themselves - I dug into a NIST paper Structural Classification of Authenticated Encryption Schemes to be sure.

They are both ‘MAC-then-Encrypt’ AEAD algorithms, which mean that they must first process the plaintext once to authenticate it, and then a second time to encrypt it. This property is one of the reasons that ‘Encrypt-then-MAC’ AEAD algorithms are usually considered preferable.

The only real error is that botan returns garbage here and does not return an error code / throw an exception if you try anyway :grin:

2 Likes

It’s time for another update!

Community proposal submission

The first thing is something to celebrate - I have officially submitted the cryptography community project proposal to the Haskell Foundation - this proposal has been a few weeks in the making, so many thanks to all who provided feedback :grin:

What to work on next

Getting the community proposal written and implementing online / incremental processing for ciphers has been my focus for the last few weeks, and now that they’re both done I need to take a moment and figure out what to work on next. I have several options:

  • Get more unit tests written for botan-low
  • Incremental processing for other things
  • Merge the pull request that refactors botan-bindings
  • Work on the consistency / ergonomics / exported interface for botan-low & botan
  • Flesh out a proper task list to make these decisions easier :slight_smile:

I am probably going to go for unit tests, or merging the pull request, because it is better to get the lower level stuff done first.

Cryptography Abstractions

In the mean time, while I’m deciding that, I like to like to work on the higher-level abstractions a bit to see how practical my lower-level decisions have been. With the recent addition of online processing to the mix, that has got me thinking about two things:

  • Lazy bytestrings which lead me on to a deeper dive into their internals - very satisfying
  • Higher level abstractions, which led me to re-sketch out some cryptography classes

The impetus is that existing efforts for cryptography typeclasses seem to focus on the nitty gritty mechanics of cryptography processing. Look at crypton/ite’s HashAlgorithm (or the (Block)Cipher classes), and the functions used to operate on them:

-- crypton/ite's HashAlgorithm class
class HashAlgorithm alg where
    hashBlockSize :: ...
    hashDigestSize :: ...
    hashInternalContextSize :: ...
    hashInternalInit :: ...
    hashInternalUpdate :: ...
    hashInternalFinalize :: ...

It just concretizes how it implemented incremental hashing, rather than abstracting hashing in general. I think we can build a much better typeclass hierarchy for cryptographic primitives, by analyzing the commonalities of the algorithm-specific functions.

You’d expect that a typeclass abstraction of hashing would be something much simpler, less restrictive, like:

-- An ideal Hash typeclass
class Hash alg where
        hash :: ByteString -> Digest alg

We can also just use the same interface but with lazy bytestrings for incremental processsing, and leave how the incremental processing is achieved up to the instance. This is significant because different crypto libraries may expose incremental processing differently (eg API differences), so we mustn’t assume anything about how it performs incremental processing.

-- The only difference is that it takes lazy bytestrings
class (Hash alg) => IncrementalHash alg where
        hash :: LazyByteString -> Digest alg

Then on top of these we can build more typeclasses specific to block sizes and updates and finalizing.

The Cipher typeclasses aren’t much better; the Cipher class actually only concerns itself with initialization, without defining anything about encryption / decryption. The BlockCipher class slightly improves on this, by providing individual functions for each block cipher mode, but that conflates all modes as a requirement, plus that means that the class should be called BlockCipherMode. Furthermore, AEADs and Stream ciphers are just sort of left to implement algorithm-specific encrypt / decrypt functions on their own. Other crypto primitives do not have classes.

Don’t get me wrong, extended typeclasses for block-based and incremental hashing / ciphers is great, but we shouldn’t use that as a base typeclass nor assume that all algorithms are capable of such things. Streaming hashes and ciphers do not have (input) block sizes and can take arbitrarily small inputs, but not all algorithms can do this (eg, block-based hashes and ciphers).

I’ve sketched more out in a file but its mostly just musings and scratchwork - I haven’t even tried to compile it yet - but a consistent interface has developed so we’re getting somewhere, and just about anything would be an improvement over the current situation.

Boolean typeclass proposal

We all know there’s some awkward stuff in base that could be improved, and Bits is one of them - second only to Num in awkwardness, the Bits class is too large, and unnecessarily lumps boolean algebra functions together with bit field functions. As a result, Bits is often not properly implemented, to the point that there is no proper instance for ByteString, and I believe this negatively affects the low-level-binary Haskell ecosystem in general.

This particular pet peeve is something that’s been churning in my mind for a while, and it has only gotten more relevant with the cryptography project. I believe we can fix these things by reworking things a little, and so I’ve started working on a base proposal to split off a Boolean typeclass from Bits, something like:

class (Eq a) => Boolean a where

    complement :: a -> a -- aka not

    and :: a -> a -> a
    or  :: a -> a -> a
    xor :: a -> a -> a

    -- Potentially added to complete the boolean algebra
    false :: a
    true  :: a

class (Boolean a) => Bits a where
    ...

By splitting off a Boolean class, we can now make an instance for ByteString for boolean logic, and it also leaves the Bits class as clearly pertaining to bit fields, with more clearly relevant functions.

It is still very rough, mostly being a bunch of notes, but I think there is a good point here - what do you think?


That’s all for now - til next time! :slight_smile:

5 Likes

Amazing work Leo, thanks for all your progress reports, and community proposal!

1 Like

What, no {-# MINIMAL nand #-}? :grinning_face_with_smiling_eyes:

3 Likes