Botan bindings devlog

The first issue to solve is that the low-level libraries depend on bytestring, which isn’t the worst, but it requires awkward / inefficient copying and marshalling between representations, and adhering to the restrictions of garbage-collected bytestrings.

Please don’t invent a new type to do this instead.

If you do, the end result will be way worse than the current situation, because in order to do anything with the bytes that pass through botan people will have to first convert from a bytestring, then convert back to a bytestring, because everything else uses bytestrings.


I don’t want to introduce or force a new type, I’m rather actually hoping to avoid it (and instead merely enable alternatives), hence the need for some low-level memory work eg to typeclassify allocation and byte-addressable things and make instances for ByteString so that they just work.

However, the awkward and inconvenient truth is that ByteString is still extremely unsuitable for sensitive cryptographic operations, because its lifetime and secure erasure cannot be controlled. As an added malus, when binding to C libraries you have a high chance of having to do your own allocation, which Haskell mostly avoids, or does with a ByteString by secretly / unsafely yoinking out its Ptr to operate on it and relying on GHC to clean it up - wholly unsuitable.

Note: This is why the ScrubbedBytes data type exists in the memory package

So to cast this in a different light, I am merely trying to codify how we allocate and use the underlying pointers, of which garbage-collected bytestrings are certainly the most common (and thus highly desirable) interface.
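
To make the lifetime concern concrete, here is a minimal sketch of a buffer whose erasure and release are tied to a scope rather than to the garbage collector. It uses only base; the name withScrubbedBuffer is hypothetical and is not part of botan, memory, or memalloc. It also only illustrates the shape of the idea: a production version would want the library's own secure-erase routine and stronger guarantees around asynchronous exceptions.

```haskell
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical helper: allocate n bytes outside the GC heap, run the action,
-- then overwrite the bytes with zeros and free them, even if the action throws.
withScrubbedBuffer :: Int -> (Ptr Word8 -> IO a) -> IO a
withScrubbedBuffer n = bracket (mallocBytes n) release
  where
    release p = fillBytes p 0 n >> free p

main :: IO ()
main = do
  total <- withScrubbedBuffer 4 $ \p -> do
    -- Write the bytes 1, 2, 3, 4 and read them back.
    mapM_ (\i -> pokeByteOff p i (fromIntegral i + 1 :: Word8)) [0 .. 3]
    bytes <- mapM (peekByteOff p) [0 .. 3] :: IO [Word8]
    pure (sum (map fromIntegral bytes) :: Int)
  print total -- prints 10
```

The pointer must not escape the callback: once the scope ends, the memory has been zeroed and freed.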


Additional clarification:

any […] bytes that pass through […] will have to first convert from a bytestring, then convert back to a bytestring, because everything else uses bytestrings

I want to specifically recognize this, and point out that only the Haskell side of ‘everything else’ uses ByteStrings - anything interacting with foreign C libraries is already not a bytestring, and specifically one of my concerns is avoiding unnecessary conversions from and to bytestrings which are an additional operation from the perspective of the foreign C library or primitive code.

That we can rather-unsafely expose a garbage-collected, pinned ByteString’s buffer to a C API is a statement that a ByteString can act as a memory buffer, not that all memory buffers should be garbage-collected pinned ByteStrings.

In a more succinct manner, ByteStrings are defined by an already-existing low-level memory management / lifetime, and should not be used to define low-level memory management. They just make a nice wrapper, that gets in the way at this lower level.


Update

I have been away for a few weeks, traveling for a wedding, and then recovering from it. My hands are still a bit stiff, so this update will be concise, and also very raw.

Our last update was actually in the Improving Memory thread, and progress has occurred since.

Weekly meeting notes

The notes of the last several weekly botan meetings have piled up, so the following summation covers the 3-4 meetings since the last update to this thread.

A Weekly meeting

  • Discussion regarding long-term planning, eg “where is this going”
    • There needs to be an easily-understood plan
    • Value needs to be provided at each step
    • Each step needs to carry us towards our long-term goal
    • Need to create / illustrate the plan better
  • Discussion over deciding how to handle updates & deprecations, eg how our own versioning relates to botan versioning
    • Joris’s recent work on managing / detecting the botan install helps a lot here
  • Discussion over new / known issue - clean up of memory objects
    • There are some memory leaks eg the random context (one of the oldest parts of the library)
    • Also in general Haskell does not guarantee immediate cleanup
    • It relies on GC which might not happen until program close
    • This is the quintessential problem - improving prompt cleanup - issue 68 - This is actually one of the core reasons why I am working on ‘Improving memory’

The next weekly meeting

  • Weekly meetings now include @jmct in addition to Joris and myself

  • Joris was out traveling

  • My update was posted to the Improving Memory thread

  • Discussion and planning of immediately applicable integration of memory support wrt/ cryptography

    • Better support for control over & immediate cleanup of memory
    • Breaking up Data.Bits into Boolean, BitAddressable, ByteAddressable
    • Lots of pedantry over address space classification
    • Breaking ByteArray/Access into Allocator & Array classes
    • Generalizing ByteArray into eg MemArray Byte
    • Finite lifespan memory protected by bracket / uninterruptible masks

Last last week’s meeting

  • Gone for a wedding, no time and bad internet
  • Discussion of the interface problem
    • Applies to both memory and cryptography
    • There are many different ways to surface the functionality that we need
      • It is difficult to efficiently describe cryptography in Haskell
      • It is necessary to describe memory to describe cryptography
      • Eg, forcing a pure description onto it is possible but slow
        • Still doesn’t capture eg secure allocation and erasure
    • Reducing them down to 4 main camps, eg
      • Pure-ish
      • Monadic
      • Linear
      • Effects
    • Which is ‘best’ is a matter of opinion
    • Most common cases first
    • Going to focus on providing the most popular interfaces from memory first - monadic & IO, with some pure wrappers that use unsafePerformIO under the hood
  • Mostly read up on linear haskell and effect systems
    • Definitely need linear haskell interface at some point
      • Extremely valuable for describing memory ops in a pure context
    • Should probably be a separate library
    • Secondary / low priority
      • applying the OG to botan comes first
      • necessary for a linear-botan
  • Third interface style would require an effect + co-effect system
    • Implicit parameters are a co-effect so it makes sense to implement ‘the allocator’ as one
      • This is relevant to the question of ‘multiple allocator instances’ - doesn’t exactly solve it unless you add a scoping law
    • Implies we should have official implementations for popular effect libraries
    • Tertiary / very low priority - way future but good to have decided now
  • Did realize that one of the questions posed doesn’t actually have one answer - the allocator reference problem
    • There are smart pointers that know their owner
    • There are allocators that can query an address for ownership
    • Allocators can stack, so an address-pointer is owned by every allocator in the ancestry stack
    • Not every allocator / pointer can free / be freed
    • Stack allocators don’t free, they rewind.
    • So ultimately, there has to be multiple typeclasses for these cases
      • They are low priority though, basics first
  • Discussion over algorithm identifiers
    • botan-low uses string identifiers (and functions to generate string identifiers)
    • Question: Why not data types?
      • Answer: string symbols suffice for lower-level because it is near 1:1
      • ADTs are planned for in botan
  • Discussion: Generating bindings automatically via hs-bindgen
    • give it a C header file and it generates the foreign imports
    • first version of botan-bindings handwritten
      • not bad for first pass
      • difficult to maintain
      • can test against handwritten bindings to verify behavior when changing to generated bindings
    • we are considering using hs-bindgen to generate much of the lower bindings to lower the maintenance burden
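
As a rough illustration of the allocator reference problem from the notes above (not every allocator can free; stack allocators rewind instead), here is a sketch of what "multiple typeclasses for these cases" could look like. Every name here (Allocator, Freeable, Rewindable, Bump) is hypothetical and is not taken from memalloc; plain Int offsets stand in for real addresses so the example stays self-contained.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical base class: hand out an offset, or Nothing when out of space.
class Allocator alr where
  alloc :: alr -> Int -> IO (Maybe Int)

-- Only some allocators can release an individual allocation.
class Allocator alr => Freeable alr where
  free :: alr -> Int -> IO ()

-- Stack / bump allocators rewind to an earlier mark instead of freeing.
class Allocator alr => Rewindable alr where
  mark   :: alr -> IO Int
  rewind :: alr -> Int -> IO ()

-- A bump allocator over a fixed capacity; it is Rewindable but not Freeable.
data Bump = Bump { capacity :: Int, top :: IORef Int }

newBump :: Int -> IO Bump
newBump cap = Bump cap <$> newIORef 0

instance Allocator Bump where
  alloc (Bump cap ref) size = do
    t <- readIORef ref
    if size < 0 || t + size > cap
      then pure Nothing
      else writeIORef ref (t + size) >> pure (Just t)

instance Rewindable Bump where
  mark     = readIORef . top
  rewind b = writeIORef (top b)

main :: IO ()
main = do
  b  <- newBump 16
  a1 <- alloc b 8
  m  <- mark b
  a2 <- alloc b 8
  a3 <- alloc b 1 -- the arena is full at this point
  rewind b m      -- releases everything allocated after the mark
  a4 <- alloc b 4
  print (a1, a2, a3, a4) -- prints (Just 0,Just 8,Nothing,Just 8)
```

Splitting the classes this way means code that needs per-allocation free has to say so in its constraints, and a bump allocator simply never gets a Freeable instance.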

Last week’s meeting

  • Recovering from travel, health was priority this last week
  • Worked on refining plans, making diagrams (easier than typing)
    • Have prepped plan for sets of libraries vs interfaces
    • Has made it easy to focus on the present task, memalloc
  • memalloc is starting to take shape
    • still have plenty of open questions
    • focusing on useful things
      • Breaking apart bytearray/access into allocator, array, and memory(?)
      • allocators, layouts and allocations
      • memory regions, addresses (still pondering)
      • references, pointers, and arrays
    • ideas for how to represent mutability
      • that is, have illustrated the different interfaces
      • eg, primmonad / primstate
      • various support for bracketing for secure stuff
    • Breaking apart bytearray/access into allocator, array, and memory(?)
      • allocRet is strange but it’s… complex
      • it does too much - combines allocation with initialization with arrays
      • initializable memory - eg write-once vs readwrite
    • Balancing immediate goal of replicating memory’s most popular functionality vs providing a better interface
  • Discussion: How does improving memory via memalloc serve botan?
    • Need memalloc for botan-low allocators
    • Need for cryptography abstractions
    • Need for botan implementations

This week’s update

Now that we’ve caught up with the present:

This week’s meeting notes

  • Discussion of repo organization and ownership

    • Re: Organization: Keeping related libraries of the same topic in the same repo
      • Eg botan, botan-low, and botan-bindings in one repo
      • memalloc in its own repo
    • Re: Ownership of memalloc
      • Easier just to also publish & be managed by HF (like botan)
      • Helps share responsibility & burden - many hands make light work
      • Lets me focus on long term plans
  • Discussion of getting used in more real-world applications

    • Eg getting botan in a better position to replace / provide an alternative to cryptonite
  • Joris: looking at today - hs-bindgen

    • the tricky bit is how to support multiple C++ versions
      • problematic because botan version macro support needed
      • might have to annotate our own
      • different versions vs one haskell version with conditional compilation
      • Botan backwards compatibility is a concern
      • Botan 4.0 possibly
    • knows that the unix package does conditional compilation
    • api is same but get runtime errors if using unsupported thing
    • this matches the one haskell version
      • there are functions for querying support of conditionally compiled features such that we can allow the user to check and avoid hitting the runtime exception
    • writing a script to help automate this
    • ideally make maintaining botan-bindings easier / simpler
      • user story about how to support it properly now and in the future
  • This week’s goals:

    • Publish this update
    • Get a git repo up to get eyes on memalloc
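
The "one Haskell version with conditional compilation" approach from Joris's notes can be sketched as follows. This is only an illustration of the pattern: the macro HAVE_FANCY_FEATURE and the names hasFancyFeature and fancyFeature are hypothetical, standing in for whatever a configure step or setup hook would actually define. The API is the same in every build, a query function lets the caller check support up front, and calling an unsupported function raises a runtime error.

```haskell
{-# LANGUAGE CPP #-}
import Control.Exception (IOException, try)

-- HAVE_FANCY_FEATURE is a hypothetical macro; a build script would define it
-- only when the installed C library is new enough.

-- Query function: lets users check support before calling.
hasFancyFeature :: Bool
#ifdef HAVE_FANCY_FEATURE
hasFancyFeature = True
#else
hasFancyFeature = False
#endif

-- Same type in every build; unsupported builds fail at runtime.
fancyFeature :: IO Int
#ifdef HAVE_FANCY_FEATURE
fancyFeature = pure 42
#else
fancyFeature = ioError (userError "fancyFeature: not supported by the installed library")
#endif

main :: IO ()
main = do
  -- Preferred path: check first, and avoid the exception entirely.
  if hasFancyFeature
    then fancyFeature >>= print
    else putStrLn "fancyFeature unavailable, skipping"
  -- Calling without checking surfaces the runtime error instead.
  r <- try fancyFeature :: IO (Either IOException Int)
  putStrLn (either (const "runtime error") show r)
```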

Maintenance

A summation of the past few weeks:

Joris has continued to maintain botan-bindings and botan-low, working on:

  • Refactoring the test suites to get rid of the need for multiple test targets
    • The sheer number of permutations needing to be tested caused problems
    • Original fast dirty solution had a different test target for each cryptographic operation
    • Now there is one test suite, and we can now specify subtests and filter more easily, which also helps with generating a coverage report
  • merged test suite refactoring
  • Fixed some bugs (some were managing botan bugs)
  • Botan released a new version - looking at the changelogs, but it builds out of the box with our library so maybe no change is needed; not done yet
  • Finished fixing last 2 bugs in CI

Planning

This is my diagram of plans & progress:

Right now I am churning through memalloc, and although there are still open questions, I am reaching the stage where I am not just designing interfaces and have now started transplanting / translating functions from memory.

I’m finding most functions to be rather straightforward, and though the addition of an allocator argument does cause some additional gruntwork, the resulting framework is cleaner, and the Layout type is doing me proud - I have successfully written allocators covering the C Stdlib malloc and free, as well as GHC’s garbage-collected ByteStrings.

Health

I don’t know what my sustained pace is going to be like, but I’ve settled in quite nicely so far, and I’m taking care of my hands.

Responding

I have many messages and things to respond to. I will get to all of it.

I am joining the Haskell Foundation

I am also pleased to announce that I will be officially joining the Haskell Foundation. This will help keep everything organized and moving along at a good pace.

:partying_face:


Thanks for the update! Let the Haskell Cryptography Group know if there’s anything that we can do for you. :slight_smile:


We had our weekly meeting over botan, here are the meeting cliff notes:

Leo

  • published the last month’s meeting notes
  • working on both the next update to the memory thread as well as the code
  • Allocators, Layouts, and Allocations - working - this is the ByteArray half of the ByteArrayAccess class reimagined
  • Very simple allocate :: alr -> Layout alr -> IO (Allocation alr) interface with an explanation of the derivation
  • Broke apart allocation and initialization, can recover the original allocRet function with an Initializer
  • Deallocation is separate class too
  • Writeup on allocators is almost ready for an Improving Memory thread update
  • Working on typeclasses for allocations (pointerish things) - Handle / Reference / Pointer / Array - each is different
  • Will write up on pointers and arrays next
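
A minimal sketch of the allocate :: alr -> Layout alr -> IO (Allocation alr) shape described in Leo's notes, with deallocation as a separate class and an allocRet-like function recovered from allocation plus an initializer. Modelling Layout and Allocation as associated type families is an assumption made for this example, and Malloc, Deallocator, and allocInit are hypothetical names; the real memalloc definitions may differ.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- Assumption: each allocator picks its own layout and allocation types.
class Allocator alr where
  type Layout alr
  type Allocation alr
  allocate :: alr -> Layout alr -> IO (Allocation alr)

-- Deallocation is a separate class, since not every allocator supports it.
class Allocator alr => Deallocator alr where
  deallocate :: alr -> Allocation alr -> IO ()

-- Hypothetical allocator backed by C malloc / free; its layout is a byte count.
data Malloc = Malloc

instance Allocator Malloc where
  type Layout Malloc     = Int
  type Allocation Malloc = Ptr Word8
  allocate Malloc = mallocBytes

instance Deallocator Malloc where
  deallocate Malloc = free

-- allocRet-like combinator: allocate, then run a separate initializer.
allocInit :: Allocator alr
          => alr -> Layout alr -> (Allocation alr -> IO r) -> IO (r, Allocation alr)
allocInit alr lay initializer = do
  a <- allocate alr lay
  r <- initializer a
  pure (r, a)

main :: IO ()
main = do
  (_, p) <- allocInit Malloc 4 (\q -> fillBytes q 0x2A 4)
  b <- peekByteOff p 3 :: IO Word8
  deallocate Malloc p
  print b -- prints 42
```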

Joris

  • looking into hs-bindgen
  • figuring how autoconf works
  • main goal is to have a script that finds the botan installed version and changes the cabal build depending on the found version
  • can help us define C macros that define function availability
  • Botan FFI version has macros but only since 3.5 - older has C++ code stuff but needs to be automated

Jose

  • Unable to attend today

Meeting outcome:

Today’s goals:

  • publish meeting notes
  • publish allocator writeup

Week’s goals

  • finish pointer code
  • do pointer writeup
  • hs-bindgen

Not much else to say here, but a meaty update will be coming to the memalloc thread later :slight_smile:


Weekly meeting notes

A sizeable update has been posted to the Improving memory / memalloc thread.

Leo

  • Published and updated memalloc repo
  • Updated the Improving Memory thread with a deep dive explaining the new typeclass hierarchy
  • Focused mostly on re-creating the most-used APIs from memory
    • Successfully split ByteArray/Access up into Address, Layout and Allocator plus various allocation types
      • Allocation types are Handle, Ref, Array, and Pointer
      • Non-specific allocation type-classes include Castable and Retainable
    • Reached parity with ByteArray by implementing ByteArray.allocRet using Allocator.alloc
    • withAddress née ByteArrayAccess.withByteArray is part of Allocator now but might become an allocation access class
    • Combined with Array, we have achieved our core goal of re-creating the ByteArray/Access class API
      • We still need to implement the functions and instances that use them though
  • Created an example of using ImplicitParams to hide the alr :: Allocator alr argument to better recover the original memory interface
    • There are a few other methods of doing this
      • eg if the monad supplies the allocator
      • or if the resulting allocation or data structure keeps a reference to its allocator
      • or if the allocator is a singleton so we can just infer it / use a Proxy
  • Implemented the Std allocator that uses GHC’s wrapping of the C malloc and free
    • It isn’t finished yet but I used it to illustrate the problem that I’ll be dealing with next
    • Basically Allocation is of kind * but Handle, Reference, Pointer, Array are all of kind * -> *
    • But we need to allow both allocating eg a polymorphic Ptr a but also something like a monomorphic ByteString that is secretly a Ptr Word8 that is secretly an Addr#
    • But I think I have a solution via parametric allocators / allocations - this is my main goal this week
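
For readers unfamiliar with the ImplicitParams approach mentioned in Leo's notes, here is a small self-contained sketch of hiding an allocator argument behind an implicit parameter. It is not the example from the memalloc repo: the record-of-functions Alloc type, stdAlloc, withBuffer, and describe are all hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE ImplicitParams #-}
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- Hypothetical allocator, represented as a record of functions.
data Alloc = Alloc
  { acquire :: Int -> IO (Ptr Word8)
  , release :: Ptr Word8 -> IO ()
  , label   :: String
  }

-- An allocator backed by C malloc / free.
stdAlloc :: Alloc
stdAlloc = Alloc mallocBytes free "std"

-- No explicit allocator argument: it is taken from the implicit parameter ?alr.
withBuffer :: (?alr :: Alloc) => Int -> (Ptr Word8 -> IO a) -> IO a
withBuffer n = bracket (acquire ?alr n) (release ?alr)

-- Anything called within the binding's scope sees the same allocator.
describe :: (?alr :: Alloc) => String
describe = "using " ++ label ?alr

main :: IO ()
main = do
  msg <- let ?alr = stdAlloc in withBuffer 8 (\_ -> pure describe)
  putStrLn msg -- prints "using std"
```

The call sites read like the original allocator-free interface, at the cost of the constraint appearing in every signature along the way.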

Joris

  • Last week continued looking into using hs-bindgen
  • Also was working w/ autoconf (legacy way of configuring packages w/ system dependencies) for build scripting - but now looking into cabal hooks instead
    • Main reason wanted to do this was because he noticed he was trying to write actual programs in autoconf instead of scripts - eg parsing c macros, significant logic, at that point just use cabal hooks and write a haskell program
    • This makes the build scripts way more accessible to other devs
  • also going to look at the memalloc stuff (thx!)
  • main task is hs-bindgen

Jose

  • Unable to attend

Outcome

This week:

  • Leo
    • Have a 1:1 with Jose
    • Continue to work on memalloc in order to use it in botan-low for managing allocation (and in botan in the future)
    • Focus on parametric allocators / allocations
    • Update the memalloc repo again
    • Update the Improving Memory thread again
  • Joris
    • Continue working on hs-bindgen
    • Look into cabal-hooks
    • Read up on the new memalloc stuff

Until next time!


December Monthly update

It has been difficult to write this month, in part due to the holidays, and being ill for a few weeks, and then I had the bittersweet duties of hosting an early Christmas potluck for a dear friend before helping them move.

I have been busy working the fine details of a sizable update pending to the memalloc thread - this of course is where most of my energy has gone. It involves a rather careful peeling apart of the concepts of memory and arrays, which will get us closer to replacing ByteArrayAccess ba with something like MemoryAccess mem Byte allowing us to generalize to Bit, Byte, Word, and so on. Nomenclature is rather sticky* but I’ve done a great deal of disambiguation, and even have some examples now of ways that memory / addresses / allocations can break common expectations.

*Much like describing the generalized concept that collects handles, references, pointers, and arrays, without saying that they handle, refer to, point to, or arrange something, which is very easy to say colloquially eg “this handle points to something” even though only pointers should “point”, a handle “handles” something but my prose-ometer violently rejects such phrasing

In particular I’ve taken a good deal of influence from the Ix and Data.Array.IArray classes - in fact, there is now an Addr class that corresponds to a weaker notion in between Eq and Ix. It turns out that addresses are in general not orderable, not even partially orderable, but rather only pre-orderable because it is possible that a < a - as a concrete albeit historical example, consider the i8086 address space and its segmented pointers. Never have I needed a PreOrd class until now!
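
To illustrate the segmented-pointer example, here is a small sketch using the 8086 real-mode rule that the physical address is segment * 16 + offset (wrapping at 20 bits). The PreOrd class, the (<=~) operator, and the SegAddr type are hypothetical names for this example, not the actual memalloc definitions. The sketch shows one way the ordering degrades: two structurally different pointers compare as equivalent, so antisymmetry fails and only a preorder remains.

```haskell
import Data.Word (Word16)

-- A segmented 8086 pointer: structurally a pair of 16-bit words.
data SegAddr = SegAddr { segment :: Word16, offset :: Word16 }
  deriving (Eq, Show)

-- Real-mode address translation, wrapping at 20 bits as on the original 8086.
linear :: SegAddr -> Int
linear (SegAddr s o) = (fromIntegral s * 16 + fromIntegral o) `mod` 0x100000

-- Hypothetical preorder class: reflexive and transitive, not antisymmetric.
class PreOrd a where
  (<=~) :: a -> a -> Bool

instance PreOrd SegAddr where
  a <=~ b = linear a <= linear b

-- Equivalence induced by the preorder.
equiv :: PreOrd a => a -> a -> Bool
equiv a b = a <=~ b && b <=~ a

main :: IO ()
main = do
  let a = SegAddr 0x1234 0x0005
      b = SegAddr 0x1000 0x2345
  -- Same physical byte (0x12345), different representations.
  print (linear a == linear b, a == b, equiv a b) -- prints (True,False,True)
```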

I have also been working on a pre-proposal to split up Data.Bits into a more structured hierarchy, because doing so has actually been helpful to achieve the above, and relates to MemoryAccess mem Bit as well. The proposal has both a simple proposed hierarchy of Boolean => Bitwise => Bitfield née Bits, and an alternative extended hierarchy that goes as far as Boolean => Bitwise => Bitfield => IntegralBitfield => SignedIntegralBitfield => TwosComplementBitfield, which pairs nicely with Num* and fromInteger, and does things like eg disambiguate logical and arithmetic shifts.

* Since it effectively defines signOf / signNum and fromInteger / toIntegralBitfield, it could even be related to / placed underneath Num leaving it to be even more ring-ish except that would force every number-like thing to talk about binary representations which would be terrible.
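
As a rough idea of what such a split might look like, here is a two-level sketch that separates boolean-algebra operations from shift operations and gives logical and arithmetic right shifts distinct names. The class and method names, and which method lives in which class, are guesses for illustration only and should not be read as the contents of the pre-proposal; the instances are built on the existing Data.Bits functions.

```haskell
import Data.Bits (complement, shiftR, (.&.), (.|.))
import Data.Int (Int8)
import Data.Word (Word8)

-- Hypothetical: the boolean-algebra part, with no notion of bit positions.
class Boolean a where
  bAnd, bOr :: a -> a -> a
  bNot      :: a -> a

-- Hypothetical: position-aware operations, with the two shifts named apart.
class Boolean a => Bitwise a where
  logicalShiftR    :: a -> Int -> a -- fills with zeros
  arithmeticShiftR :: a -> Int -> a -- replicates the top (sign) bit

instance Boolean Bool where
  bAnd = (&&)
  bOr  = (||)
  bNot = not

instance Boolean Word8 where
  bAnd = (.&.)
  bOr  = (.|.)
  bNot = complement

instance Bitwise Word8 where
  logicalShiftR = shiftR
  -- Reinterpret as signed so that Data.Bits.shiftR sign-extends, then convert back.
  arithmeticShiftR w n = fromIntegral (shiftR (fromIntegral w :: Int8) n)

main :: IO ()
main = do
  let w = 0xF0 :: Word8
  print (logicalShiftR w 2, arithmeticShiftR w 2) -- prints (60,252)
  print (bNot True `bOr` False, bNot w)           -- prints (False,15)
```

Note that Bool gets a Boolean instance without having to pretend it has shifts, which is the kind of separation the hierarchy is after.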

Regardless of any acceptance of such proposal, the resulting typeclasses should have significant utility for eg low-level memory encoding and cryptography. Once I have finished the process of editing, I’ll be publishing the update to the memalloc thread as well.

Meeting Notes 12/8/25

Leo

  • Out sick half of the week, still recovering
  • Working on an update to memalloc
  • Working on a response to Jack’s questions; am glad for the interest
  • Looking at how to integrate / pull some allocation convenience functions from botan-low’s C FFI hook generator functions, so we can start using memalloc
  • Focusing on integration, so providing a better surface API (eg botan)

Joris

  • Working on hs-bindgen, it’s working
  • Improving it
  • Inspiration from the Rust bindings
  • Same author as the C++ library
  • Has a custom setup that configures botan bindings on the fly
  • Relies on pkg-config for now, but almost ready with other options
    • Example: the Rust bindings use pkg-config or you can give a directory
    • Also potential for vending the source directly
  • Solves the problem of how to vend / surface botan while allowing the user to configure it

Jose

  • Nothing to report

Meeting Notes 12/15/25

Joris

  • Will be away next week because of holidays
  • After the holidays will be working in a reduced capacity
  • Continued improving the setup hooks script for botan bindings
    • Supports using extra-include-dirs
    • Work is mostly complete, waiting on hs-bindgen release

Leo

  • Was sick w/ bad migraines (weather), little to report

Jose

  • Did not attend

Meeting Notes 12/22/25

Recovering, just me and Jose today, mostly just talked with Jose about what I’ve been looking at / working on

Leo

  • Binary representations
  • Looking at ‘array’ package for inspiration
  • Breaking apart Data.Bits
  • Creating a proper binary hierarchy
  • All useful for cryptography / botan because BitString and stuff
  • can use eg newtype MagnitudeBits a = Mk a to ‘pick’ out a specialized subset of bits to talk about, which is very useful
  • Is-a vs has-a problems
  • Eg is an allocator a memory space, or does it have a memory space
  • Tried with DFs, FDs, which indicates it’s my hierarchy that’s the problem
  • Studying Array a i e for inspiration, maybe Allocator alr lay aln
  • Want to separate eg allocation vs address vs pointer
  • Maybe decouple allocation from address - then allocation can have or can be
  • Address vs eg additional allocation data eg refcount
  • Trying to split up addressing, finding (suitable) addresses, storing at an address, reading from an address, registering that address, and releasing it
  • This is because an address space only cares about addresses. A memory space stores and loads at addresses, so it is an address space, but it doesn’t necessarily care about registering addresses - it’s just putting a thing into the memory units according to a given layout, and it says nothing about the memory space tracking what is where. The allocator cares about vending objects, which MAY involve registering addresses, if the allocation is an address and not a value being passed around
  • Still trying to figure out how / where to stitch / cross the threshold of ‘addr’ to ‘aln a’ to wrapping it again ‘bs = (ptr u8, int)’
  • We’re basically taking a concrete type ‘Addr’, adding a phantom type by wrapping it as a ‘Ptr a’, then wrapping a concretized version of that as a ByteString = (Ptr Word8, Int)
  • allocative functors between monofunctors and functors (constrained by size not type)
  • haskell’s implicit allocator, every lifted value is a pointer which inverts things syntactically (we have ‘Int’ instead of ‘Lifted Int’) which makes things problematic
  • Haskell modeled as an infinite register machine
  • Problems of constraining allocation types - monomorphic
  • How Storable is really about producing a Layout
  • How Addr is between Ord and Ix (if we say Ord => Addr)

Jose

  • Jose will be out next week for the holidays

Meeting Notes 12/29/25

No meeting / no notes (I would be the only one attending)

Leo

  • Is working on this update / out for the holiday

Jose

  • Is out for the holiday

Joris

  • Is out for the holiday / working in a reduced capacity (but has continued to make commits - I see them!)

Hey Leo,

Do you think there’s space in this project for a student to help via GSoC?


You might find some API inspiration in the semigroupoids library, which introduces e.g. class Apply which “should” be a superclass of class Applicative.


@LaurentRDC Quite possibly - what does that entail?

@jackdk Ah yes another one of ye olde haskell warts to be corrected for a more modern haskell :slight_smile:

Starting the new machinery: Refining botan-low

This thread has been a bit sparse lately, with most things happening in the Improving Memory thread. I don’t have in-depth meeting notes, as between various parties being ill, computer troubles, and the world at large, there hasn’t been much to say. It mostly boils down to Joris is working on some disabled tests that were failing on some architectures, and my continued memory work.

However, the memory concepts have matured enough that we can start applying them to botan, which has been our goal for the last few months, meaning now we can start the process of refining our existing code. I think the natural progression of refactoring should roughly follow the order in which the bindings were developed, starting with the oldest - so there is a vague sort of hierarchy that we will be following, but nothing too strict.


What are we replacing?

The original bindings were written in a fairly unstructured manner (as so happens before a project defines what ‘consistency’ means for it), though some attempt was made to corral memory management in Botan.Low.Internal.ByteString and Botan.Low.Make and later Botan.Low.Remake. It’s pretty gnarly, with lots of wrapping alloca with throwBotanIfNegative_ and things like type WithPtr typ ptr = (forall a . typ -> (ptr -> IO a) -> IO a).

This all was moderately successful in that it got the bindings and library off the ground; however, it is fairly esoteric to follow - Remake still relies on Make, and many modules import both, so it really just gets everywhere, all just to expose a ByteString interface - we can do better (and have always planned to - the Improving Memory work is a direct response to this need).

NOTE: I originally wanted to make it so we had these function generators to minimize errors, but it really ends up getting in the way or at least obscuring things - some of these things are almost easier to just unroll and write by hand, especially the ones with only one or two arguments. Hopefully this new implementation will be more straightforward and less ‘magic’, even if it means we have to implement some things more manually / boilerplate-y. Counterpoint, they do handle the packing and unpacking of the bindings-level data types, which is necessary eg because the bindings hides the Ptr behind a newtype. Counter-counterpoint, while the bindings data types have always been rather strange, it is to match the actual Botan C structs using CAPIFFI and CTYPE, so I digress for now…

Nothing to do now except to dive on in!


Updating RNG

We will start with the Botan.Low.RNG module - it is the first, and the oldest module. It also has a fairly simple interface, which minimizes our surface / exposure.

Since it has been a while, we’ll go over the interface, and make some comments.

First up, the star of this module,

newtype RNG
    = MkRNG 
    { getRNGForeignPtr :: ForeignPtr BotanRNGStruct
    } 

It just wraps a foreign pointer to a botan object (in this case, our RNG context). Interestingly, while it does use BotanRNGStruct from Botan.Bindings.RNG, it doesn’t use the BotanRNG object, probably because we really only need the pointer inside so it uses a ‘magic’ mkBindings function to define some helper functions that automatically unwrap the bindings-level BotanRNG type:

withRNG     :: RNG -> (BotanRNG -> IO a) -> IO a
rngDestroy  :: RNG -> IO ()
createRNG   :: (Ptr BotanRNG -> IO CInt) -> IO RNG
(withRNG, rngDestroy, createRNG)
    = mkBindings MkBotanRNG (. runBotanRNG) MkRNG (. getRNGForeignPtr) botan_rng_destroy

Yeah, I feel like this doesn’t need to be that convoluted. However, it is dealing with the RNG context, it is working fine (if it ain’t broke don’t fix it), and refactoring that isn’t our primary goal.

  • Note that RNG’s inner ForeignPtr (and its own inner Ptr) does make for a great application of the WithAllocation/WithMem mem class, which more or less generalizes the WithPtr concept that I’d sort of been lumping in with allocation.
  • Also, this is a great example of memory that is a handle but not a reference or pointer (well, it is, but it’s a pointer to an empty / opaque data BotanRNGStruct; we can’t dereference it or do anything with it except supply it as an argument)
  • Hence withRNG is an expression of said WithMem concept itself, and contains a foreign pointer beneath that also conforms, and ditto the pointer underneath that

Following this are some pattern synonyms:

type RNGType = ByteString

pattern SystemRNG, UserRNG, UserThreadsafeRNG, RDRandRNG ::  RNGType
pattern SystemRNG = ...

These are used because the bindings take the RNG type as a string - we leave hiding this as a sum type to the higher-level library. Moving on…

Our first function in this module is the one that creates an RNG context:

rngInit :: RNGType -> IO RNG
rngInit = mkCreateObjectCString createRNG botan_rng_init

Remember earlier, when I said that these magic generators sometimes obscure more than they help? Perfect example - mkCreateObjectCString really just calls mkCreateObjectWith, which calls withCString and createObject and init, all of which are arguments to it that we could just call ourselves. So maybe we will clean this up at some point D: eek

Note that the RNG context object’s destructor was wrapped into the foreign pointer’s finalizer so it will automatically be destroyed when the last Haskell reference to it is garbage collected - as an improvement to this API, we could also expose a finite-lifetime equivalent, eg withRNGInitTemporary that guarantees more prompt cleanup (although conceivably this is more useful for other modules eg ones that allocate memory for secret keys and such).
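
One possible shape for such a finite-lifetime variant is sketched below, using only base. It does not touch botan: a malloc'd byte stands in for the context object, an IORef flag records when the "destructor" runs, and withTemporary is a hypothetical name. The technique is to pair bracket with finalizeForeignPtr, which runs the pointer's finalizers immediately instead of waiting for the garbage collector.

```haskell
import Control.Exception (bracket)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import qualified Foreign.Concurrent as FC
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Storable (peek, poke)

-- Hypothetical scoped wrapper: create the object, run the action, then run
-- the ForeignPtr's finalizers right away, even if the action throws.
withTemporary :: IO (ForeignPtr a) -> (ForeignPtr a -> IO b) -> IO b
withTemporary create = bracket create finalizeForeignPtr

main :: IO ()
main = do
  destroyed <- newIORef False
  -- Stand-in for an init function: the finalizer plays the role of the
  -- destroy call and records that it ran.
  let create = do
        p <- mallocBytes 1
        FC.newForeignPtr p (free p >> writeIORef destroyed True) :: IO (ForeignPtr Word8)
  v <- withTemporary create $ \fp ->
         withForeignPtr fp $ \p -> poke p 7 >> peek p
  done <- readIORef destroyed
  print (v, done) -- prints (7,True)
```

The object must not be used after the scope ends, since its finalizer has already run; that restriction is the price of the prompt cleanup.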

After that, we have a few functions rngReseed, rngReseedFromRNG, and rngAddEntropy that we will skip because they only deal with affecting the RNG context.

This leaves the only 2 other functions in this module (it is a rather small interface) - but they are the only ones we really care about. One takes an RNG context as an argument, the other assumes the system RNG.

rngGet :: RNG -> Int -> IO ByteString
rngGet rng len = withRNG rng $ \ botanRNG -> do
    allocBytes len $ \ bytesPtr -> do
        throwBotanIfNegative_ $ botan_rng_get botanRNG bytesPtr (fromIntegral len)

systemRNGGet :: Int -> IO ByteString
systemRNGGet len = allocBytes len $ \ bytesPtr -> do
    throwBotanIfNegative_ $ botan_system_rng_get bytesPtr (fromIntegral len)

So, how can we improve this API? It is so small, there isn’t much to do - but there is one thing: we can return something other than bytestring, something more like rngGet :: (ByteArray ba) => RNG -> Int -> IO ba where ByteArray mem ~ MemArray mem Word8; this would be an immediate improvement and major boon to the interface, not having to bounce through ByteString means we can be more type-safe while potentially avoiding an extra copy operation that could otherwise be a significant penalty to speed.

NOTE: I do find it interesting that the RNG context acts in some ways as some sort of pseudo-allocator - doubly so with the ‘implicit’ system allocator. However, we must still actually allocBytes ourselves on the inside so it really hides the allocator and is more of an initializer.

On the inside, there isn’t much to do, except to consider how our new polymorphism and class constraint affect things - we can no longer use allocBytes because that implies ByteString, obviously we need to replace it with the type’s respective allocator functions, meaning we can either rely on an implicit allocator, or add one as an explicit argument, eg something like

-- The new rngGet with an allocator implied by the ba type
rngGet :: (ByteArray ba, ImplicitAllocator ba) => RNG -> Int -> IO ba
rngGet rng len = withRNG rng $ \ botanRNG -> do
    allocInit len $ \ bytesPtr -> do
        throwBotanIfNegative_ $ botan_rng_get botanRNG bytesPtr (fromIntegral len)

-- An explicitly provided allocator
-- NOTE: These are not quite the real class constraints but close enough
rngGetAlloc :: (ByteArray ba, Allocator alloc ba) => alloc -> RNG -> Int -> IO ba
rngGetAlloc alloc rng len = withRNG rng $ \ botanRNG -> do
    allocInitWith alloc len $ \ bytesPtr -> do
        throwBotanIfNegative_ $ botan_rng_get botanRNG bytesPtr (fromIntegral len)

-- We could even expose eg copy- variants in keeping with the memfunctor stuff
-- This is like working with the bindings, except now we'd be flexible on the pointer type, eg it could be Ptr or ForeignPtr and so on.
-- No allocator constraint is needed here because the memory already exists
rngGetCopy :: (ByteArray ba) => RNG -> ba -> Int -> IO ()
rngGetCopy = ... -- TODO: fill the already-allocated memory with random samples
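To make the implicit-allocator idea concrete, here is a minimal self-contained sketch. It does not use the real ByteArray / allocator classes: ByteBuffer, allocFill, Bytes, fakeRngGet and getBytes are all hypothetical names, and fakeRngGet merely writes a counting pattern where botan_rng_get would write random bytes. It only shows how the return type alone can select the allocation strategy.

```haskell
import Control.Monad (when)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import Data.Word (Word8)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Hypothetical class: "allocate len bytes, let the callback fill them".
class ByteBuffer ba where
    allocFill :: Int -> (Ptr Word8 -> IO ()) -> IO ba

-- ByteString allocates its own buffer and is filled in place.
instance ByteBuffer ByteString where
    allocFill = BSI.create

-- A second, deliberately different representation.
newtype Bytes = Bytes [Word8] deriving (Eq, Show)

instance ByteBuffer Bytes where
    allocFill len fill = allocaBytes len $ \ ptr -> do
        fill ptr
        Bytes <$> peekArray len ptr

-- Stand-in for botan_rng_get: fills the buffer with 0, 1, 2, ...
fakeRngGet :: Ptr Word8 -> CInt -> IO CInt
fakeRngGet ptr len = do
    pokeArray ptr (take (fromIntegral len) (cycle [0 .. 255]))
    return 0

-- The caller's choice of return type picks the instance.
getBytes :: ByteBuffer ba => Int -> IO ba
getBytes len = allocFill len $ \ ptr -> do
    rc <- fakeRngGet ptr (fromIntegral len)
    when (rc < 0) $ ioError (userError "fakeRngGet failed")

demo :: IO ([Word8], [Word8])
demo = do
    bs <- getBytes 4 :: IO ByteString
    Bytes ws <- getBytes 4
    return (BS.unpack bs, ws)

main :: IO ()
main = demo >>= print
```

Existing ByteString callers would keep working unchanged under this shape, since ByteString is simply one instance among several.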

NOTE: As mentioned before, we could go further and apply the WithMem concept to the RNG context object vis-à-vis withRNG, which would become withHandle rng $ \ hdl -> ... - it sure would be nice to unify / get rid of the bajillion withFoo functions that botan-low has…

And then, whatever we do with rngGet, we will want to do the same thing with systemRNGGet, and that’s it for the RNG module!


We are starting light, with this being more of a refresher on how the bindings are constructed than making grand sweeping changes, but that is quite alright.


Monday Update

No meeting notes today, I was the only one in attendance. Last week saw the start of the botan-low + memalloc refactor, mostly getting our feet planted beneath us by dissecting the existing RNG interface, and discussing what we wanted to preserve and what we wanted to replace. Major points of discussion were the complexity of the Internal.ByteString, Make, and Remake modules, the resulting mkBindings as a method of constructing context objects, and a quick review of the RNG interface. Later, in the memory thread we presented an in-depth discussion of the new Memory interface that will be handling the allocation and object construction moving forwards.

Continuing where we left off:

A look at mkBindings and handling context objects - BotanRNG vs RNG

One of the things that I have been mulling over is the necessity of unwrapping and wrapping pointers. The mkBindings function is obviously a critical component, it performs a vital duty, so we cannot get rid of it - and yet it is ugly and terrible. If anything could be improved - here it is. So let us look closer to understand why it exists; then we can replace it.

Botan.Bindings.RNG defines BotanRNG in the following manner:

-- | Opaque RNG struct
data {-# CTYPE "botan/ffi.h" "struct botan_rng_struct" #-} BotanRNGStruct

-- | Botan RNG object
newtype {-# CTYPE "botan/ffi.h" "botan_rng_t" #-} BotanRNG
    = MkBotanRNG { runBotanRNG :: Ptr BotanRNGStruct }
        deriving newtype (Eq, Ord, Storable)

foreign import capi safe "botan/ffi.h botan_rng_init"
    botan_rng_init
        :: Ptr BotanRNG
        -> ConstPtr CChar
        -> IO CInt

foreign import capi safe "botan/ffi.h &botan_rng_destroy"
    botan_rng_destroy
        :: FinalizerPtr BotanRNGStruct

Then, Botan.Low.RNG defines RNG in the following manner:

newtype RNG = MkRNG { getRNGForeignPtr :: ForeignPtr BotanRNGStruct }

withRNG     :: RNG -> (BotanRNG -> IO a) -> IO a
rngDestroy  :: RNG -> IO ()
createRNG   :: (Ptr BotanRNG -> IO CInt) -> IO RNG
(withRNG, rngDestroy, createRNG)
    = mkBindings MkBotanRNG (.runBotanRNG) MkRNG (.getRNGForeignPtr) botan_rng_destroy

rngInit
    :: RNGType
    -> IO RNG
rngInit = mkCreateObjectCString createRNG botan_rng_init

If we proceed through this step by step (and ignore how awkward mkBindings is), we note that it is not all that complicated, and that we have several reasons for doing what we did. Making improvements here is valuable, because almost all Botan context objects follow this pattern, so whatever we do here will probably affect every other module in a similar manner (we do have a high degree of consistency in that regard).

  • We have an opaque Botan struct type signified by an empty data type BotanRNGStruct - this makes sense because as an opaque type, we can never actually see an instance of it.
  • We have a newtype wrapper around a Ptr to a BotanRNGStruct - we can actually have an instance of this, it is just a pointer to an opaque struct.
  • The empty data BotanRNGStruct and newtype BotanRNG are necessary in order to use CApiFFI and CTYPE, due to type safety requirements.
  • This requires boilerplate / wrapping / unwrapping to use the pointer, but is a zero-cost abstraction
  • The Ptr is unmanaged, and we must allocate and provide one to be filled
  • There is a FinalizerPtr in the form of botan_rng_destroy, but Ptr needs to be a ForeignPtr to use it
    • Maybe we should have both botan_rng_destroy and botan_rng_destroy_finalizer
  • botan-low lifts the BotanRNG from a Ptr BotanRNGStruct to a ForeignPtr BotanRNGStruct so we can finalize the memory when no references remain
    • The RNG newtype isn’t strictly necessary, unlike BotanRNGStruct and BotanRNG; it provides value but could be reduced to a type alias.
    • The Botan API requires that we allocate a temporary pointer Ptr BotanRNG (that is, a Ptr (Ptr BotanRNGStruct)) for the ‘create’ function to populate
    • Then, mkBindings takes that unmanaged Ptr and stuffs it into a ForeignPtr for the GC to track
    • The temporary pointer uses alloca so it is good to know we are not leaking that pointer either
  • mkBindings and eg mkCreateObjectCString are mostly just thin wrappers for converting to and interacting with the inner foreign pointer
  • The mkCreateObjectFoo functions sort of combine this lifting while also handling initialization arguments, but may no longer provide as much / any convenience due to the new Memory classes
  • We didn’t have Memory.Pointer last time to handle things, so mkBindings filled the gap
  • The new Memory.Memory and Memory.Pointer classes allow us to provide the same functionality more naturally - I can just access the inner pointer using withMem (which also replaces the withFoo that mkBindings generates)

So, it turns out that some of the complexity is unavoidable - especially the BotanRNGStruct vs BotanRNG vs RNG stuff. We really do need to unwrap the Ptr and pack it into a ForeignPtr - to allow it to be tracked by the GC, and because a ForeignPtr BotanRNG is a ForeignPtr (Ptr BotanRNGStruct) and not a ForeignPtr BotanRNGStruct. We could turn RNG into a type instead of a newtype but type safety would suffer and I’d rather take advantage of the new Memory classes.

But, it seems that we should be able to more or less get rid of mkBindings, if we don’t really need any of its functions anymore! The new memory classes can handle both allocation (create and destroy) and pointer access (withPtr), so all that complexity can just be tossed out.

This does mean that our context objects will have to conform to some classes, and the botan-low library will have slightly more responsibilities, but I think upkeep will be more tolerable, and as a bonus we’ll be polymorphic over the return type of various functions so we can use anything that conforms to the new memory classes instead of being stuck with ByteString. That, I think, makes all this worthwhile.


This is a really solid breakdown, thanks for writing it up. The step-by-step walk through why mkBindings exists makes it clear that most of the ugliness isn’t accidental, it’s just the natural consequence of juggling CTYPE, CApiFFI, unmanaged Ptrs, and GC-tracked lifetimes.

I especially agree with the conclusion that the real pain point isn’t BotanRNGStruct vs BotanRNG vs RNG (that separation feels unavoidable and actually useful), but that mkBindings ended up being a one-off “mini memory framework” before you actually had proper memory abstractions. In that light, replacing it with Memory.Memory / Memory.Pointer feels like the right direction rather than just another refactor for aesthetics.

The observation that ForeignPtr BotanRNGStruct vs ForeignPtr BotanRNG forces the unwrap/repack dance is an important one too - it’s easy to forget how much of this is dictated by FFI type safety rather than design preference.

One thing I really like about the new approach is the knock-on effect you mention at the end: making context objects conform to common memory classes and becoming polymorphic over return types. That feels like a genuine improvement in expressiveness, not just a cleanup, and probably pays for the added responsibility in botan-low.

Overall this feels like a good example of “we tried to abstract too early, now we have the right primitives” - and I’m very much in favor of deleting mkBindings once it’s no longer pulling its weight.

Curious to see how this pattern generalizes across the other Botan contexts.


This is a great write-up. I really appreciate how clearly you separate “incidental ugliness” from the parts that are genuinely forced by CApiFFI and type safety. The BotanRNGStruct / BotanRNG / RNG layering reads much less like over-engineering once you walk through the constraints.

The framing of mkBindings as an early, ad-hoc memory abstraction really resonates. Given that you now have Memory.Memory and Memory.Pointer, it feels like the natural evolution rather than a breaking redesign—using the right primitives once they exist.

I also like the point about polymorphism over return types being a feature gain, not just cleanup. That kind of payoff usually signals the refactor is doing real work.

Deleting mkBindings once it’s redundant sounds like a net win for maintainability. Very curious to see how cleanly this pattern carries over to the other Botan context objects.


Update continued: Replacing mkBindings with BotanObject

Picking up where we left off yesterday, it’s time for some code! We’re going to replace the mkBindings function with something more compact.

Minor aside - upgrading Memorable

Per yesterday’s discussion, mkBindings takes two newtype constructor-getter pairs (one wraps a Ptr, the other a ForeignPtr), and a FinalizerPtr for the destructor. All of the botan context objects follow this pattern, so we’re going to just codify this as a typeclass, making it a lot easier to wrangle.

Preparing today’s update involved making a few small changes to the Memorable class:

class ... => Memorable memo where
    -- Added
    type MemRep memo :: Type
    -- Changed
    withMem :: memo -> (Mem memo (MemRep memo) -> IO a) -> IO a

We have added a new MemRep associated type family, for the underlying memory type - so for ByteString, which is a Ptr Word8, Mem is Ptr, and MemRep is Word8, respectively. This allows us to not care whether a Memory type is monomorphic (eg, ByteString) or polymorphic (eg ForeignPtr). I had suspected that this would be necessary sooner or later, per earlier notes on the wildcard a parameter in withMem, which has now become fixed to MemRep memo.

NOTE: This change in general does have implications that I am still pondering, such as no longer being 1:1 drop-in compatible with memory:ByteArrayAccess.withByteArray, but I’d rather allow non-castable memory types and require that you use castPtr, than the inverse, and it seems so much more sensible that if we allocate a ByteString with a length corresponding to a number of bytes, then when we access the pointer of that bytestring it should be a Ptr Word8, and we should have to cast when we want anything else. It is more type-safe, and the longer I think about it, the more confused I am as to why memory did it that way.
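As a rough illustration of the MemRep change, here is a minimal sketch of what such a class could look like. This is not the real Memorable class (its superclasses are elided above, and the instances here are my own guesses); it only shows Mem and MemRep pinning the callback’s pointer type, so that reading the bytes as anything other than Word8 requires an explicit castPtr.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Int (Int8)
import Data.Kind (Type)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, castPtr)

-- Simplified stand-in for the class described above.
class Memorable memo where
    type Mem memo :: Type -> Type
    type MemRep memo :: Type
    withMem :: memo -> (Mem memo (MemRep memo) -> IO a) -> IO a

-- Monomorphic memory: a ByteString is always a Ptr Word8.
instance Memorable ByteString where
    type Mem ByteString = Ptr
    type MemRep ByteString = Word8
    withMem bs action = BSU.unsafeUseAsCString bs (action . castPtr)

-- Polymorphic memory: the element type comes from the ForeignPtr itself.
instance Memorable (ForeignPtr a) where
    type Mem (ForeignPtr a) = Ptr
    type MemRep (ForeignPtr a) = a
    withMem = withForeignPtr

demo :: IO ([Word8], [Int8])
demo = withMem (BS.pack [255, 1, 2, 3]) $ \ ptr -> do
    asBytes <- peekArray 4 ptr            -- ptr :: Ptr Word8, no cast needed
    asInt8s <- peekArray 4 (castPtr ptr)  -- anything else must cast explicitly
    return (asBytes, asInt8s)

main :: IO ()
main = demo >>= print
```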

Back to refactoring Botan.Low.RNG

Now, we’re going to start with a bit of a fresh slate - no Internal.ByteString or Make or Remake, just the bindings and our new supporting Memory classes:

-- Used to define BotanObject
{-# LANGUAGE AllowAmbiguousTypes #-}

import Botan.Bindings.ConstPtr (ConstPtr (..))
import Botan.Bindings.RNG

import Control.Monad (void)
import Control.Exception (mask_)

import Data.Kind
import Data.ByteString (ByteString)

import Foreign.Ptr (Ptr)
import qualified Foreign.Ptr as Ptr
import qualified Foreign.Storable as Ptr

import Foreign.ForeignPtr (ForeignPtr, FinalizerPtr)
import qualified Foreign.ForeignPtr as ForeignPtr

import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)

-- This vvv is really the only 'new' import
-- I could actually pull in even more but this suffices

import Memory.Memory
import Memory.Pointer

There are a few things to note immediately - because RNG and BotanRNG are just a ForeignPtr and a Ptr respectively, they actually conform to Memorable.

BotanRNG gets an orphan instance since it is declared in Botan.Bindings.RNG.

instance Memorable BotanRNG where
    type Mem BotanRNG = Ptr
    type MemRep BotanRNG = BotanRNGStruct
    withMem (MkBotanRNG ptr) action = action ptr

Our definition of RNG hasn’t changed, it just gets its new Memorable instance.

newtype RNG = MkRNG { foreignPtr :: ForeignPtr BotanRNGStruct }

instance Memorable RNG where
    type Mem RNG = ForeignPtr
    type MemRep RNG = BotanRNGStruct
    withMem (MkRNG fptr) action = action fptr

Now comes the bit of the code where before we would use mkBindings to generate these functions:

withRNG     :: RNG -> (BotanRNG -> IO a) -> IO a
rngDestroy  :: RNG -> IO ()
createRNG   :: (Ptr BotanRNG -> IO CInt) -> IO RNG

Since all these do is pack and unpack pointers and finalizers, we are going to codify it as a typeclass instead of an awkward function that returns functions - that means we need to take a look at mkBindings itself, to see how it works:

mkBindings
    ::  (Storable botan)
    =>  (Ptr struct -> botan)                                   -- mkBotan
    ->  (botan -> Ptr struct)                                   -- runBotan
    ->  (ForeignPtr struct -> object)                           -- mkForeign
    ->  (object -> ForeignPtr struct)                           -- runForeign
    ->  FinalizerPtr struct                                     -- destroy / finalizer
    ->  (   object -> (botan -> IO a) -> IO a                   -- withObject
        ,   object -> IO ()                                     -- destroyObject
        ,   (Ptr botan -> IO CInt) -> IO object                 -- createObject
        )
mkBindings mkBotan runBotan mkForeign runForeign destroy = bindings where
    bindings = (withObject, objectDestroy, createObject)
    newObject botan = do
        foreignPtr <- newForeignPtr destroy (runBotan botan)
        return $ mkForeign foreignPtr
    withObject object f = withForeignPtr (runForeign object) (f . mkBotan)
    objectDestroy object = finalizeForeignPtr (runForeign object)
    createObject = mkCreateObject newObject

mkCreateObject
    :: (Storable botan)
    => (botan -> IO object)
    -> (Ptr botan-> IO CInt)
    -> IO object
mkCreateObject newObject init = mask_ $ alloca $ \ outPtr -> do
        throwBotanIfNegative_ $ init outPtr
        out <- peek outPtr
        newObject out

:melting_face:

I think it is about the most confusing code I have ever written. That’s because botan requires that we allocate a pointer to a pointer to an opaque struct*, which it fills, and which we then have to peek at, attach a finalizer to, and wrap up, all while handling a potential allocation or initialization failure. Luckily we were fairly smart - we use mask_ and alloca for the ptr-ptr. It is really just confusing as to when things are what, and that type definition is some horror upon the deep.

* Technically, a pointer to the CApiFFI-enforced newtype-wrapper over a pointer to an opaque struct, that we must first unwrap before rewrapping…

So let’s clean that up with a little more of our recently-favorite hammer, TypeFamilies, shall we?

NOTE: Data families would also work, if we redefined Bindings and Low as a single module, and if CApiFFI / CTYPE allowed it (no idea if it does)

class
    ( Memorable a
    , Memorable (BotanPtr a)
    , Mem a            ~ ForeignPtr
    , Mem (BotanPtr a) ~ Ptr
    , MemRep a            ~ BotanStruct a
    , MemRep (BotanPtr a) ~ BotanStruct a
    ) => BotanObject a where

    type family BotanStruct a :: Type
    type family BotanPtr    a :: Type

    toBotanPtr :: Ptr (BotanStruct a) -> BotanPtr a
    toBotan    :: ForeignPtr (BotanStruct a) -> a
    botanFinalizer :: FinalizerPtr (BotanStruct a)

    withBotanPtr :: a -> (BotanPtr a -> IO b) -> IO b
    createBotan :: (Ptr (BotanPtr a) -> IO CInt) -> IO a
    destroyBotan :: a -> IO ()

Let’s take a moment to clarify the way this works: BotanStruct Foo now refers to BotanFooStruct, and BotanPtr Foo is now a Memorable whose Mem is a Ptr and whose MemRep is a BotanFooStruct. We just codified a relationship between the wrapper types such that the Foo type (which is a Memorable ForeignPtr BotanFooStruct) ties them all together, that’s all. It feels odd to declare a type family and then immediately force-constrain it, but remember, we’re actually constraining the corresponding Mem and MemRep types to be relevant to each other.

Then, once you deal with your types, we can fill in the functions with reasonable defaults:

-- class BotanObject a continued

    withBotanPtr :: a -> (BotanPtr a -> IO b) -> IO b
    withBotanPtr botan action =
        withMem botan $ \ fptr -> do
            withMem fptr $ \ ptr -> do
                action (toBotanPtr @a ptr)

    createBotan :: (Ptr (BotanPtr a) -> IO CInt) -> IO a
    createBotan init = mask_ $ alloca $ \ ptrPtr -> do
        throwBotanIfNegative_ $ init (Ptr.castPtr ptrPtr) -- NOTE: Can be defined without this cast
        ptr <- Ptr.peek ptrPtr
        fptr <- ForeignPtr.newForeignPtr (botanFinalizer @a) ptr
        return $ toBotan fptr

    destroyBotan :: a -> IO ()
    destroyBotan botan = withMem botan ForeignPtr.finalizeForeignPtr

The noted cast turns a Ptr (Ptr BotanFooStruct) into a Ptr (BotanPtr Foo) for the init method, which we could avoid by applying a Ptr.Storable (BotanPtr a) constraint to the BotanObject a instead - we just cast it knowing that it is a newtype over the pointer we want. Then we just get the pointer out from the temporary ptr-ptr, stick it in a ForeignPtr with our finalizer, before finally returning the wrapped foreign pointer.
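The two options for the ptr-ptr can be compared side by side in a small sketch. The names here (OpaqueStruct, Handle, fakeInit) are hypothetical stand-ins for BotanFooStruct, BotanFoo and a botan init function. Either we allocate a Ptr (Ptr struct) and cast it for init, or we lean on a Storable instance for the newtype and allocate a Ptr of the newtype directly. Both read back the same raw pointer.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, castPtr, nullPtr, plusPtr)
import Foreign.Storable (Storable, peek, poke)

data OpaqueStruct

-- Newtype over the raw pointer, with Storable derived from Ptr.
newtype Handle = MkHandle (Ptr OpaqueStruct)
    deriving (Eq, Show, Storable)

-- Stand-in for an init function that writes a handle to its out-pointer.
fakeInit :: Ptr OpaqueStruct -> Ptr Handle -> IO ()
fakeInit raw out = poke out (MkHandle raw)

demo :: IO Bool
demo = do
    -- An arbitrary non-null address; it is never dereferenced.
    let raw = nullPtr `plusPtr` 0x1000 :: Ptr OpaqueStruct
    -- Option 1: allocate the ptr-ptr, cast it for init, peek the raw pointer.
    viaCast <- alloca $ \ (ptrPtr :: Ptr (Ptr OpaqueStruct)) -> do
        fakeInit raw (castPtr ptrPtr)
        peek ptrPtr
    -- Option 2: allocate a Ptr Handle using Storable Handle, then unwrap.
    viaStorable <- alloca $ \ (handlePtr :: Ptr Handle) -> do
        fakeInit raw handlePtr
        MkHandle ptr <- peek handlePtr
        return ptr
    return (viaCast == raw && viaStorable == raw)

main :: IO ()
main = demo >>= print
```

The cast is only sound because the newtype has the same representation as the pointer it wraps, and the Storable route lets the types say so.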

NOTE: Ideally I’d be doing something like withMem ptrPtr $ \ ptr -> throwBotanIfNegative_ $ init (toBotanPtr @a ptr) but (as part of the aforementioned repercussions of de-wilding withMem) the withMem implementation for Ptr is currently simply id, meaning we can’t use it on a ptr-ptr like we expected - to fix that we would need to relax Memorable (Ptr a) to Memorable (Ptr (Ptr a)), which I might do in the near future.

So, how well does this work? Let’s define our instance for RNG - it should be reasonably representative of the effort needed for other Botan objects, so this will give us an idea of how hard it will be to apply to the rest of the library:

instance BotanObject RNG where

    type BotanStruct RNG = BotanRNGStruct
    type BotanPtr    RNG = BotanRNG

    toBotanPtr ptr  = MkBotanRNG ptr
    toBotan    fptr = MkRNG      fptr
    botanFinalizer  = botan_rng_destroy

And that’s it! RNG no longer needs mkBindings, this does all of the same work - toBotanPtr and toBotan act as constructors, withMem acts as a getter / pattern match, and the finalizer is our destructor!

Way less complicated! :partying_face:


Next up, we’ll be dealing with the helper methods, such as mkCreateObjectCString in:

rngInit :: RNGType -> IO RNG
rngInit = mkCreateObjectCString createBotan botan_rng_init

Nominally, there is nothing wrong with this, aside from mkCreateObjectCString being a part of mkBindings, and thus no longer being imported. However, it is just a thin wrapper around createBotan that calls withMem over a bytestring before passing it as an additional argument - very reader / profunctor-ish. It may turn out to be unnecessary, or it may be kept after being refactored to also use the Memory classes.
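For reference, a thin wrapper of that shape might look something like the sketch below. This is an assumption based only on the description above, not the actual mkCreateObjectCString: the name createWithCString is hypothetical, it uses plain CString rather than ConstPtr CChar, and fakeCreate / fakeInit stand in for createBotan and botan_rng_init.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical wrapper: marshal the bytestring to a C string, then hand
-- the partially applied init function to the generic 'create'.
createWithCString
    :: ((Ptr handle -> IO CInt) -> IO object)   -- generic create
    -> (Ptr handle -> CString -> IO CInt)       -- init taking an extra C string
    -> ByteString
    -> IO object
createWithCString create initFn name =
    BS.useAsCString name $ \ cstr ->
        create (\ out -> initFn out cstr)

-- Stand-in for createBotan: the "object" is just the CInt that init wrote.
fakeCreate :: (Ptr CInt -> IO CInt) -> IO CInt
fakeCreate initFn = alloca $ \ out -> do
    _ <- initFn out
    peek out

-- Stand-in for botan_rng_init: stores the length of the name.
fakeInit :: Ptr CInt -> CString -> IO CInt
fakeInit out name = do
    n <- length <$> peekCString name
    poke out (fromIntegral n)
    return 0

demo :: IO CInt
demo = createWithCString fakeCreate fakeInit (BSC.pack "system")

main :: IO ()
main = demo >>= print
```

Seen this way, the wrapper is only argument plumbing, which is why it may not survive once the Memory classes can do the marshalling.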

We will get to that next time, as we continue finishing our refactor of the RNG module.
