Botan bindings devlog

The first issue to solve is that the low-level libraries are dependent on bytestring, which isn’t the worst, but requires awkward / inefficient copying / marshalling between, and adhering to the restrictions of, garbage-collected bytestrings

Please don’t invent a new type to do this instead.

If you do, the end result will be way worse than the current situation, because in order to do anything with the bytes that pass through botan people will have to first convert from a bytestring, then convert back to a bytestring, because everything else uses bytestrings.

I don’t want to introduce or force a new type, I’m rather actually hoping to avoid it (and instead merely enable alternatives), hence the need for some low-level memory work eg to typeclassify allocation and byte-addressable things and make instances for ByteString so that they just work.

However, the awkward and inconvenient truth is that ByteString is still extremely unsuitable for sensitive cryptographic operations, because its lifetime and secure erasure cannot be controlled. As an added malus, when binding to C libraries you have a high chance of having to do your own allocation, which Haskell mostly avoids, or does with a ByteString by secretly / unsafely yoinking out its Ptr to operate on it and relying on GHC to clean it up - wholly unsuitable.

Note: This is why the ScrubbedBytes data type exists in the memory package
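
To make the lifetime point concrete, here is a minimal sketch using only base. The helper name withScrubbedBuffer is hypothetical and is not part of memory, memalloc, or botan; it only shows a buffer whose release and zeroing happen at a known point instead of whenever the GC gets to it. A production-grade scrub needs more care than this, which is part of why ScrubbedBytes exists.

```haskell
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical helper: run an action with an n-byte buffer that is
-- zeroed and freed as soon as the action finishes or throws.
withScrubbedBuffer :: Int -> (Ptr Word8 -> IO a) -> IO a
withScrubbedBuffer n = bracket (mallocBytes n) scrubAndFree
  where
    scrubAndFree p = fillBytes p 0 n >> free p

-- Example use: write four "secret" bytes, read them back, return their sum.
-- The buffer does not outlive the call.
sumSecret :: IO Int
sumSecret = withScrubbedBuffer 4 $ \p -> do
  mapM_ (\i -> pokeByteOff p i (fromIntegral (i + 1) :: Word8)) [0 .. 3]
  bytes <- mapM (peekByteOff p) [0 .. 3] :: IO [Word8]
  pure (sum (map fromIntegral bytes))
```

The cleanup step runs inside bracket, so it is masked against asynchronous exceptions and cannot be skipped by an early exit.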

So to cast this in a different light, I am merely trying to codify how we allocate and use the underlying pointers, of which garbage-collected bytestrings are certainly the most common (and thus highly desirable) interface.


Additional clarification:

any […] bytes that pass through […] will have to first convert from a bytestring, then convert back to a bytestring, because everything else uses bytestrings

I want to specifically recognize this, and point out that only the Haskell side of ‘everything else’ uses ByteStrings - anything interacting with foreign C libraries is already not a bytestring, and specifically one of my concerns is avoiding unnecessary conversions from and to bytestrings which are an additional operation from the perspective of the foreign C library or primitive code.

That we can rather-unsafely expose a garbage-collected, pinned ByteString’s buffer to a C API is a statement that a ByteString can act as a memory buffer, not that all memory buffers should be garbage-collected pinned ByteStrings.

More succinctly, ByteStrings are defined by an already-existing low-level memory management / lifetime, and should not be used to define low-level memory management. They make a nice wrapper, but one that gets in the way at this lower level.
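
As a rough illustration of "a ByteString can act as a memory buffer" without being the only one, here is a sketch with a hypothetical class ByteBuffer (not an API from memory, memalloc, or botan). The ByteString instance uses the real unsafeUseAsCStringLen from Data.ByteString.Unsafe; the second instance is an invented non-ByteString buffer included only for contrast.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical class: anything that can lend out a pointer and a length.
class ByteBuffer b where
  withBytePtr :: b -> (Ptr Word8 -> Int -> IO a) -> IO a

-- ByteString is one instance: its pinned buffer is exposed without copying.
instance ByteBuffer BS.ByteString where
  withBytePtr bs k = unsafeUseAsCStringLen bs $ \(p, n) -> k (castPtr p) n

-- A different buffer: a list marshalled into temporary memory.
newtype ByteList = ByteList [Word8]

instance ByteBuffer ByteList where
  withBytePtr (ByteList ws) k = withArrayLen ws $ \n p -> k p n

-- Code written against the class does not care which buffer it gets.
byteSum :: ByteBuffer b => b -> IO Int
byteSum b = withBytePtr b $ \p n -> do
  ws <- peekArray n p
  pure (sum (map fromIntegral ws))
```

The pointer handed out by the ByteString instance must not be written to or retained, which is exactly the "rather-unsafely" caveat above.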

Update

I have been away for a few weeks, traveling for a wedding, and then recovering from it. My hands are still a bit stiff, so this update will be concise, and also very raw.

Our last update was actually in the Improving Memory thread, and progress has occurred since.

Weekly meeting notes

The notes of the last several weekly botan meetings have piled up, so the following summation covers the 3-4 meetings since the last update to this thread.

A Weekly meeting

  • Discussion regarding long-term planning, eg “where is this going”
    • There needs to be an easily-understood plan
    • Value needs to be provided at each step
    • Each step needs to carry us towards our long-term goal
    • Need to create / illustrate the plan better
  • Discussion over deciding how to handle updates & deprecations, eg how our own versioning relates to botan’s versioning
    • Joris’s recent work on managing / detecting the botan install helps a lot here
  • Discussion over new / known issue - clean up of memory objects
    • There are some memory leaks eg the random context (one of the oldest parts of the library)
    • Also in general Haskell does not guarantee immediate cleanup
    • It relies on GC which might not happen until program close
    • This is the quintessential problem - improving prompt cleanup - issue 68 - This is actually one of the core reasons why I am working on ‘Improving memory’
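
A small sketch of the prompt-cleanup problem from the last bullets, using only base. A ForeignPtr finalizer normally runs whenever the GC decides to, possibly not before program exit; finalizeForeignPtr forces it to run at a known point. The counter is only there to make the finalizer observable, and promptCleanupDemo is a hypothetical name.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Storable (peek, poke)

-- Returns the byte that was stored and how many times the finalizer
-- had run immediately after the explicit finalization.
promptCleanupDemo :: IO (Word8, Int)
promptCleanupDemo = do
  finalized <- newIORef (0 :: Int)
  raw <- mallocBytes 1
  -- The finalizer frees the memory and bumps the counter.
  fp <- newForeignPtr raw (free raw >> modifyIORef' finalized (+ 1))
  v <- withForeignPtr fp $ \p -> poke p (42 :: Word8) >> peek p
  -- Without this line the free would happen at some unknown later time.
  finalizeForeignPtr fp
  n <- readIORef finalized
  pure (v, n)
```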

The next weekly meeting

  • Weekly now includes @jmct in addition to Joris and myself

  • Joris was out traveling

  • My update was posted to the Improving Memory thread

  • Discussion and planning of immediately applicable integration of memory support wrt/ cryptography

    • Better support for control over & immediate cleanup of memory
    • Breaking up Data.Bits into Boolean, BitAddressable, ByteAddressable
    • Lots of pedantry over address space classification
    • Breaking ByteArray/Access into Allocator & Array classes
    • Generalizing ByteArray into eg MemArray Byte
    • Finite lifespan memory protected by bracket / uninterruptible masks

Last last week’s meeting

  • Gone for a wedding, no time and bad internet
  • Discussion of the interface problem
    • Applies to both memory and cryptography
    • There are many different ways to surface the functionality that we need
      • It is difficult to efficiently describe cryptography in Haskell
      • It is necessary to describe memory to describe cryptography
      • Eg, forcing a pure description onto it is possible but slow
        • Still doesn’t capture eg secure allocation and erasure
    • Reducing them down to 4 main camps, eg
      • Pure-ish
      • Monadic
      • Linear
      • Effects
    • Which is ‘best’ is a matter of opinion
    • Most common cases first
    • Going to focus on providing the most popular interfaces from memory first - monadic & IO, with some pure wrappers that use unsafePerformIO under the hood
  • Mostly read up on linear haskell and effect systems
    • Definitely need linear haskell interface at some point
      • Extremely valuable for describing memory ops in a pure context
    • Should probably be a separate library
    • Secondary / low priority
      • applying the OG to botan comes first
      • necessary for a linear-botan
  • Third interface style would require an effect + co-effect system
    • Implicit parameters are a co-effect so it makes sense to implement ‘the allocator’ as one
      • This is relevant to the question of ‘multiple allocator instances’ - doesn’t exactly solve it unless you add a scoping law
    • Implies we should have official implementations for popular effect libraries
    • Tertiary / very low priority - way future but good to have decided now
  • Did realize that one of the questions posed doesn’t actually have one answer - the allocator reference problem
    • There are smart pointers that know their owner
    • There are allocators that can query an address for ownership
    • Allocators can stack, so an address-pointer is owned by every allocator in the ancestry stack
    • Not every allocator / pointer can free / be freed
    • Stack allocators don’t free, they rewind.
    • So ultimately, there has to be multiple typeclasses for these cases
      • They are low priority though, basics first
  • Discussion over algorithm identifiers
    • botan-low uses string identifiers (and functions to generate string identifiers)
    • Question: Why not data types?
      • Answer: string symbols suffice for lower-level because it is near 1:1
      • ADTs are planned for in botan
  • Discussion: Generating bindings automatically via hs-bindgen
    • give it a C header file and it generates the foreign imports
    • first version of botan-bindings handwritten
      • not bad for first pass
      • difficult to maintain
      • can test against handwritten bindings to verify behavior when changing to generated bindings
    • we are considering using hs-bindgen to generate much of the lower bindings to lower the maintenance burden
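
To illustrate the algorithm-identifier discussion above (string identifiers at the botan-low level, ADTs planned for botan), here is a minimal sketch. The types Hash and Mac and the functions hashName and macName are hypothetical and are not the planned botan API. The rendered strings follow Botan's naming style as far as I know it; treat the exact spellings as assumptions to check against the Botan documentation.

```haskell
-- Hypothetical higher-level ADT for hash algorithms.
data Hash
  = SHA256
  | SHA512
  | SHA3 Int      -- output size in bits
  | BLAKE2b Int   -- output size in bits
  deriving (Eq, Show)

-- Hypothetical ADT for MACs built from a hash.
newtype Mac = HMAC Hash
  deriving (Eq, Show)

-- Render the ADT to the near 1:1 string identifier used at the lower level.
hashName :: Hash -> String
hashName SHA256         = "SHA-256"
hashName SHA512         = "SHA-512"
hashName (SHA3 bits)    = "SHA-3(" ++ show bits ++ ")"
hashName (BLAKE2b bits) = "BLAKE2b(" ++ show bits ++ ")"

macName :: Mac -> String
macName (HMAC h) = "HMAC(" ++ hashName h ++ ")"
```

The ADT lets the compiler reject misspelled algorithm names, while the string layer stays a thin mirror of what the C FFI expects.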

Last week’s meeting

  • Recovering from travel, health was priority this last week
  • Worked on refining plans, making diagrams (easier than typing)
    • Have prepped plan for sets of libraries vs interfaces
    • Has made it easy to focus on the present task, memalloc
  • memalloc is starting to take shape
    • still have plenty of open questions
    • focusing on useful things
      • Breaking apart bytearray/access into allocator, array, and memory(?)
      • allocators, layouts and allocations
      • memory regions, addresses (still pondering)
      • references, pointers, and arrays
    • ideas for how to represent mutability
      • that is, have illustrated the different interfaces
      • eg, primmonad / primstate
      • various support for bracketing for secure stuff
    • Breaking apart bytearray/access into allocator, array, and memory(?)
      • allocRet is strange but it’s… complex
      • it does too much - combines allocation with initialization with arrays
      • initializable memory - eg write-once vs readwrite
    • Balancing immediate goal of replicating memory’s most popular functionality vs providing a better interface
  • Discussion: How does improving memory via memalloc serve botan?
    • Need memalloc for botan-low allocators
    • Need for cryptography abstractions
    • Need for botan implementations

This week’s update

Now that we’ve caught up with the present:

This week’s meeting notes

  • Discussion of repo organization and ownership

    • Re: Organization: Keeping related libraries of the same topic in the same repo
      • Eg botan, botan-low, and botan-bindings in one repo
      • memalloc in its own repo
    • Re: Ownership of memalloc
      • Easier just to also publish & be managed by HF (like botan)
      • Helps share responsibility & burden - many hands make light work
      • Lets me focus on long term plans
  • Discussion of getting used in more real-world applications

    • Eg getting botan in a better position to replace / provide an alternative to cryptonite
  • Joris: looking at today - hs-bindgen

    • tricky bits how to support multiple C++ versions
      • problematic because botan version macro support needed
      • might have to annotate our own
      • different versions vs one haskell version with conditional compilation
      • Botan backwards compatibility is a concern
      • Botan 4.0 possibly
    • knows that the unix package does conditional compilation
    • api is same but get runtime errors if using unsupported thing
    • this matches the one haskell version
      • there are functions for querying support of conditionally compiled features such that we can allow the user to check and avoid hitting the runtime exception
    • writing a script to help automate this
    • ideally make maintaining botan-bindings easier / simpler
      • user story about how to support it properly now and in the future
  • This week’s goals:

    • Publish this update
    • Get a git repo up to get eyes on memalloc
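
The "one haskell version with conditional compilation" option from the hs-bindgen discussion above can be sketched as follows. Everything here is hypothetical: the macro HYPOTHETICAL_HAS_NEW_KDF and the functions hasNewKdf, newKdf, and deriveOrFallback are made up for illustration and are not real botan or botan-bindings names. The shape is the point: the API is always present, unsupported calls fail at runtime, and a query function lets the caller avoid the exception.

```haskell
{-# LANGUAGE CPP #-}
import Control.Exception (throwIO)

-- Query function: was the feature compiled in?
-- HYPOTHETICAL_HAS_NEW_KDF is an invented macro that a build script would define.
hasNewKdf :: Bool
#ifdef HYPOTHETICAL_HAS_NEW_KDF
hasNewKdf = True
#else
hasNewKdf = False
#endif

-- The function exists in every build, but throws when unsupported.
newKdf :: String -> IO String
#ifdef HYPOTHETICAL_HAS_NEW_KDF
newKdf input = pure ("derived:" ++ input)
#else
newKdf _ = throwIO (userError "newKdf: not supported by the installed library version")
#endif

-- A caller checks support first and never hits the runtime exception.
deriveOrFallback :: String -> IO String
deriveOrFallback input
  | hasNewKdf = newKdf input
  | otherwise = pure ("fallback:" ++ input)
```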

Maintenance

A summation of the past few weeks:

Joris has continued to maintain botan-bindings and botan-low, working on:

  • Refactoring the test suites to get rid of the need for multiple test targets
    • The sheer number of permutations needing to be tested caused problems
    • Original fast dirty solution had a different test target for each cryptographic operation
    • Now there is one test suite, and we can now specify subtests and filter more easily, which also helps with generating a coverage report
  • merged test suite refactoring
  • Fixed some bugs (some were managing botan bugs)
  • Botan released a new version - still looking at the changelogs (not done yet), but it builds out of the box with our library so maybe no change is needed
  • Finished fixing last 2 bugs in CI

Planning

This is my diagram of plans & progress:

Right now I am churning through memalloc, and although there are still open questions, I am reaching the stage where I am not just designing interfaces and have now started transplanting / translating functions from memory.

I’m finding most functions to be rather straightforward, and though the addition of an allocator argument does cause some additional gruntwork, the resulting framework is cleaner, and the Layout type is doing me proud - I have successfully written allocators covering the C Stdlib malloc and free, as well as GHC’s garbage-collected ByteStrings.

Health

I don’t know what my sustained pace is going to be like, but I’ve settled in quite nicely so far, and I’m taking care of my hands.

Responding

I have many messages and things to respond to. I will get to all of it.

I am joining the Haskell Foundation

I am also pleased to announce that I will be officially joining the Haskell Foundation. This will help keep everything organized and moving along at a good pace.

:partying_face:

Thanks for the update! Let the Haskell Cryptography Group know if there’s anything that we can do for you. :slight_smile:

We had our weekly meeting over botan, here are the meeting cliff notes:

Leo

  • published the last month’s worth of meeting notes
  • working on both the next update to the memory thread as well as the code
  • Allocators, Layouts, and Allocations - working - this is the ByteArray half of the ByteArrayAccess class reimagined
  • Very simple allocate :: alr -> Layout alr -> IO (Allocation alr) interface with an explanation of the derivation
  • Broke apart allocation and initialization, can recover the original allocRet function with an Initializer
  • Deallocation is separate class too
  • Writeup on allocators is almost ready for an Improving Memory thread update
  • Working on typeclasses for allocations (pointerish things) - Handle / Reference / Pointer / Array - each is different
  • Will write up on pointers and arrays next
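
The allocate :: alr -> Layout alr -> IO (Allocation alr) shape mentioned above can be sketched with associated type families. This is a guess at the structure for illustration only, not the actual memalloc code: the class and method names beyond allocate, the Malloc allocator, and allocInit are all assumptions. It shows allocation and deallocation as separate classes, and an allocRet-like function recovered by pairing allocation with an initializer.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Each allocator picks its own layout and allocation types.
class Allocator alr where
  type Layout alr :: Type
  type Allocation alr :: Type
  allocate :: alr -> Layout alr -> IO (Allocation alr)

-- Deallocation is a separate capability: not every allocator can free.
class Allocator alr => Deallocator alr where
  deallocate :: alr -> Allocation alr -> IO ()

-- Assumed example allocator backed by C malloc / free.
data Malloc = Malloc

instance Allocator Malloc where
  type Layout Malloc = Int               -- size in bytes
  type Allocation Malloc = Ptr Word8
  allocate Malloc = mallocBytes

instance Deallocator Malloc where
  deallocate Malloc = free

-- allocRet-like: allocate, run an initializer, return its result and the allocation.
allocInit :: Allocator alr
          => alr -> Layout alr -> (Allocation alr -> IO r) -> IO (r, Allocation alr)
allocInit alr lay initialize = do
  a <- allocate alr lay
  r <- initialize a
  pure (r, a)

-- Usage: initialize one byte, read it back, then free explicitly.
demo :: IO (Word8, Word8)
demo = do
  (r, p) <- allocInit Malloc 1 (\q -> pokeByteOff q 0 (7 :: Word8) >> pure 7)
  v <- peekByteOff p 0
  deallocate Malloc p
  pure (r, v)
```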

Joris

  • looking into hs-bindgen
  • figuring how autoconf works
  • main goal is to have a script that finds the botan installed version and changes the cabal build depending on the found version
  • can help us define C macros that define function availability
  • Botan FFI version has macros but only since 3.5 - older has C++ code stuff but needs to be automated

Jose

  • Unable to attend today

Meeting outcome:

Today goal:

  • publish meeting notes
  • publish allocator writeup

Week’s goals

  • finish pointer code
  • do pointer writeup
  • hs-bindgen

Not much else to say here, but a meaty update will be coming to the memalloc thread later :slight_smile:

Weekly meeting notes

A sizeable update has been posted to the Improving memory / memalloc thread.

Leo

  • Published and updated memalloc repo
  • Updated the Improving Memory thread with a deep dive explaining the new typeclass hierarchy
  • Focused mostly on re-creating the most-used APIs from memory
    • Successfully split ByteArray/Access up into Address, Layout and Allocator plus various allocation types
      • Allocation types are Handle, Ref, Array, and Pointer
      • Non-specific allocation type-classes include Castable and Retainable
    • Reached parity with ByteArray by implementing ByteArray.allocRet using Allocator.alloc
    • withAddress née ByteArrayAccess.withByteArray is part of Allocator now but might become an allocation access class
    • Combined with Array, we have achieved our core goal of re-creating the ByteArray/Access class API
      • We still need to implement the functions and instances that use them though
  • Created an example of using ImplicitParams to hide the alr :: Allocator alr argument to better recover the original memory interface
    • There are a few other methods of doing this
      • eg if the monad supplies the allocator
      • or if the resulting allocation or data structure keeps a reference to its allocator
      • or if the allocator is a singleton so we can just infer it / use a Proxy
  • Implemented the Std allocator that uses GHC’s wrapping of the C malloc and free
    • It isn’t finished yet but I used it to illustrate the problem that I’ll be dealing with next
    • Basically Allocation is of kind * but Handle, Reference, Pointer, Array are all of kind * -> *
    • But we need to allow both allocating eg a polymorphic Ptr a but also something like a monomorphic ByteString that is secretly a Ptr Word8 that is secretly an Addr#
    • But I think I have a solution via parametric allocators / allocations - this is my main goal this week

Joris

  • Last week continued looking into using hs-bindgen
  • Also was working w/ autoconf (legacy way of configuring packages w/ system dependencies) for build scripting - but now looking into cabal hooks instead
    • Main reason wanted to do this was because he noticed he was trying to write actual programs in autoconf instead of scripts - eg parsing c macros, significant logic, at that point just use cabal hooks and write a haskell program
    • This makes the build scripts way more accessible to other devs
  • also going to look at the memalloc stuff (thx!)
  • main task is hs-bindgen

Jose

  • Unable to attend

Outcome

This week:

  • Leo
    • Have a 1:1 with Jose
    • Continue to work on memalloc in order to use it in botan-low for managing allocation (and in botan in the future)
    • Focus on parametric allocators / allocations
    • Update the memalloc repo again
    • Update the Improving Memory thread again
  • Joris
    • Continue working on hs-bindgen
    • Look into cabal-hooks
    • Read up on the new memalloc stuff

Until next time!

December Monthly update

It has been difficult to write this month, in part due to the holidays, and being ill for a few weeks, and then I had the bittersweet duties of hosting an early Christmas potluck for a dear friend before helping them move.

I have been busy working out the fine details of a sizable update pending to the memalloc thread - this of course is where most of my energy has gone. It involves a rather careful peeling apart of the concepts of memory and arrays, which will get us closer to replacing ByteArrayAccess ba with something like MemoryAccess mem Byte allowing us to generalize to Bit, Byte, Word, and so on. Nomenclature is rather sticky* but I’ve done a great deal of disambiguation, and even have some examples now of ways that memory / addresses / allocations can break common expectations.

*Much like describing the generalized concept that collects handles, references, pointers, and arrays, without saying that they handle, refer to, point to, or arrange something, which is very easy to say colloquially eg “this handle points to something” even though only pointers should “point”, a handle “handles” something but my prose-ometer violently rejects such phrasing

In particular I’ve taken a good deal of influence from the Ix and Data.Array.IArray classes - in fact, there is now an Addr class that corresponds to a weaker notion in between Eq and Ix - it turns out that addresses are in general not orderable, not even partially orderable, but rather only pre-orderable because it is possible that a < a - as a concrete albeit historical example, the i8086 address space and its segmented pointers. Never have I needed a PreOrd class until now!
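
One way to read the segmented-pointer example is sketched below; this is an interpretation, not the Addr class from memalloc. On the 8086 a segment:offset pair maps to the 20-bit linear address segment * 16 + offset, so distinct pairs can alias the same location. Comparing pointers by linear address is then reflexive and transitive but not antisymmetric, which makes it a preorder and not a partial order. The names SegPtr, linear, leq, and aliases are hypothetical.

```haskell
import Data.Bits (shiftL, (.&.))
import Data.Word (Word16, Word32)

-- A real-mode segmented pointer.
data SegPtr = SegPtr { segment :: Word16, offset :: Word16 }
  deriving (Eq, Show)

-- 20-bit linear address; the mask models wrap-around at 1 MiB.
linear :: SegPtr -> Word32
linear (SegPtr s o) = ((fromIntegral s `shiftL` 4) + fromIntegral o) .&. 0xFFFFF

-- Preorder on pointers: compare where they land.
leq :: SegPtr -> SegPtr -> Bool
leq a b = linear a <= linear b

-- Two pointers alias when each is <= the other.
aliases :: SegPtr -> SegPtr -> Bool
aliases a b = leq a b && leq b a

-- Structurally different, yet both land on linear address 0x10.
p1, p2 :: SegPtr
p1 = SegPtr 0x0000 0x0010
p2 = SegPtr 0x0001 0x0000
```

Here p1 /= p2 under the derived Eq, while aliases p1 p2 holds, so antisymmetry fails.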

I have also been working on a pre-proposal to split up Data.Bits into a more structured hierarchy, because doing so has actually been helpful to achieve the above, and relates to MemoryAccess mem Bit as well. The proposal has both a simple proposed hierarchy of Boolean => Bitwise => Bitfield née Bits, and an alternative extended hierarchy that goes as far as Boolean => Bitwise => Bitfield => IntegralBitfield => SignedIntegralBitfield => TwosComplementBitfield, which pairs nicely with Num* and fromInteger, and does things like eg disambiguate logical and arithmetic shifts.

* Since it effectively defines signOf / signNum and fromInteger / toIntegralBitfield, it could even be related to / placed underneath Num leaving it to be even more ring-ish except that would force every number-like thing to talk about binary representations which would be terrible.
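
A toy version of the simple Boolean => Bitwise => Bitfield hierarchy might look like the sketch below. Only the three class names come from the text; which methods live in which class is a guess made for illustration and is not taken from the pre-proposal. The instances delegate to the existing Data.Bits functions.

```haskell
import qualified Data.Bits as B
import Data.Word (Word8)

-- Boolean algebra only: no notion of bit positions yet.
class Boolean a where
  false :: a
  true  :: a
  conj  :: a -> a -> a
  disj  :: a -> a -> a
  neg   :: a -> a

-- Adds indexable bits (assumed split).
class Boolean a => Bitwise a where
  testBitAt :: a -> Int -> Bool

-- Adds a finite width and shifting (assumed split).
class Bitwise a => Bitfield a where
  bitWidth  :: a -> Int
  shiftLeft :: a -> Int -> a

-- Bool is Boolean but has no bit positions.
instance Boolean Bool where
  false = False
  true  = True
  conj  = (&&)
  disj  = (||)
  neg   = not

instance Boolean Word8 where
  false = 0
  true  = B.complement 0
  conj  = (B..&.)
  disj  = (B..|.)
  neg   = B.complement

instance Bitwise Word8 where
  testBitAt = B.testBit

instance Bitfield Word8 where
  bitWidth  = B.finiteBitSize
  shiftLeft = B.shiftL

-- Generic code written against the hierarchy instead of Data.Bits.
popCountVia :: Bitfield a => a -> Int
popCountVia x = length [i | i <- [0 .. bitWidth x - 1], testBitAt x i]
```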

Regardless of any acceptance of such proposal, the resulting typeclasses should have significant utility for eg low-level memory encoding and cryptography. Once I have finished the process of editing, I’ll be publishing the update to the memalloc thread as well.

Meeting Notes 12/8/25

Leo

  • Out sick half of the week, still recovering
  • Working on an update to memalloc
  • Working on a response to Jack’s questions - am glad for the interest
  • Looking at how to integrate / pull some allocation convenience functions from botan-low’s C FFI hook generator functions, so we can start using memalloc
  • Focusing on integration, so providing a better surface API (eg botan)

Joris

  • Working on hs-bindgen, it’s working
  • Improving it
  • Inspiration from rust bindings
  • Same author as the C++ library
  • Has a custom setup that configures botan bindings on the fly
  • Relies on pkg-config for now, but almost ready with other options
    • Example: rust bindings use pkg-config or you can give a directory
    • Also potential for vending the source directly
  • Solves the problem of how to vend / surface botan while allowing the user to configure it

Jose

  • Nothing to report

Meeting Notes 12/15/25

Joris

  • Will be away next week because of holidays
  • After the holidays will be working in a reduced capacity
  • Continued improving the setup hooks script for botan bindings
    • Supports using extra-include-dirs
    • Work is mostly complete, waiting on hs-bindgen release

Leo

  • Was sick w/ bad migraines (weather), little to report

Jose

  • Did not attend

Meeting Notes 12/22/25

Recovering, just me and Jose today, mostly just talked with Jose about what I’ve been looking at / working on

Leo

  • Binary representations
  • Looking at ‘array’ package for inspiration
  • Breaking apart Data.Bits
  • Creating a proper binary hierarchy
  • All useful for cryptography / botan because BitString and stuff
  • can use eg newtype MagnitudeBits a = Mk a to ‘pick’ out a specialized subset of bits to talk about, which is very useful
  • Is-a vs has-a problems
  • Eg is an allocator a memory space, or does it have a memory space
  • Tried with DFs, FDs, indicates it’s my hierarchy that’s the problem
  • Studying Array a i e for inspiration, maybe Allocator alr lay aln
  • Want to separate eg allocation vs address vs pointer
  • Maybe decouple allocation from address - then allocation can have or can be
  • Address vs eg additional allocation data eg refcount
  • Trying to split up addressing, finding (suitable) addresses, storing at an address, reading from an address, registering that address, and releasing it
  • Because an address space only cares about addresses, a memory space stores and loads at addresses so it is an address space but it doesn’t necessarily care about registering addresses its just putting a thing into the memory units according to a given layout - it says nothing about the memory space tracking what is where - and the allocator cares about vending objects which MAY involve registering addresses, if the allocation is an address and not a value being passed around
  • Still trying to figure out how / where to stitch / cross the threshold of ‘addr’ to ‘aln a’ to wrapping it again ‘bs = (ptr u8, int)’
  • We’re basically taking a concrete type ‘Addr’, adding a phantom type by wrapping it as a ‘Ptr a’, then wrapping a concretized version of that as a ByteString = (Ptr Word8, Int)
  • allocative functors between monofunctors and functors (constrained by size not type)
  • haskell’s implicit allocator, every lifted value is a pointer which inverts things syntactically (we have ‘Int’ instead of ‘Lifted Int’) which makes things problematic
  • Haskell modeled as an infinite register machine
  • Problems of constraining allocation types - monomorphic
  • How Storable is really about producing a Layout
  • How Addr is between Ord and Ix (if we say Ord => Addr)
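
The note that Storable is really about producing a Layout can be made concrete with a small sketch. The Layout record and the functions layoutOf and arrayLayout are hypothetical, not the memalloc definitions; sizeOf and alignment are the real Storable methods from base.

```haskell
import Data.Proxy (Proxy (..))
import Data.Word (Word32, Word8)
import Foreign.Storable (Storable, alignment, sizeOf)

-- Hypothetical layout: how many bytes, and on what boundary.
data Layout = Layout
  { layoutSize  :: Int
  , layoutAlign :: Int
  } deriving (Eq, Show)

-- sizeOf and alignment never inspect their argument, so an undefined
-- value is enough to select the Storable instance.
unproxy :: proxy a -> a
unproxy _ = undefined

-- The layout a Storable instance implies for a single value.
layoutOf :: Storable a => proxy a -> Layout
layoutOf p = Layout (sizeOf (unproxy p)) (alignment (unproxy p))

-- The layout of n contiguous values of the same type.
arrayLayout :: Storable a => proxy a -> Int -> Layout
arrayLayout p n = (layoutOf p) { layoutSize = n * sizeOf (unproxy p) }

-- Example: a buffer of 16 Word32 values needs 64 bytes.
wordBuffer :: Layout
wordBuffer = arrayLayout (Proxy :: Proxy Word32) 16
```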

Jose

  • Jose will be out next week for the holidays

Meeting Notes 12/29/25

No meeting / no notes (I would be the only one attending)

Leo

  • Is working on this update / out for the holiday

Jose

  • Is out for the holiday

Joris

  • Is out for the holiday / working in a reduced capacity (but has continued to make commits - I see them!)

Hey Leo,

Do you think there’s space in this project for a student to help via GSoC?

You might find some API inspiration in the semigroupoids library, which introduces e.g. class Apply which “should” be a superclass of class Applicative.

@LaurentRDC Quite possibly - what does that entail?

@jackdk Ah yes another one of ye olde haskell warts to be corrected for a more modern haskell :slight_smile: