A Space Oddity Update: Big Changes Coming
This update has been a long time coming. I don’t mean since the last update - I mean that it has things in it that I’ve wanted to get out for over a year. I have been diligently clearing the way for this for a very long time, and it has produced 2 other libraries as a side effect - it is time to show you what all that effort has been for.
(Ten) Ground Control (Nine) to Major Tom (eight, seven)
memalloc has turned out to be in fine shape. I was finally able to add the last piece I needed to make it minimally viable. The question here was how to generalize the withPtr function to support arbitrary memory types, and to allow not just drilling down step by step, but drilling down to a specific memory type. For example, if you have a Ptr Word8 inside a ForeignPtr inside a ByteString, you want to be able to call withPtr directly, instead of calling withMem twice: once with the ByteString to get the ForeignPtr, and once with the ForeignPtr to get the Ptr.
class (Memory mem, Memorable memo) => WithMemory mem memo where
    withMemory :: memo -> (mem (MemRep memo) -> IO a) -> IO a
    default withMemory :: (Mem memo ~ mem) => memo -> (mem (MemRep memo) -> IO a) -> IO a
    withMemory = withMem

instance {-# OVERLAPPABLE #-} (Memory mem, Memorable memo, Mem memo ~ mem) => WithMemory mem memo where

#if defined(USE_BYTESTRING)
instance WithMemory Ptr ByteString where
    withMemory ptr action = withMem ptr $ \ fPtr -> withMem fPtr action

instance WithMemory ForeignPtr ByteString where
#endif

type WithPtr memo = WithMemory Ptr memo

withPtr :: WithPtr memo => memo -> (Ptr (MemRep memo) -> IO a) -> IO a
withPtr = withMemory

type WithForeignPtr memo = WithMemory ForeignPtr memo

withForeignPtr :: WithForeignPtr memo => memo -> (ForeignPtr (MemRep memo) -> IO a) -> IO a
withForeignPtr = withMemory
In the end, I turned to my first ever use of an OVERLAPPABLE instance, which I hope is both safe and justified. Normally I would resort to some type-level and type family shenanigans, so I might revisit a few places in memalloc and the math library where I did, to see whether this is a more appropriate way of handling things than a recursive type family.
I am quite pleased with the result here, however.
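To make the instance-selection idea concrete, here is a minimal sketch with toy types. Layered, DrillTo, Raw, Managed, and Blob are made-up names standing in for Memorable, WithMemory, Ptr, ForeignPtr, and ByteString; none of this is memalloc’s actual code, and UndecidableInstances is enabled only as a precaution.

```haskell
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE UndecidableInstances  #-}

-- Hypothetical three-layer nesting: a Blob holds a Managed, which holds a Raw.
newtype Raw     = Raw Int
newtype Managed = Managed Raw
newtype Blob    = Blob Managed

-- Every layered type knows the type exactly one layer beneath it.
class Layered a where
    type Below a
    peel :: a -> (Below a -> r) -> r

instance Layered Managed where
    type Below Managed = Raw
    peel (Managed raw) k = k raw

instance Layered Blob where
    type Below Blob = Managed
    peel (Blob managed) k = k managed

-- Drill down to a chosen target layer in one call.
class DrillTo target a where
    drillTo :: a -> (target -> r) -> r

-- Fallback: the target is exactly one layer down, so a single peel suffices.
instance {-# OVERLAPPABLE #-} (Layered a, Below a ~ target) => DrillTo target a where
    drillTo = peel

-- More specific instance: skip the middle layer by peeling twice.
instance DrillTo Raw Blob where
    drillTo blob k = peel blob $ \managed -> peel managed k
```

With these definitions, drillTo (Blob (Managed (Raw 42))) (\(Raw n) -> n) selects the specific instance and evaluates to 42, while asking the same Blob for a Managed falls through to the overlappable instance.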
(Six) Commencing (five) countdown, engines (four) on (three)
In the most recent meeting on Monday (nothing of note has occurred recently due to travel and other events), Joris Jose and I discussed the needs and effects of integrating the new memory allocation code into botan-low, and came to the conclusion that it doesn’t really fit the old plan of botan-bindings -> botan-low -> botan. Surprisingly, I quite like the state that botan-low is in. That is to say, it’s not that there aren’t improvements to be made, but the IO-ByteString interface is quite effective for what it is, and I see no sense in mucking that up for no reason.
Sometimes you go back and look at old code and are embarrassed, other times you are proud - the underbelly still needs the refactoring that memalloc is designed for, but I can’t really justify changing what is a stable API for no reason. And since the old plan was only an idea that we are not beholden to, the plans have been changed.
How have they changed?
The new memory management library provides allocators, and by passing the allocator as an argument, we can, for example, specify that we want a SecureByteStringAllocator that guarantees cleanup, instead of the default ByteStringAllocator, which cleans up lazily. This changes the API:
-- This
rngGet :: RNG -> Int -> IO ByteString
-- Changes to this
type ByteAllocator alr bytes =
    ( Allocator alr
    , Allocation alr ~ bytes
    , Layout alr ~ Int
    , Bytes bytes
    )
-- ...
rngGet :: (ByteAllocator alr bytes) => alr -> RNG -> Int -> IO bytes
This is nice: it gives us quite a bit of freedom in interacting with the botan-bindings C interface. Given that this will change almost every function, we decided it best to make botan-low a more expressive interface with a major version jump, and to re-expose the existing IO-ByteString-focused interface in botan-io-bytestring. This also gives us reason to bring in some of the higher-level ergonomics that are lying fallow in the unpublished botan, since we are no longer beholden to keeping botan-low as a 1:1 reflection of the botan-bindings interface.
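As a rough illustration of why passing the allocator changes the result type, here is a self-contained sketch that uses only base. ToyAllocator, Allocated, allocWith, ListAllocator, ForeignAllocator, and fakeRngGet are all hypothetical names, and the "random" bytes are a constant fill; the real memalloc classes and botan calls will differ.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)

-- Hypothetical, heavily simplified allocator class: the allocator value
-- decides what kind of result the caller gets back.
class ToyAllocator alr where
    type Allocated alr
    -- Allocate n bytes, let the action fill them, then package the result.
    allocWith :: alr -> Int -> (Ptr Word8 -> IO ()) -> IO (Allocated alr)

-- Copies the filled buffer out into an ordinary list.
data ListAllocator = ListAllocator

instance ToyAllocator ListAllocator where
    type Allocated ListAllocator = [Word8]
    allocWith _ n fill = do
        fptr <- mallocForeignPtrBytes n
        withForeignPtr fptr $ \p -> fill p >> peekArray n p

-- Hands back the managed buffer itself, without copying.
data ForeignAllocator = ForeignAllocator

instance ToyAllocator ForeignAllocator where
    type Allocated ForeignAllocator = ForeignPtr Word8
    allocWith _ n fill = do
        fptr <- mallocForeignPtrBytes n
        withForeignPtr fptr fill
        pure fptr

-- Stand-in for rngGet: the filler writes the constant byte 0x2A rather
-- than calling into botan.
fakeRngGet :: ToyAllocator alr => alr -> Int -> IO (Allocated alr)
fakeRngGet alr n = allocWith alr n $ \p -> fillBytes p 0x2A n
```

Here fakeRngGet ListAllocator 4 produces a [Word8], while fakeRngGet ForeignAllocator 4 produces a ForeignPtr Word8 from the same function body, which is the kind of freedom the new signature is after.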
The Bytes class needs just a wee bit more work before I can use it here in botan, so today, we are going to finish ripping the guts out of the Make / Remake / mkFoo classes first, and with great relish, replacing them with ByteString allocators from memalloc.
(Two) Check ignition (One) and may god’s love (lift off…) be with you
RNG
Last time, we defined a BotanObject class, and used it to manage turning a Ptr into a ForeignPtr into a BotanObject and back. Now, it is time to use it. In fact, let’s rewrite the entire RNG module… I know that sounds drastic, but trust me on this.
We have a few more imports this time around, courtesy of the memalloc library:
import Botan.Bindings.RNG
import Botan.Bindings.ConstPtr (ConstPtr (..))
import Botan.Low.Internal.Object
import Botan.Low.Internal.Error
import Memory.Memory
import Memory.Pointer
import Memory.Castable
import Memory.Allocator
import Memory.Allocator.ByteString
import Data.ByteString (ByteString)
import qualified Data.ByteString as ByteString
import Foreign.Ptr (Ptr)
import Foreign.ForeignPtr (ForeignPtr)
Step 1 of implementing a module with the new Memory types is to define your memory instances. We sort of did this last time, so we’ll revisit with minor improvements due to changes since then.
-- This should probably go in botan-bindings
instance Memorable BotanRNG where
    type Mem BotanRNG = Ptr
    type MemRep BotanRNG = BotanRNGStruct
    withMem (MkBotanRNG ptr) action = action ptr

newtype RNG = MkRNG { foreignPtr :: ForeignPtr BotanRNGStruct }

instance Memorable RNG where
    type Mem RNG = ForeignPtr
    type MemRep RNG = BotanRNGStruct
    withMem (MkRNG fptr) action = action fptr

instance BotanObject RNG where
    type BotanStruct RNG = BotanRNGStruct
    type BotanPtr RNG = BotanRNG
    toBotanPtr = MkBotanRNG
    toBotan = MkRNG
    botanFinalizer = botan_rng_destroy
And then it’s engines on - didn’t you hear the countdown?
First, we define a withRNG helper, which is withBotanPtr specialized - it makes things easier to read.
withRNG :: RNG -> (BotanRNG -> IO a) -> IO a
withRNG = withBotanPtr
Then, we define our RNG initializer - it is very straightforward: We get the RNGType’s name’s pointer as a ConstPtr CString and pass it in and createBotan handles the actual initialization logic, so all we have to do is turn arguments into pointers.
rngInit :: RNGType -> IO RNG
rngInit t = ByteString.useAsCString (rngTypeName t) $ \ tPtr -> do
    createBotan @RNG $ \ outPtr -> botan_rng_init outPtr (ConstPtr $ cast tPtr)
I’d use withPtr here instead of ByteString.useAsCString, except that the trailing null terminator actually matters, and so the copy is required. I will probably either put a helper in the memalloc library to make this more convenient, or have the ...TypeName functions add the \0 themselves. It is a minor nit.
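The terminator issue can be seen with base alone. The sketch below uses String marshalling from Foreign.C.String as an analogue, not ByteString itself: withCAString hands the action a NUL-terminated copy, while withCAStringLen only reports a length and promises nothing about the byte after it. The helper names byteAfterCAString and reportedLength are made up for this example.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAString, withCAStringLen)
import Foreign.Storable (peekByteOff)

-- Read the byte just past the last character of a string marshalled with
-- withCAString. That function documents the result as NUL terminated, so
-- this byte is the terminator a C function expecting a C string relies on.
byteAfterCAString :: String -> IO Word8
byteAfterCAString s = withCAString s $ \p -> peekByteOff p (length s)

-- The length-carrying variant only tells us how many bytes are valid;
-- it does not guarantee a terminator after them.
reportedLength :: String -> IO Int
reportedLength s = withCAStringLen s (pure . snd)
```

ByteString.useAsCString behaves like the first case (it copies and appends the terminator), whereas a raw pointer into a ByteString’s buffer behaves like the second, which is why the copy cannot simply be dropped here.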
Previously botan-low passed around context and algorithm types as ByteStrings, and that wasn’t very fun to manage. The higher-level botan has had ADTs for the algorithm types just sitting there, so I’ve been bringing them down to botan-low because we’re going to want to reuse them in any library built on top.
data RNGType
    = System
    | User
    | UserThreadsafe
    | RDRand

rngTypeName :: RNGType -> ByteString
rngTypeName System = BOTAN_RNG_TYPE_SYSTEM
rngTypeName User = BOTAN_RNG_TYPE_USER
rngTypeName UserThreadsafe = BOTAN_RNG_TYPE_USER_THREADSAFE
rngTypeName RDRand = BOTAN_RNG_TYPE_RDRAND
We still point to the constants declared in botan-bindings, but the ADT is much friendlier.
rngGet is our first real test - both as the first testable action of any BotanObject using the new initializers, and as a test of our allocators as a means of getting data out of the C / Botan world:
rngGet :: RNG -> Int -> IO ByteString
rngGet rng len = withRNG rng $ \ botanRNG -> do
    allocInit ByteStringAllocator len $ \ fPtr -> withPtr fPtr $ \ bytesPtr -> do
        throwBotanIfNegative_ $ botan_rng_get botanRNG bytesPtr (fromIntegral len)

systemRNGGet :: Int -> IO ByteString
systemRNGGet len = allocInit ByteStringAllocator len $ \ fPtr -> withPtr fPtr $ \ bytesPtr -> do
    throwBotanIfNegative_ $ botan_system_rng_get bytesPtr (fromIntegral len)
Ah! The only beef I have with my design here is that when the ByteStringAllocator is initialized, the Mem ByteString is actually ForeignPtr Word8 and so we need an extra withPtr step to drill down to Ptr Word8 - this isn’t necessarily a bad thing, but I will probably put a convenience function in the ByteStringAllocator module to do this automatically. Otherwise, this is dirt simple!
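For comparison, bytestring itself already ships a function with the shape such a convenience helper would have: Data.ByteString.Internal.create allocates a buffer of the given size and hands the action a Ptr Word8 directly. The sketch below leans on that; allocBytesWithPtr and fakeSystemRNGGet are hypothetical names, the fill is a constant byte rather than a botan call, and this is not how memalloc’s ByteStringAllocator is implemented.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as ByteString
import qualified Data.ByteString.Internal as Internal
import Data.Word (Word8)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)

-- Hypothetical helper: allocate len bytes and expose them as a Ptr Word8
-- in one step, with no intermediate ForeignPtr for the caller to unwrap.
allocBytesWithPtr :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString
allocBytesWithPtr = Internal.create

-- Stand-in for systemRNGGet: fills the buffer with 0x2A instead of
-- calling botan_system_rng_get.
fakeSystemRNGGet :: Int -> IO ByteString
fakeSystemRNGGet len = allocBytesWithPtr len $ \ bytesPtr ->
    fillBytes bytesPtr 0x2A len
```

A helper along these lines would let the two-lambda allocInit / withPtr chain collapse into a single callback at each call site.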
These functions didn’t really change (withRNG silently handled the transition):
rngReseed :: RNG -> Int -> IO ()
rngReseed rng bits = withRNG rng $ \ botanRNG -> do
    throwBotanIfNegative_ $ botan_rng_reseed botanRNG (fromIntegral bits)

rngReseedFromRNG :: RNG -> RNG -> Int -> IO ()
rngReseedFromRNG rng source bits = withRNG rng $ \ botanRNG -> do
    withRNG source $ \ sourcePtr -> do
        throwBotanIfNegative_ $ botan_rng_reseed_from_rng botanRNG sourcePtr (fromIntegral bits)
Finally, we make use of withPtr to supply entropy to an RNG.
rngAddEntropy :: RNG -> ByteString -> IO ()
rngAddEntropy rng bytes = withRNG rng $ \ botanRNG -> do
    withPtr bytes $ \ bytesPtr -> do
        throwBotanIfNegative_ $ botan_rng_add_entropy botanRNG (ConstPtr bytesPtr) (fromIntegral $ ByteString.length bytes)
That is it. RNG module refactored. Got energy for one more?
Hash
Step 1: Define our object and memory instances, same as RNG.
instance Memorable BotanHash where
    type Mem BotanHash = Ptr
    type MemRep BotanHash = BotanHashStruct
    withMem (MkBotanHash ptr) action = action ptr

newtype Hash = MkHash { foreignPtr :: ForeignPtr BotanHashStruct }

instance Memorable Hash where
    type Mem Hash = ForeignPtr
    type MemRep Hash = BotanHashStruct
    withMem (MkHash fptr) action = action fptr

instance BotanObject Hash where
    type BotanStruct Hash = BotanHashStruct
    type BotanPtr Hash = BotanHash
    toBotanPtr = MkBotanHash
    toBotan = MkHash
    botanFinalizer = botan_hash_destroy

withHash :: Hash -> (BotanHash -> IO a) -> IO a
withHash = withBotanPtr @Hash
Initializing a hash is pretty much the same as RNG:
hashInit :: HashType -> IO Hash
hashInit htype = ByteString.useAsCString (hashTypeName htype) $ \ htypePtr -> do
    createBotan @Hash $ \ outPtr -> botan_hash_init outPtr (ConstPtr htypePtr) 0
The Hash type is a bit more complicated though
In a good way…
data HashType
    = BLAKE2b BLAKE2bSize
    | GOST_34_11
    | Keccak1600 Keccak1600Size
    | MD4
    | MD5
    | RIPEMD160
    | SHA1
    | SHA224
    | SHA256
    | SHA384
    | SHA512
    | SHA512_256
    | SHA3 SHA3Size
    | SHAKE128 SHAKE128Size
    | SHAKE256 SHAKE256Size
    | SM3
    | Skein512 Skein512Size Skein512Salt
    | Streebog256
    | Streebog512
    | Whirlpool
    -- Combination strategies
    | Parallel HashType HashType
    | Comb4P HashType HashType
    -- Checksums
    | Adler32
    | CRC24
    | CRC32
    deriving stock (Eq, Ord, Show)

hashTypeName :: HashType -> ByteString
hashTypeName = undefined -- implementation omitted
We actually ran into our first major hiccup in the Hash module - in botan, many functions that fill a buffer require you to first call them with a size pointer whose value is set to 0 in order to query the required buffer length. This is a bit awkward, but I have created this small helper function:
queryBufferLength :: (Ptr CSize -> IO CInt) -> IO Int
queryBufferLength fn = mask_ $ alloca $ \ szPtr -> do
    Ptr.poke szPtr 0
    code <- fn szPtr
    if code == BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE
        then fromIntegral <$> Ptr.peek szPtr
        else throwBotanError code
This works for a single buffer, but there are functions that take two buffers, with two size pointers - I’m not entirely sure how this will hold up, but we’ll get to that in due time. For now, this works quite well!
Here we use it to query the length of the buffer required to retrieve the hash’s name from a context object:
hashName :: Hash -> IO ByteString
hashName h = mask_ $ withHash h $ \ hPtr -> do
    n <- queryBufferLength $ \ szPtr -> botan_hash_name hPtr nullPtr szPtr
    name <- alloca $ \ szPtr -> do
        Ptr.poke szPtr (fromIntegral n)
        allocInit ByteStringAllocator n $ \ fPtr -> withPtr fPtr $ \ namePtr -> do
            throwBotanIfNegative_ $ botan_hash_name hPtr (cast namePtr) szPtr
    -- TODO Check n vs the actual length
    return $!! ByteString.takeWhile (/= 0) name
Now, I have mixed feelings about this, because queryBufferLength allocates a temporary size and returns its value, only for us to immediately shove it into a different temporary size pointer. I think we could re-use the same size pointer, but I want to wait until I deal with multi-buffer functions first - so for now, this inefficiency is acceptable, and is dwarfed by the need to copy the name buffer anyway.
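One possible shape for the single-size-pointer version is sketched below against a mock, not against botan. mockFill imitates a function that reports the required size through the size pointer and fails when the buffer is too small; insufficient is a stand-in for BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE with an assumed value of -10; queryThenFill is a hypothetical name, and it returns an Either instead of throwing so the example stays self-contained.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (peek, poke)

-- Stand-in error code (assumed value, see the paragraph above).
insufficient :: CInt
insufficient = -10

-- Mock of a buffer-filling C function: it always writes the required size
-- back through szPtr, and only touches buf when the caller offered enough room.
mockFill :: [Word8] -> Ptr Word8 -> Ptr CSize -> IO CInt
mockFill payload buf szPtr = do
    avail <- peek szPtr
    let needed = fromIntegral (length payload)
    poke szPtr needed
    if avail < needed
        then pure insufficient
        else pokeArray buf payload >> pure 0

-- Query the length and fill the buffer using one size pointer throughout.
queryThenFill :: (Ptr Word8 -> Ptr CSize -> IO CInt) -> IO (Either CInt [Word8])
queryThenFill fn = alloca $ \ szPtr -> do
    poke szPtr 0
    code <- fn nullPtr szPtr
    if code /= insufficient
        then pure (Left code)
        else do
            -- After the failed call, szPtr already holds the required size.
            n <- fromIntegral <$> peek szPtr
            allocaBytes n $ \ buf -> do
                code' <- fn buf szPtr
                if code' < 0
                    then pure (Left code')
                    else do
                        written <- fromIntegral <$> peek szPtr
                        Right <$> peekArray written buf
```

Whether this carries over to functions with two buffers and two size pointers depends on how botan reports the two sizes, which the sketch does not try to answer.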
This function barely changed - it just uses createBotan now
hashCopyState :: Hash -> IO Hash
hashCopyState source = withHash source $ \ sourcePtr -> do
    createBotan @Hash $ \ outPtr -> botan_hash_copy_state outPtr sourcePtr
No more mkAction, we just use withHash
hashClear :: Hash -> IO ()
hashClear h = withHash h $ \ botanHash -> do
    throwBotanIfNegative_ $ botan_hash_clear botanHash
These are a little bit longer since now they call alloca themselves instead of a convenience function, but they are still very clean:
hashBlockSize :: Hash -> IO Int
hashBlockSize h = mask_ $ alloca $ \ szPtr -> do
    withHash h $ \ botanHash -> do
        throwBotanIfNegative_ $ botan_hash_block_size botanHash szPtr
    fromIntegral <$> Ptr.peek szPtr

hashOutputLength :: Hash -> IO Int
hashOutputLength h = mask_ $ alloca $ \ szPtr -> do
    withHash h $ \ botanHash -> do
        throwBotanIfNegative_ $ botan_hash_output_length botanHash szPtr
    fromIntegral <$> Ptr.peek szPtr
Finally, we come to the actual hashing functions. Because of withPtr and allocInit, they are pretty trivial - almost exactly the same as rngGet in RNG:
hashUpdate :: Hash -> ByteString -> IO ()
hashUpdate h bs = withHash h $ \ hPtr -> do
    withPtr bs $ \ bsPtr -> do
        throwBotanIfNegative_ $ botan_hash_update hPtr (ConstPtr bsPtr) (fromIntegral $ ByteString.length bs)

type HashDigest = ByteString

hashFinal :: Hash -> IO HashDigest
hashFinal h = withHash h $ \ hPtr -> do
    sz <- hashOutputLength h
    allocInit ByteStringAllocator sz $ \ fPtr -> withPtr fPtr $ \ digestPtr -> do
        throwBotanIfNegative_ $ botan_hash_final hPtr digestPtr
And that’s it! That’s 2 modules down, with more following soon.
Conclusion
As a first real test of the memalloc library, I am proud of the result. Refactoring these 2 modules shows that it is more than sufficient to replace the original Make / Remake / mkBindings, since the new modules don’t use any of that anymore.
Exposing the allocator and generalizing from bytestring to bytes will only take a little more work once we are ready for that, too.
These newly refactored modules won’t be live until I get the memalloc library up on Hackage, since it is now a dependency, so I have to finish getting a few last things in shape, but I’ll be continuing to update the botan-low modules in the meantime - there’s nothing quite like having a use case to drive development forward.