Weekly update!
I’m not dead, I’m just taking a nap.
It has been a little slow getting started, but slow and steady wins the race condition. My health is good and continues to improve, but I must admit, I am still a bit more physically tired than I thought; much of the time I set aside ended up going to getting some extra sleep that I needed and only realized when I tried to take off a few ‘easy’ days to work on this.
In the meantime, Joris has continued to tend to maintenance and git tickets, and we met again on Monday to discuss some of the more technical points of what the botan libraries needs next, and how to approach it. I’m getting to like having some help on this ![]()
As a result, I have a fairly concrete plan, that I will now elaborate.
The plan
There are two major needs for improving the botan-low library and producing the botan high library.
The first issue to solve that the low-level libraries are dependent on bytestring which isn’t the worst, but requires awkward / inefficient copying / marshalling between and adhering to the restrictions of garbage-collected bytestrings which is problematic for security reasons. Really, I’d like to make it operate over bytestring-like things, including primitives such as ByteArray#.
-- In other words, this sucks, even if we `type` alias things.
type Digest = ByteString
hash :: ByteString -> Digest
-- Can't do this without OverloadedStrings and explicit typing
hash "foo"
-- Definitely can't do this
hash "foo"# -- Unboxed literal!
-- I'd definitely prefer something like this, because then we could
hash :: (ByteArray ba) => ba -> Digest
NOTE: Speaking of unboxed things, see the note further down in
memory
Solving this problem is also needed for improving / making the low-level bindings consistent in terms of allocating memory buffers for the C API to fill; C has a lot of different patterns for allocating memory, passing arguments, and returning multiple data, so there are a lot of convenience functions in botan-low for “generating” bindings to various functions depending on what they are passed and how - essentially, it makes things like nested arrays easier to deal with, and vastly reduces the amount of duplicated code. All of this ahem memory management and is needed first.
NOTE: And it’s not just a
ByteArrayclass that we need. Memory management and allocation are both important to cryptography - the former deals with safely creating data in a secure place, and the latter deals with destroying it - because part of cryptography deals with how to hold secret information safely, both while allocating to and reading from memory. Yetmemorydoesn’t deal with allocators - more on that further below.
The second issue to solve is that of developing high-level interface and set of cryptographic abstractions, which also really shouldn’t be reliant on bytestring either (we’ll ignore that for now) - in addition to supporting other types, I plan on using data families to allow for strongly-typed algorithms, something like:
-- A simple pure interface
class Hash h where
data family Digest h :: *
hash :: ByteString -> Digest h
NOTE: Data families, not type families. I know some folks don’t take kindly to
strangerstype families
At its simplest, that works, but that only serves to describe the one-shot / offline hashing, and when we attempt to describe incremental / online algorithms, we run into a thorny issue - we have
-- An extended interface that allows for a context
-- Technically, either:
-- A) The new hash context should be an updated copy of the original
class Hash h => IncrementalHash h where
data family HashCtx h :: *
newHash :: HashCtx h
updateHash :: HashCtx h -> ByteString -> HashCtx h
finalizeHash :: HashCtx h -> Digest h
-- OR
-- B) It is a reference to or a mutable object and this should be enforced -- via `LinearTypes`.
class Hash h => IncrementalHashL h where
data family HashCtx h :: *
newHashL :: HashCtx h
updateHashL :: HashCtx h %1 -> ByteString -> HashCtx h
finalizeHashL :: HashCtx h %1 -> Digest h
-- OR
-- C) We must use monads
class Hash h => IncrementalHashM h where
data family HashCtx h :: *
newHashM :: m (HashCtx h)
updateHashM :: HashCtx h -> ByteString -> m ()
finalizeHashM :: HashCtx h -> m (Digest h)
Note that while A is functional, it is grossly inefficient and cryptography is heavily dependent on mutating the same state over and over - so while a pure functional description such as interface A is possible and suffices, any efficient library is actually going to use mutable state under the hood, so interfaces B and C are more likely to be necessary, and if we are binding to a third-party library, we may not even get a choice.
This issue only gets more thorny, and more important with more complex algorithms than mere hashing - algorithms with key-pairs and nonces and ciphertexts and signatures.
NOTE: And even thornier, we must consider what an interface with a proper
Memoryclass orAllocatorargument would be
So what to do about this?
The solution to both is actually found in the memory / memoria package, with the ByteArray/Access class, but for many reasons, I find this package to be awkward to use (much in the same way that I find Bits and Num ). This also doesn’t solve the problem of managing allocation, does it?
NOTE: I must say that I do enjoy the rare usage of unboxed tuples / sums / data / literals in
memory, sometimes I wish for aPrimitiveHaskellextension or adata#and->#syntax sugar. Extra note: Unboxed data types are different fromStrictData.
So I did dig up and read the conversation that happened in Improving memory with better abstractions while I was gone
I do need to respond to everything in that thread, but I want to call out one response in particular:
I mean the trade-off between a) “need to control memory allocation” and b) “I don’t particularly care about memory allocation”. Once the trade-off has been made in favor of a), improving the API of the memory package ranks high in desirability.
I strongly agree, and to that end I do have a version of the ByteString interface that has been adapted to support an allocator argument - so I have begun exploring this space. In addition, I have begun exploring / classifying the needs of describing Memory vs Arrays, Handles vs Pointers vs Addresses, Memory ‘Units’ a la classical / discrete Bit, analog / continuous Level, and quantum / probabilistic Qubit, memory Layouts a la Struct of Arrays vs Array of Structs vs complex memory layouts in GPUs.
Another large problem is that GHC Haskell assumes that pointers are OS-managed userland C pointers, and that is an incredibly naive and restrictive view on what pointers are and there is isn’t support for things like fat or smart pointers, or segmented memory pointers, or pointers in an interpreted machine, let alone describing things like striping or virtual addressing.
For instance, what if I had a C interpreted virtual machine written in Haskell, and I wanted to be able to transparently allocate memory in the virtual machine that has its own address format and space?
I have written a rather large amount of notes about this so I have some good idea of what needs to happen, but translating it / synthesizin code is tedious and there are many degrees of freedom. Yes, I do want to make where ByteArray ba ~= MemArray CPtr ByteAddressable Bit and make MemArray QPtr BitAddressable Qubit a thing because why not but also is that the most important aspect right now?
NOTE: I am intrigued by that roughly speaking Bit = 1-dimensional discrete, Level = 1-dimensional continuum, and Qubit = 2-dimensional continuum. Byte-addressing turns 1-d bit addressing into a 2-d byte-plus-bit-offset addressing
So, in short, to support improvements to botan-low and to create botan we really need to improve support for memory, so that we can use those abstractions to help wrangle these very stateful concepts into something nice.
I will expand upon this in more detail in an upcoming post to the Improving memory with better abstractions thread some time this week.
Closing and Teaser
It isn’t out of nowhere that I am interested in low-level memory management. Rather, memory management poses an interesting question for a functional language that would rather abstract it away - because often, allocators and memory management deal with implementation details, which we are supposed to hide, and yet we also need to classify them.
It also isn’t out of nowhere that I am interested in cryptography - which also interestingly deals with things that don’t normally come up in a pure functional context - after all, the idea of “secret data” that “must not be known” is an odd one - how does one describe “My Secret Key” when “Mine” is a relative concept - an implementation detail? Also, how do I talk about something that fundamentally I can’t have - like “Your Secret Key”?
You’ll note that just like memory layouts & management being an implementation detail that is more or less of no use in the pure and abstract world, cryptography also has almost no use in the pure world itself either, because it is concerned with providing integrity and authenticity in order to keep the pure world pure - just like memory layouts and calling conventions are an implementation detail to be dealt with by the compiler specific to that machine, OS, and architecture.
So, what is this odd similarity that they share - these things that we should only care about enough to make it such that we no longer care about? Just as I don’t care about my memory layout particulars except that it is fast and correct, I don’t care about my cryptographic algorithm particulars except that they are safe and secure - we care about them enough to make them disappear way-below…
The answer is that memory management, cryptography and secret management, and distributed computing all fall under the purview of something that I have taken to calling “Displacement types”, which is something that I have been slowly developing for several years and have alluded to before - something that I have talked about with individuals in private, but never publicly - but now I feel it is complete enough and I am confident enough to talk about it properly, especially now that I’ve developed it sufficiently to established a relation to memory management and cryptography and even quantum programming (which was also really fun and I should do a write-up on).
NOTE: Way-below and way-above (a particular facet of partial orders) are concepts that I definitely also need to write about as a precursor to displacement types, because displacement is split into two flavors - above and below.
Anyway, this is about the limit of my writing energy. Until next time!