Memory Functors
This isn’t quite ready for publishing yet, mostly because I am still mulling things over, but I think I have broken through the sticky wicket with something I hope you’ll like. A few things, actually.
Bits Proposal
First, I have been working on an extended Bits hierarchy. I have an actual proposal draft with more detail, but I’ll give you the CliffsNotes version because it is still in progress. The code is still shaking loose, but I am pleased with the fundamental breakdown so far.
Why? Because Haskell’s bitwise tooling is so poor that it is mentioned on Wikipedia:
“Haskell likewise currently lacks standard support for bitwise operations, but both GHC and Hugs provide a Data.Bits module with assorted bitwise functions and operators, including shift and rotate operations and an “unboxed” array over Boolean values may be used to model a Bit array, although this lacks support from the former module.” - Wikipedia, Bit Array, Language Support
I’d really like to fix that. So what are the problems?
The current Data.Bits is extremely monolithic, and conflates several distinct concepts and responsibilities, making it difficult to implement lawful instances.
Particular issues include:
Bits is huge - it has 22 different functions to implement
- It is large enough that many implementations are partial
- Eg they implement xor or testBit while more complex functions such as rotate are left unimplemented.
- It does not distinguish between logical and arithmetic shifts
FiniteBits is actually about fixed-width types, which excludes arbitrary-precision types (eg finite but dynamically sized)
bitSize, bitSizeMaybe, and finiteBitSize have historical issues
- None of the size functions actually check the value argument, which is necessary for arbitrary-precision types
.&. and .|. are super awkward*, and then we have xor instead of .^., probably because ^ is already used for exponentiation in Haskell.
* I’d love to actually steal (&&) and (||) from Data.Bool and recast them as (&&), (||) :: (Boolean a) => a -> a -> a, which is backwards-compatible with Bool
If we break the Bits (and FiniteBits) class down and group the functions by use, several distinct categories become apparent:
- Boolean algebra
complement, .&., .|., xor
- Multi-bit operations / shifts
shift, rotate, shiftL, unsafeShiftL, shiftR, unsafeShiftR, rotateL, rotateR
- Per-bit access
bit, setBit, clearBit, complementBit, testBit
- Size introspection
bitSizeMaybe, bitSize, finiteBitSize
- Population statistics
popCount, countLeadingZeros, countTrailingZeros
- Miscellaneous / representational
zeroBits, isSigned
A fair spread of concepts - one might say that the functions in the Bits class are united more by an implicit, unstated assumption about their expected performance than by any single concept.
The proposal is to replace the singular Data.Bits class with a hierarchy of three core classes - Boolean, Bitwise, and Bits - to fix multiple deficiencies, to distinguish between finite / arbitrary precision and fixed width / precision (eg Integer vs WordN 128), and maybe also to integrate Bytes née ByteArray more closely into that hierarchy.
This restructuring will provide a clearer separation of concerns, better type safety, and more lawful instances, and it will help provide a more extensible foundation for low-level programming in Haskell. If done carefully, it could be almost backwards-compatible; a FromDataBits wrapper for deriving via would make the transition easier.
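To give a feel for the shape, here is a minimal sketch of the split - the three class names are from the proposal, but the methods and signatures below are my own in-progress guesses, not the draft text:

import Prelude hiding ((&&), (||))
import qualified Prelude

infixr 3 &&
infixr 2 ||

-- Boolean algebra only: lawful without any notion of bits or width.
class Boolean a where
  complement :: a -> a
  (&&), (||) :: a -> a -> a
  xor        :: a -> a -> a

-- Backwards-compatible with Bool, per the footnote above:
instance Boolean Bool where
  complement = not
  (&&)       = (Prelude.&&)
  (||)       = (Prelude.||)
  xor        = (/=)

-- Multi-bit structure: per-bit access and shifts, no size assumptions.
class Boolean a => Bitwise a where
  testBit        :: a -> Int -> Bool
  bit            :: Int -> a
  shiftL, shiftR :: a -> Int -> a

-- Bits would then sit atop Bitwise, adding size introspection and
-- population statistics, with fixed-width vs arbitrary-precision split
-- into further subclasses rather than lumped into FiniteBits.

The point is that an instance can be lawful at the Boolean layer without having to promise shifts, sizes, or population counts.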
There are actually more classes in the hierarchy with respect to bit formats (eg TwosComplement), which I am still working on, as well as integrating concepts like Endianness, but they are not so pressing at present.
I am especially keeping in mind certain disciplines:
- Cryptography
- Game Development
- 3D modeling
- Graphical processing
Believe it or not, one of my major reasons for learning Haskell was to build a (distributed) game engine - something I used to be quite good at, something that needs good low-level memory management to run quickly, and cryptography to run safely and securely in a distributed environment.
I have always had this as a concrete, long-term goal, and a not-insignificant amount of this recent memory work has been aided by looking back on and translating some of my older (decades old by now) C projects and tech demos.
Units
Bit and Byte are now a thing, and have very specific meanings (smallest unit, and smallest addressable unit, respectively); the distinction between them and eg Bool or Word8 is that Bit and Byte do not have a value interpretation attached.
I would like to do the same for Word; however, I have run into the issue that Haskell already uses the nomenclature Word where other languages use UInt, while I need it to mean ‘largest addressable unit’ - the more widely-accepted meaning for Word. It does already mean that; it just also has a value interpretation attached that I want to separate off. I might just have to deal with newtyping Memory.Word over Data.Word so they can be separate concepts.
Why care about this? Proper handling of units makes more things possible, like declaring sizes with IEC or SI prefixes, eg newtype Bank 64 Kibi Byte = Bank (Ptr Word8), or declaring other base unit types in the future, such as Qubit.
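As a taste of how that might be encoded, here is a minimal sketch using DataKinds - Prefix, Unit, and this particular Bank wrapper are my own stand-ins to show the idea, not settled definitions:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import GHC.TypeNats (Nat)

-- Promoted vocabulary: SI (Kilo, Mega) and IEC (Kibi, Mebi) prefixes.
data Prefix = Kilo | Kibi | Mega | Mebi
data Unit   = Bit | Byte

-- A bank of raw memory whose size is carried entirely in its type.
newtype Bank (n :: Nat) (p :: Prefix) (u :: Unit) = Bank (Ptr Word8)

-- eg a 64 KiB memory bank:
type Bank64K = Bank 64 'Kibi 'Byte

The count, prefix, and unit live purely at the type level; the runtime representation stays a bare pointer.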
Functors
The main thing that I have been working on is memory functors.
You see - and this is the thing that I’ve been stuck on - Haskell is all about functors, and the problem is that memory can’t be a functor (or even a monofunctor):
- Memory can’t be passed by value, only by reference
- Memory can only contain primitive / unlifted / storable things
- Memory operations require a destination to put the result
- Memory operations require a length to be supplied
- Memory operations run in IO
This has several knock-on effects:
- You can’t really copy memory, only the contents
- Memory is where the values live, after all
- You need to either supply an existing destination, allocate a new one, or mutate the source
- Each of these affects functions differently
- They are not mutually exclusive, either
- Mutable memory is further constrained to act like a monofunctor
- This is because the contents are not lifted, and are of fixed size
Despite not actually being functors (not in the usual sense, anyway), I have managed to characterize the behavior in such a way that they are similar enough, and produce a set of typeclasses:
-- Takes a destination as an additional argument
class MemCopyFunctor mem where
  memCopyMap :: (a -> b) -> mem a -> mem b -> Int -> IO ()

-- Allocates a destination and returns it
class MemAllocateFunctor mem where
  memAllocateMap :: (a -> b) -> mem a -> Int -> IO (mem b)

-- Mutates the source in-place
class MemMutateFunctor mem where
  memMutateMap :: (a -> a) -> mem a -> Int -> IO ()

-- Memory folds
-- NOTE: The IO may be unnecessary
class MemFoldable mem where
  memFoldl :: (b -> a -> b) -> b -> mem a -> Int -> IO b
  memFoldr :: (a -> b -> b) -> b -> mem a -> Int -> IO b
  memFoldMap :: Monoid m => (a -> m) -> mem a -> Int -> IO m

-- Convenient
class MemCopyFunctor mem => MemCopyZipFunctor mem where
  memZipWithCopy :: (a -> b -> c) -> mem a -> mem b -> mem c -> Int -> IO ()
I’ve been iterating on this quite a bit (hence my delay in updating), but I eventually settled on recognizing that the most important functions in memory are not the ByteArray/Access classes, but rather the functions that they rely on for implementation, which are actually the functions in Data.Memory.ExtendedWords. Hence this successful iteration seeks to generalize those functions, and thus transitively get the ByteArray/Access functions more or less for free.
For example, although they should be defined per instance in a more efficient manner: memCopy = memCopyMap id, memSet mem x = memMutateMap (const x) mem, and memXor = memZipWithCopy Data.Bits.xor.
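Spelled out against the classes above (the standalone signatures here are mine, and real instances would override these with memcpy / memset-style primitives):

import qualified Data.Bits

-- Copying is just mapping the identity function into a destination.
memCopy :: MemCopyFunctor mem => mem a -> mem a -> Int -> IO ()
memCopy = memCopyMap id

-- Setting is mapping a constant function in-place.
memSet :: MemMutateFunctor mem => mem a -> a -> Int -> IO ()
memSet mem x = memMutateMap (const x) mem

-- Bitwise xor of two sources into a destination.
memXor :: (MemCopyZipFunctor mem, Data.Bits.Bits a)
       => mem a -> mem a -> mem a -> Int -> IO ()
memXor = memZipWithCopy Data.Bits.xor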
Notably, Traversable may not have a similarly sensible interpretation, mostly because memory can only contain primitive things, so no sequence since that would require mem (m a) and there may be other issues with mapM. However, it seems that Mem*Functor and MemFoldable are sufficient to accomplish quite a lot.
The practical effect of this design is that it covers the most-used portions of the memory interface, and most of the functions in ByteArray/Access can find (or already have found) representation with some combination of Allocator, Bytes, Mem*Functor, and MemFoldable - and given how sparse ByteArray/Access is compared to ByteString and Vector, we may actually end up with better coverage.
NOTE: I have not unified this with the earlier Allocator classes, but that is probably okay because how the allocator is chosen is left up to the implementation; otherwise, MemAllocateFunctor would need an explicit allocator argument.
I think I feel good about this, because with these classes, low-level Haskell programming is starting to feel “fun”, in the same way that programming in C can be - it is nice to be close to the metal, but at the same time I can fall back and rely on high-level Haskell to fill development gaps.
Bonus: Order
This is the third distinct time that I have wished for a more nuanced order typeclass hierarchy than just Eq and Ord - some very important classes burdened by legacy and history. Just look at the documentation and implementation notes in Data.Ord. I have taken the effort to at least illustrate what a proper order hierarchy could look like:
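Roughly like the following - rendered here as classes in place of a diagram, with method names that are my own quick sketch rather than a finished design:

-- Partial comparison can report that two values are incomparable.
data POrdering = PLT | PEQ | PGT | PIncomparable

-- Equivalence: reflexive, symmetric, transitive.
class Equivalence a where
  equiv :: a -> a -> Bool

-- Preorder: reflexive and transitive; cycles allowed (a directed graph).
class Preorder a where
  lte :: a -> a -> Bool

-- Partial order: a preorder that is antisymmetric up to equivalence
-- (a directed acyclic graph); some pairs may be incomparable.
class (Equivalence a, Preorder a) => PartialOrder a where
  pcompare :: a -> a -> POrdering

-- Total order: every pair is comparable; this is where Ord would sit.
class PartialOrder a => TotalOrder a where
  tcompare :: a -> a -> Ordering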
I am not entirely sure how to express the necessary functions for each typeclass, but if these were actually lawful classes, Eq and Ord could be softly relaxed / demoted to focus on their duty of providing convenient infix operators for data types that have equivalence and comparison relations on a non-exceptional subset of values - which is how they are actually being used now, because they are technically lawless as-is (consider NaN under Ord Double). Fixing this would put us on par with other modern languages that have more technically correct equivalence and comparison classes.
This will almost certainly not be accepted as a proposal, though, due to the burden of legacy - fixing Ord Double alone would probably cause many breakages.
Anyway, my justification for spending time on this is that it turns out that preorders (directed graphs, possibly with cycles) and partial orders (directed acyclic graphs) are intimately related to memory addressing: memory addresses are not only not guaranteed to be totally ordered, they are not even guaranteed to be partially ordered - they are only necessarily preordered*, though this preorder plus the equivalence relation of “addresses the same location” induces a weak ordering / total preorder duality (it is complicated).
* For example, segmented pointers allow multiple distinct addresses to point to the same location, which gives rise to cycles, because two distinct addresses a and b can satisfy both a ≤ b and b ≤ a.
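To make the footnote concrete, here is a toy 8086-style segmented address, reusing the Preorder and Equivalence classes sketched above (the example is mine, not from the proposal):

-- A segmented address; its linear location is segment * 16 + offset,
-- as on the 8086, so distinct addresses can name the same location.
data Seg = Seg { segment :: Int, offset :: Int }
  deriving (Eq, Show)

linear :: Seg -> Int
linear (Seg s o) = s * 16 + o

instance Preorder Seg where
  lte p q = linear p <= linear q      -- reflexive, transitive, has cycles

instance Equivalence Seg where
  equiv p q = linear p == linear q    -- "addresses the same location"

-- eg p = Seg 0 16 and q = Seg 1 0:
--   lte p q && lte q p  == True   (a cycle between distinct addresses)
--   equiv p q           == True
--   p == q              == False  (structurally distinct)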
Indeed, pre- and partial orders are very useful not just for memory addressing but for memory management algorithms in general, because garbage collection is all about reachability, which is what directed graphs (and thus pre- and partial orders) are good for.
Responding
@jackdk - some things have changed, but I will try to answer as best I can:
Many classes have class-specific fooEq functions: AddressSpace has addressEq, Handle has hdlEq. Would it be better to have class Eq (Address asp) => AddressSpace asp, class (Eq (h a), Eq1 h) => Handle h, etc?
The difference between (==), addrEq, and hdlEq is a subtle difference of equivalence - (==) is used for exact / structural equality, addrEq is used for same-location equivalence, and hdlEq is for same-reference equivalence, which is basically when addrEq = (==).
Are your fooInit functions (layoutInit, allocInit, etc) meant to be safe bracket-style functions? Would it make sense to shoehorn a with into those names? Or are these reserve-and-initialise? I think it might be the latter; would naming the functions like layoutAndInit be clearer here?
Both? Bracket-style functions allow us to turn an alloc-and-init function into a with-temporary-resource function. The difference is whether or not the allocation persists afterwards, which may be allowable with eg counted references. So mostly the latter, but designed to enable support for the former.
At the moment, layoutInit reads to me like it initialises a layout, but actually it seems like it takes a tag indicating the LayoutSpace as well as the layout and returns an Address.
The relation between address spaces, addresses, memory spaces, layouts, and allocators is something I am still clarifying / working on - right now the way I conceptualize it is that roughly asp + addr + layout = pointer, eg where alloc :: Int -> IO (Ptr Word8) ~ allocAddr :: asp -> layout -> IO addr. The pointer type tells us the address space, and the pointer content type tells us layout needs, which may be count, alignment, pixel format, etc.
I think this is why I kept eg oldLayout as an argument for deallocate - it is still shaking loose.
Am I correctly reading that an instance of LayoutSpace is not meant to be a singleton (e.g., you could have layout spaces for individual memory segments if you were in a strange memory model)?
Yes! You get it!
Nit: Should the type variable in LayoutSpace be lsp instead of alr or something?
Yes, typos - I need to clean things up.
The distinction between a LayoutSpace and an Allocator is still unclear to me, despite your clarification that “this is in comparison to a layout space, which only allocates individual addresses and doesn’t care what you do with them”. In particular, I struggle to imagine what I am able to do with a LayoutSpace and I don’t know what operations would be safe to pass to the callback in layoutInit. Could you please provide a code snippet explaining how to use this machinery both for simple malloc()/free() style memory usage, as well as for some more exotic layout like SoA?
I agree that it is unclear (see the earlier answer), but conceptually it is related to the difference between allocating addresses and allocating values - an allocator attaches some value judgement to an address, but a layout space just allows you to ‘find an available space’, which isn’t necessarily the same thing as claiming it.
An easy example: you have a bank of 64KiB of memory, it is completely unmanaged, and you need to find an address / location that fits the layout requirements. The layout space only cares whether a piece of data ‘fits’ at an address; it doesn’t yet care whether that address is occupied.
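A toy version of that distinction - this Layout and fitsAt are my stand-ins, not the actual API:

-- A layout requirement: how much space, at what alignment.
data Layout = Layout { layoutSize :: Int, layoutAlign :: Int }

-- A layout space only answers "does this layout fit at this address?"
-- (here, within a bank of the given size); occupancy is not its concern.
fitsAt :: Int -> Int -> Layout -> Bool
fitsAt bankSize addr (Layout sz al) =
  addr `mod` al == 0 && addr + sz <= bankSize

-- An allocator would layer occupancy tracking on top, only claiming an
-- address after something like fitsAt says the layout fits there.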
Is there a reason layoutInit takes a polymorphic callback while allocInit takes a monomorphic one?
I agree though that it is muddled - work in progress, shaking loose, etc.
layoutInit is a CPS-style equivalent provided for convenience; it isn’t technically necessary, but may be more ergonomic for certain cases.
allocInit just forces you to be able to initialize without returning anything else, but it is also for ergonomics. We can implement it with a default:
-- Default allocInit in terms of alloc and withAddress:
allocInit alr lo f = do
  aln <- alloc alr lo
  withAddress aln f
  pure aln

-- We can also implement allocRet:
allocRet alr lo f = do
  -- aln <- allocInit alr lo (\_ -> pure ())
  -- Or, with just alloc:
  aln <- alloc alr lo
  a <- withAddress aln f
  pure (a, aln)
However, it hinges upon withAddress, which I may make into its own class - so it actually may be necessary.
Are the class names in your deallocator hierarchy too tightly coupled to specific allocation strategies? That is, is it possible to have a deallocator that can “drop all” that isn’t an arena allocator? (StackAllocator seems fine, since it’s a name of an abstract data type as well as a machine stack.)
Terminology for allocation strategies is incredibly inconsistent, so I understand your worry. I will need to be explicit about what I mean by eg an arena, because different readers may interpret what a slab or an arena allocator is differently.
For this reason, the typeclasses are specifically about various properties that allocators have - so for “That is, is it possible to have a deallocator that can “drop all” that isn’t an arena allocator?” the answer is yes, sort of.
The classic ArenaAllocator implementation has only 2 operations - allocate and deallocateAll - but since allocate is already identified with Allocator, that leaves deallocateAll as uniquely identifying Arena and Arena-like allocators.
A classic arena allocator might only be able to deallocate everything all at once, but you might have a combination of a Stack and an Arena allocator that can pop the most recent allocation, or all of them - which it could do by popping all allocations, thus deallocating them one by one.
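In class terms, the idea is roughly this - Allocator is the class from the earlier posts, while ArenaLike is my placeholder name for the “has deallocateAll” property:

-- allocate already lives in Allocator; deallocateAll is the capability
-- that uniquely identifies arena-like allocators.
class Allocator alr => ArenaLike alr where
  deallocateAll :: alr -> IO ()

-- A stack-arena hybrid would also have pop, and could satisfy
-- deallocateAll by popping every allocation one by one.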
What is the function parameter to pop used for?
Because pop is a deallocation, we can’t return the popped value to act on it after, and usually people want to be able to do something with the popped value. So the action gives you a chance to act on the popped value before it gets deallocated, and you can return the result.
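For illustration, here is a fully toy model of that dataflow over an IORef stack - ToyStack is mine, purely to show why the callback exists:

import Data.IORef

-- A toy stack "allocator": just a list of values behind an IORef.
newtype ToyStack a = ToyStack (IORef [a])

-- The callback acts on the popped value before it is dropped;
-- only the callback's result survives the pop.
pop :: ToyStack a -> (a -> IO b) -> IO b
pop (ToyStack ref) f = do
  (x : xs) <- readIORef ref  -- partial on an empty stack; fine for a toy
  b <- f x                   -- act on the value while it is still live
  writeIORef ref xs          -- now actually drop it
  pure b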
Have you considered using the StateVar library? It seems like its class HasGetter is your class Reference, and its classes HasSetter and HasUpdate are your class MutableReference. It also allows write-only and write/update vars, in case that split is useful to you (e.g. for things which act like output registers).
Not specifically, but I have been going over many similar reference-like data types such as IORef and whatnot - so this is getting added to my list.
If a “pointer is just an address that”, should you have class Array ptr => Pointer ptr?
Pointers do have addresses (they are defined by addrOf), but pointers aren’t arrays - arrays can be pointers, but they don’t have to be, they might be references, etc.
Most of the time when pointers and arrays are mixed up, it is because C silently casts arrays to the pointer to their first element, but a pointer and an array are quite different things.
Consider the difference between a pointer-to-an-array and an array-of-pointers - they are not the same. Consider also that a pointer can access multiple buckets in a single operation, whereas an array cannot. Consider that a pointer may have padding between elements, whereas an array does not.
Pointers can efficiently implement arrays, but an array can only inefficiently implement pointers unless the element fits in a single bucket.
Is it true that the Int is a right group action on a ptr? This might be a useful law to note.
Yes - excellent catch!
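Written out (my phrasing of the law):

-- plusPtr as a right action of the additive group Int on pointers:
--   plusPtr p 0       == p                        -- identity
--   plusPtr p (m + n) == plusPtr (plusPtr p m) n  -- compatibility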
Are there any pointer schemes which lack a null pointer? It could potentially be very useful to do arithmetic on non-null pointers only.
I agree - I think I should split off Nullable or NonNullable into its own class, like Castable.
Is it universally true that uintPtr nullPtr == 0? If so, this might be worth writing as a law; if not, there might be some work here to ensure that nullPtr doesn’t compare equal to a zero pointer.
It is extremely not universally true, and only perceptually true for historical reasons. Even in most languages that allow for assigning 0 to mean the same thing as assigning nullPtr, it is usually a compiler mechanic giving the zero literal special treatment when being assigned to a pointer. The null pointer itself does not actually have the zero address, and a non-null pointer with the zero address can usually be obtained by awkward multi-casting to get around the special compiler rules.
Is it true that for UIntPointers p and q plusPtr p (diffPtr p q) == q? Another potential law.
For the C std flat pointer? Yes. I think it holds more or less even for more exotic memory layouts (eg segmented pointers) so long as you only consider exact equality and not equivalence.
Should the pointer hierarchy have an Eq ptr constraint?
Probably - I think I omitted it because it is assumed that all allocations are Eq, so all pointers will get Eq from somewhere else, and the Eq constraint isn’t technically necessary for the class itself oddly enough.
I think I agree with you that there should be some kind of constraint on class Castable because otherwise that’s a pretty scary type. Maybe there needs to also be some kind of constraint that a and b are “castable to each other” in some kind of meaningful sense that depends on something elsewhere in your layout/address/pointer hierarchy?
Yes, this is an area of research for me that is also relevant for memory functors, regarding being able to cast between things that have the same-sized representation. However, this starts getting into weird things like quantified constraints, because the constraint for an allocation / pointer content may be specific to the type of allocator. I am not quite sure what to do, but it should become clearer as things continue to develop.
Final conclusion & health
This isn’t published to git yet, but it will be this week.
I have been struggling with my health lately, and these concepts require a pretty deep dive / multi-hour concentrated effort, so I have not been able to write updates as frequently as I would like. I am pacing myself; just know that I am here quietly, not gone.