Distinguishing pinned and unpinned arrays via a type parameter

Recently I saw the issue “Consider making distinct types for pinned and unpinned arrays” (Issue #405 in haskell/primitive on GitHub), which suggests adding a type parameter to ByteArray and PrimArray to indicate the “pinnedness” of the array, i.e. whether it is pinned or unpinned (pinned arrays cannot be moved by the garbage collector). This would look something like this:

data Pinnedness = Pinned | Unpinned

data ByteArray (p :: Pinnedness) = ...
data MutableByteArray (p :: Pinnedness) s = ...

newByteArray :: PrimMonad m => Int -> m (MutableByteArray 'Unpinned (PrimState m))
newPinnedByteArray :: PrimMonad m => Int -> m (MutableByteArray 'Pinned (PrimState m)) 

While this would be a breaking change and thus likely won’t be implemented in primitive, it piqued my interest. Since some operations are only safe on pinned arrays (e.g. byteArrayContents), this could be used to statically enforce that they’re only used on pinned arrays. However, it would make it harder to mix pinned and unpinned arrays.

I’d like to know if anyone here would find this useful, and whether you’d use a package that provides thin wrappers over ByteArray and PrimArray with a “pinnedness” type parameter.
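A minimal sketch of what such a wrapper package might look like, assuming the primitive package (0.7 or later, where byteArrayContents returns a Ptr Word8). The newtypes and the wrapper functions are hypothetical and only illustrate the idea from the issue; nothing like this exists in primitive itself.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Control.Monad.Primitive (PrimMonad, PrimState)
import qualified Data.Primitive.ByteArray as P
import Data.Word (Word8)
import Foreign.Ptr (Ptr)

-- | Type-level tag from the proposal (hypothetical, not part of primitive).
data Pinnedness = Pinned | Unpinned

-- | Thin wrappers: the phantom parameter exists only at compile time.
newtype ByteArray (p :: Pinnedness) = ByteArray P.ByteArray
newtype MutableByteArray (p :: Pinnedness) s = MutableByteArray (P.MutableByteArray s)

newByteArray :: PrimMonad m => Int -> m (MutableByteArray 'Unpinned (PrimState m))
newByteArray n = MutableByteArray <$> P.newByteArray n

newPinnedByteArray :: PrimMonad m => Int -> m (MutableByteArray 'Pinned (PrimState m))
newPinnedByteArray n = MutableByteArray <$> P.newPinnedByteArray n

-- | Freezing keeps whatever tag the mutable array had.
unsafeFreezeByteArray
  :: PrimMonad m => MutableByteArray p (PrimState m) -> m (ByteArray p)
unsafeFreezeByteArray (MutableByteArray m) = ByteArray <$> P.unsafeFreezeByteArray m

-- | Pinned-only operation: the argument type rules out 'Unpinned arrays.
byteArrayContents :: ByteArray 'Pinned -> Ptr Word8
byteArrayContents (ByteArray a) = P.byteArrayContents a

-- | Pinnedness-agnostic operation: works for any tag.
sizeofByteArray :: ByteArray p -> Int
sizeofByteArray (ByteArray a) = P.sizeofByteArray a
```

With these definitions, calling byteArrayContents on an array that came from newByteArray should be rejected by the type checker, while sizeofByteArray accepts both kinds.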

Some arrays are explicitly pinned (by virtue of being created as such) and some are implicitly pinned by the GC (because they are larger than about 3K). Pinnedness is essentially a property of the RTS, not an inherent property of the data, so I’m not sure a type-level distinction is a fundamental solution.

>3K being pinned is an implementation detail though. If you have an unpinned (per OP’s definition) array larger than 3K, it is technically unsafe to call certain functions that assume pinnedness on it.

So if Pinned means “guaranteed to be pinned” and Unpinned means “not guaranteed to be pinned,” I think there is something to this proposal.
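To make the “guaranteed” versus “not guaranteed” distinction concrete, here is a small sketch that asks the RTS at run time whether an array happens to be pinned, assuming isMutableByteArrayPinned from the primitive package. The helper names are made up for illustration. What it reports for large unpinned allocations depends on the GHC version and RTS, which is exactly why code should not rely on it.

```haskell
import qualified Data.Primitive.ByteArray as P

-- | Allocate with the *unpinned* primitive and ask whether the result
-- happens to be pinned anyway. The answer is an RTS implementation detail.
unpinnedAllocIsPinned :: Int -> IO Bool
unpinnedAllocIsPinned n = P.isMutableByteArrayPinned <$> P.newByteArray n

-- | Allocate with the *pinned* primitive; this one is pinned by construction.
pinnedAllocIsPinned :: Int -> IO Bool
pinnedAllocIsPinned n = P.isMutableByteArrayPinned <$> P.newPinnedByteArray n

-- | Print what the RTS reports for a small and a large unpinned allocation.
report :: IO ()
report = mapM_ check [16, 64 * 1024]
  where
    check n = do
      pinned <- unpinnedAllocIsPinned n
      putStrLn (show n ++ " bytes via newByteArray: reported pinned = " ++ show pinned)
```

A run-time query like this can only tell you what happened in one particular run; the type-level tag is what lets the compiler rule out the unsafe calls up front.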

Interesting, I didn’t know that!

That would make sense, although Pinned and Unpinned are perhaps not the best names then.

Hm I think that’s what people usually mean by pinned and unpinned. People don’t usually consider the 3K thing. So I think the names are valid.

The main reason to distinguish is that certain operations should be used only for pinned arrays. The rest of the operations should work on both pinned and unpinned. So all your array operations will either be for ByteArray Pinned or for ByteArray p; none for ByteArray Unpinned.

One concern I’ve heard is that you might want to be able to mix pinned and unpinned ByteArrays in the same collection. If you had such a collection, you would only be able to use the pinnedness-agnostic operations on the contained arrays, but that would probably be the intention of such a collection.

To allow this, you could use existential types, but I think that may be overkill. I think it would suffice to be able to just convert a Pinned array into an Unpinned one, i.e. have a ByteArray Pinned -> ByteArray Unpinned. This isn’t a copy. It’s just a restricted view of the same array, with which you are unable to use pinned-only operations, but which allows you to mix it with other unpinned arrays.

This leads me to think that the Pinnedness values should be Pinned and Unknown rather than Unpinned. If the value is Unknown, then it might have been explicitly pinned but treated as unpinned for the sake of mixing types, or it might have been implicitly pinned for being over 3K, or it might be unpinned. I don’t see a need to have a tag for strictly unpinned arrays.
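A rough sketch of the Pinned/Unknown variant, again assuming the primitive package; the wrapper type and every function name here are hypothetical. Because the tag is a phantom parameter of a newtype, forgetting it can be done with coerce and should not copy anything.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Data.Coerce (coerce)
import qualified Data.Primitive.ByteArray as P
import Data.Word (Word8)
import Foreign.Ptr (Ptr)

-- | 'Pinned means "guaranteed pinned"; 'Unknown means "no guarantee".
data Pinnedness = Pinned | Unknown

newtype ByteArray (p :: Pinnedness) = ByteArray P.ByteArray

-- | Any existing ByteArray can be wrapped without claiming anything.
fromByteArray :: P.ByteArray -> ByteArray 'Unknown
fromByteArray = ByteArray

-- | Allocate a zero-filled array that is pinned by construction.
newPinned :: Int -> IO (ByteArray 'Pinned)
newPinned n = do
  m <- P.newPinnedByteArray n
  P.setByteArray m 0 n (0 :: Word8)
  ByteArray <$> P.unsafeFreezeByteArray m

-- | Forget the guarantee: a restricted view of the same array, not a copy.
forgetPinned :: ByteArray 'Pinned -> ByteArray 'Unknown
forgetPinned = coerce

-- | Pinned-only operation.
byteArrayContents :: ByteArray 'Pinned -> Ptr Word8
byteArrayContents (ByteArray a) = P.byteArrayContents a

-- | Pinnedness-agnostic operation.
sizeofByteArray :: ByteArray p -> Int
sizeofByteArray (ByteArray a) = P.sizeofByteArray a

-- | A mixed collection only needs the agnostic operations.
totalSize :: [ByteArray 'Unknown] -> Int
totalSize = sum . map sizeofByteArray
```

There is deliberately no function going the other way, from 'Unknown to 'Pinned; a checked version of that would need a run-time pinnedness test.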

What’s the use case for ByteArray Pinned which is not covered by ByteString? Both are a pinned pointer with length.

Oh also, ByteArray Unknown is equivalent to the existing ByteArray type. So you could just add a new PinnedByteArray type, and make the pinned-only operations expect that. The problem with this approach is that the pinnedness-agnostic operations now have to be defined so that they work for both types, e.g. by making a type class of which both types are instances. Using a ByteArray (p :: Pinnedness) approach makes the polymorphism simpler.
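For comparison, a sketch of the separate-type alternative, assuming the primitive package. PinnedByteArray, the IsByteArray class, and the function names are all hypothetical; the point is only to show the extra class that the agnostic operations then need.

```haskell
import qualified Data.Primitive.ByteArray as P
import Data.Word (Word8)
import Foreign.Ptr (Ptr)

-- | A separate type for arrays that are pinned by construction.
newtype PinnedByteArray = PinnedByteArray P.ByteArray

-- | Class that lets agnostic operations accept either type.
class IsByteArray a where
  toByteArray :: a -> P.ByteArray

instance IsByteArray P.ByteArray where
  toByteArray = id

instance IsByteArray PinnedByteArray where
  toByteArray (PinnedByteArray a) = a

-- | Allocate a zero-filled pinned array.
newPinned :: Int -> IO PinnedByteArray
newPinned n = do
  m <- P.newPinnedByteArray n
  P.setByteArray m 0 n (0 :: Word8)
  PinnedByteArray <$> P.unsafeFreezeByteArray m

-- | Agnostic operation, now constrained by the class.
sizeofBytes :: IsByteArray a => a -> Int
sizeofBytes = P.sizeofByteArray . toByteArray

-- | Pinned-only operation.
pinnedContents :: PinnedByteArray -> Ptr Word8
pinnedContents (PinnedByteArray a) = P.byteArrayContents a
```

Every agnostic function picks up an IsByteArray constraint here, whereas the phantom-parameter version gets the same effect from an ordinary type variable.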

Looks like ByteString’s buffer is ultimately acquired through mallocPlainForeignPtrBytes. It’s not a buffer in GCed memory, like ByteArray is.

Here is my understanding, but I may be wrong: Acquiring a ByteArray is faster since managed allocations are faster than malloc. The downside, though, is that if the ByteArray is pinned, it can create fragmentation in managed memory. Thus:

  • A pinned ByteArray is good for quickly allocating it, passing it to the FFI, and then freeing it. Keeping it short-lived minimizes fragmentation.
  • An unpinned ByteArray doesn’t have the same fragmentation impact, so it can be kept around longer, but it can’t be handed to the FFI.
  • A ByteString can be kept around without causing managed memory fragmentation (like an unpinned ByteArray) and can be handed to the FFI (like a pinned ByteArray), but is slower to allocate than ByteArrays and could cause fragmentation in unmanaged memory.

mallocPlainForeignPtrBytes is newPinnedByteArray# in disguise; there is no difference.
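For reference, a small base-only sketch of using mallocPlainForeignPtrBytes directly. The roundTrip helper is made up for illustration; it just shows that the buffer yields a stable Ptr that can be written to and read from, the same way a pinned ByteArray’s contents pointer would be used.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import GHC.ForeignPtr (mallocPlainForeignPtrBytes)

-- | Allocate a 16-byte buffer, write one byte at offset 0, and read it back.
-- The pointer is only valid inside withForeignPtr.
roundTrip :: Word8 -> IO Word8
roundTrip x = do
  fp <- mallocPlainForeignPtrBytes 16 :: IO (ForeignPtr Word8)
  withForeignPtr fp $ \p -> do
    pokeByteOff p 0 x
    peekByteOff p 0
```

The buffer is reclaimed by the garbage collector once the ForeignPtr is unreachable; no explicit free is involved.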

Ah I see… the name is a bit misleading. In that case, I agree … I don’t really see a difference between using a ByteString and using a pinned ByteArray.