Using Immutable Vector extractors on Mutable Vectors

Hi everyone,

I have been trying to use the bitvec: Space-efficient bit vectors library, but most of the interesting operations are using Immutable Vectors. I however is planning to use mutable bit-vectors in my code.

Is it possible to use a mutable vector temporarily as a vector in a function call, without copying it?

Essentially, I want something like this:

extract :: 
  PrimMonad m => 
     MVector (PrimState s) a 
  -> (Vector a -> b) 
  -> m b

You cannot do it in a safe way: what if you modify the original MVector while the Vector a -> b function is running? However, you can use unsafeFreeze.

2 Likes

Ah, the three problems with extract is that,

  1. you might access the MVector in parallel,
  2. b might contain (Vector a) and therefore keep the reference alive outside the scope, and
  3. since (Vector a -> b) is a pure function and might first be evaluated when b is forced.

Keeping that i mind, Iā€™m unsure how unsafeFreeze solves the problem since it states that: ā€œThe mutable vector may not be used after this operation.ā€

Can I implement the unsafeExtract function like this:

unsafeExtract :: 
  PrimMonad m => 
     MVector (PrimState s) a 
  -> (Vector a -> b) 
  -> m b
unsafeExtact mv fn = do 
  v <- unsafeFreeze mv
  let !b = fn v
  pure b

And use it in a context like a bit-vector set:

import Data.Bit as BV
import Data.Vector.Unboxed.Mutable as UM

add :: 
  PrimMonad m => 
     UM.MVector (PrimState m) BV.Bit 
  -> Int 
  -> m ()
add mv ix = do
  UM.insert mv ix 1

countUnique :: 
  PrimMonad m => 
     UM.MVector (PrimState m) BV.Bit 
  -> m Int
countUnique mv = extractUnsafe mv countBits

Yeah, you can do that and it might be somewhat safer than just using unsafeFreeze directly. But either way there is some burden of proof on the user of these functions. The compiler does not check that you are always using them correctly.

2 Likes

Thank you!

This is kind of interesting, so we are missing a way in the type-system to represent ā€œBurn after readingā€ items, with two abilities 1) cannot be saved in hunks, and 2) cannot be saved in constructors or returned without being copied. This smells a little like the borrow checker in Rust.

Youā€™re completely right and there is already some work in that direction for Haskell. We call it Linear Types. I think this video is a pretty good introduction.

But for practical info youā€™ll have to look at the documentation and linear-base.

Linear types are not quite right, since itā€™s all about arguments being consumed exactly once, in this case we want our argument to be consumed many times, but never returned. Itā€™s about ā€˜borrowingā€™ the argument to the function, expecting it to not leak out on execution.

ā€œBorrowingā€ the syntax of linear types, we might want something like a %0 -> b, ei, something that is returned exactly zero times. Destructive functions have these types:

index :: Int -> Vector a %0-> a
maybe :: (a -> b) -> b -> Maybe a %0-> a

Maybe something like this could also be beneficial to reduce the use of the garbage collector. If you call a destructive function with an argument, you can reuse the piece of memory right after the call, because it is not refereed to in the results.

Which operations are you interested in?

Iā€™m looking at the ā€˜countBitsā€™ and ā€˜nthBitIndexā€™, there does not seem to be mutable versions of the same operations. Iā€™m specifically trying to implement a succinct dictionary to improve the storage of a data structure Iā€™m working on in the https://discourse.haskell.org/t/counting-words-but-can-we-go-faster/ project.

Ps. Always happy to talk to the author of the library :slight_smile:

Are you looking for popkey: Static key-value storage backed by poppy and hw-rankselect: Rank-select?

Iā€™ll look into them, they look interesting! Do you recommend them instead of the bitvec library and should bitvec be considered obsolete? Alternatively, is calling the immutable operators on a mutable bit vector through the ā€˜unsafeFreezeā€™ operator the correct way to use the library?

bitvec is a general purpose library for bit vectors, while popkey is an implementation of a succinct data structure. They have different purposes and are not interchangeable.

I have not used popkey myself, so cannot tell if itā€™s a right choice for your purposes.

Yes, your unsafeExtract is fine, as much as ā€œunsafeā€ functions can be ā€œfineā€. But Iā€™m also open for someone to extend bitvec with countBitsM and nthBitIndexM for mutable vectors, it should be straightforward to implement.

3 Likes

Itā€™s mostly a pet project so ā€˜popkeyā€™ is not the best choice! Thank you for your help though. If I can figure it out, Iā€™ll love to send a pull request. Iā€™ll be busy the next couple of though.