KnownNat-indexed vectors

The way sized vectors are usually implemented in Haskell is with a data declaration, typically involving several fields of the same type, for example:

data V3 x = V3 !x !x !x deriving ({- ... -})

This representation has the obvious benefits of automatic deriving for many classes, but working with arbitrary sizes is rather difficult. So I’ve been toying with another approach: using a function type!

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE LambdaCase #-}

newtype V n x = V {unV :: Natural -> x} deriving (Functor)

This particular formulation doesn’t allow for automatic deriving of Eq, Ord, Foldable or Traversable, but with a few helper functions they are easily expressed by hand:

dimensions :: forall n. (KnownNat n) => Proxy n -> [Natural]
dimensions Proxy = case natVal (Proxy @n) of
  0 -> []
  n -> [0 .. pred n]

(!) :: V n x -> Natural -> x
V f ! x = f x

instance (KnownNat n, Eq x) => Eq (V n x) where
  (==) :: V n x -> V n x -> Bool
  v == w = List.all (\i -> v ! i == w ! i) (dimensions (Proxy @n))

instance (KnownNat n, Ord x) => Ord (V n x) where
  compare :: V n x -> V n x -> Ordering
  compare v w = foldMap (\i -> compare (v ! i) (w ! i)) (dimensions (Proxy @n))

instance (KnownNat n) => Foldable (V n) where
  foldMap :: (Monoid m) => (a -> m) -> V n a -> m
  foldMap f (V x) = foldMap (f . x) (dimensions (Proxy @n))

instance (KnownNat n) => Traversable (V n) where
  traverse :: (Applicative f) => (a -> f b) -> V n a -> f (V n b)
  traverse f (V x) =
    traverse (f . x) (dimensions (Proxy @n)) <&> \xs ->
      V \i -> xs List.!! fromIntegral i

The Traversable instance is perhaps unfortunate, since we must keep a reference to the list xs to determine the values of the resulting vector, but I haven’t heavily used the Traversable instance yet in my experiments, so I can’t muse on its performance.

The Applicative instance is also straightforward, and (<*>) in particular gives me peace:

instance (KnownNat n) => Applicative (V n) where
  pure :: x -> V n x
  pure x = V (const x)

  (<*>) :: V n (x -> y) -> V n x -> V n y
  V f <*> V x = V (f <*> x)

Of course, it’s tedious to write case expressions each time you want to define a vector, but the pattern is easily extractable:

v2 :: forall x. x -> x -> V 2 x
v2 x y = V \case
  0 -> x
  _ -> y

withV2 :: (x -> x -> y) -> V 2 x -> y
withV2 f v = f (v ! 0) (v ! 1)

v3 :: forall x. x -> x -> x -> V 3 x
v3 x y z = V \case
  0 -> x
  1 -> y
  _ -> z

withV3 :: (x -> x -> x -> y) -> V 3 x -> y
withV3 f v = f (v ! 0) (v ! 1) (v ! 2)

v4 :: forall x. x -> x -> x -> x -> V 4 x
v4 x y z w = V \case
  0 -> x
  1 -> y
  2 -> z
  _ -> w

withV4 :: (x -> x -> x -> x -> y) -> V 4 x -> y
withV4 f v = f (v ! 0) (v ! 1) (v ! 2) (v ! 3)

This formulation also extends to matrices:

newtype M m n x = M {unM :: V m (V n x)} deriving (Functor)

m22 :: forall x. x -> x -> x -> x -> M 2 2 x
m22 a b c d = M $ V \i -> V \j -> case (i, j) of
  (0, 0) -> a
  (0, 1) -> b
  (1, 0) -> c
  _ -> d

{- m33, etc... -}

row :: V n x -> M 1 n x
row x = M $ V (const x)

column :: V m x -> M m 1 x
column x = M $ V \i -> V (const (x ! i))

In this case the instances map/fold/traverse over both of the dimensions, @m and @n. What’s more, the Kronecker product is super easy to exhibit:

kronecker ::
  forall m n p q x.
  (KnownNat p, KnownNat q, Num x) =>
  M m n x -> M p q x -> M (m * p) (n * q) x
kronecker (M a) (M b) = M $ V \i -> V \j ->
  let p = natVal (Proxy @p)
      q = natVal (Proxy @q)
      (ia, ib) = quotRem i p
      (ja, jb) = quotRem j q
   in a ! ia ! ja * b ! ib ! jb

Are there libraries where this kind of vector has been implemented? So far I’ve used this technique in my own attempt at Ray Tracing in One Weekend with much success, though I haven’t compared it to data libraries like linear in terms of performance.

6 Likes

I’m highly interested in this given I’ve been dealing with vectors matrices and tensors lately. Fun stuff.

I’ve been taking an approach that uses a type family to dispatch to a specific implementation for efficiency. It works, but that is about all I can say about it - I more or less took my functor / foldable / traversable implementation straight from vector, so its not very fancy.

type family V (n :: Nat) (a :: Type) :: Type where
    V 0 a = V0 a
    V 1 a = V1 a
    V 2 a = V2 a
    V 3 a = V3 a
    -- ...
    V n a = BigV n a

data V0 a = V0

data V1 a = V1
    {-# UNPACK #-} !a

data V2 a = V2
    {-# UNPACK #-} !a
    {-# UNPACK #-} !a

data V3 a = V3
    {-# UNPACK #-} !a
    {-# UNPACK #-} !a
    {-# UNPACK #-} !a

-- MemVector is roughly ~ Ptr (MemRep a), from my `memalloc` -backed vector classes
newtype BigV (n :: Nat) a = BigV (MemVector n a)

:smiling_face_with_three_hearts:

1 Like

To get good performance you probably want to regularly cache the contents (and why use Natural instead of Int?):

{- cabal:
build-depends: base, primitive
default-language: GHC2021
-}

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeAbstractions #-}
module T where

import Data.Primitive.Array
import Data.Proxy
import GHC.TypeLits

newtype V n x = V {unV :: Int -> x} deriving (Functor)

cache :: KnownNat n => V n x -> V n x
cache @n (V f) = 
  let 
    n = fromIntegral (natVal (Proxy @n)) :: Int
    arr = createArray n undefined $ \m -> do
      let go i
            | i < n = writeArray m i (f i) *> go (i + 1)
            | otherwise = pure ()
      go 0
  in V (indexArray arr)

But why not just use arrays at that point?

1 Like

Vectors aren’t negative-indexed. I could use Word but there’s no real performance decrease in using Natural, as far as I’m concerned.

In linear-base, Data.Array.Polarized.Pull.Array seems to be a function internally. Although it’s not parameterized by its length at the type level.

1 Like

There is vector-sized: Size tagged vectors for vectors, annotated with type level length.

3 Likes

The function is more like an interface compared to an array, which is an implementation.

iirc, “Algebra Driven Design” has a whole part where it discusses using functions like this.

This style is what puts the F in FP. I rarely see it in industry sadly. Industrial Haskell smells more like Java than Scheme :confused:

massiv also has delayed/pull arrays: Data.Massiv.Array.Delayed

1 Like

As long as you don’t expose the constructor(s), all types are like interfaces rather than implementation. Backpack makes this distinction even clearer.

Eh kinda. Seq vs Array if used explicitly aren’t one for one. Whereas with a function, the same ! works with both.

Data.Array is lazy, so you can just write the memoized version directly:

cache :: KnownNat n => V n x -> V n x
cache @n (V f) = 
  let 
    n = fromIntegral (natVal (Proxy @n)) :: Int
    arr = array (0,n-1) [ (i, f i) | i <- [0..n-1] ]
  in V (arr !)
2 Likes

5 posts were split to a new topic: The design of array libraries

There exist several packages that define a family of finite types indexed by type-level naturals, e.g. finite-typelits, data-fin, fin-int or fin. For each natural number n, there is a type Finite n that has exactly n elements. Then you can define

newtype V n x = V (Finite n -> x)

and derive a lot of instances via Reader (Finite n). You certainly want memoization for performance.

By the way, it is fun to define matrices this way. Conal Elliott gave a talk about this at least once.

1 Like

Are there Num instances for any of these Finite types? I have used some before for studying finite topological spaces but it seems indexing would be a pain, having to convert things to Finites in order to do it properly. This approach has the advantage that it doesn’t “segfault” if indexing past the length, it just returns whatever the value at the largest index is.

This is part of my own private library that I created after reading your Bolt Math thread! I’m also miffed at the state of the Num hierarchy and would see it changed. I’d be more than happy to compare notes on what each other has been working on in this space!

1 Like

I do hope to be able to sink some good focus time into it again soon, especially because I have a bunch of stuff related to random distributions that needs integrating - my data science friend has been helping me with that a bit. Between bolt, memalloc, and botan, I have been growing quite the little ecosystem, and it all needs tending!

1 Like

I’d imagine a typelit approach would be better if syntax is what you’re after. Especially with required type arguments, you’ll barely notice as a user.

I’m not sure I understand what you mean here. Could you give an example?

I’m afraid switching to a data representation improved performance dramatically. Unfortunate, but unsurprising. Doing heavy math with these kinds of vectors, creating new vectors from old, requires keeping references around to all of these functions. I will experiment with memoization, but I think this spells doom for the approach I outlined in the original post.

At Target, I worked on a project that used memoized functions with integer inputs. It was for supply chain optimization. It was in Kotlin but it was SUPER nice. It mapped to the theory really well too.

So there’s definitely something to this. I guess those weren’t vectors but still. It’s kind of annoying to use these things in Haskell compared to Kotlin. There’s a library gap imo.

2 Likes