Experience Report: Linear Haskell enables Pure, Parallel, and In-place Fast Fourier Transformation

Motivation

Ever since I read the Linearly Qualified Types paper and Arnaud’s articles on Linear Constraints, I’ve been excited about how linear constraints will let us formulate borrowing of subslices of arrays in Linear Haskell.
Linear Constraints are not available in GHC today; Arnaud’s Linear constraints proposal has only just been discussed (I :+1:ed a lot :smile:):

Although they are not available in current GHC, we can still simulate linear constraints with token data types, following the path in Arnaud’s article in reverse.
To see this, I tried implementing a pure, parallel, and in-place FFT with today’s Linear Haskell (say, GHC >= 9.0), inspired by the quicksort example in the paper (§4.2.3). I chose FFT because:

  1. FFT is another typical divide-and-conquer algorithm, involving lending/borrowing subslices of an array.
  2. When I previously implemented FFT with the ST monad, I wanted to parallelise it in a type-safe manner, and Linear Haskell seems to provide the means.

Result

So, here it is:

Here is the simplified version of the main loop:

loop :: RW n %1 -> SArray (Complex Double) n -> Int -> Double -> Double -> RW n
loop rw arr !n !c !s =
  if n <= 1 then rw else
      halve rw arr & \(MkSlice sliced rwL rwR l r) ->
        let !half = n `quot` 2
            !dblCs = 2 * c * c - 1
            !dblSn = 2 * s * c
            !kW = c :+ s
            divide = if n <= thresh then (,) else par
          in divide
              (loop rwL l half dblCs dblSn)
              (loop rwR r half dblCs dblSn)
              & \(rwL, rwR) ->
                combine sliced rwL rwR l r & \(Ur arr, rw) ->
                  forN half
                    ( \ !k (RW r w) ->
                        get r k arr & \(Ur ek, r) ->
                        get r (half + k) arr & \(Ur ok, r) ->
                        set (RW r w) k (ek + kW ^ k * ok) arr & \rw ->
                        set rw (half + k) (ek + kW ^ (half + k) * ok) arr
                    )
                    rw

Below is an example of the output of the above code:

And ThreadScope confirms that the computation is indeed done in parallel, hurray!

Aside from the RW tokens, it is a really simple and declarative implementation of FFT, parallelising work until the array length falls below a prespecified threshold.

In the example above, RW and SlicesTo are token datatypes, mimicking the corresponding hypothetical linear constraints. So, once LinearConstraints gets implemented, these arguments will mostly disappear and be handled implicitly by the compiler.
Anyway, here is the current token-based API (the API described in the paper, blindly translated to a token-based one):

data SArray a s -- Storable array with elements of type a; s is a unique Skolem variable

data R s -- ^ Read permission token for SArray a s
data W s -- ^ Write permission token for SArray a s

data RW s = RW (R s) (W s)

get :: 
  (SV.Storable a) => 
  R s %1 -> Int -> SArray a s -> (Ur a, R s)

set :: 
  (SV.Storable a) => 
  RW s %1 -> Int -> a -> SArray a s -> RW s

-- | The token witnessing @SArray a l@ (resp. @SArray a r@) is the first (resp. latter) half of the @SArray a s@.
data SlicesTo s l r

data Slice a s where
  MkSlice ::
    !(SlicesTo s l r) %1 ->
    !(RW l) %1 ->
    !(RW r) %1 ->
    !(SArray a l) ->
    !(SArray a r) ->
    Slice a s

split :: 
  (SV.Storable a) => 
  RW s %1 -> Int -> SArray a s -> Slice a s
combine ::
  SlicesTo s l r %1 ->
  RW l %1 ->
  RW r %1 ->
  SArray a l ->
  SArray a r ->
  (Ur (SArray a s), RW s)
halve ::
  (SV.Storable a) =>
  RW s %1 -> SArray a s -> Slice a s
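
As a tiny usage example (a hypothetical helper built only from the API above, not part of the library): swapping two elements while threading the tokens by hand:

swap :: SV.Storable a => RW s %1 -> Int -> Int -> SArray a s -> RW s
-- Hypothetical helper: swap the elements at indices i and j.
-- Each get consumes the read token and returns a fresh one; each set
-- consumes and returns the whole RW pair.
swap (RW r w) i j arr =
  get r i arr & \(Ur xi, r) ->
    get r j arr & \(Ur xj, r) ->
      set (RW r w) i xj arr & \rw ->
        set rw j xi arr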

Update: the original type of set was incorrect. Thank you @atravers for pointing this out!

Note that, as far as single array accesses are concerned, the token-based approach does not reduce the burden of passing around RW, R, and/or W tokens (Linear Constraints will help here). But slicing the array into two halves and lending them to different function calls needs some bookkeeping with the Skolem-variable trick, and there the token-based approach helps even while LinearConstraints is unavailable. Without it, we would have to provide a specialised datatype for array slices, duplicating the entire array API.
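
To illustrate the Skolem-variable trick: each half gets its own fresh type variable, so the compiler rejects any attempt to use one half’s token on the other half. A sketch of the kind of mistake the types rule out (not real library code):

-- This does NOT compile: rwL has type RW l, while the right half r has
-- type SArray a r, and the Skolem variables l and r never unify.
oops rw arr =
  halve rw arr & \(MkSlice sliced rwL rwR l r) ->
    set rwL 0 0 r  -- error: Couldn't match type 'l' with 'r'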

Summary: even though the token-based approach is a tentative workaround until we get LinearConstraints, it still makes sense to adopt it when simulating uniform borrowing!

Notes on pure parallelism under destructive pure computations

The hardest part of the implementation was avoiding the duplication of pure but destructive operations (such as writes to the array).
The subtlety is that a seemingly pure term can now contain destructive operations on arrays, and if such operations are duplicated, referential transparency collapses all at once!
To avoid this, we must use unsafePerformIO instead of unsafeDupablePerformIO in the write function for arrays (set).
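
For illustration, here is a plausible (hypothetical, simplified) shape of such a write function; the real implementation differs, and Unsafe.toLinear from linear-base is assumed here to coerce the non-linear lambda into a linear one:

-- Sketch only: SArray is assumed to wrap a ForeignPtr. The destructive
-- write hides behind a pure interface, so it must run at most once:
-- unsafePerformIO plus NOINLINE guarantees that, whereas the Dupable
-- variant may execute the write twice if two threads race on the thunk.
set :: SV.Storable a => RW s %1 -> Int -> a -> SArray a s -> RW s
set = Unsafe.toLinear $ \rw i x (SArray fptr) ->
  unsafePerformIO $ withForeignPtr fptr $ \ptr -> do
    pokeElemOff ptr i x
    pure rw
{-# NOINLINE set #-}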
But we also have to pay attention to the implementation of the par combinator:

par :: a %1 -> b %1 -> (a, b)
par a b = runEval C.do
  a <- rpar a
  b <- rpar b
  a <- rseq a
  b <- rseq b
  C.pure (a, b)

Here, runEval, rpar and rseq are custom linear variants of those in the parallel package.
The key difference lies in the implementation of runEval. In parallel, it is implemented with unsafeDupablePerformIO - this makes sense since, in the standard (non-linear) Haskell world, pure terms must not incorporate destructive operations.
But with our linear interface, OTOH, the set function for arrays returns a pure (token) value with a destructive operation under the hood. This means that if a call to runEval (in par) is evaluated more than once, undesired writes to the same array could occur more than once, out of order!
So we had to implement runEval in terms of unsafePerformIO, not the Dupable one.
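
Concretely, the difference boils down to one definition (a hedged sketch; the Eval wrapper and names mirror the parallel package, but this is not its actual code):

-- The linear runEval must use unsafePerformIO: the Eval computation may
-- contain destructive writes hidden behind pure tokens, and those writes
-- must not be replayed if the surrounding thunk is evaluated twice.
newtype Eval a = Eval (IO a)

runEval :: Eval a %1 -> a
runEval = Unsafe.toLinear $ \(Eval io) -> unsafePerformIO io
{-# NOINLINE runEval #-}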

Conclusion

Although it is a little clumsy to pass tokens around, the token-based approach in Linear Haskell lets us write pure array-manipulation algorithms that involve lending slices of arrays.
The resulting FFT implementation looks concise and much like ordinary pure parallelism with par in conventional Haskell.

It is still clumsy to thread token values everywhere. Fortunately, once LinearConstraints gets implemented, this burden goes away! So, let’s :+1: the Linear constraints proposal:

Thank you, Linear Haskell folks, for pushing this forward!


Brilliant! Thanks for showing us this @konn


…so presumably having to add that extra arr argument must also be clumsy - based coarsely(!) on that in-place quicksort example from the paper:

sort :: (RW n % 1, UArray Int n % r) → () * RW n
sort = let !len = length in
  if len <= 1 then ()
  else let !pivotIdx = partition
           !(Ur (l, r)) = split pivotIdx
           !() = sort l
           !() = sort r
           !(Ur _) = join l r
       in ()

partition :: (RW n % 1, UArray Int n % r) => Int * RW n
partition =
  let !last = length − 1
      !(Ur pivot) = read last
      go :: RW n %1 => Int → Int → Int * RW n
      go l r
          | l > r
          = let !() = swap last l in !l
          | otherwise
          = let !(Ur lVal) = read l in
            if lVal > pivot
            then let !() = swap l r
                    in go l (r − 1)
            else go (l + 1) r
  in go 0 (last − 1)

Clumsiness begone!

@atravers Interesting idea! But I think your approach has an ambiguity in the type parameter n, and we would have to use TypeApplications to pass n to every function call. So if we have to handle multiple arrays (e.g. when working with slices), we have to pass arr somehow, and it seems mandatory.

Actually, my intention was to bring attention to a potential, but subtle conflation of terminology: it’s the manual threading of token values through definitions which is clumsy, and nonsense-prone:

doing :: s -> (s, a)
doing s = ...

undoing :: a
undoing = let (s, r) = doing s in r  -- this could take a while...

…this being a major reason why Haskell now has the monadic interface, and eventually other interfaces like arrows and applicatives. As I vaguely recall, even the linear version of IO a uses token values…


there it is:

(Using the correct search facility helps :-)


In your implementation, get is like the linear I/O constructor - it really does thread through a token value: R s. But set is subtly different: its input token is RW s, not R s (which it returns). As for loop:

loop :: RW n %1 -> SArray (Complex Double) n -> Int -> Double -> Double -> RW n

…it’s like get: a “token through-threader”, for want of a better description.

More generally though, not all token values are intended to be passed through every definition which uses them - for the majority of definitions they could just be parameters, and not returned at all. Clearly that usage isn’t “clumsy threading” - it’s more like the behaviour of the monadic reader type: the value of interest is shared, not “plumbed”.

Oh, sorry for making you explain your own joke…:bow: I am not so good at picking up well-crafted, implicit jokes.

Exactly. And I’m afraid I’m still missing the point of your joke - joking aside, you successfully detected the implicit occurrence of the adjective manual right before threading, which is also clear from the context. So, what was your intention?

That’s right, and indeed I have extensively used that fact in the implementation of linear-analogues of parallel combinators:

Aha! This is my typo. The actual type of set returns RW s:

set :: (SV.Storable a, HasCallStack) => RW s %1 -> Int -> a -> SArray a s -> RW s

Thank you for pointing this out (I could recognise your implicit joke this time! :grinning_face_with_smiling_eyes: ). I will correct the original post.

Yes. And, as you might know, the compiler will take care of such threading once LinearConstraints gets implemented. I should have clarified this point in my original post. Thank you again.

Very nice example and report @konn ^^ Will LinearConstraints unlock high-perf numerics in Haskell? :eyes:


Alright, so the type signature for set had the wrong return type…then the only relevant point from my previous post is this:

@atravers Then I have no objection - I never intended to insist that “all threading in any form is cumbersome”. I meant that the repetition of particular forms of token threading, like RW or SlicesTo, can be reduced once Linear Constraints lands. Sorry for my unclear writing, and thank you for the discussion.

@ocramz Thanks!

Well, it depends. Linear types and constraints primarily concern resource management and local mutability, so they can be used to improve performance where such points matter.

One thing to note: if we use explicit tokens to simulate Linear Constraints, as demonstrated in this example, we can achieve similar effects at the expense of boilerplate. In other words, linear constraints do not strictly increase theoretical strength or expressivity.
OTOH, it is true that Linear Constraints (and their token-based mimics) can drastically reduce such boilerplate and make lending/slicing handier. As they will make writing Linear Haskell more comfortable and comprehensible, linear constraints will definitely help us write and understand high-performance programs.


Sometimes, however, linearity annotations can add boilerplate clutter. What I’m thinking about is the Unique and associated UniqueSupply types in GHC:

(…presumably values of these types should not be reused, hence the use of the word “unique”.)

Since both types already exist and are used extensively in GHC, changing them to use linearity annotations should preferably require changes to few (or no) existing types and type signatures. It seems to me, based on !10952: Linear let and where bindings · Merge requests · Glasgow Haskell Compiler / GHC · GitLab, that it should be possible to add linearity annotations directly to types:

data UniqSupply
 = (
    MkSplitUniqSupply {-# UNPACK #-} !Int -- make the Unique with this
                   UniqSupply UniqSupply
                                -- when split => these two supplies
   ) %1  -- should only be used once

This would be similar to the adding of strictness annotations:

…and much less intrusive than e.g. updating the type signature of practically every definition which uses those types:

  • splitUniqSupply     :: UniqSupply %1 -> (UniqSupply, UniqSupply)
    listSplitUniqSupply :: UniqSupply %1 -> [UniqSupply]
    uniqFromSupply      :: UniqSupply %1 -> Unique
    uniqsFromSupply     :: UniqSupply %1 -> [Unique] 
    takeUniqFromSupply  :: UniqSupply %1 -> (Unique, UniqSupply)
                               	⋮ 
    
  • splitUniqSupply     :: UniqSupply ⊸ (UniqSupply, UniqSupply)
    listSplitUniqSupply :: UniqSupply ⊸ [UniqSupply]
    uniqFromSupply      :: UniqSupply ⊸ Unique
    uniqsFromSupply     :: UniqSupply ⊸ [Unique] 
    takeUniqFromSupply  :: UniqSupply ⊸ (Unique, UniqSupply)
                               	⋮ 
    

    …or:

    splitUniqSupply     :: UniqSupply 🍭 (UniqSupply, UniqSupply)
    listSplitUniqSupply :: UniqSupply 🍭 [UniqSupply]
    uniqFromSupply      :: UniqSupply 🍭 Unique
    uniqsFromSupply     :: UniqSupply 🍭 [Unique] 
    takeUniqFromSupply  :: UniqSupply 🍭 (Unique, UniqSupply)
                               	⋮ 
    

There was some research into making uniqueness types less unique in 2008 - maybe a few results from that work could help with further decreasing the bureaucracy of working with linear types…


@atravers Rather off-topic, but interesting. Introducing uniqueness types into Haskell, however, means we would have two separate universes of types - one unrestricted (traditional) and the other linear. This requires many more boilerplate definitions of data types that serve almost the same purpose but differ only in linearity. The authors of the Linear Haskell paper chose to annotate arrows rather than types precisely to avoid this dichotomy (see §6.1 of the Linear Haskell paper).
In addition, the following talk by Daniel (the Granule guy) describes the difference and the relation between linear and uniqueness types really clearly:

And even though Linear Haskell takes the linearity-on-arrows approach, Linear Constraints can reduce the burden of introducing linear resources, as described in Arnaud’s blog post and the Linearly Qualified Types paper, using the Linearly constraint:

linearly :: (Linearly %1=> Ur r) -> Ur r
new :: Linearly %1=> Int -> SArray a `SuchThat` RW
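
For instance, allocation could then look something like this (a speculative sketch of usage, not real code: SuchThat pairs the array with its RW token, and I assume spent tokens are Consumable so they can be discarded with lseq before linearly closes the scope):

-- Speculative sketch against the hypothetical API above: allocate a
-- one-element array, write to it, and read the value back unrestricted.
example :: Ur Double
example = linearly $
  new 1 & \(arr `SuchThat` rw) ->
    set rw 0 42 arr & \(RW r w) ->
      get r 0 arr & \(Ur x, r) ->
        RW r w `lseq` Ur x  -- assuming RW tokens are Consumable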

I think this is enough for introducing linear resources, though it is sometimes cluttering to have to write %1 on every arrow; we can still use a linear version of the State monad to reduce such arrows. Still, whether we should introduce another form of uniqueness typing (say, linearity-by-kind as you indicated) is an important and open topic, so it definitely deserves a dedicated post (it is rather off-topic in this Discourse thread). If you open a new post, I would be happy to join that discussion. Thanks!
