I wrote a tiny bit faster implementation of Data.List.inits that replaces the underlying purely functional queue with a mutable data structure and careful uses of unsafePerformIO. Can we use linear types instead of unsafePerformIO to encapsulate this as a purely functional data structure?
Implementing inits with a queue
The high-level structure of inits :: [a] -> [[a]] is the same between the standard library one and my version: scanl the input list with a queue as an accumulator, snoc-ing (pushing) elements to it. Then every prefix of the input list can be obtained by repeated uncons-ing (popping) from one of the accumulated queues.
-- pointful
inits :: [a] -> [[a]]
inits xs = toList empty : go empty xs
  where
    go q [] = []
    go q (x : xs) = toList (snoc q x) : go (snoc q x) xs

-- pointfree
inits :: [a] -> [[a]]
inits = map toList . scanl' snoc empty

-- Queue API
empty :: Q a
snoc :: Q a -> a -> Q a
toList :: Q a -> [a]  -- = unfoldr uncons, where uncons :: Q a -> Maybe (a, Q a)
This must be a real-time queue—uncons should run in non-amortized constant time—so that the output can be consumed lazily (notably map head . tail . inits :: [a] -> [a] should take linear time). This must be a persistent queue, because the same queue will be passed as an argument to both snoc (to construct the next prefix) and uncons (to produce the current prefix).
Implementing a persistent queue
The standard persistent queue is the banker’s queue described in Okasaki’s book. For inits, we also know that the only elements that will ever be pushed to the queue are the elements of the list. This allows a more efficient representation using a mutable (and resizable) array to hold the elements.
Every queue is a slice of the array: snoc appends an element at the end of the array, and uncons reads the first element of the slice. This is not a general-purpose queue: we are assuming that the queue produced by a snoc is itself snoc-ed at most once, and that a queue produced by uncons is never snoc-ed. Under those assumptions, the data structure really does implement a purely functional interface: every queue is an immutable slice of the array, and snoc only ever writes to a position that no previously created slice can reach.
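To make the idea concrete, here is a minimal sketch of such a slice-based queue (my illustration, not the actual implementation): it uses a fixed-capacity buffer instead of a resizable one, and its placement of unsafePerformIO is only indicative of where the effects would go.

import Data.Array.IO (IOArray, newArray_, readArray, writeArray)
import System.IO.Unsafe (unsafePerformIO)

-- Every queue is an immutable slice [qFront, qEnd) of one shared mutable array.
data Q a = Q
  { qArr   :: IOArray Int a  -- shared buffer
  , qFront :: !Int           -- first index of this slice
  , qEnd   :: !Int           -- one past the last index of this slice
  }

capacity :: Int
capacity = 1024  -- assumption: large enough for the inputs of this sketch

-- Allocating the buffer is an effect, so the empty queue is produced in IO;
-- each call to inits below gets its own buffer.
emptyIO :: IO (Q a)
emptyIO = do
  arr <- newArray_ (0, capacity - 1)
  pure (Q arr 0 0)

-- Writes at qEnd, an index that no previously created slice can reach.
snoc :: Q a -> a -> Q a
snoc (Q arr front end) x = unsafePerformIO $ do
  writeArray arr end x
  pure (Q arr front (end + 1))
{-# NOINLINE snoc #-}

-- Reads the slice lazily, one unsafePerformIO per element.
toList :: Q a -> [a]
toList (Q arr front end)
  | front >= end = []
  | otherwise    =
      unsafePerformIO (readArray arr front) : toList (Q arr (front + 1) end)
{-# NOINLINE toList #-}

inits :: [a] -> [[a]]
inits xs = unsafePerformIO $ do
  q0 <- emptyIO
  pure (toList q0 : go q0 xs)
  where
    go _ []        = []
    go q (x : xs') = let q' = snoc q x in toList q' : go q' xs'

Under the usage pattern of inits, each array cell is written at most once (by the unique snoc that extends the slice past it), which is what makes this interface observationally pure.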
Can we give pure (non-IO) types to the operations of that queue, and implement inits with it? Linearity is necessary to encode the usage assumptions, but is it sufficient?
Even ignoring the specifics of the implementation, I can’t imagine any linear types for a queue expressive enough to implement inits. The problem can be seen in the above definition of inits (which may be too naive to account for linearity): how can we reconcile the linearity of q with the recursive call go (snoc q x) under an unrestricted :? (Careful readers will also note the double occurrence of q, but that can easily be solved by merging toList and snoc as a single function.)
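For illustration, here is what the merged operation's linear signature might look like (hypothetical names and multiplicities, with placeholder bodies; not from the post):

{-# LANGUAGE LinearTypes #-}

-- A hypothetical linear API for the queue. Merging toList and snoc into one
-- operation deals with the double occurrence of q mentioned above; the open
-- question is how go could thread the queue linearly through (:).
data Q a = Q  -- placeholder representation

snocToList :: Q a %1 -> a -> ([a], Q a)
snocToList q _x = ([], q)  -- placeholder body, only the type matters here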
benchmarking initsA
time 268.8 ms (259.4 ms .. 281.3 ms)
0.998 R² (0.994 R² .. 1.000 R²)
mean 281.6 ms (274.4 ms .. 288.0 ms)
std dev 8.417 ms (6.315 ms .. 10.18 ms)
variance introduced by outliers: 16% (moderately inflated)
benchmarking initsQ
time 305.0 ms (293.4 ms .. 319.2 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 308.5 ms (303.5 ms .. 315.1 ms)
std dev 7.385 ms (4.581 ms .. 11.01 ms)
variance introduced by outliers: 16% (moderately inflated)
Just out of curiosity - why not use runST and ST s a instead of unsafe... I/O entities?
The technical reason is that runST doesn’t let you write lazy algorithms. When you force a runST thunk, all of the allocations, reads and writes in it happen at once. Moreover, anything that accesses a mutable array or other resource allocated in ST s must mention the s parameter in its type, and when you want a thunk of type [a] to access such a resource, there’s no room for that s parameter.
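To illustrate the first point, here is a small sketch (my illustration, not code from the thread): with the strict ST monad, an attempt to build a list lazily from a mutable counter diverges as soon as any part of the result is demanded.

import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- With strict ST the whole computation runs before runST can return,
-- so even `head countUpStrict` diverges: there is no way to interleave
-- the reads and writes with consumption of the result.
countUpStrict :: [Integer]
countUpStrict = runST (newSTRef 0 >>= go)
  where
    go :: STRef s Integer -> ST s [Integer]
    go ref = do
      n <- readSTRef ref
      writeSTRef ref (n + 1)
      rest <- go ref
      pure (n : rest)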
The moral reason is that I want to write algorithms declaratively.
On my machine (aarch64), initsA is faster than initsQ or initsT, with both criterion and tasty-bench:
$ cabal run inits.hs
benchmarking initsA
time 175.6 ms (174.8 ms .. 176.9 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 174.4 ms (173.9 ms .. 175.1 ms)
std dev 825.8 μs (663.4 μs .. 1.072 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking initsQ
time 209.8 ms (206.6 ms .. 212.2 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 211.9 ms (210.7 ms .. 212.5 ms)
std dev 1.274 ms (671.9 μs .. 2.050 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking initsT
time 201.9 ms (200.4 ms .. 205.3 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 201.5 ms (201.0 ms .. 202.8 ms)
std dev 1.045 ms (174.6 μs .. 1.506 ms)
variance introduced by outliers: 14% (moderately inflated)
$ cabal run inits.hs
All
  initsA: OK
    176 ms ± 17 ms
  initsQ: OK
    212 ms ± 19 ms
  initsT: OK
    205 ms ± 16 ms
My mistake: from your response here, I thought you were looking for a solution that wasn’t (as) lazy. The original (lazier) version of ST s a is still available - it’s just that most prefer the strict version to avoid space leaks…
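For reference, the lazy variant lives in Control.Monad.ST.Lazy (with Data.STRef.Lazy for references); as a sketch, the same recursion as in the strict example above then does run incrementally:

import Control.Monad.ST.Lazy (ST, runST)
import Data.STRef.Lazy (STRef, newSTRef, readSTRef, writeSTRef)

-- In the lazy ST monad, reads and writes are performed only as the result
-- list is consumed, so `take 5 countUpLazy == [0,1,2,3,4]`.
countUpLazy :: [Integer]
countUpLazy = runST (newSTRef 0 >>= go)
  where
    go :: STRef s Integer -> ST s [Integer]
    go ref = do
      n <- readSTRef ref
      writeSTRef ref (n + 1)
      rest <- go ref
      pure (n : rest)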
The moral reason is that I want to write algorithms declaratively.
But (in this definition) using unsafe... entities - this could be of some use: