This was prompted and inspired by Let's run some NFAs — 0xd34df00d.me to write a faster linear-type based NFA match function using a mutable vector of unboxed data.
It is made especially to compare fairly with the ST monad version of the NFA match function mentioned in the article.
How it was done
First, it is a hack: A linear-type-safe API wrapping unsafePerformIO under the hood.
However, the ST monad version uses the unsafe APIs (unsafeWrite, unsafeRead, unsafeNew) directly from Data.Vector.Unboxed.Mutable
, which arguably is worse.
The linear-type-safe API wrapper over Data.Vector.Unboxed.Mutable
is straightforward:
import Prelude.Linear qualified as L
import Unsafe.Linear qualified as L
newtype LinearIOVector a = MkLinearIOVector (L.Ur (VM.IOVector a))
instance L.Consumable (LinearIOVector a) where
consume (MkLinearIOVector ur) = L.consume ur
new_linear_iovec :: VM.Unbox a => Int -> LinearIOVector a
new_linear_iovec = unsafePerformIO . (MkLinearIOVector . L.Ur <$>) . VM.unsafeNew
read_linear_iovec :: VM.Unbox a => LinearIOVector a %1 -> Int -> (L.Ur a, LinearIOVector a)
read_linear_iovec (MkLinearIOVector (L.Ur vec)) i = L.toLinear unsafePerformIO $ do
a <- (L.toLinear VM.unsafeRead) vec i
pure (L.Ur a, MkLinearIOVector (L.Ur vec))
write_linear_iovec :: VM.Unbox a => LinearIOVector a %1 -> Int -> a -> LinearIOVector a
write_linear_iovec (MkLinearIOVector (L.Ur vec)) i a = L.toLinear unsafePerformIO $ do
(L.toLinear VM.unsafeWrite) vec i a >> pure (MkLinearIOVector (L.Ur vec))
Comparing to ST monad, the match function using the linear-type-safe API is monad/do-notation free:
match_linear2 :: forall q. (VM.Unbox q, StateId q) => NFA q -> BS.ByteString -> MatchResult Int
match_linear2 NFA{..} bs = L.unur L.$ go initState 0 (new_linear_iovec 24_000_000) 0
where
go q i !stack s
| q == finState = stack `L.lseq` L.Ur (SuccessAt i)
| otherwise = case q `getTrans` transitions of
TEps q' -> go q' i stack s
TBranch q1 q2 -> go q1 i (write_linear_iovec stack s (q2, i)) (s + 1)
TCh ch q'
| bs `BS.indexMaybe` i == Just ch -> go q' (i + 1) stack s
| s == 0 -> pure Failure
| otherwise -> let (L.Ur (q'', i''), stack'') = read_linear_iovec stack (s - 1)
in go q'' i'' stack'' (s - 1)
Performance
- Generate dataset:
make aa.dat aaz.dat ab.dat abz.dat
. This is in line with the data used in the original article. - To run the benchmark:
make bench
for i in aa aaz ab abz;do ./nfa-perf.hs linear2 $i.dat +RTS -s;done
("linear2","aa.dat",Failure)
2,848,059,560 bytes allocated in the heap
19,400 bytes copied during GC
288,028,464 bytes maximum residency (2 sample(s))
1,394,896 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 619 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.007s 0.007s 0.0034s 0.0067s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.426s ( 0.425s elapsed)
GC time 0.007s ( 0.007s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 0.434s ( 0.434s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,692,589,459 bytes per MUT second
Productivity 98.2% of total user, 98.1% of total elapsed
("linear2","aaz.dat",SuccessAt 20000001)
608,060,256 bytes allocated in the heap
15,064 bytes copied during GC
288,028,464 bytes maximum residency (2 sample(s))
1,394,896 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 77 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.007s 0.007s 0.0035s 0.0070s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.281s ( 0.281s elapsed)
GC time 0.007s ( 0.007s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 0.289s ( 0.289s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 2,165,434,650 bytes per MUT second
Productivity 97.3% of total user, 97.3% of total elapsed
("linear2","ab.dat",Failure)
2,848,059,560 bytes allocated in the heap
19,416 bytes copied during GC
288,028,464 bytes maximum residency (2 sample(s))
1,394,896 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 621 colls, 0 par 0.001s 0.001s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.003s 0.003s 0.0017s 0.0034s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.350s ( 0.350s elapsed)
GC time 0.004s ( 0.004s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.355s ( 0.355s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 8,140,847,621 bytes per MUT second
Productivity 98.7% of total user, 98.6% of total elapsed
("linear2","abz.dat",SuccessAt 20000001)
1,728,060,256 bytes allocated in the heap
17,240 bytes copied during GC
288,028,464 bytes maximum residency (2 sample(s))
1,394,896 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 349 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.003s 0.003s 0.0016s 0.0031s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.288s ( 0.288s elapsed)
GC time 0.004s ( 0.004s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 0.292s ( 0.292s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,004,032,978 bytes per MUT second
Productivity 98.6% of total user, 98.6% of total elapsed
for i in aa aaz ab abz;do ./nfa-perf.hs st $i.dat +RTS -s;done
("st","aa.dat",Failure)
288,058,104 bytes allocated in the heap
9,544 bytes copied during GC
43,584 bytes maximum residency (1 sample(s))
30,144 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.007s 0.007s 0.0069s 0.0069s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.389s ( 0.389s elapsed)
GC time 0.007s ( 0.007s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.396s ( 0.396s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 740,982,909 bytes per MUT second
Productivity 98.1% of total user, 98.1% of total elapsed
("st","aaz.dat",SuccessAt 20000001)
288,058,800 bytes allocated in the heap
9,544 bytes copied during GC
43,584 bytes maximum residency (1 sample(s))
30,144 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.007s 0.007s 0.0066s 0.0066s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.280s ( 0.280s elapsed)
GC time 0.007s ( 0.007s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.287s ( 0.287s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 1,028,819,377 bytes per MUT second
Productivity 97.5% of total user, 97.5% of total elapsed
("st","ab.dat",Failure)
288,058,104 bytes allocated in the heap
9,544 bytes copied during GC
43,584 bytes maximum residency (1 sample(s))
30,144 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.003s 0.003s 0.0035s 0.0035s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.299s ( 0.299s elapsed)
GC time 0.003s ( 0.004s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.303s ( 0.303s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 962,118,372 bytes per MUT second
Productivity 98.7% of total user, 98.6% of total elapsed
("st","abz.dat",SuccessAt 20000001)
288,058,800 bytes allocated in the heap
9,544 bytes copied during GC
43,584 bytes maximum residency (1 sample(s))
30,144 bytes maximum slop
282 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.003s 0.003s 0.0034s 0.0034s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.268s ( 0.268s elapsed)
GC time 0.003s ( 0.003s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.272s ( 0.272s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 1,075,067,816 bytes per MUT second
Productivity 98.6% of total user, 98.5% of total elapsed
This is not a very scientifically rigorous experiment with many test runs, but to hint at the general performance difference:
aa* | aa*z | ab* | ab*z | |
---|---|---|---|---|
linear2 | 0.434s | 0.289s | 0.355s | 0.292s |
ST | 0.396s | 0.287s | 0.303s | 0.272s |
The good news is that there is almost no difference. If anything, the linear2 version seems to have a little overhead after many runs.
This should not be surprising since essentially, it is the same mutable vector wrapped in different APIs with different type safetiness.
Conclusion
For the use case of mutability, the main advantage of LinearTypes over ST monad is to allow the usage of linear-typ-safe API.
Arguably, some people may also think that being able to write monad/do-notation free code is an advantage. But that seems mostly syntactical.
The downside is that, currently, you are on your own, since no library for linear, mutable unboxed data has been built and widely distributed yet.
Furthermore, I personally believe that LinearTypes
offer many more use cases than providing a safe API for mutable data. I think stronger type safety should be its unique selling point. I am still exploring this, and I hope that I will be able to provide more inspiring examples in the future.
The code is available here