Memory performance when reading large files

Here are my test results:

GHC      GC         peak mem   runtime
8.10.7   copying    1.13 GB    8.9s
8.10.7   nonmoving  1.91 GB    9.4s
9.0.1    copying    1.12 GB    9.0s
9.0.1    nonmoving  1.47 GB    9.4s
9.2.1    copying    1.23 GB    8.9s
9.2.1    nonmoving  1.57 GB    9.5s
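The table can also be read in relative terms. The following sketch computes how much more peak memory and runtime the nonmoving collector used compared with the copying collector, for each compiler. The values are copied from the table. The names peakMem, runtime and overheadPct are illustrative only.

```haskell
-- Rows of the table: (GHC version, copying value, nonmoving value).
-- Peak memory is in GB and runtime in seconds, as in the table.
peakMem, runtime :: [(String, Double, Double)]
peakMem = [("8.10.7", 1.13, 1.91), ("9.0.1", 1.12, 1.47), ("9.2.1", 1.23, 1.57)]
runtime = [("8.10.7", 8.9, 9.4), ("9.0.1", 9.0, 9.4), ("9.2.1", 8.9, 9.5)]

-- Percentage by which the nonmoving figure exceeds the copying one,
-- rounded to the nearest whole percent.
overheadPct :: Double -> Double -> Integer
overheadPct copying nonmoving = round (100 * (nonmoving / copying - 1))

main :: IO ()
main = do
  mapM_ (\(v, c, n) -> putStrLn (v ++ " peak mem: +" ++ show (overheadPct c n) ++ "%")) peakMem
  mapM_ (\(v, c, n) -> putStrLn (v ++ " runtime:  +" ++ show (overheadPct c n) ++ "%")) runtime
```

With these numbers the program prints +69%, +31% and +28% for peak memory and +6%, +4% and +7% for runtime.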

The updated code is on GitHub.

The nonmoving garbage collector didn’t help improve performance: with every GHC version it used more peak memory and ran slightly slower than the copying collector. Note that the nonmoving GC requires compiling with -threaded. Switching to bytestring > 0.11.1 did save me about 200 MB.
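Below is a minimal sketch of how to build and inspect such a run. It makes a few assumptions:

- The program is a single module, Main.hs.
- The input file name big.txt is hypothetical.
- The workload (lazily reading a file and counting its newlines) is a placeholder, not the program from the repository.

Two library facilities do the checking. `rtsSupportsBoundThreads` from Control.Concurrent reports whether the threaded RTS was linked in. GHC.Stats exposes many of the same counters that `+RTS -s` prints, but only when statistics are enabled, for example with `+RTS -T`.

```haskell
-- Hypothetical build and run commands (adjust to your project):
--   ghc -O2 -threaded -rtsopts Main.hs
--   ./Main big.txt +RTS --nonmoving-gc -T -RTS
module Main (main) where

import Control.Concurrent (rtsSupportsBoundThreads)
import Control.Monad (unless)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- Convert a byte count to MiB, the unit of "total memory in use".
toMiB :: Word64 -> Double
toMiB b = fromIntegral b / (1024 * 1024)

main :: IO ()
main = do
  -- True only when the program was linked against the threaded RTS.
  unless rtsSupportsBoundThreads $
    hPutStrLn stderr "warning: not linked with -threaded"

  -- Placeholder workload (an assumption, not the original program):
  -- stream a file lazily and count its newlines.
  args <- getArgs
  case args of
    [path] -> BL.readFile path >>= print . BL.count '\n'
    _ -> hPutStrLn stderr "usage: Main FILE"

  -- getRTSStats fails unless statistics were enabled, e.g. with +RTS -T.
  enabled <- getRTSStatsEnabled
  if not enabled
    then hPutStrLn stderr "run with +RTS -T to collect RTS statistics"
    else do
      s <- getRTSStats
      putStrLn ("allocated in heap:  " ++ show (allocated_bytes s) ++ " bytes")
      putStrLn ("copied during GC:   " ++ show (copied_bytes s) ++ " bytes")
      putStrLn ("max residency:      " ++ show (max_live_bytes s) ++ " bytes")
      putStrLn ("peak memory in use: " ++ show (toMiB (max_mem_in_use_bytes s)) ++ " MiB")
```

To get the copying baseline, replace --nonmoving-gc with --copying-gc in the run command.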

ghc-8.10.7, --copying-gc

  10,854,192,904 bytes allocated in the heap
   4,168,751,200 bytes copied during GC
     483,539,712 bytes maximum residency (12 sample(s))
       2,217,216 bytes maximum slop
            1099 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10286 colls,     0 par    1.849s   1.851s     0.0002s    0.0007s
  Gen  1        12 colls,     0 par    1.014s   1.014s     0.0845s    0.2666s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.700s  (  4.746s elapsed)
  GC      time    2.863s  (  2.866s elapsed)
  EXIT    time    0.000s  (  0.008s elapsed)
  Total   time    7.563s  (  7.620s elapsed)

  Alloc rate    2,309,560,431 bytes per MUT second

  Productivity  62.1% of total user, 62.3% of total elapsed
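For all three copying runs, the Productivity line matches MUT time divided by Total time. That holds for both user time and elapsed time. Below is a small sketch that reproduces the 8.10.7 figures from the numbers above. The helper names are illustrative.

```haskell
-- Productivity as the share of total time spent in the mutator (MUT).
productivity :: Double -> Double -> Double
productivity mut total = 100 * mut / total

-- Round to one decimal place, as the RTS summary prints it.
round1 :: Double -> Double
round1 x = fromIntegral (round (x * 10) :: Integer) / 10

main :: IO ()
main = do
  print (round1 (productivity 4.700 7.563)) -- user time: 62.1
  print (round1 (productivity 4.746 7.620)) -- elapsed time: 62.3
```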

ghc-8.10.7, --nonmoving-gc

  10,854,191,096 bytes allocated in the heap
   3,014,050,232 bytes copied during GC
   1,359,623,712 bytes maximum residency (35 sample(s))
18,446,744,073,709,367,944 bytes maximum slop
            2253 MiB total memory in use (436 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10263 colls,     0 par    2.946s   2.949s     0.0003s    0.0011s
  Gen  1        35 colls,     0 par    0.041s   0.041s     0.0012s    0.0148s
  Gen  1        35 syncs,                       0.040s     0.0012s    0.0261s
  Gen  1      concurrent,              3.057s   6.547s     0.1871s    2.2230s

  TASKS: 38 (35 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    5.292s  (  5.338s elapsed)
  GC      time    2.987s  (  2.990s elapsed)
  CONC GC time    3.057s  (  6.547s elapsed)
  EXIT    time    0.000s  (  0.002s elapsed)
  Total   time   11.335s  (  8.330s elapsed)

  Alloc rate    2,051,224,999 bytes per MUT second

  Productivity  73.6% of total user, 64.1% of total elapsed
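With --nonmoving-gc the summary gains a CONC GC row, and the two totals are accounted differently:

- The user-time Total (11.335s) sums all rows, including CONC GC, up to rounding.
- The elapsed Total (8.330s) is only INIT + MUT + GC + EXIT, since the concurrent marking runs alongside the mutator.

The reported productivities fit the same pattern. For user time, only the stop-the-world GC (plus INIT and EXIT) counts as unproductive. For elapsed time, productivity is MUT / Total. This is an inference from the 8.10.7 numbers above, not taken from the RTS source. The sketch below checks it; the record and function names are hypothetical.

```haskell
-- Timing rows (seconds) from the ghc-8.10.7 --nonmoving-gc summary.
data Times = Times
  { initT, mutT, gcT, concT, exitT, totalT :: Double }

userRun, elapsedRun :: Times
userRun    = Times 0.000 5.292 2.987 3.057 0.000 11.335
elapsedRun = Times 0.000 5.338 2.990 6.547 0.002 8.330

-- Round to one decimal place, as the RTS summary prints it.
round1 :: Double -> Double
round1 x = fromIntegral (round (x * 10) :: Integer) / 10

-- User time: concurrent-mark CPU time stays on the productive side;
-- only stop-the-world GC, INIT and EXIT are subtracted.
userProductivity :: Times -> Double
userProductivity t = 100 * (totalT t - gcT t - initT t - exitT t) / totalT t

-- Elapsed time: plain MUT / Total.
elapsedProductivity :: Times -> Double
elapsedProductivity t = 100 * mutT t / totalT t

-- Wall-clock sum of the non-concurrent phases (CONC GC overlaps them).
elapsedSum :: Times -> Double
elapsedSum t = initT t + mutT t + gcT t + exitT t

main :: IO ()
main = do
  print (round1 (userProductivity userRun))       -- 73.6
  print (round1 (elapsedProductivity elapsedRun)) -- 64.1
  print (abs (elapsedSum elapsedRun - totalT elapsedRun) < 0.001) -- True
```

The same relations hold for the 9.0.1 and 9.2.1 nonmoving runs.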

ghc-9.0.1, --copying-gc

  11,142,965,792 bytes allocated in the heap
   4,160,725,696 bytes copied during GC
     481,143,792 bytes maximum residency (12 sample(s))
       2,229,264 bytes maximum slop
            1095 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10566 colls,     0 par    1.917s   1.920s     0.0002s    0.0008s
  Gen  1        12 colls,     0 par    1.017s   1.017s     0.0847s    0.2659s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.750s  (  4.797s elapsed)
  GC      time    2.934s  (  2.937s elapsed)
  EXIT    time    0.001s  (  0.006s elapsed)
  Total   time    7.685s  (  7.740s elapsed)

  Alloc rate    2,345,705,836 bytes per MUT second

  Productivity  61.8% of total user, 62.0% of total elapsed

ghc-9.0.1, --nonmoving-gc

  11,142,964,104 bytes allocated in the heap
   3,006,568,272 bytes copied during GC
     994,390,544 bytes maximum residency (38 sample(s))
18,446,744,073,708,941,944 bytes maximum slop
            1680 MiB total memory in use (4 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10540 colls,     0 par    2.944s   2.948s     0.0003s    0.0015s
  Gen  1        38 colls,     0 par    0.026s   0.026s     0.0007s    0.0019s
  Gen  1        38 syncs,                       0.022s     0.0006s    0.0083s
  Gen  1      concurrent,              3.136s   6.654s     0.1751s    1.2637s

  TASKS: 41 (38 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.362s  (  5.409s elapsed)
  GC      time    2.971s  (  2.974s elapsed)
  CONC GC time    3.136s  (  6.654s elapsed)
  EXIT    time    0.000s  (  0.007s elapsed)
  Total   time   11.470s  (  8.390s elapsed)

  Alloc rate    2,077,983,578 bytes per MUT second

  Productivity  74.1% of total user, 64.5% of total elapsed

ghc-9.2.1, --copying-gc

  11,142,998,064 bytes allocated in the heap
   4,018,312,024 bytes copied during GC
     539,641,000 bytes maximum residency (11 sample(s))
       2,390,872 bytes maximum slop
            1197 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2635 colls,     0 par    1.877s   1.878s     0.0007s    0.0032s
  Gen  1        11 colls,     0 par    0.988s   0.988s     0.0898s    0.3087s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.729s  (  4.779s elapsed)
  GC      time    2.864s  (  2.866s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time    7.594s  (  7.650s elapsed)

  Alloc rate    2,356,120,004 bytes per MUT second

  Productivity  62.3% of total user, 62.5% of total elapsed
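Comparing the three copying runs, 9.2.1 does about a quarter as many Gen 0 collections as 8.10.7 and 9.0.1 (2635 vs. 10286 and 10566) for roughly the same allocation. Dividing bytes allocated by the Gen 0 collection count gives the average allocation between minor collections. That comes to about 1 MiB for 8.10.7 and 9.0.1 and about 4 MiB for 9.2.1. This is consistent with a larger default allocation area (-A) in 9.2.1, though that is an inference from the numbers, not something these runs report. Below is a sketch of the calculation; the names are illustrative.

```haskell
-- Average bytes allocated per Gen 0 (minor) collection.
bytesPerGen0 :: Integer -> Integer -> Integer
bytesPerGen0 allocated gen0Colls = allocated `div` gen0Colls

-- Express a byte count in MiB.
toMiB :: Integer -> Double
toMiB b = fromIntegral b / (1024 * 1024)

-- (GHC version, bytes allocated in the heap, Gen 0 collections),
-- copied from the copying-GC summaries.
runs :: [(String, Integer, Integer)]
runs =
  [ ("8.10.7", 10854192904, 10286)
  , ("9.0.1", 11142965792, 10566)
  , ("9.2.1", 11142998064, 2635)
  ]

main :: IO ()
main = mapM_ report runs
  where
    report (ghc, alloc, colls) = do
      let perColl = bytesPerGen0 alloc colls
      putStrLn (ghc ++ ": " ++ show perColl ++ " bytes (" ++ show (toMiB perColl) ++ " MiB)")
```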

ghc-9.2.1, --nonmoving-gc

  11,142,996,392 bytes allocated in the heap
   2,935,283,184 bytes copied during GC
   1,059,977,008 bytes maximum residency (29 sample(s))
18,446,744,073,708,492,376 bytes maximum slop
            1802 MiB total memory in use (342 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2617 colls,     0 par    3.008s   3.010s     0.0011s    0.0056s
  Gen  1        29 colls,     0 par    0.072s   0.072s     0.0025s    0.0097s
  Gen  1        29 syncs,                       0.099s     0.0034s    0.0536s
  Gen  1      concurrent,              2.818s   6.257s     0.2158s    1.1607s

  TASKS: 33 (30 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    5.300s  (  5.350s elapsed)
  GC      time    3.081s  (  3.082s elapsed)
  CONC GC time    2.818s  (  6.257s elapsed)
  EXIT    time    0.001s  (  0.008s elapsed)
  Total   time   11.201s  (  8.440s elapsed)

  Alloc rate    2,102,289,662 bytes per MUT second

  Productivity  72.5% of total user, 63.4% of total elapsed
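All three --nonmoving-gc runs report a "maximum slop" of about 1.8 × 10^19 bytes, which is far beyond any real heap. Each value is 2^64 minus a small number, which is exactly how a small negative quantity appears when printed as an unsigned 64-bit integer. So these figures are a reporting artifact, not real measurements. The sketch below reinterprets the printed values as signed 64-bit integers; the names are illustrative.

```haskell
import Data.Int (Int64)
import Data.Word (Word64)

-- "maximum slop" values printed by the three --nonmoving-gc runs.
reportedSlop :: [Word64]
reportedSlop =
  [ 18446744073709367944 -- ghc-8.10.7
  , 18446744073708941944 -- ghc-9.0.1
  , 18446744073708492376 -- ghc-9.2.1
  ]

-- Reinterpret the same 64 bits as a two's-complement signed integer;
-- fromIntegral from Word64 to Int64 wraps modulo 2^64.
asSigned :: Word64 -> Int64
asSigned = fromIntegral

main :: IO ()
main = mapM_ (print . asSigned) reportedSlop
-- prints -183672, -609672 and -1059240 on separate lines
```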