So the issue with -A64m seems relatively clear to me. The default value of -A is 1 megabyte (-A1m), and I checked the cache sizes of my CPU, an AMD Ryzen 5 3600:
| Cache | Size |
| --- | --- |
| L1 | 64K (per core) |
| L2 | 512K (per core) |
| L3 | 32MB (shared) |
I guess the default in the GHC settings is oriented toward common hardware. If I increase the allocation area size to much more than 4 MB, allocation happens in main memory instead of in cache. Note that the allocation area size is per capability, i.e. +RTS -N10 -A4m means 10 × 4 MB = 40 MB in total.
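For concreteness, this is how the settings can be compared (the program name Repro is just a placeholder); +RTS -s makes the runtime print GC statistics, so the tradeoff below is measurable:

```
$ ghc -O2 -threaded -rtsopts Repro.hs
$ ./Repro +RTS -N10 -A1m -s    # 10 × 1 MB  = 10 MB  of nursery in total
$ ./Repro +RTS -N10 -A4m -s    # 10 × 4 MB  = 40 MB  of nursery in total
$ ./Repro +RTS -N10 -A64m -s   # 10 × 64 MB = 640 MB of nursery in total
```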
At around -A64m, garbage collection efficiency is highest (collections are rare), but allocation takes place entirely in main memory. At around the default -A1m, GC efficiency is relatively low (minor collections are frequent), but that is apparently outweighed by the much faster cache access.
The same behavior can be observed on my laptop with an Intel CPU.
I will shortly present a minimal example here, so that others can quickly reproduce the issue.
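In the meantime, here is a rough sketch of the kind of allocation-heavy program that shows the effect (the sizes and iteration counts are placeholders, not my exact benchmark):

```haskell
-- Repro.hs: churn through lots of short-lived heap allocation so that
-- the nursery size (-A) dominates runtime behaviour.
import qualified Data.Map.Strict as Map

-- Each step builds and immediately discards a fresh map, producing
-- plenty of garbage for the minor collector.
step :: Int -> Int
step n = Map.size (Map.fromList [ (i, i) | i <- [1 .. n] ])

main :: IO ()
main = print (sum (map step (replicate 200 200000)))
```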