What speed-up to expect for a parallel program on 6 cores?

So the issue with -A64m seems relatively clear to me. The default value of -A is 1 MB (-A1m), and I checked the cache sizes of my CPU, an AMD Ryzen 5 3600:

Cache L1: 64K (per core)
Cache L2: 512K (per core)
Cache L3: 32MB (shared)

I guess the default in the GHC settings is geared towards common hardware. If I increase the allocation area size to much more than 4 MB, main memory will be used instead of cache. The allocation area size applies per job (per capability), i.e. “+RTS -N10 -A4m” implies 40 MB in total.
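To make the arithmetic concrete, here is a minimal sketch that multiplies the per-capability allocation area by the number of capabilities and compares the result with the shared L3 size listed above. The names `mib`, `totalNursery` and `fitsInCache` are made up for illustration, and the comparison is a rough assumption: it ignores everything else that competes for the cache (old generation, program data, other processes).

```haskell
module Main where

-- | One mebibyte in bytes.
mib :: Integer
mib = 1024 * 1024

-- | Total allocation area: one area of the given size per capability
-- (the value of -N multiplied by the value of -A).
totalNursery :: Integer -> Integer -> Integer
totalNursery capabilities perCapability = capabilities * perCapability

-- | Rough check: does the combined allocation area fit into a cache
-- of the given size? Ignores all other cache users (assumption).
fitsInCache :: Integer -> Integer -> Integer -> Bool
fitsInCache cacheBytes capabilities perCapability =
  totalNursery capabilities perCapability <= cacheBytes

main :: IO ()
main = do
  let l3 = 32 * mib -- shared L3 of the Ryzen 5 3600, as listed above
  mapM_ (report l3) [(10, 4), (6, 1), (6, 4), (6, 64)]
  where
    -- n is the -N value, a is the -A value in megabytes
    report l3 (n, a) =
      putStrLn $
        "-N" ++ show n ++ " -A" ++ show a ++ "m: "
          ++ show (totalNursery n (a * mib) `div` mib)
          ++ " MB total, fits in L3: "
          ++ show (fitsInCache l3 n (a * mib))
```

By this estimate, -N6 -A4m (24 MB) still fits into the 32 MB L3, while -N10 -A4m (40 MB) and -N6 -A64m (384 MB) do not.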

At around -A64m, garbage collection efficiency is highest, but it all takes place in main memory. With a small allocation area that still fits in cache, such as the default, GC efficiency is relatively low, but this is apparently outweighed by much faster memory access.
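One way to check this trade-off is to have the program report its own GC share for different -A values. The following is only a sketch using `GHC.Stats` from base. The workload is a placeholder, not the program discussed in this thread, `gcShare` is a name made up for illustration, and the statistics are only available when the program is started with `+RTS -T`.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Fraction of CPU time spent in GC, given GC and mutator CPU time
-- in nanoseconds. Returns 0 when no time has been recorded.
gcShare :: Integer -> Integer -> Double
gcShare gcNs mutNs
  | total == 0 = 0
  | otherwise = fromIntegral gcNs / fromIntegral total
  where
    total = gcNs + mutNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Run with +RTS -T to enable RTS statistics."
    else do
      -- Placeholder workload (assumption): allocate and sum a list.
      _ <- evaluate (foldl' (+) 0 [1 .. 5000000 :: Integer])
      s <- getRTSStats
      putStrLn $ "Number of GCs: " ++ show (gcs s)
      putStrLn $
        "GC share of CPU time: "
          ++ show
            ( gcShare
                (fromIntegral (gc_cpu_ns s))
                (fromIntegral (mutator_cpu_ns s))
            )
```

Running the same binary with, say, `+RTS -T -A1m` and `+RTS -T -A64m` should show fewer collections and a lower GC share at the larger setting. Whether total run time improves is a separate question, since the GC share says nothing about how fast the mutator's memory accesses are.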

The same behavior can be observed on my laptop with an Intel CPU.

I will post the minimal example here shortly, so others can quickly reproduce the issue.
