What speed-up should I expect for a parallel program on 6 cores?

This is my result. scheduler/traverseConcurrently performs about the same as async/mapConcurrently, with a slight edge at nj >= 10. It does follow the general trend, though, and is still being outperformed by separate OS processes (each running single-threaded GHC code).

Do you have an idea of which parameters I could tweak? I found the optimal GC allocation area size (the -A parameter) to be 2 MB, close to the default value. Other parameters haven’t made a difference yet. I still don’t think this speed-up is satisfying.
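
In case it helps narrow things down, here is a minimal sketch of how the GC's share of the run time could be checked from inside the program, using only base (GHC.Conc and GHC.Stats). The names gcShare and reportRts are made up for illustration, and the sketch assumes the program is linked with -threaded and -rtsopts and started with something like +RTS -N6 -A2m -T; without -T the statistics are not collected.

```haskell
import GHC.Conc (getNumCapabilities)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Fraction of elapsed time spent in GC, given two counters in nanoseconds.
--   (Hypothetical helper, not part of any library.)
gcShare :: Integral a => a -> a -> Double
gcShare gcNs totalNs
  | totalNs <= 0 = 0
  | otherwise    = fromIntegral gcNs / fromIntegral totalNs

-- | Print the number of capabilities (what -N actually resolved to) and,
--   if RTS statistics are enabled (+RTS -T), the GC share of elapsed time.
reportRts :: IO ()
reportRts = do
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      putStrLn ("number of GCs: " ++ show (gcs s))
      putStrLn ("GC share of elapsed time: "
                ++ show (gcShare (gc_elapsed_ns s) (elapsed_ns s)))
    else putStrLn "RTS stats disabled; run with +RTS -T"

main :: IO ()
main = reportRts
```

Calling reportRts at the end of the real workload, rather than from an empty main as here, would show whether the GC share grows with the number of capabilities. If it does, GC would be a plausible suspect for the gap to the multi-process runs.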