I thought I’d write up how you can use LateCCPlugin to get more meaningful profiling for GHC 9.2.
Currently (GHC 9.2), cost centres are inserted before optimisation happens. This is bad because the cost centres get in the way of the optimiser, and you’d want profiling not to influence the performance of your program. GHC 9.4 will fix this with the -fprof-late option, but that is not available in 9.2. Andreas Klebinger, who is working on it, created a plugin that gives similar behaviour on GHC 9.2 and possibly earlier versions.
You need to enable profiling for all dependencies using e.g. the following snippet in cabal.project:
profiling: true
profiling-detail: none

Setting profiling-detail: none is necessary because otherwise the automatic SCCs (cost centres) are inserted before optimisation, and they block much of it. We want the plugin to insert SCCs after optimisation instead.
Add ghc-options: -fplugin=LateCCPlugin to the component that you want to profile, and make sure -rtsopts is on. I prefer -rtsopts over baking the RTS flags into the binary with -with-rtsopts, because it means you don’t need to rebuild if you want to run without profiling.
The profiled component must also depend on the plugin package, i.e. list it in its build-depends.
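For a hypothetical executable called my-program, the relevant parts of the component stanza could look like the sketch below. The component name and Main.hs are placeholders, and I’m assuming the plugin’s package is named late-cc-plugin, after the repository’s subdirectory.

```cabal
executable my-program
  main-is:          Main.hs
  default-language: Haskell2010
  -- late-cc-plugin: package name assumed from the repository's subdir
  build-depends:    base, late-cc-plugin
  -- load the plugin, and allow RTS flags such as -p on the command line
  ghc-options:      -fplugin=LateCCPlugin -rtsopts
```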
These ghc-options can be set either in the cabal.project file or in the component’s stanza in the .cabal file itself.
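Setting them from cabal.project instead might look like this sketch, where my-package is a placeholder for the name of the package that contains the profiled component. A package stanza applies its ghc-options to every component of that package, and the dependency on the plugin still has to be declared in the .cabal file.

```cabal
-- cabal.project fragment (sketch); my-package is a placeholder
package my-package
  ghc-options: -fplugin=LateCCPlugin -rtsopts
```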
The plugin isn’t on Hackage yet, but you can use a source-repository-package stanza in cabal.project to make Cabal find it:
source-repository-package
  type: git
  tag: 4b02365f1daeab0fa93dbc7f14e72ba8952376e0
  location: git@github.com:AndreasPK/late-cc-plugin.git
  subdir: late-cc-plugin
(You can do something similar with Stack.)
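With Stack, the rough equivalent is a git entry under extra-deps in stack.yaml. A sketch, assuming the repository is reachable over HTTPS at the GitHub URL below:

```yaml
# stack.yaml fragment (sketch): build late-cc-plugin from git at a fixed commit
extra-deps:
  - git: https://github.com/AndreasPK/late-cc-plugin.git
    commit: 4b02365f1daeab0fa93dbc7f14e72ba8952376e0
    subdirs:
      - late-cc-plugin
```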
Now build your component, and check that the plugin gets built along with it.
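After the build, run your program with +RTS -p -l-au -RTS added to its command line. Here -p turns on cost-centre profiling, and -l-au enables the eventlog, disables all event classes (a) and then re-enables only user events (u). User events are the only ones we need, and leaving the other classes out matters because the eventlog can grow very large. (See the GHC user’s guide for the documentation of the RTS flags.)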
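If you do this often, a small shell function saves retyping the flags. A minimal sketch: run_profiled and ./my-program are hypothetical names, and the binary is assumed to have been linked with -rtsopts.

```shell
# Hypothetical helper: run a profiled Haskell program with cost-centre
# profiling (-p) and an eventlog containing only user events (-l-au).
run_profiled() {
  # "$@" is the program plus its own arguments; the GHC RTS consumes
  # everything between +RTS and -RTS, so the program never sees it.
  "$@" +RTS -p -l-au -RTS
}

# Example: run_profiled ./my-program --some-arg
```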
A file with the .eventlog extension should now have been written. You can convert it into a format that speedscope can render using hs-speedscope.
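The conversion can be scripted as well. This sketch assumes hs-speedscope is on your PATH (for example after cabal install hs-speedscope) and runs it on every .eventlog file in the current directory; convert_eventlogs and the HS_SPEEDSCOPE override variable are hypothetical, the latter only there so you can point it at a different binary.

```shell
# Hypothetical helper: run hs-speedscope on each .eventlog file in the
# current directory. HS_SPEEDSCOPE can override the command that is called.
convert_eventlogs() {
  for f in ./*.eventlog; do
    # If nothing matches, the glob stays literal; skip that case.
    [ -e "$f" ] || continue
    "${HS_SPEEDSCOPE:-hs-speedscope}" "$f"
  done
}
```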
Finally, open the converted file at https://www.speedscope.app/ to view the profile; it should look something like the screenshot at the end of this post.
Thanks to amesgen and Andreas Klebinger for helping me figure this out.
[Screenshot: the resulting profile in speedscope]