I thought I’d write up how you can use LateCCPlugin to get more meaningful profiling for GHC 9.2.
Currently (GHC 9.2), profiling cost centres are inserted before optimisation happens. This is bad because the cost centres get in the way of optimisations, so the profiled program no longer performs like the unprofiled one. GHC 9.4 will fix this with an option similar to -fprof-late, but it is not yet available in 9.2. Andreas Klebinger is working on it, and he created a plugin to get similar behaviour on GHC 9.2 and possibly earlier.
You need to enable profiling for all dependencies, e.g. with the following snippet in cabal.project:
profiling: true
profiling-detail: none
The profiling-detail: none setting is necessary because if the automatic SCCs are inserted, they prevent most optimisations. We want the plugin to insert SCCs after optimisation.
Then, add ghc-options: -fplugin=LateCCPlugin to the component that you want to profile, and make sure -rtsopts is on. I prefer -rtsopts over -with-rtsopts because it means you don’t need to rebuild if you want to run without profiling.
You must have the profiled component depend on the plugin: build-depends: late-cc-plugin.
The ghc-options can be set in either the cabal.project file or the cabal file stanza itself.
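Put together, the component’s stanza in the .cabal file might look like the sketch below. The executable name my-app, the Main.hs entry point and the base dependency are placeholders for illustration, not taken from a real project; only the late-cc-plugin dependency and the two GHC flags come from the steps above.

```cabal
-- Hypothetical executable; replace my-app with the component you want to profile.
executable my-app
    main-is:          Main.hs
    build-depends:    base
                    , late-cc-plugin
    -- Insert cost centres after optimisation, and allow +RTS flags at run time.
    ghc-options:      -fplugin=LateCCPlugin -rtsopts
    default-language: Haskell2010
```

If you would rather keep the flags out of the .cabal file, a package stanza in cabal.project with the same ghc-options line should work as well; the build-depends entry still has to live in the .cabal file.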
The plugin isn’t on Hackage yet, but you can use a source-repository-package stanza in cabal.project to make Cabal find it:
source-repository-package
    type: git
    tag: 4b02365f1daeab0fa93dbc7f14e72ba8952376e0
    location: git@github.com:AndreasPK/late-cc-plugin.git
    subdir: late-cc-plugin
(You can do a similar thing with Stack.)
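The location above uses the SSH form of the GitHub URL, which needs an SSH key registered with GitHub. As a sketch, assuming the repository is publicly readable and you would rather clone anonymously, the same stanza can point at the HTTPS URL of the same repository instead:

```cabal
-- Same repository and commit as above, fetched over HTTPS instead of SSH.
source-repository-package
    type: git
    location: https://github.com/AndreasPK/late-cc-plugin.git
    tag: 4b02365f1daeab0fa93dbc7f14e72ba8952376e0
    subdir: late-cc-plugin
```

Only the location line differs; the pinned commit and subdirectory stay the same.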
When you build your component now, check that the plugin gets built as well.
After the build, run your program, but add +RTS -p -l-au -RTS to the command line. In -l-au, the -a turns off all event classes and the u turns “user events” back on. These are the only events we need, and leaving out the rest matters because the eventlog can otherwise grow very large. (Docs for RTS flags: see the GHC User’s Guide.)
Now a file with the .eventlog extension should have been written. You can convert this file to a format that speedscope can render using hs-speedscope.
Finally, you can view the converted file at https://www.speedscope.app/.
Thanks to amesgen and Andreas Klebinger for helping me figure this out.
Here is a screenshot of the speedscope: