Need suggestions about using ghc as lib

Right now I’m examining what load :: GhcMonad m => LoadHowMuch -> m SuccessFlag does when freshly loading a target.

I reasoned with my brain until reaching its capacity. I couldn’t reason further after runPipeline' returns.

I thought, why not just modify the GHC source code and sprinkle some traceM calls around, so I can peek at the intermediate values? Hence, I built ghc-9.2.8 following the guidelines on GitLab. The build went smoothly after some adjustments (mainly, ghc.nix uses ghc96 as the boot GHC, which leaves ghc-9.2.8's dependencies unfulfilled).

However, things went wrong when I evaluated an expression after loading GHC in GHCi, following the “load ghc in ghci” instructions.

I got the following error:

Ok, 279 modules loaded.
ghci> :load GHC.Linker.Types
....loading
ghci> let t = uninitializedLoader
*** Exception: expectJust getLinkDeps
CallStack (from HasCallStack):
  error, called at compiler/GHC/Data/Maybe.hs:70:27 in ghc:GHC.Data.Maybe
  expectJust, called at compiler/GHC/Linker/Loader.hs:730:28 in ghc:GHC.Linker.Loader

I looked at some issues on GitLab and tried to replicate the fixes, but without success. Then I stumbled upon the third bullet here. From my understanding, it’s not possible to evaluate GHC expressions in GHCi. Is that true?

If that’s the case, how do ghc-as-lib people figure out what exactly each function does, other than by building a mental image from a given pre-state and reasoning along?

One idea just came to my mind: I could compile GHC with traceM calls sprinkled around, then run the function I would like to peek into in GHCi.

Is this overkill?
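
To make the idea concrete, here is a minimal sketch of the traceM technique on a small stand-in function. It uses only Debug.Trace from base; the function step is hypothetical and merely plays the role of whatever GHC-internal function would be instrumented.

```haskell
import Debug.Trace (traceM)

-- Hypothetical stand-in for a function inside GHC that we want to inspect.
step :: Int -> Maybe Int
step x = do
  -- traceM writes to stderr when this action is evaluated.
  -- Building the message with show requires a Show instance.
  traceM ("step: input = " ++ show x)
  let y = x * 2
  traceM ("step: intermediate = " ++ show y)
  pure (y + 1)

main :: IO ()
main = print (step 20)
```

The trace lines go to stderr, so they do not mix with the program's normal output on stdout.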

When I wanted to use GHC as a library, I stumbled upon this question too. These are the best things that I know of:

  • Read the NOTEs: The primary way to understand how GHC works is embedded in the Notes. (I think there’s an HLS plugin that helps you navigate the Notes more easily.)
  • GHC Commentary: There’s a lot to learn in the wiki in general.
  • GHC contributor’s workshop videos, and slides
  • Outgoing call hierarchy: it helps you get a better feel of who’s calling who and what’s happening in general.
  • -ddump-tc-trace, -ddump-rn-trace and other variants: In my opinion, you can learn a lot by just compiling with the ddump flags and leveraging the trace statements already in the source.
  • Matrix: If you couldn’t find the thing you were looking for, you can ask in the chat.
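
As a small illustration of the ddump suggestion, the flags can be switched on for a single module with an OPTIONS_GHC pragma instead of on the command line. This is only a sketch: the module and the function double are made up for the example, and the trace output (which can be very long) is printed while the module is being compiled, not when the program runs.

```haskell
{-# OPTIONS_GHC -ddump-rn-trace -ddump-tc-trace #-}
-- The pragma above asks GHC to print its renamer and typechecker
-- trace messages while compiling this module.
module Main where

-- Hypothetical function, here only to give the typechecker some work.
double :: Int -> Int
double x = x + x

main :: IO ()
main = print (double 21)
```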

Hej, Ei30metry, thanks for the suggestions. I really appreciate them and will definitely dig into those. I guess I will start by playing around with the flags; they seem to tackle my issue directly.

I regularly get confused, too, when trying to follow the chain of wrappers around command-line entry points. Fortunately I don’t often have to think about those, so what is it that you are trying to achieve?


Re: GHC-in-GHCi:

Unfortunately, I don’t think that the bytecode interpreter works when using hadrian/ghci (not even on a more recent version of GHC). I think this is deliberate; note that hadrian/ghci-cabal passes -fno-code to GHCi, which explains the linker error: it simply can’t run code that does not exist. (Granted, the error message could be better.) It does not help to stop passing -fno-code, or to pass -fobject-code instead; then I get a different linker error related to use of the FFI and the lack of a shared library defining ghc_unique_counter64.

In short, you currently can’t load GHC’s code base in GHCi. That seems ironic, but this is actually an improvement compared to a few years ago, when all you could do was run make to trigger type-checking of your changes, without any editor integration. I think there is continuous work to get GHCi to work on GHC, but there are major roadblocks related to cross compilation and Template Haskell that are difficult to remember. @bgamari or @angerman might know more.

Thanks for the clarification. I guess I was on the right track, at least on a surface level.

Right now, the current project performs parsing and typechecking twice when given a Haskell program. The first time is performed by calling load, whose return flag we ignore. The second time is performed by explicitly calling GHC.parseModule and GHC.typecheckModule. I thought that, since load has already performed both phases, it’s really unnecessary to do them again.

So I need to know what effects load performs, so that I can safely skip parsing and typechecking the second time.
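
For concreteness, the double work described above might look roughly like the following. This is a sketch against the ghc-9.2 API, not the project's actual code. The assumptions are that the GHC libdir (the output of ghc --print-libdir) and the source file are passed on the command line, and that the file defines a module named Main. Later GHC versions changed some of these signatures (guessTarget, for instance, takes an extra argument from 9.4 on).

```haskell
import GHC
import Control.Monad.IO.Class (liftIO)
import System.Environment (getArgs)

main :: IO ()
main = do
  -- Assumption: called as  prog <libdir> <file.hs>
  [libdir, file] <- getArgs
  runGhc (Just libdir) $ do
    -- Initialise the session with the default flags.
    dflags <- getSessionDynFlags
    _ <- setSessionDynFlags dflags
    target <- guessTarget file Nothing
    setTargets [target]
    -- First pass: load parses and typechecks every target.
    -- The SuccessFlag is ignored, as in the project described above.
    _ <- load LoadAllTargets
    -- Second pass: parse and typecheck the same module explicitly.
    -- Assumption: the file defines module Main.
    modSum <- getModSummary (mkModuleName "Main")
    parsed <- parseModule modSum
    _typechecked <- typecheckModule parsed
    liftIO (putStrLn "parsed and typechecked a second time")
```

Building this needs the ghc package as a dependency, in the same version as the compiler used to build it.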

I don’t think parseModule and typecheckModule do any recompilation checking. They go straight to parsing/type-checking the module. So even if you ran load before, it will just call hscParse' again.

I had a look at hscPipeline, which is a transitive callee somewhere deep in the call graph. It looks like this:

hscPipeline :: P m => PipeEnv ->  ((HscEnv, ModSummary, HscRecompStatus)) -> m (ModIface, HomeModLinkable)
hscPipeline pipe_env (hsc_env_with_plugins, mod_sum, hsc_recomp_status) = do
  case hsc_recomp_status of
    HscUpToDate iface mb_linkable -> return (iface, mb_linkable)
    HscRecompNeeded mb_old_hash -> do
      (tc_result, warnings) <- use (T_Hsc hsc_env_with_plugins mod_sum)
      hscBackendAction <- use (T_HscPostTc hsc_env_with_plugins mod_sum tc_result warnings mb_old_hash )
      hscBackendPipeline pipe_env hsc_env_with_plugins mod_sum hscBackendAction

Note the tc_result, which is passed on to the T_HscPostTc phase (whose result feeds hscBackendPipeline) but otherwise discarded. This makes me doubtful that the results of type-checking are cached by GHC itself, so perhaps it’s impossible to use load to get type-checked results. I suggest asking around on the #GHC Matrix channel to attract the attention of more GHC devs to help you.

Thanks again for the insights. I guess the dev team modified load's behavior after the release of ghc-9.2.8; no hscPipeline function exists in that version. They changed the code structure noticeably.

I agree with this: “I don’t think parseModule and typecheckModule do any recompilation checking.” I just want to avoid parsing and typechecking twice.

I succeeded with the traceM approach, although it depends heavily on whether the data has a Show instance. At least I can get a snapshot of the data when evaluation reaches that point.

FWIW, you will want to use the pprTrace* family of functions, which operate on Outputable (a pretty-printer-backed replacement for Show).
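
A minimal sketch of what that could look like, assuming ghc-9.2, where pprTrace is exported from GHC.Driver.Ppr (as far as I know it lives in GHC.Utils.Trace from 9.4 on, so the import may need adjusting). The function describe is hypothetical and only shows the call shape: a label, an SDoc built with ppr, and the value to return.

```haskell
import GHC.Driver.Ppr (pprTrace)  -- ghc-9.2; assumed to be GHC.Utils.Trace in 9.4+
import GHC.Utils.Outputable (ppr, text, (<+>))
import GHC.Unit.Module.Name (ModuleName, mkModuleName, moduleNameString)

-- Hypothetical function: traces its argument via Outputable, then
-- returns the module name as a plain String.
describe :: ModuleName -> String
describe m =
  pprTrace "describe" (text "module =" <+> ppr m) (moduleNameString m)

main :: IO ()
main = putStrLn (describe (mkModuleName "Data.List"))
```

Like traceM, this works without threading anything through the surrounding code, but it only needs an Outputable instance rather than Show.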

I don’t know how I would manage my project without you :grinning_face_with_smiling_eyes: