"Modularizing GHC" paper

Thanks for the paper! There is much good stuff in it.

Here are some thoughts.

  • Although GHC is complex, hard to modify, and often ill-structured, I think it is also pretty good in many ways, including:

    • It is a thirty-year-old piece of software that is still in a state of rapid development – and in its core pieces too, not just tinkering about the edges. I think that’s a testament both to Haskell’s static type system, which supports fearless large-scale refactoring (as the paper says), and to a coherent overall design and a decades-long process of relentless refactoring. Every bit has been rewritten, often more than once!

    • Many of the internal structures, notably the design of Core and STG, have been remarkably stable over time.

    • GHC has absorbed huge changes in its core mission: the design of Haskell itself. That kind of thing can lead to a huge accumulated technical debt, but in GHC’s case much of it (not all, I will be the first to agree) has been paid as we have gone along.

    All that said, the kind of refactoring described in this paper is warmly to be welcomed.

  • The paper makes many good points, but I think they all come down to two, in the end (I say this partly to see if others want to suggest other key points):

    1. Reduce coupling between components
    2. Don’t pass junk; that is, functions should not take arguments that they ignore.
  • The paper elaborates on (2) at length (wrt DynFlags). But one merit of (2) that is not really discussed is its impact on (1). If many modules take a DynFlags input, then they must depend on the DynFlags type, which in turn depends on all the data types in DynFlags. By narrowing the inputs to a function, you narrow the set of types that the function depends on, and hence its (transitive) module dependencies, i.e. its coupling. Less junk leads to less coupling, and this may be the most important merit of (2).

  • So perhaps (1) “reduce coupling” is ultimately the main message of the paper, and indeed Section 5 opens with a plan that is solely about reducing coupling.

  • I note (in 5.2.1) that the paper recommends building a smaller version of DynFlags for (in this case) the Cmm code generation. That’s ok, but it cuts against the recommendations of 4.2.1. It would be nice to acknowledge the tension here, and that it is no fun to define a fresh data type for each function, containing only the fields that function needs. (Or, I suppose, a zillion implicit parameters.) Essentially I think there is implicit agreement here that the DynFlags/HscEnv pattern is OK (desirable, even), but only at a smaller scale: within one “component”, whatever that is, rather than the entire software system. I don’t know what a general guidance rule might be.

  • One big thing that the paper doesn’t discuss at all is the design of the API of the GHC library described in Section 2. The GHC library API has evolved based on what GHC happened to do, rather than being designed around what would make sense to clients. Moreover, to get their job done, clients often call functions defined deep in GHC’s innards that were never specifically designed for external use.

    I have always thought that GHC-as-a-library would benefit from a thoughtful, client-led redesign that starts from a blank sheet of paper, sketches a design that makes sense to a client, and then refines that design in the light of the many complexities dealt with by the current API. Good library design is hard!

  • Even on the coupling story, I’d love to see the next layer of detail on what needs to be done, very concretely. For example, what do we need to do to uncouple the parser from the rest of the compiler? Ditto for Language.Haskell.Syntax, which should in principle be separable into its own package. Maybe identifying sub-goals like these, and then giving a checklist of what needs to be done for each, would help to unlock volunteer effort?
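
To make the earlier point about (2) feeding into (1) a bit more concrete, here is a minimal Haskell sketch. Everything in it is a hypothetical stand-in chosen for illustration: the cut-down DynFlags record and its fields, CodeGenConfig, initCodeGenConfig, alignBefore and alignAfter are not GHC's actual definitions. It shows a function that takes the whole flags record but reads one field, the same function with a narrowed input, and a small per-component config record of the kind 5.2.1 suggests.

```haskell
module Main where

-- Hypothetical, heavily cut-down stand-in for a large flags record.
-- The real DynFlags has far more fields and pulls in many other types.
data DynFlags = DynFlags
  { optLevel       :: Int
  , targetWordSize :: Int        -- in bytes
  , warnUnused     :: Bool
  , includePaths   :: [FilePath]
  }

-- Round n up to the next multiple of w (assumes w > 0).
roundUp :: Int -> Int -> Int
roundUp w n = ((n + w - 1) `div` w) * w

-- "Passing junk": the signature mentions DynFlags, so any module that
-- defines this function must depend on the module defining DynFlags,
-- even though only one field is ever read.
alignBefore :: DynFlags -> Int -> Int
alignBefore dflags n = roundUp (targetWordSize dflags) n

-- Narrowed input: the signature mentions only Int, so this function
-- could live in a module with no dependency on DynFlags at all.
alignAfter :: Int -> Int -> Int
alignAfter wordSize n = roundUp wordSize n

-- Hypothetical per-component record: the same "bag of settings" pattern,
-- but scoped to one component rather than the whole compiler.
data CodeGenConfig = CodeGenConfig
  { cgWordSize :: Int
  , cgOptLevel :: Int
  }

-- The only place that needs to know about both types. In a real code
-- base this would sit in a driver-side module, outside the component.
initCodeGenConfig :: DynFlags -> CodeGenConfig
initCodeGenConfig dflags = CodeGenConfig
  { cgWordSize = targetWordSize dflags
  , cgOptLevel = optLevel dflags
  }

main :: IO ()
main = do
  let dflags = DynFlags
        { optLevel = 1, targetWordSize = 8
        , warnUnused = True, includePaths = [] }
      cfg = initCodeGenConfig dflags
  print (alignBefore dflags 13)            -- whole record passed in
  print (alignAfter (cgWordSize cfg) 13)   -- only the needed value passed in
```

In a single file the reduction in coupling is only visible in the type signatures. If these definitions were split across modules, alignAfter and anything taking CodeGenConfig would no longer need to import the module that defines DynFlags; only initCodeGenConfig would.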
