I’ve been interested in experimenting with compiler backends, and Cmm seems like a promising target to explore.
I’ve been reading the GHC wiki, which provides a wealth of information. However, the documentation is somewhat fragmented, containing comments on potential future developments that may not have materialized, along with some outdated information.
To gain a better understanding, I’ve been compiling small Haskell examples and analyzing their corresponding Cmm output. Through this process, I encountered a possible contradiction in the wiki documentation. The wiki states that Cmm does not support function calls with parameters—function calls are merely jumps, and parameters are explicitly passed via STG registers. While traditional function calls are syntactic sugar over this mechanism, handwritten Cmm can choose whether to use it.
However, it appears that GHC does generate Cmm with function calls. For example, consider the following Haskell code:
module Example where
sumOverArray :: [Int] -> Int
sumOverArray (x:xs) = x + sumOverArray xs
sumOverArray [] = 0
Looking at the generated Cmm (as seen on Compiler Explorer), parameters are assigned to registers and then used in function calls, such as:
call sumOverArray_info(R2)
This seems to contradict the wiki’s claim. I observed similar behavior when compiling examples on a fresh local installation of GHC. Has there been a change in how GHC handles function calls in Cmm, or is there some nuance I’m missing?
This is just notation to list registers that should be kept alive to make the call (indeed to pass arguments). The registers are determined by the callee and the calling convention. I.e. you can’t decide to “pass” R3 instead of R2 to sumOverArray_info.
GHC documents these registers as follows (cml_args_regs field of CmmCall):
-- The argument GlobalRegs (Rx, Fx, Dx, Lx) that are passed
-- to the call. This is essential information for the
-- native code generator's register allocator; without
-- knowing which GlobalRegs are live it has to assume that
-- they are all live. This list should only include
-- GlobalRegs that are mapped to real machine registers on
-- the target platform.
I imagine this is particularly useful for the LLVM backend, as it relies on a traditional parameter-based IR. In LLVM, Cmm functions are mapped to LLVM functions that always receive a fixed number of parameters corresponding to the first n STG registers, where n depends on the platform. For example, on a platform with four registers, function calls would take the form:
functionEx(R1, R2, R3, R4);
Typically, when a function requires fewer parameters, this would result in unnecessary assignments:
function1parameter(R1, 0, 0, 0);
However, since Cmm explicitly indicates the number of registers used as parameters, the backend can optimize this by using:
Here, undefined (or poison in LLVM) prevents the need to assign any value to unused parameters, avoiding redundant register assignments and improving performance.
Does this information also benefit the native code generator, or is it primarily designed to enhance LLVM?
It also benefits the native code generator: there is less register pressure before the call (in your example, R2, R3, and R4 can be reused for something else without being saved).
The native code generator using the parameter information to optimize register usage reminds me of parametrized ssa.
I wonder if a similar result can be obtained in cmm by splitting basic blocks as separate functions, but optimized for interprocedural calling each other ( after all , branching to a different block and tail calls are both jumps ? )