Rolling my own Either: does GHC optimize it differently?

Question for anyone who knows stuff about how GHC optimizes things:

I’m working on a Haskell project in which I defined and used my own Either-isomorphic data type. This involved writing versions of any Either-handling utilities I wanted to use for this type. I did this in part because I wanted to experiment with the impact of marking one or both of the constructors as strict. (So a simple newtype wrapper plus pattern synonyms wouldn’t be an acceptable alternative.)
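For anyone following along, a minimal sketch of that kind of setup (the names `MyEither`/`MyLeft`/`MyRight` are illustrative, not the actual project code):

```haskell
-- Illustrative sketch of an Either-isomorphic type; adding ! before the
-- field types is the strictness experiment (omitted here for the lazy
-- baseline, since the allocation difference appeared even without bangs).
data MyEither a b
  = MyLeft a
  | MyRight b
  deriving (Show, Eq)

-- A hand-rolled counterpart of Data.Either.either:
myEither :: (a -> c) -> (b -> c) -> MyEither a b -> c
myEither f _ (MyLeft a)  = f a
myEither _ g (MyRight b) = g b
```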

I’ve been using weigh to profile allocations, and I was surprised to note that, even before using any strictness annotations, the number of allocations in my test suite went up after making this replacement. (This is with -O2.)

Thinking that surely I must have made some accidental change in the process of refactoring, I commented out my custom type and its instances and replaced it with a type synonym pointing to Either and bidirectional pattern synonyms for the constructors, while keeping all of my replacement utility functions as-is. The allocation count went back down to its original value.
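Concretely, the switcheroo looked roughly like this (again with illustrative names), so all the utility functions could keep compiling unchanged against real `Either`:

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Replace the custom data type with a synonym for Either, plus
-- bidirectional pattern synonyms standing in for the constructors.
type MyEither a b = Either a b

pattern MyLeft :: a -> MyEither a b
pattern MyLeft a = Left a

pattern MyRight :: b -> MyEither a b
pattern MyRight b = Right b

-- Tell the pattern-match checker these two patterns are exhaustive:
{-# COMPLETE MyLeft, MyRight #-}
```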

I then dumped the Core output of all relevant modules, commented out the above synonyms, uncommented the original type definition, and dumped the Core output again. The two sets of Core files appear identical, up to local variable names, line breaks, and the names of the two constructors replacing Left and Right.

The only two instances for Either that I’m using are Functor and Bifunctor, and I’ve written those instances to be character-for-character identical to their definitions in base, except for type and constructor names. Everything else that’s executable code should be unchanged when I do the Either-synonym switcheroo.
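For concreteness, transcribed instances of that sort look like this (the definitions from base with only the type and constructor names swapped; `MyEither` is an illustrative name):

```haskell
import Data.Bifunctor (Bifunctor (..))

data MyEither a b = MyLeft a | MyRight b
  deriving (Show, Eq)

-- Same clauses as the Either instances in base, renamed:
instance Functor (MyEither a) where
  fmap _ (MyLeft x)  = MyLeft x
  fmap f (MyRight y) = MyRight (f y)

instance Bifunctor MyEither where
  bimap f _ (MyLeft a)  = MyLeft (f a)
  bimap _ g (MyRight b) = MyRight (g b)
```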

So I’m not sure how to explain the difference in run-time behavior. Are there optimizations GHC is applying after the Core simplification passes that might treat literal Either differently from an isomorphic type?


I don’t think there are any Either-specific optimizations in GHC itself. There might be a special rewrite rule in base, but that also seems unlikely to me. Can you post the project here so that we can try to reproduce the performance difference?

After which pass? If you dump the Core at all the different stages of the pipeline, you might be able to determine which pass causes the difference.

Hasn’t been published yet, but I’m working on minimizing and extracting a self-contained chunk to share.

Whatever pass -ddump-simpl is after. I’m new to this, and there are a ton of -ddump options; what else should I try?

As far as I understand, -ddump-simpl dumps the final stage of Core (at least, that’s what the Users’ Guide (§5.13, “Debugging the compiler”, GHC 9.6.2) seems to suggest). Perhaps you can try -ddump-stg-from-core to see if there’s any difference in the generated STG.

I’d try -ddump-stg-final

Okay, after dumping and comparing stg-{from-core,unarised,tags,cg,final}, and assuming that’s the correct order of the passes (I couldn’t find it in the documentation anywhere, but that’s the order in which the files are written to disk), the two compilations start to diverge between stg-unarised and stg-tags. In the stg-tags dump, I see several structures like this:

let {
  (sat_s3RW,
   <TagProper>) =
      MyProject.A.MyLeft! [x2_s3RV];
} in 
  $j_s3Px
      sat_s3RW;

when compiling with my hand-written Either; when using Data.Either, the corresponding places in the dump look like:

$j_s3OR
    wild8_s3Re;

What might this mean? I don’t even know what’s supposed to be happening in the stg-tags pass.

This is very interesting and needs a GHC expert to explain. Perhaps summoning @simonpj will work. If not you can try posting to the ghc-devs mailing list to direct their attention there (or I will do so in due course).

Oh, this could happen if you write a function that somewhere has some code like this:

case Left x of
   y -> ... Left x ...

Which will be optimized to:

case Left x of
   y -> ... y ...

But if you replace that second Left with MyLeft you won’t get that optimisation:

case Left x of
   y -> ... MyLeft x ...

Still, I’d expect that to show up in the Core dumps…
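A self-contained sketch of that mixed-type situation (reusing the illustrative `MyEither` names from earlier in the thread): the scrutinee is built with `Left`/`Right`, but the result is rebuilt with different constructors, so GHC can’t substitute the case binder for the rebuilt value.

```haskell
data MyEither a b = MyLeft a | MyRight b
  deriving (Show, Eq)

-- With matching types, GHC can rewrite a rebuilt `Left x` to reuse the
-- scrutinee; here the constructors differ, so a fresh MyLeft/MyRight
-- must be allocated in each branch.
convert :: Either a b -> MyEither a b
convert e = case e of
  Left x  -> MyLeft x   -- cannot reuse e: different constructor
  Right y -> MyRight y
```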


@jaror is certainly right if your program has a mixture of Either and MyEither. But if you replace all Eithers with MyEithers, I’d be surprised if there was a difference.

A-ha! Yes! That does seem to be what is happening! The wild8_s3Re in the previous dump points to an outer variable that, in this branch, matches with Data.Either.Left (x2_s3RV, <TagDunno>).

Thanks, that clears up quite a bit of my confusion. I suppose that means things are working as intended?