How cmdargs subverts Haskell

Continuing the discussion from Simple newbie-friendly CLI parser?:

Let me try to explain why I think cmdargs subverts Haskell. The package has a few different interfaces, but I’m talking about the implicit interface from the module System.Console.CmdArgs.Implicit. The interface allows you to write a completely normal data type for your command line options, for example:

data Sample = Sample {hello :: String} deriving (Show, Data)

The only special thing is that you need to derive the Data class which allows the package to inspect the structure of the data type generically (see the documentation in Data.Data).
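
To make "inspect the structure generically" a bit more concrete, here is a minimal sketch of what a `Data` instance exposes. It uses only `Data.Data` and `Data.Typeable` from base. The helper names `fieldNames` and `fieldTypes` are made up for illustration and are not part of cmdargs.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

import Data.Data (Data, constrFields, dataTypeConstrs, dataTypeOf, gmapQ)
import Data.Typeable (typeOf)

data Sample = Sample {hello :: String} deriving (Show, Data)

-- Record field names of every constructor, read from the type alone.
-- The derived dataTypeOf ignores its argument, so undefined is never forced.
fieldNames :: [String]
fieldNames =
  concatMap constrFields (dataTypeConstrs (dataTypeOf (undefined :: Sample)))

-- Types of the immediate fields of a concrete value (hypothetical helper).
fieldTypes :: Sample -> [String]
fieldTypes = gmapQ (show . typeOf)

main :: IO ()
main = do
  print fieldNames                     -- ["hello"]
  print (fieldTypes (Sample "world"))  -- ["[Char]"]
```

This is the kind of information a library can use to turn a record field into a flag name without any hand-written boilerplate.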

The magic of cmdargs happens in the way you specify the behavior of the command line arguments. The way you do that is by defining a value of your command line options data type, but with magic annotations, for example:

sample = Sample{hello = "" &= help "World argument" &= opt "world"}
         &= summary "Sample v1"

Here we are assigning an annotated string value to the hello field of our Sample data type. But String in Haskell is just a list of characters. It should not be possible to add extra annotations to a String. So where do the annotations come from? Or rather, where do the annotations go?

The answer is that cmdargs uses unsafePerformIO to write the annotations to a global mutable variable. It records them in such a way that the structure of the data type can be recovered from the final result, which is how it knows which annotation applies to which part of the options data structure.
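
As a rough illustration of that side channel, here is a minimal sketch. It is not cmdargs's actual implementation: the global `annotations` store and the `&=!` operator are hypothetical names invented for this example, and the sketch only shows how an annotation can escape through `unsafePerformIO`, not how cmdargs ties each annotation back to a field.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global store. NOINLINE keeps GHC from duplicating the IORef.
{-# NOINLINE annotations #-}
annotations :: IORef [String]
annotations = unsafePerformIO (newIORef [])

-- Hypothetical stand-in for (&=): returns its left argument unchanged, but
-- records the note as a hidden side effect when the result is evaluated.
{-# NOINLINE (&=!) #-}
(&=!) :: a -> String -> a
x &=! note = unsafePerformIO $ do
  modifyIORef' annotations (note :)
  return x

main :: IO ()
main = do
  let v = "" &=! "World argument" :: String
  _ <- evaluate v                  -- first evaluation runs the hidden effect
  readIORef annotations >>= print
  _ <- evaluate v                  -- v is already evaluated, nothing reruns
  readIORef annotations >>= print
```

The value `v` is an ordinary empty `String` as far as the type system can tell. The annotation should appear in the store only once, on first evaluation, which is exactly the kind of behavior that depends on how and when GHC chooses to evaluate or share the expression.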

There are some big problems with this approach, as the documentation notes:

Values created with annotations are not pure - the first time they are computed they will include the annotations, but subsequently they will not.

Even using this scheme, sometimes GHC’s optimisations may share values who have the same annotation. To disable sharing you may need to specify {-# OPTIONS_GHC -fno-cse #-} in the module you define the flags.

So you need to stick very closely to the examples and even then you might need to ask the compiler not to interfere.
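
For reference, this is roughly where the pragma goes when it is needed. The sketch below combines the `Sample` example from above with the pragma quoted from the documentation; it assumes the cmdargs package is installed, and whether the pragma is actually required depends on the GHC version and optimisation settings.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# OPTIONS_GHC -fno-cse #-}  -- stop GHC from sharing annotated subexpressions
module Main where

import Data.Data (Data)
import System.Console.CmdArgs.Implicit

data Sample = Sample {hello :: String} deriving (Show, Data)

-- The annotated default value, kept in the same module as the pragma.
sample :: Sample
sample = Sample{hello = "" &= help "World argument" &= opt "world"}
         &= summary "Sample v1"

main :: IO ()
main = print =<< cmdArgs sample
```

The pragma applies per module, so it has to sit in the module that defines the annotated value, not just in the module that calls `cmdArgs`.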

Thanks @jaror. Very understandable explanation. Makes total sense.

[…] cmdargs subverts Haskell

…more or less than the examples presented here, e.g.:

Alternately, is it more or less “subversive” than GHC’s internal UniqSupply type:

…if it is, you could be in good company:

To me at least, the approach used in cmdargs seems more a case of “incurring technical debt” to avoid having to use the lumbering monadic IO type in a lot more places, rather than being “subversive”.

I think the build system example is just pure, especially considering that throwing exceptions is still considered pure by Haskell. Their caching problem seems to be more of a problem with error handling than with purity.

The UniqSupply trick actually seems pure too. It creates an infinite tree where the nodes have integer labels that “happen to” correspond with the order in which you evaluate them. The fact that these labels could change every run of the program is not a problem because creating the tree can only be done in I/O:

If the program was recompiled with, say, a different analysis technique which meant that the evaluation order changed, then indeed different uniques would be generated. But since the whole mkUniqueSupply operation is typed as an I/O operation there is no reason to suppose that the same uniques will be generated.

A way to argue that it is pure is that you could get the same tree by generating random labels and getting lucky. The unsafeInterleaveIO is just ensuring you always get lucky.
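
The idea can be sketched in a few lines. This is a simplified illustration of the technique, not GHC's actual UniqSupply code: the `Supply` type and the names `mkSupply`, `label`, and `split` are invented for this example.

```haskell
module Main where

import Data.IORef (atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- An infinite binary tree of labelled nodes (hypothetical type).
data Supply = Node Int Supply Supply

-- Creating the tree is an IO action. Each node draws its label from a
-- shared counter only when that node is first demanded.
mkSupply :: IO Supply
mkSupply = do
  counter <- newIORef (0 :: Int)
  let go :: IO Supply
      go = unsafeInterleaveIO $ do
        n <- atomicModifyIORef' counter (\i -> (i + 1, i))
        l <- go  -- deferred: nothing runs until the left subtree is forced
        r <- go  -- deferred: nothing runs until the right subtree is forced
        return (Node n l r)
  go

label :: Supply -> Int
label (Node n _ _) = n

split :: Supply -> (Supply, Supply)
split (Node _ l r) = (l, r)

main :: IO ()
main = do
  s <- mkSupply
  let (l, r) = split s
  -- The labels are distinct; which node gets which number depends on
  -- the order in which the nodes happen to be evaluated.
  print [label r, label l, label s]
```

Once a node has been forced, its label never changes, so pure code that consumes the tree cannot observe anything inconsistent. The only nondeterminism is in which tree `mkSupply` hands back, and that is already accounted for by its IO type.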

The note titled “Optimising the unique supply”, about previously relying on -fno-state-hack, full laziness, and inlining, is (as the title suggests) only about performance, not about the correctness of the approach, as far as I can tell. Either way, it is solved now with the new implementation.

I think Kiselyov’s bad_ctx example would be less surprising (and safer) if he gave it the I/O type ((Bool, Bool) -> Bool) -> IO Bool, just like mkUniqueSupply has. Kiselyov is right that unsafeInterleaveIO is very unsafe, perhaps almost as unsafe as unsafePerformIO. But you can still write a safe API (like the unique supply) that uses those unsafe functions under the hood.

So, I think both the examples you mention are actually pure or at least much more benign than what cmdargs is doing.

I’ve just spent some time debugging cmdargs, finding the {-# OPTIONS_GHC -fno-cse #-} solution, and then trying to understand why I need it. Can’t say I enjoyed it: it’s like lazy IO, all is great until it bites you.

It comes up often enough in the issue tracker as well: Issues · ndmitchell/cmdargs · GitHub.

[…] it’s like lazy IO […]

…you mean lazy I/O like this:

Perhaps it’s just the choice of interface: