Using unsafePerformIO safely?

I’m glad this is picking up! Particularly, thanks to everyone for the great suggestions and discussion, and to @rae for posting this here :slight_smile:

@atravers, that’s amazing, thanks for all the references! I’ll study them!

I’ll try answering some points inline below…

That’s a good question indeed! That seems to enable one to catch calls to error (or other exceptions) purely. I’m inclined to call f pure indeed, but it’s weird that f undefined == False. I guess that’s precisely the type of question I believe we have to answer as part of shedding light on unsafePerformIO.

I don’t think I can prove that. What is “equal” supposed to mean if we’re running an IO computation under the hood?

Ok, fair. But it could have been implemented in parallel. I’d hope that future compilers would detect that fmap f can always be executed in parallel for any functor; there’s no dependency between the calls of f.
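For lists, that intuition is already expressible today with the parallel package; a minimal sketch (parallelMap is a name I just made up):

import Control.Parallel.Strategies (parMap, rseq)

-- Evaluate the results of a pure map in parallel: safe precisely
-- because the calls of f are independent of one another.
parallelMap :: (a -> b) -> [a] -> [b]
parallelMap = parMap rseq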

But it would never be equal to pure True (not even bringing up the problem of defining “equal”). We brought IO into the domain of reasoning; we’re now doomed to having to reason about the side-effects that are performed, or to bring in a different equivalence relation.

Precisely! That’s the question that brought all of this up for me. Hence, I’m looking into understanding what such a proof would look like, even if just informally. There’s tremendous value in pure APIs.

Memory can also run out, making allocation in bytestring fail too…

I’d encourage comparing the experience of writing proofs for monadic code with that of writing proofs for pure code. I believe this complexity becomes much more apparent when we try to reason about the correctness of our programs. Nevertheless, even for programming exclusively, I’d bring up again how much nicer the bytestring API is when compared to a hypothetical monadic alternative.

Is mapM (f <=< g) xs, in fact, equal to mapM g xs >>= mapM f?

Yes! In SMT it would be.
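For reference, here’s that equation written down as an executable property; whether it holds depends on the monad (it does for commutative ones), and prop_interchange is just a name I made up:

import Control.Monad ((<=<))

-- mapM (f <=< g) xs == (mapM g xs >>= mapM f), for a chosen monad m.
prop_interchange
  :: (Monad m, Eq (m [c])) => (b -> m c) -> (a -> m b) -> [a] -> Bool
prop_interchange f g xs =
  mapM (f <=< g) xs == (mapM g xs >>= mapM f)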

I don’t think I can prove that.

I think you can, from the commutativity and idempotency properties I gave above. How you prove those is another matter, but no more difficult than, and in fact very similar to, proving “purity”.

What is “equal” supposed to mean if we’re running an IO computation under the hood?

It means that you can replace one with another in your program and get equal results. You might respond: “But my program is in IO so I don’t even know what it means to get equal results!” True, but the same applies to proving that your program is pure. After all, you’ll have a similar problem when trying to prove that solve env exp = True iff exp is SAT.

Ok, fair. But it could have been implemented in parallel. I’d hope that future compilers would detect that fmap f can always be executed in parallel for any functor; there’s no dependency between the calls of f.

Perhaps, but you can also execute traverse f in parallel for any commutative monad (which SMT is) because in such a case there is no dependency between the calls of f either.

But it would never be equal to pure True (not even bringing up the problem of defining “equal”). We brought IO into the domain of reasoning; we’re now doomed to having to reason about the side-effects that are performed, or to bring in a different equivalence relation.

I’m pretty sure that proving it’s observationally equal to pure True is roughly equally as difficult as proving that solve env exp = True for a pure version of solve, as I mentioned above.

Memory can also run out, making allocation in bytestring fail too…

Yes, true. But there are few ways the RTS can fail, and they are largely non-recoverable. On the other hand, there are many ways calling out into the OS can fail, and they are largely recoverable.

I’d encourage comparing the experience of writing proofs for monadic code with that of writing proofs for pure code. I believe this complexity becomes much more apparent when we try to reason about the correctness of our programs. Nevertheless, even for programming exclusively, I’d bring up again how much nicer the bytestring API is when compared to a hypothetical monadic alternative.

Well, I admit that I haven’t tried it, but my strong belief is that writing proofs for idempotent, commutative monads is not much harder than for pure code.

As I said, I agree that the bytestring API is nicer pure, but there are mitigating circumstances. For example, we typically want to combine ByteStrings with other ByteStrings in a multitude of ways. I may be too unimaginative, but I do not envisage SAT results being combined in quite so many ways.

Furthermore, if there were an API similar to ByteString but operating on the (immutable once written) contents of files on the filesystem then I would definitely want the API to be monadic, even though it would be as morally pure as your proposed SAT API.

1 Like

Here’s a concrete example of why I don’t like pretending that anything that interacts with the OS is “pure”:

At work we have a Nix-based build system. It “should be” pure, since Nix is reproducible. However, it sometimes calls out to external hardware which is transiently broken, caching “hardware did not respond” into our build results, with no way of purging it. There’s no direct analogue of that in Haskell, but it really makes me cautious of “pure” APIs that call out to the OS, and I wouldn’t use one myself, in practice. (I think it’s a worthy academic exercise though.)

2 Likes

There’s some prior work on catching exceptions in pure code. It is highlighted in “A Semantics for Imprecise Exceptions” that catching from pure code leads to nondeterministic results, e.g. pureCatch (error "this" + error "that") could return the “this” error or the “that” error, because the evaluation order of Haskell is not fixed. However, since my function only returns True or False based on whether there is an exception at all, I think it might still be pure. This also only works if every exception is caught by that SomeException signature.
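A minimal sketch of the function I have in mind (my reconstruction; the name f and the use of evaluate are assumptions):

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, evaluate, try)
import System.IO.Unsafe (unsafePerformIO)

-- False if forcing the argument raises any exception, True otherwise;
-- in particular, f undefined == False.
f :: forall a. a -> Bool
f x = unsafePerformIO $ do
  r <- try (evaluate x) :: IO (Either SomeException a)
  pure $ case r of
    Left  _ -> False  -- some exception was raised
    Right _ -> True   -- the argument reached WHNF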

I don’t think f undefined == False is a problem, e.g. f _ = False is pure and also satisfies that equation.

1 Like

Regarding Using unsafePerformIO safely? - #16 by tomjaguarpaw: I would suggest another possible approach; however:

  • that would probably be too far off-topic for this thread;
  • at least one long chunk of Haskell could be required to help explain its workings.

If you or anyone else are interested, just raise your own lightning rod thread named e.g. “Fun with effects”…


Darn: how did I miss this?

Let’s consider a simple use case for that version of safePerformIO:

deleteFile :: FilePath -> ()
deleteFile path = safePerformIO (System.Directory.removeFile path) safeToDelete
  -- where safeToDelete :: ProofOfSafety (...) is assumed to be in scope

So what should happen here?

task = (... (deleteFile "./temp.txt") ... (deleteFile "temp.txt") ...)

For ProofOfSafety values to be able to check for a problem like this, they would need some way to be “connected”, e.g.

deleteFile :: FilePath -> ProofOfSafety (...) -> ()
deleteFile path safeToDelete =
  safePerformIO (System.Directory.removeFile path) safeToDelete

task safeToRun =
  let !(safe1, safe2) = safeParts safeToRun in
  (... (deleteFile "./temp.txt" safe1) ... (deleteFile "temp.txt" safe2) ...)

…well, something like that, but probably at the level of types, not expressions, since ProofOfSafety is a dependent type. In either case, the implementation can use the connection between the two deleteFile calls to respond accordingly.

If you haven’t already read it, this is loosely based on the functional pearl On generating unique names:
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/763DE73EB4761FDF681A613BE0E98443/S0956796800000988a.pdf/functional_pearl_on_generating_unique_names.pdf (the first “alternate approach”).

I see you took the challenge much further than what I had in mind. You seem to be thinking of making ProofOfSafety aware of external resources. That would be awesome, but it’s much harder than the original idea/question of starting with ProofOfSafety covering only the aspects intrinsic to the compiler’s code manipulations and the runtime system. At the end of the day, if you try to delete the same file twice, being in IO or not doesn’t help you at all; you get an exception and crash and that’s it.

Here’s an aspect intrinsic to the compiler and the RTS: let x = unsafePerformIO k in (x, x) must be “equal” to (unsafePerformIO k, unsafePerformIO k); I’ll call that the CSE-equality for k. A proof of CSE-equality should naturally be a component of ProofOfSafety. In fact, it already informs us that deleteFile "foo.txt" is not safe:

let x = unsafePerformIO (deleteFile "foo.txt") in (x, x) == ((), ())

whereas

(unsafePerformIO (deleteFile "foo.txt"), unsafePerformIO (deleteFile "foo.txt"))

will crash.

That’s a great example because it brings up the question of exceptions! How about defining an operation deleteFileIdemp f = deleteFile f `catch` (\(_ :: IOError) -> pure ())? Is that safe? It would certainly satisfy the CSE-equality. My question is whether we can find a set of such properties that are sufficient to prove that running unsafePerformIO k is just as safe as running the entire code in IO, for some k. In other words, relying on unsafePerformIO to benefit from purity doesn’t come at any additional safety cost: if the code crashes, it would also have crashed had we placed IO everywhere anyway.
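Here’s a fuller sketch of that operation together with both sides of the CSE-equality (assuming deleteFile is System.Directory.removeFile; the file name is just for illustration):

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (catch)
import System.Directory (removeFile)
import System.IO.Unsafe (unsafePerformIO)

-- Deleting twice no longer crashes: the second IOError is swallowed.
deleteFileIdemp :: FilePath -> IO ()
deleteFileIdemp f = removeFile f `catch` \(_ :: IOError) -> pure ()

-- Both sides now evaluate to ((), ()), so the CSE-equality holds:
shared, unshared :: ((), ())
shared   = let x = unsafePerformIO (deleteFileIdemp "foo.txt") in (x, x)
unshared = ( unsafePerformIO (deleteFileIdemp "foo.txt")
           , unsafePerformIO (deleteFileIdemp "foo.txt") )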

1 Like

Alright, I now think your best option is to look at those two papers by Schmidt-Schauß, seeing if there’s more recent research (e.g. via a citation search). My interest in this topic is the result of searching for other approaches to I/O in languages like Haskell:

Before monadic IO, there was a lot of vibrant and imaginative work on functional I/O. It hadn’t arrived yet, but was still in touch with the spirit of functional programming.

Conal Elliott.

In all that time, I’ve never thought of trying to establish a basic set of rules or principles for the safe use of unsafe... pseudo-definitions, particularly after encountering remarks like:

During the 1960s, several researchers began work on proving things about programs. Efforts were made to prove that:

  • A program was correct.
  • Two programs with different code computed the same answers when given the same inputs.
  • One program was faster than another.
  • A given program would always terminate.

While these are abstract goals, they are all, really, the same as the practical goal of “getting the program debugged”.

Several difficult problems emerged from this work. One was the problem of specification: before one can prove that a program is correct, one must specify the meaning of “correct”, formally and unambiguously. Formal systems for specifying the meaning of a program were developed, and they looked suspiciously like programming languages.

The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky (page 353 of 600, emphasis by me.)

…but maybe you can see what the rest of us have missed, either in Schmidt-Schauß’s papers or other content, and make the breakthrough which reconciles the presence of effects with the otherwise-mathematical clarity presented by languages like Haskell - all the best with your research.

it can impede sharing, for one!

I know when I want to use unsafePerformIO at least - one might perform a pure computation (calling C functions) that allocates/frees memory.
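For instance, something like this (a sketch; my_hash is a hypothetical C function that only reads the buffer it is given):

{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Word (Word64, Word8)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical C function that reads a buffer and returns its hash.
foreign import ccall unsafe "my_hash"
  c_my_hash :: Ptr Word8 -> Int -> IO Word64

-- The allocation, the C call and the free are all invisible to the
-- caller, so the wrapper can be given a pure type.
hashBytes :: [Word8] -> Word64
hashBytes ws = unsafePerformIO $
  withArray ws $ \p -> c_my_hash p (length ws)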

I had a tidbit on The Interesting Part of Monadic Effects

I’m not sure how it works under the hood though - so I can’t justify why this whole affair works. But my intuition around unsafePerformIO at least has always worked!

But this interface should be robust to multi-threading (at least).

How so? You could certainly solve something in one thread and then pass its result to another, I imagine. You could probably even use unsafeDupablePerformIO there, if you didn’t mind the SMT solver possibly being called twice on the same inputs (as I understand it).

…which is more-or-less the “safe-use requirements” for unsafeLocalState (the Haskell 2010 name for unsafePerformIO) - see chapter 30 of the Haskell 2010 Report.

I’m just wondering: is there a library for this particular SMT framework? If so, perhaps primitive imports (one of GHC’s FFI extensions) could access the library directly, using a non-IO type.

However in either case, foreign code is being used, and to quote one M. Chakravarty:

After all, you can import any C function with a pure type, which also allows you to wreak arbitrary havoc. We enable the user to disguise arbitrary machine code as a Haskell function of essentially arbitrary type. In comparison, unsafePerformIO [or unsafeLocalState] seems angelic.

If anyone can make that arrangement robust to multi-threading, etc: well done!

1 Like

Acquiring an MVar is a global effect in the program. Doing this in the context of IO is much easier to understand than in the execution of pure code. What does it matter if the runtime system chooses to stop executing some pure code? If it makes such a choice after acquiring the MVar, it prevents any other thread from acquiring the MVar.

Taking an MVar is only impure if the MVar is exposed to the rest of the program. That is not the case here. In this case the only thing that can happen is that a function takes longer to run (assuming that the MVar does eventually get released), but that is not considered a side effect because pure programs cannot observe time.
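For concreteness, the pattern we’re debating looks something like this (a sketch; solverLock and the stand-in body are mine):

import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import System.IO.Unsafe (unsafePerformIO)

-- A process-wide lock serialising calls into a non-reentrant solver.
-- The MVar never escapes this module, so callers only see the result.
{-# NOINLINE solverLock #-}
solverLock :: MVar ()
solverLock = unsafePerformIO (newMVar ())

solve :: String -> Bool
solve query = unsafePerformIO $
  withMVar solverLock $ \() ->
    pure (even (length query))  -- stand-in for the real foreign call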

2 Likes

Partial application exposes an invocation of withMVar. This is the reason for the MVar in the first place.

Regarding the specification of unsafePerformIO:

I think the documentation of unsafePerformIO is clear enough to answer this, specifically:

For this to be safe, the IO computation should be free of side effects and independent of its environment.

I, too, think this specification is exact. It also gives compiler optimisations the greatest leeway to compile pure code efficiently and is the reason why Haskell can execute efficiently in the first place.

The beauty is that the specification of unsafePerformIO in f (unsafePerformIO e) only relates to e; it doesn’t assume anything about the function/environment/context f which is called with an argument that uses unsafePerformIO. If the specification allowed some premise about all such f, then certain compiler optimisations (like strictness analysis) would have to prove that f is never called with an argument that uses unsafePerformIO in order to unbox f's argument, say. This would amount to saying that the optimiser must assume that Haskell is lazy, (non-total,) and impure, which is horrible to work with.

IMO, that is why the docs of unsafePerformIO are written as they are. Many use cases, like those in bytestring, fall into this fragment: after the “IO transaction” e finishes, the intermittent side-effects are no longer observable.
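A minimal sketch of that fragment, in the style of bytestring’s internals (replicateByte is a made-up example):

import Data.ByteString.Internal (ByteString, fromForeignPtr)
import Data.Word (Word8)
import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (pokeByteOff)
import System.IO.Unsafe (unsafePerformIO)

-- The writes target freshly allocated memory that nothing else can
-- see; once the buffer is frozen into a ByteString, the "IO
-- transaction" is over and no side-effect remains observable.
replicateByte :: Int -> Word8 -> ByteString
replicateByte n w = unsafePerformIO $ do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp $ \p -> mapM_ (\i -> pokeByteOff p i w) [0 .. n - 1]
  pure (fromForeignPtr fp 0 n)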


Regarding supposedly safe uses of unsafePerformIO that fall outside the specification:

Given enough confidence on the programmer’s part, deferring and interleaving side-effects with unsafeInterleaveIO to achieve something like value-supply: A library for generating values without having to thread state. (which @atravers linked, and which is also used in GHC) is possible, if you make sure that no environment can in fact observe the side-effects. That is often done by making very judicious use of supposedly strictly internal APIs like GHC.Types.Unique which break the abstraction. Properly locking away these APIs to “environments” that we control, and where we can make sure the optimiser doesn’t interfere (by making use of NOINLINE etc.), means that the specification of unsafePerformIO applies again. I would say this is quite similar to encapsulation in the OO sense, where an export like nonDetCmpUnique would be considered bad practice to uphold the abstraction.

In terms of GHC, failure to do so means non-deterministic builds (in terms of binary hashes) and the resulting loss of incremental rebuilds. Despite years of ongoing efforts, I still feel like bugs hide because of the use of unsafe abstractions like that, because I sometimes observe more rebuilding than I feel is necessary.

Why is it so difficult to fix these bugs? Because we actually use the Int# identity of GHC’s Unique for implementing efficient maps in terms of IntMap. And now suddenly the whole pure API (well, at least half) of IntMap has become non-deterministic, with no IO to warn us! See this Note in the compiler: compiler/GHC/Types/Unique.hs · 706deee0524ca6af26c8b8d5cff17a6e401a2c18 · Glasgow Haskell Compiler / GHC · GitLab. Our solution so far is to justify each use of a non-deterministic function (which all have a nonDet* prefix) with a comment like It's OK to use nonDetUFMToList here because ..., which acts as our “proof” in the sense of the OP.

Without Ord Unique (which is non-deterministic), we could only use O(n^2) association lists for Unique/Name lookup, which is terrible. Non-determinism as a direct result of our unsafe use of unsafePerformIO is just horrible, but we get to use O(n) IntMaps. So I’d call our solution horribly efficient :slight_smile:

2 Likes

…first unsafe, then nonDet - appearing soon: unknown.

  • Having praised monads to the hilt, let me level one criticism. Monads tend to be an all-or-nothing proposition. If you discover that you need interaction deep within your program, you must rewrite that segment to use a monad.

    Philip Wadler (page 29 of 33).

  • Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming. You cannot easily convert between functional and monadic style without a radical restructuring of code.

    Robert Harper.

One alternative would be to replace the prefixes with types - then entities like these:

unsafeLocalState :: IO a -> a
nonDetUFMToList  :: UniqFM key elt -> [(Unique, elt)]

would require an extra argument:

localState :: Unsafe# -> IO a -> a
ufmToList  :: NonDet# -> UniqFM key elt -> [(Unique, elt)]

where Unsafe# and NonDet# are abstract, like GHC’s RealWorld. Since Haskell’s I/O model is already nondeterministic, and can be unsafe (e.g. by using the FFI to call “poorly-implemented” foreign entities):

unsafely      :: IO Unsafe#
nondetermined :: IO NonDet#

…so no extra harm done to (what’s left of) the semantics attributed to Haskell.

Yes, the need for extra parameters would probably be annoying, but if it saves having to “monadify” vast segments of existing code (or even a change of paradigm)…

Uh, do we assume that pure style is strictly better than monadic style?

1 Like

Here are my points against monadic style compared to pure style:

  • Purely syntactically, I think it is clear that f x is objectively better than f =<< x.
  • Monadic style clutters code with irrelevant sequencing information.
  • Monads add implicit information which must be considered when reasoning about functions. Just considering the arguments is no longer sufficient. In particular, almost anything can happen in the IO monad.
1 Like

A couple of counterpoints:

  • it is no easier (in principle) to reason about pure code than code in the Identity monad (because they are the same thing)
  • it is often easier (in practice) to reason about code written in the State monad than the equivalent pure code that hand-threads the state parameter (because the monad captures much of the book keeping you would otherwise have to do by hand)

So I don’t think it’s clear that monadic code always hinders reasoning.

In fact I would say it’s generally easiest to reason about code that is written in the “appropriate monad”, neither “too much monadic” (IORefs in IO) nor “too little monadic” (manually threading state parameters) but “just right monadic” (State).
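To illustrate that middle option (a toy example; fresh and the counter-threading names are mine):

import Control.Monad.State (State, get, put, runState)

-- Hand-threaded: correct, but it's on us not to mix up s0, s1 and s2.
freshPairByHand :: Int -> ((Int, Int), Int)
freshPairByHand s0 =
  let (a, s1) = (s0, s0 + 1)
      (b, s2) = (s1, s1 + 1)
  in ((a, b), s2)

-- State monad: the same bookkeeping, captured once in fresh.
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  pure n

freshPair :: ((Int, Int), Int)
freshPair = runState ((,) <$> fresh <*> fresh) 0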

1 Like

I don’t think so. The only thing you can do with the result of that partial application is apply it again. You can’t scrutinize the code.

That spec is not exact. At best it’s over-approximating. Arguably, the only environment-independent computations are those of the form f <$> pure x, which are precisely those that don’t need to be in IO in the first place.

Just to be clear, what I’m trying to argue is that there are many legitimate uses of unsafePerformIO for running computations that, albeit not environment-independent or side-effect free (whatever that even means), are still perfectly safe to use.

That starts to be something much more satisfactory, which we might have a chance at formalizing and understanding. In fact, it’s not too far from the ideas proposed by Schmidt-Schauß in the papers that @atravers suggested.

I think what is central to your distinction is that Identity and State s are honest monads – i.e., they can be interpreted in a suitable mathematical domain with a suitable equality, and can be proven to satisfy the monad laws – whereas IO is not a monad in that sense. For one, what should the suitable equality be for IO? This makes it effectively impossible to formally reason about it.

I side with @jaror and I prefer pure code. Actually, just last week we got bitten by monadic code when a ListT was introduced into the stack, but we forgot that it comes with a MonadFail instance that simply returns the empty list. We did not have an easy time finding this bug. We needed the transformer version because we wanted to avoid unsafePerformIO and needed to thread a (MonadIO m) around.
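The same trap exists in the plain list monad, whose MonadFail instance also returns the empty list; a minimal sketch of the mechanism (not our actual code):

-- The refutable pattern (k, Just v) desugars to a call of fail for
-- Nothing entries, and fail for lists is const [], so mismatches
-- vanish silently instead of raising an error.
keepJusts :: [(Int, Maybe String)] -> [(Int, String)]
keepJusts xs = do
  (k, Just v) <- xs
  pure (k, v)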

Exactly: while A is evaluating it the first time (the MVar is taken), B applies it again, blocking on the MVar. For B to complete, A must finish. What guarantees do you have that a particular execution of pure code will run to completion? Will it remain live? Will it be GC’d? Will the MVar be freed if it is GC’d?