Using unsafePerformIO safely?

vmchale · March 4, 2022, 11:40pm

it can impede sharing, for one!

vmchale · March 4, 2022, 11:46pm

I know when I want to use unsafePerformIO at least - one might perform a pure computation (calling C functions) that allocates/frees memory.

I had a tidbit on The Interesting Part of Monadic Effects

I’m not sure how it works under the hood though - so I can’t justify why this whole affair works. But my intuition around unsafePerformIO at least has always worked!

But this interface should be robust to multi-threading (at least).

How so? You could certainly solve something in one thread and then pass its result to another I imagine. You could probably even use unsafeDupablePerformIO there if you didn’t call the SMT solver twice on the same inputs (as I understand it).

atravers · March 5, 2022, 2:49am

…which is more-or-less the “safe-use requirements” for ~~unsafePerformIO~~ unsafeLocalState - see chapter 30 of the Haskell 2010 Report.

I’m just wondering: is there a library for this particular SMT framework? If so, perhaps primitive imports (one of GHC’s FFI extensions) could access the library directly, using a non-IO type.

However in either case, foreign code is being used, and to quote one M. Chakravarty:

After all, you can import any C function with a pure type, which also allows you to wreak arbitrary havoc. We enable the user to disguise arbitrary machine code as a Haskell function of essentially arbitrary type. In comparison, unsafePerformIO [or unsafeLocalState] seems angelic.

If anyone can make that arrangement robust to multi-threading, etc: well done!

fryguybob · March 6, 2022, 3:33am

Acquiring an MVar is a global effect in the program. Doing this in the context of IO is much easier to understand than in the execution of pure code. What does it matter if the runtime system chooses to stop executing some pure code? If it makes such a choice after acquiring the MVar it prevents any other thread from acquiring the MVar.

jaror · March 6, 2022, 9:44am

Taking an MVar is only impure if the MVar is exposed to the rest of the program. That is not the case here. In this case the only thing that can happen is that a function takes longer to run (assuming that the MVar does eventually get released), but that is not considered a side effect because pure programs cannot observe time.

fryguybob · March 6, 2022, 1:38pm

Partial application exposes an invocation of withMVar. This is the reason for the MVar in the first place.

sgraf · March 7, 2022, 11:40am

Regarding the specification of unsafePerformIO:

I think the documentation of unsafePerformIO is clear enough to answer this, specifically:

For this to be safe, the IO computation should be free of side effects and independent of its environment.

I, too, think this specification is exact. It also gives compiler optimisations the greatest leeway to compile pure code efficiently and is the reason why Haskell can execute efficiently in the first place.

The beauty is that the specification of unsafePerformIO in f (unsafePerformIO e) only relates to e; it doesn’t assume anything about the function/environment/context f which is called with an argument that uses unsafePerformIO. If the specification would allow some premise about all such f, then certain compiler optimisations (like strictness analysis) would have to prove that f is never called with an argument that uses unsafePerformIO to unbox f’s argument, say. This would amount to saying that the optimiser must assume that Haskell is lazy, (non-total,) and impure, which is horrible to work with.

IMO, that is why the docs of unsafePerformIO are written as they are. Many use cases, like those in bytestring, fall into this fragment: After the “IO transaction” e finishes, the intermittent side-effects are no longer observable.

Regarding supposedly safe uses of unsafePerformIO that fall outside the specification:

Given enough confidence of the programmer, deferring and interleaving side-effects with unsafeInterleaveIO to achieve something like value-supply: A library for generating values without having to thread state. (which @atravers linked and which is also used in GHC) is possible if you make sure that all environments indeed can’t observe the side-effects. That is often done by making very judicious use of supposedly strictly internal APIs like GHC.Types.Unique which break the abstraction. Properly locking away these APIs to “environments” that we control and where we can make sure the optimiser doesn’t interfere (by making use of NOINLINE etc.) means that the specification of unsafePerformIO applies again. I would say this is quite similar to encapsulation in the OO sense, where an export like nonDetCmpUnique would be considered bad practice to uphold the abstraction.

In terms of GHC, failure to do so means non-deterministic builds (in terms of binary hashes) and the resulting loss of incremental rebuilds. Despite years of ongoing efforts, I still feel like bugs still hide because of use of unsafe abstractions like that, because I sometimes observe more rebuilding than I feel is necessary.

Why is it so difficult to fix these bugs? Because we actually use the Int# identity of GHC’s Unique for implementing efficient maps in terms of IntMap. And now suddenly the whole, pure API (well, at least half) of IntMap has become non-deterministic with no IO to warn us! See this Note in the compiler: compiler/GHC/Types/Unique.hs · 706deee0524ca6af26c8b8d5cff17a6e401a2c18 · Glasgow Haskell Compiler / GHC · GitLab Our solution so far is to justify each use of a non-deterministic function (which all have a nonDet* prefix) with a comment like It's OK to use nonDetUFMToList here because ... which acts as our “proof” in the sense of the OP.

Without Ord Unique (which is non-deterministic), we could only use O(n^2) assocation lists for Unique/Name lookup, which is terrible. Non-determinism as a direct result of our unsafe use of unsafePerformIO is just horrible, but we get to use O(n) IntMaps. So I’d call our solution horribly efficient

atravers · March 7, 2022, 6:20pm

…first unsafe, then nonDet - appearing soon: unknown.

Having praised monads to the hilt, let me level one criticism. Monads tend to be an all-or-nothing proposition. If you discover that you need interaction deep within your program, you must rewrite that segment to use a monad.

Philip Wadler (page 29 of 33).
Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming. You cannot easily convert between functional and monadic style without a radical restructuring of code.

Robert Harper.

One alternative would be to replace the prefixes with types - then entities like these:

unsafeLocalState :: IO a -> a
nonDetUFMToList  :: UniqFM key elt -> [(Unique, elt)]

would require an extra argument:

localState :: Unsafe# -> IO a -> a
ufmToList  :: NonDet# -> UniqFM key elt -> [(Unique, elt)]

where Unsafe# and NonDet# are abstract, like GHC’s RealWorld. Since it’s IO model is already nondeterministic and can be unsafe e.g. using the FFI to call “poorly-implemented” foreign entities:

unsafely      :: IO Unsafe#
nondetermined :: IO NonDet#

…so no extra harm done to (what’s left of) the semantics attributed to Haskell.

Yes, the need for extra parameters would probably be annoying, but if it saves having to “monadify” vast segments of existing code (or even a change of paradigm)…

Abab9579 · March 8, 2022, 10:06am

Uh, do we assume that pure style is strictly better than monadic style?

jaror · March 8, 2022, 11:19am

Here are my points against monadic style compared to pure style:

Purely syntactically, I think it is clear that f x is objectively better than f =<< x.
Monadic style clutters code with irrelevant sequencing information.
Monads add implicit information which must be considered when reasoning about functions. Just considering the arguments is no longer sufficient. In particular, almost anything can happen in the IO monad.

tomjaguarpaw · March 8, 2022, 11:43am

A couple of counterpoints:

it is no easier (in principle) to reason about pure code than code in the Identity monad (because they are the same thing)
it is often easier (in practice) to reason about code written in the State monad than the equivalent pure code that hand-threads the state parameter (because the monad captures much of the book keeping you would otherwise have to do by hand)

So I don’t think it’s clear that monadic code always hinders reasoning.

In fact I would say it’s generally easiest to reason about code that is written in the “appropriate monad”, neither “too much monadic” (IORefs in IO) nor “too little monadic” (manually threading state parameters) but “just right monadic” (State).

VictorCMiraldo · March 8, 2022, 1:18pm

I don’t think so. The only thing you can do with the result of that partial application is apply it again. You can’t scrutinize the code.

That spec is not exact. At best it’s over, approximating. Arguably, the only environment-independent computations are those of the form f <$> pure x; which are those that don’t need to be in IO in the first place.

Just to be clear, what I’m trying to argue is that there are many legitimate uses of unsafePerformIO for running computations that albeit not being env-independen or side-effect free (whatever that even means), are still perfectly safe to be used.

That starts to be something much more satisfactory that we might have a chance at formalizing and understanding. In fact, its not too far from the ideas proposed by Schmidt-Schauß on the papers that @atravers suggested.

I think what is central to your distinction is that Identity and State s are honest monads – i.e., can be interpreted in a suitable mathematical domain with a suitable equality and can be proven to satisfy the monad laws – IO is not a monad in that sense. For one, what should the suitable equality be for IO? This makes it effectively impossible to formally reason about it.

I side with @jaror and I prefer pure code. Actually, just last week we got bitten by monadic code when a ListT was introduced into the stack, but we forgot that it comes with a MonadFail instance that simply returns the emtpy list. We did not have an easy time finding this bug. We needed the transformer version because we wanted to avoid unsafePerformIO and needed to thread a (MonadIO m) around.

fryguybob · March 8, 2022, 1:39pm

Exactly, while A is evaluating it the first time (the MVar is taken) B applies it again, blocking on the MVar. For B to complete, A must finish. What guarantees do you have that a particular execution of pure code will run to completion? Will it remain live? Will it be GCed? Will the MVar be freed if it is GCed?

VictorCMiraldo · March 8, 2022, 2:42pm

Oh, I see what you mean. In this case I think its indeed alright: the MVar used with withMVar, which is an exception-safe function, so even if A fails, the MVar would be released. I also think that the GC must not collect a computation whose result is still being computed. Are there counterexamples to this?

In this particular example, the risk is the SMT looping. But having IO all over the code wouldn’t save you either. So, I guess unsafePerformIO is as safe as explicit IO.

Regardless, I’d like to reiterate that that function with the MVar was just an example. Documenting all of these aspects and what the questions one must ask themselves to justify their use of unsafePerformIO is exactlty what my original question was all about.

fryguybob · March 8, 2022, 3:10pm

I don’t have time to work up a test case right now, but, as far as I recall, the warnings listed in unsafeIOToSTM apply to unsafePerformIO in pure code. There may of course be other or future forms of speculative execution. It is not hard (modulo memory use) to reason about what happens in pure code execution. If it is becoming hard to reason about, perhaps “purely functional” is the wrong abstraction.

sclv · March 8, 2022, 3:53pm

unsafeIOToSTM is much worse than unsafePerformIO in that STM does things that don’t happen in IO otherwise – in particular the retry mechanism can completely circumvent exception handling and safety. So if you have a bracket call, the resource may be acquired, stm may interrupt the computation with a retry, and the resource will never be released.

On the other hand, using unsafePerformIO in pure code, I think we can always assume that bracket works as intended.

Abab9579 · March 9, 2022, 3:36am

Uhm, isn’t ListT considered bad and thus be avoided at all costs?

atravers · March 9, 2022, 4:49pm

From page 17 of 51 in Launchbury and Peyton-Jone’s State in Haskell:

dfs g vs = runST (
              newArr (bounds g) False `thenST` \ marks ->
              search marks vs
           )
  where search :: MutArr s Vertex Bool -> [Vertex]
                  -> ST s [Tree Vertex]
        search marks   []   = returnST []
        search marks (v:vs) = readArr marks v `thenST` \ visited ->
                              if visited then
                                 search marks vs
                              else
                                writeArr marks v True `thenST_`
                                search marks (g!v)    `thenST` \ ts ->
                                search marks vs       `thenST` \ us ->
                                returnST ((Node v ts): us)

where:

type Graph = Array Vertex [Vertex]
data Tree a = Node a [Tree a]

Here’s one way to rewrite that original code for dfs:

dfs g vs = unsafePerformIO (
              newIOArray (bounds g) False >>= \ marks ->
              search marks vs
           )
  where search :: IOArray Vertex Bool -> [Vertex]
                  -> IO [Tree Vertex]
        search marks   []   = return []
        search marks (v:vs) = readIOArray marks v >>= \ visited ->
                              if visited then
                                 search marks vs
                              else
                                writeIOArray marks v True >>
                                search marks (g!v)    >>= \ ts ->
                                search marks vs       >>= \ us ->
                                return ((Node v ts): us)

Even though the use of side effects in both definitions is basically the same, the IO version would be a violation of the given rule (it not being “free of side effects”) - in this case the presence of side effects is of no consequence, as they cannot be observed outside the scope of dfs.

jaror · March 9, 2022, 5:01pm

I guess this again turns into a deeper discussion of what side effects really are. Let’s take for example the current Wikipedia definition of side effects which seems reasonable to me:

In computer science, an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the primary effect) to the invoker of the operation. [emphasis added]

In your example the state is not modified outside its local environment and the effects are not observable by the caller, so I would say that there are no side effects in that code (at least when taken as a whole).

sgraf · March 9, 2022, 5:21pm

Even though the use of side effects in both definitions is basically the same, the IO version would be a violation of the given rule (it not being “free of side effects”)

I rather have interpreted the “spec” in the sense that “after the IO action concludes (atomically), no side-effects must be visible” rather than “the IO action must not perform any side-effects while executing”. And the former clearly is the case for the IO version of dfs, so to me the use of unsafePerformIO is safe. The latter interpretation is much less useful.

I concur that there is too much room for interpretation in that comment.

Topic		Replies	Views
[SOLVED] Safe elimination of `unsafePerformIO`? Learn	3	634	October 19, 2023
Generalizing `unsafeInterleaveIO` to Functions	1	411	September 25, 2023
Is unsafeInterleaveIO idempotent? Learn	2	525	January 31, 2024
Trick to lift IO into any monad Learn	9	1635	May 20, 2024
Is unsafeIOToSTM ever safe?	6	590	July 28, 2022

Using unsafePerformIO safely?

Related topics