Using unsafePerformIO safely?

jaror · July 21, 2022, 8:37am

Luckily, you can’t observe roman candles destroying your computer or file systems changing access times from within pure code.

atravers · July 21, 2022, 2:22pm

Since pure code also cannot observe the side effects of their use, wouldn’t that imply unsafe I/O entities can be used anywhere?

jaror · July 21, 2022, 2:30pm

No, you can do some things that are observable, e.g.:

ref = unsafePerformIO (newIORef True)

set x = unsafePerformIO (writeIORef ref x)
get = unsafePerformIO (readIORef ref)

On their own get and set are relatively harmless. And the ref is even a common practice. But having the ability to both set and get a mutable value means that you now have the ability to observe the effects.

Similarly, you could imagine having an SMT solver function and having a function that gets the list of running processes, both using unsafePerformIO. The latter can observe the effects of the former, so this is unsafe. Also, in this situation the list of running process probably changes all the time, so that is a problem all on its own.

atravers · July 21, 2022, 3:02pm

…harmless or useless?

```
ref = return ()
set _ = return ()
```
```
ref = return ()
get = return True
```

Consider the situation where that SMT solver and working-process monitor are in two separate third-party libraries: by themselves in each library, their effects are “benign”. But what if you need those two libraries in your new program?

Now imagine if every library could potentially have its own "benign" use of an unsafe I/O entity…

jaror · July 22, 2022, 9:51am

True, to get modularity, modules have to communicate which assumptions they make about the use of unsafePerformIO in rest of the program.

Then it makes sense to make an explicit model of the larger system and to use that to specify which things you consider side effects. For example, do you consider creating a new process a side effect or not? In that sense @cdsmith’s suggestion above seems more principled and modular.

Now I’m also wondering, a use of unsafePerformIO that is often considered safe is if all the side effects are encapsulated. For example, if you deallocate or freeze all mutable arrays before leaving the scope of unsafePerformIO. Or in the process example, if the process finishes before leaving the scope of unsafePerformIO.

One problem with that is that the created process might still be observable if other code is running concurrently.

atravers · July 22, 2022, 10:31am

Unfortunately, those assumptions would need to be pessimistic by default because of transitivity: a definition in one module can indirectly depend on a definition in another which uses unsafePerformIO. For most codebases, that could end up “tainting” many modules, quite possibly the majority of them.

It would then probably be easier to just assume the worst - that any module can be tainted - and not bother with those communications.

That resembles the “safe-usage” requirements for unsafeLocalState:

unsafeLocalState :: IO a → a

Sometimes an external entity is a pure function, except that it passes arguments and/or results via
pointers. The function unsafeLocalState permits the packaging of such entities as pure functions.
The only IO operations allowed in the IO action passed to unsafeLocalState are

(a) local allocation (alloca, allocaBytes and derived operations such as withArray and withCString), and

(b) pointer operations (Foreign.Storable and Foreign.Ptr) on the pointers to local storage, and

(c) foreign functions whose only observable effect is to read and/or write the locally allocated memory.

Passing an IO operation that does not obey these rules results in undefined behaviour.

It is expected that this operation will be replaced in a future revision of Haskell.

The Haskell 2010 Report (page 269 of 329).

…perhaps this is as close as anyone will get for now to a guide for the safe use of unsafePerformIO.

nomeata · August 25, 2022, 2:18pm

The question “when do we know that unsafePerformIO is safe” can be specialized to “why do we believe that runST is safe?” (which is one possible use of unsafePerformIO).

I asked that before, and shortly after an answer was given by Amin Timany et al in their POPL 2018 paper “A Logical Relation for Monadic Encapsulation of State”.

So that establishes one possible way of answering that question. I am not sure how easy it is to extend to other uses of unsafePerformIO (e.g. the tricks I am playing here), or to safety in the presence of concurrency.

atravers · August 25, 2022, 10:50pm

Correct.

The problem is observable effects:

ST actions like readSTRef and writeSTRef only manipulate the contents of the program’s memory - usually only made visible with a debugger.
however, IO actions like getChar and putChar work with I/O devices which can potentially manipulate anything, including the program itself.

As for safety with concurrency, everyone’s favourite “type-smith” has already shown how unsafeInterleaveST (and consequently unsafeInterleaveIO) breaks regular Haskell semantics, even though their results are monadic:

unsafeInterleaveST :: ST s a -> ST s a
unsafeInterleaveIO :: IO a -> IO a

as opposed to:

unsafePerformST :: ST s a -> a
unsafePerformIO :: IO a -> a

If only there was a denotative way to write system software…

nomeata · August 26, 2022, 8:58am

Is anyone surprised that unsafeInterleaveST can be used to break semantics? That’s why it’s called unsafe… But certainly some uses of it are ok, and we are back to the original question of how to prove that that’s the case (or even what exactly “that” means).

atravers · August 26, 2022, 10:21am

Is anyone surprised that unsafeInterleaveST can be used to break semantics? That’s why it’s called unsafe…

Originally it was just named interleaveST (see page 10 of 12). I’m guessing that because it was for encapsulated state rather than general I/O, some may have thought it was safe. But something must have annoyed Oleg enough to devise his counter-example…

But certainly some uses of it are ok, and we are back to the original question of how to prove that that’s the case (or even what exactly “that” means).

Because of Rice’s theorem, such a non-trivial semantic property cannot always be proven - it could be possible to check for safe usage in certain circumstances, but as for the rest: we’re back to using heuristics and due care.

(If economics is the dismal science, then what are we to make of computer science?)

sgraf · August 26, 2022, 11:08am

I think what a side-effect is depends on what the observer is interested in when running the program. In an operational semantics, you would perhaps make such side-effects explicit by labelling certain kinds of transitions with an Event such as “asks for a new ref cell”, “reads from a ref cell”, “launches missiles”, “terminates” etc.

Over the course of the program execution, the observer can collect these events and then compare the event log of different traces. If the observer decides that ref cells are boring, then the new, read and write events aren’t a side effect and hence not collected. (The decision not to care for ref cells might still make a difference when one trace launches a missile when the other doesn’t simply because of different contents of a ref cell. Which is pretty much why Haskellers consider reading and writing to ref cells a side-effect.)

See also What is a good formal definition of purity? - #21 by sgraf in that regard.

If the observer doesn’t care whether the SMT solver is run and the result of doing so isn’t otherwise observable compared to running a pure function instead, then it’s not a side-effect. By contrast, if the SMT solver reads and writes non-existant configuration files and whatnot, the observer often questions whether the program is actually free of side-effects.

atravers · August 26, 2022, 11:21am

Unfortunately, there’s no reliable way to tell if the SMT solver (or any other foreign entity) is going to be “well-behaved” or not when it’s used:

VictorCMiraldo · September 23, 2022, 1:52pm

(apologies for my delay in coming back to this thread)

I think there’s quite a lot of value in this thread! I certainly learned a lot from all the links that were shared here, so thank you! This thread has a lot of resources and knowledge that was once scattered across different places and I guess the lack of a definite answer is an answer in and of itself: How to use unsafePerformIO safely? Well, it depends!

To rely on unsafePerformIO, it seems to be necessary to safeguard against any assumptions on the order and number of executions: your unsafePerformIO is at the mercy of the runtime system and it may be executed multiple times or none at all; these runs might happen sequentially in some unspecified order or concurrently.

Is that sufficient to conclude a use of unsafePerformIO is safe? In general, not at all. The full answer is inherently domain specific: what do you care about in terms of side-effects? There is already a myriad of ways in which we pretend operations are pure, but in reality they are not (memory allocation, for instance).

Nevertheless, I’m convinced that it is worth the effort to conduct this analysis: if it is possible to rely on unsafePerformIO f, this will keep IO hidden from the rest of the code and the benefits of that are huge!
Moreover, you now only have to reason about a small piece of the code being “safe”, whatever that means in your context.

I’m not sure how being in IO would help anything in this case. Is it simply because now you’d have to pick an order in which these libraries are called, therefore “acknowledging” their interaction? Because they would still interact.

atravers · September 23, 2022, 3:10pm

A Partial Rehabilitation of Side-Effecting I/O: Non-Determinism in Non-Strict Functional Languages(PS) - Manfred Schmidt-Schauß

[…] It is interesting that novices in lazy functional programming in general expect that there is some direct (side-effecting) I/O using a function call. […]
HasFuse - Haskell with FUNDIO-based side effects:
- FUNDIO: A Lambda-Calculus With letrec, case, Constructors, and an IO-Interface: Approaching a Theory of unsafePerformIO - Manfred Schmidt-Schauß
  
  […] The common method to relieve the programming language designer from the inherent IO-problems is to shift responsibility to the programmer who has to sequentialize all IO-requests. This is also true for the monadic approach implemented in Haskell. […]
- Realising nondeterministic I/O in the Glasgow Haskell Compiler - David Sabel
  
  […] It is possible that a declarative programming style for the IO part of a program can be integrated into Haskell.

VictorCMiraldo · September 27, 2022, 7:35am

I think a core distinction that is needed is the definition of “observable”, coupled with a careful analysis of what failures would mean for the application at hand. To circle it back home, in my original example we were working on a symbolic evaluation engine, and I’m now convinced that interacting with the SMT solver inside unsafePerformIO is the right option. It’s a morally pure operation and a failure due to some OS context that was improper is by no means catastrophic: just re-run the application. Finally, it brings a lot of clarity to the code, making it significantly easier to maintain.

Wouldn’t drivers hide this away from (most of) the programmers? Ideally to print something you send some information to a driver, who is later responsible to communicate with the printer. This is an interesting thought I had not considered, though! We’d still have the driver implementors to consider, too!

jaror · December 29, 2022, 1:20pm

I just found this in the GHC 5.04.3 user guide:

When is it safe to use unsafePerformIO?

We’ll give two answers to this question, each of which may be helpful. These criteria are not rigorous in any real sense (you’d need a formal semantics for Haskell in order to give a proper answer to this question), but should give you a feel for the kind of things you can and cannot do with unsafePerformIO.

It is safe to implement a function or API using unsafePerformIO if you could imagine also implementing the same function or API in Haskell without using unsafePerformIO (forget about efficiency, just consider the semantics).

In pure Haskell, the value of a function depends only on the values of its arguments (and free variables, if it has any). If you can implement the function using unsafePerformIO and still retain this invariant, then you’re probably using unsafePerformIO in a safe way. Note that you need only consider the observable values of the arguments and result.

For more information, see this thread.

I cannot find this any more in the current user guide. Maybe it should be added back in?

And that mailing list thread also contains this answer from Simon Marlow:

Ok, if we’re going to nit pick When talking about equality you have
to restrict that to “observable equivalence” in the context of whatever
abstract types you’re dealing with. If a function always returns
observably equivalent results when given observably equivalent
arguments, then it is “safe”.

For example, toFoo would not be safe if you can test for equality
between pointers. But if all you can do is convert it back using
fromFoo, then you’re ok.

It fails to recognise the fact that IO actions have side effects.

Well, side effects are usually only visible in the IO monad so that’s
ok. In your free() example, the result is either () or |, so the
function does depend on more than just the value of the argument. I
think the definition holds (but only just).

For example, we normally consider a function which uses some temporary
allocation on the C heap as “safe” to use from unsafePerformIO. But its
side effect is visible from the IO monad by inspecting the C heap
pointer.

Cheers,
Simon

dixonary · December 30, 2022, 6:27pm

To circle it back home, in my original example we were working on a symbolic evaluation engine, and I’m now convinced that interacting with the SMT solver inside unsafePerformIO is the right option. It’s a morally pure operation […]

This is a very interesting example, and one I have some experience with. Interacting with an SMT solver in the sense of “throw some constraints at it and get back a simple response” is definitely pure in the sense we care about here and using unsafePerformIO is morally fine. However, most SMT solvers support some kind of iterative refinement by storing intermediate state, and using it in that way in several places in your application would definitely not qualify as a safe use of unsafePerformIO.

The upshot of this is that if you want to do multi-shot work with a solver you either need to lift all your logic into IO and run unsafePerformIO over it once; or bubble IO through your program. Alternatively, make yourself a monad that empowers you to make SMT calls and run everything in that. (This is exactly the approach of the sbv library.)

nomeata · January 2, 2023, 3:47pm

Unless the “intermediate state” can be multiplexed and referenced, then you can represent it in a safe API as an opaque value, to be passed around like a pure value.

This is similar to the following musing: Most Haskell Web frameworks like to use (some amount of) purity in theirs “Views”: The monadic controller code, with full access to the database, generates a pure value with all information needed and passes that to a pure function that turns it into HTML (applying templates etc.) This is very pretty, as it guarantees that the view doesn’t update account balances or shenanigans like that.

But I wonder what’s so bad if the view code could read more data from the database? Why does the controller need to know if the view might choose to print “5 comments” next to a post?

Now, even read-only access to the database wouldn’t be pure, due to concurrent writes…

But what if the data base connection passed to the view could point to a immutable snapshop of the database (assuming such a things is possible – is it?), and the view has access to a pure (unsafePerformIO-wrapped) API to fetch more data? It seems that then we could allow the code in the view component to freely load more data as necessary – just as we would if all data would be in-memory anyways.

Has this been done before?

tomjaguarpaw · January 2, 2023, 4:13pm

Would the SMT solver have to keep all referenced intermediate states around indefinitely, just in case some other processes with a reference might one day call upon it?

nomeata · January 2, 2023, 4:53pm

You could use finalizers on the opaque values to free the resources on the other side when they are no longer needed, I’d expect.