Using unsafePerformIO safely?

Exactly: while A is evaluating it for the first time (the MVar is taken), B applies it again and blocks on the MVar. For B to complete, A must finish. What guarantees do you have that a particular execution of pure code will run to completion? Will it remain live? Will it be GCed? Will the MVar be freed if it is GCed?

Oh, I see what you mean. In this case I think it’s indeed alright: the MVar is used with withMVar, which is exception-safe, so even if A fails the MVar is released. I also think that the GC must not collect a computation whose result is still being computed. Are there counterexamples to this?
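
For concreteness, the shape I have in mind is roughly this (Formula and runExternalSolver are hypothetical stand-ins for the external solver binding):

import Control.Concurrent.MVar
import System.IO.Unsafe (unsafePerformIO)

-- Formula and runExternalSolver are hypothetical stand-ins for the
-- external solver binding.

{-# NOINLINE solverLock #-}
solverLock :: MVar ()
solverLock = unsafePerformIO (newMVar ())

-- withMVar restores the lock even if the body throws, so a failed
-- evaluation cannot deadlock later callers.
isSatisfiable :: Formula -> Bool
isSatisfiable f = unsafePerformIO $
  withMVar solverLock $ \_ -> runExternalSolver f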

In this particular example, the risk is the SMT solver looping. But having IO all over the code wouldn’t save you from that either. So I’d say unsafePerformIO is as safe here as explicit IO.

Regardless, I’d like to reiterate that that function with the MVar was just an example. Documenting all of these aspects, and the questions one must ask themselves to justify a use of unsafePerformIO, is exactly what my original question was about. :slight_smile:

I don’t have time to work up a test case right now, but, as far as I recall, the warnings listed in the documentation for unsafeIOToSTM apply equally to unsafePerformIO in pure code. There may of course be other or future forms of speculative execution. It is not hard (modulo memory use) to reason about what happens in pure code execution. If it is becoming hard to reason about, perhaps “purely functional” is the wrong abstraction.

unsafeIOToSTM is much worse than unsafePerformIO, in that STM does things that simply don’t happen in IO: in particular, the retry mechanism can completely circumvent exception handling and safety. So if you have a bracket call, the resource may be acquired, STM may interrupt the computation with a retry, and the resource will never be released.
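
A minimal sketch of that failure mode, with the acquire and release split across a transaction (acquireResource and releaseResource are hypothetical):

import Control.Concurrent.STM
import GHC.Conc (unsafeIOToSTM)

-- Hypothetical stand-ins for real resource management:
acquireResource, releaseResource :: IO ()
acquireResource = putStrLn "acquired"
releaseResource = putStrLn "released"

leaky :: TVar Bool -> STM ()
leaky ready = do
  unsafeIOToSTM acquireResource   -- side effect happens immediately
  ok <- readTVar ready
  check ok                        -- if this retries, the transaction is
                                  -- abandoned: the release below never runs
  unsafeIOToSTM releaseResource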

On the other hand, using unsafePerformIO in pure code, I think we can always assume that bracket works as intended.
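
For example, a sketch (whether reading a file should count as pure is beside the point here; the point is only that bracket’s release runs on both the normal and the exceptional path):

import Control.Exception (bracket, evaluate)
import System.IO
import System.IO.Unsafe (unsafePerformIO)

fileLength :: FilePath -> Int
fileLength path = unsafePerformIO $
  bracket (openFile path ReadMode) hClose $ \h -> do
    s <- hGetContents h
    evaluate (length s)   -- force the result before hClose runs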

2 Likes

Uhm, isn’t ListT considered bad, and thus to be avoided at all costs?

From page 17 of 51 in Launchbury and Peyton Jones’s State in Haskell:

dfs g vs = runST (
              newArr (bounds g) False `thenST` \ marks ->
              search marks vs
           )
  where search :: MutArr s Vertex Bool -> [Vertex]
                  -> ST s [Tree Vertex]
        search marks   []   = returnST []
        search marks (v:vs) = readArr marks v `thenST` \ visited ->
                              if visited then
                                 search marks vs
                              else
                                writeArr marks v True `thenST_`
                                search marks (g!v)    `thenST` \ ts ->
                                search marks vs       `thenST` \ us ->
                                returnST ((Node v ts): us)

where:

type Vertex = Int   -- assumption: any Ix instance would do here
type Graph  = Array Vertex [Vertex]
data Tree a = Node a [Tree a]

Here’s one way to rewrite that original code for dfs:

import Data.Array (Array, bounds, (!))
import GHC.IOArray (IOArray, newIOArray, readIOArray, writeIOArray)
import System.IO.Unsafe (unsafePerformIO)

dfs :: Graph -> [Vertex] -> [Tree Vertex]
dfs g vs = unsafePerformIO (
              newIOArray (bounds g) False >>= \ marks ->
              search marks vs
           )
  where search :: IOArray Vertex Bool -> [Vertex]
                  -> IO [Tree Vertex]
        search marks   []   = return []
        search marks (v:vs) = readIOArray marks v >>= \ visited ->
                              if visited then
                                 search marks vs
                              else
                                writeIOArray marks v True >>
                                search marks (g!v)    >>= \ ts ->
                                search marks vs       >>= \ us ->
                                return ((Node v ts): us)

Even though the use of side effects in both definitions is essentially the same, the IO version would be a violation of the given rule (it is not “free of side effects”). But in this case the presence of side effects is of no consequence, as they cannot be observed outside the scope of dfs.

1 Like

I guess this again turns into a deeper discussion of what side effects really are. Let’s take for example the current Wikipedia definition of side effects which seems reasonable to me:

In computer science, an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the primary effect) to the invoker of the operation. [emphasis added]

In your example the state is not modified outside its local environment and the effects are not observable by the caller, so I would say that there are no side effects in that code (at least when taken as a whole).
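
One way to make that concrete: no test a caller can write distinguishes the two versions. As a property-style sketch (renaming the runST and unsafePerformIO versions above to dfsST and dfsIO, and assuming a derived Eq instance for Tree):

-- The two definitions are observationally equivalent:
prop_indistinguishable :: Graph -> [Vertex] -> Bool
prop_indistinguishable g vs = dfsST g vs == dfsIO g vs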

1 Like

Even though the use of side effects in both definitions is basically the same, the IO version would be a violation of the given rule (it not being “free of side effects”)

I have rather interpreted the “spec” in the sense of “after the IO action concludes (atomically), no side effects must be visible” rather than “the IO action must not perform any side effects while executing”. The former is clearly the case for the IO version of dfs, so to me this use of unsafePerformIO is safe. The latter interpretation is much less useful.

I concur that there is too much room for interpretation in that comment.

If I may: I think there are a few big and interesting questions at play here (though perhaps I’m misunderstanding) that are worth discussing.

Namely:

  • In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t. For example, (paraphrasing something Conal once said to me, I think) even calling functions requires “IO”,
  • Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,
  • The best mechanism we have for this is unsafePerformIO and friends,
  • Is it actually suitable for this purpose? What are the risks of using it like so?

From my understanding, it seems like there are at least some risks; and that feels annoying. Is there some better option for pushing the boundary on what Haskell would/would not consider pure? What would it mean to do so?

  • In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t.

…only if the difference between entities that are free of observable effects and those that have them (e.g. sound from a speaker, or text on a display) is considered “arbitrary”.


  • For example, (paraphrasing something Conal once said to me, I think) even calling functions requires IO,

…was that meant to be “calling foreign functions requires IO”? Or are you referring to this comment from: Conal Elliott » Can functional programming be liberated from the von Neumann paradigm?

(Sadly, the very definition of “program” in Haskell requires that there be a main of type IO ()! […])


  • Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,

…with some inspiration from Rust, perhaps something like:

unsafe value (io_action)

which would be expanded to:

value = __impl_specific_runIO__ (io_action)

by the implementation.
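
In today’s GHC, the closest approximation is presumably the usual top-level idiom (Config and ioAction are placeholders):

import System.IO.Unsafe (unsafePerformIO)

-- Config and ioAction are placeholders. NOINLINE keeps GHC from
-- duplicating the call at each use site; without it, value could be
-- inlined and ioAction run more than once.
{-# NOINLINE value #-}
value :: Config
value = unsafePerformIO ioAction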


  • The best mechanism we have for this […]

Maybe not the “best”, but rather the “only” one.


  • Is it actually suitable for this purpose? What are the risks of using it like so?

Those are different, but related topics - this thread is about specifying when it’s appropriate to directly use observable I/O effects. You can always start a new thread…


From my understanding, it seems like there are at least some risks; and that feels annoying.

Outside interactions often contain some level of risk:

When combining effects, their order is often highly important (the reader might want to try different combinations of the effects “Open door” and “Walk through door”, for example).

Magnus Carlsson and Thomas Hallgren (page 12 of 263).

It’s much easier to debug programs which take on this risk than it is to debug humans…

was that meant to be “calling foreign functions requires IO” ? Or are you referring to this comment …

Neither, I don’t think; what I mean is that literally any action at all requires physics; i.e. some interaction with the outside world.

To call a function, the computer needs to perform actions in the real world: a disk needs to spin, a memory address must be read, and so on. We’ve decided that we don’t want to capture this level of detail in our type system (thankfully!), but my point is that where we draw the line is totally arbitrary. And, I think, this thread is about pushing the boundaries of that idea; i.e. how can we (lowly) programmers control that boundary? Merely because I don’t have an SMT solver in Haskell, I’m forced to think about that boundary; and that’s, conceptually, super frustrating! There’s no in-principle reason for the abstraction to break at this point, in my view (and this is how I interpret the thread/question).

3 Likes

So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?

Like it or not, programming languages must have some way to access legacy code: considering how many new ones continue to appear year after year, it is utterly infeasible for us programmers to rewrite all that legacy code to suit each and every language. Maybe at the other end of this century (or millennium) matters will be easier, but until then we’ll just have to deal with it as best we can, with what we currently have.
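
For the record, the standard door to such legacy code is the FFI; a minimal sketch, with a made-up C entry point (solver_check_sat is hypothetical):

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt)
import System.IO.Unsafe (unsafePerformIO)

-- "solver_check_sat" is a hypothetical C function from a legacy library.
foreign import ccall "solver_check_sat"
  c_checkSat :: CString -> IO CInt

-- Blessing this call as pure is exactly the judgement this thread is
-- about: it is only valid if the C side is observably effect-free.
checkSat :: String -> Bool
checkSat s = unsafePerformIO $
  withCString s $ \cs -> (/= 0) <$> c_checkSat cs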

1 Like

So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?

Certainly not! That’s the entire point of the thread: there has to be a better alternative than that! We merely want to call a function that we know is “pure” but that inconveniently happens to live just outside our abstraction boundary. The point is how we can bring it inside safely, without the annoying, abstraction-breaking burden of a monad.

2 Likes

No, and I think @silky is defending exactly that and shares the same frustration as I do: we need to come up with better answers about how to interact with the world. The blanket answer of “just push the IO monad through the entire codebase” is not great, for a number of reasons that were already outlined higher up in this thread.

1 Like

Well, discussing alternative approaches to I/O (or dealing with state) is probably best left to a new thread…are there any volunteers?

I think unsafePerformIO might actually be a pretty good alternative; we just need to understand how to use it safely, which is what motivated my original question to @rae :slight_smile:

I got a lot out of this thread and I’ll read and study all of the material, so thank you to everyone that gave me pointers and suggestions!

1 Like

…is a draft version of e.g. a guide now publicly available?

The Haskell Redis driver hedis makes brilliant use of unsafeInterleaveIO. I cannot imagine an STM implementation able to compete with it, performance-wise.

It’s not quite clear to me why that can’t just be done in IO.

Let’s refactor @Nycticorax’s example - if I’m reading it correctly:

connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      -- lazy pattern match to actually delay the receiving
      ~(r, rest') <- unsafeInterleaveIO $ 
        -- Force previous reply for correct order.
        previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO (go rest' r)
      return (r:rs)

connGetReply ::  Connection -> S.ByteString -> IO (Reply, S.ByteString)
connGetReply conn@Conn{..} rest =
     do scanResult <- Scanner.scanWith readMore reply rest
        case scanResult of
          Scanner.Fail{}       -> CC.errConnClosed
          Scanner.More{}       -> error "Hedis: parseWith returned Partial"
          Scanner.Done rest' r -> do
            -- r is the same as 'head' of 'connPending'. Since we just
            -- received r, we remove it from the pending list.
            atomicModifyIORef' connPending $ \(_:rs) -> (rs, ())
            -- We now expect one less reply from Redis. We don't count to
            -- negative, which would otherwise occur during pubsub.
            atomicModifyIORef' connPendingCnt $ \n -> (max 0 (n-1), ())
            return (r, rest')

With the comments removed:

connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      ~(r, rest') <- unsafeInterleaveIO $ previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO $ go rest' r
      return (r:rs)

connGetReplies has an interesting resemblance to Data.Supply.newSupply:

{-# INLINE newSupply #-}
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
  where gen r = unsafeInterleaveIO
              $ do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                   ls <- gen r
                   rs <- gen r
                   return (Node v ls rs)
        upd a = let b = next a in seq b (b, a)

in particular the dual use of unsafeInterleaveIO. Another similar definition is Control.Concurrent.getChanContents:

getChanContents :: Chan a -> IO [a]
getChanContents ch
  = unsafeInterleaveIO (do
        x  <- readChan ch
        xs <- getChanContents ch
        return (x:xs)
    )
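
A small usage sketch, to show the interleaving in action:

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

main :: IO ()
main = do
  ch <- newChan
  _ <- forkIO (mapM_ (writeChan ch) [1 :: Int ..])
  xs <- getChanContents ch
  print (take 5 xs)   -- each element is read from the channel only
                      -- when the lazy list is demanded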

Using getChanContents as the example, if unsafeInterleaveIO were removed:

getChanContents ch = do
        x  <- readChan ch
        xs <- getChanContents ch
        return (x:xs)

…the result (x:xs) would never be returned - it may as well be defined as:

getChanContents ch = sequence $ repeat $ readChan ch

At least one of two things would run out - memory (stack overflow or heap exhaustion) or patience!

Whether unsafeInterleaveIO can be avoided by using e.g. iteratees is left as an exercise…