Using unsafePerformIO safely?

If I may, I might say that I think there’s a few big/interesting questions at play here, that I think (but perhaps I’m mis-understanding) are interesting to discuss.

Namely:

  • In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t. For example, (paraphrasing something Conal once said to me, I think) even calling functions requires “IO”,
  • Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,
  • The best mechanism we have for this is unsafePerformIO and friends,
  • Is it actually suitable for this purpose? What are the risks of using it like so?

From my understanding, it seems like there are at least some risks; and that feels annoying. Is there some better option for pushing the boundary on what Haskell would/would not consider pure? What would it mean to do so?

  • In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t.

…only if the difference between entities being free of, and having observable effects (e.g. sound from a speaker or text on a display) is considered “arbitrary”.


  • For example, (paraphrasing something Conal once said to me, I think) even calling functions requires IO,

…was that meant to be “calling foreign functions requires IO”? Or are you referring to this comment from: Conal Elliott » Can functional programming be liberated from the von Neumann paradigm?

(Sadly, the very definition of “program” in Haskell requires that there be a main of type IO ()! […])


  • Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,

…with some inspiration from Rust, perhaps something like:

unsafe value (io_action)

which would be expanded to:

value = __impl_specific_runIO__ (io_action)

by the implementation.


  • The best mechanism we have for this […]

Maybe not the “best”, but rather “only”


  • Is it actually suitable for this purpose? What are the risks of using it like so?

Those are different, but related topics - this thread is about specifying when it’s appropriate to directly use observable I/O effects. You can always start a new thread…


From my understanding, it seems like there are at least some risks; and that feels annoying.

Outside interactions often contain some level of risk:

When combining effects, their order is often highly important (the reader might want to try different combinations of the effcts “Open door”, and “Walk through door”, for example).

Magnus Carlsson and Thomas Hallgren (page 12 of 263).

It’s much easier to debug programs which take on this risk, than it is to delouse debug humans…

was that meant to be “calling foreign functions requires IO” ? Or are you referring to this comment …

Neither, I don’t think; what I mean is that literally any action at all requires physics; i.e. some interaction with the outside world.

To call a function the computer needs to perform actions in the real world; a disk needs to spin, a memory address must be read, etc, etc. We’ve decided that we don’t want to capture this level of detail in our type system (thankfully!), but what I mean is that it’s totally arbitrary. And, I think, this thread is about pushing the boundaries of that idea; i.e. how can we (lowly) programmers control that boundary? I.e. merely because I don’t have an SMT solver in Haskell I’m forced to think about that boundary; and that’s, conceptually, super frustrating! There’s no in principle reason for this abstraction to be broken at this point, in my view (and this is how I interpret the thread/question).

3 Likes

So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?

Like it or not, programming languages must have some way to access legacy code: considering how many new ones continue to appear year after year, it is utterly infeasible for us programmers to rewrite all that legacy code to suit each and every language. Maybe at the other end of this century (or millennium) matters will be easier, but until then we’ll just have to deal with it as best we can, with what we currently have.

1 Like

So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?

Certainly not! That’s the entire point of the thread - there has to be a better alternative than that? Merely to call a function that we know is “pure” but just inconveniently happens to live just outside our abstraction boundary. The point is, how we can be bring it inside safely, without this annoying abstraction-breaking burden of a monad.

2 Likes

No, and I think @silky is defending exactly that and shares the same frustration as I: we need to come up with better answers about how to interact with the world. The blanket answer of “just push the IO monad through the entire codebase” is not great for a number of reasons that were already outline higher up in this thread.

1 Like

Well, discussing alternative approaches to I/O (or dealing with state) is probably best left to a new thread…are there any volunteers?

I think unsafePerformIO might actually be a pretty good alternative, we just need to understand how to use it safely, which was what motivated my original question to @rae :slight_smile:

I got a lot out of this thread and I’ll read and study all of the material, so thank you to everyone that gave me pointers and suggestions!

1 Like

…is a draft version of e.g. a guide now publicly available?

The Haskell redis driver hedis makes a brilliant use of UnsafeInterleaveIO. I cannot imagine an STM implemention able to compete with it, performance-wise.

It’s not quite clear to me why that can’t just be done in IO.

Let’s refactor @Nycticorax’s example - if I’m reading it correctly:

connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      -- lazy pattern match to actually delay the receiving
      ~(r, rest') <- unsafeInterleaveIO $ 
        -- Force previous reply for correct order.
        previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO (go rest' r)
      return (r:rs)

connGetReply ::  Connection -> S.ByteString -> IO (Reply, S.ByteString)
connGetReply conn@Conn{..} rest =
     do scanResult <- Scanner.scanWith readMore reply rest
        case scanResult of
          Scanner.Fail{}       -> CC.errConnClosed
          Scanner.More{}    -> error "Hedis: parseWith returned Partial"
          Scanner.Done rest' r -> do
            -- r is the same as 'head' of 'connPending'. Since we just
            -- received r, we remove it from the pending list.
            atomicModifyIORef' connPending $ \(_:rs) -> (rs, ())
            -- We now expect one less reply from Redis. We don't count to
            -- negative, which would otherwise occur during pubsub.
            atomicModifyIORef' connPendingCnt $ \n -> (max 0 (n-1), ())
            return (r, rest')

With the comments removed:

connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      ~(r, rest') <- unsafeInterleaveIO $ previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO $ go rest' r
      return (r:rs)

connGetReplies has an interesting resemblance to Data.Supply.newSupply:

{-# INLINE newSupply #-}
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
  where gen r = unsafeInterleaveIO
              $ do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                   ls <- gen r
                   rs <- gen r
                   return (Node v ls rs)
        upd a = let b = next a in seq b (b, a)

in particular the dual use of unsafeInterleaveIO. Another similar definition is Control.Concurrent.getChanContents:

getChanContents :: Chan a -> IO [a]
getChanContents ch
  = unsafeInterleaveIO (do
        x  <- readChan ch
        xs <- getChanContents ch
        return (x:xs)
    )

Using getChanContents as the example, if unsafeInterleaveIO was removed:

getChanContents ch = do
        x  <- readChan ch
        xs <- getChanContents ch
        return (x:xs)

…the result (x:xs) would never be returned - it may as well be defined as:

getChanContents ch = sequence $ repeat $ readChan ch

At least one of two things would run out - memory (stack overflow or heap exhaustion) or patience!

Whether unsafeInterleaveIO can be avoided by using e.g. interatees is left as an exercise…

Sorry @atravers I didn’t quite get the intent; so you agree that it’s a useful/legit use case in this context, don’t you?

I agree with this, but I’m not certain that it’s enough. The problem here is that words like “side effect” imply some kind of abstraction boundary between things that the program should care about, and things that it should not. Without that abstraction boundary, nothing can ever satisfy the conditions to be safe.

In this example, invoking an SMT solver can definitely have side effects if other parts of the program look at the list of running processes to decide whether the SMT solver is running. Does that matter? Most of the time, probably not… but it certainly does matter if you’re using the SMT solver as part of a program that enforces a policy on how many processes a user is allowed to run!

For this reason, I reach the conclusion that Richard’s hypothetical ProofOfSafety is fundamentally impossible to define as part of a general-purpose library, because it relies on decisions about how to draw the right abstraction boundary, and that decision varies from program to program.

Note that the side effects must be observable from pure Haskell code. You cannot access the list of running processes that way.

Inside IO you can monitor everything that happens on the computer, so otherwise everything would be a side effect. Even 1 + 1 would have the side effect of taking time to compute and running instructions on your CPU, which you can observe inside IO.

Huh. So even the oft-quoted launching of the missiles, the most famous example of a side effect, isn’t a side effect in your view? This seems far too restrictive a notion of side effects. Now basically anything, no matter how unsafe, qualifies as technically “safe” because you cannot observe it from pure code.

Yes, this is basically what I said. If you fail to draw an abstraction boundary between internals and exposed behavior, then there is no such thing as safety. In practice, we don’t define that boundary formally. We just all sort of know that the time it takes to compute a pure expression isn’t thought of as a side effect, whereas launching missiles typically (but I guess not always!) is. If you want to prove anything, you’d need to be quite explicit about the boundary.

I guess so. I would consider anything that can’t come back to influence the actual evaluation is not really a side effect (that we should care about).

Consider (what feels to me like) the dual: reading something from the “outside world” that is a static value, i.e. not influenced by anything else in the environment. I believe that is a common use case of unsafePerformIO in practice, e.g. Control.Concurrent.rtsSupportsBoundThreads (this actually uses the FFI and not unsafePerformIO directly).

I was just providing some examples of where “regular” monadic I/O (which is strict) is too restrictive - then again, perhaps I didn’t quite get the intent of @tomjaguarpaw’s remark

so you agree that it’s a useful/legit use case in this context, don’t you?

In this context (and in the absence of a practical alternative): yes - as much as lazy I/O is despised, sometimes you need that laziness. Yes, such a mechanism has its problems, but then so does regular I/O e.g:

import System.Process(getCurrentPid, callCommand)

main = do pid <- getCurrentPid
          callCommand $ "kill " ++ show (toInteger pid)

Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots.

So far, the Universe is winning.

Rich Cook.

…unfortunately our reality isn’t quite so “neat and tidy” - a Roman candle could go off-course and land where the computer running the Haskell program in control of the fireworks is located. Even reading in a static value can have an effect e.g. changing the access time of a file.

To me, this topic is just another aspect of I/O, and it continues to confound:

  • otherwise there wouldn’t have been calls for more "semantically simple" alternatives

  • …and this thread wouldn’t have attracted so much attention :-)

Getting back on-topic: a guide for the safe safer use of unsafe entities (even if it is a draft version) is probably the best that we can do for now, hence the inquiry. It would surely be a small improvement of the current situation where such advice, if it exists, is scattered throughout cyberspace…

1 Like

Nice, I completely agree.

tl;dr: Basically here they are making a cheap, thread-safe “queue” of IO Reply actions without the need for a “canonical” queue-like data structures (Chan, TQueue…) I have not benchmarked this but I can only assume it blows these alternatives out of the water.