Exactly, while A
is evaluating it the first time (the MVar
is taken) B
applies it again, blocking on the MVar
. For B
to complete, A
must finish. What guarantees do you have that a particular execution of pure code will run to completion? Will it remain live? Will it be GCed? Will the MVar
be freed if it is GCed?
Oh, I see what you mean. In this case I think its indeed alright: the MVar
used with withMVar
, which is an exception-safe function, so even if A
fails, the MVar
would be released. I also think that the GC must not collect a computation whose result is still being computed. Are there counterexamples to this?
In this particular example, the risk is the SMT looping. But having IO
all over the code wouldn’t save you either. So, I guess unsafePerformIO
is as safe as explicit IO
.
Regardless, I’d like to reiterate that that function with the MVar
was just an example. Documenting all of these aspects and what the questions one must ask themselves to justify their use of unsafePerformIO
is exactlty what my original question was all about.
I don’t have time to work up a test case right now, but, as far as I recall, the warnings listed in unsafeIOToSTM
apply to unsafePerformIO
in pure code. There may of course be other or future forms of speculative execution. It is not hard (modulo memory use) to reason about what happens in pure code execution. If it is becoming hard to reason about, perhaps “purely functional” is the wrong abstraction.
unsafeIOToSTM
is much worse than unsafePerformIO
in that STM does things that don’t happen in IO otherwise – in particular the retry mechanism can completely circumvent exception handling and safety. So if you have a bracket
call, the resource may be acquired, stm may interrupt the computation with a retry, and the resource will never be released.
On the other hand, using unsafePerformIO
in pure code, I think we can always assume that bracket
works as intended.
Uhm, isn’t ListT considered bad and thus be avoided at all costs?
From page 17 of 51 in Launchbury and Peyton-Jone’s State in Haskell:
dfs g vs = runST (
newArr (bounds g) False `thenST` \ marks ->
search marks vs
)
where search :: MutArr s Vertex Bool -> [Vertex]
-> ST s [Tree Vertex]
search marks [] = returnST []
search marks (v:vs) = readArr marks v `thenST` \ visited ->
if visited then
search marks vs
else
writeArr marks v True `thenST_`
search marks (g!v) `thenST` \ ts ->
search marks vs `thenST` \ us ->
returnST ((Node v ts): us)
where:
type Graph = Array Vertex [Vertex]
data Tree a = Node a [Tree a]
Here’s one way to rewrite that original code for dfs
:
dfs g vs = unsafePerformIO (
newIOArray (bounds g) False >>= \ marks ->
search marks vs
)
where search :: IOArray Vertex Bool -> [Vertex]
-> IO [Tree Vertex]
search marks [] = return []
search marks (v:vs) = readIOArray marks v >>= \ visited ->
if visited then
search marks vs
else
writeIOArray marks v True >>
search marks (g!v) >>= \ ts ->
search marks vs >>= \ us ->
return ((Node v ts): us)
Even though the use of side effects in both definitions is basically the same, the IO
version would be a violation of the given rule (it not being “free of side effects”) - in this case the presence of side effects is of no consequence, as they cannot be observed outside the scope of dfs
.
I guess this again turns into a deeper discussion of what side effects really are. Let’s take for example the current Wikipedia definition of side effects which seems reasonable to me:
In computer science, an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the primary effect) to the invoker of the operation. [emphasis added]
In your example the state is not modified outside its local environment and the effects are not observable by the caller, so I would say that there are no side effects in that code (at least when taken as a whole).
Even though the use of side effects in both definitions is basically the same, the
IO
version would be a violation of the given rule (it not being “free of side effects”)
I rather have interpreted the “spec” in the sense that “after the IO action concludes (atomically), no side-effects must be visible” rather than “the IO action must not perform any side-effects while executing”. And the former clearly is the case for the IO version of dfs
, so to me the use of unsafePerformIO
is safe. The latter interpretation is much less useful.
I concur that there is too much room for interpretation in that comment.
If I may, I might say that I think there’s a few big/interesting questions at play here, that I think (but perhaps I’m mis-understanding) are interesting to discuss.
Namely:
- In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t. For example, (paraphrasing something Conal once said to me, I think) even calling functions requires “IO”,
- Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,
- The best mechanism we have for this is
unsafePerformIO
and friends, - Is it actually suitable for this purpose? What are the risks of using it like so?
From my understanding, it seems like there are at least some risks; and that feels annoying. Is there some better option for pushing the boundary on what Haskell would/would not consider pure? What would it mean to do so?
- In Haskell, we’ve arbitrarily decided that some things are pure, and some things aren’t.
…only if the difference between entities being free of, and having observable effects (e.g. sound from a speaker or text on a display) is considered “arbitrary”.
- For example, (paraphrasing something Conal once said to me, I think) even calling functions requires
IO
,
…was that meant to be “calling foreign functions requires IO”? Or are you referring to this comment from: Conal Elliott » Can functional programming be liberated from the von Neumann paradigm?
(Sadly, the very definition of “program” in Haskell requires that there be a
main
of typeIO ()
! […])
- Maybe it could be nice to have a way to optionally tag an explicitly impure thing as pure,
…with some inspiration from Rust, perhaps something like:
unsafe value (io_action)
which would be expanded to:
value = __impl_specific_runIO__ (io_action)
by the implementation.
- The best mechanism we have for this […]
Maybe not the “best”, but rather “only” …
- Is it actually suitable for this purpose? What are the risks of using it like so?
Those are different, but related topics - this thread is about specifying when it’s appropriate to directly use observable I/O effects. You can always start a new thread…
From my understanding, it seems like there are at least some risks; and that feels annoying.
Outside interactions often contain some level of risk:
When combining effects, their order is often highly important (the reader might want to try different combinations of the effcts “Open door”, and “Walk through door”, for example).
Magnus Carlsson and Thomas Hallgren (page 12 of 263).
It’s much easier to debug programs which take on this risk, than it is to delouse debug humans…
was that meant to be “calling foreign functions requires IO” ? Or are you referring to this comment …
Neither, I don’t think; what I mean is that literally any action at all requires physics; i.e. some interaction with the outside world.
To call a function the computer needs to perform actions in the real world; a disk needs to spin, a memory address must be read, etc, etc. We’ve decided that we don’t want to capture this level of detail in our type system (thankfully!), but what I mean is that it’s totally arbitrary. And, I think, this thread is about pushing the boundaries of that idea; i.e. how can we (lowly) programmers control that boundary? I.e. merely because I don’t have an SMT solver in Haskell I’m forced to think about that boundary; and that’s, conceptually, super frustrating! There’s no in principle reason for this abstraction to be broken at this point, in my view (and this is how I interpret the thread/question).
So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?
Like it or not, programming languages must have some way to access legacy code: considering how many new ones continue to appear year after year, it is utterly infeasible for us programmers to rewrite all that legacy code to suit each and every language. Maybe at the other end of this century (or millennium) matters will be easier, but until then we’ll just have to deal with it as best we can, with what we currently have.
So what’s the alternative - shoving the entire SMT-solver framework inside a Haskell implementation just to have it available as primitive, but otherwise regular Haskell functions?
Certainly not! That’s the entire point of the thread - there has to be a better alternative than that? Merely to call a function that we know is “pure” but just inconveniently happens to live just outside our abstraction boundary. The point is, how we can be bring it inside safely, without this annoying abstraction-breaking burden of a monad.
No, and I think @silky is defending exactly that and shares the same frustration as I: we need to come up with better answers about how to interact with the world. The blanket answer of “just push the IO
monad through the entire codebase” is not great for a number of reasons that were already outline higher up in this thread.
Well, discussing alternative approaches to I/O (or dealing with state) is probably best left to a new thread…are there any volunteers?
I think unsafePerformIO
might actually be a pretty good alternative, we just need to understand how to use it safely, which was what motivated my original question to @rae
I got a lot out of this thread and I’ll read and study all of the material, so thank you to everyone that gave me pointers and suggestions!
…is a draft version of e.g. a guide now publicly available?
The Haskell redis driver hedis makes a brilliant use of UnsafeInterleaveIO. I cannot imagine an STM implemention able to compete with it, performance-wise.
It’s not quite clear to me why that can’t just be done in IO
.
Let’s refactor @Nycticorax’s example - if I’m reading it correctly:
connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
where
go rest previous = do
-- lazy pattern match to actually delay the receiving
~(r, rest') <- unsafeInterleaveIO $
-- Force previous reply for correct order.
previous `seq` connGetReply conn rest
rs <- unsafeInterleaveIO (go rest' r)
return (r:rs)
connGetReply :: Connection -> S.ByteString -> IO (Reply, S.ByteString)
connGetReply conn@Conn{..} rest =
do scanResult <- Scanner.scanWith readMore reply rest
case scanResult of
Scanner.Fail{} -> CC.errConnClosed
Scanner.More{} -> error "Hedis: parseWith returned Partial"
Scanner.Done rest' r -> do
-- r is the same as 'head' of 'connPending'. Since we just
-- received r, we remove it from the pending list.
atomicModifyIORef' connPending $ \(_:rs) -> (rs, ())
-- We now expect one less reply from Redis. We don't count to
-- negative, which would otherwise occur during pubsub.
atomicModifyIORef' connPendingCnt $ \n -> (max 0 (n-1), ())
return (r, rest')
With the comments removed:
connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
where
go rest previous = do
~(r, rest') <- unsafeInterleaveIO $ previous `seq` connGetReply conn rest
rs <- unsafeInterleaveIO $ go rest' r
return (r:rs)
…connGetReplies
has an interesting resemblance to Data.Supply.newSupply
:
{-# INLINE newSupply #-}
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
where gen r = unsafeInterleaveIO
$ do v <- unsafeInterleaveIO (atomicModifyIORef r upd)
ls <- gen r
rs <- gen r
return (Node v ls rs)
upd a = let b = next a in seq b (b, a)
in particular the dual use of unsafeInterleaveIO
. Another similar definition is Control.Concurrent.getChanContents
:
getChanContents :: Chan a -> IO [a]
getChanContents ch
= unsafeInterleaveIO (do
x <- readChan ch
xs <- getChanContents ch
return (x:xs)
)
Using getChanContents
as the example, if unsafeInterleaveIO
was removed:
getChanContents ch = do
x <- readChan ch
xs <- getChanContents ch
return (x:xs)
…the result (x:xs)
would never be return
ed - it may as well be defined as:
getChanContents ch = sequence $ repeat $ readChan ch
At least one of two things would run out - memory (stack overflow or heap exhaustion) or patience!
Whether unsafeInterleaveIO
can be avoided by using e.g. interatees is left as an exercise…