…is a draft version of e.g. a guide now publicly available?
The Haskell Redis driver hedis makes brilliant use of unsafeInterleaveIO. I cannot imagine an STM implementation able to compete with it, performance-wise.
It’s not quite clear to me why that can’t just be done in IO.
Let’s refactor @Nycticorax’s example - if I’m reading it correctly:
connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      -- lazy pattern match to actually delay the receiving
      ~(r, rest') <- unsafeInterleaveIO $
        -- Force previous reply for correct order.
        previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO (go rest' r)
      return (r:rs)
connGetReply :: Connection -> S.ByteString -> IO (Reply, S.ByteString)
connGetReply conn@Conn{..} rest = do
  scanResult <- Scanner.scanWith readMore reply rest
  case scanResult of
    Scanner.Fail{}       -> CC.errConnClosed
    Scanner.More{}       -> error "Hedis: parseWith returned Partial"
    Scanner.Done rest' r -> do
      -- r is the same as 'head' of 'connPending'. Since we just
      -- received r, we remove it from the pending list.
      atomicModifyIORef' connPending $ \(_:rs) -> (rs, ())
      -- We now expect one less reply from Redis. We don't count to
      -- negative, which would otherwise occur during pubsub.
      atomicModifyIORef' connPendingCnt $ \n -> (max 0 (n-1), ())
      return (r, rest')
With the comments removed:
connGetReplies :: Connection -> IO [Reply]
connGetReplies conn = go S.empty (SingleLine "previous of first")
  where
    go rest previous = do
      ~(r, rest') <- unsafeInterleaveIO $ previous `seq` connGetReply conn rest
      rs <- unsafeInterleaveIO $ go rest' r
      return (r:rs)
…connGetReplies has an interesting resemblance to Data.Supply.newSupply:
{-# INLINE newSupply #-}
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
  where gen r = unsafeInterleaveIO
              $ do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                   ls <- gen r
                   rs <- gen r
                   return (Node v ls rs)
        upd a = let b = next a in seq b (b, a)
…in particular the dual use of unsafeInterleaveIO. Another similar definition is Control.Concurrent.getChanContents:
getChanContents :: Chan a -> IO [a]
getChanContents ch
  = unsafeInterleaveIO (do
      x  <- readChan ch
      xs <- getChanContents ch
      return (x:xs)
    )
Using getChanContents as the example, if unsafeInterleaveIO was removed:
getChanContents ch = do
  x  <- readChan ch
  xs <- getChanContents ch
  return (x:xs)
…the result (x:xs) would never be returned - it may as well be defined as:
getChanContents ch = sequence $ repeat $ readChan ch
At least one of two things would run out - memory (stack overflow or heap exhaustion) or patience!
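To see the difference concretely, here is a minimal sketch (the endless producer and the example values are mine, purely for illustration): with unsafeInterleaveIO in place, getChanContents returns immediately, and only as many readChan operations are performed as elements are actually demanded.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, writeChan, getChanContents)

main :: IO ()
main = do
  ch <- newChan
  _  <- forkIO (mapM_ (writeChan ch) [1 :: Int ..])  -- endless producer thread
  xs <- getChanContents ch   -- returns immediately, thanks to unsafeInterleaveIO
  print (take 5 xs)          -- forces only the first five readChan operations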
Whether unsafeInterleaveIO can be avoided by using e.g. iteratees is left as an exercise…
Sorry @atravers I didn’t quite get the intent; so you agree that it’s a useful/legit use case in this context, don’t you?
I agree with this, but I’m not certain that it’s enough. The problem here is that words like “side effect” imply some kind of abstraction boundary between things that the program should care about, and things that it should not. Without that abstraction boundary, nothing can ever satisfy the conditions to be safe.
In this example, invoking an SMT solver can definitely have side effects if other parts of the program look at the list of running processes to decide whether the SMT solver is running. Does that matter? Most of the time, probably not… but it certainly does matter if you’re using the SMT solver as part of a program that enforces a policy on how many processes a user is allowed to run!
For this reason, I reach the conclusion that Richard’s hypothetical ProofOfSafety is fundamentally impossible to define as part of a general-purpose library, because it relies on decisions about how to draw the right abstraction boundary, and that decision varies from program to program.
Note that the side effects must be observable from pure Haskell code. You cannot access the list of running processes that way.
Inside IO you can monitor everything that happens on the computer, so otherwise everything would be a side effect. Even 1 + 1 would have the side effect of taking time to compute and running instructions on your CPU, which you can observe inside IO.
Huh. So even the oft-quoted launching of the missiles, the most famous example of a side effect, isn’t a side effect in your view? This seems far too restrictive a notion of side effects. Now basically anything, no matter how unsafe, qualifies as technically “safe” because you cannot observe it from pure code.
Yes, this is basically what I said. If you fail to draw an abstraction boundary between internals and exposed behavior, then there is no such thing as safety. In practice, we don’t define that boundary formally. We just all sort of know that the time it takes to compute a pure expression isn’t thought of as a side effect, whereas launching missiles typically (but I guess not always!) is. If you want to prove anything, you’d need to be quite explicit about the boundary.
I guess so. I would consider anything that can’t come back to influence the actual evaluation to not really be a side effect (that we should care about).
Consider (what feels to me like) the dual: reading something from the “outside world” that is a static value, i.e. not influenced by anything else in the environment. I believe that is a common use case of unsafePerformIO in practice, e.g. Control.Concurrent.rtsSupportsBoundThreads (this actually uses the FFI and not unsafePerformIO directly).
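In the same spirit, a minimal sketch of that pattern (the environment variable name is hypothetical, and its value is assumed not to change during the run):

import System.Environment (lookupEnv)
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE buildFlavour #-}
-- Read once from the "outside world", then treated as a constant.
buildFlavour :: Maybe String
buildFlavour = unsafePerformIO (lookupEnv "MY_APP_FLAVOUR")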
I was just providing some examples of where “regular” monadic I/O (which is strict) is too restrictive - then again, perhaps I didn’t quite get the intent of @tomjaguarpaw’s remark…
so you agree that it’s a useful/legit use case in this context, don’t you?
In this context (and in the absence of a practical alternative): yes - as much as lazy I/O is despised, sometimes you need that laziness. Yes, such a mechanism has its problems, but then so does regular I/O, e.g.:
import System.Process (getCurrentPid, callCommand)

main = do pid <- getCurrentPid
          callCommand $ "kill " ++ show (toInteger pid)
Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots.
So far, the Universe is winning.
…unfortunately our reality isn’t quite so “neat and tidy” - a Roman candle could go off-course and land where the computer running the Haskell program in control of the fireworks is located. Even reading in a static value can have an effect e.g. changing the access time of a file.
To me, this topic is just another aspect of I/O, and it continues to confound:
- otherwise there wouldn’t have been calls for more "semantically simple" alternatives
- …and this thread wouldn’t have attracted so much attention

:-)
Getting back on-topic: a guide for the safer use of unsafe entities (even if it is a draft version) is probably the best that we can do for now, hence the inquiry. It would surely be a small improvement on the current situation, where such advice, if it exists, is scattered throughout cyberspace…
Nice, I completely agree.
tl;dr: Basically, here they are making a cheap, thread-safe “queue” of IO Reply actions without the need for a “canonical” queue-like data structure (Chan, TQueue, …). I have not benchmarked this, but I can only assume it blows these alternatives out of the water.
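For comparison, here is a minimal sketch of what a “canonical” alternative might look like (streamViaChan is a hypothetical name, not hedis code): a dedicated thread pushes each received value into a Chan, paying for an extra thread plus a queue structure that the lazy-list trick avoids (and note that getChanContents itself relies on unsafeInterleaveIO, as shown above).

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, writeChan, getChanContents)
import Control.Monad (forever)

-- Hypothetical sketch: turn a strict "receive one value" action into a
-- lazy stream by going through a Chan and a reader thread.
streamViaChan :: IO a -> IO [a]
streamViaChan recv = do
  ch <- newChan
  _  <- forkIO (forever (recv >>= writeChan ch))
  getChanContents ch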
…unfortunately our reality isn’t quite so “neat and tidy” - a Roman candle could go off-course and land where the computer running the Haskell program in control of the fireworks is located. Even reading in a static value can have an effect e.g. changing the access time of a file.
Luckily, you can’t observe roman candles destroying your computer or file systems changing access times from within pure code.
[…] you can’t observe roman candles destroying your computer or file systems changing access times from within pure code.
[…] because you cannot observe it from pure code.
Since pure code also cannot observe the side effects of their use, wouldn’t that imply unsafe I/O entities can be used anywhere?
No, you can do some things that are observable, e.g.:
{-# NOINLINE ref #-}
ref = unsafePerformIO (newIORef True)        -- a single, hidden, global mutable cell
set x = unsafePerformIO (writeIORef ref x)
get = unsafePerformIO (readIORef ref)
On their own get and set are relatively harmless. And the ref is even a common practice. But having the ability to both set and get a mutable value means that you now have the ability to observe the effects.
Similarly, you could imagine having an SMT solver function and having a function that gets the list of running processes, both using unsafePerformIO. The latter can observe the effects of the former, so this is unsafe. Also, in this situation the list of running processes probably changes all the time, so that is a problem all on its own.
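As a hedged sketch of that scenario (the function names are hypothetical, and it assumes z3 and ps executables are on the PATH):

import System.IO.Unsafe (unsafePerformIO)
import System.Process (readProcess)

-- Both of these look like pure functions to the rest of the program...
solveWithSMT :: String -> String
solveWithSMT query = unsafePerformIO $
  readProcess "z3" ["-in"] query     -- spawns an external solver process

-- ...but this one can observe a side effect of the one above: whether a
-- "z3" process happens to be running at the moment it is evaluated.
runningProcesses :: () -> [String]
runningProcesses _ = unsafePerformIO $
  lines <$> readProcess "ps" ["-e"] ""   -- dummy argument stops the result being cached as a constant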
On their own get and set are relatively harmless. […]
…harmless or useless?
- ref = return ()
  set _ = return ()
- ref = return ()
  get = return True
Similarly, you could imagine having an SMT solver function and having a function that gets the list of running processes, both using unsafePerformIO. The latter can observe the effects of the former, so this is unsafe. […]
Consider the situation where that SMT solver and that process monitor live in two separate third-party libraries: taken on its own, each library’s effects are “benign”. But what if you need both of those libraries in your new program?
Now imagine if every library could potentially have its own "benign" use of an unsafe I/O entity…
A strange game. The only winning move is not to play.

How about a nice game of chess?
True, to get modularity, modules have to communicate which assumptions they make about the use of unsafePerformIO in rest of the program.
Then it makes sense to make an explicit model of the larger system and to use that to specify which things you consider side effects. For example, do you consider creating a new process a side effect or not? In that sense @cdsmith’s suggestion above seems more principled and modular.
Now I’m also wondering, a use of unsafePerformIO that is often considered safe is if all the side effects are encapsulated. For example, if you deallocate or freeze all mutable arrays before leaving the scope of unsafePerformIO. Or in the process example, if the process finishes before leaving the scope of unsafePerformIO.
One problem with that is that the created process might still be observable if other code is running concurrently.
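For the array case, a minimal sketch of that “encapsulated effects” pattern (counts is a hypothetical function; it assumes every input value lies in 0..9):

import Data.Array (Array)
import Data.Array.IO (IOUArray, newArray, readArray, writeArray, freeze)
import System.IO.Unsafe (unsafePerformIO)

-- The mutable array never escapes: it is frozen into an immutable Array
-- before we leave the scope of unsafePerformIO.
counts :: [Int] -> Array Int Int
counts xs = unsafePerformIO $ do
  arr <- newArray (0, 9) 0 :: IO (IOUArray Int Int)
  mapM_ (\i -> readArray arr i >>= writeArray arr i . (+1)) xs
  freeze arr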
True, to get modularity, modules have to communicate which assumptions they make about the use of unsafePerformIO in [the] rest of the program.
Unfortunately, those assumptions would need to be pessimistic by default because of transitivity: a definition in one module can indirectly depend on a definition in another which uses unsafePerformIO. For most codebases, that could end up “tainting” many modules, quite possibly the majority of them.
It would then probably be easier to just assume the worst - that any module can be tainted - and not bother with those communications.
Now I’m also wondering, a use of unsafePerformIO that is often considered safe is if all the side effects are encapsulated. […]
That resembles the “safe-usage” requirements for unsafeLocalState:
unsafeLocalState :: IO a → a
Sometimes an external entity is a pure function, except that it passes arguments and/or results via pointers. The function unsafeLocalState permits the packaging of such entities as pure functions.

The only IO operations allowed in the IO action passed to unsafeLocalState are

(a) local allocation (alloca, allocaBytes and derived operations such as withArray and withCString), and

(b) pointer operations (Foreign.Storable and Foreign.Ptr) on the pointers to local storage, and

(c) foreign functions whose only observable effect is to read and/or write the locally allocated memory.

Passing an IO operation that does not obey these rules results in undefined behaviour.

It is expected that this operation will be replaced in a future revision of Haskell.
The Haskell 2010 Report (page 269 of 329).
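For illustration, a use that follows rules (a)-(c) might look like the following sketch (the foreign function my_gcd is hypothetical; this would only link if you actually supplied it in C):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign (Ptr, alloca, peek)
import Foreign.C.Types (CInt(..))
import Foreign.Marshal.Unsafe (unsafeLocalState)

-- Hypothetical C function that writes its result through a pointer.
foreign import ccall unsafe "my_gcd"
  c_gcd :: CInt -> CInt -> Ptr CInt -> IO ()

gcd' :: CInt -> CInt -> CInt
gcd' a b = unsafeLocalState $
  alloca $ \p -> do   -- (a) local allocation
    c_gcd a b p       -- (c) foreign call that only writes to the local pointer
    peek p            -- (b) pointer operation on the local storage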
…perhaps this is as close as anyone will get for now to a guide for the safe use of unsafePerformIO.
The question “when do we know that unsafePerformIO is safe” can be specialized to “why do we believe that runST is safe?” (which is one possible use of unsafePerformIO).
I asked that before, and shortly after an answer was given by Amin Timany et al in their POPL 2018 paper “A Logical Relation for Monadic Encapsulation of State”.
So that establishes one possible way of answering that question. I am not sure how easy it is to extend to other uses of unsafePerformIO (e.g. the tricks I am playing here), or to safety in the presence of concurrency.
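To make that “one possible use” concrete, a minimal sketch (runSTviaIO is a hypothetical name; this is not how GHC actually defines runST):

{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, stToIO)
import System.IO.Unsafe (unsafePerformIO)

-- runST-like behaviour obtained from unsafePerformIO: the rank-2 type is
-- what keeps the mutable state from escaping.
runSTviaIO :: (forall s. ST s a) -> a
runSTviaIO st = unsafePerformIO (stToIO st)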
The question “when do we know that unsafePerformIO is safe” can be specialized to “why do we believe that runST is safe?” (which is one possible use of unsafePerformIO).
I am not sure how easy it is to extend to other uses of unsafePerformIO […] or to safety in the presence of concurrency.
The problem is observable effects:
- ST actions like readSTRef and writeSTRef only manipulate the contents of the program’s memory - usually only made visible with a debugger.
- however, IO actions like getChar and putChar work with I/O devices which can potentially manipulate anything, including the program itself.
As for safety with concurrency, everyone’s favourite “type-smith” has already shown how unsafeInterleaveST (and consequently unsafeInterleaveIO) breaks regular Haskell semantics, even though their results are monadic:
unsafeInterleaveST :: ST s a -> ST s a
unsafeInterleaveIO :: IO a -> IO a
as opposed to:
unsafePerformST :: ST s a -> a
unsafePerformIO :: IO a -> a
If only there were a denotative way to write system software…