TL;DR: It seems we can build a pure data structure and API that can handle cyclic equations, but I am not entirely sure, so please try to break it.
Lazyness does not do magic
A while ago I head a job where I was writing Ocaml, which isn’t a pure language, so you can pull some neat tricks using mutable references, without making your whole code monadic.
In particular, I was writing a compiler pass that tries to determine which expressions are statically known in some sense. I want to annotate each subexpression and variable with a boolean flag, and (more or less) a value is statically known if all its subexpressions are. This would be a very simple programing task in a pure language, simply using True
, False
and and :: [Bool] -> Bool
if it were not for recursion: Variables can have right-hand side that mention that variable. Doing this naively with just pure data will obviously not work: Code like
ghci> let x = and [y]; y = and [x, False] in x
^CInterrupted.
does not terminate.
Monadic code is easy
In the Ocaml world I solved this by introducing a reference to a “monotonously mutable bool”, which can either be surely False, surely True, or we don’t know yet, and if we don’t know yet, we keep track of which other such “monotonously mutable bool” values will have to be made false if this one turns false.
If I port this code to Haskell, it might look like this (You might have to scroll the listing to see it all):
{-# LANGUAGE LambdaCase #-}
module Experiment where
import Data.IORef
import System.IO.Unsafe
-- The imperative part
data PotentiallyBool
= SurelyTrue -- this is mostly an optimization
| SurelyFalse
| MaybeFalse [LBool] -- stores pointers to dependent variables
type LBool = IORef PotentiallyBool
surelyTrue :: IO LBool
surelyTrue = newIORef SurelyTrue
surelyFalse :: IO LBool
surelyFalse = newIORef SurelyFalse
maybeFalse :: IO LBool
maybeFalse = newIORef (MaybeFalse [])
setFalse :: LBool -> IO ()
setFalse l = readIORef l >>= \case
SurelyTrue -> error "setFalse: Argument already surely True"
SurelyFalse -> pure ()
MaybeFalse ls -> do
-- Do this first, this breaks cycles
writeIORef l SurelyFalse
-- Now notify the dependencies
mapM_ setFalse ls
implies :: LBool -> LBool -> IO ()
implies l1 l2 = readIORef l2 >>= \case
SurelyTrue -> pure ()
SurelyFalse -> setFalse l1
MaybeFalse ls -> writeIORef l2 (MaybeFalse (l1 : ls))
-- this is only safe if it is run after all “requiredFor” have run
canBeTrue :: LBool -> IO Bool
canBeTrue l = readIORef l >>= \case
SurelyFalse -> pure False
_ -> pure True
This way we can write monadic code that creates these mutable boolean variables, registers the implications between them, and eventually reads off the result:
ghci> do { x <- maybeFalse; y <- maybeFalse; x `implies ` y; y `implies` x; canBeTrue x }
True
ghci> do { x <- maybeFalse; y <- maybeFalse; x `implies ` y; y `implies` x; f <- surelyFalse; y `implies` f; canBeTrue x }
False
Note that it is crucial to write SurelyFalse
in setFalse
before notifying all the dependencies.
Can we have it pure?
So far so good. But it is really quite annoying having to turn a nice elegant pure traversal of your AST into a monadic beast just for that. This made me wonder: Can we wrap this IO-infested API into an elegant and pure API, providing the combinators mentioned above (true, false and and
), while still being safe?
It seems it is possible!
I first need a helper data structure to deal with thunks of code that may be recursive:
type Thunk = IORef (Either (IO [RunningThunk]) RunningThunk)
type RunningThunk = MVar ()
thunk :: IO [RunningThunk] -> IO Thunk
thunk act = newIORef (Left act)
doneThunk :: IO Thunk
doneThunk = do
done <- newMVar ()
newIORef (Right done)
kick :: Thunk -> IO [RunningThunk]
kick d = readIORef d >>= \case
Left act -> do
inProgress <- newEmptyMVar
writeIORef d (Right inProgress) -- so that it is kicked at most once
mvs <- act -- Perform the action
putMVar inProgress () -- Mark this as done
pure mvs
Right mv -> pure [mv]
force :: Thunk -> IO ()
force t = do
mvs <- kick t
mapM_ readMVar mvs
The idea is that
-
thunk act
does not evaluateact
yet. This is important to get the laziness we need. -
force (thunk act)
then performs the IO actionact
. -
act
can require other thunks to be peformed. In order to break cycles, it is expected to justkick
, notforce
, the other thunk, and just thatRunningThunk
- This way, the top-level
force
can first kick off all execution, and then wait for all to happen.
It is crucial that force
marks the thunk as in progress before running the thing inside. This way I can have such thunks refer to each other and still be able to resolve them:
ghci> :{
mdo x <- thunk (putStrLn "Forced x" >> kick y);
y <- thunk (putStrLn "Forced y" >> kick x)
putStrLn "Starting now…"
force x
:}
Starting now…
Forced x
Forced y
With this, and of course some unsafePerformIO
, we can now attempt to create an API around a pure value that allows us to tie the knot and still get proper answers.
data PBool = PBool LBool Thunk
The idea is that the LBool
in the PBool
is available right away to be used to build other PBool
values, and the Thunk
defers all the work still possibly pending before the PBool
is “done”. So we get to this implementation for the API.
pTrue :: PBool
pTrue = unsafePerformIO $ do
l <- surelyTrue
t <- doneThunk
pure $ PBool l t
pFalse :: PBool
pFalse = unsafePerformIO $ do
l <- surelyFalse
t <- doneThunk
pure $ PBool l t
pand :: [PBool] -> PBool
pand [] = pTrue
pand xs = unsafePerformIO $ do
l <- maybeFalse
d <- thunk $ concat <$> mapM (\(PBool b t) -> l `implies` b >> kick t) xs
pure $ PBool l d
unPBool :: PBool -> Bool
unPBool (PBool l d) = unsafePerformIO $ do
force d
canBeTrue l
and indeed, it seems to work: I can build my booleans using pTrue
, pFalse
and pAnd
, and even if I have recursion (through pand
), it still finds the correct answer (defaulting to True
if possible):
ghci> let x = pand [y]; y = pand [x, pTrue] in unPBool x
True
ghci> let x = pand [y]; y = pand [x, pFalse] in unPBool x
False
ghci> let x = pand [y]; y = pand [x] in unPBool x
True
Of course this only works if we have data-dependencies, not if we have control dependencies, or else are too strict:
ghci> let x = pand [y]; y = pand [x, if unPBool x then pTrue else pFalse] in unPBool x
^CInterrupted.
So it seems if we just expose the PBool API (without the PBool
constructor), we have built a nice pure API for booleans that still works even if we use it recursively. In particular, were I to re-implement my compiler analysis in Haskell, I could implement this pass with pure non-monadic code.
Questions and observations
-
Is the API really sound, in the sense that if
x = pand ys
, thenunPBool x == all unPBool ys
, no matter how I build these PBools and where I may forceunPBool
? -
Does it break desirable properties of our nice pure language, like Referential Transparency?
-
Does it break in the presence of threads and concurrency?
Certainly all the code using
readIORef
andputIORef
needs to be made atomic. But I assume even then the above code is broken if two threads race toforce
andkick
a network of Thunks. Can this be fixed? -
I expect this can be extended to much more general data flow problems than just booleans and implication. Essentially anything of the pattern “define a bunch of objects – note how they are related – solve – read results”, including constraint-based type checking.
The question will be: Under which circumstances is the result still pure? For example, is likely necessary that it it does not matter in which order the relations between the objects are registered, and in which order they are resolved.
-
How could I prove that the API is indeed safe and pure? Reminds me a bit of the question of how to prove that
runST
is pure, answered in the POPL18 paper by Amin Timany et al. But that work doesn’t deal with concurrency yet. -
Whose wheel did I just re-invent?