Weird behavior with bracket and global IORef

brandonchinn178 · November 17, 2023, 1:39am

Yes, global IORefs = Bad.

But anyone know what might be happening here? Basically, we want to bracket setting/unsetting an IORef. But reading the IORef (also with unsafePerformIO) has different behavior when inlining the definition of bracket:

import Data.IORef
import Control.Exception
import System.IO.Unsafe

{-# NOINLINE isSetRef #-}
isSetRef :: IORef Bool
isSetRef = unsafePerformIO $ newIORef False

setMessage :: IO ()
setMessage =
  writeIORef isSetRef True

unsetMessage :: IO ()
unsetMessage =
  writeIORef isSetRef False

{-# INLINE message #-}
message :: String
message =
  unsafePerformIO $
    readIORef isSetRef >>= \case
      True -> pure "SET"
      False -> pure "UNSET"

main :: IO ()
main = do
  let before = setMessage
  let after () = unsetMessage
  let thing () = print message

  -- (1)
  let test1 = do
        bracket before after thing

  -- (2)
  let test2 = do
        mask $ \restore -> do
          a <- before
          r <- restore (thing a) `onException` after a
          _ <- after a
          return r

  -- (3)
  let test3 = do
        before
        thing ()
        after ()

  {- ******************************************************************************** -}

  putStrLn "Should print 'SET':"
  test1

  {- ******************************************************************************** -}

  putStrLn "Should print 'UNSET':"
  print message

With test1, it shows SET/SET, but test2 and test3 show the correct SET/UNSET. I’m not sure why test1 and test2 have different behavior. Any ideas?

atravers · November 17, 2023, 3:10am

With the addition of {-# LANGUAGE LambdaCase #-}, it seems to work here:

# ghci Main.hs
GHCi, version 9.4.4: https://www.haskell.org/ghc/  :? for help
[1 of 2] Compiling Main             ( Main.hs, interpreted )
Ok, one module loaded.
ghci> main
Should print 'SET':
"SET"
Should print 'UNSET':
"UNSET"
ghci>

brandonchinn178 · November 17, 2023, 3:15am

What if you compile and run? I’ve been doing stack run… I should repro it later with vanilla ghc

atravers · November 17, 2023, 3:17am

# ghc -o runme Main.hs
[1 of 2] Compiling Main             ( Main.hs, Main.o )
[2 of 2] Linking runme
# ./runme 
Should print 'SET':
"SET"
Should print 'UNSET':
"UNSET"
# ghc -V
The Glorious Glasgow Haskell Compilation System, version 9.4.4
#

brandonchinn178 · November 17, 2023, 6:31am

Ok interesting. I reran with vanilla GHC 9.4.5:

Test	Scenario	Result
test1: With `bracket`	`-O0`	SET/UNSET
	`-O1`	SET/SET
	`-O2`	SET/SET
	`-O0 -funfolding-use-threshold=16`	SET/UNSET
	`-O1 -funfolding-use-threshold=16`	SET/SET
	`-O2 -funfolding-use-threshold=16`	SET/SET
test2: Inline `bracket` definition	`-O0`	SET/UNSET
	`-O1`	SET/SET
	`-O2`	SET/SET
	`-O0 -funfolding-use-threshold=16`	SET/UNSET
	`-O1 -funfolding-use-threshold=16`	SET/UNSET
	`-O2 -funfolding-use-threshold=16`	SET/UNSET

So when inlining the definition of bracket with -funfolding-use-threshold, we get the expected behavior. But otherwise, -O0 is the only version that does the expected behavior.

My understanding was that the INLINE pragma would always force message to inline its unsafePerformIO action, but it doesn’t seem to be the case in 3 out of 4 scenarios. Am I missing something?

lorenzo · November 17, 2023, 6:39am

You probably need to use NOINLINE for each declaration using unsafePerformIO. That’s a well-known Haskell idiom when creating global mutable variables.

The way I understand it is that if you let the compiler inline the definition, it is also free to float your variable and not execute the code inside as you expect.

atravers · November 17, 2023, 7:16am

As I suspected:

# ghc -O2 -fno-full-laziness -o runme Main.hs
[1 of 2] Compiling Main             ( Main.hs, Main.o )
[2 of 2] Linking runme [Objects changed]
# ./runme
Should print 'SET':
"SET"
Should print 'UNSET':
"UNSET"
#

…it seems you’re fighting the “Glorious Simplifier” - it’s reverting the inlining.

brandonchinn178 · November 17, 2023, 7:19am

Right, so I have NOINLINE for the global ref variable. But I intentionally want INLINE for message, because for my usecase, I actually require that the compiler inlines and reruns the IO action.

The use case here is we have a global ref that production code should read in pure functions. In production, these values will never change, but in tests, we want to test with different values. So we need one test to set the ref, run a pure function, and check it returned one thing, and another test to set the ref differently, run the pure function again with the same arguments, and check it returned something else.

brandonchinn178 · November 17, 2023, 7:20am

Makes sense. My intuition is that the INLINE pragma should override the compiler though. Is that not correct?

atravers · November 17, 2023, 7:49am

Apparently not:

…that isn’t a “pure function” - it’s a “wolf in a fleecy overcoat”.

Welcome to the “flip-side” of trying to “smuggle effects” - the implementation assumes an effect-free type signature means a proper effect-free mathematical definition, and will treat the definition accordingly.

If the situation is that bad…you could try to salvage it using e.g:

value-supply: A library for generating values without having to thread state.

…to avoid (most of) the ugliness of “monadising” by at least providing that “pure function” with a different argument each time it’s called. But even this approach has it’s limits (in Haskell 2010 anyway).

For a working implementation of the technique, see compiler/GHC/Types/Unique/Supply.hs · master · Glasgow Haskell Compiler / GHC · GitLab .

jaror · November 17, 2023, 9:23am

This is not specific to bracket or global IORefs (which can be used safely IF you remember to use noinline and make sure they’re not polymorphic). It is really because of the use of unsafePerformIO in message.

Even if GHC decides to inline it, it may still decide to do common subexpression elimination later (changing two unsafePerformIO calls into one) or let-floating (moving an unsafePerformIO out of a function that is called multiple times). As you’ve seen there are so many details to GHC’s optimizations that trying to get consistent behavior out of this will really be an uphill battle.

Also, inline pragmas only work when optimizations are enabled (at least across modules).

I’d suggest just using IO.

jaror · November 17, 2023, 3:50pm

Actually, rereading your comment about using a different value for tests, I think another good option is to use implicit parameters: 6.11.5. Implicit parameters — Glasgow Haskell Compiler 9.8.1 User's Guide

brandonchinn178 · November 17, 2023, 4:05pm

Theoretically, we could also always use -O0 for tests, right?

But yes, I did think about implicit params last night. The downside is needing to add a constraint up the whole call stack if we add a usage to a nested pure function. But we’ll consider it

jaror · November 17, 2023, 4:12pm

A problem with that is that -O0 only works together with that INLINE pragma if your binding is in the same module as all use sites.

atravers · November 17, 2023, 9:05pm

In the “types as documentation” vernacular: the documentation was inaccurate, all the way along that call stack. But the difference here is that your Haskell implementation also relies on this particular form of documentation, and it cannot “read between the lines”, hence the inconsistent behaviour.

So all that documentation now has to be updated.

brandonchinn178 · November 17, 2023, 9:53pm

Understood. In production, this is fine. The unsafePerformIO step should be considered constants. The only problem is tests, where we want to toggle the constants and test their behavior

jaror · November 17, 2023, 10:05pm

Another option you have is to put all such constants in one module and compile several version to shared libraries and swap them out at test-time. That does sound pretty difficult. I wish backpack supported a workflow like that.

You can already use backpack for something similar:

library business-indef
    build-depends:    base
    signatures:       Constants
    exposed-modules:  Business
    hs-source-dirs:   business-indef

library constants-prod
    build-depends:    base
    exposed-modules:  Constants
    hs-source-dirs:   constants-prod

library constants-test1
    build-depends:      base
    exposed-modules:    Constants
    reexported-modules: Constants as Constants1
    hs-source-dirs:     constants-test1

library constants-test2
    build-depends:      base
    exposed-modules:    Constants
    reexported-modules: Constants as Constants2
    hs-source-dirs:     constants-test2

executable prod
    main-is:          Main.hs
    build-depends:    base, business-indef, constants-prod
    hs-source-dirs:   prod

executable test
    main-is:          Main.hs
    build-depends:    base, business-indef, constants-test1, constants-test2
    mixins:           business-indef (Business as Business1)
                        requires (Constants as Constants1),
                      business-indef (Business as Business2)
                        requires (Constants as Constants2)
    hs-source-dirs:   test

-- business-indef/Constants.hsig:
signature Constants where

x :: Int

-- business-indef/Business.hs:
module Business where

import Constants

business = print x

-- constants-prod/Constants.hs:
module Constants where

x :: Int
x = 1

-- constants-test1/Constants.hs:
module Constants where

x :: Int
x = 2

-- constants-test2/Constants.hs:
module Constants where

x :: Int
x = 3

-- prod/Main.hs:
module Main where

import Business

main :: IO ()
main = business

-- test/Main.hs:
module Main where

import qualified Business1
import qualified Business2

main :: IO ()
main = do
  Business1.business
  Business2.business

$ cabal run prod
1
$ cabal run test
2
3

But this will recompile your business code for each set of constants that you have. You can speed it up by using -O0 for the test builds.

Ignoring the compile time problems, it seems like a very verbose approach with this simple example. However, with this approach you never have to change the functions in your business code. The only thing you have to do with your business code is add one import to each module that uses one of the constants.

sgraf · November 18, 2023, 1:29pm

I’d abstract message over something that is used insid the unsafePerformIO block, e.g.,

message :: IORef Bool -> String
message ref = unsafePerformIO $ readIORef re >>= ...
{-# NOINLINE message #-}

Of course the problem at use sites is then that we will likely float message isSetRef to top-level and we end up where we began, so we’d need the hypothetical noupdate:

main = do
  let before = setMessage
  let after () = unsetMessage
  let thing () = print (noupdate $ \hide -> message (hide isSetRef))

jaror · November 18, 2023, 2:50pm

Doesn’t that then still require passing the isSetRef everywhere? Then you might as well just pass the value directly without IORef.

sgraf · November 18, 2023, 3:02pm

isSetRef is a global variable, so no need for implicit params