[ANN] lazy-scope - ST-like IO Handle and lazy ByteString

lazy-scope library appeared as an attempt to improve lazy IO API from bytestring package:

  • hGetContents closes handle which was open by somebody else.
  • hGetContents closes handle only on EOF

E.g. git-phoenix does GIT objects recovery. Recovered compressed file usually has trailing trash bytes after archive ends. In such circumstance bracket finalizer should check every handle before closing.

lazy-scope library provides hGetContents with alternative semantic - it never close the handle! Handle and values, derived from it, have a type parameter which prevents accidental thunk escape beyond open handle scope. Solution is based on approach used in ST monad.

2 Likes

Nice! I appreciate this solution offers an answer other than “avoid it at all costs” to the lazy I/O problem.

2 Likes

Nice!

However, I would suggesting implementing this as a Bluefin library, rather than by hand (using s :> es => Eff es r instead of LazyT s m r). Why? A few reasons:

One reason is that Bluefin is a general theory of scopes, so some of the implementation is already written for you. Another reason is composability of your library: suppose I write

Scoped T1 s1 -> Scoped T2 s2 -> Lazy s1 (Lazy s2 m R1)

and you write

Scoped T1 s1 -> Scoped T2 s2 -> Lazy s2 (Lazy s1 m R2)

then how do we get (R1, R2)? I’m not sure we can. In Bluefin those two can be written as

(s1 :> es, s2 :> es) => Scoped T1 s1 -> Scoped T2 s2 -> Eff es R1
(s1 :> es, s2 :> es) => Scoped T1 s1 -> Scoped T2 s2 -> Eff es R2

so they’re trivially compatible. Of course, similar typeclass machinery can be layered on top of LazyT, but Bluefin’s already done it for you, so why bother?

Another reason is composibility with other effects. Bluefin provides Exception, State, Stream etc. for you for free, so they’ll be available when you’re processing your lazy ByteStrings.

Furthermore, lazy-scope typically requires MonadUnliftIO m, and Bluefin’s Eff is compatible with MonadUnliftIO, so you don’t lose anything.


A couple of notes on your design:

  • unScope is not safe because someone can write an NFData instance that violates its invariants.
import Control.DeepSeq
import Lazy.Scope
import Relude hiding (withFile)
import System.IO (IOMode (ReadMode))

data Bad a = MkBad a deriving Show

instance NFData (Bad a) where rnf _ = ()

-- MkBad "*** Exception: /dev/zero: hGetBufSome: illegal operation (handle is closed)
--
-- HasCallStack backtrace:
--  ioException, called at libraries/ghc-internal/src/GHC/Internal/IO/Handle/Internals.hs:347:20 in ghc-internal:GHC.Internal.IO.Handle.Internals
bad :: IO (Bad LByteString)
bad =
  withFile
    "/dev/zero"
    ReadMode
    (\h -> do bs <- hGetContents h ; unScope (fmap MkBad bs))
  • The NFData constraints on WithFile are redundant.
3 Likes

Thanks for review.
Bluefin must be a great library, but I’m a little afraid of an effect library.
I worked on a project using polysemy and I had terrible navigation experience in code.

Usually NFData instance is derived automatically by GHC, so bad instance is not likely.

1 Like

It’s not really a well-defined statement, but I want to say “Bluefin is orders of magnitude simpler than polysemy”. It really is as simple as simple as possible and no simpler: just arguments and a monad, tagged so that effects can’t escape scope, just like lazy-scope.

Instead of

Scoped s -> LazyT s m r

you would have

s :> es => Scoped s -> Eff es r

and then for free you would have the ability to use everything else that Bluefin offers, including the ability to use Bluefin state, exceptions and streams within withFile blocks (which is not possible with LazyT).

Have a look. I think you’ll like it! If you’d like my help porting lazy-scope to Bluefin then please open an issue.

eyes #29

1 Like

Sure, but you can still use Bluefin’s MonadUnliftIO in order to be compatible with lazy-scope and indeed in general you’ll need to use it to be compatible with the MonadUnliftIO ecosystem in general. It’s just that uses are not guaranteed to be safe.


I had another thought: LazyT has nothing to do with laziness. It is also a theory of scopes that applies more generally, so maybe ScopedT or WithScopeT would be a better name.

1 Like

What about exceptions during I/O? IIRC, lazy IO doesn’t throw exceptions, it just closes the handle silently when they happen. Does this library mimic that, or does it actually throw the exceptions?

1 Like

Good point! Handle scope is controlled by a bracket. An exception is just re-thrown.
So an exception thrown from lazy code smuggles thunk beyond the handle scope:

  withBinaryFile fn ReadMode (hGetContents >=> toLbs . fmap (throw . AssertionFailed . decodeUtf8))

SomeException does not require NFData instance from the wrapped exception and
best workaround so far is to catch, rnf showed value and rethrow.

1 Like

I don’t understand the design of this library, maybe I’m missing something.

If I understand this right, the problem is that if one reads from a handle with lazy IO, closes the handle, then consumes the read value, it will try to actually read from the handle and cause a runtime error. The culprit for this is lazy IO.

The solution in this library is to have a newtype over Handle which can perform lazy IO, but it cannot be manually closed. This makes the above error impossible, but it also removes some control: one can no longer partially read the input and manually close the handle. I can see that this trade-off might be desirable if one always wants to read the full input. But we get this from just the newtype.

What’s everything else for?

1 Like

app uses the library for handles which are not fully consumed because zlib archive usually ends way before EOF.

In that case you’re relying on GC to close the handle, which is “not generally recommended” (see System.IO.Handle).

But ignoring that, I still don’t see what everything else (scopes, NFData) is meant to solve.

Bracket is responsible for handle allocation and disposal.

hGetContents from Data.ByteString.Lazy relies on GC if exception happens or file content is not fully consumed, because function comment prohibits using hClose manually.

In the use case an average git commit file is about 500 bytes, but file size is at least 4kb and could reach 100mb. So at best only 15% is consumed. 400 000 files => too many open files.

Exact size to be consumed is not known beforehand:

Codec.Compression.Zlib.decompress :: LByteString -> LByteString
2 Likes

Ah I missed this, and I think I understand now.

To avoid the problem of lazy IO + manually closed handle,

  1. The library has its own withFile/withBinaryFile that creates a Handle newtype and closes it when done. The user cannot close this manually.
  2. The lazy bytestring could leak out of this withFile/withBinaryFile, so the result of reading the file is forced fully with NFData. Note that there are a couple of problems with this.
    1. It can adds unnecessary overhead if the result is big and already has no thunks.
    2. The lazy bytestring can still escape, if one puts it in an IORef for instance. This is true even if the IORef is created inside the bracket, since NFData for references don’t (and can’t) force the values inside.

Additionally there are two features completely unrelated to lazy IO, which added to my confusion.

  1. The Handle is “scoped” to make it unusable outside of its withFile/withBinaryFile.
  2. It uses MonadUnliftIO instead of IO.