Using the Garbage Collector to free user resources

Is it possible to use the RTS garbage collector to free resources allocated in user-land when there are no longer references to it?

It seemed as if addFinalizer from System.Mem.Weak could be what I was looking for but I couldn’t figure it out.

Something along the lines of

allocateResource :: IO Resource
allocateResource = do
   res <- Internal.allocateResource
   addFinalizer res (Internal.freeResource res)
   pure res

Such that the code calling this function would receive a Resource tracked by the GC and when there were no more uses of that resource it would be freed with Internal.freeResource

Does anyone have thoughts on this? :slight_smile:

2 Likes

That looks precisely how I expect it to work :slight_smile:

2 Likes

Interesting, I did so with a trace on the free resource but it seemed to never fire. Are there any obvious pitfalls?

Perhaps setting up a minimal example is a good idea.

So in theory I should be able to use Haskell’s garbage collector to manage and free user land resources ?

Maybe the resource wasn’t actually freed, i.e. still referenced somewhere…

Ah, you are referring to res from res’s finalizer. Not sure if that will work. I’d suggest to try something like

data TrackedResource = TrackedResource Resource -- find a better name…
allocateResource :: IO TrackedResource
allocateResource = do
   res <- Internal.allocateResource
   let tres = TrackedResource res
   addFinalizer tres (Internal.freeResource res)
   pure tres

I am not a complete expert here, and maybe you need to deal with “weak pointers” here, see

https://hackage.haskell.org/package/base-4.17.0.0/docs/System-Mem-Weak.html#v:addFinalizer

for more documentation.

3 Likes

You need to watch out what type of res you attach a weak finalizer to! Caveats:

  • GHC optimization may unbox/rebox datatypes. So the same res value may be backed by a different closure at runtime, and the finalizer may fire prematurely.
  • GHC parallel garbage collector may create multiple copies of the same closure to avoid CAS overhead (see rts/sm/Evac.c · master · Glasgow Haskell Compiler / GHC · GitLab, some closures are copied using copy_tag_nolock instead of copy). So again, same Haskell value may be different closure some time later!
  • If you’re using nonmoving garbage collector, there were some recently merged fixes that enable weak finalizers to fire, but the fixes will only be in 9.6 IIRC

In general, you should not attach a weak finalizer to any lifted value given all the footguns. The standard libraries do use weak finalizers, but the finalizers are attached to the underlying unlifted closure like MutVar# and etc, those closures will stay the same throughout that Haskell value’s lifetime.

For your use case, I suggest adding a dummy IORef to your resource datatype, and use mkWeakIORef to add finalizer to it, instead of adding finalizer to the resource type itself. It’s fine to throw away the Weak, the finalizer will be effective even if Weak# is unreachable.

EDIT: I played a bit further with the idea of a lightweight Ticket type that one can add to their resource datatype for robust finalization logic, the implementation has less overhead than using IORef: StgTicket.cmm · GitHub

5 Likes

Relying on the GC for releasing resources might be ok as a last-ditch measure, but I’m uneasy with it being the main way of releasing them.

What if one runs a program in a machine with finite Resources but infinite memory?

If all those Resourses are still being actively used elsewhere but more are needed, then presumably the program fails in some way, by throwing an exception or raising an error.

On the contrary, I think it’s essential that the GC steps up to fill this role, otherwise we’ll be forever relying on things like the bracket pattern, a cost we would never pay tolerate for managing memory. Why should we tolerate it for managing other resources?

1 Like

You can add System.Mem.performMajorGC to flush all (?) the unused resources.

Exactly - just imagine having to deal with the likes of:

  • freeSTRef :: STRef s a -> ST s ()
  • closeIORef :: IORef a -> IO ()
  • endMVar :: MVar a -> IO ()
  • discardTVar :: TVar -> STM ()

…and the rest of them: it would be a (bad) joke! So why, oh why are annoyances like:

  • hClose :: Handle -> IO ()
  • finalizeForeignPtr :: ForeignPtr a -> IO ()

allowed to “hang around” like a bad smell?

1 Like

I actually don’t know if IORefs, MVars and TVars depend on OS-level synchronization primitives which are limited in ways other than memory (like, maximum number of OS-allowed semaphores or whatever). If they do, they would be a good analogy for an hypothetical GC-driven hClose, yes.

I guess that what I don’t like about GC-driven release of resources which are limited in ways other than memory is that:

  • It ties unrelated things together.
  • It lacks promptness. And solving the lack of promptness by manually triggering GCs seems like a blunt instrument which takes us back to manually releasing resources.

  • Using GC - it lacks promptness;
  • Using dismissives like hClose - to get incrementality along with prompt resource release requires pipes, conduits, tubes, and other contortions.

Perhaps we should consider dispensing with the GC-collected heap altogether…

This question in the CS Stack Exchange seems related to the theme of this thread: Why does garbage collection extend only to memory and not other resource types?

In particular, I find this answer compelling. Memory is more “fungible” than, say, file handles, and that makes it more amenable for GC.

Perhaps, but it seems like a weakness in the OS if it only grants access to resources on an exclusive basis. In any case, I can see it’s an uphill struggle to bring automatic management of other OS resources onto par with automatic management with memory; I just wish someone would make that struggle for the benefit of all!

…with the old Lazy ML/Haskell B. implementation from Chalmers being an early example: see filegcfinish() in LML/runtime/file.c (line 296) from the sources I refer to in this post.

Languages like C# encourage use of the Dispose pattern to deal with optional early disposal.

E.g., the finaliser will call the protected Dispose method, which must immediately exit if it had been called before.

It is good practice to call Dispose in a using statement

using(var socket = new Socket())
{ 
    ... 
}

Which (similar to Java’s resourced try(...) { } blocks) desugars to

{
    var socket = new Socket();
    try
    {
         ...
    } 
    finally
    {
         socket.Dispose();
    }
}

In Haskell, we’d use bracket to do the same.

The appeal of the Dispose pattern is that it works even if the programmer forgot or can’t use the using block (perhaps its lifetime is non-lexically scoped) while it centralises the cleanup logic in one place.

Perhaps it makes sense to have a library like

class Disposable args d where
  create :: args -> IO d
  dispose :: d -> IO ()

wrap :: (d -> IO ()) -> d -> IO ()
wrap = ... -- implement Dispose pattern

scoped :: Disposable d => args -> (d -> IO r) -> IO r
scoped args run = bracket (create args) (wrap dispose) run

unscoped :: Disposable d => args -> IO d
unscoped args = mask $ \restore ->
  d <- create args
  ... add finaliser for `wrap dispose d` ...
  return d

Note that it is always possible to call dispose on a d in a scoped block or on a d returned by unscoped. It is the job of wrap to implemnt the Dispose pattern (e.g., needs a Bool tracking whether disposal happened already. I just realised that it would need to return IO (d -> IO ()), but I hope you get the idea). Perhaps that’s a useful library; at any rate I haven’t needed it so far.

Do also note that this makes me aware of a potential issue of allocateResource in the OP: It doesn’t mask exceptions. If allocateResource resource where to be canceled after Internal.allocateResource but before addFinalizer, you’d get a resource leak.

So perhaps such a library would pull its weight.

3 Likes

A ReleaseKey attached to a weak pointer? :thinking:

1 Like

Yes, that’s exactly it. Call allocate and release the ReleaseKey early to call the cleanup action before GC claims the key. Nice that this pattern already exists. No need to mingle with weak pointers directly.

The details of weak pointers and finalizers are explained in the paper Stretching the storage manager: weak pointers and stable names in Haskell.

The finalizer can refer to the key res without adverse effect — unless you keep a reference to the closure (Internal.freeResource res) yourself, the closure would be garbage collected, and that’s why addFinalizer introduces the rule that the closure is reachable if the key is reachable.

Put differently: In order for the closure (Internal.freeResource res) to keep alive the key res, it has to be alive in the first place.

Depending on how you set up your trace, a finalizer that blocks forever or throws an exception may not be visible.

But the most likely explanation is that your code still keeps a reference to the key and thus can’t free it. :man_shrugging: Does it work for a minimal example where you immediately forget about the res value and System.Mem.performMajorGC afterwards?

I view this as a design choice rather than a general rule — if I want to explicitly control the lifecycle of a specific resource, then I have to deal with IO and explicitly schedule the release of the resource; but if I don’t want to explicitly control the lifecycle, then using finalizers and the garbage collector is perfectly fine.

There was a time when controlling memory explicitly was important, because 640k was enough for everyone. Fortunately, those times are behind us, and we can afford to write programs in a way that is not explicit about the use of this resource, making them simpler to express.

5 Likes