I got rid of `ReaderT` and now my application is hanging by a thread

I’ve been experimenting with a way of structuring Haskell applications where components are records-of-functions, and when one component is a dependency of another it gets passed as a parameter of the latter’s constructor function.

So no effect systems, no MTL-like typeclasses, no monad transformer stacks, no ReaderT even. Or so I thought!

Turns out that ReaderT was still needed in one case: request-scoped values. For example, in Servant it’s a common pattern to reserve a database connection from a connection pool at the beginning of each request, and then pass down the connection in the environment of a ReaderT.

Eric Torreborre makes a similar point about the need for ReaderT at 33:05 in his “Wire all the things” talk, which is about a wiring system for applications (among other things).


But I was somewhat unsatisfied with that use of ReaderT because:

  • It’s a transformer, and I was trying to see if I could do without them. The idea was to simply pass records around as regular arguments. Can we do everything that way?
  • It forces the component definitions to either know about the ReaderT, or to be polymorphic on the monad. Not a big deal, but what if I wanted simple non-polymorphic functions working in IO?

As an alternative to ReaderT, I turned to a form of thread-local data. The idea is that we have an IORef containing a map indexed by ThreadId. At the beginning of a request, we get a database connection from the pool and associate it in the map with the current ThreadId. Downwards in the call chain, the repository component is injected with the map (actually, with a view of it) and gets the connection corresponding to the current ThreadId from it.
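
A minimal sketch of that mechanism (names are hypothetical, error handling elided):

import Control.Concurrent (ThreadId, myThreadId)
import Control.Exception (bracket_)
import Data.IORef (IORef, atomicModifyIORef', readIORef)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

data Connection = Connection  -- placeholder for a real database connection

type ConnectionMap = IORef (Map ThreadId Connection)

-- At the beginning of a request: associate the pooled connection with
-- the current ThreadId, and remove the entry when the request is done.
withCurrentConnection :: ConnectionMap -> Connection -> IO a -> IO a
withCurrentConnection ref conn action = do
  tid <- myThreadId
  bracket_
    (atomicModifyIORef' ref (\m -> (Map.insert tid conn m, ())))
    (atomicModifyIORef' ref (\m -> (Map.delete tid m, ())))
    action

-- The “view of the map” injected into the repository: just a getter.
currentConnection :: ConnectionMap -> IO Connection
currentConnection ref = do
  tid <- myThreadId
  m <- readIORef ref
  maybe (fail "no connection for this thread") pure (Map.lookup tid m)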

As for the intermediate components in the call chain, they are not injected with the map and don’t know anything about the current connection. (Formerly, the “don’t know anything” would be accomplished by being polymorphic on the monad.)

It works, although I’m unsure if it’s an improvement. Possible disadvantages:

  • Less type-safe. If we forget to set the connection in the map, the application will compile, but fail at runtime when the repository tries to get the connection.
  • There will be concurrency overhead in managing the ThreadId-indexed map.

This is the branch that uses ReaderT. And this is the branch that uses thread-local data. This is the diff.

5 Likes

I had this idea before, that we could actually have first-class thread-local support in Haskell:

  • we could add a heap pointer field to the TSO, then provide get and set primitives for this field.
  • then we could add a type-safe version of IO, say IO env a, which pins down the type of the current thread-local value, with helpers like readTLS :: IO env env and withTLS :: env -> IO env a -> IO x a
  • The main function would start with main :: IO () ()
  • The runtime cost would be negligible: we always have the current TSO reference in a register, so a read or write of the field is a simple pointer read or write.
  • The type system would ensure there is no type mismatch between reads and writes.
  • We declare it undefined behaviour if the user stores an updating function and calls it in another thread.

But this requires a giant change to the IO type, which I think is not going to happen.
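
For what it’s worth, the typing discipline can be emulated today by threading the environment as a function argument; a sketch with a hypothetical TIO standing in for the proposed IO (it demonstrates the types only, not the TSO-field performance):

newtype TIO env a = TIO { runTIO :: env -> IO a }

-- Read the current thread-local value (shown for the signature; a
-- Monad instance would make it usable in do-notation).
readTLS :: TIO env env
readTLS = TIO pure

-- Run a computation with a given thread-local value, from any context.
withTLS :: env -> TIO env a -> TIO x a
withTLS e (TIO f) = TIO (\_ -> f e)

main :: IO ()
main = runTIO program ()
  where
    program :: TIO () ()
    program = withTLS (42 :: Int) (TIO print)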

1 Like

The HaskellWiki has an old (last edited 2006) page with some proposed designs for thread-local storage.

This is very interesting to me, because recently I have been interpreting effectful as an “effect-tracked ReaderIO pattern” and my new effect library Bluefin as an “effect-tracked record-of-functions/handle pattern”. I think the record-of-functions/handle pattern has a lot to recommend it.

I didn’t understand this bit. Why can you not pass the database connection around at the value level, just like the records of functions/handles?

1 Like

There might be some food for thought available to your approach by taking a look at the context package. That package provides thread-local storage wrapped up in an interface that, with enough squinting, is similar to ReaderT's, e.g. mine to get the thread’s current context value as opposed to ask. It uses the same encoding you’ve discovered - an IORef over a map keyed by ThreadId - with a few additional bells and whistles, like maintaining a stack of context values to support arbitrary nesting (the conceptual equivalent of ReaderT's local).
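
I’m not copying the package’s actual code here, but the core of that encoding can be sketched in a few lines: a per-thread stack of context values, pushed and popped around each scope (mine and use below are merely evocative of the package’s functions; consult its docs for the real signatures):

import Control.Concurrent (ThreadId, myThreadId)
import Control.Exception (bracket_)
import Data.IORef (IORef, atomicModifyIORef', readIORef)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type Store ctx = IORef (Map ThreadId [ctx])

-- Push a value for the current thread, pop it when the action is done.
-- (A real implementation would also delete empty stacks.)
use :: Store ctx -> ctx -> IO a -> IO a
use store ctx action = do
  tid <- myThreadId
  bracket_
    (atomicModifyIORef' store (\m -> (Map.insertWith (++) tid [ctx] m, ())))
    (atomicModifyIORef' store (\m -> (Map.adjust (drop 1) tid m, ())))
    action

-- Return the innermost value for the current thread.
mine :: Store ctx -> IO ctx
mine store = do
  tid <- myThreadId
  m <- readIORef store
  case Map.lookup tid m of
    Just (ctx : _) -> pure ctx
    _ -> fail "mine: no context for the current thread"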

1 Like

The Connection is an implementation detail of the Repository. It’s not reflected in the Repository's record “interface”, but hidden inside the closures returned by a particular Repository constructor.

If we added a Connection parameter to the functions of all users of Repository, we would be coupling them to a particular implementation of the Repository. But what if we wanted to use an in-memory Repository for testing, or simply a Repository implementation for a database other than Sqlite?

One way of solving this infelicity is using ReaderT + polymorphism. The idea is:

  • Parameterize all record definitions by an effect monad. The Repository record and all of its callers, direct and indirect.
  • The constructor of the repository record returns something like a Repository (ReaderT Connection IO). That is: it knows the concrete monad and that it can obtain a Connection from it.
  • The constructors of components that depend on Repository are polymorphic on the monad. Something like makeFoo :: forall m. Repository m -> Foo m. They will use whatever effect monad their Repository dependency uses. This way, they don’t need to know that there is such a thing as a Connection!
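
A minimal sketch of that scheme (the record shapes and names are hypothetical):

import Control.Monad.Trans.Reader (ReaderT, ask)
import Control.Monad.IO.Class (liftIO)

data Connection = Connection  -- placeholder

newtype Repository m = Repository { findUser :: Int -> m String }

-- Hypothetical low-level query function.
runQuery :: Connection -> Int -> IO String
runQuery _ uid = pure ("user-" <> show uid)

-- The constructor knows the concrete monad, and that it can obtain a
-- Connection from it.
makeRepository :: Repository (ReaderT Connection IO)
makeRepository = Repository $ \uid -> do
  conn <- ask
  liftIO (runQuery conn uid)

-- Callers stay polymorphic and never learn that Connection exists.
newtype Foo m = Foo { greetUser :: Int -> m String }

makeFoo :: Monad m => Repository m -> Foo m
makeFoo repository = Foo $ \uid -> do
  name <- findUser repository uid
  pure ("hello, " <> name)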

This works, but it requires a monad transformer and extensive use of polymorphism, unlike the solution with thread-local values.

I’m not understanding something. Why can’t it be

make ::
  Logger ->
  CurrentConnection IO ->
  CommentsRepository IO

I still don’t see what role the ReaderT plays.

Sorry, I was a bit misleading. In my toy application, it’s actually the CurrentConnection constructor that uses ReaderT / MonadReader.

Suppose the make constructor had the signature you mention. From where would CurrentConnection IO actually get the Connection? The connection for a request can only be obtained from the pool while the app is already “running”. But the wiring of the component constructors happens before that.

Another thing that I didn’t make explicit is that we would like to reuse the same connection across multiple calls to the Repository, in order to group them in the same transaction in a way that is transparent to the business logic. That’s why the Repository can’t simply obtain the connection from the pool and return it each time it’s invoked.
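
To make that concrete, here is a sketch of what I mean (the names and the withTransaction stand-in are hypothetical):

import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

data Connection = Connection  -- placeholder

newtype CurrentConnection m = CurrentConnection { current :: m Connection }

-- Wired before the app runs, when no Connection exists yet.
makeCurrentConnection :: CurrentConnection (ReaderT Connection IO)
makeCurrentConnection = CurrentConnection ask

-- Stand-in for a real transaction wrapper, like the ones database
-- libraries provide.
withTransaction :: Connection -> IO a -> IO a
withTransaction _ = id

-- Per request: one connection for the whole action, so several calls
-- to the Repository share the same transaction transparently.
runRequest :: IO Connection -> ReaderT Connection IO a -> IO a
runRequest acquire action = do
  conn <- acquire  -- e.g. taken from the pool
  withTransaction conn (runReaderT action conn)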

Thanks, this is interesting. I think that what I was missing was that this is all cooked up in a cauldron. That is, some sort of static wiring which is independent of anything that happens dynamically? If that is so then I’m not surprised that you have to do your dynamic behaviours in a monad.

I’m not familiar with cauldron-oriented programming, so if I was doing this I would have written

make :: Connection -> IO CurrentConnection

which I would call wherever I get the connection from, and then pass to the make which makes a Repository, and so on. The fact that the IO is on the outside of CurrentConnection makes the wiring “dynamic”. If you want to be “static” then indeed I can see that you have to put the Connection -> IO ... on the inside (in the form of ReaderT Connection IO, as a parameter to CurrentConnection).

I’m not familiar with this cauldron style architecture. What’s the benefit of having the static wiring, as opposed to doing everything dynamically?

1 Like

This cauldron stuff looks a lot like dependency injection containers to me. I remember from my time enjoying C# that these frameworks are quite flexible, and you can partially wire up constructors as well, meaning that it would be possible to cook up a function that just takes a DB connection and returns an otherwise completely wired CurrentConnection.

I enjoyed AutoFac very much: Getting Started — Autofac 7.0.0 documentation
Have a look at “Application Execution” which discusses things like lifetimes. You will need such lifetimes if you plan to manage DB connections with it, I think. Still, just passing the DB connection as a function parameter seems the simplest solution to me.

1 Like

Yes indeed it is, according to the README.

cauldron is a library for dependency injection. It’s an alternative to manually wiring the constructors for the components of your application.

I think maybe I just don’t understand what dependency injection is for.

Indeed, I was trying to emulate some features from the DI container I’m most familiar with, the one from Java’s Spring Framework.

In Spring, the components (called “beans”) can have different scopes (in the sense of “lifetimes”). Some beans are long-lived. Others are created and deleted for every request. And yet, one can inject a request-scoped bean inside a long-lived bean!

If you want to inject (for example) an HTTP request-scoped bean into another bean of a longer-lived scope, you may choose to inject an AOP proxy in place of the scoped bean. That is, you need to inject a proxy object that exposes the same public interface as the scoped object but that can also retrieve the real target object from the relevant scope (such as an HTTP request) and delegate method calls onto the real object.

Investigating how this trickery is actually accomplished, I found this explanation on Stack Overflow:

whenever you invoke any methods on that request scoped proxy, it will somehow get the actual bean instance by calling RequestScope#get(beanName, objectFactory), which in turn gets the bean from the RequestAttributes. The RequestAttributes will be stored in the ThreadLocal such that each thread will have its own instance. If no beans can be retrieved, it will instantiate a new one and store it in the RequestAttributes.

So the implementation involves thread-local variables. Which is what I tried to replicate here.
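
A rough analogue of that proxy in the record-of-functions setting might look like this (all names hypothetical):

import Control.Concurrent (ThreadId, myThreadId)
import Data.IORef (IORef, readIORef)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

newtype Repository = Repository { findComment :: Int -> IO String }

-- A long-lived “proxy” Repository whose methods delegate to the real,
-- request-scoped instance registered for the current thread.
makeProxy :: IORef (Map ThreadId Repository) -> Repository
makeProxy scopeRef = Repository $ \commentId -> do
  tid <- myThreadId
  scope <- readIORef scopeRef
  case Map.lookup tid scope of
    Just real -> findComment real commentId  -- delegate to the real instance
    Nothing -> fail "no request-scoped Repository for this thread"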

I think maybe I just don’t understand what dependency injection is for.

In OO, it is useful to write your classes (i.e., implementations) such that they depend on other code through interfaces, rather than implementations thereof, in order to allow for maximum (unit) testability and loose coupling.
So for example, if some domain logic (Foo) needs to load entries from the database, that code will depend on an interface IDatabase rather than creating and calling a MySqlDatabase directly, so that unit tests can provide stub implementations.

Dependency injection is just the process of providing concrete instances (MySqlDatabase) to code (Foo) depending on an interface that this instance satisfies (IDatabase). There are different ways to achieve this injection process, but to us functional programmers the most natural choice is to pass dependencies to the constructor of Foo.
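
In Haskell terms, with records-of-functions standing in for interfaces, the example might look like this (names hypothetical):

-- Foo depends on the interface, never on a concrete database.
newtype Database = Database { loadEntry :: Int -> IO String }

mySqlDatabase :: Database
mySqlDatabase = Database $ \key -> pure ("row " <> show key <> " from MySQL")

stubDatabase :: Database  -- what a unit test would inject
stubDatabase = Database $ \_ -> pure "stub row"

newtype Foo = Foo { runFoo :: Int -> IO String }

-- “Constructor injection”: the dependency is a plain argument.
makeFoo :: Database -> Foo
makeFoo database = Foo $ \key -> do
  row <- loadEntry database key
  pure ("processed: " <> row)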

Now your main entrypoint will look like a huge mess, trying to wire up all the dependencies.
What’s more, in OO, you want some of those dependencies to resolve to the same singleton instance (shared mutable state), whereas other, stateless dependencies can be instantiated as many times as wanted. Some other kind of dependencies (e.g., pooled database connections) need some special allocation logic and have a lifetime/scope associated with them.

So this wiring quickly becomes very complicated; hence the need for dependency injection containers to specify how this wiring should happen declaratively.
Bonus point: Once you’ve declared all the wiring, you never need to worry about “Oh, in order to implement this new logic, I need an instance of IDatabase here. Now I have to thread it through 7 other classes only to pass it to the constructor of my current class. URGH!”

I honestly don’t know many other big code bases besides GHC, but prior to the ongoing modularisation work, we simply passed around the “god class”/“service locator” DynFlags. Now we do a better job, at the cost of introducing the kind of wiring I alluded to. This entire directory hierarchy in GHC’s code base could be wired up automatically at use sites, I think: compiler/GHC/Driver/Config · master · Glasgow Haskell Compiler / GHC · GitLab
I imagine that one could use a DI container to say “just give me THE LlvmConfig here”.
Admittedly, the gain just for static configuration is low; but for web servers the scoped lifetimes usage might be useful.

Do also note that throughout GHC’s code base, dependencies are tightly coupled. An example in the small: We use transformers, but not mtl (the former is implementation, the latter is interface). In the large: There is no way to swap out the implementation of the Simplifier for some stub code with the same interface. It has not been a problem so far because we (1) never expect that we need a different implementation, also because (2) we have no unit tests, only many great end-to-end golden tests.

So I can’t really judge if there’s a use case in real-world Haskell code; perhaps there is.
Certainly if you begin to replace type classes (which wire up “automatically” but need types as guidance) with records-of-functions that can be instantiated in many different ways (i.e., something resembling Java/C# style interfaces), you give DI containers more to chew on. But again, I can’t tell without a concrete use case, also because some of the issues, such as singleton instances, are a non-issue.

1 Like

FWIW, I like the code without MonadReader (which is not just ReaderT) better. Parameterising all your functions over some monad m seems very antithetical to what you want to achieve; namely, having code announce all its dependencies and get passed just those, nothing else.

How you technically achieve passing me just the dependencies of the current lifetime scope is not important to me and can be part of the abstraction you offer (but that abstraction had better be reliable).

You say you have some IORef with a Map – be careful about threading. Better to use an MVar.
Also, I would be surprised if OS-based TLS worked reliably with GHC Haskell, because of green threads.
I.e., IIRC, ThreadId names the id of a green RTS thread, not an OS-level thread. It doesn’t make sense to key your map by OS-level thread id, hence it doesn’t make sense to use OS TLS for the job. Any kind of (emulated) TLS support would have to be part of GHC’s RTS.
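
For instance, a sketch where the map is guarded by an MVar and the registration is bracketed, so the entry is removed even if the request throws:

import Control.Concurrent (ThreadId, myThreadId)
import Control.Concurrent.MVar (MVar, modifyMVar_)
import Control.Exception (bracket_)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

withRegistered :: MVar (Map ThreadId conn) -> conn -> IO a -> IO a
withRegistered var conn action = do
  tid <- myThreadId
  bracket_
    (modifyMVar_ var (pure . Map.insert tid conn))
    (modifyMVar_ var (pure . Map.delete tid))
    action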

Regarding

Less type-safe. If we forget to set the connection in the map, the application will compile, but fail at runtime when the repository tries to get the connection.

Surely this is an issue of your library’s implementation and not of the library’s users, right? Type safe uses of your library should not lead to crashes. How you achieve that under the hood, e.g., by abandoning type safety, I don’t care.

1 Like

Hi Dani,

I just want to share my experience on the matter. Even though we had polymorphism for our components and used ReaderT, we were still confronted with an issue. We had:

  • A low-level Sql component with query functions. The implementation of those functions requires a Connection.

  • Some components, RepositoryX, RepositoryY, BusinessX (depending on RepositoryX), BusinessY (depending on RepositoryY), etc… where:

    • For BusinessX, every time a function on RepositoryX is called, the corresponding query executed with the Sql component needs a fresh connection.
    • For BusinessY, all the function calls on RepositoryY must use the same connection with the Sql component.

We eventually ended up with a ReaderT (Maybe Connection) IO as our base monad for all components, and used 2 implementations for the Sql component:

  • One that would just hope that the caller of Sql.query has put a Connection in the ReaderT context
  • One that takes a connection from the pool and puts it in the ReaderT context every time Sql.query is called.

This could have been made more type-safe, but at the expense of more awkward types where the component wiring happens (we tried that for a while).
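
For illustration, a sketch of those two Sql implementations (the record shape and runQuery are hypothetical stand-ins):

import Control.Exception (throwIO)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT (..), ask)
import Data.Pool (Pool, withResource)

data Connection = Connection  -- placeholder
type Query = String

newtype Sql = Sql { query :: Query -> ReaderT (Maybe Connection) IO () }

runQuery :: Connection -> Query -> IO ()  -- hypothetical low-level call
runQuery _ q = putStrLn ("running: " <> q)

-- 1. Hopes the caller has already put a Connection in the context.
sqlFromContext :: Sql
sqlFromContext = Sql $ \q -> do
  mconn <- ask
  case mconn of
    Just conn -> liftIO (runQuery conn q)
    Nothing -> liftIO (throwIO (userError "no connection in the context"))

-- 2. Takes a fresh pooled connection on every call and installs it in
--    the context for the duration of that call.
sqlFreshConnection :: Pool Connection -> Sql
sqlFreshConnection pool = Sql $ \q -> ReaderT $ \_ ->
  withResource pool $ \conn ->
    runReaderT (query sqlFromContext q) (Just conn)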

We had a somewhat similar situation with the passing of a query id, which is needed in many places, but not all.

1 Like

Thanks. To test if my understanding is correct I’ll try to rephrase in language that is more familiar to me.

We want particular operations to be available at various locations in our app. Suppose we want to write something like

Logger -> Random -> Accumulator -> App

We could pass the Logger, Random and Accumulator around by hand but that’s tedious. We could pass them implicitly via type classes, but that’s inflexible. We’d have to write

class (Logger m, Random m, Accumulator m) => App m where
   ...

instance (Logger m, Random m, Accumulator m) => App m where
   ...

but then we have to come up with a newtype for each different collection of behaviours.

Instead, using a dependency injection framework, we write the original thing

Logger -> Random -> Accumulator -> App

and the cauldron or registry stores it, and similar things, and works out how to wire them itself. If I request an App it will implicitly request a Logger, a Random and an Accumulator and apply my function to create an App.

It seems we still have the coherence property of type classes in the sense that there can only be one way of making an App within the same registry/cauldron. Is that right? But unlike type classes, we can define these things “locally” to switch out behaviours. (Perhaps it’s a bit like reflection then?)


I’m basing this example on @etorreborre’s example at effects/src/Modules.hs at master · etorreborre/effects · GitHub

Yes, that seems right.

Note that you might no longer need a huge App “service locator” now in the first place; you can simply write a function handleRequest :: DBConn -> Logger -> Random -> HttpRequest -> IO () and expect the DI container to fill in all the parameters it can (perhaps it needs some Typeable magic). If my handler does not need the DBConn, I can just leave it out and the DI container still knows what to do. Nice and compositional.

It seems we still have the coherence property of type classes in the sense that there can only be one way of making an App within the same registry/cauldron. Is that right? But unlike type classes, we can define these things “locally” to switch out behaviours.

Yes, this “locally” part I think is a big part of what makes the approach attractive. You can “inherit” from your existing registry/cauldron to provide the DBConn and HttpRequest just for the lifetime of this particular HTTP request. I think this local/scoped lifetime part is what the OP is about.

(Perhaps it’s a bit like reflection then?)

I think that’s true, but I would never use reflection to provide a DBConn… That feels rather dirty. But perhaps that’s just for a lack of precedent.

1 Like

That’s exactly it.

But unlike type classes, we can define these things “locally” to switch out behaviours.

This is also why I started to use the same approach for data generators and encoders/decoders when the serialization protocol evolves over time. Typeclasses are wonderful, but having the possibility to tweak instance resolution is very useful.

With registry I went as far as being able to say “change this instance but only if it is used in that context in the dependency graph”. For example, use a Text generator with only capital letters if I am generating the Name of an Employee.

1 Like

Do you have an example of this that you could point me to? I don’t quite get how it would look.

In this example, I use an existing registry and override it with a different generator for the Department name:

registry12 :: Registry _ _
registry12 = specializeGen @Department genDepartmentName registry3

genDepartmentName :: Gen Text
genDepartmentName = T.take 5 . T.toUpper <$> genText

The function specializeGen @t, applied to a Gen a and a Registry, says:

  • In the context of generating values for t, if you need a Gen a, take that one instead of the one specified in the Registry

And in the general registry library there is a function specializePath which can do the same thing for a list of types t1, t2, t3:

  • In the context of creating t1, when t1 requires a t2 and t2 requires a t3, then use this value instead

I haven’t yet made a specializeGenPath function for Hedgehog generators, but I suppose that could be done.

2 Likes