Pre-HFTP: GHC should offer low-level logging infrastructure

Logs are a critical thing in production systems and I would like to start a discussion about bringing low-level support for efficient logging into GHC

Context (in Haskell)

Something I see quite often in Haskell applications is the interleaved outputs of two or more logging functions, from the main application that needs to log something, to libraries in dev mode that trace their execution (like printing SQL queries on the terminal as they appear).

Michael Snoyman wrote a blog post in 2016 about a couple of concurrency problems in Haskell, including that putStrLn is not thread-safe: Haskell's Missing Concurrency Basics

And for some years I have been using @hackage/log-base for structured logging with multiple backends, which has been a delightful experience, since shipping logs to an aggregator or just on the terminal is very easily done. A delight both at home and at work.

Context (in Erlang)

Before programming in Haskell I was using Elixir and Erlang, who have logging primitives included in the standard application framework shipped with the language (OTP): Logger — Logger v1.16.2

If you want a short read about it, see this article: Erlang/OTP 21's new logger

Fundamentally the idea is that one unified process (green thread) receives all these logging messages from the other processes, and so the distribution of the messages enforces the order in which they are printed on screen (or shipped off to ElasticSearch). Which rids us from interleaved outputs.

Let’s talk about it

Unified primitives for structured logging would in my opinion be a great way to progressively start unifying the ecosystem. Logging is systematically one of the first things one has to figure out when integrating Haskell into a tech stack.

While it is normal to have differences of opinions about the interfaces, one could easily pick a package like @hackage/katip or @hackage/log-base, and retain the guarantees of interoperability that GHC provides: Common log levels, structured logging, thread-safety, common backend interfaces.

I think that unifying the logging infrastructure of Haskell programs will ease its adoption in pre-existing tech stacks, since backends would be written not against specific libraries but against the APIs of GHC. So there would be an obvious gain in industrial adoption.

Edit: I initially proposed that it would live in the RTS (because C code fast), but having this live in user-land would be a perfectly valid solution.

Here is a diagram that captures the high-level flow of things:

16 Likes

Could you elaborate on why there is something missing in GHC, rather than a missing library? It’s not yet clear to me why what you’re asking for couldn’t be implemented as a library completely independently of GHC.

4 Likes

Dependencies aren’t free, licenses aren’t all universally compatible, and wiring some things in the RTS would give a performance boost, especially in order to ease the scheduling, help with backpressure, etc.

1 Like

Just to check I understand you, you’re saying what you’re asking for could be implemented in a library, but it’s better to implement it in GHC

  • so it becomes “free”, dependency wise
  • so that it has a license that is universally compatible, and
  • so that it has a performance boost not available to code written as a library

Is that right?

Yep, much like what was done in Erlang, but thanks for making me clarify.

Dependencies aren’t free

I don’t believe hiding the dependency as “part of GHC” is benefitting this.

licenses aren’t all universally compatible

I don’t think this is solved by putting them in GHC.

wiring some things in the RTS would give a performance boost

Does this actually need to be put in GHC or is this just a gripe about the state of haskell logging? (I say this because I do not know about wiring things in the RTS)

1 Like

Thanks, the “performance boost not available to code written as a library” is the part that I would particularly welcome further details about. I guess it has something to do with “one unified process receives all these logging messages from the other processes” where “process” here is something specific to Erlang’s runtime, and we’d have something similar in GHC’s runtime. Is that right?

I don’t believe hiding the dependency as “part of GHC” is benefitting this.

Please elaborate

Please elaborate

yes, process as in green thread, like what we have in GHC’s RTS today. :slight_smile:

1 Like

snarky XKCD about something different

Please read the bit where I say that we retain our various higher-level interfaces

You’ve now moved from “packages may have a dependency on a logging library if they need it” to “all packages now have a dependency on a logging library because it’s part of GHC”. Not sure how this is beneficial.

Similarly, “the licenses on these libraries aren’t compatible with <project>, so now we’ve just rewritten the hypothetical licenses to all be the same as the GHC license”. Again, not sure what is gained here.

Please read the bit where I say that we retain our various higher-level interfaces

I never thought that the higher-level interfaces would go away, but 2 of the 3 points are in this higher-level interface space and therefore are not solved by this.

Similarly, “the licenses on these libraries aren’t compatible with , so now we’ve just rewritten the hypothetical licenses to all be the same as the GHC license”. Again, not sure what is gained here.

Putting together a product with a mixed bag of licenses is not always a nice experience if you have a legal department that enforces compliance with some (internal) standard. Since GHC is already validated (because the idea is to introduce Haskell, or it has already been introduced), same license would reduce headache.

1 Like

Okay, but now you:

  • have the same problem if you want to use a higher-level interface (because you still have license hell).
  • are rolling an inhouse logger from low level primitives which you probably did anyway if you had problems with the higher-level interfaces.

Both of these seem like ecosystem problems, not “we need more things in GHC” problems.

I’ve been thinking about something similar recently, although I’m not sure if it’s exactly what you have in mind.

The GHC’s existing eventlog is pretty helpful in terms of a unified interface to get at both user supplied and RTS logging information. But I think there’s a big gap in the sorts of user supplied messages it can emit.

You can basically emit arbitrary strings, or arbitrary binary data. The issue with these is that as far as the eventlog is concerned these are completely unstructured. It’s normal for users to encode more structure but generic eventlog libraries don’t know about it, eg, GitHub - ethercrow/opentelemetry-haskell: The OpenTelemetry Haskell Client https://opentelemetry.io embeds opentelemetry data.

I think it would be good to add some more events to the eventlog that allow emitting structured events/traces largely modelled upon the openetelemetry data model.

This gets us a bit closer to what you are proposing but there is still a gap. The main way to consume the eventlog is to either write it to a file or pipe it to another process. But we’d ideally want to process this data as part of the main process on a Haskell Thread to make things user-friendly. I think something like this should be doable, since the eventlog interface has a configurable way to decide how to process this data, which GitHub - well-typed/ghc-eventlog-socket: Pipe the GHC eventlog stream to a UNIX domain socket uses to send it to a unix socket for instance.

So basically what I’m proposing is rather than adding a new logging system to GHC, we re-use the eventlog and we extend it in ways that allow us to achieve the types of things you mention. That gives us the advantage of having one unified logging system for both user and low-level logging (and also gives an impetuous to clean up or finish off a lot of the really neat experiments in this area).

5 Likes

My 2c, but I don’t think this should be part of GHC at all. You mention Erlang, but how many languages provide logging at the runtime level? There might be common logging interfaces like log4j in Java, or libraries in the std like log in Golang, but I’ve never heard of a language where the runtime contains such primitives. At the same time, your reasons don’t sound compelling to me (I share the opinion of other users who already replied to the post). Lastly, if it were to be added I cannot fathom the amount of bike-shedding it would involve; just look at how many comments are on adding a todo function to the Prelude: Add the `todo` function · Issue #260 · haskell/core-libraries-committee · GitHub

Though I agree that GHC and co. should make it easier to analyze runtime information (last time I tried to use ThreadScope I could not get it to start at all), but this is completely different from application business logic logging.

1 Like

Yes after some thinking, I’m not wedded to it being wired into the RTS. It can live in userland. What do you think about the rest of the proposal?

2 Likes

Well, it would be nice if there were a “blessed” logging library but I find it quite hard for that to happen since it would need to reach MTL/transformers levels of adoption. For example, lens is not ubiquitous despite being one of the nicest solutions to the Records problem.

My suggestions would be to take an existing logging library and contribute as much as possible to make it the “default” solution by improving docs, compatibility with multiple sinks, runtime control, etc.

5 Likes

just popping in to say that GHC already has a pretty-printing-plus-IO library which could be extended into a logging one

https://hackage.haskell.org/package/ghc-9.8.2/docs/GHC-Utils-Outputable.html

1 Like

Oh but that’s entirely something else :smiley: Outputable is something for the compiler, and incidentally exposed because the GHC API exposes everything. Users of base or ghc-prim or ghc-experimental don’t have access to Outputable because this goes away after compilation.

What I miss from Python/Rust is a way to configure the individual component logging’s verbosity. Should this proposal define a common configuration format? For example using such a prefix: env HASKELL_LOG="warp=debug,http-client=warning"