Design problem: Generate a graph from function calls

seanhess · August 14, 2023, 12:17pm

I’ve been goofing off designing a data workflow monad at my new job. These workflows are in the vein of Apache Airflow, or the Haskell library FunFlow.

A workflow is an example of a Direct Acyclic Graph, which is a graph that has a distinct start and end node, and no loops. They’re not nearly as complicated as a full graph: they can be represented with an ADT.

While brainstorming, I realized that these are no more complicated than monadic functions (or arrows really). You have tasks which require certain inputs, and produce a single output. So a simple workflow might look like this:

taskA :: m A
taskB :: A -> m B
taskC :: A -> m C
taskD :: B -> C -> m D

end :: A -> D -> m ()

workflow :: m ()
workflow = do
  a <- taskA
  b <- taskB a
  c <- taskC a
  d <- taskD c d
  end a d

Note that for our workflows, we know we will only ever produce one A, B, C, or D. So in a sense tasks are producing certain data, given the availability of other specific data. This is more limited than the above functions would allow.

Another way to represent these is with an ADT

data Workflow a where
  TaskA :: Workflow A
  TaskB :: A -> Workflow B
  TaskC :: A -> Workflow C
  TaskD :: B -> C -> Workflow D
  End :: A -> D -> Workflow ()

It would be really nice to have a visual graph of this workflow, like the image above. To do this, I need to be able to generate a runtime representation of it (could be an adjacency list, or some sort of ADT, not the focus of the question).

Can you think of a way to generate a runtime representation of a workflow given a definition equivalent to the above? Or any way to generate a graph in a way that will only typecheck if it is correct?

EDIT: updated code to match example image

jackdk · August 14, 2023, 12:22pm

You might be interested in Conal’s work on Compiling To Categories (and the stuff it spawned) as well as more recent research on Evaluating Linear Functions to Symmetric Monoidal Categories.

danidiaz · August 15, 2023, 5:33pm

Because there is only one way of producing each type, the “wiring” seams pretty mechanical. We never have to choose at the term level between two possible constructors for a given type.

This resembles the “wiring” problems that type-driven dependency injection frameworks help with, so a library like registry could be useful here.

I’ve been playing with a toy version of a dependency injection framework in my cauldron repo. Like you, I have a bunch of (non-monadic in my case) constructors and I want to wire them automatically and obtain a dot graph of the dependencies.

I took the following approach:

Uncurry (using my multicurryable package) each constructor so that it receives its arguments in an anonymous n-ary product form sop-core.
Wrap each uncurried constructor in an existential datatype (but remembering that all arguments and the result must be Typeable) and store them in a map.
Use a graph library to build a graph of dependencies between the types (the TypeReps identify the types.)
Check that there are constructors for all the required types, and that there aren’t any dependency cycles.
Topologically sort the TypeRep graph. This gives us a “build plan”, the correct order in which to build each component.
Build the components using a bit of Data.Dynamic magic.
Return the target component along with the dependency graph.

silky · August 15, 2023, 8:23pm

Perhaps you would be interested in Crem - GitHub - marcosh/crem: Compositional Representable Executable Machines by @marcosh

chrisdone · August 15, 2023, 8:29pm

One approach to graph a monad is to make all actions in your semantics return an applicative, and accept such applicatives as arguments. I used this to define a build system which was capable of describing what steps would be run.

Sunroof or Rel8 are all monads whose actions return “Expr a” rather than “a”, and all actions take “Expr a” as an argument, which makes it easy to graph them and generate source code from them.

Their implementations are complicated but the idea isn’t. I’ve done it using a free monad and a free applicative. A tagless final approach can also work: in the “real” run the applicatives are just Identity, and in the “graphing” run the applicatives are node IDs or the composition thereof.

seanhess · August 18, 2023, 6:12pm

Thanks everyone for your answers! I’ve been reading and thinking about them.

I’m struggling to implement either of these approaches. Is there an example online somewhere illustrating either of these techniques? I think I understand the gist of them, but I keep getting stuck.

seanhess · August 18, 2023, 7:14pm

The difficulty comes when I try to create a Graph interpreter (or a list, whatever). I can’t figure out how to convert the tasks into a graph automatically, based on the definition of workflow

blamario · August 24, 2023, 11:15pm

You can use Template Haskell to walk the GADT type representation and generate the graph in whichever format you prefer. I’ve done something similar here a year ago.

bolt12 · August 25, 2023, 8:54am

You might be interested in reading Selective Applicative Functors there’s a section on static analysis for approximating dependencies for build systems. It might provide more food for thought if you require branching/conditionals in your tasks