I’ve been goofing off designing a data workflow monad at my new job. These workflows are in the vein of Apache Airflow, or the Haskell library FunFlow.
A workflow is an example of a Directed Acyclic Graph (DAG): a directed graph with no cycles. Ours also has a distinct start and end node. They’re not nearly as complicated as a full graph: they can be represented with an ADT.
While brainstorming, I realized that these are no more complicated than monadic functions (or arrows really). You have tasks which require certain inputs, and produce a single output. So a simple workflow might look like this:
taskA :: m A
taskB :: A -> m B
taskC :: A -> m C
taskD :: B -> C -> m D
end :: A -> D -> m ()
workflow :: m ()
workflow = do
    a <- taskA
    b <- taskB a
    c <- taskC a
    d <- taskD b c
    end a d
Note that for our workflows, we know we will only ever produce one A, B, C, or D. So in a sense tasks are producing certain data, given the availability of other specific data. This is more limited than the above functions would allow.
Another way to represent these is with an ADT (written here in GADT syntax):
data Workflow a where
    TaskA :: Workflow A
    TaskB :: A -> Workflow B
    TaskC :: A -> Workflow C
    TaskD :: B -> C -> Workflow D
    End :: A -> D -> Workflow ()
It would be really nice to have a visual graph of this workflow, like the image above. To do this, I need to be able to generate a runtime representation of it (could be an adjacency list, or some sort of ADT, not the focus of the question).
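To make the target concrete, here is a rough sketch of the kind of runtime representation I have in mind for the example above. The names `Node`, `Graph`, `exampleGraph`, `edges`, and `toDot` are made up for illustration, and the graph is written out by hand from the task signatures. Deriving it automatically from the workflow definition is the part I don't know how to do.

```haskell
-- Hypothetical runtime representation: node names are plain strings.
type Node = String

-- Adjacency list: each task paired with the tasks that consume its output.
type Graph = [(Node, [Node])]

-- Hand-written graph for the example workflow, read off the task types:
-- taskB and taskC need A, taskD needs B and C, end needs A and D.
exampleGraph :: Graph
exampleGraph =
  [ ("taskA", ["taskB", "taskC", "end"])
  , ("taskB", ["taskD"])
  , ("taskC", ["taskD"])
  , ("taskD", ["end"])
  , ("end",   [])
  ]

-- Flatten the adjacency list into (producer, consumer) pairs.
edges :: Graph -> [(Node, Node)]
edges g = [ (from, to) | (from, tos) <- g, to <- tos ]

-- Render the graph as Graphviz DOT text; `show` quotes each node name.
toDot :: Graph -> String
toDot g = unlines $
  ["digraph workflow {"]
    ++ [ "  " ++ show from ++ " -> " ++ show to ++ ";" | (from, to) <- edges g ]
    ++ ["}"]

main :: IO ()
main = putStr (toDot exampleGraph)
```

Piping the output through Graphviz's `dot` tool would give the picture. Something like `exampleGraph` is what I'd like to get out of the `workflow` definition without writing it twice.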
Can you think of a way to generate a runtime representation of a workflow given a definition equivalent to the above? Or any way to generate a graph in a way that will only typecheck if it is correct?
EDIT: updated code to match example image