Lazily consuming a self-referential linked list

I grew tired of explicitly writing fixed-point/worklist algorithms so I tried to create a combinator for it, the signature looks like this:

run :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> s

Which is to say: Given any queue of workqueue jobs, an initial state and a function to perform a job by updating the state and enqueuing new jobs, compute the final state.

I first used Data.Sequence.Seq for the queue for more efficient FIFO semantics, but I thought I could do better, hopefully without any concatenation, so I ended up doing this:

run :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> s
run initial state action
  = let
      (qs, state') = go (initial ++ qs) state action
    in state'

This is awesome! The function can access their own generated queue elements and process them all, I implemented go like this:

go :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> ([q], s)
go qs s0 f = case qs of
  [] -> ([], s0)
  q:rest -> let
      (qs1, s1) = f q s0 qs2
      (qs2, s2) = go rest s1 f
    in (qs1, s2)

This threads the state forwards and the list backwards, but hides one caveat.

In the case where the list is (finally) depleted or the run function is called with an empty queue, this happens:

run [] undefined undefined
-- expands to
let
  (qs, state') = go ([] ++ qs) undefined undefined
in state'
-- also expand go, rename variables for clarity
let
  (qsOuter, state') =
  let
    qsInner = [] ++ qsOuter
    s0 = undefined
    f = undefined
  in case qsInner of
    -- omitted
in

Which is just a very elaborated way to say:

let (qs, state') = case qs of
  -- blah

Is there a way to correctly detect the end of the list?
My only idea is a counter which keeps track of how many items were consumed/generated (it is possible to keep track of this.), list evaluation would immediately stop when the counter reaches zero. But this feels like cheating.

I think you could condense go to look something like:

go :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> ([q], s)
go qs s0 f = case qs of
  [] -> ([], s0)
  q:rest -> let
      (qs1, s1) = f q s0 rest
      newQ = rest ++ qs1
      in go newQ s1 f

Unless I’m missing something.

Your original approach causes an infinite loop because the recursive call to go depends on s1 whose result depends on qs2 which is produced by go.

Applying your suggestions in the quote produces this code:

go :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> ([q], s)
go qs s0 f = case qs of
  [] -> ([], s0)
  q:rest -> let
      (qs1, s1) = f q s0 rest -- exchanged qs2 for rest
      (qs2, s2) = go qs1 s1 f -- exchanged rest for qs1
    in (qs2, s2) -- exchanged qs1 for qs2

I’m not quite sure what kind of behaviour this would be, I feel like the run would need to be adjusted as well. Maybe what you’re aiming for is this?

go :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> s
go qs s0 f = case qs of
  [] -> ([], s0)
  q:rest -> let
      (qs1, s1) = f q s0 rest
    in  go qs1 s1

Which would produce a LIFO queue, unless concatenation is used.

I’m afraid the condensed go function might perform poorly when a lot of tasks enqueue two more tasks, leading to catastrophic concatenation. I might just try it… premature optimization is the root of all evil, after all.

The order is like this because it allows the list that is build-up to travel-backwards along the computation, like it would with TardisT. This enables cheap list-building (only cons, no concatenation) while preserving FIFO semantics.

Ah I see, that pretty neat! And beyond my skill level xD.

I’ve had a bit more of a look and the circle I’m trying to square is the (q → s →[q] → ([q],s)) function. The question is where do you get this q element from if [q] is empty?

A minor change to the type signature to make it (s → q → (s, [q])) means then this function can also have some resposible logic for determining when to stop processing.

run3 s0 [] _ = s0 -- queue is empty stop processing
run3 s0 (q:qs) f = let
  (s1, newQ) = f s0 q
  in run3 s1 (newQ ++ qs) f
  -- | swap the order of the (++) and `use reverse` to get LIFO/FIFO behaviour 

test2 initState queue = run3 initState queue someFunc
  where someFunc s qElem
      --  | some rock solid cases that stops new list elements being generated indefinitely

I’m not totally sure if that’s helpful as I’m struggling to grasp the behaviour of Tardis.

Part of me feels like this is also achievable with nested foldl/foldrs but I haven’t looked that far into it.

Sorry I couldn’t help more!

I thought so too, but apparently it reliably blows up the stack and my machines memory x) so it may as well be above mine

This is the type for the function performing the work. I apologize for the bad naming: q is meant to be the work queue element type, it holds some information about the work that needs to be done. s is the state that can be accessed and altered in every step. The last argument ([q]) is the work queue itself, or rather, the end of the queue. By prepending to the end of the work queue, new work can be scheduled to be performed before the work that has not yet been scheduled.

This is all very backwards, because of this ingenious lazy list, which I am not so fond of anymore.

Be assured, I’m happy I sparked your interest and you are being helpful.

I agree, it should be achievable, I’ll explore it probably.

Reasoning about self-recursive types for a second, does it make sense that a self-recursive type can completely consume itself?

If we take just a typical infinite list, we know that we cannot guarantee that we process all of it; we must either work on only some subset of it (take a prefix or search through it), or perform an operation that occurs over the entire list lazily (fmap or foldr with a function lazy in the second argument).

What you’re trying to do here is produce a potentially infinite list as well as consume it in its entirety, and I think that that has issues.

I spent some time trying to use lazy writer and Endo to get some “efficient” concatenation but kept running up against looping code and end of list evaluations. I’d be interested in a solution if someone finds one!

You’re right, if worded that way it does sound quite impossible.

I don’t think I have a solution. I’m in a position where I can control/count the additions to the list and keep that as a balance. As soon as the balance reaches zero I stop evaluating (end-of-list divergence), more of a workaround than a solution, really.

Just process the batches explicitly instead of knot-tying?

run :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> s
run initial state action = loop initial state
  where
    loop [] s = s
    loop qs s =
      case go qs s action of
        ([], s')   -> s'
        (qs', s')  -> loop qs' s'
 
go :: [q] -> s -> (q -> s -> [q] -> ([q], s)) -> ([q], s)
go qs s0 f = case qs of
  [] -> ([], s0)
  q:rest ->
    let
      (qs1, s1) = f q s0 qs2
      (qs2, s2) = go rest s1 f
    in
      (qs1, s2)

Not sure if that’s what you want, it’s a bit hard to tell without examples.

The code was produced by GPT-5.5.

This looks to me as if the [q] is really part of the state being modified:

data PipelineState = PS {globalState :: s, jobs :: [q]}
run :: PipelineState -> (q ->  PipelineState -> PipelineState) -> s

and supposedly the final s is really a PS s [], no more jobs to do. Now I wonder: Why does the action argument of run have an extra q argument at all, when it could pop one off the [q]? Perhaps that is the missing occasion for pattern-matching on the empty list? So suppose you could merge run and go into a single recursive function that does not need the extra q for non-emptyness. Suppose further that we agree the final state is just a special PipelineState. Now a function with the signature

run :: x -> (x -> x) -> x

looks like something that seeks the least fixed point of an endo-function above a given starting value, as per Kleene’s fixed point theorem.

Perhaps you just want this?

run :: (s -> q -> (s, [q])) -> s -> [q] -> s
run _ s [] = s
run f s qs = case mapAccumL f s qs of
  (s', qss) -> run f s' (concat qss)

The concatenation is properly oriented so it’s effectively free.

Edit: generalised from State s to Monad f:

run :: Monad f => (q -> f [q]) -> [q] -> f ()
run _ [] = pure ()
run f qs = do
  qss <- traverse f qs
  run f (concat qss)

Since you seem to really like lists, you could use another list as a counter :slight_smile:

run :: [q] -> s -> (q -> s -> ([q], s)) -> s
run initial state action =
  let (qs, state') = go initial (initial ++ qs) state
  in state'
  where
    go [] _ s0 = ([], s0)
    go (_:counter) (q:rest) s0 =
      let (newq, s1) = action q s0
          (qs', s2) = go (newq ++ counter) rest s1
      in (newq ++ qs', s2)

I’m missing the broader picture here, would you care to explain?

My guess: There is a dependency tree of tasks q that possibly hand data to each other, and have access to a common environment s. The abstraction for a task is backwards, so each task can say: “My prerequisite is this number of tasks, that need to have finished before I can run”. The [q] serves as an abstraction for a topological sorting of the forest of dependencies.

Your run function and its arguments apparently attempt to do two things in parallel, that may better be kept separate:

  1. Build the dependency tree of tasks
  2. execute/schedule the tasks

I’d rather start by finding a fitting data abstraction for tasks with dependencies, while keeping the question of scheduling and execution entirely separate. Once you have both, lazyness can still take care of interleaving creating and running the jobs.

Joachim Breitner has a package rec-def that lets you compute fixed points of data structures that are otherwise strict, such as maps and sets.

This was a great suggestion, I ended up trying this, it works perfectly well when you sprinkle in strictness annotations.
My implementation, sadly, was still outperformed by Data.Sequence.Seq both in terms of space and time.

Sure, algorithms I had in mind were something like breadth-first-search, dijktra, A*, FIRST-Set of a Grammar Nonterminal.
All of these problems have a recursive notion. E.g. to determine the First-Set of Nonterminal you must also determine the First Set of all Nonterminals that are reachable through the original.
Or in my abstraction, any abstract “job” can issue any number of new work portions that arose from it’s processing. The function signature was very involved because I tried so hard to have a lazily generated work queue.

I want to thank everyone involved for the suggestions, I had a blast trying it all and benchmarking some stuff, though nothing held up against an implementation that used Seq.

I’m now doing this: j -> m (Seq j, a), it is always possible to read the current job, perform a monadic action and produce a Sequence of new jobs that should be run afterwards. This is very boring (no infinitely lazy lists) but very memory and space-efficient, apparently.

I think I might release a small library if this abstraction proves useful for me.
What I’m doing currently looks like this:

reachableVertices :: Graph Vertex -> Vertex -> Set Vertex
reachableVertices graph start = WorkList.run (Set.singleton start) [start] $ do
  node <- WorkList.current
  visited <- State.get
  forM_ (neighbors graph node) $ \ neighbor -> do
    unless (neighbor `Set.member` visited) $ WorkList.queue neighbor
    State.modify (Set.insert neighbor)

the do-notation-block would have this type:

bfsDo :: WorkList Vertex (Set Vertex) ()

the first type argument is the job type, the second one is the state type and the third one is the monadic result type.

Using the WorkList.run function will run the monadic computation repeatedly until the job queue is empty and then yield the final state.

run :: Foldable f => s -> f q -> WorkList q s () -> s

Hopefully this isn’t reviving a dead topic but @VegOwOtenks I thought this was quite an interesting problem for me given that I enjoy recursion schemes too much. So, if I may chime in:

If you tease the type a bit you end up with a co-algebra:

-- type JobF m a = m :.: (,) a :.: Seq
-- or, inlining, composition...
data JobF m a j = JobF { unJobF :: m (a, Seq j) } 
  deriving (Functor)

given this your function type above becomes: j -> JobF m a j which for some fixed monad m and fixed a we get a co-algebra (JobF m a is a functor). What you have then described seems to be a at least a monadic anamorphism (or g-anamorphism with a monad distributive law if you’re using the recursion-schemes package).

I took a look at your nested mutual recursion functions by re-writing go and run using a Reader-State monad and using mfix. I haven’t come to any conclusions yet but I have a hunch that your problems sounds very similar to some other problems I solve that involve obtaining values “from the future”. Mind if you share your new definitions of WorkList.run and friends?

Thanks for all the fancy words, made me look up and learn what (co-)algebras are.

This is roughly where I got lost, what is a monadic anamorphism?

Not at all, you can find the entire source code in the git repository.

The base definition is very involved, because I added support for Prioritizable Tasks:

WorkListT

WorkListT


-- | Parametric WorkList monad transformer.
--
-- Type parameters are:
--
-- [@t@]: Task Type
-- [@q@]: Queue Type (constrained by 'Queue')
-- [@m@]: Base monad
-- [@a@]: monadic computation result
newtype WorkListT t q m a = WorkListT (t -> m (q t, a))

WorkListT.run

-- | Execute the provided monadic action until the task queue is exhausted.
--
-- Will diverge if the task queue is infinite, unless the base monad allows
-- for an early exit, like t'Control.Monad.Except.ExceptT'.
run :: (Foldable f, Monad m, Queue q, Queue.TaskConstraints q t) => f t -> WorkListT t q m () -> m ()
run fQueue (WorkListT f) = go (Queue.fromList $ Foldable.toList fQueue)
  where
    go qs = case Queue.dequeue qs of
      Nothing -> pure ()
      Just (curJ, rest) -> do
        (newQs, _) <- f curJ
        go $ rest `Queue.extend` newQs

queueAll

-- | Try to add every @t@ in @f@ to the task queue. Exact behaviour will depend on the
-- 'Queue' implementation you're using.
--
-- Monomorphized re-export of 'C.queueAll' in 'C.MonadWorkList'.

queueAll :: (Foldable f, Applicative m, Queue q, Queue.TaskConstraints q t) => f t -> WorkListT t q m ()
queueAll fq = WorkListT $ \_ -> pure (Queue.fromList $ Foldable.toList fq, ())

But all the type tetris is not visible anyway unless you need to build something custom, the interface for the standard WorkList is not nearly as parametric:

You’re welcome :slight_smile: I wasn’t sure if you were aware of them or not. Sorry for the dictionary shock.

I’ll explain an anamorphism first and then I will explain a monadic
anamorphism. An anamorphism is a combinator for constructing recursive
functions that construct/build recursive data structures.

-- | Given any functor construct it's recursive form.
data Fix f = Fix (f (Fix f)) 

-- | Given a co-algebra construct a function that produces a recursive data structure.
ana :: (Functor f) => (a -> f a) -> a -> Fix f
ana coalg = let c = Fix . fmap c . coalg in c

One provides a co-algebra (which is typically non-recursive) to an
anamorphism and the anamorphism provides a function that creates the
recursive data structure. Technically however, the functor that’s in
use corresponds to a class of recursive data structures. And that’s what
the Recursive/CoRecursive type classes and the Base type family are
for. I don’t have room to explain that here. So instead of using Fix f
as the co-domain (return type) of the function we can use a more concrete
data structure like [e].

The functor f above describes the structure of a single layer of
recursion (when to terminate, and when to keep going, what data to
remember at each layer etc). The purpose of the anamorphism is then to
produce the fully recursive function a -> Fix f. Here’s a concrete
example:

data ListF e a = Nil | Cons e a 
  deriving (Functor f)

nSized :: Natural -> ListF Natural Natural
nSized 0 = Nil
nSized n = Cons n (n-1)

-- Fix (ListF e) ~ [e]
sizedList :: Natural -> [Natural]
sizedList = ana evens

Monadic anamorphisms are anamorphisms that operate over a monad. More
concretely their co-domain is wrapped in a monad m - in both the
co-algebra and the return type:

ana  :: (Functor f)          => (a ->    f a)  -> a ->    Fix f
anaM :: (Monad m, Functor f) => (a -> m (f a)) -> a -> m (Fix f)

Now let’s say that you want to write a program where the user gives your
program n names to consume via CLI prompts. You’ll need a way ask
the user for a value as noted by askforName. Now let’s say we will
keep asking the user for values until they type “done”. We’ll need to
recursively ask for a name, check if the given string is equivalent to
“done” and then based on the resulting truth value we terminate or
we recurse.

We can use a monadic anamorphism to accomplish this. I’ll write the proof steps below:

import Data.Functor ((<&>))

-- If IO throws exceptions we won't handle them here.
askForName :: IO String
askForName = do
  putStrLn "Next name please:"
  getLine

-- Claim: allNames can be defined using a monadic anamorphism
allNames  :: IO [String] -- we want to construct this
  {- forall a. a ~ () -> a -}
  -- :: () -> IO [String]
  {- exists f. Base t ~ f. t ~ Fix f -}
  -- :: () -> IO (Fix (ListF String))
allNames = anaM nextName ()

nextName :: () -> IO (ListF String ())
nextName _ = askForName <&> \case
  -- we're done, terminate
  "done" -> Nil
  -- assume the user typed a valid name...
  name   -> Cons name ()

The benefit of nextName is that it’s non-recursive. The ListF
functor completely determines the structure of the recursion and when we
terminate becomes very clear. Unfortunately this example doesn’t show
case the full power of recursion schemes, however I’ve written many a
server streaming handler with them with robust error handling and you
can imagine the recursion gets very tricky. Recursion schemes solves
many of these problems.

I believe a lot of mistakes in Haskell can be avoided by writing
recursion-schemes instead of explicit recursion. If you’re interested
in learning more:

  • Here is a great resource to learn more about recursion schemes.
  • I highly recommend reading the
    Programming with Bananas Lenses Envelopes and Barbed
    Wire

    functional perl. It can be overwhelming at first but if you have the
    time and energy to wade through it you’ll be a changed programmer (for
    example you’ll begin to spot easy ways to refactor your code and create
    optimizations algebraically).

I will post another response after I did int your code a bit more.

As promised here’s an application of recursion schemes to your problem. What you are in fact doing is what appears to be a hylomorphism. After looking through your type classes it appears you are trying to construct containers that have list-like semantics. That is they can be unfolded and folded over. In fact your call to Foldable.toList is actually a catamorphism and can be fused with the hylomorphism. Unfortunately I don’t have time to dive deeper into the theory here, but I believe your library is a wrapper around these concepts. I would take a deeper look into the ListF data type and learn about cata-, ana-, and hylomorphisms (and maybe futu-). But nonetheless here’s a small something I came up with:

import Data.Functor.Foldable (ListF (..), hylo, project)
import Control.Monad (join)
import Data.Foldable (toList)
import Control.Category ((>>>))

data GoF m a
  = Done
  | Rec (m a)
  deriving (Functor)

newtype WorkListT t q m a = WorkListT (t -> m (q t, a))

-- | Execute the provided monadic action until the task queue is exhausted.
--
-- Will diverge if the task queue is infinite, unless the base monad allows
-- for an early exit, like t'Control.Monad.Except.ExceptT'.
run :: (Foldable f, Monad m) => f t -> WorkListT t [] m () -> m ()
run fQueue (WorkListT f) = hylo runGoF (go' project f) . toList $ fQueue

runGoF :: (Monad m) => GoF m (m ()) -> m ()
runGoF = \case
  Done  -> pure ()
  Rec x -> join x

-- Maybe (e, a) ~ ListF e a
go' :: (Functor m, Semigroup (q a)) => (q a -> ListF a (q a)) -> (a -> m (q a, b)) -> q a -> GoF m (q a)
go' snoc f = snoc >>> \case
  Nil       -> Done
  Cons a as -> Rec $ ((as <>) . fst) <$> f a 

Note that go' is a co-algebra “constructor” that is given a way to project from a list, and a WorkListT construct a co-algebra where your q a is the seed and GoF is the shape of your recursion.

Final thought, I believe you can fuse the hylo and toList together but I’ll leave this an exercise.