Foldable append-only data structure

I need to implement a data structure for a finite ordered collection of elements, which needs to support two operations:

  • append elements at the end
  • fold all elements starting from the beginning

Hence something like this should hold

fold (:) "" (append "c" (append "b" (append "a" empty))) = "abc"

There are some obvious options:

  1. Use a simple list: fold is already perfect. Appending at the end is linear in the length of the list, not good
  2. Use a reversed list: appending at the end (which is actually the start of the list) is perfect, but for folding I need to reserve the list, which is linear in the list length, not good
  3. I could use Seq: appending at the end is constant time with (|>) and I have a ready made Foldable instance. But still it allows many more operations and can’t be optimized just for folding.

I know an answer to the question is “measure in your specific case”.
I was just curious whether, theoretically, there is a data structure optimized for such cases

foldr associates at the right, i.e. starts evaluating from the end of the list. If you use snoc lists then you can use foldr to start at the beginning, perhaps there’s other reasons why this wouldn’t work, but as I interpreted your needs, I think you can get it working with what I said :smiley:

1 Like

I’m not sure I get your suggestion.

Are you suggesting to use snoc for implementing append? Then we’re in case 1.
Are you suggesting to use snoc when folding? How so?

I was pointing out that you don’t necessarily need to reverse the list when using a Snoc list (reversed list, point 2 in your example), since foldr, by associating at the right will evaluate from the end of the list. Also if you are worried about asymptotics, fold is linear too so the overall complexity will still be linear even if you reverse the list.

Note that difference lists are also a good candidate depending on the access patterns of your specific use case.

if you use a reversed list and use a simple foldr, without reversing it, you will get

fold (:) "" (append "c" (append "b" (append "a" empty))) = "cba"

which is the contrary of what I’d like to get.

This is because foldr associates to the right, but it uses new elements as the left argument of the combining function

I know, but the evaluation order starts from the right most element of the list (exactly how you put the parenthesis in your example) which was what I thought you wanted… If you want

fold (:) "" (append "c" (append "b" (append "a" empty))) = "abc"

you need to use foldl (flip (:))

1 Like

foldl (flip (:)) works, thanks!

1 Like

You can use Data.Functor.Reverse to reorient your folds and traversals.

1 Like

But that’s just reverse

ghci> foldl (flip (:)) [] [1..5]
[5,4,3,2,1]

that’s just the example.

What I wanted is something like

newtype EntityEventStore a = EntityEventStore [a]
  deriving newtype (Functor)

empty :: EntityEventStore a
empty = EntityEventStore []

append :: a -> EntityEventStore a -> EntityEventStore a
append a (EntityEventStore store) = EntityEventStore (a : store)

instance Foldable EntityEventStore where
  foldr :: (a -> b -> b) -> b -> EntityEventStore a -> b
  foldr f initial (EntityEventStore store) = foldl (flip f) initial store

which is guess is isomorphic to Reverse [a]

Is this really that big of a deal? Folding is going to be linear in the list length anyway (well, linear at best, depending on the operation), so the reverse won’t have that big of an impact on the grand scale.

1 Like

If you’re using a snoc list, would you not want foldr' to reverse it? Similar to how you’d use foldl' to reverse a regular (cons) list. The feels like the simplest solution to me.

You can also use Dual, as in

instance Foldable EntityEventStore where
    foldMap f (EntityEventStore xs) = getDual $ foldMap (Dual . f) xs

The only reason I can imagine that the reversal is unacceptable is if you need the fold to be a lazy right fold, so I will assume this to be the case. I would use Seq for this. Some might suggest dlist, but when you fold it at the end you would end up pushing everything onto the stack first, to enter all the closures, which is virtually the same as reversing it anyway. Seq would not have the behavior, at least not if it’s already in WHNF.

1 Like

that is exactly what is used by Reverse

https://hackage.haskell.org/package/transformers-0.6.1.1/docs/src/Data.Functor.Reverse.html#line-130

1 Like

My standard solution to these kinds of situations is the Bag data type:

data Bag a = Empty | Single a | Pair !(Bag a) !(Bag a)
  deriving stock (Show, Functor, Foldable, Traversable)
instance Semigroup (Bag a) where
  Empty <> bag = bag
  bag <> Empty = bag
  b1 <> b2 = Pair b1 b2
instance Monoid (Bag a) where
  mempty = Empty

This supports efficient append on either end, by wrapping an element in the Single constructor and then adding it with the semigroup operation (<>). If you have other ways of constructing Bags that are not just mempty, Single and (<>), you can add additional constructors to Bag that reflect those efficiently; for example, you can add List [a] if you’re going to add multiple items in one go, saving expansion into Pair nodes and deferring the list re-allocation to the eventual fold.

Note that GHC can auto-derive Foldable and Traversable for this data type (using -XDeriveFoldable -XDeriveTraversable, which are auto-enabled in GHC2021 (the default since GHC 9.2)).

Perhaps you can build something slightly more efficient if you know that you’re only going to append, but it’s not going to be much better without fancy tricks, and this data type is very general. We like that in Haskell land. :slight_smile:

EDIT: it’s worth noting that the “redundant reverse” is still happening here, it’s just more hidden. If you append lots of times, you get a left-leaning tree of Pair nodes, and the foldr in the derived Foldable instance will need to walk over that “list” all the way to the leftmost child to find the first element. Then it comes back up from recursion, discovering all the other elements on the way. This is the same control flow pattern as reverse has.

Since it hasn’t been mentioned yet, a queue might also be a good fit.

To be honest this leaves a lot of questions open to decisively recommend a structure. A fold can be strict, lazy, perhaps short circuiting. Other requirements will also affect the choice, such as a bound on how long it takes to produce the first element or each element after that.