Functional representations of Graphs

I haven’t dug too deep into graphs when it comes to programming; I’ve never really needed to, I guess. I’ve taken some time to read up on graphs and google for Haskell equivalents… only to find that most of the resources are old. Age alone doesn’t mean they’re bitrotted, of course, but they were also a little unclear.

I’ve seen something like

data Node a = MkNode a [Node a]
-- where [Node a] is the list of nodes adjacent to the current node
type Graph a = [Node a]

But it’s been called to my attention that this is technically not a graph, but a tree that only gains cycles through sharing; a tree, after all, is just a directed acyclic graph in which every node is connected. This is also basically the implementation that containers uses. (Though for whatever reason, containers’ graphs might not be so good…)

However I’ve seen other things like

http://www.cs.tufts.edu/~nr/pubs/zipcfg.pdf

http://www.cs.tufts.edu/~nr/pubs/hoopl10.pdf

Other resources, like Stack Overflow answers, feel like they get ahead of themselves. One answer I found just creates a bunch of type aliases; I understand that’s for performance reasons, but it doesn’t make the subject any easier to understand.

From what I’ve read besides the aforementioned representation, there is also

data Edge a = MkEdge a a
data Graph a = MkGraph [a] [Edge a]

This stores the nodes in the first list and the edges between them in the second. I understand that the types I’m making here are equivalent to tuples, but I feel they make the intent of what I’m writing easier to see.
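
For example, getting a node’s successors out of this form is just a filter over the edge list (a quick sketch of my own, not from any library):

-- successors of a node: the targets of all edges whose source matches it
neighbours :: Eq a => Graph a -> a -> [a]
neighbours (MkGraph _ es) x = [t | MkEdge s t <- es, s == x]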

I’ve also found a couple of interesting explanations of why graphs are perhaps not as simple to represent with algebraic data types as something like trees: graphs are not algebraic (though algebraic-graphs obviously has a representation that makes them so, and it doesn’t look that different from the first representation I posted…), and ADTs are better suited to tree-shaped types. I’ve also found answers that allude to tying the knot to make graphs a recursive data type, but somehow I’ve never seen such a type written out, only allusions to it.

I was a little disappointed that there wasn’t a neat resource that compiled different representations of graphs in Haskell and simply showed the easier-to-read version of each definition first. A lot of these resources start talking about and showing other things before presenting their representation in an understandable way. A big reason I’m not following them easily is probably that I’m a little rusty with Haskell, but I’m not sure.

So that’s the reason I’ve made this little thread. I’m interested in people’s responses, and I guess I’m kind of hoping someone else will do the work of compiling a list of interesting graph representations. I’m also hoping Google will rank this higher than the hodge-podge Stack Overflow answers I got, so that when other people look this up (or when I forget and look it up again in the future) there’s a more up-to-date resource that might be easier to understand.

I’ve renamed this post to reflect that I’m interested in any different functional representation of graphs, even if they’re in other languages… though, I’d be more interested in something that doesn’t really need dependent types to work!

8 Likes

Graphs are a complicated data structure, even outside functional programming. What operations do you need to be fast? Are you looking for an immutable graph or do you want to make changes? Do you only want to add new vertices and edges or also remove them? Are you expecting the graph to be dense or sparse? Do you expect to query the graph much more than you change it? All of these questions can influence which representation is best for you.

In Haskell, a major use case is compilers themselves. Graphs are used to represent ASTs (a bit of a misnomer there, since with sharing they are really graphs), type constraints, data flow, register conflicts, etc.

One observation I found interesting is the fact that the syntax of a simple functional language with algebraic data types and let-bindings is essentially a graph representation. The algebraic data types can model tree-shaped fragments of the graph and let bindings and variables can represent the cycles or back references in the graph.
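
A minimal sketch of what I mean (all the names here are made up for illustration):

type Name = String

-- Tree-shaped fragments are plain constructors; let-bound variables stand in
-- for back references, which is where the cycles live.
data Expr
  = Var Name                 -- back reference to a bound node
  | Node Name [Expr]         -- a labelled node and its successors
  | Let [(Name, Expr)] Expr  -- (mutually recursive) bindings introduce sharing and cycles

-- a two-node cycle: a -> b and b -> a
cycleAB :: Expr
cycleAB =
  Let [ ("a", Node "a" [Var "b"])
      , ("b", Node "b" [Var "a"]) ]
      (Var "a")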

Generally, I think most graphs in Haskell are built in an ad hoc way using arrays or even lists. Every element of the array is a node: it can hold some data payload, and it can index into the array itself to represent edges to other nodes.
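
Something like this, say (again, just a sketch):

import Data.Array (Array, listArray, (!))

-- each array element holds a payload plus the indices of its successors
type AdHocGraph a = Array Int (a, [Int])

example :: AdHocGraph Char
example = listArray (0, 2)
  [ ('a', [1, 2])  -- node 0 points to nodes 1 and 2
  , ('b', [2])     -- node 1 points to node 2
  , ('c', [0])     -- node 2 points back to node 0, forming a cycle
  ]

successors :: AdHocGraph a -> Int -> [Int]
successors g i = snd (g ! i)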

Do you have a specific use case in mind?

8 Likes

Nope! No specific use case in mind. I was just trying to catch myself back up on graphs in Haskell because I felt like I used to know how they work. It turns out I never actually knew, and they ended up being a little more complex than I thought. I was just looking for a surface-level, easy-to-understand summary of a few popular graph representations in Haskell and their use cases or benefits/problems. I’m also a little curious about the remark that containers’ implementation is bad, or maybe just its interface?

1 Like

Looking at the source, they do still use the same data type under the hood, so it is indeed just the interface.
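
For reference, the core types in Data.Graph look roughly like this (paraphrasing the source, so check there for the exact definitions):

type Vertex  = Int
type Table a = Array Vertex a
type Graph   = Table [Vertex]  -- i.e. an adjacency array: Array Int [Int]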

1 Like

My supervisor and I wrote a paper on functional graphs last year, and in it we advocated for using the simple a → [a] type for graphs (POPL paper: Formalising Graph Algorithms with Coinduction - Donnacha Oisín Kidney).

Of course, there are always tradeoffs with different representations, so this representation isn’t perfect for everything (it’s good for simple search algorithms). I think it’s a good default, though.
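
For example, simple reachability falls out of the a → [a] type pretty directly (a quick sketch, not code from the paper):

import qualified Data.Set as Set

-- all nodes reachable from a starting node, in depth-first order
reachable :: Ord a => (a -> [a]) -> a -> [a]
reachable neighbours start = go Set.empty [start]
  where
    go _    []     = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) (neighbours x ++ xs)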

There’s also been a lot of other work over the past few years on graph representations. A few that come to mind are:

The last one in particular (algebraic graphs with class) was the primary inspiration for the FGAC paper.

6 Likes

Most of the paper links in your OP are timing out for me, so apologies if you’ve seen this before.

My favourite graph representation is from Algebraic Graphs with Class (Functional Pearl), which basically says that a graph is either empty, a singleton, the overlay of two graphs, or the result of connecting every node in one graph to every node in another.

The ADT implementation of this idea is probably very inefficient to actually write graph algorithms with, but presumably the classy version can be implemented in a more performant way under the hood. Perhaps using adjacency lists or matrices in ST or something?

4 Likes

This is really interesting! In the spirit of my original post wanting something easy to glance over in one place, without needing to sift through papers or other resources, the ADT equivalent from your paper seems to essentially be the following. (Sorry, I couldn’t figure out what you meant by a -> [a]; I could only see it as a -> [(Int, a)].)

Formalising Graph Algorithms with Coinduction

type Graph a = a -> [Adj Int a]   -- a node maps to its weighted out-edges
data Adj w a = MkAdj w a          -- an edge's weight and its target node
data Vert = A | B | C | D

graph :: Graph Vert
graph A = [MkAdj 7 B, MkAdj 2 C]
graph B = [MkAdj 1 C]
graph C = [MkAdj 3 D, MkAdj 1 B]
graph D = [MkAdj 5 B]

Interestingly enough, a graph in this representation ends up being a function rather than simply a data structure, which I think is fitting, though it might present some problems in some cases? Maybe graph analysis? I’m unsure; anyone else is probably more of an authority on this than me :sweat_smile:

Let a Thousand Flowers Bloom

Amazing name. This uses the strangest representation I have seen yet:

data EdgeGraph a = 
    Empty
  | Edge a
  | Overlay (EdgeGraph a) (EdgeGraph a)
  | Into (EdgeGraph a) (EdgeGraph a)
  | Pits (EdgeGraph a) (EdgeGraph a)
  | Tips (EdgeGraph a) (EdgeGraph a)

This representation seems to be more interested in dealing with compositions of different graphs, and as the paper mentions:

Informally, Empty represents the empty graph while Edge x is a single edge with the label x. The Overlay constructor joins two graphs by unifying all nodes that have overlapping incoming or outgoing edges. Into, Pits and Tips each extend the Overlay operator in a different way. Into additionally connects each outgoing node in the first graph to each incoming node in the second. Pits connects each outgoing node in the first graph to each outgoing node in the second. And finally, Tips does the same as Pits but for incoming nodes. These operations are closed over the set of edge graphs and can be used to construct any edge graph. Moreover, each edge graph has a unique representation from the operations up to their equational laws which will be introduced later.

This paper is really easy to read, and it feels like it and the paper you worked on should be the ones people interested in this topic read first. It’s really interesting that their graphs are more of an edge-focused construction, and I’m not really sure of its upsides or downsides.

An initial-algebra approach to directed acyclic graphs

This one has a representation that is never outright expressed in programmatic syntax, which is irritating. Since it requires dependent types, the following approximation is in Agda:

data Graph : Nat -> Nat -> Set where
  Empty  : Graph 0 0
  Vert   : (m n : Nat) -> Graph m n
  Beside : {m n p q : Nat} -> Graph m n -> Graph p q -> Graph (m + p) (n + q)
  Before : {m n p : Nat}   -> Graph m n -> Graph n p -> Graph m p
  Swap   : (m n : Nat) -> Graph (m + n) (n + m)

In a previous edit I attempted to write a Haskell approximation, but it was much too horrible and wholly inaccurate. As a summary: a graph has type Graph (m : Nat) (n : Nat), where m is the number of inputs and n the number of outputs. Vert m n is a single vertex with m inputs and n outputs. Beside puts two graphs in parallel, even though they remain “disconnected”, so Beside (Vert 1 2) (Vert 2 1) is a Graph 3 3: three inputs and three outputs. Before x y connects the outputs of the first graph x to the inputs of the second graph y (which is why their counts have to match). Empty is the empty graph. Swap m n has type Graph (m + n) (n + m) and crosses the two bundles of wires over, so Swap 3 2 results in a Graph 5 5, meaning 5 inputs and 5 outputs. This is just my best approximation of what I’ve read, and probably a little inaccurate; it feels like you really do need something like dependent types to do this properly. Also, this representation is only for directed acyclic graphs (DAGs), not general graphs.

Algebraic graphs with class

Another pleasantly easy-to-read paper. It uses this representation:

data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)

Which is pretty simple in comparison:

Here Empty and Vertex construct the empty and single-vertex graphs, respectively; Overlay composes two graphs by taking the union of their vertices and edges, and Connect is similar to Overlay but also creates edges between vertices of the two graphs, see Fig. 1 for examples. The overlay and connect operations have two important properties: (i) they are closed on the set of graphs, i.e. are total functions, and (ii) they can be used to construct any graph starting from the empty and single-vertex graphs. For example, Connect (Vertex 1) (Vertex 2) is the graph with two vertices {1, 2} and a single edge (1, 2). Malformed graphs, such as G [1] [(1,2)], cannot be expressed in this core language.
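
Just to check my own understanding, a few small constructions in this algebra would look something like this (my own sketch; the actual algebraic-graphs library ships its own smart constructors with similar names):

-- a single directed edge from x to y
edge :: a -> a -> Graph a
edge x y = Connect (Vertex x) (Vertex y)

-- a collection of disconnected vertices
vertices :: [a] -> Graph a
vertices = foldr (Overlay . Vertex) Empty

-- every vertex connected to every later vertex
clique :: [a] -> Graph a
clique = foldr (Connect . Vertex) Empty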

Thank you for the help, and your very useful paper! However, the point of making this post was mostly my poor comprehension of graphs, and scouring papers without finding concrete syntax that told me something I’d understand. I find that simply linking to papers is a bit obtuse and doesn’t really help me much; having some surface-level information (I know mine is inaccurate, this is what I get from trying to understand these papers haha) is better, because these papers feel like they obscure what they do a little. Especially that initial-algebra approach paper…

So, please, if anyone wishes to comment in the future, the original post was me attempting to ask for Haskell syntax for this. If you do link to a paper, it’d be immensely helpful if you also type out an approximate representation of their Graphs as a Haskell ADT so that I don’t have to scan for it or reverse engineer it somehow.

4 Likes

Nice comment! It was mentioned in a previous reply, and I’ve just commented on it earlier :slight_smile:

I have also read something about graphs being represented as comonads as in

I guess this might be similar to the zipper approach adapted in the previously mentioned

and the frankly great paper @oidsk mentioned:

It looks like, because of the cyclical nature of graphs, it might be more interesting to look at coinductive/comonadic representations of them… I’d never made this connection between graphs and comonads before, but I guess it’s a cute little idea.

I think it was more meant to be thought of as a generalization of Int -> [Int] (i.e., a is some enumerable data type) which is essentially the same as Vector [Int] and is generally called an adjacency list.
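
In other words, roughly this correspondence (a quick sketch):

import qualified Data.Vector as V

type AdjList = V.Vector [Int]

-- the function form and the adjacency-list form carry the same information
toFun :: AdjList -> (Int -> [Int])
toFun g = (g V.!)

fromFun :: Int -> (Int -> [Int]) -> AdjList
fromFun n f = V.generate n f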

3 Likes

Is there a Haskell implementation of your graph type? I imagine there could be, with this type:

type Graph a w = a -> MonoidalMap a (Min w)

Or possibly even:

type Graph a w = Map a (MonoidalMap a (Min w))

The latter makes the relation to adjacency maps particularly clear. But I’m unclear about performance and maybe other implementation details I’m not yet thinking about.
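
For concreteness, the A/B/C/D example from upthread would look something like this with the first type (just a sketch; MonoidalMap is from monoidal-containers and Min from Data.Semigroup, and I haven’t thought about performance at all):

import           Data.Map.Monoidal (MonoidalMap)
import qualified Data.Map.Monoidal as MMap
import           Data.Semigroup (Min (..))

type Graph a w = a -> MonoidalMap a (Min w)

data Vert = A | B | C | D deriving (Eq, Ord, Show)

-- each node maps to its successors, keeping the minimum weight per edge
graph :: Graph Vert Int
graph A = MMap.fromList [(B, Min 7), (C, Min 2)]
graph B = MMap.fromList [(C, Min 1)]
graph C = MMap.fromList [(D, Min 3), (B, Min 1)]
graph D = MMap.fromList [(B, Min 5)]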

1 Like

While this representation is clear conceptually, it hides the complexity @jaror alluded to: Without let-bindings, there is no reason the adjacent nodes mentioned in one Node are actually the same as the nodes in other Node values in the Graph, unless each label a is unique.
Perhaps even clearer would be

data Node a = MkNode {label :: a, adjacent :: [Node a]}

In a language like C, the adjacent field would be an array of pointers [*], and one would first allocate all nodes; once their memory addresses are known, one can populate the arrays. Here the memory locations serve as a kind of global dictionary of all nodes. In contrast, Haskell hides memory locations from the programmer, which is why some graph implementations (like fgl) internally use a different kind of dictionary (e.g., IntMap) for the nodes.

Following this observation, we can construct a full graph on three nodes as follows.

full3 :: a -> a -> a -> Graph a
full3 x y z = let
  nodeX = MkNode x [nodeY,nodeZ]
  nodeY = MkNode y [nodeX,nodeZ]
  nodeZ = MkNode z [nodeX,nodeY]
  in [nodeX,nodeY,nodeZ]

In the let block, you may assume that nodeY is already defined when defining nodeX. This technique is sometimes referred to as “tying the knot”.
The downside of this representation is that while following the adjacent nodes, you will not notice when you’re running around in circles, as a global dictionary is missing.
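
To make that downside concrete: a naive traversal over the knot-tied structure never terminates, because nothing records which nodes have already been visited (a sketch using the record Node above):

labels :: Node a -> [a]
labels (MkNode x adj) = x : concatMap labels adj

-- take 5 (labels (head (full3 'x' 'y' 'z')))   -- fine, thanks to laziness
-- length (labels (head (full3 'x' 'y' 'z')))   -- loops forever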

[*] In the containers implementation, it is an array of array indices.

The interface exposes too much. If you choose not to use the provided functions for graph construction, you can end up with an ill-defined structure. In contrast, fgl deliberately hides the implementation details.

2 Likes

Thank you for the very helpful answer! I hadn’t really focused on what @jaror mentioned about this, and this makes it much clearer in my mind!

You’re right that the representation I showed doesn’t force the labels to be unique, so you could run into problems… I had been assuming the labels would need to be unique. Comparing memory locations, as in C, to ensure uniqueness is quite a funny way to do it, but I suppose it would work :laughing:. An IntMap like you mentioned would enforce something like that, maybe something like IntMap [Int]? Interestingly, though, the IntMap fgl uses isn’t shaped like this; what I described seems closer to the containers implementation. I suppose it’s because fgl stores the label and distinguishes between incoming and outgoing links for each node. In fact, I hadn’t really studied fgl’s implementation at all; from a surface-level view I assumed it was similar to the [Node] representation I gave earlier, because of the first type definitions in Data.Graph.Inductive.Graph, but I see I was mistaken.
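
(For my own future reference, the relevant shapes in fgl seem to be roughly the following; I’m paraphrasing from a quick look at the source, so the details may be a bit off.)

-- approximate shapes from Data.Graph.Inductive.Graph
type Node        = Int
type Adj b       = [(b, Node)]              -- labelled links to neighbouring nodes
type Context a b = (Adj b, Node, a, Adj b)  -- (incoming, node, label, outgoing)

-- the Data.Graph.Inductive.PatriciaTree instance then stores contexts in an
-- IntMap keyed by node, roughly: IntMap (IntMap [b], a, IntMap [b])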

By the way, if the containers implementation is so bad and that’s well known, why hasn’t it been fixed? It couldn’t be because of backwards compatibility, could it? I mean, someone already showed how to fix it, so wouldn’t they just have to adopt a similar structure?

Very small tangent: after you mentioned jaror’s explanation, I realised this is also similar to my earlier post about the coinductive nature of graphs. That’s perhaps a little obvious, given that cyclic graphs unfold into infinite structures and coinductive representations are a natural way to build them, but it’s still an interesting tidbit for me, because I never really made that connection myself…

I’ve pretty much defined the same wrapper API, too. Although I preserved the node type parameter.

I think graphs are quite broad, unlike maps or sets, so it’s harder to define an obvious, broadly applicable API. I can see why Data.Graph is non-committal and pretty much just gives you a few raw algorithms, enough to get you going. Though I’d be on board with making what is there more type-safe.

2 Likes

Ah, alright. I guess it’s a bit of a “good enough as it is” thing. I haven’t used containers’ graphs at all, so I don’t have any experience with how good or bad they are. I just find it curious that I can’t see any discussion about the quality of containers’ graph module on their GitHub or here. I think most people go for a different library for graphs, like alga or fgl, right?

I don’t know about ‘most people’, but the PureScript compiler uses Data.Graph for a few things. Hasn’t ever come up as a hot path in profiling.

1 Like