Survey for users of the tree-sitter Haskell grammar

hey @wenkokke, from your experience with writing queries, is there a clear favorite between flat and nested repeating structures? In particular, these two:

fun :: ∀ a . C1 => a -> C2 => ∀ b . b -> a
fun = f a b c d

where the first one has these nested and flat trees:

(forall (binders)
  (context (class)
    (fun (variable)
      (context (class)
        (forall (binders)
          (fun (variable) (variable)))))))

vs

(sigtype
  (forall (binders))
  (context (class))
  (fun (variable))
  (context (class))
  (forall (binders))
  (fun (variable))
  (variable))

and the second one:

(apply
  (apply
    (apply
      (apply
        (variable) -- f
        (variable))
      (variable))
    (variable))
  (variable))
(apply
  (variable) -- f
  (variable)
  (variable)
  (variable)
  (variable))

My intuition is that the flat structure is easier, e.g. I saw that plugins that do argument swapping rely on it; but since the nested structure is “correct” in terms of reduction semantics and associativity there might be other aspects that I have no insight into.

Is this relevant/significant for your cursorless efforts?

Flattening only the apply node into a list of a function and its arguments is workable and probably easier. But once you start extending this to infix operators and I can no longer reliably grab the function node by taking the first child, we’re in trouble.

The flattening of the type signature looks much scarier and looks like it loses a lot of useful information. If you think it retains the same structural information, I might be misunderstanding exactly what you’re flattening there.

Ah no, that wouldn’t work anyway. Infix operators have no arbitrary repetition like application.
The nested apply is much easier to implement, but if the flat structure was more desirable I would choose that instead.

Note however that it has always been flat in this grammar, see for example the tests!
It’s pretty likely that this wasn’t deterministic though.
In the rewrite, I have implemented the nested structure, mainly because I couldn’t get it to work reliably otherwise, but now I’m pretty confident I could go either way, so I need opinions!


I’m pretty certain they’re isomorphic, but I think it depends on the tooling how that translates into practice. forall, context and arrow are always right-associative, so the nested tree would be a chain anyway at the top level.
If you can come up with some queries or other examples that would be different between the two approaches, we can probably figure it out.
I’m not particularly happy about either of the alternatives, so any justification to choose one would be very welcome!


Anyway, it sounds like you’re not very certain about either case; maybe you can think about it for a bit and get back to me :relaxed: I can also push a branch to try out if that helps, or elaborate some more on whatever’s unclear.

nvim-treesitter highlights

Here’s an example where the nested structure requires multiple queries that the flat variant would achieve with only one

Is there any plan to add grammars for the other languages that we regularly use, in particular Cabal, but I suspect Core might be interesting as well.

I looked around for a treesitter grammar for Cabal about a year ago. I found nothing and ended up hacking something together… PA: Treesitter grammar for Cabal

No concrete plans, but I was thinking about adding Alex/Happy grammars. I’d be happy to collaborate though.

1 Like

Hi, we are using tree-sitter-haskell in static-ls at Mercury. This is in an effort to improve language intelligence tooling for extremely large projects (~ 10,000 modules). I made Haskell bindings for tree-sitter that generates a full typed tree with traversals in tree-sitter-simple.

4 Likes