I have just finished rewriting large parts of the tree-sitter grammar to deal with some bugs and want to use this opportunity to get some user feedback about the structure of the produced AST.
I’m not a direct consumer of the grammar: I don’t maintain any tools that use the tree-sitter API, and I only use it superficially, for things like highlighting constructors and class names in Neovim. That means I lack insight into what makes a good node structure, so for many constructs I’m essentially guessing what would be sensible.
If there’s anyone here who has experience and strong opinions about this, I would be very grateful for some advice.
To make the conversation a bit simpler, here’s a list of questions that I’ve noted down along the way, though it’s far from complete:
- Is there a point to having dedicated nodes for parens, like `type_parens`? Should parens be part of `operator` for prefix notation, or should they have a separate node, or nothing?
- Is it useful to have nodes wrapping variables and constructors, like `exp_name`/`pat_name`/`type_name`, that distinguish the namespaces from each other?
- Is `(qualified_operator (module) (constructor_operator))` better than:
  - `(qualified_constructor_operator (module) (constructor_operator))`
  - `(qualified (module) (constructor_operator))`
- Should qualifying modules look like:
  - `(qualified_variable (module (module_segment) (module_segment)) (variable))`
  - `(qualified_variable (module) (module) (variable))`
- Should there be nodes for empty layouts, like `(alts)` for an empty case?
- Should a comment right after `where` be inside of the `declarations` node?
- I took great care to get these different binding variants to parse correctly:
  - `fun = exp` → `(function (variable) (exp_...))`
  - `fun a b = exp` → `(function (variable) (patterns (pat_name (variable)) (pat_name (variable))) (exp_...))`
  - `A a = exp` → `(bind (pat_apply (pat_name (constructor)) (pat_name (variable))) (exp_...))`
  - `a :: A = exp` → `(bind (pat_annotated (pat_name) (type)) (exp))`

  Is there much value to this? In particular, if the first form were parsed like the third one, the grammar would be a bit less complex, but we’d get a pattern for the function name: `(bind (pat_name (variable)) (exp_...))`. Similarly, `signature` could be merged with `pat_annotated`, but it would also introduce a `pat_name` in `signature`.
- Infix function declarations like `a <> b = exp` are represented as `(function (infix ...))`, because they can have additional parameters, like `(a <> b) c d = exp`, and this structure makes parsing easier. Is it a problem that `infix` is not specific? Should it be `(function (function_infix ...))` or something?
- `where` doesn’t have a node wrapping the `binds`, because it’s always in the same place, unlike a `let`. But a `let` only contains `binds` as well, so it might as well skip that node. But then you couldn’t select all local bindings.
- Should linear arrows (especially the operator versions like `->.`) share the node name `type_fun` with regular functions? Is it enough to match on the arrow or should it be called `type_linear_fun`, and should the modifier version also be in there?
- Should it be `(data_instance (newtype))` or `(newtype_instance)`?
- Is it useful not to use `type_name` for the head tycon of a prefix constructor declaration?
- Is it ok to have `type_variable` without nested `variable`? For expressions and patterns, the structure is `({exp,pat}_name (variable))`, while types use `(type_name (type_variable))`.
- For type families, I’ve tried an approach with very general terms that would require more context nodes in queries, but would be less verbose. Which is better?
- Should all layouts have a container node, like `declarations` or `binds`? E.g. type family equations have no container. Should `binds` be disambiguated from type variable binds? Should that term be binders or bindings?
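
To make the binding variants from the list concrete, here is a minimal Haskell sketch with one declaration per form, which could be fed to the parser to compare trees. All names in it (`A`, `plain`, `apply`, `unwrapped`, `annotated`, `<+>`) are placeholders invented for illustration, and it assumes that the annotated pattern binding requires the `ScopedTypeVariables` extension. The node names in the comments are the ones quoted in the list, not verified parser output.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

-- Placeholder constructor, only needed for the pattern binding below.
newtype A = A Int

-- `fun = exp`: no parameters; the list gives (function (variable) ...).
plain :: Int
plain = 1

-- `fun a b = exp`: the parameters are wrapped in a (patterns ...) node.
apply :: Int -> Int -> Int
apply a b = a + b

-- `A a = exp`: constructor pattern on the left; the list gives (bind (pat_apply ...) ...).
unwrapped :: Int
A unwrapped = A 3

-- `a :: A = exp`: annotated pattern; the list gives (bind (pat_annotated ...) ...).
annotated :: Int = 4

-- `(a <> b) c = exp`: infix declaration with one additional parameter.
(<+>) :: Int -> Int -> Int -> Int
(a <+> b) c = a + b + c

main :: IO ()
main = print (plain, apply 1 2, unwrapped, annotated, (1 <+> 2) 3)
```

The first and third declarations are the pair the question is about: both have a single name-like thing on the left of `=`, so parsing `plain = 1` as a `bind` would only change whether the name is wrapped in a `pat_name`.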