I have just finished rewriting large parts of the tree-sitter grammar to deal with some bugs and want to use this opportunity to get some user feedback about the structure of the produced AST.
I’m not a direct consumer of the grammar: I don’t maintain any tools that use the tree-sitter API, and I only use it rather superficially, for benefits like special highlighting of constructors and class names in neovim. I therefore lack insight into what makes a good node structure, so for many constructs I’m essentially guessing what would be sensible.
If there’s anyone here who has experience and strong opinions about this, I would be very grateful for some advice.
To make the conversation a bit simpler, here’s a list of questions that I’ve noted down along the way, though it’s far from complete:
- Is there a point to having dedicated nodes for parens, like `type_parens`? Should parens be part of `operator` for prefix notation, or should they have a separate node, or nothing?
- Is it useful to have nodes wrapping variables and constructors, like `exp_name`/`pat_name`/`type_name`, that distinguish the namespaces from each other?
- Is `(qualified_operator (module) (constructor_operator))` better than:
  - `(qualified_constructor_operator (module) (constructor_operator))`
  - `(qualified (module) (constructor_operator))`
- Should qualifying modules look like:
  - `(qualified_variable (module (module_segment) (module_segment)) (variable))`
  - `(qualified_variable (module) (module) (variable))`
- Should there be nodes for empty layouts, like `(alts)` for an empty case?
- Should a comment right after `where` be inside of the `declarations` node?
- I took great care to get these different binding variants to parse correctly:
  - `fun = exp` → `(function (variable) (exp_...))`
  - `fun a b = exp` → `(function (variable) (patterns (pat_name (variable)) (pat_name (variable))) (exp_...))`
  - `A a = exp` → `(bind (pat_apply (pat_name (constructor)) (pat_name (variable))) (exp_...))`
  - `a :: A = exp` → `(bind (pat_annotated (pat_name) (type)) (exp))`

  Is there much value to this? In particular, if the first form were parsed like the third one, the grammar would be a bit less complex, but we’d get a pattern for the function name: `(bind (pat_name (variable)) (exp_...))`. Similarly, `signature` could be merged with `pat_annotated`, but that would also introduce a `pat_name` in `signature`.
- Infix function declarations like `a <> b = exp` are represented as `(function (infix ...))`, because they can have additional parameters, like `(a <> b) c d = exp`, and this structure makes parsing easier. Is it a problem that `infix` is not specific? Should it be `(function (function_infix ...))` or something?
- `where` doesn’t have a node wrapping the `binds`, because it’s always in the same place, unlike a `let`. But a `let` only contains `binds` as well, so it might as well skip that node. But then you couldn’t select all local bindings.
- Should linear arrows (especially the operator versions like `->.`) share the node name `type_fun` with regular functions? Is it enough to match on the arrow or should it be called `type_linear_fun`, and should the modifier version also be in there?
- Should it be `(data_instance (newtype))` or `(newtype_instance)`?
- Is it useful not to use `type_name` for the head tycon of a prefix constructor declaration?
- Is it ok to have `type_variable` without nested `variable`? For expressions and patterns, the structure is `({exp,pat}_name (variable))`, while types use `(type_name (type_variable))`.
- For type families, I’ve tried an approach with very general terms that would require more context nodes in queries, but would be less verbose. Which is better?
- Should all layouts have a container node, like `declarations` or `binds`? E.g. type family equations have no container. Should `binds` be disambiguated from type variable binds? Should that term be binders or bindings?
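
To make the question about qualifying modules a bit more concrete, here is a minimal sketch in Python of what a consumer would see with each of the two layouts. It is only a toy model: `parse_sexp` and `find_all` are hypothetical helpers made up for this illustration and are not part of the tree-sitter API. They just read the S-expression notation used in the list and assume every node is written with its own parentheses.

```python
# Toy model only: parses the S-expression notation used in this post into
# (name, children) tuples. It does not use the tree-sitter API.

def parse_sexp(text):
    """Parse one S-expression such as "(a (b) (c))" into a nested tuple."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        name = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # consume the closing ")"
        return (name, tuple(children))

    tree = node()
    assert pos == len(tokens), "trailing input after the expression"
    return tree


def find_all(tree, name):
    """Return every node called `name`, in depth-first order."""
    own_name, children = tree
    found = [tree] if own_name == name else []
    for child in children:
        found.extend(find_all(child, name))
    return found


nested = parse_sexp(
    "(qualified_variable (module (module_segment) (module_segment)) (variable))"
)
flat = parse_sexp("(qualified_variable (module) (module) (variable))")

# Nested layout: a single `module` node spans the whole qualifier.
nested_modules = find_all(nested, "module")
# Flat layout: one `module` node per segment.
flat_modules = find_all(flat, "module")
```

In this model, the nested layout gives a single `module` node whose two children are the segments, so the whole qualifier can be selected at once. The flat layout gives two sibling `module` nodes, which a consumer would have to join itself to get the full module path.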