Is function highlighting useful with tree-sitter?

I am trying to use tree-sitter with Neovim. Initially it was appealing, since regex-based highlighting often broke for me, and I thought queries on a parse tree would work better.

But it seems that Haskell is hard with tree-sitter, since it can’t know if something is a function or not without doing type checking.

For example if I have

int :: Int -> Int
int = id

and I do :Inspect on the latter int, I get @function.haskell.

But if I change to:

type Int2Int = Int -> Int

int :: Int2Int
int = id

the color changes and it now looks like any other term.

This makes me wonder, does tree-sitter even make sense with Haskell? Are we going to highlight using LSP at some point? Then I suppose Haskellers could skip tree-sitter entirely.

I suppose you could make the point that highlighting doesn’t need to be perfect. But in that case, you could equally argue that regexes are good enough. They seem to have fewer moving parts.

Or should I, as the title suggests, just color functions the same as other terms, and enjoy highlighting on other elements?

That’s called semantic highlighting, and it is part of the LSP spec.
HLS PR: Implement semantic tokens plugin to support semantic highlighting (textDocument/semanticTokens/full) by soulomoon · Pull Request #3892 · haskell/haskell-language-server

Many language servers support semantic highlighting, but, depending on your editor, the theme/color scheme in use must also support (i.e. color) these tokens, which means that the highlighting of some “smaller” languages only works with special themes.

Well, all or nothing seems a bit extreme, no? tree-sitter covers an awful lot correctly at the moment, but yes, it does miss a bit and thus isn’t perfect. Despite this, I still get a lot of value from it: it drives highlighting in my editor (Helix), and it also helps with code navigation. In Helix I can press Alt-o and Helix will grow my selection to the parent node in the AST. I use this all the time to select sub-expressions and extract them into separate functions, for example. This would be a lot more complicated with regular expressions!

I’m surprised to hear that there are people who want to color identifiers of function type differently from other identifiers! What benefit do you get from doing so?

The problem is that Treesitter does the same things that LSPs can do (your selection example is covered by the Selection Range Request), but not all LSPs support the same features. And Treesitter is too limited to be used instead of LSP (yes, I know that there are LSPs implemented using Treesitter, and they are all “better than nothing”). So the sooner Treesitter is dropped in favor of LSPs, the better (Emacs only just beginning to adopt Treesitter doesn’t help), as that “forces” the LSPs to implement the missing features and lets all editors use the same features.
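
For anyone curious what the Selection Range Request gives an editor to work with: the response is a chain of nested ranges, each pointing at its parent. Below is a minimal TypeScript sketch of "grow the selection" on top of that chain. The three interfaces mirror the spec's `Position`, `Range` and `SelectionRange` (renamed with an `Lsp` prefix here to avoid clashing with browser globals); `growSelection` and the other helpers are hypothetical names, not part of any editor or server.

```typescript
// Shapes mirroring Position, Range and SelectionRange from the LSP specification.
interface LspPosition { line: number; character: number; }
interface LspRange { start: LspPosition; end: LspPosition; }
interface LspSelectionRange { range: LspRange; parent?: LspSelectionRange; }

// True when position a is at or before position b.
function atOrBefore(a: LspPosition, b: LspPosition): boolean {
  return a.line < b.line || (a.line === b.line && a.character <= b.character);
}

// True when outer encloses inner and is not identical to it.
function strictlyContains(outer: LspRange, inner: LspRange): boolean {
  const encloses = atOrBefore(outer.start, inner.start) && atOrBefore(inner.end, outer.end);
  const same = atOrBefore(inner.start, outer.start) && atOrBefore(outer.end, inner.end);
  return encloses && !same;
}

// Hypothetical helper: walk outwards along the parent chain and return the first
// range that is larger than the current selection, or the selection itself if
// nothing larger exists.
function growSelection(current: LspRange, innermost: LspSelectionRange): LspRange {
  let node: LspSelectionRange | undefined = innermost;
  while (node !== undefined) {
    if (strictlyContains(node.range, current)) {
      return node.range;
    }
    node = node.parent;
  }
  return current;
}
```

Whether the nested ranges line up with the nodes you would expect (sub-expression, binding, declaration) is entirely up to the server, which is the same "not all LSPs support the same features" caveat.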

If I’m allowed to hazard a guess, I’d say: knowing whether an identifier is a function or not ;)
As always: as soon as you are used to a feature (in other LSPs), you miss it if an (LSP) implementation lacks that feature.

But in Haskell you can only know that after type checking. So is it really a good idea to already start trying to guess it after only parsing?

I think knowing if something is a free or bound variable in an expression could be useful sometimes, certainly in the case of shadowing (e.g., is this id the function (free), or some id I bound earlier, say in do notation (bound)?).

I know (and it isn’t the only language with that problem); that’s why you have to use the LSP for that and can’t use e.g. Treesitter.

That seems reasonable, but I don’t understand what it has to do with knowing if something is of function type.

Oh right. Yeah, I had misunderstood that part of the original post.

Well, in “currying” languages, you can see at a glance when you have forgotten to pass the last argument.
Like (a bit contrived ;):

f x y = x * y
four = f 2

Ha! Because four would be coloured “as a function”? The logical conclusion of this idea is a variety of colours for a variety of different type classifications, and that would indeed imply full type checking …

The (predefined) semantic tokens are already quite specific (see “Semantic Tokens” in the LSP spec):

export enum SemanticTokenTypes {
	namespace = 'namespace',
	/**
	 * Represents a generic type. Acts as a fallback for types which
	 * can't be mapped to a specific type like class or enum.
	 */
	type = 'type',
	class = 'class',
	enum = 'enum',
	interface = 'interface',
	struct = 'struct',
	typeParameter = 'typeParameter',
	parameter = 'parameter',
	variable = 'variable',
	property = 'property',
	enumMember = 'enumMember',
	event = 'event',
	function = 'function',
	method = 'method',
	macro = 'macro',
	keyword = 'keyword',
	modifier = 'modifier',
	comment = 'comment',
	string = 'string',
	number = 'number',
	regexp = 'regexp',
	operator = 'operator',
	decorator = 'decorator'
}
export enum SemanticTokenModifiers {
	declaration = 'declaration',
	definition = 'definition',
	readonly = 'readonly',
	static = 'static',
	deprecated = 'deprecated',
	abstract = 'abstract',
	async = 'async',
	modification = 'modification',
	documentation = 'documentation',
	defaultLibrary = 'defaultLibrary'
}

Ah I see. That makes sense. There’s a predefined set of tokens, and they don’t necessarily correspond cleanly to Haskell entities.

Exactly. You can define your own (token types are just indices into an array; see “Integer Encoding for Tokens” in the LSP spec), but the client has to support them somehow.
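
To make the "indices into an array" part concrete, here is a rough TypeScript sketch of how a client could decode the integer stream. The five-integers-per-token layout (deltaLine, deltaStartChar, length, tokenType, tokenModifiers bitset) is what the spec describes; `decodeSemanticTokens`, `DecodedToken`, and the example legend are made-up names and values for illustration, not what HLS or any particular client actually uses.

```typescript
// Legend the server announces up front; token types and modifiers in the
// data stream are indices (and bit positions) into these two arrays.
interface SemanticTokensLegend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

// Hypothetical decoded form with absolute positions and resolved names.
interface DecodedToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: string;
  tokenModifiers: string[];
}

function decodeSemanticTokens(data: number[], legend: SemanticTokensLegend): DecodedToken[] {
  const tokens: DecodedToken[] = [];
  let line = 0;
  let startChar = 0;
  // Each token occupies five consecutive integers.
  for (let i = 0; i + 4 < data.length; i += 5) {
    const deltaLine = data[i];
    const deltaStart = data[i + 1];
    line += deltaLine;
    // On the same line the start is relative to the previous token;
    // on a new line it is relative to column 0.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    const typeIndex = data[i + 3];
    const modifierBits = data[i + 4];
    tokens.push({
      line,
      startChar,
      length: data[i + 2],
      // A client that does not know an index can only fall back to something generic.
      tokenType: typeIndex < legend.tokenTypes.length ? legend.tokenTypes[typeIndex] : "unknown",
      tokenModifiers: legend.tokenModifiers.filter((_, bit) => (modifierBits & (1 << bit)) !== 0),
    });
  }
  return tokens;
}

// Assumed legend and data: `f` on line 0 and `four` on line 1, both tagged
// with index 1 ("function"), the first one also carrying the "declaration" bit.
const exampleLegend: SemanticTokensLegend = {
  tokenTypes: ["variable", "function", "type"],
  tokenModifiers: ["declaration", "definition"],
};
const exampleTokens = decodeSemanticTokens([0, 0, 1, 1, 1, 1, 0, 4, 1, 0], exampleLegend);
console.log(exampleTokens);
```

A server can put any string it likes into `tokenTypes`, but a theme that has no color for that string will simply not show it, which is the "supported by the client" catch.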

The latest HLS release includes a new semantic tokens plugin! It’s off by default since it’s a new feature, but please do try it out.

In particular, it does highlight things which are of function type differently from things which aren’t.
