Is function highlighting useful with tree-sitter?

janus · January 16, 2024, 6:06am

I am trying to use tree-sitter with Neovim. Initially it was appealing, since the regex-based highlighting broke often for me, and I thought queries on a parse tree would be better.

But it seems that Haskell is hard with tree-sitter, since it can’t know if something is a function or not without doing type checking.

For example if I have

int :: Int -> Int
int = id

and I do :Inspect on the latter int, I get @function.haskell.

But if I change to:

type Int2Int = Int -> Int

int :: Int2Int
int = id

the color is changed and it now looks like any other term.
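
A toy sketch of why this happens, assuming the highlighter decides from the syntax of the signature alone. This is not the actual tree-sitter query or anything from nvim-treesitter; `looksLikeFunction` is a hypothetical name and the string matching only stands in for "is there a function arrow in the parsed type".

```typescript
// Toy model (assumption): classify a binding using only the text of its
// type signature, the way a purely syntactic highlighter has to.
function looksLikeFunction(signature: string): boolean {
  const parts = signature.split("::");
  if (parts.length < 2) return false; // no signature, nothing to go on
  const typeText = parts.slice(1).join("::");
  // Without type checking, the only evidence is a literal arrow.
  return typeText.includes("->");
}

console.log(looksLikeFunction("int :: Int -> Int")); // true
console.log(looksLikeFunction("int :: Int2Int"));    // false: the arrow is hidden behind the synonym
```

Resolving `Int2Int` to `Int -> Int` needs information from outside the signature, which is exactly what a parser alone does not have.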

This makes me wonder, does tree-sitter even make sense with Haskell? Are we going to highlight using LSP at some point? Then I suppose Haskellers could skip tree-sitter entirely.

I suppose you could make the point that highlighting doesn’t need to be perfect. But in that case, you could make the argument that regexes are good enough. They seem to have fewer moving parts.

Or should I, as the title suggests, just color functions the same as other terms, and enjoy highlighting on other elements?

ReleaseCandidate · January 16, 2024, 7:11am

That’s called semantic highlighting and part of the LSP spec.
HLS PR: “Implement semantic tokens plugin to support semantic highlighting (textDocument/semanticTokens/full)” by soulomoon, Pull Request #3892, haskell/haskell-language-server on GitHub

Many LSPs support semantic highlighting, but, depending on your editor, the theme/color scheme in use must also assign colors to these tokens, which means that the highlighting of some “smaller” languages only works with special themes.

ocharles · January 16, 2024, 9:40am

Well all or nothing seems a bit extreme, no? tree-sitter covers an awful lot correctly at the moment, but yes, it does miss a bit and thus isn’t perfect. Despite this I get a lot of value from it still - it drives highlighting in my editor (Helix), but also helps code navigation. In Helix I can press Alt-o and Helix will grow my selection to the parent node in the AST. I use this all the time to select sub-expressions to extract them into separate functions, for example. This would be a lot more complicated with regular expressions!

tomjaguarpaw · January 16, 2024, 10:17am

I’m surprised to hear that there are people who want to color identifiers of function type differently from other identifiers! What benefit do you get from doing so?

ReleaseCandidate · January 16, 2024, 10:19am

The problem is that Treesitter does the same things that LSPs can do (your selection example is covered by the Selection Range Request), but not all LSPs support the same features. And Treesitter is too limited to be used instead of LSP (yes, I know that there are LSPs implemented using Treesitter and they all are “better than nothing”). So, the sooner Treesitter is dropped for LSPs, the better (Emacs just beginning to adopt Treesitter doesn’t help), as it “forces” the LSP to implement the missing features and all editors to be able to use the same features.

If I’m allowed to hazard a guess, I’d say: knowing whether an identifier is a function or not.
As always: as soon as you are used to a feature (in other LSPs), you miss it if an (LSP) implementation lacks that feature.

jaror · January 16, 2024, 10:31am

But in Haskell you can only know that after type checking. So is it really a good idea to already start trying to guess it after only parsing?

ocharles · January 16, 2024, 10:31am

I think knowing if something is a free or bound variable in an expression could be useful sometimes - certainly in the case of shadowing (e.g., is this id the function (free), or some id I bound earlier, such as binding it in do notation (bound))

ReleaseCandidate · January 16, 2024, 10:32am

I know (and it isn’t the only language with that problem), that’s why you have to use the LSP for that and can’t use e.g. Treesitter.

tomjaguarpaw · January 16, 2024, 10:38am

That seems reasonable, but I don’t understand what it has to do with knowing if something is of function type.

ocharles · January 16, 2024, 10:52am

Oh right. Yea, I had misunderstood that part of the original post

ReleaseCandidate · January 16, 2024, 10:53am

Well, in “currying” languages, you can see at a glance when you have forgotten to pass the last argument.
Like (a bit contrived ;):

f x y = x * y
four = f 2

tomjaguarpaw · January 16, 2024, 10:55am

Ha! Because four would be coloured “as a function”? The logical conclusion of this idea is a variety of colours for a variety of different type classifications, and that would indeed imply full type checking …

ReleaseCandidate · January 16, 2024, 11:01am

The (predefined) semantic tokens are already quite specific: LSP: Semantic Tokens

export enum SemanticTokenTypes {
	namespace = 'namespace',
	/**
	 * Represents a generic type. Acts as a fallback for types which
	 * can't be mapped to a specific type like class or enum.
	 */
	type = 'type',
	class = 'class',
	enum = 'enum',
	interface = 'interface',
	struct = 'struct',
	typeParameter = 'typeParameter',
	parameter = 'parameter',
	variable = 'variable',
	property = 'property',
	enumMember = 'enumMember',
	event = 'event',
	function = 'function',
	method = 'method',
	macro = 'macro',
	keyword = 'keyword',
	modifier = 'modifier',
	comment = 'comment',
	string = 'string',
	number = 'number',
	regexp = 'regexp',
	operator = 'operator',
	decorator = 'decorator'
}
export enum SemanticTokenModifiers {
	declaration = 'declaration',
	definition = 'definition',
	readonly = 'readonly',
	static = 'static',
	deprecated = 'deprecated',
	abstract = 'abstract',
	async = 'async',
	modification = 'modification',
	documentation = 'documentation',
	defaultLibrary = 'defaultLibrary'
}

tomjaguarpaw · January 16, 2024, 11:04am

Ah I see. That makes sense. There’s a predefined set of tokens, and they don’t necessarily correspond cleanly to Haskell entities.

ReleaseCandidate · January 16, 2024, 11:09am

Exactly. You can define your own ones (they are just indices into an array, see “Integer Encoding for Tokens” in the spec), but they have to be somehow supported by the client.
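
A minimal sketch of that integer encoding, as I understand the spec: each token becomes five integers (line delta, start-character delta, length, index into the legend's token types, bitset over the legend's token modifiers). The names `Token`, `legend` and `encodeTokens` are made up for illustration, and which token type a server assigns to `f` and `four` is an assumption.

```typescript
// Hypothetical in-memory token, before encoding.
interface Token {
  line: number;        // zero-based line
  startChar: number;   // zero-based column
  length: number;
  tokenType: string;   // must appear in legend.tokenTypes
  modifiers: string[]; // each must appear in legend.tokenModifiers
}

// The legend the server announces; custom token types would simply be
// extra entries here, which the client then has to know how to color.
const legend = {
  tokenTypes: ["variable", "function", "type"],
  tokenModifiers: ["declaration", "definition"],
};

function encodeTokens(tokens: Token[]): number[] {
  // Positions are relative to the previous token, so sort first.
  const sorted = [...tokens].sort(
    (a, b) => a.line - b.line || a.startChar - b.startChar
  );
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of sorted) {
    const typeIndex = legend.tokenTypes.indexOf(t.tokenType);
    if (typeIndex < 0) throw new Error(`token type not in legend: ${t.tokenType}`);
    let modifierBits = 0;
    for (const m of t.modifiers) {
      const bit = legend.tokenModifiers.indexOf(m);
      if (bit < 0) throw new Error(`modifier not in legend: ${m}`);
      modifierBits |= 1 << bit;
    }
    const deltaLine = t.line - prevLine;
    // The start is relative only when the token is on the same line as the previous one.
    const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.push(deltaLine, deltaStart, t.length, typeIndex, modifierBits);
    prevLine = t.line;
    prevChar = t.startChar;
  }
  return data;
}

// Tokens for `f` (definition), `four` (definition) and the use of `f` in
//   f x y = x * y
//   four = f 2
const encoded = encodeTokens([
  { line: 0, startChar: 0, length: 1, tokenType: "function", modifiers: ["definition"] },
  { line: 1, startChar: 0, length: 4, tokenType: "function", modifiers: ["definition"] },
  { line: 1, startChar: 7, length: 1, tokenType: "function", modifiers: [] },
]);
console.log(encoded); // [0,0,1,1,2, 1,0,4,1,2, 0,7,1,1,0]
```

The client only ever sees the numbers plus the legend, which is why a type the theme has never heard of ends up uncolored.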

michaelpj · January 16, 2024, 11:21am

The latest HLS release includes a new semantic tokens plugin! It’s off by default since it’s a new feature, but please do try it out.

In particular, it does highlight things which are of function type differently from things which aren’t.
