Seeking hasktags maintainer

In short, I’ve been doing a really terrible job keeping the lights on and I don’t foresee a lot of additional bandwidth in the near term. I haven’t used it actively since the LSP options became usable, but I know there are people who do. Please reach out if you have any interest.

Also, a huge thank you to @andreasabel for doing all of the hard work of keeping Hasktags building with newer versions of GHC.

To be honest, it would probably be best to sunset it and recommend ghc-tags (or ghc-tags-plugin) instead, since they don’t have any parser-related bugs.

I see you are the maintainer of ghc-tags, Andrzej. Are you sure ghc-tags and hasktags perfectly overlap in their goal and scope?

I am a user of hasktags, Jack. Thanks for your work, and thanks for deciding to seek a new maintainer here: it’s a positive way forward that shows consideration for the ecosystem and gives the previous maintainer well-deserved closure, so they can focus on other commitments.

I’m not sure about ghc-tags, but I was a user of hasktags until something annoyed me and I moved to fast-tags, which has been working great for me for years.

In such a situation I’d consider transferring the repo to the Haskell GitHub Trust.

I think so. ghc-tags properly generates tags for all top-level definitions since it uses GHC API, unlike hasktags, fast-tags etc. which use ad-hoc parsers. I use it for tags-based navigation in all of the Haskell projects I interact with without any issues. When I was using hasktags, I was constantly encountering definitions I couldn’t jump to because of bugs in the parser. This was annoying enough for me that I ended up creating ghc-tags :wink:

It’s the recommended tool for tags generation (see the Tags page of the GHC wiki on GitLab), and it is getting support in the GHC repo as an alternative to HLS.

I use haskdogs (“Generate tags file for Haskell project and its nearest deps”) a lot, and it depends on hasktags. I don’t know if it can easily be modified to use another “tagger”.

I use hasktags because it just works. Not perfectly, but it hasn’t changed in more than a decade. The source to hasktags is trivial; I could keep it running for as long as I like.

I don’t have that kind of confidence in anything that depends on GHC’s API. As tempting as they are, their maintainers are more at risk of burning out.

Understandable. FWIW, ghc-tags uses ghc-lib so I don’t drown in CPP, and maintenance for the last 2 years has generally been "every ~6 months, spend an hour moving to the newest ghc-lib".

And thank you for that!! ghc-tags-plugin has been a core part of my workflow ever since I discovered it, and I couldn’t be happier!

Thank @coot: he developed ghc-tags-plugin, and I just repackaged its tag-extraction logic into ghc-tags :wink:

Ah. For some reason I thought the dependency graph was the other way round. Thank you @coot!

I use ghc-tags because it picks up definitions in Happy/Alex files and generally works with preprocessors.

I used hasktags before that.

I’m more than happy with ghc-tags. Its adoption in GHC is only the consequence of its quality and good user experience, which I can vouch for in both personal and professional contexts.

Out of interest, I looked at fast-tags vs hasktags vs ghc-tags on the same non-trivial work codebase (842 Haskell modules). Based on the output sizes, it’s clear that hasktags produces the least information, fast-tags produces more (likely due to a slightly better lexer), and ghc-tags the most (understandably, since it uses a full parser). In terms of speed, the difference between them is in the noise if you let them walk the directory structure themselves. I’ve found that if you pipe fd’s file list into them, runs that took seconds drop to something on the order of 200ms. I’ve updated my Emacs binding to do this, regardless of which tags implementation I’m using.
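A minimal bash sketch of that fd-into-tagger pipeline. `list_haskell_sources` is a hypothetical helper, not part of any of these tools; it falls back to `find` when `fd` isn't installed (on Debian/Ubuntu the fd binary is called `fdfind`). The `STDIN` argument for hasktags and `-` for fast-tags, which make them read the file list from standard input, are the ones used in the benchmark later in this thread.

```shell
# Hypothetical helper: print the Haskell sources under DIR, one path per line.
# Prefers fd (which skips hidden files and respects .gitignore by default),
# falling back to POSIX find otherwise.
list_haskell_sources() {
  local dir=${1:-.}
  if command -v fd >/dev/null 2>&1; then
    fd --type f --extension hs --extension lhs . "$dir"
  else
    find "$dir" -type f \( -name '*.hs' -o -name '*.lhs' \)
  fi
}

# Usage:
#   list_haskell_sources src | hasktags --etags -o TAGS STDIN   # Emacs-style
#   list_haskell_sources src | fast-tags -o tags -              # vim-style
```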

Interesting discussion (I had also noticed hasktags’ output seems incomplete).

I see there is also ghc-tags-core; gosh, another option/fork?

Looking at the fd issue “Why fd is so much faster than find?” (sharkdp/fd#693), it looks like fd manages to be faster than the old standbys mostly just by using multiple cores. Perhaps the tags generators could apply that lesson.
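As a rough illustration of applying that lesson from the outside, without changing any tool, here is a bash sketch (assuming GNU coreutils `split`; `parallel_etags` and `run_etags` are hypothetical names) that splits the file list into chunks, runs one hasktags process per chunk concurrently and concatenates the results. It relies on an Emacs TAGS file being a sequence of self-contained per-file sections, so concatenating them gives a valid TAGS file; vim-style tags would need re-sorting instead. fast-tags already parallelises internally with `+RTS -N`, as the benchmark later in the thread shows.

```shell
# run_etags OUT: read a file list on stdin and write Emacs-style tags to OUT.
run_etags() {
  hasktags --etags -o "$1" STDIN
}

# parallel_etags OUT JOBS: split the file list on stdin into JOBS chunks
# (whole lines only), tag each chunk in a background job, wait for all of
# them, then concatenate the per-chunk TAGS files in chunk order.
parallel_etags() {
  local out=$1 jobs=${2:-4} tmp chunk
  tmp=$(mktemp -d)
  cat > "$tmp/all"
  split -n "l/$jobs" "$tmp/all" "$tmp/chunk."   # GNU split
  for chunk in "$tmp"/chunk.*; do
    run_etags "$chunk.TAGS" < "$chunk" &
  done
  wait
  cat "$tmp"/chunk.*.TAGS > "$out"
  rm -rf "$tmp"
}

# Usage:
#   find src -name '*.hs' | parallel_etags TAGS "$(nproc)"
```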

Since the above results are not reproducible (private project), I can share how ghc-tags works locally for me with the ghc repo and the following config file:

ghc-tags.yaml
source_paths:
- compiler

cpp_includes:
- _build/stage1/compiler/build
- compiler

---

source_paths:
- libraries/base
- libraries/ghc-internal

exclude_paths:
- libraries/base/src/System/CPUTime/Javascript.hs
- libraries/base/src/System/CPUTime/Windows.hsc
- libraries/base/tests
- libraries/ghc-internal/src/GHC/Internal/Conc/POSIX/Const.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows/ConsoleEvent.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows/FFI.hsc
- libraries/ghc-internal/src/GHC/Internal/IO/Windows/Handle.hsc
- libraries/ghc-internal/src/GHC/Internal/JS/Prim.hs

cpp_includes:
- _build/stage1/libraries/ghc-internal/build/include
- _build/stage1/rts/build/include
- libraries/ghc-internal/include
- rts/include

---

source_paths:
- libraries/ghc-bignum

exclude_paths:
- libraries/ghc-bignum/src/GHC/Num/Backend/Selected.hs

cpp_includes:
- libraries/ghc-bignum/include

---

source_paths:
- libraries/ghc-boot
- libraries/ghc-boot-th
- libraries/ghc-compact
- libraries/ghc-experimental
- libraries/ghc-prim
- libraries/ghc-platform
- libraries/template-haskell

exclude_paths:
- libraries/ghc-compact/tests
- libraries/ghc-prim/tests

---

source_paths:
- libraries/ghc-heap

cpp_includes:
- _build/stage1/rts/build/include
- rts/include

cpp_options:
- -DMIN_TOOL_VERSION_ghc(x,y,z)=1

exclude_paths:
- libraries/ghc-heap/tests

---

source_paths:
- utils/haddock/haddock
- utils/haddock/haddock-api
- utils/haddock/haddock-library
- utils/haddock/driver

cpp_includes:
- _build/stage1/rts/build/include
- rts/include

exclude_paths:
- utils/haddock/haddock-api/src/Haddock/InterfaceFile.hs
- utils/haddock/haddock-api/src/Haddock/Types.hs

unknown@electronics ghc $ rm TAGS*
unknown@electronics ghc $ time ghc-tags -e
libraries/ghc-heap/GHC/Exts/Stack/Decode.hs:12:14: warning: [GHC-53692] [-Wdeprecated-flags]
    -XTypeInType is deprecated: use -XDataKinds and -XPolyKinds instead
   |
12 | {-# LANGUAGE TypeInType #-}
   |              ^^^^^^^^^^

could not execute: hspec-discover
could not execute: hspec-discover

real    0m4,011s
user    0m24,264s
sys     0m3,897s
unknown@electronics ghc $ time ghc-tags -e
could not execute: hspec-discover
could not execute: hspec-discover

real    0m0,334s
user    0m0,861s
sys     0m0,237s

It runs on a little over 1500 modules. The cold run takes time, but after that it tracks the modification times of all modules, and subsequent runs re-parse only the modules that changed. I don’t observe any slowness when letting it do the directory traversal.
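ghc-tags’ own bookkeeping isn’t shown here, but the mtime check behind it can be approximated in a few lines. This bash sketch (`changed_since` is a hypothetical name) compares each module against the tags file’s own modification time, which is a simplification of tracking every module individually. It only selects which files to re-parse; merging the new entries into the tags file and dropping entries for deleted modules are left out.

```shell
# changed_since TAGSFILE DIR: print Haskell sources under DIR that need
# re-parsing: every source if TAGSFILE does not exist yet, otherwise only
# those whose modification time is newer than TAGSFILE's.
changed_since() {
  local tagsfile=$1 dir=${2:-.}
  if [ -e "$tagsfile" ]; then
    find "$dir" -type f -name '*.hs' -newer "$tagsfile"
  else
    find "$dir" -type f -name '*.hs'
  fi
}
```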

If anyone wants to try this locally, you need to build GHC first so that the build system generates the header files needed to parse some modules.

I decided to check whether fast-tags is still fast, and this is what I came up with: random-random-stuff/haskell-tags-benchmark2024 on GitHub (“Benchmark generation of tags from Haskell sources”).

My workflow is to download some common dependencies and index them together with my project, so that names are resolved in the dependencies as well. The dependencies are static and are typically indexed only once, but when they’re not cached the delay matters to me, so I use fast-tags.
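One way to sketch that split between static dependencies and the project, assuming bash and two separate tags files (`refresh_tags` and `run_tagger` are hypothetical names, and keeping a separate `deps.tags` is an assumption, not necessarily the poster’s setup): the dependency sources are indexed only when `deps.tags` is missing, and the project is re-indexed on every call. Vim’s `tags` option accepts a comma-separated list, so both files can be searched together.

```shell
# run_tagger OUT: read a file list on stdin and write vim-style tags to OUT.
# fast-tags reads the list from stdin when given "-", as in the benchmark below.
run_tagger() {
  fast-tags -o "$1" -
}

# refresh_tags PROJECT_DIR DEPS_DIR: re-index the project on every call, but
# index the (static) dependency sources only when deps.tags does not exist yet.
refresh_tags() {
  local project=$1 deps=$2
  if [ ! -e deps.tags ]; then
    find "$deps" -type f -name '*.hs' | run_tagger deps.tags
  fi
  find "$project" -type f -name '*.hs' | run_tagger tags
}

# Usage: refresh_tags src all-packages
# Then, in Vim:  :set tags=tags,deps.tags
```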

The raw speed numbers that don’t include file collection are:

$ hasktags --ctags -o tags.hasktags STDIN +RTS -s <files.txt
...
  Total   time    8.831s  (  8.865s elapsed)

$ fast-tags -o tags.fasttags - +RTS -s -N <files.txt
...
  Total   time   10.805s  (  1.268s elapsed)

$ ghc-tags -c -o tags.ghctags +RTS -s <files.txt
...
  Total   time   30.259s  (  6.278s elapsed)
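Since `+RTS -s` reports both CPU time and elapsed wall-clock time, their ratio roughly shows how many cores each tool kept busy: 10.805/1.268 ≈ 8.52 for fast-tags, 30.259/6.278 ≈ 4.82 for ghc-tags and 8.831/8.865 ≈ 1.00 for hasktags. In other words, hasktags spends the least CPU time overall but effectively uses a single core. A small awk filter does the arithmetic (`cpu_ratio` is a hypothetical name; the RTS writes this summary to stderr, hence the redirection in the usage line).

```shell
# cpu_ratio: read GHC "+RTS -s" output on stdin and print total CPU time
# divided by elapsed wall-clock time, taken from the line
#   Total   time   10.805s  (  1.268s elapsed)
cpu_ratio() {
  awk '/Total +time/ {
    line = $0
    gsub(/[()]/, " ", line)      # drop the parentheses around the elapsed time
    split(line, f, " ")          # f[3] = CPU time, f[4] = elapsed time
    cpu = f[3]; wall = f[4]
    sub(/s$/, "", cpu); sub(/s$/, "", wall)
    if (wall + 0 > 0) printf "%.2f\n", cpu / wall
  }'
}

# Usage:
#   fast-tags -o tags.fasttags - +RTS -s -N <files.txt 2>&1 >/dev/null | cpu_ratio
```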

Overall, fast-tags still seems to be the fastest of the three in wall-clock time. NB: it also indexes hsc, Alex and Happy files.

Regarding precision, an earlier comparison stated that hasktags collects the least information. That report assumed that all tag generators produce the same output format, but that’s not the case: hasktags generates Emacs-style tags by default, fast-tags generates vim-style tags, and ghc-tags has no default and forces you to choose. Comparing the size of Emacs-style tags against vim-style ones is not meaningful, and different formats also take different amounts of time to generate.
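Before comparing sizes, it helps to check which format each file is in. An Emacs-style TAGS file is a sequence of sections, each starting with a form-feed line followed by a `filename,size` header, whereas vim-style tags are tab-separated `name`, `file`, `address` lines, optionally preceded by `!_TAG_` pseudo-tag lines. The bash helper below (`tags_format` is a hypothetical name) is only a heuristic that inspects the first byte.

```shell
# tags_format FILE: print "emacs" if FILE starts with a form feed (the Emacs
# TAGS section marker), "vim" otherwise. Looks at the first byte only.
tags_format() {
  if [ "$(head -c 1 "$1")" = "$(printf '\f')" ]; then
    echo emacs
  else
    echo vim
  fi
}

# Usage: for f in tags.*; do echo "$f: $(tags_format "$f")"; done
```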

I compared vim-style tag generation; of the three, fast-tags produces the fewest entries:

$ wc -l tags.*
  154802 tags.fasttags
  229023 tags.ghctags
  230680 tags.hasktags
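Note that `wc -l` also counts any `!_TAG_` pseudo-tag header lines that a generator writes at the top of a vim-style file. Whether these three emit them isn’t shown above, so the effect may be negligible here, but a count that skips them is short (`count_tags` is a hypothetical name):

```shell
# count_tags FILE...: print the number of tag entries in each vim-style tags
# file, ignoring "!_TAG_" pseudo-tag header lines.
count_tags() {
  local f
  for f in "$@"; do
    printf '%s %s\n' "$(grep -vc '^!_TAG_' "$f")" "$f"
  done
}

# Usage: count_tags tags.*
```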

However, hasktags contains the least information per entry. For instance, it doesn’t output the tag kind by default (e.g. whether it’s a constructor, function, type, etc.), while the others do. Maybe I missed the option to enable it. Example output:

$ grep -F "Key'F14" tags.*
tags.fasttags:26086:Key'F14	all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358;"	C
tags.ghctags:41912:Key'F14	all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358;"	c	term:Key'F14
tags.hasktags:47312:Key'F14	./all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358
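To compare the kind information rather than raw line counts, one can tally the field that follows the `;"` address terminator in lines like the ones above. A sketch in awk, assuming that tab-separated layout (`tag_kinds` is a hypothetical name; entries with no kind field, such as hasktags’ default output, are counted as `(none)`):

```shell
# tag_kinds FILE: count vim-style tag entries per kind. The kind is the first
# extension field after the ';"' terminator ("C" or "c" in the lines above);
# an explicit "kind:" prefix is stripped.
tag_kinds() {
  awk -F '\t' '
    /^!_TAG_/ { next }                      # skip pseudo-tag headers
    {
      kind = "(none)"
      if (NF >= 4 && $3 ~ /;"$/) {
        kind = $4
        sub(/^kind:/, "", kind)
      }
      count[kind]++
    }
    END { for (k in count) print count[k], k }
  ' "$1" | LC_ALL=C sort -k2
}

# Usage: for f in tags.*; do echo "== $f"; tag_kinds "$f"; done
```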

Good to clarify: I was only aiming to test the Emacs output modes.
