Seeking hasktags maintainer

jhenahan · July 1, 2024, 12:46pm

In short, I’ve been doing a really terrible job keeping the lights on and I don’t foresee a lot of additional bandwidth in the near term. I haven’t used it actively since the LSP options became usable, but I know there are people who do. Please reach out if you have any interest.

Also, a huge thank you to @andreasabel for doing all of the hard work of keeping Hasktags building with newer versions of GHC.

arybczak · July 1, 2024, 6:48pm

To be honest, it would probably be best to sunset it and recommend ghc-tags (or ghc-tags-plugin) instead, since they don’t have any parser-related bugs.

f-a · July 1, 2024, 7:21pm

I see you are the maintainer of ghc-tags, Andrzej. Are you sure ghc-tags and hasktags perfectly overlap in their goal and scope?

I am a user of hasktags, Jack. Thanks for your work, and thanks for deciding to seek a new maintainer here: a positive way forward that shows consideration for the ecosystem and gives the previous maintainer well-deserved closure to deal with other commitments.

effectfully · July 2, 2024, 7:58am

I’m not sure about ghc-tags, but I was a user of hasktags until something annoyed me and I moved to fast-tags, which has been working great for me for years.

Bodigrim · July 2, 2024, 10:23pm

In such a situation I’d consider transferring the repo to the Haskell GitHub Trust.

arybczak · September 13, 2024, 2:35pm

I think so. ghc-tags properly generates tags for all top-level definitions since it uses the GHC API, unlike hasktags, fast-tags etc., which use ad-hoc parsers. I use it for tags-based navigation in all of the Haskell projects I interact with, without any issues. When I was using hasktags, I constantly encountered definitions I couldn’t jump to because of bugs in the parser. This was annoying enough that I ended up creating ghc-tags.

It’s the recommended tool for tags generation (see the Tags page of the GHC wiki on GitLab) and is getting support in the GHC repo as an alternative to HLS.

chrisdone · September 14, 2024, 4:48pm

I use hasktags because it just works. Not perfectly, but it hasn’t changed for more than a decade. The source to hasktags is trivial; I could keep it running for as long as I like.

I don’t have that kind of confidence in anything that depends on GHC’s API. As tempting as they are, their maintainers are more at risk of burning out.

arybczak · September 14, 2024, 7:42pm

Understandable. FWIW ghc-tags uses ghc-lib, so I don’t drown in CPP, and maintenance for the last 2 years has generally been "every ~6 months spend an hour moving to the newest ghc-lib".

rickowens · September 15, 2024, 1:49am

And thank you for that!! ghc-tags-plugin has been a core part of my workflow ever since I discovered it, and I couldn’t be happier!

arybczak · September 15, 2024, 3:14am

Thank @coot: he developed ghc-tags-plugin and I just repackaged its tag-extracting logic into ghc-tags.

rickowens · September 15, 2024, 12:33pm

Ah. For some reason I thought the dependency graph was the other way round. Thank you @coot!

vmchale · September 15, 2024, 12:36pm

I use ghc-tags because it picks up definitions in happy/alex files and generally works with preprocessors.

I used hasktags before that.

Kleidukos · September 15, 2024, 3:12pm

I’m more than happy with ghc-tags. Its adoption in GHC is only the consequence of its quality and good user experience, which I can vouch for in both personal and professional contexts.

chrisdone · October 30, 2024, 9:40am

Out of interest I looked at fast-tags vs hasktags vs ghc-tags on the same non-trivial work codebase (842 Haskell modules). Based on the output sizes, it’s clear that hasktags produces the least information, fast-tags produces more (likely due to a slightly better lexer), and ghc-tags the most information (understandably, it’s a full parser). In terms of speed, the difference between them is in the noise if you let them walk the directory structure themselves. I’ve found if you pipe fd into them, it turns seconds into something on the order of 200ms. I’ve updated my Emacs binding to do this, regardless of which tags implementation I’m using.
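For anyone wanting to try the same trick, here is a minimal sketch of collecting the file list up front and handing it to a tags generator. The function names `list_hs_files` and `generate_tags` are made up for illustration, and the sketch assumes an `xargs` and `find` that support NUL-separated input (`-0` / `-print0`, as in the GNU and BSD versions) so that paths containing spaces survive. It falls back to `find` when `fd` is not installed.

```shell
# list_hs_files DIR: print NUL-separated paths of *.hs files under DIR.
# Uses fd when it is on PATH, otherwise falls back to find.
# Note: fd skips hidden and git-ignored files by default; find does not.
list_hs_files() {
    dir=${1:-.}
    if command -v fd >/dev/null 2>&1; then
        fd --type f --extension hs --print0 . "$dir"
    else
        find "$dir" -type f -name '*.hs' -print0
    fi
}

# generate_tags DIR GENERATOR [ARGS...]: run "GENERATOR ARGS... FILES...".
# Hypothetical wrapper: the generator and its flags come from the caller.
generate_tags() {
    dir=$1
    shift
    list_hs_files "$dir" | xargs -0 "$@"
}
```

Something like `generate_tags . hasktags -o .tags` would then mirror the invocation in the gist. One caveat: for a very long file list `xargs` may split the work into several invocations, in which case a generator that overwrites its output file would keep only the last batch.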

https://gist.github.com/chrisdone-artificial/6934f1a5bc6a847253b96d2ecbab73dc

0readme.md

I thought I'd upgrade my hasktags to fast-tags, but discovered that it's not much faster. 
The much bigger saving is to pipe `fd` into it rather than letting either fast-tags/hasktags do its own
recursive directory searching. They appear to be egregiously slow in comparison to `fd`.

I added ghc-tags, and that's the slowest for recursive cold run. It's the fastest on a warm run with `fd`.

Confirmation that they're all producing outputs of roughly similar size.

```
bash-3.2$ ls -alh .*tags
[output truncated]
```

fast-tags-hasktags.txt

# Using the recursive directory checker it's not any faster.

bash-3.2$ hasktags . -o .tags +RTS -s
   4,280,500,176 bytes allocated in the heap
     938,002,744 bytes copied during GC
     205,046,592 bytes maximum residency (9 sample(s))
         552,128 bytes maximum slop
             406 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause

[output truncated]

using-fd-is-faster.txt

# Using fd, it's twice as fast. Both are 1s faster.

bash-3.2$ fd '\.hs$' | xargs hasktags -o .fast-tags +RTS -s -RTS
   3,127,195,584 bytes allocated in the heap
     179,081,400 bytes copied during GC
      44,187,464 bytes maximum residency (7 sample(s))
       2,310,512 bytes maximum slop
             104 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause

[output truncated]

juhp · October 30, 2024, 9:50am

Interesting discussion (I had also noticed hasktags’ output seems incomplete).

I see there is also ghc-tags-core - gosh, another option/fork?

chreekat · October 30, 2024, 12:16pm

Looking at “Why fd is so much faster than find?” (Issue #693 in sharkdp/fd on GitHub), it looks like fd manages to be faster than old standbys mostly just by using multiple cores. Perhaps the tags generators could apply that lesson.

arybczak · October 31, 2024, 11:46am

Since the above results are not reproducible (private project), I can share how ghc-tags works locally for me with the ghc repo and the following config file:

ghc-tags.yaml

source_paths:
- compiler

cpp_includes:
- _build/stage1/compiler/build
- compiler

---

source_paths:
- libraries/base
- libraries/ghc-internal

exclude_paths:
- libraries/base/src/System/CPUTime/Javascript.hs
- libraries/base/src/System/CPUTime/Windows.hsc
- libraries/base/tests
- libraries/ghc-internal/src/GHC/Internal/Conc/POSIX/Const.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows/ConsoleEvent.hsc
- libraries/ghc-internal/src/GHC/Internal/Event/Windows/FFI.hsc
- libraries/ghc-internal/src/GHC/Internal/IO/Windows/Handle.hsc
- libraries/ghc-internal/src/GHC/Internal/JS/Prim.hs

cpp_includes:
- _build/stage1/libraries/ghc-internal/build/include
- _build/stage1/rts/build/include
- libraries/ghc-internal/include
- rts/include

---

source_paths:
- libraries/ghc-bignum

exclude_paths:
- libraries/ghc-bignum/src/GHC/Num/Backend/Selected.hs

cpp_includes:
- libraries/ghc-bignum/include

---

source_paths:
- libraries/ghc-boot
- libraries/ghc-boot-th
- libraries/ghc-compact
- libraries/ghc-experimental
- libraries/ghc-prim
- libraries/ghc-platform
- libraries/template-haskell

exclude_paths:
- libraries/ghc-compact/tests
- libraries/ghc-prim/tests

---

source_paths:
- libraries/ghc-heap

cpp_includes:
- _build/stage1/rts/build/include
- rts/include

cpp_options:
- -DMIN_TOOL_VERSION_ghc(x,y,z)=1

exclude_paths:
- libraries/ghc-heap/tests

---

source_paths:
- utils/haddock/haddock
- utils/haddock/haddock-api
- utils/haddock/haddock-library
- utils/haddock/driver

cpp_includes:
- _build/stage1/rts/build/include
- rts/include

exclude_paths:
- utils/haddock/haddock-api/src/Haddock/InterfaceFile.hs
- utils/haddock/haddock-api/src/Haddock/Types.hs

unknown@electronics ghc $ rm TAGS*
unknown@electronics ghc $ time ghc-tags -e
libraries/ghc-heap/GHC/Exts/Stack/Decode.hs:12:14: warning: [GHC-53692] [-Wdeprecated-flags]
    -XTypeInType is deprecated: use -XDataKinds and -XPolyKinds instead
   |
12 | {-# LANGUAGE TypeInType #-}
   |              ^^^^^^^^^^

could not execute: hspec-discover
could not execute: hspec-discover

real    0m4,011s
user    0m24,264s
sys     0m3,897s
unknown@electronics ghc $ time ghc-tags -e
could not execute: hspec-discover
could not execute: hspec-discover

real    0m0,334s
user    0m0,861s
sys     0m0,237s

It runs on a little over 1500 modules. The cold run takes time, but after that it tracks the modification times of all modules, and subsequent runs re-parse only the modules that changed. I don’t observe any slowness when letting it do the directory traversal itself.

If anyone wants to try this locally, you need to compile GHC first for the build system to generate header files needed for parsing some modules.
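To illustrate why the warm run above is so much cheaper, here is a rough sketch of modification-time-based change detection. This is not how ghc-tags is implemented; it only shows the general idea with a stamp file, and `changed_since` is a hypothetical helper name.

```shell
# changed_since STAMP DIR: print *.hs files under DIR whose modification
# time is newer than that of the file STAMP. If STAMP does not exist yet
# (a cold run), every *.hs file is printed.
changed_since() {
    stamp=$1
    dir=${2:-.}
    if [ -e "$stamp" ]; then
        find "$dir" -type f -name '*.hs' -newer "$stamp"
    else
        find "$dir" -type f -name '*.hs'
    fi
}
```

A caller would re-parse only the files printed by `changed_since .tags-stamp .` and then run `touch .tags-stamp`. A real tool also has to handle deleted modules and merge new entries into the existing tags file, which this sketch leaves out.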

sergv · October 31, 2024, 9:58pm

I decided to check whether fast-tags is still fast, and this is what I came up with: random-random-stuff/haskell-tags-benchmark2024 on GitHub, a benchmark of tags generation from Haskell sources.

My workflow is to download some common dependencies and index them together with my project so that names will be resolved in the dependencies as well. The dependencies are static and are typically indexed only once, but when they’re not cached the delay matters to me, so I use fast-tags.

The raw speed numbers that don’t include file collection are:

$ hasktags --ctags -o tags.hasktags STDIN +RTS -s <files.txt
...
  Total   time    8.831s  (  8.865s elapsed)

$ fast-tags -o tags.fasttags - +RTS -s -N <files.txt
...
  Total   time   10.805s  (  1.268s elapsed)

$ ghc-tags -c -o tags.ghctags +RTS -s <files.txt
...
  Total   time   30.259s  (  6.278s elapsed)

Overall fast-tags seems to still be a fast one. NB it also indexes hsc, Alex and Happy files.

Regarding precision, there were comparisons stating that hasktags collects the least information. The report was assuming that all tag generators produce the same output format, but that’s not the case: hasktags generates Emacs-style tags by default, fast-tags generates vim-style tags, and ghc-tags has no default and forces you to choose. Comparing the size of Emacs-style tags against vim-style ones is not right. Different formats also take different amounts of time to generate.

I compared vim tags generation; of the three, fast-tags gives the fewest entities:

$ wc -l tags.*
  154802 tags.fasttags
  229023 tags.ghctags
  230680 tags.hasktags

However, hasktags contains the least info. For instance, it doesn’t output the tag type by default (e.g. whether it’s a constructor, function, type, etc.), while the others do. Maybe I missed the option to enable it. Example output:

$ grep -F "Key'F14" tags.*
tags.fasttags:26086:Key'F14	all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358;"	C
tags.ghctags:41912:Key'F14	all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358;"	c	term:Key'F14
tags.hasktags:47312:Key'F14	./all-packages/GLFW-b-3.3.9.0/Graphics/UI/GLFW/Types.hs	358
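To compare what kind information each file carries beyond a single grep, a small sketch like the following could tally the kind field per tags file. It assumes vim-style tag lines shaped like the samples above: tab-separated, with a line-number address in the third field and the kind letter (if any) in the fourth. `kind_counts` is a hypothetical helper name.

```shell
# kind_counts FILE: for a vim-style tags file, print "COUNT KIND" for each
# value seen in the fourth tab-separated field. Lines with no fourth field
# are counted as "(none)". Header lines starting with "!_TAG_" are skipped.
kind_counts() {
    awk -F '\t' '
        /^!_TAG_/ { next }
        { kind = (NF >= 4 && $4 != "") ? $4 : "(none)"; n[kind]++ }
        END { for (k in n) print n[k], k }
    ' "$1" | LC_ALL=C sort -k2
}
```

Running `kind_counts tags.hasktags` next to `kind_counts tags.ghctags` would show at a glance whether a generator emits tag types at all. Addresses written as search patterns can themselves contain tabs, so this only holds for line-number addresses.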

chrisdone · November 1, 2024, 7:40am

Good to be clear, I am aiming to just test the Emacs output modes.
