Module auto-discovery in Cabal

You might ask: why doesn’t cabal just add this module to the cabal file?

Actually, my question is: why does this module need to be in the cabal file at all? exposed-modules makes sense to me: there needs to be some mechanism for specifying the public interface of a library. However, it’s not clear to me why other-modules exists.

Hope this explanation satisfies you:

The gist is that the exposed list of modules is a critical piece
of information about a package and ought to be fully intentional.
Autodetection via the filesystem runs the risk of including unintended
“garbage” that happened to lie around in the source tree, so autodetecting
files rather than intentionally and statically enumerating them isn’t
without issues either. GHC was taught to warn about modules that are
imported but not listed in the package description
(-Wmissing-home-modules) to help with that, and autodetecting modules
from the filesystem would defeat the purpose of that warning.
This is a bit of a philosophical issue: it depends on whether you consider
statically determined APIs more robust or are fine with APIs inferred
dynamically from the filesystem index.

There’s also a minor technical benefit: an explicit list avoids tracking
changes to the directory tree, which often requires a full recursive
traversal each time to be on the safe side. But more importantly, we have
tooling that only has access to the .cabal files in the 01-index.tar and
needs to know the set of exposed modules (including how they’re affected
by cabal package flags). For that tooling it would be too expensive to
download and inspect the actual source tarball; in fact, it would
defeat the purpose of a package index if it lacked such
essential package-level information.

So at the very least for packages that end up in a package index,
automatic population is not something that’s sensible to do. However,
there’s no reason we can’t have tooling that syncs the module list in
your existing .cabal file with your filesystem. This way you’d still
track the module manifest explicitly in your .cabal file, so changes to
the manifest are more deliberate than whatever filenames happen to be in
a folder, while the busy-work of keeping that list in sync by hand goes
away if it causes you overhead. This would give us the best of both
worlds IMO. A proof-of-concept for such a tool could easily be hacked
together in a single weekend; it can be prototyped outside of cabal
proper and, if deemed convenient enough, integrated into cabal proper
later.

Support automatic population of `exposed-modules:` · Issue #7016 · haskell/cabal · GitHub
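
The sync tool proposed in the quoted comment could start from something like the sketch below. It is not taken from the issue or from cabal: the names `pathToModule`, `moduleDrift` and `Drift` are made up for illustration, the file list and the declared module list are passed in as plain values rather than read from disk or parsed from a .cabal file, and it assumes a single source directory with `/` as the path separator.

```haskell
import Data.List (isPrefixOf, isSuffixOf, sort, (\\))
import Data.Maybe (mapMaybe)

-- | Turn a source path into a module name, relative to a source directory.
-- Hypothetical helper: only handles ".hs" files and '/' separators.
pathToModule :: FilePath -> FilePath -> Maybe String
pathToModule srcDir path
  | prefix `isPrefixOf` path && ".hs" `isSuffixOf` path =
      Just (map slashToDot (dropEnd 3 (drop (length prefix) path)))
  | otherwise = Nothing
  where
    prefix = srcDir ++ "/"
    slashToDot c = if c == '/' then '.' else c
    dropEnd n xs = take (length xs - n) xs

-- | Difference between what is on disk and what the manifest declares.
data Drift = Drift
  { undeclared    :: [String]  -- on disk, but not listed in the .cabal file
  , missingOnDisk :: [String]  -- listed in the .cabal file, but no file found
  } deriving (Eq, Show)

-- | Compare the files found under srcDir with the declared module list.
moduleDrift :: FilePath -> [FilePath] -> [String] -> Drift
moduleDrift srcDir files declared =
  Drift (sort (onDisk \\ declared)) (sort (declared \\ onDisk))
  where
    onDisk = mapMaybe (pathToModule srcDir) files

main :: IO ()
main =
  print
    (moduleDrift
       "src"
       ["src/Data/Foo.hs", "src/Data/Foo/Internal.hs", "src/notes.txt"]
       ["Data.Foo", "Data.Old"])
```

With these inputs the report flags Data.Foo.Internal as undeclared and Data.Old as missing on disk, and the stray notes.txt is ignored. The point of reporting a diff rather than rewriting the list is that the .cabal file stays the place where a human confirms each change.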

Thanks, but unless I’m missing something, this is all talking specifically about exposed-modules which I’m satisfied is necessary - it doesn’t seem to argue that other-modules are necessary?

So, basically we’ve established that the cabal file is the source of truth, not the file system. Then comes the matter of exposure: other-modules are modules that are required for the build but are not exposed to consumers that depend on your package component (library, executable, test suite, benchmark suite, etc.).

I hate to be a pain, but I still feel like I’m missing something.
It makes total sense to me that exposed-modules, as the interface of your library, ought to be specified as a conscious and deliberate choice. And it makes sense that the cabal file ought to be the source of truth for that. What I don’t see is why other-modules can’t be derived automatically as the transitive closure of modules imported by the exposed-modules. I feel like the cabal file is still the single source of truth in that case, it’s just that the info is “normalized” so to speak.

(aside: sorry for clogging up this thread with a question that’s not quite on topic, perhaps my question ought to be split into a separate thread by a mod?)

No you’re not a pain, just realise that a lot of things don’t move because not enough people find them painful.

“What I don’t see is why other-modules can’t be derived automatically as the transitive closure of modules imported by the exposed-modules.”

Because the cabal file is the source of truth. :)
When I read a cabal file (programmatically), I get a fairly good overview of the structure of the project. It also lets me detect a typo in either the cabal file or an import statement, because the two have to agree.

Moreover, the “transitive closure of modules imported by the exposed-modules” is something that you’d have to use GHC for, which means depending on the GHC API. So now you’ve turned a fairly simple human action (“write down a module name”) into reimplementing or instrumenting the semantic analyser of a compiler.
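
To make the difficulty concrete, here is a deliberately naive sketch of that closure computed by scanning source text. It is an illustration, not how cabal or GHC work: `Project`, `importsOf` and `closure` are hypothetical names, and the project is an in-memory list of module names and source strings. A textual scan like this misses imports guarded by CPP conditionals and modules generated by preprocessors, and it can be fooled by a comment line that happens to start with the word import, which is why a reliable version ends up needing the compiler’s own front end.

```haskell
import Data.List (sort)

-- | Hypothetical in-memory project: module name paired with its source text.
type Project = [(String, String)]

-- | Naive scan: the module named on each line whose first word is "import".
-- Skips a leading "qualified" and strips an import list glued to the name.
importsOf :: String -> [String]
importsOf src =
  [ takeWhile (/= '(') w
  | line <- lines src
  , ("import" : rest) <- [words line]
  , w : _ <- [dropWhile (== "qualified") rest]
  ]

-- | Project modules reachable from the given roots, roots included.
closure :: Project -> [String] -> [String]
closure project = go []
  where
    go seen [] = sort seen
    go seen (m : todo)
      | m `elem` seen = go seen todo
      | otherwise =
          case lookup m project of
            Nothing  -> go seen todo  -- not ours, e.g. a module from base
            Just src -> go (m : seen) (importsOf src ++ todo)

main :: IO ()
main = do
  let project =
        [ ("Data.Foo", "module Data.Foo where\nimport qualified Data.Foo.Internal as I\nimport Data.List (sort)\n")
        , ("Data.Foo.Internal", "module Data.Foo.Internal where\nimport Data.Foo.Types\n")
        , ("Data.Foo.Types", "module Data.Foo.Types where\n")
        , ("Data.Unused", "module Data.Unused where\n")
        ]
      exposed = ["Data.Foo"]
  -- Candidate other-modules: reachable from the exposed ones, minus the exposed ones.
  print (filter (`notElem` exposed) (closure project exposed))
```

In this toy project the candidates are Data.Foo.Internal and Data.Foo.Types, while Data.Unused is left out because nothing imports it.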

To be fair, the “fairly simple human action” entails a little bit more than just writing down a module name - it’s not the typing that’s annoying, it’s keeping track and remembering a step that is, from a user’s perspective, redundant.

I actually consider the fact that each module name needs to be stated three times (module file name, module header, exposed-modules / other-modules) one of Haskell’s biggest warts - I understand why things have to be this way (and frankly, I do not enjoy how Python does it at all), but it’s still unfortunate.
