PVP Compliance of .Internal modules

In keeping with the KISS principle, I’m currently contemplating a {-# INTERNAL #-} pragma used at the module level. It (sort of) takes its inspiration from the {-# SOURCE #-} pragma used in dealing with cyclic module imports:

  • modules deemed “low-level” or “internal” would attract a {-# INTERNAL #-} on their module declaration:

    module {-# INTERNAL #-} UglyBits (...) where

  • to import one without warnings requires {-# INTERNAL #-} in the import declaration:

    import {-# INTERNAL #-} UglyBits (...)

Then the PVP could be applied to packages/libraries intended for “internal use”, and novices would be warned if they use them. Note: there should not be a way to silence {-# INTERNAL #-} warnings, as that would defeat the purpose of using {-# INTERNAL #-} to begin with! (Novices would eventually find out how to disable those “inconvenient” warnings…)

As for an extra cabal stanza… that would definitely save a lot of keystrokes! However, it would depend on whether the same keyphrase (e.g. internal-package) can be used in both places: where the package is declared, and where the package is used.
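
To illustrate, here is one purely hypothetical shape those stanzas could take, reusing the internal-package keyphrase on both sides; none of these fields exist in Cabal today:

    -- in uglybits.cabal, where the package is declared:
    library
      exposed-modules:  UglyBits
      internal-package: True

    -- in a consumer's .cabal file, where the package is used:
    library
      build-depends:     uglybits ^>= 0.1
      internal-packages: uglybits  -- opt in, silencing the warning for this dep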

1 Like

I agree this feels more like it should be a Cabal feature than a GHC one, although I actually think we could simplify further by reducing the granularity even more, from module to library. It seems all we really need is a big scary “this library doesn’t follow PvP” tag that can be put on libraries. This tag would be displayed prominently on Hackage and used by tools like cabal gen-bounds. It would even be useful for a few existing packages.

In fact, given that Cabal mostly doesn’t even really know about PVP (it’s a convention, rather than a core part of the solver), I don’t think it would be difficult to implement. It’s more of an issue of getting community buy-in on this being the right approach. I at least see it as a strict improvement on the status quo of using .Internal.
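
Such a tag could be as small as a single field in the .cabal file; the field below is hypothetical, not part of today’s grammar:

    library
      follows-pvp: False  -- displayed as a warning badge on Hackage;
                          -- cabal gen-bounds could fall back to exact bounds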

1 Like

Using internals is “safe” if you’ve specified an exact version of your dependency

If we follow the -internal packages approach, yes. If not, I don’t see how. Again, this is under the premise that you have PVP-compliant packages, but expose symbols that explicitly do not necessarily follow the package’s PVP, to give consumers escape hatches for edge cases that the library author did not anticipate.

To me this is an annotation on the binding that is being exposed. Hence, whether this is picked up by the compiler or the packaging tool is somewhat independent. Yet, to have the annotation at binding granularity, it would need to live in the source. We technically have something like this with {-# ANN #-} pragmas, which are just severely handicapped, as they cause the TH machinery to trigger when used. It would also allow other tools (e.g. haddock, hie, …) to pick up the annotation at the binding level and add the relevant rendering/tooltips.
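
For reference, this is roughly what that looks like today; the module and binding names are made up, and (as noted) merely compiling the pragma drags in the TH machinery:

    module UglyBits (rebuildIndex) where

    -- a binding-level annotation: the payload is an ordinary Haskell value
    -- (anything with a Data instance), retrievable by GHC-API-based tools
    {-# ANN rebuildIndex ("INTERNAL" :: String) #-}
    rebuildIndex :: Int -> Int
    rebuildIndex = succ  -- stand-in body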

What benefit do we get from marking entire libraries as not following PVP? If a library really makes no effort at compatibility between versions it should just bump the major version every release and be done with it.

1 Like

Yes, I think that’s a fair summary. So far in opaleye I’ve been happy to follow the .Internal informal convention, but I absolutely don’t want to make it formal, and @hasufell (and Nikita) have convinced me that correctly-versioned internal packages are the way to go. I’ll do that for opaleye at some point in the future.

Please don’t give cabal more work to do :joy_cat:

I don’t think annotations will solve anything. We need to accept that what’s exposed is not internal and what’s internal must not be exposed.

We already had this story wrt base: anything that is exposed will be used and relied upon.

Exposing implementation details is a bad idea because your users will break your invariants and will cause your library to fail.

Relying on implementation details as a user is also a bad idea, roles swapped: the library will change and break your code (if you haven’t already shot yourself in the foot).

I think this has nothing to do with versioning, at least in principle. It’s just good engineering practice.

4 Likes

+1

On the one hand, yes, definitely. On the other hand, absolutely not! I have been frustrated so many times by being prevented from making progress by some internal “helpfully” being hidden from me. I’m an adult! Let me use internals if I want to.

1 Like

With all due respect, I disagree. The phrase “We’re all consenting adults here” is well-known in the Python community. It has been used to explain why Python does not have private class methods or attributes. Let’s just say that is not the ecosystem I look up to :slight_smile:

You can indeed use internal modules (when they are exposed, but also when they are not); you can also use accursedUnutterablePerformIO if you feel like it. Nevertheless, I don’t consider relying on these things good practice, and I would not accept them in a codebase I am responsible for.

2 Likes

It’s definitely not good practice! I don’t want to rely long term on other people’s internal modules. But I do want a 1 minute solution that lets me move forward with my prototype rather than a 1 week or 1 month “correct” solution that round trips through a request to the maintainer to expose an otherwise-safe internal.

3 Likes

The situation we have today is:

  • folks who do not want to expose internal stuff can avoid doing so just fine;
  • those who consider exposing it sometimes useful can do so;
  • we have a rather loose convention of using .Internal modules, which some follow.

If we then try to rely on the PVP to reduce breakage, versioning the exposed API of our library but ignoring internal (yet exposed) bindings, we run into the current situation.

We seem to all agree on some form of signalling for internal, yet exposed, bindings (whether that be at the module level or not). We certainly do not advocate for every top-level binding being exposed, though.

1 Like

I definitely advocate for every top-level binding being exposed! That’s what opaleye does (through .Internal modules). I don’t want to stop my users making quick progress by hacking around with internals whilst we take the time to deliberate a proper, stable, non-internal solution. rel8 is a client of opaleye and iterates quickly using opaleye’s internals, with great relish and great success. From time to time we take stock and stabilise the interfaces rel8 uses. This is a very effective way of working (and a very sharp implement that must be used with care).

Where I have changed my mind is on excluding .Internal modules from the PVP. @hasufell and Nikita have persuaded me that the correct thing to do is to implement all of opaleye in a (true-to-PVP) opaleye-internal package (one that exposes every module and every top-level identifier), and have the opaleye package just be re-exports of the non-internal modules.
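
A minimal sketch of that layout, with made-up module and identifier names (the real split would of course be much larger):

    -- in opaleye-internal: every module and every top-level identifier
    -- is exposed, and the package is versioned against all of it per the PVP
    module Opaleye.Internal.Table (Table (..), mkTable) where

    newtype Table = Table { tableName :: String }

    mkTable :: String -> Table
    mkTable = Table

    -- in opaleye: the stable package is nothing but re-exports of the
    -- non-internal surface; the constructor stays hidden here
    module Opaleye.Table (Table, mkTable) where

    import Opaleye.Internal.Table (Table, mkTable)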

4 Likes

This last point seems like a very good motivation for sublibraries. If the whole library is the internals and the stable API is just re-exports, you’d definitely want those to remain in sync.

I thought so, but then @hasufell explained why it’s undesirable: it bumps the stable library much more than necessary!

3 Likes

This is a great ideal, but it doesn’t work in practice. At times, relying on implementation details is not just a good idea, it may literally be the only possible way something can be done.

A good example is a recent bug in bytestring’s decodeUtf8 function, which caused it to accept invalid byte sequences as valid UTF-8. The bug was reported and fixed for the next version, which is great.

While waiting for that process and for the next version, it would be nice to stay up to date with bytestring, GHC, and everything else. I wouldn’t want this bug to hold updates back, pinning us to a version from before it was introduced. After all, this is but a small hitch in an otherwise super useful set of updates.

The solution? Use bytestring internals to whip up a version of this function that doesn’t have the bug, and then live a happy life running the latest GHC, bytestring, and all that good stuff. Later, once the fix is released and we’re ready to update again, we remove our internal implementation.
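
As a hedged sketch of what such a stop-gap can look like (hand-rolling the validity check on top of bytestring’s unsafe indexing, rather than reproducing the actual upstream fix; all names here are illustrative):

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Unsafe as BSU  -- the escape hatch
    import Data.Word (Word8)

    -- RFC 3629 validity: rejects overlong forms, surrogates, and > U+10FFFF
    isValidUtf8' :: BS.ByteString -> Bool
    isValidUtf8' bs = go 0
      where
        n = BS.length bs
        b :: Int -> Word8
        b = BSU.unsafeIndex bs  -- no bounds check; guarded in inR/go below
        inR lo hi i = i < n && b i >= lo && b i <= hi
        go i
          | i >= n                     = True
          | b i <= 0x7F                = go (i + 1)        -- ASCII
          | b i >= 0xC2 && b i <= 0xDF = inR 0x80 0xBF (i + 1) && go (i + 2)
          | b i == 0xE0                = cont2 0xA0 0xBF i -- no overlongs
          | b i >= 0xE1 && b i <= 0xEC = cont2 0x80 0xBF i
          | b i == 0xED                = cont2 0x80 0x9F i -- no surrogates
          | b i >= 0xEE && b i <= 0xEF = cont2 0x80 0xBF i
          | b i == 0xF0                = cont3 0x90 0xBF i -- no overlongs
          | b i >= 0xF1 && b i <= 0xF3 = cont3 0x80 0xBF i
          | b i == 0xF4                = cont3 0x80 0x8F i -- cap at U+10FFFF
          | otherwise                  = False             -- bad lead byte
          where
            -- first continuation byte in [lo, hi], the rest in [0x80, 0xBF]
            cont2 lo hi j = inR lo hi (j + 1) && inR 0x80 0xBF (j + 2)
                              && go (j + 3)
            cont3 lo hi j = inR lo hi (j + 1) && inR 0x80 0xBF (j + 2)
                              && inR 0x80 0xBF (j + 3) && go (j + 4)

A wrapper can then run this check before handing the bytes to the library decoder, and the whole thing gets deleted once the fixed release is out.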

Had bytestring not exported these symbols, the alternatives would have to be crazier:

  • Do not upgrade at all until the fix is released and stable
  • Fork (or manually pin or patch) bytestring, recompile ghc and all deps
  • Use a different library; with bytestring this is a particularly hard sell
  • Use some unholy hack to force the export of those symbols anyway (I have seen such tricks before, involving TemplateHaskell)

I honestly think the way this is going is the best way. We’re happy with a stop-gap solution, and bytestring can do things at their own pace to get the next release out in due time.

It’s not about being adults, it’s about being able to do the things you need to do to make your program work.

3 Likes

I think the way to reconcile @FPtje’s practical approach with @andreabedini’s reasonable desire that we don’t become like Python is to notice that the standard practice in the Haskell world is to only let internal details be accessed from places where they’re very clearly marked (in .Internal modules, for example). Nothing like that can exist for Python, because nothing can stop you getting into the internals of any object or module.

Oh, yes, of course. This is an interesting idea, all the same, so I may apply the principle in some of my projects and see how it plays out. I could see it easing the process of figuring out which bits of an API are really stable and giving a fairly straightforward path to stabilization. It would reduce release churn, as well, since the “public” package need only widen its dependency range on the “internal” package in cabal metadata in the common case. Neat.
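
Concretely (with hypothetical version numbers): if a new major release of the internal package changes nothing that the facade re-exports, the public package only widens its dependency bounds, a change the PVP does not treat as major:

    -- in opaleye.cabal; previously: opaleye-internal >= 1.0 && < 1.1
    library
      build-depends: opaleye-internal >= 1.0 && < 1.2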

1 Like

Genuine question: how do other language ecosystems deal with this? This is very much not a Haskell-specific problem.

So far I think we have:

  • Python: nothing is private

I can add:

  • Java: you can use reflection to access private methods, but this is pretty clearly “evil”

Any others? Does anyone solve this in an actually satisfactory way?

4 Likes

I agree that the proper analogy for package modularity in Haskell is “object” access in OO languages. I would add that the friend keyword in C++, as well as the protected keyword, gives some related solutions: notably, friend lets a class forward-declare access so that only “intended” consumers can reach its internals, and protected operates similarly, but with regard to inheritance. It’s worth noting that the friend keyword was apparently confusing enough that Java ditched it altogether. (And the protected keyword doesn’t give much guidance, since thankfully we don’t have to worry about inheritance on top of everything else, at least!)

Of course, even if we had protected forward-declarations, we’d want them to be only a warning and not strictly enforced (otherwise we could just keep everything designated “internal” unexposed). And that seems to be a lot of work for fairly little payoff!

Ultimately the issue is straightforward: you can’t both make everything available for others to use and also ban others from using it. I think the two-packages-and-PVP solution is the best way to manage this that I’ve seen proposed for any language; outside of a bit of extra clunkiness it seems like a marvelous approach, with the arguments against it being rather flimsy.

1 Like

That’s a great question to ask. I think Elm can be added to the list. It has no separation between internal and public APIs, but it does have something interesting: its Publishing guide.

Essentially, it enforces semver through a program called elm bump, which decides the package version for you. As far as I know, it looks at the package’s entire API when bumping. It is very much impossible not to stick to the rules as set out by that program.

I think one could call it a satisfactory solution, because it eliminates a lot of discussion. It’s a very radical solution though.

Separation can be achieved with a system like this by having two packages: one for the internals, one for the API. Both have their versions decided by elm bump, but typically the API package will accumulate lots of patch versions, and the internal package lots of major versions.

Elm occupies a unique place on this spectrum of version-management solutions. It has its upsides and downsides, but it’s definitely worth mentioning.

1 Like

I still don’t understand why this follows. Even if the internal library gets lots of major version bumps very quickly, surely the stable library only needs to change its major version when its own API changes? It seems to me that you can reorganise the internal library as much as you want, but as long as the stable library keeps the same API, it doesn’t need to change versions.