[RFC] "http-types" breakage / additions / rework

I am requesting comments from everyone who has experience in web development, networking and/or has strong opinions about library stability/documentation/improvement.

Background info

It’s been a few years since I’ve taken over maintainership of the http-types package. I had quickly released version 0.12.4 (4 years after the then latest release) which was mostly a documentation update and releasing what was already merged in the main branch, and I now have some time and space to work more on this package.
There were some small additions here and there that were easy to implement from issues on the previous repo, so together with a more involved test suite, we now have 0.12.5.

I mainly took over as maintainer because I’ve found lots of functions specific to handling headers in the wai-extra/warp packages, which I felt would be better suited to a package that is more generally about HTTP than to server-specific libraries.
I asked around on the Discourse, and @Kleidukos advised that it’d be best to add those helper functions to the http-types library, which, in my opinion, is indeed probably the best location.

If anyone else has a better idea/different opinion, do let me know, because we now come to the main reason why I am Requesting For Comments:


The main issue(s)

Just adding extra functions to http-types is one thing, and this will probably happen regardless, but while going through the code and especially the type definitions, I found that the API is at best “good enough”.
There’s a bunch of type synonyms instead of newtypes, and all of the data constructors and fields of data types are exported. This doesn’t help with stability/maintainability, and some definitions are just asking for users to make mistakes.

I’ll take Headers as an example, but I might rework all modules depending on if I find ways to improve the API or performance.

My issue with Headers

Having definitions like

type HeaderName = CI ByteString
type Headers = [(HeaderName, ByteString)]

works, but it also has a lot of shortcomings:

  • changing the implementation will almost by definition break every location where they are used, since the functions creating and changing them come from case-insensitive or Data.List.
    e.g. if HeaderName is no longer a CI a, hello compiler errors.
  • CI ByteString has no guarantee to be a valid HeaderName. Header Field Names are only allowed to be a subset of the ASCII visible characters.
  • A “list of headers” is true to the definition of HTTP Fields, but letting all users use functions from Data.List to manipulate the headers makes it easy to create bad header lists, e.g. by introducing duplicate headers, which is bad because “a sender MUST NOT generate multiple field lines with the same name […] unless that field’s definition allows multiple field line values to be recombined as a comma-separated list” - RFC 9110 §5.3
  • Using a linked list also incentivises adding to the front, which is not bad per se, but you typically want more connection-specific headers to be at the front for HTTP/1.x
  • There might be better data structures than a linked list to make handling headers more efficient.
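
To make the first two points concrete, here is a minimal sketch of an opaque header name with a validating smart constructor. It is not the implementation being proposed: `mkHeaderName` and `isTokenChar` are hypothetical names, and a lowercased ByteString stands in for whatever representation the library ends up using. The allowed characters follow the `tchar` rule from RFC 9110.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.ByteString (ByteString)
import Data.Char (isAlphaNum, isAscii, toLower)

-- | Hypothetical opaque header name. In a real library the constructor
-- would not be exported, so the representation could change without
-- breaking users.
newtype HeaderName = HeaderName ByteString
  deriving (Eq, Ord, Show)

-- | RFC 9110 'tchar': the characters allowed in a field name.
isTokenChar :: Char -> Bool
isTokenChar c = isAscii c && (isAlphaNum c || c `elem` "!#$%&'*+-.^_`|~")

-- | Smart constructor (assumed name): rejects empty names and names with
-- non-token characters, and stores the name lowercased so that (==)
-- behaves case-insensitively.
mkHeaderName :: ByteString -> Maybe HeaderName
mkHeaderName bs
  | B.null bs = Nothing
  | B8.all isTokenChar bs = Just (HeaderName (B8.map toLower bs))
  | otherwise = Nothing
```

With a type like this, `mkHeaderName` applied to "Content-Type" and to "content-type" yields equal values, while a name containing a space is rejected at construction time instead of surfacing later on the wire.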

What to do about it

I’ve been working on improving HeaderNames and Headers, and I feel I’ve got a better API and also some promising benchmarks. I’d also like to improve other types where I can so that at least the HTTP types part of HTTP requests and responses are as fast as possible, which would be a boon to any user/company using Haskell for web development.
Once I’ve finished the test suite for this new implementation, I’d like to publish it as a candidate to hackage and hope some people will give me some feedback on it, but for now the following are more pressing matters:

Ecosystem hurdles

Looking at the reverse dependencies of http-types has shown me a few problems:

  • Because of how old this library is, and how little it has changed, a lot of dependency constraints from packages depending on http-types are either completely missing, or have no upper bound. (e.g. yesod-core: http-types (>=0.7), amazonka: http-types (>=0.12), or hpack’s naked http-types)
  • Because of the above-mentioned points on type synonyms, there’s no easy way to set up the migration. There will be breakage.
  • Adding this new implementation to http-types alongside the old implementations (e.g. in other modules like Network.HTTP.Types.Header.New) might result in confusing errors, but replacing the old modules with completely new types and functions will force all packages depending on http-types to change to the new types. (which is not strange for a major version release, but the scale of this impact makes me pause for obvious reasons)
  • Creating a http-types-compat package is a possible solution so that packages depending on http-types can support all the versions before and after this change, but this would still incur work for all down-stream packages to use the types and functions from the new packages, instead of CI and Data.List.

The big ask

So I ask advice on how to proceed. There are a few options I can see right now:

  1. Create a new major-major version of http-types (i.e. http-types-1.0.0) that is completely different from the current implementation.
    • This will break every package that doesn’t have an upper bound
    • If this is implemented in e.g. wai and warp, it will force all packages downstream to also update, or they won’t be able to use the newer wai/warp versions.
  2. Add the new implementation in other modules next to the old implementations in http-types (probably as http-types-0.13.0) to give users the option to use which one they want.
    • This will not break anything by publishing the library, but might lead to confusion when, for example, wai starts using the new types, which have the same name. Users will get errors like expected `HeaderName` but got `CI ByteString` and then everyone will need to adapt to the new implementations anyway.
  3. Create a completely new package, like http-utils.
    • This might split the ecosystem into packages using http-types and those using http-utils, but at least it’s obvious that the two are not compatible.

Specifically with regard to refactoring Headers, it may be prudent to not make the representation too constrained. While an application that is strictly conformant with the HTTP specifications must abide by the rules, the same is not necessarily true when parsing response headers from a less meticulously implemented peer. Received headers may well include duplicates, or include characters outside the permitted range.

So (CI ByteString, ByteString) has the advantage of being able to represent whatever some response might throw at you, which may not be true of a more constrained representation.

The one thing that one might change now is to use unpinned ShortByteString instead of ByteString values, but it is not clear that the small improvement is justified by the cost it might impose on all the users.

So my advice would be to provide and promote functions that one could use to construct and/or validate headers and sets of headers, but perhaps not change the representation in a way that precludes handling values that might lie outside the strict bounds of the HTTP specification.
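
As a rough illustration of validating without constraining the representation, the sketch below checks a loose header list for repeated names and leaves the list itself untouched. All names here are hypothetical: `RawHeaders` is a plain ByteString stand-in for the existing `[(CI ByteString, ByteString)]` (to avoid extra dependencies), and the caller supplies the predicate that says which fields are allowed to repeat.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.ByteString (ByteString)
import Data.Char (toLower)
import Data.List (group, sort)

-- | Stand-in for the existing loose representation: any bytes are allowed,
-- including duplicates and invalid names received from a sloppy peer.
type RawHeaders = [(ByteString, ByteString)]

-- | Lowercased names that occur more than once and are not accepted by the
-- caller-supplied 'mayRepeat' predicate (e.g. list-valued fields).
-- The input is only inspected, never rejected or rewritten.
repeatedNames :: (ByteString -> Bool) -> RawHeaders -> [ByteString]
repeatedNames mayRepeat hs =
  [ n
  | (n : _ : _) <- group (sort (map (B8.map toLower . fst) hs))
  , not (mayRepeat n)
  ]
```

A sender could run such a check before emitting headers, while a parser handling a non-conforming peer can still hold on to the raw list and decide for itself how strict to be.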

Quick take: from my reading, it sounds disruptive enough that a new package, perhaps with a similar name, is justified/needed. That would allow building with packages which both have and have not yet been updated for the new API, I imagine. Also with helper functions being added, it’s no longer just http types, right?

What’s the point of versioning then?

I think it’s laudable but if people have, in spite of cabal check / hackage, not put upper bounds on their dependency for http-types, you cannot be held responsible.

Bump the major version, that’s the signal that there’ll be breakage, and you have fulfilled your part of the versioning contract with downstream users.

We have an A.B.C.D scheme where A indicates an epoch (radical changes, full revamp of the API) and B indicates a major version (breaking changes but still going in the same known direction). They’re not just for show and some people tend to treat the job of maintainer as forever-butler. Things change, be it people or APIs. That’s why we have version bounds. :slight_smile:

Which means there are two different concepts here:

  • List of raw headers as provided in the HTTP request;

  • Library-augmented set of headers served to the user at the high level.

Should the latter be in http-types, wai or somewhere else? Couldn’t tell you.

I wouldn’t be surprised if http-types is the only package using case-insensitive; everyone downstream is forced to include it just to work with header names.

What’s the point of versioning then?

What’s the point of separate packages, even ? :grinning_face:

Each has their uses; sometimes a greater separation is the right move. But it was just a quick take; I don’t know what’s best here.

The new implementations (the types) themselves don’t enforce any HTTP rules, though I only have strict parsing functions at the moment for some. So thank you for reminding me that lenient parsing functions might be a helpful addition.
The main benefit, though, is that the new implementations should be as fast or faster than the current ones, and that the functions manipulating them will make it easier to not make mistakes.
(the new HeaderName for example is a ByteArray which is 3x faster in making comparisons (i.e. (==)) than a ByteString, which also speeds up lookups)


The library already has helper functions since 2012, so it’s already more than just types, though I do get your point.
I’m hesitant to create a new library though, because it feels a lot like the XKCD comic about standards.


True, but seeing as this is such a fundamental package for web dev packages, just bumping the major-major version out of nowhere feels a bit as if base suddenly bumped to 5.0 without warning.
You have probably convinced me of just making this a major-major version, but I’ll probably first go around some major packages and inform the maintainers of the incoming change, so they will at least have a heads up to add bounds and such if they haven’t already. :thinking:


I’ve done a quick comparison of the reverse dependencies, and a rough estimate is that about half of all packages that depend on case-insensitive also depend on http-types. There might be more that only depend on wai, for example, and thus indirectly depend on http-types’s types.

binary-instances is a usual suspect for bringing case-insensitive into a build plan, more common than http-types.

My general perspective is that Haskell has become as good as it has because people thoughtfully push through the work to make things right (e.g., Semigroup => Monoid rework, Functor => Applicative => Monad rework). As Amazonka maintainer I am absolutely happy to help get upper bounds in place to prevent breakage before you do a new release. amazonka-*-2.1 is taking much longer than I’d like because of a house move and other things, but I can make sure it (and older versions) are released/revised with correct upper bounds, and CPP in support for a breaking http-types-1.0.0.0 if it comes to that.

I would suggest not overly prioritising raw performance; usability and correctness are higher priorities here. Header parsing is rarely on the critical path for HTTP clients and servers. Since header name comparison does need to be case-insensitive, direct memcmp()/strcmp() of byte arrays is not quite sufficient.

Since ShortByteString is just a thin wrapper around ByteArray, if that’s the path you’re taking, you should probably use those, with case-insensitive comparison as appropriate.

I would love to see any movement on the http-types front. It doesn’t feel like the package has found the sweet spot between usability, correctness and probably a lot of other aspects.

If you’re changing the api, I see 2 general directions:

  • Go even lower, no ci: use patterns and bytestrings. And add a few low level convenient helpers for e.g. list values and case manipulation. This is probably not worth the effort compared to the breakage it might cause downstream.
  • Go for Haskell’s real strength. Types guiding the programmer towards the correct usage by default. This would need some optional leniency for parsing and potentially even generating non-conforming data for bad clients? And probably also allow for adding some raw data manually to the otherwise conforming headers.

I’d love to see option 2, where for common headers I wouldn’t need to write helpers that correctly assemble data (e.g. cache headers, where multiple options may go on one line comma separated)

But I guess like most people here, we’re just happy that this library gets some much needed love. And that you’ve sent out a reminder for version bounds upfront.

My priorities are indeed correctness first, usability second, and performance third, but if I can improve performance, why not? :slight_smile:

What I have right now is a ByteArray with a “case bitmap”, so that the ByteArray is always lowercase, and only when using HTTP/1 or pretty printing you would create a case sensitive result. But I do have one more idea which I want to check before committing to the current implementation.
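
For readers who want to picture the “case bitmap” idea, here is a small sketch of one way it could work. This is an assumption-laden illustration, not the actual implementation: it uses ShortByteString as a stand-in for ByteArray and an Integer as the bitmap, and the names `fromOriginal`, `toOriginal`, `lowerBytes` and `caseBits` are made up.

```haskell
import Data.Bits (setBit, testBit)
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import Data.Word (Word8)

-- | Hypothetical header name: the bytes are always stored lowercase, and
-- bit i of the bitmap records whether byte i was uppercase originally.
data HeaderName = HeaderName
  { lowerBytes :: !ShortByteString
  , caseBits :: !Integer
  }
  deriving (Show)

-- | Equality ignores the bitmap, so it is a plain byte comparison.
instance Eq HeaderName where
  a == b = lowerBytes a == lowerBytes b

isUpperAscii :: Word8 -> Bool
isUpperAscii w = w >= 65 && w <= 90

-- | Lowercase the bytes and remember which positions were uppercase.
fromOriginal :: ShortByteString -> HeaderName
fromOriginal raw = HeaderName (SBS.pack (map lower ws)) bits
  where
    ws = SBS.unpack raw
    lower w = if isUpperAscii w then w + 32 else w
    bits = foldl mark 0 (zip [0 ..] ws)
    mark acc (i, w) = if isUpperAscii w then setBit acc i else acc

-- | Restore the original casing, e.g. for HTTP/1 output or pretty printing.
toOriginal :: HeaderName -> ShortByteString
toOriginal (HeaderName lo bits) =
  SBS.pack [if testBit bits i then w - 32 else w | (i, w) <- zip [0 ..] (SBS.unpack lo)]
```

The point of the split is that lookups and comparisons only ever touch the lowercase bytes, while the original spelling can still be reproduced when it matters.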

The main point will be to not expose internals, so that when we find more performant ways to handle things, updating won’t break user-space.

Yes that is a good approach. :slight_smile:

I agree with those who say you should break the API and bump the bounds. Many who don’t set upper bounds do it in full knowledge that major bumps can happen and they’re willing to take the risk.

I would suggest that whatever new API you decide on you implement as much of it as possible in the old major version, so that users of the package have a forward-compatible mitigation. That way they can write code that is compatible with both major versions and avoid propagating breakage up the dependency tree.
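
One way to read this suggestion, sketched under assumptions: the old major version ships the accessor functions of the new API, implemented on top of the existing list, so downstream code that only uses those functions keeps compiling when the representation changes. The function names below are hypothetical, and `Headers` is a plain ByteString stand-in for the current `[(CI ByteString, ByteString)]`.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.ByteString (ByteString)
import Data.Char (toLower)
import Data.Maybe (listToMaybe)

-- | Stand-in for the old list-based representation.
type Headers = [(ByteString, ByteString)]

-- | Case-insensitive name comparison (the real type gets this from CI).
sameName :: ByteString -> ByteString -> Bool
sameName a b = B8.map toLower a == B8.map toLower b

-- | First value stored under the given name, if any.
lookupHeader :: ByteString -> Headers -> Maybe ByteString
lookupHeader n hs = listToMaybe [v | (k, v) <- hs, sameName n k]

-- | Remove every field line with the given name.
deleteHeader :: ByteString -> Headers -> Headers
deleteHeader n = filter (not . sameName n . fst)

-- | Replace any existing lines for the name with a single new one,
-- so callers cannot introduce duplicates by accident.
setHeader :: ByteString -> ByteString -> Headers -> Headers
setHeader n v hs = (n, v) : deleteHeader n hs
```

Code written against `lookupHeader`, `deleteHeader` and `setHeader` instead of Data.List would not need to care whether `Headers` is a list or something else in the next major version.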

Do you mean the current major-major version? (i.e. release it as http-types-0.13.0)
Or as the current major version? (i.e. release it as http-types-0.12.6)

Or either, depending on if the additions constitute a minor or major version increase?

I totally understand the dissatisfaction with the current API, but breaking changes here seem like a major pain for little benefit. Earlier and stricter validation always causes problems in existing software (we just dealt with such a situation upgrading aeson, despite my having Hyrum’s Law top of mind and really trying to anticipate what could go wrong). And as others have pointed out, the loose (or an even looser) definition of Headers is likely necessary, if only to return a useful error message to naughty clients. It could also be informative to know how e.g. Chrome or nginx parse headers: how lax they are and where validation occurs.

Instead I would go with your approach of adding validation functions or a Safe module that enforces tighter guarantees (e.g. TH for header name literals, a parsing function from loose Headers to Safe.Headers). Let libraries transition to those if they have value. But even here I’m really struggling to imagine any concrete benefit we could get that would be worth any breakage, especially subtle breakage like learning Important Customer X uses a homegrown middleware that injects a header with a slash into every request and they simply couldn’t possibly change it because it would break their whole business.

Here are two projects that picked options 2 and 3:

  • filepath introduced a new OsPath type, but it’s not used in core libraries like binary and it requires an extra file-io dependency.
  • lucid2 introduced a new package but it has not been adopted (6 direct dependencies vs 82 for the old one).

Do we have other examples of such ecosystem changes?

Personally, I would prefer we don’t break such a core library, but if we are going to, then I vote for option 1. Would it be possible to provide automatic rewrite rules to help projects migrate to the new API?

Thanks!

I’m skeptical of the value proposition here, but I think that if the community decides this is valuable we should do it loudly and directly.

Inform all the major stakeholders upfront, give them a timeline to add bounds to their packages, then release the change as a major-major version.

Haskell prides itself on fearless re-factoring. We can make big changes :slight_smile: .

I’m not sure what you mean by the options you’re describing, but I’ll try to explain another way. If you’re releasing a new major version with breaking changes that will require users to use a new API, then as far as possible please try to make that new API available in the current major version too, so that users can adopt the new API without upgrading to the new major version.
