Reviving the Abstract FilePath Proposal (AFPP) in user-space

Introduction

For anyone who isn’t familiar, the original Abstract FilePath Proposal is here: https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/abstract-file-path

In short, it tries to get rid of the type FilePath = String definition, which the proposal considers a mistake for various reasons.

Proposal

Since it’s unlikely that this proposal will ever make it into base, the alternative proposal is this:

  1. Write a library that implements the proposal as a new type AbstractFilePath with various utility functions (as opposed to replacing FilePath)
  2. Patch the unix package to provide System.Posix.Files.AbstractFilePath variants (we already have them for ByteString)
  3. Patch the Win32 package to provide a similar variant
  4. Write libraries using these types (filepath and directory aren’t forced to migrate)
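
To make step 1 a bit more concrete, here is a minimal sketch of what such a type could look like. It only models the POSIX side (raw bytes) and is not the API of the actual library; the names fromBytes and toBytes are made up for illustration.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import Data.Word (Word8)

-- Hypothetical sketch: an opaque path that is just the raw bytes the
-- OS handed us. No encoding is assumed and nothing is decoded.
newtype AbstractFilePath = AbstractFilePath ShortByteString
  deriving (Eq, Ord, Show)

-- Wrap raw bytes without inspecting them (illustrative name).
fromBytes :: [Word8] -> AbstractFilePath
fromBytes = AbstractFilePath . SBS.pack

-- Get the raw bytes back, unchanged (illustrative name).
toBytes :: AbstractFilePath -> [Word8]
toBytes (AbstractFilePath b) = SBS.unpack b
```

Because nothing is decoded, even bytes that are invalid in every text encoding survive a round trip, which String based paths cannot guarantee in general.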

Existing work

I’ve already taken a stab at 1:

The big advantage for me personally is that there are quite a few use cases where I can get away without knowing the filename encoding. But I can only ignore the encoding if we’re dealing with untampered byte strings. If this moves forward, I’m planning to migrate some of my libraries to this type. Those libraries are alternatives to filepath and directory, and I currently use them in all my projects.
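
As an illustration of an encoding-agnostic use case, here is a rough sketch of extracting the last path component purely at the byte level. It assumes POSIX conventions, where the byte 0x2F ('/') is the separator regardless of the filename encoding. The function name takeFileNameBytes is hypothetical and the newtype is a stand-in, not the real library type.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import Data.Word (Word8)

-- Stand-in for the proposed type (assumption: POSIX, raw bytes).
newtype AbstractFilePath = AbstractFilePath ShortByteString
  deriving (Eq, Show)

-- The POSIX path separator '/' as a byte.
separator :: Word8
separator = 0x2F

-- Take everything after the last separator byte. The bytes of the
-- file name are never decoded, so their encoding does not matter.
takeFileNameBytes :: AbstractFilePath -> AbstractFilePath
takeFileNameBytes (AbstractFilePath b) =
  AbstractFilePath
    . SBS.pack
    . reverse
    . takeWhile (/= separator)
    . reverse
    $ SBS.unpack b
```

Going through a list of bytes is of course not efficient; the point is only that no decoding step is needed.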

Discussion

The main question is whether unix and Win32 would accept patches for this as new API variants. One of the reasons for AFPP is to avoid roundtripping through String or ByteString, so there’s not much point if we need to convert from unpinned ShortByteString before the syscall.
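
To show the kind of conversion I mean, here is a small sketch that passes a ShortByteString to a C function. It uses strlen as a stand-in for a real syscall wrapper, and pathLengthViaC is a made-up name. useAsCString copies the bytes into a temporary NUL-terminated buffer before the call, because the ShortByteString itself is unpinned.

```haskell
import qualified Data.ByteString.Short as SBS
import Foreign.C.String (CString)
import Foreign.C.Types (CSize(..))

-- strlen stands in for a syscall that takes a C string path.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper: the path bytes are copied into a temporary
-- buffer by useAsCString before the C function can see them.
pathLengthViaC :: SBS.ShortByteString -> IO Int
pathLengthViaC p = SBS.useAsCString p (fmap fromIntegral . c_strlen)
```

API variants in unix and Win32 that take the new type directly would be the place to avoid or at least centralize this extra step.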

I think this proposal doesn’t really need complete consensus, since it’s backwards compatible and just an alternative API. It won’t change base, it won’t break existing code and it won’t force anyone to use it, if they’re not interested. And even if someone uses a library utilizing AbstractFilePath against their will, they can still convert to String and use the other APIs.
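
For the conversion escape hatch, a minimal sketch could look like the following. It assumes the path bytes are UTF-8, which is exactly the kind of assumption the abstract type lets you postpone until you really need a String. The function names are hypothetical and the newtype is a stand-in; I would expect decoding to fail on bytes that are not valid UTF-8.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.ByteString.Short (ShortByteString)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (utf8)

-- Stand-in for the proposed type (assumption: POSIX, raw bytes).
newtype AbstractFilePath = AbstractFilePath ShortByteString
  deriving (Eq, Show)

-- Decode the raw bytes as UTF-8 (assumption) to get a legacy FilePath.
toFilePathUtf8 :: AbstractFilePath -> IO FilePath
toFilePathUtf8 (AbstractFilePath b) =
  SBS.useAsCStringLen b (GHC.peekCStringLen utf8)

-- Encode a legacy FilePath as UTF-8 (assumption) to get raw bytes.
fromFilePathUtf8 :: FilePath -> IO AbstractFilePath
fromFilePathUtf8 s =
  AbstractFilePath <$> GHC.withCStringLen utf8 s SBS.packCStringLen
```

So a codebase that is still String based only pays for the conversion at the boundary where it calls into an AbstractFilePath based library.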

I’m also interested to hear opinions about the current implementation. Please see the rendered documentation.

I’d also like to hear @snoyberg’s opinion, specifically, since he’s one of the authors of the original proposal.

Just for the record: I more cosigned the original proposal than authored it. I only point out that distinction because there may be nuances here that I’m not fully aware of.

Overall: yes, I’m totally in favor of this general kind of direction. I’d like to see if it has a chance of moving forward somehow.

In the Vector Types Proposal, we’re currently talking about putting the shared types into primitive as a “not-quite-base” place but still accessible enough to other packages. Perhaps some kind of shared type can go there.

It may be worthwhile discussing what a path to wider adoption may look like. But initially having a standard, semi-blessed better approach would be a huge step in the right direction.

Why is it unlikely? Is there something like a list of points for and against this change?

In the Vector Types Proposal, we’re currently talking about putting the shared types into primitive as a “not-quite-base” place but still accessible enough to other packages. Perhaps some kind of shared type can go there.

I’d be all in favor of putting it wherever it’s most convenient.

It may be worthwhile discussing what a path to wider adoption may look like. But initially having a standard, semi-blessed better approach would be a huge step in the right direction.

Yeah, I think the original idea of putting it straight into base is just too hard to get right and there would likely be massive migration issues, despite GHC warnings etc. The proposed alternative is less invasive and allows us to explore this type of new API with reasonable overhead IMO (like the additional type-specific modules for unix, Win32 and possibly directory at some point).

So, after there is support from unix and Win32, I think other consumer libraries (like directory and process) have two options:

  1. Provide API variants via new modules for AbstractFilePath
  2. Switch to AbstractFilePath completely and direct users to the conversion functions if their codebase is still FilePath based

Option 1 is likely something that we want to see for directory at some point. Other packages that are less deep in the dependency hierarchy may go for option 2.
Depending on how the ecosystem moves forward with adoption, we can start deprecating FilePath based APIs and have more packages go for option 2.

One way to go about this would probably be to aim for a Stackage release where the majority of packages are AbstractFilePath based or support it. I’m not sure whether there’s a way to create more incentives for library maintainers to switch or accept patches.

The advantages of this approach are:

  • we don’t really break anyone’s code, because it’s a new type
  • migration can be done gradually
  • we don’t need to get it into base from the beginning, which is much harder and hasn’t been achieved so far
  • we can consider this as a proof of concept for theoretical base-adoption and can still reserve the possibility of killing off FilePath in the future

The only disadvantages I see are:

  • There will be some overhead for certain packages maintaining two APIs
  • As a result, it’s likely that one’s codebase has to deal with both types at the same time for a certain period of time

These disadvantages are relatively small, IMO, compared to the alternative of not achieving adoption at all.

Why is it unlikely? Is there something like a list of points for and against this change?

I already described most points in my previous comment. But the main thing is that AFPP in base is a huge breaking change and requires much more consensus, coordination and GHC support than the alternative proposal.

I don’t think we need to start in base to get this change underway. Baby steps!

As a newbie, and worse, a newbie on Windows, I’m not sure this is the right place to comment, but . . .

I recently bailed out on trying to get the Phoityne debugger working on VS Code, because, after days of struggles, I narrowed the problems down to one issue. But it’s really a class of issues: filepaths with elements containing embedded spaces. Different issues for different tools, at different levels.

I’d get tools breaking like “C:/Users/Michael not found” because my username on my Windows box is “Michael Turner” and after expanding $(HOME) or ~ or whatever . . . well, you can guess the rest. A problem in ghci too, though I vaguely remember finding a workaround on the ghci command line. Since I’m on Japanese Windows, who knows what would have happened if I’d decided on the username マイケル・ターナー, which I’d probably have if I was working in a Japanese corporation.

I was actually briefly inspired to try to go into the code and fix this embedded-blank-in-filepaths problem, but then remembered something: I don’t know nearly enough Haskell yet. Also, I have no idea what I’d break.

What I’d hope to see come out of this:

(1) an AbstractFilePath that’s aware of the possibility of embedded blanks, and that makes it possible to use embedded-blank filepaths with existing tools (even if painful – just making it possible is a start), and

(2) guidelines for helpful Windows-friendly package authors, such as the author of Phoityne, so that when their tools emit filepaths across various protocol boundaries, they know how to bracket them in an appropriate, filepath-preserving way.
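
For what it’s worth, the failure mode itself is easy to sketch. The snippet below is only a guess at the kind of thing that goes wrong, not the actual code of any tool: a command line built as one string and split on whitespace loses the path, while a list of arguments keeps it intact. The flag --load and both function names are made up for illustration.

```haskell
-- Naive approach (assumed failure mode): treat the command line as
-- one string and split it on whitespace.
naiveArgs :: String -> [String]
naiveArgs = words

-- Safer approach: keep each argument as its own list element, so a
-- path with embedded blanks is never split. "--load" is a made-up flag.
buildArgs :: FilePath -> [String]
buildArgs path = ["--load", path]
```

With the path C:/Users/Michael Turner/Main.hs, naiveArgs cuts the path in two at the blank, while buildArgs hands it over as a single argument.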
