If OsPath is intended to replace FilePath, why isn't FilePath deprecated?

If we look at the filepath documentation, it seems to me to say that you should almost always use OsPath over FilePath if the cost is the same.


As, effectively, System.IO.FilePath is just a synonym for [Data.Char.Char], I am not sure if ‘deprecate’ is the correct term. In his associated blog, and elsewhere, Julian Ospald (@hasufell) has already gone the extra mile to urge people to make the change where it counts and explain the benefits of doing so. I think it is a good question to ask if the Haddock documentation for FilePath could helpfully say a little more than it currently does. It currently says:

-- | File and directory names are values of type 'String', 
-- whose precise meaning is operating system dependent. Files 
-- can be opened, yielding a handle which can then be used to 
-- operate on the contents of that file.

type FilePath = String

(EDIT: This documentation is a direct lift of the text at Section 41.2 of the Haskell 2010 Report.)

If I understand the blog correctly, this would be more accurate:

-- | File and directory names can be represented by values of type 
-- 'String'. However, the precise meaning is operating system dependent.
-- (In some cases, this can cause problems and other types have been
-- developed to avoid them; see the @filepath@ package.) Files can be
-- opened, yielding a handle which can then be used to operate on the 
-- contents of that file.

type FilePath = String

Sure, the proposed documentation change is good, but why aren’t the functions and the datatype deprecated with a warning, like head?

head is not deprecated. Also: FilePath and head are part of the Haskell report.


There are people here better placed than me to answer your question, but this is mine:

Haskell is used in a variety of contexts. In some, the problems with [Char] that Julian’s blog post identifies are mission critical. In others, they can be considered relatively unimportant.

An important context is education. There, at least initially, it can be useful to put certain complexity to one side, in order to reveal other things more plainly.

If you are confident that you will be working in an environment where directories and filepaths will involve only ASCII characters, then the problems that Julian identifies can be put to one side.


The premise is merely that for every operation that accepts/returns a FilePath there exists that same operation that accepts/returns an OsPath and there’s a conversion in between. The default implicit conversion is encodeFS/decodeFS, so there’s no point reaching for OsPath if that’s all you’re going to do anyway.
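
To make the conversion concrete, here is a minimal sketch of that round trip. It assumes a filepath version that ships System.OsPath (1.4.100 or later); `roundTrip` and the path names are made up for illustration.

```haskell
import System.OsPath (OsPath, decodeFS, encodeFS, (</>))

-- Hypothetical helper: convert two FilePath components to OsPath the same
-- way base does at the filesystem boundary, join them, and convert back.
roundTrip :: FilePath -> FilePath -> IO FilePath
roundTrip dir file = do
  dir'  <- encodeFS dir      -- FilePath -> OsPath, locale-dependent on POSIX
  file' <- encodeFS file
  decodeFS (dir' </> file')  -- OsPath -> FilePath

main :: IO ()
main = do
  joined <- roundTrip "some-dir" "notes.txt"
  -- The separator in the result depends on the platform.
  putStrLn joined
```

If this is the only thing a program does with OsPath, it gains nothing over plain FilePath, which is the point being made above.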

Also as of right now using OsPath is excruciatingly tedious, as none of the system functions are where you expect (gotta fetch them from unix and Win32 packages and combine with CPP), general locale encoding is in GHC.IO.Encoding (internal module), and all the data conversions are performed at runtime (unless you use Template Haskell).
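
For the compile-time route, filepath provides the `osp` quasi-quoter. A small sketch, assuming filepath 1.4.100 or later; `configDir` and the file names are hypothetical:

```haskell
{-# LANGUAGE QuasiQuotes #-}

import System.OsPath (OsPath, encodeUtf, osp, (</>))

-- Built and validated at compile time; no runtime conversion needed.
configDir :: OsPath
configDir = [osp|etc/myapp|]

main :: IO ()
main = do
  -- The runtime equivalent, which can fail and therefore lives in a monad.
  runtimeDir <- encodeUtf "etc/myapp"
  print (runtimeDir == configDir)
  print (configDir </> [osp|settings.toml|])
```

This only helps for paths known at compile time; anything coming from user input still goes through a runtime conversion.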

So, as with all things GHC, come back in five years, things may be different by then.

No, you don’t; you can simply use file-io (“Basic file IO operations via 'OsPath'”).


Didn’t know that existed and I don’t see getArgs there, exciting.

Hmm yes, admittedly, I don’t think there is a getArgs version that can return OsStrings yet. I remember a ticket somewhere about some functions that were still missing for that to work (and which were being worked on). Let me see if I can find a reference somewhere.

Edit: see e.g. Support for ospath/osstring · Issue #491 · pcapriotti/optparse-applicative · GitHub and the references therein.


That should be easy:

But keep in mind that the Windows version behaves slightly differently from the base getArgs (because it uses the actual Windows API).


Yeah, you are correct about head: it only has a warning. Sorry for the mistake. As for the report, it is 14 years old, and it already contained many compromises when it was written. Currently I know of no Haskell implementations besides GHC, so compatibility is not relevant. If we decide something is bad, we gain nothing by keeping it.

…and from the final page of the associated paper:

That is for CLC to decide.

And I’ll vote rather strictly against proposals that violate the standard.

So I have conflicting goals as filepath maintainer and CLC member.

I think we can make OsPath the de facto standard outside of base and slowly deprecate String/FilePath support from other core libraries. They are not obliged to support FilePath.

With “slowly” I mean 10+ years down the line.

FWIW, I use FilePath for documentation purposes.
That is, a String-ly type standing for a filename.
In my scenario, this file is never opened / read etc.
So IMHO, this type has a place.

Can you elaborate on that? Why couldn’t you do that with OsPath/OsString? Or rather: why would FilePath be a better choice there than OsPath/OsString?


From a naming perspective, I feel the Os prefix is out of place in my setting:
I receive a filename which is a field in a json received by yesod web server.
The actual file is never opened - the name just gets processed and propagated to several places
in the output json.

I see. Whether a file gets opened/read/written/etc. is irrelevant for this discussion though; the main point is that String is not a good representation for paths. OsPath/OsString represent and store this data in a better way (more compact representation, better support for non-ASCII characters). I would think that is relevant in your scenario too. I do agree that ‘FilePath’ may be a nicer name than ‘OsPath’, but ultimately that could be solved with a type alias (i.e. ‘type FilePath = OsPath’).

Note that how to encode OsPath for JSON is an interesting problem.

If you binary-serialize it precisely, it can’t be deserialized between platforms (maybe that’s a good thing?).

If you encode it as FilePath, you lose the original bytes. There’s no simple answer. You have to understand the use case.
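
A rough sketch of the two options, using PosixString so it behaves the same on every platform. It assumes the constructor exported from System.OsString.Internal.Types (an internal module, so this may shift between versions); the `toWire*`/`fromWire*` names are made up, and no JSON library is involved.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)
import System.OsString.Internal.Types (PosixString (..))
import qualified System.OsPath.Posix as Posix

-- Option 1: ship the exact bytes. Lossless, but the receiver has to know
-- they are POSIX bytes and cannot treat them as a native Windows path.
toWireBytes :: PosixString -> [Word8]
toWireBytes = SBS.unpack . getPosixString

fromWireBytes :: [Word8] -> PosixString
fromWireBytes = PosixString . SBS.pack

-- Option 2: ship text. Portable, but fails (Nothing) when the bytes are
-- not valid UTF-8, so some paths cannot be represented at all.
toWireText :: PosixString -> Maybe String
toWireText = Posix.decodeUtf

main :: IO ()
main = do
  let ok  = fromWireBytes [0x66, 0x6f, 0x6f]  -- the bytes of "foo"
      bad = fromWireBytes [0x66, 0x6f, 0xff]  -- 0xff never occurs in UTF-8
  print (toWireText ok)
  print (toWireText bad)
  print (fromWireBytes (toWireBytes bad) == bad)
```

Which option is right depends on whether the receiver needs the original bytes back or only needs something displayable.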

You’re phrasing it as if manipulating system paths is ancient magic and only GHC maintainers have the access to the tomes describing the blood rituals.

Serialization itself is not a problem here: JSON can represent all Unicode code points; the RFC merely states that deserialization of surrogate code points is implementation-defined. aeson chooses to convert them to replacement characters, which is another point where you’d lose the bytes.

It is thus the responsibility of the side that wants to serialize the OsPath to ensure that it can properly represent it, whether it be explicitly through a conversion (same as what FilePath does), or implicitly by guaranteeing the correct encoding through system setup.


Sorry, I don’t understand what this means. What does anything have to do with GHC here?

The OsPath type has a platform CPP ifdef around its inner constructor, so trying to construct OsString (WindowsString x) on Unix is a compile error.

newtype WindowsString = WindowsString { getWindowsString :: BS.ShortByteString }

newtype PosixString = PosixString { getPosixString :: BS.ShortByteString }

#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
type PlatformString = WindowsString
#else
type PlatformString = PosixString
#endif

newtype OsString = OsString { getOsString :: PlatformString }

Again I’m confused. PosixString is not guaranteed to be UTF-8 or any unicode.

Yes, the conversion to FilePath has to make a choice on how to decode. I don’t know what you mean with “same as what FilePath does”. FilePath doesn’t do any conversion on its own. base does at the FFI layer and assumes the filepath encoding matches the system locale.

Not all standard libraries do that. It’s questionable and the main reason the new types exist. You’d be subverting the benefits.

If you don’t go the decode->encode->decode route and try to send OsPath directly over the wire, you need a strategy to deal with OsPaths that are e.g. sent from unix to windows (there might be valid cases of such).

WindowsString/PosixString are more sound.
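
As a sketch of why: the platform-specific modules are available on every OS, so a receiver can interpret a path according to the platform it came from. The helper names below are hypothetical, and UTF-8 text input is assumed purely to keep the example short.

```haskell
import qualified System.OsPath.Posix as Posix
import qualified System.OsPath.Windows as Windows

-- Interpret the text as a POSIX path and extract the file name.
posixFileName :: String -> Maybe String
posixFileName s = Posix.encodeUtf s >>= Posix.decodeUtf . Posix.takeFileName

-- Interpret the same kind of text as a Windows path instead.
windowsFileName :: String -> Maybe String
windowsFileName s = Windows.encodeUtf s >>= Windows.decodeUtf . Windows.takeFileName

main :: IO ()
main = do
  print (posixFileName "/home/user/report.txt")
  print (windowsFileName "C:\\Users\\user\\report.txt")
  -- Backslash is not a separator on POSIX, so nothing is split off here.
  print (posixFileName "C:\\Users\\user\\report.txt")
```

With OsPath alone, the interpretation is fixed by the platform the code was compiled for.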

Does that make it clearer?