If we look at the filepath documentation, this seems to me to say that you should almost always use OsPath over FilePath if the cost is the same.
As, effectively, System.IO.FilePath is just a synonym for [Data.Char.Char], I am not sure if "deprecate" is the correct term. In his associated blog, and elsewhere, Julian Ospald (@hasufell) has already gone the extra mile to urge people to make the change where it counts and explain the benefits of doing so. I think it is a good question to ask if the Haddock documentation for FilePath could helpfully say a little more than it currently does. It currently says:
-- | File and directory names are values of type 'String',
-- whose precise meaning is operating system dependent. Files
-- can be opened, yielding a handle which can then be used to
-- operate on the contents of that file.
type FilePath = String
(EDIT: This documentation is a direct lift of the text at Section 41.2 of the Haskell 2010 Report.)
If I understand the blog correctly, this would be more accurate:
-- | File and directory names can be represented by values of type
-- 'String'. However, the precise meaning is operating system dependent.
-- (In some cases, this can cause problems and other types have been
-- developed to avoid them; see the @filepath@ package.) Files can be
-- opened, yielding a handle which can then be used to operate on the
-- contents of that file.
type FilePath = String
Sure, the proposed documentation change is good, but why aren't the functions and the datatype deprecated with a warning like head?
head is not deprecated. Also: FilePath and head are part of the Haskell report.
There are people here better placed than me to answer your question, but this is mine:
Haskell is used in a variety of contexts. In some, the problems with [Char] that Julian's blog post identifies are mission critical. In others, they can be considered relatively unimportant.
An important context is education. There, at least initially, it can be useful to put certain complexity to one side, in order to reveal other things more plainly.
If you are confident that you will be working in an environment where directories and filepaths will involve only ASCII characters, then the problems that Julian identifies can be put to one side.
The premise is merely that for every operation that accepts/returns a FilePath there exists that same operation that accepts/returns an OsPath and there's a conversion in between. The default implicit conversion is encodeFS/decodeFS, so there's no point reaching for OsPath if that's all you're going to do anyway.
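To make the correspondence concrete, here is a minimal sketch of the round trip through encodeFS and decodeFS from System.OsPath. It assumes filepath 1.4.100 or later; joinViaOsPath is a hypothetical helper name for illustration, not part of the library.

```haskell
import System.OsPath (OsPath, decodeFS, encodeFS, (</>))

-- Hypothetical helper: convert two FilePath components to OsPath,
-- join them there, and convert the result back to FilePath.
-- encodeFS/decodeFS run in IO because, on POSIX, they consult the
-- current filesystem encoding; on Windows they use UTF-16.
joinViaOsPath :: FilePath -> FilePath -> IO FilePath
joinViaOsPath dir file = do
  d <- encodeFS dir
  f <- encodeFS file
  let p = d </> f :: OsPath
  decodeFS p
```

Calling joinViaOsPath "data" "report.txt" in GHCi should give back the joined path with the platform's separator. If this is all a program does with OsPath, it gains nothing over using FilePath directly, which is the point being made above.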
Also, as of right now, using OsPath is excruciatingly tedious, as none of the system functions are where you expect (gotta fetch them from the unix and Win32 packages and combine with CPP), general locale encoding is in GHC.IO.Encoding (internal module), and all the data conversions are performed at runtime (unless you use Template Haskell).
So, as with all things GHC, come back in five years, things may be different by then.
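For the Template Haskell route mentioned above, a minimal sketch using the osp quasi-quoter from System.OsPath might look like this. It assumes filepath 1.4.100 or later; the names configDir and configFile and the paths themselves are made up for illustration.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import System.OsPath (OsPath, osp, (</>))

-- The osp quasi-quoter validates and encodes the literal at compile
-- time, so no runtime conversion (and no IO) is needed for it.
configDir :: OsPath
configDir = [osp|etc/myapp|]

-- Hypothetical file name, joined with the usual </> operator.
configFile :: OsPath
configFile = configDir </> [osp|settings.toml|]
```

This only helps for paths known at compile time; anything arriving at runtime still has to go through one of the encode functions.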
Didn't know that existed and I don't see getArgs there, exciting.
Hmm yes, admittedly, I don't think there is a getArgs version that can return OsStrings yet. I remember a ticket somewhere about some functions that were still missing for that to work (which was being worked on). Let me see if I can find a reference somewhere.
edit: see e.g. Support for ospath/osstring · Issue #491 · pcapriotti/optparse-applicative · GitHub and the references therein.
That should be easy:
- unix/System/Posix/Env/PosixString.hsc at 4f7f16875c5e491722eda32df53a7b5c3094e4f9 · haskell/unix · GitHub
- win32/System/Win32/WindowsString/Console.hsc at 6b5bc0494e292ac122a99a8e20a8acc4bf2c7455 · haskell/win32 · GitHub
But keep in mind that the Windows version behaves slightly differently from the base getArgs (because it uses the actual Windows API).
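A rough sketch of how the two linked modules could be combined with CPP into one portable function is shown below. It assumes that both modules export a getArgs returning the platform string type (as the links suggest), that the unix and Win32 versions in use ship these modules, and that the OsString constructor is imported from System.OsString.Internal.Types. The name getArgsOs is hypothetical.

```haskell
{-# LANGUAGE CPP #-}

import System.OsString.Internal.Types (OsString (..))
#if defined(mingw32_HOST_OS)
import qualified System.Win32.WindowsString.Console as Platform
#else
import qualified System.Posix.Env.PosixString as Platform
#endif

-- Hypothetical wrapper: fetch the command-line arguments without
-- going through String. On each platform, Platform.getArgs returns
-- the platform string type (WindowsString or PosixString), which is
-- exactly what the OsString constructor wraps on that platform.
getArgsOs :: IO [OsString]
getArgsOs = map OsString <$> Platform.getArgs
```

As noted above, the Windows variant may not return exactly what base's getArgs would, so this is not a drop-in replacement on that platform.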
Yeah, you are correct about head: it only has a warning, sorry for the mistake. As for the report, it is 14 years old, and when it was made it already had many compromises. Currently I know of no Haskell implementations besides GHC, so compatibility is not relevant; if we decide something is bad, we gain nothing by keeping it.
…and from the final page of the associated paper:
That is for CLC to decide.
And I'll vote rather strictly against proposals that violate the standard.
So I have conflicting goals as filepath maintainer and CLC member.
I think we can make OsPath the de facto standard outside of base and slowly deprecate String/FilePath support from other core libraries. They are not obliged to support FilePath.
With "slowly" I mean 10+ years down the line.
FWIW, I use FilePath for documentation purposes.
That is, a String-ly type standing for a filename.
In my scenario, this file is never opened / read etc.
So IMHO, this type has a place.
Can you elaborate on that? Why couldn't you do that with OsPath/OsString? Or rather: why would FilePath be a better choice there than OsPath/OsString?
From a naming perspective, I feel the Os prefix is out of place in my setting:
I receive a filename which is a field in a json received by yesod web server.
The actual file is never opened - the name just gets processed and propagated to several places
in the output json.
I see. Whether a file gets opened/read/written/etc. is irrelevant for this discussion though; the main point is that String is not a good representation for paths. OsPath/OsString represent/store this data in a better way (more compact representation, better support for non-ASCII characters). I would think that that is relevant also in your scenario. I do agree that "FilePath" may be a nicer name than "OsPath", but ultimately that could just be solved using a type alias (i.e. "type FilePath = OsPath").
Note that it's an interesting problem how to encode OsPath for JSON.
If you binary serialize it precisely, it can't be deserialized between platforms (maybe that's a good thing?).
If you encode it as FilePath, you lose the original bytes. There's no simple answer. You have to understand the use case.
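The two options can be sketched without committing to a JSON library. The helper names below are hypothetical, and the list of code units is just one possible exact representation, chosen here for illustration; it assumes filepath 1.4.100 or later.

```haskell
import System.OsPath (OsPath, decodeUtf, pack, toChar, unpack, unsafeFromChar)

-- Option 1: readable but partial. decodeUtf insists on valid UTF-8
-- (POSIX) or UTF-16 (Windows); in the Maybe monad a failure becomes
-- Nothing, so such a path cannot be sent as text at all.
pathAsText :: OsPath -> Maybe FilePath
pathAsText = decodeUtf

-- Option 2: exact but platform-specific. The raw code units are
-- 8-bit on POSIX and 16-bit on Windows, so a list produced on one
-- platform is not meaningful on the other.
pathToUnits :: OsPath -> [Int]
pathToUnits = map (fromEnum . toChar) . unpack

-- Inverse of pathToUnits on the same platform. unsafeFromChar
-- truncates to the platform's unit width, so units taken from a
-- 16-bit platform would be silently cut down on an 8-bit one.
pathFromUnits :: [Int] -> OsPath
pathFromUnits = pack . map (unsafeFromChar . toEnum)
```

A serializer built on the second option would probably also want to record which platform produced the units, so the receiving side can refuse or translate them deliberately.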
You're phrasing it as if manipulating system paths is ancient magic and only GHC maintainers have access to the tomes describing the blood rituals.
Serialization itself is not a problem here: JSON can represent all UTF-8 code points, the RFC merely states that deserialization of surrogate points is implementation-defined. aeson chooses to convert them to replacement characters; that's another point where you'd lose the bytes.
It is thus the responsibility of the side that wants to serialize the OsPath to ensure that it can properly represent it, whether it be explicitly through a conversion (same as what FilePath does), or implicitly by guaranteeing the correct encoding through system setup.
Sorry, I don't understand what this means. What does anything have to do with GHC here?
The OsPath type has a platform CPP ifdef around an inner constructor, so trying to construct OsString (WindowsString x) on unix is a compile error.
newtype WindowsString = WindowsString { getWindowsString :: BS.ShortByteString }
newtype PosixString = PosixString { getPosixString :: BS.ShortByteString }
#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
type PlatformString = WindowsString
#else
type PlatformString = PosixString
#endif
newtype OsString = OsString { getOsString :: PlatformString }
Again I'm confused. PosixString is not guaranteed to be UTF-8 or any Unicode.
Yes, the conversion to FilePath has to make a choice on how to decode. I don't know what you mean with "same as what FilePath does". FilePath doesn't do any conversion on its own. base does at the FFI layer and assumes the filepath encoding matches the system locale.
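The choice of decoding can be made explicit in code. The following is a minimal sketch, assuming filepath 1.4.100 or later, that contrasts a strict decode with fixed encodings against decodeFS, which on unix uses the current filesystem encoding in the way base does. The names decodeStrict, decodeLikeBase and example are hypothetical.

```haskell
import System.IO (utf16le, utf8)
import System.OsPath (OsPath, decodeFS, decodeWith, encodeUtf)

-- Explicit choice: strict UTF-8 on POSIX, UTF-16LE on Windows.
-- Invalid input is reported (here collapsed to Nothing) rather than
-- silently reinterpreted.
decodeStrict :: OsPath -> Maybe FilePath
decodeStrict = either (const Nothing) Just . decodeWith utf8 utf16le

-- Locale-dependent choice: needs IO because the filesystem encoding
-- is looked up at runtime on POSIX.
decodeLikeBase :: OsPath -> IO FilePath
decodeLikeBase = decodeFS

-- Small usage example in the Maybe monad, with a made-up file name.
example :: Maybe FilePath
example = encodeUtf "notes.txt" >>= decodeStrict
```

Which of the two is appropriate depends on whether the consumer of the FilePath needs a faithful answer or merely something displayable.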
Not all standard libraries do that. It's questionable and the main reason the new types exist. You'd be subverting the benefits.
If you don't go the decode->encode->decode route and try to send OsPath directly over the wire, you need a strategy to deal with OsPaths that are e.g. sent from unix to windows (there might be valid cases of such).
WindowsString/PosixString are more sound.
Does that make it clearer?