But it is kinda meaningless (or even outright wrong) to do so, when you don’t know the original encoding.
You now have code points for the sake of having code points, not because you have a meaningful interpretation of the ByteString.
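Here is a minimal Haskell sketch of that, using the `bytestring` and `text` packages. The names `rawName`, `asUtf8` and `asLatin1` are made up for illustration. The same two bytes give one code point when read as UTF-8 and two different code points when read as Latin-1, and nothing in the bytes themselves tells you which reading is right.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical example: two raw bytes with no encoding information attached
-- (e.g. a filename as returned by the OS).
rawName :: BS.ByteString
rawName = BS.pack [0xC3, 0xA9]

-- Read as UTF-8: a single code point, U+00E9.
-- decodeUtf8' returns Left on invalid input, mapped to Nothing here.
asUtf8 :: Maybe String
asUtf8 = either (const Nothing) (Just . T.unpack) (TE.decodeUtf8' rawName)

-- Read as Latin-1: two code points, U+00C3 and U+00A9.
asLatin1 :: String
asLatin1 = T.unpack (TE.decodeLatin1 rawName)
```

Both results are "valid", which is exactly the problem: you get code points either way, but only one of them (if any) matches what the file's creator meant.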
It does.
If you look at the POSIX standard, the path separator is defined as a single byte in the ASCII range: `/` (0x2F). If you have some funny multi-byte encoding that contains such a byte, the system calls will not care and will treat it as a path separator.
Quote from the POSIX Pathname documentation:

> Additionally, since the single-byte encoding of the `<slash>` character is required to be the same across all locales and to not occur within a multi-byte character, references to a `<slash>` character within a pathname are well-defined even when the pathname is not a character string.
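A minimal sketch of what that means in practice, assuming a raw POSIX path held in a `ByteString`. `splitPathBytes` is a hypothetical helper, not the `filepath` API. Splitting on the byte 0x2F needs no knowledge of the encoding, and bytes that aren't valid UTF-8 just pass through untouched.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- The POSIX path separator: the single byte 0x2F, the same in every locale.
slashByte :: Word8
slashByte = 0x2F

-- Hypothetical helper: split a raw path into its non-empty components,
-- working purely on bytes. Nothing is decoded, so arbitrary non-UTF-8
-- bytes survive unchanged. (Simplified: it drops the root and empty
-- components, unlike filepath's own splitting functions.)
splitPathBytes :: BS.ByteString -> [BS.ByteString]
splitPathBytes = filter (not . BS.null) . BS.split slashByte
```

For example, the bytes of `/tmp/caf\xE9//x` (a Latin-1 name, invalid as UTF-8) split into `tmp`, `caf\xE9` and `x` without ever asking what the encoding was.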
So you could argue that splitting on the Unicode code point `/` is actually wrong if there are multiple byte sequences that can translate to this code point (I didn’t check whether there are encodings that would satisfy that).
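One way to at least illustrate the concern (not a claim about any real locale encoding) is the overlong UTF-8 sequence `C0 AF`. It contains no 0x2F byte, so the kernel never treats it as a separator. A sloppy decoder that skips the overlong check still turns it into U+002F, while conforming decoders reject it. `naiveDecode2` is deliberately wrong and purely hypothetical.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Word (Word8)
import qualified Data.ByteString as BS
import qualified Data.Text.Encoding as TE

-- The overlong two-byte sequence C0 AF: note there is no 0x2F byte in it.
overlongSlash :: [Word8]
overlongSlash = [0xC0, 0xAF]

-- Hypothetical, deliberately broken decoder for one two-byte sequence
-- (110xxxxx 10yyyyyy) that omits the "result must be >= U+0080" check.
naiveDecode2 :: Word8 -> Word8 -> Char
naiveDecode2 b0 b1 =
  toEnum ((fromIntegral (b0 .&. 0x1F) `shiftL` 6) .|. fromIntegral (b1 .&. 0x3F))

-- A conforming decoder (text's decodeUtf8') rejects the same bytes.
strictAccepts :: [Word8] -> Bool
strictAccepts = either (const False) (const True) . TE.decodeUtf8' . BS.pack
```

`naiveDecode2 0xC0 0xAF` yields `'/'`, so code that decodes sloppily and then splits on the code point would see a separator where the system calls see none.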
The filepath extension character `.` is not defined by POSIX, so whether it is treated as a single ASCII byte or as encoding-sensitive is an arbitrary decision.
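For illustration, here is a sketch that makes the "ASCII byte" choice for `.` explicit. `splitExtBytes` is a hypothetical helper that works on a bare file name (no directory part). It is not the `filepath` implementation.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Assumption: the extension separator is the ASCII byte 0x2E.
-- POSIX does not define this; it is a convention.
dotByte :: Word8
dotByte = 0x2E

-- Split a single file name at its last 0x2E byte. The extension keeps
-- its leading dot; if there is no dot, the extension is empty.
splitExtBytes :: BS.ByteString -> (BS.ByteString, BS.ByteString)
splitExtBytes name = case BS.elemIndexEnd dotByte name of
  Nothing -> (name, BS.empty)
  Just i  -> BS.splitAt i name
```

With this choice, `foo.tar.gz` splits into `foo.tar` and `.gz` regardless of how the rest of the name is encoded.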
The fact that the POSIX standard implicitly advises using the portable character set for file paths, and that the path separator is in that single-byte character set, also drives the decision that `OsChar` is a `Word8` on POSIX systems. If you want to split at Unicode code point boundaries, you should first figure out what the actual encoding is, rather than shoving the bytes into some arbitrary internal representation and then pretending that will produce sensible output.
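If you do want code point boundaries, a minimal sketch of the "know your encoding first" approach could look like this. It assumes the bytes are supposed to be UTF-8, and `splitOnCodePointUtf8` is a hypothetical name. The decode step is explicit and can fail, instead of silently producing some code points anyway.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: split on a code point, but only after an explicit,
-- fallible decode with an encoding we assume (UTF-8 here). Bytes that are
-- not valid UTF-8 give Nothing rather than a silently "repaired" result.
splitOnCodePointUtf8 :: Char -> BS.ByteString -> Maybe [T.Text]
splitOnCodePointUtf8 c raw = case TE.decodeUtf8' raw of
  Left _    -> Nothing
  Right txt -> Just (T.split (== c) txt)
```

The caller then has to decide what to do with `Nothing` (keep working on bytes, try another encoding, report an error) instead of getting plausible-looking but wrong output.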
FilePath maintainers know what they’re doing
Could there be a different internal representation than raw byte arrays? Sure. Rust uses WTF-8 (not UTF-8b) for `OsString` on Windows. It has other trade-offs. And there are even successors/variants of it.
My guess is that you were kinda trying to reinvent that format, so you might find the specification interesting.
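For a feel of what WTF-8 does, here is a rough sketch of its encoding direction, from potentially ill-formed UTF-16 (as found on Windows) to bytes. The function names are made up, and the sketch skips everything else in the spec (decoding, concatenation rules). Valid surrogate pairs are joined into one code point, lone surrogates are kept, and every code point is then written the way UTF-8 would write it, including surrogates, which strict UTF-8 forbids.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Word (Word16, Word8)

-- Hypothetical sketch: potentially ill-formed UTF-16 code units to WTF-8 bytes.
utf16ToWtf8 :: [Word16] -> [Word8]
utf16ToWtf8 = concatMap encodeCodePoint . joinPairs . map fromIntegral

-- Combine a lead surrogate directly followed by a trail surrogate into one
-- supplementary code point; every other unit (incl. lone surrogates) passes through.
joinPairs :: [Int] -> [Int]
joinPairs (hi : lo : rest)
  | isLead hi && isTrail lo =
      0x10000 + ((hi - 0xD800) `shiftL` 10) + (lo - 0xDC00) : joinPairs rest
joinPairs (u : rest) = u : joinPairs rest
joinPairs [] = []

isLead, isTrail :: Int -> Bool
isLead u = u >= 0xD800 && u <= 0xDBFF
isTrail u = u >= 0xDC00 && u <= 0xDFFF

-- "Generalized UTF-8": the UTF-8 bit layout applied to any code point in
-- 0 .. 0x10FFFF, surrogates included.
encodeCodePoint :: Int -> [Word8]
encodeCodePoint cp
  | cp < 0x80 = [fromIntegral cp]
  | cp < 0x800 = map fromIntegral [0xC0 .|. (cp `shiftR` 6), cont 0]
  | cp < 0x10000 = map fromIntegral [0xE0 .|. (cp `shiftR` 12), cont 6, cont 0]
  | otherwise = map fromIntegral [0xF0 .|. (cp `shiftR` 18), cont 12, cont 6, cont 0]
  where
    -- Continuation byte carrying 6 bits of cp, starting at bit n.
    cont n = 0x80 .|. ((cp `shiftR` n) .&. 0x3F)
```

For example, a lone `0xD800` becomes the bytes `ED A0 80`, which a strict UTF-8 decoder would reject, while well-formed input (like the pair `D83D DE00`) comes out as ordinary UTF-8 (`F0 9F 98 80`).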
The original Abstract FilePath proposal doesn’t mention any of that, and I agree that leaving the damn bytes untouched seems like a compelling property to have for the internals.