Informal discussion about the progression of `base`

How about the human equivalent of “local bindings”? For example,

Hey pair programming buddy, for the remainder of this session let’s agree to call <|> “the bar thing” and ^@. “the at thing”.

Great, now I think we fix this bug by defining a equals x the bar thing open paren y the at thing g close paren.

You can even literally define

theBarThing = (<|>)
theAtThing = (^@.)

in your source files somewhere, use them, and go through to replace them with the operator versions before committing.

I’ve never had to read things aloud (even when doing pair programming: I’ll just point at things with my finger),
but I understand some do and it can be a problem.

1 Like

We often do that. But standardized vocabulary makes it easier to get started on the task at hand, rather than spending time negotiating a common language.

I’m not really arguing either way for mandatory names for operators - I’m undecided as to the merits - but I do like learning useful ways of working with others. Probably a culture of canonical pronunciations in the Haddocks would be enough to serve my use case.

It’s like reading a paper in a group with others - it all flows nicer if there’s a sentence like “The form of judgment Σ;Γ ⊢ e : τ, which means that e can have type τ under global signature Σ and local context Γ, expresses …” instead of just dumping the syntax in directly.

4 Likes

This looks extremely ironic in the context of a language created to unify many local idioms referring to the same practice of lazy programming language development. We’re not above having names that everyone can use 🙂

2 Likes

This looks extremely ironic in the context of a language created to unify many local idioms referring to the same practice of lazy programming language development

Is it ironic? Why? Sure, Haskell was developed to coalesce a multiplicity of lazy languages. That doesn’t mean it’s ironic to have a multiplicity of lens libraries, effect systems, web frameworks, names for operators …

FWIW I’m mildly in favour of “all operators should be defined to have a name”.

2 Likes

All these lens libraries, effect systems, and web frameworks tackle their problems in different ways, often improving on the state of the art. A multiplicity of names, however, is quite harmful for communication and understanding. That is why we develop Ubiquitous Languages when working in teams.

This is also why societies develop legal corpuses, so that each contract does not have to rebuild the foundations of legal interactions from scratch each time two or more parties need to reach an agreement.

Seems like a reasonable line of thought, and as I say, I’m not opposed; I was just trying to give an answer to the question that David posed!

I hope it’s OK to revive this discussion - feel free to (re-)move this if you want me to open a new thread instead.

I guess I’ll write down my worst “type safety issue” from the last few weeks, one that was annoying and could have been a non-issue with a better base library (or rather: a better base ecosystem):

  1. threadDelay takes an Int, which has no semantic meaning (Control.Concurrent).
  2. Closely related: I used a TimeSpec in my environment/record, and its Num instance defaults to a different SI prefix (System.Clock).

(threadDelay: µs, TimeSpec: ns)

They are not really compatible. In this case I’d love it if the available threadDelay took something like a TimeSpec instead of an Int, or some class TimeDelta where I can have explicit units like data Microsecond = MkMicrosecond Int, so that no potentially buggy unit conversion is needed on my side.
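To make this concrete, here is a minimal sketch of what I have in mind. The Microseconds/Nanoseconds types and threadDelayMicro are made up for illustration; nothing like this exists in base:

import Control.Concurrent (threadDelay)

-- Hypothetical unit-carrying wrappers (not part of base).
newtype Microseconds = Microseconds Int
newtype Nanoseconds  = Nanoseconds Integer

-- The µs/ns conversion is written exactly once,
-- so it cannot silently go wrong at call sites.
toMicroseconds :: Nanoseconds -> Microseconds
toMicroseconds (Nanoseconds ns) = Microseconds (fromIntegral (ns `div` 1000))

-- A delay that only accepts an explicit unit.
threadDelayMicro :: Microseconds -> IO ()
threadDelayMicro (Microseconds us) = threadDelay us

main :: IO ()
main = threadDelayMicro (toMicroseconds (Nanoseconds 2000000000))  -- sleep for 2 seconds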

2 Likes

Some time ago I encountered exactly the same issue! It reminded me of the great Duration type from the Rust standard library.

I guess there must be a package for these things, but when you interact with a base function that doesn’t use the nice interface, why even bother adding yet another dependency for (sometimes) very little usage?

1 Like

The String/Text situation is clearly a problem that should be solved for good.

5 Likes

Why/how?

String and Text are not really isomorphic: https://play.haskell.org/saved/uy78OF1i

Char is basically just Int with an upper bound of 0x10FFFF (1114111), and String as such is an unencoded list of Unicode code points (including surrogates).
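Here is a minimal demonstration of that non-isomorphism (this may not be exactly what the playground snippet shows, but it is the essence): a lone surrogate is a perfectly fine Char, yet Text cannot represent it and substitutes U+FFFD.

import Data.Char (chr, ord)
import qualified Data.Text as T

main :: IO ()
main = do
  -- Char is essentially a bounded Int: code points 0 .. 0x10FFFF.
  print (ord (maxBound :: Char))    -- 1114111
  print (chr 955)                   -- '\955', i.e. 'λ'

  -- A lone surrogate is a legal Char, hence a legal String element ...
  let s = "\xD800"
  -- ... but Text replaces it with the replacement character U+FFFD,
  -- so pack/unpack is not the identity.
  print (T.unpack (T.pack s) == s)  -- False
  print (T.unpack (T.pack s))       -- "\65533"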

My opinion is:

  • Char is fine, it’s a subset of Int, but the name is unfortunate (UCodePoint or something else would have been more appropriate)
  • the String type synonym was a mistake. [Char] is something rather low-level… a nice encoding-agnostic representation, though

Other than these naming issues and the fact that most of the ecosystem uses this inefficient String representation, I don’t think we would be able to actually get rid of any of those types we have.

  • Char/String is low-level: unencoded Unicode code points, encoding-agnostic
  • Text is high-level and platform-agnostic (UTF-8 representation)
  • OsString is high-level, but platform-specific, to avoid encoding/decoding at the outer FFI layer (Rust has this type too)
  • ByteString is about bytes; Unicode and platform are irrelevant to it

These all have different properties. Converting between them isn’t just a matter of changing representation; sometimes it isn’t possible at all, or not without losing information.
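A small (non-exhaustive) sketch of that: going from ByteString to Text forces you to pick an encoding and can fail, and going the other way also requires an encoding decision.

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  -- ByteString -> Text needs an encoding and is partial:
  print (TE.decodeUtf8' (BS.pack [0xFF]))         -- Left ... (invalid UTF-8)
  print (TE.decodeUtf8' (BS.pack [0xCE, 0xBB]))   -- Right "\955", i.e. "λ"

  -- Text -> ByteString also needs an encoding decision (here UTF-8):
  print (BS.unpack (TE.encodeUtf8 (T.pack "λ")))  -- [206,187]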

4 Likes

I think the ‘String/Text issue’ refers to this specifically. Sure, there are times (albeit rare) when [Char] really is the best choice. But is it a good default? No, and it’s bad that base encourages people to use [Char] so much.

6 Likes

Already the fact that it’s called -XOverloadedStrings makes working with Text harder than it should be.

Well, the Haskell report mentions it explicitly:

The definition type String = [Char] is required to be exposed by the Prelude.

Hence, to solve the String issue, base would need to add a new encoding-agnostic string type, essentially an array of integers. Then getArgs, peekCString and other functions could return that instead of encoding their results into Strings.


That would put GHC at four string types that are all just array flavors, so I wonder if the correct solution would be moving towards a general low-level interface for arrays (à la primitive), and then using other types as mere wrappers.

And that would supersede ByteString, which I personally find quite jarring, since it presents itself as an 8-bit string type, but the ecosystem widely uses it as a general memory allocation one (see zlib, cryptography libraries and data parsers).
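To make the idea concrete, here is a purely hypothetical sketch of such an encoding-agnostic code-point array on top of primitive. The CodePoints type and the function names are made up; nothing like this exists in base:

import Data.Char (chr, ord)
import Data.Primitive.PrimArray (PrimArray, primArrayFromList, primArrayToList)
import Data.Word (Word32)

-- Hypothetical: an unencoded sequence of Unicode code points,
-- i.e. String semantics with an array representation.
newtype CodePoints = CodePoints (PrimArray Word32)

fromString' :: String -> CodePoints
fromString' = CodePoints . primArrayFromList . map (fromIntegral . ord)

toString' :: CodePoints -> String
toString' (CodePoints arr) = map (chr . fromIntegral) (primArrayToList arr)

main :: IO ()
main = putStrLn (toString' (fromString' "code points, not bytes"))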

I believe it was @ChShersh who pointed out that -XStringLiteralsAreText would be a more useful extension.

That’s what OsString is for.

That’s more of an implementation detail, and it mostly affects people writing new String types (I implemented OsString).

Improving interfaces can be done regardless. But e.g. all those stringy type classes turned out to be even worse mistakes, since the common denominator is rather small.

Also see: Surprising behavior of ByteString literals via IsString · Issue #140 · haskell/bytestring · GitHub
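For readers who don’t want to click through: the surprise is that the IsString instance (at least in the bytestring versions discussed in that issue) builds the literal by truncating every Char to its low 8 bits instead of encoding it. A small sketch:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- 'λ' is U+03BB; the IsString instance keeps only the low byte, 0xBB.
truncated :: BS.ByteString
truncated = "λ"

-- Encoding explicitly via Text produces the proper UTF-8 bytes.
encoded :: BS.ByteString
encoded = TE.encodeUtf8 (T.pack "λ")

main :: IO ()
main = do
  print (BS.unpack truncated)  -- [187]
  print (BS.unpack encoded)    -- [206,187]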


As such, I find most of the critique rather unactionable.

Maybe something like this can be considered in light of the configurable defaulting discussion. I have to read up on that.

Is it? I had the impression that file path handling is set in stone encoding-wise in OS APIs, but general C string handling is a runtime rigmarole. (Also, if I’m wrong on this, when are we getting OsString in base?)

Not at all. See the documentation of System.OsPath.

Basically, the point of OsString is to just keep whatever OS API (e.g. syscalls) throws back at you without any data transformation. That is what you then want for getArgs.

See the unix package:

getArgs :: IO [PosixString]
getArgs =
  alloca $ \ p_argc ->
  alloca $ \ p_argv -> do
   getProgArgv p_argc p_argv
   p    <- fromIntegral <$> peek p_argc
   argv <- peek p_argv
   peekArray (p - 1) (advancePtr argv 1) >>= mapM (fmap PS . B.packCString)

Compare that with base:

getArgs :: IO [String]
getArgs =
  alloca $ \ p_argc ->
  alloca $ \ p_argv -> do
   getProgArgv p_argc p_argv
   p    <- fromIntegral `liftM` peek p_argc
   argv <- peek p_argv
   enc <- argvEncoding
   peekArray (p - 1) (advancePtr argv 1) >>= mapM (GHC.peekCString enc)

GHC.peekCString here is what causes the disaster, while B.packCString does not do any decoding.

You might notice that we have PosixString here (oh no, yet another type…). That’s because this allows us to express “POSIX strings” and “Windows strings”, while OsString means “string of the current platform”.

E.g. for tar, we actually need PosixString, even if we’re on Windows. All this has panned out nicely so far.


Are they the same as ByteString? No. They’re rather wrappers around ShortByteString (which is unpinned memory).
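Roughly, they look like this (a simplified sketch; see System.OsString.Internal.Types in the os-string package for the real definitions, which carry more instances and documentation):

import Data.ByteString.Short (ShortByteString)

-- Simplified: ShortByteString (unpinned heap memory) tagged with
-- the intended platform semantics.
newtype PosixString   = PosixString   ShortByteString
newtype WindowsString = WindowsString ShortByteString

-- OsString wraps one of the two, chosen per host platform
-- (via CPP in the real package; shown here for a POSIX build).
newtype OsString = OsString PosixString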

Again: there are differences, both in properties and implementation, between string types.

2 Likes