Informal discussion about the progression of `base`

Just out of curiosity, how does this work when you’re reading it out loud? I find that I experience silent reading of code similarly to you, but when I need to say it (e.g. while pair programming, or teaching, or asking questions in a video call) then I often stumble. I have idiosyncratic names for many bits of syntax or operators - “<*>” is pronounced “ap”, “>>=” is pronounced “bind”, and “->” is pronounced “arrow” - but I really struggle with pronouncing even fairly common operators like “<|>”, let alone “<?>” or “^@.”. I often end up “spelling them out”, so that “^@.” is “up at dot”, but that’s not the most pleasant way to communicate. It would be nice to find a better way.

How about the human equivalent of “local bindings”? For example,

Hey pair programming buddy, for the remainder of this session let’s agree to call <|> “the bar thing” and ^@. “the at thing”.

Great, now I think we fix this bug by defining a equals x the bar thing open paren y the at thing g close paren.

You can even literally define

theBarThing = (<|>)
theAtThing = (^@.)

in your source files somewhere, use them, and go through to replace them with the operator versions before committing.

I’ve never had to read things out loud (even when pair programming: I’ll just point at things with my finger), but I understand some do, and it can be a problem.

We often do that. But standardized vocabulary makes it easier to get started on the task at hand, rather than spending time negotiating a common language.

I’m not really arguing either way for mandatory names for operators - I’m undecided as to the merits - but I do like learning useful ways of working with others. Probably a culture of canonical pronunciations in the Haddocks would be enough to serve my use case.

It’s like reading a paper in a group with others - it all flows nicer if there’s a sentence like “The form of judgment Σ;Γ ⊢ e : τ, which means that e can have type τ under global signature Σ and local context Γ, expresses …” instead of just dumping the syntax in directly.

This looks extremely ironic in the context of a language created to unify many local idioms that were referring to the same practice of lazy programming language development. We’re not above having names that everyone can use :slight_smile:

This looks extremely ironic in the context of a language created to unify many local idioms that were referring to the same practice of lazy programming language development

Is it ironic? Why? Sure, Haskell was developed to coalesce a multiplicity of lazy languages. That doesn’t mean it’s ironic to have a multiplicity of lens libraries, effect systems, web frameworks, names for operators …

FWIW I’m mildly in favour of “all operators should be defined to have a name”.

All these lens libraries, effect systems and web frameworks tackle their problems in different manners, often improving on the state of the art. A multiplicity of names, however, is quite harmful for communication and understanding. That is why we develop Ubiquitous Languages when working in teams.

This is also why societies develop legal corpuses, so that each contract does not have to rebuild the foundations of legal interactions from scratch each time two or more parties need to reach an agreement.

Seems like a reasonable line of thought and as I say, I’m not opposed, I was just trying to give an answer to the question that David posed!

I hope it’s OK to revive this discussion - feel free to (re-)move this if you want me to open a new thread instead.

I guess I’ll write down the worst “type safety issue” I’ve had in the last few weeks, one that was annoying and could have been a non-issue with a better base library (or rather: a better base ecosystem):

  1. threadDelay takes an Int, which carries no semantic meaning. (Control.Concurrent)
  2. Closely related: I used a TimeSpec in my environment/record. Its Num instance uses a different unit prefix by default. (System.Clock)

(threadDelay: µs, TimeSpec: ns)

They are not really compatible. In this case I’d love it if the available threadDelay would use something like a TimeSpec instead of an Int. Or some class TimeDelta, where I can have explicit units like data Microsecond = MkMicrosecond Int. But no potentially buggy extraction needed on my side.
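To make the idea concrete, here is a minimal sketch of what such a unit-tagged delay could look like using only base. The names `Microseconds`, `Milliseconds`, `TimeDelta`, `toMicroseconds` and `delay` are all hypothetical; nothing like them exists in base today.

```haskell
import Control.Concurrent (threadDelay)

-- Hypothetical unit-tagged wrappers; these are not part of base.
newtype Microseconds = Microseconds Int deriving (Eq, Ord, Show)
newtype Milliseconds = Milliseconds Int deriving (Eq, Ord, Show)

-- Hypothetical class: anything that can be expressed in microseconds.
class TimeDelta d where
  toMicroseconds :: d -> Microseconds

instance TimeDelta Microseconds where
  toMicroseconds = id

instance TimeDelta Milliseconds where
  toMicroseconds (Milliseconds ms) = Microseconds (ms * 1000)

-- The unit conversion lives here, once, instead of at every call site.
delay :: TimeDelta d => d -> IO ()
delay d = case toMicroseconds d of
  Microseconds us -> threadDelay us

main :: IO ()
main = do
  delay (Milliseconds 50)
  putStrLn "done"
```

With something like this, `delay (Milliseconds 50)` states its unit at the call site, and passing a bare `Int` is a type error rather than a silent factor-of-1000 bug.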

some time ago i encountered exactly the same issue! it reminded me of the great Duration type from the rust standard library

i guess there must be a package for those things, but when you interact with a base function that does not use the nice interface, why even bother adding yet another dependency for (sometimes) very small usage?

The String/Text situation is clearly a problem that should be solved for good.

Why/how?

String and Text are not really isomorphic: https://play.haskell.org/saved/uy78OF1i

Char is basically just Int with an upper bound of 1114111 (0x10FFFF, so 1114112 code points in total), and String as such is an unencoded list of Unicode code points (that includes surrogates).
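A small sketch of the surrogate point, assuming the text package is available. This is not necessarily what the playground link above shows; `roundTrips` is a helper name made up for this example.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: does a String survive a trip through Text?
roundTrips :: String -> Bool
roundTrips s = T.unpack (T.pack s) == s

main :: IO ()
main = do
  -- The largest Char is U+10FFFF.
  print (fromEnum (maxBound :: Char))   -- 1114111
  -- Ordinary text round-trips fine.
  print (roundTrips "hello")            -- True
  -- A lone surrogate is a valid Char, but T.pack replaces it with U+FFFD.
  print (roundTrips "\xD800")           -- False
  print (T.unpack (T.pack "\xD800"))    -- "\65533"
```

So `T.unpack . T.pack` is not the identity on every String, which is one way to see that the two types are not isomorphic.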

My opinion is:

  • Char is fine, it’s a subset of Int, but the name is unfortunate (UCodePoint or something else would have been more appropriate)
  • the String type synonym was a mistake. [Char] is something rather low-level… a nice encoding agnostic representation though

Other than these naming issues and the fact that most of the ecosystem uses this inefficient String representation, I don’t think we would be able to actually get rid of any of those types we have.

  • Char/String is low-level: Unicode code points, encoding agnostic
  • Text is high-level and platform agnostic (UTF-8 representation)
  • OsString is high-level, but platform-specific to avoid encoding/decoding at the outer FFI layer (Rust has this type too)
  • ByteString is about bytes; Unicode and platform are irrelevant to it

These all have different properties. Converting between them isn’t just a matter of changing representation, but sometimes isn’t possible at all or not without losing information.
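As one illustration of a conversion that can fail outright, here is a sketch going from ByteString to Text, assuming the bytestring and text packages. `decodes` is a hypothetical helper for this example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: is this byte sequence valid UTF-8?
decodes :: BS.ByteString -> Bool
decodes bytes = case TE.decodeUtf8' bytes of
  Left _  -> False
  Right _ -> True

main :: IO ()
main = do
  -- Text to bytes always works: encoding to UTF-8 is total.
  let ok = TE.encodeUtf8 (T.pack "hello")
  print (decodes ok)
  -- Bytes to Text does not: 0xFF never occurs in valid UTF-8.
  let bad = BS.pack [0xFF, 0xFE]
  print (decodes bad)
```

The `Either` in the result of `decodeUtf8'` is the type-level reminder that arbitrary bytes are not text.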

I think the ‘String/Text issue’ refers to this specifically. Sure, there are times (albeit rare) when [Char] really is the best choice. But is it a good default? No, and it’s bad that base encourages people to use [Char] so much.

The very fact that it’s called -XOverloadedStrings already makes working with Text harder than it should be.
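For readers who haven’t used it, a minimal sketch of what the extension does, assuming the text package. `greeting` is just an illustrative name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- With OverloadedStrings a literal has type `IsString a => a`,
-- so the signature is what makes this a Text rather than a String.
greeting :: T.Text
greeting = "hello"

main :: IO ()
main = do
  print (T.length greeting)
  -- Here the argument type of T.toUpper fixes the literal to Text.
  print (T.toUpper "shout")
```

The literal itself never says “Text”; the type has to come from a signature or from the surrounding code, and where nothing pins it down an explicit annotation is needed.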

Well, the Haskell report mentions it explicitly:

The Prelude is required to expose the definition type String = [Char].

Hence to solve the String issue base would need to add a new encoding-agnostic string type, an array of integers. Then getArgs, peekCString and other functions could return that instead of encoding their results into Strings.


That would put GHC at four string types that are all just array flavors, so I wonder if the correct solution would be moving towards a general low-level interface for arrays (à la primitive), and then using other types as mere wrappers.

And that would supersede ByteString, which I personally find quite jarring, since it presents itself as an 8-bit string type, but the ecosystem widely uses it as a general memory allocation one (see zlib, cryptography libraries and data parsers).

I believe it was @ChShersh who pointed out that -XStringLiteralsAreText would be a more useful extension.

That’s what OsString is for.

That’s more of an implementation detail. And mostly affects people writing new String types (I implemented OsString).

Improving interfaces can be done regardless. But e.g. all those stringy type classes turned out to be even worse mistakes, since the common denominator is rather small.

Also see: Surprising behavior of ByteString literals via IsString · Issue #140 · haskell/bytestring · GitHub
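A sketch of the kind of surprise involved, assuming the issue is about Char8-style truncation of characters above U+00FF. It uses `Data.ByteString.Char8.pack` explicitly rather than the IsString instance; `truncated` and `utf8Encoded` are names made up for this example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Char8.pack keeps only the low 8 bits of each Char.
truncated :: String -> [Word8]
truncated = BS.unpack . BC.pack

-- Going through Text gives the UTF-8 encoding instead.
utf8Encoded :: String -> [Word8]
utf8Encoded = BS.unpack . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  -- 'λ' is U+03BB: truncation leaves the single byte 0xBB.
  print (truncated "\955")     -- [187]
  -- UTF-8 encodes it as the two bytes 0xCE 0xBB.
  print (utf8Encoded "\955")   -- [206,187]
```

Both results are “a ByteString made from a string”, which is the small-common-denominator problem with stringy type classes in miniature.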


As such, I find most of the critique rather unactionable.

maybe something like this can be considered in the light of the configurable defaulting discussion. I have to read up on that.

Is it? I had the impression that file path handling is set in stone encoding-wise in OS APIs, but general C string handling is a runtime rigmarole. (also if I’m wrong on this when are we getting OsString in base)