Haskell HTTP(S) libraries don't work well

  • 1(a): Haskell isn’t alone: other well-known GC-based languages have similar difficulties, e.g.:

    Present but unreachable: reducing persistent latent secrets in HotSpot JVM (2017)

  • 1(b):


    I like laziness. I spent decades programming in procedural languages, where I had to continually think about order of evaluation. It sucks.

    …and all these people liked laziness:


    Laziness just presents an extra challenge, and apparently most Haskellers like those, e.g. Dependent types in Haskell.

  • 2: And we now have the Haskell Foundation - maybe they can help out with that one…

  • 3: …in much the same way they’re helping out with this one.

Haskell isn’t alone in being an older language in need of securing, but things could have been so much worse:

C++ was not designed from the ground up to offer memory safety.

Josh Aas

…and which rustup also uses:

curl --proto '=https' --tlsv1.2 -sSf [rustup-script URL] | sh

But as we also know, those “normal users” normally use ('doze) OSs sold by another “well-known” company. Fortunately, those users are also catered for:

Other Installation Methods - Rust Forge

…no mention of curl, wget or shells there - just “download and run” the appropriate installer:


May I clarify what OSs you are referring to? I know that Windows comes with curl and I thought all the Unix-like operating systems did too.

May I clarify what OSs you are referring to?

The ones still on approx. 90% of laptops and PCs…and which have also had curl for several years now. But Rust still has its own dedicated installer for these OSs:


  • On Windows, download and run rustup-init.exe.

Windows might have curl, but does it have | and sh (and does it support all the things contained in the script)?

There is no guarantee that curl is present on Linux systems. I found no good information about which distributions do provide it by default, but there is this old askubuntu question which implies that it was not included in Ubuntu 7 years ago. Edit: I just checked the manifest of Ubuntu 23.10 and it now does include curl.


I know very little about this thread’s headline topic (and read the posts with interest to learn more), but @hasufell has consistently advised, for some time, that Stack, when it seeks to fetch files over the Internet, should drop its reliance on Haskell libraries and rely on the curl executable. Hence my interest in what operating systems provide curl ‘out of the box’.

I would also be interested if people could be more concrete about what ‘libraries do not work well’ means in practice. @hasufell has referred to the tls package provided at the haskell-tls/hs-tls repository. Is network also ‘problematic’? Again, my personal interest is ‘do not work well in a way that adversely affects the use that Stack makes of them’ (fetching files over the Internet) or ‘do not work well in other ways’.


The main stack installation script already relies on curl or wget and will error out if it doesn’t exist: https://raw.githubusercontent.com/commercialhaskell/stack/stable/etc/scripts/get-stack.sh

# Download a URL to file using 'curl' or 'wget'.
dl_to_file() {
  if has_curl ; then
    if ! curl ${QUIET:+-sS} -L -o "$2" "$1"; then
      die "curl download failed: $1"
    fi
  elif has_wget ; then
    if ! wget ${QUIET:+-q} "-O$2" "$1"; then
      die "wget download failed: $1"
    fi
  else
    # should already have checked for this, otherwise this message will probably
    # not be displayed, since dl_to_stdout will be part of a pipeline
    die "Neither wget nor curl is available, please install one to continue."
  fi
}

So you can just assume curl or wget exists in your stack Haskell code too.

Windows is a bit special: the curl.exe available in PowerShell can be a proper curl or some other scriptlet that doesn’t really support the options you’d expect. It’s generally not very portable and caused a lot of issues in the ghcup PowerShell script. I’d recommend bootstrapping everything using System.Net.WebClient.DownloadFile or Invoke-WebRequest on Windows.

Then you install msys2 and add the internal bin PATH of msys2 while executing curl/wget in the actual stack binary, so you get them from there.

The issue is that stack doesn’t install msys2 via the bootstrap script, but through the main stack binary.

The TLS situation is a security issue. I’m not sure how well our Haskell network stack works with IPv6. Generally, supporting esoteric proxy configurations is hard… curl just does the best job here. Stack has an immense number of bug reports about network errors on download. I believe many of them are related to using Haskell code to download.

Another issue could be that the Haskell downloading code in Stack somehow conflicts with curl or wget. But even if it doesn’t, it still seems repetitive to use more than one downloading mechanism: DRY.


  1. test the local downloader program (curl or wget);

  2. if the local downloader program is “real”, proceed as normal;

  3. if not, display an informative message about replacing the “fake” downloader program, e.g. by installing msys2 manually or some other remedial measure.
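The check in steps 1–2 could be sketched as follows; `is_real_curl` and the fake-scriptlet demo are illustrative names for this post only, not code that Stack or ghcup actually ship:

```shell
#!/bin/sh
# Probe whether "$1" behaves like a real curl: it must resolve to an
# executable, and its --version banner must start with "curl " (the
# Windows scriptlet shims mentioned above typically fail this check).
is_real_curl() {
  command -v "$1" >/dev/null 2>&1 || return 1
  "$1" --version 2>/dev/null | head -n1 | grep -q '^curl '
}

# Demo against a fake "curl" that ignores --version entirely:
tmp=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$tmp/fakecurl"
chmod +x "$tmp/fakecurl"
if is_real_curl "$tmp/fakecurl"; then echo "real"; else echo "fake"; fi
```

Step 3’s remedial message would then hang off the failing branch. (As noted elsewhere in the thread, this kind of probing has its critics, since a program is being executed just to inspect it.)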

This has three advantages:

  • Stack, Cabal and Ghcup can all assume a proper downloader program exists.

  • It makes it clear whose fault it is if those remedial measures are needed.

  • It also shows why some systems are better-supported than others.

I don’t think that’s a good strategy. There are very few assumptions you can make about the Windows-provided curl.exe. You’d be better off using the one provided by msys2.

Also, probing CLI programs for compatibility (existence of switches etc.) is error-prone (some just ignore them) and bad form (you’re executing a program).

…in much the same way you cannot assume msys2 will always be installed, so you would have to check for that too.

…it is?

Both ghcup and stack expect msys2 in a specific location, which can be changed through configuration. If that location is non-empty, we assume it is installed.
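That non-empty-location check amounts to a small directory test; a minimal sketch, where `msys2_installed` and the file names are hypothetical rather than actual stack/ghcup code:

```shell
#!/bin/sh
# Heuristic from the post above: treat msys2 as installed if and only if
# the configured location exists and is a non-empty directory.
msys2_installed() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

dir=$(mktemp -d)
msys2_installed "$dir" && echo "installed" || echo "missing"   # empty dir: missing
touch "$dir/msys2_shell.cmd"                                   # simulate an install
msys2_installed "$dir" && echo "installed" || echo "missing"   # now: installed
```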

Yes, the lines you quoted are not “probing”. They’re attempting a proper download.

…which can then be used like a “probe”, possibly to update the configuration if curl failed (e.g. because it was a “scriptlet”).

But it’s your choice - if you’re happy to constantly work around this sort of nonsense and not expect these systems to Do The Right Thing by (at least) always having a proper working version of curl…enjoy bliss. I will thus merely advise you to not expect everyone else to make that same choice.

Chiming in that if you control the environment in a professional context, with standard tools such as Docker or Nix, where the developer and deployment environments are the same, it can be a very good trade-off to call out to an external curl process. Curl is tested, mature, supports the latest protocols and extensions, and most importantly has a fully exercised and alive CVE process. Running a separate process rather than a library also protects against unwanted control-flow gymnastics and segfaults in your main process, and makes killing it easy, too. More Erlangy.

I’ve followed this pattern for RabbitMQ and JWT verification, via Java minion processes that use established Java libraries, and have done so with aws (rather than amazonka).

If you allow yourself to let go of the “Haskell all the things” urge, where possible (performance can factor in of course), it can actually free up a lot of burden.


Yes: I too was (eventually) thinking of how Erlang works. But that comparison also shows a potential problem: Erlang “computational units” are faster to start and stop than traditional OS processes, particularly if the program being started and stopped is large, or is used often.

On my system, curl is 239720 bytes in size: less than 1/4 megabyte. So using it for this purpose in a controlled manner seems reasonable for systems with “proper” versions of curl, and there are various “shell scripting” libraries available.

…with another example being described in Interacting with Functional Languages (1997), for GUIs.

I agree here if it’s about industrial use cases, where I definitely want to be fully aware of what the environment of the deployment target is (especially if it’s some random machine).

However, we’re talking about open-source tools/installers here. In that context, I believe it’s not the tool author’s responsibility to ensure the end user keeps their curl up to date etc. If they don’t, they likely have much bigger problems anyway. In that sense, you’re building on top of the distro’s security. I generally have high trust in distros, although (as seen in this thread wrt botan and Fedora) they also sometimes blunder.


Yes. The current plan (I think) is back to moving everything to ghc-internals and then moving things bit by bit back out of ghc-internals, so we have a nice chance to “untangle/organize/refactor as we go”.