Deprecating Safe Haskell, or heavily investing in it?

In the non-threaded run-time, that will print “definition’s output”. In the threaded run-time, the same result is likely, but both outputs are possible (possibly interleaved, depending on the buffering mode). I still don’t see how this is a special reason to need SafeHaskell.

From page 2 of 12 in the paper:

…maybe this article can also help to clarify matters.

But if you still don’t see how any of that is a reason (special or otherwise) to have SafeHaskell…you might be happier using Standard ML instead of Haskell - you can sneak in all the observable effects you like, and you rarely have to change any type signatures as a result. Furthermore, and as I dimly recall, there are still some SML implementations which are only single-threaded (being able to choose your preferred language implementation - what a novelty!).

Kaplan, Okasaki, and Tarjan demonstrate a way to simplify a catenable list implementation by using careful mutation in a way slightly more general than pure lazy evaluation allows. Packages like lvish and monad-par use unsafe operations under the hood to offer functionality that just wouldn’t be available otherwise. The art of using impure operations to implement pure ones isn’t and shouldn’t be restricted to compiler writers. Unsafe coercions (often buried in rewrite rules) are essential for optimizing Haskell code that uses very heavy type-level computations.
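To make that concrete, here’s a minimal sketch of the pattern (my own toy example, not taken from those papers or packages): a memoised function whose implementation mutates a shared cache via unsafePerformIO, yet remains observably pure to its callers.

    import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
    import qualified Data.Map.Strict as Map
    import System.IO.Unsafe (unsafePerformIO)

    -- One shared memo table; NOINLINE matters, otherwise GHC may
    -- duplicate the table by inlining this binding.
    {-# NOINLINE table #-}
    table :: IORef (Map.Map Int Integer)
    table = unsafePerformIO (newIORef Map.empty)

    -- Observably pure: callers see a deterministic function, while
    -- the implementation mutates the cache behind the scenes. The
    -- burden of proving this is benign falls entirely on the author.
    fib :: Int -> Integer
    fib n = unsafePerformIO $ do
      cache <- readIORef table
      case Map.lookup n cache of
        Just v  -> pure v
        Nothing -> do
          let v | n < 2     = toInteger n
                | otherwise = fib (n - 1) + fib (n - 2)
          v `seq` modifyIORef' table (Map.insert n v)
          pure v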

2 Likes

It isn’t - that’s why types like ST exist.

…and before long, there’ll no doubt be a need for coercions in kind and sort-level programming because optimising those will be essential, too. It’s like normalising relational DBs - there’ll always be another normal form to make them work just that bit better.

Oh, and of course dependent types will make this all so much easier…

ST supports one way to use mutable state to implement pure computations. Lazy evaluation is another. Do you think those should be the end of it? That nothing else of that sort could possibly be worth the trouble?
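For reference, here’s the ST pattern in miniature - pure on the outside, mutable on the inside:

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, modifySTRef', readSTRef)

    -- The rank-2 type of runST guarantees the STRef cannot escape,
    -- so no caller can ever observe the internal state changes.
    sumST :: [Int] -> Int
    sumST xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef' acc (+ x)) xs
      readSTRef acc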

1 Like

No, they shouldn’t be the end of it. But it raises a similar question: beyond parametric encapsulation and lazy evaluation, has anything else been discovered that has had a similar impact to either of them?

As Bodigrim and yourself have both noted, more and more unsafe code is appearing (with various levels of “trust-washing”), none of which seems to suggest a new commonality or abstraction which can be used to manage the mayhem the way parametric encapsulation and lazy evaluation help us to do. Instead, it’s just an ever-growing pile of “unsafe abominations” which makes the claim of Haskell being a purely functional programming language an ever-growing joke…

…so let’s solve both problems at once:

  • Haskell is relabelled the “nondeterministic functional programming language” - the unsafe prefix can then be replaced with nondet.

  • As I suggested here, unrecoverable errors in Haskell would no longer stop programs, but instead send them into an infinite regression: debuggers can then be attached to the still-running programs to examine their call stacks or other runtime meta-data as needed to discover what went wrong.

The research and development that went into HasFuse, an early Haskell implementation based on GHC, could be useful for this task.

The research and development that went into HasFuse, an early Haskell implementation based on GHC, could be useful for this task.

Do you have a link that works for HasFuse?

The Wayback Machine has one (from 2007): HasFuse

It seems like it is mainly based on this paper: Realising nondeterministic I/O in the Glasgow Haskell Compiler.

Link updated; HasFuse now at:

https://www2.ki.informatik.uni-frankfurt.de/research/diamond/hasfuse/index_content.html

(https://www2.ki.informatik.uni-frankfurt.de/research/diamond/hasfuse/index.html also works.)

1 Like

One perspective that I haven’t seen much yet is that Safe Haskell could have the same use that unsafe { } blocks have in Rust. In Rust those blocks are not used to give some safety guarantees about untrusted code. They act more like a code organization and documentation tool.

Unsafe blocks indicate clearly to both the author and later readers of the code that a particular part of the code needs more attention.

I think our experience up to now has shown that Safe Haskell is only useful if most of the ecosystem actually uses it and properly annotates unsafe parts of their code. Otherwise many modules will probably be inferred to be unsafe.
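For readers who haven’t used the extension: the annotations are per-module pragmas. A sketch, with each module in its own file and all names invented for illustration:

    {-# LANGUAGE Unsafe #-}
    module Internals (coerceIt) where   -- explicitly flags the dodgy parts
    import Unsafe.Coerce (unsafeCoerce)

    coerceIt :: a -> b
    coerceIt = unsafeCoerce

    {-# LANGUAGE Trustworthy #-}
    module PublicApi (safeOp) where     -- the author vouches for this API
    import Internals (coerceIt)

    safeOp :: Int -> Int
    safeOp = coerceIt                   -- sound here, since Int ~ Int

    {-# LANGUAGE Safe #-}
    module Consumer where               -- may import only Safe/Trustworthy
    import PublicApi (safeOp)

Modules without any pragma have their safety inferred, which is exactly why so much of today’s ecosystem comes out as inferred-unsafe.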

One possible course of action is to turn on Safe by default and require annotations to use unsafe features. I think that would lead to an awkward transitional period where many modules must be marked trustworthy, but I believe it is possible to eventually reach a point where only clearly unsafe parts have to be marked trustworthy. Rust has shown that it is possible to implement a system like this without it being too much of an issue for users of the language.

Maybe the ship has sailed and I should just add Safe Haskell to the list of changes I would make if the existing ecosystem and backwards compatibility were not a problem.

1 Like

Fwiw, and a small tangent, but the “demos” on haskell.org and tryhaskell.org are really terrible demos, and aren’t even a reasonable means of trying Haskell. It’s not running GHCi behind the scenes, nor even interpreting a reasonable subset of Haskell. I can’t define functions without let bindings, and I can only execute the bare minimum of expressions (e.g. f x = x is not definable as a top-level definition in tryhaskell “Haskell”), and I wouldn’t care if they were shut down. Even the primes example on haskell.org isn’t runnable!

Also no: the tryhaskell/tryhaskell repository on GitHub tells me it doesn’t rely on -XSafe at all. It looks like it’s running through Gwern’s mueval interpreter.

5 Likes

While the idea of Safe Haskell makes sense as a language extension (or rather, restriction) as applied to Haskell 2010, it seems pretty clear that GHC Haskell is not a good base for it.

The only reason I’m hesitating to chant “burn it” along with everyone else is the timing. The next release of GHC is supposed to integrate GHCJS, and after another release or two people are going to try to use it to build websites that run user-defined Haskell code. That would be the time to establish if Safe Haskell should be fixed or eliminated. Are we sure we want to kill it right now?

and after another release or two people are going to try to use it to build websites that run user-defined Haskell code.

And I believe it’s the duty of the community (myself included) to warn against the dangers of doing such a thing, considering the missed spots of Safe Haskell. That being said, I’m not suggesting that we scrap it here and now, but we have to re-organise the set of promises, use-cases and techniques that Safe Haskell failed to deliver on after all these years.

2 Likes

“I use Safe Haskell, what could possibly go wrong :woman_shrugging:” is a dubious threat model in the first place.

While the endeavor is a cool academic exercise, touting safety* as a mainstream ecosystem feature would eventually lead to grief and embarrassment.

* with the threat model buried elsewhere in some paper you’d have to dig for, which may not be the threat model you actually have.

1 Like

I’m not a hardcore Haskell person, still less a compiler writer or language theorist, and I don’t know what bugs or obscure corner-case problems Safe Haskell may have. I do have about 30 years’ experience in trying to build, deploy, or help others build or deploy secure software and secure systems, sometimes very large ones, including by sandboxing code.

I think it’s a dangerous idea that bubblewrap, or ad-hoc system call filtering, or any kind of sandbox, “solves the problem”.

No mechanism works absolutely, and anything can have bugs, but sandboxes around arbitrary code are really hard to get right. Given equal investment of effort, I’d expect to get more out of something like Safe Haskell. In most cases, it’s more fruitful to try to constrain what code can even express the desire to do, than to let it try just any old thing and hope you can guess at runtime whether whatever it’s trying is OK.

Sandboxing and syscall filtering are naturally bug-prone. It’s incredibly hard to enumerate everything untrusted code should be allowed to do, and then map that to a large, extremely irregular, and ever-growing kernel API and a large and ever-changing “standard” name space of files, IPC mechanisms, network endpoints, kernel resources, and other wild and wonderful things, in a way that’s right in every case. The attack surface is huge, and you have to individually decide whether the program needs access to every part of that surface. Observing behavior only gets you so far, because you have to get it right for exceptional conditions, too. The runtime and libraries supporting the enclosed code are normally inside the sandbox, and may need to do things that aren’t obvious from the application code… which of course glosses over the question of exactly how complicated even the application code can be. Also, kernel bugs are at least as common as RTS bugs, so things that “look safe” and that you have to expose are still allowed to ruin your day.

Even when sandboxing works, the restrictions you have to put on how the enclosed code interacts with the rest of the system can be prohibitively painful, not just in terms of functionality, but in terms of how easy it is to operate and monitor the resulting system. And you get to do it all over again for every OS.

If you REALLY need a sandbox to isolate code that might try to use the whole OS API, I’ve always taken the view that you shouldn’t trust anything short of a full virtual machine. Even then you should be super nervous, because VM escapes are a real thing, and cross-VM information leakage is a really real thing. Hypervisors are still big complicated programs that present complicated interfaces to the programs they enclose; they’re only simple by comparison with kernels and syscall filters. And the leakage problems usually come from CPU bugs.

Code that restricted itself to only a very minimal capability-like interface to the outside world might be easier to contain, but no such code exists, nor are there any standards for what that interface might be… and defining that “minimal interface” would be a project with much the same general feel as Safe Haskell.

Also, the “killer app” for Safe Haskell would be supply chain security. That requires running untrusted code inside your own process. External sandboxes can’t help with that.

That’s not to say that Safe Haskell as it stands is good for supply chain security or any other particular use. The original point about investment is dead on. As I said, I don’t know about subtle cases, but at the moment Safe Haskell doesn’t even try to address a couple of major areas of what you’d need.

Most obviously, it doesn’t try to do anything about non-total code… including code that runs the entire system out of memory, which is absolutely a practical attack that real adversaries might use. That would of course be hard to fix. You could imagine ways to do it short of somehow creating a total language… but I’m sure the RTS surgery would be major, and you’d still have to provide some kind of wrapper to let calling code handle the “that function didn’t look like it was actually going to return” case.
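To illustrate how little -XSafe promises here, both of the following are accepted without complaint (a deliberately silly sketch):

    {-# LANGUAGE Safe #-}
    module Bomb (spin, hog) where

    -- Safe Haskell polices unsafe effects, coercions and imports;
    -- it says nothing about totality or resource consumption.

    spin :: a
    spin = spin               -- never returns

    hog :: Integer
    hog = product [1 ..]      -- the running Integer grows without bound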

Less obviously, but probably worse in an immediately practical sense, Safe Haskell packages exist in a universe where a malicious package can just use its build scripts to inject anything it wants into the final program(s), completely outside of the Haskell language. Maybe that could be addressed, since I don’t think most Haskell packages actually need to run arbitrary build code at all. But it means that the compiler and RTS, the core of the language, aren’t the only things you’d need to touch to make it practically useful. And it probably means more people and projects would need to be involved, too.

Oh, and Trustworthy would be a problem if you were going to use Safe Haskell for supply chain security. You can’t rely on code’s own opinion of whether it should be trusted; that’s something that has to be determined based on human concerns when it’s integrated into a larger system. I think it’s dangerous to even let code try to claim to be trustworthy, because there’s always the risk of some oversight that lets it be believed.

Anyhow, personally, I would be really sad to see Safe Haskell go away, because I think Haskell may be the only language with any serious adoption that has even tried to do containment in a vaguely clean, understandable, analyzable way. And Haskell’s type system gives it structural advantages for that. JavaScript is all about sandboxing, but its brute-force, ad-hoc approaches still keep delivering nasty surprises even with enormous investment.

Not to say anybody cares about my sadness, but at least I hope nobody thinks the untrusted code problem has been “solved” by any Haskell-external means.

6 Likes

Total code can also easily take a very long time or a lot of space to compute. I think the only practical solution is to set time and space limits externally.
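The time half of that is already expressible with standard libraries; space has to be capped from the outside via the RTS. A minimal sketch (the helper name is mine):

    import Control.DeepSeq (NFData, force)
    import Control.Exception (evaluate)
    import System.Timeout (timeout)

    -- Force an untrusted result under a wall-clock limit (microseconds).
    -- Caveat: a thread stuck in a non-allocating loop may never receive
    -- the timeout exception. Cap the heap separately, e.g. +RTS -M512m.
    withLimit :: NFData a => Int -> a -> IO (Maybe a)
    withLimit micros x = timeout micros (evaluate (force x))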

Safe Haskell includes a system which allows everyone to determine for themselves which packages to trust. GHC will throw an error if you try to import (possibly transitively) a trustworthy module from a package that you have not trusted.
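Concretely, that check is opt-in via -fpackage-trust; the package names below are just examples:

    # Reject Trustworthy modules from any package not explicitly trusted.
    ghc -XSafe -fpackage-trust -distrust-all-packages \
        -trust base -trust containers Main.hs

Without -fpackage-trust (the default), every Trustworthy claim is simply believed, which is the danger of letting code vouch for itself noted above.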

3 Likes

I think it’s a dangerous idea that bubblewrap, or ad-hoc system call filtering, or any kind of sandbox, “solves the problem”.

You make this claim and don’t back it up with anything other than “it can have bugs”.

The point is to have different abstractions.

If you trust haskell, the RTS and your binary, why even sandbox? Or use SafeHaskell?

No, the point is we don’t trust it. And language-level features to gain that trust are at entirely the wrong level. Unless we start to be serious and have a verified compiler etc. But we are nowhere near that. And SafeHaskell is nowhere near that.

and then map that to a large, extremely irregular, and ever-growing kernel API

I’m not sure where you got this from. If we’re talking about the user-space API, kernel developers consider it a bug when the user-space API breaks, and they lose sleep over it.

You move on to suggest VMs instead. I guess those are bug-free and less complicated?

I mean, no.

Given your experience, you probably know that the issue here is the trusted computing base (TCB). And that of VMs is not really lower than that of bubblewrap.

However, both systems can operate under the assumption “the binary may do anything”. SafeHaskell doesn’t.

This is possibly the first post in this thread that seems to be advocating for keeping Safe Haskell, but I’m afraid it’s so full of sarcasm that I don’t understand it.

Do you think we should keep Safe Haskell? What are its benefits? You have three links, but it’s not clear to me what any of them have to do with it.

In my ten years of using Haskell, “Safe Haskell” has only ever been a vestigial feature of the Hackage website. What I mean by that is I’ve never directly worked with it, used it, or relied on it.

Now, it’s possible that I have passively benefited from it! But I wouldn’t know. I also don’t know if I could have somehow actively benefited from it if I had been properly taught about it. I would love it if people could point it out to me if that is the case.

3 Likes