The research and development that went into HasFuse, an early Haskell implementation based on GHC, could be useful for this task.
Do you have a link that works for HasFuse?
The research and development that went into HasFuse, an early Haskell implementation based on GHC, could be useful for this task.
Do you have a link that works for HasFuse?
The Wayback Machine has one (from 2007): HasFuse
It seems like it is mainly based on this paper: Realising nondeterministic I/O in the Glasgow Haskell Compiler.
Link updated; HasFuse now at:
https://www2.ki.informatik.uni-frankfurt.de/research/diamond/hasfuse/index_content.html
(https://www2.ki.informatik.uni-frankfurt.de/research/diamond/hasfuse/index.html also works.)
One perspective that I haven't seen much yet is that Safe Haskell could have the same use that unsafe { } blocks have in Rust. In Rust those blocks are not used to give safety guarantees about untrusted code. They act more like a code organization and documentation tool.
Unsafe blocks indicate clearly to both the author and later readers of the code that a particular part of the code needs more attention.
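For illustration, a minimal sketch of what that organizational use looks like (module and function names are made up): a module compiled with -XSafe simply cannot import the unsafe bits, so anything that needs them is forced into its own, clearly marked module.

```haskell
{-# LANGUAGE Safe #-}
-- Hypothetical module: under -XSafe the commented-out import below is a
-- compile error, so code that needs unsafe features has to live in a
-- separate, explicitly marked module, much like an unsafe { } block flags
-- the risky part of a Rust crate.
module Caching.Pure (double) where

-- import System.IO.Unsafe (unsafePerformIO)  -- rejected by -XSafe

double :: Int -> Int
double x = x * 2
```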
I think our experience up to now has shown that Safe Haskell is only useful if most of the ecosystem actually uses it and properly annotates unsafe parts of their code. Otherwise many modules will probably be inferred to be unsafe.
One possible course of action is to turn on Safe by default and require annotations to use unsafe features. I think that would lead to an awkward transitional period where many modules must be marked trustworthy, but I believe it is possible to eventually reach a point where only clearly unsafe parts have to be marked trustworthy. Rust has shown that it is possible to implement a system like this without it being too much of an issue for users of the language.
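As a sketch of what that transitional annotation might look like (names invented for the example): a module that uses an unsafe feature internally can declare itself Trustworthy, vouching for its exported interface so that Safe code may still import it.

```haskell
{-# LANGUAGE Trustworthy #-}
-- Hypothetical example: the module relies on unsafePerformIO internally
-- (a global counter), but its author vouches that the exported interface
-- is safe, so modules compiled with -XSafe can still import it.
module FreshName (freshName) where

import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

freshName :: IO String
freshName = do
  n <- atomicModifyIORef' counter (\i -> (i + 1, i))
  pure ("x" ++ show n)
```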
Maybe the ship has sailed and I should just add Safe Haskell to the list of changes I would make if the existing ecosystem and backwards compatibility were not a concern.
Fwiw, and a small tangent, but the "demos" on haskell.org and tryhaskell.org are really terrible demos, and aren't even a reasonable means of trying Haskell. It's not running ghci behind the scenes nor even interpreting a reasonable subset of Haskell. I can't define functions without let bindings, and I can only execute the bare minimum of expressions (e.g. f x = x is not definable as a toplevel definition in tryhaskell "haskell"), and I wouldn't care if they were shut down. Even the primes example on haskell.org isn't runnable!
Also no, the GitHub repo tryhaskell/tryhaskell tells me it doesn't rely on -XSafe at all. It looks like it's running through Gwern's mueval interpreter.
While the idea of Safe Haskell makes sense as a language extension/restriction as applied to Haskell 2010, it seems pretty clear that GHC Haskell is not a good base for it.
The only reason I'm hesitating to chant "burn it" along with everyone else is the timing. The next release of GHC is supposed to integrate GHCJS, and after another release or two people are going to try to use it to build websites that run user-defined Haskell code. That would be the time to establish if Safe Haskell should be fixed or eliminated. Are we sure we want to kill it right now?
and after another release or two people are going to try to use it to build websites that run user-defined Haskell code.
And I believe it's the duty of the community (myself included) to warn against the dangers of doing such a thing, considering the missed spots of Safe Haskell. That being said, I'm not suggesting that we scrap it here and now, but we have to re-organise the set of promises, use-cases, and techniques that Safe Haskell has failed to materialise after all these years.
"I use Safe Haskell, what could possibly go wrong" is a dubious threat model in the first place.
While the endeavor is a cool academic exercise, touting safety* as a mainstream ecosystem feature would eventually lead to grief and embarrassment.
* with the threat model buried elsewhere in some paper you'd have to dig for, which may not be the threat model you actually have.
I'm not a hardcore Haskell person, still less a compiler writer or language theorist, and I don't know what bugs or obscure corner case problems Safe Haskell may have. I do have about 30 years experience in trying to build, deploy, or help others build or deploy, secure software and secure systems, sometimes very large ones, including by sandboxing code.
I think it's a dangerous idea that bubblewrap, or ad-hoc system call filtering, or any kind of sandbox, "solves the problem".
No mechanism works absolutely, and anything can have bugs, but sandboxes around arbitrary code are really hard to get right. Given equal investment of effort, I'd expect to get more out of something like Safe Haskell. In most cases, it's more fruitful to try to constrain what code can even express the desire to do, than to let it try just any old thing and hope you can guess at runtime whether whatever it's trying is OK.
Sandboxing and syscall filtering are naturally bug prone. It's incredibly hard to enumerate everything untrusted code should be allowed to do, and then map that to a large, extremely irregular, and ever-growing kernel API and a large and ever-changing "standard" name space of files, IPC mechanisms, network endpoints, kernel resources, and other wild and wonderful things, in a way that's right in every case. The attack surface is huge, and you have to individually decide whether the program needs access to every part of that surface. Observing behavior only gets you so far, because you have to get it right for exceptional conditions, too. The runtime and libraries supporting the enclosed code are normally inside the sandbox, and may need to do things that aren't obvious from the application code… which of course glosses over the question of exactly how complicated even the application code can be. Also, kernel bugs are at least as common as RTS bugs, so things that "look safe" and that you have to expose are still allowed to ruin your day.
Even when sandboxing works, the restrictions you have to put on how the enclosed code interacts with the rest of the system can be prohibitively painful, not just in terms of functionality, but in terms of how easy it is to operate and monitor the resulting system. And you get to do it over again for every OS.
If you REALLY need a sandbox to isolate code that might try to use the whole OS API, I've always taken the view that you shouldn't trust anything short of a full virtual machine. Even then you should be super nervous, because VM escapes are a real thing, and cross-VM information leakage is a really real thing. Hypervisors are still big complicated programs that present complicated interfaces to the programs they enclose; they're only simple by comparison with kernels and syscall filters. And the leakage problems usually come from CPU bugs.
Code that restricted itself to only a very minimal capability-like interface to the outside world might be easier to contain, but no such code exists, nor are there any standards for what that interface might be… and defining that "minimal interface" would be a project with much the same general feel as Safe Haskell.
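To make the comparison concrete, here is a rough sketch (not an existing library; all names invented) of what such a minimal capability-style interface could look like in Haskell itself: guest code only ever sees a restricted monad, and the host decides which operations exist at all.

```haskell
{-# LANGUAGE Safe #-}
-- Rough, hypothetical sketch: untrusted code is written against Restricted,
-- whose constructor is not exported, so the only effects it can express are
-- the capabilities listed here. The host runs it with runRestricted.
module Restricted (Restricted, runRestricted, readDoc, logLine) where

newtype Restricted a = Restricted { runRestricted :: IO a }

instance Functor Restricted where
  fmap f (Restricted io) = Restricted (fmap f io)

instance Applicative Restricted where
  pure = Restricted . pure
  Restricted f <*> Restricted x = Restricted (f <*> x)

instance Monad Restricted where
  Restricted x >>= k = Restricted (x >>= runRestricted . k)

-- The capabilities handed to guest code; no general IO is re-exported.
readDoc :: FilePath -> Restricted String
readDoc = Restricted . readFile   -- a real host would validate the path here

logLine :: String -> Restricted ()
logLine = Restricted . putStrLn
```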
Also, the "killer app" for Safe Haskell would be supply chain security. That requires running untrusted code inside your own process. External sandboxes can't help with that.
That's not to say that Safe Haskell as it stands is good for supply chain security or any other particular use. The original point about investment is dead on. As I said, I don't know about subtle cases, but at the moment Safe Haskell doesn't even try to address a couple of major areas of what you'd need.
Most obviously, it doesn't try to do anything about non-total code… including code that runs the entire system out of memory, which is absolutely a practical attack that real adversaries might use. That would of course be hard to fix. You could imagine ways to do it short of somehow creating a total language… but I'm sure the RTS surgery would be major, and you'd still have to provide some kind of wrapper to let calling code handle the "that function didn't look like it was actually going to return" case.
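A hedged sketch of the kind of wrapper that paragraph asks for, using only machinery GHC already ships (System.Timeout plus the per-thread allocation limit in GHC.Conc). This bounds the damage from divergent or allocation-hungry code; it does not make the code total, and a real version would also reset the limit afterwards.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
-- Sketch only: give an untrusted IO action a wall-clock budget and an
-- allocation budget, and report Nothing if it blows either one.
module Budget (withBudget) where

import Control.Exception (AllocationLimitExceeded, try)
import Data.Int (Int64)
import GHC.Conc (enableAllocationLimit, setAllocationCounter)
import System.Timeout (timeout)

withBudget :: Int      -- time budget in microseconds
           -> Int64    -- allocation budget in bytes
           -> IO a -> IO (Maybe a)
withBudget micros allocBytes act = do
  r <- timeout micros $ do
    setAllocationCounter allocBytes
    enableAllocationLimit
    try act
  case r of
    Just (Right a)                             -> pure (Just a)
    Just (Left (_ :: AllocationLimitExceeded)) -> pure Nothing
    Nothing                                    -> pure Nothing
```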
Less obviously, but probably worse in an immediately practical sense, Safe Haskell packages exist in a universe where a malicious package can just use its build scripts to inject anything it wants into the final program(s), completely outside of the Haskell language. Maybe that could be addressed, since I don't think most Haskell packages actually need to run arbitrary build code at all. But it means that the compiler and RTS, the core of the language, aren't the only things you'd need to touch to make it practically useful. And it probably means more people and projects would need to be involved, too.
Oh, and Trustworthy would be a problem if you were going to use Safe Haskell for supply chain security. You can't rely on code's own opinion of whether it should be trusted; that's something that has to be determined based on human concerns when it's integrated into a larger system. I think it's dangerous to even let code try to claim to be trustworthy, because there's always the risk of some oversight that lets it be believed.
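To make the worry concrete, a deliberately dubious, entirely hypothetical module: nothing stops it from declaring itself Trustworthy while smuggling IO behind a pure-looking interface, and only a human decision via GHC's -fpackage-trust mechanism can refuse to believe it.

```haskell
{-# LANGUAGE Trustworthy #-}
-- Hypothetical malicious module: it simply *claims* to be trustworthy while
-- hiding a side effect behind a pure-looking function. Plain -XSafe clients
-- accept this import; only enabling -fpackage-trust forces a human to decide
-- whether this package deserves that claim.
module Innocent (pureLookup) where

import System.IO.Unsafe (unsafePerformIO)

pureLookup :: String -> String
pureLookup key = unsafePerformIO $ do
  appendFile "/tmp/exfil" (key ++ "\n")   -- quietly leaks every query
  pure key
```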
Anyhow, personally, I would be really sad to see Safe Haskell go away, because I think Haskell may be the only language with any serious adoption that has even tried to do containment in a vaguely clean, understandable, analyzable way. And Haskell's type system gives it structural advantages for that. JavaScript is all about sandboxing, but its brute-force ad-hoc approaches still keep delivering nasty surprises even with enormous investment.
Not to say anybody cares about my sadness, but at least I hope nobody thinks the untrusted code problem has been "solved" by any Haskell-external means.
Total code can also easily take a very long time or a lot of space to compute. I think the only practical solution is to set time and space limits externally.
Safe Haskell includes a system which allows everyone to determine for themselves which packages to trust. GHC will throw an error if you try to import (possibly transitively) a trustworthy module from a package that you have not trusted.
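A minimal sketch of how that looks from the consumer side (package and module names are made up):

```haskell
{-# LANGUAGE Safe #-}
-- Hypothetical consumer: Vendor.Parser is assumed to be a Trustworthy module
-- from the (made-up) package vendor-parser. Compiling with
--     ghc -fpackage-trust Main.hs
-- rejects the import unless the package has been trusted, e.g. with
--     ghc -fpackage-trust -trust vendor-parser Main.hs
-- or once and for all with `ghc-pkg trust vendor-parser`.
module Main (main) where

import Vendor.Parser (parseConfig)

main :: IO ()
main = print (parseConfig "key = value")
```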
I think it's a dangerous idea that bubblewrap, or ad-hoc system call filtering, or any kind of sandbox, "solves the problem".
You make this claim and don't back it up with anything other than "it can have bugs".
The point is to have different abstractions.
If you trust haskell, the RTS and your binary, why even sandbox? Or use SafeHaskell?
No, the point is we donāt trust it. And language level features to gain that trust are at the utterly wrong level. Unless we start to be serious and have a verified compiler etc. But we are nowhere near that. And SafeHaskell is nowhere near that.
and then map that to a large, extremely irregular, and ever-growing kernel API
I'm not sure where you got this from. If we're talking about user-space API, kernel developers consider it a bug when user-space API breaks and lose sleep over it.
You move on to suggest VMs instead. I guess those are bugfree and less complicated?
I mean, no.
Given your experience, you probably know that the issue here is the TCB. And that of VMs is not really lower than that of bubblewrap.
However, both systems can operate under the assumption "the binary may do anything". SafeHaskell doesn't.
This is possibly the first post in this thread that seems to be advocating for keeping Safe Haskell, but I'm afraid it's so full of sarcasm that I don't understand it.
Do you think we should keep Safe Haskell? What are its benefits? You have three links, but it's not clear to me what any of them have to do with it.
In my ten years of using Haskell, "Safe Haskell" has only ever been a vestigial feature of the Hackage website. What I mean by that is I've never directly worked with it, used it, or relied on it.
Now, it's possible that I have passively benefited from it! But I wouldn't know. I also don't know if I could have somehow actively benefited from it if I had been properly taught about it. I would love it if people could point it out to me if that is the case.
Same here, I thought Safe Haskell was sort of "we trust the maintainer of this package".
You make this claim and don't back it up with anything other than "it can have bugs".
My point is that sandboxes are particularly prone to hard-to-find bugs. More so than a compiler. Not only can the sandbox itself be buggy, but the whole structure of the setup tends to put bugs from the layer outside the sandbox onto the attack surface.
The point is to have different abstractions.
If you want to put something like Safe Haskell inside a sandbox, thus getting multiple layers of protection[^1], and if you have the resources to do that, then that's an absolutely fine idea[^2].
But claiming that "sandboxes solve the problem, so Safe Haskell isn't needed" isn't that. It's just abandoning one protection in favor of another. If you actively limit yourself to a single layer, how can you claim to be getting the benefit of more than one? And if you are limited to only one layer, either by choice or by circumstance, then, all other things being equal, you want to keep the more understandable one.
If you trust haskell, the RTS and your binary, why even sandbox? Or use SafeHaskell?
Because I don't axiomatically "trust my binary". I'm trying to make my binary more trustworthy, and I'd like to have language features to do that. I'm actively running code that I do not consider trustworthy, or at least code in which I have relatively limited confidence. I want to limit what it can do, and I want to be able to observe what it does do.
If I pull in something from Hackage, even if I give the code a once-over, I still can't be really confident that it's not deliberately trying to cause trouble. To be honest, most projects don't even do the "once over" step. They're not even necessarily as wrong as you'd think to skip it, either, because trying to actually find intentional misbehavior with code review is a fraught project, even if you put in a lot of review time. And nonetheless you're basically forced to pull in a lot of code.
No, the point is we donāt trust it. And language level features to gain that trust are at the utterly wrong level. Unless we start to be serious and have a verified compiler etc
You also don't have a verified sandbox, a verified kernel, a verified hypervisor, verified SMM code, the rest of a verified BIOS, verified AMT/PSP/whatever, verified expansion card firmware, or a verified CPU. Most of those are in your TCB no matter what you do. Why do you feel you can rely on them if you can't rely on an unverified compiler?
I'm not sure where you got this from. If we're talking about user-space API, kernel developers consider it a bug when user-space API breaks and lose sleep over it.
I think there's a deep misunderstanding here. I'm not sure exactly where.
It's not that kernel developers willingly break existing user-mode code. It's that they're always extending what the API can do. If your sandbox has decided to allow ioctl on sockets of type foobar, and the kernel developers add a dangerous ioctl, they're not breaking the API from their point of view… but they are breaking your sandbox.
… and the whole interface with the kernel is fractally complicated, with danger able to pop up anywhere. For example, did you know that in Linux, you can pass one of your open FDs to another process over a UNIX domain socket using sendmsg? That feature is decades old and inherited from BSD. It's a good feature; you could use it to build a capability system on top of Linux. But you can also use it to break a sandbox.
If you did know about that example, you're unusual; most people don't. I have definitely known many people who have built sandboxes who haven't known about it. And it's just one example among an unknown but large number.
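For the curious, a small sketch of that descriptor-passing trick from Haskell, assuming the network package's sendFd wrapper (which uses sendmsg with SCM_RIGHTS under the hood) and a peer already listening on a UNIX domain socket; the socket path and file are placeholders for the example.

```haskell
-- Sketch: hand one of our open file descriptors to another process over a
-- UNIX domain socket. A capability-style gift, or a sandbox escape,
-- depending on who is on the other end.
import Foreign.C.Types (CInt)
import Network.Socket (Family (AF_UNIX), SockAddr (SockAddrUnix),
                       SocketType (Stream), connect, defaultProtocol, sendFd,
                       socket)
import System.IO (IOMode (ReadMode), openFile)
import System.Posix.IO (handleToFd)

main :: IO ()
main = do
  h  <- openFile "/etc/hostname" ReadMode        -- any open descriptor will do
  fd <- handleToFd h
  s  <- socket AF_UNIX Stream defaultProtocol
  connect s (SockAddrUnix "/tmp/peer.sock")      -- placeholder socket path
  sendFd s (fromIntegral fd :: CInt)             -- the peer now holds this fd too
```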
You move on to suggest VMs instead. I guess those are bugfree and less complicated?
No, they are not bug free (as I said). They are also complicated (as I said). However, yes, they are in fact a lot simpler than the Linux kernel.
The TCB is only one concern; the other question is how much of the TCB's surface is exposed to attack. I accept that the TCB containing a VM is larger than the TCB containing a program under bubblewrap, because your average hypervisor is larger than bubblewrap, and probably larger than the kernel code that actually implements most of what bubblewrap does. But, by the most complete definition, the TCB for both of them includes all of the host kernel, plus all the ancillary stuff I mentioned above. That stuff is a lot bigger than either of them, so the ratio of the TCB sizes isn't really enormous. No matter what, you have a giant TCB. But the VM's attack surface is meaningfully smaller and simpler.
[^1] Layers are good, but they do have a pitfall in real projects. It's easy for both programmers and project managers to fall into thinking that, because there are two or more layers, they don't have to assure that any of them actually work. Nobody explicitly says that, or even explicitly thinks it, but there's always a tendency to lose your healthy fear of cutting corners, because "the other layer will take up the slack". That's exacerbated because your resources are spread over the layers. You can end up with every layer being so full of holes that it's not hard to escape the whole thing. It takes discipline and good project governance to avoid that, and those can be in short supply.
[^2] I'm kind of allergic to calling it "different abstractions" rather than "layers of protection". One of the reasons sandboxes tend to be bug-prone is that the set of operations they have to pass judgement on isn't usually designed around any particular abstractions at all, which means that the number of regular, understandable things you can rely on when deciding what to enforce is relatively limited.
Without the sarcasm, the condensed version:
Since it's already in GHC, let's first try to remediate Safe Haskell.
Failing that, try recycling it to make a replacement that everyone finds mutually-tolerable.
If neither is possible, then remove Safe Haskell.
They're listed in the associated paper.
Explaining why I mentioned them is probably best left for another (long!) thread…
Telling other people what to do does not really work in open source. I have yet to see a proponent of Safe Haskell who is ready to get their hands dirty. It's all talk "do not remove, we should fix it", where "we" is always someone else.
Yes, it would be nice to have Safe Haskell for some definition of "safe". Except that no one has figured out which exactly. We know multiple reasons why Safe Haskell failed, but we don't know what can be remediated to make it useful.
If anyone is interested in working on fixing / improving / remediating Safe Haskell and has an idea of what needs to be done, please step forward. If there is no one, let's stop paying ongoing costs for this misfeature.
…it isn't a decree, only a suggestion - mine have been ignored in the past, and no doubt will be ignored in the future. And it seems (at least from the majority of comments here) that:
Safe Haskell is more hindering than helpful,
there's no interest in finding another way of checking whether unsafe features or extensions are being used in a Haskell module (other than e.g. using grep),
and there won't be any adverse consequences in the long term as a result of removing Safe Haskell.
That is all rather optimistic, at least to me. But maybe I'm just too old and cynical…
Any evidence for this? I don't really agree. There've been lots of very subtle but dangerous bugs in GHC, such as broken maths on M1 in recent releases.
I don't think there's any use in making claims about the number of bugs beyond considering the TCB. The rest is subjective, unless we have data.
Not at all. Because SafeHaskell doesn't give any protection whatsoever. I've pointed out multiple times in this thread what the research standard on secure programming is (as in: language-level security). SafeHaskell is not even near it.
Security layers don't work this way: if you can't trust code generation, you can't trust the type system either.
If you don't trust your kernel, you can't trust anything else running on your OS.
This can be solved by a sufficiently sophisticated effects system that enforces that no system calls can happen outside of defined effects.
However, this is not a proper security layer either with an untrustworthy compiler.
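A rough sketch of what that might look like in ordinary Haskell, with made-up class names: guest code is written against effect classes, so its type spells out everything it is allowed to do, and the host supplies the interpreters. As the post says, this restricts what the code can express; it is not by itself a security boundary against a compromised compiler or runtime.

```haskell
-- Hypothetical mtl-style effect classes; nothing here is an existing library.
module Effects where

class Monad m => ReadStore m where
  readKey :: String -> m (Maybe String)

class Monad m => Telemetry m where
  emit :: String -> m ()

-- Guest code: its type says it may read the store and emit telemetry,
-- and nothing else; in particular it cannot make arbitrary system calls.
summarize :: (ReadStore m, Telemetry m) => String -> m Int
summarize key = do
  value <- readKey key
  emit ("looked up " ++ key)
  pure (maybe 0 length value)
```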
Excellent point. You can have levels of security without verification.
But my stance is that if I do security inside the same system (especially language), then there's no way to make this interesting other than going the formal verification route.
E.g. you don't trust bubblewrap. I could say, well… let me rewrite it in Haskell utilizing SafeHaskell. Would that catch your attention? Probably not! And rightly so.
You want to know the binary behaves correctly, not that someone used fancy language features.
And bubblewrap, although not verified, works at exactly that layer: the outermost external layer, making no assumptions, having no knowledge of what Haskell is.
Your anecdotes about sandbox bugs are interesting and I certainly did not know about them.
I could put it another way: there's a reason some companies don't let programmers write test cases. Subconscious bias, assumptions, etc. The only way to solve this is to either have proof the tests are good and exhaustive or to separate these concerns strictly.