Would you like to write a security advisory analyzer?

One of the HF’s initiatives this past year has been the creation of a security advisory database for Hackage. This database tracks potential security issues with libraries, and can serve as the basis for a tool that lets you know whether you’re affected. The group running the database explicitly did not sign up to build tools, however - that would have been a bit much to ask.

We’d been talking about the Right Thing for this kind of tooling, and the conclusion of an earlier thread was that we would need access to some GHC internals. The reason is that it’s important to avoid false positives that train users to ignore advisories. There’s an advisory for a quite obscure function in base, for instance, and we certainly don’t want to warn every Haskell program on the planet about it - just those that use it. Thus, the database also tracks affected exports.

But this will take time and be a bit difficult to implement.

In the meantime, here’s a sketch of a tool that brings a big part of the value for much, much less labor. Let’s call it (for the sake of discussion) konsilisto. When invoked on a project, it would do the following (a code sketch of this loop follows the list):

  1. Generate a Stack lock file or Cabal freeze file, or accept an existing one as input
  2. Parse it, using a YAML library for Stack lock files or Cabal’s own syntax for freeze files, with the Cabal library handling the version specifiers in either case
  3. Now you have a build plan. For each package in the set of transitive dependencies, konsilisto will:
    a. Use hsec-tools to determine whether the advisory database has any advisories about the library and version in question
    b. If those advisories haven’t been suppressed (see further down), notify the user about them
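Here is a rough Haskell sketch of step 3, just to show how little machinery is involved. Everything in it is made up for illustration - the types, the helper names, and the suppression file name - and the two placeholder helpers are where a real tool would call into hsec-tools and parse the suppression file described below:

```haskell
import qualified Data.Set as Set

type PackageName = String
type Version     = String
type AdvisoryId  = String

-- One entry of the resolved build plan, as recovered from a Stack
-- lock file or a Cabal freeze file.
data Dependency = Dependency
  { depName    :: PackageName
  , depVersion :: Version
  }

-- A suppression names a (package, version, advisory) triple; all
-- three must match for a finding to be silenced.
type Suppression = (PackageName, Version, AdvisoryId)

-- Placeholder: ask the advisory database about one package/version.
-- A real implementation would call hsec-tools here.
lookupAdvisories :: Dependency -> IO [AdvisoryId]
lookupAdvisories _ = pure []

-- Placeholder: parse the user's suppression file.
readSuppressions :: FilePath -> IO (Set.Set Suppression)
readSuppressions _ = pure Set.empty

-- Report every advisory that applies to the build plan and has not
-- been suppressed.
reportFindings :: Set.Set Suppression -> [Dependency] -> IO ()
reportFindings suppressed = mapM_ checkOne
  where
    checkOne dep = do
      advisories <- lookupAdvisories dep
      mapM_ (warn dep)
        [ a | a <- advisories
            , (depName dep, depVersion dep, a) `Set.notMember` suppressed ]
    warn dep advisory =
      putStrLn $ advisory <> ": " <> depName dep <> "-" <> depVersion dep
              <> " is affected; suppress it once you have checked it does not apply"

main :: IO ()
main = do
  suppressed <- readSuppressions "konsilisto-suppressions.txt"
  -- In a real run, this build plan would come from steps 1 and 2 above.
  reportFindings suppressed [Dependency "example-package" "1.2.3.0"]
```

The interesting engineering is all in steps 1 and 2 and in the hsec-tools integration; the reporting loop itself is tiny.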

It’s important to be able to suppress irrelevant warnings - there will be many false positives with this kind of approach, and users should be able to investigate, see that an advisory is not relevant, and then not hear about it anymore. Thus, there should be a file of (package, version, advisory ID) triples that the tool consults. The reason to record all three elements, rather than just a list of advisory IDs to suppress, is that an advisory may affect multiple packages; if you come to depend on a new affected package, or if an update means that your mitigation is no longer effective, you probably want to know.
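To make that concrete, the suppression file could be as simple as one triple per line. The file name, format, and entries below are all invented - this is only meant to show the shape of the data:

```
# konsilisto-suppressions.txt (hypothetical format)
# package            version    advisory
some-dependency      1.2.3.0    HSEC-0000-0000
another-dependency   0.4.0.0    HSEC-0000-0001
```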

This approach has a lot of advantages - there aren’t many moving parts, it doesn’t need to be part of an existing tool, and it ought not to require all that many dependencies. It’s certainly more annoying to use than our grand vision, but somewhat annoying tools that exist beat dreams any day. The SRT has offered to provide advice and tech support to anyone who takes this on.

14 Likes

This is a great initiative!

I’m not sure I agree with this.

Yes, there are instances where you could patch your own code to work around the vulnerability, or avoid the functions involved, etc.

But in general, you should treat the entire version as tainted.

We don’t want end users trying to stay on vulnerable versions, even if they’re not affected. That’s a dangerous road to go down, because:

  1. even as a maintainer, you don’t necessarily understand all the ways the vulnerability could be exploited
  2. as such, tools relying on symbols and on maintainers’ judgement calls can give false negatives (much worse than false positives)
  3. even if said tools are perfect, it’s against good security practices, which suggest updating as much as possible to keep the cost for attackers high

It’s better not to try to be smart when dealing with security advisories, and to always assume the worst. This has been standard practice forever in Linux distros and (AFAIK) the automotive industry.

I don’t see end users ignoring advisories. They wouldn’t even necessarily know whether it’s a false positive.

4 Likes

What you’re saying makes sense as a general rule, but here’s a concrete example of the kind of situation that’s been motivating these thoughts: HSEC-2023-0007.

I actually found this issue when I was experimenting with TOML parsers to construct the bare-bones PoC for the advisory database that I could use to recruit volunteers. I found that including extremely large float literals in a file caused the parser to take a very long time, an issue that has since been fixed. When it came time to actually write it up for the database, I looked into it more and found that a function in base was responsible. Together with the SRT members, we looked through Hackage, informed the very few users of the function about the issue, and then opened a report. The Haddocks for the dangerous function in question have been updated to warn about the issue, and deprecation (or other mitigations, like accepting bounds as arguments) is being considered.
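For a sense of scale, the kind of input involved looks roughly like this (the function in question is Numeric.readFloat; treat this snippet as an illustration of the failure mode rather than the advisory’s exact reproducer):

```haskell
import Numeric (readFloat)

-- Illustration only: parsing this short string makes readFloat build a
-- number with roughly a billion digits before it is rounded to a Double,
-- so it will happily eat your memory. If you run it, cap the heap first
-- (e.g. with +RTS -M100m -RTS).
main :: IO ()
main = print (readFloat "1e1000000000" :: [(Double, String)])
```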

In the meantime, every single Haskell program would receive an advisory that a little-used function in base could be used for a DoS. This seems like an excellent way to train people to ignore advisories in general - there’s no way to write a program that doesn’t use base, upgrading base is a major task that also involves a compiler migration, there is not yet a version of base without this issue, and the problem affects almost nobody in practice. While I don’t like the aggressive way in which this article is written, the problem that it points out is real, and we should try to avoid it.

3 Likes

Well. Let me give you a counterexample:

  1. a Haskell library uses some “unsafe” C bindings to tap into a stateful C library (e.g. crypto libs)
  2. the Haskell library misuses the C API, resulting in a security vulnerability
  3. the author identifies the misuse of the bindings
  4. as a fix, the author switches to a safer, easier-to-use API of the C library

Now, which Haskell functions does the author mark as vulnerable? The ones where they know the misuse happened? The C binding itself is not exposed.
And fix 4 might very well have fixed further misuse sites unknown to the author.

I think this is an incredibly questionable practice, and your ‘base’ example shows me that the solution is different:

  1. smaller base library
  2. reinstallable base
  3. fewer breaking changes in new GHC versions

Who is going to judge what the vulnerable functions are? How confident are they about understanding the entire call hierarchy and interplay of the functions/state involved?

I think this works only for very small and simple vulnerabilities. And now we need to hope the people reporting and analyzing those vulnerabilities judge the situation correctly.

Figuring out in general whether a program is exploitable based on a third party vulnerability seems like a huge research area.

If a tool provides such heuristics, it should be made very clear that it opens the door to false negatives. And that also means it shouldn’t be the default, IMO.

1 Like

In this case, none, which means the whole package.

You’re absolutely identifying important trade-offs that should be considered here. But I would also like to note that two of your solutions are directly in conflict: a smaller base is itself a breaking change.

1 Like

Yes, this has been discussed as part of the base split proposal. Many parts of base don’t belong there and have unstable or experimental APIs. That causes churn for base that would otherwise not exist.

So in order to get to a more stable base, we first have to cause breaking changes by deprecating a large class of modules.

So I’m not sure I’d consider those really conflicting goals. The way to get there just implies a period of disruption, but that should be fine, since the migration for end users won’t be hard (e.g. importing from ghc-internals).

Ah, I’d understood you as advocating a more radical reduction in the contents of base - after all, readFloat would likely not have ended up in ghc-internals.

True, but a reinstallable base would have fixed it.

As for vulnerabilities in the compiler-shipped libraries, we simply need a process to fix and ship them (rather than telling users to upgrade to the next major version).

GHCup can easily introduce tags and whistles for it.

1 Like

I agree that reinstallable base is great, and will be very useful.

But I still think it’s important to recognize that there is no universal threat model that all security tooling should guard against, and that tools should be allowed to serve different contexts and workflows. The default should be conservative, of course.

1 Like