Pre-HFTP: Package security advisory database

Hello everyone!

Many programming language ecosystems today have a system for tracking known security vulnerabilities in packages. These vulnerability databases are used to provide a variety of services, including:

  1. Build tools that warn when dependencies contain known vulnerabilities
  2. Source code hosting sites that notify maintainers when vulnerabilities become known for their packages (cf. GitHub’s Dependabot)
  3. Package indexes that warn people browsing them when a package has known vulnerabilities

These features are directly useful for developers. They’re also useful for organizations that need to comply with standards such as ISO 27001, and having them can make the difference between being allowed to use Haskell on a project and not.

I think it would be good to have these things for Haskell - thus, I plan to submit an HFTP.

We’ve been having some discussions with GitHub about what it would take to get Dependabot working with Haskell. I’d like to focus on the part of the problem that enables their work first, while making sure that we have a path forward to the above points 1 and 3 as well. This seems like a good way to make our work useful as soon as possible.

To enable GitHub to set up Dependabot for Haskell, we basically just need to provide them with a data source for vulnerabilities. The most common way to do this is to have a repository full of files in some format that the language community decides on, with vulnerability reports added through a standard Git workflow. The language community supplies the data and a description of the format, and GitHub will take care of getting it imported. GitHub also has its own vulnerability database and process, and we’d need to write an importer that talks to their GraphQL API if we want their reports in our repository. We also need to tell them how to find dependencies - they can at least look in a .cabal file, but should we also point them at checked-in freeze files?

I would like to propose that we basically do what Rust does here, except when there are specific reasons not to. Here are some details about their approach:

  1. Their vulnerability report format consists of a TOML header followed by Markdown text. The header specifies metadata such as the package name and the affected versions, while the free text at the bottom is for a description of the vulnerability itself.
  2. There is a specific committee in charge of merging advisory PRs to the database.
  3. The contents of the advisory database are CC0 (public domain in jurisdictions that support it).

I think that we should make the following changes to what Rust does:

  1. Replace Cargo’s versioning scheme with Hackage’s (specifically, I’d say that affected version ranges should use the same syntax and semantics as .cabal files)
  2. Rust’s format describes which versions are not affected. Describing the versions that are affected strikes me as easier to combine with everything else, so I think we should do that instead.
  3. We should review the categories of vulnerabilities - I think that we may need to swap some out. That will be part of the detailed proposal, but I would appreciate feedback on this here and now.
  4. The Rust format may optionally contain pointers to the affected functions. I think we should additionally allow datatypes there, for cases like hash-collision attacks. (A sketch of what an advisory file might look like follows this list.)
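To make this concrete, here is a rough sketch of what a single advisory file might look like under these changes, loosely modelled on RustSec’s TOML-header-plus-Markdown layout. Every field name, the ID scheme, the package, and the version numbers below are purely illustrative - the actual format would be pinned down in the HFTP itself:

    [advisory]
    id = "HSEC-0000-0001"              # hypothetical Haskell advisory ID scheme
    package = "some-package"           # Hackage package name
    date = "2022-01-01"
    references = ["CVE-yyyy-nnnn"]     # cross-links to CVE/CWE where relevant
    keywords = ["dos", "hash-collision"]

    # Versions that *are* affected, in .cabal range syntax (changes 1 and 2 above)
    affected-versions = ">=1.0 && <1.2.3"

    # Affected declarations; datatypes as well as functions (change 4 above)
    [affected]
    declarations = ["Some.Module.SomeMap", "Some.Module.insert"]

    ---

    # Denial of service via hash collisions in SomeMap

    A specially crafted input can force worst-case behaviour in SomeMap,
    allowing a denial of service against programs that feed it untrusted
    input. Fixed in version 1.2.3.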

Note that the vulnerability database is intended to be used after the vulnerability has been fixed. Anyone may file a vulnerability alert, but we should encourage notifying maintainers first, of course.

Governance is also an important issue here. I’d hope that the database can be administered by representatives from trusted community infrastructure providers: Hackage trustees, Stackage, a Core Libraries Committee delegate, GHC developers, etc.

I’ve posted on the Rust Zulip instance and asked for their reflections about their setup, and I’ll summarize their replies either here or on the HFTP, depending on timing.

Some things that this could enable, in the long run, that I do not think should be part of the first iteration:

  • Auditing of transitive dependencies by stack and cabal for known vulnerabilities
  • Warnings about known vulnerabilities on the Web interfaces to Hackage and Stackage

What do you all think?

15 Likes

Excellent. Some time ago I worked for a company that had to satisfy ISO 27001 and was also developing processes against IEC 62443 (which is a much more interesting standard, by the way).

At that time we also developed a system that used CAPEC to analyze vulnerabilities. The connection with the vulnerability database here is CAPEC → CWE → CVE, where CAPEC describes attack patterns, CWE describes weaknesses, and CVE describes concrete vulnerabilities.

These three databases, I’d argue, are commonly used in industry to perform risk assessments.

So I’d say it would be worthwhile to participate in that MITRE workflow, where possible, as a long-term goal. I’m not sure whether there are any vendors that would already be theoretically impacted by Haskell ecosystem vulnerabilities.

Our own global vulnerability database would probably be the first step in that direction.

4 Likes

Thanks for the insight - I’ve only tangentially approached these topics in my past working life, and it sounds like you have a lot of relevant experience. Do you see anything in the proposed first steps above that would complicate participating in these larger workflows later?

1 Like

Looking at the advisory format of RustSec, it seems they have already thought about integration with CVE.

E.g. RUSTSEC-2021-0071 references CVE-2021-3013.

3 Likes

but should we also point them at checked-in freeze files?

I just want to caution that all the “dependabots” I know of on GitHub today work exclusively from lock/freeze files. They also generate PRs manually updating those lock/freeze files.

I could be wrong, but I am not sure whether solving is involved. It could just be making PRs that “bump” pinned versions, letting CI check whether those bumped plans continue to satisfy all version constraints.

Now, there is no actual problem here. Dependabot can similarly modify version bounds and let CI (which solves on the fly) sort it out. But the rarity of freeze files in the Haskell ecosystem versus their prevalence everywhere else could definitely take GitHub by surprise and raise some eyebrows.

That you mentioned “freeze files” probably means all this has already been discussed, but because compliance is such a delicate thing I wanted to make sure these issues were highlighted for the rest of the community.

1 Like

Checked-in freeze files can only further restrict what cabal files already allow, so I don’t think they need to look at both? Looking at cabal files gives a superset of the results from freeze files…

1 Like

Great proposal - IIRC it was first mentioned on Slack.

However, I would prefer the effort to focus more on Hackage, because not all projects are hosted on GitHub/GitLab (and Dependabot may not be enabled).

Doing so would allow us to directly notify maintainers whenever a vulnerability is found.

PS: if a task force is created, I’d like to participate

1 Like

I think the key enabler for dependabot or anything else is really the same – a centralized repository that hosts known vulns in a standard format that other tools can integrate with. Since dependabot is an existing tool that integrates with such repos, it is a good “forcing function” as something to target.

The design for how Hackage and/or cabal would integrate with such a repo isn’t entirely clear to me. We’d want some tool pulling from the repo and syncing to Hackage, I imagine. Then would we want a “vuln db” on Hackage? How would it be presented? I suppose A) as info attached to each package page, and B) also in some file format consumable by cabal. So we need to specify that format. Having done so, I’d then want a cabal audit command or the like that can act as a sort of “local dependabot” for vulns, but also perhaps handle some license checks, etc., along the way.

4 Likes

Thanks for your thoughts.

Yes, the vuln db (I would call it an “advisory db”) should be part of or integrate with Hackage and cabal-install.

There are technical and policy considerations to work out. The technical side is about what data to include and how tooling integrates; the policy side is about the people and processes for managing the advisory db.

For the technical side, here are some rough ideas:

  • a security advisory should attach to a particular package, specifying the version range affected
  • the advisory should include CVE and CWE references (where relevant)
  • the value of including a CVSS score is debatable; the score often varies depending on how software is configured or used. Vendors frequently advise different CVSS scores for the same CVE.
  • Hackage (and other package UIs) should flag affected versions and also link to a list of all advisories for each package, and for all packages
  • cabal-install should update the advisory db as part of cabal update and perhaps report on new advisories (if any)
  • cabal-install should reject or warn about affected versions when resolving dependencies, with a config knob for the default behaviour (reject/warn/ignore). What the default should be is debatable, but I would start with warning.
  • cabal-install should offer an audit command that audits the dependencies of a package (mindful of bounds, to avoid spurious warnings); a minimal sketch of the version check this relies on follows below.
  • There should be a way to suppress particular advisories per project. Using the aeson example again, if your program or downstream library is definitely using it in a safe way despite the weakness, you should be able to suppress that advisory and move on.

[edit: not all of this has to be done in one go, of course]
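To illustrate the check that several of these bullets rely on (Hackage flagging affected versions, the solver warning or rejecting, a cabal audit command), here is a minimal Haskell sketch using the Cabal library, under the assumption that advisories record their affected versions as a .cabal-style range. The Advisory type and the HSEC-style ID are made up for the example, not part of any existing API:

    -- Minimal sketch: does a resolved package version fall in an advisory's
    -- affected range? Uses the Cabal library's version types.
    import Distribution.Parsec  (simpleParsec)
    import Distribution.Version (Version, VersionRange, withinRange)

    -- Hypothetical advisory record; field names are illustrative only.
    data Advisory = Advisory
      { advisoryId      :: String        -- e.g. a made-up "HSEC-0000-0001"
      , advisoryPackage :: String        -- Hackage package name
      , affectedRange   :: VersionRange  -- affected versions, .cabal syntax
      }

    -- True if the advisory applies to this package at this resolved version.
    isAffected :: Advisory -> String -> Version -> Bool
    isAffected adv pkg ver =
      advisoryPackage adv == pkg && ver `withinRange` affectedRange adv

    -- Example; package name and version numbers are purely illustrative.
    example :: Maybe Bool
    example = do
      range <- simpleParsec ">=1.0 && <1.2.3"  -- parse a .cabal-style range
      ver   <- simpleParsec "1.2.2"            -- a version from a build plan
      pure (isAffected (Advisory "HSEC-0000-0001" "some-package" range)
                       "some-package" ver)     -- Just True

Everything else (fetching the advisory db, walking a build plan, rendering warnings) layers on top of a predicate like this.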

On policy: this is an important matter. I would imagine a small committee, appointed by the Haskell Foundation or via an election managed by the HF, to be in charge of the advisory db. There need to be clear procedures for creating and updating advisories, and for handling disputes.

Final note: the term “security advisory” is perhaps better than “vulnerability”. People can get very passionate about whether something is truly a vulnerability or not - see again the aeson example. But most people could agree that an advisory about the issue is reasonable.

3 Likes

Transitive deps are the difference.

With a lock file, dependabot happily bumps your transitive deps for you, and if you don’t get any such MRs (and you trust dependabot) you can be confident you aren’t using packages with known vulns.

With cabal files alone (without a freeze file or project-file constraint stanza, etc.) you are at the mercy of your deps also adjusting their bounds to avoid bad packages. Everyone must accept the Dependabot MRs (or otherwise avoid packages with known vulns) for bad packages not to appear in solutions.

Also, there is the matter that carving packages out of bounds can make bounds much harder to read. The modifications are also “needlessly” duplicated across packages. This is a matter of opinion, but one way to interpret bounds is as being more concerned with defining the interface of the packages allowed rather than the exact package. (E.g. foo >= 1.0 && < 1.1 can mean “give me something compatible with foo version 1.0”.) (This philosophical point makes the most sense when considering hypothetical versions that haven’t been released yet; happy to go into detail somewhere else on that.) From this perspective, since it is the implementation that has the vuln and not the interface, modifying bounds is excessive.

In any event, the lockfile-based systems don’t bother touching bounds, because changing lockfiles alone is sufficient.

Perhaps feeding a blacklist to the solver (cf. deprecated packages on Hackage?) when there is no freeze file / constraints is sufficient. This also insulates one from worrying about whatever dependencies might be doing or not doing with their bounds, because the blacklist overrides whatever lax bounds there are.
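For what it’s worth, you can already approximate such a blacklist by hand today with a constraints stanza in cabal.project; the package name and version numbers here are made up:

    -- cabal.project: manually excluding a hypothetical affected range
    constraints: some-package <1.2 || >=1.2.3

A solver-level blacklist derived from the advisory db would amount to injecting constraints like this automatically, without every project having to write them down.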

If we do that, then there are really 2 different tracks for what to do with the Vuln DB:

  1. Modify cabal.project / cabal.project.freeze, etc. Work for dependabot to do. Modification of the code in the repo.
  2. Provide black list to solver, fresh solution each time. No work for dependabot to do. No modification of code in the repo.

Final note: the term “security advisory” is perhaps better than “vulnerability”. People can get very passionate about whether something is truly a vulnerability or not - see again the aeson example. But most people could agree that an advisory about the issue is reasonable.

I absolutely agree that “Advisory database” is the right term to use. This is also what the RustSec repository is called - this is an example of where we should do what they do unless we have a reason to do otherwise, because they’ve been through these discussions!

3 Likes

It’s nice if we can get advisories about transitive dependencies, but I don’t think that Dependabot should be our source of trust in whether we have known vulnerabilities. I see it as a good way to get something useful up and running quickly, and it brings real value, but it need not be able to do a full audit. We can build tools for that in a future phase of the project.

Dependabot currently supports a variety of formats, some of which are freeze files with a whole set of dependencies, and some of which only show direct dependencies. Getting alerts on direct dependencies only is useful, I think. Checked-in freeze files have issues like being specific to OS and GHC versions that make me a bit wary - I think that supporting them is fine, but perhaps it should be in the “future work” section? Or perhaps we should ask GitHub to additionally look at cabal.project.freeze.X files, where X is some sequence of letters and numbers?

I’m not sure that Dependabot would be sending MRs to adjust bounds in the first iteration. It can run in two modes - advisory mode, where it sends a private message to maintainers about potential dependency issues, and PR mode, where it actually sends PRs to adjust dependencies. I suspect that the former will be implemented before the latter, but I can follow up with GitHub to check for sure.

As a rough first idea, I would think that the synchronization to Hackage could be a git pull, and that Hackage could otherwise be set to read our standard file format. Then, it can put links to advisories on each affected package/version page, along with the README, the maintainer, etc.

The design of cabal audit seems a bit less clear. Should it generate a build plan, then see if it contains advisories? Or should it discover whether there exists a valid build plan that contains advisories? I would guess the latter, but it’s just a feeling.

Similarly, for stack and Stackage, I wonder how this information could be best used. I’ll see if I can point the developers at this thread.

There’s maybe another thing to consider: compilation flags.

I’m thinking specifically of aeson, which has flags that avoid security issues by default, but which can be changed.

I’m not sure we want it for v1, but I think we should keep it in mind.

Flags are a good point!

I worry that explicitly representing flags in the advisory format is getting too specific. If we emit advisory warnings depending on the choice of flags, then someone could be lulled into a false sense of security if they ran the audit under a different set of flags than they deploy their code with. So I’d want advisories to be listed if any combination of flags or compiler versions exhibits the issue, and then users can read the description and be careful about things like flags if they want to use the affected version.

Does that make sense?

1 Like

That’s also my concern (I have Nix in mind, and I do not see how it could be done there). However, to stick with the aeson example, there’s no version without flags, which might be a good use case for ignoring alerts.

I wonder if we can leverage the existing support for deprecated package versions in Cabal package repositories. Here’s a straw design:

  • Extend the deprecated-versions format to include an optional reason (a rough sketch follows this list)
  • Deprecate package versions which have security advisories, including a link to the advisory in the reason
  • cabal audit can report any deprecated package versions in your build plan, along with the reasons why they are deprecated, and the reason why they appear in the build plan (which can easily be established by solving with an additional constraint excluding the deprecated version and reporting the conflict).
  • I’m unsure whether you can force cabal to pick a deprecated version by explicitly selecting it… I thought you could, but I’m not sure. It’s probably important to be able to do this!
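For the first two bullets, purely as an illustration (the syntax here is invented, and as far as I know the current deprecated-versions data is just a version constraint per package, with no reason field):

    -- hypothetical deprecated-versions entry extended with a reason
    some-package <1.2 || >=1.2.3
      -- reason: HSEC-0000-0001 (denial of service via crafted input)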

This somewhat avoids the transitive dependency problem, because you know that you won’t use any deprecated package versions so long as nothing forces you to. And cabal audit should be able to tell you if you’re getting it wrong. This might be too difficult for dependabot to do, however…

The other downside of this is that it’s a big hammer to deprecate a package version that has any security advisory. The aeson issue is a good example: it’s really totally fine for any use case that isn’t parsing untrusted JSON (i.e. lots of them), so do we really want to push everyone hard towards the new major version even if they don’t need to? Unclear.

2 Likes

What you point out with the aeson example is a good reason why we should encourage maintainers to deprecate versions that have clear vulnerabilities, but not necessarily require it. But not all applications have the same threat model - is the presence of a timing attack a problem? Sometimes! Is a DoS opportunity with untrusted JSON a problem? Sometimes! Is an HTTP server with a remote code execution flaw a problem? Basically always!

I’m inclined to leave decisions up to maintainers, and provide a channel for informing users about trade-offs. This has the downside of information overload, but it seems like a first step that respects everyone’s autonomy.

3 Likes

It’s nice if we can get advisories about transitive dependencies, but I don’t think that Dependabot should be our source of trust in whether we have known vulnerabilities. I see it as a good way to get something useful up and running quickly, and it brings real value, but it need not be able to do a full audit. We can build tools for that in a future phase of the project.

OK, so to flip around what I was saying a bit: I am not so much worried about dependabot on cabal files alone not being good enough, as about it being accidentally harmful.

If bounds like

>= 1.1 && < 1.4

get replaced with

>= 1.1 && < 1.1.234 || >= 1.1.235 && <= 1.2.8754 || >= 1.2.8755 ... && < 1.4

I’m afraid we’ve just made cabal files and metadata on Hackage a lot harder to read! :slight_smile:

Furthermore, suppose everyone is

  • running dependabot
  • adjusting their bounds like this
  • uploading their libraries to Hackage

For each library nothing is redundant, but across Hackage as a whole we are now storing the vuln database in a highly duplicative way by “chopping up” bounds in the same way across many packages and versions. Viewing Hackage as a database, we’re denormalizing our data by doing this. That doesn’t strike me as very good, either.

I’m not sure that Dependabot would be sending MRs to adjust bounds in the first iteration. It can run in two modes - advisory mode, where it sends a private message to maintainers about potential dependency issues, and PR mode, where it actually sends PRs to adjust dependencies. I suspect that the former will be implemented before the latter, but I can follow up with GitHub to check for sure.

That’s a good point, but I don’t think it obviates the issues above. Whether users are getting PRs or getting private messages, dependabot is telling the user “I think your bounds are possibly sketchy”. But I would like to have a security system where version bounds are not responsible for avoiding packages wholesale[1].

Dependabot currently supports a variety of formats, some of which are freeze files with a whole set of dependencies, and some of which only show direct dependencies. Getting alerts on direct dependencies only is useful, I think.

Thanks for linking this! One noticeable thing is the “Recommended formats” vs “All supported formats” distinction. While GitHub supports both lock files and “constraint files” in many cases, it always recommends just the lockfiles.[2] This makes me think that only looking at freeze files / constraint files should be kosher to the folks at GitHub.

Checked-in freeze files have issues like being specific to OS and GHC versions that make me a bit wary - I think that supporting them is fine, but perhaps it should be in the “future work” section? Or perhaps we should ask GitHub to additionally look at cabal.project.freeze.X files, where X is some sequence of letters and numbers?

I think what you said about PR vs advisory mode handles this fine :slight_smile:. Dependabot may not know how to fix a freeze file given those subtleties, but it can at least point out issues for the user to fix. I think that means the user can keep multiple freeze files with any sort of .X convention they like. Dependabot doesn’t need to understand the convention, because the vulns database isn’t OS- or GHC-version-specific either. It merely audits any such freeze file it can find.


Bottom line: I am not trying to say dependabot is bad or that the perfect should be the enemy of the good, but that to start, dependabot should just ignore bounds and only warn / send PRs on freeze files / cabal files. Yes, that means dependabot will do a lot less, but that’s OK. The real win here with the HFTP, I think, is not technical but social: not automation, but giving Haskell a central vulns database at all!

Longer term, we can augment Cabal with the ability to use the vuln database too when solving, but that need not block anything we do today with dependabot.


[1] They are less inappropriate for avoiding specific relationships than for avoiding packages altogether. If packages are nodes in a dependency graph, version bounds are constraints on edges. So while I think using bounds to express avoiding certain nodes (e.g. "aeson version 1.x is bad") is not so good, using them to avoid certain edges (e.g. "aeson version 1.x is generally fine, but as I, package web-ui, use aeson on untrusted user data in so-and-so way, I had better avoid it") is more defensible.

[2] I thought requirements.txt was an exception, but then I realized that if one is using a Pipfile.lock one also has a Pipfile, which is its “constraint file” sibling. requirements.txt is a legacy format from the pipenv perspective that lies awkwardly across the “lock file–constraint file divide”.

Where I currently work, having a security advisory DB would be a requirement for us to consider using Haskell, so this is a great initiative!

I have experience using dependabot and cargo-audit for this purpose. Overall it is a very nice experience to use these tools together.

Our workflow is that when a new advisory is published, dependabot automatically files a PR to update dependencies. Our CI will prevent new releases until the issue has been resolved (to encourage us to take action). Most of the time we can simply upgrade to the new version, which means we do not need to investigate the severity of the issue at that point (we may do that later). There are cases where there is no quick fix, or no solution in sight. When we cannot upgrade, we investigate the impact and either add the advisory to an ignore list or resolve the issue somehow.

The cargo-audit workflow is very simple. cargo audit gives warnings and errors that link to relevant resources and provides a short summary of a suggested resolution, which is usually to upgrade the package, but could also be to downgrade or replace it. Ignoring an advisory can be done with cargo audit --ignore RUSTSEC-yyyy-nnnn.

As it stands, cargo-audit is separate from cargo (the Rust build tool), which means that cargo-audit can be developed mostly independently. The potential downside is that breaking changes to the format of package manifests and freeze files in cargo may not be validated against cargo-audit. I have not seen this issue in practice, but based on changes that have been made to Cabal and Stack in the past, these things may happen, so having buy-in from those maintainers is important.

Unfortunately we have fragmentation in the community: dependabot would need to inspect multiple different file formats - cabal freeze files, .cabal files, stack lock files, stack.yaml, package.yaml (hpack) - to work properly for a majority of projects. That said, an initial integration may not require us to support all of these.

From Stackage’s perspective, we would need some way to determine whether a release is good or bad. As discussed, it is not that clear-cut. With the current Stackage workflows: if an advisory is fixed by a new release of a package, then we may automatically include it in our next snapshots. If we can’t, perhaps we should treat this like other bounds issues, where we post an issue to our tracker and ping affected maintainers. Then it may be up to these maintainers, or others in the community, to weigh in on the next steps, e.g. what should be done and what the timeline should be. If there were a tool that let us check “are there advisories for foo-X.Y.Z.W?”, we would have everything we need, and I expect integrating this to be a small task.

Some replies to others:

I just want to caution that all the “dependabots” I know of on GitHub today work exclusively from lock/freeze files. They also generate PRs manually updating those lock/freeze files.

Dependabot works with package manifests as well! See Configuration options for the dependabot.yml file - GitHub Docs. I can confirm that the default auto setting updates manifests for npm and cargo, but only when it is necessary to stay compatible with the lock file.

However, I would prefer the effort to focus more on Hackage, because not all projects are hosted on GitHub/GitLab (and Dependabot may not be enabled).

Dependabot can also be used with other vendors, and you can run it without it being enabled on the provider side - see GitHub - dependabot/dependabot-script: A simple script that demonstrates how to use Dependabot Core. But I agree that, ideally, any effort would not be tied to specific vendors.

1 Like