Hi! I’m a PhD student in the field of static analysis and security. My research is mainly about consuming the output of static analysis tools (mostly CodeQL). I noticed that there is only one library for parsing SARIF files on Hackage. It is primitive and outdated.
I enhanced it to be compatible with the current SARIF standard.
My enhancement is here.
I’m thinking of publishing this library, which parses a JSON-like file format, on Hackage.
I’d like feedback on what maturity level is expected before publishing it.
I wouldn’t say my interpretation of the SARIF standard is scientific, but it is at least systematic.
For each field of an object, if the standard describes it with any keyword other than just must, its Haskell representation is wrapped in a Maybe. The entry point for decoding is the decodeSarifFileStrict function inside Data.SARIF.Log.hs
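To make the Maybe rule concrete, here is a minimal sketch using aeson. The ToolComponent type below is a hypothetical, trimmed-down stand-in and not the library's actual definition; it assumes the aeson, text, and bytestring packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), decode, withObject, (.:), (.:?))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hypothetical stand-in for a SARIF object, for illustration only.
data ToolComponent = ToolComponent
  { tcName    :: Text        -- required property: plain field
  , tcVersion :: Maybe Text  -- optional property: wrapped in Maybe
  } deriving (Eq, Show)

instance FromJSON ToolComponent where
  parseJSON = withObject "ToolComponent" $ \o ->
    ToolComponent
      <$> o .:  "name"     -- parsing fails if the key is absent
      <*> o .:? "version"  -- yields Nothing if the key is absent

main :: IO ()
main = do
  -- Required field present, optional field missing: decodes successfully.
  print (decode (BL.pack "{\"name\":\"CodeQL\"}") :: Maybe ToolComponent)
  -- Required field missing: decoding fails.
  print (decode (BL.pack "{\"version\":\"2.0\"}") :: Maybe ToolComponent)
```

The first input decodes with tcVersion set to Nothing, while the second is rejected because the required name is missing.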
I tested the project with 9633 SARIF files, which are the results of a prototype pollution query on client-side JavaScript.
From these files, I observe that decode sarif_content_bytestring = decode . encode . decode $ sarif_content_bytestring.
From these results, we can at least assume that the project doesn’t lose any of the information it parses when re-encoding.
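For readers who want to try the same check, here is a rough sketch of the round-trip property against a toy type. Named, roundTripsWith, and preservesAll are hypothetical names for illustration, not part of the library. The second function is an optional, stricter variant I am suggesting as an assumption: it compares the re-encoded value with the generically parsed input, so it also notices fields that the decoder silently ignores (it would also flag harmless differences such as omitted nulls).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
  (FromJSON (..), ToJSON (..), Value, decode, encode, object, withObject, (.:), (.=))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Toy type that only understands a "name" key (illustration only).
newtype Named = Named Text deriving (Eq, Show)

instance FromJSON Named where
  parseJSON = withObject "Named" $ \o -> Named <$> o .: "name"

instance ToJSON Named where
  toJSON (Named n) = object ["name" .= n]

-- decode x == decode (encode (decode x)), for input that decodes at all.
roundTripsWith :: (Eq a, ToJSON a) => (BL.ByteString -> Maybe a) -> BL.ByteString -> Bool
roundTripsWith dec raw = case dec raw of
  Nothing -> False
  Just v  -> dec (encode v) == Just v

-- Stricter: the re-encoded value must equal the input as a generic JSON Value.
preservesAll :: ToJSON a => (BL.ByteString -> Maybe a) -> BL.ByteString -> Bool
preservesAll dec raw = case (dec raw, decode raw :: Maybe Value) of
  (Just v, Just original) -> toJSON v == original
  _                       -> False

main :: IO ()
main = do
  let dec   = decode :: BL.ByteString -> Maybe Named
      input = BL.pack "{\"name\":\"x\",\"extra\":1}"
  print (roundTripsWith dec input)  -- the round trip holds
  print (preservesAll dec input)    -- but "extra" was dropped on the way
```

The toy input passes the round-trip check and fails the stricter one, which is the gap between "re-encoding is stable" and "nothing from the original file is lost".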
The current status of the project:
Parsing supported: sarif-v2.1.0
API entrypoint: decode @Data.SARIF.Log :: Data.ByteString.Lazy.ByteString -> Data.SARIF.Log
Tests: 9633 golden tests
Error messages: sprinkled with things like parseJSON _ = fail "Unexpected value for xxxx"
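On the error-message item in the status list above: one possible direction, sketched here under assumptions, is to keep the fail calls but put them inside aeson's withText / withObject helpers and include the offending value, then surface the message through eitherDecode. The Level type is a hypothetical stand-in for a SARIF enumeration, not the library's actual type.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), eitherDecode, withText)
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Text as T

-- Hypothetical enumeration modelled on SARIF result levels.
data Level = LevelNone | LevelNote | LevelWarning | LevelError
  deriving (Eq, Show)

instance FromJSON Level where
  -- withText reports a type mismatch itself when the value is not a string.
  parseJSON = withText "Level" $ \t -> case t of
    "none"    -> pure LevelNone
    "note"    -> pure LevelNote
    "warning" -> pure LevelWarning
    "error"   -> pure LevelError
    -- Name the unexpected value so the user can find it in the file.
    other     -> fail ("Unexpected value for level: " <> T.unpack other)

main :: IO ()
main = do
  -- eitherDecode returns Left with the parser's message instead of Nothing.
  print (eitherDecode (BL.pack "\"warning\"") :: Either String Level)
  print (eitherDecode (BL.pack "\"fatal\"")   :: Either String Level)
  print (eitherDecode (BL.pack "42")          :: Either String Level)
```

Compared with parseJSON _ = fail "...", this keeps the same mechanism but tells the caller which value was rejected and what kind of JSON value was expected.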
Possible improvements to the library:
- Make the code style consistent between the snippets written by the previous contributor and those written by me.