Run RFC 9535 JSONPath queries on Data.Aeson

I have created my very first Haskell package. aeson-jsonpath is designed to be for Haskell what serde_json_path is for Rust. It also gives you a nice interface for running JSONPath queries (one function call that parses and runs the query). It is currently only on Cabal, but I will be releasing it on Stack and Nix soon. Please suggest any improvements. I have taken full responsibility for maintaining this package; I don’t want it to become abandoned like a lot of packages on Hackage. Your contributions are also welcome. Thank you.

23 Likes

Great to hear you’re contributing to the ecosystem! Always nice to see people trying to fill gaps.

Seeing as you’re asking for feedback, here is some advice and a few questions:

  • Have you looked around for any existing JSON path libraries/functions? (jsonpath for example?)
    And if so, what’s the biggest thing you find lacking? (would like to hear your stance on it)
  • There’s pretty much no documentation in the package. If you don’t know about Haddock, do look into it, it’s a very nice system of adding documentation to source code that results in automatically generated documentation for your package on hackage/stackage/hoogle/etc.
    (If you do already know about Haddock, please add documentation :slight_smile: you can look to other mature packages for best practices: e.g. aeson, persistent, etc.)
  • I see you’re using protolude, which could be considered unnecessary bloat; I don’t see it used much except for toS, which can be replaced with Data.Text.(un)pack.
  • Might be a bit much at this stage, but you could add a TemplateHaskell splice function so that you can guarantee the Text argument is correctly formed at compile time.

I wish you success and good luck on your Haskell journey :slight_smile:

5 Likes

@Vlix Thanks a lot for the feedback!

Have you looked around for any existing JSON path libraries/functions? (jsonpath for example?)

Yes, I looked at the mentioned package. Unfortunately, it is not actively maintained (last important commit 2 years ago). It does not comply with the standard and makes no effort towards that. Lack of maintenance is probably the biggest reason.

If you don’t know about Haddock, do look into it, it’s a very nice system of adding documentation to source code

Yes, I will add documentation soon. Thanks.

I see you’re using protolude, which could be considered unnecessary bloat

Yes, I initially added protolude because I didn’t want to use Prelude, but now that Protolude isn’t maintained anymore (see: github comment), I will remove it in my next release. It is also preventing me from uploading my package to Stackage because Protolude doesn’t support GHC 9.10.

you can guarantee the Text argument is correctly formed at compile time

Hmm, I guess I don’t understand TemplateHaskell at all yet, but I will definitely look into it. Why would it not be correctly formed? :thinking:

I wish you success and good luck on your Haskell journey

Thank you and same to you.

1 Like

Ah, those are good reasons, yes.

What do you think of giving users the option to form the JSONPath as an AST, instead of plain Text?
I’d imagine it might be easier to programmatically construct the JSONPath as an AST instead of having to have logic in place to form it into a JSONPath Text, that then gets parsed back into an AST anyway. (i.e. your JSPSegments)

Also less surface to make mistakes, since you might get compile time errors if you try to order the segments in the wrong way.
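To make the idea concrete, here is a minimal sketch of what a user-facing AST could look like. All names here (Selector, Segment, Query, render) are made up for illustration and are not aeson-jsonpath’s actual types; it covers only name, index and wildcard selectors, and the renderer escapes only quotes and backslashes.

```haskell
import Data.List (intercalate)

-- Hypothetical AST; illustrative names, not the library's real types.
data Selector
  = Name String   -- ['name']
  | Index Int     -- [3] or [-1]
  | Wildcard      -- [*]
  deriving (Show, Eq)

data Segment
  = Child [Selector]        -- [sel, sel, ...]
  | Descendant [Selector]   -- ..[sel, sel, ...]
  deriving (Show, Eq)

-- The leading "$" is implicit.
newtype Query = Query [Segment]
  deriving (Show, Eq)

-- Render a query back to JSONPath text, always in bracket notation.
-- Only quotes and backslashes are escaped; control characters are not handled.
render :: Query -> String
render (Query segs) = "$" ++ concatMap seg segs
  where
    seg (Child sels)      = brackets sels
    seg (Descendant sels) = ".." ++ brackets sels
    brackets sels = "[" ++ intercalate "," (map sel sels) ++ "]"
    sel (Name n)  = "'" ++ concatMap esc n ++ "'"
    sel (Index i) = show i
    sel Wildcard  = "*"
    esc '\'' = "\\'"
    esc '\\' = "\\\\"
    esc c    = [c]

-- Built programmatically, no string concatenation or re-parsing needed.
booksQuery :: Query
booksQuery = Query [Child [Name "store"], Child [Name "books"], Child [Index (-4)]]
```

With something like this, a mistake such as putting a Selector where a Segment belongs is a type error rather than a parse failure at runtime.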

If someone were to use a literal string for the Text argument, you could check that string to be valid JSONPath syntax, so the user would get a compilation error, instead of having to find out in tests or at runtime that the string itself is a bad JSONPath and will always fail.

1 Like

I’d imagine it might be easier to programmatically construct the JSONPath as an AST instead of having to have logic in place to form it into a JSONPath Text, that then gets parsed back into an AST anyway. (i.e. your JSPSegments)

Yes, it is just easier to traverse ASTs. Also the RFC 9535 gives a nice ABNF grammar which we use to construct the AST.

Also less surface to make mistakes, since you might get compile time errors if you try to order the segments in the wrong way.

We don’t need to raise any errors. If the query parses successfully, it runs. If nothing is found there, we return an empty list, which is correct behavior according to the standard.

If someone were to use a literal string for the Text argument, you could check that string to be valid JSONPath syntax, so the user would get a compilation error, instead of having to find out in tests or at runtime that the string itself is a bad JSONPath and will always fail.

Thanks for this information. This may be very useful and I might add this. Thanks again.

1 Like

I second that TemplateHaskell might be nice. If a well-formed query cannot fail, then that can be expressed in the API.

runJSPQuery :: WellFormedQuery -> Value -> Value

With auxiliary function.

parseQuery :: Text -> Either ParseError WellFormedQuery

Combined with TemplateHaskell it could look like this.

runJSPQuery [jsonpath| $.store.books[-4] |] jsonDoc

[jsonpath| $.store.books[-4] |] turns into WellFormedQuery or fails at compile time, no need to return Either from runJSPQuery.
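For anyone who hasn’t written a quasi-quoter before, here is a rough sketch of the mechanics. It is not the library’s code: WellFormedQuery is reduced to a list of member names, and parseQuery is a toy parser that accepts only paths like $.store.books (no indices such as [-4]), standing in for a real RFC 9535 parser. Both are assumptions for illustration.

```haskell
{-# LANGUAGE DeriveLift #-}

import Data.Char (isAlphaNum, isSpace)
import Data.List (dropWhileEnd)
import Language.Haskell.TH.Quote (QuasiQuoter (..))
import Language.Haskell.TH.Syntax (Lift, Q, lift)

-- Hypothetical, drastically simplified query type: just member names.
newtype WellFormedQuery = WellFormedQuery [String]
  deriving (Show, Eq, Lift)

-- Toy parser standing in for the real one; accepts only "$.a.b.c" paths.
parseQuery :: String -> Either String WellFormedQuery
parseQuery input =
  case dropWhileEnd isSpace (dropWhile isSpace input) of
    '$' : rest -> WellFormedQuery <$> names rest
    _          -> Left "a query must start with '$'"
  where
    names ""         = Right []
    names ('.' : cs) =
      case span isNameChar cs of
        ("", _)   -> Left "expected a member name after '.'"
        (n, rest) -> (n :) <$> names rest
    names (c : _)    = Left ("unexpected character " ++ show c)
    isNameChar c = isAlphaNum c || c == '_'

-- A parse failure becomes a compile-time error via 'fail';
-- a success is lifted into an expression of type WellFormedQuery.
jsonpath :: QuasiQuoter
jsonpath = QuasiQuoter
  { quoteExp  = \s -> either fail lift (parseQuery s)
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }

unsupported :: String -> String -> Q a
unsupported ctx _ = fail ("jsonpath cannot be used in a " ++ ctx ++ " context")
```

Because of GHC’s stage restriction, the quoter has to be used from a different module than the one defining it, with QuasiQuotes enabled there. In that module [jsonpath| $.store.books |] would have type WellFormedQuery, and [jsonpath| store |] would be rejected during compilation.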

Bonus points, you can return nice compile time errors from TemplateHaskell: example. As I’m guilty of shamelessly stealing PyF’s implementation of those, to atone for my sins, feel free to ask or tag me on an issue, I might be able to help.

Now, for some bikeshedding, I am not a fan of the runJSPQuery name. With query as the name and the module imported qualified, it could become JSONPath.query. IMO much nicer, dunno how others feel about it.

5 Likes

@jeukshi Thanks a lot for this.

[jsonpath| $.store.books[-4] |] turns into WellFormedQuery or fails at compile time, no need to return Either from runJSPQuery

This makes the API so much simpler. I will definitely add something like this. Thank you so much.

With query as the name and module imported qualified, it could become JSONPath.query. IMO much nicer,

This actually looks great assuming everyone uses qualified imports. In fact, having a simpler name like query might even force people to use qualified imports, because otherwise they might run into variable name shadowing warnings. Genius!

2 Likes

I don’t think the Either is avoidable. Even with a well-formed query into an object, you don’t know whether the shape of the object will correspond to the query.

Here’s the RFC.
Given a well formed query, the output should be a nodelist, implemented however you like.
So the result can be implemented as [Value], as Array (Vector Value) :: Value (what aeson-jsonpath does), or as a bare Vector Value (which is what aeson’s Array type synonym stands for).
If the shape doesn’t match you get an empty nodelist.

Perhaps Value is not the best return type for runJSPQuery since it may look like you can get a Value that is not an Array.
You can use Either, too. How an Either result can look: Either () (NonEmpty Value).

Another discussion is how lazy you want to make this. If I want the first node of a jsonpath query that can return 100 nodes, is it better to have [Value] or Array !Vector? Should I be concat-ing all those Vectors as I traverse the Value?
But then again if you just want 1 node then you can just reflect that in the selector. You can make two queries: one for 1 node, and one for all nodes.

1 Like

Perhaps Value is not the best return type for runJSPQuery since it may look like you can get a Value that is not an Array.

I guess then we can just return Array. It is implementation dependent, but popular implementations like Rust’s serde_json_path always return an array. We can see what the users demand later and change our implementation accordingly if need be.

If I want the first node of a jsonpath query that can return 100 nodes, is it better to have [Value] or Array !Vector? Should I be concat-ing all those Vectors as I traverse the Value?

100 nodes is peanuts for modern computers, but maybe a more interesting question from a data structures PoV is how many nodes on average can a JSONPath query return, over all possible documents. I suspect not that many, so even copying into a strict Vector wouldn’t be that big of a deal.

That’s how e.g. beautifulsoup and scalpel (a high-level web scraping library for Haskell) work: first match vs. all matches.

2 Likes

Currently, the function doesn’t return Array, but a JSON Value, and I agree with @darkxero that this might not be the best type. It won’t be long till users complain. How does one consume such an API?

res <- runJSPQuery "$" jsonDoc
case res of
    Array arr -> ... -- cool, I get to handle `arr` which is `Vector Value`
    Object obj ->  ... -- not cool, what should I do here?
    _ -> ... -- same as above

So why not give me Vector Value in the first place?

Rust implementation also gives me query_located function, where I get Value and its NormalizedPath. That might be useful, but can’t be expressed with Value. I’d expect something like Vector (NormalizedPath, Value).
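Roughly what I have in mind, as a self-contained sketch. To keep it dependent on base only, Json is a stand-in for aeson’s Value, lists stand in for Vector, and Step, Path, Selector and queryLocated are names I made up; none of this is the library’s API.

```haskell
-- Stand-in for aeson's Value so the sketch needs only base.
data Json
  = JNull
  | JBool Bool
  | JNum Double
  | JStr String
  | JArr [Json]
  | JObj [(String, Json)]
  deriving (Show, Eq)

-- One step of a normalized path: an object key or a non-negative array index.
data Step = Key String | Idx Int
  deriving (Show, Eq)

type Path = [Step]

-- Hypothetical selectors; a real implementation has more (slices, filters).
data Selector = Name String | Index Int | Wildcard

-- Apply one selector to a located node, extending its path.
select :: Selector -> (Path, Json) -> [(Path, Json)]
select (Name n) (p, JObj kvs) = [(p ++ [Key n], v) | Just v <- [lookup n kvs]]
select (Index i) (p, JArr xs) =
  let len = length xs
      j   = if i < 0 then len + i else i   -- normalize negative indices
  in [(p ++ [Idx j], xs !! j) | j >= 0, j < len]
select Wildcard (p, JObj kvs) = [(p ++ [Key k], v) | (k, v) <- kvs]
select Wildcard (p, JArr xs)  = [(p ++ [Idx i], v) | (i, v) <- zip [0 ..] xs]
select _ _ = []   -- shape mismatch: empty nodelist, not an error

-- Run selectors left to right, starting from the root with an empty path.
queryLocated :: [Selector] -> Json -> [(Path, Json)]
queryLocated sels root =
  foldl (\nodes s -> concatMap (select s) nodes) [([], root)] sels
```

A plain query is then just map snd over this result, and the caller never has to pattern match on a Value to find out whether it got an array back.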

1 Like

@jeukshi Thanks for pointing this out. I guess we can learn many useful things from the Rust implementation.

I’d use this library for ghcup to get and set config values, which at the moment is a bit awkwardly implemented.

However, it seems:

  • retrieving fields that have hyphens doesn’t work very well (appears to need additional quotes):

    ghci> runJSPQuery "$.first-name" (fromJust $ decode @Value "{ \"first-name\": \"Chris\" }")
    Right (Array [])
    ghci> runJSPQuery "$.firstName" (fromJust $ decode @Value "{ \"firstName\": \"Chris\" }")
    Right (String "Chris")
    
  • I’d need a setter too, not just a query (working over Value is enough, it does not need to be strongly typed)… is this out of scope? The internals are not exposed.

  • the dependency footprint needs to be small

1 Like

I’d need a setter too, not just a query

The RFC doesn’t specify setting, but it seems easy enough. The pattern of “retrieve all values targeted by the expression” or “set all positions targeted by the expression” looks exactly like a lens Traversal, if you wanted to expose a fairly well-known interface that handles both getting and setting gracefully.

i.e. you would write something like:

jspTraversal :: WellFormedQuery -> Traversal' Value Value

(You can define it in terms of the definition of Traversal so you don’t need the lens dependency, also)
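A small sketch of what that could look like with only base. Json is a stand-in for aeson’s Value, and key, elements, toListOf, over and set are hand-rolled here for illustration (lens and similar libraries provide their own versions); a real jspTraversal would build such a traversal from the parsed query.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Stand-in for aeson's Value so the sketch needs only base.
data Json
  = JNull
  | JNum Double
  | JStr String
  | JArr [Json]
  | JObj [(String, Json)]
  deriving (Show, Eq)

-- The van Laarhoven definition; no lens dependency required.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

-- Focus the value stored under one object key (nothing if absent).
key :: String -> Traversal' Json Json
key k f (JObj kvs) = JObj <$> traverse step kvs
  where
    step (k', v)
      | k' == k   = (,) k' <$> f v
      | otherwise = pure (k', v)
key _ _ other = pure other

-- Focus every element of an array (like the [*] selector on arrays).
elements :: Traversal' Json Json
elements f (JArr xs) = JArr <$> traverse f xs
elements _ other     = pure other

-- Getting: collect all targets.
toListOf :: Traversal' s a -> s -> [a]
toListOf t = getConst . t (\a -> Const [a])

-- Setting: modify or replace all targets.
over :: Traversal' s a -> (a -> a) -> s -> s
over t g = runIdentity . t (Identity . g)

set :: Traversal' s a -> a -> s -> s
set t a = over t (const a)

-- Traversals compose with plain function composition: $.store.books[*]
allBooks :: Traversal' Json Json
allBooks = key "store" . key "books" . elements
```

toListOf allBooks doc gives the matched nodes, and set allBooks JNull doc replaces every one of them, so the same definition serves as both the query and the setter.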

4 Likes

Is there a way to use this with YAML while retaining comments?

1 Like

Hi, I might add something to work with YAML as well. Please consider opening an issue here. This way it would be easier to keep this discussion in a single thread. Thank you.

1 Like

In case someone is interested, I have made a new release. Check it out and read the changelog here: changelog.

1 Like

Just to clear this up, according to the RFC the only ASCII special character allowed in member-name-shorthand is _ (underscore). If you really need first-name, then the query should be something like $['first-name'].
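If you are generating queries from key names, a check along these lines picks the right notation. It follows the RFC 9535 grammar (name-first is ALPHA, "_" or a non-ASCII character; name-char additionally allows DIGIT); the function names are made up for this example, and the bracket form escapes only quotes and backslashes.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- name-first = ALPHA / "_" / %x80-D7FF / %xE000-10FFFF
isNameFirst :: Char -> Bool
isNameFirst c =
  isAsciiLower c
    || isAsciiUpper c
    || c == '_'
    || (c >= '\x80' && c <= '\xD7FF')
    || c >= '\xE000'

-- name-char = name-first / DIGIT
isNameChar :: Char -> Bool
isNameChar c = isNameFirst c || isDigit c

-- Can this key be written as .key?
canUseShorthand :: String -> Bool
canUseShorthand []       = False
canUseShorthand (c : cs) = isNameFirst c && all isNameChar cs

-- Render a child segment for a key, falling back to bracket notation.
-- Only quotes and backslashes are escaped in the bracket form.
memberSegment :: String -> String
memberSegment name
  | canUseShorthand name = "." ++ name
  | otherwise            = "['" ++ concatMap escape name ++ "']"
  where
    escape '\'' = "\\'"
    escape '\\' = "\\\\"
    escape c    = [c]
```

So "$" ++ memberSegment "firstName" stays in dot notation, while a key like first-name is rendered in bracket notation.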

Released v0.3.0.0. Check out the changelog. I am open to any new suggestions that you would like to see implemented in the next release. Thanks.

3 Likes