I have created my very first Haskell package. aeson-jsonpath is designed to be for Haskell what serde_json_path is for Rust. It also gives you a nice interface to run JSONPath queries (one function call that parses and runs the query). It is currently only on Cabal, but I will be releasing it on Stack and Nix soon. Please suggest any improvements. I have taken full responsibility for maintaining this package; I don’t want it to become an abandoned package like a lot of packages on Hackage. Your contributions are also welcome. Thank you.
Great to hear you’re contributing to the ecosystem! Always nice to see people trying to fill gaps.
Seeing as you’re asking for feedback, here is some advice and/or questions:
- Have you looked around for any existing JSON path libraries/functions? (jsonpath, for example?) And if so, what’s the biggest thing you find lacking? (I would like to hear your stance on it.)
- There’s pretty much no documentation in the package. If you don’t know about Haddock, do look into it; it’s a very nice system for adding documentation to source code, and it automatically generates the documentation for your package on hackage/stackage/hoogle/etc. (If you do already know about Haddock, please add documentation; you can look to other mature packages for best practices, e.g. aeson, persistent, etc.)
- I see you’re using `protolude`, which could be considered unnecessary bloat; I don’t see it used much except for `toS`, which can be replaced with `Data.Text.(un)pack`.
- Might be a bit much at this stage, but you could add a TemplateHaskell splice function so that you can guarantee the `Text` argument is correctly formed at compile time.
I wish you success and good luck on your Haskell journey
@Vlix Thanks a lot for the feedback!
> Have you looked around for any existing JSON path libraries/functions? (jsonpath, for example?)
Yes, I looked at the mentioned package. Unfortunately, it is not actively maintained (last important commit 2 years ago). It does not comply with the standard and does not make any effort towards that. Lack of maintenance is probably the biggest reason.
> If you don’t know about Haddock, do look into it, it’s a very nice system of adding documentation to source code
Yes, I will add documentation soon. Thanks.
> I see you’re using `protolude`, which could be considered unnecessary bloat
Yes, I initially added protolude because I didn’t wanna use Prelude, but now that Protolude isn’t maintained anymore (see: GitHub comment), I will remove it in my next release. It is also preventing me from uploading my package to Stackage because Protolude doesn’t support GHC 9.10.
> you can guarantee the `Text` argument is correctly formed at compile time
Hmm, I guess I don’t understand TemplateHaskell at all yet, but I will definitely look into it. Why would it not be correctly formed?
> I wish you success and good luck on your Haskell journey
Thank you and same to you.
Ah, those are good reasons, yes.
What do you think of giving users the option to form the JSONPath as an AST, instead of plain `Text`?
I’d imagine it might be easier to programmatically construct the JSONPath as an AST instead of having to have logic in place to form it into a JSONPath `Text`, that then gets parsed back into an AST anyway. (i.e. your `JSPSegments`)
Also less surface to make mistakes, since you might get compile time errors if you try to order the segments in the wrong way.
If someone were to use a literal string for the `Text` argument, you could check that string to be valid JSONPath syntax, so the user would get a compilation error, instead of having to find out in tests or at runtime that the string itself is a bad JSONPath and will always fail.
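To make the AST suggestion a bit more concrete, here is a minimal sketch of what building a query from data could look like. The types `Selector`, `Segment` and `Query` and the `render*` functions are made up for illustration; they are not the types aeson-jsonpath exports, and they cover only a small part of the RFC 9535 grammar (no slices, no filters).

```haskell
import Data.List (intercalate)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical, simplified AST (not the library's own types).
data Selector
  = Name String -- a member name, rendered as 'name'
  | Index Int   -- an array index, may be negative
  | Wildcard    -- all children
  deriving (Show, Eq)

-- NonEmpty rules out an empty bracket "[]" at compile time.
data Segment
  = Child (NonEmpty Selector)
  | Descendant (NonEmpty Selector)
  deriving (Show, Eq)

newtype Query = Query [Segment]
  deriving (Show, Eq)

renderSelector :: Selector -> String
renderSelector (Name n) = "'" ++ concatMap escape n ++ "'"
  where
    -- Only quotes and backslashes are escaped in this sketch;
    -- control characters are not handled.
    escape '\'' = "\\'"
    escape '\\' = "\\\\"
    escape c    = [c]
renderSelector (Index i) = show i
renderSelector Wildcard  = "*"

renderSegment :: Segment -> String
renderSegment (Child sels) =
  "[" ++ intercalate "," (map renderSelector (NE.toList sels)) ++ "]"
renderSegment (Descendant sels) = ".." ++ renderSegment (Child sels)

-- Every query starts at the root identifier "$".
renderQuery :: Query -> String
renderQuery (Query segs) = "$" ++ concatMap renderSegment segs

main :: IO ()
main =
  putStrLn . renderQuery $
    Query
      [ Child (Name "store" :| [])
      , Child (Name "books" :| [])
      , Child (Index (-4) :| [])
      ]
-- prints $['store']['books'][-4]
```

With something like this, a program that assembles queries never has to concatenate strings itself, and a function that runs a `Query` directly would skip the parse step entirely.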
> I’d imagine it might be easier to programmatically construct the JSONPath as an AST instead of having to have logic in place to form it into a JSONPath `Text`, that then gets parsed back into an AST anyway. (i.e. your `JSPSegments`)
Yes, it is just easier to traverse ASTs. Also, RFC 9535 gives a nice ABNF grammar, which we use to construct the AST.
> Also less surface to make mistakes, since you might get compile time errors if you try to order the segments in the wrong way.
We don’t need to raise any errors. If the query parses successfully, it runs. If nothing is found there, we return an empty list, which is the correct behavior according to the standard.
> If someone were to use a literal string for the `Text` argument, you could check that string to be valid JSONPath syntax, so the user would get a compilation error, instead of having to find out in tests or at runtime that the string itself is a bad JSONPath and will always fail.
Thanks for this information. This may be very useful and I might add this. Thanks again.
I second that TemplateHaskell might be nice. If a well-formed query cannot fail, then that can be expressed in the API.
runJSPQuery :: WellFormedQuery -> Value -> Value
With an auxiliary function:
parseQuery :: Text -> Either ParseError WellFormedQuery
Combined with TemplateHaskell it could look like this.
runJSPQuery [jsonpath| $.store.books[-4] |] jsonDoc
`[jsonpath| $.store.books[-4] |]` turns into a `WellFormedQuery` or fails at compile time, so there is no need to return `Either` from `runJSPQuery`.
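For anyone who has not written a quasi-quoter before, this is roughly the shape such a thing takes. It is only a sketch: `parseQuery` here is a placeholder that merely checks for the leading `$` (the real parser would go in its place), it works on `String` rather than `Text` to stay within `base` and `template-haskell`, and the module name `JSONPathQQ` is invented.

```haskell
{-# LANGUAGE DeriveLift #-}
-- Hypothetical module; a quasi-quoter has to live in a different
-- module from the one that uses it.
module JSONPathQQ (WellFormedQuery, queryText, parseQuery, jsonpath) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Language.Haskell.TH.Quote (QuasiQuoter (..))
import Language.Haskell.TH.Syntax (Lift (..))

-- The constructor is not exported, so outside this module a value
-- can only come from parseQuery or from the quasi-quoter.
newtype WellFormedQuery = WellFormedQuery String
  deriving (Show, Eq, Lift)

queryText :: WellFormedQuery -> String
queryText (WellFormedQuery s) = s

-- Placeholder validation (an assumption for this sketch): trims
-- whitespace and only checks that the query starts with '$'.
parseQuery :: String -> Either String WellFormedQuery
parseQuery raw
  | take 1 q == "$" = Right (WellFormedQuery q)
  | otherwise       = Left ("JSONPath query must start with '$': " ++ show q)
  where
    q = dropWhileEnd isSpace (dropWhile isSpace raw)

-- Runs parseQuery at compile time; a Left becomes a compile error,
-- a Right is lifted into an expression of type WellFormedQuery.
jsonpath :: QuasiQuoter
jsonpath = QuasiQuoter
  { quoteExp  = \s -> either fail lift (parseQuery s)
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    unsupported ctx _ =
      fail ("jsonpath quasi-quoter cannot be used in a " ++ ctx ++ " context")
```

A module that enables `QuasiQuotes` and imports this one could then write `[jsonpath| $.store.books[-4] |]` wherever a `WellFormedQuery` is expected, while queries that are only known at runtime still go through `parseQuery`.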
Bonus points: you can return nice compile-time errors from TemplateHaskell: example. As I’m guilty of shamelessly stealing `PyF`’s implementation of those, to atone for my sins, feel free to ask or tag me on an issue; I might be able to help.
Now, for some bikeshedding: I am not a fan of the `runJSPQuery` name. With `query` as the name and the module imported qualified, it could become `JSONPath.query`. IMO much nicer, dunno how others feel about it.
@jeukshi Thanks a lot for this.
> `[jsonpath| $.store.books[-4] |]` turns into `WellFormedQuery` or fails at compile time, no need to return `Either` from `runJSPQuery`
This makes the API so much simpler. I will definitely add something like this. Thank you so much.
> With `query` as the name and module imported qualified, it could become `JSONPath.query`. IMO much nicer,
This actually looks great assuming everyone uses qualified imports. In fact, having a simpler name like `query` might even force people to use qualified imports, because otherwise they might run into variable name shadowing warnings. Genius!
I don’t think the `Either` is avoidable. Even with a well-formed query into an object, you don’t know that the shape of the object will correspond to the query.
Here’s the RFC.
Given a well formed query, the output should be a nodelist, implemented however you like.
So the result can be implemented as `[Value]`, or `Array (Vector Value) :: Value` (what aeson-jsonpath does), or `Array :: Vector Value`, or `Vector Value`.
If the shape doesn’t match you get an empty nodelist.
Perhaps `Value` is not the best return type for `runJSPQuery`, since it may look like you can get a `Value` that is not an `Array`.
You can use `Either`, too. An `Either` result could look like this: `Either () (NonEmpty Value)`.
Another discussion is how lazy you want to make this. If I want the first node of a jsonpath query that can return 100 nodes, is it better to have `[Value]` or `Array !Vector`? Should I be concat-ing all those Vectors as I traverse the `Value`?
But then again if you just want 1 node then you can just reflect that in the selector. You can make two queries: one for 1 node, and one for all nodes.
> Perhaps Value is not the best return type for runJSPQuery since it may look like you can get a Value that is not an Array.
I guess then we can just return `Array`. It is implementation dependent, but popular implementations like Rust’s serde_json_path always return an array. We can see what the users demand later and change our implementation accordingly if need be.
> If I want the first node of a jsonpath query that can return 100 nodes, is it better to have `[Value]` or `Array !Vector`? Should I be concat-ing all those Vectors as I traverse the Value?
100 nodes is peanuts for modern computers, but maybe a more interesting question from a data structures PoV is how many nodes on average can a JSONPath query return, over all possible documents. I suspect not that many, so even copying into a strict Vector wouldn’t be that big of a deal.
That’s how e.g. beautifulsoup and scalpel (a high-level web scraping library for Haskell) work: first match vs. all matches.
Currently, the function doesn’t return `Array`, but a JSON `Value`, and I agree with @darkxero that this might not be the best type. It won’t be long till users complain. How does one consume such an API?
res <- runJSPQuery "$" jsonDoc
case res of
  Array arr  -> ... -- cool, I get to handle `arr` which is `Vector Value`
  Object obj -> ... -- not cool, what should I do here?
  _          -> ... -- same as above
So why not give me `Vector Value` in the first place?
The Rust implementation also gives me a `query_located` function, where I get each `Value` together with its `NormalizedPath`. That might be useful, but can’t be expressed with `Value`. I’d expect something like `Vector (NormalizedPath, Value)`.
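To illustrate what a "located" result could carry, here is a small sketch that pairs every node of a document with its normalized path. Everything in it is a stand-in: `Json` is a toy replacement for aeson's `Value`, lists replace `Vector`, and `PathStep`, `located` and `renderPath` are invented names. Escaping of special characters inside member names is left out.

```haskell
-- Toy JSON type standing in for aeson's Value (assumption of this sketch).
data Json
  = JNull
  | JBool Bool
  | JNumber Double
  | JString String
  | JArray [Json]
  | JObject [(String, Json)]
  deriving (Show, Eq)

-- A normalized path is a list of member names and array indices.
data PathStep = Key String | Idx Int
  deriving (Show, Eq)

type NormalizedPath = [PathStep]

-- Direct children of a node, each with the step that reaches it.
children :: Json -> [(PathStep, Json)]
children (JArray xs)   = zip (map Idx [0 ..]) xs
children (JObject kvs) = [(Key k, v) | (k, v) <- kvs]
children _             = []

-- The root and all of its descendants, each with its location.
located :: Json -> [(NormalizedPath, Json)]
located = go []
  where
    -- The path is accumulated in reverse and flipped once per node.
    go path v = (reverse path, v) : concatMap (step path) (children v)
    step path (s, v) = go (s : path) v

-- Render in the bracketed style used for normalized paths.
renderPath :: NormalizedPath -> String
renderPath steps = "$" ++ concatMap render steps
  where
    render (Key k) = "['" ++ k ++ "']"
    render (Idx i) = "[" ++ show i ++ "]"

main :: IO ()
main = mapM_ (putStrLn . renderPath . fst) (located doc)
  where
    doc = JObject [("a", JArray [JNumber 1, JNumber 2])]
-- prints:
-- $
-- $['a']
-- $['a'][0]
-- $['a'][1]
```

A real `query_located` equivalent would of course return only the nodes selected by the query, not the whole tree; the point here is just the shape of the result.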
@jeukshi Thanks for pointing this out. I guess we can learn many useful things from the Rust implementation.
I’d use this library for ghcup to get and set config values, which at the moment is a bit awkwardly implemented.
However, it seems:
- retrieving fields that have hyphens doesn’t work very well (appears to need additional quotes):
  ghci> runJSPQuery "$.first-name" (fromJust $ decode @Value "{ \"first-name\": \"Chris\" }")
  Right (Array [])
  ghci> runJSPQuery "$.firstName" (fromJust $ decode @Value "{ \"firstName\": \"Chris\" }")
  Right (String "Chris")
- I’d need a setter too, not just a query (working over `Value` is enough, it does not need to be strongly typed)… is this out of scope? The internals are not exposed.
- the dependency footprint needs to be small
> I’d need a setter too, not just a query
The RFC doesn’t specify setting, but it seems easy enough. The pattern of “retrieve all values targeted by the expression” or “set all positions targeted by the expression” looks exactly like a lens `Traversal`, if you wanted to expose a fairly well-known interface that handles both getting and setting gracefully.
i.e. you would write something like:
jspTraversal :: WellFormedQuery -> Traversal' Value Value
(You can also define it in terms of the definition of `Traversal`, so you don’t need the `lens` dependency.)
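As a rough illustration of the "no `lens` dependency" point, the sketch below spells out the `Traversal'` type synonym by hand and builds a getter and a setter from it using only `base`. The `Json` type is a toy stand-in for aeson's `Value`, and `key`, `elements`, `toListOf`, `over` and `set` are written locally for the example (they mirror well-known lens names, but nothing here comes from the lens package or from aeson-jsonpath).

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Same shape as lens's Traversal', written out by hand.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

-- Toy JSON type standing in for aeson's Value (assumption of this sketch).
data Json
  = JNull
  | JString String
  | JArray [Json]
  | JObject [(String, Json)]
  deriving (Show, Eq)

-- Targets the value(s) stored under a member name; anything that is
-- not an object, or lacks the name, is left untouched.
key :: String -> Traversal' Json Json
key k f (JObject kvs) = JObject <$> traverse visit kvs
  where
    visit (k', v)
      | k' == k   = (,) k' <$> f v
      | otherwise = pure (k', v)
key _ _ other = pure other

-- Targets every element of an array, like a wildcard on arrays.
elements :: Traversal' Json Json
elements f (JArray xs) = JArray <$> traverse f xs
elements _ other       = pure other

-- Getter: collect all targets (an empty list if nothing matches).
toListOf :: Traversal' s a -> s -> [a]
toListOf t = getConst . t (\a -> Const [a])

-- Setter: modify or replace all targets in place.
over :: Traversal' s a -> (a -> a) -> s -> s
over t g = runIdentity . t (Identity . g)

set :: Traversal' s a -> a -> s -> s
set t a = over t (const a)

main :: IO ()
main = do
  let doc = JObject
        [ ("users", JArray
            [ JObject [("name", JString "Ann")]
            , JObject [("name", JString "Bob")]
            ])
        ]
      -- Roughly what $.users[*].name selects.
      names :: Traversal' Json Json
      names = key "users" . elements . key "name"
  print (toListOf names doc)
  -- prints [JString "Ann",JString "Bob"]
  print (set names (JString "redacted") doc)
```

Traversals compose with plain `(.)`, so a function that turns a parsed query into one traversal per segment and composes them would give both the query and the setter from a single definition.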
Is there a way to use this with YAML while retaining comments?
Hi, I might add something to work with YAML as well. Please consider opening an issue here. This way it would be easier to keep this discussion in a single thread. Thank you.
In case someone is interested, I have made a new release. Check it out and read the changelog here: changelog.
Just to clear this up: according to the RFC, the only ASCII special character allowed in `member-name-shorthand` is `_` (underscore). If you really need `first-name`, then the query should be something like `$['first-name']`.
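For reference, the shorthand rule from the RFC 9535 grammar (`name-first` is a letter, `_` or a non-ASCII character; later characters may also be digits) can be checked with a few lines of code. The function names below are made up for this sketch and are not part of aeson-jsonpath; escaping covers only quotes and backslashes.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)

-- name-first = ALPHA / "_" / %x80-D7FF / %xE000-10FFFF
isNameFirst :: Char -> Bool
isNameFirst c = isAsciiLower c || isAsciiUpper c || c == '_' || nonAscii
  where
    n = ord c
    nonAscii = (n >= 0x80 && n <= 0xD7FF) || (n >= 0xE000 && n <= 0x10FFFF)

-- name-char = name-first / DIGIT
isNameChar :: Char -> Bool
isNameChar c = isNameFirst c || isDigit c

-- member-name-shorthand = name-first *name-char
isShorthand :: String -> Bool
isShorthand []       = False
isShorthand (c : cs) = isNameFirst c && all isNameChar cs

-- Pick ".name" when the shorthand is allowed, "['name']" otherwise.
memberSegment :: String -> String
memberSegment name
  | isShorthand name = "." ++ name
  | otherwise        = "['" ++ concatMap escape name ++ "']"
  where
    escape '\'' = "\\'"
    escape '\\' = "\\\\"
    escape c    = [c]

main :: IO ()
main = do
  putStrLn ("$" ++ memberSegment "firstName")  -- prints $.firstName
  putStrLn ("$" ++ memberSegment "first-name") -- prints $['first-name']
```

A helper like this is handy when queries are generated from field names that are not known in advance, such as config keys.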
Released `v0.3.0.0`. Check out the changelog. I am open to any new suggestions that you might like implemented in the next release. Thanks.