Good and minimal XML parsing library?

There are a bunch of XML libraries out there, but all of them look a bit clunky and I can’t decide on the one I like.

Which is the simplest, least dependency-heavy one that makes it easy to both parse and generate XML? I also need namespace support.

So far hxt seems like the best choice, although I do not like its network dependency.

xeno might suit your taste.

xeno does not seem to support namespaces, does it?

I think xml-conduit is the best choice, if you require namespace support. It’s somewhat heavy on dependencies, bringing in conduit and network and what not, but fairly solid.

I also got along well with xml-conduit so far. It’s really solid, has namespace support, and there is a DOM-based as well as a streaming API, in case you have to work with really big XML files. There is a nice tutorial; it covers only the DOM API though, so I made a blog post about the streaming API a while back.
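
For reference, here is a minimal sketch of what namespace-aware parsing looks like with the DOM API, assuming the xml-conduit, text, and bytestring packages are available. The sample document, the `urn:example:books` namespace, and the `sampleXml`, `titleName`, and `bookTitles` names are made up for illustration; only `parseLBS`, `def`, `fromDocument`, `element`, `content`, `$//`, and `&/` come from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.XML (Name, def, parseLBS)
import Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical sample input: one title in the books namespace, one without.
sampleXml :: BL.ByteString
sampleXml =
  "<root xmlns:b=\"urn:example:books\">\
  \<b:title>Dune</b:title>\
  \<title>not in the namespace</title>\
  \</root>"

-- Names are compared by namespace URI plus local name, not by prefix.
-- The "{uri}local" literal is expanded by the IsString instance of Name.
titleName :: Name
titleName = "{urn:example:books}title"

-- Collect the text content of every descendant element matching titleName.
bookTitles :: BL.ByteString -> Either String [T.Text]
bookTitles input =
  case parseLBS def input of
    Left err  -> Left (show err)
    Right doc -> Right (fromDocument doc $// element titleName &/ content)

main :: IO ()
main =
  case bookTitles sampleXml of
    Left err     -> putStrLn ("parse error: " ++ err)
    Right titles -> mapM_ TIO.putStrLn titles
```

The unprefixed `<title>` element is skipped because it has no namespace, so only the `b:title` text is returned. The same `{uri}local` names work when building a `Document` for rendering.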

Hmm, yeah. I do like the tutorial they have, but it’s still a bit beefy. I need to write some utilities that ought to run on different servers without containerization, so I’m looking for something that’s small and has almost no dependencies on the “system” layer, like networking. I guess it won’t be a problem if I simply do not use networking, but I don’t know whether any of the dependencies rely on standard system libraries and such, which might make static linking necessary.

I’ve used xeno for cases where the full XML fits in memory, and it worked well for that. There is minimal support for namespaces (as in, tags can have colons). It has very few dependencies.

For large files I’ve used xml-conduit (where you can use conduit to stream from a zip file that would be larger than memory if expanded) and generally been happy with it. It’s slower than xeno, but supports ~constant-memory streaming, which I at least didn’t manage to do with xeno.

conduit and network themselves at least don’t have many (any?) exotic dependencies. mostly exceptionally standard stuff.

idk about network, but conduit’s API has been rock solid stable for a while now. It feels like it kinda Is What It Is at this point.

I failed to build conduit stuff because I don’t have zlib or something on my machine, and apparently a compression algorithm is very important in stream processing xD.

So yeah… I guess I’ll go with hxt after all, as it appears to be the least dependent on stuff outside of the Haskell ecosystem.

That’s somewhat unexpected. Could you copy the error message you got?

Not really, as I do not have zlib installed on my machine, and one of the xml-conduit dependencies requires zlib, which uses a C library in the background.

Failed to build zlib-0.7.1.0. The failure occurred during the configure step.
Build log (
/home/mastarija/.cabal/logs/ghc-9.6.6/zlib-0.7.1.0-d1004059a4860be25320d478d7a503888aeacaa27fb9daba75518a4d88032baa.log
):
Configuring library for zlib-0.7.1.0...
Error: [Cabal-4345]
Missing dependency on a foreign library:
* Missing (or bad) header file: zlib.h
* Missing (or bad) C library: z
If the header file does exist, it may contain errors that are caught by the C compiler at the preprocessing stage. In this case you can re-run configure with the verbosity flag -v3 to see the error messages.

Error: [Cabal-7125]
Failed to build zlib-0.7.1.0 (which is required by izdajnik-0.1.0.0). See the build log above for details.

EDIT: But don’t worry, I’ll manage with hxt. As I’ve said, I need something without “external” dependencies, for lack of a better word.

zlib is supposed to use a bundled C library if a system one cannot be found. You might want to upgrade your Cabal installation; the dependency solver in older Cabal releases was not flexible enough.

I’ve had this error message very often on NixOS. I always just do nix-shell -p pkg-config -p zlib.

Edit: I don’t mean that this status quo is good, but just to add that this is something I often encounter and what workaround I use to “fix” it.

Yes. I’m on NixOS too. But the point stands I’d say. I am looking for something more “native” so it’s easily portable, and so that I do not get unexpected issues with it. I think I like hxt now that I’ve played with it for a bit, so I’ll stick to that for now and explore other suggestions if I hit a wall somewhere down the line.

Thanks everyone for your advice.

If you are on NixOS, builds often struggle to find native libraries unless you’re using a Nix-provided GHC that’s been told to use them. You can try: nix-shell -p 'haskellPackages.ghcWithPackages (p: [p.xml-conduit])'

I do use a Nix-provided GHC (9.6.6) and cabal (3.12.1.0), but as global packages, not within a shell. I’m using the plain cabal workflow for this project, as I can’t use Nix in this particular case. Anyway, I can get it building without issues if I want to; I just don’t want to if it requires me to use “foreign” libraries :)