Good and minimal XML parsing library?

There are a bunch of XML libraries out there, but all of them look a bit clunky and I can’t decide which one I like.

What is the simplest and least dependency-heavy one that makes it easy to parse data from XML and to generate XML? I also need namespace support.

So far hxt seems like the best choice, although I do not like the network dependency in it.

xeno might suit your taste.

xeno does not seem to support namespaces, does it?

I think xml-conduit is the best choice, if you require namespace support. It’s somewhat heavy on dependencies, bringing in conduit and network and what not, but fairly solid.
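
For reference, here is a rough sketch of what namespace-aware parsing and generation look like with xml-conduit’s DOM API (Text.XML plus Text.XML.Cursor). It assumes a recent xml-conduit together with the text and containers packages; the sample document, the urn:example:books namespace and the helper names (titles, mkCatalog) are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: namespace-aware parsing and rendering with xml-conduit's DOM API.
-- Assumes the xml-conduit, text and containers packages; the sample document,
-- the "urn:example:books" namespace and all helper names are hypothetical.
module Main (main) where

import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO
import Text.XML
import Text.XML.Cursor

-- A small document whose elements live in a default namespace.
sampleXml :: TL.Text
sampleXml = TL.unlines
  [ "<catalog xmlns=\"urn:example:books\">"
  , "  <book id=\"1\"><title>First</title></book>"
  , "  <book id=\"2\"><title>Second</title></book>"
  , "</catalog>"
  ]

-- Clark notation ("{namespace-uri}local") yields a Name carrying a namespace URI;
-- Name equality ignores whichever prefix a document happened to use.
catalogName, bookName, titleName :: Name
catalogName = "{urn:example:books}catalog"
bookName    = "{urn:example:books}book"
titleName   = "{urn:example:books}title"

-- Text of every <title> that is a direct child of a <book>.
titles :: Document -> [T.Text]
titles doc = fromDocument doc $// element bookName &/ element titleName &// content

-- Build a catalog document (same namespace) from a list of titles.
mkCatalog :: [T.Text] -> Document
mkCatalog ts = Document (Prologue [] Nothing []) root []
  where
    root = Element catalogName Map.empty (zipWith book [1 :: Int ..] ts)
    book i t =
      NodeElement $
        Element bookName (Map.singleton "id" (T.pack (show i)))
          [NodeElement (Element titleName Map.empty [NodeContent t])]

main :: IO ()
main = do
  case parseText def sampleXml of
    Left err  -> putStrLn ("parse error: " ++ show err)
    Right doc -> print (titles doc)
  -- The renderer decides how to declare the namespace in the output.
  TLIO.putStrLn (renderText def (mkCatalog ["Alpha", "Beta"]))
```

Writing names in Clark notation keeps the matching independent of the prefixes a particular document uses, which is most of what “namespace support” buys you in practice.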

I also got along well with xml-conduit so far. It’s really solid, has namespace support, and there is a DOM-based as well as a streaming API, in case you have to work with really big XML files. There is a nice tutorial; it covers only the DOM API though, so I made a blog post about the streaming API a while back.
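
A minimal sketch of the streaming side, assuming a recent xml-conduit and conduit; the log/entry document shape, the urn:example:log namespace and the helper name entryTexts are hypothetical. The input is a lazy ByteString here for brevity; for files larger than memory you would feed the same parser from parseFile in Text.XML.Stream.Parse instead (which runs in ResourceT).

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: parsing with xml-conduit's streaming API (Text.XML.Stream.Parse).
-- Assumes the xml-conduit, conduit, bytestring and text packages; the <log>/<entry>
-- document shape and the "urn:example:log" namespace are hypothetical.
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Conduit (runConduit, (.|))
import Data.Text (Text)
import Text.XML.Stream.Parse

sampleLog :: BL.ByteString
sampleLog = BL.unlines
  [ "<log xmlns=\"urn:example:log\">"
  , "  <entry>started</entry>"
  , "  <entry>retrying</entry>"
  , "  <entry>done</entry>"
  , "</log>"
  ]

-- Consume parser events one at a time and collect the text of each <entry>.
-- String literals act as NameMatchers; Clark notation matches on the namespace URI.
entryTexts :: BL.ByteString -> IO [Text]
entryTexts input =
  runConduit $
    parseLBS def input
      .| force "expected a <log> root element"
           (tagNoAttr "{urn:example:log}log"
              (many (tagNoAttr "{urn:example:log}entry" content)))

main :: IO ()
main = entryTexts sampleLog >>= print
```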

Hm... yeah. I do like the tutorial they have, but it’s still a bit beefy. I need to write some utilities that ought to work on different servers without containerization, so I’m looking for something that’s small and has almost no dependencies on the “system” layer, like networking. I guess it won’t be a problem if I simply don’t use networking, but I don’t know whether any of the dependencies use some standard libraries and the like that may make static linking necessary.

I’ve used xeno for cases where the full XML fits in memory, and it worked well for that. There is minimal support for namespaces (as in, tags can have colons). It has very few dependencies.
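
To illustrate what that minimal namespace support means in practice, a small sketch using xeno’s DOM module. It assumes the xeno and bytestring packages; the sample document and the helper name titles are hypothetical. Since prefixes are not resolved to URIs, matching has to use the literal prefixed tag name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: in-memory parsing with xeno's DOM module (Xeno.DOM).
-- Assumes the xeno and bytestring packages; the sample document and the
-- helper name `titles` are hypothetical.
module Main (main) where

import qualified Data.ByteString.Char8 as BS
import Xeno.DOM (Content (..), Node, children, contents, name, parse)

sampleXml :: BS.ByteString
sampleXml = BS.concat
  [ "<x:catalog xmlns:x=\"urn:example:books\">"
  , "<x:title>First</x:title>"
  , "<x:title>Second</x:title>"
  , "</x:catalog>"
  ]

-- xeno does not resolve namespaces: the prefix is just part of the tag name,
-- so this only matches documents that use the "x" prefix.
titles :: Node -> [BS.ByteString]
titles root =
  [ t | n <- children root, name n == "x:title", Text t <- contents n ]

main :: IO ()
main = case parse sampleXml of
  Left err   -> putStrLn ("parse error: " ++ show err)
  Right root -> mapM_ BS.putStrLn (titles root)
```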

For large files I’ve used xml-conduit (where you can use conduit to stream from a zip file that would be larger than memory if expanded) and generally been happy with it. It’s slower than xeno, but supports ~constant-memory streaming, which I at least didn’t manage to do with xeno.

conduit and network themselves at least don’t have many (any?) exotic dependencies. Mostly exceptionally standard stuff.

idk about network, but conduit’s API has been rock-solid and stable for a while now. It feels like it kinda Is What It Is at this point.

I failed to build conduit stuff because I don’t have zlib or something on my machine, and apparently a compression algorithm is very important in stream processing xD.

So yeah… I guess I’ll make do with hxt after all, as it appears to be the least dependent on stuff outside the Haskell ecosystem.

That’s somewhat unexpected. Could you copy the error message you got?

It’s not really unexpected, as I do not have zlib installed on my machine, and one of the xml-conduit dependencies requires zlib, which uses a C library in the background.

Failed to build zlib-0.7.1.0. The failure occurred during the configure step.
Build log (
/home/mastarija/.cabal/logs/ghc-9.6.6/zlib-0.7.1.0-d1004059a4860be25320d478d7a503888aeacaa27fb9daba75518a4d88032baa.log
):
Configuring library for zlib-0.7.1.0...
Error: [Cabal-4345]
Missing dependency on a foreign library:
* Missing (or bad) header file: zlib.h
* Missing (or bad) C library: z
If the header file does exist, it may contain errors that are caught by the C compiler at the preprocessing stage. In this case you can re-run configure with the verbosity flag -v3 to see the error messages.

Error: [Cabal-7125]
Failed to build zlib-0.7.1.0 (which is required by izdajnik-0.1.0.0). See the build log above for details.

EDIT: But don’t worry, I’ll manage with hxt. As I’ve said, I need something without “external” dependencies, for lack of a better word.

zlib is supposed to use a bundled C library if a system one cannot be found. You might want to upgrade your Cabal installation; the dependency solver in older Cabal releases was not flexible enough.

I’ve had this error message very often on NixOS. I always just do nix-shell -p pkg-config -p zlib.

Edit: I don’t mean that this status quo is good, but just to add that this is something I often encounter and what workaround I use to “fix” it.

Yes. I’m on NixOS too. But the point stands I’d say. I am looking for something more “native” so it’s easily portable, and so that I do not get unexpected issues with it. I think I like hxt now that I’ve played with it for a bit, so I’ll stick to that for now and explore other suggestions if I hit a wall somewhere down the line.
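
For comparison, a rough sketch of namespace-aware selection with hxt. It assumes the hxt package and a made-up sample document; the urn:example:books namespace and the helper name titles are hypothetical. readString, deep, hasNamespaceUri and hasLocalPart are taken from Text.XML.HXT.Core.

```haskell
-- Sketch only: namespace-aware selection with hxt's arrow API.
-- Assumes the hxt package; the sample document, the "urn:example:books"
-- namespace and the helper name `titles` are hypothetical.
module Main (main) where

import Text.XML.HXT.Core

sampleXml :: String
sampleXml = concat
  [ "<b:catalog xmlns:b=\"urn:example:books\">"
  , "<b:book><b:title>First</b:title></b:book>"
  , "<b:book><b:title>Second</b:title></b:book>"
  , "</b:catalog>"
  ]

-- withCheckNamespaces turns on namespace propagation, so elements can be
-- selected by namespace URI and local part instead of by prefix.
titles :: IO [String]
titles =
  runX $
    readString [withValidate no, withCheckNamespaces yes] sampleXml
      >>> deep (isElem >>> hasNamespaceUri "urn:example:books" >>> hasLocalPart "title")
      >>> getChildren
      >>> getText

main :: IO ()
main = titles >>= mapM_ putStrLn
```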

Thanks everyone for your advice.

If you are on NixOS, it often struggles to find native libraries unless you’re using a Nix-provided GHC that’s been told to use them. You can try: nix-shell -p 'haskellPackages.ghcWithPackages(p: [p.xml-conduit])'.

I do use a Nix-provided GHC (9.6.6) and cabal (3.12.1.0), but installed globally, not within a shell. I’m using the simple cabal workflow for this project, as I can’t use Nix in this particular case. Anyway, I can get that building without issues if I want to; I just don’t want to if it requires me to use “foreign” libraries 🙂

It’s a weird edge case because Nix-provided GHC on a mainstream OS can often find development libraries from the system package manager, but Nix-provided GHC on NixOS needs to know where to find those libraries. And the easiest way to do that is to build some version of a package that pulls in the library that it wants to FFI into.

For the most part, I think you’d be best served by not eschewing “foreign” libraries for its own sake.

C FFI is so well-supported by Haskell that the C ecosystem is a huge asset we have access to.

It’s best to learn how to manage that sort of thing well.

I consider knowledge of how to use and manage C dependencies a core competency of any Haskell programmer. Especially professional ones.

I agree that the Haskell C FFI is great and very useful. However, I think you may be overstating the case. I have somehow managed to program Haskell professionally for over 11 years, and I can count on one hand the number of times I have had to even think about interacting with C libraries. And there has been exactly one time I needed to actually create a C foreign function (it was super easy when I did, though!).

I don’t want to minimize the usefulness or ease of the C FFI, but I certainly wouldn’t characterize using it or managing C dependencies as a “core” part of my career.

You’re probably missing the zlib-devel package (the C library). If you are on Windows, good luck; on any Linux distribution it’s a single package-manager command.

So, I think people are missing my point / requirements. This app I’m making is going to be used in various environments that do not have the luxury of modern build systems, CI, etc.

It is going to be built by different people with various skill levels, “deployed” through FTP, shared over email etc.

I’m fully capable of installing a dependency on my own. I’m also fully capable of using the Haskell FFI and managing the libraries. Me saying it requires zlib is not me saying “I don’t know what to do, please help me.” but me saying “Hey, the conduit stuff doesn’t fit my requirements because it depends on a foreign library”.

The problem is the very specific requirements for this one thing I need to make; for that I DO WANT TO avoid any foreign libraries and depend only on the very bare-bones Haskell setup (ghc + cabal).

Anything else will certainly lead to complications down the line which is something I want to avoid.