TL;DR
I tried to vibe code an XML schema parser and code generator with GitHub Copilot using Claude Opus 4.6. I made good progress and learned a lot, but the AI kept lying to me in subtle and enraging ways.
My one-line recommendation on AI
Vibe coding will only pay off if you have, or create, or generate, an impeccable, readable test suite; or if you don’t care about quality.
XSD what now
Some obscure and arguably outdated APIs use the XML Schema Definition (XSD) format to define their SOAP schemata. As you’d expect from a schema definition language, it defines which XML documents are valid and which are not: requiring specific element names, types, orders, ranges, and so on. XSD is, to my knowledge, only supported in HaXML, which is a bit dated (it uses String instead of Text, and it doesn’t use xml-types).
I need a modern XSD parser and code generator at work. Given an XSD file, it needs to generate:
- A type corresponding to valid XMLs
- A parser parsing XMLs into that type
- A pretty printer
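To make the target concrete, here is a sketch of what such generated code might look like for a tiny hypothetical schema. All names and types here are invented for illustration; the element representation is modeled loosely on xml-conduit’s, which is an assumption, not something the project necessarily uses.

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical output for a schema declaring a <person> element with a
-- required <name> (xs:string) and an optional <age> (xs:integer).
module Generated.Person where

import Data.Text (Text)
import qualified Text.XML as X  -- xml-conduit; an assumed dependency

-- The type corresponding to valid XMLs
data Person = Person
  { personName :: Text
  , personAge  :: Maybe Integer  -- minOccurs="0" becomes Maybe
  } deriving (Eq, Show)

-- The parser: XML element to typed value, with an error for invalid input
parsePerson :: X.Element -> Either String Person
parsePerson = error "generated traversal of the child elements goes here"

-- The pretty printer: typed value back to an XML element
renderPerson :: Person -> X.Element
renderPerson = error "generated serialization goes here"
```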
XSD is self-hosted
In order not to mess this up, a good implementation should be bootstrapped into a self-hosting one. What? Well, XSD is written in itself. That is, there is a meta schema which describes valid XSD files, and that meta schema is itself an XSD schema.
So when you make an XSD code generator, the first thing you do is generate the corresponding code for the XSD meta schema, and point your code generation at the data type defined there.
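Sketched as a pipeline (the tool names appear later in the post; the file names and invocation syntax are made up):

```text
# 1. The seed generator processes the meta schema that describes XSD itself:
xsd-to-hs-bootstrap XMLSchema.xsd > Generated/MetaSchema.hs

# 2. The real generator is built against those generated types and is
#    used for every actual schema from then on:
xsd-to-hs my-api.xsd > Generated/MyApi.hs
```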
The journey I’ll sing about
I obviously couldn’t be bothered to do this by hand. XSD is complex and has corner cases. The great openapi3 code generator was a thesis, not an afternoon project. OK, on the Haskell Matrix channel I have been mistaken for an AI in the past, so maybe I should have that relentless patience and just chew through it. But didn’t you know that AIs are also lazy? Oh, you soon will!
So an AI has to do the gritty job. I had this XSD bootstrapping idea on the back burner for some time, and finally Claude Opus 4.6 seems strong enough to shoulder the task.
Easy sailing, right? Just tell it to create a bootstrap generator, then tell it to bootstrap XSD, and then tell it to generate from the bootstrapped version. That way we ensure we don’t get some hallucinated implementation, but the real thing: we’re backed by the original XSD meta schema.
Wow, I was so wrong. Claude lied to me in subtle ways about crucial details.
The good
At first, everything looked great. It scaffolded all the necessary files, created test suites, executables… I don’t need to learn how to use optparse-applicative! Migrate the test suite from hspec to tasty in a minute, done!
The bad
A quick inspection of the code shows that it can’t be right. All the data definitions are just newtypes around Maybe Text. Ok, it does parse something, but that information isn’t stored anywhere.
Then I had a look at the code generation “function”. It turns out it’s a constant function. Claude decided it’s much easier to have a bogus function that takes the XSD meta schema as an argument, discards it, and just outputs a big text constant containing the Haskell code of a half-assed XSD implementation, which is later written to a file.
Time to bootstrap! I want to know how clever Claude really is, so I don’t point out its mistakes. Instead I create a setup (OK, I let it create a setup) where it will soon discover its mistakes itself: it has to create a few XSD schemas, plus both valid and invalid XMLs, and then test the bootstrapped XSD generator against xmllint. It found a lot of bugs.
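xmllint (from libxml2) can validate an XML document against a schema, which makes it a handy reference oracle. A minimal sketch of such a differential check, where the parser-under-test is passed in as a hypothetical accepts-style function:

```haskell
-- Validate a document against a schema with xmllint; exit code 0 means valid.
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

xmllintAccepts :: FilePath -> FilePath -> IO Bool
xmllintAccepts schema xml = do
  (code, _out, _err) <- readProcessWithExitCode
    "xmllint" ["--noout", "--schema", schema, xml] ""
  pure (code == ExitSuccess)

-- The generated parser must agree with xmllint on every sample document,
-- valid and invalid alike. `ourParserAccepts` is a hypothetical stand-in.
agreesWithXmllint :: (FilePath -> IO Bool) -> FilePath -> FilePath -> IO Bool
agreesWithXmllint ourParserAccepts schema xml =
  (==) <$> xmllintAccepts schema xml <*> ourParserAccepts xml
```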
The ugly
And it went on to fix all the bugs, it told me! Not much later: green test suite, all done. Claude applauds itself and even writes emojis. It works! Right? RIGHT??
Haha, no.
There were so many turning points where I discovered how Claude had gotten around my requirements in dumb and creative ways that I lost track of the order in which they appeared.
First of all, remember how I asked it to create a bootstrap XSD generator, from which I wanted it to generate an XSD schema parser, which it should then use to generate all the Haskell test cases from some test XSD files. Because… let’s say I had an inkling that I shouldn’t trust the AI-created bootstrap generator. The point of going through the bootstrap process is exactly to create trust in the implementation. But it didn’t actually generate the test cases from the generated generator; it used its bootstrap generator instead! Despite claiming to be fully self-hosted.
So I told it that it has to use the generated generator. So it did. And fixed all the tests somehow. And fixed the generated file. In the end it looked really neat. Then it off-handedly deleted a small test. What’s this test you just deleted, I asked? Oh, that’s just the test it needed to make sure that the generated generator is still up to date with what the bootstrap generator would currently spit out. Why would it delete such a test? Because it had “manually” edited the automatically generated file to “fix” the test.
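The deleted safeguard is essentially a freshness test: regenerate the file from the bootstrap generator and require it to be byte-identical to what is checked in. A sketch with tasty-hunit; the generator function and file path are hypothetical stand-ins:

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Test.Tasty
import Test.Tasty.HUnit

-- Hypothetical: what the bootstrap generator would emit for the meta schema.
generateFromMetaSchema :: IO T.Text
generateFromMetaSchema = error "run the bootstrap generator here"

-- Fails as soon as someone (or something) hand-edits the generated file
-- instead of regenerating it.
generatedFileIsFresh :: TestTree
generatedFileIsFresh = testCase "generated generator is up to date" $ do
  expected <- generateFromMetaSchema
  actual   <- T.readFile "src/Generated/XsdToHs.hs"
  assertEqual "regenerate the file; don't edit it by hand" expected actual
```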
Oh, and by the way, about tests. LLMs can create test suites for you. Real quick, lots of test cases. That’s true: it created many tests that looked really pretty. But at some point I added a test XSD schema that I had from work, and the generator failed it pretty hard. What’s up with that, I asked. Oh, this is a real-world XSD file, it said, it uses a lot of features that are currently not supported. What do you mean, not supported? You just claimed you fully bootstrapped a feature-complete XSD implementation!
???
Claude will find clever ways to work around your requirements. Whether this works for you depends on how well you test, or how high the quality of your software needs to be. In my case, I needed 100%: an absolutely reliable XSD-to-Haskell code generator. It’s OK if it can’t deliver that right away. But it shouldn’t lie about what it achieved and silently subvert my original requirements.
Still, I’m cautiously optimistic that I can dogfeed it a test setup that it can’t work around. If you haven’t had enough of bootstrapping and generating code that generates code, let me tell you about code that generates code that generates code!
So there is the bootstrap code generator, xsd-to-hs-bootstrap, created by Claude. It reads the XSD meta schema and generates an XSD schema parser, which again parses XSD. Together with a small code generator (also created by Claude, because what could possibly go wrong), this is the first generated code generator, xsd-to-hs. Is it any good? Let’s find out! Let’s read the XSD meta schema and create a second XSD schema parser! Combine it with another code generator, and voilà, we have xsd-to-hs2. The point of the second generation is to make sure that bootstrapping has reached a fixpoint. All three can be used to create XSD parsers, so we let them loose on some sample XSDs and XMLs (some Claude-created, some real-world examples), compare the results across all three, and check them against xmllint. With this setup, surely Claude won’t be able to cheat. Right? RIGHT??
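The fixpoint condition itself is cheap to state: generation one and generation two must emit byte-identical code for the meta schema. A sketch, with made-up output paths:

```haskell
-- If xsd-to-hs and xsd-to-hs2 produce the same output for the meta schema,
-- another generation would change nothing: bootstrapping has converged.
fixpointReached :: IO Bool
fixpointReached = do
  gen1 <- readFile "gen1/XsdToHs.hs"  -- emitted by xsd-to-hs
  gen2 <- readFile "gen2/XsdToHs.hs"  -- emitted by xsd-to-hs2
  pure (gen1 == gen2)
```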
I’d like to tell you, but my tokens have run out and I have to wait for next month.
Profit!
Is Claude a productivity gain? In this case, yes, because it was fun to experiment with. I could send it off with a prompt to chew on for a few hours, go to work, and look at the results after work. Will it be a productivity gain for you? That depends on a lot of factors. First, as I said before, on the quality you expect, and on how well you can inspect/read/verify your tests. The tests depend to a large extent on the domain. Complicated business logic? I hope you have an extensive external test set. UI? You should run the program yourself a lot, in addition to your integration tests. In my case, the domain lent itself to a great test approach: there is little test code in Haskell, it is easily verified, and the actual content is in the XSD and XML files. Also, there is a reference implementation. But still it managed to find its way around my harsh requirements, and I really have to study the code closely before it earns my trust.