Canadian-income-tax version 2024.0

Package canadian-income-tax lives on! If you live in Canada and have some taxes to file, try it out. You can build it yourself and use it locally, or just head to [http://taxell.ca].

After the first two years in the desert, I decided to approach CRA and ask for permission to upload the completed return via their secret NETFILE protocol, instead of making users print the PDFs and mail them in. Unfortunately the software wasn’t accepted for this year, because it failed to meet some undocumented technical requirements. This year I have added the missing features (T4 slips and save/load functionality, AFAIU), so there’s some hope it will be accepted for next year.

Enjoy!

Does the tax office provide a formal specification of the relationships between the FDF fields in their PDF forms?

I’m asking because Nikita Volkov and I are working on the European e-invoice standard, which is also (formally) defined as a subset of a record type (XML in that case). If we succeed, we will have quite a general toolkit for Haskell code generation from data type specifications.

In many ways, e-invoices are similar to PDF forms, because the first iteration of the standard required the XML invoice to be bundled with a PDF that displayed the same information in human-readable form. Today, only the XML may be exchanged, which in the case of PDF forms is equivalent to handing in the naked FDF.

I wish. I had to reverse-engineer it from the PDFs. The test suite checks that the fields are all there and that the calculations are idempotent and non-cyclic, but otherwise I have no way of knowing if they are correct. Furthermore, there are changes every year and some of them are subtle. One reason I applied for the NETFILE documentation is that I’m hoping the XML Schema is more stable.
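To make the two properties concrete, here is a minimal sketch of how such checks could look. It is not the package's actual test suite: `isIdempotent`, `Dependencies`, and `cyclicFields` are names invented for this example, and the dependency list is assumed to be written out by hand. The cycle check uses `stronglyConnComp` from `Data.Graph` in the containers package.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- A calculation is idempotent when running it a second time changes nothing.
isIdempotent :: Eq a => (a -> a) -> a -> Bool
isIdempotent calc form = calc (calc form) == calc form

-- Each entry names a computed field and the fields its equation reads.
-- (Hypothetical representation, for illustration only.)
type Dependencies = [(String, [String])]

-- Groups of fields whose equations depend on themselves,
-- directly or indirectly. An empty result means no cycles.
cyclicFields :: Dependencies -> [[String]]
cyclicFields deps =
  [ fields
  | CyclicSCC fields <-
      stronglyConnComp [(field, field, inputs) | (field, inputs) <- deps]
  ]
```

Neither check says anything about whether the equations match what CRA intends; they only catch calculations that fail to settle or that feed back into themselves.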

Does the e-invoice standard come with an XML schema and Schematron correctness rules? That would be so nice to work with. Though Haskell unfortunately lacks a good XML schema library, and I don’t think there’s anything for Schematron at all.

When you said “reverse-engineer”, does that mean some sort of checking is baked into the PDFs (maybe as JavaScript)?

The e-invoice is defined in the EN 16931 standard and one of the compliant implementations, the French Factur-X, has an XSD with associated Schematron rules. Even the Basic variant has about 400 Schematron rules!
Nikita has written a new parser for XPath expressions because the existing implementation does not cover the current standard. That is mostly what is needed for parsing Schematron files. Next up is to define an algebra of assertions, which will refine the original XSD model. The crucial part of the spec is the equational assertions, though, which is why I’ve been asking all sorts of subtype-related questions on this Discourse lately. The math is beautiful, hence a fun project to work on.

Yet all this is a lot of software development for a small company like mine. I’d be grateful if someone else with a use for Haskell XML models could step in and share the cost, maybe monetizing the code generation as SaaS.

I may have used the wrong term, sorry. I don’t actually have access to any code or other engineered artifact to reverse-engineer, only the fillable PDF forms and the accompanying human-readable instructions on how to complete them. There are no checks in the PDFs. I presume CRA has proprietary software to verify the forms as they arrive, but I have no evidence of that.

The architecture I settled on represents each form in three parts: the form type as an HKD record, the mapping of record fields to FDF form fields, and the equations for completing the form.
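As a rough illustration of those three parts, here is a minimal sketch. It is not taken from the package: the form, its field names, the FDF paths, and the helpers `complete` and `toPairs` are all made up for this example.

```haskell
-- Part 1: the form shape as a higher-kinded data (HKD) record.
-- The fields are invented and do not come from a real CRA form.
data Form f = Form
  { employmentIncome :: f Double
  , otherIncome      :: f Double
  , totalIncome      :: f Double  -- computed from the two fields above
  }

-- Part 2: the mapping of record fields to FDF form fields, obtained by
-- instantiating the same record with a constant functor of field paths.
newtype FieldPath a = FieldPath String

formFields :: Form FieldPath
formFields = Form
  { employmentIncome = FieldPath "form1[0].Page1[0].Line1[0]"  -- hypothetical path
  , otherIncome      = FieldPath "form1[0].Page1[0].Line2[0]"  -- hypothetical path
  , totalIncome      = FieldPath "form1[0].Page1[0].Line3[0]"  -- hypothetical path
  }

-- Part 3: the equations that complete the form. The total is filled in
-- only when both of its inputs are present.
complete :: Form Maybe -> Form Maybe
complete form =
  form { totalIncome = (+) <$> employmentIncome form <*> otherIncome form }

-- Pair every filled-in field with its FDF path, skipping blank fields.
toPairs :: Form Maybe -> [(String, String)]
toPairs form =
  [ (path, show value)
  | (FieldPath path, Just value) <-
      [ (employmentIncome formFields, employmentIncome form)
      , (otherIncome formFields, otherIncome form)
      , (totalIncome formFields, totalIncome form)
      ]
  ]
```

With this sketch, `toPairs (complete (Form (Just 1000) (Just 500) Nothing))` yields three path/value pairs, the last one carrying the computed total. Because the record type is shared, the field mapping and the equations cannot drift apart in shape.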

Is your code open source? I don’t know if I’ll have a use for it, but I may be able to contribute at some point. I have implemented XML Schema and RELAX NG validation and a streaming subset of XPath in the past, though not in Haskell, unfortunately. I just don’t have any spare time at the moment. I wish you the best of luck. XML standards are unfortunately nobody’s idea of fun, nor of career advancement these days.

That has some similarities with our approach:

  1. The form shape corresponds to an XSD spec. The HKD makes sure that any operations are done on the same shape. That connection is actually weaker between XSD and Schematron, as any Schematron schema can be applied to any XML document.
  2. The mapping corresponds to the marshalling of Haskell values to an exchange format, e.g. XML.
  3. Instead of equations we will have an algebra of assertions, which includes equations but also arbitrary predicates over a language that can access all fields.
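To give a flavour of the third point, an assertion algebra could be sketched along these lines. This is only a toy under my own assumptions, not the design we are building: the `Assertion` type, its constructors, `holds`, and the `Invoice` record are hypothetical, and the real rules come from Schematron with XPath expressions rather than Haskell functions.

```haskell
-- A tiny assertion language over a record type r (hypothetical design).
data Assertion r
  = Equation (r -> Rational) (r -> Rational)  -- both sides must agree
  | Predicate (r -> Bool)                     -- arbitrary check over all fields
  | All [Assertion r]                         -- conjunction of rules
  | Implies (Assertion r) (Assertion r)       -- conditional rule

-- Evaluate an assertion against a concrete record.
holds :: Assertion r -> r -> Bool
holds (Equation lhs rhs) r = lhs r == rhs r
holds (Predicate p) r      = p r
holds (All rules) r        = all (`holds` r) rules
holds (Implies a b) r      = not (holds a r) || holds b r

-- An invented three-field invoice, far smaller than a real one.
data Invoice = Invoice
  { netAmount, taxAmount, grossAmount :: Rational }

invoiceRules :: Assertion Invoice
invoiceRules = All
  [ Equation grossAmount (\i -> netAmount i + taxAmount i)
  , Predicate ((>= 0) . netAmount)
  ]
```

Here `holds invoiceRules (Invoice 100 19 119)` is `True`, while changing the gross amount to 120 makes it `False`. The `Equation` case is the interesting one, since a field that is fully determined by an equation is a candidate for elimination from the simplified form.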

What we are aiming for is to completely eliminate those constructors that will be filled in by equations, and generate an embedding from the simplified to the original form. In your tax case those constructors are still there, but can be given an empty value. You could go one step further and use GHC.Generics.V1 on them to completely disable those fields. Nikita advocated against HKD because it does not go well with deriving type class instances.

There could be checks in the PDFs (PDF is an abomination and a security risk), but CRA apparently chose not to implement them and to do all validation server-side.

Ideally it will be open source, but the development cost is accumulating and the cost/benefit ratio is getting higher than we can justify. The toolkit will be split across several packages, allowing us to at least open-source the general core.