Cabal fails when I try to install a Unicode-aware regex library

I want to use a Unicode-aware regex library. I have read that regex-compat-tdfa is the way to go. However, when I run

cabal install regex-compat-tdfa

I get the following error:

Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: regex-compat-tdfa-0.95.1.4 (user goal)
[__1] next goal: base (dependency of regex-compat-tdfa +/-newbase
+/-splitbase)
[__1] rejecting: base-4.16.4.0/installed-4.16.4.0 (conflict: regex-compat-tdfa
+/-newbase +/-splitbase => base<4.15)
[__1] rejecting: base-4.18.0.0, base-4.17.1.0, base-4.17.0.0, base-4.16.4.0,
...
base-4.1.0.0, base-4.0.0.0, base-3.0.3.2, base-3.0.3.1 (constraint from
non-upgradeable package requires installed instance)
[__1] fail (backjumping, conflict set: base, regex-compat-tdfa)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, regex-compat-tdfa

Is regex-compat-tdfa not compatible? Or do I need to install it differently?

Unfortunately, it’s not compatible. The most important line in that error message is:

[__1] rejecting: base-4.16.4.0/installed-4.16.4.0 (conflict: regex-compat-tdfa
+/-newbase +/-splitbase => base<4.15)

According to this error, you have base-4.16.4.0 installed, but regex-compat-tdfa-0.95.1.4 requires base<4.15, and cabal can’t resolve this incompatibility.

If you search for regex-compat-tdfa on Hoogle with set: stackage, you get the message “Not on Stackage, so not searched”, which is a sign that the package may not be up to date.

Also, on the Hackage page, it says the package was last uploaded in 2012 (although there was a revision in 2022). And going to this bug tracker issue tells the rest of the story: the package isn’t being maintained and doesn’t work with GHC >= 9.0.

So, as far as I can see, your options are:

  • Download regex-compat-tdfa-0.95.1.4 yourself (not using cabal) and patch it manually
  • Use an earlier version of base (and since GHC versions are tied to base versions, that means using an earlier version of GHC too)
  • Give up on regex-compat-tdfa and look for another library

Thank you, @gcox. I guess regex-compat-tdfa isn’t a good choice then.

Are there other unicode-aware Regex libs? From what I understand, regex-compat is not unicode aware - or am I mistaken?

You appear to be right that regex-compat does not support Unicode. It’s based on regex-posix, which says in its documentation:

Note that the posix library works with single byte characters, and does not understand Unicode. If you need Unicode support you will have to use a different backend.

There is regex-tdfa, which says it does have Unicode support:

Depending on the text being searched this package supports Unicode. The [Char], Text, Text.Lazy, and (Seq Char) text types support Unicode. The ByteString and ByteString.Lazy text types only support ASCII.

Also text-icu has Data.Text.ICU.Regex. Since ICU is a Unicode library, it should definitely be Unicode aware.

But I don’t think I’ve ever actually used these. There may be other better choices too.
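For reference, here is a minimal sketch of what matching non-ASCII text with regex-tdfa might look like. It assumes regex-tdfa is listed in your package's build-depends; the helper names (matchesCafe, firstCafe), the input string, and the pattern are made up for illustration. It uses String because [Char] is one of the types the documentation quoted above lists as Unicode-capable.

```haskell
module Main (main) where

import System.IO (hSetEncoding, stdout, utf8)
import Text.Regex.TDFA ((=~))

-- Hypothetical helper: True if the input contains "caf" followed by any one character.
matchesCafe :: String -> Bool
matchesCafe input = input =~ "caf."

-- Hypothetical helper: the first matching substring, or "" when nothing matches.
firstCafe :: String -> String
firstCafe input = input =~ "caf."

main :: IO ()
main = do
  -- Force UTF-8 output so printing non-ASCII text does not depend on the locale.
  hSetEncoding stdout utf8
  let input = "naïve café"
  print (matchesCafe input)
  putStrLn (firstCafe input)
```

The result type of =~ is chosen by the type annotation, which is why the same expression can yield either a Bool or the matched String.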

Just a question: could you try cabal install --allow-newer=base regex-compat-tdfa? This tells cabal to try building against newer versions of base even if they fall outside the package's declared range.

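Also note that running cabal install some-lib is almost always a bad idea. You might expect the command to behave like pip install some-lib, but it doesn't. The usual approach is to list the library under build-depends in your project's .cabal file and let cabal build resolve it.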
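For a quick experiment without a full project, one option is a cabal script, where the dependencies are declared in a comment block at the top of the file. The sketch below assumes a file named Script.hs (a hypothetical name) run with cabal run Script.hs; the helper umlautWords and its sample input are likewise invented for illustration.

```haskell
{- cabal:
build-depends: base, regex-tdfa
-}
module Main (main) where

import System.IO (hSetEncoding, stdout, utf8)
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- Hypothetical helper: every occurrence of "ber" preceded by a lower- or upper-case u-umlaut.
umlautWords :: String -> [String]
umlautWords input = getAllTextMatches (input =~ "[üÜ]ber")

main :: IO ()
main = do
  -- Force UTF-8 output so printing non-ASCII text does not depend on the locale.
  hSetEncoding stdout utf8
  mapM_ putStrLn (umlautWords "über, Überraschung, uber")
```

Here cabal resolves regex-tdfa for this one script, so nothing has to be installed globally.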

regex-tdfa is pretty widely used:

Reverse Dependencies

130 direct, 3487 indirect

So it can’t be a bad choice.

regex-rure binds to a Rust library (which is Unicode-aware).