I want to use a Unicode-aware regex. I have read that regex-compat-tdfa is the way to go. However, when I run
cabal install regex-compat-tdfa
I get the following error:
Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: regex-compat-tdfa-0.95.1.4 (user goal)
[__1] next goal: base (dependency of regex-compat-tdfa +/-newbase
+/-splitbase)
[__1] rejecting: base-4.16.4.0/installed-4.16.4.0 (conflict: regex-compat-tdfa
+/-newbase +/-splitbase => base<4.15)
[__1] rejecting: base-4.18.0.0, base-4.17.1.0, base-4.17.0.0, base-4.16.4.0,
...
base-4.1.0.0, base-4.0.0.0, base-3.0.3.2, base-3.0.3.1 (constraint from
non-upgradeable package requires installed instance)
[__1] fail (backjumping, conflict set: base, regex-compat-tdfa)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, regex-compat-tdfa
Is regex-compat-tdfa not compatible? Or do I need to install it differently?
According to this error, you have base-4.16.4.0 installed, but regex-compat-tdfa-0.95.1.4 requires base<4.15, and cabal can’t resolve this incompatibility.
If you search for regex-compat-tdfa on Hoogle with set: stackage, you get the message “Not on Stackage, so not searched”, which is a sign that the package may not be up to date.
Also, on the Hackage page, it says the package was last uploaded in 2012 (although there was a revision in 2022). And going to this bug tracker issue tells the rest of the story: the package isn’t being maintained and doesn’t work with GHC >= 9.0.
So, as far as I can see, your options are:
1. Download regex-compat-tdfa-0.95.1.4 yourself (not using cabal) and patch it manually.
2. Use an earlier version of base (and since GHC versions are tied to base versions, that means using an earlier version of GHC too).
3. Give up on regex-compat-tdfa and look for another library.
You appear to be right that regex-compat does not support Unicode. It’s based on regex-posix, which says in its documentation:
Note that the posix library works with single byte characters, and does not understand Unicode. If you need Unicode support you will have to use a different backend.
There is regex-tdfa, which says it does have Unicode support:
Depending on the text being searched this package supports Unicode. The [Char], Text, Text.Lazy, and (Seq Char) text types support Unicode. The ByteString and ByteString.Lazy text types only support ASCII.
Also text-icu has Data.Text.ICU.Regex. Since ICU is a Unicode library, it should definitely be Unicode aware.
But I don’t think I’ve ever actually used these. There may be other better choices too.
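For what it's worth, here is a minimal sketch of what matching non-ASCII text with regex-tdfa might look like. It assumes regex-tdfa is listed as a dependency of your project and uses plain String ([Char]), one of the types its documentation says supports Unicode. The helper name firstCafMatch and the sample text are made up for illustration.

```haskell
import Text.Regex.TDFA ((=~))

-- Hypothetical helper (not part of regex-tdfa): returns the first
-- substring matching "caf." or the empty string if nothing matches.
-- Both the input and the pattern are plain Strings, so '.' should
-- consume one whole Char, including non-ASCII ones such as 'é'.
firstCafMatch :: String -> String
firstCafMatch input = input =~ "caf."

main :: IO ()
main = do
  let sample = "naïve café"
  -- Asking for a Bool result only reports whether a match exists.
  print (sample =~ "caf." :: Bool)
  -- Counting Chars avoids depending on the terminal's encoding.
  print (length (firstCafMatch sample))
```

The result type of =~ is chosen by the type annotation (Bool, String, and several others), so the same operator covers both "does it match" and "what matched". If you switch to ByteString input, the quoted documentation says only ASCII is supported.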
Just to ask: could you try cabal install --allow-newer=base regex-compat-tdfa? This tells cabal to ignore the upper bound on base, so it will attempt to build against a newer base even though that version is outside the package's declared range.
Also note that running cabal install some-lib is almost always a bad idea. You may think the command does something similar to pip install some-lib, but it doesn’t.