Parser-regex: A regex library

parser-regex is a regex library where regexes are composed together in the style of parser-combinator libraries.

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

reKeyValues :: REText [(Text, Text)]
reKeyValues = R.char '{' *> (kv `R.sepBy` R.char ',') <* R.char '}'
  where
    str = R.someTextOf CS.asciiLower
    kv = (,) <$> str <* R.text ":" <*> str
>>> import qualified Regex.Text as R
>>> R.reParse reKeyValues "{foo:bar,baz:qux}"
Just [("foo","bar"),("baz","qux")]

This makes parsing a lot more fluent compared to matching a regex pattern, extracting the submatches, potentially parsing those submatches, and putting them all in the result.

At the same time, it stays in the relatively simple world of regular expressions and avoids the complexity of more powerful parsing libraries.

You can find more details and examples in the README.

If you have used the regex-applicative library before, things should sound familiar. I thought it was a great idea and tinkered with different internals, which ultimately led to this library. parser-regex offers better performance, Text support, and more handy definitions.

Feedback is welcome!

14 Likes

The first thing I thought was, ‘why would I use this over a full parser combinator library?’. I feel the answer is a bit hidden: this is designed for find-and-replace as opposed to parsing to a data structure, and the runtime is linear rather than exponential. It might be worth highlighting those advantages a little more.

4 Likes

Because accepting restrictions (regular expressions) can bring about benefits (reliable running time)!

Find-and-replace really just falls out of the parsing implementation. If you inspect the definitions you will find them to be rather simple.

But those aren’t the only advantages.

I should clarify that parser-regex works differently from (parsec-style) parser combinator libraries despite the similar interface, which means they cannot replace each other. That being said, I certainly prefer parser-regex for parsing (when applicable) because it frees you from having to think about the backtracking behavior of your parser and how to control it. You never need try, which is great!

Parsec-style parsers also do not backtrack enough to work like a regex. Consider this example.

import Text.Parsec
data Number = Octal String | Hexadecimal String deriving Show
pNumber :: Parsec String () Number
pNumber = (Octal <$ string "0" <|> Hexadecimal <$ string "0x") <*> string "123"

Looks alright?

>>> parse pNumber "" "0x123"
Left (line 1, column 2):
unexpected "x"
expecting "123"

Unfortunately, it’s not. You cannot fix this with try, you must put Hexadecimal before Octal for this to work. I would rather avoid such issues.

import Regex.List
reNumber :: RE Char Number
reNumber = (Octal <$ list "0" <|> Hexadecimal <$ list "0x") <*> list "123"
>>> reParse reNumber "0x123"
Just (Hexadecimal "123")

Not a problem with parser-regex (or any regex library really)!

3 Likes

What is the expected domain of usefulness of custom Haskell regex parsers (including this one)?

For efficiently parsing data in Haskell without lexing you have the binary library and its flavors. They’re not super convenient, since you need to be explicit, but that’s about their only downside.

For efficiently performing find-and-replace over a template you have libpcre2, which reliably provides far more features than any Haskell homebrew could ever wish to.

I don’t see gold beyond these two.

At the risk of stating the obvious, parsing regular languages. The README summarizes the features that someone might be looking for in a library, if so it is for them.

binary is a serialization/deserialization library.

libpcre2 is different in some ways (and also not Haskell), but hey if it serves you well that’s great. You don’t have to use the library if it doesn’t match your use case (:

3 Likes

Awesome, just used parser-regex for Day 3 - Advent of Code 2024 :slight_smile:

4 Likes

Happy to see I wasn’t the only one :grinning_face_with_smiling_eyes:
It made the problem almost trivial.