Survey and Opinions: Backtracking in Parsers

Introduction

I am currently building an introspectable parser combinator library.
This has been a goal of mine for a long time, but only recently did the idea make sense in my head, after I read Exploring Arrows for sequencing effects, so it is arrow-based now.

You can see parts of it working in the source or in the docs.

Example

-- >>> take 5 $ exampleWords binaryNumber
-- [fromList [One],fromList [One,Zero],fromList [One,One],fromList [One,Zero,Zero],fromList [One,Zero,One]]

binaryNumber :: Parser BinaryToken () ()
binaryNumber = exact One *> many digit $> ()

Although these are applicative operators, they use arrows under the hood. I just thought it was easier to write (because I’m used to it) with them.

Moreover, I plan to do static analysis on the parser, whenever it fails. This would allow the library to recover from a parsing failure and suggest repair sequences. (some terminology from laurence tratt). Producing example words, (inputs the parser would accept) is only a first step.

Input and Opinions

Finally, my question for your feedback and input:
In your favorite parsing library:

  • what is the flavor of backtracking used
    • e.g.: full backtracking, backtracking unless input was consumed, explicit backtracking?
  • what do you like about it?
  • what do you dislike about it?
  • plus more if you care to share
5 Likes

Looks interesting, I’ll definitely take a look.

As for backtracking, I’m not sure it’s the central issue I’m concerned about. My main focus at the moment is writing resilient parsers and figuring out how to prove resilience properties in the type system.

Parsec-style backtracking unless input was consumed is definitely my least favourite. It’s confusing and misleading for newcomers, and occasionally bites even experienced users.

PEG-style backtracking parsers are performant and predictable, though they require some extra primitives for full power.

Full backtracking, which should be indistinguishable from full parallel parsing, is easy to understand but in its bare formulation it has trouble with syntax error reporting.

To control syntax reporting and help with performance, I prefer adding something like the commit/admit combinators rather than the consumed-input backtracking.

5 Likes

Thank you for you thoughts!

I’ve never heard of this approach before, thanks for sharing it.