Modern parser generator

deemp · January 2, 2024, 4:42pm

Hi! Is there a tool that generates Haskell ISGLR parsers with automatic error recovery and support for attribute grammars?

LR with error recovery like grmtools.
- Probably with compile-time grammar evaluation support
ISGLR like this.
- Maybe there’s a port of SDF3 (Spoofax) to begin with?
Attribute grammars like ones supported by happy.
- The evaluation should happen on a successful parse.

The target is Language Servers.

Is this combination of features an overkill?

jaror · January 2, 2024, 6:24pm

I don’t think there is one. I’ve been working on something in this direction. My idea was to build upon data-dependent grammars (a la Iguana) and fungll in a parser combinator library, but it doesn’t quite work out yet. My latest experiment is on Github. The main issue I’m getting stuck on is dealing with variable binding in the EDSL.

The reason I’m considering data-dependant grammars rather than SGLR parsers is because Jurgen Vinju gave quite a convincing talk at EVCS: https://homepages.cwi.nl/~jurgenv/papers/EVCS.pdf. TL;DR: data-dependant grammars allow you to easily define disambiguation strategies in a generic framework instead of modifying the internals of the parsing algorithm for each new disambiguation strategy.

When I manage to crack that, I think it should not be hard to turn the parser combinator library into a parser generator by using some Template Haskell.

I don’t have plans to include full attribute grammars, because I think they are usually at the wrong level of abstraction. After parsing you generally want to think in terms of control flow and data flow, but attribute grammars just keep viewing the IR as a tree. And you either have to thread the attributes yourself or rely on syntactic flow like from left to right through the tree.

But if you want to use attribute grammars anyway, I’d recommend taking a look at UUAGC which is a completely standalone attribute grammar system. You can combine that with any parser you want.

deemp · January 14, 2024, 10:50pm

The Rascal site lists some more works (link), including a thesis about Iguana and Meerkat (🎓 Ali Afroozeh and Anastasia Izmaylova, Practical General Top-down Parsers. (2019) Universiteit van Amsterdam.).

Unfortunately, error recovery is left for the future work.

Error recovery
One of the important topics that we did not discuss in this thesis is error recovery. Top-down parsers have a good reputation for error reporting and error recovery, and we suspect it should be easier to provide good error recovery features for GLL, say compared to GLR. Currently, Iguana reports the parse error location, but does not provide any error recovery features. Providing error recovery should be the highest priority feature, especially for using Iguana in language workbenches and development tools.

sgraf · January 15, 2024, 12:52pm

A modern version of happy has been on my wish list for quite some time. We would be able to use it to great effect in GHC, which currently has abysmal parse error recovery; see Fault-tolerant GHC compilation pipeline by michaelpj · Pull Request #63 · haskellfoundation/tech-proposals · GitHub.
Furthermore, a staged Template Haskell frontend for happy based on combinators or some other declarative framework would go a long way toward raising excitement about LR parsing that is currently reserved to monadic LL parsing. (An old idea that hasn’t generated much interest since.)

Ultimately I think we lack someone dedicated who is willing to invest the work on these matters in the Haskell ecosystem. That someone would need to have a good idea of the state of the art and then perhaps write a tool (or improve happy) that can cope with a large parser such as GHC’s.

As far as disambiguation is concerned, the shortcomings that are often associated with LR but not with LL aren’t so prominent in GHC because of judicious use of %shift pragmas and a crazy hand-crafted disambiguation mechanism for the overlapping exp and pat fragments (one that would be necessary in LL as well). It would be interesting to see whether GHC’s parser could instead use happy’s GLR mode to disambiguate the parses.

I wouldn’t worry too much about performance as long as the asymptotics are no worse than established generalised parsing techniques.

Kleidukos · January 15, 2024, 1:04pm

Oh you may have stumbled right into tree-sitter territory!

I have been told by @alanz that the Erlang Language Platform uses it to power the LSP server.

deemp · January 15, 2024, 2:39pm

Yep, I know about tree-sitter and Lezer. They are GLR.
After stuying these, I attempted to find something backed by recent research.

I probably formulated my initial question not completely clearly (or too specifically) and can’t edit it now.

Initially, I wanted generated parsers to be LR-something, but then @jaror linked (Modern parser generator - #2 by jaror) a paper by Jurgen Vinju who stated:

We conclude that top-down is much easier to experiment with and extend. Bottom-up could be faster due to partial evaluation, but still, additional bookkeeping and less sharing are required to filter correctly. Rascal’s SGLL in Java is as fast as the SDF2’s SGLR in C.

Second, it is arguably easier to design and implement (new) disambiguation constructs with GLL than with GLR. Third, data-dependent context-free grammars add the level of formality and generality that we were always searching for when inventing new disambiguation schemes (as exemplified by the Jakker and Iguana Parsing Architectures). We conclude that a top-down implementation of data-dependent context-free parsing is the way to go for Rascal as well as SDF3.

So, after years of research, the objective of the Rascal team became SGLL + DDCFG parsing.

Now, I want:

a tool written in Haskell
that produces an SGLL parser for a DDCFG
- the tool generates purely Haskell code
- or, the tool reads a grammar and generates a parser in memory (nice for prototyping)
and the produced parser is
- fault-tolerant
- optionally incremental

UPD: Maybe I want an even more powerful formalism the LLR-system (paper). Its repo has < 3 KLOC in Java

deemp · January 15, 2024, 5:03pm

I asked on the Rascal repo (off topic) Modern parser generator for Haskell · Issue #1909 · usethesource/rascal · GitHub for correct requirements and for an estimated amount of work.

In short:

1 man-year for a good parser generator
read the aforementioned thesis by Afroozeh and Izmaylova plus some other works

alanz · January 15, 2024, 6:25pm

FWIW, I did some experiments on applying the techniques from the paper that inspired tree-sitter to happy. It looked promising, but I never had the time to push it to anything real.

Topic		Replies	Views
Pre-HFTP: Fault-tolerant GHC compiler pipeline Haskell Foundation	4	1074	May 28, 2024
Reusing Haskell's binding when defining context free grammars Show and Tell	20	2373	September 11, 2023
Deep-transformations and more Show and Tell	5	704	November 12, 2020
Announcement for the Compiler Tooling Task Force Announcements	3	2617	January 14, 2022
Pre-HFTP: Coordination for finishing Trees That Grow	15	1534	August 28, 2023

Modern parser generator

Related topics