Happy shift/reduce conflict when using list1 with opt

GZGavinZhao · July 4, 2023, 7:37pm

I’ve been banging my head over this shift/reduce conflict on a production that has a non-empty list of tokens plus an optional token. Below is a minimal reproducible example:

%name example
%tokentype { Tok }

%token
	identifier { TokIdentifier }
	string     { TokString }
	name_it    { TokNameIt }
	apply      { TokApply }

%%

stmts : list(stmt) {}

stmt : func_call_statement {}

var : identifier {}

name : name_it identifier {}

func_call_statement
	: list1(both(apply, var)) opt(name) {}

-- Utilities

opt(p)
    :   { Nothing }
    | p { Just $1 }

rev_list1(p)
    : p              { [$1] }
    | rev_list1(p) p { $2 : $1 }

list1(p) : rev_list1(p) { reverse $1 }
list(p)
    : list1(p) { $1 }
    |          { [] }

both(p, q) : p q { ($1, $2) }

Helper productions like opt, list, both are all copied from Happy’s documentation here.

Here’s the state that is causing the conflict:

State 8

	list1__both__apply__var____ -> rev_list1__both__apply__var____ .    (rule 9)
	rev_list1__both__apply__var____ -> rev_list1__both__apply__var____ . both__apply__var__    (rule 14)

	name_it        reduce using rule 9
	apply          shift, and enter state 10
			(reduce using rule 9)

	%eof           reduce using rule 9

	both__apply__var__goto state 15

I know the problem is with the opt(name) at the end because if I either remove that or stmts the problem goes away, but what and how I should do to get rid of the conflict.

I read the portion in Serokell’s post on Happy’s reduce/shift conflict, but the two solutions (%shift or Indicate the precedence) it described don’t work for me: no matter where I add %shift the conflict still exists, and there’s no operator for me to define precedence.

Is there a way to solve this conflict, or must I refactor my grammar? Thank you in advance!

Probie · July 4, 2023, 8:08pm

There’s no magical incantation to give Happy to get rid of this shift/reduce - this grammar is simply ambiguous.

If names are optional, how do I know when one statement ends and another begins? Consider the steam of tokens apply identifier apply identifier. Is there one statement here or two? Both [[(apply, identifier), (apply, identifier)]] and [[(apply, identifier)], [(apply, identifier)]] are valid parses according to the grammar.

glguy · July 4, 2023, 9:11pm

To prefer the longest list, you’d need to put the shift all the way down on list1:

list1(p) : rev_list1(p) %shift { reverse $1 }

…but even better is to define a statement terminator so that users can disambiguate themselves which parse they intended.

Probie · July 4, 2023, 9:22pm

There’s no magical incantation to give Happy to get rid of this shift/reduce - this grammar is simply ambiguous.

This was a poorly worded statement, since it’s wrong. What I should have said, is that there’s no magical incantation to give Happy that will satisfactorily work (based on my limited understanding of what you’re trying to do). I’ve made the assumption that a function call with two variables is distinct from two function calls with one variable

GZGavinZhao · July 6, 2023, 2:08am

Ah, now I understand what’s causing the conflict. Thank you to both of you! I realized I should’ve provided more context. Sorry for the potential confusion! Here’s how a function call statement is written:

“There’s [x]. There’s [y]. There’s [z]. Take [2] and apply function [f]. Take [2] and apply function [g]. Call the result [a].”

This translates to a = g(x, f(y, z)). So for future reference, I believe this means that I prefer the longest list as @glguy described, correct?

The ideal AST would be something like this (a is used to store extra info, like the location of the token in file):

data Identifier a = ...
data Name a = Name a String

data Statement a = 
  RefStatement a (Identifier a)
  | FuncCallStatement a [(Int, Identifier a)] (Maybe (Name a))

So that the above statement would translate to:

[ RefStatement a (Identifier "x")
, RefStatement a (Identifier "y")
, RefStatement a (Identifier "z")
, FuncCallStatement 
    a [(2, Identifier "f"), (2, Identifier "g")] (Just (Name a "a"))
]

Note: I didn’t design this grammar. The grammar specification is here.

Actually, as I’m writing this, I think the definition for function_post_call is either flawed or the author forgot to add name_single_statement to the statement list, because in the official Syntax Cheeshet, there’s an example where the name_single_statement appeared directly after function_post_call to store the result of the function call as a variable, which according to the grammar spec would’ve caused a straight-out parse error. So I guess what I’ll do to solve this conflict is to promote name from an expression-ish level to the statement level.