Section 9.3 of the Haskell98 Report specifies the layout algorithm L with the parse-error side-condition, even admitting that it is difficult to implement. GHC has a bullet point in the chapter on known bugs and infelicities noting that GHC doesn’t attempt to implement this requirement. Haskell2010 removes the whole algorithm, bringing GHC into compliance.
My question is: how hard is it to adjust the algorithm in Haskell98 to remove this side-condition? What does it mean exactly, that GHC “gobbles up the whole expression”? This algorithm is producing a token stream, so expressions do not exist yet. Does it just mean that the case is removed, so any input matching it will instead match the next one, which is L (t:ts) ms = t : (L ts ms)?
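To make that last question concrete, here is a minimal sketch of L as a standalone function with the parse-error(t) equation simply deleted, so that any token it would have caught falls through to L (t:ts) ms = t : (L ts ms). The names Token, Lexeme, Open, Indent and layout are made up for this illustration, the input is assumed to be already annotated with {n} and <n> as the Report describes, and this is not a claim about what GHC actually does.

```haskell
module LayoutSketch where

-- Hypothetical token type: ordinary lexemes plus the Report's two annotations.
data Token
  = Lexeme String  -- an ordinary lexeme, including explicit "{" and "}"
  | Open Int       -- {n}: an implicit block starts at column n
  | Indent Int     -- <n>: first lexeme of a line, at column n
  deriving (Eq, Show)

-- L from the Report, minus the equation guarded by parse-error(t).
-- The second argument is the stack of enclosing layout contexts.
layout :: [Token] -> [Int] -> Either String [String]
layout (Indent n : ts) (m : ms)
  | m == n = (";" :) <$> layout ts (m : ms)          -- same column: new item
  | n < m  = ("}" :) <$> layout (Indent n : ts) ms   -- dedent: close the block
layout (Indent _ : ts) ms = layout ts ms
layout (Open n : ts) (m : ms)
  | n > m = ("{" :) <$> layout ts (n : m : ms)
layout (Open n : ts) []
  | n > 0 = ("{" :) <$> layout ts [n]
layout (Open n : ts) ms =                            -- empty implicit block
  (\rest -> "{" : "}" : rest) <$> layout (Indent n : ts) ms
layout (Lexeme "}" : ts) (0 : ms) = ("}" :) <$> layout ts ms
layout (Lexeme "}" : _) _ = Left "explicit close brace without matching open brace"
layout (Lexeme "{" : ts) ms = ("{" :) <$> layout ts (0 : ms)
-- The parse-error(t) equation would go here; it is deliberately omitted.
layout (Lexeme t : ts) ms = (t :) <$> layout ts ms
layout [] [] = Right []
layout [] (m : ms)
  | m /= 0    = ("}" :) <$> layout [] ms             -- close blocks at end of input
  | otherwise = Left "explicit open brace never closed"

-- The one-line expression  let x = 1 in x  (x sits at column 5).
example :: Either String [String]
example =
  layout
    [ Lexeme "let", Open 5, Lexeme "x", Lexeme "=", Lexeme "1"
    , Lexeme "in", Lexeme "x" ]
    []
-- example == Right ["let","{","x","=","1","in","x","}"]
```

With the equation removed, the implicit block opened after let is only closed at the end of the input, so the token stream reads let { x = 1 in x } and the parser rejects it later. In that sense the block "gobbles up" everything that follows on the line.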
Are there any known problems with L as written in Haskell98? I am asking because it was removed in Haskell2010. But I suppose it could also have been removed because someone thought that a standard should contain only prose?
Not-serious answer: No one will go back and edit the Haskell98 document, so it is impossibly hard. But OK we can talk about a future version…
Hopeful answer: Since what GHC does is actually simpler, it should also be simpler for an updated standard to adopt that.
Cynical answer: People generally gave up on updating the standard about 10 years ago. Also, in the case of Haskell2010 removing L, I have no evidence, but I guess that it was not because of a preference for prose, but rather because no one knew what to formalize instead.
I don’t think this is correct. GHC certainly implements L as specified in Haskell2010, including the dreadful parse-error test. The corresponding section simply moved to 10.3, in chapter 10 (Syntax Reference).
I say “dreadful”, but it is not really that dreadful in error-free programs, because it simply does The Right Thing.
However, last fall I found that the layout rule causes severe trouble for resuming parsing after a syntax error and producing a partial syntax tree (GHC issue #25322, “Haskell 2010's layout rule reacts poorly to parse errors”), so that’s why it’s dreadful.
I spent some time trying to find a formulation that doesn’t need to interleave parsing with layouting. That is, I literally tried to implement L as a standalone pass (here) and was hoping that we could do The Right Thing based on a static analysis of the grammar involved (idea based on Conor McBride’s layDKillaz). I soon discovered that this is doomed unless you maintain the full LALR automaton, for programs such as
baz :: Int -> [Int]
baz a = [case a of _ | even a, a > 100 -> 0, 42]
which has to be laid out as
baz :: Int -> [Int]
baz a = [case a of {_ | even a, a > 100 -> 0}, 42]
The problem is: How can a standalone pass insert } before the second comma, but not before the first?
I don’t think it would be wise to force breaking changes on the whole ecosystem, so any layout algorithm that is supposed to supersede the current one by default must get this right.
I suppose one could say “well, if you want to stick to the old layout algorithm, then you won’t see multiple parse errors.” I think that is a reasonable position to take, but presumably we should first ship !13145 and let people judge how useful that is.