Adjusting Haskell98's layout to GHC behaviour

janus · March 27, 2025, 12:16am

Haskell98’s section 9.3 specifies the layout algorithm L with the side-condition parse-error(t), even admitting that it is difficult to implement. GHC has a bullet point in the chapter on known bugs and infelicities that notes that GHC doesn’t attempt to implement this requirement. Haskell2010 removes the whole algorithm, bringing GHC into compliance.

My question is: how hard is it to adjust the algorithm in Haskell98 to remove this side-condition? What does it mean exactly, that GHC “gobbles up the whole expression”? This algorithm is producing a token stream, so expressions do not exist yet. Does it just mean that the case is removed, so any input matching it will instead match the next one, which is L (t:ts) ms = t : (L ts ms)?
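To make the question concrete, here is a minimal sketch of the Report's equations for L transcribed into Haskell, with the parse-error(t) case simply deleted. The `Tok` type and the names `layoutL` and `letExample` are made up for illustration, the `<n>` and `{n}` annotations are assumed to have been inserted by an earlier pass, and none of this describes what GHC's lexer actually does.

```haskell
-- Token stream after the Report's annotation step (hypothetical type).
data Tok
  = Indent Int     -- <n>: the next line starts at column n
  | Open Int       -- {n}: an implicit block opens at column n
  | Lexeme String  -- any ordinary lexeme, including explicit braces
  deriving (Eq, Show)

-- L from Haskell 98 section 9.3, minus the parse-error(t) equation.
-- The second argument is the stack of enclosing layout contexts.
layoutL :: [Tok] -> [Int] -> [String]
layoutL (Indent n : ts) (m : ms)
  | m == n = ";" : layoutL ts (m : ms)
  | n < m  = "}" : layoutL (Indent n : ts) ms
layoutL (Indent _ : ts) ms = layoutL ts ms
layoutL (Open n : ts) (m : ms)
  | n > m = "{" : layoutL ts (n : m : ms)
layoutL (Open n : ts) []
  | n > 0 = "{" : layoutL ts [n]
layoutL (Open n : ts) ms = "{" : "}" : layoutL (Indent n : ts) ms
layoutL (Lexeme "}" : ts) (0 : ms) = "}" : layoutL ts ms
layoutL (Lexeme "}" : _) _ = error "layout: unexpected explicit close brace"
layoutL (Lexeme "{" : ts) ms = "{" : layoutL ts (0 : ms)
-- The Report's parse-error(t) equation would sit here; it is omitted on purpose.
layoutL (Lexeme t : ts) ms = t : layoutL ts ms
layoutL [] [] = []
layoutL [] (m : ms)
  | m /= 0    = "}" : layoutL [] ms
  | otherwise = error "layout: unclosed explicit brace"  -- my choice; the Report has no equation here

-- Annotated tokens for the one-liner:  let x = 1 in x
letExample :: [Tok]
letExample =
  [Lexeme "let", Open 5, Lexeme "x", Lexeme "=", Lexeme "1", Lexeme "in", Lexeme "x"]
```

With this sketch, `unwords (layoutL letExample [])` gives `let { x = 1 in x }`: once the case is gone, `in` falls through to the catch-all equation and the implicit block is only closed at end of input, whereas the Report's version closes it before `in`.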

Are there any known problems with L as written in Haskell98? I am asking because it was removed in Haskell2010. But I suppose it could also have been removed because someone thought that a standard should contain only prose?

treblacy · March 29, 2025, 5:51pm

Not-serious answer: No one will go back and edit the Haskell98 document, so it is impossibly hard. But OK we can talk about a future version…

Hopeful answer: Since what GHC does is actually simpler, it should also be simpler for an updated standard to adopt that.

Cynical answer: People generally gave up on updating the standard about 10 years ago. Also, in the case of Haskell2010 removing L, I have no evidence, but I guess that it was not because of a preference for prose, but rather because nobody knew what to formalize instead.

sgraf · March 29, 2025, 10:17pm

I don’t think this is correct. GHC certainly implements L as specified in Haskell2010, including the dreadful parse-error test. The corresponding section simply moved to section 10.3, in chapter 10 (Syntax Reference).

I say “dreadful”, but it is not really that dreadful in error-free programs, because it simply does The Right Thing.
However, last fall I found that the layout rule causes severe trouble for resuming parsing after a syntax error and producing a partial syntax tree (GHC issue #25322, “Haskell 2010's layout rule reacts poorly to parse errors”), so that’s why it’s dreadful.
I spent some time trying to find a formulation that doesn’t need to interleave parsing with layouting. That is, I literally tried to implement L as a standalone pass (here) and was hoping that we could do The Right Thing based on a static analysis of the grammar involved (idea based on Conor McBride’s layDKillaz). I soon discovered that this is doomed unless you maintain the full LALR automaton, for programs such as

baz :: Int -> [Int]
baz a = [case a of _ | even a, a > 100 -> 0, 42]

(real world example here.)

This should lay out as

baz :: Int -> [Int]
baz a = [case a of {_ | even a, a > 100 -> 0}, 42]

The problem is: How can a standalone pass insert } before the second comma, but not before the first?
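A small sketch of why the decision cannot be keyed on the token alone. Everything here is hypothetical: the `Oracle` type stands in for parse-error(t), `layoutWith` handles only implicit blocks on a single line, and `afterArrow` is a hack tuned to this one input rather than a real grammar analysis.

```haskell
-- Hypothetical stand-in for parse-error(t): given the tokens emitted so
-- far (in source order) and the next token, is the next token illegal here?
type Oracle = [String] -> String -> Bool

data Tok = Open Int | Lexeme String

-- One-line layout with implicit blocks only; no <n> handling.
layoutWith :: Oracle -> [Tok] -> [String]
layoutWith oracle = go [] []
  where
    -- out: emitted tokens, most recent first; ms: open implicit contexts
    go out ms (Open n : ts) = go ("{" : out) (n : ms) ts
    go out (_ : ms) (Lexeme t : ts)
      | oracle (reverse out) t = go ("}" : out) ms (Lexeme t : ts)
    go out ms (Lexeme t : ts) = go (t : out) ms ts
    go out ms [] = reverse out ++ map (const "}") ms

-- Looks only at the token: closes the block at every comma.
closeOnComma :: Oracle
closeOnComma _ t = t == ","

-- Looks at the prefix: a comma ends the block only once "->" has been
-- seen since the last "{". A toy rule for this one example.
afterArrow :: Oracle
afterArrow emitted t =
  t == "," && "->" `elem` takeWhile (/= "{") (reverse emitted)

-- Tokens of:  [case a of _ | even a, a > 100 -> 0, 42]
bazTokens :: [Tok]
bazTokens =
  map Lexeme ["[", "case", "a", "of"]
    ++ [Open 20]
    ++ map Lexeme (words "_ | even a , a > 100 -> 0 , 42 ]")
```

Here `unwords (layoutWith closeOnComma bazTokens)` yields `[ case a of { _ | even a } , a > 100 -> 0 , 42 ]`, closing one comma too early, while `afterArrow` yields `[ case a of { _ | even a , a > 100 -> 0 } , 42 ]`. The second oracle only gets it right because it inspects the prefix, and a rule that stays correct for the whole grammar needs, in effect, the parser state.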

I don’t think it would be wise to force breaking changes on the whole ecosystem, so any layout algorithm that is supposed to supersede the current one by default must get this right.

I suppose one could say “well, if you want to stick to the old layout algorithm, then you won’t see multiple parse errors.” I think that is a reasonable position to take, but presumably we should first ship !13145 and let people judge how useful that is.

blamario · April 9, 2025, 5:11pm

There’s an old alternative layout algorithm by John Meacham at http://repetae.net/repos/getlaid/Layout.hs

It’s a small self-contained Haskell source preprocessor, so obviously it doesn’t depend on the expression parser.

Lysxia · April 9, 2025, 7:33pm

There is another approach to specifying a layout rule in the paper Principled Parsing for Indentation-Sensitive Languages by Michael Adams (POPL 2013). It’s “principled” in that it proposes a general extension of context-free grammars to deal with indentation-sensitive layout, in which one can express a variant of Haskell’s layout rule. I’m not familiar with the details, but even if it is not 100% the same, the difference may not matter in practice.

There is a modified version of happy with that extension on the author’s site. It’s > 10 years old but with a bit of luck it may not be too hard to port it to today’s happy for a larger-scale breakage study.
