Parsec parsing multiline Ruby expressions

Hi,

I’m trying to extend herbalizer, which converts HAML templates to ERB format using the Parsec library.

Right now, Ruby expressions are parsed as single lines only (herbalizer/src/Main.hs at master in danchoi/herbalizer on GitHub):

```haskell
rubyExp = do
  line <- ((:) <$> char '=' >> spaces >> manyTill anyChar newline <* spaces)
  return (RubyExp line)
```

HAML itself, however, allows multi-line expressions: “A line of Ruby code can be stretched over multiple lines as long as each line but the last ends with a comma” (Haml documentation, File: REFERENCE).
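As a plain-Haskell illustration of that rule (no Parsec involved), here is a minimal sketch. `splitRubyLines` is a hypothetical helper, not part of herbalizer, and it assumes the template has already been split into lines and the leading `=` removed.

```haskell
-- Hypothetical helper illustrating the HAML continuation rule.
-- Given the lines that follow '=', return the lines that make up the
-- Ruby expression and the remaining lines of the template.
splitRubyLines :: [String] -> ([String], [String])
splitRubyLines [] = ([], [])
splitRubyLines (l : ls)
  | endsWithComma l =
      -- A trailing comma means the expression continues on the next line.
      let (expr, rest) = splitRubyLines ls in (l : expr, rest)
  | otherwise =
      -- The first line without a trailing comma ends the expression.
      ([l], ls)
  where
    endsWithComma s = not (null s) && last s == ','
```

For example, `splitRubyLines ["foo a,", "  b", "%p"]` evaluates to `(["foo a,", "  b"], ["%p"])`.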

This is my first attempt at extending that parser expression:

```haskell
rubyExp = do
  char '='
  spaces
  commaLines <- option [] (try $ many commaLine)
  line <- manyTill anyChar newline
  spaces
  return (RubyExp $ concat commaLines ++ line)
  where 
    commaNewline = string ",\n"
    commaLine = (++) <$> (manyTill anyChar (try (lookAhead commaNewline))) <*> commaNewline
```

This only matches a multi-line Ruby expression up to the first newline, even though that line ends with a comma. The commaLines parser, i.e. the many parser, doesn’t seem to match greedily, as I expected it to. Where is the error in my thinking?
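To see why the first attempt stops early, here is a minimal self-contained harness, a sketch assuming parsec 3 and a stand-in `RubyExp` newtype (in herbalizer the constructor belongs to a larger AST type). Parsec's `many` is greedy, but it fails outright if its argument fails *after consuming input*. On the line without a trailing comma, `commaLine` does exactly that: `manyTill anyChar ...` eats characters (including newlines, since `anyChar` also matches `'\n'`) while looking for a `",\n"` that never comes. That makes the whole `many commaLine` fail, and because `try` wraps the entire `many`, `option []` rewinds everything and returns `[]`. The hypothetical `fixedAttempt` puts `try` on each `commaLine` instead, and uses `noneOf "\n"` so a line body cannot run on into the following lines.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for herbalizer's RubyExp constructor (an assumption for this sketch).
newtype RubyExp = RubyExp String deriving (Eq, Show)

commaNewline :: Parser String
commaNewline = string ",\n"

-- The first attempt from the question; only commaNewline is lifted to the
-- top level with a type signature.
firstAttempt :: Parser RubyExp
firstAttempt = do
  _ <- char '='
  spaces
  -- try around the whole many: one consuming failure rewinds to [].
  commaLines <- option [] (try $ many commaLine)
  line <- manyTill anyChar newline
  spaces
  return (RubyExp $ concat commaLines ++ line)
  where
    -- anyChar also matches '\n', so this can scan past the current line.
    commaLine = (++) <$> manyTill anyChar (try (lookAhead commaNewline)) <*> commaNewline

-- Hypothetical fix: try around each commaLine, and line bodies that
-- never cross a newline.
fixedAttempt :: Parser RubyExp
fixedAttempt = do
  _ <- char '='
  spaces
  commaLines <- many (try commaLine)
  line <- manyTill anyChar newline
  spaces
  return (RubyExp $ concat commaLines ++ line)
  where
    commaLine = (++) <$> manyTill (noneOf "\n") (try (lookAhead commaNewline)) <*> commaNewline
```

On the input `"= foo a,\n  b\n%p\n"`, `parse firstAttempt "" input` yields `Right (RubyExp "foo a,")`, while `parse fixedAttempt "" input` yields `Right (RubyExp "foo a,\n  b")`.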

I arrived at a somewhat working solution using this parser expression:

```haskell
rubyExp = do
  char '='
  spaces
  commaLines <- option [] (try $ many commaLine)
  line <- manyTill anyChar (try (lookAhead(noneOf "," >> newline)))
  lastChar <- noneOf "," <* newline
  spaces
  return (RubyExp $ concat commaLines ++ reverse(lastChar : (reverse line)))
  where 
    commaNewline = string ",\n"
    commaLine = (++) <$> (manyTill anyChar (try (lookAhead commaNewline))) <*> commaNewline
```

I’m sure this can be expressed more elegantly. I’m still very new to Haskell, so any help is greatly appreciated!

I do have a few suggestions:

  • Switch from Parsec to Megaparsec, Attoparsec, or another maintained parser library. A PEG parser library like Text.Grampa.PEG.Backtrack might be a better fit for this use case.
  • Simplify option [] (try $ many commaLine) to many (try commaLine), so that try rewinds only the final, failing commaLine attempt instead of the whole many.
  • Since commaLines should already consume every line that ends with a comma, you don’t need to check whether the last line ends with one. You can replace (try (lookAhead(noneOf "," >> newline))) with just newline and drop the lastChar and reverse wrangling.
  • You can use the sepBy1 combinator to make the parser a bit more obvious.
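A sketch of the sepBy1 suggestion, assuming parsec 3 and a stand-in `RubyExp` newtype. `rubyLine` is a hypothetical helper that reads one line up to, but not including, either a `",\n"` or a bare `"\n"`. `sepBy1` then treats `",\n"` as the separator, so the expression ends at the first line followed by a plain newline. Because `rubyLine` uses `noneOf "\n"` rather than `anyChar`, it never crosses into the next line, and commas in the middle of a line are kept.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for herbalizer's RubyExp constructor (an assumption for this sketch).
newtype RubyExp = RubyExp String deriving (Eq, Show)

-- Hypothetical helper: one physical line of the expression, stopping just
-- before a ",\n" (continuation) or a plain "\n" (end of expression).
rubyLine :: Parser String
rubyLine = manyTill (noneOf "\n") (lookAhead (try (string ",\n") <|> string "\n"))

rubyExp :: Parser RubyExp
rubyExp = do
  _ <- char '='
  spaces
  -- Lines separated by ",\n"; try lets sepBy1 stop cleanly at a bare "\n".
  ls <- sepBy1 rubyLine (try (string ",\n"))
  _ <- newline
  spaces
  -- Put back the separators that sepBy1 drops.
  return (RubyExp (intercalate ",\n" ls))
```

On `"= foo a,\n  b\n%p\n"` this yields `RubyExp "foo a,\n  b"` and leaves `%p\n` unconsumed; on `"= link_to a, b\n"` the inner comma stays part of the single line.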