I’m trying to write a parser that, given an input string such as
lorem ipsum "dolor sit" amed"lorem "
splits it into the following parts:
lorem
ipsum
"dolor sit"
amed"lorem "
In summary, parts are runs of characters separated by spaces, except that inside quoted text the spaces no longer count as separators until the quote is closed. At some point it should handle escaped quotes (\") within the quoted text, but that’s outside the scope of this question.
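To pin down the rule above independently of Parsec, here is a minimal sketch of it as a plain function. The name `splitParts` is hypothetical, and two behaviours are assumptions not stated in the question: consecutive spaces produce no empty parts, and an unclosed quote simply runs to the end of the input.

```haskell
-- Reference splitter for the rule described above, written without Parsec.
-- `splitParts` is a hypothetical name, not part of any library.
splitParts :: String -> [String]
splitParts = go False ""
  where
    -- go inQuote acc rest: acc holds the current part in reverse order.
    go :: Bool -> String -> String -> [String]
    go _ acc [] = emit acc []
    go inQuote acc (c : cs)
      -- A quote toggles the quoted state and is kept in the part.
      | c == '"' = go (not inQuote) (c : acc) cs
      -- A space outside quotes ends the current part.
      | c == ' ' && not inQuote = emit acc (go False "" cs)
      -- Anything else (including a space inside quotes) is kept.
      | otherwise = go inQuote (c : acc) cs

    -- Emit the accumulated part unless it is empty (assumption: consecutive
    -- spaces do not produce empty parts).
    emit :: String -> [String] -> [String]
    emit [] rest = rest
    emit acc rest = reverse acc : rest

main :: IO ()
main = mapM_ putStrLn (splitParts "lorem ipsum \"dolor sit\" amed\"lorem \"")
```

Run on the example input, this prints the four parts listed above, one per line.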
Here is where I’m at right now; I could use some help:
import Text.Parsec qualified as Parsec
import Text.Parsec.Char qualified as Parsec

stringPartsParser :: Parsec.Parsec String () String
stringPartsParser =
  do wordpart <- Parsec.anyChar
       `Parsec.manyTill`
         ((Parsec.oneOf " \"" >> pure [])
           Parsec.<|> (Parsec.eof >> pure []))
     (Parsec.lookAhead (Parsec.char '"') >> wordTillQuoteEnd wordpart)
       Parsec.<|> return wordpart
  where
    wordTillQuoteEnd partial = do
      Parsec.char '"'
      inner <- Parsec.anyChar `Parsec.manyTill` (Parsec.char '"')
      Parsec.char '"'
      pure $ partial <> ['"'] <> inner <> ['"']
And in GHCi:
> Parsec.parseTest (Parsec.many stringPartsParser) "lorem\" ipsum"
*** Exception: Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.
CallStack (from HasCallStack):
Not much experience with Parsec; mostly banging rocks together trying to succeed.
Edit:
I thought the error came from the function definition, but it was in my GHCi call instead. I replaced the call with
Parsec.parseTest (stringPartsParser `Parsec.manyTill` Parsec.eof) "lorem\" ipsum"
and was able to progress further.
Ideas still welcome for what I’m trying to do.
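One possible direction, as a minimal sketch rather than a definitive answer: treat a part as one or more adjacent chunks, where a chunk is either a quoted run or a run of characters that are neither spaces nor quotes. The names `quoted`, `bare`, `part` and `parts` are made up for illustration. The sketch assumes that an unclosed quote should be a parse error, and it does not handle escaped quotes.

```haskell
import Text.Parsec (Parsec, char, eof, many, many1, noneOf, parse, sepEndBy, (<|>))

type Parser = Parsec String ()

-- A quoted chunk: opening quote, anything but a quote, closing quote.
-- The quotes are kept in the result. Escaped quotes are not handled.
quoted :: Parser String
quoted = do
  _ <- char '"'
  inner <- many (noneOf "\"")
  _ <- char '"'
  pure ("\"" <> inner <> "\"")

-- A bare chunk: one or more characters that are neither a space nor a quote.
bare :: Parser String
bare = many1 (noneOf " \"")

-- A part is one or more adjacent chunks, so amed"lorem " stays together.
part :: Parser String
part = concat <$> many1 (quoted <|> bare)

-- Parts separated by runs of spaces, up to the end of the input.
parts :: Parser [String]
parts = many (char ' ') *> part `sepEndBy` many1 (char ' ') <* eof

main :: IO ()
main = print (parse parts "" "lorem ipsum \"dolor sit\" amed\"lorem \"")
-- prints: Right ["lorem","ipsum","\"dolor sit\"","amed\"lorem \""]
```

Every parser that gets repeated here (`quoted`, `bare`, `part`) consumes at least one character when it succeeds, which is what keeps `many1` and `sepEndBy` from hitting the "accepts an empty string" error. With these assumptions, an input like "lorem\" ipsum" is rejected because the quote is never closed.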