Struggling through applicative functor parsers to learn

I’m trying to create a parser for “ini” style files – without any parser libraries (I guess that’s what I’ll look into after this) – mostly as a learning exercise to get more familiar with all topics involved. I’ve gotten it to partially work so far – it can go from:

[mysection]
hello = world
goodbye = moon sun

to

([("mysection",[("hello",KVString "world"),("goodbye",KVString "moon")])],"sun\n")

(note the trailing “sun\n”)
based on this data model:

type Ini     = [Section]
type Section = (String, [KVPair])
--type KVPair  = (String, [KVValue])
type KVPair  = (String, KVValue)

data KVValue = KVString String | KVInt Int
  deriving (Show, Eq)

But the real KVPair should be able to handle multiple whitespace-delimited KVValues in a list (like the commented out KVPair definition). As soon as I try to use the (String, [KVValue]) definition and many in the parser, it seems to just hang forever. I’m not sure what I’m missing here or how I can go about getting some debugging info to figure out where it’s going wrong. There are certainly many concepts here I don’t have a great grasp on, so maybe there’s a lot wrong. :slight_smile:

Here’s the full code. Hopefully I’ve explained myself well enough. Thank you for any helpful information.

type Ini     = [Section]
type Section = (String, [KVPair])
--type KVPair  = (String, [KVValue])
type KVPair  = (String, KVValue)

data KVValue = KVString String | KVInt Int
  deriving (Show, Eq)

newtype Parser a = Parser {
  runParser :: String -> Maybe (a, String)
}

instance Functor Parser where
  fmap :: (a -> b) -> Parser a -> Parser b
  fmap f (Parser p) = Parser $ \input -> do
    (res, rest) <- p input
    return (f res, rest)

instance Applicative Parser where
  pure x = Parser $ \input -> Just (x, input)
  (<*>) :: Parser (a -> b) -> Parser a -> Parser b
  (Parser p1) <*> (Parser p2) =
    Parser $ \input -> do
      (f, input')  <- p1 input
      (a, input'') <- p2 input'
      Just (f a, input'')

instance Alternative Parser where
  empty = Parser $ const Nothing
  (<|>) :: Parser a -> Parser a -> Parser a
  (<|>) (Parser p1) (Parser p2) = Parser $ \input -> p1 input <|> p2 input

spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input ->
  let (token, rest) = span f input
    in Just (token, rest)

ws :: Parser String
ws = spanP isSpace

charP :: Char -> Parser Char
charP x = Parser f
  where
    f (y:ys)
      | y == x    = Just (x, ys)
      | otherwise = Nothing
    f [] = Nothing

reservedChars :: [Char]
reservedChars = [ '=', '#', '[', ']' ]

stringLiteral :: Parser String
stringLiteral = ws *> spanP (\x -> x `notElem` reservedChars && not (isSpace x)) <* ws

kvString :: Parser KVValue
kvString = KVString <$> stringLiteral

kvInt :: Parser KVValue
kvInt = KVInt <$> (ws *> (read <$> spanP isDigit) <* ws)

kvValue :: Parser KVValue
kvValue = kvString -- <|> kvInt (this is my next challenge)

kvKey :: Parser String
kvKey = ws *> stringLiteral <* ws <* charP '=' <* ws

kvPair :: Parser KVPair
kvPair = (,) <$> kvKey <*> kvValue <* ws
-- below is what the actual definition should be... ish
--kvPair = (,) <$> kvKey <*> many kvValue <* ws

section :: Parser Section
section = (,) <$> (charP '[' *> stringLiteral <* charP ']' <* ws) <*> many kvPair

ini :: Parser Ini
ini = many section

main :: IO ()
main = do
  putStrLn "Starting..."
  putStrLn testInput
  mapM_ print $ runParser ini testInput

testInput :: String
testInput = "\
  \[mysection]\n\
  \hello = world\n\
  \goodbye = moon sun\n"

First tip: you’re looping forever because your kvValue pattern is infallible. It’s ultimately the same as stringLiteral, and the implementation of stringLiteral is made entirely of components that can match the empty string. So when you ask for many kvValue, once it has consumed as many non-empty kvValues as it can, it continues matching the empty string forever.

You can fix this by changing stringLiteral so that it is Nothing if an empty string is matched, either by defining a variant of spanP that does this, or by wrapping it in a combinator such as the following:

filterP :: (a -> Bool) -> Parser a -> Parser a
filterP f p = Parser $ \input -> do
  result <- runParser p input
  guard (f $ fst result) $> result

Second tip: you should be more discriminating in how you’re matching whitespace. From your test case, it seems like you want values to be separated by spaces but not newlines, and sections and key-value pairs to be separated by newlines but not spaces. But you’re using the same ws for both, and that’ll mess up your parse.

3 Likes

Thank you for the reply, your advice seems spot on. I’m going to play with these ideas today.

1 Like

Thanks to your guidance I was able to get things completely working. I added a horribly-named somethingSpanP that will only match a non-empty string, and added a newline parser function that I’m using for matching the end of kvPairs and section headers.

I don’t think I’d ever seen guard before either, so I took a crack at understanding that and using it in the somethingSpanP where it seems to be working as expected. Thank you so much for your quick and thorough reply. I was banging my head against this for a while.

somethingSpanP :: (Char -> Bool) -> Parser String
somethingSpanP f = Parser $ \input -> do
  let (token, rest) = span f input
  guard (not $ null token)
  return (token, rest)

ws :: Parser String
ws = spanP (`elem` [' ', '\t'])

newline :: Parser String
newline = ws *> spanP (== '\n')
...
kvPair :: Parser KVPair
kvPair = (,,) <$> many commentP <*> kvKey <*> many kvValue <* newline

section :: Parser Section
section = (,,)
  <$> many commentP
  <*> (charP '[' *> stringLiteral <* charP ']' <* newline)
  <*> many kvPair <* newline