If you strictly evaluate the position before parsing each list element, then the space consumption stays constant (provided the GC kicks in):
main = traverse_ print
  (parse (pList (seq <$> pPos <*> pIP) <* pEnd) (createStr (LineColPos 0 0 0) (ipString 5)))
Unfortunately, the garbage collector does not always kick in automatically (I’ve tested this in GHCi by manually calling System.Mem.performGC). I don’t know why that is.
Yeah, that is a good plan. Attoparsec is more mature and streaming libraries are more predictable.
EDIT: I have done more testing. The garbage collector is not the problem. The problem was the way I was testing it. The lazy LineColPos update was really the only cause of the linear space use. I think it would be pretty easy to fix in uu-parsinglib.
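To illustrate the kind of leak described here, below is a minimal sketch using only `base`. The types `LazyPos` and `StrictPos` and the functions `advanceLazy`, `advanceStrict`, `finalLazy`, and `finalStrict` are hypothetical stand-ins invented for this example. They are not uu-parsinglib's actual `LineColPos` or its update code. The assumption is that the position is updated once per character and is only inspected at the very end.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for a line/column/absolute-offset position.
-- The fields are lazy, so each update stores unevaluated (n + 1) thunks.
data LazyPos = LazyPos Int Int Int

-- Same shape, but with strict fields: building the constructor
-- forces the counters, so no chain of thunks can accumulate.
data StrictPos = StrictPos !Int !Int !Int
  deriving (Eq, Show)

advanceLazy :: LazyPos -> Char -> LazyPos
advanceLazy (LazyPos l c a) ch
  | ch == '\n' = LazyPos (l + 1) 0 (a + 1)
  | otherwise  = LazyPos l (c + 1) (a + 1)

advanceStrict :: StrictPos -> Char -> StrictPos
advanceStrict (StrictPos l c a) ch
  | ch == '\n' = StrictPos (l + 1) 0 (a + 1)
  | otherwise  = StrictPos l (c + 1) (a + 1)

-- foldl' only forces the outer constructor. With lazy fields the
-- counters remain thunks that grow with the input length until the
-- result is finally inspected.
finalLazy :: String -> (Int, Int, Int)
finalLazy s = case foldl' advanceLazy (LazyPos 0 0 0) s of
  LazyPos l c a -> (l, c, a)

-- With strict fields every step leaves fully evaluated counters behind.
finalStrict :: String -> StrictPos
finalStrict = foldl' advanceStrict (StrictPos 0 0 0)
```

Both versions compute the same position. The difference should only show up in memory use on long inputs, for example when comparing `finalLazy` and `finalStrict` on a few million characters with `+RTS -s`.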
EDIT2: I think I have fixed it. Nope, not completely.
EDIT3: I think this fix is sufficient. I have also noticed another leak related to the error correction, but your IP parsing example can be done in constant space now:
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
import Data.Word
import Data.Foldable
import System.Environment
data IP = IP !Word8 !Word8 !Word8 !Word8
  deriving Show
ips :: [String]
ips = ["192.168.1.0", "8.8.8.8", "255.255.255.0"]
ipString :: String
ipString = unlines $ cycle ips
pIP :: P (Str Char String LineColPos) IP
pIP =
  IP
    <$> pNaturalRaw
    <* pSym '.'
    <*> pNaturalRaw
    <* pSym '.'
    <*> pNaturalRaw
    <* pSym '.'
    <*> pNaturalRaw
    <* pSym '\n'
main :: IO ()
main = do
  (x : _) <- getArgs
  traverse_ print $ take (read x) $ parse
    (pList pIP <* pEnd)
    (createStr (LineColPos 0 0 0) ipString)
Note that I’m using pNaturalRaw because pNatural also consumes arbitrary whitespace after the number, which conflicts with the pSym '\n' and is generally not wanted in IP addresses.
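The conflict can be reproduced with a small sketch that uses `Text.ParserCombinators.ReadP` from `base` instead of uu-parsinglib. The names `rawNat`, `lexemeNat`, `lineWith`, and `runLine` are made up for this example, and `lexemeNat` only approximates what a whitespace-skipping number parser such as pNatural does, under the assumption that its trailing whitespace includes newlines.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Digits only, nothing consumed after the number
-- (plays the role of the "raw" parser).
rawNat :: ReadP Int
rawNat = read <$> munch1 isDigit

-- Digits followed by any amount of whitespace, newlines included
-- (plays the role of the "lexeme" parser).
lexemeNat :: ReadP Int
lexemeNat = rawNat <* skipSpaces

-- A line is a number, then exactly one newline, then end of input.
lineWith :: ReadP Int -> ReadP Int
lineWith p = p <* char '\n' <* eof

-- Returns the number only if the whole input parses as one line.
runLine :: ReadP Int -> String -> Maybe Int
runLine p s = case readP_to_S (lineWith p) s of
  [(n, "")] -> Just n
  _         -> Nothing
```

Here `runLine rawNat "42\n"` succeeds, while `runLine lexemeNat "42\n"` fails because the newline has already been swallowed as trailing whitespace before `char '\n'` gets to see it.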
This can parse about 250,000 IP/s on my machine. That is not terrible, but it is certainly not great either.