And then instances Monad, MonadFail, and Alternative in such newtype. And gets a Parser that can work as long you already have the entire thing to parse.
This has two inconveniences:
If the Syntax can change in length depending on the contents (specially in binary protocols), then is no clear how much data you may need to give it before it gets an valid result, likewise if you give it too much data, you may truncate information and get a failing program.
Similar problem with an Streaming input to the parser.
In any case you could try to solve giving it at input of k length (starting with k = 1) until you get a valid output or consume the entire input, but such thing relive is a memorization to no be O(n²) and therefore very suicidal.
There is some suggestions in how to solve this problem?
Not my specialty but my gut instinct is that you’d probably have some kind of standardised input format and one of the things that your input will tell you is the length of the message in some kind of Header.
You could also build some kind of Parser for any kind of synchronisation input, that you know must be a certain length and format which can always be listening and, on being triggered, brings the rest of your parsers into play.
data Header :: { protocol :: Protocol, length :: Word8 }
parseProtocol :: Parser Protocol
parseProtocol :: Parser $ \xs ->
let (protoBits, rest) = splitAt numberOfProtobits xs
protocol = matchProtoBits protobits
in either Nothing (\pro -> Just (pro, rest)) protocol
parseLength :: Parser Word8
parseLength = Parser $ \xs -> let
(lengthBits, rest) = splitAt 8 xs
in \word -> (word, rest) <$> ( readMaybe lengthBits :: Maybe Word8)
parseHeader :: Parser Header
parseHeader = do
prot <- parseProtocol
len <- parseLength
pure $ Header prot len
parseMessage :: Word8 -> Parser String
parseMessage len = Parser $ \xs -> let
message = take (fromIntegral len) xs
pure (xs, "") -- if you want to end the stream here, or just do the regular splitAt stuff and return rest.
handleMessage :: String -> a
handleMessage ...
parseSync :: Parser ()
parseSync= Parser $ \xs -> if take lengthOfSyncMessage xs == syncMessage then pure ((), drop lengthOfSyncMessage xs) else Nothing
parseMessage :: Parser a
parseMessage = do
_ <- parseSync -- discard the sync message as it's not used, no need to validate since if this doesn't pass, we can't do the rest of the block any way.
Header _ msgLen <- parseHeader
msg <- parseMessage msgLen
pure $ handleMessage msg
Types are probably all over the place, and there’s probably a better way to do it not sure if I was committing to using bytes or strings or stuff.
The key thing is realising that you build up smaller pieces, and stick them together, and by discarding parts you don’t need, you can use them as triggers to prepare you for the things you do need.
EDIT: My bad, missed the entire thing where you need to have the whole string before you can do anything lol. Yeah, the industrial strength implementations work with streams and are happy waiting / consuming more input when triggered.
This is why industrial-strength parser combinators actually work with monadic input streams (example: Text.Parsec.Stream) instead of a pure String. The input represents all the data, even if it hasn’t all been received yet, without the caller having to worry about how much input the parser wants. The parser consumes as much of it as it needs, executing monadic actions via the stream to fetch more if necessary, and (in the case of parsec, this requires using a lower-level function like runParsecT) returns a stream containing any unused input suitable for use in another parser.
Umm I guess if there are headers where a length is not specified, then I’m assuming that there must be some kind of encoder for the end of the message or indication in the header that the message is infinitely long. Otherwise, I don’t see how it can be parsed and considered a message.
I know there are combinators that allow you to keep parsing until you parse a specific thing and then it returns everything before that thing.
The second part isn’t a problem. It is incorporated into thi design. If you give it too much data then the left over data is returned as the String in the second component of the Maybe. If you want it to be able to handle not enough data then you can tweak the definition a bit, for example:
newtype Parser a = Parser { runParser :: String -> Maybe (Either (Parser a) (a, String)) }
Then you use runParser like
case runParser p of
Nothing -> -- parse failure
Left needMore -> -- feed more String into needMore
Right (a, leftovers) -> -- result and unparsed leftovers