I’ve been working on warp / wai-extra
internals again and I find I’m writing specific HTTP header parsing functions that feel like they should be in a more generic library for everyone to use. Here’s some examples:
Parsing ETags
import Data.ByteString as BS
import Data.Word8 (_quotedbl, _W)
data ETags = AnyResource | ETags [ETag]
-- ETag without the double quotes.
-- @"\"e-tag\""@ -> StrongTag "e-tag"
-- @"W/\"e-tag\""@ -> WeakTag "e-tag"
data ETag = StrongTag ByteString | WeakTag ByteString
getTag :: ETag -> ByteString
getTag (StrongTag bs) = bs
getTag (WeakTag bs) = bs
weakEQ :: ETag -> ETag -> Bool
weakEQ tag1 tag2 = getTag tag1 == getTag tag2
strongEQ :: ETag -> ETag -> Bool
strongEQ (StrongTag tag1) (StrongTag tag2) = tag1 == tag2
strongEQ _ _ = False
parseETags :: HeaderValue -> Maybe ETags
parseETags "*" = Just AnyResource
parseETags val = ETags <$> traverse parseETag tags
where
-- 'splitCommas' is also a generic function to split header values
-- on commas while trimming the allowed whitespaces
tags = splitCommas val
parseETag :: ByteString -> Maybe ETag
parseETag bs =
case BS.uncons bs of
Just (w8, rest)
-- Regular opaque tag in double quotes
| w8 == _quotedbl -> StrongTag <$> endIsQuote rest
-- Weak tag in double quotes prefixed with literal "W/"
| w8 == _W -> case BS.splitAt 2 rest of
(start, rest')
| start == "/\"" -> WeakTag <$> endIsQuote rest'
| otherwise -> Nothing
_ -> Nothing
where
endIsQuote x =
BS.unsnoc x >>= \(rest, w8) -> rest <$ guard (w8 == _quotedbl)
Parsing Q-values
import Data.ByteString as BS
import Data.ByteString.Char8 as BS8
import Data.Word8 (_0, _1, _period, _semicolon, _space)
-- | Only to be used on header's values which support quality value syntax
--
-- A few things to keep in mind when using this function:
-- * The resulting 'Int' will be anywhere from 1000 to 0 ("1" = 1000, "0.6" = 600, "0.025" = 25)
-- * The absence of a Q value will result in 'Just 1000'
-- * A bad parse of the Q value will result in a 'Nothing', e.g.
-- * Q value has more than 3 digits behind the dot
-- * Q value is missing
-- * Q value is higher than 1
-- * Q value is not a number
parseQValueList :: BS.ByteString -> [(BS.ByteString, Maybe Int)]
parseQValueList = fmap go . splitCommas
where
go = checkQ . S.break (== _semicolon)
checkQ :: (BS.ByteString, BS.ByteString) -> (BS.ByteString, Maybe Int)
checkQ (val, "") = (val, Just 1000)
checkQ (val, bs) =
-- RFC 7231 says optional whitespace can be around the semicolon.
-- So drop any before it , . and any behind it $ and drop the semicolon
(dropWhileEnd (== _space) val, parseQval . BS.dropWhile (== _space) $ BS.drop 1 bs)
where
parseQval qVal = do
q <- BS.stripPrefix "q=" qVal
(i, rest) <- BS.uncons q
guard $
i `elem` [_0, _1]
&& BS.length rest <= 4
case BS.uncons rest of
Nothing
-- q = "0" or "1"
| i == _0 -> Just 0
| i == _1 -> Just 1000
| otherwise -> Nothing
Just (dot, trail)
| dot == _period && not (i == _1 && BS.any (/= _0) trail) -> do
let len = BS.length trail
extraZeroes = replicate (3 - len) '0'
guard $ len > 0
readMaybe $ w2c i : BS8.unpack trail ++ extraZeroes
| otherwise -> Nothing
Now I have looked around a bit, but haven’t found a library/package that concerns themselves with these types of functions, so if you know of one, I’d love to hear about it. But in the case where this just isn’t found anywhere yet, here is my question for the general public:
Where should these types of functions go?
- Into the
http-types
package? Since it already has utility functions for byte ranges. - Into a new package called
http-header
(or the like) - Just keep them in the internals of
wai / warp
- (Some other option?)