What helped me was indexing 64 bits at a time. So don’t fold over the ByteString byte by byte, but use lower-level indexing like this (with B being Data.ByteString.Internal and OverloadedStrings enabled):
> x <- (\(B.BS ptr i) -> withForeignPtr ptr (\p -> peekElemOff (castPtr p :: Ptr Int) 0)) "hello world!"
> showHex x ""
"6f77206f6c6c6568"
But make sure that you apply a mask to the last element, so that bytes past the end of the string don’t end up in the value you read.
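Here is a minimal sketch of how the whole loop could look. The name words64 is made up for illustration, and it assumes bytestring >= 0.11 (for the B.BS constructor) and a little-endian machine that tolerates unaligned 64-bit reads. Instead of reading a full word at the end and masking it, it assembles the last partial word byte by byte. That gives the same value as a masked read but never touches memory past the end of the buffer.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word64, Word8)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff, peekElemOff)

-- | Hypothetical helper: view a ByteString as a list of 64-bit words.
-- Full words are read in host byte order (assumed little-endian here);
-- a trailing partial word is zero-padded in its high bytes.
words64 :: BS.ByteString -> IO [Word64]
words64 (B.BS fptr len) = withForeignPtr fptr $ \p -> do
  let (full, rest) = len `quotRem` 8
  -- Read every complete 8-byte chunk with a single 64-bit load.
  ws <- mapM (peekElemOff (castPtr p :: Ptr Word64)) [0 .. full - 1]
  if rest == 0
    then pure ws
    else do
      -- Read only the bytes that really belong to the string.
      bytes <- mapM (\i -> peekByteOff p (full * 8 + i) :: IO Word8)
                    [0 .. rest - 1]
      -- The first byte ends up in the lowest 8 bits, matching a
      -- little-endian 64-bit load with the excess bytes masked off.
      let tailWord = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 bytes
      pure (ws ++ [tailWord])
```

In GHCi, on a little-endian machine:
> ws <- words64 "hello world!"
> map (`showHex` "") ws
["6f77206f6c6c6568","21646c72"]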