Hello everyone.
I decided to learn Haskell by building small projects with concepts I am familiar with and then try to refactor them in different ways. Obviously not always successful . For two days now I am calmly staring at my carefully crafted code snippet, and thinking of ways (<*>, >>=, <>, fold, map, fmap) to refactor it. As far as I understand, running this function over a 2000 page book is not that efficientâŠis it? Any ideas anyone to point me into the right direction? Thanks a lot in advance.
splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y "" = []
splitIntoSequence y (x:xs)
| length xs < y = [x:xs]
| otherwise = take y (x:xs) : splitIntoSequence y xs
splitIntoSequence 2 "Hello" = ["He", "el", "ll", "lo"]
The most noticeable thing is your use of length on the tail of the list (xs). Length is not cheap on linked lists like it is on arrays (which is the default âlistâ type in other languages). Instead you can speed it up massively by writing length xs < y as null (drop (y - 1) xs) (but note that it doesnât work the same if y = 0). And also note that this makes your other base case redundant, so you could write:
splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y xs
| null (drop y xs) = [xs]
| otherwise = take y xs : splitIntoSequence y (tail xs)
Although, I think the nicest way to write this is with zipWith:
Tail recursion is not really important in Haskell. It does use guarded recursion / tail recursion modulo cons, so it is still fast, see the Haskell wiki:
The important concept to know in Haskell is guarded recursion (see tail recursion modulo cons), where any recursive calls occur within a data constructor (such as foldr, where the recursive call to foldr occurs as an argument to (:)). This allows the result of the function to be consumed lazily, since it can be evaluated up to the data constructor and the recursive call delayed until needed.
That requires an extra pass over the whole output which is a bit suboptimal. And for y = 2 that works but in general you need to do something like take (length xs - (y - 1)), which requires both a pass over the input and a pass over the output.
Anyway, I think the solution is to use a different transpose function which also truncates the input to a rectanglular list of lists:
My immediate reaction would be to just do the length call once and use the length as an argument in a recursive helper function.
This might not be as âcleanâ and fast as the solutions that just foldr etc, but it might be a good one to know for certain situations
splitIntoSequence :: Int -> String -> [String]
splitIntoSequence _ [] = []
splitIntoSequence y str =
go y (length str) str
where
go chunkLen remainingLen (x:xs)
| remainingLen <= chunkLen = [x:xs]
| otherwise = take y (x:xs) : go chunkLen (remainingLen - 1) xs
Could probably use a few guard in the top-level function to make sure the y is not negative, but this is the idea.
Hereâs a way to do it with folds thatâs fairly readable:
import Data.List (tails)
splitIntoSequence :: Int -> [a] -> [[a]]
splitIntoSequence n xs =
let
f a b = if length a >= n then take n a : b else b
in
foldr f [] (tails xs)