How can I refactor this simple recursive function?

Hello everyone.
I decided to learn Haskell by building small projects with concepts I am familiar with and then try to refactor them in different ways. Obviously not always successful :joy:. For two days now I am calmly staring at my carefully crafted code snippet, and thinking of ways (<*>, >>=, <>, fold, map, fmap) to refactor it. As far as I understand, running this function over a 2000 page book is not that efficient…is it? Any ideas anyone to point me into the right direction? Thanks a lot in advance.

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y ""  = []
splitIntoSequence y (x:xs) 
  | length xs < y = [x:xs]
  | otherwise = take y (x:xs) : splitIntoSequence y xs

splitIntoSequence 2 "Hello" = ["He", "el", "ll", "lo"]

Maybe

λ> s = "Ciao"
λ> zipWith (\a b -> [a,b]) s (tail s) -- edit: woops this works only for `2` but can be generalised
["Ci","ia","ao"]

As for efficiency: I note splitIntoSequence is not tail recursive.

1 Like

The most noticeable thing is your use of length on the tail of the list (xs). Length is not cheap on linked lists like it is on arrays (which is the default “list” type in other languages). Instead you can speed it up massively by writing length xs < y as null (drop (y - 1) xs) (but note that it doesn’t work the same if y = 0). And also note that this makes your other base case redundant, so you could write:

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y xs 
  | null (drop y xs) = [xs]
  | otherwise = take y xs : splitIntoSequence y (tail xs)

Although, I think the nicest way to write this is with zipWith:

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence 0 _ = repeat ""
splitIntoSequence _ "" = []
splitIntoSequence y xs = zipWith (:) xs (splitIntoSequence (y - 1) (tail xs))
1 Like

Hmm, I heard tail recursion is not beneficial for haskell specifically, as it is lazy.
(Oh so jaror talked about this!)

Tail recursion is not really important in Haskell. It does use guarded recursion / tail recursion modulo cons, so it is still fast, see the Haskell wiki:

The important concept to know in Haskell is guarded recursion (see tail recursion modulo cons), where any recursive calls occur within a data constructor (such as foldr, where the recursive call to foldr occurs as an argument to (:)). This allows the result of the function to be consumed lazily, since it can be evaluated up to the data constructor and the recursive call delayed until needed.

3 Likes

Hey, thanks you so very much. Excited to see what your ideas are. I will have follow up questions for sure if you don’t mind.

I think this is interesting:

ghci> List.transpose (take 2 (List.tails "Hello"))
["He","el","ll","lo","o"]

Unfortunately, I see no easy and efficient way to leave out that last element.

What I would do with my limited experience and knowledge is just plug that list into the init function?

init ["He","el","ll","lo","o"]

That requires an extra pass over the whole output which is a bit suboptimal. And for y = 2 that works but in general you need to do something like take (length xs - (y - 1)), which requires both a pass over the input and a pass over the output.

Anyway, I think the solution is to use a different transpose function which also truncates the input to a rectanglular list of lists:

transposeTrunc :: [[a]] -> [[a]]
transposeTrunc [] = repeat []
transposeTrunc (x:xs) = zipWith (:) x (transposeTrunc xs)

This will do the trick:

splitIntoSequence y xs = transposeTrunc (take y (List.tails xs))

With foldr you can make write that transposeTrunc as a one-liner:

splitIntoSequence y = foldr (zipWith (:)) (repeat []) . take y . List.tails

I think that is a very elegant solution.

Basically what I was aiming for but I still have to digest your solution. Thank a lot.

1 Like

This is also an interesting idea. Maybe I can make it work. Thanks

My immediate reaction would be to just do the length call once and use the length as an argument in a recursive helper function.
This might not be as “clean” and fast as the solutions that just foldr etc, but it might be a good one to know for certain situations

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence _ [] = []
splitIntoSequence y str =
    go y (length str) str
  where
    go chunkLen remainingLen (x:xs)
      | remainingLen <= chunkLen = [x:xs]
      | otherwise = take y (x:xs) : go chunkLen (remainingLen - 1) xs

Could probably use a few guard in the top-level function to make sure the y is not negative, but this is the idea.

splitIntoSequence k s = 
  takeWhile ((==k) . length) $
  take k <$> tails s
1 Like

Here’s a way to do it with folds that’s fairly readable:

import Data.List (tails)
splitIntoSequence :: Int -> [a] -> [[a]]
splitIntoSequence n xs = 
  let 
    f a b = if length a >= n then take n a : b else b
  in
    foldr f [] (tails xs)