How can I refactor this simple recursive function?

dalendisto · May 22, 2022, 8:59am

Hello everyone.
I decided to learn Haskell by building small projects with concepts I am familiar with and then try to refactor them in different ways. Obviously not always successful . For two days now I am calmly staring at my carefully crafted code snippet, and thinking of ways (<*>, >>=, <>, fold, map, fmap) to refactor it. As far as I understand, running this function over a 2000 page book is not that efficient…is it? Any ideas anyone to point me into the right direction? Thanks a lot in advance.

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y ""  = []
splitIntoSequence y (x:xs) 
  | length xs < y = [x:xs]
  | otherwise = take y (x:xs) : splitIntoSequence y xs

splitIntoSequence 2 "Hello" = ["He", "el", "ll", "lo"]

f-a · May 22, 2022, 9:21am

Maybe

λ> s = "Ciao"
λ> zipWith (\a b -> [a,b]) s (tail s) -- edit: woops this works only for `2` but can be generalised
["Ci","ia","ao"]

As for efficiency: I note splitIntoSequence is not tail recursive.

jaror · May 22, 2022, 9:21am

The most noticeable thing is your use of length on the tail of the list (xs). Length is not cheap on linked lists like it is on arrays (which is the default “list” type in other languages). Instead you can speed it up massively by writing length xs < y as null (drop (y - 1) xs) (but note that it doesn’t work the same if y = 0). And also note that this makes your other base case redundant, so you could write:

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence y xs 
  | null (drop y xs) = [xs]
  | otherwise = take y xs : splitIntoSequence y (tail xs)

Although, I think the nicest way to write this is with zipWith:

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence 0 _ = repeat ""
splitIntoSequence _ "" = []
splitIntoSequence y xs = zipWith (:) xs (splitIntoSequence (y - 1) (tail xs))

Abab9579 · May 22, 2022, 9:24am

Hmm, I heard tail recursion is not beneficial for haskell specifically, as it is lazy.
(Oh so jaror talked about this!)

jaror · May 22, 2022, 9:25am

Tail recursion is not really important in Haskell. It does use guarded recursion / tail recursion modulo cons, so it is still fast, see the Haskell wiki:

The important concept to know in Haskell is guarded recursion (see tail recursion modulo cons), where any recursive calls occur within a data constructor (such as foldr, where the recursive call to foldr occurs as an argument to (:)). This allows the result of the function to be consumed lazily, since it can be evaluated up to the data constructor and the recursive call delayed until needed.

dalendisto · May 22, 2022, 9:34am

Hey, thanks you so very much. Excited to see what your ideas are. I will have follow up questions for sure if you don’t mind.

jaror · May 22, 2022, 9:37am

I think this is interesting:

ghci> List.transpose (take 2 (List.tails "Hello"))
["He","el","ll","lo","o"]

Unfortunately, I see no easy and efficient way to leave out that last element.

dalendisto · May 22, 2022, 9:49am

What I would do with my limited experience and knowledge is just plug that list into the init function?

init ["He","el","ll","lo","o"]

jaror · May 22, 2022, 10:08am

That requires an extra pass over the whole output which is a bit suboptimal. And for y = 2 that works but in general you need to do something like take (length xs - (y - 1)), which requires both a pass over the input and a pass over the output.

Anyway, I think the solution is to use a different transpose function which also truncates the input to a rectanglular list of lists:

transposeTrunc :: [[a]] -> [[a]]
transposeTrunc [] = repeat []
transposeTrunc (x:xs) = zipWith (:) x (transposeTrunc xs)

This will do the trick:

splitIntoSequence y xs = transposeTrunc (take y (List.tails xs))

With foldr you can make write that transposeTrunc as a one-liner:

splitIntoSequence y = foldr (zipWith (:)) (repeat []) . take y . List.tails

I think that is a very elegant solution.

dalendisto · May 22, 2022, 10:21am

Basically what I was aiming for but I still have to digest your solution. Thank a lot.

dalendisto · May 22, 2022, 5:18pm

This is also an interesting idea. Maybe I can make it work. Thanks

Vlix · May 23, 2022, 10:12pm

My immediate reaction would be to just do the length call once and use the length as an argument in a recursive helper function.
This might not be as “clean” and fast as the solutions that just foldr etc, but it might be a good one to know for certain situations

splitIntoSequence :: Int -> String -> [String]
splitIntoSequence _ [] = []
splitIntoSequence y str =
    go y (length str) str
  where
    go chunkLen remainingLen (x:xs)
      | remainingLen <= chunkLen = [x:xs]
      | otherwise = take y (x:xs) : go chunkLen (remainingLen - 1) xs

Could probably use a few guard in the top-level function to make sure the y is not negative, but this is the idea.

1chb · May 25, 2022, 6:59pm

splitIntoSequence k s = 
  takeWhile ((==k) . length) $
  take k <$> tails s

ScottSedgwick · May 29, 2022, 6:48am

Here’s a way to do it with folds that’s fairly readable:

import Data.List (tails)
splitIntoSequence :: Int -> [a] -> [[a]]
splitIntoSequence n xs = 
  let 
    f a b = if length a >= n then take n a : b else b
  in
    foldr f [] (tails xs)

Topic		Replies	Views
Writing recursion properly Learn	3	1717	October 25, 2021
Internalised Guarded Recursion for Equational Reasoning Links	5	495	October 22, 2022
Refactor Folds Tool? Learn	5	437	March 1, 2019
Using `length` in different positions seems to have different result Learn	4	384	September 28, 2021
Writing a Haskell code that compress the list Learn	3	906	October 6, 2022

How can I refactor this simple recursive function?

Related topics