Beautiful functional programming

simonpj · August 25, 2023, 4:08pm

I met Jose Valim, the creator of Elixir at Erlang Days a few weeks ago. He described a little challenge problem he designed where, in his view, an imperative solution just seems easier and more diret than a purely functional one. The problem is a pure data transformation, nothing imperative going on.

Here’s the challenge.

The algorithm should receive a list of sections. A section is a key-value data structure, with a “title”, a “reset_lesson_position” boolean, and a list of “lessons”. A lesson is a key-value data structure with the “name” field.
Your job is to traverse the list of sections, adding a position (starting from 1) to each section, and traverse the list of lessons adding a position (starting from 1) to each lesson. Note, however, the lessons position is shared across sections. The lesson position should also be reset if “reset_lesson_position” is true.

(I didn’t really understand this spec, but the example on the above link makes it clear.)

So does anyone want to have a go in Haskell? The repo contains solutions in 40-ish languages, including a lens-based one in Haskell, but I thought you all might enjoy a challenge.

Simon

rae · August 25, 2023, 7:29pm

I have to say I’m mildly offended that someone thinks this looks like a problem for which an imperative solution would appear to be easier than a functional one. I was expecting something involving indexing or arrays or something properly unwieldy. Maybe José hasn’t met mapAccumL? In any case, I’ve put my solution at Solution to https://discourse.haskell.org/t/beautiful-functional-programming/7411 · GitHub. Naturally this solution reallocates the entire structure instead of editing in place.

That was fun!

tristanC · August 25, 2023, 7:36pm

Using a good old go recursive function: Solution to https://discourse.haskell.org/t/beautiful-functional-programming/7411 · GitHub

tomjaguarpaw · August 25, 2023, 7:54pm

Given that mapAccumL is equivalent to a particular traverse in the State monad I wonder where we draw the line between functional and imperative code.

atravers · August 25, 2023, 8:59pm

I wonder where we draw the line between functional and imperative code.

Would the use of Reader to simply pass along some static data be considered “imperative” ?

jaror · August 25, 2023, 9:24pm

The description of the problem itself sounds utterly imperative. There’s even a literal imperative in the input schema: reset_lesson_position.

Perhaps the more interesting question is can we rephrase the problem in declarative terms?

For one, I’d remove the reset_lesson_position and instead make the input a list of lists where the sections are grouped such that inside each group the lesson position starts from 1:

g :: [(Bool, a)] -> [[a]]
g = map (map snd) . groupBy (\_ (x,_) -> not x)

A simplified variant of the problem is perhaps this:

Define f :: [[a]] -> [[(Int, a)]] such that for all a :: Type and x :: [[a]] we have map fst (concat (f x)) == [1 .. length x].

You can do this “the lens way” with a traversal which keeps track of state:

f :: [[a]] -> [[(Int, a)]]
f xs = evalState (traverse (traverse (\x -> do i <- get; put $! i + 1; pure (i, x))) xs) 1

You can even make it look much more imperative:

f :: [[a]] -> [[(Int, a)]]
f xss = (`evalState` 1) $ 
  for xss $ \xs -> 
    for xs $ \x -> do
      i <- get
      put $! i + 1
      pure (i, x)

Or alternatively, perhaps lesser known is “the attribute grammar way”, tying a recursive knot:

f :: [[a]] -> [[(Int, a)]]
f xss = yss where
  (iss, yss) = unzip (zipWith zipKeep ([1..] : iss) xss)

  zipKeep :: [a] -> [b] -> ([a], [(a, b)])
  zipKeep xs [] = (xs, [])
  zipKeep (x:xs) (y:ys) = let (xs', zs) = zipKeep xs ys in (xs', (x, y) : zs)

Here’s my full solution:

import Data.List (groupBy)
import Data.Bifunctor

data Section a b = Section String a [Lesson b] deriving Show
data Lesson a = Lesson String a deriving Show

f :: [Section a b] -> [Section a Int]
f xss = yss where
  (iss, yss) = unzip (zipWith (\i (Section n v ls) -> second (Section n v) (zipKeep i ls)) ([1..] : iss) xss)

  zipKeep :: [a] -> [Lesson b] -> ([a], [Lesson a])
  zipKeep xs [] = (xs, [])
  zipKeep (x:xs) (Lesson y _:ys) = let (xs', zs) = zipKeep xs ys in (xs', Lesson y x : zs)

g :: (b -> Bool) -> [Section b a] -> [[Section b a]]
g f = groupBy (\_ (Section _ x _) -> not (f x))

h :: [Section a b] -> [Section (Int, a) b]
h = zipWith (\i (Section n v ls) -> Section n (i, v) ls) [1..]

solve :: [Section Bool b] -> [Section Int Int]
solve = concat . map f . map (map (\(Section n (v, _) ls) -> Section n v ls)) . g snd . h

ghci> solve [Section "Getting Started" False [Lesson "Welcome" (), Lesson "Installation" ()], Section "Basic operator" False [Lesson "Addition / Subtraction" (), Lesson "Multiplication / Division" ()], Section "Advanced topics" True [Lesson "Mutability" (), Lesson "Immutability" ()]]
[Section "Getting Started" 1 [Lesson "Welcome" 1,Lesson "Installation" 2],Section "Basic operator" 2 [Lesson "Addition / Subtraction" 3,Lesson "Multiplication / Division" 4],Section "Advanced topics" 3 [Lesson "Mutability" 1,Lesson "Immutability" 2]]

I like that it decomposes the problem, but I don’t like that many components are not very reusable and there are some awkward reshaping parts necessary. I guess I’d need to grab some lenses or attribute grammars to really improve that.

Oh, you can introduce the reusable Flat type and that makes it a lot prettier:

data Flat a = Flat Int [(a, Int)] deriving Functor

zipFlat :: (a -> b -> c) -> [a] -> Flat b -> Flat c
zipFlat f xs (Flat n ys) = Flat n (zipWith (\x' (y', z') -> (f x' y', z')) xs ys)

nest :: Flat a -> [[a]]
nest (Flat n xs) = replicate n [] ++ concatMap (\(x, n) -> [x] : replicate n []) xs

flat :: [[a]] -> Flat a
flat [] = Flat 0 []
flat xs = (\(x, y) -> Flat (length x) ((\(Flat _ x) -> x) (flat y))) (span null xs)

solve :: [((String, [String]), Bool)] -> [((String, Int), [(String, Int)])]
solve = 
  concatMap 
    ( map (\((x,y):xs) -> (x, y: map snd xs))
    . nest 
    . zipFlat (\i (x, y) -> (x, (y, i))) [1..] 
    . flat 
    . map sequence
    )
  . nest
  . zipFlat (\i (x, y) -> ((x, i), y)) [1..]
  . Flat 0
  . map (second (\x -> if x then 1 else 0))

However, this still has a few bugs in it. Can you spot them?

AntC2 · August 25, 2023, 10:51pm

Yeah! What’s more, it’s described expecting the solution will update in situ. (So perhaps @rae’s solution won’t count?)

Simon’s quoting the instructions. But ‘key-value’ is just not true: they’re both sequential data structures – they even have repeated (therefore not) keys “name”.

Shall I blah on about youth of today not having proper schooling in data analysis …

Just a terrible way to pitch the question.

jameshaydon · August 25, 2023, 10:59pm

As pointed out by @jaror, the spec is quite naturally expressed with equations that look like:

The leads me to wonder, would a functional logic language such as Verse have an elegant solution to this?

Liamzy · August 26, 2023, 12:31am

The comedy is that it still probably performs better than a mutable implementation in Elixir.

AntC2 · August 26, 2023, 3:40am

Can I ask the assembled brains trust: is there anything real-life plausible about that input schema? Have you met a requirements design with that sort of ‘feature’? (I haven’t – at least not from any data analyst who remained in my employment.)

reset_lesson_position is not a property of the title entity. It’s a property of the position of the title in the text file. IOW (and given a longer/more realistic set of chapters) shuffling around the title sub-trees would just scramble the numbering. What’s really going on is that the book is in groups of chapters; numbering to restart for each group. Again, poor data analysis.

So the ‘challenge problem’ is artificial, contrived to push some polemic agenda for imperative solutions – at which it fails even so.

ReleaseCandidate · August 26, 2023, 5:38am

Here it is, from a Hacker News post:
https://news.ycombinator.com/item?id=26776786

The problem came from implementing a real feature into a course platform.
The data structure contains a list of sections and lessons to build a table of contents. The position stores the order they are being displayed in.
In the course viewing page lessons are annotated with a “#N” where N is the position and it goes from 1 all the way until the last lesson’s count. This way folks could be watching the course and be like “Hey, I’m on lesson #56 and have a question”.
But on the course description page the table of contents was displayed in such a way where the position / lesson count was reset for each section.
Although in the real life implementation this code is wrapped into a function and the reset_lesson_position is being supplied by a boolean function argument which controls whether or not the lesson’s position gets reset for every section. We added it as an attribute to the section directly for the sake of this example to keep the problem more scoped to the core idea of the problem.

https://news.ycombinator.com/item?id=26778287

I still don’t see the need for these “reset_lesson_position” field. Or, to rephrase, why you would want to restart the index for just some lessons and not all?

jhenahan · August 26, 2023, 6:42am

I think the most Haskell-brained thing in every programming thought experiment is to see a specification and immediately start building a better data structure to solve the problem rather than submit to a poor model.

From the specification, you get a (flat) list of lessons and must produce a collection of sections, each of which contains a collection of lessons. The desire in the end is to render a table of contents with sections numbered from 1, containing lessons numbered from 1. The odd boolean field is how you know it’s time to create a new section.

ReleaseCandidate · August 26, 2023, 7:01am

Yes, I understood this part. What I didn’t understand has been why you would want to do that.
But now I understand that a course can have more than one section and this “rest_bla” marks the beginning of a new course. Which does make even less sense in my opinion

Nycticorax · August 26, 2023, 8:30am

I think the few submissions mentioned above suggest nicely that the original challenge is a poor example of “harder / less elegant for functional languages”. I think a much better example would be to implement a (non-CLI) GUI easily portable to any major platform. This is pretty standard requirement in the industry, and one for which I think Haskell is in very bad shape in comparison to C++ / Rust / TypeScript / Python.

Of course, the difficulty comes almost entirely from the fact that portable GUI often requires turning to frameworks which are not Haskell-friendly or not even usable from Haskell, albeit through convoluted techniques. But I’d argue this doesn’t make it any less of a respectable problem for Haskell and functional languages in general.

AntC2 · August 26, 2023, 9:15am

Well, that’s real-life in the sense they got the design wrong, then tried to kludge it by programming round the mis-design. Not what I meant by “real-life”.

Nothing specific to “Haskell-brained”. Professional systems design says to fit the data structure to the business situation. Yes it’s often difficult to understand the whole business situation/you likely won’t get there at the first try. Doesn’t mean you should put up with a mis-fit.

So I’m not after a “better” data structure, just one that’s adequate to the business requirements.

What triggered my line of questions is trying to design a (relational) key-value store. The ‘example input’ contains no key-alike that would keep the nodes flagged reset_lesson_position in the sequenced position. So it’s not a key-value store; it’s a sequential store. Nothing necessarily wrong with that; just don’t mis-represent the problem.

Anyhoo Haskell has plenty of power to scan elegantly over sequential/JSON data.

maxigit · August 26, 2023, 9:25am

The fact that the problem is artifical, contrivied or asked in with an imperative biased is irrelevant isn’t ?

atravers · August 26, 2023, 10:08am

AntC2 · August 26, 2023, 10:44am

Sure. We have input data as a sequence of sequences. This naturally leads to a Haskell representation as a List of Lists. Functional languages (since LISP) are excellent at elegantly iterating (recursing) over lists, carrying state as they go. Haskell is the world’s finest imperative programming language, after all.

And we have a dude Jose Valim who seems not to know how to describe programming problems or fit a data design to a requirement. Irrelevant.

Addit: in response to the comment just below

I wasn’t commenting on anybody’s program-writing. I’m commenting (as had several others before me [**]) on the statement of requirements; the data model; the design process by which a requirement uncovered late in the development wasn’t reflected back into the data model. Development isn’t all programming – indeed as I expect we’ve all experienced – programming alone can’t rescue a design that’s a poor fit to the use-case. (And the elegant Haskell programs shown here can only be elegant because the use-case isn’t a key-value store, contra the claims in the design doco.)

[**] Indeed all the criticisms here had already been made 2½ years ago in the YCombinator discussions that @ReleaseCandidate linked to. Including disagreeing with Jose’s claim that @simonpj’s intro repeats “an imperative solution just seems easier and more diret [sic] than a purely functional one”. In particular:

… when you’re stuck with a poorly defined data structure, …
The whole problem seems to be bizarrely defined when it comes to data structures.
I’m not sure the problem reflect anything, …
IMO The problem is poorly defined. Data structures should avoid control flow instructions like “reset_lesson_position”

The author is aware of the fact that mutability makes this kind of problem easier

Not clear from the spec that mutating rather than building a fresh output stream is required. (In situ mutating is what the example Python script seems to be doing – including inserting an extra tagged value.) But to suggest mutability means the input must be some sort of key-value store, not tackled as sequential JSON-to-fresh-JSON. I don’t see how the input JSON as given could represent contents of a key-value store. (There’s no keys.)

So why after all that feedback 2½ years ago is O.P. still at programming conferences hawking this as a meaningful exercise, and making claims that have been discredited about imperative solutions?

jaror · August 26, 2023, 12:06pm

@AntC2 Let’s accept that there are people with different views on how to write programs and keep our criticism constructive with objective arguments instead of just saying other people are incompetent.

Edit: the comment this was referring to has now been edited.

taylor · August 26, 2023, 2:58pm

Yes it is contrived but it’s fun to see how concise you can get it. mapAccumL was a good tip:

{-# LANGUAGE 
      DuplicateRecordFields 
    , RecordWildCards 
    #-}

module Lib where 

import Data.Text (Text)
import Data.List (mapAccumL) 
import GHC.Generics 
import Data.Aeson


data Chapter = Chapter {
      title :: Text
    , reset_lesson_position :: Bool
    , lessons :: [Lesson]
    , position :: Maybe Int
    } deriving (Generic)
instance FromJSON Chapter 
instance ToJSON Chapter

data Lesson = Lesson {
      name :: Text
    , position :: Maybe Int
    } deriving (Generic)
instance FromJSON Lesson
instance ToJSON Lesson

book :: [Chapter] -> [Chapter]
book = snd . mapAccumL fo (1,1) 
    where 
    fo :: (Int, Int) -> Chapter -> ((Int, Int) , Chapter)
    fo (p1, p2) c = 
        let p3 = if reset_lesson_position c then 1 else p2
        in ( (p1 + 1, p3 + (length . lessons $ c))
           , c { position = Just p1
              , lessons = lo p3 (lessons c)  
              }
           )
          
    lo :: Int -> [Lesson] -> [Lesson]
    lo p = snd . mapAccumL no p

    no :: Int -> Lesson -> (Int, Lesson)
    no p l = (p + 1, l { position = Just p })