One Billion Row Challenge in Haskell

I’d add that – and forgive me if I’m wrong – the performant solutions are mostly imperative + mutable, which – as a pleb – strikes me as indicating that it is currently not possible to write a performant declarative (functional, immutable) solution.

If this problem were to be a motivational case for improvement – speaking as a consumer, as opposed to a language researcher – I’d personally prioritise whatever changes would make a fast, immutable, declarative, single-threaded solution possible, over consideration of the parallel aspects.

Here is 1B rows in 25s in 7 LoC of obvious SQL: to the pragmatist, it speaks for itself.

4 Likes

I’m not sure what you mean here. Certainly, the SQL interface to DuckDB is declarative; I’m confident that the implementation is not! It is commonplace in Haskell to take an imperative implementation (written in Haskell or otherwise) and wrap it in a declarative API, so would you be satisfied with such a solution? Would a Haskell implementation in terms of an Opaleye- or Rel8-like typed, declarative API that uses DuckDB on the backend be satisfactory?

If not, would a declarative Haskell implementation in terms of imperative, mutable Haskell (the text, bytestring etc. approach) be satisfactory? Or are you really asking for an implementation that is “declarative all the way down”?

I agree with @tomjaguarpaw here. :slight_smile: At a low enough level, all Haskell code is mutable and procedural, since that’s the physical model used by all supported hardware systems GHC can run on!

The only thing stopping an idiomatic Haskell implementation from being as fast as the fastest C implementation is the appropriate library support. The C FFI exists just for this reason.
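
(As a concrete instance of that pattern, here is a raw binding of C’s memchr, essentially the binding bytestring uses internally for delimiter search; the hot loop runs in C, and a pure, declarative Haskell API is layered on top.)

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Data.Word (Word8)
    import Foreign.C.Types (CInt (..), CSize (..))
    import Foreign.Ptr (Ptr)

    -- void *memchr(const void *s, int c, size_t n): find the first
    -- occurrence of byte c in the n bytes starting at s.
    foreign import ccall unsafe "string.h memchr"
      c_memchr :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)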

To put it another way: right now, it’s clear that no Haskell implementation can match the fastest implementations in other languages (or other computation engines). The pragmatic challenge here is to identify where Haskell’s best-practice libraries have not yet been optimized for this use case, and optimize them appropriately.

…but lol, I guess that’s exactly what you were saying!

That sounds very close to what Tom and I are saying, so we must be on the same page. :slight_smile:

I guess I just focused on the deficiencies in parallel execution because that has always been a topic I’ve found very interesting.

1 Like

There is no such thing as “declarative all the way down”: at some point something has to tell the computer what to do step by step. However, if a language claims to be “declarative” then I expect to solve problems in it (mostly) declaratively. That is the case in relational paradigms, e.g. MiniZinc, SQL, Datalog and Prolog. Every language has an “and then we break the paradigm and do X instead” cut-off; I suppose I’m saying that here, for a very simple problem, that cut-off didn’t take long to reach.

Yes, exactly :slight_smile:. Additionally, though, note that almost all solutions, except for mine and perhaps one other, are by experts. Meanwhile, the naive solutions literally don’t work, because of the way thunks accumulate under lazy evaluation: “only evaluate what you need to” ends up causing no evaluation at all, since the program overflows memory first :slight_smile: When they eventually do work, they are exceptionally slow. I’ve coded Haskell for 4-5 weeks and I’ve run into this same problem in 3 different data-processing use cases (all on this forum).
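
(For anyone unfamiliar with the pitfall: here is a minimal sketch, mine rather than any solution from the thread. A naive lazy fold over a large input builds a chain of unevaluated thunks instead of a running total; the strict variant runs in constant space.)

    import Data.List (foldl')

    -- Naive: foldl builds the thunk (((0 + x1) + x2) + ...) across the
    -- whole input and only collapses it at the end, so memory usage
    -- grows linearly and large inputs overflow.
    leaky :: [Double] -> Double
    leaky = foldl (+) 0

    -- Strict: foldl' forces the accumulator at every step, so the sum
    -- runs in constant space.
    fine :: [Double] -> Double
    fine = foldl' (+) 0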

For example, check out the “Optimiser performance problems” thread, where I had both a memory leak and terrible performance. To resolve the memory leak I was advised by the experts to either use force from deepseq or adopt a specialised strict container structure (which many were unaware of), and I was told that the ad package is basically doomed to be slow because of the way it deals with some aspect of thunk management (I had noticed that it was 2x slower than an analytically calculated gradient for a 2-parameter problem!). Meanwhile, we are talking about <15 LoC of actual Haskell, and a noob like me obviously wonders what that means at scale, or with more complex problems :slight_smile:

2 Likes

Hmm:

…are there any extant proposals to add laziness to the SQL standard? If so, then we could use SQL as a target language (at least it would be portable :-).

1 Like

I’m inclined to agree, but I still don’t understand what you’re asking for. Is an Opaleye-like solution satisfactory? Or a bytestring-like solution?

I suspect that’s because it’s a simple problem. Once the problem becomes sufficiently complicated, the challenge becomes managing complexity in the implementation rather than the speed of bit twiddling. I expect Haskell to eclipse C in complex scenarios.

the naive solutions literally don’t work because of the way thunks accumulate in lazy evaluation

I don’t think we’ve seen any problem in this thread that was related to thunks accumulating, have we? There are indeed some strictness annotations, but they are about specifying whether individual thunks are allocated or not, not about whether long chains of thunks are allocated.

I remember one case (resolved at Optimiser performance problems - #30 by tomjaguarpaw). Could you share the two others?

The suggestion to use force from deepseq is absolutely wrong, and adopting the strict container structure (I wouldn’t call it “specialised”, I’d just call it “correct”) is absolutely the right solution (i.e. make invalid laziness unrepresentable). I am willing to die on this hill! In brief: if you don’t want thunks to appear in your data type then choose a data type that doesn’t allow thunks to appear.
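
(To make the slogan concrete, a minimal sketch of my own: with strict fields, a thunk can never be stored in the structure in the first place, so no force is ever needed.)

    -- Lazy fields: an update like Stats (n + 1) (total + x) stores two
    -- thunks, which pile up over a billion rows.
    data StatsLazy = StatsLazy Int Double

    -- Strict fields: both fields are forced whenever a value is
    -- constructed, so invalid laziness is unrepresentable.
    data Stats = Stats !Int !Double

    observe :: Stats -> Double -> Stats
    observe (Stats n total) x = Stats (n + 1) (total + x)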

No, that’s not right. I explained that the approach to AD that the ad package takes is doomed to be slow on vector problems. That has nothing to do with thunks.

It’s a curious thing. There’s one large group of people (mostly outside the Haskell world) that thinks that “Haskell is hard to use because laziness and thunks make it hard to reason about performance and space usage” and another group of people (mostly Haskell insiders) who think “Haskell is great, laziness has all sorts of benefits, and no problems in practice”. They’re both wrong! Laziness has severe implications when used incorrectly, but the solution is simple, not difficult: make invalid laziness unrepresentable. Unfortunately this message has not really percolated very far in the Haskell world. I will keep on trying.

To reiterate, Haskell will be better for more complex problems, not worse! After all, the simplest problems will always be fastest in hand-crafted assembly.

5 Likes

I think what we want is a language L that makes it easy to encapsulate performant code in reusable libraries, such that users can use performant library functions instead of rolling their own implementation. This way, only the library authors need to know how to write performant L code, not the users of that library.

Of course, in order to write these libraries, we

  1. have to start from a concrete motivation such as the 1brc problem
  2. come up with a fast solution (e.g., using hacky W8# in parseLine, perhaps unboxed byte strings; see the sketch after this list)
  3. generalise this solution into a library.
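
(As an illustration of the kind of low-level code step 2 involves, here is a sketch of mine, not the thread’s actual parseLine: splitting a "station;measurement" line by indexing raw bytes, with no intermediate lists or copies.)

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Unsafe as BU

    -- Split "StationName;-12.3" at the first ';' (byte 0x3B) by scanning
    -- raw Word8s; unsafeTake/unsafeDrop only adjust offsets, so nothing
    -- is copied.
    splitStation :: BS.ByteString -> (BS.ByteString, BS.ByteString)
    splitStation bs = go 0
      where
        !n = BS.length bs
        go !i
          | i >= n                      = (bs, BS.empty)
          | BU.unsafeIndex bs i == 0x3B = (BU.unsafeTake i bs, BU.unsafeDrop (i + 1) bs)
          | otherwise                   = go (i + 1)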

We are currently doing (2) for L = Haskell. If we can’t do (2), i.e. the solution is not fast enough, then we should think about how our language or its compilation can be improved in order to make it fast enough. (I like thinking about this.)
Only then can we think about (3), which is another hard problem; evidence for that is the abundance of INLINE pragmas in high-performance libraries, which results in code-size explosion.
I’m increasingly convinced that a reliable solution to (3) can only be achieved by libraries using staged compilation/Typed Template Haskell, but in a way that users of that library can stay mostly oblivious to staging (so no syntactic noise through splicing/quoting). @AndrasKovacs knows much about this.
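
(For readers who haven’t seen staging before, here is the classic toy example, a sketch of mine rather than anything from this thread: the recursion runs at compile time, so the residual program is a flat chain of multiplications with no loop or call overhead.)

    {-# LANGUAGE TemplateHaskell #-}

    import Language.Haskell.TH.Syntax (Code, Q)

    -- power n is an ordinary compile-time Int; only x survives into the
    -- generated code.
    power :: Int -> Code Q Int -> Code Q Int
    power 0 _ = [|| 1 ||]
    power n x = [|| $$x * $$(power (n - 1) x) ||]

    -- At a use site (necessarily in another module):
    --   \y -> $$(power 3 [|| y ||])   ===>   \y -> y * (y * (y * 1))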

C (likewise C++ and Fortran) is a language that makes (2) easy, but (3) hard. Hence, performant C code needs an expert C programmer. L should do better, enabling performant L code to be written by novice L programmers using the right libraries.

When using SQL to solve the problem, SQL is like the interface of a C library (DuckDB); its implementation is certainly not pretty, nor easily understood by its users. However, DuckDB is but one library, and it is not entirely trivial to interface it with custom data types from your problem domain in L. I’d rather have the means in L to define a competitive DBMS as a library, first class (within a factor of 2 of C, at most). But that needs us to solve (2) and (3).


So what’s causing problems with (2)? Why isn’t it “fast”, single-threaded? That’s what we need to figure out. (Parallelism is another problem.)

Edit: Andras claims below that the fastest single-threaded solution already performs like straightforward C solutions, so we might be “fast enough”. I’m not sure what the fastest C solutions do better; perhaps vectorisation etc., which is something of a pain point for Haskell (but it might not be for a staged solution).

3 Likes

If we’re really unable to resolve (2), that would be very surprising to me. My expectation is that it’s possible to write low-level Haskell code that is as fast as the “same approach implemented in C”, and that it will typically be possible to wrap that code in a functional/declarative API that provides a reasonable level of reuse.

1 Like

Our single-threaded version earlier in this thread had the same speed as straightforward C solutions.

3 Likes

Ah good. At more than 170 comments it’s hard to keep track of everything! (And Discourse doesn’t even load the whole page at once – it uses “progressive” (or “regressive”?) loading.)

Yes, the “fastest” C solution above is mostly about 256-bit SIMD.

But I would prefer not to have to rely on (T)TH for performance; GHC is unbearably slow already.

1 Like

I think (T)TH being slow is not inherent to the approach; it’s just that its specification/implementation needs love. I would even say that good support for staging should yield a much faster compiler, because we could dial down the number of Simplifier runs and their complexity.

4 Likes

Hrm… to choose between “library bloat” (having several similar libraries which differ only in their strictness properties), or ye ol’ code bloat (having several similar groups of definitions which differ only in certain other properties)? As noted in this presentation:

https://www.youtube.com/watch?v=2PxsyWqZ5dI

…just imagine if we had to choose between several similar memory management libraries which only differed in their specific properties: thankfully, we don’t! Instead, it’s a part of the implementation.


Suppose multi-threading by default:

https://github.com/ghc-proposals/ghc-proposals/pull/240

had already been implemented - would there really still be as much interest in “wringing every last bit” of performance out of the old single-threaded RTS? Or would we be:

  • advising people on how to migrate their old “unthreaded” code to the new default;

  • and discussing how to get the best multi-threaded performance out of the RTS?


It seems to me that you would merely be exchanging many short simplifier runs for a few much-longer simplifier runs, because of that extra complexity.

Oh I say, that’s a bit strong. deepseq is a tool with uses. While I agree that in that specific problem the strict vector was ultimately the right solution, consider a more complex scenario in which you sometimes want laziness in your data structures, or you don’t yet know which parts of your program are eating all of your memory with thunks and refactoring everything to be strict would be time-consuming. There, deepseq is an important tool to have at your disposal, even if just to try force-ing different things until you figure out what to refactor. (And sometimes invalid laziness has to be representable; Haskell’s type system is very good at letting you express invariants through types, but no type system is perfect at this.)

Certainly when you’re attempting to squeeze your thunks out by forcing everything through a show, as the OP was, force is something you ought to be told about.
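
(Here is a sketch of that diagnostic use, with a toy example of my own: force a suspect accumulator on every step, and if the space leak disappears you have found where the thunks pile up, and therefore which type to make strict.)

    import Control.DeepSeq (force)
    import Data.List (foldl')

    mean :: [Double] -> Double
    mean xs = s / fromIntegral n
      where
        -- foldl' forces the pair constructor, but its fields stay lazy,
        -- so without force both components accumulate thunks. Wrapping
        -- the step in force confirms that the leak lives here.
        (s, n) = foldl' (\acc x -> force (fst acc + x, snd acc + 1)) (0, 0 :: Int) xs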

I updated my gist with @AndrasKovacs’ hash. Now it runs at about the same speed as yours, but uses fewer intrinsics and doesn’t have short string optimisation (only a “short string” hash).

For such a bulk task (parsing CSV at 1 GB/s), not only does every memory access count, but also every instruction. Using a hash that works on 64-bit words instead of bytes means fewer instructions per byte of input.
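
(To illustrate the idea, here is an FNV-1a-style sketch of mine, not the gist’s actual hash: hashing 64-bit words performs one xor and one multiply per 8 bytes of input, instead of per byte.)

    import Data.Bits (xor)
    import Data.List (foldl')
    import Data.Word (Word64)

    -- FNV-1a over 64-bit words: one xor and one multiply per 8 input
    -- bytes, versus one pair of them per byte in the byte-at-a-time
    -- version.
    hashWords :: [Word64] -> Word64
    hashWords = foldl' step 0xcbf29ce484222325   -- FNV-1a offset basis
      where
        step h w = (h `xor` w) * 0x100000001b3   -- FNV-1a 64-bit prime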

2 Likes

deepseq is a time-leak machine! And many other languages have it built into their spec :skull:
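
(In case the “time leak” part is unclear, a toy sketch of mine: force re-traverses the entire structure on every call, even when it is already fully evaluated, so forcing inside a loop turns linear work into quadratic work.)

    import Control.DeepSeq (force)

    -- Each iteration re-traverses the whole accumulated list, so
    -- building n elements costs O(n^2) instead of O(n): a time leak.
    build :: Int -> [Int]
    build n = go n []
      where
        go 0 acc = acc
        go k acc = go (k - 1) (force (k : acc))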

Correct.

Consider let x = head (return x) in 'a':

  • “outside in” evaluation: the result is 'a' as it ought to be, since x isn’t needed here to obtain that result.

  • “inside out” evaluation: the result will always be ⊥ no matter how much memory or how many threads are available.
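
(In runnable form, my transcription of the example:)

    -- "Outside in": the binding is never demanded, so the result is 'a'.
    outsideIn :: Char
    outsideIn = let x = head (return x) :: Char in 'a'

    -- "Inside out": evaluate the binding first. x = head [x] = x, so
    -- this never terminates (GHC's RTS reports it as <<loop>>).
    insideOut :: Char
    insideOut = let x = head (return x) :: Char in x `seq` 'a'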

To alleviate this fundamental problem with “inside out” evaluation, Eager Haskell would periodically and completely restart (reduction of) the program. This “re-scorched earth” style had two competing requirements:

  • avoiding too many restarts, so that the program can actually run at all;

  • avoiding too few restarts, so the program won’t do too much unnecessary work (like trying to evaluate x = head (return x) or other similarly-cyclic definitions).

…a “delicate balancing act”.

We’ve gone off topic, but I invite anyone who sneers at deepseq to take a look at Fix space leak between modules during compilation by MonoidMusician · Pull Request #4517 · purescript/purescript · GitHub and recommend a better way to address that problem. We weren’t happy with the deep infiltration of NFData into our types and would love to hear from someone who can clearly see a less invasive approach.

(ahem) …speaking for myself here: I wasn’t so much “sneering” at (the use of) entities like deepseq, but at a prevailing impression that they (unfortunately) seem to accentuate:

  • that laziness/non-strict semantics are now more of an impediment than an ongoing benefit to Haskell,

  • and little if anything would be lost if Haskell were strict by default.

(Again, this is merely how I’m perceiving the current situation).

So it is to that impression that I say:

  • did someone solve the halting problem?

  • or is Haskell going to eventually be total as well as dependent?

Because without either of those, being strict by default generally means more programs won’t produce a useful result at all. Therefore definitions like deepseq (or compel, as I mentioned here), along with other “strictness modifiers”, should ideally be considered “features of last resort” because of their potential for non-termination.

So how can we get to that ideal? This looks promising:

…if the multi-threaded RTS does have to be overhauled to solve the “N > 4” problem observed by @chreekat and others, can any of Robert Ennals’ research be reused as well, so that comments like:

don’t end up being treated as basic advice when using Haskell?

1 Like

The result first: 3.9s!

11 threads (the number of cores is 10, although 2 of them are not “performance cores”):

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      3.934 s ±  0.007 s    [User: 36.062 s, System: 1.502 s]
  Range (min … max):    3.925 s …  3.943 s    5 runs

On my computer, -fllvm is still not faster than not using it (though for your version it is at least not slower).

Without LLVM:

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      4.344 s ±  0.009 s    [User: 32.554 s, System: 1.222 s]
  Range (min … max):    4.333 s …  4.355 s    5 runs

With LLVM (about the same):

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      4.346 s ±  0.005 s    [User: 32.559 s, System: 1.227 s]
  Range (min … max):    4.340 s …  4.353 s    5 runs

But now to the number of threads:

11 threads:

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      3.934 s ±  0.007 s    [User: 36.062 s, System: 1.502 s]
  Range (min … max):    3.925 s …  3.943 s    5 runs

12 threads:

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      3.951 s ±  0.015 s    [User: 36.132 s, System: 1.523 s]
  Range (min … max):    3.936 s …  3.970 s    5 runs

10 threads (the number of cores, although 2 of them are not “performance cores”):

Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      3.968 s ±  0.013 s    [User: 35.878 s, System: 1.449 s]
  Range (min … max):    3.954 s …  3.982 s    5 runs

14 threads:

Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      3.991 s ±  0.012 s    [User: 36.293 s, System: 1.555 s]
  Range (min … max):    3.979 s …  4.004 s    5 runs

16 threads:

Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      4.029 s ±  0.012 s    [User: 36.438 s, System: 1.589 s]
  Range (min … max):    4.020 s …  4.051 s    5 runs

20 threads:

Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      4.100 s ±  0.037 s    [User: 36.761 s, System: 1.631 s]
  Range (min … max):    4.054 s …  4.142 s    5 runs

8 threads:

hyperfine -w 1 -r 5  './exe-exe > solution.txt'
Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      4.344 s ±  0.009 s    [User: 32.554 s, System: 1.222 s]
  Range (min … max):    4.333 s …  4.355 s    5 runs

200 threads:

Benchmark 1: ./exe-exe > solution.txt
  Time (mean ± σ):      6.006 s ±  0.109 s    [User: 41.527 s, System: 3.630 s]
  Range (min … max):    5.921 s …  6.156 s    5 runs

1 Like