Long-time lurker, first-time poster. I’ve been working for a while on a repo using parallelism in computer algebra at https://github.com/DaveBarton/calculi. I’ve been able to get speedups of around 30x on the Buchberger algorithm for Gröbner Bases, surprisingly easily using Haskell, while as far as I know no one else has successfully parallelized this algorithm at all. (There are other parallel algorithms for Gröbner Bases, “F4” and “F5” based, which I plan to work on next.) This is definitely of interest in the computer algebra community, and I’ve been invited to a workshop in late September where I’ll tell some of them about it. So I’m posting here to ask you all for advice and suggestions.
Others have suggested that parallelism might be the (or a first) killer app for Haskell, and I personally agree. In addition to our personal machines having 5-10+ performance cores in the next few years, researchers often have access to clusters of machines with 50-100 cores per NUMA node, and soon there will be more. I know some other Haskellers also have access to such machines. See the graph in this link, which shows the number of AMD and Intel performance cores per NUMA node. ARM is coming on strong as well, and CPUs with 100-200 cores have been announced for 2024, costing around $10K per CPU in bulk. Of course you have to buy a whole system, and pay for electricity and maintenance, but it’s still easily affordable for research labs, universities, etc. There’s open-source and proprietary software available for job submission and scheduling, etc.
Of course, floating-point supercomputers using GPUs etc. have now crossed the “exascale” milestone, which is 10^18 flops! That’s a billion billion floating-point operations per second! But this only works because of massive SIMD parallelism, e.g. 50,000+ GPUs each containing thousands of hardware multipliers. You also have to largely avoid branch instructions to get this to work, so it’s obviously not for general-purpose programming. (I’m not a real expert on this; if someone wants to correct me or explain it better, please do.)
I believe in functional programming, pure code, Hindley-Milner-based type checking, etc., because they let Haskell code be safer, more modular, and more general than code in other languages. But parallelism may be the way to convert the masses. Especially for shared-memory multiprocessing (as opposed to distributed or “grid” computing), Haskell really makes things easy.
Getting back to the original post, I have learned a lot of things about this. A quick summary: for obviously (“embarrassingly”) parallel algorithms, I do use `rpar` and `runEval`, at least, from `Control.Parallel.Strategies`, roughly along the lines of the sketches below. But I think that book’s approach, and GHC sparks in general, try too hard to maintain determinism. To get competitive times for Gröbner Bases, I had to fork my own threads, which was still pretty easy. I also learned other things: on 30+ cores you need to keep laziness away from your shared data structures, GC currently becomes a bottleneck, etc. I can/will say more about this, but this post seems long enough for now.
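To make the `rpar`/`runEval` point concrete, here’s a minimal, generic sketch; it is not code from my repo, and the names (`parPair`, `parMapChunked`) and the chunk size are just illustrative. The second function also shows the strictness point: forcing results to normal form with `rdeepseq` keeps lazy thunks out of whatever shared structures the results end up in.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies
  (parListChunk, rdeepseq, rpar, rseq, runEval, using)

-- Spark two independent computations; the rseq calls ensure both have been
-- evaluated (to WHNF) before we return the pair.
parPair :: (a -> b) -> a -> a -> (b, b)
parPair f x y = runEval $ do
  fx <- rpar (f x)
  fy <- rpar (f y)
  _  <- rseq fx
  _  <- rseq fy
  pure (fx, fy)

-- An embarrassingly parallel map: spark the work in chunks and force each
-- result to normal form, so no lazy thunks leak into shared data structures
-- later. The chunk size is whatever balances spark overhead vs. granularity.
parMapChunked :: NFData b => Int -> (a -> b) -> [a] -> [b]
parMapChunked chunkSize f xs = map f xs `using` parListChunk chunkSize rdeepseq
```

And since I mentioned forking my own threads: this is a hypothetical, stripped-down worker-pool shape, not my actual Gröbner Bases code, which does much more. The point is just that each result is fully forced on the worker thread, with `force` and `evaluate`, before it becomes visible anywhere else.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar, newEmptyMVar, newMVar, putMVar, takeMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Control.Monad (replicateM, replicateM_)

-- nWorkers threads repeatedly pull tasks from a shared queue; each worker
-- fully evaluates its result before anything else can see it. Real code
-- would also want exception handling, smarter work distribution, etc.
runWorkers :: NFData b => Int -> (a -> b) -> [a] -> IO [b]
runWorkers nWorkers f tasks = do
  queue <- newMVar tasks
  done  <- newEmptyMVar
  let worker acc = do
        mTask <- modifyMVar queue $ \ts -> case ts of
          []        -> pure ([], Nothing)
          (t : ts') -> pure (ts', Just t)
        case mTask of
          Nothing -> putMVar done (reverse acc)
          Just t  -> do
            r <- evaluate (force (f t))  -- no lazy thunks escape this thread
            worker (r : acc)
  replicateM_ nWorkers (forkIO (worker []))
  concat <$> replicateM nWorkers (takeMVar done)
```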
Finally, I’d like to mention and ask about the history of this. Apparently, Microsoft funded a Parallel GHC project in 2010-2012. Unfortunately, the funding ended, Simon Marlow went to Facebook, work on extending the parallel garbage collector to use local heaps stopped, etc. But for CPU-intensive programming, I still believe parallelism is the biggest problem in computer science, and Haskell is the best way to solve it. I’d really like to hear from others with experience in this area. (Sorry for the long note.)