I think what we want is a language L that makes it easy to encapsulate performant code in reusable libraries, such that users can use performant library functions instead of rolling their own implementation. This way, only the library authors need to know how to write performant L code, not the users of that library.
Of course, in order to write these libraries, we
- have to start from a concrete motivation such as the 1brc problem
- come up with a fast solution (e.g., using hacky
W8#
in parseLine
, perhaps unboxed byte strings)
- generalise this solution into a library.
We are currently doing (2) for L=Haskell. If we canāt do (2), a.k.a., the solution is not fast enough, then we should think about how our language/its compilation can be improved in order to make it fast enough. (I like thinking about this.)
Only then can we think about (3), which is another hard problem. Evidence is the abundance of INLINE
pragmas in high-perf libraries that result in code size explosion.
Iām increasingly convinced that a reliable solution to (3) can only be achieved by libraries using staged compilation/Typed Template Haskell, but in a way that users of that library can stay mostly oblivious to staging (so no syntactic noise through splicing/quoting). @AndrasKovacs knows much about this.
C (C++, Fortran) is a language that makes (2) easy, but (3) hard. Hence, performant C code needs an expert C programmer. L should do better, enabling performant L code written by novice L programmers using the right libraries.
When using SQL to solve the problem, SQL is like the interface of a C library (DuckDB); its implementation is certainly not pretty nor easily understood by its users. However, DuckDB is but one library, and it is not entirely trivial to interface with custom data types/your problem domain in L. Iād rather have the means in L to define a competitive DBMS as a library, first class (factor 2 at most compared to C). But that needs us solving (2) and (3).
So whatās causing problems with (2)? Why isnāt it āfastā, single-threaded? Thatās what we need to figure out. (Parallelism is another problem.)
Edit: Andras claims below that the fastest single-threaded solution performs like straightforward C solutions already, so we might be āfast enoughā. Not sure what the fastest C solutions do better; perhaps vectorisation etc. which is kind of a pain point for Haskell (but it might not be for a staged solution).