Every once in a while I set out to write some code that involves e.g. some nontrivial index arithmetic, and I find the available toolbox for machine arithmetic clearly lacking. To be specific, I don’t mean any advanced number theory or anything, but rather simple things like addition and multiplication, viewed through the prism of representing mathematical integers on a computer: dealing with overflow, rounding, and other possible limitations of said representation.
For example, when adding two Ints, it can happen that the result is not itself representable as an Int: there is an “overflow” or “underflow”. Being a haskeller, I care about program correctness, and how my program responds to overflow is an important aspect of it. In different circumstances I may intend it to respond differently, here’s a few really common ways:
- Wrapping: perform the addition modulo
2^bitwidth(which in the case ofIntbeing a signed type implies using 2’s complement); - Trapping: if the addition was unsuccessful, terminate the entire computation (
error). - Checked: let me branch (pattern match) on whether the addition was successful or not, potentially also let me branch on whether it was overflow or underflow, etc;
- Saturating: return the
Intthat is closest to the mathematical result, meaningmaxBoundin case of overflow, andminBoundin case of underflow. - Undefined: trust me it’s not going to happen, generate the fastest implementation possible with this assumption.
The base library does not provide any way for me to specify that I want one of these things to happen. Worse yet, the functions that base provides aren’t even documented to belong to one of these categories. Most operations such as (+) @Int are wrapping, but this is not documented anywhere. fromInteger @Natural cannot be wrapping so it is trapping, and so is fromEnum @Natural. But fromEnum @Integer is wrapping.
Still, the program needs to be written, and while one could make pedantically correct implementations of those behaviors to the tune of:
let result = toInteger @Int a + toInteger @Int b
in if
| result > toInteger @Int maxBound
-> -- handle overflow
| result < toInteger @Int minBound
-> -- handle underflow
| otherwise
-> -- use `fromInteger @Int result` which is now guaranteed to be safe
these would be embarrassingly slow. And besides program correctness I care about program performance. So what often ends up happening (I’ve seen it in my code and in others’), is that you reach for GHC.Prim.addIntC# and reinvent the wheel, over and over again.
This situation is basically on par with C, where there’s only one +, it does something and you have to always keep in mind what; and where the recommended trick (!) for detecting overflow is a < a + b.
In the long term I would like the situation in base to improve, but in the interim a package would be great too. Having a concrete and proven-usable interface is a great vehicle for getting changes merged into base.
One of the goals is uncompromising performance, and the implementations may be tricky to optimize, so it would be better to have a centralized place where these optimizations would live, rather than everyone optimizing their own wheel they reinvented. Notably, said optimizations will necessarily have to interact closely with the compiler internals, may require new primops, so maybe having them be maintained inside the GHC source tree would be better? On the other hand, introducing a bunch of primops for which we don’t have a good interface may not be a good idea either.
My search on hackage for prior art only found checked, likely hopelessly outdated. The closest in spirit is actually integer-logarithms, but that only provides logarithms and powers.
I would like to invite some discussion about this topic, what such a package could look like, what changes to base could look like, etc. Here’s some of my initial thoughts:
- Prior to talking about typeclasses or other overloaded interfaces, we should talk about the monomorphic functions we’re trying to support. Having to explicitly spell out
checkedAddInt32ToInt32is better than having nothing to call at all because you couldn’t fit this function in a typeclass that mandates that the type must bea -> a -> a. - We may need more complex functions than the basic binary operators, but it’s difficult to define the exact scope.
- Some operations can be composed out of more basic building blocks, while others may not. For example, consider that a saturating multiply-add is different from a saturating multiply followed by a saturating add. But in wrapping mode they are the same. For another example, unsigned multiply-then-modulo never overflows, but is different than a wrapping multiply followed by a modulo.
- There’s always the reference implementation going through
Integer. Some operations might admit much more performant implementations, while others may not. For example, unsigned multiply-then-modulo is like 2 instructions onx86_64. - Some operations may have potential applications, while others do not.
- Heterogeneous operations such as adding signed to unsigned, absolute difference, 64x64->128 bit multiplication – are also important.
- It’s not clear to do with portability and in particular the
Intsize story. Depending on the platform it’s equivalent to eitherInt32orInt64. And then there’s ancient comments inGHC.Primabout how anIntcan be 30 bits due to tagging. - We should look to other standard libraries for inspiration:
- Rust: by far the best example, u32 has methods
add,strict_add,checked_add,carrying_add,wrapping_add,unchecked_add,saturating_add,overflowing_add. There’s also a lot of non-basic operations worth considering. - C++: has
add_satandckd_add. Alsomidpointis a non-basic operation worth considering. - Zig: Integer Overflow, there’s builtin operators
+,+%,+|, operations@addWithOverflowandmath.add.
- Rust: by far the best example, u32 has methods