Improving Machine Arithmetic

Every once in a while I set out to write some code that involves e.g. some nontrivial index arithmetic, and I find the available toolbox for machine arithmetic clearly lacking. To be specific, I don’t mean any advanced number theory or anything, but rather simple things like addition and multiplication, viewed through the prism of representing mathematical integers on a computer: dealing with overflow, rounding, and other possible limitations of said representation.

For example, when adding two Ints, it can happen that the result is not itself representable as an Int: there is an “overflow” or “underflow”. Being a haskeller, I care about program correctness, and how my program responds to overflow is an important aspect of it. In different circumstances I may intend it to respond differently; here are a few really common ways (sketched in code after the list):

  • Wrapping: perform the addition modulo 2^bitwidth (which in the case of Int being a signed type implies using 2’s complement);
  • Trapping: if the addition was unsuccessful, terminate the entire computation (error);
  • Checked: let me branch (pattern match) on whether the addition was successful or not, and potentially also on whether it was overflow or underflow, etc.;
  • Saturating: return the Int that is closest to the mathematical result, meaning maxBound in case of overflow and minBound in case of underflow;
  • Undefined: trust me, it’s not going to happen; generate the fastest implementation possible under this assumption.
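To make these concrete, here is a minimal sketch of what the first four behaviours could look like for Int addition, built on the addIntC# primop from GHC.Exts (the helper names are invented for illustration):

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts (Int (I#), addIntC#)

-- Wrapping: this is what (+) @Int already does.
wrappingAdd :: Int -> Int -> Int
wrappingAdd = (+)

-- Checked: branch on whether the addition overflowed.
checkedAdd :: Int -> Int -> Maybe Int
checkedAdd (I# x) (I# y) = case addIntC# x y of
  (# r, 0# #) -> Just (I# r)
  (# _, _  #) -> Nothing

-- Trapping: terminate the computation if the result is not representable.
trappingAdd :: Int -> Int -> Int
trappingAdd x y = case checkedAdd x y of
  Just r  -> r
  Nothing -> error "trappingAdd: Int overflow"

-- Saturating: clamp to maxBound (overflow) or minBound (underflow).
saturatingAdd :: Int -> Int -> Int
saturatingAdd x y = case checkedAdd x y of
  Just r              -> r
  Nothing | y > 0     -> maxBound
          | otherwise -> minBound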

The base library does not provide any way for me to specify that I want one of these things to happen. Worse yet, the functions that base provides aren’t even documented to belong to one of these categories. Most operations such as (+) @Int are wrapping, but this is not documented anywhere. fromInteger @Natural cannot be wrapping so it is trapping, and so is fromEnum @Natural. But fromEnum @Integer is wrapping.
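For example, on a 64-bit GHC a GHCi session shows something like this (exact output may vary by version):

ghci> import Numeric.Natural
ghci> maxBound + 1 :: Int
-9223372036854775808
ghci> fromInteger (-1) :: Natural
*** Exception: arithmetic underflow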

Still, the program needs to be written, and while one could make pedantically correct implementations of those behaviors to the tune of:

-- (MultiWayIf syntax; a, b :: Int)
let result = toInteger @Int a + toInteger @Int b
in if
  | result > toInteger @Int maxBound ->
      -- handle overflow
  | result < toInteger @Int minBound ->
      -- handle underflow
  | otherwise ->
      -- use `fromInteger @Int result`, which is now guaranteed to be safe

these would be embarrassingly slow. And besides program correctness, I care about program performance. So what often ends up happening (I’ve seen it in my code and in others’) is that you reach for GHC.Prim.addIntC# and reinvent the wheel, over and over again.

This situation is basically on par with C, where there’s only one +, it does something, and you always have to keep in mind what; and where the recommended trick (!) for detecting unsigned overflow is checking whether a + b < a.

In the long term I would like the situation in base to improve, but in the interim a package would be great too. Having a concrete and proven-usable interface is a great vehicle for getting changes merged into base.

One of the goals is uncompromising performance, and the implementations may be tricky to optimize, so it would be better to have a centralized place where these optimizations live, rather than everyone optimizing their own reinvented wheel. Notably, said optimizations will necessarily have to interact closely with compiler internals and may require new primops, so perhaps maintaining them inside the GHC source tree would be better? On the other hand, introducing a bunch of primops for which we don’t have a good interface may not be a good idea either.

My search on Hackage for prior art only found checked, which is likely hopelessly outdated. The closest in spirit is actually integer-logarithms, but that only provides logarithms and powers.

I would like to invite some discussion about this topic: what such a package could look like, what changes to base could look like, etc. Here are some of my initial thoughts:

  • Prior to talking about typeclasses or other overloaded interfaces, we should talk about the monomorphic functions we’re trying to support. Having to explicitly spell out checkedAddInt32ToInt32 is better than having nothing to call at all because you couldn’t fit this function into a typeclass that mandates that the type must be a -> a -> a. (A few such monomorphic candidates are sketched after this list.)
  • We may need more complex functions than the basic binary operators, but it’s difficult to define the exact scope.
    • Some operations can be composed out of more basic building blocks, while others cannot. For example, a saturating multiply-add is different from a saturating multiply followed by a saturating add, but in wrapping mode they are the same. For another example, unsigned multiply-then-modulo never overflows, but is different from a wrapping multiply followed by a modulo.
    • There’s always the reference implementation going through Integer. Some operations might admit much more performant implementations, while others may not. For example, unsigned multiply-then-modulo is about 2 instructions on x86_64.
    • Some operations have clear applications, while others do not.
  • Heterogeneous operations, such as adding a signed to an unsigned integer, absolute difference, and 64x64->128-bit multiplication, are also important.
  • It’s not clear what to do about portability, and in particular the Int size story. Depending on the platform, Int is equivalent to either Int32 or Int64. And then there are ancient comments in GHC.Prim about how an Int can be 30 bits due to tagging.
  • We should look to other standard libraries for inspiration:
    • Rust: by far the best example, u32 has methods add, strict_add, checked_add, carrying_add, wrapping_add, unchecked_add, saturating_add, overflowing_add. There’s also a lot of non-basic operations worth considering.
    • C++: has add_sat and ckd_add. Also midpoint is a non-basic operation worth considering.
    • Zig: see its documentation on Integer Overflow; there are the built-in operators +, +% and +|, plus @addWithOverflow and math.add.
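To illustrate the kind of monomorphic (and sometimes heterogeneous) operations I have in mind, here are a few invented names with naive Integer-based reference implementations, exactly the sort a real library would want to optimize:

import Data.Int  (Int32, Int64)
import Data.Word (Word)

-- Checked addition of two Int32s.
checkedAddInt32ToInt32 :: Int32 -> Int32 -> Maybe Int32
checkedAddInt32ToInt32 a b =
  let r = toInteger a + toInteger b
  in if r > toInteger (maxBound :: Int32) || r < toInteger (minBound :: Int32)
       then Nothing
       else Just (fromInteger r)

-- Saturating multiply-add: clamps the mathematical result a*b + c once,
-- which is not the same as a saturating multiply followed by a saturating add.
saturatingMulAddInt64 :: Int64 -> Int64 -> Int64 -> Int64
saturatingMulAddInt64 a b c =
  fromInteger . max (toInteger (minBound :: Int64))
              . min (toInteger (maxBound :: Int64))
              $ toInteger a * toInteger b + toInteger c

-- Heterogeneous: the absolute difference of two Ints always fits in a Word.
absDiffIntToWord :: Int -> Int -> Word
absDiffIntToWord a b = fromInteger (abs (toInteger a - toInteger b))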
16 Likes

For prior art there is int-cast: Checked conversions between integral types for safe casts. Also, bytestring goes to some lengths to catch overflows; grep for checkedAdd.

1 Like

Haskell packages can contain not only Haskell but also Cmm, which gets you very close to “your own primops” if you need them, without hacking on GHC itself.

4 Likes

This would extend to other integral types in Data.Int, right?

That’s interesting. Are you referring to this inline-cmm library? Or maybe separate .cmm source files?

Just .cmm files; they can be included under cmm-sources: in the Cabal file.
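For instance (purely hypothetical names), assuming a CheckedAdd.cmm file listed under cmm-sources: defines a procedure checked_add_int returning the sum together with an overflow flag, the Haskell side could look roughly like this:

{-# LANGUAGE GHCForeignImportPrim #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnliftedFFITypes #-}

import GHC.Exts (Int (I#), Int#)

-- "prim" imports must use unboxed argument and result types.
foreign import prim "checked_add_int"
  checkedAddInt# :: Int# -> Int# -> (# Int#, Int# #)

checkedAddInt :: Int -> Int -> Maybe Int
checkedAddInt (I# a) (I# b) = case checkedAddInt# a b of
  (# r, 0# #) -> Just (I# r)
  (# _, _  #) -> Nothing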

3 Likes

See this example: atomic-counter/Counter.cmm at master · sergv/atomic-counter · GitHub

3 Likes

speaking of checked arithmetic in ghc, a major problem is that it’s not comprehensive; it’s better to have:

  • explicitly sized int/word operand types
  • high level base library wrappers

it’s possible to diy one’s own checked arithmetic implementation using regular primops (e.g. see genericIntAddCOp in the compiler), but it’s still better to have primop coverage, and different backends may be able to emit more efficient code.
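for reference, the sign-bit trick that a genericIntAddCOp-style lowering relies on can also be written directly in Haskell (a sketch, not the compiler’s actual code):

import Data.Bits (xor, (.&.))

-- Wrapping add plus an overflow flag: signed overflow happened iff the
-- result's sign differs from the signs of both operands.
addWithOverflow :: Int -> Int -> (Int, Bool)
addWithOverflow x y =
  let r = x + y  -- (+) @Int wraps
  in (r, ((x `xor` r) .&. (y `xor` r)) < 0)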

@Kleidukos already opened Adding checked arithmetic low-level functions (#22665) on the GHC GitLab a few years ago about this topic, thanks!

7 Likes

I agree with you. The Haskell report (and GHC) leaves it unspecified what happens on overflow. This was an intentional choice 35 years ago (in the name of efficiency). So that means that Haskell arithmetic is, sadly, in your last category “Undefined”.

I’m in the middle of changing MicroHs to do trapping arithmetic. It’s definitely not without problems, e.g., (-128)::Int8 overflows in the negation since it is negate (fromInteger @Int8 128). But I will give it a shot and see what happens. As with most things in MicroHs, it will be an optional feature.

It would be great to have primitives for all the variants that you suggest. And we also should work towards making a better choice for the default.

3 Likes

I would consider the future behaviour of MicroHs a bug. The documentation of Num says that “(+) and (*) are customarily expected to define a ring”, which implies that the operations are total.

So you’d rather have arithmetic that silently gives you the “wrong” answer than have it scream at you? I don’t.

Since there are already several instances (Double, Float, Natural) that don’t obey the ring axioms, I feel that I’m in good company.

I consider that the core problem: how rigorously are we expecting the already weakly stated arithmetic laws to hold? Even numeric-prelude falsely claims that Double is a ring. One way forward would be to have new typeclasses with high entry barriers for membership, e.g. class HonestToGodRingWithTotalAndLawfulOperations, of which Integer but not Double could be a member.

See, I think part of the problem is that Haskell conflates two things. First, it conflates the concept of a binary operation of a lawful ring / additive group / etc. (which has much the same problems as having a privileged Monoid instance) with the more general concept of a forgetful / exceptional function that only approximates it within certain limits (I do support better checked arithmetic). Second, we also conflate the function “add” with the symbolic operator +: Num requires defining it as (+), which not only locks fallible numeric addition behind lawful additive groups, but also steals the syntactic operator, so that working around it means hiding the entire Num hierarchy. That partly contributes to the frankly rather insane number of custom operators endemic to advanced Haskell.

1 Like

I feel like this lends itself to newtypes for deriving via, e.g.:

newtype OverflowStrategy a = OverflowStrategy a

instance (Num a) => Num (OverflowStrategy a) where
  -- ... implement the chosen overflow behaviour here ...

-- ... other strategies likewise ...

newtype MyCustomIntegral a = MyCustomIntegral a
    deriving Num via (OverflowStrategy a) -- requires DerivingVia

or equivalent, then using coerce to go between them.
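For example, a saturating variant could look roughly like this (names invented; the Integer-based clamping is only for illustration, not for speed):

{-# LANGUAGE ScopedTypeVariables #-}

newtype Saturating a = Saturating a
  deriving (Eq, Ord, Show)

clamp :: forall a. (Integral a, Bounded a) => Integer -> a
clamp = fromInteger
      . max (toInteger (minBound :: a))
      . min (toInteger (maxBound :: a))

instance (Integral a, Bounded a) => Num (Saturating a) where
  Saturating x + Saturating y = Saturating (clamp (toInteger x + toInteger y))
  Saturating x - Saturating y = Saturating (clamp (toInteger x - toInteger y))
  Saturating x * Saturating y = Saturating (clamp (toInteger x * toInteger y))
  negate (Saturating x)       = Saturating (clamp (negate (toInteger x)))
  abs    (Saturating x)       = Saturating (clamp (abs    (toInteger x)))
  signum (Saturating x)       = Saturating (signum x)
  fromInteger                 = Saturating . clamp

With something like that, coerce (or DerivingVia, as above) gives saturating (+) and (*) at any bounded integral type.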

Maybe a library like that exists, maybe it’s worth experimenting with, or getting something like this into base.

Cheers

1 Like

Rust actually does this: it has the Saturating and Wrapping newtypes for saturating and wrapping arithmetic, respectively. I think it would work even better in Haskell, since number literals can be overloaded. However, the Haskell version of this would require FlexibleInstances.

You could avoid FlexibleInstances by introducing an extra class:

newtype Wrapping a = Wrapping a

class WrappingNum a where
  wrappingAdd :: a -> a -> a
  -- ... and so on for the other operations ...

instance (WrappingNum a) => Num (Wrapping a) where
  Wrapping x + Wrapping y = Wrapping (wrappingAdd x y)
  -- ... and so on ...