Looking to parse a decimal number into floating point efficiently

Say I’m parsing a decimal number into a Double from a text file and the number happens to be something like

1234.56789e305

It’s quite trivial to separate it into 123456789 (the significand, expressed as an unsigned base-10 integer) and 300 (the base-10 exponent), and it’s clearly representable as a Double: the value is 1.23456789e308, which is below the Double maximum of roughly 1.7977e308.
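Here is a sketch of that splitting step. It assumes a plain ASCII literal with no sign, whitespace, `inf` or `nan`, and `splitDecimal` is a hypothetical name. Each fractional digit moves the decimal exponent down by one, which is how `e305` becomes 300:

```haskell
import Control.Monad (guard)
import Data.Char (isDigit)

-- | Hypothetical helper: split a plain ASCII decimal literal such as
-- "1234.56789e305" into an unsigned base-10 significand and a base-10
-- exponent. Signs, whitespace, "inf" and "nan" are not handled.
splitDecimal :: String -> Maybe (Integer, Int)
splitDecimal s0 = do
  let (intPart, rest1) = span isDigit s0
      (fracPart, rest2) = case rest1 of
        '.' : r -> span isDigit r
        _       -> ("", rest1)
  -- The mantissa must contain at least one digit.
  guard (not (null intPart && null fracPart))
  expPart <- case rest2 of
    ""                    -> Just 0
    c : r | c `elem` "eE" -> parseExp r
    _                     -> Nothing
  -- Every fractional digit shifts the decimal exponent down by one.
  pure (read (intPart ++ fracPart), expPart - length fracPart)
  where
    parseExp, parseNat :: String -> Maybe Int
    parseExp ('-' : ds) = negate <$> parseNat ds
    parseExp ('+' : ds) = parseNat ds
    parseExp ds         = parseNat ds
    -- Note: 'read' at Int may silently wrap on absurdly long exponents.
    parseNat ds
      | not (null ds) && all isDigit ds = Just (read ds)
      | otherwise                       = Nothing
```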


From what I gather, the current base solution is roughly:

fromRational (123456789 * (10 ^ (300 :: Integer)) % 1) :: Double

This obviously works correctly precision-wise, but constructing 10^300 is overkill for an operation that shouldn’t require any big numbers whatsoever (the significand never needs more than 17 decimal digits of precision, and the exponent fits into a 16-bit number).
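To show where big numbers can be avoided cheaply, here is a minimal sketch of Clinger's well-known "fast path" (a technique I'm bringing in, not something proposed in this thread; `decimalToDouble` is a hypothetical name). When the significand fits in 53 bits and the exponent magnitude is at most 22, both operands are exact Doubles, so a single IEEE multiply or divide is correctly rounded. Everything else falls back to the `fromRational` route above. Note that 1234.56789e305 still takes the slow path; avoiding the bignum in that case is what algorithms like Eisel-Lemire are for.

```haskell
import Data.Ratio ((%))

-- | Hypothetical: convert an unsigned base-10 significand m and base-10
-- exponent e into the correctly rounded Double nearest to m * 10^e.
decimalToDouble :: Integer -> Int -> Double
decimalToDouble m e
  -- Clinger's fast path: m and 10^|e| are both exact Doubles here, so a
  -- single IEEE multiply or divide gives the correctly rounded result.
  | m <= 2 ^ (53 :: Int) && abs e <= 22 =
      if e >= 0
        then fromInteger m * pow10 e
        else fromInteger m / pow10 (negate e)
  -- Slow path: build the exact Rational, as in the base solution.
  | e >= 0    = fromRational ((m * 10 ^ e) % 1)
  | otherwise = fromRational (m % (10 ^ negate e))
  where
    -- 10^k is exactly representable for 0 <= k <= 22, and every
    -- intermediate product formed by (^) is a power of ten <= 10^k.
    pow10 :: Int -> Double
    pow10 k = 10 ^ k
```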

Am I missing a better solution (one that does not involve me implementing decimal-to-binary floating-point conversion myself), or is this all base currently has?

The first and most important question is whether you want your doubles to be round-trippable, so that printing and re-reading them yields the same value.
Printing floats is a nice rabbit hole by itself; a good place to start is https://dl.acm.org/doi/10.1145/249069.231397
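A quick way to check the round-trip property in GHC is sketched below (`roundTrips` and `shortestDigits` are hypothetical names). To my understanding, GHC's `show` for `Double` is built on `Numeric.floatToDigits`, whose implementation follows the Burger-Dybvig paper linked above and produces the shortest digit string that reads back to the same value.

```haskell
import Numeric (floatToDigits)

-- | Hypothetical check: does printing with 'show' and reading back with
-- 'read' give the same Double?
roundTrips :: Double -> Bool
roundTrips x = read (show x) == x

-- | floatToDigits 10 x returns digits [d1..dn] and exponent e such that
-- x = 0.d1...dn * 10^e, using the shortest digit list that identifies x.
shortestDigits :: Double -> ([Int], Int)
shortestDigits = floatToDigits 10
```

For example, `shortestDigits 0.1` is `([1], 0)`, even though the stored binary value is not exactly 0.1.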

Why do it at all then? Just use Text.Read.
If you change your mind: The Eisel-Lemire ParseNumberF64 Algorithm (Nigel Tao)
Or, slower but “simpler”: ParseNumberF64 by Simple Decimal Conversion (Nigel Tao)
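For reference, using `Text.Read` on the whole literal looks like this (`parseDouble` is a hypothetical wrapper). As far as I know, it goes through the Haskell lexer and an exact `Rational`, so it is correct but not especially fast.

```haskell
import Text.Read (readMaybe)

-- | Hypothetical wrapper: parse a complete decimal literal via base's
-- 'Read' instance for Double; Nothing on malformed input.
parseDouble :: String -> Maybe Double
parseDouble = readMaybe
```

With this, `parseDouble "1234.56789e305"` evaluates to `Just 1.23456789e308`.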

I’m writing a parser library, so floating-point conversions are something I’d expect to be properly dealt with in a different place (ideally base or adjacent, hence the question).

Similarly, Text.Read does nothing for me, because the number is already in the two-unsigned-integers form I described above.

And I won’t need to bother with encoding because Data.ByteString.Builder exists.
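A minimal example of rendering with `Data.ByteString.Builder.doubleDec` (the wrapper name `renderDouble` is hypothetical):

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as BL

-- | Hypothetical wrapper: render a Double as ASCII decimal text using
-- bytestring's Builder, producing a lazy ByteString.
renderDouble :: Double -> BL.ByteString
renderDouble = B.toLazyByteString . B.doubleDec
```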

Fast binary serialization primitives in Haskell are of interest to me. Do you know whether any library currently has a fast string-to-float conversion? If not, and if the algorithm you linked is fairly simple (at least simpler than Ryu), I’d have a go at it. flatparse would appreciate more fast primitives.

Note that bytestring recently received a Ryu float-to-string implementation, so printing is very fast. Sadly, it’s not general-purpose (you can’t plug your own builder into it), and the code is complex enough that it’s tough to extract into a standalone library.
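For the curious, that formatter is exposed through `Data.ByteString.Builder.RealFloat` (bytestring 0.11.2 and later, as far as I know). The sketch below uses `formatDouble scientific`; the result is always a bytestring `Builder`, which is the limitation mentioned above. `sciText` is a hypothetical name.

```haskell
import qualified Data.ByteString.Builder as B
import Data.ByteString.Builder.RealFloat (formatDouble, scientific)
import qualified Data.ByteString.Lazy.Char8 as BL

-- | Hypothetical wrapper: scientific notation via bytestring's Ryu-based
-- formatter. The output is always a bytestring 'B.Builder', so it cannot
-- write into a different builder type directly.
sciText :: Double -> BL.ByteString
sciText = B.toLazyByteString . formatDouble scientific
```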

I ran into a similar issue myself and ended up importing atoi from C.
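Importing a C parser through the FFI might look roughly like the sketch below. `atoi` is what was mentioned; `strtod` is my own addition as the floating-point counterpart, and the Haskell wrapper names are hypothetical. Both need a NUL-terminated copy of the input (which `withCString` makes), and `strtod` honours the C locale's decimal point.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CDouble (..), CInt (..))
import Foreign.Ptr (Ptr, nullPtr)

-- atoi: parses a leading decimal integer, no error reporting.
foreign import ccall unsafe "stdlib.h atoi"
  c_atoi :: CString -> IO CInt

-- strtod: parses a leading floating-point number; passing NULL as the
-- end pointer means we don't learn where parsing stopped.
foreign import ccall unsafe "stdlib.h strtod"
  c_strtod :: CString -> Ptr CString -> IO CDouble

-- Hypothetical wrappers that copy the Haskell String into C memory.
atoiHs :: String -> IO Int
atoiHs s = fromIntegral <$> withCString s c_atoi

strtodHs :: String -> IO Double
strtodHs s = do
  CDouble d <- withCString s (\p -> c_strtod p nullPtr)
  pure d
```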

I have been working on the Ryu implementation in bytestring, and the current implementation has performance problems and bugs. I have several pull requests in the queue that fix them, but the reviews are going slowly. Just a heads-up.

I have several closed PRs that may be helpful for modifying it: Pull requests · haskell/bytestring · GitHub

I have created a standalone Ryu package with formatting: GitHub - BebeSparkelSparkel/hryu: Haskell Translation of the Ryu Fast Float to String

Fantastic, thanks for this! I envisage a world where we have a builder-agnostic Ryu implementation (one that works on ByteArrays or ForeignPtrs). I have a ByteString builder, bytezap, with zero intermediate allocation that could handle JSON serialization if I had efficient float printing (and an efficient way to pre-calculate output length, which sadly wasn’t in the paper!)