Changing the `Binary` instance for `Double` and `Float`

I did; it only reduced the running time by 20%. `binary` uses CPS (continuation-passing style), which fundamentally limits performance: it builds a tree of thunks at run time unless everything inlines.
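For anyone wanting to try the same change, here is a minimal sketch of what it could look like. It is not necessarily the exact code I benchmarked. The stock `Binary Double` instance goes through `decodeFloat` (an `Integer` mantissa plus an `Int` exponent); the sketch instead writes the raw IEEE-754 bits using `putDoublele` and `getDoublele`, which assumes binary 0.8.4 or later. The `IEEEDouble` wrapper is a hypothetical name used to avoid an orphan or overlapping instance, and the resulting byte format is not compatible with the stock one.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getDoublele)
import Data.Binary.Put (putDoublele)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical wrapper (not part of the binary package): serialises a
-- Double as its 8-byte IEEE-754 bit pattern, little-endian.
newtype IEEEDouble = IEEEDouble Double
  deriving (Eq, Show)

instance Binary IEEEDouble where
  -- Write the raw bits instead of the (mantissa, exponent) pair.
  put (IEEEDouble d) = putDoublele d
  -- Read the 8 bytes back and rewrap.
  get = IEEEDouble <$> getDoublele

main :: IO ()
main = do
  let bytes = encode (IEEEDouble 1.5)
  print (BL.length bytes)             -- prints 8
  print (decode bytes :: IEEEDouble)  -- prints IEEEDouble 1.5
```

The same pattern should work for `Float` with `putFloatle` and `getFloatle`. This only changes the encoding of the leaf values; the `Get` and `Put` machinery around it is still CPS, which is consistent with the gain being modest.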