Why doesn't UUID have a Bits instance?

Ambrose · September 20, 2023, 8:43pm

Those Bits instances couldn’t possibly be pure as-is. Whereas there’s a pretty well-defined and simple isomorphism between (Word64, Word64) and UUID.
This seems like a more principled stance than I’d personally take. At the end of the day, this is all running on computers after all. The best part of Haskell is that there are libraries with conflicting principles all over the place, and none of them are inherently wrong. This feels like that case.

I’ll post my library on Discourse once I’ve got something
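For anyone following along, the isomorphism between (Word64, Word64) and UUID mentioned above can be sketched roughly like this. It assumes a UUID is stored as two 64-bit halves, high half first. The `UUID` type here is a local stand-in, and `toWords` / `fromWords` are hypothetical names for illustration, not the API of any existing package.

```haskell
import Data.Word (Word64)

-- Local stand-in for a UUID stored as two 64-bit halves (an assumption
-- for this sketch, not a definition taken from a library).
data UUID = UUID !Word64 !Word64
  deriving (Eq, Show)

-- One direction of the isomorphism: split a UUID into (high, low).
toWords :: UUID -> (Word64, Word64)
toWords (UUID hi lo) = (hi, lo)

-- The other direction: rebuild a UUID from (high, low).
fromWords :: (Word64, Word64) -> UUID
fromWords (hi, lo) = UUID hi lo
```

Both round trips are the identity, and neither function can fail, which is all that "isomorphism" means here.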

rhendric · September 20, 2023, 8:54pm

atravers:

Well, I’m assuming that we don’t want:
instance Bits (a -> b)
instance Bits (IO a)
instance Bits (ST s a)
…those instances being on “the other side” of your designated “line”.

And you think that a UUID is more like those types than like a file descriptor?

Hey, don’t ask me. I have no idea. You’re the one who took a position:

And I’m just asking questions to try to understand your position, because it seems to me like your position is only consistent with not having a Bits class at all. (And maybe we shouldn’t! I’m not here to defend Bits either.)

atravers · September 20, 2023, 10:29pm

instance Bits (a -> b)
instance Bits (IO a)
instance Bits (ST s a)

couldn’t possibly be pure as-is.

But why let that stop you? Like UUIDs, they’re just abstractions too! And according to you, it’s only the representation that matters, rather than the abstraction…

…as well as:

(Word32, Word32, Word32, Word32)
(Word16, Word16, Word16, Word16, Word16, Word16, Word16, Word16)
(Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8, Word8)

…now which of those Words contained those aforementioned version bits?

…and at the end of the day:

you and your computer are just assemblages of atoms
and your programs are just assemblages of bits;

Fortunately:

Let’s have another look at the comment which initially drew my attention:

…so why stop with just UUIDs - are there other types that people believe should have an “obligatory” Bits instance? Where does that “obligation” end? After all, everything on our (current) computers is an assemblage of bits - why not provide Bits instances automatically for all types and be done with it?

(…alright, all other types.)

One practical problem I haven’t mentioned so far is that an instance cannot be confined to its module of origin. So having “obligatory” instances for many types only increases the burden on a Haskell implementation as a result of tracking all those instances throughout the program, even if they’re only ever used in their module of origin.

Then there’s Rice’s theorem, which places a limit on algorithmically determining whether a type should have a Bits instance (or not). Any “dividing line” will thus be subjective to some extent.

Having considered all that…what types should have Bits instances? I’m not a regular user of Bits so I haven’t had to make choices about which of my types needed to have its instances. With that disclaimer out of the way:

Right now, I think Bits instances should be reserved for FFI-compatible atomic types, or their direct newtypes, unless:
- the specification (as in ISO, not Haskell) for that type makes no direct mention of manipulating individual bits.

So @Ambrose using Bits inside his library would be sensible here - as noted earlier, there are at least four bits with relevant metadata in any given UUID, which would be very awkward to retrieve by using combinations of div, mod and other such functions.
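As a rough illustration of retrieving that metadata with Data.Bits rather than div and mod: the sketch below assumes the UUID is held as two Word64 halves, most significant first, in RFC 4122 field order. The function names are hypothetical. It also reads the variant bits, which sit at the top of the low half.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word64)

-- Version: the 4 bits at positions 12..15 of the high half, i.e. the
-- top nibble of the time_hi_and_version field.
uuidVersion :: Word64 -> Word64
uuidVersion hi = (hi `shiftR` 12) .&. 0xF

-- Variant: the top 2 bits of the low half (binary 10 for RFC 4122 UUIDs).
uuidVariant :: Word64 -> Word64
uuidVariant lo = lo `shiftR` 62
```

For the RFC's own example, f81d4fae-7dec-11d0-a765-00a0c91e6bf6, the high half is 0xf81d4fae7dec11d0 and `uuidVersion` gives 1; the low half is 0xa76500a0c91e6bf6 and `uuidVariant` gives 2.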

But (to me, anyway) a UUID was not intended to be used like a (hypothetical) Word128 - that it just so happens to be most easily encoded as a Word128 should be regarded as a mere coincidence, an “implementation detail”. Therefore the interface or API provided by @Ambrose’s new library should keep that coincidence private, within the library.

BurningWitness · September 21, 2023, 2:38am

The line is in what the typeclass is supposed to do: ideally Bits is solely for data that has all (or most) of the operations supported as CPU instructions. From this point of view neither Bool nor, indeed, Fd is a valid instance. And yes, this is a bit murky still due to the fact that GHC doesn’t let you say “if the system is 32-bit, there shall be no 64-bit instances”, but that’s more of a downstream effect of the language choices, not the fault of the typeclass definition.

BurningWitness · September 21, 2023, 2:52am

Again, the point is that you shouldn’t create partial instances. The “I can treat data as a finite field, that means it is a finite field” line of thinking leads to the conclusion that any arbitrary block of memory deserves a Bits instance, at which point an xor of two in-memory JSON files somehow starts making sense.

Ambrose · September 21, 2023, 3:30am

xor-ing (and masking in general) UUIDs does make perfect sense and is actually useful in real life though.

And (Finite)Bits UUID won’t even be a partial instance!

They should’ve added a Num or Integral superclass constraint if they meant what they said in that doc comment.

BurningWitness · September 21, 2023, 3:39am

xor on two UUIDs will always change the version, in the case of two version 4 ones the version will be set to 0. The resulting UUID is malformed in the overwhelming majority of cases (or should I say all because two version 4 UUIDs can never yield a version 4 one?).
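A small sketch of the version 4 case, assuming the version nibble lives in bits 12..15 of the high 64-bit half. The two values are made-up high halves and `versionOf` is a hypothetical helper.

```haskell
import Data.Bits (shiftR, xor, (.&.))
import Data.Word (Word64)

-- Hypothetical helper: version nibble of the high 64-bit half.
versionOf :: Word64 -> Word64
versionOf hi = (hi `shiftR` 12) .&. 0xF

-- High halves of two made-up version 4 UUIDs (the 13th hex digit is 4).
hiA, hiB :: Word64
hiA = 0x9b2f6c1e5d3a4f70
hiB = 0x1c8e03d7a4b64e21

-- xor-ing the halves xors the version nibbles as well: 4 `xor` 4 == 0.
xoredVersion :: Word64
xoredVersion = versionOf (hiA `xor` hiB)
```

Here `versionOf hiA` and `versionOf hiB` are both 4, while `xoredVersion` is 0.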

atravers · September 21, 2023, 4:51am

…no, xor-ing (and masking in general) Word128 values makes perfect sense and is probably useful for other tasks. But it seems you cannot be convinced to take that “more principled stance” and define something like UUIDInfo, so I will suggest an alternative:

Define an actual Word128 type with all the appropriate instances like Bits: it can be in its own library or package, or you can extend base - whichever is more convenient for you.
Then use the new Word128 type to (re)define UUID as a type synonym:
```
type UUID = Word128
```

And there you have it: a UUID type with a Bits instance. As a bonus, you’ve also provided another ultra-large fixed-width integral-value type which others can use for their own purposes.
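To show what the type-synonym route implies, here is a minimal sketch that uses Word64 as a stand-in, since base has no Word128. `Ident`, `mix` and `next` are hypothetical names for illustration only.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)

-- Word64 stands in for the hypothetical Word128.
type Ident = Word64

-- No instance declaration is needed: Ident *is* Word64, so Bits applies.
mix :: Ident -> Ident -> Ident
mix a b = a `xor` b

-- The flip side: Num, Ord, Enum and every other Word64 instance come
-- along too, so arithmetic on identifiers also type-checks.
next :: Ident -> Ident
next i = i + 1
```

A synonym hides nothing: every instance of the underlying type is exposed, whereas a newtype only gets the instances it is explicitly given.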

Ambrose · September 21, 2023, 5:04am

The UUID is not malformed at all. Slap it in Postgres. Use it as an idempotency ID in an API call. Nobody cares about version bits. They’re noise.

Ambrose · September 21, 2023, 5:05am

Oh, you mean like this:

data UUID = UUID {-# UNPACK #-} !Word64 {-# UNPACK #-} !Word64

(source: uuid-types)

Ambrose · September 21, 2023, 5:21am

Also, to get back to my original question:

Why doesn’t UUID have a Bits instance?

There’s not been a single especially good argument against it.

It’s total and can validly implement the class.
There’s been no example of a footgun or ill effects of the instance.
The instance can be used to solve real, production software engineering problems.

Feels like the answer is “cuz there’s not one.”
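For concreteness, a total instance along these lines might look like the sketch below. It uses a local stand-in type with the same two-Word64 shape as the definition quoted earlier. This is an illustration only, not an instance that uuid-types provides, and it treats non-positive amounts passed directly to shiftL/shiftR as no-ops, which is a choice made for this sketch.

```haskell
import Data.Bits
import Data.Word (Word64)

-- Local stand-in: high half, then low half.
data UUID = UUID !Word64 !Word64
  deriving (Eq, Show)

instance Bits UUID where
  UUID a b .&. UUID c d = UUID (a .&. c) (b .&. d)
  UUID a b .|. UUID c d = UUID (a .|. c) (b .|. d)
  xor (UUID a b) (UUID c d) = UUID (a `xor` c) (b `xor` d)
  complement (UUID a b) = UUID (complement a) (complement b)

  -- Shifts carry bits across the boundary between the two halves.
  shiftL u@(UUID hi lo) n
    | n <= 0    = u
    | n >= 128  = UUID 0 0
    | n >= 64   = UUID (lo `shiftL` (n - 64)) 0
    | otherwise = UUID ((hi `shiftL` n) .|. (lo `shiftR` (64 - n))) (lo `shiftL` n)
  shiftR u@(UUID hi lo) n
    | n <= 0    = u
    | n >= 128  = UUID 0 0
    | n >= 64   = UUID 0 (hi `shiftR` (n - 64))
    | otherwise = UUID (hi `shiftR` n) ((lo `shiftR` n) .|. (hi `shiftL` (64 - n)))

  -- Positive amounts rotate left; `mod` maps negative amounts to the
  -- equivalent left rotation.
  rotate u n
    | k == 0    = u
    | otherwise = (u `shiftL` k) .|. (u `shiftR` (128 - k))
    where k = n `mod` 128

  bitSizeMaybe _ = Just 128
  bitSize _      = 128
  isSigned _     = False

  -- Bit 0 is the least significant bit of the low half.
  testBit (UUID hi lo) i
    | i < 0 || i >= 128 = False
    | i >= 64           = testBit hi (i - 64)
    | otherwise         = testBit lo i
  bit i
    | i < 0 || i >= 128 = UUID 0 0
    | i >= 64           = UUID (bit (i - 64)) 0
    | otherwise         = UUID 0 (bit i)
  popCount (UUID hi lo) = popCount hi + popCount lo

instance FiniteBits UUID where
  finiteBitSize _ = 128
```

With this in scope, masking is just `u .&. UUID 0 maxBound` (keep the low 64 bits), and every method is defined for every input.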

BurningWitness · September 21, 2023, 5:30am

Then what you’re talking about is not a UUID; it’s a 128-bit finite field formatted to look like a UUID. Basic UUID operations, like retrieving the version, are inapplicable to this datatype.

Can you stick a malformed UUID into Postgres? Perhaps. However, when you generate a random UUID in Postgres, you’ll explicitly get a version 4 one.

Ambrose · September 21, 2023, 6:12am

This discussion is so comically disconnected from the real world of UUIDs. What value do any of these contrary opinions even bring? Like, as a programmer.

rhendric · September 21, 2023, 6:21am

You can store image data in a file however you like, and if you call it an image file and tell people that you want to do operations on your image file, nobody’s going to bat an eye. But if you call it a PNG, a certain type of detail-oriented person is going to be very upset if the file doesn’t start with the bytes "\x89PNG". Because that’s what a PNG file does, and it’s what other applications expect when they read PNG files. No matter how much you protest that you aren’t giving your PNG files to other applications so it doesn’t make a practical difference if they use the correct header or not, you are still technically incorrect if you call them PNG files. The solution is simple: just call them image files, and use whatever format works for you.

There’s nothing wrong with the 128-bit identifiers you’re using to solve real-world problems. But if you’re ignoring the version bits because you couldn’t care less about interoperability with anything else, don’t call them UUIDs. (Or be prepared to get a lot of pushback when you talk about your UUIDs in public spaces, I guess.)

Ambrose · September 21, 2023, 1:10pm

Would you consider this function a bug then?

fromWords64 :: Word64 -> Word64 -> UUID

rhendric · September 21, 2023, 1:50pm

… No? It’s a function that doesn’t validate its input, but it isn’t documented to validate its input. Given how the Data.UUID.Types module doesn’t actually expose any of the details of a UUID’s format, I understand the choice; it’s not like that particular API is going to crash if you use fromWords64 to construct an invalid UUID. A richer API might want stronger guarantees around correctness.
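As a hedged sketch of what "stronger guarantees" could look like: a checked constructor that only accepts words shaped like a version 4, RFC 4122-variant UUID. The `UUID` type is a local stand-in and `mkUUIDv4` is a hypothetical name; neither is part of Data.UUID.Types.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word64)

-- Local stand-in for a UUID stored as two 64-bit halves.
data UUID = UUID !Word64 !Word64
  deriving (Eq, Show)

-- Hypothetical checked constructor: succeeds only when the version
-- nibble is 4 and the variant bits are binary 10.
mkUUIDv4 :: Word64 -> Word64 -> Maybe UUID
mkUUIDv4 hi lo
  | version == 4 && variant == 2 = Just (UUID hi lo)
  | otherwise                    = Nothing
  where
    version = (hi `shiftR` 12) .&. 0xF
    variant = lo `shiftR` 62
```

An unchecked constructor in the style of fromWords64 can then sit alongside it for callers who really do just want 128 bits.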

Ambrose · September 21, 2023, 2:16pm

Are there any examples of any software systems or libraries using UUIDs and not accepting all permutations of 128 bits? I get it in theory but have never seen it in practice.

Maybe this is all just the computer engineer in me talking. I couldn’t with a straight face say that XORing two UUIDs isn’t a viable thing you could do.

Hmm I think this discussion is just a matter of differing opinion and mindsets. Which is a common Haskell thing! It’s important to have different approaches to the same problem.

So I think this discussion has identified not one but two seemingly paradoxical ecosystem gaps:

Ergonomic bitwise operations on UUIDs as 128-bit words
A more precisely typed, stricter library, informed by the RFC, that treats UUIDs as more than 128-bit words

I bet the former could help with the implementation of the latter, actually.

Now I have two projects to do instead of one. Thanks for the discussion, all!

Ambrose · September 21, 2023, 2:36pm

Here’s the plan:

sized-bits - A library with extended Bits support, including N-ary tuples with FiniteBits instances and a parallel Bits hierarchy with more type-level information.
- It probably won’t be fruitful, but I’ll play around with Liquid Haskell as well.
uuid-bits - A UUID newtype with FiniteBits etc. instances + an isolated orphan instance. This will depend on sized-bits.
uuid-typed - A more precisely-typed UUID library that can validate format etc. There’s probably some fun type-level-y stuff to be had here as well. This will depend on uuid-bits under the hood.
- It would be cool to have proven-to-the-spec UUID generation as well.

BurningWitness · September 21, 2023, 2:52pm

Are there any examples of any software systems or libraries using UUIDs and not accepting all permutations of 128 bits?

The question of why people use UUIDs when all they need are 128 bits of noise is a good one, but it has nothing to do with UUIDs as a format.

Would you consider fromWords64 a bug then?

It should be prefixed with unsafe or, ideally, it shouldn’t exist at all, and the internals of UUID should be exposed in an .Unsafe module.

Ambrose · September 21, 2023, 3:18pm

It sounds like you will want to eschew uuid-types in favor of uuid-typed for your ventures once the library stabilizes then!