ByteString.empty is actually not the same as ""

4eUeP · July 29, 2024, 3:30am

Today, I discovered something interesting that could potentially cause a bug while working with FFI.

GHCi, version 9.4.8: https://www.haskell.org/ghc/  :? for help
ghci> :set -package bytestring
package flags have changed, resetting and loading new packages...
ghci> import qualified Data.ByteString as BS
ghci> import qualified Data.ByteString.Internal as BSI
ghci> import Foreign.ForeignPtr
ghci> :set -XOverloadedStrings
ghci> BS.empty == ""
True
ghci> withForeignPtr (fst (BSI.toForeignPtr0 "")) print
0x0000004200431e70
ghci> withForeignPtr (fst (BSI.toForeignPtr0 BS.empty)) print
0x0000000000000000
ghci>

atravers · July 29, 2024, 4:17am

#  ghci
GHCi, version 9.8.2: https://www.haskell.org/ghc/  :? for help
ghci> import Foreign.Marshal.Utils
ghci> (1 + 1 :: Int) == 2
True
ghci> with (1 + 1 :: Int) print
0x000000420042a428
ghci> with (2 :: Int) print
0x000000420042c248
ghci>

No surprise here - while expressions 1 + 1 :: Int and 2 :: Int reduce to the same normal form, Haskell has non-strict semantics. So those two expressions are represented internally by two different structured values: a different thunk for each expression. That’s why the results of with and withForeignPtr are I/O actions.

4eUeP · July 29, 2024, 6:00am

It’s nothing about IO or laziness here. It’s all about “nullptr” vs “pointer to a null”. You can view the implementation here:

To be short, The underlying of empty is a ForeignPtr with nullAddr# and packChars "" is a ForeignPtr with memory allocate by newPinnedByteArray#

4eUeP · July 29, 2024, 6:13am

And I find a related closed issue: unsafeUseAsCString(Len) may return a NULL pointer. · Issue #674 · haskell/bytestring · GitHub

atravers · July 29, 2024, 9:45am

Since you were able to diagnose where it is, can you can provide some patches to the bytestring maintainers to resolve this discrepancy?

BurningWitness · July 29, 2024, 10:38am

The byte array in question is zero-sized, there’s no good reason to ever follow that pointer.

Furthermore sharing empty values is a whole separate maintainer headache, see e.g. text#493.

Personally I’m in favor of data ByteString = BS1 ByteString1 | Empty, since most functions already do emptiness checks and ByteString1 has its own uses, same with Text. You’d need a lot of pushing to get something like that through, however.

darkxero · July 29, 2024, 11:38pm

What is the expected behaviour?

Empty BS is represented by null pointer, “” is represented by null pointer
Empty BS is represented by pointer to nothing, “” is represented by pointer to nothing, the address of the pointer to nothing is shared
Empty BS is represented by pointer to nothing, “” is represented by pointer to nothing, the addresses of the pointers are different

Also, is the nothing, another null pointer, nothing (empty array of char), or a null terminated CString (array of char with 1 element but that strlen says that it is length 0)?

sclv · July 30, 2024, 1:12am

Agreed, I don’t see any issues here. This is just what happens when you get down to the pointer level. Some pointers look different than others!

4eUeP · July 30, 2024, 7:46am

It doesn’t seem easy. I’d rather consider this a design challenge than a flaw.

Yes, and, this may also introduce a little performance degradation(I’m not sure).

Well, the current behaviour is acceptable since it’s a defined behaviour. However, it might be a good idea to put a warning remark in the documentation.

Yes, generally, this is fine. However, it requires users to be more careful. (Although dealing with pointers is always dangerous)

Consider the following example:

// c code
void f(char* p) {
  if (p) 
    do something...
  else
    do another...
}

-- haskell
foreign import ccall unsafe "f" c_f :: Ptr Word8 -> IO ()

f :: ByteString -> IO ()
f bs = withForeignPtr (fst (BSI.toForeignPtr0 bs)) c_f

Normally, you may expect f BS.empty to be totally the same as f "", but actually, it will enter two different branches defined in the C code.

BurningWitness · July 30, 2024, 9:20am

Consider using functions from Data.ByteString.Unsafe instead: they are in the stable part of the API and they are well-documented. In this case unsafeWithCStringLen does the same thing.

And, once again, you may not discard length when using ByteStrings this way because internally they are not null-terminated.

TeofilC · July 30, 2024, 9:43am

See also this issue about sharing 0 length ByteArray# objects. Ben makes the good point that users might depend on creating distinct 0-length objects for instance because they might use them with weak pointers.

sclv · July 30, 2024, 5:30pm

users should not expect two different bytestrings containing the string “hello” to have the same pointer, and so by induction there’s no reason they should expect that two containing the string “” would either.

in general, pointer equality is never presumed to be synonymous with semantic equality.

Topic		Replies	Views
Questions about FFI, ForeignPtr, and Opaque Types Learn	25	1986	July 24, 2023
Bytestring-0.11.3.0 Announcements	0	683	March 5, 2022
Question regarding unsafePerformIO and ForeignPtr Learn	1	519	August 30, 2022
Why is the length of a string in Foreign.C.String defined as Int and not as CSize? Learn	2	485	December 4, 2021
Base proposal around vector-like types Haskell Foundation	85	7485	August 26, 2023

ByteString.empty is actually not the same as ""

Related topics