Questions about FFI, ForeignPtr, and Opaque Types

jackdk · July 9, 2023, 3:15am

Only possible if you want the freeFinalizer — in my case I have to use the destroy function provided by libtelnet.

ssddq · July 9, 2023, 3:35am

This is somewhat orthogonal to the question about ForeignPtr: are the foreign import types correct?

It seems to me that botan_init_hash has the correct type Ptr (Ptr ()) -> ..., but botan_hash_name does not. I think the latter should be Ptr () -> ....

This isn’t exactly the code you were trying to write, but the following seems to return something sensible for me (48 for SHA-384):

main = do
  outputLength <- botanHashLength "SHA-384"
  print outputLength

type OpaqueHash = Ptr Word8

foreign import ccall unsafe "botan_hash_output_length" 
  botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO Int

foreign import ccall unsafe "botan_hash_init" 
  botan_hash_init :: Ptr OpaqueHash -> Ptr CChar -> Word32 -> IO Int

botanHashLength :: String -> IO Int
botanHashLength string = withCString string $ \ptr -> do
  ppHash <- malloc
  botan_hash_init ppHash ptr 0
  pHash <- peek ppHash
  pSize <- malloc
  botan_hash_output_length pHash pSize
  fromIntegral <$> peek pSize

EDIT: This is very rough, but for setting up the ForeignPtr properly, the following seems to work for me:


type OpaqueHash = Ptr ()

foreign import ccall unsafe "botan_hash_name" 
  botan_hash_name :: OpaqueHash -> Ptr CChar -> Ptr CSize -> IO Int

foreign import ccall unsafe "&botan_hash_destroy" 
  botan_hash_destroy :: FunPtr (OpaqueHash -> IO ())

foreign import ccall unsafe "botan_hash_output_length" 
  botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO Int

foreign import ccall unsafe "botan_hash_init" 
  botan_hash_init :: Ptr OpaqueHash -> Ptr CChar -> Word32 -> IO Int


newtype Hash = Hash (ForeignPtr ())

hashInit 
  :: String
  -> IO Hash
hashInit string = withCString string $ \ptr -> do
  ppHash <- malloc
  botan_hash_init ppHash ptr 0
  pHash <- peek ppHash
  fp <- newForeignPtr botan_hash_destroy pHash
  return $ Hash fp

botanHashName 
  :: Hash
  -> IO String
botanHashName hash@(Hash fp) = withForeignPtr fp $ \pHash -> do
  pSize <- malloc
  botan_hash_output_length pHash pSize
  reqLen <- peek pSize
  pCChar <- mallocArray (fromIntegral reqLen)
  botan_hash_name pHash pCChar pSize
  peekCString pCChar

main = do
  hash <- hashInit "SHA-384"
  name <- botanHashName hash
  print name

ApothecaLabs · July 9, 2023, 4:11am

Thank you - this helped confirm what I needed to do.

I see now! This was most helpful!

Indeed I am giving them closer scrutiny after discovering that some things were incorrectly documented. Your comment has been very helpful, and you’ve expanded on your answer greatly as I was integrating the results of your initial comment with what everyone else has said - but it appears we converged something very similar regardless!

It is late, but I think I have the clarity to move forward tomorrow with the Botan.Hash module. Thank you all so much for your very helpful pointers!

jaror · July 9, 2023, 8:22am

One thing I’ve seen some libraries do is to do the foreign pointers like this:

newtype Hash = Hash (ForeignPtr Hash)

I believe it was haskell-gi that does things that way.

ApothecaLabs · July 12, 2023, 3:54pm

If someone wouldn’t mind sanity checking this for me, I want to make sure that this initialization pattern is sane and doesn’t leak:

-- typedef struct botan_foo_struct* botan_foo_t;
type OpaqueFoo = Ptr ()
newtype Foo = Foo { fooForeignPtr :: ForeignPtr OpaqueFoo }

-- int botan_foo_init(botan_foo_t* foo);
foreign import ccall unsafe botan_foo_init :: Ptr OpaqueFoo -> IO BotanErrorCode

-- int botan_foo_destroy(botan_foo_t foo);
foreign import ccall "&botan_foo_destroy" botan_foo_destroy :: FunPtr (Ptr OpaqueFoo -> IO ())

fooInit :: IO Foo
fooInit name = do
    fooForeignPtr <- malloc >>= newForeignPtr botan_foo_destroy
    withForeignPtr fooForeignPtr $ \ fooPtr -> do
        throwBotanIfNegative_ $ botan_foo_init fooPtr
    return $ Foo fooForeignPtr

My concern is that Botan's design is to pass a pointer-pointer in to the initializer and return an success-error code, rather than return an initialized pointer. Being unfamiliar with ForeignPtr, I want to ensure that the pointer that we malloc for use with botan_foo_init is itself destroyed with botan_foo_destroy when the ForeignPtr is garbage collected, and not just the opaque object that it was pointing to.

I think this is correct, because when I used alloca instead of malloc (because I was replacing ByteArray.alloc), I got exceptions because long-lived references were having their pointers freed - but what I don’t want now is the opposite problem of leaking memory. I made the swap to malloc after reading the docs more closely (and checking the source of ByteArray.alloc revealed that it actually used mallocByteString, too).

So now, I am not observing any faulty behavior, but I’d surely appreciate a sanity check from another set of eyes.

glguy · July 12, 2023, 4:30pm

This should be

-- typedef struct botan_foo_struct* botan_foo_t;
data BotanStruct -- empty data type used only as a distinguishing tag for pointer types
type BotanPtr = Ptr BotanStruct -- optional type synonym
newtype Botan = MkBotanForeignPtr { getBotanForeignPtr :: ForeignPtr BotanStruct }

-- int botan_foo_init(botan_foo_t* foo);
foreign import ccall unsafe botan_foo_init :: BotanPtr -> IO BotanErrorCode

The type parameter of Ptr is phantom and just helps you distinguish pointers to one type of thing from another. For opaque types you should either make a new, empty data type as I’ve done above, or use the circular newtype pattern that appeared in an earlier post newtype Botan = Mk (ForeignPtr Botan). I like this version less, myself, because I like having the independent type that can be used both to index Ptr and ForeignPtr as we alternate between the two.

You shouldn’t use malloc because BotanStruct won’t be Storable because it’s opaque. You can instead use mallocBytes :: Int -> IO (Ptr a).

Also, don’t use newForeignPtr until you’ve successfully initialized your struct. Presumably you don’t destroy something that isn’t initialized successfully. So first mallocBytes, then initialize and throw on error like you did, then newForeignPtr with the destructor.

ApothecaLabs · July 12, 2023, 6:03pm

So, something rather like this?

fooInit :: IO Mac
fooInit = do
    fooPtr <- mallocBytes _ -- What goes here?
    throwBotanIfNegative_ $ botan_foo_init fooPtr
    fooForeignPtr <- newForeignPtr botan_foo_destroy fooPtr
    return $ MkFooForeignPtr fooForeignPtr

It is this bit that still has me scratching my head. I was originally using Ptr () and malloc because of these two things:

Since a pointer is a pointer, we can of course cast between Ptr () and Ptr OpaqueStruct (to use my initial terminology). We aren’t responsible for initializing the botan_struct memory, we just need to hand it our own pointer that itself won’t be freed - so how many bytes do we need to allocate for the pointer itself?

Since all pointers are the same size, and malloc is just mallocBytes (sizeOf (undefined :: a)), we could use:

fooPtr <- mallocBytes (sizeOf (undefined :: Ptr ()))

Alternatively:

fooPtr <- castPtr <$> malloc @(Ptr ())

Am I crazy? Or just missing something obvious? Or do I have this correct?

Note that I am trying it out in with the next module (Mac) before going back and refitting Hash and Random, and ~~it actually seems to work~~*. For the moment, I am more concerned with whether or not there is any leaky behavior. Once the implementation is memory-safe, the proper Haskell-y abstraction can be debated at our leisure, whereas I cannot move forwards without a working implementation.

* This actually does not work with the way that this library is structured.

For example, take the following function that we must be able to run

-- int botan_mac_output_length(botan_mac_t mac, size_t* output_length);

I am not sure how to pass in the MacStruct here. Previously, when type OpaqueMac = Ptr () and newtype Mac = Mac (ForeignPtr OpaqueMac), the double-pointer seemed to be represented properly, and I could do:

-- Note the `OpaqueMac / Ptr ()` for `botan_mac_t mac`
foreign import ccall unsafe botan_mac_output_length :: OpaqueMac -> Ptr CSize -> IO BotanErrorCode

macOutputLength :: Mac -> IO Int
macOutputLength (MkMac macForeignPtr) = withForeignPtr macForeignPtr $ \ macPtr -> do
    mac <- peek macPtr
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_mac_output_length mac szPtr
    fromIntegral <$> peek szPtr

However, with the new style using the uninhabited type, I cannot actually peek because it is not Storable (obviously). Instead, I have to do:

-- Note the `MacPtr` for `botan_mac_t mac`
foreign import ccall unsafe botan_mac_output_length :: MacPtr -> Ptr CSize -> IO BotanErrorCode

macOutputLength :: Mac -> IO Int
macOutputLength (MkMac macForeignPtr) = withForeignPtr macForeignPtr $ \ macPtr -> do
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_mac_output_length macPtr szPtr
        fromIntegral <$> peek szPtr

But I feel like botan_mac_t mac (which matched OpaqueMac) should match MacStruct rather than MacPtr. Or is the double-pointer throwing me off again?

glguy · July 12, 2023, 6:31pm

Here’s the cleaned up version

module Help where

import Data.Word (Word32)
import Foreign.C (CString, CInt(..), withCString)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, FinalizerPtr)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, FunPtr)
import Foreign.Storable (peek)

data BotanHashStruct

-- int botan_hash_init(botan_hash_t* hash, const char* hash_name, uint32_t flags)
foreign import ccall unsafe botan_hash_init :: Ptr (Ptr BotanHashStruct) -> CString -> Word32 -> IO CInt

-- int botan_hash_destroy(botan_hash_t hash)
foreign import ccall "&botan_hash_destroy" botan_hash_destroy_funptr :: FinalizerPtr BotanHashStruct
-- this is a cheat because it ignores the 'int' return on hash_destroy

newtype BotanHash = BotanHash (ForeignPtr BotanHashStruct)

throwBotan :: CInt -> IO a
throwBotan x = fail ("Botan error: " ++ show x) -- do something better here

newBotanHash :: String -> IO BotanHash
newBotanHash name =
  alloca                                        $ \outPtr ->
  withCString name                              $ \namePtr ->
  botan_hash_init outPtr namePtr 0            >>= \ret ->
  if 0 /= ret then throwBotan ret else
  peek outPtr                                 >>= \out ->
  newForeignPtr botan_hash_destroy_funptr out >>= \fp ->
  pure (BotanHash fp)

ApothecaLabs · July 12, 2023, 6:35pm

I can post the Hash which uses the old style:

-- NOTE: I've elided irrelevant arguments

-- typedef struct botan_hash_struct* botan_hash_t;
type OpaqueHash = Ptr ()

newtype Hash = Hash { hashForeignPtr :: ForeignPtr OpaqueHash }

-- int botan_hash_init(botan_hash_t* hash, ...);
foreign import ccall unsafe botan_hash_init :: Ptr OpaqueHash -> ... -> IO BotanErrorCode
foreign import ccall "&botan_hash_destroy" botan_hash_destroy :: FunPtr (Ptr OpaqueHash -> IO ())

hashInit :: ByteString -> IO Hash
hashInit name = do
    hashForeignPtr <- malloc >>= newForeignPtr botan_hash_destroy
    withForeignPtr hashForeignPtr $ \ hashPtr -> do
        throwBotanIfNegative_ $ botan_hash_init hashPtr ...
    return $ Hash hashForeignPtr

-- int botan_hash_output_length(botan_hash_t hash, size_t* output_length);
foreign import ccall unsafe botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO BotanErrorCode

hashOutputLength :: Hash -> IO Int
hashOutputLength (Hash hashForeignPtr) = withForeignPtr hashForeignPtr $ \ hashPtr -> do
    hash <- peek hashPtr
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_hash_output_length hash szPtr
        fromIntegral <$> peek szPtr

You can see the hash <- peek hashPtr, which is a similar scenario to the issue mentioned.

ApothecaLabs · July 12, 2023, 6:55pm

I shall give this a try momentarily - right now I need some lunch

Despite any confusion, all of this has been wonderfully illustrative as to the proper internal workings of Haskell FFI. Taking the time to do it right, now, will save us from pain, later.

Ah, yes, you see it too. I havent figured out a way around this one - I’m not even sure where throwBotanIfNegative would throw to, even if we could (or what our response would be)!

back from lunch

If I could give this a second heart, I would. This has been a most insightful discussion. I believe this answers all of the minor quibbles and questions that I had previously regarding safe and proper initialization. Furthermore, it is more ergonomic, effectively moving the peek from all of the accessory functions into the initializer.

I have additionally implemented one of the accessory functions, to sanity check our definitions, and it works out nicely:

Barring any unforeseen questions, I will be retrofitting the other modules with these learnings.

ApothecaLabs · July 23, 2023, 10:28pm

I’m back with another headscratcher.

I have the following FFI function that is expecting an array of pointers to multiple buffers of known size as input. Simplifying to CStrings, It has a declaration similar to the following:

-- int use_multiple_cstrings(cchar *const*const inputs, size_t numBuffers, ...);
foreign import ccall unsafe use_multiple_cstrings :: Ptr (Ptr CChar) -> CSize -> ... -> IO BotanErrorCode

If we knew the number of buffers ahead of time (eg, 3), we could just write it out:

useMultipleCStrings :: ByteString -> ByteString -> ByteString -> IO ()
useMultipleCStrings a b c = do
    useAsCString a $ \ aPtr -> do
        useAsCString b $ \ bPtr -> do
            useAsCString c $ \ cPtr -> do
                allocaArray 3 $ \ (ptrArrayPtr :: Ptr (Ptr CChar)) -> do
                    let ptrs = [aPtr,bPtr,cPtr]
                    pokeArray ptrArrayPtr ptrs
                    throwBotanIfNegative_ $ use_multiple_cstrings ptrArrayPtr 3

However, the number of buffers / cstrings is not known ahead of time. I could just unsafely reach in to grab the pointers eg, unsafeUseAsCString return, but that’s dangerous (what if the bytestring gets move, etc).

Is there an idiomatic way of doing this safely? I feel like I should be able to do something with join or foldr or traverse with useAsCString to get useAsCStringArray :: [ByteString] -> ([Ptr CChar] -> IO a) -> IO a, but it eludes me for the moment.

I’ve whipped up an interim solution which works.

withPtrs :: (forall a . typ -> (ptr -> IO a) -> IO a) -> [typ] -> ([ptr] -> IO b) -> IO b
withPtrs withPtr []         act = act []
withPtrs withPtr (typ:typs) act = withPtr typ $ \ typPtr -> withPtrs withPtr typs (act . (typPtr:))

It seems to work fine:

> withPtrs withCString ["foo","bar","qux"] (print <=< mapM peekCString)
["foo","bar","qux"]

I still think it could be done with foldr, maybe I’ll return to this when I feel like playing code golf again.

Ambrose · July 24, 2023, 12:32am

I honestly think your solution seems fine? Unless I’m missing some weakness it has.

ApothecaLabs · July 24, 2023, 12:35am

It’s mostly curiosity now that there’s a solution

glguy · July 24, 2023, 1:28am

unsafeUseAsCString isn’t unsafe because of GC-related movement, but you can read about what is unsafe about it in the docs.

glguy · July 24, 2023, 1:35am

When I needed to work with a lot of this kind of “withPtr” FFI, I made a newtype for it.

NestedIO is for IO stuff that needs to operate in these nested “with” contexts.

This newtype allows me to implement a wrapper for any of the “withPtr”-style functions with nest1 :: (forall r. (a -> IO r) -> IO r) -> NestedIO a

With this I can take withText :: Text -> NestedIO FgnStringLen and then use traverse to do a version of what you’re asking about:

traverse withText :: [Text] -> NestedIO [FgnStringLen]

Then you can see this in a larger example like in withRawIrcMsg where a great many “withPtr” like operations are all chained together.

ApothecaLabs · July 24, 2023, 1:44am

Indeed - that is why I avoided unsafely reaching in to grab the pointer via unsafeUseAsCString return - I probably should have specified ‘collected’ instead of ‘move(d)’. (Does getting garbage-collected count as moving?)

This is most excellent - I already have a WithPtr type-alias, which is but a step away from it:

type WithPtr typ ptr = (forall a . typ -> (ptr -> IO a) -> IO a)

I shall definitely have to give some thought towards refining things further. I especially like the trick with traverse

glguy · July 24, 2023, 2:05am

The pointer can’t be collected when using it with that operation. That’s not how it is unsafe. It’s the same mechanism as withPtr or alloca.

ApothecaLabs · July 24, 2023, 2:42am

That was a small joke in reference to my linguistic faux pas, equivalent to calling deleting a file “moving it to /dev/null”, that sort of thing. I understand how unsafeUseAsCString is unsafe - it allows breaking referential transparency (editing the bytes edits the bytestring and any other sharing the same backing memory) and the pointer’s lifetime is scoped to wrapped action (so don’t use the pointer after).

Apologies for any confusion caused.

wiz · July 24, 2023, 7:52am

I prefer this to having a newtype over Ptr () (that would be void* and you can pull it out thin air) or Ptr Uninhabited.

It protects the pointer, removing the pointer details from public API while keeping it in one piece, without splitting into two or three distinct entities only to recombine them back again.

jackdk · July 24, 2023, 12:41pm

This is how I solved a similar problem in my libtelnet binding:

-- | Collect '[ByteString]' into a temporary array of strings in a
-- 'Ptr CString', for passing to C functions.
useAsCStrings :: [ByteString] -> (Ptr CString -> IO a) -> IO a
useAsCStrings list f = go list [] where
  go [] css = withArray (reverse css) f
  go (bs:bss) css = B.useAsCString bs $ \cs -> go bss (cs:css)