Questions about FFI, ForeignPtr, and Opaque Types

I am working on bindings to the Botan cryptography library, but need some help with the FFI. Context: I have done a fair amount of C in the past, but it has been a while (10 years), so I am a bit rusty. I have messed around with Foreign.Ptr and the FFI before, but nothing this complex before.

I need to implement the following C interface, using the Haskell FFI:

typedef opaque *botan_hash_t

    An opaque data type for a hash. Don’t mess with it.

int botan_hash_init(botan_hash_t hash, const char *hash_name, uint32_t flags)

    Creates a hash of the given name, e.g., “SHA-384”.

    Flags should always be zero in this version of the API.

int botan_hash_destroy(botan_hash_t hash)

    Destroy the object created by botan_hash_init.

int botan_hash_name(botan_hash_t hash, char *name, size_t *name_len)

    Write the name of the hash function to the provided buffer.

I have found Foreign.ForeignPtr which sounds like precisely what I need to run the finalizer automatically upon garbage collect, but I do not know how to represent the botan_hash_t object properly, nor make it Storable. I have read as many docs as I can find, but the recent degradation of Google and Reddit has make locating example material difficult, so I am turning here.

This may rightfully horrify you, but so far, I have:

type OpaqueHash = Ptr () -- ??? I know this is wrong
newtype Hash = Hash (ForeignPtr OpaqueHash) -- I suspect this is wrong

foreign import ccall unsafe botan_hash_init :: Ptr OpaqueHash -> Ptr CChar -> Word32 -> IO BotanErrorCode
foreign import ccall "&botan_hash_destroy" botan_hash_destroy :: FinalizerPtr OpaqueHash

hashInit :: ByteString -> IO Hash
hashInit name = do
    opaqueHash <- mallocForeignPtr
    withForeignPtr opaqueHash $ \ opaqueHash' -> do
        withByteArray name $ \ name' -> do
            throwBotanIfNegative_ $ botan_hash_init opaqueHash' name' 0
    addForeignPtrFinalizer botan_hash_destroy opaqueHash
    return $ Hash opaqueHash

foreign import ccall unsafe botan_hash_name :: Ptr OpaqueHash -> Ptr CChar -> Ptr CSize -> IO BotanErrorCode

hashName :: Hash -> IO ByteString
hashName (Hash opaqueHash) = alloca $ \ szPtr -> do
    bs <- ByteArray.alloc 64 $ \ bytes -> do
        withForeignPtr opaqueHash $ \ opaqueHash' -> do
            throwBotanIfNegative_ $ botan_hash_name opaqueHash' bytes szPtr
    sz <- peek szPtr
    return $ ByteArray.take (fromIntegral sz) bs

This compiles, and h <- hashInit "SHA-256" runs. However, it appears that it is not initialized properly, because hashName h will (correctly) trigger throwBotanIfNegative_ - but it can’t be too far off, because it doesn’t segfault or get into serious trouble. What am I doing wrong here, and how do I properly represent the OpaqueHash type?


Additionally, as another knowledge check, I would like to implement this, due to the additional indirection at botan_hash_t *dest. I will need to be able to implement many similar cases.

int botan_hash_copy_state(botan_hash_t *dest, const botan_hash_t source)

    Copies the state of the hash object to a new hash object.

Are there any particularly good libraries or comprehensive examples of ForeignPtr usage with opaque C data types? If you have experience with implementing similar FFI binding needs, I wouldn’t mind a chance to pick your brain and ask more pointed questions.

Here’s my FFI code from libtelnet (GPLv3+), which might be interesting to you.

1 Like

Can you use mallocForeignPtrBytes? And get the size of the opaque (botan_hash_t) type from hsc?

1 Like

@jackdk

I will take a good look. I do see this uninhabited type, which is similar to a few of the resources I did find:

-- | Uninhabited type for pointer safety (@telnet_t@).
data TelnetT

However, the telnet library has init functions of the form telnet_t* telnet_init(...) that directly return the initialized pointer. Botan has some functions of the form int botan_hash_init(botan_hash_t hash, ...) If we were to use an uninhabited data Hash like TelnetT, I do not understand what to pass in to botan_hash_t, especially if botan_hash_t * corresponds to Ptr Hash. I am actually quite confused by this. Botan has a documentation error - I thought I was going mad :crazy_face:! The type of botan_hash_init is actually int botan_hash_init(botan_hash_t * hash, ...), which makes more sense. This does not get me to an initialized hash object quite yet, but it gets me much closer.

I also do see that newForeignPtr does not require Storable which seems like it would be easier to implement, though per Foreign.ForeignPtr docs:

Use of mallocForeignPtr and associated functions is strongly recommended in preference to newForeignPtr with a finalizer.


@Ambrose I may have to go that route - I was hoping to avoid using either CPP or *.hsc (I do not mind unrolling things by hand) but I may have to since unlike constant enum values, pointer size values are machine-dependent.

I quite like using hsc. And this sort of stuff is what it’s meant for. It does have a couple of rough corners. It doesn’t play nice with ghci reloading (unless there’s a way to do this) and it’s super slow when cross-compiling.

1 Like

You probably can’t implement Storable if the library keeps the type opaque. However, the size of ForeignPtr ... is known since it’s just a pointer.

1 Like

Only possible if you want the freeFinalizer — in my case I have to use the destroy function provided by libtelnet.

1 Like

This is somewhat orthogonal to the question about ForeignPtr: are the foreign import types correct?

It seems to me that botan_init_hash has the correct type Ptr (Ptr ()) -> ..., but botan_hash_name does not. I think the latter should be Ptr () -> ....

This isn’t exactly the code you were trying to write, but the following seems to return something sensible for me (48 for SHA-384):

main = do
  outputLength <- botanHashLength "SHA-384"
  print outputLength

type OpaqueHash = Ptr Word8

foreign import ccall unsafe "botan_hash_output_length" 
  botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO Int

foreign import ccall unsafe "botan_hash_init" 
  botan_hash_init :: Ptr OpaqueHash -> Ptr CChar -> Word32 -> IO Int

botanHashLength :: String -> IO Int
botanHashLength string = withCString string $ \ptr -> do
  ppHash <- malloc
  botan_hash_init ppHash ptr 0
  pHash <- peek ppHash
  pSize <- malloc
  botan_hash_output_length pHash pSize
  fromIntegral <$> peek pSize

EDIT: This is very rough, but for setting up the ForeignPtr properly, the following seems to work for me:


type OpaqueHash = Ptr ()

foreign import ccall unsafe "botan_hash_name" 
  botan_hash_name :: OpaqueHash -> Ptr CChar -> Ptr CSize -> IO Int

foreign import ccall unsafe "&botan_hash_destroy" 
  botan_hash_destroy :: FunPtr (OpaqueHash -> IO ())

foreign import ccall unsafe "botan_hash_output_length" 
  botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO Int

foreign import ccall unsafe "botan_hash_init" 
  botan_hash_init :: Ptr OpaqueHash -> Ptr CChar -> Word32 -> IO Int


newtype Hash = Hash (ForeignPtr ())

hashInit 
  :: String
  -> IO Hash
hashInit string = withCString string $ \ptr -> do
  ppHash <- malloc
  botan_hash_init ppHash ptr 0
  pHash <- peek ppHash
  fp <- newForeignPtr botan_hash_destroy pHash
  return $ Hash fp

botanHashName 
  :: Hash
  -> IO String
botanHashName hash@(Hash fp) = withForeignPtr fp $ \pHash -> do
  pSize <- malloc
  botan_hash_output_length pHash pSize
  reqLen <- peek pSize
  pCChar <- mallocArray (fromIntegral reqLen)
  botan_hash_name pHash pCChar pSize
  peekCString pCChar

main = do
  hash <- hashInit "SHA-384"
  name <- botanHashName hash
  print name
1 Like

Thank you - this helped confirm what I needed to do.


I see now! This was most helpful! :slight_smile:


Indeed I am giving them closer scrutiny after discovering that some things were incorrectly documented. Your comment has been very helpful, and you’ve expanded on your answer greatly as I was integrating the results of your initial comment with what everyone else has said - but it appears we converged something very similar regardless!


It is late, but I think I have the clarity to move forward tomorrow with the Botan.Hash module. Thank you all so much for your very helpful pointers! :smiley:

2 Likes

One thing I’ve seen some libraries do is to do the foreign pointers like this:

newtype Hash = Hash (ForeignPtr Hash)

I believe it was haskell-gi that does things that way.

3 Likes

If someone wouldn’t mind sanity checking this for me, I want to make sure that this initialization pattern is sane and doesn’t leak:

-- typedef struct botan_foo_struct* botan_foo_t;
type OpaqueFoo = Ptr ()
newtype Foo = Foo { fooForeignPtr :: ForeignPtr OpaqueFoo }

-- int botan_foo_init(botan_foo_t* foo);
foreign import ccall unsafe botan_foo_init :: Ptr OpaqueFoo -> IO BotanErrorCode

-- int botan_foo_destroy(botan_foo_t foo);
foreign import ccall "&botan_foo_destroy" botan_foo_destroy :: FunPtr (Ptr OpaqueFoo -> IO ())

fooInit :: IO Foo
fooInit name = do
    fooForeignPtr <- malloc >>= newForeignPtr botan_foo_destroy
    withForeignPtr fooForeignPtr $ \ fooPtr -> do
        throwBotanIfNegative_ $ botan_foo_init fooPtr
    return $ Foo fooForeignPtr

My concern is that Botan's design is to pass a pointer-pointer in to the initializer and return an success-error code, rather than return an initialized pointer. Being unfamiliar with ForeignPtr, I want to ensure that the pointer that we malloc for use with botan_foo_init is itself destroyed with botan_foo_destroy when the ForeignPtr is garbage collected, and not just the opaque object that it was pointing to.

I think this is correct, because when I used alloca instead of malloc (because I was replacing ByteArray.alloc), I got exceptions because long-lived references were having their pointers freed - but what I don’t want now is the opposite problem of leaking memory. I made the swap to malloc after reading the docs more closely (and checking the source of ByteArray.alloc revealed that it actually used mallocByteString, too).

So now, I am not observing any faulty behavior, but I’d surely appreciate a sanity check from another set of eyes.

This should be

-- typedef struct botan_foo_struct* botan_foo_t;
data BotanStruct -- empty data type used only as a distinguishing tag for pointer types
type BotanPtr = Ptr BotanStruct -- optional type synonym
newtype Botan = MkBotanForeignPtr { getBotanForeignPtr :: ForeignPtr BotanStruct }

-- int botan_foo_init(botan_foo_t* foo);
foreign import ccall unsafe botan_foo_init :: BotanPtr -> IO BotanErrorCode

The type parameter of Ptr is phantom and just helps you distinguish pointers to one type of thing from another. For opaque types you should either make a new, empty data type as I’ve done above, or use the circular newtype pattern that appeared in an earlier post newtype Botan = Mk (ForeignPtr Botan). I like this version less, myself, because I like having the independent type that can be used both to index Ptr and ForeignPtr as we alternate between the two.

You shouldn’t use malloc because BotanStruct won’t be Storable because it’s opaque. You can instead use mallocBytes :: Int -> IO (Ptr a).

Also, don’t use newForeignPtr until you’ve successfully initialized your struct. Presumably you don’t destroy something that isn’t initialized successfully. So first mallocBytes, then initialize and throw on error like you did, then newForeignPtr with the destructor.

4 Likes

So, something rather like this?

fooInit :: IO Mac
fooInit = do
    fooPtr <- mallocBytes _ -- What goes here?
    throwBotanIfNegative_ $ botan_foo_init fooPtr
    fooForeignPtr <- newForeignPtr botan_foo_destroy fooPtr
    return $ MkFooForeignPtr fooForeignPtr

It is this bit that still has me scratching my head. I was originally using Ptr () and malloc because of these two things:

Since a pointer is a pointer, we can of course cast between Ptr () and Ptr OpaqueStruct (to use my initial terminology). We aren’t responsible for initializing the botan_struct memory, we just need to hand it our own pointer that itself won’t be freed - so how many bytes do we need to allocate for the pointer itself?

Since all pointers are the same size, and malloc is just mallocBytes (sizeOf (undefined :: a)), we could use:

fooPtr <- mallocBytes (sizeOf (undefined :: Ptr ()))

Alternatively:

fooPtr <- castPtr <$> malloc @(Ptr ())

Am I crazy? Or just missing something obvious? Or do I have this correct?

Note that I am trying it out in with the next module (Mac) before going back and refitting Hash and Random, and it actually seems to work*. For the moment, I am more concerned with whether or not there is any leaky behavior. Once the implementation is memory-safe, the proper Haskell-y abstraction can be debated at our leisure, whereas I cannot move forwards without a working implementation.


* This actually does not work with the way that this library is structured.

For example, take the following function that we must be able to run

-- int botan_mac_output_length(botan_mac_t mac, size_t* output_length);

I am not sure how to pass in the MacStruct here. Previously, when type OpaqueMac = Ptr () and newtype Mac = Mac (ForeignPtr OpaqueMac), the double-pointer seemed to be represented properly, and I could do:

-- Note the `OpaqueMac / Ptr ()` for `botan_mac_t mac`
foreign import ccall unsafe botan_mac_output_length :: OpaqueMac -> Ptr CSize -> IO BotanErrorCode

macOutputLength :: Mac -> IO Int
macOutputLength (MkMac macForeignPtr) = withForeignPtr macForeignPtr $ \ macPtr -> do
    mac <- peek macPtr
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_mac_output_length mac szPtr
    fromIntegral <$> peek szPtr

However, with the new style using the uninhabited type, I cannot actually peek because it is not Storable (obviously). Instead, I have to do:

-- Note the `MacPtr` for `botan_mac_t mac`
foreign import ccall unsafe botan_mac_output_length :: MacPtr -> Ptr CSize -> IO BotanErrorCode

macOutputLength :: Mac -> IO Int
macOutputLength (MkMac macForeignPtr) = withForeignPtr macForeignPtr $ \ macPtr -> do
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_mac_output_length macPtr szPtr
        fromIntegral <$> peek szPtr

But I feel like botan_mac_t mac (which matched OpaqueMac) should match MacStruct rather than MacPtr. Or is the double-pointer throwing me off again?

Here’s the cleaned up version

module Help where

import Data.Word (Word32)
import Foreign.C (CString, CInt(..), withCString)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, FinalizerPtr)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, FunPtr)
import Foreign.Storable (peek)

data BotanHashStruct

-- int botan_hash_init(botan_hash_t* hash, const char* hash_name, uint32_t flags)
foreign import ccall unsafe botan_hash_init :: Ptr (Ptr BotanHashStruct) -> CString -> Word32 -> IO CInt

-- int botan_hash_destroy(botan_hash_t hash)
foreign import ccall "&botan_hash_destroy" botan_hash_destroy_funptr :: FinalizerPtr BotanHashStruct
-- this is a cheat because it ignores the 'int' return on hash_destroy

newtype BotanHash = BotanHash (ForeignPtr BotanHashStruct)

throwBotan :: CInt -> IO a
throwBotan x = fail ("Botan error: " ++ show x) -- do something better here

newBotanHash :: String -> IO BotanHash
newBotanHash name =
  alloca                                        $ \outPtr ->
  withCString name                              $ \namePtr ->
  botan_hash_init outPtr namePtr 0            >>= \ret ->
  if 0 /= ret then throwBotan ret else
  peek outPtr                                 >>= \out ->
  newForeignPtr botan_hash_destroy_funptr out >>= \fp ->
  pure (BotanHash fp)
2 Likes

I can post the Hash which uses the old style:

-- NOTE: I've elided irrelevant arguments

-- typedef struct botan_hash_struct* botan_hash_t;
type OpaqueHash = Ptr ()

newtype Hash = Hash { hashForeignPtr :: ForeignPtr OpaqueHash }

-- int botan_hash_init(botan_hash_t* hash, ...);
foreign import ccall unsafe botan_hash_init :: Ptr OpaqueHash -> ... -> IO BotanErrorCode
foreign import ccall "&botan_hash_destroy" botan_hash_destroy :: FunPtr (Ptr OpaqueHash -> IO ())

hashInit :: ByteString -> IO Hash
hashInit name = do
    hashForeignPtr <- malloc >>= newForeignPtr botan_hash_destroy
    withForeignPtr hashForeignPtr $ \ hashPtr -> do
        throwBotanIfNegative_ $ botan_hash_init hashPtr ...
    return $ Hash hashForeignPtr

-- int botan_hash_output_length(botan_hash_t hash, size_t* output_length);
foreign import ccall unsafe botan_hash_output_length :: OpaqueHash -> Ptr CSize -> IO BotanErrorCode

hashOutputLength :: Hash -> IO Int
hashOutputLength (Hash hashForeignPtr) = withForeignPtr hashForeignPtr $ \ hashPtr -> do
    hash <- peek hashPtr
    alloca $ \ szPtr -> do
        throwBotanIfNegative_ $ botan_hash_output_length hash szPtr
        fromIntegral <$> peek szPtr

You can see the hash <- peek hashPtr, which is a similar scenario to the issue mentioned.

I shall give this a try momentarily - right now I need some lunch :slight_smile:


Despite any confusion, all of this has been wonderfully illustrative as to the proper internal workings of Haskell FFI. Taking the time to do it right, now, will save us from pain, later.

Ah, yes, you see it too. I havent figured out a way around this one - I’m not even sure where throwBotanIfNegative would throw to, even if we could (or what our response would be)!


back from lunch

If I could give this a second heart, I would. This has been a most insightful discussion. I believe this answers all of the minor quibbles and questions that I had previously regarding safe and proper initialization. Furthermore, it is more ergonomic, effectively moving the peek from all of the accessory functions into the initializer.

I have additionally implemented one of the accessory functions, to sanity check our definitions, and it works out nicely:

Barring any unforeseen questions, I will be retrofitting the other modules with these learnings.

1 Like

I’m back with another headscratcher.

I have the following FFI function that is expecting an array of pointers to multiple buffers of known size as input. Simplifying to CStrings, It has a declaration similar to the following:

-- int use_multiple_cstrings(cchar *const*const inputs, size_t numBuffers, ...);
foreign import ccall unsafe use_multiple_cstrings :: Ptr (Ptr CChar) -> CSize -> ... -> IO BotanErrorCode

If we knew the number of buffers ahead of time (eg, 3), we could just write it out:

useMultipleCStrings :: ByteString -> ByteString -> ByteString -> IO ()
useMultipleCStrings a b c = do
    useAsCString a $ \ aPtr -> do
        useAsCString b $ \ bPtr -> do
            useAsCString c $ \ cPtr -> do
                allocaArray 3 $ \ (ptrArrayPtr :: Ptr (Ptr CChar)) -> do
                    let ptrs = [aPtr,bPtr,cPtr]
                    pokeArray ptrArrayPtr ptrs
                    throwBotanIfNegative_ $ use_multiple_cstrings ptrArrayPtr 3

However, the number of buffers / cstrings is not known ahead of time. I could just unsafely reach in to grab the pointers eg, unsafeUseAsCString return, but that’s dangerous (what if the bytestring gets move, etc).

Is there an idiomatic way of doing this safely? I feel like I should be able to do something with join or foldr or traverse with useAsCString to get useAsCStringArray :: [ByteString] -> ([Ptr CChar] -> IO a) -> IO a, but it eludes me for the moment.


I’ve whipped up an interim solution which works.

withPtrs :: (forall a . typ -> (ptr -> IO a) -> IO a) -> [typ] -> ([ptr] -> IO b) -> IO b
withPtrs withPtr []         act = act []
withPtrs withPtr (typ:typs) act = withPtr typ $ \ typPtr -> withPtrs withPtr typs (act . (typPtr:))

It seems to work fine:

> withPtrs withCString ["foo","bar","qux"] (print <=< mapM peekCString)
["foo","bar","qux"]

I still think it could be done with foldr, maybe I’ll return to this when I feel like playing code golf again.

I honestly think your solution seems fine? Unless I’m missing some weakness it has.

It’s mostly curiosity now that there’s a solution :laughing:

unsafeUseAsCString isn’t unsafe because of GC-related movement, but you can read about what is unsafe about it in the docs.