Looking for feedback on llama.cpp bindings

Hey folks, I’m working on some bindings for llama.cpp, and before I try pushing this to Hackage I was hoping to get some feedback from others in the community. Long-term my goal is to try embedding some LLM functionality in my own home automation system, but I’m relatively inexperienced at both FFI and anything to do with LLMs, so I’d appreciate any feedback anyone has.

I also figured others may be interested in messing around with this and helping me work out an intermediate API layer on top of the low-level FFI interface I have now; I haven’t yet figured out what a good intermediate wrapper would look like. I’d also like to add more examples mimicking what’s present in llama.cpp, make the main example feature-complete with its upstream counterpart, add tests, and so on.

In any case, if anyone wants to poke at it with me, let me know, or just leave some feedback on the GitHub repo. Thanks for taking a look!

(P.S. here is where I shamelessly mention that I’m looking for employment, in Haskell or otherwise, if anyone is looking for a curious, experienced, friendly developer :sweat_smile:)


I haven’t used c2hs for my FFI needs so far. Did you pick it over hsc2hs for any particular reasons or feature set I could learn from?

I normally would map uint32_t to Word32. It looks like you’re using CUInt and friends, instead. Is that deliberate?

Hey @glguy, thanks for taking a look! My answer to both of your questions is, essentially, inexperience: I’m a pretty mediocre C programmer and had only read others’ FFI code up until now. To address your second question first, I probably should have asked about my type choices in my original post, because I felt pretty clueless when trying to map types from C to Haskell; my choices were based largely on fumbling through the complaints the typechecker and c2hs gave me, along with my best guesses. If you have suggestions on how I can build a better foundation here, I’d appreciate it; I couldn’t find much guidance on this. In any case, I’ll take another look through my type choices and see what else I can learn and adjust.

As for why I chose c2hs, this was based on my assumption that it would help me make fewer mistakes: from my limited understanding, c2hs is higher-level and more comprehensive in what it generates (I think I got that idea from this SO post in particular; ezyang’s guide, which that image comes from, was a big help as well). I also leaned heavily on the TensorFlow FFI code as a reference, since it’s a similar domain with similar usage patterns, and it uses c2hs (which, now that I think of it, is probably also where some of my type choices came from, going back to your other question).
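For anyone unfamiliar with c2hs, the appeal is that a single `{#fun ...#}` hook in a `.chs` file expands into both the `foreign import` and the argument/result marshalling, checked against the C header. A minimal sketch of what that looks like (the header name and `my_add` function here are made up for illustration, not part of any real API):

```haskell
-- Bindings.chs (illustrative; `my_add` and "mylib.h" are hypothetical)
module Bindings where

#include "mylib.h"

-- c2hs expands this hook into a foreign import for the C function
-- `int my_add(int, int)` plus marshalling between Haskell Int and C int:
{#fun pure my_add as myAdd { `Int', `Int' } -> `Int' #}
```

By contrast, hsc2hs is lower-level: it mostly substitutes constants, offsets, and sizes into hand-written `foreign import` declarations rather than generating the marshalling for you.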


@glguy to follow up, I just read this section properly for the first time–doesn’t this imply I’d always want to use the C* types when writing FFI code to get the guarantees as mentioned? Or were you more suggesting that I might prefer to make the Haskell interface of all these API calls only expose e.g. Word32?

My thinking is that you should use CInt with int and CUInt with unsigned int, but you shouldn’t guess which C newtype corresponds to uint32_t. For that case it’s Word32.
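To make the distinction concrete, here is a sketch of how those mappings might look in raw foreign imports. The function names are hypothetical, not from the actual llama.cpp API; the point is that CInt/CUInt track the platform's (possibly non-32-bit) int, while uint32_t is exactly Word32:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Word       (Word32)
import Foreign.C.Types (CInt (..), CUInt (..))

-- Hypothetical C declarations and their Haskell counterparts:
--   int      get_count(void);       -- plain int, platform-sized -> CInt
--   unsigned get_flags(void);       -- unsigned int              -> CUInt
--   uint32_t get_vocab_size(void);  -- exactly 32 bits           -> Word32
foreign import ccall unsafe "get_count"      c_getCount     :: IO CInt
foreign import ccall unsafe "get_flags"      c_getFlags     :: IO CUInt
foreign import ccall unsafe "get_vocab_size" c_getVocabSize :: IO Word32
```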

I see, I was conflating the unsigned types; it took me a bit to understand my mistake. Thanks!

Hi,

I haven’t really tried this yet, but from my basic experiments with Llama, what I’d really want is 1) the ability to constrain the output grammar and 2) an example of a good model to start with.

If you are able to give (1) as a feature and (2) as an example, I may be able to put this to very good use :slight_smile:
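For what it’s worth, llama.cpp already supports (1) natively via GBNF grammar files, so exposing that through the bindings may be enough. As a rough illustration of the format, a grammar constraining the model to a yes/no answer looks something like this:

```
root   ::= answer
answer ::= "yes" | "no"
```

Exposing a way to pass such a grammar through to the sampler would likely cover this use case.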