Expert Advice Needed: Best Resources for Unsafe FFI, Memory Safety, Real-Time Audio Threads, and GHC Core Optimization

I’m looking for the best resources on the following topics:

1 Unsafe FFI:

  • Comprehensive guides or tutorials
  • Best practices and common pitfalls
  • Case studies or examples

2 Ensuring Safe Memory Operations:

  • Strategies for managing and terminating safe memory operations in various components
  • Tools and techniques for memory safety
  • Relevant research papers or articles

3 Real-time Threads for Audio:

  • Resources on implementing real-time threads specifically for audio applications
  • Performance considerations and optimization tips
  • Examples of real-time audio processing in practice

4 Additionally, I’m interested in:

  • Techniques and Tools for Code Optimization through GHC Core Analysis:
  • Guides on understanding and analyzing GHC Core code
  • Any material useful in this area

Any recommendations for books, articles, tutorials, YouTube videos, blog posts, leaked documents, or secret scriptures on these subjects would be greatly appreciated!

This depends on what you mean by realtime threads. Although I’m not sure that you are asking about “real” realtime threads, here is the Linux documentation on them anyway (they do not technically need a PREEMPT_RT kernel, but using one is preferred):
https://wiki.linuxfoundation.org/realtime/documentation/technical_basics/start
Basically you “just” need to give your realtime threads a realtime (higher) priority, but not too high.
https://wiki.linuxfoundation.org/realtime/documentation/technical_basics/sched_policy_prio/start

The general answer is “only use unsafe if benchmarks show a meaningful speedup”. The overhead of the safe wrapper is negligible (~100 ns), and unsafe calls stall garbage collection (as per the guide), so if you’re relying on multithreading you may easily lose that time across other threads.
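To make the comparison concrete, here is a minimal sketch of how the two kinds of import are declared. It binds C's `sin` from `math.h` twice; the Haskell names `cSinUnsafe` and `cSinSafe` are made up for this illustration. Which of the two is faster in a given program is something only a benchmark of that program can tell.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..))

-- Hypothetical binding names; only the C function `sin` is real.
-- `unsafe`: cheapest call, but garbage collection cannot run while it executes.
foreign import ccall unsafe "math.h sin" cSinUnsafe :: CDouble -> CDouble

-- `safe`: goes through the wrapper, so other Haskell threads and the GC
-- can make progress during the call.
foreign import ccall safe "math.h sin" cSinSafe :: CDouble -> CDouble

main :: IO ()
main = do
  print (cSinUnsafe 0)
  print (cSinSafe 0)
```

Both imports call the same C function, so a benchmark only has to swap one name for the other.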

For understanding FFI calls and intricacies, I found myself regularly referring back to the GHC User’s Guide. (BurningWitness linked it above.)

There have been a few videos in recent years regarding optimizing Haskell programs with the help of Core. Alexis King made a great video: https://www.youtube.com/watch?app=desktop&v=yRVjR9XcuPU
and Richard Eisenberg did a ton of Haskell videos a couple of years ago on the same channel; I think a few were concerning Core and optimization.

Another good entry is reading existing trusted code that makes use of the FFI. I recommend bytestring and text. There’s a good amount of inline documentation.

If you want “real” realtime threads, the GC isn’t allowed to run anyways (by passing -I0 to RTS).

That was part of the reason I wondered about it in the first place. Since GC creates problems with real-time audio, especially with low latencies, unsafe calls (used wisely) could be a way to deal with this. This makes FFI in Haskell as efficient performance-wise as a systems language (same as Rust, for example), according to the benchmark I saw. The problem is that one starts to deal with memory, and things not common in Haskell.

This is not exactly what you’re asking about, but I work on an audio sequencer in Haskell: karya (GitHub: elaforge/karya, a music sequencer and generalized notation).

There’s plenty of unsafe FFI, but I don’t directly do realtime audio. At most I schedule realtime things, and it’s the OS audio or MIDI system or DAW that does the realtime part. For DSP and synthesis I call FAUST-generated or C library code via FFI. I think this is the practical path, but of course it depends on your goals.

One package I know of that does fully realtime synthesis in Haskell is synthesizer-core (“Audio signal processing coded in Haskell: Low level part”). As far as I know, its author has to be careful about optimization to make sure it stays realtime and doesn’t get stuck in the GC at an awkward time.

Not really, pretty much anything you do in the language allocates stuff, so garbage collection has to run eventually. If you’re writing a hard real-time application, this won’t cut it.

For soft real-time, things get muddy. Contrived microbenchmarks don’t tell the full story, which is that while the language has all the tools to do the job, people don’t have any grand new architectural approaches to make it work nicely. State is generally managed as giant blobs of immutable records (which is questionable performance-wise and fosters the need for lens), and deterministic memory management is generally achieved through imperative control flow (where ResourceT is the go-to solution; here’s the reverse dependency list). I personally dislike this heavily because it means writing the same old high-level imperative code in a language with worse syntax for it, but I can’t deny that it works.

Are there better ways to structure things? Probably, but I haven’t seen it done.

You don’t allocate anything once a realtime thread has started. You (re-)use the memory you allocated before “entering” realtime, or allocate in the non-realtime parts and access that from the realtime thread(s). And afterwards you can just exit the process instead of deallocating anything.

True, but all of that would be squarely outside of Haskell, programmed in an imperative language without garbage collection. I guess this means that the Haskell program in this design would be a soft real-time one serving the hard real-time C thread.

No, “just” disable the garbage collector and don’t allocate anything. In particular, everything needs to be unboxed and mutated, so you would work almost exclusively with GHC.Prim and mutable, unpacked arrays. Of course, that’s not exactly what Haskell is good at. But then again, we all know that Haskell is useless anyway :wink:
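A minimal sketch of that style, assuming the buffer is allocated once before the realtime part starts and only mutated in place afterwards. The name `applyGain` and the buffer size are made up for the example. Whether the loop really runs without heap allocation depends on the optimization level and GHC version, so it would still need checking in Core.

```haskell
module Main where

import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (mallocArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- | Multiply every sample in a preallocated buffer by a gain, in place.
-- (Hypothetical helper, not taken from any library.)
applyGain :: Float -> Ptr Float -> Int -> IO ()
applyGain gain buf n = go 0
  where
    go i
      | i >= n = pure ()
      | otherwise = do
          x <- peekElemOff buf i
          pokeElemOff buf i (gain * x)
          go (i + 1)

main :: IO ()
main = do
  let n = 4
  -- Non-realtime part: allocate and fill the buffer up front.
  buf <- mallocArray n :: IO (Ptr Float)
  pokeArray buf [1, 2, 3, 4]
  -- "Realtime" part: only reads and writes existing memory.
  applyGain 0.5 buf n
  -- Back in the non-realtime part: inspect the result and clean up.
  peekArray n buf >>= print
  free buf
```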

The easiest way would be JUCE (bindings to JUCE), I guess (I don’t know much about audio though) https://juce.com/

At which point you’d be using Haskell as C, since you’re barred from using pretty much all standard Haskell functions, with the added bonus that any allocation mistake you make silently permanently leaks memory. It’s a thing you can do for fun, but real applications should never touch this.

Hello @smoge, that’s a great list of resources to have, thanks for starting such a collection :slight_smile:

With regard to real-time audio, I guess that would depend on the underlying operating system. For example, you will want to use JACK (now PipeWire) on Linux or ASIO on Windows. There is already a binding for JACK, which provides a helper for this: Sound.Jack.Audio.withProcessMono, where you would implement a Sample -> IO Sample function.
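To give an idea of what such a per-sample function could look like, here is a small sketch that does not depend on the JACK binding. The local `Sample` alias (assumed here to be `CFloat`) and the function `clipGain` are stand-ins invented for the example. Check the binding's documentation for the real `Sample` type and for how the function is passed to `withProcessMono`.

```haskell
module Main where

import Foreign.C.Types (CFloat)

-- Assumption: a stand-in for the binding's sample type.
type Sample = CFloat

-- | Apply a gain, then hard-clip the result to the range [-1, 1].
-- With the gain applied, this has the Sample -> IO Sample shape.
clipGain :: Sample -> Sample -> IO Sample
clipGain gain x = pure (max (-1) (min 1 (gain * x)))

main :: IO ()
main = do
  -- Feed a few test samples through the processing function.
  out <- mapM (clipGain 2) [-0.75, 0.25, 0.5]
  print out
```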

Otherwise you can use something like SDL.Audio, where you would implement a callback to fill the next buffer to be played. That’s what I’m using in the simple-dsp audio player, and that works well at 30 buffers per second with a basic sound card, see: simple-dsp:Player. At this rate, using a realtime thread does not seem necessary; even running the code in GHCi without any optimizations is sufficient, at least for the playback part.

There is also FMOD, which is geared toward video games. There are no official bindings, but here is an example usage: defect-process setup doc.

It would help to know what you are trying to achieve!
Cheers,
-Tristan

Even less, you’re using the realtime-safe part of C (or C++). Oh, and I forgot to mention: no exceptions allowed either.

There is also OpenAL Soft, an open-source version of ol’ dusty OpenAL that has been maintained and extended for 15 years. Got bindings for it here.

"GHC, since version 8.4, guarantees that garbage collection will never occur during an unsafe call, even in the bytecode interpreter, and further guarantees that unsafe calls will be performed in the calling thread. Making it safe to pass heap-allocated objects to unsafe functions."

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/ffi.html

I’m pretty sure you can’t disable gc, only idle gc. So accidentally allocating “just” destroys your realtime guarantee, but is otherwise safe. Only way I can think of that actually disables gc is to not return from an unsafe ffi call, but even there I don’t think it stops minor gc.

Next to using only low level primitives, you also have the (terrible) option of writing normal haskell code and validating manually that ghc optimizes well enough to not allocate.
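One rough way to do that validation, besides reading the Core dumped with `-ddump-simpl`, is to measure how many bytes a piece of code allocates. The sketch below uses `getAllocationCounter` from `System.Mem`; the helper `allocatedBy` and the loop `sumSquares` are made up for the example. The reported number includes a little measurement overhead and depends on the optimization level and GHC version, so treat it as a hint and not as a proof.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Exception (evaluate)
import Data.Int (Int64)
import System.Mem (getAllocationCounter)

-- | Sum of squares from 1 to n, written as a strict accumulator loop.
sumSquares :: Int -> Int
sumSquares n = go 0 1
  where
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i * i) (i + 1)

-- | Run an action and report the bytes the current thread allocated meanwhile.
-- The counter decreases as the thread allocates, hence before - after.
allocatedBy :: IO a -> IO (a, Int64)
allocatedBy act = do
  before <- getAllocationCounter
  r <- act
  after <- getAllocationCounter
  pure (r, before - after)

main :: IO ()
main = do
  (s, bytes) <- allocatedBy (evaluate (sumSquares 1000000))
  putStrLn ("result: " ++ show s ++ ", bytes allocated: " ++ show bytes)
```

Comparing the number with and without `-O2` gives a quick impression of how much the optimizer removed.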

I’ve always wondered what it would take to have some sort of NoRTS pragma that causes GHC to statically verify your function/module/package/etc doesn’t have any dependencies on the RTS. Kind of like first-class inspection testing. And then said Haskell programs/libraries could optionally be compiled & distributed sans RTS.

1 It is necessary to choose between unsafe and safe depending on the FFI call’s execution time.
Since a context switch takes on the order of microseconds, unsafe is better if the call takes less than 1 μs. A system call should take a similar amount of time. Also consider the number of calls.
2 If it is necessary to call an external destructor using WeakPtr with ForeignPtr, the GC time will increase in proportion to the number of pointers generated.
There is no problem when allocating memory with GHC.
3 Compare the generated assembly code. Perf can show the bottleneck; see “Perf for low-level profiling” on School of Haskell.
Be careful with memory layout. In general, vectors have a higher cache hit rate than lists, so they are faster. Lists of lists easily lose several μs.
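As an illustration of point 2, here is a sketch of the two allocation styles. The names `ghcBuffer`, `cBuffer` and `roundTrip` are invented for the example. The first buffer lives in memory managed by GHC and needs no finalizer; the second comes from C's `malloc` and has `free` attached as a finalizer, which is the kind of per-pointer bookkeeping the GC has to do.

```haskell
module Main where

import Foreign.ForeignPtr
  (ForeignPtr, mallocForeignPtrArray, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree)
import Foreign.Marshal.Array (mallocArray, peekArray, pokeArray)

-- | Buffer allocated by GHC itself; no external finalizer is registered.
ghcBuffer :: Int -> IO (ForeignPtr Float)
ghcBuffer = mallocForeignPtrArray

-- | Buffer allocated with C malloc; free() runs as a finalizer
-- once the ForeignPtr becomes unreachable.
cBuffer :: Int -> IO (ForeignPtr Float)
cBuffer n = do
  p <- mallocArray n
  newForeignPtr finalizerFree p

-- | Write a list into the buffer and read it back.
roundTrip :: ForeignPtr Float -> [Float] -> IO [Float]
roundTrip fp xs = withForeignPtr fp $ \p -> do
  pokeArray p xs
  peekArray (length xs) p

main :: IO ()
main = do
  a <- ghcBuffer 3
  b <- cBuffer 3
  roundTrip a [1, 2, 3] >>= print
  roundTrip b [4, 5, 6] >>= print
```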
