Hsc2hs and C union declarations

anarchymatt · April 13, 2024, 4:10am

I’m working on wlhs, a library that will try to use the C FFI to create a Haskell wrapper for the wlroots library, which aids in the development of Wayland compositors.

We have a templating language to make writing the bindings go a little more smoothly, but I’m also willing to write Storable instances by hand when I need to.

I’ve run into an issue where I need to write Haskell types for this struct, which contains a C union (I did 4 years of C in college and had never seen a union until I started this project).

struct wlr_xdg_surface_configure {
	struct wlr_xdg_surface *surface;
	struct wl_list link; // wlr_xdg_surface.configure_list
	uint32_t serial;

	union {
		struct wlr_xdg_toplevel_configure *toplevel_configure;
		struct wlr_xdg_popup_configure *popup_configure;
	};
};

My understanding of how a union in C works is: “A union contains a list of field names and their types, and only one of those fields can be defined at a time. The C compiler will ‘reserve’ enough space (in the memory layout of the struct) for the largest element in the union, so that every element in the union can fit within the allotted space.”

So, since in this example, both of the fields are pointers, it makes sense that we could just reserve space for one pointer in the Storable instance.
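A minimal sketch of that idea in plain Haskell, without hsc2hs. The record and field names are hypothetical, and the offsets are worked out by hand on the assumption that `wl_list` is two pointers and that each field sits at its natural alignment; in real bindings hsc2hs would compute them from the header. The union takes up a single pointer-sized slot, stored here as an untyped `Ptr ()`.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr, plusPtr)
import Foreign.Storable (Storable (..))

-- Opaque stand-in for struct wlr_xdg_surface, only ever used behind a pointer.
data WlrXdgSurface

-- Hypothetical Haskell mirror of struct wlr_xdg_surface_configure.
data SurfaceConfigure = SurfaceConfigure
  { scSurface  :: Ptr WlrXdgSurface
  , scLinkPrev :: Ptr ()   -- wl_list.prev
  , scLinkNext :: Ptr ()   -- wl_list.next
  , scSerial   :: Word32
  , scUnion    :: Ptr ()   -- toplevel_configure or popup_configure
  } deriving (Eq, Show)

ptrSize, ptrAlign :: Int
ptrSize  = sizeOf (nullPtr :: Ptr ())
ptrAlign = alignment (nullPtr :: Ptr ())

-- Assumed offsets: three pointers, then serial, then padding up to
-- pointer alignment before the union.
serialOffset, unionOffset :: Int
serialOffset = 3 * ptrSize
unionOffset  = roundUp (serialOffset + sizeOf (0 :: Word32))
  where
    roundUp n = ((n + ptrAlign - 1) `div` ptrAlign) * ptrAlign

instance Storable SurfaceConfigure where
  sizeOf _    = unionOffset + ptrSize   -- the union adds one pointer, not two
  alignment _ = ptrAlign
  peek p = SurfaceConfigure
    <$> peekByteOff p 0
    <*> peekByteOff p ptrSize
    <*> peekByteOff p (2 * ptrSize)
    <*> peekByteOff p serialOffset
    <*> peekByteOff p unionOffset
  poke p (SurfaceConfigure surface prev next serial u) = do
    pokeByteOff p 0 surface
    pokeByteOff p ptrSize prev
    pokeByteOff p (2 * ptrSize) next
    pokeByteOff p serialOffset serial
    pokeByteOff p unionOffset u

-- Made-up value for trying the instance out.
sample :: SurfaceConfigure
sample = SurfaceConfigure nullPtr nullPtr nullPtr 42 (nullPtr `plusPtr` 4096)

-- Write a value into temporary memory and read it back.
roundTrip :: SurfaceConfigure -> IO SurfaceConfigure
roundTrip v = alloca $ \p -> poke p v >> peek p
```

This only settles the memory layout. Nothing in the type says which union member the slot currently holds.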

What I don’t understand is:

- how consumers of our library would know from the types which member of the union is ‘active’ and occupying that space in memory
- how to define a Haskell type that has either one field or the other, but not both.

I’m a Haskell newbie, I’ve done some Advent Of Code and I’ve been self-studying for about two years now with no practical project experience. I started contributing to WLHS because I thought the FFI could be a decent place to start contributing, since it doesn’t require knowledge of Type Families, Functional Dependencies, or GADTs and the like.

I’ve tried doing an internet search for “how to define a Haskell C FFI type for a C union” and not found any materials that seem like they can solve the problem.

When writing hsc2hs code, what is the best way to define C unions?

The repository for wlhs is at www.github.com/bradrn/wlhs, and my in-progress pull request is “Wlr scene” (Pull Request #18 in bradrn/wlhs), where I’ve written some structs that had unions, but ended up needing to comment them out and use an empty data declaration temporarily, so that it builds.

The wlr_xdg_surface struct is the one that I started trying to write, and then stopped because I realized that I didn’t know how to represent the union.

Pull request preview (github.com/bradrn/wlhs): “Wlr scene”, bradrn:master ← MattBrooks95:wlr_scene, opened 12:04PM - 05 Apr 24 UTC by MattBrooks95, +8149 -1

I started on WLR_scene, but it's not finished. Some of the C code that WLR_scene needs to work is trying to import a C header that is output from wayland-scanner. We may need to run wayland-scanner on some XML files and then check those C files into the repository as well. I'm not quite sure how to do it, so I commented out the structs that I had started to write. https://github.com/swaywm/wlroots/issues/1180

Unless somebody else has this packaged up for us, we may need to modify our build process to generate these files as well. I looked at what hsroots was doing. I couldn't figure out how to generate these header files with the Cabal build hooks, so I did it manually with a command like `$ wayland-scanner server-code protocol/<xml file> protocol-headers/<header file name>`

~~I also couldn't figure out how to tell Cabal to expect these local header files to be in `protocol-headers`, so I made a bash script that passes that info in on the command line~~

UPDATE 2024-04-07 ~not building because some header files can't be found.~

I also don't know how to define a marshallable type for array function parameters, and the C union syntax

```c
struct myStruct {
	union {
		type1 foo;
		type2 bar;
	};
};
```

AntC2 · April 13, 2024, 6:39am

Yeah. I’d call that a very old-fashioned and deeply unsafe style. It’s giving me unpleasant memories of how much code like that I used to write.

untagged unions are generally only provided in untyped languages or in a type-unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store a data type tag.

Hah! as if we’re so short of storage we can’t spare a word for a tag.

I’d expect that if you look at the struct this is embedded in, there’ll be some tag field that will tell you. Or you might have to hunt down the code that accesses this union, to see how it tells.

On the Haskell side, that would be an ordinary data type with two constructors (that is, two tags), each with several fields, by the look of it.
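As a rough sketch of that shape: the constructor names and the `Tag` type below are hypothetical, and `Tag` stands in for whatever discriminator the surrounding C code turns out to use. Since both union members are pointers here, each constructor carries one typed pointer.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- Opaque stand-ins for the two C structs the union members point to.
data WlrXdgToplevelConfigure
data WlrXdgPopupConfigure

-- One constructor per union member, so exactly one of them is present.
data RoleConfigure
  = ToplevelConfigure (Ptr WlrXdgToplevelConfigure)
  | PopupConfigure (Ptr WlrXdgPopupConfigure)
  deriving (Eq, Show)

-- Hypothetical discriminator, read from wherever the C side keeps it.
data Tag = IsToplevel | IsPopup
  deriving (Eq, Show)

-- Interpret the raw union slot according to the tag.
fromUnion :: Tag -> Ptr () -> RoleConfigure
fromUnion IsToplevel p = ToplevelConfigure (castPtr p)
fromUnion IsPopup    p = PopupConfigure (castPtr p)

-- Go back to the tag plus the raw slot, e.g. before poking it into C memory.
toUnion :: RoleConfigure -> (Tag, Ptr ())
toUnion (ToplevelConfigure p) = (IsToplevel, castPtr p)
toUnion (PopupConfigure p)    = (IsPopup, castPtr p)

-- Made-up value for trying the conversion out.
example :: RoleConfigure
example = fromUnion IsPopup nullPtr
```

The safety of `fromUnion` rests entirely on the tag being right; the cast itself is unchecked.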

BurningWitness · April 13, 2024, 7:22am

The power solution is to have a type class that defines offsets for each field of a C struct, and then have the programmer peek/poke around the struct themselves. This way it doesn’t matter where the knowledge regarding the current union member resides; it’s no longer your library’s responsibility.

The problem with that solution is that it isn’t standardized in the language, and there just aren’t enough people in the ecosystem who care about FFI to change that.
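One possible reading of the offset-class idea, as a sketch only. The `Offset` class, the field tag types and the helper names are all hypothetical, and the offsets are hard-coded under the assumption that pointers are aligned to their own size; real bindings would take them from hsc2hs. Both union members get an instance with the same offset, and the caller decides which one to read.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Word (Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr, nullPtr, plusPtr)
import Foreign.Storable (Storable (..))

-- Opaque stand-ins for the C structs.
data WlrXdgSurfaceConfigure
data WlrXdgToplevelConfigure
data WlrXdgPopupConfigure

-- Hypothetical class: `field` names a member of `struct` holding an `a`.
class Storable a => Offset struct field a | struct field -> a where
  offset :: Ptr struct -> field -> Int

peekField :: Offset struct field a => Ptr struct -> field -> IO a
peekField p f = peekByteOff p (offset p f)

pokeField :: Offset struct field a => Ptr struct -> field -> a -> IO ()
pokeField p f = pokeByteOff p (offset p f)

-- Assumed layout: three pointers, a uint32_t, padding, then the union.
ptrSize, serialOffset, unionOffset, structSize :: Int
ptrSize      = sizeOf (nullPtr :: Ptr ())
serialOffset = 3 * ptrSize
unionOffset  = ((serialOffset + 4 + ptrSize - 1) `div` ptrSize) * ptrSize
structSize   = unionOffset + ptrSize

-- Field tags (hypothetical names).
data Serial = Serial
data ToplevelConfigureField = ToplevelConfigureField
data PopupConfigureField = PopupConfigureField

instance Offset WlrXdgSurfaceConfigure Serial Word32 where
  offset _ _ = serialOffset

-- Both union members live at the same offset.
instance Offset WlrXdgSurfaceConfigure ToplevelConfigureField (Ptr WlrXdgToplevelConfigure) where
  offset _ _ = unionOffset

instance Offset WlrXdgSurfaceConfigure PopupConfigureField (Ptr WlrXdgPopupConfigure) where
  offset _ _ = unionOffset

-- Scratch memory of the right size, for experimenting without wlroots.
withConfigure :: (Ptr WlrXdgSurfaceConfigure -> IO b) -> IO b
withConfigure = allocaBytes structSize

-- Write the popup member, then read the same bytes through the toplevel member.
aliasingDemo :: IO Bool
aliasingDemo = withConfigure $ \p -> do
  let popup = nullPtr `plusPtr` 64
  pokeField p PopupConfigureField popup
  toplevel <- peekField p ToplevelConfigureField
  pure (castPtr toplevel == popup)
```

With this design there is no Haskell record at all, so no unused fields are read or written; the price is that nothing stops a caller from peeking the wrong member.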

All that aside, knowing that both of these members are pointers, you could just define a single pointer field in that position and unsafely castPtr. Then you hope you never come across ambiguous C union declarations again and you can pretend the problem never existed in the first place.
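A small sketch of the single-pointer approach, with hypothetical names throughout. The union slot is kept as one untyped pointer and viewed as either member with `castPtr`; the `unsafe` prefix is a reminder that the caller must already know which member is stored.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr, plusPtr)

-- Opaque stand-ins for the two C structs the union members point to.
data WlrXdgToplevelConfigure
data WlrXdgPopupConfigure

-- The union slot as a single untyped pointer.
newtype ConfigureUnion = ConfigureUnion (Ptr ())
  deriving (Eq, Show)

-- View the slot as either member. Neither cast is checked.
unsafeAsToplevel :: ConfigureUnion -> Ptr WlrXdgToplevelConfigure
unsafeAsToplevel (ConfigureUnion p) = castPtr p

unsafeAsPopup :: ConfigureUnion -> Ptr WlrXdgPopupConfigure
unsafeAsPopup (ConfigureUnion p) = castPtr p

-- Wrapping a typed pointer back into the slot is always safe.
fromToplevel :: Ptr WlrXdgToplevelConfigure -> ConfigureUnion
fromToplevel = ConfigureUnion . castPtr

fromPopup :: Ptr WlrXdgPopupConfigure -> ConfigureUnion
fromPopup = ConfigureUnion . castPtr

-- Made-up address for trying the casts out.
exampleSlot :: ConfigureUnion
exampleSlot = ConfigureUnion (nullPtr `plusPtr` 256)
```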

Do note that I’m explicitly assuming that wlr_xdg_surface_configure does not define what union member is currently stored; if it turns out it does then @AntC2’s answer is the expected in-language solution.

rhendric · April 13, 2024, 8:11am

Those hopes will be dashed in wlroots; this pattern is used in a few places.

wlr_xdg_surface_configure is an event type emitted by wlr_xdg_surface, which contains another union split along the same toplevel/popup dimension. It’s wlr_xdg_surface that contains the discriminator, a field called role which might be NONE, TOPLEVEL, or POPUP.

I don’t know if this is easy (or possible) in Haskell’s FFI, but I’d think that the ideal interface would be to map wlr_xdg_surface to a type with one constructor per role, and give type parameters to wlr_xdg_surface_configure and have the three different wlr_xdg_surface constructors use different parameters on the configure/ack_configure event types. (Edited to remove some silly thing about GADTs that I wrote because I’m too sleepy.)
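One way this could look, purely as a sketch: every name below is hypothetical, and how `role` is actually read from the C struct is left out. The surface gets one constructor per role, the configure event carries a phantom type parameter, and each accessor only accepts the event type that matches its role.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- Opaque stand-ins for the C structs.
data WlrXdgSurface
data WlrXdgToplevelConfigure
data WlrXdgPopupConfigure

-- Haskell mirror of the `role` discriminator (hypothetical names).
data Role = RoleNone | RoleToplevel | RolePopup
  deriving (Eq, Show)

-- One constructor per role.
data XdgSurface
  = SurfaceNone (Ptr WlrXdgSurface)
  | SurfaceToplevel (Ptr WlrXdgSurface)
  | SurfacePopup (Ptr WlrXdgSurface)
  deriving (Eq, Show)

classify :: Role -> Ptr WlrXdgSurface -> XdgSurface
classify RoleNone     = SurfaceNone
classify RoleToplevel = SurfaceToplevel
classify RolePopup    = SurfacePopup

-- Phantom tags and a configure event indexed by them.
data Toplevel
data Popup

newtype Configure role = Configure (Ptr ())   -- the raw union slot

-- Each accessor is only available for the matching role.
toplevelConfigure :: Configure Toplevel -> Ptr WlrXdgToplevelConfigure
toplevelConfigure (Configure p) = castPtr p

popupConfigure :: Configure Popup -> Ptr WlrXdgPopupConfigure
popupConfigure (Configure p) = castPtr p

-- Handlers for configure events, one per role.
data ConfigureHandlers r = ConfigureHandlers
  { onToplevel :: Configure Toplevel -> r
  , onPopup    :: Configure Popup -> r
  , onNoRole   :: r
  }

-- The surface constructor decides how the raw slot is typed.
dispatchConfigure :: XdgSurface -> ConfigureHandlers r -> Ptr () -> r
dispatchConfigure (SurfaceToplevel _) hs slot = onToplevel hs (Configure slot)
dispatchConfigure (SurfacePopup _)    hs slot = onPopup hs (Configure slot)
dispatchConfigure (SurfaceNone _)     hs _    = onNoRole hs

-- Made-up handlers that just report which branch ran.
describe :: ConfigureHandlers String
describe = ConfigureHandlers (const "toplevel") (const "popup") "none"
```

A handler can never receive a toplevel surface paired with a popup configure, because the pairing is made in one place, `dispatchConfigure`.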

BurningWitness · April 13, 2024, 8:52am

Okay, so that means you can make a Storable instance for a Haskell datatype, but you’re either signing up for impossible state combinations (toplevel surface, popup configure) or you need to pass that info through types (unwieldy).

I don’t believe you can do better than the C interface and trying to come up with some all-encompassing Haskell solution for this is a fool’s errand. Both of the Storable solutions I outlined above have to do unnecessary reads and writes, and in some cases, like this one, invent extra states that may well not even matter for a particular use case.

anarchymatt · April 13, 2024, 10:12am

Wow, I didn’t expect to receive so much great advice so quickly. Thank you.

I’ll take a look at it again and see what I can do.