Need a review of linear-typed API

Bodigrim · February 2, 2022, 10:28pm

I’ve been playing with linear types recently, but it seems I don’t really know how to design a safe API using them. Could someone more experienced please take a look at my ramblings?

Here is a screenshot of haddocks (corresponding to GitHub - Bodigrim/linear-builder at d16e6f36e7d12fd51b23c0f2b0580a33d48a9979):

jaror · February 2, 2022, 11:11pm

I tried a few things to break it. The closest I got was:

f :: Builder %1 -> Builder %1 -> Builder
f x y = x .<> runBuilder (\z -> f y z)

But that doesn’t compile because runBuilder is not linear in its first argument and (.<>) isn’t linear in its second argument.

This convinces me that it is never possible to use a builder from one runBuilder to produce the result of another runBuilder.

Did you have other possible unsafety in mind? Or are you also looking for possible UX improvements?

Bodigrim · February 2, 2022, 11:47pm

Thanks @jaror!
The API seems safe to me, because a user can never get hold of a Builder and mutate the same buffer twice, and the mutations available through (.<>) and (<>.) take care to thread the buffer linearly. But I don’t have a good intuition about linear types yet, so I’m asking for a review.

danidiaz · February 3, 2022, 10:03pm

Is it expected that runBuilder ∷ (Builder ⊸ a) → a should never allow the builder to escape? Because it seems that a pure function could pass a linear id to runBuilder and get hold of the Builder.

Changing the signature to runBuilder ∷ (Builder ⊸ a) ⊸ a would still not solve the problem, because runBuilder can still “discharge” the result of the Builder ⊸ a callback by returning it directly. That still conforms to the linear function mantra of “if the result gets used exactly once, the input gets used exactly once”.
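To make the escape concrete, here is a minimal sketch. The Builder below is a toy stand-in wrapping a String (an assumption for illustration, not the library's real type), and runBuilderLeaky is a hypothetical name for a function with the signature under discussion:

```haskell
{-# LANGUAGE LinearTypes #-}

-- Toy stand-in for the real Builder (assumption: the real one wraps a
-- mutable buffer, which is what makes an escape dangerous).
newtype Builder = Builder String

-- Hypothetical function with the leaky signature discussed above.
runBuilderLeaky :: (Builder %1 -> a) -> a
runBuilderLeaky k = k (Builder "")

-- A perfectly legal linear identity.
linearId :: a %1 -> a
linearId x = x

-- The callback uses its argument exactly once, yet the Builder escapes
-- the scope of runBuilderLeaky and is now an ordinary, unrestricted value.
escaped :: Builder
escaped = runBuilderLeaky linearId

contents :: Builder -> String
contents (Builder s) = s

main :: IO ()
main = putStrLn ("escaped builder holds: " ++ show (contents escaped))
```

Nothing goes wrong with an immutable String, of course; the point is only that the typechecker accepts escaped.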

Looking around in linear-base, it seems that the idiom for ensuring that mutable references don’t escape involves the use of the Ur (for “unrestricted”) datatype. For example in the function alloc of Data.Array.Mutable.Linear.

alloc :: HasCallStack => Int -> a -> (Array a %1 -> Ur b) %1 -> Ur b

Ur has the particularity that “consuming” it (for example, with pattern matching) doesn’t require you to “consume” its inner value. And this closes the escape hatch: a devious callback can’t return the Builder value inside Ur, because runBuilder must ensure that the result of the callback gets “consumed exactly once” and it can’t guarantee that by returning Ur Builder to be consumed. So it doesn’t typecheck.
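A small sketch of that idiom, under stated assumptions: Ur and unur are local copies mirroring the linear-base definitions, Builder is a toy wrapper around String, and withBuilder, append and finish are hypothetical names:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LinearTypes #-}

-- Local copy of the Ur idea: the field sits behind a non-linear arrow.
data Ur a where
  Ur :: a -> Ur a

unur :: Ur a %1 -> a
unur (Ur a) = a

-- Toy Builder (assumption: String instead of a mutable buffer).
data Builder where
  Builder :: String -> Builder

append :: Builder %1 -> String -> Builder
append (Builder s) t = Builder (s ++ t)

finish :: Builder %1 -> Ur String
finish (Builder s) = Ur s

-- The scope function in the style of alloc: the callback must return Ur b.
withBuilder :: (Builder %1 -> Ur b) %1 -> Ur b
withBuilder k = k (Builder "")

result :: String
result = unur (withBuilder (\b -> finish (append (append b "foo") "bar")))

-- Rejected by the typechecker: b is linear, but Ur wants an unrestricted
-- argument, so the Builder cannot be smuggled out.
-- leak = withBuilder (\b -> Ur b)

main :: IO ()
main = putStrLn result
```

The commented-out leak is the devious callback from above; uncommenting it should produce a multiplicity error rather than a leaked Builder.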

jaror · February 3, 2022, 10:10pm

That is not the type of runBuilder. Edit: wait… at least not in the haddock image in this thread. But it seems to have changed in the repo.

Abab9579 · February 3, 2022, 10:12pm

Hm, confusing, github code is different from the doc image.

danidiaz · February 3, 2022, 10:16pm

That is not the type of runBuilder.

What I meant was that, even if it were the type of runBuilder, that wouldn’t be enough.

(Edit: ah, I see the capture used %1 -> instead of the lollipop arrow ⊸. But they’re the same thing.)

jaror · February 3, 2022, 10:23pm

Sorry, I was confused. I had looked at this before the latest git commit, when the interface still looked the same as in the haddock screenshot at the top of this thread, so runBuilder :: (Builder %1 -> Builder) -> Text, which is safe.

danidiaz · February 3, 2022, 10:28pm

Ah, I went directly to the repo, that explains the confusion

runBuilder :: (Builder %1 -> Builder) -> Text is more restrictive than returning some arbitrary type wrapped in Ur, but it’s simpler and it does look safe.

Bodigrim · February 3, 2022, 10:29pm

Sorry for the confusion, I pushed another commit after yesterday’s discussion. Thanks for catching the leakage of Builder!

aspiwack · February 4, 2022, 8:07am

As a way to convince yourself that the type of runBuilder is safe (or, at least as safe as the traditional API), you could, instead, have given yourself

newBuilder :: (Builder %1 -> Ur a) %1 -> Ur a
runBuilder' :: Builder %1 -> Ur Text

Then you could have defined runBuilder as follows:

runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder f = unur $ newBuilder build
  where
     build :: Builder %1 -> Ur Text
     build = runBuilder' . f

I’ve had a quick look at the implementation; it has a lot of unsafeCoerce-s. It’s difficult to estimate the cost of these (though it’s definitely not trivial, because unsafeCoerce between non-linear and linear functions currently prevents some inlining optimisations).

It would be worth considering defining Builder as

data Builder where
  Builder :: Text -> Builder

(notice the non-linear arrow)

This would mean one extra box everywhere (I have to admit that I haven’t yet gotten around to doing the worker-wrapper split for unrestricted constructors like this; however, inlining should still remove a bunch of the boxes), in exchange for avoiding a lot of unsafeCoerce-s.

I honestly don’t know which is faster.
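A rough sketch of why the non-linear arrow helps, using String as a stand-in for Text (an assumption to keep the example dependency-free) and a hypothetical toString helper:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LinearTypes #-}

-- Toy version of the suggested definition: note the non-linear field.
data Builder where
  Builder :: String -> Builder

-- Matching on a linearly held Builder yields an unrestricted s, so it can
-- be handed to the ordinary, non-linear (++) without any coercion.
(<>.) :: Builder %1 -> String -> Builder
Builder s <>. t = Builder (s ++ t)

-- With a linear field instead, e.g.
--   data Builder' where Builder' :: String %1 -> Builder'
-- the same definition is rejected, because s would be linear and (++)
-- does not promise to use its arguments linearly.

toString :: Builder %1 -> String
toString (Builder s) = s

built :: String
built = toString (Builder "foo" <>. "bar" <>. "baz")

main :: IO ()
main = putStrLn built
```

This only illustrates the typing side; it says nothing about which representation is faster.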

Bodigrim · February 6, 2022, 9:46pm

Thanks @aspiwack! The GADT definition lets me remove every unsafeCoerce. Performance remains the same, however, because worker-wrapper does not seem to kick in. In fact, if I convince GHC not to split functions into workers and wrappers, the benchmarks get faster.

aspiwack · February 7, 2022, 8:15am

I’ve got to admit, it’s pretty funny that deactivating worker-wrapper split makes the code faster. It’s probably a coincidence though.

The reason worker-wrapper split is not available for unrestricted types is simply that there is no unrestricted unboxed tuple (I wrote a bit in the wiki). There is a bit of design to do, and then it’s mostly just a matter of putting the time in.

Bodigrim · April 11, 2022, 7:09pm

So far so good: my experimental linear Text builder makes blaze-markup benchmarks twice as fast. Would anyone else like to take a look before it pollutes Hackage forever?

atravers · April 11, 2022, 10:25pm

If the API is “small”, maybe it could be added to an existing package, if there is a suitable one in Hackage.
Otherwise, and if you’ve received few or no comments from other users, announce on a public forum, like here or one of the mailing lists, the exact time and date when you will be adding it permanently to Hackage. That way, if someone complains later, you can just send them a link to the relevant post.

danidiaz · April 12, 2022, 5:16pm

Perhaps the README could go into a bit more detail about how the library uses linearity to achieve performance.

A question about

(|>) ∷ Buffer ⊸ Text → Buffer

IIUC, this means that you (linearly) supply a Buffer and get a function to which you can supply different Text values, getting a different Buffer each time. For it to be safe, don’t you need to copy the underlying array each time?

Edit: I misread the signature. If you are in a linear context, you can only use the resulting Text → Buffer function once. You can’t use it multiple times with different arguments. That said, the Text argument can be used unrestrictedly inside the function.

Bodigrim · April 13, 2022, 5:39pm

@danidiaz right, you can define

bar :: Buffer -> Buffer
bar buf = (\f -> f "foo" >< f "bar") (buf |>)

but you cannot pass it to runBuffer, because it requires Buffer ⊸ Buffer.
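Here is a toy model of that distinction. Buffer below wraps an immutable String, and the definitions of (|>), (><), runBuffer and unBuffer are assumptions made up for illustration; only the shapes of the types follow the thread:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LinearTypes #-}

-- Toy Buffer (assumption: the real one owns a mutable array).
data Buffer where
  Buffer :: String -> Buffer

(|>) :: Buffer %1 -> String -> Buffer
Buffer s |> t = Buffer (s ++ t)

(><) :: Buffer %1 -> Buffer %1 -> Buffer
Buffer a >< Buffer b = Buffer (a ++ b)

unBuffer :: Buffer %1 -> String
unBuffer (Buffer s) = s

runBuffer :: (Buffer %1 -> Buffer) %1 -> String
runBuffer k = unBuffer (k (Buffer ""))

-- Non-linear: buf is used twice through f. This typechecks on its own.
bar :: Buffer -> Buffer
bar buf = (\f -> f "foo" >< f "bar") (buf |>)

-- Linear: buf is threaded through exactly once.
ok :: Buffer %1 -> Buffer
ok buf = buf |> "foo" |> "bar"

accepted :: String
accepted = runBuffer ok

-- Rejected: bar has type Buffer -> Buffer, not Buffer %1 -> Buffer.
-- rejected = runBuffer bar

main :: IO ()
main = putStrLn accepted
```

In this immutable toy, calling bar directly is harmless; with a real mutable buffer it is the requirement of a linear function in runBuffer that keeps such a definition from ever touching one.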

Bodigrim · April 13, 2022, 6:16pm

I’ve uploaded a candidate package, rendered haddocks are available at Data.Text.Builder.Linear

jaror · April 15, 2022, 4:37pm

Why did you benchmark the Data.Text.Lazy.Builder.Builder type against your Buffer and not against your Builder? Your Builder is also faster than the standard builder, but slower than manipulating Buffers directly. And I don’t think your Builder interface requires linear types. Maybe you should warn that for the most performance users should use the Buffer type directly.

Bodigrim · April 15, 2022, 5:36pm

Sorry, this is not intentional: when I wrote benchmarks, there was no Builder interface yet, only Buffer. And yes, it’s expected to be a bit faster.
