Need a review of linear-typed API

I’ve been playing with linear types recently, but seems I don’t really know how to design a safe API using them. Could someone more experienced please take a look at my ramblings?

Here is a screenshot of haddocks (corresponding to https://github.com/Bodigrim/linear-builder/tree/d16e6f36e7d12fd51b23c0f2b0580a33d48a9979):

8 Likes

I tried some things to try and break it. The closest I got was:

f :: Builder %1 -> Builder %1 -> Builder
f x y = x .<> runBuilder (\z -> f y z)

But that doesn’t compile because runBuilder is not linear in its first argument and (.<>) isn’t linear in its second argument.

This convinces me that it is never possible to use a builder from one runBuilder to produce the result of another runBuilder.

Did you have other possible unsafety in mind? Or are you also looking for possible UX improvements?

1 Like

Thanks @jaror!
The API seems safe to me, because a user cannot ever get hold of Builder and mutate the same buffer twice, and mutations available through (.<>) and (<>.) take care to thread a buffer linearly. But I don’t have a good intuition about linear types yet, therefore asking for review.

1 Like

Is it expected that runBuilder ∷ (Builder ⊸ a) → a should never allow to the builder to escape? Because it seems that a pure function could pass a linear id to runBuilder and get hold of the Builder.

Changing the signature to runBuilder ∷ (Builder ⊸ a) ⊸ a would still not solve the problem, because runBuilder can still “discharge” the result of the Builder ⊸ a callback by returning it directly. That still conforms to the linear function mantra of “if the result gets used exactly once, the input get used exactly once”.

Looking around in linear-base, it seems that the idiom for ensuring that mutable references don’t escape involves the use of the Ur (for “unrestricted”) datatype. For example in the function alloc of Data.Array.Mutable.Linear.

alloc :: HasCallStack => Int → a → (Array a %1 → Ur b) %1 → Ur b

Ur has the particularity that “consuming” it (for example, with pattern matching) doesn’t require you to “consume” its inner value. And this closes the escape hatch: a devious callback can’t return the Builder value inside Ur, because runBuilder must ensure that the result of the callback gets “consumed exactly once” and it can’t guarantee that by returning Ur Builder to be consumed. So it doesn’t typecheck.

2 Likes

That is not the type of runBuilder. Edit: wait… at least not in the haddock image in this thread. But it seems to have changed in the repo.

1 Like

Hm, confusing, github code is different from the doc image.

1 Like

That is not the type of runBuilder.

What I meant was that, even if it were the type of runBuilder, that wouldn’t be enough.

(Edit: ah, I see the capture used %1 -> instead of the lollipop arrow . But they’re the same thing.)

1 Like

Sorry, I was confused. I had looked at this before the latest git commit when the interface still looked the same as in de haddock screenshot at the top of this thread, so runBuilder :: (Builder %1 -> Builder) -> Text which is safe.

4 Likes

Ah, I went directly to the repo, that explains the confusion :slight_smile:

runBuilder :: (Builder %1 -> Builder) -> Text is more restrictive than returning some arbitrary type wrapped in Ur, but it’s simpler and it does look safe.

1 Like

Sorry for confusion, I pushed another commit atop of yesterday’s discussion. Thanks for catching the leakage of Builder!

1 Like

As a way to convince yourself that the type of runBuilder is safe (or, at least as safe as the traditional API), you could, instead, have given yourself

newBuilder :: (Builder %1 -> Ur a) %1 -> Ur a
runBuilder' :: Builder %1 -> Ur Text

The you could have defined runBuilder as follows:

runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder f = unur $ newBuilder build
  where
     build :: Builder %1 -> Ur Text
     build = runBuilder' . f

I’ve had a quick look at the implementation, it has a lot of unsafeCoerce-s. It’s difficult to estimate the cost of these (though it’s definitely not trivial because unsafeCoerce between non-linear and linear functions prevent some inlining optimisations currently).

It would be worth considering defining Builder as

data Builder where
  Builder :: Text -> Builder

(notice the non-linear arrow)

This would mean one extra box everywhere (I have to admit that I haven’t yet gotten around to do the worker-wrapper split for unrestricted constructors like this; however, inlining should still remove a bunch of the boxes), in exchange of avoiding a lot of unsafeCoerce-s.

I honestly don’t know which is faster.

3 Likes

Thanks @aspiwack! GADT definition allows to remove all unsafeCoerce. Performance remains the same however, because worker-wrapper does not seem to kick in. In fact, if I convince GHC do not split functions into workers and wrappers, benchmarks get faster.

1 Like

I’ve got to admit, it’s pretty funny that deactivating worker-wrapper split makes the code faster. It’s probably a coincidence though.

The reason why worker-wrapper split is not available for unrestricted types is simply because there is no unrestricted unboxed tuple (I wrote a bit in the wiki). There is a bit of design to do, and then it’s mostly just a matter of putting in the time in.

1 Like

So far so good, my experimental linear Text builder makes blaze-markup benchmarks twice faster. Anyone else to take a look, before it pollutes Hackage forever?

6 Likes
  • If the API is “small”, maybe it could be added to an existing package, if there is a suitable one in Hackage.

  • Otherwise, and if you’ve received little or no comments from other users, put an exact time and date on when you will be adding it permanently to Hackage on a public forum, like here or one of the mailing lists - that way if someone complains later, you can just send them a link to the relevant post.

3 Likes

Perhaps the README could go into a bit more detail about how the library uses linearity to achieve performance.

A question about

(|>) ∷ Buffer ⊸ Text → Buffer

IIUC, this means that you (linearly) supply a Buffer and get a function to which you can supply different Text values, getting a different Buffer each time. For it to be safe, don’t you need to copy the underlying array each time?

Edit: I misread the signature. If you are in a linear context, you can only use the resulting Text → Buffer function once. You can’t use it multiple times with different arguments. That said, the Text argument can be used unrestrictedly inside the function.

1 Like

@danidiaz right, you can define

bar :: Buffer -> Buffer
bar buf = (\f -> f "foo" >< f "bar") (buf |>)

but you cannot pass it to runBuffer, because it requires Buffer ⊸ Buffer.

3 Likes

I’ve uploaded a candidate package, rendered haddocks are available at Data.Text.Builder.Linear

4 Likes

Why did you benchmark the Data.Text.Lazy.Builder.Builder type against your Buffer and not against your Builder? Your Builder is also faster than the standard builder, but slower than manipulating Buffers directly. And I don’t think your Builder interface requires linear types. Maybe you should warn that for the most performance users should use the Buffer type directly.

1 Like

Sorry, this is not intentional: when I wrote benchmarks, there was no Builder interface yet, only Buffer. And yes, it’s expected to be a bit faster.

1 Like