I’ve been playing with linear types recently, but seems I don’t really know how to design a safe API using them. Could someone more experienced please take a look at my ramblings?
Thanks @jaror!
The API seems safe to me, because a user cannot ever get hold of Builder and mutate the same buffer twice, and mutations available through (.<>) and (<>.) take care to thread a buffer linearly. But I don’t have a good intuition about linear types yet, therefore asking for review.
Is it expected that runBuilder ∷ (Builder ⊸ a) → a should never allow to the builder to escape? Because it seems that a pure function could pass a linear id to runBuilder and get hold of the Builder.
Changing the signature to runBuilder ∷ (Builder ⊸ a) ⊸ a would still not solve the problem, because runBuilder can still “discharge” the result of the Builder ⊸ a callback by returning it directly. That still conforms to the linear function mantra of “if the result gets used exactly once, the input get used exactly once”.
Looking around in linear-base, it seems that the idiom for ensuring that mutable references don’t escape involves the use of the Ur (for “unrestricted”) datatype. For example in the function alloc of Data.Array.Mutable.Linear.
Ur has the particularity that “consuming” it (for example, with pattern matching) doesn’t require you to “consume” its inner value. And this closes the escape hatch: a devious callback can’t return the Builder value inside Ur, because runBuilder must ensure that the result of the callback gets “consumed exactly once” and it can’t guarantee that by returning Ur Builder to be consumed. So it doesn’t typecheck.
Sorry, I was confused. I had looked at this before the latest git commit when the interface still looked the same as in de haddock screenshot at the top of this thread, so runBuilder :: (Builder %1 -> Builder) -> Text which is safe.
Ah, I went directly to the repo, that explains the confusion
runBuilder :: (Builder %1 -> Builder) -> Text is more restrictive than returning some arbitrary type wrapped in Ur, but it’s simpler and it does look safe.
As a way to convince yourself that the type of runBuilder is safe (or, at least as safe as the traditional API), you could, instead, have given yourself
newBuilder :: (Builder %1 -> Ur a) %1 -> Ur a
runBuilder' :: Builder %1 -> Ur Text
The you could have defined runBuilder as follows:
runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder f = unur $ newBuilder build
where
build :: Builder %1 -> Ur Text
build = runBuilder' . f
I’ve had a quick look at the implementation, it has a lot of unsafeCoerce-s. It’s difficult to estimate the cost of these (though it’s definitely not trivial because unsafeCoerce between non-linear and linear functions prevent some inlining optimisations currently).
It would be worth considering defining Builder as
data Builder where
Builder :: Text -> Builder
(notice the non-linear arrow)
This would mean one extra box everywhere (I have to admit that I haven’t yet gotten around to do the worker-wrapper split for unrestricted constructors like this; however, inlining should still remove a bunch of the boxes), in exchange of avoiding a lot of unsafeCoerce-s.
Thanks @aspiwack! GADT definition allows to remove all unsafeCoerce. Performance remains the same however, because worker-wrapper does not seem to kick in. In fact, if I convince GHC do not split functions into workers and wrappers, benchmarks get faster.
I’ve got to admit, it’s pretty funny that deactivating worker-wrapper split makes the code faster. It’s probably a coincidence though.
The reason why worker-wrapper split is not available for unrestricted types is simply because there is no unrestricted unboxed tuple (I wrote a bit in the wiki). There is a bit of design to do, and then it’s mostly just a matter of putting in the time in.
So far so good, my experimental linear Text builder makes blaze-markup benchmarks twice faster. Anyone else to take a look, before it pollutes Hackage forever?
If the API is “small”, maybe it could be added to an existing package, if there is a suitable one in Hackage.
Otherwise, and if you’ve received little or no comments from other users, put an exact time and date on when you will be adding it permanently to Hackage on a public forum, like here or one of the mailing lists - that way if someone complains later, you can just send them a link to the relevant post.
Perhaps the README could go into a bit more detail about how the library uses linearity to achieve performance.
A question about
(|>) ∷ Buffer ⊸ Text → Buffer
IIUC, this means that you (linearly) supply a Buffer and get a function to which you can supply different Text values, getting a different Buffer each time. For it to be safe, don’t you need to copy the underlying array each time?
Edit: I misread the signature. If you are in a linear context, you can only use the resulting Text → Buffer function once. You can’t use it multiple times with different arguments. That said, the Text argument can be used unrestrictedly inside the function.
Why did you benchmark the Data.Text.Lazy.Builder.Builder type against your Buffer and not against your Builder? Your Builder is also faster than the standard builder, but slower than manipulating Buffers directly. And I don’t think your Builder interface requires linear types. Maybe you should warn that for the most performance users should use the Buffer type directly.