Ah, I went directly to the repo, that explains the confusion
runBuilder :: (Builder %1 -> Builder) -> Text
is more restrictive than returning some arbitrary type wrapped in Ur
, but it’s simpler and it does look safe.
Ah, I went directly to the repo, that explains the confusion
runBuilder :: (Builder %1 -> Builder) -> Text
is more restrictive than returning some arbitrary type wrapped in Ur
, but it’s simpler and it does look safe.
Sorry for confusion, I pushed another commit atop of yesterday’s discussion. Thanks for catching the leakage of Builder
!
As a way to convince yourself that the type of runBuilder
is safe (or, at least as safe as the traditional API), you could, instead, have given yourself
newBuilder :: (Builder %1 -> Ur a) %1 -> Ur a
runBuilder' :: Builder %1 -> Ur Text
The you could have defined runBuilder
as follows:
runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder f = unur $ newBuilder build
where
build :: Builder %1 -> Ur Text
build = runBuilder' . f
I’ve had a quick look at the implementation, it has a lot of unsafeCoerce
-s. It’s difficult to estimate the cost of these (though it’s definitely not trivial because unsafeCoerce
between non-linear and linear functions prevent some inlining optimisations currently).
It would be worth considering defining Builder
as
data Builder where
Builder :: Text -> Builder
(notice the non-linear arrow)
This would mean one extra box everywhere (I have to admit that I haven’t yet gotten around to do the worker-wrapper split for unrestricted constructors like this; however, inlining should still remove a bunch of the boxes), in exchange of avoiding a lot of unsafeCoerce
-s.
I honestly don’t know which is faster.
Thanks @aspiwack! GADT definition allows to remove all unsafeCoerce
. Performance remains the same however, because worker-wrapper does not seem to kick in. In fact, if I convince GHC do not split functions into workers and wrappers, benchmarks get faster.
I’ve got to admit, it’s pretty funny that deactivating worker-wrapper split makes the code faster. It’s probably a coincidence though.
The reason why worker-wrapper split is not available for unrestricted types is simply because there is no unrestricted unboxed tuple (I wrote a bit in the wiki). There is a bit of design to do, and then it’s mostly just a matter of putting in the time in.
So far so good, my experimental linear Text
builder makes blaze-markup
benchmarks twice faster. Anyone else to take a look, before it pollutes Hackage forever?
If the API is “small”, maybe it could be added to an existing package, if there is a suitable one in Hackage.
Otherwise, and if you’ve received little or no comments from other users, put an exact time and date on when you will be adding it permanently to Hackage on a public forum, like here or one of the mailing lists - that way if someone complains later, you can just send them a link to the relevant post.
Perhaps the README could go into a bit more detail about how the library uses linearity to achieve performance.
A question about
(|>) ∷ Buffer ⊸ Text → Buffer
IIUC, this means that you (linearly) supply a Buffer
and get a function to which you can supply different Text
values, getting a different Buffer
each time. For it to be safe, don’t you need to copy the underlying array each time?
Edit: I misread the signature. If you are in a linear context, you can only use the resulting Text → Buffer
function once. You can’t use it multiple times with different arguments. That said, the Text
argument can be used unrestrictedly inside the function.
@danidiaz right, you can define
bar :: Buffer -> Buffer
bar buf = (\f -> f "foo" >< f "bar") (buf |>)
but you cannot pass it to runBuffer
, because it requires Buffer ⊸ Buffer
.
I’ve uploaded a candidate package, rendered haddocks are available at Data.Text.Builder.Linear
Why did you benchmark the Data.Text.Lazy.Builder.Builder
type against your Buffer
and not against your Builder
? Your Builder
is also faster than the standard builder, but slower than manipulating Buffer
s directly. And I don’t think your Builder
interface requires linear types. Maybe you should warn that for the most performance users should use the Buffer
type directly.
Sorry, this is not intentional: when I wrote benchmarks, there was no Builder
interface yet, only Buffer
. And yes, it’s expected to be a bit faster.
Have you considered making runBuilder
nonlinear, i.e. runBuilder :: Builder -> Text
? I don’t think that introduces any unsafety.
I think you’re right, but does this actually achieve anything? Isn’t a -o b
a subtype of (or at least has an injection to) a -> b
?
Indeed, if you have
f :: a ⊸ b
you can always define
g :: a -> b
g a = f a
So it is preferrable to mark linear functions as such, as clients can always downgrade to a normal arrow, if it fits their API better.
I asked because I believe that it the only place where linearity is used in the Builder module (if you make the Builder type opaque), so that part of the API could just as well be added to the existing text package. That should already give a large speedup compared to the existing lazy text builder.
Hrm:
runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder :: (Builder ⊸ Builder) -> Text
…anyone for plain ol’ boring:
runBuilder :: (*Builder -> Builder) -> Text
It’s concise and {-# UnicodeSyntax #-}
-free.
Alright, last call for reviews. I’ve added a section on design.
https://hackage.haskell.org/package/text-builder-linear-0.1/candidate