Need a review of linear-typed API

Ah, I went directly to the repo, that explains the confusion :slight_smile:

runBuilder :: (Builder %1 -> Builder) -> Text is more restrictive than returning some arbitrary type wrapped in Ur, but it’s simpler and it does look safe.

1 Like

Sorry for confusion, I pushed another commit atop of yesterday’s discussion. Thanks for catching the leakage of Builder!

1 Like

As a way to convince yourself that the type of runBuilder is safe (or, at least as safe as the traditional API), you could, instead, have given yourself

newBuilder :: (Builder %1 -> Ur a) %1 -> Ur a
runBuilder' :: Builder %1 -> Ur Text

The you could have defined runBuilder as follows:

runBuilder :: (Builder %1 -> Builder) -> Text
runBuilder f = unur $ newBuilder build
  where
     build :: Builder %1 -> Ur Text
     build = runBuilder' . f

I’ve had a quick look at the implementation, it has a lot of unsafeCoerce-s. It’s difficult to estimate the cost of these (though it’s definitely not trivial because unsafeCoerce between non-linear and linear functions prevent some inlining optimisations currently).

It would be worth considering defining Builder as

data Builder where
  Builder :: Text -> Builder

(notice the non-linear arrow)

This would mean one extra box everywhere (I have to admit that I haven’t yet gotten around to do the worker-wrapper split for unrestricted constructors like this; however, inlining should still remove a bunch of the boxes), in exchange of avoiding a lot of unsafeCoerce-s.

I honestly don’t know which is faster.

3 Likes

Thanks @aspiwack! GADT definition allows to remove all unsafeCoerce. Performance remains the same however, because worker-wrapper does not seem to kick in. In fact, if I convince GHC do not split functions into workers and wrappers, benchmarks get faster.

1 Like

I’ve got to admit, it’s pretty funny that deactivating worker-wrapper split makes the code faster. It’s probably a coincidence though.

The reason why worker-wrapper split is not available for unrestricted types is simply because there is no unrestricted unboxed tuple (I wrote a bit in the wiki). There is a bit of design to do, and then it’s mostly just a matter of putting in the time in.

1 Like

So far so good, my experimental linear Text builder makes blaze-markup benchmarks twice faster. Anyone else to take a look, before it pollutes Hackage forever?

6 Likes
  • If the API is “small”, maybe it could be added to an existing package, if there is a suitable one in Hackage.

  • Otherwise, and if you’ve received little or no comments from other users, put an exact time and date on when you will be adding it permanently to Hackage on a public forum, like here or one of the mailing lists - that way if someone complains later, you can just send them a link to the relevant post.

3 Likes

Perhaps the README could go into a bit more detail about how the library uses linearity to achieve performance.

A question about

(|>) ∷ Buffer ⊸ Text → Buffer

IIUC, this means that you (linearly) supply a Buffer and get a function to which you can supply different Text values, getting a different Buffer each time. For it to be safe, don’t you need to copy the underlying array each time?

Edit: I misread the signature. If you are in a linear context, you can only use the resulting Text → Buffer function once. You can’t use it multiple times with different arguments. That said, the Text argument can be used unrestrictedly inside the function.

1 Like

@danidiaz right, you can define

bar :: Buffer -> Buffer
bar buf = (\f -> f "foo" >< f "bar") (buf |>)

but you cannot pass it to runBuffer, because it requires Buffer ⊸ Buffer.

3 Likes

I’ve uploaded a candidate package, rendered haddocks are available at Data.Text.Builder.Linear

4 Likes

Why did you benchmark the Data.Text.Lazy.Builder.Builder type against your Buffer and not against your Builder? Your Builder is also faster than the standard builder, but slower than manipulating Buffers directly. And I don’t think your Builder interface requires linear types. Maybe you should warn that for the most performance users should use the Buffer type directly.

1 Like

Sorry, this is not intentional: when I wrote benchmarks, there was no Builder interface yet, only Buffer. And yes, it’s expected to be a bit faster.

1 Like

Have you considered making runBuilder nonlinear, i.e. runBuilder :: Builder -> Text? I don’t think that introduces any unsafety.

1 Like

@jaror done in Make runBuilder multiplicity polymorphic · Bodigrim/linear-builder@d3f3159 · GitHub

2 Likes

I think you’re right, but does this actually achieve anything? Isn’t a -o b a subtype of (or at least has an injection to) a -> b?

1 Like

Indeed, if you have

f :: a ⊸ b

you can always define

g :: a -> b 
g a = f a 

So it is preferrable to mark linear functions as such, as clients can always downgrade to a normal arrow, if it fits their API better.

2 Likes

I asked because I believe that it the only place where linearity is used in the Builder module (if you make the Builder type opaque), so that part of the API could just as well be added to the existing text package. That should already give a large speedup compared to the existing lazy text builder.

3 Likes

Hrm:

  • runBuilder :: (Builder %1 -> Builder) -> Text
  • runBuilder :: (Builder ⊸ Builder) -> Text

…anyone for plain ol’ boring:

runBuilder :: (*Builder -> Builder) -> Text

It’s concise and {-# UnicodeSyntax #-}-free.

1 Like

@atravers I made runBuilder arrow-polymorphic, see updated Data.Text.Builder.Linear

3 Likes

Alright, last call for reviews. I’ve added a section on design.
https://hackage.haskell.org/package/text-builder-linear-0.1/candidate

2 Likes