Meeting Minutes - Text Maintainers Meeting
2021-04-15
Attendees:
Li-Yao
Emily Pillmore
Andreas Abel
Andrew Lelechenko
Background
This meeting is for the text package maintainers as we begin project planning for the text UTF-8 conversion work.
Action Items:
- Engage with users of text-icu to determine an optimal path forward for Unicode support as we begin the steps towards a UTF-8 text migration.
- Engage with maintainers who use text in serialization-heavy libraries to coordinate UTF-8 changes.
- Investigate the performance regressions from GHC 8.10 to 9.0, and the further regressions from 9.0 to 9.2.
Text-utf8 package migration:
Performance:
Emily:
- It’s important to get messaging right for this. Many expect a huge performance increase.
Andrew:
- Changes to branch prediction will probably mean little performance benefit, aside from serialization.
Expectations: the benefit will show up in aeson, serialization, etc., not necessarily in text itself. We’re excited by the approach taken in Z-Haskell’s Z-Data. text-icu is perhaps slow on the Haskell side.
Andrew:
- There are a few options for handling Unicode support (currently provided by text-icu) after the UTF-8 conversion:
- The maintainers of text-icu could find native UTF-8 bindings and provide an interface to them (best from the perspective of the text maintainers).
- We could provide a converter between UTF-8 text and UTF-16.
- Abandon text-icu and implement all of its functionality as native Haskell libraries.
Harendra Kumar was working on this with haskell-data. text-icu already requires the ICU library internally.
Andreas:
- The last point has historically been a nuisance in Agda.
- Agda uses text-icu to compute the width of characters. We are not married to the idea.
Andrew:
- We’re unsure who uses text-icu aside from folks like Ed. Many have switched to unicode-transforms.
Emily:
- We’ve approached a handful of industry partners and received no negative feedback on the issue. In fact, the response was positive overall, and lukewarm at worst.
- We also have community buy-in: folks like Michael, Ed, and others have given the idea a thumbs-up, so I expect strong backing for it.
Andrew:
- Step 1 is to understand why text performance regressed 7x between GHC 8.10 and 9.0, and further between 9.0 and 9.2.
- Then we can decide on next steps. It’s hard to say what to do until we figure out what’s wrong.
- We also want to talk with stakeholders of text and text-icu to figure out what the timeline for migration should be, and how they will be affected.
Emily:
- Fair. We’ll consult the maintainers of megaparsec, attoparsec, http-client, hexml, aeson, etc. to gauge their tolerance for migration and what we can do to make it easier.
- We’ll schedule it in the coming weeks. Hopefully we can get some GHC folks like AndreasK or Bgamari to help out with diagnosing the regressions.
Andrew:
- We need someone to lead the benchmarking work.