Haskell source files with lots of string literals take a long time to compile. (I remember reading this somewhere, but I’ve also experienced it firsthand.)
Why is this? And is there some reason why it’s difficult to fix?
I’ve searched the GHC issue tracker, but there don’t seem to be any open issues about this, so I guess the GHC developers aren’t aware of the problem yet. It would be great if you could provide a reproducible example.
Actually, there’s some ongoing work related to this in GHC merge request !3012, “Introduce a standard thunk for allocating top-level strings”.
With overloaded string literals (-XOverloadedStrings) you may have the problem that the code for the fromString method is repeated for every literal. pandoc was suffering particularly from this issue for a while, because, after inlining, the size of the fromString code was quite substantial. In this case a NOINLINE pragma in the right place was very helpful.
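To make the NOINLINE idea concrete, here is a minimal sketch. It is not pandoc's actual fix: the Tag type and the normalizeTag helper are hypothetical names invented for illustration. The assumption is that keeping the body of fromString out of line means each literal compiles to a call to one shared function rather than to its own inlined copy of the conversion code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isSpace, toLower)
import Data.String (IsString (..))

-- Hypothetical type whose string conversion does some real work.
newtype Tag = Tag String
  deriving (Eq, Show)

-- The conversion lives in one out-of-line function. NOINLINE asks GHC
-- not to copy this body into every use site.
normalizeTag :: String -> Tag
normalizeTag = Tag . map toLower . filter (not . isSpace)
{-# NOINLINE normalizeTag #-}

instance IsString Tag where
  fromString = normalizeTag

-- Each literal below is desugared to a fromString call.
tags :: [Tag]
tags = ["Alpha", "Beta Two", "GAMMA"]

main :: IO ()
main = mapM_ print tags
```

Whether this helps in a given project depends on how large the inlined conversion is, so it is worth measuring compile times before and after.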
For Text literals, it should help to upgrade to text-2.0, because text significantly reduced its use of INLINE pragmas with that release.
I’ll look into doing this, after I’ve read the links posted by sjakobi to see what has already been reported.
In my current project, I’m using a lot of ByteString Builders. I’ve found that the slowest way to do it is to use the IsString instance for Builder:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder)
foo :: Builder
foo = "foo"
Somewhat faster is to use the IsString instance for strict ByteStrings, and then call BB.byteString on it:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as BB
foo :: Builder
foo = BB.byteString "foo"
But the fastest is to concatenate all my string literals into one big string, and then use drop and take to extract the specific substring I need:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as B8
bigString :: B8.ByteString
bigString = "onebigstringfoobarbazotherstuffetc"
foo :: Builder
foo = BB.byteString $ B8.take 3 $ B8.drop 12 bigString
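As a rough sketch of how this trick might scale to several substrings, the take/drop pair can be wrapped in a small helper. The slice function and the bar and baz definitions are hypothetical additions for illustration, and the offsets are hand-counted against bigString, so they have to be kept in sync with it manually.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Lazy.Char8 as BL8

-- The only string literal in the module.
bigString :: B8.ByteString
bigString = "onebigstringfoobarbazotherstuffetc"

-- Hypothetical helper: the substring of bigString that starts at the
-- given offset and has the given length.
slice :: Int -> Int -> B8.ByteString
slice off len = B8.take len (B8.drop off bigString)

foo, bar, baz :: Builder
foo = BB.byteString (slice 12 3)
bar = BB.byteString (slice 15 3)
baz = BB.byteString (slice 18 3)

-- Render a Builder to a String so the result can be inspected.
render :: Builder -> String
render = BL8.unpack . BB.toLazyByteString

main :: IO ()
main = mapM_ (putStrLn . render) [foo, bar, baz]
```

The cost is readability: a wrong offset still compiles and silently yields the wrong substring, so a quick check of the rendered output is worthwhile.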
Yeah, I’m not surprised that reducing the number of string literals improves compilation speed. Please do report the problem with the Builder literals on the bytestring issue tracker though. It may be possible to improve compile times for idiomatic code.
I’ve filed a couple of GitHub issues for the bytestring package:
I’ve also created a repository which can reproduce both of these issues. Unfortunately, this site will not allow me to post more than two links, but the repo is linked from both of the issues above.
It looks like these issues are mostly solved in GHC 9.2.2, although some of the numbers are still a little bit surprising. (Builder literals are still slower than ByteString literals, but everything is faster overall.)