String literals and compilation speed

Haskell source files with lots of string literals take a long time to compile. (I remember reading this somewhere, but I’ve also experienced it firsthand.)

Why is this? And is there some reason why it’s difficult to fix?


I’ve searched the GHC issue tracker, but there don’t seem to be any open issues about this, so I guess the GHC developers are not yet aware of the problem. It would be great if you could provide a reproducible example.


Actually, there’s some ongoing work related to this in GHC merge request !3012, “Introduce a standard thunk for allocating top-level strings”.

With overloaded string literals (-XOverloadedStrings) you may have the problem that the code for the fromString method is repeated for every literal. pandoc was suffering particularly from this issue for a while, because, after inlining, the size of the fromString code was quite substantial. In this case a NOINLINE pragma in the right place was very helpful.
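To make the NOINLINE idea concrete, here is a minimal sketch. It is not pandoc's actual change: the `Label` type and the `packLabel` helper are hypothetical names made up for this illustration. The point is only that the conversion lives in one out-of-line function, so each literal should compile to a call rather than to a copy of the conversion code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))
import qualified Data.ByteString.Char8 as B8

-- Hypothetical wrapper type, used only for this illustration.
newtype Label = Label B8.ByteString
  deriving (Eq, Show)

-- The conversion is kept in a single function, and NOINLINE asks GHC
-- not to copy its body into every place a literal is used.
packLabel :: String -> Label
packLabel = Label . B8.pack
{-# NOINLINE packLabel #-}

instance IsString Label where
  fromString = packLabel

-- With OverloadedStrings, each literal here goes through fromString.
labels :: [Label]
labels = ["alpha", "beta", "gamma"]

main :: IO ()
main = mapM_ print labels
```

Whether this helps in a given codebase depends on how large the inlined `fromString` code would otherwise be, so it is worth measuring before and after.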

For Text literals, it should help to upgrade to text-2.0, because that release greatly reduced the library’s use of INLINE pragmas.

I’ll look into doing this, after I’ve read the links posted by sjakobi to see what has already been reported.

In my current project, I’m using a lot of ByteString Builders. I’ve found that the slowest way to do it is to use the IsString instance for Builder:

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Builder (Builder)

-- The literal is converted with Builder's IsString instance.
foo :: Builder
foo = "foo"

Somewhat faster is to use the IsString instance for strict ByteStrings, and then call BB.byteString on it:

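{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as BB

-- The literal is a strict ByteString, wrapped into a Builder explicitly.
foo :: Builder
foo = BB.byteString "foo"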

But the fastest is to concatenate all my string literals into one big string, and then use drop and take to extract the specific substring I need:

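{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8   as B8

-- A single literal holding every string the module needs.
bigString :: B8.ByteString
bigString = "onebigstringfoobarbazotherstuffetc"

-- "foo" starts at offset 12 and is 3 bytes long.
foo :: Builder
foo = BB.byteString $ B8.take 3 $ B8.drop 12 bigString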
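If you go this route, the offset arithmetic can be wrapped in a small helper. The following is only a sketch: `slice` is a hypothetical name, not something exported by bytestring, and `bar` is an extra example binding added here. Since `take` and `drop` on a strict ByteString share the underlying buffer rather than copying it, the runtime cost of slicing should be small.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Lazy.Char8 as BL8

-- A single literal holding every string the module needs.
bigString :: B8.ByteString
bigString = "onebigstringfoobarbazotherstuffetc"

-- Hypothetical helper: the substring of the given length that starts
-- at the given byte offset.
slice :: Int -> Int -> B8.ByteString -> B8.ByteString
slice offset len = B8.take len . B8.drop offset

-- "foo" starts at offset 12 and "bar" at offset 15; both are 3 bytes.
foo, bar :: BB.Builder
foo = BB.byteString (slice 12 3 bigString)
bar = BB.byteString (slice 15 3 bigString)

main :: IO ()
main = BL8.putStrLn (BB.toLazyByteString (foo <> bar))
```

The obvious drawback is that the offsets have to be kept in sync with `bigString` by hand, so this is a workaround rather than something to recommend as idiomatic code.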

Yeah, I’m not surprised that reducing the number of string literals improves compilation speed. Please do report the problem with the Builder literals on the bytestring issue tracker though. It may be possible to improve compile times for idiomatic code.

I’ve filed a couple of GitHub issues for the bytestring package:

I’ve also created a repository which can reproduce both of these issues. Unfortunately, this site will not allow me to post more than two links, but the repo is linked from both of the issues above.
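For anyone who wants to try this locally, here is a rough sketch of how a test module with many literals could be generated. This is not taken from the repository mentioned above; the `renderModule` function, the `Literals` module name, and the `litN` binding names are all made up for illustration. The program prints a Haskell module containing n top-level Builder literals, which you can save to a file and compile while timing GHC.

```haskell
-- Sketch of a generator for a module full of string literals.
-- All names in the generated source are hypothetical.

-- Render the source text of a module with n top-level Builder literals.
renderModule :: Int -> String
renderModule n = unlines (header ++ concatMap binding [1 .. n])
  where
    header =
      [ "{-# LANGUAGE OverloadedStrings #-}"
      , "module Literals where"
      , ""
      , "import Data.ByteString.Builder (Builder)"
      , ""
      ]
    -- One type signature, one definition, and a blank line per literal.
    binding i =
      [ "lit" ++ show i ++ " :: Builder"
      , "lit" ++ show i ++ " = \"literal number " ++ show i ++ "\""
      , ""
      ]

main :: IO ()
main = putStr (renderModule 3)
```

Changing the generated type signature and right-hand side lets you compare the variants discussed earlier in the thread on the same number of literals.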


It looks like these issues are mostly solved in GHC 9.2.2, although some of the numbers are still a little bit surprising. (Builder literals are still slower than ByteString literals, but everything is faster overall.)