Could GHC track whether a particular value is known at compilation time?

Here’s my line of thought:

When parsing data, key/value pairs are a rather common occurence, the two ones on my mind are JSON objects and command-line option lists. Radix trees may be used to make this parsing efficient, and the choice between strict/lazy tree representation is a whole separate box of fun.

There is however an extra flavor of a radix tree that is squarely outside of what GHC can do right now: a pre-compiled one. If the entire list of keys is known at compilation time , a (forall a. [(String, a)] -> RadixTree a) conversion should should too be possible at compilation time.

I feel like GHC could track this “compilation-time’dness” in a way similar to tracking levity, via a separate type qualifier. All literals are known at compilation time, all functions only maintain the guarantee if every single argument is known at compilation time. IO could be fully banned by making any related function only produce runtime-evaluated expressions, though there may be counterpoints to this.

Does anyone have strong opinions as to why this should/shouldn’t be possible?

2 Likes

GHC already does this up to a point. For example if you write:

x = 1 + 2

Then GHC will optimize this to:

x = 3

I guess an improvement could be to allow the programmer to explicitly ask for a guarantee that a particular value will be evaluated at compile time. And the set of expressions for which this works could be expanded by adding more optimization rules.

In a sense, this is what TH’s Lift is.

1 Like

This vaguely reminds me of the Zero-Overhead Abstractions in Haskell using Staging video and of Staging with Class. But I don’t know if the techniques described are actually possible in current Haskell.

From my understanding Lift is a code injection typeclass, a nice way to dynamically print the code into the file.

In the (forall a. [(String, a)] -> RadixTree a) example the a is usually some kind of a function that consumes input at runtime. Since GHC knows that the forall a. constraint is there, it should be able to precompile by treating every a as an opaque reference.

I don’t think typed splices solve the problem since they still need to be converted to Q Exp.

I was imagining that what you wanted to do was, given a [(String, a)] that is known at compile time, generate a RadixTree a at compile time. Lift allows you to do that, assuming Lift a. Did you want to do something else? If so then perhaps you could give a small example.

And I do not have that guarantee: for commandline option-parsing a generally looks like (forall s. String -> s -> Either String s) (or even multiple functions in a branching type).

OK, then can you, instead of using Lift, hand-write an Exp that generates an expression of that type? Maybe simple quotation ([| ... |]) would work?

You could try splices, but this would mean reimplementing the functions from scratch just to pass the expressions around. At that level it’s easier to pack things during runtime.

Well and if you try to do it you’ll run into the stage restriction:

template :: Quote m => [(String, Code m (String -> s -> Either String s))]
template =
  [ ("foo", [|| \_ -> Right ||])
  , ("bar", [|| \_ _ -> Left "no" ||])
  , ("baz", [|| undefined ||])
  ]

compile :: Quote m => [(String, Code m a)] -> Code m (Map String a)
compile []               = [|| Map.empty ||]
compile ((s, Code x):xs) = _ s $$(x)

---

app/Foo.hs:20:35: error:
    • GHC stage restriction:
        ‘x’ is used in a top-level splice, quasi-quote, or annotation,
        and must be imported, not defined locally
    • In the typed splice: $$(x)
   |
20 | compile ((s, Code x):xs) = _ s $$(x)
   |

I’m not sure I understand what you mean. Can’t you just quote the function names? (See example below.)

Again I fear I may be missing the point, but don’t you want to use $$(x) inside a quotation? (See below.)

{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH
import qualified Data.Map as Map
import Data.Map (Map)

foo = \_ -> Right
bar = \_ _ -> Left "no"
baz = undefined

template :: Quote m => [(String, Code m (String -> s -> Either String s))]
template =
  [ ("foo", [|| foo ||])
  , ("bar", [|| bar ||])
  , ("baz", [|| baz ||])
  ]

compile :: Quote m => [(String, Code m a)] -> Code m (Map String a)
compile []               = [|| Map.empty ||]
compile ((s, x):xs) = [|| Map.singleton s $$(x) ||]

I think the answer is definitely Template Haskell. That its implementation is broken is no justification to introduce a separate, different mechanism into GHC that will need to implement staged metaprogramming in a different broken way. Rather we (and by that I mean “someone, not me”, unfortunately) should work hard to fix Template Haskell.

9 Likes

True, got bamboozled by the error message.

compile :: Quote m => [(String, Code m a)] -> Code m (Map String a)
compile []          = [|| Map.empty ||]
compile ((s, x):xs) = [|| Map.insert s $$(x) $$(compile xs) ||]

---

main :: IO ()
main = do
   let !_ = $$(compile template)
   pure ()

The code compiles to Map.insert "foo" foo $ Map.insert "bar" bar (Map.insert "baz" baz Map.empty), which I should’ve known: this is building expressions, not evaluating them. Any other ideas?


I don’t disagree in principle, but how would this work? Templating alone doesn’t seem sufficient. Would the compiler be able to evaluate an expression and then collect and backwire all the function references?

Ah indeed. There is no literal representation of Map available! There’s a good reason for that, I suppose: the type has invariants that must be satisfied by its values. Nonetheless, in principle the type’s constructors could be exposed so that those who promise to use them wisely can write Map literals.

ack another reason why I always prefer Internal modules over not exporting it at all.

@BurningWitness to give a sketch of what it could look like, you could do

compile :: Quote m => [(String, Code m a)] -> Code m (Map String a)
compile = go . Map.fromList
  where
    go = \case
      Tip -> [|| Tip ||]
      Bin s k a l r ->
        -- maybe add some strictness annotations
        [||
          Bin
            $$(lift s)
            $$(lift k)
            $$(a)
            $$(go l)
            $$(go r)
        ||]

That would only work if Data.Map exposed the constructors I’m using above. But you could also use black magic to import them anyway

2 Likes

Rather than exposing the constructors, I think Data.Map should just expose a function:

sequenceCode :: Quote m => Map String (Code m a) -> Code m (Map String a)
1 Like

I don’t actually know, since I’ve not used TH. But I’ve read papers using staging mechanisms in other languages and it is supposed to work. For example, you should be able to do the Map.insert business outside the quote, get back a Map String (Code m a) and then … sequence? Well, that would be the right type but Code m is not an Applicative. Perhaps precisely because that would need some kind of Lift1 type class, of which Map String was an instance.

It’s probably doable, it just hasn’t been done before (apparently).

1 Like

I’d argue this is terrible design because it’s code duplication. This kind of a function belongs in a library, but I would never expect a library to implement any of this.

Also “Internal” modules the way containers does them is bad design, PVP doesn’t cover them, so if you’re a law-abiding citizen you should put a hard containers == 0.7.0.0 constraint. I prefer exposing PVP-compatible Unsafe modules and properly exposing useful internals there (after all the user should be able to do unsafe stuff if they so desire).

I think to do better than this you need some kind of symbolic execution. GHC’s optimizer does this to some degree, in that it turns expressions into equivalent expressions (not necessarily “more evaluated” ones). And indeed GHC’s optimizer might well do a good job on this expression - you should see what Core the simplifier spits out!

Why not? Both @jaror and @sgraf are making a compelling argument that containers should export Map k (Code m v) -> Code m (Map k v).

1 Like

Okay, so the function required is merely

liftQ :: Quote m => Map Text (Code m a) -> Code m (Map Text a)
liftQ t =
  case t of
    Tip           -> [|| Tip ||]
    Bin s k a l r ->
      [|| Bin
            s
            k
            $$(a)
            $$(liftQ l)
            $$(liftQ r)
      ||]

This results in slightly noisy, but nonetheless properly reduced STG where each node is represented separately like this:

Main.dictionary
  :: forall {s}.
     Data.Map.Internal.Map
       Data.Text.Internal.Text
       (GHC.Base.String -> s -> Data.Either.Either GHC.Base.String s)
[GblId] =
    {} \u []
        case Main.dictionary351 of conrep_saWo [Occ=Once1] {
        Data.Text.Internal.Text _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] ->
        case Main.dictionary242 of conrep1_saWs [Occ=Once1] {
        __DEFAULT ->
        case Main.dictionary1 of conrep2_saWt [Occ=Once1] {
        __DEFAULT ->
        Data.Map.Internal.Bin [101#
                               conrep_saWo
                               Foo.baz
                               conrep1_saWs
                               conrep2_saWt];
        };
        };
        };

This performs well, a program with 101 entries takes 4 seconds to compile; 365 entries takes 8.5 seconds.

This should be good enough to put into data structure libraries, indeed.

4 Likes