How many kinds of bottom are there in GHC?

So if I understand it right, we have different denotations for the different function types (ignoring multiplicities):

⟦ FUN @_ @Type A B ⟧         = ⟦ A ⟧ → ⟦ B ⟧
⟦ FUN @_ @UnliftedType A B ⟧ = ⟦ A ⟧ → ⟦ B ⟧ ∪ {⊥}

If the codomain is Type then the ⊥ is part of ⟦ B ⟧.

So essentially GHC automatically “levitates” unlifted codomains of functions.

Edit: And let's of unlifted variables now become like binds in the partiality monad:

⟦ let x :: Int# = M in N ⟧ = if ⟦ M ⟧ = ⊥ then ⊥ else ⟦ N[ x := ⟦ M ⟧ ] ⟧

Edit: But this does raise questions if the body of the let is also unlifted, e.g. N :: Int#, because then you’d expect the type of the whole let expression to be Int# as well. And the denotation of that is ⟦ Int# ⟧ = {-2^31..2^31 - 1}, but that does not include ⊥, so where would that go?

I think this distinction is not necessary, if we acknowledge that there are computations that can be typed in an unlifted domain and still diverge. Because then the ⊥ is always part of the semantic domain ⟦A⟧ and does not need to be explicitly added. Otherwise I think your semantics of let bindings nails it. The property of A being unlifted amounts to

  • The only functions from any X to A that do not send ⊥ to ⊥ are the lazy constant functions.

According to this pdf I just found while searching for denotational semantics of CBV, that cannot be the case:

From an older version of those notes:

I don’t think this is a show-stopper. Levy proposes that the semantics of M : B is ⟦B⟧ + E where ⟦B⟧ may be a flat, bottomless type. But that still says there are more terms definable in B than denoted by ⟦B⟧.

Domain semantics knows many different variants of product and sums, and strict and lazy semantics yield different versions of these. One has to live with the fact that what syntactically looks like a cartesian product is in fact not mapped to the cartesian product of the semantic category:

⟦(A,B)⟧ is not ⟦A⟧ x ⟦B⟧ and ⟦Either A B⟧ is not ⟦A⟧ + ⟦B⟧

If I recall correctly, the strict tuple is the so-called smash product and distinct from the cartesian product in CPO, while this category has no coproduct. It has the co-called coalesced sum which is the coproduct in the sub-category of strict functions.
But all this is a side-topic, the only relevant information to extract is whether there is any argument forbidding that ⟦Int#⟧ could be a domain with bottom.

The cited denotational semantics are a bit boring because it is simply-typed without explicit general fix combinator, thus strongly normalising and hence no mention of ⊥.

To describe unlifted kinds, I would choose a variant of System F with two kinds (instead of just *): Unlifted (but perhaps boxed) kind # and lifted kind *, enriched with custom data types living in either # or *. Let’s say that there is one (->) type constructor per kind-kind combination, like a data family (for example, (->) @# @* is the function space from unlifted to lifted). Finally, there is a general fixpoint combinator fix (or, equivalently, letrec). The latter is important, otherwise we don’t need the whole CPO foo and ⊥ elements.

Since we are typed, we don’t need to give a universal kitchen sink domain D :: Domain, but we can give one such D :: Domain per type. This translation is described by @jaror’s translation function ⟦_⟧ :: Type -> Domain. I think he’s on point, because his definition is compositional. I would define it in a slightly different way, though:

Comp :: Kind -> Domain -> Domain
Comp # d = d + ⊥ -- + = set union
Comp * d = d 
⟦ (->) @k1 @k2 arg res ⟧ = ⟦ arg ⟧ ->c (Comp k2 ⟦ res  ⟧) -- NB: Scott-continuous functions on the  RHS
⟦ Int# ::  # ⟧ = ZZ -- mathematical integers
⟦ Int ::  * ⟧ = ZZ + ⊥

And with this definition, we can type the semantics of a closed term as ⟦_⟧ :: Exp ⊦ (ty::Type) -> Comp (kind ty) ⟦ty⟧.

One thing I don’t quite like is that there would be terms t :: ty whose semantics has ⟦t⟧ ∉ ⟦ty⟧.

Other than that, I think I see what you did there. Instead of the categories Hask and DCPO, you are considering the categories KindedHask and KindedDCPO where morphisms are functions and objects are objects of Hask resp. DCPO tagged with either * or #, the kind. In DCPO, only domains with ⊥ may be tagged with *. Then you define a mapping Comp on objects of KindedDCPO such that it is Lift for every object tagged with # and the identity for every object tagged with *. The question then is whether we can make Comp an endofunctor or even a monad. If we could, then we can hope for ⟦_⟧ to be a functor from KindedHask to the Kleisli category of Comp.
Let us define the morphism part of the functor Comp. Let 𝜂 denote the unit of the Lift monad Lift x = x + ⊥. I see only one choice:

Comp (f: * -> *) = f
Comp (f: * -> #) = 𝜂 . f
Comp (f: # -> #) = Lift f
Comp (f: # -> *) = maybe ⊥ f

There are only two non-trivial cases of composition to check which do not immediately follow from functoriality of Lift.

* -> # -> *
# -> * -> #

For the former, we compute that maybe ⊥ g . 𝜂 . f = g.f which is desired and for the latter we need 𝜂 . g . maybe ⊥ f = Lift (g.f) but that is not the case, since 𝜂 never produces a ⊥ but Lift (g.f) preserves ⊥. So either I have missed an alternative way to make Comp a functor or your semantics ⟦_⟧ is not functorial.

I think it is helpful to consider a compilation of Haskell to a lower-level language with syntactic distinction of ‘value’ and ‘computation’ like CBPV (Call-by-push-value), because Haskell is an impure language if you think non-termination as an effect.

Suppose such hypothetical compilation target exists and called Lowskell.
Following CBPV, Lowskell’s value type and computation type would be defined recursively with each other. Specifically, I suppose it has F and U type constructors below.

  • For a value type A, there’s a computation type F(A) meaning "a computation producing a value of A"
  • For a computation type B, there is a value type U(B) meaning "a thunk of a computation of B".

Every type in Haskell have corresponding value type in Lowskell.

  • Unlifted ground types like Int# in Haskell are value types of Lowskell as is.

  • Unlifted sums and products in Haskell are value types of Lowskell as is.

  • Rewrite every lifted type of Haskell to Lift A# for some unlifted type A#. For example,

    data Maybe a = Nothing | Just a
    -- ||
    data Lift (a :: TYPE UnliftedRep) where Mk :: a -> Lift a
    type Maybe = Lift (# (# #) | a #)
    

    Any such type is translated to a value type U(F(A#)) of Lowskell.

  • Function type A -> A' in Haskell is translated to U(Fun(A,F(A'))) in Lowskell, where

    • Fun(X,Y) is a computation type, which means “computation taking a parameter of value type X and do something of computation type Y.”

If you take the meaning of “denotation of a type A in Haskell” as “denotation of value type A in Lowskell,” it is correct to say "Int# doesn’t have ". I like this interpretation because it matches my intuition.

On the other hand, if you take the meaning of “denotation of type A in Haskell” as “denotation of computation type F(A) in Lowskell,” then both of Int and Int# have . The denotation of Int# and Int under this interpretation are the following:

  • The denotation of Int# is every possible result of computations returning Int#
  • The denotation of Int is every possible result of computations returning (computations returning Int#)

It looks like there are two modes of failures in the latter, but I guess we don’t distinguish it, because you’ll have to collapse them anyway to extract Int# value out of the computation of Int.

I don’t like this interpretation because of the inability to distinguish lifted and unlifted, but one advantage over the other (value) interpretation is it matches with what you can write in Haskell. When you write expression :: t in Haskell, it means computation_term :: F t in Lowskell. You can never write type assertion in Haskell corresponding to value_term :: t in Lowskell!

2 Likes

I think you do need to distinguish functions with unlifted codomain from functions with lifted codomain. Because now your denotation of A -> B (where B is lifted) is:

U(Fun(A,F(U(F(B#)))))

(A should be expanded to U(F(A#)) as well but let’s ignore that to avoid clutter.)

While I think it should simply be:

U(Fun(A,F(B#)))

Or is F (U x) = x always true?

1 Like

Okay, this thread is developing into a general denotational semantics discussion, and perhaps rightfully so. I can now appreciate the intentions of the explanations by @sgraf and @viercc as an attempt to separate the user-intended elements of a type (maybe call it the payload) from the by-products like ⊥ which exist because the language has general recursion.
Yet I feel it is debatable whether ⟦A⟧ should be reserved for the payload part of a type A or should denote the entirety of the possible outcomes, the “computation type” F(A) in the model laid out by @viercc. The domain theory community has generally adopted the latter, but I acknowledge that this is a choice another kind of semantics may make differently.

Edit: Sometimes the distinction between payload and by-products is vague. I’ve seen code that explicitly uses the unit type () as a two-element vertical chain, so the ⊥ belongs to the payload of user-wanted elements.

Thanks, this was the short form intro to CbPV that I needed.

Indeed, the F/U separation seems like a much better fit. Let me revise the fragment of the denotational semantics :

⟦ (->) arg res ⟧ = U(Fun(⟦ arg ⟧, F(⟦ res ⟧))
⟦ Int# ::  # ⟧ = ZZ -- mathematical integers
⟦ Int ::  * ⟧ = U(F(ZZ)) -- virtually ZZ + ⊥

But @jaror now raises an important point:

Or is F (U x) = x always true?

I don’t think that is the case, but it is the case that F . U . F = F (likewise U . F . U = U) since F and U are supposed to be adjoint. (At least that would be true for a Galois Connection, which is a special form of adjoint. I had a look at Wikipedia, there you can find this identity under “The first counit–unit equation”.)

So consider U(Fun(A,F(A')). If A' is itself a lifted value, A' = U(F(A#)), then we have U(Fun(A,F(U(F(A#)))) = U(Fun(A,F(A#))). The counit-unit property of adjoint functors implies that different ⊥ are smashed together, as one would expect. This renders my Comp family above redundant.

This encoding is quite “synthetic”, in that it does not appeal to a concrete model (i.e., Scott domain) of Fun. I suppose I would have to look into the thesis to see that.

Perhaps @DavidB has any further pointers?