Total Denotational Semantics

I finally overhauled my blog (it just took a PhD worth of time) and published a new blog post, using Haskell to make denotational semantics executable: fixpt · Total Denotational Semantics

I tried to make the material accessible to people with at least a bit of background in formal semantics, so I apologise to anyone who does not share this background.

Let me know what you think!

20 Likes

As long as we’re only interested in total executable denotational semantics (and not proving properties of the semantics), couldn’t we achieve the same with Liquid Haskell’s termination checker and the ideas from Conor’s “Productive Coprogramming with Guarded Recursion”? Just define the guarded fixed point in an “unsafe” module with the termination checker disabled.

First of all: Respect for being able to condense a PhD thesis into such a short blog post.

What I find intriguing about the attempt to provide semantics in Haskell is that (a large subset of) Haskell has domain semantics itself, so you are embedding a domain (the semantic domain of the lambda calculus) into another domain (the semantic domain of the Haskell type), but in a way that is not order-preserving. In a way, it is an orthogonal embedding because the Haskell domain now has a second partial order that is unrelated to the specialization order. For example, we could say Bot <= Fun id and that relation is decidable, while the two elements are not comparable by ⊑. Your total semantics is mapping the lambda calculus (or whatever language) to a subset of total elements of this bi-ordered domain, and on this subset the specialization order is flat.
An “execution” on such a structure is a map that translates the orthogonal order into ⊑. For example, it maps both the syntactically divergent diverge = fix Step and Stuck to the properly divergent ⊥.
For extra fun, observe that Exp itself is a domain, and you have only considered the total elements of it. In fact, denot also translates the partial elements of Exp to partial elements of D.
I do not know a mathematical theory of such structures; it must be bi-topological. It sounds like a wonderful rabbit hole to fall into.

You say that while the mathematical interpreter I is total, the naive interp is not. The latter is easily seen, but how does one prove the former? I would challenge your notion of totality and say that the denotational semantics of interp is just as total as your interpreter I because it maps every lambda term to a well-defined element of the semantic domain of D, of which ⊥ is a member.

A more geometric example of a bi-ordered domain and total semantics: Consider the domain D of closed sub-intervals of the unit interval [0,1], ordered by reverse inclusion: [a,b] ⊑ [x,y] if a ≤ x and y ≤ b. The numbers in [0,1] are embedded as singleton intervals at the top of this domain. Can we extend the usual order ≤ of numbers to the entire domain, in a way so that [0,0] is least and [1,1] is greatest? Indeed we can:

Say that [a,b] ≤ [x,y] if a ≤ x and b ≤ y. (This is the may modality.)

In fact, both orders render this domain a bounded complete distributive lattice, and the lattice operations distribute over one another. The complete lattice ([0,1], ≤) is a sub-lattice of the complete lattice (D, ≤).
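As a concrete sketch (my code; the type and function names are mine, not the post’s), here are the two orders side by side:

-- Closed subintervals of [0,1]; invariant: lo <= hi.
data Interval = I { lo :: Double, hi :: Double }

-- Information order ⊑: reverse inclusion, i.e. refinement.
infoLeq :: Interval -> Interval -> Bool
infoLeq (I a b) (I x y) = a <= x && y <= b

-- Extended numeric order ≤, with [0,0] least and [1,1] greatest.
numLeq :: Interval -> Interval -> Bool
numLeq (I a b) (I x y) = a <= x && b <= y

-- Lattice operations for ≤: endpoint-wise min and max.
meetLe, joinLe :: Interval -> Interval -> Interval
meetLe (I a b) (I x y) = I (min a x) (min b y)
joinLe (I a b) (I x y) = I (max a x) (max b y)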

Very fun post. With so many layers of partiality, you’re in for a loop.

2 Likes

To avoid the hassle of hauling an abstract heap throughout their semantics (monadically or otherwise), Jennifer Hackett and Graham Hutton used a different technique:

Perhaps it can help to alleviate the need for >>= appearing in your semantics (e.g. D ⟦⋅⟧)…

<ramble>

Partiality is inescapable:

  • Roads, bridges and cranes have maximum load/weight limits - exceeding them results in damage and possibly collapse.

  • Water and gas pipes (along with submersibles) have maximum pressure limits, with similar consequences when exceeded.

  • Insulated electrical cables and electronic components have maximum voltage limits.

  • Even superconductors have maximum limits on electrical current.

So this obsession (sorry: fascination) with totality for software completely eludes me - even if you could stamp out every last bug from your programs using types, the Universe in which said programs reside contains cosmic radiation which can change not only nucleotides, but bits of information as well (be it data or code). What was bug-free yesterday could still crash tomorrow.

“But it makes the maths easier!” So would pretending zero doesn’t exist. But it does:

  1. Place one bead in a bowl.
  2. Take one bead from same bowl.
  3. How many beads are left in that bowl?

Zero (and perhaps an empty round container was the inspiration for its numerical/symbolic representation, 0).

There are also the bemusing attempts to avoid the problems associated with the actual existence of zero:

Static types are alright when you’re not having to constantly type them out - it’s so much nicer to just write:

midpoint x y = (x + y) / 2

instead of:

double midpoint (double x, double y) { return (x + y) / 2; }

But now the goal is to explicate every last detail of what the definitions in our programs should do using types, to the point of implementing fully-formed proofs! So what prevents bugs from sneaking into said types and proofs (particularly as Haskell’s type system has been Turing-complete since the advent of functional dependencies)? It’s just extra code; the bugs aren’t fussy!

If too many perceive Haskell as “being hard” (e.g. poor benefit/cost ratio), then the blowtorch is optional: there are many other programming languages out there.

</ramble>

1 Like

With other denotational semantics one usually proves that the semantic category is closed under all necessary type constructions, such as sums, products and certain fixed points. Can you do that with total semantics?

For flat types t, it seems okay to add a new distinguishable bottom and nothing else:

data Denot t = Bottom | Total !t

Do we also need a non-strict Step constructor here? For recursive types, say r = Fix f, the construction laid out in your blog post seems adequate:

data BottomT f a = Stuck | Now !(f a) | Delay a
type Denot r = Fix (BottomT f) -- informally: r = Fix f

Here the constructor Now plays the role of Ret and Delay plays the role of Step. What I called “execution” above now becomes a function Denot t -> t. Since Denot r is again a fixed point, we can iterate the construction. By tacking on an extra layer of Now we obtain a function Denot r -> Denot (Denot r) that preserves total elements, so chances are this is a comonad.
The flat case is subsumed in BottomT because t is the fixed point of Const t.
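To make this concrete, here is a self-contained rendering (my sketch: Fix is the usual functor fixed point, Denot is parameterised by the functor so that it compiles, and exec is my name for the “execution” map described above):

newtype Fix f = In (f (Fix f))

data BottomT f a = Stuck | Now !(f a) | Delay a

type Denot f = Fix (BottomT f) -- the denotation of r = Fix f

-- "Execution" collapses the decidable order back into ⊑: Stuck and any
-- infinite tower of Delays both become the properly divergent ⊥.
exec :: Functor f => Denot f -> Fix f
exec (In Stuck)     = undefined -- stands for genuine divergence, ⊥
exec (In (Now fa))  = In (fmap exec fa)
exec (In (Delay d)) = exec d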

While I agree that ignoring or eradicating the existence of partiality leads us nowhere, I think there is still value in this total semantics: The advantage is that in the total semantics the specialization order ⊑ (of the original type whose denotation we study) becomes a decidable relation ≤, so one can perform analyses that otherwise would be forbidden by Rice’s theorem.

Hmm. It is a bit unclear to me what exactly you mean. Perhaps let me try to provide a non-answer with an explanation of how I think Liquid Haskell could work with guarded recursion.

First off, you can simply postulate the (singly-clocked) Later modality as

{-# LANGUAGE DerivingVia #-}
import Data.Functor.Identity (Identity)

newtype Later a = MkLater a
  deriving (Functor, Applicative) via Identity
  -- DerivingVia reuses Identity's Functor/Applicative instances

Then export Later, but not MkLater. I think this is the “unsafe” runtime system module that you are referring to. (You likely need more primitives besides Applicative that can be proven safe in the underlying topos model.)

Then you define the general guarded recursion fixpoint combinator as follows (i.e. it is not primitive!)

gfix :: (Later a -> a) -> a
gfix f = f (pure (gfix f)) -- NB: this is safe
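For illustration (my example, not from the thread; a similar Stream appears further down), gfix can tie a productive knot where every recursive occurrence sits under Later:

data Stream a = Cons a (Later (Stream a))

ones :: Stream Int
ones = gfix (Cons 1) -- each tail is guarded by Later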

However, gfix enables the definition of infinite, non-well-founded structures. I have my doubts that Liquid Haskell’s termination checker would be happy about this if it keeps unfolding MkLater. This paper indicates that it currently is not happy, and suggests step-indexed induction as a model of coinduction. I think that solves two out of three problems that are best thought of separately:

  1. Soundness in the presence of non-well-founded codata
  2. A way to do coinductive proofs by integrating productivity checks into the (refined) language
  3. A way to do coinductive proofs by way of doing a step-indexed proof in an underlying step-indexed inductive model of coinduction.

As far as I can tell, the paper solves (1) (don’t pass --no-adt) and (3) (I wonder how this is compatible with --adt?? They do not seem to step-index the data types!), but not (2). Instead of providing a productivity checker, it relies on the user to do step-indexing. That’s well and good for simple first-order types such as streams with a simple encoding of observational equality, but is unlikely to work well for higher-order, negatively recursive types such as Value.


Problem 1: What does it take to make LH sound wrt. the Later modality?
I think it suffices to prevent LH from deriving an induction principle for Later (so LH must know about it). That is, concluding P a from P (Later a) is utterly unsafe, and we need to hide this “implementation” from LH. One way to achieve this is to keep Later opaque and tell the SMT solver just about the applicative laws (such as pure f <*> pure a = pure (f a)), but not about how pure and <*> are implemented.


To understand the qualitative difference between Problem 2 and Problem 3, let me give an analogy.

I think we agree that programming in C is more convenient than programming in assembler. Among other things, the C compiler ensures a proper stack and register discipline (you don’t need to “prove” that your program adheres to this discipline).
But sometimes writing C is “not enough” and you really need to write assembler, for example when writing an operating system kernel or a driver, because the C language does not offer the necessary low-level, architecture-specific abstractions.

By analogy, proving in a guarded type system is more convenient than proving in the underlying (step-indexed or Scott) topos model directly. You can simply forget about the mathematical details of the underlying model and live in your axiomatised world (“each function call gets fresh local variable slots”).
This argument is similar to the argument of the blog post that reasoning on discretely-ordered total elements is more convenient than on partial ones.

Sometimes (rather frequently, actually), staying within the applicative functor interface/DSL/internal language of the Later modality is insufficient. (I think this is an example.) In this case, you really need to unpack the underlying model and attempt a step-indexed proof for the primitive that you need (called flip-tick in my example).


So, problems (2) and (3) are about writing convenient and useful tools to write coinductive proofs. (2) is more about high-level convenience and (3) is about the “assembler to make it happen” if (2) is insufficient.

Problem 2: How do we write convenient LH proofs involving Later?
I think we would need to teach LH about Löb induction, which is gfix generalised to a :: Prop.
That would probably allow you to do all kinds of proofs about programs in the Applicative fragment. (I believe the main benefit of Ticked Cubical Type Theory is that the Applicative reasoning becomes part of definitional equality, so you don’t need to do proofs involving Applicative laws.)

This would not be too much work, I think. At this point, LH would be on par with the (admittedly still immature and, if I may say so, mediocre) treatment of guarded types in Agda, Rocq or Lean, where you axiomatise all combinators and their equational theory.
If you want more combinators than just Applicative (and you likely will), then you need more axioms that describe how these combinators can be used safely. I think you can do this by introducing new uninterpreted functions for the combinators and assuming axioms about their equational theory, as sketched below.
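A hypothetical sketch of what such an axiom could look like (laterPure and laterAp are my stand-in names for Later’s pure and <*>, and I am not certain this is exactly how current LH would want it phrased):

-- Hypothetical sketch; I am unsure of the exact LH incantation for
-- registering laterPure/laterAp as uninterpreted symbols in the logic.
laterPure :: a -> Later a
laterPure = pure

laterAp :: Later (a -> b) -> Later a -> Later b
laterAp = (<*>)

{-@ assume homLaw :: f:(a -> b) -> x:a
      -> { laterAp (laterPure f) (laterPure x) == laterPure (f x) } @-}
homLaw :: (a -> b) -> a -> ()
homLaw _ _ = ()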

For simple kinds of combinators, this might work well, because the axioms are “obviously true”.
However, every axiom is a source of unsoundness unless you have proved it.

Problem 3 is about providing users a way to prove their axioms correct in an underlying topos model.
(Semanticists like Jon Sterling tend to do these proofs on paper. I believe that is for lack of a good solution to this problem in today’s proof assistants.)
In order for that to work, LH would need to reflect the Haskell definition of the combinator and the definition of the axiom into this topos model (via a denotational semantics, from Haskell into a suitable Haskell DSL backed by SMT logic) and let the user provide a proof there.
Finding the right Haskell DSL here is the subject of synthetic guarded domain theory (= axiomatisation of an adequate topos), if I understand correctly; an active research area.

2 Likes

Thanks for your extensive response!

I had this lying around and now uploaded it to GitHub: GitHub - noughtmare/coprogramming: Implementation of McBride's productive coprogramming + liquid haskell termination checking

Check out src/Control/Coprogramming.hs for the primitives and app/Main.hs for how they can be used (hopefully) safely with Liquid Haskell’s termination checker to implement repmin (please ignore the other things in that file).

I don’t think Liquid Haskell’s termination checker can see that it is safe.

To be clear, I only meant that we could use LH’s termination checker, not the actual proof mechanism that LH provides. I think that sidesteps all these issues, because AFAIK the termination checker is purely syntactic.

Do you have an example? Ah, I guess you could define an infinite stream like in my app/Main.hs file. However, that stream will contain an infinite number of Laters which you can never force all the way. So, termination is still preserved as far as I can tell.

1 Like

First of all: Respect for being able to condense a PhD thesis into such a short blog post.

Thank you, but I would not say that what I discuss in the post is part of my thesis, which is about static analyses such as those found in GHC. With “(it just took a PhD worth of time)” I was trying to say that I had neglected my blog for almost my entire PhD.

You say that while the mathematical interpreter I is total, the naive interp is not. The latter is easily seen, but how does one prove the former?

If it weren’t total, it would not be a function :)

More seriously, it’s composed from total primitives, so it must be total as well. (That’s kind of the point of domain theory.)

I would challenge your notion of totality and say that the denotational semantics of interp is just as total as your interpreter I because it maps every lambda term to a well-defined element of the semantic domain of D, of which ⊥ is a member.

Well, of course the denotational semantics of the Haskell program interp is a total function; it’s a denotational semantics after all!

But there is an important difference: The semantics do not coincide in the way I think you imply.

For I, we have I⟦Ω⟧ emp = ⊥ = I⟦x⟧ emp.
On the other hand, if we had a denotational semantics ⟦e⟧ for Haskell, then surely ⟦interp omega Map.empty⟧ = ⊥, whereas ⟦interp (Var "x") Map.empty⟧ = lift Bot. Note that Bot is a data constructor! It is never the case that lift Bot = ⊥.
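Concretely, here is a tiny self-contained toy (my names, not the post’s) that makes the difference observable:

data V = Bot | Fun (V -> V)

omega :: V
omega = omega -- genuinely divergent: denotes ⊥

stuck :: V
stuck = Bot -- a defined value that merely represents being stuck

observe :: V -> Bool
observe Bot = True
observe _   = False
-- observe stuck = True, while observe omega diverges: Bot ≠ ⊥.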
This means you cannot reason about the object language without also knowing the denotational semantics of the meta language Haskell (rather than just its equational theory).
It is one of the main consequences I had hoped to convey with the post!

Indeed, that’s correct. Is that termination checker part of LH or of the SMT solver? That is, can you just deactivate the termination check on this function? I guess my point is that it’s fine to reflect this definition into the logic, whereas it would not be fine to reflect the definition of Later into the logic. I’m positively sure that you will need to reflect gfix into the logic or at least provide it as an equality for manual unwrapping. How else would you do proofs about gfix?

Hmm, only if you unwrap MkLater, which I think is your point? But it is also my point above:

import Data.Void (Void)

data Stream a = a :> Later (Stream a)

ones :: Stream Int
ones = gfix (\r -> 1 :> r)

-- If MkLater leaks, we can strip the guards and "prove" falsity:
false :: Void
false = go ones
  where
    go (_ :> MkLater r) = go r -- diverges; matching on MkLater defeats the guard

I’ve not used LH, but the paper uses a similar counterexample in 1.3 (albeit using --no-adt, without which Stream would not be possible to define).

That’s quite nice! And you hide the L constructor as well, which makes it safe.
But I’m not sure if you will manage to prove any properties about repmin if you don’t reflect gfix into the SMT logic.

1 Like

Ah, I meant to say I don’t want to do proofs. I just want to be sure my programs terminate. That’s what I meant with:

1 Like

Oh, I did not mean to claim that I and interp are semantically equivalent. After all, isn’t the whole point of the exercise that they aren’t?

1 Like

Perhaps you can have both:

Ah, yes. Using LH to prove that your definitions are total is quite a neat idea!
However, you still need to teach LH about which kinds of data type definitions are OK. For example, the original definition of D from the post is not: data D = Fun (D -> D) | Stuck is probably rejected by LH’s totality checker. On the other hand, it had better allow data Stream a = a :> Later (Stream a) and data D = Fun (Later D -> D) | Stuck!

1 Like

I would have expected the denotation of res_omega to be an infinite nesting of step constructors. However, when normalizing it in Agda, I got a surprisingly finite, but unintelligible, result. Was my intuition wrong and what does that normal form really tell us?

I don’t think it normalises through transports, but I could be wrong about that. At least it looked like some transports were getting in the way of beta redexes when I checked the output a couple of days ago.

It’s strange that it does not appear to normalise through transports, because I had thought that was the main point about cubical…

At any rate, as you have probably guessed, normalising through the transport that unfolds fix leads to undecidable definitional equality. So that’s perhaps why it doesn’t completely normalise.