Encoding refinement types as ordinary sub-types

A refinement type can be understood as a type X together with a predicate p :: X -> Bool. The refined type consists of all elements of X that satisfy the predicate. There are several tools in Haskell to work with refinement types. Among them are refined, rerefined and Liquid Haskell.
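A minimal sketch of the idea (no particular library's API; the names `Refined`, `refine`, and `unrefine` are illustrative): the refined type keeps the representation of X and guards the predicate with a smart constructor behind an abstract newtype.

```haskell
-- Invariant: the wrapped Int is positive. Keeping the constructor
-- abstract (unexported) makes refine the only way to build one.
newtype Refined = Refined Int

refine :: Int -> Maybe Refined
refine x
  | x > 0     = Just (Refined x)
  | otherwise = Nothing

unrefine :: Refined -> Int
unrefine (Refined x) = x
```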

What I am looking for are tools or literature on encoding a refinement type as an ordinary type, where possible, as part of a code generator, following the "make illegal values unrepresentable" paradigm. There is the functional pearl "Ghosts of Departed Proofs", but I feel that approach also does not go beyond the refinement-type approach, since the run-time representation remains the same.

Problem scope:
We begin with a polynomial type X, built using sums and products over a few base types, without recursion; thus it can have a Generic instance. Further, there is a predicate p :: X -> Bool, and we are interested in encoding the subset of X that satisfies the predicate as a new, smaller type. We also assume that we have access to the structure of the predicate as an AST, e.g. the logical connectives and the patterns used to define it.
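For concreteness, here is one hypothetical shape such a predicate AST could take (the names are invented for this sketch, not taken from any existing tool): logical connectives over constructor patterns.

```haskell
-- Patterns as they appear on the left-hand sides of the
-- predicate's clauses.
data Pat
  = PWild               -- _
  | PCon String [Pat]   -- constructor applied to sub-patterns
  deriving Show

-- Predicates: logical connectives over pattern matches.
data Pred
  = PTrue
  | PAnd Pred Pred
  | POr  Pred Pred
  | PNot Pred
  | PMatch Pat          -- "the scrutinee matches this pattern"
  deriving Show

-- A two-clause disjunction of constructor patterns in this
-- representation:
pAST :: Pred
pAST = POr
  (PMatch (PCon "X" [PCon "Just" [PWild], PCon "Nothing" []]))
  (PMatch (PCon "X" [PCon "Nothing" [], PCon "Just" [PWild]]))
```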

Example 1:

data X = X {foo :: Maybe Int, bar :: Maybe Bool}
p :: X -> Bool
p (X (Just _) Nothing) = True
p (X Nothing (Just _)) = True
p _ = False

We are looking for a type Y and a function t :: Y -> X such that the diagram

     p
X ------> Bool
^         ^
| t       | const True
|         |
Y ------> ()
  const ()

is a weak pullback. Here, weak means that t is not required to be injective. In the case of the example above, such a type exists (and it even gives a proper pullback):

type Y = Either Int Bool
t :: Y -> X
t = either 
  (\i -> X (Just i) Nothing) 
  (\b -> X Nothing (Just b))
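One way to see that this square is a proper pullback is to exhibit a partial inverse to t (a sketch; the name s is mine, and Example 1's definitions are repeated so the snippet stands alone): s succeeds exactly on the values satisfying p, and t undoes it there.

```haskell
-- Example 1's definitions, repeated.
data X = X { foo :: Maybe Int, bar :: Maybe Bool }
  deriving (Eq, Show)

p :: X -> Bool
p (X (Just _) Nothing) = True
p (X Nothing (Just _)) = True
p _                    = False

type Y = Either Int Bool

t :: Y -> X
t = either (\i -> X (Just i) Nothing)
           (\b -> X Nothing (Just b))

-- Partial inverse: defined exactly where p holds, with t . s = id
-- on that domain.
s :: X -> Maybe Y
s (X (Just i) Nothing) = Just (Left i)
s (X Nothing (Just b)) = Just (Right b)
s _                    = Nothing
```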

The above example is special because the predicate mentions only constructor patterns or their predicate counterparts, like isNothing and isJust.

Example 2:

type X = Integer
p :: X -> Bool
p = even

This time the predicate mentions values of X, not only patterns. Nevertheless a refinement is possible:

type Y = Integer
t :: Y -> X
t = (2*)

This works because the predicate even can be "solved": even x holds if and only if x = 2*y for some y.
In general, though, we cannot expect this to work in a language without dependent types when p can contain arbitrary code. Negations in particular are problematic: there is no type of strings that do not begin with 'A', although it is definable as a refinement type.

Why even the first example is problematic: consider a further refinement by some additional predicate r :: X -> Bool:

q :: X -> Bool
q x = p x && r x

Since we already have the sub-type inclusion t :: Either Int Bool -> X that handles the p part, we are now dealing with two new predicates

r . t . Left  :: Int  -> Bool
r . t . Right :: Bool -> Bool

which, if we carry out the above procedure mechanically, result in a combinatorial explosion. I would be grateful for pointers to literature that solves this problem efficiently, avoiding the explosion where possible.
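To make the mechanical step concrete, here is what the residual predicates look like with an invented r (the body of r and the names rLeft, rRight are mine; only the shape matters): pre-composing r with t and each constructor of Y yields one residual predicate per constructor, and nesting further refinements multiplies these cases.

```haskell
data X = X { foo :: Maybe Int, bar :: Maybe Bool }

type Y = Either Int Bool

t :: Y -> X
t = either (\i -> X (Just i) Nothing)
           (\b -> X Nothing (Just b))

-- Hypothetical further predicate r.
r :: X -> Bool
r (X (Just i) _) = i > 0
r _              = True

-- One residual predicate per constructor of Y.
rLeft :: Int -> Bool
rLeft = r . t . Left

rRight :: Bool -> Bool
rRight = r . t . Right
```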


As I understand it, the proposals in hand for Haskell 'Dependent Typing' won't help this work either. The debate around them at the time was clear that Liquid Haskell is a complement to Haskell DT, they don't/won't overlap in any way.

Yes, I'm keen on that paradigm, and I see it often repeated in the context of Haskell. AFAICT those claims are bogus so far as they pertain to the type system. To make illegal values 'unrepresentable' (the word I'd rather use is 'unreachable'), you have to hide the basic parts of the implementation using the module system, then protect the abstraction by providing a bunch of functions (possibly including pattern synonyms) that restrict clients to safe access. I'm thinking, for example, of Data.Set: most operations require the element type to be an instance of the Ord class, but with sufficient cunning you can create a Set over elements not in Ord, and it is not rejected until the use site.

In my particular use case this is a real option, see below.

That is indeed a detail inherent to the implementation and not to the concept of sets, which is discussed in another thread. In a way, what we are trying to do is similar: hide implementation-level detail from the user.

The basis for my question is an XSD specification of an XML structure, which can be translated mechanically to a Haskell record type, say X. The spec for the business logic does not stop there, though: it uses Schematron assertions to describe additional properties inexpressible in the XSD. Our goal is to provide smart constructors for X that have the assertions (or at least most of them) baked in.

Some of these properties mention only XML tags and their occurrence, which translates to predicates on X that mention only constructor patterns, as in Example 1. At least for this part of the spec I expect the sub-typing described above to be feasible. Some other assertions are "solvable" as in Example 2, and can therefore lead to the removal of redundant constructors.
There are, however, hundreds of assertions, which warrants a code generator instead of processing them manually. Hence my question about theory on the subject.

Both refined and rerefined provide predicate combinators like And. If you use these together with the appropriate type classes, I don't think you would need to manually write refinements like q x = p x && r x.
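For what it's worth, the combinator style can be hand-rolled in a few lines. This is a sketch of the general mechanism only, not the actual API of refined or rerefined (all names here are invented):

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
import Data.Proxy (Proxy (..))

-- Type-level predicate names.
data Even
data Positive
data And p q

class Predicate p a where
  holds :: Proxy p -> a -> Bool

instance Predicate Even Integer where
  holds _ = even

instance Predicate Positive Integer where
  holds _ = (> 0)

-- Conjunction is handled once, generically, so a combined predicate
-- like q x = p x && r x never needs to be written out by hand.
instance (Predicate p a, Predicate q a) => Predicate (And p q) a where
  holds _ x = holds (Proxy @p) x && holds (Proxy @q) x
```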

Most likely you knew this and I didn't understand your point on combinatorial explosion.

Your mention of Generic and of refining to types with different representations (instead of a newtype wrapper) reminds me of strongweak. I'm doubtful it would be useful alongside a code generator, but I think it's relevant. I couldn't find relevant literature myself while writing the library.

Definitely not. In our case these are to be parsed from XPath expressions.

Having both the refined and unrefined data available via a type-level switch is an interesting idea. My concern is that some predicates will change the data type enough so that this becomes impossible.

I try not to be dogmatic about the requirement that the refined data type must have a representation distinct from the unrefined one. However, I feel that as soon as we have some smart constructors, the smaller type is not far away. Suppose we have found two ways of smartly constructing X values such that the relevant predicates hold:

smart1 :: A -> B -> X
smart2 :: C -> D -> E -> X

Then why not define

data Y = Smart1 A B | Smart2 C D E
t :: Y -> X
t (Smart1 a b)   = smart1 a b
t (Smart2 c d e) = smart2 c d e

The images of smart1 and smart2 may overlap, whence t may not be a proper sub-type embedding. Yet Y conveys much more information about the business logic.


Hmm. I'm not sure how to help. A key point of refinement types, for me, is that they retain the same underlying representation as the unrefined data. For example, one may encode an even x :: Integer by storing x' = x/2 (as you note). But now, to use the encoded number, you need to compute x' * 2 first. So we're exclusively doing more work, compared to simply plastering a predicate over the type, as in x' :: Refined IsEven Integer.

Refinement types in Haskell are a convenience for me: a pattern turned into a framework (that conveniently corresponds closely to a minor type-theory concept). If you have smart constructors like smart1, smart2 that consume different types to produce the same output type X without validating some predicate (otherwise I'd expect smart1 :: ... -> Maybe X), perhaps your problem doesn't map well to refinement types.

Precisely. That is why, mathematically, a sub-set of X is not a bare set Y but the map t :: Y -> X, or even the pullback diagram. Only in a few representations, such as refinement types, is the map truly implicit.
Anyway, I guess my question is more about code generation than about representation. I began to experiment, and I feel the Generic approach of structural recursion and associated types such as Rep may be fruitful.
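As a starting point for such an experiment, the structural view that Generic provides looks like this (only base's GHC.Generics, nothing generator-specific):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics

data X = X { foo :: Maybe Int, bar :: Maybe Bool }
  deriving (Eq, Show, Generic)

-- Rep X presents X structurally as (roughly) a product of two
-- sum-shaped fields, so a generator can recurse over :*:, :+: and
-- K1 instead of over the concrete declaration; from/to convert
-- losslessly between X and its generic representation.
roundTrip :: X -> X
roundTrip = to . from
```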
