Coerce with type families

Hi everyone.

Currently I’m playing with type families and talking to ChatGPT, and I came here to ask for help and suggestions because GPT suggests mad examples.

Suppose I wanna take some data from the command line with the help of optparse-applicative and store it in the following type:

data Raw
data Validated

data Test a = Test
  { field1 :: Maybe Text 
  , field2 :: Maybe Text 
  }

I’ve deliberately chosen the following approach: optparse parses and stores the data in the Test Raw type. I don’t want to put any logic in the parsing stage; I just want to make my life easier by validating Maybe fields instead of bare types. Then I wanna validate that value and return Test Validated. I’m practicing "use a data structure that makes illegal states unrepresentable".

I also have this type class

class Validate a where
  validate :: a Raw -> Either Error (a Validated)

In the end, I want the Test type to look like this:

data Test a = Test
  { field1 :: Text
  , field2 :: Text
  }

I know that I can do that with type families without involving other types

type family Field a b c where
  Field Raw b _ = b
  Field Validated _ c = c

data Test a = Test
  { field1 :: Field a (Maybe Text) Text
  , field2 :: Field a (Maybe Text) Text
  }

However, I have several questions:

  1. How do I use coerce with type families in this case? I just want to convert Test Raw to Test Validated without creating a new term.
  2. I’ve always had trouble understanding the "parse, don’t validate" approach, so the question is: have I implemented it correctly, based on the examples and description I provided?

Any help will be appreciated, and sorry for my English; it’s not my native language.

Thanks!

You can’t coerce between Maybe Text and Text. Not safely, and not unsafely—they are represented differently in memory, and if you try to do this with unsafeCoerce your program will crash.

Why do you think you need to coerce? You have to visit each field in order to validate it, right? If you’re already doing that, you’re paying the time cost of having a new term, and the memory cost is negligible.

Since I’m returning Test Validated, I don’t want to construct another value, which would cost performance. Before the type family approach my type had bare types in it, and I benchmarked the approach with and without coercion: the version with coercion had an immense performance boost.

I don’t know what you were trying to benchmark, but trust me, coercion is not a good fit for what you’re trying to do. If all you had was a phantom type parameter that indicated the validation state without changing any types, that’s another story. But if you want input and output to have different types with different representations, coercion is just a non-starter. You can’t successfully coerce types with different run-time representations.
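For reference, here is a minimal sketch of the phantom-parameter case where coerce does apply. It assumes both states keep plain Text fields, and the EmptyField error and the emptiness checks are made up purely for illustration. Because the parameter appears in no field, GHC infers a phantom role for it, so Test Raw and Test Validated share one run-time representation.

```haskell
import Data.Coerce (coerce)
import Data.Text (Text)
import qualified Data.Text as T

data Raw
data Validated

-- The parameter `a` is phantom: no field mentions it, so both
-- instantiations have the same run-time representation.
data Test a = Test
  { field1 :: Text
  , field2 :: Text
  } deriving (Show, Eq)

-- Hypothetical error type, only for this sketch.
newtype Error = EmptyField String
  deriving (Show, Eq)

-- Check the fields, then retag the same value instead of rebuilding it.
validate :: Test Raw -> Either Error (Test Validated)
validate t
  | T.null (field1 t) = Left (EmptyField "field1")
  | T.null (field2 t) = Left (EmptyField "field2")
  | otherwise         = Right (coerce t)
```

As soon as a field's type depends on the parameter (as with the Field type family), the two types stop being representationally equal and this no longer type-checks.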

I wouldn’t necessarily recommend this barbie-style validation framework either—the essence of parse-not-validate is to get errors out of your types as quickly as possible, i.e. in the optparse-applicative layer—but if that’s an itch you want to scratch to see where it ends up, it’s certainly something you can do. Unlike the coercion thing, which, again, I can’t stress enough, you will not get to work as long as your types have different representations.


Now it’s clear, thanks! So the only way to do it is to create another smart constructor that translates from Raw to Validated, right?

Your heart is in the right place here, and half of it isn’t even your fault.


An ideal commandline option parser would allow you to convert between

-------------------------------
-- Options                   --
--       --foo     Does foo. --
--   -r, --bar     Does bar. --
--   -z, --baz     Does baz. --
-------------------------------

and

data Options =
       Options
         { quux  :: Quux
         , corge :: Corge
         }

This conversion would require some amount of internal state which shouldn’t be dependent on either interface. Then, as the parser folds over the list of options, it would modify its internal state, ultimately either failing due to some [custom] error, or succeeding and converting the internal state into Options.


You think of Test Raw as your internal state, but it’s not, it’s your Options. Going from Test Raw to Test Validated is a completely separate [validation] step which is ill-advised as it no longer has access to the parser context (you can’t say “when parsing option --foo” from here nicely).

Now, the meta-problem here is that none of the parsers in the Haskell ecosystem work the way I described above. You’re expected to derive your parsers from datatypes (Applicative in optparse-applicative, Generics in aeson, ??? in cmdargs), which superglues external format to data representation. Handrolling parsers, if you’re even allowed to, is extremely unpleasant in every way.

As such, your best way forward is to do as much inline validation as possible with what the parsers provide, and, if any additional validation is necessary, create extra datatypes on the fly. So, no Validate typeclass; TestRaw and TestValidated should be separate.
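A rough sketch of what inline validation could look like in optparse-applicative, using eitherReader so that a bad value is rejected while the parser still knows which option it belongs to. The PhoneNumber newtype, the digits-only rule, the --phone flag and the parseArgs helper are all assumptions made up for this example, and it needs the optparse-applicative package.

```haskell
import Data.Char (isDigit)
import Options.Applicative

-- Hypothetical validated type with a smart constructor.
newtype PhoneNumber = PhoneNumber String
  deriving (Show, Eq)

mkPhoneNumber :: String -> Either String PhoneNumber
mkPhoneNumber s
  | not (null s) && all isDigit s = Right (PhoneNumber s)
  | otherwise = Left ("not a phone number: " ++ show s)

newtype Options = Options { phone :: PhoneNumber }
  deriving (Show, Eq)

-- The smart constructor runs inside the option reader, so a failure is
-- reported by the parser together with the option that caused it.
optionsParser :: Parser Options
optionsParser = Options
  <$> option (eitherReader mkPhoneNumber)
        ( long "phone"
       <> metavar "DIGITS"
       <> help "Phone number, digits only" )

-- Pure entry point, convenient for trying the parser without real argv.
parseArgs :: [String] -> Maybe Options
parseArgs = getParseResult . execParserPure defaultPrefs (info optionsParser mempty)
```

With this shape there is no Maybe field left to validate afterwards: the record only ever holds a PhoneNumber that already passed the check.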

More or less. With the types you have right now, I’d expect something like this:

{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE RecordWildCards #-}

instance Validate Test where
  -- Test{..} brings the raw (Maybe) fields into scope; each bind below
  -- shadows one of them with its validated value.
  validate Test{..} = do
    field1 <- validateField checkField1 field1
    field2 <- validateField checkField2 field2
    pure Test{..}

validateField :: (a -> Maybe Error) -> Maybe a -> Either Error a
validateField _ Nothing = Left MissingFieldError
validateField check (Just a) | Just e <- check a = Left e
                             | otherwise = Right a

But a few notes:

  • I’d advise against using raw Either for validation, since it only propagates the first error. Validation is a wrapper that accumulates all errors in a semigroup; other implementations exist in the ecosystem with slightly different ergonomics. Use one of those (or roll your own; they’re small).
  • You’re not making best use of the type system with all these raw Texts floating around. Whatever invariants your validation logic is checking, those should be captured in the type, so that you don’t confuse an unvalidated Text with a validated one. So make a newtype for each kind of Text you have—PhoneNumber, Address, whatever—and use smart constructors on those types. The smart constructors do the validation. You can still do your type family trick:
data Test a = Test
  { field1 :: Field a (Maybe Text) Address
  , field2 :: Field a (Maybe Text) PhoneNumber
  }

But now validation becomes cleaner, and the types are more expressive.

-- validate now has to return Validation Error (a Validated) instead of Either
instance Validate Test where
  validate Test{..} = do
    field1 <- validateField mkAddress field1
    field2 <- validateField mkPhoneNumber field2
    pure Test{..}

validateField :: (a -> Validation Error b) -> Maybe a -> Validation Error b
validateField = maybe (Failure MissingFieldError)

mkAddress :: Text -> Validation Error Address
mkPhoneNumber :: Text -> Validation Error PhoneNumber

(Don’t forget ApplicativeDo with this, though! Validation e isn’t a monad.)
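To make the error accumulation concrete, here is a self-contained sketch with a hand-rolled Validation (a stand-in for the library types, not their actual definitions). The Error constructors and the rules inside mkAddress and mkPhoneNumber are invented for illustration, errors are collected in a list, and plain `<$>`/`<*>` is used so ApplicativeDo isn't needed.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Char (isDigit)
import Data.Text (Text)
import qualified Data.Text as T

-- Hand-rolled stand-in for a library Validation type.
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, two failures are combined instead of keeping only the first.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical error type for this sketch.
data Error = MissingField String | EmptyAddress | BadPhoneNumber
  deriving (Show, Eq)

newtype Address = Address Text deriving (Show, Eq)
newtype PhoneNumber = PhoneNumber Text deriving (Show, Eq)

-- Smart constructors; the rules are placeholders.
mkAddress :: Text -> Validation [Error] Address
mkAddress t
  | T.null (T.strip t) = Failure [EmptyAddress]
  | otherwise          = Success (Address t)

mkPhoneNumber :: Text -> Validation [Error] PhoneNumber
mkPhoneNumber t
  | not (T.null t) && T.all isDigit t = Success (PhoneNumber t)
  | otherwise                         = Failure [BadPhoneNumber]

data Raw
data Validated

type family Field a b c where
  Field Raw b _ = b
  Field Validated _ c = c

data Test a = Test
  { field1 :: Field a (Maybe Text) Address
  , field2 :: Field a (Maybe Text) PhoneNumber
  }

validateField :: String -> (a -> Validation [Error] b) -> Maybe a -> Validation [Error] b
validateField name = maybe (Failure [MissingField name])

-- Both fields are always checked; every failure ends up in the list.
validateTest :: Test Raw -> Validation [Error] (Test Validated)
validateTest t = Test
  <$> validateField "field1" mkAddress (field1 t)
  <*> validateField "field2" mkPhoneNumber (field2 t)
```

Feeding it a value with a missing field1 and a non-numeric field2 should report both problems in one Failure rather than stopping at the first.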

Agreed with pretty much everything rhendric said. To add to that, though: the definition of the ‘Field’ type family still seems more complicated than necessary. I tend to use HKDs (higher-kinded data) instead. Something along the lines of:

data Valid a 
data Raw a 
-- or just use Data.Functor.Identity and Data.Maybe directly

type family HKD f a where 
   HKD Valid a = a 
   HKD Raw   a = Maybe a

data Test f = Test { field1 :: HKD f Address, field2 :: HKD f PhoneNumber } 

rawInput :: Test Raw 
rawInput = Test (Just myAddress) Nothing

validTest :: Test Valid
validTest = Test myAddress myPhoneNumber
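One thing this shape buys you, sketched below under the assumption that the raw fields already hold Address and PhoneNumber values (here just Text newtypes invented for the example): going from Test Raw to Test Valid only has to check that every field is present, which the Maybe applicative does directly. The name fromRaw is hypothetical.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Text (Text)

-- Placeholder field types for this sketch.
newtype Address = Address Text deriving (Show, Eq)
newtype PhoneNumber = PhoneNumber Text deriving (Show, Eq)

data Valid a
data Raw a

type family HKD (f :: Type -> Type) a where
  HKD Valid a = a
  HKD Raw   a = Maybe a

data Test f = Test
  { field1 :: HKD f Address
  , field2 :: HKD f PhoneNumber
  }

-- Every Raw field is a Maybe, so the Maybe applicative collects them:
-- the result is Nothing as soon as any field is missing.
fromRaw :: Test Raw -> Maybe (Test Valid)
fromRaw t = Test <$> field1 t <*> field2 t
```

Swapping Maybe for a Validation-style applicative in fromRaw would give per-field error messages instead of a bare Nothing.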