Haskell records compare Standard ML

atravers · May 7, 2023, 12:08pm

The current effort to make GHC more modular, together with a reading of Ben Moseley and Peter Marks’s Out of the Tar Pit, has me pondering this question: should some varieties of types (and the corresponding structured values) be allowed at all?

From the modular-GHC paper:

(page 33 of 59)

Caveat: Passing several services to a function can look
cumbersome: why not bundle some of them into a single record
(say XYZEnv)? In our experience, there is no one-size-fits-all record
like this and the pattern is only a local maxima, instead, there is
a slightly different record best suited for each function. Thus, the
drive to bundle these services into a single record is exactly how
coupling begins. The sequence of events is:

Services are bundled into records such as XYZEnv.

Most of the functions which input XYZEnv use most of its
services or heavily related and thus have high coherence.

The code base evolves by adding new functions or features;
some functions require more than what is in XYZEnv, but still
require some services that exist in XYZEnv.

This produces an incentive to create new slightly altered
XYZEnv, or expand the existing XYZEnv. Creating more XYZEnvs
is cumbersome, and appears redundant, so the “path of least
resistance” is taken and an existing XYZEnv is extended with
whatever new fields are required for just the new functions.

Now that XYZEnv has grown, there are two secondary effects:
First, coherence is reduced because the number of functions
which use all of the XYZEnv fields is lower. Second, there is
more incentive to pass XYZEnv around because its functionality
has expanded.

And now we arrive at a vicious cycle; more and more services
are added into XYZEnv because it is conveniently threaded
through many functions. But it is threaded through many
functions because of the many services it provides.

This is how we ended up threading DynFlags and HscEnv everywhere
and storing arbitrary values into these records!

…and from Out of the Tar Pit:

(page 52 of 66)

9.2.4 Benefits for Data Abstraction

[…]

We believe that in many cases, un-needed data abstraction actually represents
another common (and serious) cause of complexity. This is for two
reasons:

Subjectivity Firstly the grouping of data items together into larger compound
data abstractions is an inherently subjective business […].
Groupings which make sense for one purpose will inevitably differ from
those most natural for other uses, yet the presence of pre-existing data
abstractions all too easily leads to inappropriate reuse.

Data Hiding Secondly, large and heavily structured data abstractions can
seriously erode the benefits of referential transparency […]
This problem occurs both because data abstractions will often cause
un-needed, irrelevant data to be supplied to a function
[which lowers its coherency], and because
the data which does get used (and hence influences the result of a
function) is hidden at the function call site. This hidden and excessive
data leads to problems for testing as well as informal reasoning […]
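The “Data Hiding” point can be illustrated with a small Haskell sketch. Everything in it (Config, its fields, shouldLogR, shouldLog) is hypothetical and invented for illustration; none of it comes from Out of the Tar Pit or from GHC.

```haskell
-- Hypothetical configuration record; these names are not from GHC.
data Config = Config
  { cfgVerbosity :: Int
  , cfgTmpDir    :: FilePath
  , cfgColour    :: Bool
  }

-- Takes the whole record, although only cfgVerbosity influences the result.
-- A call such as `shouldLogR cfg 2` does not reveal which field matters.
shouldLogR :: Config -> Int -> Bool
shouldLogR cfg level = level <= cfgVerbosity cfg

-- Takes only the data it uses, so the dependency is visible at the call site.
shouldLog :: Int -> Int -> Bool
shouldLog verbosity level = level <= verbosity
```

To test shouldLogR, a complete Config has to be built, including the two fields that cannot affect the answer. Testing shouldLog needs only two Ints.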

It would appear that the current difficulties in making GHC more modular had (to some extent) been predicted as far back as 2006. So what was Moseley and Marks’s advice to avoid this problem?

 -- allowed
data Enumeration = Alpha Char | Beta Int | ... | Psi Double | Omega Bool

 -- banned
data Product = TarPot Char Int ... Double Bool

This also appears to be the solution which is being adopted by the modular-GHC developers - from their paper again:

(page 32 of 59)

In the following example, we can see that function foo requires
(and probably uses) two services: Logger and TmpFs.
foo :: Logger -> TmpFs -> ... -> IO ...
In current GHC, many similar functions would have the following
prototypes instead because services behavior may be configured
via command-line flags and some of their state may be stored in
the session environment:
foo :: DynFlags -> ... -> IO ...
 -- or
foo :: HscEnv -> ... -> IO ...
A significant part of our work is to refactor functions that have
the latter interfaces (DynFlags and HscEnv parameters) into functions
with the former interface (explicit passing of each service).
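As a rough sketch of that refactoring (not the GHC developers’ actual code), the two interfaces might look like this. The Logger and TmpFs types below are deliberately tiny, hypothetical stand-ins for the real GHC services, and Env, fooEnv, recordingLogger and demo are names invented for this example.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-ins for the services named in the paper; the real
-- GHC Logger and TmpFs types are different and far richer.
newtype Logger = Logger { logMsg :: String -> IO () }
newtype TmpFs  = TmpFs  { newTmpName :: String -> IO FilePath }

-- Hypothetical bundled environment, playing the role of HscEnv.
data Env = Env
  { envLogger :: Logger
  , envTmpFs  :: TmpFs
  , envOther  :: [String]  -- stands in for the many unrelated fields
  }

-- "Latter" interface: the signature does not say which services are used.
fooEnv :: Env -> String -> IO FilePath
fooEnv env = foo (envLogger env) (envTmpFs env)

-- "Former" interface: each required service is an explicit parameter.
foo :: Logger -> TmpFs -> String -> IO FilePath
foo logger tmpfs suffix = do
  path <- newTmpName tmpfs suffix
  logMsg logger ("created " ++ path)
  pure path

-- A test double that records log messages (newest first) in an IORef.
recordingLogger :: IORef [String] -> Logger
recordingLogger ref = Logger (\msg -> modifyIORef' ref (msg :))

-- Exercises foo with fake services only; no Env has to be constructed.
demo :: IO (FilePath, [String])
demo = do
  ref <- newIORef []
  let tmpfs = TmpFs (\suffix -> pure ("/tmp/demo" ++ suffix))
  path <- foo (recordingLogger ref) tmpfs ".o"
  msgs <- readIORef ref
  pure (path, msgs)
```

Keeping fooEnv as a thin wrapper is one possible way to migrate callers gradually; whether GHC does this is not something the quoted passage states.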

Perhaps the simplest way to solve the “anonymous records/record-polymorphic field names” problem is to disallow the types which necessitate their existence: product types, particularly those with inordinately many components (such as HscEnv and DynFlags). But such a restriction is probably best left for yet another “Haskell” to explore…