Code style: «case» versus «when» in imperative blocks

The following two definitions are equivalent. Which one do you like more?

(Best compared in your editor of choice, with syntax highlighting.)

With higher order control flow functions:

initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
  maybeCurrentUserName <- lookupEnv "USER"
  currentUserName <- maybe (throwIO DeploymentException) return maybeCurrentUserName
  locationIsOccupied <- doesPathExist location
  when locationIsOccupied $ throwIO DeploymentException
  locationIsInDirectory <- doesDirectoryExist (dropFileName location)
  when (not locationIsInDirectory) $ throwIO DeploymentException
  callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
  status <- validatePostgresClusterLocation location
  either throwIO return status

With case statements:

initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
  maybeCurrentUserName <- lookupEnv "USER"
  currentUserName <- case maybeCurrentUserName of
    Nothing -> throwIO DeploymentException
    Just x -> return x
  locationIsOccupied <- doesPathExist location
  case locationIsOccupied of
    True -> throwIO DeploymentException
    False -> do
      locationIsInDirectory <- doesDirectoryExist (dropFileName location)
      case locationIsInDirectory of
        False -> throwIO DeploymentException
        True -> do
          callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
          status <- validatePostgresClusterLocation location
          case status of
            Left e -> throwIO e
            Right postgresClusterID -> return postgresClusterID

Please make up your mind before reading further, lest you be biased by my argument.


Discussion:

For higher order control flow functions:

  • Code is narrower since no indentation is needed. (Or, indeed, allowed.)

— But who cares, given modern display technology?

  • Code is more concise.

  • Reads like plain English.

For case statements:

  • Less cognitively demanding: no need to remember the shape of the type and mentally figure out which argument handles which case.

  • Proliferation of control flow functions is avoided, reducing long term memory recall expenses and hoogling delays.

  • Indentation makes code structure clear from a glance.

  • Parentheses are avoided, thus editing is easier.

  • Syntax highlighting helps readability more since the code is not a uniform wall of value identifiers but a pleasant tableau of value identifiers, data constructors, keywords and arrows. Unfortunately, Discourse does not seem to know Haskell, so please copy to your editor of choice in order to appreciate the difference.

  • Adding more statements (for example, logging) to a case is straightforward.

Against case statements:

  • Adding another control flow fork in the middle of a block requires adjusting the indentation of all the consecutive lines, resulting in a spurious diff.

It probably depends on the context.

In this case, there are a lot of case alternatives but it’s always ~“bail out when there’s an exception”. There clearly is a “normal” control flow which goes from the top to the bottom, and that’s why I think using when makes this easier to read.

If the case expressions become more complex and every case has a different path, with multiple statements, I think the second style is preferred.

1 Like

Both ways are wrong. You are not following the «parse, not validate» principle and you show no appreciation of what a monad is.

You may know of «applicative do». What does the «monadic do» do that the applicative one does not? Branch depending on previous results. Think about it. You have a monad insofar as you have a dependency graph. In a usual program, it is an acyclic directed graph, thus giving rise to a partial order. Collect all the minima. These are the actions that depend on nothing. You can execute them at once — concurrently if desired. Then proceed to the minima of the remaining graph. Make the partial order structure visible. Advance in waves.

initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
  (currentUserName, ( ), ( )) <- pure (,,)
    <*> getCurrentUserName
    <*> checkPathExists location
    <*> checkPathIsDirectory location
  callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
  validatePostgresClusterLocationOrDie location
  where

    checkPathExists location = doesPathExist location >>=
      flip when (throwIO $ DeploymentException
        $ "Location designated for new cluster: " <> tShow location <> " — is occupied!")

    checkPathIsDirectory location = doesDirectoryExist (dropFileName location) >>=
      flip unless (throwIO $ DeploymentException
        $ "Location designated for new cluster: " <> tShow location <> " — is not a directory!")

    getCurrentUserName = lookupEnv "USER" >>=
      maybe (throwIO $ DeploymentException $
        "Unable to determine current user name!") return

    validatePostgresClusterLocationOrDie location =
      validatePostgresClusterLocation location >>= either throwIO return

In this example, it is implicit that the final result depends on the call to the external process. Ideally we should like it to return some sort of a witness value that would prove to us that it did what we suppose it should. Also, the dependency graph is very short, so it is not impressive. But consider that it being obviously shallow, unlike before, is a testament to the utility of the method.

1 Like

Your tone is quite harsh – maybe English is not your first language (it is not my first language either), but I don’t think it’s very nice to go around telling people that they “show no appreciation of what a monad is”. We should strive to be a welcoming community.

Anyway, I disagree with this. I think both ways are correct. I think the third way you describe is also correct. It’s a code style discussion, so I suspect there’s a large amount of subjectivity involved.

Sure, you can use <*> to indicate the places where there is no ordering needed. You can go a bit further even and use <* here, since that way you don’t need the subjectively awkward (,,):

currentUserName <- getCurrentUserName
    <* checkPathExists location
    <* checkPathIsDirectory location

Does this make it easier to read? I’m not sure. It makes it clear that the steps are independent. That’s good. But at the same time there’s no real “win” in this independence, since there’s no parallel execution or batching of error messages like in Data.Validation.

By the “parse, don’t validate” reference I assume you mean moving out the individual steps to individual parsers. That works, and follows the “parse, don’t validate” mantra, but it also increases the code size. So while it is somewhat of a win, I don’t think it’s a clear win – there’s again a tradeoff involved here.

4 Likes

This «welcoming» thing has gone way too far.

It is true that all three ways to write the given code are correct. But it is an unhelpful sort of truth. See: that all three ways are correct does not mean they are equally good, rather that we need finer distinctions, under which two of the three offered ways may be qualified as wrong.

The latter way makes the structure clear, so it has an edge. I imagine that the difference of clarity will manifest more and more as the code grows. Of course it is fair to say that this criterion is not naturally superiour. I welcome other, competing criteria to be announced.

It is surely true that error messages should be batched. I think this module can be used to that end. It makes the same distinction I am proposing to make explicit — between interdependent and independent action sets. I wonder if there is anything, in principle, against wiring this method of error batching into IO — it seems like a clear cut improvement.