The following two definitions are equivalent. Which one do you like more?
(Best compared in your editor of choice, with syntax highlighting.)
With higher order control flow functions:
initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
maybeCurrentUserName <- lookupEnv "USER"
currentUserName <- maybe (throwIO DeploymentException) return maybeCurrentUserName
locationIsOccupied <- doesPathExist location
when locationIsOccupied $ throwIO DeploymentException
locationIsInDirectory <- doesDirectoryExist (dropFileName location)
when (not locationIsInDirectory) $ throwIO DeploymentException
callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
status <- validatePostgresClusterLocation location
either throwIO return status
With case statements:
initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
maybeCurrentUserName <- lookupEnv "USER"
currentUserName <- case maybeCurrentUserName of
Nothing -> throwIO DeploymentException
Just x -> return x
locationIsOccupied <- doesPathExist location
case locationIsOccupied of
True -> throwIO DeploymentException
False -> do
locationIsInDirectory <- doesDirectoryExist (dropFileName location)
case locationIsInDirectory of
False -> throwIO DeploymentException
True -> do
callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
status <- validatePostgresClusterLocation location
case status of
Left e -> throwIO e
Right postgresClusterID -> return postgresClusterID
Please make up your mind before reading further, lest you be biased by my argument.
Discussion:
For higher order control flow functions:
- Code is narrower since no indentation is needed. (Or, indeed, allowed.)
— But who cares, given modern display technology?
For case statements:
-
Less cognitively demanding: no need to remember the shape of the type and mentally figure out which argument handles which case.
-
Proliferation of control flow functions is avoided, reducing long term memory recall expenses and hoogling delays.
-
Indentation makes code structure clear from a glance.
-
Parentheses are avoided, thus editing is easier.
-
Syntax highlighting helps readability more since the code is not a uniform wall of value identifiers but a pleasant tableau of value identifiers, data constructors, keywords and arrows. Unfortunately, Discourse does not seem to know Haskell, so please copy to your editor of choice in order to appreciate the difference.
-
Adding more statements (for example, logging) to a case is straightforward.
Against case statements:
- Adding another control flow fork in the middle of a block requires adjusting the indentation of all the consecutive lines, resulting in a spurious diff.
It probably depends on the context.
In this case, there are a lot of case alternatives but it’s always ~“bail out when there’s an exception”. There clearly is a “normal” control flow which goes from the top to the bottom, and that’s why I think using when
makes this easier to read.
If the case expressions become more complex and every case has a different path, with multiple statements, I think the second style is preferred.
1 Like
Both ways are wrong. You are not following the «parse, not validate» principle and you show no appreciation of what a monad is.
You may know of «applicative do
». What does the «monadic do
» do that the applicative one does not? Branch depending on previous results. Think about it. You have a monad insofar as you have a dependency graph. In a usual program, it is an acyclic directed graph, thus giving rise to a partial order. Collect all the minima. These are the actions that depend on nothing. You can execute them at once — concurrently if desired. Then proceed to the minima of the remaining graph. Make the partial order structure visible. Advance in waves.
initializePostgresCluster :: FilePath -> IO PostgresClusterID
initializePostgresCluster location = do
(currentUserName, ( ), ( )) <- pure (,,)
<*> getCurrentUserName
<*> checkPathExists location
<*> checkPathIsDirectory location
callProcess "initdb" ["--pgdata", location, "--username", currentUserName]
validatePostgresClusterLocationOrDie location
where
checkPathExists location = doesPathExist location >>=
flip when (throwIO $ DeploymentException
$ "Location designated for new cluster: " <> tShow location <> " — is occupied!")
checkPathIsDirectory location = doesDirectoryExist (dropFileName location) >>=
flip unless (throwIO $ DeploymentException
$ "Location designated for new cluster: " <> tShow location <> " — is not a directory!")
getCurrentUserName = lookupEnv "USER" >>=
maybe (throwIO $ DeploymentException $
"Unable to determine current user name!") return
validatePostgresClusterLocationOrDie location =
validatePostgresClusterLocation location >>= either throwIO return
In this example, it is implicit that the final result depends on the call to the external process. Ideally we should like it to return some sort of a witness value that would prove to us that it did what we suppose it should. Also, the dependency graph is very short, so it is not impressive. But consider that it being obviously shallow, unlike before, is a testament to the utility of the method.
1 Like
Your tone is quite harsh – maybe English is not your first language (it is not my first language either), but I don’t think it’s very nice to go around telling people that they “show no appreciation of what a monad is”. We should strive to be a welcoming community.
Anyway, I disagree with this. I think both ways are correct. I think the third way you describe is also correct. It’s a code style discussion, so I suspect there’s a large amount of subjectivity involved.
Sure, you can use <*>
to indicate the places where there is no ordering needed. You can go a bit further even and use <*
here, since that way you don’t need the subjectively awkward (,,)
:
currentUserName <- getCurrentUserName
<* checkPathExists location
<* checkPathIsDirectory location
Does this make it easier to read? I’m not sure. It makes it clear that the steps are independent. That’s good. But at the same time there’s no real “win” in this independence, since there’s no parallel execution or batching of error messages like in Data.Validation
.
By the “parse, don’t validate” reference I assume you mean moving out the individual steps to individual parsers. That works, and follows the “parse, don’t validate” mantra, but it also increases the code size. So while it is somewhat of a win, I don’t think it’s a clear win – there’s again a tradeoff involved here.
4 Likes
This «welcoming» thing has gone way too far.
It is true that all three ways to write the given code are correct. But it is an unhelpful sort of truth. See: that all three ways are correct does not mean they are equally good, rather that we need finer distinctions, under which two of the three offered ways may be qualified as wrong.
The latter way makes the structure clear, so it has an edge. I imagine that the difference of clarity will manifest more and more as the code grows. Of course it is fair to say that this criterion is not naturally superiour. I welcome other, competing criteria to be announced.
It is surely true that error messages should be batched. I think this module can be used to that end. It makes the same distinction I am proposing to make explicit — between interdependent and independent action sets. I wonder if there is anything, in principle, against wiring this method of error batching into IO — it seems like a clear cut improvement.