[ANN] Codd - a tool to apply postgresql migrations

This was posted by myself on Reddit a few days ago. Posting it here as I heard some haskellers don’t lurk on Reddit. Some comparisons to existing tools exist in the Reddit thread, but please don’t be afraid to ask in case you don’t visit/like Reddit.

I’d like to announce Codd. Link to GitHub.

Codd is a tool that applies plain postgres SQL migrations atomically (when postgres allows it) and enforces schema consistency checks to ensure different environments have the same schema. It also helps by providing meaningful merge conflicts for when e.g. different developers modify the same DB objects, among other interesting features. Please check its GitHub docs for what it can do!

We’ve been using codd at PRODA for some months (though we’ve been using parts of it for far longer in a transition and test period) to apply our migrations, run schema consistency checks and manage our multiple environments’ databases, from dev to staging, production and tests. I’d like to thank my friends at PRODA for their support and feedback all this time: it has been invaluable.

Now I find it’s time to release codd to a broader audience, and thought of this community because it’s written in Haskell.

I hope the documentation on GitHub explains reasonably how codd works and its nuances. I’m open to questions and feedback of all kinds here as well!

7 Likes

Thank you. Yes I’m one of the old grumps who doesn’t get on with Reddit.

You do know Ted Codd didn’t invent SQL, grew to hate it as a corruption of his ideas, and eventually had a big falling-out with IBM?

1 Like

Thank you very much for sharing this, and congratulations on the public release!

I have some questions regarding the blue-green-safe example migration. Such multi-step migrations generally require that steps are executed separately. This example renames a column, so I imagine high-level migration steps like the following:

  1. Migrate the database to an intermediate state that is compatible with both the old and new versions of the client software. (Steps 1 through 4)
  2. Upgrade the client software to the new version. This may be a phased/gradual rollout that allows testing of the new version in production before rolling out to all instances.
  3. One of the following:
    • (Success: all client software has been upgraded to the new version.) Migrate the database to the target state. (Remaining commands)
    • (Failure: there is an issue in the new software and it is decided to try a different approach. The clients that were upgraded to the new version are downgraded to the old version.) Migrate the database with the inverse of the first step so that the previous database state is restored.

High-level step 3 is not run until step 2 succeeds or fails. This may be automated, but it is often a decision made by an administrator when an upgrade is complex.

The example migration puts all of the database migration steps in one migration file. I thought that there might be support for migration steps, but I was unable to find anything about this. Perhaps this example migration should be split into two parts so that they can be applied separately? Am I misunderstanding or missing anything?

By the way, this example migration is configured to not run in a transaction, which introduces a number of issues:

  • Any UPDATE or INSERT between the UPDATE command and the following ADD CONSTRAINT command will cause the ADD CONSTRAINT command to fail.
  • Any UPDATE or INSERT between the ADD CONSTRAINT command and the corresponding CREATE TRIGGER command will fail with a constraint violation.
  • Any UPDATE or INSERT that is made between the corresponding DROP TRIGGER command and the RENAME COLUMN command is silently lost.
  • Any UPDATE or INSERT that is made between the DROP COLUMN and RENAME COLUMN commands will fail.

As for migration inverses, I guess they could simply be prepared as separate migrations.

1 Like

Ah, I never knew who invented SQL, but I certainly didn’t know Codd thought it was a corruption of his ideas. That’s very interesting!

1 Like

Thanks!

That’s quite the digging in the code that you did. Nice :slight_smile:

I think that blue-green-safe migration is more confusing than helpful; sorry! It’s from a past where Codd supported so called blue-green-safe migrations by having every migration have a -- codd: non-destructive and a -- codd: destructive section inside the same .sql file. The destructive section would automatically be applied in later deployments, so you are actually spot on. For various reasons, codd no longer supports that workflow.

Although I wasn’t considering rolling out application deployments gradually when I thought of blue-green-safe migrations, you will see that I at least had in mind that the destructive section of each SQL migration (e.g. dropping the column with the old name) would only run in a future codd up. This made codd up non idempotent in the simplest of cases or required a setting/env-var to choose up to which migration destructive sections would be applied. Both of these options sounded like more trouble than they’re worth. I even wrote a document (no longer exists in the master branch) about blue-green-safe migrations in Codd.

Blue-green-safe migrations also would have made parsing migrations in streaming fashion more difficult, and was infectious in the codebase in ways that are hard to describe.

By the way, this example migration is configured to not run in a transaction

Good catch. I think once again this is my fault for not tidying things up in the codebase :sweat_smile:.
I use those migrations in the repository as quick-and-dirty test beds; they don’t really mean much (e.g. this could’ve been me testing a BGS migration and then making it no-txn to see if codd didn’t burn, although I wouldn’t remember). I will definitely tidy them up. For what it’s worth, the docs recommend being careful with no-txn migrations.

As for migration inverses, I guess they could simply be prepared as separate migrations.

Currently that’s the only possibility if you want to revert something that has been deployed. I did give reversal migrations some thought in the past, but wasn’t able to come up with anything that’d be significantly different than just adding new migrations. Some kind of support for reversal could be useful during development, though, and I guess might facilitate some complex deployment scenarios like the one you described, but I still would have to think carefully how to approach this without going through the mess I made with BGS migrations.

2 Likes

Thank you very much for your reply!

You announced Codd at a time when I happened to be considering options for managing migrations in a work project. It was great timing for me, and we have recently decided to move forward with it!

I just finished an experiment with the following goals:

  • Use Codd in an environment where the database, role, and permissions are configured externally
  • Use Codd in an environment that uses password authentication, without leaking non-development credentials
  • Manage multi-step migrations as well as (optional) migration inverses

I wrote about it in the Codd Experiment blog entry on my personal blog.

At this point, my feedback mostly concerns usability. I am including a summary below, and I am happy to create issues if desired.

  • The codd up command name sounds like it is (just) used to initialize a new database, but I think it is primarily used to synchronize a database by applying pending migrations. If so, perhaps codd sync would be more intuitive?
  • The codd up command could use an option that shows what would be done without actually doing it. This option is traditionally called --dry-run, but perhaps --no-apply could be used for consistency with codd add.
  • The codd up, codd write-schema, and codd verify-schema commands could each use a --verbose option.
  • The codd add command has a --no-apply option. Perhaps --dry-run would be better?
  • Error messages are often misleading. For example, I ran into many errors about connection strings even though my connection string was fine. My use is clearly not anticipated, but implementing better error handling could greatly help people who try such things.

By the way, I am still considering how to manage multiple environments. For example, we had the following types of environments at a previous job.

  • Production environments: used by customers, contain sensitive data, have SLOs/SLAs
  • Demonstration environments: used by sales staff and/or potential customers, may contain sensitive data, minimizing issues and downtime is “best effort”
  • Staging environments: used to test before pushing to production or demonstration environments
  • Testing environments: used by developers to test during development
  • Development environments: local environments each used by single developers during development

Migrations are developed and tested locally in development environments, may be tested in testing environments, definitely tested in staging environments, and then applied to demonstration and production environments. Different environments do not necessarily have the same migrations applied at all times. When storing the Codd migration and schema directories in the project repository, they likely represent the development state, which is often further along than the state of other environments. Perhaps separate Codd migration and schema directories should be used for other environments. I plan to experiment with this in the future.

Sorry for the wall of text. Thanks again!

5 Likes

I’m glad you found codd useful, and the article is very interesting! Maybe one day codd could support workflows such as yours more easily.

I’d love to have the items you raised about usability reported as issues. Feel free to paste even just a title without a description and I can fill in the details in case you’re too busy. In particular, the faux error message is concerning.

As for managing multiple environments, maybe I’m lacking details about how you currently plan to manage them. In my view, migrations are applied alongside deployments, and since migrations and schema files are included in the repository, you always apply SQL changes that are compatible with the git revision you’re deploying.
That way you always have two folders and no more: one for migrations and another for schema files. In that sense, development really is further ahead than every other environment, but that is not a problem.

Perhaps one complication is if you want to apply inverses: you may not even have applied the troubling migrations in every environment, so you should only ship inverses to where the troubling migrations have been applied, but then you don’t want to ship the migrations and their inverses to environments where neither have been applied. I’m afraid codd could make your life much harder in a case like that.

As far as other concerns regarding multiple environments (none that you raised in particular, so the following might seem silly or useless to you; sorry if that’s the case!), codd can do a form of primitive env var templating to help with statements such as ALTER DATABASE "${PGDATABASE}" ... without being tied to your development environment’s database name, and the same trick can be used for credentials inside migrations.

One other arrangement that works well is to have folders of migrations for specific environments. You could have folders dev-only-migrations and test-only-migrations which e.g. creates the database and sets users up in the former and adds dummy data in the latter (but does not change the schema or else you would need a separate folder for the schema files), and they wouldn’t apply to Production, Staging and other environments. You just need to change env vars accordingly each time.

Hope it helps, but in case it doesn’t or you have more questions, please feel free to ask!

2 Likes