Reversible Migrations in Rails 3.1 (edgerails.info)
53 points by rohitarondekar on May 7, 2011 | 34 comments


Wake me up when they finally generate those stupid migrations automatically (see http://south.aeracode.org/docs/tutorial/part1.html#changing-...).


That probably won't happen (anytime soon at least), because Rails takes the approach of generating your model attributes based on the table fields, not Django's (equally valid) approach of declaring the model attributes on the model and then generating the migrations based on that.

Personally, I prefer the Rails way of doing it, since it keeps my model files cleaner and avoids ambiguity (to me) when adding/removing/changing attributes. Of course, there is a tradeoff in increased startup time because Rails has to analyze your tables, but PassengerPreStart pretty much takes care of that annoyance for me (and I use background processing that doesn't require the Rails environment be instantiated for every job).


I too prefer Rails' approach. However, I disagree that Django's approach is equally valid :-)

I've written about this before, but here's the short version:

1) The database holds the authoritative schema. You shouldn't duplicate it in your code.

2) What if you want to do something more complicated, like split the values in one column into two columns? You're going to need support for manual migrations anyway.

3) Including column declarations in your model forces the database's strong typing system onto what should be plain old Ruby/Python objects. The result is ineffective domain objects.


Just to be contrary:

> 1) The database holds the authoritative schema. You shouldn't duplicate it in your code.

I care about the authoritative data model, which is slightly different from the authoritative schema. Beyond that, I don't edit the database schema. There are many cases where I'm happier to have a complete data model in one location versus (at least) two and needing to mentally parse how rails reacts to the database schema.

> 2) What if you want to do something more complicated, like split the values in one column into two columns? You're going to need support for manual migrations anyway.

In case you didn't notice, the automatic migration happens through a --auto command-line flag. Manual migrations exist, and are the default.

> 3) Including column declarations in your model forces the database's strong typing system onto what should be plain old Ruby/Python objects. The result is ineffective domain objects.

Why should these be plain old Ruby/Python objects? ActiveRecord already is a pretty extensive DSL for relating objects to database tables. Does including attribute descriptions really make it "not a plain old Ruby object" when we already have well-developed support for field validation and relationship declarations?
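
E.g. a typical model already reads like a declaration (illustrative names, not from any particular app):

  class Post < ActiveRecord::Base
    belongs_to :user                   # relationship declaration
    validates_presence_of :title       # field validation
    # no column declarations needed; those live in the schema
  end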

That said, I don't think the auto migrations are a good idea. You're right, eventually you'll want to do a manual migration. At that point it's painfully easy to create a situation where your manual migration depends on properties of your model that have since changed. It's a problem akin to using your AR classes to help manipulate data in a migration. We have a point in our migration history we simply cannot restart from because it uses a model class that has since been removed.


> I care about the authoritative data model

Me too, but the truth about the world is that all inherent complexity is accompanied by incidental complexity :-/ You've got a database; you need to respect its existence.

> mentally parse how rails reacts to the database schema

It's hilariously straightforward in 99% of cases:

For each column, a getter and setter are created. The getter directly calls read_attribute and the setter calls write_attribute. Those two methods do standard string/integer/date/etc primitive conversion. What's there to map in your head?
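
In other words, for a users table with a name column you effectively get something like this (simplified sketch; the real methods are defined via metaprogramming):

  class User < ActiveRecord::Base
    # roughly what gets generated for the "name" column:
    def name
      read_attribute(:name)            # raw DB value, cast to a String
    end

    def name=(value)
      write_attribute(:name, value)
    end
  end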

> Manual migrations exist, and are the default.

Right, that's the point I'm making. They should be the default. I'm also saying that I shouldn't have to write a migration AND maintain some declarations in the Python file.

> it's painfully easy to create a situation where your manual migration depends on properties of your model that have since changed.

Oh, I've made this mistake. Since then, I stopped referencing my models in my migrations. I just run raw SQL (or the shorthand Ruby functions for the most common cases). As a result, my migrations are waaaay faster (batch updates rather than loops, etc) and I've never had this problem since.
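
A typical data migration of mine now looks something like this (hypothetical example; the PostgreSQL split_part call is just for illustration):

  class SplitCustomerName < ActiveRecord::Migration
    def up
      add_column :customers, :first_name, :string
      # one batch UPDATE instead of loading every row through the model
      execute "UPDATE customers SET first_name = split_part(full_name, ' ', 1)"
    end

    def down
      remove_column :customers, :first_name
    end
  end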


FYI, you can re-declare as much of the model class as you need and simply call reset_column_information as a class method before using it. The models at work have some homemade serialized attributes (don't ask), and this makes it easier to deal with that situation.
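
Something like this (table and column names made up):

  class AddStatusToOrders < ActiveRecord::Migration
    # throwaway copy of the model, so the migration never depends on
    # whatever app/models/order.rb looks like in the future
    class Order < ActiveRecord::Base; end

    def up
      add_column :orders, :status, :string
      Order.reset_column_information   # pick up the new column
      Order.update_all(:status => 'pending')
    end

    def down
      remove_column :orders, :status
    end
  end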


> At that point it's painfully easy to create a situation where your manual migration depends on properties of your model that have since changed.

I don't understand the situation you're describing. The migration files, once created, are independent of the model. That means you can re-run the chain at any point in time, regardless of what your current model looks like.

Having an auto-generator for the migration files doesn't change that.


He's making the same mistake I once made: Trying to use his models from the migrations.

I'd really like to see Rails raise a deprecation warning if a model gets dynamically loaded by a migration....


Every Rails developer makes that mistake once (well, hopefully not more than once). What I'd really like to see next is a way to version seeds.rb.


@Jarin I've been thinking about versioning seeds, or at least being able to alter/update seeds in place. Did you have any thoughts on how you would like the versioning to work?


git?


seeds.rb isn't intended to be run more than once on a particular database instance. Running it more than once results in duplicate entries, unless you're explicitly checking for dupes (not always possible).
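
The closest I've come is writing the seeds defensively, e.g. (Role is a made-up model, using the Rails 3 dynamic finders):

  # db/seeds.rb -- safe to re-run
  %w(admin member guest).each do |name|
    Role.find_or_create_by_name(name)
  end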

It looks like seedbank is fairly close to what I want though, as it lets you maintain multiple sets of seeds: https://github.com/james2m/seedbank


> 1) The database holds the authoritative schema. You shouldn't duplicate it in your code.

That's nonsense: your application implies the authoritative schema, as it relies on a given set of model fields existing. More importantly, any given version of your application may imply a different schema.

Artificially detaching the schema from the code only causes problems and adds pointless complexity to the overwhelming majority of applications (which don't share the database with anything else).

Having to reach for irb or psql only to figure out what fields exist on a given model is backwards. Having to manually craft migrations with proper indexes, keys and constraints to match the separately defined validations and associations is completely ass-backwards.


When a migration runs, the schema.rb file is regenerated. This way, each checkin contains a complete snapshot of the schema that goes with that code.

(side note: I think that schema.sql should be the default, not schema.rb)
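
For reference, switching is a one-liner, though the name of the dump file varies by Rails version:

  # config/application.rb
  config.active_record.schema_format = :sql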


Yes, that's what everybody refers to when they quickly need to look up a field.

However, having to look in two places to understand your model is a bit cumbersome, isn't it? Especially when complex associations come into play and you have to resolve the rails identifier magic in your head...

And then there's the question of why you should have to refer to a cached copy of your schema when you could just spell it all out in one place.


If remembering attribute names is your concern, you can use an IDE or vim plugin for autocomplete, or just use annotated_models to add/update the schema down at the bottom of your model files on every db:migrate: https://github.com/openteam/annotated_models
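
For the curious, the annotation it adds looks roughly like this (format from memory; it may differ slightly between versions):

  # == Schema Information
  #
  # Table name: users
  #
  #  id         :integer         not null, primary key
  #  name       :string(255)
  #  created_at :datetime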


You're proposing kludges for a problem that doesn't exist in declarative ORMs and that is far from the most serious issue of AR.

My general point is that AR makes no sense for web applications.


I have to say that I prefer the Django/SQLAlchemy way.

I know that I'm probably duplicating code, but this way I don't have to look at the database to know which properties a model has; I just have to go to the .py file.

Also, I can use migrations to make sure the database is at the right version.


> I don't have to look at the database to know which properties a model has

I just pop open db/schema.rb and then search for where the users table schema is described. This .rb file is automatically generated. In production systems, I actually configure it to use db/schema.sql, which stores the SQL schema instead of the Ruby schema, since my production database has indexes that Rails' migrations can't describe directly.

Or even more directly:

  psql -c '\d users'

Which I typically run from vim:

  :!psql -c '\d users'

And that just prints out the description of the table for me.


I don't know, but I think that looking at models/yourmodel.py is easier. But well, this is me (and also, I came from a Hibernate/Spring background).

And for 'newbies', it's more intuitive to look at a model class than at an SQL schema.


> The database holds the authoritative schema.

Why?

> You shouldn't duplicate it in your code.

If the database does not hold the authoritative schema, you're not duplicating anything.

> What if you want to do something more complicated, like split the values in one column into two columns? You're going to need support for manual migrations anyway.

Of course, but it's not like the generated migrations are special. They're just written for you by the migration tool. One thing I'm sure you'll agree with is that computers are good at grunt, non-creative work. If the computer can write a migration, why not let it do so?

> Including column declarations in your model forces the database's strong typing system onto what should be plain old Ruby/Python objects. The result is ineffective domain objects.

Hogwash.


For one, the database is the only place where the entire schema is maintained. This includes indexes, constraints, relationships, etc. If I need to go add an index to a production system, that is not captured in the code. I should, however, dump the schema to a .sql file and check that in, but that's still separate from the model's .py file, so that file can't be considered authoritative.

More importantly, the database exists without the application. The application does not exist without the database. You might inherit a database/schema from someone else. And once your application grows, the database will surely have additional clients: you might want to access it via SQL directly or from some other language.

> One thing I'm sure you'll agree with is that computers are good at grunt, non-creative work.

It's been proven that you can reliably wrap each row in an object and each column's field with a getter/setter pair. It's easy. Totally non-creative, grunt work. That's what Active Record does.

Migration generation via diffing schemas? That's a much harder, in fact provably intractable problem! There are many ways to get from A to B: a schema diff can't tell a renamed column from a dropped column plus a new one, for instance. If you're going to leave it to a computer to generate either migrations or classes with getters and setters, I'd much prefer the computer do the getters/setters.


> For one, the database is the only place where the entire schema is maintained.

That may be true for rails, but not for most other ORMs that don't follow the AR pattern. Said other ORMs take authority over the schema, which means by default they will sync the database to match your code - although the degree to which that is enforced is usually configurable.

> If I need to go add an index to a production system

Then you will of course either test the change on a staging system first, or quickly backport the change after the hotfix. Whether the backport is done through a rails migration or a django model-change is not relevant to this discussion.

> You might inherit a database/schema from someone else

Red herring. It's a rare case and people have managed this scenario with ORMs other than AR just fine since long before Rails was conceived.

> Migration generation via diffing schemas? That's a much harder, in fact provably intractable problem!

A wide range of tools has a pretty good handle on this problem. (sqldelta.com, postfacto.org, xsql, red-gate.com, etc.)

Ambiguous changes are pretty rare during the evolution of your average webapp schema. Machines can very well solve 99% of cases on their own, and smartly interrogate the user for the rest.


One nice thing about migrations describing your schema is that they can be used to migrate databases from different vendors. But if you rely upon scripts to create and migrate databases, you need to write a script for each one.

And there is no partial rollback. You get to write another script for that or do it manually.

In olden times, I used to have to write these for Oracle and SQL Server, and just keeping them in sync was a challenge. And you had to write a specific script or execute multiple smaller scripts to move from version 1.3 of the schema to version 3.9. The incremental, db-agnostic, reversible approach is much better.

But you really need to make sure the end result is exactly what you expect before executing on production (and back it up first, of course). The db-specific script seems much easier to trust, with no magic happening, especially when it's atomic.


> For one, the database is the only place where the entire schema is maintained. This includes indexes, constraints, relationships, etc.

Not at all.

> If I need to go add an index to a production system, that is not captured in the code.

Depends on the ORM. With non-AR ORMs your index is very much captured in the code, as is the rest of the schema.

> More importantly, the database exists without the application.

1. That is not the case for the vast majority of databases and applications out there, which have a 1:1 coupling.

2. Using an ORM does not preclude (or prevent) correct designs.

> The application does not exist without the database.

The application can very well be database-independent and run on 5 different (and incompatible) databases. If the SQL schema is the canonical truth, you need 5 different schema files to handle all possible deployments of your application. If the ORM type definitions are the canonical truth, one is enough and the ORM takes care of the details for each database.

> And once your application grows, the database will surely have additional clients

Baseless assertion.

> It's been proven that you can reliably wrap each row in an object and each column's field with a getter/setter pair.

So?

> Migration generation via diffing schemas? That's a much harder, in fact provably intractable problem!

Generating all migrations is intractable, generating 95% of migrations is not. Or South would not be able to do it.

> There are many ways to get from A to B.

So what?

> If you're going to leave it to a computer to generate either migrations or classes with getters and setters, I'd much prefer the computer do the getters/setters.

And I'd rather ask more of my tools. Different strokes for different folks.


> More importantly, the database exists without the application.

This. I'm continually amazed how many people think they will only ever have one application talking to their database. Unless your project promptly crashes and burns, you're going to find yourself using tools in different languages, and it's going to be incredibly painful not to have invested in competent data modelling that avoids relying on some specific ORM.


I repeat: Red Herring. Competent data modeling and AR migrations are in no way related.

Moreover, most grown-up applications consist of multiple databases, since it would be unwise to mix e.g. reporting concerns into a webapp database schema.


I would argue that manipulating your database schema is creative work.


Well, yeah I totally agree. I didn't want to start a flame war :)


Mission failed.


Then wake up, my friend:

http://datamapper.org/ https://github.com/datamapper/dm-rails

DataMapper was one of the great parts of the Merb framework, and with Rails 3 you can use it in place of ActiveRecord.


I've been using `:Rinvert` in rails.vim to accomplish the same thing for the last couple of months. Extremely useful.


I like that the author added a section on demystifying the magic involved in how reversible migrations actually work. I like knowing how things actually work behind the scenes. I'd like to see the Rails community put less emphasis on the magic and more on how things work.
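
For anyone who hasn't read the article yet, the headline feature is that simple structural migrations can now be written once in a change method and Rails figures out how to reverse them, roughly like this (sketch from memory):

  class AddPartNumberToProducts < ActiveRecord::Migration
    def change
      # migrating up runs add_column;
      # rolling back, Rails infers the matching remove_column
      add_column :products, :part_number, :string
    end
  end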


About time. This has bugged me for years.



