How to Create Django Data Migrations

Data migrations are a convenient way to change the data in the database alongside changes to the schema. They work like regular schema migrations: Django keeps track of dependencies, the order of execution, and whether a given data migration has already been applied.

A common use case for data migrations is introducing a new non-nullable field. Another is adding a field that stores a cached count of something: we create the field, then populate it with the initial count.

In this post we are going to explore a simple example that you can very easily extend and modify for your needs.

Data Migrations

Let’s suppose we have an app named blog, which is installed in our project’s INSTALLED_APPS.

The blog app has the following model definition:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()

    def __str__(self):
        return self.title

The application is already using this Post model; it’s in production and there is plenty of data stored in the database.

id	title	date	content
1	How to Render Django Form Manually	2017-09-26 11:01:20.547000	[…]
2	How to Use Celery and RabbitMQ with Django	2017-09-26 11:01:39.251000	[…]
3	How to Setup Amazon S3 in a Django Project	2017-09-26 11:01:49.669000	[…]
4	How to Configure Mailgun To Send Emails in a Django Project	2017-09-26 11:02:00.131000	[…]

Now let’s say we want to introduce a new field named slug which will be used to compose the new URLs of the blog. The slug field must be unique and not null.

Generally speaking, always add new fields either with null=True or with a default value. If we can’t solve the problem with the default parameter, first create the field with null=True, then create a data migration to populate it. After that, we can create a new migration to set the field to null=False.
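To see why the nullable-first order matters, here is a minimal sketch of the same sequence at the database level, using Python's built-in sqlite3 module and an in-memory database. The table name, the sample rows and the crude slug expression are assumptions made for illustration; this is not the SQL Django generates, which differs by database backend.

```python
import sqlite3

# In-memory database with a table that already holds rows (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO blog_post (title) VALUES (?)",
    [("First post",), ("Second post",)],
)

# Adding a NOT NULL column with no default is rejected: the existing rows
# would have nothing to put in it.
try:
    conn.execute("ALTER TABLE blog_post ADD COLUMN slug TEXT NOT NULL")
    not_null_failed = False
except sqlite3.Error:
    not_null_failed = True

# Step 1: add the column as nullable instead.
conn.execute("ALTER TABLE blog_post ADD COLUMN slug TEXT")

# Step 2: backfill every existing row (a crude stand-in for real slug logic).
for post_id, title in conn.execute("SELECT id, title FROM blog_post").fetchall():
    conn.execute(
        "UPDATE blog_post SET slug = ? WHERE id = ?",
        (title.lower().replace(" ", "-"), post_id),
    )

# Step 3: only now is it safe to tighten the constraints. Here we check that
# no NULLs remain and add a unique index.
null_count = conn.execute(
    "SELECT COUNT(*) FROM blog_post WHERE slug IS NULL"
).fetchone()[0]
conn.execute("CREATE UNIQUE INDEX blog_post_slug_uniq ON blog_post (slug)")

slugs = [row[0] for row in conn.execute("SELECT slug FROM blog_post ORDER BY id")]
```

With Django, each of these three steps becomes its own migration, and the framework takes care of the backend-specific SQL.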

Here is how we can do it:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=True)

    def __str__(self):
        return self.title

Create the migration:

python manage.py makemigrations blog

Migrations for 'blog':
  blog/migrations/0002_post_slug.py
    - Add field slug to post

Apply it:

python manage.py migrate blog

Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0002_post_slug... OK

At this point, the database already has the slug column.

id	title	date	content	slug
1	How to Render Django Form Manually	2017-09-26 11:01:20.547000	[…]	(null)
2	How to Use Celery and RabbitMQ with Django	2017-09-26 11:01:39.251000	[…]	(null)
3	How to Setup Amazon S3 in a Django Project	2017-09-26 11:01:49.669000	[…]	(null)
4	How to Configure Mailgun To Send Emails in a Django Project	2017-09-26 11:02:00.131000	[…]	(null)

Create an empty migration with the following command:

python manage.py makemigrations blog --empty

Migrations for 'blog':
  blog/migrations/0003_auto_20170926_1105.py

Now open the file 0003_auto_20170926_1105.py, and it should have the following contents:

blog/migrations/0003_auto_20170926_1105.py

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
    ]

In this file, we can now create a function to be executed by the RunPython operation:

blog/migrations/0003_auto_20170926_1105.py

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations
from django.utils.text import slugify


def slugify_title(apps, schema_editor):
    '''
    We can't import the Post model directly as it may be a newer
    version than this migration expects. We use the historical version.
    '''
    Post = apps.get_model('blog', 'Post')
    for post in Post.objects.all():
        post.slug = slugify(post.title)
        post.save()


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
        migrations.RunPython(slugify_title),
    ]

In the example above we are using the slugify utility function. It takes a string as a parameter and transforms it into a slug. Some examples:

from django.utils.text import slugify

slugify('Hello, World!')
'hello-world'

slugify('How to Extend the Django User Model')
'how-to-extend-the-django-user-model'

The function passed to the RunPython operation expects two parameters: apps and schema_editor. RunPython supplies both when the migration runs. Also remember to load models with apps.get_model('app_name', 'model_name') rather than importing them directly.

Save the file and execute the migration as you would do with a regular model migration:

python manage.py migrate blog

Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0003_auto_20170926_1105... OK

Now if we check the database:

id	title	date	content	slug
1	How to Render Django Form Manually	2017-09-26 11:01:20.547000	[…]	how-to-render-django-form-manually
2	How to Use Celery and RabbitMQ with Django	2017-09-26 11:01:39.251000	[…]	how-to-use-celery-and-rabbitmq-with-django
3	How to Setup Amazon S3 in a Django Project	2017-09-26 11:01:49.669000	[…]	how-to-setup-amazon-s3-in-a-django-project
4	How to Configure Mailgun To Send Emails in a Django Project	2017-09-26 11:02:00.131000	[…]	how-to-configure-mailgun-to-send-emails-in-a-django-project

Every Post entry has a value, so we can safely switch from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.

Change the model:

blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=False, unique=True)

    def __str__(self):
        return self.title

Create a new migration:

python manage.py makemigrations blog

This time you will see the following prompt:

You are trying to change the nullable field 'slug' on post to non-nullable without a default; we can't do that
(the database needs something to populate existing rows).
Please select a fix:
 1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
 2) Ignore for now, and let me handle existing rows with NULL myself (e.g. because you added a RunPython or RunSQL
 operation to handle NULL values in a previous data migration)
 3) Quit, and let me add a default in models.py
Select an option:

Select option 2 by typing “2” in the terminal.

Migrations for 'blog':
  blog/migrations/0004_auto_20170926_1422.py
    - Alter field slug on post

Now we can safely apply the migration:

python manage.py migrate blog

Operations to perform:
  Apply all migrations: blog
Running migrations:
  Applying blog.0004_auto_20170926_1422... OK

Conclusions

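Data migrations are tricky sometimes. When creating a data migration for your project, always examine the production data first. The implementation of slugify_title I used in the example is a little naïve, because it could generate duplicate slugs on a large dataset, and the unique=True migration would then fail. It also ignores the SlugField default max_length of 50, which the longest slug in the example exceeds; SQLite does not enforce that limit, but databases such as PostgreSQL do. Always test data migrations in a staging environment first, to avoid breaking things in production.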
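One possible way to guard against duplicates is to append a numeric suffix whenever a slug has already been used. The sketch below is plain Python so it can run on its own: simple_slugify is a simplified stand-in written for this example (it is not Django's slugify and skips Unicode normalization), and unique_slugs and the "post" fallback are hypothetical names and choices, not part of the original example.

```python
import re


def simple_slugify(value):
    # Simplified stand-in for django.utils.text.slugify (an assumption for
    # this sketch): lowercase, drop punctuation, collapse spaces and hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def unique_slugs(titles):
    # Return one slug per title, adding -2, -3, ... when a slug repeats.
    seen = set()
    result = []
    for title in titles:
        base = simple_slugify(title) or "post"  # fallback for empty slugs
        slug = base
        suffix = 2
        while slug in seen:
            slug = f"{base}-{suffix}"
            suffix += 1
        seen.add(slug)
        result.append(slug)
    return result


slugs = unique_slugs(["Hello, World!", "Hello World", "hello-world"])
# slugs == ['hello-world', 'hello-world-2', 'hello-world-3']
```

Inside the migration function, the same idea would apply while looping over Post.objects.all(): keep a set of slugs already assigned and pick the next free suffix before saving each post.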

It’s also important to do it step by step, so you can feel in control of the changes you are introducing. Note that here I created three migration files for a simple data migration.

As you can see, it’s fairly easy to create this type of migration. It’s also very flexible. You could, for example, load an external text file and insert its data into a new column.
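As a rough sketch of the external-file idea, the helper below reads a CSV file that maps post ids to slugs. The file layout (an id column and a slug column), the function name load_slug_overrides and the sample values are all assumptions made for illustration.

```python
import csv
import io


def load_slug_overrides(text_file):
    # Build a {post id: slug} mapping from a CSV file object that has
    # 'id' and 'slug' header columns (a hypothetical layout).
    overrides = {}
    for row in csv.DictReader(text_file):
        overrides[int(row["id"])] = row["slug"].strip()
    return overrides


# An in-memory file stands in for a real file opened with open(...).
sample = io.StringIO("id,slug\n1,render-django-form\n3,amazon-s3-setup\n")
overrides = load_slug_overrides(sample)
# overrides == {1: 'render-django-form', 3: 'amazon-s3-setup'}
```

Inside a RunPython function, each entry could then be applied with something like Post.objects.filter(pk=post_id).update(slug=slug), using the Post model obtained from apps.get_model.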

The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example