Data migrations are a convenient way to change the data in the database in conjunction with changes to the schema. They work like regular schema migrations: Django keeps track of dependencies, the order of execution, and whether a given data migration has already been applied.
A common use case for data migrations is introducing a new non-nullable field. Another is adding a field that stores a cached count of something: we create the field and then populate it with the initial count.
In this post we are going to explore a simple example that you can very easily extend and modify for your needs.
Data Migrations
Let’s suppose we have an app named blog, which is listed in our project’s INSTALLED_APPS.
The blog app has the following model definition:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()

    def __str__(self):
        return self.title
The application is already using this Post model; it’s already in production and there is plenty of data stored in the database.
id | title | date | content |
---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] |
Now let’s say we want to introduce a new field named slug which will be used to compose the new URLs of the blog. The slug field must be unique and not null.
Generally speaking, always add new fields either as null=True or with a default value. If we can’t solve the problem with the default parameter, first create the field as null=True, then create a data migration to populate it. After that, we can create a new migration to set the field to null=False.
Here is how we can do it:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=True)

    def __str__(self):
        return self.title
Create the migration:
python manage.py makemigrations blog
Migrations for 'blog':
blog/migrations/0002_post_slug.py
- Add field slug to post
Apply it:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0002_post_slug... OK
At this point, the database already has the slug column.
id | title | date | content | slug |
---|---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | (null) |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | (null) |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | (null) |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | (null) |
Create an empty migration with the following command:
python manage.py makemigrations blog --empty
Migrations for 'blog':
blog/migrations/0003_auto_20170926_1105.py
Now open the file 0003_auto_20170926_1105.py, and it should have the following contents:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
    ]
Then, in this file, we can create a function to be executed by the RunPython operation:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations
from django.utils.text import slugify


def slugify_title(apps, schema_editor):
    '''
    We can't import the Post model directly as it may be a newer
    version than this migration expects. We use the historical version.
    '''
    Post = apps.get_model('blog', 'Post')
    for post in Post.objects.all():
        post.slug = slugify(post.title)
        post.save()


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
        migrations.RunPython(slugify_title),
    ]
In the example above we are using the slugify utility function. It takes a string as a parameter and transforms it into a slug. Here are some examples:
from django.utils.text import slugify
slugify('Hello, World!')
'hello-world'
slugify('How to Extend the Django User Model')
'how-to-extend-the-django-user-model'
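To make the transformation more concrete, here is a rough sketch of the kind of steps slugify performs on plain text. It is an approximation written for illustration, not Django's actual implementation; the name simple_slugify is hypothetical, and real projects should keep using django.utils.text.slugify.

```python
import re
import unicodedata


def simple_slugify(value):
    """Approximate slug: ASCII-only, lowercase, words joined by hyphens.

    Hypothetical stand-in for illustration, not Django's slugify.
    """
    # Decompose accented characters and drop anything that is not ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(simple_slugify('Hello, World!'))
print(simple_slugify('How to Extend the Django User Model'))
```

For the two strings used in the examples above, this sketch produces the same slugs as shown there.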
The function passed to the RunPython operation expects two parameters: apps and schema_editor. RunPython supplies both when the migration runs. Also remember to load models with the apps.get_model('app_name', 'model_name') method rather than importing them directly.
Save the file and execute the migration as you would do with a regular model migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0003_auto_20170926_1105... OK
Now if we check the database:
id | title | date | content | slug |
---|---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | how-to-render-django-form-manually |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | how-to-use-celery-and-rabbitmq-with-django |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | how-to-setup-amazon-s3-in-a-django-project |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | how-to-configure-mailgun-to-send-emails-in-a-django-project |
Every Post entry now has a value, so we can safely switch from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.
Change the model:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=False, unique=True)

    def __str__(self):
        return self.title
Create a new migration:
python manage.py makemigrations blog
This time you will see the following prompt:
You are trying to change the nullable field 'slug' on post to non-nullable without a default; we can't do that
(the database needs something to populate existing rows).
Please select a fix:
1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
2) Ignore for now, and let me handle existing rows with NULL myself (e.g. because you added a RunPython or RunSQL
operation to handle NULL values in a previous data migration)
3) Quit, and let me add a default in models.py
Select an option:
Select option 2 by typing “2” in the terminal.
Migrations for 'blog':
blog/migrations/0004_auto_20170926_1422.py
- Alter field slug on post
Now we can safely apply the migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0004_auto_20170926_1422... OK
Conclusions
Data migrations can be tricky. When creating a data migration for your project, always examine the production data first. The implementation of slugify_title I used in the example is a little naïve, because it could generate duplicate slugs for a large dataset, which would make the later unique=True migration fail. Always test data migrations in a staging environment first, to avoid breaking things in production.
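One way to guard against duplicates is to compute the slugs up front and append a numeric suffix to any repeat. The helper below is a minimal sketch under that assumption; make_unique_slugs is a hypothetical name, and the -2, -3 suffix scheme is just one possible convention. It does not check the length limit of the slug field.

```python
def make_unique_slugs(base_slugs):
    """Return slugs in the same order, suffixing repeats with -2, -3, ...

    Hypothetical helper for illustration; not part of Django.
    """
    used = set()
    result = []
    for base in base_slugs:
        candidate = base
        counter = 2
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            candidate = '{}-{}'.format(base, counter)
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique_slugs(['hello-world', 'django-tips', 'hello-world']))
# prints ['hello-world', 'django-tips', 'hello-world-2']
```

Inside slugify_title, the base slugs would come from slugify(post.title), and each post would receive the corresponding entry from the returned list before being saved.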
It’s also important to do it step-by-step, so you can feel in control of the changes you are introducing. Note that here I created three migration files for a simple data migration.
As you can see, it’s fairly easy to create this type of migration, and it’s also very flexible. You could, for example, load an external text file and insert its data into a new column.
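As a rough illustration of the external-file idea, the sketch below parses a text file of id,slug rows into a dictionary. The file layout and the name read_slug_map are assumptions made for this example; they are not part of the project described in this post.

```python
import csv
import io


def read_slug_map(text_file):
    """Parse 'id,slug' rows from a file-like object into {id: slug}.

    Hypothetical helper; the two-column layout is an assumption.
    """
    slug_map = {}
    for row in csv.reader(text_file):
        if not row:
            continue  # skip blank lines
        post_id, slug = row[0].strip(), row[1].strip()
        slug_map[int(post_id)] = slug
    return slug_map


# An in-memory file stands in for the external text file.
sample = io.StringIO('1,render-django-forms\n2,celery-and-rabbitmq\n')
print(read_slug_map(sample))
# prints {1: 'render-django-forms', 2: 'celery-and-rabbitmq'}
```

In a RunPython function, the resulting dictionary could then be applied with something like Post.objects.filter(pk=post_id).update(slug=slug) for each entry, with Post obtained through apps.get_model.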
The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example