Data migrations are a convenient way to change the data in the database in conjunction with changes in the schema. They work like regular schema migrations: Django keeps track of dependencies, the order of execution, and whether the application has already applied a given data migration.
A common use case for data migrations is introducing new fields that are non-nullable. Another is adding a field that stores a cached count of something: we create the new field and then populate it with the initial count.
In this post we are going to explore a simple example that you can very easily extend and modify for your needs.
Data Migrations
Let’s suppose we have an app named blog, which is installed in our project’s INSTALLED_APPS.
The blog app defines a Post model in blog/models.py, with title, date and content fields.
The application is already using this Post model; it’s in production and there is plenty of data stored in the database:
id | title | date | content |
---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] |
Now let’s say we want to introduce a new field named slug which will be used to compose the new URLs of the blog. The slug field must be unique and not null.
Generally speaking, always add new fields either with null=True or with a default value. If the default parameter can’t solve the problem, first create the field with null=True, then create a data migration to populate it. After that, create a new migration to set the field to null=False.
Here is how we can do it. First, add the slug field to the Post model in blog/models.py with null=True.
Create the migration by running python manage.py makemigrations blog.
Then apply it with python manage.py migrate blog.
At this point, the database already has the slug column:
id | title | date | content | slug |
---|---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | (null) |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | (null) |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | (null) |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | (null) |
Create an empty migration by running python manage.py makemigrations blog --empty.
Now open the generated file, blog/migrations/0003_auto_20170926_1105.py. It should contain an empty Migration class: its dependencies point to the previous migration of the blog app, and its operations list is empty.
In this file, we can write a function and register it in the operations list using the RunPython operation.
The function relies on Django’s slugify utility function (django.utils.text.slugify). It takes a string as a parameter and transforms it into a slug; for example, “How to Render Django Form Manually” becomes how-to-render-django-form-manually.
The function passed to the RunPython operation must accept two parameters: apps and schema_editor. RunPython supplies both when the migration runs. Also remember to load models with apps.get_model('app_name', 'model_name') rather than importing them directly, so that the migration works with the historical version of the model.
Save the file and execute the migration as you would a regular model migration, with python manage.py migrate blog.
Now if we check the database:
id | title | date | content | slug |
---|---|---|---|---|
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | how-to-render-django-form-manually |
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | how-to-use-celery-and-rabbitmq-with-django |
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | how-to-setup-amazon-s3-in-a-django-project |
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | how-to-configure-mailgun-to-send-emails-in-a-django-project |
Every Post entry has a value, so we can safely switch from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.
Change the slug field in blog/models.py accordingly.
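Create a new migration with python manage.py makemigrations blog.
This time Django will prompt you, because a nullable field is being changed to non-nullable without a default.
Select option 2 (ignore for now, since the data migration has already filled in the existing rows) by typing “2” in the terminal.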
Now we can safely apply the migration with python manage.py migrate blog.
Conclusions
Data migrations can be tricky. When creating a data migration for your project, always examine the production data first. The implementation of slugify_title I used in the example is a little naïve, because it could generate duplicate slugs on a large dataset: two titles that differ only in case or punctuation produce the same slug. Always test data migrations in a staging environment first, to avoid breaking things in production.
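One way to guard against duplicates is to append a numeric suffix to repeated slugs. The sketch below is not part of the original example: unique_slugs and simple_slugify are hypothetical helpers, and simple_slugify is a simplified, ASCII-only stand-in for Django’s slugify so that the snippet runs without Django. In a real data migration you would use django.utils.text.slugify and assign the results to the Post rows.

```python
import re


def simple_slugify(value):
    """Simplified stand-in for django.utils.text.slugify (ASCII input only)."""
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def unique_slugs(titles):
    """Return one slug per title, adding -2, -3, ... to repeated slugs."""
    seen = set()
    result = []
    for title in titles:
        base = simple_slugify(title)
        slug = base
        counter = 2
        # Keep trying suffixes until the slug has not been used yet.
        while slug in seen:
            slug = f"{base}-{counter}"
            counter += 1
        seen.add(slug)
        result.append(slug)
    return result


print(unique_slugs(["Hello World", "Hello, World!", "hello world"]))
# ['hello-world', 'hello-world-2', 'hello-world-3']
```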
It’s also important to work step by step, so you stay in control of the changes you are introducing. Note that I created three migration files for this simple data migration.
As you can see, it’s fairly easy to create this type of migration. It’s also very flexible: you could, for example, load an external text file and insert its data into a new column.
The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example