Speed up your Django App – Part 1 Querysets

So you’ve set up your Django project, divided the code into multiple modules, written a very clean code and proudly hosted your website on a domain. But soon you start seeing 504s in your responses, or perhaps users are complaining of very slow loading time. What’s wrong? Maybe you have overlooked something very important.

I’m a Pythonista and a Django lover. I have set up numerous projects, been a part of multiple teams and mentored quite a few developers. I have noticed that many of us developers care about building and completing a project, without worrying too much about how it behaves, specifically how fast it is. That realisation doesn’t dawn upon a techie until they themselves own a project and are responsible for its behaviour. I like to grow the seed of “Quality first, Deadlines later” in my team members so that when they own a team, they do not sideline this very important aspect of a web app – the Page Load Time.

A page load speed of an application depends on a number of factors. Fortunately, we can control and tweak most of them to behave in the most optimised manner to give our users a smooth and fast experience. In this five-part series, I am going to describe how each of these factors affects the speed and how to use them.

Note: I have used Django’s db.connection for analyzing the queries. For that to work, settings.DEBUG has to be true. In some cases, this does not capture the queries fired (example in multi-DB environments). In those cases, you can use db.connections. Example:

from django.db import connections

>>> connections['master'].queries_log

Part 1 – Optimising Django Querysets

Django ORM is a wonderful mediator that lets your application speak with your database in a language that they understand. It’s like a translator, which when used correctly may do wonders but if used inefficiently can behave like an evil twin. Optimising your querysets is the most essential step in speeding up your app. If your application uses an inefficient method to interact with the database, no amount of Database optimisations can help speed up your queries.

Sample this – You have created a DB index on the email field of the User model. But in query logs, you see that when you search a user by his email, that index is not being used. That is because of the inefficiency of the ORM query that you wrote.

Let’s take a walk through all the ways in which you can dramatically improve how your application interacts with your database.

I’ll be basing the examples assuming the following set of models in app ‘shop’.

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()
    ..

    def __str__(self):
        return self.name

class User(models.Model):
    name = models.CharField(max_length=200)
    email = models.EmailField()

    def __str__(self):
        return self.name

class OrderItem(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    products = models.ManyToManyField(Product)
    ..

    def __str__(self):
        return self.id

Be selective

When working on a queryset, we usually don’t require all the fields of a model. It’s best to supply Django a list of fields we need to work on. If we don’t do that Django creates the query using “Select All” (or “Select *” in case of MySQL).

Example:

>> # query to fetch the list of all products
>> products = Product.objects.all()

>> , 
>> # For a product model with 30 fields. Time taken by query = 335ms 

Actual query: SELECT `shop_product`.`id`, `shop_product`.`created`, shop_product`.`modified`, `shop_product`.`url`, `shop_product`.`title`, `shop_product`.`desc`, `shop_product`.`keywords`, `shop_product`.`heading`, `shop_product`.`name`, `shop_product`.`slug`, ........... `shop_product`.`type_product`, `shop_product`.`icon`, `shop_product`.`image_bg`, `shop_product`.`image`, `shop_product`.`video_url`,`shop_product`.`about`, `shop_product`.`description`,`shop_product`.`price`,`shop_product`.`avg_rating`,`shop_product`.`no_review`, `shop_product`.`buy_count`, `shop_product`.`active` FROM shop_product

Now we don’t need all the 30 fields, that’s unnecessary processing that we can avoid. Django provides us with a selection facility in the name of values() and values_list().

>> # query to fetch the list of ids and names of all products
>> products = Product.objects.all().values('id','name')

>> 
>> # For a product model with 30 fields. Time taken by query = 80ms

Actual query: SELECT `shop_product`.`id`, `shop_product`.`name` FROM shop_product

values() returns a list of hashmaps of the fields requested per item in queryset, and values_list() provides a list of tuples of the same fields.

Creating something new? Go Big!

Often in our projects, we need to create multiple objects basis some information received in bulk, be it a file uploaded or anything else. Basic instinct makes one loop over the information and create the processed object within each iteration. This results in as many queries to the DB as there are iterations.

>> # create 5 user objects using iterative approach
>> from django.db import connection, reset_queries
>>
>> for i in range(5):
>>     User.objects.create(email='test{}@gmail.com'.format(i))
>> len(connection.queries) # Shows the number of queries made to the DB
>> 5
>> reset_queries() # Resets the connection.queries list

In the iterative approach for creating n objects, n queries will be fired and it would result in an exponential delay as the number of objects increase, and the related objects increase.

This can be completely avoided by using bulk_create(), a method that efficiently inserts the objects into the DB, no matter how many of them.

>> # create 5 users using bulk_create
>> len(connection.queries)
>> 0
>> users = [User(email='test{}@gmail.com'.format(i)) for i in range(5,10)]
>> len(connection.queries)
>> 0  # These objects are not created yet, but are in memory
>> User.objects.bulk_create(users)
>> len(connection.queries)  # only 1 query fired
>> 1

Do it once and for all, literally

Like in the case of object creation, the same type of updations to multiple objects should also be done in bulk and not in an iteration. Django ORM offers the update() method for doing exactly that. update() works on a queryset instead of a single object. So we get the benefit of filtering exactly what we wish to update.

>> # fetch objects and update them one at a time
>> reset_queries()
>> for i in range(5):
>>     u = User.objects.get(email='test{}@gmail.com'.format(i))
>>     u.name = 'test_save'
>>     u.save()
>> len(connection.queries)
>> 10

That’s right, 10 queries and not 5, because first an object is fetched from DB using the get() method and then saved using the save() method. Now let’s see the behaviour of update()

>> # update selected objects in one go
>> reset_queries()
>> User.objects.filter(email__in=['test{}@gmail.com'.format(i) for i in range(5)]).update(name='test_save')
>> 5  # It returns the number of objects updated
>> len(connection.queries)
>> 1

Do keep in mind that it has certain limitations:

  1. It can’t be used to update related fields of the object.
  2. It won’t work on a slice of a queryset.
  3. It won’t emit the pre_save and post_save signals nor call any custom save() method on the object.

Know where to search and how to search

We place indexes in our DB to speed up the search. Whether these indexes are used or not depends entirely upon the query we shoot.
Django provides field lookups that are equivalents of LIKE command in MySQL. They are iexact, contains, icontains, startswith, istartswith, endswith and iendswith.

Point to note here is that LIKE filters can only use the characters before the first wild card to trim down the search results using the index. The remaining characters are just filter predicates that do not narrow the scanned index range. So to make our indexes perform better, we should avoid LIKE expressions with leading wildcards (eg. %name% ) i.e use startswith, istartswith and iexact in place of contains, icontains,endswith and iendswith wherever possible. If not completely unavoidable, then you must restrict the number of fields to search on. Chaining multiple icontains together is a recipe for disaster.

>> # fetch order items by searching substring in thier product names
>> reset_queries()
>> OrderItem.objects.filter(product__name__icontains='shoe')
>> connection.queries[0]['time']
>> 333ms
>> 
>> # fetch order items by searching 
>> reset_queries()
>> OrderItem.objects.filter(product__name__istartswith='linkedin')
>> connection.queries[0]['time']
>> 74ms

Figure out what is the use case of the search, what fields do you really need to search on, can you restrict the user to search for the starting value of the fields and not anywhere in the field.

Don’t forget the Joins!

More often than not, we have normalised data with foreign key relationships in our tables. It becomes imperative that we consider the number of joins while formulating a queryset.

>> # Accessing the related field the tradiotional way
>> order_items = OrderItem.objects.filter(id__in=[1,2,3,4,5])
>> len(connection.queries)
>> 1
>> for oi in order_items:
>>     print(order_item.user)  # Hits the DB again
>> ...
>> len(connection.queries)
>> 6  # 1 DB hit per iteration

Alternatively, the select_related() queryset method recursively prepopulates the cache of all one-to-many relationships ahead of time, so that relationship is called, another query is not fired.

>> # Accessing the related field using select_related
>> order_item = OrderItem.objects.filter(id=5).select_related('user')
>> len(connection.queries)
>> 1
>> for oi in order_items:
>>    print(oi.user) # Uses the cached user in memory
>> > len(connection.queries)
>> 1

select_related()‘s cousin is the prefetch_related() method.  It behaves in the same manner i.e caches the related table’s data, only it works for ManytoManyField instead of a ForeignKey.

>> # Accessing the manytomany field the tradiotional way
>> order_items = OrderItem.objects.filter(id__in=[1,2,3,4,5])
>> for oi in order_items:
>>     print(order_item.products.all()) # Hits the DB again
>> ...
>> len(connection.queries)
>> 6 # 1 DB hit per iteration
>>
>> # Accessing the field using prefetch_related
>> order_items = OrderItem.objects.filter(id=5).prefetch_related('product')
>> for oi in order_items:
>>     order_item.product # Uses the cached product in memory
>> 
...
>> len(connection.queries)
>> 2  # One query extra for manytomany model

Note that these methods should not be used, when you need to update any field in the related model within the iteration. The updations will not get reflected in the next iteration because of old cached values.

Extra Tips:

  1. If we only need the count of objects in a queryset, do not count the queryset using python processes, like len() or iteration, instead use the count() queryset method. It directly employs SQL COUNT command.
  2. If you need a specific number of objects, always slice the queryset in order to impose LIMITS on the query instead of the iterative approach.
  3. Avoid processing the queryset in the django templates directly or in template tags. Try to move as many conditions closer to the DB as possible.

Django’s ORM is a convenient tool to ease the user interaction with the DB. However, it will not create the most efficient query unless we direct it to.
Do share if you have any more queryset tips and tricks to speed up your site.

Keep watching this space for Part 2 of this series..
Advertisements

2 thoughts on “Speed up your Django App – Part 1 Querysets

Add yours

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

Powered by WordPress.com.

Up ↑

The Mindful Gardener

The sensory pleasures and earthy delights of gardening.

Weekly Musings

A collection of thought pieces from a writer with many musings.

Vibe a Little

Learn to Live

The Road Till There

Hello everyone!! Welcome to ‘The Road Till There’. The Road Till There will help you find calmness in your daily life. You will find many posts regarding self-development, well-being and career and success in this blog. I started writing on these topics because my mind was filled with so many thoughts and opinions on such topics that I decided to pour out these thoughts in a blog! Nothing would make me happier than my posts helping someone on their road to personal growth, wellness or their career/success!

Pauls Pages Too

Extra Content from PaulsPages.com

Cooking Without Limits

Food Photography & Recipes

Shards of Being Me

"Your imperfections characterise your individuality"

Anna the Fangirl

Basically, a place where I just randomly rant or fangirl over the things I love, including but not limited to, movies, books, celebrities and music

Beauty lies within yourself

The only impossible journey in life is you never begin!! ~Tanvir Kaur

UsingHumorAsADeflectiveMechanism

it seemed like a good idea at the time.

Rowperspective

⚘Not Just Another Lifestyle Blogger⚘

%d bloggers like this: