Understand Group by in Django with SQL ¶
Since I never really managed to think about aggregations in term of ORM rather than SQL, I'm practicing and summarizing here the examples described in the interesting article Understand Group by in Django with SQL by Haki Benita, where QuerySets and SQL are put side by side.
Contents
How to Use Aggregate Function
sql:
SELECT COUNT(id) as total FROM auth_user;
python:
from django.db.models import Count queryset = (User.objects .aggregate( total=Count('id'), ) )
The aggregate() function accepts an expression to count.
In this case, we used the name of the primary key column id to count all the rows in the table.
The result of aggregate is a dict, and the name of the argument to aggregate becomes the name of the key:
{"total": 891}
Annotations
Annotates each object in the QuerySet with the provided list of query expressions.
An expression may be
- a simple value
- a reference to a field on the model (or any related models)
- or an aggregate expression (averages, sums, etc.) that has been computed over the objects that are related to the objects in the QuerySet.
Each argument to annotate() is an annotation that will be added to each object in the QuerySet that is returned.
See:
https://docs.djangoproject.com/en/2.0/ref/models/querysets/#annotate
How to Group By
We usually want to apply the aggregation on groups of rows:
sql:
SELECT is_active, COUNT(id) AS total FROM auth_user GROUP BY is_active
python:
queryset = (User.objects .values( 'is_active', ) .annotate( total=Count('id'), ) .order_by() )
To produce a GROUP BY we use a combination of values and annotate:
- values('is_active'): what to group by - annotate(total=Count('id')): what to aggregate
The order is important: failing to call values before annotate will not produce aggregate results.
Interaction with default ordering or order_by()
When not requiring a specific ordering, we always add an empty ordering list (.order_by()) to prevent possible interactions from a model’s Meta.ordering
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate. This shows up, particularly, when counting things. Notes: - Deprecated since version 2.2 - Starting in Django 3.1, the ordering from a model’s Meta.ordering won’t be used in GROUP BY queries, such as .annotate().values() - Since Django 2.2, these queries issue a deprecation warning indicating to add an explicit order_by() to the queryset to silence the warning See: https://docs.djangoproject.com/en/3.0/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
How to Filter and Sort a QuerySet With Group By
You can use filter() ans order_by() anywhere in the query:
sql:
SELECT is_active, COUNT(id) AS total FROM auth_user WHERE is_staff = True GROUP BY is_active ORDER BY is_active, total
python:
queryset = (User.objects .values( 'is_active', ) .filter( is_staff=True, ) .annotate( total=Count('id'), ) .order_by( 'is_staff', 'total', ) )
Notice that you can sort by both the GROUP BY key and the aggregate field.
How to Combine Multiple Aggregations
To produce multiple aggregations on the same group, add multiple annotations:
sql:
SELECT is_active, COUNT(id) AS total, MAX(date_joined) AS last_joined FROM auth_user GROUP BY is_active
python:
from django.db.models import Max queryset = (User.objects .values( 'is_active', ) .annotate( total=Count('id'), last_joined=Max('date_joined'), ) .order_by() )
How to Group by Multiple Fields
To group by multiple fields, list these fields in values()
sql:
SELECT is_active, is_staff, COUNT(id) AS total FROM auth_user GROUP BY is_active, is_staff
python:
queryset = (User.objects .values( 'is_active', 'is_staff', ) .annotate( total=Count('id'), ) .order_by() )