Brainstorm's snippets (1/247)

  Understand Group by in Django with SQL

Since I never really managed to think about aggregations in term of ORM rather than SQL, I'm practicing and summarizing here the examples described in the interesting article Understand Group by in Django with SQL by Haki Benita, where QuerySets and SQL are put side by side.

How to Use Aggregate Function

sql:

SELECT
    COUNT(id) as total
FROM
    auth_user;

python:

from django.db.models import Count

queryset = (User.objects
    .aggregate(
        total=Count('id'),
    )
)

The aggregate() function accepts an expression to count.

In this case, we used the name of the primary key column id to count all the rows in the table.

The result of aggregate is a dict, and the name of the argument to aggregate becomes the name of the key:

{"total": 891}

Annotations

Annotates each object in the QuerySet with the provided list of query expressions.

An expression may be

  • a simple value
  • a reference to a field on the model (or any related models)
  • or an aggregate expression (averages, sums, etc.) that has been computed over the objects that are related to the objects in the QuerySet.

Each argument to annotate() is an annotation that will be added to each object in the QuerySet that is returned.

See:

https://docs.djangoproject.com/en/2.0/ref/models/querysets/#annotate

How to Group By

We usually want to apply the aggregation on groups of rows:

sql:

SELECT
    is_active,
    COUNT(id) AS total
FROM
    auth_user
GROUP BY
    is_active

python:

queryset = (User.objects
    .values(
        'is_active',
    )
    .annotate(
        total=Count('id'),
    )
    .order_by()
)

To produce a GROUP BY we use a combination of values and annotate:

- values('is_active'): what to group by
- annotate(total=Count('id')): what to aggregate

The order is important: failing to call values before annotate will not produce aggregate results.

Interaction with default ordering or order_by()

When not requiring a specific ordering, we always add an empty ordering list (.order_by()) to prevent possible interactions from a model’s Meta.ordering

Fields that are mentioned in the order_by() part of a queryset (or which are
used in the default ordering on a model) are used when selecting the output
data, even if they are not otherwise specified in the values() call.

These extra fields are used to group “like” results together and they can make
otherwise identical result rows appear to be separate.

This shows up, particularly, when counting things.


Notes:

- Deprecated since version 2.2
- Starting in Django 3.1, the ordering from a model’s Meta.ordering won’t be
  used in GROUP BY queries, such as .annotate().values()
- Since Django 2.2, these queries issue a deprecation warning indicating to
  add an explicit order_by() to the queryset to silence the warning

See:

  https://docs.djangoproject.com/en/3.0/topics/db/aggregation/#interaction-with-default-ordering-or-order-by

How to Filter and Sort a QuerySet With Group By

You can use filter() ans order_by() anywhere in the query:

sql:

SELECT
    is_active,
    COUNT(id) AS total
FROM
    auth_user
WHERE
    is_staff = True
GROUP BY
    is_active
ORDER BY
    is_active,
    total

python:

queryset = (User.objects
    .values(
        'is_active',
    )
    .filter(
        is_staff=True,
    )
    .annotate(
        total=Count('id'),
    )
    .order_by(
        'is_staff',
        'total',
    )
)

Notice that you can sort by both the GROUP BY key and the aggregate field.

How to Combine Multiple Aggregations

To produce multiple aggregations on the same group, add multiple annotations:

sql:

SELECT
    is_active,
    COUNT(id) AS total,
    MAX(date_joined) AS last_joined
FROM
    auth_user
GROUP BY
    is_active

python:

from django.db.models import Max

queryset = (User.objects
    .values(
        'is_active',
    )
    .annotate(
        total=Count('id'),
        last_joined=Max('date_joined'),
    )
    .order_by()
)

How to Group by Multiple Fields

To group by multiple fields, list these fields in values()

sql:

SELECT
    is_active,
    is_staff,
    COUNT(id) AS total
FROM
    auth_user
GROUP BY
    is_active,
    is_staff

python:

queryset = (User.objects
    .values(
        'is_active',
        'is_staff',
    )
    .annotate(
        total=Count('id'),
    )
    .order_by()
)