Groups of groups in SQL (nested GROUP BY)

We’ll show why and how to get what we call “groups of groups”, or aggregations of aggregated data. It means using nested GROUP BY queries.

Suppose that a website counts the views it gets for each page. As you know, we can use a GROUP BY to know how many views each page had this year.

A more advanced use of GROUP BY consists in visualising how many pages had 0-200 views, how many has 200-500 views, and so on.

The best way of explaining the concept is using an example. Let’s proceed by steps (or skip to the last query if you’re experienced enough with SQL).

Pages aggregates by this year’s views:

SELECT page_title, COUNT(*)
    FROM page_view
    WHERE year = YEAR()
    GROUP BY page_title
;

Count of pages for each “views category”:

SELECT view_count, COUNT(*) AS pages_count
    FROM (
        SELECT page_title, COUNT(*) AS view_count
            FROM page_view
            WHERE year = YEAR()
            GROUP BY page_title
    ) AS yv
    GROUP BY view_count
;

To make the query less slow, make sure to place the WHERE clause in the inner query.

But this is not yet what we want. This query aggregates the pages by the exact number of views. To aggregate by ranges of views:

SELECT
    CASE
            WHEN view_count < 200  THEN '< 200'
            WHEN view_count < 500  THEN '< 500'
            WHEN view_count < 2000 THEN '< 2000'
            ...
            WHEN view_count < 1000000000 THEN '< 1B'
            ELSE 'More than 1B'
        END AS view_count_band,
        COUNT(*) AS pages_in_band
    FROM (
        SELECT page_title, COUNT(*) AS view_count
            FROM page_view
            WHERE year = YEAR()
            GROUP BY page_title
    ) AS yv
    GROUP BY visit_count
;

Reference

Related articles: