We’ll show why and how to get what we call “groups of groups”, or aggregations of aggregated data. It means using nested GROUP BY
queries.
Suppose that a website counts the views it gets for each page. As you know, we can use a GROUP BY
to know how many views each page had this year.
A more advanced use of GROUP BY
consists in visualising how many pages had 0-200 views, how many has 200-500 views, and so on.
The best way of explaining the concept is using an example. Let’s proceed by steps (or skip to the last query if you’re experienced enough with SQL).
Pages aggregates by this year’s views:
SELECT page_title, COUNT(*)
FROM page_view
WHERE year = YEAR()
GROUP BY page_title
;
Count of pages for each “views category”:
SELECT view_count, COUNT(*) AS pages_count
FROM (
SELECT page_title, COUNT(*) AS view_count
FROM page_view
WHERE year = YEAR()
GROUP BY page_title
) AS yv
GROUP BY view_count
;
To make the query less slow, make sure to place the WHERE
clause in the inner query.
But this is not yet what we want. This query aggregates the pages by the exact number of views. To aggregate by ranges of views:
SELECT
CASE
WHEN view_count < 200 THEN '< 200'
WHEN view_count < 500 THEN '< 500'
WHEN view_count < 2000 THEN '< 2000'
...
WHEN view_count < 1000000000 THEN '< 1B'
ELSE 'More than 1B'
END AS view_count_band,
COUNT(*) AS pages_in_band
FROM (
SELECT page_title, COUNT(*) AS view_count
FROM page_view
WHERE year = YEAR()
GROUP BY page_title
) AS yv
GROUP BY visit_count
;
Reference
Related articles: