We’ll show why and how to get what we call “groups of groups”, or aggregations of aggregated data. It means using nested
GROUP BY queries.
Suppose that a website counts the views it gets for each page. As you know, we can use a
GROUP BY to know how many views each page had this year.
A more advanced use of
GROUP BY consists in visualising how many pages had 0-200 views, how many has 200-500 views, and so on.
The best way of explaining the concept is using an example. Let’s proceed by steps (or skip to the last query if you’re experienced enough with SQL).
Pages aggregates by this year’s views:
SELECT page_title, COUNT(*) FROM page_view WHERE year = YEAR() GROUP BY page_title ;
Count of pages for each “views category”:
SELECT view_count, COUNT(*) AS pages_count FROM ( SELECT page_title, COUNT(*) AS view_count FROM page_view WHERE year = YEAR() GROUP BY page_title ) AS yv GROUP BY view_count ;
To make the query less slow, make sure to place the
WHERE clause in the inner query.
But this is not yet what we want. This query aggregates the pages by the exact number of views. To aggregate by ranges of views:
SELECT CASE WHEN view_count < 200 THEN '< 200' WHEN view_count < 500 THEN '< 500' WHEN view_count < 2000 THEN '< 2000' ... WHEN view_count < 1000000000 THEN '< 1B' ELSE 'More than 1B' END AS view_count_band, COUNT(*) AS pages_in_band FROM ( SELECT page_title, COUNT(*) AS view_count FROM page_view WHERE year = YEAR() GROUP BY page_title ) AS yv GROUP BY visit_count ;