In statistics, the mode is the function that returns the element that appear most often in a series. This is what people try to achieve when they run MAX(COUNT(*))… and they find out that it doesn’t work. Let’s see how to obtain the mode in SQL.
Let’s use an example, as usual. Suppose we need to find which department in a company has most employees.
SELECT department_id, COUNT(*) AS how_many FROM employee GROUP BY department_id ORDER BY how_many DESC LIMIT 1;
What we’re doing here is to get the unique least of departments, get the number of employees for each of them, sort the rows by the (ascending) number of employees, and only take the first of them.
However, several departments could have the same number of employees, but the above query can only return one row. If we want to get all the department with the highest number of employees, we can use this query:
SELECT department_id, COUNT(*) AS how_many FROM employee GROUP BY department_id ORDER BY how_many DESC FETCH FIRST 1 ROWS WITH TIES;
FETCH FIRST n ROWS WITH TIES syntax means that we surely want the first
n rows, but the following rows have the same values for the
ORDER BY columns (in this case, the same
COUNT(*)) they need to be returned as well.
Support for WITH TIES
However, not all DBMSs support this syntax. It is supported by:
- SQL Server
It is not supported by:
On DBMSs that don’t support
WITH TIES, you can look at alternative methods, which include subqueries and window functions.
Get all the modes using a subquery
SELECT department_id, COUNT(*) AS how_many FROM employee GROUP BY department_id HAVING how_many = ( SELECT COUNT(*) AS how_many FROM employee GROUP BY department_id ORDER BY how_many DESC LIMIT 1 );
- Use the first query in this article to get the highest number of employees in a department (subquery in
- Count the number of employees per department again.
- From this statistics, we only take the rows where the number of employees is equal to the highest number of employees.
Get all the modes with the window function method
While the above method works, the syntax is a bit too verbose and it can be slow. We can use a window function instead.
SELECT department_id, how_many AS how_many FROM ( SELECT department_id, COUNT(*) AS how_many, RANK() OVER (ORDER BY COUNT(*) DESC) AS department_rank FROM employee GROUP BY department_id ) s WHERE department_rank = 1;
What we do is:
- Group employees by department, count the employees for each department.
- Assign a rank to each row, starting from the highest count (
ORDER BY COUNT(*) DESC).
- Note that rows where
COUNT(*)is the same, the rank will also be the same.
- Note that rows where
- From the results of this subquery, select the rows with the lower rank (highest count).
Why MAX(COUNT(*)) won’t work
SELECT MAX(COUNT(*)) FROM table GROUP BY column won’t work on any DBMS.
The reason is that aggregate functions (like
COUNT()) accept any number of rows as input and return one value as output. Input rows are each group’s rows if
GROUP BY is used, or all table rows otherwise.
In this case,
COUNT() will get each group’s rows and return a value for the group. You expect
MAX() to so the same, but it’s not what you’re writing: you’re passing it the result of an aggregation. This operation is not possible.
All the examples require an index on
department_id, or an index whose first column is
department_id. Otherwise at least one full table scan (sequential read) will be performed, and for big tables this operation is slow.
To return multiple modes, the
WITH TIES method is the fastest. It’s roughly as fast as the first query, that returns at most one row.
The window function method will be fast.
The subquery method can be slow, because the
GROUP BY is run twice. Note that
LIMIT 1 won’t make a
GROUP BY any faster, because the database can’t return a correct result before reading all the relevant rows.