In statistics, the mode is the function that returns the element that appear most often in a series. This is what people try to achieve when they run MAX(COUNT(*))… and they find out that it doesn’t work. Let’s see how to obtain the mode in SQL.
Let’s use an example, as usual. Suppose we need to find which department in a company has most employees.
SELECT department_id, COUNT(*) AS how_many
FROM employee
GROUP BY department_id
ORDER BY how_many DESC
LIMIT 1;
What we’re doing here is to get the unique least of departments, get the number of employees for each of them, sort the rows by the (ascending) number of employees, and only take the first of them.
WITH TIES
However, several departments could have the same number of employees, but the above query can only return one row. If we want to get all the department with the highest number of employees, we can use this query:
SELECT department_id, COUNT(*) AS how_many
FROM employee
GROUP BY department_id
ORDER BY how_many DESC
FETCH FIRST 1 ROWS WITH TIES;
The FETCH FIRST n ROWS WITH TIES
syntax means that we surely want the first n
rows, but the following rows have the same values for the ORDER BY
columns (in this case, the same COUNT(*)
) they need to be returned as well.
Support for WITH TIES
However, not all DBMSs support this syntax. It is supported by:
- PostgreSQL
- MariaDB
- SQL Server
- Oracle
It is not supported by:
- MySQL
- SQLite
- Db2
- Redshift
- Snowflake
Alternative methods
On DBMSs that don’t support WITH TIES
, you can look at alternative methods, which include subqueries and window functions.
Get all the modes using a subquery
SELECT department_id, COUNT(*) AS how_many
FROM employee
GROUP BY department_id
HAVING how_many = (
SELECT COUNT(*) AS how_many
FROM employee
GROUP BY department_id
ORDER BY how_many DESC
LIMIT 1
);
Here we:
- Use the first query in this article to get the highest number of employees in a department (subquery in
HAVING
). - Count the number of employees per department again.
- From this statistics, we only take the rows where the number of employees is equal to the highest number of employees.
Get all the modes with the window function method
While the above method works, the syntax is a bit too verbose and it can be slow. We can use a window function instead.
SELECT department_id, how_many AS how_many
FROM (
SELECT
department_id,
COUNT(*) AS how_many,
RANK() OVER (ORDER BY COUNT(*) DESC) AS department_rank
FROM employee
GROUP BY department_id
) s
WHERE department_rank = 1;
What we do is:
- Group employees by department, count the employees for each department.
- Assign a rank to each row, starting from the highest count (
ORDER BY COUNT(*) DESC
).- Note that rows where
COUNT(*)
is the same, the rank will also be the same.
- Note that rows where
- From the results of this subquery, select the rows with the lower rank (highest count).
Why MAX(COUNT(*)) won’t work
SELECT MAX(COUNT(*)) FROM table GROUP BY column
won’t work on any DBMS.
The reason is that aggregate functions (like MAX()
and COUNT()
) accept any number of rows as input and return one value as output. Input rows are each group’s rows if GROUP BY
is used, or all table rows otherwise.
In this case, COUNT()
will get each group’s rows and return a value for the group. You expect MAX()
to so the same, but it’s not what you’re writing: you’re passing it the result of an aggregation. This operation is not possible.
Performance
All the examples require an index on department_id
, or an index whose first column is department_id
. Otherwise at least one full table scan (sequential read) will be performed, and for big tables this operation is slow.
To return multiple modes, the WITH TIES
method is the fastest. It’s roughly as fast as the first query, that returns at most one row.
The window function method will be fast.
The subquery method can be slow, because the GROUP BY
is run twice. Note that LIMIT 1
won’t make a GROUP BY
any faster, because the database can’t return a correct result before reading all the relevant rows.
Reference
Related articles:
Wikipedia: