Get distinct values that are stored in multiple columns

Getting a list of unique values from a column is easy. But if the same type of values are written on multiple columns and we want to eliminate duplicates, this requires a little trick.

Example: Get all unique email addresses from the columns email1 and email2.

Getting distinct values from a single column

In a perfect world, all emails would be stored in one column. Getting an ordered list of unique emails would be simple:

SELECT DISTINCT email FROM user_contact ORDER BY email;

Getting distinct values from a UNION

But in the real world, production tables often have columns email and email2. This happens because initially only one email is allowed, but at some point a second email needs be added, and developers try to do it with a minimal change to the current database design.

The query

To get ordered unique emails from multiple columns, we need to do this:

(SELECT email FROM user)
UNION DISTINCT
(SELECT email2 AS email FROM user)
ORDER BY email;

How UNION works

The query works in this way:

  • Lists of the emails from the two columns are retrieved by the two subqueries;
  • UNION DISTINCT concatenates the two returned columns, forming a single resultset that consists of one column; at the same time it removes duplicate emails;
  • ORDER BY defined the order.