Find values that appear in multiple columns, in different rows

There are situations when you need to find values that occur in multiple columns, but oitentially in different rows. This can be done with the JOIN or the INTERSECT statement.

Sometimes you need to find values that appear in multiple columns, possibly in different rows. For example you may have a messanger table with a sender and a receipt column, and you may want to check which users both sent and received messages. There are several ways to do this.

JOIN

To do this using an INNER JOIN:

SELECT DISTINCT sender
    FROM messanger s
    INNER JOIN messenger r
        ON s.sender = r.receipt;

We joined a table with itself (self join). The ON comparison is between the two columns we’re interested in. Because it is an INNER JOIN, only the values that appear in both tables will be returned.

INTERSECT

To do this using INTERSECT:

(SELECT sender FROM messanger)
INTERSECT DISTINCT
(SELECT receipt FROM messanger);

INTERSECT is similar to the intersection set operation that we studied at school: it takes two queries, and returns the identical rows.

UNION DISTINCT

This method is showed for the sake of completeness. It is similar to INTERSECT, but the database could use a less efficient algorithm for this query.

(SELECT sender FROM messanger)
UNION DISTINCT
(SELECT receipt FROM messanger);

Performance

In most cases, the JOIN query will be faster. If one of the two tables is small enough, though, INTERSECT may be faster. INTERSECT DISTINCT (or just INTERSECT) is slower than INTERSECT ALL, so you may prefer the latter option; but each value will be returned multiple times, which can be annoying.

To speed up joins of this type, you can add two different indexes on sender and receipt – or make sure that two different indexes exist having sender and receipt as the first columns.