How to SELECT
rows from table A that have, or don’t have, a match in table B. While doing this, we’ll keep performance in mind.
Semijoin
Selecting rows from one table only, after verifying that they have a match in a second table, is called a semijoin.
Example: Find all directors who made at least one horror movie.
If the film table is small, we can use a subquery. This should be the fastest method.
SELECT d.id, d.name, d.surname
FROM director d
WHERE EXISTS (
SELECT id
FROM film f
WHERE f.genre = 'horror'
AND f.director = d.id
);
If the film table is medium to big, we better use a JOIN
.
SELECT d.id, d.name, d.surname
FROM director d
INNER JOIN film f
ON d.id = f.director
WHERE f.genre = 'horror';
Antijoin
Selecting rows from one table only that don’t have a match in another table is called an antijoin.
Example: Find all directors that never made a horror film.
The queries are very similar. In the first case, we’ll just use NOT EXISTS
instead of EXISTS
:
SELECT d.id, d.name, d.surname
FROM director d
WHERE NOT EXISTS (
SELECT id
FROM film f
WHERE f.genre = 'horror'
AND f.director = d.id
);
In the second case we’ll use a LEFT JOIN
which returns both rows with and without a match in the second table; but then, with a WHERE
condition, we’ll only choose the rows where the primary key of film
is NULL
, which can only mean that there is not match in film
:
SELECT d.id, d.name, d.surname
FROM director d
LEFT JOIN film f
ON d.id = f.director AND f.genre = 'horror'
WHERE f.id = NULL;
Reference
From Wikipedia: