Sometimes we want to find all events that happened in a given year: for example, to count the sales made in the year 2000. It’s common to write a query that returns the correct result but turns out to be slow. Let’s see how to do it properly.
The proper ways
Either of the following ways to filter rows by year is correct.
Range search
A search by year is actually a range search, from the first day of that year to the last:
SELECT count(*)
FROM purchase
WHERE
timestamp >= '2000-01-01'
AND timestamp < '2001-01-01';
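If an index starts with the timestamp column, this query will take advantage of it properly. But range searches can’t always take advantage of indexes, or can only do so in part: for example, index columns that follow the range column generally can’t be used for further filtering or sorting.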
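To see the range search at work, here is a minimal sketch in Python using the built-in sqlite3 module, chosen only because it runs without a server. The table layout, the ts column name (used instead of timestamp), the idx_ts index and the year_bounds() helper are assumptions made for illustration. Timestamps are stored as ISO-8601 text, so string order matches chronological order.

```python
import sqlite3


def year_bounds(year):
    """Return the half-open range [start, end) covering one calendar year."""
    return f"{year:04d}-01-01", f"{year + 1:04d}-01-01"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
# Plain index that starts with the timestamp column.
conn.execute("CREATE INDEX idx_ts ON purchase (ts)")
conn.executemany(
    "INSERT INTO purchase (ts) VALUES (?)",
    [
        ("1999-12-31 23:59:59",),
        ("2000-01-01 00:00:00",),
        ("2000-06-15 12:30:00",),
        ("2000-12-31 23:59:59",),
        ("2001-01-01 00:00:00",),
    ],
)

range_sql = "SELECT count(*) FROM purchase WHERE ts >= ? AND ts < ?"
start, end = year_bounds(2000)
sales_2000 = conn.execute(range_sql, (start, end)).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
range_plan = [
    row[-1]
    for row in conn.execute("EXPLAIN QUERY PLAN " + range_sql, (start, end))
]

print(sales_2000)  # 3
print(range_plan)  # the step should mention idx_ts
```

The upper bound is exclusive, so a purchase made in the last second of the year is counted while one made at midnight on January 1st of the next year is not.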
Expressional index
If the range search is slow in your case, maybe you can turn it into a point search. To do this, we need to build an expressional index on the expression that extracts the year.
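To define an expressional index in PostgreSQL, where year() is not built in and stands here for an equivalent immutable expression such as EXTRACT(YEAR FROM timestamp):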
CREATE INDEX idx_year ON purchase (year(timestamp), id);
This index will contain the result of year(timestamp) for each row. At this point, we can write this query:
SELECT id
FROM purchase
WHERE year(timestamp) = 2000
ORDER BY id;
Internally, the condition is resolved against the indexed expression, as if we had written:
WHERE idx_year = 2000
This is a point search, and won’t prevent the use of the second part of the index (id in this case) for the ORDER BY.
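The same idea can be tried out with a minimal sketch in Python and its built-in sqlite3 module. This is an illustration under assumptions, not the PostgreSQL setup above: SQLite has no year() function, so the hypothetical YEAR_EXPR expression cuts the year out of an ISO-8601 text timestamp, and the column is called ts instead of timestamp.

```python
import sqlite3

# Hypothetical stand-in for year(timestamp): the first four characters
# of an ISO-8601 text timestamp, converted to an integer.
YEAR_EXPR = "CAST(substr(ts, 1, 4) AS INTEGER)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO purchase (id, ts) VALUES (?, ?)",
    [
        (3, "2000-06-15 12:30:00"),
        (1, "2000-01-01 00:00:00"),
        (2, "1999-12-31 23:59:59"),
        (4, "2001-01-01 00:00:00"),
    ],
)
# Expression index: the extracted year first, then id.
conn.execute(f"CREATE INDEX idx_year ON purchase ({YEAR_EXPR}, id)")

# The WHERE clause repeats the indexed expression exactly, so the planner
# can match it against idx_year and treat the filter as a point search.
point_sql = f"SELECT id FROM purchase WHERE {YEAR_EXPR} = ? ORDER BY id"
ids_2000 = [row[0] for row in conn.execute(point_sql, (2000,))]

# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
point_plan = [
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + point_sql, (2000,))
]

print(ids_2000)  # [1, 3]
print(point_plan)
```

In this sketch the plan should mention idx_year and contain no separate sorting step, because rows with the same year are already stored in id order inside the index.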
The suboptimal way
The intuitive way to obtain the result is to use the year() function to get the year from the timestamp. Using the year() function for filtering rows will prevent the use of an index, unless we have an expressional index or an indexed virtual column.
This applies to any DBMS and to any function, except for aggregate functions and window functions. The reasons are:
- DBMSs index columns, not functions or other expressions.
- Even if the index would be usable in some cases, the component that decides query execution strategies (the query optimizer) doesn’t know anything about what a function does.
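The difference is easy to observe. The following minimal sketch uses Python's built-in sqlite3 module, with substr(ts, 1, 4) as a hypothetical stand-in for year() (which SQLite doesn't have) and a ts column holding ISO-8601 text. Both queries return the same count, but only the range version can seek into the plain index on ts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
# Plain index on the column itself, not on any expression.
conn.execute("CREATE INDEX idx_ts ON purchase (ts)")
conn.executemany(
    "INSERT INTO purchase (ts) VALUES (?)",
    [
        ("1999-12-31 23:59:59",),
        ("2000-01-01 00:00:00",),
        ("2000-06-15 12:30:00",),
        ("2001-01-01 00:00:00",),
    ],
)

# Filter through a function: the optimizer can't map it onto idx_ts.
func_sql = "SELECT count(*) FROM purchase WHERE substr(ts, 1, 4) = '2000'"
# Equivalent range filter on the bare column.
range_sql = (
    "SELECT count(*) FROM purchase "
    "WHERE ts >= '2000-01-01' AND ts < '2001-01-01'"
)


def first_plan_step(sql):
    """Return the description of the first step of SQLite's query plan."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]


func_count = conn.execute(func_sql).fetchone()[0]
range_count = conn.execute(range_sql).fetchone()[0]
func_plan = first_plan_step(func_sql)    # a SCAN: every row or index entry is visited
range_plan = first_plan_step(range_sql)  # a SEARCH: only the matching index range is read

print(func_count, range_count)  # 2 2
print(func_plan)
print(range_plan)
```

The exact plan wording varies between SQLite versions, but the function-based filter should be reported as a SCAN and the range filter as a SEARCH.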
Other DBMSs
Some DBMSs don’t support expressional indexes, but they allow you to create a virtual column (also called a generated column) and build an index on it. In a future article we will show how to do this.
Some DBMSs don’t support the year() function, but they all support some function that can be used to extract a year from a date or timestamp. Check your product documentation if the year() function is not supported.