Sometimes you want to count all occurrences of a substring into a bigger string. For example, you may want to count all occurrences of a name in a text. This can be done, but it’s not straight-forward because SQL Server doesn’t have a specific function to do it.
Counting the occurrences of a character
This is the simplest case: the substring we want to count has a length of exactly one character. We can do it this way:
SELECT
LEN(summary) - LEN(REPLACE(summary, 'x', ''))
AS occurrences
FROM article
Here’s what this expression does:
- Gets the number of characters in the superstring;
- Deletes the x’s from the superstring and counts the characters again;
- Returns the difference or, in other words, the number of x’s.
Counting the occurrences of a substring of any length
If we don’t know the length of the substring, the expression to use is a bit more complex:
SELECT
(LEN(summary) - LEN(REPLACE(summary, 'France',''))) / LEN('France')
AS occurrences
FROM article
Here’s what it does:
- Gets the number of characters in the superstring;
- Deletes the occurrences of ‘France’ from the superstring and counts the characters again;
- Returns the length difference divided by the length of ‘France’.
The length of ‘France’ is 6 characters. So, for example, if the length difference is 6 there is one occurrence, and if the length difference is 12 there are two occurrences. If there are zero occurrences the expression will still work, because zero divided by any number returns zero.
Case-sensitivity
The above expressions are case-sensitive. We can make them case-insensitive by using the LOWER()
function.
Single character, case insensitive:
SELECT
LEN(summary) - LEN(REPLACE(LOWER(summary), LOWER('x'), ''))
AS occurrences
FROM article
Any number of characters, case insensitive:
SELECT
(LEN(summary) - LEN(REPLACE(LOWER(summary), LOWER('France'(,''))) / LEN('France')
AS occurrences
FROM article
In both cases, LOWER()
is applied before REPLACE()
, to make it work when. the cases don’t match. It’s not used with LEN()
because the case obviously doesn’t affect a string length.
Performance
In the examples above, the expressions should be relatively fast (though, of course, it can be a problem when applied to a huge number of rows).
Nothing prevents us to use those expressions in a WHERE
clause or in an ORDER BY
clause, but this may heavily affect performance. The reason is that, if there is an index on the summary
column, it cannot be used because the query optimizer cannot make assumptions on the results of the expressions.
If it is really necessary, this problem can be solved by building a computer column and an index on it, but this is out of the scope of this article.
Reference
More articles:
SQL Server documentation: