SQL Server: CHECKSUM() vs CHECKSUM_BINARY() vs HASHBYTES()

SQL Server has at least three functions to create checksums or hashes: CHECKSUM(), CHECKSUM_BINARY(), and HASHBYTES(). Here we discuss what differences exist between them, and how to choose the proper function for a specific use case.

Functions summary

Let’s explain the functions briefly.

CHECKSUM()

Returns a checksum of a value, a list of values, or a whole row.

This function is recommended by the SQL Server documentation to create hash indexes. The following example results in a hash index on all the table rows:

ALTER TABLE user ADD row_checksum AS CHECKSUM(*);
GO
CREATE INDEX idx_row_checksum ON user (row_checksum);
GO

Two lists of values return the same result if their types, values are the same, and they are specified in the same order. Otherwise, they will usually return different values. Similarly, changing a value will usually change its checksum.

This function does not accept these types:

  • CURSOR;
  • IMAGE;
  • NTEXT;
  • TEXT;
  • XML.

NULLs are considered equal when they are of the same type.

Case-sensitivity of the strings depends on server configuration.

The dash character is ignored for NCHAR and NVARCHAR types.

BINARY_CHECKSUM()

Similar to CHECKSUM(), but it is more precise. As a consequence, it is not as fast but when applied to different values it is more likely to return different outputs.

Documented reasons why the collisions are less likely include:

  • BINARY_CHECKSUM() is always case-sensitive;
  • CHECKSUM() ignores dashes, whereas BINARY_CHECKSUM() doesn’t.

Also, it accepts CURSORs.

HASHBYTES()

HASHBYTES() is more precise and less fast then BINARY_CHECKSUM().

It returns a hash using the specified algorithm (the argument is mandatory). Supported algorithms are: MD2, MD4, MD5, SHA, SHA1, SHA2_256, SHA2_512.

It works with all data types.

It treats NULL in the same way as CHECKSUM() and BINARY_CHECKSUM().

Example:

SELECT HASHBYTES('SHA2_512', secret) FROM dbo.user WHERE id = 24;

Choosing a function

Generic rules of thumb:

  • Some data types restrict your choice.
  • If you need to generate hashes that can be compared with hashes generated outside of SQL Server, use HASHBYTES().
  • In other cases, it’s generally fine to start with CHECKSUM().
  • If case-sensitivity and dashes matter for your specific case, or if the collisions are too many, switch to CHECKSUM_BINARY().
  • If the collisions are still too many, switch to HASHBYTES().
  • Different HASHBYTES() algorithms provide different collision probability and different performance, but this is beyond the scope of this article.

Reference

SQL Server documentation: