Introduction
In the realm of memory-optimized tables, every table must have at least one index to connect its rows. Notably, every index on such tables is also memory-optimized. Among the various types of indexes available, hash indexes stand out for their unique structure and functionality. Understanding the importance of bucket counts in hash indexes for memory-optimized tables is crucial for maintaining optimal performance in SQL Server. This post looks at the structure of hash indexes, how to specify and adjust bucket counts, and the consequences of incorrect bucket counts. Learn best practices for planning and optimizing your hash indexes to ensure efficient data management and avoid common performance pitfalls.
Hash Index Structure
A hash index consists of an array of pointers, with each element in the array referred to as a hash bucket. Each bucket is 8 bytes, storing the memory address of a linked list of key entries. These entries include a value for an index key and the address of the corresponding row in the memory-optimized table. Each entry, in turn, points to the next entry in the chain, all linked to the current bucket.
Specifying Bucket Counts
When defining a hash index, you must specify the number of buckets. This count is crucial for performance:
- Lower ratio of buckets to table rows or distinct values: Results in longer average bucket link lists.
- Shorter link lists: Provide faster performance than longer ones.
- Maximum number of buckets: 1,073,741,824 (2^30).
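The bucket count is declared inline when the index is created. As a sketch, assuming a database that already contains a MEMORY_OPTIMIZED_DATA filegroup (table and column names here are illustrative):

```sql
-- Hypothetical example: the database must already have a
-- MEMORY_OPTIMIZED_DATA filegroup for this CREATE to succeed.
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserName  NVARCHAR(100) NOT NULL,
    LastSeen  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

Note that SQL Server rounds the specified BUCKET_COUNT up to the next power of two, so a request for 1,000,000 buckets actually allocates 1,048,576.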
Planning and Optimizing Your Hash Index
Consider the following when planning your hash index:
- Single value lookups: Hash indexes are optimized for point lookups (equality searches), not range scans.
- Include all key columns: A hash index can only be used when the query's equality predicate covers every column in the index key; a predicate on a subset of a composite key cannot use the index.
- Accurately define a bucket count:
  - It’s acceptable to default high, but avoid wildly overestimating.
  - Higher cardinality makes a column a better candidate for a hash index.
  - Lower cardinality leads to long row chains, causing performance issues.
- High confidence in the number of rows allows for a good estimated bucket count:
  - For primary keys, the bucket count should equal the total number of rows.
  - For non-unique and composite hash indexes, the count should equal the total number of distinct keys.
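One way to ground that estimate is to count distinct key values directly. A sketch, with a hypothetical table and key column:

```sql
-- For a non-unique hash index, size the bucket count from the
-- number of distinct key values, not the total number of rows.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT CustomerId) AS distinct_keys
FROM dbo.Orders;
-- Aim for a BUCKET_COUNT of roughly 1-2x distinct_keys.
```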
How the Hash Function Works
The hash function applied to the index key columns determines which bucket a key falls into. The characteristics of this hash function include:
- Single hash function: Used for all hash indexes in the Database Engine.
- Deterministic: The same input key value always maps to the same bucket.
- Multiple index keys: Can map to the same bucket.
- Distribution over buckets: Index key values typically follow a Poisson (bell-curve-like) distribution, not a flat, uniform distribution.
- Hash collisions: Occur when two index keys map to the same bucket. While some collisions are expected, a large number can impact read performance. Ideally, around 30% of buckets contain two different key values.
Adjusting Bucket Counts
The bucket count for a hash index is specified at the time of index creation, but it can be changed later using the ALTER TABLE…ALTER INDEX REBUILD syntax. Typically, the bucket count should be between 1 and 2 times the number of distinct values in the index key. Even if you cannot predict the exact number of values, performance remains good as long as the BUCKET_COUNT is within 10 times the actual number of key values; overestimating is generally better than underestimating.
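The rebuild looks like the following sketch (table and index names are hypothetical):

```sql
-- Rebuild the hash index with a larger bucket count.
-- Note: the rebuild requires memory for both old and new
-- index structures while it runs.
ALTER TABLE dbo.SessionCache
    ALTER INDEX PK_SessionCache
    REBUILD WITH (BUCKET_COUNT = 2000000);
```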
Consequences of Incorrect Bucket Counts
Too Few Buckets
- Increased hash collisions between distinct key values.
- Distinct values forced to share buckets with other distinct values.
- Longer average chain lengths per bucket.
- Slower equality lookups due to longer bucket chains.
If the bucket count is significantly lower (think 10X) than the number of unique index keys, there will be many buckets that have multiple index keys. This degrades the performance of most DML operations, particularly point lookups, i.e., lookups of individual index keys.
Result: degraded performance for queries that rely on lookups or inserts into the hash index; for example, SELECT queries and UPDATE/DELETE operations with equality predicates matching the index key columns in the WHERE clause.
Review the index definition and the table data. For example, if the PRIMARY KEY has a HASH index with a BUCKET_COUNT of 10,000 and the table has 1,000,000 rows, the bucket count is far too low and will need to be increased.
Too Many Buckets
- More empty buckets: Slows full index scans, which must visit every bucket.
- Wasted memory: Though each bucket uses only 8 bytes.
In addition to inspecting the table schema and data, you can use the DMV sys.dm_db_xtp_hash_index_stats. The following query returns statistics about the buckets and the row chains hanging off them:
SELECT hs.object_id,
       OBJECT_NAME(hs.object_id) AS [object name],
       i.name AS [index name],
       hs.*
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON hs.object_id = i.object_id AND hs.index_id = i.index_id;
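Two derived numbers from that DMV are particularly useful: the share of empty buckets (a very high percentage suggests an overestimated bucket count) and the average chain length (values well above 1 suggest too few buckets or many duplicate keys). A sketch:

```sql
SELECT OBJECT_NAME(hs.object_id) AS table_name,
       i.name AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       -- percentage of buckets holding no rows
       FLOOR(100.0 * hs.empty_bucket_count / hs.total_bucket_count)
           AS empty_bucket_percent,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON hs.object_id = i.object_id AND hs.index_id = i.index_id;
```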
Conclusion
Carefully planning and optimizing the bucket count in hash indexes is essential for maintaining performance and efficiency in memory-optimized tables. By understanding the dynamics of hash functions and bucket distributions, you can ensure your indexes are both effective and efficient. This meticulous approach to index management will help you avoid common pitfalls and achieve optimal database performance.