Introduction
In the world of database management, SQL Server has consistently pushed the boundaries of performance and efficiency. The release of SQL Server 2022 marks another milestone, particularly in the realm of parallel redo operations. Let’s delve into how SQL Server 2022 transforms the landscape of thread management and batch redo, overcoming the limitations of its predecessors.
Limitations of Parallel redo
“A SQL Server instance uses up to 100 threads for parallel redo for secondary replicas. Each database uses up to one-half of the total number of CPU cores, but not more than 16 threads per database. If the total number of required threads for a single instance exceeds 100, SQL Server uses a single redo thread for every remaining database. Serial Redo threads are released after ~15 seconds of inactivity.”
- First few databases get more redo threads and others are single threaded.
- DBs can have different redo workloads which could keep on changing.
- DBs not in HA also use up the maximum thread limit count for crash recovery and HA dbs do not re-evaluate the thread usage after startup.
SQL Server 2022 Parallel Redo
SQL Server 2022 now uses a new parallel recovery algorithm so that worker threads are assigned for parallel recovery based on the workload. This eliminates the chance that a busy database will remain in a single-threaded recovery.
The Parallel Redo Thread Pool is a thread pool shared at the instance level, available for all databases that involve redo operations. This means that each database can benefit from parallel redo, even if it lacks sufficient active connections to fill the thread pool.
In previous SQL Server versions, the number of threads available for redo was limited to 100. This limitation arose from the need for each thread to acquire a latch on the log record being redone. This latch prevented other threads from accessing the same log record, leading to potential performance bottlenecks.
The Parallel Redo Thread Pool resolves this issue by enabling multiple threads to access the same log record simultaneously. This is accomplished by batching together the redo of log records. Each batch of log records is assigned to a single thread, reducing the number of times the latch needs to be acquired.
As a result of the Parallel Redo Thread Pool, the redo performance in SQL Server 2022 can be significantly improved.
You can still query the redo threads, here we are querying while there is active redo occurring on a 2 CPU secondary, we can see three threads running in context of database 5, the secondary replica database. Then, when redo is idle, notice that while the threads are still allocated, they are no longer tied to database thread.
Active redo | Idle redo |
|
|
Each DB requiring redo creates redo partitions and helper partitions.
Redo/Helper Partitions are queued to the thread pool when there is pending redo.
Calculating thread usage
Number of redo partitions per DB min(num_cpu/2, 16)
Number of redo threads in redo worker pool=min(total number of redo partitions+1,num_of_cores)
Number of helper partitions is 1 per DB.
Number of helper threads in helper pool=min(total number of helper partitions+1,num_of_cores)
EXAMPLE: So take a system with 48 cores and 27 ag databases.
Redo Partitions Per DB will be min(48/2,16)=16 but for 27 databases, could be as much as 16*27=432 partitions
Redo threads min(16+1,16)=16
Helper partitions=27
Helper threads=min(27+1,16)=16
So, for this example, 16 redo workers and 16 helper
EXAMPLE: System with 224 cores and 27 ag databases?
Redo Partitions Per DB will be min(224/2,16)=16 but for 27 databases, could be as much as 16*27=432 partitions
Redo threads min(433,224)=224
Helper partitions=27
Helper threads=min(27+1,16)=16
Here we have 224 redo threads and 16 helper threads.
EXAMPLE Below, we have 5 ag databases and 2 core system with active DML. We see redo threads are
Number of redo partitions per DB min(2/2, 16)=1 per DB so 5 redo partitions
Number of redo threads in redo worker pool=min(5+1, 2)=2
Here we can see the redo threads have been released on an idle system.
You can disable just like with parallel redo prior Trace Flags
Trace flag 3484 will disable redo pool and helper pool
Trace flag 3485 will disable helper pool
The second addition to thread management is Parallel Redo Batch
The Parallel Redo Batch Redo feature enhances redo performance by batching together the redo of contiguous log records. This involves grouping together log records that appear consecutively in the log file into a single batch. Each batch of log records is then redone by a single thread.
By batching together the redo of log records, the number of times the log file needs to be read is reduced. Each thread only needs to read the log records within its assigned batch. Consequently, the batching of log records significantly enhances redo performance.
Conclusion
The Parallel Redo Batch Redo feature is particularly advantageous for databases with a large amount of redo work. Such databases typically exhibit a high number of contiguous log records. By batching the redo of these log records, the Parallel Redo Batch Redo feature can greatly enhance redo performance.
EXAMPLE: So take a system with 48 cores and 27 ag databases.
1. Number of redo threads in redo worker pool=min(total number of redo partitions+1,num_of_cores)
2. Redo threads min(16+1,16)=16
Why in formula num_of_cores =16, but task write system with 48 cores?