Introduction
In high-availability environments, SQL Server Always On Availability Groups (AG) are essential for minimizing downtime and preventing data loss. However, situations can arise where a secondary replica lags behind the primary, disrupting synchronization and risking potential data inconsistencies. This blog provides instructions on how to manually catch up a lagging secondary replica using transaction log backups.
Common Scenarios Leading to a Lagging Secondary Replica
Understanding the reasons behind a lagging secondary replica can help in effectively addressing the issue. Here are some common scenarios:
1. Extended Network Outages
A prolonged network outage between the primary and secondary replicas can prevent the secondary from receiving and applying transaction logs. When the connection is restored, the accumulated backlog may cause the secondary to be significantly behind the primary.
2. Insufficient Bandwidth During Heavy Load
High transaction volumes on the primary database can overwhelm the available network bandwidth. If the bandwidth between replicas isn’t sufficient to handle the load, the secondary may fall behind, leading the availability group to enter a “not synchronized” state.
3. Resource Constraints on the Secondary Replica
Limitations in CPU, memory, or disk I/O on the secondary server can delay the application of transaction logs. These resource bottlenecks can cause the secondary replica to become increasingly out of sync with the primary.
4. Unexpected Failover Events
In the event of an unplanned failover, an unsynchronized secondary replica can result in significant lag. Post-failover, manual intervention is required to catch up the secondary to ensure it’s prepared for future failover scenarios.
Steps to Manually Catch Up the Secondary Replica
To restore full synchronization and prevent potential data loss, follow these steps to manually catch up the secondary replica using transaction log backups.
1. Query the Last Hardened LSN on the Secondary Replica
Begin by identifying the last hardened log sequence number (LSN) on the secondary replica. This LSN represents the point up to which the secondary has applied transaction logs.
SELECT DISTINCT
dcs.database_name,
ds.last_hardened_lsn
FROM
sys.dm_hadr_database_replica_states ds
JOIN
sys.dm_hadr_database_replica_cluster_states dcs ON ds.group_database_id = dcs.group_database_id
WHERE
dcs.database_name = 'YourDatabaseName';
This query retrieves the last hardened LSN for the specified database, which you’ll use to identify the required transaction log backups.
2. Identify the Relevant Transaction Log Backups in MSDB
With the last hardened LSN, query the backup history in the msdb
database to find the transaction log backups that need to be applied to the secondary replica.
SELECT
name,
backup_set_id,
backup_start_date,
backup_finish_date,
first_lsn,
last_lsn
FROM
msdb..backupset
WHERE
database_name = 'YourDatabaseName' AND
type = 'L' AND -- 'L' indicates transaction log backups
first_lsn <= 'YourLSN' AND
last_lsn >= 'YourLSN'
ORDER BY
backup_start_date;
This query lists all transaction log backups where the hardened LSN falls between first_lsn
and last_lsn
.
3. Take the Database Out of the Availability Group on the Secondary Replica
Before applying the transaction logs, remove the database from the availability group on the secondary replica. This action puts the database into the RESTORING state, allowing you to apply transaction log backups manually.
ALTER DATABASE [YourDatabaseName] SET HADR OFF;
4. Apply the Transaction Log Backups Manually
Restore the necessary transaction log backups to the secondary replica using the RESTORE LOG
command with the NORECOVERY
option. This option keeps the database in a restoring state, enabling the application of subsequent logs.
RESTORE LOG [YourDatabaseName]
FROM DISK = 'Path\To\YourFirstTransactionLogBackup.trn'
WITH NORECOVERY;
RESTORE LOG [YourDatabaseName]
FROM DISK = 'Path\To\YourSecondTransactionLogBackup.trn'
WITH NORECOVERY;
-- Continue this process for all relevant transaction log backups
Ensure that you apply all transaction log backups sequentially up to the most recent one needed to catch up the secondary replica.
5. Add the Database Back into the Availability Group
After applying all the required transaction log backups, add the database back to the availability group and resume synchronization:
ALTER DATABASE [YourDatabaseName] SET HADR AVAILABILITY GROUP = [YourAvailabilityGroupName];
GO
ALTER DATABASE [YourDatabaseName] SET HADR RESUME;
GO
6. Verify Synchronization
Confirm that the database is synchronizing properly by querying the status of the availability group database:
SELECT
dcs.database_name,
drs.synchronization_state_desc,
drs.is_suspended,
drs.last_hardened_lsn
FROM
sys.dm_hadr_database_replica_states drs
JOIN
sys.dm_hadr_database_replica_cluster_states dcs ON drs.group_database_id = dcs.group_database_id
WHERE
dcs.database_name = 'YourDatabaseName';
The synchronization_state_desc
should display SYNCHRONIZED, and is_suspended
should be 0, indicating that synchronization is active and not suspended.
Conclusion
Catching up a lagging secondary replica in a SQL Server Always On Availability Group is important for maintaining high availability and ensuring data consistency across replicas. By following the steps outlined—identifying the last hardened LSN, applying the necessary transaction log backups, and re-adding the database to the availability group—you can restore full synchronization without significant data loss or downtime. Regular monitoring and proactive management of network and server resources can help prevent such lag issues from occurring in the future.