Introduction
Welcome to the third part of our series on setting up SQL Server availability groups on Linux. In the previous post, we focused on enabling High Availability Disaster Recovery (HADR) on SQL Server and setting up the availability group. This post will guide you through configuring Pacemaker resources and constraints to manage the availability group effectively.
Prerequisites
Before we begin, ensure that you have completed the steps from Parts 1 and 2. You should have three Ubuntu 20.04 hosts with SQL Server installed, Pacemaker configured and running on all nodes, and an availability group created and operational.
Step 1: Install SQL Server HA Tools
On all nodes, install the SQL Server High Availability tools, which are necessary for integrating SQL Server with Pacemaker:
sudo apt -y update
sudo apt -y install mssql-server-ha
Step 2: Create Pacemaker Resource for Availability Group
On the primary node (vm0), we will create the Pacemaker resource for the availability group:
1. Create a file with Pacemaker login credentials:
This step sets up a file containing the credentials that Pacemaker will use to authenticate with SQL Server.
echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo 'Password123' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd
2. Determine the Virtual IP Address
The virtual IP address is used by clients to connect to the availability group, ensuring high availability and seamless failover. Choose an IP address that is not in use on your network and assign it to the availability group. This IP should be within the same subnet as your nodes and configured in your DNS settings.
3. Create the Pacemaker resource for the availability group:
These commands create the necessary Pacemaker resources to manage the SQL Server availability group. The first command creates the availability group resource, and the second command sets up a virtual IP address resource that will be used by the clients to connect to the primary replica. The constraints ensure the virtual IP is always associated with the primary replica.
sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s promotable notify=true
sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=10.160.108.15
sudo pcs constraint colocation add virtualip with master ag_cluster-clone INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster-clone then start virtualip
Replace 10.160.108.15
with the virtual IP address you have chosen.
Step 3: Verify Pacemaker Resource Status
Check the status of the Pacemaker resources to ensure everything is set up correctly. This command will show the current state of all resources and their locations:
sudo pcs resource status
Step 4: Manage Cluster Services
Make sure the required services are online and enabled to start at boot on all nodes. This ensures that the cluster services start automatically after a reboot:
sudo systemctl enable corosync
sudo systemctl enable pacemaker
sudo systemctl enable pcsd
systemctl status corosync
systemctl status pacemaker
systemctl status pcsd
If any services are not running, start them. This command starts the cluster services if they are not already running:
sudo systemctl start corosync
sudo systemctl start pacemaker
sudo systemctl start pcsd
Step 5: Handling Failovers
To manually move the availability group to another node, use these commands. This can be useful for testing or maintenance purposes:
sudo pcs resource move ag_cluster-clone vm1 --master
sudo pcs resource clear ag_cluster
To move the virtual IP address, use these commands. This ensures the virtual IP moves to the node that is currently the primary replica:
sudo pcs resource move virtualip vm1
sudo pcs resource clear virtualip
Step 6: Inspect Cluster Logs
For more detailed information on errors or warnings, check the Pacemaker logs. This command filters the logs to show only entries related to Pacemaker:
cat /var/log/syslog | grep pacemaker
If you suspect the Cluster Information Base (CIB) is out-of-date or corrupted on a node, synchronize it. This command synchronizes the CIB on the local node with the rest of the cluster:
sudo pcs cluster sync
To restart the cluster services, be cautious of potential downtime. These commands stop and start the cluster services on all nodes, which can help resolve communication issues but may cause downtime:
sudo pcs cluster stop --all
sudo pcs cluster start --all
Conclusion
In this post, we configured Pacemaker resources and constraints to manage our SQL Server availability group effectively. These steps ensure that your SQL Server instances are highly available and can fail over seamlessly between nodes. In the next post, we will dive into advanced troubleshooting and performance tuning to optimize your high availability setup. Stay tuned for more detailed guidance as we continue our journey toward robust SQL Server high availability on Linux.