Today we will discuss VLDB backups. Handling VLDBs can be challenging for many administrators. Sometimes just completing a full backup in a reasonable amount of time is challenging, not to mention restoring a database to meet your organization's service level agreements (SLAs). There are many options to consider, and some of them exist outside of the BACKUP DATABASE statement. The key to improving database backups is to adjust various configurations to increase throughput, increase concurrency, and reduce the amount of data to back up.
There are a few configurations you can use to increase throughput in the BACKUP DATABASE statement, such as using multiple backup files, BLOCKSIZE, BUFFERCOUNT, and MAXTRANSFERSIZE.
There are also configuration changes you can make to reduce the amount of data that is backed up and to spread the I/O load. First, the COMPRESSION option and the "backup compression default" server setting compress the backup so less disk I/O is required. Also, using multiple backup files or devices on separate volumes can increase performance. Finally, you can place very large tables and indexes on separate filegroups and back these up separately.
Let’s review each of these options in greater detail…
Compression
Backup compression can significantly improve backup performance. How much it helps depends on how much of the database can be compressed. A compressed backup is smaller than an uncompressed backup of the same data; therefore, less disk I/O is required.
There are several factors that can affect how much a backup can be compressed, such as data types, data page uniqueness, and database encryption. You can read more about these factors at the link below.
Note: Compression can significantly increase CPU usage. It is recommended to first test compression on a smaller database, and to use caution when attempting to compress backups on an instance that is already experiencing consistently high average CPU usage. You can also consider using Resource Governor to assign backup sessions to a lower-priority workload group and limit the impact on a production workload.
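If you want to throttle backup CPU with Resource Governor, the sketch below shows the general idea. The pool name, workload group name, CPU cap, and the 'BackupJob' application name are all hypothetical; adjust them for your environment, and create the classifier function in the master database.

-- A minimal sketch, assuming backup jobs connect with the application name 'BackupJob' (hypothetical)
CREATE RESOURCE POOL BackupPool WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP BackupGroup USING BackupPool;
GO
CREATE FUNCTION dbo.fn_BackupClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Route backup sessions into the low-priority group; everything else stays in default
    DECLARE @grp sysname = N'default';
    IF APP_NAME() = N'BackupJob'
        SET @grp = N'BackupGroup';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_BackupClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;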
As your VLDB grows, you're going to want to keep its size in check. You're going to have a lot of accumulated data that is not queried often. For example, clients in the finance industry usually have to comply with all kinds of regulations that might force them to keep several years of data. It doesn't make sense for data from 7 years ago to consume as much storage as the data from last week.
Starting with SQL Server 2008 Enterprise, we've had a few different compression options on the menu. Row compression is recommended for recent data that is still very active. Page compression is recommended for all the data that has to stay available but is not frequently queried. Depending on the type of queries you run, compression might even perform better because of the decrease in I/O.
Combining this with partitioning is very powerful, since you'll be able to do something like no compression for the 3 newest partitions, row compression for the next 3, and page compression for the rest.
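As a sketch of that idea, the statements below rebuild individual partitions of a hypothetical dbo.Orders table with different compression settings (the table name and partition numbers are illustrative):

-- Mid-range partitions use row compression; the oldest, rarely queried partitions use page compression
ALTER TABLE dbo.Orders REBUILD PARTITION = 4 WITH (DATA_COMPRESSION = ROW);
ALTER TABLE dbo.Orders REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);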
You can enable backup compression in two ways. First, by using the COMPRESSION backup set option of the BACKUP statement:
BACKUP DATABASE MyDb TO DISK = 'Z:\Backups\MyDb.bak' WITH COMPRESSION;
Or, by setting the backup compression default server configuration option:
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
GO
Multiple Backup Files
Using multiple backup devices can significantly increase backup performance. SQL Server uses additional writer threads for each file, increasing parallelism. Starting with SQL Server 2016, the number of backup threads increased from one to four per file.
Keep in mind that each additional file increases the amount of memory required for backup buffers.
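As a quick illustration, the statement below stripes the backup across four files; the drive letters and file names are hypothetical, and ideally each file sits on a separate volume:

-- Stripe the backup across four files so SQL Server can write to them in parallel
BACKUP DATABASE MyDb
TO  DISK = 'X:\Backups\MyDb_1.bak',
    DISK = 'Y:\Backups\MyDb_2.bak',
    DISK = 'Z:\Backups\MyDb_3.bak',
    DISK = 'W:\Backups\MyDb_4.bak'
WITH COMPRESSION, STATS = 10;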
Use Multiple Filegroups
Often, a VLDB will start life like any other small database and keep growing and growing over time. Suddenly, the DBA is left with multiple terabytes of data, all on a single PRIMARY filegroup. On the other hand, if you already know that your database is expected to grow to multiple terabytes, then you should design it from the ground up to use multiple filegroups.
Separating the database into multiple filegroups affords several optimization opportunities:
- Separation of filegroups can be used in conjunction with large table partitioning. This also brings the advantage of optimizing database backups.
- Backups can be limited to only those filegroups set to READ/WRITE, using the READ_WRITE_FILEGROUPS option (see the sketch after this list).
- Specific filegroups can also be backed up in a round-robin fashion, or you can back up only the filegroup that contains the active partition(s).
- Full database backups of VLDBs often do not need to be taken on a regular basis. It is often sufficient to take an occasional full database backup supplemented by more frequent differential backups.
- One such methodology is to take a full database backup every month, supplemented by nightly or weekly differential backups, which are also supplemented by multiple daily transaction log backups.
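Here is a minimal sketch of the two filegroup-oriented approaches mentioned above, using a hypothetical database and filegroup name:

-- Partial backup: only read/write filegroups, skipping read-only archive filegroups
BACKUP DATABASE MyDb
    READ_WRITE_FILEGROUPS
TO DISK = 'Z:\Backups\MyDb_ReadWrite.bak'
WITH COMPRESSION;

-- Filegroup backup: pick up one filegroup at a time in a round-robin rotation
BACKUP DATABASE MyDb
    FILEGROUP = 'FG_2024'
TO DISK = 'Z:\Backups\MyDb_FG_2024.bak'
WITH COMPRESSION;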
Perform Smart Maintenance Routines
What is meant by "smart" is a routine that looks at the current state of the database to decide what work it will do. The classic example is checking the fragmentation level of an index to decide whether to reorganize or rebuild it. This is in contrast to SQL Server Maintenance Plans, which will simply run against every table regardless of whether it is fragmented or not.
For database backups, particularly transaction logs, you can check how much log has been generated since the last log backup by using the sys.dm_db_log_stats dynamic management function, and only back up the log if it is, for example, greater than 100 MB.
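A minimal sketch of that check, assuming SQL Server 2017 or newer (where sys.dm_db_log_stats is available), a database in FULL recovery, and a hypothetical 100 MB threshold:

DECLARE @log_mb float;

SELECT @log_mb = log_since_last_log_backup_mb
FROM sys.dm_db_log_stats(DB_ID('MyDb'));

-- Only take a log backup when enough log has accumulated since the last one
IF @log_mb > 100
    BACKUP LOG MyDb TO DISK = 'Z:\Backups\MyDb.trn' WITH COMPRESSION;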
As the database grows, you may need to customize these routines to fit your needs. Ola Hallengren's maintenance scripts implement many of these smart routines.
We will cover more on this topic in other Kickstart Tips.
Buffer Count and Max Transfer Size
BUFFERCOUNT and MAXTRANSFERSIZE work together. You can configure one or both.
BUFFERCOUNT specifies the total number of I/O buffers to be used for the backup operation. You can specify any positive integer; however, large numbers of buffers might cause "out of memory" errors because of inadequate virtual address space in the Sqlservr.exe process.
MAXTRANSFERSIZE specifies the largest unit of transfer, in bytes, to be used between SQL Server and the backup media. The possible values are multiples of 65536 bytes (64 KB), ranging up to 4194304 bytes (4 MB).
You can use these two options to tune the throughput of the backup I/O operation. You can think of BUFFERCOUNT as the number of buckets and MAXTRANSFERSIZE as the size of each bucket.
The total space used by the buffers is determined by: BUFFERCOUNT * MAXTRANSFERSIZE.
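For example, the following statement (with illustrative values) allocates 50 buffers of 4 MB each, roughly 200 MB of backup buffer space:

BACKUP DATABASE MyDb
TO DISK = 'Z:\Backups\MyDb.bak'
WITH COMPRESSION,
     BUFFERCOUNT = 50,
     MAXTRANSFERSIZE = 4194304;  -- 4 MB, the maximum transfer size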
Table Partitioning
Table partitioning has been available since SQL Server 2005 but is still an Enterprise-only feature (or a Premium feature if we're talking about Azure SQL Database V12). In addition to making backups much easier, there are many other advantages to table partitioning (a minimal setup sketch follows the list):
- You can have a table that spans multiple filegroups. This gives you all kinds of flexibility for storage, performance, and recovery.
- You can load data and extract data as a metadata-only operation with the SWITCH command.
- If you're running SQL Server 2014 or newer, you can update statistics on particular partitions instead of on one massive table.
- You can perform index maintenance at a partition level instead of the entire table. With SQL Server 2014 or newer, you can perform online-rebuilds per partition.
- Table partitioning makes it easier to use Columnstore indexes because you can manage the Columnstore at the partition level (with SQL Server 2012 or newer).
- Better query performance from partition elimination when predicates filter on the partitioning column.
- You can specify different compression levels on a per partition basis.
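Here is a minimal sketch of a partitioned table; the function, scheme, filegroup, and table names, as well as the yearly boundaries, are all hypothetical (the filegroups must already exist in the database):

CREATE PARTITION FUNCTION pf_OrderYear (date)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

-- Three boundary values create four partitions, each mapped to its own filegroup
CREATE PARTITION SCHEME ps_OrderYear
AS PARTITION pf_OrderYear
TO (FG_Archive, FG_2022, FG_2023, FG_2024);

CREATE TABLE dbo.Orders
(
    OrderId   bigint NOT NULL,
    OrderDate date   NOT NULL,
    Amount    money  NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId, OrderDate)
) ON ps_OrderYear (OrderDate);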
In general, the toughest thing about partitioning is dealing with a table that wasn't designed this way from the beginning. In those cases, work in your development environment to script out an entire solution for migrating the data over to a partitioned table, and then work little by little to move it all over on your production server.
Because table partitioning spreads your table across multiple filegroups, managing backups is much easier and gives you more options in your backup plans.
Check for Duplicate, Redundant, or Unused Indexes
As time goes by, many databases start accumulating a lot of trash. Many people work on them, people apply DTA suggestions blindly, someone creates an index for a query that they ran once, and so on. The result is that a significant portion of your VLDB might be occupied by redundant or completely unused indexes.
Thanks to the DMVs, we can easily check the definition and usage of indexes and get rid of most of the trash. Look for indexes that might be subsets of others and speak to the developers to see if they're necessary. The same goes for indexes with the same keys but different INCLUDE columns; there might be an opportunity for index consolidation there.
Finally, use the index usage DMV (sys.dm_db_index_usage_stats) to look at your index usage pattern over time and over a full business cycle. If you still have unused indexes after an entire business cycle, propose disabling them and eventually dropping them.
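A minimal sketch of such a check in the current database; note that sys.dm_db_index_usage_stats is reset on instance restart, so the numbers only cover usage since the last restart:

-- Nonclustered indexes on user tables with no seeks, scans, or lookups recorded
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.database_id = DB_ID()
      AND us.object_id   = i.object_id
      AND us.index_id    = i.index_id
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0
ORDER BY table_name, index_name;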
The end result that we want is a lean, mean database schema where every table and index has a known purpose and function.
File-Snapshot Backups for Database Files in Azure
If your VLDB is on an Azure VM, with the database files stored on Azure Blob Storage in a premium storage account, you can make database backups orders of magnitude faster with file-snapshot backups. Backup times can drop from hours to seconds, depending on your workload.
SQL Server file-snapshot backup uses Azure snapshots to provide nearly instantaneous backups and quicker restores for database files stored in the Azure Blob storage service. A file-snapshot backup consists of a set of Azure snapshots of the blobs containing the database files, plus a backup file containing pointers to these file-snapshots. Each file-snapshot is stored in the container with the base blob.
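A minimal sketch of the syntax, assuming the database files already live on Azure Blob Storage and a credential with access to the container has been created; the storage account and container names are hypothetical:

BACKUP DATABASE MyDb
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDb.bak'
WITH FILE_SNAPSHOT;  -- takes Azure snapshots of the data and log blobs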
Below are links to a good article (SQL Server VLDB in Azure) describing the process, as well as some disadvantages, and to the documentation for file-snapshot backups.
- SQL Server VLDB in Azure: https://techcommunity.microsoft.com/t5/datacat/sql-server-vldb-in-azure-dba-tasks-made-simple/ba-p/305505
- File-Snapshot Backups for Database Files in Azure: https://docs.microsoft.com/en-us/sql/relational-databases/backup-restore/file-snapshot-backups-for-database-files-in-azure?view=sql-server-ver15#using-azure-snapshots-to-back-up-database-files-stored-in-azure
https://docs.microsoft.com/en-us/sql/relational-databases/backup-restore/file-snapshot-backups-for-database-files-in-azure?view=sql-server-ver15#using-azure-snapshots-to-back-up-database-files-stored-in-azure