SQL Server Database – Detect Page Level Corruption Automatically
Summary: SQL Server has a bunch of mechanisms built into it that allows it to automatically detect when a problem occurs when it is reading a page from disk, whether that page is corrupt for instance or the I/O just fails. It also has mechanisms that allow it to tell you when problems have occurred rather than just detecting them. And that is what we are going to talk about in this article. We are going to cover torn-page detection, page checksums, and how these two are different. Also, it advises on downloading the trial version of Stellar Repair for MS SQL software to fix database corruption.
SQL Server has a way of allowing data file pages to be what is called protected on disk. It is actually a real misnomer because what it really means is that it allows SQL Server to detect when a page has become corrupted. It is simply a way of SQL Server being able to tell that the I/O subsystem has corrupted a page. You can turn on page protection options or change the one you have by using ALTER DATABASE SET PAGE_VERIFY<option>, there are three different options that you can use:
- The first one is NONE, and I really do not recommend that you do that.
- TORN_PAGE DETECTION
And we are going to talk about what these two different options mean and how SQL Server is using them. No matter which of the two options you choose, it is all done by the buffer pool, and this is also known as the buffer manager or the buffer cache. It is the block of memory where SQL Server keeps the in-memory copies of data file pages.
If you think about it, every 8 KB data file page is really 16 contiguous 512-byte disk blocks, and for a page to be properly written out to disk, all 16 of those disk blocks have to be written out correctly. It’s possible of course for a page to not be completely written out. For instance, if a power failure occurs and a disk doesn’t have enough backup to be able to finish writing out all 16 of those disk blocks. When this occurs, it’s called a torn page. Now SQL Server has a mechanism that allows it to detect when a torn page has occurred.
The way that it does it is as follows:
- Takes two-bits from each sector, stores them in the page header (a 32-byte metadata structure in each data file page) and writes an alternating bit pattern in each sector
- 01 in the first, 10 in the second, 01 in the third, etc.
- The bit pattern flips each time the page is written to disk
- On a subsequent read, if the pattern is disrupted, the page is torn
Now, this is all very well for detecting when a torn page occurs, but it doesn’t allow SQL Server to detect when a problem has occurred or a corruption has occurred within a disk sector. This is why page checksums were invented.
ALTER DATABASE [Company]
SET PAGE_VERIFY TORN_PAGE_DETECTION;
It was introduced in SQL Server 2005 and it is a checksum per page. Any database that you created on SQL Server 2005 onward has page checksums enabled by default apart from tempdb. Tempdb did not get page checksums until SQL Server 2008. So if you have a server that you have upgraded in place from SQL Server 2005 to a version after SQL Server 2005, you must make sure that you manually go and enable page checksums for tempdb using the ALTER DATABASE code that I showed you.
- Performed by the buffer pool:
- Calculated as a four-byte value and stored in the page header as the very last thing SQL Server does on a physical write
- Recalculated and checked against the stored value as the very first thing SQL Server does on a physical read
- Upgraded databases from before SQL Server 2005 must enable it
- Switching it on doesn’t do anything until pages are written
- Switching it on does not erase existing torn page detection
- Negligible CPU overhead as simple checksum algorithm
- Error detecting, not error correcting
- Provides the “smoking gun” that the error is not due to SQL Server
- Helps in hardware vendor vs. Microsoft arguments
- Checked when:
- Page is read normally
- Page is read during consistency checks
- Page is read during BACKUP … WITH CHECKSUM
- Page is read from within a checksum’d backup
ALTER DATABASE [Company]
SET PAGE_VERIFY CHECKSUM;
- You must enable one of the above methods as soon as possible because the faster you realize you have corruption, the more likely you will be able to recover with minimal downtime and data loss
- Get prepared with a SQL Database Repair software like Stellar Repair for MS SQL software
As I mentioned before, almost every DBA is going to see corruption at some point in their career, so it is really useful to understand the different types of corruption and why it is so inevitable that you are going to see corruption at some point. So I encourage you to start downloading the trial version of Stellar Repair for MS SQL software to be prepared for the most critical situation you might face. I hope this article has been informative for you.