SQL Server Production Database Corruption – Causes & Fixes

SYSTEM DOWN: Database Error 8928 and application not working!!!!

I woke up at 5 am, not to my sounding alarm clock but to email alerts and a phone call. Hello? Sam – the system administrator on the other end of the line in panicky voice says: “All east division users are down on the service desk application. They can log into the application but cannot pull up any client’s data. I have already refreshed app pool, rebooted application servers and still cannot get it to work”. The database alerting system I created already gave me a lead on what was going on and where to start troubleshooting right away. Multiple tickets submitted by the system users provided details as well from the front-end. The screenshot below shows an instance of issue experienced.

It looks like the application tries to fix database errors encountered but still unable to recover from the failure. Continue to read to find what was the cause of this database corruption, the quick and to some the easy fix.

CAUSE                                        

In this situation, the impacted application was a system with SLA max of 2 hours downtime. Time was not something that I had at that moment, hence I had to immediately provide a resolution and then go back through logs to identify root cause. For the purpose of this article, I will provide the root cause first for the reader to have a better understanding of the solution.

Long story short, one of the email alerts indicated the SQL Server experienced a dirty shutdown a few minutes before the application went down. I got a confirmation from system administrator that some servers (including this database server) had experienced a shutdown due to major failure on the host machine. The SQL Server database went into an inconsistency state due to the dirty shutdown as there were in-flight transactions during that time. If you ever run into this scenario, follow the steps below and you should be back up and running in no time with respect to server resources and size of database.

FIX                                              

Identify Corrupt Database

  • Login into SQL instance, under object explorer you would find the database(s) that are possibly corrupted as highlighted in screenshot below. A suspect database is where the database is in an inconsistent/corrupt state.
  • In order to identify the corruption, run a DBCC CHECKDB on the database using query below.
  • You should have a result as shown below. It will identify all allocation and consistency errors and show which database objects are corrupted. It will also suggest what is the minimum repair level for the corruption.  NOTE: A DBCC repair is not guaranteed and database may still be corrupted after the repair.

Restore Corrupt Database

As a database administrator, planning for such disasters is an important part of the job and you need to have a recovery plan set up that is all ready to roll. In my case, the production database was setup in FULL RECOVERY model and transaction logs were backed up every 1 hour. With this plan, I could successfully restore the database using the point in time method. I was able to restore and achieve our Recovery Time Objective (RTO) requirement.

NOTE: The trick here is to restore database as a different name and make sure database is consistent and no other issues. You can then rename or drop the suspect database and rename the newly created database to its original

RESTORE DATABASE Corrupt_Database_Name 
   FROM CorruptDBBackups 
   WITH FILE=3, NORECOVERY; 
RESTORE LOG Corrupt_Database_Name    
   FROM CorruptDBBackups 
   WITH FILE=4, NORECOVERY, STOPAT = ‘Sep 29, 2019 12:00 AM’; –specify your target time on STOPAT clause
RESTORE LOG Corrupt_Database_Name    
FROM CorruptDBBackups 
WITH FILE=5, NORECOVERY, STOPAT = ‘Sep 29, 2019 12:00 AM’; –specify your target time on STOPAT clause
RESTORE DATABASE Corrupt_Database_Name WITH RECOVERY;

  The restore process when run will progress and complete as shown below.

  Check Integrity and Status

The final step is to perform one more integrity check on newly restored database to make sure there are no issues and online.

  • Run another DBCC CHECKDB on the database and verify results as shown below screenshot.
  • As shown in results below, the restored database is ONLINE, set to FULL recovery model and allow multi-user access.

CONCLUSION

So, there you have it! If you have a good disaster recovery plan setup, then you should have no problems when your production database or any other database gets corrupted. Now let’s say you find yourself in a situation where a proper DR plan was not established and there aren’t any backups to restore. You can consider utilizing the minimal repair level reported by the DBCC CHECKDB when you run the integrity check on the suspect database.

Remember that the repair feature of SQL Server is not robust and a guaranteed solution. For a faster, and more versatile repair that would bring your corrupt SQL database back into a working start with minimal data loss, look no further with Stellar SQL database recovery tool. It repairs faster using an advanced repair algorithm and even can recover deleted data in your database. Click the mentioned link to learn more!

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.