How to Fix Exchange DAG Witness Failed State

Summary: Database Availability Groups (DAGs), introduced with Exchange 2010, let organizations group multiple mailbox servers to achieve high availability and site resilience. When a DAG has an even number of member servers, it uses a file share witness server to maintain quorum. If this witness server fails or goes offline, the DAG loses its tie-breaking vote, and a further member failure can cost the cluster its quorum, dismount databases, and disrupt email flow. Thus, it's critical to fix a failed witness server. In this blog, you will learn some simple solutions to fix the failed witness server issue and restore mailboxes using Exchange recovery software.

A Microsoft Exchange Database Availability Group (DAG) is configured with a witness server and a witness directory (created automatically by Exchange on the witness server), which it uses to maintain quorum.

The witness server, also called a file share witness (FSW), supports automatic failover. When the DAG has an even number of members, it provides the tie-breaking vote that lets the cluster keep quorum, so the surviving members can stay online and keep the databases mounted.

Sometimes, underlying issues or a misconfiguration can put the DAG witness server into a failed state, leaving the DAG unhealthy and compromised. The witness may also show as failed if the server doesn’t boot due to a hardware or software failure.

To check the witness server status in the DAG, use the Get-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell (EMS):

Get-DatabaseAvailabilityGroup -Identity "DAG01" -Status | ft Name, Witness*, Servers

If the witness server has failed, the output displays the following warning:

WARNING: Database availability group 'DAG01' witness is in a failed state. The database availability group requires the witness server to maintain quorum. Please use the Set-DatabaseAvailabilityGroup cmdlet to re-create the witness server and the directory.
WitnessServer : fsw.domain.local
WitnessDirectory : C:\DAGFileShareWitnesses\DAG1.domain.local
AlternateWitnessServer :
AlternateWitnessDirectory :
WitnessShareInUse : InvalidConfiguration
DxStoreWitnessServers :
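
The same check can be scripted so that a failed witness is flagged without reading the table by eye. The following is a minimal sketch, assuming it runs in the Exchange Management Shell with permission to query the DAG. "DAG01" is the example DAG name used above, and the warning text is illustrative rather than anything Exchange itself prints.

```powershell
# Sketch: flag a DAG whose witness reports InvalidConfiguration.
# Assumes the Exchange Management Shell; "DAG01" is the example DAG name.
# WitnessShareInUse is only populated when -Status is specified.
$dag = Get-DatabaseAvailabilityGroup -Identity "DAG01" -Status

# Convert to a string so the comparison also works on deserialized objects.
$state = "$($dag.WitnessShareInUse)"

if ($state -eq 'InvalidConfiguration') {
    Write-Warning "Witness for $($dag.Name) is in state '$state' (server: $($dag.WitnessServer))."
} else {
    Write-Host "Witness state for $($dag.Name): $state"
}
```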

In this blog, you will learn a simple solution to fix the failed witness server state and bring your DAG back to a healthy state.

Methods to Resolve DAG Witness Server Failed State in Exchange

When the witness server has failed because of a hardware or software issue rather than a network-related problem, set up a new witness server and then point the DAG to the new witness server and witness directory using the Set-DatabaseAvailabilityGroup cmdlet. The command is as follows:

Set-DatabaseAvailabilityGroup -Identity "DAGName" -WitnessServer "NewFileWitnessServerName" -WitnessDirectory NonRootLocalLongFullPath

For instance:

Set-DatabaseAvailabilityGroup -Identity "DAG01" -WitnessServer "FSW02.abc.com" -WitnessDirectory C:\DAG01

If Windows Firewall on the new witness server blocks file sharing, you may get the following warning message in the output:

WARNING: Unable to access file shares on witness server 'FSW02.abc.com'. The database availability group may be more vulnerable to failures until this problem is corrected. You can use the Set-DatabaseAvailabilityGroup cmdlet to try the operation again. Error: The network path was not found Unable to change the quorum for database availability group DAG01. The network path for witness server '\\FSW02.abc.com\DAG01.abc.com' was not found. This may be due to firewall settings.

In such a case, either add a Windows Firewall exception for File and Printer Sharing (SMB, TCP port 445) on the witness server or disable Windows Firewall there. Then run the cmdlet again.
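
A minimal sketch of the firewall exception, assuming Windows Server 2012 R2 or later (where the Enable-NetFirewallRule and Test-NetConnection cmdlets are available) and an English-language OS, since the rule group name is localized. FSW02.abc.com is the example witness server from above.

```powershell
# On the witness server: enable the built-in "File and Printer Sharing"
# firewall rule group, which includes the SMB rule for TCP 445.
Enable-NetFirewallRule -DisplayGroup "File and Printer Sharing"

# On a DAG member: confirm that TCP 445 on the witness server is reachable.
$check = Test-NetConnection -ComputerName "FSW02.abc.com" -Port 445
$check.TcpTestSucceeded   # True when the port accepts connections
```

Enabling the rule group keeps the rest of the firewall in place, which is generally less intrusive than turning the firewall off.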

To verify the new DAG witness server, execute the following cmdlet:

Get-DatabaseAvailabilityGroup -Identity "DAG01" -Status | ft Name, Witness*, Servers

If the output displays the new witness server and witness directory, you have successfully changed the witness server.

You may also change the witness server and witness directory in the Exchange Admin Center (EAC) by editing the DAG under servers > database availability groups.

To verify the DAG witness server in the EAC, check the witness server name under servers > database availability groups. Also, check that the witness directory was created successfully on the witness server.

IMPORTANT NOTE: After this, you must exclude the witness directory on the witness server from antivirus scanning.
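
If the witness server uses Microsoft Defender Antivirus (an assumption; other antivirus products have their own exclusion settings), the exclusion can be added with a sketch like the following, run in an elevated PowerShell session on the witness server. The path C:\DAG01 is the example witness directory used above.

```powershell
# Sketch for Microsoft Defender Antivirus only (assumed product).
# Exclude the example witness directory from scanning.
Add-MpPreference -ExclusionPath "C:\DAG01"

# List the configured path exclusions to confirm the entry was added.
(Get-MpPreference).ExclusionPath
```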

Alternate Solution

If the above solution didn’t work for you and your witness server is still up and reachable, check the state of the cluster resources by running the Get-ClusterResource cmdlet on a DAG member server.

If the output shows the File Share Witness resource in a Failed state, bring it back online using the following cmdlet:

Get-ClusterResource | Start-ClusterResource

This attempts to bring every cluster resource online, including the FSW. If the FSW returns to the Online state, you don’t need to perform any additional action.
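
Because the pipeline above acts on every cluster resource, a narrower variant may be preferable. This is a sketch, assuming the FailoverClusters module is available on the DAG member where it runs; it starts only File Share Witness resources that are not already online.

```powershell
# Sketch: start only the File Share Witness resource(s) that are not online.
# Assumes the FailoverClusters module is installed on this DAG member.
Import-Module FailoverClusters

Get-ClusterResource |
    Where-Object {
        $_.ResourceType.Name -eq 'File Share Witness' -and
        "$($_.State)" -ne 'Online'
    } |
    Start-ClusterResource
```

Running Get-ClusterResource again afterwards shows whether the resource reached the Online state.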

Conclusion

The witness server is an important DAG component required to maintain quorum. However, it may go offline or fail after a reboot, leading to a failed witness server state that compromises failover clustering. In such a critical situation, try to bring the witness server back online or switch to a new witness server and witness directory. If a member server fails during these operations or a database dismounts due to inconsistencies, you can use your backup to restore the database and mailboxes. If backups are not available, you can use Exchange recovery software, such as Stellar Repair for Exchange, to repair the database, extract the mailboxes, and save them as PST files. You may also export the mailboxes directly to your live Exchange server or Office 365.
