In Exchange 2013, a client can receive multiple IP addresses from DNS for a given fully qualified domain name (FQDN). This is possible because of a major change in Exchange 2013: every client request is HTTP-based (for example, Outlook, Outlook Anywhere, and OWA). Because clients are given multiple IP addresses for the destination, failover can happen at the client's end. If one IP address fails, the client has one or more other IP addresses it can use to connect. When a connection attempt to an IP address fails, the client waits about 20 seconds and then tries the next IP address in the list, so automatic recovery should take approximately 21 seconds.
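The client-side behavior described above can be sketched in a few lines of Python. This is only an illustration of the general idea (resolve a name to several addresses, then try them in order with a timeout), not how Outlook or a browser is actually implemented. The function names, the 20-second default, and the injectable `connector` parameter are assumptions made for this example.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple


def resolve_all(fqdn: str, port: int = 443) -> List[str]:
    """Return every distinct IP address that DNS returns for the name."""
    infos = socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr is the fifth field; its first item is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_available(
    addresses: Iterable[str],
    port: int = 443,
    timeout: float = 20.0,  # assumed value, mirroring the ~20 s wait in the text
    connector: Callable = socket.create_connection,
) -> Optional[Tuple[str, object]]:
    """Try each address in order; return (ip, connection) for the first success.

    Returns None if every address fails.
    """
    for ip in addresses:
        try:
            return ip, connector((ip, port), timeout)
        except OSError:
            # Covers refused connections and timeouts; move on to the next IP.
            continue
    return None
```

With a hypothetical namespace such as `mail.example.com`, a caller would run `connect_first_available(resolve_all("mail.example.com"))` and get back whichever address answered first. The `connector` argument exists only so the failover logic can be exercised without a real network.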
If you lose your CAS array or load balancer in the primary datacenter, you don't have to perform a datacenter switchover or change DNS to point clients to the CAS servers in the secondary datacenter. Because clients have already received the IP addresses of both CAS arrays, they connect to the secondary datacenter automatically. Once a client connects there, the secondary datacenter's CAS server proxies the mailbox access request back to the primary datacenter. You can then spend your time recovering or replacing the failed load balancer rather than performing a DAG switchover or a DNS change.
Suppose the load balancer fails intermittently: the device is up and accepting requests but is not actually processing them. In this scenario, you may have to perform a manual namespace switchover by removing the primary datacenter's VIP from DNS. During this period, no client will try to connect to the primary datacenter VIP; clients will connect to the secondary datacenter VIP instead. After you replace the load balancer, you can add the VIP back to DNS, and clients will start using the primary datacenter VIP again.
A DAG's quorum is lost when the majority of the DAG nodes are down. A DAG with an even number of members uses the "Node and File Share Majority" quorum mode, in which the nodes and the file share witness all vote. A DAG with an odd number of members uses the "Node Majority" quorum mode, in which only the nodes vote. Suppose the DAG is spread across two datacenters and the majority of the nodes are lost because of a datacenter or network failure. To recover the DAG in the secondary datacenter, you need to run commands manually to evict the primary datacenter nodes from the DAG (cluster), re-establish a majority among the nodes in the secondary datacenter, and mount the databases there.
Quorum loss due to primary datacenter failure
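The quorum rules above come down to simple vote counting, which the following Python sketch makes concrete. It is a simplified model written for this article: it assumes one vote per DAG member plus one for the file share witness when the member count is even, and it ignores refinements such as dynamic quorum in newer versions of Windows Server. The function names are made up for the example.

```python
def quorum_mode(dag_members: int) -> str:
    """Pick the quorum mode from the number of DAG members."""
    if dag_members % 2 == 0:
        return "Node and File Share Majority"
    return "Node Majority"


def has_quorum(dag_members: int, members_up: int, witness_up: bool = True) -> bool:
    """Return True if the reachable voters form a strict majority.

    The witness only counts as a voter when the member count is even.
    """
    uses_witness = dag_members % 2 == 0
    total_votes = dag_members + (1 if uses_witness else 0)
    votes_up = members_up + (1 if uses_witness and witness_up else 0)
    return votes_up > total_votes // 2


# Four-member DAG, two members per datacenter, witness in the primary site.
# Losing the primary site takes two members and the witness with it.
primary_site_lost = has_quorum(dag_members=4, members_up=2, witness_up=False)
```

In that last scenario only two of five votes remain, so `primary_site_lost` is `False`. This is the situation where the manual recovery steps described above are needed.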
You can also protect a DAG from network failure or quorum loss by splitting it between two datacenters (Active/Active) and placing the file share witness server in a third datacenter. DAG members in either datacenter can then reach the witness for arbitration, regardless of the state of the network between the two datacenters that contain DAG members.
Protecting the DAG from quorum loss by placing the file share witness server in the third datacenter
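To see why the witness location matters, the short Python sketch below compares the two placements when an entire datacenter fails. It uses the same simplified vote counting as before (one vote per member, plus one for the witness when the member count is even). The site names `dc1`, `dc2`, and `dc3` and the function name are hypothetical, and partial network partitions are not modeled.

```python
from typing import Dict


def keeps_quorum(nodes_by_site: Dict[str, int], witness_site: str, failed_site: str) -> bool:
    """Return True if the DAG still has a majority after one site fails."""
    total_members = sum(nodes_by_site.values())
    uses_witness = total_members % 2 == 0
    total_votes = total_members + (1 if uses_witness else 0)

    # Count the members that are not in the failed site.
    surviving = sum(n for site, n in nodes_by_site.items() if site != failed_site)
    # The witness vote survives only if the witness sits outside the failed site.
    if uses_witness and witness_site != failed_site:
        surviving += 1
    return surviving > total_votes // 2


dag = {"dc1": 2, "dc2": 2}

# Witness in the primary datacenter: losing dc1 costs quorum.
witness_in_primary = [keeps_quorum(dag, "dc1", failed) for failed in ("dc1", "dc2")]

# Witness in a third datacenter: losing either member site keeps quorum.
witness_in_third = [keeps_quorum(dag, "dc3", failed) for failed in ("dc1", "dc2")]
```

Here `witness_in_primary` evaluates to `[False, True]` and `witness_in_third` to `[True, True]`. With the witness in a third site, the surviving datacenter always holds three of the five votes.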
Exchange 2013 provides multiple options for HA and site resilience. Microsoft has tried to keep human intervention to a minimum and let Exchange recover by itself from many kinds of failure, so that the administrator can focus on recovering the failed hardware or server rather than on recovering the service.
I hope you now have a clear picture of the various types of failures and how to recover from them. Please leave a comment if you have any questions.