An Exchange 2013 deployment has to plan for three types of failure:
- Database or disk failure
- Server failure
- Site/Datacenter failure
In this part, I cover the first two failure types. Part 2 covers site/datacenter failures.
Database failure
An Exchange database can fail for a variety of reasons, such as database corruption or, in a “Just a Bunch of Disks” (JBOD) environment, failure of the disk that holds the database. Active Manager runs on every member of a “Database Availability Group” (DAG) and manages the database copies across the whole DAG. When an active database copy fails, Active Manager runs its “best copy and server selection” process to pick the healthiest available copy on the best available server. In Exchange 2013, the selection rules also take into account the health of each candidate server as reported by Managed Availability. The selected copy is then activated.
Active Manager activated DB1
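To make the selection idea more concrete, here is a simplified Python model of best copy and server selection. It is not Exchange's actual algorithm or API: the `DatabaseCopy` fields, the `server_healthy` flag, and the queue thresholds are assumptions chosen for illustration. The model sorts candidate copies by copy queue length (less potential data loss first), uses activation preference to break ties, and then applies progressively looser rules until one copy qualifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DatabaseCopy:
    server: str
    server_healthy: bool        # hypothetical stand-in for server and protocol health checks
    copy_queue_length: int      # log files not yet copied to this server (potential data loss)
    replay_queue_length: int    # log files copied but not yet replayed (slows down mounting)
    activation_preference: int  # 1 = most preferred


# Rules are tried in order, each one less strict than the one before.
# The thresholds are illustrative assumptions, not Exchange's exact values.
SELECTION_PASSES: List[Callable[[DatabaseCopy], bool]] = [
    lambda c: c.server_healthy and c.copy_queue_length < 10 and c.replay_queue_length < 50,
    lambda c: c.server_healthy and c.copy_queue_length < 10,
    lambda c: c.server_healthy,
]


def select_best_copy(copies: List[DatabaseCopy]) -> Optional[DatabaseCopy]:
    """Return the copy to activate, or None if no copy passes any rule."""
    # Least potential data loss first; activation preference breaks ties.
    ordered = sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))
    for rule in SELECTION_PASSES:
        for candidate in ordered:
            if rule(candidate):
                return candidate
    return None  # nothing qualifies, so the database stays dismounted


if __name__ == "__main__":
    passive_copies = [
        DatabaseCopy("EX02", True, 2, 80, 2),
        DatabaseCopy("EX03", True, 4, 5, 3),
        DatabaseCopy("EX04", False, 0, 0, 4),
    ]
    print(select_best_copy(passive_copies).server)  # prints EX03
```

In the example, EX04 has the shortest queues but sits on an unhealthy server, and EX02 has a long replay queue, so the model activates the copy on EX03.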
Exchange provides options for quick recovery in case of database failures. For example:
- Exchange 2013 introduces AutoReseed. If a disk fails, the affected database copies are automatically reseeded onto a spare volume that is already present in the server. For this to work, the spare disks and the DAG's database and volume folder structure must be configured in advance.
- Exchange 2013 keeps the lagged database copy option from Exchange 2010. A lagged copy protects against database corruption by delaying log replay into that copy for a configurable period. It works well with “Safety Net,” a new Exchange 2013 transport feature: a queue on the Mailbox server that holds copies of messages already delivered to an active mailbox database, for a number of days that you specify. For example, if the active database becomes corrupted, you can suspend replication for the lagged copy, make a copy of the lagged database and its log files, remove the log files generated after the point of corruption (keeping only the log files needed up to that point), mount the database, and then have Safety Net redeliver the messages received since then.
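The lagged copy recovery above rests on two simple checks, sketched here in Python under simplifying assumptions: each log generation is represented only by a hypothetical "closed at" time, and both function names are made up for illustration. The first function splits the log generations into those to replay and those to move out of the log folder for a chosen recovery point. The second checks that Safety Net holds messages at least as long as the copy lags, because messages delivered during the lag window can only be redelivered if Safety Net still has them. In Exchange itself, these two durations correspond to the `SafetyNetHoldTime` setting (Set-TransportConfig) and the `ReplayLagTime` setting of the database copy.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_logs_for_recovery(
    log_closed_at: Dict[int, datetime], recovery_point: datetime
) -> Tuple[List[int], List[int]]:
    """Split log generations into (keep, remove) for a point-in-time recovery.

    Logs closed at or before the recovery point are replayed; later logs may
    contain the corruption, so they are moved out of the log folder.
    """
    keep = sorted(gen for gen, closed in log_closed_at.items() if closed <= recovery_point)
    remove = sorted(gen for gen, closed in log_closed_at.items() if closed > recovery_point)
    return keep, remove


def safety_net_covers_lag(safety_net_hold: timedelta, replay_lag: timedelta) -> bool:
    """Messages delivered during the lag window can only be redelivered if
    Safety Net still holds them, so the hold time should be at least the lag."""
    return safety_net_hold >= replay_lag


if __name__ == "__main__":
    start = datetime(2013, 12, 20, 8, 0)
    # Hypothetical log generations 1-6, closed hourly from 09:00 to 14:00.
    logs = {gen: start + timedelta(hours=gen) for gen in range(1, 7)}
    keep, remove = split_logs_for_recovery(logs, datetime(2013, 12, 20, 12, 0))
    print(keep, remove)  # prints [1, 2, 3, 4] [5, 6]
    print(safety_net_covers_lag(timedelta(days=2), timedelta(days=3)))  # prints False
```

The second result shows a configuration gap: with a three-day lag and a two-day Safety Net hold, messages from the oldest part of the lag window could not be redelivered.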
Server failure
If a Mailbox server fails or restarts, Active Manager runs the best copy and server selection process for every database that was active on that server, picks the best available copy of each, and mounts it on another DAG member.
If one or more Client Access Server (CAS) servers in a load-balanced CAS array fail, the remaining CAS servers take over the load, so users are typically unaffected as long as the load balancer stops sending connections to the failed servers. Similarly, if one or more Mailbox servers in the DAG fail, Active Manager automatically mounts healthy passive copies of the affected databases on the remaining DAG members.
Server failure and database recovered on the best available server
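The following Python sketch models both cases with hypothetical data structures; neither function is an Exchange API. For each database that was active on the failed Mailbox server, `fail_over_server` picks the first surviving server, in activation-preference order, that holds a healthy passive copy. This is a deliberate simplification of best copy and server selection. `route_request` shows how a load balancer could spread client requests across only the CAS servers that are still up, using plain round robin.

```python
from typing import Dict, List, Optional, Set, Tuple


def fail_over_server(
    active_on: Dict[str, str],
    copy_servers: Dict[str, List[str]],
    healthy_copies: Set[Tuple[str, str]],
    failed_server: str,
) -> Dict[str, Optional[str]]:
    """Pick a new host for every database that was active on failed_server.

    active_on:      database -> server currently hosting the active copy
    copy_servers:   database -> servers holding a copy, in activation-preference order
    healthy_copies: (database, server) pairs whose passive copy is healthy
    Returns database -> new server, or None if no healthy copy survives.
    """
    moves: Dict[str, Optional[str]] = {}
    for db, server in active_on.items():
        if server != failed_server:
            continue  # databases active on other servers are unaffected
        moves[db] = next(
            (s for s in copy_servers[db] if s != failed_server and (db, s) in healthy_copies),
            None,
        )
    return moves


def route_request(request_id: int, cas_servers: List[str], down: Set[str]) -> str:
    """Round-robin a request across the CAS servers that are still up."""
    up = [s for s in cas_servers if s not in down]
    if not up:
        raise RuntimeError("no Client Access servers available")
    return up[request_id % len(up)]


if __name__ == "__main__":
    active_on = {"DB1": "EX01", "DB2": "EX01", "DB3": "EX02"}
    copy_servers = {
        "DB1": ["EX01", "EX02", "EX03"],
        "DB2": ["EX01", "EX03", "EX02"],
        "DB3": ["EX02", "EX01"],
    }
    healthy = {("DB1", "EX03"), ("DB2", "EX03"), ("DB2", "EX02"), ("DB3", "EX01")}
    print(fail_over_server(active_on, copy_servers, healthy, "EX01"))
    # prints {'DB1': 'EX03', 'DB2': 'EX03'}
    print([route_request(i, ["CAS1", "CAS2", "CAS3"], {"CAS2"}) for i in range(3)])
    # prints ['CAS1', 'CAS3', 'CAS1']
```

DB1 skips EX02 because its copy there is not healthy, and DB3 stays where it is because EX02 did not fail.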
“Managed Availability” is a new built-in monitoring and recovery feature in Exchange 2013 that helps CAS and Mailbox servers detect and recover from failures. It has three components. Probes collect data about the server, for example by running synthetic transactions. Monitors analyze the collected data to decide whether the server is healthy. Responders take the necessary recovery action, such as recycling an application pool, restarting a service, or even restarting the server itself. If none of these actions succeeds, the problem is escalated and the administrator is alerted through the event log.
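The probe, monitor, and responder flow can be sketched in Python as a toy model. It is not Managed Availability's implementation: the recovery action names, the failure threshold, and the use of Python's `logging` module in place of the Windows event log are all assumptions. The monitor marks the server unhealthy when enough probes fail, and the responder works through increasingly disruptive actions, re-checking health after each one, before escalating.

```python
import logging
from typing import Callable, List

log = logging.getLogger("managed_availability_sketch")

# Hypothetical recovery ladder, from least to most disruptive.
RECOVERY_ACTIONS = ["recycle application pool", "restart service", "restart server"]


def monitor_is_unhealthy(probe_results: List[bool], failure_threshold: int) -> bool:
    """Monitor: the server is unhealthy if enough probes failed (False = failed)."""
    return probe_results.count(False) >= failure_threshold


def respond(
    run_probes: Callable[[], List[bool]],
    perform: Callable[[str], None],
    failure_threshold: int = 3,
) -> List[str]:
    """Responder: try each recovery action until the monitor reports healthy."""
    taken: List[str] = []
    if not monitor_is_unhealthy(run_probes(), failure_threshold):
        return taken  # healthy, nothing to do
    for action in RECOVERY_ACTIONS:
        perform(action)
        taken.append(action)
        if not monitor_is_unhealthy(run_probes(), failure_threshold):
            return taken
    # Every action failed: escalate (logging stands in for the Windows event log).
    log.error("recovery failed after: %s; alerting administrator", ", ".join(taken))
    taken.append("escalate")
    return taken


if __name__ == "__main__":
    state = {"healthy": False}

    def perform(action: str) -> None:
        # Simulated fault that only a service restart clears.
        if action == "restart service":
            state["healthy"] = True

    def run_probes() -> List[bool]:
        return [state["healthy"]] * 5  # five probe runs, all with the same outcome

    print(respond(run_probes, perform))
    # prints ['recycle application pool', 'restart service']
```

Ordering the actions from least to most disruptive means a server restart only happens when the cheaper fixes have already failed.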
I will cover the site resilience feature of Exchange 2013 in the next part.