Recommended Exchange Database Distributions may be Sub-optimal in Enterprise Environments

In a previous post, (see: My Modulus Obsession Part II) I touched on the way the Exchange Server 2019 Sizing Calculator distributes databases across servers and disks. That was really just background information to discuss the use of the modulus operator in distribution algorithms. However, in this post I’ll be discussing the database distribution itself, and particularly why it might not be optimal in some Enterprise environments.

Calculator Generated Database Distribution

To review, the above image is the distribution generated by the calculator. This is an active/passive datacenter design where the users are geographically proximate to the primary data center. For performance reasons and withstanding a DR event the active DBs should always be hosted in the primary datacenter. The pattern is designed to distribute databases evenly, but also to minimize recovery time if a disk fails. For example, if Vol1 fails on server ExSrv-01 The reseed sources will be as follows:

Database  SourceServer
--------  ------------
DB001     ExSrv-02
DB004     ExSrv-04
DB005     ExSrv-03
DB007     ExSrv-03

As opposed to a simple alternating distribution, which would use 2 servers to reseed the 4 DBs, this distribution results in sourcing from 3 servers. Ostensibly, this is done to improve reseed performance and return the DBs full HA and redundancy as quickly as possible.

However, in my experience the calculator is designed to drive your deployment toward the Exchange preferred architecture (PA) and is heavily influenced by Office 365’s operational practices and preferences. The problem is O365 operates at a much larger scale and with far more manpower than even the largest of enterprises. Therefore, the operational considerations, priorities and resulting practices may be very different.

The problem I quickly noticed with the this distribution is that you can’t take more than 1 server out of service in the primary datacenter at the same time. For example, ExSrv-01’s primary DB copies have their secondary copies spread out across the other 3 servers. Take any one of the other servers out of service and there are no activation options for those DBs in the primary site.

Using the calculators failure simulation confirms that taking any 2 servers out of service will result in lack of availability in the primary site. Granted, if you’ve configured the DAG correctly users should still have access through a DB in the secondary site. However, given the design goals, that isn’t an optimal user experience.

Server failure or maintenance scenario.

You can see from the image, if servers 1 & 2 are taken out of service DB001, 10 & 13 etc. will have no secondary copy to resort to in the primary site. The DBs will switchover to the tertiary copies in the secondary datacenter. Likewise, If I were to take servers 1 & 3 out of service, DB017, 19 (not pictured) etc. will switchover similarly. Taking any 2 servers out of service will have similar outcome, a subset of DBs will have to switch to the secondary datacenter. Furthermore, you wouldn’t be able to perform maintenance on the same server numbers in the secondary datacenter. If you were to down ExSrv-DR01 & 02, while already working on ExSrv-01 & 02 the DBs would have no where to mount and an outage would ensue. Even if you could gain greater operational concurrency for example by working on ExSrv-DR03 & 04 at the same time, this makes for a confusing and likely error prone dance.

This characteristic poses a major operational challenge. To avoid degraded service during routine maintenance, i.e. Windows or Exchange updates, we’d only be able to work on 2 servers at a time, 1 in the primary datacenter and 1 in the secondary. Each server would need to be taken out of service updated to completion, then put back in service. Then repeating the sequence again for each server.

To further illustrate the impact, I’ll draw a comparison to the previous Exchange 2010 & 13 environments and procedures I was working with. Those DAGs only had 1 DB per volume and were arranged in an odd/even pattern. For example, DB001 – 024 had primary/secondary copies on ExSrv-01/02 respectively. That layout would’ve looked something like below:

Legacy 2013 DB layout.

Note: This is a contrived image. It didn’t come from an actual calculator exercise and is only meant for illustration.

This design was inherited from Exchange 2010 where only 1 DB per volume was supported. It worked well and was reused for Exchange 2013, which ran on similar hardware. This allows for 2 servers per site to be taken out of service at the same time. Again, in an odd/even pattern Servers 1 & 3 would move their DBs to 2 & 4 respectively. The process was then reversed in order to update servers 2 & 4. Updates could be run concurrently on twice as many servers compared to the new Exchange 2019 design.

Note: Reseed performance wasn’t a consideration in previous environments because the DB disks were mirrored , making reseeds rare.

Comparing the 2 scenarios, the maintenance cycle for the new Exchange 2019 environment was going to take at least twice as long as the old environment. In my view, this would substantially increase total cost of ownership (TOC) and cause a significant, albeit unquantifiable opportunity cost. Furthermore, considering the increased pace and urgency of Exchange security updates combined with quarterly CU’s and monthly Windows patching, investing still more time and manpower was unacceptable. I needed to find a new distribution, that could give us acceptable reseed performance, but mitigate or preferably eliminate the operational hindrance.

Not having any idea how the calculator internally determines the distribution I decided to simply open a spreadsheet and do some trial and error calculations. Luckily I quickly found a working distribution; simply distribute the DB copies in a progressive fashion, like below:

Improved DB Distribution

Note: Again this is a contrived image. it wasn’t generated by the calculator

This simplified distribution results in the same number of active/passive DB copies per disk and server. However, and coincidentally, it allows for odd and even numbered servers to be taken out of service at the same time. Between the 2 datacenters 4 servers can be taken out of service concurrently, matching the operational capabilities of the previous environment. In the event of a disk failure, we’d lose the marginal benefit of reseeding from 3 servers, instead 2 servers would be uses as reseed sources. For example, if Vol1 fails on ExSrv-01 its primary copies will be reseeded from ExSrv-02, while its secondary copies will be reseeded from ExSrv-04. I felt the change in the reseed pattern was an acceptable tradeoff. I reasoned that the calculator’s recommended distribution already had at least 1 server acting as a source for 2 DB reseeds. Since I/O concerns are limited to the source server/disk pair, if it isn’t a source-side problem for 1 server then it isn’t an issue for a 2nd server either. Hence, the only conceivable loss, is that it may take slightly longer to reseed all DB copies. And, this pattern is still an improvement over the previous Exchange 2010 & 13 environments where reseeds were 1 disk to 1 disk.

Implementing the alternate distribution was actually quite easy. As discussed in My Modulus Obsession Part II, I had already written my own configuration scripts. I only needed to change 1 variable, $OffSet to create configuration objects for the new pattern. I also removed the $Gap variable which had defined the 3 alternating patters. Hence, $Offset now represents the single repeating pattern by itself. At the risk of being redundant I’ve posted the revised code below.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )
 
$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024'
)
 
$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count
 
$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $OffSet    = 1
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Returns the volume number 
     
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}
 
$DBConfigs | Format-Table -AutoSize

Conclusion:

O365 implements an unbound namespace model where client connections aren’t forced to a particular datacenter. There is no concept of primary/secondary or active/passive datacenters. Both datacenters are peers participating in an active/active capacity. Client connections may enter through either and may not take the shortest path to the mailbox. Microsoft’s apparent lack of concern with latency affords them the flexibility to allow DBs to switch between datacenters. Again, this may represent a gap between O365 priorities and those of the enterprise. For good reasons, the design discussed here is an active/passive datacenter model enforced using bound namespaces and doesn’t afford that flexibility. Nevertheless, the calculator returned a distribution that failed to account for the active/passive datacenter model, and the lack of said flexibility stemming from it. Had the resulting operational constraints not been realized they would’ve introduced significant costs and pain points to the organization.

While the Exchange 2019 Sizing Calculator remains an indispensable tool, enterprise planners need to be aware of the potential gap between the influential PA/O365 principles it incorporates and the goals/priorities of their own organization. Going into a sizing exercise with this knowledge may prime the engineer to spot concerns conflicts like I’ve discussed here.