Recommended Exchange Database Distributions may be Sub-optimal in Enterprise Environments

In a previous post, (see: My Modulus Obsession Part II) I touched on the way the Exchange Server 2019 Sizing Calculator distributes databases across servers and disks. That was really just background information to discuss the use of the modulus operator in distribution algorithms. However, in this post I’ll be discussing the database distribution itself, and particularly why it might not be optimal in some Enterprise environments.

Calculator Generated Database Distribution

To review, the above image is the distribution generated by the calculator. This is an active/passive datacenter design where the users are geographically proximate to the primary data center. For performance reasons and withstanding a DR event the active DBs should always be hosted in the primary datacenter. The pattern is designed to distribute databases evenly, but also to minimize recovery time if a disk fails. For example, if Vol1 fails on server ExSrv-01 The reseed sources will be as follows:

Database  SourceServer
--------  ------------
DB001     ExSrv-02
DB004     ExSrv-04
DB005     ExSrv-03
DB007     ExSrv-03

As opposed to a simple alternating distribution, which would use 2 servers to reseed the 4 DBs, this distribution results in sourcing from 3 servers. Ostensibly, this is done to improve reseed performance and return the DBs full HA and redundancy as quickly as possible.

However, in my experience the calculator is designed to drive your deployment toward the Exchange preferred architecture (PA) and is heavily influenced by Office 365’s operational practices and preferences. The problem is O365 operates at a much larger scale and with far more manpower than even the largest of enterprises. Therefore, the operational considerations, priorities and resulting practices may be very different.

The problem I quickly noticed with the this distribution is that you can’t take more than 1 server out of service in the primary datacenter at the same time. For example, ExSrv-01’s primary DB copies have their secondary copies spread out across the other 3 servers. Take any one of the other servers out of service and there are no activation options for those DBs in the primary site.

Using the calculators failure simulation confirms that taking any 2 servers out of service will result in lack of availability in the primary site. Granted, if you’ve configured the DAG correctly users should still have access through a DB in the secondary site. However, given the design goals, that isn’t an optimal user experience.

Server failure or maintenance scenario.

You can see from the image, if servers 1 & 2 are taken out of service DB001, 10 & 13 etc. will have no secondary copy to resort to in the primary site. The DBs will switchover to the tertiary copies in the secondary datacenter. Likewise, If I were to take servers 1 & 3 out of service, DB017, 19 (not pictured) etc. will switchover similarly. Taking any 2 servers out of service will have similar outcome, a subset of DBs will have to switch to the secondary datacenter. Furthermore, you wouldn’t be able to perform maintenance on the same server numbers in the secondary datacenter. If you were to down ExSrv-DR01 & 02, while already working on ExSrv-01 & 02 the DBs would have no where to mount and an outage would ensue. Even if you could gain greater operational concurrency for example by working on ExSrv-DR03 & 04 at the same time, this makes for a confusing and likely error prone dance.

This characteristic poses a major operational challenge. To avoid degraded service during routine maintenance, i.e. Windows or Exchange updates, we’d only be able to work on 2 servers at a time, 1 in the primary datacenter and 1 in the secondary. Each server would need to be taken out of service updated to completion, then put back in service. Then repeating the sequence again for each server.

To further illustrate the impact, I’ll draw a comparison to the previous Exchange 2010 & 13 environments and procedures I was working with. Those DAGs only had 1 DB per volume and were arranged in an odd/even pattern. For example, DB001 – 024 had primary/secondary copies on ExSrv-01/02 respectively. That layout would’ve looked something like below:

Legacy 2013 DB layout.

Note: This is a contrived image. It didn’t come from an actual calculator exercise and is only meant for illustration.

This design was inherited from Exchange 2010 where only 1 DB per volume was supported. It worked well and was reused for Exchange 2013, which ran on similar hardware. This allows for 2 servers per site to be taken out of service at the same time. Again, in an odd/even pattern Servers 1 & 3 would move their DBs to 2 & 4 respectively. The process was then reversed in order to update servers 2 & 4. Updates could be run concurrently on twice as many servers compared to the new Exchange 2019 design.

Note: Reseed performance wasn’t a consideration in previous environments because the DB disks were mirrored , making reseeds rare.

Comparing the 2 scenarios, the maintenance cycle for the new Exchange 2019 environment was going to take at least twice as long as the old environment. In my view, this would substantially increase total cost of ownership (TOC) and cause a significant, albeit unquantifiable opportunity cost. Furthermore, considering the increased pace and urgency of Exchange security updates combined with quarterly CU’s and monthly Windows patching, investing still more time and manpower was unacceptable. I needed to find a new distribution, that could give us acceptable reseed performance, but mitigate or preferably eliminate the operational hindrance.

Not having any idea how the calculator internally determines the distribution I decided to simply open a spreadsheet and do some trial and error calculations. Luckily I quickly found a working distribution; simply distribute the DB copies in a progressive fashion, like below:

Improved DB Distribution

Note: Again this is a contrived image. it wasn’t generated by the calculator

This simplified distribution results in the same number of active/passive DB copies per disk and server. However, and coincidentally, it allows for odd and even numbered servers to be taken out of service at the same time. Between the 2 datacenters 4 servers can be taken out of service concurrently, matching the operational capabilities of the previous environment. In the event of a disk failure, we’d lose the marginal benefit of reseeding from 3 servers, instead 2 servers would be uses as reseed sources. For example, if Vol1 fails on ExSrv-01 its primary copies will be reseeded from ExSrv-02, while its secondary copies will be reseeded from ExSrv-04. I felt the change in the reseed pattern was an acceptable tradeoff. I reasoned that the calculator’s recommended distribution already had at least 1 server acting as a source for 2 DB reseeds. Since I/O concerns are limited to the source server/disk pair, if it isn’t a source-side problem for 1 server then it isn’t an issue for a 2nd server either. Hence, the only conceivable loss, is that it may take slightly longer to reseed all DB copies. And, this pattern is still an improvement over the previous Exchange 2010 & 13 environments where reseeds were 1 disk to 1 disk.

Implementing the alternate distribution was actually quite easy. As discussed in My Modulus Obsession Part II, I had already written my own configuration scripts. I only needed to change 1 variable, $OffSet to create configuration objects for the new pattern. I also removed the $Gap variable which had defined the 3 alternating patters. Hence, $Offset now represents the single repeating pattern by itself. At the risk of being redundant I’ve posted the revised code below.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )
 
$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024'
)
 
$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count
 
$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $OffSet    = 1
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Returns the volume number 
     
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}
 
$DBConfigs | Format-Table -AutoSize

Conclusion:

O365 implements an unbound namespace model where client connections aren’t forced to a particular datacenter. There is no concept of primary/secondary or active/passive datacenters. Both datacenters are peers participating in an active/active capacity. Client connections may enter through either and may not take the shortest path to the mailbox. Microsoft’s apparent lack of concern with latency affords them the flexibility to allow DBs to switch between datacenters. Again, this may represent a gap between O365 priorities and those of the enterprise. For good reasons, the design discussed here is an active/passive datacenter model enforced using bound namespaces and doesn’t afford that flexibility. Nevertheless, the calculator returned a distribution that failed to account for the active/passive datacenter model, and the lack of said flexibility stemming from it. Had the resulting operational constraints not been realized they would’ve introduced significant costs and pain points to the organization.

While the Exchange 2019 Sizing Calculator remains an indispensable tool, enterprise planners need to be aware of the potential gap between the influential PA/O365 principles it incorporates and the goals/priorities of their own organization. Going into a sizing exercise with this knowledge may prime the engineer to spot concerns conflicts like I’ve discussed here.

Advertisement

My Modulus Obsession Part II

In a previous post I discussed using the modulus operator ( % )to easily distribute one list across another usually larger list. For a simple 1 dimensional distribution modulus calculations enabled very compact and comprehensible code. However, not all distributions are that simple. Recently, while working on a new Exchange deployment, I needed to code for a more complex distribution algorithm which is the subject of this post.

Modern Exchange designs are usually informed by the sizing calculator and will tend to host multiple DB copies on each disk. In order to maximize efficiency, redundancy and high availability, DB copies are distributed across servers and disks. However, this may result in a visually complex DB distribution as seen below.

Calculator Generated Database Distribution

The above image was adjusted from the distribution tab of the sizing calculator. This design has 4 servers acting as an HA cooperative on the active side of an active/passive datacenter design. Servers in the secondary datacenter are to host the 3rd and 4th activation preferences. Notice, in either datacenter there are 3 different distribution patterns spreading database copies across 4 different servers. Those patterns then repeat down the list of volumes. The advantage of this is that if a single disk fails it will be reseeded from 3 other servers, restoring HA as quickly as possible.

While the calculator guides many design decisions it’s common to make adjustments that do not exactly follow the calculator’s output. For example, an organization may opt to increase or decrease capacity, hosting more or less DBs and/or disks. In this case, the organization opted for fewer drives & DBs, hosted on much more powerful SSDs. This somewhat invalidated the various setup scripts generated by the calculator. Along with several other factors, I decided to write my own set of scripts to configure disk subsystems, mount points and mailbox DBs copies, etc.

I may document the whole set of scripts in a separate post. Relevant to the distribution algorithm my approach required a set of objects that would emulate the calculator’s distribution. The objects would then be used as input to other scripts to actually create and configure the resources. At first I thought the modulus approach wouldn’t be portable to the more complex pattern, but once I got to coding it was really simply to just layer in some parameters with some simple arithmetic.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )

$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024' 
)

$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$Gap         = 1, 2, 3
$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count

$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $OffSet    = $Gap[ ([Math]::Floor( ($i / $DBsPerVol) )) % $Gap.Count ] # Determine the offset:
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Returns the volume number 
    
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}

$DBConfigs | Format-Table -AutoSize

Note: For brevity, this example is truncated. The real implementation had 96 DBs.

Before entering a typical for loop the code defines a few variables to guide the distribution pattern:

  1. $Gap is an array to help define the location of the secondary DB copy relative to the primary.
  2. As the name implies $DBsPerVol defines how many DBs should be on each volume.
  3. And, $VolTurnover determines how many loop iterations can elapse before we start placing databases on the next volume.

Inside the loop, several calculations are made:

  1. $OffSet uses a [Math]::Floor() calculation with simple division and % calculations to select an index from the $Gap array. Again, this will determine where to place the secondary DB copy relative to the primary, 1, 2 or 3 spots away, in a rotating pattern.
  2. $SrvNum & $SrvNum2nd calculate which index is selected from the $Servers & $DRServers arrays. As noted in the code, this effectively defines the servers hosting the primary, secondary, tertiary and quaternary copies for a given DB.
  3. Finally $VolNum uses another [Math]::Floor() calculation with a few other factors to select an index from the $Vols array.

The output of the code looks like:

Name  Disk Volume PrimaryServer SecondaryServer TertiaryServer QuaternaryServer
----  ---- ------ ------------- --------------- -------------- ----------------
DB001    1 Vol1   EXSrv-01      EXSrv-02        EXSrv-DR01     EXSrv-DR02
DB002    1 Vol1   EXSrv-02      EXSrv-03        EXSrv-DR02     EXSrv-DR03
DB003    1 Vol1   EXSrv-03      EXSrv-04        EXSrv-DR03     EXSrv-DR04
DB004    1 Vol1   EXSrv-04      EXSrv-01        EXSrv-DR04     EXSrv-DR01
DB005    1 Vol1   EXSrv-01      EXSrv-03        EXSrv-DR01     EXSrv-DR03
DB006    1 Vol1   EXSrv-02      EXSrv-04        EXSrv-DR02     EXSrv-DR04
DB007    1 Vol1   EXSrv-03      EXSrv-01        EXSrv-DR03     EXSrv-DR01
DB008    1 Vol1   EXSrv-04      EXSrv-02        EXSrv-DR04     EXSrv-DR02
DB009    2 Vol2   EXSrv-01      EXSrv-04        EXSrv-DR01     EXSrv-DR04
DB010    2 Vol2   EXSrv-02      EXSrv-01        EXSrv-DR02     EXSrv-DR01
DB011    2 Vol2   EXSrv-03      EXSrv-02        EXSrv-DR03     EXSrv-DR02
DB012    2 Vol2   EXSrv-04      EXSrv-03        EXSrv-DR04     EXSrv-DR03
...

The above output table follows the same distribution pattern that was output from the calculator. A simple export to a CSV file now allows me to use the configuration objects as input to the other scripts.

Extending modulus calculations with some basic math, I was able to generate rather complicated distribution pattern. I didn’t even attempt to write this without leveraging modulus, but I’d imagine having to resort to copious amounts if/else logic. In closing, this is another concise but powerful pattern that really makes me appreciate the modulus operator only further compounding my modulus obsession.