My Modulus Obsession Part II

In a previous post I discussed using the modulus operator ( % ) to easily distribute one list across another, usually larger, list. For a simple one-dimensional distribution, modulus calculations enabled very compact and comprehensible code. However, not all distributions are that simple. Recently, while working on a new Exchange deployment, I needed to code a more complex distribution algorithm, which is the subject of this post.

Modern Exchange designs are usually informed by the sizing calculator and will tend to host multiple DB copies on each disk. In order to maximize efficiency, redundancy and high availability, DB copies are distributed across servers and disks. However, this may result in a visually complex DB distribution as seen below.

[Image: Calculator Generated Database Distribution]

The above image was adjusted from the distribution tab of the sizing calculator. This design has 4 servers acting as an HA cooperative on the active side of an active/passive datacenter design. Servers in the secondary datacenter are to host the 3rd and 4th activation preferences. Notice, in either datacenter there are 3 different distribution patterns spreading database copies across 4 different servers. Those patterns then repeat down the list of volumes. The advantage of this is that if a single disk fails it will be reseeded from 3 other servers, restoring HA as quickly as possible.

While the calculator guides many design decisions, it’s common to make adjustments that do not exactly follow the calculator’s output. For example, an organization may opt to increase or decrease capacity, hosting more or fewer DBs and/or disks. In this case, the organization opted for fewer drives & DBs, hosted on much more powerful SSDs. This somewhat invalidated the various setup scripts generated by the calculator. For that reason, along with several other factors, I decided to write my own set of scripts to configure disk subsystems, mount points, mailbox DB copies, etc.

I may document the whole set of scripts in a separate post. For the distribution algorithm, my approach required a set of objects that would emulate the calculator’s distribution. The objects would then be used as input to other scripts to actually create and configure the resources. At first I thought the modulus approach wouldn’t be portable to the more complex pattern, but once I got to coding it was really simple to layer in a few parameters and some basic arithmetic.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )

$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024' 
)

$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$Gap         = 1, 2, 3
$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count

$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $Offset    = $Gap[ ([Math]::Floor( ($i / $DBsPerVol) )) % $Gap.Count ] # Gap between primary & secondary; rotates through $Gap every $DBsPerVol DBs
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Zero-based index into $Vols
    
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}

$DBConfigs | Format-Table -AutoSize

Note: For brevity, this example is truncated. The real implementation had 96 DBs.

Before entering a typical for loop the code defines a few variables to guide the distribution pattern:

  1. $Gap is an array to help define the location of the secondary DB copy relative to the primary.
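  2. $DBsPerVol defines how many DB copies each server hosts on each volume. With 2 copies of every DB spread across the 4 servers in a datacenter, that works out to 8 distinct DBs per volume, as the output below shows.
  3. $VolTurnover determines how many loop iterations elapse before we start placing databases on the next volume (8 in this example).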
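
To see what $VolTurnover does in isolation, here is a minimal sketch using the same values as the script ($DBsPerVol = 4 and 4 servers). The $ServerCount and $VolIndexes names are introduced here purely for illustration, and the % $Vols.Count wrap is left out because 24 DBs never reach the end of the volume list.

```powershell
$DBsPerVol   = 4
$ServerCount = 4                           # stands in for $Servers.Count
$VolTurnover = $DBsPerVol + $ServerCount   # 8, as in the script above

# Zero-based volume index for each of the 24 loop iterations (one per DB).
$VolIndexes = foreach ($i in 0..23) {
    [int][Math]::Floor( $i / $VolTurnover )
}

# Count how many DBs land on each volume index.
$VolIndexes | Group-Object | Select-Object Name, Count
```

Each volume index should appear 8 times, which lines up with DB001 through DB008 landing on Vol1 in the output table.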

Inside the loop, several calculations are made:

  1. $OffSet uses a [Math]::Floor() calculation with simple division and % calculations to select an index from the $Gap array. Again, this will determine where to place the secondary DB copy relative to the primary, 1, 2 or 3 spots away, in a rotating pattern.
  2. $SrvNum & $SrvNum2nd calculate which index is selected from the $Servers & $DRServers arrays. As noted in the code, this effectively defines the servers hosting the primary, secondary, tertiary and quaternary copies for a given DB.
  3. Finally, $VolNum divides $i by $VolTurnover, rounds down with [Math]::Floor() and wraps the result with % $Vols.Count to select an index from the $Vols array.
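
The calculations above are easier to follow when traced for the first few iterations. The sketch below reuses the $Gap and $DBsPerVol values from the script; $ServerCount and $Trace are hypothetical names added for this illustration, and only the index math is shown, not the server names.

```powershell
$Gap         = 1, 2, 3
$DBsPerVol   = 4
$ServerCount = 4   # stands in for $Servers.Count

$Trace = foreach ($i in 0..11) {
    # The offset changes every $DBsPerVol iterations and wraps around $Gap.
    $Offset = $Gap[ [int][Math]::Floor( $i / $DBsPerVol ) % $Gap.Count ]

    [PSCustomObject]@{
        i         = $i
        Offset    = $Offset
        SrvNum    = $i % $ServerCount               # primary / tertiary index
        SrvNum2nd = ($i + $Offset) % $ServerCount   # secondary / quaternary index
    }
}

$Trace | Format-Table -AutoSize
```

Iterations 0 through 3 use an offset of 1, iterations 4 through 7 an offset of 2, and iterations 8 through 11 an offset of 3, after which the pattern starts over.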

The output of the code looks like:

Name  Disk Volume PrimaryServer SecondaryServer TertiaryServer QuaternaryServer
----  ---- ------ ------------- --------------- -------------- ----------------
DB001    1 Vol1   EXSrv-01      EXSrv-02        EXSrv-DR01     EXSrv-DR02
DB002    1 Vol1   EXSrv-02      EXSrv-03        EXSrv-DR02     EXSrv-DR03
DB003    1 Vol1   EXSrv-03      EXSrv-04        EXSrv-DR03     EXSrv-DR04
DB004    1 Vol1   EXSrv-04      EXSrv-01        EXSrv-DR04     EXSrv-DR01
DB005    1 Vol1   EXSrv-01      EXSrv-03        EXSrv-DR01     EXSrv-DR03
DB006    1 Vol1   EXSrv-02      EXSrv-04        EXSrv-DR02     EXSrv-DR04
DB007    1 Vol1   EXSrv-03      EXSrv-01        EXSrv-DR03     EXSrv-DR01
DB008    1 Vol1   EXSrv-04      EXSrv-02        EXSrv-DR04     EXSrv-DR02
DB009    2 Vol2   EXSrv-01      EXSrv-04        EXSrv-DR01     EXSrv-DR04
DB010    2 Vol2   EXSrv-02      EXSrv-01        EXSrv-DR02     EXSrv-DR01
DB011    2 Vol2   EXSrv-03      EXSrv-02        EXSrv-DR03     EXSrv-DR02
DB012    2 Vol2   EXSrv-04      EXSrv-03        EXSrv-DR04     EXSrv-DR03
...

The above output table follows the same distribution pattern that was output from the calculator. A simple export to a CSV file now allows me to use the configuration objects as input to the other scripts.
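
For the CSV hand-off, a minimal sketch might look like the following. The two sample objects, the $SampleConfigs and $Imported names and the temp-file path are all assumptions made to keep the example self-contained; in practice the input would be the full $DBConfigs collection.

```powershell
# Two hypothetical config objects shaped like the loop's output.
$SampleConfigs = @(
    [PSCustomObject]@{ Name = 'DB001'; Disk = 1; Volume = 'Vol1'; PrimaryServer = 'EXSrv-01'; SecondaryServer = 'EXSrv-02' }
    [PSCustomObject]@{ Name = 'DB002'; Disk = 1; Volume = 'Vol1'; PrimaryServer = 'EXSrv-02'; SecondaryServer = 'EXSrv-03' }
)

# Hypothetical file name; a temp path keeps the example from leaving files behind.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'DBConfigs-sample.csv'

$SampleConfigs | Export-Csv -Path $CsvPath -NoTypeInformation

# A downstream script would read the objects back like this.
# Import-Csv returns every property as a string, including Disk.
$Imported = Import-Csv -Path $CsvPath
$Imported | Format-Table -AutoSize

Remove-Item -Path $CsvPath
```

Because everything comes back as a string, a consuming script that does arithmetic on a property such as Disk would need to cast it first.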

By extending modulus calculations with some basic math, I was able to generate a rather complicated distribution pattern. I didn’t even attempt to write this without leveraging modulus, but I’d imagine having to resort to copious amounts of if/else logic. In closing, this is another concise but powerful pattern that really makes me appreciate the modulus operator, only further compounding my modulus obsession.

My Modulus Obsession

A modulus (modulo or mod) is the remainder of a division operation. PowerShell, like many other languages, includes a modulus operator ( % ) that returns the remainder of division between any 2 numbers. The immediate and somewhat obvious use case for the modulus operator is to test if a number is even or odd.

Simple Examples:

4 % 2 Returns: 0, and means 4 is evenly divisible.
5 % 2 Returns: 1, which is of course the remainder and means 5 is odd and not evenly divisible.

This simple operation can be very useful. In one of my Exchange environments, databases were distributed in an even/odd pattern across even/odd numbered servers respectively.  For example, DB001’s activation preference 1 is ExchSrv1 and preference 2 is ExchSrv2, while DB002 is the opposite.  With simple knowledge of this distribution matrix, the modulus operator enables me to quickly determine which DBs should be active on a given server.

Get-MailboxDatabase -Server ExchSrv1 |
Where-Object{ $_.Name.SubString(2) % 2 -eq 1 }

This would return all odd numbered DBs with copies on ExchSrv1, and because I know my configuration, I know these should normally be active on the same server.  This is particularly useful when putting a server back in service, because I can simply add the Move-ActiveMailboxDatabase command as below:

Get-MailboxDatabase -Server ExchSrv1 | 
Where-Object{ $_.Name.SubString(2) % 2 -eq 1 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv1

To return even numbered DBs all I’d need to do is compare to 0 instead of 1, so if I were working on server 2, ExchSrv2:

Get-MailboxDatabase -Server ExchSrv2 | 
Where-Object{ $_.Name.SubString(2) % 2 -eq 0 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv2
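
The filters above can be tried without an Exchange environment. This sketch swaps Get-MailboxDatabase for a handful of mock objects ($MockDBs, $OddDBs and $EvenDBs are illustrative names), and assumes DB names follow the 'DB' plus 3-digit pattern used in this post.

```powershell
# Mock objects standing in for Get-MailboxDatabase output.
$MockDBs = 'DB001', 'DB002', 'DB003', 'DB004', 'DB005' |
    ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# 'DB001'.SubString(2) is the string '001'. The [int] cast makes the
# string-to-number conversion explicit before the modulus test.
$OddDBs  = $MockDBs | Where-Object { [int]$_.Name.SubString(2) % 2 -eq 1 }
$EvenDBs = $MockDBs | Where-Object { [int]$_.Name.SubString(2) % 2 -eq 0 }

$OddDBs.Name    # DB001, DB003, DB005
$EvenDBs.Name   # DB002, DB004
```

The original filters work without the cast because PowerShell converts the numeric string automatically; the cast is only there to make that step visible.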

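PowerShell’s type conversion engine can shorten the evaluations in the above Where clauses.  This is because, when converted to a Boolean, 0 is False and 1 (or any other non-zero number) is True. Note the parentheses in the examples below: a cast binds more tightly than %, so without them [Boolean]4 would be evaluated first and the result would be a number rather than a Boolean.

[Boolean](4 % 2) Returns: False
[Boolean](5 % 2) Returns: True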

So technically the above examples can be shortened like below:

Get-MailboxDatabase -Server ExchSrv1 | 
Where-Object{ $_.Name.SubString(2) % 2 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv1

To get even numbered DBs you’d have to negate the modulus return with either ! or -not :

Get-MailboxDatabase -Server ExchSrv2 | 
Where-Object{ !($_.Name.SubString(2) % 2) } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv2

Given the Boolean conversions I find it helpful to think of these expressions as something like an .IsOdd() test method. However, I find the shortened format to be a little confusing, so I prefer the more explicit approach.
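
For anyone who prefers the explicit style, the .IsOdd() idea can be wrapped in a small helper. Test-IsOdd is a hypothetical function name, not a built-in cmdlet, and this is only a sketch of one way to write it.

```powershell
function Test-IsOdd {
    param( [Parameter(Mandatory)] [int] $Number )

    # Compare against 0 rather than 1: in PowerShell the remainder takes the
    # sign of the dividend, so -3 % 2 is -1, not 1.
    return ($Number % 2) -ne 0
}

Test-IsOdd -Number 5       # True
Test-IsOdd -Number 4       # False
Test-IsOdd -Number (-3)    # True
Test-IsOdd -Number '007'   # True, the numeric string is converted to 7
```

Used in a filter, this reads as Where-Object { Test-IsOdd $_.Name.SubString(2) }, with -not in front for the even case.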

So far these are relatively straightforward uses for the modulus operator.  However, this simple operation can be used in clever ways to solve other types of problems. One use case is the distribution of one set of values across another, usually larger, set.

As a realistic example, let’s say I have a group of students I need to assign to classrooms.

$Rooms = 101, 102, 103, 104

# Establish test objects:
$Students = @(
    'Mike', 'Chris',   'Jessica', 'Matt'
    'Jenn', 'Josh',    'Amanda',  'Dan'
    'Jim',  'Rob',     'John',    'Joseph'
    'Ryan', 'Brandon', 'Jason',   'Justin'
) | 
Select-Object @{Name = 'Name'; Expression = { $_ }},
    @{Name = 'Room'; Expression = { 0 }}

# Assign students to rooms:
For($i = 0; $i -lt $Students.Count; ++$i)
{
    $Students[$i].Room = $Rooms[$i % $Rooms.Count]
}

$Students

Return:

Name    Room
----    ----
Mike     101
Chris    102
Jessica  103
Matt     104
Jenn     101
Josh     102
Amanda   103
Dan      104
Jim      101
Rob      102
...

The first few lines merely establish the test data. What stands out is the compact and easy to understand loop that assigns students to rooms. By calculating the remainder of $i divided by the number of elements in the $Rooms array, the loop assigns rooms to the students in a rolling pattern. When the end of the $Rooms array is reached, it starts again at the beginning. This works without a nested loop or any tracking or flag variables.

Here’s how it works:

1st iteration: $i is 0, 0 % 4 returns 0, index 0 is selected from the $Rooms array.
2nd iteration: $i is 1, 1 % 4 returns 1, index 1 is selected.
3rd iteration: $i is 2, 2 % 4 returns 2, index 2 is selected.
4th iteration: $i is 3, 3 % 4 returns 3, index 3 is selected.

This will continue until $i reaches $Rooms.Count:

5th iteration: $i is 4, 4 % 4 returns 0, index 0 is selected.
6th iteration: $i is 5, 5 % 4 returns 1, index 1 is selected.

The first 4 iterations are straightforward, but things get interesting as the value of $i meets and then exceeds $Rooms.Count. Because the remainder of whole number division by $Rooms.Count is always between 0 and $Rooms.Count - 1, the size of $i is irrelevant, and the remainder always maps to the “next” index in the other array.  Whenever $i is a whole multiple of $Rooms.Count the remainder will be zero, effectively wrapping around to the beginning of the $Rooms array and completing an instance of the pattern.
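
The same trace can be generated rather than written out by hand. This sketch uses the 4-element $Rooms array from the example, so the divisor is $Rooms.Count (4); $RoomTrace is a name introduced here for illustration.

```powershell
$Rooms = 101, 102, 103, 104

$RoomTrace = foreach ($i in 0..5) {
    $Index = $i % $Rooms.Count

    [PSCustomObject]@{
        Iteration = $i + 1
        i         = $i
        Index     = $Index            # remainder of $i / $Rooms.Count
        Room      = $Rooms[$Index]
    }
}

$RoomTrace | Format-Table -AutoSize
```

The Index column climbs from 0 to 3 and drops back to 0 on the 5th iteration, which is the wrap-around described above.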

Another real world example is the allocation of mailboxes to databases for an Exchange migration. This is a little more complicated because we also have to account for mailbox size; arbitrary assignments would result in poor distribution of data. An initial step in solving this problem is to sort the mailboxes in size order.  However, sorting alone would still result in a lopsided data distribution if DBs were always assigned in the same rolling order: assuming a descending sort, DBs earlier in the collection would receive the larger mailboxes in every pass and end up with disproportionately more data.
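
As a rough idea of the input, here is a sketch of a sorted $Mailboxes collection with an empty DestinationDB property. The mailbox names, the SizeMB property and the sizes are made up for illustration; a real migration would pull this data from the source environment.

```powershell
# Hypothetical mailbox data standing in for the real inventory.
$Mailboxes = @(
    [PSCustomObject]@{ Name = 'mbx-a'; SizeMB = 900  }
    [PSCustomObject]@{ Name = 'mbx-b'; SizeMB = 4200 }
    [PSCustomObject]@{ Name = 'mbx-c'; SizeMB = 150  }
    [PSCustomObject]@{ Name = 'mbx-d'; SizeMB = 2600 }
    [PSCustomObject]@{ Name = 'mbx-e'; SizeMB = 75   }
    [PSCustomObject]@{ Name = 'mbx-f'; SizeMB = 1800 }
) |
    Sort-Object -Property SizeMB -Descending |
    Select-Object Name, SizeMB, @{ Name = 'DestinationDB'; Expression = { '' } }

$Mailboxes | Format-Table -AutoSize
```

With the largest mailboxes first, the assignment loop decides which DB each one lands in by writing to DestinationDB.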

Here’s how I’ve solved this problem in the past without using the modulus operator:

$AllDBs =
@(
    "DB001","DB002","DB003","DB004","DB005","DB006","DB007","DB008"
    "DB009","DB010","DB011","DB012","DB013","DB014","DB015","DB016"
    "DB017","DB018","DB019","DB020","DB021","DB022","DB023","DB024"
    "DB025","DB026","DB027","DB028","DB029","DB030","DB031","DB032"
    "DB033","DB034","DB035","DB036","DB037","DB038","DB039","DB040"
    "DB041","DB042","DB043","DB044","DB045","DB046","DB047","DB048"
)

$Direction = 1
$Index     = 0

For($i = 0; $i -lt $Mailboxes.Count; ++$i)
{
    $Mailboxes[$i].DestinationDB = $AllDBs[$Index]
    If($Direction -eq 1) {
        $Index++
    }
    ElseIf($Direction -eq 0) {
        $Index--
    }

    # Alternate direction and adjust $index.
    If($Index -eq $AllDBs.Count) {
        $Direction = 0
        $Index--
    }
    ElseIf($Index -eq -1) {
        $Direction = 1
        $Index++
    }
}

This code certainly gets the job done. It was a little difficult to develop, but it’s fairly literal and therefore easy to follow.  For brevity’s sake the creation of the real $Mailboxes collection isn’t shown, but it’s a collection of [PSCustomObject] instances representing user mailboxes, with an added, initially empty DestinationDB property. $Index is manually incremented or decremented within the loop and controls which DB is assigned from the $AllDBs collection.

When $Index reaches $AllDBs.Count, the flag variable $Direction is flipped and $Index starts decrementing instead of incrementing. This causes DBs to be assigned from both directions, climbing up and down the $AllDBs array and resulting in a smoother data distribution.

There’s nothing wrong with the above example, however leveraging the modulus operator I can accomplish the same thing with much less code:

For($i = 0; $i -lt $Mailboxes.Count; ++$i)
{
    $Modulus = $i % $AllDBs.Count

    $Mailboxes[$i].DestinationDB = $AllDBs[$Modulus]

    If( $Modulus -eq $AllDBs.Count - 1 ) {
        [Array]::Reverse($AllDBs)
    }
}

Note: The $AllDBs array is the same as above, so it isn’t restated here.

With an understanding of how the modulus based pattern works, I’ve written code that’s easy to read, efficient and, of course, nice to look at. There’s only 1 conditional statement to execute per iteration. Compare that to the previous code where depending on $Index & $Direction there were 2 – 4 conditionals executing per iteration. Furthermore, 1 – 2 incrementation operations have been replaced with just the 1 modulus calculation to assign the $Modulus variable. Granted, that might be offset by the reversal of the $AllDBs array, but in this case I’ll chance it, given how many lines I’ve saved. In closing, beautiful patterns like this are why I have a little bit of a modulus obsession.
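
To convince myself the two approaches really are equivalent, a small side-by-side sketch helps. The function names Get-FlagSequence and Get-ModulusSequence, the 3-DB list and the count of 8 are all assumptions for illustration; the functions emit DB names instead of writing to a DestinationDB property.

```powershell
function Get-FlagSequence {
    param( [string[]] $DBs, [int] $Count )
    $Direction = 1
    $Index     = 0
    for ($i = 0; $i -lt $Count; ++$i) {
        $DBs[$Index]   # emit the DB chosen for this iteration
        if ($Direction -eq 1) { $Index++ } else { $Index-- }
        if     ($Index -eq $DBs.Count) { $Direction = 0; $Index-- }
        elseif ($Index -eq -1)         { $Direction = 1; $Index++ }
    }
}

function Get-ModulusSequence {
    param( [string[]] $DBs, [int] $Count )
    $Work = $DBs.Clone()   # work on a copy; [Array]::Reverse changes the array in place
    for ($i = 0; $i -lt $Count; ++$i) {
        $Modulus = $i % $Work.Count
        $Work[$Modulus]    # emit the DB chosen for this iteration
        if ($Modulus -eq $Work.Count - 1) { [Array]::Reverse($Work) }
    }
}

$SmallDBs   = 'DB001', 'DB002', 'DB003'
$FlagSeq    = Get-FlagSequence    -DBs $SmallDBs -Count 8
$ModulusSeq = Get-ModulusSequence -DBs $SmallDBs -Count 8

$ModulusSeq -join ', '
# DB001, DB002, DB003, DB003, DB002, DB001, DB001, DB002

($FlagSeq -join ',') -eq ($ModulusSeq -join ',')   # True
```

One thing to watch with the in-place reversal: depending on how many times it runs, $AllDBs may be left in reversed order after the loop, which is why the sketch reverses a copy.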