Working with XML? Don’t forget about XPath

.Net and ASP.Net applications use XML-based configuration files to store and manage their configuration data.  The files are typically stored with the application files and aptly named *.exe.config or web.config respectively.  MS Exchange is largely written in managed .Net code and not surprisingly stores executable and web service configuration data in just such files.  In fact, the %ExchangeInstallPath%\bin folder has dozens of *.exe.config files presumably storing a great many configuration options.

While for the most part Exchange’s default configurations are fine and the standard advice is to avoid changes, modifications are sometimes needed.  Having excellent XML support, PowerShell is an ideal tool to deploy config file changes both initially and in build/update cycles.

This post will demonstrate/discuss techniques I use to modify an Exchange configuration file.  However, XML is a standard, therefore these techniques are by no means limited to the task at hand.  Effectively, this discussion is a single example, the principles of which can be reused to perform similar work on almost any XML document.

In this case, I need to modify Exchange BackPressure thresholds on all my servers.  The settings are controlled by the EdgeTransport.exe.config file, partially documented here.  Among other things, .Net config files typically store a series of <add ../> nodes with key/value pairs as attributes, formatted similar to the sample below. I’ll be working within the same <appSettings> element/level, but again these techniques should be relevant at any level of any XML document.

<appSettings>
  <add key="AgentLogEnabled" value="true" />
  <add key="ResolverRetryInterval" value="30" />
  <add key="DeliverMoveMailboxRetryInterval" value="2" />
  <add key="ResolverLogLevel" value="Disabled" />
  . . .
</appSettings>

Note: The terminology can be confusing. Hereafter, pay careful attention to the value of the key attribute versus the value attribute, versus the value of the value attribute.

Per the documentation I needed to set the following 4 keys and their respective values:

UsedVersionBuckets.LowToMedium
UsedVersionBuckets.MediumToHigh
UsedVersionBuckets.HighToMedium
UsedVersionBuckets.MediumToLow

Dealing with redundant XML node names is always a little tricky.  I’ll have to isolate the correct node before I can assert the value.  Complicating matters further, these keys aren’t present in the file by default.  So, I have to ensure the nodes exist before I can isolate them to assert the value attribute.

First Revision:

$ConfigFile = Join-Path $env:ExchangeInstallPath "Bin\EdgeTransport.exe.config"
$Config     = [XML](Get-Content $ConfigFile)

$ConfigPairs = [Ordered]@{
    'UsedVersionBuckets.LowToMedium'  = "1998"
    'UsedVersionBuckets.MediumToHigh' = "3000"
    'UsedVersionBuckets.HighToMedium' = "2000"
    'UsedVersionBuckets.MediumToLow'  = "1600"
}

$appSettings = Select-XML -Xml $Config -XPath "/configuration/appSettings"

If( $appSettings.Count -eq 1 ) {    
    $appSettings  = $appSettings.Node # Reset appSettings to the returned node
    $ExistingKeys = $appSettings.ChildNodes.Key
    ForEach( $Key in $ConfigPairs.Keys )
    {
        If( $Key -notin $ExistingKeys ) {
            # Add the node:
            $Add = $Config.CreateElement('add')
            $Add.SetAttribute( 'key', $Key )
            $Add.SetAttribute( 'value', $ConfigPairs[$Key] )
            [Void]$appSettings.AppendChild( $Add )
        }
        Else {
            # Node already exists, just set:
            $ConfigKey = $appSettings.ChildNodes | Where-Object{ $_.Key -eq $Key }
            $ConfigKey.value = $ConfigPairs[$Key]
        }
    }
}
Else {
    Write-Host "appSettings node not found!"
}

$Config.Save($ConfigFile)

For ease of reference, I stored the key names and values in an ordered hash table.  Although not required, the ordered hash ensures the sequence of the resulting XML output. 

Note: The order of XML elements usually doesn’t matter, so the [Ordered] hash only ensures the order of new elements among themselves. New elements are appended to the given section. In this case, the 4 elements are new and will ultimately be the last 4 elements within the <appSettings> section. If an element already exists it will be modified in place.

By looping through the dictionary entries I can check if each key is present and create the nodes as needed. But, when the key is already present, I still need to isolate it using a Where{} clause before I can set the value.  This code works fine and is certainly satisfactory for the task at hand.  Nevertheless, running Where{} several times across what could be dozens of elements is inefficient.  Performance isn’t a big concern for this type of project, but the code could be better! Furthermore, given the commonality of XML-related tasks, refactoring was definitely worth my time.

WARNING: Pay close attention to casing. XML is case sensitive! In an earlier version of the code and this post I had inadvertently capitalized the value attribute.  That caused Exchange Transport Service stop/start failures.  It took me hours to spot the oversight. Furthermore, had this been fully deployed it could have affected organization-wide mail flow.

Second Revision:

$ConfigFile = Join-Path $env:ExchangeInstallPath "Bin\EdgeTransport.exe.config"
$Config     = [XML](Get-Content $ConfigFile)

$ConfigPairs = [Ordered]@{
    'UsedVersionBuckets.LowToMedium'  = "1998"
    'UsedVersionBuckets.MediumToHigh' = "3000"
    'UsedVersionBuckets.HighToMedium' = "2000"
    'UsedVersionBuckets.MediumToLow'  = "1600"
}

$appSettingsPath = '/configuration/appSettings'

ForEach( $Key in $ConfigPairs.Keys )
{
    $Node = Select-XML -Xml $Config -XPath "$appSettingsPath/add[@key='$Key']"
    If( $Node ) {
        # Element exists:
        $Node = $Node.Node
        $Node.value = $ConfigPairs[$Key]
    }
    Else {
        $appSettings = Select-XML -Xml $Config -XPath $appSettingsPath
        If( $appSettings ) {
             # Create the node:
            $appSettings = $appSettings.Node
            $Add = $Config.CreateElement('add')            
            $Add.SetAttribute( 'key', $Key )
            $Add.SetAttribute( 'value', $ConfigPairs[$Key] )
            [Void]$appSettings.AppendChild( $Add )
        }
        Else {
            Write-Host -ForegroundColor Red "appSettings node doesn't exist!"
        }
    }
}

$Config.Save($ConfigFile)

Note: I reversed the logic because after initial deployment, I expect the nodes will exist more often than not.

Note: Although they are case sensitive, the .SelectSingleNode() and .SelectNodes() methods can be used in place of the Select-Xml cmdlet.
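
For example, here’s a minimal sketch of the same lookup using the XmlDocument’s .SelectSingleNode() method; the key name shown is one of the four from above, and the element/attribute names must match case exactly:

# SelectSingleNode returns $null when nothing matches, so the result is easy to test:
$Node = $Config.SelectSingleNode( "/configuration/appSettings/add[@key='UsedVersionBuckets.LowToMedium']" )
If( $Node ) { $Node.value = '1998' }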

The second example leverages XPath more aggressively, obviating the need for a Where{} clause by directly checking for the key’s presence. Each loop iteration evaluates an expanding string to substitute the current value of $Key into the XPath query executed by Select-Xml.  Then, conditioned on the presence or lack of a result, the code either asserts the value attribute or creates the needed element.

XPath is succinct but somewhat opaque. I use it enough to have it in my toolbox, but too infrequently to be fluent.  To help craft the queries I sometimes use the XML Tools extension for VSCode.

The add-in has a neat feature that reveals the XPath query respective to the cursor position in the document. Simply position the cursor on the element you’re interested in, invoke the command palette (Ctrl + Shift + P), and start typing “XML Tools: Get Current XPath”.

Hit enter and the XPath query string /configuration/appSettings/add[70]/@key is returned.  As partly indicated by the common array index syntax […], the pseudo-code translation can be read as a directive to return the value of the key attribute at the 70th instance of the <add> element within the <appSettings> section. This feature is granular enough to reveal the XPath statement down to the node or attribute level, depending on where you position the cursor.  While the returned string isn’t exactly what’s needed, it is a head start to crafting the more specific query below:

/configuration/appSettings/add[@key='UsedVersionBuckets.LowToMedium']

Adjusted per each iteration, the above XPath query is a directive to return the <add> element where the key attribute is present and has the sought-after value.

Conclusion:

XPath is a powerful path syntax and query language for selecting nodes from XML data.  Properly crafted XPath queries can be used to simplify and streamline your code.  Combined with other PowerShell features and techniques, we can create concise, efficient and reusable code patterns for many different scenarios.  In this case, I used an ordered hash table to store the desired configuration elements, making it easy to check for, create and/or assert values in a short, easy-to-comprehend set of statements.

Additional Resources:

Recommended Exchange Database Distributions may be Sub-optimal in Enterprise Environments

In a previous post, (see: My Modulus Obsession Part II) I touched on the way the Exchange Server 2019 Sizing Calculator distributes databases across servers and disks. That was really just background information to discuss the use of the modulus operator in distribution algorithms. However, in this post I’ll be discussing the database distribution itself, and particularly why it might not be optimal in some Enterprise environments.

Calculator Generated Database Distribution

To review, the above image is the distribution generated by the calculator. This is an active/passive datacenter design where the users are geographically proximate to the primary datacenter. For performance reasons, and barring a DR event, the active DBs should always be hosted in the primary datacenter. The pattern is designed to distribute databases evenly, but also to minimize recovery time if a disk fails. For example, if Vol1 fails on server ExSrv-01, the reseed sources will be as follows:

Database  SourceServer
--------  ------------
DB001     ExSrv-02
DB004     ExSrv-04
DB005     ExSrv-03
DB007     ExSrv-03

As opposed to a simple alternating distribution, which would use 2 servers to reseed the 4 DBs, this distribution results in sourcing from 3 servers. Ostensibly, this is done to improve reseed performance and return the DBs to full HA and redundancy as quickly as possible.

However, in my experience the calculator is designed to drive your deployment toward the Exchange preferred architecture (PA) and is heavily influenced by Office 365’s operational practices and preferences. The problem is O365 operates at a much larger scale and with far more manpower than even the largest of enterprises. Therefore, the operational considerations, priorities and resulting practices may be very different.

The problem I quickly noticed with this distribution is that you can’t take more than 1 server out of service in the primary datacenter at the same time. For example, ExSrv-01’s primary DB copies have their secondary copies spread out across the other 3 servers. Take any one of the other servers out of service and there are no activation options for those DBs in the primary site.

Using the calculator’s failure simulation confirms that taking any 2 servers out of service will result in a lack of availability in the primary site. Granted, if you’ve configured the DAG correctly users should still have access through a DB in the secondary site. However, given the design goals, that isn’t an optimal user experience.

Server failure or maintenance scenario.

You can see from the image that if servers 1 & 2 are taken out of service, DB001, 10 & 13 etc. will have no secondary copy to resort to in the primary site. The DBs will switchover to the tertiary copies in the secondary datacenter. Likewise, if I were to take servers 1 & 3 out of service, DB017, 19 (not pictured) etc. will switchover similarly. Taking any 2 servers out of service will have a similar outcome: a subset of DBs will have to switch to the secondary datacenter. Furthermore, you wouldn’t be able to perform maintenance on the same server numbers in the secondary datacenter. If you were to down ExSrv-DR01 & 02 while already working on ExSrv-01 & 02, the DBs would have nowhere to mount and an outage would ensue. Even if you could gain greater operational concurrency, for example by working on ExSrv-DR03 & 04 at the same time, this makes for a confusing and likely error-prone dance.

This characteristic poses a major operational challenge. To avoid degraded service during routine maintenance, i.e. Windows or Exchange updates, we’d only be able to work on 2 servers at a time, 1 in the primary datacenter and 1 in the secondary. Each server would need to be taken out of service, updated to completion, then put back in service, repeating the sequence for each remaining server.

To further illustrate the impact, I’ll draw a comparison to the previous Exchange 2010 & 13 environments and procedures I was working with. Those DAGs only had 1 DB per volume and were arranged in an odd/even pattern. For example, DB001 – 024 had primary/secondary copies on ExSrv-01/02 respectively. That layout would’ve looked something like below:

Legacy 2013 DB layout.

Note: This is a contrived image. It didn’t come from an actual calculator exercise and is only meant for illustration.

This design was inherited from Exchange 2010, where only 1 DB per volume was supported. It worked well and was reused for Exchange 2013, which ran on similar hardware. This allowed 2 servers per site to be taken out of service at the same time. Again, in an odd/even pattern, servers 1 & 3 would move their DBs to 2 & 4 respectively. The process was then reversed in order to update servers 2 & 4. Updates could be run concurrently on twice as many servers compared to the new Exchange 2019 design.

Note: Reseed performance wasn’t a consideration in previous environments because the DB disks were mirrored, making reseeds rare.

Comparing the 2 scenarios, the maintenance cycle for the new Exchange 2019 environment was going to take at least twice as long as the old environment. In my view, this would substantially increase total cost of ownership (TCO) and cause a significant, albeit unquantifiable opportunity cost. Furthermore, considering the increased pace and urgency of Exchange security updates combined with quarterly CUs and monthly Windows patching, investing still more time and manpower was unacceptable. I needed to find a new distribution that could give us acceptable reseed performance but mitigate, or preferably eliminate, the operational hindrance.

Not having any idea how the calculator internally determines the distribution, I decided to simply open a spreadsheet and do some trial and error calculations. Luckily, I quickly found a working distribution: simply distribute the DB copies in a progressive fashion, like below:

Improved DB Distribution

Note: Again, this is a contrived image; it wasn’t generated by the calculator.

This simplified distribution results in the same number of active/passive DB copies per disk and server. However, and coincidentally, it allows for odd and even numbered servers to be taken out of service at the same time. Between the 2 datacenters, 4 servers can be taken out of service concurrently, matching the operational capabilities of the previous environment. In the event of a disk failure, we’d lose the marginal benefit of reseeding from 3 servers; instead, 2 servers would be used as reseed sources. For example, if Vol1 fails on ExSrv-01 its primary copies will be reseeded from ExSrv-02, while its secondary copies will be reseeded from ExSrv-04. I felt the change in the reseed pattern was an acceptable tradeoff. I reasoned that the calculator’s recommended distribution already had at least 1 server acting as a source for 2 DB reseeds. Since I/O concerns are limited to the source server/disk pair, if it isn’t a source-side problem for 1 server then it isn’t an issue for a 2nd server either. Hence, the only conceivable loss is that it may take slightly longer to reseed all DB copies. And, this pattern is still an improvement over the previous Exchange 2010 & 13 environments where reseeds were 1 disk to 1 disk.

Implementing the alternate distribution was actually quite easy. As discussed in My Modulus Obsession Part II, I had already written my own configuration scripts. I only needed to change 1 variable, $OffSet, to create configuration objects for the new pattern. I also removed the $Gap variable, which had defined the 3 alternating patterns. Hence, $Offset now represents the single repeating pattern by itself. At the risk of being redundant, I’ve posted the revised code below.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )
 
$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024'
)
 
$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count
 
$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $OffSet    = 1
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Returns the volume number 
     
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}
 
$DBConfigs | Format-Table -AutoSize
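
Although not part of the original scripts, a quick query against those same $DBConfigs objects can confirm the reseed behavior described above; a sketch, assuming Vol1 on EXSrv-01 is the failed disk and the surviving in-site copy acts as the reseed source:

# Which DBs live on the failed volume/server, and where would each reseed from?
$DBConfigs |
Where-Object{ $_.Volume -eq 'Vol1' -and ( $_.PrimaryServer -eq 'EXSrv-01' -or $_.SecondaryServer -eq 'EXSrv-01' ) } |
Select-Object Name, @{ Name = 'ReseedSource'; Expression = {
    If( $_.PrimaryServer -eq 'EXSrv-01' ) { $_.SecondaryServer } Else { $_.PrimaryServer }
} }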

Conclusion:

O365 implements an unbound namespace model where client connections aren’t forced to a particular datacenter. There is no concept of primary/secondary or active/passive datacenters. Both datacenters are peers participating in an active/active capacity. Client connections may enter through either and may not take the shortest path to the mailbox. Microsoft’s apparent lack of concern with latency affords them the flexibility to allow DBs to switch between datacenters. Again, this may represent a gap between O365 priorities and those of the enterprise. For good reasons, the design discussed here is an active/passive datacenter model enforced using bound namespaces and doesn’t afford that flexibility. Nevertheless, the calculator returned a distribution that failed to account for the active/passive datacenter model, and the lack of said flexibility stemming from it. Had the resulting operational constraints not been realized they would’ve introduced significant costs and pain points to the organization.

While the Exchange 2019 Sizing Calculator remains an indispensable tool, enterprise planners need to be aware of the potential gap between the influential PA/O365 principles it incorporates and the goals/priorities of their own organization. Going into a sizing exercise with this knowledge may prime the engineer to spot concerns and conflicts like those I’ve discussed here.

My Modulus Obsession Part II

In a previous post I discussed using the modulus operator ( % ) to easily distribute one list across another, usually larger, list. For a simple one-dimensional distribution, modulus calculations enabled very compact and comprehensible code. However, not all distributions are that simple. Recently, while working on a new Exchange deployment, I needed to code for a more complex distribution algorithm, which is the subject of this post.

Modern Exchange designs are usually informed by the sizing calculator and will tend to host multiple DB copies on each disk. In order to maximize efficiency, redundancy and high availability, DB copies are distributed across servers and disks. However, this may result in a visually complex DB distribution as seen below.

Calculator Generated Database Distribution

The above image was adjusted from the distribution tab of the sizing calculator. This design has 4 servers acting as an HA cooperative on the active side of an active/passive datacenter design. Servers in the secondary datacenter are to host the 3rd and 4th activation preferences. Notice, in either datacenter there are 3 different distribution patterns spreading database copies across 4 different servers. Those patterns then repeat down the list of volumes. The advantage of this is that if a single disk fails it will be reseeded from 3 other servers, restoring HA as quickly as possible.

While the calculator guides many design decisions, it’s common to make adjustments that do not exactly follow the calculator’s output. For example, an organization may opt to increase or decrease capacity, hosting more or fewer DBs and/or disks. In this case, the organization opted for fewer drives & DBs, hosted on much more powerful SSDs. This somewhat invalidated the various setup scripts generated by the calculator. That, along with several other factors, led me to write my own set of scripts to configure disk subsystems, mount points, mailbox DB copies, etc.

I may document the whole set of scripts in a separate post. Relevant to the distribution algorithm, my approach required a set of objects that would emulate the calculator’s distribution. The objects would then be used as input to other scripts to actually create and configure the resources. At first I thought the modulus approach wouldn’t be portable to the more complex pattern, but once I got to coding it was really quite simple to layer in a few parameters with some basic arithmetic.

$Servers   = @( 'EXSrv-01', 'EXSrv-02', 'EXSrv-03', 'EXSrv-04' )
$DRServers = @( 'EXSrv-DR01', 'EXSrv-DR02', 'EXSrv-DR03', 'EXSrv-DR04' )

$DBs = @(
    'DB001', 'DB002', 'DB003', 'DB004', 'DB005', 'DB006', 'DB007', 'DB008'
    'DB009', 'DB010', 'DB011', 'DB012', 'DB013', 'DB014', 'DB015', 'DB016'
    'DB017', 'DB018', 'DB019', 'DB020', 'DB021', 'DB022', 'DB023', 'DB024' 
)

$Vols = @(
    'Vol1', 'Vol2', 'Vol3', 'Vol4',  'Vol5',  'Vol6'
    'Vol7', 'Vol8', 'Vol9', 'Vol10', 'Vol11', 'Vol12'
)

$Gap         = 1, 2, 3
$DBsPerVol   = 4
$VolTurnover = $DBsPerVol + $Servers.Count

$DBConfigs =
For( $i = 0; $i -lt $DBs.Count; ++$i )
{    
    $OffSet    = $Gap[ ([Math]::Floor( ($i / $DBsPerVol) )) % $Gap.Count ] # Determine the offset:
    $SrvNum    = $i % $Servers.Count                                       # Reusable index for primary & tertiary servers
    $SrvNum2nd = ($i + $Offset) % $Servers.Count                           # Reusable index for secondary & quaternary servers
    $VolNum    = ([Math]::Floor( ($i / $VolTurnover) ) % $Vols.Count)      # Returns the volume number 
    
    [PSCustomObject]@{
        Name             = $DBs[ $i ]               # Returns the DB name.
        Disk             = $VolNum + 1              # Returns the disk# 
        Volume           = $Vols[ $VolNum ]         # Returns the volume name
        PrimaryServer    = $Servers[ $SrvNum ]      # Returns the primary server
        SecondaryServer  = $Servers[ $SrvNum2nd ]   # Returns the secondary server
        TertiaryServer   = $DRServers[ $SrvNum ]    # Returns the tertiary server
        QuaternaryServer = $DRServers[ $SrvNum2nd ] # Returns the quaternary server
    }
}

$DBConfigs | Format-Table -AutoSize

Note: For brevity, this example is truncated. The real implementation had 96 DBs.

Before entering a typical for loop the code defines a few variables to guide the distribution pattern:

  1. $Gap is an array to help define the location of the secondary DB copy relative to the primary.
  2. As the name implies $DBsPerVol defines how many DBs should be on each volume.
  3. And, $VolTurnover determines how many loop iterations can elapse before we start placing databases on the next volume.

Inside the loop, several calculations are made:

  1. $OffSet uses a [Math]::Floor() calculation with simple division and % calculations to select an index from the $Gap array. Again, this will determine where to place the secondary DB copy relative to the primary, 1, 2 or 3 spots away, in a rotating pattern.
  2. $SrvNum & $SrvNum2nd calculate which index is selected from the $Servers & $DRServers arrays. As noted in the code, this effectively defines the servers hosting the primary, secondary, tertiary and quaternary copies for a given DB.
  3. Finally $VolNum uses another [Math]::Floor() calculation with a few other factors to select an index from the $Vols array.

The output of the code looks like:

Name  Disk Volume PrimaryServer SecondaryServer TertiaryServer QuaternaryServer
----  ---- ------ ------------- --------------- -------------- ----------------
DB001    1 Vol1   EXSrv-01      EXSrv-02        EXSrv-DR01     EXSrv-DR02
DB002    1 Vol1   EXSrv-02      EXSrv-03        EXSrv-DR02     EXSrv-DR03
DB003    1 Vol1   EXSrv-03      EXSrv-04        EXSrv-DR03     EXSrv-DR04
DB004    1 Vol1   EXSrv-04      EXSrv-01        EXSrv-DR04     EXSrv-DR01
DB005    1 Vol1   EXSrv-01      EXSrv-03        EXSrv-DR01     EXSrv-DR03
DB006    1 Vol1   EXSrv-02      EXSrv-04        EXSrv-DR02     EXSrv-DR04
DB007    1 Vol1   EXSrv-03      EXSrv-01        EXSrv-DR03     EXSrv-DR01
DB008    1 Vol1   EXSrv-04      EXSrv-02        EXSrv-DR04     EXSrv-DR02
DB009    2 Vol2   EXSrv-01      EXSrv-04        EXSrv-DR01     EXSrv-DR04
DB010    2 Vol2   EXSrv-02      EXSrv-01        EXSrv-DR02     EXSrv-DR01
DB011    2 Vol2   EXSrv-03      EXSrv-02        EXSrv-DR03     EXSrv-DR02
DB012    2 Vol2   EXSrv-04      EXSrv-03        EXSrv-DR04     EXSrv-DR03
...

The above output table follows the same distribution pattern that was output from the calculator. A simple export to a CSV file now allows me to use the configuration objects as input to the other scripts.
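
For reference, that export is a one-liner (the path here is just a placeholder):

# Persist the configuration objects for use by the other provisioning scripts:
$DBConfigs | Export-Csv -Path 'C:\Temp\DBConfigs.csv' -NoTypeInformation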

Extending modulus calculations with some basic math, I was able to generate a rather complicated distribution pattern. I didn’t even attempt to write this without leveraging modulus, but I’d imagine having to resort to copious amounts of if/else logic. In closing, this is another concise but powerful pattern that makes me appreciate the modulus operator all the more, only further compounding my modulus obsession.

My Modulus Obsession

A modulus (modulo or mod) is the remainder of a division operation. PowerShell, like many other languages, includes a modulus operator ( % ) that returns the remainder of division between any 2 numbers. The immediate and somewhat obvious use case for the modulus operator is to test if a number is even or odd.

Simple Examples:

4 % 2 Returns: 0, and means 4 is evenly divisible.
5 % 2 Returns: 1, which is of course the remainder and means 5 is odd and not evenly divisible.

This simple function can be very useful. In one of my Exchange environments, databases were distributed in an even/odd pattern across even/odd numbered servers respectively.  For example, DB001’s activation preference 1 is ExchSrv1 and preference 2 is ExchSrv2, while DB002 is the opposite.  With simple knowledge of this distribution matrix, the modulus operator enables me to quickly determine which DBs should be active on a given server.

Get-MailboxDatabase -Server ExchSrv1 |
Where-Object{ $_.Name.SubString(2) % 2 -eq 1 }

This would return all odd numbered DBs with copies on ExchSrv1, and because I know my configuration, I know these should normally be active on the same server.  This is particularly useful when putting a server back in service, I can simply add the Move-ActiveDatabase command as below:

Get-MailboxDatabase -Server ExchSrv1 | 
Where-Object{ $_.Name.SubString(2) % 2 -eq 1 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv1

To return even numbered DBs all I’d need to do is compare to 0 instead of 1, so if I were working on server 2, ExchSrv2:

Get-MailboxDatabase -Server ExchSrv2 | 
Where-Object{ $_.Name.SubString(2) % 2 -eq 0 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv2

PowerShell’s type conversion engine can shorten the evaluations in the above Where clauses.  This is because a Boolean 0 is False & a Boolean 1 (or anything non-zero) is True.

[Boolean](4 % 2) Returns: False
[Boolean](5 % 2) Returns: True

So technically the above examples can be shortened like below:

Get-MailboxDatabase -Server ExchSrv1 | 
Where-Object{ $_.Name.SubString(2) % 2 } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv1

To get even numbered DBs you’d have to negate the modulus return with either ! or -not :

Get-MailboxDatabase -Server ExchSrv2 | 
Where-Object{ !($_.Name.SubString(2) % 2) } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv2

Given the Boolean conversions, I find it helpful to think of these expressions as something like an .IsOdd() test method. However, I find the shortened format to be a little confusing, so I prefer the more explicit approach.
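
For illustration, the test could even be wrapped in a small helper; Test-IsOdd is my own hypothetical name, not a built-in:

# Wraps the modulus test so the intent is explicit:
Function Test-IsOdd { Param( [Int]$Number ) [Boolean]( $Number % 2 ) }

Get-MailboxDatabase -Server ExchSrv1 |
Where-Object{ Test-IsOdd $_.Name.SubString(2) } |
Move-ActiveMailboxDatabase -ActivateOnServer ExchSrv1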

So far these are relatively straightforward uses for the modulus operator.  However, this simple operation can be used in clever ways to solve other types of problems. One use case is the distribution of one set of values across another, usually larger, set.

As a realistic example, let’s say I have a group of students I need to assign to classrooms.

$Rooms = 101, 102, 103, 104

# Establish test objects:
$Students = @(
    'Mike', 'Chris',   'Jessica', 'Matt'
    'Jenn', 'Josh',    'Amanda',  'Dan'
    'Jim',  'Rob',     'John',    'Joseph'
    'Ryan', 'Brandon', 'Jason',   'Justin'
) | 
Select-Object @{Name = 'Name'; Expression = { $_ }},
    @{Name = 'Room'; Expression = { 0 }}

# Assign students to rooms:
For($i = 0; $i -lt $Students.Count; ++$i)
{
    $Students[$i].Room = $Rooms[$i % $Rooms.Count]
}

$Students

Return:

Name    Room
----    ----
Mike     101
Chris    102
Jessica  103
Matt     104
Jenn     101
Josh     102
Amanda   103
Dan      104
Jim      101
Rob      102
...

The first few lines merely establish the test data. What stands out is the compact and easy-to-understand loop that’s assigning students to rooms. By calculating the remainder of $i divided by the number of elements in the $Rooms array, the loop assigns rooms to the students in a rolling pattern. When the end of the $Rooms array is reached it starts again at the beginning. This works without a nested loop or any tracking/flag variables.

Here’s how it works:

1st iteration: $i is 0, 0 % 4 returns 0, index 0 is selected from the $Rooms array.
2nd iteration: $i is 1, 1 % 4 returns 1, index 1 is selected.
3rd iteration: $i is 2, 2 % 4 returns 2, index 2 is selected.
4th iteration: $i is 3, 3 % 4 returns 3, index 3 is selected.

The pattern wraps once $i reaches $Rooms.Count:

5th iteration: $i is 4, 4 % 4 returns 0, index 0 is selected.
6th iteration: $i is 5, 5 % 4 returns 1, index 1 is selected.

The first few iterations are straightforward, but things get interesting as the value of $i meets then exceeds $Rooms.Count. Since the modulus calculation is based on whole number division, the size of the numbers is irrelevant, and the remainder always maps to the “next” index in the other array.  Whenever $i is equal to, or an even multiple of, $Rooms.Count the remainder will be zero, effectively wrapping around to the beginning of the $Rooms array and completing an instance of the pattern.
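
To watch the wrap in isolation, you can print the calculation for the first several values of $i (this assumes the $Rooms array from the example above):

# Show how the remainder cycles through the available indexes:
0..7 | ForEach-Object{ '{0} % {1} = {2}' -f $_, $Rooms.Count, ($_ % $Rooms.Count) }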

Another real world example is the allocation of mailboxes to databases for an Exchange migration. This is a little more complicated because we also have to account for mailbox size. Otherwise, arbitrary assignments will result in a poor distribution of data. An initial step in solving this problem is to sort the mailboxes in size order.  However, sorting alone would result in a lopsided data distribution where, assuming a descending sort, DBs earlier in the collection will contain disproportionately more data.

Here’s how I’ve solved this problem in the past without using the modulus operator:

$AllDBs =
@(
    "DB001","DB002","DB003","DB004","DB005","DB006","DB007","DB008"
    "DB009","DB010","DB011","DB012","DB013","DB014","DB015","DB016"
    "DB017","DB018","DB019","DB020","DB021","DB022","DB023","DB024"
    "DB025","DB026","DB027","DB028","DB029","DB030","DB031","DB032"
    "DB033","DB034","DB035","DB036","DB037","DB038","DB039","DB040"
    "DB041","DB042","DB043","DB044","DB045","DB046","DB047","DB048"
)

$Direction = 1
$Index     = 0

For($i = 0; $i -lt $Mailboxes.Count; ++$i)
{
    $Mailboxes[$i].DestinationDB = $AllDBs[$Index]
    If($Direction -eq 1) {
        $Index++
    }
    ElseIf($Direction -eq 0) {
        $Index--
    }

    # Alternate direction and adjust $index.
    If($Index -eq $AllDBs.Count) {
        $Direction = 0
        $Index--
    }
    ElseIf($Index -eq -1) {
        $Direction = 1
        $Index++
    }
}

This code certainly gets the job done. It was a little difficult to develop, but it reads fairly literally.  For brevity’s sake, the creation of $Mailboxes isn’t shown, but it’s a collection of [PSCustomObject]s representing user mailboxes, each with an added, initially empty DestinationDB property. $Index is manually incremented or decremented within the loop and controls which DB is assigned from the $AllDBs collection.

When $Index reaches $AllDBs.Count, the flag variable $Direction is flipped and $Index starts decrementing instead of incrementing. This causes DBs to be assigned from both directions, climbing up and down the $AllDBs array and resulting in a smoother data distribution.

There’s nothing wrong with the above example; however, leveraging the modulus operator I can accomplish the same thing with much less code:

For($i = 0; $i -lt $Mailboxes.Count; ++$i)
{
    $Modulus = $i % $AllDBs.Count

    $Mailboxes[$i].DestinationDB = $AllDBs[$Modulus]

    If( $Modulus -eq $AllDBs.Count -1 ) {
        [Array]::Reverse($AllDBs)
    }
}

Note: Didn’t bother restating the $AllDBs array above.

With an understanding of how the modulus based pattern works, I’ve written code that’s easy to read, efficient and, of course, nice to look at. There’s only 1 conditional statement to execute per iteration. Compare that to the previous code where depending on $Index & $Direction there were 2 – 4 conditionals executing per iteration. Furthermore, 1 – 2 incrementation operations have been replaced with just the 1 modulus calculation to assign the $Modulus variable. Granted, that might be offset by the reversal of the $AllDBs array, but in this case I’ll chance it, given how many lines I’ve saved. In closing, beautiful patterns like this are why I have a little bit of a modulus obsession.
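
Whichever version you use, it’s worth verifying the resulting data distribution. A quick sanity-check sketch, assuming the $Mailboxes objects carry a numeric size property (TotalItemSizeMB here is a hypothetical name, not shown in the code above):

# Summarize mailbox count and total data assigned to each destination DB:
$Mailboxes |
Group-Object DestinationDB |
Select-Object Name, Count, @{ Name = 'TotalSizeMB'; Expression = {
    ( $_.Group | Measure-Object TotalItemSizeMB -Sum ).Sum
} } |
Sort-Object Name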

Exchange 2019 Installation Issue, Problems with User Rights and Readiness Checks

Recently I started migrating from Exchange 2013 to 2019.  As you might imagine installing the first Exchange 2019 server is a big deal.  I spent a lot of time preparing, but still hit a few issues, and wanted to document them here.

Setup initially failed with the below error:

The following error was generated when “$error.Clear();
Set-LocalPermissions

” was run: “System.Security.AccessControl.PrivilegeNotHeldException: The process does not possess the ‘SeSecurityPrivilege’ privilege which is required for this operation.
at Microsoft.Exchange.Configuration.Tasks.Task.ThrowError(Exception exception, ErrorCategory errorCategory, Object target, String helpUrl)
at Microsoft.Exchange.Configuration.Tasks.Task.WriteError(Exception exception, ErrorCategory category, Object target)
at Microsoft.Exchange.Management.Deployment.SetLocalPermissions.InternalProcessRecord()
at Microsoft.Exchange.Configuration.Tasks.Task.b__91_1()
at Microsoft.Exchange.Configuration.Tasks.Task.InvokeRetryableFunc(String funcName, Action func, Boolean terminatePipelineIfFailed)”.

The Exchange Server setup operation didn’t complete. More details can be found in ExchangeSetup.log located in the :\ExchangeSetupLogs folder.

Note: There were similar errors in the ExchangeSetup.log file.

An internet search for “The process does not possess the ‘SeSecurityPrivilege’ privilege which is required for this operation.” turned up a rather old Technet discussion which mentioned 5 potentially missing user rights assignments.  After checking each, I determined that even though the installation account was a domain admin, it didn’t have the “Manage Auditing and Security Log” right.
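
A quick way to check for that privilege up front is to inspect the installation account’s token; run the following from a fresh session of that account (the right surfaces as SeSecurityPrivilege):

# Lists the privilege if it's held by the current token; no output means it's missing:
whoami /priv | Select-String 'SeSecurityPrivilege'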

Limiting access to this right is a common security hardening measure.  Doing a cursory search, I found Microsoft mentioning the vulnerability and countermeasures here and here.  They tersely state that “Manage Auditing and Security Logs” enables the erasure of important evidence of unauthorized activity.  Luckily, I’ve got friends in the security business who gave the more articulate summary below:

“Naturally, an attacker wants to go undetected as long as possible. If the attacker has access to “Manage Auditing and Security Logs”, they can clear or simply change the logging to hide their malicious activity.  Limiting access to this right denies an attacker the ability to easily cover their tracks and helps ensure an organization’s detection and forensic capabilities.” -Kevin Kidder, KidderSec Technologies LLC.

In an enterprise environment these restrictions are most likely delivered through group policy; however, to resolve the issue in the near term I added the installation account to the local “Event Log Readers” group. That grants the right at least for the duration of the install, but I’ll have to address it more elegantly later. Unfortunately, subsequent installation attempts failed with the below error:

FAILED

A Setup failure previously occurred while installing the AdminTools role. Either run Setup again for just this role, or remove the role using Control Panel. For more information, visit: http://technet.microsoft.com/library(EXCHG.150)/ms.exch.setupreadiness.InstallWatermark.aspx

Note: There’s no “AdminTools” role; they probably mean the Management Tools.

Apparently, the previous failure did some damage. This must have something to do with tracking installation progress, possibly for the purpose of recovery.  The link was to documentation of a 2016 readiness check.  I might forgive that if it had anything useful, but it was little more than an apology for not having any information at all:

ms.exch.setupreadiness.InstallWatermark

“Sorry, but we haven’t added content for this Exchange 2016 readiness check yet. However, we’re gathering feedback that will help us add the most relevant content to this topic. Please take a minute to send us feedback about the information you were hoping to find.”

I directly checked the 2019 readiness check documentation but didn’t find anything.  I did find documentation for an Exchange 2013 readiness check, but it was no help in resolving the issue. It merely suggested a reinstall albeit with an unusual syntax.  I tried it anyway and it didn’t work either.

Eventually I tried a complete uninstall and reboot, but even that didn’t get me past the error.

Out of desperation I decided to go hunting in the Registry and wouldn’t you know, the very first place I looked I found a lead.

I renamed the key as seen above, reran the installation, and it went through without issue. Of course, this also proves that the lack of the “Manage Auditing and Security Log” right caused the initial issue. I’ll have to follow up to determine if the right is needed permanently or just for the installation.

If anyone else encounters the “watermark” issue, regardless of the cause, I recommend uninstalling first.  If that doesn’t work, seek out and delete the registry key, then retry the installation.  I wouldn’t recommend forcing the installation to proceed by deleting the key without first uninstalling.
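
If you need to hunt for the key yourself, here’s a hedged sketch; the path below is the standard Exchange 2013+ setup registry root, but the value name is an assumption, so verify what you find before renaming or deleting anything:

# Look for leftover setup/watermark values under the Exchange server registry root:
Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\ExchangeServer\v15' |
Get-ItemProperty |
Where-Object{ $_.PSObject.Properties.Name -contains 'Watermark' } |
Select-Object PSChildName, Watermark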

A few recommendations I’ll be passing to Microsoft:

  1. There should be readiness checks for the required rights.  Microsoft has recommendations around user rights assignments and should be prepared for the possibility their customers have implemented those recommendations. It seems reasonable to check for the required rights before allowing a flawed installation to continue.
  2. Microsoft should document both the installation tracking mechanism and the “watermark” readiness check.  That alone would’ve saved me a lot of time and frustration.
  3. The uninstall procedure should’ve cleaned up the registry.

Hopefully Microsoft will address these problems. In the meantime, I hope this post helps you get through these installation issues a little easier.  Feedback is always welcomed, comment, click follow or grab the RSS feed to get notifications of future posts.

March 2021 Applying MS Exchange 0-Day Patches

On March 2nd Microsoft released Exchange Server Security Updates to address several 0-day exploits targeting Exchange servers. The vulnerabilities, updates & mitigations have been covered thoroughly by Microsoft and others, so I’m not going to rehash them here.  If you need additional information, please check the references section at the end of this post. I’ll do my best to keep it updated as the situation is still evolving.

I began planning the update deployment almost immediately, however, there were some rumblings in the community about installation issues.  Several people in this Reddit post described issues, including outright failures and services being left disabled and/or not starting after the installation.

As you might expect, I tested the patch in a lab environment and indeed quite a few services were left in a disabled state.  So, this post is about how I corrected the issue.

I ran the package from the command line with msiexec.exe /Update <Path> /passive /promptrestart.  Despite the arguments, the server rebooted without prompting.  After the reboot, quite a few services were stopped and disabled.

DisplayName                                          StartType  Status
-----------                                          ---------  ------
Application Identity                                  Disabled Stopped
Computer Browser                                      Disabled Stopped
IIS Admin Service                                     Disabled Stopped
Internet Connection Sharing (ICS)                     Disabled Stopped
Microsoft Exchange Active Directory Topology          Disabled Stopped
Microsoft Exchange Anti-spam Update                   Disabled Stopped
Microsoft Exchange DAG Management                     Disabled Stopped
...
Microsoft Exchange Unified Messaging                  Disabled Stopped
Microsoft Filtering Management Service                Disabled Stopped
NetBackup SAN Client Fibre Transport Service          Disabled Stopped
Performance Logs & Alerts                             Disabled Stopped
Remote Registry                                       Disabled Stopped
Routing and Remote Access                             Disabled Stopped
ScanMail EUQ Monitor                                  Disabled Stopped
Smart Card                                            Disabled Stopped
SSDP Discovery                                        Disabled Stopped
Tracing Service for Search in Exchange                Disabled Stopped
UPnP Device Host                                      Disabled Stopped
Windows Management Instrumentation                    Disabled Stopped
World Wide Web Publishing Service                     Disabled Stopped

Note: For brevity, some Exchange services were truncated from the above table.

Notice it wasn’t just Exchange services.  For example, the IIS Admin Service and WMI were both disabled.  From the above, I couldn’t tell with certainty which services were disabled by the update installer or what their original start modes were.

To correct this I decided to compare the disabled services to the services on an unaffected Exchange server. On the affected server I ran:

Get-Service | 
Where-Object{ $_.StartType -eq 'Disabled' } |
Export-Csv -Path 'C:\Temp\BadServiceState.csv'

I took that file to an unaffected server and ran:

Import-Csv -Path 'C:\Temp\BadServiceState.csv' |
Get-Service |
Export-Csv -Path 'C:\Temp\GoodServiceState.csv'

Finally, to fix the services, I returned to the troubled server and ran the below loop:

Import-Csv -Path 'C:\Temp\GoodServiceState.csv' |
ForEach-Object{ Set-Service $_.Name -StartupType $_.StartType }

At this point, all the startup modes were correct, but I didn’t have a quick way to start the services.  I didn’t want to spend the time tracing out the dependencies to ensure everything would start.  So, I simply let an additional reboot take care of it for me.
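
If a second reboot isn’t convenient, a rough alternative (not what I did here) is to let the Service Control Manager sort out the prerequisites by starting everything that’s now set to Automatic but stopped; the SCM will also bring up the services each one depends on, provided they aren’t disabled:

# Start stopped services whose (now corrected) startup type is Automatic:
Get-Service |
Where-Object{ $_.StartType -eq 'Automatic' -and $_.Status -ne 'Running' } |
ForEach-Object{ Start-Service -Name $_.Name -ErrorAction SilentlyContinue }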

After the reboot I reapplied the patch for good measure. This time I ran it via the GUI and had no issues.

To further diagnose the issue I took a quick look at the file C:\ExchangeSetupLogs\ServiceControl.log. The log lists all the services that are stopped and disabled in a format similar to below.

	[08:58:17] Stopping service 'hostcontrollerservice'.
	[08:58:50] Stopping service 'FMS'.
	…
	[08:58:52] Disabling service 'FMS'.
	[08:58:52] Disabling service 'hostcontrollerservice'.
	…

However, the log does a poor job of showing the service configuration prior to the installation.  The process interrogates all services, not just those that were changed, making it difficult to parse the file for relevant data. So, instead, I grabbed 2 files from the C:\ExchangeSetupLogs folder while the installer was running.

ServiceStartupMode.xml: Records the service startup configurations prior to the install.
ServiceState.xml: Records the service state prior to the install.

Apparently these files are used in the last stages of the installation to return service configurations to normal.  Unfortunately, the files are removed at the end of even a faulty install, but if you can grab them during the install you can use a little PowerShell magic to right the ship afterward.

Both files are formatted as Common Language Infrastructure (CLI) XML representations of native PowerShell objects.  In fact, it’s likely these files were created using the Export-CliXml cmdlet.  This is the same type of XML serialization used by PowerShell remoting to communicate objects over the wire.  As such, they are very easy to import and work with in another PowerShell console.

ServiceStartupMode.xml stores an array of hash tables with Name & StartupType keys.  I presume these are used by the installation as splat parameters for the Set-Service cmdlet so one way to leverage the file is:

Import-Clixml <PathTo_ServiceStartMode.xml> | 
ForEach-Object{ Set-Service @_ -ErrorAction SilentlyContinue }

Now, if you reboot the server the services should start the same as before.

Because you’re passing an array of hash tables down the pipeline you can use @_ as the current pipeline element.  Set-Service will treat that as typical splatting.

Note: The file contains information from all services not just the ones modified by the installer.  Since there are some services that can’t be changed, the -ErrorAction SilentlyContinue argument will spare you from profuse error output.
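
To see the @_ splatting in isolation, here’s a throwaway illustration using a couple of well-known Windows services; -WhatIf keeps it from changing anything:

# Each hash table is splatted into Set-Service as -Name/-StartupType:
@{ Name = 'Spooler'; StartupType = 'Manual' }, @{ Name = 'BITS'; StartupType = 'Manual' } |
ForEach-Object{ Set-Service @_ -WhatIf }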

ServiceState.xml stores the actual, albeit serialized, ServiceController objects.  As I mentioned before, due to service dependencies it would take some work to use this file for corrective action.  However, it may be useful for reporting or other diagnostics.
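
For example, here’s a small diagnostic sketch, assuming you grabbed a copy of the file to C:\Temp during the install; it lists services that were running before the patch but are stopped now:

# Deserialize the pre-install snapshot and compare against the current state:
$Before  = Import-Clixml 'C:\Temp\ServiceState.xml'
$Stopped = ( Get-Service | Where-Object{ $_.Status -eq 'Stopped' } ).Name
$Before | Where-Object{ "$($_.Status)" -eq 'Running' -and $_.Name -in $Stopped } | Select-Object Name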

Of course, attempting to capture these files mid-install is a little inconvenient.  As an alternative PowerShell makes it very easy to capture the same data. You can run the below code to generate the files before running the install package.

Get-Service |
ForEach-Object{
    @{
    Name = $_.Name
    StartupType = $_.StartType
    }
} | 
Export-Clixml -Path c:\temp\ServiceStartupModes.xml

Get-Service | 
Export-Clixml -Path C:\temp\ServiceState.xml

Note: The MS version of the ServiceStartupModes.xml file uses “StartMode” as the key. “StartMode” is an alias for the -StartupType parameter in the Set-Service cmdlet. An earlier version of this post used “StartType” which is also an alias but only in PowerShell Core 7.x. Ergo, I decided to forego the aliases and use the actual parameter name “StartupType”. However, notice the value is still $_.StartType, the property from any given service.

I also spoke with Microsoft Support and asked if, in future patch releases, they can retain the ServiceStartupMode.xml & ServiceState.xml files in the C:\ExchangeSetupLogs folder.  It’s a simple change that could make a huge difference while working in semi-crisis 0-day patching scenarios.

MS Support also mentioned some service start issues being linked to missing .DLL files in the /bin folder. They suggested exporting a directory listing of the /bin folder to a text file; with listings from the affected server and a known good server, you can use any of a number of methods to isolate the missing files and copy them back from the good server.

Quick PowerShell command to export a file listing:

Get-ChildItem "$($env:exchangeinstallpath)bin" -Recurse -Include "*.dll", "*.exe" | 
Select-Object -ExpandProperty FullName | 
Set-Content c:\temp\DLLlist.txt
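
One way to isolate the missing files, once you have a listing from each server, is Compare-Object; the file names below are placeholders:

# Lines present only in the good server's listing (SideIndicator '<=') are the missing files:
$Good = Get-Content 'C:\temp\DLLlist_GoodServer.txt'
$Bad  = Get-Content 'C:\temp\DLLlist_BadServer.txt'
Compare-Object -ReferenceObject $Good -DifferenceObject $Bad |
Where-Object{ $_.SideIndicator -eq '<=' } |
Select-Object -ExpandProperty InputObject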

Hopefully these quick & dirty tricks will help you get through this update cycle a little easier. Feedback is always welcomed, comment, click follow or grab the RSS feed to get notifications of future posts.

Additional Resources Regarding Recent Exchange 0-Day Exploits:

PowerShell Performance Part 2, Reading Text Files

This is part 2 of my informal blog series on PowerShell performance.  In part 1 I discussed some strategies for measuring performance.  In part 2 I’ll be covering file read performance and related techniques and use cases.  Because of the volume of information, I’ll cover writing file data in part 3.

Working with text files is fundamental.  Tasks like reading and parsing log files are exceedingly common in both interactive and programmatic scenarios.  It’s no surprise a lot has already been written about PowerShell performance in this area.  My goal here is to conduct a comprehensive study of file read techniques to determine the best options in different situations.  As the title implies I’m particularly interested in performance but code readability and memory utilization will also be considered.

PowerShell’s primary tool for reading text files is the Get-Content (GC) cmdlet.  Like many native cmdlets, GC offers broad capabilities. For example, it can easily read different encodings including non-text data.  No surprise, GC’s flexibility comes with a performance penalty; it’s earned a reputation for being quite slow.  As such, a number of alternate techniques have gained popularity, especially those that directly leverage .Net classes.

Study Methodology:

As described in Part 1, I don’t want to rely on a single measurement.  So, I ran each technique through a 10-iteration loop.  Those techniques that generate a single string were re-run through another 2 loops: the first using the -split operator and the other using the .Split() method.  Get-Content can return both types, but defaults to an array, so I wanted to ensure comparison of like return types while including the typical expectation.  The data should be sufficient to pick the fastest approach for the desired output type.

Note/Warning: .Split() will split on every character in its argument.  Therefore, splitting on the default Windows line ending results in unintended empty elements.  To compare fairly with the -split I included the [System.StringSplitOptions]::RemoveEmptyEntries argument in the tests.  However, that will also remove naturally occurring blank lines; a potential problem if you are expecting and/or need them. I included the .Split() variations because it still works well where blanks aren’t an issue, which is often the case with text logs. 

Test files were created by copying data from an IIS log file into 100KB, 2.5MB, 25MB, 50MB, 100MB and 1GB files. I maintained ASCII encoding throughout.

I ran each test in a fresh PowerShell console window.  Seeing as there’s overlap between command permutations and/or .Net classes I didn’t want any of the caching functionality mentioned in Part 1 to skew the results.

To evaluate the impact on memory, I monitored the \Process\Private Bytes counter for each run.
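
If you want to watch that counter yourself while a test runs, Get-Counter can sample it from a second console; the instance name below assumes the Windows PowerShell console host:

# Sample Private Bytes for the powershell process every 2 seconds until stopped:
Get-Counter '\Process(powershell)\Private Bytes' -SampleInterval 2 -Continuous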

Note: All tests were performed with PowerShell 5.1.


Here are the techniques I tested and their respective test code:

  • Get-Content
1..10 | ForEach{ (Measure-Command { Get-Content $file }).TotalMilliseconds }
  • Get-Content -Raw

    Returns a single string including line ending characters.  As mentioned, the -Raw parameter will be retested with the additional splits.
1..10 | ForEach{ (Measure-Command { Get-Content  $file -Raw }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { (Get-Content  $file -Raw) -split "`r`n" } ).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { (Get-Content  $file -raw).split("`r`n", [StringSplitOptions]::RemoveEmptyEntries)}).TotalMilliseconds}
  • Get-Content -ReadCount 0

The -ReadCount parameter determines how many lines are passed down the pipe at a time.  -ReadCount 0 will pass all lines down the pipe at once.  This generally precludes cleanly placing | ForEach-Object{} directly after the Get-Content cmdlet, because $_ will actually be an array consisting of whatever number of objects were specified with -ReadCount. This method is fine if you need to store the data in a variable.

1..10 | ForEach{ (Measure-Command { Get-Content $file -ReadCount 0 }).TotalMilliseconds }
  • [System.IO.File]::ReadAllLines()

Reference: MS Documentation

The System.IO.File class offers functionality for working with files.  The ReadAllLines static method is particularly useful and has been my go-to alternative for quite a while.  It returns a string array ([String[]]) which is operationally equivalent to Get-Content‘s [Object[]] return. So, notwithstanding the break from verb-noun syntax, it’s an easy drop-in alternative.

1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllLines( $file ) }).TotalMilliseconds }

Note: Shorthand below may refer to this as [IO.File]::ReadAllLines() or just ::ReadAllLines()

  • [System.IO.File]::ReadAllText()
    Like GC -Raw, this will read the entire file into memory as a single string, including the line break characters.  So, it too will be tested with the additional splits.
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ) }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ) -split "`r`n" }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ).Split("`r`n",[StringSplitOptions]::RemoveEmptyEntries) }).TotalMilliseconds }

Note: Shorthand below may refer to this as [IO.File]::ReadAllText() or just ::ReadAllText()

  • System.IO.StreamReader object using the .ReadLine() method

Reference: MS Documentation

StreamReader reads a stream of bytes as text. Usually, it’s more verbose than other techniques.  It’s not as neat as ::ReadAllLines() but it’s a common and well-advertised alternative to Get-Content.  Using StreamReader generally follows a loop pattern common to many languages.  Once the file is open, read and processing commands are placed in a loop that steps through each line until the EndOfStream property evaluates to true, with the .Close() method executed immediately after.

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
While( !$Stream.EndOfStream ) { 
	$Stream.ReadLine()
	# Do some other stuff with the data…
}
$Stream.Close() } ).TotalMilliSeconds }

This pattern doesn’t return an array and cannot be piped. Of course, that makes it a little more difficult to work with incoming lines. In practice, you’d probably assign the incoming line to a variable to work with it further. You can easily store the output in a variable to facilitate piping, but I’d only do so if it was already a requirement. It’s slower and more memory intensive, so if it’s just for piping you’re better off doing the work in the existing loop.

Note: Shorthand below may refer to this as $Stream.ReadLine() or just .ReadLine()

  • System.IO.StreamReader object using the .ReadToEnd() method
1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd()
$Stream.Close() } ).TotalMilliSeconds }

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd() -split "`r`n"
$Stream.Close() } ).TotalMilliSeconds }

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd().Split("`r`n", [StringSplitOptions]::RemoveEmptyEntries )
$Stream.Close() } ).TotalMilliSeconds }

Note: Shorthand below may refer to this as $Stream.ReadToEnd() or just .ReadToEnd()


Observations:

The study confirms Get-Content is quite a bit slower than the other methods, but there are some other very interesting observations.  Below, I graphed some data from the 100MB file tests:

Note: I chose to display the 100MB results because the graph seems a better representation.  With the smaller files, relatively small differences were over-represented.

Note: Above, green bars are techniques that return an array, blue are single-string returns, and red are single-string returns split after the fact.

Of the techniques that return an array, Get-Content is by far the slowest, taking 1271ms, while .ReadLine() & ::ReadAllLines() averaged 585 & 675ms respectively.  That’s a significant difference that could really add up when processing many files.  Get-Content -ReadCount 0 performed better but was still well behind both .Net approaches, which were respectively ~200/100ms faster.

I was surprised by the difference between the 2 .Net approaches above.  I’ve always favored ::ReadAllLines() because it’s so easy to use in typical PowerShell code.  Whenever I’ve read about StreamReader I’d do a quick test, and ::ReadAllLines() was always faster.  Now, looking at my results across file sizes, it seems [IO.File]::ReadAllLines() is faster for smaller files, but the $Stream.ReadLine() method is faster for “larger” files.  Take a look at the table below.

File Size   [System.IO.File]::ReadAllLines()   StreamReader .ReadLine()
100KB       1.33 ms                            2.05 ms
2.5MB       14.14 ms                           16.60 ms
25MB        169.74 ms                          153.60 ms
50MB        329.51 ms                          290.87 ms
100MB       675 ms                             585 ms

This is an interesting find because it offers some logic on which technique to use when.  If you’re processing many small files, ::ReadAllLines() may perform better.  If you’re dealing with larger files, you may want to accept slightly more complex code and implement the StreamReader.  Either way, both approaches are valid and perform far better than Get-Content.
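To make that concrete, below is a hedged sketch of a helper that picks a technique based on file size.  The 50MB cutoff is an assumption loosely based on the table above, not a tested threshold:

Function Read-FileLines
{
    Param( [String]$Path )

    $Path   = Convert-Path $Path
    $Cutoff = 50MB   # Assumed crossover point; adjust once follow-up testing pins it down.

    If( (Get-Item $Path).Length -lt $Cutoff ) {
        # Smaller files: ::ReadAllLines() was faster in these tests.
        [System.IO.File]::ReadAllLines( $Path )
    }
    Else {
        # Larger files: the StreamReader loop pulled ahead.
        $Stream = [System.IO.StreamReader]::new( $Path )
        While( !$Stream.EndOfStream ) { $Stream.ReadLine() }
        $Stream.Close()
    }
}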

Of course, I don’t know how these observations would play out in a larger program.  $Stream.ReadLine() requires a loop; assuming you pack further operations into the same loop, the only additional overhead is from those operations.  Any additional overhead needed to loop through the results of [IO.File]::ReadAllLines() is not accounted for in these tests.

Given the admittedly arbitrary file sizes, more testing is necessary to determine where the performance advantage flips. Moreover, I’d like to see how this plays out in more realistic scripts. I’ll post a follow-up with that information as soon as I can pull it together.

The .Net methods that return a single string are the fastest overall, and they perform similarly to one another: ::ReadAllText() outperformed .ReadToEnd() by a mere 15ms (407 vs. 422ms).  Both .Net methods are very good alternatives to Get-Content -Raw, which clocked in at 1049ms, roughly 2.5x slower!

Not surprisingly, splitting the string after the fact added significant overhead.  If you need an array, ::ReadAllText() & .ReadToEnd() aren’t the best options.  Remarkably, even with that extra overhead, both .Net methods using .Split() were still faster than Get-Content alone.

Another revelation from these tests: .Split() consistently outperformed the -split operator.  This was true across all tested sizes, but the differences were modest on smaller files and exaggerated on larger ones.  This seems to indicate splitting larger strings is faster with .Split(), but this too calls for a follow-up post.  I’d like to re-test the 2 split techniques independent of file read operations.  Some use cases may allow splitting on a single newline character, so I also want to see how .Split() performs without removing the empties.
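For anyone who wants to try the comparison ahead of that follow-up, here’s a minimal sketch that isolates the two split techniques on a string already in memory; the path is a placeholder:

$Text = [System.IO.File]::ReadAllText( 'C:\temp\TestFiles\Test100MB.txt' )

# -split operator (regex-based):
1..10 | ForEach{ (Measure-Command { $Text -split "`r`n" }).TotalMilliseconds }

# .Split() method with RemoveEmptyEntries:
1..10 | ForEach{ (Measure-Command { $Text.Split( "`r`n", [StringSplitOptions]::RemoveEmptyEntries ) }).TotalMilliseconds }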

Memory Considerations:

Memory is a concern, particularly when processing many large files.  Obviously, the techniques that return a single string used the most memory, but there were still some surprises.

Note: These are peak measurements taken from perfmon during each test.

All the sessions started out using ~72MB. Get-Content & $Stream.ReadLine() had no detectable impact on memory! I was surprised to see that [IO.File]::ReadAllLines() used about 525MB.

I expected the techniques that return a single string to use the most memory.  Indeed, Get-Content -Raw consumed 1.3GB even before splitting.  However, $Stream.ReadToEnd() & [IO.File]::ReadAllText() were more modest at ~525MB.  Get-Content -ReadCount 0 used ~600MB, most likely because it has to pass all the file’s lines down the pipeline.

In day-to-day scripting, memory usually isn’t a concern; PowerShell relies on .Net to manage it through background garbage collection, which frees unused memory either on demand or on a schedule.  Different underlying collection behaviors may explain some of these disparities, particularly between .ReadLine() & ::ReadAllLines().  However, the larger the file, the greater the risk of memory exhaustion.

All the methods that return a single string ran out of memory trying to read a 1GB file, even when >3GB was available.  Secondary testing showed storing the output in RAM required 3-4x the file size.  Thankfully, even if you do have a use case for single strings, you can adapt one of the more memory-friendly methods.
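If you’d like to eyeball the memory cost without setting up perfmon, here’s a rough sketch using the garbage collector’s own accounting; the numbers are approximate and the path is a placeholder:

$Before = [System.GC]::GetTotalMemory( $true )    # Force a collection, then sample.
$Text   = [System.IO.File]::ReadAllText( 'C:\temp\TestFiles\Test100MB.txt' )
$After  = [System.GC]::GetTotalMemory( $false )
'{0:N0} MB held after the read' -f ( ($After - $Before) / 1MB )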


Conclusion:

The most glaring and unfortunate conclusion is that Get-Content is still unacceptably slow.  Comparatively, Get-Content underperformed in all use cases and permutations.  However, PowerShell’s ability to utilize .Net classes offers a rich set of alternatives that cover pretty much any file-read scenario.

It’s healthy to revisit old assumptions once in a while. Obviously I knew a bit about this topic beforehand, but going through a formal experiment uncovered some new information and questions. I’ll be writing an addendum soon to address the following points:

  1. [IO.File]::ReadAllLines() & $Stream.ReadLine(): is the former faster for smaller files and the latter faster for larger ones?  And if so, at what point does it flip?  In other words, define large & small in this context.
  2. Determine if garbage collection is impacting the performance differential between [IO.File]::ReadAllLines() & $Stream.ReadLine().
  3. Additional StreamReader examples & code patterns, and the merits & demerits of different approaches.
  4. A separate experiment to determine the performance difference between .Split() and -split, and to evaluate the additional impact of [System.StringSplitOptions]::RemoveEmptyEntries.

As always, I’d love to get some feedback.  Comment, click follow or grab the RSS feed to get notifications of future posts.

A PowerShell Success Story

I’m supposed to be writing the second post in a series on performance (See part 1), but I wanted to take a quick detour to discuss my recent trip to MS Ignite.  Despite working with MS products for around 20 years, I’d never gone to a big conference, so I really didn’t know what to expect.  Considering many MS on-prem products have been problematic of late, I envisioned Ignite as an opportunity for cathartic griping to various product groups.  While there was some of that I’m happy to report Ignite was actually hugely productive for me.

It’s probably no surprise that I attended a few PowerShell oriented sessions, like:

I got a lot out of the sessions, but what really made the whole trip worth it was the time I spent at the PowerShell booth with the team that actually builds and maintains the product.  Besides the fact that I was totally starstruck, I was truly impressed!  The enthusiasm of the team and dedication to the user community is truly unbelievable.

left to right: Jaap Brasser (MVP, working for Rubrik); Danny Maertens, Sydney Smith & Jason Helmick, all PowerShell Program Managers with Microsoft; and me.

In a conversation with Jason Helmick, I couldn’t resist telling a PowerShell success story I’m particularly proud of.  Jason actually encouraged me to blog about it, which is the real inspiration for this post.  So, without further blathering, here’s one of many PowerShell success stories.

When I was hired into my current position, the organization was transitioning from a Solaris Unix-centric environment to Windows & AD.  They had a long history in the Unix world and had spent decades perfecting their very own way of doing things.  And, as any IT veteran knows, people can get pretty attached to their work product.  It’s not at all surprising; after all, technologists work their butts off first to think through tough problems, then to codify solutions.  Of course, this is all to say that old systems die hard.  It’s often a protracted, bitter process, fraught with hazards.

OK that sounds bad, but these circumstances were a real opportunity for me.  I wasn’t really hired for this sort of thing, but as big pieces of the infrastructure were transitioning they were also disrupting old processes.  This put me in an ideal position to modernize, and PowerShell was the obvious tool to do it.

One series of events took place back in 2015.  Our new HR system had just gone live, and even though data was synchronizing back to the old database, a lot of our data flows broke.

I wasn’t even part of the HRIS project, but having written a few of the now-broken sync scripts, I suddenly found myself in the middle of a crisis.  By design, the new system was locking records during its on-boarding process.  The data wasn’t being sent downstream, so new users weren’t getting set up properly.  User accounts couldn’t even be tested until the employee’s start date.  It was a huge and potentially embarrassing problem.

As the resident PoSh evangelist I was waiting for an opportunity like this.  Ultimately I proposed a rewrite of our provisioning programs using PowerShell.

The existing provisioning tools were hosted in Unix, where Tcl programs were used to telnet into Windows with privileged access.  Once in, still other scripts (regrettably I wrote some of those too) would create accounts etc.  It’s an understatement to say this was grotesquely complex and terribly insecure.

In the new system I wrote functions to wrap the user creation process, and added them to a module I had already written for the support team.  The new functions leveraged Just Enough Administration (JEA), using a PowerShell constrained endpoint running under an alternate privileged account.  In this paradigm the module functions would call the end point when needed, rather than directly granting access to the operator.  This also securely stored the credentials with the endpoint configuration, which solved the old problem of storing and transmitting the password in clear text.
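To give a feel for the client side (the server name and function names here are hypothetical, not my production code), a support-module function does little more than forward its parameters to the constrained endpoint:

Function New-SupportUser
{
    Param( [String]$UserName )

    # Hypothetical wrapper; the real work happens in a function exposed by the
    # Support endpoint, running under the endpoint's RunAs account.
    Invoke-Command -ComputerName ProvisioningServer -ConfigurationName Support -ScriptBlock {
        Param( $UserName )
        New-EndpointUserAccount -UserName $UserName   # Assumed endpoint-side function.
    } -ArgumentList $UserName
}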

To solve the data flow issue I wrote code to prepopulate the necessary data in the old HR database.  The RunAs account did need some access to the DB, but we used views & permissions to minimize that surface area as well.

While it was a lot to bite off at the time this was actually pretty easy to do.  I can’t share anything material to my organization, but I’ll try to demo a framework below.  This isn’t intended to be an exhaustive deep dive into PowerShell’s remoting features, perhaps I’ll cover that later, but this should be enough to get going.

The first thing I did was write a startup script:

# Define Functions:
Function Test-SupportEndPoint
{
	#Just to make sure we can connect to the raw end point...
	Write-Host "Connection Successful on : $($env:COMPUTERNAME)"
	Write-Host "Connected User           : $($PSSenderInfo.ConnectedUser)"
	Write-Host "RunAs User               : $($PSSenderInfo.RunAsUser)"
} #End Function Test-SupportEndPoint

# Add additional functions as needed!
# Define Visibility of Endpoint Functions:
[string[]]$ProxyFunctions =
@(
# As you add functions to your module list them here
'Test-SupportEndPoint'
'Get-Command'           # Only if you plan on interactive or implicit remoting.
'Measure-Object'        # Only if you plan on interactive or implicit remoting.
'Select-Object'         # Only if you plan on interactive or implicit remoting.
'Get-Help'              # Only if you plan on interactive or implicit remoting.
'Get-FormatData'        # Only if you plan on interactive or implicit remoting.
)

# Set visibility of commands
ForEach( $Command In (Get-Command -All) )
{
	If( $ProxyFunctions -notcontains $Command.Name )
		{ $Command.Visibility = 'Private' }
}

<#
There are some commands that must be present for different remoting scenarios.
This seems to be poorly documented, but through trial & error I came up with
the below for implicit and interactive remoting.

Command          Implicit   Interactive
-------          --------   -----------
Select-Object    Yes        Yes
Measure-Object   Yes        Yes
Out-File         No         No
Exit-PSSession   No         Yes
Get-FormatData   Yes        No
Out-Default      No         Yes

Note: Errors generated in some of the testing scenarios suggested that
      Get-Help is also used under the hood but is not required.

For some reason these commands wouldn't activate properly from the array/loop
above, so I explicit enabled them.  Uncomment the below according to the 
desired remoting scenario.
#>

# Explicit to show these commands Only for interactive & implicit remoting:
# ( Get-Command Measure-Object).Visibility = 'Public'
# ( Get-Command Select-Object ).Visibility = 'Public'
# ( Get-Command Get-FormatData).Visibility = 'Public'
# ( Get-Command Exit-PSSession).Visibility = 'Public'
# ( Get-Command Out-Default   ).Visibility = 'Public'

<#
Note: Available LanguageModes:
 FullLanguage
 RestrictedLanguage
 ConstrainedLanguage
 NoLanguage

https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_language_modes?view=powershell-6
#>

$ExecutionContext.SessionState.Applications.Clear()
$ExecutionContext.SessionState.Scripts.Clear()
$ExecutionContext.SessionState.LanguageMode = 'FullLanguage'

Now I just needed to register the new end point:

Register-PSSessionConfiguration -Name Support -RunAsCredential PrivUser@MyDomain.local -StartupScript C:\PSSessionConfigs\Support.Startup.ps1 -ShowSecurityDescriptorUI -Confirm:$false

A few things are going to happen after running the command:

  1. You’ll be prompted for the RunAs credentials via a typical credential dialog:
  2. After entering the cred you’ll get a typical security dialog:
    If you need to grant access to the endpoint, the user or group will need Read & execute as shown.
  3. The registration completes and you’ll see the below warning.

    WARNING: Register-PSSessionConfiguration may need to restart the WinRM service if a configuration using this name has recently been unregistered, certain system data structures may still be cached.

    In that case, a restart of WinRM may be required. All WinRM sessions connected to Windows PowerShell session configurations, such as Microsoft.PowerShell and session configurations that are created with the Register-PSSessionConfiguration cmdlet, are disconnected.


    Assuming there are no concerns, I usually go ahead and restart the WinRM service.  Doing so saves some confusion later: when testing a new or changed endpoint, it rules out the need for a service restart as a variable.

And that’s it!  You can test the endpoint with:

Invoke-Command -ComputerName YourServer -ConfigurationName Support -ScriptBlock { Test-SupportEndPoint }
Connection Successful on : YourServer
Connected User           : TheCallingUser
RunAs User               : PrivUser

There are 2 other ways to access your new endpoint:

1) You can interactively enter the session:

$Session = New-PSSession -ComputerName YourServer -ConfigurationName Support
Enter-PSSession $Session

2) You can import the session locally; this is referred to as implicit remoting:

$Session = New-PSSession -ComputerName YourServer -ConfigurationName Support
Import-PSSession $Session

Implicit remoting is awesome!  PowerShell will generate proxy functions to mimic the functions in your endpoint, but they’ll be loaded in your local session.  When used, the local functions seamlessly call the endpoint functions but still return output locally.  More or less, this looks no different than using PowerShell locally.
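As a quick illustration of what that looks like with the endpoint from this post (server name as in the earlier examples):

$Session = New-PSSession -ComputerName YourServer -ConfigurationName Support
Import-PSSession $Session | Out-Null

# The endpoint's functions now appear as local proxy commands:
Get-Command Test-SupportEndPoint

# Calling the proxy runs the real function remotely; output comes back locally:
Test-SupportEndPoint

Remove-PSSession $Session   # Clean up when finished.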

Notice that in the startup script above there are some comments that describe which cmdlets must be available for these 2 remoting scenarios.  I added that to the demo code for some degree of completeness in this otherwise abridged tale.

In my case, I also needed to incorporate commands from the Exchange Management Shell (EMS).  EMS is always an implicit remote session that’s imported locally, but it isn’t implemented like the rest of PowerShell’s remoting infrastructure.  That’s a long story for another time, but the gist is that it’s very difficult to incorporate EMS commands such that they can be imported with a custom endpoint.  In this case, I used Exchange Role Based Access Control (RBAC) to create custom least-privilege roles for the support team.  I wrote my own front-end proxy functions to leverage commands coming from the implicit EMS session, then used Invoke-Command against the custom endpoint when needed.  One of these days I’ll get around to finding a fix for that so I can do the whole implementation through implicit remoting, but for now this works really well.

The endpoint is secured in a few different ways:

  1. An ACL locks it down so only intended users can access it.
  2. The startup script stores the functions, but it also hides anything you don’t want to expose.  This is how we prevent the privileged account from being misused.
  3. Anything I couldn’t obscure, I checked for and prevented in code.  For example, I’d only allow an account deletion after checking the OU, so the endpoint couldn’t be used to delete a domain admin, service account, or other privileged account.

#3 deserves a little more explanation.  Let’s say I have a function in the endpoint to remove a user account:

	Function Remove-UserAccount
	{
	    Param( [String]$UserName )
	    Try {
	        Remove-ADUser $UserName -Confirm:$false -ErrorAction Stop
	        Write-Host -ForegroundColor Green "Successfully removed $UserName ."
	    }
	    Catch {
	        Write-Host -ForegroundColor Red "An error occurred trying to remove $UserName"
	    }
	}

Obviously this is just an example, but it’s easy to see that by deploying this in an endpoint running under a privileged account, you’ve given the user the ability to remove any account.  I dealt with this by coding checks into risky functions, so the above might look something like:

	Function Remove-UserAccount
	{
	    Param( [String]$UserName )
		
		#Only Allow deletions from the below OU's
		$AllowedOUs = @("ou=regular users,dc=yourcompany,dc=com")
		$User = Get-ADUser $UserName
		
		$OU = $User.DistinguishedName.ToLower().Substring($User.DistinguishedName.IndexOf(",") + 1)
		
		#Exit if not allowed OU…
		If( $AllowedOUs -notcontains $OU) {
		    Write-Host -ForegroundColor Red "$UserName is not in an allowed OU; exiting function!"
		    Return
		}
		
	    Try {
	        Remove-ADUser $UserName -Confirm:$false -ErrorAction Stop
	        Write-Host -ForegroundColor Green "Successfully removed $UserName ."
	    }
	    Catch {
	        Write-Host -ForegroundColor Red "An error occurred trying to remove $UserName"
	    }
}

One other point.  In my organization I ended up creating the endpoint on quite a few systems to service different locations.  Of course for security reasons we need to change the password on the RunAs account pretty frequently and going to each host became a real chore.  I wasn’t able to make the change through remoting itself, but I did find one thing that makes it a little easier.

In theory you can use Set-PSSessionConfiguration -RunAsCredential <Pre-created Credential Object>, but that’s never worked for me.  Instead, I quickly unregister the endpoint then re-register it, using the SecurityDescriptorSddl string that already exists on the object to skip the security dialog.

$Sddl = (Get-PSSessionConfiguration Support).SecurityDescriptorSddl

Unregister-PSSessionConfiguration -Name Support -Confirm:$false

Register-PSSessionConfiguration -Name Support -RunAsCredential PrivUser@MyDomain.local -StartupScript C:\PSSessionConfigs\Support.Startup.ps1 -SecurityDescriptorSddl $sddl -Confirm:$false

Restart-Service WinRM -Confirm:$false

You’ll still get prompted for the password, but you won’t have to click around in the security dialog.

I should point out there are other ways to create constrained endpoints.  In particular, you can create a session configuration file using the New-PSSessionConfigurationFile cmdlet, which similarly limits visible commands.  I chose to go with the startup script because it offered more granular control; for example, a configuration file couldn’t lock down a command like Remove-ADUser the way I described earlier.
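For comparison, here’s a minimal sketch of the configuration-file route; the paths, endpoint name, and functions script are placeholders, and the visibility rules are declarative rather than something you can put logic behind:

New-PSSessionConfigurationFile -Path 'C:\PSSessionConfigs\Support.pssc' `
    -SessionType RestrictedRemoteServer `
    -LanguageMode NoLanguage `
    -ScriptsToProcess 'C:\PSSessionConfigs\Support.Functions.ps1' `
    -VisibleFunctions 'Test-SupportEndPoint'

Register-PSSessionConfiguration -Name SupportAlt -Path 'C:\PSSessionConfigs\Support.pssc' `
    -RunAsCredential PrivUser@MyDomain.local -Confirm:$false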

I’ll delve into endpoints more deeply in a future post.  The capabilities have evolved quite a bit since I first built this, but this story was basically to illustrate a real use case and implementation.  PowerShell had everything we needed to quickly and robustly solve a very serious problem.  Furthermore, this has worked so well it’s stood the test of time: we implemented it more than 4 years ago and have only built upon the original work.  Many organizations end up buying 3rd-party products to integrate their provisioning processes, but we have no need for one.  This was truly a case where Anything PoSh-able | Everything PoSh-able.

Here are some more resources on PowerShell’s remoting features.

  • Introduction to PowerShell Endpoints – MVP Boe Prox guest blogging for the Scripting Guys.  This was the most useful reference I found at the time, it was invaluable.
  • Secrets of PowerShell Remoting – a small e-book from the DevOps Collective.  Don Jones and Tobias Weltner are the principal authors, with Dave Wyatt & Aleksandar Nikolic contributing.
  • Once again, I did this back in 2015, and I think the JEA terminology had just come into use.  While researching this article, I found MS has continued developing these concepts, which definitely means I’ll have to write a follow-up.  At any rate, check out the JEA section of the PowerShell documentation.

Once again, this isn’t meant to be a comprehensive walkthrough.  However, for the beginner, or another practitioner trying to address specific issues, I hope this demonstrates the utility PowerShell can bring to your skill set and, by extension, your organization.

I’m seriously hoping to get some comments on this one. I’m sure there are some tweaks or best practices I may have missed. So, as always, I’d love to get some feedback.  Comment, click follow, or grab the RSS feed to get notifications of future posts.