Working with XML? Don’t forget about XPath

.Net and ASP.Net applications use XML-based configuration files to store and manage their configuration data.  The files are typically stored with the application files and aptly named *.exe.config or web.config respectively.  MS Exchange is largely written in managed .Net code and not surprisingly stores executable and web service configuration data in just such files.  In fact, the %ExchangeInstallPath%\bin folder has dozens of *.exe.config files presumably storing a great many configuration options.

While for the most part Exchange’s default configurations are fine and the standard advice is to avoid changes, modifications are sometimes needed.  Having excellent XML support, PowerShell is an ideal tool to deploy config file changes both initially and in build/update cycles.

This post will demonstrate/discuss techniques I use to modify an Exchange configuration file.  However, XML is a standard, therefore these techniques are by no means limited to the task at hand.  Effectively, this discussion is a single example, the principles of which can be reused to perform similar work on almost any XML document.

In this case, I need to modify Exchange BackPressure thresholds on all my servers.  The settings are controlled by the EdgeTransport.exe.config file, partially documented here.  Among other things, .Net config files typically store a series of <add ../> nodes with key/value pairs as attributes, formatted similar to below. I’ll be working within the same <appSettings> element/level, but again these techniques should be relevant at any level of any XML document.

<appSettings>
  <add key="AgentLogEnabled" value="true" />
  <add key="ResolverRetryInterval" value="30" />
  <add key="DeliverMoveMailboxRetryInterval" value="2" />
  <add key="ResolverLogLevel" value="Disabled" />
  . . .
</appSettings>

Note: The terminology can be confusing. Hereafter, pay careful attention to the value of the key attribute versus the value attribute, versus the value of the value attribute.

Per the documentation I needed to set the following 4 keys and their respective values:

UsedVersionBuckets.LowToMedium
UsedVersionBuckets.MediumToHigh
UsedVersionBuckets.HighToMedium
UsedVersionBuckets.MediumToLow

Dealing with redundant XML node names is always a little tricky.  I’ll have to isolate the correct node before I can assert the value.  Complicating matters further, these keys aren’t present in the file by default.  So, I have to ensure the nodes exist before I can isolate them to assert the value attribute.

First Revision:

$ConfigFile = Join-Path $env:ExchangeInstallPath "Bin\EdgeTransport.exe.config"
$Config     = [XML](Get-Content $ConfigFile)

$ConfigPairs = [Ordered]@{
    'UsedVersionBuckets.LowToMedium'  = "1998"
    'UsedVersionBuckets.MediumToHigh' = "3000"
    'UsedVersionBuckets.HighToMedium' = "2000"
    'UsedVersionBuckets.MediumToLow'  = "1600"
}

$appSettings = Select-XML -Xml $Config -XPath "/configuration/appSettings"

If( $appSettings.Count -eq 1 ) {    
    $appSettings  = $appSettings.Node # Reset appSettings to the returned node
    $ExistingKeys = $appSettings.ChildNodes.Key
    ForEach( $Key in $ConfigPairs.Keys )
    {
        If( $Key -notin $ExistingKeys ) {
            # Add the node:
            $Add = $Config.CreateElement('add')
            $Add.SetAttribute( 'key', $Key )
            $Add.SetAttribute( 'value', $ConfigPairs[$Key] )
            [Void]$appSettings.AppendChild( $Add )
        }
        Else {
            # Node already exists, just set:
            $ConfigKey = $appSettings.ChildNodes | Where-Object{ $_.Key -eq $Key }
            $ConfigKey.value = $ConfigPairs[$Key]
        }
    }
}
Else {
    Write-Host "appSettings node not found!"
}

$Config.Save($ConfigFile)

For ease of reference, I stored the key names and values in an ordered hash table.  Although not required, the ordered hash ensures the sequence of the resulting XML output. 

Note: The order of XML elements usually doesn’t matter, so the [Ordered] hash only ensures the order of new elements among themselves. New elements are appended to the given section. In this case, the 4 elements are new and will ultimately be the last 4 elements within the <appSettings> section. If an element already exists it will be modified in place.

By looping through the dictionary entries I can check if each key is present and create the nodes as needed. But, when the key is already present, I still need to isolate it using a Where{} clause before I can set the value.  This code works fine and is certainly satisfactory for the task at hand.  Nevertheless, running Where{} several times across what could be dozens of elements is inefficient.  Performance isn’t a big concern for this type of project, but the code could be better! Furthermore, given how common XML-related tasks are, refactoring was definitely worth my time.

WARNING: Pay close attention to casing. XML is case sensitive! In an earlier version of the code and this post I had inadvertently capitalized the value attribute.  That caused Exchange Transport Service stop/start failures.  It took me hours to spot the oversight. Furthermore, had this been fully deployed it could have affected organization-wide mail flow.

Second Revision:

$ConfigFile = Join-Path $env:ExchangeInstallPath "Bin\EdgeTransport.exe.config"
$Config     = [XML](Get-Content $ConfigFile)

$ConfigPairs = [Ordered]@{
    'UsedVersionBuckets.LowToMedium'  = "1998"
    'UsedVersionBuckets.MediumToHigh' = "3000"
    'UsedVersionBuckets.HighToMedium' = "2000"
    'UsedVersionBuckets.MediumToLow'  = "1600"
}

$appSettingsPath = '/configuration/appSettings'

ForEach( $Key in $ConfigPairs.Keys )
{
    $Node = Select-XML -Xml $Config -XPath "$appSettingsPath/add[@key='$Key']"
    If( $Node ) {
        # Element exists:
        $Node = $Node.Node
        $Node.value = $ConfigPairs[$Key]
    }
    Else {
        $appSettings = Select-XML -Xml $Config -XPath $appSettingsPath
        If( $appSettings ) {
             # Create the node:
            $appSettings = $appSettings.Node
            $Add = $Config.CreateElement('add')            
            $Add.SetAttribute( 'key', $Key )
            $Add.SetAttribute( 'value', $ConfigPairs[$Key] )
            [Void]$appSettings.AppendChild( $Add )
        }
        Else {
            Write-Host -ForegroundColor Red "appSettings node doesn't exist!"
        }
    }
}

$Config.Save($ConfigFile)

Note: I reversed the logic because after initial deployment, I expect the nodes will exist more often than not.

Note: The .SelectSingleNode() and .SelectNodes() methods can be used in place of the Select-Xml cmdlet; like Select-Xml, the XPath queries they take are case sensitive.
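For reference, here's a minimal sketch of the same lookup using the DOM method on the $Config document from above (the key and value are shown literally for clarity):

$Node = $Config.SelectSingleNode( "/configuration/appSettings/add[@key='UsedVersionBuckets.LowToMedium']" )
If( $Node ) { $Node.value = '1998' }   # same lowercase 'value' attribute as in the script above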

The second example leverages XPath more aggressively, obviating the need for a Where{} clause by directly checking for the key’s presence. Each loop iteration evaluates an expanding string to substitute the current value of $Key into the XPath query executed by Select-Xml.  Then, conditioned on the presence or lack of a result, the code either asserts the value attribute or creates the needed element.

XPath is succinct but somewhat opaque. I use it enough to have it in my toolbox, but too infrequently to be fluent.  To help craft the queries I sometimes use the XML Tools extension for VSCode.

The add-in has a neat feature that reveals the XPath query respective to the cursor position in the document. Simply position the cursor on the element you’re interested in, invoke the command palette (Ctrl + Shift + P), and start typing “XML Tools: Get Current XPath”.

Hit enter and the XPath query string /configuration/appSettings/add[70]/@key is returned.  As partly indicated by the common array index syntax […], the pseudo-code translation can be read as a directive to return the value of the key attribute at the 70th instance of the <add> element within the <appSettings> section. This feature is granular enough to reveal the XPath statement down to the node or attribute level, depending on where you position the cursor.  While the returned string isn’t exactly what’s needed, it is a head start to crafting the more specific query below:

/configuration/appSettings/add[@key='UsedVersionBuckets.LowToMedium']

Adjusted per each iteration, the above XPath query is a directive to return the <add> element where the key attribute is present and has the sought-after value.

Conclusion:

XPath is a powerful path syntax and query language for selecting nodes from XML data.  Properly crafted XPath queries can be used to simplify and streamline your code.  Combined with other PowerShell features and techniques, we can create concise, efficient and reusable code patterns for many different scenarios.  In this case, I used an ordered hash table to store the desired configuration elements, making it very easy to check for, create and/or assert values in a short, easy-to-comprehend set of statements.



March 2021 Applying MS Exchange 0-Day Patches

On March 2nd Microsoft released Exchange Server Security Updates to address several 0-day exploits targeting Exchange servers. The vulnerabilities, updates & mitigations have been covered thoroughly by Microsoft et al., so I’m not going to rehash them here.  If you need additional information, please check the references section at the end of this post. I’ll do my best to keep it updated as the situation is still evolving.

I began planning the update deployment almost immediately, however, there were some rumblings in the community about installation issues.  Several people in this Reddit post described issues, including outright failures and services being left disabled and/or not starting after the installation.

As you might expect, I tested the patch in a lab environment and indeed quite a few services were left in a disabled state.  So, this post is about how I corrected the issue.

I ran the package from the command line with msiexec.exe /Update <Path> /passive /promptrestart.  Despite the arguments the server rebooted without prompting.  After the reboot, quite a few services were stopped and disabled.

DisplayName                                          StartType  Status
-----------                                          ---------  ------
Application Identity                                  Disabled Stopped
Computer Browser                                      Disabled Stopped
IIS Admin Service                                     Disabled Stopped
Internet Connection Sharing (ICS)                     Disabled Stopped
Microsoft Exchange Active Directory Topology          Disabled Stopped
Microsoft Exchange Anti-spam Update                   Disabled Stopped
Microsoft Exchange DAG Management                     Disabled Stopped
...
Microsoft Exchange Unified Messaging                  Disabled Stopped
Microsoft Filtering Management Service                Disabled Stopped
NetBackup SAN Client Fibre Transport Service          Disabled Stopped
Performance Logs & Alerts                             Disabled Stopped
Remote Registry                                       Disabled Stopped
Routing and Remote Access                             Disabled Stopped
ScanMail EUQ Monitor                                  Disabled Stopped
Smart Card                                            Disabled Stopped
SSDP Discovery                                        Disabled Stopped
Tracing Service for Search in Exchange                Disabled Stopped
UPnP Device Host                                      Disabled Stopped
Windows Management Instrumentation                    Disabled Stopped
World Wide Web Publishing Service                     Disabled Stopped

Note: For brevity, some Exchange services were truncated from the above table.

Notice it wasn’t just Exchange services.  For example, the IIS Admin Service and WMI were both disabled.  From the above, I couldn’t tell with certainty which services were disabled by the update installer or what their original start modes were.

To correct this I decided to compare the disabled services to the services on an unaffected Exchange server. On the affected server I ran:

Get-Service | 
Where-Object{ $_.StartType -eq 'Disabled' } |
Export-Csv -Path 'C:\Temp\BadServiceState.csv'

I took that file to an unaffected server and ran:

Import-Csv -Path 'C:\Temp\BadServiceState.csv' |
Get-Service |
Export-Csv -Path 'C:\Temp\GoodServiceState.csv'

Finally, to fix the services, I returned to the troubled server and ran the below loop:

Import-Csv -Path 'C:\Temp\GoodServiceState.csv' |
ForEach-Object{ Set-Service $_.Name -StartupType $_.StartType }

At this point, all the startup modes were correct, but I didn’t have a quick way to start the services.  I didn’t want to spend the time tracing out the dependencies to ensure everything would start.  So, I simply let an additional reboot take care of it for me.

After the reboot I reapplied the patch for good measure. This time I ran it via the GUI and had no issues.

To further diagnose the issue I took a quick look at the file C:\ExchangeSetupLogs\ServiceControl.log. The log lists all the services that are stopped and disabled in a format similar to below.

	[08:58:17] Stopping service 'hostcontrollerservice'.
	[08:58:50] Stopping service 'FMS'.
	…
	[08:58:52] Disabling service 'FMS'.
	[08:58:52] Disabling service 'hostcontrollerservice'.
	…

However, the log does a poor job of showing the service configuration prior to the installation.  The process interrogates all services not just those that were changed, making it difficult to parse the file for relevant data. So, instead, I grabbed 2 files from the C:\ExchangeSetupLogs folder while the installer was running.

ServiceStartupMode.xml - Records the service startup configurations prior to the install.
ServiceState.xml - Records the service state prior to the install.

Apparently these files are used in the last stages of the installation to return service configurations to normal.  Unfortunately, the files are removed at the end of even a faulty install, but if you can grab them during the install you can use a little PowerShell magic to right the ship afterward.

Both files are formatted as Common Language Infrastructure (CLI) XML representations of native PowerShell objects.  In fact, it’s likely these files were created using the Export-CliXml cmdlet.  This is the same type of XML serialization used by PowerShell remoting to communicate objects over the wire.  As such, they are very easy to import and work with in another PowerShell console.

ServiceStartupMode.xml stores an array of hash tables with Name & StartMode keys (see the note on StartMode below).  I presume these are used by the installation as splat parameters for the Set-Service cmdlet, so one way to leverage the file is:

Import-Clixml <PathTo_ServiceStartMode.xml> | 
ForEach-Object{ Set-Service @_ -ErrorAction SilentlyContinue }

Now, if you reboot the server the services should start the same as before.

Because you’re passing an array of hash tables down the pipeline you can use @_ as the current pipeline element.  Set-Service will treat that as typical splatting.
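A minimal illustration of that splatting behavior (the service name and startup type here are just placeholders):

@{ Name = 'Spooler'; StartupType = 'Automatic' } |
ForEach-Object{ Set-Service @_ -WhatIf }   # @_ splats the current hashtable's keys as parameters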

Note: The file contains information from all services not just the ones modified by the installer.  Since there are some services that can’t be changed, the -ErrorAction SilentlyContinue argument will spare you from profuse error output.

ServiceState.xml stores the actual, albeit serialized ServiceController objects.  As I mentioned before due to service dependencies it would take some work to use this file for corrective action.  However, it may be useful for reporting or other diagnostics.

Of course, attempting to capture these files mid-install is a little inconvenient.  As an alternative PowerShell makes it very easy to capture the same data. You can run the below code to generate the files before running the install package.

Get-Service |
ForEach-Object{
    @{
        Name        = $_.Name
        StartupType = $_.StartType
    }
} |
Export-Clixml -Path c:\temp\ServiceStartupModes.xml

Get-Service | 
Export-Clixml -Path C:\temp\ServiceState.xml

Note: The MS version of the ServiceStartupModes.xml file uses “StartMode” as the key. “StartMode” is an alias for the -StartupType parameter in the Set-Service cmdlet. An earlier version of this post used “StartType” which is also an alias but only in PowerShell Core 7.x. Ergo, I decided to forego the aliases and use the actual parameter name “StartupType”. However, notice the value is still $_.StartType, the property from any given service.

I also spoke with Microsoft Support and asked them if, in future patch releases, they could retain the ServiceStartupMode.xml & ServiceState.xml files in the C:\ExchangeSetupLogs folder.  It’s a simple change that could make a huge difference while working in semi-crisis 0-Day patching scenarios.

MS Support also mentioned some service start issues being linked to missing .DLL files in the /bin folder. They suggested exporting a directory listing of the /bin folder to a text file. You can then compare that listing against one from a known good server to figure out which files are missing and copy them back.

Quick PowerShell command to export a file listing:

Get-ChildItem "$($env:exchangeinstallpath)bin" -Recurse -Include "*.dll", "*.exe" | 
Select-Object -ExpandProperty FullName | 
Set-Content c:\temp\DLLlist.txt
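For the comparison itself, here's a sketch using Compare-Object (the file names below are placeholders for listings exported from the affected server and a known good one):

$Good = Get-Content 'C:\temp\DLLlist_GoodServer.txt'
$Bad  = Get-Content 'C:\temp\DLLlist_BadServer.txt'
Compare-Object -ReferenceObject $Good -DifferenceObject $Bad |
    Where-Object{ $_.SideIndicator -eq '<=' } |    # paths present on the good server but missing from the bad one
    Select-Object -ExpandProperty InputObject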

Hopefully these quick & dirty tricks will help you get through this update cycle a little easier. Feedback is always welcome; comment, click follow or grab the RSS feed to get notifications of future posts.

Additional Resources Regarding Recent Exchange 0-Day Exploits:

PowerShell Performance Part 2, Reading Text Files

This is part 2 of my informal blog series on PowerShell performance.  In part 1 I discussed some strategies for measuring performance.  In part 2 I’ll be covering file read performance and related techniques and use cases.  Because of the volume of information, I’ll cover writing file data in part 3.

Working with text files is fundamental.  Tasks like reading and parsing log files are exceedingly common in both interactive and programmatic scenarios.  It’s no surprise a lot has already been written about PowerShell performance in this area.  My goal here is to conduct a comprehensive study of file read techniques to determine the best options in different situations.  As the title implies I’m particularly interested in performance but code readability and memory utilization will also be considered.

PowerShell’s primary tool for reading text files is the Get-Content (GC) cmdlet.  Like many native cmdlets, GC offers broad capabilities. For example, it can easily read different encodings including non-text data.  No surprise, GC’s flexibility comes with a performance penalty; it’s earned a reputation for being quite slow.  As such, a number of alternate techniques have gained popularity, especially those that directly leverage .Net classes.

Study Methodology:

As described in Part 1, I don’t want to rely on a single measurement.  So, I ran each technique through a 10-iteration loop.  Those techniques that generate a single string were re-run through another 2 loops: the first using the -split operator and the other using the .Split() method.  Get-Content can return both types, but defaults to an array, so I wanted to ensure comparison of like return types while including the typical expectation.  The data should be sufficient to pick the fastest approach for the desired output type.

Note/Warning: .Split() will split on every character in its argument.  Therefore, splitting on the default Windows line ending results in unintended empty elements.  To compare fairly with the -split I included the [System.StringSplitOptions]::RemoveEmptyEntries argument in the tests.  However, that will also remove naturally occurring blank lines; a potential problem if you are expecting and/or need them. I included the .Split() variations because it still works well where blanks aren’t an issue, which is often the case with text logs. 
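A quick illustration of that difference, assuming Windows CRLF line endings (a minimal sketch, not part of the test runs):

$Text = "line1`r`nline2`r`n`r`nline4"
( $Text -split "`r`n" ).Count                                                    # 4 - the blank line survives as an empty element
$Text.Split( "`r`n", [System.StringSplitOptions]::RemoveEmptyEntries ).Count     # 3 - the blank line is removed along with the empties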

Test files were created by copying data from an IIS log file into 100KB, 2.5MB, 25MB, 50MB, 100MB and 1GB files. I maintained ASCII encoding throughout.

I ran each test in a fresh PowerShell console window.  Seeing as there’s overlap between command permutations and/or .Net classes I didn’t want any of the caching functionality mentioned in Part 1 to skew the results.

To evaluate the impact on memory, I monitored the \Process\Private Bytes counter for each run.

Note: All tests were performed with PowerShell 5.1.


Here are the techniques I tested and their respective test code:

  • Get-Content
1..10 | ForEach{ (Measure-Command { Get-Content $file }).TotalMilliseconds }
  • Get-Content -Raw

    Returns a single string including line ending characters.  As mentioned the -Raw parameter will be retested with the additional splits.
1..10 | ForEach{ (Measure-Command { Get-Content  $file -Raw }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { (Get-Content  $file -Raw) -split "`r`n" } ).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { (Get-Content  $file -raw).split("`r`n", [StringSplitOptions]::RemoveEmptyEntries)}).TotalMilliseconds}
  • Get-Content -ReadCount 0

The -ReadCount parameter determines how many lines are passed down the pipe at a time.  -ReadCount 0 will pass all lines down the pipe at once.  This generally precludes cleanly placing | ForEach-Object{} directly after the Get-Content cmdlet, because $_ will actually be an array consisting of however many objects were specified with -ReadCount. This method is fine if you need to store the data in a variable.

1..10 | ForEach{ (Measure-Command { Get-Content $file -ReadCount 0 }).TotalMilliseconds }
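For reference, here's a sketch of how you might still pipe with -ReadCount (the batch size of 1000 is arbitrary); each pipeline object is an array of lines, so a nested loop handles the individual lines:

Get-Content $file -ReadCount 1000 |
ForEach-Object{
    ForEach( $Line in $_ ) {   # $_ is an array of up to 1000 lines, not a single line
        # process $Line here...
    }
}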
  • [System.IO.File]::ReadAllLines()

Reference: MS Documentation

The System.IO.File class offers functionality for working with files.  The ReadAllLines static method is particularly useful and has been my go-to alternative for quite a while.  It returns a string array ([String[]]) which is operationally equivalent to Get-Content‘s [Object[]] return. So, notwithstanding the break from verb-noun syntax, it’s an easy drop-in alternative.

1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllLines( $file ) }).TotalMilliseconds }

Note: Shorthand below may refer to this as [IO.File]::ReadAllLines() or just ::ReadAllLines()

  • [System.IO.File]::ReadAllText()
    Like GC -Raw, this will read the entire file into memory as a single string, including the line break characters.  So, it too will be tested with the additional splits.
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ) }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ) -split "`r`n" }).TotalMilliseconds }
1..10 | ForEach{ (Measure-Command { [System.IO.File]::ReadAllText( $file ).Split("`r`n",[StringSplitOptions]::RemoveEmptyEntries) }).TotalMilliseconds }

Note: Shorthand below may refer to this as [IO.File]::ReadAllText() or just ::ReadAllText()

  • System.IO.StreamReader object using the .ReadLine() method

Reference: MS Documentation

StreamReader reads a stream of bytes as text. Usually, it’s more verbose than other techniques.  It’s not as neat as ::ReadAllLines() but it’s a common and well-advertised alternative to Get-Content.  Using StreamReader generally follows a loop pattern common to many languages.  Once the file is open, read and processing commands are placed in a loop that steps through each line until the EndOfStream property evaluates to true, with the .Close() method executed immediately after.

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
While( !$Stream.EndOfStream ) { 
	$Stream.ReadLine()
	# Do some other stuff with the data…
}
$Stream.Close() } ).TotalMilliSeconds }

This pattern doesn’t return an array and cannot be piped. Of course that makes it a little more difficult to work with incoming lines. In practice, you’d probably assign the incoming line to a variable to work with it further. You can easily store the output in a variable to facilitate piping, but I’d only do so if it was already a requirement. It’s slower and more memory intense so if it’s just for piping you’re better off doing the work in the existing loop.
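In practice the loop body usually assigns each incoming line to a variable; a minimal sketch (the -match filter is just a stand-in for real per-line work):

$Stream = [System.IO.StreamReader]::new( $file )
While( !$Stream.EndOfStream ) {
    $Line = $Stream.ReadLine()
    If( $Line -match 'Error' ) { $Line }   # stand-in for whatever per-line processing you need
}
$Stream.Close()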

Note: Shorthand below may refer to this as $Stream.ReadLine() or just .ReadLine()

  • System.IO.StreamReader object using the .ReadToEnd() method
1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd()
$Stream.Close() } ).TotalMilliSeconds }

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd() -split "`r`n"
$Stream.Close() } ).TotalMilliSeconds }

1..10 | ForEach{ (Measure-Command {
$Stream = [System.IO.StreamReader]::new( $file )
$Stream.ReadToEnd().Split("`r`n", [StringSplitOptions]::RemoveEmptyEntries )
$Stream.Close() } ).TotalMilliSeconds }

Note: Shorthand below may refer to this as $Stream.ReadToEnd() or just .ReadToEnd()


Observations:

The study confirms Get-Content is quite a bit slower than the other methods, but there are some other very interesting observations. Below, I graphed some data from the 100MB file tests:

Note: I chose to display 100MB results because the graph seems a better representation. With the smaller files, relatively small differences were over-represented.

Note: Above, green are techniques that return an array, blue are single string returns and red are single string returns split after the fact.

Of those techniques that return an array, Get-Content is by far the slowest, taking 1271ms. .ReadLine() & ::ReadAllLines() averaged 585 & 675ms. That’s a significant difference that could really add up when processing many files. Get-Content -ReadCount 0 performed better but was still way behind both the .Net approaches which were respectively ~200/100ms faster.

I was surprised by the difference between the 2 .Net approaches above. I’ve always favored ::ReadAllLines() because it’s so easy to use in typical PowerShell code.  Whenever I’ve read about StreamReader I’d do a quick test and ::ReadAllLines() was always faster.  Now, looking at my results across file sizes, it seems [IO.File]::ReadAllLines() is faster for smaller files, but the $Stream.ReadLine() method is faster for “larger” files. Take a look at the below table.

File Size   [System.IO.File]::ReadAllLines() (ms)   StreamReader .ReadLine() (ms)
---------   -------------------------------------   -----------------------------
100KB       1.33                                    2.05
2.5MB       14.14                                   16.60
25MB        169.74                                  153.60
50MB        329.51                                  290.87
100MB       675                                     585

This is an interesting find because it offers some logic on which technique to use when. If you’re processing many small files ::ReadAllLines() may perform better. If you’re dealing with larger files you may want to accept slightly more complex code to implement the StreamReader. Either way, both approaches are valid and perform far better than Get-Content.
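As a rough sketch of that decision, something like the below could pick the reader based on file size (the 25MB cutoff is an arbitrary placeholder, not a measured break-even point):

Function Read-TextFile {
    Param( [String]$Path, [Int64]$Cutoff = 25MB )   # cutoff is a placeholder, not a measured break-even
    If( (Get-Item $Path).Length -lt $Cutoff ) {
        [System.IO.File]::ReadAllLines( $Path )
    }
    Else {
        $Stream = [System.IO.StreamReader]::new( $Path )
        While( !$Stream.EndOfStream ) { $Stream.ReadLine() }
        $Stream.Close()
    }
}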

Of course, I don’t know how these observations would play out in a larger program. $Stream.ReadLine() requires a loop. Assuming you pack further operations into the same loop the only additional overhead is from those operations. Any additional overhead needed to loop with [IO.File]::ReadAllLines() is not accounted for in these tests.

Given the admittedly arbitrary file sizes, more testing is necessary to determine where the performance advantage flips. Moreover, I’d like to see how this plays out in more realistic scripts. I’ll post a follow-up with that information as soon as I can pull it together.

The .Net methods that return a single string are the fastest overall. They perform similarly to one another. ::ReadAllText() outperformed .ReadToEnd() by a mere 15ms (407 Vs. 422ms) . Both .Net methods are very good alternatives to Get-Content -Raw which clocked in at 1049ms – ~2.5x slower!

Not surprising, but splitting the string after the fact added significant overhead. If you need an array ::ReadAllText() & .ReadToEnd() aren’t the best options. Unbelievably, and despite the extra overhead, when using the .Split() method both .Net methods were still faster than Get-Content alone.

Another revelation from these tests: .Split() consistently outperformed the -split operator. This was true across all tested sizes, but the differences were modest on smaller files and exaggerated on larger ones. This seems to indicate splitting larger strings is faster using .Split(), but this too calls for a follow-up post. I’d like to re-test the 2 split techniques independent of file read operations. Some use cases may allow splitting on a single newline character so I also want to see how .Split() performs without removing the empties.

Memory Considerations:

Memory is a concern, particularly when processing many large files. Obviously, the techniques that return a single string used the most memory, but there were still some surprises.

Note: These are peak measurements taken from perfmon during each test.

All the sessions started out using ~72MB. Get-Content & $Stream.ReadLine() had no detectable impact on memory! I was surprised to see that [IO.File]::ReadAllLines() used about 525MB.

I expected the techniques that return single string to use the most memory. Indeed Get-Content -Raw consumed 1.3GB even before splitting. However, $Stream.ReadToEnd() & [IO.File]::ReadAllText() were more modest at ~525MB. Get-Content -ReadCount 0 used ~600MB most likely because it has to pass all the file’s lines down the pipeline.

Memory is generally not a concern. PowerShell relies on .Net to manage memory through background garbage collection which frees unused memory either when needed or on a schedule. Different underlying collection behaviors may explain some of these disparities, particularly between .ReadLine() & ::ReadAllLines(). However, the larger the file the greater the risk of memory exhaustion.

All the methods that return a single string ran out of memory trying to read a 1GB file. This was true even when >3GB was available. Secondary testing showed storing the output in RAM required 3-4x the file size. Thankfully, even if you had a use case for single strings, you could certainly adapt one of the more memory friendly methods.


Conclusion:

The most glaring and unfortunate conclusion is that Get-Content is still unacceptably slow. Comparatively, Get-Content underperformed in all use cases and permutations. However, PowerShell’s ability to utilize .Net classes offers a rich set of alternatives that cover pretty much any file read scenario.

It’s healthy to revisit old assumptions once in a while. Obviously I knew a bit about this topic beforehand, but going through a formal experiment uncovered some new information and questions. I’ll be writing an addendum soon to address the following points:

  1. [IO.File]::ReadAllLines() & $Stream.ReadLine(): Is the former faster for smaller files and the latter faster for larger ones? And if so, at what point does it flip? In other words, define large & small in this context.
  2. Determine if garbage collection is impacting the performance differentials between [IO.File]::ReadAllLines() & $Stream.ReadLine().
  3. Additional StreamReader examples & code patterns, merits & demerits of different approaches.
  4. A separate experiment to determine the performance difference between .Split() and -split. Evaluate the additional impact of [System.StringSplitOptions]::RemoveEmptyEntries.

As always, I’d love to get some feedback.  Comment, click follow or grab the RSS feed to get notifications of future posts.

PowerShell Performance Part 1, Introduction:

Comparatively speaking, PowerShell’s concise and flexible syntax translates to faster development, but usually not faster programs.  From its earliest days, one of PowerShell’s most annoying drawbacks has been its comparatively poor performance.

I typically start script projects as a purist, leveraging native functionality, cmdlets, pipes, where clauses etc.  All the stuff that makes PowerShell code easy to write. Frequently the elation of quickly building a working prototype is overshadowed by unsatisfactory performance, and what follows is a time-consuming effort to refactor the code.  There may be better workflows to implement, but the issues usually boil down to PowerShell itself.  In turn, this forces a regressive break from the purist approach. Mining for performance results in a proliferation of increasingly exotic techniques within the code.

Refactoring is rather unfortunate because it erodes some of PowerShell’s eloquence.

Hazards of Refactoring for Performance (An incomplete list for sure):

  1. The resulting time deficit is the biggest problem, encompassing the additional effort to improve performance and generally more difficult maintenance.
  2. Code can become less readable.  Resulting code can end up so far from PoSh norms that it’s difficult even for the author to revisit.  Comments can help, but it’s difficult to relay the reasoning behind hard fought code.
  3. It’s hard to foresee at the outset, but you may only yield so much improvement.  Hence you can get lured into a time-consuming yet relatively fruitless effort.

I’ve spent a lot of time on these types of problems; in fact, I’m a little obsessed with maximizing performance.  This is the first in a series of posts about various tips & tricks I use to make things a bit faster.

It goes without saying that to analyze performance we need a method of measuring it.  PowerShell provides the cmdlet Measure-Command for this.  Measure-Command makes it easy to gauge the performance of almost any block of code.  Here I’m going to focus on simple commands, but keep in mind you can do this on much bigger script blocks.  So if you’re testing a different code pattern as opposed to a simple command, Measure-Command is still quite useful.

Measure-Command { Get-Process Explorer }
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 8
Ticks             : 81151
TotalDays         : 9.39247685185185E-08
TotalHours        : 2.25419444444444E-06
TotalMinutes      : 0.000135251666666667
TotalSeconds      : 0.0081151
TotalMilliseconds : 8.1151

Performance usually becomes a factor when processing large collections of data, where a relatively small performance difference can accumulate into a sub-optimal run duration.  As such, we’ll often be comparing granular pieces of code, but this isn’t as straightforward as Measure-Command would have us believe.  One thing to consider is the reliability of the measurements.  After exalting the awesomeness of Measure-Command that may seem like a silly concern, but there are circumstances where it returns inconsistent results.  This should be easy to address: repeat the tests and compare averages.  In other words, use a larger sample size.  Nevertheless, there are some anomalies I’ve never been comfortable with.

To get a larger sample size you can wrap a particular test in a loop, something like:

For($i = 1; $i -le 20; ++$i)
{ ( Measure-Command { Get-Service AtherosSvc } ).TotalMilliseconds }

I ran this test 3 times, but the results were questionable:

Note: Just to make sure, I repeated this with several different common cmdlets & sessions:

  1. Get-Service
  2. Get-Process
  3. Get-ChildItem
  4. Write-Host

Obviously something isn’t right here.  The first run is way slower than the others.  A few days ago at MS Ignite 2019, I spoke with PowerShell team members Jason Helmick & Tyler Leonhardt.  They confirmed my long-held suspicion that this is due to a caching mechanism.  That’s probably a good feature, but something to be aware of when you’re measuring performance.

This doesn’t really invalidate the test either.  If you are comparing 2 different commands generally the first run of each will be predictive enough.  However, I always track the results over some number of executions.  Furthermore, this appears to be specific to the cmdlet (or more likely the underlying .Net classes) not the command. Hence if you run Get-Service Service_1, then Get-Service Service_2, the second command should show the faster result.
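For example, a quick way to see the effect for yourself in a fresh session (the service names are arbitrary placeholders):

(Measure-Command { Get-Service WinRM }).TotalMilliseconds   # first use of Get-Service in the session - includes the warm-up
(Measure-Command { Get-Service BITS }).TotalMilliseconds    # different service, same cmdlet - typically faster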

As of yet, I’m not sure how long the cmdlet stays cached, but considering this is a performance scenario centered on repetition it’s not a big concern.  Granted this lends itself to some ambiguity, but I’ll try to resolve that in a later post.

Additional Guidelines I use when Evaluating Performance:

  1. Don’t look at performance in isolation, whether we’re talking about PowerShell or anything else. Performance is based on a variety of environmental factors, including CPU, disk, and memory, which are important to consider.
  2.  Always consider expected runtime conditions.  Say you choose a faster but more memory intensive piece of code.  If you then run it on a memory constrained system you may get unexpected results.
  3. Stay current!  The PowerShell team is aware of performance issues and they’ve been steadily improving the native cmdlets.  This is especially true since PowerShell went open source. So, if you’ve been using a faster technique on an older version, you may want to recheck it.  It’s possible whatever issue has been resolved, and your code can remain a little more pure.
  4. Think before you refactor.  Just because something is faster doesn’t mean you should use it.  You have to decide the specific tradeoffs.
  5.  Always re-evaluate performance after implementation.  Sometimes despite our best efforts we still miss the mark.  Of course that could mean anything, but don’t draw conclusions based purely on pre-testing.  Besides, it’s always a bit more satisfying to demonstrate improvements in the final product.

This first post on driving PowerShell performance should establish a reasonable, albeit informal framework for testing and validating such improvements.

As always, I’d love to get some feedback.  Comment, click follow or grab the RSS feed to get notifications of future posts.

Removing A Specific Error from the $Error Collection:

This is my first post, and I really struggled with what to write.  I didn’t want to write something heavy straight away.  So, I decided to write about an interesting observation.  In this case, a neat error handling trick, something that’s been there the whole time, but I never thought of leveraging.

Back in the VBScript days a common pattern was to set On Error Resume Next, run some statement(s), then check if Err.Number <> 0, followed by some handling code, often a simple echo, then Err.Clear.

The Err object is a simple mechanism to test for problems, but it only stores the last error.  PowerShell’s more robust error handling includes Try/Catch, which is similar albeit more eloquent.  Both Try/Catch & Err must be present where errors are expected.  However, PowerShell’s $Error collection is different, offering the ability to analyze errors post-run.  $Error can be used to identify unexpected errors that may be the subject of further improvements etc…

Recently, while debugging issues in a very busy script, I wrote $Error to a log file.  This had always worked “well enough”; however, in this case there was too much noise to make use of the output.

I needed to retain unexpected errors while discarding the handled and/or unimportant ones.  It’s reminiscent of VBS’s Err.Clear, but I can’t use PowerShell’s $Error.Clear() because it will clear the entire collection.  Instead I need to tactically remove records from the collection.

Luckily, $Error is an instance of System.Collections.ArrayList.  ArrayList is a really useful class, and well documented elsewhere, so I won’t get in-depth on it here.  However, for this case ArrayList’s .Remove() method can remove a specific object from the collection when passed that object.

For Example:

[collections.ArrayList]$arrList = @( "one","two","three" )
$arrList.Remove( "two" )

Bringing this back to the $Error collection, you can remove the last error with something like:

$Error.Remove($Error[0])

The point in the above example is that $Error[0] is the last ErrorRecord object, so you could go with a pattern like:

#Start with an empty error collection:
$Error.Clear()
Try
{
    #Just to get an error:
    Remove-Variable DoesNotExist -ErrorAction Stop
}
Catch
{
    Write-Host "Caught error, current count: $($Error.Count)"
    # $Error.Remove($_)
    $Error.Remove($Error[0])
    Write-Host "Count after removing last error: $($Error.Count)"
}

One thing to notice is $Error.Remove( $_ ) doesn’t work! That’s a little shocking considering that in PoSh 3.0 and above, $_ in a catch block should be identical to $Error[0]. I’m guessing there’s some kind of referencing going on, but I’ll have to work on figuring that out.

At any rate, you can use this trick to keep your error collection as clean as possible, potentially making post-analysis a lot easier.

So that’s it for my very first post. Let me know what you think in the comments, I’d love to hear from you.