New-DepthGaugeFile.ps1: The Powershell Pipeline Is Neat, But It's Also Slow

If you know me or this blog at all, you know that first and foremost, I think Powershell is awesome.  It is essential to any Windows system administrator or I.T. pro's success.  The good news is that there are a dozen ways to accomplish any given task in Powershell.  The bad news is that eleven of those twelve techniques are typically as slow as a three-legged tortoise swimming through a vat of cold Aunt Jemima.

This is not the first, or the second, blog post I've made about Powershell performance pitfalls.  One of the fundamental problems with super-high-level languages such as Powershell, C#, Java, etc., is that they take raw performance away from you and give you abstractions in return, and you then have to fight the language in order to get your performance back.

Today I ran across another such example.

I'm creating a program that reads from files.  I need to generate a file that has "markers" in it that tell me where I am within that file at a glance.  I'll call this a "depth gauge."  I figured Powershell would be a simple way to create such a file.  Here is an example of what I'm talking about:

Depth Gauge or Yardstick File


The idea being that I'd be able to tell my program "show me what's at byte 0xFFFF of file.txt," and I'd be able to easily visually verify the answer because of the byte markers in the text file.  The random characters after the byte markers are just gibberish to take up space.  In the above example, each line takes up exactly 64 bytes - 62 printable characters plus \r\n.  (In ASCII.)
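To make that concrete, here's a mocked-up excerpt of such a file with 64-byte lines.  The byte offsets are accurate; the filler characters are just made-up gibberish for illustration:

0000000000000000 aZ4!pQ9@rT2#bN7$cV5%dM8^eK3&fJ6*gH1(iL0)xWyUs
0000000000000040 Qw7]e3;Rt9[uI2`oP5{aS8}dF1+gH4=jK6?lZ0<xCvBnM
0000000000000080 n2Hk8Rw4Jt0Pz6Lx1Dq9Sf5Gm3Bc7Vy!@#$%^&*()_+-=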

I reach for Powershell whenever I want to whip up something in 5 minutes that accomplishes a simple task.  And voila:

Function New-DepthGaugeFile
{
    [CmdletBinding()]
    Param([Parameter(Mandatory=$True)]
          [ValidateScript({Test-Path $_ -IsValid})]
          [String]$FilePath, 
          [Parameter(Mandatory=$True)]
          [Int64]$DesiredSize, 
          [Parameter(Mandatory=$True)]
          [ValidateRange(20, [Int32]::MaxValue)]
          [Int32]$BytesPerLine)
    Set-StrictMode -Version Latest
    [Int64]$CurrentPosition = 0
    $ValidChars = @('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
                    'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
                    '0','1','2','3','4','5','6','7','8','9',
                    '!','@','#','$','%','^','&','*','(',')','_','+','-','=','?','<','>','.','[',']',';','`','{','}','\','/')
    
    Try
    {
        New-Item -Path $FilePath -ItemType File -Force -ErrorAction Stop | Out-Null
    }
    Catch
    {
        Write-Error $_
        Return
    }    
    
    [Int32]$BufferMaxLines = 64
    [Int32]$BufferMaxBytes = $BufferMaxLines * $BytesPerLine

    If ($DesiredSize % $BytesPerLine -NE 0)
    {
        Write-Warning 'DesiredSize is not a multiple of BytesPerLine.'
    }


    $LineBuffer = New-Object 'System.Collections.Generic.List[System.String]'
        
    While ($DesiredSize -GT $CurrentPosition)
    {
        [System.Text.StringBuilder]$Line = New-Object System.Text.StringBuilder
        
        # 17 bytes
        [Void]$Line.Append($("{0:X16} " -F $CurrentPosition))
        
        # X bytes
        1..$($BytesPerLine - 19) | ForEach-Object { [Void]$Line.Append($(Get-Random -InputObject $ValidChars)) }        
        
        # +2 bytes (`r`n)        

        [Void]$LineBuffer.Add($Line.ToString())
        $CurrentPosition += $BytesPerLine

        # If we're getting close to the end of the file, we'll go line by line.
        If ($CurrentPosition -GE ($DesiredSize - $BufferMaxBytes))
        {
            Add-Content $FilePath -Value $LineBuffer -Encoding Ascii
            [Void]$LineBuffer.Clear()
        }

        # If the buffer's full, and we still have more than a full buffer's worth left to write, then dump the
        # full buffer into the file now.
        If (($LineBuffer.Count -GE $BufferMaxLines) -And ($CurrentPosition -LT ($DesiredSize - $BufferMaxBytes)))
        {
            Add-Content $FilePath -Value $LineBuffer -Encoding Ascii
            [Void]$LineBuffer.Clear()
        }
    }
}

Now I can create a dummy file of any size and dimension with a command like this:

New-DepthGaugeFile -FilePath 'C:\testfolder1\largefile2.log' `
                   -DesiredSize 128KB -BytesPerLine 64

I thought I was being really clever by creating an internal "buffering" system, since I instinctively knew that performing a file write operation (Add-Content) on each and every loop iteration would slow me down.  I also knew from past experience that overuse of "array arithmetic" like $Array += $Element would slow me down because of the constant cloning and resizing of the array.  I also remembered that in .NET, strongly-typed generic lists are faster than ArrayLists, since they avoid the casting (and, for value types, the boxing and unboxing) that ArrayList requires.
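As an aside, the $Array += penalty is easy to see for yourself with a quick throwaway benchmark.  This is just a sketch to illustrate the difference; the exact numbers will vary from machine to machine:

# Array arithmetic: every += allocates a new, larger array and copies the old one into it.
Measure-Command {
    $Array = @()
    For ($i = 0; $i -LT 50000; $i++) { $Array += "line $i" }
}

# Strongly-typed generic list: Add() appends in place.
Measure-Command {
    $List = New-Object 'System.Collections.Generic.List[System.String]'
    For ($i = 0; $i -LT 50000; $i++) { $List.Add("line $i") }
}

The second version should finish in a small fraction of the time of the first.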

Despite all these little optimizations, here is the performance while writing a 1MB file:

Measure-Command { New-DepthGaugeFile -FilePath C:\testfolder1\largefile.log `
                                     -DesiredSize 1MB -BytesPerLine 128 }

TotalSeconds    : 103.8428624

Over 100 seconds to generate 1 megabyte of data.  I'm running this on an SSD that can copy hundreds of megabytes of data per second, so the storage is not the bottleneck.

To try to speed things up, I decided to focus on the line that appears to be doing the most amount of work:

1..$($BytesPerLine - 19) | ForEach-Object { 
             [Void]$Line.Append($(Get-Random -InputObject $ValidChars)) }

The idea behind this line of code is that we add some random characters to each line.  If we want each line to take up 64 characters, then we add (64 - 19) random characters to the line, because the byte marker at the beginning of the line, plus a space character, takes up 17 bytes, and the carriage return and newline take up another 2 bytes.

My first instinct was that the Get-Random calls were taking all the CPU cycles.  So I replaced them with static characters... and it made virtually no difference.  Maybe it's the pipe and ForEach-Object?  Let's change it to this:

For ($X = 0; $X -LT ($BytesPerLine - 19); $X++)
{
    [Void]$Line.Append($(Get-Random -InputObject $ValidChars))
}

And now the results:

Measure-Command { New-DepthGaugeFile -FilePath C:\testfolder1\largefile.log `
                                     -DesiredSize 1MB -BytesPerLine 128 }

TotalSeconds    : 61.0464638

Exact same output, almost twice as fast.
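If I wanted to shave it down further, the next suspect would be invoking the Get-Random cmdlet once per character, since every cmdlet call carries overhead that a plain .NET System.Random instance doesn't.  A sketch of that change, reusing the variable names from the function above (I haven't timed this one):

# One Random instance, created outside the loop...
$Rand = New-Object System.Random

# ...then index into $ValidChars directly instead of calling Get-Random per character.
For ($X = 0; $X -LT ($BytesPerLine - 19); $X++)
{
    [Void]$Line.Append($ValidChars[$Rand.Next(0, $ValidChars.Count)])
}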

Certificate Store Backup and Cleanup (With a Little Powershell)

I was thinking about HTTPS packet inspection the other day.  The type of HTTPS packet inspection that could be performed with a product such as Forefront Threat Management Gateway, for instance.  Basically, you start by funneling everyone's network traffic through your gateway.  Second, you install an SSL certificate into the Trusted Root CA certificate store on all of the client computers whose encrypted traffic you wish to inspect. Now, your gateway is ready to act as a "man-in-the-middle" and decrypt everyone's outbound traffic.  Traffic to their personal email accounts on Gmail and Hotmail... traffic to their online banking websites... transparently, without their knowledge.

I.T. departments do this to their employees all the time.  But I wonder... if it's so easy for I.T. departments, what would stop something like a government agency from installing this same sort of gateway in an ISP datacenter, and sniffing everyone's HTTPS traffic?

If only the government had some way of getting a trusted CA certificate into the cert store on everyone's computer...


So anyway, this thought led me to think about cleaning up my own certificate stores on my personal machines.  We trust so many certificates by default, and we don't really know what they all are or where they came from.  Most of us who use Windows just rely on Microsoft's Windows Root Certificate Program to tell us which root CAs we should trust by default.

So first, we need to turn off Automatic Root Certificates Update via Group Policy (if you administer an Active Directory domain) or via the Local Group Policy Editor (gpedit.msc) if you are on a standalone PC:

Computer Configuration > Administrative Templates > System > Internet Communication Management > Internet Communication settings > Turn off Automatic Root Certificates Update


If you leave this setting turned on, Windows will use the internet to re-download and replace any root CA certificates that it thinks you should trust.  So you'll delete them, and they'll just reappear.
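If you'd rather script it than click through the policy editor, you can set the registry value that this Group Policy setting corresponds to.  A sketch, assuming the standard DisableRootAutoUpdate policy value (run it from an elevated prompt):

# Equivalent of enabling "Turn off Automatic Root Certificates Update" for the local machine.
$PolicyKey = 'HKLM:\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot'
If (-Not (Test-Path $PolicyKey)) { New-Item $PolicyKey -Force | Out-Null }
Set-ItemProperty -Path $PolicyKey -Name 'DisableRootAutoUpdate' -Value 1 -Type DWord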

Secondly, there are many certificates in your certificate stores for a reason.  Many of them are there for verifying signed code, such as kernel drivers.  If Windows cannot verify their digital signatures, they won't load.  Windows might not even boot properly.  Nevertheless, Microsoft continues to ship some very old certificates with Windows, such as these:


They're there for "backwards compatibility," of course.  I decided I'd take the risk that I didn't need certificates from the 20th century anymore, and figured I would delete them.

Then there's this guy:


On the list of organizations that engender warm, fuzzy feelings of implicit trust, this one is pretty much at rock bottom... right above the Nigerian pirate running a CA on his laptop.  Nevertheless, this is one of those root certificate authority certs that is protected and automatically distributed by the Windows Root Certificate Program.  But since we've disabled the automatic certificate update, and I don't feel like I should be compelled to trust this certificate authority, it's time to delete it.

But, one last thing before we delete the certificates.  Let's make a backup of all of our certificate stores, so that if we accidentally delete a certificate that's required for something important, we can restore it.  It took about 15 minutes of Powershell to write the backup script.  With another 5 minutes, you could sprinkle a little decoration on it and make a cmdlet out of it.

# Backup-CertificateStores.ps1
Set-StrictMode -Version Latest
[String]$CertBackupLocation = "C:\Users\Ryan\Documents\CertStoreBackup_$([Environment]::MachineName)\"

If (-Not (Test-Path $CertBackupLocation))
    { New-Item $CertBackupLocation -ItemType Directory | Out-Null }

$AllCerts = Get-ChildItem Cert:\ -Recurse | Where PSIsContainer -EQ $False

Foreach ($Cert In $AllCerts)
{
    # PSParentPath looks like "...\Certificate::LocalMachine\Root"; grab the part
    # after the "::" so the backup folder mirrors the store layout.
    [String]$StoreLocation = ($Cert.PSParentPath -Split '::')[-1]    
    If (-Not (Test-Path (Join-Path $CertBackupLocation $StoreLocation)))
        { New-Item (Join-Path $CertBackupLocation $StoreLocation) -ItemType Directory | Out-Null }

    # Skip archived certs and certs with private keys; export everything else
    # as a .crt file named after its thumbprint.
    If (-Not $Cert.HasPrivateKey -And -Not $Cert.Archived)
    {
        Export-Certificate -Cert $Cert -FilePath ([String](Join-Path (Join-Path -Path $CertBackupLocation $StoreLocation) $Cert.Thumbprint) + '.crt') -Force
    }
}

Now you have backups of all your public certificates (this doesn't back up your private certs or keys,) so delete whichever ones you feel are unnecessary.
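And if it turns out you deleted something you actually needed, putting a backed-up cert back is a one-liner with Import-Certificate (also from the PKI module).  For example, to restore a cert into the local machine's Trusted Root store (substitute the actual thumbprint file name from your backup folder):

# Restore a backed-up cert into the Trusted Root store (run elevated for LocalMachine stores).
Import-Certificate -FilePath "$CertBackupLocation\LocalMachine\Root\THUMBPRINT.crt" `
                   -CertStoreLocation Cert:\LocalMachine\Root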

One Way of Exporting Nicer CSVs with Powershell

One of the ever-present conundrums in working with computers is that data that looks good and is easily readable to a human, and data that is easy and efficient for a computer to process, are never the same.

In Powershell, you see this "immutable rule" manifest itself in that, despite all the various Format-* cmdlets available to you, some data will just never look good in the console.  And if it looks good in the console, chances are you've mangled the objects so that they've become useless for further processing over the pipeline.  This is essentially one of the Powershell "gotchas" espoused by Don Jones, a rule he refers to as "Format Right."  The principle is that if you are going to format your Powershell output with a Format-* cmdlet, you should always do so at the end of the statement (i.e., on the right side.)  The formatting should be the last thing you do in an expression, and you should never try to pass something that has been formatted over the pipeline.
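A quick way to see the rule in action is to stick a Format-* cmdlet in the middle of a pipeline and look at what comes out the other end.  Just a sketch against any handy cmdlet:

# Fine: the objects reach Export-Csv intact.
Get-Process | Select-Object Name, Id | Export-Csv -NoTypeInformation -Path .\procs.csv

# Not fine: Format-Table emits formatting objects rather than processes,
# so the resulting CSV is full of Microsoft.PowerShell.Commands.Internal.Format.* junk.
Get-Process | Format-Table Name, Id | Export-Csv -NoTypeInformation -Path .\procs-broken.csv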

CSV files, in my opinion, are a kind of happy medium, because they are somewhat easy for humans to read (especially if the human has an application like Microsoft Excel or some such,) and CSV files are also relatively easy for computers to read and process.  Therefore, CSVs are a popular format for transporting data and feeding it to computers, while still being legible to humans.

When you use Export-Csv to write a bunch of objects out to a CSV file:

# Get Active Directory groups, their members, and memberships:
Get-ADGroup -Filter * -SearchBase 'CN=Users,DC=domain,DC=local' -Properties Members,MemberOf | `
    Select Name, Members, MemberOf | `
    Export-Csv -NoTypeInformation -Path C:\Users\ryan\Desktop\test.csv 

And those objects contain arrays or lists as properties, you'll get something like this in your CSV file:

"Name","Members","MemberOf"
"MyGroup","Microsoft.ActiveDirectory.Management.ADPropertyValueCollection","Microsoft.ActiveDirectory.Management.ADPropertyValueCollection"

Uh... that is not useful at all.  What's happened is that instead of outputting the contents of the Active Directory group's members and memberOf attributes, which are collections/arrays, Powershell has output only the names of the .NET types of those collections.

What we need is a way to expand those lists so that they'll go nicely into a CSV file.  So I usually do something like the script excerpt below.  This is just one possible way of doing it; I by no means claim that it's the best way or the only way.

#Get all the AD groups:
$Groups = Get-ADGroup -Filter * -SearchBase 'OU=MyOU,DC=domain,DC=com' -Properties Members,MemberOf

#Create/initialize an empty collection that will contain a collection of objects:
$CSVReadyGroups = @()

#Iterate through each one of the groups:
Foreach ($Group In $Groups)
{
    #Create a new object to hold our "CSV-Ready" version of the group:
    $CSVReadyGroup = New-Object System.Object #Should probably be a PSObject
    #Add some properties to the object.
    $CSVReadyGroup | Add-Member -Type NoteProperty -Name 'Name'     -Value  $Group.Name
    $CSVReadyGroup | Add-Member -Type NoteProperty -Name 'Members'  -Value  $Null
    $CSVReadyGroup | Add-Member -Type NoteProperty -Name 'MemberOf' -Value  $Null

    # If the group has any members, then run the code inside these brackets:
    If ($Group.Members)
    {
        # Poor-man's serialization.
        # We are going to convert the array into a string, with NewLine characters 
        # separating each group member. Could also be more concise just to cast
        # as [String] and do  ($Group.Members -Join [Environment]::NewLine)

        $MembersString = $Null
        Foreach ($GroupMember In $Group.Members)
        {
            $MembersString += $GroupMember + [Environment]::NewLine
        }
        #Trim the one extra newline on the end:
        $MembersString = $MembersString.TrimEnd([Environment]::NewLine)
        #Add to our "CSV-Ready" group object:
        $CSVReadyGroup.Members = $MembersString
    }

    # If the group is a member of any other groups, 
    # then do what we just did for the Members:
    If ($Group.MemberOf)
    {
        $MemberOfString = $Null
        Foreach ($Membership In $Group.MemberOf)
        {
            $MemberOfString += $Membership + [Environment]::NewLine
        }
        $MemberOfString = $MemberOfString.TrimEnd([Environment]::NewLine)
        $CSVReadyGroup.MemberOf = $MemberOfString
    }

    #Add the object we've created to the collection:
    $CSVReadyGroups += $CSVReadyGroup
}

#Output our collection:
$CSVReadyGroups | Export-Csv -NoTypeInformation -Path C:\Users\ryan\Desktop\test.csv

Now you will have a CSV file that has readable, expanded arrays in it and that looks good when you open it with an application such as Excel.
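For what it's worth, you can get the same result more tersely with calculated properties on Select-Object, joining each collection into a single string before it ever reaches Export-Csv.  A sketch along the same lines as the comment in the script above:

# Same idea, condensed: join the collections into newline-separated strings inline.
Get-ADGroup -Filter * -SearchBase 'OU=MyOU,DC=domain,DC=com' -Properties Members,MemberOf |
    Select-Object Name,
        @{ Name = 'Members';  Expression = { $_.Members  -Join [Environment]::NewLine } },
        @{ Name = 'MemberOf'; Expression = { $_.MemberOf -Join [Environment]::NewLine } } |
    Export-Csv -NoTypeInformation -Path C:\Users\ryan\Desktop\test.csv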