PowerShell: Get Content Faster with ReadCount!

Do you use PowerShell?  Do you use Get-Content in PowerShell to read files?  Do you sometimes work with large text files?

If you answered yes to any of the questions above, then read on - this post is for you!

I have a very simple tip that I used today in a script I was writing.  Thought I'd share.

Let's say you have a large text file, such as a packet log from a DNS server that you're debugging.  It might be 300 megabytes and millions of lines.  I was writing a script to parse such a file and collect some statistics.  My first attempt looked like this:

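# Read the file with default settings (one line per pipeline object).
$LogFile = Get-Content $FileName
ForEach($Line In $LogFile)
{
    # Do-Stuff is a placeholder for the per-line parsing.
    Do-Stuff
}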

When I ran this script against a 52MB text file, it executed in about 22 seconds.  When I ran it on a 150MB text file, PowerShell consumed over 3GB of RAM within a few seconds and the script never finished.  After bringing my laptop (Win7 x64, 4GB RAM, 4 CPUs, PS v3, .NET 4.5) to a crawl for about 5 minutes, PowerShell just gave up and returned to the prompt without outputting anything.  I guessed it was some sort of memory leak.  But come on... a 150MB file is not even that big...
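
One possible contributor to that memory use, though I did not measure it here, is that by default Get-Content emits every line as its own object and attaches several provider note properties to each one.  The sketch below lists those properties for a single line.  The temp file and its two lines are made up purely for illustration.

```powershell
# Hypothetical scratch file with two short lines.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'line one', 'line two'

# Grab the first line exactly as Get-Content emits it by default.
$first = Get-Content -Path $path | Select-Object -First 1

# List the extra note properties attached to that single string.
$props = $first | Get-Member -MemberType NoteProperty |
    Select-Object -ExpandProperty Name
$props

Remove-Item -Path $path
```

Multiply that per-line baggage by millions of lines and the overhead may add up quickly, which is at least consistent with what I saw.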

So I started looking through the help for Get-Content, and it turns out there's an easy workaround:

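# -ReadCount 0 returns every line in a single array.
$LogFile = Get-Content $FileName -ReadCount 0
ForEach($Line In $LogFile)
{
    # Do-Stuff is a placeholder for the per-line parsing.
    Do-Stuff
}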

The -ReadCount parameter specifies how many lines of content are sent through the pipeline at a time. The default is 1. A value of 0 sends all of the content through at one time.
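
To make that concrete, here is a minimal sketch that counts how many objects come down the pipeline for a few -ReadCount values.  The six-line temp file and the variable names are assumptions for illustration only.

```powershell
# Hypothetical scratch file with six short lines.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value (1..6 | ForEach-Object { "line $_" })

# Count the pipeline objects produced by each setting.
$perLine   = (Get-Content -Path $path | Measure-Object).Count               # 6: one string per line
$perThree  = (Get-Content -Path $path -ReadCount 3 | Measure-Object).Count  # 2: two arrays of three lines
$allAtOnce = (Get-Content -Path $path -ReadCount 0 | Measure-Object).Count  # 1: one array holding every line

"Default: $perLine, ReadCount 3: $perThree, ReadCount 0: $allAtOnce"

Remove-Item -Path $path
```

Fewer, larger objects mean less per-object pipeline work, at the cost of holding more lines in memory at once.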

Now when I run the script against the 52MB file, it completes in 2.8 seconds, and when I run it on the 150MB text file, it finishes in 10.2 seconds!
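
If you want to check the difference on your own machine, a rough comparison with Measure-Command might look like the sketch below.  The generated file, its size, and its contents are assumptions, not my actual DNS log, so your numbers will differ from mine.

```powershell
# Hypothetical test data: 50,000 short lines in a temp file.
$path = [System.IO.Path]::GetTempFileName()
$sample = 1..50000 | ForEach-Object { "query $_ example.com" }
[System.IO.File]::WriteAllLines($path, [string[]]$sample)

# Time the default behaviour (one pipeline object per line).
$default = Measure-Command {
    $count = 0
    foreach ($line in (Get-Content -Path $path)) { $count++ }
}

# Time the same loop with all lines delivered as one array.
$bulk = Measure-Command {
    $count = 0
    foreach ($line in (Get-Content -Path $path -ReadCount 0)) { $count++ }
}

"Default:      {0:N2} s" -f $default.TotalSeconds
"-ReadCount 0: {0:N2} s" -f $bulk.TotalSeconds

Remove-Item -Path $path
```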

Comments (3)

But then why bother with the foreach, right?

technet.microsoft.com/en-us/library/hh849787.aspx
"Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time."

So try something like...

gc temp.txt | % {"@@" + $_}                # "@@" before every line
gc temp.txt -ReadCount 3 | % {"@@" + $_}   # "@@" before every third line
gc temp.txt -ReadCount 0 | % {"@@" + $_}   # "@@" once

rufwork, you are right. Thank you for the clarification.

gc temp.txt -ReadCount 3 | % {$_ | % {"@@" + $_}}   # "@@" before every line
Works faster?
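
The nested form above can be sketched a little more explicitly as batch processing: read a chunk of lines, loop over the chunk, then move on.  The batch size of 1000, the temp file, and the counters are hypothetical choices for illustration, and whether this beats -ReadCount 0 on a given file is something you would need to measure.

```powershell
# Hypothetical scratch file with 2,500 short lines.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value (1..2500 | ForEach-Object { "line $_" })

$batchSize = 1000   # assumed value; tune it for your own files
$batches = 0
$lines = 0

# Each pipeline object ($_) is an array of up to $batchSize lines.
Get-Content -Path $path -ReadCount $batchSize | ForEach-Object {
    $batches++
    foreach ($line in $_) {
        $lines++    # stand-in for the real per-line work
    }
}

"Processed $lines lines in $batches batches"   # Processed 2500 lines in 3 batches

Remove-Item -Path $path
```

This keeps only one batch of lines in flight at a time, so it may suit files too large to load whole with -ReadCount 0.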
