Passive-Aggressive Configuration Management FTW

I was doing a little deep investigation of the Windows DNS Cache service today, and I discovered that the process checks for the existence of this registry value upon startup:

HKLM\System\CurrentControlSet\services\Dnscache\Parameters\DowncaseSpnCauseApiOwnerIsTooLazy

Needless to say, the configuration setting doesn't appear to be publicly documented.  Whatever it does though, I sense some latent hostility toward some API owner. It wouldn't be the first time Microsoft has let Registry settings with silly names slip through the cracks.

Thread Quantum

Edit October 26, 2014: The commenter known as dirbase corrected me on some errors I made in this post, which I will now be correcting.  Thank you very much for the constructive feedback, dirbase. :) 

Still being a bit lazy about the blog -- I've been busy reading and working, both of which have longer quantums than writing for this blog, apparently.

Basically I just wanted to take a moment out of this Sunday afternoon to discuss thread quantums.  This entire post is inspired by Windows Internals, 6th Ed. by Mark Russinovich, et al.  As always, I could not recommend the book more highly.

So, we've all seen this screen, right?  Adjust processor scheduling for best performance of "Programs" or "Background services":

[Image: Advanced System Properties]

Well that seems like a very simple, straightforward choice... but who knows what it actually means?  To answer that question, we need to know about a basic mechanism that Windows uses: Quantum.

A thread quantum is the amount of time that a thread is allowed to run until Windows checks if there is another thread at the same priority waiting for its chance to run.  If there are no other threads of the same priority waiting to run, then the thread is allowed to run for another quantum.

Process priority is the attribute that you can set on a process in Task Manager or in Process Explorer by right-clicking the process and choosing its priority.  Even though it's the threads that actually "run" and not processes per se, each process can have many threads that are created and destroyed dynamically, so Windows allows you to set a priority per process, and in turn each thread of that process inherits its base priority from its parent process. (Threads actually have two priority attributes: a base priority and a current priority.  Scheduling decisions are made based on the thread's current priority.)

There are 32 priority levels, 0-31, which are often given simplified labels such as "Normal," "Above Normal," "Real time," etc.  Threads at any of those priority levels run at 0 or 1 (Passive or APC level) on the Interrupt Request Level (IRQL) scale.  What this means is that if you set a process to run at "Real Time" - the highest possible priority - the process and its threads will still not be able to preempt or block hardware interrupts, but they could delay and even block the execution of important system threads, not to mention all other code running at Passive level.  That is why you should have a very good reason for setting a process to such a high priority.  Doing so can affect system-wide stability.

So now, back to quantum.  We now know its definition, but how long exactly is a quantum?  That depends on your hardware clock resolution (not to be confused with timer expiration frequency), the speed of your processor, and how you have configured the setting pictured above to optimize performance for "Programs" or "Background services."  As of Windows 7 and Windows Server 2008 R2, clients are configured to let threads run for 2 clock intervals before another scheduling decision is made, while it's 12 clock intervals on servers. So when you change that setting on the Performance Options page, you are switching between those two values.

The reasoning for the longer default quantums on server operating systems is to minimize context switching: if a process on a server is woken up to handle a request, a longer quantum gives it a better chance of completing the request and going back to sleep without being interrupted in between.  On the other hand, shorter quantums can make things seem "snappier" on your desktop, leading to a better experience on desktop OSes.

As I said before, the resolution of your system clock is a factor in determining how long a quantum really is.  In contemporary x86 and x64 processors, this is usually 15.6 milliseconds and it's set by the HAL, not the OS kernel.  You can see what it's set to for yourself by using a kernel debugger and examining KeMaximumIncrement:

[Image: KeMaximumIncrement]

Reading the bytes right to left, if you convert 02, 62, 5A (hexadecimal 0x02625A) to decimal, you will get 156250.  The value is expressed in 100-nanosecond units, so it represents about 15.6ms. Don't confuse this value with timer expiration frequency/interval.  The two values are related, but different.
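As a quick check of that conversion, here is a minimal Python sketch. It assumes the debugger displayed the value as the four bytes 5a 62 02 00 (lowest address first), which is what "reading right to left" implies; the helper name decode_le_dword is made up for this illustration.

```python
def decode_le_dword(raw: bytes) -> int:
    """Interpret raw bytes as an unsigned little-endian integer."""
    return int.from_bytes(raw, byteorder="little", signed=False)

# Assumed dump order: lowest address first, so the bytes appear as 5a 62 02 00.
dumped = bytes.fromhex("5a620200")

increment_100ns = decode_le_dword(dumped)         # count of 100 ns units
increment_ms = increment_100ns * 100 / 1_000_000  # 100 ns units -> milliseconds

print(increment_100ns, increment_ms)
```

The unrounded result is 15.625 ms, which is usually quoted as 15.6 ms.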

There are a couple of different ways to obtain timer expiration frequency/interval.  One way is with clockres.exe from Sysinternals:

[Image: clockres.exe output]

Notice that the maximum timer interval is the familiar 15.6 milliseconds, which is also my hardware clock interval.  But my current timer interval is 1ms.  Programs running on your system can request system-wide changes to this timer, which is what has happened here.  You can use powercfg.exe -energy to run a power efficiency analysis of your system that will identify processes that have made such requests to increase the resolution of the system timer.  When timer expirations fire at a higher frequency, that causes the system to use more energy, which can be of significant concern on laptops and mobile devices that run on battery power.  In my case, it's usually Google Chrome that asks that the system timer resolution be increased from its default of 15.6ms.  But remember that even when this timer interval changes, it doesn't change the length of thread quantums, as thread quantum calculation is done using the max or base clock interval.

When Windows boots up, it takes the above KeMaximumIncrement value, converted to seconds, multiplies it by the processor speed in Hertz, divides the result by 3, and stores that in KiCyclesPerClockQuantum:

[Image: KiCyclesPerClockQuantum]

Converted to decimal, that is 17151040 CPU cycles.

The other factor in determining the length of a quantum is base processor frequency.  You can obtain this value in several different ways, including using the !cpuinfo command in the debugger: 

[Image: !cpuinfo output]

3.293 GHz is the base frequency of my processor, even though a command such as PowerShell's

$(Get-WMIObject Win32_Processor).MaxClockSpeed

would report 3801 MHz (3.801 GHz) as the maximum frequency. This is a slightly overclocked Intel i5-2500K.  Now that we have those two pieces of information, all that's left is some good old-fashioned arithmetic:

The CPU completes 3,293,000,000 cycles per second, and the max timer interval, as well as the hardware clock resolution, is 15.6 ms.  3293000000 * 0.0156 = 51370800 CPU cycles per clock interval.

1 Quantum Unit = 1/3 (one third) of a clock interval, therefore 1 Quantum Unit = 17123600 CPU cycles.

This differs from the value stored in KiCyclesPerClockQuantum by only a tiny amount, attributable to rounding.
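To see where the rounding comes from, the arithmetic can be redone in integers. This is only a sketch of the formula as described in this post, not the kernel's actual code; the function name and constants are mine. It compares the rounded 15.6 ms interval against the unrounded 156250 value read from KeMaximumIncrement.

```python
CPU_HZ = 3_293_000_000          # base frequency reported by !cpuinfo
STORED_CYCLES = 17_151_040      # value read from KiCyclesPerClockQuantum
HUNDRED_NS_PER_SECOND = 10_000_000

def cycles_per_quantum_unit(increment_100ns: int, cpu_hz: int) -> int:
    """Clock interval (in 100 ns units) times CPU speed, divided by 3.

    Integer arithmetic avoids floating-point noise; the result is truncated.
    """
    return increment_100ns * cpu_hz // (HUNDRED_NS_PER_SECOND * 3)

rounded = cycles_per_quantum_unit(156_000, CPU_HZ)  # using 15.6 ms
exact = cycles_per_quantum_unit(156_250, CPU_HZ)    # using 15.625 ms

print(rounded, STORED_CYCLES - rounded)
print(exact, STORED_CYCLES - exact)
```

With the unrounded interval the result lands within a couple of cycles of the stored value, so most of the gap above comes from rounding 15.625 ms down to 15.6 ms.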

Assuming that at a rate of 3.293GHz, each CPU cycle is 304 picoseconds, that works out to 5.2 milliseconds per quantum unit.  Since my PC is configured for thread quantums of 2 clock intervals, and each clock interval is 3 quantum units, that means my PC is making a thread scheduling decision about every 31 milliseconds.
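The same numbers can be turned into durations with a few lines of Python. This sketch uses the figures from this post (3.293 GHz, 17123600 cycles per quantum unit, 3 quantum units per clock interval) and the 2 and 12 clock-interval defaults mentioned earlier. The dictionary and function names are just for illustration, and it only covers the base fixed-length case.

```python
CPU_HZ = 3_293_000_000
CYCLES_PER_QUANTUM_UNIT = 17_123_600
UNITS_PER_CLOCK_INTERVAL = 3

# Clock intervals per quantum for each Performance Options setting.
CLOCK_INTERVALS = {"Programs": 2, "Background services": 12}

def cycle_time_ps(cpu_hz: int) -> float:
    """Duration of one CPU cycle in picoseconds."""
    return 1e12 / cpu_hz

def quantum_ms(setting: str) -> float:
    """Length of one thread quantum in milliseconds for the given setting."""
    cycles = (CYCLES_PER_QUANTUM_UNIT * UNITS_PER_CLOCK_INTERVAL
              * CLOCK_INTERVALS[setting])
    return cycles / CPU_HZ * 1000

quantum_unit_ms = CYCLES_PER_QUANTUM_UNIT / CPU_HZ * 1000

print(round(cycle_time_ps(CPU_HZ)), "ps per cycle")
print(round(quantum_unit_ms, 1), "ms per quantum unit")
for name in CLOCK_INTERVALS:
    print(name, round(quantum_ms(name), 1), "ms per quantum")
```

On a server with the default "Background services" setting, the same math gives a quantum six times as long as on the desktop.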

Now there's one final complication: by using the "Programs" performance setting as opposed to the "Background services" setting, you are also enabling variable-length quantums, whereas a typically configured server uses fixed-length quantums of 12 clock intervals.  I'll leave off here, though; if you're interested in knowing more about variable-length quantums, I would suggest the book I mentioned at the beginning of this post.

Windows Portable Executable (PE) Diagram

This one falls squarely under the category of "Windows Internals."  I ran across this sweet diagram today that dissects the Windows Portable Executable format.  It would make an awesome poster, in my opinion.

The original link is here, and I have mirrored the PDF version here:

PE101-v1.pdf (382.25 kb) - Credit goes to Ange Albertini - corkami.com.

Windows Internals 6th Edition, and a Bonus PowerShell Script

I started reading Windows Internals, 6th Edition about a week ago. In case you don't know, it was authored by Mark Russinovich, David Solomon and Alex Ionescu. It's been great so far, packed full of ridiculously detailed technical information on how the Windows operating system works at its most fundamental level. And there is no one on the planet who knows more about that very topic than those three guys. Weighing in at about 750 very dense pages - and that's just part 1 - it's not for the faint of heart. But if you do have the fortitude and desire to consume this kind of material, you'll be rewarded with being able to explain to people what the differences between the Kernel and the Executive are, how to examine the Kernel Processor Control Block in Windbg, etc.  Good stuff.

Now, context switch:

I wrote this little PowerShell script a few days ago to help me automate some SQL stuff.  I realize that there are already other ways to do distributed SQL queries and so I'm sort of reinventing the wheel here, but hey... now it's in PowerShell. Automation-ready and no Management Studio required.

<#
.NOTES
	Name  : Execute-DistributedSQLQuery.ps1
	Author: Ryan Ries
	Email : ryanries09@gmail.com
	Date  : June 07, 2012	

.LINK	
	http://www.myotherpcisacloud.com

.SYNOPSIS
	This script executes a SQL query across multiple SQL servers as defined
	either on the command line or in a file.

.DESCRIPTION
	This script executes a SQL query across multiple SQL servers as defined
	either on the command line or in a file. Use the -Servers parameter to
	define multiple SQL servers on the command line. Alternatively, use the
	-File parameter to specify a text file of SQL servers, one per line.
	Use the NonQuery switch if your SQL statement is not a SELECT-style
	query, but a stored procedure or other operation. If Username and
	Password are specified, then SQL authentication will be used. Otherwise,
	SSPI will be used. If you want to specify a different database for each
	server, use a ! between the server name and the DB name. (On either
	the command line or in a file.) Otherwise, "master" will be the default
	database and you must specify the desired database name as part of
	your query.
	
	Use Get-Help <script> -Full for examples and more info.

.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -Servers SQLSERVER01,SQLSERVER02 -Query "SELECT * FROM DB.dbo.Inventory"
	Queries the Inventory table in the DB database on both SQLSERVER01 and SQLSERVER02. Uses SSPI authentication.
.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -File servers.txt -Query "SELECT * FROM DB.dbo.Inventory"
	Runs identical queries on each server found in servers.txt.
.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -File svrs.txt -Query "SELECT * FROM Inv" -Username ryan -Password xyz
	By specifying a username and password, the authentication method is changed from SSPI to SQL authentication.
.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -File servers.txt -Query "EXEC Clear_Inventory" -NonQuery
	Use the -NonQuery switch if executing a SQL statement that is not a SELECT query.
.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -Servers SVR01!DB1,SVR02!MgtDB -Query "SELECT * FROM Inv"
	You can specify a separate database on each server by separating the server\instance name and the database
	name with an exclamation mark. This is useful if you want to run an identical query on multiple SQL
	servers with differently-named databases. The exclamation mark syntax works both on the command line
	and in a file.
.EXAMPLE
	.\Execute-DistributedSQLQuery.ps1 -Servers SVR01,SVR02 -Query "SELECT * FROM DB.dbo.Inv"
	Remember that if no database is specified by using an exclamation mark, the master database
	will be selected by default, so to run a query on a different database on the server, you must
	specify that in your query.
#>
Param([Parameter(Mandatory=$false)][String[]]$Servers,
      [Parameter(Mandatory=$false)][ValidateScript({Test-Path $_ -PathType Leaf})][String]$File,
	  [Parameter(Mandatory=$false)][String]$Username,
	  [Parameter(Mandatory=$false)][String]$Password,
	  [Parameter(Mandatory=$true)] [String]$Query,
	  [Parameter(Mandatory=$false)][Switch]$NonQuery)

If(($Servers -And $File) -Or (!$Servers -And !$File))
{
	Throw "You must specify either -Servers or -File; not both, not neither."
}
If(($Username -And !$Password) -Or (!$Username -And $Password))
{
	Throw "You need to specify both Username and Password if using SQL authentication."
}

If($File) {	$Servers = Get-Content $File }

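ForEach ($Server in $Servers)
{
	# Skip blank lines, which are easy to leave behind in a server list file.
	If(-Not "$Server".Trim()) { Continue }

	# Split once on "!" into the server\instance name and an optional database name.
	$Parts = "$Server".Trim().Split("!")
	If($Parts.Count -gt 2)
	{
		Throw "Error parsing Server.DB name. Did you use too many exclamation marks?"
	}
	If($Parts.Count -eq 2)
	{
		$Instance = $Parts[0]
		$DB = $Parts[1]
	}
	Else
	{
		# No database given, so fall back to master.
		$Instance = $Parts[0]
		$DB = "master"
	}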
	
	If($Username)
	{
		$ConnectionString = "server=$Instance;database=$DB;user=$Username;password=$Password"
	}
	Else
	{	
		$ConnectionString = "server=$Instance;database=$DB;Integrated Security=SSPI"
	}	
	
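	If($NonQuery)
	{
		$SQLConnection = New-Object System.Data.SqlClient.SqlConnection $ConnectionString
		Try
		{
			$SQLConnection.Open()
			$SQLCommand = $SQLConnection.CreateCommand()
			$SQLCommand.CommandText = $Query
			# ExecuteNonQuery returns the number of rows affected; capture it to keep the output quiet.
			$RowsAffected = $SQLCommand.ExecuteNonQuery()
		}
		Finally
		{
			# Close the connection even if the statement throws.
			$SQLConnection.Close()
		}
	}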
	Else
	{
		$DataAdapter = New-Object System.Data.SqlClient.SqlDataAdapter ($Query, $ConnectionString)
		$DataTable = New-Object System.Data.DataTable
		$DataAdapter.Fill($DataTable) | Out-Null
		$DataTable | Out-GridView -Title "$Instance      DB: $DB"
	}	
}

Ighashgpu and Cracking NTLM Hashes

Neat Fact of the Day:  Given an NTLM hash, the video card in my PC can make 1.5 - 2 billion password attempts per second when brute-forcing that hash, versus about 9 million per second with my CPU.

Video Card: Nvidia GTX 560 Ti (384 cores) using ighashgpu.

CPU: Intel i5 2500k @ 3.5GHz (4 cores)

Lesson learned: CUDA is freaking awesome. I can crack the NTLM hash of a 10-character password consisting only of digits in under five seconds.
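For a rough sanity check on that figure, here is a small Python sketch of the worst-case time to exhaust a keyspace. It assumes the password is exactly 10 digits long and uses the attempt rates quoted above; the function name is made up for this example, and real runs usually finish sooner because the match rarely sits at the very end of the search.

```python
def worst_case_seconds(alphabet_size: int, length: int, attempts_per_second: float) -> float:
    """Time to try every candidate of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / attempts_per_second

DIGITS = 10       # characters 0-9
LENGTH = 10       # assumed: exactly 10 characters

gpu_fast = worst_case_seconds(DIGITS, LENGTH, 2.0e9)  # 2 billion attempts per second
gpu_slow = worst_case_seconds(DIGITS, LENGTH, 1.5e9)  # 1.5 billion attempts per second
cpu = worst_case_seconds(DIGITS, LENGTH, 9.0e6)       # 9 million attempts per second

print(gpu_fast, round(gpu_slow, 1), round(cpu / 60, 1))
```

The GPU needs a handful of seconds to cover all 10 billion candidates, while the CPU at 9 million attempts per second would need on the order of 18 to 19 minutes for the same search.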