Stack Exchange An Early Adopter For SQL 2012?

I just coincidentally happened to catch this on Server Fault a couple days ago. A couple minutes after my comment, he added the bolded edit at the bottom:

serverfault

Ironically, I saw fit to place this post in both the "IT Pro" and "IT (Not Very) Pro" categories simultaneously. Not sure how you mistake "posting on a public web forum" for "sending a private email." Oh well. I got a chuckle out of it.

Cisco UCS

Let's talk about Cisco UCS - Unified Computing System.

I help stand up new IT infrastructure all over the world, and I have been seeing a lot more of these lately. It's a pretty impressive system. In most small to mid-size shops you tend to see an onsite server closet or maybe a small cage in a datacenter full of HP ProLiants and Dell PowerEdges that are two or three generations old. But for the largest scale enterprise operations, nothing beats the density and manageability of blades. (And their ability to lock you in with a particular vendor. ;)) Blade systems essentially do for hardware what hypervisors did for operating systems. Not only are you packing more into less and increasing your compute density, you're centralizing the management of your entire datacenter and simplifying the deployment process by orders of magnitude. What do I mean by that? Well, have some pictures worth a thousand words:

(Try right-clicking the images and opening in a new tab and you might get a better view.)

Turn this... [image: ucs]

... into this.

(The above image is courtesy of dalgeek - knightfoo.wordpress.com.)

Turn this... [image: ucs wiring]

... into this.

(I took that picture on the left myself a couple years ago, from a place I used to work.)

Now, when we talk about Cisco UCS, we're actually talking about a few discrete components that come together to form the UCS. First, we have the fabric interconnect. We'll use the Cisco UCS 6120XP 20-Port Fabric Interconnect as an example.

6120xp

It's a specialized 1U 10Gb (ten gigabit) switch that supports up to 160 servers or 20 chassis as a single, seamless system. (And remember each "server" can have dozens of VMs on it.) This particular switch is capable of 520Gbps of throughput. (I keep feeling like I'm making typos when I type numbers that large.)

The next piece is the blade chassis itself. Take the Cisco UCS 5108 Blade Server Chassis for example. This thing is 6 rack units, making the entire solution so far 7U for what could potentially house hundreds of VMs. Those smaller ports on the bottom of the chassis are for power supplies. Note that you can cram either half-width blades or full-width blades into this chassis. A full-width blade would look a little more like the traditional pizza box that we're used to, and has room for more stuff in it obviously, but I think the extra agility offered by half-width blades is probably the reason that they're the only ones I really see out in the wild.

Cisco UCS 5100 series

And lastly we have the blades themselves. Take the Cisco UCS B200 M2, a half-width blade, for example:

UCS Guts *Why yes, that is 192GB of DDR3 RAM, thanks for noticing*

And here's a little artist's depiction of what an entirely fleshed out "Unified Computing System" would look like. Note that you'd probably want some SAN storage somewhere for this to be considered a complete solution, beyond just the couple of disks that you can stick into each blade. I wonder how much storage you could get up there in the top 4 to 6 U of each cabinet...

racks of ucses

Hardware characteristics such as MAC addresses are configured at the chassis slot level, so if a blade fails you can swap in a new blade and not have to reconfigure anything. You can also do things like automatically reboot a host onto another blade if one fails, etc.

And lastly - it is all managed from a single web interface. (I hope you like Java.)

So that all looks pretty amazing, right? There may be a couple of cons to going with Cisco UCS however, and there are alternative blade systems to consider as well. You just have to weigh these pros and cons for yourself and your enterprise's situation. One of these possible cons is cost. The old adage goes that nobody in IT ever got in trouble for buying Cisco. They do make great stuff, but they also make practically the most expensive equipment in existence. Exact pricing is complicated and of course depends on exactly how you configure your equipment, but list price is somewhere in the ballpark of $20,000 per blade. Don't worry though, no one pays list price. Especially if you were to make a huge order like this, Cisco would be expected to have their discount pen at the ready. $10,000 - $12,000 per blade might be a more realistic figure. I count 288 blades in the picture above, putting your budgetary needs at somewhere around $2.9 - $3.46 million USD. (And we still don't have storage or networking yet... but you are well on your way to having one of the densest datacenters in the world.)

In the types of environments that I'm most used to, I see one to two UCS chassis per datacenter, each with two clustered interconnects for redundancy. In contrast, you might decide to go with HP c7000 blade enclosures filled with BL465c blades. I see some of these as well, especially as things like cloud technologies cause aggressive IT expansion and thus the need to do more with your budget. You would almost certainly save a substantial amount of cash if you did go with HP or Dell; however, I think Cisco still has a very compelling price-per-blade as the solution scales out extremely well, and you only pay for the management stack once. (Or twice if you're really reaching for the stars like we did above.)
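The back-of-the-envelope numbers above can be reproduced with a few lines of PowerShell. This is only a sketch of the arithmetic: the blade count, the discounted price range, and the 160-server / 20-chassis limit are the figures quoted in this post, and the eight-blades-per-chassis value is simply derived from that limit (it assumes half-width blades). The variable names are made up for illustration.

```powershell
# Figures quoted in the post; nothing here is pulled from a Cisco price list.
$bladeCount       = 288      # blades counted in the rack picture
$priceLow         = 10000    # assumed discounted price per blade, low end (USD)
$priceHigh        = 12000    # assumed discounted price per blade, high end (USD)
$serversPerDomain = 160      # servers one pair of fabric interconnects can manage
$chassisPerDomain = 20       # chassis one pair of fabric interconnects can manage

# Derived values
$bladesPerChassis = $serversPerDomain / $chassisPerDomain                      # 8
$chassisNeeded    = [math]::Ceiling([double]$bladeCount / $bladesPerChassis)   # 36
$domainsNeeded    = [math]::Ceiling([double]$bladeCount / $serversPerDomain)   # 2
$budgetLow        = $bladeCount * $priceLow                                    # 2880000
$budgetHigh       = $bladeCount * $priceHigh                                   # 3456000

"Chassis needed     : $chassisNeeded"
"UCS domains needed : $domainsNeeded"
"Blade budget (USD) : $budgetLow to $budgetHigh"
```

The two domains are where the "or twice" comes from: 288 blades is more than one 160-server system can manage, so the management stack gets bought a second time.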

So in conclusion, I'll just leave you with a couple last things. Here is Cisco's UCS In A Nutshell documentation if I've whetted your appetite and you want more information. And here is a Cisco UCS emulator, if you'd like to play around with what it feels like to administer one of these things. And lastly, here are some tutorials to go along with that emulator software.

'Til next time!

A List of NICs, IPs, MACs, Physical Locations, etc.

I'm back, finally.

I was recently challenged with trying to not only enumerate all the network adapters on a system across dozens of different operating system versions and hardware platforms, but also to try to figure out where they are physically in the machine, remotely, without being able to see the actual hardware.

The short answer is you can't.

The long answer is you can't... do it scriptomatically without the assistance of vendor-specific software, such as the HP network configuration software and maybe an API or WBEM queries... but that's only going to cover one specific hardware platform. I need to consistently gather this data across not only ProLiants, but PowerEdges, VMs, desktop workstations, anything that runs Windows. Windows doesn't know where in space your network adapters are. By that I mean Windows doesn't know which physical port on your 4-port NIC is the third one from the left, etc. This is why there is seemingly no rhyme or reason as to which network adapter Windows names "Local Area Connection", "Local Area Connection 2", "Local Area Connection 3", etc. The installed NICs are enumerated in no guaranteed order, as evidenced by the fact that you may get a different mapping of NIC ports to network connections every time you re-install Windows on a multi-NIC machine.

I have heard that some particularly anal administrators even go so far as to install Windows, then delete all the Network Connections that are out of order, and continue removing and letting Windows reinstall them until they are all in the "correct" order. There is also a theory that manufacturers of multi-port NICs should give each port on the card sequential MAC addresses, starting from the port closest to the PCI bus. So you might be able to infer something from that, but that's not something I would put money on for thousands of NICs from dozens of manufacturers.

Furthermore, "NIC teaming" throws yet another wrench into this, as now you can no longer rely on what Windows thinks the MAC address of a teamed adapter is, or what the cabinet switch thinks the MAC address is on a given switch port that has a teamed NIC plugged in to it.

I can get you all the information that Windows does have though, including (apparent) MAC addresses, IPs, and "Location Information" as read from the registry. This is that "Bus 0, Device 8, Function 25" stuff that you might have seen in Device Manager. It might be useful in drawing some correlations, but it's still not going to tell you much about physically where all these NICs are.

So without further ado, here are the scripts. The first one is PowerShell. The second one is the exact same but ported to VBScript, for compatibility with older versions of Windows. Note the operating system version check in the VBScript.

PowerShell:

$ErrorActionPreference = 'Stop'
$nics = Get-WmiObject Win32_NetworkAdapter
$cfgs = Get-WmiObject Win32_NetworkAdapterConfiguration

Write-Host "`nPhysical NICs In No Particular Order"
Write-Host "------------------------------------`n"
foreach ($nic in $nics)
{
	# Only PCI devices carry bus/device/function data; skip virtual adapters and empty IDs.
	if (-not $nic.PNPDeviceID -or -not $nic.PNPDeviceID.StartsWith('PCI')) { continue }
	Try
	{
		# On Vista/2008 and later, LocationInformation ends in ";(bus,device,function)".
		$keyValues = Get-ItemProperty "HKLM:\System\CurrentControlSet\Enum\$($nic.PNPDeviceID)"
		$regSplit  = $keyValues.LocationInformation.Split(';')
		$locSplit  = $regSplit[2].Replace('(','').Replace(')','').Split(',')

		Write-Host "Name    : $($nic.Name)"
		Write-Host "MAC     : $($nic.MACAddress)"
		Write-Host "Location: Bus $($locSplit[0]), Device $($locSplit[1]), Function $($locSplit[2])"
		# Match configurations to this adapter by MAC address to find its IPs.
		foreach ($cfg in $cfgs)
		{
			if ($cfg.MACAddress -eq $nic.MACAddress -and $cfg.IPAddress)
			{
				Write-Host "IP      : $($cfg.IPAddress)"
			}
		}
		Write-Host " "
	}
	Catch
	{
		# Report the adapter that could not be read instead of silently dropping it.
		Write-Warning "Skipping $($nic.Name): $($_.Exception.Message)"
	}
}

VBScript:

Option Explicit
const HKEY_LOCAL_MACHINE = &H80000002

Dim nic, objNICs, objCfgs, objWMIService, objReg, objOSVer
Dim strWMIQuery, strRegistryKey, strValue, strLocInfo, strBus, strDevice, strFunction, strOSMajor
Dim arrSplitKey, arrSplitLoc, arrOSBuild
Dim mac, cfg, ip, v

strWMIQuery = "SELECT * FROM Win32_NetworkAdapter"
Set objWMIService = GetObject("winmgmts:\\.\root\CIMv2")
Set objNICs = objWMIService.ExecQuery(strWMIQuery)
strWMIQuery = "SELECT MACAddress,IPAddress FROM Win32_NetworkAdapterConfiguration"
Set objCfgs = objWMIService.ExecQuery(strWMIQuery)
strWMIQuery = "SELECT Version FROM Win32_OperatingSystem"
Set objOSVer = objWMIService.ExecQuery(strWMIQuery)

For Each v in objOSVer
	arrOSBuild = Split(v.Version,".")
Next

strOSMajor = arrOSBuild(0)

Wscript.Echo "Physical NICs In No Particular Order"
Wscript.Echo "------------------------------------"

For Each nic In objNICs
	If StrComp(Left(nic.PNPDeviceID,3),"PCI",1) = 0 Then
		Set objReg = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\default:StdRegProv")
		strRegistryKey = "System\CurrentControlSet\Enum\" & nic.PNPDeviceID				
		objReg.GetStringValue HKEY_LOCAL_MACHINE,strRegistryKey,"LocationInformation",strValue
		If CInt(strOSMajor) >= 6 Then
			arrSplitKey = Split(strValue,";")
			strLocInfo = arrSplitKey(2)
			strLocInfo = Replace(strLocInfo,"(","")
			strLocInfo = Replace(strLocInfo,")","")
			arrSplitLoc = Split(strLocInfo,",")
		End If
		
		Wscript.Echo "Name    : " & nic.Name
		Wscript.Echo "MAC     : " & nic.MACAddress
		
		If CInt(strOSMajor) >= 6 Then
			Wscript.Echo "Location: Bus " & arrSplitLoc(0) & ", Device " & arrSplitLoc(1) & ", Function " & arrSplitLoc(2)
		Else
			Wscript.Echo "Location: " & strValue
		End If
		
		mac = nic.MACAddress
		For Each cfg In objCfgs
			If StrComp(cfg.MACAddress,mac) = 0 And isNull(cfg.IPAddress) = False Then
				For Each ip In cfg.IPAddress
					Wscript.Echo "IP      : " & ip
				Next				
			End If
		Next
		Wscript.Echo " "
		If isObject(objReg) Then Set objReg = Nothing
	End If
Next

The output looks like this:

The IPs are not shown on the second adapter because it's switched off right now and thus doesn't have any IPs. My first idea for improving the PowerShell version (I don't invest much time in improving VBS) is to make custom objects out of the output instead of just calling Write-Host. The power of PowerShell is in its ability to deal with objects, so you should try to keep everything as objects for as long as possible. Once you've spit it out on the screen with Write-Host, for example, you can no longer pass it along the pipeline.
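Here is a rough sketch of what that object-based version might look like. The function names ConvertFrom-PciLocation and Get-NicLocation are hypothetical (they are not part of the script above), and the parsing assumes the Vista/2008-and-later LocationInformation format, where the value ends in "(bus,device,function)". Treat it as a starting point rather than a tested replacement.

```powershell
# Hypothetical helper: pulls bus/device/function out of a LocationInformation string.
# Assumes the value ends in "(bus,device,function)"; emits nothing if it does not.
function ConvertFrom-PciLocation
{
    param([string]$LocationInformation)

    if ($LocationInformation -match '\((\d+),(\d+),(\d+)\)\s*$')
    {
        New-Object PSObject -Property @{
            Bus      = [int]$Matches[1]
            Device   = [int]$Matches[2]
            Function = [int]$Matches[3]
        }
    }
}

# Hypothetical wrapper: emits one object per PCI NIC instead of writing text to the host.
function Get-NicLocation
{
    Get-WmiObject Win32_NetworkAdapter |
        Where-Object { $_.PNPDeviceID -like 'PCI*' } |
        ForEach-Object {
            $key = "HKLM:\System\CurrentControlSet\Enum\$($_.PNPDeviceID)"
            $loc = ConvertFrom-PciLocation -LocationInformation (Get-ItemProperty $key).LocationInformation
            New-Object PSObject -Property @{
                Name     = $_.Name
                MAC      = $_.MACAddress
                Bus      = $loc.Bus        # stays empty if the location could not be parsed
                Device   = $loc.Device
                Function = $loc.Function
            }
        }
}
```

Because the output is objects, something like Get-NicLocation | Sort-Object Bus, Device, Function | Format-Table becomes possible, which is exactly what Write-Host rules out.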

Thanks to Kelvin Wong and Server Fault for helping me research this.

Domain Health Report.ps1

It's been a while since I posted, so I figured I'd show you a little something I whipped up a few days ago. The script is a sort of "domain health report," and it sends out a nicely-formatted email with its findings. I have the script set in a scheduled task to run nightly. Every morning when I wake up, the email is there waiting for me in my inbox. The script uses the Active Directory PowerShell module to get a list of all the computer accounts and user accounts in your domain. After displaying some general domain stats, it then attempts to gather information from every machine that has an enabled computer account. The information will be highlighted in red and bold if it falls below a certain threshold, e.g. disk space below 10%, an SSI below 7, etc.

So without further ado:

# DomainReport.ps1
# Emails a report of various metrics collected from every computer in the domain.
# This script is intended to be run automatically, on a schedule of once a day or so,
# to let us know how our domain is doing.

[string]$senderName   = "Domain Health Report"
[string]$senderAddr   = "dc1@domain.myotherpcisacloud.com"
[string]$recptName    = "Ryan Ries"
[string]$recptAddr    = "ryanries09@gmail.com"
[string]$emailSubject = "Domain Health Report"
[string]$smtpServer   = "smtp.domain.myotherpcisacloud.com"
[string]$emailBody    = ""
[int]$staleCompAcctDays = 60
[int]$staleUserAcctDays = 60
[int]$diskFreePercentThreshold = 10
[int]$SSIIndexThreshold = 7

Import-Module ActiveDirectory	# It will not hurt if the module is already loaded.

$localhost = Get-Content env:Computername
$domain = Get-ADDomain
$forest = Get-ADForest
$allComputerAccts = Get-ADComputer -Filter * -Properties *
$enabledComputerAccts = Get-ADComputer -Filter 'Enabled -eq $true' -Properties *
$allUserAccts = Get-ADUser -Filter * -Properties *
$enabledUserAccts = Get-ADUser -Filter 'Enabled -eq $true' -Properties *

Function Ping-Server
{
    param($hostName)
    trap
    {
        $false; continue
    }
	$object = New-Object System.Net.NetworkInformation.Ping
	$object.Send($hostName, 2000) #2000ms ping timeout
}

$emailBody += "<FONT STYLE=`"font-size:30px;`">"
$emailBody += "Domain Health Report"            
$emailBody += "</FONT>"
$emailBody += "<FONT STYLE=`"font-size:9px;`">"
$emailBody += "<BR/>Report executed from $localhost at $(Get-Date)<HR/>"
$emailBody += "</FONT>"
$emailBody += "<FONT STYLE=`"font-family:Monospace;font-size:13px`"><BR/>"
$emailBody += "<strong>Forest Root Domain:</strong> $($forest.RootDomain) ($($forest.ForestMode))<BR/>"
$emailBody += "<strong>Current Domain:</strong> $($domain.Name), NetBIOS $($domain.NetBIOSName) ($($domain.DomainMode))<BR/>"
$emailBody += "<BR/>"
$emailBody += "<strong>Domain Controllers:</strong> $($domain.ReplicaDirectoryServers.Count) Writable, $($domain.ReadOnlyReplicaServers.Count) RODCs, $($forest.GlobalCatalogs.Count) Global Catalogs<BR/>"
$emailBody += "<strong>Schema Master:</strong> $($forest.SchemaMaster)<BR/>"
$emailBody += "<strong>Domain Naming Master:</strong> $($forest.DomainNamingMaster)<BR/>"
$emailBody += "<strong>Infrastructure Master:</strong> $($domain.InfrastructureMaster)<BR/>"
$emailBody += "<strong>RID Master:</strong> $($domain.RIDMaster)<BR/>"
$emailBody += "<strong>PDC Emulator:</strong> $($domain.PDCEmulator)<BR/>"
$emailBody += "<strong>Sites:</strong> $($forest.Sites)<BR/>"
$emailBody += "<BR/>"
$emailBody += "<strong>Computer Accounts:</strong> $($allComputerAccts.Count) found, $($enabledComputerAccts.Count) enabled<BR/>"
if($allComputerAccts.Count -gt $enabledComputerAccts.Count)
{
	$emailBody += "<strong>Disabled Computer Accounts:</strong> "
	ForEach($_ in $allComputerAccts)
	{
		if($_.Enabled -eq $false)
		{
			$emailBody += "$($_.CN)`, " 
		}
	}
	$emailBody = $emailBody -Replace "..$" # Trim off the last two characters
	$emailBody += "<BR/>"
}
$emailBody += "<strong>Stale Computer Accounts<sup>*</sup>: </strong> "
ForEach($_ in $allComputerAccts)
{
	if($_.PasswordLastSet -lt $((Get-Date).AddDays(-$($staleCompAcctDays))))
	{
		$emailBody += "$($_.CN)`, "
	}
}
$emailBody += "<BR/><BR/>"
$emailBody += "<strong>User Accounts: </strong> $($allUserAccts.Count) found, $($enabledUserAccts.Count) enabled<BR/>"
if($allUserAccts.Count -gt $enabledUserAccts.Count)
{
	$emailBody += "<strong>Disabled User Accounts:</strong> "
	ForEach($_ in $allUserAccts)
	{
		if($_.Enabled -eq $false)
		{
			$emailBody += "$($_.SAMAccountName)`, "
		}		
	}
	$emailBody = $emailBody -Replace "..$"
	$emailBody += "<BR/>"
}
$emailBody += "<strong>Stale User Accounts<sup>*</sup>: </strong> "
ForEach($_ in $enabledUserAccts)
{
	$lastLogon = [DateTime]::FromFileTime($_.LastLogonTimeStamp)
	if($lastLogon -lt $((Get-Date).AddDays(-$($staleUserAcctDays))))
	{
		$emailBody += "$($_.SAMAccountName)`, "
	}
}
$emailBody += "</FONT><BR/><BR/>"
$emailBody += "<FONT STYLE=`"font-size:9px;`">* A `"stale`" computer account is one that has not updated its machine password with AD in $staleCompAcctDays days."
$emailBody += "<BR/>* A `"stale`" user account is not disabled but has not logged on to the domain in $staleUserAcctDays days."
$emailBody += "</FONT>"
$emailBody += "<HR/><BR/>"

ForEach($_ in $enabledComputerAccts)
{
	$emailBody += "<FONT STYLE=`"font-size:16px;`">"
	$emailBody += "<strong>$($_.CN)</strong> <BR/>"
	$emailBody += "</FONT>"
	$emailBody += "<div style=`"border-width:1px;border-style:solid;margin:2px;padding:2px;`">"
	$emailBody += "<FONT STYLE=`"font-family:Monospace;font-size:13px`">"
	$pingNode = Ping-Server $($_.CN)
	$emailBody += "<strong>Ping:</strong> "
	if($pingNode.Status -ne "Success")
	{
		$emailBody += "<FONT STYLE=`"color:red;`"><strong>NO REPLY!</strong></FONT><BR/>"
	}
	else
	{
		$emailBody += "$($pingNode.RoundTripTime) ms reply from $($pingNode.Address)<BR/>"
	}
	$computerSystem = Get-WmiObject Win32_ComputerSystem -ComputerName $($_.CN)
	$emailBody += "<strong>System: </strong> $($computerSystem.Manufacturer) $($computerSystem.Model)<BR/>"
	$latestStabilityIndex = Get-WmiObject Win32_ReliabilityStabilityMetrics -ComputerName $($_.CN) | Select-Object -First 1 | ForEach {$_.SystemStabilityIndex}
	$emailBody += "<strong>Latest SSI<sup>*</sup>: </strong>"
	if($latestStabilityIndex -gt 0 -and $latestStabilityIndex -le 10)
	{
		if($latestStabilityIndex -lt $SSIIndexThreshold)
		{
			$emailBody += "<FONT STYLE=`"color:red;`"><strong>$latestStabilityIndex<BR/></strong></FONT>"
		}
		else
		{
			$emailBody += "$latestStabilityIndex<BR/>"
		}		
	}
	else
	{
		$emailBody += "<FONT STYLE=`"color:red;`"><strong>NO DATA!</strong></FONT><BR/>"
	}
	
	## Don't want to use $log.Count here because it seems to be implemented inconsistently in the Get-Eventlog cmdlet,
	## e.g. sometimes it is null when it should be zero, and vice versa, and still other times it throws an exception
	## for no matches found.

	$emailBody += "<strong>Application Log Errors Last 24hrs: </strong>"
	Try
	{
		$appLogErrors = Get-EventLog -Log Application -EntryType Error -After $(Get-Date).AddHours(-24) -ComputerName $($_.CN)
		if($appLogErrors -eq $null)
		{
			$emailBody += "0<BR/>"
		}
		else
		{
			## This technique doesn't work if $log is null. $counter goes to 1 when it should stay at 0.
			$counter = 0
			$appLogErrors | ForEach-Object { $counter++ }
			$emailBody += "$counter<BR/>"
		}
	}
	Catch 
	{ 
		$emailBody += "<FONT STYLE=`"color:red;`"><strong>$($_.Exception.Message.ToString())</strong></FONT><BR/>" 
	}
	
	$emailBody += "<strong>System Log Errors Last 24hrs: </strong>"
	Try
	{
		$sysLogErrors = Get-EventLog -Log System -EntryType Error -After $(Get-Date).AddHours(-24) -ComputerName $($_.CN)
		if($sysLogErrors -eq $null)
		{
			$emailBody += "0<BR/>"
		}
		else
		{
			$counter = 0
			$sysLogErrors | ForEach-Object { $counter++ }
			$emailBody += "$counter<BR/>"
		}

	}
	Catch
	{
		$emailBody += "<FONT STYLE=`"color:red;`"><strong>$($_.Exception.Message.ToString())</strong></FONT><BR/>"
	}
	
	$emailBody += "<strong>Security Audit Failures Last 24hrs: </strong>"
	Try
	{
		$secLogErrors = Get-EventLog -Log Security -EntryType FailureAudit -After $(Get-Date).AddHours(-24) -ComputerName $($_.CN)
		if($secLogErrors -eq $null)
		{
			$emailBody += "0<BR/>"
		}
		else
		{
			$counter = 0
			$secLogErrors | ForEach-Object { $counter++ }
			$emailBody += "$counter<BR/>"
		}

	}
	Catch
	{
		$emailBody += "<FONT STYLE=`"color:red;`"><strong>$($_.Exception.Message.ToString())</strong></FONT><BR/>"
	}

	$emailBody += "<strong>Total RAM: </strong>$([math]::Round($computerSystem.TotalPhysicalMemory/1GB,0)) GB <BR/>"
	$emailBody += "<strong>Logical Disks:</strong>"
	$emailBody += "<div style=`"border-width:1px;border-style:dashed;margin:8px;padding:8px;background-color:`#dddddd`">"
	$computer = $($_.CN)
	ForEach($_ in $(Get-WMIObject -Query "SELECT DeviceID FROM Win32_Logicaldisk WHERE DriveType=3" -Computer $computer | ForEach { $_.DeviceID }))
	{
		$logicalDisk = Get-WMIObject -Query "SELECT * FROM Win32_Logicaldisk WHERE DeviceID='$_'" -Computer $computer
		$freespace = [math]::Round($logicalDisk.FreeSpace/1GB,0)
		$totalSize = [math]::Round($logicalDisk.Size/1GB,0)
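		# Compare raw byte counts so a volume that rounds to 0 GB cannot cause a divide-by-zero.
		if($logicalDisk.Size -gt 0 -and (($logicalDisk.FreeSpace/$logicalDisk.Size)*100) -lt $diskFreePercentThreshold)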
		{
			$emailBody += "<strong><FONT STYLE=`"color:red;`">$($logicalDisk.DeviceID) ($($logicalDisk.VolumeName)) $freespace GB free out of $totalSize GB </FONT></strong><BR/>"
		}
		else
		{
			$emailBody += "$($logicalDisk.DeviceID) ($($logicalDisk.VolumeName)) $freespace GB free out of $totalSize GB <BR/>"
		}	
	}
	$emailBody += "</DIV>"
	$emailBody += "</FONT></DIV><BR/><BR/>"
}

$emailBody += "</FONT>"
$emailBody += "<FONT STYLE=`"font-size:9px;`">* SSI = Windows System Stability Index. Configure WMI Reliability Providers across your domain via Group Policy and ensure that the RAC scheduled task is running on the machines in order to gather this data.<BR/>"
$emailBody += "* The Remote Registry service must be running on remote computers in order to gather event log data."
$emailBody += "</FONT>"

Send-MailMessage -From "$senderName <$senderAddr>" -To "$recptName <$recptAddr>" -Subject "$emailSubject" -Body $emailBody -SMTPServer $smtpServer -BodyAsHTML
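The "red and bold if below a threshold" pattern shows up several times in that script, so it could be factored into a small helper. The sketch below is one way to do it; Format-Metric is a made-up name and is not part of the script as posted.

```powershell
# Hypothetical helper: returns the HTML for one metric, red and bold when the
# value is below the threshold and plain otherwise.
function Format-Metric
{
    param(
        [string]$Text,        # what to display, e.g. "C: 4 GB free out of 100 GB"
        [double]$Value,       # the number to test, e.g. percent free or the SSI
        [double]$Threshold    # values below this are highlighted
    )

    if ($Value -lt $Threshold)
    {
        "<FONT STYLE=`"color:red;`"><strong>$Text</strong></FONT><BR/>"
    }
    else
    {
        "$Text<BR/>"
    }
}
```

Usage would look something like $emailBody += Format-Metric -Text $latestStabilityIndex -Value $latestStabilityIndex -Threshold $SSIIndexThreshold, which keeps the markup in one place if you ever want to change the styling.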

Enabling Win32_Reliability WMI Classes for Windows Server

I really like the Win32_Reliability classes, Win32_ReliabilityRecords and Win32_ReliabilityStabilityMetrics. I used one of them in a previous post. They basically hold records of all the useful system events that relate to system configuration and stability, such as unexpected shutdown events, application errors and software installs/uninstalls, etc. To boot, Windows uses all those events to calculate a System Stability Index. Some people might think the SSI is unnecessary, but I personally really like it as a quick at-a-glance number that I can use to give me an idea of overall system health when I have a thousand machines to look at. It's basically an index from 1 to 10 that fluctuates based on the aforementioned system stability events. Machines with an SSI below a certain number need to be looked at more closely; you get the idea.

The difference is that in my previous post, I didn't realize that the Win32_Reliability classes are not enabled by default on Windows 2008 R2 servers. On Windows 7 they are enabled by default, and on the one Windows 2008 Server (non-R2) on which I used them, they were functioning, which means that they're either enabled on 2008 Server by default or someone had turned them on previously.

You can, of course, access both these WMI classes in PowerShell with the good old Get-WMIObject that we all know and love, like this:

Get-WMIObject win32_reliabilityrecords
Get-WMIObject win32_reliabilitystabilitymetrics

On a Windows 2008 R2 server that does not have these two classes enabled, you will get the error

Get-WmiObject : Provider load failure

whether you are executing the Powershell cmdlet locally or remotely. So as I started to research this problem, it seemed to be a simple matter of enabling the GPO setting "Configure Reliability WMI Providers." (This article from The Scripting Guy is pretty much all you need for that.) So I did that and applied it to all of my servers. And then I waited. I waited for 24 hours. Still nothing. I got onto one of the servers and ran gpupdate /force. Then I waited some more. (Maybe it needs time to gather the data, right?) 24 hours later, nothing. Rebooted the server. Nothing.

OK, that GPO setting is obviously not the only piece of the puzzle here. I researched a little more and The Scripting Guy showed up yet again!

So there is a Scheduled Task named "RacTask" in Scheduled Tasks -> Task Scheduler Library -> Microsoft -> Windows -> RAC. (Make sure you are set to view hidden tasks, just in case.) That task has two triggers - one that only fires when a new Application log event 1007 from Customer Experience Improvement Program shows up, and another that runs indefinitely every hour. On Server 2008 R2, by default, the first trigger is enabled while the latter trigger is disabled. (On client OSes like Win7, both triggers are enabled by default.) So the GPO setting alone would have worked, except that I had not gotten an event ID 1007 from CEIP in three days. Event 1007 from CEIP is "Successfully sent CEIP data to Microsoft." I have only gotten Error 1008s (Failure to send data to Microsoft) in the past three days. I'm taking that to mean there's something wrong with Microsoft's SQM servers at the moment. Maybe they're down for maintenance or just too busy...
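If you'd rather not click through Task Scheduler on every box, the trigger state can be read with the Schedule.Service COM object. This is a read-only sketch that I have not run against a whole fleet: Get-RacTaskTrigger is a made-up function name, the folder and task names are the ones described above, and it assumes you have rights to connect to the remote Task Scheduler service.

```powershell
# Hypothetical helper: lists the triggers on RacTask and whether each is enabled.
# Read-only; it does not change the task.
function Get-RacTaskTrigger
{
    param([string]$ComputerName = $env:COMPUTERNAME)

    $service = New-Object -ComObject Schedule.Service
    $service.Connect($ComputerName)

    # Folder and task name as they appear in Task Scheduler on the machines described here.
    $task = $service.GetFolder('\Microsoft\Windows\RAC').GetTask('RacTask')

    foreach ($trigger in $task.Definition.Triggers)
    {
        New-Object PSObject -Property @{
            ComputerName = $ComputerName
            TriggerType  = $trigger.Type      # numeric trigger type as reported by Task Scheduler
            Enabled      = $trigger.Enabled
        }
    }
}
```

Running it against a 2008 R2 server in its default state should show one enabled trigger and one disabled trigger, matching what the GUI shows.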

Needless to say, you'd never get event 1007s at all if you opted out of the Customer Experience Improvement Program, in which case simply changing that GPO setting would definitely not be enough. I'm not saying that you have to participate in CEIP on your servers if you want to use the Win32_Reliability monitors. But you do need to enable that second trigger on the scheduled task. Enable the trigger, run the task, and then you'll be able to access the WMI classes immediately, locally and remotely.

$latestStabilityIndex = Get-WmiObject Win32_ReliabilityStabilityMetrics -ComputerName $server | Select-Object -First 1 | ForEach {$_.SystemStabilityIndex}

That's how you kick it off manually. I should note that I received a 1007 (data sent successfully) on one of my servers the next day, which enabled the monitors as expected. (The CEIP uploader is set to attempt to collect and upload data every 19 hours by default.)

So the moral of the story is this: enabling the GPO setting "Configure Reliability WMI Providers" in the Computer Config -> Administrative Templates area is enough to enable the Win32_Reliability WMI classes on your Win2K8R2 servers, provided they are participating in CEIP and you are willing to wait until they successfully upload CEIP data, which could take one to several days. Otherwise, you're going to have to find a way to also kick off that scheduled task on all your servers, be it manually or scriptomatically.
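Either way, it helps to know which servers are still throwing "Provider load failure". Here is a minimal sketch of a check you could run against a list of servers; Test-ReliabilityProvider is a made-up name, and it only tells you whether the query succeeded, not whether any stability data has been collected yet.

```powershell
# Hypothetical helper: tries the reliability query on each computer and reports
# whether it succeeded, along with the error text when it did not.
function Test-ReliabilityProvider
{
    param([string[]]$ComputerName)

    foreach ($computer in $ComputerName)
    {
        $result = New-Object PSObject -Property @{
            ComputerName = $computer
            Available    = $false
            Error        = $null
        }
        try
        {
            # -ErrorAction Stop turns a failed query into something catch can see.
            Get-WmiObject Win32_ReliabilityStabilityMetrics -ComputerName $computer -ErrorAction Stop |
                Select-Object -First 1 | Out-Null
            $result.Available = $true
        }
        catch
        {
            $result.Error = $_.Exception.Message
        }
        $result
    }
}
```

Something like Test-ReliabilityProvider -ComputerName $servers | Where-Object { -not $_.Available } would then give you the list of machines that still need the scheduled task kicked off.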

I don't feel like this was implemented all that well. I do like the reliability data, but I don't think it should be related to or dependent on CEIP events at all. Also, while trying to come up with hypothetical ways to automate the enabling of this so that I wouldn't have to log on to every server:


Come on Microsoft, get it together!