Enabling Win32_Reliability WMI Classes for Windows Server

I really like the Win32_Reliability classes, Win32_ReliabilityRecords and Win32_ReliabilityStabilityMetrics. I used one of them in a previous post. They hold records of all the useful system events that relate to system configuration and stability: unexpected shutdowns, application errors, software installs/uninstalls, and so on. To boot, Windows uses all those events to calculate a System Stability Index (SSI). Some people might think the SSI is unnecessary, but I personally really like it as a quick at-a-glance number that gives me an idea of overall system health when I have a thousand machines to look at. It's an index from 0 to 10 that fluctuates based on the aforementioned system stability events. Machines with an SSI below a certain threshold need to be looked at more closely - you get the idea.

The difference is that when I wrote my previous post, I didn't realize the Win32_Reliability classes are not enabled by default on Windows 2008 R2 servers. On Windows 7 they are enabled by default, and on the one Windows 2008 Server (non-R2) on which I used them, they were functioning - so either they're enabled by default on 2008 Server, or someone had turned them on previously.

You can, of course, access both these WMI classes in Powershell with the good old Get-WMIObject that we all know and love, like this:

Get-WMIObject win32_reliabilityrecords
Get-WMIObject win32_reliabilitystabilitymetrics

On a Windows 2008 R2 server that does not have these two classes enabled, you will get the error

Get-WmiObject : Provider load failure

whether you are executing the Powershell cmdlet locally or remotely. So as I started to research this problem, it seemed to be a simple matter of enabling the GPO setting "Configure Reliability WMI Providers." (This article from The Scripting Guy is pretty much all you need for that.) So I did that and applied it to all of my servers. And then I waited. I waited for 24 hours. Still nothing. I got onto one of the servers and ran gpupdate /force. Then I waited some more. (Maybe it needs time to gather the data, right?) 24 hours later, nothing. Rebooted the server. Nothing.

OK, that GPO setting is obviously not the only piece of the puzzle here. I researched a little more and The Scripting Guy showed up yet again!

So there is a Scheduled Task named "RacTask" in Scheduled Tasks -> Task Scheduler Library -> Microsoft -> Windows -> RAC. (Make sure you are set to view hidden tasks, just in case.) That task has two triggers - one that fires only when a new Application log event 1007 from the Customer Experience Improvement Program (CEIP) shows up, and another that runs every hour, indefinitely. On Server 2008 R2, by default, the first trigger is enabled while the second is disabled. (On client OSes like Win7, both triggers are enabled by default.) So the GPO setting alone would have worked, except that I had not gotten an event ID 1007 from CEIP in three days. Event 1007 from CEIP is "Successfully sent CEIP data to Microsoft." I had only gotten Error 1008s (failure to send data to Microsoft) in the past three days. I'm choosing to interpret that as there being something wrong with Microsoft's SQM servers at the moment. Maybe they're down for maintenance or just too busy...

Needless to say, you'd never get event 1007s at all if you opted out of the Customer Experience Improvement Program, in which case simply changing that GPO setting would definitely not be enough. I'm not saying that you have to participate in CEIP on your servers if you want to use the Win32_Reliability monitors. But you do need to enable that second trigger on the scheduled task. Enable the trigger, run the task, and then you'll be able to access the WMI classes immediately, locally and remotely.
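Flipping that trigger on programmatically is possible through the Task Scheduler COM interface. Here is only a sketch of the approach - the task path is what I saw on my 2008 R2 boxes, and you should verify the RegisterTaskDefinition call's behavior in a lab before trusting it on production servers:

```powershell
# Sketch: enable all triggers on RacTask via the Task Scheduler COM API (run elevated).
$scheduler = New-Object -ComObject Schedule.Service
$scheduler.Connect()                                 # local machine; pass a name to connect remotely
$racFolder = $scheduler.GetFolder("\Microsoft\Windows\RAC")
$racTask   = $racFolder.GetTask("RacTask")

# Enable every trigger on the task definition, then save it back
$definition = $racTask.Definition
foreach ($trigger in $definition.Triggers) { $trigger.Enabled = $true }

# 4 = TASK_UPDATE; nulls keep the task's existing security principal
$racFolder.RegisterTaskDefinition("RacTask", $definition, 4, $null, $null, $null) | Out-Null

# Kick it off immediately instead of waiting for the next trigger
$racFolder.GetTask("RacTask").Run($null) | Out-Null
```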

Once the monitors are enabled, you can pull the latest System Stability Index from a server like this:

$latestStabilityIndex = Get-WmiObject Win32_ReliabilityStabilityMetrics -ComputerName $server | Select-Object -First 1 | ForEach {$_.SystemStabilityIndex}

That's how you kick it off manually - enable the trigger and run the task. I should note that I received a 1007 (data sent successfully) on one of my servers the next day, which enabled the monitors as expected without any intervention on my part. (The CEIP uploader attempts to collect and upload data every 19 hours by default.)

So the moral of the story is enabling the GPO setting "Configure Reliability WMI Providers" in the Computer Config -> Administrative Templates area is enough to enable the use of the Win32_Reliability WMI classes on your Win2K8R2 servers if they are participating in CEIP and you are willing to wait until they are able to successfully upload CEIP data, which could take one to several days. Otherwise, you're going to have to find a way to also kick off that scheduled task on all your servers, be it manually or scriptomatically.
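For the "scriptomatically" option, schtasks.exe can at least run the task against remote machines, which kicks off a collection pass without logging on anywhere. (It doesn't enable the disabled hourly trigger - schtasks can't touch individual triggers - but it gets you data now.) The server names here are placeholders:

```powershell
# Kick off RacTask on a list of remote servers. Requires admin rights on each.
$servers = @("SERVER1", "SERVER2", "SERVER3")   # placeholders - use your own list
foreach ($server in $servers) {
    schtasks /S $server /Run /TN "\Microsoft\Windows\RAC\RacTask"
}
```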

I don't feel like this was altogether implemented that well in that regard. I do like the reliability data, but I don't feel like it should be related to or dependent on CEIP events at all. Also, while trying to come up with ways to automate enabling this so that I wouldn't have to log on to every server, I ran into the fact that there's no simple built-in command for enabling an individual trigger on a scheduled task - schtasks.exe can enable or disable a whole task, but not just one of its triggers.

Come on Microsoft, get it together!

Monitoring with Windows Remote Management (WinRM) and Powershell Part I

Hey guys. I should have called this post "Monitoring with Windows Remote Management (WinRM), and Powershell, and maybe a Certificate Services tutorial too," but then the title would have definitely been too long. In any case, I poured many hours of effort and research into this one. Lots of trial and error. And whether it helps anyone else or not, I definitely bettered myself through the creation of this post.

I'm pretty excited about this topic. This foray into WinRM and Powershell Remoting was sparked by a conversation I had with a coworker the other day. He's a senior Unix engineer, so he obviously enjoys *nix and when presented with a problem, naturally he approaches it with the mindset of someone very familiar with and ready to use Unix/Linux tools.

I'm the opposite of that - I feel like Microsoft is the rightful king of the enterprise and usually approach problems with Windows-based solutions already in mind. But what's important is that we're both geeks and we'll both still happily delve into either realm when it presents an interesting problem that needs solving. There's a mutual respect there, even though we don't play with the same toys.

The Unix engineer wants to monitor all the systems using SNMP because it's tried and true, it's been around forever, and it doesn't require an agent or expensive third-party software. SNMP wasn't very secure or feature-rich at first, so now we're up to SNMPv3. Then there's WBEM, and certain vendors like HP have their own implementations of it. I guess Microsoft wasn't in love with either one and decided to go its own way, as Microsoft is wont to do, which is why you won't find an out-of-the-box implementation of SNMPv3 from Microsoft.

One nice thing about SNMP though, is that it uses one static, predictable port.

In large enterprise IT infrastructures, you're likely to see dozens of sites, hundreds (if not thousands) of subnets, sprinklings of Windows and Unix devices all commingled together... and you can't swing a dead cat without hitting a firewall which may or may not have some draconian port restrictions on it. Furthermore, in a big enterprise you're likely to see the kind of bureaucracy and separation of internal organizations where server infrastructure guys can't just go and reconfigure firewalls on their own, and network guys can't make changes without running them by a "change advisory board" first. It all basically makes you want to pull your hair out while you wait... and wait, and wait some more. You just want to be able to communicate with your other systems, wherever they are.

Which brings us to WinRM and Powershell Remoting. WinRM, a component of Windows Hardware Management, is Microsoft's implementation of the multi-platform, industry-standard WS-Management protocol. (Like WMI is Microsoft's implementation of WBEM. Getting tired of the acronym soup yet? We're just getting started. You might also want to review WMI Architecture.) I used WinRM in a previous post, but only used the "quickconfig" option. Seems like most people rarely go any deeper than the quickconfig parameter.

Here's an excerpt from a Technet doc:

"WinRM is Microsoft's implementation of the WS-Management protocol, a standard Simple Object Access Protocol (SOAP)-based, firewall-friendly protocol that enables hardware and operating systems from different vendors to interoperate. You can think of WinRM as the server side and WinRS the client side of WS-Management."

I bolded the phrase that especially made my ears perk up. You see, Windows has a long history with things like RPC and DCOM. Those protocols have been instrumental in many awesome distributed systems and tool sets throughout Microsoft's history. But it just so happens that these protocols are also probably the most complex, and most firewall unfriendly protocols around. It's extremely fortuitous then that Ned over at AskDS just happened to write up a magnificent explication of Microsoft RPC. (Open that link in a background tab and read it after you're done here.)

Here's the thing - what if I want to remotely monitor or interact with a machine in another country, or create a distributed system that spans continents? There are dozens of patchwork networks between the systems. Each packet between the systems traverses firewall after firewall. Suddenly, protocols such as RPC are out the window. How am I supposed to get every firewall owner from here to Timbuktu to let my RPC and/or DCOM traffic through?

That's why monitoring applications like SCOM or NetIQ AppManager require the installation of agents on the machines. They collect the data locally and then ship it to a central management server using just one or two static ports. Well, they do other more complex stuff too that requires software be installed on the machine, but that's beside the point.

Alright, enough talk. Let's get to work on gathering performance metrics remotely from a Windows server. There are a few scenarios to test here: one is communication within the boundaries of an Active Directory domain, and the other is communication with an external, non-domain machine. Then we'll explore SSL authentication and encryption.

The first thing you need to do is set up and configure the WinRM service. One important thing to remember is that just starting the WinRM service isn't enough - you still have to explicitly create a listener. In addition, like most things SSL, it requires a certificate to properly authenticate and encrypt data. Run: 

winrm get winrm/config

to see the existing default WinRM configuration:

WinRM originally used port 80 for HTTP and port 443 for HTTPS. As of Win7 and 2k8R2, it uses ports 5985 and 5986, respectively. But those are just defaults, and you can change the listener(s) back to the old ports - or any port, for that matter - if you want. Run:

winrm enumerate winrm/config/listener

to list the WinRM listeners that are running. You should get nothing, because we haven't configured any listeners yet. WinRM over SSL will not work with a self-signed certificate. It has to be legit. From support.microsoft.com:

"WinRM HTTPS requires a local computer "Server Authentication" certificate with a CN matching the hostname, that is not expired, revoked, or self-signed to be installed."

To set up a WinRM listener on your machine, you can run

winrm quickconfig

or

winrm quickconfig -transport:HTTPS

or even

winrm create winrm/config/listener?Address=*+Transport=HTTPS @{Port="443"}

Use "set" instead of "create" if you want to modify an existing listener. The @{} bit at the end is called a hash table and can be used to pass multiple values. The WinRM.cmd command-line tool is actually just a wrapper for winrm.vbs, a VB script. The quickconfig option runs a script that configures and starts the listener, starts the WinRM service and sets it to start automatically, and creates some Windows Firewall exceptions. What's more, Powershell has many cmdlets that use WinRM, and the entire concept of Powershell Remoting is built on WinRM. So now that you know the fundamentals of WinRM and what's going on in the background, let's move ahead into using Powershell. In fact, you can emulate all of the same behavior of "winrm quickconfig" by instead running

Configure-SMRemoting.ps1

from within Powershell to set up the WinRM service. Now from another machine, fire up Powershell and try to use the WinRM service you just set up:

$dc01 = New-PSSession -ComputerName DC01
Invoke-Command -Session $dc01 -ScriptBlock { gwmi win32_computersystem }

Returns:

You just pulled some data remotely using WinRM! The difference between using a "session" in Powershell and simply executing cmdlets with the -ComputerName parameter is that a session persists, so you can run multiple different sets of commands that all share the same data. You can also run a command on many machines simultaneously. (If you try to run New-PSSession against a computer on which you have not configured the WinRM service, you will get a nasty red error.) Hell, it's Powershell. You can do anything.
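As a quick sketch of the one-to-many case (the computer names here are placeholders):

```powershell
# Fan one command out to several machines at once; each result comes back
# tagged with a PSComputerName property so you can tell them apart.
Invoke-Command -ComputerName DC01, FS01, WEB01 -ScriptBlock {
    gwmi Win32_OperatingSystem | Select-Object CSName, LastBootUpTime
}
```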

Alright so that was simple, but that's because we were operating within the safe boundaries of our Active Directory domain and all the authentication was done in the background. What about monitoring a standalone machine, such as SERVER1?

My first test machine:

  • Hostname: SERVER1 
  • IP: 192.168.1.10 
  • OS: Windows 2008 R2 SP1, fully patched, Windows Firewall is on
  • It's not a member of any domain

First things first: Launch Powershell on SERVER1. Run:

Set-ExecutionPolicy Unrestricted

Then set up your WinRM service and listener by running

Configure-SMRemoting.ps1

and following the prompts. If the WinRM server (SERVER1) is not in your forest (it's not) or otherwise can't use Kerberos, then HTTPS/SSL must be used, or the destination machine must be added to the TrustedHosts configuration setting. Let's try the latter first. On your client, add the WinRM server to the "Trusted Hosts" list:

We just authenticated and successfully created a remote session to SERVER1 using the Negotiate protocol! Negotiate is basically "use Kerberos if possible, fall back to NTLM if not." So the credentials are passed via NTLM, which is not clear text, but it's not awesome either. You can find a description of the rest of the authentication methods here, about halfway down the page, if you need a refresher.

Edit 1/29/2012: It should be noted that even within a domain, for Kerberos authentication to work when using WinRM, an SPN for the service must be registered in AD. As an example, you can find all of the "WSMAN" SPNs currently registered in your forest with this command:

setspn -T yourForest -F -Q WSMAN/*

SPN creation for this should have been taken care of automatically, but you know something is wrong (and Kerberos will not be used) if there is no WSMAN SPN for the device that is hosting the WinRM service.

OK, I am pooped. Time to take a break. Next time in Part II, we're going to focus on setting up SSL certificates to implement some real security to wrap up this experiment!

Auditing Active Directory Inactive Users with Powershell and Other Cool Stuff

Hello again, fellow wanderers.

I was having a hell of a comment spam problem here for a couple days... hope I didn't accidentally delete any legitimate comments in the chaos. (Read this excellent comment left on my last DNS post.) Then I realized that I might ought to change the challenge question and response for my simple captcha from its default... I guess the spammers have the old "5+5=" question figured out. :P

A few years ago, I made my own simple captcha for another blog that was along the lines of x + y = ? using PHP, but x and y were randomly generated at each page load. Worked really well. The simple captcha that comes boxed with BlogEngine.NET here is static. Being able to load a random question and answer pair from a pool of questions would be a definite enhancement.

Anyway, since we're still on the topic of auditing Active Directory, I've got another one for you: Auditing "inactive" user accounts.

I had a persnickety customer that wanted to be kept abreast of all AD user accounts that had not logged on in exactly 25 days or more. As soon as one delves into this problem, one realizes that a command such as dsquery user -inactive x will display users that are considered inactive for x number of weeks, but not days. I immediately suspected there must be a reason for that lack of precision - whoever wrote the dsquery utility would not have left out that measure of granularity without a good reason.

So what defines an "inactive" user? A user that has not logged on to his or her user account in a period of time. There is an AD attribute on each user called LastLogonTimeStamp. After a little research, I stumbled across this post, where it is explained that the LastLogonTimeStamp attribute is not terribly accurate - i.e., off by more than a week. Now that dsquery switch makes a lot more sense. I conjecture that the LastLogonTimeStamp attribute is inaccurate because Microsoft had to make a choice when designing Active Directory - either have that attribute updated every single time a user account is logged on to and thus amplify domain replication traffic and work for the DCs, or have it only updated periodically and save the replication load.
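To make that concrete, here's how you can inspect the raw attribute for a single user ("jdoe" is a placeholder sAMAccountName) - it's stored as a 64-bit FILETIME value:

```powershell
# Read lastLogonTimestamp and convert the FILETIME to a readable DateTime.
Import-Module ActiveDirectory
$user = Get-ADUser jdoe -Properties lastLogonTimestamp
[DateTime]::FromFileTime($user.lastLogonTimestamp)
```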

To further complicate matters, the Active Directory Powershell cmdlet Search-ADAccount reports a LastLogonDate attribute on the users it returns. As it turns out, LastLogonDate is not even a real attribute, but rather that particular cmdlet's mechanism for translating LastLogonTimeStamp into a more human-readable form (a .NET DateTime object).

Next, there is another AD attribute - msDS-LogonTimeSyncInterval - that you can dial down to a minimum of 1 day, and that will have replication of the users' LastLogonTimeStamp attribute updated much more frequently and thus make it more accurate. Of course, this comes at the expense of additional load on the DCs and replication traffic. This may be negligible in a small domain, but may have a significant impact on a large domain.

*ADSI Edit*
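If you'd rather script that change than click through ADSI Edit, something like this should work - "dc=corpdom,dc=local" is the same example domain used in the script below, so substitute your own domain's distinguished name:

```powershell
# Set msDS-LogonTimeSyncInterval to its 1-day minimum on the domain head object.
Import-Module ActiveDirectory
Set-ADObject -Identity "dc=corpdom,dc=local" -Replace @{"msDS-LogonTimeSyncInterval" = 1}
```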

Lastly, your other options for being able to accurately track the last logon time of users as close to "real-time" as possible involve scanning the security logs or attributes on all of your domain controllers and doing some heavy parsing. This is where event forwarding and subscriptions would really shine. See my previous post for details. I don't know about you guys, but all that sounds like a nightmare to me. Being able to track inactive user accounts to within 1 day is just going to have to suffice for now.

So we made the decision to decrease the msDS-LogonTimeSyncInterval, and I wrote this nifty Powershell script to give us the good stuff. Each major chunk of code is almost identical but with a minor tweak that represents the different use cases if given different parameters. Reading the comments toward the top on the five parameters will give you a clear picture of how the script works:

# ADUserAccountAudit.ps1
# Written by Ryan Ries on Jan 19 2012
# Requires the AD Powershell Module which is on 2k8R2 DCs and systems with RSAT installed.
#
# Locates "inactive" AD user accounts. Note that LastLogonTimeStamp is not terribly accurate.
# Accounts that have never been logged into will show up as having a LastLogonTimeStamp of
# January 1, 1601 (the Windows FILETIME epoch) - 82 years after the death of Leonardo da Vinci.
# This is because even though their LastLogonTimeStamp attribute is null, we cast it to a DateTime
# object regardless, which converts null input into the minimum file time.
#
# For specific use with NetIQ AppManager, put this script on the agent machine at 
# C:\Program Files (x86)\NetIQ\AppManager\bin\Powershell (for 64 bit Windows. Just "Program Files" if 32 bit Windows.)

Param([string]$DN = "dc=corpdom,dc=local",         # LDAP distinguished name for domain
      [string]$domainName = "Corpdom",             # This can be whatever you want it to be
      [int]$inactiveDays = 25,                     # Users that have not logged on in this number of days will appear on this report
      [bool]$includeDisabledAccounts = $false,     # Setting this to true will include accounts that are already disabled in the report as well
      [bool]$includeNoLastLogonAccounts = $false)  # Setting this to true will include accounts that have never been logged into and thus have no LastLogonTimeStamp attribute.

# First, load the Active Directory module if it is not already loaded
$ADmodule = Get-Module | Where-Object { $_.Name -eq "activedirectory" } | Foreach { $_.Name }
if($ADmodule -ne "activedirectory")
{
   Import-Module ActiveDirectory
}

if($includeDisabledAccounts -eq $false)
{
   if($includeNoLastLogonAccounts -eq $false)
   {
      Write-Host "Enabled users that have not logged into $domainName in $inactiveDays days`r`nExcluding accounts that have never been logged into`r`nAccounts younger than $inactiveDays days not shown.`r`n-------------------------------------------------------"
      Search-ADAccount -UsersOnly -SearchBase "$DN" -AccountInactive -TimeSpan $inactiveDays`.00:00:00 | 
      Where-Object {$_.Enabled -eq $true -And $_.LastLogonDate -ne $null } |
      Get-ADUser -Properties Name, sAMAccountName, givenName, sn, lastLogonTimestamp, Enabled, WhenCreated |
      Where-Object {$_.WhenCreated -lt (Get-Date).AddDays(-$($inactiveDays)) } |
      Select sAMAccountName, givenName, sn, @{n="LastLogonTimeStamp";e={[DateTime]::FromFileTime($_.LastLogonTimestamp)}}, Enabled, WhenCreated |
      Sort-Object LastLogonTimeStamp |
      Format-Table   
   }
   else
   {
      Write-Host "Enabled users that have not logged into $domainName in $inactiveDays days`r`nIncluding accounts that have never been logged into`r`nAccounts younger than $inactiveDays days not shown.`r`n-------------------------------------------------------"
      Search-ADAccount -UsersOnly -SearchBase "$DN" -AccountInactive -TimeSpan $inactiveDays`.00:00:00 | 
      Where-Object {$_.Enabled -eq $true } |
      Get-ADUser -Properties Name, sAMAccountName, givenName, sn, lastLogonTimestamp, Enabled, WhenCreated |
      Where-Object {$_.WhenCreated -lt (Get-Date).AddDays(-$($inactiveDays)) } |
      Select sAMAccountName, givenName, sn, @{n="LastLogonTimeStamp";e={[DateTime]::FromFileTime($_.LastLogonTimestamp)}}, Enabled, WhenCreated |
      Sort-Object LastLogonTimeStamp |
      Format-Table 
   }
 
}
else
{
   if($includeNoLastLogonAccounts -eq $false)
   {
      Write-Host "All users that have not logged into $domainName in $inactiveDays days`r`nExcluding accounts that have never been logged into`r`nAccounts younger than $inactiveDays days not shown.`r`n------------------------------------------------------"   
      Search-ADAccount -UsersOnly -SearchBase "$DN" -AccountInactive -TimeSpan $inactiveDays`.00:00:00 |
      Where-Object { $_.LastLogonDate -ne $null } |
      Get-ADUser -Properties Name, sAMAccountName, givenName, sn, lastLogonTimestamp, Enabled, WhenCreated |
      Where-Object { $_.WhenCreated -lt (Get-Date).AddDays(-$($inactiveDays)) } |
      Select sAMAccountName, givenName, sn, @{n="LastLogonTimeStamp";e={[DateTime]::FromFileTime($_.lastlogontimestamp)}}, Enabled, WhenCreated |
      Sort-Object LastLogonTimeStamp |
      Format-Table   
   }
   else
   {
      Write-Host "All users that have not logged into $domainName in $inactiveDays days`r`nIncluding accounts that have never been logged into`r`nAccounts younger than $inactiveDays days not shown.`r`n------------------------------------------------------"   
      Search-ADAccount -UsersOnly -SearchBase "$DN" -AccountInactive -TimeSpan $inactiveDays`.00:00:00 |
      Get-ADUser -Properties Name, sAMAccountName, givenName, sn, lastLogonTimestamp, Enabled, WhenCreated |
      Where-Object {$_.WhenCreated -lt (Get-Date).AddDays(-$($inactiveDays)) } |
      Select sAMAccountName, givenName, sn, @{n="LastLogonTimeStamp";e={[DateTime]::FromFileTime($_.lastlogontimestamp)}}, Enabled, WhenCreated |
      Sort-Object LastLogonTimeStamp |
      Format-Table   
   }
}

So there you have it: a quick and dirty report to locate users that have been inactive for over x days. Accounts that were just created and never logged on to would have a null LastLogonTimeStamp and would otherwise show up in the report, so I threw the Where-Object {$_.WhenCreated -lt (Get-Date).AddDays(-$($inactiveDays)) } bit in there to exclude accounts younger than the number of days required to consider an account "inactive." Furthermore, resist the urge to go a step further and programmatically disable inactive user accounts. Most organizations use service accounts and other special accounts that may not get logged into very often, and yet all hell would break loose if you disabled them. I'm considering a system that disables the accounts but also reads in a list of "immune" accounts that the program would ignore. For a future post, I guess.
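If I ever build that, the core might look something like this sketch - the immune-list path is made up, and -WhatIf is left on so nothing actually gets disabled until you've vetted the list:

```powershell
# Hypothetical sketch: disable inactive accounts except those on an "immune" list.
# C:\Scripts\immune.txt is a made-up path - one sAMAccountName per line.
Import-Module ActiveDirectory
$immune = Get-Content "C:\Scripts\immune.txt"
Search-ADAccount -UsersOnly -AccountInactive -TimeSpan 25.00:00:00 |
    Where-Object { $_.Enabled -and ($immune -notcontains $_.SamAccountName) } |
    Disable-ADAccount -WhatIf   # remove -WhatIf only when you trust the immune list
```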

Lastly, I want to thank Ned of the AskDS blog, without whom this post would not have been possible. (Now it sounds like a Grammy speech...) But seriously, I asked him about this stuff and he knew all the answers right away. Helped me out immeasurably on this.

DNS 101: Round Robin (Or Back When I was Young And Foolish Part II)

I learned something today. It's something that made me feel stupid for not knowing. Something that seemed elemental and trivial - yet, I did not know it. So please, allow me to relay my somewhat embarrassing learning experience in the hopes that it will save someone else from the same embarrassment.

I did know what DNS round robin was. Or at least, I would have said that I did.

Imagine you configure DNS1, as a DNS server, to use round robin. Then, you create 3 host (A or AAAA) records for the same host name, using different IPs. Let's say we create the following A records on DNS1:

server01 - A 10.0.0.4
server01 - A 10.0.0.5
server01 - A 10.0.0.6

Then on a workstation which is configured to use DNS1 as a DNS server, you ping server01. You receive 10.0.0.4 as a reply. You ping server01 again. With no hesitation, you get a reply from 10.0.0.4 again. We assume that your local workstation has cached 10.0.0.4 locally and will reuse that IP for server01 until the entry either expires, or we flush the DNS cache on the workstation with a command like ipconfig/flushdns.

I run ipconfig/flushdns. Then I ping server01 again.

This time I receive a response from 10.0.0.5. Now I assume DNS round robin is working perfectly. I go home for the day feeling like I know everything there is to know about DNS.

But is the DNS server responding to each query with just the single next A/AAAA record it has on file, rotating through the records in sequential, round-robin fashion? That is what I assumed.

But the fact of the matter is that DNS servers, when queried for a host name, actually return a list of all A/AAAA records associated with that host name, every time that host name is queried for. (To a point - the list must fit within a UDP packet, and some firewalls/filters don't let UDP packets longer than 512 bytes through. That's changing though. Our idea of how big data is and should be allowed to be is always growing.)

I assume that www.google.com, being one of the busiest websites in the world, has not only some global load balancing and other advanced load balancing techniques employed, but probably also has more than one host record associated with it. To test my theory, I fire up Wireshark and start a packet capture. I then flush my local DNS cache with ipconfig/flushdns and then ping www.google.com.

Notice how I pinged it, got one IP address in response (.148), then flushed my DNS cache, pinged it again and got another different IP address (.144)? But despite what it may look like, that name server is not returning just one A/AAAA record each time I query it:


*Click for Larger*

My workstation is ::9. My workstation's DNS server is ::1. The DNS server is configured to forward DNS requests for zones for which it is not authoritative on to yet another DNS server. So I ask for www.google.com, my DNS server doesn't know, so it forwards the request. The forwardee finally finds out and reports back to my DNS server, which in turn relays back to me a list of all the A records for www.google.com. I get a long list containing not only a mess of A records, but a CNAME thrown in there too, all from a single DNS query! (We're not worried about the subsequent query made for an AAAA record right now. Another post perhaps.)

I was able to replicate this same behavior in a sanitary lab environment running a Windows DNS server and confirmed the same behavior. (Using the server01 example I mentioned earlier.)
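You don't even need Wireshark to see the full list - Powershell can ask .NET's resolver directly:

```powershell
# Returns every A/AAAA record the DNS response contained, not just the first one.
[System.Net.Dns]::GetHostAddresses("www.google.com") |
    ForEach-Object { $_.IPAddressToString }
```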

Where round robin comes in is that it rotates the order of the list given to each subsequent client who requests it. Keep in mind that while round robin-ing the A records in your DNS replies does supply a primitive form of load distribution, it's a pretty poor substitute for real load balancing, since if one of the nodes in the list goes down, the DNS server will be none the wiser and will continue handing out the list with the downed node's IP address on it.

Lastly, since we know that our client receives an entire list of A records for host names which have many IP addresses, what does it actually do with the list? Well, the ping utility doesn't do much. If the first IP address on the list is down, you get a destination unreachable message and that's it. (Leading to a lot of people not realizing they have a whole list of IPs they could try.)

Web browsers, however, have a nifty feature known as "browser retry" or "client retry," where they will keep trying the other IPs in the list until they find a working one. Then they cache the working IP address so that the user does not experience the same delay in page loading as they did the first time. Yes, there are exploits concerning this feature, and yes, it's probably a bad idea to rely on it, since browser retry is implemented differently across every browser and operating system. It's a relatively new mechanism, actually, and people may not believe you if you tell them. To prove it, find (or create) a host name which has several bad IPs and one or two good ones, then telnet to that host name. Even telnet (a modern version from a modern operating system) will use getaddrinfo() instead of gethostbyname(), and if it fails to connect to the first IP, you can watch it keep trying the next IPs in the list.
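If you want to watch that same fallback behavior without hunting down a telnet client, .NET's TcpClient does the same thing - give it a host name and it walks the address list until a connection succeeds ("server01" here is the hypothetical multi-record host from earlier):

```powershell
# Connect(host, port) resolves the name and tries each returned address
# in order until one accepts the connection or all of them fail.
$client = New-Object System.Net.Sockets.TcpClient
$client.Connect("server01", 80)
$client.Connected     # True if any address in the list accepted
$client.Close()
```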

More info here, here and here. That last link is an MSDN doc on getaddrinfo(). Notice that it does talk about different implementations on different operating systems, and that ppResult is "a pointer to a linked list of one or more addrinfo structures that contains response information about the host."

Using Powershell to Monitor Windows Reliability Data

There's always a lot of talk about monitoring when software gets installed and uninstalled on a Windows machine, when "configuration changes" take place on a system, or when unplanned reboots (crashes) occur... how do we audit that? As awesome as the Windows event logs are, sifting through all their noise and cryptic messages can be a bit unwieldy.

There are lots of third-party tools for auditing software changes. Those tools can cost a lot of money. But did you know Windows already does this for you? If you run perfmon /rel on your Vista/7/2008/R2 machine, you will be greeted with this pretty picture:

Notice that you can even export all the data as a nice little XML file. So that's pretty neat. You can see all the application crashes, system crashes, when software was installed and uninstalled, etc... but that's all GUI stuff. I know what you really want is something more programmatic, customizable, and automatable. It just so happens that there's a WMI class called Win32_ReliabilityRecords. Let's use Powershell to take a peek:

# Looking at Windows software installations and uninstallations and other reliability data
# Ryan Ries, Jan 5 2012
#
# Usage: .\ReliabilityData.ps1 <argument>
# Valid arguments are "ShowAll", "ShowSystemCrashes", "ShowWhateverYourImaginationIsTheLimit", ...
# Arguments are not case sensitive.

param([parameter(Mandatory=$true)]
      [string]$Argument)

Function WMIDateStringToDateTime([String] $strWmiDate) 
{ 
    # WMI datetimes look like "20120105123045.000000-360" (CIM_DATETIME format)
    $strWmiDate = $strWmiDate.Trim() 
    $iYear   = [Int32]::Parse($strWmiDate.SubString( 0, 4)) 
    $iMonth  = [Int32]::Parse($strWmiDate.SubString( 4, 2)) 
    $iDay    = [Int32]::Parse($strWmiDate.SubString( 6, 2)) 
    $iHour   = [Int32]::Parse($strWmiDate.SubString( 8, 2)) 
    $iMinute = [Int32]::Parse($strWmiDate.SubString(10, 2)) 
    $iSecond = [Int32]::Parse($strWmiDate.SubString(12, 2)) 
    $iMicroseconds = [Int32]::Parse($strWmiDate.Substring(15, 6)) 
    # The DateTime constructor wants whole milliseconds, so truncate.
    $iMilliseconds = [Int32][Math]::Floor($iMicroseconds / 1000) 
    $iUtcOffsetMinutes = [Int32]::Parse($strWmiDate.Substring(21, 4)) 
    if ( $iUtcOffsetMinutes -ne 0 ) 
    { 
        $dtkind = [DateTimeKind]::Local 
    } 
    else 
    { 
        $dtkind = [DateTimeKind]::Utc 
    } 
    return New-Object -TypeName DateTime -ArgumentList $iYear, $iMonth, $iDay, $iHour, $iMinute, $iSecond, $iMilliseconds, $dtkind 
} 

If($Argument -eq "ShowAll")
{
       $reliabilityData = Get-WmiObject Win32_ReliabilityRecords
       ForEach ($entry in $reliabilityData)
       {
              Write-Host "Computer Name: " $entry.ComputerName
              Write-Host "Event ID:      " $entry.EventIdentifier
              Write-Host "Record Number: " $entry.RecordNumber
              Write-Host "Date and Time: " $(WMIDateStringToDateTime($entry.TimeGenerated))
              Write-Host "Source:        " $entry.SourceName
              Write-Host "Product Name:  " $entry.ProductName
              Write-Host "User:          " $entry.User
              Write-Host "Message:       " $entry.Message
              Write-Host " "
       }
}

If($Argument -eq "ShowSystemCrashes")
{
       $reliabilityData = Get-WmiObject Win32_ReliabilityRecords
       ForEach ($entry in $reliabilityData)
       {
              If($entry.Message.StartsWith("The previous system shutdown") -And $entry.Message.EndsWith("was unexpected."))
              {
                     Write-Host "Computer Name: " $entry.ComputerName
                     Write-Host "Event ID:      " $entry.EventIdentifier
                     Write-Host "Record Number: " $entry.RecordNumber
                     Write-Host "Date and Time: " $(WMIDateStringToDateTime($entry.TimeGenerated))
                     Write-Host "Source:        " $entry.SourceName
                     Write-Host "Product Name:  " $entry.ProductName
                     Write-Host "User:          " $entry.User
                     Write-Host "Message:       " $entry.Message
                     Write-Host " "             
              }
       }
}

If($Argument -eq "ShowApplicationInstalls")
{
       $reliabilityData = Get-WmiObject Win32_ReliabilityRecords
       ForEach ($entry in $reliabilityData)
       {
              If($entry.Message.StartsWith("Windows Installer installed the product."))
              {
                     Write-Host "Computer Name: " $entry.ComputerName
                     Write-Host "Event ID:      " $entry.EventIdentifier
                     Write-Host "Record Number: " $entry.RecordNumber
                     Write-Host "Date and Time: " $(WMIDateStringToDateTime($entry.TimeGenerated))
                     Write-Host "Source:        " $entry.SourceName
                     Write-Host "Product Name:  " $entry.ProductName
                     Write-Host "User:          " $entry.User
                     Write-Host "Message:       " $entry.Message
                     Write-Host " "             
              }
       }
}

So those are just some ideas that I threw together, but it's by no means a complete solution. Use it as a starting point, play with the script, expand on it, and make it even better! And one last thing: ideally I should not be using Write-Host here, but instead be preserving the objects, so that I could combine this script with other cmdlets on the pipeline. I'll put that in as an enhancement request...
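For the curious, here's a rough sketch of what that object-preserving enhancement could look like. The function name Get-ReliabilityRecord is my own, and the [PSCustomObject] accelerator assumes Powershell 3.0 or later (on 2.0 you'd use New-Object PSObject -Property instead); it also leans on the built-in ManagementDateTimeConverter class rather than the hand-rolled conversion function above.

```powershell
# Emit the reliability records as real objects instead of Write-Host text,
# so filtering, sorting, and exporting happen on the pipeline.
Function Get-ReliabilityRecord
{
    Get-WmiObject Win32_ReliabilityRecords | ForEach-Object {
        [PSCustomObject]@{
            ComputerName  = $_.ComputerName
            EventId       = $_.EventIdentifier
            RecordNumber  = $_.RecordNumber
            # The framework already knows how to convert CIM_DATETIME strings.
            TimeGenerated = [System.Management.ManagementDateTimeConverter]::ToDateTime($_.TimeGenerated)
            Source        = $_.SourceName
            ProductName   = $_.ProductName
            User          = $_.User
            Message       = $_.Message
        }
    }
}

# Usage: now the filters are just Where-Object clauses, not script arguments.
# Get-ReliabilityRecord | Where-Object { $_.Message -like "Windows Installer installed*" }
# Get-ReliabilityRecord | Sort-Object TimeGenerated | Export-Csv .\reliability.csv -NoTypeInformation
```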