NT Debugging

I'm not talking about the NT Debugging blog.  This is one of my personal experiences with NT debugging.

A couple weeks ago, I was looking at a Windows VM that was apparently crashing on a somewhat regular basis.  Through the use of usual logfile analysis techniques we can get some correlations and some probable causes.  In this particular case it was plainly evident that the system was working perfectly until some 3rd party software was loaded.  Then the regular unexpected shutdowns began, about once every day or two.

The correlation was found through the use of the Reliability and Performance Monitor, which is a very nifty tool.

A "stop," or "bugcheck" we are all familiar with.  It produces a memory dump file in %SystemRoot%\MEMORY.DMP and other "minidumps" in %SystemRoot%\Minidump\ unless otherwise configured.  Depending on the configured dump type, it's pretty much, well, a dump of some or all of what the system had in memory when the offense took place.

But do we really know what to do with an NT memory dump?  I have to say I didn't really, and I was a little embarrassed about it.  So I set out to figure out what useful information I could really glean from that memory dump. Having that extra bit of tenacity to really dig down deep and identify with greater precision what the root of the problem is, rather than just saying "well it's some sort of software compatibility problem, better reformat the hard drive!" can help you out in your quest to be the office guru. 

Well it turns out there's a nice utility called Windbg.exe. You can get it from the Windows SDK.  To effectively debug an application, you need debugging symbols. Fortunately, Microsoft has provided some public debugging symbols on their own symbol server.  I hear that Microsoft also has a private symbol tree for internal use only, but we'll just have to settle for the public ones.

Here's a KB that will help you get Windbg and your symbols path set up correctly.

Now that you have that configured, simply drag the memory dump file into the Windbg window, and it will tell you with much greater certainty exactly what driver and/or interaction caused the BSOD.

One of the interesting things that Windbg can reveal is that drivers installed by crashy software sometimes still get loaded even after the software has been uninstalled. And if all that machine-code-looking stuff seems scary, Windbg also outputs the simple line "Probably caused by: driver.sys", which can at least give you a lead.
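
If you'd rather not click through the GUI, something like the following should get the same verdict from a PowerShell prompt. This is only a sketch: the kd.exe location and the C:\Symbols cache folder are assumptions, so adjust them for wherever your SDK put the debugging tools. The symbol server URL is Microsoft's public one, and !analyze -v is the standard triage command.

```powershell
# Assumed install path of the console kernel debugger that ships alongside Windbg.
$kd = 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\kd.exe'

# Download public symbols on demand and cache them in C:\Symbols (assumed folder).
$env:_NT_SYMBOL_PATH = 'srv*C:\Symbols*https://msdl.microsoft.com/download/symbols'

# -z opens a dump file; -c runs debugger commands at startup, then q quits.
$report = & $kd -z "$env:SystemRoot\MEMORY.DMP" -c '!analyze -v; q'

# Pull out the lines that name the suspect driver.
$report | Select-String -Pattern 'Probably caused by', 'MODULE_NAME', 'IMAGE_NAME'
```

The extra MODULE_NAME and IMAGE_NAME patterns are there in case your debugger version words its verdict differently.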

There are also other dump file analyzers, such as WhoCrashed, that may be more to your liking.

And lastly, be careful about sharing your memory dumps, as they might contain password hashes.

Back When I Was Young and Foolish (Part I of Oh-So-Many)

I'd like to do a little reminiscing now, of a time when I was young and foolish. I anticipate many such posts, which I guess means I spend a lot of time being young and foolish.

It was my first real IT job, and it'd be fair to call me a sysadmin. I was basically just an assistant to the lead architect. I was doing things like setting up backup schedules in DPM and creating user accounts in AD. Just the usual stuff you'd expect a guy first getting started in IT would be doing.

One day, my boss and I were discussing one of the many issues caused by being in the middle of a domain migration, and that was that we currently had employees working in two different Windows domains simultaneously. For various reasons, there would not and could not be a trust relationship established between the two. To further complicate matters, we now needed two separate ISA (now known as TMG) servers at the office, one for each domain, so as to keep the employees off of Facebook and ESPN.com. One might suggest that it was a social problem that should be dealt with by the employees' managers, but the managers were among the worst offenders. But I digress.

Now since both of these domains are on the same subnet, we can really only have one DHCP server. So if we only have one DHCP server, how are we going to push out two different sets of options, such as which proxy server to use, to the members of each domain?

We settled on using two different DHCP classes.
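
For the curious, here's roughly what that looks like on the client side, assuming user classes (as opposed to vendor classes). The adapter name and the class ID DomainA are made up for illustration. On the server you define a matching class and hang the domain-specific options off of it, so a client only receives those options if it announces that class ID.

```powershell
# Run elevated on a client that belongs to the first domain.
# "Local Area Connection" and DomainA are hypothetical names.
ipconfig /setclassid "Local Area Connection" DomainA

# List the class IDs the DHCP server allows for this adapter.
ipconfig /showclassid "Local Area Connection"

# Pick up a fresh lease so the class-specific options take effect.
ipconfig /renew "Local Area Connection"
```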

So while we were fiddling around with the DHCP server, for no good reason we decided that right now was a great time to screw around with the basic settings of a DHCP server that had been serving us faithfully with no problems for years.

What is a good DHCP lease duration, really? Is it the default of 8 days? Well I suppose that depends on a lot of factors, such as how mobile your DHCP clients are and how often you expect them to be connecting to and disconnecting from the network, how many DHCP addresses you have to distribute, etc.  But I've already put more thought into it just now than we did on that day, for we almost completely arbitrarily set the DHCP lease duration to 30 minutes.

Fast forward a few months.

Everything had been working great and the office DHCP server, as DHCP servers should be, was all but forgotten about. I had just been offered a much higher-paying job with a much larger IT company, so I was now a short-timer at this job. Then, out of the blue, our primary domain controller (the one hosting all the FSMOs) at our remote datacenter goes dark.  I can still iLO into it, but both of its network adapters are just gone. No error messages in the logs, no sign that Windows had ever known that it did once have network connections, nothing.  It just looked like it had never had network adapters in the first place.  (This still perturbs me to this day. Why wasn't there an event in either the Windows logs or in the IML that said "Uhh dude where did your network adapters go!?")

After seizing the FSMO roles on the backup DC, we took a trip out to the datacenter to have a look. Sure enough, the link lights on the physical hardware were out, and upon rebooting, the BIOS of the machine didn't even recognize that it had ever had network adapters in it. So I had HP come out and replace the motherboard, which fixed the issue. I don't know what the point of having redundant NICs is if they're wired to both blow at the same time, but I digress again...

So now I have essentially a brand new server out of what used to be our primary domain controller. The question, my boss wanted to know, was how do we restore a former DC after its FSMO roles have been seized? My answer: You don't. You never bring it back. Ever. If I don't wipe the hard drive right now, it thinks it's still the owner of those FSMO roles, and the only way for it to learn that it no longer owns those FSMO roles is to connect it back to the domain network and let the KCC do its magic.  What's going to happen in that span of time that we now have two DCs in the domain that both simultaneously think that they own all the FSMOs? I didn't want to find out. So I argued that we just rebuild it and I eventually got my way.

Now since I had to rebuild the machine, I figured now was as good a time as any to upgrade it from Windows 2008 to Windows 2008 R2. Having spent the last several months studying my ass off for the MCITP tests and passing them, I was feeling pretty self-confident. I took the freshly rebuilt server back out to the datacenter. I re-cabled it and re-racked it, and powered it on. I grabbed the 2008 R2 version of adprep.exe and on the other domain controller, used it to run adprep /forestprep followed by adprep /domainprep. The domain was now ready to accept a 2008 R2 domain controller. I dcpromo'ed the new machine, and it went without a hitch. I then gracefully transferred the FSMOs back from the other DC. (I didn't alter the actual functional levels of the forest or domain.)
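
If you have the Active Directory module for PowerShell handy (it ships with 2008 R2), checking and moving the roles can be sketched like this. It's just one way to do it, not a record of the exact steps from that day, and DC01 is a made-up name for the rebuilt server.

```powershell
Import-Module ActiveDirectory

# Show the current holders of the two forest-wide and three domain-wide roles.
Get-ADForest | Select-Object SchemaMaster, DomainNamingMaster
Get-ADDomain | Select-Object PDCEmulator, RIDMaster, InfrastructureMaster

# Graceful transfer of all five roles to the hypothetical DC named DC01.
# Adding -Force would seize the roles instead; save that for when the
# old role holder is gone for good.
Move-ADDirectoryServerOperationMasterRole -Identity 'DC01' `
    -OperationMasterRole SchemaMaster, DomainNamingMaster, PDCEmulator, RIDMaster, InfrastructureMaster
```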

Now I had one new, fresh shiny 2008 R2 domain controller in the datacenter, with 5 other DCs in 2 other sites that were still on Windows 2008. Still high off my earlier success, I felt like this was a great time to do an in-place upgrade on the rest of those DCs.  One by one, I upgraded them from 2008 to 2008 R2, and you know what? I was actually pleasantly surprised at how smooth and painless it was. My hat goes off to Microsoft for making an in-place upgrade of a production domain controller so seamless.  I just remoted into the server, mounted the 2008 R2 media and hit "Upgrade."  That's all there was to it. The installation took about 30 minutes, and when the machine came back up, it was a pristine 2008 R2 box.

Everything was going great, until I got around to upgrading the last of the 6 DCs. That sixth DC was the DHCP server at the office. I followed the same procedure that had gone off without a hitch 5 times already. Confident that all was well, I set off to browse reddit while the DC upgrade finished. I was just thinking about how the upgrade was going kind of slow, when suddenly my Internet connection dropped. At the exact same instant, the phone rang.

Me: "Hello?"
Boss: "What did you do?"
Me: "Wha, I don't, I ..."
Boss: "RUN to the server room and set up a new DHCP scope. Now."

The entire office had just lost their Internet connections. In the intervening months, I had completely forgotten that we had set the DHCP lease duration on the domain controller to... oh, just a hair shorter than the amount of time it takes to complete an in-place upgrade on that domain controller. Had the lease time been just 5 minutes longer, the DC/DHCP server would have finished the upgrade and been back up serving renewals as if nothing had ever happened. By the time I had gotten the replacement scope set up on the other DC and done the dance of de-authorizing the first DC as a DHCP server in Active Directory, and authorizing the new one, the original DC was back up and ready.
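
To put some numbers on it: a DHCP client normally tries to renew once half of its lease has elapsed (T1) and starts broadcasting for any server at 87.5% (T2), which are the defaults from RFC 2131. So at any given moment, each client has somewhere between half a lease and a full lease left. Here's a rough model of that. The 35-minute outage is an assumed figure for the upgrade, the function names are invented for this example, and the model ignores client retry behavior.

```powershell
# Default DHCP timers as fractions of the lease: T1 = 50%, T2 = 87.5%.
function Get-DhcpLeaseTimers {
    param([timespan]$LeaseDuration)
    [pscustomobject]@{
        Renew  = [timespan]::FromTicks([long]($LeaseDuration.Ticks * 0.5))
        Rebind = [timespan]::FromTicks([long]($LeaseDuration.Ticks * 0.875))
        Expire = $LeaseDuration
    }
}

# Classify an outage, assuming every client had been renewing on time at T1,
# so each one has between half a lease and a full lease remaining.
function Get-OutageImpact {
    param([timespan]$LeaseDuration, [timespan]$Outage)
    $half = [timespan]::FromTicks([long]($LeaseDuration.Ticks / 2))
    if     ($Outage -ge $LeaseDuration) { 'all clients lose their lease' }
    elseif ($Outage -gt $half)          { 'some clients lose their lease' }
    else                                { 'no client loses its lease' }
}

# 30-minute lease: renew at 15:00, rebind at 26:15, expire at 30:00.
Get-DhcpLeaseTimers -LeaseDuration (New-TimeSpan -Minutes 30)

# Assumed 35-minute upgrade against the 30-minute lease, then the 8-day default.
Get-OutageImpact -LeaseDuration (New-TimeSpan -Minutes 30) -Outage (New-TimeSpan -Minutes 35)
Get-OutageImpact -LeaseDuration (New-TimeSpan -Days 8) -Outage (New-TimeSpan -Minutes 35)
```

With the 30-minute lease the first call reports that all clients lose their lease; with the 8-day default the same outage would have touched nobody.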

After about 30 minutes of repairing the damage, I sheepishly emerged from the server room, to the sound of ironic applause coming from all the employees in their cubicles.

myotherpcisacloud.com Visio - Work in Progress

Here's a Visio diagram of my current infrastructure.  You're reading this webpage right now from it.  It's admittedly not very good as far as Visio diagrams go, which is why I'm calling it a work in progress.

Rack.vsd (217.50 kb)

If you don't have Visio installed, you can get a free viewer off the web.

Edit: Contributor Sam Powers converted it to PDF for me here:

MOPCIAC.pdf (145.61 kb)

More Pictures with AD: thumbnailPhoto

In an earlier post I showed you how to automatically update the default "user tile" or logon picture in your domain and force everyone to use it via GPO.

Every user account object in Active Directory has a thumbnailPhoto attribute.  This attribute isn't entirely simple to modify.  That may be because Microsoft discourages AD administrators from bloating their Active Directory databases with pictures, but that's just a guess.  By default, users do have the permission to edit this attribute on their own user account.

The thumbnailPhoto attribute isn't shown on a user's Start Menu or on the logon prompt when you try to RDP to a server.  The thumbnailPhoto attribute is only used by certain applications, such as Outlook 2010 or OCS/Lync.  Each new picture that is added to a user account will increase the size of the Active Directory database and will have to be replicated.  So keep your pictures as small as possible.

The dimensions, format and filesize of the picture you want to use are not very strict.  I was able to use both a .bmp and a .jpg with no issues.  Of course in the end I prefer the .jpg because it's much smaller in size.  An easy way to modify this attribute is through PowerShell.  The nice thing about this method is that you could do it programmatically through a script, making large batch operations easier.

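Import-Module ActiveDirectory
# Read the picture as a raw byte array (Windows PowerShell syntax).
$photo = [byte[]](Get-Content C:\photo.jpg -Encoding byte)
# Overwrite thumbnailPhoto on the account named "user".
Set-ADUser user -Replace @{thumbnailPhoto=$photo}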

You need the Active Directory module for PowerShell, which comes along with the RSAT.  What the script is doing is first reading the .jpg picture into an array of bytes and storing it in a variable, and then replacing the thumbnailPhoto attribute of the specified user's account object with that byte array.
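
For a batch run, a loop along these lines might do the trick. It's a sketch under a few assumptions: a hypothetical C:\Photos folder holding one .jpg per user, each named after the user's logon name, and a 100 KB size budget that I picked to keep the database lean. It reads the files with [System.IO.File]::ReadAllBytes because Get-Content -Encoding byte is Windows PowerShell syntax; PowerShell 6 and later use -AsByteStream instead.

```powershell
Import-Module ActiveDirectory

$photoDir = 'C:\Photos'   # hypothetical folder: one <logon name>.jpg per user
$maxBytes = 100KB         # assumed per-picture size budget

Get-ChildItem -Path $photoDir -Filter *.jpg | ForEach-Object {
    if ($_.Length -gt $maxBytes) {
        Write-Warning "Skipping $($_.Name): $($_.Length) bytes is over budget"
        return   # inside ForEach-Object this moves on to the next file
    }

    # Read the raw bytes; the file's base name is assumed to be the logon name.
    $photo = [System.IO.File]::ReadAllBytes($_.FullName)
    Set-ADUser -Identity $_.BaseName -Replace @{thumbnailPhoto = $photo}
}
```

Try it against a test account or two first, since every picture written this way has to replicate to all of your DCs.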

Now just let that data replicate and you will soon see your new picture in Outlook, Office Communicator, SharePoint, etc.!

Thanks to Oddvar for the hint.