Windows XP, RDC7, Trusted Publishers, and You

Someone asked me for some help yesterday with a problem they were having at work. At their company they use Windows XP workstations, a 2003 Active Directory infrastructure, and *.rdp files that the employees use to establish remote connections to other servers. XP was pretty nice when it came out, but today it's old and just not exciting anymore. Same goes for Server 2003. I mean, Windows 7 and 2008 R2 are both several years old by now and definitely proven technologies... but still, upgrading to a modern OS seems to be at the bottom of almost every company's list. Desktop admins around the globe are still puttering about supporting employees on WinXP, and server admins all over the world are still logging on to Server 2003 (or worse!) servers.

With the release of Windows 7 came the new Remote Desktop Client 7, which adds some nice new features and supports the new and interesting Group Policies that come with a 2008+ Active Directory. One such Group Policy is "Specify SHA1 thumbprints of certificates representing trusted .rdp publishers." Enabling this setting allows you, as the administrator, to specify a list of SHA1 hashes representing certificates from publishers you consider trusted. When the recipient of this policy launches an *.rdp file that is signed by a trusted certificate whose hash is on the list, the user will not get prompted with a warning. When you locate this setting in the (2008 and above) GPO editor, it plainly states that this policy is for "Windows Vista SP1 and above." The thing is, you can install RDC7 on Windows XP.

Here's the rest of the detail on the GPO settings from Technet: http://technet.microsoft.com/en-us/library/cc771261(WS.10).aspx.

Furthermore, a signed *.rdp file will have these two lines at the end:

signscope:s:Full Address,Server
signature:s:THISISANSHA1THUMBPRINT
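
For what it's worth, signed *.rdp files like that are typically produced with Microsoft's rdpsign.exe tool. A rough sketch of how you'd sign one - the thumbprint and filename below are just placeholders:

:: Sign an .rdp file with a certificate from the local certificate store,
:: identified by its SHA1 thumbprint (placeholder value shown here).
rdpsign /sha1 0123456789ABCDEF0123456789ABCDEF01234567 connection.rdp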

The problem is that the aforementioned Group Policy setting doesn't exist on 2003 Domain Controllers.

Nevertheless, the effect of the newer 2008 policy should still be achievable, since we've installed the new RDC7 client on the Windows XP machines. In theory. We just have to figure out how to deploy it. As it turns out, you can just navigate to this registry key on the client:

HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client\PublisherBypassList

In Windows XP the PublisherBypassList key might not exist. Create it! Your SHA1 hashes go there as 32-bit DWORD values - no spaces, all caps. (This can be done in either HKLM or HKCU; the hashes in HKCU are simply added to the ones loaded from HKLM, just like the description of the GPO setting says.)
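
If you'd rather script it than click through regedit, a minimal sketch with reg.exe - the thumbprint is a placeholder, and the DWORD data of 1 is an assumption on my part (the value name is the part that matters):

:: Create the key if needed and add a trusted publisher thumbprint under HKCU.
:: Thumbprint below is a placeholder; the data value of 1 is an assumption.
reg add "HKCU\Software\Microsoft\Terminal Server Client\PublisherBypassList" /v 0123456789ABCDEF0123456789ABCDEF01234567 /t REG_DWORD /d 1 /f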

So even though you don't have that GPO setting in Server 2003 like you do in 2008, you can push generic registry modifications such as this out to clients, thereby achieving the same effect.

And it works!

Replacing Task Manager with Process Explorer

Alright, everyone's back from the holidays, and I for one am ready to get my nose back to the grindstone!

In this post, I want to talk about a fairly recent discovery for me: Mark Russinovich's Process Explorer, not to be confused with his Process Monitor. Process Explorer has been around for years and is still being kept current, so the fact that I had never really used it before now is a bit of an embarrassment for me. But I'm a total convert now and won't go without it from here on out. Hopefully I'll be able to convert someone else with this post.

First, there were two videos that I watched recently that were instrumental in convincing me to keep Process Explorer permanently in my toolbox. It's a two-part series of talks given by Russinovich himself about the "Mysteries of Windows Memory Management." The videos are a little on the technical side, but they're extremely detailed and in-depth, and if you're interested in hearing one of the top NT gurus in the world explicate the finer intricacies of how Windows uses physical and virtual memory, then you need to watch them. They're quite long, so you may want to save them for later:

Part 1
Part 2

One of the prevailing themes in the videos is that Russinovich doesn't seem to care much for the traditional Task Manager. We all know and love taskmgr and the three-fingered salute required to bring it up. (The three-fingered salute, CTRL+ALT+DEL, is officially referred to as the Secure Attention Sequence. Some free trivia for you.) He explains how some of the labels in Task Manager - especially the ones concerning memory usage - are a bit misleading and/or inaccurate. (What is memory free versus memory available?) He then shows us how he uses Process Explorer in lieu of Task Manager, which gives us a much clearer and more accurate (and cooler looking) picture of all the processes running on the machine, the memory that they're using, the ways in which they're using it, the handles, DLLs and files the processes are using, and so much more.

It's basically better than the regular Windows Task Manager in every way... and the best part? You can easily "replace" Task Manager with it such that when you hit Ctrl+Alt+Del and choose to bring up the "Task Manager," Process Explorer actually launches instead!
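
Process Explorer has an Options > Replace Task Manager menu item that flips this on for you, and under the hood it amounts to an Image File Execution Options debugger redirect. A rough equivalent done by hand, assuming you keep procexp.exe in C:\Tools (that path is my assumption):

:: Redirect taskmgr.exe launches to Process Explorer via Image File Execution Options.
:: The path to procexp.exe is an assumption - point it wherever you keep the tool.
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\taskmgr.exe" /v Debugger /t REG_SZ /d "C:\Tools\procexp.exe" /f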

Awesome, right? Process Explorer provides an enormous wealth of information where the vanilla Task Manager falls short. Part of me wants to post more screenshots of this program to show you more examples of what you can see and do with Process Explorer, but those videos by Russinovich himself do a better job than I can of showing off exactly how the program works and what it all means. In the videos, you'll learn what a Working Set is, what Private Bytes and Bytes Committed are, what a Hard Fault is and how it differs from a Soft Fault, and so on.

As an added bonus, you can use this tool to troubleshoot the age-old conundrum of "what process is holding this file open so that I'm unable to delete it! Waaah!"

Needless to say, if you ever hit Ctrl+Alt+Del on one of my machines and choose Start Task Manager, Process Explorer is going to show up instead.

The Page File

How tired is this topic? I don't want to be "reheating boring leftovers," as Ned puts it, but maybe this post will help finally put it to bed for anyone still wondering.

Ever since I became an "NT geek" a little over 15 years ago, there has seemingly always been so much mystery and intrigue shrouding "the page file," also known as the "swap file." Windows does not really do "swapping" anymore, so I will only refer to it as a page file from here on out. Still to this day, in 2011, people commonly ask me what size their page file should be. "Does my page file need to be this big? What happens if I shrink it? Can I put it on another logical drive? Can I disable it?" And naturally, the Web can be a cesspool of misinformation, which only serves to further confuse people and cause them to rely on "rules of thumb" with no real reasoning behind them. To add to the mess, the exact ways in which Windows uses the page file have changed slightly over the years as Windows has evolved and the average amount of RAM in our machines has increased. (*cough* Superfetch *cough*)

I'm focusing on Windows Vista and later here. (Win7, Server 2008, R2, etc.)

Just follow the "Advanced"s to get there (System Properties > Advanced > Performance Settings > Advanced > Virtual memory > Change).


First I want to clear something up: forget any "rules" you have ever heard about how the page file should be 1.5x or 2.0x or 3.14159x the amount of RAM you have. Any such formula is basically useless and doesn't scale to modern systems with wildly different amounts of RAM. Get your vestigial 20th-century old wives' tales out of my Windows.

Alright, sorry about that. I'm just really tired of hearing those rules of thumb about page file sizing. The only part that sort of embarrasses me is that Windows itself still uses a formula like this if you let it manage the page file for you. Older versions of Windows use this formula to choose the page file size:

System Memory    Minimum Page File    Maximum Page File
< 1 GB           1.5 * RAM            3 * RAM
>= 1 GB          1 * RAM              3 * RAM

Windows 7 and 2008 R2 set the paging file size to the amount of RAM + 300MB.

Furthermore, the page file is dynamic by default and can expand on demand. And for the most part that still works just fine for the average home PC user.  But if you have a server with 192GB of RAM, Windows will by default create for you a nice, fat 192GB page file.  Our "rule of thumb" now looks absurd.

No, you do not need (or want) a 192GB page file gobbling up space on your nice 15k SAS drives. The only reason you might ever want such a huge page file is if you're interested in generating full crash dumps (full memory dumps) when the machine crashes. (You need a page file that is the size of RAM plus a couple hundred megabytes for that.) In later versions of Windows you can use a dedicated dump file to store memory dumps, further decreasing the need for a huge page file. Also, you'll still get minidumps that can contain useful information about system crashes even if your page file is too small to support a full memory dump.
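
If you do go the dedicated dump file route, it's just a couple of registry values under the CrashControl key; a sketch, where the path and the size (in MB) are example values of mine, not recommendations:

:: Point crash dumps at a dedicated dump file instead of relying on the page file.
:: Path and size (in MB) are example values - adjust for your own system.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v DedicatedDumpFile /t REG_SZ /d "D:\DedicatedDumpFile.sys" /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v DumpFileSize /t REG_DWORD /d 8192 /f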

Other types of crash dumps are explained here.

The truth is the more RAM you have, the less page file you need - with a couple of stipulations.

The answer to "how big should the page file be?" is "just big enough that you don't run out of committable memory." The amount of memory that your system can commit is equal to your RAM plus your page file. You can put your system under a heavy workload and use Performance Monitor to see how much memory you're actually committing. You can also use Process Explorer by Mark Russinovich and watch the peak commit charge. Just make sure that you have enough RAM + page file to support that peak usage at all times; otherwise your page file will be forced to expand if it can, and if it can't expand, your application might hang, or crash, or any number of systemic weirdnesses will occur.
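
If you'd rather watch those numbers from a command line than from the Performance Monitor GUI, the same counters are exposed there too; a quick sketch using typeperf:

:: Sample the system-wide commit charge and commit limit every 5 seconds.
:: When Committed Bytes starts crowding the Commit Limit, you're running out of headroom.
typeperf "\Memory\Committed Bytes" "\Memory\Commit Limit" -si 5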

Unfortunately, Superfetch complicates this test because it goes around in the background, pre-loading things into memory from disk all the time, so it'll always make it seem like you're running low on memory. In reality, a lot of that is Superfetch putting your memory to work, and it's actually really good at reducing disk seeks on PCs. But Superfetch is turned off by default if Windows was installed on an SSD (because seek times are near zero on an SSD anyway), and it's also disabled in Windows Server.

Also keep in mind that certain applications such as MSSQL and LSASS have their own memory management systems that can operate outside the purview of the NT Memory Management system, which can lead to things like those applications hogging up more than their fair share of memory.


Process Explorer by Mark Russinovich

On the other hand, don't disable the page file completely, even if you have 192GB of RAM. (You should be able to move it off of the system drive with no ill effects, though, unless you're dealing with a very poorly written application.) Windows can run perfectly fine with no page file if you have plenty of RAM. However, some applications may not. Some third-party applications, and even some Microsoft products like AD DS and SQL, may simply assume that there is a page file, and if there isn't one, you can get unpredictable behavior. (Read: hard-to-troubleshoot headaches.)

On the other other hand, keep in mind that having a huge page file (like 192GB) will affect system performance if your system is actually having to use that page file a lot. Just having a large file on your disk won't affect performance by virtue of it being there, but if your system has to constantly manipulate that file by pushing memory pages out to it, it will. (And if you find yourself pushing memory pages to a huge file on disk very often, you obviously needed more RAM a long time ago.)

Lastly - yes, there are some "page file optimization" techniques that still apply, such as striping the page file across multiple spindles, setting a static size to eliminate file system fragmentation, etc. However, with RAM being as inexpensive as it is these days, your main concern should be minimizing how often you have to touch the page file at all.
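
For the record, you can script the static-size setup instead of clicking through the System Properties dialogs; a sketch using wmic, where the 4096 MB figure is just an example value and not a recommendation:

:: Turn off automatic page file management, then pin C:\pagefile.sys to a static size.
:: The 4096 MB initial/maximum sizes are example values only.
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=4096,MaximumSize=4096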

NT Debugging

I'm not talking about the NT Debugging blog.  This is one of my personal experiences with NT debugging.

A couple weeks ago, I was looking at a Windows VM that was apparently crashing on a somewhat regular basis. Through the usual log file analysis techniques we can get some correlations and some probable causes. In this particular case it was plainly evident that the system was working perfectly until some 3rd party software was loaded. Then the regular unexpected shutdowns began, about once every day or two.

The correlation was found through the use of the Reliability and Performance Monitor, which is a very nifty tool.

A "stop," or "bugcheck" we are all familiar with.  It produces a memory dump file in %SystemRoot%\MEMORY.DMP and other "minidumps" in %SystemRoot%\Minidump\ unless otherwise configured.  It's pretty much, well, a dump of everything that the system had in memory when the offense took place.

But do we really know what to do with an NT memory dump?  I have to say I didn't really, and I was a little embarrassed about it.  So I set out to figure out what useful information I could really glean from that memory dump. Having that extra bit of tenacity to really dig down deep and identify with greater precision what the root of the problem is, rather than just saying "well it's some sort of software compatibility problem, better reformat the hard drive!" can help you out in your quest to be the office guru. 

Well it turns out there's a nice utility called Windbg.exe. You can get it from the Windows SDK.  To effectively debug an application, you need debugging symbols. Fortunately, Microsoft has provided some public debugging symbols on their own symbol server.  I hear that Microsoft also has a private symbol tree for internal use only, but we'll just have to settle for the public ones.

Here's a KB that will help you get Windbg and your symbol path set up correctly.

Now that you have that configured, simply drag the memory dump file into the Windbg window, and it will tell you with much greater certainty exactly what driver and/or interaction caused the BSOD.
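
For reference, the setup boils down to pointing the debugger at Microsoft's public symbol server and opening the dump; a sketch, with C:\Symbols as my assumed local cache path:

:: Point the debuggers at Microsoft's public symbol server, caching symbols locally to C:\Symbols.
setx _NT_SYMBOL_PATH "srv*C:\Symbols*http://msdl.microsoft.com/download/symbols"
:: Open the crash dump directly; once it loads, running "!analyze -v" in the
:: command window does most of the heavy lifting.
windbg -z %SystemRoot%\MEMORY.DMP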

One of the interesting things that Windbg can reveal is that sometimes drivers installed by crashy software still get loaded even after the software has been uninstalled. And if all that machine-code-looking stuff seems scary, Windbg also outputs the simple line "Probably caused by: driver.sys," which can at least give you a lead.

There are also other dump file analyzers, such as WhoCrashed, that may be more to your liking.

And lastly, be careful about sharing your memory dumps, as they might contain password hashes.

Back When I Was Young and Foolish (Part I of Oh-So-Many)

I'd like to do a little reminiscing now, of a time when I was young and foolish. I anticipate many such posts, which I guess means I spend a lot of time being young and foolish.

It was my first real IT job, and it'd be fair to call me a sysadmin, though I was basically just an assistant to the lead architect. I was doing things like setting up backup schedules in DPM and creating user accounts in AD. Just the usual stuff you'd expect a guy first getting started in IT to be doing.

One day, my boss and I were discussing one of the many issues caused by being in the middle of a domain migration: we currently had employees working in two different Windows domains simultaneously. For various reasons, there would not and could not be a trust relationship established between the two. To further complicate matters, we now needed two separate ISA (now known as TMG) servers at the office, one for each domain, so as to keep the employees off of Facebook and ESPN.com. One might suggest that this was a social problem that should have been dealt with by the employees' managers, but the managers were among the worst offenders. But I digress.

Now, since both of these domains were on the same subnet, we could really only have one DHCP server. So with only one DHCP server, how were we going to push out two different sets of options - such as which proxy server to use - to the members of either domain?

We settled on using two different DHCP classes.
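
The gist, assuming we used DHCP user classes (as opposed to vendor classes): you define a class and its set of options on the DHCP server, then tag each client with the matching class ID. The client-side half looks something like this, where the adapter name is the XP-era default and the class name is made up for illustration:

:: Tag this client as a member of the "DomainA-Proxy" user class (name is made up).
:: The DHCP server can then hand out a different set of options to members of this class.
ipconfig /setclassid "Local Area Connection" DomainA-Proxy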

So while we were fiddling around with the DHCP server, for no good reason we decided that this was a great time to screw around with the basic settings of a DHCP server that had been serving us faithfully with no problems for years.

What is a good DHCP lease duration, really? Is it the default of 8 days? Well, I suppose that depends on a lot of factors, such as how mobile your DHCP clients are, how often you expect them to be connecting to and disconnecting from the network, how many DHCP addresses you have to distribute, etc. But I've already put more thought into it just now than we did on that day, for we almost completely arbitrarily set the DHCP lease duration to 30 minutes.

Fast forward a few months.

Everything had been working great, and the office DHCP server, as DHCP servers should be, was all but forgotten about. I had just been offered a much higher-paying job with a much larger IT company, so I was now a short-timer at this job. Then, out of the blue, our primary domain controller (the one hosting all the FSMOs) at our remote datacenter goes dark. I can still iLO into it, but both of its network adapters are just gone. No error messages in the logs, no sign that Windows had ever known that it once had network connections, nothing. It just looked like it had never had network adapters in the first place. (This still perturbs me to this day. Why wasn't there an event in either the Windows logs or in the IML that said "Uhh dude, where did your network adapters go!?")

After seizing the FSMO roles on the backup DC, we took a trip out to the datacenter to have a look. Sure enough, the link lights on the physical hardware were out, and upon rebooting, the BIOS of the machine didn't even recognize that it had ever had network adapters in it. So I had HP come out and replace the motherboard, which fixed the issue. I don't know what the point of having redundant NICs is if they're wired to both blow at the same time, but I digress again...

So now I had essentially a brand new server out of what used to be our primary domain controller. The question my boss wanted answered was: how do we restore a former DC after its FSMO roles have been seized? My answer: you don't. You never bring it back. Ever. If I don't wipe the hard drive right now, it still thinks it owns those FSMO roles, and the only way for it to learn that it no longer owns them is to connect it back to the domain network and let the KCC do its magic. What's going to happen in that span of time when we have two DCs in the domain that both simultaneously think they own all the FSMOs? I didn't want to find out. So I argued that we just rebuild it, and I eventually got my way.

Now since I had to rebuild the machine, I figured now was as good a time as any to upgrade it from Windows 2008 to Windows 2008 R2. Having spent the last several months studying my ass off for the MCITP tests and passing them, I was feeling pretty self-confident. I took the freshly rebuilt server back out to the datacenter. I re-cabled it and re-racked it, and powered it on. I grabbed the 2008 R2 version of adprep.exe and on the other domain controller, used it to run adprep /forestprep followed by adprep /domainprep. The domain was now ready to accept a 2008 R2 domain controller. I dcpromo'ed the new machine, and it went without a hitch. I then gracefully transferred the FSMOs back from the other DC. (I didn't alter the actual functional levels of the forest or domain.)
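
For anyone keeping score at home, the whole dance amounted to something like this - run from the 2008 R2 media on an existing DC, then on the rebuilt box (D: is just where my media happened to be mounted):

:: Run from the 2008 R2 media on an existing domain controller.
D:\support\adprep\adprep.exe /forestprep
D:\support\adprep\adprep.exe /domainprep
:: Then, on the freshly rebuilt server, promote it back into the domain.
dcpromo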

Now I had one new, fresh shiny 2008 R2 domain controller in the datacenter, with 5 other DCs in 2 other sites that were still on Windows 2008. Still high off my earlier success, I felt like this was a great time to do an in-place upgrade on the rest of those DCs.  One by one, I upgraded them from 2008 to 2008 R2, and you know what? I was actually pleasantly surprised at how smooth and painless it was. My hat goes off to Microsoft for making an in-place upgrade of a production domain controller so seamless.  I just remoted into the server, mounted the 2008 R2 media and hit "Upgrade."  That's all there was to it. The installation took about 30 minutes, and when the machine came back up, it was a pristine 2008 R2 box.

Everything was going great, until I got around to upgrading the last of the 6 DCs. That sixth DC was the DHCP server at the office. I followed the same procedure that had gone off without a hitch 5 times already. Confident that all was well, I set off to browse reddit while the DC upgrade finished. I was just thinking about how the upgrade was going kind of slow, when suddenly my Internet connection dropped. At the exact same instant, the phone rang.

Me: "Hello?"
Boss: "What did you do?"
Me: "Wha, I don't, I ..."
Boss: "RUN to the server room and set up a new DHCP scope. Now."

The entire office had just lost their Internet connections. In the intervening months, I had completely forgotten that we had set the DHCP lease duration on the domain controller to... oh, just a hair shorter than the amount of time it takes to complete an in-place upgrade on that domain controller. Had the lease time been just 5 minutes longer, the DC/DHCP server would have finished the upgrade and been back up serving renewals as if nothing had ever happened. By the time I had gotten the replacement scope set up on the other DC and done the dance of de-authorizing the first DC as a DHCP server in Active Directory and authorizing the new one, the original DC was back up and ready.

After about 30 minutes of repairing the damage, I sheepishly emerged from the server room, to the sound of ironic applause coming from all the employees in their cubicles.