So I finally solved a bit of a mystery, after wondering about it off and on for about six months.
A lot of us out there use some sort of monitoring program like SCOM that gathers performance counter data from all of the servers in our environments, and uses that data to produce messages such as:
But every once in a while, if for some reason the performance counters needed are unavailable, your monitoring application might say something like this:
Well, that happens to me somewhat frequently. Sometimes it's an error in the monitoring application, and sometimes the agent machine needs a lodctr /R or winmgmt /resyncperf or something like that. But every once in a while it's more difficult to solve than that. I logged on to SERVER01 to see if I could simply add the needed performance counters to Perfmon. The objects and instances were there, but one of them looked odd. Then I decided to use typeperf.exe to get some data from the counters:
A counter with a negative denominator value was detected? Now, contrary to that screenshot above, the -1's were intermittent. They would change to normal values for a few seconds, and then go back to negative ones. It was as if the scale of that counter was offset: whenever real CPU load occurred, the counter would climb to positive values, but as CPU usage dropped back down toward zero, the counter would fall below zero again.
Hmm... if you do a search for something like "missing performance counters," or "a counter with a negative denominator value," you'll get plenty of good links, as missing or corrupt performance counters are a somewhat common issue. Such as this, or this. Notice in that last link that there is a vague reference to it being caused by "intermittent timing issues in the performance data handlers for many counters." The article fails, however, to offer any advice or anything more concrete than that. lodctr /R and winmgmt /resyncperf and all that usual stuff did nothing to help.
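Since the negative readings come and go, they are easy to miss if you are just watching the console. Here is a rough Python sketch of how you might scan a typeperf log for them. It assumes you saved the samples to CSV with something like `typeperf "\Processor(_Total)\% Processor Time" -si 1 -sc 60 -o cpu.csv`, and that the file has the timestamp in the first column and one column per counter. The counter path, the file name, the function name `find_negative_samples`, and the sample rows below are all made up for illustration; they are not taken from the server in this story.

```python
import csv
import io


def find_negative_samples(csv_text):
    """Return (timestamp, counter, value) for every sample below zero.

    Assumes typeperf-style CSV: a header row whose first cell describes
    the log format and whose remaining cells are counter paths, followed
    by rows of timestamp plus one value per counter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = header[1:]
    hits = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        timestamp = row[0]
        for name, raw in zip(counters, row[1:]):
            try:
                value = float(raw)
            except ValueError:
                continue  # blank or non-numeric cell, nothing to compare
            if value < 0:
                hits.append((timestamp, name, value))
    return hits


# Hypothetical sample data, shaped like a typeperf CSV log.
SAMPLE = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\SERVER01\Processor(_Total)\% Processor Time"
"01/15/2012 10:00:01.000","12.500000"
"01/15/2012 10:00:02.000","-1"
"01/15/2012 10:00:03.000","-1"
"01/15/2012 10:00:04.000","3.000000"
'''

for timestamp, name, value in find_negative_samples(SAMPLE):
    print(timestamp, name, value)
```

To run it against a real log, read the file's contents and pass the text in, for example `find_negative_samples(open("cpu.csv").read())`.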
So I sat back and thought some more about the scope of the problem. I have only seen this issue two or three times in my career. The two cases I could remember both seemed to be on VMware virtual machines. One of them was a Windows 2003 operating system, and then a couple months later I saw it again on a Windows 2008 R2 VM. That phrase from that one KB stuck with me... "intermittent timing issues," and that made me think of hardware timing issues. But since these were VMs, they didn't exactly have hardware per se... but maybe it had something to do with the way in which the virtual machines communicate with their physical host. I was so close to figuring out the answer, but at the time I shrugged it off and shelved the issue in my mind for a while.
Then about a week ago the issue popped up again. Long story short, it turns out it was this all along:
Unchecking the "Time synchronization between the virtual machine and the ESX server" option in the VMware Tools app on the VM made the problem go away. Ugh. The answer had been so simple all along, yet it eluded me for months.
I feel better now that I at least have a solution for that particular scenario. Now... back to Hyper-V.