Windows Server - Performance Counter Guidance

Traditionally we have shied away from noting specific values or thresholds that are indicative of good or bad performance.  One reason for this is because coming up with good values is quite hard to do, and people sometimes see that a particular value is outside of some threshold and become fixated on that being the issue when in reality it may not be.  For example, the Windows NT Resource Kit had a section that stated that a disk queue length greater than two to three times the number of disk spindles was indicative of a performance problem.  When working with SQL Server this is not always true, especially if read ahead activity is driving the disk queue length. Just because there is a queue waiting for IO does not necessarily mean that SQL Server is stalled waiting for the IO to complete.  We have seen disk queue lengths up in the 20-30 range (on much fewer than 10 disks) where SQL Server performance was just fine.

However, in the absence of specific values people sometimes look at Performance Monitor data and fail to spot interesting trends.  So despite the hesitation to provide any specific values, we will attempt to note some thresholds where a given counter starts to attract the interest of those who have been working some of these cases over the past several years.  Hopefully this will provide you some guidance in things that stand out.  But ultimately, you will need to look for other counters that also seem out of range so that you can start developing a pattern of indicators that point to a particular type of problem.

It should be fairly easy for you to visually identify a counter whose value changed substantially during a problematic time period.  Quite often you will find that there are many counters that changed significantly.  With a blocking problem, for example, you might see user connections, lock waits and lock wait time all increase while batch requests/sec decreases.  If you focused solely on a particular counter (or a few counters) you might come to some very different conclusions about what the problem is, and you could very likely be wrong.  Some of the changes in counter values are the cause of the original problem, whereas others are just side affects from that problem. 

In the ideal situation, the change in the counters that indicate the cause of the problem should lead the counters showing the affect, but due to the granularity used to capture Performance Monitor data some of these distinctions can be lost.  If you collect data once every 15 seconds and the problem was of quick onset, it can be hard to figure out if user connections went up first and then lock timeouts, or vice versa.  This is where you have to use other available information, such as other performance counters, the customer’s description of the problem, etc, to form a theory as to what you think may be wrong and then look for other supporting data to prove or disprove your theory.

 

Performance Non-Disk Counters

Object

Counter

Preferred Value

Description

Memory

Available Mbytes

> 100MB

Available MBytes is the amount of physical memory available to processes running on the computer, in Megabytes.

Memory

Pages Input/Sec

< 10

Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. See KB 889654.

Memory

Pages/Sec

See Description

Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. Investigate if over 100 pages per second on a system with a slow disk, usually even 500 pages per second on a system with a fast disk subsystem may not be an issue. 

Note:

·          Values of >20 pages that appear in many other sources of documentation are out of date.

·          A high value for the Memory: Pages/sec counter does not necessarily indicate memory pressure or a System Monitor reporting error. To gain an accurate reading of your system, you must also monitor other counters (Pages Input/Sec, %Usage, %Usage Peak). See KB 889654.

Paging File

%Usage

< 70%

The amount of the Page File instance in use in percent. See KB 889654.

Paging File

%Usage Peak

< 70%

The peak usage of the Page File instance in percent. See KB 889654.

Process (sqlservr)

%Processor Time

< 80%

% Processor Time is the percentage of elapsed time that all of process threads used the processor to execution instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions are included in this count.

Process (msmdsrv)

%Processor Time

< 80%

For the SSAS.

80% is for a server which is dedicated to SSAS.

Processor

%Privileged Time

< 30% of Total %Processor Time

% Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode. 

Processor

%Processor Time

< 80%

% Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread.

System

Processor Queue Length

< 4 per CPU

For standard servers with long Quantums
<= 4 per CPU Excellent
< 8 per CPU Good
< 12 per CPU  Fair

Performance Disk Counters

When the data files are places on a SAN ignore the following!! Use the performance tools provided by the SAN vendor instead

Object

Counter

Preferred Value

Description

PhysicalDisk

Avg. Disk Sec/Read   

< 8ms

Measure of disk latency. Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
More Info:
Reads
Excellent <  08 Msec    ( .008 seconds )
Good  <  12 Msec    ( .012 seconds )
Fair  <  20 Msec    ( .020 seconds )
Poor  >  20 Msec    ( .020 seconds )

PhysicalDisk

Avg. Disk sec/Write

< 8ms (non cached)

 

< 1ms (cached)

Measure of disk latency. Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.

Non cached Writes
Excellent <  08 Msec    ( .008 seconds )
Good  <  12 Msec    ( .012 seconds )
Fair  <  20 Msec    ( .020 seconds )
Poor  >  20 Msec    ( .020 seconds )

Cached Writes Only
Excellent <  01 Msec    ( .001 seconds )
Good  <  02 Msec    ( .002 seconds )
Fair  <  04 Msec    ( .004 seconds )
Poor  >  04 Msec    ( .004 seconds

 

Performance Counters for Windows Server                                                                                     

( SQL Server Premier Field Engineers in Microsoft UK )