Creating a baseline

 

This document introduces the subject of trending and monitoring for performance ( as against alerts ).

If your server performance degrades you need to be able to compare.

 

Ø  If you cannot do this then you are on the back foot straight away and you may well hear such comments as “ So why do we employ a DBA then?” or worse still  “Maybe we should look at replacing SQL Server with Oracle”

Ø  This document forms the basis of the statistic gathering I introduced in SQL 2000, it’s not particularly rocket science or even very exciting, it even manifests itself as management reporting, some examples below, however if you’re supporting critical business systems and you get in the firing line when things go bad, then you might find this type of process rather essential.

 

 

Month

Average Transactions / Second

Total

Sales

WorkFlow

Orders

Tempdb

October 2006

238

106

3

22

97

November 2006

201

80

2

23

86

December 2006

134

53

2

14

57

January 2007

269

165

4

22

62

February 2007

179

85

3

16

36

March 2007

139

75

3

16

30

April 2007

156

76

4

20

39

May 2007

180

83

4

21

53

June 2007

175

84

4

20

47

July 2007

159

74

4

20

43

August 2007

148

66

3

20

41

September 2007

153

65

3

25

41

October 2007

207

101

4

29

53

 

 

Date ( 1st  of )

Total Data Size

Data Growth for the month

Projected 12 month size

Jun 2006

34,721.09

 

 

Jul 2006

38,869.29

     4,148 

168,000 Mb

Aug 2006

50,199.05

                      11,329.76

212,000 Mb

Sep 2006

59,027.62

                   8,828.57

225,600 Mb

Oct 2006

74,100.98

            15,073.36          

285,000 Mb

Nov 2006

82,460.33

8,359.00

200,000 Mb

Dec 2006

91,546.58

9,087.00

212,000 Mb

Jan 2007

91,353.23

-193.35

 

Feb 2007

96,771.10

5,417.87

 

Mar 2007

122,255.81

25,484.71

 

Apr 2007

132,750.58

10,494.77

 

May 2007

127,405.57

-5,345.01

 

Jun 2007

140,451.88

13,046.31

 

Jul 2007

151,522.62

11,070.74

300,000 Mb

Aug 2007

158,673.16

 

 

Sep 2007

168,236.72

 

 

Oct 2007

179,723.10

11,486.38

 

1 Nov 2007

188,745.10

 ( 9,022   -  14,825 )

386,469 Mb

 

 

 

 

 

Server Performance for October

 

Weekday

Date

CacheHitRatio

 ( Avg )

Logins

 ( Avg )

Deadlocks

( Total )

Latchwaits\sec

Lockwaits\sec

Monday

3

63.365883

508

0

112

0.13553031

Tuesday

4

63.633709

541

1

101

0.06068182

Wednesday

5

64.069511

544

0

111

0.15459596

Thursday

6

62.808182

518

0

115

0.29828283

Monday

10

72.500183

459

548

119

0.12861112

Tuesday

11

67.146805

516

3

117

8.1414141E-2

Wednesday

12

65.199837

553

1

118

0.13881312

Thursday

13

64.065308

578

1

135

0.36633837

Friday

14

63.502281

577

1

127

4.6035353E-2

Monday

17

64.474243

492

2

176

9.6565656E-2

Tuesday

18

64.344391

591

1

172

9.7676769E-2

Wednesday

19

64.221291

572

2

163

0.11300505

Thursday

20

64.010361

567

2

163

8.4797978E-2

Friday

21

64.742073

476

2

152

8.7853536E-2

Monday

24

72.102028

506

1

161

4.8813131E-2

Tuesday

25

66.991104

513

3

198

0.1084596

Wednesday

26

65.961151

514

1

260

0.14002526

Thursday

27

65.872841

504

1

257

0.10290404

Friday

28

65.586937

451

0

240

8.5252523E-2

Monday

31

69.859261

493

2

400

9.0075761E-2

 

 

These are just a few examples presented as tables, I usually graph some of the results, this is easier with Reporting Services, the original reports didn’t have Reporting Services available.

 

 

Why?

 

 

 

Collection of data

 

Establish what data to collect

 

Server level

 

 

SQL 2000 Basic Counters available are :-

 

Auto-Param Attempts/sec

AWE unmap calls/sec                         

Backup/Restore Throughput/sec 

Bulk Copy Throughput/sec                

DBCC Logical Scan Bytes/sec         

Failed Auto-Params/sec                 

Free list requests/sec                        

FreeSpace Scans/sec                       

Latch Waits/sec                                   

Lock Timeouts/sec                         

Log Cache Reads/sec                      

Logins/sec                                         

Number of Deadlocks/sec             

Page reads/sec                                 

Pages Allocated/sec                    

Readahead pages/sec                     

Shrink Data Movement Bytes/sec

SQL Re-Compilations/sec                  

Unsafe Auto-Params/sec               

AWE lookup maps/sec       

AWE unmap pages/sec                      

Batch Requests/sec                            

Cache Use Counts/sec                        

 Extent Deallocations/sec              

 Forwarded Records/sec             

Free list stalls/sec                

 Full Scans/sec                               

Lazy writes/sec                    

Lock Waits/sec                                     

 Log Flush Waits/sec                     

 Logouts/sec                                        

 Page Deallocations/sec              

Page Splits/sec                                     

Probe Scans/sec                                

 Safe Auto-Params/sec               

 Skipped Ghosted Records/sec

Table Lock Escalations/sec 

 Workfiles Created/sec                      

AWE stolen maps/sec

AWE write maps/sec

Bulk Copy Rows/sec

Checkpoint pages/sec

Extents Allocated/sec

Free list empty/sec

FreeSpace Page

Fetches/sec

Index Searches/sec

Lock Requests/sec

Log Bytes Flushed/sec

Log Flushes/sec

Mixed page allocations/sec

Page lookups/sec

Page writes/sec

Range Scans/sec

Scan Point Revalidations/sec

SQL Compilations/sec

Transactions/sec

Worktables Created/sec

 

Full details of sql perfmon counters can be found in the SQL Server 2000 Performance Tuning Technical  Reference – Microsoft Press ISBN 0-7356-1270-6 

 

SQL Server 2005 has introduced an extended number of performance counters query the table sys.dm_os_performance_counters   or open perfmon and scroll through the SQL counters

 

 

 

 

 -- SQL 2000

select counter_name ,cntr_value,cast((cntr_value/1024.0)/1024.0 as numeric(8,2)) as Gb

from dbo.sysperfinfo where counter_name like '%server_memory%'

-- SQL 2005

select counter_name ,cntr_value,cast((cntr_value/1024.0)/1024.0 as numeric(8,2)) as Gb

from sys.dm_os_performance_counters where counter_name like '%server_memory%';

 

 

 

 

-- SQL 2000

declare @init numeric(20,2),@final numeric(20,2)

declare @start_time datetime,@final_time datetime,@count numeric(20,2)

--

select @init=cntr_value,@start_time=getdate() from dbo.sysperfinfo where cntr_type=272696320

and counter_name = 'SQL Re-Compilations/sec'

waitfor delay '00:00:10'

select @final=cntr_value,@final_time=getdate() from dbo.sysperfinfo where cntr_type=272696320

and counter_name = 'SQL Re-Compilations/sec'

set @count=datediff(ss,@start_time,@final_time)

select cast ((@final-@init)/@count as numeric(20,2))

--

-- SQL 2005

declare @init numeric(20,2),@final numeric(20,2);

declare @start_time datetime,@final_time datetime,@count numeric(20,2);

--

select @init=cntr_value,@start_time=getdate() from sys.dm_os_performance_counters where cntr_type=272696320

and counter_name = 'SQL Re-Compilations/sec';

waitfor delay '00:00:10';

select @final=cntr_value,@final_time=getdate() from sys.dm_os_performance_counters where cntr_type=272696320

and counter_name = 'SQL Re-Compilations/sec';

set @count=datediff(ss,@start_time,@final_time);

select cast ((@final-@init)/@count as numeric(20,2));

 

 

 

Suggested counters might be:-

 

Procedure Cache Hit Ratio

% age

This value should always be high , >90%, and is an indication of how often cached plans satisfy calls.

For 32bit SQL Server the procedure cache may not exist in extended memory so can be constrained by other memory pressures.

A low cache hit ratio is an indication of a lack of server memory or of less than optimal code.

A low cache hit ratio can also be produced where the volume of  transactions and number of databases is such that the procedure cache cannot support them.

Transactions

Number / sec  Total and for selected databases including tempdb

Transactions/second measure work done within a database and within the server. Transactions only measure changes. This is a measure of database activity. Tempdb should be included as this database can sometimes be a bottleneck. Temporary table creation, sorts, order by, work tables etc. are created in tempdb.

Compilations

Number / sec  total for the server

All work on the SQL Server

Logins

Total

The number of persisted connection to the server. Each connection typically takes resource of around 64k memory. Connection memory cannot come from extended memory and thus unnecessary persisted connections can decrease the available memory for other processes.

Connections

Number / sec

This is the count of users connecting every second, connecting is resource hungry, connection pooling if applicable should minimise this figure, high values for this counter may indicate issues with connection pooling.

Packets  received

Packets sent

Number / sec

Indication of server activity. Can be used to compare against network bandwidth

Reads

Writes

Number / sec

Count of page reads and writes executed by SQL Server. Indication of server activity.

Packet Errors

Count

Should be zero

Read/Write errors

Count

Should be zero

Workfiles

Worktables

Number / sec

Indication of work within tempdb. Workfiles and worktables are created during sorts for example. High levels for these figures may indicate less than optimal sql.

Full Scans

Number / sec

This figure should be zero but rarely is. It indicates the number of full table or index scans performed.

IndexSearches

 

 

PageSplits

Number / sec

Too many page splits are bad, ideally this value should be very low but typically often isn’t. A page split occurs when a new record is inserted within an existing page and there is no room. Page splits are expensive in performance terms and are also likely to result in data fragmentation in the database.

Certain monitoring programs can influence this counter.

LockRequests

LockTimeouts

LockWaits

Number / sec

Number / sec

/average wait time in milliseconds

Indicators of server performance. Lock requests show activity. Lock timeouts should be minimal. Note that LockTimeouts and LockWaits do not directly relate to database locking activity

 LatchWaits

 

/average wait time in milliseconds

Latches are lightweight locks – this is a useful counter.

Deadlocks

Count

Must be zero. Deadlocks are bad news

 

 

 

Database Growth

 

 

 

Where to put the data?

 

 

Other metrics of interest

 

% Idle Time

 

Disk Idle Time

 

Data and log drives

%  age

The disk idle time is considered to be a more accurate measure of disk activity ( actual activity equals 100 – idle time ) This value is valid for a SAN

The transaction log drives should never show sustained capacity usage , 0% idle time.

 

Avg. Disk sec/read

Avg. Disk sec/write

Avg. Disk sec/transfer

 

Average i/o completion time

 

Mainly data drives, variations should never be seen for log drives.

Average duration ms

This is the time in milliseconds for disk operations to complete, counters are available for overall i/o, reads and writes. Typically i/o should complete as quickly as possible, the actual values may vary according to configuration, typically expect values of less than 6 ms. This counter is valid for a SAN.

This counter is very useful for monitoring system performance, high values for i/o completion times will generally indicate degraded system performance.

There should never be high values for the transaction log drives, any slow down in the transaction log writes will seriously degrade performance – transactions have to write successfully before control is returned to the process, in practice the configuration of the drive and controller caching should make this transparent – expect to see no values for this counter for the log drive.

Disk Reads/sec

Disk Writes/sec

Disk Transfers/sec

 

Disk i/o

 

Data drive generally. Total, reads and writes

number/sec

Count of actual disk operations

A pattern of i/o should emerge, any variation will be an indicator of potential problems.

Any disk or array has a maximum supported i/o count, performance will start to degrade at 75% - 85% of the maximum supported i/o

Typically a 10k spindle might support around 280 sequential and 105 random i/o s per second.  The exact figures can be calculated from the manufacturer’s specification for the disk.

Cpu

 

There is a direct relationship between cpu, memory and disks.

High cpu may be an indicator of inadequate memory or an inefficient disk subsystem. Poorly optimised sql may also increase cpu due to increased i/o.

Generally faster processors are better than multiple slower processors. More processor cache improves performance significantly. Hyperthreading is generally beneficial but can sometimes cause problems.

Total sustained cpu >80% may be an indicator of performance bottlenecks.

Memory

W2k can support 8gb

W2003 can support 64Gb ( 32 bit )

 

64bit o/s have a flat memory model.

 

For absolute performance have physical memory equal to database size.

Generally SQL Server works best with fixed allocated memory, in this case most memory counters do not work.

 

Typically adding memory where sql server can use it will reduce cpu and improve performance. Memory is the most effective hardware upgrade.

For the 32bit platform configuration of extended memory mostly only benefits data caching – this would reduce physical i/o. Reading and writing to memory is far more efficient than to disk, low values for the Page life expectancy counter may be an indicator of insufficient memory.

 

Disk Queue

Data and Log drives

Count

These values should typically be zero, disk queue counters usually do not give meaningful values for a SAN.

 

 

ü  Gathering data as described will allow a baseline of normal activity to be set.

 

 

NOTE:

 

There are a number of excellent Microsoft White Papers which cover all aspects of performance, mainly for SQL 2005 onwards.

 

Document tidied up prior to publication to www.grumpyolddba.co.uk

 

 

© colin leversuch-roberts   2001 - 2007