    Finding Performance Bottlenecks in Windows

  • Henrik Wejdmark (StreamServe Employee, Administrator)

    Hi!

    Here’s a good document found on the Internet:

    Finding Performance Bottlenecks

    Part 1: Memory

    Although the Performance Monitor is interesting and even fun to use, it can also be confusing and of little value if you don't know which counters are significant and why.  In this feature, we will begin looking at several sets of counters that will help you determine if and where your Windows 2000 system is experiencing bottlenecks.

    One of the most common bottlenecks that Windows 2000/NT systems face is memory.  NT/2000 computers are notorious for being RAM hogs: the more you can pack in a system, the better.  However, there are counters that you can measure that will show this quantitatively, and that can help you build a case for adding more RAM.

    There are two general areas to look at to see if your system needs more RAM: the paging file, and the various memory "pools."  The paging file is an actual file on your hard drive.  When the operating system runs out of physical RAM, it will temporarily move data out of RAM to this file.  Therefore, if monitoring this file shows heavy usage, we know that the system is low on RAM, generally speaking.  Here are the counters to watch:

    Paging File

    Object: Memory
    Counter: Pages/sec

    Object: Logical Disk (location of the PAGEFILE.SYS)
    Counter: Avg. Disk sec/Transfer

    Note: In order to use the Logical Disk counters, you must turn them on by typing "diskperf -yv" at a command prompt, and then restarting the machine.

    Interpretation: Multiplying these two counters gives the fraction of the disk access time being used by paging (pages per second times seconds per transfer); multiply by 100 to express it as a percentage.  If this is greater than 10% (a product above 0.1) for a sustained period, you should add more RAM.  If the Pages/sec value is consistently greater than 5, you should also consider adding more RAM.
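    The paging arithmetic above can be sketched in a few lines of Python.  This is only an illustration: the function names are made up for this example, and the inputs are assumed to be values averaged over a sustained period rather than single samples.

```python
def paging_disk_time_percent(pages_per_sec, avg_disk_sec_per_transfer):
    """Estimate the percentage of disk access time used by paging.

    pages/sec multiplied by sec/transfer gives a fraction; times 100 gives percent.
    """
    return pages_per_sec * avg_disk_sec_per_transfer * 100.0


def should_add_ram(pages_per_sec, avg_disk_sec_per_transfer):
    """Apply the two rules of thumb from the article (10% and 5 pages/sec)."""
    percent = paging_disk_time_percent(pages_per_sec, avg_disk_sec_per_transfer)
    return percent > 10.0 or pages_per_sec > 5


if __name__ == "__main__":
    # Hypothetical averages: 20 pages/sec, 8 ms per disk transfer.
    print(round(paging_disk_time_percent(20, 0.008), 1), "% of disk time spent paging")
    print("Add RAM?", should_add_ram(20, 0.008))
```

    Feeding in averages from a logged period, rather than a momentary reading, keeps a short burst of paging from triggering the rule.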

    Memory Pools

    Object: Server
    Counter: Pool Nonpaged Failures
    Interpretation:  The number of times allocations from nonpaged pool have failed. Failures indicate that the computer's physical memory is too small.

    Object: Server
    Counter: Pool Paged Failures
    Interpretation:  Pool Paged Failures indicate that either physical memory or the paging file is near capacity.

    Object: Server
    Counter: Pool Nonpaged Peak
    Interpretation:  This counter shows the maximum number of bytes of nonpaged pool the server has had in use at any one point. This will indicate how much physical memory the computer should have.

    Although you can view these counters in a real-time chart window in Performance Monitor, I recommend creating a log file and letting it run during a typical twenty-four hour period, to get a more complete picture of what's going on with your system.
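    Once a log has been collected, it helps to reduce it to a few numbers per counter.  The sketch below assumes the log was saved or exported as CSV text with a timestamp in the first column and one numeric column per counter; the function name and the sample column headers are hypothetical, not a fixed Performance Monitor format.

```python
import csv
import io
from statistics import mean


def summarize_counter_log(csv_text):
    """Return {counter_name: (mean, maximum)} for each counter column.

    Assumes the first column is a timestamp and every other column holds
    numeric samples. Blank cells are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    names = header[1:]
    samples = {name: [] for name in names}
    for row in reader:
        for name, cell in zip(names, row[1:]):
            cell = cell.strip()
            if cell:
                samples[name].append(float(cell))
    return {name: (mean(vals), max(vals)) for name, vals in samples.items() if vals}


if __name__ == "__main__":
    # Hypothetical two-sample log, for illustration only.
    log = "Time,Pages/sec,Avg. Disk sec/Transfer\n10:00,4,0.01\n10:01,8,0.03\n"
    for counter, (avg, peak) in summarize_counter_log(log).items():
        print(counter, "mean:", avg, "max:", peak)
```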

    Finding Performance Bottlenecks

    Part 2: Disk Drives

    We have been exploring, with the help of Performance Monitor, how to determine if a Windows 2000 machine is suffering from a performance bottleneck.  In the previous feature, we looked at the performance counters for memory.  In this feature, we will be examining the disk drives, to determine if they are the item that is holding back the performance of the rest of the system.  Keep in mind that if there is a bottleneck with physical RAM, then one of the symptoms will be excessive disk usage, as the system swaps memory in and out to disk.  If it appears that the disks present a bottleneck to the system, be sure to check RAM also.  Here are the counters to look at in Performance Monitor:

    Performance Counters

    Object: Physical Disk
    Counter: %Disk Time
    Interpretation: This is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.  If this value is consistently high and disk queue length (see below) is greater than 2, your disk may be a bottleneck. 

    Object: Physical Disk
    Counter: Avg. Disk sec/Transfer
    Interpretation: This is the time in seconds of the average disk transfer.  If this value measures greater than 0.3 it probably indicates that the disk controller is having to retry over and over because of disk failures.

    Object: Physical Disk
    Counter: Current Disk Queue Length 
    Interpretation: This counter is the number of requests outstanding on the disk at the time the performance data is collected. It includes requests in service at the time of the snapshot. This is an instantaneous length, not an average over the time interval. Multi-spindle disk devices can have multiple requests active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests are experiencing delays proportional to the length of this queue minus the number of spindles on the disks. This difference should average less than 2 for good performance.
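    The queue-length rule (queue length minus number of spindles, averaging less than 2) can be checked against a series of samples.  This is a minimal sketch; the function names are invented for illustration, and the spindle count has to be supplied by you, since the counter does not report it.

```python
def average_excess_queue(queue_length_samples, spindles):
    """Average of (queue length - spindles) over the sampled period."""
    if not queue_length_samples:
        raise ValueError("need at least one sample")
    excess = [q - spindles for q in queue_length_samples]
    return sum(excess) / len(excess)


def disk_queue_is_healthy(queue_length_samples, spindles):
    """True when the average difference stays below the article's limit of 2."""
    return average_excess_queue(queue_length_samples, spindles) < 2


if __name__ == "__main__":
    # Hypothetical samples from a single-spindle disk.
    samples = [3, 4, 5, 4]
    print("Average excess queue:", average_excess_queue(samples, spindles=1))
    print("Healthy?", disk_queue_is_healthy(samples, spindles=1))
```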

    There is a formula that Microsoft has provided for determining what they call the "Average Queue Time."  This AQT is "the average amount of time for a disk transfer (either reads or writes) to complete."  Here is the formula:

    Avg. Queue Time = Disk Queue Length x Avg. Disk sec/Transfer
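    As a rough illustration, the formula can be computed per disk and the results sorted so the slowest disk appears first.  The function names and the sample figures below are hypothetical; only the multiplication itself comes from the formula.

```python
def avg_queue_time(disk_queue_length, avg_disk_sec_per_transfer):
    """Avg. Queue Time = Disk Queue Length x Avg. Disk sec/Transfer (in seconds)."""
    return disk_queue_length * avg_disk_sec_per_transfer


def rank_disks_by_queue_time(measurements):
    """Sort disks from longest to shortest average queue time.

    `measurements` maps a disk name to (queue length, sec/transfer).
    """
    times = {
        disk: avg_queue_time(queue_len, sec_per_transfer)
        for disk, (queue_len, sec_per_transfer) in measurements.items()
    }
    return sorted(times.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical readings for two logical disks.
    readings = {"C:": (4, 0.02), "D:": (1, 0.01)}
    for disk, seconds in rank_disks_by_queue_time(readings):
        print(disk, round(seconds, 3), "sec")
```

    Because the result is a relative measure, the ordering between disks matters more than any single value.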

    "This information is a relative performance measurement and should be compared with other hard disk drivers in your system. Compute the figures for all logical disks in your system. The number of disk commands waiting in the queue is normally the factor that slows disk performance by increasing the average disk queue time."

    Object: Physical Disk
    Counter: Disk Bytes/sec
    Interpretation: This counter is the rate that bytes are transferred to or from the disk during write or read operations.  A Disk Bytes/sec count lower than 20K may indicate a bottleneck.  Check this rate during read or write operations.

    Although you can view these counters in a real-time chart window in Performance Monitor, I recommend creating a log file and letting it run during a typical twenty-four hour period, to get a more complete picture of what's going on with your system.

    So what can you do if your monitoring of the above counters does indicate that your disks present a bottleneck to your system (assuming RAM does not)?  There are really several options.  The most obvious option is to purchase faster disks and/or other components, such as disk controllers.  There are several flavors of SCSI drives available now, with varying speeds and throughput.  IDE drives have also advanced in speed and throughput over the years.  Other options are adding more drives and using various RAID configurations that can improve throughput.  All hardware eventually has limitations, however.  So what can you do if you can't find or afford faster drives or a faster controller?  Another option is to move some data or applications off the disk volumes that are causing the bottleneck.  Look for applications that use the disk extensively, or for folders that contain frequently accessed data.  Move them to other drives or even to separate systems.  If money is no object, another option to investigate is load-balanced or clustered systems.  Storage Area Networks, also known as SANs, can also offer some relief for disk-intensive applications.

    Finding Performance Bottlenecks

    Part 3: Processor

    We have been exploring, with the help of Performance Monitor, how to determine if a Windows 2000 machine is suffering from a performance bottleneck in one or more of its components.  In the previous features, we looked at the performance counters for memory, and for disks.  In this feature, we will be examining the Processor, to determine if it is the item that is holding back the performance of the rest of the system.  Many people wrongly assume that just because a system seems slow, the processor must not be up to par.  However, as we have seen, other components can also be bottlenecks, so in order to tell for certain, there are some specific measurements to take.  Here are the counters to look at in Performance Monitor:

    Performance Counters

    Object: Processor
    Counter: %Processor Time
    Interpretation: If this value remains greater than 80%, without corresponding high values for disk and network counters, the processor may be a bottleneck.

    Object: System
    Counter: Processor Queue Length
    Interpretation: A sustained value of 3 or greater usually indicates a processor bottleneck.

    Object: Processor
    Counter: Interrupts/sec
    Interpretation: Check this counter when system activity is low.  If there is a large increase in the counter without an increase in system activity, there may be a hardware problem.

    Object: Process (_Total)
    Counter: %Processor Time 
    Interpretation: If more than one or two processes contend for most of the processor time, then performance can be improved by adding processors or upgrading to a faster processor.

    If you have a multiprocessor system: 

    Object: System
    Counter: %Processor Time (for multiprocessor systems)
    Interpretation: If this value remains greater than 80%, without corresponding high values for disk and network counters, the processor may be a bottleneck.
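    The words "remains" and "sustained" in the thresholds above matter: a single spike does not indicate a bottleneck.  One way to express that, sketched here in Python, is to require the condition to hold for a run of consecutive samples.  The function names and the window of 10 samples are assumptions made for this example, and the sketch does not check the disk and network counters that the interpretation also asks you to rule out.

```python
def sustained(samples, condition, window):
    """True if `condition` holds for at least `window` consecutive samples."""
    run = 0
    for value in samples:
        run = run + 1 if condition(value) else 0
        if run >= window:
            return True
    return False


def processor_may_be_bottleneck(cpu_percent_samples, queue_length_samples, window=10):
    """Apply the 80% processor time and queue length of 3 rules of thumb."""
    cpu_busy = sustained(cpu_percent_samples, lambda v: v > 80, window)
    queue_long = sustained(queue_length_samples, lambda v: v >= 3, window)
    return cpu_busy or queue_long


if __name__ == "__main__":
    # Hypothetical samples: CPU pinned high, run queue short.
    cpu = [85, 90, 92, 88, 95, 91, 89, 93, 87, 90, 94, 86]
    queue = [1] * 12
    print("Possible processor bottleneck:", processor_may_be_bottleneck(cpu, queue))
```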

    Although you can view these counters in a real-time chart window in Performance Monitor, I recommend creating a log file and letting it run during a typical twenty-four hour period, to get a more complete picture of what's going on with your system.

    So what if your measurements indicate that your system has a processor bottleneck?  Your options are similar to those for disk bottlenecks.  You can add processing power to your current system either by adding more processors or upgrading the current processor to a faster one.  Of course, you may be limited by your system board (or even your OS version) as far as which processors and how many you can add.  The second best option may be to move some of the load or functionality to a different machine.  If money is no object, you may wish to look into clustering or load balancing options.

    Finding Performance Bottlenecks

    Part 4: Network & Network Components

    We have been exploring, with the help of Performance Monitor, how to determine if a Windows 2000 machine is suffering from a performance bottleneck in one or more of its components.  In the previous features, we looked at the performance counters for memory, disks, and CPU.  In this feature, we will be examining the network and networking components, to determine if there are performance problems that might be holding back the rest of the system.  When measuring these components, there are two things to look at.  The network in general may not be able to handle the amount of traffic that it is being asked to carry, or it may be the network can handle large amounts of traffic, but the networking components in a certain machine cannot keep up with that traffic.  To discern which it might be you have to measure both components.  Here are the counters to look at in Performance Monitor:

    Performance Counter for the Network

    Object: Server
    Counter: Bytes Total/sec
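    Interpretation: If the sum of this value for all servers is getting close to the actual maximum transfer rate of the network (e.g., 10 Mb/sec or 100 Mb/sec), it may be necessary to segment the network.  Note that this counter reports bytes, while network speeds are usually quoted in bits, so multiply by 8 before comparing.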
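    The comparison against network capacity involves a unit conversion, because the counter is in bytes per second and link speeds are normally given in megabits per second.  A minimal sketch follows; the function name, server names, and sample figures are hypothetical, and a megabit is taken as 1,000,000 bits.

```python
def network_utilization(bytes_total_per_sec_by_server, link_megabits_per_sec):
    """Fraction of link capacity used by the summed server traffic.

    Converts the byte-based counter to bits (x 8) before dividing by the
    link speed, assumed here to be 1,000,000 bits per megabit.
    """
    total_bits_per_sec = sum(bytes_total_per_sec_by_server.values()) * 8
    capacity_bits_per_sec = link_megabits_per_sec * 1_000_000
    return total_bits_per_sec / capacity_bits_per_sec


if __name__ == "__main__":
    # Hypothetical readings from two servers on a 10 Mb/sec network.
    readings = {"server1": 500_000, "server2": 400_000}
    usage = network_utilization(readings, link_megabits_per_sec=10)
    print("Network utilization: {:.0%}".format(usage))
```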

    Performance Counters for Network Components

    Object: Redirector
    Counter: Current Commands
    Interpretation: If this number is greater than 1 for each network adapter, it may indicate a bottleneck in the redirector.  The redirector is a software component that communicates with the network and the server.  A bottleneck may mean that the server is slower than the redirector, or that the network may be at capacity, or that the network adapter is slower than the redirector.

    Object: Redirector
    Counter: Network Errors/sec
    Interpretation: Reasons vary.  More information should be obtained from the System Event Log.

    Object: Redirector
    Counter: Reads Denied/sec & Writes Denied/sec
    Interpretation: The server's buffer size is a negotiated value.  When a read or write is larger than the server's buffer size, the redirector will request a Raw Read or Raw Write (a large transfer of data with less protocol overhead).  The server must lock out other requests in order to allow Raw Reads/Writes, so if the server is very busy, it denies the request from the redirector.  These counters may also indicate that servers are having trouble allocating memory.

    Object: Server
    Counter: Work Item Shortages
    Interpretation: This counter shows the number of times that STATUS_DATA_NOT_ACCEPTED was returned at receive indication time.  If this counter is increasing, consider increasing the InitWorkItems or MaxWorkItems parameters in the Registry.  See Microsoft KB Article Q102967 for more information.

    Object: Server
    Counter: Raw Reads Rejected/sec & Raw Writes Rejected/sec
    Interpretation: When performing large file transfers, if the raw work items become exhausted, it will be indicated in the above rejections.  Increasing the RawWorkItems value in the Registry may eliminate this particular bottleneck.  See Microsoft KB Article Q102969 for more information.

    Although you can view these counters in a real-time chart window in Performance Monitor, I recommend creating a log file and letting it run during a typical twenty-four hour period, to get a more complete picture of what's going on with your system.

    So what if your measurements indicate that your system has a bottleneck with the network or networking components?  The options really will depend on your current environment, as well as where, exactly, the problem lies.  If the bottleneck seems to be with the network components on just one server, then moving some resources off from that server to a different server, splitting the network load, may help.  However, if your network is at or near capacity, you may need to segment it, having fewer machines on each segment.  This usually involves adding more network adapters to the servers, so that they can still communicate with all of the computers on the network.  Another option may be to use switches rather than standard hubs, on an Ethernet network.

    If you have specific questions, comments, or insight, or would like to discuss Windows 2000 performance, bottlenecks, other aspects of Windows 2000/NT, or any related topic, please feel free to post your questions or comments on the Focus on Windows NT/2000 forum, or visit the Chat room to exchange information, ideas and opinions with other NT'ers.

    'Til next time,

    Douglas Ludens
    Windows 2000/NT Guide

    //Cheers

         Henrik

    Henrik Wejdmark
    Vice President, Systems Engineering
    henrik.wejdmark@streamserve.com

    StreamServe, Inc.
    3 Van de Graaff Drive
    Burlington, MA 01803, USA

    Phone: +1.781.863.1510

    Fax:     +1.781.229.6622

    Cell:     +1.617.259.6996 

    www.StreamServe.com
    A Leader in Enterprise Document Presentment

    Monday 18 January, 2010