
Windows Performance Monitor Disk Counters Explained




March 16, 2012 by Jeff Hughes (MSFT) // 31 Comments


My name is Flavio Muratore and I am a Sr. Support Escalation Engineer with the Windows Core team at
Microsoft. If you ever find yourself analyzing storage performance with Performance Monitor, this post is for
you. We will go beyond the very brief descriptions provided in Perfmon and describe how the data for the
Physical Disk and Logical Disk counters is calculated.

Why Performance Monitor?


When it comes to the subject of disk performance in Windows, the majority of questions can be quickly
answered by Performance Monitor alone. Performance Monitor has very low overhead, does a great job with
averages, and can also capture and store data over long periods of time. It is an excellent choice for recording a
performance baseline and for troubleshooting.
For short, in this text we are going to call Windows Performance Monitor by its nickname: Perfmon.
The nickname comes from its executable file located at %systemroot%\system32\Perfmon.exe.

There are some things Perfmon will not be able to tell us. For advanced analysis, Windows provides us with
xPerf, enabling state-of-the-art performance data capture through Event Tracing for Windows (ETW). There is
an excellent blog on the subject by Robert Smith (Sr. PFE/SDE): “Analyzing Storage Performance using the
Windows Performance Analysis ToolKit (WPT)”.

What is the difference between the Physical Disk and the Logical Disk performance objects in Perfmon?

Perfmon has two objects directly related to disk performance, namely Physical Disk and Logical Disk. Their
counters are calculated in the same way but their scope is different.

The Physical Disk performance object monitors disk drives on the computer. It identifies instances
representing the physical hardware, and the counters are the sum of the accesses to all partitions on that
physical instance.

The Logical Disk performance object monitors logical partitions. Performance Monitor identifies logical
disks by their drive letter or mount point. If a physical disk contains multiple partitions, this counter will report
the values just for the selected partition and not for the entire disk. On the other hand, when using Dynamic
Disks the logical volumes may span more than one physical disk; in this scenario the counter values will
include access to the logical disk across all the physical disks it spans.
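
As a quick way to see the two scopes side by side, the sketch below is a minimal, hedged example that simply shells out to the built-in typeperf.exe tool (assuming its -qx switch, which lists counters per instance). PhysicalDisk instances look like “0 C:”, while LogicalDisk instances look like “C:” or a mount point.

```python
import subprocess

# Minimal sketch: enumerate the instances behind each object so the
# physical vs. logical scopes can be compared. Assumes typeperf.exe is
# on the PATH (it ships with Windows) and that -qx lists counters per instance.
for obj in ("PhysicalDisk", "LogicalDisk"):
    subprocess.run(["typeperf", "-qx", obj], check=True)
```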

Disk Counters Explained.

%Disk Time (% Disk Read Time, % Disk Write Time)


The “% Disk Time” counter is nothing more than the “Avg. Disk Queue Length” counter multiplied by 100. It is
the same value displayed on a different scale.
If the Avg. Disk Queue Length is equal to 1, the % Disk Time will equal 100. If the Avg. Disk Queue Length is
0.37, then the % Disk Time will be 37.
This is why you can see % Disk Time greater than 100%; all it takes is an Avg. Disk Queue Length value
greater than 1.
The same logic applies to the % Disk Read Time and the % Disk Write Time. Their data comes from the Avg.
Disk Read Queue Length and Avg. Disk Write Queue Length, respectively.

Avg. Disk Queue Length (Avg. Disk Read Queue Length, Avg. Disk Write Queue Length)
Avg. Disk Queue Length is equal to (Disk Transfers/sec) * (Disk sec/Transfer). This is based on “Little’s Law”
from the mathematical theory of queues. It is important to note that this is a derived value and not a direct
measurement. I recommend reading this article from Mark Friedman; the information still applies to Windows
2008 R2.
As you would expect, Avg. Disk Read Queue Length is equal to “(Disk Reads/sec) * (Disk sec/Read)”
and Avg. Disk Write Queue Length is equal to “(Disk Writes/sec) * (Disk sec/Write)”.
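
To make the relationship between these derived counters concrete, here is a small worked example with hypothetical values; it is just the arithmetic described above, not a Perfmon API call.

```python
# Hypothetical measured values for one sample interval.
disk_transfers_per_sec = 250.0   # Disk Transfers/sec
disk_sec_per_transfer = 0.004    # Avg. Disk sec/Transfer (4 ms average latency)

# Little's Law: average requests in flight = arrival rate * service time.
avg_disk_queue_length = disk_transfers_per_sec * disk_sec_per_transfer   # 1.0

# % Disk Time is the same value on a x100 scale, which is why it can exceed 100.
pct_disk_time = avg_disk_queue_length * 100                              # 100.0

print(avg_disk_queue_length, pct_disk_time)
```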

Current Disk Queue Length


Current Disk Queue Length is a direct measurement of the disk queue present at the time of the sampling.

% Idle Time
This counter provides a very precise measurement of how much time the disk remained in the idle state, meaning
all the requests from the operating system to the disk have been completed and there are zero pending
requests.
This is how it is calculated: the system timestamps an event when the disk goes idle, then timestamps another
event when the disk receives a new request. At the end of the capture interval, we calculate the percentage of
the time spent idle. This counter ranges from 100 (meaning always idle) to 0 (meaning always busy).
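
The bookkeeping can be sketched with a few hypothetical idle/busy transition events inside one capture interval. This is illustrative only; the timestamps and structure below are made up, not Perfmon's internal data.

```python
# One hypothetical one-second capture interval.
interval_start, interval_end = 0.0, 1.0   # seconds

# (timestamp, went_idle): True when the last outstanding request completed,
# False when a new request arrived at an idle disk.
events = [(0.10, True), (0.25, False), (0.60, True), (0.90, False)]

idle_time = 0.0
idle_since = None                     # assume the disk starts the interval busy
for ts, went_idle in events:
    if went_idle:
        idle_since = ts
    elif idle_since is not None:
        idle_time += ts - idle_since
        idle_since = None
if idle_since is not None:            # still idle when the interval ends
    idle_time += interval_end - idle_since

pct_idle_time = 100.0 * idle_time / (interval_end - interval_start)
print(pct_idle_time)   # 45.0 here; 100 = always idle, 0 = always busy
```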

Disk Transfers/sec (Disk Reads/sec, Disk Writes/sec)


Perfmon captures the total number of individual disk IO requests completed over a period of one second. If
the Perfmon capture interval is set to anything greater than one second, the average of the values captured
is presented.
Disk Reads/sec and Disk Writes/sec are calculated in the same way, but break the results down into read
requests only or write requests only, respectively.
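
As an illustration of that averaging (the numbers are hypothetical, not real counter output):

```python
# Five one-second samples of completed IOs.
completed_ios_per_second = [180, 220, 195, 205, 200]

# With a 5-second capture interval, Perfmon would present the average rate,
# not the sum, as Disk Transfers/sec.
disk_transfers_per_sec = sum(completed_ios_per_second) / len(completed_ios_per_second)
print(disk_transfers_per_sec)   # 200.0
```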

Disk Bytes/sec (Disk Read Bytes/sec, Disk Write Bytes/sec)


Perfmon captures the total number of bytes sent to the disk (write) and retrieved from the disk (read) over a
period of one second. If the Perfmon capture interval is set to anything greater than one second, the average
of the values captured is presented.
The Disk Read Bytes/sec and the Disk Write Bytes/sec counters break the results down, displaying only read
bytes or only write bytes, respectively.

Avg. Disk Bytes/Transfer (Avg. Disk Bytes/Read, Avg. Disk Bytes/Write)


Displays the average size of the individual disk requests (IO size), in bytes, for the capture interval. Example: if
the system had ninety-nine IO requests of 8K and one IO request of 2048K, the average will be 28.4K.
Calculation = ((8K * 99) + (1 * 2048K)) / 100
The Avg. Disk Bytes/Read and Avg. Disk Bytes/Write counters break the results down, showing the average
size for only read requests or only write requests, respectively.
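
The same calculation spelled out, using only the arithmetic from the example above:

```python
# Ninety-nine 8K requests plus one 2048K request average out to about 28.4K.
KB = 1024
total_bytes = 99 * 8 * KB + 1 * 2048 * KB
total_transfers = 99 + 1

avg_disk_bytes_per_transfer = total_bytes / total_transfers
print(avg_disk_bytes_per_transfer / KB)   # 28.4
```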

Avg. Disk sec/Transfer (Avg. Disk sec/Read, Avg. Disk sec/Write)


Displays the average time the disk transfers took to complete, in seconds. Although the scale is seconds, the
counter has millisecond precision, meaning a value of 0.004 indicates the average time for disk transfers to
complete was 4 milliseconds.
This is the counter in Perfmon used to measure IO latency.
I wrote a blog specifically about measuring latency with Perfmon. For details go to “Measuring Disk Latency
with Windows Performance Monitor”.
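
One quick way to watch these latency counters live from a command line is sketched below. It is a hedged example that assumes the built-in typeperf.exe tool; -si is the sample interval in seconds and -sc the sample count. The on-screen values are in seconds, so 0.004 means 4 ms.

```python
import subprocess

# Sample the latency counters for all physical disks, once per second, ten times.
counters = [
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer",
]
subprocess.run(["typeperf", *counters, "-si", "1", "-sc", "10"], check=True)
```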

Split IO/Sec
Measures the rate at which IO requests are split due to file fragmentation. This happens when an IO request
touches data on non-contiguous file segments. For an explanation of file segments see this blog from Robert
Mitchell – The Four Stages of NTFS File Growth.

Logical Disk Exclusive Counters


The Logical Disk performance object has all the same counters as the Physical Disk object, and except for the
fact that they are reported per logical volume instead of per physical device, they are calculated in the same way.
Because the Physical Disk object does not understand volumes, the following counters are exclusive to the
Logical Disk object.

% Free Space
Displays the percentage of the total usable space on the selected logical disk that is free.

Free Megabytes
Displays the unallocated space, in megabytes, on the volume.
How can we quickly tell how much free space is available in the volume? Check this blog from Robert Mitchell
– NTFS Metafiles.

A few words about Performance Monitor counter averaging and rounding:


Perfmon is really good at averaging results and rounding numbers; this enables us to have relatively small log
files and to extract useful information from the data captured. Although the numbers displayed to the user
during a live capture and the numbers saved in the log files are rounded, the numbers used in the internal
calculations are more precise.
When reading the description for some counters in this blog, you probably noticed Perfmon has to calculate
an average of averages; this leads to small imprecisions in the final numbers. Also, when we combine this
with instances that do further rounding and averaging, like the “_Total” instance, you will see some results are
close but do not add up exactly. For example, if you take “Disk Transfers/sec” over a period of time and
subtract both “Disk Reads/sec” and “Disk Writes/sec”, the resulting number may not be exactly zero.
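
A toy sketch of where that small residual comes from follows. The numbers are made up and the rounding precision is purely illustrative, not Perfmon's actual internal precision.

```python
# Hypothetical "true" per-interval rates before logging.
reads_per_sec = [101.26, 99.84, 100.51]
writes_per_sec = [50.13, 49.92, 50.27]

# What the log stores: already-rounded values (one decimal place here,
# only to make the effect visible).
logged_reads = [round(v, 1) for v in reads_per_sec]
logged_writes = [round(v, 1) for v in writes_per_sec]
logged_transfers = [round(r + w, 1) for r, w in zip(reads_per_sec, writes_per_sec)]

# Later analysis re-averages the rounded values (an average of averages).
avg = lambda xs: sum(xs) / len(xs)
residual = avg(logged_transfers) - (avg(logged_reads) + avg(logged_writes))
print(residual)   # small but non-zero: expected, and harmless at this level
```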

This is expected and does not pose a problem to performance analysis at this level. If you can’t tolerate
these small imprecisions you will need to use xPerf. xPerf does event tracing, and all data is kept with no
averaging or rounding. The downside is that the resulting xPerf log files are much bigger than the ones
Perfmon creates.

Conclusion:
The Windows Performance Monitor is a very powerful diagnostic tool and is capable of answering most
questions about the state of disks on the fly. Perfmon uses averaging and rounding to keep only meaningful
data in its log files, thus allowing captures over a long period of time.

I must thank a bunch of Microsoft fellows for helping me with this blog. Big thanks to Bruce Worthington
(Principal Development Lead); without your knowledge I would not have been able to finish this blog. Thanks also
to Mark Licata (Principal SE), Robert Smith (Sr. PFE), Clint Huffman (Sr. PFE), John Rodriguez (Principal
PFE), Steven Andress (Sr. SEE) and the Storage performance discussion group at Microsoft. It seems so
simple now, but it took a lot of sweat to get the exact data to make sure this information is accurate.

Y’all have fun with Perfmon.

Flavio Muratore
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Tags: Performance, Storage and File Systems

Comments

Anonymous 49 years ago

Great Article – Exactly what I was looking for

Anonymous 49 years ago

This is more of a general PerfMon question but I have not been able to find an answer
anywhere so I was hoping the Performance or Core Team may know.

I'm trying to troubleshoot an intermittent system hang which causes the system to become completely
unresponsive to the point where a reboot is required. When using a data collector set in Performance Monitor,
the data is apparently buffered in memory until some point where the buffer fills up and flushes the data to
disk (writing it to the .blg file). Since the data only occasionally gets written to disk, when the system hangs
and is reset, all of the log data that was in the buffer is lost. I want to be able to log data up to (or nearly up
to) the point where the system hangs but this buffering mechanism doesn't allow that to happen.

Is there a way to force counter data to be flushed to disk at a specific interval or just turn off buffering
altogether so that the data is written to disk at each sample interval?

Anonymous 49 years ago

Can you tell me how to view these counter values? I want to collect these counter
values.

Brian Day [MSFT] 49 years ago

Awesome stuff, Flavio! I'm a PFE for Exchange and was curious what tips you would
suggest for someone trying to determine if latency is coming from outside storport (so the storage system
itself or components in between the host and the storage), something like antivirus scanning, or other
unknowns. I've played with storport logging, but only being able to set a threshold and not know the total #
of packets passed during the logging makes it hard to say if the resulting log is 1% of all packets or 50% of all
packets over the configured threshold and if a deeper look is warranted. Thank you in advance!

Yassine Souabni 49 years ago


Helpful explanation – thanks !

Robert Smith, PFE 49 years ago

steve_Zhou: How can I track application I/O


I believe you can get what you're looking for by using "Process\IO Write
Operations/sec" and "Process\IO Read Operations/sec"

Thanks,

Robert Smith, Sr. PFE, Microsoft

Robert Smith, PFE 49 years ago

Sreenath Gupta: Restart required to fix problem


Restarting could mean either something in the OS is wrong, or possibly the
server heats up under load and a reboot might cool it down just enough to let
it run normal for some time. Some processors have the ability to reduce
frequency at certain temperatures,
in order to keep from burning out. The obvious side-effect is reduced
performance. I see this a lot for computers and servers that get filled up with
dust and you can't otherwise explain why the computer performs slower and
slower over time. I had one at home
that would run for a while and then freeze solid. I tried everything until I once
almost burnt myself on one of the RAM chips. I put a fan on it and it stopped
freezing. I got some RAM fans and big CPU fans and it runs good to this day,
about two years later.

With the OS, one thing that can reduce performance over time, and be
temporarily relieved with a reboot, is a resource "leak". This is when, for
example, a process has a thread created for it, and later that thread's work is
finished, but the thread is not cleaned up. We see this with "handles",
"processes", or other general memory allocations. These can be hard to
diagnose, but you can use PerfMon to monitor "pool" or "nonpaged pool"
memory and watch it over time. These values can go up and down during the
day with load, but if the long-term trend is upward, this could indicate a leak.
CSS can help you with these, there are good articles and blogs published on
this, and possibly some of the newsgroups.

If you suspect a HDD going bad…you might want to see if you can check
"SMART" data. Most if not all drives these days have the ability to report some
amount of information through SMART. Your hardware vendor should have
tools or methods you can use to interrogate
the drives and view the SMART data, to see for example if the drive is marking
a lot of blocks permanently bad, predicts failure, etc.
Thanks,

Robert Smith, Sr. PFE, Microsoft

Robert Smith, PFE 49 years ago

Flavio put together several great articles. He is no longer at Microsoft, but still in the
storage industry. I'll try to answer a few of the outstanding questions.

1. Jereme: RAID IOPS not adding up.


Jereme, the value of about 180 IOPS for a disk is a worst-case, full-stroke seek. When you run an I/O load test,
some of the variables are LBA location on disk, read/write ratio, randomness, size of I/O, the on-disk cache
(per disk), the RAID controller cache
(if using a controller), file cache in the OS, etc. If you want to see the point at which the disks can barely keep
up, your I/O workload has to be sufficiently random and mixed, so as to mitigate the effects of cache. I'm not
familiar with all the I/O load-sim
tools available, but IOMeter for example would allow you to "step" through things like number of outstanding
I/O as your test proceeds, so you could be running PerfMon or just watching IOMeter and see at which point
disk response time falls below acceptable
levels (usually 10 ms avg).

Thanks,

Robert Smith, Sr. PFE, Microsoft

Anonymous 49 years ago

I'm having a problem with the Disk Transfer/sec calculation and the corresponding
Reads/sec and Writes/sec.

I have a server running a RAID containing 3 x 15K, 300GB drives.

By my calculations I should be able to get a maximum IOPS of 545.46 doing nothing but reading 100% on this
system, that should be my fastest scenario possible. However, through both WMI and Performance Monitor I
am seeing read and write/sec values of about
1000 each and the Disk Transfer/sec at nearly 2000.

The scale for all of these counters is set at 1 and I am only monitoring the single drive corresponding to the 3
disk RAID. How is performance monitor giving such high values? Or are the values to scale or mean something
else?

Alexander 7 years ago

In most cases Disk Reads/sec, Disk Writes/sec & Current Disk Queue Length are enough to
understand system status and define a bottleneck. Sometimes you will need to drill down into the process and
physical disk counters to check which process does the most reads/writes.

Measuring Disk Latency with Windows Performance Monitor 7 years ago

It may just be me, but the link to Measuring Disk Latency with Windows Performance
Monitor goes to a "Bad Request" page. Are you sure that link is correct? I would really love to read this article.

Kurt Gunter 7 years ago

We try to use Perfmon to diagnose and troubleshoot storage performance on our


Windows 2008 R2 systems. However, it seems that there are many very basic bugs with the Perfmon GUI
interface under 2008 and 2008 R2.

It is clear that only basic QA testing was done, as these bugs are replicable on any system (and have been
logged under case REG:112021466743035 by myself). This is pretty disturbing stuff as it calls into question the
reliability of the entire product. If they haven't fixed the obvious bugs, how can we rely on anything else it tells
us?

The problem is very simply replicated: if you have multiple (i.e. more than 10) LUNs or disks on a system, try
creating a New Data Collector Set. Add in a handful of individual counters (i.e. don't use Total or All). Do the
same for the instances (LUNs), i.e. select them individually instead of using All.

The result is that random counters are dropped from the set. You THINK you just selected Reads/sec,
Writes/sec, Disk Queue, and Split IO/sec for your 10 LUNs, but what you actually GET is only some of these.

Other times you click OK and NONE of these are added to the data collector set. The list is blank.

This is a massive issue for us. We have systems with 100+ LUNs and we wish to collect certain counters only
for certain LUNs because if we select everything (* for All Instances) our logs get massive. But the only
workaround is to do that; to select everything. Not very useful.

This is my 6th logged-and-confirmed MS bug and 5 of them have revolved around basic issues with MS
products where large-scale deployments have simply not been tested. Try using MS Cluster or frequent VSS
snapshots with 30+ LUNs and you'll quickly see what I mean. It's very concerning because it means MS seems
to be skimping on QA… and it's the big customers that will suffer (and consequently not choose MS products).
But this one was especially annoying as it prevents Perfmon doing what it was designed for, i.e. act as a
reliable troubleshooting tool. I hope this gets fixed soon.

TH 7 years ago

is there a place that explains terms like latency, IOPS etc?

thanks

TH
http://www.tamirh.com

Michiel 7 years ago

Thank you for explaining, this is helpful information.

JEET 6 years ago

Great!!!!!!

William 6 years ago

Thanks for putting this together. I've got a perf mon I'm looking at for a customer and
do not know which counters to focus on, etc so I've been going through many sites to get an understanding.
This is a great start, thank you!

http://www.learntomarketnow.com

Jim G. 6 years ago

Is there some documentation on all available perfmon performance counters and


their descriptions that I can reference. I am working on a Win Server 2008r2 performance team and I would
like to review for some reporting I would like to do.

Thanx,

Jim G.


Sekhar 5 years ago

Great Article. In my experience we see most of the issues related to the IO subsystem.


Working with a storage vendor is not easy; it takes longer to troubleshoot and fix since many components are
involved with storage.

http://www.shop2vizag.com
Ripton White 5 years ago

I like what Flavio did. He took the time to explain in great detail about performance
monitoring. Way to go Flavio. Although this does not address MY concerns, as a student of server2012 I can
associate and appreciate the value of the article.

king boy 5 years ago

very helpful indeed.. thanks

Sreenath Gupta 5 years ago

Hello my friend, I think I reached you at last. I am in very big trouble with my new Dell
R820 server with Hyper-V 2012 installed on it. I am facing high latency issues with all the VMs as well as with the
host machine. For a few hours the server works well, and
after that I am facing the latency issue again; at the time of latency, I restart the server and the problem
disappears, and again it comes back after some time. I have raised tickets with Dell and Microsoft for the
same and both of them could not solve
my issue. We are using the server for virtualization and the hardware configuration is 64 GB RAM / Xeon
quad-core processor, 2 sockets, 32 threads total / 1 x 3 SAS 6 Gb 7K RPM. Kindly suggest whether this is due to an
HDD issue.

steve_zhou 5 years ago

Hi Flavio,

Disk Transfers/sec (Disk Reads/sec, Disk Writes/sec) only counts completed I/O, which means it cannot
tell us about the application IOPS to the disk. How can I measure IOPS from an application through PerfMon?

thanks!

Cakeway.in 4 years ago

wonderful information, I had come to know about your blog from my friend nandu ,
hyderabad,i have read atleast 7 posts of yours by now, and let me tell you, your website gives the best and the
most interesting information. This is just the kind of information
that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts,
once again hats off to you! Thanks a ton once again, Regards,
https://www.cakeway.in

IanB 4 years ago


Nice article, thanks. Perfmon is very useful – I like to watch (not log) disk/tape/network
throughput of our backup systems. But when monitoring bytes/second, every second, I get very peaky graphs
– up to 300MB one second, almost down to 0 the next second,
back to 300MB, down to 0, etc – repeated forever. Is this expected, or does my disk system need tweaking to
get consistent 300MB/sec?
Thanks

themightym 4 years ago

Thanks…very helpful…

Anonymous 3 years ago

FYI: The Mark Friedman article referenced here, “Top Six FAQs on Windows 2000 Disk
Performance,” has moved to:

http://archive.oreilly.com/pub/a/network/2002/01/18/diskperf.html
-or-
http://archive.oreilly.com/lpt/a/1503

Is it safe to say that this article still applies to Windows Server 2012 and Windows Server 2012 R2?

Alton long 2 years ago

Does anyone know or can explain what “PDO” means and its usage?

Hendo 2 years ago

I/O block sizes and latencies are only reported on with an average. Any way to get a
breakdown.. like a histogram? It appears like the data is available.. from the section on block sizes: “Example: If
the system had ninety nine IO requests of 8K and one IO request of 2048K, the average will be 28.4K.
Calculation = (8k*99) + (1*2048k) / 100.”

Ron McLaren 2 years ago

Flavio, I have been on the trail of possible corruption on my C: drive, but I am a little
confused. I have read about CHKDSK, and have used it. However, I get the impression that it is doing an
integrity check that is about logical errors in the file system (on Win10 in this case). When I joined the
computer industry (1966) it was necessary to know all sorts of things – about how disks work, for instance.
Now that is all buried under layers of software that make life a lot easier.

But this means that I don’t know how to find out if my disk is having a rising number of repeats, as we used to
call them, arising from failed transfers. Can I do that?
Regards, Ron McLaren
