
Getting Started with Monitoring in Lync 2010

Published May 2011

Copyright
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. 2011 Microsoft Corporation. All rights reserved. Microsoft, Microsoft, Lync, and SQL Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Microsoft Corporation

Contents
Copyright
Contents
Introduction
What Is Monitoring Server, and What Is Monitoring Server Reports?
How Do VoIP Networks Work, and Why Should I Care About That?
How Do I Monitor VoIP Traffic?
What Should I Look for When Monitoring VoIP Calls?
    Poor Call Percentage
    Round Trip
    Degradation (MOS)
    Packet Loss
    Jitter
What Does It Mean to Monitor for Usage?


Introduction
One feature that you'll find in any modern car is a dashboard. (At least we assume that all modern cars have dashboards.) Dashboards typically contain all kinds of gauges, dials, and other instruments that provide information about a car and how well that car is performing. Some of these instruments are designed to run at all times and to keep a running tally of the car's current performance. For example, the speedometer tells you how fast you're driving, and the gas gauge tells you how much gas is left in the gas tank. Other instruments make their presence known only on an as-needed basis. You say you just looked at the dashboard and you don't see the Low Oil Pressure warning light? That's good: that means that you are not running low on oil. Setting aside any legal requirements, there isn't any reason why your car has to have a speedometer or a gas gauge; if you don't have a speedometer or a gas gauge your car will run just fine. However, your driving experience will be a less-than-optimal one. Why? Because while it might seem like everything is fine, you'll never really know for sure. It might seem like you're staying below the speed limit, but how can you be sure? It might seem like you have enough gas to make it to work and back, but how can you be sure? No matter how well things might seem to be going, there will always be a certain amount of anxiety and uneasiness. And anxiety and uneasiness don't translate to an optimal driving experience. Real-time performance data, like your current speed or the current amount of gas in the gas tank, is useful; so, too is long-term data about how your car is being used. For example, tires should be changed every 40,000 miles. Yes, there's no substitute for checking your tire tread when deciding whether you need new tires. However, knowing how many miles you've driven can give you some idea of how carefully, and how often, you should check the tire tread. You say you've driven only 5,000 miles on these tires? 
Then your tires probably don't need to be a major concern. But if it has been 48,000 miles since your last tire change, well, that's a different story. And you won't know that unless you have usage data that tells you that the car has been driven for 48,000 miles. Here's another reason why cars have dashboards. Suppose you're out driving and something bad does happen. A well-designed dashboard and instrumentation system won't necessarily prevent problems from happening: if a water pump is going to fail, then that water pump is simply going to fail. However, many times a warning light can help minimize damage and help prevent a minor problem from becoming a major problem. For example, the Low Oil Pressure warning light doesn't come on only after your car has used up all its oil and is now on fire by the side of the road. Instead, the Low Oil Pressure warning light comes on before you run out of oil. It comes on when you still have enough oil to safely make it to the next gas station. Having to stop and buy a quart of oil is an inconvenience. Having to bail out of a burning car while traveling down the highway is something more than an inconvenience. In other words, the instruments on your dashboard can help prevent a minor problem from turning into a major problem: if the Low Oil Pressure light comes on, then you should add oil to the car. That's useful to help you finish your current trip and to get from here to there. In addition, if you pay close enough attention to these warnings, they can also help you distinguish
between a one-time anomaly and a real problem. Did you simply forget to put oil in the car this time around? That's fine: the Low Oil Pressure warning light comes on, you add oil, and the problem is resolved. But suppose the Low Oil Pressure warning light comes on again next week and then the week after that. A car that's working properly shouldn't require an oil change every week. No matter what your car salesperson might tell you. So what does all this have to do with Microsoft Lync Server 2010 communications software? Needless to say, cars are complex pieces of machinery, cars cost money, and a broken car can cause problems not only for you but for other people as well (like family members or the people in your carpool). These reasons explain why it's a good idea to use the instrumentation on the dashboard to routinely monitor your car for both usage and performance and to respond promptly any time one of these instruments suggests there might be a problem somewhere. As for Lync Server 2010, Lync Server is a complex product, probably even more complex than a car. Lync Server costs money, and a broken deployment of Lync Server can cause problems not only for you but for other people as well (like all the people in your organization who depend on Lync Server for phone service, instant messaging, conferencing, and other communication needs). These reasons explain why it's a good idea to routinely monitor Lync Server for both usage and performance and to respond promptly any time there's the slightest hint that there might be a problem somewhere. That's a reasonably good analogy, but, at first glance, there would appear to be one major flaw in it. After all, it's easy to monitor usage and performance in your car: all cars come equipped with a dashboard, and all dashboards come equipped with instruments designed to monitor usage and performance. But what do any of those things have to do with Lync Server? 
The fact that you're asking that question can only mean one thing: it's time to discuss Monitoring Server and Monitoring Server Reports.

Note. If you'd like to see some real-life examples of Monitoring Server Reports in action, then you might be interested in the following videos:

- Help Desk Troubleshooting: Lync Call Issues at http://go.microsoft.com/fwlink/?LinkId=218906
- System-wide Troubleshooting: Lync Call Connectivity at http://go.microsoft.com/fwlink/?LinkId=218907

The Help Desk Troubleshooting video demonstrates how a support person might assist a user who has been having call quality problems, while the System-wide Troubleshooting video shows how administrators can use Monitoring Server Reports to determine whether users are able to make and complete calls and, if they're not able to, why.

What Is Monitoring Server, and What Is Monitoring Server Reports?



Monitoring Server is a Lync Server server role designed to collect usage information and Quality of Experience (QoE) data about the communication sessions that your users are involved in.

Note. We should clarify from the start that Monitoring Server keeps track of information about each session: who contacted whom, which endpoints were used in the session, how long the session lasted, what the perceived quality of the session was, and so on. However, Monitoring Server does not record and store the actual session itself. This is true for calls and also for instant messaging (IM) sessions: although Monitoring Server records information about IM sessions, it does not maintain a record of each instant message that was sent during the session. That's the job of Archiving Server, a server role not discussed in this document.

Monitoring Server is built around the fact that SIP-compliant endpoints (including software programs, such as Microsoft Lync 2010, and hardware devices, such as an IP phone) automatically keep track of information during a call. This information includes:

- End-to-end metrics. End-to-end metrics deal with the actual transmission of the call itself; that is, they provide a sort of travel log as the call journeys across the network. These metrics (which include things such as packet loss, jitter, and round-trip times, all of which will be explained later in this document) provide information about what happened to the call from the time it left your phone to the time it arrived at the other person's phone. What these metrics don't do is describe things such as whether the other person was hard to hear or whether there was excessive noise on the line. That information is conveyed by the endpoint metrics.

- Endpoint metrics. Endpoint metrics enable you to determine what the call sounded like to each person involved in the call. Was it hard to pick out the other person's voice over the background noise? Could you hear an echo of your own voice every time you talked? Did it sound like the beginning and ending of sentences were cut off? Endpoint metrics often deal with hardware issues (that is, problems with your speaker or microphone) or with environmental issues (like having multiple phones involved in a call that are located too close to one another).

- Configuration parameters. Configuration parameters relay basic information about each endpoint involved in the call, including IP address, link speed, the Edge Server used in the call, and so on.

- Event ratios. To be honest, we aren't sure where the term "event ratios" came from. We do know that event ratios report back diagnostic IDs, SIP response codes, and other error messages generated during a call. As you might expect, these messages can be extremely useful when trying to diagnose problems that occurred during a call.

At the end of each call, SIP-compliant endpoints automatically transmit this information to the Front End Server that facilitated the call. You don't have to do anything to get endpoints to transmit that information; that behavior is built into SIP. However, if you want to collect and store that information, then you need to install and enable Monitoring Server.


Note. For details about installing Monitoring Server, see Deploying Monitoring in the Lync Server TechNet Library at http://go.microsoft.com/fwlink/?LinkId=207085.

If you do install and enable Monitoring Server, then call information is gathered by agents running on the Front End Server and relayed to the Monitoring Server; that relayed information is then stored in a pair of Microsoft SQL Server databases. And then what? Well, after the data has been stored in the database, administrators who are familiar with writing SQL Server queries can access and analyze the information to help gauge system usage or to troubleshoot call-related problems. That's fine, except that many system administrators might not be able to write the complicated SQL Server queries needed to extract useful information from the monitoring databases.

Fortunately, there's an alternative: administrators can use Lync Server Monitoring Server Reports to perform these same tasks. Monitoring Server Reports (which ships with Lync Server) uses SQL Server Reporting Services to provide administrators with predefined reports that cover key topics such as:

- The number of users who log on to Lync Server and how many of those users actually use the system after they have logged on
- The total number of IM sessions and conferences held in the organization
- Quality ratings for phone calls and conferences
- Inventory-type information regarding the IP phones and hardware devices in use in the organization

The individual reports included in Lync Server Monitoring Server Reports are discussed in more detail in the document Understanding the Monitoring Server Reports, also in this download package. Instead of giving you that level of detail, this document provides more general information about why you might want to monitor in the first place (actually, why you do want to monitor) and what you should monitor for. In doing so, this document tries to answer the following questions:

- How do Voice over Internet Protocol (VoIP) networks work, and why should I care about that?
- How do I monitor VoIP traffic?
- What should I look for when monitoring VoIP calls?
- What does it mean to monitor for usage?

How Do VoIP Networks Work, and Why Should I Care About That?

When it comes to conducting voice calls, the public switched telephone network (PSTN), or what we tend to think of as the "regular old" phone network of landlines and cell phones, has a number of advantages over a VoIP program such as Enterprise Voice. For one thing, the PSTN network is an example of a "connection-oriented network." That means that, when you pick up the phone and place a call, the route that the call will take has been predetermined for you. Not only that, but when you make a call on the PSTN network, a circuit is reserved for you, and you are automatically allocated all the resources (such as bandwidth) needed for you to complete the call and carry out your conversation. After this circuit has been reserved for you, you do not need to worry about additional network traffic impinging on your call: in general, your call continues regardless of how many other calls are being placed at the same time.

Note. Admittedly, it's possible that there might be times (for example, on Mother's Day) when you try to make a call only to be told that "no circuits are available." In that case, you won't be able to make your call until someone else hangs up and frees up that circuit and those resources. However, if you are able to make the call, the clarity and the quality of that call will be just as good as if you were the only one in the world currently using the phone.

Does it make a difference that the circuit is predetermined for you the moment you dial a phone number on the PSTN network, and does it make a difference that the PSTN uses a dedicated network for your phone calls? As a matter of fact, it makes a big difference. For one thing, these factors make PSTN phone calls very fast. Because the route has been predetermined, there's never a time when the system has to make a decision about how a call should be routed.
As we'll see, the difference between a call with acceptable clarity and one without acceptable clarity is often measured in a matter of milliseconds (a millisecond equals 1/1000th of a second). When it comes to phone calls, faster is always better. And PSTN calls tend to be very fast.

In addition to that, having a dedicated circuit helps to ensure that the words you speak into your phone arrive at their destination in the correct order and in a regular and predictable timeframe. Suppose you say something like this into the phone: "Hey, how's it going?" When you do that, it's reasonable to expect that all the words you spoke will arrive at their destination. It's also reasonable to expect that these words will arrive in the same order you said them; the listener on the other end shouldn't hear something like this: "It going hey how's?" And, it's reasonable to expect that the rhythm and cadence of your original speech will also be heard by the listener at the other end. Your words should be heard exactly as you said them. For example, you should not have long, unnatural pauses between words: "Hey ... how's it ... going?"

Thanks to dedicated circuits, and thanks to its guaranteed allocation of resources, the PSTN network is very good at ensuring rapid and reliable delivery of information. But, as with all things, rapid and reliable delivery of information comes at a cost: a network that can deliver only one type of information (like the PSTN network) tends to be more expensive to create, maintain, and operate than an all-purpose network that, among other things, carries VoIP traffic. Historically, PSTN phone calls have been expensive to make, and long-distance calls have been more expensive to make than local calls. You get high-quality and highly reliable service on the PSTN network but at a price.

Now, let's compare a phone call made on the PSTN network with a VoIP call made on a TCP/IP data network (including both intranets and the Internet).

Note. For simplicity's sake, we're going to talk only about audio calls. However, the basic arguments apply to video calls as well.

TCP/IP networks are known as "connectionless networks." That means that the route traveled by network packets is not defined in advance. Instead, each packet sent across the network contains its destination address. As the packet travels through routers, switches, and other networking hardware, each device it encounters reads the destination address and then forwards the packet based on a number of factors, including the amount of traffic currently traveling down a given path. That means that two network packets from the same conversation could take very different routes in order to reach their destination: if route A is highly congested, a given packet could be sent along route B instead.
In turn, this has the potential to introduce delays in the conversation (for example, unnatural pauses between words) and can cause packets to arrive out of order or later than expected (for example, packet 1 takes 20 milliseconds to arrive, packet 2 takes 20 milliseconds to arrive, but packet 3 takes 53 milliseconds to arrive). All these things can dramatically impact a conversation. And not for the better. Note. Here's an interesting phenomenon that illustrates the difference the delivery rate can have on a conversation. Echo is a well-known problem in which a speaker hears everything he or she says echoed back to them. Echo is caused by "leaky" signals on the PSTN network and does not occur on TCP/IP networks: if you make a phone call that never has to cross over onto the PSTN network, then you will not experience echo on that call. Echo is a phenomenon restricted to the PSTN network. But here's the interesting thing: users on the PSTN network will rarely, if ever, hear echo on their phones. Why not? Because the echo typically travels so quickly that people are unable to process it. If you barely finish speaking when an echo is returned to you, your brain doesn't have time to register the returned echo. You might hear a very low amount of noise, but it's more likely that you won't hear anything at all.

Instead, if you're going to hear an echo at all it will probably be on a VoIP phone. Why? Well, if an echo is returned in less than 50 milliseconds (give or take), it will probably be imperceptible. That kind of return speed almost always is achieved on the PSTN network. However, that kind of speed can't be guaranteed on a TCP/IP network. Because packet deliveries are more likely to be delayed on a VoIP network (that is, they are more likely to take more than 50 milliseconds to arrive), VoIP users are more likely to hear an echo. Another potential drawback to VoIP calls is the ongoing and ever-present contention for resources. As we noted earlier, when you make a call on the PSTN network, you get your own circuit and your own set of resources, and that circuit (and your call) cannot be affected by any other traffic on the network. That's not the case with VoIP calls, however. VoIP traffic travels on the same network as all your network traffic; for example, your voice packets have to compete with data packets created by users copying files from one computer to another. The more congested the network, the longer it takes for voice packets to traverse that network, which makes it more likely that voice packets will be lost, delayed, or arrive out of sequence. Many (if not most) of the problems found on VoIP networks are due to network congestion and the resulting lack of bandwidth. Note. We should probably mention that VoIP vendors do attempt to maximize the chances that available resources are given to voice calls as opposed to other types of data traveling on the network. For example, a higher priority can be assigned to voice packets. If the network needs to drop packets in order to relieve congestion, it will drop packets with a lower priority first. However, this does not mean that voice packets will never be dropped, and nothing stops other data types from being assigned a high priority. 
In case you're wondering, the IEEE P802.1p specification allows for eight different priority levels, with 0 representing the lowest-priority traffic and 7 the highest-priority traffic:

Network priority    Traffic type
0                   Background
1                   Best Effort
2                   Excellent Effort
3                   Critical Applications
4                   Video, < 100 milliseconds latency
5                   Voice, < 10 milliseconds latency
6                   Internetwork Control
7                   Network Control
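To illustrate the "drop lower-priority packets first" idea, here is a minimal Python sketch. It is only a toy model under our own assumptions: the function name relieve_congestion, the single shared queue, and the tie-breaking rule (among equal priorities, the newest packet is dropped first) are all hypothetical, and real network hardware typically uses separate queues per traffic class rather than anything this simple.

```python
# Priority levels as listed in the table above (0 = lowest, 7 = highest).
TRAFFIC_TYPES = {
    0: "Background",
    1: "Best Effort",
    2: "Excellent Effort",
    3: "Critical Applications",
    4: "Video",
    5: "Voice",
    6: "Internetwork Control",
    7: "Network Control",
}


def relieve_congestion(queue, capacity):
    """Return the packets that survive when only `capacity` can be forwarded.

    `queue` is a list of (priority, label) tuples in arrival order.
    Hypothetical policy: drop the lowest-priority packets first and, among
    equal priorities, drop the most recent arrival first. Survivors keep
    their original arrival order.
    """
    if len(queue) <= capacity:
        return list(queue)
    # Rank packet positions: higher priority first, then earlier arrival.
    ranked = sorted(range(len(queue)), key=lambda i: (-queue[i][0], i))
    keep = sorted(ranked[:capacity])
    return [queue[i] for i in keep]


queue = [
    (1, "file copy A"),
    (5, "voice 1"),
    (0, "backup"),
    (5, "voice 2"),
    (1, "file copy B"),
]
survivors = relieve_congestion(queue, capacity=3)
for priority, label in survivors:
    print(priority, TRAFFIC_TYPES[priority], label)
```

Note that if every packet in the queue is a voice packet, this policy still drops voice packets, which matches the caveat above: a higher priority improves the odds but guarantees nothing.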

The Resource Reservation Protocol (RSVP) can also help maximize the resources allocated to a VoIP call. RSVP travels ahead of an actual VoIP phone call and asks all the routers and intermediaries it encounters along the way if it can reserve a set amount of bandwidth for
the call. If accepted, these reservations can help ensure adequate resources for a call. However, routers do not have to accept RSVP reservations and, on a network that's already congested, it's likely that those reservations will be refused. In that case, of course, RSVP won't be able to help speed up your calls.

So does this mean that you shouldn't use VoIP? Far from it. Although there are disadvantages to the VoIP technology, there's also a major advantage: VoIP calls tend to be less expensive than PSTN phone calls. In addition, the average VoIP call can be almost as good and almost as clear as the average PSTN call. It's just that to get that level of quality, VoIP administrators have to do more than rely on the phone company to take care of things. Which simply means this: it's important to routinely monitor your VoIP calls and your VoIP network. To a large extent, you don't have to worry about the PSTN network. Although problems do occur on the PSTN network, they tend to be few and far between. VoIP networks, on the other hand, are prone to problems that simply do not occur on the PSTN network. However, by keeping tabs on the system, and by trying to solve little problems before they evolve into big problems, you can minimize these problems and minimize the impact of these problems.

How Do I Monitor VoIP Traffic?


In the previous section of this document, we tried to establish one thing: it's important to monitor your VoIP network. That's good advice, except for this: what should you be looking for when you monitor your VoIP network? In order to answer that question, let's first talk a little bit more about how VoIP networks work. By doing that, we'll learn more about the primary issues that affect VoIP networks and VoIP calls, including:

- Delay
- Packet loss
- Echo
- Jitter
- Clipping
- Noise

OK, so then how do you monitor VoIP traffic? Let's start by assuming that your VoIP phone has just rung, and you've picked up the phone and said, "Hello?" What happens now? Well, what happens now is this: your VoIP phone uses a built-in codec to transform your speech into digital signals that can be transmitted across the network. However, these signals aren't automatically transmitted the moment they're encoded. Instead, the codec creates individual packets containing a specified amount of speech (for example, 20 milliseconds worth of speech). As soon as a packet has been "filled," it is then sent across the network. The codec then immediately begins filling up the next packet. (Think of it all as being like bottles being filled on an assembly line. As soon as bottle 1 is filled, bottle 2 is queued up and gets filled in turn.)
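The packet-filling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16,000-samples-per-second rate and the function name packetize are ours, not something taken from Lync, and a real codec also compresses the audio rather than simply slicing it.

```python
SAMPLE_RATE_HZ = 16000  # assumed sample rate, for illustration only
PACKET_MS = 20          # 20 milliseconds of speech per packet, as in the text


def packetize(samples, sample_rate_hz=SAMPLE_RATE_HZ, packet_ms=PACKET_MS):
    """Split a sequence of audio samples into fixed-duration packets.

    The final packet may be shorter if the samples do not divide evenly.
    """
    samples_per_packet = sample_rate_hz * packet_ms // 1000
    packets = []
    for start in range(0, len(samples), samples_per_packet):
        packets.append(samples[start:start + samples_per_packet])
    return packets


# One tenth of a second of silence stands in for real speech.
speech = [0] * (SAMPLE_RATE_HZ // 10)
packets = packetize(speech)
print(len(packets), "packets of", len(packets[0]), "samples each")
```

Under these assumptions, a tenth of a second of audio fills five packets, and nothing can be sent until the first 320 samples have been collected. That waiting time is part of the price of packetizing speech.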

Twenty milliseconds is actually a very small amount of time. For example, blinking your eye typically requires 300 to 400 milliseconds. However, for illustration purposes let's pretend that each packet contains a single word. Suppose we say this: "Hey, how's it going?" Under our pretend system (where each packet contains a single word), the following packets would be created and transmitted across the network:

Packet number    Packet contents
1                Hey
2                how's
3                it
4                going

As you can see, we've already introduced a small amount of delay into the system. (Delay is defined, in somewhat simplistic terms, as the amount of time that elapses between the time you say something and the time the person on the other end of the line hears something.) We've introduced delay partly because the system needs to fill up a packet before transmitting any information and partly because the process of encoding speech into a digital packet requires a little bit of time as well.

Note. And yes, that means that the decoding process at the other end, where the digital signal is converted back to recognizable speech, is going to introduce even more delay.

On top of that, each time our packet needs to pass through a network router, we'll introduce additional delay. That's because each router needs to read the address information stored in the packet and then determine how to best route that packet toward its final destination. The longer a router takes to determine the "next hop" for that packet, the more delay is introduced into the journey.

Let's assume that our packet has arrived at its destination. Is that packet immediately played back to the listener? Typically the answer is no, at least not immediately. Why not? Well, as we learned a little while ago, TCP/IP networks cannot guarantee that network packets will arrive on a fixed delivery schedule. Ideally, if each packet contains 20 milliseconds of speech, then each packet would take 20 milliseconds to arrive at its destination:

Packet number    Packet arrival time (from the time you began talking)
1                20 milliseconds
2                40 milliseconds
3                60 milliseconds
4                80 milliseconds

In that ideal scenario, you could play the first packet and, at the exact moment that first packet finished, the second packet would arrive, and you could play that packet. Again, it's like an assembly line: no sooner do you finish screwing the top on widget 1 than widget 2 arrives, ready for its top.

But remember, that's in an ideal world. In the real world, and on a real TCP/IP network, you might get packet arrival times that look like this:

Packet number    Packet arrival time (from the time you began talking)    Travel time (delay)
1                17 milliseconds                                          17 milliseconds
2                61 milliseconds                                          44 milliseconds
3                74 milliseconds                                          13 milliseconds
4                103 milliseconds                                         29 milliseconds
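The numbers in the preceding table can be reproduced with a short Python sketch. It assumes, as the table appears to, that each packet's travel time is the gap between its arrival and the arrival of the packet before it; the variable names are ours. The sketch also reports the average travel time and the spread between the fastest and slowest packet.

```python
arrival_ms = [17, 61, 74, 103]  # arrival times from the table above

# Each travel time in the table is the gap since the previous arrival
# (the first packet is measured from the moment you began talking).
travel_ms = [now - before for before, now in zip([0] + arrival_ms, arrival_ms)]

average_ms = sum(travel_ms) / len(travel_ms)
spread_ms = max(travel_ms) - min(travel_ms)

print("travel times:", travel_ms)
print("average travel time:", average_ms)
print("spread (slowest minus fastest):", spread_ms)
```

For this table the travel times come out as 17, 44, 13, and 29 milliseconds, the average is 25.75 milliseconds, and the spread is 31 milliseconds (44 minus 13).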

As you can see, in this more-realistic scenario packet arrival times vary considerably: some arrive faster than you might expect while others arrive slower than you might expect. If you were to play these packets as soon as they arrived, you would introduce discontinuities in the speech: sometimes there would be unnatural pauses between words, while other times there might not be a long enough pause between words: "Hey ... how'sit ... going?" Either way, the conversation runs the risk of becoming garbled or unintelligible.

Note. The variations in packet arrival time are known as jitter. The amount of time it takes for packets to arrive is known as delay or latency; in the preceding example, we had an average delay of 25.75 milliseconds (103 milliseconds of travel time divided by four packets). However, the delay varied from 13 milliseconds to 44 milliseconds; that's 31 milliseconds of jitter (44 minus 13), which is slightly too high: for optimal conversation quality, you'd like to keep jitter below 20 milliseconds, although anything below 30 milliseconds is considered good.

So how do you combat the problem of jitter? One way, as we implied a moment ago, is to not play packets at the exact moment they are received. Instead, packets are held in a "jitter buffer" and then played after a specified amount of time. For example, suppose you have a jitter buffer of 80 milliseconds. In that case, you would wait 80 milliseconds before playing the first packet. In our preceding example, that would mean that three packets would arrive and be buffered before playback began:

Packet number    Packet arrival time (from the time you began talking)    Travel time (delay)
1                17 milliseconds                                          17 milliseconds
2                61 milliseconds                                          44 milliseconds
3                74 milliseconds                                          13 milliseconds
4                103 milliseconds                                         29 milliseconds
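Here is a minimal Python sketch of that jitter buffer. It rests on assumptions that are ours rather than anything specified for Lync: playback of packet 1 starts exactly 80 milliseconds after you began talking, each later packet is played 20 milliseconds after the one before it, and the function name playout_schedule is hypothetical. A packet counts as "on time" if it has arrived by its scheduled play time.

```python
JITTER_BUFFER_MS = 80  # buffer size from the example above
PACKET_MS = 20         # assumed amount of speech carried by each packet


def playout_schedule(arrival_ms, buffer_ms=JITTER_BUFFER_MS, packet_ms=PACKET_MS):
    """Return a (packet_number, arrival, play_time, on_time) tuple per packet.

    Packet n is scheduled to play at buffer_ms + (n - 1) * packet_ms.
    An arrival of None stands for a packet that never arrived; such a
    packet, or one that arrives after its play time, is not on time.
    """
    schedule = []
    for number, arrival in enumerate(arrival_ms, start=1):
        play_at = buffer_ms + (number - 1) * packet_ms
        on_time = arrival is not None and arrival <= play_at
        schedule.append((number, arrival, play_at, on_time))
    return schedule


arrivals = [17, 61, 74, 103]
buffered_at_start = [a for a in arrivals if a <= JITTER_BUFFER_MS]
print(len(buffered_at_start), "packets buffered when playback begins")
for row in playout_schedule(arrivals):
    print(row)
```

With the arrival times from the table, three packets are already waiting when playback begins at 80 milliseconds, and packet 4 (which arrives at 103 milliseconds) is in the buffer well before its turn at 140 milliseconds. The trade-off is that a larger buffer smooths out more jitter but adds that much more delay to the conversation.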

If all goes well, by the time you finish playing the first set of packets in the buffer, the next set of packets will have been received and there will be no discontinuity in playback. Of course, that leads to another problem: what happens if packet 3 is really late in arriving? For example, what if packet 3 arrives after packet 4 has been received and is ready to be played? Or what happens if packet 3 doesn't arrive at all?

The answer to that question might surprise you a little. TCP/IP networks, and the TCP typically used on those networks, are designed to resend any packets that get lost along the way. That's an excellent strategy to be used when copying files. After all, if you're copying an executable file to another computer and some packets are missing (that is, some parts of the file are missing), that executable file is not going to execute. If you copy a file and a packet is lost, the network will automatically ask the sender to retransmit the missing packet. It might take a little longer for the copying process to complete, but you'll be guaranteed to get an exact copy of the file being copied.

With voice calls, however, it's been shown that it's better to drop lost or delayed packets than to introduce even more jitter and discontinuity into a conversation. That sounds drastic, but remember, a voice packet might contain only 20 milliseconds of speech. Because of that, a dropped packet is typically going to consist of a very small bit of speech; dropping a packet is not the same thing as dropping entire words or phrases. For example, take this sentence: "Hey, how's it going?" If you need to drop a packet, then that might be equivalent to simply dropping the final g at the end of going: "Hey, how's it goin?"

The truth is, an occasional dropped packet will probably go unnoticed. That's partly because the human brain is remarkably adept at using context to figure out what was actually said and partly because VoIP systems use sophisticated packet loss concealment techniques to try and hide the fact that a packet was dropped. For example, instead of a sudden silence, the system might play a tiny bit of random noise or simply repeat the previous packet, all to help disguise the fact that a minuscule portion of the speech is missing.

Note. Of course, that works only if you need to drop an occasional packet.
On a highly congested network, you might find yourself dropping a large number of packets. Obviously those gaps are much harder to conceal and can lead to problems like clipping. As long as packet loss is limited to one out of every 1,000 packets everything will be fine. If you begin to lose more than one out of every 1,000 packets then conversation quality will begin to suffer.

The jitter buffer is one technique used by VoIP systems to try and compensate for some of the problems inherent in conducting voice calls across a data network. Another commonly used technique is voice activity detection. Voice activity detection is based on the fact that, in any given conversation, you spend as much as half your time not speaking. Silence is endemic throughout a conversation: there's silence on your end while the other person is talking; there's silence between the words and sentences you speak; there might be silence while you gather your thoughts or look up a phone number for someone.

What difference does that make? Well, consider this example. Someone calls and asks you if you know whether or not Ken is in his office. You say, "Hold on a second," get up from your desk, and pop your head out your door to see if Ken is in. During that entire time there is no conversation whatsoever; everything is silent.

As we noted earlier, when you make a VoIP call your phone dutifully records everything you say and then converts your speech to network packets that are transmitted across the network. Of course, in this case, there's nothing to record; that means that the voice packets sent across the network don't contain any actual speech. That might not sound like a big deal, but it is: packets are packets, regardless of whether or not there's anything actually in those packets. That means that you're using up resources and contributing to network congestion just so that you can transmit nothing at all. Again, suppose it's time for lunch at the assembly line, and all the workers leave. At that point, it makes far more sense to shut down the assembly line than to keep having widgets roll by even though no one is there to screw on the tops.

This is where voice activity detection comes into play. As the name implies, voice activity detection tries to determine whether or not you're actually speaking. If you're not, then voice activity detection temporarily stops recording and sending network packets, thus reducing network congestion. Does that make much of a difference? Yes: it's been estimated that voice activity detection lets you double the number of voice calls that can be carried on your network.

Note. That's the good news. The bad news? Voice activity detection can also lead to clipping, particularly at the beginning and the ending of a speech segment. For example, suppose you've been silent for a minute or two and now you suddenly begin speaking again. If voice activity detection is not quick enough to resume recording your speech, then it might miss the first sound or two. Take this sentence: "That's fine with me." With voice activity detection, there's always the chance that the preceding sentence could be transmitted something like this: "hat's fine with me."

Again, the point here is not to dissuade anyone from using a VoIP solution. Instead, the point is just to emphasize that, because VoIP calls use the same network as other data traffic, you can (and will) encounter problems that you won't encounter on the PSTN network. That doesn't mean you shouldn't use VoIP. It just means that you should periodically monitor the system to help prevent these potential problems from developing into real problems. In the next section of this document, we'll talk about how to monitor calls using Lync Server.
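
To make the voice activity detection idea (and the clipping risk) a little more concrete, here is a minimal Python sketch. It assumes a very simple energy threshold; real VoIP endpoints, including Lync endpoints, use far more sophisticated detection, and the frame size, sample values, and threshold shown here are illustrative assumptions only.

```python
SAMPLES_PER_FRAME = 160  # assumption: 20 ms of audio at 8,000 samples per second


def frame_energy(samples):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def frames_to_transmit(frames, threshold=0.01):
    """Return the indexes of frames loud enough to be treated as speech.

    Frames below the (assumed) energy threshold are treated as silence
    and are not sent across the network.
    """
    return [i for i, frame in enumerate(frames)
            if frame_energy(frame) >= threshold]


# Made-up audio: silence, the quiet first sound of a word, then clear speech.
silence = [0.0] * SAMPLES_PER_FRAME
soft_onset = [0.05] * SAMPLES_PER_FRAME
speech = [0.5] * SAMPLES_PER_FRAME

frames = [silence, silence, soft_onset, speech, speech, silence]
sent = frames_to_transmit(frames)
print(sent)  # indexes of the frames that would be transmitted
```

Only the two clear speech frames are transmitted. The silent frames are suppressed, which saves bandwidth, but the quiet onset frame is suppressed too; that is the kind of clipping the note above describes.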

What Should I Look for When Monitoring VoIP Calls?


One of the great things about VoIP calls is the fact that monitoring capabilities are built right into the call itself. VoIP calls are typically transmitted by using a pair of protocols: the real-time transport protocol (RTP) and the real-time transport control protocol (RTCP). RTP is used to transmit the actual data and ancillary information, such as a time stamp. (The time stamp indicates when the packet was created and is used to ensure that received packets are played in the correct order.) RTCP, meanwhile, keeps detailed information (such as jitter and packet loss)
about the transmission itself. If you have set up Lync Server Monitoring Server then, at the conclusion of a call, Monitoring Server will collect these statistics from each endpoint and store them in a SQL Server database. Each endpoint will send 250 (or more) bits of information about a single call. Multiply that by the number of calls made in your organization, and you'll probably have far more information than you can readily deal with. Because of that, we should take a little time to discuss some of the key metrics to be used as part of routinely monitoring audio calls made using Lync Server.

Poor Call Percentage


In Lync Server, a "poor call" is any call that reports an unacceptable value for at least one of the major call metrics (including round trip, packet loss, degradation, and jitter). For example, a call that has excessive round trip times will be marked as a poor call even if other metrics (such as packet loss, degradation, and jitter) are within the acceptable range.

Note. With Lync Server, calls are either rated good, or they are rated poor; there is no category in between the two.

One thing we should also mention is this: when looking at the poor call percentage (or when working with any of the Monitoring Server Report metrics), it's important to look at both the number of poor calls and at the total number of calls. For example, suppose you had a poor call percentage of 25% but had only a total of four calls. That means you really had only one poor call (out of four). In a case like that, it might be premature to draw any conclusions other than this: you should continue to monitor call quality to see if that value goes up or down when users make more and more calls.

Likewise, suppose you have a poor call percentage of 6%. Is that a cause for alarm? Maybe: although that percentage is lower than 25%, if you've made 10,000 calls, then a poor call percentage of 6% means you've had 600 poor calls. That could equate to a large number of unhappy users, which means that it might be worth looking more closely at those poor calls to see if they share a common cause. For example, if 543 of the 600 calls experienced high jitter values, then you can safely assume that jitter is responsible for most of your call-related problems. In turn, you can then try to determine why your network is experiencing so much jitter.
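
The percentages in these examples are easy to verify. The following Python sketch uses a hypothetical helper named percentage; it simply illustrates why the raw counts matter as much as the percentage itself and is not how Monitoring Server computes its reports.

```python
def percentage(part, whole):
    """Return part as a percentage of whole (0.0 if there is no data)."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


# One poor call out of four: a high percentage, but very little data.
small_sample = percentage(1, 4)

# 600 poor calls out of 10,000: a lower percentage, but many unhappy users.
large_sample = percentage(600, 10_000)

# Share of those 600 poor calls that reported high jitter.
jitter_share = percentage(543, 600)

print(small_sample, large_sample, jitter_share)
```

The third figure shows that roughly nine out of ten poor calls in the example share the same symptom, which is why jitter is the obvious place to start investigating.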

Round Trip
What's round trip? Well, when you speak into your microphone as part of a VoIP conversation, everything you say gets encoded into a series of network packets. Each individual packet gets sent across the network to your listener; in return, the listener's endpoint sends a packet acknowledging that your packet was received. The amount of time it takes for you to say something and then receive this acknowledgement is known as the round-trip time. Round trip is sometimes referred to as latency; however, because latency can also mean a one-way trip (that is, the time it takes for a packet to arrive at its destination, without any concern for the acknowledgement packet), Monitoring Server uses the term round trip instead.

Round trip times are important for a very obvious reason: the longer it takes for network packets to be delivered, the less natural a conversation sounds. People expect the words in a sentence to be recited one right after the other: "This is how a sentence is supposed to sound." People do not expect there to be long breaks between words (or within words): "This ... is not how ... a sen ... tence is sup ... posed to ... sound."

Round-trip times are measured in milliseconds. As a general rule, round-trip times of 200 milliseconds or less result in good quality conversations; round-trip times between 200 and 300 milliseconds can devolve into choppy sounding speech because the words or sounds do not smoothly flow from one to the other. Long round-trip times can result in dropped packets and echo. As round-trip times exceed 500 milliseconds, conversation can become difficult. For example, a user will say something and then, after not hearing a reply, will say the same thing over again. At about the same time the first user repeats her statement, the reply is finally received, and the callers experience the less-than-ideal situation of talking over one another.

Long round-trip times are typically the result of network congestion (although they can also be due to routing misconfiguration issues). The distance between two endpoints also plays a role: the farther a packet has to travel (as in a long-distance call), the longer its round-trip time. Likewise, the farther a packet has to travel, the more network switches and routers it likely has to traverse. Each switch and each router adds a tiny bit of delay to the trip, further increasing the round-trip time.
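
The rules of thumb above can be written as a small Python sketch. The function name and the wording of the labels are assumptions made for illustration; in particular, the text does not name the range between 300 and 500 milliseconds, so the label used for it here is only a placeholder.

```python
def describe_round_trip(rtt_ms):
    """Map a round-trip time in milliseconds to the rough guidance above."""
    if rtt_ms <= 200:
        return "good"
    if rtt_ms <= 300:
        return "may sound choppy"
    if rtt_ms <= 500:
        # Assumed label: the text says only that long round-trip times
        # can result in dropped packets and echo.
        return "long"
    return "conversation becomes difficult"


for sample_ms in (120, 250, 400, 650):
    print(sample_ms, describe_round_trip(sample_ms))
```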

Degradation (MOS)
The mean opinion score (MOS) was originally created as a way to rate the quality of phone calls on the PSTN network. In those days, a group of people would gather in a room, listen to a collection of audio snippets, and then be asked to rate each snippet on the following scale:

Score    Description
5        Excellent. No problems can be heard during the conversation.
4        Good. An occasional problem is noticeable, but these problems do not interfere with the conversation.
3        Fair. Problems can be a little annoying, but, in general, conversation can still be carried out.
2        Poor. Problems are extremely annoying. Conversation is still possible, but it's difficult.
1        Bad. Problems are so bad that conversation is next-to-impossible.

The ratings from each person would then be averaged to determine the MOS for a given audio sample. Historically, calls on the PSTN network are quite good: they have an MOS of between 4.3 and 4.5. Cell phone calls do not fare quite as well: cell phone calls receive an average score between 3.8 and 4.0. To put that in perspective, calls with an MOS of 4.0 or higher are
considered very acceptable by most people. As the MOS falls to 3.0 or lower, conversations are still good, but more and more people will find the quality unacceptable. Calls with a score of 2.6 or lower will usually result in one or more of the parties hanging up and trying the call again.

In the past, an MOS was calculated by having people listen to, and rate, actual audio samples. Today, however, VoIP systems (including Enterprise Voice) use computer algorithms to estimate the MOS for a call. That provides useful information, but it's information that comes with a catch. The catch is that the MOS was originally developed for use on the PSTN network, and, as we noted earlier, the PSTN network has been optimized for voice communications. As we've also seen, that's not the case on VoIP networks: VoIP networks are designed to carry all sorts of data, not just voice data. In that regard, you shouldn't compare the MOS of PSTN calls with the MOS of VoIP calls: the PSTN calls will almost always win.

Another reason for PSTN's superiority is the fact that the codecs used to encode and decode VoIP packets always result in at least some loss of audio quality; even the highest-quality VoIP codecs (like the ones used in Lync Server) can provide a maximum MOS of only around 4.1. Again, that means that there is little to be gained in comparing the MOS of VoIP calls to the MOS of PSTN calls: it's already well-known that, on average, PSTN calls are going to get better ratings than VoIP calls. Because of that, one of the key metrics reported isn't the overall MOS; instead it's the degradation score.

What's the degradation score? To begin with, the degradation score is based on the codec that was used and the maximum MOS that codec could be expected to achieve. For example, let's assume we have a codec that has a maximum MOS of 4.1; however, the call we just made received an overall MOS of 3.7.
Obviously, we didn't have the best possible call: we achieved an MOS of only 3.7 out of a possible 4.1. But why didn't we have the best possible call? As you might expect, there are many reasons why a call might be less than optimal. For example, there might be considerable background noise in your office; that will affect conversation quality. You might have a microphone that isn't working correctly, or the person you're talking to might be using the speakers built into her laptop rather than a high-fidelity headset. Alternatively, you might be having network issues: jitter, packet loss, delay, and all those other things we've discussed in this document. Or you might be encountering all these problems.

The degradation score provides an estimate of how much the network affected the MOS; that means that the degradation score considers only network-related factors (such as jitter, packet loss, and latency) and ignores such things as speech level and noise level. Instead, the degradation score is based on how well voice packets were able to traverse the network and is not concerned with the sound quality contained within those packets. For example, suppose the degradation score is .3. As you might recall, we fell .4 points short of the best-possible MOS (4.1 minus 3.7). A degradation score of .3 suggests that network issues were responsible for lowering the MOS by .3 points. In other words, most of the problem with the call was due to the network.

As a general rule, you should be concerned about network issues if you consistently see degradation scores higher than .5. If you consistently see degradation scores higher than 1.0, that almost always indicates serious problems with the network.
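
The degradation example and the two rules of thumb can be sketched in Python as follows. The function names and return labels are assumptions made for illustration; the sketch only reproduces the arithmetic from the example (a maximum MOS of 4.1, a measured MOS of 3.7, and a degradation score of .3) and does not show how Lync Server estimates these values.

```python
def network_share(max_mos, measured_mos, degradation):
    """Fraction of the MOS shortfall attributed to the network."""
    shortfall = max_mos - measured_mos
    if shortfall <= 0:
        return 0.0
    return degradation / shortfall


def degradation_concern(typical_degradation):
    """Apply the rules of thumb to a degradation score seen consistently."""
    if typical_degradation > 1.0:
        return "serious network problems likely"
    if typical_degradation > 0.5:
        return "network issues worth investigating"
    return "no particular network concern"


# The call from the example fell .4 short of the codec maximum; .3 of that
# shortfall is attributed to the network.
share = network_share(max_mos=4.1, measured_mos=3.7, degradation=0.3)
print(round(share, 2))
print(degradation_concern(0.3))
```

In the example, about three quarters of the shortfall is attributed to the network, even though a degradation score of .3 on its own is below the level that should cause concern.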

Packet Loss
Packet loss refers to network packets that were recorded but never played at the receiving end of the conversation. Packets can be lost because they quite literally were lost: they never arrived at their destination. Alternatively, packets can be dropped by the system because, while they did arrive, they arrived too late to be played. Either way, missing packets are a problem because each packet that does not get played represents a portion of the conversation that is missing. Suppose each packet contains an entire word (which, in reality, it doesn't) and that every other packet in the conversation is lost. That would mean that this sentence: "This is an example of packet loss and how it can affect a conversation." would be rendered like this: "This an of loss how can a." Although that's a gross exaggeration of packet loss and its effects, there's no doubt that the fewer packets lost the better the conversation.

Packet loss rates are reported as a percentage of the total packets sent; for example, if you sent 100 packets and three of them were lost, that would be a packet loss rate of 3%. In general, your packet loss rate should be quite a bit lower than 3%: packet loss rates greater than .1% (1 out of every 1,000 packets) can start to have a noticeable effect on conversation quality. Packet loss is almost always due to network congestion: the less traffic and congestion on your network, the more likely your packets will arrive and arrive in a timely fashion.
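
Both the exaggerated "one word per packet" example and the loss-rate arithmetic can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name packet_loss_percent is a hypothetical helper.

```python
def packet_loss_percent(packets_sent, packets_lost):
    """Packet loss as a percentage of the total packets sent."""
    return 100.0 * packets_lost / packets_sent


# Pretend each word is one packet and every other packet is lost.
sentence = "This is an example of packet loss and how it can affect a conversation"
words = sentence.split()
received = " ".join(words[::2])  # keep packets 1, 3, 5, ...
print(received)

# Three lost packets out of 100 sent.
loss = packet_loss_percent(100, 3)
print(loss, loss > 0.1)  # compared against the .1% rule of thumb
```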

Jitter
When you speak into a VoIP endpoint (like an IP phone), everything you say is recorded into a series of network packets, with each packet containing a finite amount of speech data (for example, 20 milliseconds' worth). Those packets are then transmitted across the network, where the other endpoint plays back the contents of each packet. (Imagine recording your speech on a series of very small audio tapes and then having those audio tapes replayed by your listener, one after another, on a very tiny tape recorder.)

Ideally, packets would arrive at their destination at regular, predictable intervals. For example, if you just now received the first packet, the ideal situation would be for packet 2 to arrive 20 milliseconds later and packet 3 to arrive 20 milliseconds after that. Why is that ideal? Because then you could play packet 1 the moment it was received and be confident that when packet 1 finishes playing, packet 2 will arrive.

Unfortunately, though, there is no way to guarantee packet arrival time on a VoIP network. Packet 2 might arrive 20 milliseconds after packet 1, but packet 3 might not show up until 45 milliseconds later. Packet 4 might take another 83 milliseconds before it arrives, and yet packet
5 might show up just 22 milliseconds after that. This variation in packet arrival times is known as "jitter." Why? Well, a graph of hypothetical packet arrival times will likely give you a hint:

Keep in mind that jitter is not the same thing as packet loss, although there is definitely an overlap between the two. Packet loss refers to packets that were dropped and never played; this could be because the packets never arrived or because they arrived out of sequence or too late to be played. As we noted, jitter measures the variation in packet arrival times. You could have a conversation with no packet loss whatsoever and yet that conversation could still have a high amount of jitter. That would occur if all the packets arrived in time to be played, but there was a wide variability in the amount of time it took for each packet to arrive. In general, though, excessive jitter leads to excessive packet loss, as late-arriving packets are discarded in order to keep the conversation flowing.

Ideally, jitter values will not exceed 20 milliseconds. Jitter values higher than 30 milliseconds will begin to lead to problems such as echo and dropped packets.

Up to this point, our discussion has focused on performance monitoring. It's now time to switch gears and talk about monitoring for system usage.
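
As a rough illustration of the jitter discussion above, the following Python sketch measures how far the packet gaps from the earlier example (20, 45, 83, and 22 milliseconds) stray from the ideal 20 millisecond spacing. This is a deliberately simplified measure chosen for illustration; it is not the jitter calculation defined for RTP, and it is not necessarily the value Monitoring Server reports.

```python
from statistics import mean

EXPECTED_GAP_MS = 20  # each packet holds 20 milliseconds of speech

# Gaps between consecutive packets, taken from the example in this section.
gaps_ms = [20, 45, 83, 22]

# How far each gap deviates from the ideal spacing.
deviations_ms = [abs(gap - EXPECTED_GAP_MS) for gap in gaps_ms]

# Simplified jitter figure: the average deviation from the ideal spacing.
average_jitter_ms = mean(deviations_ms)


def describe_jitter(jitter_ms):
    """Apply the rules of thumb from the text (labels are illustrative)."""
    if jitter_ms <= 20:
        return "ideal"
    if jitter_ms <= 30:
        return "above ideal"
    return "likely to cause problems"


print(deviations_ms)
print(average_jitter_ms, describe_jitter(average_jitter_ms))
```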

What Does It Mean to Monitor for Usage?


When people talk about monitoring for Lync Server, they're often talking about the QoE monitoring just discussed: how many packets are being dropped on the network, how much jitter does the average call experience, what is the MOS for my Lync-to-Lync calls, and so on? What's wrong with monitoring for QoE data? Absolutely nothing: in fact, you should be monitoring for QoE data. However, in addition to periodically assessing call quality, and in addition to relying on
Monitoring Server Reports as a troubleshooting tool, it's a good idea to routinely monitor your system usage. Why? As you might expect, there are a number of answers to that question. As we noted earlier, problems with VoIP calls are often due to network congestion: key metrics such as lost packets, jitter, and long round-trip times are often due to overloaded networks. Is your network overloaded? That's a difficult question to answer at best and pretty much an impossible question to answer if you aren't familiar with your network usage patterns.

But suppose you've gotten a number of complaints from users saying that their phone calls have been of unacceptable quality. In looking at the QoE data for each call, you notice that many of the calls were taking place between the hours of 11:00 AM and noon. In checking the overall usage records for Lync Server, you might see that most of the calls being made in your organization are taking place during that very same time period. If your QoE ratings for alternate time periods are acceptable, that would suggest that your problem is due to too many users trying to make calls during that one-hour time period.

In other words, usage information can help you troubleshoot existing problems; this kind of information can also help you predict, and perhaps avoid, future problems. For example, suppose you monitor system usage over time and notice that more and more calls are being made each week; at the same time, the overall call quality is declining bit by bit. This implies that, if network use continues to climb at the same rate, your call quality might eventually decline to unacceptable levels. Knowing that now might allow you to proactively solve the problem, perhaps by adding bandwidth, by implementing call admission control, or maybe by educating users on the need to limit the use of video calls whenever possible.

Usage information also helps you determine whether the fixes that you have implemented are working.
For example, suppose one of your solutions to the network congestion problem was to encourage users to schedule conferences during "quieter" periods (9:00 AM to 10:00 AM, for example). Ideally, that will do two things: 1) lessen network congestion by reducing the number of calls that take place between 11:00 AM and noon; and, 2) by lessening network congestion, raise the QoE scores for all your calls. To know whether or not you have lessened network congestion, you need to do usage monitoring. And unless you do that type of monitoring, you won't be able to claim that a less-congested network is responsible for a rise in QoE scores. Usage monitoring can also help you determine whether you are reaching goals that you set for yourself (or the goals that were set for you) before you first implemented Lync Server. For example, suppose your intention was that, shortly after you deployed Lync 2010, 50 percent of your users would be actively logging on and using the system. Likewise, suppose you expected that, after users had a chance to learn about and adjust to the new system, calls to your help desk would decrease. Usage monitoring can help you answer these kinds of questions. Note. For details about how to retrieve this kind of data, see the document Understanding the Monitoring Server Reports, also in this download package. Usage information can also provide a way to estimate user satisfaction with Lync. For example, the User Registration Report can be used to estimate the number of users who are logging on to Lync Server and, equally important, the number of users who are actively using Lync Server. If users are not logging on to Lync Server, or if they are logging on but not making calls or sending
instant messages, that could indicate that they either don't understand how to use these new tools or that they don't like using these new tools. (Of course, there could be other reasons for this as well.) If peer-to-peer calls are declining but conferences are increasing, that might suggest that users are working more collaboratively. The data found in Monitoring Server Reports does not let you draw any conclusions in the area of user satisfaction, but the data does suggest topics that you might want to investigate in more detail.
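
The peak-hour analysis described earlier in this section (noticing that most calls fall between 11:00 AM and noon) can be sketched in Python. The timestamps below are made up for illustration, and the function name calls_per_hour is a hypothetical helper; in practice the call records would come from the Monitoring Server Reports or the Monitoring Server database, whose layout is not shown here.

```python
from collections import Counter
from datetime import datetime

# Made-up call start times, for illustration only.
call_starts = [
    "2011-05-02 09:15",
    "2011-05-02 10:40",
    "2011-05-02 11:05",
    "2011-05-02 11:20",
    "2011-05-02 11:45",
    "2011-05-02 11:50",
    "2011-05-02 14:30",
]


def calls_per_hour(timestamps):
    """Count how many calls started in each hour of the day."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps)
    return Counter(hours)


usage = calls_per_hour(call_starts)
busiest_hour, busiest_count = usage.most_common(1)[0]
print(busiest_hour, busiest_count)
```

Comparing a count like this with the QoE data for the same hours is one way to see whether poor calls cluster around the busiest period.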
