
Speech Quality Issues & Measurement Techniques Overview

With the growing proliferation and popularity of Voice-over-IP (VoIP) networks, speech quality testing and verification has never been more important. Because of this, CT Labs maintains a significant focus in this area. When evaluating how well an IP telephony product performs in terms of speech quality, any automated lab measurements and analysis should ultimately be traceable back to live-listener quality ratings, as discussed later in this paper.

Sources of IP Packet Degradation


One factor to consider when measuring IP telephony speech quality is the quality of service of the IP data stream. Any wide area network (WAN) that interconnects communication nodes can undergo a variety of degraded conditions that cause anywhere from mild to serious speech quality degradation. For example, lost packets can occur when critical-path routers in a congested network exceed their available outgoing segment bandwidth. When a real-time speech data stream is degraded by lost packets, the result can be anything from brief audible pops and clicks to completely unintelligible intervals of noise. WAN latency, the delay from the sender to the ultimate receiver of the real-time packet stream, is caused by the cumulative effect of all transmission delays, data buffers, and jitter buffers in the data path, as well as voice coder/decoder algorithms in VoIP gateways and other network-edge devices. Latency can also significantly degrade the perceived quality of the received speech by forcing callers to revert to a half-duplex "push to talk" mode of exchange. Jitter, the non-uniform arrival of periodically transmitted real-time data packets, can further degrade quality if it is not corrected at the receiving end before the audio is reconstructed from the data stream.
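The three impairments described above can be quantified from a per-packet trace. The following is a minimal Python sketch, not a CT Labs tool: the `Packet` record and the function names are hypothetical, and the jitter estimator borrows the running-average formula from RFC 3550 (RTP), which this paper does not itself prescribe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    """Hypothetical trace record for one voice packet."""
    seq: int                   # sender sequence number
    sent_ms: float             # send time in milliseconds
    recv_ms: Optional[float]   # receive time in milliseconds, None if lost


def loss_ratio(packets: List[Packet]) -> float:
    """Fraction of transmitted packets that never arrived."""
    if not packets:
        return 0.0
    lost = sum(1 for p in packets if p.recv_ms is None)
    return lost / len(packets)


def mean_latency_ms(packets: List[Packet]) -> float:
    """Average one-way delay over the packets that did arrive."""
    delays = [p.recv_ms - p.sent_ms for p in packets if p.recv_ms is not None]
    return sum(delays) / len(delays) if delays else 0.0


def interarrival_jitter_ms(packets: List[Packet]) -> float:
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16.

    Assumes the list is in arrival order and that sender and receiver
    clocks are comparable; lost packets are skipped.
    """
    jitter = 0.0
    prev_transit = None
    for p in packets:
        if p.recv_ms is None:
            continue
        transit = p.recv_ms - p.sent_ms
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
    return jitter
```

A stream whose packets all take the same time to cross the network reports zero jitter no matter how large the latency is, which is why the two quantities are measured separately.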

[Figure: Speech Quality Test Setup, VoIP Gateways. Labeled elements: WAN Simulator, IP Gatekeeper, IP, VoIP Gateway #1, Quality Analyzer / Call Generator, VoIP Gateway #2.]

Ultimately, speech quality analysis of IP telephony equipment is only valid when conducted under the same IP channel conditions that will exist in the real world. Thus, any lab tests must incorporate accurate simulation of these conditions as part of the overall test plan. The above figure illustrates an approach that involves interconnecting the IP telephony devices under test through a WAN simulation unit that allows precise setting of degraded IP channel conditions. More advanced versions of this type of simulator allow multiple WAN segments to be set with different, or even varying, degraded conditions, which can simulate more complex network topologies. Performing speech quality tests in this type of lab-controlled environment can yield highly reproducible results, which is important for manufacturers that wish to quickly resolve detected quality problems.
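To illustrate what "precise setting of degraded IP channel conditions" means in practice, here is a toy Python model of a single WAN segment. It is only a sketch under simple assumptions (independent random loss, a fixed base delay, uniformly distributed extra delay); the function `simulate_wan` and its parameters are hypothetical and do not describe any particular WAN simulator product.

```python
import random
from typing import List, Optional


def simulate_wan(send_times_ms: List[float],
                 loss_prob: float,
                 base_delay_ms: float,
                 jitter_ms: float,
                 seed: int = 0) -> List[Optional[float]]:
    """Return an arrival time per packet, or None where the packet is dropped.

    Assumed model: each packet is lost independently with probability
    loss_prob; surviving packets are delayed by base_delay_ms plus a
    uniform random amount in [0, jitter_ms]. A fixed seed makes every
    run identical, mirroring the reproducibility of a lab setup.
    """
    rng = random.Random(seed)
    arrivals: List[Optional[float]] = []
    for t in send_times_ms:
        if rng.random() < loss_prob:
            arrivals.append(None)  # packet dropped by the simulated segment
        else:
            arrivals.append(t + base_delay_ms + rng.uniform(0.0, jitter_ms))
    return arrivals


# One packet every 20 ms through a segment with 5% loss, 80 ms delay, 30 ms jitter.
sends = [20.0 * i for i in range(50)]
arrivals = simulate_wan(sends, loss_prob=0.05, base_delay_ms=80.0, jitter_ms=30.0)
```

Because the impairments are driven by a seeded generator, rerunning the same settings reproduces the same loss and delay pattern, so a detected quality problem can be replayed exactly after a fix.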

Speech Quality Analysis Techniques


CT Labs can perform a variety of speech quality analysis techniques on processed speech samples gathered from IP telephony and CT devices under test. One of the standard tests in the CT Labs VoIP Certification Suite is the Speech Quality Sampling Test, which gathers processed speech samples for later quality analysis. The following speech quality analysis techniques are currently offered by the Labs:

Mean Opinion Score (MOS)

This is a live listener test designed to yield a single numeric score that rates the perceived speech quality of the analyzed audio sample. For scores to be valid, this test must be conducted according to the published guidelines in ITU-T Recommendation P.80. While more time-consuming and costly than automated quality analysis techniques, MOS is considered the authoritative way to rate perceived speech quality in communication systems. One common misconception about this type of testing is that it requires carefully trained listeners. In fact, just the opposite is true: the test requires untrained, unbiased listeners. However, the test itself must be conducted in a carefully controlled environment. That requirement, coupled with the fact that the raw test scores must be carefully analyzed for validity, makes it unsuitable for general product manufacturer QA labs. While CT Labs conducts MOS testing and degraded-sample gathering directly on telecommunication products, all MOS scoring work is outsourced to a highly experienced company whose only business is live listener testing.

Perceptual Analysis/Measurement System (PAMS)

This is an automated speech quality analysis technique that predicts the speech quality and listening effort for collected speech samples. This repeatable and objective test derives a set of scores by comparing one or more high-quality reference speech samples to the resulting processed (degraded) audio.

This technique is ideally suited to evaluating perceived speech quality when voice is packetized and the voice packets are subjected to degraded conditions, including the time misalignment that can result from bit and packet loss in frame-oriented communication systems. PAMS can usually be performed quickly, returning quality measurement results that are typically within 0.5 of a live-listener MOS test rating. The PAMS listening quality and listening effort scores are based on a five-point category judgement scale as follows:
Score  Listening Quality  Listening Effort
5      Excellent          Complete relaxation possible; no effort required
4      Good               Attention necessary; no appreciable effort required
3      Fair               Moderate effort required
2      Poor               Considerable effort required
1      Bad                No meaning understood with any feasible effort
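A mean opinion score is conventionally the arithmetic mean of the individual listener ratings given on a five-point scale like the one above. The Python sketch below shows that calculation together with a lookup of the nearest listening quality category. The function names are hypothetical, and rounding a fractional score to the nearest category is an illustrative assumption, not part of any recommendation cited here.

```python
from typing import List

# Listening quality categories from the five-point scale above.
QUALITY_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings: List[int]) -> float:
    """Average a panel's integer ratings (1 to 5) into a single score."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in QUALITY_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


def nearest_label(score: float) -> str:
    """Map a fractional score to the closest category (halves round up).

    This rounding rule is an assumption made for illustration only.
    """
    clamped = min(5.0, max(1.0, score))
    return QUALITY_LABELS[int(clamped + 0.5)]


panel = [4, 4, 5, 3]                 # placeholder ratings, not measured data
score = mean_opinion_score(panel)    # 4.0
label = nearest_label(score)         # "Good"
```

A real listener test also involves validity screening of the raw scores, as noted earlier, which this sketch does not attempt.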

While the PAMS technique produces excellent quality rating results, its scores may not always correlate perfectly with live-listener MOS scores obtained on the exact same speech samples, depending on the nature of the audio degradation. For this reason, PAMS testing should be considered a very good approximation of MOS rather than an equivalent test. When CT Labs performs PAMS testing, it retains the processed speech samples so that they may be submitted for MOS analysis at a later time without the need to stage additional equipment.
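When both automated scores and live-listener MOS ratings exist for the same retained samples, their agreement can be checked directly. The following Python sketch computes the Pearson correlation and the fraction of samples whose automated score lands within a tolerance (0.5 by default, matching the figure quoted earlier) of the listener rating. The function names are hypothetical and any values passed in are the caller's own data; nothing here reproduces actual PAMS output.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def fraction_within(predicted: Sequence[float],
                    listener: Sequence[float],
                    tolerance: float = 0.5) -> float:
    """Share of samples whose predicted score is within tolerance of the MOS."""
    if len(predicted) != len(listener) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(1 for p, m in zip(predicted, listener) if abs(p - m) <= tolerance)
    return hits / len(predicted)
```

A high correlation with a low within-tolerance fraction would suggest a systematic offset between the two methods, while the reverse would suggest scatter on particular degradation types.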


Figure 2: PAMS Measurement System Error Surface Display

The screen shot above illustrates the error surface display that is provided with PAMS scores. The error surface highlights the impact of a wide range of network-induced distortions, including front-end clipping, muting, noise, frequency rolloff, and bit or frame errors. In the illustrated example, the reference and degraded audio are displayed along with an error trace that highlights the differences between the reference audio input and the processed result.

Perceptual Speech Quality Measure (PSQM)

This is a popular automated speech quality analysis method that predicts the speech quality of processed speech samples. The technique, as defined in ITU-T Recommendation P.861, is appropriate for estimating the perceived quality of speech samples gathered in environments that are not subject to transmission bit or frame errors, frame erasures, or other modes of transmission loss. This means standard PSQM tests should only be conducted in clear-channel environments. When CT Labs performs PSQM tests, it uses PSQM+, a version of the algorithm that has been enhanced to account for the severe distortions and time clipping experienced in packet networks. From practical lab experience, PSQM+ scores can be sensitive to audio levels, so extra care must be taken during the test setup phase. The technique was originally designed for automated comparative quality analysis of compressed-voice vocoder algorithms but has been popularized by a number of test equipment vendors as a general quality analysis technique. If carefully and appropriately applied, it can be used successfully for this purpose.
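Because scores can be sensitive to audio levels, one common precaution during test setup is to compare the level of the processed sample with that of the reference and correct any gross mismatch before scoring. The Python sketch below does this using RMS level. It is an illustrative assumption about such a pre-check: the function names are hypothetical, and it does not describe how PSQM+ itself treats levels or how CT Labs sets up its tests.

```python
from math import log10, sqrt
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(degraded: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Level of the degraded sample relative to the reference, in dB."""
    ref_level, deg_level = rms(reference), rms(degraded)
    if ref_level == 0 or deg_level == 0:
        raise ValueError("cannot compare levels of a silent signal")
    return 20.0 * log10(deg_level / ref_level)


def match_level(degraded: Sequence[float],
                reference: Sequence[float]) -> List[float]:
    """Scale the degraded sample so its RMS level equals the reference's."""
    deg_level = rms(degraded)
    if deg_level == 0:
        return list(degraded)  # silence cannot be scaled to a target level
    gain = rms(reference) / deg_level
    return [s * gain for s in degraded]
```

Logging the measured level difference before applying any correction keeps a record of gain errors in the device under test, which may themselves be a finding worth reporting.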


Telecom Devices Tested


CT Labs can test a wide variety of telecom products for speech quality. The following is a partial list of equipment that can be tested at CT Labs for speech quality along with the corresponding features that can be verified:

CT / IPT Product and Speech Quality Features Tested

VoIP gateways

Verify speech quality of audio channels under varying IP network conditions (packet loss, latency, jitter), different vocoders, and various equipment settings (jitter buffer depth, frame packing factors, etc.). Speech quality tests can be conducted as single-channel tests or as random speech quality sampling under high-density channel load tests (see note 1).

IP-based PBXs

Verify quality of recorded voice mail messages under different speech encoding settings (e.g. PCM, ADPCM, 6 kB/s, 8 kB/s, etc.). Verify quality of internal talk path (station-to-station), both as a single-conversation test and as a speech quality sampling test under multi-conversation loading conditions. Verify quality of remote (IP) talk path as identified under the VoIP gateways item above.

IP telephones

Verify quality of audio transmit / receive path (see note 2).

Voice mail systems

Verify quality of recorded messages under different speech encoding settings (e.g. PCM, ADPCM, 6 kB/s, 8 kB/s, etc.). Verify quality of internal speech prompts using live CT Labs-listener audit.

Unified messaging systems

Verify quality of recorded messages. Verify quality of email text-to-speech component using live CT Labs-listener audit.

Notes

1. Testing speech quality under high channel loads is an important test that verifies there is no additive distortion from system resource starvation (e.g. inadequate processor or memory resources), which can significantly degrade quality.
2. For IP phone handset earpiece and microphone elements that cannot be coupled electrically to our test head, a special acoustic setup may be required.

Revision: 10-23-2000 CJB

Copyright 2000, CT Labs, Inc.
