
Full HD Voice

Huawei

Just like speaking face to face.

October 2014

Full High Definition (Full HD) Voice refers to the next generation of voice
quality for telephony audio, giving crystal clear voice quality compared with
digital telephony "toll quality" and even with HD Voice. Full HD Voice extends
the frequency range of audio signals up to 20000 Hz, which covers the whole
range of the human voice and of human hearing.

Enterprise VoIP

The AMR-WB codec, representing HD voice quality, was completed by 3GPP
(The 3rd Generation Partnership Project) in 2001 and since that time codec
technology has developed significantly. Codecs such as ITU-T G.718 have
shown enhanced performance in poor radio channels and codecs such as 3GPP
AMR-WB+ have demonstrated better quality for music signals. In March 2010
3GPP completed a study item on use-cases for Enhanced Voice Services (EVS)
over the Evolved Packet System of LTE. This study [1] led directly to the
development of the EVS Codec which will be completed in 3Q2014.
The EVS Codec represents a huge improvement in terms of speech/audio
quality and functionality when compared to existing conversational (low delay)
codecs. For the first time a 3GPP conversational codec will combine high quality
speech and music performance across four bandwidths: Narrowband (NB = 200 -
4000 Hz), Wideband (WB = 50 - 8000 Hz), Superwideband (SWB = 50 - 16000 Hz)
and Fullband (FB = 50 - 20000 Hz). This level of performance exceeds that of
all existing 3GPP codecs, in particular the AMR-WB codec, which led to the
creation of the GSMA HD Voice Logo. That logo has been successful in
encouraging the deployment of AMR-WB services.
The EVS Codec is also able to compete directly in over-the-top VoIP applications
with codecs such as the recently introduced Opus. EVS is available in both
fixed-point and floating-point versions, making it suitable for low-power
devices and PCs alike.
This document first presents the services and features of the existing 3GPP
Wideband Codec (AMR-WB) and describes the current HD Voice Logo. Over-the-top
codecs such as Opus are described and then the performance and features
of the EVS Codec are examined. Finally we examine a new Full HD Voice Logo
and the immersive sound experience of the future.


2014-10-20


Contents
Introduction
HD Voice and the 3GPP AMR-WB Codec
Over the Top Conversational Codecs
Full HD voice and new EVS Codec for VoLTE
Features and Performance of the EVS Codec
Why Operators should deploy EVS
EVS impact on VoLTE
Full HD Voice proposal in GSMA
Future Voice: EVS Beyond 3GPP Release 12
References


Introduction
In March 2010 3GPP completed a study item on use-cases for Enhanced Voice
Services (EVS) over the Evolved Packet System of LTE. This study [1] led directly to
the development of the EVS Codec, which was completed in 3Q2014. After a
competitive qualification phase, a consortium of all of the qualified codec
developers, including Huawei Technologies, was formed and the Selection phase
became a collaborative development.
This document first presents the services and features of existing 3GPP and over the
top codecs and describes the current HD Voice Logo. Then performance and
features of the EVS Codec are examined. Finally we examine a new Full HD Voice
Logo and opportunities for Huawei to lead in the deployment of the EVS Codec.

HD Voice and the 3GPP AMR-WB Codec


The better voice quality of HD voice improves the call experience over conventional
Narrowband, allowing people to be better understood, share their feelings, do
business and communicate ideas more easily. HD voice transmits more of the
human voice spectrum than Narrowband, making conversations more natural and
more easily understood. HD voice also helps people hear better in noisy
environments.

HD voice helps operators to differentiate their voice service offerings and enables
high quality services e.g. voice dependent business like call centers, information and
emergency services, etc. HD voice is much better for conference calls and can
contribute to a reduction in business travel - raising productivity while reducing
environmental impact. Calls which are easier to hear and understand reduce the
fatigue often associated with long conference calls.
Orange R&D studies of HD voice customers confirmed: 96% of customers are
satisfied with HD voice calls [2].
The HD Voice Logo of GSMA (Global System for Mobile Communications
Association) has been successful in encouraging both operators and manufacturers
to provide AMR-WB and EVRC-NW based services.
Both the 3GPP AMR-WB and the 3GPP2 EVRC-NW codecs are essentially speech
codecs. They achieve a degree of performance for music signals at their higher
bit rates, but these codecs were not designed to provide more than a tolerable
rendering of music.


Initially take-up of the AMR-WB codec and wideband speech services was slow,
partially due to the need for either tandem-free operation (TFO) or transcoder-free
operation (TrFO) to be available in the network, but once these innovations were
in place the service started to take off.
There are currently many well established operators and major manufacturers signed
up as licensees of the HD Voice Logo - see Figure 1. The Global Mobile Suppliers
Association announced in March 2014 that one hundred operators worldwide have
enabled mobile HD Voice services in 73 countries [3] - see Figure 2.
Currently the HD Voice Logo requirements for GSM/UMTS mandate use of AMR-WB
and those for CDMA2000 mandate the use of EVRC-NW; both of which are
wideband speech codecs (50 Hz to 7000 Hz). This is well aligned with the
conventional definition of HD Voice, which is synonymous with wideband speech
services (50 Hz to 7000 Hz); matching as it does the frequency response of these
two codecs.


Figure 1: GSMA HD Voice Logo

Figure 2: GSMA HD Voice Logo Licensees (panels: HD Voice Operator Licensees and
HD Voice Manufacturer Licensees)

The group that is responsible for developing the HD Voice Logo Requirements within
GSMA, TSG VLR, is in the process of determining priorities for version 3.0; version
2.0 was approved in 2013 [3].

Over the Top Conversational Codecs

Over-the-top (OTT) service providers such as Skype have been providing VoIP
point-to-point services for several years. The flexibility and processing power of
the PC platform, combined with IP and little or no legacy infrastructure, allowed
these services to shift easily from conventional NB to WB and even SWB using
proprietary codecs such as SILK. Broadband IP networks do not suffer the same
radio resource constraints as wide area mobile networks, so the drive for high
quality at lower bit rates is less obvious; nevertheless such services are already
threatening the capacity and revenue streams of mobile operators. Many operators
attempt to control their use by deep packet inspection or other profiling methods,
but smartphones using WiFi connections can easily circumvent the mobile networks.
The recently standardized Opus codec, IETF RFC 6716 [5], represents a
performance benchmark that is hard to ignore for conventionally standardized
codecs. This codec, a hybrid of the Skype SILK voice codec and the CELT audio
codec, spans bit rates from 6 kbit/s to 510 kbit/s. At lower bit rates performance
is somewhat limited and the coded bandwidth is less than SWB.
The Opus codec may not live up to all of the claims made for it as a totally open,
royalty-free audio codec, but it is a high quality codec at, and above, 24 kbit/s,
where it codes more of the SWB bandwidth; see [6]. Unfortunately, or perhaps
fortunately, 24 kbit/s is a rather high bit rate for efficient use of the radio
resource for speech/audio in mobile systems.
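
To make the bit-rate argument concrete, the following minimal sketch converts a
codec bit rate into the payload carried by each voice packet. It assumes 20 ms
frames and uncompressed IPv4/UDP/RTP headers (40 bytes); neither figure is
stated in this paper, and header compression would change the result. The
function names are illustrative only.

```python
import math

FRAME_MS = 20                 # assumed frame duration in milliseconds
HEADER_BYTES = 20 + 8 + 12    # assumed uncompressed IPv4 + UDP + RTP headers


def payload_bytes(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Bytes of codec payload produced per frame at the given bit rate."""
    bits = bitrate_bps * frame_ms // 1000
    return math.ceil(bits / 8)


def header_share(bitrate_bps: int) -> float:
    """Fraction of each packet taken up by headers rather than codec payload."""
    payload = payload_bytes(bitrate_bps)
    return HEADER_BYTES / (HEADER_BYTES + payload)


if __name__ == "__main__":
    for rate in (9_600, 13_200, 24_000):
        print(rate, payload_bytes(rate), round(header_share(rate), 2))
```

Under these assumptions a 24 kbit/s stream carries 60 bytes of payload per
packet, against 33 bytes at 13.2 kbit/s, which is why the lower rates matter
for radio capacity.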


Full HD voice and new EVS Codec for VoLTE


Full HD Voice will go beyond the quality of the current HD Voice to deliver unrivalled
quality to mobile users and provide even greater benefits. The better voice quality of
Full HD Voice will improve the call experience still further, allowing people to
experience calls just as if they were speaking face to face. Full HD Voice transmits
almost the entire human voice spectrum, making conversations completely natural
and as understandable as possible. Like HD voice, Full HD Voice will also help
people hear better in noisy environments.
Full HD Voice will provide additional means for operators to differentiate their voice
service offerings and enable even higher quality services. The additional error
robustness of Full HD Voice will also mean that these higher quality services are
provided over more of the coverage area of an operator's network, increasing the
satisfaction level of end-users.
The features of Full HD Voice cannot be provided using existing speech and audio
codecs and therefore a new codec is clearly needed. The AMR-WB codec was
completed by 3GPP in 2001 and since that time codec technology has developed
significantly. Codecs such as the 3GPP2 VMR Codec, ITU-T G.718 and 3GPP
AMR-WB+ have built upon the best features of AMR-WB and been shown to provide
enhanced performance in poor radio channels and better quality for music signals.
Over the same period codecs have been developed that encode more and more of
the audio spectrum. Codecs such as ITU-T G.719, and Superwideband extensions to
codecs such as ITU-T G.718 and G.729.1, have demonstrated that the additional
audio bandwidths above 7 kHz do not require very much extra data to encode well.
There have also been significant developments in Mobile System infrastructure. With
the deployment of the Internet Protocol (IP) based infrastructure known as IMS, in
conjunction with LTE, which is also a packet-based air interface technology, the
introduction of new codecs is much more easily achieved than in the past. This
is because fewer changes are required within the infrastructure to support the new
codecs, as the data packets can remain intact from one handset to the other in a call.
The transmission of voice packets over the LTE air interface is known as Voice over
LTE (VoLTE) to mirror the similarity to VoIP. VoLTE is currently being rolled out in
Korea with more general deployment later in the year and throughout 2014.
In response to these developments a study item on use-cases for Enhanced Voice
Services (EVS) over the Evolved Packet System of LTE was initiated in 3GPP and
completed in March 2010. This study [1] led directly to the development of the
EVS Codec which will be completed in 3Q2014.

Features and Performance of the EVS Codec


The Enhanced Voice Services (EVS) Codec represents a very significant milestone
in terms of speech/audio quality and functionality when compared to existing
conversational (low delay) codecs. For the first time a 3GPP conversational codec
will combine high quality speech and music performance across four bandwidths:
Narrowband (NB = 200 - 4000 Hz), Wideband (WB = 50 - 8000 Hz), Superwideband
(SWB = 50 - 16000 Hz) and Fullband (FB = 50 - 20000 Hz). These wider audio
bandwidths, combined with improved quality for music and mixed content signals,
are at the heart of what constitutes Full HD Voice.
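
As a rough illustration of what the four audio bandwidths above imply for
sampling, the sketch below applies the Nyquist criterion (a sampling rate of at
least twice the highest audio frequency) to each upper band edge and picks the
smallest rate from a list of common audio sampling rates. The rate list and the
function names are assumptions made for this example, and the band edges are
the nominal values quoted in the text.

```python
# Nominal band edges in Hz, as quoted in the text above.
BANDS_HZ = {
    "NB": (200, 4000),
    "WB": (50, 8000),
    "SWB": (50, 16000),
    "FB": (50, 20000),
}

# Assumption: common audio sampling rates to choose from.
STANDARD_RATES_HZ = (8000, 16000, 32000, 48000)


def nyquist_rate(upper_hz: int) -> int:
    """Minimum sampling rate needed to represent audio up to upper_hz."""
    return 2 * upper_hz


def smallest_standard_rate(upper_hz: int) -> int:
    """Smallest listed sampling rate that satisfies the Nyquist criterion."""
    for rate in STANDARD_RATES_HZ:
        if rate >= nyquist_rate(upper_hz):
            return rate
    raise ValueError("no listed sampling rate is high enough")


if __name__ == "__main__":
    for name, (_, upper) in BANDS_HZ.items():
        print(name, nyquist_rate(upper), smallest_standard_rate(upper))
```

Each step up in bandwidth doubles the sampling rate until Fullband, where the
20000 Hz edge needs 40000 samples per second and therefore lands on the
48000 Hz entry in the assumed list.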
The 3GPP Work Item Description for the EVS Codec, which will be completed during
2014, lists the objectives of the new codec as follows:
1. Enhanced quality and coding efficiency for narrowband (NB) and wideband
(WB) speech services, leading to improved user experience and system efficiency.
This should also be achieved in interoperation with pre-Rel-10 systems and services
employing WB voice.
2. Enhanced quality by the introduction of super-wideband (SWB) speech,
leading to improved user experience.
3. Enhanced quality for mixed content and music in conversational
applications (for example, in-call music), leading to improved user experience for
cases when selection of dedicated 3GPP audio codecs is not possible.
4. Robustness to packet loss and delay jitter, leading to optimized behavior in
IP application environments like MTSI within the EPS.
5. Backward interoperability to the 3GPP AMR-WB codec by having some WB
EVS modes supporting the AMR-WB codec format used throughout 3GPP
conversational speech telephony service (including CS). The AMR-WB interoperable
operation modes of the EVS codec may be either identical to those in the AMR-WB
codec or different but bitstream interoperable with them.
Many of the improvements in NB and WB represent a capacity boost for mobile
systems whilst delivering the same audio quality. It is also clear that the EVS codec
will provide improvements to the Wideband speech services that are at the heart of
the HD Voice Logo Terminal Requirements (WID Items 1, 3, 4 & 5).
Perhaps the main enhancement to voice services provided by EVS, though, will be
SWB speech (and in-call music - WID Item 2 in combination with Items 3 & 4), which
goes beyond the wideband frequencies up to 7 kHz and covers frequencies
up to at least 14 kHz. In fact the current frequency masks used within the EVS
standardization exercise extend beyond 15000 Hz at certain bit rates. The Fullband
audio mode of EVS, operating from 16.4 kbit/s, will provide even greater
improvement. As mentioned previously, it is these broader audio bandwidths
that will define Full HD Voice.
Table 1: Source codec bit-rates for the EVS codec (from draft TS 26.441)

Source codec bit-rate   Signal bandwidths   Source Controlled
(kbit/s)                supported           Operation Available
---------------------   -----------------   -------------------
5.9 (SC-VBR)            NB, WB              Yes (Always On)
7.2                     NB, WB              Yes
8.0                     NB, WB              Yes
9.6                     NB, WB, SWB         Yes
13.2                    NB, WB, SWB         Yes
13.2 Channel Aware      WB, SWB             Yes
16.4                    NB, WB, SWB, FB     Yes
24.4                    NB, WB, SWB, FB     Yes
32                      WB, SWB, FB         Yes
48                      WB, SWB, FB         Yes
64                      WB, SWB, FB         Yes
96                      WB, SWB, FB         Yes
128                     WB, SWB, FB         Yes


There have been conversational SWB and FB codecs before, both in ITU-T and in VoIP
applications such as Skype, but the EVS Codec achieves this with SWB coding from
9.6 kbit/s and FB coding from 16.4 kbit/s, as shown in Table 1. The SWB coding of
EVS comes close to achieving the quality and reproducing the bandwidth of broadcast
FM radio. Fullband coding comes close to HiFi bandwidths and systems such as MP3.
See Figure 3.
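
The relationship between bit rate and audio bandwidth in Table 1 can be
sketched as a small lookup. The rows below are transcribed from Table 1, with
the 8.0 kbit/s rate taken from the EVS specification. The data structure and
function names are illustrative only and are not part of any EVS API.

```python
# (label, bit rate in kbit/s, supported signal bandwidths), after Table 1.
EVS_MODES = [
    ("5.9 (SC-VBR)", 5.9, ("NB", "WB")),
    ("7.2", 7.2, ("NB", "WB")),
    ("8.0", 8.0, ("NB", "WB")),  # rate taken from the EVS specification
    ("9.6", 9.6, ("NB", "WB", "SWB")),
    ("13.2", 13.2, ("NB", "WB", "SWB")),
    ("13.2 Channel Aware", 13.2, ("WB", "SWB")),
    ("16.4", 16.4, ("NB", "WB", "SWB", "FB")),
    ("24.4", 24.4, ("NB", "WB", "SWB", "FB")),
    ("32", 32.0, ("WB", "SWB", "FB")),
    ("48", 48.0, ("WB", "SWB", "FB")),
    ("64", 64.0, ("WB", "SWB", "FB")),
    ("96", 96.0, ("WB", "SWB", "FB")),
    ("128", 128.0, ("WB", "SWB", "FB")),
]


def lowest_rate_for(bandwidth: str) -> float:
    """Lowest bit rate (kbit/s) at which the given bandwidth is supported."""
    rates = [rate for _, rate, bands in EVS_MODES if bandwidth in bands]
    if not rates:
        raise ValueError(f"unknown bandwidth: {bandwidth}")
    return min(rates)


def modes_up_to(max_kbps: float, bandwidth: str) -> list:
    """Labels of all modes at or below max_kbps that support the bandwidth."""
    return [label for label, rate, bands in EVS_MODES
            if rate <= max_kbps and bandwidth in bands]


if __name__ == "__main__":
    print(lowest_rate_for("SWB"), lowest_rate_for("FB"))
    print(modes_up_to(13.2, "SWB"))
```

The lookup reproduces the two figures quoted in the text: SWB is available from
9.6 kbit/s and FB from 16.4 kbit/s.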


From a quality perspective, the EVS codec provides this unrivalled quality not
only for clean speech but also for noisy speech and music/audio across the entire
bit rate range, and particularly at bit rates up to 24.4 kbit/s. This, combined with
better capacity and excellent robustness to frame erasures, makes the EVS codec
supremely adapted to mobile applications.
Figure 3: Bandwidths of 3GPP Codecs

The EVS codec also includes an example jitter buffer manager (JBM), which
evens out the packet delay variation experienced by speech data packets
transported over the IMS, which is a voice over IP (VoIP) system.
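
The idea of evening out packet delay variation can be illustrated with a very
simple fixed-delay playout buffer. This is a generic sketch and not the EVS
JBM: it assumes 20 ms frames, a constant 60 ms playout delay, and sequence
numbers that never wrap. The class and method names are invented for the
example.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Fixed-delay playout buffer (illustrative only; not the EVS JBM)."""

    def __init__(self, playout_delay_ms: float = 60.0, frame_ms: float = 20.0):
        self.playout_delay_ms = playout_delay_ms
        self.frame_ms = frame_ms
        self._heap: List[Tuple[int, bytes]] = []  # min-heap keyed by sequence number
        self._first_seq: Optional[int] = None
        self._next_seq: Optional[int] = None
        self._first_due_ms = 0.0

    def _due_ms(self, seq: int) -> float:
        """Scheduled playout time of the frame with this sequence number."""
        return self._first_due_ms + (seq - self._first_seq) * self.frame_ms

    def push(self, seq: int, arrival_ms: float, payload: bytes) -> None:
        """Store an arriving packet; packets past their playout slot are dropped."""
        if self._next_seq is None:
            # The first packet to arrive anchors the playout schedule.
            self._first_seq = self._next_seq = seq
            self._first_due_ms = arrival_ms + self.playout_delay_ms
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def pop_due(self, now_ms: float) -> List[Tuple[int, Optional[bytes]]]:
        """Return every frame due by now_ms, in order; None marks a missing frame."""
        out: List[Tuple[int, Optional[bytes]]] = []
        while self._next_seq is not None and now_ms >= self._due_ms(self._next_seq):
            seq = self._next_seq
            self._next_seq += 1
            while self._heap and self._heap[0][0] < seq:
                heapq.heappop(self._heap)  # discard stale duplicates
            if self._heap and self._heap[0][0] == seq:
                out.append((seq, heapq.heappop(self._heap)[1]))
            else:
                out.append((seq, None))  # lost or late: the decoder would conceal it
        return out


if __name__ == "__main__":
    jb = JitterBuffer()
    jb.push(0, 0.0, b"f0")
    jb.push(2, 30.0, b"f2")  # arrives before packet 1 (reordered)
    jb.push(1, 35.0, b"f1")
    for t in (60.0, 80.0, 100.0):
        print(t, jb.pop_due(t))
```

Packets that arrive out of order within the playout delay are still played in
sequence, while a packet that misses its slot is reported as missing so that
the decoder can apply concealment. A real jitter buffer manager would also
adapt the delay to network conditions, which this sketch does not attempt.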
The quality of the EVS codec operating in its SWB modes can be seen in Figure 4.
This figure shows the performance of the codec in clean speech (Figure 4a), clean
speech with frame losses (Figure 4b), noisy speech (Figure 4c) and music/mixed
content (Figure 4d). The tests were performed as part of the independent evaluation
of the codec in the EVS Selection Phase.
In almost all cases the EVS Codec is superior to the reference codecs used to define
the requirements. Note that in Figure 4d the reference codecs, although operating at
the same bit rate, have significantly longer delays, making them unsuitable for
conversational applications. Similar performance against the references is achieved
in NB and WB.


This level of performance exceeds that of all existing 3GPP codecs, and in particular
that of the AMR-WB codec which led to the creation of the GSMA HD Voice Logo;
after all, HD Voice is synonymous with Wideband audio.
Figure 4: Quality of The EVS Codec operating in SWB (Selection test results)

Why Operators should deploy EVS


As described above, the EVS Codec provides a quantum leap in terms of quality and
efficiency and results in business benefits to operators.
As previous studies of HD voice customers have shown, customers notice the
difference when they are provided with high quality voice calls [2] and this naturally
leads to longer duration calls. This extended use brings greater user satisfaction
levels and leads to less churn and/or greater ARPU.
Competition from OTT services such as Skype has been naturally limited by the
universal addressing provided by the unique address space of ITU-T E.164, and
yet these services have flourished due to enhanced audio quality and lower cost.
EVS provides a real opportunity for mobile operators to devalue the proposition of
these OTT providers by offering a highly competitive audio quality package to both
consumers and business/enterprise customers, in addition to the addressing
convenience.
In addition to the EVS primary modes, the codec has modes that allow it to
interoperate with the 3GPP AMR-WB codec and achieve enhanced quality and
robustness to packet loss (see Figure 5). This feature allows EVS-enabled phones to
communicate directly with AMR-WB VoLTE phones and 2G/3G phones and gives
operators flexibility to roll out VoLTE handsets featuring the EVS codec as an
alternative to AMR-WB. During this initial phase of EVS deployment operators will
also benefit from enhanced performance of their AMR-WB service.
Figure 5: The EVS Codec operation in AMR-WB I/O Mode

EVS impact on VoLTE


To enable EVS services in emerging LTE networks, some network nodes need to be
updated in two respects:
1. Media handling enhanced for EVS codec: SBC, MGW
2. Signaling handling enhanced for SDP Offer/Answer: SBC, AS, MGCF

Figure 6 highlights the necessary network node changes for EVS over VoLTE.
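
The SDP Offer/Answer step in the list above can be sketched as a simple codec
selection. This is a simplified model and not a real SDP parser: codecs are
plain name strings, the offer is assumed to list them in the offerer's order of
preference, and the answerer picks the first one it also supports. The function
name and the example codec lists are assumptions for illustration.

```python
from typing import Iterable, Optional, Set


def select_codec(offer: Iterable[str], supported: Set[str]) -> Optional[str]:
    """Return the first offered codec the answerer supports, or None if there is none."""
    for codec in offer:
        if codec in supported:
            return codec
    return None


# Hypothetical offer from an EVS-capable handset, most preferred codec first.
OFFER = ["EVS", "AMR-WB", "AMR"]

if __name__ == "__main__":
    print(select_codec(OFFER, {"EVS", "AMR-WB", "AMR"}))  # EVS-capable peer
    print(select_codec(OFFER, {"AMR-WB", "AMR"}))         # peer without EVS
```

When the far end, or a node in the signaling path, does not recognise EVS, the
selection falls back to AMR-WB. This is why the signaling nodes listed above
need to handle the new codec in SDP before end-to-end EVS calls can be set up.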


Full HD Voice proposal in GSMA


The most significant enhancement to VoLTE services provided by the EVS codec will
be SWB speech (and in-call music), which goes beyond the wideband audio
frequencies associated with the current HD Voice Logo. What's more, the EVS
codec will also be capable of FB speech. This extra audio bandwidth will make a
really significant improvement in the user experience of VoLTE systems and, if
marketed well, could provide a valuable selling point for LTE systems and handsets.
However, for this strategy to be successful the new EVS SWB and FB services need
to be differentiated from the current HD Voice Logo service in the minds of network
operators and consumers alike.
Figure 6: Network Enhancement to Support Full HD Voice (EVS codec) in VoLTE
(Diagram annotations: media handling enhanced for EVS codec: SBC, MGW; signaling
handling enhanced for SDP O/A: SBC, AS, MGCF. Node labels: Application Server,
TAS/IP-SM-GW/T-ADS, RCS Server, Converged SDB, HLR/HSS/ENUM/DNS, IMS Core,
I/S-CSCF/MRFC, MRFP, MGCF, IM-MGW, SBC (P-CSCF/ATCF/ATGW/E-CSCF), EPC, PCRF,
S-GW/P-GW, MME, CS, EMSC, MGW, PLMN/PSTN Network, 2G/3G, LTE, Data card +
SoftClient, CPE + Fixed Phone, VoLTE Smartphone. Interface labels: SIP, Diameter,
H.248.)

The group that is responsible for developing the HD Voice Logo Requirements within
GSMA, TSG VLR, is in the process of determining priorities for version 3.0; version
2.0 was approved in 2013. The timescales for version 3.0 are well aligned with
Release 12 completion of the EVS Codec standard and the Huawei Media Lab has
been actively working within TSG VLR to encourage the development of a new
enhancement to the HD Voice Logo to promote the deployment of SWB services with
the EVS Codec.


Figure 7: Example New Logos proposed for SWB and FB variants of the HD Voice
Logo in GSMA.

The rationale for a new Logo is that the existing Logo is very well adapted to WB
speech services provided by AMR-WB but the significant improvements in user
experience enabled by EVS go far beyond this. Good progress toward this goal has
been made and there is good support for the initiative within the TSG VLR group.
The marketing and project management groups within GSMA are now considering
the proposal.
Figure 8: Example GSMA HD Voice Logo with Tag-line.

The proposal made and accepted by TSG VLR was not to employ a completely new
logo but to build on the success of the original logo by creating a slightly modified
logo as shown in Figure 7. As an alternative it has been suggested that a tag-line
beneath the current logo may also be considered as shown in Figure 8.


Figure 9: Relationship between EVS, Enhancements to EVS and 3GPP 5G Radio
Standards.

Future Voice: EVS Beyond 3GPP Release 12


There are plans among the main players in 3GPP EVS to develop a stereo variant of
the EVS codec for Release 13 or perhaps Release 14. It is preferred that the stereo
extensions should be built upon the EVS mono codec operating modes in an
embedded sense.
The development timescales of EVS and the extensions to EVS in relation to the
3GPP developments towards 5G can be seen in Figure 9.
Beyond stereo, one of the next key areas likely to enhance the perceived
audio quality for communication will be binaural rendering and immersive audio. In
such a system the user will experience the full effect of immersion within a recreated
sound field. This requires 3-D head-tracking so that, as a user moves their head, the
sources of all sounds within the sound field change position naturally. This technology
is already used in virtual reality gaming but represents the next logical step in the
evolution of audio and speech communication. The goal is to get ever closer to
"Just like speaking face to face."
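
The head-tracking step can be illustrated with a single rotation. In this
minimal sketch only yaw (horizontal head rotation) is considered, angles are in
degrees and positive values are counter-clockwise. Real binaural rendering
would also handle pitch and roll and would filter the audio according to the
resulting direction. The function name and angle convention are assumptions for
illustration.

```python
def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Direction of a fixed source relative to the listener's nose, in [-180, 180)."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # A talker fixed at 30 degrees; the listener turns their head to 90 degrees.
    print(relative_azimuth(30.0, 0.0))
    print(relative_azimuth(30.0, 90.0))
```

As the head turns, the rendered direction of each source moves the opposite
way, so the source appears to stay fixed in the room.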


References
[1] 3GPP TR 22.813, Study of Use Cases and requirements for enhanced
voice codecs for the EPS, v.10.0.0, March 2010.

[2] http://www.gsacom.com/downloads/pdf/GSA_mobile_hd_voice_020614.php4,
June 2014.

[3] http://www.gsacom.com/news/gsa_407.php, June 2014.

[4] ftp://ftp.3gpp2.org/TSGAC/Working/2014/20140318_Kyoto/TSG-AC-2014-03Kyoto/WG1/14_01_20_Position/AC10-20140120-010A_HD-Voice-Annex-CMinimum-Requirements-with-GSM-UMTS.pdf

[5] http://tools.ietf.org/html/rfc6716

[6] http://www.opus-codec.org/
