
PREPARED BY: RISHIKESH BHAVSAR (08MECC04), M.Tech Communication, 2nd Sem

GUIDED BY: TANISH ZAVERI SIR (LECT.), Communication

INTRODUCTION
SPEECH COMPRESSION IN WIRED COMMUNICATION NETWORKS
SPEECH COMPRESSION IN WIRELESS COMMUNICATION NETWORKS

SPEECH COMPRESSION

INTRODUCTION
Speech Compression
Digitized sound samples can be processed, transmitted, and converted back to analog form, where they are finally received by the human ear.

Speech compression plays a major role in two broad areas: 1. the wired telephone network, and 2. wireless networks (including cordless and cellular).

Within the wired network the requirements on speech compression are rather tight with strong restrictions on quality, delay, and complexity. Within the wireless network, because of the noisy environments that are often encountered, the requirements on quality and delay are often relaxed. However, due to limited channel capacity the requirements on bit rate are generally tighter (i.e. lower bit rate is required) than for the wired network.


Features

1. Data Rates
Telephone-quality voice: 8000 samples/sec, 8 bits/sample, mono = 64 kb/s
CD-quality audio: 44100 samples/sec, 16 bits/sample, stereo = 1.4 Mb/s
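The two rates above follow directly from sampling rate x bits per sample x channels; a quick arithmetic check (the helper function name is illustrative, not from any codec API):

```python
def bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone = bit_rate(8000, 8, 1)      # mono telephone speech
cd_audio = bit_rate(44100, 16, 2)     # stereo CD audio

assert telephone == 64_000            # 64 kb/s
assert cd_audio == 1_411_200          # roughly 1.4 Mb/s
```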


2. Speech Codec Overview


PCM - send every sample.
DPCM - send differences between samples.
ADPCM - send differences, but adapt how we code them.
SB-ADPCM - wideband codec; uses ADPCM twice, once for the lower frequencies, and again at a lower bit rate for the upper frequencies.
LPC - linear model of speech formation.
CELP (Code Excited Linear Prediction) - uses LPC as a base, but also uses some bits to code corrections for the things LPC gets wrong.


SPEECH COMPRESSION IN WIRED COMMUNICATION NETWORKS

Audio signal compression in wired communication networks typically includes methods for compression of
Telephone speech (300-3400 Hz), Wideband speech (50-7000 Hz), and Wideband audio signals (10-20000 Hz).

However, this presentation is mostly dedicated to considering the various methods for compression of the 3.1 kHz telephone speech.

SPEECH COMPRESSION IN WIRED MEDIA

Pulse Code Modulation (PCM)

The ITU-T (International Telecommunication Union) standard G.711 specifies PCM at 64 kb/s.


Transmit the value of each speech sample.
Dynamic range of speech is about 50-60 dB, requiring about 11 bits/sample.
Maximum frequency in telephone speech is 3.4 kHz, so the sampling frequency is 8 kHz.
8000 x 11 = 88 kb/s. Simple and universal, but not very efficient.

G.711 uses a sampling rate of 8,000 samples per second, with a tolerance of 50 parts per million (ppm). Eight binary digits per sample are used. Two encoding laws are defined, commonly referred to as the A-law and the mu-law.
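The mu-law companding that G.711 applies before quantization can be sketched as below. This is the continuous companding formula with mu = 255; the standard itself transmits an 8-bit segmented approximation of this curve rather than evaluating it directly:

```python
import math

MU = 255.0  # mu-law constant used by G.711

def mu_law_compress(x):
    """Continuous mu-law companding of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse companding: recover x in [-1, 1] from y in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.25
y = mu_law_compress(x)
assert abs(mu_law_expand(y) - x) < 1e-9   # round-trips (before quantization)
assert mu_law_compress(0.01) > 0.01       # small amplitudes are boosted
```

Boosting small amplitudes before uniform quantization is what lets 8 companded bits cover the 50-60 dB dynamic range that would otherwise need about 11 linear bits.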

QUANTIZATION METHOD:

Please see the accompanying MATLAB program: PCM

LOGARITHMIC PCM

Features of PCM:
Interface to support systems with multiple speech coders (G.729, G.728, G.726).
Optimized for high performance on leading-edge DSP architectures.
Multi-tasking environment compatible.
Can be integrated with G.168 and G.165 echo cancellers, and tone detection/regeneration.
Multi-channel implementation.
Optimized implementation.
64 kbit/s expander input rate.
A-law or mu-law expander input.
Uniform PCM expander output.


Adaptive Differential Pulse Code Modulation (ADPCM)

ADPCM (adaptive differential pulse code modulation) is one of the most widely used speech compression techniques today, due to its high speech quality, low delay, and moderate complexity.

The ITU-T (International Telecommunication Union) standard G.721 specifies ADPCM at 32 kb/s.

ADPCM is a waveform coder, and achieves its compression improvements by taking advantage of the high correlation exhibited by successive speech samples.


Figure 1. ADPCM Encoder

Referring to Figure 1 above, the ADPCM encoder first calculates the difference between the input signal, typically A-law or mu-law, and a signal generated by a linear adaptive predictor. Due to the high correlation between samples, the difference signal has a much lower dynamic range than the input signal.

The reduced dynamic range means fewer bits are required to accurately represent the difference signal as compared to the input.
Transmitting only the quantized difference signal reduces the transmitted data rate. A four-bit quantization translates to a 32 kbps data rate for a compression ratio of 2:1. Higher quality is achieved through the adaptive nature of the quantization.

By analyzing the time-varying characteristics of the difference signal, both the size of the quantization steps and the rate at which they change can be adapted, giving higher accuracy over a wider dynamic range. As mentioned, the signal received at the decoder is the quantized difference signal. The ADPCM decoder is essentially the reverse process of the encoder.
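The encoder/decoder mirror structure described above can be sketched in a toy form. Real G.726-style ADPCM uses an adaptive pole-zero predictor and a standardized step-size table; here a fixed first-order predictor and a simple multiplicative step adaptation are assumed purely for clarity:

```python
def adpcm_encode(samples, levels=16):
    """Toy ADPCM encoder: one 4-bit code (16 levels) per sample."""
    pred, step, codes = 0.0, 0.125, []
    for x in samples:
        diff = x - pred                              # prediction residual
        q = max(-levels // 2, min(levels // 2 - 1, round(diff / step)))
        codes.append(q)
        pred += q * step                             # reconstruction the decoder will see
        # adapt the quantizer step to the residual's recent magnitude
        step = max(1e-4, step * (1.2 if abs(q) > levels // 4 else 0.9))
    return codes

def adpcm_decode(codes, levels=16):
    """Mirror of the encoder: rebuild samples from the 4-bit codes."""
    pred, step, out = 0.0, 0.125, []
    for q in codes:
        pred += q * step
        out.append(pred)
        step = max(1e-4, step * (1.2 if abs(q) > levels // 4 else 0.9))
    return out

codes = adpcm_encode([0.5] * 50)                     # constant input
assert abs(adpcm_decode(codes)[-1] - 0.5) < 1e-9     # decoder tracks the input
```

Because the decoder updates its predictor and step size from the codes alone, no side information needs to be transmitted beyond the 4-bit difference codes, which is what yields the 32 kbps rate.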

ITU-T G.728 Speech Encoder


The ITU-T G.728 LD-CELP 16 kb/s compression algorithm has recently come into use in wired telephony, with less than 2 ms coding delay. ITU-T G.728 encodes five-sample frames of 16-bit linear PCM data into 10-bit code words. The excitation gain is updated by a 10th-order adaptive linear predictor based on the logarithmic gains of previously quantized and scaled excitation vectors. The LPC predictor and the gain predictor are updated by performing LPC analysis on previously coded speech and on the previous log-gain sequence, respectively, with the autocorrelation coefficients calculated by a novel hybrid windowing method. The excitation codebook is closed-loop optimized and its index is Gray-coded for better robustness to channel errors.

Fig 1.(a) ITU-T G.728 LD-CELP Encoder


ITU-T G.728 Speech Decoder


When the decoder receives the codebook index, it obtains the corresponding excitation codebook vector. This code vector is then passed through a gain scaling unit and a synthesis filter to obtain the decoded speech.

The synthesis filter coefficients are updated in the same manner as in the encoder. No codebook search is performed at the decoder.

Fig 1.(b) ITU-T G.728 LD-CELP Decoder



SPEECH COMPRESSION IN WIRELESS COMMUNICATION NETWORKS

For wireless communication networks, we separately consider current trends in speech compression algorithm development in

Time-division multiple access (TDMA) and Code-division multiple access (CDMA) communication systems.

SPEECH COMPRESSION IN WIRELESS MEDIA

Time-division multiple access (TDMA) Systems


GSM 06.10 RPE-LTP speech coder
The full-rate speech codec in GSM is described as Regular Pulse Excitation with Long Term Prediction (GSM 06.10 RPE-LTP). Basically, the encoder divides the speech into a short-term predictable part, a long-term predictable part, and the remaining residual pulse. It then encodes that pulse and the parameters for the two predictors. The decoder reconstructs the speech by passing the residual pulse first through the long-term prediction filter, and then through the short-term predictor; see Fig. 1 below.

Fig 1. GSM 06.10 RPE-LTP
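The decoder's two-stage synthesis can be sketched as below. The lag, gain, and LPC coefficients are illustrative toy values, not GSM 06.10 parameters:

```python
def long_term_synthesis(residual, lag, gain):
    """Pitch reconstruction: y[n] = residual[n] + gain * y[n - lag]."""
    y = []
    for n, r in enumerate(residual):
        past = y[n - lag] if n >= lag else 0.0
        y.append(r + gain * past)
    return y

def short_term_synthesis(excitation, lpc):
    """Formant shaping: s[n] = e[n] + sum_k lpc[k] * s[n - 1 - k]."""
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * s[n - 1 - k]
        s.append(acc)
    return s

residual = [1.0] + [0.0] * 39                   # a single excitation pulse
excitation = long_term_synthesis(residual, lag=8, gain=0.5)
speech = short_term_synthesis(excitation, lpc=[0.5, -0.2])
assert excitation[8] == 0.5 and excitation[16] == 0.25  # decaying pitch echoes
```

A single pulse thus becomes a decaying train of pitch echoes after the long-term filter, which the short-term filter then shapes into a formant-like waveform.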


VSELP (Vector-Sum Excited Linear Prediction) algorithm


The VSELP (Vector-Sum Excited Linear Prediction) algorithm reduces computational complexity and increases robustness to channel errors. The VSELP excitation is derived by combining excitation vectors from three codebooks (a pitch adaptive codebook and two highly structured stochastic codebooks), see Fig. 2. The speech frame in the VSELP algorithm is 20 ms long and each frame is divided into four 5-ms sub-frames.

A 10th-order LPC filter is used and its coefficients are encoded as reflection coefficients once per frame while sub-frame LPC parameters are obtained through linear interpolation.
The excitation parameters are updated every 5 ms. The excitation is coded using gain-shape vector quantizers. VSELP coding algorithm encodes frames of 160 linear PCM samples into ten 16-bit code words and has an algorithmic delay of 7.5 ms.
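A short arithmetic cross-check of the figures above, using only the frame size and code-word counts given in the text:

```python
# Ten 16-bit code words per 20 ms frame of 160 samples.
bits_per_frame = 10 * 16                 # 160 bits per frame
frames_per_second = 1000 // 20           # one frame every 20 ms
bit_rate = bits_per_frame * frames_per_second
assert bit_rate == 8000                  # works out to an 8 kb/s stream

samples_per_second = 160 * frames_per_second
assert samples_per_second == 8000        # consistent with 8 kHz sampling
```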


Code-division multiple access (CDMA) Systems


Constant Bit Rate versus Variable Bit Rate

Most speech coders are designed to generate a constant rate bit stream for digital transmission. However, for digital storage and for some applications in telecommunications a variable bit rate could be advantageous. While a constant bit rate is well suited for many digital communication systems, speech is by nature intermittent and has a short-term statistical character that varies greatly with time. Recently, variable bit rate (VBR) speech compression has become a very active and important topic in the field of speech coding. VBR speech coders can exploit the pauses and silent intervals which occur in conversational speech and may also be designed to take advantage of the fact that different speech segments may be encoded at different rates while maintaining a given reproduction quality.


Code Excited Linear Prediction (CELP)

In order to maintain acceptable quality below an 8 kbps data rate, a fundamentally different approach to speech coding, and a sizeable jump in complexity, is required. The goal is to efficiently encode the residue signal, improving speech quality over LPC without increasing the bit rate too much.

CELP codecs use a codebook of typical residue values.


The analyzer compares the residue to the codebook values, chooses the value which is closest, and sends the corresponding code.

Receiver looks up the code in its codebook, retrieves the residue, and uses this to excite the LPC formant filter.
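The analysis step described above amounts to a nearest-neighbour search over the codebook. A minimal sketch with a toy codebook (real CELP performs the search in a filtered, perceptually weighted domain, and a trained codebook would be far larger):

```python
def search_codebook(target, codebook):
    """Return the index of the codebook vector with least squared error."""
    def sq_err(cand):
        return sum((t - c) ** 2 for t, c in zip(target, cand))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))

codebook = [
    [0.0, 0.0, 0.0, 0.0],     # silence-like entry
    [1.0, -1.0, 0.5, -0.5],   # strongly alternating entry
    [0.2, 0.1, -0.1, -0.2],   # low-energy entry
]
residue = [0.9, -1.1, 0.6, -0.4]
idx = search_codebook(residue, codebook)
assert idx == 1               # entry 1 is the closest match
```

Only `idx` is transmitted; the receiver indexes its identical codebook and feeds the retrieved vector to the LPC synthesis filter.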


USFS (U.S. Federal Standard) 1016 CELP encoder operating at 4.8 kbps

Figure 2 :- USFS-1016 Standard of CELP



CODE BOOK GENERATION

Collection of a variety of speech samples, including both male and female voices. The code book must contain an assortment of samples representing a variety of sounds. The samples collected contain an equal representation of both male and female voices in different languages.

Editing of the above collected samples for phonemes and unvoiced signals. Every language contains a set of phonemes; the smallest phonetic unit in a language that is capable of conveying a distinction in meaning is known as a phoneme. The code book works better if it contains samples for all possible phonemes. The code book must also contain samples of unvoiced signals.


Vector Quantization (VQ)

VQ is a special method of constructing a code book. It ensures that the code book contains codes representing an assortment of sounds. It picks one frame M, searches for N-1 other frames similar to M, and averages all N frames to make a single code of the code book. It then purges those N frames from the source and repeats the process until all the frames have been consumed.
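The procedure described above can be sketched directly. The 2-dimensional frames below are toy examples, and practical VQ training normally uses the LBG/k-means algorithm rather than this greedy pass:

```python
def build_codebook(frames, n=2):
    """Greedy VQ sketch: group each frame M with its n-1 nearest
    neighbours, average the group into one code, purge, repeat."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    frames = list(frames)
    codebook = []
    while frames:
        m = frames.pop(0)                        # pick one frame M
        frames.sort(key=lambda f: dist(f, m))    # nearest frames first
        group = [m] + frames[:n - 1]             # M plus its N-1 neighbours
        del frames[:n - 1]                       # purge the used frames
        code = [sum(col) / len(group) for col in zip(*group)]
        codebook.append(code)                    # one averaged code per group
    return codebook

frames = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
cb = build_codebook(frames, n=2)
assert len(cb) == 2                    # 4 frames, groups of 2 -> 2 codes
assert cb[0] == [0.05, 0.05]           # average of the two near-origin frames
```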

Thank you for your attention!
