INTRODUCTION: SPEECH COMPRESSION IN WIRED AND WIRELESS COMMUNICATION NETWORKS
SPEECH COMPRESSION
INTRODUCTION
Speech Compression
Sound samples can be processed, transmitted, and converted back to analog form, where they are finally received by the human ear.
Speech compression plays a major role in two broad areas: the wired telephone network, and the wireless network (including cordless and cellular).
Within the wired network, the requirements on speech compression are rather tight, with strong restrictions on quality, delay, and complexity. Within the wireless network, because of the noisy environments that are often encountered, the requirements on quality and delay are often relaxed. However, due to limited channel capacity, the requirements on bit rate are generally tighter (i.e., a lower bit rate is required) than for the wired network.
Features
1. Data Rates
Telephone-quality voice: 8000 samples/sec, 8 bits/sample, mono = 64 kb/s
CD-quality audio: 44100 samples/sec, 16 bits/sample, stereo = 1.4 Mb/s
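These figures follow directly from multiplying sampling rate, sample width, and channel count; a quick sketch (the helper name is illustrative only):

```python
def bit_rate(sample_rate, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second."""
    return sample_rate * bits_per_sample * channels

# Telephone-quality voice: 8000 samples/s, 8 bits/sample, mono
print(bit_rate(8000, 8, 1))    # 64000 b/s = 64 kb/s

# CD-quality audio: 44100 samples/s, 16 bits/sample, stereo
print(bit_rate(44100, 16, 2))  # 1411200 b/s, roughly 1.4 Mb/s
```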
Audio signal compression in wired communication networks typically includes methods for compression of
Telephone speech (300-3400 Hz), Wideband speech (50-7000 Hz), and Wideband audio signals (10-20000 Hz).
However, this presentation is mostly dedicated to considering the various methods for compression of the 3.1 kHz telephone speech.
G.711 uses a sampling rate of 8,000 samples per second, with a tolerance of 50 parts per million (ppm). Eight binary digits per sample are used. Two encoding laws are in use, commonly referred to as the A-law and the μ-law.
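Both laws compress the signal's dynamic range logarithmically before uniform quantization, so that quiet samples receive finer resolution. A minimal sketch of μ-law companding on normalized samples (μ = 255; note that G.711 itself specifies a segmented, piecewise-linear approximation of this curve rather than the continuous formula below):

```python
import math

MU = 255.0  # mu-law companding constant used with 8-bit telephone PCM

def mu_law_compress(x):
    """Map a linear sample x in [-1, 1] onto [-1, 1] on a log scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse companding: recover the linear sample from y in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Compressing before an 8-bit uniform quantizer gives small signals much finer effective resolution than linear 8-bit PCM would.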
SPEECH COMPRESSION IN WIRED MEDIA
QUANTIZATION METHOD:
LOGARITHMIC PCM
Features of PCM: Interface to support systems with multiple speech coders (G.729, G.728, G.726).
Optimized for high performance on leading edge DSP architectures.
ADPCM (adaptive differential pulse code modulation) is one of the most widely used speech compression techniques today, due to its high speech quality, low delay, and moderate complexity.
The ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) has standardized 32 kb/s ADPCM as Recommendation G.721.
ADPCM is a waveform coder, and achieves its compression improvements by taking advantage of the high correlation exhibited by successive speech samples.
Referring to Figure 1 above, the ADPCM encoder first calculates the difference between the input signal, typically A-law or μ-law PCM, and a signal generated by a linear adaptive predictor. Due to the high correlation between samples, the difference signal has a much lower dynamic range than the input signal.
The reduced dynamic range means fewer bits are required to accurately represent the difference signal as compared to the input.
Transmitting only the quantized difference signal reduces the transmitted data rate. A four-bit quantization translates to a 32 kbps data rate for a compression ratio of 2:1. Higher quality is achieved through the adaptive nature of the quantization.
Analyzing the time-varying characteristics of the difference signal and adapting both the size of the quantization steps and the rate at which they change facilitates higher accuracy over a wider dynamic range. As mentioned, the signal received at the decoder is the quantized difference signal. The ADPCM decoder is essentially the reverse of the encoder.
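The encode/decode loop described above can be sketched as follows. This is a toy illustration: the predictor is simply the previous reconstructed sample, and the step size grows when the quantizer saturates and shrinks otherwise. It is not the G.721/G.726 algorithm, whose adaptive predictor and quantizer are far more elaborate.

```python
def adpcm_encode(samples, bits=4):
    """Toy ADPCM encoder: quantize the prediction residual with an
    adaptive step size, tracking the decoder's reconstruction."""
    qmax = 2 ** (bits - 1) - 1
    step, pred = 16.0, 0.0
    codes = []
    for s in samples:
        diff = s - pred                               # prediction residual
        code = max(-qmax, min(qmax, round(diff / step)))
        codes.append(code)
        pred += code * step                           # decoder-matched state
        step = max(1.0, step * (1.5 if abs(code) == qmax else 0.9))
    return codes

def adpcm_decode(codes, bits=4):
    """Mirror of the encoder: rebuild samples from quantized differences."""
    qmax = 2 ** (bits - 1) - 1
    step, pred = 16.0, 0.0
    out = []
    for code in codes:
        pred += code * step
        out.append(pred)
        step = max(1.0, step * (1.5 if abs(code) == qmax else 0.9))
    return out
```

Four-bit codes at 8,000 samples/s give the 32 kb/s rate quoted above.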
Synthesis filter coefficients are updated in the same manner as in the encoder. No codebook search is performed.
SPEECH COMPRESSION IN WIRELESS MEDIA
For wireless communication networks, we consider current trends in speech compression algorithm development separately for time-division multiple access (TDMA) and code-division multiple access (CDMA) communication systems.
A 10th-order LPC filter is used; its coefficients are encoded as reflection coefficients once per frame, while sub-frame LPC parameters are obtained through linear interpolation.
The excitation parameters are updated every 5 ms, and the excitation is coded using gain-shape vector quantizers. The VSELP coding algorithm encodes frames of 160 linear PCM samples into ten 16-bit code words and has an algorithmic delay of 7.5 ms.
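The frame length and gross bit rate implied by these figures can be checked directly (a sketch reflecting only the numbers quoted above, not the exact bit allocation of any deployed VSELP standard):

```python
# Figures quoted above for the VSELP coder
frame_samples = 160     # linear PCM samples per frame
sample_rate = 8000      # samples per second (telephone speech)
frame_bits = 10 * 16    # ten 16-bit code words per frame

frame_ms = 1000 * frame_samples / sample_rate            # frame duration
gross_rate = frame_bits / (frame_samples / sample_rate)  # bits per second

print(frame_ms)    # 20.0   -> 20 ms frames
print(gross_rate)  # 8000.0 -> 8 kb/s gross rate
```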
Most speech coders are designed to generate a constant rate bit stream for digital transmission. However, for digital storage and for some applications in telecommunications a variable bit rate could be advantageous. While a constant bit rate is well suited for many digital communication systems, speech is by nature intermittent and has a short-term statistical character that varies greatly with time. Recently, variable bit rate (VBR) speech compression has become a very active and important topic in the field of speech coding. VBR speech coders can exploit the pauses and silent intervals which occur in conversational speech and may also be designed to take advantage of the fact that different speech segments may be encoded at different rates while maintaining a given reproduction quality.
In order to maintain acceptable quality below an 8 kbps data rate, a fundamentally different approach to speech coding, and a sizeable jump in complexity, is required. The goal is to encode the residual signal efficiently, improving speech quality over plain LPC without increasing the bit rate too much.
The receiver looks up the code in its codebook, retrieves the residual, and uses it to excite the LPC formant filter.
Collection of a variety of speech samples, from both male and female speakers. The codebook must contain an assortment of sounds, so the collected samples should give equal representation to male and female voices in different languages.
Editing of the collected samples for phonemes and unvoiced signals. Every language contains a set of phonemes; a phoneme is the smallest phonetic unit in a language capable of conveying a distinction in meaning. The codebook works better if it contains samples of all possible phonemes, and it must also contain samples of unvoiced signals.
VQ is a special method of constructing a codebook. It ensures that the codebook contains codes representing an assortment of sounds. It picks one frame M, searches for N-1 other frames similar to M, and averages all N frames into a single code of the codebook. It then purges the N frames from the source and repeats the process until all frames have been consumed.
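The procedure just described can be sketched as follows. This is a simplified illustration using Euclidean distance between frames; the function name and parameters are hypothetical, and production vector quantizers typically use the LBG/k-means algorithm instead of this one-pass grouping.

```python
import math

def build_codebook(frames, group_size):
    """Sketch of the grouping described above: repeatedly take one frame M,
    gather its (group_size - 1) nearest neighbours, average the group into
    a single code, purge those frames, and continue until none remain."""
    frames = [list(f) for f in frames]
    codebook = []
    while frames:
        m = frames.pop(0)                       # pick one frame M
        # indices of the remaining frames, nearest to M first
        nearest = sorted(range(len(frames)),
                         key=lambda i: math.dist(m, frames[i]))
        picked = nearest[:group_size - 1]
        group = [m] + [frames[i] for i in picked]
        # purge the grouped frames from the source
        for i in sorted(picked, reverse=True):
            frames.pop(i)
        # average the group element-wise into a single codebook entry
        codebook.append([sum(col) / len(group) for col in zip(*group)])
    return codebook
```

For example, four 2-D frames forming two tight pairs collapse into a two-entry codebook, one averaged code per pair.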