
Int J Speech Technol (2011) 14:157-165
DOI 10.1007/s10772-011-9093-5

Proposed modifications in ETSI GSM 06.10 full rate speech codec and its overall evaluation of performance using MATLAB

Ninad Bhatt · Yogeshwar Kosta

Received: 16 March 2011 / Accepted: 25 May 2011 / Published online: 29 June 2011
© Springer Science+Business Media, LLC 2011

Abstract Today, the primary constraints in wireless communication systems are limited bandwidth and power. Wireless systems that carry speech therefore need efficient and effective methods, in terms of bandwidth usage and power, to transmit and receive speech while maintaining its quality, especially at the receiving end. Speech coding has been a subject of research in the scientific and academic community since the advent of digitization and of computational and processing horsepower (DSP). Among all elements of the communication system (transmitter, channel and receiver), the transmission channel (the carrier of information/data, also called the medium) is the most critical and plays a key role in the transmission and reception of information/data. This paper proposes modifications to the selection of grid positions in the Regular Pulse Excitation section of the 13 kbps ETSI GSM 06.10 Full Rate speech coder that yield an overall 1.8 kbps (36 bits per 20 ms frame) reduction in bit rate. The saved bits can be used either to improve error detection and correction at the channel coding stage or for hidden data embedding and transmission over the wireless link. Both the standard GSM FR and the proposed GSM FR coders are implemented in MATLAB. Subjective and objective analyses are carried out on the proposed system to evaluate its performance, and the results are compared with those of the GSM 06.10 Full Rate coder using a set of tables and graphs. The results show that the PESQ and MOS scores are quite comparable for each wave file, with a marginal degradation of both as the codec bit rate decreases.

Keywords Speech coding · GSM · ETSI · RPE-LTP coder · Subjective analysis · Objective analysis · MATLAB

1 Introduction

The Full Rate GSM 06.10 speech coder is a hybrid (analysis-by-synthesis) coder, a class that offers an attractive trade-off between waveform coders and vocoders in terms of both speech quality and transmission bit rate, although generally at the price of higher complexity (Malkovic 2003). The speech encoder takes as its input a 13 bit uniform PCM signal (13 bit × 8 kHz = 104 kbps), either from the audio part of the mobile station or, on the network side, from the PSTN via an 8 bit A-law to 13 bit uniform PCM conversion as specified in GSM 06.01 (ETSI 2005-2006). The encoded speech at the output of the speech encoder is delivered to a channel encoder unit, which is specified in GSM 05.03 (ETSI 2005-2006). In the receive direction, the inverse operations take place. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13 bit uniform PCM format and encoded blocks of 260 bits, and from encoded blocks of 260 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8000 samples/s, leading to an average bit rate of 13 kbps for the encoded bit stream. The coding scheme is the so-called Regular Pulse Excitation - Long Term Prediction - Linear Predictive Coder (RPE-LTP).

N. Bhatt (✉)
Veer Narmad South Gujarat University, Surat, Gujarat, India
e-mail: bhattninad@gmail.com

Y. Kosta
Marwadi Education Foundation, Rajkot, Gujarat, India
e-mail: ypkosta@yahoo.com


Table 1 Bit allocation for GSM Full Rate Speech Coder (ETSI 2005-2006)

| Parameter | No. per frame | Resolution (bits) | Total bits / frame |
|---|---|---|---|
| LPC | 8 | 6, 6, 5, 5, 4, 4, 3, 3 | 36 |
| Pitch Period | 4 | 7 | 28 |
| Long Term Gain | 4 | 2 | 8 |
| Grid Position | 4 | 2 | 8 |
| Peak Magnitude | 4 | 6 | 24 |
| Sample Amplitude | 4 × 13 | 3 | 156 |
| Total | | | 260 |
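The bit budget in Table 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python (the paper's own implementation is in MATLAB); the variable names are hypothetical and the figures are taken directly from Table 1 and the Introduction.

```python
# Bit budget of one standard GSM 06.10 full-rate frame, using the figures in Table 1.
FRAME_MS = 20                 # 160 samples at 8000 samples/s
SUBFRAMES = 4
PULSES_PER_SUBFRAME = 13      # RPE pulses kept per sub-frame in the standard coder

lar_bits = [6, 6, 5, 5, 4, 4, 3, 3]   # resolution of the eight LPC parameters
per_subframe_bits = {
    "pitch_period": 7,
    "long_term_gain": 2,
    "grid_position": 2,
    "peak_magnitude": 6,
    "sample_amplitudes": 3 * PULSES_PER_SUBFRAME,
}

frame_bits = sum(lar_bits) + SUBFRAMES * sum(per_subframe_bits.values())
bit_rate_bps = frame_bits * 1000 // FRAME_MS   # bits per second
pcm_rate_bps = 13 * 8000                       # 13 bit uniform PCM at 8 kHz

print(frame_bits, bit_rate_bps, pcm_rate_bps)  # 260 13000 104000
```

The encoder therefore compresses the 104 kbps PCM input by a factor of eight.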

2 GSM full rate encoder

The detailed block diagram of the GSM 06.10 speech encoder is shown in Fig. 1. The input speech frame, consisting of 160 signal samples, is first pre-processed to produce an offset-free signal, which is then passed through a first-order pre-emphasis filter. The 160 samples obtained are then analyzed to determine the coefficients for the short term analysis filter (LPC analysis). These parameters are then used to filter the same 160 samples, producing 160 samples of the short term residual signal. The filter parameters, termed reflection coefficients, are transformed to log area ratios (LARs) before transmission.

Each speech frame is divided into 4 sub-frames of 40 short term residual samples, and each sub-frame is processed block-wise by the subsequent functional elements. Before each sub-block of 40 short term residual samples is processed, the parameters of the long term analysis filter, the LTP lag and the LTP gain, are estimated and updated in the LTP analysis block on the basis of the current sub-block and a stored sequence of the 120 previous reconstructed short term residual samples. A block of 40 long term residual samples is obtained by subtracting 40 estimates of the short term residual signal from the short term residual signal itself. The resulting block is fed to the Regular Pulse Excitation (RPE) analysis, which performs the basic compression function of the algorithm. As a result of the RPE analysis, the block of 40 long term residual samples is represented by one of 4 candidate sub-sequences of 13 pulses each. The selected sub-sequence is identified by the RPE grid position (M). The 13 RPE pulses are encoded using Adaptive Pulse Code Modulation (APCM) with estimation of the sub-block amplitude, which is transmitted to the decoder as side information.

The RPE parameters are also fed to a local RPE decoding and reconstruction module, which produces a block of 40 samples of the quantized version of the long term residual signal. Adding these 40 quantized samples to the previous block of short term residual signal estimates gives a reconstructed version of the current short term residual signal. The block of reconstructed short term residual samples is then fed to the long term analysis filter, which produces the new block of 40 short term residual signal estimates to be used for the next sub-block, thereby completing the feedback loop.
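The choice among the four candidate sub-sequences can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `select_rpe_grid` is hypothetical, and the selection criterion (keep the candidate with the largest energy) is an assumption taken from the usual description of GSM 06.10 rather than from the text above. The weighting filter and the APCM quantization are left out.

```python
def select_rpe_grid(x, step=3, n_pulses=13, n_grids=4):
    """Split a 40-sample block into candidate grids x[m + step*i] and pick one.

    Assumption: the grid with the largest energy is selected.
    """
    grids = [[x[m + step * i] for i in range(n_pulses)] for m in range(n_grids)]
    energies = [sum(v * v for v in g) for g in grids]
    m_best = max(range(n_grids), key=lambda m: energies[m])
    return m_best, grids[m_best]

# Toy block whose only non-zero samples lie on grid 2 (indices 2, 5, 8, ...).
block = [0.0] * 40
block[2], block[5], block[8] = 4.0, -3.0, 2.0

m_sel, pulses = select_rpe_grid(block)
print(m_sel, len(pulses))  # 2 13
```

The returned grid index corresponds to the grid position M sent to the decoder, and the 13 pulses are what the APCM stage would quantize.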

3 GSM full rate decoder

The detailed block diagram of the GSM 06.10 speech decoder is shown in Fig. 2. The decoder includes the same structure as the feedback loop of the encoder. In error-free transmission, the output of this stage is the reconstructed short term residual samples. These samples are then applied to the short term synthesis filter followed by the de-emphasis filter, resulting in the reconstructed speech signal samples. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13 bit uniform PCM format and encoded blocks of 260 bits, and from encoded blocks of 260 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8000 samples/s, leading to an average bit rate of 13 kbps for the encoded bit stream. The bit allocation for the ETSI GSM 06.10 Full Rate speech coder is shown in Table 1.

4 Proposed modification in GSM full rate speech coder

The GSM Full Rate coder consists of three major blocks: the Linear Predictive Coding section, the Long Term Prediction section and the Regular Pulse Excitation (RPE) section. The proposed modifications concern the selection of grid positions in the RPE section. The selection of grid positions and samples is modified in such a way that no sample appears in more than one grid. This is not the case in the standard GSM Full Rate coder, where the first and fourth grids share every sample except sample number 0 and sample number 39. The proposed grid selection strategy is shown in Fig. 3. As can be seen in Fig. 3, if the weighting-filtered prediction-error sequence is down-sampled by a ratio of 4 instead of 3, it yields four interleaved sequences with regularly spaced pulses. These are defined by

$$X_m[k] = X[m + 4k], \quad m = 0, 1, 2, 3; \quad k = 0, 1, 2, \ldots, 9 \tag{1}$$

where m is the grid index (four grids per sub-segment) and k is the sample index within a grid (ten samples per grid). The benefit of this grid position selection is that no sample is repeated in multiple grids, while the number of samples per grid falls from 13 to 10. This ultimately reduces the overall bit rate by 1.8 kbps (3 samples per grid × 3 bits per sample × 4 sub-frames = 36 bits per 20 ms frame) compared to the 13 kbps of the GSM 06.10 FR coder, and the saved bits can be used for error detection and correction in each transmitted frame. The proposed modification in GSM FR leads to the new bit allocation shown in Table 2, and Table 3 lists the modified encoder parameters in order of occurrence together with their bit allocation in the speech frame, with reference to the standard GSM FR (ETSI 2005-2006). The parameters produced by the GSM FR encoder, namely the short term filter parameters, the long term prediction parameters and the RPE parameters, are of unequal importance to the quality of the recovered speech. The 260 bits encoded by GSM FR are therefore rearranged according to their subjective importance, as described in GSM 05.03 (ETSI 1999). The rearranged bits are classified into classes Ia, Ib and II, which contain 50, 132 and 78 bits respectively. The classes are protected with different methods, such as CRC and convolutional coding. Table 4 shows the modifications to GSM 05.03 (Table 2) for the proposed 11.2 kbps GSM FR (ETSI 1999). The 36 bits saved per frame can thus be used either for better error protection at the channel coding stage or for steganographic data transmission.

Fig. 1 Detailed block diagram of Full Rate GSM 06.10 Speech Encoder (ETSI 2005-2006)

Fig. 2 Detailed block diagram of Full Rate GSM 06.10 Speech Decoder (ETSI 2005-2006)

Fig. 3 Sampling grids used in position selection for proposed GSM FR 11.2 kbps coder

Table 2 Bit allocation for proposed GSM full rate speech coder

| Parameter | No. per frame | Resolution (bits) | Total bits / frame |
|---|---|---|---|
| LPC | 8 | 6, 6, 5, 5, 4, 4, 3, 3 | 36 |
| Pitch Period | 4 | 7 | 28 |
| Long Term Gain | 4 | 2 | 8 |
| Grid Position | 4 | 2 | 8 |
| Peak Magnitude | 4 | 6 | 24 |
| Sample Amplitude | 4 × 10 | 3 | 120 |
| Total | | | 224 |

5 Subjective and objective analysis

5.1 Subjective analysis

(1) Mean Opinion Score (MOS)

One important subjective measure is the Mean Opinion Score (MOS), a statistical method of judging the quality of compressed speech. Untrained listeners are chosen at random and asked to judge the overall quality of the recovered speech signal produced after decoding. The decoded speech is played in a quiet environment through high quality headphones, and the ratings of all listeners are recorded and then averaged to obtain the final MOS scores. The rating scale is given in Table 5.

5.2 Objective analysis

To evaluate the performance of the proposed GSM FR coder, different types of objective analysis have been carried out in this paper. Objective measures are categorized into waveform, spectral, perceptual and composite measures (Hu and Loizou 2008).

5.2.1 Waveform based objective analysis

The following parameters are evaluated in this category of objective analysis.

(1) Signal to Noise Ratio (SNR) is mathematically defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum |S_i|^2}{\sum |S_i - S_o|^2} \tag{2}$$

where $S_i$ = input signal, $S_o$ = decoded signal and $N$ = total no. of frames.

(2) Segmental SNR is mathematically given as

$$\mathrm{SNR}_{\mathrm{SEG}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \tag{3}$$

where $s(n)$ = input signal, $\hat{s}(n)$ = decoded signal, $N$ = segment length, $M$ = no. of segments and $m_j$ = end of the current segment (Hu and Loizou 2008).

5.2.2 Perceptual based objective analysis and composite measure

The following is the important parameter for performing perceptual based analysis.

(1) Perceptual Evaluation of Speech Quality (PESQ)

PESQ compares an original speech signal with the decoded signal that results from passing the original signal through a communication system. The output of PESQ is a

prediction of the perceived quality that listeners would give the decoded speech in subjective listening tests such as MOS. The PESQ score is mapped to a MOS-like scale with a range between 1.0 and 4.5 (de Lamare and Alcaim 2005). In comparison with other objective measures, the PESQ measure is the most complex to compute and is the one recommended in ITU-T P.862 for speech quality assessment of 3.2 kHz (narrow-band) handset telephony and narrow-band speech codecs (ITU-T 2001). A further benefit of PESQ is its high correlation with subjective MOS analysis. The PESQ score is computed as a linear combination of the average disturbance value $D_{ind}$ and the average asymmetrical disturbance value $A_{ind}$ as follows

$$\mathrm{PESQ} = a_0 + a_1 D_{ind} + a_2 A_{ind} \tag{4}$$

where the parameters $a_0$, $a_1$ and $a_2$ are determined using multiple linear regression analysis and then optimized for the required measurements. Different sets of parameters ($a_0$, $a_1$, $a_2$) are chosen to establish the correlation between PESQ scores and composite measures, in line with Hu and Loizou (2008). $D_{ind}$ and $A_{ind}$ were treated as independent variables in the regression analysis.

(2) Composite measures

As conventional objective measures are not sufficient to provide high correlations in terms of speech/noise distortion and overall speech quality, different objective measures need to be combined into a composite measure (Hu and Loizou 2008). Here, multiple linear regression analysis and Multivariate Adaptive Regression Splines (MARS) techniques are used to produce the different parameters. With reference to the ITU-T P.835 standard, the following parameters are used for the evaluation of the composite measure: predicted rating of overall speech quality (Covl), rating of speech distortion (Csig) and rating of background distortion (Cbak) (Hu and Loizou 2008; ITU-T 2003).

5.2.3 Spectral based objective analysis

The following parameters are evaluated in this category of objective analysis.

(1) Frequency Weighted Segmental SNR (fwSNRseg) is expressed as follows

$$\mathrm{fwSNRseg} = \frac{10}{M} \sum_{m=0}^{M-1} \frac{\sum_{j=1}^{K} W(j,m) \log_{10} \dfrac{|X(j,m)|^2}{\left(|X(j,m)| - |\hat{X}(j,m)|\right)^2}}{\sum_{j=1}^{K} W(j,m)} \tag{5}$$

where $W(j,m)$ is the weight placed on the $j$th frequency band, $K$ is the number of bands, $M$ is the total number of frames in the signal, $|X(j,m)|$ is the weighted (by a Gaussian-shaped window) clean signal spectrum in the $j$th frequency band at the $m$th frame, and $|\hat{X}(j,m)|$ is the weighted decoded signal spectrum in the same band (Hu and Loizou 2008).

Table 3 Proposed modifications in GSM FR encoder output parameters in order of occurrence and bit allocation within the speech frame

| Parameter | Parameter number | Parameter name | Variable name | Number of bits | Bit number (LSB-MSB) |
|---|---|---|---|---|---|
| Filter parameters | 1 | Logarithmic Area Ratio 1 to 8 | LAR 1 | 6 | b1...b6 |
| | 2 | | LAR 2 | 6 | b7...b12 |
| | 3 | | LAR 3 | 5 | b13...b17 |
| | 4 | | LAR 4 | 5 | b18...b22 |
| | 5 | | LAR 5 | 4 | b23...b26 |
| | 6 | | LAR 6 | 4 | b27...b30 |
| | 7 | | LAR 7 | 3 | b31...b33 |
| | 8 | | LAR 8 | 3 | b34...b36 |
| LTP parameters | 9 | LTP Lag | N1 | 7 | b37...b43 |
| | 10 | LTP Gain | b1 | 2 | b44...b45 |
| RPE parameters | 11 | Grid position | M1 | 2 | b46...b47 |
| | 12 | Block amplitude | Xmax1 | 6 | b48...b53 |
| | 13 | RPE pulse 1 | X1(0) | 3 | b54...b56 |
| | 14 | RPE pulse 2 | X1(1) | 3 | b57...b59 |
| | ... | ... | ... | ... | ... |
| | 22 | RPE pulse 10 | X1(9) | 3 | b81...b83 |
| LTP parameters | 23 | LTP Lag | N2 | 7 | b84...b90 |
| | 24 | LTP Gain | b2 | 2 | b91...b92 |
| RPE parameters | 25 | Grid position | M2 | 2 | b93...b94 |
| | 26 | Block amplitude | Xmax2 | 6 | b95...b100 |
| | 27 | RPE pulse 1 | X2(0) | 3 | b101...b103 |
| | 28 | RPE pulse 2 | X2(1) | 3 | b104...b106 |
| | ... | ... | ... | ... | ... |
| | 36 | RPE pulse 10 | X2(9) | 3 | b128...b130 |
| LTP parameters | 37 | LTP Lag | N3 | 7 | b131...b137 |
| | 38 | LTP Gain | b3 | 2 | b138...b139 |
| RPE parameters | 39 | Grid position | M3 | 2 | b140...b141 |
| | 40 | Block amplitude | Xmax3 | 6 | b142...b147 |
| | 41 | RPE pulse 1 | X3(0) | 3 | b148...b150 |
| | 42 | RPE pulse 2 | X3(1) | 3 | b151...b153 |
| | ... | ... | ... | ... | ... |
| | 50 | RPE pulse 10 | X3(9) | 3 | b175...b177 |
| LTP parameters | 51 | LTP Lag | N4 | 7 | b178...b184 |
| | 52 | LTP Gain | b4 | 2 | b185...b186 |
| RPE parameters | 53 | Grid position | M4 | 2 | b187...b188 |
| | 54 | Block amplitude | Xmax4 | 6 | b189...b194 |
| | 55 | RPE pulse 1 | X4(0) | 3 | b195...b197 |
| | 56 | RPE pulse 2 | X4(1) | 3 | b198...b200 |
| | ... | ... | ... | ... | ... |
| | 64 | RPE pulse 10 | X4(9) | 3 | b222...b224 |
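The grid decomposition of Eq. (1) can be compared with the standard decomposition in a few lines. The sketch below is illustrative only: the helper names `grid_indices` and `shared_samples` are hypothetical, and it works on sample indices rather than on actual residual values. It uses the figures stated in Section 4 (down-sampling by 3 with 13 pulses in the standard coder, by 4 with 10 pulses in the proposed one, and 3 bits per pulse).

```python
def grid_indices(step, n_pulses, n_grids=4):
    """Sample indices of each candidate grid: index = m + step * k."""
    return [[m + step * k for k in range(n_pulses)] for m in range(n_grids)]

def shared_samples(grids):
    """Indices that occur in more than one grid."""
    seen, shared = set(), set()
    for g in grids:
        shared |= seen & set(g)
        seen |= set(g)
    return sorted(shared)

standard = grid_indices(step=3, n_pulses=13)   # GSM 06.10: 4 grids of 13 pulses
proposed = grid_indices(step=4, n_pulses=10)   # Eq. (1): 4 grids of 10 pulses

saved_bits = (13 - 10) * 3 * 4                 # pulses x bits per pulse x sub-frames

print(len(shared_samples(standard)), len(shared_samples(proposed)), saved_bits)  # 12 0 36
```

In the standard decomposition the first and fourth grids share 12 samples (every index from 3 to 36 in steps of 3), whereas the proposed grids partition the 40 samples without overlap. The 36 bits saved per 20 ms frame correspond to the 1.8 kbps reduction.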

6 Performance evaluation of proposed GSM FR coder

Both the GSM 06.10 FR and the proposed GSM FR coders are implemented in MATLAB, and the performance of both coders is evaluated using different subjective and objective measures. The GSM 06.10 FR coder is implemented first, and the proposed modifications are then applied to it to free 36 of the 260 bits in each transmitted frame for better error concealment at the channel coding stage. For the subjective and objective analysis, five different wave files have been chosen (NOIZEUS 2009). Each wave file is sampled at 8 kHz and stored as 16 bit mono, rather than in the 13 bit format (104 kbps) used by the actual GSM FR coder.

6.1 Results obtained for MOS analysis

As discussed previously, MOS analysis is carried out for five different wave files. Ten untrained listeners judged the quality of the speech in a quiet environment using high quality headphones, rating the decoded speech files of both the standard and the proposed GSM FR coders. The resulting average MOS scores are shown in Fig. 4. Figure 4 shows a small degradation in MOS score for the proposed GSM FR coder in comparison with the standard GSM FR coder, but the proposed coder still offers acceptable MOS values.
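For the waveform-based part of the objective evaluation, Eqs. (2) and (3) translate into code almost directly. The following is a rough sketch rather than the evaluation code used in the paper: the function names are hypothetical, the segment length of 160 samples (one 20 ms frame) is an assumption, and no guard against silent segments or clamping of per-segment values is included.

```python
import math

def snr_db(s_in, s_out):
    """Overall SNR in dB, following Eq. (2)."""
    signal = sum(a * a for a in s_in)
    noise = sum((a - b) ** 2 for a, b in zip(s_in, s_out))
    return 10.0 * math.log10(signal / noise)

def seg_snr_db(s_in, s_out, seg_len=160):
    """Average of per-segment SNRs in dB, following Eq. (3).

    Assumption: consecutive, non-overlapping segments; a trailing partial
    segment is ignored.
    """
    n_seg = len(s_in) // seg_len
    per_segment = [
        snr_db(s_in[j * seg_len:(j + 1) * seg_len],
               s_out[j * seg_len:(j + 1) * seg_len])
        for j in range(n_seg)
    ]
    return sum(per_segment) / n_seg

# Synthetic check: a tone and a copy with a 10% amplitude error.
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(320)]
decoded = [0.9 * v for v in clean]

overall = snr_db(clean, decoded)
segmental = seg_snr_db(clean, decoded)
print(round(overall, 3), round(segmental, 3))
```

Because the error in this toy signal is exactly one tenth of the input, both measures come out at approximately 20 dB. For real speech the two measures generally differ, since the segmental form weights quiet and loud segments equally.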

Table 4 Modifications in proposed GSM FR coder for channel coding (224 bits/20 ms frame)

| Parameter name | Parameter number | Bit index | Label | Class |
|---|---|---|---|---|
| Log area ratio 1 | 1 | 5 | d0 | Class 1A, with parity check |
| Block amplitude | 12, 26, 40, 54 | 5 | d1, d2, d3, d4 | |
| Log area ratio 1 | 1 | 4 | ... | |
| Log area ratio 2 | 2 | 5 | ... | |
| Log area ratio 3 | 3 | 4 | ... | |
| Log area ratio 1 | 1 | 3 | ... | |
| Log area ratio 2 | 2 | 4 | ... | |
| Log area ratio 3 | 3 | 3 | ... | |
| Log area ratio 4 | 4 | 4 | ... | |
| LTP lag | 9, 23, 37, 51 | 6 | ... | |
| Block amplitude | 12, 26, 40, 54 | 4 | ... | |
| Log area ratio 2, 5, 6 | 2, 5, 6 | 3 | ... | |
| LTP lag | 9, 23, 37, 51 | 5 | ... | |
| LTP lag | 9, 23, 37, 51 | 4 | ... | |
| LTP lag | 9, 23, 37, 51 | 3 | ... | |
| LTP lag | 9, 23, 37, 51 | 2 | ... | |
| Block amplitude | 12, 26, 40, 54 | 3 | ... | |
| Log area ratio 1 | 1 | 2 | ... | |
| Log area ratio 4 | 4 | 3 | ... | |
| Log area ratio 7 | 7 | 2 | ... | |
| LTP lag | 9, 23, 37, 51 | 1 | ... d49 | |
| Log area ratio 5, 6 | 5, 6 | 2 | d50, d51 | Class 1B, with parity check |
| LTP gain | 10, 24, 38, 52 | 1 | ... | |
| LTP lag | 9, 23, 37, 51 | 0 | ... | |
| Grid position | 11, 25, 39, 53 | 1 | ... | |
| Log area ratio 1 | 1 | 1 | ... | |
| Log area ratio 2, 3, 8, 4 | 2, 3, 8, 4 | 2 | ... | |
| Log area ratio 5, 7 | 5, 7 | 1 | ... | |
| LTP gain | 10, 24, 38, 52 | 0 | ... | |
| Block amplitude | 12, 26, 40, 54 | 2 | ... | |
| RPE pulses | 13...22 | 2 | ... | |
| RPE pulses | 27...36 | 2 | ... | |
| RPE pulses | 41...50 | 2 | ... | |
| RPE pulses | 55...64 | 2 | ... | |
| Grid position | 11, 25, 39, 53 | 0 | ... | |
| Block amplitude | 12, 26, 40, 54 | 1 | ... | |
| RPE pulses | 13...22 | 1 | ... | |
| RPE pulses | 27...35 | 1 | ... d145 | |
| RPE pulses | 36 | 1 | d146 | Class 2, without error protection |
| RPE pulses | 41...50 | 1 | ... | |
| RPE pulses | 55...64 | 1 | ... | |
| Log area ratio 1 | 1 | 0 | ... | |
| Log area ratio 2, 3, 6 | 2, 3, 6 | 1 | ... | |
| Log area ratio 7 | 7 | 0 | ... | |
| Log area ratio 8 | 8 | 1 | ... | |
| Log area ratio 8, 3 | 8, 3 | 0 | ... | |
| Log area ratio 4 | 4 | 1 | ... | |
| Log area ratio 4, 5 | 4, 5 | 0 | ... | |
| Block amplitude | 12, 26, 40, 54 | 0 | ... | |
| RPE pulses | 13...22 | 0 | ... | |
| RPE pulses | 27...36 | 0 | ... | |
| RPE pulses | 41...50 | 0 | ... | |
| RPE pulses | 55...64 | 0 | ... | |
| Log area ratio 2, 6 | 2, 6 | 0 | d222, d223 | |
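The class sizes implied by Table 4 can be checked against the standard split quoted in Section 4. This is a small sketch under one assumption: that the class boundaries fall between d49 and d50 and between d145 and d146, as the labels in Table 4 suggest. The dictionary names are hypothetical.

```python
# Inclusive label ranges d0..d223 per class, as read from Table 4 (assumed boundaries).
proposed_classes = {"1A": (0, 49), "1B": (50, 145), "2": (146, 223)}

# Class sizes of the standard 260-bit frame, as quoted in Section 4.
standard_sizes = {"1A": 50, "1B": 132, "2": 78}

proposed_sizes = {name: last - first + 1
                  for name, (first, last) in proposed_classes.items()}
removed = {name: standard_sizes[name] - proposed_sizes[name]
           for name in standard_sizes}

print(proposed_sizes)                 # {'1A': 50, '1B': 96, '2': 78}
print(sum(proposed_sizes.values()))   # 224
print(removed)                        # {'1A': 0, '1B': 36, '2': 0}
```

Under this reading, all 36 saved bits come out of class 1B, while the most sensitive class (1A) and the unprotected class 2 keep their standard sizes.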

Fig. 4 MOS score comparison between standard GSM FR coder and proposed GSM FR coder

Fig. 5 PESQ score comparison between standard GSM FR coder and proposed GSM FR coder

Table 5 Mean opinion score (MOS) ratings

| Sr. No. | Choice | MOS rating |
|---|---|---|
| 1 | Excellent | 5 |
| 2 | Good | 4 |
| 3 | Fair | 3 |
| 4 | Poor | 2 |
| 5 | Unacceptable | 1 |

6.2 Results obtained for objective analysis

As mentioned in the previous section, different types of objective analysis have been conducted in this paper, and their results are tabulated in Table 6. As can be observed in Table 6, all objective parameters offer acceptable values. The results show a small degradation in the value of each parameter when the proposed GSM FR coder is compared with the standard GSM FR coder. This small degradation is a consequence of the 1.8 kbps reduction in bit rate from the standard to the proposed GSM FR coder. Figure 5 shows the PESQ score comparison between the standard and proposed coders.

Table 6 Objective analysis comparison between standard GSM FR coder and proposed GSM FR coder

| Algorithm | Wave file (.wav) | PESQ | Csig | Cbak | Covl | SNR | SNRseg | fwSNRseg |
|---|---|---|---|---|---|---|---|---|
| Standard GSM FR (13 kbps) | Ninad.wav | 3.0812 | 1.6643 | 2.1895 | 1.6737 | 3.9566 | 1.7324 | 1.1441 |
| | Ninadvoice.wav | 3.0381 | 2.7733 | 2.4256 | 2.3592 | 6.1327 | 1.8035 | 2.4729 |
| | Five.wav | 2.8038 | 2.7017 | 2.4217 | 2.1832 | 3.5587 | 3.6560 | 8.4468 |
| | Doormono.wav | 2.6190 | 2.9858 | 2.5744 | 2.4626 | 3.3471 | 3.5598 | 8.3558 |
| | Sp21.wav | 2.8598 | 2.6280 | 2.4405 | 2.3222 | 2.0206 | 2.4059 | 7.3820 |
| Proposed GSM FR (11.2 kbps) | Ninad.wav | 3.0042 | 1.6209 | 2.1597 | 1.6320 | 3.7958 | 1.7136 | 0.8179 |
| | Ninadvoice.wav | 2.8952 | 2.5618 | 2.2948 | 2.1394 | 5.0678 | 1.5015 | 1.0801 |
| | Five.wav | 2.9636 | 2.9062 | 2.4944 | 2.4357 | 3.5587 | 3.0310 | 8.0093 |
| | Doormono.wav | 2.3190 | 2.8337 | 2.3982 | 2.2509 | 2.7804 | 3.0660 | 7.6917 |
| | Sp21.wav | 2.7457 | 2.4834 | 2.3076 | 2.1393 | 1.8839 | 2.2064 | 6.5518 |

PESQ is the perceptual measure; Csig, Cbak and Covl are the composite measures; SNR and SNRseg are the waveform measures; fwSNRseg is the spectral measure.

7 Discussion and concluding remarks

In order to conserve channel bandwidth, the role of a speech coder is to provide toll quality recovered speech even at a comparatively low bit rate, and with low delay and complexity. There is a trade-off between quality of speech and bit rate. The Full Rate GSM speech codec offers moderate delay and lower complexity than other coders, but at the cost of a comparatively moderate bit rate (Malkovic 2003). The idea behind the proposed GSM FR coder is to reduce the bit rate of the GSM FR coder by 1.8 kbps, so that the spared 36 bits per frame can be used for better error concealment at the channel coding stage or for steganographic data transmission. The proposed GSM FR coder, being implemented in line with the standard GSM FR coder, reduces overall complexity and delay (an inherent benefit of the GSM FR coder when compared to other standard GSM coders) but introduces a small degradation in speech quality, as observed in Figs. 4 and 5 for MOS ratings and PESQ scores respectively. Subjective analysis gives acceptable MOS scores for the proposed coder when compared with the standard GSM FR coder. Objective analysis of the different parameters also yields satisfactory values when both coders are compared. The PESQ (objective) and MOS (subjective) scores of both coders are quite comparable, and a small reduction in their values is clearly visible as the overall bit rate falls from the standard 13 kbps to the proposed 11.2 kbps.
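The PESQ comparison can be summarised numerically from the values reported in Table 6. The short sketch below only averages the tabulated scores. The variable names are hypothetical and nothing beyond the table entries is assumed.

```python
# PESQ scores from Table 6, keyed by wave file.
pesq_standard = {"Ninad.wav": 3.0812, "Ninadvoice.wav": 3.0381, "Five.wav": 2.8038,
                 "Doormono.wav": 2.6190, "Sp21.wav": 2.8598}
pesq_proposed = {"Ninad.wav": 3.0042, "Ninadvoice.wav": 2.8952, "Five.wav": 2.9636,
                 "Doormono.wav": 2.3190, "Sp21.wav": 2.7457}

mean_standard = sum(pesq_standard.values()) / len(pesq_standard)
mean_proposed = sum(pesq_proposed.values()) / len(pesq_proposed)

# Positive values mean the standard coder scored higher for that file.
drop_per_file = {name: round(pesq_standard[name] - pesq_proposed[name], 4)
                 for name in pesq_standard}

print(round(mean_standard, 4), round(mean_proposed, 4))  # 2.8804 2.7855
print(drop_per_file)
```

On average the PESQ score falls by about 0.09 between the 13 kbps and 11.2 kbps coders. The per-file differences are not uniform: in the tabulated values, Five.wav is the one file for which the proposed coder scores higher.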

References
de Lamare, R. C., & Alcaim, A. (2005). Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kbps codec. IEE Proceedings. Vision, Image and Signal Processing, 152(1).
ETSI (1999). Channel coding (GSM 05.03 version 8.9.0 2005-01), pp. 12-19 & 98.
ETSI (2005-2006). Digital cellular telecommunications system (Phase 2+), full rate speech, transcoding (GSM 06.10 version 8.2.0), pp. 10-59.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 16(1).
ITU-T (2001). Recommendation P.862, Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, pp. 1-18.
ITU-T (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm, P.835.
Malkovic, D. (2003). Speech coding methods in mobile radio communication systems. In 17th international conference on applied electromagnetics and communications, Croatia.
The NOIZEUS database (2009). Available: http://www.utdallas.edu/~loizou/speech/noize.
