International Journal of Emerging Trends in Signal Processing( IJETSP )
ISSN(2319-9784) , Volume 1 , Issue 5 September 2013
Entropy Coding Technique for Compression of Satellite Vibration Test Data

Dr. A M Khan #, B R Nagendra *1, Dr. N K Misra *2

# Department of Electronics, Mangalore University, Mangalore, India, asifabc@yahoo.com
* Facilities, ISRO Satellite Centre, Bangalore, India, 1 brnag@isac.gov.in, 2 nkmisra@isac.gov.in
Abstract - Entropy coding is an important stage in any compression algorithm. During the compression process, data is transformed from one domain to another to exploit the redundancy present in the data. The transformed data is smaller than the original data, which results in compression. Moreover, the transformed data is organized in such a way that entropy coding of the transformed coefficients provides a further gain in compression ratio. The most commonly used entropy coding techniques are arithmetic coding and Huffman coding, and the performance of each depends on the type of data that the transformed coefficients represent. In this paper, a comparative study is made between the performance of arithmetic and Huffman coding, and a suitable entropy coding technique is identified for the compression of satellite vibration test data. From the results of experiments conducted on samples of satellite vibration test data, it is inferred that arithmetic coding provides a better compression ratio and requires less computation time, and hence is the most effective entropy coding technique for compression of satellite vibration test data.
Keywords - satellite vibration test; vibration test data; vibration data compression; entropy coding; arithmetic coding; Huffman coding

I. INTRODUCTION

Various environmental tests such as vibration tests, thermo-vacuum tests and acoustic tests are conducted on satellites as part of design validation, qualification and flight acceptance requirements. Vibration testing is one of the important environmental tests carried out on satellites and their subsystems, as in [2]. The types of vibration tests conducted on a satellite and its subsystems are sine vibration tests, random vibration tests and shock tests. High-speed data acquisition systems are used to acquire, analyse and store the vibration test data. The sampling rate at which data is acquired during vibration tests varies from 5 kHz to 100 kHz, depending on the criticality of the test and the range of frequency components to be considered for analysis. Vibration response levels at about 250 locations on the satellite are monitored to study the dynamic behaviour of the different satellite components. About 50 GB of test data, as in [4], is generated during a typical satellite vibration test, and tens of terabytes of vibration test data are generated per year from tests conducted on various satellites and their subsystems. This huge amount of data is stored in a centralized database server at the vibration laboratory of a space organization for off-line analysis at a later time.

II. COMPRESSION OF SATELLITE VIBRATION TEST DATA

The enormous amount of vibration test data must be stored in data server systems, and accessing this data over the intranet backbone of the organization at a later time encounters two challenges. Firstly, it takes several minutes to access data from the database server. Secondly, the storage space required for archiving the vibration test data in the server system is very large.
The above two issues are resolved by compressing the satellite vibration test data before its archival. Hundreds of compression techniques have been developed for various types of data. As the performance of a compression algorithm depends on the characteristics of the data, there is no single best method for compressing all types of data: each algorithm provides its best performance on the data sets it was designed for, and will not be equally effective on other data sets. Different vibration data sets corresponding to various vibration environments have unique features such as frequency distribution, skewness, kurtosis, randomness, correlations, etc. Hence, various algorithms have been developed for different kinds of vibration test data, such as motor vibrations, seismic vibrations and gearbox vibrations, as in [13]. This paper identifies the most suitable entropy coding technique to serve as the final stage of the compression algorithm designed for satellite vibration test data.

III. ENTROPY CODING AND COMPRESSION ALGORITHM

Compression algorithms make use of redundancy in the data set. To obtain an optimal compression ratio, most compression techniques use transform coding, applying one of numerous transform algorithms such as Fourier transforms, wavelet transforms, Hilbert transforms, Karhunen-Loeve transforms, etc., as in [15]. When these transforms are applied to the real-valued vibration test data, the resultant transformed coefficients are also real
numbers. Further compression is obtained by applying entropy coding, which is the process of representing information in the most compact form, as in [14]. The prerequisite for entropy coding is to convert the real-valued transformed coefficients into a set of integers; the reverse of this conversion is carried out during decoding. Three commonly used entropy coding techniques are Huffman coding, arithmetic coding and Lempel-Ziv coding, as in [1]. All of these are lossless coding techniques, so the original symbol sequence is reconstructed without any error. Transform coding provides significant compression by utilizing the redundancy in the data values, whereas entropy coding makes use of the probability distribution patterns of the transformed coefficients and results in further compression, as in [5]. The entropy coder converts the integer values representing the transformed coefficients into a binary bit stream, increasing the efficiency of the compression algorithm. Lempel-Ziv coding is a dictionary-based method suited to compression of text data, and is therefore not considered in the present study of entropy coding for satellite vibration test data.

IV. ARITHMETIC CODING

Arithmetic coding is a form of entropy coding used in lossless data compression. This method assigns one codeword to the entire input symbol stream. The codeword is a number representing a sub-interval determined by the probabilities of the symbols being coded; the binary equivalent of this codeword is the resultant bit stream of the arithmetic coding process, as in [8, 12, 16]. The arithmetic coding algorithm divides the probability scale from 0 to 1 into n partitions, where n is the number of symbols in the alphabet being coded.
Each partition represents one symbol, and the partition size is proportional to the probability of occurrence of the corresponding symbol. The partitions are arranged so that the partition for the symbol with the highest probability starts from zero and the partition for the symbol with the lowest probability lies at the end, near unity. To code a sequence of k symbols from an alphabet of n symbols, k iterations are carried out. In each iteration, the partition corresponding to the current symbol is selected; this range is then expanded and re-partitioned in the same proportions as the initial 0-to-1 scale. After k iterations, a final range of values is obtained, and the number with the fewest digits in that range is the arithmetic code of the given sequence of k symbols. The binary equivalent of this number is the bit stream produced by arithmetic coding. The equations used in each iteration of the arithmetic coding algorithm are:

L_k = L_(k-1) + R_(k-1) x CP_i    (1)
R_k = R_(k-1) x p_i    (2)

where
L_k = lower limit of the selected interval after the k-th iteration,
R_k = width (range) of the selected interval after the k-th iteration,
CP_i = cumulative probability of the symbols preceding the current symbol S_i on the initial 0-to-1 scale,
p_i = probability of the current symbol S_i.

Within the current interval, the sub-interval assigned to a symbol S_i therefore starts at

L_k + R_k x CP_i    (3)

which is the boundary used when selecting the partition for the next symbol. This arithmetic coding algorithm is explained by the illustration below. Consider an alphabet {A, B, C, D, E} with probabilities of occurrence {0.3, 0.25, 0.2, 0.15, 0.1}. The process of generating the arithmetic code for the symbol sequence {CDBB} is illustrated pictorially in Fig. 1. As mentioned earlier, arithmetic coding generates a single number that represents the coded symbol sequence. In this example, the number with the fewest digits in the interval identified in the final iteration, 0.71125 to 0.71312, is 0.712; removing the decimal point gives the codeword 712, and its binary equivalent {1011001000} represents the symbol sequence {CDBB}.
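The interval-narrowing recursion of equations (1) and (2) can be sketched in a few lines of Python. This is a toy floating-point illustration of the recursion only, not a production coder (a real implementation would use integer arithmetic with renormalization to avoid precision loss):

```python
from itertools import accumulate

def arithmetic_interval(symbols, probs, message):
    """Return the final [low, high) interval for a message, applying
    L_k = L_(k-1) + R_(k-1)*CP_i  and  R_k = R_(k-1)*p_i per symbol."""
    cum = dict(zip(symbols, [0.0] + list(accumulate(probs))))  # CP_i per symbol
    p = dict(zip(symbols, probs))                              # p_i per symbol
    low, rng = 0.0, 1.0
    for s in message:
        low += rng * cum[s]   # Eq. (1)
        rng *= p[s]           # Eq. (2)
    return low, low + rng

lo, hi = arithmetic_interval("ABCDE", [0.3, 0.25, 0.2, 0.15, 0.1], "CDBB")
# lo ≈ 0.71125, hi ≈ 0.713125 — the final interval of Fig. 1;
# 0.712 lies inside it, so 712 is the codeword for {CDBB}.
```

Running this reproduces the final interval of the worked example, confirming that the recursion narrows [0, 1) to the sub-interval that uniquely identifies the sequence {CDBB}.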
Fig. 1 Illustration of arithmetic coding of the example considered (the interval narrows from [0, 1] to [0.55, 0.75], [0.70, 0.73], [0.709, 0.7165] and finally [0.71125, 0.71312] as the symbols C, D, B, B are coded)

The decoding algorithm requires information about the alphabet and its probability distribution in order to recover the original set of symbols. The scales above are reconstructed, and one symbol is extracted during the reconstruction of each scale. It is also essential for the encoder either to send the number of symbols encoded, or to add a new EOF (End Of File) symbol to the alphabet with a very small probability and encode it at the end of the message. One advantage of arithmetic coding over other similar coding techniques used in data compression is the
convenience of adaptation, as in [6, 9]. Adaptation means changing the probability distribution tables while coding the data. The decoded data matches the original data as long as the frequency table in decoding is updated in the same way and at the same step as in encoding; this synchronization is usually based on a combination of symbols occurring during the encoding and decoding process. Adaptive arithmetic coding significantly improves the compression ratio compared to static methods. There are various variants of arithmetic coding designed to provide the best performance, and the algorithm of each variant depends on the type of data being compressed, as in [3, 10].

V. HUFFMAN CODING

The algorithm for Huffman coding using a Huffman tree is explained below. Let the symbols to be coded come from an alphabet set A of n symbols, where each symbol has a frequency of occurrence f_i. The algorithm consists of five steps, shown in Fig. 2. The first step of the Huffman algorithm results in a tree called the Huffman tree. To obtain the Huffman code for any symbol, the Huffman tree is created and then traversed from the root to the leaf node representing that symbol, generating a binary sequence for each symbol. This process is explained below using the same example considered to illustrate arithmetic coding in the previous section.
Fig. 2 Huffman coding algorithm

Huffman coding of the four symbols {CDBB} from the alphabet {A, B, C, D, E} with frequencies {30, 25, 20, 15, 10} consists of four iterations, as illustrated in Table I. During each iteration, the symbols with the minimum frequencies are selected.
TABLE I
ITERATIONS OF HUFFMAN CODING

Iteration  Alphabet Set     Frequencies           Symbols Selected  Frequency of New Symbol
1          {A, B, C, D, E}  {30, 25, 20, 15, 10}  n1 = {D, E}       25
2          {A, B, C, n1}    {30, 25, 20, 25}      n2 = {B, C}       45
3          {A, n2, n1}      {30, 45, 25}          n3 = {A, n1}      55
4          {n3, n2}         {55, 45}              -                 -
The Huffman tree generated for the above example is shown in Fig. 3. In the first iteration, symbols D and E are selected and a new symbol n1 is created. In the second iteration, symbols B and C are selected and a new symbol n2 is formed. In the next iteration, n1 and A are combined to create the symbol n3. Hence the Huffman tree is created from the leaf nodes to the root, i.e. from the bottom of the tree to the top, as in [17]. From the Huffman tree for the given alphabet, the Huffman dictionary is obtained by traversing from the root to the leaf nodes, generating a binary code for each symbol. The codes generated for the symbols of this example are: A: 00, B: 10, C: 11, D: 010, E: 011. Hence, the Huffman code for the symbols {CDBB} is {110101010}. Huffman codes are not unique; different ways of selecting symbols result in different sets of codes, so the decoder must use the same Huffman dictionary that was used during encoding, as in [11]. An important feature of Huffman codes is that they are prefix codes, i.e. no codeword is a prefix of any other codeword. As a result, the Huffman decoder unambiguously reconstructs the original set of symbols: a Huffman code is uniquely decodable, with only one possible source string producing it, as in [7].
Fig. 3 Huffman tree of the illustrated example (root joining n3 = 55 and n2 = 45; n3 branches to A = 30 and n1 = 25, with n1 branching to D = 15 and E = 10; n2 branches to B = 25 and C = 20)

The five steps shown in Fig. 2 are:

Step 1: Pick the two symbols a and b from alphabet set A with the smallest frequencies and create a subtree that has these two symbols as leaves. Label the root of this subtree n1.
Step 2: Remove the symbols a and b, add n1 as a new symbol of alphabet set A, and set the frequency of the new symbol to f(n1) = f(a) + f(b). The resulting new alphabet set is A1 = A U {n1} - {a, b}, with |A1| = |A| - 1 elements.
Step 3: Repeat steps 1 and 2 until the alphabet is left with only one symbol.
Step 4: Construct the Huffman dictionary, which contains a binary code for each symbol of the alphabet, by traversing from the root to each symbol.
Step 5: Generate the Huffman binary bit stream for the given set of symbols using the Huffman dictionary.
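Steps 1-3 above can be sketched with a priority queue. This minimal illustration tracks only the code length of each symbol, since the actual bit assignment is not unique, and uses the example frequencies from the text:

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Steps 1-3: repeatedly merge the two lowest-frequency symbols.
    Returns the depth (code length) of each leaf in the Huffman tree."""
    tick = count()  # tie-breaker so heapq never compares the dict payloads
    heap = [(f, next(tick), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)  # two smallest frequencies
        fb, _, b = heapq.heappop(heap)
        # the merged subtree pushes every contained leaf one level deeper
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, next(tick), merged))
    return heap[0][2]

lengths = huffman_code_lengths({"A": 30, "B": 25, "C": 20, "D": 15, "E": 10})
# matches the dictionary in the text: A, B, C get 2-bit codes; D, E get 3-bit codes
```

The resulting code lengths (2, 2, 2, 3, 3) agree with the dictionary A: 00, B: 10, C: 11, D: 010, E: 011 derived in the text, even though the particular 0/1 labels assigned to the branches may differ.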
VI. EXPERIMENTS CONDUCTED

As mentioned in the first section of this paper, three different types of vibration tests are conducted on satellites and their subsystems: random vibration tests, sine vibration tests and shock tests. The data of each type of test has unique characteristics, as in [2]. Hence, experiments must be carried out on each type of test data to identify the entropy coding technique that provides the better performance for that data. Data acquired during each of these tests on a typical satellite were considered, with ten different data sets of each type of vibration test used in the experiments. Each data set was conditioned, and wavelet decomposition was carried out to obtain the transformed coefficients. The resulting coefficients were pre-processed by converting them to integers, an essential step prior to arithmetic or Huffman coding. To choose the optimal entropy coding technique for compression of satellite vibration test data, both arithmetic and Huffman coding were carried out on the transformed coefficients, and performance parameters such as compression ratio and coding time were computed for both algorithms. The steps used to conduct this experiment are depicted in Fig. 4. The satellite vibration test data, represented by real numbers, was coded into a binary bit stream using the algorithms described above, and the sizes of the original test data and the coded bit stream were computed to estimate the compression achieved.
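The integer-conversion pre-processing step mentioned above can be sketched as a simple fixed-point quantization. The scale factor here is an assumed precision parameter for illustration, not a value specified in the paper:

```python
def to_integers(coeffs, scale=10**4):
    """Quantize real-valued transform coefficients to integers
    before entropy coding (scale sets the retained precision)."""
    return [round(c * scale) for c in coeffs]

def from_integers(ints, scale=10**4):
    """Inverse mapping, applied during decoding."""
    return [i / scale for i in ints]

q = to_integers([0.1234, -1.5, 2.0])   # → [1234, -15000, 20000]
r = from_integers(q)                   # → [0.1234, -1.5, 2.0]
```

With a fixed scale the round trip is exact only up to the chosen precision, so the scale must be large enough that the reconstruction error in the transformed coefficients is acceptable.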
Fig. 4 Process of the experiment conducted
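A hypothetical harness for the measurements described above might look as follows. The encode/decode callables, the 16-bit sample size and the toy stand-in coder are illustrative assumptions, not details given in the paper:

```python
import time

def measure(encode, decode, samples, bits_per_sample=16):
    """Time one entropy coder on one pre-processed data set and
    compute its compression ratio (original bits / coded bits)."""
    t0 = time.perf_counter()
    bitstream = encode(samples)
    t_enc = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = decode(bitstream)
    t_dec = time.perf_counter() - t0

    assert restored == samples   # entropy coding is lossless
    ratio = len(samples) * bits_per_sample / len(bitstream)
    return ratio, t_enc, t_dec

# toy stand-ins: "encode" to 8 bits per sample, then invert
enc = lambda xs: "".join(format(x, "08b") for x in xs)
dec = lambda bs: [int(bs[i:i + 8], 2) for i in range(0, len(bs), 8)]
ratio, t_enc, t_dec = measure(enc, dec, [7, 42, 255])
# ratio = 3*16 / 24 = 2.0
```

Running the same harness with both coders on identical data sets, as in the experiment, isolates the coding algorithm as the only source of variation in compression ratio and time.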
VII. RESULTS AND OBSERVATIONS

The results of the experiments conducted on ten data samples of each of the three types of satellite vibration tests were analyzed. The compression ratios obtained by arithmetic and Huffman coding for the different sets of sine test data are shown in Table II.

TABLE II
COMPARISON OF COMPRESSION RATIO OBTAINED BY ARITHMETIC AND HUFFMAN CODING FOR SINE VIBRATION DATA

Data Set  Arithmetic Coding  Huffman Coding
1         358.04             59.84
2         50.98              40.50
3         54.49              42.54
4         281.27             58.95
5         208.28             57.88
6         34.20              31.95
7         32.26              30.61
8         13.19              13.09
9         325.86             59.61
10        37.65              34.23

The compression ratios obtained by arithmetic and Huffman coding for different sets of random test data of a typical satellite are shown in Table III, and the results observed for different sets of shock test data are shown in Table IV. The experiments were conducted keeping all parameters of data conditioning, wavelet transformation and pre-processing unchanged while arithmetic and Huffman coding were carried out on each data set; hence, the variations in compression ratio and computation time are due to the coding algorithm alone.

TABLE III
COMPARISON OF COMPRESSION RATIO OBTAINED BY ARITHMETIC AND HUFFMAN CODING FOR RANDOM VIBRATION DATA

Data Set  Arithmetic Coding  Huffman Coding
1         9.32               9.29
2         9.81               9.76
3         9.06               9.03
4         9.95               9.90
5         8.20               8.17
6         12.13              12.02
7         12.69              12.60
8         11.94              11.87
9         8.30               8.03
10        8.41               8.23
TABLE IV
COMPARISON OF COMPRESSION RATIO OBTAINED BY ARITHMETIC AND HUFFMAN CODING FOR SHOCK VIBRATION DATA

Data Set  Arithmetic Coding  Huffman Coding
1         19.70              19.58
6         20.96              20.66

Fig. 4 depicts the processing chain: vibration data for compression, data conditioning, wavelet decomposition, coefficients pre-processing, frequency estimation, arithmetic / Huffman coding, coded bit stream.
It is observed from the above tables that in all the experiments conducted on the different data sets, arithmetic coding provided a better compression ratio than Huffman coding, and that the advantage of arithmetic coding over Huffman coding is largest for the data sets where high compression ratios are obtained. In the case of sine and shock test data, the compression ratio obtained is high due to the periodicity of sine data and the transient nature of shock data; these two factors lead to higher redundancy in the corresponding transformed coefficients and hence to higher compression ratios. In the case of random test data the redundancy is comparatively low, and hence a lower compression ratio is obtained. The computation time required to carry out arithmetic and Huffman coding of the transformed coefficients was also recorded for the different samples of sine, random and shock vibration test data; the averaged computation times for each type of test data are shown in Table V, along with the computation time required to decode the binary bit stream and reconstruct the transformed coefficients. It was ensured that both the arithmetic and Huffman algorithms were executed on all data sets with the same computational resources. It was observed that the computation time required for arithmetic coding and decoding was significantly less than that for Huffman coding and decoding for all three types of satellite vibration test data.

TABLE V
COMPARISON OF COMPUTATION TIME OF ARITHMETIC AND HUFFMAN CODING

Coding Method        Sine Test Data  Random Test Data  Shock Test Data
Arithmetic Coding    23.84 sec       159.13 sec        26.67 sec
Huffman Coding       537.01 sec      690.21 sec        76.75 sec
Arithmetic Decoding  35.53 sec       183.53 sec        32.73 sec
Huffman Decoding     763.52 sec      756.24 sec        96.67 sec
VIII. CONCLUSION

From the observations made on the results of the experiments conducted, it is concluded that arithmetic coding is the most suitable entropy coding technique for compression of satellite vibration test data, providing the maximum compression ratio with the least coding time.

ACKNOWLEDGEMENT

The authors wish to acknowledge S N Prakash, V Ramesh Naidu, A R Prashant and M Madheswaran of the Vibration Laboratory, ISRO Satellite Centre, Bangalore, India, for the support given to carry out the present study.

REFERENCES

[1] Zhou J, Wong K W, Chen J, "Distributed Block Arithmetic Coding for Equiprobable Sources", IEEE Sensors Journal, Volume 13, Issue 7, pp. 2750-2756, July 2013
[2] B R Nagendra, A M Khan, V Ramesh Naidu, S N Prakash, N K Misra, "Characterization of Vibration Test Data of Satellites", American Journal of Signal Processing, Volume 3, No. 2, pp. 35-40, 2013
[3] Chien-Pen Chuang, Guan-Xian Chen, Yi-Tsai Liao, Chia-Chieh Lin, "A Lossless Color Image Compression Algorithm with Adaptive Arithmetic Coding Based on Adjacent Data Probability", International Symposium on Computer, Consumer and Control (IS3C-2012), pp. 141-145, June 2012
[4] B R Nagendra, V Ramesh Naidu, N K Misra, A M Khan, "Signal Magnitude Analysis of Vibration Test Data of Satellites", International Conference on Electronic Design and Signal Processing (ICEDSP-2012), Dec 2012
[5] Singh R, Srivastava V K, "Performance comparison of arithmetic and Huffman coder applied to EZW codec", 2nd International Conference on Power, Control and Embedded Systems (ICPCES-2012), Dec 2012
[6] Zhiwei Tang, "One Adaptive Binary Arithmetic Coding System Based On Context", International Conference on Computer Science and Service System (CSSS-2011), pp. 1440-1443, June 2011
[7] Ren Weizheng, Wang Haobo, Xu Lianming, Cui Yansong, "Research on a quasi-lossless compression algorithm based on Huffman coding", International Conference on Transportation, Mechanical, and Electrical Engineering (TMEE-2011), pp. 1729-1732, Dec 2011
[8] Hyoung Joong Kim, "A Fast Implementation of Arithmetic Coding", 12th International Asia-Pacific Web Conference (APWEB-2010), pp. 419-423, April 2010
[9] Jiaji Wu, Minli Wang, Jechang Jeong, Licheng Jiao, "Adaptive-distributed Arithmetic Coding for Lossless Compression", 2nd IEEE International Conference on Network Infrastructure and Digital Content, pp. 541-545, Sep 2010
[10] Yuying Zheng, Chun Qi, Guanzhen Wang, "A New Image Pre-Processing for Improved Performance of Entropy Coding", Chinese Conference on Pattern Recognition (CCPR-2010), pp. 1-6, Oct 2010
[11] Babu K A, Kumar V S, "Implementation of Data Compression using Huffman Coding", International Conference on Methods and Models in Computer Science (ICM2CS-2010), pp. 70-75, Dec 2010
[12] Pinho A J, Neves A J R, Bastos C A C, Ferreira P J S G, "DNA Coding Using Finite-Context Models and Arithmetic Coding", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2009), pp. 1693-1696, April 2009
[13] Ming-Bo Lin, Yung-Yi Chang, "A New Architecture of a Two-Stage Lossless Data Compression and Decompression Algorithm", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 17, Issue 9, pp. 1297-1303, Sep 2009
[14] Mu Feng, Hu Fangqiang, Zhong Weibo, "A New Waveform Speech Entropy Coding Method", International Conference on Multimedia Information Networking and Security (MINES-2009), Volume 2, pp. 262-265, Nov 2009
[15] B R Nagendra, V Ramesh Naidu, N K Misra, A M Khan, "Region of Interest Based Compression of Medical Images Using Wavelets", International Conference on Electronic Design and Signal Processing (ICEDSP-2009), Dec 2009
[16] Elsayed H A, Alghoniemy M, El-Banna M, "A Comparative Study of Lossless Audio Coding Schemes", The 2006 International Conference on Computer Engineering and Systems, pp. 271-275, Nov 2006
[17] Jas A, Ghosh-Dastidar J, Mom-Eng Ng, Touba N A, "An efficient test vector compression scheme using selective Huffman coding", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 22, Issue 6, pp. 797-806, June 2003