Research Article
A Multilayer Improved RBM Network Based Image Compression
Method in Wireless Sensor Networks
Copyright © 2016 Chunling Cheng et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
The processing capacity and power of nodes in a Wireless Sensor Network (WSN) are limited. Moreover, most image compression algorithms in WSNs are sensitive to random changes in image content or produce low-quality images after decoding. Therefore, an image compression method based on a multilayer Restricted Boltzmann Machine (RBM) network is proposed in this paper. An alternative iteration algorithm is also applied in the RBM to optimize the training process. The proposed image compression method is compared with a region of interest (ROI) compression method in simulations. Under the same compression ratio, the quality of the reconstructed images is better than that of ROI. When the number of hidden units in the top RBM layer is 8, the peak signal-to-noise ratio (PSNR) of the multilayer RBM network compression method is 74.2141, much higher than the 60.2093 achieved by ROI. The multilayer RBM based image compression method therefore has better compression performance and can effectively reduce the energy consumption of image transmission in WSNs.
and the other is about the model distribution parameter. The model distribution parameter to be estimated is computed alternately with the normalizing parameter and can eventually be obtained through a highly efficient, low-complexity training process. This algorithm improves the likelihood of the RBM for the training data.

Furthermore, we have applied the improved RBM training process to image compression in WSNs. A multilayer improved RBM based image compression method is presented in this paper. This method codes more abstract representations extracted from the image features and therefore achieves a better compression effect; the general idea is sketched below. In the simulations, the reconstructed image quality of the multilayer RBM network is superior to that of another image compression method under the same compression ratio, as detailed in Section 5. At the same time, the proposed method reduces the energy consumption of the image data transmission process.
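As a minimal sketch of this idea (not the trained network of Section 3), the snippet below compresses an image by passing it through a stack of RBM encoders, where the few units of the top layer form the compressed code. The layer sizes, the untrained random weights, and the RBMLayer helper are illustrative assumptions; only the 8-unit top layer echoes the configuration reported in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBMLayer:
    """One RBM layer: visible -> hidden encoding (weights assumed pre-trained)."""
    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_hid = np.zeros(n_hidden)

    def encode(self, v):
        # Hidden unit activation probabilities given the visible vector.
        return sigmoid(v @ self.W + self.b_hid)

# Illustrative layer sizes: a 28*28 image squeezed down to an 8-unit code,
# matching the "8 hidden units in the top RBM layer" of the abstract.
rng = np.random.default_rng(0)
sizes = [28 * 28, 256, 64, 8]
layers = [RBMLayer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

image = rng.random(28 * 28)          # stand-in for a sensor-node image
code = image
for layer in layers:                 # pass through the stack, layer by layer
    code = layer.encode(code)
print(code.shape)                    # (8,) -- the compressed representation
```

Only the top-layer code needs to be transmitted; the sink reverses the stack to reconstruct the image, which is what saves transmission energy in the WSN.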
The rest of the paper is organized as follows. In Section 2, related work on image compression and RBM training algorithms is discussed. Section 3 presents the basic idea of the multilayer RBM network based image compression method. The RBM model and the improved RBM algorithm based on alternative iteration are described in Section 4. The performance of the proposed algorithm is compared with some typical algorithms in Section 5. Finally, conclusions and future work are presented in Section 6.

2. Related Work

Typical image compression algorithms include time-space related data compression algorithms, wavelet transform based data compression algorithms, distributed data compression algorithms, and improved traditional data compression algorithms.

The space-time relativity based data compression algorithm mainly includes prediction coding and linear fitting methods for time series. A prediction coding method is proposed in [5]. It can effectively estimate the source data based on its temporal correlation. However, the prediction coding based data compression method does not handle the transmission of large amounts of image data. Reference [6] proposes a curve fitting based data flow compression method. It compresses data collected on each node and restores the data at the base station. But this method is very complex, and it does not consider the transmission delay at each sensor node. Reference [7] presents a space-time data compression technique based on a simple linear regression model. This method can eliminate data redundancy at the single node and the collector node, respectively. But only data that satisfies the error requirement is considered in this method; abnormal data is not handled.

Wavelet transform is a time-frequency analysis method which is superior to traditional signal analysis methods. Reference [8] considers the existence of stream data in the data transmission of sensor networks. It compresses data by using wavelet transform based on the data aggregation and the DC routing algorithm. In [9, 10], a ring model based distributed time-space data compression algorithm and a wavelet based self-fitting data compression algorithm are proposed. Storage-efficient two-dimensional and three-dimensional continuous wavelet data compression methods are proposed in [11]. They are based on the ring model of fitting sensor network wavelet transform and the overlapping cluster partition model, respectively. They are storage efficient and can save transmission energy in the network.

The distributed data compression algorithm is based on the fact that both centralized and decentralized information services can be implemented. Its characteristic feature is that it reduces the amount of data through cooperative work among different sensor nodes. A chain model based distributed data compression algorithm is proposed in [12]; it designs a chain model that is suitable for wavelet transforms with random support lengths.

Traditional lossless data compression methods mainly include Run Length Encoding, Huffman coding, dictionary based compression, and arithmetic coding. These methods are mainly adopted on advanced computers or workstations. In a sensor network, the processing capacity of each processor is limited and its memory is small, so it is essential to optimize the traditional compression algorithms. In [13], the difference between two perceptual pieces of data is encoded based on a self-fitting Huffman coding algorithm. Reference [14] proposes a region of interest (ROI) based lossy-lossless image compression method. It applies different coding compression methods to the small area that is important and to the remaining large area. In this way, the compression ratio is improved while sensitive information is preserved.

In recent years, Deep Learning (DL) has been widely used in WSNs to carry out image compression. Deep Learning extracts the characteristics of data from low to high layers by modeling the layered analysis performed in human brains. However, the effect of image compression using DL is subject to the likelihood of the RBM for the training data and to the training complexity of the RBM. Therefore, an improved training algorithm based on RBM training is also proposed in this paper.

Currently, researchers have conducted much research on RBM training algorithms. In 2002, Hinton proposed a fast learning algorithm for RBMs, Contrastive Divergence (CD) [15]. This algorithm is a highly efficient approximate learning algorithm for RBMs. However, the RBM model acquired by the CD algorithm is not a maximum entropy model and does not achieve high likelihood on the training data [16].

In 2008, Tijmen Tieleman proposed the Persistent Contrastive Divergence (PCD) algorithm [17]. This algorithm remedies the deficiency of the CD algorithm: it has the same efficiency as CD but does not violate maximum likelihood learning. In addition, the RBM obtained by PCD training has a more powerful pattern generation capacity. In 2009, Tieleman and Hinton made a further improvement of the PCD algorithm [18] and proposed Fast Persistent Contrastive Divergence (FPCD).
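To make the difference between CD and PCD concrete, here is a minimal sketch for a bias-free binary RBM (a simplified illustration, not the exact procedures of [15, 17]): CD-1 restarts its Gibbs chain from the data at every update, whereas PCD keeps a persistent chain running across updates. The single chain, the omitted biases, and the learning-rate handling are simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, rng):
    """One step of block Gibbs sampling v -> h -> v' for a binary RBM."""
    h = (sigmoid(v @ W) > rng.random(W.shape[1])).astype(float)
    return (sigmoid(h @ W.T) > rng.random(W.shape[0])).astype(float)

def cd1_update(W, v_data, lr, rng):
    # CD-1: the negative chain is (re)started from the data itself.
    v_model = gibbs_step(v_data, W, rng)
    grad = (np.outer(v_data, sigmoid(v_data @ W))
            - np.outer(v_model, sigmoid(v_model @ W)))
    return W + lr * grad

def pcd_update(W, v_data, v_persistent, lr, rng):
    # PCD: the negative chain persists across updates instead of restarting.
    v_persistent = gibbs_step(v_persistent, W, rng)
    grad = (np.outer(v_data, sigmoid(v_data @ W))
            - np.outer(v_persistent, sigmoid(v_persistent @ W)))
    return W + lr * grad, v_persistent

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4))
v = (rng.random(6) > 0.5).astype(float)
W = cd1_update(W, v, 0.05, rng)
W, v_chain = pcd_update(W, v, v.copy(), 0.05, rng)
```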
The model distribution parameter is then estimated and updated to $\theta_1^1$ based on $z_1$. The two parameters continue to be estimated alternately until the convergence condition is satisfied or the maximum number of iterations $T$ is reached. The final value of the model parameter obtained from sample $v^i$ is denoted by $\theta^i$. Thus, after the first sample, the model parameter is denoted by $\theta^1$; it is the initial value of the model parameter when the second sample $v^2$ is input, which means $\theta_0^2 = \theta^1$.

When sample $v^i$ and model parameter $\theta_t^i$ are given, we need to consider the objective function of $z_{t+1}$. Assume that $z_t = Z(z^i \mid \theta_t^i)$ is the distribution of the normalized parameter of the sample when $\theta_t^i$ is kept unchanged, where $z^i$ is the normalized parameter of sample $v^i$. $Z(z^i \mid \theta_t^i)$ satisfies $\sum_{z^i} Z(z^i \mid \theta_t^i) = 1$ and $Z(z^i \mid \theta_t^i) \ge 0$. Because the log function is concave, we can derive an approximate expression of $z$ by using the Jensen inequality. Then we have

$$\sum_{n=1}^{n_V} \log p(v_n^i; \theta_t^i) = \sum_{n=1}^{n_V} \log \sum_{z^i} p(v_n^i, z^i; \theta_t^i). \quad (6)$$

Multiplying the numerator and denominator of the fraction on the right of (6) by $Z(z^i \mid \theta_t^i)$ gives

$$\sum_{n=1}^{n_V} \log p(v_n^i; \theta_t^i) = \sum_{n=1}^{n_V} \log \sum_{z^i} Z(z^i \mid \theta_t^i)\,\frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)}. \quad (7)$$

From the Jensen inequality and the property of concave functions, we can deduce

$$\sum_{n=1}^{n_V} \log \sum_{z^i} Z(z^i \mid \theta_t^i)\,\frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)} \ge \sum_{n=1}^{n_V} \sum_{z^i} Z(z^i \mid \theta_t^i) \log \frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)}. \quad (8)$$

Equality holds in (8) if and only if $p(v_n^i, z^i; \theta_t^i)/Z(z^i \mid \theta_t^i) = c$, where $c$ is a constant independent of $z^i$. According to $\sum_{z^i} Z(z^i \mid \theta_t^i) = 1$, we obtain

$$Z(z^i \mid \theta_t^i) = \frac{p(v_n^i, z^i; \theta_t^i)}{\sum_{z^i} p(v_n^i, z^i; \theta_t^i)} = \frac{p(v_n^i, z^i; \theta_t^i)}{p(v_n^i; \theta_t^i)} = p(z^i \mid v_n^i; \theta_t^i). \quad (9)$$

While $\theta_t^i$ is kept unchanged, we maximize $p(v^i \mid z_t)$, which is the same as maximizing $\ln p(v^i \mid z_t)$. Hence,

$$z_{t+1} = \arg\max_{z_t} l(z_t) = \arg\max_{z_t} \sum_{n=1}^{n_V} \ln p(v_n^i \mid z_t). \quad (10)$$

In this way the normalized parameter is estimated and we obtain a value $z_{t+1}$.

At this point, we need to choose the equation for calculating $\theta_{t+1}^i$. When $z_{t+1}$ and $\theta_t^i$ are both known, we can obtain the joint probability distribution of $(v^i, h)$ based on (1):

$$p(v^i, h \mid \theta_t^i) = \frac{e^{-E(v^i, h \mid \theta_t^i)}}{Z(\theta_t^i)}, \quad (11)$$

where $Z(\theta_t^i)$ is the normalized parameter $z_{t+1}$ obtained above.

We can get the marginal distribution of the joint probability distribution $p(v^i, h \mid \theta_t^i)$ based on the derivation of the original RBM:

$$p(v^i \mid \theta_t^i) = \frac{1}{z_{t+1}} \sum_h e^{-E(v^i, h \mid \theta_t^i)}. \quad (12)$$

Then, keeping $z_{t+1}$ unchanged, we obtain a value $\theta_{t+1}^i$ of the model parameter:

$$\theta_{t+1}^i = \arg\max_{\theta_t^i} l(\theta_t^i) = \arg\max_{\theta_t^i} \sum_{n=1}^{n_V} \ln p(v_n^i \mid \theta_t^i). \quad (13)$$

However, the initial value $\theta_0$ assigned to the model parameter may not suit the model. In that case, we can update the value of the model parameter by iterative optimization based on the alternative iteration algorithm. Thus, sample $v^i$ can be used to estimate a value $\theta^i$, and the $\theta^i$ obtained by training on the former sample is used as the initial value of $\theta^{i+1}$, the model parameter to be estimated from the next sample. The optimization is repeated until the termination conditions are satisfied.

The improved RBM algorithm is described in Algorithm 1.
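The alternation between the $z$-step (10) and the $\theta$-step (13) in Algorithm 1 can be sketched as follows. This is a minimal illustration for a tiny bias-free binary RBM whose partition function is tractable by brute-force enumeration; the helpers estimate_z and update_theta are hypothetical stand-ins for the paper's normalized-parameter estimate and parameter update, not the exact procedure of Algorithm 1.

```python
import numpy as np
from itertools import product

def energy(v, h, W):
    # RBM energy E(v, h | theta); biases omitted for brevity.
    return -v @ W @ h

def estimate_z(W):
    # z-step, cf. (10): with theta fixed, estimate the normalizing parameter.
    # For a tiny RBM we can enumerate all (v, h) states exactly.
    n_v, n_h = W.shape
    return sum(np.exp(-energy(np.array(vs), np.array(hs), W))
               for vs in product([0, 1], repeat=n_v)
               for hs in product([0, 1], repeat=n_h))

def update_theta(v, W, z, lr=0.1):
    # theta-step, cf. (13): with z fixed, one gradient step on ln p(v | theta).
    pos = np.outer(v, 1.0 / (1.0 + np.exp(-v @ W)))   # data-dependent term
    neg = np.zeros_like(W)                            # model expectation
    for vs in product([0, 1], repeat=W.shape[0]):
        for hs in product([0, 1], repeat=W.shape[1]):
            p = np.exp(-energy(np.array(vs), np.array(hs), W)) / z
            neg += p * np.outer(vs, hs)
    return W + lr * (pos - neg)

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((4, 3))                # 4 visible, 3 hidden units
samples = (rng.random((5, 4)) > 0.5).astype(float)    # toy training samples

T = 10                                                # max alternations per sample
for v in samples:                                     # theta^i seeds theta^{i+1}
    for t in range(T):
        z = estimate_z(W)                             # (10): z_{t+1} given theta_t
        W = update_theta(v, W, z)                     # (13): theta_{t+1} given z_{t+1}
```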
5. Simulation Experiments and Results Analysis

The experiment consists of three parts: the performance analysis of the improved RBM; the analysis of the compression performance of the proposed image compression method and the evaluation of reconstructed image quality; and the analysis of energy consumption in WSNs when the multilayer RBM network image compression method is used. MATLAB 2013a is used to carry out the simulations.

5.1. Performance Analysis of the Improved RBM. The datasets of our experiment are the famous MNIST handwritten digit database [25] and the toy dataset. The MNIST dataset consists of 50,000 groups of training samples and 10,000 groups of testing samples. Each group of samples consists of a grayscale image whose resolution is 28∗28. The images contain handwritten Arabic numerals, and each numeral is labeled so that the experiment can be conducted with supervised learning. Part of the data samples in the MNIST dataset is shown in Figure 4.

Figure 4: Part of samples of MNIST database.

Compared with the MNIST dataset, the toy dataset is simpler and lower dimensional. It consists of 10,000 images. Each
image has 4 × 4 binary pixels. The dataset is generated in the same way as that described in [26].

We compare the proposed algorithm with the PCD algorithm, the parallel tempering algorithm (PT-K) [27], and parallel tempering with equienergy moves (PTEE) [28] in the experiments. In PT-K, K is the number of auxiliary distributions of parallel tempering under different temperatures. The value of each temperature is usually between 0.9 and 1. The parameter in PT-K can be easily controlled, and in our experiments K is set to 5 and 10, respectively. In some preliminary experiments, we find that PT achieves better likelihood scores with 10 chains than with 5 chains, whereas the results yielded by PTEE with 5 chains are similar to those with 10 chains, which means that, to some extent, PTEE is not affected by the number of Markov chains [28]. Therefore, we report the results obtained by PTEE and PT with 10 chains.

We evaluate their qualities by the likelihood of the RBM for the training data with two methods: the reconstruction error and enumerating the states of the hidden units.
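As an illustration of these two evaluation methods, the sketch below computes the 2-norm reconstruction error after one mean-field reconstruction and an exact average log-likelihood by summing over the hidden states analytically, which is feasible for small hidden layers such as the 15 units used later. The bias-free energy, the tiny model size, and the random stand-in data are assumptions made for the sake of a runnable example.

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(V, W):
    """Average 2-norm between data and its one-step reconstruction."""
    H = sigmoid(V @ W)            # mean-field hidden activities
    V_rec = sigmoid(H @ W.T)      # reconstructed visible activities
    return np.mean(np.linalg.norm(V - V_rec, axis=1))

def exact_avg_log_likelihood(V, W):
    """Average ln p(v), exact for small models.

    For a bias-free binary RBM, summing over hidden states is analytic:
    ln sum_h exp(-E(v, h)) = sum_j ln(1 + exp(v @ W[:, j])).
    ln Z is obtained here by also enumerating the 2**n_v visible states,
    which is only feasible for toy sizes.
    """
    n_v, _ = W.shape
    def ln_sum_h(v):
        return np.sum(np.log1p(np.exp(v @ W)))
    lnZ = np.logaddexp.reduce([ln_sum_h(np.array(vs))
                               for vs in product([0, 1], repeat=n_v)])
    return np.mean([ln_sum_h(v) - lnZ for v in V])

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4))         # tiny RBM for demonstration
V = (rng.random((100, 6)) > 0.5).astype(float) # stand-in test data
print(reconstruction_error(V, W))
print(exact_avg_log_likelihood(V, W))
```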
Firstly, we compare the reconstruction errors of the four algorithms with different numbers of hidden units on the MNIST dataset and the toy dataset. The first 30,000 groups of samples in MNIST are divided into three parts, each containing 10,000 groups of samples. The number of hidden units is set to 10, 15, 20, 25, 30, 50, 100, 200, 250, 300, and 350. The number of iterations on each part ranges from 1 to 45, and the average reconstruction errors of the three parts after 15 and 30 iterations are shown, respectively, below. Then, the experiments on the toy dataset are also executed, with the number of hidden units set to 10, 15, 20, 25, 30, 50, 100, 150, 200, 250, and 300. The results obtained by PT-10, PTEE-10, PCD, and the proposed algorithm are shown in Figures 5–8.

Figure 5: The reconstruction errors of the four algorithms after 15 iterations on the MNIST dataset (versus the number of hidden units).

From Figures 5 and 6, we can see that the average reconstruction error of the proposed algorithm is always smaller than that of the other three algorithms on the MNIST dataset. Similar results can be obtained from Figures 7 and 8.

Figures 5–8 show that the reconstruction errors of all four algorithms decrease as the number of hidden units increases. When there is a small number of hidden units, the reconstruction error obtained by the proposed algorithm is close to that of the other three algorithms. However, as the number of hidden units increases, the superiority of the proposed algorithm appears gradually.
Figure 6: The reconstruction errors of the four algorithms after 30 iterations on the MNIST dataset (average reconstruction error, 2-norm, versus the number of hidden units).

Figure 7: The reconstruction errors of the four algorithms after 15 iterations on the toy dataset (average reconstruction error, 2-norm, versus the number of hidden units).

Figure 8: The reconstruction errors of the four algorithms after 30 iterations on the toy dataset (versus the number of hidden units).

Figure 9: The average log-likelihood of the training data for the four algorithms when there are 15 hidden units on the MNIST dataset (versus the number of parameter updating times, ∗5000).
We can also see the decreasing ratios of the average reconstruction errors of PT-10, PTEE-10, and our proposed algorithm compared with PCD on the MNIST and the toy dataset when there are the same numbers of hidden units. When the number of hidden units is 350, after 30 iterations, the reconstruction error of the proposed algorithm is 26.60% lower than that of PCD on the MNIST dataset. Under the same conditions and compared with PCD, PT-10 is 8.20% lower and PTEE-10 is 16.64% lower.

Next, a small-scale experiment with 15 hidden units is conducted on the MNIST dataset. Here the log-likelihood can be obtained by enumerating the states of the hidden units, so high accuracy can be achieved. Figure 9 shows the log-likelihood averaged over training each model 5 times.

Figure 9 shows that the likelihood of the proposed algorithm is not as good as that of the other three algorithms within the first 10,000 parameter updates. However, as the number of updates gradually increases, the likelihood of the proposed algorithm also increases and eventually becomes better than that of PTEE-10. When the number of updates reaches 30,000, the likelihood of PCD peaks and then decreases, because the number of Gibbs transitions increases and the model distribution becomes steeper and steeper. The PT-10 algorithm is a Monte Carlo method based on tempering. Its distributions are more even at higher temperatures, so it can overcome the difficulty of a steep distribution by conducting state transitions from low to high temperatures; it therefore performs better than PCD. A sketch of this mechanism follows.
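The tempering mechanism just described can be sketched as a Metropolis swap between neighboring temperature chains. This generic replica-exchange step is an illustration, not the exact procedure of [27]; the bias-free energy and the 10-level temperature ladder between 0.9 and 1 are assumptions taken from the experimental setup above.

```python
import numpy as np

rng = np.random.default_rng(0)

def free_energy(v, W):
    """Free energy of a visible state in a bias-free binary RBM."""
    return -np.sum(np.log1p(np.exp(v @ W)))

# A ladder of inverse temperatures between 0.9 and 1, as in PT-K.
betas = np.linspace(0.9, 1.0, 10)

def swap_neighbors(states, W):
    """Metropolis swap of states between neighboring temperature chains."""
    for k in range(len(betas) - 1):
        dE = free_energy(states[k], W) - free_energy(states[k + 1], W)
        # Standard replica-exchange acceptance ratio for chains k and k+1.
        if rng.random() < min(1.0, np.exp((betas[k] - betas[k + 1]) * dE)):
            states[k], states[k + 1] = states[k + 1].copy(), states[k].copy()
    return states

W = 0.01 * rng.standard_normal((6, 4))
states = [(rng.random(6) > 0.5).astype(float) for _ in betas]
states = swap_neighbors(states, W)
```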
PTEE-10 introduces a new type of move called the equienergy move, which improves the swap rates between neighboring chains to some extent. But after
[Figure: total energy consumption (EJ/bit) versus distance (m); curves: no compression, multilayer RBM network, ROI.]

affect the delay in the sensor network. We should find a more suitable normalizing parameter function for the RBM training process. Besides, the problem of finding routing path

Acknowledgments

This work is sponsored by the Fundamental Research Funds for the Central Universities (no. LGZD201502), the Natural Science Foundation of China (nos. 61403208 and 61373139), and the Research and Innovation Projects for Graduates of Jiangsu Province (no. CXZZ12 0483).
References

wireless sensor networks,” Journal of Communication, vol. 30, no. 3, pp. 48–53, 2008.

[11] S.-W. Zhou, Y.-P. Lin, and S.-T. Ye, “A kind of sensor network storage effective wavelet incremental data compression algorithm,” Journal of Computer Research and Development, vol. 46, no. 12, pp. 2085–2092, 2009.

[12] W.-H. Luo and J.-L. Wang, “Based on chain model of distributed wavelet compression algorithm,” Computer Engineering, vol. 36, no. 16, pp. 74–76, 2010.

[13] F. Xiang-Hui, L. Shi-Ning, and D. Peng-Lei, “Adaptive nondestructive data compression system of WSN,” Computer Measurement and Control, vol. 18, no. 2, pp. 463–465, 2010.

[14] N. Cai-xiang, Study on Image Data Compression Processing in Wireless Multimedia Sensor Network, Chang’an University, Xi’an, China, 2014.

[15] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[16] I. Sutskever and T. Tieleman, “On the convergence properties of contrastive divergence,” Journal of Machine Learning Research—Proceedings Track, vol. 9, pp. 789–795, 2010.

[17] T. Tieleman, “Training restricted Boltzmann machines using approximations to the likelihood gradient,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1064–1071, ACM, Helsinki, Finland, July 2008.

[18] T. Tieleman and G. E. Hinton, “Using fast weights to improve persistent contrastive divergence,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 1033–1040, ACM, June 2009.

[19] G. Desjardins, A. Courville, and Y. Bengio, “Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs,” in Neural Information Processing Systems (NIPS), MIT Press, 2010.

[20] J. Xu, H. Li, and S. Zhou, “Improving mixing rate with tempered transition for learning restricted Boltzmann machines,” Neurocomputing, vol. 139, pp. 328–335, 2014.

[21] Y. Hu, Markov chain Monte Carlo based improvements to the learning algorithm of restricted Boltzmann machines [M.S. thesis], Shanghai Jiao Tong University, Shanghai, China, 2012.

[22] Y. Bengio, A. C. Courville, and P. Vincent, Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives, Department of Computer Science and Operations Research, University of Montreal, Montreal, Canada, 2012.

[23] A. Fischer and C. Igel, “Training restricted Boltzmann machines: an introduction,” Pattern Recognition, vol. 47, no. 1, pp. 25–39, 2014.

[24] A. Fischer and C. Igel, “An empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines,” in Artificial Neural Networks—ICANN 2010: 20th International Conference, Thessaloniki, Greece, September 15–18, 2010, Proceedings, Part III, vol. 6354 of Lecture Notes in Computer Science, pp. 208–217, Springer, Berlin, Germany, 2010.

[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[26] G. Desjardins, A. Courville, and Y. Bengio, “Parallel tempering for training of restricted Boltzmann machines,” Journal of Machine Learning Research Workshop & Conference Proceedings, vol. 9, pp. 145–152, 2010.

[27] K. H. Cho, T. Raiko, and A. Ilin, “Parallel tempering is efficient for learning restricted Boltzmann machines,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN ’10), pp. 1–8, Barcelona, Spain, July 2010.

[28] N. Ji and J. Zhang, “Parallel tempering with equi-energy moves for training of restricted Boltzmann machines,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN ’14), pp. 120–127, Beijing, China, July 2014.