
FUZZY SYSTEMS AND DATA MINING II

Frontiers in Artificial Intelligence and Applications
The book series Frontiers in Artificial Intelligence and Applications (FAIA) covers all aspects of
theoretical and applied Artificial Intelligence research in the form of monographs, doctoral
dissertations, textbooks, handbooks and proceedings volumes.
The FAIA series contains several sub-series, including ‘Information Modelling and Knowledge
Bases’ and ‘Knowledge-Based Intelligent Engineering Systems’. It also includes the biennial
European Conference on Artificial Intelligence (ECAI) proceedings volumes, and other EurAI
(European Association for Artificial Intelligence, formerly ECCAI) sponsored publications. An
editorial panel of internationally well-known scholars is appointed to provide a high quality
selection.

Series Editors:
J. Breuker, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras,
R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong

Volume 293
Recently published in this series

Vol. 292. H. Jaakkola, B. Thalheim, Y. Kiyoki and N. Yoshida (Eds.), Information Modelling
and Knowledge Bases XXVIII
Vol. 291. G. Arnicans, V. Arnicane, J. Borzovs and L. Niedrite (Eds.), Databases and
Information Systems IX – Selected Papers from the Twelfth International Baltic
Conference, DB&IS 2016
Vol. 290. J. Seibt, M. Nørskov and S. Schack Andersen (Eds.), What Social Robots Can and
Should Do – Proceedings of Robophilosophy 2016 / TRANSOR 2016
Vol. 289. I. Skadiņa and R. Rozis (Eds.), Human Language Technologies – The Baltic
Perspective – Proceedings of the Seventh International Conference Baltic HLT 2016
Vol. 288. À. Nebot, X. Binefa and R. López de Mántaras (Eds.), Artificial Intelligence Research
and Development – Proceedings of the 19th International Conference of the Catalan
Association for Artificial Intelligence, Barcelona, Catalonia, Spain, October 19–21,
2016
Vol. 287. P. Baroni, T.F. Gordon, T. Scheffler and M. Stede (Eds.), Computational Models of
Argument – Proceedings of COMMA 2016
Vol. 286. H. Fujita and G.A. Papadopoulos (Eds.), New Trends in Software Methodologies,
Tools and Techniques – Proceedings of the Fifteenth SoMeT_16
Vol. 285. G.A. Kaminka, M. Fox, P. Bouquet, E. Hüllermeier, V. Dignum, F. Dignum and
F. van Harmelen (Eds.), ECAI 2016 – 22nd European Conference on Artificial
Intelligence, 29 August–2 September 2016, The Hague, The Netherlands – Including
Prestigious Applications of Artificial Intelligence (PAIS 2016)

ISSN 0922-6389 (print)
ISSN 1879-8314 (online)
Fuzzy Systems and Data Mining II
Proceedings of FSDM 2016

Edited by
Shilei Sun
International School of Software, Wuhan University, China

Antonio J. Tallón-Ballesteros
Department of Languages and Computer Systems, University of Seville, Spain

Dragan S. Pamučar
Department of Logistics, University of Defence in Belgrade, Serbia
and
Feng Liu
International School of Software, Wuhan University, China

Amsterdam • Berlin • Washington, DC


© 2016 The authors and IOS Press.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-61499-721-4 (print)
ISBN 978-1-61499-722-1 (online)
Library of Congress Control Number: 2016958585

Publisher
IOS Press BV
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail: order@iospress.nl

For book sales in the USA and Canada:


IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel.: +1 703 830 6300
Fax: +1 703 830 2300
sales@iospress.com

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS



Preface
Fuzzy Systems and Data Mining (FSDM) is an annual international conference devoted to four main groups of topics: a) fuzzy theory, algorithms and systems; b) fuzzy applications; c) the interdisciplinary field of fuzzy logic and data mining; and d) data mining. Following the great success of FSDM 2015, held in Shanghai, the second edition in the FSDM series was held in Macau, China, where experts, researchers, academics and industry participants were introduced to the latest advances in the field of fuzzy sets and data mining. Macau was declared a UNESCO World Heritage Site in 2005 by virtue of its cultural importance. The historic centre of Macau is of particular interest because of its mixture of traditional Chinese and Portuguese cultures. Macau has both Cantonese (a variant of Chinese) and Portuguese as official languages.

This volume contains the papers accepted and presented at the 2nd International Conference on Fuzzy Systems and Data Mining (FSDM 2016), held on 11–14 December 2016 in Macau, China. All papers were carefully reviewed by programme committee members and reflect the breadth and depth of the research topics that fall within the scope of FSDM. From several hundred submissions, the 81 most promising and FAIA mainstream-relevant contributions were selected for inclusion in this volume; they present original ideas, methods or results of general significance, supported by clear reasoning and compelling evidence.

FSDM 2016 was also a reference conference, and the conference programme included keynote and invited presentations as well as oral and poster contributions. The event provided a forum where more than 100 qualified and high-level researchers and experts from over 20 countries, including 4 keynote speakers, gathered to create an important platform for researchers and engineers worldwide to engage in academic communication.

We would like to thank all the keynote and invited speakers and authors for the effort they have put into preparing their contributions to the conference. We would also like to take this opportunity to express our gratitude to those people, especially the programme committee members and reviewers, who devoted their time to assessing the papers. It is an honour to continue the publication of these proceedings in the prestigious series Frontiers in Artificial Intelligence and Applications (FAIA) from IOS Press. Our particular thanks also go to J. Breuker, N. Guarino, J.N. Kok, R. López de Mántaras, J. Liu, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong, the FAIA series editors, for supporting this conference.

Last but not least, we hope that all our participants enjoyed their stay in Macau and their time at the Macau University of Science and Technology (M.U.S.T.), and that they had a magnificent experience in both places.

Antonio J. Tallón-Ballesteros
University of Seville, Spain

Contents
Preface v
Antonio J. Tallón-Ballesteros

Fuzzy Control, Theory and System

Cumulative Probability Distribution Based Computational Method for High Order
Fuzzy Time Series Forecasting 3
Sukhdev S. Gangwar and Sanjay Kumar
Introduction to Fuzzy Dual Mathematical Programming 11
Carlos A.N. Cosenza, Fabio Krykhtine, Walid El Moudani
and Felix A.C. Mora-Camino
Forecasting National Football League Game Outcomes Based on Fuzzy
Candlestick Patterns 22
Yu-Chia Hsu
A Fuzzy Control Based Parallel Filling Valley Equalization Circuit 28
Feng Ran, Ke-Wei Hu, Jing-Wei Zhao and Yuan Ji
Interval-Valued Hesitant Fuzzy Geometric Bonferroni Mean Aggregation
Operator 37
Xiao-Rong He, Ying-Yu Wu, De-Jian Yu, Wei Zhou and Sun Meng
A New Integrating SAW-TOPSIS Based on Interval Type-2 Fuzzy Sets
for Decision Making 45
Lazim Abdullah and C.W. Rabiatul Adawiyah C.W. Kamal
Algorithms for Finding Oscillation Period of Fuzzy Tensors 51
Ling Chen and Lin-Zhang Lu
Toward a Fuzzy Minimum Cost Flow Problem for Damageable Items
Transportation 58
Si-Chao Lu and Xi-Fu Wang
Research on the Application of Data Mining in the Field of Electronic
Commerce 65
Xia Song and Fang Huang
A Fuzzy MEBN Ontology Language Based on OWL2 71
Zhi-Yun Zheng, Zhuo-Yun Liu, Lun Li, Dun Li and Zhen-Fei Wang
State Assessment of Oil-Paper Insulation Based on Fuzzy Rough Sets 81
De-Hua He, Jin-Ding Cai, Song Xie and Qing-Mei Zeng
Finite-Time Stabilization for T-S Fuzzy Networked Systems with State
and Communication Delay 87
He-Jun Yao, Fu-Shun Yuan and Yue Qiao
A Trapezoidal Fuzzy Multiple Attribute Decision Making Based on Rough Sets 94
Zhi-Ying Lv, Ping Huang, Xian-Yong Zhang and Li-Wei Zheng
Fuzzy Rule-Based Stock Ranking Using Price Momentum and Market
Capitalization 102
Ratchata Peachavanish
Adaptive Fuzzy Sliding-Mode Control of Robot and Simulation 108
Huan Niu, Jie Yang and Jie-Ru Chi
Hesitant Bipolar Fuzzy Set and Its Application in Decision Making 115
Ying Han, Qi Luo and Sheng Chen
Chance Constrained Twin Support Vector Machine for Uncertain Pattern
Classification 121
Ben-Zhang Yang, Yi-Bin Xiao, Nan-Jing Huang and Qi-Lin Cao
Set-Theoretic Kripke-Style Semantics for Monoidal T-Norm (Based) Logics 131
Eunsuk Yang

Data Mining

Dynamic Itemset Mining Under Multiple Support Thresholds 141
Nourhan Abuzayed and Belgin Ergenç
Deep Learning with Large Scale Dataset for Credit Card Data Analysis 149
Ayahiko Niimi
Probabilistic Frequent Itemset Mining Algorithm over Uncertain Databases
with Sampling 159
Hai-Feng Li, Ning Zhang, Yue-Jin Zhang and Yue Wang
Priority Guaranteed and Energy Efficient Routing in Data Center Networks 167
Hu-Yin Zhang, Jing Wang, Long Qian and Jin-Cai Zhou
Yield Rate Prediction of a Dynamic Random Access Memory Manufacturing
Process Using Artificial Neural Network 173
Chun-Wei Chang and Shin-Yeu Lin
Mining Probabilistic Frequent Itemsets with Exact Methods 179
Hai-Feng Li and Yue Wang
Performance Degradation Analysis Method Using Satellite Telemetry Big Data 186
Feng Zhou, De-Chang Pi, Xu Kang and Hua-Dong Tian
A Decision Tree Model for Meta-Investment Strategy of Stock Based on Sector
Rotating 194
Li-Min He, Shao-Dong Chen, Zhen-Hua Zhang, Yong Hu
and Hong-Yi Jiang
Virtualized Security Defense System for Blurred Boundaries of Next
Generation Computing Era 208
Hyun-A. Park
Implicit Feature Identification in Chinese Reviews Based on Hybrid Rules 220
Yong Wang, Ya-Zhi Tao, Xiao-Yi Wan and Hui-Ying Cao
Characteristics Analysis and Data Mining of Uncertain Influence Based
on Power Law 226
Ke-Ming Tang, Hao Yang, Qin Liu, Chang-Ke Wang and Xin Qiu
Hazardous Chemicals Accident Prediction Based on Accident State Vector
Using Multimodal Data 232
Kang-Wei Liu, Jian-Hua Wan and Zhong-Zhi Han
Regularized Level Set for Inhomogeneity Segmentation 241
Guo-Qi Liu and Hai-Feng Li
Exploring the Non-Trivial Knowledge Implicit in Test Instance to Fully
Represent Unrestricted Bayesian Classifier 248
Mei-Hui Li and Li-Min Wang
The Factor Analysis’s Applicability on Social Indicator Research 254
Ying Xie, Yao-Hua Chen and Ling-Xi Peng
Research on Weapon-Target Allocation Based on Genetic Algorithm 260
Yan-Sheng Zhang, Zhong-Tao Qiao and Jian-Hui Jing
PMDA-Schemed EM Channel Estimator for OFDM Systems 267
Xiao-Fei Li, Di He and Xiao-Hua Chen
Soil Heavy Metal Pollution Research Based on Statistical Analysis and BP
Network 274
Wei-Wei Sun and Xing-Ping Sheng
An Improved Kernel Extreme Learning Machine for Bankruptcy Prediction 282
Ming-Jing Wang, Hui-Ling Chen, Bin-Lei Zhu, Qiang Li, Ke-Jie Wang
and Li-Ming Shen
Novel DBN Structure Learning Method Based on Maximal Information
Coefficient 290
Guo-Liang Li, Li-Ning Xing and Ying-Wu Chen
Improvement of the Histogram for Infrequent Color-Based Illustration Image
Classification 299
Akira Fujisawa, Kazuyuki Matsumoto, Minoru Yoshida and Kenji Kita
Design and Implementation of a Universal QC-LDPC Encoder 306
Qian Yi and Han Jing
Quantum Inspired Bee Colony Optimization Based Multiple Relay Selection
Scheme 312
Feng-Gang Lai, Yu-Tai Li and Zhi-Jie Shang
A Speed up Method for Collaborative Filtering with Autoencoders 321
Wen-Zhe Tang, Yi-Lei Wang, Ying-Jie Wu and Xiao-Dong Wang
Analysis of NGN-Oriented Architecture for Internet of Things 327
Wei-Dong Fang, Wei He, Zhi-Wei Gao, Lian-Hai Shan and Lu-Yang Zhao
Hypergraph Spectral Clustering via Sample Self-Representation 334
Shi-Chao Zhang, Yong-Gang Li, De-Bo Cheng and Zhen-Yun Deng
Safety Risk Early-Warning for Metro Construction Based on Factor Analysis
and BP_Adaboost Network 341
Hong-De Wang, Bai-Le Ma and Yan-Chao Zhang
The Method Study on Tax Inspection Cases-Choice: Improved Support Vector
Machine 347
Jing-Huai She and Jing Zhuo
Development of the System with Component for the Numerical Calculation
and Visualization of Non-Stationary Waves Propagation in Solids 353
Zhanar Akhmetova, Serik Zhuzbayev, Seilkhan Boranbayev
and Bakytbek Sarsenov
Infrared Image Recognition of Bushing Type Cable Terminal Based on Radon
and Fourier-Mellin Transform and BP Neural Network 360
Hai-Qing Niu, Wen-Jian Zheng, Huang Zhang, Jia Xu and Ju-Zhuo Wu
Face Recognition with Single Sample Image per Person Based on Residual
Space 367
Zhi-Bo Guo, Yun-Yang Yan, Yang Wang and Han-Yu Yuan
Cloud Adaptive Parallel Simulated Annealing Genetic Algorithm
in the Application of Personnel Scheduling in National Geographic
Conditions Monitoring 377
Juan Du, Xu Zhou, Shu Tao and Qian Liu
Quality Prediction in Manufacturing Process Using a PCA-BPNN Model 390
Hong Zhou and Kun-Ming Yu
The Study of an Improved Intelligent Student Advising System 397
Xiaosong Li
An Enhanced Identity Authentication Security Access Control Model Based
on 802.1x Protocol 407
Han-Ying Chen and Xiao-Li Liu
Recommending Entities for E-R Model by Ontology Reasoning Techniques 414
Xiao-Xing Xu, Dan-Tong Ouyang, Jie Liu and Yu-Xin Ye
V-Sync: A Velocity-Based Time Synchronization for Multi-Hop Underwater
Mobile Sensor Networks 420
Meng-Na Zhang, Hai-Yan Wang, Jing-Jie Gao and Xiao-Hong Shen
An Electricity Load Forecasting Method Based on Association Rule Analysis
Attribute Reduction in Smart Grid 429
Huan Liu and Ying-Hua Han
The Improved Projection Pursuit Evaluation Model Based on Depso Algorithm 438
Bin Zhu and Wei-Dong Jin
HRVBased Stress Recognizing by Random Forest 444
Gang Zheng, Yan-Hui Chen and Min Dai
Ricci Flow for Optimization Routing in WSN 452
Ke-Ming Tang, Hao Yang, Xin Qiu and Lv-Qing Wu
Research on the Application-Driven Architecture in Internet of Things 458
Wei-Dong Fang, Wei He, Wei Chen, Lian-Hai Shan and Feng-Ying Ma
A GOP-Level Bitrate Clustering Recognition Algorithm for Wireless Video
Transmission 466
Wen-Juan Shi, Song Li, Yan-Jing Sun, Qi Cao and Hai-Wei Zuo
The Analysis of Cognitive Image and Tourism Experience in Taiwan’s Old
Streets Based on a Hybrid MCDM Approach 476
Chung-Ling Kuo and Chia-Li Lin
A Collaborative Filtering Recommendation Model Based on Fusion
of Correlation-Weighted and Item Optimal-Weighted 487
Shi-Qi Wen, Cheng Wang, Jian-Ying Wang, Guo-Qi Zheng,
Hai-Xiao Chi and Ji-Feng Liu
A Cayley Theorem for Regular Double Stone Algebras 501
Cong-Wen Luo
ARII-eL: An Adaptive, Informal and Interactive eLearning Ontology Network 507
Daniel Burgos
Early Prediction of System Faults 519
You Li and Yu-Ming Lin
QoS Aware Hierarchical Routing Protocol Based on Signal to Interference
plus Noise Ratio and Link Duration for Mobile Ad Hoc Networks 525
Yan-Ling Wu, Ming Li and Guo-Bin Zhang
The Design and Implementation of Meteorological Microblog Public Opinion
Hot Topic Extraction System 535
Fang Ren, Lin Chen and Cheng-Rui Yang
Modeling and Evaluating Intelligent Real-Time Route Planning and Carpooling
System with Performance Evaluation Process Algebra 542
Jie Ding, Rui Wang and Xiao Chen
Multimode Theory Analysis of the Coupled Microstrip Resonator Structure 549
Ying Zhao, Ai-Hua Zhang and Ming-Xiao Wang
A Method for Woodcut Rendering from Images 555
Hong-Qiang Zhang, Shu-Wen Wang, Cong Ma and Bing-Kun Pi
Research on a Non-Rigid 3D Shape Retrieval Method Based on Global
and Partial Description 562
Tian-Wen Yuan, Yi-Nan Lu, Zhen-Kun Shi and Zhe Zhang
Virtual Machine Relocating with Combination of Energy and Performance
Awareness 570
Xiang Li, Ning-Jiang Chen, You-Chang Xu and Rangsarit Pesayanavin
Network Evolution via Preference and Coordination Game 579
En-Ming Dong, Jian-Ping Li and Zheng Xie
Sensor Management Strategy with Probabilistic Sensing Model for Collaborative
Target Tracking in Wireless Sensor Network 585
Yong-Jian Yang, Xiao-Guang Fan, Sheng-Da Wang, Zhen-Fu Zhuo,
Jian Ma and Biao Wang
Generalized Hybrid Carrier Modulation System Based M-WFRFT with Partial
FFT Demodulation over Doubly Selective Channels 592
Yong Li, Zhi-Qun Song and Xue-Jun Sha
On the Benefits of Network Coding for Unicast Application in Opportunistic
Traffic Offloading 598
Jia-Ke Jiao, Da-Ru Pan, Ke Lv and Li-Fen Sun
A Geometric Graph Model of Citation Networks with Linearly Growing
Node-Increment 605
Qi Liu, Zheng Xie, En-Ming Dong and Jian-Ping Li
Complex System in Scientific Knowledge 612
Zong-Lin Xie, Zheng Xie, Jian-Ping Li and Xiao-Jun Duan
Two-Wavelength Transport of Intensity Equation for Phase Unwrapping 618
Cheng Zhang, Hong Cheng, Chuan Shen, Fen Zhang, Wen-Xia Bao,
Sui Wei, Chao Han, Jie Fang and Yun Xia
A Study of Filtering Method for Accurate Indoor Positioning System Using
Bluetooth Low Energy Beacons 624
Young Hyun Jin, Wonseob Jang, Bin Li, Soo Jeong Kwon,
Sung Hoon Lim and Andy Kyung-yong Yoon

Subject Index 633


Author Index 637
Fuzzy Control, Theory and System
Fuzzy Systems and Data Mining II 3
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-3

Cumulative Probability Distribution Based Computational Method
for High Order Fuzzy Time Series Forecasting
Sukhdev S. GANGWAR and Sanjay KUMAR1
Department of Mathematics, Statistics & Computer Science, G. B. Pant University of
Agriculture & Technology, Pantnagar-263145, Uttarakhand, India

Abstract. The issues of deciding the interval length, computing complicated fuzzy logical relations and finding an apposite defuzzification process have been important areas of research in fuzzy time series forecasting since its inception. In the present study, a cumulative probability distribution based computational scheme with a discretized universe of discourse is proposed for fuzzy time series forecasting. In this study, the cumulative probability distribution decides the lengths of the intervals using the characteristics of the data distribution, while the proposed computational algorithm minimizes the calculation of complex fuzzy logical relations and the search for a suitable defuzzification method. To verify the enhancement in forecasting accuracy of the developed model, it is applied to the benchmark problem of forecasting the historical student enrollments of the University of Alabama. The accuracy of the forecasted enrollments of the developed model is also compared with that of various other methods using different error measures. Coefficients of correlation and determination are used to determine the strength of association between the forecasted and actual enrollments.

Keywords. Fuzzy time series, probability distribution, computational method, forecasting

Introduction

Multiple regression based parametric models (autoregression, moving-average, ARMA, ARIMA, etc.) are comprehensive statistical techniques used for forecasting. An important limitation of these parametric forecasting models is that they do not tackle the uncertainty in time series data that arises from imprecision and vagueness. Song and Chissom [1, 2, 3] integrated the fuzzy set theory of Zadeh [4] with time series forecasting and developed several fuzzy time series (FTS) forecasting models to handle the uncertainty in historical time series data and to forecast the enrollments of the University of Alabama. Chen [5] and Hwang et al. [6] used simple arithmetic operations and the variation of enrollments between the current and previous year to develop FTS forecasting methods that are more efficient than the ones presented by Song and Chissom [1, 2, 3]. Own and Yu [7] proposed a high order forecasting model to address the limitations of the model developed by Chen [5].

1
Corresponding Author: Sanjay KUMAR, Department of Mathematics, Statistics & Computer Science,
G. B. Pant University of Agriculture & Technology, Pantnagar-263145, Uttarakhand, India; E-mail:
skruhela@hotmail.com.

Wong et al. [8] utilized the window size of FTS to propose a time variant forecasting model. The performance of this model was tested using time series data of the enrollments of the University of Alabama and the TAIEX. Chi et al. [9] used the K-means clustering technique to discretize the universe of discourse and proposed an enhanced fuzzy time series forecasting model. Chen and Tanuwijaya [10] presented new methods to handle forecasting problems using high-order fuzzy logical relationships and automatic clustering techniques.

Cheng et al. [11] discretized the universe of discourse (UD) using the minimum entropy principle and used trapezoidal membership functions to enhance the accuracy of FTS forecasting. Huarng and Yu [12] used a ratio-based method to identify the lengths of the intervals in fuzzy time series forecasting, which was further enhanced by Yolcu et al. [13] using a single-variable constrained optimization technique. Teoh et al. [14] used the cumulative probability distribution approach (CPDA) with rough set rule induction and proposed a hybrid FTS model. Su et al. [15] used MEPA, CPDA and a rough set algorithm to develop a new model for FTS forecasting.

The fuzzy relational equation and a suitable defuzzification process are the pivotal components of any fuzzy time series forecasting method. To minimize the time spent generating fuzzy relational equations through the complex min-max composition operation, and to eliminate the search for a suitable defuzzification process, Singh [16, 17, 18] proposed various computational methods using difference parameters as the fuzzy relation for FTS forecasting. Joshi and Kumar [19] also presented a computational method using the third order difference as the fuzzy relation. To enhance the performance of computational FTS forecasting methods, Gangwar and Kumar [20] developed a computational algorithm using high order difference parameters and implemented it in a discretized universe of discourse. Intuitionistic fuzzy sets (IFS) were used with CPDA by Gangwar and Kumar [21] to introduce hesitation in FTS forecasting with unequal intervals.

The UD in all these computational methods was partitioned into intervals of equal length. In some cases, the discretization of the universe of discourse into equal length intervals may not give a correct classification of the time series data. The motivation and intention of this study is to present a computational method using high order difference parameters as the fuzzy relation with a discretized UD in which the lengths of the intervals are optimized using CPDA. The proposed algorithm eliminates the time spent constructing relational equations with tedious min-max composition operations and the defuzzification process. The developed FTS forecasting method has been applied to the benchmark problem of forecasting the student enrollment data of the University of Alabama and compared with other recent methods proposed by various researchers.

1. Some Basic Concepts of Fuzzy Time Series

Let U = {u_1, u_2, u_3, ..., u_n} be a UD. A fuzzy set Ã_i of U is defined as follows:

    Ã_i = μ_{Ã_i}(u_1)/u_1 + μ_{Ã_i}(u_2)/u_2 + μ_{Ã_i}(u_3)/u_3 + ... + μ_{Ã_i}(u_n)/u_n

Here μ_{Ã_i} is the membership function of the fuzzy set Ã_i and assigns to each element of U a value in [0, 1]; μ_{Ã_i}(u_k) (1 ≤ k ≤ n) is the grade of membership of u_k in Ã_i. Suppose fuzzy sets f_i(t) (i = 1, 2, ...) are defined in the universe of discourse Y(t). If F(t) is the collection of the f_i(t), then F(t) is known as a fuzzy time series on Y(t) [1]. F(t) and Y(t) depend upon t and hence both are functions of time. If only F(t-1) causes F(t), i.e. F(t-1) → F(t), then the relationship is denoted by the fuzzy relational equation F(t) = F(t-1) ∘ R(t, t-1) and is called the first-order model of F(t) ("∘" is the max-min composition operator). If more than one fuzzy set F(t-n), F(t-n+1), ..., F(t-1) causes F(t), then the relationship is called an nth order fuzzy time series model [1, 2].

2. Proposed FTS Method and Computational Algorithm

The proposed FTS method uses CPDA to discretize the UD. It uses the ratio formula [20] to determine the number of partitions. The order of the difference parameter used in the forecast is computed as follows:
• For the year 1973 enrollment forecast, the proposed computational method uses the second order difference parameter D_2 = |E_2 - E_1|.
• For the year 1974 enrollment forecast, the proposed computational method uses the third order difference parameter D_3 = |E_3 - E_2| + |E_2 - E_1|.
• For the year 1975 enrollment forecast, the proposed computational method uses the fourth order difference parameter D_4 = |E_4 - E_3| + |E_3 - E_2| + |E_2 - E_1|.
The ith order difference parameter is defined as follows:

    D_i = |E_i - E_{i-1}| + [ Σ_{c=1}^{i-1} |E_{i-c} - E_{i-(c+1)}| ] - |E_1 - E_0|,   2 ≤ i ≤ N    (1)

Here, N is the number of observations in each partition.
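As a concrete illustration, the sketch below computes D_i for a partition whose enrollments are held in a Python list E, with E[0] playing the role of E_1; under the reading of Eq. (1) given above, the trailing term |E_1 - E_0| cancels the last term of the sum, so D_i reduces to the cumulative sum of absolute first differences up to E_i. This is an illustrative sketch, not code from the paper.

```python
def difference_parameter(E, i):
    """i-th order difference parameter of Eq. (1):
    D_i = sum_{k=2}^{i} |E_k - E_{k-1}|  (E[0] corresponds to E_1)."""
    return sum(abs(E[k] - E[k - 1]) for k in range(1, i))

# Example with the first enrollments of Table 1 (E_1=13055, E_2=13563, E_3=13867):
# difference_parameter([13055, 13563, 13867], 2) -> 508
# difference_parameter([13055, 13563, 13867], 3) -> 812
```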


The methodology of the proposed computational algorithm based FTS forecasting method is explained in the following steps:
Step 1 Since a normal distribution is an essential requirement for CPDA, we use the Lilliefors test of Dallal and Wilkinson [22] to verify whether the time series data follow a normal distribution or not. If the time series data follow a normal distribution, go to step 2.
Step 2 The standard deviation (σ) is the main characteristic of the normal distribution and is used to define the universe of discourse, U = [E_min - σ, E_max + σ].
Step 3 U is discretized into n intervals. The lengths of these intervals are determined using CPDA in the following sub-steps:
1. Calculate both the lower (P_LB) and upper (P_UB) bounds of the cumulative probabilities using the following equations:

    P_LB^1 = 0,   P_LB^i = (2i - 3)/(2n),  2 ≤ i ≤ n    (2)

    P_UB^i = 1,   i = n    (3)

2. Calculate the boundaries of each interval using the inverse of the following normal cumulative distribution function (CDF), with parameters mean (c) and standard deviation (σ), at the corresponding probabilities in P:

    x = F^{-1}(P | c, σ) = {x : F(x | c, σ) = P}    (4)

and

    P = F(x | c, σ) = (1/(σ√(2π))) ∫_{-∞}^{x} exp( -(x - c)^2 / (2σ^2) ) dx    (5)
Step 4 Construct the triangular fuzzy sets Ã_i in accordance with the intervals constructed in step 3.
Step 5 Fuzzify the observations of the time series by choosing the maximum membership grade and set up the fuzzy logical relationships.
Step 6 Use the ratio formula [20] for repartitioning the time series into different partitions.
Step 7 Apply the following computational algorithm.
For a fuzzy logical relation Ã_i → Ã_j, Ã_i and Ã_j are the fuzzified enrollments of the current and the next year; E_i and F_j are the actual enrollment of the current year and the crisp forecasted enrollment of the next year.
Computational algorithm: The forecasted enrollments of the University of Alabama are computed using the following computational algorithm, whose complexity is of linear order. The algorithm uses the difference parameters (D_i) of various orders and the lower and upper bounds of the intervals. For a fuzzy logical relation Ã_i → Ã_j, it uses the mid points of the intervals u_i and u_j having supremum value in Ã_i and Ã_j. The algorithm starts forecasting with the enrollment for year 1973 in partition 1, 1981 in partition 2 and 1988 in partition 3, using the second order difference parameter. In the following computational algorithm, [*Ã_j] is the interval u_j for which the membership in Ã_j is supremum (i.e. 1), L[*Ã_j] and U[*Ã_j] are the lower and upper bounds of the interval u_j respectively, and l[*Ã_j] and M[*Ã_j] are the length and mid point of the interval u_j whose membership in Ã_j is supremum (i.e. 1).

For i = 2, 3, ..., N (number of observations in the partition)
    Obtain the fuzzy logical relation for year i to year i+1:  Ã_i → Ã_j
    P = 0 and Q = 0
    Compute
        D_i = |E_i - E_{i-1}| + [ Σ_{c=1}^{i-1} |E_{i-c} - E_{i-(c+1)}| ] - |E_1 - E_0|
    For a = 2, 3, ..., i
        F_ia = M[*Ã_i] + D_i / (2(a-1))
        FF_ia = M[*Ã_i] - D_i / (2(a-1))
        If F_ia ≥ L[*Ã_j] and F_ia ≤ U[*Ã_j]
            Then P = P + F_ia and Q = Q + 1
        If F_ia ≥ M[*Ã_j]
            Then P = P + l[*Ã_j] / (2(i-1) (2(a-1))^2)
            Else P = P - l[*Ã_j] / (2(i-1) (2(a-1))^2)
        If FF_ia ≥ L[*Ã_j] and FF_ia ≤ U[*Ã_j]
            Then P = P + FF_ia and Q = Q + 1
        If FF_ia ≥ M[*Ã_j]
            Then P = P + l[*Ã_j] / (2(i-1) (2(a-1))^2)
            Else P = P - l[*Ã_j] / (2(i-1) (2(a-1))^2)
    Next a
    F_j = (P + M[*Ã_j]) / (Q + 1)
Next i
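The following Python sketch restates the forecasting step above for a single fuzzy logical relation Ã_i → Ã_j. The argument names (mid_i, low_j, up_j, mid_j, len_j for M[*Ã_i], L[*Ã_j], U[*Ã_j], M[*Ã_j] and l[*Ã_j]) are illustrative, and the difference parameter is computed in its telescoped form; this is a sketch of the algorithm as reconstructed here, not code from the paper.

```python
def forecast_step(E, i, mid_i, low_j, up_j, mid_j, len_j):
    """Crisp forecast F_j for the relation A_i -> A_j (one pass of the algorithm)."""
    D_i = sum(abs(E[k] - E[k - 1]) for k in range(1, i))   # difference parameter D_i
    P, Q = 0.0, 0
    for a in range(2, i + 1):
        step = D_i / (2 * (a - 1))
        for F in (mid_i + step, mid_i - step):             # F_ia and FF_ia
            if low_j <= F <= up_j:                         # falls inside interval u_j
                P += F
                Q += 1
            correction = len_j / (2 * (i - 1) * (2 * (a - 1)) ** 2)
            P += correction if F >= mid_j else -correction
    return (P + mid_j) / (Q + 1)
```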
We use the root mean square error (RMSE) and the average forecasting error rate (AFER) to compare the forecasting results of different forecasting methods. The coefficients of correlation and determination are used to determine the strength of association between the actual and forecasted enrollments of the University of Alabama.
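For reference, the standard definitions of these two error measures, which the paper itself does not restate, are (with A_t the actual enrollment, F_t the forecasted enrollment and n the number of forecasted values):

    RMSE = √( (1/n) Σ_{t=1}^{n} (A_t - F_t)^2 ),   AFER = (1/n) Σ_{t=1}^{n} |A_t - F_t| / A_t × 100%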

3. Experimental Study

In this section, the proposed method is applied to forecast enrollments at the University of Alabama. An online Lilliefors calculator confirms that the time series data obey a normal distribution. E_min and E_max are observed from the actual enrollments at the University of Alabama (Table 1). The UD is defined as U = [E_min - σ, E_max + σ] and is approximately equal to [11280, 21112]. The UD is further discretized into seven unequal intervals. Both P_LB and P_UB for each interval are computed using Eqs. (2)–(5) given in Section 2. Seven fuzzy sets Ã_1, Ã_2, Ã_3, ..., Ã_7 are defined on the UD. The time series data are discretized into three partitions using the ratio formula [20]. Finally, the computational algorithm described in Section 2 is applied to each partition to compute the forecasted enrollments of the University of Alabama. The forecasted enrollments are presented in Table 1. Tables 2a and 2b show the RMSE and AFER of the forecasted enrollments.
Table 1. Actual and forecasted enrollments of the University of Alabama from year 1971 to year 1992.

Year   Actual   Forecasted     Year   Actual   Forecasted
1971   13055    -              1982   15433    15502
1972   13563    -              1983   15497    15332
1973   13867    13993          1984   15145    15332
1974   14696    14392          1985   15163    15332
1975   15460    15209          1986   15984    -
1976   15311    15332          1987   16859    -
1977   15603    15332          1988   18150    18478
1978   15861    15875          1989   18970    19356
1979   16807    -              1990   19328    19356
1980   16919    -              1991   19337    19356
1981   16388    16696          1992   18876    19356

4. Results and Discussions

In order to compare the performance of the proposed fuzzy time series forecasting method, it has been applied to forecast the enrollments of the University of Alabama. The RMSE and AFER of the enrollments forecasted by the proposed method are observed to be 240.20 and 1.183 respectively (Tables 2a and 2b), which are lower than those of the methods proposed by Liu [23], Cheng et al. [24], Wong et al. [8], Egrioglu [25], Singh [18], Joshi and Kumar [19], Gangwar and Kumar [21], Chen and Qu [26], and Gangwar and Kumar [20]. The reduced RMSE and AFER confirm that the proposed CPDA and computational algorithm based FTS forecasting method outperforms the methods given in [23, 24, 8, 25, 18, 19, 21, 26, 20]. The coefficient of correlation (R) and the coefficient of determination (R²) between actual and forecasted enrollments were observed to be 0.994294 and 0.988622, which confirms the strong association between the actual and forecasted enrollments.
Table 2a: Comparison of the proposed method in terms of error measures

        Proposed   [23]     [8]     [18]    [20]
RMSE    240.2      328.78   297.2   308.7   642.6
AFER    1.183      1.32     1.52    1.53    2.97

Table 2b: Comparison of the proposed method in terms of error measures

        Proposed   [24]    [25]    [26]    [21]   [19]
RMSE    240.2      478.4   484.6   440.6   251    419
AFER    1.183      2.40    2.21    2.06    1.27   2.07

5. Conclusions

This study proposes a cumulative probability distribution and computational approach based method for high order FTS forecasting to enhance forecasting accuracy. The fusion of the cumulative probability distribution with the computational method yields a hybrid fuzzy time series model. The computational algorithm based FTS forecasting methods reviewed in the literature use intervals of equal length and keep the order of the difference parameters fixed. The major advantages of this FTS forecasting method are: (i) it uses a computational algorithm whose complexity is of linear order, together with a partitioning mechanism of the UD, so that forecasting time series data with a large number of observations is not a matter of concern; (ii) it uses CPDA to determine the lengths of the intervals used in forecasting; (iii) it reduces the intricate computation of fuzzy relational matrices and eliminates the need for a defuzzification method.

Even though the fusion of CPDA with the computational approach in a partitioned environment enhances the accuracy of the forecasted output, the proposed method has the following limitations.
1. It cannot be applied to time series data that do not follow a normal distribution.
2. The time series data are partitioned using the ratio ρ = (E_max + E_min) / (2(E_max - E_min)). If 0 < ρ ≤ 1, there is no partitioning of the time series data; in this case the difference parameters grow heavily and make the computation very complex.
3. If ρ ≥ N/2 (N = number of observations in the time series data), there will not be enough observations in the partitions for subsequent forecasting.
However, some preprocessing techniques can be explored to make time series data approximately normally distributed in order to address the limitation of non-normally distributed time series data. There is also scope to explore the proposed method with the well-known k-means or any other exclusive clustering technique for partitioning the time series data rather than using the ratio formula.

References

[1] Q. Song, B. S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems, 54(1993), 269-277.
[2] Q. Song, B. S. Chissom, Forecasting enrollments with fuzzy time series - Part I, Fuzzy Sets and Systems,
54(1993), 1-9
[3] Q. Song, B. S. Chissom, Forecasting enrollments with fuzzy time series - Part II., Fuzzy Sets and Systems,
64(1994), 1-8.
[4] L. A. Zadeh, Fuzzy set, Information and Control, 8(1965), 338-353.
[5] S. M. Chen, Forecasting enrollments based on fuzzy time series, Fuzzy Sets and Systems, 81(1996), 311-
319.
[6] J. R. Hwang, S. M. Chen, C. H. Lee, Handling forecasting problem using fuzzy time series, Fuzzy Sets
and Systems, 100(1998), 217-228.
[7] C. M. Own, P. T. Yu, Forecasting fuzzy time series on a heuristic high-order model, Cybernetics and
Systems: An International Journal, 36(2005), 705-717.
[8] W. K. Wong, E. Bai, A. W. C. Chu, Adaptive time variant models for fuzzy time series forecasting. IEEE
Transaction on Systems, Man and Cybernetics-Part B: Cybernetics, 40(2010), 1531-1542.
[9] K. Chi, F. P. Fu and W. G. Chen, A novel forecasting model of fuzzy time series based on K-means
clustering, IWETCS, IEEE, 2010, 223–225.
[10] S. M. Chen, K. Tanuwijaya, Fuzzy forecasting based on high-order fuzzy logical relationships and
automatic clustering techniques, Expert Systems with Applications, 38(2011), 15425-15437.
[11] C. H. Cheng, R. J. Chang, C. A. Yeh, Entropy-based and trapezoid fuzzification based fuzzy time series
approach for forecasting IT project cost, Technological Forecasting and Social Change, 73(2006), 524-
542.
[12] K. Huarng, T. H. K. Yu, Ratio-Based Lengths Of Intervals To Improve Fuzzy Time Series Forecasting,
IEEE Transactions on Systems, Man and Cybernetics-Part B: Cybernetics, 36(2006), 328–40.
[13] U. Yolcu, E. Egrioglu, V. R. Uslu, M. A. Basaran, C. H. Aladag, A new approach for determining the
length of intervals for fuzzy time series, Applied Soft Computing, 9(2009), 647-651.
[14] H. J. Teoh, C. H. Cheng, H. H. Chu, J. S. Chen, Fuzzy Time Series Model Based on Probabilistic
Approach and Rough Set Rule Induction for Empirical Research in Stock Markets, Data & Knowledge
Engineering, 67(2008), 103–17.
[15] C. H. Su, T. L. Chen, C. H. Cheng, Y. C. Chen, Forecasting the Stock Market with Linguistic Rules
Generated from the Minimize Entropy Principle and the Cumulative Probability Distribution
Approaches, Entropy, 12(2010), 2397-417.
[16] S. R. Singh, A robust method of forecasting based on fuzzy time series, Applied Mathematics and
Computation, 188(2007), 472-484.
[17] S. R. Singh, A simple time variant method for fuzzy time series forecasting, Cybernetics and Systems:
An International Journal, 38(2007), 305-321.
[18] S. R. Singh, A computational method of forecasting based on fuzzy time series, Mathematics and
Computers in Simulation, 79(2008), 539-554
[19] B. P. Joshi, S. Kumar, A Computational method for fuzzy time series forecasting based on difference
parameters, International Journal of Modeling, Simulation and Scientific Computing, 4(2013),
1250023-1250035.
[20] S. S. Gangwar, S. Kumar, Partitions based computational method for high-order fuzzy time series
forecasting, Expert Systems with Applications, 39(2012), 12158-12164.
[21] S. S Gangwar, S. Kumar, Probabilistic and intuitionistic fuzzy sets based method for fuzzy time series
forecasting, Cybernetics and Systems, 45(2014), 349-361.
[22] G. E. Dallal, L. Wilkinson, An Analytic Approximation to the Distribution of Lilliefors’s Test for
Normality, The American Statistician, 40(1986), 294-296.
[23] H. T. Liu, An improved fuzzy time series forecasting method using trapezoidal fuzzy numbers, Fuzzy
Optimization and Decision Making, 6(2007), 63-80.
[24] C. H. Cheng, J. W. Wang, G. W. Cheng, Multi-attribute fuzzy time series method based on fuzzy
clustering, Expert Systems with Applications, 34(2008), 1235-1242.
[25] E. Egrioglu, A new time-invariant fuzzy time series forecasting method based on genetic algorithm,
Advances in Fuzzy Systems, 2012, 2.
[26] G. Chen, H. W. Qu, A new forecasting method of fuzzy time series model, Control and Decision,
28(2013), 105-109.
Fuzzy Systems and Data Mining II 11
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-11

Introduction to Fuzzy Dual Mathematical Programming
Carlos A. N. COSENZA a, Fabio KRYKHTINE a, Walid El MOUDANI b
and Felix A. C. MORA-CAMINO c,1
a Lab Fuzzy, COPPE, Universidade Federal do Rio de Janeiro, Centro de Tecnologia,
Ilha do Fundão, CEP 21941-594 Rio de Janeiro, RJ, Brazil
b Doctorate School of Sciences and Technologies, Lebanese University, Tripoli-Al Koubba, Lebanon
c ENAC, Toulouse University, 7 avenue Edouard Belin, 31055 Toulouse, France

Abstract. In this communication, the formulation of optimization problems with fuzzy dual parameters and variables is introduced to cope with parametric or implementation uncertainties. It is shown that fuzzy dual programming problems generate finite sets of deterministic optimization problems, making it possible to assess the range of the solutions and of the resulting performance at an acceptable computational effort.

Keywords. fuzzy dual numbers, fuzzy dual calculus, optimization, mathematical programming

Introduction

In general, optimization problems implicitly assume that their parameters (cost coefficients, limit values for decision variables, boundary levels for constraints) are perfectly known, while for real problems this is very often not exactly the case [1]. Different approaches have been proposed in the literature to cope with this difficulty. A first approach has been to perform numerical post-optimization sensitivity analysis around the nominal optimal solution [2]. When some probabilistic information about the values of the uncertain parameters is available, stochastic optimization techniques [3] may provide the most expected optimal solution. When these parameters are only known to remain within some intervals, robust optimization techniques [4] have been developed to provide robust solutions. The fuzzy formalism has also been considered in this case as an intermediate approach to represent the parameter uncertainties and provide fuzzy solutions [5]. These different approaches generally result in a very large amount of computation, which makes them practically unfeasible.

In this communication, a new formalism based on fuzzy dual numbers is proposed to diminish the computational burden when dealing with uncertainty in mathematical programming problems.
The adopted formalism considers fuzzy dual numbers, which have been introduced recently by two of the authors [6] and which can be seen as a simplified version of fuzzy numbers adopting some elements of classical dual number calculus [7] and [8]. Indeed, the proposed special class of numbers, fuzzy dual numbers, integrates the nilpotent operator ε of dual number theory while considering symmetrical fuzzy numbers. Uncertain values are then characterized by only three parameters: a mean value, an uncertainty interval and a shape parameter.

In this communication, the elements of fuzzy dual calculus useful to tackle the proposed issue are introduced first: the basic operations as well as the strong and weak fuzzy dual partial orders and the fuzzy dual equality. Then two classes of fuzzy dual mathematical programming problems are considered: those where the uncertainty lies only in the parameters of the problem and those for which the implementation of the solution is subject to uncertainty. In both situations, the proposed formalism is developed and used to identify the expected performance of the solutions.

1 Corresponding Author: Felix A. C. MORA-CAMINO; ENAC, Toulouse University, 7 avenue Edouard Belin, 31055 Toulouse, France; E-mail: felix.mora@enac.fr

1. Fuzzy Dual Numbers

The set of fuzzy dual numbers is the set Δ̃ of numbers of the form u = a + εb, with a ∈ R and b ∈ R+, where r(u) = a is the primal part and d(u) = b is the dual part of the fuzzy dual number.
A crisp fuzzy dual number is one in which b is equal to zero, losing its fuzzy dual attribute. To each fuzzy dual number a + εb is attached a symmetrical fuzzy number whose membership function μ is such that:

    μ(x) = 0  if x ≤ a - b or x ≥ a + b,
    μ(x) = μ(2a - x)  for x ∈ [a - b, a + b]    (1)

where μ is an increasing function between a - b and a with μ(a) = 1.
1.1. Operations with Fuzzy Dual Numbers

Different basic operations can be defined on Δ̃ [9]. First, the fuzzy dual addition, written +̃, is given by:

    (x_1 + ε y_1) +̃ (x_2 + ε y_2) = (x_1 + x_2) + ε (y_1 + y_2)    (2)

where the neutral element of the fuzzy dual addition is (0 + 0ε), written 0̃.
Then the fuzzy dual product, written ×̃, is given by:

    (x_1 + ε y_1) ×̃ (x_2 + ε y_2) = x_1 x_2 + ε (|x_1| y_2 + |x_2| y_1)    (3)

The fuzzy dual product has been chosen here in a way that preserves the fuzzy interpretation of the dual part of the fuzzy dual numbers, so it differs from the product of dual calculus. The neutral element of the fuzzy dual multiplication is (1 + 0ε), written 1̃.
It is easy to check that internal operations such as the fuzzy dual addition and the fuzzy dual multiplication are commutative and associative. The fuzzy dual multiplication is distributive with respect to the fuzzy dual addition since, according to Eq. (3), the operator ε is such that:

    ε ×̃ ε = 0̃    (4)

Compared with common fuzzy calculus, fuzzy dual calculus appears to be much less demanding in computer resources [10] and [11].
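A minimal Python sketch of these two operations is given below. The class name is illustrative, and reading Eq. (3) with absolute values on the primal parts, which keeps the dual part non-negative and is consistent with the pseudo scalar product of Eq. (7) below, is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuzzyDual:
    """Fuzzy dual number a + eps*b with primal part a and dual part b >= 0."""
    a: float
    b: float

    def __add__(self, other):
        # Fuzzy dual addition, Eq. (2)
        return FuzzyDual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # Fuzzy dual product, Eq. (3); |.| keeps the dual part non-negative
        return FuzzyDual(self.a * other.a,
                         abs(self.a) * other.b + abs(other.a) * self.b)

# Example: (2 + eps*0.5) * (3 + eps*1.0) -> 6 + eps*(2*1.0 + 3*0.5) = 6 + eps*3.5
```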

1.2. Fuzzy Dual Vectors

Let E be a Euclidean space of dimension p over R; then we define the set of fuzzy dual vectors Ẽ as the set of pairs of vectors taken from the Cartesian product E × E+, where E+ is the positive half-space of E. Basic operations can be defined over Ẽ:
Addition:

    (a + ε b) + (c + ε d) = (a + c) + ε (b + d),   a, c ∈ E, b, d ∈ E+    (5)

Multiplication by a fuzzy dual scalar λ + ε μ:

    (λ + ε μ)(a + ε b) = λ a + ε (|λ| b + μ |a|),   λ + ε μ ∈ Δ̃, a + ε b ∈ Ẽ    (6)

A pseudo scalar product is defined by:

    u * v = r(u) · r(v) + ε ( |r(u)| · d(v) + d(u) · |r(v)| ),   ∀ u, v ∈ Ẽ    (7)

where "*" represents the inner product in Ẽ, "·" represents the inner product in E and the absolute values are taken component-wise.

2. Fuzzy Dual Inequalities

With the objective of making possible the comparison of fuzzy dual numbers as well as the identification of extremum values between fuzzy dual numbers, a new operator from Δ̃ to R+, called the fuzzy dual pseudo norm, is introduced.

2.1. Fuzzy dual pseudo norm

Let us introduce the proposed operator:

    ∀ a + ε b ∈ Δ̃ :  |a + ε b|_D = |a| + ρ b ∈ R+    (8)

where ρ is a shape parameter associated with the considered fuzzy dual number, given by:

    ρ = (1/(2b)) ∫_{x∈R} μ(x) dx ∈ [0, 1]    (9)

In the case of fuzzy dual numbers with symmetrical triangular membership functions, ρ = 1/2, while for crisp fuzzy dual numbers, ρ = 0. In this paper it is supposed that the considered fuzzy dual numbers have the same shape, i.e. a common ρ value.
It is straightforward to establish that the operator defined in Eq. (8), whatever the value of the shape parameter, satisfies the characteristic properties of a norm:

    ∀ a + ε b ∈ Δ̃ :  |a + ε b|_D ≥ 0    (10)

    ∀ a ∈ R, b ∈ R+ :  |a + ε b|_D = 0 ⇒ a = b = 0    (11)

    |(a + ε b) + (α + ε β)|_D ≤ |a + ε b|_D + |α + ε β|_D,   ∀ a, α ∈ R, b, β ∈ R+    (12)

    |λ (a + ε b)|_D = |λ| |a + ε b|_D,   ∀ a, λ ∈ R, b ∈ R+    (13)

However, since the set of fuzzy dual numbers Δ̃ is not a vector space, the proposed operator can only be regarded as a pseudo norm.
The fuzzy dual pseudo norm of a fuzzy dual vector u can be introduced as (here ‖·‖ is the Euclidean norm associated with E):

    |u|_D = ‖r(u)‖ + ρ ‖d(u)‖    (14)

2.2. Strong and weak fuzzy dual inequalities

Partial orders between fuzzy dual numbers can be introduced using this pseudo norm. Depending on whether the fuzzy dual numbers overlap or not, strong and weak partial orders can be introduced.
A strong fuzzy dual partial order, written ≥_s, is defined over Δ̃ by:

    ∀ a_1 + ε b_1, a_2 + ε b_2 ∈ Δ̃ :  a_1 + ε b_1 ≥_s a_2 + ε b_2  ⟺  a_1 - ρ b_1 ≥ a_2 + ρ b_2    (15)

In that case there is no overlap between the membership functions associated with the two fuzzy dual numbers and the first one is definitely larger than the second one.
A weak fuzzy dual partial order, written ≥_w, is defined over Δ̃ by:

    ∀ a_1 + ε b_1, a_2 + ε b_2 ∈ Δ̃ :  a_1 + ε b_1 ≥_w a_2 + ε b_2
    ⟺  a_1 + ρ b_1 ≥ a_2 + ρ b_2 ≥ a_1 - ρ b_1 ≥ a_2 - ρ b_2    (16)

In that case there is an overlap between the membership functions associated with the two fuzzy dual numbers and the first one appears to be only partially larger than the second one.
A fuzzy dual equality, written ≈, can be defined between two fuzzy dual numbers by:

    ∀ a_1 + ε b_1, a_2 + ε b_2 ∈ Δ̃ :  a_1 + ε b_1 ≈ a_2 + ε b_2
    ⟺  a_2 ∈ [a_1 - ρ b_1, a_1 + ρ b_1] and a_1 ∈ [a_2 - ρ b_2, a_2 + ρ b_2]    (17-a)

    ⟺  a_1 + ρ b_1 ≥ a_2 + ρ b_2 ≥ a_2 - ρ b_2 ≥ a_1 - ρ b_1
        or a_2 + ρ b_2 ≥ a_1 + ρ b_1 ≥ a_1 - ρ b_1 ≥ a_2 - ρ b_2    (17-b)

In this last case there is a complete overlap of the membership functions associated with the two fuzzy dual numbers.
Then, when considering two fuzzy dual numbers, they will be in one of the above situations (no overlap, partial overlap or full overlap): strong fuzzy dual inequality, weak fuzzy dual inequality or fuzzy dual equality.
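The sketch below illustrates the pseudo norm of Eq. (8) and the resulting three-way comparison, using the inequalities as reconstructed above; the function names and the default triangular shape parameter ρ = 0.5 are assumptions of this illustration.

```python
def pseudo_norm(a, b, rho=0.5):
    """Fuzzy dual pseudo norm |a + eps*b|_D = |a| + rho*b, Eq. (8)."""
    return abs(a) + rho * b

def classify(a1, b1, a2, b2, rho=0.5):
    """Relation between a1 + eps*b1 and a2 + eps*b2 per Eqs. (15)-(17)."""
    lo1, hi1 = a1 - rho * b1, a1 + rho * b1
    lo2, hi2 = a2 - rho * b2, a2 + rho * b2
    if lo1 >= hi2:
        return "strong >="                        # no overlap, Eq. (15)
    if lo2 >= hi1:
        return "strong <="
    if hi1 >= hi2 >= lo2 >= lo1 or hi2 >= hi1 >= lo1 >= lo2:
        return "fuzzy dual equality"              # complete overlap, Eq. (17-b)
    return "weak >=" if hi1 >= hi2 else "weak <="  # partial overlap, Eq. (16)
```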

2.3. Extremum operators

The max and min operators over two or more fuzzy dual numbers can now be defined. Let c + ε γ be the fuzzy dual maximum of the fuzzy dual numbers a + ε α and b + ε β:

    c + ε γ = max{a + ε α, b + ε β}    (18)

then:

    c = max{a, b}    (19.a)

    γ = max{a + ρ α, b + ρ β} - max{a, b}    (19.b)

Let d + ε δ be the fuzzy dual minimum of the fuzzy dual numbers a + ε α and b + ε β:

    d + ε δ = min{a + ε α, b + ε β}    (20)

then:

    d = min{a, b}    (21.a)

    δ = min{a + ρ α, b + ρ β} - min{a, b}    (21.b)

Observe that here the max and min operators produce new fuzzy dual numbers.

3. Mathematical Programming with Fuzzy Dual Parameters

Here we introduce the fuzzy dual formulation of uncertain mathematical programming problems.

3.1. Discussion

To illustrate the proposed approach, the case of a linear programming problem with real variables in which all parameters are uncertain and described by fuzzy dual numbers is considered. The proposed approach can easily be extended to integer mathematical programming problems, to nonlinear mathematical programming problems, or to problems with different types of level constraints.
Let us then define formally problem L̃ given by:

    min_{x ∈ R+^n}  | Σ_{i=1}^{n} c̃_i x_i |    (22)

under the constraints:

    Σ_{i=1}^{n} ã_ki x_i ≥ b̃_k,   k ∈ {1, ..., m}    (23)

and

    x_i ∈ R+,   i ∈ {1, ..., n}    (24)

where the coefficients ã_ki, b̃_k, c̃_i are uncertain parameters.

When the problem is a constrained cost minimization problem, the cost parameters c̃_i, although uncertain, remain positive and the absolute value operator can be removed from the expression in Eq. (22). Here the fuzzy dual hypothesis is adopted for the cost coefficients c_i, the technical parameters a_ki and the constraint levels b_k. This opens different perspectives when dealing with the parameter uncertainty. Three different cases are considered here:
• the nominal case (a standard deterministic linear programming problem), in which the dual parts of the parameters are zero;
• the pessimistic case, where uncertainty adds to the cost and the constraints are strong ones;
• the optimistic case, where uncertainty subtracts from the cost and the constraints are weak ones.
The nominal case corresponds to a standard mathematical programming problem. The analysis of the pessimistic case is developed here in more detail and can easily be transposed to the study of the optimistic case.

3.2. Minimum Performance Bound

In the pessimistic case, problem L+ is formulated, which is a fuzzy dual linear programming problem with fuzzy dual constraints and real decision variables, written as:

    min_{x ∈ R+^n}  Σ_{i=1}^{n} (c_i + ε d_i) x_i    (25)

under the strong inequality constraints:

    Σ_{i=1}^{n} (a_ki + ε α_ki) x_i ≥_s b_k + ε β_k,   k ∈ {1, ..., m}    (26)

and

    x_i ∈ R+,   i ∈ {1, ..., n}    (27)

where the c_i, d_i, a_ki, α_ki, b_k, β_k are given.

This problem corresponds to the minimization of the worst estimate of the total cost with satisfaction of strong level constraints. Here the variables x_i are supposed to take real positive values, but they could also take fully real or integer values. In the case in which the d_i are zero, the uncertainty is relative only to the feasible set. Problem L+ is equivalent to the following problem in R+^n:

    min_{x ∈ R+^n}  | Σ_{i=1}^{n} c_i x_i | + ρ Σ_{i=1}^{n} d_i x_i    (28)

under the hard constraints:

    Σ_{i=1}^{n} (a_ki - ρ α_ki) x_i ≥ b_k + ρ β_k,   k ∈ {1, ..., m}    (29)

and

    x_i ≥ 0,   i ∈ {1, ..., n}    (30)

It appears that the proposed formulation leads to minimizing a combination of the value of the nominal criterion and of its degree of uncertainty. In the case in which the cost coefficients are positive, this problem reduces to a classical linear programming problem over R+^n. In the general case, since the quantity Σ_{i=1}^{n} c_i x_i will have a particular sign at the solution, the solution x^+ of problem L+ will be the one of x^H and x^G corresponding to the minimum of:

    { Σ_{i=1}^{n} c_i x_i^H + ρ Σ_{i=1}^{n} d_i x_i^H ,  ρ Σ_{i=1}^{n} d_i x_i^G - Σ_{i=1}^{n} c_i x_i^G }    (31)

where x^H is the solution of the problem:

    min_{x ∈ R+^n}  ( Σ_{i=1}^{n} c_i x_i + ρ Σ_{i=1}^{n} d_i x_i )    (32)

under the hard constraints:

    Σ_{i=1}^{n} (a_ki - ρ α_ki) x_i ≥ b_k + ρ β_k,   k ∈ {1, ..., m}    (33)

    Σ_{i=1}^{n} c_i x_i ≥ 0  and  x_i ≥ 0,   i ∈ {1, ..., n}    (34)

and where x^G is the solution of the problem:

    min_{x ∈ R+^n}  ( ρ Σ_{i=1}^{n} d_i x_i - Σ_{i=1}^{n} c_i x_i )    (35)

under the following constraints:

    Σ_{i=1}^{n} (a_ki - ρ α_ki) x_i ≥ b_k + ρ β_k,   k ∈ {1, ..., m}    (36)

    Σ_{i=1}^{n} c_i x_i ≤ 0  and  x_i ≥ 0,   i ∈ {1, ..., n}    (37)

The fuzzy dual optimal performance of this program, at its solution x^+, is then given by:

    Σ_{i=1}^{n} (c_i + ε d_i) x_i^+ = Σ_{i=1}^{n} c_i x_i^+ + ε Σ_{i=1}^{n} d_i x_i^+    (38)

The problems of Eqs. (32)-(34) and of Eqs. (35)-(37) are classical continuous linear programming problems which can be solved in acceptable time even for large size problems.
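For instance, when all cost coefficients c_i are positive, the pessimistic problem of Eqs. (28)-(30) collapses to a single linear program; the sketch below shows how such a deterministic sub-problem could be solved with scipy.optimize.linprog. The function name and data layout are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_pessimistic(c, d, A, alpha, b, beta, rho=0.5):
    """Deterministic equivalent of problem L+ (Eqs. (28)-(30)) for positive c_i:
    minimize sum_i (c_i + rho*d_i) x_i
    subject to sum_i (a_ki - rho*alpha_ki) x_i >= b_k + rho*beta_k, x >= 0."""
    c, d = np.asarray(c, float), np.asarray(d, float)
    A, alpha = np.asarray(A, float), np.asarray(alpha, float)
    b, beta = np.asarray(b, float), np.asarray(beta, float)
    # linprog expects A_ub @ x <= b_ub, so the ">=" constraints are negated.
    res = linprog(c=c + rho * d,
                  A_ub=-(A - rho * alpha),
                  b_ub=-(b + rho * beta),
                  bounds=[(0, None)] * c.size)
    return res.x, res.fun
```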

3.3. Performance Analysis

It is of interest to consider the complementary problem L- given by:

    min_{x ∈ R+^n}  Σ_{i=1}^{n} c_i x_i - ρ Σ_{i=1}^{n} d_i x_i    (39)

under the constraints:

    Σ_{i=1}^{n} (a_ki + ρ α_ki) x_i ≥ b_k - ρ β_k,   k ∈ {1, ..., m}    (40)

and

    x_i ≥ 0,   i ∈ {1, ..., n}    (41)

and the nominal problem L0 given by:

    min_{x ∈ R+^n}  Σ_{i=1}^{n} c_i x_i    (42)

under the nominal constraints:

    Σ_{i=1}^{n} a_ki x_i ≥ b_k,   k ∈ {1, ..., m}    (43)

and

    x_i ≥ 0,   i ∈ {1, ..., n}    (44)

Let x^- and x^0 be the respective solutions of the problems of Eqs. (39)-(41) and of Eqs. (42)-(44); it is instructive to compare in a first step the performances of problems L+, L- and L0, where (with x^+ the solution of problem L+ obtained above):

    Σ_{i=1}^{n} c_i x_i^- - ρ Σ_{i=1}^{n} d_i x_i^- ≤ Σ_{i=1}^{n} c_i x_i^0 ≤ Σ_{i=1}^{n} c_i x_i^+ + ρ Σ_{i=1}^{n} d_i x_i^+    (45)

This allows the dispersion of the results between the pessimistic view of problem L+, the optimistic view of problem L- and the neutral view of problem L0 to be displayed.
Then, in a second step, since x^+ is feasible for problems L- and L0, it is of interest to compare the different performances when adopting the x^+ solution:

    Σ_{i=1}^{n} c_i x_i^+ - ρ Σ_{i=1}^{n} d_i x_i^+ ≤ Σ_{i=1}^{n} c_i x_i^+ ≤ Σ_{i=1}^{n} c_i x_i^+ + ρ Σ_{i=1}^{n} d_i x_i^+    (46)

to produce bounds on the effective performance of the solution.

4. Mathematical Programming with Fuzzy Dual Variables

Now we consider fuzzy dual programming problems with fuzzy dual parameters and fuzzy dual decision variables as well. In that case, problem V is formulated as:

    min_{x_i ∈ R, y_i ∈ R+}  Σ_{i=1}^{n} (c_i + ε d_i)(x_i + ε y_i)    (47)

under the strong constraints:

    Σ_{i=1}^{n} (a_ki + ε α_ki)(x_i + ε y_i) ≥_s b_k + ε β_k,   k ∈ {1, ..., m}    (48)

and

    x_i ∈ R, y_i ≥ 0,   i ∈ {1, ..., n}    (49)

The above problem corresponds to the minimization of the worst estimate of the total cost with satisfaction of strong level constraints when there is some uncertainty not only on the values of the parameters but also on the ability to implement exactly what should be the optimal solution. According to Eq. (3), problem V can be rewritten as:

    min_{x ∈ R^n, y ∈ R+^n}  Σ_{i=1}^{n} ( c_i x_i + ε (|x_i| d_i + |c_i| y_i) )    (50)

under the constraints of Eq. (49) and:

    Σ_{i=1}^{n} ( a_ki x_i + ε (α_ki |x_i| + |a_ki| y_i) ) ≥_s b_k + ε β_k,   k ∈ {1, ..., m}    (51)
which is equivalent in R^n × R+^n to the following mathematical programming problem:

    min_{x ∈ R^n, y ∈ R+^n}  C(x, y) = Σ_{i=1}^{n} c_i x_i + ρ Σ_{i=1}^{n} (d_i |x_i| + |c_i| y_i)    (52)

under the constraints of Eq. (49) and the hard constraints:

    Σ_{i=1}^{n} ( a_ki x_i - ρ (α_ki |x_i| + |a_ki| y_i) ) ≥ b_k + ρ β_k,   k ∈ {1, ..., m}    (53)

Let A(x, y) be the set defined by:

    A(x, y) = { x ∈ R^n, y ∈ R+^n :  Σ_{i=1}^{n} ( a_ki x_i - ρ (α_ki |x_i| + |a_ki| y_i) ) ≥ b_k + ρ β_k,  k ∈ {1, ..., m} }    (54)

then

    ∀ x ∈ R^n, y ∈ R+^n :  A(x, y) ⊂ A(x, 0)  and  C(x, y) ≥ C(x, 0)    (55)
It appears, as expected, that the case of no diversion from the nominal solution is always preferable. In the case in which the diversion from the nominal solution is fixed to ȳ_i, i ∈ {1, ..., n}, problem V has the same solution as problem V′ given by:

    min_{x ∈ R^n}  Σ_{i=1}^{n} c_i x_i + ρ Σ_{i=1}^{n} d_i |x_i|    (56)

under the constraints of Eq. (49) and:

    Σ_{i=1}^{n} ( a_ki x_i - ρ α_ki |x_i| ) ≥ b_k + ρ ( β_k + Σ_{i=1}^{n} |a_ki| ȳ_i ),   k ∈ {1, ..., m}    (57)

The fuzzy dual optimal performance of problem V is then given by:

    Σ_{i=1}^{n} c_i x_i* + ε Σ_{i=1}^{n} ( |x_i*| d_i + |c_i| ȳ_i )    (58)

where x* is the solution of problem V′.
Here also, other linear constraints involving the other partial order relations over Δ̃ (weak inequality and fuzzy dual equality) could be introduced in the formulation of problem V, while the consideration of the integer version of problem V also leads to solving families of classical integer linear programming problems.
The performance of the solution of problem V will potentially be diminished by the reduction of the feasible set defined by Eq. (54).

5. Conclusion

This study has considered mathematical programming problems presenting some uncertainty in the values of their parameters or in the implementation of the values of the decision variables. A special class of fuzzy numbers, fuzzy dual numbers, has been defined in such a way that the interpretation of their dual part as an uncertainty level remains valid through the basic operations on these numbers. A pseudo norm has been introduced, allowing the comparison of fuzzy dual expressions and leading to the definition of hard and weak constraints to characterize fuzzy dual feasible sets. Mathematical programming problems with uncertain parameters and variables have been considered under this formalism. The proposed solution approach leads to solving a finite collection of classical mathematical programming problems corresponding to nominal and extreme cases, allowing the characterization of the expected optimal performance and solution. This results in a rather limited additional computational effort compared with classical approaches. The above approach could easily be extended to cope with fuzzy dual numbers of different shapes present in the same mathematical programming problem.

References

[1] M. Delgado, J. L. Verdegay and M. A. Vila, Imprecise costs in mathematical programming problems,
Control and Cybernetics, 16(1987), 114-121.
[2] T. Gal, H. J. Greenberg (Eds.), Advances in Sensitivity Analysis and Parametric Programming, Series:
International Series in Operations Research & Management Science, Vol. 6, Springer, 1997.

[3] A. Ruszczynski and A. Shapiro. Stochastic Programming. Handbooks in Operations Research and
Management Science, Vol. 10, Elsevier, 2003.
[4] A. Ben-Tal, L. El Ghaoui and A. Nemirovski, Robust Optimization. Princeton Series in Applied
Mathematics, Princeton University Press, 2009.
[5] H. J. Zimmermann, Fuzzy Sets Theory and Mathematical Programming, in A. Jones et al. (eds.), Fuzzy
Sets Theory and Applications, D. Reidel Publishing Company, 99-114, 1986.
[6] C. A. N. Cosenza and F. Mora-Camino, Nombres et ensembles duaux flous et applications, in French,
Technical report, Labfuzzy laboratory, COPPE/UFRJ, Rio de Janeiro, August 2011.
[7] W. Kosinsky, On Fuzzy Number Calculus, International Journal of Applied Mathematics and Computer
Science, 16(2006), 51-57.
[8] H. H. Cheng , Programming with Dual Numbers and its Application in Mechanism Design, Journal of
Engineering with Computers, 10(1994), 212-229.
[9] F. Mora-Camino, O. Lengerke and C. A. N. Cosenza, Fuzzy sets and dual numbers, an integrated
approach, Fuzzy Sets and Knowledge Discovery Conference, Chongqing, China, 28-31 May 2012.
[10] H. Nasseri, Fuzzy Numbers: Positive and Nonnegative, International Mathematical Forum, 3(2006),
1777-1780.
[11] E. Pennestrelli and R. Stefanelli, Linear Algebra and Numerical Algorithms using Dual Numbers,
Journal of Multibody Systems Dynamics, 18(2007), 323-344.
22 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-22

Forecasting National Football League


Game Outcomes Based on Fuzzy
Candlestick Patterns
Yu-Chia HSU1
Department of Sports Information and Communication, National Taiwan University of
Sport, Taichung City, Taiwan.

Abstract. In this paper, a sports outcome prediction approach based on sports


metric candlestick and fuzzy pattern recognition is proposed. The sports gambling
market data are gathered and processed to form the candlestick chart, which has
been widely used in financial time series analysis. Unlike the traditional
candlestick, which is composed of prices for financial market analysis, the candlestick
for sports metrics is determined by the point spread, the total points scored, and the
gambling shock, which measures the bias between the gambling line and the actual
total points scored. The fluctuation behaviors of sports outcomes are represented by the
fuzzification of candlesticks for pattern recognition. The decision tree algorithm is
applied to the fuzzified candlesticks to find implicit knowledge rules, and these rules
are then used to forecast the sports outcome. The National Football League is used
in our empirical study to verify the effectiveness of the forecasting.

Keywords. fuzzy logic, pattern recognition, sports metric forecasting, sports


gambling markets

Introduction

Sports outcome prediction is an important area of betting on sports events, which has
gained a lot of popularity recently. American Football, such as National Football
League (NFL) games, uses a complex scoring system whose resulting scores are
hard to model using standard modeling approaches. There are five ways to score in
American Football, giving 2 points, 3 points, 6 points, or 7 points under different
situations. Other sports, such as soccer, baseball, and basketball, award points in far
fewer situations and are therefore much simpler. Consequently, standard
modeling approaches, such as Poisson-type regression models, can provide impressive
performance when modeling scores in soccer, but they may perform worse when applied
to American Football scores due to their peculiar distribution [1].
Many studies on sports forecasting have demonstrated that the win/lose result
of a game may be affected by past scores, offense/defense statistics, player
absence [2], and so on. Even the temperature, wind speed and moisture at the
competition venue may potentially influence player performance. Most research
adopts these influencing factors in quantitative analyses to estimate the points scored
or the probability of victory. Many notable forecasting models have been proposed by
academics and professionals, and their number has grown rapidly since the appearance
of the "Moneyball" phenomenon [3]. Although such abundant data on the variables that
influence the performance of players are important for coaches and managers
of sports teams, there have been very few studies on modeling the betting
market data to forecast the winner of a game using non-parametric models based on
computational intelligence techniques.

1
Corresponding author: Yu-Chia Hsu, Dep. of Sports Information and Communication, National Taiwan
University of Sport, No. 16, Sec. 1, Shuang-Shih Rd., Taichung, Taiwan; E-mail: ychsu@ntupes.edu.tw.

1. Overview of Sports Betting Forecasting and Candlestick Chart Analysis

1.1. Sports Betting Market

The market data in sports betting, such as odds, point spread, and over/under, offer a type of
predictor and a source of expert advice and expected probability regarding the sports
outcome. Adopting the betting market data published by bookmakers in the prediction
model can provide rather high forecasting accuracy [4]. This is reasonable because
betting companies would not survive with inefficient odds and spreads.
The betting market has many characteristics similar to financial markets [5]. Three
variants of the efficient market hypothesis (EMH), the "weak", "semi-strong", and "strong"
forms, which reflect the relationship between current prices and the rationality and
instantaneousness of information, have also been extended to the betting market to
reflect that the line incorporates all relevant information contained in past game
outcomes, all public information, and inside information [6].
Moreover, price fluctuations follow the mechanism known as the
random walk model under certain restrictions and conditions, so profitable
forecasting models do not persist for a long time. However, in both financial and
betting markets, profitable forecasting models have existed during periods of market
inefficiency, but they require extensive modeling innovations [7].

1.2. Japanese Candlestick Theory

Candlestick charting dates back to the Japanese rice futures market in the 18th
century. It provides a visual aid for looking at data differently and forecasting near-term
equity price movements, and thereby develops insight into market psychology. Today,
Japanese candlestick theory is one of the most widely used technical analysis
techniques based on empirical models for investment decisions. The trend of a
financial time series is assumed to be predictable by recognizing specific
candlestick patterns.
The candlestick is produced from the opening, highest, closing, and lowest prices over
a given time interval. Each candlestick includes both a body and a wick that extends
either above or below the body. Figure 1 illustrates the candlestick line. The body is
shown as a box to represent the difference between the opening and closing price, and
the wick is shown as a line to represent the range between the highest and the lowest
price during the interval. The body is filled with either black or white color, according
to whether the opening price is above or below the closing price,
respectively. In some particular time intervals, the highest/lowest price is marked by the
top/bottom of the body. However, a candlestick may or may not have a wick.

The advantage of candlestick theory is that it presents rich information in a
visualized interface in which experienced chartists can easily identify patterns. In the past
decade, this analysis technique has been extended to other fields, such as predicting
teenagers' stress level changes on a micro-blog platform [8], and to sports metrics [9]
for forecasting game outcomes. However, graphic features, such as the size of the body and the
relative position of two successive candlesticks, are hard to represent formally. Some
researchers have proposed utilizing fuzzy logic to solve this
problem [10-12].

Figure 1. The basic candlestick
Figure 2. The sports metric candlestick

2. Candlestick Chart for Sports Metric

The sports metric candlestick charts, which were proposed by Mallios [9], provide simple
graphics of game outcomes relative to the gambling line. Similarly to the
candlestick chart used in financial equity price analysis, each sports metric
candlestick includes both a body and a wick that extends either above or below the body.
However, the open, high, close, and low prices, which constitute the body and wick of a
candlestick chart in finance, are not appropriate for sports. For sports metrics, the
candlestick charts are composed of the winning/losing margin, the total points scored,
and their corresponding gambling lines. Figure 2 illustrates the sports metric candlestick.
The body of the candlestick is determined by the winning/losing margin, denoted D, and
the gambling line on the winning/losing margin, denoted LD, for a certain team. If D >
LD, the body's color is white, and the body's maximum and minimum values are
defined by D and LD. If LD > D, the body's color is black, and the body's maximum
and minimum values are defined by LD and D. The length of the candlestick wick is
determined by the gambling shock of the line on total points scored, denoted GST. GST is
calculated as the difference between the total points scored in the game and the
corresponding line on total points scored. If GST > 0, the wick extends above the body;
it extends below the body when GST < 0. There is no wick when GST = 0.
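As a concrete illustration of this construction, the short sketch below (illustrative names, not the authors' code) turns one game's winning/losing margin D, line LD and gambling shock GST into a candlestick entity.

```python
# A minimal sketch of turning one game's numbers into a sports-metric candlestick
# entity as described above; D, LD and GST are assumed given per game.
from dataclasses import dataclass

@dataclass
class SportsCandlestick:
    body_top: float
    body_bottom: float
    color: str          # "white" if D > LD, "black" if LD > D
    upper_wick: float   # extends above the body when GST > 0
    lower_wick: float   # extends below the body when GST < 0

def build_candlestick(D: float, LD: float, GST: float) -> SportsCandlestick:
    color = "white" if D > LD else "black"
    body_top, body_bottom = max(D, LD), min(D, LD)
    upper = GST if GST > 0 else 0.0
    lower = -GST if GST < 0 else 0.0
    return SportsCandlestick(body_top, body_bottom, color, upper, lower)

# Example: the team covered a 6-point line by winning by 10, and the total beat the total line by 3.
print(build_candlestick(D=10, LD=6, GST=3))
```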

3. Fuzzy Representation of Candlestick Patterns

3.1. Size of Body and Wick


The lengths of the wicks and the body reflect the price fluctuation during a time
interval and are considered the critical characteristics for candlestick pattern
recognition. In traditional technical analysis, the size of a candlestick element as short,
medium or long is defined variously according to different opinions. In order to describe
the characteristics of a candlestick more appropriately, four fuzzy linguistic variables are
adopted to describe the length of the wicks and the body: Very Short, Short, Long, and
Very Long. Figure 3 illustrates the membership functions of the linguistic variables.
Two types of membership function are adopted to define the linguistic variables: a linear
function is used for Very Long and Very Short, and a triangular function is used for Short
and Long. In Figure 3, the label of the x-axis indicates the real length of the body or wick,
and the unit of the x-axis is the normalized scale from 0 to 1. In this study, the results of
evaluating the input values through the membership functions are obtained by
normalizing the lengths of bodies or wicks with min-max normalization to lie between 0
and 1.
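The following sketch shows one possible coding of the four length labels. Since the exact breakpoints of the membership functions in Figure 3 are not stated in the text, the values 0, 1/3, 2/3 and 1 below are illustrative assumptions only.

```python
# Sketch of the four length labels on the normalized [0, 1] scale: linear
# "shoulders" for Very Short / Very Long and triangles for Short / Long.
def tri(x, a, b, c):
    """Triangular membership with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def length_memberships(x):
    return {
        "VeryShort": max(0.0, min(1.0, (1/3 - x) / (1/3))),   # 1 at 0, 0 at 1/3
        "Short":     tri(x, 0.0, 1/3, 2/3),
        "Long":      tri(x, 1/3, 2/3, 1.0),
        "VeryLong":  max(0.0, min(1.0, (x - 2/3) / (1/3))),   # 0 at 2/3, 1 at 1
    }

def minmax_normalize(value, vmin, vmax):
    return (value - vmin) / (vmax - vmin) if vmax > vmin else 0.0

# e.g. a body length of 8 points on an observed 0-20 range
print(length_memberships(minmax_normalize(8, 0, 20)))
```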

Figure 3. The membership function for the linguistic variables

3.2. Relationship between Candlesticks

The size of a candlestick line only reflects the characteristics of the price fluctuation
during a single time interval, which is not enough to model valuable candlestick patterns. In
order to capture the characteristics of the subsequent trend of candlesticks, the relationship
between two adjacent candlestick lines should be considered. Compared with the
previous candlestick line, the relative positions of the opening and closing prices are used
to model the open style and the close style. Five linguistic variables, Low, Equal Low,
Equal, Equal High, and High, are defined to represent the open and close styles. Figure
4 shows the membership functions of the linguistic variables of the open style and close
style. The unit of the x-axis is the price in the previous time interval and the y-axis gives the
possible values of the membership function. The parameters of the functions describing
the linguistic variables depend upon the previous candlestick line, which is illustrated
by the previous candlestick line at the bottom of Figure 4.

3.3. Fuzzification of Candlestick Pattern

The candlestick charts are characterized with fuzzy linguistic variables by applying
the maximum membership method. When more than one fuzzy set matches a
single crisp value, the fuzzy set with the maximum membership value is selected.
Table 1 shows an example of a fuzzy candlestick pattern at times t-i to t for forecasting
the next game outcome.
To mine the rules of candlestick patterns for forecasting the next game outcome, we
extract the historical data, consisting of the point spread line, the total point line, the actual
box score, and the outcome, at times t, t-1, ..., t-i. Then we translate these data into
candlestick chart entities and symbolize the time series by fuzzification. The fuzzy
candlestick patterns are then recognized by using the random forests algorithm to
obtain the optimal decision tree. Finally, the next game outcomes are predicted by
using the optimal decision tree.
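A minimal illustration of the maximum-membership selection, assuming the membership values have already been computed as above, is:

```python
# Replace each crisp feature by the label with the largest membership value,
# producing symbolic rows like those of Table 1 for a tree/forest learner.
def fuzzify(memberships: dict) -> str:
    return max(memberships, key=memberships.get)

print(fuzzify({"VeryShort": 0.0, "Short": 0.8, "Long": 0.2, "VeryLong": 0.0}))  # -> "Short"
```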

Figure 4. The membership function of the linguistic variables of the open style and close style
Table 1. Example of fuzzy candlestick pattern
Time frame   Body length   Upper wick length   Lower wick length   Body color   Open style   Close style   Outcome
t-i Short VeryShort VeryShort Black EqualHigh Low Win

t VeryLong Long Short White EqualLow Equal Lose

4. Empirical Studies and Analysis

To demonstrate the effectiveness of forecasting game outcomes, we use the NFL data
gathered from covers.com for the 2011-2012 season. We choose the
champion of the Super Bowl in that year, the New York Giants, as the team for the
empirical study. The data cover the regular season and the post-season for the year. A total of 20
games involving the New York Giants were held in the year, including 17 regular-season
games from week 1 to week 17, and 4 post-season games comprising the Wildcard,
Divisional, Conference, and Super Bowl games. The data for the year are divided into two sets
according to the NFL season. The data from the regular season are used as the training
set, and the data from the play-offs and Super Bowl are used as the testing set. The rules of
candlestick patterns are found from the regular-season data and used to forecast the
outcomes of the post-season.
The effectiveness of the prediction is evaluated based on three performance measurements,
precision, recall, and F-measure, which are widely used in data mining. The formulas
are shown in Eqs. (1) - (3).

precision = \frac{TP}{TP + FP} \times 100\%   (1)

recall = \frac{TP}{TP + FN} \times 100\%   (2)

F\text{-}measure = 2 \times \frac{recall \times precision}{recall + precision}   (3)

where TP, FP, and FN denote true positives, false positives, and false negatives.
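For completeness, a small helper that evaluates Eqs. (1)-(3) from the raw counts (an illustration, not the authors' code):

```python
# precision, recall and F-measure from true-positive, false-positive and false-negative counts.
def precision_recall_f(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return precision, recall, f

# e.g. counts consistent with the Win row for the "t-1, t" setting in Table 2
print(precision_recall_f(tp=3, fp=0, fn=1))  # -> (1.0, 0.75, ~0.857)
```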

The empirical results of the forecasting are presented in Table 2. The results
reveal that the precision, recall, and F-measure of the outcome prediction for wins are
extremely high. This may have occurred due to the small sample size, which is an
innate limitation of sports outcome forecasting. Most NFL teams play only about
20 games in one season, so it is reasonable that only 17 samples are used for training
and the remaining 4 samples for testing. In fact, the New York Giants won all 4
post-season games, including the Super Bowl.
Table 2. The results of prediction
Time frame of the input data   Number of input variables   Outcome prediction   Precision   Recall   F-measure
t 7 Win 100% 100% 1
Lose 0% 0% 0
t-1, t 14 Win 100% 75% 0.857
Lose 0% 0% 0

5. Conclusion

In this paper, we proposed a computational intelligence based sports forecasting model


to predict the champion of the NFL Super Bowl. This model combines the advantages of
candlestick chart analysis for financial time series and pattern recognition techniques by
applying fuzzy sets and the random forests algorithm. Unlike most sports forecasting
models, which focus on the athletes' performance, we adopt the betting market data
to take into account the psychology and behavior of betting market makers and sports fans.
The original betting market data are transformed into candlestick charts,
characterized by fuzzification, and then classified to find the implicit patterns for
forecasting. Empirical results show that this idea is feasible and obtains acceptable
prediction accuracy.

References

[1] R. D. Baker, I. G. McHale, Forecasting exact scores in National Football League games, International
Journal of Forecasting, 29 (2013), 122-130.
[2] W. H. Dare, S. A. Dennis, R. J. Paul, Player absence and betting lines in the NBA, Finance Research
Letters, 13 (2015), 130-136.
[3] M. Lewis, Moneyball: The Art of Winning an Unfair Game, W. W. Norton & Company, New York, 2003.
[4] D. Paton, L. V. Williams, Forecasting outcomes in spread betting markets: can bettors use ‘quarbs’ to
beat the book, Journal of Forecasting, 24 (2005), 139-154.
[5] S. D. Levitt, Why are gambling markets organized so differently from financial markets, The Economic
Journal, 114 (2004), 223-246.
[6] L. V. Williams, Information efficiency in betting markets: A survey, Bulletin of Economic Research, 51
(1999), 1-39.
[7] W. S. Mallios, Forecasting in Financial and Sports Gambling Markets. Wiley, New York, 2011.
[8] Y. Li, Z. Feng, L. Feng, Using candlestick charts to predict adolescent stress trend on micro-blog,
Procedia Computer Science, 63 (2015), 221-228.
[9] W. Mallios, Sports Metric Forecasting, Xlibris Corporation, 2014.
[10] C.-H. L. Lee, A. Liu, W.-S. Chen, Pattern discovery of fuzzy time series for financial prediction, IEEE
Transactions on Knowledge and data Engineering, 18 (2006), 613-625.
[11] Q. Lan, D. Zhang, L. Xiong, Reversal pattern discovery in financial time series based on fuzzy
candlestick lines, Systems Engineering Procedia, 2 (2011), 182-190.
[12] P. Roy, S. Sharma, M. K. Kowar, Fuzzy candlestick approach to trade S&P CNX NIFTY 50 index
using engulfing patterns, International Journal of Hybrid Information Technology, 5 (2012), 57-66.
28 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-28

A Fuzzy Control Based Parallel Filling


Valley Equalization Circuit
Feng RANa, Ke-Wei HUb, Jing-Wei ZHAOb and Yuan JIa,1
a
Department of Microelectronics Center, Shanghai University, Shanghai, China
b
School of Mechatronic Engineering and Automation, Shanghai University, Shanghai,
China

Abstract. Aiming at the problem of high cost and slow equalization speed in
traditional circuit, a parallel filling valley equalization circuit based on fuzzy
control is proposed in this paper. A fuzzy controller suitable for the circuit is
designed. The average voltage, voltage range and the balance electric quantity of
the battery pack are modeled by a fuzzy model. Fuzzy reasoning and
defuzzification are performed to optimize the circuit control logic, which can be
adapted to the nonlinearity of the battery pack and the uncertainty of the battery
parameters. The simulation and experiment results show that, in the process of
charging and discharging, the fuzzy control based parallel filling valley
equalization circuit has the advantage of fast and efficient equalization which can
improve the use efficiency of the battery pack.

Keywords. Fuzzy control, battery equalization, filling valley balancing, energy


utilization, lithium battery

Introduction

With continuing environmental pollution and the depletion of oil, the vehicle
energy system structure has become a hot issue of global concern and research [1].
In recent years, people have been committed to the development of safe, efficient and clean
transport. The electric vehicle represents the development direction of the new
generation of environmentally friendly vehicles. As the power source of electric
vehicles, the power battery directly affects the use of electric vehicles [2]. The lithium
battery is one of the best choices for the power source of electric vehicles because of its
advantages, such as high voltage, low self-discharge rate, high efficiency and
environmental friendliness [3]. However, during production, long-term standing and
repeated charge and discharge cycles, the gap between the charge levels of the lithium
cells increases, so that the dispersion within the battery pack grows, the performance
degradation of individual cells intensifies, and eventually the whole battery pack fails [4].
Therefore, battery equalization technology is an
indispensable technology to ensure the safety of the battery and extend the service life
of the battery pack [5]. Battery equalization can be roughly divided into active and
passive equalization [6-7]. Active balance in the process will not consume the battery
energy and has become a hot research topic today [8]. In the active balance scheme, the

1
Corresponding author: Yuan JI, Department of Microelectronics Center, Shanghai University,
Shanghai, China; E-mail: jiyuan@shu.edu.cn.

highest energy cell of the battery pack adds energy to the lowest energy cell through
a converter. Super capacitor equalization, inductance balancing, and
converter equalization are the most common ways to achieve parallel filling valley
equalization [9]. However, active equalization still has problems that need to be solved
urgently, such as high cost, complex control circuitry and slow equalization
speed.
At this stage, research on battery equalization technology mainly includes two
aspects. On the one hand, there is the equalization strategy [10], that is, how to build a
common evaluation system for the battery group and then derive the basis of the
equalization control strategy. On the other hand, there is the design of the equalization
circuit topology [11], mainly research on the hardware implementation. Regarding these two aspects,
researchers have put forward many different equalization solutions. Tian et al. [12]
proposed an energy tap charging and discharging equalization control strategy but did
not give a specific implementation of the scheme. Wu et al. [13] proposed that
SOC-based equalization of the battery can effectively eliminate the inconsistency of the
cells. However, because the accuracy of SOC estimation is not guaranteed, it is only suitable for
offline equalization. Fu et al. [14] proposed a control strategy based on the
battery voltage as the criterion of equilibrium, where the goal is to achieve the relative
consistency of the SOC of the single cells. It is widely used because of its clear goal and
simple control, but its ability to deal with nonlinear problems needs to be improved.
Generally, the lithium battery shows the nonlinear characteristic. In order to make
the battery maintain good system stability and fast balancing speed in different
environments with uncertain parameters, this paper proposes a parallel filling valley
equalization scheme based on fuzzy control. A balancing fuzzy controller is used to
optimize the balancing strategy. Simulation results show that the balancing speed and the
efficiency of the proposed parallel filling valley equalization scheme are improved
compared with the traditional fly-back filling valley control strategy. Thirteen
general E-bike lithium battery cells (rated 48V) were used as the series
battery pack for the charging and discharging experiments. The experimental results show that
the voltage difference between the lithium cells converges to less than 10mV with
the fuzzy control based parallel filling valley equalization strategy even when large voltage
differences are present initially between the cells.

1. Design of Filling Valley Equalization Fuzzy Controller

As the working characteristic of the battery is a highly nonlinear curve, it is difficult to
determine all related parameters with a precise mathematical model. By using the fuzzy
control method, the system can make reasonable decisions under uncertain or imprecise
conditions. In this paper, the fuzzy control technique is used to adjust the equalization
current and time. The fuzzy logic system uses the Sugeno type. The Sugeno method is
computationally effective and works well with optimization and adaptive techniques,
which makes it very attractive in control problems, particularly for dynamic nonlinear
systems. The inference calculation from the input to the output is realized by a set of
previously established inference rules. A typical fuzzy control system is composed of the rule
base, data base, inference engine, fuzzification unit and defuzzification unit. Figure 1
shows the structure of the filling valley equalization fuzzy controller, a typical two-input,
one-output fuzzy control system.

Figure 1. Fuzzy logic controller of filling valley balancing

The controller is designed by fuzzy control theory. It is supposed to control the
equalization electricity quantity of the battery cell by controlling the equalization
current and time. The rule base is used to collect the control rules that describe the
battery equalization control algorithm. The database is used to store some of the data
that have been acquired.
The fuzzy controller has two inputs, the average voltage (AV) and the voltage
difference (VD), of the battery pack. The output is the equalization electricity quantity
(QBAL). As the inputs of the fuzzy controller, the values of AV and VD are transformed into
the fuzzy languages μ1(x) and μ2(x) by the fuzzification process. Then the inference
engine generates the language control logic μ0(z) according to the pre-established rule
base and the input fuzzy language. At last, the language control logic is transformed
into the control output signal QBAL by the defuzzification process. The relationship
among the equalization electricity quantity QBAL, the equalization current IBAL and the
equalization time TBAL can be expressed as:

Q_{BAL} = I_{BAL} \cdot T_{BAL}   (1)

Figure 2. Membership functions for average voltage



Figure 2 and Figure 3 show the input/output membership functions of the filling valley
equalization fuzzy controller. The equalization current and the equalization time
are determined by the measured average voltage AV and the voltage range VD of the
fuzzy controller. The triangular function is chosen as the membership function of the
average voltage (AV) and voltage difference (VD) because it is easy to calculate
compared with other membership functions. The average voltage (AV) is divided into 5
fuzzy subsets: average very large (AVL), average large (AL), average medium (AM),
average small (AS), and average very small (AVS), covering the domain [2.7V, 4.2V].
The input variable voltage difference VD is also divided into 5 fuzzy subsets: difference
very large (DVL), difference large (DL), difference medium (DM), difference small
(DS), and difference very small (DVS), which cover the domain [0mV, 20mV]. The
system treats the input as 20mV when the voltage range is greater than
20mV. The output variable equalization electricity QBAL is divided into the subsets: VL
(very large), L (large), M (medium), S (small), and VS (very small). Fig. 2 and Fig. 3 show
the membership functions of the fuzzy control system over the horizontal coordinates VD, AV
and QBAL. For example, when AV is 3.4V, it has one hundred percent membership in S
and zero percent membership in M and VS. Figure 4 shows the surface
relationship of the fuzzy controller. From the figure, the relationship among AV, VD and
the balance capability can be seen. The rule base is described in Table 1.

Table 1. Control rules of fuzzy logic controller


AV \ VD   DVS   DS   DM   DL   DVL
AVS OVS OVS OS OM OL
AS OVS OS OM OL OVL
AM OVS OM OVL OVL OVL
AL OVS OS OM OL OVL
AVL OVS OVS OS OM OL

Figure 3. Membership functions for voltage difference and balancing electric quantity

Figure 4. Surface of the fuzzy logic controller output VS input

Table 1 shows that the fuzzy control system has a total of 25 rules, written
separately as R1, R2, ..., R25. The fuzzy rule expressions can be given by

R1: (AVS and DVS) -> OVS
R2: (AVS and DS) -> OVS
R3: (AVS and DM) -> OS
...
R25: (AVL and DVL) -> OL
(2)

In the theory of fuzzy control, there are many kinds of operations, so there are many
possible choices in practical applications. According to the requirements of the design, the
operation rules of the filling valley equalization fuzzy controller are as follows: the fuzzy
"and" operation is realized by "min", taking the minimum value, and the fuzzy
"or" operation is realized by "max", taking the maximum value.
The implication relation uses "min", the output synthesis uses "max",
and the centroid method is used in the output defuzzification process. All of the above
rules can be expressed as:

R = R_1 \cup R_2 \cup \ldots \cup R_{25} = \bigcup_{i=1}^{25} R_i   (3)

When the exact values of AV and VD are known, the fuzzy quantity of the output QBAL can
be given by

\mu_{o}(Q_{BAL}) = (AV \times VD) \circ R   (4)

According to the fuzzy control logic operation rules,

\mu_{o}(Q_{BAL}) = (AV \times VD) \circ \bigcup_{i=1}^{25} R_i = \bigcup_{i=1}^{25} (AV \times VD) \circ (A_i \text{ and } B_i \to C_i)   (5)

Because the "min" method is used in the calculation of the implication relation,

\mu_{o}(Q_{BAL}) = \bigcup_{i=1}^{25} \bigl[ AV \circ (A_i \to C_i) \bigr] \wedge \bigl[ VD \circ (B_i \to C_i) \bigr]   (6)

Finally, the output fuzzy variable is converted into a crisp value by the
defuzzification module:

Q_{BAL} = \frac{\int Q_{BAL} \, \mu_{o}(Q_{BAL}) \, dQ_{BAL}}{\int \mu_{o}(Q_{BAL}) \, dQ_{BAL}}   (7)
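The following sketch (not the authors' implementation) strings these steps together: triangular memberships for AV and VD, the 25 rules of Table 1, "min" implication, "max" aggregation and centroid defuzzification as in Eqs. (4)-(7). The membership breakpoints are read from the tick marks of Figures 2 and 3 and are assumptions insofar as the exact curves are not tabulated in the text.

```python
# Compact sketch of the filling-valley fuzzy controller (assumed triangular memberships).
import numpy as np

def tri(x, a, b, c):
    # Triangular membership with shoulders clamped at the domain ends.
    if x <= b:
        return 1.0 if a == b else max(0.0, min(1.0, (x - a) / (b - a)))
    return 1.0 if b == c else max(0.0, min(1.0, (c - x) / (c - b)))

AV_SETS = {"AVS": (2.7, 2.7, 3.4), "AS": (2.7, 3.4, 3.7), "AM": (3.4, 3.7, 4.1),
           "AL": (3.7, 4.1, 4.2), "AVL": (4.1, 4.2, 4.2)}          # volts
VD_SETS = {"DVS": (0, 0, 5), "DS": (0, 5, 10), "DM": (5, 10, 15),
           "DL": (10, 15, 20), "DVL": (15, 20, 20)}                 # millivolts
Q_SETS = {"OVS": (0, 0, 2.5), "OS": (0, 2.5, 5), "OM": (2.5, 5, 7.5),
          "OL": (5, 7.5, 10), "OVL": (7.5, 10, 10)}                 # ampere-seconds

# Rule base of Table 1: rows = AV label, columns = VD label (DVS..DVL).
RULES = {"AVS": ["OVS", "OVS", "OS", "OM", "OL"],
         "AS":  ["OVS", "OS", "OM", "OL", "OVL"],
         "AM":  ["OVS", "OM", "OVL", "OVL", "OVL"],
         "AL":  ["OVS", "OS", "OM", "OL", "OVL"],
         "AVL": ["OVS", "OVS", "OS", "OM", "OL"]}

def q_bal(av, vd_mv):
    vd_mv = min(vd_mv, 20.0)                      # inputs above 20 mV are treated as 20 mV
    q = np.linspace(0, 10, 401)                   # discretised QBAL axis (As)
    agg = np.zeros_like(q)
    for av_label, row in RULES.items():
        for vd_label, out_label in zip(VD_SETS, row):
            w = min(tri(av, *AV_SETS[av_label]), tri(vd_mv, *VD_SETS[vd_label]))  # "and" = min
            out = np.array([min(w, tri(v, *Q_SETS[out_label])) for v in q])       # implication = min
            agg = np.maximum(agg, out)                                            # aggregation = max
    s = agg.sum()
    return float((q * agg).sum() / s) if s > 0 else 0.0   # centroid defuzzification, Eq. (7)

print(q_bal(av=3.55, vd_mv=12.0))
```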

The maximum equalization current I_{eq\_max} and the equalization current I_{BAL} can be
given by

I_{eq\_max} = \frac{(U_{dc} - U_M)^2 \, (U_0 + U_{dio}) \, L_P \, T \, \eta}{2 \bigl[ (U_{dc} - U_M) L_P L_S K + (U_0 + U_{dio})(L_P + L_x) \bigr]^2}   (8)

I_{BAL} = \min \Bigl( \frac{Q_{BAL}}{T_{BAL\_MIN}}, \; I_{eq\_max} \Bigr)   (9)

where T_{BAL\_MIN} is the minimum equilibrium time; here T_{BAL\_MIN} = 0.8 s.

Then the PWM wave duty cycle V and the equalization time T_{BAL} are calculated:

V = \sqrt{ \frac{2 \, I_{BAL} \, (U_{min} + U_{dio}) \, (L_p + L_x)^2}{(U_{dc} - U_M)^2 \, L_p \, \eta \, T} }   (10)

T_{BAL} = \min \Bigl\{ \frac{Q_{BEC}}{I_{BEC}}, \; T_{BAL\_MAX} \Bigr\}   (11)

where T_{BAL\_MAX} is the maximum equilibrium time; here T_{BAL\_MAX} = 10 s.
T_{BAL\_MAX} is set to limit the length of the equilibrium period, in case the battery pack
charging and discharging voltage changes too much within the period.
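A direct reading of Eqs. (9) and (11) in code is sketched below; Q_BEC and I_BEC are interpreted here, as an assumption, as the balancing charge and the balancing current delivered to the target cell.

```python
# Balancing current capped by the converter limit, balancing time capped by T_BAL_MAX.
def balance_current_and_time(q_bal, i_eq_max, t_bal_min=0.8, t_bal_max=10.0):
    i_bal = min(q_bal / t_bal_min, i_eq_max)   # Eq. (9)
    t_bal = min(q_bal / i_bal, t_bal_max)      # Eq. (11), reading Q_BEC/I_BEC as the balancing charge/current
    return i_bal, t_bal

print(balance_current_and_time(q_bal=6.0, i_eq_max=5.0))  # -> (5.0, 1.2)
```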

2. Simulation and Experiment Result

2.1. The simulation of the filling valley equalization fuzzy controller

The simulation of battery charging and discharging is carried out in MATLAB,
and the ordinary fly-back control is compared with the fuzzy control according to the
simulation results.

Figure 5. Fuzzy logic controller VS Normal fly-back controller

Figure 5 shows the lithium battery charging and discharging process in the
MATLAB simulation. The battery pack has 13 cells with different initial
voltages, corresponding to the 13 curves of different colors in the figure. According to
the algorithm and the differences between the charge and discharge processes, the figure
is divided into Fig. 5A, Fig. 5B, Fig. 5C and Fig. 5D. The horizontal axis shows the simulation time,
and the vertical axis shows the battery voltage of the battery pack. The
fuzzy control is used in Fig. 5B and Fig. 5D, while the ordinary fly-back control is used in
Fig. 5A and Fig. 5C. By comparison, in the charging process, as shown in Figure 5A,
the battery pack needs 190 min to reach the energy balance, while the fuzzy control based
controller achieves energy balance in only 120 min, as shown in Figure 5B. In the static
discharge process, as shown in Figure 5C, the battery pack needs 232 min to reach the
energy balance, while in Figure 5D the equilibrium state is reached in only 150 min.
The simulation results show that the fuzzy control based parallel filling valley
equalization strategy has a faster equalization speed compared with the normal fly-back
control.

2.2. Experimental results of charge and discharge equalization

The battery charge and discharge experiments are carried out in this paper with the
filling valley equalization fuzzy control. The initial voltage values of
the cells vary from 2.9V to 3.4V.

The charging experiment follows the scheme of constant-current charging first
and then constant-voltage charging. The full charging process is shown in Figure 6. In
order to see the equalization time clearly, Figure 7 shows the equalizing charge diagram
of the first 50 min. It can be seen from the figure that the battery pack moves from the
initial state of disequilibrium into equilibrium in only 30 min, which is a remarkable
improvement compared with general equalization technology. As shown in Figure 8, the
equalizing discharging process can balance the power of cells with different initial
voltages and equalize the batteries.

Figure 6. Full diagram of equalizing charge

Figure 7. Equalizing charge diagram of first 50min

Figure 8. Full diagram of equalizing discharge

Table 2. Comparison before and after charging and discharging


Battery Pack Parameters Balanced discharge Balanced charge
Voltage range before balanced 535.2070mV 117.0849mV
Voltage range after balanced 8.9998mV 9.7893mV
The time to reach the balance About 30min About 98min
Charge and discharge time 237min 160min

From Table 2, the experimental results show that the fuzzy control based parallel
filling valley equalization circuit can clearly reduce the voltage difference and performs
well in handling the nonlinear problem and in equalization speed. However,
the fuzzy rule base and data base in a realistic fuzzy control process need adequate
accuracy to be reliable, and better rules and inference processes will certainly be sought
in later research.

3. Conclusion

The proposed fuzzy control based parallel filling valley equalization circuit can
reach equalization quickly. It has a good ability to handle the nonlinear problem
compared with the traditional circuit. With the development of electric vehicles, people
require high quality cell equalization. Lossless equalization can achieve lossless energy
transfer between different cells to avoid the waste of energy. Filling valley
equalization is one of the schemes for lossless equalization. However, how to improve the
energy flow efficiency and handle the imbalance in multi-string parallel battery packs
should be addressed in future research on lossless equalization. In addition, the
equalization circuit is supposed to be as compact as possible. How to reduce the size of
the chip and enhance its applicability deserves attention in further study.

References

[1] E. Kim, K. G. Shin, J. Lee. Real-time battery thermal management for electric vehicles. Cyber-Physical
Systems (ICCPS). Berlin: IEEE, (2014):72-83.
[2] C. L. Wey, P. C. Jui. A unitized charging and discharging smart battery management system. Connected
Vehicles and Expo (ICCVE). Las Vegas: IEEE, (2012):903-909.
[3] B. B. Qiu, H. P. Liu, J. L. Yang, et al. An active balance charging system of lithium iron phosphate
power battery group, Advanced Technology of Electrical Engineering and Energy, 2014.
[4] J. Cao, N. Schofield, A. Emadi. Battery Balancing Methods: A Comprehensive Review. Vehicle Power
and Propulsion Conference (VPPC). Harbin: IEEE, (2008):1-6
[5] B. T. Kuhn, G. E. Pitel, P. T. Krein, et al. Electrical properties and equalization of lithium-ion cells in
automotive applications. Vehicle Power and Propulsion Conference (VPPC): IEEE, 2005
[6] B. Lindemark. Individual cell voltage equalizers (ICE) for reliable battery performance.
Telecommunications Energy Conference: INTELEC, (1991):196-201
[7] A. Baughman, M. Ferdowsi. Analysis of the Double-Tiered Three-Battery Switched Capacitor Battery
Balancing System. Vehicle Power and Propulsion Conference (VPPC). Harbin: IEEE, (2006):1-6
[8] W. G. Ji, X. Lu, Y. Ji, et al. Low cost battery equalizer using buck-boost and series LC converter with
synchronous phase-shift control. Annual IEEE Applied Power Electronics Conference and Exposition
(APEC). Long Beach: IEEE, 331(2013):1152-1157
[9] M. Daowd, N. Omar, DBP Van, et al. Passive and Active Battery Balancing comparison based on
MATLAB Simulation. IEEE Vehicle Power and Propulsion Conference (VPPC). Chicago, IL: IEEE,
(2011):1-7
[10] H. R. Liu, S. H. Zhang, et al. Lithium-ion battery charge and discharge equalizer and balancing
strategy. Transactions of China Electrotechnical Society, 16(2015):186-192.
[11] W. G. Ji, X. Liu, Y. Ji, et al. Low cost battery equalizer using buck-boost and series LC converter with
synchronous phase-shift control. In 2013 28th Annual IEEE applied Power Electronics Conference and
Exposition (APEC). Long Beach. CA, USA, (2013): 1152-1157
[12] R. Tian, D. T. Qin, M. H. Hu, et al. Research on battery equalization balance strategy. Journal of
Chongqing University (Nature Science Edition), (2005):1-4
[13] Y. Y. Wu, H. Liang. Research on electric vehicle battery equalization method. Automotive Engineering,
(2004): 384-385.
[14] J. J. Fu, B. J. Wu, H. J. Wu, et al. Dynamic bidirectional equalization system to a vehicle hang-ion
battery weave. China Measurement Technology, (2005): 10-11.
Fuzzy Systems and Data Mining II 37
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-37

Interval-Valued Hesitant Fuzzy Geometric


Bonferroni Mean Aggregation Operator
Xiao-Rong HEa,1, Ying-Yu WUa, De-Jian YUb, Wei ZHOUc and Sun MENGc
a
School of Economics and Management, Southeast University, Nanjing, China
b
School of Information, Zhejiang University of Finance and Economics, Hangzhou,
China
c
Yunnan University of Finance and Economics, YNFE
Kunming, China

Abstract. The hesitant fuzzy set (HFS) is one of the most commonly used techniques for
expressing the decision maker's subjective evaluation information. The interval-valued
hesitant fuzzy set (IVHFS) is an extension of the HFS and can reflect our intuition
more objectively. In this paper we focus on IVHF information aggregation
methods based on the Bonferroni mean (BM). We propose the IVHF geometric BM
(IVHFGBM) operator and the weighted IVHFGBM operator. Some numerical
examples for the operators are designed to show their effectiveness. The
desirable properties of the weighted IVHFGBM operator are also discussed in detail.
These operators can be applied in many areas, especially in decision making
problems.

Keywords. Bonferroni mean, IVHFS, aggregation operator

Introduction

There are various methods available for decision making. One of the common features
for decision making methods is the information aggregation techniques [1-7]. Using
information aggregation operator in decision making, we can obtain the comprehensive
performance values of alternatives, which are used to compare alternatives. The
alternative with the biggest comprehensive performance value is the best option. The
Bonferroni mean (BM) [8-10] is a widely used technique in information aggregation
and decision making area. At present, it has been extended to interval-valued
uncertainty environment, intuitionistic fuzzy (IF) environment, interval-valued
intuitionistic fuzzy (IVIF) environment, fuzzy environment, uncertain linguistic fuzzy
environment and hesitant fuzzy environment.
However, we found that the BM operator cannot be used to aggregate interval-
valued hesitant fuzzy information [11] which is the research focus of this paper. In the
rest of this paper, we first review the basic concept about interval-valued hesitant fuzzy
set (IVHFS) and then extend the BM to interval-valued hesitant fuzzy environment.
The numerical examples are presented to better understand these interval-valued
hesitant fuzzy information aggregation methods based on BM operators.

1
Corresponding Author: Xiao-Rong HE, School of Economics and Management, Southeast University,
Nanjing, China; E-mail: shelley526@126.com.

1. Preliminaries

In this section, a brief review of the interval-valued hesitant fuzzy set (IVHFS) is
presented.
Definition 1 [11]. Let X be a reference set. An IVHFS on X can be represented
in the following mathematical form:

E = \{ \langle x, f_E(x) \rangle \mid x \in X \}   (1)

where f_E(x) denotes all possible interval-valued membership degrees of the
element x to the set E.
The IVHFS has strong practical value in situations where the membership
degree is difficult to determine. For example, a patient suffering from regular abdominal
pain goes to the hospital to consult three doctors independently. After
examining his/her illness, the first doctor thinks the possibility that the patient has a
stomachache is [0.6, 0.7]. The second doctor believes the possibility that the patient has a
stomachache is [0.1, 0.2] and that he/she is likely to suffer from other diseases. The view of the
third doctor is similar to that of the first doctor, with a possibility of [0.7, 0.8].
In this case, the possibility that the patient has a stomachache
can be represented by the interval-valued hesitant fuzzy element (IVHFE)
{[0.6,0.7], [0.1,0.2], [0.7,0.8]}. Obviously, other kinds of extended fuzzy set
theory cannot deal with this case effectively. Furthermore, the IVHFE is the basic element
of the IVHFS.
For any IVHFEs, Chen et al. [11] defined the following operations and gave the comparison
rules.
Definition 2. Suppose that h = \bigcup_{\gamma \in h} \{ [\gamma^L, \gamma^U] \}, h_1 = \bigcup_{\gamma_1 \in h_1} \{ [\gamma_1^L, \gamma_1^U] \} and
h_2 = \bigcup_{\gamma_2 \in h_2} \{ [\gamma_2^L, \gamma_2^U] \} are three IVHFEs, and \lambda is a real number bigger than 0. Then the
operations are defined as follows.

1) h^{\lambda} = \bigcup_{\gamma \in h} \{ [ (\gamma^L)^{\lambda}, \; (\gamma^U)^{\lambda} ] \}

2) \lambda h = \bigcup_{\gamma \in h} \{ [ 1 - (1 - \gamma^L)^{\lambda}, \; 1 - (1 - \gamma^U)^{\lambda} ] \}

3) h_1 \oplus h_2 = \bigcup_{\gamma_1 \in h_1, \gamma_2 \in h_2} \{ [ \gamma_1^L + \gamma_2^L - \gamma_1^L \gamma_2^L, \; \gamma_1^U + \gamma_2^U - \gamma_1^U \gamma_2^U ] \}

4) h_1 \otimes h_2 = \bigcup_{\gamma_1 \in h_1, \gamma_2 \in h_2} \{ [ \gamma_1^L \gamma_2^L, \; \gamma_1^U \gamma_2^U ] \}

Definition 3. For an IVHFE h = \bigcup_{\gamma \in h} \{ [\gamma^L, \gamma^U] \}, let \#h be the number of elements
in h. Then

S(h) = \frac{1}{\#h} \sum_{\gamma \in h} \frac{\gamma^L + \gamma^U}{2}   (2)

is defined as the score function of the IVHFE h. For two IVHFEs h_1 and h_2, if
S(h_1) > S(h_2), then h_1 \succ h_2; if S(h_1) = S(h_2), then h_1 \sim h_2.
Example 1. Suppose that h_1 = {[0.5,0.6], [0.6,0.7]}, h_2 = {[0.7,0.8], [0.4,0.6], [0.7,0.9]} and
h_3 = {[0.5,0.6]} are three IVHFEs. According to the
score function and comparison rules defined in Definition 3, we have

S(h_1) = \frac{1}{2} \Bigl( \frac{0.5 + 0.6}{2} + \frac{0.6 + 0.7}{2} \Bigr) = 0.6

S(h_2) = \frac{1}{3} \Bigl( \frac{0.7 + 0.8}{2} + \frac{0.4 + 0.6}{2} + \frac{0.7 + 0.9}{2} \Bigr) = 0.68

S(h_3) = \frac{1}{1} \Bigl( \frac{0.5 + 0.6}{2} \Bigr) = 0.55

Since S(h_2) > S(h_1) > S(h_3), then h_2 \succ h_1 \succ h_3.
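A small sketch of the score function of Definition 3 and the comparison of Example 1, representing an IVHFE as a list of [lower, upper] pairs, is given below (an illustration, not the authors' code).

```python
# Score of an IVHFE: mean of the interval midpoints (Definition 3).
def score(h):
    return sum((lo + up) / 2 for lo, up in h) / len(h)

h1 = [(0.5, 0.6), (0.6, 0.7)]
h2 = [(0.7, 0.8), (0.4, 0.6), (0.7, 0.9)]
h3 = [(0.5, 0.6)]
print(score(h1), score(h2), score(h3))   # 0.6, ~0.68, 0.55, so h2 > h1 > h3
```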

2. Interval-Valued Hesitant Fuzzy Geometric BM Aggregation Operators

After the concepts of IVHFS and IVHFE were proposed, aggregation operators for
aggregating IVHFEs were put forward correspondingly, such as the IVHFWA operator,
IVHFWG operator, IVHFOWA operator, IVHFOWG operator, GIVHFWA operator,
GIVHFWG operator, induced IVHFWA operator, induced IVHFWG operator, and so
on [12-13]. It should be noted that the above IVHF information aggregation operators
cannot be used to fuse correlated arguments. On the other hand, the geometric mean
(GM) is a common aggregation operator and has been widely used in the information
fusion field. Based on the GM, the geometric BM (GBM) operator has been proposed
and investigated by some researchers. However, it seems that researchers have not yet
investigated the GBM for aggregating IVHFEs, which is the concern
of the following study.
Definition 4. Let h_j = \bigcup_{\gamma_j \in h_j} \{ [\gamma_j^L, \gamma_j^U] \} (j = 1, 2, \ldots, n) be a group of IVHFEs. If

IVHFGBM(h_1, h_2, \ldots, h_n) = \frac{1}{p + q} \Bigl( \bigotimes_{\substack{i, j = 1 \\ i \neq j}}^{n} ( p h_i \oplus q h_j )^{\frac{1}{n(n-1)}} \Bigr)   (3)

then IVHFGBM is called the interval-valued hesitant fuzzy geometric BM
operator (IVHFGBM).
Theorem 1. Let p, q > 0, and let h_j = \bigcup_{\gamma_j \in h_j} \{ [\gamma_j^L, \gamma_j^U] \} (j = 1, 2, \ldots, n) be a group of
IVHFEs. After using the IVHFGBM operator, the aggregated IVHFE is obtained as follows:

IVHFGBM(h_1, h_2, \ldots, h_n)
= \bigcup_{\gamma_i \in h_i,\, \gamma_j \in h_j} \Biggl\{ \Biggl[\, 1 - \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1-\gamma_i^{L})^{p} (1-\gamma_j^{L})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}},
\; 1 - \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1-\gamma_i^{U})^{p} (1-\gamma_j^{U})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}} \,\Biggr] \Biggr\}   (4)

Example 2. Suppose there are three IVHFEs, h1 ={[0.49,0.63], [0.58,0.78],


[0.37,0.66], [0.68,0.87]}, h2 ={[0.69,0.81]} and h3 ={[0.57,0.69], [0.63,0.77]}. Based
on the IVHFGBM operator, the aggregated IVHFE for h1 , h2 and h3 can be obtained.
Since there are two parameters p and q in the IVHFGBM, the values of p and q may
change the aggregated results to a certain extent. For example,
(1) When p=1, q=10, then

IVHFGBM(h_1, h_2, h_3)
= {[0.5516, 0.6758], [0.5627, 0.6850], [0.5902, 0.7153], [0.6232, 0.7801], [0.4622,
0.6925], [0.4640, 0.7098], [0.6020, 0.7190], [0.6506, 0.7868]}
the score of the aggregated IVHFE is 0.6419.
(2) When p=3, q=7, then

IVHFGBM(h_1, h_2, h_3)
= {[0.5534, 0.6777], [0.5655, 0.6889], [0.5958, 0.7277], [0.62326, 0.7823], [0.4678,
0.6949], [0.4711, 0.7126], [0.6163, 0.7337], [0.6568, 0.7945]}
the score of the aggregated IVHFE is 0.6476.
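The closed form of Eq. (4) can be evaluated directly by enumerating all combinations of one interval from each IVHFE. The sketch below, using the data of Example 2, is an illustration of that computation and not the authors' code.

```python
# Direct evaluation of Eq. (4): one interval is picked from every IVHFE,
# and the lower/upper bounds are computed from the closed form.
from itertools import product

def ivhfgbm(hs, p, q):
    """hs: list of IVHFEs, each given as a list of (lower, upper) intervals."""
    n = len(hs)
    result = []
    for combo in product(*hs):              # one interval from every IVHFE
        bounds = []
        for k in (0, 1):                    # k = 0: lower bounds, k = 1: upper bounds
            prod_term = 1.0
            for i in range(n):
                for j in range(n):
                    if i != j:
                        inner = 1 - (1 - combo[i][k]) ** p * (1 - combo[j][k]) ** q
                        prod_term *= inner ** (1 / (n * (n - 1)))
            bounds.append(1 - (1 - prod_term) ** (1 / (p + q)))
        result.append(tuple(bounds))
    return result

h1 = [(0.49, 0.63), (0.58, 0.78), (0.37, 0.66), (0.68, 0.87)]
h2 = [(0.69, 0.81)]
h3 = [(0.57, 0.69), (0.63, 0.77)]
agg = ivhfgbm([h1, h2, h3], p=1, q=10)
print(len(agg), sum((lo + up) / 2 for lo, up in agg) / len(agg))  # 8 intervals and their score
```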
As can be seen from Definition 4, the IVHFGBM is symmetric with respect to the
parameters p and q, which is the same as for the IVHFBM operator. In order to illustrate this
phenomenon, Figure 1 is provided below.

Figure 1. Scores for IVHFEs obtained by the IVHFGBM operator (p∊ (0, 10), q∊ (0, 10))

Figure 2 shows the changing trend of the scores for the aggregated IVHFEs based
on IVHFGBM operator when the two parameters are fixed.

Figure 2. Scores trends p=0.01, 1 and 10 (q∊ (0, 10))

Definition 5. Let h_j = \bigcup_{\gamma_j \in h_j} \{ [\gamma_j^L, \gamma_j^U] \} (j = 1, 2, \ldots, n) be a group of IVHFEs, and let
w = (w_1, w_2, \ldots, w_n)^T be the weight vector of h_j (j = 1, 2, \ldots, n), satisfying w_i > 0
(i = 1, 2, \ldots, n) and \sum_{i=1}^{n} w_i = 1. If

IVHFWGBM(h_1, h_2, \ldots, h_n) = \frac{1}{p + q} \Bigl( \bigotimes_{\substack{i, j = 1 \\ i \neq j}}^{n} ( p h_i^{w_i} \oplus q h_j^{w_j} )^{\frac{1}{n(n-1)}} \Bigr)   (5)

then IVHFWGBM is called the interval-valued hesitant fuzzy weighted geometric
BM operator.
Theorem 2. Let h_j = \bigcup_{\gamma_j \in h_j} \{ [\gamma_j^L, \gamma_j^U] \} (j = 1, 2, \ldots, n) be a group of IVHFEs, and let
w = (w_1, w_2, \ldots, w_n)^T be the weight vector of h_j (j = 1, 2, \ldots, n), satisfying w_i > 0
(i = 1, 2, \ldots, n) and \sum_{i=1}^{n} w_i = 1. Then the IVHFWBM and IVHFWGBM operators can be
transformed as follows:

IVHFWBM(h_1, h_2, \ldots, h_n)
= \bigcup_{\gamma_i \in h_i,\, \gamma_j \in h_j} \Biggl\{ \Biggl[\, \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1 - (1-\gamma_i^{L})^{w_i})^{p} (1 - (1-\gamma_j^{L})^{w_j})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}},
\; \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1 - (1-\gamma_i^{U})^{w_i})^{p} (1 - (1-\gamma_j^{U})^{w_j})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}} \,\Biggr] \Biggr\}   (6)

IVHFWGBM(h_1, h_2, \ldots, h_n)
= \bigcup_{\gamma_i \in h_i,\, \gamma_j \in h_j} \Biggl\{ \Biggl[\, 1 - \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1 - (\gamma_i^{L})^{w_i})^{p} (1 - (\gamma_j^{L})^{w_j})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}},
\; 1 - \Bigl( 1 - \prod_{\substack{i,j=1 \\ i \neq j}}^{n} \bigl( 1 - (1 - (\gamma_i^{U})^{w_i})^{p} (1 - (\gamma_j^{U})^{w_j})^{q} \bigr)^{\frac{1}{n(n-1)}} \Bigr)^{\frac{1}{p+q}} \,\Biggr] \Biggr\}   (7)
Example 3. Suppose there are three IVHFEs, h_1 = {[0.31,0.45], [0.46,0.71]}, h_2
= {[0.34,0.47]} and h_3 = {[0.23,0.35], [0.46,0.58], [0.65,0.73]}, and the weight vector of the
three IVHFEs is (0.3, 0.4, 0.3)^T. Based on the IVHFWGBM operator, the aggregated
IVHFE can be obtained when the values of p and q are assigned specific numbers.
For example, when p=0.1, q=10, then
IVHFWGBM(h_1, h_2, h_3)
= {[0.6512, 0.7377], [0.6829, 0.7648], [0.6831, 0.7650], [0.6528, 0.7390], [0.6861,
0.7672], [0.6864, 0.7673]}
and the score of the aggregated IVHFE is 0.7153.
Example 4. Suppose there are four IVHFEs, h_1 = {[0.2,0.4], [0.2,0.7]}, h_2
= {[0.5,0.6], [0.3,0.6]}, h_3 = {[0.3,0.5]} and h_4 = {[0.5,0.6], [0.3,0.6]}, and the weight vector of the
four IVHFEs is supposed to be (0.2, 0.3, 0.3, 0.2)^T. Based on the IVHFWGBM
operator, the aggregated IVHFE can be obtained; for example,

(1) when p = 0.001, q = 10, the score is 0.7766;
(2) when p = q = 5, the score is 0.7848;
(3) when p = 10, q = 0.001, the score is 0.7765.
When the parameters p and q change from 0 to 10 simultaneously, the scores are
shown in Figure 3 in detail.

Figure 3. Scores obtained by the IVHFWGBM operator



Example 5. Suppose there are four IVHFEs, h_1 = {[0.5,0.8], [0.5,0.6], [0.4,0.7]},
h_2 = {[0.3,0.4], [0.6,0.7]}, h_3 = {[0.4,0.6]} and h_4 = {[0.3,0.5], [0.4,0.4]}, and the weight vector of
the four IVHFEs is supposed to be (0.2, 0.3, 0.3, 0.2)^T. Based on the IVHFWGBM
operator, the scores are shown in Figure 4 when the parameters p and q change from 0
to 10 simultaneously.

Figure 4. Scores obtained by the IVHFWGBM operator

3. Conclusions

In this paper, we have extended the traditional BM and proposed the IVHFGBM and
IVHFWGBM operators to aggregate IVHFEs. Some numerical examples for these
operators are also presented to show their practicality and effectiveness. In future
research, we intend to consider the extensions of some other BMs and study their
relationships, and to pay attention to the application of the proposed operators to real
application areas such as sustainable development evaluation, science and technology
project review, group decision making [14-16] and so on.

References

[1] J. J. Peng, J. Q. Wang, J. Wang, et al. Simplified neutrosophic sets and their applications in multi-criteria
group decision-making problems. International Journal of Systems Science, 47(2016), 2342-2358.
[2] D. Yu, D. F. Li and J. M. Merigó, Dual hesitant fuzzy group decision making method and its application
to supplier selection. International Journal of Machine Learning and Cybernetics, In press. DOI:
10.1007/s13042-015-0400-3
[3] H. Zhao, Z. Xu and S. Liu, Dual hesitant fuzzy information aggregation with Einstein t-conorm and t-
norm. Journal of Systems Science and Systems Engineering, In press.DOI:10.1007/s11518-015-5289-6.
[4] X. F. Wang, J. Q. Wang and W. E. Yang. Group decision making approach based on interval-valued
intuitionistic linguistic geometric aggregation operators. International Journal of Intelligent
Information and Database Systems, 7(2013), 516-534.
[5] M. Xia, Z. Xu and N. Chen. Induced aggregation under confidence levels. International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems, 19(2011), 201-227.

[6] G. Wei. Interval valued hesitant fuzzy uncertain linguistic aggregation operators in multiple attribute
decision making. International Journal of Machine Learning and Cybernetics, In press. DOI:
10.1007/s13042-015-0433-7
[7] H. Liu, Z. Xu and H. Liao. The multiplicative consistency index of hesitant fuzzy preference relation.
IEEE Transactions on Fuzzy Systems, 24(2016), 82-93.
[8] C. Bonferroni, Sulle medie multiple di potenze, Bolletino Matematica Italiana, 5 (1950), 267-270.
[9] M. M. Xia, Z. S. Xu, and B. Zhu. Geometric Bonferroni means with their application in multi-criteria
decision making. Knowledge-Based Systems, 40 (2013), 88-100.
[10] W. Zhou and J. M. He. Intuitionistic fuzzy geometric Bonferroni means and their application in multi-
criteria decision making. International Journal of Intelligent Systems, 27(2012), 995-1019.
[11] N. Chen, Z. S. Xu, and M. M. Xia. Interval-valued hesitant preference relations and their applications
to group decision making. Knowledge-Based Systems, 37(2013), 528-540.
[12] R. M. Rodríguez, B. Bedregal, H. Bustince, et al. A position and perspective analysis of hesitant fuzzy
sets on information fusion in decision making. Towards high quality progress. Information Fusion,
29(2016), 89-97.
[13] R. Pérez-Fernández, P. Alonso, H. Bustince, et al. Applications of finite interval-valued hesitant fuzzy
preference relations in group decision making. Information Sciences, 326(2016), 89-101.
[14] D. Yu. Group decision making under interval-valued multiplicative intuitionistic fuzzy environment
based on Archimedean t-conorm and t-norm. International Journal of Intelligent Systems, 30(2015),
590-616.
[15] D. Yu, W. Zhang and G. Huang. Dual hesitant fuzzy aggregation operators. Technological and
Economic Development of Economy, 22(2016), 194-209.
[16] W. Zhou and Z. S. Xu. Generalized asymmetric linguistic term set and its application to qualitative
decision making involving risk appetites. European Journal of Operational Research, 254(2016), 610-
621.
Fuzzy Systems and Data Mining II 45
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-45

A New Integrating SAW-TOPSIS Based on


Interval Type-2 Fuzzy Sets for Decision
Making
Lazim ABDULLAH1 and CW Rabiatul Adawiyah CW KAMAL
School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu,
Malaysia

Abstract. Most of the integrated methods of multi-attributes decision making


(MADM) used type-1 fuzzy sets to represent uncertainties. Recent theory has
suggested that interval type-2 fuzzy sets (IT2 FS) could be used to enhance
representation of uncertainties in decision making problems. Differently from the
typical integrated MADM methods which directly used type-1 fuzzy sets, this
paper proposes an integrating simple additive weighting - technique for order
preference similar to ideal solution (SAW-TOPSIS) based on IT2 FS to enhance
judgment. The SAW with IT2 FS is used to determine the weight for each
criterion, while TOPSIS method with IT2 FS is used to obtain the final ranking for
the attributes. A numerical example is used to illustrate the proposed method. The
numerical results show that the proposed integrating method is feasible in solving
MADM problems under complicated fuzzy environments. In essence, the
integrated SAW-TOPSIS is equipped with IT2 FS, in contrast to type-1 fuzzy sets,
for solving MADM problems. The proposed method would have great impact
and significance for practical implementation.
some recommendations for future research directions.

Keywords. Interval type-2 fuzzy set, Simple additive weighting, Multi-criteria


decision making , TOPSIS, preference order

Introduction

Decision making based on multi-criteria evaluation has been used with great success
for many applications. Most of these applications are characterized by high levels of
uncertainties and vague information. Fuzzy set theory has provided a useful way to
deal with vagueness and uncertainties in solving multi-criteria decision making
(MCDM) problem. During the last two decades, MCDM methods that integrated with
fuzzy sets have been one of the fastest growing research areas. Abdullah [1] presents a
brief review of category in the integration of fuzzy sets and MCDM. In general,
MCDM can be categorized into multi-attribute decision making (MADM) and multi-
objective decision making (MODM). Naturally, MADM problem is related to multiple
attributes. The attributes of MADM represent the different dimensions from which the
alternatives can be viewed by decision makers. There are many fuzzy MADM methods
that have been discussed in the literature, and fuzzy technique for order preference

1
Corresponding Author: Lazim ABDULLAH, School of Informatics and Applied Mathematics,
Universiti Malaysia Terengganu; E-mail: lazim_m@umt.edu.my.

similar to ideal solution (FTOPSIS) is one of the MADM methods. Preference or decision derived from FTOPSIS is made by observing the degree of closeness to the ideal solution. In addition to this method, fuzzy simple additive weighting (FSAW) is another type of fuzzy MADM method. It is an extension of the SAW method, where it employs trapezoidal fuzzy numbers to represent imprecision in judgements.
Lately, the integration of MADM method has received considerable attention in
literature. Integrated method is simply defined as two or more methods that are
concurrently employed to solve decision making problems. For example, the TOPSIS
is integrated with fuzzy analytic hierarchy process (FAHP) model to propose a new
integrated model for selecting plastic recycling method [2]. Rezaie et al., [3] present an
integrating model based on FAHP and VIKOR method for evaluating cement firms.
Wang et al., [4] develop an integrating OWA–fuzzy TOPSIS to tackle fuzzy MADM
problems. Kharat et al., [5] applied an integrated fuzzy AHP–TOPSIS to municipal
solid waste landfill site selection problem. Pamučar and Ćirović [6] applied the new
integrated fuzzy DEMATEL–MABAC method in making investment decisions.
Tavana et al., [7] proposed an integrated fuzzy ANP-COPRAS-Grey method to
determine the selection of social media platform.
Most of these integrating methods employed type-1 fuzzy sets to represent
uncertainties in decision making. However, the type-1 fuzzy sets have some extent of
limitation in dealing with uncertainties. Recent theories suggest that interval type-2
fuzzy sets (IT2 FSs) are more flexible than the interval type-1 fuzzy sets in representing
uncertainties. Therefore, in contrast to these methods, this paper introduces linguistic
terms based on IT2 FS for proposing a new integrating MADM method. The IT2 FS is
incorporated within the framework of FSAW and FTOPSIS to develop a new
integrating fuzzy MADM method. Specifically, Interval Type-2 Fuzzy Simple
Additive Weighting (IT2 FSAW) method is integrated with Interval Type-2 Technique
for Order Preference Similar to Ideal Solution (IT2 FTOPSIS) method for solving
MADM problems. In the proposed method, the judgements made by decision makers
over the relative importance of alternatives are determined using IT2 FSAW procedure
and the final preference is obtained using IT2 FTOPSIS. The ranking method of IT2
FTOPSIS approach preserves the characteristics of fuzzy numbers where the linguistic
terms can easily be converted to fuzzy numbers.

1. Proposed Method

This paper integrates the IT2 FSAW with IT2 FTOPSIS to establish a new MADM
method. In this proposed method, the IT2 FSAW is used to find weights of the criteria,
whereas IT2 FTOPSIS is used to establish preference of alternatives. The definitions
of IT2 FS [8], upper and lower memberships of IT2 FS [9], and ranking values of the
trapezoidal IT2 FS [10] are used in the proposed method. The detailed procedure of the
proposed method is described as follows.
Step 1: Construct the decision matrix Y p of the p-th decision maker and construct the
average decision matrix Y , respectively.
$$Y_p = \left(f_{ij}^{\,p}\right)_{m\times n} =
\left[\begin{array}{cccc}
f_{11}^{\,p} & f_{12}^{\,p} & \cdots & f_{1n}^{\,p} \\
f_{21}^{\,p} & f_{22}^{\,p} & \cdots & f_{2n}^{\,p} \\
\vdots & \vdots & \ddots & \vdots \\
f_{m1}^{\,p} & f_{m2}^{\,p} & \cdots & f_{mn}^{\,p}
\end{array}\right],
\qquad
\bar{Y} = \left(\bar{f}_{ij}\right)_{m\times n}, \qquad (1)$$

where the rows correspond to the criteria $f_1, f_2, \ldots, f_m$, the columns correspond to the alternatives $x_1, x_2, \ldots, x_n$, and $\bar{f}_{ij} = \dfrac{f_{ij}^{1} \oplus f_{ij}^{2} \oplus \cdots \oplus f_{ij}^{k}}{k}$.

Step 2: Construct the aggregated fuzzy weight $\bar{W}$ from the weighting matrix $W_p$ of the attributes provided by the $p$-th decision maker.

Let $w_i^{\,p} = (a_i, b_i, c_i, d_i)$, $i = 1, 2, \ldots, m$, be the linguistic weight given to the subjective criteria $C_1, C_2, \ldots, C_h$ and the objective criteria $C_{h+1}, C_{h+2}, \ldots, C_n$ by decision maker $D_p$.

$$W_p = \left(w_i^{\,p}\right)_{1\times m} = \left[\, w_1^{\,p} \;\; w_2^{\,p} \;\; \cdots \;\; w_m^{\,p} \,\right], \qquad (2)$$
$$\bar{W} = \left(\bar{w}_i\right)_{1\times m}, \qquad (3)$$

where $\bar{w}_i = \dfrac{w_i^{1} \oplus w_i^{2} \oplus \cdots \oplus w_i^{k}}{k}$ and $\bar{w}_i$ is an interval type-2 fuzzy set.

To defuzzify the weights of the fuzzy attributes, the signed distance is employed [11]. The defuzzification of $\bar{W}$ is represented as:
$$d(\bar{W}_j) = \frac{1}{4}\left(w_j^{1} + w_j^{2} + w_j^{3} + w_j^{4}\right), \quad j = 1, 2, \ldots, n. \qquad (4)$$
The crisp value for criterion $W_j$ is given by:
$$W_j = \frac{d(\bar{W}_j)}{\sum_{j=1}^{n} d(\bar{W}_j)}, \quad j = 1, 2, \ldots, n, \qquad (5)$$
where $\sum_{j=1}^{n} W_j = 1$. Therefore, the weight vector $W = [W_1, W_2, \ldots, W_n]$ is constructed.
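As a minimal illustration of Eqs. (4)–(5) (a Python sketch; the function name and the example weights are illustrative, not part of the described procedure):

```python
def defuzzify_weights(fuzzy_weights):
    """Signed-distance defuzzification (Eq. 4) followed by normalization (Eq. 5).

    fuzzy_weights: list of 4-tuples (w1, w2, w3, w4), the reference points of
    each aggregated trapezoidal weight.
    Returns the crisp, normalized weight vector [W1, ..., Wn].
    """
    # Eq. (4): d(W_j) = (w1 + w2 + w3 + w4) / 4
    d = [sum(w) / 4.0 for w in fuzzy_weights]
    total = sum(d)
    # Eq. (5): W_j = d(W_j) / sum_j d(W_j), so that the weights sum to 1
    return [dj / total for dj in d]

# Example: three criteria with trapezoidal aggregated weights
print(defuzzify_weights([(0.3, 0.5, 0.5, 0.7), (0.5, 0.7, 0.7, 0.9), (0.7, 0.9, 0.9, 1.0)]))
```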

Step 3: Create the weighted decision matrix $Y_w$:
$$Y_w = \left(\tilde{v}_{ij}\right)_{m\times n} =
\left[\begin{array}{cccc}
\tilde{v}_{11} & \tilde{v}_{12} & \cdots & \tilde{v}_{1n} \\
\tilde{v}_{21} & \tilde{v}_{22} & \cdots & \tilde{v}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{v}_{m1} & \tilde{v}_{m2} & \cdots & \tilde{v}_{mn}
\end{array}\right], \qquad (6)$$

where the rows correspond to the criteria $f_1, \ldots, f_m$, the columns correspond to the alternatives $x_1, \ldots, x_n$, and $\tilde{v}_{ij} = W_i \otimes \bar{f}_{ij}$, $1 \le i \le m$, $1 \le j \le n$.

Step 4: Calculate the ranking value $Rank(\tilde{v}_{ij})$ of the IT2 FS $\tilde{v}_{ij}$ using Eq. (7), and create the ranking weighted decision matrix $Y_w^{*}$:
$$Y_w^{*} = \left(Rank(\tilde{v}_{ij})\right)_{m\times n}. \qquad (7)$$

Step 5: Calculate the positive-ideal solution $x^{+} = \left(v_1^{+}, v_2^{+}, \ldots, v_m^{+}\right)$ and the negative-ideal solution $x^{-} = \left(v_1^{-}, v_2^{-}, \ldots, v_m^{-}\right)$, where
$$v_i^{+} =
\begin{cases}
\max\limits_{1\le j\le n}\{Rank(\tilde{v}_{ij})\}, & \text{if } f_i \in F_1,\\
\min\limits_{1\le j\le n}\{Rank(\tilde{v}_{ij})\}, & \text{if } f_i \in F_2,
\end{cases} \qquad (8)$$
and
$$v_i^{-} =
\begin{cases}
\min\limits_{1\le j\le n}\{Rank(\tilde{v}_{ij})\}, & \text{if } f_i \in F_1,\\
\max\limits_{1\le j\le n}\{Rank(\tilde{v}_{ij})\}, & \text{if } f_i \in F_2,
\end{cases} \qquad (9)$$
where $F_1$ denotes the set of benefit attributes and $F_2$ denotes the set of cost attributes.

Step 6: Find the distance $d^{+}(x_j)$ between each alternative $x_j$ and the positive-ideal solution $x^{+}$, using Eq. (10):
$$d^{+}(x_j) = \sqrt{\sum_{i=1}^{m}\left(Rank(\tilde{v}_{ij}) - v_i^{+}\right)^{2}}, \qquad (10)$$
where $1 \le j \le n$. Similarly, find the distance $d^{-}(x_j)$ between each alternative $x_j$ and the negative-ideal solution $x^{-}$, using the following equation:
$$d^{-}(x_j) = \sqrt{\sum_{i=1}^{m}\left(Rank(\tilde{v}_{ij}) - v_i^{-}\right)^{2}}. \qquad (11)$$

Step 7: Calculate the degree of closeness $C(x_j)$ of $x_j$ with respect to the positive-ideal solution $x^{+}$, using the following equation:
$$C(x_j) = \frac{d^{-}(x_j)}{d^{+}(x_j) + d^{-}(x_j)}, \qquad (12)$$
where $1 \le j \le n$.

Step 8: Arrange the values of $C(x_j)$ in descending order; a larger value of $C(x_j)$ indicates a higher preference for the alternative $x_j$.
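To make Steps 5–8 concrete, the following Python sketch applies them to an already ranked weighted decision matrix $Y_w^{*}$ (criteria in rows, alternatives in columns); the matrix values and the benefit/cost split are illustrative only, and the IT2 FS ranking of Step 4 is assumed to have been computed beforehand.

```python
import numpy as np

def topsis_preference(rank_matrix, is_benefit):
    """Steps 5-8: ideal solutions, distances and closeness on ranking values.

    rank_matrix: (m criteria) x (n alternatives) array of Rank(v_ij) values.
    is_benefit:  length-m boolean list, True for benefit criteria (F1),
                 False for cost criteria (F2).
    Returns the closeness C(x_j) of each alternative (larger is better).
    """
    R = np.asarray(rank_matrix, dtype=float)
    is_benefit = np.asarray(is_benefit, dtype=bool)
    # Step 5: positive and negative ideal solutions, Eqs. (8)-(9)
    v_pos = np.where(is_benefit, R.max(axis=1), R.min(axis=1))
    v_neg = np.where(is_benefit, R.min(axis=1), R.max(axis=1))
    # Step 6: Euclidean distances to the ideal solutions, Eqs. (10)-(11)
    d_pos = np.sqrt(((R - v_pos[:, None]) ** 2).sum(axis=0))
    d_neg = np.sqrt(((R - v_neg[:, None]) ** 2).sum(axis=0))
    # Step 7: degree of closeness, Eq. (12)
    return d_neg / (d_pos + d_neg)

# Illustrative ranked matrix: 3 criteria (all benefit) x 3 alternatives
C = topsis_preference([[0.42, 0.47, 0.48],
                       [0.35, 0.40, 0.44],
                       [0.50, 0.46, 0.49]], [True, True, True])
print(C, C.argsort()[::-1])  # Step 8: larger C(x_j) means higher preference
```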

2. Numerical Example

For the purpose of illustration and to show the feasibility of the proposed method, an
example is presented. This example is retrieved from Chou et al. [5].
Researchers intend to identify the facility location alternatives to build a new plant.
The team has identified three alternatives which are alternative 1 ( A1 ) , alternative 2
( A2 ) , and alternative 3 ( A3 ) . To determine the best alternative site, a committee of
four decision makers is created; decision maker 1 ( D1 ) , decision maker 2 ( D2 ) ,
decision maker 3 ( D3 ) and decision maker 4 ( D4 ) . Three selection criteria are
deliberated: transportation availability (C1 ) , availability of skilled workers (C2 ) and
climatic conditions (C3 ) . Table 1 shows the linguistic terms used to rate criteria with
respect to alternatives and also the weights for criteria.
Table 1. Linguistic terms and IT2 FS
Linguistic Terms Interval Type-2 Fuzzy Sets
Very Poor (VP) ((0,0,0,0.1;1,1),(0,0,0,0.05;0.9,0.9))
Poor (P) ((0.0,0.1,0.1,0.3;1,1),(0.05,0.1,0.1,0.2;0.9,0.9))
Medium Poor (MP) ((0.1,0.3,0.3,0.5;1,1),(0.2,0.3,0.3,0.4;0.9,0.9))
Fair (F) ((0.3,0.5,0.5,0.7;1,1),(0.4,0.5,0.5,0.6;0.9,0.9))
Medium Good (MG) ((0.5,0.7,0.7,0.9;1,1),(0.6,0.7,0.7,0.8;0.9,0.9))
Good (G) ((0.7,0.9,0.9,1;1,1),(0.8,0.9,0.9,0.95;0.9,0.9))
Very Good (VG) ((0.9,1,1,1;1,1),(0.95,1,1,1;0.9,0.9))

Based on the ratings given by decision makers , the example is solved using the
proposed method. The final degree of closeness and preference are shown in Table 2.
Table 2. Degree of closeness and preference

Degree of closeness    Preference order
C(A1) = 0.4112         3
C(A2) = 0.4605         2
C(A3) = 0.4778         1

It can be seen that the preference order of the alternatives is $A_3 \succ A_2 \succ A_1$. The proposed method therefore decided that the best alternative is $A_3$. This preference is slightly inconsistent with the result obtained using the FSAW, where the preference is $A_2 \succ A_3 \succ A_1$.
3. Conclusions

This paper proposed a novel method which integrates IT2 FSAW and IT2 FTOPSIS to solve MADM problems. Decision makers used interval type-2 linguistic variables to assess the importance of the criteria. The ranking weighted decision matrix obtained from IT2 FSAW was then used as an input to the IT2 FTOPSIS, where the ideal solutions could be computed. Finally, the preference of alternatives was obtained as a result of implementing the integrated method. To illustrate the feasibility of the proposed method, a numerical example that was formerly solved using the FSAW method was considered. The results showed that A3 is the most preferred alternative. A detailed comparative analysis between the results obtained using the integrated method and other decision making methods is left for future research. Future research may also include a sensitivity analysis, in which the uncertainty of the final preference of the integrated model can be investigated.

Acknowledgments

This work is part of the research grant project FRGS 59389. We acknowledge the financial support provided by the Malaysian Ministry of Education and Universiti Malaysia Terengganu.

References

[1] L. Abdullah, Fuzzy Multi Criteria Decision Making and its Application: A Brief Review of Category.
Procedia-Social and Behavioral Sciences, 97 (2013), 131-136.
[2] S. Vinodh, M. Prasanna, N. Hari Prakash, Integrated fuzzy AHP-TOPSIS for selecting the best plastic
recycling method: A case study. Applied Mathematical Modelling, 39 (2014),4662-4672.
[3] K. Rezaie, S. S. Ramiyani, S Nazari-Shirkouhi, A. Badizadeh, Evaluating performance of Iranian
cement firms using an integrated fuzzy AHP-VIKOR method. Applied Mathematical Modelling, 38
(2014), 5033-5046.
[4] T. Wang, J. Liu, J. Li, C. Niu, An integrating OWA–TOPSIS framework in intuitionistic fuzzy settings
for multiple attribute decision making, Computers & Industrial Engineering, 98(2016), 185-194.
[5] M. G. Kharat, S. J. Kamble, R. D. Raut, S. S Kamble, S. M. Dhume, Modeling landfill site selection
using an integrated fuzzy MCDM approach . Earth Systems and Environment, 2(2016), 53.
[6] D. Pamučar, G. Ćirović, The selection of transport and handling resources in logistics centers using
Multi-Attributive Border Approximation area Comparison (MABAC), Expert Systems with
Applications, 42(2015), 3016-3028.
[7] M. Tavana, E. Momeni, N. Rezaeiniya, S. M. Mirhedayatian, H. Rezaeiniya, A novel hybrid social media
platform selection model using fuzzy ANP and COPRAS-G, Expert Systems with Applications,
40(2013), 5694-5702.
[8] Y. C. Chang, S. M. Chen, A new fuzzy interpolative reasoning method based on interval type-2 fuzzy sets. IEEE International Conference on Systems, Man and Cybernetics, (2008), 82-87.
[9] J. M. Mendel, R. I., John, F. Liu, Interval Type-2 Fuzzy Logic Systems Made Simple. IEEE
Transactions of Fuzzy Systems, 14 (2006), 808-821.
[10] L. Lee, S. Chen, Fuzzy Multiple Attributes Group Decision-Making Based on the Extension of TOPSIS Method and Interval Type-2 Fuzzy Sets. Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, (2008), 3260-3265.
[11] J. S.Yao, K. Wu, Ranking fuzzy number based on decomposition principle and signed distance. Fuzzy
Sets and Systems, 116(2000), 275-288.
Fuzzy Systems and Data Mining II 51
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-51

Algorithms for Finding Oscillation Period of Fuzzy Tensors
Ling CHEN a,b,1 and Lin-Zhang LU a,c
a School of Mathematical Sciences, Guizhou Normal University, GuiYang, Guizhou, P.R. China 550001
b School of Science, Shandong Jianzhu University, JiNan, Shandong, P.R. China 250101
c School of Mathematical Sciences, Xiamen University, Xiamen, Fujian, P.R. China 361005

Abstract. In this paper, we focus on describing the oscillation period and index
of fuzzy tensor. The definition of the induced third-order fuzzy tensor is proposed.
By using this notion, firstly, the oscillation period and index of fuzzy tensor are
obtained on the basis of Power Method with max-min operation. Secondly, we rely
on the Minimal Strong Component to find the oscillation period of fuzzy tensor. This graph-theoretic method is more practical when the number of nonzero elements is less than half of the total number of fuzzy tensor elements. Furthermore, numerical results demonstrate that the two algorithms, the Power Method and the Minimal Strong Component method, are effective and promising for solving the period and index of a fuzzy tensor.

Keywords. Fuzzy tensors, oscillation period, minimal strong component

Introduction

In fuzzy mathematics, the study of fuzzy matrix is very complex but quite important
since it has a wide range of applications, especially in fuzzy control and fuzzy decision.
The object of fuzzy control is the fuzzy system; an important question for a fuzzy control system is whether it can reach a stable state in limited time, and its stability can be studied by using the periodicity of the fuzzy matrix. In order to study the multi-objective
fuzzy decision making and dynamic multiple objective fuzzy control, it is necessary to
investigate the higher order forms of fuzzy matrix.
The periodicity is one of the most important characteristics of fuzzy matrices.
Thomason [1] first proposed the powers of fuzzy matrix with convergence period or os-
cillation period. Fan and Liu [2] got the conclusion that the period of fuzzy matrix is
equal to the least common multiple of the period of its cutting matrix. Li [3] discussed
the periodicity of fuzzy matrices in the general case. Liu and Ji [4] described the peri-
odicity of square fuzzy matrices. Furthermore, the same paper [3] perfected the conclusion on the upper bound of the convergence index of powers of a fuzzy matrix and obtained the greatest period of any square fuzzy matrix, thus solving the problem of estimating the period of a general fuzzy matrix.
1 Corresponding Author: Ling CHEN, School of Mathematical Sciences, Guizhou Normal University, GuiYang, Guizhou, P. R. China; E-mail: chenling_100@163.com.
As shown in the literature [5,6,7], many practical problems can nowadays be modeled as tensor problems. For example, a vector is a first-order tensor, a matrix is a second-order tensor, and a tensor of order three or higher is called a higher-order tensor. Reference [8] explains a fast Hankel tensor-vector product and its application to exponential data fitting. Reference [9] considered infinite and finite dimensional Hilbert tensors and researched their periodicity. So generalizing the tensor to the fuzzy tensor is practical and meaningful.
In this paper, we deal with the oscillation period and index of fuzzy tensor with max-min operation. We find the oscillation period and index of fuzzy tensor by the Power Method and the Minimal Strong Component in Section 1 and Section 2, respectively. Our numerical examples show the feasibility of the two proposed algorithms. Finally, Section 3 gives the conclusions.

1. A power method finding oscillation period and index of fuzzy tensor

In this section, we first describe some concepts and results about fuzzy matrices from the literature [1,2,3,4,10,11,12,13], which will be used in this section. We give the definition of fuzzy tensor, and analyze the periodicity and index of fuzzy tensor.
Let $A = (a_{ij})$ and $B = (b_{ij})$ be $n \times n$ fuzzy matrices. We have the following product definition: $A \times B = C = (c_{ij}) = \left(\bigvee_{k=1}^{n}(a_{ik} \wedge b_{kj})\right)$, where $a_{ij} \wedge b_{ij} = \min\{a_{ij}, b_{ij}\}$, $a_{ij} \vee b_{ij} = \max\{a_{ij}, b_{ij}\}$, and $A^{k+1} = A^{k} \times A$, $k = 1, 2, \cdots$. $A = B$ if $a_{ij} = b_{ij}$ for all $i, j \in \{1, 2, \cdots, n\}$.
Consider a finite number of fuzzy matrices A1 , A2 , · · · , An with any Ai ∈
F n×n , where F n×n denotes the set of all of n × n fuzzy matrices. We have F =
{A1 , A2 , · · · , An }.
Let $Z^{+} = \{x \mid x \text{ is a positive integer}\}$ and $[n]$ be the least common multiple of $1, 2, \cdots, n$.
Referring to the relevant literature [1,2,3,4,11], for convenience in application, we propose an equivalent definition of the period of oscillation and the index of a fuzzy matrix.

Definition 1. Let $A$ be an $n \times n$ fuzzy matrix. If there exist $s, t \in Z^{+}$ such that $A^{s+t} = A^{s}$, then we call $d = \min\{t \mid A^{s+t} = A^{s}\}$ the period of oscillation of $A$, and $k = \min\{s \mid A^{s+d} = A^{s}\}$ the index of $A$.

Remark 1. The possible range of the period of a fuzzy matrix is from 1 to $[n]$, that is, $1 \le d \le [n]$ and $d \mid [n]$. If $d = 1$, we say $A$ is convergent.

Similar to the definition of tensor, in view of the characteristic of fuzzy matrix, we


will present the definition of fuzzy tensor as follows.

Definition 2. An order $m$ dimension $n$ fuzzy tensor $A = (a_{i_1 \cdots i_m})$ consists of $n^{m}$ entries $0 \le a_{i_1 \cdots i_m} \le 1$, where $i_j = 1, \cdots, n$ for $j = 1, \cdots, m$.

For our purposes, throughout this paper, we always consider $i_1, \cdots, i_m$ to have the same dimension.
From the above definition of fuzzy tensor, clearly, a fuzzy tensor is a higher-order generalization of a fuzzy matrix, and is also a tensor extension of the characteristic function.
Next, we discuss the third-order clustering of a fuzzy tensor by using the slice-by-slice method. For a fuzzy tensor, we obtain two-dimensional sections by fixing all indices except for two. Each slice is a fuzzy matrix. Fixing all indices but three, we will define the induced third-order fuzzy tensor.

Figure 1. Slices of a third-order fuzzy tensor: (a) horizontal slices, (b) lateral slices, (c) frontal slices.

Definition 3. Let $A = (a_{i_1 \cdots i_m})$ be an order $m$ dimension $n$ fuzzy tensor. Multiple third-order fuzzy tensor clusterings $(A_{i_j i_k i_h}, A)$ of $A$ are constructed by fixing all but three indices. We call $A_{i_j i_k i_h}$ the induced third-order fuzzy tensor of $A$, where $i_j, i_k, i_h \in \{i_1, \cdots, i_m\}$.

By the third-order clustering theory, we shall explore the period and index of a higher order fuzzy tensor, which is converted into the study of third-order fuzzy tensors. A third-order fuzzy tensor has horizontal, lateral and frontal slices, and each direction contains a set of fuzzy matrices. From an order $m$ dimension $n$ fuzzy tensor we obtain $C_m^{3} n^{m-3}$ induced third-order fuzzy tensors and $3 C_m^{3} n^{m-3}$ sets of fuzzy matrix sequences. Figure 1 shows the horizontal, lateral and frontal slices of the third-order fuzzy tensor $A_{i_j i_k i_h}$, denoted by $A_{i_j ::}$, $A_{: i_k :}$ and $A_{:: i_h}$, respectively.
On the whole, it is far more intuitive and simpler to investigate higher order fuzzy
tensor with the help of geometric significance of third-order fuzzy tensor. Furthermore,
it is convenient to apply them to various fields.
Now, we introduce the period of induced third-order fuzzy tensor and the given fuzzy
tensor. The following result follows immediately from Definition 1.

Theorem 1. Let $F = \{A_1, A_2, \cdots, A_n\}$. Then the oscillation period of $F$ is the least common multiple (l.c.m.) of the periods of $A_1, A_2, \cdots, A_n$, and the index of $F$ is the largest of the indices of $A_1, A_2, \cdots, A_n$. That is, suppose $d_1, \cdots, d_n, d_F$ and $k_1, \cdots, k_n, k_F$ are the oscillation periods and indices of $A_1, A_2, \cdots, A_n, F$, respectively. Then
$$d_F = \mathrm{l.c.m.}[d_1, \cdots, d_n], \qquad k_F = \max\{k_1, \cdots, k_n\}.$$

Proof. Rebuild a fuzzy matrix from $F = \{A_1, A_2, \cdots, A_n\}$: considering each $A_i$ $(i = 1, 2, \cdots, n)$ as a block, we have the block diagonal matrix $F = \mathrm{diag}(A_1, A_2, \cdots, A_n)$. By Definition 1, $d_F = \mathrm{l.c.m.}[d_1, \cdots, d_n]$ and $k_F = \max\{k_1, \cdots, k_n\}$.

From the geometric significance of 3-order fuzzy tensor, we state easily the main
conclusion as follows.

Theorem 2. Let $A_{i_j i_k i_h}$ be the induced third-order fuzzy tensor of an order $m$ dimension $n$ fuzzy tensor $A$. Suppose $d, d_{i_j}, d_{i_k}, d_{i_h}$ and $k, k_{i_j}, k_{i_k}, k_{i_h}$ are the oscillation periods and indices of $A_{i_j i_k i_h}$, $A_{i_j ::}$, $A_{: i_k :}$ and $A_{:: i_h}$, respectively. Then
$$d = \mathrm{l.c.m.}[d_{i_j}, d_{i_k}, d_{i_h}], \qquad k = \max\{k_{i_j}, k_{i_k}, k_{i_h}\}.$$
Table 1. Numerical data for Example 1 (each block is the 4 × 4 slice A(:, :, i3, i4))

i3 = 1:
  i4 = 1: [0.3 0.1 0.8 0.9; 0.1 0.9 0.2 0.8; 0.5 0.1 0.2 0.7; 0.1 0.3 0.4 0.6]
  i4 = 2: [0.1 0.2 0.1 0.4; 0.8 0.1 0.1 0.4; 0.9 0.2 0.4 0.3; 0.3 0.6 0.8 0.4]
  i4 = 3: [0.7 0.9 0.8 0.9; 0.3 0.1 0.1 0.6; 0.4 0.6 0.6 0.6; 0.1 0.7 0.8 0.4]
  i4 = 4: [0.3 0.9 0.5 0.5; 0.6 0.5 0.4 0.8; 0.9 0.9 0.9 0.8; 0.5 0.4 0.2 0.7]
i3 = 2:
  i4 = 1: [0.8 0.9 0.9 0.1; 0.3 0.9 0.8 0.6; 0.7 0.1 0.7 0.2; 0.2 0.2 0.9 0.1]
  i4 = 2: [0.3 0.8 0.4 0.5; 0.5 0.7 0.4 0.8; 0.9 0.6 0.9 0.4; 0.5 0.7 0.6 0.1]
  i4 = 3: [0.6 0.8 0.8 0.4; 0.3 0.4 0.9 0.3; 0.3 0.9 0.8 0.8; 0.7 0.9 0.7 0.2]
  i4 = 4: [0.3 0.6 0.8 0.7; 0.3 0.9 0.3 0.7; 0.7 0.3 0.6 0.3; 0.5 0.7 0.4 0.9]
i3 = 3:
  i4 = 1: [0.5 0.9 0.9 0.6; 0.9 0.5 0.3 0.1; 0.4 0.8 0.3 0.8; 0.4 0.2 0.8 0.2]
  i4 = 2: [0.5 0.2 0.7 0.6; 0.6 0.9 0.4 0.7; 0.8 0.7 0.6 0.3; 0.7 0.8 0.3 0.7]
  i4 = 3: [0.5 0.8 0.1 0.3; 0.4 0.5 0.2 0.4; 0.5 0.4 0.7 0.8; 0.9 0.4 0.3 0.8]
  i4 = 4: [0.5 0.3 0.4 0.7; 0.9 0.8 0.6 0.1; 0.8 0.3 0.3 0.2; 0.3 0.7 0.4 0.8]
i3 = 4:
  i4 = 1: [0.2 0.8 0.8 0.7; 0.9 0.9 0.7 0.3; 0.6 0.9 0.7 0.1; 0.2 0.5 0.7 0.8]
  i4 = 2: [0.9 0.6 0.6 0.9; 0.9 0.5 0.7 0.3; 0.3 0.7 0.5 0.5; 0.8 0.2 0.7 0.9]
  i4 = 3: [0.5 0.3 0.2 0.6; 0.9 0.5 0.8 0.3; 0.1 0.7 0.4 0.5; 0.7 0.5 0.2 0.7]
  i4 = 4: [0.7 0.1 0.2 0.8; 0.3 0.2 0.5 0.2; 0.5 0.5 0.1 0.4; 0.6 0.1 0.1 0.2]

Proof. The theorem can be proved by using the block fuzzy matrix theory, as in Theorem 1.
Clearly, based on Definition 3 and Theorem 2, we have the following result.

Theorem 3. Let $A = (a_{i_1 \cdots i_m})$ be an order $m$ dimension $n$ fuzzy tensor with multiple third-order fuzzy tensor clusterings $(A_{i_j i_k i_h}, A)$, where $A_{i_j i_k i_h}$ is the induced third-order fuzzy tensor of $A$. Then the oscillation period $D$ of the fuzzy tensor $A$ is the least common multiple of the oscillation periods of all the induced third-order fuzzy tensors, and the index $K$ of $A$ is the maximum of the indices of all the induced third-order fuzzy tensors.

Proof. By using block theory, the proof can be done.


By the algorithm in [3] for finding the oscillation period and index of a fuzzy matrix, we give here a power method for the oscillation period and index of a fuzzy tensor. From the above discussion, the following algorithm can be given naturally.
Algorithm 1 (A power method finding the oscillation period and index of a fuzzy tensor).
Input: An order $m$ dimension $n$ fuzzy tensor $A = (a_{i_1 \cdots i_m})$.
Output: The oscillation period and index of the fuzzy tensor, $D$ and $K$.
Step 1. Choose $i_j, i_k, i_h \in \{i_1, \cdots, i_m\}$ and let $A_{i_j i_k i_h} = (a_{i_j i_k i_h})$.
Step 2. By using Definition 1 and Theorem 1, compute $d_{i_j}, d_{i_k}, d_{i_h}$ and $k_{i_j}, k_{i_k}, k_{i_h}$.
Step 3. By Theorem 2, compute $d$ and $k$.
Step 4. Repeat Steps 1–3 until $i_j, i_k, i_h$ have run through all of $i_1, \cdots, i_m$.
Step 5. Based on Theorem 3, calculate $D$ and $K$ from all the $d$ and $k$ obtained above.
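The per-slice primitive of Algorithm 1 — the period and index of a single fuzzy matrix under the max-min product (Definition 1), combined across a family of matrices by Theorem 1 — can be sketched in Python as follows (function names are illustrative; the paper's own implementation is in R, as noted below):

```python
import numpy as np
from math import gcd
from functools import reduce

def maxmin_product(A, B):
    # Max-min composition: C[i, j] = max_k min(A[i, k], B[k, j]).
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def period_and_index(A, max_power=500):
    """Oscillation period d and index k of a fuzzy matrix (Definition 1)."""
    powers = [np.asarray(A, dtype=float)]        # powers[t-1] holds A^t
    for _ in range(max_power):
        powers.append(maxmin_product(powers[-1], powers[0]))
        for s, earlier in enumerate(powers[:-1]):
            if np.array_equal(powers[-1], earlier):
                d = len(powers) - 1 - s          # period of oscillation
                return d, s + 1                  # (d, index k)
    raise RuntimeError("no repetition detected within max_power products")

def lcm(a, b):
    return a * b // gcd(a, b)

# Theorem 1: a family F = {A1, ..., An} has period lcm(d_i) and index max(k_i).
def family_period_index(matrices):
    results = [period_and_index(A) for A in matrices]
    return reduce(lcm, (d for d, _ in results), 1), max(k for _, k in results)
```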
Next, to demonstrate that Algorithm 1 works for fuzzy tensor, we test the following
example whose codes are written in R language.

Example 1. Let A be a 4-order fuzzy tensor with dimension four which is defined by
Table 1. For m = 4, we have the induced 3-order fuzzy tensor Ai1 i2 i3 , Ai1 i2 i4 , Ai1 i3 i4
and Ai2 i3 i4 . For Ai1 i2 i3 , if i4 = 1, we have the induced 3-order fuzzy tensor Ai1 i2 i3 1
contains the data denoted by $A_{i_1 i_2 i_3 1} = (A(:,:,1,1), A(:,:,2,1), A(:,:,3,1), A(:,:,4,1))$, and we obtain three sets of fuzzy matrices $F_1^{1}, F_2^{1}, F_3^{1}$ by fixing one index in turn, $i_1$, $i_2$, $i_3$, where $F_i^{1} = \{A_1, A_2, A_3, A_4\}$, $i = 1, 2, 3$.
Consider all the fuzzy matrices $F_i^{1}$ $(i = 1, 2, 3)$ by Definition 1 and Theorem 1: $d_{F_1^1} = [1, 1, 1, 1] = 1$, $k_{F_1^1} = \max\{4, 4, 4, 4\} = 4$; $d_{F_2^1} = [2, 1, 1, 1] = 2$, $k_{F_2^1} = \max\{4, 3, 2, 2\} = 4$; $d_{F_3^1} = [2, 1, 2, 1] = 2$, $k_{F_3^1} = \max\{3, 4, 5, 3\} = 5$. So the oscillation period $d^{1}$ and index $k^{1}$ of the fuzzy tensor $A_{i_1 i_2 i_3 1}$ are as follows: $d^{1} = [d_{F_1^1}, d_{F_2^1}, d_{F_3^1}] = [1, 2, 2] = 2$, $k^{1} = \max\{k_{F_1^1}, k_{F_2^1}, k_{F_3^1}\} = \max\{4, 4, 5\} = 5$.
If $i_4 = 2$, $i_4 = 3$, $i_4 = 4$ we have: $d^{2} = [2, 1, 1] = 2$, $k^{2} = \max\{3, 6, 5\} = 6$; $d^{3} = [2, 1, 2] = 2$, $k^{3} = \max\{3, 4, 4\} = 4$; $d^{4} = [2, 3, 2] = 6$, $k^{4} = \max\{5, 5, 4\} = 5$. Hence, the oscillation period $d_1$ and index $k_1$ of the fuzzy tensor $A_{i_1 i_2 i_3}$ are: $d_1 = [d^{1}, d^{2}, d^{3}, d^{4}] = [2, 2, 2, 6] = 6$, $k_1 = \max\{k^{1}, k^{2}, k^{3}, k^{4}\} = \max\{5, 6, 4, 5\} = 6$.
By a similar analysis for $A_{i_1 i_2 i_4}$, $A_{i_1 i_3 i_4}$ and $A_{i_2 i_3 i_4}$ we obtain $d_2 = [2, 2, 2, 2] = 2$, $k_2 = \max\{4, 4, 6, 5\} = 6$; $d_3 = [2, 2, 2, 6] = 6$, $k_3 = \max\{6, 5, 6, 4\} = 6$; $d_4 = [2, 2, 2, 2] = 2$, $k_4 = \max\{5, 5, 5, 5\} = 5$. Based on Theorem 3, we get the oscillation period $D$ and index $K$ of the fuzzy tensor $A$: $D = [d_1, d_2, d_3, d_4] = [6, 2, 6, 2] = 6$, $K = \max\{k_1, k_2, k_3, k_4\} = \max\{6, 6, 6, 5\} = 6$.
This example verifies the feasibility and correctness of Algorithm 1.

2. Using minimal strong component for period of fuzzy tensor

In this section, by using graph theory tools, we give a method to find the oscillation period of a fuzzy tensor. When $m$ and $n$ are not large and the number of nonzero elements is less than half of the total number of fuzzy tensor elements, finding the oscillation period with the minimal strong component is simpler than with the Power Method and does not need much calculation. The following definition is from [11].
Let ΦA denote the set of all nonzero elements of fuzzy matrix A, for any λ ∈ ΦA ,
we call Aλ = (aλ )ij the cut matrix of A, where (aλ )ij = 1 if aij ≥ λ, else (aλ )ij = 0.
We follow [14,15,4] to show the period of Boolean matrix by strong components
and express the period of fuzzy matrix with minimal strong component. Furthermore, we
shall find the period of fuzzy tensor based on the minimal strong component.

Definition 4. (See [4], Definition 4.2) We call $S$ a strong component of a fuzzy matrix $A$ if there is a $\lambda \in \Phi_A$ such that $S$ is a strong component of the cut matrix $A_\lambda$.

If $D$ is the digraph of the fuzzy matrix $A$ and $S$ is a strong component, we let $d(S)$ denote the period of $S$, and let $\Omega$ denote the set of minimal strong components of the fuzzy matrix $A$.

Theorem 4. (See [4], Theorem 4.9). If A is an n × n fuzzy matrix, Ω = {s1 , s2 , · · · sw },


then d(A) = [d(si )], where si ∈ Ω.

According to the above discussion, we can develop the following algorithm for the
oscillation period of fuzzy tensor by minimal strong component.

Algorithm 2 (Minimal strong component method for finding the oscillation period of a fuzzy tensor).
Input: An order $m$ dimension $n$ fuzzy tensor $A = (a_{i_1 \cdots i_m})$.
Step 1. Establish the induced 3-order fuzzy tensor $A_{i_j i_k i_h}$ by Definition 3.
Table 2. Numerical data for Example 2


A(:, :, 1) A(:, :, 2) A(:, :, 3)
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
0.5 0.3 0.4 0 0 0 0 0.8 0 0 0 0.1 0.8 0 0.5 0
0 0
⎜ 0 0.3 0 0 0 ⎟ ⎜ 0.5 0 0 ⎟ ⎜ 0 0 0 0.4 0 0.5 ⎟
⎜ 0 ⎟ ⎜ 0 0.3 0 ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ 0.4 0 0 0.3 0 0 ⎟ ⎜ 0 0.7 0 0 0.5 0 ⎟ ⎜ 0.5 0 0 0 0.4 0 ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ 0 0 0 0.3 0 ⎟ ⎜ 0.3 0 0.2 ⎟ ⎜ 0 0.5 0 0 0 0.4 ⎟
⎜ 0 ⎟ ⎜ 0.3 0 0 ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ 0 0 0 0.4 0 0 ⎠ ⎝ 0 0 0 0.5 0 0 ⎠ ⎝ 0 0 0.4 0 0 0 ⎠
0 0 0 0 0.3 0 0 0 0.2 0 0 0 0 0 0 0 0 0

Figure 2. Digraphs of A(:, :, 1): (a) digraph of D0.5, (b) digraph of D0.4, (c) digraph of D0.3.

Step 2. Create $F_{i_j}, F_{i_k}, F_{i_h}$ from $A_{i_j ::}$, $A_{: i_k :}$ and $A_{:: i_h}$.
Step 3. Compute the periods of the fuzzy matrices of $F_{i_j}, F_{i_k}, F_{i_h}$ by the minimal strong component.
Step 4. According to Theorem 1, obtain the periods of $F_{i_j}, F_{i_k}, F_{i_h}$.
Step 5. Calculate the period of the induced 3-order fuzzy tensor $A_{i_j i_k i_h}$ by Theorem 2.
Step 6. Figure out the oscillation period of the fuzzy tensor $A$ by Theorem 3.
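The per-matrix building blocks of Algorithm 2 — cut-matrix digraphs and the period of a strongly connected component, taken as the gcd of its cycle lengths — can be sketched in Python with networkx as follows (the matrix in the usage example is hypothetical, not the data of Table 2, and the selection of minimal strong components per Definition 4 and Theorem 4 still has to be applied on top):

```python
import networkx as nx
from math import gcd
from collections import deque

def cut_digraph(A, lam):
    """Digraph of the cut matrix A_lambda: edge i -> j iff a_ij >= lambda."""
    n = len(A)
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    G.add_edges_from((i, j) for i in range(n) for j in range(n) if A[i][j] >= lam)
    return G

def component_period(G, nodes):
    """Period of a strongly connected component = gcd of its cycle lengths,
    computed from BFS level differences along the component's edges."""
    nodes = list(nodes)
    H = G.subgraph(nodes)
    level = {nodes[0]: 0}
    queue, d = deque([nodes[0]]), 0
    while queue:
        u = queue.popleft()
        for v in H.successors(u):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            else:
                d = gcd(d, level[u] + 1 - level[v])
    return d

# Illustrative 4-node fuzzy matrix (hypothetical values)
A = [[0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.0],
     [0.5, 0.0, 0.0, 0.3],
     [0.0, 0.0, 0.3, 0.0]]
for lam in (0.3, 0.5):
    G = cut_digraph(A, lam)
    for scc in nx.strongly_connected_components(G):
        if len(scc) > 1 or any(G.has_edge(v, v) for v in scc):
            print(lam, sorted(scc), component_period(G, scc))
```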
To illustrate that Algorithm 2 works for a fuzzy tensor, we test the following example.

Example 2. Let A be a third order six dimensional fuzzy tensor A = (A(:, :, 1), A(:, :
, 2), A(:, :, 3)) defined by Table 2.
For A(:, :, 1) (see Figure 2) we have λ1 = 0, λ2 = 0.3, λ3 = 0.4, λ4 = 0.5; then the digraphs Di (i = 1, 2, 3) can be represented as follows.
In D0.5 there is only one strong component S1 = {a1 }. In D0.4 there is one
strong component S2 = {a1 , a3 }. In D0.3 there are two strong components S3 =
{a1 , a2 , a3 }, S4 = {a4 , a5 }.
Notice that S4 is a strong component which has no common vertices with S1 , S2 , S3 .
Hence, we say that S4 is a newly appeared strong component. Moreover, we obtain that
the set of minimal strong components of fuzzy matrix A(:, :, 1) is Ω = {S1 , S4 }. Then
d(A(:, :, 1)) = [d(S1), d(S4)] = [1, 2] = 2.
Considering A(:, :, 2) and A(:, :, 3), we have d(A(:, :, 2)) = [2, 3] = 6 and d(A(:, :, 3)) = [1, 2, 2] = 2. Then d(A) = [d(A(:, :, 1)), d(A(:, :, 2)), d(A(:, :, 3))] = [2, 6, 2] = 6.
This example illustrates one great advantage of Algorithm 2: using only the directed graph of a sparse fuzzy matrix, it can find its oscillation period, with no need for troublesome calculations.
3. Conclusions

In this paper, we proposed the fuzzy tensor, which is a new class of nonnegative tensor and a higher-order form of the fuzzy matrix. We gave the definition of the induced third-order fuzzy tensor, which has the advantage of an intuitive geometric significance. Based on these concepts, we investigated the oscillation period and index of the fuzzy tensor with the help of the Power Method and the Minimal Strong Component, respectively. Our numerical results showed that the two methods are feasible and favourable. Hence, it is necessary to research many more properties of fuzzy tensors. In the future, we will continue to study all aspects of the fuzzy tensor.

Acknowledgements

The work of the first author was supported by Innovation Foundation of Guizhou Nor-
mal University for Graduate Students(201529, 201528), and the Shandong province Col-
lege’s Outstanding Young Teachers Domestic Visiting Scholar Program(2013). The work
of the second author was supported by the National Science Foundation of China (Grant
Nos.11261012).

References

[1] M.G.Thomason, Convergence of powers of a fuzzy matrix, Journal of Mathematical Analysis and Appli-
cations, 57(1977), 476–480.
[2] Z.T.Fan, D.F.Liu, On the oscillating power sequence of a fuzzy matrix, Fuzzy Sets and Systems, 93(1998),
75–85.
[3] J.X.Li, Periodicity of powers of fuzzy matrices, Fuzzy Sets and Systems, 48(1992), 365–369.
[4] W.B.Liu, Z.J.Ji, The periodicity of square fuzzy matrices based on minimal strong components, Fuzzy
Sets and Systems, 126(2002), 233–240.
[5] L.Q.Qi, Eigenvalues of a real supersymmetric tensor, Journal of Symbolic Computation, 40(2005), 1302–
1324.
[6] L.H.Lim, Singular values and eigenvalues of tensors: A variational approach, Proceeding of the IEEE
Internatinal Workshop on Computation advances in multi-tensor adaptive processing, 1(2005), 129–132.
[7] T.G.Kolda , B.W.Bader, Tensor decomposition and applications, SIAM Review, 51(2009), 455–500.
[8] W.Y. Ding, L.Q. Qi, Y.M. Wei, Fast Hankel tensor-vector product and its application to exponential data fitting, Linear Algebra and its Applications, 22(2015), 814–832.
[9] Y.Song, L.Q.Qi, Infinite and finite dimensional Hilbert tensors, Linear Algebra and its Applications,
451(2014), 1–14.
[10] C.Z.Luo, Introduction to fuzzy sets (Vol.1), Beijing Normal University Press,Beijing,(In Chinese), 1989.
[11] Z.T.Fan, D.F.Liu, On the power sequence a fuzzy matrix-Convergent power sequence, Journal of Com-
putational and Applied Mathematics, 4(1997), 147–165.
[12] L.A.Zadeh, Fuzzy sets, Information and Control, 8(1965), 338–353.
[13] S.G.Guu, Y.Y.Lur, C.T.Pang, On infinite products of fuzzy matrices, SIAM Journal on Matrix Analysis
and Applications, 22(2001), 1190–1203.
[14] B.De.Schutter, B.DE.Moor, On the sequence of consecutive powers of a matrix in a Boolean algebra,
SIAM Journal on Matrix Analysis and Applications, 21(1999), 328–354.
[15] K.H.Kim, Boolean matrix theory and application, Marcel Dekker, New York, 1982.
58 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-58

Toward a Fuzzy Minimum Cost Flow Problem for Damageable Items Transportation
Si-Chao LU1 and Xi-Fu WANG
School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China

Abstract. In this paper, we have proposed a mathematical formulation of fuzzy minimum cost flow problem for damageable items transportation. For the
imprecise model, capacity, cost, percentage of unit damage of each route have
been considered as triangular fuzzy numbers. This problem has been solved by
using the k-preference integration method, the area compensation method, and the
signed distance method. Finally, to show the validity of the proposed model, a
numerical example is provided and solved with Wolfram Mathematica 9.

Keywords. minimum cost flow, k-preference integration, area compensation, signed distance

Introduction

As a classic combinatorial problem, the minimum cost flow problem has a wide range
of applications and ramifications. In the logistics industry, it is common for decision
makers to generate a plan to optimally transport damageable items from multiple
sources to multiple destinations through transshipment stations. Furthermore,
impreciseness in defining parameters such as the cost per unit on a route is another commonly encountered problem in realistic environments. Therefore, this paper is devoted to solving this problem.
With respect to the fuzzy minimum cost flow problem [1], there exist a lot of
fruitful outcomes. In the fuzzy minimum cost flow problem proposed by Shih and Lee
[2], the cost parameter and capacity constraints are taken as fuzzy numbers. In addition,
they proposed a fuzzy multiple objective minimum cost flow problem and used
minimization of the total passing time as the second objective in an example. Ding proposed an α-minimum cost flow problem to deal with uncertain capacities [3]. However, few studies refer to the adaptation of this problem to damageable items transportation. A closely related problem is the multi-objective, multi-item intuitionistic fuzzy solid transportation problem for damageable items, which was proposed by Chakraborty et al. [4].
integration method, the area compensation method, and the signed distance method
respectively. Computations to solve the problem are done by using the Wolfram
Mathematica 9.

1
Corresponding Author. Si-Chao LU, School of Traffic and Transportation, Beijing Jiaotong
University, Beijing, China; E-mail: lusichao@163.com.
The remainder of the paper is organized as follows: The next section offers a brief
introduction to fuzzy numbers and three defuzzification methods. The mathematical
model of fuzzy minimum cost flow problem for damageable items is proposed in
Section 2. A simulated problem instance is given and solved in Section 3. Finally, the
paper is concluded in Section 4.

1. Fuzzy Preliminaries

1.1. Fuzzy Numbers

Definition 2.1. If $X$ is a universe of discourse, then a fuzzy number $\tilde{A}$ in $X$ is defined as:
$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in X\}, \qquad (1)$$
where $\mu_{\tilde{A}} : X \to [0,1]$ is a mapping called the membership function of $x \in X$ in $\tilde{A}$ [5].

Definition 2.2. A triangular fuzzy number (TFN) $\tilde{A}$ can be defined as $\tilde{A} = (a_1, a_2, a_3)$, which is shown in Figure 1. The membership function of $\tilde{A}$ is determined in Eq. (2) [5]:
$$\mu_{\tilde{A}}(x) =
\begin{cases}
0, & x \le a_1,\\
\dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x \le a_2,\\
\dfrac{a_3 - x}{a_3 - a_2}, & a_2 \le x \le a_3,\\
0, & a_3 \le x.
\end{cases} \qquad (2)$$

Figure 1. A triangular fuzzy number.

1.2. K-Preference Integration Representation Method

The k-preference integration method was introduced by Chen and Hsieh [6]. Based on this method, the k-preference integration representation of a general TFN $\tilde{A} = (a_1, a_2, a_3)$ is defined as:
$$P_k(\tilde{A}) = \frac{\int_0^1 h\left[kL^{-1}(h) + (1-k)R^{-1}(h)\right] dh}{\int_0^1 h\, dh} = \frac{1}{3}\left[k a_1 + 2 a_2 + (1-k) a_3\right]. \qquad (3)$$
From Eq.(3), it can be obviously seen that the k-preference integration is fairly
flexible compared with other defuzzification methods, because the value of k is
determined by the decision maker. It has been used to handle the fuzzy cold storage
problem [7] and the constrained knapsack problem in fuzzy environment [8].
If k = 0.5, then the result generated by the k-preference integration method will be the same as that obtained by the graded mean integration (GMI) method, which was introduced by Chen and Hsieh [9].

1.3. Area Compensation Method

Based on the area compensation method [10], the TFN $\tilde{A} = (a_1, a_2, a_3)$ can be defuzzified as:
$$\Phi_A(\tilde{A}) = \frac{\int_{a_1}^{a_2} x\,\mu_{\tilde{A}}(x)\,dx + \int_{a_2}^{a_3} x\,\mu_{\tilde{A}}(x)\,dx}{\int_{a_1}^{a_2} \mu_{\tilde{A}}(x)\,dx + \int_{a_2}^{a_3} \mu_{\tilde{A}}(x)\,dx} = \frac{(a_3 - a_1)(a_1 + a_2 + a_3)/6}{(a_3 - a_1)/2} = \frac{a_1 + a_2 + a_3}{3}. \qquad (4)$$

1.4. Signed Distance Method

The left and right α-cuts of the TFN $\tilde{A} = (a_1, a_2, a_3)$ are $L^{-1}(\alpha) = a_1 + (a_2 - a_1)\alpha$ and $R^{-1}(\alpha) = a_3 - (a_3 - a_2)\alpha$ [11]. Based on the ranking system for fuzzy numbers proposed by Yao and Wu [12], the signed distance of $\tilde{A}$ is defined as follows:
$$d(\tilde{A}, 0) = \frac{1}{2}\int_0^1 \left[L^{-1}(\alpha) + R^{-1}(\alpha)\right] d\alpha = \frac{1}{2}\int_0^1 \left[a_1 + (a_2 - a_1)\alpha + a_3 - (a_3 - a_2)\alpha\right] d\alpha = \frac{1}{4}(a_1 + 2 a_2 + a_3). \qquad (5)$$
Shekarian et al. [11] combined this method with existing economic production
quantity models to find optimal production quantities.
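The three defuzzification formulas (3)–(5) reduce a TFN to a crisp value with simple arithmetic; a minimal Python sketch (function names are illustrative):

```python
def k_preference(a1, a2, a3, k):
    # Eq. (3): k-preference integration representation of a TFN
    return (k * a1 + 2 * a2 + (1 - k) * a3) / 3.0

def area_compensation(a1, a2, a3):
    # Eq. (4): area compensation (centroid) defuzzification
    return (a1 + a2 + a3) / 3.0

def signed_distance(a1, a2, a3):
    # Eq. (5): signed distance of the TFN from 0
    return (a1 + 2 * a2 + a3) / 4.0

# Example: the three crisp values of the TFN (2, 4, 7)
print(k_preference(2, 4, 7, 0.5), area_compensation(2, 4, 7), signed_distance(2, 4, 7))
```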

2. Mathematical Formulation

The fuzzy minimum cost flow problem for damageable items transportation blends the
fuzzy set theory and the minimum cost flow problem. The objective of the proposed
problem is to minimize the total cost of sending the available supply through
transshipment nodes to satisfy the demand. It is also necessary to introduce constraints
that guarantee the feasibility of flows.
Let G=(N, A) be a directed network with node set N={1,2,3,…,n} and arc set A.
Each arc $a_{ij} \in A$ stands for a route and has a positive upper-bound capacity $\tilde{u}_{ij}$ and a positive cost $\tilde{c}_{ij}$. Both $\tilde{u}_{ij} = (u^l_{ij}, u_{ij}, u^r_{ij})$ and $\tilde{c}_{ij} = (c^l_{ij}, c_{ij}, c^r_{ij})$ are taken as triangular fuzzy numbers, because some vehicles may provide a small degree of leeway in capacity [5] and the transportation cost of each route tends to vary. Each node $i \in N$ has a $b_i$, which represents the nature of node $i$: if node $i$ is a supply node then $b_i > 0$, if node $i$ is a demand node then $b_i < 0$, and if node $i$ is a transshipment node then $b_i = 0$. We use a TFN $\tilde{\alpha}_{ij} = (\alpha^l_{ij}, \alpha_{ij}, \alpha^r_{ij})$ to denote the percentage of unit damage of products on the route $a_{ij}$ due to
physical vibration caused by bad road condition or improper driving behaviors etc. xij is
the decision variable which denotes the flow quantity through route aij .
Based on the above descriptions, the mathematical formulation can be developed
as follows.
$$\min Z = \sum_{i=1}^{n}\sum_{j=1}^{n} \tilde{c}_{ij}\, x_{ij} \qquad (6)$$
s.t.
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} (1 - \tilde{\alpha}_{ji})\, x_{ji} = b_i, \quad \forall i \in \{i \mid b_i \ge 0\} \qquad (7)$$
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} (1 - \tilde{\alpha}_{ji})\, x_{ji} \le b_i, \quad \forall i \in \{i \mid b_i < 0\} \qquad (8)$$
$$0 \le x_{ij} \le \tilde{u}_{ij}, \quad \forall i, \forall j \qquad (9)$$
$$\sum_{i=1}^{n} b_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \tilde{\alpha}_{ij}\, x_{ij} \ge 0 \qquad (10)$$
Here (6) indicates the cost minimization objective function. Constraint (7) and
constraint (8) represent the net flow of node i under two different situations
respectively. In addition, constraint (8) implies that demand nodes can be satisfied with
excessive items. Constraint (9) ensures that the total amount of transported damageable items is less than or equal to the capacity of route $a_{ij}$. Constraint (10) guarantees that the
total amount of items provided by the supply nodes is no less than the amount of
damaged items plus the total amount of items required by the demand nodes.
Based on the k-preference integration method, Eq. (6)–Eq. (10) can be redefined as follows, where $k_c$, $k_\alpha$, $k_u$ can be determined differently under the decision maker's preference:
$$\min Z = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{k_c c^l_{ij} + 2 c_{ij} + (1 - k_c) c^r_{ij}}{3}\, x_{ij} \qquad (11)$$
s.t.
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left\{1 - \frac{k_\alpha \alpha^l_{ji} + 2\alpha_{ji} + (1 - k_\alpha)\alpha^r_{ji}}{3}\right\} x_{ji} = b_i, \quad \forall i \in \{i \mid b_i \ge 0\} \qquad (12)$$
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left\{1 - \frac{k_\alpha \alpha^l_{ji} + 2\alpha_{ji} + (1 - k_\alpha)\alpha^r_{ji}}{3}\right\} x_{ji} \le b_i, \quad \forall i \in \{i \mid b_i < 0\} \qquad (13)$$
$$0 \le x_{ij} \le \frac{k_u u^l_{ij} + 2 u_{ij} + (1 - k_u) u^r_{ij}}{3}, \quad \forall i, \forall j \qquad (14)$$
$$\sum_{i=1}^{n} b_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{k_\alpha \alpha^l_{ij} + 2\alpha_{ij} + (1 - k_\alpha)\alpha^r_{ij}}{3}\, x_{ij} \ge 0 \qquad (15)$$
Applying the area compensation method, Eq. (6)–Eq. (10) can be written in the following form:
$$\min Z = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{c^l_{ij} + c_{ij} + c^r_{ij}}{3}\, x_{ij} \qquad (16)$$
s.t.
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left[1 - \frac{\alpha^l_{ji} + \alpha_{ji} + \alpha^r_{ji}}{3}\right] x_{ji} = b_i, \quad \forall i \in \{i \mid b_i \ge 0\} \qquad (17)$$
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left[1 - \frac{\alpha^l_{ji} + \alpha_{ji} + \alpha^r_{ji}}{3}\right] x_{ji} \le b_i, \quad \forall i \in \{i \mid b_i < 0\} \qquad (18)$$
$$0 \le x_{ij} \le \frac{u^l_{ij} + u_{ij} + u^r_{ij}}{3}, \quad \forall i, \forall j \qquad (19)$$
$$\sum_{i=1}^{n} b_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\alpha^l_{ij} + \alpha_{ij} + \alpha^r_{ij}}{3}\, x_{ij} \ge 0 \qquad (20)$$
Similarly, with the help of the signed distance method, Eq. (6)–Eq. (10) can be expressed as:
$$\min Z = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{c^l_{ij} + 2 c_{ij} + c^r_{ij}}{4}\, x_{ij} \qquad (21)$$
s.t.
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left[1 - \frac{\alpha^l_{ji} + 2\alpha_{ji} + \alpha^r_{ji}}{4}\right] x_{ji} = b_i, \quad \forall i \in \{i \mid b_i \ge 0\} \qquad (22)$$
$$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \left[1 - \frac{\alpha^l_{ji} + 2\alpha_{ji} + \alpha^r_{ji}}{4}\right] x_{ji} \le b_i, \quad \forall i \in \{i \mid b_i < 0\} \qquad (23)$$
$$0 \le x_{ij} \le \frac{u^l_{ij} + 2 u_{ij} + u^r_{ij}}{4}, \quad \forall i, \forall j \qquad (24)$$
$$\sum_{i=1}^{n} b_i - \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\alpha^l_{ij} + 2\alpha_{ij} + \alpha^r_{ij}}{4}\, x_{ij} \ge 0 \qquad (25)$$
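Once defuzzified, model (11)–(15) is an ordinary linear program. The following Python sketch solves a tiny hypothetical instance (the network and all numbers are illustrative, not the data of Figure 2) with scipy.optimize.linprog, rewriting the demand constraint (13) and the damage constraint (15) in "≤" form:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical crisp instance after defuzzification.
# Arcs: 0 = S->T, 1 = T->D, 2 = S->D (data are illustrative only).
cost  = np.array([2.0, 3.0, 6.0])     # defuzzified costs c_ij
cap   = np.array([60.0, 60.0, 20.0])  # defuzzified capacities u_ij
alpha = np.array([0.05, 0.04, 0.02])  # defuzzified damage percentages
b_S, b_T, b_D = 50.0, 0.0, -45.0

# Flow-balance equalities for supply/transshipment nodes, Eq. (12)
A_eq = [[1.0, 0.0, 1.0],               # node S: x_ST + x_SD = 50
        [-(1 - alpha[0]), 1.0, 0.0]]   # node T: x_TD - (1 - a_ST) x_ST = 0
b_eq = [b_S, b_T]

# Demand node inequality, Eq. (13), and damage constraint, Eq. (15), as "<="
A_ub = [[0.0, -(1 - alpha[1]), -(1 - alpha[2])],  # node D
        list(alpha)]                               # sum alpha_ij x_ij <= sum b_i
b_ub = [b_D, b_S + b_T + b_D]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=list(zip([0.0] * 3, cap)))
print(res.x, res.fun)   # optimal flows and minimum total cost
```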

3. Numerical Experiment

The case in this section is adapted from an example in [1], which copes with the crisp
model of the minimum cost flow problem. Assume 60 units and 40 units of damageable
items are supplied by node A and node B, whereas no less than 30 units and 60 units of
damageable items are required by node D and node E respectively. Node C is a
transshipment node. Capacities and costs of the routes cannot be determined precisely
in advance. If the route aij has no specified capacity, then uij can be regarded as a large
number and hence be ignored in the mathematical model. Critical parameters of this
problem instance are shown in Figure 2.
Given that this problem is small-scale and hence can be solved by exact algorithms,
we use the Wolfram Mathematica 9 to generate optimal solutions. The imprecise
parameters are defuzzified using three methods, which are the k-preference integration
method, the area compensation method, and the signed distance method. To simplify
the problem, we let k=kc =kα =ku . Mathematical formulations and results by using the
GMI method and the area compensation method are shown in Figure 3 and Figure 4.
Computational results are shown in Table 1.

Figure 2. Network representation of a fuzzy minimum cost flow problem for damageable items
transportation.

Figure 3. Mathematical formulation and results by using the GMI method with Mathematica.

Figure 4. Mathematical formulation and results by using the area compensation method with Mathematica.

From Table 1, it can be clearly seen that the GMI method, the area compensation method, and the signed distance method generated similar results. Furthermore, a decreased cost is obtained when k is increased, which supports the correctness of the defuzzification by the k-preference integration method.
Table 1. Solutions obtained using the k-preference integration method, the signed distance method, and the area compensation method

Variable | k=0    | k=0.2  | k=0.5 (GMI) | k=0.8  | k=1    | Signed Distance | Area Compensation
xAB      | 0.00   | 0.00   | 0.00        | 0.00   | 0.00   | 0.00            | 0.00
xAC      | 40.50  | 40.11  | 39.52       | 38.93  | 38.53  | 39.28           | 39.03
xAD      | 19.50  | 19.89  | 20.48       | 21.07  | 21.47  | 20.72           | 20.97
xBC      | 40.00  | 40.00  | 40.00       | 40.00  | 40.00  | 40.00           | 40.00
xCE      | 80.50  | 80.11  | 79.52       | 78.93  | 78.53  | 79.28           | 79.03
xDE      | 0.00   | 0.00   | 0.00        | 0.00   | 0.00   | 0.00            | 0.00
xED      | 11.22  | 10.92  | 10.48       | 10.04  | 9.75   | 10.21           | 9.94
Z        | 573.17 | 569.32 | 563.29      | 557.28 | 553.20 | 564.35          | 563.96

4. Conclusion

In this paper, we have presented a minimum cost flow problem for damageable items
transportation in imprecise environment. After defuzzifying the fuzzy parameters with
k-preference integration method, area compensation method, and the signed distance
method, the optimal flow can be obtained with Wolfram Mathematica.
There are three major avenues for future work. First, more defuzzification methods, such as the credibility measure method [8] or the use of a violation tolerance level [13], could be applied and the results compared in a further step. Second, more objective functions could be added and more item properties could be considered. Finally, given that Das et al. successfully solved a multi-objective solid transportation problem with type-2 fuzzy variables [14], some parameters in this model could also be taken as type-2 fuzzy numbers to better describe the problem and defuzzified to generate optimal solutions.

References

[1] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research (Ninth Edition), McGraw-Hill,
New York, 2010.
[2] H. S. Shih and E. S. Lee, Fuzzy multi-level minimum cost flow problems, Fuzzy Sets & Systems,
107(1999), 159-176.
[3] S. Ding, Uncertain minimum cost flow problem, Soft Computing, 18 (2014), 2201-2207.
[4] D. Chakraborty, D. K. Jana, T. K. Roy, Expected value of intuitionistic fuzzy number and its application
to solve multi-objective multi-item solid transportation problem for damageable items in intuitionistic
fuzzy environment, Journal of Intelligent & Fuzzy Systems, 30 (2016), 1109-1122.
[5] H. J. Zimmermann, Fuzzy Set Theory and Its Applications, Fourth Edition, Kluwer Academic Publishers, Norwell, 2001.
[6] S. H. Chen and C. H. Hsieh, A new method of representing generalized fuzzy number, Tamsui Oxford
Journal of Management Sciences, 13-14 (1998), 133-143.
[7] S. Lu and X. Wang, Modeling the Fuzzy Cold Storage Problem and Its Solution by a Discrete Firefly
Algorithm, Journal of Intelligent and Fuzzy Systems, 31(2016), 2431-2440.
[8] C. Changdar, G. S. Mahapatra, and R.K. Pal, An improved genetic algorithm based approach to solve
constrained knapsack problem in fuzzy environment, Expert Systems with Applications 42 (2015),
2276-2286.
[9] S. H. Chen and C. C. Wang, Representation, ranking, distance, and similarity of fuzzy numbers with step
form membership function using k-preference integration method, Joint 9th. IFSA World Congress and
20th NAFIPS International Conference, 2 (2001). IEEE, 801-806.
[10] S. K. De and I. Beg, Triangular dense fuzzy sets and new defuzzification methods, Journal of
Intelligent and Fuzzy Systems, 31(1) (2016), 469-477.
[11] E. Shekarian, C. H. Glock, S.M.P. Amiri, K. Schwindl, Optimal manufacturing lot size for a single-
stage production system with rework in a fuzzy environment, Journal of Intelligent and Fuzzy Systems
27 (2014), 3067-3080.
[12] J. S. Yao, K. Wu, Ranking fuzzy numbers based on decomposition principle and signed distance, Fuzzy
Sets and Systems, 116 (2000), 275-288.
[13] J. Brito, F. J. Martinez, J. A. Moreno, J. L. Verdegay, Fuzzy optimization for distribution of frozen
food with imprecise times, Fuzzy Optimization and Decision Making, 11 (2012), 337-349.
[14] A. Das, U. K. Bera, M. Maiti. Defuzzification of trapezoidal type-2 fuzzy variables and its application
to solid transportation problem, Journal of Intelligent and Fuzzy Systems, 30 (2016), 2431-2445.
Fuzzy Systems and Data Mining II 65
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-65

Research on the Application of Data Mining in the Field of Electronic Commerce
Xia SONG 1 and Fang HUANG
Shandong Agricultural Engineering Institute, Jinan, Shandong, China

Abstract. E-Commerce is a business mode based on internet and information technology. Data mining techniques are widely used in E-Commerce for digging
out patterns and retrieving information from large scale noisy datasets. The
booming of E-Commerce enables businesses to collect large amount of data which
could be analyzed for enhancing revenues. The abundant data collected online is
the foundation of big data analysis. How to employ data mining models on
strategizing and making business decisions is an important topic in recent years.
This paper talks about data mining and its application in E-Commerce. An E-Commerce system developed on the basis of data mining technology strengthens the ability to analyze business information: it derives the intrinsic relationships within the data, extracts useful information, provides business managers with the information they expect, and ensures the effective operation of the E-Commerce business. Data mining techniques could be
used for automated data analysis, pattern identification, information retrieving,
business strategizing as well as providing personalized services.

Keywords. E-Commerce, big data, data mining, case

Introduction

Electronic commerce is a new commerce mode in the field of business. It refers to the use of digital electronic technology to carry out business activities, with the Internet as its main carrier and information technology as its core. Electronic commerce brings new opportunities and challenges to businesses and individuals, promotes the networking of the traditional business model, changes the business activities of enterprises and the consumption patterns of individuals, and makes business activities digital and intelligent.
The development of electronic commerce has led enterprises to collect large amounts of data internally, and there is an urgent need to turn these data into useful information and knowledge so that enterprises can create more potential profit. The massive data accessible on the Internet give data mining a rich data base. Data mining technology can effectively help enterprises analyze data in a highly automated way, perform inductive reasoning, discover hidden regularities and extract effective information, guiding enterprises to adjust their marketing strategies and make the right business decisions while, at the same time, providing dynamic, personalized and efficient services for customers and improving the core competitiveness of enterprises.

1 Corresponding Author: Xia SONG, Shandong Agricultural Engineering Institute, Jinan, Shandong, China; E-mail: 643549139@qq.com

1. E-commerce and Data Mining

E-Commerce is a business mode based on internet and information technology. It shifts


the traditional business mode and individual’s consumption patterns as more trades and
deals are carried out online. The booming of E-Commerce enables businesses to collect
large amount of data which could be analyzed for enhancing revenues. The abundant
data collected online is the foundation of big data analysis. Data mining techniques
could be used for automated data analysis, pattern identification, information retrieving,
business strategizing as well as providing personalized services. E-Commerce businesses could develop online business systems which use data mining techniques to analyze online business data, identify correlations within the data and make predictions about the market.

1.1. E-commerce

Electronic commerce refers to individuals or enterprises using the Internet as the carrier to exchange business data and carry out business activities in digital electronic form [1]. E-Commerce attracts users by its low cost, convenience, high reliability and freedom from time and space constraints. There are many kinds of E-Commerce activities in China nowadays, which include online advertising, electronic business note exchange, online shopping and online payment, as well as the B2B, B2C and C2C business modes.
With the rapid development of network technology and database technology, electronic commerce shows stronger and stronger vitality and the amount of online transactions rises year by year, but the development of electronic commerce has also brought many new problems to traditional enterprises. As enterprises move into electronic commerce, e-commerce platforms and a large number of shopping websites have emerged, carrying all kinds of business information; these "big data" hold huge commercial value. However, in the face of such a huge amount of structurally diverse information of different types, how should enterprises organize and utilize it to obtain the information that is valuable to them or related to their own needs? The application of data mining technology in electronic business has become an inevitable choice. Data mining technology extracts potentially unknown and useful data from noisy, disorderly data, and gives logical reasoning and visual interpretation, helping business decision-makers grasp market dynamics in a timely manner and make reasonable decisions in real time.
An E-Commerce system developed on the basis of data mining technology strengthens the ability to analyze business information: it derives the intrinsic relationships within the data, extracts useful information, provides business managers with the information they expect, and ensures the effective operation of the E-Commerce business. Many large electronic business enterprises (such as Taobao, Jingdong Mall, etc.) provide a variety of data mining tools for managers to use in order to increase sales, and these tools are also very helpful for customer relationship management.
1.2. Data Mining

Data mining (DM), also known as knowledge discovery in databases (KDD), is the process of extracting implicit, previously unknown but potentially useful information and knowledge from a large amount of incomplete, noisy, fuzzy and random data [2]. Data mining is a cross-discipline that gathers knowledge from multiple fields, including database technology, artificial intelligence, machine learning, data visualization, pattern recognition and parallel computing.
As a new business information processing technology, data mining extracts, converts, analyzes and models the large amount of business data in an enterprise database according to the enterprise's established business objectives, extracting the key data that are helpful for business decisions and revealing hidden, unknown regularities, or validating known ones, through advanced and effective models.
In E-Commerce data mining, Web mining uses data mining technology to automatically discover and extract interesting and useful patterns and information from WWW resources (Web documents) and behavior (Web services) [3]. Web data are of three types: the HTML-marked data of Web pages, the link-structure data among Web documents, and user access data. According to the corresponding data type, Web mining can be divided into three categories: Web content mining, which selects knowledge from Web documents or their descriptions; Web structure mining, which derives knowledge from the organizational structure and links of the Web and whose purpose is, through clustering and analysis of Web links and page structure, to find useful patterns and authoritative pages; and Web usage mining, which mines the access logs stored on the Web to discover the access patterns of users and potential customers and other such information.

2. Data Mining Methodologies in E-Commerce

2.1. Correlation Analysis

Correlation analysis digs out hidden correlations within the dataset. For example, it could analyze the correlation of different items in one online purchase: if the customer buys an item A, then the model could predict the probability that the customer buys item B based on the correlation of A and B. The Apriori algorithm is the most commonly used method for correlation analysis [4].
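As a small illustration of the support/confidence reasoning behind such association rules, the following Python sketch counts item pairs over made-up transactions (it is not a full Apriori implementation):

```python
from itertools import combinations
from collections import Counter

transactions = [{"bread", "butter", "milk"}, {"bread", "butter"},
                {"bread", "milk"}, {"butter", "milk"}, {"bread", "butter", "milk"}]

pair_counts, item_counts = Counter(), Counter()
for t in transactions:
    item_counts.update(t)
    pair_counts.update(combinations(sorted(t), 2))

n = len(transactions)
for (a, b), c in pair_counts.items():
    support = c / n
    confidence = c / item_counts[a]          # confidence of the rule a -> b
    if support >= 0.4 and confidence >= 0.6:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```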

2.2. Cluster Analysis

Cluster analysis is a technique that clusters objects into different groups. It could be used to cluster customers with similar interests or items with common characteristics.
The most widely used clustering algorithms include: hierarchical clustering, centroid-
based clustering, distribution-based clustering and density-based clustering [5].
Cluster analysis is commonly used in E-Commerce for sub-dividing client groups.
The algorithm could cluster clients into different subgroups by analyzing the
similarities of their consumption patterns. The business owner could then make
different strategies and provide personalized services for different target groups.
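A minimal Python sketch of such customer segmentation with k-means (the features, cluster count and data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative customer features: [orders per month, average order value]
X = np.array([[1, 20], [2, 25], [1, 18],      # occasional, low-value buyers
              [8, 30], [9, 28],                # frequent, mid-value buyers
              [3, 200], [4, 220]])             # infrequent, high-value buyers

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for cluster in range(3):
    print(cluster, X[labels == cluster].mean(axis=0))  # average profile of each segment
```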
68 X. Song and F. Huang / Research on the Application of DM in the Field of Electronic Commerce

2.3. Data Categorization

Data categorization is the process of classifying items by analyzing certain of their properties [6]. It learns optimal categorization rules from training data and uses these rules to categorize data outside the training set. The most popular categorization algorithms include the genetic algorithm, Bayesian classification and neural networks.
The goal of data categorization is to assign an item to a specific class. It can be used both for analyzing existing data and for making predictions. The algorithm builds a classification model from existing training data and uses the model to predict the likely reactions of customers with different characteristics, so that the business can offer personalized service to each category of customers.
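The following sketch, given purely as an illustrative assumption rather than the paper's method, trains a Gaussian naive Bayes classifier (one of the Bayesian classification approaches mentioned above) on made-up customer features and predicts a reaction class for a new customer.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical training data: [age, monthly visits, average spend]
X_train = np.array([
    [22, 10, 30.0], [25, 12, 45.0], [35, 2, 80.0],
    [45, 1, 60.0], [30, 8, 25.0], [50, 3, 200.0],
])
# Assumed labels: 1 = responds to promotion, 0 = does not respond.
y_train = np.array([1, 1, 0, 0, 1, 0])

model = GaussianNB().fit(X_train, y_train)

new_customer = np.array([[28, 9, 35.0]])
print("predicted reaction:", model.predict(new_customer)[0])
print("class probabilities:", model.predict_proba(new_customer).round(2))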

2.4. Serial Pattern Analysis

Like correlation analysis, serial pattern analysis identifies correlations between different items, but it focuses on time-series data and makes predictions based on time-series models. For example, it may discover that, within a certain time period, the purchase pattern of buying A, then B, then C occurs with high frequency [7]. Such high-frequency bundles can be dug out by analyzing the purchasing data.
Serial pattern analysis enables the business to predict customers' inquiry patterns and then push advertisements and services that are likely to meet customer demand.
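As a minimal, assumed example of this kind of analysis (not the paper's algorithm), the sketch below counts how often the ordered purchase pattern A then B then C appears as a subsequence of hypothetical per-customer purchase histories.

def contains_sequence(history, pattern):
    """Return True if `pattern` occurs in `history` as an ordered subsequence."""
    it = iter(history)
    return all(item in it for item in pattern)

# Hypothetical purchase histories, ordered by time, one list per customer.
histories = [
    ["A", "D", "B", "C"],
    ["A", "B", "E", "C"],
    ["B", "A", "C"],
    ["A", "B", "C", "A"],
]

pattern = ["A", "B", "C"]
support = sum(contains_sequence(h, pattern) for h in histories) / len(histories)
print(f"support of A -> B -> C: {support:.2f}")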

3. Application of Data Mining in E-Commerce

Data mining is a powerful tool that provides informed guidance in the decision-making process of E-Commerce. It seeks patterns in the sea of unorganized internet traffic and discovers valuable information to support decision making and strategy development.
Data mining is widely used in product positioning and purchasing-behavior analysis to formulate marketing strategy. It can also be applied to forecast the sales market by analyzing purchasing patterns. Currently, the major data companies have started to embed data mining functions into their own products; giants such as IBM and Microsoft, for example, incorporate online analysis functions into their corresponding products. By mining customer information, including customers' visit behavior, visit content and visit frequency, an E-Commerce recommendation system based on data mining can analyze customer features and infer their visiting patterns in order to offer tailor-made services and product recommendations catering to customer needs.
Data mining techniques can discover correlations among products by analyzing the portfolios in shopping carts and relating them to customers' purchasing behaviors, thereby generating marketing strategies for commodity display, bundled sales and promotion. The major task of correlation analysis is to dig out hidden correlations within the dataset. One example is that the purchase of bread and butter implies the purchase of milk: over 90% of customers who buy bread and butter will purchase milk as well. The business owner can design better item bundles by analyzing the correlations among different goods.

A sales manager at Wal-Mart found a surprising fact: beer and nappies, two apparently unrelated products, were frequently purchased together [8]. The phenomenon was traced to young fathers who tend to pick up beer when they are sent to the supermarket to buy nappies. This motivated the store to move the beer aisle closer to the nappies, which instantly increased the sales of both.
In E-Commerce, by discovering similar association rules through data mining, online vendors can recommend commodities to customers based on the products already in their shopping carts, thus enhancing cross-selling. Furthermore, by offering personalized commodity information and advertisements, customer interest and loyalty can also be expected to increase.

4. Application Cases

Data mining techniques are used in E-Commerce to analyze online inquiries, online trades and registration information. The process usually takes steps such as defining the business scope, data collection, data preprocessing, model construction and evaluation, and output analysis and evaluation [9]. These steps are usually repeated and iterated to obtain more accurate results.
Data mining is playing an increasingly important role in E-Commerce, and there are successful cases applying data mining theory and technology to E-Commerce [10]. This section discusses the application of data mining to customer segmentation on Taobao.com. Purchase behavior and sales behavior coexist on the Taobao platform. Experts suggest using the following 15 key factors and weights for classifying customers and predicting their behaviors, as shown in Table 1.

Table 1. Purchase behavior and sales behavior influence factors and weights

Purchase Behavior (69%):
Voluntary phone inquiry or onsite help  11.2
Shows interest in product and inquires about promotion  10.3
Budget for web promotion  8.5
Has hired or is in the process of hiring trade specialists  8.1
Used to e-commerce  7.5
Responds to EFAX/EDM/phone promotion  6.5
Participated in Alibaba conferences such as marketing, training and business development  6.1
Experience with third-party B2B web platforms  5.4
Experience with overseas trade exhibitions or domestic export exhibitions  5.3

Sales Behavior (31%):
Attempts to sign a sales contract in the coming month  8.1
In direct competition with competitors  7.5
Presence of director, manager or colleague in sales process  5.4
Made proposals to clients  4.4
Client visit within one year  2.9
Open house within one year  2.6

Formula: Client score S = Σ(Influence Factor × Weight). From the clustering results, four tiers of clients are obtained: (1) S ≥ 50: 90% potential client; (2) 23 ≤ S < 50: 50% potential client; (3) 11 ≤ S < 23: 25% potential client; (4) 0 ≤ S < 11: first-time visit client.
By segmenting current customers and studying their responses to existing
marketing and promotion strategies, companies could design more targeted strategies
on how to communicate to each segment of customers.

5. Conclusion

E-Commerce is developing rapidly and generating enormous amounts of data to analyze. Data mining enables businesses to predict market trends and customer behavior; it also helps to provide personalized services and push personalized advertisements. Businesses can increase revenue by forming better strategies with the help of data mining analysis. Data mining in E-Commerce will enjoy further development with progress in hardware technology and algorithm research and the accumulation of application experience.

References

[1] S. Z. Zhang, X. K. Qu, L. Zhang, Research on the Web data mining based on Electronic-Commerce, Modern Computer, 03(2015), 12–17.
[2] H. M. Wu, Sales data mining technology and e-commerce application research, Guangdong University of Technology, 2014.
[3] Y. N. Zhang, Application of web data mining in e-commerce, Fujian Computer, 05(2013), 138–140.
[4] J. X. Wu, Research on web data mining and its application in E-Commerce, Information System Engineering, 01(2010), 15–18.
[5] X. J. Chen, Research on data mining in electronic commerce, Information and Computer, 05(2014), 135.
[6] H. Y. Lu, Application of data mining techniques in e-commerce, Network and Information Engineering, (2014), 73–75.
[7] L. Huang, Research on the application of Web data mining in e-commerce, Hunan University, 2014.
[8] Y. Gao, Beer and diapers, Tsinghua University Press, 2008.
[9] S. Liu, Application of Web data mining technology for e-commerce analysis, Electronic Technology and Software Engineering, 07(2014), 216–217.
[10] China statistics web, Application of data mining in e-commerce, http://www.itongji.cn/datamining/hangye/dianzishangwuzhongshujuwajuefangfadeyingyong/, 2010.
Fuzzy Systems and Data Mining II 71
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-71

A Fuzzy MEBN Ontology Language Based


on OWL2
Zhi-Yun ZHENG, Zhuo-Yun LIU, Lun LI, Dun LI1 and Zhen-Fei WANG
School of Information Engineering, Zhengzhou University,
Zhengzhou 450001, China

Abstract. With the rapid development of Semantic Web research, the demand for representing and reasoning with uncertain information is increasing. Although ontologies are capable of modeling the semantics and knowledge in knowledge-based systems, classical ontology languages are not appropriate for dealing with the uncertainty in knowledge that is inherent in most real-world application domains. In this paper, we address this issue by extending the expressive power of current ontology languages: we propose a Fuzzy Multi-Entity Bayesian Networks ontology language that extends PR-OWL based on the combination of Fuzzy MEBN and ontology, define and study its syntax and semantics, and show how domain knowledge is represented by RDF graphs. The proposed language, Fuzzy PR-OWL, moves beyond the current limitation of PR-OWL in modeling knowledge with fuzzy semantics or fuzzy relations. By providing a principled means of uncertainty representation and reasoning, Fuzzy PR-OWL can serve many applications involving fuzzy and probabilistic knowledge.

Keywords. the Semantic Web, Fuzzy MEBN, ontology language, PR-OWL

Introduction

With the rapid development of information technology, the techniques of data collection, data storage and high-performance computing have improved significantly. According to some recent surveys, the amount of data around the world doubles every 20 months. The mountainous amounts and various types of data complicate data relations. To enable computers to automatically process and integrate valuable data from the Internet, the Semantic Web, which aims at seamless interoperability and information exchange among web applications and at rapid, accurate identification and invocation of appropriate web services [1], was put forward.
Nevertheless, several immature aspects of this area need further improvement. Specifically, as semantic services become more ambitious, there is increasing demand for principled approaches to formal representation under uncertainty, including incompleteness, randomness, vagueness, ambiguity and inconsistency [2]. All of these require reasonable semantic expression and enhanced semantic inference, but existing theories and practices do not yet solve these problems well.

1
Corresponding Author. Dun LI, School of Information Engineering, Zhengzhou University,
Zhengzhou 450001, China ; E-mail: ielidun@zzu.edu.cn; iedli@zzu.edu.cn.

Multi-Entity Bayesian Networks (MEBN) [3] is a theoretically rich language that expressively handles semantic analysis and effectively models uncertainty. Although it is practically useful in many respects, MEBN lacks the capability of modeling fuzzy knowledge and concepts. To address this problem, Fuzzy MEBN (Fuzzy Multi-Entity Bayesian Networks) [4-5] has been proposed in recent years; it is able to deal with ambiguous semantics and uncertain causal relationships between knowledge entities [5]. In this paper, we present an ontology-based Fuzzy MEBN solution termed Fuzzy PR-OWL (Fuzzy Probabilistic Web Ontology Language), an extension of OWL2 [6]. This is an attempt to model both probabilistic and fuzzy information with an ontology.
The rest of this paper is organized as follows. Section 1 comparatively analyzes BN and MEBN and illustrates the advantages of MEBN as well as Fuzzy MEBN. The Fuzzy MEBN ontology (Fuzzy PR-OWL) is described in Section 2. Section 3 presents the representation of a domain ontology using Fuzzy PR-OWL in RDF graph form. Finally, we draw some conclusions and outline future work in Section 4.

1. Related Research

1.1. BN and MEBN

The main models currently used for uncertainty representation and reasoning on the Semantic Web are probabilistic and Dempster-Shafer models, and fuzzy and possibilistic models [1]. The representative probabilistic models are BN and MEBN, on which the ontology languages BayesOWL [7] and PR-OWL2 [8-9] are based.
BN can deal with uncertain and probabilistic events and incomplete data sets according to causality or other types of relationships among events. However, standard BNs are limited in representing relational information. Figure 1a shows a BN that represents probabilistic knowledge about bronchitis: smoking may cause bronchitis, and colds, which may be incurred by factors such as bad weather, can also lead to airway inflammation. The BN clearly shows the causation of the patient's illness, but it cannot represent relational information such as the effect on the patient of harmful gas produced by other people's smoking. MEBN, in contrast, takes advantage of first-order logic, which allows it to overcome this limitation of BN. In Figure 1b, where ovals represent resident nodes, trapezoids represent input nodes and pentagons represent context nodes, person and other are entities of the class Person, and the context rule other=peopleAround(person), which may link to another MFrag, defines other as the people around the person. MEBN can thus represent the relationship between entities and take the effect of others' smoking on the probability of the patient having bronchitis into account via the parent node getCold(other).
In reality, however, human experience and knowledge are characterized by fuzziness that cannot be handled by MEBN. In the example above, the impact of a slight cold must differ from that of a bad cold. Although MEBN can represent the possibility of getting a cold, for instance getCold{true 1, false 0} where 1 and 0 are probabilities, it cannot represent the degree of the cold. Another situation concerns the states of a resident node. For example, suppose the weather has two states {sunny, cloudy}. MEBN assigns probabilities to these states, say {sunny 0.5, cloudy 0.5}, but situations like partly cloudy cannot be handled by MEBN.

Figure 1. Representing bronchitis knowledge in BN and MEBN

1.2. Fuzzy MEBN

Fuzzy MEBN redefines the semantic specification of normal MEBN by incorporating concepts of First-Order Fuzzy Logic (FOFL) [10]. The contextual constraints of MEBN are thereby generalized so as to represent the ambiguity that usually accompanies imperfect semantic information. Moreover, Fuzzy MEBN upgrades the regular BN of MEBN to Fuzzy Bayesian Networks (FBN). Therefore, the fuzzy or ambiguous information of Section 1.1 that MEBN cannot process can be dealt with by Fuzzy MEBN. For example, a slight cold can be represented as {true_0.3 1, false_0.7 0}, where the subscripts denote truth values, and partly cloudy weather can be set as {sunny_0.6 0.5, cloudy_0.4 0.5}, where the subscripts denote membership degrees.
The major differences between Fuzzy MEBN and MEBN are that, in Fuzzy MEBN, phenomenal (non-logical) constant symbols and entity identifier symbols carry a real-valued membership degree subscript in [0,1], such as Vehicle_0.85 and !V428_0.75, and truth value symbols and logical findings are assigned a truth value either directly or from a predefined finite chain of truth values ⟨a_1, a_2, …, a_n⟩.
The building blocks of a MEBN Theory (MTheory) are MEBN Fragments (MFrags) that semantically and causally represent a specific notion of the knowledge. The basic model of Fuzzy MEBN is similar to that of regular MEBN. An FMFrag can define a probability distribution and some fuzzy rules for a resident node given its input/parent and context nodes.
A Fuzzy MFrag (FMFrag) [5] F = (C, I, R, G, D, S) consists of three kinds of nodes: context nodes C, input nodes I and resident nodes R. Context nodes use FOFL sentences to represent the semantic structure of knowledge. Input nodes connect to resident nodes in other FMFrags. Finally, resident nodes are random variables conditional on the values of the context and input nodes. Besides, G is the FMFrag graph, D contains the local distributions, one for each resident node, and S is the set of fuzzy if-then rules used by the Fuzzy Inference System (FIS). It is worth noting that the sets C, R and I are pairwise disjoint, and the graph G is a directed acyclic graph whose nodes belong to I∪R and whose root nodes correspond to members of I only.

2. Fuzzy PR-OWL

2.1. Elements

Figure 2 shows the classes of the ontology language Fuzzy PR-OWL created with protégé-4.1 [11]. Fuzzy PR-OWL extends PR-OWL with properties and classes such as fuzzy random variables, fuzzy states, membership degrees and fuzzy rule sets (FRS) to increase its expressive power.
[Figure 2 (class diagram, omitted): the Fuzzy PR-OWL class hierarchy, grouped into (1) main classes/elements such as FMTheory, FMFrag, Node, FRS, Probability Distribution and FRandomVariable, (2) their subclasses, (3) built-in elements and (4) reified relationships.]

Figure 2. Elements of Fuzzy PR-OWL


Table 1 presents the corresponding relationships between the elements of Fuzzy MEBN, FOFL [12] and FuzzyPR-OWL. As shown in Table 1, the ontology proposed in this paper can be represented as sentences of Fuzzy MEBN based on FOFL.
Table 1. Corresponding relationships between FOFL, Fuzzy MEBN and FuzzyPR-OWL

Fuzzy MEBN | FOFL | FuzzyPR-OWL
Symbols for the general/existential quantifiers | Quantifiers ∀, ∃ | Class: Quantifier
Ordinary variable symbols | Variables x, y, … | Class: OrdinaryVariable
Phenomenal constant symbols | Constants c, d, … | Class: ConstantArgument
Truth value symbols | Symbols for truth values: a | Class: TrueValueRandomVariable
Entity identifier symbols | — | Data Property: hasUID (Range: Thing, Domain: string)
Logical connectives | Binary connectives ∨*, ∧*, &*, ⇒*, equality operator = | Class: FLogicalOperator
Findings | — | Classes: FindingFMFrag, FindingResidentNode
Domain-specific random variables: logical random variables | n-ary predicate symbols p, q, … | Class: TrueValueRandomVariable
Domain-specific random variables: phenomenal random variables | n-ary functional symbols f, g, … | Class: FRandomVariable

2.2. Syntax

An overview of the basic model of Fuzzy PR-OWL is given in Figure 3. In this diagram, ovals and arrows represent general classes and the major relationships between classes, respectively. A probabilistic ontology has at least one individual of class FMTheory, which contains a group of FMFrags. In the syntax of FuzzyPR-OWL, this link is expressed via the object property hasFMFrag.
Individuals of class FMFrag are composed of nodes, and each individual of class Node is a random variable. Compared with PR-OWL, the major difference of this ontology is the use of FRS to define the membership degrees of fuzzy states. The object property hasFRS links one node to one or many FRS. The unconditional or conditional probability distributions of the random variables, represented by the class Probability Distribution, are linked to their respective nodes via the object property hasProbDist. Finally, logical expressions based on FOFL, or simple expressions that describe random variables whose arguments may refer to entities, are represented by the class FMExpression and linked to nodes via the object property hasFMExpression.
[Figure 3 (diagram, omitted): an FMTheory includes FMFrags (hasFMFrag); an FMFrag is built from Nodes (hasNode); a Node has context constraints given by an FMExpression (hasFMExpression), is defined by a Probability Distribution (hasProbDist) and has rules in a Fuzzy Rule Set (hasFRS).]

Figure 3. Basic model of Fuzzy PR-OWL


The syntax of Fuzzy PR-OWL extends the abstract syntax of OWL. The syntax rules are given in Extended Backus-Naur Form (EBNF), where the definition symbol is ::=, terminal symbols are enclosed in quotation marks and a semicolon is the terminating character, the vertical bar | denotes an alternative, squared brackets [...] enclose an option that may appear at most once, and curly braces {...} enclose expressions that may be omitted or repeated. In addition, curly braces followed by a plus sign {...}+ enclose expressions that appear at least once, and an FMTheory is identified by a URI reference. The fundamental structure of Fuzzy PR-OWL is as follows:
FMTheory ::= ‘FMTheory(’ [URI reference] |annotation| {FMFrag}+ ‘)’;
FMFrag ::= ‘FMFrag(’ FMFrag_id ‘,’ {Node}+ ‘,’{ ParentRel} ‘)’;
Node ::= ‘Node (’ Node_id ‘,’ FMExpression [‘,’ ProbabilityDistribution ‘,’ { If-ThenRule } ] ‘)’;
ParentRel ::= ‘hasParent (’ Node ‘,’ Node ’)’;
*_id ::=’ UID(‘ letter { letter | digit } ’)’;
In the literature [2], an FRS can be defined in the form of if-then rules. For example, the conditional probability of a variable with two parents can be written as P(V = v_1 | P_1 = p, P_2 = q). In the form of an if-then rule, such a relation can be stated as: 'If P_1 is p and P_2 is q, then V is v_1', wherein all the states are fuzzy states denoted as S = [{s_i^{μ_i}}], where s_i and μ_i are the probability distribution and the membership degree of the i-th state, respectively. According to the FBN formula of MEBN in the literature [4], however, the FRS only defines membership degrees. Therefore, in this paper probabilities and membership degrees are defined separately, through conditional distributions and if-then rules, respectively. Next we present the models of conditional probability distributions, FRS and fuzzy expressions to show the syntactic structure of Fuzzy PR-OWL.

• Conditional Probability Distributions
A node's probability distribution depends on the state configuration of its parents. PR-OWL2 uses strings to present probability distributions, but this approach needs a syntactic parser to analyze the declarative syntax. To embed more probability information in the ontology, this paper describes probability distributions by ontology.
In FuzzyPR-OWL, the class ProbabilityAssignment indicates the assignment of probabilities conditioned on the parents' states, which are represented by the class ConditioningState. The class StateAssignment indicates the assignment of a state, such as its name and probability, as illustrated in Figure 4. The basic structure is defined below:
ProbabilityDistribution ::= FPR-OWLTable;
FPR-OWLTable ::= ‘FPR-OWLTable(’PRTable_id‘,’{ProbabilityAssignment }+‘)’;
ProbabilityAssignment ::=‘ProbabilityAssignment(’ProbabilityAssignment_id‘,’
{StateAssignment}+ [’,’ { ConditioningState}+ ] ‘)’;
StateAssignment ::= ‘State (stateName(’ string’)’ [‘, stateProbability (’float’)’] ‘)’;
ConditioningState ::= ‘CondState(’ Node_id ‘,’ StateAssignment ‘)’;
string ::= letter { letter | digit };
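To make the productions above concrete, the following is an illustrative instance written in the abstract syntax just defined; it encodes the assignment used later in the EngineStatus use case (probability 0.2 for state Overheated when the parent BeltStatus is OK), with the identifiers chosen here purely as examples rather than taken from an actual ontology file.

FPR-OWLTable( UID(EngineStatusTable),
  ProbabilityAssignment( UID(EngineStatusPA1),
    State (stateName(Overheated), stateProbability (0.2)),
    CondState( UID(BeltStatusNode), State (stateName(OK)) )
  )
)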
[Figures 4 and 5 (UML-style diagrams, omitted): a ResidentNode/InputNode is linked to a Probability Distribution realized as an FPR-OWL Table containing Probability Assignments, each composed of StateAssignments and ConditioningStates; an FRS is realized as If-Then Rules, each composed of If-Parts and a Then-Part built from StateAssignments.]

Figure 4. The model of conditional distributions   Figure 5. The model of FRS


• FRS
Fuzzy PR-OWL adopts If-Then Rules to define the FRS and constrain the membership degrees of fuzzy states. As shown in Figure 5, an If-Then Rule of a resident node may include one or more If-Parts and a Then-Part. Every instance of If-Part corresponds to an assumption about a parent node, and the instance of Then-Part corresponds to the assignment of the fuzzy states of the resident node. The structure of an If-Then Rule is defined below:
FRS ::= If-ThenRule;
If-ThenRule ::= ‘If-ThenRule(’ If-ThenRule_id ‘, ’ {If-Part}+ ‘,’ Then-Part ‘)’;
If-Part ::= ‘if (’Node_id‘, ’ StateAssignment ‘)’;
Then-Part ::= ‘then(’ {StateAssignment}+ ’)’;
StateAssignment ::= ‘State (stateName(’ string’)’ [‘,’MembershipDegree] ‘) ’;
MembershipDegree ::=’Membership( degree(’ float ‘)’[‘ ,descript(’ string’)’];
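As an illustration (the identifiers and the degree value 1.0 in the if-part are assumptions, not taken from the paper's ontology files), the rule discussed later in Figure 9 - if BeltStatus is OK to a Normal degree, then the membership degree of Overheated for EngineStatus is 0.5 - can be written in the syntax above as:

If-ThenRule( UID(IfThenRule1),
  if ( UID(BeltStatusNode), State (stateName(OK), Membership( degree(1.0), descript(Normal)) ) ),
  then( State (stateName(Overheated), Membership( degree(0.5)) ) )
)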
[Figure 6 (UML-style diagram, omitted): an FMFrag contains Nodes and Ordinary Variables; a Node is described by an FMExpression, which is built from RandomVariables, FArguments and Exemplars.]

Figure 6. The model of fuzzy expression



• Fuzzy Expression
As shown in Figure 6, this part proposes the model of fuzzy expressions, which can represent the constraints or fuzzy relationships between entities.
An expression represents a relationship between entities in FuzzyPR-OWL; the class FMExpression can present either the truth-value expression of a context node or the simple expression of other kinds of nodes. The former denotes logical expressions based on FOFL, and the latter can be regarded as random variables of input nodes or resident nodes together with some arguments. The class Exemplar indicates the general or existential quantifiers of a fuzzy expression in Skolem form. The structure of a fuzzy expression is defined below:
FMExpression ::= ‘FMExpression(’FMExpression_id‘,’[ exists|forAll Exemplar_id ’,’] Expression ‘)’;
Expression ::= Term [“and”|”or” Term] [“=”Term] [”implies”|’iff’ Term];
Term ::= [“not”] RandomVariable_id [’(’Argument_id{,Argument_id}+’)’] | FMExpression_id | OrdinaryVariable_id;
RandomVariable ::= ‘RandomVariable(’ RandomVariable_id ‘, hasPossibleValues(’ {URI reference}+’)’ [‘,defineUcertaintyOf(’URI’)’] [‘,probDistr(‘PrTable_id’)’] [‘,trueValue(‘float’)’]’)’;
OrdinaryVariable ::= ‘OrdinaryVariable(’ OrdinaryVariable_id ‘, ( class(’ DomainClass_URI ’))’;
Argument ::= ‘Argument(’Argument_id‘,’[‘type(’Thing‘)’][‘,typeOfData(’ Literal’)’ ] [‘,‘MembershipDegree ‘])’;
Exemplar ::= ‘Exemplar(’ Exemplar_id ‘,’ [‘type(’ Thing ‘)’] [‘,typeOfData(’ Literal’)’ ] [‘,’MembershipDegree] ‘)’;

2.3. Semantics

The structure of the FuzzyPR-OWL language L_F based on Fuzzy MEBN is defined by the interpretations of FOFL T_F [11].
The structure D = ⟨D_I; P̄_1, P̄_2, …; f̄_1, f̄_2, …; ū, v̄, …⟩ is a 4-tuple with the following components:
• D_I is a nonempty set called the domain of the structure;
• {P̄_n} are n-ary fuzzy relations adjoined to each n-ary predicate symbol {P_n};
• {f̄_n} are n-ary (ordinary) functions defined on D_I and adjoined to each n-ary functional symbol {f_n};
• ū, v̄, … ∈ D_I are the elements assigned to the constants u, v, … of the language L_F.
Assume that L_F contains one constant associated with each element d̄ ∈ D_I (a name of d̄). Let u be a constant; then its interpretation is an element I(u) ∈ D_I. Let f̄_n be the function assigned to f_n and let t_1, t_2, …, t_n be terms without variables; then I(f_n(t_1, t_2, …, t_n)) = f̄_n(t_1, t_2, …, t_n).
The fuzzy functions defined in fuzzy set theory can be regarded as special fuzzy relations. Note that functional symbols are introduced for the sake of completeness, since they can be replaced by special predicates [11]. Considering the corresponding relationships between the elements of L_F and those of T_F shown in Table 1, the definition of D can be further illustrated as follows.
• L_F uses the entity identifier symbols E to identify the entities, i.e. the elements assigned to the constants.
• The phenomenal random variables and logical random variables in L_F represent the fuzzy functions and predicates, respectively. The possible values of the former belong to E ∪ {⊥}, and those of the latter are either a real number a ∈ [0,1] or a member of the chain ⟨a_1, a_2, …, a_n⟩.
The random variables mentioned here can be represented as the expressions of Section 2.2. The probabilities and the membership degrees of the possible values of functions are assigned by the joint probability distribution and by the If-Then rules, respectively. L_F uses a phenomenal random variable with n-ary arguments to represent a function. The function f̄: Δ → Γ maps a vector of entity identifier symbols Δ = ⟨e_1^{(μ_1)}, e_2^{(μ_2)}, …, e_n^{(μ_n)}⟩, i.e. the input arguments, into a vector of identifier symbols Γ = ⟨s_1^{(ν_1)}, s_2^{(ν_2)}, …, s_m^{(ν_m)}⟩, i.e. a fuzzy state or fuzzy value assignment, where the values of μ for the various arrangements of arguments and possible values are predefined in the language by the fuzzy interpretation of f̄. This can also be represented as a fuzzy relation [12] that yields the truth value of a relation over the input set, that is, R: ⟨Δ, Γ⟩ → a ∈ {a_1, a_2, …, a_n}. By matching domain entity identifier symbols with domain entities, the function or relation maps the n-ary vector of domain entities into entities, in the case of phenomenal random variables, or into truth values of domain assertions, in the case of logical random variables.

3. Use Case

In the equipment diagnosis problem, the belt status and the room temperature can affect the engine status. The problem is represented by the EngineStatus FMFrag shown in Figure 7. In the figure, isA(Machine,m) states that m is an instance of Machine, and EngineStatus(m), BeltStatus(b) and RoomTemp(r) represent the engine status of machine m, the status of belt b and the temperature of room r, respectively. Suppose that the engine status node has the local distribution shown in Table 2, where superscripts denote membership degrees.
Table 2. Local distribution of the EngineStatus FMFrag

RoomTemp(r) (Normal^α1; Hot^α2) | BeltStatus(b) (OK^β1; Broken^β2) | EngineStatus(m) (Satisfactory^α1; Overheated^α2; …)
Normal | OK | 0.8, 0.2, 0
Normal | Broken | 0.6, 0.4, 0
… | … | …

[Figure 7 (diagram, omitted): the EquipmentDiagnosis FMTheory, in which the EngineStatus FMFrag contains context nodes isA(m,Machine), isA(r,Room), isA(b,Belt), m=BeltLocation(b) and r=MachineLocation(m), input nodes BeltStatus(b) and RoomTemp(r), and the resident node EngineStatus(m); MachineLocation(m) belongs to the MachineLocation FMFrag.]

Figure 7. EngineStatus FMFrag


The representation of the probability distribution of EngineStatus in FuzzyPR-OWL is shown in Figure 8. The upper and lower large parallelograms represent the Fuzzy PR-OWL ontology and the domain ontology, respectively. The figure shows part of the information in Table 2, namely the probability assignment of the node EngineStatus for states such as Overheated when the conditioning state of the parent BeltStatus is OK.
[Figure 8 (RDF graph, omitted): domain-ontology instances of the Fuzzy PR-OWL classes, e.g. es:EngineStatusTable with probability assignment es:EngineStatusPA1, whose state assignment es:EngineStatusSA1 has state name Overheated and state probability 0.2, conditioned on es:EngineStatusConditioningState whose parent node es:BeltStatusInputNode is in state OK (es:BeltStatusSA1).]

Figure 8. Representation of the probability distribution


The FRS of EngineStatus, which specifies the membership degrees of its states conditioned on the state assignments of its parent nodes, is shown in Figure 9. The RDF graph shows that if the state of the parent node BeltStatus is Normal OK (supposing the words that describe degree include Very, Normal and A little), then the membership degree of the state Overheated for EngineStatus is 0.5.
[Figure 9 (RDF graph, omitted): the if-then rule es:EngineStatusIfThenRule1 links an If-Part referring to the parent node es:BeltStatus with state assignment es:BeltStatusSA1 (state name OK, degree description Normal) to a Then-Part with state assignment es:EngineStatusSA1 (state name Overheated, membership degree 0.5); informally, "If BeltStatus is Normal OK, then degreeOf(EngineStatus) is {Overheated 0.5}".]

Figure 9. Representation of FRS


[Figure 10 (RDF graph, omitted): es:EquipmentDiagnosis_FMTheory has the FMFrags es:DomainFMFrag.Enginestate and es:DomainFMFrag.BeltLocation; the context node es:ContextNode_CX of the EngineStatus FMFrag has the fuzzy expression es:FMExpression_CX1, whose logical connective es:equalTo (with possible value 0.9) takes the arguments es:CX1_1, substituted by the ordinary variable room, and es:CX1_2, whose inner expression refers to the random variable es:MachineLoc_RandomVariable.]

Figure 10. Representation of fuzzy expression


The fuzzy expression r=MachineLocation(m) of the context node is shown in Figure 10. The expression defines the relation between room r and machine m, which is connected to another FMFrag, MachineLocation. The dark ovals constitute the main parts of the expression, including the logical connective equalTo with a truth value, and the arguments CX1_1 and CX1_2, which correspond respectively to the ordinary variable room in the EngineStatus FMFrag and to the random variable MachineLocation(m) in the BeltLocation FMFrag.

4. Conclusion

The representation of and reasoning with uncertain knowledge is one of the goals of the Semantic Web area, and probabilistic ontology languages based on OWL2 are envisioned as an important approach to achieving this goal. In view of the inability of existing ontology languages to model probabilistic and fuzzy knowledge simultaneously, this paper proposed the Fuzzy PR-OWL ontology language based on Fuzzy MEBN, which adds to PR-OWL2 the expressive power needed for the widespread fuzzy knowledge of related domains. The domain cases in the last part show that Fuzzy PR-OWL can represent probabilistic and fuzzy information in a specific domain well.
As future work, we intend to construct a reasoning framework for Fuzzy PR-OWL by studying FOFL and fuzzy BN theory in more depth, and to improve the language continuously.

Acknowledgment

This work was funded by the key scientific and technological project of Henan Province (162102310616).

References

[1] P. Michael, Uncertainty Reasoning for the Semantic Web III, Springer International Publishing, 2013.
[2] K. J. Laskey and K. B. Laskey, Uncertainty Reasoning for the World Wide Web: Report on the
URW3-XG Incubator Group, International Workshop on Uncertainty Reasoning for the Semantic Web,
Karlsruhe, Germany, 2008.
[3] K. B. Laskey, MEBN: A language for first-order Bayesian knowledge bases, Artificial
Intelligence, 172(2008):140-178.
[4] K. Golestan, F. Karray, and M. S. Kamel, High level information fusion through a fuzzy extension to
Multi-Entity Bayesian Networks in Vehicular Ad-hoc Networks, International Conference on
Information Fusion, (2013):1180-1187.
[5] K. Golestan, F. Karray, and M. S. Kamel, Fuzzy Multi Entity Bayesian Networks: A Model for Imprecise
Knowledge Representation and Reasoning in High-Level Information Fusion, IEEE International
Conference on Fuzzy Systems, (2014):1678-1685.
[6] P. Hitzler, et al, OWL2 Web Ontology Language Primer(Second edition) (2015).
[7] Z. L. Ding, and Y. Peng, A Probabilistic Extension to Ontology Language OWL, Hawaii International
Conference on System Sciences, 4(2004):40111a-40111a.
[8] P. C. Costa, G. Da, K. B. Laskey and K. J. Laskey, PR-OWL: A Bayesian Ontology Language for the
Semantic Web, Uncertainty Reasoning for the Semantic Web I, ISWC International Workshop, URSW
2005-2007, Revised Selected and Invited Papers, (2008):88-107.
[9] N. C. Rommel, K. B. Laskey, and P. C. G. Costa, PR-OWL2.0 – Bridging the Gap to OWL
Semantics, Uncertainty Reasoning for the Semantic Web II, Springer, Berlin Heidelberg, (2013):1-18.
[10] V. Novák, On the syntactico-semantical completeness of first-order fuzzy logic, Kybernetika
-Praha- 2(1990):47-66.
[11] N. F. Noy, et al, Creating semantic web contents with protégé-2000, IEEE Intelligent Systems, 16
(2001): 60–71.
[12] W. Gueaieb, Soft computing and intelligent systems design: Theory, tools and applications, Neural
Networks IEEE Transactions on, 17(2004):825-825.
Fuzzy Systems and Data Mining II 81
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-81

State Assessment of Oil-Paper Insulation


Based on Fuzzy Rough Sets
De-Hua HE1, Jin-Ding CAI, Song XIE, Qing-Mei ZENG
College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, China

Abstract. The return voltage method (RVM) is a good method for studying the aging state of transformer insulation, but it is difficult to assess the insulation aging state accurately from a single characteristic quantity. In this paper, fuzzy rough set theory combined with RVM is proposed to assess the oil-paper insulation state of transformers, and an assessment system for transformer oil-paper insulation is constructed from a large amount of test data. First, the evaluation indexes of the transformer oil-paper insulation status are established from the return voltage characteristic parameters. Then, the fuzzy c-means clustering algorithm is used to obtain the membership functions of the transformer test data together with a fuzzy partition of the characteristics. Moreover, the fuzzy attributes of the oil-paper insulation assessment table are reduced according to the discernibility matrix, and the evaluation rules for the oil-paper insulation condition are extracted. Finally, the examples in this paper demonstrate that the assessment system is effective and feasible and provides a new idea for the assessment of transformer oil-paper insulation state. The research has practical value in engineering applications.

Keywords. Return voltage, fuzzy rough sets, fuzzy C means clustering

Introduction

Transformers play a vital role in the whole electrical power system. Because a large number of transformers within electric utilities are approaching the end of their design life, there is currently growing interest in the condition assessment of transformer insulation. The degradation of the main insulation system of a transformer is recognized to be one of the major causes of transformer breakdown [1-3].
Methods based on the analysis of electrical polarization in dielectrics are often used to diagnose the state of oil-paper insulation, and three parameters are customarily selected to assess it [4-5]. However, because the characteristics of insulation aging are affected by a variety of factors, it is difficult to assess the insulation aging state accurately from a single feature. The grey correlation method was introduced for insulation condition assessment [6], but it did not consider the redundant characteristics in the condition assessment of oil-paper insulation, and the assessment process is complicated.
In this paper, fuzzy rough set theory is introduced and multiple characteristics are considered to comprehensively assess the condition of oil-paper insulation. The method handles the problem that part of the information is incomplete or unknown.

1
Corresponding Author: De-Hua HE, College of Electrical Engineering and Automation, Fuzhou
University, Fuzhou, China; E-mail:153367542@qq.com.

The fuzzy C-means clustering algorithm (FCM) is used to discretize the important data categories and form the classification attributes [7]. The characteristic fuzzy rules and the insulation assessment system are established on the basis of a historical database.

1. Theory of Fuzzy Rough Sets

Rough set theory is a powerful tool for dealing with vague and uncertain information. The basic idea of the fuzzy rough model is that a fuzzy similarity relation is used to construct the fuzzy lower and upper approximations of a decision. The sizes of the lower and upper approximations reflect the discriminating capability of a feature subset, and the union of the fuzzy lower approximations forms the fuzzy positive region of the decision. Let the universe U be a finite nonempty set of objects. Each object in U is described by a set of attributes, denoted by A. The pair (U, A) is an information system (IS), and for every subset P ⊆ A there exists an associated similarity relation. Let μ_{R_P}(x,y) denote the similarity of objects x and y induced by the subset of features P. Given X ⊆ U, X can be approximated by the information contained in P through the construction of the P-lower and P-upper approximations of X, as defined in Eq. (1):

\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\big(\mu_{R_P}(x,y), \mu_X(y)\big), \qquad
\mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\big(\mu_{R_P}(x,y), \mu_X(y)\big)    (1)

where I is the fuzzy implicator, T is the t-norm, and R_P is the fuzzy similarity relation induced by the subset of features P. The degree of similarity of two objects with respect to a subset of features can be constructed using Eq. (2):

\mu_{R_P}(x,y) = T_{a \in P}\{\mu_{R_a}(x,y)\}    (2)

where μ_{R_a}(x,y) is the degree to which objects x and y are similar for feature a. A quality measure termed the fuzzy-rough dependency function γ_P(Q) is employed to measure the dependency between two sets of attributes P and Q; it is defined by:

\gamma_P(Q) = \frac{|\mu_{POS_{R_P}}(x)|}{|U|} = \frac{\sum_{x \in U}\mu_{POS_{R_P}}(x)}{|U|}    (3)

where the fuzzy positive region, which contains all objects of U that can be classified into classes of U/Q using the information in P, is defined as:

\mu_{POS_{R_P}(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{R_P}X}(x)    (4)
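A small numerical sketch of Eqs. (1)-(4) is given below; it is not the paper's code, and the toy data, the min t-norm and the Kleene-Dienes implicator I(a,b) = max(1-a, b) are assumptions chosen only to make the computation concrete.

import numpy as np

# Toy data: 4 objects, 2 conditional features scaled to [0, 1], crisp decision.
X = np.array([[0.1, 0.9],
              [0.2, 0.8],
              [0.8, 0.2],
              [0.9, 0.1]])
y = np.array([0, 0, 1, 1])

def feature_similarity(col):
    # Simple per-feature similarity: 1 - |difference| (an assumption).
    return 1.0 - np.abs(col[:, None] - col[None, :])

def similarity_P(X):
    # Eq. (2): combine per-feature similarities with the min t-norm.
    sims = [feature_similarity(X[:, k]) for k in range(X.shape[1])]
    return np.minimum.reduce(sims)

def lower_approximation(R, member):
    # Eq. (1): inf over y of I(R(x,y), mu_X(y)), with the Kleene-Dienes implicator.
    implication = np.maximum(1.0 - R, member[None, :])
    return implication.min(axis=1)

R = similarity_P(X)
positive_region = np.zeros(len(X))
for cls in np.unique(y):
    member = (y == cls).astype(float)       # crisp membership of the class
    # Eq. (4): fuzzy positive region as the sup over the decision classes.
    positive_region = np.maximum(positive_region, lower_approximation(R, member))

# Eq. (3): dependency of the decision Q on the feature subset P.
gamma = positive_region.sum() / len(X)
print("fuzzy positive region:", positive_region.round(2))
print("dependency degree gamma_P(Q):", round(gamma, 2))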

2. Attributes Reduction Based on Rough Sets Theory

Not all attributes are necessary for the assessment of the oil-paper insulation system; removing the extra features and redundant vague linguistic entries does not affect the original oil-paper insulation diagnosis. The discernibility matrix can be used to reduce the condition attributes and attribute values. The specific reduction steps are as follows:
1. Calculate the similarity relation of each fuzzy attribute C_k:

R_k(x_i, x_j) = \begin{cases} \min\{C_k(x_i), C_k(x_j)\} & C_k(x_i) \neq C_k(x_j) \\ 1 & C_k(x_i) = C_k(x_j) \end{cases}    (5)

2. Calculate the overall fuzzy similarity relation Sim(R) = ∩{R_k : R_k ∈ R}.
3. Calculate the discernibility matrix of the evaluation system M(U,R) = (c_ij)_{n×n}:

c_{ij} = \begin{cases} \{R_k : 1 - R_k(x_i, x_j) \geq \lambda_i\} & \lambda_i \geq \lambda_j \\ \varnothing & \lambda_i < \lambda_j \end{cases}    (6)

where λ_i = Sim(R)*([x_i]_Q)(x_i), λ_j = Sim(R)*([x_i]_Q)(x_j) and [x]_Q ∈ U/Q.
4. Compute f_D(U,R) = ∧{∨(c_ij) : c_ij ≠ ∅}.
5. Compute g_D(U,R) = (∧R_1) ∨ … ∨ (∧R_l).
6. Output Red_D(R) = {R_1, …, R_l}.
7. Build the assessment rule table, delete duplicate evaluation rules, and extract the oil-paper insulation condition assessment rules.

3. Membership of Characteristic

In this paper, FCM is used to calculate the cluster center of each cluster and the memberships of the transformer test data. Let (U, P∪Q) be a fuzzy decision system with U = {x_1, x_2, …, x_n}, let the fuzzy condition attributes P be divided into three categories, and let the cluster centers be V = {v_1, v_2, v_3}. The relationship between a sample and the cluster centers can be expressed by membership degrees. The membership functions are obtained by the algorithm, yielding the membership degree matrix μ:

\mu = \begin{bmatrix} \mu_{11} & \cdots & \mu_{1j} & \cdots & \mu_{1n} \\ \mu_{21} & \cdots & \mu_{2j} & \cdots & \mu_{2n} \\ \mu_{31} & \cdots & \mu_{3j} & \cdots & \mu_{3n} \end{bmatrix}, \quad j = 1, \ldots, n    (7)

\mu_{ij} = \frac{\big(1/\|x_j - v_i\|^2\big)^{1/(m-1)}}{\sum_{c=1}^{3}\big(1/\|x_j - v_c\|^2\big)^{1/(m-1)}}    (8)

The iteration objective function is:

\min J(\mu_{ij}, v_i) = \sum_{i=1}^{3}\sum_{j=1}^{n}(\mu_{ij})^m \|x_j - v_i\|^2    (9)

The calculation formula of the cluster centers is:

v_i = \frac{\sum_{j=1}^{n}(\mu_{ij})^m x_j}{\sum_{j=1}^{n}(\mu_{ij})^m}, \quad i = 1, 2, 3    (10)
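A compact sketch of the FCM iteration defined by Eqs. (8) and (10) is shown below; it is an illustrative implementation, not the one used for the paper's results, and the two-dimensional toy data, the fuzzifier m = 2 and the iteration count are assumptions.

import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Random initial membership matrix, columns normalized over the clusters.
    u = rng.random((n_clusters, n))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Eq. (10): cluster centers as weighted means of the samples.
        v = (um @ X) / um.sum(axis=1, keepdims=True)
        # Eq. (8): update memberships from the distances to the centers.
        d2 = ((X[None, :, :] - v[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = (1.0 / d2) ** (1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
    return u, v

# Toy two-dimensional test data with three loose groups (assumed values).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9],
              [0.5, 0.55], [0.55, 0.5]])
u, v = fuzzy_c_means(X, n_clusters=3)
print("cluster centers:\n", v.round(2))
print("membership matrix:\n", u.round(2))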

4. Assessment of Oil Paper Insulation Based on Fuzzy Rough Sets


The test data and ageing information of the transformers are shown in Table 1. P1, P2, P3, P4 and P5 are the condition attributes corresponding to t_cdom, U_rmp, S_rmax, R_g and C_g, respectively, and Q is the fuzzy decision attribute of the oil-paper insulation. According to the relevant regulations for power equipment, the transformer insulation is divided into good (G) and bad (B). The characteristics are divided into 15 fuzzy attributes C_k (k = 1, 2, …, 15). The memberships of the fuzzy attributes C_k of the test data are obtained by the FCM algorithm and are listed in Table 2.
Table 1. Return voltage test sample data of transformers

Trafo tcdom/s Urmp/V Srmax Rg/GΩ Cg/nF State


x1 2518 183.5 31.20 12.26 92.17 G
x2 546.6 353.4 257.1 1.96 186.9 B
x3 1214 256.0 96.4 4.899 109.3 G
x4 667.4 385.5 293.2 1.440 190.1 B
x5 2415 175.0 32.11 13.35 70.36 G
x6 1226 248.2 87.66 4.026 106.8 G
x7 649.5 363.4 179.2 2.743 169.9 B
x8 3613 269.4 80.10 11.00 45.23 G
x9 333.7 32.60 74.02 2.830 64.38 B
x10 3540 223.4 23.70 3.682 99.51 B
x11 1265 236.1 44.50 1.537 235.3 B
x12 2655 218.5 67.72 11.77 80.40 G
x13 896.9 169.7 120.5 2.885 149.8 B
x14 1524 320.3 79.24 2.832 183.0 G
x15 3289 239.7 19.71 13.05 47.88 G
x16 700.6 83.45 46.40 2.339 125.7 G
x17 189.1 313.8 54.50 1.253 168.8 B
x18 2706 110.7 32.43 12.40 127.1 G

According to Eqs. (1) and (4), the most important attribute for assessment is P4, followed by P3, P5, P1 and P2. The reduct computed by the attribute reduction algorithm is {C3, C4, C8, C9, C10, C12, C15}, from which the redundant attributes have been removed. The decision rules are listed in Table 3, where the elements of the table are membership intervals. Taking three transformers not in the historical database as examples, their basic information is shown in Table 4. Following the insulation assessment process, the memberships of the transformers are obtained, and the results are shown in Table 5. The membership degrees of transformer T1 match rule 1; based on the assessment rules, the insulation of T1 is in good condition and does not need maintenance. The membership degrees of T2 match assessment rule 6; according to the rules, the insulation of T2 is seriously aged and needs maintenance. The membership degrees of T3 match assessment rule 9, so the insulation of T3 is also seriously aged. The diagnosis of T3 is judged as good by the method proposed in reference [4], a result that differs from the actual condition. The three diagnosis results of the proposed method are in perfect agreement with the actual conditions, which verifies that the method based on fuzzy rough set theory is effective and accurate.
Table 2. Membership function of partial fuzzy attributes

P1(10-2) P2 (10-2) P3(10-2) P4(10-2) P5(10-2)


T L M H L M H L M H L M H L M H Q
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
1 5 13 81 8 87 4 99 0 0 0 0 99 37 59 2 G
2 99 0 0 0 0 99 0 0 99 98 1 0 0 1 97 B
3 1 98 0 3 2 13 2 96 0 8 90 1 1 98 0 G
4 95 3 0 1 5 93 1 1 97 91 8 0 0 0 98 B
5 7 20 72 14 80 5 99 0 0 0 1 97 97 2 0 G
6 1 98 0 2 89 8 0 99 0 0 99 0 3 96 0 G
7 97 2 0 0 1 98 16 38 45 88 11 0 4 18 76 B
8 3 6 90 4 66 28 1 98 0 2 3 94 92 6 1 G
9 94 4 0 94 3 1 8 90 0 82 17 0 99 0 0 B
10 2 5 92 0 99 0 96 4 0 7 91 0 14 83 2 B
11 0 99 0 1 96 2 95 4 0 92 7 0 4 8 86 B
12 2 6 91 0 99 0 24 75 0 0 0 99 79 18 1 G
13 55 43 1 18 75 5 13 81 4 78 21 0 9 57 32 B
14 5 92 2 0 6 93 2 97 0 82 17 0 1 3 94 G
15 1 1 97 1 94 3 94 5 0 0 0 98 93 5 1 G
16 92 6 0 98 1 0 92 7 0 99 0 0 3 94 2 G
17 89 9 1 1 10 88 72 26 0 84 11 1 5 20 74 B
18 1 4 93 85 12 2 99 0 0 0 0 99 3 93 3 G

Table 3. The rules of insulation assessment system

P1(H) P2(L) P3(M) P3(H) P4(L) P4(H) P5(H)


Rule
C3 C4 C8 C9 C10 C12 C15 Q

1 (0.5 , 1) (0 , 0.5) (0 , 1) (0 , 0.5) (0 , 0.5) (0.5 , 1) (0 , 0.5) G


2 (0 , 0.5) (0 , 0.5) (0.5 , 1) (0 , 0.5) (0.5 , 1) (0 , 0.5) (0.5 , 1) G

3 (0 , 0.5) (0.5 , 1) (0 , 0.5) (0 , 0.5) (0.5 , 1) (0 , 0.5) (0 , 0.5) G


4 (0 , 0.5) (0, 0.5) (0.5 , 1) (0 , 0.5) (0 , 0.5) (0 , 0.5) (0 , 0.5) G
5 (0.5 , 1) (0 , 1) (0 , 0.5) (0 , 0.5) (0 , 0.5) (0.5 , 1) (0 , 0.5) G
6 (0 , 0.5) (0, 0.5) (0 , 0.5) (0 , 1) (0.5 , 1) (0 , 0.5) (0.5 , 1) B
7 (0 , 0.5) (0, 1) (0.5 , 1) (0 , 0.5) (0.5 , 1) (0 , 0.5) (0 , 0.5) B
8 (0 , 0.5) (0.5 , 1) (0.5 , 1) (0 , 0.5) (0.5 , 1) (0 , 0.5) (0 , 1) B
9 (0.5 , 1) (0 , 0.5) (0 , 0.5) (0 , 0.5) (0 , 0.5) (0 , 0.5) (0 , 0.5) B

Table 4. The basic information of power transformers

Traf Model years tcdom Urmp Srmax Rg/GΩ Cg/nF Furfur state
T1 SFSE-220 1 2314 230 27 15.74 90.35 0.06 Good
T2 SFP-220 14 1449 243 45 1.795 364.2 0.74 Bad
T3 cub-/220 22 3328 289 24 4.027 95.48 0.99 Bad

Table 5. The membership and assessment results of power transformers

Traf P1(H) P2(L) P3(M) P3(H) P4(L) P4(H) P5(H) Rule Result
T1 0.6684 0.0034 0.0123 0.0007 0.0001 0.9980 0.0236 1 G
T2 0.0052 0.0132 0.0418 0.0014 0.9883 0.0050 0.9932 6 B
T3 0.9787 0.0341 0.0245 0.0016 0.0007 0.0000 0.0170 9 B

5. Conclusion

To avoid a single characteristic affecting the correctness of the insulation condition assessment, fuzzy rough set theory combined with RVM is proposed and used to assess the oil-paper insulation of transformers. The results demonstrate that the assessment system is effective and feasible and provides a new idea for the assessment of transformer oil-paper insulation.

References

[1] T. K. Saha, Review of modern diagnostic techniques for assessing insulation condition in aged
transformers, IEEE Trans. Dielectr. Electr. Insul. 10(2003), 903-917.
[2] M. de Nigris, R. Passaglia, R. Berti, L. Bergonzi and R. Maggi, Application of modern techniques for the
condition assessment of power transformers, CIGRE Session 2004, France, Paper A2-207, 2004.
[3] W. G. Chen, J. Du, Y. Ling, et al. Air-gap discharge process partition in oil-paper insulation based on
energy-wavelet moment feature analysis. Chinese Journal of Scientific Instrument, 34(2013):1062-1069.
[4] Y. Zou, J. D. Cai. Study on the relationship between polarization spectrum characteristic quantity and
insulation condition of oil-paper transformer. Chinese Journal of Scientific Instrument, 36(2015): 608-
614.
[5] R. J. Liao, H. G. Sun, Q. Yuan, et al. Analysis of oil-paper insulation aging characteristics using Return
voltage method. High Voltage Engineering, 37(2011): 136-142.
[6] J. D. Cai and Y. Huang. Study on Insulation Aging of Power Transformer Based on Gray Relational
Diagnostic Model. High Voltage Engineering, 41(2015): 3296- 3301.
[7] S. H. Gao, L. Dong, Y. Gao, et al. Mid-long term wind speed prediction based on rough set theory.
Proceedings of the CSEE, 32(2012): 32-37.
Fuzzy Systems and Data Mining II 87
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-87

Finite-Time Stabilization for T-S Fuzzy


Networked Systems with State and
Communication Delay
He-Jun YAO1 , Fu-Shun YUAN and Yue QIAO
School of Mathematics and Statistics, Anyang Normal University, 455000, Anyang,
Henan, China

Abstract. The finite-time stabilization problem for nonlinear networked systems is considered. The T-S approach is used to model the controlled nonlinear systems. By using the Lyapunov functional method, a sufficient condition for finite-time stabilization is given. Then, a state-feedback fuzzy controller is designed to make the closed-loop networked control system finite-time stable. Finally, the proposed design method is applied to the temperature control system of a polymerization reactor.

Keywords. networked systems; fuzzy; delay

Introduction

Networked control systems (NCSs) are feedback control systems closed over a network. As is well known, NCSs have many advantages, for example ease of maintenance, low cost and greater flexibility. In recent years, a number of papers have been reported on the analysis and control of NCSs [2-4]. In order to design networked-based control, Gao obtained a new delay-system approach using LMIs [5]. In [6], Walsh et al. considered the asymptotic stability of nonlinear NCSs. For NCSs with long communication delay, a networked-based optimal controller was designed in [7]. Yue et al. considered the H∞ control problem of NCSs with uncertainty [8].
As a useful approach, fuzzy control is often used to design robust controllers for nonlinear systems. Based on the well-known T-S approach, many papers have been published on the stabilization and control of nonlinear delay systems [9-10]. In [11], taking the insertion of the network into account, a new two-step approach was introduced to ensure the desired system properties. For nonlinear NCSs, the input-to-state stability problem was considered in [12]. However, the results of the above papers focus only on the asymptotic stability of dynamic systems, and few papers have considered the finite-time stability of nonlinear NCSs. Therefore, the finite-time control problem of nonlinear NCSs is worth investigating, which motivates this paper.

1
Corresponding Author. He-Jun YAO, School of Mathematics and Statistics, Anyang Normal
University, 455000, Anyang, Henan, China; E-mail addresses: yaohejun@126.com.

In this paper, using the LMI technique and the Lyapunov functional approach, we obtain a finite-time stability condition and a fuzzy controller design method.

1. Problem formulation

Consider the following plant, shown in Figure 1 [13]:

Rule i: IF z_1(t) is M_1^i, z_2(t) is M_2^i, …, z_n(t) is M_n^i, THEN

\dot{x}(t) = A_i x(t) + A_{di} x(t-d) + B_i u(t) + G_i \omega(t)    (1)
x(t) = \phi(t), \quad t \in [-d, 0]

where z_1(t), z_2(t), …, z_n(t) are the premise variables, x(t) ∈ R^n is the system state vector, u(t) ∈ R^m is the controlled input vector, M_k^i (i = 1, 2, …, r; k = 1, 2, …, n) are fuzzy sets, r is the number of IF-THEN rules, n is the number of fuzzy sets, A_i, A_di, B_i, G_i are known constant matrices, d is the state delay, φ(t) ∈ R^n is the initial state on [-d, 0], and ω(t) ∈ R^l is the exogenous disturbance, which satisfies

\int_0^T \omega^T(t)\omega(t)\,dt \leq d, \quad d \geq 0    (2)
[Figure 1 (block diagram, omitted): the plant with its sensor and actuator is connected to the controller through the network medium, with sensor-to-controller delay τ_sc and controller-to-actuator delay τ_ca.]

Figure 1. The closed networked control systems


Using the T-S approach and without considering the communication delay, the networked system is described by [13]

\dot{x}(t) = \sum_{i=1}^{r} \mu_i(z(t))\,[A_i x(t) + A_{di} x(t-d) + B_i u(t) + G_i \omega(t)]    (3)
x(t) = \phi(t), \quad t \in [-d, 0]

where the memberships μ_i(z(t)) satisfy

\mu_i(z(t)) \geq 0, \quad \sum_{i=1}^{r}\mu_i(z(t)) > 0, \quad i = 1, 2, \ldots, r

Assumption 1 [14]. The controller and actuator are event driven and the sensor is time driven; the sensor-to-controller delay is τ_sc and the controller-to-actuator delay is τ_ca. Therefore, the communication delay is τ = τ_sc + τ_ca.
With the insertion of the network and the communication delay τ taken into account, the control system of Figure 1 becomes

\dot{x}(t) = \sum_{i=1}^{r} \mu_i(z(t))\,[A_i x(t) + A_{di} x(t-d) + B_i u(t-\tau) + G_i \omega(t)]    (4)
x(t) = \phi(t), \quad t \in [-d, 0]

In this paper, we design the following controller:

u(t) = \sum_{i=1}^{r}\mu_i(z(t)) K_i x(t)    (5)

Inserting the controller (5) into the networked system (4), we obtain the closed-loop system:

\dot{x}(t) = \sum_{i=1}^{r}\sum_{j=1}^{r}\mu_i(z(t))\mu_j(z(t))\,[A_i x(t) + A_{di} x(t-d) + B_i K_j x(t-\tau) + G_i \omega(t)]    (6)
x(t) = \psi(t), \quad t \in [-\bar{d}, 0]

We suppose that the initial state x(t) = ψ(t) is a smooth function on [-d̄, 0], where d̄ = max{τ, d}. Thus ||ψ(t)|| ≤ ψ̄ for t ∈ [-d̄, 0], where ψ̄ is a positive constant.
Definition 1 [15]. For given positive scalars c_1, c_2, T and a positive matrix R, the time-delay NCS (6) (with ω(t) ≡ 0) is finite-time stable if

x^T(0) R x(0) \leq c_1 \Rightarrow x^T(t) R x(t) < c_2, \quad \forall t \in [0, T]    (7)

Definition 2 [16]. For given positive scalars c_1, c_2, T and a positive matrix R, the time-delay NCS (6) with the state-feedback controller is finite-time stabilizable if the following condition holds:

x^T(0) R x(0) \leq c_1 \Rightarrow x^T(t) R x(t) < c_2, \quad \forall t \in [0, T]    (8)
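As an illustrative check of Definition 1 (not part of the paper's derivation), the following sketch simulates a single-rule closed-loop delay system of the form (6) with forward-Euler steps and verifies whether x^T(t)Rx(t) stays below c_2 over [0, T]; the matrices, gain, delays and bounds are all assumed toy values.

import numpy as np

# Assumed toy data for a single fuzzy rule (r = 1), with omega(t) = 0.
A   = np.array([[0.0, 1.0], [-2.0, -3.0]])
Ad  = np.array([[0.1, 0.0], [0.0, 0.1]])
B   = np.array([[0.0], [1.0]])
K   = np.array([[-1.0, -0.5]])
d, tau = 0.2, 0.1          # state delay and communication delay (assumed)
R   = np.eye(2)
c1, c2, T = 1.0, 5.0, 3.0  # assumed finite-time bounds and horizon
dt  = 0.001

steps = int(T / dt)
hist = int(max(d, tau) / dt)
x = np.zeros((steps + hist + 1, 2))
x[:hist + 1] = np.array([0.7, -0.7])   # constant initial function, x0^T R x0 <= c1

finite_time_ok = True
for k in range(hist, hist + steps):
    xd   = x[k - int(d / dt)]          # x(t - d)
    xtau = x[k - int(tau / dt)]        # x(t - tau), used by the delayed controller
    u = K @ xtau
    dx = A @ x[k] + Ad @ xd + (B @ u).ravel()
    x[k + 1] = x[k] + dt * dx
    if x[k + 1] @ R @ x[k + 1] >= c2:
        finite_time_ok = False
        break

print("x^T R x stays below c2 on [0, T]:", finite_time_ok)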

2. Main Results

Theorem 1. For given positive scalars c_1, c_2, T and a positive matrix R, the NCS (6) is finite-time stabilizable if there exist a scalar α ≥ 0, matrices K_i ∈ R^{m×n} and positive matrices P, Q, T ∈ R^{n×n}, S ∈ R^{l×l} such that the following matrix inequalities hold:

\begin{bmatrix} \Xi & PA_{di} & PB_iK_j & PG_i \\ * & -Q & 0 & 0 \\ * & * & -T & 0 \\ * & * & * & -\alpha S \end{bmatrix} < 0    (9)

\frac{c_1\big(\lambda_{\max}(\tilde{P}) + h\,\lambda_{\max}(\tilde{Q}) + \tau\,\lambda_{\max}(\tilde{T})\big) + d\,\lambda_{\max}(S)\big(1 - e^{-\alpha T}\big)}{\lambda_{\min}(\tilde{P})} < c_2 e^{-\alpha T}    (10)

where

\Xi = PA_i + A_i^T P + Q + T - \alpha P, \quad \tilde{P} = R^{-1/2}PR^{-1/2}, \quad \tilde{Q} = R^{-1/2}QR^{-1/2}, \quad \tilde{T} = R^{-1/2}TR^{-1/2}

and λ_max(·) and λ_min(·) denote the maximum and minimum eigenvalues.
Proof. For the positive-definite matrices $P, Q, T$ in Theorem 1, we choose the Lyapunov function [13]:

$V(x(t)) := x^T(t)Px(t) + \int_{t-d}^{t} x^T(\theta)Qx(\theta)\,d\theta + \int_{t-\tau}^{t} x^T(\theta)Tx(\theta)\,d\theta$  (11)

The derivative of $V(x(t))$ along the trajectories of (6) is given by

$\dot V(x(t)) = \sum_{i=1}^{r}\sum_{j=1}^{r}\mu_i(z(t))\mu_j(z(t))\,\xi^T(t)\begin{bmatrix} PA_i + A_i^TP + Q + T & PA_{di} & PB_iK_j & PG_i \\ * & -Q & 0 & 0 \\ * & * & -T & 0 \\ * & * & * & 0\end{bmatrix}\xi(t)$

where $\xi(t) = \big[x^T(t)\ \ x^T(t-d)\ \ x^T(t-\tau)\ \ \omega^T(t)\big]^T$.
From condition (9), we have

$\dot V(x(t)) < \alpha x^T(t)Px(t) + \alpha\omega^T(t)S\omega(t) \le \alpha V(x(t)) + \alpha\omega^T(t)S\omega(t)$  (12)

Multiplying (12) by $e^{-\alpha t}$, we obtain
$e^{-\alpha t}\dot V(x(t)) - \alpha e^{-\alpha t}V(x(t)) < \alpha e^{-\alpha t}\omega^T(t)S\omega(t)$

Furthermore,
$\dfrac{d}{dt}\big(e^{-\alpha t}V(x(t))\big) < \alpha e^{-\alpha t}\omega^T(t)S\omega(t)$

Integrating the above inequality from 0 to $t$, with $t \in [0, T]$,

$e^{-\alpha t}V(x(t)) - V(x(0)) < \int_0^t \alpha e^{-\alpha\theta}\omega^T(\theta)S\omega(\theta)\,d\theta$  (13)

Noting that $\alpha \ge 0$, $\bar P = R^{-1/2}PR^{-1/2}$, $\bar Q = R^{-1/2}QR^{-1/2}$ and $\bar T = R^{-1/2}TR^{-1/2}$, we can obtain the following relation:

$x^T(t)Px(t) \le V(x(t)) < e^{\alpha T}\big[c_1\big(\lambda_{\max}(\bar P) + d\,\lambda_{\max}(\bar Q) + \tau\,\lambda_{\max}(\bar T)\big) + \bar d\,\lambda_{\max}(S)\big(1 - e^{-\alpha t}\big)\big]$  (14)

On the other hand, it holds that

$x^T(t)Px(t) = x^T(t)R^{1/2}\bar P R^{1/2}x(t) \ge \lambda_{\min}(\bar P)\,x^T(t)Rx(t)$  (15)

Putting together (14) and (15), we have

$x^T(t)Rx(t) < \dfrac{e^{\alpha T}\big[c_1\big(\lambda_{\max}(\bar P) + d\,\lambda_{\max}(\bar Q) + \tau\,\lambda_{\max}(\bar T)\big) + \bar d\,\lambda_{\max}(S)\big(1 - e^{-\alpha T}\big)\big]}{\lambda_{\min}(\bar P)}$  (16)

Condition (10) and inequality (16) imply
$x^T(t)Rx(t) < c_2,\quad \forall t \in [0, T]$.
Theorem 2. For given positive scalars $c_1, c_2, T$ and a positive-definite matrix $R$, with the fuzzy controller (5), the NCS (6) is finite-time stabilizable if there exist scalars $\alpha \ge 0$, $\lambda_i > 0$ $(i = 1, 2, 3, 4)$, matrices $\bar K_j \in R^{m\times n}$ and positive-definite matrices $X, Q, T \in R^{n\times n}$, $S \in R^{l\times l}$ such that the following matrix inequalities hold:

$\begin{bmatrix} \Theta & A_{di}X & B_i\bar K_j & G_i \\ * & -Q & 0 & 0 \\ * & * & -T & 0 \\ * & * & * & -\alpha S \end{bmatrix} < 0$  (17)

$\lambda_1 R^{-1} < X < R^{-1}$  (18)
$\lambda_2 Q < \lambda_1 X$  (19)
$\lambda_3 T < \lambda_1 X$  (20)
$0 < S < \lambda_4 I$  (21)

$\begin{bmatrix} \bar d\,\lambda_4\big(1 - e^{-\alpha T}\big) - c_2 e^{-\alpha T} & \sqrt{c_1} & \sqrt{d} & \sqrt{\tau} \\ * & -\lambda_1 & 0 & 0 \\ * & * & -\lambda_2 & 0 \\ * & * & * & -\lambda_3 \end{bmatrix} < 0$  (22)

where
$\Theta = A_iX + XA_i^T + Q + T - \alpha X$
Proof. Pre- and post-multiplying inequality (9) by $\mathrm{diag}\{P^{-1}, P^{-1}, P^{-1}, I\}$, inequality (9) is equivalent to

$\begin{bmatrix} \Sigma & A_{di}P^{-1} & B_iK_jP^{-1} & G_i \\ * & -P^{-1}QP^{-1} & 0 & 0 \\ * & * & -P^{-1}TP^{-1} & 0 \\ * & * & * & -\alpha S \end{bmatrix} < 0$  (23)

where $\Sigma = A_iP^{-1} + P^{-1}A_i^T + P^{-1}QP^{-1} + P^{-1}TP^{-1} - \alpha P^{-1}$.

By letting $X = P^{-1}$, $\bar K_j = K_jP^{-1}$, and renaming $P^{-1}QP^{-1}$ and $P^{-1}TP^{-1}$ as $Q$ and $T$, inequality (23) is equivalent to inequality (17).

On the other hand, denote $\bar X = R^{-1/2}XR^{-1/2}$, $\bar Q = R^{-1/2}QR^{-1/2}$, $\bar T = R^{-1/2}TR^{-1/2}$. For the positive-definite matrix $R$ we have $\lambda_{\max}(\bar X) = 1/\lambda_{\min}(\bar P)$, and the inequalities (18)-(21) imply that

$1 < \lambda_{\min}(\bar P),\quad \lambda_{\max}(\bar P) < \dfrac{1}{\lambda_1},\quad \lambda_{\max}(\bar Q) < \dfrac{\lambda_1}{\lambda_2}\lambda_{\max}(\bar P),\quad \lambda_{\max}(\bar T) < \dfrac{\lambda_1}{\lambda_3}\lambda_{\max}(\bar P),\quad \lambda_{\max}(S) < \lambda_4$  (24)

By the Schur complement lemma, inequality (22) is equivalent to

$\bar d\,\lambda_4\big(1 - e^{-\alpha T}\big) - c_2 e^{-\alpha T} + \dfrac{c_1}{\lambda_1} + \dfrac{d}{\lambda_2} + \dfrac{\tau}{\lambda_3} < 0$  (25)

With (24), the left-hand side of condition (10) satisfies

$\dfrac{c_1\big(\lambda_{\max}(\bar P) + d\,\lambda_{\max}(\bar Q) + \tau\,\lambda_{\max}(\bar T)\big) + \bar d\,\lambda_{\max}(S)\big(1 - e^{-\alpha T}\big)}{\lambda_{\min}(\bar P)} < \bar d\,\lambda_4\big(1 - e^{-\alpha T}\big) + \dfrac{c_1}{\lambda_1} + \dfrac{d}{\lambda_2} + \dfrac{\tau}{\lambda_3}$  (26)

Inserting inequality (25) into (26), condition (10) is satisfied.

3. Numerical Example

The temperature control system of a polymerization reactor is an inertial link with time delay. The state-space model of the polymerization reactor is usually written as [6]:

$\dot x_1(t) = x_2(t)$
$\dot x_2(t) = a_1 x_1(t) + a_2 x_2(t) + b\,u(t)$
$y(t) = x_1(t)$

External disturbance and time delay are unavoidable. We consider the nonlinear delay system with norm-bounded uncertainties as follows:

$\dot x(t) = A_i x(t) + A_{di}x(t-d) + B_i u(t)$
$x(t) = \psi(t),\quad -d^* \le t \le 0$

where
$A_1 = \begin{bmatrix} -30 & 0 \\ 0 & -20\end{bmatrix}$, $A_2 = \begin{bmatrix} 3 & 12 \\ 1 & 0\end{bmatrix}$, $A_{d1} = \begin{bmatrix} -2 & 0.5 \\ 0.5 & -2\end{bmatrix}$, $A_{d2} = \begin{bmatrix} -3 & 1 \\ 0.1 & -1\end{bmatrix}$, $B_1 = \begin{bmatrix} 1 \\ 2\end{bmatrix}$, $B_2 = \begin{bmatrix} 0 \\ 1\end{bmatrix}$, $\psi(t) = \begin{bmatrix} 1 \\ -1\end{bmatrix}$,
$d = 0.2$, $\tau = 0.5$.

Solving the LMIs (17)-(22), the controller gain matrices are obtained as
$K_1 = [3.4529\ \ 1.6837]$, $K_2 = [8.6183\ \ {-4.3602}]$

With the state feedback controller (5) in Theorem 2, and choosing the initial condition $\psi(t) = [2\ \ {-0.5}]^T$, the simulation results are shown in Figures 2-3.
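The following sketch (not the authors' simulation code) shows one way to reproduce a closed-loop response of system (6) with the matrices and gains above. The constant membership grades mu = [0.5, 0.5], the zero disturbance, and the forward-Euler integration with history buffers are assumptions made only for illustration.

```python
# Minimal closed-loop simulation sketch for the T-S fuzzy NCS (6) of this example.
# Assumptions: constant memberships mu_i = 0.5 (premise variables are not given
# here), omega(t) = 0, and forward-Euler integration of the delayed dynamics.
import numpy as np

A  = [np.array([[-30.0, 0.0], [0.0, -20.0]]), np.array([[3.0, 12.0], [1.0, 0.0]])]
Ad = [np.array([[-2.0, 0.5], [0.5, -2.0]]),   np.array([[-3.0, 1.0], [0.1, -1.0]])]
B  = [np.array([[1.0], [2.0]]),               np.array([[0.0], [1.0]])]
K  = [np.array([[3.4529, 1.6837]]),           np.array([[8.6183, -4.3602]])]  # gains above

d, tau, h, T_end = 0.2, 0.5, 1e-3, 5.0
mu = np.array([0.5, 0.5])                      # assumed constant membership grades
n_d, n_tau = int(d / h), int(tau / h)

steps = int(T_end / h)
x = np.zeros((steps + 1, 2))
x[0] = [2.0, -0.5]                             # initial condition psi(t)
hist = lambda k: x[max(k, 0)]                  # psi held constant for t < 0

for k in range(steps):
    xk, xd, xt = x[k], hist(k - n_d), hist(k - n_tau)
    dx = np.zeros(2)
    for i in range(2):
        for j in range(2):
            u = K[j] @ xt                      # delayed state feedback u(t - tau)
            dx += mu[i] * mu[j] * (A[i] @ xk + Ad[i] @ xd + (B[i] @ u).ravel())
    x[k + 1] = xk + h * dx

print("x(T) =", x[-1])                         # inspect the final state
```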
Figure 2. State $x_1(t)$ of the closed-loop system.

Figure 3. State $x_2(t)$ of the closed-loop system.

From Figures 2-3 one can see that the closed-loop system is finite-time stable.

4. Conclusion

In this paper, by introducing the Lyapunov approach and a finite-time stability analysis, a finite-time stabilization condition has been obtained for T-S fuzzy networked control systems. Based on this condition, the state feedback fuzzy controller has been designed by using LMIs.

Acknowledgments

This work was supported by Anyang Normal University Innovation Foundation Project
under Grant ASCX/2016-Z113.

References

[1] Y. Xia, Y. Gao, Recent progress in networked control systems-a survey, International Journal of
Automation and Computing, 12(2015), 343-367.
[2] G. Chen, Q. Lin, Finite-time observer based cooperative tracking control of networked large range
systems, Abstract and Applied Analysis, 2014, Article ID 135690.
[3] B. Chen, W. Zhang, Distributed fusion estimation with missing measurements, random transmission
delays and packet dropouts. IEEE Transactions on Automatic Control, 59(2014), 196-1967.
[4] J. Chen, H. Zhu, Finite-time H∞ filtering for a class of discrete-time Markovian jump systems with partly
unknown transition probabilities. International Journal of Adaptive Control and Signal Processing,
28(2014), 1024-1042.
[5] H. Gao, T. Chen, J. Lam, A new delay system approach to network-based control, Automatica, 44(2008),
39-52.
[6] G. C. Walsh, H. Ye, L G. Bushnell, Stability analysis of networked control systems, IEEE Trans on
Control Systems Technology, 10(2002), 438-446.
[7] S. Hu, Q. Zhu, Stochastic optimal control and analysis of stability of networked control systems with
long delay, Automatica, 39(2003),1877–1884.
[8] D. Yue, Q. L. Han, and J. Lam, Network-based robust H∞ control of a system with uncertainty,
Automatica, 4(2005), 999- 1007.
[9] Z. H. Guan, J. Huang, G. R. Chen, Stability Analysis of Networked Impulsive Control Systems, Proc. 25th
Chinese Control Conference, 2006, 2041-2044.
[10] Y. Tian, Z. Yu, Multifractal nature of network induced time delay in networked control systems,
Physics Letter A, 361(2007), 103-107.
[11] G. C. Walsh, O. Beldiman, L. G. Bushnell, Asymptotic behavior of nonlinear networked control
systems, IEEE Transactions on Automatic Control, 46(2001), 1093–1097.
[12] D. Nesic, Observer design for wired linear networked control systems using matrix inequalities,
Automatica, 44(2008), 2840-2848.
[13] S. He, H. Xu, Non-fragile finite-time filter design for time-delayed Markovian jumping systems via T-S
fuzzy model approach, Nonlinear Dynamic, 80(2015), 1159-1171.
[14] D. Huang, S. Kiong, State feedback control of uncertain networked control systems with random time
delays, IEEE Transactions on Automatic Control, 53(2008), 829-834.
[15] F. Amato, M. Ariola, P. Dorate, Finite-time stabilization via dynamic output feedback, Automatica,
42(2006), 337-342.
[16] F. Amato, M. Ariola, C, Cosentino, Finite-time control of discrete- time linear systems: Analysis and
design conditions, Automatica, 46(2010), 919-924.
94 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-94

A Trapezoidal Fuzzy Multiple Attribute


Decision Making Based on Rough Sets
Zhi-Ying LVa,b,1, Ping HUANGb, Xian-Yong ZHANGc,d and Li-Wei ZHENGe
a College of Mathematics, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
b College of Management, Chengdu University of Information Technology, Chengdu, Sichuan, China
c College of Mathematics and Software Science, Sichuan Normal University, Chengdu, Sichuan, China
d Institute of Intelligent Information and Quantum Information, Sichuan Normal University, Chengdu, Sichuan, China
e College of Applied Mathematics, Chengdu University of Information Technology, Chengdu, Sichuan, China

Abstract. Fuzzy multiple attribute decision making (FMADM) is an efficient way


to solve complex systems and has wide practical application. This paper studies
the FMADM of trapezoidal fuzzy numbers. In order to achieve desirable decision
making, a similarity measure between two trapezoidal fuzzy numbers is defined,
based on a new method for ranking fuzzy numbers. A new algorithm is proposed
to remove surplus attributes; this algorithm is based on rough sets and the
technique for order of preference by similarity to ideal solution (TOPSIS).
Finally, an example is examined to demonstrate the model's use in practical
problems.

Keywords. FMADM, trapezoidal fuzzy number, centroid points, attribute reduc-


tion, rough sets

Introduction

The main idea of multiple attribute decision making (MADM) problems is to rank the
alternatives or choose the optimal solution. However, the available information is often
imprecise or vague. In this case, a better solution is to use fuzzy numbers. Fuzzy theory
[1] is able to address many decision problems that experts and decision makers struggle
to respond to because of a lack of information. Over the years, many theories and appli-
cations have been proposed for solving FMADM problems [2-3]. To deal with these
fuzzy situations, experts are usually encouraged to use the trapezoidal fuzzy number,
which subsumes the triangular number and the interval number. At the same time, rank-
ing fuzzy numbers [4-5] is very important in real-time decision-making applications.
Therefore, there is a need for a procedure which can rank fuzzy numbers in more conditions.

1
Corresponding Author: Zhi-Ying Lv, College of Mathematics, University of Electronic Science and
Technology of China, Chengdu 611731, China; College of Management, Chengdu University of Information
Technology; E-mail: lvZhiying1979@163.com.

Ref. [6] gives a way to rank trapezoidal fuzzy numbers based on the circumcenter
of centroids. This is a very practical method, which can incorporate the importance of
using the mode and spreads of fuzzy numbers.
Studies have found that correlations among the attributes seriously affect the scientific
objectivity and fairness of the evaluation, so attribute reduction [7-8] is an essential
subject in MADM. Usually, rough set theory is a useful tool to study attribute
reduction problems. This theory was initiated by Pawlak in 1982 [9]. However, few stud-
ies have been conducted on the problem of attribute reduction in fuzzy decision making.
In this paper, a new FMADM method is presented, in which the distance between
two trapezoidal fuzzy numbers is defined and a fuzzy-number attribute reduction meth-
od based on the TOPSIS method and rough sets [10] is proposed.

1. Preliminaries

In this section, we give the concepts of rough sets and trapezoidal fuzzy numbers and
their extensions.

1.1. Pawlak Rough Sets

An approximation space $apr = (U, R)$ is defined by a universe $U$ and a relation $R$, where $U$ is a finite set of objects and $R$ is an equivalence relation on $U$; the equivalence class containing $x$ is denoted by $[x]_R$.

Let $S = (U, C, V, f)$ be an information system, where $C$ is the set of attributes, $V$ is the domain of attribute values, $V = \bigcup_{c\in C} V_c$, where $V_c$ is a nonempty set of values of attribute $c \in C$ called the domain of $c$, and $f: U \times C \to V$ is an information function that maps an object in $U$ to exactly one value in $V_c$, such that $\forall c \in C,\ \forall x \in U,\ f(x, c) \in V_c$.

For $B \subseteq C$, denote $[x]_{R_B} = \{y \in U \mid (f(x,b), f(y,b)) \in R,\ \forall b \in B\}$ and $R_B = \{[x]_{R_B} : x \in U\}$; that is, $R_B$ is the set of equivalence classes induced by $B$. A subset $B \subseteq C$ determines the lower and upper approximations of $X \subseteq U$, defined as:

$\underline{apr}(X) = \{x \in U \mid [x]_{R_B} \subseteq X\}$ and $\overline{apr}(X) = \{x \in U \mid [x]_{R_B} \cap X \neq \emptyset\}$

and the approximation quality is $r_B^U = |\underline{apr}(U)| / |U|$.

Because $r_C^U = 1$, if there exists $c_k \in C$ such that $r^U_{C-\{c_k\}} = 1$, then $c_k$ is a dispensable (superfluous) attribute and $C - \{c_k\}$ is a reduction of $C$; otherwise $c_k$ is indispensable. The set of all indispensable attributes is the core of $C$, denoted by $core(C)$.
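As a brief illustration of these notions, the following sketch (not from the paper; the information table is hypothetical) computes the equivalence classes induced by an attribute subset B, the approximation quality, and a dispensability check for each attribute.

```python
# Illustrative sketch of Pawlak rough-set notions: equivalence classes induced
# by an attribute subset B, the approximation quality r_B^U, and a check of
# whether each attribute is dispensable. Data below is hypothetical.
from itertools import groupby

def classes(table, B):
    """Partition object indices by their value tuple on the attributes in B."""
    key = lambda i: tuple(table[i][b] for b in B)
    idx = sorted(range(len(table)), key=key)
    return [set(g) for _, g in groupby(idx, key=key)]

def quality(table, B, C):
    """r_B^U relative to the partition induced by the full attribute set C."""
    ref = {i: blk for blk in classes(table, C) for i in blk}
    lower = [i for blk in classes(table, B) for i in blk if blk <= ref[i]]
    return len(lower) / len(table)

table = [(1, 0, 2), (1, 0, 2), (0, 1, 2), (0, 1, 1)]   # objects x attributes c0..c2
C = [0, 1, 2]
for ck in C:
    B = [c for c in C if c != ck]
    r = quality(table, B, C)
    print(f"c{ck}: r = {r:.2f}", "dispensable" if r == 1.0 else "indispensable")
```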

1.2. Trapezoidal Fuzzy Number

Below, we briefly review the definition of the trapezoidal fuzzy number and the rank-
ing method.
Definition 1. The membership function of a trapezoidal fuzzy number $\tilde P = (a, b, c, d; \omega)$ is given by:

$\mu_{\tilde P}(t) = \begin{cases} \dfrac{\omega(t-a)}{b-a}, & a \le t \le b \\ \omega, & b \le t \le c \\ \dfrac{\omega(d-t)}{d-c}, & c \le t \le d \\ 0, & \text{otherwise}\end{cases}$

where $-\infty < a \le b \le c \le d < +\infty$ and $0 \le \omega \le 1$. If $\omega = 1$, then $\tilde P$ is normalized and can be denoted by $\tilde P = (a, b, c, d)$, as shown in Figure 1.

Figure 1. Trapezoidal fuzzy number


We may see a trapezoidal fuzzy number as a trapezoid, which can be divided into
three plane figures. These figures are two triangles (APB and CQD) and a rectangle
(BPQC). Suppose G1, G2 , G3 are the centroids of these figures, which can form a new
triangle ( G1, G2 , G3 ).
Now, we give the definition of the circumcenter of the trapezoidal fuzzy number.
Definition 2 [6]. Let $\tilde p = (a, b, c, d; \omega)$ be a generalized trapezoidal fuzzy number. The circumcenter $S_{\tilde p}(\tilde x_0, \tilde y_0)$ of the triangle $(G_1, G_2, G_3)$ is defined as:

$S_{\tilde p}(\tilde x_0, \tilde y_0) = \left(\dfrac{a + 2b + 2c + d}{6},\ \dfrac{(2a + b - 3c)(2d + c - 3b) + 5\omega^2}{12\omega}\right)$  (1)
Definition 3 [6]. Based on the circumcenter of centroids $S_{\tilde p}(\tilde x_0, \tilde y_0)$, the ranking function of the fuzzy number is defined as:

$R(\tilde p) = (\tilde x_0)_{\tilde p} \cdot (\tilde y_0)_{\tilde p}$  (2)

This represents the area of the rectangle formed by $S_{\tilde p}(\tilde x_0, \tilde y_0)$ and the origin. As the value of $R(\tilde p)$ increases, so does the fuzzy number $\tilde p$. We can define the distance between two normalized trapezoidal fuzzy numbers according to the distance between their circumcenters of centroids, because these points can be considered better balancing points for the trapezoidal fuzzy numbers.
Definition 4. Let $\tilde P_1 = (a_1, b_1, c_1, d_1)$ and $\tilde P_2 = (a_2, b_2, c_2, d_2)$ be two normalized trapezoidal fuzzy numbers, and let $S_{\tilde P_1}(\tilde x_0^1, \tilde y_0^1)$ and $S_{\tilde P_2}(\tilde x_0^2, \tilde y_0^2)$ be the circumcenters of the centroids of $\tilde P_1$ and $\tilde P_2$, respectively. Then the distance between $\tilde P_1$ and $\tilde P_2$ is defined by

$d(\tilde P_1, \tilde P_2) = \sqrt{(\tilde x_0^1 - \tilde x_0^2)^2 + (\tilde y_0^1 - \tilde y_0^2)^2}$  (3)
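As an illustration of Definitions 2-4, the following short sketch (not part of the paper) computes the circumcenter of centroids, the ranking value and the circumcenter-based distance for normalized trapezoidal fuzzy numbers; it assumes ω = 1 and uses, as a check, the first entry of the decision matrix in Section 3.

```python
# Sketch of Eqs. (1)-(3): circumcenter of centroids, ranking value, distance.
# Assumes normalized trapezoidal numbers (omega = 1).
from math import sqrt

def circumcenter(p, w=1.0):
    a, b, c, d = p
    x0 = (a + 2 * b + 2 * c + d) / 6.0
    y0 = ((2 * a + b - 3 * c) * (2 * d + c - 3 * b) + 5 * w ** 2) / (12.0 * w)
    return x0, y0

def rank_value(p):                 # Eq. (2): x0 * y0
    x0, y0 = circumcenter(p)
    return x0 * y0

def distance(p1, p2):              # Eq. (3): Euclidean distance of circumcenters
    (x1, y1), (x2, y2) = circumcenter(p1), circumcenter(p2)
    return sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

p11 = (0.7, 0.72, 0.75, 0.8)       # first entry of the decision matrix in Section 3
print(circumcenter(p11))           # approx. (0.7400, 0.4146)
print(round(rank_value(p11), 4))   # approx. 0.3068
```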

2. Fuzzy Multiple Attribute Decision Method Based on Rough Sets

2.1. Problem Description

Suppose $X = \{x_1, x_2, \ldots, x_n\}$ is an alternative set. Alternatives are assessed on $m$ attributes. Denote the set of all attributes by $C = \{c_1, c_2, \ldots, c_m\}$. Assume the weight vector of the attributes is $\omega = (\omega_1, \omega_2, \ldots, \omega_m)^T$, such that $\sum_{j=1}^{m}\omega_j = 1$, where $\omega_j \ge 0$ and $\omega_j$ denotes the weight of attribute $c_j$. Suppose $\tilde P = (\tilde p_{ij})_{n\times m}$ is the trapezoidal fuzzy decision matrix given by the expert, where $\tilde p_{ij} = (a_{ij}, b_{ij}, c_{ij}, d_{ij})$ is the attribute value of alternative $x_i$ with respect to attribute $c_j \in C$.

2.2. Decision Method

Given the fuzzy and rough theories described above, the proposed FMADM procedure is defined as follows:

Step 1. Construct the circumcenter-of-centroids matrix $O = ((x_{ij}, y_{ij}))$ of $\tilde P$.

Step 2. Construct the value matrix $Q = (q_{ij})$ of $\tilde P$.

Step 3. Determine the positive ideal and negative ideal solutions using the following steps:

$\tilde p_j^+ = \{\tilde p_{ij} : i \in N,\ q_{ij} = \max_{i\in N} q_{ij}\}$ and $\tilde p_j^- = \{\tilde p_{ij} : i \in N,\ q_{ij} = \min_{i\in N} q_{ij}\}$  (4)

Then,

$A^P = \{p_1^+, p_2^+, \ldots, p_m^+\}$ and $A^N = \{p_1^-, p_2^-, \ldots, p_m^-\}$  (5)

Step 4. The distances between $\tilde p_{ij}$ and the positive and negative ideal values are defined as:

$d_{ij}^+ = d(\tilde p_{ij}, \tilde p_j^+) = \sqrt{(x_j^+ - x_{ij})^2 + (y_j^+ - y_{ij})^2}$, $d_{ij}^- = d(\tilde p_{ij}, \tilde p_j^-) = \sqrt{(x_j^- - x_{ij})^2 + (y_j^- - y_{ij})^2}$  (6)

where $(x_j^+, y_j^+)$ and $(x_j^-, y_j^-)$ are the circumcenters of the centroids of $p_j^+$ and $p_j^-$, respectively. Then calculate the similarity degrees $t_{ij}$ between $\tilde p_{ij}$ and the ideal solution and construct the matrix $T = (t_{ij})_{n\times m}$, where

$t_{ij} = \dfrac{d_{ij}^-}{d_{ij}^+ + d_{ij}^-}$  (7)

Step 5. Construct a judgment matrix $M = (m_{ij})_{n\times m}$ from $T = (t_{ij})_{n\times m}$, where

$m_{ij} = \begin{cases} 0, & 0 \le t_{ij} < 0.3 \\ 1, & 0.3 \le t_{ij} < 0.6 \\ 2, & 0.6 \le t_{ij} \le 1\end{cases}$  (8)
Step 6. Let $S = (U, C, V, f)$ be the information system and construct the equivalence relation $R_B$ for $B \subseteq C$: for $x_i \in U$, $[x_i]_{R_B} = \{x_k : m_{kj} = m_{ij},\ \forall c_j \in B\}$, so $R_B = \{[x_i]_{R_B} : i \in N\}$. The lower approximation of $U$ with respect to $B$ is defined by $\underline{apr}_B(U) = \{x_i : [x_i]_{R_B} \subseteq [x_i]_{R_C},\ i \in N\}$, and the approximation quality is $r_B^U = |\underline{apr}_B(U)| / |U|$. Because $r_C^U = 1$, if there exists $c_k \in C$ such that $r^U_{C-\{c_k\}} = 1$, then $c_k$ is a superfluous attribute and can be removed.

Step 7. Given the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_t)$ of the set of all non-superfluous attributes, calculate the values of all alternatives:

$d_i = \sum_{j=1}^{t}\omega_j t_{ij} \quad (i = 1, 2, \ldots, n)$  (9)

Then choose the best alternative based on the ranking value of $d_i$.
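A compact sketch of Steps 3-7 for a generic problem instance is given below; it is not the authors' implementation. The function names, the random example data and the choice of which attribute is dropped are hypothetical, while the thresholds follow Eq. (8).

```python
# Sketch of Steps 3-7: similarity degrees (Eqs. 6-7), judgment matrix (Eq. 8),
# and weighted alternative scores (Eq. 9) after dropping superfluous attributes.
import numpy as np

def similarity(O, Q):
    """O: n x m x 2 circumcenters, Q: n x m ranking values -> T = (t_ij)."""
    n, m = Q.shape
    T = np.zeros((n, m))
    for j in range(m):
        pos, neg = O[np.argmax(Q[:, j]), j], O[np.argmin(Q[:, j]), j]
        for i in range(n):
            dp = np.linalg.norm(pos - O[i, j])
            dn = np.linalg.norm(neg - O[i, j])
            T[i, j] = dn / (dp + dn) if dp + dn > 0 else 1.0
    return T

def judgment(T):
    return np.digitize(T, [0.3, 0.6])          # 0, 1 or 2 as in Eq. (8)

def scores(T, keep, weights):
    return T[:, keep] @ np.asarray(weights)    # Eq. (9)

# Hypothetical tiny example with 3 alternatives and 3 attributes:
O = np.random.rand(3, 3, 2)                    # circumcenters from Eq. (1)
Q = O[..., 0] * O[..., 1]                      # ranking values from Eq. (2)
T = similarity(O, Q)
print(judgment(T))
print(scores(T, keep=[0, 2], weights=[0.6, 0.4]))   # attribute 1 assumed superfluous
```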

3. An Application Analysis of the Proposed Method

In this section, we present an example to show how the given model works in practice. The fuzzy multiple attribute decision problem with trapezoidal fuzzy numbers involves a company making an investment decision. Let us consider an investment company which wants to make the best investment decision for a given sum of money.
There is a panel of eight possible alternatives $U = \{x_1, x_2, \ldots, x_8\}$ in which the company can invest. Each alternative is assessed on six attributes $C = \{c_1, c_2, \ldots, c_6\}$. The decision makers compare these eight companies with respect to the attributes and construct the decision matrix $\tilde P = (\tilde p_{ij})_{8\times 6}$, whose rows (alternatives $x_1, \ldots, x_8$) are:

x1: (0.7,0.72,0.75,0.8)  (0.4,0.45,0.6,0.63)  (0.7,0.72,0.82,0.9)  (0.5,0.5,0.64,0.72)  (0.18,0.19,0.2,0.21)  (0.09,0.1,0.14,0.17)
x2: (0.54,0.57,0.59,0.6) (0.5,0.52,0.6,0.63)  (0.5,0.62,0.62,0.7)  (0.5,0.5,0.54,0.6)   (0.18,0.19,0.2,0.21)  (0.09,0.09,0.098,0.1)
x3: (0.7,0.73,0.78,0.79) (0.5,0.52,0.6,0.63)  (0.6,0.72,0.8,0.9)   (0.8,0.85,0.9,0.92)  (0.21,0.23,0.25,0.27) (0.1,0.1,0.15,0.2)
x4: (0.6,0.63,0.66,0.73) (0.4,0.45,0.6,0.63)  (0.7,0.72,0.86,0.9)  (0.44,0.5,0.66,0.7)  (0.17,0.18,0.18,0.19) (0.09,0.12,0.15,0.18)
x5: (0.72,0.75,0.77,0.8) (0.7,0.73,0.81,0.83) (0.7,0.72,0.8,0.83)  (0.7,0.7,0.74,0.8)   (0.19,0.21,0.24,0.26) (0.1,0.16,0.18,0.22)
x6: (0.54,0.57,0.59,0.6) (0.4,0.46,0.5,0.56)  (0.7,0.75,0.8,0.92)  (0.4,0.5,0.54,0.62)  (0.18,0.19,0.2,0.21)  (0.1,0.12,0.13,0.13)
x7: (0.6,0.63,0.69,0.71) (0.5,0.52,0.7,0.74)  (0.41,0.45,0.5,0.51) (0.44,0.5,0.66,0.7)  (0.18,0.19,0.21,0.23) (0.12,0.18,0.21,0.22)
x8: (0.72,0.75,0.77,0.8) (0.5,0.52,0.6,0.63)  (0.71,0.72,0.86,0.9) (0.7,0.7,0.74,0.8)   (0.19,0.21,0.24,0.26) (0.1,0.16,0.18,0.22)
Step 1. Using Eq. (1), construct the circumcenter-of-centroids matrix $O = ((x_{ij}, y_{ij}))$, whose rows are:

x1: (0.7400,0.4146) (0.5217,0.3933) (0.7800,0.4036) (0.5833,0.3964) (0.2267,0.4161) (0.1233,0.4146)
x2: (0.5767,0.4159) (0.5617,0.4907) (0.6133,0.4135) (0.5300,0.4143) (0.1950,0.4165) (0.0943,0.4166)
x3: (0.7517,0.4137) (0.5617,0.4907) (0.7567,0.3991) (0.8700,0.4127) (0.2400,0.4158) (0.1333,0.4135)
x4: (0.6517,0.4138) (0.5217,0.3933) (0.7933,0.3975) (0.5767,0.3887) (0.1800,0.4166) (0.1350,0.4148)
x5: (0.7600,0.4155) (0.7683,0.4097) (0.7617,0.4097) (0.7300,0.4143) (0.2250,0.4153) (0.1667,0.4146)
x6: (0.5767,0.4159) (0.4800,0.4119) (0.7867,0.4085) (0.5167,0.4092) (0.1950,0.4165) (0.1217,0.4165)
x7: (0.6583,0.4123) (0.6133,0.3867) (0.4700,0.4134) (0.5767,0.3887) (0.2017,0.4160) (0.1867,0.4147)
x8: (0.7600,0.4155) (0.5617,0.4097) (0.7950,0.3983) (0.7300,0.4143) (0.2250,0.4153) (0.1667,0.4146)
Step 2. Based on Eq. (2), construct the value matrix $Q = (q_{ij})_{8\times 6}$ of $\tilde P$ as follows:

x1: 0.3068 0.2052 0.3148 0.2312 0.0943 0.0511
x2: 0.2398 0.2756 0.2536 0.2196 0.0812 0.0393
x3: 0.3110 0.2756 0.3020 0.3590 0.0999 0.0551
x4: 0.2697 0.2052 0.3153 0.2242 0.0750 0.0560
x5: 0.3158 0.3148 0.3121 0.3024 0.0934 0.0691
x6: 0.2398 0.1977 0.3214 0.2114 0.0812 0.0507
x7: 0.2714 0.2372 0.1943 0.2242 0.0839 0.0774
x8: 0.3158 0.2301 0.3166 0.3024 0.0934 0.0691
Step 3. Based on Eqs. (4)-(5), determine the positive ideal and negative ideal solutions:

$A^P$ = {(0.72,0.75,0.77,0.8), (0.7,0.73,0.81,0.83), (0.71,0.72,0.86,0.9), (0.8,0.85,0.9,0.92), (0.21,0.23,0.25,0.27), (0.12,0.18,0.21,0.22)}
$A^N$ = {(0.54,0.57,0.59,0.6), (0.4,0.46,0.5,0.56), (0.41,0.45,0.5,0.51), (0.4,0.5,0.54,0.62), (0.17,0.18,0.18,0.19), (0.09,0.09,0.098,0.1)}
Step 4. Construct the similarity degree matrix $T = (t_{ij})_{8\times 6}$ based on Eqs. (6)-(7) as follows:

x1: 0.8908 0.1559 0.9512 0.1910 0.7783 0.3144
x2: 0      0.2835 0.4401 0.0402 0.2500 0
x3: 0.9537 0.2835 0.8823 1      1      0.4228
x4: 0.4092 0.1559 0.9942 0.1773 0      0.4407
x5: 1      1      0.8923 0.6038 0.7500 0.7836
x6: 0      0      0.9601 0      0.2500 0.2965
x7: 0.4453 0.4640 0      0.1773 0.3618 1
x8: 1      0.2835 1      0.6038 0.7500 0.7836
Step 5. Based on Eq. (8), construct the judgment matrix $M = (m_{ij})_{8\times 6}$:

x1: 2 0 2 0 2 1
x2: 1 0 1 0 0 0
x3: 2 0 2 2 2 1
x4: 1 0 2 0 0 0
x5: 2 2 2 2 2 2
x6: 0 0 2 0 0 0
x7: 1 1 0 0 1 2
x8: 2 0 2 2 2 2
Step 6. Compute the equivalence classes $R_B$, where $B \subseteq C$:
$R_{C-\{c_1\}}$ = {{x1},{x2},{x3},{x4,x6},{x5},{x7},{x8}},
$R_{C-\{c_2\}}$ = {{x1},{x2},{x3},{x4},{x5,x8},{x6},{x7}},
$R_{C-\{c_3\}}$ = {{x1},{x2,x4},{x3},{x5},{x6},{x7},{x8}},
$R_{C-\{c_4\}}$ = {{x1,x3},{x2},{x4},{x5},{x6},{x7},{x8}},
$R_{C-\{c_5\}}$ = {{x1},{x2},{x3},{x4},{x5},{x6},{x7},{x8}},
$R_{C-\{c_6\}}$ = {{x1},{x2},{x3,x8},{x4},{x5},{x6},{x7}},
$R_{C-\{c_1,c_5\}}$ = {{x1},{x2},{x3},{x4,x6},{x5},{x7},{x8}},
$R_{C-\{c_4,c_5\}}$ = {{x1,x3},{x2},{x4},{x5},{x6},{x7},{x8}},
$R_{C-\{c_3,c_5\}}$ = {{x1},{x2,x4},{x3},{x5},{x6},{x7},{x8}},
$R_{C-\{c_2,c_5\}}$ = {{x1},{x2},{x3},{x4},{x5,x8},{x6},{x7}},
$R_{C-\{c_5,c_6\}}$ = {{x1},{x2},{x3,x8},{x4},{x5},{x6},{x7}},
$R_C$ = {{x1},{x2},{x3},{x4},{x5},{x6},{x7},{x8}}.
Thus $r^U_{C-\{c_5\}} = 1$; therefore $c_5$ is a superfluous attribute and can be removed, and the reduced attribute set is $\{c_1, c_2, c_3, c_4, c_6\}$. So we can delete the fifth column of the matrix $T$.
Step 7. Let $\omega = (0.18, 0.24, 0.16, 0.23, 0.19)$ be the weight vector of $\{c_1, c_2, c_3, c_4, c_6\}$. Then Eq. (9) gives the values of the alternatives as follows:
$d_1 = 0.0511$, $d_2 = 0.0393$, $d_3 = 0.0551$, $d_4 = 0.0560$,
$d_5 = 0.0691$, $d_6 = 0.0507$, $d_7 = 0.0774$, $d_8 = 0.0691$.
Therefore, we can conclude that the most desirable alternative is $x_7$.

4. Conclusion

In this article, a new fuzzy multiple attribute decision making method is proposed, in which the attribute values are trapezoidal fuzzy numbers. An attribute reduction method is proposed based on the distance defined between two trapezoidal fuzzy numbers and rough sets, which can improve the accuracy of the evaluation. In future research, the decision model presented in this paper will be extended to interval type-2 fuzzy values based on Ref. [10].

Acknowledgment

This paper is supported by the National Natural Science Foundation Project of China
(No.61673285; No.61203285; No. 41601141); the Province Department of Soft Sci-
ence Project in Sichuan (2016ZR0095); soft Science Project of the technology bureau
in Chengdu (2015-RK00-00241-ZF); the high level research team of the major projects
division of Sichuan province (Sichuan letter [2015] no.17-5); the Project of Chengdu
University of Information Technology (N0.CRF201508, CRF201615)

References

[1] L.A. Zadeh, Fuzzy sets. Information and Control, 8(3)(1965):338-353.


[2] Z.Y. Lv, X. N. Liang, X. Z. Liang, L. W. Zheng, A fuzzy multiple attribute decision making method
based on possibility degree, 2015 12th International Conference on Fuzzy Systems and Knowledge
Discovery, January 13, 2016, 450-454.
[3] D.F. Li, Multiple attribute decision making method using extended linguistic variables. International
Journal Uncertain Fuzziness Known Based System, 17(2009): 793-806.
[4] G. Facchinetti, R.G. Ricci and S. Muzzioli, Note on fuzzy ranking Triangular numbers. International
Journal of Intelligent Systems, 13(1998):613-622.
[5] Z.S. Xu and Q.L. Da, Possibility degree method for ranking internal numbers and its applications.
Journal of Systems and Engineering, 18(1)(2003):67-70.

[6] P.B. Rao and N.R. Shanker, Ranking fuzzy numbers with an area method using circumcenter of centroids.
Fuzzy Information and Engineering, 1( 2013): 3-18
[7] Z.Y. Lv, T. M. Huang and F.X. Jin. Fuzzy multiple attribute lattice decision making method based on the
elimination of redundant similarity index. Mathematics in Practice and Theory, 43(10)(2013):173-
181
[8] X.Y. Zhang and D.Q. Miao, Quantitative/qualitative region-change uncertainty/certainty in attribute
reduction, Information Sciences, 334-335(2016):174--204.
[9] Z. Pawlak, Rough Sets. International Journal of Computer and Information Science, 11(1982):34-356.
[10] L. Dymova, P. Sevastjanov and A. Tikhonenko, An interval type-2 fuzzy extension of the TOPSIS
method using alphacuts. Knowledge-based Systems, 83(2015):116-127.
102 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-102

Fuzzy Rule-Based Stock Ranking Using


Price Momentum and Market
Capitalization
Ratchata PEACHAVANISH1
Department of Computer Science, Thammasat University, Pathum Thani, Thailand

Abstract. Stock market investing is an inherently risky and imprecise activity,


requiring complex decision making under uncertainty. This paper proposes a
method that applies fuzzy rule-based inference to rank stocks based on price
momentum and market capitalization. Experiments performed on Thai stock
market data showed that high-momentum stocks significantly outperformed the
market index benchmark, and that stocks of companies with small market
capitalization performed better than larger ones. Fuzzy rule-based inference was
applied to combine both the momentum factor and the market capitalization factor,
with different sets of rules for different prevailing market conditions. The result
produced a higher investment return than using either momentum or market
capitalization alone.

Keywords. fuzzy, stock, finance, technical analysis, momentum.

Introduction

Stock market investing is a high-risk activity with a potentially high reward, requiring
complex decision making based on imprecise and incomplete information under
uncertainty. Typically, two analytical approaches are utilized in investment decision
making: fundamental analysis and technical analysis. Decisions based on fundamental
analysis primarily consider the business entity represented by a stock. The information
under consideration includes the nature of the business, its profitability, its
competitiveness, and most importantly its financial standing through detailed study of
its financial statements. For technical analysis, a stock is treated separately from the
business entity. Only stock price movements and patterns generated by them are used
in making trading decisions. Technical analysis views price movements as being
governed by supply and demand of market participants and aims to exploit them.
This paper proposes a technical analysis-based method that applies fuzzy rule-
based inference on stock price momentum and market capitalization (company size),
with different sets of rules for different prevailing market conditions. The method was
tested on the Stock Exchange of Thailand.

1
Corresponding Author: Ratchata PEACHAVANISH, Department of Computer Science, Thammasat
University, Pathum Thani, Thailand; E-mail: rp@cs.tu.ac.th.

1. Related Works

There is a large and diverse body of research literature on computerized stock market
investing. Techniques in soft computing, fuzzy logic, machine learning, and traditional
data mining have been applied to address various aspects of stock trading, utilizing
both fundamental analysis and technical analysis. Support vector machine and genetic
algorithm were applied on business financial data to perform stock selection that can
outperform market benchmark [1, 2]. Fuzzy logic was applied on stock price
movements to market time stock trades [3], to create a new technical indicator that
incorporated investor risk tendency [4], and to assist in portfolio management [5, 6].
Machine learning experiments on technical analysis-based trading conducted by [7] did
not outperform the market benchmark when using transaction costs. In addition, using
sentiment data obtained from social networks to assist in stock market investing has
also been attempted [8, 9]. A recent comprehensive review of works using evolutionary
computing methods can be found in [10].
Stock markets in different regions have different rules and characteristics. Highly-
developed and efficient markets, such as the New York Stock Exchange, differ greatly
from emerging markets like the Stock Exchange of Thailand. In smaller markets,
extreme price movements are more common as a few well-funded participants can
dictate the market direction in the short term and affect market volatility. This is
especially true for market participants that are classified as foreign fund flows [11].
Lack of regulation and enforcement against insider trading in emerging markets like
Thailand also makes market inefficient and unfair [12]. These differences make
comparisons among research studies difficult. A working strategy under one market
environment may not be effective in another. Nevertheless, the industry-standard way
of judging an investment strategy is to compare the investment return against the
market index benchmark. Most mutual funds, in the long term, failed to outperform the
market [13]. The method proposed in this paper provides a superior investment return
to the market index. It is described in the next section.

2. Method

The strategy proposed in this paper is based on a key technical analysis principle: price
moves in trend and has momentum. This momentum effect, which implies that stock
price tends to continue on its current direction due to inertia, has been observed in
stock markets [14, 15]. Price reversal then occurs after the momentum weakens.
According to this principle, buying stocks with strong upward momentum is likely to
give superior result to buying stocks with weaker or downward momentum. The
strategy is then to make trading decisions based on a technical indicator that reflects
stock price momentum, which by definition is computed from past price series. This
reactive approach therefore makes no attempt to explicitly forecast future prices, but
rather takes actions based on past price behavior.
Additionally, past evidence suggested that company’s market capitalization, or its
size, also determines the characteristic of its stock return [16]. In general, stocks of
small companies (the so called “small caps” stocks) tend to be far more volatile than
those of large, established companies (“big caps”). This is simply due to the tendency
for small companies to grow faster, albeit with higher risk. During a bull market, small-

cap stocks as a group far outperform big-cap stocks. On the other hand, investors prefer
the relative safety of big-cap stocks during an economic downturn or a bear market.
To see how trading using momentum and market capitalization can provide
addition returns above the market index, experiments were performed on the Thai
stocks spanning January 2012 to July 2016. The pool of stocks for the experiments
comprised all constituents of the Stock Exchange of Thailand’s SET100 index. These
stocks are the 100 largest and most liquid stocks in the market (SET100 members are
updated semiannually). These relatively large stocks are considered investment grade
and are least susceptible to manipulations. The daily closing price data of the stocks
were obtained from the SETSMART system [17]. The experiments were conducted
using a custom-written software implemented in the C# language and Microsoft SQL
Server.
The momentum indicator used in the experiment was the Relative Strength Index
(RSI) [18], a standard technical indicator widely-used by stock traders for measuring
the strength of stock price movements. The RSI is a bounded oscillating indicator
calculated using past n-period closing price data series (1).

$RSI = 100 - \dfrac{100}{1 + U_n / D_n}$

$U_n = \dfrac{U_{n-1}(n-1) + u_n}{n}, \qquad D_n = \dfrac{D_{n-1}(n-1) + d_n}{n}$  (1)

$u_n = \begin{cases} close_n - close_{n-1}, & \text{if } close_n > close_{n-1} \\ 0, & \text{if } close_n \le close_{n-1}\end{cases} \qquad d_n = \begin{cases} close_{n-1} - close_n, & \text{if } close_{n-1} > close_n \\ 0, & \text{if } close_{n-1} \le close_n\end{cases}$

The RSI is effectively a ratio of average gain to average loss during a given past n
consecutive trading periods. An RSI value is bounded between 0 and 100 where a value
higher than 50 indicates an upward momentum and a value lower than 50 indicates a
downward momentum. An extreme value on either end indicates an overbought or an
oversold condition, often used by traders to identify point of price reversal. For this
experiment, the 60-day RSI was chosen.
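A minimal sketch of the n-period Wilder RSI of Eq. (1) is given below; it is not the paper's C# implementation. Seeding the smoothed averages with a simple mean over the first n changes is an assumption, and the price series shown is synthetic.

```python
# Sketch of the n-period Wilder RSI (Eq. 1) over a plain list of closing prices.
def wilder_rsi(closes, n=60):
    gains  = [max(c1 - c0, 0.0) for c0, c1 in zip(closes, closes[1:])]
    losses = [max(c0 - c1, 0.0) for c0, c1 in zip(closes, closes[1:])]
    if len(gains) < n:
        raise ValueError("need at least n+1 closing prices")
    # assumed initialization: simple means of the first n changes
    avg_gain, avg_loss = sum(gains[:n]) / n, sum(losses[:n]) / n
    for g, l in zip(gains[n:], losses[n:]):
        avg_gain = (avg_gain * (n - 1) + g) / n   # Wilder smoothing
        avg_loss = (avg_loss * (n - 1) + l) / n
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

prices = [100 + 0.3 * i + (-1) ** i for i in range(130)]   # synthetic uptrend
print(round(wilder_rsi(prices, n=60), 2))                  # a value above 50
```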
For trading, the portfolio was given 100 million Thai Baht of cash for the initial
stock purchase. The algorithm selected a quartile of 25 stocks from the pool of 100
stocks ranked by 60-day RSI. They were then purchased on an equal weight basis using
all available cash and held on to for 20 trading days (one month). The process was then
repeated – the algorithm chose a new group of stocks and the portfolio was readjusted
to hold on only to them. Trading commission fees at retail rate were incorporated into
the experiments.
Similarly, the same 100 stocks, this time ranked by market capitalization, were
divided into four quartiles for the algorithm to choose from. However, since the weight
distribution of stocks in the market was nonlinear, each of the four quartiles contained
different numbers of stocks: the first quartile comprised the 4 largest stocks in the
market, the second quartile comprised the next 8 largest stocks, the third quartile
comprised the next 16 largest stocks, and the last quartile comprised the remaining 72
stocks. In other words, every quartile weighted approximately the same when the
market capitalizations of its component stocks are summed.

The results of the experiments are shown in Table 1. Monthly trading based on 60-
day RSI momentum indicator significantly outperformed the market index. Small-cap
stocks outperformed big-cap stocks.

Table 1. Portfolio returns based on monthly trading using momentum and market capitalization, compared to
the return of the SET100 market index benchmark.
Group By Momentum By Market Capitalization
First Quartile 126.61 % 9.40 %
Second Quartile 68.82 % 29.82 %
Third Quartile 32.12 % 76.96 %
Fourth Quartile -5.29 % 65.31 %
SET100 40.40 % 40.40 %

Experiments using momentum and market capitalization have provided the basis for
stock selection: buy small-cap stocks with high momentum. However, this strategy
does not work during market downtrend. While small-cap stocks as a group outperform
the market during normal times, they severely underperform during market downtrends
due to their lower liquidity. In addition, stocks with high momentum are indicative of
being overbought and have a much greater chance of sudden and strong price reversal.
Price momentum, company size as measured by market capitalization, and
prevailing market condition are the three dimensions that influence stock price
behavior. Each has inherently vague and subjective degrees of measure and so fuzzy
logic [19] is an appropriate tool to assist in the decision-making process. For the
proposed method, fuzzy rules were constructed based on these three factors with
membership functions shown in Figure 1 and fuzzy rule matrix shown in Figure 2. The
60-day RSI indicator was used to indicate both the momentum of stocks and the
prevailing market condition (bull market is characterized by a high RSI value, and vice
versa). There were three linguistic values expressing the momentum – “Weak”,
“Moderate”, and “Strong”, with a typical non-extreme 60-day RSI value ranging
between 40 and 60. For company size, relative ranking of market capitalization was
used instead of the absolute market capitalization of a company. The largest 50 stocks
out of 100 were considered “Large” and “Mid”, with overlapping fuzzy memberships.
The remaining half was considered “Mid” and “Small”, also with overlapping fuzzy
memberships. For output, there were five levels of stock purchase ratings in linguistic
terms: “Strong Buy” (SB), “Buy” (B), “Neutral” (N), “Sell” (S), and “Strong Sell” (SS),
having overlapping numerical scoring ranges between 0 and 10.

Figure 1. Fuzzy membership functions for momentum as measured by RSI (Weak/Moderate/Strong, breakpoints 40, 45, 50, 55, 60), company size as measured by market capitalization (Small/Mid/Large, breakpoints 10, 30, 50, 70, 90), and purchase rating of a stock (SS/S/N/B/SB, breakpoints 1, 3, 5, 7, 9).
Mamdani-type [20] fuzzy inference was used to determine stock purchase rating.
For each rule, the intersection between antecedents was evaluated. Consequents of

rules were then combined using Root-Sum-Square method and the Center of Gravity
defuzzification process was performed to obtain the final crisp stock purchase rating.
The Fuzzy Framework [21] C# library was used to implement the fuzzy logic rule-
based algorithm.
Figure 2. Fuzzy rules for different market conditions as measured by momentum (RSI): weak market, moderate market, and strong market. Rows give the stock momentum and columns the market capitalization.

Weak market:
  Momentum \ Cap    Small  Mid  Large
  Weak              SS     S    N
  Moderate          SS     N    B
  Strong            SS     B    SB

Moderate market:
  Momentum \ Cap    Small  Mid  Large
  Weak              S      S    S
  Moderate          N      N    N
  Strong            B      B    B

Strong market:
  Momentum \ Cap    Small  Mid  Large
  Weak              N      N    N
  Moderate          N      N    N
  Strong            SB     B    N
During strong market condition, money should be allocated first to small-cap
stocks with strong momentum and second to mid-cap stocks, also with strong
momentum. During weak market condition, small-cap stocks should be avoided and
priority should be given to big-cap stocks with strong momentum. For moderate market
condition, desirability of a stock was decided on its momentum.
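The following sketch illustrates the kind of Mamdani-style rating computation described above; it is not the paper's C# implementation. The triangular membership breakpoints, the encoding of only the strong-market rule panel of Figure 2, and the market-capitalization ranking convention (0 = smallest, 100 = largest) are assumptions made for illustration.

```python
# Illustrative Mamdani-style purchase rating: min for AND, root-sum-square to
# combine rule strengths per output term, discrete centroid defuzzification.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

MOM  = {"Weak": (35, 40, 50), "Moderate": (45, 50, 55), "Strong": (50, 60, 65)}
SIZE = {"Small": (0, 10, 50), "Mid": (10, 50, 90), "Large": (50, 90, 100)}
OUT  = {"SS": (0, 1, 3), "S": (1, 3, 5), "N": (3, 5, 7), "B": (5, 7, 9), "SB": (7, 9, 10)}

STRONG_MARKET_RULES = {          # (momentum, size) -> purchase rating term
    ("Weak", "Small"): "N", ("Weak", "Mid"): "N", ("Weak", "Large"): "N",
    ("Moderate", "Small"): "N", ("Moderate", "Mid"): "N", ("Moderate", "Large"): "N",
    ("Strong", "Small"): "SB", ("Strong", "Mid"): "B", ("Strong", "Large"): "N",
}

def rating(rsi, cap_rank, rules=STRONG_MARKET_RULES):
    strength = {term: [] for term in OUT}
    for (m, s), term in rules.items():
        strength[term].append(min(tri(rsi, *MOM[m]), tri(cap_rank, *SIZE[s])))
    rss = {t: np.sqrt(np.sum(np.square(v))) for t, v in strength.items()}
    xs = np.linspace(0, 10, 501)
    agg = np.max([np.minimum(rss[t], [tri(x, *OUT[t]) for x in xs]) for t in OUT], axis=0)
    return float(np.sum(xs * agg) / np.sum(agg)) if agg.sum() > 0 else 5.0

print(round(rating(rsi=58, cap_rank=20), 2))   # small cap + strong momentum -> high rating
```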
Portfolio readjustments were performed in the same manner as in the previous
experiments. The algorithm chose the top quartile of stocks with the best purchase
ratings computed from the fuzzy rules. The portfolio returned 161.76%, which was better
than the best return from the experiment using momentum alone (126.61%) or market
capitalization alone (76.96%). The fuzzy rule-based approach also outperformed both
the SET100 index benchmark (40.40%) and one of the best actively-managed mutual
funds in the industry ("BTP" by BBL Asset Management Co., Ltd. at 124.43%). The
results are shown in Figure 3.
Figure 3. Investment returns by algorithm: best result from the momentum-only strategy (126.61%), best result from the market capitalization-only strategy (76.96%), and the fuzzy rule-based method (161.76%). Returns of the SET100 index benchmark and the "BTP" mutual fund are shown for comparison.

3. Conclusions and Future Works

This paper proposes a method that uses fuzzy rule-based inference to rank stocks based
on a combination of price momentum, company’s market capitalization, and prevailing
market condition. The method yields superior return to both the market index
benchmark as well as an industry-leading mutual fund. The method can be further

improved in the future by incorporating the ability to hold cash during market
downturns. Additionally, short-term indicators may also be used to detect imminent
weakening or strengthening of momentum – information that is potentially useful in
making trading decisions.

References

[1] H. Yu, R. Chen, and G. Zhang, A SVM stock selection model within PCA, 2nd International Conference on
Information Technology and Quantitative Management, 2014.
[2] C. Huang, A hybrid stock selection model using genetic algorithms and support vector regression,
Applied Soft Computing, 12 (2012), 807-818.
[3] C. Dong, F. Wan, A fuzzy approach to stock market timing, 7th International Conference on Information,
Communications and Signal Processing, 2009.
[4] A. Escobar, J. Moreno, S. Munera, A technical analysis indicator based on fuzzy logic, Electronic Notes
in Theoretical Computer Science 292 (2013), 27-37.
[5] K. Chourmouziadis, P. Chatzoglou, An intelligent short term stock trading fuzzy system for assisting
investors in portfolio management, Expert Systems with Applications, 43 (2016), 298-311.
[6] M. Yunusoglu, H. Selim, A fuzzy rule based expert system for stock evaluation and portfolio
construction: An application to Istanbul Stock Exchange, Expert Systems with Applications, 40 (2013),
908-920.
[7] A. Andersen, S. Mikelsen, A novel algorithmic trading frame-work applying evolution and machine
learning for portfolio optimization, Master’s Thesis, Norwegian University of Science and Technology,
2012.
[8] J. Bollen, H. Mao, X. Zeng, Twitter mood predicts the stock market, Journal of Computational Science, 2
(2011), 1-8.
[9] L. Wang, Modeling stock price dynamics with fuzzy opinion networks, IEEE Transactions on Fuzzy
Systems, (in press).
[10] Y. Hu, K. Liu, X. Zhang, L. Su, E. W. T. Ngai, M. Liu, Application of evolutionary computation for
rule discovery in stock algorithmic trading: a literature review, Applied Soft Computing, 36 (2015), 534-
551.
[11] C. Chotivetthamrong, Stock market fund flows and return volatility, Ph.D. Dissertation, National
Institute of Development Administration, Thailand, 2014.
[12] W. Laoniramai, Insider trading behavior and news announcement: evidence from the Stock Exchange
of Thailand, CMRI Working Paper, Thai Stock Exchange of Thailand, 2013.
[13] C. Mateepithaktham, Equity mutual fund fees & performance, SEC Working Papers Forum, The
Securities and Exchange Commission, Thailand, 2015.
[14] N. Jegadeesh, S. Titman. Returns to buying winners and selling losers: implications for stock market
efficiency, Journal of Finance, 48 (1993), 65-91.
[15] R. Peachavanish, Stock selection and trading based on cluster analysis of trend and momentum
indicators, International MultiConference of Engineers and Computer Scientists, 2016.
[16] T. Bunsaisup, Selection of investment strategies in Thai stock market, Working Paper, Capital Market
Research Institute, Thailand, 2014.
[17] SETSMART (SET market analysis and reporting tool), http://www.setsmart.com.
[18] J. Welles Wilder, New concepts in technical trading systems, Trend Research, 1978.
[19] L. Zadeh, Fuzzy sets, Information and Control, 8 (1965), 338-353.
[20] E. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller,
International Journal of Man-Machine Studies, 7 (1975), 1-13.
[21] Fuzzy Framework, http://www.codeproject.com/Articles/151161/Fuzzy-Framework.
108 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-108

Adaptive Fuzzy Sliding-Mode Control of


Robot and Simulation
Huan NIUa, Jie YANGa,1 and Jie-Ru CHIb
a
School of Electrical Engineering, Qingdao University, Qingdao, Shandong, China
b
School of Electronic and Information Engineering, Qingdao University, Qingdao,
Shandong, China

Abstract. Aiming at the control of a 2-DOF joint robot, a 3D robot model is first
established in ADAMS, and the dynamic equation of the robot is then derived
using the obtained parameters. The dynamic model is combined with the control
system model in MATLAB/Simulink through the ADAMS/Controls module to
establish a co-simulation system. In order to eliminate the effect of the
modeling error and uncertainty signal, a sliding-mode control is proposed. In this
method, a linear sliding surface is used to ensure that the system reaches the
sliding surface in finite time, and fuzzy control is used to compensate for the
modeling error and uncertainty signal. The equivalent control law and switching
control law are derived by using the Lyapunov stability criterion and an exponential
reaching law. The fuzzy control law and membership functions are set up by using
fuzzy control rules. Through online adaptive fuzzy learning, chattering is
weakened. Simulation results show that the control method is effective.

Keywords. Joint Robot, fuzzy control, sliding-mode control, simulation

Introduction

In order to achieve accurate control of multi-joint robot systems subject to modeling
errors and uncertainty signals, many effective methods have been proposed. The
development of robot control theory has gone through three stages: traditional control,
modern control and intelligent control. Traditional control theory mainly includes
PID control, feed-forward control, and so on; modern control theory mainly includes
robust control, sliding-mode control and so on; intelligent control theory mainly
includes fuzzy control, neural network control, adaptive control, etc. [1-2]. Robot
control is divided into point-to-point control (PTP) and trajectory tracking control (or
continuous path control, CP). Point-to-point control only requires that the end effector
of the robot is moved from one point to another without taking into account the motion
trajectory. In robot trajectory tracking control, the driving torque of each joint is chosen
so that the position, velocity and other state variables of the robot track a known ideal
trajectory, and the tracking must be enforced along the entire trajectory [3-6].
In recent years, fuzzy control and sliding-mode control have attracted more and
more attention for their strong robustness. For sliding-mode control, designing a
stable sliding surface ensures that the control system reaches the surface from any
initial state within a finite time and then moves near the equilibrium point on the
surface. However, the problem of chattering still exists in such a control system, and
the upper bound of the modeling error and uncertainty signal must be known in
advance, which is hard to guarantee in actual robot control [7]. Fuzzy control
overcomes these deficiencies and is an effective way to eliminate the chattering of a
sliding-mode control system; its strong adaptive learning capability can also be used
to attenuate the uncertain signal. Therefore, combining sliding-mode control with
fuzzy control is used to implement trajectory tracking control, which ensures the
stability and effectiveness of the control system.
In this paper, the first part introduces the establishment of the 3D model and the
derivation of the dynamic equation of the robot; the second part presents the design of
the sliding-mode control system; the third part presents the design of the fuzzy control
system; the fourth part reports the simulation experiment and simulation results of the
robot control system; a brief summary is given at the end of the paper. These results
have a certain reference value for robot control in the future.

1. Mechanical Virtual Prototype System

Firstly, the 3D model of the robot is established in the ADAMS/View module. The robot
has two arms and realizes 2-DOF rotary motion in the YOZ plane. The length of each
robotic arm is set to 0.225 m and the mass of each arm is set to 0.03 kg, as shown in
Figure 1.

Figure 1. 3D model of robot

The barycenter positions of the two robotic arms are $x_1 = 0.1125$, $y_1 = 0.33$, $z_1 = 0$ and $x_2 = 0.3375$, $y_2 = 0.39$, $z_2 = 0$. The inertial parameters of the robot are $I_{xx} = 0.1732$, $I_{yy} = 0.1588$, $I_{zz} = 0.0251$. Based on the D-H coordinate method, the dynamic equation of the robot is derived:

$M(q)\,[\ddot q_1\ \ \ddot q_2]^T + C(q,\dot q)\,[\dot q_1\ \ \dot q_2]^T + G(q) + U(t) = \tau$  (1)

In Eq. (1), $\tau$ is the control torque; $q, \dot q, \ddot q \in R^n$ are respectively the position, velocity and acceleration vectors of the joint angles.
$M(q) = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22}\end{bmatrix} \in R^{n\times n}$ is the inertia matrix, with parameters $M_{22} = 0.0252$,
$M_{11} = 0.004545\cos q_2 + 0.005265\sin q_2 + 0.0519$, $M_{12} = M_{21} = 0.00227\cos q_2 + 0.00263\sin q_2 + 0.0252$;
$C(q,\dot q) = \begin{bmatrix} C_{111} & C_{112} & C_{121} & C_{122} \\ C_{211} & C_{212} & C_{221} & C_{222}\end{bmatrix}$ is the centrifugal/Coriolis coefficient array, with parameters
$C_{111} = C_{212} = C_{221} = C_{222} = 0$, $C_{112} = C_{121} = C_{122} = 0.0026325\cos q_2 + 0.0022725\sin q_2$,
$C_{211} = 0.0026325\cos q_2 + 0.0022725\sin q_2$;
$G(q) = [G_1\ \ G_2]^T$ is the gravity vector, with $G_1 = G_2 = 0$; $U(t)$ is the modeling error and uncertainty signal, which is generally taken to have the same form as the input signal with an amplitude of 2%-5% of the input signal [8].

2. Sliding-mode Control System of Robot

2.1. The Design of the Sliding-Mode Surface

The purpose of trajectory tracking control of the robot is to make the joint position vector follow the desired joint angular displacement as closely as possible [9-10]. Therefore, the sliding-mode surface is designed as Eq. (2):

$s = \dot e + \alpha e$  (2)

In Eq. (2), $\alpha$ is the constant of the sliding-mode surface, $e = q - q_r$ is the tracking error and $\dot e = \dot q - \dot q_r$ is its derivative. The exponential reaching law of the sliding-mode control is designed as $\dot s = -\varphi\,\dfrac{s}{\|s\|} - Ks$, with $\varphi, K > 0$.

2.2. The Design of the Control Law

Combining Eq. (2) with the reaching law gives Eq. (3):

$\tau = u_{eq} + u_{vss}$  (3)

In Eq. (3):
$u_{eq} = M(q)\ddot q_r + C(q,\dot q)\dot q + G(q) + U(t) - \alpha M(q)\dot e$, $u_{vss} = -\varphi M(q)\dfrac{s}{\|s\|} - K M(q)s$;
$K > \kappa + \|U(t)\|$, where $\kappa$ is an arbitrarily small positive number, and $\varphi, K$ are the parameters of the exponential reaching law.
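A schematic sketch of evaluating the control law (3) at one sampling instant is shown below; it is not the authors' Simulink implementation. The dynamics callables M_fn, C_fn, G_fn are hypothetical stand-ins for the terms of Section 1, and the uncertainty U(t) is treated as unknown and therefore omitted from the equivalent control.

```python
# Sketch of the sliding surface (Eq. 2) and control torque (Eq. 3) at one instant.
import numpy as np

def smc_torque(q, dq, qr, dqr, ddqr, M_fn, C_fn, G_fn,
               alpha=1.0, phi=10.0, K=10000.0, eps=1e-6):
    e, de = q - qr, dq - dqr
    s = de + alpha * e                         # sliding surface, Eq. (2)
    M, C, G = M_fn(q), C_fn(q, dq), G_fn(q)
    u_eq = M @ ddqr + C @ dq + G - alpha * (M @ de)          # U(t) unknown, omitted
    u_vss = -phi * (M @ s) / (np.linalg.norm(s) + eps) - K * (M @ s)
    return u_eq + u_vss

# Hypothetical constant-matrix stand-ins, used only to exercise the function:
M_fn = lambda q: np.array([[0.05, 0.02], [0.02, 0.025]])
C_fn = lambda q, dq: np.zeros((2, 2))
G_fn = lambda q: np.zeros(2)
tau = smc_torque(np.zeros(2), np.zeros(2),
                 np.array([0.1, 0.05]), np.zeros(2), np.zeros(2),
                 M_fn, C_fn, G_fn)
print(tau)
```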

3. Fuzzy Control System of Robot

3.1. The Design of Fuzzy Control Rule

In a multi-joint robot system, the effect of the modeling error and uncertainty signal is
always present. Therefore, combining sliding-mode control with fuzzy control is usually
used to weaken this effect, which ensures the stability and effectiveness of the control
system. Fuzzy reasoning is used to establish the fuzzy rules. The fuzzy sets are defined
as shown in Table 1:

Table 1. Rule set of the fuzzy controller

 s \ ds   NB   NM   NS   ZO   PS   PM   PB
 PB       NB   NB   NM   PM   PB   PB   PB
 PM       NB   NM   NS   PS   PM   PB   PB
 PS       NM   NS   NS   PS   PS   PM   PB
 ZO       NM   NS   NS   ZO   PS   PS   PM
 NS       PB   PM   PS   PS   NS   NS   NM
 NM       PB   PB   PM   PS   NS   NM   NB
 NB       PB   PB   PB   PM   NM   NB   NB

In Table 1, NB denotes negative big, NM negative medium, NS negative small, ZO zero, PS positive small, PM positive medium and PB positive big. The fuzzy rules take the IF-THEN form:

$R^m$: IF $s_i$ is A and $\dot s_i$ is B THEN $u_{fi}$ is C

where A, B and C are taken from Table 1.

3.2. The Design of Membership Function and Control Law

The membership functions of the fuzzy controller are set up with the fuzzy logic toolbox of MATLAB/Simulink. The basic form is the triangular membership function (trimf); the universe of discourse is [-3, 3]; defuzzification uses the mean-of-maximum method.

After the fuzzy control is introduced into the sliding-mode control, the control law is changed to the form of Eq. (4):

$\tau = u_{eq} + u_{vss} + u_f$  (4)

In Eq. (4), $u_f = u_f(x \mid \theta) = [u_{f1}(x_1 \mid \theta_1)\ \ u_{f2}(x_2 \mid \theta_2)]^T$ is the output of the fuzzy controller, $x_i = [s_i, \dot s_i]$ is the input of the fuzzy controller, $\dot\theta_i = r_i s_i \xi(x)$ is the adaptive law, and $r_i$ is the learning coefficient of the control system.

3.3. Stability Analysis of Fuzzy Control

For the 2-DOF robot, it is assumed that the upper bound of the modeling error and uncertainty signal satisfies $|U_i(t)| \le L_i$; the optimal approximation parameter of the adaptive law is

$\theta_i^* = \arg\min_{\theta_i}\left[\sup_{x_i}\left|u_{fi}(x_i \mid \theta_i) - (L_i + \kappa)\,\mathrm{sign}(s_i)\right|\right]$;

the adaptation error is $\tilde\theta_i = \theta_i - \theta_i^*$; the upper bound of the approximation error of the fuzzy controller is $[w_1\ \ w_2]$; and the minimal approximation error of the fuzzy controller is $\varepsilon_i = u_{fi}(x_i \mid \theta_i^*) - (L_i + \kappa)\,\mathrm{sign}(s_i) + w_i$. The Lyapunov function is chosen as

$V = \dfrac{1}{2}s^T M(q)s + \dfrac{1}{2}\sum_{i=1}^{2}\dfrac{1}{r_i}\tilde\theta_i^T\tilde\theta_i$

Taking its derivative along the closed-loop trajectories gives

$\dot V = \dfrac{1}{2}s^T\dot M(q)s + s^T M(q)\dot s + \sum_{i=1}^{2}\dfrac{1}{r_i}\tilde\theta_i^T\dot{\tilde\theta}_i \le \sum_{i=1}^{2}\left[-(L_i + \kappa)|s_i| + \varepsilon_i s_i + U_i(t)s_i + w_i s_i\right] < 0$  (5)

The result of Eq. (5) shows that the control system is globally stable.

4. Simulation Experiment

The control system is set up in MATLAB/Simulink, as shown in Figure 2.

Figure 2. Control system

The physical parameters of the 2-DOF robot controller are set as follows: the sliding-surface parameters $\alpha_1 = \alpha_2 = 1$; the exponential reaching law parameters $\varphi = 10$, $K = 10000$; $\|s\|$ is replaced with $\|s\| + 0.000001$ to prevent singularity; a memory block with parameter 1 is used to prevent an algebraic loop; the learning coefficients $r_i = [0.85\ 0.85\ 0.85\ 0.85\ 0.85\ 0.85]$; the desired trajectories $q_{r1} = 1 - \cos(\pi t)$, $q_{r2} = 0.5 - 0.5\cos(\pi t)$; the modeling error and uncertainty signals $U_1(t) = 0.5\sin 0.5t$, $U_2(t) = 0.1\sin 0.1t$.
An S-Function is written in MATLAB to run the simulation. MATLAB is connected with ADAMS through the Controls interface, and the co-simulation of the control system is carried out. The trace curve of joint 1, the trace curve of joint 2 and the error curves are then obtained, as shown in Figures 3, 4 and 5.
If PD control is used, the control law is Eq. (6):

$\tau_i = k_{pi}e_i + k_{di}\dot e_i$  (6)

Figure 3. Trace Curve of Joint_1 Figure 4. Trace Curve of Joint_2

Figure 5. Error Curve of Joint1/2 Figure 6. Error Curve of PD Control


The error curve of PD control is then obtained, as shown in Figure 6.
From Figures 3, 4 and 5, the adaptive fuzzy sliding-mode control shows good trajectory
tracking ability. When the interfering signal is applied, the control system returns to
steady-state operation near the equilibrium point of the sliding-mode surface (s = 0),
so the control system is effective and robust. There is no obvious chattering in the
simulation experiment, so the control system meets the design requirement. Comparing
Figure 5 and Figure 6 shows that the adaptive fuzzy sliding-mode control is superior to
PD control. Under the same interference signal, the disturbance rejection of PD control
is poorer: the control error increases and the control precision drops sharply. This also
verifies the validity of the adaptive fuzzy sliding-mode control.

5. Conclusions

Aiming at the position control of a 2-DOF joint robot in the presence of modeling
error and uncertainty signals, an adaptive fuzzy sliding-mode control is proposed.
A simulation experiment is conducted in MATLAB and ADAMS, and the result of the
adaptive fuzzy sliding-mode control is compared with PD control. The simulation result
shows that the adaptive fuzzy sliding-mode control is effective and robust, and there is
no obvious chattering in the control system. The trajectory tracking is more effective
than with PD control. This control policy is therefore practically applicable, and the
study provides guidance of both practical and theoretical value.

Acknowledgment

This work is supported by the Science & Technology Project of College and University
in Shandong Province (J15LN41).

Fuzzy Systems and Data Mining II 115
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-115

Hesitant Bipolar Fuzzy Set and Its Application in Decision Making
Ying HAN 1 , Qi LUO and Sheng CHEN
B-DAT & CICAEET, Nanjing University of Information Science and Technology,
Jiangsu, Nanjing 210044, P. R. China

Abstract. In this paper, by combining the hesitant fuzzy set with the bipolar-valued fuzzy set, the concept of the hesitant bipolar fuzzy set is introduced, and a hesitant bipolar fuzzy group decision making method based on TOPSIS is proposed. Our study is the first to integrate fuzziness, hesitation and incompatible bipolarity in a multiple criteria decision making method. An illustrative case of chemical project evaluation also demonstrates the feasibility, validity, and necessity of the proposed method.
Keywords. Fuzzy set, Bipolar-valued fuzzy set, Hesitant fuzzy set, Multiple criteria
decision making, Incompatible bipolarity

Introduction

As an extension of the fuzzy set [1], the hesitant fuzzy set (HFS) was introduced by Torra and Narukawa to describe the case in which the membership degree of an element to a given set takes a few different values, which arises from the hesitation decision makers hold [2]. A growing number of studies focus on HFS, and several extensions have been presented, such as the interval-valued HFS [3], the possible-degree generalized HFS [4] and the linguistic HFS [5].
On the other hand, incompatible bipolarity has attracted researchers' attention in recent years, and some instructive results have been devoted to it [6,7]. In fact, incompatible bipolarity is inevitable in the real world. Consider, for example, the psychological illness of bipolar disorder: a patient suffering from bipolar disorder has episodes of both mania and depression, and the two poles may simultaneously reach extreme cases, i.e., the sum of the positive pole value and the negative pole value is bigger than 1. The bipolar-valued fuzzy set (BVFS) has been pointed out to be suitable for handling incompatible bipolarity [8,9].
The aforementioned HFS and its extensions cannot accommodate incompatible bipolarity. Since BVFS is adept at modeling incompatible bipolarity, by combining BVFS with HFS, the hesitant bipolar fuzzy set (HBFS) is introduced in this paper, and a hesitant bipolar fuzzy multiple criteria group decision making (MCGDM) method based on TOPSIS [10] is presented. Our study is the first to accommodate fuzziness, hesitation, and incompatible bipolarity in fuzzy set theory and multiple criteria decision making.
The rest of the paper is structured as follows. In Section 1, some related notions are reviewed, the concept of HBFS is introduced, and some related properties are discussed. In Section 2, a hesitant bipolar fuzzy group decision making method based on TOPSIS is presented. In Section 3, an illustrative case of chemical project evaluation is included to show the feasibility, validity, and necessity of the theoretical results obtained. Finally, the paper is concluded in Section 4.
Throughout the paper, denote I^P = [0, 1] and I^N = [−1, 0]. The set X always represents a finite universe of discourse.
1 Corresponding Author: Ying Han, B-DAT & CICAEET, Nanjing University of Information Science and Technology, Jiangsu, Nanjing 210044, P. R. China; E-mail: hanyingcs@163.com.

1. Hesitant Bipolar Fuzzy Set

In this section, firstly, some related notions are reviewed. Then, the concept of HBFS is
introduced and some related properties are discussed.
In [2], Torra and Narukawa suggested the concept of HFS, which permits the membership degree of an element to a set to be presented as several possible values in I^P. In [11], a bipolar-valued fuzzy set B in X is defined as B = {< x, B(x) = (B^P(x), B^N(x)) > | x ∈ X}, where the functions B^P : X → I^P, x ↦ B^P(x) ∈ I^P and B^N : X → I^N, x ↦ B^N(x) ∈ I^N define the satisfaction degree of the element x ∈ X to the corresponding property and to the implicit counter-property of the BVFS B in X, respectively. Denote L = {α = (α^P, α^N) | α^P ∈ I^P, α^N ∈ I^N}; then α is called a bipolar-valued fuzzy number (BVFN) in [9]. For any α = (α^P, α^N) and β = (β^P, β^N), the preference order relation is defined as α ≤ β if and only if α^P ≤ β^P and α^N ≤ β^N. This preference order relation is partial. Denote the mediation value α^M = (α^P + α^N)/2; if α ≤ β, then α^M ≤ β^M, so all BVFNs can be ranked according to their mediation values [9].
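As a quick illustration of this ranking (a minimal sketch, not code from the paper; the sample BVFNs below are made up), the mediation value can be computed and used as a sort key:

```python
def mediation(alpha):
    """Mediation value of a BVFN alpha = (alpha_P, alpha_N), with alpha_P in [0,1] and alpha_N in [-1,0]."""
    a_p, a_n = alpha
    return (a_p + a_n) / 2

# Hypothetical BVFNs sorted in increasing order of mediation value:
bvfns = [(0.9, -0.3), (0.8, -0.6), (0.7, -0.4)]
print(sorted(bvfns, key=mediation))   # [(0.8, -0.6), (0.7, -0.4), (0.9, -0.3)]
```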
Next, the concept of the HBFS is introduced, accommodating fuzziness, hesitation,
and incompatible bipolarity in fuzzy set theory for the first time.

Definition 1 A hesitant bipolar fuzzy set in X is defined as Ã = {< x, h̃_Ã(x) > | x ∈ X}, where h̃_Ã(x) is a set of different BVFNs in L representing the possible bipolar membership degrees of the element x ∈ X to the set Ã. For convenience, h̃_Ã(x) is called a hesitant bipolar fuzzy element (HBFE), the basic unit of an HBFS.

Inspired by the work on HFS by Xia et al. [12], for an HBFE h̃_Ã(x) it is necessary to arrange the BVFNs in h̃_Ã(x) in increasing order according to the mediation value. Suppose that l(h̃_Ã(x)) stands for the number of BVFNs in the HBFE h̃_Ã(x) and that h̃_Ã^{σ(j)}(x) is the jth largest BVFN in h̃_Ã(x). Given two different HBFSs Ã, B̃ in X, denote l_x = max{l(h̃_Ã(x)), l(h̃_B̃(x))}. If l(h̃_Ã(x)) ≠ l(h̃_B̃(x)), then the shorter one should be extended by adding its largest value until it has the same length as the longer one.
In the rest of the paper, the set of all HBFSs in X is denoted by F̃(X). An HBFE is denoted by h̃ for simplicity, and the set of all such h̃ is denoted by L̃. The preference order relation in L̃ is defined in the following definition.

Definition 2 Let h̃_1, h̃_2 ∈ L̃. The preference order relation in L̃ is defined as follows: h̃_1 ≤ h̃_2 if and only if (h̃_1^{σ(j)})^P ≤ (h̃_2^{σ(j)})^P and (h̃_1^{σ(j)})^N ≤ (h̃_2^{σ(j)})^N for every j, where h̃_i^{σ(j)} is the jth largest BVFN in h̃_i (i = 1, 2) according to the mediation value.

The aggregation operator is a fundamental element of MCGDM methods; thus, by introducing some operations on HBFEs, a hesitant bipolar fuzzy aggregation operator is proposed.

Definition 3 Let h̃_1, h̃_2 ∈ L̃ and λ > 0. Define the following operations on HBFEs:

h̃_1 ⊗ h̃_2 = ∪_{γ̃_1 ∈ h̃_1, γ̃_2 ∈ h̃_2} { (γ̃_1^P · γ̃_2^P, −γ̃_1^N · γ̃_2^N) },

(h̃_1)^λ = ∪_{γ̃_1 ∈ h̃_1} { ((γ̃_1^P)^λ, −|γ̃_1^N|^λ) }.

Definition 4 Let h̃_i ∈ L̃ (i = 1, 2, · · · , n), and let w = (w_1, w_2, · · · , w_n) be the weight vector of the h̃_i, satisfying w_i ∈ I^P and Σ_{i=1}^{n} w_i = 1. Then the hesitant bipolar fuzzy weighted geometric (HBFWG) operator is the mapping defined as follows:

HBFWG(h̃_1, h̃_2, · · · , h̃_n) = ⊗_{i=1}^{n} (h̃_i)^{w_i}   (1)
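A minimal computational sketch of (1), combining the operations of Definition 3, is given below. This is our own illustration, not code from the paper; the data passed to the final call are the x_1, c_1 entries of Tables 1 and 2 with the expert weights (0.7, 0.3).

```python
from itertools import product

def bvfn_power(g, lam):
    """(gamma^P, gamma^N)^lambda = ((gamma^P)^lambda, -|gamma^N|^lambda), as in Definition 3."""
    return (g[0] ** lam, -abs(g[1]) ** lam)

def bvfn_times(g1, g2):
    """g1 (x) g2 = (g1^P * g2^P, -(g1^N * g2^N)), as in Definition 3."""
    return (g1[0] * g2[0], -(g1[1] * g2[1]))

def hbfwg(hbfes, weights):
    """HBFWG(h1,...,hn): all combinations of one BVFN per HBFE, each raised to its weight."""
    powered = [[bvfn_power(g, w) for g in h] for h, w in zip(hbfes, weights)]
    result = []
    for combo in product(*powered):
        agg = combo[0]
        for g in combo[1:]:
            agg = bvfn_times(agg, g)
        result.append(agg)
    return result

print(hbfwg([[(0.9, -0.7)], [(0.8, -0.6)]], [0.7, 0.3]))
```

For these inputs the output is approximately (0.8688, −0.6684), which matches the corresponding comprehensive value reported in Table 3.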

A distance measure is needed in the TOPSIS method; next, an axiomatic definition of distance for HBFSs is introduced.

Definition 5 For any Ã, B̃, C̃ ∈ F̃(X), if the operation d̃ : F̃(X) × F̃(X) → I^P satisfies the following conditions: 1° 0 ≤ d̃(Ã, B̃) ≤ 1, and d̃(Ã, B̃) = 0 if and only if Ã = B̃; 2° d̃(Ã, B̃) = d̃(B̃, Ã); 3° d̃(Ã, C̃) ≤ d̃(Ã, B̃) + d̃(B̃, C̃); then d̃ is called a distance on F̃(X).

A distance for HBFSs is proposed in the next example.

Example 1 Let ω_i ∈ I^P (i = 1, 2, · · · , n) satisfy Σ_{i=1}^{n} ω_i = 1. For any Ã, B̃ ∈ F̃(X), the weighted Hamming distance d̃_wh between Ã and B̃ is defined as follows:

d̃_wh(Ã, B̃) = Σ_{i=1}^{n} ω_i [ (1/(2 l_{x_i})) Σ_{j=1}^{l_{x_i}} ( |(h̃_Ã^{σ(j)})^P(x_i) − (h̃_B̃^{σ(j)})^P(x_i)| + |(h̃_Ã^{σ(j)})^N(x_i) − (h̃_B̃^{σ(j)})^N(x_i)| ) ]   (2)
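A minimal sketch of (2), assuming the HBFSs are represented as lists of HBFEs (one per element of X, each already sorted by mediation value and extended to a common length). This is an illustrative implementation, not code from the paper, and the two HBFSs and weights below are made up.

```python
def hbfs_distance(A, B, omega):
    """Weighted Hamming distance of Eq. (2).
    A, B: lists of HBFEs; each HBFE is a list of BVFNs (p, n), sorted and of equal length.
    omega: element/criterion weights summing to 1."""
    total = 0.0
    for h_a, h_b, w in zip(A, B, omega):
        lx = len(h_a)
        s = sum(abs(ga[0] - gb[0]) + abs(ga[1] - gb[1]) for ga, gb in zip(h_a, h_b))
        total += w * s / (2 * lx)
    return total

# Hypothetical two-element HBFSs with weights (0.6, 0.4).
A = [[(0.9, -0.7)], [(0.8, -0.6), (0.7, -0.4)]]
B = [[(0.8, -0.1)], [(0.8, -0.3), (0.8, -0.3)]]
print(hbfs_distance(A, B, [0.6, 0.4]))
```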

2. Hesitant Bipolar Fuzzy Multiple Criteria Decision Making Method

In this section, based on the theoretical results of the previous section, a hesitant bipolar fuzzy MCGDM method based on TOPSIS is presented.
Consider an MCGDM problem with hesitant bipolar fuzzy information. Let {x_1, · · · , x_m} be the set of alternatives, let {c_1, · · · , c_n} be the set of evaluation criteria, and let t experts be invited to make evaluations. The hesitant bipolar fuzzy evaluation value of alternative x_i with respect to criterion c_j given by the sth expert is denoted by the HBFE h̃_ij^s; we can then form the hesitant bipolar fuzzy matrix (HBFM) given by the sth expert as H̃^s = (h̃_ij^s)_{m×n} (i = 1, · · · , m; j = 1, · · · , n; s = 1, · · · , t). Suppose all the

BVFNs in h̃_ij^s are arranged in increasing order according to the mediation value. The weight vector of the experts is assumed to be known as w = (w_1, · · · , w_t), satisfying w_s ∈ I^P and Σ_{s=1}^{t} w_s = 1, and the weight vector of the criteria is assumed to be known as ω = (ω_1, · · · , ω_n), satisfying ω_j ∈ I^P and Σ_{j=1}^{n} ω_j = 1.
The hesitant bipolar fuzzy multiple criteria decision making method based on TOPSIS is given as follows:
Step 1. Use (1) to aggregate the HBFMs H̃^s to get the comprehensive HBFM H̃ = (h̃_ij)_{m×n} (i = 1, · · · , m; j = 1, · · · , n), where h̃_ij = HBFWG(h̃_ij^1, h̃_ij^2, · · · , h̃_ij^t).
Step 2. Denote l_j = max_{i=1,···,m} {l(h̃_ij)}. For j = 1, · · · , n, if l(h̃_ij) < l_j, add its largest value to h̃_ij until its length equals l_j. Then compute

(h̃_j)^* = ( (max_{i=1,···,m} (h̃_ij^{σ(1)})^P, max_{i=1,···,m} (h̃_ij^{σ(1)})^N), · · · , (max_{i=1,···,m} (h̃_ij^{σ(l_j)})^P, max_{i=1,···,m} (h̃_ij^{σ(l_j)})^N) )   (3)

and

(h̃_j)_* = ( (min_{i=1,···,m} (h̃_ij^{σ(1)})^P, min_{i=1,···,m} (h̃_ij^{σ(1)})^N), · · · , (min_{i=1,···,m} (h̃_ij^{σ(l_j)})^P, min_{i=1,···,m} (h̃_ij^{σ(l_j)})^N) ).   (4)

Then h̃^* = {(h̃_1)^*, · · · , (h̃_n)^*} is the positive ideal point and h̃_* = {(h̃_1)_*, · · · , (h̃_n)_*} is the negative ideal point.
Step 3. Denote h̃_i = {h̃_i1, · · · , h̃_in}. For each i = 1, · · · , m, compute the distances (d̃_i)^* and (d̃_i)_* between h̃_i and h̃^* and between h̃_i and h̃_*, respectively, by (2).
Step 4. Compute ξ_i = (d̃_i)_* / ((d̃_i)^* + (d̃_i)_*), i = 1, · · · , m.
Step 5. Rank the alternatives according to the principle that the larger ξ_i is, the better the project x_i is.
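Putting Steps 1–5 together, the following sketch outlines the whole procedure for m alternatives, n criteria and t experts. It reuses the hbfwg, mediation and hbfs_distance helpers sketched above and is our own illustration under those assumptions, not the authors' implementation.

```python
def topsis_hbfs(expert_matrices, expert_w, criteria_w):
    """expert_matrices: list of t matrices; each matrix is m x n, each cell a list of BVFNs (p, n)."""
    m, n = len(expert_matrices[0]), len(expert_matrices[0][0])

    # Step 1: aggregate the experts' HBFMs cell by cell with the HBFWG operator (1).
    H = [[sorted(hbfwg([M[i][j] for M in expert_matrices], expert_w), key=mediation)
          for j in range(n)] for i in range(m)]

    # Step 2: equalise lengths per criterion, then build the ideal points (3) and (4).
    for j in range(n):
        lj = max(len(H[i][j]) for i in range(m))
        for i in range(m):
            H[i][j] = H[i][j] + [H[i][j][-1]] * (lj - len(H[i][j]))
    pos = [[(max(H[i][j][k][0] for i in range(m)), max(H[i][j][k][1] for i in range(m)))
            for k in range(len(H[0][j]))] for j in range(n)]
    neg = [[(min(H[i][j][k][0] for i in range(m)), min(H[i][j][k][1] for i in range(m)))
            for k in range(len(H[0][j]))] for j in range(n)]

    # Steps 3-4: distances to the ideal points via (2), then relative closeness xi_i.
    scores = []
    for i in range(m):
        d_pos = hbfs_distance(H[i], pos, criteria_w)
        d_neg = hbfs_distance(H[i], neg, criteria_w)
        scores.append(d_neg / (d_pos + d_neg))

    # Step 5: the alternative with the largest score is ranked best.
    return scores
```

Fed with the matrices of Tables 1 and 2 and the weights w = (0.7, 0.3), ω = (0.3, 0.5, 0.2), this procedure mirrors the worked example of Section 3.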

3. Case Study

In this section, a chemical project evaluation problem is presented to demonstrate how to use the proposed method to make evaluations with incompatible fuzzy bipolarity and hesitation information.

Example 2 Consider a chemical project evaluation problem. Suppose there are four chemical projects {x_1, x_2, x_3, x_4} to be evaluated, and two experts are invited to make the evaluation. The evaluation criteria are c_1: economy, c_2: environment and c_3: society. Consider the economy criterion of one project: in the short term the project may bring huge benefits to the company, so its positive evaluation value is 0.8; on the other hand, in the long run the pollution will need a huge amount of money to fix, so its negative evaluation value is 0.7. The sum of the two poles is 1.5, bigger than 1, i.e., there exists incompatible bipolarity. Moreover, when making evaluations, experts may hesitate among several memberships; thus the evaluation values given by the sth expert for alternative x_i with respect to criterion c_j are denoted by the HBFE h̃_ij^s. Suppose all the BVFNs in h̃_ij^s are arranged in increasing order according to the mediation value. The HBFMs given by Expert 1 and Expert 2 are presented in Table 1 and Table 2, respectively. The weight vectors of the experts and the criteria are given as w = (0.7, 0.3) and ω = (0.3, 0.5, 0.2), respectively.

Table 1. HBFM given by Expert 1


X c1 c2 c3
x1 ([0.9, −0.7]) ([0.8, −0.6], [0.7, −0.4]) ([0.7, −0.4], [0.8, −0.3])
x2 ([0.8, −0.1]) ([0.8, −0.3]) ([0.9, −0.2], [0.8, −0.1])
x3 ([0.7, −0.2]) ([0.6, −0.3]) ([0.7, −0.1])
x4 ([0.6, −0.4]) ([0.5, −0.3]) ([0.7, −0.4])

Table 2. HBFM given by Expert 2


X c1 c2 c3
x1 ([0.8, −0.6]) ([0.8, −0.5]) ([0.8, −0.4])
x2 ([0.8, −0.2]) ([0.7, −0.1]) ([0.8, −0.2], [0.7, −0.1])
x3 ([0.6, −0.3]) ([0.7, −0.2], [0.8, −0.4]) ([0.8, −0.3])
x4 ([0.6, −0.3]) ([0.6, −0.2], [0.7, −0.1]) ([0.6, −0.4])

Next, we show how to use the proposed method to make the evaluation.
Step 1. Use (1) to aggregate the HBFMs H̃^s given by the experts to get the comprehensive HBFM H̃ = (h̃_ij)_{4×3}.
The comprehensive HBFM is given in Table 3.

Table 3. Comprehensive HBFM


X c1 c2
x1 ([0.8688, −0.6684]) ([0.8000, −0.5681], [0.7286, −0.4277])
x2 ([0.8000, −0.1231]) ([0.7686, −0.2158])
x3 ([0.6684, −0.2259]) ([0.6284, −0.2656], [0.6541, −0.3270])
x4 ([0.6000, −0.3669]) ([0.5281, −0.2656], [0.5531, −0.2158])
X c3
x1 ([0.7286, −0.4000], [0.8000, −0.3270])
x2 ([0.8688, −0.2000], [0.7686, −0.1000], [0.8000, −0.1231], [0.8346, −0.1625])
x3 ([0.7286, −0.1390])
x4 ([0.6684, −0.4000])

Step 2. Compute the positive (negative) ideal point h̃^* (h̃_*) by (3) ((4)).
By (3), we have h̃^* = {([0.8688, −0.1231]), ([0.8000, −0.2158], [0.7686, −0.2158]), ([0.8688, −0.1390], [0.8000, −0.1000], [0.8688, −0.1231], [0.8000, −0.1390])}.
By (4), we have h̃_* = {([0.6000, −0.6684]), ([0.5281, −0.5681], [0.5531, −0.4277]), ([0.6684, −0.4000], [0.6684, −0.4000], [0.6684, −0.4000], [0.6684, −0.4000])}.
Step 3. Compute the distances (d̃_i)^* ((d̃_i)_*) between h̃_i and h̃^* (h̃_*) by (2), i = 1, 2, 3, 4.
By (2), we have (d̃_1)^* = 0.1850, (d̃_2)^* = 0.0199, (d̃_3)^* = 0.1116, (d̃_4)^* = 0.1934; (d̃_1)_* = 0.1178, (d̃_2)_* = 0.2878, (d̃_3)_* = 0.1977, (d̃_4)_* = 0.1093.
Step 4. Compute ξ_i = (d̃_i)_* / ((d̃_i)^* + (d̃_i)_*), i = 1, 2, 3, 4.
We have ξ1 = 0.3890, ξ2 = 0.9352, ξ3 = 0.6392, ξ4 = 0.3610.
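As a quick check of Step 4, ξ_1 = 0.1178/(0.1850 + 0.1178) ≈ 0.3890 and ξ_2 = 0.2878/(0.0199 + 0.2878) ≈ 0.9352, which reproduce the values above.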
Step 5. Rank the alternatives according to the principle.

We get the conclusion that x2 is the optimal project.


Comparison
If we consider only the positive evaluations of the criteria in Example 2, we get (ξ_1)^P = 0.9230, (ξ_2)^P = 0.8377, (ξ_3)^P = 0.3796, (ξ_4)^P = 0, and then x_1 is the optimal project. This result differs from the one obtained when incompatible bipolarity is considered. Compared with existing methods, by accommodating incompatible bipolarity, fuzziness and hesitation information in decision making for the first time, our method is better suited to the urgent demands of environmental and resource protection.

4. Conclusions

In this paper, by combining the hesitant fuzzy set with the bipolar-valued fuzzy set, the concept of the hesitant bipolar fuzzy set is introduced, and then a hesitant bipolar fuzzy group decision making method is presented. Our study is the first to accommodate fuzziness, hesitation, and incompatible bipolarity in information processing. In future work, we will try to combine rough set theory with the hesitant bipolar fuzzy set.

Acknowledgements

This work was supported in part by the Joint Key Grant of National Natural Science
Foundation of China and Zhejiang Province (U1509217), the National Natural Sci-
ence Foundation of China (61503191) and the Natural Science Foundation of Jiangsu
Province, China (BK20150933).

References

[1] L.A. Zadeh, Fuzzy sets, Inform. and Control, 8 (1965) 338–353.
[2] V. Torra and Y. Narukawa, On hesitant fuzzy sets and decision, in: the 18th IEEE International Confer-
ence on Fuzzy Systems, Korea, 2009, 1378–1382.
[3] N. Chen, Z.S. Xu and M.M. Xia, Correlation coefficients of hesitant fuzzy sets and their applications to
clustering analysis, Applied Mathematical Modeling, 37 (2013) 2197–2211.
[4] Y. Han, Z.Z. Zhao, S. Chen and Q.T. Li, Possible-degree generalized hesitant fuzzy set and its Applica-
tion in MADM, Advances in Intelligent Systems and Computing, 27 (2014) 1–12.
[5] F.Y. Meng and X.H. Chen, A hesitant fuzzy linguistic multi-granularity decision making model based
on distance measures, Journal of Intelligent and Fuzzy Systems, 28 (2015) 1519–1531.
[6] J. Montero, H. Bustince, C. Franco, J.T. Rodríguez, D. Gómez, M. Pagola, J. Fernández and E. Bar-
renechea, Paired structures in knowledge representation, Knowledge-Based Systems, 100 (2016) 50–58.
[7] C.G. Zhou, X.Q. Zeng, H.B. Jiang, L.X. Han, A generalized bipolar auto-associative memory model
based on discrete recurrent neural networks, Neurocomputing, 162 (2015) 201–208.
[8] H. Bustince, E. Barrenechea, M. Pagola, J. Fernandez, Z.S. Xu, B. Bedregal, J. Montero,H. Hagras,
F. Herrera and B.D. Baets, A historical account of types of fuzzy sets and their relationships, IEEE
Transactions on Fuzzy Systems, 24 (2016) 179–194.
[9] Y. Han, P. Shi and S. Chen, Bipolar-valued rough fuzzy set and its applications to decision information
system, IEEE Transactions on Fuzzy Systems, 23 (2015) 2358–2370.
[10] Y.J. Lai, T.Y. Liu and C.L. Hwang, TOPSIS for MODM, European Journal of Operational Research, 76
(1994) 486–500.
[11] W.R. Zhang, Bipolar fuzzy sets and relations: a computational framework for cognitive modeling and
multiagent decision analysis, Proceedings of IEEE Conf., 1994: 305–309.
[12] M.M. Xia and Z.S. Xu, Hesitant fuzzy information aggregation in decision making, International Journal of Approximate Reasoning, 52 (2011) 395–407.
Fuzzy Systems and Data Mining II 121
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-121

Chance Constrained Twin Support Vector Machine for Uncertain Pattern Classification
Ben-Zhang YANG a ,Yi-Bin XIAO b , Nan-Jing HUANG a,1 , and Qi-Lin CAO c,2
a Department of Mathematics, Sichuan University, Chengdu, Sichuan, P.R. China
b Department of Mathematics, University of Electronic Science and Technology of

China, Chengdu, Sichuan, P.R. China


c Business School, Sichuan University, Chengdu, Sichuan, P.R. China

Abstract. In this paper, using a chance constrained programming formulation, a new chance constrained twin support vector machine (CC-TWSVM) is proposed. The paper studies twin support vector machine classification when data points are uncertain due to statistical measurement noise. With some properties of the distribution known, the CC-TWSVM model aims to ensure a small probability of misclassification for the uncertain data. We also provide an equivalent second-order cone programming (SOCP) model of the CC-TWSVM model using the moment information of the uncertain data. The dual problem of the SOCP model is introduced, and the optimal value of the CC-TWSVM model can then be computed directly. In addition, we show the performance of the CC-TWSVM model on artificial and real data through numerical experiments.
Keywords. support vector machine, robust optimization, chance constraints,
uncertain classification.

Introduction

Nowadays, support vector machines (SVMs) are considered one of the most effective learning methods for classification. The main idea of this classification technique is to map the data to a higher dimensional space with a kernel method and then determine a hyperplane separating the two classes with maximal margin [1,2].
Binary data classification methods have made breakthrough progress in recent years. Mangasarian et al. [3] proposed the generalized eigenvalue proximal support vector machine (GEPSVM). Different from the canonical SVM, GEPSVM aims to find two optimal nonparallel planes such that each hyperplane is closer to its own class and as far as possible from the other class. Motivated by GEPSVM, Jayadeva et al. [4] proposed a twin support vector machine (TWSVM) to solve the classification of binary data.

1 Corresponding Author: Nan-Jing Huang, Department of Mathematics, Sichuan University, Chengdu,

Sichuan, P.R. China, 610000, E-mail: nanjinghuang@hotmail.com.


2 Corresponding Author: Qi-Lin Cao, Business School, Sichuan University, Chengdu, Sichuan, P.R. China,

610000, E-mail: qlcao@scu.edu.cn.



The main idea of TWSVM is to generate two nonparallel planes that have properties similar to those in GEPSVM. Unlike GEPSVM, however, the two planes in TWSVM are obtained from two related programming problems. The ν-TWSVM [5] was later proposed as an extension of TWSVM for handling outliers, and some further extensions of TWSVM can be found in [6].
For the above-mentioned methods, the parameters in the training data sets are implicitly assumed to be known exactly. However, in real-world applications, the parameters are perturbed because they are estimated from measured data subject to statistical error [7]. For instance, real data points always incorporate uncertain information in automatic acoustic identification and other imbalanced data problems [8]. When the data points are uncertain, several SVM models for processing uncertainties have been proposed as developments of the previous models. Trafalis et al. [9] proposed a robust optimization model for the case where the noise of the uncertain data is norm-bounded. Robust optimization [10] has also been introduced in the case of chance constraints. The purpose of using robust optimization with chance constraints is to ensure a small probability of misclassification under the uncertainty. More precisely, it requires that a maximum-margin linear classifier be constructed with high probability over the random variables; equivalently, the probability that points of one class are assigned to the other is kept below an extremely low value. Ben-Tal et al. [11,12] employed the moment information of the uncertain training data to develop a chance-constrained SVM (CC-SVM) model. However, to the best of our knowledge, no researcher has considered chance-constrained optimization in the TWSVM problem. Therefore, it is interesting and important to study the TWSVM with chance constraints for the uncertain data classification problem. The main purpose of this paper is to make an attempt in this direction.
Combining the ability of chance constraints to handle uncertainty with the benefits of TWSVM, in this paper we propose a chance constrained twin support vector machine (CC-TWSVM). The main method of this paper is, by using the moment information of the uncertain data, to transform the chance constrained program into a second-order cone program. Section 1 briefly recalls SVM and TWSVM. In Section 2, we introduce the CC-TWSVM model. Experimental results on uncertain data sets are presented in Section 3. Conclusions are provided in Section 4.

1. Preliminaries

In this section, we briefly recall some concepts of SVM and TWSVM for the binary classification problem.

1.1. SVM

Let us consider the linearly separable classification problem, given the training set

{(x_1, y_1), · · · , (x_l, y_l)} ⊆ R^m × {−1, +1}.

SVM aims to find an optimal hyperplane w^T x + b = 0 which separates the data into two classes by maximizing the distance 2/||w||_2 between the two support hyperplanes. This can be formulated as follows:

min_{w,b,ξ} (1/2)||w||_2^2 + C Σ_{i=1}^{l} ξ_i
s.t. y_i(w^T x_i + b) ≥ 1 − ξ_i,   (1)
     ξ_i ≥ 0, i = 1, · · · , l.

After solving (1), a new point is classified as class +1 or class −1 according to the final decision function f(x) = sgn(w^T x + b).
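As a minimal illustration of this standard formulation (a sketch using scikit-learn's linear SVM, not code from the paper; the toy data are made up):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0)      # solves the soft-margin problem (1)
clf.fit(X, y)
print(np.sign(clf.decision_function([[1.0, 1.0]])))   # decision f(x) = sgn(w^T x + b)
```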

1.2. TWSVM

Consider a binary classification problem with l_1 positive points and l_2 negative points (l_1 + l_2 = l). Suppose the data points belonging to the positive class are collected in A ∈ R^{l_1×n}, where each row A_i ∈ R^n (i = 1, · · · , l_1) represents a data point with label +1. Similarly, B ∈ R^{l_2×n} collects all the data points with label −1. The TWSVM determines two nonparallel hyperplanes:

f_+(x) = w_+^T x + b_+ = 0 and f_-(x) = w_-^T x + b_- = 0,   (2)

where w_+, w_- ∈ R^n and b_+, b_- ∈ R. Here, each hyperplane is close to one of the two classes and is at a distance of at least one from the points of the other class. The formulations of TWSVM are as follows:

min_{w_+, b_+} (1/2)||A w_+ + e_+ b_+||_2^2 + C_1 e_-^T ξ
s.t. −(B w_+ + e_- b_+) + ξ ≥ e_-, ξ ≥ 0   (3)

and

min_{w_-, b_-} (1/2)||B w_- + e_- b_-||_2^2 + C_2 e_+^T η
s.t. (A w_- + e_+ b_-) + η ≥ e_+, η ≥ 0,   (4)

where C_1, C_2 are pre-specified penalty factors and e_+, e_- are vectors of ones of the corresponding dimensions; it is apparent from the formulations that e_+ has l_1 components and e_- has l_2 components. The nonparallel hyperplanes (2) can be obtained by solving (3) and (4). A new point x is then classified by the following decision rule:

|x^T w_r + b_r| = min_{s=+,−} |x^T w_s + b_s|,   (5)

where r represents the class label +1 or −1 assigned to x.
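A minimal sketch of the decision rule (5), assuming the two hyperplanes (w_+, b_+) and (w_-, b_-) have already been obtained by solving (3) and (4); the numeric values below are illustrative only:

```python
import numpy as np

def twsvm_predict(x, w_pos, b_pos, w_neg, b_neg):
    """Assign x to the class whose hyperplane is closer, as in (5)."""
    d_pos = abs(x @ w_pos + b_pos)
    d_neg = abs(x @ w_neg + b_neg)
    return +1 if d_pos <= d_neg else -1

# Hypothetical hyperplanes and a test point.
w_pos, b_pos = np.array([1.0, -0.5]), 0.2
w_neg, b_neg = np.array([0.3, 1.0]), -1.0
print(twsvm_predict(np.array([0.5, 0.4]), w_pos, b_pos, w_neg, b_neg))
```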

2. Chance Constrained Twin Support Vector Machine

In this section, we briefly introduce chance constrained programming (CCP) and propose a chance constrained twin support vector machine (CC-TWSVM) to process uncertain data points.
When uncertain noise exists in the dataset, the TWSVM model needs to be modified to incorporate the uncertainty information. Suppose there are l_1 and l_2 training data points in R^n.

Use Ã_i = [Ã_i1, · · · , Ã_in], i = 1, · · · , l_1, to denote the uncertain data points with positive label +1, and let B̃_i = [B̃_i1, · · · , B̃_in], i = 1, · · · , l_2, denote the uncertain data points with negative label −1. Then Ã = [Ã_1, · · · , Ã_{l_1}]^T and B̃ = [B̃_1, · · · , B̃_{l_2}]^T represent the two data sets. The chance-constrained program is to determine two nonparallel planes such that each hyperplane is closer to its own class in the sense of expectation and is as far as possible from the other class in probability. The chance-constrained TWSVM formulations are

min_{w_+, b_+} (1/2) E{||Ã w_+ + e_+ b_+||_2^2} + C_1 Σ_{i=1}^{l_1} ξ_i
s.t. P{−(B̃_i w_+ + b_+) ≤ 1 − ξ_i} ≤ ε,   (6)
     ξ_i ≥ 0, i = 1, · · · , l_1

and

min_{w_-, b_-} (1/2) E{||B̃ w_- + e_- b_-||_2^2} + C_2 Σ_{i=1}^{l_2} η_i
s.t. P{(Ã_i w_- + b_-) ≤ 1 − η_i} ≤ ε,   (7)
     η_i ≥ 0, i = 1, · · · , l_2,

where E{·} denotes the expectation under the corresponding distribution, C_1, C_2 are user-given regularization parameters, 0 < ε < 1 is a parameter close to 0, and P{·} is the probability measure of the uncertain data points of the two classes. The objective functions of the model keep each hyperplane close to its own class on average, while the chance constraints impose an upper bound on the probability that a point is misclassified into the other class. The chance constraints thus have the advantage of guaranteeing correct classification with high probability, and the resulting maximum-margin planes are robust to uncertainties in the data. However, the two quadratic optimization problems (6) and (7) with chance constraints are obviously non-convex, so the model is difficult to solve directly. Using appropriate bounding inequalities is an effective technique for dealing with CCP: when the mean and covariance matrix of the uncertain data points are known, a multivariate bound [13,14,15] can be adopted to express the chance constraints through robust optimization.
Let X ∼ (μ, Σ) denote a random vector X with mean μ and covariance matrix Σ. The multivariate Chebyshev inequality states that, for any closed convex set S, the supremum of the probability that X takes a value in S is

sup_{X∼(μ,Σ)} P{X ∈ S} = 1/(1 + d²),   (8)
d² = inf_{X∈S} (X − μ)^T Σ^{−1} (X − μ).
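To make the step from the chance constraints in (6)–(7) to the deterministic constraints in (9)–(10) below explicit, the following is our own summary of the usual one-sided Chebyshev (Cantelli) argument, not a derivation reproduced from the paper. For a random row vector B̃_i ∼ (μ_i^-, Σ_i^-) and under the assumption that the shift a defined below is nonnegative (i.e., the mean already satisfies the margin), the violation probability of the first constraint in (6) can be bounded and rearranged into a second-order cone constraint:

```latex
\[
\Pr\{-(\tilde B_i w_+ + b_+) \le 1-\xi_i\}
   = \Pr\{\tilde B_i w_+ - \mu_i^- w_+ \ge a\}
   \le \frac{w_+^\top \Sigma_i^- w_+}{w_+^\top \Sigma_i^- w_+ + a^2},
\qquad a = -(1-\xi_i) - b_+ - \mu_i^- w_+ ,
\]
\[
\frac{w_+^\top \Sigma_i^- w_+}{w_+^\top \Sigma_i^- w_+ + a^2} \le \varepsilon
\;\iff\; a \ge k\,\|(\Sigma_i^-)^{1/2} w_+\|_2
\;\iff\; -(\mu_i^- w_+ + b_+) \ge 1-\xi_i + k\,\|(\Sigma_i^-)^{1/2} w_+\|_2 ,
\qquad k = \sqrt{\tfrac{1-\varepsilon}{\varepsilon}} .
\]
```

The resulting inequality is exactly the deterministic constraint appearing in (9); the constraint of (10) is obtained in the same way for the points Ã_i.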

Assume the first and second moment information of the random variables Ã_i and B̃_i is known. Let μ_i^+ = E[Ã_i] and μ_i^- = E[B̃_i] be the respective mean vectors, and let Σ_i^+ = E[(Ã_i − μ_i^+)^T(Ã_i − μ_i^+)] and Σ_i^- = E[(B̃_i − μ_i^-)^T(B̃_i − μ_i^-)] be the covariance matrices of the uncertain points of the two data sets. Then problems (6) and (7) can be reformulated, respectively, as:

min_{w_+, b_+} (1/2) w_+^T G^+ w_+ + (w_+^T μ^{+T}) b_+ + (1/2) l_1 b_+^2 + C_1 Σ_{i=1}^{l_1} ξ_i
s.t. −(μ_i^- w_+ + b_+) ≥ 1 − ξ_i + k ||(Σ_i^-)^{1/2} w_+||_2, ξ_i ≥ 0   (9)

and

min_{w_-, b_-} (1/2) w_-^T G^- w_- + (w_-^T μ^{-T}) b_- + (1/2) l_2 b_-^2 + C_2 Σ_{i=1}^{l_2} η_i
s.t. μ_i^+ w_- + b_- ≥ 1 − η_i + k ||(Σ_i^+)^{1/2} w_-||_2, η_i ≥ 0,   (10)

where k = sqrt((1 − ε)/ε) and

G^+ = Σ_{i=1}^{l_1} (μ_i^{+T} μ_i^+ + Σ_i^+),   μ^+ = Σ_{i=1}^{l_1} μ_i^+,

with

G^- = Σ_{i=1}^{l_2} (μ_i^{-T} μ_i^- + Σ_i^-),   μ^- = Σ_{i=1}^{l_2} μ_i^-.

Let

H^+ = (1/2) [ G^+   μ^{+T}
              μ^+   l_1 ].   (11)

Then the matrix H^+ is positive semi-definite. To ensure the strict convexity of problem (9), we can always add a perturbation εI (ε > 0, I the identity matrix) such that the matrix H^+ + εI is positive definite. Without loss of generality, suppose that H^+ is positive definite.
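The reformulated problem (9) is a convex quadratic program with second-order cone constraints, so it can be handed to a conic solver directly. The sketch below is an illustrative implementation under our own assumptions, not the authors' code: it uses the CVXPY modelling package, adds the small ridge mentioned above to keep the quadratic form positive definite, and indexes the slack variables by the l2 negative-class constraints, following the structure of (3).

```python
import numpy as np
import cvxpy as cp

def cc_twsvm_plus(mu_pos, Sig_pos, mu_neg, Sig_neg, C1=1.0, eps=0.1):
    """Sketch of problem (9): the (w_+, b_+) hyperplane of CC-TWSVM.
    mu_pos: (l1, n) means of the +1 points, Sig_pos: (l1, n, n) their covariances;
    mu_neg: (l2, n) means of the -1 points, Sig_neg: (l2, n, n) their covariances."""
    l1, n = mu_pos.shape
    l2 = mu_neg.shape[0]
    k = np.sqrt((1.0 - eps) / eps)

    # G^+, mu^+ as defined after (10); H^+ as in (11), plus a small ridge for strict convexity.
    G = sum(np.outer(m, m) + S for m, S in zip(mu_pos, Sig_pos))
    mu_sum = mu_pos.sum(axis=0)
    H = 0.5 * np.block([[G, mu_sum[:, None]], [mu_sum[None, :], np.array([[float(l1)]])]])
    H += 1e-8 * np.eye(n + 1)
    L = np.linalg.cholesky(2.0 * H)          # so that z^T H z = 0.5 * ||L^T z||^2

    z = cp.Variable(n + 1)                   # z = (w_+, b_+)
    w, b = z[:n], z[n]
    xi = cp.Variable(l2, nonneg=True)

    objective = cp.Minimize(0.5 * cp.sum_squares(L.T @ z) + C1 * cp.sum(xi))
    constraints = []
    for i in range(l2):                      # second-order cone constraints of (9)
        root = np.linalg.cholesky(Sig_neg[i] + 1e-9 * np.eye(n))
        constraints.append(-(mu_neg[i] @ w + b) >= 1 - xi[i] + k * cp.norm(root.T @ w, 2))
    cp.Problem(objective, constraints).solve()
    return np.asarray(w.value), float(b.value)
```

The '−' problem (10) is symmetric, with the roles of the two classes exchanged.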
The dual problems of the chance-constrained TWSVM models (9) and (10) can be formulated as the following models:

max_{λ, ν} Σ_{i=1}^{l_1} λ_i − (1/2) s^{+T} H_1^{+T} G^+ H_1^+ s^+ − (1/2) l_1 s^{+T} H_2^{+T} H_2^+ s^+ − μ^+ H_1^+ s^+ H_2^+ s^+
s.t. s^+ = − Σ_{i=1}^{l_1} λ_i ( μ_i^{-T} + k (Σ_i^-)^{1/2} ν ),   (12)
     0 ≤ λ_i ≤ C_1, ||ν|| ≤ 1

and

max_{γ, υ} Σ_{i=1}^{l_2} γ_i − (1/2) s^{-T} H_1^{-T} G^- H_1^- s^- − (1/2) l_2 s^{-T} H_2^{-T} H_2^- s^- − μ^- H_1^- s^- H_2^- s^-
s.t. s^- = − Σ_{i=1}^{l_2} γ_i ( μ_i^{+T} − k (Σ_i^+)^{1/2} υ ),   (13)
     0 ≤ γ_i ≤ C_2, ||υ|| ≤ 1,

where

(H^+)^{-1} = [H_1^+, H_2^+],   (H^-)^{-1} = [H_1^-, H_2^-].

3. Numerical Experiments

In this section, the CC-TWSVM model is illustrated by numerical tests on two types of data sets. The first test verifies the performance of CC-TWSVM on artificial data. In the second test, we evaluate the performance of the CC-TWSVM model on real-world classification data sets from the UCI Machine Learning Repository. All results were averaged over 10 train-test experiments and obtained with Matlab R2012a on a 2.5 GHz CPU with 2.5 GB of usable RAM. The SeDuMi 3 software is employed to solve the SOCP problems of CC-TWSVM.

3.1. Artificial data

To give a direct interpretation of the CC-TWSVM performance, we randomly generate one uncertain set of 2-dimensional data. The normal distributions of the two classes are

μ^+ = (0, 2)^T, Σ^+ = [1 0; 0 4], μ^- = (−1, 0)^T, Σ^- = [7 0; 0 3].

Figure 1. The performance of CC-TWSVM in the first data set: (a) ε = 0.1; (b) ε = 0.01.

Figure 1 shows the performance of CC-TWSVM on the uncertain data set. In the numerical experiments, the data points of each class are generated from the corresponding distribution: one class is generated from the normal distribution (μ^+, Σ^+) and the other from (μ^-, Σ^-). Each class has 50 points; 20 points are randomly picked as training points and the remaining points are used for testing. In Figure 1, the blue stars are the points of the +1 class and the red circles those of the −1 class. For simplicity, we set ε to 0.1 and 0.01, respectively. The penalty parameters C_1 and C_2 are selected from the set {10^i | i = −5, · · · , 5}. After 10 experiments, we obtain the parameters of the two hyperplanes and take the averaged parameters as the final result. The blue and red lines are the separating hyperplanes (in 2-D) that we look for. In fact, the value of the parameter ε also affects the determination of the two hyperplanes: when ε decreases from 0.1 to 0.01, the average accuracy of the classifier is higher and the planes lie closer to their corresponding classes. Figures 1(a) and (b) show the effect of the different parameter values.
3 http://sedumi.ie.lehigh.edu/

3.2. Real data

This section presents numerical results on two real data sets. The following data sets were used in the experiments:
• WBCD The Wisconsin Breast Cancer Diagnostic data set was obtained from the UCI repository [16]. The WBCD data are 10-dimensional. The data set has 699 samples; the 444 benign samples are labeled as class +1 and the remaining malignant samples as class −1.
• IONOSPHERE The Ionosphere data set was also collected from the UCI repository. The Ionosphere data are 34-dimensional. The data set has 351 samples; the 225 good samples are labeled as class +1 and the remaining samples as class −1.
The distribution properties are often unknown and need to be estimated from the data points. If an uncertain point x̃_i = [x̃_i1, · · · , x̃_in]^T has N samples x_ik, k = 1, · · · , N, then the sample mean

x̄_i = (1/N) Σ_{k=1}^{N} x_ik

is used to estimate the mean vector μ_i = E[x̃_i], and the sample covariance

S_i = (1/(N − 1)) Σ_{k=1}^{N} (x_ik − x̄_i)(x_ik − x̄_i)^T

is used to estimate the covariance matrix Σ_i = E[(x̃_i − μ_i)(x̃_i − μ_i)^T].
However, these estimates can introduce errors, and in some conditions the mean vector μ_i and the covariance matrix Σ_i may not be obtainable. Panos M. Pardalos et al. [17] have discussed ways of processing these special cases. In our practical experiments we employ the methods mentioned there, similarly to Pardalos, to modify the estimation.
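A minimal numpy sketch of this estimation step (illustrative only; the array below stands in for the N repeated measurements of one uncertain point):

```python
import numpy as np

def estimate_moments(samples):
    """samples: (N, n) array of repeated measurements of one uncertain point.
    Returns the sample mean and the unbiased sample covariance (divisor N - 1)."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, ddof=1)
    return mean, cov

samples = np.array([[1.0, 2.1], [0.9, 2.0], [1.2, 1.8], [1.1, 2.2]])
mu_i, Sigma_i = estimate_moments(samples)
print(mu_i, Sigma_i)
```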
Since the data sets are uncertain, the measures of performance are worth studying. Ben-Tal et al. [11] proposed using the nominal error and the optimal error to evaluate the performance; in our experiments we use these indices to calculate the accuracy of our model. The formula for NomErr is

NomErr = ( Σ_i 1_{y_i^pre ≠ y_i} / (the amount of training data) ) × 100%.

The optimal error (OptErr) is defined on the basis of the misclassification probability. The chance constraints in models (6) and (7) can be reformulated as in (9) and (10), and we can then derive the least feasible value of ε, called ε_opt. The OptErr of a data point x_i is defined as

OptErr_i = 1 if y_i^pre ≠ y_i, and OptErr_i = ε_opt if y_i^pre = y_i,

and the OptErr of a data set is defined as

OptErr = ( Σ_i OptErr_i / (the amount of training data) ) × 100%.
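A small sketch of these two error measures (an illustrative implementation consistent with the definitions above; y_true, y_pred and eps_opt below are made-up inputs):

```python
import numpy as np

def nom_err(y_true, y_pred):
    """Nominal error: percentage of misclassified points."""
    return 100.0 * np.mean(y_pred != y_true)

def opt_err(y_true, y_pred, eps_opt):
    """Optimal error: 1 for misclassified points, eps_opt for correctly classified ones."""
    per_point = np.where(y_pred != y_true, 1.0, eps_opt)
    return 100.0 * per_point.mean()

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])
print(nom_err(y_true, y_pred), opt_err(y_true, y_pred, eps_opt=0.05))
```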

We tested on WBCD first. Because each data point in WBCD has 10 attributes, solving the SOCP directly would take too much time, so we used principal component analysis (PCA) to extract the two most important features. Then 80% of the data points were used for training and the remaining points as test data. For the parameter ε, the three values {0.1, 0.05, 0.01} were adopted separately. As in the experiments on artificial data, the penalty parameters C_1 and C_2 were selected from the set {10^i | i = −5, · · · , 5}.

Figure 2. The performance of CC-TWSVM in the Wisconsin breast cancer data set: (a) NomErr; (b) OptErr; (c) training time.

The average results over 10 runs are shown in Figure 2. In Figure 2(a), NomErr clearly decreases slightly when ε descends from 0.1 to 0.01, because ε represents the upper bound on the misclassification probability. The same holds for OptErr in Figure 2(b): when ε decreases from 0.1 to 0.01, the average OptErr rate decreases from approximately 5.4% to 5.3%. We can therefore conclude that the classification accuracy improves as the parameter ε decreases. Given the definitions of OptErr and NomErr, it is not difficult to see from the first two panels that OptErr is larger than NomErr. In addition, the model takes more time as ε increases, because the solution process of the second-order cone programming problem depends heavily on the parameters.

Figure 3. The performance of CC-TWSVM in the Ionosphere set: (a) NomErr; (b) OptErr; (c) training time.



Table 1. Misclassification rates of the different models


Data Sets Classes Instance Features TWSVM CC-SVM CC-TWSVM
Bliver 2 345 6 0.3521 0.3514 0.3504
Heart-c 2 303 14 0.1867 0.1875 0.1802
Hepatitis 2 155 19 0.2082 0.2074 0.1991
Ionosphere 2 351 34 0.0633 0.0625 0.0604
Votes 2 435 16 0.0824 0.0816 0.0736
WBCD 2 699 10 0.1643 0.1606 0.1578

The average results for the Ionosphere set over 10 runs are shown in Figure 3. Similarly to the procedure for WBCD, we obtained 3 principal attributes of Ionosphere by PCA. Based on these principal components, 80% of the data points were used for training and the remaining points as test data. For the parameter ε, the three values {0.1, 0.05, 0.01} were adopted, and the penalty parameters C_1 and C_2 were selected from the set {10^i | i = −5, · · · , 5}. We again conclude that the classification accuracy improves as the parameter ε decreases, and in this experiment it is also easy to see that OptErr is larger than NomErr. Moreover, because the SeDuMi software is used to solve the SOCP, the model takes more time as ε increases.
We also compared our model with previous models, namely TWSVM and CC-SVM. The experimental data sets were "Bliver", "Heart-c", "Hepatitis", "Ionosphere", "Votes", and "WBCD", all selected from the UCI repository. In the experiments, the penalty parameters of the three models were the same and were selected from the set {10^i | i = −5, · · · , 5}. The parameter ε in the CC-SVM and CC-TWSVM models was selected from the set {0.1, 0.05, 0.01}, and 80% of the data points were used for training and the remaining points as test data. The comparison of the previous models and our model is given in Table 1. It is easy to see that the average misclassification rate of CC-TWSVM is better than that of the original TWSVM. Furthermore, the performance of CC-TWSVM is better than that of CC-SVM. This is consistent with the observation that two nonparallel planes have advantages over a single hyperplane.

4. Conclusions

A new chance constrained twin support vector machine (CC-TWSVM), based on a chance constrained programming formulation, was proposed; it can efficiently handle data sets with measurement noise. This paper studied twin support vector machine classification when the data are statistically uncertain. With chance constrained programming (CCP) in the model, the CC-TWSVM ensures a low probability of classification error under the uncertainty. The CC-TWSVM model can be transformed into a second-order cone program (SOCP) using the moment information of the uncertain points, and the dual problem of the SOCP model was also introduced; the twin hyperplanes are then obtained by solving the dual problem. In addition, we showed the performance of the CC-TWSVM model on artificial and real data through numerical experiments. In future work, we will consider how to make the model more robust. Moreover, dealing with nonlinear classification with chance constraints is also of interest.

Acknowledgement

This work was supported by the joint Foundation of the Ministry of Education of China and China Mobile Communication Corporation (MCM20150505) and the Fundamental Research Funds for the Central Universities of Sichuan University (skqy201646).

References

[1] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, 2002.
[2] B. Z. Yang, M. H. Wang, H. Yang, T. Chen, Ramp loss quadratic support vector machine for classification, Nonlinear Analysis Forum, 21 (2016), 101-115.
[3] O. Mangasarian, E. Wild, Multisurface proximal support vector classification via generalized eigenval-
ues, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (2006), 69-74.
[4] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for pattern classification, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 29 (2007), 905-910.
[5] X. J. Peng, A v-twin support vector machine (v-TWSVM) classifier and its geometric algorithms, Infor-
mation Sciences, 180 (2010), 3863-3875.
[6] Y. J. Lee, O. L. Mangasarian, SSVM: a smooth support vector machine for classification, Computational
Optimization and Applications, 20 (2001), 5-22.
[7] D. Goldfarb, G. Iyengar, Robust convex quadratically constrained programs, Mathematical Programming, 97 (2003), 495-515.
[8] Paul Bosch, Julio López, Héctor Ramı́rez, Hugo Robotham, Support vector machine under uncertainty:
An application for hydroacoustic classification of fish-schools in Chile, Expert Systems with Applica-
tions, 40 (2013), 4029-4034.
[9] T. B. Trafalis, R. C. Gilbert, Robust classification and regression using support vector machines, Euro-
pean Journal of Operational Research, 173, (2006), 893-909.
[10] C. Bhattacharyya, L. R. Grate, M. I. Jordan, G. L. El, I. S. Mian, Robust sparse hyperplane classifier:
application to uncertain molecular profiling data, Journal of Computational Biology, 11 (2004), 1073-
1089.
[11] A. Ben-Tal, S. Bhadra, C. Bhattacharyya, J.S. Nath, Chance constrained uncertain classification via robust optimization, Mathematical Programming, 127 (2011), 145-173.
[12] A. Ben-Tal, A. Nemirovski, Selected topics in robust convex optimization, Mathematical Programming,
112 (2008), 125-158.
[13] D. Bertsimas, I. Popescu, Optimal inequalities in probability theory: a convex optimization approach, SIAM Journal on Optimization, 15 (2005), 780-804.
[14] A. W. Marshall, I. Olkin, Multivariate Chebyshev inequalities, The Annals of Mathematical Statistics, 31 (1960), 1001-1014.
[15] A. Nemirovski, A. Shapiro, Convex approximations of chance constrained programs, SIAM Journal on Optimization, 17 (2006), 969-996.
[16] A. Frank and A. Asuncion, UCI Machine Learning Repository, 2010. Available at
http://archive.ics.uci.edu/ml.
[17] X. Wang, N. Fan, P. M. Pardalos, Robust chance-constrained support vector machines with second-order
moment information. Annals of Operations Research, (2015), 10.1007/s10479-015-2039-6
Fuzzy Systems and Data Mining II 131
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-131

Set-Theoretic Kripke-Style Semantics for Monoidal T-Norm (Based) Logics
Eunsuk YANG 1
Department of Philosophy, Chonbuk National University, Jeonju, KOREA

Abstract. This paper deals with non-algebraic binary relational semantics, called
here set-theoretic Kripke-style semantics, for monoidal t-norm (based) logics. For
this, we first introduce the system MTL (Monoidal t-norm logic) and some of its
prominent axiomatic extensions, and then their corresponding Kripke-style seman-
tics. Next, we provide set-theoretic completeness results for them.
Keywords. relational semantics, (set-theoretic) Kripke-style semantics, substructural
logic, fuzzy logic, t-norm (based) logics

1. Algebraic Kripke-style Semantics

After algebraic semantics for t-norm (based) logics were introduced, corresponding Kripke-style semantics were developed. For instance, after Esteva and Godo introduced algebraic semantics for monoidal t-norm (based) logics in [4], corresponding Kripke-style semantics were introduced by Montagna and Ono [6], Montagna and Sacchetti [7], and Diaconescu and Georgescu [3]. These semantics have one important common feature:
• While such semantics are called Kripke-style semantics in the sense that they are provided using forcing relations, they are still algebraic in the sense that their completeness results are obtained using the fact that such semantics are equivalent to algebraic semantics.
Because of this fact, Yang [8,9,10] called these semantics algebraic Kripke-style semantics. Although non-algebraic Kripke-style semantics, where "non-algebraic" means that the completeness results are obtained without using the above fact, have been provided for some particular systems (see e.g. [9]), such semantics have not yet been established for basic fuzzy logics in general.
The aim of this paper is to provide set-theoretic Kripke-style semantics for basic core
fuzzy logics2 . As its starting point, we investigate set-theoretic Kripke-style semantics
for the logic system MTL (Monoidal t-norm logic) and its most prominent axiomatic
1 Corresponding Author: Eunsuk Yang, Department of Philosophy & Institute of Critical Thinking and
Writing, Chonbuk National University, Rm 307, College of Humanities Blvd. (14-1), Jeonju, 54896, KOREA
Email: eunsyang@jbnu.ac.kr.
2 Here, fuzzy logics are logics complete with respect to (w.r.t.) linearly ordered algebras and core fuzzy logics

are logics complete w.r.t. the real unit interval [0, 1] (see [1,2]).

extensions. For this, first, in Section 2, we discuss monoidal t-norm (based) logics and
their corresponding Kripke-style semantics. Next, in Section 3, we provide set-theoretic
completeness results for them.
For convenience, we adopt notations and terminology similar to those in [1,7,8,9,10]
and assume reader familiarity with them (together with results found therein).

2. Monoidal T-norm Logics and Kripke-style Semantics

Monoidal t-norm (based) logics are based on a countable propositional language with the set of formulas FOR built inductively from a set of propositional variables VAR, propositional constants ⊤, ⊥, and binary connectives →, &, ∧, and ∨. Further connectives are defined as follows:

df1. ϕ ↔ ψ := (ϕ → ψ) ∧ (ψ → ϕ), and


df2. ¬ϕ := ϕ → ⊥.

The constant ⊤ may be defined as ⊥ → ⊥. Henceforth, the customary notations and terminology are followed, and the axiom systems are used to provide a consequence relation. We first list the axioms and rules of MTL, the most basic monoidal t-norm logic.
We first list the axioms and rules of MTL, the most basic monoidal t-norm logic.

Definition 1. The logic MTL is axiomatized as follows:


A1. (ϕ → ψ) → ((ψ → χ) → (ϕ → χ)) (suffixing, SF)
A2. ϕ → ϕ (reflexivity, R)
A3. (ϕ ∧ ψ) → ϕ, (ϕ ∧ ψ) → ψ (∧-elimination, ∧-E)
A4. ((ϕ → ψ) ∧ (ϕ → χ)) → (ϕ → (ψ ∧ χ)) (∧-introduction, ∧-I)
A5. ϕ → (ϕ ∨ ψ), ψ → (ϕ ∨ ψ) (∨-introduction, ∨-I)
A6. ((ϕ → χ) ∧ (ψ → χ)) → ((ϕ ∨ ψ) → χ) (∨-elimination, ∨-E)
A7. ⊥ → ϕ (ex falsum quodlibet, EF)
A8. ϕ → ⊤ (verum ex quodlibet, VE)
A9. (ϕ → (ψ → χ)) ↔ (ψ → (ϕ → χ)) (permutation, PM)
A10. (ϕ → (ψ → χ)) ↔ ((ϕ&ψ) → χ) (residuation, RES)
A11. (ϕ&ψ) → ϕ (weakening, W)
A12. (ϕ → ψ) ∨ (ψ → ϕ) (prelinearity, PL)
ϕ → ψ, ϕ ⊢ ψ (modus ponens, mp)
ϕ, ψ ⊢ ϕ ∧ ψ (adjunction, adj)

Well-known monoidal t-norm logics are axiomatic extensions (extensions for short)
of MTL. We introduce some prominent examples.

Definition 2. The following are famous monoidal t-norm logics extending MTL:

• Basic fuzzy logic BL is MTL plus (DIV) (ϕ ∧ ψ) → (ϕ&(ϕ → ψ)).


• Łukasiewicz logic Ł is BL plus (DNE) ¬¬ϕ → ϕ.
• Gödel logic G is BL plus (CTR) ϕ → (ϕ&ϕ).
• Product logic Π is BL plus (CAN) (ϕ → ⊥) ∨ ((ϕ → (ϕ&ψ)) → ψ).
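For orientation, the standard [0, 1]-valued models of Ł, G and Π are given by the familiar continuous t-norms together with their residua. The following summary is a well-known standard fact that we add for context (it is not stated in this form in the paper); each of these t-norms, with min and max as lattice operations, yields an L frame in the sense of Definition 4 below.

```latex
\[
  x *_{\text{\L}} y = \max(0,\, x + y - 1), \qquad x \to_{\text{\L}} y = \min(1,\, 1 - x + y),
\]
\[
  x *_{\mathrm{G}} y = \min(x, y), \qquad x \to_{\mathrm{G}} y = \begin{cases} 1, & x \le y \\ y, & x > y, \end{cases}
\]
\[
  x *_{\Pi} y = x \cdot y, \qquad x \to_{\Pi} y = \begin{cases} 1, & x \le y \\ y/x, & x > y. \end{cases}
\]
```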

For easy reference, we let Ls be a set of the monoidal t-norm logics defined previ-
ously.

Definition 3. Ls = {MTL, BL, Ł, G, Π}.


For convenience, "⊤", "⊥", "¬", "→", "∧", and "∨" are used ambiguously as propositional constants and connectives and as top and bottom elements and frame operators, but context should clarify their meanings.
Now we provide Kripke-style semantics for Ls. First, Kripke frames are defined as
follows.
Definition 4. (Cf. [7,8,10])
(i) (Kripke frame) A Kripke frame is a structure X = (X, ⊤, ⊥, ≤, ∗), where (X, ⊤, ≤, ∗) is an integral linearly ordered commutative monoid such that ∗ is residuated, i.e., for every a, b ∈ X, the set {c : c ∗ a ≤ b} has a supremum, denoted here a → b. The elements of X are called nodes.
(ii) (L frame) An MTL frame is a Kripke frame where ∗ is conjunctive (i.e., ⊥ ∗ ⊤ = ⊥) and left-continuous (i.e., if sup{c_i : i ∈ I} exists, then sup{c_i ∗ a : i ∈ I} = sup{c_i : i ∈ I} ∗ a). Consider the following conditions: for all a, b ∈ X,
• (DIV^F) min{a, b} ≤ a ∗ (a → b).
• (DNE^F) ¬¬a ≤ a.
• (CTR^F) a ≤ a ∗ a.
• (CAN^F) ⊤ = a → ⊥ or ⊤ = (a → (a ∗ b)) → b.
BL frames are MTL frames satisfying (DIV^F); Ł frames are BL frames satisfying (DNE^F); G frames are BL frames satisfying (CTR^F); Π frames are BL frames satisfying (CAN^F). We call all these frames (including MTL frames) L frames.
An evaluation on a Kripke frame is a forcing relation ⊩ between nodes and propositional variables, constants, and arbitrary formulas satisfying the following conditions. For every propositional variable p,
(Atomic hereditary condition, AHC) if a ⊩ p and b ≤ a, then b ⊩ p;
(min) ⊥ ⊩ p;
for the propositional constant ⊥,
(⊥) a ⊩ ⊥ iff a = ⊥; and
for arbitrary formulas,
(∧) a ⊩ ϕ ∧ ψ iff a ⊩ ϕ and a ⊩ ψ;
(∨) a ⊩ ϕ ∨ ψ iff either a ⊩ ϕ or a ⊩ ψ;
(&) a ⊩ ϕ&ψ iff there exist b, c ∈ X such that a ≤ b ∗ c, b ⊩ ϕ, and c ⊩ ψ;
(→) a ⊩ ϕ → ψ iff for each b ∈ X, if b ⊩ ϕ, then a ∗ b ⊩ ψ.
Definition 5. (i) (Kripke model) A Kripke model is a pair (X, ⊩), where X is a Kripke frame and ⊩ is a forcing relation on X.
(ii) (L model) An L model is a pair (X, ⊩), where X is an L frame and ⊩ is a forcing relation on X.
Definition 6. Given a Kripke model (X, ⊩), a node a of X and a formula ϕ, we say that a forces ϕ to express a ⊩ ϕ. We say that ϕ is true in (X, ⊩) if ⊤ ⊩ ϕ, and that ϕ is valid in the frame X (expressed by X |= ϕ) if ϕ is true in (X, ⊩) for each forcing relation ⊩ on X.

3. Soundness and Completeness for Ls

We first introduce two lemmas, which can be easily proved:

Lemma 1. (Hereditary condition, HC) Let X be a Kripke frame. For every formula ϕ and any two nodes a, b ∈ X, if a ⊩ ϕ and b ≤ a, then b ⊩ ϕ.

Lemma 2. ⊤ ⊩ ϕ → ψ iff for every a ∈ X, if a ⊩ ϕ, then a ⊩ ψ.

We then provide soundness and completeness results for Ls.

Proposition 1. ([6]) Let X = (X, ⊤, ⊥, ≤, ∗) be an L frame and v be an evaluation in X. For each atomic formula p and each a ∈ X, let a ⊩ p iff a ≤ v(p). Then (X, ⊩) is an L model, and for each formula ϕ and each a ∈ X, we have that a ⊩ ϕ iff a ≤ v(ϕ).
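To see Definition 4 and Proposition 1 in action on a concrete frame, here is a small illustrative evaluator (our own sketch, not from the paper) over the finite Łukasiewicz chain {0, 1/4, 1/2, 3/4, 1}, where ∗ is the Łukasiewicz t-norm, → its residuum, ∧ and ∨ are min and max, and a ⊩ ϕ is checked as a ≤ v(ϕ):

```python
from fractions import Fraction

# Finite Lukasiewicz chain: a concrete L frame in the sense of Definition 4.
def star(a, b):
    return max(Fraction(0), a + b - 1)          # monoid operation *

def arrow(a, b):
    return min(Fraction(1), 1 - a + b)          # residuum: sup{c : c * a <= b}

def value(formula, v):
    """Evaluate a formula given atomic values v; formulas are atoms or nested tuples (op, lhs, rhs)."""
    if isinstance(formula, str):
        return Fraction(1) if formula == "T" else Fraction(0) if formula == "F" else v[formula]
    op, lhs, rhs = formula
    a, b = value(lhs, v), value(rhs, v)
    return {"&": star, "->": arrow, "and": min, "or": max}[op](a, b)

def forces(a, formula, v):
    """a forces phi iff a <= v(phi), as in Proposition 1."""
    return a <= value(formula, v)

v = {"p": Fraction(3, 4), "q": Fraction(1, 2)}
phi = ("->", ("&", "p", "p"), "q")               # (p & p) -> q
print(value(phi, v), forces(Fraction(1), phi, v))
```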

Proposition 2. (Soundness) For any formula ϕ, if ⊢_L ϕ, then ϕ is valid in every L frame.

Proof. Here we consider the formulas (PL), (DIV), (DNE), (CTR) and (CAN) as examples.
(PL): By the condition (∨), it is sufficient to prove that ⊤ ⊩ ϕ → ψ or ⊤ ⊩ ψ → ϕ. By Proposition 1, we can instead show that ⊤ ≤ v(ϕ → ψ) or ⊤ ≤ v(ψ → ϕ). Proposition 1 also ensures v(ϕ → ψ) = v(ϕ) → v(ψ) for all formulas ϕ and ψ. If v(ϕ) ≤ v(ψ), then ⊤ ∗ v(ϕ) ≤ v(ψ) and thus ⊤ ≤ v(ϕ → ψ). If v(ψ) ≤ v(ϕ), then ⊤ ∗ v(ψ) ≤ v(ϕ) and thus ⊤ ≤ v(ψ → ϕ).
(DIV): Lemma 2 ensures that, in order to prove ⊤ ⊩ (ϕ ∧ ψ) → (ϕ&(ϕ → ψ)), it is sufficient to show that for each node a ∈ X, if a ⊩ ϕ ∧ ψ, then a ⊩ ϕ&(ϕ → ψ). By Proposition 1, we can instead assume a ≤ v(ϕ ∧ ψ) and show a ≤ v(ϕ&(ϕ → ψ)). Note that Proposition 1 also ensures v(ϕ ∧ ψ) = min{v(ϕ), v(ψ)} and v(ϕ&ψ) = v(ϕ) ∗ v(ψ) for all formulas ϕ and ψ. Then, since min{v(ϕ), v(ψ)} ≤ v(ϕ) ∗ (v(ϕ) → v(ψ)) by (DIV^F), we have a ≤ v(ϕ&(ϕ → ψ)).
(DNE): As above, it is sufficient to prove that for each a ∈ X, if a ⊩ ¬¬ϕ, then a ⊩ ϕ. By Proposition 1, we instead assume a ≤ v(¬¬ϕ) and show a ≤ v(ϕ). Note that v(¬ϕ) = v(ϕ → ⊥) = ¬v(ϕ). Then, since a ≤ v(¬¬ϕ) = ¬¬v(ϕ) and ¬¬v(ϕ) ≤ v(ϕ) by (DNE^F), we have a ≤ v(ϕ).
(CTR): As above, it is sufficient to prove that for each a ∈ X, if a ⊩ ϕ, then a ⊩ ϕ&ϕ. Let a ⊩ ϕ. By Proposition 1, we have a ≤ v(ϕ). Then, using the monotonicity and (CTR^F), we also have a ≤ a ∗ a ≤ v(ϕ) ∗ v(ϕ). Hence, by the condition (&) and Proposition 1, we obtain a ⊩ ϕ&ϕ.
(CAN): We need to show that either ⊤ ⊩ ϕ → ⊥ or ⊤ ⊩ (ϕ → (ϕ&ψ)) → ψ. Obviously, v(ϕ) = ⊥ ensures ⊤ ≤ v(ϕ → ⊥) since v(⊥ → ⊥) = v(⊥) → v(⊥) = v(⊤). Thus, by Proposition 1, we have ⊤ ⊩ ϕ → ⊥ in case v(ϕ) = ⊥. Let v(ϕ) ≠ ⊥. In order to prove ⊤ ⊩ (ϕ → (ϕ&ψ)) → ψ, we assume a ⊩ ϕ → (ϕ&ψ) and show a ⊩ ψ. By Proposition 1, we instead assume a ≤ v(ϕ → (ϕ&ψ)) and show a ≤ v(ψ). Then, since a ≤ v(ϕ → (ϕ&ψ)) = v(ϕ) → v(ϕ&ψ) = v(ϕ) → (v(ϕ) ∗ v(ψ)) and ⊤ = (v(ϕ) → (v(ϕ) ∗ v(ψ))) → v(ψ) by (CAN^F), we have a ≤ v(ψ).
We leave the proofs for the other cases to the interested reader.

Now, we provide completeness results for Ls. A theory T is said to be linear if, for each pair ϕ, ψ of formulas, we have T ⊢ ϕ → ψ or T ⊢ ψ → ϕ. By an L-theory, we mean a theory T closed under the rules of L. By a regular L-theory, we mean an L-theory containing all of the theorems of L. Since we have no use for irregular theories, by an L-theory we henceforth mean an L-theory containing all of the theorems of L.
Let T be a linear L-theory. We define the canonical L frame determined by T as a structure X_can = (X_can, ⊤_can, ⊥_can, ≤_can, ∗_can), where ⊤_can = T, ⊥_can = {ϕ : T ⊢_L ⊥ → ϕ}, X_can is the set of linear L-theories extending ⊤_can, ≤_can is ⊇ restricted to X_can, i.e., a ≤_can b iff {ϕ : a ⊢_L ϕ} ⊇ {ϕ : b ⊢_L ϕ}, and ∗_can is defined as a ∗_can b := {ϕ&ψ : ϕ ∈ a, ψ ∈ b}, satisfying the integral commutative monoid properties corresponding to L frames on (X_can, ⊤_can, ≤_can). Notice that we construct the base ⊤_can as the linear L-theory that excludes the nontheorems of L, i.e., excludes any formula ϕ such that ⊬_L ϕ. The linear orderedness of the canonical L frame depends on ≤_can restricted to X_can.
First, we can easily show the following.

Proposition 3. A canonical L frame is connected and thus linearly ordered.

Proof. It is easy to show that a canonical L frame is partially ordered. We show that this frame is connected and thus linearly ordered. Suppose toward contradiction that neither a ≤_can b nor b ≤_can a. Then, there are ϕ, ψ such that ϕ ∈ b, ϕ ∉ a, ψ ∈ a, and ψ ∉ b. Note that, since ⊤_can is a linear theory, ϕ → ψ ∈ ⊤_can or ψ → ϕ ∈ ⊤_can. Let ϕ → ψ ∈ ⊤_can and thus ϕ → ψ ∈ b. Then, by (mp), we have ψ ∈ b, a contradiction. The case where ψ → ϕ ∈ ⊤_can is analogous.

Next, let v_can be a canonical evaluation function from formulas to sets of formulas, i.e., v_can(ϕ) = {ϕ}. We define the canonical forcing relation as follows:
(a) a ⊩_can ϕ iff ϕ ∈ a.
This definition allows us to state the following lemmas.

Lemma 3. ⊤_can ⊩_can ϕ → ψ iff for each a ∈ X_can, if a ⊩_can ϕ, then a ⊩_can ψ.

Proof. By (a), we need to show that ϕ → ψ ∈ ⊤_can iff for all a ∈ X_can, if ϕ ∈ a, then ψ ∈ a. For the left-to-right direction, we assume ϕ → ψ ∈ ⊤_can and ϕ ∈ a, and show ψ ∈ a. The definition of ∗_can ensures (ϕ → ψ)&ϕ ∈ ⊤_can ∗_can a = a. Since L proves ((ϕ → ψ)&ϕ) → ψ, we have ((ϕ → ψ)&ϕ) → ψ ∈ ⊤_can and thus ((ϕ → ψ)&ϕ) → ψ ∈ a. Therefore, we obtain ψ ∈ a by (mp). We prove the other direction contrapositively. Suppose ϕ → ψ ∉ ⊤_can. We set a_0 = {Z : there exists X ∈ ⊤_can such that ⊤_can ⊢ (X&ϕ) → Z}. Clearly, a_0 ⊇ ⊤_can, ϕ ∈ a_0, but ψ ∉ a_0. (Otherwise, ⊤_can ⊢ (X&ϕ) → ψ and thus ⊤_can ⊢ X → (ϕ → ψ); therefore, since ⊤_can ⊢ X, by (mp), we have ⊤_can ⊢ ϕ → ψ, a contradiction.) Then, by the Linear Extension Property of Theorem 12.9 in [2], we have a linear theory a ⊇ a_0 with ψ ∉ a; therefore ϕ ∈ a but ψ ∉ a.

Lemma 4. (Canonical evaluation lemma) The canonical forcing relation ⊩_can is an evaluation.

Proof. We first consider the conditions for propositional variables.
For (AHC), we must show that, for each propositional variable p,

if a ⊩_can p and b ≤_can a, then b ⊩_can p.
Let a ⊩_can p and b ≤_can a. By (a), we have p ∈ a and a ⊆ b, and thus p ∈ b. Hence, by (a), we have b ⊩_can p.
For (min), we must show that, for each propositional variable p,
⊥_can ⊩_can p.
By (a), we need to show that p ∈ ⊥_can. Since ⊥_can = {ϕ : T ⊢_L ⊥ → ϕ}, p ∈ ⊥_can.
We next consider the condition for the propositional constant ⊥.
For (⊥), we must show that
a ⊩_can ⊥ iff a = ⊥_can.
By (a), we need to show that ⊥ ∈ a iff a = ⊥_can. This is obvious since ⊥_can = {ϕ : T ⊢_L ⊥ → ϕ}.
Now we consider the conditions for arbitrary formulas.
For (∧), we must show
a ⊩_can ϕ ∧ ψ iff a ⊩_can ϕ and a ⊩_can ψ.
By (a), we need to show that ϕ ∧ ψ ∈ a iff ϕ ∈ a and ψ ∈ a. We can prove the left-to-right direction using the axiom (∧-E) and the rule (mp). We can also prove the right-to-left direction using the rule (adj).
For (∨), we must show
a ⊩_can ϕ ∨ ψ iff either a ⊩_can ϕ or a ⊩_can ψ.
By (a), we need to show that ϕ ∨ ψ ∈ a iff either ϕ ∈ a or ψ ∈ a. We can prove the left-to-right direction using the fact that a is linear and linear theories are also prime theories. We can also prove the right-to-left direction using the axiom (∨-I) and the rule (mp).
For (&), we must show
a ⊩_can ϕ&ψ iff there exist b, c ∈ X_can such that b ⊩_can ϕ, c ⊩_can ψ, and a ≤_can b ∗_can c.
By (a), we need to show that ϕ&ψ ∈ a iff there exist b, c ∈ X_can such that ϕ ∈ b, ψ ∈ c, and a ≤_can b ∗_can c. For the right-to-left direction, we assume that there exist b, c ∈ X_can such that ϕ ∈ b, ψ ∈ c, and a ≤_can b ∗_can c. Then, using the definition of ∗_can, we obtain ϕ&ψ ∈ a. For the left-to-right direction, we assume that, for all b, c ∈ X_can, if ϕ ∈ b and ψ ∈ c, then a ≤_can b ∗_can c does not hold, and we show ϕ&ψ ∉ a. Let ϕ ∈ b and ψ ∈ c. Since a ≤_can b ∗_can c does not hold, we obtain ϕ&ψ ∉ a.
For (→), we must show
a ⊩_can ϕ → ψ iff for each b ∈ X_can, if b ⊩_can ϕ, then a ∗_can b ⊩_can ψ.
By (a), we need to show that ϕ → ψ ∈ a iff for each b ∈ X_can, if ϕ ∈ b, then ψ ∈ a ∗_can b. For the left-to-right direction, we assume ϕ → ψ ∈ a and ϕ ∈ b, and show ψ ∈ a ∗_can b. The definition of ∗_can ensures (ϕ → ψ)&ϕ ∈ a ∗_can b. Then, since L proves ((ϕ → ψ)&ϕ) → ψ, using Lemma 3, we obtain ψ ∈ a ∗_can b. We prove the right-to-left direction contrapositively. Suppose ϕ → ψ ∉ a. We need to construct a linear theory b such that ϕ ∈ b and ψ ∉ a ∗_can b. Let b_0 be the smallest regular L-theory extending ⊤_can with {ϕ} and satisfying a ∗_can b_0 = {Z : there is X ∈ a such that ⊤_can ⊢ (X&ϕ) → Z}. Clearly, ϕ ∈ b_0, but ψ ∉ a ∗_can b_0. (Otherwise, ⊤_can ⊢ (X&ϕ) → ψ and thus ⊤_can ⊢ X → (ϕ → ψ) for some X ∈ a; therefore, ϕ → ψ ∈ a, a contradiction.) Then, by the Linear Extension Property, we can obtain a linear theory b such that b_0 ⊆ b and a ∗_can b = {Z : there is X ∈ a such that ⊤_can ⊢ (X&ϕ) → Z}; therefore, ϕ ∈ b but ψ ∉ a ∗_can b.
Let a model M for L be called an L model. Using Lemma 4, we can show that the canonically
defined (Xcan, ⊩can) is an L model. Then, by construction, can excludes our chosen
nontheorem ϕ, and the canonical definition of ⊨ agrees with membership. Therefore, we
can say that, for each nontheorem ϕ of L, there exists an L model in which ϕ is not valid,
i.e., not can ⊨ ϕ. This gives us the following weak completeness of L.

Theorem 1. (Weak completeness) For any formula ϕ, if ϕ is valid in every L frame, then
L ϕ.

Furthermore, using Lemma 4 and the Linear Extension Property, we can show the
strong completeness of L as follows.

Theorem 2. (Strong completeness) L is strongly complete w.r.t. the class of all L frames.

4. Concluding Remarks

We investigated set-theoretic Kripke-style semantics for some prominent t-norm (based)
logics. But we have not yet considered such semantics for fuzzy logics based on more
general structures. We will investigate this in a subsequent paper.

Acknowledgments: This work was supported by the Ministry of Education of the Repub-
lic of Korea and the National Research Foundation of Korea (NRF-2016S1A5A8018255).

References

[1] P. Cintula, R. Horčı́k and C. Noguera, Non-associative substructural logics and their semilinear exten-
sions: axiomatization and completeness properties, Review of Symbolic Logic 6 (2013), 394-423.
[2] P. Cintula, R. Horčı́k and C. Noguera, The quest for the basic fuzzy logic, in: Petr Hájek on Mathematical
Fuzzy Logic, F. Montagna, ed., Springer, Dordrecht, 2015, pp. 245-290.
[3] D. Diaconescu and G. Georgescu, On the forcing semantics for monoidal t-norm-based logic, Journal
of Universal Computer Science 13 (2007), 1550-1572.
[4] F. Esteva and L. Godo, Monoidal t-norm based logic: towards a logic for left-continuous t-norms, Fuzzy
Sets and Systems 124 (2001), 271-288.
[5] P. Hájek, Metamathematics of Fuzzy Logic, Kluwer, Amsterdam, 1998.
[6] F. Montagna and H. Ono, Kripke semantics, undecidability and standard completeness for Esteva and
Godo’s Logic MTL∀, Studia Logica 71 (2002), 227-245.
[7] F. Montagna and L. Sacchetti, Kripke-style semantics for many-valued logics, Mathematical Logic
Quarterly 49 (2003), 629-641.
[8] E. Yang, Algebraic Kripke-style semantics for relevance logics, Journal of Philosophical Logic 43
(2014), 803-826.
[9] E. Yang, Two kinds of (binary) Kripke-style semantics for three-valued logic, Logique et Analyse 231
(2015), 377-394.
[10] E. Yang, Algebraic Kripke-style semantics for substructural fuzzy logics, Korean Journal of Logic 19
(2016), 295-322.
Data Mining
Fuzzy Systems and Data Mining II 141
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-141

Dynamic Itemset Mining Under Multiple Support Thresholds
Nourhan ABUZAYED and Belgin ERGENÇ1
Computer Engineering Department, Izmir Institute of Technology, Urla, Izmir, Turkey

Abstract. Handling the dynamic aspect of databases and the multiple support threshold
requirements of items are two important challenges for frequent itemset mining
algorithms. Existing dynamic itemset mining algorithms are devised for a single
support threshold, whereas multiple support threshold algorithms assume that the
databases are static. This paper focuses on the dynamic update problem of frequent
itemsets under MIS (Multiple Item Support) thresholds and introduces the Dynamic
MIS algorithm. It i) is tree based and scans the database once, ii) considers
multiple support thresholds, and iii) handles increments of additions, additions
with new items, and deletions. The proposed algorithm is compared to CFP-Growth++,
and the findings are that, on a dynamic database, 1) Dynamic MIS performs better
than CFP-Growth++ since it runs only on increments, and 2) Dynamic MIS can achieve
a speed-up of up to 56 times against CFP-Growth++.

Keywords. Association rule mining, itemset mining, dynamic itemset mining, multiple support thresholds

Introduction

Recently, intensive research has focused on association rule mining, which is one of the
main functions of data mining [1]. The association rule was first introduced by Agrawal et
al. [2]; it is stated as "X% of the customers who buy item A also buy item B" and is
denoted as A→B. Association rules are meant to find the impact of a set of items on
another set of items. The frequency of an itemset (items that co-occur in a transaction) is
referred to as its support count, which is the number of transactions that contain the
itemset. An itemset is frequent if its support count satisfies the minimum support
(minsup) threshold [3]. The confidence of an association rule X→Y is the ratio of the number
of transactions that contain X∪Y to the number of transactions that contain X [2, 4].
Association rule mining has two main steps: 1) finding frequent itemsets/patterns,
2) generating association rules [5]. The first step is more expensive, and several
algorithms have been proposed to find the frequent itemsets from huge databases. The
most classical one is the Apriori algorithm, which uses a candidate generation and testing
approach. Other subsequent algorithms using Apriori-like techniques were introduced in
[6-12]. FP-Growth [4] and Matrix Apriori [13, 14] are more recent algorithms that
try to overcome the drawbacks of candidate generation and multiple scans of the
database.

1
Corresponding Author: Belgin ERGENÇ, Computer Engineering Department, Izmir Institute of
Technology, Urla, Izmir, Turkey; Email: belginergenc@iyte.edu.tr.

The major disadvantages of these itemset mining algorithms are 1) their dependence on a
single user-given minsup and 2) their inapplicability to dynamic databases. A single
support is not enough to represent the characteristics of the items and causes the
rare item problem [15]. Some algorithms like MSapriori [16], CFP-Growth [17],
CFP-Growth++ [18] and MISFP-Growth [19] were introduced to find frequent patterns
under multiple support thresholds. For the second disadvantage, several algorithms
have been introduced in [20-27]. These algorithms perform faster and use fewer system
resources since they update frequent association rules by considering only the updates
instead of repeating the whole mining process from the beginning.
All the mentioned works handle either dynamic itemset mining with a single support
threshold or static itemset mining with multiple support thresholds. In this paper, the
Dynamic MIS (Multiple Item Support) algorithm, which provides a solution to the
dynamic itemset mining under multiple support thresholds problem, is introduced. This
algorithm is tree based, scans the database only once, avoids the candidate generation
problem, and handles increments of additions, additions with new items, and deletions
by using a dynamic MIS-tree. The closest work to ours is the incremental tuning
mechanism introduced in [28]. The proposed algorithm, Dynamic MIS, is compared to
CFP-Growth++ using four datasets. The findings reveal that, on a dynamic database,
Dynamic MIS performs better than CFP-Growth++ since it runs only on increments, and the
speed-up gained by Dynamic MIS can reach up to 56 times with large sparse datasets.
The organization of this paper is as follows: Section 1 introduces the Dynamic MIS
algorithm with its builder and increment handling parts. Section 2 shows the performance
evaluation. Section 3 is dedicated to the concluding remarks.

1. Dynamic MIS Algorithm

The Dynamic MIS algorithm provides a solution to the dynamic itemset mining under
multiple support thresholds problem by maintaining a dynamic MIS-tree and two header
tables that keep the support counts of all items of the database. Frequent pattern
generation from the tree is done by the related module of the CFP-Growth++ algorithm [18].
Throughout the section, we use the following running example. Table 1 presents
a sample database D, and Table 2 illustrates the user-given multiple item support (MIS)
for each item in decreasing order, together with each item's actual support in the database D.
In the rightmost column of Table 1, the items of each transaction are ordered by decreasing
MIS value as given in Table 2.
Table 1. Transaction database D [17].
TID   Item bought      Item bought (ordered)
100   D, C, A, F       A, C, D, F
200   G, C, A, F, E    A, C, E, F, G
300   B, A, C, F, H    A, B, C, F, H
400   G, B, F          B, F, G
500   B, C             B, C

Table 2. MIS and actual support of each item in D [17].
Item          A    B    C    D    E    F    G    H
MIS (%)       80   80   80   60   60   40   40   40
Support (%)   60   60   80   20   20   80   40   20

1.1. Building MIS-tree

To build the MIS-tree, the MIS-tree builder algorithm illustrated in Figure 1 is used.
First, the MIS sorted list is created from the MIS values in Table 2 and ordered in
decreasing order (Line 1); then the root node of the tree is created (Line 2). Primary and
secondary header tables are created (Line 3) as shown in Figure 2.
INPUT: Database D, Minimum item supports MIS
OUTPUT: MISsorted, MIS-tree

BEGIN
1 Build MISsorted list (in decreasing order)
2 Create the root of MIS-tree as null
3 Create primary and secondary header tables
4 Insert items into primary table (count=0)
5 Scan D
6 FOR each transaction T in D do:
7 Sort all items in T (as MISsorted)
8 Add T to the tree
9 END FOR
10 Calculate the support of items in D
11 Update the supports in the tables
12 Relocate items between header tables
END

Figure 1. MIS-tree builder algorithm. Figure 2. MIS-tree by MIS tree builder.


After that, the items are ordered as in MISsorted and then inserted into the primary header table
with an item count of 0 (Line 4). Database D is scanned, and the transactions are added to
the tree (Lines 5-9). First, the items in each new transaction are sorted in decreasing order
according to the MISsorted list, as in the rightmost column of Table 1. Then the transaction is
added to the tree: if the transaction shares a prefix with previous transactions, the counts of
these prefix nodes are incremented by one; otherwise, new nodes are created starting from
the root node with an item count equal to one. The item's count in the node and in the header
table is incremented. Nodes of the same item are linked all through the tree and to the header
table. The supports of all items in D are calculated and then updated in the header tables.
Eventually, the items are distributed over the two header tables; here, items with support
more than the MIN MIS value (40%) are inserted into the primary header table, and the rest
are inserted into the secondary header table. Likewise, the node links are arranged.
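To make the construction concrete, the following is a minimal Python sketch of the builder just described. It is our own illustration (class and variable names such as MISTreeNode and min_mis are not from the paper), not the authors' C# implementation, and it omits the node links.

class MISTreeNode:
    # Prefix-tree node: item label, occurrence count, children keyed by item.
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

def build_mis_tree(transactions, mis):
    # Sort items by decreasing MIS, insert each transaction into a prefix tree,
    # and keep per-item support counts in primary/secondary header tables.
    mis_sorted = sorted(mis, key=lambda it: mis[it], reverse=True)   # Line 1
    root = MISTreeNode()                                             # Line 2
    primary = {it: 0 for it in mis_sorted}                           # Lines 3-4
    secondary = {}
    for t in transactions:                                           # Lines 5-9
        node = root
        for it in [i for i in mis_sorted if i in t]:                 # Line 7
            node = node.children.setdefault(it, MISTreeNode(it))
            node.count += 1
            primary[it] += 1
    min_mis = min(mis.values())                                      # Lines 10-12
    for it in list(primary):
        if 100.0 * primary[it] / len(transactions) < min_mis:
            secondary[it] = primary.pop(it)
    return root, primary, secondary

On the running example of Tables 1 and 2, items whose support falls below the minimum MIS value (40%) would end up in the secondary header table.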
Table 3. The incremental database d.
TID   Items      Items (ordered)
1     C, B, H    B, C, H
2     G, B, F    B, F, G
3     C, D, H    C, D, H
INPUT: MIS-tree, MISsorted, increment d
OUTPUT:Dynamic MIS-tree

BEGIN
1 Scan d
2 FOR each transaction T in d do:
3 Sort items in T (like MISsorted )
4 Add T to the tree
5 END FOR
6 Calculate the support of items
7 Update the supports in the tables
8 Relocate items between header tables
END

Figure 3. Update process in Dynamic MIS for additions. Figure 4. Dynamic MIS-tree after adding d.

1.2. Adding Increments of Additions

The pseudo code of the update process for additions is given in Figure 3. When new
transactions (Table 3) arrive, they are scanned to be added to the tree (Lines 2-5). First,
the items in each new transaction are sorted in descending order of the MISsorted list, and then
the transactions are added to the tree one by one, as seen in Figure 4. Each item's count in
these transactions is incremented in the primary table. Then, the nodes of the same item are
linked all through the tree and to the header tables of the same figure. The supports of the
items are calculated and then updated in the header tables (Lines 6-7). Lastly, items are
relocated between the header tables by comparing each item's support with the MIN MIS value
(Line 8). The items (A and G) are transferred from the primary to the secondary header table
because their supports become less than the MIN MIS value (40%).
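The header-table bookkeeping of Lines 6-8 in Figure 3 can be sketched on its own as follows; the tree insertion is omitted, and the function and variable names are ours, so this is only an illustration of the relocation step, not the paper's code.

def relocate_after_increment(counts, db_size, increment, mis):
    # Update per-item counts with the new transactions, recompute supports,
    # and split items into primary/secondary tables against the MIN MIS value.
    for t in increment:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    db_size += len(increment)
    min_mis = min(mis.values())          # MIN MIS threshold, as a percentage
    primary, secondary = {}, {}
    for item, c in counts.items():
        support = 100.0 * c / db_size
        (primary if support >= min_mis else secondary)[item] = c
    return primary, secondary, db_size

In the example above, after the three transactions of Table 3 are added, the supports of A and G drop below MIN MIS = 40%, so both end up in the secondary header table.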

1.3. Adding Increments of Additions with New Items

The pseudo code of the update process for additions with new items is given in Figure 5.
Let us explain this process by using the MIS-tree shown in Figure 2, the incremental
database (with new items J, K, L) given in Table 5, and the MIS values of the new items given
in Table 4. The first step is combining the new MIS values in Table 4 with the MIS values
of the old items in Table 2 to get Table 6.
Table 4. MIS values for new items in d.
Item       J     K     L
MIS value  70%   35%   30%

Table 5. The incremental database d with new items J, K, L.
TID   Item bought         Item bought (ordered)
1     C, B, K, J, H, L    B, C, J, H, K, L
2     K, H                H, K
3     K, B, C             B, C, K
Table 6. MIS values of all items.
Item A B C J D E F G H K L
MIS (%) 80 80 80 70 60 60 40 40 40 35 30

When new items appear, MISsorted is updated by adding the new MIS values in
descending order, as in Line 1. After that, the new items in MISnew are appended to the
primary header table with an item count of 0 (Line 2). These two lines are the main
difference between additions and additions with new items.

INPUT: MIS-tree, MISsorted,increment d, MISnew


OUTPUT: MISsorted, Dynamic MIS-tree

BEGIN
1 Build MISsorted (MISsorted + MISnew)
2 Insert new items into primary header
table (count=0)
3 Scan d
4 FOR each transaction T in d do:
5 Sort items in T (like MISsorted )
6 Add T to the tree
7 END FOR
8 Calculate the support of all items
9 Update the supports in the tables
10 Relocate items between header tables
END

Figure 5. Dynamic MIS for additions with new items. Figure 6. Dynamic MIS-tree after adding d.
At the end, some items are transferred between the two header tables. Here, item
(G) is transferred from primary to secondary because its new support (25%) is less than
the new MIN MIS value (30%), and item (H) is transferred from secondary to primary
because its new support is 37%. Figure 6 presents the MIS-tree after adding the three
new transactions.

1.4. Adding Increments of Deletions

Let us explain the pseudo code of the update process for deletions, which is shown in
Figure 7, by using the increment of deletions shown in Table 7. This example is applied
to the tree of Figure 2. The transactions in d are scanned and then deleted from
the tree, as seen in Figure 8. Some items' counts are decremented. The corresponding
supports are calculated and updated in the tables of the tree. According to the new
supports, some items are relocated between the header tables. In this example, the support
of item (G) becomes 33.3%, which is less than the MIN MIS value (40%), so it is moved into
the secondary header table.

Table 7. Transactional database d with deletions.
TID   Item bought   Item bought (ordered)
100   D, C, A, F    A, C, D, F
400   B, F, G       B, F, G

INPUT: MIS-tree, MISsorted, increment d


OUTPUT: Dynamic MIS-tree

BEGIN
1 Scan d
2 FOR each transaction T in d do:
3 Sort items in T (like MISsorted)
4 Delete T from the tree
5 END FOR
6 Calculate the support of all items
7 Update the supports
8 Relocate items between header tables
END

Figure 7. Update process in Dynamic MIS for deletions. Figure 8. Dynamic MIS-tree after deletions.

The nodes with count 1 are decremented and deleted from the tree, but their
records are kept in the corresponding header table. The resulting Dynamic MIS-tree is
illustrated in Figure 8.

2. Performance Evaluation

Dynamic MIS is compared with the popular tree based algorithm CFP-Growth++ [18].
Several experiments are executed on 4 datasets with different properties (T: average
size of the transactions, D: number of transactions, N: number of items), as shown in
Table 8. D1 and D4 are real datasets; D2 and D3 are synthetic datasets. The density of a
dataset indicates the similarity of its transactions. D3 is generated to be used only in the
experiment on additions with new items.
Table 8. Properties of datasets.
Dataset           Type       T     D       N      Density (%)
D1 (Retail)       Real       10.3  88162   16470  0.06
D2 (T40I1D100K)   Synthetic  40    100K    942    4.25
D3                Synthetic  1.1   100K    5356   0.02
D4 (Kosarak)      Real       8.1   990002  41270  0.02

2 Density (%) = (Average Transaction Length / # of Distinct Items) × 100



All experiments are executed on an Intel(R) Core i7-5500U CPU @ 2.40 GHz
with 8 GB main memory and the Microsoft Windows 10 operating system. All programs
are implemented in the C# environment.
For our experiments, we use two formulas [16] to assign MIS values to the items in
the datasets:
M(i) = β · f(i)  and  MIS(i) = M(i) if M(i) > LS, and LS otherwise.
Here f(i) is the actual frequency of item i in the data, and LS is the user-specified lowest
minimum item support allowed. β (0 ≤ β ≤ 1) is a parameter that controls how the
MIS values for items should be related to their frequencies. If β = 0, we have only one
minimum support, LS, which is the same as in traditional association rule mining. If β = 1
and f(i) ≥ LS, then f(i) is the MIS value for i [16]. This formula is used to generate MIS
values for algorithms that use multiple support thresholds, as in [16, 17, 18, 28].
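As a small illustration (the function and argument names beta and ls are ours), this assignment rule is simply a thresholded scaling of the item frequencies:

def assign_mis(frequencies, beta, ls):
    # MIS(i) = beta * f(i) when this exceeds LS, and LS otherwise,
    # which is the same as taking the maximum of the two values.
    return {item: max(beta * f, ls) for item, f in frequencies.items()}

# With beta = 0.5 and ls = 0.01 (the values used in Section 2.3), an item with
# frequency 0.1 gets MIS = 0.05, while a very rare item falls back to LS = 0.01.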

2.1. Complexity analysis of algorithms

The computational complexity of building the initial tree is the same for both algorithms. It is
O(T · V), where T is the number of transactions and V is the average transaction length.
The complexity of the pruning procedure in CFP-Growth++ is O(N · C), where N is
the number of nodes holding the items to be pruned and C is the number of their children.
The merging procedure in CFP-Growth++ is O(N² · K), where N is the number of nodes
in the tree and K is the number of node links. However, in Dynamic MIS the pruning and
merging procedures are replaced by the procedure that relocates items between the header
tables, which has a linear complexity of O(N), where N is the number of items to be
transferred. The complexity of adding increments to the tree is O(T · V), where T is the
number of incremental transactions and V is the average transaction length.

2.2. Execution time with additions

The execution time of the Dynamic MIS and CFP-Growth++ algorithms on increments of
additions is measured by dividing the dataset into two parts. The part with D = (100 - x)%
of the transactions forms the initial dataset, and the remaining part with d = x% of the
transactions forms the increments. MIS values are kept constant. D1 has thirteen splits of
1% - 13%, D2 has ten splits of 5% - 50%, and D4 has eighteen splits of 5% - 90%.

Figure 9. Speed-up on Retail with additions. Figure 10. Speed-up with additions
The speed-up obtained by running Dynamic MIS instead of re-running CFP-Growth++
when the database is updated is shown in Figure 9 and Figure 10. The speed-up of Dynamic
MIS ranges from 22.21 to 55.94 on D1 (Figure 9), from 1.56 to 1.35 on D2, and from 37.67
to 5.69 on D4, as seen in Figure 10. The reasons behind these speed-ups are
1) Dynamic MIS runs only on the increment whereas CFP-Growth++ runs from the
beginning, and 2) Dynamic MIS generates frequent patterns from the items of the primary
header table only, whereas CFP-Growth++ requires pruning and merging of the MIS-tree.

2.3. Execution time with additions of new items

The execution time performance on increments of additions with new items is measured on
D3, which is generated by the IBM_Quest_data_generator [29] so as to control the
existence of new items that do not exist in the original dataset. Eighteen split sizes in
the range 5% - 90% are used. LS and β are kept constant at 0.01 and 0.5, respectively.
The number of new items in each split is constant and equal to 100. As shown in
Figure 11, the speed-up decreases from 5.76 to 1.72 as the split size increases.

Figure 11. Speed-up with additions with new items. Figure 12. Speed-up with deletions.

2.4. Execution time with deletions

The last comparison determines how the size of deletions affects the performance
of the algorithm. Each split contains 20% of the transactions of the original dataset. MIS
values are kept constant. The speed-up obtained by running Dynamic MIS instead of
re-running CFP-Growth++ when the database is updated with deletions can be seen in
Figure 12. The speed-up increases from 2.26 to 44.88 on D1, from 2.06 to 40.16 on D4,
and from 1.12 to 1.25 on D2 as the split size decreases.

3. Conclusion

A single support threshold and the dynamic aspect of databases bring additional challenges
to frequent itemset mining algorithms. The Dynamic MIS algorithm is proposed as a
solution to the dynamic update problem of frequent itemset mining under multiple support
thresholds. It is tree based, handles increments of additions, additions with new items,
and deletions, and is faster especially with large sparse databases.

Acknowledgements

This work is partially supported by the Scientific and Technological Research Council
of Turkey (TUBITAK) under ARDEB 3501 Project No: 114E779

References

[1] M. Chen, J. Han, P. S. Yu, Data mining: An overview from a database perspective. IEEE Transaction
on knowledge and Data Engineering, 8(1996), 866–883.

[2] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases,
In: ACM SIGMOD International conference on Management of data, USA (1993), 207–216.
[3] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, (2006), 157–218.
[4] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, In: ACM SIGMOD
International Conference on Management of Data, ACM New York, USA (2000), 1–12.
[5] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, In: The 20th
International Conference on Very Large Data Bases, San Francisco, CA, USA (1994), 487–499.
[6] H. Mannila, H. Toivonen, A.I. Verkamo, Efficient algorithms for discovering association rules, In:
AAAI Workshop on KDD, Seattle, WA, USA (1994), 181–192.
[7] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A.I. Verkamo, Fast discovery of association rules, In
Advances in KDD. MIT Press, 12(1996), 307–328.
[8] A. Savasere, E. Omiecinski, S.B. Navathe, An efficient algorithm for mining association rules in large
databases, In: The 21st VLDB Conference, Zurich, Switzerland (1995), 432–443.
[9] J.S. Park, M. Chen, P.S. Yu, An effective hash-based algorithm for mining association rules, In: ACM
SIGMOD International Conference on Management of Data, San Jose, CA, USA (1995), 175–186.
[10] R. Srikant, Q. Vu, R. Agrawal, Mining association rules with item constraints, In: ACM KDD
International Conference, Newport Beach, CA, USA (1997), 67–73.
[11] R.T. Ng, L.V.S. Lakshmanan, J. Han, A. Pang, Exploratory mining and pruning optimizations of
constrained associations rules, In: ACM-SIGMOD Management of Data, USA (1998), 13–24.
[12] G. Grahne, L. Lakshmanan, X. Wang, Efficient mining of constrained correlated sets, In: The 16th
International Conference on Data Engineering, San Diego, CA, USA (2000), 512–521.
[13] J. Pavon, S. Viana, S. Gomez, Matrix Apriori: Speeding up the search for frequent patterns, In: The
24th IASTED International Conference on Database and Applications, Austria (2006), 75–82.
[14] B. Yıldız, B. Ergenç, Comparison of two association rule mining algorithms without candidate
generation, In: The 10th IASTED International Conference on Artificial Intelligence and Applications,
Innsbruck, Austria (2010), 450–457.
[15] H. Mannila, Database methods for data mining, Tutorial for the 4th ACM SIGKDD International
Conference on KDD, New York, USA (1998).
[16] B. Liu, W. Hsu, Y. Ma, Mining association rules with multiple minimum supports, In: The 5th ACM
SIGKDD International Conference on KDD, San Diego, CA, USA (1999), 337–341.
[17] Y. Hu, Y. Chen, Mining association rules with multiple minimum supports: a new mining algorithm
and a support tuning mechanism, Decision Support Systems, 42(2006), 1–24.
[18] R.U. Kiran, P.K. Reddy, Novel techniques to reduce search space in multiple minimum supports-based
frequent pattern mining algorithms, In: The 14th International Conference on Extending Database
Technology, ACM, New York, USA, (2011), 11–20.
[19] S. Darrab, B. Ergenç, Frequent pattern mining under multiple support thresholds, In: The 16th Applied
Computer Science Conference, WSEAS Transactions on Computer Research, Turkey, 4(2016), 1–10.
[20] D.W. Cheung, J. Han, V.T. Ng, C.Y. Wong, Maintenance of discovered association rules in large
databases, An incremental updating technique, In: The 12th IEEE International Conference on Data
Engineering, New Orleans, Louisiana, USA, (1996), 106–114.
[21] D.W. Cheung, S.D. Lee, B. Kao, A general incremental technique for maintaining discovered
association rules, In: The 5th International Conference on Database Systems for Advanced
Applications, Melbourne, Australia, (1997), 185–194.
[22] D. Oğuz, B. Ergenç, Incremental itemset mining based on Matrix Apriori, DEXA-DaWaK, Vienna,
Austria, (2012), 192–204.
[23] D. Oğuz, B. Yıldız, B. Ergenç, Matrix based dynamic itemset mining algorithm, International Journal
of Data Warehousing and Mining, 9(2013), 62–75.
[24] Y. Aumann, R. Feldman, O. Lipshtat, H. Manilla, Borders: An efficient algorithm for association
generation in dynamic databases, Journal of Intelligent Information System, 12(1999), 61–73.
[25] S. Shan, X. Wang, M. Sui, Mining Association Rules: A continuous incremental updating technique,
In: International Conference on WISM, IEEE Computer Society, Sanya, China (2010), 62–66.
[26] B. Dai, P. Lin, iTM: An efficient algorithm for frequent pattern mining in the incremental database
without rescanning, In: The 22nd International Conference on Industrial, Engineering and Other
Applications of Applied Intelligent Systems, Tainan, Taiwan (2009), 757–766.
[27] W. Cheung, O.R. Zaiane, Incremental mining of frequent patterns without candidate generation or
support constraint, In: IDEAS, Hong Kong, China (2003), 111–116.
[28] F.A. Hoque, M. Debnath, N. Easmin, K. Rashad, Frequent pattern mining for multiple minimum
supports with support tuning and tree maintenance on incremental database, Research Journal of
Information Technology, 3(2011), 79–90.
[29] Frequent Itemset Mining Implementations Repository, http://fimi.ua.ac.be/data/
Fuzzy Systems and Data Mining II 149
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-149

Deep Learning with Large Scale Dataset for Credit Card Data Analysis
Ayahiko NIIMI 1
Faculty of Systems Information Science, Future University Hakodate,
2-116 Kamedanakano, Hakodate,
Hokkaido 041-8655, Japan

Abstract. In this study, two major applications are introduced to develop advanced
deep learning methods for credit card data analysis. Credit card information is contained
in two datasets: a credit approval dataset and a card transaction dataset. The
credit card data pose two problems. One problem is that, when using the credit approval
dataset, it is necessary to combine multiple models, each referring to a different
clustered group of users. The other problem is that, when using the card transaction
dataset, since actual unauthorized credit card use is very rare, imprecise solutions do
not allow the appropriate detection of fraud. To solve these problems, we propose
applying a deep learning algorithm to the credit card datasets. The proposed methods are
validated using benchmark experiments with other machine learning methods. To evaluate
our proposed method, we use two credit card datasets: the credit approval dataset from the
UCI machine learning repository and a randomly constructed credit transaction dataset.
The experiments confirm that deep learning exhibits accuracy comparable to the
Gaussian kernel support vector machine (SVM). The proposed methods are also
validated using a large-scale transaction dataset. Moreover, we apply our proposed
method to a time-series benchmark dataset. Deep learning parameter adjustment
is difficult; by optimizing the parameters, it is possible to increase the learning
accuracy.
Keywords. Data Mining, Deep Learning, Credit Approval Dataset, Card Transaction
Dataset

Introduction

Deep learning is a state-of-the-art research topic in the machine learning field with ap-
plications for solving various problems [1, 2]. This paper investigates the application of
deep learning in credit card data analysis.
Credit card data are mainly used for user and transaction judgments. User judgment
determines whether a credit card should be issued to a user satisfying particular criteria.
On the other hand, transaction judgment refers to whether a transaction is valid [3]. We
determined the deep learning processes required for solving each of these problems, and we
proposed appropriate methods for deep learning [4, 5].
To verify our proposed methods, we use benchmark experiments with other machine
learning methods, which confirm that the accuracy of the deep learning methods is similar to that of
1 Corresponding Author: Ayahiko Niimi, Faculty of Systems Information Science, Future University

Hakodate, 2-116 Kamedanakano, Hakodate, Hokkaido 041-8655, Japan; E-mail: niimi@fun.ac.jp.



the Gaussian kernel SVM. In the final section of this paper, we provide suggestions for
future deep learning experiments.
In our previous work, we used only a small-scale transaction dataset for the evaluation
experiment and did not use a large-scale dataset [6]. In this paper, the proposed methods
are also validated using a large-scale transaction dataset. Moreover, we apply our proposed
method to a time-series benchmark dataset.
First, in Section 1, we introduce the characteristics of credit card datasets. Then, in
Section 2, we introduce deep learning. In Section 3, we discuss the data processing
infrastructure that is suitable for the analysis of credit card data. In Section 4, we
describe the experiments, and the results are shown in Section 5. We discuss the results
in Section 6. Finally, in Section 7, we describe conclusions and future work.

1. Credit Card Data Set

The credit card datasets are as follows:

1. the credit approval dataset
2. the card transaction dataset

1.1. Credit approval dataset

For each user submitting a credit card creation application, there is a record of the deci-
sion to issue the card or to reject the application. This is based on the user’s attributes, in
accordance with the general usage-trend models.
However, to reach this decision, it is necessary to combine multiple models, each
referring to a different clustered group of users.

1.2. Credit card transaction data

In actual credit card transactions, the data are complex, constantly changing, and
continuously arrive online, as follows:

(i) Approximately one million transactions arrive per day.


(ii) Each transaction takes less than one second for completion.
(iii) Approximately one hundred transactions arrive per second during peak time.
(iv) Transactions data arrive continuously.

Therefore, credit card transaction data can precisely be called a data stream. However,
even if we use data mining for such data, an operator can monitor only around 2,000
transactions per day. Therefore, we have to detect suspicious transaction data effectively
by analyzing less than 0.02% of the total number of transactions. In addition, fraud
detection from massive amounts of transaction data is extremely difficult, because
real fraud occurs at an extremely low rate, i.e., within 0.02% to 0.05% of all of the
transaction data.
In a previous paper, transaction data in CSV format were described as attributes in
time order [3]. Credit card transaction data have 124 attributes; 84 of them are called
transactional data, including an attribute used to discriminate whether the data refer to
fraud, and the others are called behavioral data, which refer to the credit card usage.
The inflow file size is approximately 700 MB per month.
Mining the credit card transaction data stream presents inherent difficulties, since it
requires performing efficient calculations on an unlimited data stream with limited
computing resources. Therefore, many stream mining methods seek an approximate or
probabilistic solution instead of an exact one. However, since actual unauthorized credit
card use is very rare, these imprecise solutions do not allow the appropriate detection
of fraud.

2. Deep Learning

Deep learning is a new technology that has recently attracted much attention in the field of
machine learning. It significantly improves the accuracy of abstract representations by
reconstructing deep structures such as the neural circuitry of the human brain. Deep learning
algorithms have been honored in various competitions such as the International Conference
on Learning Representations.
Deep learning is a generic term for multilayer neural networks, which have been researched
for a long time [1, 2, 7]. Multilayer neural networks decrease the overall calculation
time by performing calculations on hidden layers. Thus, they were prone to excessive
overtraining, as an intermediate layer was often used for approximately every single layer.
However, technological advances suppressed overtraining, whereas GPU utilization
and parallel processing increased the number of hidden layers.
A sigmoid or a tanh function was commonly used as an activation function (see
Equations 1 and 2), although recently a maxout function has also been used (Section 2.1).
The dropout technique was implemented to prevent overtraining (Section 2.2).

h_i(x) = sigmoid(x^T W_{·i} + b_i)    (1)

h_i(x) = tanh(x^T W_{·i} + b_i)    (2)

2.1. Maxout

The maxout model is simply a feed-forward architecture such as a multilayer perceptron


or deep convolutional neural network that uses a new type of activation function, the
maxout unit [2].
In particular, given an input x ∈ R^d (x may be v, or it may be a hidden layer's
state), a maxout hidden layer implements the function

h_i(x) = max_{j∈[1,k]} z_{ij}    (3)

where z_{ij} = x^T W_{·ij} + b_{ij}, and W ∈ R^{d×m×k} and b ∈ R^{m×k} are learned parameters. In
a convolutional network, a maxout feature map can be constructed by taking the maxi-
mum across k affine feature maps (i.e., pool across channels, in addition to spatial loca-

tions). When training with dropout, we perform the element-wise multiplication with the
dropout mask immediately prior to the multiplication by weights, in all cases; inputs are
not dropped to the max operator. A single maxout unit can be interpreted as a piecewise
linear approximation of an arbitrary convex function. Maxout networks learn not just the
relationship between hidden units, but also the activation function of each hidden unit.
Maxout abandons many of the mainstays of traditional activation function design.
The representation it produces is not sparse at all, though the gradient is highly sparse,
and the dropout will artificially sparsify the effective representation during training. Al-
though maxout may learn to saturate on one side or another, this is a measure zero event
(so it is almost never bounded from above). Since a significant proportion of parame-
ter space corresponds to the function delimited from below, maxout learning is not con-
strained at all. Maxout is locally linear almost everywhere, whereas many popular acti-
vation functions have significant curvature. Given all of these deviations from standard
practice, it may seem surprising that maxout activation functions work at all, but we find
that they are very robust, easy to train with dropout, and achieve excellent performance.

h_i(x) = max_{j∈[1,k]} z_{ij}    (4)

z_{ij} = x^T W_{·ij} + b_{ij}    (5)
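The maxout unit of Equation (3) can be illustrated with a few lines of NumPy; the shapes follow the W ∈ R^{d×m×k}, b ∈ R^{m×k} convention above, and the sketch is our own rather than code from the paper.

import numpy as np

def maxout_layer(x, W, b):
    # h_i(x) = max over j in [1, k] of (x^T W_{.ij} + b_{ij})
    # x: (d,) input, W: (d, m, k) weights, b: (m, k) biases
    z = np.einsum('d,dmk->mk', x, W) + b   # all k affine pieces for each of the m units
    return z.max(axis=1)                   # element-wise maximum over the k pieces

# Tiny example: d = 4 inputs, m = 3 maxout units, k = 2 affine pieces per unit.
rng = np.random.default_rng(0)
h = maxout_layer(rng.normal(size=4), rng.normal(size=(4, 3, 2)), np.zeros((3, 2)))
print(h.shape)   # (3,)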

2.2. Dropout

Dropout is a technique that can be applied to deterministic feedforward architectures that


predict an output y given an input vector v [2].
In particular, these architectures contain a series of hidden layers h = {h^(1), ..., h^(L)}.
Dropout trains an ensemble of models consisting of subsets of the variables in both v
and h. The same set of parameters θ is used to parameterize a family of distributions
p(y|v; θ, μ), where μ ∈ M is a binary mask determining which variables to include
in the model. On each example, we train a different submodel by following the gradient
of log p(y|v; θ, μ) for a different randomly sampled μ. For many parameterizations
of p (usually for multilayer perceptrons) the instantiation of the different submodels
p(y|v; θ, μ) can be obtained by elementwise multiplication of v and h with the mask μ.
The functional form becomes important when the ensemble makes a prediction by
averaging together all the submodels’ predictions. Previous studies on bagging averages
used the arithmetic mean. However, this is not possible with the exponentially many
models trained by dropout. Fortunately, some models easily yield a geometric mean.
When p(y|v; θ) = softmax(v T W + b), the predictive distribution defined by renormaliz-
ing the geometric mean of p(y|v; θ, μ) over M is simply given by softmax(v T W/2 + b).
In other words, the average exponential prediction for many submodels can be computed
simply by running the full model with the weights divided by two. This result holds
exactly in the case of a single layer softmax model. Previous work on dropout applies
the same scheme in deeper architectures, such as multilayer perceptrons, where the W/2
method is only an approximation of the geometric mean. This approximation was not
characterized mathematically, but performed well in practice.
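A minimal NumPy sketch of this weight-halving rule for a single-layer softmax model (our own illustration, with a fixed dropout probability of 0.5) is given below: during training each input unit is masked with probability 0.5, and at prediction time the full model is run with W/2.

import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def dropout_submodel(v, W, b, rng):
    # One randomly sampled submodel: drop each input unit with probability 0.5.
    mu = rng.random(v.shape) < 0.5
    return softmax((v * mu) @ W + b)

def ensemble_prediction(v, W, b):
    # Approximate geometric-mean ensemble: run the full model with halved weights.
    return softmax(v @ (W / 2.0) + b)

rng = np.random.default_rng(1)
v, W, b = rng.normal(size=5), rng.normal(size=(5, 3)), np.zeros(3)
print(dropout_submodel(v, W, b, rng), ensemble_prediction(v, W, b))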

3. Data Analysis Platform

In this section, we consider the data processing infrastructure that is suitable for analysis
of credit card data, as well as the applications of deep learning to credit card data analysis.

3.1. R

R is a language and environment for statistical computing and graphics [8]. It is a GNU
project similar to the S language and environment which was developed at Bell Labora-
tories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R
can be considered as a different implementation of S. There are some important differ-
ences, but much code written for S runs unaltered under R. R is available as free software
and is widely used. It includes many useful libraries, for example for multivariate analysis
and machine learning, and it is suitable for data mining.
However, R performs processing in memory; therefore, it is not suitable for processing
large amounts of data.

3.2. Google BigQuery, Amazon Redshift

Google BigQuery [9] and Amazon Redshift [10] are systems designed for queries over
large amounts of data. These cloud systems can easily store a large amount of data and
process it at high speed. Therefore, we can use them to analyze data trends
interactively. However, data processing such as machine learning needs to be further
developed.

3.3. Apache Hadoop

Apache Hadoop is a platform for handling large amounts of data as well [11]. Apache
Hadoop divides the process into mapping and reducing, which operate in parallel; the Map
processes data, whereas the Reduce summarizes the results. In combination, these processes
realize high-speed processing of large amounts of data. However, since processing is
performed in batches, the Map/Reduce cycle cannot be completed before all data are
stored. It is difficult to apply separate algorithms for Map/Reduce across different batches.
In particular, it is difficult to apply an algorithm repeatedly to the same data, as is
required in machine learning.

3.4. Apache Storm

Apache Storm is designed to process a data stream [12]. For incessantly flowing data,
data conversion is executed. The data source is called the Spout, and the part that performs
the conversion process is called the Bolt. Apache Storm is a model that performs
processing by a combination of Bolts fed from Spouts.

3.5. Apache Spark

Apache Spark is also a platform that processes large amounts of data [13]. Apache Spark
generalizes the Map/Reduce processing. It processes data by caching working sets in memory,
and it is designed to execute iterative algorithms efficiently by maintaining shared data in
memory for repeated processing. In addition, machine learning and graph algorithm libraries
are provided, and it offers an easy-to-build environment for stream data mining.
H2O is a library of deep learning for Spark [14, 15].
SparkR is an R package that provides a lightweight frontend for Apache Spark
from R [16]. In Spark 1.5.0, SparkR provides a distributed data frame implementation that
supports operations such as selection, filtering, and aggregation (similar to R data frames
and dplyr) but on large datasets. SparkR also supports distributed machine learning using
MLlib.
In the present paper, we perform credit card data analysis using R and Spark. This makes
it possible to use R's extensive libraries while gaining high performance from the parallel
and distributed processing of Spark.

4. Experiments

We used the credit approval dataset from the UCI Machine Learning Repository to evaluate
the experimental results [4].
All attribute names and values were reassigned to meaningless symbols to protect
the data confidentiality.
In addition, the original dataset contains missing values. In the experiment, we use
a pre-processed dataset [17], as presented in Table 1.

Table 1. UCI Dataset (Credit Approval Dataset)

Number of Instances:              690
Number of Attributes:             15 + class attribute
Class Distribution:               +: 307 (44.5%), -: 383 (55.5%)
Number of Instances for Training: 590
Number of Instances for Test:     100

Deep learning uses the H2O library for R [14, 15]. H2O is a library for Hadoop and
Spark, but it also has an R package.
For comparison, we also use five typical machine learning algorithms. In addition,
the deep learning parameters (activation functions and dropout parameter) are changed
five times. In this experiment, the hidden layer neurons are set to (100, 100, 200) for deep
learning. The parameters used are shown in Table 2.
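The experiments in this paper use H2O through its R package; for orientation only, a roughly equivalent call through H2O's Python interface might look like the sketch below. The column name "class", the file name, the train/test split, and the epochs value are assumptions of ours, and the parameter names should be checked against the H2O version in use.

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
# Hypothetical CSV export of the pre-processed UCI credit approval dataset.
frame = h2o.import_file("credit_approval.csv")
train, test = frame.split_frame(ratios=[0.85], seed=42)   # roughly 590/100 as in Table 1

model = H2ODeepLearningEstimator(
    hidden=[100, 100, 200],              # hidden-layer sizes used in this experiment
    activation="maxout_with_dropout",    # one of the six configurations of Table 2
    epochs=10)                           # epochs are not stated here; assumed value
model.train(y="class", training_frame=train, validation_frame=test)
print(model.model_performance(test))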
XGBoost is an optimized general purpose gradient boosting library [18]. The li-
brary is parallelized and provides an optimized distributed version. It implements ma-
chine learning algorithms under the gradient boosting framework, including a general-
ized linear model and gradient boosted decision trees. XGBoost can also be distributed
and scaled to Terascale data.
The activation functions used here are summarized in Table 3 [15].
Moreover, to ascertain whether there is a bias in the results of the training data
and the test data, we perform 10-fold cross-validation using the entire dataset. In this
experiment, the hidden layer neurons are set to (200, 200, 200).
In the experiment, we use the following environment.

Table 2. Comparison Algorithms


Deep learning Rectifier with Dropout
Rectifier
Tanh with Dropout
Tanh
Maxout with Dropout
Maxout
Logistic Regression
Support Vector Machine Gaussian Radial Basis Function
Linear SVM
Random Forest
XGBoost

Table 3. Activation functions


Function                  Formula
Tanh                      f(α) = (e^α − e^{−α}) / (e^α + e^{−α})
Rectified Linear          f(α) = max(0, α)
Maxout                    f(·) = max(w_i x_i + b), rescaled if max f(·) ≥ 1
Tanh with Dropout         Tanh with dropout
Rectifier with Dropout    Rectified Linear with dropout
Maxout with Dropout       Maxout with dropout

• AWS EC2 t2.micro


• CPU Intel Xeon 3.3 GHz
• Memory 1GB
• Disk 8GB SSD
• R version 3.2.2
• H2O version 3.0.0.30

4.1. Large-Scale Dataset

In this paper, the proposed methods are also validated using a large-scale transaction
dataset. We made a dataset from the actual card transaction dataset; it contains the same
number of attributes (130 attributes), and the value of each attribute is generated at random
within the same range. The dataset has about 300,000 transactions, which include about
3,000 illegal usages. We made a dataset covering six months of data for the experiment.
Because this dataset has random values, it cannot be used to evaluate accuracy. We used
this dataset to estimate machine specs and calculation times.
The percentage of fraud in the dataset is very low. We used all illegal usages
(approximately 3,000) and a sample of normal usages (approximately 3,000) in the
experiment.
We used an Amazon EC2 r3.8xlarge instance (32 cores, 244 GB memory) for the experiment.
As a preliminary experiment, deep learning parameters of hidden layer neurons (100, 100,
200) and epochs (200) were used, but the learning did not converge. Therefore, in the
experiment, the deep learning parameters hidden layer neurons (2048, 2048, 4096),
epochs (2000), and hidden dropout ratios (0.75, 0.75, 0.7) were used. "Maxout with
Dropout" was used as the activation function.
The experimental results are currently being analyzed.

4.2. Benchmark Dataset for Time Series Data

For comparison, we also evaluate our proposed method using public time-series benchmark
data: the gas sensor dataset from the UCI Machine Learning Repository [4, 19, 20].
We are going to run the experiment, tune the parameters, and analyze the obtained
results.

5. Experimental Results

5.1. Comparison of Algorithms Using the UCI Dataset

Teble 4 shows the experimental results. We run each algorithms five times and the Table
4 presents the average. Because the machine learning algorithms that we used have no
initial value dependent, the results of the algorithms are the same, all five times.

Table 4. Result of UCI Dataset


Algorithm Error Rate
Rectifier with Dropout (Deep Learning) 18.4
Rectifier (Deep Learning) 18.8
Tanh with Dropout (Deep Learning) 17.6
Tanh (Deep Learning) 22.8
Maxout with Dropout (Deep Learning) 12.4
Maxout (Deep Learning) 16.2
Logistic Regression 18.0
Gaussian Kernel SVM 11.0
Linear Kernel SVM 14.0
Random Forest 14.0
XGBoost 14.0

The deep learning results depend on the initial parameters. Deep learning with Maxout
with Dropout produces an accuracy close to that of the Gaussian kernel SVM.

5.2. Deep Learning: 10-Fold Cross-Validation

Table 5 shows the results of the 10-fold cross-validation. N and Y are the class attributes.
Stable results are obtained regardless of the data split.
Stability results are obtained regardless of the dataset.

Table 5. Result of Deep Learning (10-fold cross-validation)


N Y Error Rate
Totals 332 287 0.138934 =86/619
Totals 337 280 0.132901 =82/617
Totals 333 277 0.136066 =83/610
Totals 318 302 0.135484 =84/620
Totals 325 295 0.156452 =97/620
Totals 306 316 0.180064 =112/622
Totals 338 296 0.140379 =89/634
Totals 336 281 0.136143 =84/617
Totals 327 301 0.141720 =89/628
Totals 318 305 0.146067 =91/623
Average of Error Rate 0.144421

6. Considerations

The presently conducted experiments confirm that deep learning has the same accuracy
as the Gaussian kernel SVM.
In addition, the 10-fold cross-validation experiment indicates that deep learning
offers higher precision.
In this experiment, we used the H2O library for deep learning, and the deep learning
modules, written in Java, were launched each time. Therefore, we cannot assess the
execution time.
Deep learning parameter adjustment is difficult. By optimizing the parameters, it is
possible to increase the learning accuracy.
There are some different approaches for time-series datasets [21, 22]. These approaches
differ from the proposed method, but they are useful for improving our proposed method.

7. Conclusion

In this paper, we consider the application of deep learning to credit card data analysis.
We introduce two major applications and propose methods for deep learning. To verify
our proposed methods, we use benchmark experiments with other machine learning methods.
Through these experiments, it is confirmed that deep learning has the same accuracy as
the Gaussian kernel SVM. The proposed methods are also validated using a large-scale
transaction dataset.
In the future, we will consider an evaluation experiment using the transaction data
and real datasets.

Acknowledgment

The authors would like to thank Intelligent Wave Inc. for many comments on credit card
transaction datasets.

References

[1] Y. Bengio. Learning Deep Architectures for AI. Foundations & Trends R in Machine Learning,
2(2009):1-127.
[2] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout Networks. ArXiv
e-prints, Feb., 2013.
[3] T. Minegishi and A. Niimi. Detection of Fraud Use of Credit Card by Extended VFDT, in World
Congress on Internet Security (WorldCIS-2011), London, UK, Feb., (2011), 166–173.
[4] M. Lichman. UCI Machine Learning Repository. (2013), (Access Date: 15 September, 2015). [Online].
Available: http://archive.ics.uci.edu/ml
[5] T. J. OZAKI. Data scientist in ginza, tokyo. (2015), (Access Date: 15 September, 2015). [Online]. Avail-
able: http://tjo-en.hatenablog.com/
[6] A. Niimi. Deep Learning for Credit Card Data Analysis, in World Congress on Internet Security
(WorldCIS-2015), Dublin, Ireland, Oct., (2015), 73–77.
[7] Q. Le. Building High-Level Features using Large Scale Unsupervised Learning. in Acoustics, Speech
and Signal Processing (ICASSP), 2013 IEEE International Conference on, May, (2013), 8595–8598.
[8] R: The R project for statistical computing. (Access Date: 15 September, 2015). [Online]. Available:
https://www.r-project.org/
[9] Google cloud platform. what is BigQuery? - Google BigQuery. (Access Date: 15 September, 2015).
[Online]. Available: https://cloud.google.com/bigquery/what-is-bigquery
[10] AWS Amazon Redshift. Cloud Data Warehouse Solutions. (Access Date: 15 September, 2015). [Online].
Available: https://aws.amazon.com/redshift/
[11] Apache Hadoop. Welcome to Apache Hadoop! (Access Date: 15 September, 2015). [Online]. Available:
https://hadoop.apache.org/
[12] Apache Storm. Storm, distributed and fault-tolerant realtime computation. (Access Date: 15 September,
2015). [Online]. Available: https://storm.apache.org/
[13] Apache Spark. Lightning-Fast Cluster Computing. (Access Date: 15 September, 2015). [Online]. Avail-
able: https://spark.apache.org/
[14] 0xdata — H2O.ai — Fast Scalable Machine Learning. (Access Date: 15 September, 2015). [Online].
Available: http://h2o.ai/
[15] A. Candel and V. Parmar. Deep Learning with H2O. H2O, (2015), (Access Date: 15 September, 2015).
[Online]. Available: http://learnpub.com/deeplearning
[16] SparkR (R on Spark) — Spark 1.5.0 Documentation. (Access Date: 15 September, 2015). [Online].
Available: https://spark.apache.org/docs/latest/sparkr.html
[17] T. J. OZAKI. Credit Approval Data Set, modified. (2015), (Access Date: 15 September, 2015).
[Online]. Available: https://github.com/ozt-ca/tjo.hatenablog.samples/tree/
master/r_samples/public_lib/jp/exp_uci_datasets/card_approval
[18] dmlc XGBoost extreme Gradient Boosting. (Access Date: 15 September, 2015). [Online]. Available:
https://github.com/dmlc/xgboost
[19] A. Vergara, S. Vembu, T. Ayhan, M. Ryan, M. Homer, and R. Huerta. Chemical Gas Sensor Drift Com-
pensation using Classifier Ensembles. Sensors and Actuators B: Chemical, 166(1), (2012), 320–329.
[20] I. Rodriguez-Lujan, J. Fonollosa, A. Vergara, M. Homer, and R. Huerta. On the Calibration of Sensor Ar-
rays for Pattern Recognition using the Minimal Number of Experiments. Chemometrics and Intelligent
Laboratory Systems, 130, (2014), 123–134.
[21] S. Yin, X. Xie, J. Lam, K. C. Cheung, and H. Gao. An Improved Incremental Learning Approach for
KPI Prognosis of Dynamic Fuel Cell System. IEEE Transactions on Cybernetics, PP(99), (2015), 1–10.
[22] S. Yin, H. Gao, J. Qiu, and O. Kaynak. Fault Detection for Nonlinear Process with Deterministic Dis-
turbances: A Just-In-Time Learning Based Data Driven Method. IEEE Transactions on Cybernetics,
PP(99), (2016), 1–9.
Fuzzy Systems and Data Mining II 159
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-159

Probabilistic Frequent Itemset Mining Algorithm over Uncertain Databases with Sampling
Hai-Feng LI1, Ning ZHANG, Yue-Jin ZHANG and Yue WANG
School of Information, Central University of Finance and Economics, Beijing 100081,
China

Abstract. Uncertain data is data accompanied with probabilities, which makes
frequent itemset mining more challenging. Given the data size n, computing the
probabilistic support needs O(n(log n)^2) time complexity and O(n) space complexity.
This paper focuses on the problem of mining probabilistic frequent itemsets over
uncertain databases and proposes the PFIMSample algorithm. We employ the Chebyshev
inequality to estimate the frequency of the items, which decreases part of the
computation from O(n(log n)^2) to O(n). In addition, we propose a sampling technique
to improve the performance. Our extensive experimental results show that our algorithm
can achieve significantly improved runtime cost and memory cost with high accuracy.

Keywords. Uncertain database, probabilistic frequent itemset, data mining, sampling

Introduction

Physical constraints, data preprocessing, and data privacy protection methods bring
uncertainty to data, which is significant for continuously arriving data [1]. By
introducing the probability of data occurrence, we can improve the robustness of data
mining methods and guarantee that the data analysis achieves exact and precise knowledge,
which is very valuable for user decisions. Frequent itemset mining algorithms over certain
databases have achieved many good results [2-4]. Nevertheless, the uncertainty of data
[5, 6] brings new challenges.
According to the different definitions of frequent itemsets over uncertain data, the
mining methods can be categorized into two types: one is based on the expected support,
and the other is based on the probabilistic support [7-22]. The methods based on the
expected support mainly use the expectation of the itemset support to evaluate whether an
itemset is frequent; the methods based on the probabilistic support consider that an
itemset is frequent when its support is larger than the minimum support with a specified
high probability. If the database size is n, then the former have O(n) time complexity and
O(1) space complexity, and the latter have O(n(log n)^2) time complexity and O(n) space
complexity [11]. Clearly, the former has a much higher

1
Corresponding Author: Hai-Feng LI, School of Information, Central University of Finance and
Economics, Beijing 100081, China; E-Mail: mydlhf@cufe.edu.cn.

performance. The latter, however, can represent the probabilistic characteristics of frequent
itemsets.
Since computing the probabilistic support is complicated, it is more challenging. In this
paper, we focus on this problem and propose a frequency estimation method based on the
Chebyshev inequality and a sampling method, to implement approximate computation of the
probabilistic support, and we guarantee the accuracy by theoretical analysis. We also use
experiments to verify this method.
This paper is organized as follows. Section 1 introduces the preliminaries of
frequent itemset mining. Section 2 proposes our PFIMSample algorithm in detail.
Section 3 presents the experimental results. Section 4 concludes this paper.

1. Preliminaries

We use Γ = {i_1, i_2, ⋯, i_n} to denote the set of distinct items, in which |Γ| = n is the size of Γ. We call an itemset X with size k a k-itemset. Assuming X contains item x_t (0 < t ≤ k) with probability p_t, then X is an uncertain itemset, denoted as X = {x_1, p_1; x_2, p_2; ⋯; x_k, p_k}. For an uncertain dataset UD = {UT_1, UT_2, ⋯, UT_v}, each UT_i (i = 1 ⋯ v) denotes a transaction based on Γ, which has an id and the corresponding itemset X, denoted as (tid, X). Figure 1 shows a simple uncertain dataset, which, using the possible world model, can be converted into multiple certain datasets, each with a probability; each such certain dataset is called a possible world.
Definition 1 (Count Support): Given the uncertain database UD and itemset X, the occurrence count of X is called the count support of X, denoted as Λ_UD(X), or Λ(X) for short.
Definition 2 (Possible World) [9]: Given the uncertain database UD, a generated possible world PW has |UD| transactions, each transaction T_i being a subset of UT_i, denoted as PW = {T_1, T_2, ⋯, T_|UD|}, in which T_i ⊆ UT_i.
Provided the uncertain transactions are independent, the probability of a possible world, p(PW), can be computed as follows. If an item x exists in both UT_i and T_i, we take the probability of x, p(x); if x exists in UT_i but not in T_i, we take the probability of its absence, p(x̄) = 1 − p(x). Multiplying all these probabilities gives the computing equation
p(PW) = Π_i ( Π_{x ∈ UT_i, x ∉ T_i} p(x̄) ) ( Π_{x ∈ T_i} p(x) ).
Using Ψ to denote the set of possible worlds generated from UD, the size of Ψ increases exponentially with the size of UD. That is, if UD has m transactions, and transaction i has n_i items, then Ψ has 2^(Σ_{i=1}^m n_i) possible worlds.

Figure 1. Uncertain database vs. possible worlds

The left part of Figure 1 shows an uncertain dataset with 2 transactions, each containing two items. As can be seen in the right part of Figure 1, there are 2^(2+2) = 16 possible worlds, each of which has an occurrence probability. As an example, possible world PW_6 has two transactions T_1 and T_2, which are both {A}. Then the probability of PW_6 is
p(PW_6) = p(A ∈ T_1) · p(B ∉ T_1) · p(A ∈ T_2) · p(C ∉ T_2) = 0.6 × 0.3 × 0.2 × 0.7 ≈ 0.025.
As can be seen, the sum of the probabilities of all the possible worlds is 1.
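As an illustration of the possible world model, the following sketch enumerates the possible worlds of a small two-transaction uncertain database and computes p(PW) by the product rule above. It is not part of the original algorithm; the item probabilities are toy values inferred from the PW_6 example, since the values of Figure 1 itself are not reproduced here.

```python
# Illustrative sketch of the possible world model; the probabilities below are
# toy values inferred from the PW6 example, not the actual Figure 1 data.
from itertools import product

uncertain_db = [
    {"A": 0.6, "B": 0.7},   # UT1
    {"A": 0.2, "C": 0.3},   # UT2
]

def world_probability(db, world):
    """p(PW): multiply p(x) for items kept in T_i and 1 - p(x) for items dropped."""
    prob = 1.0
    for ut, kept in zip(db, world):
        for item, p in ut.items():
            prob *= p if item in kept else (1.0 - p)
    return prob

def all_worlds(db):
    """Enumerate every possible world: each T_i is any subset of UT_i."""
    per_tx = []
    for ut in db:
        items = list(ut)
        per_tx.append([frozenset(x for x, keep in zip(items, bits) if keep)
                       for bits in product([0, 1], repeat=len(items))])
    return list(product(*per_tx))

worlds = all_worlds(uncertain_db)
print(len(worlds))                              # 2^(2+2) = 16 possible worlds
pw6 = (frozenset({"A"}), frozenset({"A"}))      # T1 = T2 = {A}
print(round(world_probability(uncertain_db, pw6), 4))                      # ~0.0252
print(round(sum(world_probability(uncertain_db, w) for w in worlds), 4))   # ~1.0
```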

2. PFIM Sample Algorithm

In the uncertain database UD, the frequent itemset is defined via the possible world model. If itemset X has support Λ_PW(X) in a possible world PW, then the associated probability p_PW(X) is the probability of PW, i.e., p(PW). We can use the 2-tuple ⟨Λ_PW(X), p_PW(X)⟩ to denote it. In UD, X has 2^(Σ_{i=1}^m n_i) such tuples, which can be denoted with the summed probability vector Λ^P(X).
Definition 3 (Probabilistic Frequent Itemset) [10, 23]: Given the uncertain database UD, the minimum support λ and the minimum probabilistic confidence τ, an itemset X is a (λ, τ)-probabilistic frequent itemset if the probabilistic support Λ^P_τ(X) ≥ λ, in which Λ^P_τ(X) = Max{ i | P_{Λ(X)≥i} > τ }.
For an uncertain database with size n, the probabilistic support can be computed with a divide-and-conquer method [11], which has O(n(log n)²) time complexity and O(n) space complexity. As can be seen, n is the key factor that determines the computing efficiency; if we can decrease n, the runtime cost will decrease accordingly.

According to the law of large numbers, when n is large enough, the data tend to fit a normal distribution. Based on this, we propose our mining algorithm PFIMSample, which uses a sampling method. The details are as follows.
1) Scan the database to obtain the data statistics, that is, the average and the variance of the itemset probabilities.
2) Scan the database and compute the count support and the expected support, in which the expected support is the sum of the probabilities.
3) For a given sampling parameter, use random sampling over the database so that the acquired data fits the normal distribution (a small sketch of this sampling step follows the list). Since we assume that the uncertain database fits the normal distribution initially if the data is massive enough, we use simple systematic sampling, which can guarantee the mining efficiency with a similar distribution. On the other hand, sampling decreases the database size, which may reduce the mining accuracy; we therefore evaluate this in our experiments and find that the accuracy is not related to the sampling rate. For each item, we scan the sampled database and compute its probabilistic support; if it is larger than the minimum support, the item is a frequent 1-itemset.
4) Join the probabilistic frequent n-itemsets to generate candidate (n+1)-itemsets, and compute their probabilistic supports to determine whether they are frequent.
5) Repeat step 4) until no new probabilistic frequent itemsets are generated, then output the results.
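As a simple illustration of the sampling in step 3), the sketch below draws a uniform random sample of the uncertain transactions. It shows only plain random sampling, not the systematic sampling variant mentioned above, and the toy database is assumed.

```python
# Minimal sketch of step 3's sampling: keep each uncertain transaction with
# probability equal to the sampling rate (toy data; not the authors' code).
import random

def sample_database(uncertain_db, rate, seed=0):
    rng = random.Random(seed)
    return [ut for ut in uncertain_db if rng.random() < rate]

toy_db = [{"a": 0.6}, {"a": 0.8, "b": 0.5}, {"b": 0.9}, {"a": 0.7}]
print(sample_database(toy_db, rate=0.5))
```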
In the PFIMSample algorithm, when an individual item is generated, we scan the full database rather than the sampled database to compute its probabilistic support, in order to guarantee the accuracy; consequently, the computing cost is O(n(log n)²). We therefore use a heuristic-rule-based pruning strategy in phase 3).
According to the Chebyshev inequality, a variable X with expected support E(X) and standard deviation D(X) satisfies, for any constant ε > 0, P(|X − E(X)| ≥ ε) ≤ D²(X)/ε². That is, for an arbitrary dataset, the fraction of the data that lies within m·D(X) of the expected support is at least 1 − 1/m², where m is a positive number larger than 1. For example, if m = 5, then at least 1 − 1/25 = 96% of the data has support larger than E(X) − 5D(X). Thus, before determining the frequency of an itemset X exactly, we first compute the expected support and the standard deviation; if E(X) − m·D(X) is larger than the minimum support, then X is a probabilistic frequent itemset with probability at least 1 − 1/m². Since computing the expected support is O(n), which is far less than the cost of computing the probabilistic support, we can prune the itemsets efficiently; this gain becomes larger as n grows.
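The following sketch illustrates this pruning rule under the stated independence assumption: the expected support is the sum of the per-transaction occurrence probabilities, the variance is that of the corresponding Poisson-binomial count, and an itemset is accepted once E(X) − m·D(X) clears the minimum support. It is an illustrative sketch, not the authors' implementation.

```python
# Sketch of the Chebyshev-based pruning check (illustrative, not the paper's code).
import math

def expected_support_and_std(occurrence_probs):
    """occurrence_probs: per-transaction probabilities that the itemset occurs."""
    e = sum(occurrence_probs)                            # expected support E(X), O(n)
    var = sum(p * (1.0 - p) for p in occurrence_probs)   # Poisson-binomial variance
    return e, math.sqrt(var)

def chebyshev_frequent(occurrence_probs, min_support_count, m=5):
    """Accept the itemset as frequent with probability at least 1 - 1/m^2."""
    e, d = expected_support_and_std(occurrence_probs)
    return e - m * d >= min_support_count

# Example: an item occurring in 1000 transactions with probability 0.9 each.
print(chebyshev_frequent([0.9] * 1000, min_support_count=800, m=5))   # True
```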
To keep the memory cost low, we use a prefix tree to maintain the itemsets, as well as their count supports, expected supports, and probabilistic supports. Note that our algorithm does not store the probability density function of each itemset, because the space complexity of a probability density function is O(n), and storing it for many itemsets would result in massive memory usage. Since the probabilistic support is computed only once, the probability density function can be deleted as soon as the probabilistic support is obtained, which significantly improves the performance.

3. Experimental Results

We compared the performance and the accuracy when the minimum probabilistic confidence is set to 0.9. Our algorithm was implemented in Python 2.7 under Windows 7, and run on an i7-4790M 3.6 GHz CPU with 4 GB of memory. Two uncertain datasets were used to evaluate our algorithm: one is GAZELLE, which contains a real e-commerce click stream, and the other is the synthetic dataset T25I15D320K generated by the IBM generator. We assigned each item a probability generated from a Gaussian distribution, which is widely accepted in current research on uncertain data [16]. The characteristics of the two datasets are shown in Table 1. Since our sampling method is a framework that can be applied to existing algorithms, we employed the state-of-the-art method TODIS [11] as the benchmark algorithm; that is, TODIS corresponds to our algorithm PFIMSample with sampling rate 1.
Table 1. The Characteristics of Uncertain Datasets

Dataset        Size     Avg. Trans. Size   Item Count
GAZELLE        59602    3                  497
T25I15D320K    320000   26                 1000

3.1. Runtime Cost

We first ran the PFIMSample algorithm over the two datasets with different sampling rates. From Figures 2 and 3 we can see that, when the minimum support is fixed, the mining efficiency decreases as the sampling rate increases. When the sampling rate is 0.01, the mining cost is very low; the runtime can be about 100 times better. Furthermore, reducing the minimum support results in the same performance trend, which is more significant over the T25I15D320K dataset, because T25I15D320K is denser than GAZELLE.

Figure 2. Runtime VS Sampling rate (GAZELLE) Figure 3. Runtime VS Sampling rate (T25I15D320K)

3.2. Memory Usage

Figures 4 and 5 compare the memory usage over different sampling rates. We can see that the memory cost grew larger, but not significantly, as the sampling rate increased. On the other hand, the memory usage was not related to the minimum support, because we used a relative minimum support. Moreover, the memory usage is low when mining over the sparse dataset GAZELLE.

Figure 4. Memory cost VS Sampling rate (GAZELLE)    Figure 5. Memory cost VS Sampling rate (T25I15D320K)
3.3. Precision and Recall

We used Precision and Recall to evaluate the accuracy of our algorithm. For the original mining results D and the results D' obtained with sampling, we define Precision = |D ∩ D'|/|D| and Recall = |D ∩ D'|/|D'|. Thus, the larger the precision and the recall, the higher the accuracy of our algorithm. Table 2 shows the Precision and Recall for different sampling rates over the two datasets. As can be seen, when the minimum support is 0.08, our algorithm achieves 100% accuracy over GAZELLE; it also achieves more than 90% accuracy over T25I15D320K in most cases. In addition, the accuracy of our algorithm is not related to the sampling rate, since we use random samples.
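For clarity, the following snippet shows how the two measures defined above can be computed for two result sets; the itemsets used are hypothetical.

```python
# Sketch of the accuracy measures above: Precision = |D ∩ D'|/|D|, Recall = |D ∩ D'|/|D'|.
def precision_recall(original_results, sampled_results):
    d, d_prime = set(original_results), set(sampled_results)
    common = d & d_prime
    return len(common) / len(d), len(common) / len(d_prime)

# Hypothetical mining results, each itemset written as a frozenset of items.
D  = {frozenset("a"), frozenset("ab"), frozenset("bc")}
D2 = {frozenset("a"), frozenset("ab")}
print(precision_recall(D, D2))   # (0.666..., 1.0)
```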
Table 2. Precision and Recall

Dataset        Minimum support   Sampling rate   Precision   Recall
GAZELLE        0.08              0.01            100%        100%
                                 0.02            100%        100%
                                 0.03            100%        100%
                                 0.04            100%        100%
                                 0.05            100%        100%
                                 0.06            100%        100%
                                 0.07            100%        100%
                                 0.08            100%        100%
                                 0.09            100%        100%
T25I15D320K    0.08              0.01            87%         95%
                                 0.02            91%         95%
                                 0.03            95%         92%
                                 0.04            95%         92%
                                 0.05            91%         91%
                                 0.06            91%         91%
                                 0.07            95%         88%
                                 0.08            100%        96%
                                 0.09            91%         95%

4. Conclusions

This paper studied probabilistic frequent itemset mining over uncertain databases. The proposed algorithm PFIMSample employs the Chebyshev inequality to estimate the count of frequent items, and thus partly reduces the computing cost from O(n(log n)²) to O(n). Moreover, we use a sampling method to improve the performance while keeping high accuracy. Our extensive experimental results over two datasets showed that our algorithm is effective and efficient.

Acknowledgement

This research is supported by the National Natural Science Foundation of China (61100112, 61309030), the National Social Science Foundation of China (13AXW010), and the Discipline Construction Foundation of the Central University of Finance and Economics.

References

[1] B. Babcock, S. Babu, M. Datar, et al. Models and issues in data stream systems. Proceedings of PODS,
2002.
[2] J. Han, H. Cheng, D. Xin, et al. Frequent pattern mining: current status and future directions. Data
Mining & Knowledge Discovery. 15(2007):55-86.
[3] J. Chen, Y. Ke, W. Ng. A survey on algorithms for mining frequent itemsets over data streams.
Knowledge and Information System, 16(2008), 1-27.
[4] C. C. Aggarwal, P. S. Yu. A survey of uncertain data algorithms and applications. IEEE Transaction on
Knowledge and Data Engineering, 21(2009), 609-623.
[5] A. Y. Zhou, C. Q. Jin, G. R. Wang, et al. A survey on the management of uncertain data. Chinese
Journal of Computers, 31(2009).
[6] J. Z. Li, G. Yu, A. Y. Zhou. Challenge of uncertain data management. Chinese Computer
Communications, 5(2009).
[7] J. Xu, N. Li, X. J. Mao, et al. Efficient probabilistic frequent itemsets mining in big sparse uncertain data.
Proceedings of PRICAI, 2014.
[8] Y. Konzawa, T. Amagasa, H. Kitagawa. Probabilistic frequent itemset mining on a gpu cluster. IEICE
Transactions of Information and Systems, E97-D(2014) , 779-789.
[9] Q. Zhang, F. Li, K. Yi, Finding frequent items in probabilistic data. Proceedings of SIGMOD, 2008
[10] T. Bernecker, H. P. Kriegel, M. Renz, et al, Probabilistic frequent itemset mining in uncertain
databases. Proceedings of SIGKDD, 2009.
[11] L. Sun, R. Cheng, D. W. Cheung, et al, Mining uncertain data with probabilistic guarantees.
Proceedings of SIGKDD, 2010.
[12] T. Bernecker, H. P. Kriegel, M. Renz, et al, Probabilistic frequent pattern growth for itemset mining in
uncertain databases. Proceedings of SSDM, 2012.
[13] L. Wang, R. Cheng, S. D. Lee, et al, Accelerating probabilistic frequent itemset mining: a model-based
approach. Proceedings of CIKM, 2010.
[14] L. Wang, D. Cheung, R. Cheng, et al. Efficient mining of frequent item sets on large uncertain
databases. IEEE Transaction on Knowledge and Data Engineering, 24(2012), 2170-2183.

[15] T. Calders, C. Garboni, B. Goethals. Approximation of frequentness probability of itemsets in uncertain data. Proceedings of ICDM, 2010.
[16] Y. Tong, L. Chen, Y. Cheng, et al. Mining frequent itemsets over uncertain databases. Proceedings of
VLDB, 2012.
[17] P. Tang, E. A. Peterson. Mining probabilistic frequent closed itemsets in uncertain databases.
Proceedings of ACMSE, 2011.
[18] E. A. Peterson, P. Tang. Fast approximation of probabilistic frequent closed itemsets. Proceedings of
ACMSE, 2012.
[19] Y. Tong, L. Chen, B. Ding, Discovering threshold-based frequent closed itemsets over probabilistic
data. Proceedings of ICDE, 2012.
[20] C. Liu, L. Chen, C. Zhang. Mining probabilistic representative frequent patterns from uncertain data.
Proceedings of SDM, 2013.
[21] C. Liu, L. Chen, C. Zhang. Summarizing probabilistic frequent patterns: a fast approach. Proceedings
of SIGKDD, 2013.
[22] B. Pei, S. Zhao, H. Chen, et al. FARP: Mining fuzzy association rules from a probabilistic quantitative
database. Information Sciences, 237(2013), 242-260.
[23] P. Y. Tang and E. A. Peterson. Mining probabilistic frequent closed itemsets in uncertain databases,
Proceedings of ASC, 2011.
Fuzzy Systems and Data Mining II 167
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-167

Priority Guaranteed and Energy Efficient Routing in Data Center Networks
Hu-Yin ZHANG a,b,1, Jing WANG b, Long QIAN b and Jin-Cai ZHOU b
a Shenzhen Research Institute of Wuhan University, Shenzhen, Guangdong, China
b School of Computer Science, Wuhan University, Wuhan, Hubei, China

Abstract. In data center networks, energy consumption accounts for a considerably large slice of operational expenses. Many energy saving strategies have been proposed, and most of them design the energy saving model from the point of view of bandwidth or throughput. This paper provides a new perspective on energy saving in data center networks, whose basic idea is to ensure that higher priority traffic takes shorter routes. It combines bandwidth constraints with the aim of energy saving, and keeps a balance between energy consumption and traffic priority demands. Simulations show that our routing algorithm can effectively reduce the transmission delay of higher priority traffic and reduce the power consumption of data center networks.

Keywords. data center networks, priority, energy saving, routing

Introduction

With the change of traffic models, large-scale data center networks (DCNs) are often deployed in the Fat-Tree architecture as a non-blocking network, which has over-provisioned network resources and inefficient power usage. Thus, the goal of network power conservation is to make the power consumption of networking devices proportional to the traffic load [1]. Many researchers have investigated energy saving for DCNs from different aspects. The article [2] proposed energy saving routing based on ElasticTree. In [3], the authors proposed a data center energy-efficient network-aware scheduling. The article [4] presented an energy efficient routing algorithm with network load balancing and energy saving. In [5], the authors proposed a bandwidth guaranteed energy efficient DCN scheme from the perspective of routing and flow scheduling. The article [6] aimed to reduce the power consumption of DCNs from the routing perspective while meeting the throughput performance requirement.
In DCNs, the network delay is also an important parameter that reflects the network performance [7]. Traffic with high priority usually has a strict transmission delay requirement. In this paper, a new energy efficient routing algorithm is proposed that takes traffic priority into consideration. Its basic idea is to make sure that higher priority traffic gets shorter routes, to combine this with bandwidth constraints, and to balance energy consumption against traffic priority demands.

1 Corresponding Author: Hu-Yin ZHANG, Shenzhen Research Institute of Wuhan University, Shenzhen, China; School of Computer Science, Wuhan University, Wuhan, China; E-mail: zhy2536@whu.edu.cn.

1. Network Model and Problem Statement

1.1. Network Model

Figure 1 shows the Fat-Tree architecture, which contains three tiers of switch modules: the core switches C_k, the aggregation switches A_k and the edge switches S_k, where k ranges from 1 to n, the number of switches in a tier. This is conventionally denoted as a v(c, a, s) network. In order to achieve energy efficiency, as few links as possible should be used, so that as many switches as possible can work in sleep mode. Eq. (1) describes the minimum link number, which is intended to use a minimum number of switches in the v(c, a, s) network.

Figure 1. Fat-Tree DCNs.

min L = R^w · v(c, a, s)    (1)

R^w ~ Σ_k R(C_k, A_k, S_k),  k = 1, 2, ⋯, n    (2)
R^w in Eq. (1) is an array obtained from array R by summing the nodes and then taking a linear transform, as in Eq. (2); it expresses the number of active switches in each tier. In array R, C_k, A_k and S_k represent the node names of the active switches in each tier, respectively. The problem is how to obtain the optimal array R for the priority guaranteed traffic while establishing the fewest links.

1.2. Problem Statement

In order to establish the fewest links, the bandwidth utilization of the used links needs to reach a maximum value. In array R, the higher priority traffic chooses the shorter routing path. However, we may encounter the problem shown in Figure 2.
When traffic 1 (higher priority) uses the path A->B->E, traffic 2 and traffic 3 have no path to use, and failure bandwidth (FB) occurs. If we analyze the traffic requirements from the overall situation, optimize the routing and change traffic 1 onto the path A->D->E, then traffic 2 and traffic 3 can both have their paths, and the FB is 0. Although the higher priority traffic needs to choose a new route, the number of forwarding hops does not increase, so this change can be regarded as causing no increase in transmission delay.

Figure 2. Failure bandwidth.


The goal of our priority guaranteed and energy efficient routing (PER) algorithm is to eliminate FB while guaranteeing priority, and to obtain the optimal array R, which makes the DCN topology achieve the maximum bandwidth utilization with the minimum number of links; idle switches are then turned into sleep mode for energy saving.

2. PER Algorithm

2.1. Network Model

The scheme computes transmission paths for all flows in the DCN topology while keeping the energy consumption of the switches in this topology as low as possible.


Figure 3. PER scheme.


As shown in Figure 3, the PER algorithm works in the following steps:
• Step 1: according to the priority level, the highest priority traffic obtains the shortest routing configuration.
• Step 2: update the priority parameter, and then configure the next lower priority traffic.
• Step 3: check whether there is any failure bandwidth; if yes, jump to step 4, otherwise jump to step 6.
• Step 4: run the priority guaranteed optimization algorithm.
• Step 5: is the optimized new route of the higher priority traffic longer than the existing one? If yes, the higher priority traffic keeps its existing route and the lower priority traffic chooses the longer path; if no, the optimized new routing is applied. Then repeat step 3.
• Step 6: check whether all configurations are completed; if not, repeat step 3; if completed, generate the energy efficient routing topology, and then turn the idle switches into sleep mode.

2.2. Priority Guaranteed Optimization Algorithm

This algorithm is designed for selecting the route path for flows with different priorities. Each selected path should eliminate failure bandwidth and make the link bandwidth utilization rate as high as possible. If there are many available paths for a flow, the problem can be converted to an undirected graph G = (S, E). Assume that the weight of a link is the bandwidth left in the link. We need to find the shortest path from the source node to the destination node while maximizing the link bandwidth utilization; hence, the more bandwidth left, the bigger the link weight. We use the following path selection rules:
• Rule 1: set an accessorial vector SA, each of whose components SA[i][j] represents the weight of the link from source node S_i to node A_j.
• Rule 2: the state of SA: if there is a link from node S_i to A_j, SA[i][j] represents the weight of this link; if there is no link, SA[i][j] = -1. We choose (S_i, A_j) for SA[i][j] = Min{ SA[i][j] | A_j ∈ V }. If there are links with the same weight, we choose the node with the minimum subscript.
• Rule 3: set another accessorial vector AC, each of whose components AC[j][k] represents the weight of the link from node A_j to C_k.
• Rule 4: the state of AC: if there is a link from node A_j to C_k, AC[j][k] represents the weight of this link; if there is no link, AC[j][k] = -1. We choose (A_j, C_k) for AC[j][k] = Min{ AC[j][k] | C_k ∈ V }. If there are links with the same weight, we choose the node with the minimum subscript (a small sketch of this selection follows the list).
• Rule 5: we store the nodes selected by Rules 1 to 4.
• Rule 6: if there is any failure bandwidth, all flows in the links related to the failure bandwidth are reconfigured according to Rules 1 to 4. We first select the route path for the flow that caused the failure bandwidth, and then for the other flows according to their priorities in descending order.
• Rule 7: all the nodes included in the routing generated from Rule 6 are stored in the accessorial array D; then we compare array D with array R.
• Rule 8: if the higher priority traffic uses more nodes in array D than in array R, the higher priority traffic preserves the status in array R and we copy R to D; otherwise the higher priority traffic chooses the status in array D and we copy D to R.
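The sketch below illustrates the selection in Rules 1–4 for one row of SA (or AC): among the candidate links with enough remaining bandwidth for the flow (the demand check is our reading of the failure-bandwidth constraint, not stated explicitly in the rules), the link with the minimum weight is chosen, with ties going to the smaller subscript. The bandwidth values are assumed for illustration only.

```python
# Illustrative sketch of the Rule 1-4 selection over one row of SA (or AC).
# A weight of -1 marks a missing link; the demand check is an assumption here.
def pick_next_hop(weights, demand):
    """Pick the reachable node whose link has the least remaining bandwidth."""
    best = None
    for j, w in enumerate(weights):
        if w < 0 or w < demand:                # no link, or not enough bandwidth left
            continue
        if best is None or w < weights[best]:  # strict '<' keeps the smaller index on ties
            best = j
    return best

SA_row = [4.0, -1, 2.5]                        # remaining bandwidth to candidates A_0..A_2
print(pick_next_hop(SA_row, demand=2.0))       # -> 2, the tightest fit that still fits
```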

When the path selections for all traffic flows are completed, the higher priority flows are configured with the smaller number of routing nodes, and the array R stores the switch flag nodes used in the links. Therefore, we can put the idle switches to sleep in order to save data center energy consumption.

3. Evaluations

We evaluate our PER algorithm using Fat-Tree topologies on the Matlab 7.11 platform, and compare the results with random routing without priority guarantees. We use a simulation model with the network v(16, 32, 32), which includes eighty switch nodes. The available bandwidth of each link is randomly generated and does not exceed 10M. We select twelve traffic flows and set their priorities and flow capacities randomly. To simplify the simulation system, we assume that the data processing abilities of each layer are the same. The transmission delay from the current node to the next node is set randomly between 30 and 50 ms.

Figure 4. Transmission delay.


Figure 4 shows the transmission delay of the twelve traffic flows with different priorities. The average values over three runs are adopted. The dotted line with blocks shows the transmission delay of random routing, and the solid line with dots represents the transmission delay of the PER algorithm. From this figure we can see that the transmission delay of the PER algorithm is less than that of random routing, and the fluctuation of the PER algorithm is small.
(Bar chart: number of active switches, from 0 to 100, for the Fat-Tree, Random and PER topologies.)

Figure 5. Energy consumption.



Figure 5 shows the energy consumption of three kinds of topologies based on the network v(16, 32, 32). In the plain Fat-Tree topology, all eighty switches remain in the active state even if some of them carry no traffic. With random routing, almost half of the switches in the same network are used, so nearly half of the switches can be turned into sleep mode. With PER, because of the increased utilization of link bandwidth, about 75% of the switches can be turned into sleep mode in this network, which greatly reduces the energy consumption.

4. Conclusion

In this paper, we address the power saving problem in DCNs from a routing perspective. We establish the network model and introduce the priority guaranteed and energy efficient routing problem. Then we propose a routing algorithm that reduces energy consumption in DCNs while guaranteeing traffic priorities. The evaluation results demonstrate that our algorithm can effectively reduce the transmission delay of higher priority traffic and the power consumption of DCNs compared with random routing.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under
Grant No. 61540059, and the Shenzhen science and technology projects under Grant
No. JCYJ20140603152449639.

References

[1] L.A. Barroso, U. Hölzle, The case for energy-proportional computing, Computer, 40(2010):33–37.
[2] B. Heller, S. Seetharaman, P. Mahadevan, et al., ElasticTree: Saving energy in data center networks. Proc of the 7th USENIX Symp on Networked Systems Design and Implementation (NSDI 10). New York: ACM, 2010:249–264.
[3] D. Kliazovich, P. Bouvry, S.U. Khan, DENS: Data Center Energy-Efficient Network-Aware Scheduling, Cluster Computing, 16(2013):65–75.
[4] S. Dong, R. Li, X. Li, Energy Efficient Routing Algorithm Based on Software Defined Data Center Network, Journal of Computer Research and Development, 52(2015): 806–812.
[5] T. Wang, B. Qin, Z. Su, Y. Xia, M. Hamdi, et al., Towards bandwidth guaranteed energy efficient data center networking, Journal of Cloud Computing, 4(2015):1–15.
[6] M. Xu, Y. Shang, D. Li, X. Wang, Greening data center networks with throughput-guaranteed power-aware routing, Computer Networks, 57(2013):2880–2899.
[7] W. Lao, Z. Li, Y. Bai, Methodology and Realization of Measure on Network Performance Parameter, Computer Applications & Software, 21(2004).
Fuzzy Systems and Data Mining II 173
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-173

Yield Rate Prediction of a Dynamic Random Access Memory Manufacturing Process Using Artificial Neural Network
Chun-Wei CHANG and Shin-Yeu LIN1
Department of Electrical Engineering, Chang Gung University, Taiwan

Abstract. To provide a reference for the fault detection of a dynamic random access memory (DRAM) manufacturing process, we propose a yield-rate predictor using an artificial neural network (ANN). The inputs to the ANN are the machining parameters in each step of the manufacturing process for a DRAM wafer, and the output is the yield rate of the corresponding wafer. In this study, a three-layer feed-forward back propagation ANN is used and trained by input-output pairs of data collected from a real manufacturing process. We have tested the proposed ANN in five cases, each with a different size of training data set. The test results show that the average of the absolute prediction errors in all five cases is very small, and as the size of the training data set increases, the prediction accuracy increases and the associated standard deviation decreases.

Keywords. data mining, DRAM, yield analysis, artificial neural network, fault detection.

Introduction

Due to the lengthy manufacturing process of a dynamic random access memory (DRAM) chip [1-2], it would be beneficial if any manufacturing error could be detected early, before the whole process is completed. To do so, on-line machine-fault detection should be performed to prevent further damage to the wafers in process [3-5]. In general, a yield rate that is much lower than average indicates a possible fault with high probability. Therefore, an on-line prediction of the yield rate would be helpful for machine-fault detection.
For any integrated circuit, the machining parameters in each step of the manufacturing process are usually specified. However, no physical or mathematical model exists that relates the machining parameters to the yield rate. To cope with this model-free problem, data mining techniques can be used to investigate this relationship by extracting information from the manufacturing data [6-7]. Therefore, in this paper, we propose using an ANN to build the functional relationship between the machining parameters and the yield rate, and to use the constructed ANN as a yield rate predictor [8-10]. The training algorithm for the proposed ANN will be introduced, and the ANN will be trained with real manufacturing data. To investigate the

1 Corresponding Author: Chun-Wei CHANG, Department of Electrical Engineering, Chang Gung University, Kwei-Shan, Tao-Yuan 333, Taiwan; E-mail: shinylin@mail.cgu.edu.tw.

effect of the size of the data set on training the ANN, the prediction accuracy of ANNs trained with different sizes of data sets is also investigated in this paper.
This paper is organized in the following manner. Section 1 presents the proposed
ANN. Section 2 presents the test results of the proposed ANN. Section 3 draws a
conclusion.

1. Construction of ANN

There are two parts for constructing an ANN as a yield rate predictor. The first part is
to collect the data of machining parameters and the corresponding yield rate of DRAM
wafers to serve as a training data set. The second part is using the training data set to
train the ANN.

1.1. Data Collection

There are hundreds to thousands of processing steps for manufacturing a DRAM chip. Each DRAM wafer may repeatedly visit the same machine but with different setups of machining parameters. To train the ANN, a pair of input and output data is collected for each wafer. The collected input data are the machining parameters, which include the following types: average thickness of the oxide coating, range of thickness of the oxide coating, average nitride thickness, range of nitride thickness, polish time of chemical mechanical planarization, photo dose, photo focus, etc. The output data is the yield rate of the wafer, e.g., 90%. Therefore, each input-output pair of data is formed by multiple input values and a single output value, and the collected input-output pairs of data serve as the training data set for the ANN.

1.2. Training ANN

Let x = [x_1, ⋯, x_N]^T, where x_1, ⋯, x_N represent the N machining parameters. Let y(x) represent the yield rate of the wafer, which is a function of the vector of machining parameters x. Let M denote the number of input-output pairs in the collected training data set. We employ a feed-forward back propagation ANN that consists of an input layer, one hidden layer and an output layer [11]. Fig. 1 shows the three-layer ANN consisting of N input neurons, q hidden-layer neurons, and one output neuron, where ω_{i,j}, i = 1, …, q, j = 1, …, N, and β_k, k = 1, …, q represent the arc weights.
The N neurons in the input layer correspond to x, and the single output neuron is for y(x). The input layer neurons directly distribute each component of x to the neurons of the hidden layer. The hyperbolic tangent sigmoid function shown in Eq. (1) is used as the activation function of the hidden-layer neurons.

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})    (1)

The activation function of the output layer is a linear function, as depicted in Figure 1.

Figure 1. A three-layer feed-forward back propagation ANN

The procedure to train the ANN using the training data set, i.e., the M input-output pairs of collected data, can be stated as follows. For a given input x_i to the ANN presented in Fig. 1, we let the corresponding output of the ANN be denoted by ŷ(x_i | ω, β), which can be calculated by the following formula:

ŷ(x_i | ω, β) = Σ_{k=1}^{q} β_k tanh( Σ_{j=1}^{N} ω_{k,j} x_{ij} )    (2)

where β = [β_1, ⋯, β_q]^T and ω = [ω_{1,1}, ⋯, ω_{q,N}]^T are the vectors of arc weights of the ANN, and x_{ij} is the jth component of x_i. The training problem for the considered ANN is to find the vectors of arc weights ω and β that minimize the following mean square error (MSE) problem:

min_{ω,β} (1/M) Σ_{i=1}^{M} { y(x_i) − ŷ(x_i | ω, β) }²    (3)

based on the collected M pairs (x_i, y(x_i)), i = 1, ..., M. We employed the Levenberg-Marquardt algorithm [12] as the iterative training algorithm to solve (3). The training stops when either of the following two conditions occurs: (i) the sum of the mean squared errors, i.e., the objective value of the MSE problem, is smaller than 0.01, or (ii) the number of epochs exceeds 1000.
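For readers who want to reproduce a comparable setup, the rough sketch below builds a one-hidden-layer tanh network of the same shape with scikit-learn. It is only an approximation of the setup described above: MLPRegressor does not offer the Levenberg-Marquardt optimizer, so an L-BFGS solver is substituted, and the data here is synthetic rather than the real manufacturing data.

```python
# Rough sketch of a comparable 3-layer tanh ANN (synthetic data; L-BFGS replaces
# the Levenberg-Marquardt training used in the paper).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N, M, q = 78, 1000, 150                      # machining parameters, pairs, hidden neurons
X = rng.random((M, N))                       # stand-in machining parameters
y = rng.uniform(0.5, 1.0, size=M)            # stand-in yield rates

ann = MLPRegressor(hidden_layer_sizes=(q,),  # one hidden layer of q neurons
                   activation="tanh",        # hidden-layer activation of Eq. (1)
                   solver="lbfgs",           # substitute for Levenberg-Marquardt
                   max_iter=1000, tol=1e-4)
ann.fit(X, y)
mse = np.mean((y - ann.predict(X)) ** 2)     # objective of the MSE problem (3)
print(mse)
```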

2. Test Results

In this section, the prediction accuracy of the trained ANN is investigated. In addition, the relationship between the size of the training data set and the prediction accuracy is also investigated. Therefore, five test cases with various sizes of training data sets are set up. In all test cases, the number of machining parameters is set to N = 78. The employed three-layer ANN consists of 78 input neurons, 150 (= q) hidden-layer neurons and one

output neuron. The number of epochs exceeding 1000 is chosen as the termination
criteria for training the ANN. The value of M , which is the size or the number of
input-output pairs of training data set, is set to M =50 for case 1, 150 for case 2, 400
for case 3, 700 for case 4 and 1000 for case 5. For each case, we collect 2M pairs of
input-output data from real manufacturing process and separate them into two sets, the
training and testing data sets. The training data set is used to train the employed
three-layer ANN, and the testing data set is utilized to test the prediction accuracy of
the trained ANN.
The prediction accuracy of the trained ANN is defined as the average of the percentage of the absolute error between the actual and the predicted yield rate, which is denoted by ē and can be calculated by the following equation:

ē = (1/M) Σ_{i=1}^{M} | (y(x_i) − ŷ(x_i | ω, β)) / y(x_i) | × 100%    (4)

The corresponding standard deviation σ_e is defined as

σ_e = sqrt( (1/M) Σ_{i=1}^{M} (e_i − ē)² )    (5)

where e_i = | (y(x_i) − ŷ(x_i | ω, β)) / y(x_i) | × 100%.
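A short sketch of Eqs. (4) and (5) follows; the yield-rate values in the example are hypothetical.

```python
# Sketch of Eqs. (4)-(5): mean absolute percentage error and its standard deviation.
import numpy as np

def prediction_accuracy(y_true, y_pred):
    e_i = np.abs((y_true - y_pred) / y_true) * 100.0   # per-pair percentage error
    e_bar = e_i.mean()                                 # Eq. (4)
    sigma_e = np.sqrt(np.mean((e_i - e_bar) ** 2))     # Eq. (5)
    return e_bar, sigma_e

y_true = np.array([0.90, 0.85, 0.92])                  # hypothetical actual yield rates
y_pred = np.array([0.88, 0.86, 0.95])                  # hypothetical predictions
print(prediction_accuracy(y_true, y_pred))
```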
For each of the five cases, after the ANN is trained by the corresponding training data set, we test the trained ANN using the corresponding testing data set. The prediction accuracy ē of the trained ANN in all five cases is presented in Table 1, and the associated standard deviation σ_e in all five cases is reported in Table 2. From Table 1, we see that ē = 4.72 for case 5, and as the size of the training data set increases, the prediction accuracy increases. From Table 2, we see that σ_e = 4.73 for case 5, and as the size of the training data set increases, σ_e decreases, which implies that the prediction accuracy is more stable. Therefore, from the results presented in Tables 1 and 2, we see that a larger training data set enhances the prediction accuracy of the ANN, as long as its size does not cause overtraining. To give more insight into the prediction results of the trained ANN, a histogram of the number of tested input-output pairs with respect to the percentage of the prediction error, defined as (y(x_i) − ŷ(x_i | ω, β)) / y(x_i) × 100%, is presented in Figure 2.
Table 1. Prediction accuracy of the trained ANN for the five cases

Case      1       2       3       4      5
Size, M   50      150     400     700    1000
ē         45.30   17.79   10.87   6.29   4.72



Table 2. Standard deviation of the prediction accuracy of the trained ANN for the five cases

Case      1       2       3      4      5
Size, M   50      150     400    700    1000
σ_e       36.00   15.13   9.08   5.55   4.73

(Histogram: number of input-output pairs versus percentage of prediction error, from −30% to 30%.)

Figure 2. Histogram of the prediction error of case 5.

From Figure 2, we see that most of the tested input-output pairs of data have very small prediction errors, which confirms that the proposed ANN can serve as a good yield rate predictor for the DRAM manufacturing process.

3. Conclusion

In this paper, a three-layer feed-forward and back propagation ANN is presented and is
used to serve as a predictor for the yield rate of a DRAM manufacturing process. The
proposed ANN is trained and tested using real manufacturing data. The test results
reveal that the prediction errors are very small, and as the size of the training data set
increases, the prediction accuracy of the ANN increases and the associated standard
deviation decreases. Therefore, the presented ANN is qualified to serve as a yield rate
predictor for the future purpose of fault detection.

Acknowledgments

This research work is supported in part by Chang Gung Memorial Hospital under grant
BMRP29.

References

[1] K. Chandrasekar, S. Goossens, C. Weis, M. Koedam, B. Akesson, N. Wehn and K. Goossens, Exploiting
expendable process-margins in DRAMs for run-time performance optimization, Design, Automation &
Test in Europe Conference & Exhibition, 2014, 1-6.
[2] P. S. Huang, M. Y. Tsai, C. Y. Huang, P. C. Lin, L. Huang, M. Chang, S. Shih and J. P. Lin, Warpage,
stresses and KOZ of 3D TSV DRAM package during manufacturing process, 14th International
Conference on Electronic Materials and Packaging, 2012, 1-5.
[3] S. Hamdioui, M. Taouil and N. Z. Haron, Testing open defects in memristor-based memories, IEEE
Trans. on Computers, 64(2015), 247-259.
[4] R. Guldi, J. Watts, S. PapaRao, D. Catlett, J. Montgomery and T. Saeki, Analysis and modeling of
systematic and defect related yield issues during early development of a new technology, Advanced
Semiconductor Manufacturing Conference and Workshop, 4(1998), 7-12.
[5] L. Shen and B. F. Cockburn, An optimal march test for locating faults in DRAMs, Records of the 1993
IEEE International Workshop on Memory Testing, 1993, 61-66.
[6] A. Purwar and S. K. Singh, Issues in data mining: a comprehensive survey, IEEE International
Conference on Computational Intelligence and Computing Research, 2014, 1-6.
[7] J. Han and M. Kamber. Data mining concepts and techniques. 2nd ed. Morgan Kaufmann Publishers,
2006.
[8] B. Dengiz, C. Alabas-Uslu and O. Dengiz, Optimization of manufacturing systems using a neural
network metamodel with a new training approach, Journal of the Operational Research Society,
60(2009), 1191-1197.
[9] N. Alali, M. R. Pishvaie and V. Taghikhani, Neural network meta-modeling of steam assisted gravity
drainage oil recovery processes, Journal of Chemistry & Chemical Engineering, 29(2010), 109-122.
[10] T. Chen, H. Chen and R. Liu, Approximation capability in C(Rn) by multilayer feed-forward networks
and related problems, IEEE Transactions on Neural Networks, 6(1995), 25-30.
[11] J. A. Anderson. An introduction to neural network. MIT Press, Boston, USA, 1995.
[12] B. M. Wilamowski and H. Yu, Improved computation for Levenberg-Marquardt training, IEEE Trans.
On Neural Network, 21(2010), 930-937.
Fuzzy Systems and Data Mining II 179
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-179

Mining Probabilistic Frequent Itemsets with Exact Methods
Hai-Feng LI 1 and Yue WANG
School of Information, Central University of Finance and Economics,
Beijing, China, 100081

Abstract. Probabilistic frequent itemset mining over uncertain databases is a challenging problem. The state-of-the-art algorithm uses O(n log² n) time complexity to conduct the mining. We focus on this problem and design a framework which can discover the probabilistic frequent itemsets with traditional exact frequent itemset mining methods; thus, the time complexity can be reduced to O(n). In this framework, we supply a minimum confidence to convert the uncertain database to an exact database; furthermore, a sampling method is used to find a reasonable minimum confidence so that the accuracy is guaranteed. Our experiments show that our method can significantly outperform the existing algorithm.
Keywords. Uncertain Database; Exact Database; Probabilistic Frequent Itemset Mining; Exact Frequent Itemset Mining; Data Mining

Introduction

Frequent itemset mining is one of the important techniques in data mining, which discovers patterns from databases to support commercial decisions. Recently, new applications have been developed on web sites, the Internet and wireless networks, which generate a lot of uncertain data; that is, each data item is attached with a probability indicating its existence [3]. Table 1 shows an example of an uncertain database with 4 items {a, b, c, d}. In such cases, the traditional exact frequent itemset mining algorithms studied in recent years [1] are no longer effective, since the new feature brings new challenges; thus, new methods need to be designed to handle this data environment. The existing uncertain frequent itemset mining methods can be split into two categories. One is based on the expected support [2]; the other discovers the probabilistic frequent itemsets according to the definition of the probabilistic support [4]. The probabilistic frequent itemsets, in comparison to the expected frequent itemsets, can better represent the probability of the itemsets; thus, this mining problem has obtained more attention. Nevertheless, the mining is hard since converting the uncertain database to exact databases is an NP-hard problem. One needs O(n log² n) time complexity and O(n) space complexity to compute the probabilistic support of an itemset. Clearly, when the database size n is large, the mining cost will be huge.

1 Corresponding Author: Hai-Feng Li, School of Information, Central University of Finance and Economics,

Beijing, China, 100081; E-mail:mydlhf@cufe.edu.cn.



Table 1. An Example of Uncertain Database


ID Uncertain Transaction
1 a:0.6 b:0.4 d:1
2 a:0.8 c:0.6
3 b:1 c:0.9
4 a:0.8 b:0.8 c:0.6 d:0.6
5 d:0.7

In this paper, we focus on this mining problem and present an approximate method to convert the uncertain database to an exact database so that the runtime can be reduced. The rest of the paper is organized as follows. Section 1 presents the preliminaries and the challenge of the problem. Section 2 introduces our method. Section 3 evaluates the performance with our experimental results. Finally, Section 4 concludes the paper.

1. Preliminaries and Problem Definition

1.1. Preliminaries

Given a set of distinct items Γ = {i_1, i_2, ⋯, i_n}, we use |Γ| = n to denote the size of Γ. A subset X ⊆ Γ is an itemset. Suppose each item x_t (0 < t ≤ |X|) in X is annotated with an existence probability p(x_t); we call X an uncertain itemset, which is denoted as X = {x_1, p(x_1); x_2, p(x_2); ⋯; x_|X|, p(x_|X|)}, and the probability of X is p(X) = Π_{i=1}^{|X|} p(x_i). An uncertain transaction UT is an uncertain itemset with an ID. An uncertain database UD is a collection of uncertain transactions UT_s (0 < s ≤ |UD|). If X ∈ UT_s, then we use p(X, UT_s) to denote the probability that X occurs in UT_s. As a result, in UD, X occurs exactly t times with probability p_t(X, UD) = Σ_{S ⊆ {s | X ∈ UT_s}, |S| = t} ( Π_{s ∈ S} p(X, UT_s) ) ( Π_{s ∉ S, X ∈ UT_s} (1 − p(X, UT_s)) ). The list {p_1(X, UD), p_2(X, UD), ⋯, p_|UD|(X, UD)} is the probability density function of the support. Given an itemset X, the number of times it occurs in an uncertain database is called the support of X, denoted Λ(X). Consequently, we use P_{Λ(X)≥i} to denote the probability that X occurs at least i times, which is the sum of {p_i(X, UD), p_{i+1}(X, UD), ⋯, p_|UD|(X, UD)}.

1.2. Problem Definition

Probabilistic Frequent Itemset [9]: Given minimum support λ, minimum probabilistic confidence τ and uncertain database UD, itemset X is a probabilistic frequent itemset iff the probabilistic support Λ^P_τ(X) ≥ λ, in which Λ^P_τ(X) is the maximal support of itemset X that has probabilistic confidence τ, that is,

Λ^P_τ(X) = Max{ i | P_{Λ(X)≥i} > τ }    (1)

In this paper, we discover all the probabilistic frequent itemsets from the uncertain database for the given λ and minimum probabilistic confidence τ.
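For concreteness, the sketch below computes Λ^P_τ(X) by a straightforward O(n²) dynamic program over the support distribution (the Poisson-binomial view of p_t(X, UD)); it is an illustrative baseline, not the O(n log² n) state-of-the-art method discussed below, and the value of τ in the example is chosen arbitrarily.

```python
# Illustrative O(n^2) dynamic program for the probabilistic support (not the
# divide-and-conquer method from the literature).
def probabilistic_support(occurrence_probs, tau):
    # pmf[t] = probability that X occurs exactly t times (Poisson-binomial).
    pmf = [1.0]
    for p in occurrence_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for t, q in enumerate(pmf):
            nxt[t] += q * (1.0 - p)      # transaction does not contain X
            nxt[t + 1] += q * p          # transaction contains X
        pmf = nxt
    # Largest i with tail probability P(Lambda(X) >= i) > tau.
    support, tail = 0, sum(pmf)
    for i in range(1, len(pmf)):
        tail -= pmf[i - 1]
        if tail > tau:
            support = i
    return support

# Itemset {a} occurs in transactions 1, 2 and 4 of Table 1 with these probabilities.
print(probabilistic_support([0.6, 0.8, 0.8], tau=0.5))   # -> 2
```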

Table 2. The Impact of Minimum Probabilistic Confidence τ

Dataset                                     Minimum Probabilistic Confidence
                                            0.009   0.09   0.9    0.1    0.01   0.001
GAZELLE (λ=0.08)      Runtime Cost (Sec)    34.5    35.4   34.8   35.2   35.6   34.7
                      Memory Cost (MB)      66.8    67.4   67.4   67.1   67.4   66.9
T25I15D320K (λ=0.1)   Runtime Cost (Sec)    1979    1931   1882   1894   1873   1989
                      Memory Cost (MB)      1275    1275   1275   1275   1275   1275

To address this problem, much research has been conducted. Zhang et al. first introduced the concept of probabilistic frequent items [4] and employed a dynamic programming (DP) technique to perform the mining, which was improved by Bernecker et al. [5] by using the a priori rule for further pruning. With this method, the time complexity is O(n²) and the space complexity is O(n). Sun et al. improved the method by regarding the probability computation as the convolution of two vectors and thus used a divide-and-conquer (DC) method [6] to conduct the mining, in which the fast Fourier transform can reduce the computing complexity from O(n²) to O(n log² n). The probabilistic frequent itemset and the expected frequent itemset were proved to be related in [7] based on the standard normal distribution. Tong et al. surveyed all these methods in [8].

2. Probabilistic Frequent Itemset Mining Method

As can be seen, even the most efficient method for computing the probabilistic support has a significantly high cost, which reduces the effectiveness of the mining method in real applications. We develop a novel method that does not try to improve the mining algorithm itself but instead designs a framework for mining probabilistic frequent itemsets with traditional exact frequent itemset mining methods. In this framework, we build a relationship between uncertain data and exact data with a supplied parameter, called the minimum confidence.
With the minimum confidence, we can convert the uncertain database to an exact one as follows. We scan the uncertain database; if the probability of an item is smaller than the minimum confidence, we consider it as not existing in the exact database, otherwise as existing. The reason behind this is an intuitive consideration: an item with a small probability contributes little to the probability of a high occurrence count. Once the exact database is generated, we can employ a traditional frequent itemset mining algorithm to discover the results. The pseudocode is shown in Algorithm 1 (a small illustrative sketch follows it). As an example, when we set the minimum confidence to 0.5, the uncertain database in Table 1 is converted to the database in Table 3, in which all the items with probability smaller than 0.5 are removed directly.
In this paper, we ignore τ for two reasons. On the one hand, in [8], Tong et al. showed experimentally that τ has little impact on the mining results; we also conducted experiments with the state-of-the-art algorithm TODIS, whose results are shown in Table 2. As can be seen, when the minimum support is fixed, the runtime cost and the memory cost remain almost unchanged no matter how τ changes. On the other hand, we employ a novel framework that converts the uncertain database to an exact one, over which traditional mining methods can be used; thus τ is not needed and can be ignored accordingly.

Table 3. The Database Converted from the Uncertain Database when the Minimum Confidence is 0.5
ID Transaction
1 ad
2 ac
3 bc
4 abcd
5 d

Algorithm 1 Probabilistic Frequent Itemset Mining Method
Require: UD: an initial uncertain database;
    D: the converted exact database;
    T_i: a transaction in D;
    the minimum confidence;
    λ: the minimum support;
1: for each uncertain transaction UT_i in UD do
2:     for each uncertain item UI in UT_i do
3:         if UI.prob ≥ the minimum confidence then
4:             add UI to transaction T_i;
5:     add T_i to D;
6: perform an exact frequent itemset mining algorithm with λ;
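A minimal sketch of this conversion follows; it simply applies the threshold rule of Algorithm 1 to the data of Table 1 and reproduces Table 3 for a minimum confidence of 0.5.

```python
# Minimal sketch of Algorithm 1's conversion step applied to Table 1.
uncertain_db = [
    {"a": 0.6, "b": 0.4, "d": 1.0},
    {"a": 0.8, "c": 0.6},
    {"b": 1.0, "c": 0.9},
    {"a": 0.8, "b": 0.8, "c": 0.6, "d": 0.6},
    {"d": 0.7},
]

def convert(udb, min_conf):
    """Keep only the items whose probability is at least the minimum confidence."""
    return [sorted(item for item, p in ut.items() if p >= min_conf) for ut in udb]

for tid, tx in enumerate(convert(uncertain_db, 0.5), start=1):
    print(tid, "".join(tx))
# Output: 1 ad / 2 ac / 3 bc / 4 abcd / 5 d  -- matches Table 3
```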

Analysis: Our method is linear in the database size n; that is, the conversion from uncertain data to exact data needs O(n) time complexity. Furthermore, another advantage is that the conversion can be done while reading the data into memory, so its cost can almost be ignored. On the other hand, since the final mining is performed on the exact database, the mining speed can be much improved. In comparison to mining the uncertain database directly, the time complexity is reduced to O(n) at least. Suppose the number of itemsets that need to be computed is m; then our method has time complexity O(mn), while the most effective method that directly discovers the probabilistic frequent itemsets needs O(mn log² n). Clearly, when the database size n is large, the mining speed will be improved significantly.
Even though the performance is improved, the mining results are approximate. The minimum confidence is the key parameter that determines how approximate the mining results are; consequently, how to decide the minimum confidence is the main remaining problem. Table 4 shows the precision and recall of our method when we set the minimum support to 0.1, 0.08 and 0.06, and the minimum confidence to 0.9, 0.8 and 0.7. As can be seen, the precision and the recall reach their highest values at a particular minimum confidence. That is to say, if we can find this particular minimum confidence, the accuracy will be high.
To address this problem, we employ a sampling method to find this particular parameter. Before converting the uncertain database, we take samples from the database as a sub-database, which is converted first and on which we determine the minimum confidence; then, the mining is conducted over the entire database.

Table 4. Precision and Recall

Data          Minsup   Precision (conf = 0.9 / 0.8 / 0.7)   Recall (conf = 0.9 / 0.8 / 0.7)
GAZELLE       0.1      100% / 100% / 100%                   100% / 75% / 75%
              0.08     80% / 100% / 100%                    100% / 100% / 100%
              0.06     85% / 100% / 100%                    100% / 70% / 70%
T25I15D320K   0.1      11% / 88% / 100%                     100% / 72% / 56%
              0.08     25% / 95% / 100%                     100% / 74% / 64%
              0.06     24% / 98% / 100%                     100% / 81% / 70%

Table 5. Uncertain DataSet Characteristics

uncertain data data size avg. size min. size max. size item count mean variance item corr.
T25I15D320K 320,002 26 1 67 994 0.87 0.27 38
GAZELLE 59,602 3 2 268 497 0.94 0.08 166

(a) GAZELLE (b) T25I15D320K

Figure 1. Running Time Cost for Minimum Confidence

3. Experiments

We evaluate the performance of our framework in comparison to the state-of-the-art algorithm UMiner [6]. The minimum confidence is the main parameter in our evaluations. The method was implemented in Python 2.7 running on Microsoft Windows 7. The experimental computer has a 3.60 GHz Intel Core i7-4790M CPU and 12 GB memory. We employed two datasets for the evaluation: one is created with the IBM data generator and the other is a real-life dataset. The data characteristics are presented in Table 5. Given the item number u and the average transaction size v, we measure the approximate correlation among transactions with u/v.
We present the runtime cost of our method in comparison to the UMiner algorithm. As can be seen in Figure 1 (minsup = {0.1, 0.08, 0.06}), with the minimum confidence set from 0.1 to 0.9, the runtime cost was significantly lower than that of UMiner. The smaller the minimum confidence, the higher the mining cost; thus, even at the smallest minimum confidence 0.1, where the mining cost is highest, our method achieves a speedup of about one hundred times over the GAZELLE dataset, and is about 30 times faster over the T25I15D320K dataset. This

(a) GAZELLE (b) T25I15D320K

Figure 2. Memory Cost for Minimum Confidence

showed that our method is more efficient over sparse datasets. Moreover, we present the memory cost of our method. In Figure 2, the memory cost was not impacted by the minimum support. When the minimum confidence was small, the memory usage was high, which was, however, still smaller than that of the UMiner algorithm.

4. Conclusions

We focused on the probabilistic frequent itemset mining problem over uncertain databases and proposed a novel method. We did not directly improve the current mining algorithms but converted the uncertain databases to exact ones, in which a sampling method is used to find a reasonable parameter for an accuracy guarantee. With such a method, many traditional efficient algorithms over exact databases can be employed directly for probabilistic frequent itemset mining. Our experiments showed that our method is efficient.

Acknowledgement

This research is supported by the National Natural Science Foundation of China (61100112, 61309030), the Beijing Higher Education Young Elite Teacher Project (YETP0987), the Discipline Construction Foundation of the Central University of Finance and Economics, and the Key Project of the National Social Science Foundation of China (13AXW010).

References

[1] J. Han, H. Cheng, D. Xin, and X. Yan, Frequent pattern mining: current status and future directions, Data Mining and Knowledge Discovery, Vol.15(2007), 55-86.
[2] C.K. Chui, B. Kao, and E. Hung, Mining Frequent Itemsets from Uncertain Data, Proceedings of PAKDD, 2007.
[3] C.C. Aggarwal and P.S. Yu, A survey of uncertain data algorithms and applications, IEEE Transactions on Knowledge and Data Engineering, Vol.21(2009), 609-623.
[4] Q. Zhang, F. Li, and K. Yi, Finding Frequent Items in Probabilistic Data, Proceedings of SIGMOD, 2008.
[5] T. Bernecker, H.P. Kriegel, M. Renz, F. Verhein, and A. Zuefle, Probabilistic Frequent Itemset Mining in Uncertain Databases, Proceedings of SIGKDD, 2009.
[6] L. Sun, R. Cheng, D.W. Cheung, and J. Cheng, Mining Uncertain Data with Probabilistic Guarantees, Proceedings of KDD, 2010.
[7] T. Calders, C. Garboni, and B. Goethals, Approximation of Frequentness Probability of Itemsets in Uncertain Data, Proceedings of ICDM, 2010.
[8] Y. Tong, L. Chen, Y. Cheng, and P.S. Yu, Mining Frequent Itemsets over Uncertain Databases, Proceedings of VLDB, 2012.
[9] P. Tang and E.A. Peterson, Mining Probabilistic Frequent Closed Itemsets in Uncertain Databases, Proceedings of ACMSE, 2011.
186 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-186

Performance Degradation Analysis Method Using Satellite Telemetry Big Data
Feng ZHOU a, De-Chang PI a,1, Xu KANG a and Hua-Dong TIAN b
a College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
b China Academy of Space Technology, Beijing, China

Abstract. Satellites have features of highly integrated control, various working modes, and complex telemetry big data, which make it difficult to evaluate their performance degradation. In this paper, a novel data mining analysis method is proposed to analyze a satellite's telemetry big data, in which sample entropy is calculated to characterize states and support vector data description is utilized to analyze the satellite performance degradation process. The experimental results show that our proposed method can generally describe the performance degradation process of satellites. Meanwhile, it also provides an important approach for the ground station monitor to analyze the performance of satellites.

Keywords. performance degradation, telemetry big data, sample entropy, support


vector data description

Introduction

With more and more satellites being sent into space in recent years, ground in-orbit management has to handle such challenges as high control precision, various working modes, and high complexity. As advanced technologies and new materials are utilized in satellites [1, 2], sudden failure is no longer the primary failure mode for most satellites; it has been replaced by performance degradation. The analysis of satellite performance degradation focuses only on the overall performance of the equipment, regardless of failure modes, which differs from the analysis of sudden failures.
In 2001, the University of Wisconsin and the University of Michigan, together with about 40 industry partners, jointly established the Intelligent Maintenance Systems (IMS) research center under the U.S. National Science Foundation. Since then, many methods of performance degradation assessment have been proposed, such as the pattern discrimination model (PDM) based on a cerebellar model articulation controller (CMAC) neural network [3], self-organizing map (SOM) and back propagation neural network methods [4], the hidden Markov model (HMM) and hidden semi-Markov model (HSMM) [5], etc. However, these methods are deficient in some aspects. For example, the results of the CMAC assessment method are greatly influenced by parameter setting

1
Corresponding Author: De-Chang PI, College of Computer Science and Technology, Nanjing Univer-
sity of Aeronautics and Astronautics, 29 Yudao Street, Nanjing, Jiangsu, 210016, China. E-mail:
dc.pi@nuaa.edu.cn.

and the assessment results of the SOM, neural network and HMM methods cannot directly reflect the degradation degree. In order to accommodate the assessment characteristics of different key components, the analysis theory of performance degradation has developed from a single degradation variable toward more diverse and practical directions. Although some new theories and methods have emerged, research on the performance degradation of satellites is still limited. Tafazoli [6] studied in-orbit failures of more than 130 different spacecraft and revealed that spacecraft are vulnerable to failures occurring in key components. Ma et al. [7] analyzed the space radiation environment of thermal coatings and proposed degradation models for their optical properties. However, these methods mainly focus on failure data and also require relevant experience.
The conventional analysis methods for satellite performance degradation have some shortcomings, such as experimental difficulties and high cost. Satellite telemetry big data contain monitoring information, abnormal states, the space environment, and other information, which reflect the operational status and payload of satellites. A novel analysis method for satellite performance degradation with telemetry big data is proposed in this paper. This method uses data mining techniques and provides a quantitative description of the satellite performance degradation process.
Most recently presented performance degradation methods are based on physical rules or models [8, 9]; such methods require an understanding of the internal structure of the satellite, which is difficult for analysts. In contrast, our proposed method uses the data sampled during satellite operation to analyze the performance degradation process without needing to determine the relationships among equipment accurately. What is more, our method studies the characteristics of historical data, summarizes the regularities of change, and analyzes the performance degradation process automatically. To the best of our knowledge, a similar approach to satellite performance degradation has not appeared yet. Furthermore, it can also be extended to failure prediction.

1. Related Concepts

1.1. Sample Entropy

The sample entropy (SamEn) [10] is an improved version of the approximate entropy (ApEn) proposed by Pincus [11]. The improved algorithm is able to quantify the complexity of a nonlinear time series.
For a data series $X_N = \{x(1), x(2), \ldots, x(N)\}$, where $N$ is the length of the series, two parameters are defined: $m$, the embedding dimension of the vectors to be formed, and $r$, the threshold that serves as a noise filter. The steps to calculate SamEn are as follows:
1) $N - m + 1$ patterns (vectors) are generated, and each pattern has $m$ dimensions. The patterns are represented as:

$$X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \qquad i = 1, \ldots, N - m + 1 \qquad (1)$$



2) The distance $d[X_m(i), X_m(j)]$ between each pair of patterns is computed using Eq. (2):

$$d[X_m(i), X_m(j)] = \max_{k = 0, \ldots, m-1} |x(i+k) - x(j+k)|, \qquad j = 1, \ldots, N - m + 1,\ j \neq i \qquad (2)$$

3) For each pattern $X_m(i)$, $C_r^m(i) = N^m(i)/(N - m)$ is the probability that other patterns $X_m(j)$ match pattern $X_m(i)$, where the number of matching patterns $N^m(i)$ is the number of patterns satisfying $d[X_m(i), X_m(j)] \le r$. The matching probability of two sequences with $m$ points is then obtained by Eq. (3):

$$\Phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} C_r^m(i) \qquad (3)$$

4) When the dimension is expanded to $m + 1$, steps 1-3 are repeated to obtain $\Phi^{m+1}(r)$. The theoretical value of SamEn is defined as:

$$\mathrm{SamEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln\left[ \Phi^{m+1}(r) / \Phi^{m}(r) \right] \right\} \qquad (4)$$

Experiments conducted by Pincus [8] indicate that a reasonable statistical character can be achieved when $m = 2$ and $r = (0.1 \sim 0.25)\cdot\mathrm{std}(X)$, where $\mathrm{std}(X)$ denotes the standard deviation of $X = \{x(1), x(2), \ldots, x(N)\}$.
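For concreteness, the following is a minimal Python sketch of Eqs. (1)-(4); it is not the authors' implementation (the paper's algorithms were coded in Java), and the function name, default parameters and toy series are illustrative only.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series following Eqs. (1)-(4):
    m is the embedding dimension, the tolerance is r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def match_ratio(dim):
        # Build the N - dim + 1 patterns of length `dim` (Eq. (1)).
        patterns = np.array([x[i:i + dim] for i in range(len(x) - dim + 1)])
        count, total = 0, 0
        for i in range(len(patterns)):
            # Chebyshev distance to every other pattern (Eq. (2)).
            d = np.max(np.abs(patterns - patterns[i]), axis=1)
            d = np.delete(d, i)          # exclude the self-match
            count += np.sum(d <= r)      # matching patterns within tolerance r
            total += len(d)
        return count / total             # average matching probability (Eq. (3))

    phi_m = match_ratio(m)
    phi_m1 = match_ratio(m + 1)
    return -np.log(phi_m1 / phi_m)       # Eq. (4)

# toy usage on a short synthetic series
print(sample_entropy(np.sin(np.linspace(0, 20, 400)) + 0.1 * np.random.randn(400)))
```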

1.2. Support Vector Data Description

Support Vector Data Description [12] (SVDD) is inspired by the Support Vector Clas-
sifier. The method is robust against outliers in the training set and is capable of tighten-
ing the description by using negative examples.
The target class is given as $X = \{x_1, x_2, \ldots, x_n\}$, and a hypersphere that contains all or most of its samples is sought. The hypersphere is determined by its core $a$ and radius $R$. If the hypersphere covers all the training samples of the target class, the empirical error equals zero, and the structural error is defined as $H(a, R) = R^2$.
As the distance from $x_i$ to the core $a$ should not be larger than the radius $R$ for all samples of the target class $X$, the constraint of the minimization problem can be described as $\|x_i - a\|^2 \le R^2$.
To account for the possibility of outliers in the training set, the distance between $x_i$ and the core $a$ is not required to be strictly smaller than $R$, but larger distances are penalized. Therefore, slack variables $\xi_i$ are introduced, and the minimization problem is transformed into

$$\min H(R, a, \xi) = R^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\ \ \|x_i - a\|^2 \le R^2 + \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \ldots, N \qquad (5)$$

The penalty factor $C$ makes a trade-off between the volume and the errors. The minimization problem in Eq. (5) can be handled through the Lagrangian in Eq. (6):

$$L(R, a, \alpha_i, \xi_i) = R^2 + C\sum_i\xi_i - \sum_i\alpha_i\left\{R^2 + \xi_i - \left(\|x_i\|^2 - 2\,a\cdot x_i + \|a\|^2\right)\right\} - \sum_i\gamma_i\xi_i, \quad \alpha_i \ge 0,\ \gamma_i \ge 0 \qquad (6)$$

In Eq. (6), $\alpha_i$ and $\gamma_i$ are the Lagrange multipliers. $L$ should be minimized with respect to $R$, $a$ and $\xi_i$, and maximized with respect to $\alpha_i$ and $\gamma_i$. Setting the corresponding partial derivatives to zero gives the constraints in Eq. (7):

$$\sum_i\alpha_i = 1, \qquad a = \frac{\sum_i\alpha_i x_i}{\sum_i\alpha_i} = \sum_i\alpha_i x_i, \qquad C - \alpha_i - \gamma_i = 0 \qquad (7)$$

Substituting (7) into (6), we obtain $\max L$:

$$\max L = \sum_i\alpha_i\,(x_i\cdot x_i) - \sum_{i,j}\alpha_i\alpha_j\,(x_i\cdot x_j) \qquad (8)$$

Thus, the optimization problem can be further transformed into Eq. (9):

$$\max L = 1 - \sum_{i,j}\alpha_i\alpha_j K_G(x_i, x_j, \sigma) \quad \text{s.t.}\ \ 0 \le \alpha_i \le C, \qquad K_G(x, y, \sigma) = \exp\left(-\|x - y\|^2/\sigma^2\right) \qquad (9)$$
Eq. (9) shows that the core of the hypersphere is a linear combination of the objects. Only objects $x_i$ with $\alpha_i > 0$ are needed in the description; these objects are called the support vectors (SVs) of the description. To test an object $z$, its distance to the core of the hypersphere and the radius $R$ are calculated by Eq. (10):

$$d = \|z - a\|^2 = K_G(z, z) - 2\sum_i\alpha_i K_G(z, x_i) + \sum_{i,j}\alpha_i\alpha_j K_G(x_i, x_j)$$
$$R^2 = \|x_{sv} - a\|^2 = 1 - 2\sum_i\alpha_i K_G(x_i, x_{sv}) + \sum_{i,j}\alpha_i\alpha_j K_G(x_i, x_j) \qquad (10)$$

The test object $z$ is accepted when this squared distance is not greater than the squared radius, i.e., $d \le R^2$.
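A minimal Python sketch of this test step is given below, assuming the dual coefficients and support vectors have already been obtained from a quadratic-programming solver; the function names are illustrative and this is not the authors' code.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K_G(x, y, sigma) = exp(-||x - y||^2 / sigma^2), as in Eq. (9)
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / sigma ** 2)

def svdd_distance(z, alphas, svs, sigma=1.0):
    """Squared distance of a test object z to the hypersphere core (first line of Eq. (10))."""
    d2 = gaussian_kernel(z, z, sigma)
    d2 -= 2 * sum(a * gaussian_kernel(z, x, sigma) for a, x in zip(alphas, svs))
    d2 += sum(ai * aj * gaussian_kernel(xi, xj, sigma)
              for ai, xi in zip(alphas, svs) for aj, xj in zip(alphas, svs))
    return d2

def svdd_radius2(alphas, svs, sigma=1.0):
    """Squared radius, evaluated at a support vector (second line of Eq. (10))."""
    return svdd_distance(svs[0], alphas, svs, sigma)

# A test object z is accepted (inside the description) when
# svdd_distance(z, alphas, svs) <= svdd_radius2(alphas, svs).
```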

2. Method to Analyze the Performance Degradation of Satellite

2.1. Definition Description

Definition 1 (Performance Eigenvector)



The SamEn of a time period is taken as its performance feature, and the vector composed of the performance features of the parameters within the same time period is called the performance eigenvector.
In this study, the parameters are not limited to those of the objective equipment; they also include a number of closely related equipment parameters. As parameter choice requires specialized knowledge, the selection is conducted based on domain and expert knowledge.
Definition 2 (Health Model)
With the SVDD method, the model obtained by training on the performance eigenvectors of the satellite in the healthy status is called the health model (model).
According to the theory of SVDD, the model described in Definition 2 is composed of the support vectors of the healthy-state eigenvectors (model.SV), the corresponding coefficients (model.α), the number of support vectors (model.len), and the hypersphere bounded by the core (model.a) and the radius (model.R).
Definition 3 (Performance Degradation Degree)
Here, dec denotes the distance between a performance eigenvector of the satellite and the core of the hypersphere. The performance degradation degree deg, which reflects the "health condition" [13], is defined as the difference between dec and the radius of the hypersphere model.R, that is, deg = dec − model.R (see Figure 1).
A performance degradation process of the objective equipment may occur when the value of deg is larger than 0, and when the value increases monotonically, the speed of the performance degradation process increases accordingly. As the degree cannot be negative, deg is set to 0 when dec − model.R < 0.

Figure 1. Principle of the performance degradation degree: (a) performance states and eigenvectors; (b) performance degradation degree
Figure 1 illustrates the principle of the performance degradation degree. However, the model cannot contain all health-status features of the satellite, because the operating modes of a satellite are complex and the healthy-status training sets for each operating mode are limited. Hence, even when deg is positive, the satellite may still be in a healthy status under another operating mode.

2.2. Framework Description

Figure 2 shows the overall framework of the analysis for satellite performance degrada-
tion presented in this study, which has four main steps.
Step 1. Select parameters of the satellite according to expert knowledge. Then, a median filter is used to reduce the noise in the satellite telemetry big data so as to generate a new clean dataset.
Step 2. Extract the performance features from the parameters selected in Step 1 according to Definition 1, and compose the final set of performance eigenvectors.
Step 3. Select the performance eigenvectors in the healthy status as the training set,
and build a health model with SVDD method.
Step 4. To measure the degradation status of the new performance eigenvector, cal-
culate the performance degradation degrees according to Definition 3 and the results of
the model obtained in Step 3.
Figure 2. Framework of the satellite performance degradation analysis (satellite telemetry data → parameter selection with expert knowledge → telemetry data processing with a median filter → sample entropy extraction → performance eigenvectors, with the healthy-state eigenvectors used by Support Vector Data Description to build the health model → performance degradation degree)

3. Experimental Results and Analysis

The telemetry big data of one satellite, recorded from 2011-05-01 00:00:00.0 to 2011-12-29 18:16:59.987 and comprising 14 million data frames that contain several failures and performance degradation information, is used as the experimental data. In our experiments, seven important parameters in this dataset are selected by expert knowledge.
The telemetry big data is stored in Oracle 11g, and the algorithms are coded in Java. The operating system is Windows Server 2008 R2 Standard, running on an eight-core Intel(R) Xeon(R) E5606 processor with 8 GB RAM.

3.1. Telemetry Big Data Processing

The experimental dataset is processed in the following steps:

(1) The outliers caused by decoding or other errors are removed according to the valid ranges of the seven parameters. Then, a median filter over every 30 s is applied to reduce the noise in the dataset, yielding a new clean dataset.
(2) The values of each parameter's time series are normalized into the range [-1, 1], and each time series is equally divided into 800 groups. The performance feature of each group is extracted according to Definition 1. Finally, seven performance feature sequences of length 800 are obtained. The performance eigenvector is composed of the features of the seven parameters in the group with the same number.
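A rough Python sketch of steps (1)-(2) is shown below (the authors used Java and Oracle 11g); the data-frame layout, the valid_range dictionary and the entropy_fn argument (e.g. the sample_entropy sketch from Section 1.1) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def build_eigenvectors(df, valid_range, entropy_fn, n_groups=800):
    """df: telemetry frames with a DatetimeIndex, one column per parameter;
    valid_range: {column: (low, high)} physical ranges used to drop decoding outliers;
    entropy_fn: feature extractor applied to each group of values."""
    feats = {}
    for col, (lo, hi) in valid_range.items():
        s = df[col].where(df[col].between(lo, hi))        # step (1): remove range outliers
        s = s.resample("30s").median().dropna()           # step (1): 30-s median filter
        s = 2 * (s - s.min()) / (s.max() - s.min()) - 1   # step (2): normalise to [-1, 1]
        groups = np.array_split(s.to_numpy(), n_groups)   # step (2): 800 equal groups
        feats[col] = [entropy_fn(g) for g in groups]      # one performance feature per group
    return pd.DataFrame(feats)                            # rows = performance eigenvectors
```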

3.2. Modeling and Degradation Analysis

(1) The performance eigenvectors under healthy status are selected as the training data, the SVDD method is applied with σ = 1 in this experiment, and the health model of the satellite is then established.
(2) The remaining dataset is used as test data to verify the obtained health model, and the degradation degree is calculated according to Definition 3. Figure 3 shows the final results.
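As a rough stand-in for this training step (the paper's own SVDD solver is not shown), scikit-learn's OneClassSVM with an RBF kernel behaves similarly for σ = 1; the synthetic eigenvectors below are fabricated placeholders, and the negative decision function is only a crude analogue of deg = max(0, dec − model.R), not the authors' exact computation.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
healthy_vectors = rng.normal(0.0, 0.1, size=(200, 7))   # stand-in for healthy eigenvectors
test_vectors = rng.normal(0.2, 0.2, size=(800, 7))      # stand-in for the remaining groups

model = OneClassSVM(kernel="rbf", gamma=1.0, nu=0.05)   # gamma = 1/sigma^2 with sigma = 1
model.fit(healthy_vectors)

# decision_function is positive inside the learned boundary; its clipped negative part
# plays the role of a degradation degree (zero while still inside the healthy region).
deg = np.maximum(0.0, -model.decision_function(test_vectors))
```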
The degradation degrees are unsteady, and the curve is not smooth but fluctuating. This is mainly due to the recognition accuracy of SVDD and cyclical factors in the original data, which do not affect the overall reflection of the degradation process of the satellite. In order to reduce the interference of these factors, a wavelet denoising algorithm is employed, and the denoised sequence is obtained, as Figure 3 shows. Overall, the average degradation degree presents an increasing trend. Given the long period, accidental factors cannot influence the degradation degree all the time. Therefore, based on Definition 3, we conclude that the satellite has entered the performance degradation state.
Figure 3. Degradation degree (degradation degree sequence and wavelet-denoised sequence plotted against group number, 1-800)


Aerospace experts confirm that two major failures of the satellite did occur from late July to late August (between the 246th and 370th groups) for unknown reasons, and these two failures correspond to the two nearby peaks. This supports the correctness of our proposed definition, especially in explaining the degradation peak and the high degradation degree level after the peak. In conclusion, the proposed method can effectively describe the performance degradation process of the satellite.
Moreover, as a data-driven approach to satellite performance degradation that has not appeared before, the proposed method can also be used for failure prediction in a variety of engineering applications, such as aircraft engines.

4. Conclusions

A method for analyzing satellite performance degradation with telemetry big data is proposed in this paper, while studies addressing this problem remain limited. The experimental analysis shows that the proposed method can extract effective state information from the parameters and provide a quantitative description of satellite performance degradation. Moreover, the analysis of satellite performance degradation with telemetry big data is of significant value for in-orbit research and management of satellites.
Our definitions may still have some limitations; for example, the degradation degree in the experiment is unstable and fluctuating. The sample entropy algorithm may also take much time to trim redundant parameters in massive data, which will be improved in our future work.

Acknowledgment

This paper is supported by the National Natural Science Foundation of China (Grant
No. U1433116).

References

[1] Z.Z. Zhong, D.C. Pi. Forecasting Satellite Attitude Volatility Using Support Vector Regression with Particle Swarm Optimization. IAENG International Journal of Computer Science, 41(2014), 153-162.
[2] F. Zhou, D.C. Pi. Prediction Algorithm for Seasonal Satellite Parameters Based on Time Series Decom-
position. Computer Science, 43(2016), 9-12 (in Chinese).
[3] J. Lee. Measurement of machine performance degradation using a neural network model. Computers in
Industry, 30(1996), 193-209.
[4] R. Huang, L. Xi, et al. Residual life predictions for ball bearings based on self-organizing map and back
propagation neural network methods. Mechanical Systems and Signal Processing, 21(2007), 193-207.
[5] X.S. Si, W. Wang, C.H. Hu, et al. Remaining useful life estimation–A review on the statistical data driv-
en approaches. European Journal of Operational Research, 213(2011), 1-14.
[6] M. Tafazoli. A study of on-orbit spacecraft failures. Acta Astronautica, 64(2009), 195-205.
[7] W. Ma, Y. Xuan, Y. Han, et al. Degradation Performance of Long-life Satellite Thermal Coating and Its Influence on Thermal Character. Journal of Astronautics, 2(2010), 43-45.
[8] G. Jin, D.E. Matthews, Z. Zhou. A Bayesian framework for on-line degradation assessment and residual life prediction of secondary batteries in spacecraft. Reliability Engineering & System Safety, 113(2013), 7-20.
[9] X. Hu, J. Jiang, D. Cao, et al. Battery Health Prognosis for Electric Vehicles Using Sample Entropy and
Sparse Bayesian Predictive Modeling. IEEE Transactions on Industrial Electronics, 63(2015), 2645-
2656.
[10] S.M. Pincus. Assessing serial irregularity and its implications for health. Annals of the New York Acad-
emy of Sciences, 954(2001), 245-267.
[11] D. Weinshall, A. Zweig, et al. Beyond novelty detection: Incongruent events, when general and specific
classifiers disagree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2012), 1886-
1901.
[12] G. Yan, F. Sun, H. Li, et al. CoreRank: Redeeming “Sick Silicon” by Dynamically Quantifying Core-
Level Healthy Condition. IEEE Transactions on Computers, 65(2016), 716-729.
194 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-194

A Decision Tree Model for Meta-


Investment Strategy of Stock Based on
Sector Rotating
Li-Min HE a, Shao-Dong CHEN a, Zhen-Hua ZHANG b,1, Yong HU c and Hong-Yi JIANG a
a School of Finance, Guangdong University of Foreign Studies, China
b School of Economics and Trade, Guangdong University of Foreign Studies, Guangdong, China
c Institute of Big Data and Decision Making, Jinan University, Guangdong, China

Abstract. This study firstly proposes Meta-Investment Strategy, derived from the
concept of Meta-Search in network and Meta-Cognition in psychology. We
compare enormous web information to all A shares in China, process of searching
information to stock selection and search engines to equity funds. Based on the
sector rotation theory and decision tree model, through the construction of
indicator system and the statistical model, some stock selection rules according to
funds information can be extracted. After classifying the period from 2016.02 to 2016.04 as recovery, we selected the finance industry. By importing 12 stock indicators of all the component stocks in the finance industry as input variables and whether a stock is heavily held by stock funds as the target variable, a decision tree model is constructed. Finally, by entering data of the last quarter of 2015, the predictive classification results are obtained. Results show that the Meta-Investment Strategy outperformed the CSI300 and the CSI300 Finance Sector index (000914) and obtained significant excess return from 2016.02.01 to 2016.04.30.

Keywords. Meta-Investment strategy, sector rotation theory, decision tree model,


data mining, stock selection model

Introduction

In each surge of stock market in China, there are always hot industries which lead the
upward trend periodically. If investors can seize these fleeting investment opportunities
of hot industries, their portfolios can acquire excess return. Sector rotation has become
one of the most important means in investment research of stock market.
Sector rotation refers to the phenomenon that, in each phase of the business cycle and the stock market cycle, different industries take turns outperforming the market. Research on sector rotation theory abroad is more mature than domestic research. It originated from the famous "Investment Clock" [1], which classified the business cycle into four phases and summarized the performance of different industries in each. Sassetti and Tani [2] outperformed market returns by using 3 market-timing techniques on 41

1
Corresponding Author: Zhen-Hua Zhang, School of Economics and Trade, Guangdong University of
Foreign Studies, Guangzhou 510006, China; E-mail: zhangzhenhua@gdufs.edu.cn.

funds of the Fidelity Select Sector family over the period 1998 to 2003. The domestic
research of sector rotation focuses on the phenomenon itself and its underlying causes,
including business cycle, monetary cycle, industrial cycle and behavioral finance.
However, only a few studies have examined sector rotation as an investment strategy. Peng and Zhang [3] empirically analyzed the sector rotation effect in the Chinese stock market and proved the feasibility of a sector investment strategy. By adopting an association rule algorithm, many strong association rules of the stock market were mined from a massive amount of data [4]. In that research, the manufacturing and petrochemical industry stock indexes (the core of the association rules) are closely related to other sector indexes (except for finance, real estate, food & beverage and media).
In addition, because the Chinese capital market is immature, irrational investments contribute to the instability of the stock market. This leads to a divergence between market performance and economic fundamentals. In this case, stock funds, as representatives of professional investors, can often forecast the direction of the financial market. That is why we put forward the concept of "meta-investment".
We first present the concept of Meta-Investment based on Meta-Search and Meta-Cognition. Then, by fusing Meta-Investment and the Sector Rotation Strategy, we apply this concept to stock investment according to the investment results of some funds and institutions. In order to obtain comprehensible rules, we adopt a decision tree model to construct the final investment strategy. Simulation results show the advantages of the presented method.

1. Sector Rotation Strategy

1.1. Interpretation Based on Business Cycle

Yang [5] proposed that the essence of sector rotation is an economic phenomenon; namely, the factors influencing the business cycle also induce sector rotation in the capital market. These factors include investment, monetary shocks, external shocks and the consumption of durable goods. In his study, by introducing the phases of the business cycle as a dummy variable into the classical CAPM model, the sector rotation strategy gains a 0.2% excess Jensen's alpha return. Dai & Lin [6] put forward the inner logic linking sector rotation and the business cycle: the business cycle is determined by external shocks, while the industrial structure decides the internal form of the business cycle. The process is shown in Figure 1.

Figure 1. The Conduction Route (industrial structure and external shocks → economic conduction route → business cycle → financial situation of different industries → relative evaluation level → sector rotation)



1.2. Interpretation Based on Monetary Shock

Monetary policy is an important contributory factor for the stock market. In the long run, the performance of the stock market is based on the real economy; however, the change in liquidity resulting from shifts in monetary policy can influence the stock market in the short run. The interpretation of sector rotation based on monetary policy is that different industries have different sensitivities to liquidity. Conover, Jensen, Johnson and Mercer [7] used the Federal Reserve discount rate as an indicator of monetary policy to build a sector rotation strategy based on the monetary environment. After classifying the monetary phases, the sensitivity of different industries to liquidity was tested. Subsequently, cyclical industries sensitive to liquidity were invested in during monetary easing, while noncyclical industries were invested in during monetary tightening. This strategy gained excess return.

1.3. Interpretation Based on Lead-Lag Relationship

The lead-lag relationship refers to the horizontal or vertical profit transmission relationship among different industries. An investment logic is therefore formed to gain excess return: selling industries that outperformed the market earlier and buying industries that outperform the market later. Chen used the DAG method to conduct an empirical analysis of the relationship of price indexes among different industries. He proposed three explanations for sector rotation, including associations among different industries formed by the business cycle, upstream and downstream relationships, and investment characteristics [8].

1.4. Conclusion

If we use a top-down method to interpret the sector rotation phenomenon from the perspective of the real economy, sector rotation originates from changes in the business cycle and monetary shocks. In addition, the industrial structure determines the expression form of sector rotation; namely, the differences in income elasticity of demand, cost structure [9], sensitivity to liquidity and profit transmission relationships among different industries decide the form of sector rotation. Moreover, sector rotation can also be interpreted from the perspective of behavioral finance, which views sector rotation as market speculation. The proportion of retail investors in the Chinese capital market is relatively high; thus, there are a lot of noise traders (according to Shiller, they pursue fashion and fanaticism and are inclined to overreact to changes in stock prices). Meanwhile, informed traders among institutional investors join together to lure retail traders and gain excess profit by manipulating the stock market [10].

2. Meta-Investment Strategy

The Meta-Investment Strategy is an extension of the concepts of Meta-Search and Meta-Cognition. A meta-search engine is a search tool that uses other search engines' data to produce its own results from the Internet [11]. Meta-search engines take input from a user and simultaneously send out queries to third-party search engines for results; sufficient data are thus gathered, formatted by their ranks and presented to the user.
It is well known that Meta-Cognition is "cognition about cognition", "thinking about thinking", or "knowing about knowing", a term defined by the American developmental psychologist Flavell [12]. Flavell defined Meta-Cognition as knowledge about cognition and control of cognition. It comes from the root word "meta", meaning beyond. It can take many forms; it includes knowledge about when and how to use particular strategies for learning or for problem solving. There are generally two components of Meta-Cognition: knowledge about cognition and regulation of cognition.
Meta-Memory, defined as knowing about memory and mnemonic strategies, is an especially important form of Meta-Cognition. Differences in Meta-Cognitive processing across cultures have not been widely studied, but could provide better outcomes in cross-cultural learning between teachers and students. Some evolutionary psychologists hypothesize that Meta-Cognition is used as a survival tool, which would make Meta-Cognition the same across cultures. Writings on Meta-Cognition can be traced back at least as far as On the Soul and the Parva Naturalia of the Greek philosopher Aristotle.
As representatives of professional investors, stock funds can explore the intrinsic value of investment objects before the market does. Therefore, applying stock funds' investment results and investing by "standing on the shoulders of giants" can be a brand-new idea. The Meta-Investment Strategy in this study is based on funds: it compares enormous web information to all A shares, the stock selection process to the search process, and equity funds to search engines. Through the construction of an indicator system and statistical modeling, the stock selection rules of stock funds can be extracted for portfolio construction.

3. Application of Meta-Investment Strategy: Based on Sector Rotation theory and


Decision Tree C5.0 Algorithm

3.1. Model Selection in Data Mining Methods

An increasing number of data mining and machine learning methods have been applied to the financial field. There have been many stock selection models, such as Neural Networks, Random Forests, Support Vector Machines (SVM), Genetic Algorithms (GA), Rough Set Theory and Concept Lattices.
The aim of this research is to probe into the Meta-Investment Strategy (based on sector rotation theory) by searching for a proper data mining and machine learning algorithm. To realize this goal, the comprehensibility of investment strategies must first be considered; therefore, an algorithm from which understandable rules can be extracted is the main approach in this study.
However, Neural Networks and Random Forests are more suitable for large data samples; besides, Neural Networks cannot be used for rule extraction. SVM is applicable to relatively small samples, but it is difficult to extract rules from it. In summary, Neural Networks, SVM and Genetic Algorithms (GA) are suitable for prediction rather than rule extraction. Thus, the Decision Tree, Rough Set and Concept Lattice methods are more suitable for our research purpose than the other prediction methods.

Some researchers have applied decision trees [13, 14, 15] and random forests [16] in the field of investment decisions. For example, Hu and Luo [14] applied the decision tree model to sector selection, and Sorensen, Miller et al. [15] utilized the decision tree approach for stock selection. Liu et al. [16] proposed a random forest model applied to bulk-holding stock forecasting.
However, no available research currently applies stock funds' investment results directly to investment practice. In addition, although there has been some related research on bulk-holding stocks [16], it was not directly combined with investment practice.
Most importantly, because the Meta-Investment Strategy is first launched in this study, there are no specialized algorithms for it at present. After comparison, the C5.0 Decision Tree, from which understandable rules are easy to extract, is preferred.
Secondly, the conditional attributes in the data set are continuous. Applying Rough Sets or Concept Lattices for rule extraction requires a discretization process and thus a proper discretization model. The C5.0 Decision Tree method, which needs no discretization, is comparatively easier to implement.
Moreover, traditional methods for extracting comprehensible rules, which extract information directly from massive original data, are difficult to use in this research because of several problems: (1) massive data, a large number of indicators and scattered information make it difficult to extract rules; (2) the implicit rules of investment vary across periods because of varying financial conditions and policies, so the prediction accuracy is limited and rules are likely to contradict each other; (3) the operational speed is relatively slow when coping with massive data.
We aim to use the Meta-Investment Strategy and a rule extraction algorithm to solve the aforesaid problems. Relevant research in this field is limited. Our approach is built on existing investment strategies and thus accuracy is improved; in this way, the data size and the amount of conflicting information are relatively smaller, which makes the extracted rules more reasonable. Therefore, C5.0 is chosen for rule extraction.

3.2. Decision Tree Modeling and Preliminaries

In this study, the Meta-Investment Strategy is used for portfolio construction through statistical modeling and rule extraction; therefore, the decision tree model is used for rule extraction.
For one thing, the decision tree model is a supervised learning method in which each example is a pair consisting of input objects and a target variable; by analyzing the training data, a set of inference rules is produced for mapping new examples. For another, the goal of this study, namely extracting stock funds' stock selection rules, is in accordance with the output of a decision tree, which is an inference rule set.
This study uses the C5.0 Decision Tree Model and SPSS Modeler for rule extraction. The splitting criterion is the normalized information gain (difference in entropy): the attribute with the highest normalized information gain is chosen to make the decision. C5.0 also introduces a boosting algorithm to enhance accuracy [13, 14, 15]. A small illustrative sketch is given below.
Based on sector rotation theory and the C5.0 Decision Tree Model, the modeling process is shown in Figure 2.
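The sketch below uses scikit-learn's entropy-criterion decision tree as a rough open-source analogue of C5.0 in SPSS Modeler (the boosting trials are omitted here); the toy data frame, column names and target construction are fabricated stand-ins, not the paper's actual indicators.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

# Toy frame standing in for the 12 input indicators and the HH target of Table 4.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(149, 12)),
                 columns=[f"indicator_{i}" for i in range(12)])
y = (X["indicator_0"] + 0.5 * rng.normal(size=149) > 0).astype(int)  # stand-in for HH

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=1)

# Entropy-based splits approximate C5.0's information-gain criterion.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4).fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print(export_text(clf, feature_names=list(X.columns)))  # human-readable rule set
```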

Figure 2. Flow Chart (classification of business cycle → selection of industry → construction of training samples → identifying the target variable: whether it is heavily held by stock funds → identifying the input variables: indicator system construction → C5.0 decision tree model → classification and portfolio construction → data back-testing)

3.3. Classification of Business Cycle

The methods of classifying the business cycle include the two-stage, four-stage and six-stage methods, among others. The Merrill Lynch Investment Clock divided the business cycle into four phases by using OECD "output gap" estimates and CPI inflation data. Zhang & Wang [17] noted that because traditional "output gap" and CPI inflation data are released quarterly, it is difficult to identify economic inflection points. Therefore, monthly figures, including the Macroeconomic Prosperity Index and the Consumer Price Index (the same month last year = 100), shown in Figure 3, can be used as the main indicators for classifying the business cycle.

Figure 3. Trend of Macroeconomic Prosperity Index and CPI (2009.03 to 2015.12)


Source: CSMAR
For these reasons, this study adopts the four-stage method with the Macroeconomic Prosperity Index and the Consumer Price Index (the same month last year = 100) (Table 1), by which the business cycle is divided into four stages: recovery, overheat, stagflation and recession. The classification results of the business cycle are shown in Table 2.
Table 1. Classification of the Four Stages in a Business Cycle

Indicator | Recession | Recovery | Overheat | Stagflation
Macroeconomic Prosperity Index | ↓ | ↑ | ↑ | ↓
Consumer Price Index (the same month last year = 100) | ↓ | ↓ | ↑ | ↑

Table 2. Classification Results (Four-Stage Method)

Time Phase CSI300 Time Phase CSI300


2009.03-2009.07 Recovery 67.93% 2012.08-2012.10 Recovery -4.40%
2009.08-2010.02 Overheat -13.34% 2012.11-2013.07 Overheat 1.52%
2010.03-2010.08 Stagflation -12.67% 2013.08-2013.10 Stagflation 0.64%
2010.09-2010.12 Overheat 8.47% 2013.10-2015.01 Recession 44.00%
2011.01-2011.07 Stagflation -6.82% 2015.02-2016.01 Stagflation -12.16%
2011.08-2012.07 Recession -21.65% 2016.02-2016.04 Recovery 8.81%
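Under the four-stage rule of Table 1 (the prosperity index and the CPI are each either rising or falling), the phase labelling reduces to a simple lookup; the sketch below is an assumption about how "up"/"down" is operationalised, since the paper does not spell out the trend test.

```python
def classify_phase(prosperity_up: bool, cpi_up: bool) -> str:
    """Four-stage investment-clock rule of Table 1."""
    if prosperity_up and not cpi_up:
        return "Recovery"
    if prosperity_up and cpi_up:
        return "Overheat"
    if not prosperity_up and cpi_up:
        return "Stagflation"
    return "Recession"

# e.g. a rising Macroeconomic Prosperity Index with a falling CPI maps to Recovery
print(classify_phase(prosperity_up=True, cpi_up=False))
```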

3.4. Industry Selection

According to the investment clock research of mainstream securities firms, we find that finance is one of the most recommended industries to invest in during the Recovery stage; it is strongly focused on by three of the four major securities firms (Guotai Junan Securities, Shenyin & Wanguo Securities, and Orient Securities) (Table 3).
Table 3. Industry Selection of Different Securities

Guotai Junan Securities. Recovery: Energy, Finance, Consumer Discretionary. Overheat: Energy, Materials, Finance. Stagflation: Telecom, Consumer Goods, Health Care. Recession: Health Care, Utilities, Consumer Goods.
Shenyin & Wanguo Securities. Recovery: Nonferrous Metals, Real Estate, Finance, Information Technology. Overheat: Nonferrous Metals, Mining, Real Estate, Ferrous Metals. Stagflation: Agriculture & Fishing, Health Care, Network Equipment, Electrical Components. Recession: Utilities, Health Care, Finance, Transportation.
Orient Securities. Recovery: Food & Beverage, Nonferrous Metals, Real Estate, Restaurant & Food Services, Tourism, Finance. Overheat: Mining, Nonferrous Metals, Transportation, Ferrous Metals. Stagflation: Health Care, Food & Beverage, Machinery, Utilities, Construction Materials. Recession: Finance, Ferrous Metals, Chemicals, Real Estate, Food & Beverage.
Guoxin Securities. Recovery: Real Estate, Transportation, Mining, Restaurant & Food Services, Nonferrous Metals. Overheat: Agriculture, Home Appliances, Mining, Nonferrous Metals, Machinery, Trading and Retailing. Stagflation: Utilities, Transportation, Health Care, Food & Beverage. Recession: Real Estate, Transportation, Home Appliances, Electrical Components, Nonferrous Metals.

In addition, the finance industry is a cyclical business that is highly related to economic fluctuations. Therefore, the finance industry is chosen as a sample for investigating the Meta-Investment Strategy.

3.5. Sample Determination

Since 2009, mainstream securities firms have studied the investment clock in China. Typical investigations include Guotai Junan Securities (2009), Shenyin & Wanguo Securities (2009), Orient Securities (2009) and Guoxin Securities (2010). Their methodologies, including the classification of the domestic business cycle and the statistical processing of different industries, are similar despite different industry classification benchmarks and time ranges.
The Chinese capital market is immature, and the Chinese financial market changes greatly with different policies in different periods. For the period from 2016.02.01 to 2016.04.30, data are relatively more comprehensive and timely, so the extracted rules are more likely to comply with the implicit rules of the Chinese capital market. In addition, there are 51 stocks in the finance industry at present; if we chose data from before 2015, the data size would be greatly reduced. For example, Guoxin Securities (002736) went public in December 2014, while Orient Securities (600958), Guotai Junan Securities (601211), Dongxing Securities (601198) and Shenwan Hongyuan Group (000166) went public in 2015. In conclusion, the chosen timeframe is determined from three aspects: sector rotation theory, data size and timeliness.
It is important to note that this study judges this period (2016.02.01-2016.04.30) to be recovery. Accordingly, the training samples are confined to the first three quarters of 2015. Financial and technical indicators are imported as input variables. Because of the time lags of financial indicators, whether a stock in the financial industry is heavily held in the next quarter is set as the target variable. Classification rules are produced through the C5.0 Decision Tree. Then, the data of the last quarter of 2015 are imported for classification and prediction, and a portfolio is constructed with each chosen stock weighted equally. Finally, the performance of this portfolio is back-tested from 2016.02.01 to 2016.04.30.
Table 3 summarizes the findings of the four studies mentioned above. Since the research period in this study, from 2016.02.01 to 2016.04.30, is recovery, the finance industry is chosen according to the conclusion above.

3.6. Input Variables and Target Variable

All the input variables and the target variable are shown in Table 4.
We manually chose input variables from four dimensions (profitability, operating capacity, technical factors and indicators per share) according to financial statement theory and previous research [18-21].
In this study, the sample size for model construction is 149, divided into a training set (70%) and a test set (30%). According to the Industry Classification of the China Securities Regulatory Commission (CSRC), the number of stocks in China's financial industry is about 50. Because the data used for model construction are confined to the first three quarters of 2015, 149 samples in total are available after excluding invalid samples.
This study does not adopt a traditional stock selection model. Instead, we combine the sector rotation strategy and the Meta-Investment Strategy. Therefore, after classifying the business cycle and choosing the finance industry and the research period (the first three quarters of 2015 for model construction), the size of the data to be processed and the noise data are greatly reduced.
Table 4. Description of Variables

1st Level Indicators | 2nd Level Indicators | Name | Attribute
Profitability | ROA (TTM) | ROATTM | Input variable & metric variable
Profitability | EBIT Margin (TTM) | EBITMarginTTM | Input variable & metric variable
Profitability | Cash to Total Profit Ratio (TTM) | CPRTTM | Input variable & metric variable
Indicators Per Share | EPS (TTM) | EPSTTM | Input variable & metric variable
Indicators Per Share | EBIT-EPS (TTM) | EBITEPSTTM | Input variable & metric variable
Indicators Per Share | Net Cash Flow Per Share (TTM) | NCFPSTTM | Input variable & metric variable
Indicators Per Share | Net Cash Flow from Operating Activities (TTM) | NCFOATTM | Input variable & metric variable
Indicators Per Share | Net Cash Flow from Investing Activities (TTM) | NCFIATTM | Input variable & metric variable
Operational Ability | Price-Earnings Ratio (PE TTM) | PERTTM | Input variable & metric variable
Operational Ability | Price-Sales Ratio (PS TTM) | PSRTTM | Input variable & metric variable
Operational Ability | Price to Cash Flow (PCF TTM) | PCFTTM | Input variable & metric variable
Technical Indicators | Prior Three-Month Momentum | Momentum | Input variable & metric variable
Whether it is Heavily Held by Equity Funds | >0.02% of Net Value of Stock Funds | HH | Target variable & nominal variable ("Yes"=1, "No"=0)
In the training set, there are 12 input variables and 1 target variable; "Whether It Is Heavily Held by Stock Funds" is the target variable. In this way, 13 indicators in total are imported during model construction. When applying the model, the target variable is forecast from the 12 input variables.
Additional note on the target variable: it indicates whether the market value of a stock held by public stock funds is greater than 2% of the net asset of all public stock funds.
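A minimal sketch of how the HH target could be derived from fund holdings data follows; the column name held_value, the total_fund_net_asset figure and the toy codes are assumptions made for illustration, following the 2% threshold stated above.

```python
import pandas as pd

def heavily_held_flag(holdings: pd.DataFrame, total_fund_net_asset: float) -> pd.Series:
    """holdings: one row per stock, with the market value held by all public stock
    funds in column 'held_value'. Returns HH = 1 if that value exceeds 2% of the
    funds' total net asset, else 0 (the target variable of Table 4)."""
    return (holdings["held_value"] > 0.02 * total_fund_net_asset).astype(int)

# toy usage: only the first stock exceeds 2% of a 50-billion total net asset
df = pd.DataFrame({"held_value": [1.5e9, 4.0e8]}, index=["000001", "600000"])
print(heavily_held_flag(df, total_fund_net_asset=5.0e10))
```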

3.7. The Period Division of Training Sample

The research period of this study is confined to the first three quarters of 2015. Because twelve of the input variables come from lagging financial statements, this study sets the rules shown in Table 5.
Table 5. Usage of different Types of Report

Type of Report Correspondence Types of Sample


Seasonal Report of the 1st quarter Holdings of equity funds on 2015.06.30 Training Sample
Semi-annual Report Holdings of equity funds on 2015.09.30 Training Sample
Seasonal Report of the 3rd quarter Holdings of equity funds on 2015.12.31 Training Sample
L.-M. He et al. / A Decision Tree Model for Meta-Investment Strategy of Stock 203

3.8. Rule Extraction Based on C5.0 Decision Tree

By importing the stock indicators of all the component stocks in the finance industry in the first three quarters of 2015 as input variables and whether a stock is heavily held by stock funds in the next quarter as the target variable, a set of inference rules is generated as follows. Detailed rules are shown in the appendix.
Rule 1 - estimated accuracy 89.22% [boost 96.1%]
NCFIATTM <= -4.903720 [ Mode: 1 ] => 1.0
NCFIATTM > -4.903720 [ Mode: 0 ]
Momentum <= -0.0665 [ Mode: 0 ] => 0.0
Momentum > -0.0665 [ Mode: 1 ]
Momentum <= 0.3515 [ Mode: 1 ]
PCFTTM <= 3.331410 [ Mode: 0 ]
PCFTTM <= -178.095000 [ Mode: 1 ] => 1.0
PCFTTM > -178.095000 [ Mode: 0 ] => 0.0
PCFTTM > 3.331410 [ Mode: 1 ] => 1.0
Momentum > 0.3515 [ Mode: 0 ] => 0.0…
Hence, we extract some rules and explain them.
For example, the first rule: NCFIATTM <= -4.903720 [ Mode: 1 ] => 1.0.
It means that a stock whose Net Cash Flow from Investing Activities (trailing twelve months) is less than -4.903720 is chosen (1.0). In the field of commercial bank management, banks have a fixed demand for asset allocation; in China, the main investment activity of commercial banks is purchasing treasury bonds. Because of the expansion of a bank's assets, the smaller the Net Cash Flow from Investing Activities (NCFIA) is, the faster the expansion. For example, suppose a bank has assets of RMB 100 yuan in 2015, of which 30% is allocated to one-year treasury bonds, and assets of RMB 120 yuan in 2016, again with 30% allocated to one-year treasury bonds, with an annual rate of return of 3%. Then, in its financial statement, the Net Cash Flow from Investing Activities is -5.1 (-120×0.3 + 100×0.3×1.03). A minus sign means capital outflow, while a positive sign means capital inflow.
The second rule: Momentum > -0.0665 [ Mode: 1 ].
It means that a three-month momentum greater than -0.0665 leads towards selection (1.0). In short-term investment there is a "momentum effect": the rate of return of a stock tends to follow its original trend.
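For reference, a sketch of the momentum indicator is given below, assuming "prior three-month momentum" means the trailing three-month price return (roughly 63 trading days); the paper does not spell out this definition, so the window and function name are assumptions.

```python
import pandas as pd

def three_month_momentum(close: pd.Series) -> float:
    """Prior three-month momentum: price change over the trailing ~63 trading days."""
    return close.iloc[-1] / close.iloc[-63] - 1.0
```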
From the above explanation of the two most important rules, we can see that the extracted rules are reasonable. The other rules, shown in the appendix, can certainly be explained in the same way.

4. Results Analysis

4.1. Comparison Object: CSI300

The CSI 300 is a capitalization-weighted stock market index designed to replicate the performance of 300 stocks traded on the Shanghai and Shenzhen stock exchanges. Therefore, it can be used as a performance benchmark.

4.2. Results Comparison

By importing the 12 input variables for the last quarter of 2015 (financial indicators up to 2015.12.31 and the prior three-month momentum before 2016.02.01) for all the component stocks in the finance industry, classification results are produced. From the stocks with a classification value of "1", those with a confidence level higher than 90% are chosen. Subsequently, a portfolio is constructed with each chosen stock weighted equally. Finally, the performance of this portfolio is back-tested from 2016.02.01 to 2016.04.30. The classification results and the performance of the portfolio are shown in Table 6.
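The back-test itself reduces to comparing an equal-weighted portfolio return series against the benchmark; the sketch below is illustrative only, and the price-data layout (daily closes of the selected stocks and of the CSI300) is an assumption.

```python
import pandas as pd

def backtest(prices: pd.DataFrame, benchmark: pd.Series):
    """prices: daily close prices of the selected stocks (one column per stock);
    benchmark: daily close of the CSI300 over the same dates."""
    port_ret = prices.pct_change().mean(axis=1)       # equal-weighted daily return
    bench_ret = benchmark.pct_change()
    cum_port = (1 + port_ret).prod() - 1               # cumulative portfolio return
    cum_bench = (1 + bench_ret).prod() - 1             # cumulative benchmark return
    winning_rate = (port_ret > bench_ret).mean()       # share of days beating the CSI300
    return cum_port, cum_bench, winning_rate
```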
Table 6. Classification Results

Code | Stock Name | $C-HH (forecast value: whether it is heavily held by equity funds) | Level of confidence
000001 | Ping An Bank | 1 | 1
000776 | Guangfa Securities | 1 | 1
002142 | Ningbo Bank | 1 | 1
600000 | Shanghai Pudong Development Bank (SPDB) | 1 | 1
600016 | Minsheng Bank (CMBC) | 1 | 1
600036 | China Merchants Bank (CMB) | 1 | 1
601009 | Bank of Nanjing | 1 | 1
601166 | Industrial Bank (CIB) | 1 | 1
601198 | Dongxing Securities | 1 | 1
601318 | Ping An Insurance (Group) Company of China | 1 | 1
601328 | Bank of Communications | 1 | 1
601818 | China Everbright Bank Company | 1 | 1
The results below (Figure 4 and Table 7) show that the Meta-Investment Strategy outperformed the CSI300 and the CSI300 Finance Sector index (000914) and yielded significant excess return, with a winning rate of 68.97%, from 2016.02.01 to 2016.04.30.

Figure 4. Performance

Table 7. Back-testing data from 2016.02.01 to 2016.04.30

Period 2016.02.01-2016.04.29
Cumulative Return of CSI300 (399300) 9.12%
Cumulative Return of CSI300 Finance Sector (000914) 9.45%
Cumulative Return of Portfolio Based on Meta-Investment Strategy 12.53%
Winning Rate (Ratio of Days outperforming CSI300 to Total Days) 68.97%

4.3. Description and Explanation of Results

In this study, the 149 samples are divided into a training set (70%) and a test set (30%). The number of boosting trials is five; boosting is used to amplify the sample size and enhance accuracy.
After selecting the training and test sets and running five iterations, the overall accuracy reaches 96.1%. It is necessary to note that the samples in each iteration differ to some extent: the first model is built on equal-probability sampling of the training set, the second model is mainly based on the samples incorrectly classified by the first model, the third model focuses on the samples incorrectly classified by the second model, and so forth. Therefore, the estimated accuracy differs among the rules.
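The boosting idea described here (later trials concentrating on the samples the earlier trees got wrong) can be illustrated with a simple sample-reweighting loop; this is a generic sketch of boosting, not SPSS Modeler's exact C5.0 procedure, and the doubling of weights on errors is a deliberately simplified rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosted_trees(X, y, n_trials=5):
    """Grow n_trials trees; after each trial, up-weight the misclassified samples."""
    weights = np.full(len(y), 1.0 / len(y))
    trees = []
    for _ in range(n_trials):
        tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
        tree.fit(X, y, sample_weight=weights)
        wrong = tree.predict(X) != y
        weights = np.where(wrong, weights * 2.0, weights)  # focus the next trial on errors
        weights /= weights.sum()
        trees.append(tree)
    return trees
```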
It is also necessary to explain that the purpose of setting "Whether it is Heavily Held by Equity Funds" as the target variable is not to forecast the bulk-holding stocks of stock funds. Instead, our purpose is to apply the investment results of stock funds, extract principles and rules, and invest by "standing on the shoulders of giants". Therefore, this study uses the comparison among the cumulative return of the portfolio, the cumulative return of the CSI300 Finance Sector and the cumulative return of the CSI300 to test the effects of the extracted rules and the stock selection model.

5. Conclusions

Our research on the Meta-Investment Strategy is combined with sector rotation theory. In order to extract comprehensible rules and select proper investment strategies, this study is based on the investment results of stock funds.
There is mounting evidence in the literature that the sector rotation phenomenon is tied to the economic cycle. Armed with this evidence, we investigate the nature of the sector rotation strategy from three aspects (business cycle, monetary shock and lead-lag relationship). In this way, we draw the conclusion that sector rotation originates from changes in the business cycle and monetary shocks; in addition, the industrial structure determines the expression form of sector rotation.
Furthermore, this study firstly proposes Meta-Investment Strategy, which is an
extension from the concept of Meta-Cognition and Meta-Search Engine. Meta-
Investment Strategy is based on stock funds. To facilitate understanding, we compare
enormous web information to all A shares, process of searching information to stock
selection and search engines to equity funds. Through the construction of indicator
system and building of statistical modeling, the stock selection rules of funds can be
extracted for portfolio construction.
Finally, we combine sector rotation theory and the decision tree model. After classifying the period from 2016.02 to 2016.04 as recovery, we selected the finance industry. By importing the stock indicators of all the component stocks in the finance industry as input variables and whether a stock is heavily held by stock funds as the target variable, the decision tree model is constructed. Subsequently, by entering data of the last quarter of 2015, the predictive classification results are obtained. Results show that the Meta-Investment Strategy outperformed the CSI300 and the CSI300 Finance Sector index (000914) and obtained significant excess return from 2016.02.01 to 2016.04.30.
However, due to limitations of time, energy and data resources, the data back-testing does not include the other three phases of the economy, namely overheat, stagflation and recession. Follow-up studies will consider loosening the restrictions on the research period and industries. Moreover, the decision tree model in this study is static; a dynamic decision tree model will be constructed in follow-up studies, by which the training samples can be increased and the validity of the inference rules can be enhanced.
This study did not choose a traditional stock selection model, which usually selects stocks from massive data and requires complex data processing operations because of noise. Instead, the stock selection model in this study can be seen as a secondary filter, since its stock screening process is based on stock funds' investment results. This has various advantages; for example, it is easy to operate with a relatively small amount of computation, and stock selection rules can be extracted directly.
This study applied public stock funds' investment results directly to investment practice for the first time. By choosing "Whether It Is Heavily Held by Stock Funds" as the target variable and building a stock selection model, a portfolio was constructed whose rate of return outperformed the average market rate of return. In this way, our results show that stock funds' investment results can be used for portfolio construction and portfolio optimization.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (No.
71271061), the National Students Innovation Training Program of China (No.
201511846058), Student Science and Technology Innovation Cultivating Projects &
Climbing Plan Special Key Funds in Guangdong Province (No. pajh2016b0174),
Philosophy and Social Science Project (No. GD12XGL14) & the Natural Science
Foundations (No. 2014A030313575, 2016A030313688) & the Soft Science Project
(No. 2015A070704051) of Guangdong Province, Science and Technology Innovation
Project of Education Department of Guangdong Province (No. 2013KJCX0072),
Philosophy and Social Science Project of Guangzhou (No. 14G41), Special Innovative
Project (No. 15T21) & Major Education Foundation (No. GYJYZDA14002) & Higher
Education Research Project (No. 2016GDJYYJZD004) & Key Team (No. TD1605) of
Guangdong University of Foreign Studies.

References

[1] M. Lynch, M. Hartnett, The investment clock (Report), 2004


[2] P. Sassetti, M. Tani, Dynamic asset allocation using systematic sector rotation, Journal of Wealth
Management 8 (2006), 59-70.
[3] Y. Peng, W. Zhang, The research on strategy and application of sector rotation in Chinese stock market,
The Journal of Quantitative & Technical Economics 20 (2003), 148-151.

[4] Y. Ye, The cointegration analysis to stock market plate indexes based on association rules, Statistical
Education 9 (2008), 56-58.
[5] W. Yang, Research of sector rotation across the business cycle in the Chinese A share market, Wuhan:
Huazhong University of Science & Technology, 2011
[6] X. Lin, J. Dai, Quantitative and structural analysis of Guoxin investment clock (Report), Shenzhen China
(2012).
[7] C. M. Conover, G. R. Jensen, R. R. Johnson, et al., Is fed policy still relevant for investors? Financial
Analysts Journal 61 (2005), 70-79.
[8] H. Chen, Industry allocation in active portfolio management, Wuhan: Huazhong University of Science &
Technology (2011).
[9] M. Su, Y. Lu, Investigation on sector rotation phenomenon in Chinese A share market—from a
perspective of business cycle and monetary cycle, Study and Practice 27 (2011), 36-40.
[10] C. He, Analysis of sector rotation phenomenon in Chinese A share market, Economic research 47
(2001), 82-87.
[11] E. W. Glover, S. Lawrence, W. P. Birmingham, et al., Architecture of a metasearch engine that
supports user information needs, Conference on Information and Knowledge Management, 1999.
[12] J. H. Flavell, Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry,
American Psychologist 34 (1979), 906 – 911.
[13] W. Xue, H. Chen, SPSS Modeler: the technology and methods of data mining, Beijing: Publishing House of Electronics Industry (2014).
[14] H. Hu, J. Luo, Profitability and momentum are the key factors of selection of industries—the
exploration of the decision tree applied in sector selection (Report), Shenzhen China (2011).
[15] E. H. Sorensen, K. L. Miller, C. K. Ooi, The decision tree approach to stock selection, Journal of
Portfolio Management 27 (2000), 42-52.
[16] W. Liu, L. Luo, H. Wang, A forecast of bulk-holding stock based on random forest, Journal of Fuzhou
University (Natural Science Edition), 36 (2008), 134-139.
[17] L. Zhang, C. Wang, The investigation of Chinese business cycle and sector allocation on the
macroeconomic perspective (Report), Shenzhen China (2009).
[18] L. Zhang, Stock Selection Base on Multiple-Factor Quantitative Models, Shijiazhuang: Hebei
University of Economics and Business (2014).
[19] J. Zhao, Sector Rotation Multi-factor Stock Selection Model and Empirical Research on its
Performance, Dalian: Dongbei University of Finance and Economics (2015).
[20] P. Wang, J. Yu, Analysis of Financial Statements, Beijing: Tsinghua University Press (2004).
[21] H. Peng, X.Y. Liu, Sector Rotation Phenomenon Based on Association Rules, Journal of Beijing
University of Posts and Telecommunications, 18 (2016), 66-71.
208 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-208

Virtualized Security Defense System for Blurred Boundaries of Next Generation Computing Era
Hyun-A PARK 1
Department of Medical Health Sciences, KyungDong University,
815 Gyeonhwon-ro, MunMak-eub, WonJu-City, Kangwon-do, Korea

Abstract. This paper deals with the security problems facing next generation computing environments. As the method, a Virtualized Security Defense System (VSDS) is proposed on top of the web application ‘Trello’ for online patient networks, and it addresses the following problems: (1) blurred security boundaries between attackers and protectors, (2) group key management, (3) secret collaborative work and sensitive information-sharing for group members, (4) preserving privacy, (5) rendering of a 3D image (member indicator, for a high level of security). Consequently, although the current IT paradigm is becoming more ‘complicated’, ‘overlapped’ and ‘virtualized’, VSDS makes it possible to share information securely through collaborative work.
Keywords. Blurred Security Boundaries, Virtualized Security Defense System, PatientsLikeMe, Trello, Group key, Reversed hash key chain, VR/AR, Member indicator, Pseudonym

Introduction
0.1. Computing Environments for Next Generation and Problem Identification

In quickly shifting computing societies, various kinds of information technologies have produced new types of IT-enabled product and service innovations in our daily lives. The important features of these innovative IT technologies are highly advanced wireless techniques such as the mobile internet, SNS, cloud, and big data technologies in networked collaborative computing environments.
Currently, the IT paradigm, which has shifted from wired to wireless and on to integrated information environments, has blurred the information boundaries between attackers and protectors. Here, one of the most important problems is that, although information sharing is greatly increased through collaborative work, virtualized IT resources and overlapping trust boundaries give rise to a security dilemma about which information boundaries and which kinds of information to protect [1]. In traditional IT environments, security information systems and mobile application research considered security a clear objective and divided the world into two groups, attackers and protectors, in which security specialists took responsibility for preventing attacks and threats from outsiders using their knowledge of the security architecture. At present, on the other hand, the changing IT paradigm has made the information boundary between attackers and protectors blurred.
1 Corresponding Author: Hyun-A Park, Department of Medical Health Sciences, KyungDong University, 815 Gyeonhwon-ro, MunMak-eub, WonJu, Kangwon-do, Korea; E-mail: kokokzi@naver.com
The characteristics of the next generation computing era (IT paradigm) can be summarized as follows: (1) the increase of collaborative work through network connections, (2) the increase of information sharing in an information-oriented society, (3) blurred security boundaries to protect, caused by virtualized IT resources and migration policies, (4) the increase of 3D data, such as in VR/AR.
Therefore, to solve the problem of ‘blurred security boundaries’, this paper proposes the Virtualized Security Defense System (VSDS) on the web application ‘Trello’, used to construct an online patient network very similar to ‘PatientsLikeMe’.

0.2. Main Methods and Contributions


The solution to the problem is a security defense system for next generation computing environments. The application platform is the web application ‘Trello’. With Trello, an online patient network is constructed very similar to ‘PatientsLikeMe’, because ‘PatientsLikeMe’ is difficult to use for patients (users) in non-English-speaking regions and for patients suffering from diseases it does not cover. Hence, through ‘Trello’, VSDS is extended to people (e.g. researchers) interested in the same diseases and to patients who use any language, including English, and who are struggling against any disease. The main methods are as follows.
1. The proposed system, VSDS (Virtualized Security Defense System), is a new concept of security solution. Its goal is to resolve the problem of ‘blurred security boundaries’, which it does by constructing a ‘virtualized’ security solution for next generation computing environments. As its methods, it largely uses cryptographic techniques and a Member Indicator realized with 3D video image technology for virtualized IT resources. In particular, the Member Indicator is a new security solution reflecting the characteristics of next generation computing.
2. VSDS must be a secure and efficient group key management system, because information sharing has been and will be greatly increased even across blurred security boundaries. As the method, each member’s group key is built on a reversed one-way hash chain. Owing to the one-wayness property of the hash function [2], VSDS can guarantee Forward Secrecy, Backward Accessibility and Group Key Secrecy [3], which are the security requirements of the group information-sharing system VSDS.
2-1. The security requirements of Forward Secrecy and Backward Accessibility should be satisfied. In VSDS, a leaving member cannot learn the next group key (Forward Secrecy), but a joining member can obtain all previous keys and information (Backward Accessibility) thanks to the properties of the reversed hash key chain. Therefore, VSDS is suitable for secret collaborative work and sensitive information-sharing among group members.
2-2. VSDS does not need to perform re-keying on membership changes. The principle of each member’s group key generation is as follows: a fixed fundamental group key is assigned to each group, and a fresh random number is generated for each user and each of his sessions. After applying the hash function to the group key and the random numbers repeatedly, in reversed order, the hashed group key and random numbers are combined according to the developed equation algorithm. Then, a total of five sub-keys are produced as each member’s group session key.
Hence, every member has a different group key for each session. Nevertheless, every result of authentication (including encryption/decryption) is the same as the result obtained with the fundamental group key under the computation of the developed protocol (equation algorithm).
One of the most important points is that, because every group key yields the same result value as the fundamental group key, no re-keying process is needed whenever membership changes.
3. VR/AR technique: a new concept of 3D video image mobile security technology is proposed. As a member indicator, the 3-dimensional realistic model decided at registration time must be rendered in the log-in process for the user to be authenticated as legitimate [4].
4. VSDS preserves privacy. (1) Anonymity and Pseudonymity: a pseudonym is used in every session; although perfect anonymity cannot be provided, pseudonymity can. (2) Unlinkability: in every session users log in with different pseudonyms (Pd) and use different encryption keys (each member’s group key); consequently, VSDS achieves a level of security similar to ‘one-time encryption’. (3) Unobservability: all information is encrypted, and the pseudonym is changed every session by the reversed hash chain [5].
5. Access control by cryptographic techniques and the VR/AR technique.
6. VSDS is scalable to other group project systems on the web. The application scenario concerns patient networks on the web, but VSDS is extendable to other secure group projects.

1. Related Works and Application

1.1. Related Works


Among the research areas related to the main methods - cryptographic techniques (especially group key management systems) and a member indicator using VR/AR - only group key management systems are introduced and reviewed as related work, because the VR/AR technique is applied only to realize the new concept of the security solution.
Research on group keys is varied, covering group key agreement, exchange, revocation, and multicast/broadcast, yet this work focuses only on group key application for multi-user settings. In particular, VSDS differs slightly from general group key schemes in that its security requirement is not Backward Secrecy (a joining member cannot learn any previous keys and information) but Backward Accessibility (a joining member can learn all previous keys and information). This is because the goal of the application environment is full information-sharing with the present group members. That is why VSDS is related to the research area of search schemes for multi-user settings.
According to [6], Park et al.’s privacy-preserving keyword-based retrieval protocols for dynamic groups [7] are the first work on the multi-user setting in secret search schemes. In [7], Park et al. generate each member’s group session key based on a reversed hash key chain, and their scheme also satisfies backward accessibility. As for other research based on the reversed hash key chain, there are [3] and [8]. [3] proposed two practical approaches - efficiency and group search in a cloud datacenter - where the authors defined the group search secrecy requirements, including backward accessibility. [8] suggested a protocol for encrypting designated messages for a designated decryptor, so that a server sees only the corresponding message in a cloud service system, based on onion modification and a reversed hash key chain.
As for multi-user research not based on the reversed hash key chain, there are the following works. [6] proposed common secure indices that let multiple users securely obtain a group’s encrypted documents without re-encrypting them, based on keyword fields, dynamic accumulators, Paillier’s cryptosystem and blind signatures; the authors formally defined the common secure index for conjunctive keyword-based retrieval over encrypted data (CSI-CKR) and its security requirements. The next year, they proposed another scheme for keyword-field-free conjunctive keyword searches on encrypted data in the dynamic group setting [9], whereby the authors solved the open problem posed by Golle et al. In [10], Kawai et al. showed a flaw in Yamashita and Tanaka’s scheme SHSMG and suggested a new concept of Secret Handshake scheme, monotone condition Secret Handshake with Multiple Groups (mc-SHSMG), for members to authenticate each other under a monotone condition. [11] suggested a new, effective fuzzy keyword search over encrypted cloud data in a multi-user system; it supports differential searching privileges based on attribute-based encryption and edit distance, and achieves optimized storage and representation overheads.
In this paper, VSDS generates group session keys for each user, composed of five sub-keys, by using a reversed hash key chain and random numbers. Thanks to the developed encryption/decryption algorithm, the group key needs no re-keying process whenever membership changes happen.

1.2. Application

‘PatientsLikeMe’ is an online patient network; actually, VSDS is not applied to the ‘PatientsLikeMe’ website directly. The substantial application platform for VSDS is the web application ‘Trello’. The intention is that the proposed system VSDS, applied to Trello with cryptographic and security techniques, can accomplish the goal and functional roles of PatientsLikeMe. Hence, both websites need to be understood.
Trello. Trello is a web-based project management application. The basic service is generally free, except for a Business Class service. Projects are represented by boards containing lists (corresponding to task lists). Lists contain cards that progress from one list to the next. Users and boards are grouped into organizations. Trello’s website can be accessed from most mobile web browsers. Trello supports various kinds of work, such as real estate management, software project management, school bulletin boards, and so on [12].
PatientsLikeMe. This online patient network has the goal of connecting patients with one another, improving their outcomes, and enabling research. PatientsLikeMe started the first ALS (amyotrophic lateral sclerosis) online community in 2006. Thereafter, the company began adding other communities such as organ transplantation, multiple sclerosis (MS), HIV, Parkinson’s disease and so on. Today the website covers more than 2,000 health conditions. The approach is scientific literature review and data-sharing with patients to identify outcome measures, symptoms and treatments through answering questions [13].

Figure 1. System Configuration of VSDS

1.3. Application Scenario

Using the web project application ‘Trello’, the security defense system VSDS is constructed for patients with any disease all over the world, just like ‘PatientsLikeMe’. The reasons are: (1) PatientsLikeMe does not deal with all kinds of diseases; although the company has been adding communities such as MS and Parkinson’s disease, many other patients want to join such a website and get help more easily. (2) PatientsLikeMe only supports ‘English’; it is very difficult for patients in non-English-speaking regions to sign up and use it. The system is for group members who want to get help through information-sharing. The information scope is health conditions and the patient profile. Most of this sensitive information can be shared, but some secret personal data in the patient profile must not be revealed to anyone. One more important point is that the system is a Virtualized SDS using 3D image rendering for next generation computing.
The details are as follows. A board is assigned to one group. A list containing cards is assigned to a user. Each member uploads his/her conditions or information to a card, and then the information is shared.
VSDS has three parties: Users, the SM (Security Manager), and the VSDS Server. The SM is a kind of client which is granted the special role of a security manager. The SM is assumed to be a TTP (trusted third party) and is located in front of the VSDS server. The SM controls the group keys and key-related information, all sensitive information, and all other events, with powerful computational and storage abilities. Fig. 1 shows the system configuration of VSDS. Every user must first register at the SM; thereafter, users go through the authentication process every session and only then start their actions. When some information is shared with other patients (meaning a shared card is generated), the card is encrypted with the group’s encryption key. Only legitimate users (who registered at the SM and have stored on their devices the authentication information given by the SM) can pass the authentication processes and learn the shared information. In the last step of the authentication, the 3-dimensional image planned in advance is rendered. This image can be called a member indicator.

2. The Construction of VSDS

2.1. Notations
• $K_G$: the fundamental group key of group $G$
• $m$: the number of members of group $G$; $j$: session number; $i$: a member of group $G$
• $km_i^j$: group session key of member $i$ in the $j$-th session
• $K_{i,1}^j, K_{i,2}^j, K_{i,3}^j, K_{i,4}^j, K_{i,5}^j$: the five subkeys of $i$'s group key $km_i^j$
• $\alpha_i^j$: random number of member $i$ in the $j$-th session
• $pd_i^j$: pseudonym of member $i$ in the $j$-th session
• $h(\cdot)$: hash function; $f(\cdot)$: pseudorandom function
• $C, E$: encryption function; $D$: decryption function
• $V_i^q$: video image information for member $i$ to render at the $q$-th session; $R_V$: a rendered image of $V$

2.2. The Generation Process of Group Key


2.2.1. Group member’s group keys.
We assume that there are $m$ members in the group $G$; then the group key of the group $G$ is $K_G$ and the group keys of each member $i$ are $km_i^j$ ($1 \le i \le m$, $1 \le j \le q$). Here $j$ is a session number and $q$ is the last session. Each member $i$'s group key $km_i^j$ consists of five subkeys in total: $K_{i,1}^j, K_{i,2}^j, K_{i,3}^j, K_{i,4}^j, K_{i,5}^j$. We generate a random number $\alpha_i^q$ for these subkeys. Therefore, the last-session group key of user $i$ is $km_i^q$ with
$K_{i,1}^q = h(K_G)\,\alpha_i^q$,
$K_{i,2}^q = h(K_G)\, f_{K_G}(K_G)\,(1 - \alpha_i^q)$,
$K_{i,3}^q = g^{f_{K_G}(K_G)}$,
$K_{i,4}^q = -(h(K_G) + \alpha_i^q)$,
$K_{i,5}^q = f_{K_G}(K_G)\,\alpha_i^q$.
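As a minimal sketch of this construction (not part of the original paper), the Python code below derives one member's five session subkeys from a fundamental group key and a session random number; the concrete hash, PRF, generator and modulus are assumptions chosen only for illustration.

```python
import hashlib
import hmac
import secrets

# Illustrative public parameters (assumptions, not values from the paper):
P = 2**127 - 1        # a prime modulus for the exponentiation g^x
G = 5                 # a generator of the multiplicative group mod P
ORDER = P - 1         # exponents can be reduced modulo the group order

def h(x: int) -> int:
    """Stand-in for the one-way hash function h(.)."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % ORDER

def f(key: int, x: int) -> int:
    """Stand-in for the keyed pseudorandom function f_k(.)."""
    mac = hmac.new(str(key).encode(), str(x).encode(), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big") % ORDER

def member_subkeys(K_G: int, alpha: int):
    """The five subkeys K_{i,1}..K_{i,5} of one member's group session key km_i^j."""
    hK = h(K_G)                  # h(K_G)
    fK = f(K_G, K_G)             # f_{K_G}(K_G)
    K1 = hK * alpha              # K_{i,1} = h(K_G) * alpha_i
    K2 = hK * fK * (1 - alpha)   # K_{i,2} = h(K_G) f_{K_G}(K_G) (1 - alpha_i)
    K3 = pow(G, fK, P)           # K_{i,3} = g^{f_{K_G}(K_G)}
    K4 = -(hK + alpha)           # K_{i,4} = -(h(K_G) + alpha_i)
    K5 = fK * alpha              # K_{i,5} = f_{K_G}(K_G) * alpha_i
    return K1, K2, K3, K4, K5

if __name__ == "__main__":
    K_G = secrets.randbelow(ORDER)    # fundamental group key of group G
    alpha = secrets.randbelow(ORDER)  # member i's random number for this session
    print(member_subkeys(K_G, alpha))
```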

2.2.2. Group session keys - Reversed hash key chain.


We assume the total number of sessions is $q$. For every member $i$, we generate a different random number $\alpha_i^q$ ($1 \le i \le m$) for the last session. We then apply the hash function to $\alpha_i^q$ repeatedly, $(q-1)$ times, to generate the random numbers of all sessions as follows:
$\alpha_i^q$ (randomly generated),
$h(\alpha_i^q) = \alpha_i^{q-1}$,
$h(\alpha_i^{q-1}) = \alpha_i^{q-2} = h^2(\alpha_i^q)$,
$h(\alpha_i^{q-2}) = \alpha_i^{q-3} = h^3(\alpha_i^q)$,
.........
$h(\alpha_i^4) = \alpha_i^3 = h^{q-3}(\alpha_i^q)$,
$h(\alpha_i^3) = \alpha_i^2 = h^{q-2}(\alpha_i^q)$,
$h(\alpha_i^2) = \alpha_i^1 = h^{q-1}(\alpha_i^q)$.

Therefore, the first session's random number of member $i$ is $\alpha_i^1$, and the $s$-th session's random number of member $i$ is $\alpha_i^s$, with $h(\alpha_i^{s+1}) = \alpha_i^s = h^{q-s}(\alpha_i^q)$. To move from one of member $i$'s group session keys to the next, $\alpha_i^j$ is replaced by $\alpha_i^{j+1}$ ($1 \le j \le q-1$) in the member's group key. With these different random numbers, all group keys can be made different for each member and each session.
The one-way hash function $h(\cdot)$ plays an important role in the group information-sharing system VSDS. A one-way hash key chain is generated by randomly selecting the last value and applying the one-way hash function $h(\cdot)$ to it repeatedly; the initially selected value is thus the last value of the key chain. A one-way hash chain has two properties: 1. Anyone holding a later value $k_j$ of the chain can deduce an earlier value $k_i$ by computing $h^{j-i}(k_j) = k_i$. 2. An attacker holding the latest released value $k_i$ cannot find a later value $k_j$, because this would require inverting $h$. Therefore, these two properties ensure that a leaving member cannot compute new keys after leaving the group, while any newly joining member can obtain all previous keys and information by applying the hash function $h(\cdot)$ to the current key repeatedly.
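The reversed hash key chain can be sketched directly in code. The following minimal example assumes SHA-256 as the one-way hash; it derives all session values from the randomly chosen last value and checks the backward-accessibility property described above.

```python
import hashlib
import secrets

def H(x: bytes) -> bytes:
    """Stand-in for the one-way hash function h(.) of the chain."""
    return hashlib.sha256(x).digest()

def reversed_hash_chain(q: int):
    """Build alpha[1..q]: alpha_q is random and alpha_s = h(alpha_{s+1}) = h^{q-s}(alpha_q)."""
    alpha = [None] * (q + 1)             # index 0 unused so that alpha[s] is session s
    alpha[q] = secrets.token_bytes(32)   # the last value is chosen first, at random
    for s in range(q - 1, 0, -1):
        alpha[s] = H(alpha[s + 1])
    return alpha

if __name__ == "__main__":
    q = 10
    alpha = reversed_hash_chain(q)
    # Backward accessibility: a member holding alpha_j can recompute any earlier
    # value alpha_s (s < j) by hashing alpha_j exactly (j - s) times.
    j, s = 7, 3
    v = alpha[j]
    for _ in range(j - s):
        v = H(v)
    assert v == alpha[s]
    # Forward secrecy: computing alpha_{j+1} from alpha_j would require inverting
    # the one-way hash, which is assumed to be infeasible.
```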

2.2.3. Group members’ pseudonym keys - Reversed hash key chain.


In this scheme, each member also has pseudonyms, which are generated with a reversed hash key chain in the same way as the group session keys. Thus, each member has $q$ pseudonyms, denoted $pd_i^j$ (for each member $i$, $1 \le j \le q$):
$pd_i^q$ (randomly generated),
$h(pd_i^q) = pd_i^{q-1}$,
$h(pd_i^{q-1}) = pd_i^{q-2} = h^2(pd_i^q)$,
$h(pd_i^{q-2}) = pd_i^{q-3} = h^3(pd_i^q)$,
.........
$h(pd_i^4) = pd_i^3 = h^{q-3}(pd_i^q)$,
$h(pd_i^3) = pd_i^2 = h^{q-2}(pd_i^q)$,
$h(pd_i^2) = pd_i^1 = h^{q-1}(pd_i^q)$.

2.3. Encryption and Decryption with group members’ group key

We assume that the encryption of a message $M$ with the group key $K_G$ is $C = g^{h(K_G) f(K_G)} M$. For simplicity, we write $K_{i,1}^q, K_{i,2}^q, K_{i,3}^q, K_{i,4}^q, K_{i,5}^q$ as $K_1, K_2, K_3, K_4, K_5$ and $f_{K_G}(K_G)$ as $f(K_G)$. Then the encryption with each member's group key $km_i^j$, for example in the last session (i.e. $j = q$, $km_i^q$), is as follows:
$C = E_{km_i^q}(M) = K_3^{K_1} g^{K_2} M = (g^{f(K_G)})^{h(K_G)\alpha_i^q}\, g^{h(K_G) f(K_G)(1-\alpha_i^q)} M = g^{h(K_G) f(K_G)} M$.
We can check that the result of encryption with the group key $K_G$ is the same as that with each member's group key $km_i^j$, i.e. with $K_1, K_2, K_3, K_4, K_5$.
The decryption with the group key $K_G$ is $D = C \cdot g^{-h(K_G) f(K_G)} = M$. Then the decryption with each member's group key $km_i^q$ in the last session is
$D = C \cdot K_3^{K_4} g^{K_5} = g^{h(K_G) f(K_G)} M \cdot (g^{f(K_G)})^{-(h(K_G)+\alpha_i^q)} \cdot g^{f(K_G)\alpha_i^q} = g^{h(K_G) f(K_G) - f(K_G) h(K_G) - f(K_G)\alpha_i^q + f(K_G)\alpha_i^q} \cdot M = M$.
We can also check that the result of decryption with the group key $K_G$ is the same as that with each member's group key $km_i^j$. Because of these properties of the developed encryption and decryption algorithm, VSDS needs no re-keying process whenever membership changes happen.

In addition, the pseudorandom function $f_k(\cdot)$ is a simple cryptographic function used for encryption with the secret key $k$.
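The following self-contained sketch illustrates the algebra of this section over a toy multiplicative group: a message encrypted with one member's session key decrypts correctly under another member's session key, and the ciphertext equals the one produced with the fundamental group key, which is why no re-keying is needed. The modulus, generator, hash and PRF instantiations are assumptions of the sketch, not parameters fixed by the paper.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1   # toy prime modulus (an assumption for illustration)
G = 5            # toy generator
ORDER = P - 1    # exponent arithmetic is done modulo the group order

def h(x: int) -> int:
    """Stand-in hash h(.)."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % ORDER

def f(key: int, x: int) -> int:
    """Stand-in keyed PRF f_k(.)."""
    mac = hmac.new(str(key).encode(), str(x).encode(), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big") % ORDER

def subkeys(K_G: int, alpha: int):
    """K1..K5 of one member's session key, as defined in Section 2.2.1."""
    hK, fK = h(K_G), f(K_G, K_G)
    return (hK * alpha, hK * fK * (1 - alpha), pow(G, fK, P), -(hK + alpha), fK * alpha)

def encrypt(km, M: int) -> int:
    K1, K2, K3, K4, K5 = km
    # C = K3^K1 * g^K2 * M  (exponents reduced modulo the group order)
    return pow(K3, K1 % ORDER, P) * pow(G, K2 % ORDER, P) * M % P

def decrypt(km, C: int) -> int:
    K1, K2, K3, K4, K5 = km
    # D = C * K3^K4 * g^K5
    return C * pow(K3, K4 % ORDER, P) * pow(G, K5 % ORDER, P) % P

if __name__ == "__main__":
    K_G = secrets.randbelow(ORDER)                 # fundamental group key
    km_i = subkeys(K_G, secrets.randbelow(ORDER))  # member i, some session
    km_j = subkeys(K_G, secrets.randbelow(ORDER))  # member j, different random number
    M = 123456789
    C = encrypt(km_i, M)
    # Member j decrypts member i's ciphertext without any re-keying, because
    # every session key reduces to the fundamental key's result g^{h(K_G) f(K_G)} M.
    assert decrypt(km_j, C) == M
    assert C == pow(G, (h(K_G) * f(K_G, K_G)) % ORDER, P) * M % P
```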

2.4. Member Indicator

At the registration stage, the SM assigns $j$ 3-dimensional real models ($3 \le j \le t$; the value of $t$ depends on the condition and policy of the system) to each member $i$ of each group $A$, and the members keep the given $j$ 3D real models for later authentication. Every group has its own particular $j$ models (3D-shaped things). $V_{A,i}^s$ denotes the video image information for the 3D model of member $i$ in group $A$ at the $s$-th session. In every session the SM selects one of the group's 3D models and challenges the member of the group with $V_{A,i}^s$. Then the member renders the 3D real model for $V_{A,i}^s$.

2.5. Whole Protocol

2.5.1. Registration.
As the first step, every user must register at the Security Manager (SM). In this registration stage, pseudonyms, group members' group keys, group session keys, and other information including the member indicators are generated so that each user can use the system safely.
Then every user is given some information by the SM. Each user stores it on his or her own device, such as a smartphone or PC, and keeps $V_{A,i}^s$ for the given $j$ 3D real models. The information given to each member $i$ is: $h(E_{km_i^1}(pd_i^1 \| V_i^1))$, $pd_i^1$, $km_i^1$, and $\{h(E_{km_i^j}(pd_i^j)),\ 1 \le j \le q\}$.
The SM also stores some information for each member: $\alpha_i^q$ and the values of the pseudonym hash key chain $\{h(pd_i^j), pd_i^j,\ 1 \le j \le q\}$.
Fig. 2 shows the whole process of VSDS from registration onward.

2.6. The Detailed Protocol

[The First Session - Log-in Stage]
1. With the stored values $pd_i^1$ and $km_i^1$, member $i$ computes $f_{pd_i^1}(km_i^1)$ and $h(pd_i^1)$, and then sends the information of flow 1 to the SM, where $h(E_{km_i^1}(pd_i^1 \| V_i^1))$ is also a value stored at registration time. Because $km_i^1$ is member $i$'s group key in the first session, $E_{km_i^1}(pd_i^1 \| V_i^1)$ means $C = E_{km_i^1}(pd_i^1 \| V_i^1) = (K_3^1)^{K_1^1} g^{K_2^1} (pd_i^1 \| V_i^1) = (g^{f(K_G)})^{h(K_G)\alpha_i^1}\, g^{h(K_G) f(K_G)(1-\alpha_i^1)} (pd_i^1 \| V_i^1) = g^{h(K_G) f(K_G)} (pd_i^1 \| V_i^1)$. Here, for simplicity, $K_1^1, K_2^1, K_3^1$ denote member $i$'s subkeys of its group key $km_i^1$ in the first session; $K_4^1, K_5^1$ are also subkeys of $km_i^1$.
2. After receiving the information from member $i$, the SM checks $1(s)$ and $h(pd_i^1)$ and finds the corresponding values $\alpha_i^q$ and $pd_i^1$ in its storage. Then, with $pd_i^1$, the SM decrypts $D(f_{pd_i^1}(km_i^1))$ and obtains $km_i^1$. The SM applies the hash function to the found value $\alpha_i^q$ repeatedly, $(q-1)$ times. Obtaining the result $\alpha_i^{1\prime}$, the SM computes $km_i^{1\prime}$ with
$K_1^{1\prime} = h(K_G)\alpha_i^{1\prime}$, $K_2^{1\prime} = h(K_G) f_{K_G}(K_G)(1-\alpha_i^{1\prime})$, $K_3^{1\prime} = g^{f_{K_G}(K_G)}$, $K_4^{1\prime} = -(h(K_G)+\alpha_i^{1\prime})$, $K_5^{1\prime} = f_{K_G}(K_G)\alpha_i^{1\prime}$.

Figure 2. The Whole Process of VSDS: the message flows among the User, the SM and the VSDS Server through the Registration, Log-in and Action stages of the first and second sessions.
Then the SM verifies whether $km_i^{1\prime} = km_i^1$. Again, the SM computes $h(E_{km_i^{1\prime}}(pd_i^1 \| V_i^1))$ with $km_i^{1\prime}$ and checks whether it is the same as the received value $h(E_{km_i^1}(pd_i^1 \| V_i^1))$. Here, $E_{km_i^1}(pd_i^1 \| V_i^1)$ has the same meaning as in step 1 above.
3. The SM computes $\alpha_i^2$ by applying the hash function $(q-2)$ times to $\alpha_i^q$, and then computes $km_i^2$:
$K_1^2 = h(K_G)\alpha_i^2$, $K_2^2 = h(K_G) f_{K_G}(K_G)(1-\alpha_i^2)$, $K_3^2 = g^{f_{K_G}(K_G)}$, $K_4^2 = -(h(K_G)+\alpha_i^2)$, $K_5^2 = f_{K_G}(K_G)\alpha_i^2$.
The SM computes and sends $f_{pd_i^1}(km_i^2, pd_i^2)$ and $f_{pd_i^2}(pd_i^1 \| V_i^1)$. Here, $pd_i^2$ is a stored value.
4. With the value $pd_i^1$, member $i$ decrypts the received value: $D(f_{pd_i^1}(km_i^2, pd_i^2)) = km_i^{2\prime}, pd_i^{2\prime}$. With the obtained values $km_i^{2\prime}$ and $pd_i^{2\prime}$, member $i$ computes $h(E_{km_i^{2\prime}}(pd_i^{2\prime}))$ and verifies whether it is the same as the stored $h(E_{km_i^2}(pd_i^2))$. Because $km_i^2$ is member $i$'s group key, the encryption method is the same as in step 1. Then $i$ hashes the value $pd_i^{2\prime}$ and verifies that $h(pd_i^{2\prime}) = pd_i^1$. If the verifications succeed, $km_i^{2\prime}$ and $pd_i^{2\prime}$ are accepted as $km_i^2$ and $pd_i^2$.
5. With this $pd_i^2$, member $i$ also decrypts $D(f_{pd_i^2}(pd_i^1 \| V_i^1)) = pd_i^1 \| V_i^1$. With the decrypted $V_i^1$, $i$ renders $R(V_i^1)$ and uploads the image of $R(V_i^1)$ to a card.
6. The SM verifies whether the rendered card image $R(V_i^1)$ is the same as $R_{V_i^1}$ (the 3D real model). In this verification of the first session, the member indicator authentication is processed. If the SM's verification succeeds, member $i$ can begin to act (log-in is allowed). Acting means uploading, reading (decryption) and downloading.
[The First Session - Action Stage]
7. Member $i$ encrypts a message $M$: $C_i^1 = E_{km_i^1}(M) = (K_{i,3}^1)^{K_{i,1}^1} \cdot g^{K_{i,2}^1} \cdot M = g^{h(K_G) f(K_G)} M$. Then member $i$ uploads $C_i^1$ to his card.
8. Another member $j$ downloads the encrypted message $C_i^1$ from the VSDS board (server).
9. Member $j$ decrypts $C_i^1$ with his first group session key: $D = C_i^1 \cdot (K_{j,3}^1)^{K_{j,4}^1} \cdot g^{K_{j,5}^1} = g^{h(K_G) f(K_G)} M \cdot (g^{f(K_G)})^{-(h(K_G)+\alpha_j^1)} \cdot g^{f(K_G)\alpha_j^1} = g^{h(K_G) f(K_G) - f(K_G) h(K_G) - f(K_G)\alpha_j^1 + f(K_G)\alpha_j^1} \cdot M = M$.
[The Second Session]
From the second session on, most processes are similar to the first session. As the session changes, the corresponding pseudonym keys and group session keys also change. As for the video image information $V$ for the 3D real model, the member sends the information $V^1$ kept from the first session to the SM, and the SM then challenges the member with newly selected information $V^2$ in the third step. Lastly, the member renders the 3D real model $R(V^2)$ on his card. The action stage is also similar to the first session.
From the third session on, all processes follow the same path as the second session.
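As a small illustration of the chain computations behind steps 2 and 4 (an illustrative sketch, with SHA-256 standing in for h and no claim about the real parameters), the code below shows how the SM recomputes a session random number from the stored last-session value, which it then uses to rebuild km as in Section 2.2.1, and how a member checks a newly received pseudonym against the current one.

```python
import hashlib
import secrets

def h(x: int) -> int:
    """Stand-in one-way hash h(.) mapping integers to integers."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")

def hash_times(x: int, n: int) -> int:
    """Apply h(.) to x exactly n times, i.e. h^n(x)."""
    for _ in range(n):
        x = h(x)
    return x

if __name__ == "__main__":
    q = 20                                  # total number of sessions
    # The SM stores only the last-session values for member i.
    alpha_q = secrets.randbelow(2**128)
    pd_q = secrets.randbelow(2**128)

    # Step 2 of session j: the SM recomputes alpha_i^j = h^{q-j}(alpha_i^q),
    # from which it rebuilds km_i^j and compares it with the received key.
    j = 1
    alpha_j = hash_times(alpha_q, q - j)

    # Step 4: the member checks the freshly received pseudonym pd_i^{j+1}
    # by hashing it once and comparing with the current pseudonym pd_i^j.
    pd_j = hash_times(pd_q, q - j)
    pd_j_plus_1 = hash_times(pd_q, q - (j + 1))
    assert h(pd_j_plus_1) == pd_j
    print("alpha_i^1 =", alpha_j)
```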

3. Discussion

3.1. Efficiency
3.1.1. Strength.
In secure group information-sharing communication, 'group re-keying' is an important task when a user joins or leaves the group: the group keys need to be updated to maintain forward and backward secrecy [14]. However, in the proposed system VSDS, by the computation of the developed protocol, every authentication result is the same as the result obtained with the fundamental group key. Therefore, no re-keying is needed for membership changes.
3.1.2. Weakness.
In the last steps of the first session's authentication (steps 5 and 6), the 3-dimensional image $V_i^s$ is rendered. $R(V)$ plays the role of a member indicator decided by the SM in advance; its purpose is to improve security. If a 3-dimensional image is inefficient in the real world, a 2-dimensional image is recommended instead.
However, Google's project 'Tango' has recently been showcased with an indoor mapping and VR/AR platform [4]. The 'Tango' technology enables a mobile device to measure the physical world: Tango-enabled devices (smartphones, tablets) capture the dimensions of physical space to create 3D representations of the real world, giving the Android device platform a new ability for spatial perception. Therefore, the proposal of VSDS can be said to be timely, keeping abreast of Tango's AR/VR techniques for mobile devices.

3.2. Security

VSDS is a group key management system based on a reversed hash key chain. Message confidentiality is one of the most important features of secure information sharing among group members. The usual group key security requirements are:
1. Group Key Secrecy: it should be computationally infeasible for a passive adversary to discover any secret group key.
2. Forward Secrecy: a passive adversary with a subset of old group keys cannot discover any subsequent (later) group key.
3. Backward Secrecy: a passive adversary with a subset of subsequent group keys cannot discover any preceding (earlier) group key.
4. Key Independence: a passive adversary with any subset of group keys cannot discover any other group key [3, 15].
However, a group-key based information sharing and service system does not follow all of these requirements, because a new joiner to the group should be able to search all of the previous information in order to be helped. In other words, backward secrecy is not an eligible security requirement for VSDS. The system VSDS satisfies Group Information-sharing Secrecy as follows:
1. Forward Secrecy: for any group $G_T$ and a dishonest participant $p \in G_T^j$, the probability that $p$ can generate a valid group key and pseudonym for the $(j+1)$-th authentication is negligible when the participant knows the group key $km_i^j$ and the pseudonym $pd_i^j$, where $p \notin G_T^{j+1}$ and $0 < j < q$. This means that a member who has left a group can no longer access any of the group's subsequent information or documents.
2. Backward Accessibility: for any group $G_T$ and a participant $p \in G_T^j$, the probability that $p$ can generate a valid group key and pseudonym for the $(j-l)$-th authentication is $1 - \eta(n)$ (footnote 2) when the participant knows the group key $km_i^j$ and the pseudonym $pd_i^j$, where $p \notin G_T^{j-l}$ and $0 < l < j$. In other words, every member who joins a group can access all of the group's previous information and documents.
2 The term negligible function refers to a function $\eta : \mathbb{N} \rightarrow \mathbb{R}$ such that for any $c \in \mathbb{N}$ there exists $n_c \in \mathbb{N}$ such that $\eta(n) < \frac{1}{n^c}$ for all $n \ge n_c$ [16].


3. Group Key Secrecy: for any group $G_T$ and a dishonest participant $p$ who knows a set of initial knowledge - the group fundamental key $K_{G_T}$ and one member $i$'s group key $km_i^1$ - the probability that $p$ can correctly guess the encrypted information message $M$ of group $G_T$ at the $j$-th session is negligible. It must be computationally infeasible for a dishonest participant $p$ to know or correctly guess the contents of the encrypted message even if a leaving member or another member of a group reveals his group keys.

4. Conclusion
VSDS has been proposed for patients all over the world who want to get help and share information, as on the website 'PatientsLikeMe'. The system guarantees security and privacy, because most health and private information is sensitive. Furthermore, VSDS is scalable to other groups' project applications with safety. Moreover, it is firmly believed that the identified problems between next generation collaborative computing and security, together with the approaches to them, should be managed as an Integrated Security Management (ISM).

References

[1] H.A.Park, Secure Chip Based Encrypted Search Protocol In Mobile Office Environments, International
Journal of Advanced Computer Research, 6(24), 2016
[2] Y.Hu, A.Perrig, D.B.Johnson, Efficient security mechanisms for routing protocols, In the proceedings of
Network and Distributed System Security Symposium (2003), 57-73
[3] H.A.Park, J.H.Park, and D.H.Lee, PKIS: Practical Keyword Index Search on Cloud Datacenter,
EURASIP Journal on Wireless Communications and Networking, 2011(1), 84(2011), 1364-1372
[4] G.Sterling, Google to showcase Project Tango indoor mapping and VR/AR platform at Google I/O,
http://searchengineland.com/google-showcase-project-tango-indoor-mapping-vrar-platform-google-io-
249629, 2016
[5] H.A.Park, J.Zhan, D.H.Lee, PPSQL: Privacy Preserving SQL Queries, In the Proceedings of ISA(2008),
Taiwan, 549-554
[6] P.Wang, H.Wang, and J.Pieprzyk, Common Secure Index for Conjunctive Keyword-Based Retrieval over
Encrypted Data, SDM 2007 LNCS 4721(2007), 108-123
[7] H.A.Park, J.W.Byun, D.H.Lee, Secure Index Search for Groups, TrustBus 05 LNCS 3592(2005), 128-
140
[8] H.A.Park, J.H.Park, J.S.Kim, S.B.Lee, J.K.Kim, D.G.Kim, The Protocol for Secure Cloud-Service Sys-
tem. In the Proceedings of NISS(2012), 199-206
[9] P.Wang, H.Wang, and J.Pieprzyk, Keyword Field-Free Conjunctive Keyword Searches on Encrypted
Data and Extension for Dynamic Groups, CANS 2008 LNCS 5339(2008), 178-195
[10] Y.Kawai, S.Tanno, T.Kondo, K.Yoneyama, N.Kunihiro, K.Ohta, Extension of Secret Handshake Proto-
cols with Multiple Groups in Monotone Condition. WISA 2008 LNCS 5379(2009), 160-173
[11] J.Li, X.Chen, Efficient multi-user keyword search over encrypted data in cloud computing, Computing
and Informatics 32 (2013), 723-738
[12] http://lifehacker.com/how-to-use-trello-to-organize-your-entire-life-1683821040
[13] https://www.patientslikeme.com/
[14] R.V.Rao, K.Selvamani, R.Elakkiya, A secure key transfer protocol for group communication, Advanced
Computing: An International Journal, 3(2012), 83-90
[15] A.Gawanmeh, S.Tahar, Rank Theorems for Forward Secrecy in Group Key Management Protocols, In
the Proceedings of 21st AINAW(2007), 18-23
[16] D.Boneh, B.Waters, Conjunctive, Subset, and Range Queries on Encrypted Data, In the Proceedings of
4th TCC(2007), 535-554
220 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-220

Implicit Feature Identification in Chinese Reviews Based on Hybrid Rules
Yong WANG1, Ya-Zhi TAO, Xiao-Yi WAN and Hui-Ying CAO
Key laboratory of electronic commerce and logistics of Chongqing, Chongqing
University of Posts and Telecommunications, Chongqing 400065, China

Abstract. In most existing text-mining schemes for customer reviews, explicit features are considered while implicit features are ignored, which probably leads to incomplete or incorrect results. In fact, it is necessary to consider implicit features in customer review mining. Focusing on the identification of implicit features, a novel scheme based on hybrid rules is proposed, which mixes statistical rules, dependency parsing and conditional probability. Explicit product features are first extracted with the FP-tree method and clustered. Then, association pairs are obtained based on dependency parsing and the product of frequency and PMI. Finally, implicit features are identified by considering the association pairs and the conditional probability of verbs, nouns and emotional words. The proposed scheme is tested on a public cellphone review corpus. The results show that our scheme can effectively find implicit features in customer reviews. Therefore, our research can obtain more accurate and comprehensive results from customer reviews.
Keywords. network reviews, implicit feature, comment mining, association pair extraction, conditional probability

Introduction

Today, e-commerce websites contain a large number of consumer reviews about products. On one hand, potential consumers can decide whether to buy a product after reading its reviews; on the other hand, reviews help manufacturers improve product design and quality. However, it is impossible for people to read all reviews by themselves, because the number of reviews is huge. Review mining has therefore emerged as the times require and has become a significant application field. Feature identification, comprising explicit feature identification and implicit feature identification, is a core step in review mining. If a feature appears in a review directly, it is defined as an explicit feature; if a feature does not appear in a review but is implied by other words, it is defined as an implicit feature [1]. A sentence which contains explicit features is defined as an explicit sentence, and a sentence which contains an implicit feature is defined as an implicit sentence. Wang et al. [2] counted the Chinese reviews they crawled and discovered that at least 30 percent of the sentences are implicit sentences. It can thus be seen that implicit features play a significant role in review mining.

1
Corresponding Author: Yong WANG, Chongqing University of Posts and Telecommunications, No.2
Chongwen Road, Nan’an District, Chongqing City, China; E-mail: wangyong1@cqupt.edu.cn.

In recent years, some scholars have been studying implicit feature extraction. In most proposals, implicit features are identified on the basis of emotional words. Qiu et al. [3] proposed a novel approach to mine implicit features based on the k-means clustering algorithm and F2 statistics. Hai et al. [4] identified implicit features via co-occurrence association rule (CoAR) mining. Zeng et al. [5] proposed a classification-based method for implicit feature identification. Zhang et al. [6] used a multi-strategy explicit property extraction algorithm and similarity to detect implicit features. Furthermore, Wang et al. [7] proposed a hybrid association rule mining method to detect implicit features.
To identify implicit features, we propose a novel scheme based on hybrid rules, which consists of three different methods. Compared with previous research results, the presented scheme has two advantages. (1) By considering semantic association degree and statistical association degree together, we obtain more accurate <feature cluster, emotional word> association pairs. (2) In Chinese reviews, some emotional words, such as "good" and "bad", can modify more than one feature, so it is not accurate to consider only the association between emotional words and features. To solve this problem, the association between verbs, nouns and features is also considered.

1. Scheme Design

Figure 1 depicts the framework of our scheme which is composed of several parts.

Figure 1. Scheme framework.

1.1. Explicit Features Extraction and Clustering

In this stage, explicit features are extracted. The detailed steps are as follows:
• Perform word segmentation and POS (part-of-speech) tagging on the reviews with ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System). Then the nouns and noun phrases from the annotated corpus of comments are stored in a transaction file.
• Frequent itemsets obtained by the FP-tree method are regarded as candidate explicit features I0.
• Candidate explicit features I1 are obtained after pruning all single words from I0.
• According to Chinese semantic and grammatical knowledge, a rule covering words that are frequent but are not features is established. This rule is used to filter I1 to obtain candidate explicit features I2. The rule covers:
appellation nouns, such as "friend", "classmate", etc.;
colloquial nouns, such as "machine", etc.;
product names, such as "cellphone", "computer", etc.;
abstract nouns, such as "reason", "condition", etc.;
collective nouns, such as "people", etc.
• The PMI algorithm [8] is used to measure the association between the product and each feature in I2. The final explicit features are obtained after filtering out the features whose PMI value is smaller than a threshold (a sketch of this filter is given after this list). The PMI value is calculated as
$PMI(feature) = \log_2 \frac{hit(\text{"product" and "feature"})}{hit(product)\, hit(feature)}$   (1)
where $hit(x)$ is the number of pages returned by the Baidu search engine when $x$ is used as the keyword; the threshold is set to -3.77, which is determined from the experimental sample data.
• The similarity between features is calculated with Tongyici Cilin [9]. Features are clustered into one group if their similarity value is 1. Once the explicit feature clusters are obtained, one feature is chosen as the representative feature of the cluster it belongs to.
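A minimal sketch of the PMI filter of Eq. (1) is given below; the hit counts are passed in as plain integers because the real counts have to be fetched online from the Baidu search engine, and the function names are only illustrative.

```python
import math

def pmi(hits_product: int, hits_feature: int, hits_both: int) -> float:
    """Eq. (1): PMI = log2( hit("product" and "feature") / (hit(product) * hit(feature)) )."""
    return math.log2(hits_both / (hits_product * hits_feature))

def passes_pmi_filter(hits_product: int, hits_feature: int, hits_both: int,
                      threshold: float = -3.77) -> bool:
    """A candidate explicit feature in I2 is kept only if its PMI is not below the threshold."""
    return pmi(hits_product, hits_feature, hits_both) >= threshold
```

Candidates in I2 whose PMI falls below the threshold are discarded; the survivors are the final explicit features that are then clustered with Tongyici Cilin.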

1.2. Explicit association pairs <explicit feature cluster, emotional word> extraction

We use the dependency parsing method and the frequency*PMI method to judge, from the two aspects of semantics and statistics, whether a feature cluster and an emotional word can form an association pair. The details are as follows:
• Extracting emotional words in explicit sentences: adjectives, i.e. words whose POS tag is "/a" or "/an", are extracted from the explicit sentences as emotional words.
• Calculating frequency*PMI between emotional words and explicit feature clusters. The frequency*PMI formula is
$frequency*PMI\langle f, w\rangle = P_{f\&w} \cdot \log_2 \frac{P_{f\&w}}{P_f P_w}$   (2)
where $w$ is the emotional word, $f$ is the feature cluster, and $P_f$ is the probability that the feature cluster $f$ occurs in the explicit sentences. The formulas for $P_f$ and $P_{f\&w}$ are
$P_f = \sum_{i=1}^{n} P_{f_i}$   (3)
$P_{f\&w} = \sum_{i=1}^{n} Co\_occurrence(f_i, w) / R$   (4)
where $n$ is the number of features in the feature cluster, $f_i$ is the $i$-th feature in the feature cluster $f$, $Co\_occurrence(f_i, w)$ is the number of co-occurrences of $f_i$ and $w$ in the explicit sentences, and $R$ is the number of explicit sentences.
• Using syntactic analysis tools to obtain all dependency relationships in the sentences. If an "nsubj" relationship exists between a feature and an emotional word, there is a modifying relation between them. If any feature in a feature cluster has a modifying relation with an emotional word, we consider that the feature cluster has a modifying relation to that emotional word.
• Setting a threshold p. The association pairs whose frequency*PMI value is larger than p, or whose frequency*PMI value is smaller than p but which have a modifying relation, are chosen as the final association pairs. The value of p in this paper is -0.00009.
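The following sketch illustrates the frequency*PMI computation of Eqs. (2)-(4) and the threshold-or-dependency decision; the tokenized English sentences are hypothetical stand-ins for the Chinese explicit sentences, and the dependency-parsing result is passed in as a plain boolean.

```python
import math

def freq_pmi(sentences, cluster, word):
    """Eqs. (2)-(4): frequency*PMI between a feature cluster and an emotional word.

    `sentences` are the explicit sentences (each a list of tokens), `cluster`
    is a set of feature words, and `word` is an emotional word.
    """
    R = len(sentences)
    p_f = sum(sum(1 for s in sentences if f in s) for f in cluster) / R      # Eq. (3)
    p_w = sum(1 for s in sentences if word in s) / R
    p_fw = sum(sum(1 for s in sentences if f in s and word in s)
               for f in cluster) / R                                          # Eq. (4)
    if p_f == 0 or p_w == 0 or p_fw == 0:
        return float("-inf")
    return p_fw * math.log2(p_fw / (p_f * p_w))                               # Eq. (2)

def is_association_pair(sentences, cluster, word, has_nsubj_relation, p=-0.00009):
    """Keep the pair if frequency*PMI exceeds the threshold p, or if the dependency
    parser found an 'nsubj' (modifying) relation between the cluster and the word."""
    return freq_pmi(sentences, cluster, word) > p or has_nsubj_relation

if __name__ == "__main__":
    # Tiny English stand-ins for tokenized Chinese explicit sentences (assumption).
    sents = [["screen", "very", "good"], ["screen", "clear"], ["price", "good"],
             ["battery", "bad"], ["display", "good"]]
    print(is_association_pair(sents, {"screen", "display"}, "good",
                              has_nsubj_relation=False))
```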

1.3. Implicit Features Identification

At present, most research considers only the emotional words in implicit sentences. Differently from them, we identify implicit features by considering emotional words, verbs and nouns. The detailed steps are as follows:
• Analyze the elements of the implicit sentence and make two judgments. The first judgment is whether emotional words are present in the implicit sentence. The second judgment is whether verbs or nouns are present in the implicit sentence.
• There are four cases in terms of the two judgments: Y1 represents that association pairs are found through the emotional words in the implicit sentence, and N1 is the opposite of Y1; Y2 represents that verbs or nouns are present in the implicit sentence, and N2 is the opposite of Y2.
Case Y1Y2
Step 1: extract the emotional words in the implicit sentence; the candidate association pairs containing these emotional words are then obtained. The feature clusters in the candidate association pairs are treated as candidate feature clusters.
Step 2: the verbs and nouns in the implicit sentence are extracted and treated as the notional word set. Then we calculate each candidate feature cluster's conditional probability under the condition of these words. The calculation formula is defined as
$P(f \mid word_j) = \frac{Co\_occurrence(f, word_j)}{count(word_j)}$   (5)
where $word_j$ is the $j$-th word in the notional word set and $f$ is a candidate feature cluster. We then define $f$'s average conditional probability as
$T(f) = \sum_{j=1}^{v} P(f \mid word_j) / v$   (6)
where $v$ is the size of the notional word set.
Step 3: we define the score of each candidate feature cluster as
$Score\langle f, w\rangle = \alpha \cdot (frequency*PMI\langle f, w\rangle) + (1-\alpha)\, T(f)$   (7)
where $\alpha$ is a weight coefficient, set to 0.7 after several experiments. Then the representative feature of the feature cluster that is in the association pair with the highest score is chosen as the implicit feature (see the sketch at the end of this subsection).
Case Y1N2
Step 1 is the same as the first step of case Y1Y2.
Step 2: the representative feature of the feature cluster that is in the candidate association pair with the highest frequency*PMI value is chosen as the implicit feature.
Case N1Y2
Step 1: the verbs and nouns in the implicit sentence are extracted and treated as the notional word set. Then we use Eqs. (5) and (6) to calculate every explicit feature cluster's average conditional probability under the condition of these words.
Step 2: the representative feature of the feature cluster in the association pair set with the highest score is chosen as the implicit feature.
Case N1N2
The implicit feature cannot be identified.
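As a small illustration of the Y1Y2 branch, the sketch below combines the average conditional probability of Eqs. (5)-(6) with the score of Eq. (7) and picks the highest-scoring candidate cluster; the candidate values in the demo are hypothetical.

```python
def avg_conditional_probability(sentences, cluster, notional_words):
    """Eqs. (5)-(6): T(f), the average of P(f | word_j) over an implicit sentence's
    verbs and nouns, computed from co-occurrence counts in the explicit sentences."""
    total = 0.0
    for w in notional_words:
        count_w = sum(1 for s in sentences if w in s)
        co = sum(sum(1 for s in sentences if f in s and w in s) for f in cluster)
        total += (co / count_w) if count_w else 0.0
    return total / len(notional_words) if notional_words else 0.0

def score(freq_pmi_value, t_value, alpha=0.7):
    """Eq. (7): Score<f,w> = alpha * frequency*PMI<f,w> + (1 - alpha) * T(f)."""
    return alpha * freq_pmi_value + (1 - alpha) * t_value

def pick_implicit_feature(candidates):
    """Case Y1Y2: return the representative feature whose cluster scores highest.
    `candidates` maps a representative feature to its (frequency*PMI, T(f)) pair."""
    return max(candidates, key=lambda f: score(*candidates[f]))

if __name__ == "__main__":
    # Hypothetical values for two candidate clusters of one implicit sentence.
    print(pick_implicit_feature({"battery": (0.031, 0.40), "screen": (0.045, 0.10)}))
```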

2. Experiment Evaluation

2.1. Data Set and Evaluation

Six hundred reviews about one kind of cellphone were downloaded from a public website called Datatang.com. In order to evaluate the performance of the scheme, the data set was manually annotated. In the data set, there are 1870 explicit sentences and 413 implicit sentences. Three traditional measures - precision, recall and F-measure - are used to evaluate the performance of the scheme.

2.2. Experimental Results and Comparison

89 explicit product features are obtained by the method described in Section 1.1. The top 5 features most concerned by customers are shown in Table 1. The precision of this method is 70.8%, the recall is 73.3% and the F-measure is 72%. 1285 association pairs are extracted from the explicit sentences by the approach described in Section 1.2; five association pairs are shown in Table 2. As seen from the table, the performance of the approach is good.
Table 1. Top 5 product features.
rank   feature         PMI        frequency
1      intelligence    0.0        14
2      software        -0.10005   42
3      number          -0.44418   30
4      screen          -0.6529    194
5      price           -0.79837   34
Table 2. Association pairs.
Implicit features are identified by the approach described in Section 1.3; Table 3 shows a partial result. Compared with Ref. [4] on the same data, the results are shown in Table 4.
Table 3. Partial results of implicit feature identification.
Implicit sentence                          implicit feature
900 mAh is difficult to meet the needs     battery
too expensive                              price
very slow and very troublesome             reaction
very beautiful                             appearance
shape looks like hard                      appearance
Table 4. Comparative results.
Evaluation index   our scheme   Ref. [4]
precision          67.49%       41.55%
recall             65.86%       37.53%
F-measure          66.67%       39.44%

It can be seen from the above tables that the proposed algorithm is far superior to the algorithm in [4], so our scheme can better meet the needs of practical applications. The algorithm proposed in this paper takes both statistical analysis and semantic analysis into account, which can find more associations between emotional words and explicit feature clusters, whereas the research in [4] only focused on mining product features from the statistical point of view. Therefore, our method has better performance.

3. Conclusion

Implicit features in customer reviews have an important effect on text mining results, and they are also an important factor for customers or enterprises making wise decisions. In this paper, we proposed a scheme combining several rules to extract implicit features, from word segmentation to identification. Compared with conventional methods, our scheme not only obtains the associations between emotional words and product features based on statistics and semantics, but also considers the effect of emotional words, verbs and nouns on the final results. The experimental results show that our scheme lays a good basis for applications of network review mining.

Acknowledgments

This work is supported by National Natural Science Foundation of China (61472464),


Natural Science foundation of CQ CSTC (cstc2015jcyjA40025), Social Science
Planning Foundation of Chongqing (2015SKZ09), and Social Science Foundation of
CQUPT (K2015-10).

References

[1] B. Liu, M. Hu, J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web (WWW'05), ACM, New York, NY, USA, 2005, 342-351.
[2] H. Xu, F. Zhang, W. Wang. Implicit feature identification in Chinese reviews using explicit topic mining model. Knowledge-Based Systems, 76(2014):166-175.
[3] Y. F. Qiu, X. F. Ni, L. S. Shao. Research on extracting method of commodities implicit opinion targets.
Computer Engineering and Applications, 51(2015):114-118.
[4] Z. Hai, K. Chang, J.-j. Kim, Implicit feature identification via co-occurrence association rule mining. In: Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, 6608(2011), 393-404.
[5] L. Zeng, F. Li. A Classification-Based Approach for Implicit Feature Identification. In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer Berlin Heidelberg, 2013:190-202.
[6] L. Zhang, X. Xu. Implicit Feature Identification in Product Reviews. New Technology of Library and
Information Service. 2015, (12):42-47.
[7] W. Wang, H. Xu, and W. Wan. Implicit feature identification via hybrid association rule mining. Expert Systems with Applications, 40(2013):3518-3531.
[8] K.W. Church, et al. Word association norms, mutual information and lexicography. In: Proceedings of the 27th Annual Conference of the Association of Computational Linguistics, New Brunswick, NJ: Association for Computational Linguistics, 1989: 76-83.
[9] J. L. Tian, W. Zhao. Words Similarity Algorithm Based on Tongyici Cilin in Semantic Web Adaptive
Learning System. Journal of Jilin University (Information Science Edition), 28(2011):602-608.
226 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-226

Characteristics Analysis and Data Mining of Uncertain Influence Based on Power Law
Ke-Ming TANG a, Hao YANG a,b,1, Qin LIU c, Chang-Ke WANG c, and Xin QIU a
a School of Information Engineering, Yancheng Teachers University, Yancheng, Jiangsu, China
b School of Software and TNList, Tsinghua University, Beijing, China
c School of Foreign Language, Yancheng Teachers University, Yancheng, Jiangsu, China

Abstract. Research on traditional cascade events, such as avalanches and the sandpile model, has only studied the power-law distribution over the whole time span. In fact, the speed of virus propagation differs in each time period. In this paper, through our empirical observations we find that the number of infected people behaves as a power law for Guinea, Liberia and Sierra Leone respectively over different time windows. The government could therefore use the different power exponents of the number of infected people in different periods of the disease's spread to set the pace of drug manufacturing.
Keywords. Ebola, avalanche, sandpile model

Introduction

A common phenomenon of 'avalanche' [1-2] or 'cascade failure' [3-7] has attracted much attention for a long time, in which an event undergoes a chain reaction and often gives rise to catastrophes or disasters. Examples include snow avalanches [8], landslide avalanches [9], and the failures induced by cascades in power grids [2, 6, 7].
The Ebola epidemic wreaking havoc in West Africa has led to a global ripple effect. In the absence of a clear characterization of Ebola, the disease has alarmed the global public health community and caused panic among some segments of the population. The ongoing Ebola epidemic in West Africa has affected the United States and other Western countries, and the 'avalanche' phenomenon would take place all over the world in the absence of effective methods of eradicating Ebola, which motivates analyzing its propagation characteristics.
Research on traditional cascade events, such as avalanches and the sandpile model, has only studied the power-law distribution over the whole time span. This means that large-scale avalanches occur occasionally in the process of evolution, while various small-scale avalanches appear more and more often, and their number satisfies a power-law distribution. In fact, the speed of virus propagation differs in each time period. For Guinea, Liberia and Sierra Leone, we make empirical observations about the spread of the disease in different time windows.
1 Corresponding Author: Hao YANG, School of Information Engineering, Yancheng Teachers University, Yancheng, Jiangsu, China; School of Software and TNList, Tsinghua University, Beijing, China; E-mail: classforyc@163.com.

1. The measurement of the Spread of the Disease Based on Power Law Model

We suppose the number of sand grains increases with $n$. One possibility is that the number of grains per unit length of the cycles, $\lambda$, is constant. Put $k/2$ grains ($k = 2\pi\lambda\cot\theta$ is an even number) on the 0-th cycle and $k$ grains on the 1st cycle; the number of grains on the $n$-th cycle should then be $nk$. Likewise, we assume a falling grain undergoes an inelastic collision with the resting grains, so that the grains slide together after the collision. We arrange the sliding grains so that $(n^2 - n + 1)$ grains evenly meet $2n$ resting grains on the $n$-th cycle. There are $(n^2 - n + 1)k$ grains in the $n$-th generation ($b_n = (n^2 - n + 1)k$, $d_n = 0$), and then $N(t) \sim n(t)^2 \sim t^{2\times 2} \sim t^4$ [16].
We describe the population with susceptible people (S), latent people (L), infected people (I) and dead people (D). The transformation among the four nodes is shown in Figure 1.

Figure 1. The transformation of the four nodes.


Based on the related knowledge above, we can now study the equations that describe how the virus spreads. According to the sandpile model analysis, the equations for the epidemic trend are as follows (with $B$ the growth exponent of the number of infected people and $B'$ that of the number of dead people; $A$ is a constant related to $B$ through the intercept on the log-log axis, and $A'$ is the corresponding constant related to $B'$):
$\frac{d(\lg I(t))}{d(\lg t)} = B, \qquad \frac{d(\lg D(t))}{d(\lg t)} = B'$   (1)
$I(t) = t^{B} \cdot 10^{A}, \qquad D(t) = t^{B'} \cdot 10^{A'}$   (2)
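Assuming the power-law form reconstructed in Eq. (2), the two functions below evaluate the fitted infected and dead counts for a given day; the constants A, B (and A', B') are the per-window values estimated later and reported in Tables 3 and 4.

```python
def infected(t: float, A: float, B: float) -> float:
    """I(t) = t**B * 10**A from Eq. (2); B is the power exponent, A the log-log intercept."""
    return (t ** B) * (10.0 ** A)

def dead(t: float, A_prime: float, B_prime: float) -> float:
    """D(t) = t**B' * 10**A' from Eq. (2)."""
    return (t ** B_prime) * (10.0 ** A_prime)
```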

2. Model Evaluations

We collected data about the number of all the cases and the number of the people
infected from a website (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/
whats-new.html). A linear function was fitted to the linear ranges of the log-log plotted
distributions to estimate the value of the power-law exponent.
Figures 2–4 show distributions of I for Guinea, Liberia and Sierra Leone respectively
(with values of the Pearson correlation coefficient R, and standard deviation SD). Our
method considers the number of infected people from February 4, 2014 to March 25,
2015.
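To make the estimation procedure concrete, the following minimal Python sketch (with hypothetical case counts, not the real CDC data) fits a straight line to log-log data and reports the power exponent B, the intercept A, the Pearson correlation coefficient R and the residual standard deviation SD of the kind listed in Tables 1–4.

import numpy as np

t = np.array([10, 20, 40, 60, 80, 100], dtype=float)            # days since outbreak start (hypothetical)
infected = np.array([5, 24, 110, 260, 480, 800], dtype=float)   # cumulative infected (hypothetical)

log_t, log_i = np.log10(t), np.log10(infected)
B, A = np.polyfit(log_t, log_i, 1)          # slope = power exponent B, intercept = A
r = np.corrcoef(log_t, log_i)[0, 1]         # Pearson correlation coefficient R
residuals = log_i - (A + B * log_t)
sd = residuals.std(ddof=1)                  # standard deviation of the fit residuals

print(f"B = {B:.3f}, A = {A:.3f}, R = {r:.3f}, SD = {sd:.3f}")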
The log-log plots of the numbers of people infected and dead are shown below:
a) Log-log plot of the number of the people infected

Figure 2. Log-log plot of the number of the people infected for Guinea: (a) 0<t<100, (b) 100<t<229, (c) 229<t<318.

Figure 3. Log-log plot of the number of the people infected for Liberia: (a) 2<t<100, (b) 100<t<267, (c) 267<t<318.

Figure 4. Log-log plot of the number of the people infected for Sierra Leone: (a) 64<t<271, (b) 271<t<318.

b) Log-log plot of the Number of the People Dead

Figure 5. Log-log plot of the number of the people dead for Guinea: (a) 0<t<100, (b) 100<t<318.

Figure 6. Log-log plot of the number of the people dead for Liberia: (a) 0<t<100, (b) 100<t<318.

Figure 7. Log-log plot of the number of the people dead for Sierra Leone.

Table 1. Values of R (Pearson correlation coefficient) and SD (standard deviation) for the
log-log plot of the number of the people infected for the three countries.
Guinea Liberia Sierra Leone
t 0-100 100-229 229-318 2-100 100-267 267-318 64-271 271-318
R 0.96003 0.98086 0.93949 0.88703 0.9855 0.98225 0.97084 0.98431
SD 0.06164 0.06282 0.01599 0.18816 0.10405 0.00326 0.15811 0.00396

Table 2. Values of R (Pearson correlation coefficient) and SD (standard deviation) for the
log-log plot of the number of the people dead for the three countries.
Guinea Liberia Sierra Leone
t 0-100 100-318 2-100 100-318 64-318
R 0.95263 0.9909 0.79757 0.94552 0.96776
SD 0.06817 0.03521 0.15197 0.17199 0.15719

Figures 2–4 show the distributions of the number of people infected for the three
countries and Figures 5–7 show the distributions of the number of people dead for
the three countries. We use R and SD to illustrate the feasibility of our model (the values
of the Pearson correlation coefficient R and the standard deviation SD are listed in
Tables 1–2). If 0.95 is taken as the minimal reliable value, we can state a power law for
both the infected and the dead people.
Through the above analysis, we obtain power-law relations for the number of people
infected or dead changing over time. The values of A and B are shown in Tables 3–4
respectively. Of course, the correlation between these parameters and the number of
people is not straightforward; we simply report the numerical results.

Table 3. Values of A and B of the people infected for the three countries.
Guinea Liberia Sierra Leone
t 0-100 100-229 229-318 2-100 100-267 267-318 64-271 271-318
A 0.36666 3.1281 0.99661 0.82688 5.1708 0.73092 4.14169 1.01616
B 1.3384 -4.46411 0.53758 -0.18821 -8.61879 1.87354 -6.20484 1.336

Table 4. Values of A and B of the people dead for the three countries.
Guinea Liberia Sierra Leone
t 0-100 100-318 2-100 100-318 64-318
A 0.37032 1.92976 0.45961 3.71858 3.59267
B 1.61866 -1.52369 0.4246 -5.38156 -5.30488

3. Conclusion

According to our model, the transmission speed of the virus is slow at the beginning, but
the speed accelerates after a period, which should draw enough public attention to the
virus so that relevant measures are taken to prevent the spread; the speed then decreases
relatively. Our power-law model is shown to be reasonable by using a simplified sandpile
model and analyzing the empirical data. The data on latent people could not be collected,
so we only analyze the data of infected and dead people in the model when estimating the
required speed of drug production. As our experiments show, the model is realistic,
sensible and useful, and can be applied to help eradicate Ebola.

Acknowledgements

This work is supported by the National High Technology Research and Development
Program (863 Program) of China (2015AA01A201), National Science Foundation of
China under Grant No. 61402394, 61379064, 61273106, National Science Foundation of
Jiangsu Province of China under Grant No. BK20140462, Natural Science Foundation of
the Higher Education Institutions of Jiangsu Province of China under Grant No.
14KJB520040, 15KJB520035, China Postdoctoral Science Foundation funded project
under Grant No. 2016M591922, Jiangsu Planned Projects for Postdoctoral Research
Funds under Grant No. 1601162B, JLCBE14008, and sponsored by Qing Lan Project.

References

[1] P. Bak, C. Tang, K. Wiesenfeld, Self-organized criticality, Physical review A, 38 (1988): 364.
[2] M. L. Sachtjen, B.A. Carreras, V.E. Lynch, Disturbances in a power transmission system, Physical Review
E, 61(2000): 4877.
[3] A. E. Motter, Cascade control and defense in complex networks, Physical Review Letters, 93(2004)
098701.
[4] J. Wang, L.-L. Rong, L. Zhang, Z. Zhang, Attack vulnerability of scale-free networks due to cascading
failures, Physical A, 387(2008): 6671.
[5] S.V. Buldyrev, R. Parshani, G. Paul, et al. Catastrophic cascade of failures in interdependent networks,
Nature, 464(2010): 1025.
[6] R. Parshani, S.V. Buldyrev, S. Havlin, Interdependent networks: reducing the coupling strength leads to a
change from a first to second order percolation transition, Physical Review Letters, 105(2010): 048701.
[7] T. Zhou, B.H. Wang, Maximal planar networks with large clustering coefficient and power-law
degree distribution, Chinese Physics Letters, 22 (2005): 1072.
[8] K. Lied, Avalanche studies and model validation in Europe, Avalanche studies and model validation in
Europe, European research project SATSIE (EU Contract no. EVG1-CT2002-00059), 2006.
[9] D.A. Noever, Himalayan sandpiles, Physical Review E, 47(1993): 724.
232 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-232

Hazardous Chemicals Accident Prediction


Based on Accident State Vector Using
Multimodal Data
Kang-Wei LIU b,1, Jian-Hua WAN a and Zhong-Zhi HAN a
a School of Geosciences, China University of Petroleum, Qingdao 266580, China
b Sinopec Safety Engineering Institute, Qingdao, Shandong 266071, China

Abstract. The hazardous chemicals industry is a high-risk industry in which all kinds of
explosion, fire, leak and poisoning incidents occur from time to time. It is therefore
particularly important to forecast hazardous chemical accidents and to develop appropriate
safety measures. Based on an analysis and summary of previous methods, an improved
hazardous chemicals accident prediction method based on the accident state vector is
proposed in this paper. It defines the accident state vector using multi-modal data such as
authoritative data, accident reports, webpages, images, video, speech, etc. The multi-modal
data are collected by a web crawler built with open-source tools; the crawler is an Internet
bot which systematically browses known hazardous chemical accident websites for the
purpose of collecting multi-modal accident data. As mentioned, the multi-modal data come
in multiple formats. In order to define the accident state vector easily, we divide the
multi-modal data into three dimensions based on the principle of accident causation:
human factors, physical state factors and environmental factors. According to the
geometrical distribution characteristics of support vectors, the samples most likely to
become support vectors can be selected from the incremental samples to form a boundary
vector set by adopting a vector-distance pre-extraction method, on which support vector
training and the accident prediction model are built. The validity of the predictive model
is ensured because the various factors causing accidents are fully considered by the
accident state vector, and the advantages of support vector machines in high-dimensional,
multi-factor, large-sample machine learning are exploited. Experimental verification on
collected hazardous chemical accident samples shows that the prediction method proposed
in this paper can effectively accumulate accident history information, achieves a higher
learning speed, and is of positive significance for the safe development of the hazardous
chemicals industry.

Keywords. hazardous chemical accidents, Support Vector Machine, accident prediction, accident state vector

Introduction

The hazardous chemicals industry, represented by the petrochemical industry, is a
high-risk industry with perilous characteristics such as high temperature and high
pressure, flammable and explosive materials, toxic and harmful substances, continuous
operation, long production chains, etc. At present, the production safety situation of hazardous

1
Corresponding Author: Kang-Wei LIU, Engineer of Sinopec Safety Engineering Institute, No339,
Songling Road, Qingdao, Shandong, China ; E-mail: liukw.qday@sinopec.com.

chemicals is very grim, with all kinds of explosion, fire, leakage and poisoning accidents
occurring from time to time. According to statistics, there are more than 96,000 chemical
enterprises in China, of which more than 24,000 produce dangerous chemicals, and the
species of chemicals exceed 100,000; more than 4,600 chemical accidents have occurred
in nearly a decade. As plants become large-scale and intensive, material and economic
losses occur whenever an accident happens, and death and disability in particular lead to
the loss of healthy life. Therefore, it is particularly important to forecast hazardous
chemical accidents and to develop appropriate safety measures on this basis.
Accident prediction uses known information and data to forecast the safety state of the
object under study, as shown in Figure 1. Accident prediction methods have gradually
become a hot research topic in recent years, because the changing trend of accidents and
hidden safety dangers can be analyzed with them. According to incomplete statistics,
there are now more than 300 kinds of forecasting methods, and the development of modern
forecasting methods is often accompanied by cross-analysis and mutual penetration of
the different methods, so it is difficult to classify them absolutely. The common accident
prediction methods can be summarized into six types: situational analysis, regression
prediction, time-series prediction, Markov chain prediction, gray prediction and
nonlinear prediction. Traditional accident prediction work tends to emphasize model
establishment and algorithm improvement, while the collection and organization of prior
accident data is frequently overlooked. Limited by the difficulty of prior data collection
and the complexity of the models, accident prediction models are usually based on a small
number of factors artificially chosen for their strong causal relationship to hazardous
chemical accidents, such as the number of accidents, the death toll and the amount and
type of hazardous chemicals, which ultimately leads to incomplete and inaccurate
forecasting results.
Figure 1. Establishment of the accident prediction model (prior accident data → accident prediction model).
Support Vector Machine (SVM) was developed by Vapnik and co-workers [1] and is an
excellent machine learning method. SVMs have empirically been shown to give good
generalization performance on a wide variety of problems. SVM is an implementation of
statistical learning theory: it pursues not only accuracy on the training samples but also
control of the complexity of the hypothesis space, i.e., it adopts a compromise between
spatial complexity and learning precision on the samples so that the resulting model
possesses good generalization ability for unknown samples.
In view of the analysis and summary of previous methods, an improved hazardous
chemicals accident prediction method based on Support Vector Machines is proposed in
this paper. It defines the accident state vector from three dimensions, namely human
factors, physical state factors and environmental factors, based on the principle of
accident causation. According to the geometrical distribution characteristics of support
vectors, the samples most likely to become support vectors are selected from the
incremental samples to form a boundary vector set by a vector-distance pre-extraction
method, on which support vector training and the accident prediction model are built.
The validity of the predictive model is ensured because the various factors causing

the accident are fully considered by the accident state vector, and the advantages of
support vector machines in high-dimensional, multi-factor, large-sample machine
learning are exploited.

1. Overview of SVM

The core of SVM is finding a hyperplane that separates a set of positive examples from
a set of negative examples with maximum margin [1,2,3]. The training of a Support
Vector Machine can be reduced to a convex quadratic program with linear constraints.
Given a training sample
{(xi, yi) | i = 1, …, l; xi ∈ Rn, yi ∈ {+1, −1}},
in the linearly separable case the goal of SVM is to find a hyperplane
<w, x> + b = 0
which divides the sample set exactly. However, such a hyperplane is generally not
unique. The hyperplane with the largest margin between the two kinds of samples - the
optimum classification hyperplane - attains the best generalization capacity. The optimum
hyperplane is determined only by the samples closest to it and does not depend on the
other samples. These samples are the so-called support vectors, which is the origin of
the name “support vector” [4,5,6,7].
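The following toy sketch (using scikit-learn, which is an assumption; the experiments in Section 4 use LibSVM) illustrates the linearly separable case: the fitted linear SVM exposes the maximum-margin hyperplane <w, x> + b = 0 and the support vectors that determine it.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.4], [0.5, 0.1],    # negative class (toy data)
              [2.0, 2.0], [1.8, 2.4], [2.5, 1.9]])   # positive class (toy data)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane normal w =", w, "offset b =", b)
print("support vectors:", clf.support_vectors_)       # only these samples determine the hyperplane
print("prediction for [1.0, 1.2]:", clf.predict([[1.0, 1.2]]))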

2. The Definition of Accident State Vector

The accident causation theory is used to illustrate the causes of accidents, the process of
their development and their consequences, so that the occurrence and development of
accident phenomena can be analyzed definitely. It consists of accident mechanisms and
models extracted from the essence of a large number of typical accidents; it reflects the
regularity of accidents and provides a scientific and complete theoretical basis for
accident prediction and prevention, as well as for improving safety management work,
owing to its capacity for quantitative and qualitative analysis of accident causes.
In accordance with the accident causation theory, the insecure behaviour of human
beings, the insecure state of objects and the insecure influence of the environment can all
lead to the occurrence of accidents, so an accident can be described by three categories of
indicators: subjective evaluation indicators (human factors), objective inherent indicators
(physical factors) and environmental indicators (environment factors), as shown in
Figure 2.

Figure 2. Accident causation theory (P: human factors, D: physical factors, E: environment factors).



(1) Subjective evaluation indicators (human factors)

Subjective evaluation indicators are judged and scored regularly by the enterprise.
Assuming that the number of subjective evaluation indicators is m, they can be
represented as an m-dimensional vector P (People):
P = {P1, P2, P3, …, Pm}
Subjective evaluation indicators mainly involve safety indicators that are impossible to
quantify or cannot be extracted automatically, such as the questions "Is the training for
special equipment operation and maintenance in place?" or "Is the safety regulatory
behaviour in place?". These indicators need to be evaluated and scored regularly by
enterprise personnel.
(2) Objective inherent indicators (physical factors)
Objective inherent indicators refer to the enterprise's inherent risk level. Assuming that
the number of objective inherent indicators is n, they can be expressed as an
n-dimensional vector D (Device):
D = {D1, D2, D3, …, Dn}
Objective inherent indicators can be obtained automatically, for instance, "chemicals
production", "number of major hazard installations", "fire and explosion indicator of
hazardous substances", "chemical material toxicity indicator", etc.
(3) Environmental indicators (environment factors)
Local climate and weather, geography and geological environment, frequency of natural
disasters, the regulation level of the government and social events are usually included in
the environmental indicators. In a word, everything not classified into the former two
kinds pertains to the environmental indicators, in order to meet the fault-tolerance
requirement of big data. These indicators correspond to a t-dimensional vector E:
E = {E1, E2, E3, …, Et}
In conclusion, the accident state vector can be defined as follows:
accident state vector A = {P, D, E}
where the human vector P = {P1, P2, P3, …, Pm}, the physical state vector
D = {D1, D2, D3, …, Dn}, and the environmental vector E = {E1, E2, E3, …, Et}.
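As a simple illustration (with hypothetical dimensions and values, not the indicators used in Section 4), the accident state vector A = {P, D, E} can be assembled by concatenating the three indicator groups:

import numpy as np

P = np.array([0.8, 0.6, 0.9])         # m subjective (human) indicators, scored by the enterprise
D = np.array([120.0, 3.0, 0.7, 0.4])  # n objective (physical) indicators, collected automatically
E = np.array([0.2, 0.5])              # t environmental indicators (weather, geography, ...)

A = np.concatenate([P, D, E])         # accident state vector of dimension m + n + t
print(A.shape, A)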

3. Training Algorithm Based on Accident State Vector

A hyperplane can be established by a Support Vector Machine (SVM) through learning on
accident state vectors, and unknown accident state vectors can then be classified via this
hyperplane, thus forming the accident prediction model [9,10]. As mentioned above, not
all vectors contribute to the establishment of the prediction hyperplane; only a small
number of training samples, called support vectors, play a role in SVM training, and they
are distributed in the neighbourhood of the hyperplane in geometric terms [11,12].
Therefore we should, as far as possible, choose for learning the samples that may become
support vectors. To this end, this paper presents a Support Vector Machine training
algorithm based on the accident state vector (the ASV-SVM algorithm).
The incremental learning algorithm with support vector machines can be described as
follows. Let M be the historical sample set and N the incremental sample set, with M and
N disjoint. An initial SVM classifier is trained on M, together with its corresponding
support vector set, which is obviously a subset of M. The learning target is to find the
classifier of M ∪ N and its corresponding support vector set.
Based on the geometric character of support vectors, determining whether one sample
can become a support vector should consider two aspects: one is the distance between the
sample and the hyperplane; the other is the distance between the sample and the centre of
its class [13,14,17]. So we do our best to select the samples likely to become support
vectors as the new training set. Support vector candidates may exist both in the old
sample set and in the incremental set. We select samples which are close to the
separating hyperplane and lie between the class centre and the hyperplane as
newly-added samples: samples whose distance to the hyperplane is less than their
distance to the class centre form the edge sample set T. The union of the previous support
vector set and T is taken as the final training set for incremental learning.
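The sketch below is one possible reading of this pre-extraction rule (toy data, not the paper's accident vectors): incremental samples that lie closer to the current hyperplane than to their own class centre are kept, and together with the previous support vectors they form the reduced training set.

import numpy as np
from sklearn.svm import SVC

def boundary_set(clf, X, y):
    # keep samples closer to the separating hyperplane than to their own class centre
    d_hyper = np.abs(clf.decision_function(X)) / np.linalg.norm(clf.coef_)
    keep = np.zeros(len(X), dtype=bool)
    for label in np.unique(y):
        idx = (y == label)
        centre = X[idx].mean(axis=0)
        d_centre = np.linalg.norm(X[idx] - centre, axis=1)
        keep[idx] = d_hyper[idx] < d_centre
    return keep

rng = np.random.default_rng(0)
X_M = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])   # historical set M (toy)
y_M = np.array([-1] * 50 + [1] * 50)
X_N = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])   # incremental set N (toy)
y_N = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X_M, y_M)                 # initial classifier trained on M
keep = boundary_set(clf, X_N, y_N)                              # edge samples of N
X_new = np.vstack([clf.support_vectors_, X_N[keep]])            # old support vectors + edge samples
y_new = np.concatenate([y_M[clf.support_], y_N[keep]])
clf = SVC(kernel="linear", C=1.0).fit(X_new, y_new)             # incremental retraining on the reduced set
print(len(X_new), "training samples instead of", len(X_M) + len(X_N))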

4. Experimental Results

We apply this algorithm to the establishment of the model for the prediction of hazardous
chemicals accidents. We compare the ASV-SVM algorithm with the traditional SVM
learning algorithm and the KNN (k-Nearest Neighbour) algorithm. A brief description of
the three algorithms follows:
Classical SVM algorithm: the traditional SVM algorithm, which combines the original
samples and the newly-added samples and learns again from all of the training samples.
Classical KNN algorithm: KNN is a memory-based method; prediction on a new instance
is performed using the labels of similar instances in the training set.
ASV-SVM algorithm: the ASV-SVM algorithm, which selects support vector candidates
based on vector distance for incremental learning.
In this experiment, the accident state vector is defined by Multimodal data. The
method is as follows:
(1) Collect and maintain the data of 619 typical hazardous chemicals accidents occurring
within the last ten years, including accident reports, accident cause analyses, accident
consequences and influence.
(2) Crawl related data for the mentioned accidents using a web crawler built with
open-source tools. The web crawler is an Internet bot which systematically browses known
hazardous chemical accident websites for the purpose of collecting multi-modal accident
data, such as the weather conditions, geographical situation and population density when
the accident happened.
(3) In order to allow a comparative test, we collected two to three sets of non-accident
status data at other times at the places where the accidents occurred; 1288 non-accident
state data were formed in this way.
(4) The data collected above are in multiple formats, such as authoritative data, accident
reports, webpages, images, video, speech, etc. In order to define the accident state vector
easily, we divide the multi-modal data into three dimensions based on the principle of
accident causation. We use open-source big data tools, together with manual screening,
to structure the data. We add as many attribute labels as possible to each data item, so
that the non-structured data become structured data. Frankly, for unstructured data such
as video and images, most of the work is done by manual recognition, since the accuracy
and availability of automatic machine recognition is not yet satisfactory.
(5) All the multi-modal data have many attribute labels after the structuring process.
These attribute labels are categorized into three dimensions based on the principle of
accident causation, namely human factors, physical state factors and environmental
factors.
(6) The dimension of the accident state vector is determined to be 265, with each
attribute label representing one dimension: the human vector P (185 dimensions), divided
into leadership and safety culture, safety, process safety information for process control,
inspection and human performance; the state vector D (49 dimensions), divided into the
fire index of hazardous substances, explosive index, toxicity index, process index,
equipment index, safety facility index, etc.; and the environment vector E (31 dimensions),
consisting of the meteorological index (We) and the geography information index (Gi).
(7) Transfer the accident state and the non-accident state into vector form as
follows [15]:
<label> <index1>:<value1> <index2>:<value2> … <indexn>:<valuen>
Label is the result of the accident state: 1 is the accident state and -1 is the non-accident
state. Index is the attribute label, Value is the weight or description of the attribute label,
and n is equal to 265 (a sketch of this format is given after this list).
(8) From these vectors, 1000 are selected as the test set, 1000 are used as the initial
training set, and the remaining 907 are randomly divided into 3 sets used as
incremental sets.
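A minimal sketch of the sparse LIBSVM text format mentioned in step (7); the vector below is hypothetical.

def to_libsvm_line(label, vector):
    # label: +1 for the accident state, -1 for the non-accident state
    items = [f"{i + 1}:{v}" for i, v in enumerate(vector) if v != 0]   # 1-based indices, zero entries skipped
    return f"{label} " + " ".join(items)

print(to_libsvm_line(1, [0.0, 0.8, 0.0, 3.0]))   # -> "1 2:0.8 4:3.0"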
After the pre-processing, the accident information is transferred into vector form. Then
we run the three learning algorithms. All of the algorithms are carried out with the
LibSVM-mat-5.20 package [16]. The experimental platform is an E7-4830 v2 CPU and the
operating system is Windows Server 2012. In the experiments, the kernel is the RBF
function and C = 1. The results of the experiments are shown in Tables 1–3.
Table 1. Classical SVM algorithm experiment results

Incremental set    samples    training samples    time/s    Precision
Initial set        1000       1000                185.1     -----
set1               247        1247                243.6     93.6%
set2               438        1685                528.9     93.1%
set3               222        1907                612.8     92.7%

Table 2. Classical KNN algorithm experiment results

Incremental set    samples    training samples    time/s    Precision
Initial set        1000       1000                412.7     -----
set1               247        1247                514.6     78.9%
set2               438        1685                702.6     79.4%
set3               222        1907                858.1     83.2%



Table 3. ASV-SVM algorithm experiment results

Incremental set    samples    training samples    time/s    Precision
Initial set        1000       1000                185.1     -----
set1               247        453                 163.2     92.1%
set2               438        737                 348.1     94.2%
set3               222        805                 393.3     94.1%


We can see that only in the initial training is the ASV-SVM algorithm similar to the
traditional SVM algorithm and the KNN algorithm; in the subsequent incremental
learning its performance is better than that of the other classical algorithms. The number
of training samples is reduced and the training time is shortened while the accuracy is not
lost, as shown in Figures 3 and 4 below.
The ASV-SVM algorithm is superior to the traditional SVM learning algorithm and the
KNN algorithm in terms of the number of training samples. In the process of incremental
learning, the ASV-SVM algorithm screens the newly-added and original samples
effectively, thus reducing the number of training samples while preserving the effective
information of the samples. The training time of the ASV-SVM algorithm decreases
greatly compared with the classical SVM and KNN learning algorithms; the decrease in
the number of training samples controls the scale of the incremental learning well and
thus shortens the training time without losing useful information. The precision of the
ASV-SVM algorithm is slightly better than that of the SVM learning algorithm, and much
better than that of the KNN algorithm, as shown in Figure 5.

Figure 3. Contrast of training samples

Figure 4. Contrast of training time



Figure 5. Contrast of classification precision

5. Conclusion

The hazardous chemicals industry is a high-risk industry in which explosion, fire, leakage
and poisoning accidents occur frequently. This paper analyzes the influence of human
factors, physical factors and environmental factors on the occurrence of hazardous
chemicals accidents, and defines the accident state vector from these three dimensions. In
view of the analysis and summary of previous methods, an improved hazardous chemicals
accident prediction method based on the accident state vector (ASV-SVM) is proposed.
A high-dimensional vector is used to define the accident state, so that as many relevant
factors as possible are considered. Using the improved support vector machine learning
algorithm (ASV-SVM algorithm), an accident prediction model is established from
accident state vectors. A sample test on hazardous chemical accidents shows that the
method proposed in this paper can differentiate accident states accurately and efficiently,
and is of positive significance for accident prediction of hazardous chemicals.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant
No. 31201133).

References

[1] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-verlag, New York, (2000),332-350..
[2] N. Cristianini, J. Shawe-Talor. An Introduction to Support Vector Machines and Other Kernel-based
Learning Methods. Cambridge University Press, (2004), 543-566
[3] R. Xiao, J.C. Wang, Z.X. Sun, An Incremental SVM Learning Algorithm, Journal of Nanjing
University (Natural Sciences), 38 (2002), 152-157.
[4] N. Ahmed, S. Huan, K. Liu, K. Sung, Incremental learning with support vector machines. The International
Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers, (1999), 352-356.
[5] P. Mitra, C. A. Murthy, S. K. Pal, Data Condensation in Large Databases by Incremental Learning with
Support Vector Machines. Proceeding of International Conference on Pattern Recognition, (2000), 2708-
2711.

[6] C. Domeniconi and D. Gunopulos Incremental Support Vector Machine Construction. Proceeding of
IEEE International Conference on Data Mining series (ICDM ), (2001),589-592.
[7] G. Cauwenberghs , T. Poggio, Incremental and Decremental Support Vector Machine Learning. Ad-
vances in Neural Information Processing Systems,(2000),122-127.
[8] S. Katagiri , S. Abe, Selecting Support Vector Candidates for Incremental Training. Proceeding of IEEE
International Conference on Systems, Man, and Cybernetics (SMC), (2005),1258-1263,.
[9] D. M. J. Tax, R. P. W. Duin, Outliers and Data Descriptions. Proceeding of Seventh Annual Conference
of the Advanced School for Computing and Imaging, (2001),234-241.
[10] L.M. Manevitz and M. Yousef, One-class SVMs for document classification. Journal of Machine Learn-
ing Research, 2 (2001), 139-154.
[11] R. Debnath, H. Takahashi, An improved working set selection method for SVM decomposition method.
Proceeding of IEEE International Conference Intelligence Systems, Varna, Bulgaria, 21-24(2004), 520-
523.
[12] R. Debnath, M. Muramatsu, H.Takahashi, An Efficient Support Vector Machine Learning Method with
Second-Order Cone Programming for Large-Scale Problems. Applied Intelligence, 23(2005), 219-239.
[13] W D.Zhou, L.Zhang, L.C.Jiao, An Analysis of SVMs Generalization Performance. Acta Electronica
Sinica. 29(2001),590-594
[14] J. Heaton, Net-Robot Java programme guide. Publishing House of Electronics Industry. 22(2002) 1-
141.
[15] C.W. Hsu C.J. Lin A simple decomposition method for support vector machines. Machine Learning,
46(2002) 291–314.
[16] C.C. Chang , C. Lin, LIBSVM : a library for support vector machines, 2001. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm
[17] C.H. Li, K.W. Liu, H.X. Wang. The incremental learning algorithm with support vector machine based
on hyperplane distance, Applied Intelligence, 46(2009):145-152
Fuzzy Systems and Data Mining II 241
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-241

Regularized Level Set for Inhomogeneity


Segmentation
Guo-Qi Liu a,1 , Hai-Feng Li b
a
School of Computer and Information Engineering, Henan Normal University,
Xinxiang 453007, China
b
College of Mathematics and Information Science, Henan Normal University,
Xinxiang 453007, China

Abstract. Active contour models based on the level set method are popular methods
for image segmentation. However, intensity inhomogeneity universally exists in
images and greatly influences segmentation. The local binary fitting model (LBF)
is an effective method to cope with inhomogeneous intensity. However, the energy
functional of LBF is non-convex and its computational cost is high; moreover, LBF
cannot preserve weak edges. The non-convexity often causes the contour to get stuck
in a local minimum. In order to cope with these shortcomings, we introduce a
regularized minimization for an improved LBF model in which the edge information
is integrated into the energy functional. The energy functional of the improved LBF
model is convex, so local minima are avoided and fast optimization methods can be
utilized. In this paper, a regularized method is used to make the contour converge to
the minimizer. Experimental results confirm that the proposed method attains a
segmentation effect similar to LBF but costs less computation time.
Keywords. intensity inhomogeneity, level set, global minimization, computation times

Introduction

Image segmentation plays an important role in image processing and computer
vision. The level set method [1-4] is a popular algorithm with competitive advantages
in computational robustness and flexibility towards topology changes. In general,
there are two types of level set models: one is based on global information and the
other on local information. Among the models based on global information, the
Chan and Vese (C-V) model [5] is one of the most popular methods; it assumes that
foreground and background have clearly different intensity means.

1 Corresponding Author: Guoqi Liu; School of Computer and Information Engineering, Henan

Normal University; XinXiang, 453007; E-mail: liuguoqi080408@163.com.



However, the C-V model has difficulty in dealing with intensity inhomogeneity
or intensity non-uniformity. Intensity inhomogeneity [6-11] degrades partition
effectiveness and leads to inaccurate target location, and it frequently appears in
medical images. Therefore, intensity inhomogeneity segmentation methods based
on level sets or active contours have sprung up in the past years. Among the models
based on local information, Li derived the local binary fitting (LBF) model [6].
By incorporating local image information into the model, images with intensity
inhomogeneity can be effectively segmented. However, LBF cannot keep weak edges
and its computational cost is relatively large. Some researchers have proposed
similar methods to improve the performance on inhomogeneous intensity, such as
Zhang [10]. Generally, the models based on local information have better
performance on inhomogeneous intensity because local intensity inhomogeneity can
be decreased by local filtering. In order to improve segmentation efficiency and keep
the true target contour, we extend the LBF model. Our paper is organized as follows.
In Section 1, we review the background. In Section 2, a method is proposed to
enhance the former LBF version. Section 3 shows the experimental results and makes
comparisons with LBF. Section 4 summarizes this paper.

1. Background

1.1. The C-V model

The energy functional of C-V model is defined as follows:


  
E(C, c1, c2) = μ ∫_C ds + ∫_{Ω1} |I − c1|^2 dx + ∫_{Ω2} |I − c2|^2 dx        (1)

where Ω1 and Ω2 are the regions of foreground and background respectively, whose
intensity means are c1 and c2. C represents the zero level curve and I is the image
intensity. The first term is the curve length regularization with weight μ and the last
two terms are data fitting terms. Eq. (1) depends on the curve C and the intensity
means c1 and c2, and can be solved by the variational method and a gradient descent
equation. By representing the contour with a level set function φ, the above equation
leads to the following evolution:

∂φ/∂t = −((I − c1)^2 − (I − c2)^2 + μK) δ_ε(φ)        (2)

where K is the curvature of the curve C and δ_ε represents the smoothed Dirac
function with parameter ε.

1.2. The local binary fitting model (LBF)

A data fitting energy is defined in LBF [6], which locally approximates the
image intensities on the two sides of the contour. This energy is then incorporated
into a variational level set formulation, and a curve evolution equation is derived

for energy minimization. Intensity information in local regions is extracted to


guide the motion of the contour, which thereby enables LBF model to cope with
intensity inhomogeneity. The local binary fitting energy is defined as follows:


e = ∫ Σ_{i=1}^{2} λ_i e_i(x) dx        (3)

e1(x) = ∫_Ω Kσ(y − x) |I(y) − f1(x)|^2 H(φ(y)) dy,    e2(x) = ∫_Ω Kσ(y − x) |I(y) − f2(x)|^2 (1 − H(φ(y))) dy        (4)
Kσ serves as a kernel function. f1(x) and f2(x) are computed as follows:

f1(x) = (Kσ(x) ∗ [I(x)H(φ)]) / (Kσ(x) ∗ H(φ)),    f2(x) = (Kσ(x) ∗ [I(x)(1 − H(φ))]) / (Kσ(x) ∗ (1 − H(φ)))        (5)
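A minimal numerical sketch of Eq. (5) (assuming numpy and scipy are available; an illustration, not the authors' implementation): the local fitted means are ratios of Gaussian-smoothed quantities, with the Heaviside term represented here by a soft membership map u in [0, 1].

import numpy as np
from scipy.ndimage import gaussian_filter

def local_means(I, u, sigma=3.0, eps=1e-10):
    # f1: local mean of I inside the region (weighted by u); f2: outside (weighted by 1 - u)
    f1 = gaussian_filter(I * u, sigma) / (gaussian_filter(u, sigma) + eps)
    f2 = gaussian_filter(I * (1 - u), sigma) / (gaussian_filter(1 - u, sigma) + eps)
    return f1, f2

I = np.random.rand(64, 64)            # toy image
u = np.zeros_like(I); u[:, :32] = 1.0  # toy initial region indicator
f1, f2 = local_means(I, u)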

The total energy functional E of LBF is obtained by adding the length regularization
term to the above energy. The length regularization term is defined as follows:

L(φ) = ∫_C ds = ∫ |∇φ| dx        (6)

Then the evolution equation of level set function φ is computed as follows:

∂φ/∂t = −∂E/∂φ = −δ_ε(φ)(e1 − e2) + λ δ_ε(φ) div(∇φ/|∇φ|)        (7)

where ∇ is the gradient operator, div(.) is divergence operator.

2. Regularized method for improved LBF

2.1. Improved LBF

The energy functional of LBF is defined as follows:

E LBF (φ) = e(φ) + λL(φ) (8)

The first term is a data fitting term, which is written as


 
e(φ) = ∫ e1 H(φ) dx + ∫ e2 (1 − H(φ)) dx        (9)

Similarly to [12], the evolution equation of LBF can also be computed by minimizing
the following energy functional:

E = ∫ λ|∇φ| dx + ∫ (e2 − e1) φ dx        (10)

Because of the non-convexity of the above energy, we propose to minimize


the following improved energy functional:
 
E = ∫ λ g |∇u| dx + ∫ (e2 − e1) u dx        (11)

where g = 1/(1 + |∇I|^2) is the edge stopping function and u is the characteristic
function with 0 ≤ u ≤ 1. Since the above energy is convex but constitutes a
constrained minimization problem, an unconstrained convex energy is obtained by
introducing an exact penalty function:


E(u, f1, f2, λ, α) = λ TV_g(u) + ∫_Ω [(e2 − e1) u + α p_f(u)] dx        (12)

where the parameter α is a constant, and p_f(ξ) := max{0, 2|ξ − 1/2| − 1} is a
penalty function.

2.2. Regularized minimization algorithm for improved LBF

In order to obtain the solution of the energy functional (12), the regularized
method is utilized in this paper. By introducing an auxiliary variable v, the regularized
energy functional is computed as follows:

E(u, v, f1, f2, λ, α) = λ TV_g(u) + (μ/2)‖u − v‖_F^2 + ∫_Ω [(e2 − e1) v + α p_f(v)] dx        (13)

where ‖·‖_F denotes the Frobenius norm and μ is a constant. The task is now to
minimize Eq. (13). First, the iterative solution for u is obtained by fixing v, f1 and f2.
A fast numerical minimization based on the dual formulation of the TV energy is
presented in [12-15]. According to [13], the solution of u is given by

u = v − (1/μ) div p        (14)

where p = (p1, p2) is a dual variable, which is computed from

g(x) ∇((1/μ) div p − v) − |∇((1/μ) div p − v)| p = 0.        (15)

The above equation can be solved by a fixed point method, which is given in [13].
Similarly, v is obtained by minimizing the following equation:

v = argmin_v { (μ/2)‖u − v‖_F^2 + ∫_Ω [(e2 − e1) v + α p_f(v)] dx }        (16)

Finally, v is iteratively computed as follows:

v = min(max(u − (λ/μ)(e2 − e1), 0), 1)        (17)

u and v are computed alternately. The algorithm is as follows:



Algorithm 1 Regularized algorithm for improved LBF.

Input: initial values u0, v0, the inner iteration number B, and p;
for k = 0 to maximum number of iterations do
    for i = 1 to B do
        u_i^k = v^k − (1/μ) div p
        obtain p from g(x)∇((1/μ) div p − v) − |∇((1/μ) div p − v)| p = 0
    end for
    u^k = conv(Gaussian, u_B^k)
    f1(x) = (Kσ(x) ∗ [I(x)u]) / (Kσ(x) ∗ u),    f2(x) = (Kσ(x) ∗ [I(x)(1 − u)]) / (Kσ(x) ∗ (1 − u))
    v^k = min(max(u^k − (λ/μ)(e2 − e1), 0), 1)
    if ‖u^{k+1} − u^k‖_F < δ then
        return u^k
    end if
end for
Output: the segmentation given by thresholding u at 0.5.
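As an illustration of the inner quantities of Algorithm 1, the sketch below (a simplified numpy/scipy reading, not the authors' MATLAB code) computes the local means of Eq. (5), the data terms e1, e2 in a kernel-expanded form without the Heaviside weight (the form that enters e2 − e1 in Eqs. (10)–(11)), and the closed-form v-update of Eq. (17); the dual TV step for u (Eqs. (14)–(15)) is omitted and would be computed as in [13].

import numpy as np
from scipy.ndimage import gaussian_filter

def data_terms(I, f1, f2, sigma=3.0):
    # e_i(x) = int K_sigma(y - x) |I(y) - f_i(x)|^2 dy, expanded so the convolutions act on I and I^2
    K1 = gaussian_filter(np.ones_like(I), sigma)
    KI = gaussian_filter(I, sigma)
    KI2 = gaussian_filter(I * I, sigma)
    e1 = KI2 - 2.0 * f1 * KI + f1 ** 2 * K1
    e2 = KI2 - 2.0 * f2 * KI + f2 ** 2 * K1
    return e1, e2

def update_v(u, e1, e2, lam=1.0, mu=1.0):
    # closed-form v-update of Eq. (17), clipped to [0, 1]
    return np.clip(u - (lam / mu) * (e2 - e1), 0.0, 1.0)

# toy demonstration of one outer pass
I = np.random.rand(64, 64)
u = np.zeros_like(I); u[:, :32] = 1.0
sigma, eps = 3.0, 1e-10
f1 = gaussian_filter(I * u, sigma) / (gaussian_filter(u, sigma) + eps)           # local means as in Eq. (5)
f2 = gaussian_filter(I * (1 - u), sigma) / (gaussian_filter(1 - u, sigma) + eps)
e1, e2 = data_terms(I, f1, f2, sigma)
v = update_v(u, e1, e2, lam=1.0, mu=1.0)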

Figure 1. The tested original images

Figure 2. Segmentation results of inhomogeneous intensity in medical images.

3. Experimental results and analysis

We have tested our algorithm on images with inhomogeneous intensity. Figure 1


demonstrates three images and the initial curves. The left image is 131 × 103 pixels,
the middle image 110 × 111 and the right image 96 × 127.

Figure 3. Segmentation results of object with weak edges.

Table 1. Quantitative evaluation of the cost times for typical images


Method             Image 1     Image 2     Image 3
LBF                1.928899    2.276515    4.688177
LIF                1.256878    1.753792    3.798538
Proposed method    0.898745    1.356784    2.897653

The intensities of foreground and background in these three images are inhomogeneous.
Segmentation results are provided in Figures 2 and 3, in which the first column is
computed by Li's algorithm and the second column by our method. In Figure 2, the
medical images are tested; the results of LBF and the proposed method are similar.
Furthermore, the proposed method utilizes the gradient information of the image
edges, and thus has better performance in keeping weak edges compared with the LBF
model. As shown in Figure 3, a gray image is tested in which some of the object edges
are weak. LBF suffers from leakage at the weak edges and only the strong edges are
extracted, while the proposed method converges to the weak edges, since the edge
information g is integrated into the proposed energy functional and helps preserve
weak edges.
On the other hand, the proposed method is more efficient than LBF and LIF. All the
experiments were conducted in MATLAB R2010a on a PC with an Intel Core CPU
(3.3 GHz, quad-core) and 8 GB memory under Windows 7 Professional, without any
particular code optimisation. In Algorithm 1, the proposed method, based on image
decomposition, iterates to obtain u, which decreases the number of evolution steps.
The computation times are shown in Table 1. From the table, the proposed method
costs less time in converging to the objects compared with LBF, because the proposed
algorithm iterates several times to obtain u before updating v, and this process
enhances the non-smooth image component, which reduces the total number of
iterations.

4. Conclusions

In this paper, we first introduced the C-V model and the LBF model. Then we
proposed our model to improve the efficiency of contour evolution. There are two
contributions: one is that an energy functional with edge information is added to LBF;
the other is the introduction of a fast algorithm to obtain the solution. Experimental
results confirm that the proposed method obtains similar segmentation results, keeps
weak edges, and achieves faster evolution of the contour.

Acknowledgements

This work is jointly supported by National Natural Science Foundation of China


(No. U1404603).

References

[1] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours, International journal
of computer vision, 22(1997), 61-79.
[2] S. Kichenassamy, A. Kumar, P. Olver, A. Tannenbaum, and A. Yezzi: Gradient flows and
geometric active contour models, Proc. 5th Int. Conf. Comput. Vis., 1995, 810-815.
[3] R. Kimmel, A. Amir, and A. Bruckstein. Finding shortest paths on surfaces using level set
propagation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1995),
635-640.
[4] R. Malladi, J. A. Sethian, and B. C.Vemuri. Shape modeling with front propagation: A
level set approach, IEEE Transactions on Pattern Analysis and Machine Intelligence,
17(1995), 158-175.
[5] T. Chan and L. Vese. Active contours without edges, IEEE Transactions on Image Processing,
10(2) (2001), 266-277.
[6] C. Li, C. Kao, J. C. Gore, and Z. Ding. Minimization of region-scalable fitting energy for
image segmentation, IEEE Transactions on Image Processing, 17(2008), 1940-1949.
[7] C. Li, Huang R., Ding Z., Gatenby C., Metaxas DN., Gore JC. A level set method for
image segmentation in the presence of intensity inhomogeneities with application to MRI,
IEEE Transactions on Image Processing, 20(7) (2011), 2007-2016.
[8] X.F. Wang, H. Min. A level set based segmentation method for images with intensity in-
homogeneity, Emerging Intelligent Computing Technology and Applications, with Aspects
of Artificial Intelligence, 2009, 670-679.
[9] F.F. Dong, Z.S. Chen and J.W. Wang, A new level set method for inhomogeneous image
segmentation, Image and Vision Computing, 31(2013), 809-822.
[10] K.H. Zhang, H.H. Song and L. Zhang, Active contours driven by local image fitting energy,
Pattern Recognition, 43(2010), 1199-1206.
[11] C. Li, C. Xu, C. Gui, MD. Fox, Level set evolution without re-initialization: A new vari-
ational formulation, Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005,
430-436.
[12] A. Chambolle, An algorithm for total variation minimization and applications, Journal of
Mathematical Imaging and Vision, 20(2004), 89-97.
[13] X. Bresson, S. Esedoglu, P. Vandergheynst, et al. Fast global minimization of the active
contour /snake model, Journal of Mathematical Imaging and Vision, 28(2007), 151-167.
[14] E.S. Brown, T.F. Chan, X. Bresson. Completely convex formulation of the Chan-Vese
image segmentation model, International journal of computer vision, 98(2012), 103-121.
[15] C. Li, R. Huang, Z. Ding, C. Gatenby, D. Metaxas, J. Gore, A variational level set approach
to segmentation and bias correction of images with intensity inhomogeneity, Proceedings of
Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2008, Part II,
LNCS 5242, 1083-1091.
248 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-248

Exploring the Non-Trivial Knowledge


Implicit in Test Instance to Fully
Represent Unrestricted Bayesian
Classifier
Mei-Hui LI, Li-Min WANG 1
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of
Education, Jilin University, ChangChun City 130012, P. R. China

Abstract. Restricted Bayesian classifiers have demonstrated remarkable classifica-


tion performance for data mining. However, the restricted network structure makes
it impossible to represent the Markov blanket of the class variable, which corresponds
to the optimal classifier. Moreover, the test instances are not fully utilized, so the final
decision may be biased. In this paper, a novel unrestricted k-dependence classifier
is proposed based on identifying the Markov blanket of the class variable. Furthermore,
the algorithm adopts local learning to build a local structure, which can
represent the evidence introduced by the test instance. 15 datasets are selected from the
UCI machine learning repository for zero-one loss comparison. The experimental
results indicate that the unrestricted Bayesian classifier can achieve good trade-off
between structure complexity and prediction performance.
Keywords. Data mining, Unrestricted Bayesian classifier, Local learning, Markov
blanket

Introduction

In the late 1980s, Judea Pearl introduced Bayesian networks [1], a kind of inference
network based on probabilistic uncertainty. A particularly restricted model, Naive
Bayes (NB), is a powerful classification technique. Many restricted Bayesian classifiers
[2] have been proposed to extend the dependence structure of NB, such as Tree-Augmented
Naive Bayes (TAN) [3] and the k-dependence Bayesian classifier (KDB) [4].
Madden [2] finds that unrestricted Bayesian classifiers [5] learned using likelihood-based
scores are comparable to TAN. In this paper, a novel unrestricted k-dependence
Bayesian classifier (UKDB) is proposed, built from the perspective of the Markov blanket.
Local mutual information and conditional local mutual information are applied to build
the local graph structure UKDBL for each test instance. UKDBL can be considered a
complementary part of UKDBG, which is learned from the training set.
1 CorrespondingAuthor: LiMin Wang, Key Laboratory of Symbolic Computation and Knowledge
Engineering of Ministry of Education, Jilin University, ChangChun City 130012, P. R. China; E-mail:
wanglim@jlu.edu.cn.

Figure 1. Three classical Bayesian classifiers.

The rest of the paper is organized as follows. Section 1 briefly introduces information
theory and Markov blanket. Section 2 introduces related Bayesian classifiers. Section
3 presents the learning procedure of UKDB and basic idea of local learning. Section 4
provides the experimental results and comparisons. Section 5 concludes the findings.

1. Related Theory Knowledge

1.1. Information Theory

In the 1940s, Claude E. Shannon introduced information theory, the theoretical basis of
modern digital communication. Many commonly used measures are based on information
theory and are used in a variety of classification algorithms.
The mutual information (MI) [6] I(X; Y) measures the reduction of uncertainty
about variable X when the value of variable Y is known. Conditional mutual
information (CMI) [6] I(X; Y|Z) measures the mutual dependence between X and Y
given Z. Local mutual information (LMI) I(X; y) measures the reduction of uncertainty
about variable X after observing that Y = y. Conditional local mutual information
(CLMI) I(x; y|Z) measures the mutual dependence between two attribute values x and y
given Z.
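A minimal sketch (not part of the paper) of how MI and CMI can be estimated from discrete data with empirical probabilities; LMI and CLMI follow the same pattern with the sums restricted to single values.

import numpy as np
from collections import Counter

def mutual_information(x, y):
    # I(X;Y) from two equal-length discrete sequences, in bits
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) as the Z-weighted average of I(X;Y) within each stratum of Z
    n = len(z)
    total = 0.0
    for zv, nz in Counter(z).items():
        idx = [i for i in range(n) if z[i] == zv]
        total += (nz / n) * mutual_information([x[i] for i in idx], [y[i] for i in idx])
    return total

x = [0, 0, 1, 1, 1, 0]; y = [0, 0, 1, 1, 0, 1]; z = [0, 1, 0, 1, 0, 1]
print(mutual_information(x, y), conditional_mutual_information(x, y, z))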

1.2. Markov Blanket

Definition 1. [1] The Markov blanket (MB) for variable C is the set of nodes composed
of C’s parents Xpa , its children Xch , and its children’s parents Xcp . Suppose that X =
{Xpa, Xch, Xcp}; Markov blanket Bayesian classifiers then approximate P(x, c) as follows:

P(c, x) = P(xpa) P(c|xpa) P(xcp|xpa, c) P(xch|xcp, xpa, c)        (1)

Eq. (1) presents the general case. The Markov blanket of C shields C from the effects
of the attributes outside it and is the only knowledge needed to predict its behavior.

2. Bayesian Classifiers: from 0-dependence to k-dependence classifier

NB is the most restrictive probabilistic classification algorithm: the predictive attributes
are assumed to be conditionally independent given the class, so that

P(x, c) ∝ P(c) ∏_{i=1}^{n} P(xi | c).        (2)

Table 1. DataSets for Experimental Study


No. Dataset # Instance Attribute Class
1 Mushrooms 8124 22 2
2 Thyroid 9169 29 20
3 Pendigits 10992 16 10
4 Sign 12546 8 3
5 Nursery 12960 8 5
6 Seer 18962 13 2
7 Magic 19020 10 2
8 Letter-recog 20000 16 26
9 Adult 48842 14 2
10 Shuttle 58000 9 7
11 Connect-4 67557 42 3
12 Waveform 100000 21 3
13 Localization 164860 5 11
14 Census-income 299285 41 2
15 Covtype 581012 54 7

Figure 1(a) graphically shows the structure of NB; NB is a 0-dependence classifier.
The basic structure of TAN allows each attribute to have at most one parent attribute
apart from the class, so that

P(x, c) ∝ P(c) P(xr | c) ∏_{i=1, i≠r}^{n} P(xi | c, xj(i)),        (3)

where Xr denotes the root node and {Xj(i)} = Pa(Xi)\C for any i ≠ r. An example of
TAN is shown in Figure 1(b).
KDB further relaxes NB’s independence assumption by allowing every attribute to
be conditioned on the class and, at most, k other attributes [4]. Then

P(c | x) ∝ P(c) P(x1 | c) ∏_{i=2}^{n} P(xi | c, xi1, · · ·, xip)        (4)

where {Xi1, · · ·, Xip} are the parent attributes of Xi and p = min(i − 1, k). Figure 1(c)
shows an example of KDB when k = 2.

3. The UKDB Algorithm

UKDB can output two kinds of sub-classifiers, i.e., UKDBG and UKDBL , which de-
scribe the causal relationships implicated in training set and test instance, respectively.
UKDB uses I(Xi ; C) and I(Xi ; Xj |C) simultaneously to measure the comprehensive
effect of class C and other attributes (e.g., Xj ) on Xi .
The learning procedures of UKDBG are described as follows:
———————————————————————————————————
Algorithm 1 UKDBG
———————————————————————————————————

Input: Pre-classified training set, DB, and the k value for the maximum allowable
degree of attribute dependence.
1. Let the global Bayesian classifier being constructed, UKDBG , begin with a single
class node C. Let the used attribute list S be empty.
2. Select k attributes {X1 , · · · , Xk } as Xpa that correspond to the maximum of
I(X1 , · · · , Xk ; C).
3. Add {X1 , · · · , Xk } to S. Add k nodes to UKDBG representing {X1 , · · · , Xk }
as the parents of C. Add k arcs from {X1 , · · · , Xk } to C in UKDBG .
4. Repeat until S includes all domain attributes
   • Select the attribute Xi that corresponds to the maximum value of
     I(Xi; C) + Σ_{j=1}^{q} I(Xi; Xj | C), where Xi ∉ S, Xj ∈ S and q = min(|S|, k).
   • Add Xi to S. Add a node that represents Xi to UKDBG. Add an arc from C
     to Xi. Add q arcs from q distinct attributes Xj in S to Xi.
5. Compute the conditional probability tables inferred by the structure of UKDBG
by using counts from DB, and output UKDBG .
———————————————————————————————————
The learning procedures of UKDBL are described as follows:
———————————————————————————————————
Algorithm 2 UKDBL
Input: Test instance (x1 , · · · , xn ), estimates of probability distributions on training
set and the k value for the maximum allowable degree of attribute dependence.
1. Let the local Bayesian classifier being constructed, UKDBL , begin with a single
class node C. Let the used attribute list S be empty.
2. Select k attributes {X1 , · · · , Xk } as Xpa that correspond to the maximum of
I(x1 , · · · , xk ; C).
3. Add {X1 , · · · , Xk } to S. Add k nodes to UKDBL representing {X1 , · · · , Xk }
as the parents of C. Add k arcs from {X1 , · · · , Xk } to C.
4. Repeat until S includes all domain attributes
   • Select the attribute Xi that corresponds to the maximum value of
     I(xi; C) + Σ_{j=1}^{q} I(xi; xj | C), where Xi ∉ S, Xj ∈ S and q = min(|S|, k).
   • Add Xi to S. Add a node that represents Xi to UKDBL. Add an arc from C
     to Xi. Add q arcs from q distinct attributes Xj in S to Xi.
5. Compute the conditional probability tables inferred by the structure of UKDBL
by using counts from DB, and output UKDBL .
———————————————————————————————————
For UKDBG and UKDBL , estimate the conditional probabilities P̂G (cp |x) and
P̂L (cp |x) that instance x belongs to class cp (p = 1, 2, · · · , t), respectively. The class
label of x is determined by the average of the two conditional probabilities:

c* = arg max_{cp ∈ C} [ P̂G(cp|x) + P̂L(cp|x) ] / 2.        (5)
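A minimal sketch of Eq. (5): the predicted label is the class that maximises the average of the posterior estimates returned by UKDBG and UKDBL (the two probability vectors below are hypothetical).

import numpy as np

def combine_predict(p_global, p_local, classes):
    # average the two class-posterior estimates and pick the arg max, as in Eq. (5)
    avg = (np.asarray(p_global) + np.asarray(p_local)) / 2.0
    return classes[int(np.argmax(avg))]

print(combine_predict([0.2, 0.8], [0.4, 0.6], classes=np.array(["c1", "c2"])))  # -> "c2"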

Table 2. Experimental Results of Average Zero-one Loss


Dataset NB TAN KDB UKDBG UKDBL UKDB
Mushrooms 0.020 0.000 0.000 0.000 0.001 0.000
Thyroid 0.111 0.072 0.071 0.075 0.093 0.074
Pendigits 0.118 0.032 0.029 0.028 0.028 0.019
Sign 0.359 0.276 0.254 0.243 0.302 0.247
Nursery 0.097 0.065 0.029 0.029 0.070 0.045
Seer 0.238 0.238 0.256 0.258 0.244 0.244
Magic 0.224 0.168 0.164 0.162 0.176 0.161
Letter-recog 0.253 0.130 0.099 0.088 0.130 0.080
Adult 0.158 0.138 0.138 0.135 0.132 0.130
Shuttle 0.004 0.002 0.001 0.001 0.001 0.001
Connect-4 0.278 0.235 0.228 0.219 0.248 0.228
Waveform 0.022 0.020 0.026 0.024 0.019 0.019
Localization 0.496 0.358 0.296 0.297 0.331 0.285
Census-income 0.237 0.064 0.051 0.050 0.061 0.050
Covtype 0.316 0.252 0.142 0.143 0.274 0.150

Table 3. W/D/L Comparison Results of Average Zero-one Loss on All DataSets


W/D/L NB TAN KDB UKDBG UKDBL
TAN 14/1/0
KDB 13/0/2 9/4/2
UKDBG 13/0/2 10/3/2 3/11/1
UKDBL 14/1/0 3/7/5 2/3/10 2/3/10
UKDB 14/1/0 11/4/0 5/8/2 5/9/1 12/3/0

4. Experiments and Results

In order to better verify the efficiency of the proposed UKDB, experiments have been
conducted on 15 datasets from the UCI machine learning repository [7]. Table 1 sum-
marizes the characteristics of each dataset. Table 2 presents for each dataset the average
zero-one loss. The following algorithms are compared:
• NB, standard Naive Bayes.
• TAN [8], Tree-augmented Naive Bayes applying incremental learning.
• KDB (k=2), standard k-dependence Bayesian classifier.
• UKDBG (Global UKDB, k=2), a variant UKDB describes global dependencies.
• UKDBL (Local UKDB, k=2), a variant UKDB describes local dependencies.
• UKDB (k=2), a combination of global UKDB and local UKDB.
Statistically a win/draw/loss record (W/D/L) is computed for each pair of competi-
tors A and B with regard to a performance measure M . The record represents the number
of datasets in which A respectively beats, loses to, or ties with B on M . Finally, related
algorithms are compared via one-tailed binomial sign test with a 95% confidence level.
Table 3 shows the W/D/L records respectively corresponding to average zero-one loss.
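A minimal sketch (not the authors' evaluation script) of how such a W/D/L record and the one-tailed binomial sign test could be computed; it assumes scipy >= 1.7 for binomtest, and the example losses are taken from the first three rows of Table 2 for UKDB versus KDB.

from scipy.stats import binomtest   # assumes scipy >= 1.7

def wdl(loss_a, loss_b, alpha=0.05):
    # win/draw/loss of algorithm A against B on per-dataset losses (lower is better)
    win = sum(a < b for a, b in zip(loss_a, loss_b))
    loss = sum(a > b for a, b in zip(loss_a, loss_b))
    draw = len(loss_a) - win - loss
    # one-tailed sign test on the non-tied datasets
    p = binomtest(win, win + loss, 0.5, alternative="greater").pvalue if win + loss else 1.0
    return win, draw, loss, p < alpha

print(wdl([0.000, 0.074, 0.019], [0.000, 0.071, 0.029]))   # UKDB vs KDB on the first three datasets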
Demšar [8] recommends the Friedman test [9] for comparisons of multiple algorithms.
For any pre-determined level α, the null hypothesis will be rejected if F > χ²_α, the
upper-tail critical value with t − 1 degrees of freedom. The critical value of χ²_α for
α = 0.05 is 9.49. The Friedman statistic for zero-one loss in our experiments is 16.64.
By comparing these results, we can draw the following conclusions.
For the different classifiers, the average ranks of zero-one loss over all datasets are
{NB(4.66), TAN(3.74), KDB(3.56), UKDBG(3.45), UKDBL(3.58), UKDB(2.01)}. UKDB
and UKDBG perform the best among all classifiers in terms of zero-one loss. From
Table 3, UKDB has lower zero-one loss more often than the other classifiers and the
differences are significant. UKDBG also has relative advantages; however, the
differences are not significant. The performance of UKDBL is similar to that of TAN.
UKDB can make full use of the information supplied by the training set and the test
instances, and thus achieves robust performance.

5. Conclusion

The working mechanisms of NB, TAN and KDB were analysed and summarised. The proposed algorithm, UKDB, applies local learning and the Markov blanket to improve classification accuracy. Local learning makes the final model more flexible, and the Markov blanket relaxes the strict restriction on the parent variables.
Fifteen datasets from the UCI machine learning repository were evaluated with 10-fold cross-validation for zero-one loss comparison. Overall, the findings reveal that the UKDB model clearly outperformed NB, TAN and KDB. To clarify the working mechanism of UKDB, global UKDB and local UKDB were also implemented and compared.

Acknowledgements

This work was supported by the National Science Foundation of China (Grant No.
61272209, 61300145) and the Postdoctoral Science Foundation of China (Grant No.
2013M530980), Agreement of Science & Technology Development Project, Jilin
Province (No. 20150101014JC).

References

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kauf-
mann, Palo Alto, CA, 1988.
[2] M.G. Madden, On the classification performance of TAN and general Bayesian networks, Knowledge-
Based Systems, 22 (2009), 489–495.
[3] R.A. Josep, Incremental Learning of Tree Augmented Naive Bayes Classifiers, in Proceedings of the 8th
Ibero-American Conference on Artificial Intelligence, Seville, Spain, 2002, 32–41.
[4] M. Sahami, Learning limited dependence Bayesian classifiers, in Proceedings of the 2nd International
Conference on Knowledge Discovery and Data Mining, 1996, 335–338.
[5] F. Pernkopf, Bayesian network classifiers versus selective k-NN classifier, Pattern Recognition,
38(2005), 1–10.
[6] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 1948, 379–
423.
[7] UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/datasets.html.
[8] J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning
Research, 7 (2006), 1–30.
[9] M. Friedman, The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of
Variance, Journal of the American Statistical Association, 32 (1937), 675–701.
254 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-254

The Factor Analysis's Applicability on Social Indicator Research

Ying XIE a, Yao-Hua CHEN b and Ling-Xi PENG b,1
a Guangzhou Social Work Research Center, Guangzhou University, Guangzhou, P.R. China, 510006
b School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, P.R. China, 510006

Abstract. Factor analysis is a multivariate statistical method widely used in social indicator analysis. Most of the time, factor analysis results in textbooks only give some mathematical expressions without clear interpretation. Motivated by a case study on a popular textbook, this paper attempts to illustrate a potential pitfall of factor analysis in real applications. The study demonstrates that, without careful examination of the original dataset, factor analysis can lead to misleading conclusions. This issue has been largely ignored in the literature, including popular textbooks. Statistical analysis cannot rely completely on automated computer software, and the Kaiser-Meyer-Olkin (KMO) test results can only be used as a reference. We should carefully examine the applicability of the original data and give a cautious explanation. Given that some popular textbooks ignore this point, we hope this article draws readers' attention to the raw data when conducting factor analysis.

Keywords. applicability, data analysis, factor analysis

Introduction

Factor analysis is a popular method for multivariate statistical analysis. In a typical


multivariate statistics course, factor analysis is an essential part. Normally, before conducting factor analysis, it is suggested to use the Kaiser-Meyer-Olkin (KMO) test to judge whether the dataset is suitable for factor analysis. However, the KMO test is designed to assess sampling adequacy and does not fully account for the applicability of factor analysis to a specific dataset. Consequently, even with satisfactory KMO test results, factor analysis may still produce suspicious conclusions.
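For reference, the KMO measure compares simple correlations with partial correlations; a minimal sketch following the standard definition (random placeholder data, independent of any particular software package) is:

```python
import numpy as np

def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for a (samples x variables) array."""
    r = np.corrcoef(data, rowvar=False)            # simple correlation matrix
    inv_r = np.linalg.inv(r)
    d = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    partial = -inv_r / d                           # partial correlations
    np.fill_diagonal(r, 0.0)                       # keep only off-diagonal terms
    np.fill_diagonal(partial, 0.0)
    return np.sum(r ** 2) / (np.sum(r ** 2) + np.sum(partial ** 2))

rng = np.random.default_rng(1)
sample = rng.normal(size=(31, 6))                  # placeholder: 31 regions x 6 indicators
print(f"KMO = {kmo(sample):.3f}")                  # values above roughly 0.6 are usually deemed adequate
```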
The purpose of factor analysis is to reduce the dimensionality of the dataset, and to
examine the underlying relationships among the variables. In general, factor analysis
attempts to find a few factors to capture most information about the original data,
where the factors are combinations of the related variables [1-8].
Clearly, factor analysis summarizes the information of the original variables based on an analysis of their characteristics [9-10]. Therefore, the choice of the original variables is very important. If there is no correlation among the original variables, the data is not suitable for factor analysis and the dimension-reduction effect will be limited. On the contrary, with stronger correlation, factor analysis can largely reduce the dimensionality, produce superior performance, and improve interpretability [11].

1 Corresponding Author: Ling-Xi PENG, School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou, P.R. China. Email: xysoc@gzhu.edu.cn.
Nowadays, factor analysis is implemented in most statistical software. But the software is unable to understand the underlying meaning of each variable, so researchers need to name the extracted factors, give them a practical interpretation, and check the applicability of factor analysis to the dataset. Quite often, the applicability is not tested at all, and researchers assume it by default. This is one of the key reasons why absurd factor analysis results are not uncommon in many statistical textbooks and articles: many authors do not examine the raw data before conducting the factor analysis.
Specifically, this article uses an example from Statistics (fourth edition, Renmin University of China Press) to illustrate the importance of checking the applicability of factor analysis. This textbook is widely used in China, recommended by the National Statistics Committee and the Ministry of Education, and comes with a comprehensive supporting teaching database. In fact, similar misuses can be found in many other statistical textbooks, including another popular textbook, Multivariate Statistical Analysis [8].

1. A Case Study

The following example uses factor analysis to rank the economic development of Chinese provinces: "Based on the data of six major economic indicators for 31 provinces, municipalities and autonomous regions in 2006, conduct factor analysis, explain the factors, and calculate the factor scores [9]." (Quoted and translated from pages 256-269 of the original book, Chapter 12, Principal Component Analysis and Factor Analysis.)

Table 1. The Raw Data

Region | Gross Regional Product Per Capita (yuan) | Government Revenue (10000 yuan) | Total Investment in Fixed Assets (100 million yuan) | Total Population (10000 persons) | Household Consumption Expenditure (yuan per capita) | Total Retail Sales of Consumer Goods (100 million yuan)
Beijing | 50467 | 11171514 | 3296.3757 | 1581 | 16770 | 3275.2169
Tianjin | 41163 | 4170479 | 1820.5161 | 1075 | 10564 | 1356.78652
Hebei | 16962 | 6205340 | 5470.2356 | 6898 | 4945 | 3397.42296
Shanxi | 14123 | 5833752 | 2255.7351 | 3375 | 4843 | 1613.43996
Inner Mongolia | 20053 | 3433774 | 3363.2077 | 2397 | 5800 | 1595.26514

The results are shown in Tables 2 and 3.



Table 2. Rotated Component Matrix in the Textbook

Component
1 2
Gross Regional Product Per Capita .112 .981
Government Revenue .755 .622
Total Investment in Fixed Assets .931 .247
Total Population .941 -.213
Household Consumption Expenditure .117 .980
Total Retail Sales of Consumer Goods .922 .349
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.

Table 3. Variance Explained Ratio in the Textbook

Component | Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %) | Rotation Sums of Squared Loadings (Total, % of Variance, Cumulative %)
1 | 3.963, 66.052, 66.052 | 3.963, 66.052, 66.052 | 3.197, 53.284, 53.284
2 | 1.771, 29.518, 95.570 | 1.771, 29.518, 95.570 | 2.537, 42.286, 95.570
3 | .128, 2.128, 97.698 | |
4 | .095, 1.589, 99.287 | |
5 | .026, .433, 99.720 | |
6 | .017, .280, 100.000 | |

According to the textbook, the first component is most highly correlated with Total Investment in Fixed Assets, Government Revenue, and Total Retail Sales. The author defined it as the "economic level factor", and defined the second factor as the "consumption level factor".
Table 4 Region Rank in the Textbook

Rank Region Fac1 Fac2 Score


1 Guangdong 2.42045 .89371 3.31416
2 Shanghai -.54724 3.46909 2.92185
3 Jiangsu 1.96498 .57532 2.54030
4 Shandong 2.36315 .00275 2.36591
5 Zhejiang .94065 1.11499 2.05565
6 Beijing -.64278 2.63862 1.99584
7 Liaoning .41769 .20721 .62490
8 Henan 1.29494 -.83424 .46070

Then, the author weighted each factor according to its variance contribution rate and summed the weighted scores. In this way, the textbook calculated the total score of each region and used the total score to reflect regional economic development. The resulting ranking in the textbook is shown in Table 4.
According to the textbook, the result of the KMO test, shown in Table 5, is statistically significant, which is taken to mean that the result of the factor analysis is meaningful. However, the result is highly questionable. For example, Beijing is significantly under-ranked and Henan is over-ranked, and Guangdong being ranked first is inconsistent with the actual situation of economic development.
Table 5. KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .695
Bartlett's Test of Sphericity: Approx. Chi-Square 277.025, df 15, Sig. .000
The problem of the above analysis lies in the raw data. The example selects a few
variables to reflect the economic development. However, these variables are not on the
same scale. The GDP per capita is on the "individual" scale, while the "total population
at the end of the year", "investment in fixed assets", "total retail sales of social
consumer goods" and "government revenue" are all on the "population" or "overall"
scale. Because of the mismatched scale, it is inappropriate to combine these variables
into meaningful factors. In fact, to compare the level of economic development, the
"total population" is not even a proper indicator, as it gives advantages to regions with
larger populations in the ranking system. Obviously, a large population does not necessarily indicate a prosperous economy. For example, Beijing, the capital of China, has a much smaller population than Henan province, but Beijing's economy is much more developed than Henan's.
To overcome this problem, a more appropriate approach is to examine the raw data before the factor analysis. To evaluate economic development, per capita variables are more reasonable. Using the data from the textbook, the author converts each variable except "total population" to a per capita basis and then applies factor analysis in the same way. The results (Table 6) show that the first extracted factor explains more than 80% of the variation, indicating a strong relationship among the per capita economic indicators. We use the component matrix (Table 7) to recalculate the score.
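A minimal sketch of this kind of re-analysis, using a plain eigendecomposition of the correlation matrix instead of SPSS and synthetic placeholder data (the numbers are not those of the textbook), is:

```python
import numpy as np

rng = np.random.default_rng(2)
# Placeholder data: a latent "development level" drives five per capita indicators for 31 regions.
level = rng.normal(size=31)
per_capita = level[:, None] * rng.uniform(0.8, 1.2, size=5) + 0.3 * rng.normal(size=(31, 5))

z = (per_capita - per_capita.mean(0)) / per_capita.std(0)      # standardize each indicator
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)        # principal-component loadings (cf. Table 7)
explained = eigvals / eigvals.sum()          # variance explained ratio (cf. Table 6)
score_1 = z @ eigvecs[:, 0]                  # factor-1 score used to rank regions (sign is arbitrary)
print(np.round(explained, 3))
print("top regions:", np.argsort(score_1)[::-1][:5])
```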
Table 6. New Total Variance Explained

Component | Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %)
1 | 4.210, 84.210, 84.210 | 4.210, 84.210, 84.210
2 | .592, 11.833, 96.042 |
3 | .139, 2.776, 98.818 |
4 | .039, .770, 99.588 |
5 | .021, .412, 100.000 |
Extraction Method: Principal Component Analysis.

Table 7. New Component Matrix (a)

Component 1
Government Revenue Per Capita .966
Investment in Fixed Assets Per Capita .948
Household Consumption Expenditure .698
Retail Sales of Consumer Goods Per Capita .968
Gross Regional Product Per Capita .978
Extraction Method: Principal Component Analysis.
a. 1 component extracted.

The final regional rank of the economic level (Fac 1) is shown below.

Table 8 New Region Rank


Rank Region Fac1
1 Shanghai 2.78325
2 Beijing 2.6151
3 Zhejiang 1.30243
4 Tianjin 1.13873
5 Jiangsu 1.02431
6 Guangdong 0.82984
7 Liaoning 0.67854
8 Shandong 0.65539
18 Henan -0.31168

Clearly, the ranking in Table 8 agrees with the actual economic situation in China.
The more developed regions are on the top.

2. Conclusions

Factor analysis is a widely taught and used statistical method, especially in the field of social indicator research. Various professional statistical software packages (such as SPSS and SAS) include factor analysis modules that automate the process. But without careful examination of the raw data, erroneous conclusions are unavoidable. The quality of a factor analysis result depends heavily on the original variables, the data sources, and the analysis method.
The KMO test is often employed to test whether the data is suitable for factor
analysis, but this test cannot tell whether the data itself is reasonable for analysis.
Most of the time, factor analysis results in textbooks only give some mathematical expressions without clear interpretation. When researchers and teachers use factor analysis to show how to analyze practical problems, it is crucial to examine the applicability of factor analysis to the original data and to check whether the variables are on the same scale. Only if the original data meet these requirements can reliable conclusions be reached.
In short, statistical analysis cannot rely completely on automated computer software, and the KMO test results can only be used as a reference. We should carefully examine the applicability of the original data and give a cautious explanation. Given that some popular textbooks ignore this point, we hope this article draws readers' attention to the raw data when conducting factor analysis.

Acknowledgements

This work was supported by the National Social Science Fund 15AZD077.

References

[1] H. H. Harman, Modern Factor Analysis, 3rd ed. Chicago: University of Chicago Press. 1976.
[2] N. Cressie, Statistics for spatial data. John Wiley & Sons, 2015.
[3] J. L. Devore, Probability and Statistics for Engineering and the Sciences. Cengage Learning, 2015.
[4] D. R. Anderson, D. J. Sweeney, T. A. Williams, et al. Statistics for business & economics. Nelson
Education, 2016.
[5] J. Pearl, M. Glymour, N. P. Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
[6] J. R. Schott, Matrix analysis for statistics. John Wiley & Sons, 2016.
[7] D. C. Howell, Fundamental statistics for the behavioral sciences. Nelson Education, 2016.
[8] X. Q. He, Multivariate Statistical Analysis, Renmin University of China Press, 2011, 143-173.
[9] J. P. Jia, Statistics, Renmin University of China Press, 2011, 254-270.
[10] J. Kim and C. W. Mueller. Factor Analysis: What it is and how to do it. Beverly Hills and London:
Sage Publications, 1978.
[11] P. Kline, An easy guide to factor analysis. London: Routledge, 1994.
260 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-260

Research on Weapon-Target Allocation Based on Genetic Algorithm

Yan-Sheng ZHANG 1, Zhong-Tao QIAO and Jian-Hui JING
The Fourth Department, Ordnance Engineering College, Shijiazhuang City, Hebei Province, China

Abstract. Weapon-target allocation (WTA) is a typical constrained combinatorial


optimization problem, which is an important content of command and decision in
air defense operation. WTA is known to be an NP-complete problem, and intelligent optimization methods are widely employed to solve it. A popular coding length is n×m, corresponding to assigning n weapons to m targets. However, this coding length grows greatly with the problem scale, and the computation becomes too heavy to meet real-time requirements. This paper focuses on designing a new gene coding to improve computational efficiency. In our study, a sequence of weapons serves as the gene coding, to which two additional codes are attached: a target code and a capacity code. This coding length is n, and it adapts to the constraints of WTA effectively. Then the operators of gene selection, crossover and mutation are designed. On the other hand, the maximum operational effectiveness is defined as the objective function, together with the minimum consumption of ammunition. This model is based on multi-objective optimization, and is more
realistic. An example shows that the method is feasible and can save computing
time greatly.

Keywords. Weapon-target allocation, GA, Multi-objective optimization

Introduction

The weapon-target allocation (WTA) problem is to optimize the distribution of our forces and weapons according to the characteristics and quantity of incoming targets so as to achieve the best operational effectiveness. WTA is a typical constrained combinatorial optimization problem and a hard non-polynomial optimization problem. The WTA model based on multi-objective optimization is more realistic and is a hot topic. At present, intelligent optimization methods [1-3], such as the genetic algorithm (GA), the particle swarm algorithm (PSA), the ant colony algorithm (CA), and simulated annealing (SA), are widely employed to solve WTA.
These intelligent algorithms have been shown to produce better solutions than the classic ones. However, their speed is often not enough to satisfy the real-time requirements of air defense. In this paper, we focus on designing a new gene coding to improve computational efficiency. A popular genetic coding length is n×m, corresponding to assigning n weapons to m targets. In our study, a sequence of weapons serves as the gene coding, to which two additional codes are attached: a target code and a capacity code. This coding length is n and adapts to the constraints of WTA effectively. On the other hand, the maximum

operational effectiveness is defined as the objective function together with the minimum consumption of ammunition.

1 Corresponding Author: Yan-Sheng ZHANG, Lecturer, Ordnance Engineering College, No.97 Heping West Road, Shijiazhuang City, Hebei Province, China; E-mail: zhang_sheng_74@163.com.

1. Mathematical Model

Our anti-aircraft equipment is represented by A=[a1, a2,…, an], in which ai means the ith
(1≤i≤n) weapon. R=[r1, r2,…, rn] represents the capacity of ammunition corresponding
to A=[a1, a2,…, an], and ri means the quantity of ammunition about ai. Target set is
given by T=[t1, t2,…, tm], and tj (1≤j≤m) is the jth incoming target. D=[d1, d2,…, dm]
shows threat levels corresponding to T=[t1, t2,…, tm], and dj represents threat degree of
tj. P=[pij]n×m is the matrix of intercept probabilities, and pij gives the intercept probability of ai against tj. The decision matrix is described by X=[xij]n×m, and xij is the number of missiles ai fires at tj.
Operational effectiveness f1(X) is expressed in Eq. (1) [4]. The total number of missiles consumed, f2(X), is given in Eq. (2). The optimization of WTA is to make f1(X) as large as possible and f2(X) as small as possible. The multiple objectives can be transformed into a single one, as shown in Eq. (3). Then f(X) is the objective function, in which L1 and L2 are weights. We expect f(X) to be as large as possible, as stated in Eq. (4).
$$f_1(X) = \sum_{j=1}^{m} d_j \Big(1 - \prod_{i=1}^{n} (1 - p_{ij})^{x_{ij}}\Big) \qquad (1)$$

$$f_2(X) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} \qquad (2)$$

$$f(X) = L_1 f_1(X) + L_2 f_2(X) \qquad (3)$$

$$\max f(X) \qquad (4)$$

$$\text{s.t.} \quad \sum_{j=1,\, j \neq k}^{m} x_{ij} = 0 \qquad (5)$$

$$\sum_{i=1}^{n} x_{ij} \geq 1 \qquad (6)$$

$$1 \leq x_{ij} \leq r_i \qquad (7)$$

Usually, there are some constraints on f(X). The number of weapons is not less than the number of targets, namely n ≥ m. A weapon ai is allowed to be allocated to only one target tk, as indicated in Eq. (5); it follows that there is only one nonzero element in each row of X. Any target is assigned at least one weapon, so at least one element of each column of X is nonzero, as shown in Eq. (6). The number of missiles xij that ai fires at tj must not exceed the ammunition capacity ri, as given in Eq. (7).
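Assuming X is given as an n×m integer matrix, Eqs. (1)-(3) translate directly into code; the threat degrees, intercept probabilities and weights below are placeholders:

```python
import numpy as np

def objective(X, d, P, L1=1.0, L2=-0.05):
    """f(X) = L1*f1(X) + L2*f2(X) for a decision matrix X (missiles of weapon i fired at target j)."""
    f1 = np.sum(d * (1.0 - np.prod((1.0 - P) ** X, axis=0)))   # Eq. (1): expected neutralized threat
    f2 = X.sum()                                               # Eq. (2): total missiles consumed
    return L1 * f1 + L2 * f2                                   # Eq. (3)

d = np.array([0.5, 0.7, 0.2])                 # threat degrees of m=3 targets (placeholder)
P = np.array([[0.9, 0.2, 0.4],                # intercept probabilities, n=4 weapons x m=3 targets
              [0.1, 0.8, 0.6],
              [0.5, 0.5, 0.5],
              [0.3, 0.9, 0.1]])
X = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 1], [0, 1, 0]])
print(objective(X, d, P))
```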

2. Design of Genetic Algorithm

2.1. Gene Particle Coding

The decision matrix X is the solution of the objective function. It is very complicated to perform gene crossover and mutation if X is directly encoded as the gene particle. Instead, a sequence of weapons, A=[a1, a2,…, an], serves as the gene coding, in which 1, 2,…, n represent a1, a2,…, an respectively. Additionally, each gene particle carries two other codes. Corresponding to A=[a1, a2,…, an], one is the target code T=[t1, t2,…, tn] and the other is the ammunition quantity code C=[c1, c2,…, cn]. The values t1, t2,…, tn are taken from {1, 2,…, m}, and c1, c2,…, cn meet the conditions c1 ≤ r1, c2 ≤ r2,…, cn ≤ rn. For example, the gene coding and its additional coding are shown in Figure 1.
Gene code: W = [2 4 8 1 5 7 3 6 9]
Target coding: T = [1 2 3 4 5 6 4 2 5]
Capacity coding: C = [2 1 3 3 1 4 4 1 1]

Figure 1. Gene particle coding

Decision matrix X (rows a1..a9, columns t1..t6):
a1: 0 0 0 3 0 0
a2: 2 0 0 0 0 0
a3: 0 0 0 4 0 0
a4: 0 1 0 0 0 0
a5: 0 0 0 0 1 0
a6: 0 1 0 0 0 0
a7: 0 0 0 0 0 4
a8: 0 0 3 0 0 0
a9: 0 0 0 0 1 0

Figure 2. The decision matrix


In Figure 1, the particle length is 9, representing the 9 weapons. The length of the target attribute coding is also 9, although the number of targets is 6, and the code of missile quantities has 9 elements too. The three codes correspond to each other element by element. According to Figure 1, X can be inferred as shown in Figure 2, and by analyzing this matrix it can be verified that the gene particle coding satisfies the constraint conditions of Eqs. (5), (6) and (7). The steps of generating a gene particle are as follows.
Step 1: A permutation W, composed of the numbers from 1 to n, is generated randomly.
Step 2: The first m elements of T are assigned 1, 2,…, m−1 and m in turn, namely T(1)=1, T(2)=2,…, T(m)=m. The elements from m+1 to n are each given a random number from 1 to m. For example, the 7th, 8th and 9th elements of T are assigned 4, 2 and 5 randomly in Figure 1.
Step 3: The members of C are integers and meet ci ≤ ri. In Figure 1, C=[2 1 3 3 1 4 4 1 1] is a vector that fits the requirements, given R=[3 2 4 3 2 5 4 4 1].
Step 4: X can be derived from W, T and C, as shown in Figure 2; a sketch of this decoding is given below.
According to this method, the initial population containing M particles can be generated easily.
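A sketch of the decoding from (W, T, C) to the decision matrix X, using the values of Figure 1 (the function name is ours, not the paper's):

```python
import numpy as np

def decode(W, T, C, n, m):
    """Build the n x m decision matrix X from gene code W, target code T and capacity code C."""
    X = np.zeros((n, m), dtype=int)
    for w, t, c in zip(W, T, C):
        X[w - 1, t - 1] = c          # weapon w fires c missiles at target t
    return X

W = [2, 4, 8, 1, 5, 7, 3, 6, 9]
T = [1, 2, 3, 4, 5, 6, 4, 2, 5]
C = [2, 1, 3, 3, 1, 4, 4, 1, 1]
print(decode(W, T, C, n=9, m=6))     # reproduces the matrix of Figure 2
```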

2.2. Gene Particle Crossing

The roulette method is used to generate the parent population Q1, and some of them are
selected to implement gene reconfiguration by crossing with the probability of p1. The
cross point k of the two-parent genes is a random number between 1 and n. Figure 3
shows the process of crossing.
W1 and W2 are gene particles for crossing, and their affiliated codes (T1, C1, T2 and C2) are also listed in Figure 3(a). Suppose k=4; the crossing of W1(1) can then be explained as follows.
Step 1: Search W1 for the same value as W2(1)=1. The search result is W1(5)=W2(1)=1.
Step 2: The value of W1(1) is interchanged with that of W1(5). As a result, W1'(1)=1 and W1'(5)=8.
Step 3: Accordingly, the value of C1(1) is also interchanged with that of C1(5). As a result, C1'(1)=2 and C1'(5)=4.

(a) Parent particles (cross point k=4):
W1 = [8 4 3 2 1 9 5 6 7]   T1 = [1 2 3 4 5 6 4 1 6]   C1 = [4 1 1 2 2 1 1 2 4]
W2 = [1 6 9 5 3 2 7 8 4]   T2 = [1 2 3 4 5 6 1 1 4]   C2 = [3 1 1 2 4 2 3 3 2]

(b) New particles after crossing:
W1' = [1 6 9 5 8 3 2 4 7]   T1' = [1 2 3 4 5 6 4 1 6]   C1' = [2 2 1 1 4 1 2 1 4]
W2' = [8 4 3 2 9 5 7 1 6]   T2' = [1 2 3 4 5 6 1 1 4]   C2' = [3 2 4 2 1 2 3 3 1]

Figure 3. Gene particle crossing

The crossing of W1(2), W1(3) and W1(4) is similar to that of W1(1), and W2(1)~W2(4) undergo the same transformation. The two new genes W1' and W2' derived by crossing are shown in Figure 3(b), and a sketch of this operator is given below.
On the surface, if k=1 the first bits of W1 and W2 are swapped, and if k=9 W1 and W2 are completely swapped. In fact, the traditional position swap, in which the two particles exchange elements with each other at the crossing point, is not adopted, because it may produce repeated values in a new gene code, which does not conform to the constraint of Eq. (5). In this paper, the crossover operator specifies that element swaps occur only within a single gene code. The result of exchanging elements inside a gene is that the first k elements of W1' are the same as those of W2, and likewise for W2'. In general, W1'≠W1 and W2'≠W2. After the gene crossing, the new population is generated and named Q2.
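A sketch of this within-gene swap (our simplified reading of the operator; only W and C of one parent are shown, since T is not altered by crossing):

```python
def cross(W, C, W_other, k):
    """Make the first k elements of W equal to those of W_other by swapping positions inside W;
    C is swapped in step with W so the ammunition stays attached to its weapon."""
    W, C = W[:], C[:]
    for i in range(k):
        j = W.index(W_other[i])      # position in W holding the value W_other[i]
        W[i], W[j] = W[j], W[i]
        C[i], C[j] = C[j], C[i]
    return W, C

W1 = [8, 4, 3, 2, 1, 9, 5, 6, 7]; C1 = [4, 1, 1, 2, 2, 1, 1, 2, 4]
W2 = [1, 6, 9, 5, 3, 2, 7, 8, 4]
print(cross(W1, C1, W2, k=4))        # reproduces W1' and C1' of Figure 3(b), with no duplicate weapons
```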

2.3. Gene Particle Mutating

A small number of particles in the population are selected to mutate with probability p2. Suppose W1 is the mutation particle, and its affiliated codes are T1 and C1 in Figure 4. The mutation operation is designed as shown in Figure 4.
(a) Before mutation: W1 = [8 4 3 2 1 9 5 6 7], T1 = [1 2 3 4 5 6 4 1 6], C1 = [4 1 1 2 2 1 1 2 4]
(b) After mutation:  W1 = [8 6 3 2 1 9 5 4 7], T1 = [1 2 3 4 5 6 4 1 6], C1 = [4 3 1 2 2 1 1 2 4]

Figure 4. Gene mutating

Intercept probability matrix P (rows: weapons 1-9, columns: targets 1-6):
1: 0.91 0.16 0.96 0.66 0.32 0.45
2: 0.13 0.97 0.66 0.17 0.95 0.65
3: 0.91 0.96 0.04 0.71 0.03 0.71
4: 0.63 0.49 0.85 0.03 0.44 0.75
5: 0.10 0.80 0.93 0.28 0.38 0.28
6: 0.28 0.14 0.68 0.05 0.77 0.68
7: 0.55 0.42 0.76 0.10 0.80 0.66
8: 0.96 0.92 0.74 0.82 0.19 0.16
9: 0.96 0.79 0.39 0.69 0.49 0.12

Figure 5. The intercept probability

Step 1: k1 and k2 are random integers from 1 to n, and the value of W1(k1) is exchanged with that of W1(k2). Setting k1=2 and k2=8, W1'(k1)=6 and W1'(k2)=4 after exchanging W1(k1)=4 with W1(k2)=6. In other words, mutation exchanges weapons assigned to different targets.
Step 2: The corresponding quantities of ammunition are varied randomly. According to W1'(k1)=6, C1'(k1)=3 ≤ r6=4; and according to W1'(k2)=4, C1'(k2)=2 ≤ r4=3.
If k1=k2, the particle only varies its quantity of ammunition. The mutated population is named Q3.

2.4. Selecting the Next Generation

Let P0 be the initial population and Pi the ith generation. The steps of generating the next generation Pi+1 are as follows (a sketch of the selection step appears after the list):
Step 1: Pi undergoes selecting, crossing and mutating, and becomes Q3.
Step 2: Q3 and Pi are mixed together to form the population P' with 2M particles, and the objective values of these particles are calculated.
Step 3: According to these values, the particles are listed in descending order:

$$f(X_1) \geq f(X_2) \geq \cdots \geq f(X_{M-1}) \geq f(X_M) \geq f(X_{M+1}) \geq \cdots \geq f(X_{2M-1}) \geq f(X_{2M}) \qquad (8)$$

The first M particles are selected to form the next generation Pi+1.
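A sketch of this elitist selection step (the function and its arguments are illustrative, not the paper's implementation):

```python
import numpy as np

def next_generation(parents, offspring, fitness, M):
    """Merge the two populations and keep the M particles with the largest objective value f(X)."""
    merged = parents + offspring                      # 2M particles in total
    values = np.array([fitness(p) for p in merged])
    best = np.argsort(values)[::-1][:M]               # indices in descending f(X), cf. Eq. (8)
    return [merged[i] for i in best]

# Usage: population = next_generation(population, mutated_offspring, objective_of_particle, M=50)
```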

3. Sample

In an air defense exercise, 9 air-defense weapons are assigned against 6 attacking targets. Their ammunition capacities are specified as R=[3 2 4 3 2 5 4 4 1]. The threat degrees of the targets, as assessed by the integrated command system, are D=[0.51, 0.73, 0.20, 0.95, 0.44, 0.16].
The system also estimates the probabilities of the air-defense equipment intercepting the targets, shown in Figure 5. The population size is M=50, the maximum number of iterations is N=200, the crossing probability is p1=0.8, and the mutation probability is p2=0.05. The weights of the objective function are L1=1 and L2=-0.05, determined by the Efficacy Coefficient Method [5].
The algorithm was run 4 times with random initialization; the convergence curves are drawn in Figure 6 and the optimums are listed in Table 1. Some conclusions can be inferred from these optimizations.

Figure 6. Objective value f(X) versus the number of iterations (0-200) for the four runs, panels (a)-(d); the best value of run (c), f(X)=2.408, is reached at iteration 53.



A. The higher the intercept probability of a weapon against a target, the more likely that weapon is to be assigned to that target.
B. More equipment and more ammunition tend to be given to targets with high threat levels.
C. More equipment tends to be given to a target against which all equipment has poor intercept probability.
D. Each run takes about 0.40 seconds, an order of magnitude less than a similar-size example in the literature [6].
Table 1. The optimums of the four times

Order 1 2 3 4
Iteration 146 104 53 131
f(X) 2.4011 2.4066 2.4078 2.4078
f1(X) 2.8511 2.8566 2.9078 2.9078
f2(X) 9 9 10 10
Time 0.3541s 0.4107s 0.4732s 0.3988s
000100 000100 001000 001000
000010 010000 000010 000010
010000 000100 010000 010000
000001 000001 000001 000001
X 001000 001000 010000 010000
000001 000010 000001 000001
000010 000010 000010 000010
000100 000100 000200 000200
100000 100000 100000 100000

4. Conclusions

The WTA model proposed in this paper conforms well to air defense operation, according to conclusions A, B and C of the sample.
A new genetic coding method is presented, and its operators of crossing, mutating and selecting are designed. Compared with the traditional coding method, the coding length is n, shortened by a factor of m relative to assigning n weapons to m targets. The suggested algorithm therefore gives a significant improvement in computational efficiency, which is confirmed by conclusion D of the sample. The sample also shows that the algorithm performs the optimization well. It should be noted that the weights L1 and L2 play an important role in balancing f1(X) and f2(X); determining these weights is a topic for our further research.

References

[1] M. Dorigo and C. Blum, Ant Colony Optimization Theory: A Survey, Theoretical Computer Science,
344(2005):243-278.
[2] S. Chen and T. Hu, Weapon-target Assignment with Multi-objective Non-dominated Set Ranking
Genetic Algorithm, Ship Electronic Engineering, in Chinese, 35(2015):54-57.

[3] C. L. Fan, Q. H. Xing, M. F. Zheng, et al. Weapon-target allocation optimization algorithm based on
IDPSO, Systems Engineering and Electronics, in Chinese, 37(2015):336-342.
[4] O. Karasakal, Air defense missile-target allocation models for a naval task group, Computers &
Operations Research, 35(2008): 1759-1770.
[5] C. G. Xue, Enterprise Information System Adaptability Optimization Based on Cloud Co-evolution
Algorithm, Industrial Engineering and Management, in Chinese, 18(2013): 47-53.
[6] C. L. Fan, Q. H. Xing, M. F. Zheng and Z. J. Wang, Weapon-target allocation optimization algorithm
based on IDPSO, Systems Engineering and Electronics, in Chinese, 37(2015):336-342.
Fuzzy Systems and Data Mining II 267
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-267

PMDA-Schemed EM Channel Estimator for OFDM Systems

Xiao-Fei LI a,b,c,1, Di HE d and Xiao-Hua CHEN e
a The College of Mathematics and Computer, Wuyi University, Fujian, China
b The Key Laboratory of Cognitive Computing and Intelligent Information Processing of Fujian Education Institutions, Fujian, China
c Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China
d Shanghai Key Laboratory of Navigation and Location Based Service, Shanghai Jiao Tong University, Shanghai, China
e School of Information and Engineering, Chuzhou University, Huzhou, Zhejiang Province, China

1 Corresponding Author: Xiao-Fei LI, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai; The College of Mathematics and Computer, Wuyi University, Fujian, China; E-mail: lixiaofei_73@163.com.

Abstract. Channel estimation is a challenging problem in an orthogonal frequency division multiplexing (OFDM) system subject to frequency-selective Rayleigh fading. The Poor Man's Data Augmentation (PMDA) estimator is proposed to efficiently estimate the channel impulse response (CIR) of multipath fading channels with additive white Gaussian noise (AWGN). A modest number of pilot tones is used to obtain the initial estimate for the iterative procedure, which efficiently improves the channel estimation. Using the Cramer-Rao-like lower bound (CRLB) as the unbiased channel estimation criterion, simulation results illustrate that the bit error rate (BER) and mean square error (MSE) performance achieved by the PMDA algorithm is better than that of the EM estimator, and reveal that the convergence of the PMDA estimator is faster than that of the EM estimator.

Keywords. Poor man’s Data Augmentation (PMDA), expectation-maximization


(EM), orthogonal frequency division multiplexing (OFDM)

Introduction

When wireless communication transmission is applied in scenarios with high mobility and high carrier frequency, noise due to inter-carrier interference (ICI) is generated by the time variation of the channel [1, 2]. A severe challenge in designing a wireless communication system is to overcome this noise and improve system performance. Orthogonal Frequency Division Multiplexing (OFDM) [3, 4] is robust to inter-symbol interference (ISI) and has high spectral efficiency. Knowledge of the channel state information (CSI) at the receiver is crucial for coherent detection and diversity combining [5, 6]. A cooperative spectrum-sensing approach based on various chaotic stochastic resonance (CSR) energy detection fusion techniques is proposed in


[7], which can overcome the uncertain channel noise. Channel estimation in OFDM systems is subject to frequency-selective Rayleigh fading. An EM-based channel estimator is proposed in [8] to estimate the CIR of multipath fading channels with AWGN in OFDM systems. References [9-17] apply the EM algorithm to q-norm regularized sparse parameter estimation, as well as to channel estimation and data detection techniques in ultra-wideband and OFDM systems. The most commonly used estimators are based on Least Squares Error (LSE), Linear Minimum Mean Square Error (LMMSE) [18-22], and Maximum Likelihood (ML) [23, 24]. A comparison between ML and MMSE can be
found in [25]. Blind channel estimation has been studied by [26, 27]. Other researchers
proposed a mixed approach which combines both blind and pilot based estimation
algorithms [28]. [29] illustrates the implementation of the E step of the EM algorithm
by use of Monte Carlo algorithm. In the M step of Monte Carlo EM (MCEM), the
maximizer update of the observed posterior is achieved using the mixture maximization.
[30] shows the relation between the MCEM algorithm and the data augmentation
algorithm. The simplicity of the posterior distribution of the parameter given the
augmented data is exploited by use of Data Augment algorithm or Asymptotic Data
Augmentation algorithm, while the simplicity of the maximum likelihood estimation
given the incomplete data is exploited by the EM algorithm. In this paper, the transmitted signals corrupted by white Gaussian noise are regarded as an incomplete-data situation [31], where the white Gaussian noise of the inter-carrier interference results in missing data in the transmitted signals, and maximum likelihood estimation given the incomplete data is exploited. Therefore, to solve this problem, the Poor Man's Data Augmentation (PMDA) algorithm is introduced and compared with the EM algorithm, which develops only the maximum likelihood estimation given the incomplete data.
The contributions of the paper are as follows:
The PMDA estimator [29] is introduced to estimate the channel impulse response in OFDM systems with multipath fading and to overcome the uncertain channel noise. Using the CRLB as the unbiased channel estimation criterion, simulation results are obtained by comparing the performance of the PMDA estimator with that of the EM estimator. The simulation results show that the BER and MSE performance of the PMDA estimator is very close to that of the ideal estimator, much better than the EM estimator, and approaches the CRLB at high SNR.
The convergence of the PMDA estimator is faster than the EM estimator.
The rest of the paper is organized as follows. In Section 1, the baseband OFDM system model is described and some assumptions are discussed. In Section 2, the PMDA estimator is derived and fully discussed. Simulations are presented in Section 3 to illustrate the performance of the PMDA and EM estimators and to compare the experimental results with the CRLB. Concluding remarks are given in Section 4.

1. OFDM System Description

A baseband equivalent OFDM system is depicted in Figure 1. Each data stream, which undergoes an M-point inverse fast Fourier transform (IFFT) with an additional cyclic prefix (CP), is first input into a serial-to-parallel (S/P) converter and modulates the corresponding sub-carrier by MPSK or MQAM. In terms of the maximum capacity or the minimum BER under some constraints, the modulation scheme of each sub-carrier may be altered. Here, for simplicity, only QPSK is utilized on all the sub-carriers. The modulated data stream, denoted by the complex-valued symbols X(0),...,X(m),...,X(M−1), is transformed by the IFFT, and the output symbols are denoted x(0),...,x(k),...,x(M−1).
To avoid intersymbol interference (ISI), CP symbols copied from the tail of the IFFT output are added to the head of each frame. After being converted back to a serial data stream, the parallel data are conveyed over the frequency-selective channel. After the prefix is discarded and the FFT and demodulation are applied, the received data, denoted y(0),...,y(k),...,y(M−1) and corrupted by AWGN, are transformed back to Y(0),...,Y(m),...,Y(M−1).

Figure 1. Baseband OFDM System
In this paper, the channel model [8] can be written as follows:

$$y(k) = \sum_{l=0}^{L-1} \xi_l\, x(k-l) + \eta(k), \qquad 0 \le k \le M-1, \qquad (1)$$

where $x(k) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} X(m)\, e^{j2\pi km/M}$, $0 \le k \le M-1$. The CIR $\xi_l$ ($0 \le l \le L-1$) is an i.i.d. complex-valued Gaussian random sequence, $\eta(k)$ ($0 \le k \le M-1$) are additive AWGN variables with zero mean and variance $\sigma^2$ for both the real and imaginary components, and $L$ is the length of the time-domain CIR. The received data frame in the frequency domain is

$$Y(m) = \frac{1}{\sqrt{M}} \sum_{k=0}^{M-1} y(k)\, e^{-j2\pi km/M}, \qquad 0 \le m \le M-1. \qquad (2)$$

Combining Eq. (1) with Eq. (2), we have $Y(m) = X(m)\Psi(m) + \Phi(m)$, $0 \le m \le M-1$, where $\Psi(m) = \sum_{l=0}^{L-1} \xi_l\, e^{-j2\pi lm/M}$ is the frequency response of subcarrier $m$. The transformed noise variables $\Phi(m) = \frac{1}{\sqrt{M}} \sum_{k=0}^{M-1} \eta(k)\, e^{-j2\pi km/M}$, $0 \le m \le M-1$, are i.i.d. complex-valued Gaussian variables with the same distribution as $\eta(k)$, i.e. zero mean and variance $\sigma^2$.
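A short NumPy sketch of this model (QPSK symbols, a random length-L CIR and a unitary FFT/IFFT pair; the parameter values are arbitrary) confirms that Y(m) ≈ X(m)Ψ(m) + Φ(m):

```python
import numpy as np

M, L, sigma = 64, 4, 0.1
rng = np.random.default_rng(0)

X = (rng.choice([1, -1], M) + 1j * rng.choice([1, -1], M)) / np.sqrt(2)   # QPSK symbols X(m)
xi = (rng.normal(size=L) + 1j * rng.normal(size=L)) / np.sqrt(2 * L)      # CIR xi_l
x = np.fft.ifft(X) * np.sqrt(M)                                           # x(k), unitary IFFT

# Time-domain model of Eq. (1): circular convolution (CP assumed long enough) plus AWGN.
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(xi, M)) \
    + sigma * (rng.normal(size=M) + 1j * rng.normal(size=M))
Y = np.fft.fft(y) / np.sqrt(M)                                            # Y(m), unitary FFT

Psi = np.fft.fft(xi, M)                        # frequency response Psi(m)
print(np.allclose(Y, X * Psi, atol=6 * sigma)) # Y(m) = X(m) Psi(m) + Phi(m), up to the noise level
```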

2. PMDA-Schemed EM Channel Estimator

In this paper, the PMDA algorithm yields an estimate of the entire observed posterior of H in order to specify a normal approximation to it, instead of just a maximizer and the curvature at that point. To compute the observed posterior, a sample X1, ..., XM is drawn from g(Y|X, H), and the importance weights are assigned as

$$\omega_j = \frac{g(Y|X_j)}{g(Y|X_j, H)}, \qquad 1 \le j \le M,$$

so that the plain Monte Carlo average $Q(H|H^{(it)}) = \frac{1}{M}\sum_{j=1}^{M} \log g(H|X_j, Y)$ is replaced with the weighted average

$$Q(H|H^{(it)}) = \frac{\sum_{j=1}^{M} \omega_j \log g(H|X_j, Y)}{\sum_{j=1}^{M} \omega_j},$$

where the original sample is updated with the new information at iteration it through the weights [32].

The iterative algorithm is described as follows:
1. Initialize it = 0 and H(0), and generate X1, ..., XM ~ g(Y|X, H) via a Monte Carlo algorithm. Then, at iteration it + 1:
2. Compute the importance weights ωj, 1 ≤ j ≤ M, as defined above.
3. E-step: estimate Q(H|H(it)) by the weighted average above.
4. M-step: maximize Q(H|H(it)) to obtain H(it+1), i.e. H(it+1) = arg max Q(H|H(it)).
5. Compute the difference between successive estimates |Ĥ(it+1) − Ĥ(it)|; if the difference is below a predetermined threshold, terminate the iteration and output the final decision, otherwise set it ← it + 1 and repeat from step 2.
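Purely as an illustration of the importance-weighted E-step averaging followed by an M-step (a toy scalar problem with uniform weights, not the OFDM channel estimator itself):

```python
import numpy as np

rng = np.random.default_rng(3)
H_grid = np.linspace(-2.0, 2.0, 401)                    # candidate values of a scalar parameter H
X_samples = rng.normal(loc=0.5, scale=1.0, size=200)    # Monte Carlo draws of the missing data

# Toy importance weights (uniform here); in PMDA they correct for sampling from an approximation.
w = np.ones_like(X_samples)

# Weighted E-step: Q(H) as a weighted average of complete-data log terms; M-step by grid search.
log_g = -(H_grid[:, None] - X_samples[None, :]) ** 2 / 2.0   # toy complete-data log-density
Q = (log_g * w).sum(axis=1) / w.sum()
H_next = H_grid[np.argmax(Q)]                           # M-step: maximizer of Q
print(H_next)                                           # close to the sample mean, ~0.5
```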

3. Simulation and Discussion

An OFDM system model is constructed according to the 802.11a specification to illustrate the validity and effectiveness of the PMDA-based channel estimation. The time variation of the channel is characterized by the normalized Doppler frequency fd; the same assumption is made in [8, 33], where the Doppler frequency is chosen to be 55.6 Hz and 277.8 Hz, which makes fdT equal to 0.01 and 0.05. A Rayleigh fading channel is generated by Jakes' model. The CIR ξl used in the simulations follows the conventional exponential-decay multipath channel model of [8, 33], in which the complex amplitudes αl(t) are independent complex-valued Gaussian random variables with unit variance, varying in time according to the Doppler frequency, and the stopping criterion is set as ||ξl+1 − ξl||² ≤ 10e−2 [8, 33].
We have validated the PMDA-schemed channel estimator by simulation. Using N = 64 subcarriers and QPSK modulation on each subcarrier, the BER performance versus different SNRs when the normalized fading parameter fdT ≤ 0.01 [8, 33] is shown in Figure 2, and the corresponding mean square error (MSE) of the estimated channel parameters is shown in Figure 3. Comparing the PMDA estimator with the EM estimator [8, 33], the CRLB [8, 33] and the ideal estimator, it is clear from Figure 2 that the BER performance of the PMDA estimator is a clear improvement over that of the EM estimator [8, 33] over the whole SNR region, and Figure 3 shows that, compared with the EM estimator, the PMDA estimator reduces the MSE at high SNRs.

Figure 2. BER v.s. SNR for Rayleigh fading channels with fdT≤0.01

Figure 3. MSE v.s. SNR for Rayleigh fading channels with fdT≤ 0.01

Figure 4. EM v.s. PMDA estimates


Figure 4 presents a noticeable discrepancy among the PMDA and EM estimator. In
practice, having noticed such a discrepancy, the PMDA estimators is convergent faster
272 X.-F. Li et al. / PMDA-Schemed EM Channel Estimator for OFDM Systems

than the EM estimator. Therefore, less iterations are run to obtain convergence using
PMDA as a starting point for the data augmentation estimates.

4. Conclusion

In this paper, the PMDA-based estimator is proposed to efficiently estimate the CIR in an OFDM system. The PMDA-based channel estimate yields an approximation to the observed posterior, which reduces the computational load of the E-step. The simulations reveal that the BER and MSE performance of the PMDA estimator is very close to that of the ideal estimator, much better than the EM estimator, and approaches the CRLB at high SNR, and that the convergence of the PMDA estimator is faster than that of the EM estimator. The simulation results show that the performance is acceptable when the SNR is larger than 10 dB; in the low-SNR region, a channel coding scheme can be used to improve the performance of the PMDA estimator. This confirms that the PMDA algorithm is more efficient than the EM estimator. Our further work is to estimate multiple-input multiple-output (MIMO) channels in MIMO-OFDM systems.

Acknowledgements

This research work is supported by the Important National Science and Technology Specific Project of China under Grant No. 2016ZX03001022-006, the National Natural Science Foundation of China under Grant Nos. 91438113, 61571064 and 61370176, the Education Department Class A Project of Fujian Province under Grant No. JA15515, and the Science Research Project of Wuyi University under Grant No. XL201012.

References

[1] W. W. Ren, L. Z. Liu. A novel iterative symbol detection of ofdm systems in time and frequency selective
channels. IEEE Conference General Assembly and Scientific Symposium (URSI GASS), (2014):1-4.
[2] S. Zettas, S. Kasampalis, P. Lazaridis; Z. D. Zaharis, J. Cosmas. Channel estimation for OFDM systems
based on a time domain pilot averaging scheme.16th International Symposium on Wireless Personal
Multimedia Communications (WPMC). (2013):1-6.
[3] Y. S. Liu, Z. H. Tan, H. J. Hu, et al. Channel estimation for ofdm. IEEE communications surveys &
tutorials, 16(2014):1891-1908.
[4] S. M. Riazul Islam, Kyung Sup Kwak. Two-stage channel estimation with estimated windowing for MB-
OFDM UWB system. IEEE Communications Letters, 20(2016): 272-275.
[5] M. Hajjaj, W. Chainbi, R. Bouallegue. Low-rank channel estimation for MIMO MB-OFDM UWB
system over spatially correlated channel. IEEE Wireless Communications Letters, 5(2016): 48-51.
[6] V. Pohl, P. H. Nguyen, V. Jungnickel, and C. V. Helmolt. How often channel estimation is needed in
MIMO systems, in Proc. IEEE Global Telecommun. Conf., San Francisco, Calif, USA, (2003): 814-
818.
[7] D. He. Chaotic Stochastic Resonance Energy Detection Fusion Used in Cooperative Spectrum Sensing,
IEEE Transactions on Vehicular Technology, 62 (2013):620-627.
[8] X.Q. Ma, H. Kobayashi and S. C. Schwartz. EM-Based Channel Estimation algorithm for OFDM.
Journal on Applied Signal Processing, 10(2004):1460-1477.
[9] R. Carvajal, B. I. Godoy, J. C. Aguero, J. I. Yuz, and W. Creixell. EM-based ML channel estimation in
OFDM systems with phase distortion using RB-EKF. 17th International Symposium on Wireless
Personal Multimedia Communications (WPMC2014). (2014): 232-237.
[10] J. W. Choi, S. C. Kim, J. H. Lee, Y. H. Kim. Joint channel and phase noise estimation for full-duplex
systems using the EM algorithm. 2015 IEEE 81st Vehicular Technology Conference (VTC Spring):1-5.
X.-F. Li et al. / PMDA-Schemed EM Channel Estimator for OFDM Systems 273

[11] R. Carvajal, J. Agüero, B. I. Godoy, D. Katselis. EM-based sparse channel estimation in OFDM
systems with q−norm regularization in the presence of phase noise and frequency offset. 2015 7th IEEE
Latin-American Conference on Communications (LATINCOM). (2015): 1 - 6
[12] A. Assra; J. X. Yang; B. Champagne. An EM approach for cooperative spectrum sensing in
Multiantenna CR networks. IEEE Transactions on Vehicular Technology. 65(2016): 1229-1243
[13] R. Carvajal, J. C. Aguero, B. I. Godoy, and D. Katselis. A MAP approach for q-norm regularized
sparse parameter estimation using the EM algorithm, in Proc. of the 25th IEEE Int. Workshop on
Mach. Learning for Signal Process (MLSP 2015), Boston, USA, (2015): 1-6
[14] C. H. Cheng; H. L. Hung; J. H. Wen. Application of expectation-maximisation algorithm to channel
estimation and data detection techniques in ultra-wideband systems. IET Communications. 6(2012):
2480 - 2486
[15] M. L. Ku, W. C. Chen, C. C. Huang: EM-based iterative receivers for OFDM and BICM /OFDM
systems in doubly selective channels, IEEE Transaction Wireless. Communication, 10(2011):1405–
1415.
[16] M. Marey, M. Samir, and O. A. Dobre: EM-based joint channel estimation and IQ imbalances for
OFDM systems, IEEE Transaction and Broadcasting, 58 (2012): 106–113
[17] R. Carvajal, J. C. Aguero, B. I. Godoy, G. C. Goodwin: EM-Based Maximum-Likelihood Channel
Estimation in Multicarrier Systems With Phase Distortion. IEEE Transactions on Vehicular
Technology. 62(2013): 152-160.
[18] M. Hajjaj; W. Chainbi; R. Bouallegue: Two-step LMMSE channel estimator for Unique Word MB-
OFDM based UWB systems. 2015 International Wireless Communications and Mobile Computing
Conference (IWCMC). (2015): 1012-1016.
[19] V. Savaux; Y. Louet; F. Bader. Low-complexity approximations for LMMSE channel estimation in
OFDM/OQAM. 23rd International Conference on Telecommunications (ICT). (2016): 1-5.
[20] S. M. Key: Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, 1998:595.
[21] S. Haykin: Adaptive Filter Theory, Prentice-Hall, 3td Edition, 1996:989.
[22] Y. Srivastava, H. C. Keong, H. W. F. Patrick, S. Sumei: Robust MMSE channel estimation in OFDM
systems with practical timing synchronization, 2004 IEEE Wireless Communications and Networking
Conference, 2(2004):711-716.
[23] B. Dulek; O.Ozdemir; P. K.Varshney; W.Su.Distributed Maximum Likelihood Classification of Linear
Modulations Over Nonidentical Flat Block-Fading Gaussian Channels. IEEE Transactions on Wireless
Communications. 14(2015): 724-737
[24] G. X. Zhou, W. Xu, G. Bauch. Efficient Maximum Likelihood Detection with Imperfect Channel State
Information for Interference-limited MIMO Systems. SCC 2015; 10th International ITG Conference on
Systems, Communications and Coding; Proceedings of. 2015:1-6
[25] I. Ngebani, Y. B. Li, X. G. Xia, M. J. Zhao. EM-based phase noise estimation in vector ofdm systems
using linear MMSE receivers. IEEE Transactions on Vehicular Technology. 65(2016): 110-122.
[26] Hayder Al-Salihi; Mohammad Reza Nakhai. An enhanced whitening rotation semi-blind channel
estimation for massive MIMO-OFDM. 2016 23rd International Conference on Telecommunications
(ICT). 2016:1-6
[27] W. Feng; J. L. Li; L. Zhang. Blind channel estimation combined with matched field processing in
underwater acoustic channel. OCEANS 2016 - Shanghai, 2016: 1-4
[28] W. Feng, W. P. Zhu, M. N. S. Swamy, A semiblind channel estimation approach for MIMO-OFDM
Systems, IEEE Transactions on Signal Processing., 56(2008):2821-2834.
[29] G. C. G. Wei and M. A. Tanner: A Monte Carlo Implementation of the EM Algorithm and the Poor
Man’s Data Augmentation, Journal of the American Statistical Association, 85(1990): 699-704.
[30] M. A. Tanner and W. W. Hung: The Calculation of Posterior Distributions by Data Augmentation.
Journal of the American Statistical Association, 82(1987):528-540, B.D. Ripley: Stochastic Simulation,
New York: John Wiley.
[31] G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions (Second Edition). Wiley, A John Wiley & Sons, Inc., Publication, 2008.
[32] L. Tierney, R. E. Kass, and J. B. Kadane: Fully Exponential Laplace Approximations to Expectation and Variance of Nonpositive Functions, Journal of the American Statistical Association: theory and
method, (1989):710-716.
[33] X.Q. Ma; H. Kobayashi; S. C. Schwartz: An EM-based estimation of OFDM signals. 2002 IEEE
Wireless Communications and Networking Conference. 1(2002): 228 - 232.
274 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-274

Soil Heavy Metal Pollution Research Based on Statistical Analysis and BP Network

Wei-Wei SUN 1 and Xing-Ping SHENG
School of Mathematics and Statistics, Fuyang Normal College, Fuyang, Anhui, China 236041

1 Corresponding Author: Wei-Wei SUN, Lecturer, School of Mathematics and Statistics, Fuyang Normal College, Fuyang, Anhui, China 236041; E-mail: 93692849@qq.com.

Abstract. Heavy metals in soil not only affect the growth of plants but also harm people's health through the food chain, may cause problems such as air and water pollution, and impair the ecological function of urban soil. Therefore, in order to improve the living environment and solve the pollution problem thoroughly, we must find out the causes of heavy metal pollution. In this article, mathematical models are established through pollution index evaluation, statistical analysis, and BP network spatial interpolation. The degree of soil heavy metal pollution, its causes, and the locations of pollution sources are obtained, which provides an important basis for environmental protection and urban development.

Keywords. Heavy metal pollution, pollution index, correlation analysis, BP


network

Introduction

The soil is an important part of the urban ecosystem. It is necessary to verify anomalies of the soil environment and evaluate urban environmental quality from large amounts of data. Research on the evolution of the urban soil environment under the influence of human activity is of great significance for urban ecological construction, agricultural food safety, people's physical health, and sustainable development. In this paper, the content and spatial distribution of soil heavy metal elements such as Cu, Zn, Pb, Cd, Ni, Cr, As and Hg in a certain city are discussed by statistical analysis, the locations of pollution sources are determined using a BP network, and the influence factors of heavy metal pollution and the potential hazard to the environment are identified. The effects of different urban activities on soil heavy metal distributions provide a reference for evaluating the environmental effect of heavy metals and safeguarding the physical and mental health of urban residents.

1. Samples Collection

Since the influence of human activities on the environment differs from place to place, the city is divided into five functional areas: the living area, industrial area, mountainous area, main road area and park green area [1], denoted class 1, class 2, class 3, class 4 and class 5 respectively. In order to comprehensively analyze the urban soil heavy metal pollution problem, soil samples are collected first. To make the samples representative, a total of 319 soil samples are taken in the different functional areas using an approximate grid method (1 km x 1 km). The location, altitude and functional area of each sample are recorded by GPS, as shown in Table 1. Then the concentrations of the heavy metals Cu, Zn, Pb, Cd, Ni, Cr, As and Hg in each sample are measured with special equipment, as shown in Table 2. In addition, samples are taken in natural areas, away from crowds and industry by a distance of two kilometers, to serve as the soil element background values of the city, as shown in Table 3.
Table 1. Location and functional area of samples

Sample number X(m) Y(m) Altitude (m) Functional area


1 74 781 5 4
2 1373 731 11 3
3 1321 1791 28 4
4 0 1787 4 2
5 1049 2127 12 4
6 1647 2728 6 1
7 2883 3617 15 4
8 2383 3692 7 2
9 2708 2295 22 4
……
318 5985 2567 44 4
319 7653 1952 48 5

Table 2. Concentrations of eight kinds of heavy metals

Sample number | As (μg/g) | Cd (ng/g) | Cr (μg/g) | Cu (μg/g) | Hg (ng/g) | Ni (μg/g) | Pb (μg/g) | Zn (μg/g)
1 7.84 153.80 44.31 20.56 266.00 18.20 35.38 72.35
2 5.93 146.20 45.05 22.51 86.00 17.20 36.18 94.59
3 4.90 439.20 29.07 64.56 109.00 10.60 74.32 218.37
4 6.56 223.90 40.08 25.17 950.00 15.40 32.28 117.35
5 6.35 525.20 59.35 117.53 800.00 20.20 169.96 726.02
6 14.08 1092.90 67.96 308.61 1040.00 28.20 434.80 966.73
7 8.94 269.80 95.83 44.81 121.00 17.80 62.91 166.73
8 9.62 1066.20 285.58 2528.48 13500.00 41.70 381.64 1417.86
9 7.41 1123.90 88.17 151.64 16000.00 25.80 172.36 926.84
……
318 7.56 63.50 33.65 21.90 60.00 12.50 41.29 60.50
319 9.35 156.00 57.36 31.06 59.00 25.80 51.03 95.90

Table 3. Background values of eight kinds of heavy metals

Element    Average  Standard deviation  Scope
As (μg/g)  3.6      0.9                 1.8~5.4
Cd (ng/g)  130      30                  70~190
Cr (μg/g)  31       9                   13~49
Cu (μg/g)  13.2     3.6                 6.0~20.4
Hg (ng/g)  35       8                   19~51
Ni (μg/g)  12.3     3.8                 4.7~19.9
Pb (μg/g)  31       6                   19~43
Zn (μg/g)  69       14                  41~97

2. Degree of Heavy Metal Pollution

2.1. Evaluation Method of Pollution

At present, the common evaluation methods for soil heavy metal pollution include the accumulated index method, the pollution index method, the potential ecological harm index method, and so on [2]. In this paper, the single factor index method and the Nemerow comprehensive pollution index method are used to evaluate heavy metal pollution in the different soil types. The formula is [3]
2
§1 n ·
max pi  ¨ ¦ pi ¸
2

pi
ci
, pc ©n i 1 ¹
si 2 (1)

where $p_i$ is the single factor pollution index of heavy metal $i$, $c_i$ is the measured
concentration of heavy metal $i$ in the soil, $s_i$ is the background concentration value
of heavy metal $i$, $p_c$ is the Nemerow comprehensive index, and $n$ is the number of
kinds of heavy metals.
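As a concrete illustration of Eq. (1), the following short Python sketch (our own minimal example, not the authors' code; it assumes the measured concentrations and background values are held in NumPy arrays in the order As, Cd, Cr, Cu, Hg, Ni, Pb, Zn) computes the single factor indices and the Nemerow comprehensive index:

import numpy as np

def nemerow_index(c, s):
    # Single factor indices p_i = c_i / s_i and Nemerow comprehensive index p_c of Eq. (1)
    p = np.asarray(c, dtype=float) / np.asarray(s, dtype=float)
    pc = np.sqrt((p.max() ** 2 + p.mean() ** 2) / 2.0)
    return p, pc

# Example: sample 1 of Table 2 against the background values of Table 3
p, pc = nemerow_index([7.84, 153.80, 44.31, 20.56, 266.00, 18.20, 35.38, 72.35],
                      [3.6, 130, 31, 13.2, 35, 12.3, 31, 69])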
In order to describe the pollution degree of heavy metals in different regions of the
city, China's green food origin environmental quality evaluation outline is adopted [4];
the relevant standards are shown in Table 4:
Table 4. Classification standard of element pollution

Single factor index:  p_i ≤ 1 (no pollution);  1 < p_i ≤ 2 (light pollution);  2 < p_i ≤ 4 (moderate pollution);  p_i ≥ 4 (high pollution)
Comprehensive index:  p_c ≤ 1 (security);  1 < p_c ≤ 2 (cordon);  2 < p_c ≤ 3 (light pollution);  3 < p_c ≤ 5 (moderate pollution);  p_c ≥ 5 (high pollution)

2.2. Evaluation Result

Combining the data, the single factor index $p_i$ and the Nemerow comprehensive
index $p_c$ of the eight heavy metals for the five functional areas of the city are obtained by
Eq. (1), as shown in Table 5.
Table 5. Pollution index of soil heavy metal

Functional area     As    Cd    Cr    Cu    Hg    Ni    Pb    Zn    Comprehensive index
Living area         1.53  2.28  1.70  3.57  5.96  1.39  1.93  2.84  4.516
Industrial area     1.59  2.40  1.80  4.59  9.69  1.44  2.01  3.16  7.248
Mountainous area    1.48  2.08  1.34  2.58  5.92  1.31  1.68  1.99  4.487
Main road area      1.57  2.33  1.73  4.17  8.58  1.40  1.99  2.92  6.448
Park green area     1.56  2.27  1.70  3.49  5.80  1.39  1.91  2.78  4.498
By comparison, there are some differences in soil heavy metal pollution between the
five functional areas. Specifically:
(1) Hg in all five functional areas belongs to high pollution; in particular, the pollution
indices in the industrial and main road areas are 9.69 and 8.58, obviously exceeding the standard.
The four elements As, Cr, Ni and Pb all belong to light pollution, and Cd and Zn both belong to
moderate pollution. Cu is high pollution in the industrial and main road areas and
moderate pollution in the other three.
(2) From the comprehensive pollution index it can be seen that the industrial area and
the main road area belong to high pollution, while the remaining three areas are moderate pollution.
In decreasing order, the pollution degree is: industrial area, main road area,
living area, park green area, mountainous area.
(3) Integrating all the data, the concentrations of all heavy metals in the industrial area
and main road area are significantly higher than in the other functional areas. This shows that
industrial and traffic pollution, exceeding the other sources, has become the dominant
pollution source in the city.

3. Cause of Pollution

Heavy metals in soil not only affect the growth of plants but also harm people's health
through the food chain, and may cause further problems such as air and water pollution.
Therefore, in order to improve the living environment and thoroughly solve the
pollution problem, we must find out the cause of the heavy metal pollution.
Table 6. Statistics of heavy metal content in soil

                            As     Cd      Cr      Cu       Hg      Ni     Pb      Zn
Minimum                     1.61   40.00   15.32   2.29     8.57    4.27   19.68   32.86
Maximum                     30.13  1619.8  920.84  2528.48  16000   142.5  472.48  3760.82
Mean                        5.68   302.40  53.51   55.02    299.71  17.26  61.74   201.20
Standard deviation          3.02   225.27  8.36    12.75    42.30   9.93   49.98   22.39
Variation coefficient (%)   53     74      11      23       14      57     81      11
Overstandard rate (%)       77.4   79.6    80.6    88.4     66.5    75.2   81.5    79.0
Background value            3.6    130     31      13.2     35      12.3   31      69
First, the mean, standard deviation and variation coefficient of the eight heavy metals
in the city's soil are calculated using Matlab; the specific results are shown in
Table 6.
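The statistics of Table 6 can be reproduced along the following lines (a minimal Python sketch under the assumption that the 319 x 8 concentration matrix is available as a NumPy array; the paper itself used Matlab):

import numpy as np

def describe(conc, background):
    # conc: (n_samples, 8) concentration matrix; background: (8,) background values
    mean = conc.mean(axis=0)
    std = conc.std(axis=0, ddof=1)
    cv = 100.0 * std / mean                               # variation coefficient, %
    over_rate = 100.0 * (conc > background).mean(axis=0)  # over-standard rate, %
    return conc.min(axis=0), conc.max(axis=0), mean, std, cv, over_rate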
From the data in Table 6 we can see the following:
(1) The average contents of the eight heavy metals in soil are all higher than the
background values; As, Cd, Cr, Cu, Hg, Ni, Pb and Zn are respectively 1.58, 2.33, 1.73,
4.17, 8.56, 1.40, 1.99 and 1.40 times the background value. Cu and Hg have
accumulated to a certain extent, mainly from human industrial activity and traffic
activities.
(2) According to the rough classification rule of the variation coefficient, values from
28.8% to 60.62% belong to moderate variation. In the city, As, Cd and Pb in soil show
strong variation, Ni shows medium variation, and Cr, Cu, Hg and Zn show weak variation. The
variation coefficients of the eight elements range from 11% to 81%, so the variability is very
large. It is apparent that the soil pollution may be affected by the unreasonable layout of
human activities and by the influence of enterprises and road traffic.
Second, geochemical studies of elements show that elements with similar origins
often have good correlation [5]. Therefore, heavy metal elements with higher statistical
correlation have similarities in origin. Through factor analysis with the SPSS statistical
software, the correlation coefficients between the heavy metals are obtained, as listed in
Table 7.
Table 7. Correlation coefficient of heavy metal content in soil

Element As Cd Cr Cu Hg Ni Pb Zn
As 1
Cd 0.2547 1
Cr 0.1890 0.3524 1
Cu 0.1597 0.3967 0.5316 1
Hg 0.0644 0.2647 0.1032 0.4167 1
Ni 0.3166 0.3294 0.7158 0.4946 0.1029 1
Pb 0.2899 0.6603 0.3828 0.5200 0.2981 0.3068 1
Zn 0.2469 0.4312 0.4243 0.3873 0.1958 0.4364 0.4937 1
Table 7 shows that the correlation between Cr and Ni is the strongest, explaining that
their sources are roughly the same. Their correlation coefficient is the maximum, 0.7158,
indicating that Cr and Ni in the soil have the closest relation and that their contents influence
each other. Next, Cd and Pb are significantly positively correlated, implying that a similar
process controls the distribution features of these heavy metal elements in the soil.
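For illustration, such a correlation matrix can be obtained from the same concentration matrix with a single NumPy call (a sketch of the computation only; the paper used SPSS):

import numpy as np

def heavy_metal_correlation(conc):
    # Pearson correlation matrix between the eight heavy metals;
    # conc: (n_samples, 8) array with columns As, Cd, Cr, Cu, Hg, Ni, Pb, Zn
    return np.corrcoef(conc, rowvar=False)   # 8 x 8 symmetric matrix as in Table 7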

4. The Location of Pollution Sources

4.1. Statistical Regression Model

If the concentration of a heavy metal reaches its maximum value at a certain place, that
place is the location of the pollution source [6]. Following this idea, we try to establish a
functional relation between the concentration U of a heavy metal and the three-dimensional
coordinates (x, y, h) of the sample points, using the 319 data points to interpolate and fit the
ternary function U(x, y, h) and then computing the maximum of the function. However, since
Matlab cannot perform ordinary interpolation and fitting of four-dimensional scattered data,
we consider a statistical regression model. In order to find the relation between the
concentration U and the coordinates x, y, h, we first make three scatter plots of U against
x, y and h respectively from the data; element As, for example, is shown in Figure 1:

Figure 1. Scatter plot of element As


From Figure 1, we cannot see an obvious functional relation between U and
x, y, h, so it is difficult to establish a regression model directly. In fact, there is a
highly complex nonlinear mapping between the content of a heavy metal in
soil and its spatial location, so such a problem should not be solved with
conventional modeling methods.

4.2. BP Network Model

In order to make full use of the information in the sample data and to determine the spatial
position of the largest concentration of every heavy metal in the urban area, this article
adopts a BP neural network to densify the spatial interpolation [7]. A BP network can learn and
store a large number of input-output mappings without the mathematical
equations describing the mapping relationship having to be specified in advance [8]. Therefore,
the elevation can be effectively integrated into the network, and the stability and precision of the
network are improved. The specific algorithm is as follows (a minimal code sketch is given after the steps):
(1) Determine the topology of the BP network. The input and output nodes are determined by
the problem: there are three input nodes, namely the three-dimensional
coordinates x, y, h of a sample point, and eight output nodes, namely the concentrations
of the eight heavy metals. The number of hidden layer nodes is given by the
empirical formula [9]

$n = \sqrt{n_i + n_o} + E$   (2)

where $n_i$ and $n_o$ are the numbers of neurons in the input layer and output layer respectively,
and $E$ is an integer between 1 and 10.
(2) Initialization. The initial weights, learning rate, error accuracy and maximum
number of iteration steps are set.
(3) For each training sample, the forward error is calculated. If the error is greater
than the required accuracy, the weights are modified backwards layer by layer. When the error
accuracy or the maximum number of iteration steps is reached, the BP algorithm stops. Thus the
mapping relationship between the concentrations of heavy metals and spatial location is determined.
(4) Use the trained BP network for spatial interpolation. The sampling intervals between all
sample points are reduced (to 10 m), and the coordinates x, y, h of all densified points are input into
the BP network; the network then automatically calculates the concentrations of the eight
heavy metals.
(5) From the maxima of the heavy metal concentrations over all interpolation points and
sample points, the position coordinates of the pollution sources are determined.
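The following Python sketch illustrates steps (1)-(5) above. It is only a minimal stand-in for the authors' hand-coded BP network, using scikit-learn's MLPRegressor; the grid step, the fixed elevation and the helper name are our own assumptions for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor

def locate_sources(coords, conc, grid_step=10.0):
    # Steps (1)-(3): a 3-6-8 network trained by back-propagation
    net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=10000, tol=1e-4)
    net.fit(coords, conc)                       # coords: (319, 3); conc: (319, 8)
    # Step (4): densify the sampling grid to grid_step metres (elevation fixed at its mean here)
    xs = np.arange(coords[:, 0].min(), coords[:, 0].max(), grid_step)
    ys = np.arange(coords[:, 1].min(), coords[:, 1].max(), grid_step)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel(), np.full(gx.size, coords[:, 2].mean())])
    pred = net.predict(grid)
    # Step (5): the interpolation point with the maximum predicted concentration of each metal
    return grid[pred.argmax(axis=0)]            # (8, 3) estimated source coordinates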

4.3. Experimental Results

250 of the sample data are randomly taken as the training sample set, and the remaining 69
samples are used as test data. The number of hidden layer nodes is six by Eq. (2), so the topology
of the BP network is 3-6-8 here. All the initial network weights take random values within the
range [-1, 1], the learning rate is η = 0.9, the error accuracy is set to 0.0001, and the maximum
number of iteration steps is 10000. The results are shown in Tables 8 and 9.
Table 8. The experimental results of BP network

Algorithm     Iteration steps    Error of training (1000 steps)    Recognition rate of test sample
BP network    304                4.5951e-08                        99.98%

Table 9. The location of pollution sources

As Cd Cr Cu Hg Ni Pb Zn

x (m) 18134 18101 18034 18014 17985 17934 18253 19002

y (m) 10046 10012 9946 9926 9883 10085 10093 9887

h(m) 41 43 42 43 44 41 45 46

Area 4 4 4 2 2 4 1 4
From Table 9, we can see that the pollution sources of As, Zn, Cr, Ni and Cd are all in
the fourth area, namely the main road area. The pollution sources of Cu and Hg are both in the second
area, namely the industrial area. The pollution source of Pb is in the first area, namely the living
area.

5. Conclusions

In this paper, statistical analysis and BP network higher-dimensional interpolation are
used to study heavy metal pollution in soil. Conclusions are drawn on the pollution degree,
the causes, the pollution source locations and the transmission characteristics. The
model can not only be applied to other heavy metal pollutants not mentioned in this
article, but can also be extended to other problems such as air and water pollution.

Acknowledgment

The first author is grateful to Associate Professor Hai Wu of Fuyang Normal College
for helpful discussions on soil pollution. The related works are supported by Natural
Science Research Project in Anhui Universities (2015KJ003, KJ2015A161) and
Natural Science Foundation in Anhui province (1508085MA12).

References

[1] C. M. Li. Spatial distribution characteristics of soil heavy metal in urban and influencing factors. Journal
of Jinzhong University, 31(2014):24-27.
[2] J. Tang, C. Y. Chen, H. Y. Li, et al. Assessment on potential ecological hazard and human health risk of
heavy metals in urban soil of Daqing city. Geographical Science, 31(2011):118-122.
[3] J. Yin, Y. L. Liu. Spatial Distribution and Pollution Evaluation of Heavy Metal in Shanghai
Urban-Suburb Soil. Modern Agricultural Science and Technology, 10(2010):251-255.
[4] H. D. Wang, F. M. Fang, H. F. Xie, et al. Pollution evaluation and source analysis of heavy metal in
urban soil of Wuhu city. Urban Environment and Urban Ecology, 23 (2010):36-40.
[5] Y. Qian, W. Zhang, D. C. Ran. The chemical speciation and influencing factors of heavy metals in
Qingdao urban soils. Environmental Chemistry, 30(2011): 652-657.
[6] J. J. Chen, H. H. Zhang, J. M. Liu, et al. Spatial distributions and controlled factors of heavy metals in
surface soils in Guangdong based on the regional geology. Ecology and Environmental Sciences,
20(2011):646-651.
[7] D. W. Hu, X. M. Bian, S. Y. Wang et al. Study on spatial distribution of farmland soil heavy metals in
Nantong City based on BP -ANN modeling. Journal of Safety and Environment, 7(2007):91-95.
[8] G. Li and P. Niu. An enhanced extreme learning machine based on ridge regression for regression.
Neural Computing and Application, 22(2013): 804-810.
[9] R. Zhang, Z. B. Xu, G. B. Huang. Global convergence of online BP training with dynamic learning rate.
IEEE Transactions on Neural Networks and Learning Systems, 23(2012): 330-33.
282 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-282

An Improved Kernel Extreme Learning


Machine for Bankruptcy Prediction
Ming-Jing WANG, Hui-Ling CHEN1, Bin-Lei ZHU, Qiang LI, Ke-Jie WANG and
Li-Ming SHEN
College of Physics and Electronic Information Engineering, Wenzhou University,
325035, Wenzhou, China

Abstract. In this study, a novel parameter tuning strategy for a kernel extreme
learning machine (KELM) is constructed using an improved particle swarm
optimization method based on differential evolution (EPSO). First, the proposed
EPSO is used to obtain the global optimum by introducing the differential
evolution mutation strategy. Then, the EPSO is used to construct an effective and
stable KELM model for bankruptcy prediction. The resultant EPSO-KELM model
is compared to two other competitive KELM methods based on traditional particle
swarm optimization and the genetic algorithm via a 10-fold cross validation
analysis. The experimental results indicate that the proposed method achieved
superior results compared to the other two methods when applied to two financial
datasets. When applied to the Polish bankruptcy dataset, the EPSO-KELM
achieved a classification accuracy (ACC) of 83.95%, an area under the receiver
operating characteristic curve (AUC) of 0.8443, and Type I error and Type II error
of 13.15% and 16.61%, respectively. In addition, the proposed method achieved an
ACC of 87.10%, AUC of 0.8716, and Type I error and Type II error of 15.53%
and 10.13%, respectively, when applied to the Australian dataset. Therefore, the
proposed EPSO-KELM model could be effectively used as an early risk warning
system for bankruptcy prediction.

Keywords: Kernel extreme learning machine; Parameter tuning; Improved particle


swarm optimization; Bankruptcy prediction

Introduction

Under normal economic operations, numerous enterprises are at risk of bankruptcy at


any given time. Therefore, an accurate and reliable early warning detection system is
needed by these companies to predict potential financial risks. Existing
decision-making systems for bankruptcy prediction are primarily based on statistical
theory. In these systems, a bankruptcy prediction task is represented as a binary
classification task, the output of which is generated by a decision-making model. In
recent years, significant progress has been made in artificial intelligence (AI), and
decision-making models based on AI methods, such as artificial neural networks,
support vector machines, and rough sets, have been increasingly applied to financial
fields. In addition, several financial decision-making methods based on extreme
learning machine (ELM) theory have been developed, such as bankruptcy prediction

1
Corresponding author: Hui-Ling CHEN, College of Physics and Electronic Information Engineering,
Wenzhou University, 325035, Wenzhou, China; E-mail: chenhuiling.jlu@gmail.com.

[1], corporate life cycle [2], and corporate credit rating [3] models. However, ELMs
can yield inaccurate results when applied to most practical tasks. In order to improve
the accuracy of ELMs, Huang et al. [4] developed the kernel extreme learning machine
(KELM). In a KELM, the connection weights between hidden layers and the input
layer do not have to be generated randomly, improving the performance and training
speed of the decision-making process. Since its development, the KELM has been
widely applied to various fields.
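For concreteness, the closed-form KELM solution with an RBF kernel can be sketched as follows (a minimal Python illustration following Huang et al.'s formulation [4]; the NumPy implementation and variable names are our own assumptions, not the authors' code). Here C is the penalty parameter and gamma the kernel bandwidth discussed below.

import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C, gamma):
    # Closed-form solution: alpha = (I / C + Omega)^-1 T, where Omega_ij = K(x_i, x_j)
    Omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + Omega, T)

def kelm_predict(X_new, X_train, alpha, gamma):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] alpha; for classification, take the arg-max over label columns
    return rbf_kernel(X_new, X_train, gamma) @ alpha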
However, the kernel bandwidth γ and penalty parameter C of a KELM
significantly influence its performance. The penalty C controls the relationship between
the complexity and fitting error minimization results of the model. The kernel
bandwidth γ defines the non-linear mapping from the input space to some
high-dimensional feature space. In recent years, methods inspired by biology, such as
the genetic algorithm [5], particle swarm optimization (PSO) [6], and artificial bee
colony [7] methods, have been used to determine these two key parameters. In this
study, an enhanced PSO strategy (EPSO) is developed by introducing the mutation
strategy based on differential evolution (DE) [8] in order to more effectively tune the
kernel bandwidth γ and penalty parameter C. In addition, the classification accuracy
(ACC), area under the receiver operating characteristic curve (AUC), Type I error,
and Type II error of the proposed EPSO-KELM model are compared to those of the
original PSO optimized KELM (PSO-KELM) and genetic algorithm optimized KELM
(GA-KELM). The experimental results indicate that the proposed EPSO-KELM
method more effectively detected enterprises at risk of bankruptcy.
The remainder of this paper is structured as follows. A brief description of the
proposed EPSO-KELM is presented in Section 1. The experimental design of the
proposed method is provided in Section 2. The results and discussion are presented in
Section 3. Lastly, the conclusions and recommendations for future studies are discussed
in Section 4.

1. Proposed Method

1.1. EPSO

During the PSO process, position updates are completed based on the conventional
strategy, in which each particle simply moves around the Pbest and Gbest without
re-diversifying the particle. Because the current best position of the Gbest may not be
the global optimum, each particle moves around itself before moving toward the Pbest
and Gbest. Thus, the following DE mutation strategy was implemented in the PSO
before performing position updates:
$X_k^i = X_k^{r_2} + F \cdot (X_k^{r_3} - X_k^{r_4})$   (1)
In this equation, $r_2$, $r_3$, and $r_4$ denote the randomly generated indices of particles
in the range [1, 2, ..., D] that are not equal to the current index $i$; $D$ denotes
the number of particles in the PSO; and F is a mutation parameter called the scaling
factor. The scaling factor F, which controls the amplification of the difference and
prevents stagnation during the global search, was defined as 0.7 herein.
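A minimal Python sketch of this mutation step (the particle positions are assumed to be stored row-wise in a NumPy array; the helper name is ours):

import numpy as np

def de_mutation(positions, F=0.7):
    # DE-style mutation applied to every particle before the PSO position update;
    # positions: (D, dim) array, one row per particle
    D = positions.shape[0]
    mutated = positions.copy()
    for i in range(D):
        # pick three distinct indices r2, r3, r4, all different from the current index i
        r2, r3, r4 = np.random.choice([j for j in range(D) if j != i], size=3, replace=False)
        mutated[i] = positions[r2] + F * (positions[r3] - positions[r4])
    return mutated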

1.2. EPSO-KELM

In this study, a novel EPSO-KELM model was developed for parameter optimization
problems using a KELM with an RBF kernel. The proposed model consisted of two
procedures, including the inner parameter optimization and outer performance
evaluation procedures. During the inner parameter optimization procedure, the
parameters of the KELM were tuned dynamically using the EPSO strategy via a 5-fold
cross validation (CV) analysis. Then, the obtained optimal parameters were substituted
into the KELM prediction model and used to perform bankruptcy prediction
classification tasks in the outer loop via a 10-fold CV strategy.
The classification accuracy was considered in the design of the fitness function,
written as:
$f = \mathrm{avgAcc} = \dfrac{1}{K}\sum_{i=1}^{K} \mathrm{testAcc}_i$   (2)

where avgAcc denotes the average test accuracy achieved by the KELM classifier
according to the 5-fold cross validation strategy ($K = 5$). The pseudo code of the proposed
method is detailed as follows:
Pseudo-code of the parameter optimization procedure
Begin
  Set the initial parameters, including the number of particles, the maximum/minimum search space
  and velocity, and the maximum number of iterations;
  For i = 1 to the number of particles
    Initialize the position and velocity of each particle;
    Decode C and γ from the position of each particle and calculate the fitness simultaneously;
      C = position(i, 1);
      γ = position(i, 2);
      Fitness(i) = Function(C, γ);
  end
  Find the Pbest and Gbest of the current situation and save them for the next comparison;
    [global_fitness, bestindex] = max(Fitness);
    Gbest = position(bestindex, :);
    Pbest = position;
    Local_fitness = Fitness;
  For j = 1 to the maximum number of iterations
    For k = 1 to the number of particles
      Adopt the mutation strategy to diversify each particle;
    end
    For l = 1 to the number of particles
      Update the position and velocity of each particle;
    end
    For m = 1 to the number of particles
      Restrict the search space of each particle so that its position and velocity do not go out of bounds;
    end
    For n = 1 to the number of particles
      Calculate the current fitness using the same coding scheme;
        C = position(n, 1);
        γ = position(n, 2);
        Fitness_1(n) = Function(C, γ);
    end
    Update the Gbest and Pbest;
    Since Gbest holds the best particle position found so far in the search space, BestC and Bestγ
    are taken as the position of Gbest;
      BestC = Gbest(1, 1);
      Bestγ = Gbest(1, 2);
  end
  Return BestC, Bestγ;
End

2. Experimental Studies

2.1. Data Description and Experimental Setup

In this study, the Wieslaw dataset [9] was used to construct the decision system. The
Wieslaw dataset is comprised of 240 cases with 30 financial ratios. Of the 240 Polish
enterprises, 112 declared bankruptcy from 1997 to 2011. The remaining 128 enterprises
did not declare bankruptcy during this period. All of the observations in this period
occurred 2 to 5 years before bankruptcy. In order to further illustrate the performance
of the proposed method, a slightly larger financial dataset, the Australian credit dataset,
was also used. This dataset consists of 307 creditworthy applicants and 383
non-creditworthy applicants.
The proposed EPSO-KELM, PSO-KELM, and GA-KELM models were
implemented in the MATLAB platform. In order to prevent numerical difficulties when
performing the calculations, the data was scaled to a range of [-1, 1] before
constructing the model. In order to obtain unbiased results, the ACC, AUC, Type I
error, and Type II error of the EPSO-KELM, PSO-KELM, and GA-KELM were
obtained via a 10-fold CV. Then, the average results were used to compare the
performance of the methods. The same number of generations and population swarm
size were adopted in the EPSO, PSO, and GA in order to ensure the accuracy of the
results. According to the results of this preliminary experiment, all of the methods
yielded satisfactory classification results when 280 generations and a swarm size of 8
were adopted. The values of C and γ were varied within the ranges C ∈ {2^-5, 2^-3, 2^-1, ..., 2^5}
and γ ∈ {2^-5, 2^-3, 2^-1, ..., 2^5}, respectively. The maximum velocities of the EPSO and PSO
were both approximately 65% of the dynamic ranges of the variable on each dimension,
with an acceleration coefficient of approximately 2.05 and an inertial weight of 1. The
mutation and crossover probabilities of the GA were approximately 0.7 and 0.4,
respectively.

2.2. Performance Evaluation Measurement

The classification accuracy (ACC), AUC, Type I error, and Type II error results were
used to test the performance of the proposed EPSO-KELM model. These criteria can
be written as:

$\mathrm{ACC} = \dfrac{TP + TN}{TP + FP + TN + FN} \times 100\%$   (3)

$\text{Type I error} = \dfrac{FP}{FP + TN} \times 100\%$   (4)

$\text{Type II error} = \dfrac{FN}{TP + FN} \times 100\%$   (5)
where TP denotes the number of true positives, FN denotes the number of false
negatives, TN denotes the number of true negatives, and FP denotes the number of
false positives. The AUC, the area under the ROC curve, is one of the most accurate
methods of comparing classifiers in two-class problems. The Type I error, defined as
FP/ (FP+TN), calculates the proportion of bankrupt cases incorrectly defined as
non-bankrupt cases. The Type II error, defined as FN/ (TP+FN), calculates the
proportion of non-bankrupt cases incorrectly defined as bankrupt cases.
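A direct Python transcription of Eqs. (3)-(5) (our own sketch; the confusion-matrix counts are assumed to be given):

def bankruptcy_metrics(tp, fp, tn, fn):
    # ACC, Type I error and Type II error from confusion-matrix counts, as in Eqs. (3)-(5)
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    type1 = 100.0 * fp / (fp + tn)
    type2 = 100.0 * fn / (tp + fn)
    return acc, type1, type2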

3. Experimental Results and Discussion

Table 1 displays the average ACC, AUC, Type I error, and Type II error results
obtained by the EPSO-KELM, PSO-KELM and GA-KELM using the two datasets.
According to the results, the performance of the proposed EPSO-KELM was superior
to that of the PSO-KELM and GA-KELM methods for both the Polish and Australian
datasets. The EPSO-KELM yielded an ACC of 83.95%, AUC of 0.8443, Type I error
of 13.15%, and Type II error of 16.61% when applied to the Polish dataset, and an
ACC of 87.10%, AUC of 0.8716, Type I error of 15.53%, and Type II error of 10.13%
when applied to the Australian dataset. In contrast, the PSO-KELM yielded an ACC of
82.19%, AUC of 0.8361, Type I error of 14.63%, and Type II error of 17.76% when
applied to the Polish dataset, and an ACC of 86.37%, AUC of 0.8638, Type I error of
16.62%, and Type II error of 10.60% when applied to the Australian dataset.
Furthermore, the GA-KELM yielded an ACC of 80.37%, AUC of 0.8078, Type I
error of 16.10%, and Type II error of 22.23% when applied to the Polish dataset, and
an ACC of 85.94%, AUC of 0.8583, Type I error of 17.06%, and Type II error of
11.34% when applied to the Australian dataset. These results indicate that the
EPSO-KELM achieved a higher classification accuracy than the other methods when
applied to bankruptcy prediction. The results also demonstrate that the solution quality
of the proposed EPSO-KELM was superior to that of the PSO-KELM and GA-KELM.

Table 1. Average ACC, AUC, Type I error, and Type II error of the two datasets.
                 Polish dataset                                    Australian dataset
Methods       ACC      AUC      Type I error   Type II error    ACC      AUC      Type I error   Type II error
EPSO-KELM     0.8395   0.8443   0.1315         0.1661           0.8710   0.8716   0.1553         0.1013
PSO-KELM      0.8219   0.8361   0.1463         0.1776           0.8637   0.8638   0.1662         0.1060
GA-KELM       0.8037   0.8078   0.1610         0.2223           0.8594   0.8584   0.1706         0.1124

The standard deviation reflects whether the performance of a model is reliable. The
classification accuracy and standard deviation of each model after 10 runs of the
10-fold CV with the Polish dataset are displayed in Figure 1. In this figure, the vertical
coordinate of each node represents the ACC, while the length of the bar represents the
standard deviation.
In Figure 1, the green line represents the results obtained by the GA-KELM. As
shown, a relatively high degree of fluctuation was observed in the GA-KELM results.
The results obtained by the PSO-KELM, denoted by the blue line, also exhibited a high
degree of fluctuation. In contrast, the results obtained by the EPSO-KELM, denoted by
the red line, were relatively reliable, with significantly lower standard deviations than
the other methods. Figure 2 displays the experimental results obtained using the
Australian dataset. As shown in this figure, the EPSO-KELM achieved higher ACC
values and lower standard deviations than the other two methods. According to the
above analysis, the proposed EPSO-KELM approach yielded more reliable and robust
results than the PSO-KELM and GA-KELM methods.

Figure 1. ACC and standard deviation after 10 runs of the 10-fold CV with the Polish dataset

Figure 2. ACC and standard deviation after 10 runs of the 10-fold CV with the Australian dataset
The evolutionary processes of the EPSO-KELM, PSO-KELM, and GA-KELM
meta-heuristic optimization methods were recorded using the Polish dataset in order to
analyze their optimization procedures. As shown in Figure 3, the three fitness curves
gradually improved from iteration 1 to iteration 280. However, no obvious
improvements were noted in the EPSO-KELM, PSO-KELM, or GA-KELM results
after iterations 44, 76, and 130. According to the above analysis, the proposed
EPSO-KELM demonstrated efficient convergence toward the global optimum, with an
average ACC of 83.95%. Thus, the performance of the EPSO-KELM was superior to
that of the PSO-KELM and GA-KELM when applied to bankruptcy prediction.

Figure 3. Average best fit results of the EPSO-KELM, PSO-KELM, and GA-KELM during the training
stage after one run of the 10-fold CV

4. Conclusions and Future Work

In this study, an effective and accurate approach, the EPSO-KELM, was developed in
order to precisely detect companies at risk of bankruptcy. In the proposed EPSO-based
approach, the generalization capabilities of the KELM classifier are combined with
PSO and EPSO to achieve optimum parameter tuning for financial decisions. The
experimental results indicated that the ACC, AUC, Type I error, and Type II error of
the KELM constructed with EPSO were superior to those of two other advanced
KELM bankruptcy prediction models constructed with PSO and GA. Therefore, the
proposed EPSO-KELM method could be used as an effective early warning system in
financial decision-making applications. In future studies, the efficacy of the proposed
EPSO-KELM will be tested using other datasets. In addition, the EPSO-KELM will be
applied to other financial problems.

Acknowledgements

This study was financially supported by the National Natural Science Foundation of
China (61303113) and the Science and Technology Plan Project of Wenzhou, China
(G20140048).

References

[1] Q. Yu, Y. Miche, A. Lendasse, et al., Bankruptcy prediction using extreme learning machine and
financial expertise. Neurocomputing, 128(2014): 296-302.
[2] S. J. Lin, C. Chang, and M. F. Hsu, Multiple extreme learning machines for a two-class imbalance
corporate life cycle prediction. Knowledge-Based Systems, 39(2013): 214-223.
[3] H. Zhong, C. Miao, Z. Shen, et al., Comparing the learning effectiveness of BP, ELM, I-ELM, and SVM
for corporate credit ratings. Neurocomputing, 128(2014): 285-295.
[4] G. B. Huang, H. Zhou, X. Ding, et al., Extreme learning machine for regression and multiclass
classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 42(2012):
513-529.
[5] B. Liu, J. Tang, J. Wang, et al., 2-D defect profile reconstruction from ultrasonic guided wave signals
based on QGA-kernelized ELM. Neurocomputing, 128(2014): 217-223.
[6] L. Zhang and J. Yuan, Fault Diagnosis of Power Transformers using Kernel based Extreme Learning
Machine with Particle Swarm Optimization. Applied Mathematics & Information Sciences, 9(2015):
1003-1010.
[7] C. Ma, J. H. Ouyang, H. L. Chen, et al., A novel kernel extreme learning machine algorithm based on
self-adaptive artificial bee colony optimisation strategy. International Journal of Systems Science,
2014: 1-16.
[8] K. Price, R.M. Storn, and J. A. Lampinen, Differential evolution: a practical approach to global
optimization. 2006: Springer Science & Business Media.
[9] W. Pietruszkiewicz, Dynamical systems and nonlinear Kalman filtering applied in classification, in:
Cybernetic Intelligent Systems (CIS 2008), 7th IEEE International Conference on, IEEE, 2008.
290 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-290

Novel DBN Structure Learning Method


Based on Maximal Information Coefficient
Guo-Liang LI, Li-Ning XING1 and Ying-Wu CHEN
College of Information Systems and Management, National University of Defense
Technology, Changsha 410073, China

Abstract. Dynamic Bayesian Network (DBN) is a mainstream approach to


modeling various biological networks including the gene regulatory network
(GRN). For DBN models that consist only of inter-timeslice arcs, most
current methods for learning them employ either a score and search approach or
Markov chain Monte Carlo (MCMC) simulation, both of which ignore the
structural constraints of DBN models. These structural constraints were first
applied to translate the structure learning problem into discovering associations
among variables, and then a new method was presented to obtain inter-timeslice
arcs. This method was based on maximal information coefficient (MIC).
Experiment results showed that the proposed MIC-based method outperformed
MI-based, MCMC, and K2 algorithm methods on the quality of learned structure.

Keywords. Dynamic Bayesian Networks (DBN), Structure Learning, Inter-


timeslice arc, Maximal Information Coefficient (MIC)

Introduction

Dynamic Bayesian networks (DBNs) are a useful and general representation to model
complex temporal processes, and are widely applied in bioinformatics for modeling
various biological networks including gene regulatory networks and metabolic
networks [1-3]. Learning dynamic Bayesian network structures means identifying
probabilistic relationships in time-series data, and is one of the most challenging
problems. The major methods for learning DBN are primarily adapted from the
approaches of learning static Bayesian network (SBN), namely the search and score
algorithm and Markov chain Monte Carlo (MCMC) simulation.
Some researchers address this problem with score and search algorithms.
Reference [4] combined multiple scoring criteria, such as BDe and BIC, and heuristic
search strategies, including greedy searching, simulated annealing, and genetic
algorithm, to design many structure learning algorithms. Reference [5] made changes
to the EGA-DBN algorithm, using an immune algorithm to replace the genetic
algorithm, which achieved good convergence. Reference [6] proposed PS_DBN based
on particle swarm algorithm. Reference [7] built network stepwise and proposed a
novel DBN structure learning algorithm based on particle swarm optimization.
Reference [8] proposed an unsupervised genetic algorithm in which mutual information
is used in the selection of initial population to reduce the search space. Furthermore,

1
Corresponding Author: Li-Ning XING, College of Information Systems and Management, National
University of Defense Technology, Changsha 410073, China; E-mail: xing2999@qq.com.

this paper provided a new structure representation with no need of the acyclicity test
and a novel searching algorithm for BIC scores using family inheritance to enhance the
efficiency.
Searching only for the highest scoring network, as the score and search approach does,
may be questionable, especially for a small sample size, because the
posterior is likely to be relatively flat, so there is no single network with a clearly
highest score. Therefore, in many cases, it is more appropriate to consider the full
posterior distribution over the network models or, in reality, a set of high scoring
networks. MCMC methods are used to sample networks directly from the posterior,
which are applied as optimization or estimation procedures.
Dirk Husmeier first applied the MCMC method to dynamic Bayesian
network learning, and then explored related factors that affected the sensitivity and
specificity of learned networks [9]. Reference [10] relaxed the time-invariant
assumption and introduced a new type of graphical model called non-stationary
dynamic Bayesian networks. He then presented an MCMC sampling algorithm to learn
the model structure from time-series data. In the experiment part, he applied both
simulated and biological data to test the effectiveness of the algorithm. Reference [11]
improved the MCMC-based DBN structure learning framework with evolutionary
algorithms, and effectively improved the convergence rate.
In the Markov chain Monte Carlo approach, the Metropolis-Hastings acceptance
probability for a state transition from A to B is calculated, and each state is a DBN
that represents the whole structure. Unlike the MCMC approach, the score and
search approach learns the inter-network and intra-network separately. However, both
approaches ignore some structural constraints of DBN. In this paper, these structural
constraints were discovered and analyzed to transform the structure learning problem to
discovering associations among variables.
To search for the pairs of closely relevant variables in a dataset, the measurement
of relevance can be calculated for each pair, then the pairs are ranked by their scores,
and the high scoring pairs are examined. For this procedure, the statistic used to
measure relevance should have two heuristic properties, which include generality and
equitability.
Reference [12] presented a measure of dependence for two-variable relationships
that included the maximal information coefficient (MIC). They mathematically proved
that MIC is general, and tested its equitability through simulations. On the basis of their
tests, MIC was found to be useful for identifying and characterizing relevant
relationships in data.
The maximal information coefficient (MIC) was firstly applied to learn Bayesian
network structure in reference [13]. Reference [14] proposed a novel MIC-based
approach for data discretization, and created a new method for mutual information-
based structure learning with DBN. Reference [15] presented a novel algorithm named
MIC-BPSO (Maximal Information Coefficient – Binary Particle Swarm Optimization)
to learn Bayesian networks from data. This algorithm firstly makes use of MIC in
network construction phase to enhance the quality of initial populations, and then uses
the scoring function’s decomposability to update BPSO algorithm.
The remainder of this paper is organized as follows. Preliminaries about Dynamic
Bayesian networks and maximal information coefficient are presented in Section 1. In
Section 2, the structure learning method based on MIC is proposed. Then, Section 3
presents the experimental results of several benchmark datasets with known structures.
Finally, Section 4 includes the final conclusions and future research is outlined.

1. Preliminaries

1.1. Dynamic Bayesian Network

A brief review of DBN models is provided. Let $X = \{X_1, \ldots, X_n\}$ be a set of random
variables; let $\{x_i^1, \ldots, x_i^N\}$ be the data list corresponding to $X_i$ over $N$ time points. Let
$X_i[t]$ be the random variable with the value of $X_i$ at time $t$, and let
$X[t] = \{X_1[t], \ldots, X_n[t]\}$. A DBN represents the joint probability distribution function
over the set of $n \times N$ random variables $X[1] \cup X[2] \cup \ldots \cup X[N]$. Because of the arbitrary
complexity of this general probability distribution function, several assumptions are often applied
to simplify it. The two most popular assumptions are the first-order Markovian property, i.e.,
$P(X[t] \mid X[1], \ldots, X[t-1]) = P(X[t] \mid X[t-1])$, and the stationarity property, i.e.,
$P(X[t] \mid X[t-1])$ is independent of $t$. The first-order Markov stationary DBN is
obtained by using these two assumptions; in such a DBN, both the structure and the parameters of the
network remain unchanged over time.
DBN models are made up of two sub-networks: the prior network and the
transition network, as shown in Figure 1(a) [16, 17]. The prior sub-network includes only
intra-timeslice arcs, while the transition sub-network contains both inter-timeslice and
intra-timeslice arcs, as shown in Figures 1(b) and (c). Collecting m independent data
samples is necessary for learning the prior sub-network. However, abundant data is often not
available for learning biological networks, so it is practical and relevant to
learn only the transition sub-network. Therefore, DBN here refers only to the transition sub-
network portion. Some researchers have further restricted the transition sub-network to
contain only inter-timeslice arcs [18, 19].
Figure 1. Dynamic Bayesian network diagram: (a) a DBN containing three time slices; (b) the prior network; (c) the transition network; (d) a DBN containing only inter-timeslice edges.


In the field of genetic networks, inter-timeslice arcs refer to time-
delayed effects, while intra-timeslice arcs refer to real-time effects. In practice, only
time-delayed genetic effects are biologically reasonable, because of the time needed for
the folding, translation, turnover time-scales, nuclear translocation for the regulatory
protein, and the time scale for elongation of the target mRNA[20]. When compared to
the sampling gap, the time lag is too small to be seen as an instant effect. For this paper,
DBN is considered with only inter-timeslice arcs, as presented in Figure 1(d).

1.2. Maximal Information Coefficient

Assume $\max_G I(X;Y)$ is the maximum mutual information over all grids $G$ of size $|X| \times |Y|$,
and let $I^*_{|X|,|Y|}(X;Y) = \max_G I(X;Y)$. MIC is defined as follows:

$\mathrm{MIC}(X;Y \mid D) = \max_{|X||Y| < B(N)} \left\{ \dfrac{I^*_{|X|,|Y|}(X;Y \mid D)}{\log \min\{|X|,|Y|\}} \right\}$   (1)

where $N$ is the size of the dataset and $B(N)$ is the maximum grid size, which Reshef et al.
suggested to be $B(N) = N^{0.6}$ [12]. With an increasing number of quantization levels, the
classical MI tends to increase, so a normalized MI is used. From the definition of MIC, we can
see that it searches for the highest normalized MI over every possible grid with size up to $B(N)$.
We can also notice that the MIC between two random variables $X$ and $Y$ is
symmetric: $\mathrm{MIC}(X;Y \mid D) = \mathrm{MIC}(Y;X \mid D)$.
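As a rough illustration of this definition, the following Python sketch approximates MIC by trying all equal-width grids up to size B(N) and keeping the best normalized mutual information (the official MIC algorithm also optimizes the cell boundaries of each grid, so this is an approximation for illustration only):

import numpy as np

def mutual_info(x, y, bx, by):
    # Mutual information of x and y discretized on a bx-by-by equal-width grid
    pxy, _, _ = np.histogram2d(x, y, bins=(bx, by))
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def mic_approx(x, y):
    B = int(len(x) ** 0.6)                     # maximum grid size B(N) = N^0.6
    best = 0.0
    for bx in range(2, B + 1):
        for by in range(2, B + 1):
            if bx * by > B:
                continue
            best = max(best, mutual_info(x, y, bx, by) / np.log(min(bx, by)))
    return best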

2. The MIC-based Solution Method

2.1. Structural Constraints

The structural constraints in Dynamic Bayesian networks (DBNs) are demonstrated in


this section, and applied to transform the structure learning problem to discovering
associations among variables.
The DBN's stationarity and Markovian property force every inter-timeslice arc to point
from a variable $X_i[t-1]$ of time $t-1$ to a variable $X_j[t]$ of time $t$. Therefore,
when an association is discovered between two nodes, its direction is determined.
In DBN structure learning, it is assumed that many temporal data sequences are
available. A complete dataset $D = \{D_1, \ldots, D_N\}$ includes $N$ sequences, where
each $D_u$ consists of instances $D_u[t] = \{x_{u,1}[t], \ldots, x_{u,n}[t]\}$, for $t = 0, \ldots, T$ (where $T$ is
the total number of slices/frames apart from the initial one). Note that there is an
implicit order among the elements of each $D_u$. The data of the first timeslice is denoted by
$D[0] = \{D_u[0] : 1 \le u \le N\}$, and the data of the $t$th timeslice, with $1 \le t \le T$, by
$D[t] = \{(D_u^t, D_u^{t-1}) : 1 \le u \le N\}$ (including the $(t-1)$th timeslice's data because it is needed for
learning the transitions).
The nodes are denoted as $(X_1, \ldots, X_n, X_1', \ldots, X_n')$, and a new dataset $D'$ is
constructed, including $N \cdot T$ elements $\{D^1, \ldots, D^T\}$. Note that $D'$ is
actually a dataset covering $2n$ variables, because it is formed of pairs $(D_u^{t-1}, D_u^t)$,
which are complete instantiations of the variables of the transition network in the DBN,
containing the elements of two consecutive slices.
There are two additional structural constraints:

$\forall_{1 \le i \le n}\ \mathrm{arc}(X_i, X_i')$   (2)

$\forall_{1 \le i \le n}\ \mathrm{indegree}(X_i, 0, \mathrm{eq})$   (3)

Eq. (2) forces an arc between the same variable in consecutive timeslices (in fact, this
constraint can be dropped if there is no need to force each variable to be associated with
itself in the previous timeslice). Eq. (3) forces the variables $X_1, \ldots, X_n$, which represent
the previous slice, to have no parents.

2.2. Obtaining Inter-Timeslice Arcs Based on MIC

When obtaining inter-timeslice arcs in a DBN, MIC is applied as follows: if the MIC between
two variables from consecutive slices is high, the two nodes are directly associated with each
other; otherwise, if the MIC is very low, they are independent of each other, which means there
is no connecting edge between them in the DBN structure.
MIC is a useful tool to measure the degree of dependence between two variables
from two consecutive slices. To obtain the inter-timeslice arcs, the MIC is calculated between
every variable and all variables in the next slice of the DBN. Then, the maximum MIC
(MaxMIC) for each variable is recorded. A threshold value $\alpha = 0.9$ of the maximum MIC for
each variable was found appropriate to include most of the true arcs [21]. If either of the
following conditions in (4) is satisfied, an undirected arc is inserted between the two variables.

$\mathrm{MIC}(X, Y) \ge \alpha \cdot \mathrm{MaxMIC}(X)$  OR  $\mathrm{MIC}(Y, X) \ge \alpha \cdot \mathrm{MaxMIC}(Y)$   (4)
The pseudo-code for obtaining the DBN structure with only inter-timeslice arcs is as follows
(a Python sketch of the selection rule follows the procedure):

Obtain inter-timeslice arcs based on MIC
Input: V = {X_1, ..., X_n, X_1', ..., X_n'} (the variable set); D (dataset)
Output: G (DBN structure)
Begin Procedure
  Compute MIC(X_i, X_j') (i ≠ j; i, j = 1, 2, ..., n);
  Find MaxMIC(X_i) for each variable X_i (i = 1, 2, ..., n);
  Select node pairs with the threshold α = 0.9:
    selected(X_i, X_j') = 1 if MIC(X_i, X_j') ≥ α · MaxMIC(X_i), and 0 otherwise;  or
    selected(X_i, X_j') = 1 if MIC(X_i, X_j') ≥ α · MaxMIC(X_j'), and 0 otherwise;
  Obtain the selected pairs (i, j = 1, 2, ..., n) using the symmetry of MIC;
  Get the inter-timeslice arcs from the selected pairs;
  Return graph G;
End Procedure
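A minimal Python sketch of the selection rule of Eq. (4) (it assumes the pairwise MIC values between previous-slice variables X_i and next-slice variables X_j' have already been computed into an n x n matrix, for example with a MIC library or the approximation sketched in Section 1.2):

import numpy as np

def select_inter_timeslice_arcs(mic, alpha=0.9):
    # mic[i, j] = MIC(X_i, X_j'); returns a boolean adjacency matrix of arcs X_i[t-1] -> X_j[t]
    max_mic_prev = mic.max(axis=1, keepdims=True)   # MaxMIC(X_i) over the next-slice variables
    max_mic_next = mic.max(axis=0, keepdims=True)   # MaxMIC(X_j') over the previous-slice variables
    return (mic >= alpha * max_mic_prev) | (mic >= alpha * max_mic_next)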

3. Experiment

To evaluate the proposed method, the Dynamic Asia network was first selected as a test
sample, followed by two other networks. The method was compared to the MI, K2, and MCMC
algorithms in terms of both structure learning accuracy and efficiency [22].
To evaluate the accuracy of a learned structure, the F-score was used as a
composite indicator of the precision and recall of the structure learning algorithm [23].
In this study, all experiment programs were implemented in Matlab based on
the BNT toolkit developed by K. Murphy. The platform was a PC with a Pentium (R) 4
3.20 GHz CPU and 1 GB RAM, running Windows XP.
All experiments in this study were repeated 10 times for each sample size and
each Bayesian network benchmark: ten datasets were randomly generated for each
sample size of the BN, and the average performance indicator was calculated as the final
result. For each dataset, a Dynamic Bayesian network was learned by each of the
methods.
Table 1 and Figure 2 demonstrate the structure learning results of the Dynamic
Asia network only with inter-timeslice arcs. Compared with other methods, the
proposed method based on MIC obtained better results by obtaining a higher F-score.
Table 1. Results of different methods for Dynamic Asia network structure learning

Sample size      2000              4000              6000              8000              10000
Method      F-Score  Time(s)  F-Score  Time(s)  F-Score  Time(s)  F-Score  Time(s)  F-Score  Time(s)
MIC         0.54     0.03     0.61     0.04     0.66     0.05     0.69     0.06     0.71     0.06
MI          0.50     0.02     0.55     0.08     0.59     0.11     0.61     0.12     0.62     0.13
K2          0.20     1.21     0.32     2.17     0.46     2.88     0.51     3.45     0.54     3.83
MCMC        0.40     5.18     0.41     10.86    0.44     18.52    0.47     29.45    0.51     43.32

Figure 2. Results of F-score for different sample sizes and methods
There are two other dynamic networks including the WATER network and BAT
network. The WATER network is a dynamic network that monitors biochemical
processes in water supply plants. There are 12 properties, and the corresponding two-
timeslice transition network includes a total of 24 vertices and 26 edges, as shown in Figure
3(a). The BAT network is a dynamic Bayesian network for highway traffic monitoring.
There are 28 attributes, and the corresponding two-timeslice transition network
includes a total of 56 vertices and 42 edges, as shown in Figure 3(b).

Figure 3. The two dynamic Bayesian networks


The structure learning results for the WATER network and BAT network are
shown in Tables 2 and 3 and in Figure 4. We can see that the MIC-based method
performed better than the MI-based and MCMC methods in DBN structure learning.
In the small and medium-scale DBN structure learning, the MIC-based method
obtained structure with a higher F-score than the MI-based method although it took
more time. Compared to the MCMC method, the MIC-based method performed better
both in F-score and time consumption.
Table 2. Results of each method in WATER network structure learning
Sample size Performance indicator MIC MI MCMC
F-score(+Time/s) 0.851(+0.112) 0.8181(+0.039) 0.6382(+5.152)
2000 ADD 0 0 5
DELETE 5.2 6.6 8
F-score(+Time/s) 0.878(+0.405) 0.8372(+0.053) 0.6808(+12.03)
4000 ADD 0 0 4
DELETE 4.7 5.8 8
F-score(+Time/s) 0.9(+0.724) 0.85(+0.069) 0.72(+22.62)
6000 ADD 0 0 4
DELETE 3.3 5.2 6
F-score(+Time/s) 0.933(+1.083) 0.864(+0.088) 0.75(+34.81)
8000 ADD 0 0 3
DELETE 2.5 4.8 6
F-score(+Time/s) 0.948(+1.481) 0.87(+0.109) 0.78(+48.43)
10000 ADD 0 0 3
DELETE 2.3 4.7 5
Table 3. Results of each method in BAT network structure learning
Sample size Performance indicator MIC MI MCMC
F-score(+Time/s) 0.8824(+0.224) 0.8266(+0.062) 0.5882(+7.612)
2000 ADD 2.6 2.4 8.3
DELETE 1.2 1.6 5.8
F-score(+Time/s) 0.9091(+0.562) 0.8791(+0.083) 0.6667(+18.75)
4000 ADD 1.9 1.7 8
DELETE 0.9 1.3 4
F-score(+Time/s) 0.9677(+1.246) 0.9175(+0.104) 0.75(+31.62)
6000 ADD 1.2 1.5 7
DELETE 0.8 1.1 3

F-score(+Time/s) 0.9753(+1.831) 0.9352(+0.121) 0.7826(+42.51)


8000 ADD 0.9 1.5 6.4
DELETE 0.8 0.9 2.6
F-score(+Time/s) 0.9832(+2.263) 0.9416(+0.146) 0.8014(+51.33)
10000 ADD 0.8 1.3 5.2
DELETE 0.6 0.9 2.3

(a) WATER network (b) BAT network


Figure 4. Performance of each method in DBN structure learning

4. Conclusion

This paper presented a more sophisticated application of structural constraints, which


translated the DBN structure learning problem to an issue of discovering associations
among variables. Then, a novel method based on MIC was proposed to obtain inter-
timeslice arcs in DBN. The experimental results showed that the proposed MIC-based
method outperformed the MI-based, MCMC, and K2 algorithm methods. The future
work will apply this new method to learn the optimal DBN structure from biological
data.

References

[1] M. Grzegorczyk, D. Husmeier. Improvements in the Reconstruction of Time varying Gene Regulatory
Networks: Dynamic Programming and Regularization by Information Sharing Among Genes.
Bioinformatics, 27(2011): 693–699.
[2] N. X. Vinh, M. Chetty, R. Coppel, et al. Global MIT: Learning Globally Optimal Dynamic Bayesian
Network with the Mutual Information Test Criterion. Bioinformatics, 27(2011): 2765–2766.
[3] B. Wilczynski, N. Dojer. BNFinder: Exact and Efficient Method for Learning Bayesian Networks.
Bioinformatics, 25(2009): 286–287.
[4] J. Yu, V. A. Smith, P. Wang, A. J. Hartemink, et al. Advances to Bayesian network inference for
generating causal networks from observational biological data. Bioinformatics, 20(2004): 3594-3603.
[5] H. Jia, D. Liu, P. Yu. Learning dynamic bayesian network with immune evolutionary algorithm.
Guangzhou, China: Institute of Electrical and Electronics Engineers Computer Society, (2005): 2934-
2938.
[6] X. Heng, Q. Zheng, T. Lei, et al. Research on Structure Learning of Dynamic Bayesian Networks by
Particle Swarm Optimization. Proceedings of the 2007 IEEE Symposium on Artificial Life (CI-ALife
2007), 2007: 85-91.

[7] Y. Lou, Y. Dong, H. Ao. Structure Learning Algorithm of DBN Based on Particle Swarm Optimization.
2015 14th International Symposium on Distributed Computing and Applications for Business
Engineering and Science (DCABES). IEEE, 2015: 102-105.
[8] J. Dai, J. Ren. Unsupervised evolutionary algorithm for dynamic Bayesian network structure learning.
Workshop on Advanced Methodologies for Bayesian Networks. Springer International Publishing,
2015: 136-151.
[9] D. Husmeier. Sensitivity and specificity of inferring genetic regulatory interactions from microarray
experiments with dynamic Bayesian networks. Bioinformatics, 19(2003): 2271–2282.
[10] J. Robinson, A. Hartemink. Learning non-stationary dynamic Bayesian networks. Journal of Machine
Learning Research, 11(2010): 3647–3680.
[11] W. Hao, Y. Kui, H. Yang. Learning dynamic Bayesian networks using evolutionary MCMC.
Piscataway, NJ, USA: IEEE, 2006: 2934-2938.
[12] D. N. Reshef, Y.A. Reshef, H. K. Finucane, et al. Detecting Novel Associations in Large Data Sets.
Science, 334(2011): 1518–1524.
[13] Y. Zhang, W. Zhang. A novel Bayesian network structure learning algorithm based on maximal
information coefficient. Proceedings of the Fifth International Conference on Advanced Computational
Intelligence. IEEE, 2012: 862–867.
[14] N. X. Vinh, M. Chetty, R. Coppel, et al. Data Discretization for Dynamic Bayesian Network Based
Modeling of Genetic Networks. ICONIP 2012, Part II, LNCS 7664, 2012: 298–306.
[15] G. Li, L. Xing, Y. Chen. A New BN Structure Learning Mechanism Based on Decomposability of
Scoring Functions. Bio-Inspired Computing-Theories and Applications. Springer Berlin Heidelberg,
2015: 212-224.
[16] N. Friedman, K. Murphy, S. Russell. Learning the structure of dynamic probabilistic networks. In
Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI), San
Francisco, CA: Morgan Kaufmann Publishers, 1998: 139–147.
[17] G. Li, X. Gao, R. Di. DBN structure learning based on MI-BPSO algorithm. In: 13th IEEE/ACIS
International Conference on Computer and Information Science, 2014: 245–250.
[18] B. Wilczynski, N. Dojer. BNFinder: exact and efficient method for learning Bayesian networks.
Bioinformatics, 2009, 25(2): 286–287.
[19] N. Dojer. Learning Bayesian Networks Does Not Have to Be NP-Hard. In Proceedings of International
Symposium on Mathematical Foundations of Computer Science, 2006: 305–314.
[20] S.A. Ramsey, S.L. Klemm, D.E. Zak, et al. Uncovering a macrophage transcriptional program by
integrating evidence from motif scanning and expression dynamics. PLOS Computational Biology,
4(2008).
[21] Y. Zhang, W. Zhang, Y. Xie. Improved heuristic equivalent search algorithm based on Maximal
Information Coefficient for Bayesian Network Structure Learning. Neurocomputing, 117(2013): 186–
195.
[22] G. F. Cooper, E. Herskovits. A Bayesian method for the induction of probabilistic networks from data.
Machine Learning, 9(1992): 309-347.
[23] E. Patrick, K. Kevin, L. Frederic. Information-Theoretic Inference of Large Transcriptional Regulatory
Networks. EURASIP Journal on Bioinformatics and Systems Biology, 2007.
Fuzzy Systems and Data Mining II 299
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-299

Improvement of the Histogram for


Infrequent Color-Based Illustration Image
Classification
Akira FUJISAWA, Kazuyuki MATSUMOTO, Minoru YOSHIDA, and Kenji KITA1
Advanced Technology and Science, Systems Innovation Engineering, Tokushima
University, Tokushima, Japan

Abstract. Illustration images have a style that is often characterized by its
emotional features. In this research, we tried to classify illustration images based
on style. We hypothesized that colors that appear infrequently would help to
analyze the style. In previous research, an Infrequency histogram (IF-hist)
was proposed as a feature focusing on colors with a low frequency of appearance.
However, IF-hist has the weakness that it cannot be used effectively for
certain kinds of images. To address this issue, we propose a color histogram which
uses a histogram value as its threshold. To evaluate the effectiveness of the
proposed method, we experimented with classifying illustration images into the
styles of "For boys" and "For girls". As a result of the experiment, when using the
proposed histogram as the feature, classification performance was better than when
using a normal color histogram. Moreover, the F-measure of the proposed histogram was
better than the F-measure of IF-hist.

Keywords. Illustration image, Color histogram, Image classification, Style

Introduction

Illustration images are used in animation works or comics. Those illustration images
are also called "Anime illust". Figure 1 shows an example of an illustration image.

Figure 1. Example of illustration image2.


Illustration images can give viewers impressions. In this research, we define such
impressions as the “style.” Illustration lovers can often identify the illustrator of a work
by recognizing the illustrator’s style. The ability to receive the style is developed by

1
Corresponding Author: Kenji KITA, Tokushima University, 2-1, Minamijosanjima-cho, Tokushima
770-8506, Japan; E-mail:kita@is.tokushima-u.ac.jp.
2
© 2015 Ell ( http://elleneed.blog102.fc2.com/ )

experience viewing illustration images. Thus, it is difficult for anyone other than
illustration lovers to perceive the style of an illustration image in detail. While a few studies
have focused on illustration images [1-2], previous researchers did not propose a method
for automatically identifying the style. In this paper, we studied the illustration style to
achieve the following purpose.
• To investigate image features that are effective for classifying the style.
• To construct style-based classifiers for illustration images.
In section 1, we describe previous research. In section 2, we describe “IF-hist” and
present an improved histogram “vIF-hist”. In section 3, we describe an experiment that
investigates the effectiveness of the proposed method and discuss the experimental
result. Finally, we describe the conclusion of this research in section 4.

1. Previous Research

This section introduces previous research. Kuriyama [3-4] created a recognition
model based on various image features such as Local Binary Pattern features and the HSV
color model. Moreover, he created a global feature per image by integrating
local features. To study the similarity of touch in relation to illustration images,
Kadokura [5] used the number of types of colors. Besides this, a number of studies
have targeted clip art [6-8]. Clip art is a kind of illustration image. However, clip art is
used as icons or symbols. Therefore, clipart is very simple compared to the type of
illustration images that are targeted in our study. Because illustration images such as
“Anime Illust” include more sensitive and complicated expressions, illustrations that
are classified as clipart should be distinguished from our research target.

2. Proposed Method

2.1. IF-hist

Fujisawa et al. [9] assumed that the colors that represent the style are determined relatively.
Based on this assumption, they proposed a color histogram that focuses on infrequently
appearing colors, called an Infrequency Histogram, or IF-hist. Fujisawa et al. ranked
color frequencies according to histogram value and used the ranking result as a
threshold.
IF-hist is created by changing the histogram values of gradations appearing more
frequently than a certain frequency of appearance. This operation is carried out
according to Eq.(1).

\[
\mathrm{IF\text{-}hist}_n =
\begin{cases}
\mathrm{hist}_n & (\mathrm{rank}(\mathrm{hist}_n) \le T_{rank}) \\
1.0 & (\text{otherwise})
\end{cases}
\tag{1}
\]

IF-hist_n and hist_n indicate the histogram value of gradation n. T_rank indicates
the ranking threshold of the histogram values. This calculation method does not
change the color information under the threshold. However, the colors with more than a
certain frequency of appearance have only information on the presence or absence of
the colors in the target image. As a result, information on how often the colors

appeared in the image is lost. This method created a color histogram by keeping
information about the infrequent colors. However, they found a weakness in this
method. The method has the problem that the histogram values of gradations that had
originally been appearing less frequently are changed. Figure 2 shows a concrete
example of the problem IF-hist has.

Figure 2. Sample of the problem.


The left histogram is a normal color histogram, and the right histogram is an IF-
hist with T_rank equal to 64. These histograms were obtained from the same illustration
image. When focusing on histogram values that are surrounded by a square, the
gradations that have low histogram values are continuous in the normal color histogram.
However, the histogram values of gradations that had low histogram values were
changed in IF-hist. Such an IF-hist cannot use the infrequent colors as a feature. To solve
this problem, we propose an improved IF-hist.

2.2. Histogram value as threshold

This subsection describes the improved IF-hist. The improved IF-hist is created by
using a different threshold from the original IF-hist. We refer to the proposed
histogram as vIF-hist. The creation of vIF-hist is shown in Eq.(2)

\[
\mathrm{vIF\text{-}hist}_n =
\begin{cases}
\mathrm{hist}_n & (\mathrm{hist}_n \le T_{value}) \\
1.0 & (\text{otherwise})
\end{cases}
\tag{2}
\]

T_value indicates the threshold on the histogram value. vIF-hist is obtained by using this
threshold. Figure 3 shows a sample of a vIF-hist. The left histogram is a normal color
histogram, which is the same as in Figure 2. The right histogram is a vIF-hist with T_value
equal to 0.005. Compared to the IF-hist in Figure 2, in vIF-hist, the histogram values
of gradations that appeared infrequently were not changed. In addition, the other
histogram values of gradations that are surrounded by a square were changed as intended.

Figure 3. Sample of vIF-hist.
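As a concrete illustration of Eqs. (1) and (2), the following Python sketch (our own illustration; the NumPy dependency, the function names and the tie-breaking of the rank operation are assumptions not specified in the paper) transforms a normalized color histogram into IF-hist and vIF-hist:

import numpy as np

def if_hist(hist, t_rank=64):
    # IF-hist (Eq. 1): keep the T_rank least frequent gradations, set the rest to 1.0
    out = np.ones_like(hist)
    ranks = np.argsort(np.argsort(hist)) + 1   # rank 1 = least frequent gradation
    keep = ranks <= t_rank
    out[keep] = hist[keep]
    return out

def vif_hist(hist, t_value=0.005):
    # vIF-hist (Eq. 2): keep gradations whose normalized value is at most T_value
    out = np.ones_like(hist)
    keep = hist <= t_value
    out[keep] = hist[keep]
    return out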



3. Evaluation Experiment

3.1. Experiment process

This section describes the evaluation experiment. We obtained vIF-hist from


illustration images. Using vIF-hist, we classified the illustration images based on their
styles by using Support Vector Machines (SVMs). In the experiment, two styles are
used: “For boys” and “For girls.” In the experiments, we selected cover illustrations
of comics as experimental data. The selected comic titles and their publishers are
popular and widely available. As a preparation for the experiment, we decided the
illustration styles of “For boys” and “For girls” based on data such as comic titles,
publishers and magazine titles including these comic works.

3.2. Experimental condition

Illustration images used in the experiment were collected from “Rakuten Kobo3,” which is
a shop that sells digital books. Table 1 shows the number of images used in the evaluation
experiment. We used 30-fold cross-validation. The library used in this experiment was
lib-svm [10].
Table 1. Experimental Data

Cover illustrations for boys 609


Cover illustrations for girls 610
Total 1,219
We compared the following color histograms.
• Normal color histogram
• IF-hist: T_rank = {64, 128}
• vIF-hist: T_value = {0.002, 0.005}
To evaluate our proposed method, the normal color histogram was used as a
baseline. In addition, we compared the classification result of IF-hist to investigate
whether classification precision was improved by using vIF-hist. By integrating the
histogram of the RGB channels, we created a 768-dimensional vector from an image.
The histogram values were normalized by the pixel count, so the range of histogram values was
[0, 1].
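As a minimal sketch of this feature construction (our own; it assumes 8-bit RGB images loaded as an H×W×3 NumPy array, and the helper name is hypothetical), the 768-dimensional vector is obtained by concatenating the 256-bin histograms of the three channels and normalizing by the pixel count:

import numpy as np

def rgb_histogram_feature(image):
    # image: H x W x 3 uint8 array; returns a 768-dimensional vector in [0, 1]
    h, w, _ = image.shape
    feats = []
    for c in range(3):                      # R, G, B channels
        counts, _ = np.histogram(image[:, :, c], bins=256, range=(0, 256))
        feats.append(counts / float(h * w))
    return np.concatenate(feats)

The resulting vector can then be transformed by Eq. (1) or Eq. (2) before being passed to the SVM classifier.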
We used recall, precision, and F-measure for evaluation of the experimental results.
The definitions for recall, precision, and F-measure are given in Eqs. (3), (4) and (5).

\[
\mathrm{Recall}_k = \frac{R_k}{N_k}
\tag{3}
\]

\[
\mathrm{Precision}_k = \frac{R_k}{C_k}
\tag{4}
\]

\[
\text{F-measure}_k = 2 \times \frac{\mathrm{Recall}_k \times \mathrm{Precision}_k}{\mathrm{Recall}_k + \mathrm{Precision}_k}
\tag{5}
\]

3
Rakuten Kobo( https://store.kobobooks.com/ )

R_k: The number of images correctly classified as the style k.
N_k: The number of illustration images which have the style k.
C_k: The number of illustration images that were classified as having the style k.
k: The name of the style.
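For completeness, a direct transcription of Eqs. (3)-(5) into Python (a trivial sketch; the argument names are ours):

def precision_recall_f(r_k, n_k, c_k):
    # r_k: images correctly classified as the style, n_k: images having the style,
    # c_k: images classified as having the style
    recall = r_k / n_k
    precision = r_k / c_k
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure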

4. Result and Discussion

4.1. Result
Table 2 and Table 3 show the experimental results of classifying the styles “For boys”
and “For girls”.
Table 2. Experimental result using normal color histogram.

Style Recall Precision F-measure


Boys 53% 21% 30%
Girls 50% 81% 62%

Table 3. Experimental result using IF-hist and vIF-hist.

Using IF-hist:
T_rank   Style   Recall   Precision   F-measure
64       Boys    78%      77%         78%
64       Girls   77%      79%         78%
128      Boys    82%      80%         81%
128      Girls   81%      82%         82%

Using vIF-hist:
T_value  Style   Recall   Precision   F-measure
0.002    Boys    83%      83%         83%
0.002    Girls   83%      83%         83%
0.005    Boys    81%      84%         83%
0.005    Girls   81%      80%         83%

When using the normal color histogram, the difference of value between precision
and recall was large. In addition, the F-measure of "For boys" was low. From this, it
was considered that the classification results were biased. In the case of vIF-hist, the
overall precision was over 80%. Similarly, the recall was higher than the result
obtained with a normal color histogram. Therefore, we considered that our proposed histogram
could effectively classify an illustration image based on the style. In addition, the F-
measure of vIF-hist was better than the F-measure of IF-hist. Thus, we consider that vIF-hist
successfully improved on IF-hist and that classification was improved.

4.2. Discussion

In this experiment, we used cover illustration images of comics as experimental data.


By using cross-validation for these data, we used them as both training data and test
data. However, this opened the possibility that a cover illustration image that had been
drawn by the same illustrator might be included in the training data and the test data.
We thought this might affect the classification result. In particular, these illustration
images might be classified as illustration images drawn by the same illustrator. Namely,
these illustration images might not be classified based on the style. To investigate the
effect of illustration images drawn by the same illustrator, we conducted an additional
experiment.

In the additional experiment, we used the same experimental data as we used in the
evaluation experiment. We divided those data into training and test data so that images
by the same illustrator would not be included in both. By using the training data, we
created a classifier by SVM. To classify the test data based on the style, we conducted
the experiment using this classifier. The number of images used for the experiment was
1,219. Table 4 shows a breakdown of the additional experiment data.
Table 4. Training and test data used in additional experiment.

Training data Test data


Cover illust for boys 548 Cover illust for boys 61
Cover illust for girls 550 Cover illust for girls 60
Total 1,098 Total 121
The histogram used in the additional experiment was vIF-hist, with T_value
equal to 0.005. This value was chosen because it gave the best classification result in
the evaluation experiment. Table 5 shows the classification result.
Table 5. Classification result in additional experiment

T_value  Style   Recall   Precision   F-measure
0.005    Boys    55%      70%         59%
0.005    Girls   76%      63%         68%
The value of the macro-average precision was approximately 60%. Compared with
the result in Table 3, the overall classification performance decreased. From this result,
we considered that some illustration images were classified based on features such as
the style or the characteristics of an illustrator. However, the precisions were between
60% and 70%. The F-measure of “For boys” was nearly 60%. From this result, we
concluded that it was possible to classify images based on their styles even when those
images were not drawn by the same illustrator.

5. Conclusion

In this study, we aimed to classify illustration images based on their styles. To do so,
we focused on color information that is used infrequently. IF-hist was a color
histogram focused on colors that appeared infrequently on an image. IF-hist used the
rank of appearance frequency of a color as a threshold. However, this histogram had a
weakness. To improve this weakness, we proposed using the histogram value directly
as the threshold. In addition, we named the improved IF-hist, vIF-hist.
To evaluate vIF-hist, we experimented with classification of the illustration image
as either of the styles "For boys" or "For girls". When we looked at the F-measure, the
result with vIF-hist was better than the result with IF-hist regardless of the style. From
this, we considered that the weakness of IF-hist was improved by vIF-hist.
For future studies, we would like to investigate the relationships among colors with
low appearance frequencies on an illustration image. Also, we would like to further
investigate the colors that are not used for the illustration image to represent its
particular style.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers 15K00425, 15K00309, and
15K16077.

References
[1] T. Itamoti, M. Miwa, K. Taura, and T. Chikayama, An identification algorithm of illustration artists,
Proc.74th National Convention of IPSJ, 74(2012), 2.209-2.210.
[2] S. Aoki, and R. Miyamoto, Feature point extraction from a manga-style illustration of a facial image
using sift and color features as a feature vector, IEICE Tech. Rep., 115(2016), SIS2015-59, 63-68.
[3] S. Kuriyama, Cognitive classification for styles of illustrations based recognition model, IPSJ SIG
Technical Report 2013-CG-152(2013), 1-7.
[4] T. Furuya, S. Kuriyama, and R. Ohbuchi, An unsupervised approach for comparing styles of
illustrations, Oral paper, Proc.13th International Workshop on Content-Based Multimedia Indexing
(CBMI) 2015,1-6.
[5] K. Kadokura and Y. Osana. Image(Artwork) Retrieval based on Similarity of Touch, Proc.75th
National Convention of IPSJ 2013, 603-604.
[6] M. J. Fonseca, B. Barroso, P. Ribeiro, and J. A. Jorge, Retrieving ClipArt Images by Content, Proc.
the 3rd International Conference on Image and Video Retrieval (CIVR) 2004, 500-504.
[7] E. Garces, A. Agarwala, D. Gutierrez, and A. Hertzmann, A similarity measure for illustration style,
ACM Transactions on Graphics (SIGGRAPH 2014), 33(2014).
[8] P. Martins, R. Jesus, M. Fonseca, and N. Correia, Clip art retrieval combining raster and vector
methods, 11th International Workshop on Content-Based Multimedia Indexing(CBMI) 2013, 35-40.
[9] A. Fujisawa, K. Matsumoto, M. Yoshida, and K. Kita, An Illustration Image Classification Focusing
on Infrequent Colors, International Journal of Advanced Intelligence, 8(2016), 84-98.
[10] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines, ACM Transactions on
Intelligent Systems and Technology, 2(2011).
306 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-306

Design and Implementation of a Universal


QC-LDPC Encoder
Qian YI a,* and Han JING b
a
Department of Information Science and Technology, Taishan University, Taian, China
b
Department of Information Engineering, Taishan Medical University, Taian, China

Abstract. This paper designed a programmable processor based on multiple


instruction multiple data (MIMD) structure, to realize the quasi-cyclic low density
parity check code (QC-LDPC) coding algorithm for the wireless LAN (WLAN)
and the worldwide interoperability for microwave access (WIMAX). Compared
with the traditional LDPC encoder, the processor uses a programmable
concatenation-operation to achieve matrix-vector multiplication, obtains high
computation speed and easy chip layout. The RTL code of the processor has been
written with the Verilog language on the Xilinx ISE platform, and is synthesized
on the FPGA chip XC2VP20. Its maximum clock frequency is 75MHz. The
experimental results show that the structure is suitable for a multi-standard LDPC
encoder.

Keywords. Low density parity check(LDPC), WLAN, worldwide interoperability


for microwave access (WIMAX), multiple instruction multiple data (MIMD)

Introduction

In wireless digital communication systems, the LDPC code is an important class of
forward error correction codes. Its decoding performance is the closest to the Shannon limit,
and it has a high data throughput [1]. So the LDPC code has been widely used in many
wireless communication standards, such as WLAN and WiMAX in the field of
broadband wireless access [2].
With the broad application of LDPC codes, multi-standard universal LDPC
encoders will also be widely used in the future. Therefore the implementation of the
encoder is required not only to have some flexibility, but also to be close to a dedicated
chip in area and power consumption. An application-specific instruction set
processor (ASIP) is usually selected to achieve these demanding requirements.
Based on a special instruction set processor architecture, paper [2] designs an
architecture-aware LDPC (AA-LDPC) encoder whose core computing component is a
matrix multiplier. In papers [3-6], several high-speed QC-LDPC encoders are
designed which have special structures in the computing unit. In this paper, we
design a 4-bit RAM with fewer resources to complete the cyclic shift function for
multiple standards. The encoder can calculate the cyclic shifts in parallel, which increases the
speed and the throughput.

*
Corresponding Author: Qian YI, Department of Information Science and Technology, Taishan
University, Taian, China ; E-mail: bjkdqy@126.com.

Section 1 analyses the QC-LDPC coding algorithm, section 2
describes the system structure and the main calculation modules of the encoder, and section
3 gives the simulation results and analysis.

1. QC-LDPC Encoding Algorithm Analysis

The QC-LDPC code is based on a block-structured parity-check matrix. It reduces the coding complexity and
is easier to realize with a semi-parallel hardware structure. Some communication
standards such as WLAN [8], WIMAX [5], DTMB [3] and CCSDS [7] have
adopted it. The quasi-lower-triangular matrix encoding algorithm was proposed by
Richardson and Urbanke (also called the RU algorithm). The algorithm makes full use of
the sparsity of the check matrix, and rearranges the rows and columns of the parity
matrix to get an approximately lower-triangular H matrix, which can reduce the complexity of
the linear encoding [8]. In order to encode the LDPC code effectively, the matrix H is
divided according to the RU algorithm into the sub-matrices A, B, C, D, E and T, which are circulant
permutation matrices composed of basic blocks of size g×g. The g is the
expansion factor and the z is an integer much smaller than n.
Assuming the code word is v = (s, P_1, P_2), where s is the information
part and P_1 and P_2 are the first and second parts of the check bits, the RU algorithm gives:

\[
P_1^T = \Phi^{-1}\left(E T^{-1} A + C\right) s^T
\tag{1}
\]

\[
P_2^T = T^{-1}\left(A s^T + B P_1^T\right)
\tag{2}
\]

\[
\Phi = E T^{-1} B + D
\tag{3}
\]

According to this partition, the lengths of s, P_1 and P_2 are n−m, z, and m−z
respectively. As a known parameter, Φ can be directly input to the encoder and
does not need to be calculated.
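The following NumPy sketch (our own illustration under the reconstruction above; it assumes Φ⁻¹ and T⁻¹ are precomputed binary matrices) shows equations (1) and (2) as straight-line mod-2 matrix arithmetic:

import numpy as np

def ru_encode(A, B, C, E, T_inv, Phi_inv, s):
    # All matrices and s are 0/1 NumPy arrays; arithmetic is carried out modulo 2.
    s = np.asarray(s) % 2
    As = (A @ s) % 2
    p1 = (Phi_inv @ (((E @ ((T_inv @ As) % 2)) + (C @ s)) % 2)) % 2   # Eq. (1)
    p2 = (T_inv @ ((As + (B @ p1)) % 2)) % 2                          # Eq. (2)
    return np.concatenate([s, p1, p2])                                # code word v = (s, P1, P2)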
Analysing the algorithm from the viewpoint of hardware realization, the encoder needs
to calculate equation (1) and equation (2). According to the analysis of the
algorithms in paper [4], the core operations of the two equations can be divided into
the multiplication of a g×g matrix with a g-dimensional vector, and the modulo-2
addition of two g-dimensional vectors. The modulo-2 addition operation is generally
implemented by an XOR circuit.
According to a lemma from matrix theory, multiplying a column vector by an identity matrix
that has been cyclically right-shifted by x bits is equivalent to cyclically shifting the column
vector upward by x bits. Therefore, a cyclic shifter is implemented to
achieve the multiplication of a g×g matrix with a g-dimensional vector. If the cyclic
shifter shifts N bits, the result is usually obtained after N clock cycles. In
paper [5], the logarithmic cyclic shifter can output the result after log2(N) clock cycles. These
cyclic shifters are suitable for the case where the shift length is constant. However, even within the
same communication standard, g often takes different values. For example, in
IEEE802.16e g is equal to 24, 28, 32..., and in IEEE802.11n it is 27, 54, 81... If a
general cyclic shifter or a logarithmic cyclic shifter is used to carry out the cyclic
shift of the g-dimensional vector, the data length of the shifter must be the
maximum value of all the g. In this way, when the g value of the actual operation is
smaller, there is a large waste of resources and a long delay in the chip. In order not to
affect the accuracy and throughput of the subsequent operations, an additional control
circuit is also needed to deal with the redundant or invalid bits in the final result.

In addition to ensuring universality, this paper combines two design
ideas, cyclic shifting and instruction-level concatenation operations, to extract a special
instruction set for computing P_1 and P_2. In this way, the modules of the chip can run in parallel,
so as to increase the resource utilization and the running speed of the
encoder.

2. Encoder Structure Design

The bit length of s ranges from hundreds to thousands of bits, so only a parallel structure can
provide sufficient coding efficiency and flexibility. In this section, the design of the MIMD
parallel architecture shown in Figure 1 is described.

2.1. MIMD Structure

The MIMD architecture has multiple processing units which perform different
operations depending on different control flows. The units can process different
data and achieve spatial parallelism, so the MIMD architecture is often used for special-
purpose computing.

[Figure 1 shows the MIMD architecture: eight processing unit groups (#1 to #8), each containing an instruction memory, an instruction decode unit, a processing unit, a matrix vector multiplier and a local memory, connected to a shared control unit and memory via data and control buses.]
Figure 1. The MIMD architecture of the processor


In Figure 1, there are 8 processing unit groups. Each processing unit has its own
ALU (Arithmetic Logic Unit), registers and branch-judgment module. It performs
simple addition, subtraction, logic operations and branches etc. The expansion factor in
IEEE802.16e is a multiple of 4, and in IEEE802.11n it is 27. So the encoder selects 4
bits as the word length of a processing unit.
The matrix vector multiplier can run in parallel with the ALU to increase the
system's parallel degree and can shorten the running time of the program. Its detailed
structure is introduced in section 2.2.
The instruction decode unit decodes the instructions from its corresponding
instruction memory. If it is a local instruction, the instruction decode unit will send the

control signals to the processing unit. If it is a global instruction, the instruction decode
unit will send the control signals to the control unit.
The control unit coordinates the processing tasks among the processing unit groups,
manages the synchronization signal, the external handshake signal and so on.

2.2. The Matrix Vector Multiplier

After the instruction decode unit shown in Figure 1 decodes the MVMs instruction, the
controller module sends out the control signal, which causes the matrix vector
multiplier to complete the matrix vector multiplication. The hardware structure of
matrix vector multiplier is shown in Figure 2.
[Figure 2 shows the structure of the matrix vector multiplier: a g-dimensional vector memory and a g×g matrix memory feed registers #0, #1 and #2 and a data selector under the control of a controller, and the result is written to the local memory.]
Figure 2. The structure of the matrix vector multiplier


In Figure 2, the g-dimensional vector memory is a 4-bit read-write memory, and
the information code s is stored in it as a sequence of 4-bit groups.
The element information of the A block matrix is stored in the g×g matrix memory.
The storage address of a matrix element is g×row+column. Taking the rate-1/2 basic
matrix of the IEEE802.16e standard as an example, the storage mode is shown in Figure 3.
Register #0, register #1 and register #2 are three 4-bit registers, used to
temporarily store the elements of s for shifting. The three registers and the data selector
implement the data shifting function. The partial results are stored into the local
memory.
The controller reads the data from the g×g matrix memory, and then determines
the concatenation mode of the data selector. There are 4 concatenation modes of the data
selector, assuming that N is the length of the partial information code in the local memory:
Address   g×g matrix memory
0         A0,0
1         A0,1
2         A0,2
...       ...
23        A0,23
24        A1,0
...       ...

Figure 3. The addresses of the elements of A in the g×g matrix memory



If mod(N, 4) = 0, then N is divisible by 4, and the data in register #0 and register #1 can
be output directly without concatenation.
If mod (N, 4) =1, then on a clock cycle, {register #0 [2:0], register #1 [3]} is
written into local memory; and on the next clock cycle, {register #1 [2:0], register #0
[3]} is written into local memory. This is equivalent to moving the data one bit.
If mod (N, 4) =2, then on a clock cycle, {register #0 [1:0], register #1 [3:2]} is
written into local memory; and on the next clock cycle, {register #1 [1:0], register #0
[3:2]} is written into local memory. This is equivalent to moving the data 2 bits.
If mod (N, 4) =3, then on a clock cycle, {register #0 [0], register #1 [3:1]} is
written into local memory; and on the next clock cycle, {register #1 [0], register #0
[3:1]} is written into local memory. This is equivalent to moving the data 3 bits.
The space-time task description of the matrix vector multiplier is:
On the first clock cycle, the data with address (0) of the g-dimensional vector
memory is written into the register #0 and the register #2.
On the second clock cycle, the data with address (1) of the g-dimensional vector
memory is written into the register #1. The two group data in the register #0 and the
register #1 are concatenated into a new 4bit data accordance with the above 4 ways,
and on the next clock cycle the new 4bit data is written into the address (0) unit of the
local memory
On the third clock cycle, the data with address (2) of the g-dimensional vector
memory is written into the register 0#. The two group data in the register #0 and the
register #1 are concatenated, and on the next clock cycle the new 4bit data is written
into the address (1) unit of the local memory... and so on.
As the data of the first group is also stored into register #2, on the last clock cycle,
register #2 replaces register #0 or register #1 in the concatenation to get the last 4-bit value,
which is written into the last unit on the next clock cycle.
The entire working process requires N/4 + 3 clock cycles.
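A minimal Python model of the data selector's four concatenation modes (our own sketch, not the authors' Verilog RTL; offset = mod(N, 4)):

def concatenate(reg0, reg1, offset):
    # reg0, reg1: 4-bit integers. For offset k > 0 the output is {reg0[3-k:0], reg1[3:4-k]},
    # i.e. the low (4-k) bits of reg0 followed by the high k bits of reg1, realizing a k-bit shift.
    # On the next clock cycle the roles of reg0 and reg1 are swapped, as described above.
    if offset == 0:
        return reg0 & 0xF
    low_bits = reg0 & ((1 << (4 - offset)) - 1)    # reg0[3-offset:0]
    high_bits = reg1 >> (4 - offset)               # reg1[3:4-offset]
    return (low_bits << offset) | high_bits

# Example for offset 1: {reg0[2:0], reg1[3]}
assert concatenate(0b1010, 0b1100, 1) == 0b0101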

2.3. Specific Instructions Design

The specific instruction length is 25 bits. Hardware support for the dedicated instructions
has been introduced in the previous section. The main instructions are shown in
Table 1.
Table 1. The main instructions

Instruction     Description
ADD             perform addition operation
OUTS            output the encoded code word
XORS            calculate the addition of two vectors
MVMs            perform nonzero sub-matrix and vector multiplication
JUMP            unconditional jump instruction
SYN             synchronize some PEGs so that they run the following instructions simultaneously
Setmrc_pe_EM    control the inter-connection module to strobe the data path from PEGi to extern_ram
Setmrc_EM_pe    control the inter-connection module to strobe the data path from extern_ram to PEGi

When the PEGs specified by the "SYN" receive the synchronization signal, they
will continue to execute the next instruction; otherwise they wait. The instruction "MVMs, N"
completes the multiplication of a matrix and a vector, where N is the size of the sub-matrix.
The instruction "OUTS N" outputs a code word whose length is N.

3. Experimental Results and Conclusion

The function of the processor was simulated on the Xilinx XC2VP20, using 8 PEG
modules. The maximum clock frequency is 75MHz. The processor can execute 8
instructions in parallel. The Slices, 4-input LUTs and BRAMs account for 44%, 29%
and 18% of the total resources respectively.
At present, the processor has been programmed for WLAN, for which the code rate
is 3⁄4, the code length is 1944 and the size of the sub-block matrices is 81, and for WiMAX,
for which they are 1⁄2, 2304 and 96 respectively. The two throughputs are close to 1000Mbps. The
comparison with other encoders is shown in Table 2.
Table 2. Comparison with other encoders

            Device         LUT    Frequency   Throughput rate   Code
This paper  XC2VP20        5470   75 MHz      0.97 Mbps         1⁄2, 2304, 96
Paper [4]   XC4VLX160      7092   80 MHz      0.24 Mbps         Supports three kinds of rate
Paper [5]   EP2C70F896C6   7641   188 MHz     0.752 Mbps        1⁄2, 2304
As the clock frequency, the number of PEGs, the PEG parallelism and the MIMD
program all affect the throughput, this also shows the potential of the processor to
improve its throughput.
This paper studies a hardware design method for the LDPC encoding algorithm
which is suitable for WLAN and WiMAX. Experimental results show that the proposed
structure is suitable for QC-LDPC codes. In addition, the processor is still in the initial
stage of research; many aspects can be further optimized and its performance will be
further improved.

Acknowledgements

The research work was supported by Tai'an Science and Technology Development Plan
# 201330629 and Shandong Provincial Natural Science Foundation, China
#ZR2013FL030.

References

[1] R. G. Gallager. Low density parity check codes. IRE Trans, Inform. Theory, 8(1962), 21-28.
[2] A. C. Vikram, J. J. Sarah, G. Lechner. Memory-efficient quasi-cyclic spatially coupled low-density
parity-check and repeat-accumulate codes. IET Communications, 8(2014), 3179 – 3188.
[3] X. J. Zhang. Research on Encoder/Decoder of AA-LDPC Codes Based on ASIP. East China Normal
University, PhD dissertation, 2011.
[4] M. Zhao and L. Li. High throughput in-system programmable quasi cyclic LDPC encoder architecture.
Journal of Tsinghua University (Science and Technology), 49(2009), 1041-1044.
[5] Y. Zhang, X. M. Wang, H. W. Chen. FPGA based design of LDPC encoder. Journal of Zhejiang
University (Engineering Science), 45(2011), 1582-1586.
[6] R. J. Yuan, FPGA-based Joint Design of LDPC Encoder and Decoder. Journal of Electronics &
Information Technology, 34(2012), 38-44.
[7] P. W. Qiu, P. Bai, M. Y. Li. Design of Dynamically Reconfigurable LDPC Encoder Based on FPGA
According to CCSDS Criteria. Video Engineering. 36(2012), 59-62.
[8] H. Y, He, Principle and Application of LDPC. Beijing: The People's Posts and Telecommunications
Press, 2009.
312 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-312

Quantum Inspired Bee Colony


Optimization Based Multiple Relay
Selection Scheme
Feng-Gang LAI, Yu-Tai LI and Zhi-Jie SHANG1
State Grid Information & Telecommunications Branch, Beijing, China

Abstract. Multiple relay selection schemes for cooperative relaying considering


maximizing the end-to-end signal to noise ratio (SNR) and power efficiency are
researched in this paper. In cooperative multiple relay networks, the relay nodes
which is selected are very important to the system performance. How to choose the
best cooperative relay node number and which relay nodes are selected are an
optimization problems. The exhaustive search scheme can solve the relay selection
problem but the complexity will increase exponentially with the size of network,
that is, the number of relays in the network. Two novel quantum bee colony
optimization (QBCO) based relay selection schemes which optimize the SNR and
power efficiency are proposed in this paper respectively. Simulation results show
that the QBCO-based scheme has a much better performance compared with other
schemes in literature.

Keywords. Cooperative relaying, relay selection, power efficiency, quantum bee


colony optimization

Introduction

It is well known that relay nodes play an important role in conventional cellular
networks, helping to enlarge the coverage of the base station or increase the overall cell
throughput compared to 3GPP LTE [1]. Relaying is an effective and important technology for
solving the problems of insufficient cell coverage and low throughput for cell users, especially
cell-edge users, and for improving the performance of the whole wireless system [2-3]. Relay
technology is also very important in other cooperative networks, such as ad hoc
networks. As important parts of network design, relay selection, power control and
spectrum resource allocation have been widely researched. Relay
selection, which is an important part of the application of relaying systems, strongly influences
the performance of relay nodes. Most relay selection research is based
on some function of channel state information (CSI), such as the distance
between the source and destination, path loss or SNR [4]. In this scenario, the receiver
knows all the CSI between the source and relays and the CSI between relays and
destination, and thus chooses the relay with the best performance based on some function of
the CSI [5]. However, selecting a single relay in wireless networks may have the
disadvantage of imbalanced and low utilization of resources, and moreover, the

1
Corresponding author: Zhi-Jie SHANG, State Grid Information & Telecommunications Branch,
Beijing, China; E-mail: shangzhijie@163.com

“emergence” diversity gain among multiple relays cannot be achieved. Furthermore,


a single assisting relay may have a heavy load, thereby causing an imbalance
in resource utilization. In order to solve this problem, [6] has proposed a load-based
relay selection algorithm. Single relay selection cannot avoid the fading effect of
wireless channels, so multiple relay selection schemes are widely studied. However,
as the number of relays increases, the network may suffer from much
more interference and resource conflicts. Thus, how to choose a set of suitable relays is
very important: it can both improve the stability of the network and
maximize the end-to-end SNR or power efficiency. This is especially useful for wireless
relay networks with multiple relays and many complex constraints. As we all know,
energy efficiency or power efficiency is of great importance in green communications,
and the same holds in relay networks. In [7], several relay selection strategies
for multiple relay scenarios are proposed, which takes the instantaneous error rate and
fast fading channels into consideration. In [8], a novel relay selection scheme
considering energy-efficiency is proposed, in the scheme, the relay node which has the
best energy efficiency is selected. In [9], a relay ordering based relay selection scheme
which considers end-to-end SNR and end-to-end power efficiency is proposed. But the
scheme for energy efficiency can only obtain a sub-optimal solution, and the simulation results
in [9] illustrate that the solution given by the relay ordering scheme has a
large gap compared with the exhaustive search scheme.
Since the multiple relay selection problems can be modeled as 0-1
optimization problems, intelligent algorithms can be used to solve them. Some
classical intelligent algorithms have been widely researched and applied, such as particle swarm
optimization (PSO) [10]. The quantum genetic algorithm (QGA) is a combination of
quantum theory and the genetic algorithm and therefore has the advantages of a faster
convergence rate, stronger searching ability and less computing time. QBCO is a novel
swarm intelligence algorithm proposed in [11] for solving the cognitive radio spectrum
allocation problem. Since quantum-inspired methods have shown great efficiency and effectiveness,
the QBCO algorithm is adopted here to solve the multiple relay selection problems.
We organize the rest of the paper as follows. The network model and problem
formulation are given in Section 1. The QBCO-based relay selection scheme is proposed in
Section 2. Section 3 presents the simulation, and Section 4 concludes the paper.

1. Network Model and Problem Illustration

In this section, we consider a cooperative wireless relay system model, which is


composed of one transmitter, one receiver and R relays which are used for cooperation
as described in Figure 1. With relay selection schemes, a set of relays is selected from
the R potential relays which maximizes the SNR or the power efficiency defined later.

[Figure 1 shows the cooperative wireless relay network: a transmitter sends s through channels f1, ..., fR to relay nodes 1, ..., R, which forward it through channels g1, ..., gR to the receiver, producing the received signal y.]
Figure 1. Cooperative wireless relay network


Assume that each relay in the network has only one antenna for transmitting and
receiving signals. Denote f_i as the CSI from s to the i-th relay and g_i as the CSI from the
i-th relay to y. It is assumed that f_i and g_i are fully known by the i-th relay, and all CSI
f_1, f_2, ..., f_R and g_1, g_2, ..., g_R are known by the receiver. Assume that all CSI are
normalized independent identically distributed (i.i.d.) random variables with Rayleigh
distribution, zero mean and unit variance. P
denotes the transmission power of the transmitter, and P_i denotes the transmission
power of the i-th relay. Note that power control is not considered in the model; that is
to say, the transmitter or relays cannot save power and spend it on transmissions with
better channels. A relay has only two choices, to cooperate or not to cooperate, when
there is a transmission link.
without decoding.
In the multiple relay system, the transmitter sends the signal \(\sqrt{P}\,s\) to the relays in
the first transmission phase, while in the second phase the i-th relay
amplifies the signal received from the transmitter by
\(\dfrac{a_i \sqrt{P_i}\, e^{j\theta_i}}{\sqrt{1 + |f_i|^2 P}}\)
(so that its transmission power is \(a_i^2 P_i\)) and then forwards it, where \(a_i\) represents whether the i-th relay is
chosen or not: if \(a_i = 1\) the i-th relay is chosen, and \(a_i = 0\) otherwise.
Assume that the relays transmit at the same time, that is to say, the synchronization
problem is not considered. The angle \(\theta_i = -(\arg f_i + \arg g_i)\) compensates the
phase of the signal. The signal received by the i-th relay is \(\sqrt{P}\, f_i s + v_i\). Thus, the received signal at the receiver is

\[
y = \sqrt{P}\sum_{i=1}^{R}\frac{a_i f_i g_i \sqrt{P_i}}{\sqrt{1+|f_i|^2 P}}\, s
  + \sum_{i=1}^{R}\frac{a_i g_i \sqrt{P_i}}{\sqrt{1+|f_i|^2 P}}\, u_i + w
\tag{1}
\]

where w is the receiver's AWGN and \(u_i = v_i e^{j \arg f_i}\), while \(v_i\) is the i-th relay's
AWGN. All of the noises are assumed to be i.i.d. complex Gaussian random variables
with zero mean and unit variance. It is obvious that \(u_i\)
and \(v_i\) have the same distribution. The average SNR of the communication system is

\[
\gamma = P\left(\sum_{i=1}^{R}\frac{a_i |f_i||g_i| \sqrt{P_i}}{\sqrt{1+|f_i|^2 P}}\right)^{2}
\Bigg/ \left(1 + \sum_{i=1}^{R}\frac{a_i^2 |g_i|^2 P_i}{1+|f_i|^2 P}\right)
\tag{2}
\]

The SNR based multiple relay selection problem can be written as

\[
\max_{a_1,\dots,a_R} \gamma \quad \text{s.t.}\quad a_i \in \{0,1\}
\tag{3}
\]

In wireless networks, energy expenditure is also a pressing problem, so energy-
saving communication is widely researched. It is easy to see that the total power of the
selected relays, \(P_{total} = \sum_{i=1}^{k} P_i\), increases with the number k of selected relays. We define the
power efficiency as the ratio of \(\gamma\) to the total transmission power (the
transmitter power plus the relay powers). The single-objective power-efficiency-based
multiple relay selection problem can be written as

\[
\max_{\alpha_1,\dots,\alpha_R} \eta = \frac{\gamma}{P + \sum_{i=1}^{R} \alpha_i^2 P_i}
\quad \text{s.t.}\quad \alpha_i \in \{0,1\}
\tag{4}
\]

Assuming that the receiver knows all the CSI, this problem is equivalent to solving the
SNR or power efficiency maximization problem, which is similar to the problem in
[12]. However, here the power control problem is not taken into consideration. Instead, each
relay has only two choices: to take part in the cooperation with full power or not to take
part in the cooperation at all. Since every relay has two choices, the problems
considering SNR or power efficiency are general 0-1 optimization problems.
The exhaustive search scheme is able to solve the problem, but its complexity
increases exponentially with the number of relays. So QBCO is used to solve the multiple
relay selection problems and obtain a better solution, as presented in Section 2.
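As a small sketch of how a candidate selection vector can be evaluated (our own Python illustration under the model above; the function names are hypothetical), the fitness values γ of Eq. (2) and η of Eq. (4) are:

import numpy as np

def snr_gamma(a, f, g, P, P_relay):
    # a: 0/1 selection vector; f, g: complex channel vectors; P: transmitter power; P_relay: relay powers
    gain = np.abs(f) * np.abs(g) * np.sqrt(P_relay) / np.sqrt(1.0 + np.abs(f) ** 2 * P)
    noise = np.abs(g) ** 2 * P_relay / (1.0 + np.abs(f) ** 2 * P)
    return P * np.sum(a * gain) ** 2 / (1.0 + np.sum(a * noise))          # Eq. (2)

def power_efficiency(a, f, g, P, P_relay):
    return snr_gamma(a, f, g, P, P_relay) / (P + np.sum(a * P_relay))     # Eq. (4), using a_i^2 = a_i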

2. Multiple relay Selection Scheme

It has been shown in [9] that for wireless relay networks which have
more than 2 relays, where all relays either cooperate or do not cooperate at all, there exists no
optimal relay ordering. So the schemes proposed in [9] are not globally optimal, that is to
say, they only obtain a sub-optimal solution. We therefore propose multiple relay selection
schemes based on the QBCO algorithm.
In this paper, QBCO, which is inspired by the social behavior of bees, is used to solve the
multiple relay selection problems. Three groups of bees, that is, quantum employed
bees, quantum onlooker bees and quantum scout bees, constitute the colony of
quantum bees. They look for food resources (which are represented by quantum
positions) in an R-dimensional space according to their own and their partners’ historical
experiences, where R is the dimension of the optimization problem. In QBCO,

quantum coding is used to represent the probabilistic state, and the quantum velocity
can be represented by a string of quantum bits. One quantum bit can be written as a
pair of numbers \((\alpha, \beta)\), where \(\alpha^2 + \beta^2 = 1\). \(\alpha^2\) gives the probability that the
quantum bit is in the '0' state and \(\beta^2\) gives the probability that the quantum bit is in
the '1' state. The i-th quantum bee’s quantum velocity is

\[
v_i = \begin{bmatrix}
\alpha_{i1} & \alpha_{i2} & \cdots & \alpha_{iR} \\
\beta_{i1} & \beta_{i2} & \cdots & \beta_{iR}
\end{bmatrix}
\tag{5}
\]

where \(|\alpha_{ij}|^2 + |\beta_{ij}|^2 = 1\) \((j = 1, 2, \dots, R)\). For efficiency, we design the QBCO
so that \(\alpha_{ij}\) and \(\beta_{ij}\) are real numbers with \(0 \le \alpha_{ij} \le 1\) and \(0 \le \beta_{ij} \le 1\). So \(\alpha_{ij} = \sqrt{1 - \beta_{ij}^2}\), and
equation (5) can be simplified as

\[
v_i = [\alpha_{i1}\ \alpha_{i2}\ \cdots\ \alpha_{iR}] = [v_{i1}\ v_{i2}\ \cdots\ v_{iR}]
\tag{6}
\]

2.1. Evolutionary process of Quantum Employed Bees

The quantum colony consists of h quantum bees that fly in a space of R dimensions.
\(x_i = (x_{i1}, x_{i2}, \dots, x_{iR})\), \(i = 1, 2, \dots, h\), represents the i-th quantum bee’s bit position in the
space, and \(v_i = (v_{i1}, v_{i2}, \dots, v_{iR}) = [\alpha_{i1}\ \alpha_{i2}\ \cdots\ \alpha_{iR}]\) represents the i-th bee’s quantum
velocity. The best bit position found so far (the local optimal bit position) by the i-th
quantum bee is \(p_i = (p_{i1}, p_{i2}, \dots, p_{iR})\), \(i = 1, 2, \dots, h\). The global optimal bit position
found by the whole bee colony is \(p_g = (p_{g1}, p_{g2}, \dots, p_{gR})\). At each iteration, the
quantum rotation angle, quantum velocity and bit position of the i-th quantum bee are
updated by the following quantum moving equations respectively:

\[
\theta_{ij}^{t+1} = e_1\,(p_{ij}^{t} - x_{ij}^{t}) + e_2\,(p_{gj}^{t} - x_{ij}^{t})
\tag{7}
\]

\[
v_{ij}^{t+1} =
\begin{cases}
\sqrt{1-(v_{ij}^{t})^2}, & \text{if } p_{ij}^{t} = x_{ij}^{t} = p_{gj}^{t} \text{ and } r < c_1 \\
\left|\, v_{ij}^{t}\cos\theta_{ij}^{t+1} - \sqrt{1-(v_{ij}^{t})^2}\,\sin\theta_{ij}^{t+1} \right|, & \text{otherwise}
\end{cases}
\tag{8}
\]

\[
x_{ij}^{t+1} =
\begin{cases}
1, & \text{if } \gamma_{ij}^{t+1} > (\alpha_{ij}^{t+1})^2 \\
0, & \text{if } \gamma_{ij}^{t+1} \le (\alpha_{ij}^{t+1})^2
\end{cases}
\tag{9}
\]

where r is a uniform random number in the real interval [0, 1], \(c_1\) is a probability
which is a constant in \([0, 1/R]\), \(\gamma_{ij}^{t+1} \in [0,1]\) is a uniform random number, and \((\alpha_{ij}^{t+1})^2\)
defines the selection probability of the bit position state in the (t+1)-th generation. The
values of \(e_1\) and \(e_2\) represent the relative degrees of importance of \(p_i^t\) and \(p_g^t\).
After updating the quantum velocities and bit positions, the fitness of
each quantum bee is calculated based on the objective function, that is, (3) or (4). If the fitness of \(x_i^{t+1}\) is
better than that of \(p_i^t\), then \(p_i^{t+1}\) is updated to \(x_i^{t+1}\). If the fitness of \(p_i^{t+1}\) is better than that
of \(p_g^t\), then \(p_g^{t+1}\) is updated to \(p_i^{t+1}\).

2.2. Evolutionary Process of Quantum Onlooker Bees

The quantum onlooker bees’ quantum positions are based on the quantum positions of the selected quantum
employed bees. The selection probability of the k-th quantum
employed bee is calculated by the following equation:

\[
p_k^{t} = \frac{\rho(x_k)}{\sum_{i=1}^{h} \rho(x_i)}
\tag{10}
\]

where \(\rho(x_k)\) represents the fitness of \(x_k\), which is \(\gamma\) in (3) or \(\eta\) in (4).
At each iteration, the quantum rotation angles and velocities of the i-th quantum
onlooker bee are updated by the following equations, assuming that the k-th quantum
employed bee is selected as the guide of the quantum onlooker bee:

\[
\theta_{ij}^{t+1} = e_1\,(p_{kj}^{t} - x_{ij}^{t}) + e_2\,(p_{gj}^{t} - x_{ij}^{t})
\tag{11}
\]

\[
v_{ij}^{t+1} =
\begin{cases}
\sqrt{1-(v_{ij}^{t})^2}, & \text{if } p_{kj}^{t} = x_{ij}^{t} = p_{gj}^{t} \text{ and } r < c_1 \\
\left|\, v_{ij}^{t}\cos\theta_{ij}^{t+1} - \sqrt{1-(v_{ij}^{t})^2}\,\sin\theta_{ij}^{t+1} \right|, & \text{otherwise}
\end{cases}
\tag{12}
\]

After updating the velocity and position of each quantum onlooker bee, the
fitness is computed as in the process for the employed bees.

2.3. Evolutionary Process of Quantum Scout Bees

When the fitness of a quantum employed bee or a quantum onlooker bee does not
change within a limited number of iterations, it becomes a quantum scout bee, which has the ability to
find new food resources; its quantum position is then selected randomly.
From the above analysis, the processes of quantum bee colony optimization for
multiple relay selection are shown below:
Step 1: Suppose that the receiver knows the CSI f_1, f_2, ..., f_R and g_1, g_2, ..., g_R.
Step 2: Create an initial quantum bee colony randomly based on quantum coding.
Step 3: For all quantum bees, calculate the fitness (i.e., γ or η).

Step 4: Update each quantum bee's quantum velocity and quantum position through the
evolutionary processes of the three kinds of quantum bees.
Step 5: Update the local optimal position of each quantum bee. Update the global
optimal position of the whole quantum bee colony.
Step 6: If the maximum iteration is reached, stop and output the relay selection result;
if not, go to step 3.
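The following Python sketch condenses Steps 1-6 (our own simplified illustration: it keeps only the employed-bee and scout-bee updates of Eqs. (7)-(9) and omits the onlooker phase; parameter values follow the simulation settings, and `fitness` is any function mapping a 0/1 vector to γ or η, such as the sketch in Section 1):

import numpy as np

def qbco_relay_selection(fitness, R, h=20, e1=0.06, e2=0.03, c1=1/300, max_gen=100, limit=10):
    rng = np.random.default_rng()
    v = np.full((h, R), 1 / np.sqrt(2))              # quantum velocities (alpha values)
    x = (rng.random((h, R)) > v ** 2).astype(int)    # initial bit positions, Eq. (9)
    p = x.copy()                                     # local optimal bit positions
    p_fit = np.array([fitness(b) for b in p])
    g = p[np.argmax(p_fit)].copy()                   # global optimal bit position
    stall = np.zeros(h, dtype=int)
    for _ in range(max_gen):
        for i in range(h):
            theta = e1 * (p[i] - x[i]) + e2 * (g - x[i])                       # Eq. (7)
            mutate = (p[i] == x[i]) & (x[i] == g) & (rng.random(R) < c1)
            rotated = np.abs(v[i] * np.cos(theta) - np.sqrt(1 - v[i] ** 2) * np.sin(theta))
            v[i] = np.where(mutate, np.sqrt(1 - v[i] ** 2), rotated)           # Eq. (8)
            x[i] = (rng.random(R) > v[i] ** 2).astype(int)                     # Eq. (9)
            fit = fitness(x[i])
            if fit > p_fit[i]:
                p[i], p_fit[i], stall[i] = x[i].copy(), fit, 0
            else:
                stall[i] += 1
            if stall[i] > limit:                     # scout bee: random restart of the velocity
                v[i] = np.full(R, 1 / np.sqrt(2))
                stall[i] = 0
        g = p[np.argmax(p_fit)].copy()
    return g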

3. Simulation Results and Analysis

In this section, we show the simulated γ and η of the proposed QBCO scheme together with
the relay ordering multiple relay selection schemes (complexity R), the exhaustive search
scheme (complexity 2^R) and the single relay selection scheme (complexity 1).
In the simulation, all channels and noises at all of the relays and the destination are i.i.d.
complex Gaussian random variables with zero mean and unit variance.
For QBCO, we set the maximal generation to 100, h = 20, e_1 = 0.06, e_2 = 0.03 and
c_1 = 1/300 (the complexity is 100·20 without considering R).
Firstly, 15 relays are adopted in the simulation, all with the same power value
P_i. Figure 2 shows the simulation results. We can see that the SNR increases with the
power. From Figure 2(a), we can also see that the three relay ordering multiple relay
selection schemes perform almost the same: the multi-relay Best Worst Channel Selection
performs the worst, while the multi-relay SNR-based Selection performs the best among the
three relay ordering multiple relay selection schemes; however, QBCO performs better
than all of the relay ordering multiple relay selection schemes and matches the
exhaustive search. The gap between QBCO and the other schemes is obvious. Also, it
is obvious that multiple relays are much more effective than a single relay.
Then, setting the number of relays to 20, Figure 2(b) shows the simulation results.
From Figure 2(b), we can see that QBCO performs better than the other relay
selection schemes, and compared with Figure 2(a), we can see that when the relay
number increases, QBCO can still find a better solution than the other
algorithms.
[Figure 2: SNR γ versus transmit power P (W) for (a) 15 relays and (b) 20 relays, comparing the multi-relay QBCO, exhaustive search (in (a) only), SNR-based, Best Harmonic and Best Worst schemes and the single-relay SNR-based scheme.]
Figure 2. The comparison of SNR for QBCO scheme and other schemes
Now let us consider the power efficiency problem. Figure 3 shows the simulation
results.

[Figure 3: power efficiency η versus transmit power P (W) for (a) 15 relays and (b) 20 relays, comparing the multi-relay QBCO, SNR-based, Best Harmonic and Best Worst schemes and (in (a)) the exhaustive search.]
Figure 3. The comparison of power efficiency for QBCO scheme and other schemes
Figure 3(a) considers the case when the number of relays is 15, while Figure 3(b)
considers the case when the relay number is 20. Among the three relay ordering multiple
relay selection schemes, the Best Worst scheme performs the worst, while the SNR-
based scheme performs the best, similar to the behaviour in Figure 2. Our algorithm,
QBCO, performs better than the other relay selection schemes and has the same
performance as the exhaustive search when R = 15. Comparing Figure 3(a) with Figure 3(b),
we can see that as the relay number increases, the advantage of our algorithm is much more
obvious.
From Figure 2 and Figure 3, the differences between the QBCO multiple relay
selection scheme and relay ordering multiple relay selection schemes which maximize
SNR or power efficiency is obvious. And if the relay number increases, the advantage
of the QBCO-based multiple relay selection scheme is much more obvious.

4. Conclusions and Future Work

This paper has proposed two multiple relay selection schemes based on QBCO which
maximize the SNR and the power efficiency, respectively, in cooperative multiple relay
networks. The proposed schemes have a great advantage in terms of the SNR and power
efficiency targets compared with other schemes in the literature.

References

[1] 3GPP TR 36.814, Further Advancement for E-UTRA Physical Layer Aspects, v 1.5.2, Dec. 2009.
[2] N. Laneman, D. N. C. Tse, and G. W. Wornell, Cooperative diversity in wireless networks: efficient
protocols and outage behavior, IEEE Transactions on Information Theory, 50(2004), 3062-3080.
[3] A. Nosratinia, T. Hunter, and A. Hedayat, Cooperative communication in wireless networks, IEEE
Communications Magazine, 42(2004), 68-73.
[4] V.Sreng, H.Yanik, D.Falconer, Relayer Selection Strategies in Cellular Networks with Peer-to-Peer
Relaying, VTC 2003-Fall. 2003 IEEE 58th, 3(2003):1949-1953.
[5] A. Bletsas, A. Khisti, D. P. Reed, and A. Lippman, A simple cooperative diversity method based on
network path selection, IEEE Journal on Selected Areas in Communications,24(2006), 659-672.
[6] X. Lin and L. Cuthbert, Load Based Relay Selection Algorithm for Fairness in Relay Based OFDMA
Cellular Systems, in Wireless Communications and Networking Conference, 2009. WCNC 2009.IEEE,
2009, 1-6.

[7] H. Eghbali, S. Muhaidat, S. A. Hejazi and Y. Ding, Relay Selection Strategies for Single-Carrier
Frequency-Domain Equalization Multiple relay Cooperative Networks, in IEEE Transactions on
Wireless Communications, 12(2013), 2034-2045.
[8] T. Zhang, S. Zhao, L. Cuthbert and Y. Chen, Energy-efficient cooperative relay selection scheme in
MIMO relay cellular networks, in Proc. of IEEE Int’l Conf. on Commun. Systems (ICCS), 269-273,
Nov.2010.
[9] Y. Jing and H. Jafarkhani, Single and multiple relay selection schemes and their achievable diversity
orders, IEEE Trans.on Wireless Communication, 8(2009), 1414-1423.
[10] J. Kennedy and R. Eberhart, Discrete binary version of the particle swarm optimization, in Proc. IEEE
International Conference on Computational Cybernetics and Simulation, 4104-4108, 1997.
[11] H. Y. Gao, J. L. Cao, Quantum-inspired bee colony optimization algorithm and its application for
cognitive radio spectrum allocation, Journal of Central South University, 43(2012):4743-4749.
[12] Y. Jing and H. Jafarkhani, Network beamforming using relays with perfect channel information,
submitted for publication, 2006.
Fuzzy Systems and Data Mining II 321
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-321

A Speed up Method for Collaborative


Filtering with Autoencoders
Wen-Zhe TANG, Yi-Lei WANG 1 , Ying-Jie WU, Xiao-Dong WANG,
College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian
China

Abstract. Collaborative Filtering(CF) is a widely used technique in Recommender


System. With recent development in deep learning, Neural network based CF has
gained great attention in recent years, especially auto-encoders. However, the main
disadvantage of autoencoder-based CF is the problem of training on a large sparse target. In
this paper, we propose a training strategy to tackle this issue. We run experiments on
two popular real-world datasets, MovieLens 1M and MovieLens 10M. Experiments
show orders of magnitude speed-up while attaining similar accuracy compared to
existing autoencoder-based CF methods.

Keywords. Recommender System, Neural Network, Auto-encoder

Introduction

Collaborative Filtering(CF)[8] is a widely used technique in Recommender System. It


uses the full rating history to predict the unrated items. Among popular CF models, there
are mainly two types: memory-based CF and model-based CF[11]. Memory-based CF
needs to compute the similarity between users and items. It is easy to implement but fails
to scale to big data (due to its quadratic nature).
Model-based CF, in contrast, learns a model from the
rating history to predict the missing rating entries. Popular model-based CF approaches
like matrix factorization[6] compute latent factors from the sparse matrix of ratings. Re-
cent approaches include SVD[6], Alternating Least Squares with Regularization[15], and
Local Low-Rank Matrix Approximation[7].
With the development of deep learning, neural network based CF has gained atten-
tion in recent years. RBM-CF [10], a two-layer undirected graphical model, has shown
its power in the Netflix prize. But RBM-CF uses contrastive divergence[4] for training, which
is very slow when the data is large. Recently, a new approach called autoencoder-based CF
has achieved state-of-the-art performance. These networks are trained with sparse rating
data and aim to give a dense reconstruction of the sparse input data. [14] uses stacked de-
noising autoencoders and denoising losses to learn a low-dimensional representation of
the movies, and [12] uses the direct autoencoder loss to optimize the network; both give
near state-of-the-art accuracy.

1 Corresponding Author: Yi-Lei WANG, College of Mathematics and Computer Science, Fuzhou University,

Fuzhou, Fujian, China; E-mail: yilei@fzu.edu.cn.



The main disadvantage of Autoencoder-based CF method is the problem of training


on a large sparse target: the loss function of autoencoder-based CF is only evaluated on
known values, but known implementations often reconstruct the full rating matrix. This
makes training slow. Yet if we directly implement the sparse loss function, we cannot
take advantage of dense matrix computation. When the data is even slightly (>1%) dense,
the sparse implementation is slower than the corresponding dense one. This makes the ‘sparse’
implementation impractical, especially on GPU.
In this paper, we proposed a training strategy to tackle this issue. By exploiting
the sparsity structure, especially the power law distribution, we can greatly accelerate
the training process of Autoencoder-based CF. We run experiments on two popular real
world datasets. Experiments show orders of magnitude speed-up compared to [12]
and [14], while obtaining similar accuracy.

1. Related works

Speed up method for Collaborative Filtering: Most of the work done previously fo-
cused on memory-based CF. [2] designed a similarity metric optimized for hardware to
speed up the k-nearest neighbor computation. [9] uses a k-nearest neighbor graph to
retrieve the most similar user or item. Our work, different from the previous two, is a speed-
up method for neural network based CF.
Neural Network based CF: Restricted Boltzmann Machine (RBM) [10] CF was one
of the first neural network based CF models. Recently, autoencoder based CF[12,14], a
subclass of neural network based CF, has achieved state-of-the-art performance. But
autoencoder based CF trained with a large sparse target is often slow because it does not
exploit the sparsity structure of the ratings. Our approach solves this problem.

2. Autoencoders

An autoencoder[3] is a type of neural network used for learning efficient codings. The aim
of an autoencoder is to learn a representation from a set of data. A typical autoencoder uses a
narrow bottleneck layer to force a dimensionality reduction on the data. The network
consists of two parts:

Encoder:f (x) = σ(U x + b1 )


Decoder:g(y) = σ(V y + b2 )

where the input x ∈ RN , the output of the encoder y ∈ Rd , d is the size of the bottleneck
layer (d ≪ N ), U ∈ RN ∗d , V ∈ Rd∗N , b1 ∈ Rd , b2 ∈ RN , and σ() is a
nonlinear activation function. The objective of the autoencoder is to reconstruct the input,
g(f (x)) ≈ x, and the loss function is usually the squared loss

L(x) = ||nn(x) − x||2



3. Autoencoders for collaborative filtering

Given N users and M items and the sparse rating matrix R, the rating rij is the rating
of the ith user on the jth item. A user profile can be described by a sparse vector ui = the ith
row of R, and similarly, an item profile can be described by a sparse vector vj = the jth
column of R. The goal of collaborative filtering is to complete these sparse vectors. So
we have two autoencoders to complete R:

The User-Encoder: nn(ui ) = ûi


The Item-Encoder: nn(vj ) = vˆj

In the experiment, the Item-Encoder usually performs better than User-Encoder.


In the following section, we would mainly discuss the Item-Encoder. Because of the
huge numbers of parameters in Item-Encoder(2*M*d), we must use weight-decay, and
dropout[13] on the first layer to prevent overfitting.

4. Our method of speed up Autoencoders

Because data in recommender problems is usually sparse (<0.5% non-zeros), existing methods for training neural networks often 'pad' the missing values with zeros and mask them out of the loss; we call this the regular dense implementation. This method takes O(C1*N*M*d) operations.
There is another implementation, the sparse implementation. It uses the sparse matrix directly as input and computes the loss only on the known entries, but it is often slower than the dense method when the data contains dense rows. This method takes O(C2*N*K*d) operations, where K is the average number of non-zeros in each row. In practice, C2/C1 ≈ 100 on CPU and C2/C1 ≈ 500 on GPU, so the large constant of sparse operations cannot be ignored when K > 0.01M.
In real-world recommender data, the number of users who rated each item often follows a power-law distribution (see Figure 1): most items are rated by a very small subset of users (really sparse), but some popular items are rated by nearly every user. So, given a threshold T, we can split the matrix into two parts, R_thin and R_fat: the 'thin' matrix contains items rated by fewer than T users, and the 'fat' matrix contains items rated by more than T users. By the nature of the power-law distribution, the 'thin' matrix contains most of the rows.
If we set a proper T (0.01*M is usually good) and use sparse operations on R_thin and dense operations on R_fat, we obtain a very low total number of operations given the big difference in constants: we keep the benefit of sparse operations while avoiding their large constant on dense rows.
We use the Google TensorFlow [1] library to implement our acceleration method.

5. Experiments

5.1. Dataset Description

In this section, we test the speed and accuracy of I-CFAE on two real-world datasets, MovieLens 1M and MovieLens 10M. We randomly select 10% of the ratings for testing and 90% for training. Prediction errors are measured by the Root Mean Squared Error (RMSE):

Figure 1. Row density histogram of MovieLens 1M and MovieLens 10M (on log scale)

Algorithm 1 buildNetwork (assuming two existing ops SparseNetworkOp and DenseNetworkOp)
Input: sparse rating matrix R, threshold T
count = rowCount(R)
R_thin = R[count <= T]
R_fat = R[count > T]
Output_thin = SparseNetworkOp(R_thin)
Output_fat = DenseNetworkOp(R_fat)
Loss_total = Loss(Output_thin, R_thin) + Loss(Output_fat, R_fat)
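
As a concrete illustration, the following is a small NumPy/SciPy sketch of Algorithm 1. Here `network` stands in for any trained reconstruction model, and the helper names (`split_by_density`, `masked_dense_loss`, `sparse_row_loss`) are illustrative, not the authors' TensorFlow ops:

```python
import numpy as np
from scipy import sparse

def split_by_density(R, T):
    """Split a CSR rating matrix (one profile per row) into a 'thin' part
    (rows with at most T known ratings) and a 'fat' part (rows with more)."""
    counts = np.diff(R.indptr)                     # non-zeros per row
    return R[counts <= T], R[counts > T]

def sparse_row_loss(network, R_thin):
    """Sparse path: evaluate the squared loss row by row, only on known entries."""
    loss = 0.0
    for i in range(R_thin.shape[0]):
        row = R_thin.getrow(i)
        x = np.asarray(row.todense()).ravel()
        known = row.indices
        loss += np.sum((network(x)[known] - x[known]) ** 2)
    return loss

def masked_dense_loss(network, R_fat):
    """Dense path: pad missing ratings with zero and mask them out of the loss."""
    X = np.asarray(R_fat.todense())
    mask = (X != 0)
    return np.sum(mask * (network(X) - X) ** 2)

def total_loss(network, R, T):
    R_thin, R_fat = split_by_density(R.tocsr(), T)
    return sparse_row_loss(network, R_thin) + masked_dense_loss(network, R_fat)

# toy usage: a simple damping map stands in for the trained autoencoder
R = sparse.random(1000, 500, density=0.01, format="csr", random_state=0)
print(total_loss(lambda X: 0.9 * X, R, T=int(0.01 * R.shape[1])))
```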



RMSE = sqrt( (1/||R_test||) * Σ_{i=1}^{N} Σ_{j: x_test,i,j ≠ 0} ( nn(x_train,i)_j − x_test,i,j )^2 )

where ||R_test|| is the number of non-zero entries in the test set.
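
A small sketch of this evaluation, assuming the dense reconstructions for all test profiles are available as a NumPy array (the helper name is illustrative, not the authors' code):

```python
import numpy as np

def rmse_on_known(predictions, R_test):
    """RMSE computed only over the known (non-zero) entries of the sparse test matrix."""
    R_test = R_test.tocoo()
    err = predictions[R_test.row, R_test.col] - R_test.data
    return np.sqrt(np.mean(err ** 2))
```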


We report the average RMSE over 5 different train-test splits. We compare our model with strong baselines including RBM [10], ALS-WR [15], LLORMA [7], and the state-of-the-art Autoencoder-based CF models I-CFN [14] and I-AutoRec [12]. We also compare the running time with I-CFN [14] and I-AutoRec [12], both on CPU and GPU, reporting the average training time over 5 runs. Each program is run sequentially on an Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10 GHz to measure CPU time, and on an Nvidia GTX 980 to measure GPU time.

5.2. Experiment Results

5.2.1. Running time on CPU

Because most of the rows in MovieLens 1M and MovieLens 10M are extremely sparse (<1% non-zeros), our implementation gives an order-of-magnitude speed-up over I-CFN, and even a 3-4x speed-up compared to the dedicated sparse implementation I-AutoRec [12].

Table 1. Training time on CPU


DataSet Methods One Epoch Time Total Training Time
MovieLens 1M I-CFN 18.97s 647.5s
MovieLens 1M I-AutoRec 4.02s 182.4s
MovieLens 1M I-CFAE(Our) (T = 0.01*M) 1.50s 64.1s
MovieLens 10M I-CFN 361.12s 7212.1s
MovieLens 10M I-AutoRec 81.12s 7300.8s
MovieLens 10M I-CFAE(Our) (T = 0.01*M) 25.89s 1294.5s

5.2.2. Running time on GPU

Table 2. Training time on GPU (I-AutoRec has a CPU-only implementation)


DataSet Methods One Epoch Time Total Training Time
MovieLens 1M I-CFN 3.51s 123.12s
MovieLens 1M I-AutoRec N/A N/A
MovieLens 1M I-CFAE(Our)(T = 0.01*M) 0.50s 40.1s
MovieLens 1M I-CFAE(Our)(T = 0.005*M) 0.40s 32.4s
MovieLens 10M I-CFN 57.02s 1140.34s
MovieLens 10M I-AutoRec N/A N/A
MovieLens 10M I-CFAE(Our)(T = 0.01*M) 18.27s 548.1s
MovieLens 10M I-CFAE(Our)(T = 0.005*M) 16.17s 485.10s

Because of the huge advantage of the GPU on the dense implementation, we must lower the threshold from T = 0.01*M to T = 0.005*M to obtain good results. Since the sparse computation can only be done on the CPU, the data must be transferred between CPU and GPU, so the speed-up ratio is lower than on the CPU, but it is still 2-4x faster than the fully dense implementation (I-CFN).

5.3. Test Set RMSE on Movielens 1M

On the MovieLens 1M dataset, we use 5% of the data for cross-validation to select the hyperparameters (learning rate, learning-rate decay, weight decay, dropout) of I-CFAE. We use tanh as the activation function and the Adam [5] optimizer to optimize the loss. The bottleneck layer size is set to 768 (the same as I-CFN). As shown in Table 3, our method achieves an RMSE similar to the other Autoencoder-based methods and outperforms the non-autoencoder-based methods.

Table 3. Test Set RMSE on MovieLens 1M
Method Test set RMSE
RBM 0.854
ALS-WR 0.843
LLORMA 0.837
I-CFN 0.838
I-AutoRec 0.831
I-CFAE(Our) 0.836

Table 4. Test Set RMSE on MovieLens 10M
Method Test set RMSE
RBM 0.825
ALS-WR 0.783
LLORMA 0.794
I-CFN 0.776
I-AutoRec 0.782
I-CFAE(Our) 0.779

5.4. Test Set RMSE on MovieLens 10M

MovieLens 10M is a much bigger dataset than MovieLens 1M, so we use much lower weight decay and learning rates for it. On MovieLens 10M our model beats strong baselines such as ALS-WR and even I-AutoRec, and its performance is similar to I-CFN.

6. Conclusions

In this paper, we introduced a method to speed up the training of autoencoders with large sparse targets. We analyzed the time complexity and constant factors of two popular autoencoder implementations, as well as the distribution of the rating data. Finally, by properly separating the sparse and dense parts and using a different implementation for each, we achieved an orders-of-magnitude speed-up compared to existing autoencoder-based CF implementations, on both CPU and GPU, while attaining similar accuracy.

References

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean,
M. Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous systems, 2015. Software
available from tensorflow.org, 1, 2015.
[2] J. Bobadilla, F. Ortega, A. Hernando, and G. Glez-de Rivera. A similarity metric designed to speed up,
using hardware, the recommender systems k-nearest neighbors algorithm. Knowledge-Based Systems,
51:27–34, 2013.
[3] L. Deng, M. L. Seltzer, D. Yu, A. Acero, A.-r. Mohamed, and G. E. Hinton. Binary coding of speech
spectrograms using a deep auto-encoder. In Interspeech, pages 1692–1695. Citeseer, 2010.
[4] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural computation,
14(8):1771–1800, 2002.
[5] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
2014.
[6] Y. Koren, R. Bell, C. Volinsky, et al. Matrix factorization techniques for recommender systems. Com-
puter, 42(8):30–37, 2009.
[7] J. Lee, S. Kim, G. Lebanon, and Y. Singer. Local low-rank matrix approximation. ICML (2), 28:82–90,
2013.
[8] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering.
IEEE Internet computing, 7(1):76–80, 2003.
[9] Y. Park, S. Park, W. Jung, and S.-g. Lee. Reversed cf: A fast collaborative filtering algorithm using a
k-nearest neighbor graph. Expert Systems with Applications, 42(8):4022–4028, 2015.
[10] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative filtering.
In Proceedings of the 24th international conference on Machine learning, pages 791–798. ACM, 2007.
[11] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation
algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295.
ACM, 2001.
[12] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. Autorec: Autoencoders meet collaborative filtering. In
Proceedings of the 24th International Conference on World Wide Web, pages 111–112. ACM, 2015.
[13] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way
to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958,
2014.
[14] F. Strub and J. Mary. Collaborative filtering with stacked denoising autoencoders and sparse inputs. In
NIPS Workshop on Machine Learning for eCommerce, 2015.
[15] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the
netflix prize. In International Conference on Algorithmic Applications in Management, pages 337–348.
Springer, 2008.
Fuzzy Systems and Data Mining II 327
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-327

Analysis of NGN-Oriented Architecture for Internet of Things

Wei-Dong FANG a, Wei HE a, Zhi-Wei GAO b,1, Lian-Hai SHAN c,d and Lu-Yang ZHAO a
a Key Laboratory of Wireless Sensor Network & Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201899, China
b Ceprei Certification Body, the Fifth Electronics Research Institute of Ministry of Industry and Information Technology, Guangzhou 510610, China
c Shanghai Internet of Things Co., Ltd., Shanghai 201899, China
d Shanghai Research Center for Wireless Communications, Shanghai 200335, China

Abstract. As one of the most important and influential technologies of the 21st century, the emergence and development of the Internet of Things (IoT) have opened a new field for the research and application of information technology. In order to facilitate the design of IoT systems, this paper reviews the architecture of IoT, especially NGN-oriented (Next Generation Network) architecture. Firstly, the applications of IoT are identified and summarized. Then, a holistic overview of NGN-oriented IoT architectures is given; these architectures are divided into four categories: based on Ubiquitous Sensor Networks (USN), based on the Open System Interconnection (OSI) model, methodology-based, and converged-network-oriented. Furthermore, the proposed architectures, techniques and approaches in each category are analyzed qualitatively. Finally, a few research issues and near-future directions are pointed out, and conclusions are given.

Keywords. Internet of Things (IoT), architecture, Next Generation Network


(NGN), Open system Interconnection (OSI).

Introduction

As a multi-disciplinary research direction, the Internet of Things has attracted attention from both industry and academia, and is considered one of the most important and influential technologies of the 21st century. The Internet of Things was defined by the International Telecommunication Union (ITU) in 2005 as: "The connectivity for anything by embedding mobile transceivers into a wide array of additional gadgets and everyday items, enabling new forms of communication between people and things, and between things themselves." At present, through various integrated micro-sensors, the Internet of Things can sense, process and transmit diverse information, which is then converged through self-organizing, multi-hop relay communication networks. Through a variety of standard

1
Corresponding Author: Zhi-Wei GAO, No.110 Dong Guan Zhuang Road, Tianhe District, Guangzhou
510610, Guangdong Province, China; E-mail: Gaozw@ceprei.org.

transmission networks (or access networks), the Internet of Things can deliver this information to the user's terminal, such as a personal computer, a pad, a data server and so on, in order to meet the requirements of various applications and realize the goal of ubiquitous computing.
Currently, the IoT is gradually being applied in many fields, including the smart city and digital city [1], intelligent transportation [2] and so on. Additionally, the IoT is being studied by many institutes and standardization organizations. Recently, at the Third-Generation Partnership Project's (3GPP's) Radio Access Network Plenary Meeting 69, it was decided to standardize the Narrow-Band Internet of Things (NB-IoT) [3]. This standardization will focus on providing improved indoor coverage, support for a massive number of low-throughput devices, low latency sensitivity, ultra-low device cost, low device energy consumption, and an optimized network architecture.
Although IoT standardization has been launched and its applications have been deployed in many fields, the Next Generation Network (NGN) is a converged heterogeneous network with various topologies and standards, multiple co-existing networks, and superior fault tolerance and resilience, and at present there is no objective IoT architecture for the NGN. In this paper, a holistic overview of NGN-oriented IoT architectures is given in Section 1. These architectures are divided into four categories: based on Ubiquitous Sensor Networks (USN), based on the Open System Interconnection (OSI) model, methodology-based, and converged-network-oriented; along the way, the pros and cons of the proposed architectures in each category are analyzed qualitatively. Section 2 then points out some key technologies for the near future, and lastly the conclusions are presented.

1. NGN-Oriented IoT Architecture

The Next Generation Network (NGN) is a broad concept that involves a variety of changes in how networks are constructed. At present, research on designing and proposing architectures for the Internet of Things falls mainly into four categories: based on the USN high-level architecture, referencing the OSI model, based on logic or semantics (methodology-based), and converged-network-oriented.

1.1. USN-based

The Ubiquitous Network (UN) was defined in ITU-T Recommendation Y.2002 [4] as: "The ability for persons and/or devices to access services and communicate while minimizing technical restrictions regarding where, when and how these services are accessed, in the context of the service(s) subscribed to."
At present, industry generally believes that "IoT + Internet" is almost equivalent to the ubiquitous network. It is defined by its services, which refer to information perception, transmission, storage, cognition, decision-making and use among people, objects and things, based on the needs of individuals and society. The ubiquitous network is environment-aware, content-aware and intelligent, and provides pervasive information services and applications for individuals and society.
As an important part of ubiquitous networks, the high-level architecture of ubiquitous sensor networks was proposed in ITU-T Recommendation Y.2221 [5]. As shown in Figure 1, this high-level architecture is divided into five parts: the underlying sensor network, the ubiquitous sensor

networks, access networks, the ubiquitous sensor network infrastructure backbone network and the sensor network middleware. Its greatest feature is that the layered USN architecture relies on the NGN architecture: in the place closest to the user, smart things form a ubiquitous network environment that provides diverse services, and the NGN acts as the core infrastructure for the ubiquitous sensor networks.

Figure 1. High-level architectural model for ubiquitous networking in NGN

1.2. OSI model-based

Generally, the system architecture based on the Open System Interconnection (OSI) model is designed as three layers: the Perception Layer, the Network Layer and the Application Layer. This design comes mostly from the requirements of industrial applications, and the interfaces between the different layers (data interfaces or physical interfaces) are seldom mentioned. Wu [6] proposed a new IoT architecture based on the three-layer model that could better explain the features and connotations of the IoT; the difference is that the Application Layer is refined into a processing layer, an application layer and a business layer.
Figure 2. A spatial architecture model of Internet of Things



In addition, Fang [7] proposed a spatial architecture model of IoT based on a triangular pyramid. As shown in Figure 2, the spatial architecture model is made up of "Sensing and Controlling", "Ubiquitous Transmission" and "Diversified Requirements and Applications". In this architecture, the OSI model and the diverse applications can be converged better.

1.3. Methodology-based

To abstract the heterogeneity of devices, Kiljande [8] presented a novel semantic-level interoperability architecture for pervasive computing and IoT. This architecture had two main principles. The first was that the information and capabilities of devices were represented with semantic-web knowledge representation technologies, and interaction with devices and the physical world was achieved by accessing and modifying their virtual representations. Second, the global IoT was divided into numerous local smart spaces, each managed by a Semantic Information Broker (SIB), which provided a means to monitor and update the virtual representation of the physical world.
To connect things with each other, or users with the physical world, Pu [9] proposed an intelligent interaction architecture based on context fusion in IoT.
In addition, since services in IoT have four characteristics that differ from traditional Internet services (environment perception, event-driven behavior, service coordination and active execution), Lan [10] proposed an Event-Driven Service-Oriented Architecture (EDSOA) for IoT that can support real-time, event-driven and active service execution. Bergmann and Robinson [11] proposed the Server-Based Internet-of-Things Architecture (SBIOTA).

1.4. Converged network- oriented

The convergence of heterogeneous networks has created a huge potential for new business. To fully realize this potential, a common way to design the architecture is needed, so converged-network-oriented architectures have received much attention.

Figure 3. Enhanced IMS QoS architecture of converged IoT and 3GPP LTE-A network

Yang [12] proposed an enhanced IMS (IP Multimedia Subsystem) QoS (Quality of Service) architecture to support the convergence of the IoT and the 3GPP (Third Generation Partnership Project) LTE-A (Long Term Evolution-Advanced) network. This architecture can provide flexible services with dynamic requirements to both the IoT and the LTE-A network. In Figure 3, higher-layer connections among all MTC (Machine Type Communication) devices are provided by attaching them to fixed or mobile stations. In addition, Zhang and Liang [13] proposed an architecture for the converged IoT based on the VN (Vector Network).
From the application point of view, most deployed IoT systems have a three-layer architecture: sensing, transmission and application. Information is sensed in the front-end networks, which have complex and dynamic topologies, such as tree/cluster, star, peer-to-peer, mesh and so on. This information is collected via base stations (BS) or sinks and transmitted into the backbone/core network. Finally, via edge routers and firewalls, this information is used by end users.
The convergence of heterogeneous networks is the IoT's evolution trend, because 2G/3G/4G, WiFi, WSN and ad hoc networks co-exist. In fact, there is already some specialized research along the ITU's technology roadmap for the Internet of Things; however, achieving communication between persons and things, and between things and things, which is an important function of the IoT, still requires research efforts from all parties.

2. Some Key Technologies in the Near Future

In this section, we briefly present some key technologies for the design of the IoT architecture and analyze why they are necessary.

2.1. Low Complexity Security Technology

Cognizing, sensing and controlling in the IoT depend on many different types of sensors and electronic tags. These sensor nodes have the following characteristics:
• Limited or no power supply
• Restricted computing and storage capability
• Small user interface
• Tiny volume and limited communication range
• Open and diverse application scenarios
These inherent constraints give security in the IoT its particular character:
• Resource consumption must be minimized while security performance is maximized.
• WSN deployment exposes the network to more link attacks, ranging from passive eavesdropping to active interference.
• In-network processing involves intermediate nodes in end-to-end information transfer.
• Wireless communication characteristics render traditional wired-based security schemes unsuitable.
• Large scale and node mobility make the problem more complex.
• Node addition and failure make the network topology dynamic.

Although there is much research in this field, these security methods and algorithms seldom consider implementation complexity and energy consumption. As mentioned above, the limited power supply is one of the most important constraints, and the complexity of the algorithm is directly related to it. We have to balance low algorithmic complexity against security requirements, so there is still a long way to go.

2.2. Interface Technology of Ubiquitous Network

Heterogeneity is an important feature of ubiquitous transmission; the diversity of network standards and topologies is a good example. The ubiquitous transmission of the IoT differs from common digital transmission in that it needs low latency and QoS guarantees. On the one hand, the interfaces become key points of data congestion because of the scheduling policies and priority management of the different standard networks. On the other hand, the electrical characteristics of different interfaces also contribute significantly to the forwarding latency.
The interface technology of the ubiquitous network includes not only unified electrical interface specifications, but also the optimization of interface protocols. This is not a small problem: as we all know, the Internet does not provide Quality of Service (QoS) guarantees, although some access networks can, such as 3G and HSDPA (High Speed Downlink Packet Access). There is no denying that the research and standardization of the interface technology for the IoT's ubiquitous transmission are important issues under the premise of guaranteeing quality of service.

3. Conclusions

Many researchers have long focused on the design of system architectures, especially the design of the IoT architecture. In February 2004, ITU-T SG13 decided to standardize the Next Generation Network and its architecture. In the last ten years, many heterogeneous networks have continued to emerge. As the transmission networks for sensed information, they face the challenges of complex scenarios and unattended environments in the IoT. On the other hand, a rational design of the architecture helps to resolve the transmission bottleneck. At the front end of an IoT system, information is collected by many wireless sensor nodes; in the wireless sensor network, the transmission bottleneck can unbalance the load of the entire network, while the sink nodes' energy is consumed rapidly, which shortens the lifetime of the entire wireless sensor network. In this paper, we review the NGN-oriented architectures of the IoT and present some conclusions and open research issues to facilitate the design of IoT systems.
According to the ITU's description, "Interconnect Any Thing" is the expansion of capacity, services and applications in the next generation network. Therefore, we recommend that the Internet of Things be adopted into the research field of the next generation network and implemented in its technology development roadmap, relying on its existing research achievements. In the near future, we will further research the functional architecture, the system framework and specific configuration models of the Internet of Things.

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China
(No. 61471346, No. 61302113), the Science and Technology Service Network Program
of Chinese Academy of Sciences (No. kfj-sw-sts-155), the Shanghai Municipal Science
and Technology Committee Program (No. 15DZ1100400), the Shanghai Natural
Science Foundation (No. 14ZR1439700), the National Science and Technology
Infrastructure Program (No. 2015BAH26F00) and the NSFC-Guangdong Joint Fund
(No. U1401257).

References

[1] A. Monzon. Smart cities concept and challenges: Bases for the assessment of smart city projects. In:
Proceeding SMARTGREENS, Lisbon, Portugal, (2015), 1-11.
[2] M. Jiang, H. Liu, L. Niu. The Evaluation Studies of Regional Transportation Accessibility Based on
Intelligent Transportation System: Take the Example in Yunnan Province of China. In: Proceeding
ICITBS, Halong Bay, Vietnam. (2015), 862 - 865,
[3] J. Gozalvez. New 3GPP Standard for IoT. IEEE VEH TECHNOL MAG. 11 (2016), 14-20.
[4] ITU-T. Recommendation Y. 2002. Overview of ubiquitous networking and of its support in NGN.
Geneva: ITU, (2010).
[5] ITU-T. Recommendation Y.2221. Requirements for support of ubiquitous sensor network (USN)
applications and services in NGN environment. Geneva: ITU, (2010).
[6] M. Wu, T. J. Lu, F. Y. Ling, J. Sun, H. Y. Du. Research on the architecture of Internet of Things. In:
Proceeding ICACTE, Chengdu, China, (2010), V5-484 - V5-487.
[7] W. Fang, L. Shan, Z. Shi, G. Jia, X. Wang. A Spatial Architecture Model of Internet of Things Based on
Triangular Pyramid, Lecture Notes in Electrical Engineering, Springer, 237 (2014), 825-832.
[8] J. Kiljande, A. D'Elia, F. Morandi, P. Hyttinen, J. Takalo-Mattila. Semantic Interoperability Architecture
for Pervasive Computing and Internet of Things. IEEE ACCESS, 2(2014), 856 – 873.
[9] H. Pu, J. Lin, F. Liu, L. Cui. An intelligent interaction system architecture of the internet of things based
on context. In: Proceeding ICPCA, Maribor, Slovenia, (2010), 402 – 405.
[10] L. Lan, F. Li, B. Wang, L. Zhang, R. Shi. An Event-Driven Service-Oriented Architecture for the
Internet of Things, In: Proceeding APSCC, Fuzhou, China, (2014), 68 – 73.
[11] N. W. Bergmann, P. J. Robinson. Server-based Internet of Things Architecture. In: Proceeding IEEE
CCNC, Las Vegas, NV, USA, (2012), 360 – 361.
[12] S. Yang, X. Wen, W. Zheng, Z. Lu. Convergence architecture of Internet of Things and 3GPP LTE-A
network based on IMS. In: Proceeding GMC, Shanghai, China, (2011), 1 – 7.
[13] J. Zhang, M. Liang. A New Architecture for Converged Internet of Things. In: Proceeding ITAP,
Wuhan, China, (2010), 1-4.
334 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-334

Hypergraph Spectral Clustering via Sample Self-Representation
Shi-Chao ZHANG1, Yong-Gang LI, De-Bo CHENG, Zhen-Yun DENG
Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal
University, Guilin, Guangxi, 541004, China

Abstract. Traditional clustering methods cluster data with a pairwise graph and usually suffer from information loss. In this paper, we propose a novel spectral clustering method that combines a hypergraph with sample self-representation. Specifically, the proposed algorithm employs a sample self-representation loss function based on the ℓ2,1-norm, which is row sparse, to weaken the effect of noise. A hypergraph regularization term is then imposed via the hypergraph Laplacian, which fully considers the complex similarity relationships of the data. Experimental results on benchmark datasets indicate that the proposed algorithm prominently outperforms state-of-the-art algorithms such as SRC and LSR in terms of clustering error (CE).

Keywords. spectral clustering, hypergraph, row-sparse, sample self-representation,


hypergraph Laplacian

Introduction

Clustering is a fundamental technique in many aspects of computer vision and machine learning [1]. Its purpose is to group data so that samples in the same cluster reveal consistent relationships, while objects not belonging to the same cluster do not [2, 3]. Traditional clustering methods [4-7] usually assume pairwise relationships among the samples. However, in many real problems, characterizing the relationships among data points merely by pairwise simple graphs is incomplete and usually results in the loss of high-order local information. To solve this problem, in this article we propose a hypergraph spectral clustering method based on sample self-representation (HGSR) to preserve the high-order information of the samples and ultimately improve the clustering performance.
A hypergraph is a generalization of a graph [8]. Different from a graph, which is constructed from vertices and edges, a hypergraph is constructed from vertices and hyper-edges. Moreover, like a graph, a hypergraph can be asymmetric (or symmetric). The hyper-edges of a hypergraph contain two or more vertices; that is, a hyper-edge is an arbitrary subset of the vertex set rather than just the two vertices connected by an edge of a graph. In this way, more information (high-order local relations) is introduced into the clustering model, and so the hypergraph can improve the clustering

1
Corresponding Author: Shi-Chao Zhang, Guangxi Normal University, Guilin, China; E-mail:
zhangsc@gxnu.edu.cn.

results. In the following, unless otherwise stated, we use 'graph' to denote the simple graph.
The HGSR algorithm can be described in detail as follows. Firstly, we construct a hypergraph that fully considers the relations among samples and obtain the hypergraph Laplacian matrix. Secondly, we conduct row-sparse self-representation for all samples using an ℓ2,1-norm loss function, and meanwhile put the hypergraph Laplacian into the regularization to preserve the local structure of each sample; in this way, similar samples are clustered into the same cluster. Finally, we obtain an affinity matrix and conduct clustering on it.
The contributions of this work are summarized as follows:
• By imposing the hypergraph in the regularization, more information is introduced into the clustering model. In particular, the hypergraph Laplacian uses higher-order relationships to preserve the local structure and thus improves the clustering performance.
• HGSR utilizes the ℓ2,1-norm, i.e., ||X − XZ||_{2,1}, to measure the sample self-representation error. It is row sparse, and the representation coefficients of every sample depend on all the other samples, so that HGSR is robust to noise and outliers.
• Experimental results on benchmark datasets (face image clustering, motion segmentation, etc.) show that HGSR surpasses state-of-the-art algorithms such as LSR and SSC.

1. Proposed algorithm

1.1. Notations

Throughout the paper, lowercase letters and bold italic capital symbols are used to denote vectors and matrices, respectively. tr(A) is the trace of a square matrix A. A^T and A^{-1} represent the transpose and the inverse of A, respectively. [A]_j represents the j-th column of A, and the i-th row of A is represented as [A]^i. ||A||_1 = Σ_{j=1}^{n} Σ_{i=1}^{n} |A_ij|, ||A||_F and ||A||_{2,1} respectively represent the ℓ1-norm, the Frobenius norm and the ℓ2,1-norm (Σ_j ||[A]_j||_2) of A. rank(Z) and ||Z||_* respectively denote the rank and the sum of the singular values (the nuclear norm) of Z.

1.2. Hypergraph

We define a hypergraph as a triple G_H = (V, E, w), where V = {x_1, x_2, ..., x_n} ∈ R^{d×n} represents the data points, E represents the hyper-edges (a collection of nonempty subsets of V), and w assigns a weight w(e) to each hyper-edge e. A hypergraph is the general form of a graph: a hyper-edge can contain an arbitrary number of data points rather than a pair of data points. The degree of a hyper-edge e is the number of vertices it contains, denoted by δ(e). The degree of a data point v_i ∈ V is defined as d(v) = Σ_{e∈E: v∈e} w(e). D_e, D_v and W_H denote the diagonal matrix forms of δ(e), d(v) and w(e), respectively. Fig. 1

is a sample hypergraph of some animals from the Zoo dataset [15], with the hyper-edges

E = { e1 = {v1, v2, v9, v10}, e2 = {v5, v8, v12}, e3 = {v5, v6, v7, v11}, e4 = {v2, v3, v4, v6, v10} }

and the corresponding incidence matrix (rows v1..v12, columns e1..e4):

      e1  e2  e3  e4
v1     1   0   0   0
v2     1   0   0   1
v3     0   0   0   1
v4     0   0   0   1
v5     0   1   1   0
v6     0   0   1   1
v7     0   0   1   0
v8     0   1   0   0
v9     1   0   0   0
v10    1   0   0   1
v11    0   0   1   0
v12    0   1   0   0

Figure 1. Hypergraph of the animals from the Zoo dataset.


In addition, the hypergraph GH also could be represented by an affinity matrix
H  R|V |u| E| (see right of Fig. 1), where |V | and |E| represent the number of vertices and
hyper-edges, respectively. The affinity matrix characterizes relationships of vertices
and edges rather than pairwise vertices, and the elements of it are defined as follows:
­1 if (v  e)
h ( v, e) ® (1)
¯0 otherwise
From the definition of H, we know that d (v) ¦ eE
w(e) h(
h(v, e) and
G (e) ¦ vV
h(v, e) .
The affinity matrix H can completely depict the peculiarities of a hypergraph: its elements characterize the relations between vertices and hyper-edges. Just as graph-based spectral clustering constructs a Laplacian matrix L = D − W to conduct the clustering procedure, hypergraph-based spectral clustering methods need a hypergraph Laplacian matrix and an adjacency matrix to cluster the data points. Specifically, we utilize the recent method in [9] to construct the hypergraph Laplacian. The normalized Laplacian matrix of a hypergraph H in this article is defined as

L̂_H = I_{|V|} − D_v^{-1/2} H W_H D_e^{-1} H^T D_v^{-1/2},

where D_v is the diagonal vertex degree matrix and D_e is the diagonal hyper-edge degree matrix. In this paper, W_H is a diagonal matrix with all diagonal elements equal to one.
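
To make the construction concrete, the following NumPy sketch builds this normalized Laplacian from an incidence matrix, using the Zoo example of Figure 1; the function name is illustrative and not the authors' code:

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for an incidence matrix H (|V| x |E|)."""
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else np.asarray(w, dtype=float)   # hyper-edge weights
    d_v = H @ w                          # vertex degrees d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)                  # hyper-edge degrees delta(e) = sum_v h(v, e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    return np.eye(n_v) - Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt

# incidence matrix of the Zoo example in Figure 1 (rows v1..v12, columns e1..e4)
H = np.array([[1,0,0,0],[1,0,0,1],[0,0,0,1],[0,0,0,1],[0,1,1,0],[0,0,1,1],
              [0,0,1,0],[0,1,0,0],[1,0,0,0],[1,0,0,1],[0,0,1,0],[0,1,0,0]], float)
L_H = hypergraph_laplacian(H)            # 12 x 12 normalized hypergraph Laplacian
```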
Recent studies [10] indicate that the local structure of the data is beneficial to clustering analysis. The local geometric structure of the data tends to be the local neighborhood relationship of the samples, which can be represented by the k-nn graph of each sample. Intuitively, similar samples should have similar representation coefficients. We combine this idea with the hypergraph and define the hypergraph-based regularization as follows:

K(Z) = (1/2) Σ_{e∈E} Σ_{x_i, x_j ∈ V(e)} [ w(e) h(x_i, e) h(x_j, e) / δ(e) ] ||z_i − z_j||_2^2
     = tr(Z L̂_H Z^T) = tr( Z ( I_{|V|} − D_v^{-1/2} H W_H D_e^{-1} H^T D_v^{-1/2} ) Z^T )   (2)
where W_H is the weight matrix of the hyper-edges (we set the weight of each hyper-edge to 1), Z is the representation coefficient matrix, H ∈ R^{|V|×|E|} is the affinity matrix of the hypergraph, L̂_H is the hypergraph Laplacian matrix, and V(e_i) represents the vertices belonging to the hyper-edge e_i. The regularization K(Z) ensures that similar samples x_i and x_j have similar or equal representation coefficients z_i and z_j.

1.3. Proposed algorithm

In general, we expect the clustering model to satisfy the following properties: the sample self-representation coefficients of similar data points should also be similar, and the model should be robust to noise and outliers. However, existing methods are usually sensitive to noise and outliers, and some relations between samples are lost. For instance, the model in LSR [5] is

min_Z J(Z) = ||X − XZ||_F^2 + λ tr(ZZ^T) = ||X − XZ||_F^2 + λ [ (1/(2n)) Σ_{i=1}^{n} Σ_{j=1}^{n} ||z_i − z_j||_2^2 + (1/n) ||Z^T e||_2^2 ]   (3)
where e is the all-ones vector. This model assigns equal weights to all representation coefficients, so whether the representation coefficients are similar to each other is neglected. Considering that the ℓ2,1-norm is robust to outliers [11], we utilize it as the loss function, and the hypergraph Laplacian is then used to constrain the representation coefficients Z via Eq. (2), which ensures that the representation coefficients of similar samples are also similar. Finally, the objective function of HGSR is
min_Z J(Z) = ||X − XZ||_{2,1} + λ tr(Z L̂_H Z^T)   (4)
where L̂_H is the hypergraph Laplacian of the hypergraph G_H. Since the loss function of Eq. (4) is not quadratic, outliers become less important than they are under the squared Frobenius loss ||X − XZ||_F^2. With the help of the hypergraph-based trace constraint on Z, local information is introduced into the model to make sure the representation coefficients of similar samples are similar, which ultimately improves the clustering performance.
As in SRC and related methods, after the optimal coefficient matrix Z* is obtained, we use the following affinity matrix to conduct spectral clustering [12]:

W = ( |Z*| + |Z*^T| ) / 2   (5)
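
Assuming the construction commonly used by self-representation clustering methods (the symmetrized affinity (|Z*| + |Z*^T|)/2 of Eq. (5), followed by off-the-shelf spectral clustering), a minimal sketch could look as follows; scikit-learn is an assumption here, since the paper does not name its spectral clustering implementation:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(Z_star, n_clusters):
    """Build the affinity matrix of Eq. (5) from the learned coefficients Z*
    and run spectral clustering on it."""
    W = 0.5 * (np.abs(Z_star) + np.abs(Z_star).T)        # symmetric, non-negative affinity
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    return sc.fit_predict(W)
```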

2. Experimental Analyses

2.1. Experimental Data sets and Evaluation Criterion

We compared HGSR with recent graph-based spectral clustering algorithms such as LSR and SSC on the tasks of motion segmentation (Hopkins155 [7]), face clustering (Extended Yale Face Database B [13] and ORL [14]) and animal clustering (Zoo dataset [15]). All of these datasets are commonly used benchmarks for evaluating spectral clustering algorithms. The details of the datasets are as follows:
Hopkins 155 [7] includes 155 video sequences. Every sequence contains two or three motions, and every sequence is a separate clustering task.

Extended Yale Face B [14] is a face clustering dataset. It includes 16128 grayscale photographs of 28 persons under different poses and 64 lighting conditions.
ORL [14] is also a face clustering database. There are ten different images of each of 40 distinct persons, i.e., 400 face images in total.
Zoo [15] is an animal database. It contains 101 instances, and each sample has 17 Boolean-valued attributes.
As with the other algorithms, we use the clustering error (CE) [6] to evaluate all the algorithms. Through an optimal transformation, CE achieves the least error by matching the clustering result to the ground truth. It is defined as:

CE = 1 − (1/N) Σ_{i=1}^{N} δ( Er_i, map(Tl_i) )   (6)

where Er_i and Tl_i respectively represent the ground-truth label and the resulting label of the i-th sample. In Eq. (6), δ(x, y) = 1 when x = y, and δ(x, y) = 0 otherwise. The smaller the value of CE, the better the method. The mapping function map(·) rearranges the clustering result to match the original labels, and it can be computed by the Kuhn-Munkres algorithm [7].
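
A compact sketch of this metric, using SciPy's implementation of the Hungarian (Kuhn-Munkres) algorithm; the exact implementation used by the authors is not specified, so this is only illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(true_labels, pred_labels):
    """CE of Eq. (6): find the cluster-to-class mapping with maximum agreement
    via the Kuhn-Munkres algorithm, then return 1 - accuracy."""
    true_labels, pred_labels = np.asarray(true_labels), np.asarray(pred_labels)
    classes, clusters = np.unique(true_labels), np.unique(pred_labels)
    # contingency[i, j] = number of samples of true class i assigned to predicted cluster j
    contingency = np.array([[np.sum((true_labels == c) & (pred_labels == k))
                             for k in clusters] for c in classes])
    rows, cols = linear_sum_assignment(-contingency)      # maximize matched samples
    return 1.0 - contingency[rows, cols].sum() / len(true_labels)
```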

2.2. Experimental Results

In the experiments, we constructed a hypergraph for each dataset, in which the samples were regarded as vertices. We assign all hyper-edges equal weight; how to choose suitable weights for the hypergraph needs further research.
In Table 1, we show the CE of each method on the Hopkins 155 dataset. In order to comprehensively show the effectiveness and efficiency of the algorithms, we evaluate each algorithm in four respects, i.e., maximum, mean, minimum and standard deviation. The results indicate that HGSR achieves the smallest mean CE, i.e., 2.48%, while the best mean CE achieved by the other algorithms is 3.35%, obtained by GSR. The standard deviation of HGSR is also smaller than the others, which indicates the stability of HGSR. Because many sequences can be clustered easily, all the algorithms achieve zero error on those sequences. For a fair comparison, we also compare HGSR with graph and sample self-representation based clustering (GSR), which differs only in the Laplacian matrix construction, i.e., HGSR is based on the hypergraph Laplacian while GSR is based on the graph Laplacian.
Table 1. The CE (%) achieved by each algorithm on Hopkins dataset.

LSR SRC SSC LRR GSR HGSR


Max 39.71 46.70 46.97 47.64 38.86 42.37
Mean 4.21 4.24 3.92 5.14 3.35 2.48
Min 0 0 0 0 0 0
STD 8.60 9.80 7.61 10.04 7.7 6.4

Table 2 shows the clustering errors on Extended Yale Face B, ORL and Zoo. In summary, the improvement of HGSR over the other methods is noteworthy. It can also be seen that the hypergraph-based HGSR is better than the graph-based GSR. On each dataset, the clustering error of graph-based GSR is smaller than that of the other four graph-based methods, which indicates that the ℓ2,1-norm is much more robust to noise than the F-norm.
Table 2. The CE (%) achieved by each algorithm on Extended Yale Face B, ORL and Zoo.

LSR SRC SSC LRR GSR HGSR


Yale Face B 48.81 26.56 35.00 27.50 25.94 25.00
ORL 22.50 21.25 53.75 22.25 19.50 17.25
Zoo 45.54 31.68 31.68 30.69 25.74 24.75

3. Conclusion

In this paper, we introduced the hypergraph into the sample self-representation model, so that the relations among the samples are fully considered. HGSR addresses the problem of information loss and makes the clustering model robust to noise and outliers. Experimental results demonstrate that our spectral model prominently outperforms previous self-representation based clustering algorithms on benchmark datasets and real problems.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China
(Grants No: 61263035, 61573270, and 61672177), the China Key Research Program
(Grant No: 2016YFB1000905); the Guangxi Collaborative Innovation Center of Multi-
Source Information Integration and Intelligent Processing; Innovation Project of
Guangxi Graduate Education under grants YCSZ2016046 and YCSZ2016045.

References

[1] A. Jain, M. Murty, P. Flynn, Data clustering: a review. ACM Computing Surveys, 31 (1999), 264-323.
[2] F. Zhao, L. Jiao, H. Liu, et al. Spectral clustering with eigenvector selection based on entropy ranking.
Neurocomputing, 73 (2010), 1704-1717.
[3] J. A. Hartigan, M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28 (2013), 100-108.
[4] E. Elhamifar, R. Vidal. Sparse subspace clustering. In CVPR, 2009, 2790–2797.
[5] C. Y. Lu, H. Min, Z. Q. Zhao, et al. Robust and efficient subspace segmentation via least squares
regression. In ECCV, 2012, 347–360.
[6] G. Liu, Z. Lin, S. Yan, et al. Robust recovery of subspace structures by low-rank representation. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 35 (2013), 171–184.
[7] H. Hu, Z. Lin, J. Feng, et al. Smooth representation clustering. In CVPR, 2014, 3834-3841.
[8] S. R. Bulo, M. Pelillo. A Game-Theoretic Approach to Hypergraph Clustering. IEEE Transactions on
Pattern Analysis & Machine Intelligence, 35 (2013), 1312-1327.
[9] J.B. MacQueen. Some Methods for Classification and Analysis of Multi Variate Observations. Berkeley
Symposium on Math, 1967, 281-297.
[10] J. Sivic, B. C. Russell, A. A. Efros, et al. Discovering Objects and Their Location in Images. In ICCV, 1
(2005), 370-377.
[11] X. Zhu, L. Zhang, Z. Huang. A sparse embedding and least variance encoding approach to hashing.
IEEE Transactions on Image Processing, 23 (2014), 3737-3750.
[12] U. von Luxburg. A tutorial on spectral clustering. Statistics & Computing, 17 (2007), 395-416.

[13] Y. Gao, M. Wang, D. Tao, et al. 3-D object retrieval and recognition with hypergraph analysis. IEEE
Transactions on Image Processing a Publication of the IEEE Signal Processing Society, 21 (2012),
4290-303.
[14] M. R. Franjoine, J. S. Gunther, M. J. Taylor. Pediatric Balance Scale: a modified version of the Berg-
Balance Scale for the school age child with mild to moderate motor impairment. Pediatric Physical
Therapy, 15(2003), 114-128.
[15] X. Zhu, H. I. Suk, S. W. Lee, et al. Subspace regularized sparse multi-task learning for multi-class
neurodegenerative disease identification. IEEE Transition Biomed Engineering, 63 (2016), 607-618.
Fuzzy Systems and Data Mining II 341
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-341

Safety Risk Early-Warning for Metro Construction Based on Factor Analysis and BP_Adaboost Network

Hong-De WANG a,c,*, Bai-Le MA b, Yan-Chao ZHANG a
a School of Civil and Safety Engineering, Dalian Jiaotong University, Dalian, China
b Operation Branch, Suzhou Rail Transit Co., Ltd, Suzhou, China
c Tunnel & Underground Structure Engineering Center, Dalian Jiaotong University, Dalian, China

Abstract. In order to improve safety risk early-warning management in metro construction, we first built early-warning indicators covering four aspects: human, machinery, environment and management. We then quantified the early-warning indicators using data from 12 metro construction projects and the Delphi method. Thirdly, based on factor analysis, the 30 early-warning indicators were reduced to 7. Finally, the obtained index factors were used as the input of a neural network, a BP_Adaboost early-warning model was established, and safety risk early warning for subway construction was carried out. The results show that optimizing the input factors of the BP_Adaboost neural network early-warning model with factor analysis not only raised the speed of the warning but also improved the precision of subway construction safety risk early warning.

Keywords. metro construction, early warning index, factor analysis, BP network

Introduction

The rapid development of urban rail transit promotes the development of cities and improves the efficiency of business agglomeration. However, the metro construction process is characterized by uncertainty, inscrutability and complexity; coupled with an inadequate understanding of subway construction safety risks and an imperfect early-warning management system, landslides and other safety accidents occur frequently. Therefore, building a safety risk early-warning index system with feedback and a set of operational early-warning methods is an effective measure to strengthen metro construction safety control.
In 2000, J. Reilly expounded safety risk management in the tunnel construction process [1]; in 2002, Faber applied a safety evaluation method to the construction process [2]; metro construction safety risk assessment has also been based on the analytic hierarchy process [3-5], on factor analysis combined with a BP neural network for subway construction safety warning [6], and on the fuzzy comprehensive evaluation method
*
Corresponding Author: Hong-De WANG, School of Civil & Safety Engineering, Dalian Jiaotong
University. Tunnel & Underground Structure Engineering Center, Dalian Jiaotong University, Dalian,
Liaoning, China. E-mail: whdsafety@126.com.

for metro construction safety evaluation [7-8]. These methods are rather subjective, and the risk factors are nonlinear. In this paper, the dimensionality of the indicators is therefore reduced by factor analysis, and the resulting factors are substituted into the BP_Adaboost algorithm for early warning.

1. Index selection and Questionnaire

Based on system theory, safety accidents were analyzed and the safety risk factors were obtained from the four aspects of human, machine, environment and management [9-10]. According to the metro construction technology standards [11], the metro construction safety management standards [12] and expert engineering experience, a metro construction safety risk early-warning index system was established based on these four aspects, yielding 30 warning indicators [3], as shown in Table 1.

Table 1. Early warning index

First level index: Indicator system for the early warning of behavioral factors [13]
Two level indices: Operating violation rate; Emergency mishandling rate; Scene detection error rate; Technical examination unqualified rate; Degree of cooperation; Sense of injustice

First level index: Early warning index of construction technology and equipment factors [13-14]
Two level indices: Equipment applicability; Equipment quality failure rate; Failure rate of equipment maintenance; Operating error rate; Equipment failure rate; Standard rate of technical oversight; Rationality of construction method; Practical situation of data preparation

First level index: Early warning index of construction environmental factors [15]
Two level indices: Support condition; Formation deformation; Rock condition; Settlement of surrounding buildings; Groundwater condition; Underground pipeline; Climate conditions; Surrounding traffic conditions; Working comfort

First level index: Early warning index of safety management factors [16]
Two level indices: Safety production education; Safety production inspection; Rationality of organization and structure; Distortion rate of information communication; Standard rate of management oversight; Safety hidden danger rectification; Emergency plan situation

Using the above 30 early-warning indicators, a questionnaire survey was conducted on 12 metro projects, under construction or completed, in Changsha, Wuhan, Hangzhou, Wuxi, Suzhou and other cities. Quantitative indicators were surveyed directly, while qualitative indicators were scored using the Delphi method. The investigated projects include both station construction projects and interval construction projects. Every metro construction project was assigned a warning level: severe alarm (4), moderate alarm (3), light alarm (2) or no alarm (1).

2. Factor analysis

First, the reverse early-warning indicators were converted into positive indicators: each reverse indicator x_i was transformed into a positive indicator x'_i = 1/x_i. The statistical software SPSS 19 was used for the analysis. In Bartlett's test of sphericity and the KMO test, the KMO measure was greater than 0.6 and the significance level was 0, showing strong correlation among the early-warning indicators, so they could be reduced by factor analysis. As shown in Table 2, there were 7 common factors, whose cumulative contribution to the sample variance was 94.195%. The 30 indicators of metro construction early warning were thus reduced to 7 core factors, which represent the information of all the early-warning indicators.
Table 2. Total variance explained
Factor | Initial eigenvalues: Total, Variance %, Cumulative % | Extraction sums of squared loadings: Total, Variance %, Cumulative % | Rotation sums of squared loadings: Total, Variance %, Cumulative %
1 14.004 46.681 46.681 14.004 46.681 46.681 10.565 35.215 35.215
2 4.321 14.402 61.083 4.321 14.402 61.083 5.588 18.626 53.841
3 3.010 10.033 71.116 3.010 10.033 71.116 3.536 11.787 65.629
4 2.290 7.634 78.750 2.290 7.634 78.750 2.305 7.683 73.312
5 2.069 6.895 85.645 2.069 6.895 85.645 2.248 7.494 80.806
6 1.428 4.760 90.405 1.428 4.760 90.405 2.141 7.136 87.942
7 1.137 3.790 94.195 1.137 3.790 94.195 1.876 6.254 94.195
8 0.851 2.836 97.031 -- -- -- -- -- --
9 -- -- -- -- -- -- -- -- --

Common factor 1 represents safety risk points that are particularly likely to cause accidents, and is named the metro construction safety comprehensive factor; common factor 2 represents hidden safety danger points that can easily cause accidents, and is named the construction environment and safety management factor; common factors 3 and 4 represent indirect factors that promote accidents; common factors 5, 6 and 7 respectively represent the rationality of the organization and management structure, the rectification of security risks, and operating errors.
Then the factor score coefficient matrix of the original indicators was obtained from the factor analysis, as shown in Table 3. The factor scores are computed as

F_i = Σ_{j=1}^{n} C_ij x_j   (1)

where F_i is the score of the i-th extracted common factor, C_ij is the factor score coefficient, and x_j is the standardized value of the original variable.
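
A short sketch of Eq. (1) in NumPy; it assumes the indicators are standardized over the surveyed projects and that C is stored as the 30 x 7 coefficient matrix of Table 3 (rows = indicators, columns = common factors):

```python
import numpy as np

def factor_scores(X, C):
    """Eq. (1): F = X_std * C, one row of 7 common-factor scores per project.
    X : (n_projects, 30) positively oriented indicator values
    C : (30, 7) factor score coefficient matrix (Table 3)"""
    X = np.asarray(X, dtype=float)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each indicator
    return X_std @ np.asarray(C, dtype=float)
```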

Table 3. Factor score coefficient matrix


Factor Common Factor
1 2 3 4 5 6 7
1 0.129 -0.091 -0.019 -0.088 -0.063 0.002 0.036
2 -0.093 0.211 0.037 -0.109 0.02 -0.035 0.161
3 0.119 -0.085 -0.002 0.002 -0.065 0.041 0.012
4 -0.024 0.0148 0.102 -0.095 0.013 -0.065 0.071
5 0.08 -0.066 -0.008 -0.026 0.122 0.059 0.03
6 0.108 0.036 0.036 -0.053 -0.113 0.009 -0.281
7 0.016 -0.089 -0.024 0.02 -0.046 0.208 0.273
8 0.058 0.058 -0.059 -0.053 -0.012 0.024 -0.044
9 0.106 0.001 0.09 0.052 -0.001 0.023 -0.193
10 0.023 -0.007 -0.022 0.069 -0.227 -0.196 0.441
11 0.051 0.005 0.067 0.005 -0.032 0.093 0.029
12 -0.022 -0.061 0.26 0.083 0.061 0.122 0.031
13 0.135 -0.088 0.044 0.17 0.021 -0.096 -0.066
14 0.125 0.004 -0.054 0.019 -0.137 -0.173 -0.007
15 0.101 0.046 0.032 -0.082 -0.09 0.018 -0.22
16 0.112 -0.03 -0.079 -0.03 0.178 -0.151 -0.059
17 0.066 0.075 -0.019 0.122 0.16 -0.132 -0.137
18 -0.027 0.236 -0.009 0.043 0.08 -0.115 -0.18
19 -0.045 0.237 -0.078 0.033 -0.065 -0.081 -0.06
20 0.022 0.005 0.214 -0.34 0.008 0.001 -0.019
21 -0.045 0.08 -0.035 -0.06 -0.027 0.13 0.241
22 -0.028 0.041 0.27 0 -0.009 0.142 -0.102
23 0.03 0 0.063 0.323 -0.077 -0.033 -0.042
24 -0.081 0.077 -0.183 0.036 0.068 0.207 0.153
25 -0.103 0.16 -0.19 0.03 -0.034 0.141 0.136
26 0.02 -0.017 -0.03 -0.05 -0.497 -0.034 0.201
27 0.033 0.026 0.063 -0.083 0.146 0.017 0.075
28 0.062 -0.052 0.043 -0.032 0.146 -0.068 0.173
29 -0.009 -0.111 0.151 -0.013 0.03 0.556 -0.142
30 -0.025 -0.029 0.003 0.354 0.083 0.006 -0.005

3. Metro construction early warning based on BP_Adaboost algorithm

A BP neural network can realize a nonlinear mapping from input to output, but it sometimes lacks simple and effective parameter settings and the algorithm is not stable: a single BP neural network easily falls into a local optimum, especially with a large amount of training data. In order to speed up convergence, improve learning efficiency and avoid local optima, 10 BP networks were combined. The structure of each BP neural network is 10-7-1, i.e., 10 input nodes, 7 nodes in the hidden layer and 1 output node, and each BP neural network was trained for 20 epochs. Table 4 lists the factor scores of the 12 projects.
Table 4. Item score
Item 1 2 3 4 5 6 7 Alarm level
1 -0.92460 -0.56238 -1.05038 -0.81879 -0.51597 -0.30275 -0.49870 1
2 0.30931 0.34129 -0.72078 -0.90587 0.24190 -1.81012 0.57436 3
3 0.13676 0.22341 -0.42382 0.18261 3.04584 0.23712 -0.11627 3
4 -0.68143 0.30668 -0.97917 0.94920 -0.50854 -0.70859 0.61760 2
5 0.01271 -0.22087 0.82655 -0.00148 -0.21424 0.40999 2.80254 3
6 -0.18278 -0.23775 0.81107 0.92725 -0.26456 0.19955 -0.0713 2
7 0.15807 -0.08450 1.64599 1.67091 -0.02558 -1.06484 -0.97398 2
8 2.58489 -1.50741 -0.57458 -0.13913 -0.46455 0.46719 -0.33454 4
9 -1.07828 -0.49040 -0.46676 0.35577 -0.07380 2.14917 -0.28563 1
10 0.91079 2.75912 -0.21622 -0.13125 -0.71299 0.80764 -0.40766 4

11 -0.45190 -0.11132 1.78539 -2.17981 0.03308 0.09791 -0.60103 2


12 -0.79354 -0.41588 -0.63729 0.09059 -0.54059 -0.48228 -0.70540 1
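The following sketch illustrates the BP_Adaboost idea described above under stated assumptions: several small BP (multilayer perceptron) regressors are trained on weighted resamples of the item data and combined with AdaBoost-style weights. It uses scikit-learn's MLPRegressor as a stand-in for the BP network and a simple error threshold to mark a prediction as wrong; it is a sketch of the general technique, not the authors' exact implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def bp_adaboost_fit(X, y, n_nets=10, epochs=20, threshold=0.2, seed=0):
    """Train n_nets BP networks on weighted resamples and return them together
    with AdaBoost-style ensemble weights (illustrative sketch only)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                                   # sample weights
    rng = np.random.default_rng(seed)
    nets, alphas = [], []
    for _ in range(n_nets):
        idx = rng.choice(n, size=n, replace=True, p=w)        # weighted resampling
        net = MLPRegressor(hidden_layer_sizes=(7,), max_iter=epochs)
        net.fit(X[idx], y[idx])
        wrong = np.abs(net.predict(X) - y) > threshold        # badly predicted samples
        eps = np.clip(w[wrong].sum(), 1e-10, 1 - 1e-10)       # weighted error rate
        alpha = 0.5 * np.log((1.0 - eps) / eps)               # weight of this network
        w *= np.exp(alpha * np.where(wrong, 1.0, -1.0))       # boost hard samples
        w /= w.sum()
        nets.append(net)
        alphas.append(alpha)
    return nets, np.array(alphas)

def bp_adaboost_predict(nets, alphas, X):
    preds = np.array([net.predict(X) for net in nets])
    return alphas @ preds / alphas.sum()                      # weighted average output
```

Here X would be an array of item-level input features (e.g. the factor scores) and y the alarm level of each item.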

Projects 5 and 6 were used as test samples and the other items as training samples, and output values were obtained both with the BP_Adaboost model and with a single BP neural network. The prediction results are given in Table 5 and Figure 1.

Table 5. Early warning results


Item    BP_Adaboost output    Actual value    Error    BP output    Error
5 3.1667 3 0.1667 3.433 0.433
6 2.2903 2 0.2903 2.601 0.601

Figure 1. Forecast error

4. Conclusion

In view of the large and medium-sized cities in China under construction and
completed metro construction projects, survey the original samples, using Delphi
method to grade indexes, through the analysis of the factor analysis method to the
original dimension reduction of 30 indicators, and then use BP_Adaboost index for
neural network training and testing, the following conclusions:
Based on the factor analysis method to reduce the dimension of the 30 warning
indicators of the metro construction project and got 7 core early warning indicators, as
the input of the BP_Adaboost neural network early warning model.
It was based on BP_AdaBoost network of metro construction safety risk early
warning model, to realize the classification of nonlinear index. Nonlinear index
classification was realized based on BP_AdaBoost network of metro construction
safety risk early warning model. Compared with BP neural network algorithm, it is
better in anti-noise ability, smaller in actual value error, and higher prediction accuracy.

References

[1] J. J. Reilly. Management Process for complex underground and tunneling Projects. Tunneling &
Underground Space Technology. 2000:31-44.
[2] M. H. Faber and J. D. Sorensen. Indicators for inspection and maintenance planning of concrete structures. Structural Safety. 2002:377-396.

[3] Ren-hui Liu, Bo Yu, Zhen Jin. Study on Index System of Safety Risk Evaluation for Subway
Construction Based on Interval Estimation. Prediction.2012, 31 (2): 62-66.
[4] Ghosh Sid, Jintanapakanont Jakkapan. Identifying and assessing the critical risk factors in an
underground rail project in Thailand: A factor analysis approach. International Journal of Project
Management, 2004, 22(8):633-643.
[5] Shapira A, Simcha M. AHP –based weighting of factors affecting safety on construction sites with tower
cranes. Journal of Construction Engineering and Management, 2009, 135(4):307-318.
[6] Fan Chen, Hong-tao Xie. Subway Construction Safety Early Warning Research Based on Factor
Analysis and BP Network. China Safety Science Journal.2012,08:85-91.
[7] Zheng Xin, Hai-Ma Feng. Metro Construction Safety Risk Assessment Based on the Fuzzy AHP and the
Comprehensive Evaluation Method. Applied Mechanics and Materials, 2014, Vol.3307 (580),
pp.1243-1248.
[8] Hallowell M R, Gambatese J A. Activity-based safety risk quantification for concrete formwork construction. Journal of Construction Engineering and Management, 2009, 135(10):990-998.
[9] Sheng-Li Zhu, Wen-Bin Wang, Wei-Ning Liu. Risk management of metro engineering construction. Urban Express Rail Transit, 01 (2008), 56-60.
[10] Feng Lv, Research on construction risk warning in mountain areas large section highway tunnel. Chong
qing Jiaotong university. 2010.
[11] GB50299-2003, Underground railway engineering construction and acceptance specification.
[12] GB/T50326-2001, Construction project management norms.
[13] Wei-Ke Chen, Xing-Hua Wang. Design and analysis of metro construction disaster early warning index
system. Urban Rail Transit.2007,10:25-29.
[14] Li Xue-Mei. Study on the early warning index system of construction safety risk of subway
engineering . Hubei. Huazhong University of Science and Technology.2011.
[15] Hong-Lin Wang. Study on construction risk management of urban rail transit project. China Mining
University. 2014.
[16] Hai-Li Yu. Safety risk analysis and evaluation of engineering construction based on human factors.
Wuhan: Huazhong University of Science and Technology, 2012.
Fuzzy Systems and Data Mining II 347
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-347

The Method Study on Tax Inspection


Cases-Choice: Improved Support Vector
Machine
Jing-Huai SHEa and Jing ZHUOb, 1
a
College of Business Administration, Capital University of Economics and Business,
121 Zhangjialukou, HuaxiangFengtai District, Beijing, China
b
Faculty of Law, University of Macau, Avenida da Universidade, Taipa, Macau, China

Abstract. The interrelation between the indexes of tax inspection cases-choice and the dynamic nature of the related data are easily ignored by traditional data mining technology. This paper attempts to establish a kernelled support matrix machine model and proposes a data mining method for tax inspection cases-choice on the basis of this new type of data. The proposed data mining method is valuable for improving the accuracy of tax inspection cases-choice.

Keywords. Tax Inspection Cases-Choice, Support Vector Machine, Kernelled


Support Matrix Machine, Data Mining

Introduction

As the core part of tax inspection, tax inspection cases-choice is the procedure by which tax authorities select inspection objects and projects. It is carried out in a manual mode and a computer analysis mode, based on the present tax rules and methodologies [1]. Its informatization has gone through the process from an electronic stage that simulated manual operations to the current information stage, which includes the general management of tax inspection. However, the tax inspection information systems cannot manage and integrate the accumulated massive basic data, which include the basic information and filing information of taxpayers [2]. Therefore, it is necessary to apply data mining technology to look for credible and valuable information in the massive random data.

1. The Application Research of Data Mining in Tax Inspection Cases-Choice

At present, some local tax departments in China have tried to utilize related data mining technologies, such as data warehouses, to carry out inspection cases-choice. A comparison of the C5.0 decision tree model with the binary-classification logistic regression method shows that the decision tree model can improve the efficiency and effect of inspection cases-choice work considerably [3]. For

1
Corresponding Author Jing ZHUO, University of Macau, Faculty of Law, Avenida da Universidade,
Taipa, Macau, China; Email: jzhuo@outlook.com.

another example, the significant effects of data mining technology on tax inspection cases-choice have been studied with the following methods: application of the Self-Organizing Map (SOM) [4, 5], association rules [6], the combination of the Support Vector Machine (SVM) and SOM [1, 7], and the Generalized Regression Neural Network (GRNN) model [8].
The above literature has proven that data mining, as a technology for tax inspection cases-choice, is clearly better than conventional mathematical statistics methods. However, when these data mining technologies are used to deal with the inspection cases-choice problem on vector-mode data, their biggest disadvantage is that they are limited by the data type and ignore the mutual relations among the indexes or characteristics of inspection cases-choice. Moreover, in the actual operation of tax inspection cases-choice, researchers are always concerned with the temporal dynamics of the alternative cases, whereas SVM often lacks the dynamic feature that the data should have.

2. SVM Theory and Improvement

2.1 SVM Theory

SVM, proposed by Vapnik in 1995, is a learning theory for classification and regression problems [9]. It seeks the separating plane with the best classification effect in n-dimensional space, following the maximum margin principle. Based on the idea of maximum margin, the optimization problem of SVM is as follows:

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{l}\xi_i$$
$$\text{s.t. } y_i(w^T x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\ i = 1, \dots, l \qquad (1)$$

C is the penalty parameter: the bigger C is, the larger the penalty on misclassification. The so-called maximum margin principle makes the distance between the two planes $w^T x + b = 1$ and $w^T x + b = -1$, determined by the margin plane $w^T x + b = 0$, as large as possible. Since the number of variables of optimization problem (1) is related to the dimensionality of the sample points $x_i$, (1) is not solved directly; instead its dual problem is solved. The dual problem of optimization problem (1) is as follows:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, x_i^T x_j - \sum_{i=1}^{l}\alpha_i$$
$$\text{s.t. } \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\ i = 1, \dots, l \qquad (2)$$

Obviously, (2) is a convex quadratic programming problem. The solution methods of convex quadratic programming can be used to work out the optimal solution, and then, through the relation between $w^*$ and $\alpha_i^*$:

$$w^* = \sum_{i=1}^{l}\alpha_i^* y_i x_i,\qquad b^* = y_j - \sum_{i=1}^{l}\alpha_i^* y_i\, x_i^T x_j \qquad (3)$$

The decision function can be obtained as:

$$f(x) = \operatorname{sgn} g(x) = \operatorname{sgn}(w^{*T} x + b^*) = \operatorname{sgn}\Big(\sum_{i=1}^{l}\alpha_i^* y_i\, x_i^T x + b^*\Big) \qquad (4)$$

where $w^*$ and $b^*$ are the optimal solutions of optimization problem (1).
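As a concrete illustration of (1)-(4), the short sketch below trains a linear SVM on toy vector-mode data with scikit-learn and recovers w* from the dual coefficients (which store the products α_i* y_i) according to (3). The toy data are invented for the example and are not from the paper.

```python
import numpy as np
from sklearn.svm import SVC

# toy vector-mode data: 100 samples with 7 case-choice indexes, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

clf = SVC(C=1.0, kernel="linear")
clf.fit(X, y)

# Eq. (3): w* = sum_i alpha_i* y_i x_i  (dual_coef_ already stores alpha_i* y_i)
w_star = clf.dual_coef_ @ clf.support_vectors_
b_star = clf.intercept_

# Eq. (4): classify by the sign of w*^T x + b*
pred = np.sign(X @ w_star.ravel() + b_star)
print((pred == clf.predict(X)).all())
```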

2.2 Support Matrix Machine & Kernel Function

• Support Matrix Machine Model
The Support Matrix Machine (SMM) was put forward as an extended form of SVM [10, 11]. The model is as follows:

$$\min_{W_1,W_2,b,\xi}\ \frac{1}{2}\|W_1 W_2^T\|_F^2 + C\sum_{i=1}^{l}\xi_i$$
$$\text{s.t. } y_i(W_1^T X_i W_2 + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\ i = 1, \dots, l \qquad (5)$$

By solving (5), its decision function can be worked out as:

$$f(X) = \operatorname{sgn} g(X) = \operatorname{sgn}(W_1^{*T} X W_2^* + b^*) \qquad (6)$$

where $W_1^*$, $W_2^*$ and $b^*$ are the optimal solutions of optimization problem (5).
SMM can only be used to deal with linearly separable problems. Therefore, when handling the honesty of taxpayers, model (5) cannot be used directly, and the model of the kernelled support matrix machine must be selected and used:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, K(X_i, X_j) - \sum_{i=1}^{l}\alpha_i$$
$$\text{s.t. } \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\ i = 1, \dots, l \qquad (7)$$

Here $K(X_i, X_j) = \langle \Phi(X_i), \Phi(X_j)\rangle$ is the kernel function of the matrices $X_i$ and $X_j$. The right side of the equation is the inner product of matrices, and $\Phi$ is the mapping of the matrix space, which maps the sample points X from the matrix space into a higher-dimensional matrix characteristic space. In that higher-dimensional matrix characteristic space, all the sample points can be linearly separated, so the final problem is converted into searching for a linear decision function in the characteristic space. The final decision function is:

$$f(X) = \operatorname{sgn} g(X) = \operatorname{sgn}\Big(\sum_{i=1}^{l}\alpha_i^* y_i\, K(X_i, X) + b^*\Big) \qquad (8)$$

Here $\alpha_i^*$ and $b^*$ are the optimal solutions of optimization problem (7).
• Kernel Function of Matrix
Observing (7) and (8), the mappings $\Phi$ always appear in pairs in the form of inner products, so the mappings $\Phi$ are replaced by the kernel function K. Some scholars have given the following definition of the matrix kernel function [12]:

$$K(X, Y) = K_1(X, Y)\, K_2(X, Y) \qquad (9)$$

where $K_i(X, Y) = \exp\big(-\gamma_i\,(c_i - \operatorname{tr}(V^T V))\big)$, $V = \big[\,V_{X,(i)}\ \ V_{Y,(i)}\,\big]$ and $i = 1, 2$. $V_{\cdot,(1)}$ and $V_{\cdot,(2)}$ are taken respectively from the right singular vectors of the matrices X and X^T:

$$X = \begin{bmatrix} U_{X,(1)}^{1} & U_{X,(1)}^{2} \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_{X,(1)}^{1} \\ V_{X,(1)}^{2} \end{bmatrix}^{T} \qquad (10)$$

$$X^T = \begin{bmatrix} U_{X,(2)}^{1} & U_{X,(2)}^{2} \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_{X,(2)}^{1} \\ V_{X,(2)}^{2} \end{bmatrix}^{T} \qquad (11)$$

The matrices $c_1$ and $c_2$ are respectively the unit matrices whose sizes are the same as those of the matrices X and X^T.
Combining (7) with (9), a new kind of classification model based on matrix data is obtained. This model can be applied directly to the inspection cases-choice problem of tax payment credit. In the end, (8) is used to predict the tax payment credit of new sample points.
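To make the idea concrete, the sketch below shows how a matrix kernel in the product form of (9) can be plugged into an off-the-shelf SVM through a precomputed Gram matrix. The kernel body here is only an illustrative stand-in built from the right singular subspaces of X and X^T (the exact expression of K_i, including the constants c_i and γ_i, is an assumption, not the authors' formula), and the 4×7 matrices and labels are toy data.

```python
import numpy as np
from sklearn.svm import SVC

def matrix_kernel(X, Y, gamma=0.1):
    """Illustrative product-form matrix kernel in the spirit of Eq. (9):
    one exponential factor per orientation (X vs X^T), each measuring how well
    the right singular subspaces of the two samples align (assumed form)."""
    k = 1.0
    for A, B in ((X, Y), (X.T, Y.T)):
        Va = np.linalg.svd(A, full_matrices=False)[2]   # right singular vectors of A
        Vb = np.linalg.svd(B, full_matrices=False)[2]
        align = np.trace(Va @ Vb.T @ Vb @ Va.T)         # subspace alignment, <= min(A.shape)
        k *= np.exp(-gamma * (min(A.shape) - align))
    return k

def gram(A_list, B_list):
    return np.array([[matrix_kernel(a, b) for b in B_list] for a in A_list])

# toy matrix-mode data: each sample is a 4x7 matrix (4 periods x 7 indexes)
rng = np.random.default_rng(0)
X_train = [rng.normal(size=(4, 7)) for _ in range(40)]
y_train = rng.integers(0, 2, size=40)
X_test = [rng.normal(size=(4, 7)) for _ in range(10)]

clf = SVC(C=10.0, kernel="precomputed")        # solves a dual problem of the form (7)
clf.fit(gram(X_train, X_train), y_train)
pred = clf.predict(gram(X_test, X_train))      # decision function in the form of (8)
```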

3. Experiment of Case

In this paper, the inspection of added-value tax is selected to carry through the
experimental study of inspection cases-choice. The data of this study are mainly based
on the Vat Payment Return, Annex of Vat Payment Return, Input/Output Tax Amount
List of VAT, and Special Tax Payment Letter, Other information of tax declaration,
balance sheet and Profit Statement in the continuous three years.

3.1 Indexes & Data Sources of Inspection Cases-Choice

By referring the indexes required by the practical experiences of tax inspection and
internal documents as well as the achievements of literature researches, the indexes2 of
VAT inspection cases-choice are determined (Table 1).
Table 1. Indexes of VAT Inspection Cases-Choice

Name of Index Calculation Formula


Tax Burden Rate Payable VAT Amount/Main Business Income
Effective Tax Rate Output Tax/(Main Business Income+ Other Business Income)
Inventory Rate (Final Inventory/Selling Cost) ×100%
Quick Ratio Quick Assets/Current Liabilities
Net Profit Rate of Assets    Net Profit/Total Average Assets, where Total Average Assets = (Total Initial Assets + Total Final Assets)/2
Ratio of Sales to Cost Selling Cost/Main Business Income
Sales Financial Expense Ratio Financial Expense/Main Business Income

The data of this study come from the VAT inspection of a certain tax bureau. In total, 140 commercial enterprises were randomly inspected; 60 of them are non-honest tax payers and the other 80
2
The index selection is mainly based on the indexes of Chen (2004) [4]. The cases-choice indexes of VAT were strictly selected by the stepwise discriminant analysis method. In fact, the selection of each kind of cases-choice index differs and is of great significance for cases-choice; however, it is not discussed further, since this study is aimed only at the data mining technology.

enterprises are honest tax payers. According to the cases-choice indexes in Table 1, the related ratio indexes need to be standardized before the kernelled support matrix machine model is applied.

3.2 Results of Data Processing

The experiment was carried out in the MATLAB 2010a environment; SVM and toolbox functions were called to realize the system simulation and test. In the collected data set, 20 sample points of each class (honest and non-honest) were randomly selected as the training set, and the remaining sample points were taken as the test set, i.e. there were 40 sample points in the training set and 100 sample points in the test set.
According to the above discussion, in SMM (7) the penalty parameter C took 13 values, $2^{-6}, 2^{-5}, \dots, 2^{-1}, 2^{0}, 2^{1}, \dots, 2^{6}$, and the parameter in the kernel function (9) took the values 1, 2, …, 10 (with a step length of 1). After the sample points were randomly selected, the experiment was run with SMM 10 times, and the average of the 10 accuracy values (the highest accuracies are shown in Table 2) was taken as the final classification accuracy.
Table 2. Performance comparison of SMM methods

Algorithm    Penalty Factor C of Errors    Classification Accuracy of Training Set (%)    Classification Accuracy of Test Set (%)
SMM (Method I)    100    91.02    88.21
SMM (Method II)    97    96.44    94.02

Table 3. Comparison of test set accuracies in each algorithm

Algorithm Accuracy (%)


KNN 52.50
Neural Network 69.71
Linear Kernel SVM 72.73
SMM (Method I) 85.10
SMM (Method II) 95.32

Finally, the divided segments were laid out as the row vectors of a matrix to form the final 4×7 matrix-mode data. Comparing these results with the accuracies of other conventional methods (Table 3) shows that the kernelled SMM, especially SMM (Method II), has a reliable superiority.
It can be seen from Table 3 that when the kernelled SMM method is used to deal with the data set, its prediction accuracy is higher than that of the other three methods. SMM (Method II) keeps the dynamic nature of time, so the effect it obtains is more ideal.

4. Conclusion

By improving the SMM and the construction method of its data, this paper optimized the kernelled SMM method for matrix data. Applying the actual data collected by a tax department for tax inspection cases-choice, and on the basis of a standardized data set, this paper applied two methods to construct the matrix-mode data. The experiments show that the kernelled SMM is clearly better than conventional data mining methods such as SVM. The method can effectively solve the nonlinear and dynamic evaluation problem of taxpayer honesty and has a high value for improving the efficiency of tax inspection cases-choice. It is worthy of further study and pilot practical application.

Acknowledgements

The work described in this paper was supported by the National Social Science Foundation of China (No. 14BGL214) and the Science Foundation of the Ministry of Education of China (No. 13YJA630073).

References

[1] S. H. Hong, Study on tax inspection cases-choice method, Master Thesis of Shantou University, (2011), 1.
[2] Y. H. Zhang, Tax inspection cases-choice study based on data mining technology: take the corporate
income tax as example, Master Thesis of Guangdong Business College, (2012), 1.
[3] S. H. Chen et al., Research on tax inspection based on C 5.0 decision tree, Journal of Lianyungang
Technical College, 03 (2011), 21-23.
[4] Y. Chen, Research on the method of tax inspection cases-choice, Tianjin University, (2004), 42-44.
[5] X. Wu et al., The method and application of tax inspection cases-choice based on neural network,
Journal of Xidian University (Social Science Edition), 05 (2007), 63-69.
[6] S. G. Xu, Research on tax compliance problem in China, Huazhong University of Science and
Technology, (2011), 1-110.
[7] H. Xia et al., Study on the corner detecting method based on SVM, Computer Applications and Software,
01 (2009), 230-231, 276.
[8] W. G. Lou et al., The modeling and empirical assessment of generalized regression neural network in tax
evaluation, Systems Engineering, 11 (2013), 74-80.
[9] C Cortes and V Vapnik, Support-vector networks, Machine Learning, 20 (1995), 273-293.
[10] D Cai et al., Support tensors machines for text categorization, Technical Report of Department of
Computer Science, University of Illinois at Urbana-Champaign, No.2714, 2006.
[11] D Cai et al., Learning with tensor representation, Technical Report of Department of Computer
Science, Department of Computer Science, University of Illinois at Urbana-Champaign, No. 2716,
2006.
[12] M Signoretto, LD Lathauwer and JAK Suykens, A kernel-based framework to tensorial data analysis,
Neural Networks, 24 (2011), 861-874.
Fuzzy Systems and Data Mining II 353
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-353

Development of the System with


Component for the Numerical Calculation
and Visualization of Non-Stationary Waves
Propagation in Solids
Zhanar AKHMETOVAa,1,Serik ZHUZBAYEVa, Seilkhan BORANBAYEVa and
Bakytbek SARSENOVb
a
Department of Information Systems, L.N.Gumilyov Eurasian National University,
Satpayev st.2, Astana, Kazakhstan
b
Department of Mathematics, Hodja Akhmet Yassawi International Kazakh-Turkish
University, Sattarhanov st. 29, Turkestan, Kazakhstan

Abstract. An information system is considered in this paper. Its nucleus is a software component used for the numerical calculation of a wave process, namely the propagation of non-stationary waves in a homogeneous solid under dynamic loading, and for the visualization of the numerical solutions in the form of the stress tensor and the velocity waveforms. Such information systems and software are usually developed on the basis of numerical methods such as the finite element or boundary element methods. In this paper the problem is solved with the bicharacteristics method combined with the ideas of the splitting method, which constitutes the novelty and originality of this case study. With the help of the developed system the user can make predictions and perform engineering analysis. These predictions and estimates can be used for building structures in engineering practice, in mechanical engineering and, more generally, for scientific research in engineering. The practical significance of the developed system is that its use helps organizations shorten the project development cycle, reduce costs and improve product quality.

Keywords. propagation of non-stationary waves, numerical method, information


system, data mining visualization, engineering.

Introduction

The rapid development of computer technology and its implementation in almost all spheres of life means that today a qualified specialist in any field must be familiar with computer-aided design (CAD), computer-aided manufacturing (CAM) and computer-aided engineering (CAE) systems. CAD/CAM systems such as AutoCAD, DUCT, Pro/Engineer, Unigraphics and SolidWorks are widely used for computer modeling of complex shapes, with the subsequent release of drawings and the generation of control programs [1,2].

1
Corresponding Author: Zhanar AKHMETOVA, Department of Information Systems, L.N.Gumilyov
Eurasian National University, Satpayev st.2, Astana, Kazakhstan; E-mail: zaigura@mail.ru.

However, these specialized numerical modeling packages do not provide well-developed means of engineering analysis [3,4].
This paper considers an information system whose nucleus is a software component used for the numerical calculation of a wave process, namely the propagation of non-stationary waves in a homogeneous solid under dynamic loading. The same component is also used for the visualization of the numerical solutions in the form of the stress tensor and the velocity waveforms. The component was developed with the numerical bicharacteristics method, which incorporates the ideas of the splitting method.
Many software programs are developed on the basis of numerical methods such as the finite element or boundary element methods [5]. The novelty and originality of this case study is that the software component was developed using the bicharacteristics method combined with the ideas of the splitting method. The advantage of the proposed method is that it brings the dependence domain of the finite-difference equation as close as possible to the dependence domain of the initial differential equation. This makes the method one of the most convenient for creating software and applications [6,7].

1. Statement of the Problem

In this paper, we consider one of the statements of the problems, which are solved
using the developed information system.
The flat semi-strip with final width 1, which is made from a linear elastic material
properties which are characterized by a density U1 , speed of propagation of
longitudinal a1 and transverse b1 waves are already fixed rectangular coordinates
x1Ox2 occupies an area x2 d 1, , 0 d x1 d f (Figure 1).

Figure 1. The studied body

At the initial time the body is at rest:

$$v_\alpha = 0,\qquad \sigma_{\alpha j} = 0 \quad (\alpha, j = 1, 2)\ \text{at}\ t_0 = 0 \qquad (1)$$

At any given time $t = n\tau$ (n = 1, 2, …, N) the following boundary condition holds at the strip end $x_1 = 0$:

$$\sigma_{11}(t) = f(t),\qquad \sigma_{12}(t) = 0 \qquad (2)$$

The rest of the boundary of the body is free from stress:

$$\sigma_{22} = 0,\qquad \sigma_{12} = 0 \quad \text{on}\ x_2 \le 1,\ 0 \le x_1 < \infty \qquad (3)$$

2. Calculation of Bicharacteristics Numerical Method

Difference method using the method of spatial characteristics of Clifton proposed in [8]
for the study of planar dynamical problems, and in [9] Recker developed for the study
of elastic wave propagation in isotropic bodies, a rectangular shape [4]. In this paper,
we solve the non-stationary problem of the dynamics of a homogeneous body with
bicharacteristics method [10]. This body is in a Cartesian coordinate system. To
understand the essence of the bicharacteristics method, consider a deformation of
elastic semi-strip. The final width occupies the area of x1  l in the Cartesian
system (Figure 2).

Figure 2. The elastic semi-strip

At the initial time the body is at rest. Here $x_i$ are the Cartesian coordinates, t is time, $\sigma_{ij}$ is the stress tensor, $v_i$ is the velocity vector and $u_i$ is the displacement vector [11]:

$$v_i = 0,\qquad \sigma_{ij} = 0 \quad (i, j = 1, 2) \qquad (4)$$

At any other time, on the site $N_1 \le x_2 \le N_2$, $x_1 = l$ of the border BN, a uniformly distributed transient normal load f(t) acts, varying according to a sine law:

$$\sigma_{22}(t) = \begin{cases} A\sin(\omega t), & 0 \le t \le S_1 \\ 0, & t \ge S_1 \end{cases},\qquad \sigma_{21}(t) = 0 \qquad (5)$$

where $S_1$ is the loading action time and $\omega = \pi / S_1$. The other part of the semi-strip border is free from any influence:

$$\sigma_{12}(t) = 0,\ \sigma_{11}(t) = 0\ \text{on}\ x_1 = 0;\qquad \sigma_{21}(t) = 0,\ \sigma_{22}(t) = 0\ \text{on}\ x_2 = l,\ 0 \le x_1,\ x_1 \notin (N_1, N_2) \qquad (6)$$

Under these conditions it is necessary to investigate the stress state of the elastic body at t > 0 [12].

2.1. The Defining Equations.

In order to solve the problem, along with the initial and boundary conditions we use the system of equations consisting of the equations of motion and the relations of the generalized Hooke's law [13]:

$$\sigma_{i\beta,\beta} = \rho\,\frac{\partial^2 u_i}{\partial t^2} \qquad (7)$$

$$\sigma_{ij} = \lambda\, u_{\beta,\beta}\,\delta_{ij} + \mu\,(u_{i,j} + u_{j,i}) \qquad (8)$$

where ρ is the density, λ and μ are the Lamé constants and $\delta_{ij}$ is the Kronecker delta; the required non-dimensional quantities are introduced in (9).
After passing to the non-dimensional variables, the equations of motion (7) and the time-differentiated relations of the generalized Hooke's law (8) take the form (10):

$$\dot v_1 = \sigma_{11,1} + \sigma_{12,2},\qquad \dot v_2 = \sigma_{21,1} + \sigma_{22,2},$$
$$\dot\sigma_{11} = v_{1,1} + \gamma_{11} v_{2,2},\qquad \dot\sigma_{22} = \gamma_{11} v_{1,1} + v_{2,2},\qquad \dot\sigma_{12} = \gamma_{12}^2\,(v_{1,2} + v_{2,1}). \qquad (10)$$

2.2. The Equations of Bicharacteristics

In order to obtain the bicharacteristics equations and the conditions on them, let us split the two-dimensional system (10) into one-dimensional ones. Applying the ideas of K.A. Bagrinovski and S.K. Godunov on splitting multidimensional t-hyperbolic systems into one-dimensional systems with $x_k = \text{const}$ [11], we obtain the system (11):

$$\dot v_i - \sigma_{ij,j} = a_{ij},\qquad \dot\sigma_{ij} - \lambda_{ij}\, v_{i,j} = b_{ij} \qquad (11)$$

where $b_{ij} = \big(\gamma_{11}\delta_{ij} + \gamma_{12}^2(1-\delta_{ij})\big)\, v_{p,k}$; $i, j, k, p = 1, 2$; $p \ne i$, $k \ne j$.
From here, using well-known methods to obtain the differential bicharacteristics equations and the conditions on them, we obtain (Figure 3) [11]:

$$dx_j = \pm\lambda_{ij}\,dt,\qquad d\sigma_{ij} \mp \lambda_{ij}\,dv_i = (b_{ij} \mp \lambda_{ij} a_{ij})\,dt \qquad (12)$$

According to the bicharacteristics method, for the selection of a point scheme and a pattern the studied body is divided into square cells with sides $\Delta x_1 = \Delta x_2 = h$. At the nodal points the values of the functions $v_i$, $\sigma_{ij}$ are sought at successive time levels with step τ. The dot grid on which the difference scheme is built contains, besides the mentioned nodal points, the points formed by the intersection of the bicharacteristics with the hyperplanes $t = \text{const}$. The accepted pattern consists of the node O and the points $E_{ij}$, separated from the point O by the distance $\lambda_{ij}\tau$ (Figure 3) [13].

Figure 3. The view bicharacteristics on plane
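To convey the characteristics idea behind such schemes with a minimal example, the sketch below advances a one-dimensional elastic state (velocity v, stress σ) by one time step by interpolating the Riemann invariants at the feet of the characteristics dx/dt = ±c. It is only a 1-D illustration under standard assumptions (constant ρ and c, a simple loaded-end condition), not the authors' 2-D bicharacteristics code.

```python
import numpy as np

def step(v, s, rho, c, h, tau, load):
    """Advance velocity v and stress s of a 1-D elastic rod by one step tau:
    the invariants J± = s ∓ rho*c*v are constant along dx/dt = ±c, so they are
    interpolated at the characteristic feet x ∓ c*tau (1-D illustration only)."""
    x = np.arange(len(v)) * h
    Jp = np.interp(x - c * tau, x, s - rho * c * v)   # invariant carried from the left
    Jm = np.interp(x + c * tau, x, s + rho * c * v)   # invariant carried from the right
    s_new = 0.5 * (Jp + Jm)
    v_new = (Jm - Jp) / (2.0 * rho * c)
    s_new[0] = load                                   # prescribed stress at the loaded end
    s_new[-1] = 0.0                                   # stress-free far end (crude)
    return v_new, s_new

# sine-law pulse in the spirit of Eq. (5), applied at the end for 0 <= t <= S1
rho, c, h, tau, S1, A = 1.0, 1.0, 0.01, 0.005, 0.2, 1.0
v = np.zeros(200)
s = np.zeros(200)
for n in range(400):
    t = n * tau
    f_t = A * np.sin(np.pi * t / S1) if t <= S1 else 0.0
    v, s = step(v, s, rho, c, h, tau, f_t)
```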

3. Analysis of the Results

With the help of the developed software component of the information system, we were able to visualize the numerical solution of the problem considered in this paper (Figure 4). Panel 1 of Figure 4 schematically shows the types of waves that determine the stress state at points of the body. Panels 2 and 3 of Figure 4 show the longitudinal $v_1$ and transverse $v_2$ particle velocities, the normal stresses $\sigma_{jj}$ (j = 1, 2) and the tangential stress $\sigma_{12}$ over the time interval at four fixed observation points: 1 ($x_1 = 0h$, $x_2 = 5h$); 2 ($x_1 = 5h$, $x_2 = 5h$); 3 ($x_1 = 10h$, $x_2 = 5h$); 4 ($x_1 = 15h$, $x_2 = 5h$).
The analysis of the results shows that the two-dimensional nature of the wave process is clearly observed in the semi-strip, and the difference scheme used does not lose stability over fairly long periods of time (studied up to t = 420τ) [14,15]. Furthermore, the results can be used for comparative evaluation when solving more complex tasks.

Figure 4. Visualization of numerical solutions obtained using the developed system (panels a-c)

4. Conclusion

With the help of the developed system, we were able to solve a number of tasks that are useful in engineering practice: 1) the numerical solution of non-stationary wave propagation in solids was obtained using the bicharacteristics method; 2) the visualization of the numerical solution of the propagation of non-stationary waves occurring in a solid under dynamic loading was obtained; 3) the results were analysed with the help of the visualization; 4) the information system was developed, with a database that stores all the numerical solutions and visualizations. The multipurpose orientation of the system, its independence from the hardware (from PCs to workstations and supercomputers), its full compatibility with the Windows operating system and its "friendly" interface make it possible not only to perform high-quality simulation of the wave process, but also to carry out analyses and forecasts on the basis of these visualizations [16,17].
The practical significance of this system is that it supports development organizations in reducing the development cycle, reducing the cost of products and improving product quality [18,19].

References

[1] B. Kantarci, H. T. Mouftah and S. Oktug. Availability analysis and connection provisioning in
overlapping shared segment protection for optical networks. Computer and Information Sciences,
2008.ISCIS'08.23rd International Symposium on.IEEE, 2008.
[2] G. Tarabrin, Metallurgical science, Moscow,3(1979), 193-199
[3] A. Zhidkov,Application of ANSYS system to meet the challenges of the geometric and finite element
modeling, Nizhny Novgorod, 4(2006), 4–5
[4] G. Tarabrin, Mechanics and calculation of constructions, Moscow,4(1981), 38-43.
[5] G. Tarabrin, Construction mechanics and calculation of constructions, Moscow, 3(1979), 193-199.
[6] G.Tidwell, Development of user interfaces,Trans. from English, 2008, 416.
[7] I. Medvedkov, Y. Bugaev, S. Nikonov. The Database, Voronezh, 2014, 67-73.
[8] R.Clifton, Mechanics, Moscow, 1(1968), 103-122.
[9] V. Reker, Applied mechanics. Series E, Moscow, 1(1970), 121-129.
[10] O. Syuntyurenko, Electron. Lib. Electronic information resources: new technologies and applications.
1(2011), 214-230.
[11] Z. Akhmetova, S. Zhuzbayev, S. Boranbayev, Acta Physica Polonica A, Polska Akademia Nauk, 129(2016), 352-354.
[12] S.Dzuzbayev. B. Sarsenov, Dynamic stress state of the half-strip in a side pulse pressure,
Almaty,3(2003),55-62.
[13] Z. Akhmetova, S. Boranbayev, S. Zhuzbayev, Advances in Intelligent Systems and Computing,
Springer International Publishing Switzerland,448(2016), 473-482.
[14] S.Boranbayev , S. Altayev, A.Boranbayev, Proceedings of the 12th International Conference on
Information Technology: New Generations, Las Vegas, 2015, 796-799.
[15] A.Boranbayev , S.Boranbayev, Proceedings of the 7th IEEE International Conference on Application of
Information and Communication Technologies, Astana, 2014, 1282-1284.
[16] D. Raskin, Interface: New directions in designing of computer systems, Transl. from English, 2007,
272.
[17] G. Druzhinin, I. Sergeev, Maintenance of information systems, Marshrut, 2013, 124-128.
[18] Q. Mao, Micro-UIDT: A user interface development tool. Eurographics Association, 1989, 3-14.
[19] I.Molina Ana, M.Redondo, M. Ortega, A methodological approach for user interface development of
collaborative applications: A case study. Science of Computer Programming, 74(2009), 754-776.
360 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-360

Infrared Image Recognition of Bushing


Type Cable Terminal Based on Radon and
Fourier-Mellin Transform and BP Neural
Network
Hai-Qing NIU1, Wen-Jian ZHENG, Huang ZHANG, Jia XU and Ju-Zhuo WU
School of Electric Power, South China University of Technology, Guangzhou 510641,
Guangdong, China

Abstract. To recognize the abnormal heating of cable terminals, the Radon and Fourier-Mellin transforms are used in this paper to extract image features. First, the original image is processed by the Radon transform and the Fourier-Mellin transform successively, and four feature quantities of the transformed image are extracted based on an invariant function. Finally, the feature vectors are input to a BP neural network for image recognition. The results show that the proposed method reflects the features of the infrared image effectively. Infrared images polluted with salt-and-pepper noise and white Gaussian noise are also recognized with the method, which proves its strong robustness to noise disturbance.

Keywords. bushing type cable terminal, infrared image, radon transform, fourier-
Mellin transform, BP neural network, feature extraction, image recognition

Introduction

The bushing type cable terminal, the cable accessory connecting cable lines and overhead lines, has been widely used in 110 kV and 220 kV city grids for its mature technology and stable running record. It may malfunction because of poor design, manufacture, installation or operating environment, so it is necessary to carry out regular inspections and preventive tests, among which infrared detection is the main one. Infrared detection shows the surface thermal distribution of the terminal as an image visible to the human eye, which helps diagnose the existence of a defect and its properties, location and severity, so that corresponding measures can be taken to eliminate it. Nearly four years of statistics on cable terminal infrared images in Guangzhou show that abnormal heating of terminals is concentrated in the clamp, the stress cone, the tail and so on. Because the heating type and the diagnostic criteria differ among heating parts, it is essential to recognize the pattern of infrared images to overcome the low efficiency of manual analysis and diagnostic methods.

1
Corresponding Author: Hai-Qing NIU, Associate Professor, School of Electric Power, South China
University of Technology, Guangzhou, China; E-mail: niuhq@scut.edu.cn.

Feature extraction is the key to image pattern recognition. A favorable feature is not affected by illumination, noise or geometric transformation. In the development of image recognition, new features are constantly being proposed, and moment features have received wide attention. According to the characteristics of the projection basis function, moments can be divided into non-orthogonal and orthogonal moments. Orthogonal moments include the Fourier-Mellin moments [1-2], which have strong robustness to noise and a good effect on image reconstruction, but lack scale transformation invariance [3-6] and bring in resampling and requantization errors.
In order to avoid the shortcomings of orthogonal moments, the Radon transform [7-8] can first be used to process the gray image, and the analytic Fourier-Mellin transform is then applied to the result. In this way a scale change of the original image is turned into an amplitude change and a rotation into a phase change. Rotation and scale invariant functions are defined on the basis of this transform and applied to extract rotation- and scale-invariant image features. In this paper, the features of infrared images of abnormal heating of cable terminals are extracted with the method described above, and the images are recognized using a BP neural network.

1. Feature Extraction of Infrared Image

1.1. Feature Extraction Based on Radon Transform and Fourier-Mellin Transform

A scale transform (factor λ) and a rotation (angle φ) are applied to the denoised image f(x, y) to obtain a new image $f_g(x, y)$. Let $M_g(u, k)$ be the result of applying the Radon transform and the Fourier-Mellin transform to $f_g(x, y)$ [7-8]:

$$M_g(u,k) = \int_0^{\infty}\!\!\int_0^{2\pi} \lambda\, P(W,\beta)\,(\lambda W)^{\sigma-iu-1}\, e^{-ik(\beta-\varphi)}\,\lambda\,dW\,d\beta = \lambda^{\sigma+1}\lambda^{-iu}e^{ik\varphi}\int_0^{\infty}\!\!\int_0^{2\pi} P(W,\beta)\,W^{\sigma-iu-1}\, e^{-ik\beta}\,dW\,d\beta = \lambda^{\sigma-iu+1}\, e^{ik\varphi}\, M(u,k) \qquad (1)$$

After the Radon and Fourier-Mellin transforms, the rotation and scale transformation of the original image f(x, y) are turned into a phase factor and an amplitude factor.
The function Z(u, k) is defined as follows:

$$Z(u,k) = M(0,0)^{-(\sigma-iu+1)/(\sigma+1)}\, e^{-ik\arg(M(0,1))}\, M(u,k) \qquad (2)$$

and $Z_g(u, k)$ can be deduced from formulas (1) and (2):

$$Z_g(u,k) = M_g(0,0)^{-(\sigma-iu+1)/(\sigma+1)}\, e^{-ik\arg(M_g(0,1))}\, M_g(u,k) = \big[\lambda^{\sigma+1}M(0,0)\big]^{-(\sigma-iu+1)/(\sigma+1)}\, e^{-ik\arg(\lambda^{\sigma+1}e^{i\varphi}M(0,1))}\,\lambda^{\sigma-iu+1} e^{ik\varphi} M(u,k) = M(0,0)^{-(\sigma-iu+1)/(\sigma+1)}\, e^{-ik\arg(M(0,1))}\, M(u,k) = Z(u,k) \qquad (3)$$

According to formula (3), the function Z(u, k) is not affected by the scale transformation and rotation of the original image, which overcomes the lack of scale transformation invariance of orthogonal moments.

The features extracted on the basis of the invariant function Z(u, k) are as follows:

$$e_1 = \sum_{u=1}^{M}\sum_{k=1}^{N} |Z(u,k)|^2,\qquad e_2 = \sum_{u=1}^{M}\sum_{k=1}^{N} |Z(u,k)|,\qquad e_3 = \frac{1}{(M\times N)^2}\sum_{u=1}^{M}\sum_{k=1}^{N}\big(|Z(u,k)|-\mu\big)^2,$$
$$e_4 = \frac{1}{(M\times N)^2}\sum_{u=1}^{M}\sum_{k=1}^{N} |Z(u,k)|^2\,\big(-\log |Z(u,k)|^2\big) \qquad (4)$$

where $\mu = \frac{1}{M\times N}\sum_{u=1}^{M}\sum_{k=1}^{N}|Z(u,k)|$ is the mean value of the function, M is the number of rows and N is the number of columns.
Four eigenvalues of each gray image are calculated according to formula (4); they constitute the feature vector which is input to the BP neural network for identification.

1.2. Feature Extraction of Infrared Image of Cable Terminal

Because the infrared images collected by the infrared thermal imager are color images, they need to be converted to grayscale for the convenience of computer data processing. Infrared images of cable terminals with abnormal heating of the clamp, the stress cone and the tail are selected, and the gray image of each infrared image is obtained using the rgb2gray function, as shown in Figure 1.

Figure 1. Original infrared image grayscale


The eigenvalues of the infrared image can be got by using the feature extraction
method based on Radon transform and Fourier-Mellin transform, as shown in table 1.
Table 1. Image feature extraction of heating in different parts

Eigenvalues The clamp The stress cone The tail of a bushing


e1 1.09*10^9 1.24*10^9 1.20*10^9
e2 2.64*10^7 2.75*10^7 2.61*10^7
e3 427.71 444.88 422.13
e4 -5.31*10^6 -5.80*10^6 -5.21*10^6

2. Infrared Image Recognition Based on BP Neural Network

2.1. BP Neural Network

The topology of BP neural network [9] includes input layer, hidden layer and output
layer. Figure 2 is a network topology with two hidden layers.

Figure 2. BP neural network


In Figure 2, X and Y represent the input and output variables of the network respectively. W1 is the weight matrix between the input layer and the first hidden layer, W2 is the weight matrix between the two hidden layers, and W3 is the weight matrix between the second hidden layer and the output layer. f1(x) and f2(x) are the activation functions of the hidden layers. In this paper, the hidden layers use the sigmoid function and the output layer uses the purelin function. a1 and a2 are the input vectors of the first and second hidden layers respectively, and a3 is the output vector of the output layer. During the training process, the weights and thresholds are adjusted according to the gradient descent method to minimize the sum of squared errors.

2.2. Infrared Image Recognition

The images must be coded when the BP neural network is used to identify abnormal heating images: 00, 01 and 10 stand for the network outputs corresponding to abnormal heating of the clamp, the stress cone and the tail respectively, so the number of output-layer neurons is 2. The extracted feature vector is used as the input of the network, so the number of input-layer neurons is 4. By repeated debugging, the hidden part of the network was determined to consist of two hidden layers, each with 9 neurons. The structure of the BP neural network is therefore 4-9-9-2.
As shown in Figure 1, five abnormal heating infrared images are selected for each part of the terminal, and 5400 images obtained by rotating and enlarging the original images are used as training samples to train the constructed network. Fifty abnormal heating infrared images of each part are selected as test samples; their feature values are extracted and input to the trained neural network, and the forecast output is obtained. If the output of the network (00, 01, 10) agrees with the corresponding heating type, the identification is correct. The recognition effect of the BP neural network is shown in Table 2.
Table 2. Recognition effect of BP neural network

Infrared image    The number of samples    The correct number of samples    Recognition rate/%
Clamp 50 50 100
Stress cone 50 47 94
Tail 50 49 98
Total 150 146 97.3

From Table 2 it can be seen that the method has a good recognition effect; its average recognition rate is 97.3%. In addition, analysis of the wrongly recognized infrared images shows that an image background with excessive brightness may lead to unsuccessful recognition, so operators should pay attention to the influence of ambient light when shooting infrared images.

2.3. Comparison with Invariant Moment Recognition Method

In order to verify the validity of the method, the invariant moment recognition method [10-11] is used for comparison and analysis.
The sample data, the structure of the BP neural network and the training process are consistent with Section 2.2. Table 3 compares the recognition results of the method proposed in this paper with those of the invariant moment feature recognition method.
Table 3. Comparison of recognition results of different methods

Infrared image    The number of samples    Invariant moments /%    Proposed method /%
Clamp    50    93    100
Stress cone    50    86    94
Tail    50    91    98
Total    150    90    97.3

It can be seen from Table 3 that the recognition rate of the method based on the Radon and Fourier-Mellin transforms is higher than that of the invariant moment feature recognition method, which proves the validity of the method proposed in this paper.

3. Recognition Effect Analysis

In order to test the robustness of the feature extraction method based on the Radon transform and Fourier-Mellin transform to noise, salt-and-pepper noise and white Gaussian noise are used to pollute the images, and the effect of noise on the recognition results is studied.
The training and test samples are the same as in Section 2.2; salt-and-pepper noise with different densities and white Gaussian noise with different variances are added to these samples, so that infrared images with different signal-to-noise ratios are obtained. Figure 3 shows an image with salt-and-pepper noise.

Figure 3. Image with salt and pepper noise


The training samples are used to train the network, and the test samples are input to the trained network to be identified. The recognition results for infrared images polluted by salt-and-pepper noise and white Gaussian noise are shown in Tables 4 and 5 respectively.

As shown in Table 4, the infrared image recognition rate decreases as the density of the salt-and-pepper noise increases. Table 5 shows that the recognition rate of the infrared images decreases as the variance of the white Gaussian noise increases. Nevertheless, the recognition rate remains relatively high under a strong noise background, which proves that the recognition method has strong robustness to noise.
Table 4. Recognition result of image containing salt and pepper noise

The density of noise    Recognition rate/%: clamp    stress cone    tail of a bushing
0.01 100 92 98
0.03 98 90 94
0.05 94 86 92

Table 5. Recognition result of image containing white Gaussian noise

The variance of noise    Recognition rate/%: clamp    stress cone    tail of a bushing
5 98 94 96
10 96 90 92
15 92 84 88

4. Conclusion

The features of infrared images of the clamp, the stress cone and the tail of the terminal are extracted based on the Radon and Fourier-Mellin transforms and composed into a feature vector, which is input to the BP neural network for identification. Some conclusions can be drawn.
(1) The recognition rates for abnormal heating infrared images of the clamp, the stress cone and the tail are 100%, 94% and 98% respectively, and the average recognition rate is 97.3%, which proves the validity of the recognition method.
(2) Salt-and-pepper noise with different densities and white Gaussian noise with different variances are added to the infrared images, and the same feature extraction method is used to recognize them. The results show that the recognition rate of the infrared images decreases as the variance of the white Gaussian noise and the density of the salt-and-pepper noise increase, and the greater the density or variance of the noise, the faster the recognition rate drops. Nevertheless, the recognition rate remains relatively high under a strong noise background, which proves that the recognition method has strong robustness to noise.

References

[1] S. Derrode, G. Faouzi. Robust and efficient Fourier-Mellin transform approximations for gray-level
image reconstruction and complete invariant description. Computer Vision and Image Understanding,
83(2001): 57-78.

[2] K. Zhang, H. Q. Chen, Q. W. Liang, et al. Improvement of Fourier-Mellin moments-based edge detection
algorithm. Journal of Huazhong University of Science and Technology(Natural Science Edition),
38(2010): 53-56.
[3] X. Wang, B. Xiao, J. F. Ma. Scaling and rotation invariant analysis approach to object recognition based
on radon and analytic Fourier-Mellin transforms. Journal of Image and Graphics, 13(2008): 2157-2162.
[4] L. S. Fu, P. W. Liu, D. D. Li. Improved moment invariant characteristics and object recognition.
Computer Engineering and Applications, 48(2012): 183-185.
[5] G. L. Xu, J. Xu, B. Wang, et al. CIBA Moment invariants and their use in spacecraft recognition
algorithm. Acta Aeronautica Et Astronautica Sinica, 35(2014): 857-867.
[6] L. H. Jiang, H. Chenm Z. B, Zhuang, et al. Recognition on low-level wind shear of wavelet invariant
moments. Infrared and Laser Engineering, 43(2014): 3783-3787.
[7] Y. M. Wang, W. Yan, S. Q. Yu. Moment feature extraction of image based on radon transform and its
application in image recognition. Computer Engineering, 27(2001): 82-89.
[8] L. Wang, Q. Chang, K. Zhang, et al. Radon transform for line segment detection in low SNR image,
Infrared and Laser Engineering, 32(2003): 163-166.
[9] G. B. Zhang, X. Luo, Y. Y. Shen, et al. Effect of atmosphere condition on discharge characteristics of air
gap and the application of neural network. High Voltage Engineering, 40(2014):564-571
[10] M. K. Hu. Visual pattern recognition by moment nvariants. IEEE Transactions on Information Theory,
8(1962): 179-182.
[11] J. Flusser. On the independence of rotation moment invariants. Pattern Recognition, 35(2002): 3015-
3017.
Fuzzy Systems and Data Mining II 367
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-367

Face Recognition with Single Sample


Image per Person Based on Residual Space
Zhi-Bo GUO a,1, Yun-Yang YAN b, Yang WANG b and Han-Yu YUAN a
a
School of Information Engineer, Yangzhou University, Yangzhou ,225009, China,
b
Faculty of Computer Engineering, Huaiyin Institute of Technology, Huai’an,
223003,China,

Abstract. To improve the performance of face recognition with only one sample per person, a novel face recognition method based on virtual images with multiple poses and micro expressions is presented here. First, a quadratic function is used to create virtual images as training samples, so as to enrich the classification information of the single training sample. Then effective discriminative features are extracted through bidirectional two-dimensional PCA in the residual space, which effectively reduces the influence of varying illumination on face recognition. Experiments on the ORL and Yale datasets show the effectiveness of the proposed method and an improved face recognition rate.

Keywords. multi-pose, micro expressions, virtual image, residual space, face


recognition

Introduction

Because of its special merits of convenient, fast and easy collection, face recognition
technology has received much attention in biometrics recognition field in recent years
[1-2]. There are many effective methods developed for face recognition such as PCA,
LDA and 2DPCA [3-6]. These methods usually use a number of representative face
images for each person as training samples to extract discriminative features for
adapting with pose and illumination variability [7-8]. However, it is difficult to obtain
various sample images of one person in practical operation. Generally, only one image
could be got as sample for each person, such as the photo of personal identification,
student certificates and passports. It has become a current challenging task to attempt to
do face recognition only by the limited samples for each face under complex
illumination, various pose and expression [9].
Fortunately, it is possible and effective to use virtual images, produced from the given image in various poses and expressions, as training samples to solve the problem of insufficient training samples [10]. The (PC)²A method proposed by Wu [11] fuses the original image and its integral projection into a new one; however, its recognition rate is low. Xu [12] develops a method to generate virtual images by rotating the original image. Wu [13] makes the best use of the global and local information of the samples by dividing the face into sub-blocks and also overcomes some influence of pose on the recognition effect.
1
Corresponding Author: Zhi-Bo GUO, School of Information Engineer, Yangzhou University, No.196,
Huayang Western Road, Yangzhou City, Jiangsu Province, China; E-mail: zhibo_guo@163.com.

In this paper a novel method is proposed for face recognition that uses virtual images, reconstructed from the single sample image of each person in different poses and with micro expressions, as training samples. Firstly, the wavelet transform and a quadratic function are used to create the virtual images as training samples. Then the algorithm RS-2DPCA, based on the residual space and bidirectional 2DPCA, is designed for face recognition; the influence of different illumination on recognition can be decreased in the residual space. Experimental results on the ORL and Yale face datasets show that the proposed method is more effective and accurate than the corresponding existing methods.

1. Sample Images Generation

1.1. Wavelet Transform

In order to save computation time, the sample image can be compressed first. The image is decomposed by the wavelet transform as shown in Figure 1. The original image is transformed into four sub-bands labeled LL, LH, HL and HH, where LL has low frequencies in both the horizontal and vertical directions, LH has low frequencies in the horizontal direction and high frequencies in the vertical direction, HL has high frequencies in the horizontal direction and low frequencies in the vertical direction, and HH has high frequencies in both directions. The upper left band LL is a coarser approximation of the original image. The upper right band HL and the lower left band LH record the changes of the image along the horizontal and vertical directions respectively, while the lower right band HH corresponds to the higher-frequency components of the image. The information in the LL part is essential, although the other parts also play an indispensable role in face recognition. Noise is obviously contained mainly in the high-frequency parts. So each of the four blocks after the wavelet transform carries a different weight of face discrimination information, and a fusion of the four sub-bands with different weights is necessary for face recognition.

(a) The original image (b) The image after wavelet decomposition
Figure 1. Face image using Wavelet transform

The combination of weight of the 4 sub-bands can be defined as follows:


(a)1/4,1/4,1/4,1/4; (b)11/16,3/16,3/16,1/16; (c)25/32,3/32,3/32,1/32; (d)57/64,3/64,3/64,1/64;
(e)1,0,0,0. The reconstructed face images are shown in Figure 2.

(a) (b) (c) (d) (e)


Figure 2. The fusion image after using different weights

Under these combinations of different weight, the experiments are done on the
ORL face dataset by using the face recognition algorithm of the improved bidirectional
two dimensional PCA (detailed in part 3). For each person, five images are used as
training samples and the rest for testing. The experimental results are shown in Table 1.

Table 1. Experimental results with different weights

Combination with different weights Recognition rate (%)


Figure 2(a) 87.78
Figure 2 (b) 92.50
Figure 2 (c) 98.50
Figure 2 (d) 96.25
Figure 2 (e) 94.44

When the weight of high frequency part is large, noise would be brought in and the
performance of the face recognition would be influenced. If the weight of high
frequency part is 0, the high frequency information of face would be lost and the
performance of the face recognition would also be influenced. It is illustrated that the
case Figure 2 (c) is the best as shown in Table 1.
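A minimal sketch of this weighted sub-band fusion with PyWavelets is given below. The use of the Haar wavelet and the mapping of PyWavelets' detail coefficients (cH, cV, cD) to the sub-bands (LH, HL, HH) named above are assumptions for illustration; the weights are the best combination of Table 1 (case (c)).

```python
import numpy as np
import pywt

def fused_lowres(img, weights=(25/32, 3/32, 3/32, 1/32)):
    """One-level 2-D wavelet decomposition and weighted fusion of the four
    sub-bands (LL, LH, HL, HH), using the best weight combination of Table 1."""
    LL, (LH, HL, HH) = pywt.dwt2(np.asarray(img, dtype=float), "haar")
    w_ll, w_lh, w_hl, w_hh = weights
    return w_ll * LL + w_lh * LH + w_hl * HL + w_hh * HH
```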

1.2. The Image Transform

A single face sample is insufficient to train a high-performance classifier for face recognition. Several face images with different poses and different expressions can be created as training samples by applying an image transformation to the single face sample. The unitary quadratic function is used here for the image transformation.
For the pixel with coordinate (x, t) in an image, the new coordinate after the transformation is (f(x), t), where f(x) = ax² + bx + c and c indicates the transformation angle. The face deflection degree is usually from −30° to +30°; otherwise the face shows serious deformation and face recognition becomes very difficult. Furthermore, a, b and c should satisfy formulas (1), (2) and (3):
$$\frac{c}{a} \le d \qquad (1)$$
$$\frac{b}{c} \le \frac{1}{k} \qquad (2)$$
$$\Delta = b^2 - 4ac < 0 \qquad (3)$$
where d and k are constants which can be set according to the image size.
Suppose a face image X of size w×h. Let (m, n) denote the coordinates of an arbitrary pixel in the image, where m = 1, 2, …, w and n = 1, 2, …, h. Based on the idea of polynomial fitting, various poses of the human face can be created as left or right rotations by changing the column position under the unitary quadratic function while the row position stays the same. The unitary quadratic polynomial function is as follows:

$$m' = m,\qquad n' = -\frac{2v}{h}\,n^2 + \Big[1 + \frac{2v(h-1)}{h}\Big]\,n - 2v \qquad (4)$$

where v is used to control the rotation degree. Its value ranges from −0.5 to 0.5, corresponding to a deflection from −38° to +38° relative to the frontal face, and (m', n') is the virtual pixel. The image would look severely deformed and lose much discrimination information if it were deflected too far.
The new position of a pixel may exceed h or even become negative in the transform. There would then be some black lines in the image, as shown in Figure 3(b), when the missing pixels are set to zero, and this would hamper feature extraction and recognition. A simple method is used here to solve the problem: for each pixel with value zero, its value is replaced with the average of its four adjacent pixels above, below, left and right. The improved image is shown in Figure 3(c).

Figure 3. Image transform: (a) original image; (b) transformed image; (c) improved image
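As a rough illustration of this step, the sketch below (assuming the reconstructed form of formula (4) above; the function names and array conventions are illustrative, not the authors' code) remaps the column positions and then fills zero-valued pixels with the average of their four neighbours:

```python
import numpy as np

def warp_columns(img, v):
    """Remap column positions with the unitary quadratic function (cf. formula (4))."""
    h = img.shape[1]
    out = np.zeros_like(img)
    for n in range(1, h + 1):                    # columns indexed 1..h as in the paper
        n_new = -(2 * v / h) * n ** 2 + (1 - 2 * v * (h - 1) / h) * n + 2 * v
        n_new = int(round(n_new))
        if 1 <= n_new <= h:                      # positions outside the image are dropped
            out[:, n_new - 1] = img[:, n - 1]
    return out

def fill_zero_pixels(img):
    """Replace zero pixels by the average of the four neighbours above, below, left and right."""
    out = img.astype(float).copy()
    rows, cols = out.shape
    for i in range(rows):
        for j in range(cols):
            if out[i, j] == 0:
                neigh = [out[r, c] for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= r < rows and 0 <= c < cols]
                out[i, j] = np.mean(neigh)
    return out
```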

Similarly, the row position is changed by the unitary quadratic function while
the column position is kept the same. The unitary quadratic polynomial function is as
follows:
    n' = n
    m' = -(2u/w) m^2 + [1 - 2u(w-1)/w] m + 2u                  (5)
where u represents the degree of relaxation of the facial muscles. The movement of a muscle point
can be divided into two directions, up and down, so micro-expression images can
be produced. The value of u ranges from -0.3 to +0.3; otherwise the face looks
severely deformed.

1.3. Virtual Images Creation

The images obtained by formulas (4) and (5) are called virtual images. These virtual
images can be used as face training samples. For a given image sample A, 10 virtual
images can be produced. The virtual image creation algorithm, named AVIC, is as
follows:
Step 1: Using formula (4), v is set to each of {+0.3, +0.15, -0.15, -0.3}, and A is
transformed into A1, A2, A3, A4 respectively. The corresponding deflection degrees are
+20°, +10°, -10°, -20°.
Step 2: Construct the mirror images A5 and A6 from A2 and A3 respectively.

Step 3: Using formula (5), u is set to each of {+0.3, +0.15, -0.15, -0.3}, and A is
transformed into A7, A8, A9, A10 respectively.
As is well known, changes of micro expressions are caused by the orbicular muscles
constricting or dilating. According to the principle of orbicular movement, muscle
motion texture is a group of curves that always follow the facial contour [11]. The
change is small when the polynomial function is used to fit the shape of these contour
curves, so u is chosen relatively large in order to obtain effective distinctive features.
The resulting images are shown in Figure 4 (see also the sketch after the figure).

Figure 4. Virtual images creation: original image A and virtual images A1-A10
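A compact sketch of the AVIC procedure under the same assumptions (the quadratic warps follow the reconstructions of formulas (4) and (5); all names are illustrative):

```python
import numpy as np

def quad_warp(img, t, axis):
    """Warp column positions (axis=1, parameter v) or row positions (axis=0, parameter u)."""
    src = img if axis == 1 else img.T
    size = src.shape[1]
    out = np.zeros_like(src)
    for p in range(1, size + 1):
        q = int(round(-(2 * t / size) * p ** 2 + (1 - 2 * t * (size - 1) / size) * p + 2 * t))
        if 1 <= q <= size:
            out[:, q - 1] = src[:, p - 1]
    return out if axis == 1 else out.T

def avic(A):
    """Create the 10 virtual images A1-A10 from a single sample A (Steps 1-3)."""
    poses = [quad_warp(A, v, axis=1) for v in (+0.3, +0.15, -0.15, -0.3)]        # A1-A4
    mirrors = [np.fliplr(poses[1]), np.fliplr(poses[2])]                         # A5, A6 from A2, A3
    expressions = [quad_warp(A, u, axis=0) for u in (+0.3, +0.15, -0.15, -0.3)]  # A7-A10
    return poses + mirrors + expressions
```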

2. RS-2DPCA for Face Recognition

2DPCA was proposed by Yang [6]. The algorithm operates directly on the matrix of the original
image, so the inherent structure of the image data is well preserved.
The recognition rate is improved while feature extraction is also sped up.
However, 2DPCA only applies the PCA transformation in the row direction and ignores the
column direction. A novel face recognition algorithm based on residual
space and bidirectional two-dimensional PCA (named RS-2DPCA) is presented here.

2.1. The Improved Bidirectional 2DPCA

Suppose that there are N training image samples {X1, X2, X3, …, XN}, and the size of each
image is w×h. There are L classes in the training samples, and the numbers of training
samples of the classes are N1, N2, …, NL. The training images of the cth class are
denoted by {X_1^c, X_2^c, …, X_{N_c}^c}, where X_i^c ∈ R^{w×h}, i = 1, 2, …, N_c, c = 1, 2, …, L.
Step 1: Compute the mean Tc of the cth class of training samples, as in formula (6):

    T_c = (1/N_c) Σ_{i=1}^{N_c} X_i^c ,   c = 1, 2, …, L                (6)

Step 2: Compute the image scatter matrices in the row and column directions
respectively, as in formulas (7) and (8):

    G_r = (1/(N·w)) Σ_{c=1}^{L} Σ_{i=1}^{N_c} (X_i^c - T_c)^T (X_i^c - T_c)          (7)

    G_c = (1/(N·h)) Σ_{c=1}^{L} Σ_{i=1}^{N_c} (X_i^c - T_c) (X_i^c - T_c)^T          (8)

Step 3: Take the orthonormal eigenvectors {ν1, ν2, ν3, …, νd} corresponding to the
first d largest eigenvalues of the image covariance matrix G_r, and similarly
{u1, u2, u3, …, ud} of G_c. Let P_r = [ν1, ν2, ν3, …, νd] and P_c = [u1, u2, u3, …, ud]. The feature
matrix Y of each sample X is defined as in formula (9):

    Y = P_c^T X P_r                                                        (9)
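A minimal numpy sketch of formulas (6)-(9); the data layout and names are assumptions made for illustration, not the authors' implementation:

```python
import numpy as np

def bidirectional_2dpca(samples, labels, d):
    """samples: list of w×h image matrices; labels: class index per sample; d: kept eigenvectors."""
    N = len(samples)
    w, h = samples[0].shape
    classes = sorted(set(labels))
    means = {c: np.mean([X for X, y in zip(samples, labels) if y == c], axis=0) for c in classes}
    Gr = np.zeros((h, h))
    Gc = np.zeros((w, w))
    for X, y in zip(samples, labels):
        D = X - means[y]
        Gr += D.T @ D / (N * w)     # row-direction scatter, formula (7)
        Gc += D @ D.T / (N * h)     # column-direction scatter, formula (8)
    # eigenvectors of the symmetric scatter matrices, sorted by decreasing eigenvalue
    wr, Vr = np.linalg.eigh(Gr)
    wc, Vc = np.linalg.eigh(Gc)
    Pr = Vr[:, np.argsort(wr)[::-1][:d]]
    Pc = Vc[:, np.argsort(wc)[::-1][:d]]
    return Pr, Pc

def feature_matrix(X, Pr, Pc):
    """Formula (9): Y = Pc^T X Pr."""
    return Pc.T @ X @ Pr
```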

2.2. Construct Difference Images

The influence of varying illumination on recognition can be reduced by using
difference images. A difference image is constructed as follows:
Step 1: Compute the feature matrix Y of each training image sample by
formula (9).
Step 2: Since the projection vectors are orthonormal, it is easy to obtain the reconstructed
image X' of sample X:

    X' = P_c Y P_r^T                                                       (10)

Step 3: The difference image Q is defined as follows to remove the lighting
information:

    Q = X - X'                                                             (11)
In summary, we obtain the row and column projection vectors with 2DPCA,
compute the reconstructed images of the original samples, obtain the difference
images by subtracting the reconstructed images from the original images, and build the
residual space made up of the difference images.
Observation of the reconstructed images shows that they contain a large amount of
illumination information in addition to the intrinsic facial information. The
difference image can therefore be used to remove the effect of the lighting information and thus
improve the face recognition rate.
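A short sketch of Steps 1-3 of this construction, assuming the projection matrices Pr and Pc are already available from Section 2.1 (names are illustrative):

```python
import numpy as np

def difference_image(X, Pr, Pc):
    """Formulas (9)-(11): project, reconstruct, and subtract to remove lighting information."""
    Y = Pc.T @ X @ Pr          # feature matrix, formula (9)
    X_rec = Pc @ Y @ Pr.T      # reconstructed image, formula (10)
    return X - X_rec           # difference image Q, formula (11)
```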

2.3. Face Recognition Based RS-2DPCA

Suppose that the images of L persons are {A1, A2, A3, …, AL}, with only one
image per person.
Step 1: The L images {A1, A2, A3, …, AL} are transformed into {B1, B2, B3, …, BL}
by the wavelet transform. The weight combination of the 4 sub-bands is the one of
Figure 2(c).
Step 2: For each person, 10 virtual images {X_1^i, X_2^i, X_3^i, …, X_10^i} are produced from
each Bi. {X_0^i, X_1^i, X_2^i, X_3^i, …, X_10^i} are used as the training samples of that person,
where X_0^i = Bi. For i = 1, 2, …, L, the number of training samples is thus
expanded from L to N, where N = L × 11. The full set of training samples
is {X_0^1, X_1^1, …, X_10^1, X_0^2, X_1^2, …, X_10^2, …, X_0^L, X_1^L, …, X_10^L}.
Step 3: Compute the difference image Qj of each sample X_k^i, i = 1, 2, …, L,
k = 0, 1, 2, …, 10, j = (i - 1) × 11 + k, by (6)-(11).
Step 4: Compute the scatter matrix of all the difference images:

    S_b = Σ_{j=0}^{N-1} Q_j Q_j^T                                          (12)

Step 5: Compute the projection matrix F from the scatter matrix S_b; F consists of
the eigenvectors corresponding to the first d largest eigenvalues.
Step 6: Compute the feature matrices of the testing sample and of each training
sample under the matrix F; the face is then identified by the nearest neighbour classifier, as in
PCA. Suppose that the feature matrices of the training samples are H1, H2, …, HN (where N is
the total number of training samples), and each of these samples is assigned a given
identity (class) ck. Given a test sample, its feature matrix is H. The distance between H
and Hi is defined by

    d(H, Hi) = || H - Hi ||_2                                              (13)

where || H - Hi ||_2 denotes the Euclidean distance between the two matrices H and Hi.
If d(H, Hl) = min_i d(H, Hi) and Hl ∈ ck, then H ∈ ck.


Based on the improved bidirectional 2DPCA, we obtain the residual space of the samples, which
reduces the effect of illumination change and improves recognition accuracy. The
proposed method is named RS-2DPCA.
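The sketch below illustrates Steps 4-6 under the assumption that the feature matrix of a sample under F is obtained as F^T X (the text does not spell this out); all names are illustrative:

```python
import numpy as np

def projection_matrix(Q_list, d):
    """Formula (12): scatter of the difference images; F holds the top-d eigenvectors."""
    Sb = sum(Q @ Q.T for Q in Q_list)
    vals, vecs = np.linalg.eigh(Sb)
    return vecs[:, np.argsort(vals)[::-1][:d]]

def classify(test_img, train_imgs, train_labels, F):
    """Nearest-neighbour rule of formula (13) on the projected feature matrices."""
    H = F.T @ test_img
    dists = [np.linalg.norm(H - F.T @ X) for X in train_imgs]   # Euclidean (Frobenius) distance
    return train_labels[int(np.argmin(dists))]
```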

3. Experiment Results

The experiments are carried out in MATLAB 2012 using the ORL and Yale face datasets.
First, virtual face images are created as training samples by the proposed algorithm
AVIC. The recognition rates in Tables 2 and 3 are defined as the number of correctly
recognized samples divided by the total number of test samples.

3.1. Experiments on the Yale Dataset

The Yale dataset contains 15 individuals, each providing 11 different images. The
facial expressions are rich: open or closed eyes, smiling or non-smiling, glasses or no
glasses. Moreover, the illumination on the face varies. The 11 sample images of one
person from the Yale dataset are shown in Figure 5.

Figure 5. 11 sample images of one person in the Yale dataset

The ith image (i = 1, 2, …, 11) of each person is selected as the single face sample per
person. This sample and its 10 virtual images are used as training samples, and the
remaining 10 images of each person are used for testing. The total number of training samples is 165 and
that of the testing samples is 150. The experimental results are shown in Table 2. The
mean face recognition rate is 84.36%.

Table 2. Recognition results with different training samples in the Yale database

i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 i=10 i=11
86.67 86.00 83.33 86.00 84.67 84.00 87.33 85.33 82.67 81.33 80.67

3.2. Experiments on the ORL Dataset

The ORL dataset consists of 40 persons. Each person has 10 different images with
rich facial expressions and varied poses. The 10 sample images of one person
from the ORL dataset are shown in Figure 6.

Figure 6. 10 images of one person in the ORL dataset


Similarly, the ith image (i = 1, 2, …, 10) of each person is selected as the single
face sample per person. This sample and its 10 virtual images are used as training samples, and the
remaining 9 images of each person are used for testing. The total number of training samples is 440
and that of the testing samples is 360. The experimental results are shown in Table 3.

Table 3. Recognition results with different training samples in the ORL database

i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 i=10
85.00 82.78 83.06 77.78 88.06 81.11 83.61 83.33 75.56 80.83

From Table 3, the mean face recognition rate is 82.11%. Experiments in the same
environment with different algorithms are shown in Table 4. The results of the different
PCA methods in Table 4 are obtained using the same training samples, test samples and
recognition-rate calculation as the proposed method.

Table 4. Experimental results with different algorithms

Algorithm Recognition rate (%)


PCA with virtual image 69.83
(PC)2A[8] 57.14
Wavelet+2DPCA[9] 80.25
Sub-block+2DPCA[10] 77.80
Proposed 82.11

Table 4 shows that the proposed method is more effective than the alternatives.
This is due to two factors. One is that the virtual images of multiple poses and
micro-expressions are closer to the actual situation, which effectively remedies the lack of pose
and expression variation in the original image. The better results on the Yale
dataset also arise because the lighting of the original images varies strongly: the present method
reduces the lighting effects, so the test results improve. The other factor is that the
influence of varying illumination is reduced by the residual space.

4. Conclusions

Face recognition from one sample per person is an important but challenging problem,
both in theory and in real-world applications. In this paper, 10 virtual images of multiple
poses and micro expressions were created from a single face sample. These images were
used as training samples. The experimental results show good face recognition
performance, so the proposed algorithm is effective. However, the one-sample problem is by
no means solved. More research is needed for face recognition with large variations in
expression, illumination and pose. How many training samples are needed to
extract sufficient features for face recognition is also a problem worth studying.

Acknowledgment

This work was sponsored by Prospective Joint Research Project of Jiangsu Province
(BY201506-01), the LiuDa Talent Peak Project of Jiangsu (2013DZXX-023), Huai’an
533 Project and supported in part by the Major Program for scientific and technological
research in University of China under the Grant No.311024.

References

[1] Y. Q. Hu, A. S Mian, R. Owens. Face Recognition Using Sparse Approximated Nearest Points between
Image Sets, IEEE Transactions Pattern Analysis and Machine Intelligence, 10(2012), 1992 ~ 2012
[2] Y.F. Jin, K.M. Geng, Y.P.Wang. Efficient Feature Reduction Algorithm based on mPCA and Rough Set,
International Journal of Advancements in Computing Technology, 15(2012), 504 ~ 511.
[3] Z. Wang, Q. Ruan, G. An. Facial expression recognition using sparse local Fisher discriminant
analysis, Neurocomputing, 174(2016), 756-766.
[4] M. Kan, S. Shan, H. Zhang, S. Lao, X. Chen. Multi-view Discriminant Analysis, IEEE Transactions
on Pattern Analysis and Machine Intelligence, 1(2016), 808-821.
[5] J. Yang, Z. Gu, N. Zhang, J. Xu. Median–mean line based discriminant analysis, Neurocomputing,
123(2016), 233-246
[6] J. Yang, Z. David, J. Y. Yang. Two-Dimensional PCA: A New Approach to Appearance-Based Face
Representation and Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence,
1(2004), 131 ~ 137.
[7] Z. Y. Yu, S. B. Gao. Fuzzy Two-dimensional Principal Component Analysis and Its Application to Face
Recognition, Advances in Information Sciences and Service Sciences, 11(2011), 335 ~ 341.
[8] Y. Zeng, D. Z Feng. The Face Recognition Method of the Two-direction Variation of 2DPCA,
International Journal of Digital Content Technology and its Applications, 2 (2011), 216 ~ 223.
[9] P. Viola, M. Jones. Robust Real-time Face Detection, International Journal of Computer Vision,
2(2004), 137 ~ 154.

[10] X. Y. Tan, S. C Chen, Z. H. Zhou. Face Recognition form a single image per person: a survey, Pattern
Recognition, 9(2006), 1725 ~ 1745.
[11] J. X. Wu, Z. H. Zhou. Face Recognition with One Training Image Per Person, Pattern Recognition
Letters, 14(2002), 1711 ~ 1719.
[12] X. Y. Xu. Face Recognition Method for Single Sample Based on Virtual Image, Computer
Engineering, 1(2012), 143 ~ 145.
[13] P. Wu, J. L. Zhou, X. H. Li. Sub-block Face Recognition Based on Virtual Information with One
Training Image Per Person, Computer Engineering and Applications, 19(2009), 146 ~ 149.
Fuzzy Systems and Data Mining II 377
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-377

Cloud Adaptive Parallel Simulated


Annealing Genetic Algorithm in the
Application of Personnel Scheduling in
National Geographic Conditions
Monitoring
Juan DU1, Xu ZHOU, Shu TAO and Qian LIU
National Geomatics Center of China, Beijing, China

Abstract. Considering the irrational staff scheduling in the China Geography
Census (CGC), this paper establishes a personnel optimization scheduling model
based on assumptions drawn from the characteristics of the CGC (or National
Geographic Conditions Monitoring) production process. Because the optimal
dispatch of personnel involves large-scale, high-dimensional and nonlinear
problems, the standard genetic algorithm (SGA) suffers from premature and slow
convergence as well as poor local optimization ability. This paper therefore adopts
the cloud adaptive parallel simulated annealing genetic algorithm (PCASAGA),
which integrates adaptiveness and cloud reasoning with a simulated annealing
mechanism to improve the performance of SGA; parallel computing is also
introduced. Taking the Shandong Remote Sensing Technology Application Center
as an example, the experiments show that PCASAGA is superior to SGA in
convergence speed and optimization ability. They also show that the homogeneous
and heterogeneous situations are both distinct and connected as the influence
factors increase, namely the number of returns for modification (n), the quality
sampling rate (s) and the error rate (e). The distinction lies in the change of the staff
structure proportion: the former case shows flat or falling trends while the latter has
no unified pattern. The connection is the increased optimal completion time. The
findings have guiding significance for staff optimization in National Geographic
Conditions Monitoring in aspects such as engineering planning and cost calculation.

Keywords. National Geographic Conditions Monitoring, personnel optimization


scheduling, genetic algorithm, cloud reasoning, simulated annealing, parallel
computing

Introduction

The goal of personnel optimization scheduling is to complete a task under scarce
resources by minimizing or maximizing an objective function. The earliest
research concerned transportation systems and can be traced back to Edie's [1] study of toll
station traffic jams. The application then gradually extended to many other fields,
such as health care, communications, financial services, manufacturing, and high and
new technology services.

1
Corresponding Author: Juan DU, National Geomatics Center of China, 28 Lianhuachi West Road,
Haidian District, Beijing, China; E-mail: mhwgo_jane@163.com.

Accordingly, as the problems became more complicated, research methods developed from
traditional single algorithms, such as linear programming, integer programming and so on, to
modern heuristic and hybrid algorithms. With the rise of algorithms based on biology, physics and
artificial intelligence in the 1990s, the genetic algorithm (GA) [2-3], the simulated annealing
algorithm (SA) [4], evolutionary algorithms (EA), tabu search (TS), the ant colony
algorithm (ACA) [5], and so on became widely used in optimization scheduling problems,
the genetic algorithm in particular, owing to its strong global optimization ability,
robustness and generality.
GA was proposed by Holland in the late 1960s and early 1970s; its mechanism is to:
(1) simulate the processes of natural selection and natural genetics: reproduction,
crossover and mutation;
(2) keep a set of candidate solutions in each iteration, and select better
individuals according to certain indicators;
(3) combine these individuals to produce a new generation by using the genetic
operators (selection, crossover and mutation);
(4) repeat the process until some stopping conditions are satisfied.
But GA has its own drawbacks, such as premature convergence, slow convergence
speed and poor local optimization ability. Therefore, various improvements have been put
forward. For the choice of adaptive value, the breeding pool [6] and Boltzmann selection were
introduced on the basis of the commonly used roulette selection, but they were still prone to
"premature" convergence and stagnation. Because the rates of crossover and
mutation are constant in the genetic process, Chen et al. [7] raised a new perspective
of superiority inheritance based on Srinivas et al. [8], Wang and Cao [9] and Zheng et al.
[10], which solved the premature problem effectively to some extent. In addition, SA and
methods such as gradient descent, hill climbing and list optimization have strong
local search ability; they can improve running efficiency and solution quality when
added to GA's search process. Taking all of the above into consideration,
Peng et al. [11] combined fuzzy control and SA into the fuzzy adaptive simulated
annealing genetic algorithm (FASAGA) based on the standard genetic algorithm (SGA).
Furthermore, Dong et al. [12] introduced a cloud model [13-14], which takes fuzziness
and randomness into consideration, together with a parallel mechanism [15-16]. They
proposed an adaptive parallel simulated annealing genetic algorithm based on a
cloud model (PCASAGA). PCASAGA has a faster convergence speed and better
optimization results.
Because of the unreasonable staffing in the production process of most units (enterprises) in
the Geographical Conditions Census (CGC), the task allocation between producers and quality
inspectors was determined by experience according to the engineering task quantity on a
national scale, and a "fire brigade" dispatch mode existed universally. Geared to the needs of
normalized National Geographic Conditions Monitoring (NGCM), this paper focuses
on PCASAGA to improve staff scheduling. We establish an optimization
scheduling model which reflects the characteristics of the CGC production process. Finally,
the paper illustrates the good performance of the algorithm applied to the optimization
model through an instance analysis.

1. Personnel Optimization Scheduling Model

Personnel optimization scheduling can formulate a corresponding objective function
according to the scheduling task; we take the shortest time as the goal. To enhance
the model's generality, we introduce a conversion function between skill level and work
efficiency, and establish the optimization model.

1.1. Efficiency Conversion Function

To account for the heterogeneity of work efficiency in reality, we convert staff skill scores to
efficiency, drawing on the learning curve of Ngwenyama et al. [17] and the
existing research of Walter [18-19]. The mathematical expression is as follows (Eq. (1)):

    r(x) = r_max / (1 + a · e^(-b·x))                                      (1)

where r(x) is the efficiency, r_max is the efficiency limit, a, b > 0 are parameters, and x is the
employee's skill score (0 to 10 in this paper).
The efficiency limit r_max is generally obtained from experience according to the actual
production situation, and the parameters a and b are determined experimentally. When an
employee's skill is too low, his ability is hardly adequate for the assigned tasks and his
work efficiency is low; conversely, a higher-skilled worker's efficiency is
higher. At low and high skill levels the efficiency rises slowly, whereas at
medium levels it rises relatively fast. Following previous research (Zhang
[20]), we use b = 1 and a = 148.4132, as shown in Figure 1.

Figure 1. The efficiency conversion function curve.
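A tiny sketch of this conversion, assuming the logistic form reconstructed in Eq. (1); the function name and default values are illustrative:

```python
import math

def efficiency(x, r_max, a=148.4132, b=1.0):
    """Convert a skill score x (0-10) into work efficiency, cf. Eq. (1)."""
    return r_max / (1.0 + a * math.exp(-b * x))

# With a = e^5, efficiency(5, r_max=1.0) is about 0.5 and efficiency(10, r_max=1.0) is close to 1.0.
```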

1.2. Objective Function and Constraint Conditions

The CGC includes production and quality inspection tasks: producers organize
production, then quality inspectors check the completed production by sampling. In this
paper, the target is to find the distribution between the two kinds of worker that makes the
completion time shortest under a given production quota. To reach this target, the assumptions are as
follows:
(1) P and QC are the numbers of producers and quality inspectors, P + QC = c, where c is a
constant;
(2) In general, the production task can be measured by the number of patches, the sheet quantity or
the mission area; we adopt the last one in this paper, represented as A.
The sampling proportion of quality inspection is s, a decimal number
between 0 and 1, so the inspection task is s * A. If quality inspectors detect
errors, the work is returned to the producers for modification, and the quality inspectors
then check all the corrections again. If errors remain, the procedure is repeated
in the same manner; all errors are corrected after n rounds, where n is a
natural number. The error rate of producers is assumed to be e, while quality inspectors
make no mistakes, e ∈ (0, 1].



Objective function:

    Min TIME = min( A·(1 + e + … + e^n) / Σ_{i=1}^{P} [ r1 / (1 + a·e^(-b·x_i)) ]
                    + s·A·(1 + e + … + e^n) / Σ_{j=1}^{QC} [ r2 / (1 + a·e^(-b·y_j)) ] )      (2)

where r1 and r2 are the efficiency limits of producers and quality inspectors respectively, and
x, y are the skill scores of producers and quality inspectors respectively.
Constraint conditions: P + QC = c;
P, QC ∈ N+, 0 < P < c, 0 < QC < c;
0 ≤ x, y ≤ 10.
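Under the reconstructed form of Eq. (2) above (an assumption, since the original expression is garbled in this copy), the completion time for a candidate assignment could be evaluated roughly as follows; names are illustrative:

```python
import math

def completion_time(x_scores, y_scores, A, s, e, n, r1, r2, a=148.4132, b=1.0):
    """Rough total-time evaluation for given producer (x) and inspector (y) skill scores."""
    rework = sum(e**k for k in range(n + 1))                 # 1 + e + ... + e^n
    prod_rate = sum(r1 / (1 + a * math.exp(-b * x)) for x in x_scores)
    insp_rate = sum(r2 / (1 + a * math.exp(-b * y)) for y in y_scores)
    return A * rework / prod_rate + s * A * rework / insp_rate
```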

2. Cloud Adaptive Parallel Simulated Annealing Genetic Algorithm

GA has premature convergence, slow convergence speed and poor local
optimization ability, primarily because selection is carried out in
proportion to fitness: better individuals occupy a higher proportion of the
group early on because their fitness is much higher than the average, which results in
"premature" convergence. Meanwhile, as offspring inherit from parents, the diversity of the
population declines; the fitness values of individuals then become close in later generations,
the genetic operators find it difficult to select more excellent offspring, and so the convergence
speed becomes slower and the local optimization ability poorer. Therefore, PCASAGA regulates the
selection and mutation operators with cloud adaptive adjustment and introduces SA, which
has strong local search ability, into GA. The starting point of SA is the
similarity between the annealing process of solid matter in physics and general combinatorial
optimization problems. It starts from a high initial temperature and searches for
the global optimal solution of the objective function randomly in the solution space,
with a probabilistic kick feature and a falling temperature. SA can thus jump out of local optimal
solutions with some probability and reach the global optimum ultimately. A parallel
mechanism is also used to improve the algorithm's efficiency.

2.1. Cloud Adaptive Mechanism of Parameters

In the genetic evolution process, the crossover rate Pc and mutation rate Pm control the global
and local search in the search space. In GA, Pc and Pm are constants determined by
experience, but values that are too small or too large affect the genetic process. Therefore, M. Srinivas
et al. put forward an adaptive genetic algorithm, and scholars at home and abroad have conducted
a great deal of research on further improvements. Although these adaptive mechanisms
consider individual differences, they ignore the state of the whole group. As a result,
Peng Yong-gang (FASAGA) looked at both individual and population differences and
obtained Pc and Pm through fuzzy control. However, a fuzzy inference system adopts
a precise membership function to describe the uncertainty of a qualitative concept: it fuzzifies
the input values into a fuzzy set and evaluates membership functions to get membership
values; finally, outputs are obtained by the fuzzy inference machine and
defuzzification. Therefore, the same inputs always give the same results because the
excitation levels, the fuzzy inference machine and the defuzzification are invariant. In contrast, a
cloud inference system produces uncertain results because it has no precise
membership function. Its definition is: let C be a language value on domain U; if x ∈ U
is a random realization of C, and the certainty degree of x with respect to C is a random number
with a stable tendency, symbolized as μ(x) ∈ [0,1], μ(x): U → [0,1], ∀x ∈ U,
then the distribution of x on domain U is called a cloud model. Three digital
characteristics, expectation (Ex), entropy (En) and hyper entropy (He), reflect the
characteristics of the qualitative concept in the cloud model. The expectation (Ex) reflects the
cloud's barycentre position; it is the most representative value of the qualitative concept on the
domain space. The entropy (En) reflects, on the one hand, the range that can be accepted by the
language value and, on the other hand, the probability that points in the domain space stand for the
language value. En represents the randomness of the cloud droplets of the
qualitative concept; it reveals the correlation between fuzziness and randomness. The hyper
entropy (He) is the uncertainty measure of the entropy, namely the entropy of the entropy; it is the
coherence of the uncertainty measures of all droplets that represent the same language value.
When μ(x) follows a normal distribution, the model is called a normal cloud model, denoted by
C(Ex, En, He) [21-23]. The outputs of a cloud model are thus generated by a random
process. From this standpoint, PCASAGA introduces cloud reasoning based on
population differences, individual differences and fuzzy inference rules.
The population differences (E1, including the population crossover difference E1c and the
population mutation difference E1m) and the individual differences (E2, including the individual
crossover difference E2c and the individual mutation difference E2m) are defined as
follows (Eqs. (3) and (4)):

    E1 = E1c = E1m = (f_max - f_avg) / f_max ∈ [0, 1]                      (3)

    E2c = (f' - f_avg) / f_max ,   E2m = (f - f_avg) / f_max ∈ [-1, 1]     (4)

where f_max and f_avg are the maximum and average fitness values in each
generation, f' is the larger of the fitness values of the two individuals to be crossed, and f is
the fitness value of the individual to be mutated.
The cloud reasoning model uses the same inference rules as FASAGA; Table 1
shows the adaptive adjustment rules for Pc, and Pm is handled in the same way. E1, Pc and Pm are
described by the language set {large, medium, small}, and E2 by {positive, zero, negative}. If the
diversity is poor (namely E1 is small) and the individual value is below the
average level (namely E2 is negative), then Pc and Pm should be large; the other
inference rules are shown in Table 1. The concept of each language value is described by an
expectation Ex, an entropy En and a hyper entropy He. The adaptive values of Pc and Pm
are obtained by the following procedure: 1) take any given input vector as a cloud
drop without uncertainty information, 2) feed it into the cloud generators
constructed from the qualitative rule library, 3) obtain output
cloud droplets carrying certainty information, and then concretize all outputs into a single
value. The process is shown in Figure 2.
Table 1. The rules' table of the adaptive crossover operator

E1c \ E2c     negative    zero       positive
small         large       large      medium
medium        large       medium     small
large         medium      small      small

Figure 2. Cloud reasoning system (the inputs E1 and E2 pass through the qualitative rule library and
cloud generators, and the output cloud droplets are concretized into Pc and Pm)


According to concept of qualitative characteristics, small and large belong to
unilateral uncertainty, which is shown with right and left hemisphere's normal cloud,
and medium with symmetrical normal cloud. Three digital characteristic values are
determined by practical experience and forefathers' research [24]. Rules in the library
related with qualitative concept E! ‫ޔ‬E" ‫ޔ‬P ‫ޔ‬P are shown in the left side as follows,
cloud charts are shown in figure 3 - figure 5 in the right side correspondingly.
    E1_small(x)  = 1 for x ∈ [0, 0.15],  C(0.15, 0.1, 0.002) otherwise
    E1_medium(x) = C(0.45, 0.35/3, 0.002)
    E1_large(x)  = 1 for x ∈ [0.8, 1],   C(0.8, 0.35/3, 0.002) otherwise

Figure 3. The cloud graph of the qualitative concept about population differences E1

    E2_negative(x) = 1 for x ∈ [-1, -0.65],  C(-0.65, 0.65/3, 0.005) otherwise
    E2_zero(x)     = C(0, 0.35/3, 0.005)
    E2_positive(x) = 1 for x ∈ [0.65, 1],    C(0.65, 0.35/3, 0.005) otherwise

Figure 4. The cloud graph of the qualitative concept about individual differences E2

    P_small(x)  = 1 for x ∈ [0, 0.15],  C(0.15, 0.1, 0.002) otherwise
    P_medium(x) = C(0.45, 0.35/3, 0.002)
    P_large(x)  = 1 for x ∈ [0.8, 1],   C(0.8, 0.35/3, 0.002) otherwise

Figure 5. The cloud graph of the qualitative concept about crossover rate Pc and mutation rate Pm
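To make the normal cloud notation C(Ex, En, He) concrete, a standard one-dimensional forward normal cloud generator can be sketched as follows; this only illustrates the notation and is not the authors' inference engine:

```python
import math
import random

def normal_cloud_droplets(Ex, En, He, count=1000):
    """Forward normal cloud generator: returns (x, certainty) droplets for C(Ex, En, He)."""
    drops = []
    for _ in range(count):
        En_prime = random.gauss(En, He)          # entropy perturbed by the hyper entropy
        x = random.gauss(Ex, abs(En_prime))      # droplet position
        mu = math.exp(-(x - Ex) ** 2 / (2 * En_prime ** 2)) if En_prime != 0 else 1.0
        drops.append((x, mu))
    return drops

# e.g. droplets for the "medium" concept of Pc: normal_cloud_droplets(0.45, 0.35/3, 0.002)
```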

2.2. Cloud Adaptive Parallel Simulated Annealing Genetic Algorithm Process

Usually, parallel models are divided into three categories: master-slave,
coarse-grained and fine-grained. We choose the coarse-grained model and
introduce a migration operator at the same time. The design generates a number of initial
populations randomly, which evolve in different processors independently, and in each generation
the optimal individual over all populations replaces the worst one in each population
before crossover and mutation.
At the same time, we apply SA to fitness stretching and to the changes of the crossover
and mutation operators. The fitness is stretched according to the following method (Eq. (5)):

    f' = exp( -(f_max - f) / T_k )                                         (5)

where f' is the stretched fitness, f_max is the largest fitness, and T_k is the annealing
temperature in the kth generation. The rates of crossover and mutation are determined by the cloud
inference system, and the annealing mechanism then selects better individuals for the next
generation.
The PCASAGA process is as follows:
(1) Initialize the populations and the temperature T0.
(2) Calculate the fitness of the populations and adopt the elitist strategy, replacing the worst
individual by the best one. Specifically: if the generation number is 1,
select the individual with the largest fitness over all populations as the best individual
after comparison; otherwise, select the better of the individual with the largest fitness in the
current generation and the recorded best individual.
(3) Apply roulette selection after stretching the fitness.
(4) Obtain the crossover rate from the cloud reasoning system, and accept new individuals
through the Boltzmann mechanism.
(5) Obtain the mutation rate and operate as in (4).
(6) Increase k to k + 1 and cool the temperature to T_k = 1/ln(k/T0 + 1), if the termination
conditions are not met.
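A minimal sketch of the annealing-style acceptance used together with the stretched fitness of Eq. (5); the Boltzmann rule shown is the standard one, and the names are illustrative:

```python
import math
import random

def stretched_fitness(f, f_max, T_k):
    """Eq. (5): stretch fitness relative to the best individual at temperature T_k."""
    return math.exp(-(f_max - f) / T_k)

def accept(new_fitness, old_fitness, T_k):
    """Boltzmann acceptance: always keep improvements, accept worse ones with a probability."""
    if new_fitness >= old_fitness:
        return True
    return random.random() < math.exp((new_fitness - old_fitness) / T_k)
```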

3. Application Instance

3.1. Data Processing

(1) Data source
In order to verify the algorithm's feasibility and validity, this paper takes the Shandong
Remote Sensing Technology Application Center as an example. There are 22 workers in
this center; we select half of them as sample data in order to obtain true values in a short time.
The data come from the metadata of Shandong province in the CGC, including the Basic
Identification Information (V_MBIIA), which records basic conditions such as the completion date and
information on the production units; the Indoor Data Capture (V_MIDCA), which records the
person, method, time and content of data collection; and the 2nd Stage Quality Control
(V_MQC2).
(2) Processing method
V_MBIIA is intersected with V_MIDCA and V_MQC2 respectively, excluding
illegal records such as unreasonable start/finish dates, to obtain information on the total
operating area (A) and the total number of persons (c). The efficiency limits (r1, r2) are obtained by
evaluating the highest-level people, and the skill score (x) is acquired by the inverse
transformation of the efficiency in this experiment; a questionnaire survey of the
superior leadership is recommended when the efficiency is unknown.
(3) Computing platform
We use the R language for the instance analysis, with the foreach package.
The experimental environment is a single PC with 8 GB memory and a 2-core Intel i5 CPU.

3.2. Results and Analysis

With the shortest time as the target, we use the reciprocal of the objective function (Eq. (2)) as
fitness, with binary encoding: 1 represents a producer and 0 a quality inspector. To test the
performance of the algorithm, we simulate under different values of n, s and e, and compare
PCASAGA with SGA. The main parameters of PCASAGA are: the
number of populations is 3, each population size is 10, the number of generations is 100, and the
initial temperature is 100. For SGA, the number of generations is
the same, and the rates of crossover and mutation are fixed at 0.8 and 0.1
respectively.
(1) Convergence performance and optimization ability

We calculate the number of generations needed under different parameter combinations for the
two algorithms when the same precision (0.0001) is taken as the stopping condition; the results
are shown in Table 2. The number of generations needed by PCASAGA is smaller than that of SGA at
the same precision level, which shows that the former reaches the convergence condition
much earlier and that its convergence performance is superior.
Table 2. The comparison of the number of genetic generations under the same precision

 n    s     method      e=0.1   e=0.3   e=0.5   e=0.7   e=0.9
 1    0.2   PCASAGA      45      23      11      48      25
            SGA          67      38      83      61      69
 3    0.6   PCASAGA       7       1       8       3       5
            SGA          15      20      64      12      26
 5    1     PCASAGA       3      27      11      73       2
            SGA           4      43      20      94      12

We then calculate the optimal time under different parameter combinations for the two
algorithms when the same number of generations (100) is taken as the stopping condition; the
results are shown in Table 3. The data in the table are the reciprocals of the actual optimization
values, which stand for the time needed to complete the project. The experiments show that
PCASAGA's optimization ability is greater than SGA's: the completion time found by the former
is never greater than that of the latter, which shows that PCASAGA can complete the task in a
shorter period of time.
Table 3. The comparison of the optimal time needed (days) under the same number of genetic generations

 n   s    method     e=0.1              e=0.3              e=0.5              e=0.7              e=0.9
 1   0.2  PCASAGA    24.6466034862413   26.0910670029848   27.5128935182451   28.9176323078687   30.3090832026231
          SGA        24.7194057952400   26.1109679545236   27.5128935182451   28.9285882615977   30.3093889128568
 3   0.6  PCASAGA    31.0809181212216   37.1152116514881   46.1194461945105   59.0271264010922   76.7745213589466
          SGA        31.0809181212216   37.1173633750822   46.1194461945105   59.0294575681828   76.7745213589466
 5   1    PCASAGA    36.2018982793111   46.5114911802478   64.1453026589571   95.8284392743847   152.6646799673810
          SGA        36.2019593040322   46.5171956243194   64.1473770869863   95.8284392743847   152.6646799673810

(2) Calculation results
With n, s and e taking different values (n takes 1, 3, 5; s takes 0.2, 0.6, 1;
and e takes 0.1 to 0.9), the average (homogeneous) and individual (heterogeneous) skill scores
are obtained through the inverse transformation of the working efficiency, and
we then calculate the allocation proportion of the two types of technical staff and the optimal
time, as shown in Figures 6-9. When the skill scores are at the average level, namely in the
homogeneous case, an increase of n, s or e causes a flat or falling trend in the staff ratio, that is,
we need to reduce the number of producers to increase the number of inspectors, and meanwhile
the time is prolonged. When individual skill scores are used, namely in the heterogeneous case,
changes of the three parameters also alter the structure ratio. But unlike the homogeneous case,
the proportion may increase, decrease, increase then decrease, or decrease then increase
as the parameters increase; no unified trend appears. This is due to the
irreplaceability of people in the heterogeneous case. The time, however, behaves as in the
homogeneous case and grows as the parameters increase.

Figure 6. Staff ratio and time change when fixed sampling rate in homogeneous case

Figure 7. Staff ratio and time change when fixed the number of return to modify in homogeneous case

Figure 8. Staff ratio and time change when fixed sampling rate in heterogeneous case

Figure 9. Staff ratio and time change when fixed the number of return to modify in heterogeneous case

(3) Analysis
The above results lead to the following findings:
1) Taking the same precision (0.0001) and number of generations (100) as
stopping conditions, the comparison of generation numbers and optimal completion
times under different parameter combinations shows that PCASAGA has better
convergence speed and optimization ability;
2) By setting up homogeneous and heterogeneous cases, we find that the former
presents a regular trend: the ratio between producers and quality inspectors is flat
or falling as any influence factor increases. This means we need to keep
the same structure, or allocate more quality inspectors, if we want to complete the task
within the optimal time. The latter case shows an irregular change, that is, we cannot
blindly adopt the strategy of keeping the number of producers unchanged or reducing it; the
decision should be combined with the actual production situation.
3) As n, s and e increase, the optimal time in both the homogeneous and heterogeneous
situations increases obviously. This is due to the low professional skills of the workers,
which increase the amount of work and the completion time.
Our study shows that PCASAGA can be applied to the model; based on our findings, National
Geographic Conditions Monitoring (NGCM) should be committed to improving staff
quality and arranging production tasks according to the actual situation.

4. Conclusion

An elitist strategy is adopted to ensure convergence in the cloud adaptive parallel simulated
annealing genetic algorithm (PCASAGA), and a parallel mechanism and migration
operator are introduced to guarantee the independence of each population and the coordination
between populations during evolution. The algorithm combines simulated annealing with an
adaptive mechanism to enhance local search ability, which makes PCASAGA superior to
SGA in convergence performance and optimization capability on the nonlinear model
established in Section 1. The instance analysis shows that, with the increase of n, s and e,
the structure ratio and the optimal time change when PCASAGA is applied to the
homogeneous and heterogeneous situations. For the structure ratio, changes of the
parameters cause a flat or falling tendency in the homogeneous case, that is, we need
to keep the number of quality inspectors unchanged or increase it to achieve the
optimal time objective. In the heterogeneous case, the ratios show non-uniform trends under
the effect of the three parameters because of the irreplaceability of people. For the optimal time,
both situations have the same trend: the time increases as the
parameters increase, because low skills add to the engineering quantity. To sum up,
PCASAGA is suitable for the model built around the characteristics of the CGC.
This paper can provide guidance for NGCM, which shares the same production process
characteristics with the CGC. In addition, NGCM should improve workers'
professionalism and arrange production with overall consideration, based on the
experience summarized from the CGC.

References
[1] L. C. Edie, Traffic Delays at Toll Booths, Journal Operations Research Society of America, 2(1954),
107-138.
[2] Y. J. Ma, W. X. Yun, Research Progress of Genetic Algorithm, Application Research of Computers 4
(2012), 1201-1210.
[3] X. Bian, L. Mi, Development on Genetic Algorithm Theory and Its Applications, Application Research
of Computers 7(2010),2425-2434.
[4] H. G. Chen, J. S. Wu, J. L. Wang, et al. Mechanism Study of Simulated Annealing Algorithm, Journal
of Tongji University(Natural Science) 6(2004),802-805.
[5] Q. H. Wu, Y. Zhang, Z. M. Ma, Review of Ant Colony Optimization, Microcomputer Information
3(2011), 1-5.
[6] F. Gao, Y.P. Shen, L.X. Li. Optimal Design of Piezo-electric Actuators for Plate Vibroacoustic
Control using Genetic Algorithms with Immune Diversity , Smart Materials and Structures
9(2000),485-491.
[7] S. Z. Chen, G. D. Liu, X. Pu, et al. Adaptive Genetic Algorithm Based on Superiority Inheritance,
Journal of Harbin Institute of Technology 7(2007), 1021-1024.
[8] M. Srinivas, L. M. Patnaik, Adaptive Probabilities of Crossover and Mutation in Genetic Algorithm,
IEEE Transaction on Systems, Man, and Cybernetics 4(1994), 656 - 667.

[9] X. P. Wang, L. M. Cao, Genetic Algorithm Theory, Application and Software Implementation, Xi 'an
Jiaotong University Press, Xi 'an,2002.
[10] J. Zheng, J. Zhu, Image Matching based on Adaptive Genetic Algorithm, Journal of Zhejiang
University (Engineering Science) 6(2003), 689 -692.
[11] Y. G. Peng, X. P. Luo, W. Wei, New Fuzzy Adaptive Simulated Annealing Genetic Algorithm,
Control and Decision 6(2009), 843-848.
[12] L. L. Dong, G.H. Gong, N. Li, et al. Adaptive Parallel Simulated Annealing Genetic Algorithms
based on Cloud Models, Journal of Beijing University of Aeronautics and Astronautics 9(2011),
1132-1136.
[13] D.Y. Li, Y. Du, Artificial Intelligence with Uncertainty, National Defense Industry Press ,
Beijing,2005.
[14] D. R. Li, S. L. Wang, D. Y. Li, Theory and Application of Spatial Data Mining (Second Edition),
Science Press, Beijing, 2013.
[15] T. C. Guo, C. D. Mu, The Parallel Drifts of Genetic Algorithms, Systems Engineering& Theory
Practice 2(2002), 15-23, 41.
[16] J. Q. Gao, G. X. He, A Review of Parallel Genetic Algorithms, Journal of Zhejiang University of
Technology 2(2007), 56-59, 72.
[17] O. Ngwenyama, A. Guergachi, T. Mclaren, Using the Learning Curve to Maximize IT Productivity: A
Decision Analysis Model for Timing Software Upgrades, International Journal of Product Economics
2(2007),524-535.
[18] J.G. Walter, K. Stefan, R. Peter R, et al. Multi-objective Decision Analysis for Competence-oriented
Project Portfolio Selection, European Journal of Operational Research 3(2010),670-679.
[19] J.G. Walter, Optimal Dynamic Portfolio Selection for Projects under A Competence Development
Model, OR Spectrum 33(2011), 173-206.
[20] Y. Zhang, Knowledge works scheduling based on stochastic ability promotion, Xian University of
electronic science and technology, 2012.
[21] D. Y. Li, H. J. Meng, X. M. Shi, Membership clouds and membership cloud generators, Journal of
Computer Research and Development 6(1995), 15-20.
[22] C. H. Dai, Y.F. Zhu, W. R. Chen, Adaptive genetic algorithm based on cloud theory, Control Theory
and Applications 4(2007), 646-650.
[23] H. Chen, B. Li, Approach to Uncertain Reasoning Based on Cloud Model, Journal of Chinese
Computer Systems 12(2011), 2449-2455.
[24] M. S. Wang, M. Zhu, Evaluating Intensive Land Use Situation of Development Zone based on Cloud
Models, Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE)
10(2012), 247-252.
390 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-390

Quality Prediction in Manufacturing


Process Using a PCA-BPNN Model
Hong ZHOU a, b, 1 and Kun-Ming YU a, c
a Ph.D. Program in Engineering Science, Chung-Hua University, Hsinchu, Taiwan
b Faculty of Computer and Software Engineering, Huaiyin Institute of Technology,
Huai'an, Jiangsu, P. R. China
c Department of Computer Science and Information Engineering, Chung-Hua
University, Hsinchu, Taiwan

Abstract. A PCA-BPNN model is proposed and simulated to resolve
the difficulties that product quality prediction faces in modern industry
owing to the high dimension of the production parameters generated in
complex and nonlinear production processes. The PCA algorithm is introduced into
the BPNN model to achieve dimension reduction without loss of vital information,
which simplifies the architecture of the neural network. The PCA-BPNN model
is illustrated and simulated. Experimental results show that this model is superior
to the BPNN: it achieves a stable prediction performance with
rapid convergence and overcomes the oscillation of the MSE that occurs in
the BPNN.

Keywords. quality prediction, high dimension, PCA, BPNN

Introduction

Superior product quality is the constant pursuit of industry, and product quality
prediction is valuable during manufacturing: it can detect defective products early,
ensure product quality and improve the product yield effectively. However, the
traditional quality prediction method based on Statistical Process Control (SPC) [1] finds it
difficult to establish a complete and accurate model for quality prediction in
modern industry. Therefore, it is of great significance to realize intelligent
production quality prediction using modern quality prediction methods based on
Artificial Intelligence (AI), which can overcome the limitations of the traditional
method.
Artificial Neural Networks (ANNs) [2], an information processing model
simulating the structure and function of the human brain, have strong adaptive learning
ability and nonlinear function approximation ability, which suits the complex
and nonlinear production processes of today. The proposed model, PCA-BPNN, which
optimizes the Back Propagation Neural Network (BPNN) using the Principal
Component Analysis (PCA) algorithm, can resolve the prediction difficulties caused by
high-dimensional production parameters.

1
Corresponding Author: Hong ZHOU, Ph.D. Program in Engineering Science, Chung-Hua University,
707, Sec.2, WuFu Rd., Hsinchu, Taiwan; E-mail: d10424004@chu.edu.tw.

1. Literature Reviews

1.1. Back Propagation Neural Network

The Back Propagation Neural Network (BPNN), a kind of multi-layer feed-forward
neural network, is based on the error Back Propagation (BP) algorithm and
has become one of the most widely used neural network models [3]. The learning procedure
of BPNN includes two processes. One is forward propagation, which transmits the input
signal from the input layer through the hidden layer to the output layer. The other is
backward error propagation, which transmits the errors in reverse and adjusts the values of the
connection weights and biases using the gradient descent algorithm.
The BPNN shows an outstanding ability for fault tolerance, self-learning and
nonlinear dynamic processing and is a good choice for fuzzy, non-strict or
incomplete problems. However, it has inherent defects: its convergence speed
is slow and it easily falls into a local minimum, which can be improved by
introducing other algorithms or by optimizing the network structure.

1.2. Principal Component Analysis

Principal Component Analysis (PCA) [4], proposed by Hotelling in 1933, is a
statistical method for analyzing and finding the main influencing factors in multivariate
data, which can reveal the nature of things and simplify complex problems. PCA
applies a spatial coordinate rotation to the original multi-variable matrix, after
which a set of new, uncorrelated and representative variables called principal
components is obtained, while preserving as much of the information
contained in the original samples as possible. Compared with the original
variables, the principal components are fewer in number but perform better.

2. Design of the PCA-BPNN Model

In modern industry, massive production data are produced continuously during the
production process, and these data are often imprecise, incomplete and redundant. Hence,
if the BPNN is used directly to analyze such large-scale, high-dimensional data, it is
easy to obtain a long network training time, to be trapped in a local optimum, or
even to generate oscillation. These defects can be mitigated by introducing the PCA
algorithm to optimize the BPNN model, as shown in Figure 1.

Figure 1. Architecture of PCA-BPNN.



The PCA is responsible for the dimension reduction of the production features, namely
the input variable X. The BPNN is in charge of the product quality prediction: it takes as
input the new production variable P obtained from PCA and outputs the predicted
quality of the product, y1. The flow chart of production quality prediction using the PCA-
BPNN model is shown in Figure 2.

Figure 2. Flow chart of PCA-BPNN.

2.1. Missing Value Imputation Using KNN

To ensure that the prediction results are meaningful, the data analysis should be
carried out on an ideal dataset, which means the data are true and complete.
Therefore, a simple and effective algorithm, K Nearest Neighbors (KNN) [5], is utilized
for missing data imputation on the Secom dataset of UCI (described in Section 3).
The main steps include the sample definition, the selection of the K nearest neighbors
using the minimum Euclidean distance, and the estimation of the missing value as the weighted
average of the K nearest neighbors.
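A rough sketch of KNN imputation of this kind, written as a generic illustration of the steps above rather than the authors' exact procedure:

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaN entries of a samples-by-features matrix with a weighted average of k nearest rows."""
    X = X.astype(float).copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        # candidate rows: observed both where row i is missing and where row i is observed
        cand = [j for j in range(X.shape[0]) if j != i
                and not np.isnan(X[j][miss]).any() and not np.isnan(X[j][obs]).any()]
        if not cand:
            continue
        dists = np.array([np.linalg.norm(X[i, obs] - X[j, obs]) for j in cand])
        order = np.argsort(dists)[:k]
        weights = 1.0 / (dists[order] + 1e-8)              # closer neighbours weigh more
        neigh = np.array([X[cand[j]][miss] for j in order])
        X[i, miss] = (weights[:, None] * neigh).sum(axis=0) / weights.sum()
    return X
```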

2.2. Dimension Reduction Using PCA

The PCA algorithm is employed to reduce the production variables; the specific steps
are as follows:
1. Define the original input variables in the matrix X = (x1, x2, …, xp)^T, where
p = 590 represents the dimension of the production features, and the production samples
(n = 1567) are defined as xi = (xi1, xi2, …, xip), i = 1, 2, …, n.
2. Normalize the original matrix X into the standard matrix Z using one of two methods.
One is the min-max method defined in Eq. (1), where zij ∈ [-1, +1], and the
other is the zero-mean method defined in Eq. (2), where the mean value is 0 and the
standard deviation is 1. In this way, the effects of differences in the magnitude and
dimension of the inputs on the analysis result are eliminated to ensure the comparability
of the inputs.

    z_ij = (-1) + (x_ij - min_j) / (max_j - min_j) × (1 - (-1))             (1)

    z_ij = (x_ij - x̄_j) / σ_j                                              (2)

Note: z_ij (i = 1, 2, …, n; j = 1, 2, …, p) are the standardized variables; max_j and min_j are the
maximum and minimum values of each production feature; x̄_j is the arithmetic mean of
each production variable; σ_j is the standard deviation of each production feature.
3. Construct the covariance matrix R based on the matrix Z, which is composed of
r_ij (i, j = 1, 2, …, p), the correlation coefficient between z_i and z_j, defined in Eq. (3):

    r_ij = Σ_{k=1}^{n} (z_ki - z̄_i)(z_kj - z̄_j) / sqrt( Σ_{k=1}^{n} (z_ki - z̄_i)^2 · Σ_{k=1}^{n} (z_kj - z̄_j)^2 )     (3)

4. Calculate the eigenvalues and eigenvectors from |R - λI| = 0, in
which I is the identity matrix, and sort the eigenvalues λ_i (i = 1, 2, …, p) in descending
order. The orthogonal eigenvector μ_i is then obtained for each eigenvalue λ_i.
5. Select the principal components such that the cumulative contribution rate is >= 99%,
using Eq. (4), to form a new matrix W.

    φ_i = Σ_{k=1}^{i} λ_k / Σ_{k=1}^{p} λ_k ,   i = 1, 2, …, p              (4)

6. Transform the principal component matrix W using the equation y_ij = μ_ij w_ij.
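A compact numpy sketch of steps 2-6 under the zero-mean normalization; the 99% threshold follows the text, while the function name and remaining details are illustrative assumptions:

```python
import numpy as np

def pca_reduce(X, threshold=0.99):
    """Standardize X (n samples x p features) and keep components covering >= threshold variance."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # zero-mean normalization, Eq. (2)
    R = np.cov(Z, rowvar=False)                          # for standardized data this is the correlation matrix, Eq. (3)
    vals, vecs = np.linalg.eigh(R)                       # eigenvalues and eigenvectors of R
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    cum = np.cumsum(vals) / vals.sum()                   # cumulative contribution rate, Eq. (4)
    m = int(np.searchsorted(cum, threshold)) + 1         # number of principal components kept
    return Z @ vecs[:, :m]                               # projected samples (n x m)
```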

3. Experiment and Simulation

The dataset adopted, called Secom, is provided by the University of California, Irvine [6]; it
was acquired from a real-world semiconductor manufacturing process. It
includes 1567 production instances, 590 associated measured features and a pass/fail
(-1/1) yield. The simulation runs on Matlab R2015b with an Intel(R) Core(TM)
i5-3317U 1.70 GHz CPU and 4 GB of RAM.

3.1. Establishment and Training of BPNN

In the established BPNN [7, 8], the number of neurons in the input layer is defined as N,
whose value depends on the number of principal components chosen by PCA: if the
min-max normalization method is applied in PCA then N = 7; otherwise, if the zero-
mean normalization method is applied then N = 31. The number of neurons in
the output layer is defined as O = 1. Furthermore, the number of neurons in the hidden
layer is defined as H according to the empirical formula H = sqrt(N + O) + a, where a = 10 is the
optimal value found after trials. Thus the value of H is 13 or 16 depending on the
number of inputs. The transfer function used is the
hyperbolic tangent non-linear function "tansig"; the training function used is the
gradient descent with momentum backpropagation function "traingdm"; and the
learning function used is the gradient descent with momentum weight and bias
function "learngdm".
The training parameters are set as follows: the maximum number of training epochs
is 10000; the performance goal of training is 0.00001; the learning rate is 0.01; the
maximum number of validation failures is 20; and the minimum performance gradient is 1e-5.
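To make the sizing rule concrete, here is a rough Python analogue of this configuration using scikit-learn; the paper's experiments are in MATLAB, so this mapping of the listed settings is only an assumption:

```python
import math
from sklearn.neural_network import MLPRegressor

N, O, a = 7, 1, 10                              # min-max PCA case from the text
H = round(math.sqrt(N + O)) + a                 # empirical rule H = sqrt(N + O) + a  ->  13

model = MLPRegressor(
    hidden_layer_sizes=(H,),
    activation='tanh',            # roughly analogous to MATLAB's "tansig"
    solver='sgd',                 # gradient descent with momentum, cf. "traingdm"/"learngdm"
    momentum=0.9,
    learning_rate_init=0.01,
    max_iter=10000,
    tol=1e-5,
    n_iter_no_change=20,
)
# model.fit(P_train, y_train) would then train the network on the PCA-reduced inputs.
```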

3.2. Simulation Using PCA-BPNN

Five-fold cross-validation is utilized to partition the samples obtained using
PCA into 5 disjoint subsamples randomly and evenly. Of these 5 subsamples, a single
subsample is retained as the validation data to evaluate the prediction accuracy, and the
remaining 4 subsets are used together as the training set to establish the model. This
cross-validation process is repeated 5 times, and each subset is used only once as the
validation set.
In this section, the Mean Squared Error (MSE) and the correct ratio are used to evaluate
the prediction performance of the PCA-BPNN model. The correct ratio (CR) indicates
the percentage of samples predicted correctly out of the total sample. It should be
noted in particular that the predicted quality output is marked as qualified when it
is smaller than 0, and as unqualified otherwise.

3.2.1. Simulation using different inputs based on same model


Two different datasets, generated by KNN with K=3 and K=6 respectively, are input to
the PCA-BPNN model, which uses the min-max normalization method and has 13 neurons
in the hidden layer. The MSE and CR of the quality prediction are recorded in Table 1.
Table 1. MSE and CR of quality prediction based on different datasets using PCA-BPNN

Cross-validation MSE(K=3) MSE(K=6) CR(K=3) CR(K=6)


1st Round 0.26158 0.25235 93.29% 93.29%
2nd Round 0.25071 0.23819 93.61% 93.61%
3rd Round 0.26034 0.25578 93.29% 93.29%
4th Round 0.25835 0.25701 93.31% 93.31%
5th Round 0.27298 0.25094 92.04% 93.31%
Comparison of the simulation results in Table 1 shows that the
prediction results are basically the same when KNN is used with different values of K on
the same model.

3.2.2. Simulation using the same inputs based on different models


Three different prediction models, compared in Table 2, are simulated in this section based on the
same dataset generated by KNN (K=3). The MSE and CR of the quality
prediction are recorded in Tables 3 and 4.
By comparing the simulation results shown in Table 3 and 4, it can be seen that
the model performance is obviously between the BPNN and other models. The
prediction accuracy of BPNN varies violently and frequently and the MSE in the best
case (0.25711) is only 9.1% of that in the worst case (2.7993). However, the
prediction accuracies of other models remain stable and there is no significant
difference between themselves. Furthermore, although the smallest MSE of these

three models is roughly equal, the biggest MSE of BPNN (2.7993) is more than 10 times
that of the other two models (0.27298). Therefore, not only the stability but also
the accuracy of the quality prediction is improved when PCA is introduced into the
BPNN model.
Table 2. Differences between three quality prediction models

Model        Normalization used in PCA    Neurons in the input layer    Neurons in the hidden layer
BPNN         -----------                  590                           34
PCA-BPNN     min-max                      7                             13
PCA-BPNN’    zero-mean                    31                            16

Table 3. MSE of quality prediction based on the same dataset

Cross-validation MSE of BPNN MSE of PCA-BPNN MSE of PCA-BPNN’


1st Round 0.32874 0.26158 0.25871
2nd Round 1.6654 0.25071 0.26655
3rd Round 2.7993 0.26034 0.25318
4th Round 0.25711 0.25835 0.27255
5th Round 1.5644 0.27298 0.259

Table 4. CR of quality prediction based on different models

Cross-validation CR of BPNN CR of PCA-BPNN CR of PCA-BPNN’


1st Round 93.61% 93.29% 93.31%
2nd Round 6.69% 93.61% 92.97%
3rd Round 6.71% 93.29% 93.61%
4th Round 93.29% 93.31% 92.97%
5th Round 10.83% 92.04% 93.31%

4. Conclusion

The PCA-BPNN model is designed and simulated to predict product quality for
modern industry with high-dimensional production parameters. From the
simulation results described, it can be seen that the PCA-BPNN model can reduce the
input dimension effectively using PCA without significant information loss when facing
massive and high-dimensional data. In this way, the scale of the neural
network can be reduced, the convergence can be accelerated, and the prediction
accuracy can be improved. Compared with the traditional BPNN, the performance of
PCA-BPNN is quite stable and the problem of performance fluctuation can be solved
effectively.
Although the PCA-BPNN shows a prominent superiority over BPNN, it also has
some defects to be improved; for example, it is worth researching how to
optimize the model performance and avoid being trapped in a local optimum.

Acknowledgement

This research was supported in part by the Ministry of Science and Technology of
R.O.C. under contract MOST 105-2221-E-216 -015 -MY2.

References

[1] W. H. Woodall and C. M. Douglas, Research issues and ideas in statistical process control, Journal of
Quality Technology, 31 (1999), 376.
[2] I. A. Basheer and M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and
application, Journal of Microbiological Methods, 43 (2000), 3-31.
[3] A. T. C. Goh, Back-propagation neural networks for modeling complex systems, Artificial Intelligence in
Engineering, 9 (1995), 143-151.
[4] I. T. Jolliffe, Principal Component Analysis, New York: Springer-Verlag, 1986.
[5] G. E. Batista and M. C. Monard, A study of k-nearest neighbour as an imputation method, HIS, 87 (2002),
251–260.
[6] M. C. Michael and A. Johnston, Secom Data Sets of UCI Machine Learning Repository,
https://archive.ics.uci.edu/ml/datasets.html, 2008.
[7] P. G. Benardos and G. C. Vosniakos, Optimizing feedforward artificial neural network architecture.
Engineering Application of Artificial Intelligence, 20 (2007): 365-382.
[8] C. Macbeth and H. Dai, Effects of learning parameters on learning procedure and performance of a
BPNN. Neural Networks the Official Journal of the International Neural Network Society, 10 (1997),
1505-1521.
Fuzzy Systems and Data Mining II 397
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-397

The Study of an Improved


Intelligent Student Advising System
Xiaosong LI1
Practice Pathway of Computer Science, Unitec Institute of Technology, Auckland, New
Zealand

Abstract. This paper describes an improved intelligent student advising system.


Compared to the initial system, the improvements include: the initial system
was integrated with WEKA, both K-means algorithm and Cobweb algorithm were
implemented for training and testing data sets, and course and pathway
recommendations were also implemented. The recommendations given by the
improved system were based on the K-means algorithm only, and the results were
meaningful. However, the quality of the recommendations could be improved. For
that purpose, the Cobweb algorithm was also experimented with. The results showed that it
is hard to identify proper parameters for Cobweb algorithm to produce meaningful
clusters. Furthermore, the results of the experiments suggested that Cobweb
algorithm is less efficient than K-means algorithm for this system. The results of
the experiments also suggested that Cobweb algorithm is more reliable than K-
means algorithm for this system. Future research should focus on improving the
efficiency of Cobweb algorithm.

Keywords. K-means, cobweb, WEKA, clustering, collaborative filtering,


intelligent academic advising system, course

Introduction

Online academic advising systems can provide prompt, effective and efficient advice,
enhance the student experience and save institutional resources.
In [1], a prototype of a web based intelligent student advising system using
collaborative filtering [2, 3] had been developed for concept approval. In this system,
students are sorted into groups. If a student was determined to be similar to a group of
students, a course preferred by that group might be recommended to the student [1].
Real student data with complete records of all 743 students enrolled in over 50 courses
in the Bachelor of Computing Systems (BCS) were anonymized and used for training
and testing the prototype of the system. This made it easy to integrate the system with
our current student management system. Therefore, our students don’t need to create a
profile to use this system. Students intending to complete the BCS program of study
could use the system to help them decide the pathway and papers to enroll in, as it
gives advice that considers their current application, academic transcripts and cultural
background. The solution is cost-effective, scalable and easily accessible by students,

1
Corresponding Author: Xiao-Song Li, Practice Pathway of Computer Science, Unitec Institute of
Technology, Auckland, New Zealand; E-mail: xli@unitec.ac.nz.

lecturers and data analysts who can benefit from the long-term investment in smart
tools.
The data were divided into two sets: training data (372 records) and testing data
(371 records) [1]. The following attributes were defined based on the existing student
records [1]:
• GPA: Relevant to performance.
• Age: Relevant to family stress, e.g. mature students are more likely to have
family commitment.
• Ethnicity: Relevant to English competency and family background.
• Gender: Relevant to learning style and how they can cope with the provided
learning facilities.
Instead of other clustering techniques, K-means algorithm was chosen to determine
the similarity of the students [4, 5] for the prototype, due to its ease of use and fitness
for the purpose. K=7 was identified as the most informative and effective value for the
K-means algorithm used in this system [1]. The K-means algorithm was implemented
by using the C# procedure provided by [6].
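As a purely illustrative stand-in for that C# procedure (the file name and column names below are hypothetical), an equivalent clustering of the four attributes listed above could be sketched with scikit-learn; categorical attributes are one-hot encoded before K-means is applied:

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    students = pd.read_csv("students.csv")        # columns: GPA, Age, Ethnicity, Gender
    features = pd.get_dummies(students[["GPA", "Age", "Ethnicity", "Gender"]],
                              columns=["Ethnicity", "Gender"])
    X = StandardScaler().fit_transform(features)

    kmeans = KMeans(n_clusters=7, n_init=10, random_state=42).fit(X)
    students["cluster"] = kmeans.labels_          # K=7, as used in the prototype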
For verification, the training data were used to predict the preferences of the
testing data when K=7. The results were very close for all the clusters except
cluster 3, where there were around 10% course preference differences between the two
datasets across the three pathways, Software Development (SD), Networking and
Security (NS) and Business Intelligence (BI), and the rest of the courses (Other).
To investigate further how to improve the initial system and address the above issue,
other cluster methods were considered, such as Cobweb. In [7], Cobweb algorithm was
combined with K-means algorithm, where Cobweb was used to produce a balanced tree
with sub-clusters at the leaves and then K-means applied to the resulting sub-clusters.
An outline of the procedure to provide guidelines for students’ course selection
and pathway selection was given in [1]; however, it was not completely implemented
in the initial system.
An improved system was designed and implemented. The main objectives of the
improved system are to: improve the user interface of the initial system for robustness
and usability; provide recommendations of popular pathways for different groups of
students; provide pathway and course selection for a student logged into the system;
and improve the recommendation quality of the initial system. This paper describes the
results of the first phase of the improved system.
In the rest of this paper, the improved system is described, the experiment results
of K-means algorithm and Cobweb algorithm are compared, and a summary and future
work is given at last.

1. The Improved System

The improved system includes three major improvements: a) the initial system was
integrated with WEKA (Waikato Environment for Knowledge Analysis); b) both K-
means algorithm and Cobweb algorithm were implemented for training and testing data
sets; and c) course and pathway recommendations were implemented.
WEKA is a widely used open-source machine learning and data mining software
developed using Java programming language. To assure the correctness of the test
results and to experiment with more machine learning algorithms efficiently, WEKA

was integrated with the initial intelligent student advising system which was
implemented in ASP.NET. IKVM software was used to integrate WEKA with the
initial system. K-means algorithm was re-implemented by using WEKA (training and
testing).
The recommendation procedure [1] was implemented for the K-means algorithm with
K=7. Given a student record xi, the study pathway the student could take or the courses
the student could take next semester are recommended based on the following
procedure (a hedged sketch of these steps is given after the list):
1) Generate clusters C = {c1, …, ck} by using the K-means algorithm on the
whole data set, where k=7.
2) Identify which cluster xi belongs to, say cm, where 1 <= m <= 7.
3) Find the top 12 most popular courses in cm, eliminate those xi has taken, and
recommend the rest to xi.
4) Calculate the average marks of all the courses taken by the students in
cluster cm, select the top 12 courses, eliminate those xi has taken, and recommend
the remaining courses with the highest average marks.
5) Recommend the most popular pathway (major) in cluster cm to xi. This is
particularly useful to a new student or a student who wishes to change
pathway.
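The following Python sketch illustrates steps 2) to 5) under an assumed tabular layout (student_id, course, mark, pathway) and a cluster label per student; it is a hedged illustration, not the system's actual ASP.NET/WEKA implementation:

    import pandas as pd

    def recommend(student_id, enrolments, cluster_of, top_n=12):
        # cluster_of: dict mapping student_id -> K-means cluster label (k=7)
        cm = cluster_of[student_id]
        peers = enrolments[enrolments["student_id"].map(cluster_of) == cm]
        taken = set(enrolments.loc[enrolments["student_id"] == student_id, "course"])

        popular = peers["course"].value_counts().head(top_n).index          # step 3
        by_mark = (peers.groupby("course")["mark"].mean()                   # step 4
                        .sort_values(ascending=False).head(top_n).index)
        pathway = peers["pathway"].mode().iat[0]                            # step 5

        return ([c for c in popular if c not in taken],
                [c for c in by_mark if c not in taken],
                pathway)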
Figure 1 shows the recommendation after a Software Development pathway
student logged in. As the student’s record was taken when he was in the third year, he
had taken most of the software courses at lower levels; therefore the majority of the
recommended courses were third-year software courses, such as Java Enterprise
Programming, Mobile Software Development and Data Warehousing. The
recommended study pathway was Software Development, which is correct. The
recommended courses based on the highest average marks included some popular
network courses as well. Further investigation is required.

Figure 1. The recommendation for a logged in student



2. The Test Results Comparison

2.1. The K-means Results

Figure 2 shows the K-means results from the initial system when K=7. Figure 3 shows
the K-means results from the improved system based on WEKA when K=7. To verify
that the results from both systems are correct, the K-means results from both
systems were compared for K = 1, …, 7, where the same datasets and attributes were
used. The whole set was divided 50:50 to create the training set and the testing set,
see Table 1.
Table 1. The testing datasets

Datasets No. of instances


The whole set 743
The training set 372
The testing set 371

Figure 2. The K-means results from the initial system.



Figure 3. The K-means results based on WEKA.


Both systems also use the same attributes: Student ID, Gender, Birthdate, GPA, and
Ethnicity. In order to find the most closely matching clusters between the two systems, certain
criteria were followed in a specific order. The clusters were matched according to the
dataset’s characteristics: the whole distribution of Ethnicity, Average Age, Average
GPA, and Gender; the whole shape of each cluster’s histogram; the distribution of
Ethnicity; the distribution of Gender; Average Age; and Average GPA.
For the clusters that were matched according to majors, the benchmark was the
percentage of ‘Networking & Security’. The cluster with the highest percentage of
Networking & Security in the improved system is best matched with the cluster that
had the highest percentage of Networking & Security in the initial system. The cluster with
the lowest percentage of Networking & Security in the improved system is best
matched with the cluster that had the lowest percentage of Networking & Security in the initial
system.
The results showed that the two systems do not produce exactly the same results,
but they do produce results with similar trends and the same characteristics.
The clusters produced when K = 4 and 7, especially when the whole data
set was used, showed that the K-means implementations of both systems used a similar
procedure to sort the instances. The histograms were not perfectly matching, but
they showed that the instances were clustered in a similar trend and could help to easily
find the most closely matching clusters between the two systems. Even though some of the
clusters produced by the two systems share the same characteristics, which can easily
be seen by looking at the histograms, they did not have perfectly matching distributions
of instances.

According to [8], unsupervised learning does not require training or human
labeling of training data. The experiments showed that the results from the whole
set are the most accurate and the easiest to match amongst the other results. Unsupervised
learning is designed to find clusters and place the instances in the discovered clusters.
Therefore, it is strongly recommended to use the whole set only and select either K=4 or
K=7 to find out what papers should be recommended to a specific student, since in our
experiments K=4 and 7 showed the most reliable and steady consistency between the
initial system and the improved system.

2.2. The Cobweb Results from the Improved System

The Cobweb method was developed for clustering objects in an object-attribute dataset.
It yields a classification tree that characterizes each cluster with a
probabilistic description. It builds clusters by incrementally adding instances to a tree,
merging an instance with an existing cluster if this leads to a higher Category Utility (CU)
value than giving the instance its own cluster; if the need arises, an
existing cluster may also be split up into two new clusters when this is beneficial to the
CU value [9]. Unlike K-means, Cobweb finds a number of clusters that depends on the
parameter settings chosen by the user.
The improved system adopted the Cobweb functionality of WEKA. The
experiments were conducted to discover most suitable parameter setting for the
improved system, which helped the Cobweb to discover 4 clusters from the three data
sets. Identical datasets for the K-means algorithm were used for these experiments. The
experiments required configuring the following three parameters:
• Acuity – The minimum standard deviation of a cluster attribute. It only
matters for numerical attributes. The default acuity is ‘1.0’.
• Cutoff – The minimum category utility of a cluster attribute. The default
cutoff is ‘0.0028209479177387815’.
• Seed – The random number seed to be used. The default seed is ‘42’.
According to [9], the CU is the main value that determines which clusters instances
belong to. If the dataset does not have numerical attribute(s), one
of the parameters, the acuity, does not need to be adjusted to find the aimed set of
clusters. The datasets that were used for the experiments have both numerical and
nominal attributes; therefore, both Cutoff and Acuity were configured.
The number of clusters increased, or stopped at the maximum number of clusters
that could be found, when the Cutoff was increased, and the number of
clusters decreased when the Cutoff was decreased. Unlike the Cutoff, when the Acuity
and Seed were configured, the results did not show a steady trend. When the Acuity
was increased or decreased, the number of clusters discovered was sometimes
smaller and sometimes bigger than with the previous setting.
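The tuning procedure described here can be pictured as a brute-force search over the three parameters; the sketch below is hypothetical (run_cobweb stands in for whatever wrapper is used to invoke WEKA's Cobweb, and the candidate values are arbitrary assumptions):

    import itertools

    def tune_cobweb(run_cobweb, dataset, target_k=4,
                    acuities=(0.5, 1.0, 1.5, 2.0),
                    cutoffs=(0.001, 0.0028209479177387815, 0.01, 0.05),
                    seeds=(1, 42, 100)):
        # run_cobweb(dataset, acuity, cutoff, seed) is a caller-supplied wrapper
        # returning the clusters Cobweb found; keep settings that yield target_k.
        hits = []
        for acuity, cutoff, seed in itertools.product(acuities, cutoffs, seeds):
            clusters = run_cobweb(dataset, acuity=acuity, cutoff=cutoff, seed=seed)
            if len(clusters) == target_k:
                hits.append((acuity, cutoff, seed))
        return hits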
Figure 4 shows the Cobweb results based on WEKA. Figure 5 shows the Cobweb
tree structure and the resulting clusters for the training dataset; Figure 6 shows the
Cobweb tree structure and the resulting clusters for the testing dataset; Figure 7 shows
the Cobweb tree structure and the resulting clusters for the whole dataset.

Figure 4. The Cobweb results based on WEKA.

Figure 5. The result from the training dataset.


The finalized parameter settings were found through numerous experiments. The
experiments revealed that an individual parameter setting needs to be found for each data set.
When the parameter setting for the whole set was used on the training set or the testing
set, the results did not come out as 4 clusters. And since the given data sets have mixed
types of values, both Acuity and Cutoff need to be configured.

Figure 6. The result from the testing dataset.

Figure 7. The result from the whole dataset.


It was not an easy task to produce the required number of clusters via Cobweb.
The experiments showed that finding the exact values for the parameters took time, and
it was a tedious task, as the researcher had to experiment with different values without
any clue in order to figure out how to produce a desired set of clusters. The parameters
depend entirely on the dataset’s instances. If new students are added to the dataset,
Cobweb will require a brand new parameter setting to produce a set of clusters.
The system could be further improved by developing a separate panel for
parameter tuning, just as WEKA allows the user to configure the parameter settings in a
separate pop-up window. When the system goes live, student records will

be added dynamically, and the Cobweb will seek for updated values for its parameter to
produce a set of clusters. This makes Cobweb less efficient and therefore not likely to
be recommended in preference to the K-means.

2.3. Comparison of K-means and Cobweb

The two algorithms were compared on the WEKA-based improved system when 4
clusters were produced by both algorithms. The same method used in [1] was
used to verify K-means and Cobweb on the WEKA-based improved system: the
training data were used to predict the preferences of the testing data for the four clusters,
and then they were compared.
Comparing the differences between the testing data and the training data for the K-
means algorithm in Table 2, it was found that the biggest difference was in cluster 4 of the
BI pathway, i.e. 6.68%; on the other hand, the maximum difference was around 5% for
all the clusters and all the pathways for the Cobweb algorithm. This suggested that the
recommendations based on the Cobweb algorithm could be more reliable.
Table 2 The experiment results on different datasets

Pathway (%) SD NS BI Other


Cobweb 31.65 43.2 11.43 13.72
Whole 29.6 46.46 10.07 13.87
36.7 38.06 11.28 13.97
34.74 38 12.42 14.84
Cobweb 34.71 41.47 12.06 11.76
Training 40.34 33.97 13.18 12.51
34.62 39.41 13.19 12.77
30.22 43.96 13.6 12.23
Cobweb 36.13 37.89 13.28 12.7
Testing 35.62 38.26 13.8 12.33
34.99 39.81 12.64 12.56
34 40.86 13.14 12
K-means 29.7 46.03 10.71 13.56
Whole 40.25 34.63 10.57 14.56
32.62 41.81 12.02 13.55
34.74 38 12.42 14.84
K-means 32.72 41.69 12.78 12.79
Training 39.21 34.56 14.06 12.18
32.51 43.21 13.58 10.7
40.21 37.11 8.26 14.43
K-means 31.39 44.25 12.31 11.95
Testing 36.01 38.33 13.09 12.57
37.02 38.3 11.49 13.19
38.96 33.41 14.88 12.75

3. Summary and future work

This paper has described an improved intelligent student advising system. Compared to
the initial system, the improvements include: integrating the initial system with
WEKA; implementing both K-means algorithm and Cobweb algorithm for training and
testing data sets; and implementing course and pathway recommendations.
The recommendations given by the improved system were based on the K-means
algorithm; the results were meaningful. However, the quality of the recommendations
could be improved. For that purpose, the Cobweb algorithm was also experimented with. The
results showed that it is hard to identify proper parameters for the Cobweb algorithm to
produce meaningful clusters. The results of the experiments suggested that the Cobweb
algorithm is less efficient than the K-means algorithm for this system. They also suggested that
the Cobweb algorithm is more reliable than the K-means algorithm for this system. Future
research should focus on improving the efficiency of Cobweb algorithm.
To improve the quality of the recommendations, two new attributes, learning styles
and personal interests, were considered. Thirty-five learning styles were introduced,
including self-motivation, curiosity and adaptability. Personal interests included
cooking, sports, and reading. The improved system provided options for students to
select their learning styles and their personal interests, after they login, and saved those
into the database along with other data in each student’s record. This function can be
used in the future for data collection to include these two attributes in the data model of
the recommendation system.

Acknowledgment

The author would like to thank the “Intelligent Student Advising System” BCS project
team (Xingyu Liu, Jehee Hwang, Obert Ye and Xianbo Lu) for their implementation
and experiments.

References

[1] K. Ganeshan and X. Li, An Intelligent Student Advising System using collaborative filtering, Proceedings
of Frontiers in Education, 2015, 2194–2201.
[2] T. Jones. Recommender systems, Part 1: Introduction to approaches and algorithms. IBM Developer
Works: 12 December 2013.
[3] Collaborative Filtering. Web Whompers. Retrieved 10th April 2014 from:
http://webwhompers.com/collaborative-filtering.html.
[4] K means Clustering. OhMyPHD. Retrieved 23rd March 2014 from: http://www.onmyphd.com/?p=k-
means.clustering#h3_badexample.
[5] A. K. Jain, Data clustering: 50 years beyond K-means. Pattern recognition letters, 31(8), (2010), 651-
666.
[6] J. McCaffrey, K Means Clustering Using C#. Visual Studio Magazine. (2013) Retrieved 23rd March
2014 from: http://visualstudiomagazine.com/articles/2013/12/01/k-means-data-clustering-using-c.aspx
[7] M. Li, G. Holmes and B. Pfahringer, Clustering large datasets using cobweb and K-means in tandem,
Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, G.I. Webb & Xinghuo Yu
(Eds.), Cairns, Australia, December 4-6, 2004, pp. 368-379. Berlin: Springer.
[8] J. Hu (2012). Clustering – An unsupervised learning method. Retrieved from
http://www.aboutdm.com/2012/12/clustering-unsupervised-learning-method.html
[9] P. Spronck (2005). Lab 7: Clustering. Tilburg centre for Cognition and Communication, Tilburg
University, The Netherlands.
Fuzzy Systems and Data Mining II 407
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-407

An Enhanced Identity Authentication


Security Access Control Model Based on
802.1x Protocol
Han-Ying CHEN a,1 and Xiao-Li LIU b
a
College of Information Science and Technology, Jinan University,
510632, Guangzhou, China
b
College of Information Science and Technology, Jinan University,
510632, Guangzhou, China

Abstract. The security of access authentication systems has gained more and
more attention, yet current access control systems generally do not consider
whether the user's terminal equipment meets the requirements of the security policy. This
paper presents a security access control model based on the 802.1x protocol, extended
by adding authentication information, to strengthen the policy control over
authenticated users and achieve security control of accessing users. The model is
implemented using the plaintext information of the Kaspersky Anti-virus software as the
extension information. The server realizes the authentication by a simple string comparison.
The test results show that our model can check whether the host meets the policy
before it accesses the network, restrict unmatched hosts, and enhance
network security control effectively.

Keywords. radius EAPOL 802.1x, access control, authentication client, radius


agency

Introduction

With the popularity of networks and enterprise informatization, most companies
now choose Ethernet as it is simple, flexible and cheap. However, the inherent security
defects of Ethernet make it necessary to achieve user-level access control. 802.1x is a
standard defined by IEEE to deal with Port-Based Network Access Control [1]. The
IEEE 802.1x protocol inherits the advantages of the IEEE 802 LAN and provides a
measure to authenticate and authorize a user who has connected to the local area
network [2].
The currently popular security access control model is based on the 802.1x protocol,
which authenticates the user name and password and allows only legitimate
users to access the network [3]. Authentication and key agreement schemes are widely
adopted in many applications [4]. For example, the Ruijie authentication system combines
account, password, IP address and MAC address for authentication,
effectively preventing illegal users from accessing the internal network. In the practical
application of such systems, we have found that some users have the right to enter the

1
Corresponding Author: Han-Ying CHEN, College of Information Science and Technology, Jinan
University, 510632, Guangzhou, China; E-mail: jackchenhy@hotmail.com.

network, but their computers may not be suitable to access the network. As we know,
computers and other mobile devices are easily infected with external viruses,
trojans and other malicious code. When they re-access the network, they may inadvertently
introduce malicious code into the internal network environment, affecting network security
[5-6].
This model only solves the problem of user identity, without taking into account
the safety of the terminal equipment. We look forward to a security access control system
that can check whether the host meets the security policy before it accesses the network,
and restrict a suspicious host from accessing the network until it has taken appropriate safety
measures. This will not only help the host avoid becoming the target of malicious code
attacks, but also help it avoid becoming the source of malicious code. In this
regard, this paper presents a security access control model, which authenticates the
client's security situation (such as the installation and operation information of anti-virus
software) as extended information, to strengthen the security policy control of terminal
equipment.

1. 802.1x Protocol and Transmission Analysis

802.1x is a port-based security access control technology, authenticating and
controlling LAN access equipment at the physical access level. If the user can be
authenticated, the port opens and the user can access LAN resources; if not, the port
remains in a disconnected state and the user cannot access LAN resources. The overall
802.1x architecture consists of the client, the device and the authentication server [7].
The device packages the Packet Body field of the EAPOL frame into the EAP-Message
attribute of the RADIUS Access-Request packet [8-9]. It has been verified that, in the EAP-
Response/Identity frame, we can append several bytes of information behind the user
name, separating the two with at least one byte of value 0. Then, in the Radius packet
sent by the device, the user name attribute is correct, and the EAP-
Message attribute retains the complete Packet Body [10]. The proposed model uses this method
to transmit extended information. There are two points to note when building the
EAP-Response/Identity frame: the 0 must be added after the user name, to help the
device determine the correct user name and facilitate the server's analysis of the extended
information; and the calculation of the Packet Body Length must take the extended
information into account [11].
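A minimal Python sketch of how such a frame body could be assembled is shown below; the packet layout follows the standard EAP/EAPOL fields, while the user name and extension string are hypothetical examples:

    import struct

    def build_eap_response_identity(identifier, username, extension):
        # Type-Data = user name + one 0x00 separator byte + extension information;
        # the EAP length (Packet Body Length) must also cover the extension bytes.
        type_data = username + b"\x00" + extension
        eap_len = 4 + 1 + len(type_data)                 # Code, Id, Length, Type, data
        eap = struct.pack("!BBHB", 2, identifier, eap_len, 1) + type_data  # Response/Identity
        return struct.pack("!BBH", 1, 0, eap_len) + eap  # EAPOL v1, type 0 (EAP-Packet)

    frame = build_eap_response_identity(1, b"alice", b"KAV;running;updated")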

2. 802.1x Protocol-based Security Access Control Model

This paper proposes the following model: the client adds the various kinds of information
collected from the host as extension information to the user name frame, and then the device
packages it and forwards it to the server. First the extended information is verified; if it
meets the requirements, the user name and password are then authenticated. Otherwise,
the authentication request is rejected [12]. The authentication process is shown in Figure 1.
The authentication process of the model:
• When the user needs to access the network, he opens the 802.1x client
program, enters the registered user name and password, and initiates the
connection request. The client program sends an EAPOL-Start frame requesting
authentication to the device, triggering authentication.

Figure 1. Authentication process.


• The device receives the authentication request frame, and then sends an EAP-
Request/Identity frame requiring the client to send the user name and the
extension information.
• The client responds to the device's request, and sends the user name and the
extension information to the device through the EAP-Response/Identity
frame. The device packages it and sends it to the Radius agent.
• After the Radius agent receives the Radius Access-Request packet from the
device, it extracts the extension information. If the information meets the preset
policy, the packet is forwarded to the server; otherwise the agent rejects the
request with an Access-Reject packet to terminate this authentication.
• After the authentication server receives the Access-Request packet, it finds the
password corresponding to the user name in the database, and encrypts it with an
MD5 digest. This encrypted word is sent through a Radius Access-Challenge
packet to the Radius agent, to be forwarded to the device. After analysis, the device
generates an EAP-Request/MD5 Challenge frame and sends it to the client.
• The client program receives the encrypted word, encrypts the password
with an MD5 digest, packages it into an EAP-Response/MD5 Challenge frame, and sends
it to the authentication server via the device and the Radius proxy.
• The authentication server compares the received encrypted password
information with the password information generated by its own encryption
algorithm. If they are the same, the server determines that the user is
legitimate and responds with an Access-Accept packet to open the
corresponding device port, allowing the user to access the network through
that port. Otherwise, the server responds with an Access-Reject packet and the port
remains closed.
Compared with the traditional model, the characteristic of this model is that it supports
the transmission of extension information from the client to the server. A proxy
program is added in front of the standard server to authenticate the extension information.
This extension information can be the operating system version, system vulnerabilities,
patch information, the installation and operation of anti-virus software or a firewall, etc. On
the server side we can define policies for such information and refuse network access to hosts
that do not meet the policies. The authentication process of this model is
compatible with the traditional model, so upgrading only requires replacing the client
program and adding the Radius agent; the server program and the original database can be
retained.

3. Model Realization

3.1. Authentication Client

The concrete realization of the client program refers to the open-source ThorClient2
program, with extension information gathering functions added to it. The overall
structure of the improved authentication client program is shown in Figure 2.

Figure 2. The overall structure of the client program.

The main security goals of such a client program are authentication and privacy.
This protocol allows the client to authenticate and establish a secure session key with a
server over an insecure channel [13]. On the login screen the user enters the user name
and password, selects the network adapter, and can even call the configuration module to
configure the details if necessary. The configuration information is saved for later
authentication. The EAPOL module is the core of the software, responsible for
the function of the PAE of the 802.1x authentication client. The data transmit/receive
module is in charge of capturing the authentication data frames of the authenticator system;
authentication request response frames are structured according to the state of the requesting
PAE state machine and forwarded to the data transmit/receive module.
The MD5 module realizes the encryption of the user name and password.
The expansion information collection module collects all kinds of system
information in accordance with the preset policies, and sends it to the EAPOL module. This
paper simplifies this module to test whether the local system has installed and is running the
Kaspersky anti-virus software, and to generate the extended information. The concrete
method is to read the latest start time and the last update time of Kaspersky from the
Windows registry, check whether avp.exe processes are running as the SYSTEM user
and the current user in the system process list, and then transmit the information to the EAPOL
module.
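A hedged Python sketch of this collection step is given below; the registry key and value names are hypothetical placeholders (the real Kaspersky keys vary by product version), and psutil is used only to scan the process list:

    import winreg
    import psutil

    KAV_KEY = r"SOFTWARE\KasperskyLab\AVP"        # hypothetical registry path

    def collect_extension_info():
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KAV_KEY) as key:
            last_start, _ = winreg.QueryValueEx(key, "LastStartTime")    # hypothetical value names
            last_update, _ = winreg.QueryValueEx(key, "LastUpdateTime")
        avp_users = set()
        for p in psutil.process_iter(attrs=["name", "username"]):
            if (p.info["name"] or "").lower() == "avp.exe" and p.info["username"]:
                avp_users.add(p.info["username"])
        return "{};{};{}".format(last_start, last_update, "|".join(sorted(avp_users)))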
The EAPOL module adds the user name from the user
interface and the extension information from the information gathering module to the EAP-
Response/Identity frame; the two are separated by a byte 0. To facilitate the processing
by the Radius proxy, we set the user name information and the extension information to
a fixed length, padding with 0 when the content is shorter than that length.

3.2. Radius Agency

In this proposal, the Radius agency's main task is to analyze and process the received
UDP packets.
According to the direction of the UDP packet, the program is divided into two
threads: 1) For a packet from the device, the agency determines whether it is a Radius
Access-Request packet carrying an EAP-Response/Identity. If it is, the agency authenticates its
extension information and forwards the packet to the server if it meets the requirements. If
the packet does not meet the requirements, the agency replies with a Radius Access-Reject
packet to feed back the refused-authentication information and terminates this
authentication. An EAP-Response/MD5 Challenge packet is forwarded to the server, while
other non-EAP-Response/Identity packets are discarded (the process is shown in Figure 3);
2) A packet from the server is directly forwarded to the
device. This program simplifies the authentication of the extension information to a
string comparison.
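The string-comparison check of the first thread can be sketched in Python as follows; it assumes the extension information rides in the EAP-Message attribute (type 79) of an Access-Request, built as described in Section 1, and the policy string is hypothetical:

    ACCESS_REQUEST = 1
    EAP_MESSAGE_ATTR = 79
    POLICY = b"KAV;running"                       # simplified: plain string comparison

    def radius_attributes(packet):
        # Yield (type, value) pairs after the 20-byte RADIUS header.
        i = 20
        while i + 2 <= len(packet):
            attr_type, attr_len = packet[i], packet[i + 1]
            if attr_len < 2:
                break
            yield attr_type, packet[i + 2:i + attr_len]
            i += attr_len

    def extension_meets_policy(packet):
        # Accept only Access-Request packets carrying an EAP-Response/Identity whose
        # part after the first 0x00 separator starts with the required policy string.
        if not packet or packet[0] != ACCESS_REQUEST:
            return False
        eap = b"".join(v for t, v in radius_attributes(packet) if t == EAP_MESSAGE_ATTR)
        if len(eap) < 5 or eap[0] != 2 or eap[4] != 1:    # Code=Response, Type=Identity
            return False
        _, _, extension = eap[5:].partition(b"\x00")
        return extension.rstrip(b"\x00").startswith(POLICY)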

Figure 3. The process of Radius agency.

4. System Simulation Test

The purpose of this test is to verify the transmission of extension information and the
authentication results.
Test environment: 1) We use a Netgear GSM7312 switch as the device, with its server
address set to point at the Radius agency; 2) We set the Winradius server address
within the Radius agency, and the policy is set to require that the client must be running
Kaspersky anti-virus software and that its latest update is within 7 days; 3) We use
Winradius as the authentication server; 4) The client program discussed in the previous section
runs on the client host (Winpcap 4.0 required); 5) The Radius agency, Winradius and the
client program all run in a Windows XP SP3 environment.
Test I: The client host had installed Kaspersky Anti-Virus software but it was not
running; we entered the correct password in the client program. From the test results
(Figure 4, Test I) we can see that the Radius agent found that the extension information did not
meet the security policy and replied with an Access-Reject packet to reject the
authentication request. The Winradius server did not
give any response. The client displayed an authentication failure and prompted the
user to check the anti-virus software. The client host still could not access the network
properly.

Figure 4. Test I (Failed authentication).


Test II: After the client host updated the Kaspersky Anti-virus software, we entered
the correct user name and password in the client program. From the test results (Figure
5, Test II) we can see that the Radius agent authenticated the
extension information and transmitted all the packets in both directions between the
server and the switch. Winradius verified the user name and password, and then opened
the corresponding switch port, so the client host could access the network properly.

Figure 5.Test II (Successful authentication).


The results show that the Radius agent can block the authentication request when the
extension information does not meet the security policy, and feed back the failure
reason to the client program. In the case that the extension information meets the policy,
the authentication result of our system is the same as before. Our security access
control system can check whether the host meets the policy before it accesses the network,
restricting unmatched hosts and enhancing network security policy control
effectively.

5. Conclusion

In this paper, we proposed a security access control model to realize the transmission
and authentication of extended authentication information from client to server with the
802.1x protocol. The model is implemented using the plaintext information of the Kaspersky
Anti-virus software as the extension information. The server realizes the authentication by a
simple string comparison. In fact, the extension information can be the client's
operating system, software environment, network conditions and other ciphertext or
plaintext, and the authentication of the extension information on the server can also be varied.
Through the authentication of extension information, this security access control model
can allow only the networking equipment which meets the security policy to access
the network, in order to reduce security threats from the network. This model can
be applied in systems similar to the test environment above, and can resist
attacks with a simple or single extension information authentication. If the extended
information consists of a variety of data sources, then this model cannot work. In the
future, we will perform more experiments to improve the performance of the model. This
model still has some defects to be dealt with. For instance, this access control model is
not comprehensive enough; it should be more widely usable in any system. Another
limitation is how to prevent more and more complex attacks. All these problems need
to be studied in the future.

References

[1] N. Hoque, Monowar H. Bhuyan, R.C. Baishya, D.K. Bhattacharyya, J.K. Kalita, Network attacks:
Taxonomy, tools and systems, Journal of Network and Computer Applications, 40(2014),307-324.
[2] X. Wang , Research on the 802.1x Authentication Mechanism and Existing Defects, International
Conference on Advanced Mechanical Engineering (AME 2012),192(2012), 385-389.
[3] D. Yadav, A. Sardana, Authentication Process in IEEE 802.11: Current Issues and Challenges, Advances
in Network Security and Applications, 196(2011), 100-112.
[4] I.P. Chang,T.F. Lee,T.H. Lin,C.M. Liu, Enhanced Two-Factor Au-thentication and Key Agreement
Using Dynamic Identities in Wireless Sensor Net-works, Sensors,15(2015),29841-29854.
[5] J. Soryal, T. Saadawi, IEEE 802.11 DoS attack detection and mitigation utilizing Cross Layer Design. Ad
Hoc Networks, 14(2014), 71-83.
[6] M. Cheminod, L. Durante, L. Seno, A. Valenzano, Detection of attacks based on known vulnerabilities in
industrial networked systems, Journal of Information Security and Applications, 2016.
[7] S. Hong, J. Park, S. Han, J. Pyun, J. Lee. Design of WLAN Secure System against Weaknesses of the
IEEE 802.1x, Advances in Hybrid Information Technology, 4413(2007), 617-627.
[8] C. Chen, J. Zhang, J. Liu, Design in the Authentication and Billing System Based on Radius and 802.1x
Protocol, International Symposium on Computers and Informatics (ISCI), 13(2015),1438-1443.
[9] A.K. Dalai, S.K. Panigrahy, S.K. Jena, A Novel Approach for Message Authentication to Prevent
Parameter Tampering Attack in Web Applications, Procedia Engineering, 38(2012), 1495-1500.
[10] Eshmurzaev, B.Eshmurzaev, Dalkilic, G. Dalkilic, Analysis of EAP-FAST Protocol, Proceedings of the
iti 2012 34th international conference on information technology interfaces (iti), (2012),417-422.
[11] J. Hur, C. Park, H. Yoon, An Efficient Pre-authentication Scheme for IEEE 802.11-Based Vehicular
Networks , Advances in Information and Computer Security, 4752(2007), 121-136.
[12] S. Wijesekera, X. Huang, D. Sharma, Utilization of Agents for Key Distribution in IEEE
802.11,Advances in Intelligent Decision Technologies, 4(2010),435- 443.
[13] Farash, Mohammad, Attari, Mahmoud. An efficient client-client password-based authentication scheme
with provable security, Journal of Supercomputing, 70(2014),1002-1022.
414 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-414

Recommending Entities for E-R Model by


Ontology Reasoning Techniques
Xiao-Xing XU a, Dan-Tong OUYANG a, b, Jie LIU a, b, 1, Yu-Xin YE b, c
a
College of Software, Jilin University, Changchun 130012, China
b
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of
Education, Changchun 130012, China
c
College of Engineering and Computer Science, Wright State University Dayton, OH
45435, USA

Abstract. Designing the E-R model is still the key process in database design. This
paper proposes a novel method to recommend entities for the E-R model by OWL
ontology. OWL ontology is open and shared knowledge which can easily be obtained
by anybody from the internet. We adopt ontology reasoning techniques to capture
domain concepts according to a set of predefined vocabularies. Some corresponding
concepts in the OWL ontology are pinpointed by a preprocessing module first.
Then these concepts, used as seeds, are extended, deleted and modified by an operation
module. We can acquire reasonable entities for the E-R model through a series of
processes. Based on an experimental study, the entities recommended by this method
approach the expert level.

Keywords. Database design, E-R model, Semantic web, Ontology reasoning

Introduction

E-R model introduced by Peter Chen [1] is a form of knowledge representation. With
the development of semantic web and researches of knowledge representation, domain
ontology has gradually become one of the main representations of domain knowledge,
for instance, the Foaf ontology [2], the GO ontology [3] and the SWEET ontology [4].
In addition, the description logic language [5] can not only express explicit knowledge in
the domain but also imply implicit knowledge.
Many researchers devised different methods to construct relational database, and
some of them tried to build database using domain ontology. For example, Storey et al.
[6] presented an ontology that could classify terms to several categories. Sugumaran et
al. [7] proposed a methodology for supporting database design and evaluation. Gali et
al. [8] gave a method for querying ontology, which stored ontology information in
relational tables. Vysniauskas et al. [9] put forward some algorithms to convert well-
formed ontology to relational database. In this paper, a new method is proposed to
recommend entities for E-R model. We obtain relevant concepts from domain ontology
by description logic reasoning.

1
Corresponding Author: Jie LIU, College of Software, Jilin University, Changchun 130012, China; E-
mail: liu_jie@jlu.edu.cn.

1. Resolved Framework

Consisting of three function modules, the resolved framework is shown in Figure 1.


Through the interaction among the three modules, we will obtain the final entities
recommended for the E-R model.

Figure. 1. Resolved framework

Preprocessing module. The hierarchy tree of the OWL ontology is generated by
calling the Hierarchy API. Then we extend the set of requirement terms provided
by customers to the set of concept terms. Those concepts are pinpointed in the
hierarchy tree, and at the same time we get the set of pinpointed concept terms (PTSet).
Reasoner module. In the reasoner there are many kinds of APIs, and the core
component is consistency checking. We mainly use three APIs: the Hierarchy API, the Subsumption
API and the Disjointness API. When some operations in the third module need to call the
corresponding interfaces, these APIs switch the reasoning tasks to consistency
checking by invoking the core component and return the results back to the third module.
Operation module of recommending entities. In this module we execute mainly
three operations: extending entities, deleting entities and modifying entities. These
operations obtain their results by calling the corresponding APIs. The details will be
introduced in subsequent sections.

2. Preprocessing Module

The expression of a concept provided by customers may be different from the
expression of the knowledge in the domain ontology. Besides, the requirement terms provided by
customers may not be comprehensive. We need to extend the requirement terms further.
We first call the Hierarchy API of the reasoner to generate the concept hierarchy tree of the
domain ontology, by which database designers can grasp the whole hierarchy of the OWL
ontology. Then, using WordNet [10] combined with semantic similarity comparison
[11], we obtain synonyms of the requirement terms in the ontology by calculating the similarity
value of two concepts. The similar concepts gained from the ontology and these
requirement terms are pinpointed on the concept hierarchy tree. In this way we
realize the initial extension of the requirement terms.
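As an illustration only (the paper relies on the similarity measure of [11]; here the Wu-Palmer similarity from NLTK's WordNet interface is substituted, and the 0.8 threshold is an arbitrary assumption), the pinpointing step could be sketched as:

    from nltk.corpus import wordnet as wn   # requires the WordNet corpus to be installed

    def best_similarity(term_a, term_b):
        # Highest Wu-Palmer similarity over all sense pairs of the two terms.
        best = 0.0
        for s1 in wn.synsets(term_a):
            for s2 in wn.synsets(term_b):
                best = max(best, s1.wup_similarity(s2) or 0.0)
        return best

    def pinpoint(requirement_terms, ontology_concepts, threshold=0.8):
        # PTSet: ontology concepts similar enough to some customer requirement term.
        return {c for c in ontology_concepts
                  for t in requirement_terms if best_similarity(t, c) >= threshold}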

3. Operation Module of Recommending Entities

3.1. Extending Entities

For designers and customers, it is inconvenient to pinpoint or add a concept
in a large chart. We therefore select a part of the chart from the whole concept hierarchy tree, consisting of:
(1) Owl:Thing and its direct child concepts
(2) All concepts in PTSet
(3) The direct child concepts and father concepts of the concepts in PTSet
(4) The concepts at the same level as the concepts in PTSet
Owl:Thing and its direct child concepts define the scope of the concepts. Besides,
the concepts in PTSet are the nearest ones to the customers’ requirements. The child
concepts and father concepts of the concepts in PTSet are very important to refine the
entities of the E-R model. Last, the concepts at the same level as the concepts in PTSet
approximately yield concepts and entities of similar granularity.
Algorithm. getPartHierarchies
Input: HierarchiesTree, PTSet
Output: PartHierarchies: the part concept hierarchy tree
1 root ← HierarchiesTree
2 for each concept in root
3 PartHierarchies.add (concept);
4 for each sub in concept
5 PartHierarchies.add (sub);
6 end for
7 end for
8 for each concept in PTSet
9 PartHierarchies.add (concept);
10 for each sub in concept
11 PartHierarchies.add (sub);
12 end for
13 for each sup in concept
14 PartHierarchies.add (sup);
15 buildRelation (sup, root.sub);
16 end for
17 concept*=getSameLayer (concept);
18 PartHierarchies.add (concept*);
19 buildRelation (concept*, root.sub);
20 end for
21 return PartHierarchies;
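A hedged Python rendering of this algorithm is given below; concepts are assumed to be objects exposing children, parents and level attributes (hypothetical stand-ins for what the reasoner's Hierarchy API returns), and the buildRelation steps are omitted:

    def get_part_hierarchy(root, pt_set):
        # Keep owl:Thing and its direct children, every PTSet concept with its direct
        # children and parents, and the concepts on the same level as PTSet members.
        part = {root} | set(root.children)
        levels = {c.level for c in pt_set}
        for concept in pt_set:
            part.add(concept)
            part.update(concept.children)
            part.update(concept.parents)
        part.update(c for c in all_concepts(root) if c.level in levels)
        return part

    def all_concepts(root):
        # Depth-first traversal of the whole hierarchy tree.
        stack, seen = [root], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(node.children)
        return seen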

3.2. Deleting Entities

Combining the structural features of the E-R model and the concepts in PTSet, we give some
principles to delete concepts from the part hierarchy tree and save these deleted
concepts into a set of deleted concept terms (DTSet), which is used to modify entities.

Principle 1 If there is an inclusion relation between concepts, the father concept has
only one child concept, and the two concepts have different relations R with other
concepts in the part hierarchy tree, then retain both concepts.
Principle 2 If there is an inclusion relation between concepts and these concepts have the
same relations R with other concepts, then retain the concept appearing in PTSet.
Principle 3 If there is an inclusion relation between concepts, the father concept has
more than one child concept, and these concepts have the same relations R with other
concepts, then retain the father concept and delete the child concepts.
Principle 4 If there is an inclusion relation between concepts, the father concept has
more than one child concept, and these concepts have different relations R with others,
then delete the child concepts and retain the father concept; meanwhile, retain those
child concepts that have relations different from the father concept's.

3.3. Modifying Entities

Through the above steps we have basically established the entities to recommend for the E-R model.
However, we cannot exclude that there are isolated entities (entities that have no relations with
others). Directly deleting isolated entities may cause the loss of some useful information. We
therefore propose the following principles to modify isolated entities.
Principle 1 If there are isolated entities, use the related concepts in DTSet
(such as the father concept or a child concept, not considering Thing or Nothing) instead of
the isolated entities and obtain the relations R among entities again. If the replacements are no
longer isolated, then use these related concepts instead of the isolated entities.
Principle 2 If the related concepts are still isolated entities, continue to find
related concepts of those related concepts until there are no concepts left to use as replacements.
Then delete the isolated entities and all the related concepts.
Principle 3 If there are no isolated entities, then do not modify the entities.

4. Experiment and Evaluation

4.1. Experimental Design

To assess the usefulness of the method proposed in this paper, we experimented with
two popular ontologies named SWEET2 and GO3.
Eighteen people participated in the study, including five geological experts, five
genetics experts, three geology students, three genetics students and two students
(computer major) who have no domain knowledge about geology or genetics. These
eighteen people were assigned to two groups, group A for the geological domain and group
B for the genetics domain. The students majoring in computer science recommended entities with the
SWEET or GO ontology for the E-R model using our method. The three students in each
group having basic domain knowledge provided a set of requirement terms. Database designers
and experts then recommended entities for every term set.

2
https://sweet.jpl.nasa.gov
3
http://www.geneontology.org/

4.2. Data Analysis and Results

For the geological domain the statistical information is shown in Table 1. We first
computed the number of entities recommended by the designer and by the experts, listed in
column 1 and column 2 respectively. Then, through matching, we calculated the number of
repetitive entities and relevant entities in column 3 and column 5.
Table 1. The number and rate of entities recommended in geological domain

Designer entities (De)   Experts entities (Ee)   Repetitive entities (Rep)   Repetition rate (%)   Relevant entities (Rel)   Correlation rate (%)   Irrelevant terms rate (%)

298 254 155 61.023 218 85.826 26.845


273 161 58.974 229 83.882 14.765
239 143 59.832 186 77.824 37.583
251 152 60.557 208 82.868 30.201
287 176 61.324 243 84.668 18.456
311 267 168 62.921 216 80.898 30.546
285 152 53.333 223 78.245 28.295
259 144 55.598 214 82.625 31.189
271 162 59.778 227 83.763 27.009
293 175 59.726 246 83.959 20.900
305 266 163 61.278 215 80.827 29.508
281 175 62.277 236 83.985 22.622
253 149 58.893 204 80.632 33.114
284 167 58.802 235 82.746 22.950
273 158 57.875 223 81.684 26.885
The three rates in the tables were computed by the three formulas given below Table 2. The
statistical information for the genetics domain is shown in Table 2.
Table 2. The number and rate of entities recommended in genetics domain

Designer entities (De)   Experts entities (Ee)   Repetitive entities (Rep)   Repetition rate (%)   Relevant entities (Rel)   Correlation rate (%)   Irrelevant terms rate (%)

192 167 104 62.275 135 80.838 29.687


183 125 68.306 148 80.874 22.916
171 109 63.742 143 83.625 25.520
162 98 60.493 133 82.098 30.729
175 101 57.714 146 83.428 23.958
224 184 112 60.869 151 82.065 32.589
169 96 56.804 138 81.656 38.392
192 109 56.770 161 83.854 28.125
209 128 61.244 167 79.904 25.446
174 108 62.068 136 78.160 39.285
207 184 113 61.413 145 78.804 29.951
198 121 61.111 161 81.313 22.222
167 93 55.688 134 80.239 35.265
204 131 64.215 169 82.843 18.357
162 103 63.580 128 79.012 38.164
Correlation rate=Rel/Ee*100% Repetition rate=Rep/Ee*100% Irrelevant rate=(De-Rel)/De*100%

According to the above statistical analysis, we further summarized the average values in
Table 3, which are the averages of the corresponding data in Table 1 and Table 2.

Table 3. The average number of entities and the average ratio


Domain   Designer entities   Experts entities   Repetitive entities   Repetition rate (%)   Relevant entities   Correlation rate (%)   Irrelevant terms rate (%)
Geology 305 269 160 59.479 222 82.527 27.213
Gene 208 180 110 61.111 146 81.111 29.807

5. Conclusions

With abundant ontology resources on the network, and combining ontology reasoning, we
offered an algorithm and some principles for the process of recommending entities.
An experiment and data analysis verified that this method is useful and efficient.
If there is no suitable ontology on the network, we may search for some related ontologies
containing a part of the requirement terms. These candidate ontologies can be merged
through ontology matching. Then we obtain useful knowledge to recommend entities
for the E-R model through the method proposed in this paper.

References

[1] P P S Chen, The entity-relationship model—toward a unified view of data. ACM Transactions on Data-
base Systems (TODS) 1(1), (1976), 9-36.
[2] M Graves, A Constabaris., D Brickley, Foaf: Connecting people on the semantic web. Cataloging &
classification quarterly 43(2007), 191-202.
[3] Gene Ontology Consortium., The Gene Ontology (GO) database and informatics resource. Nucleic acids
research 32(suppl 1) (2004), D258-D261.
[4] R Raskin, M. Pan, Semantic web for earth and environmental terminology (sweet). Proc. of the Work-
shop on Semantic Web Technologies for Searching and Retrieving Scientific Data, 2003.
[5] F Baader, The description logic handbook, Cambridge: Cambridge University Press , 2003.
[6] V C Storey., D Dey., H Ullrich., et al., An ontology-based expert system for database design. Data &
Knowledge Engineering 28(1) (1998), 31-46.
[7] V Sugumaran., V C Storey., The role of domain ontologies in database design: An ontology management
and conceptual modeling environment. ACM Transactions on Database Systems (TODS) 31(3) (2006),
1064-1094.
[8] A Gali., C X Chen., et al., From ontology to relational databases. International Conference on Conceptu-
al Modeling. 2004, 278-289.
[9] Vysniauskas E., Nemuraite L.: Transforming ontology representation from OWL to relational database.
Information technology and control 35(3) (2015).
[10] G A Miller WordNet: a lexical database for English. Communications of the ACM, 1995, 38(11): 39-
41.
[11] D Lin, Automatic retrieval and clustering of similar words. Association for Computational Linguistics
1998, 768-774.
420 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-420

V-Sync: A Velocity-Based Time Synchronization for Multi-Hop Underwater Mobile Sensor Networks
Meng-Na ZHANGa,b,1, Hai-Yan WANGa,b, Jing-Jie GAOa,b and Xiao-Hong SHENa,b,2
a
School of Marine Science and Technology, Northwestern Polytechnical University,
127West Youyi Road, Xian, Shaanxi, China, 710072
b
Key Laboratory of Ocean Acoustics and Sensing (Northwestern Polytechnical
University), Ministry of Industry and Information Technology, Xian, Shaanxi, China,
710072

Abstract. Time synchronization plays an essential role in underwater wireless sensor networks (UWSNs). Many time synchronization methods have been proposed for terrestrial wireless sensor networks, but none of them can be applied directly to UWSNs because of two fundamental challenges: large propagation delays and node mobility. In this paper we propose V-Sync, a velocity-based time synchronization algorithm for multi-hop mobile UWSNs. It exploits the relationship between the time-varying propagation delay and node mobility to eliminate the effect of node mobility and then corrects the time deviation. In addition, it builds a stratified node model so that the whole multi-hop network can be synchronized. Simulation results show that V-Sync achieves high precision and clearly outperforms existing time synchronization algorithms.

Keywords. UWSNs, time synchronization, node mobility, multi-hop

Introduction

In recent years, UWSNs have drawn considerable attention from both academia and industry. They facilitate a wide range of aquatic applications such as undersea exploration, assisted navigation, environmental monitoring, disaster prevention and mine reconnaissance, and most of these applications require the nodes to share a consistent time. However, the local clock of each node has an intrinsic drift, so nodes go out of sync as time elapses. Time synchronization is therefore a necessary prerequisite for an underwater network system [1-4]. Unlike terrestrial wireless sensor networks, UWSNs exhibit large and unstable propagation delays, node mobility and severe signal attenuation, which increase the difficulty of time synchronization. Furthermore, wireless sensors are battery-powered, and replacing the battery is very

1
Corresponding Author: Meng-Na Zhang, 127West Youyi Road, Xian, Shaanxi, China; Email:
mengnazhang@163.com.
2
Corresponding Author: Xiao-Hong Shen, 127West Youyi Road, Xian, Shaanxi, China;
E-mail: xhshen@nwpu.edu.cn.
inconvenient, so time synchronization in UWSNs is more difficult than in terrestrial networks.
Existing time synchronization algorithms for terrestrial sensor networks mainly include NTP, RBS, FTSP and TPSN [5-9]. However, these algorithms cannot be applied directly to the underwater communication environment because of its special nature. Accordingly, researchers have carried out a series of studies based on the characteristics of UWSNs, starting with the long propagation delay and node mobility.
TSHL [10] is the first time synchronization algorithm designed for high-latency networks. It consists of two phases: in the first phase, the master node transmits multiple beacons and the clock skew is estimated by linear regression; in the second phase, the time offset is corrected by exchanging two-way messages. Tri-Message [11] is a lightweight time synchronization protocol that improves on TSHL by sending only one message packet in the first phase, so its communication overhead is much lower than that of TSHL. However, both TSHL and Tri-Message assume that the nodes are static and the propagation delay is constant, which is not realistic for the actual underwater environment.
MU-Sync [12] divides the network into clusters to account for the mobility of underwater nodes. The clock skew is estimated by performing linear regression twice over a set of local timing information collected through two-way message exchanges with a cluster head. The algorithm requires many beacon nodes to act as cluster heads and to be deployed evenly in the network, which leads to a high deployment cost, and the deployment is also very susceptible to the underwater environment. Mobi-Sync [13] estimates the propagation delay through a velocity estimation formula and exploits the spatial correlation between nodes to eliminate the influence of node mobility. It requires self-locating buoy nodes and energy-sufficient super nodes, and it needs node location information during estimation, whereas localization algorithms typically depend on the synchronization results; moreover, the super-node assumption is hard to satisfy, since each common node performs at least three two-way interactions with a super node. D-Sync [14] uses Doppler-shift estimation for time synchronization, but its computation is complicated and its accuracy depends on the measurement precision of the Doppler shift, which varies with the environment and is difficult to obtain.
Considering the defects of these UWSN time synchronization algorithms, this paper proposes a time synchronization algorithm that uses already-synchronized dynamic nodes to assist the synchronization of the whole network on the basis of the nodes' relative velocity, and is therefore suitable for multi-hop mobile UWSNs. The algorithm increases synchronization accuracy while reducing the complexity and power consumption of synchronization.

1. Algorithm Design

1.1. The Basic Idea of Time Synchronization

In a distributed network, each node derives its local time from a crystal oscillator. Different nodes have slightly different oscillator frequencies due to hardware and manufacturing variations, so their internal clocks differ and their local times are not synchronous. The relationship between the standard time and the local time is

T = αt + β    (1)

where T is the standard time, t is the node's local time, and α and β are the clock frequency skew and the time offset; when there is no deviation, α and β are 1 and 0, respectively.
The basic idea of time synchronization is that the slave node estimates the clock frequency skew and time offset by exchanging messages and then compensates the deviation between its local time and the standard time to keep them synchronous. The synchronization error is the difference between the two after synchronization.

1.2. Details of V-Sync

The algorithm achieves time synchronization in multi-hop mobile UWSNs by first establishing a stratified node model of the network, then realizing point-to-point time synchronization between mobile nodes, and finally adjusting the synchronization period adaptively.

1.2.1. Stratified Node Model


This scheme assumes that each node in the network has its own unique and fixed identity (ID). The anchor node's ID and level number are 0 by default. The anchor node, or an already synchronized node (master node), initiates time synchronization; the packet it sends includes the time at which it leaves the master node, the master's ID and its level. When a slave node close to the master node receives a packet, it waits for a period of time T to receive further packets; the length of T is determined by the size of the timestamp and the data transmission rate. After this time, the node compares all packets received and takes the sender with the lowest level as its parent node, which avoids the error accumulated as the number of hops increases; it then sets its own level to the parent node's level plus one. The node then sends out its own packet to determine the next level, and this continues until every node in the network has found its parent node. In addition, the network is re-stratified according to node mobility.
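The parent-selection rule can be summarized by the following minimal sketch (ours, not the authors' implementation): a node that has collected the packets heard during the wait window picks the sender with the lowest level as its parent and sets its own level to that level plus one.

def assign_level(received_packets):
    # received_packets: list of dicts such as {'sender_id': 3, 'level': 2}
    parent = min(received_packets, key=lambda p: p['level'])
    return parent['sender_id'], parent['level'] + 1

# packets heard from the anchor (level 0) and from a level-1 node
print(assign_level([{'sender_id': 7, 'level': 1},
                    {'sender_id': 0, 'level': 0}]))  # -> (0, 1)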

1.2.2. Time Synchronization of Mobile Node


Since node mobility makes the propagation delay unstable, the accuracy of the current classical algorithms is not high enough. This paper assumes that the nodes move with a fixed relative velocity, denoted V, and adds one packet delivery to Tri-Message to compensate for the propagation delay variation during synchronization. To improve the efficiency of synchronization, a child node performs point-to-point synchronization with its parent as soon as it has determined its parent while the stratified node model is being generated. Figure 1 shows the synchronization process.

Figure 1. Point-Point Synchronization Process.


Through the clock model, the timestamps exchanged are related as follows:

B0 = α(A0 + d0) + β    (2)

B1 = α(A1 − d1) + β    (3)

B2 = α(A2 + d2) + β    (4)

B3 = α(A3 − d3) + β    (5)

where A0, B0, B1, A1, A2, B2, B3 and A3 are the eight timestamps obtained from the four message exchanges between the father and child nodes, and di denotes the corresponding propagation delay. A0 is already obtained while the stratified node model is being established, which saves one message delivery. To account for node mobility, the propagation delays are related as follows:

di = Li / vs    (6)

Δdi = ΔLi / vs = (Li − Li−1) / vs    (7)

We expand the equations as follows:

d1 − d0 = v[(A1 − d1) + d1 − (A0 + d0)] / vs = v[A1 − (A0 + d0)] / vs    (8)

d2 − d1 = v[(A2 + d2) − A1] / vs    (9)

d3 − d2 = v[(A3 − d3) + d3 − (A2 + d2)] / vs = v[A3 − (A2 + d2)] / vs    (10)

where L0 is the propagation distance from position X to position Y, X being the position at which the father node sends the packet at A0 and Y the position at which the child node receives the packet at B0, and vs denotes the underwater acoustic propagation velocity. Setting k = v/vs, the equations simplify as follows:

d1 = k(A1 − A0) + (1 − k)d0    (11)

d2 = [k / (1 − k)](A2 − A0) + d0    (12)

d3 = k(A3 − A0) + (1 − k)d0    (13)

We substitute the equations above into Eqs. (2)–(5) and assume that 0 < k < 0.01, which is reasonable since the maximum velocity of a sensor underwater is about 3 m/s. The parameters α and β can then be solved as:

α = (1/2) [ (B1 − B3)/(A1 − A3) + (B0 − B2)/(A0 − A2) ]    (14)

β = (1/4) [ B0 + B1 + B2 + B3 − (B1 − B3)(A1 + A3)/(A1 − A3) − (B0 − B2)(A0 + A2)/(A0 − A2) ]    (15)

The time deviation is then compensated to achieve synchronization using the formula t = (B − β)/α. Note that k does not need to be computed in this process, so no additional measurement of V is required. In addition, since the network is prone to packet collisions, a collision avoidance procedure is added to reduce the possibility of collisions when sending packets: after the child node receives the packet from its father node, it waits for a period of time whose length is determined by the size of the timestamp and the data transmission rate. If a node does not receive the expected packet from the corresponding node after this waiting time, it re-sends the last packet until it receives a response.
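The estimation and compensation steps of Eqs. (14)-(15) can be sketched as follows (a rough illustration under our own notation: A holds the father-side timestamps A0-A3, B the child-side timestamps B0-B3):

def estimate_skew_offset(A, B):
    A0, A1, A2, A3 = A
    B0, B1, B2, B3 = B
    alpha = 0.5 * ((B1 - B3) / (A1 - A3) + (B0 - B2) / (A0 - A2))   # Eq. (14)
    beta = 0.25 * (B0 + B1 + B2 + B3
                   - (B1 - B3) * (A1 + A3) / (A1 - A3)
                   - (B0 - B2) * (A0 + A2) / (A0 - A2))             # Eq. (15)
    return alpha, beta

def corrected_time(B, alpha, beta):
    return (B - beta) / alpha   # t = (B - beta) / alpha

# synthetic check: child clock B = 1.00004 * T + 0.1, slowly varying delays
A, d = [10.0, 12.0, 14.0, 16.0], [0.0660, 0.0661, 0.0662, 0.0663]
B = [1.00004 * (A[0] + d[0]) + 0.1, 1.00004 * (A[1] - d[1]) + 0.1,
     1.00004 * (A[2] + d[2]) + 0.1, 1.00004 * (A[3] - d[3]) + 0.1]
print(estimate_skew_offset(A, B))   # approximately (1.00004, 0.1)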

1.2.3. Adjustment of Synchronization Period


In order to reduce energy consumption as much as possible, this paper takes adaptive
adjustment method of synchronization period to extend synchronization cycle under the
premise of the synchronization accuracy. If synchronization error is small or large this
time, it will lengthen or shorten the next synchronization period respectively.
We set the synchronization accuracy as μ. The maximum synchronization error of
all the nodes is ε. Tn-1 is the (n-1)-th synchronization period. Then the predicted and
actual n-th synchronization period is T and Tn, which are calculated as follows:

T / Tn−1 = μ / ε    (16)

Tn = a·T + (b / (n − 2)) · (T1 + T2 + … + Tn−2)    (17)

As can be seen from the above formulas, Tn is a weighted average of the predicted period and the historical periods, where a and b are weighting coefficients with a + b = 1. We also set an error threshold and let nodes skip synchronization if ε is below this threshold. With this adjustment, the synchronization period can be greatly extended and energy consumption is reduced.
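Our reading of Eqs. (16)-(17) can be illustrated with a small sketch (the weights a and b are example values, not taken from the paper):

def next_sync_period(periods, mu, eps, a=0.6, b=0.4):
    # periods: [T_1, ..., T_{n-1}] past synchronization periods; a + b = 1
    T_pred = periods[-1] * mu / eps                      # Eq. (16): T = T_{n-1} * mu / eps
    history = periods[:-1] if len(periods) > 1 else periods
    return a * T_pred + b * sum(history) / len(history)  # Eq. (17)

print(next_sync_period([600.0, 620.0, 640.0], mu=50e-6, eps=40e-6))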

2. Simulation and Analysis

In this section, we first give a detailed account of the simulation process, and then the
simulation results are analyzed from multiple aspects so as to evaluate the performance
of V-Sync.

2.1. Simulation Setup

We run our scheme on the OPNET simulation platform with an underwater channel model. The simulation uses a mesh network topology with one anchor node and 60 ordinary nodes. The errors encountered in message exchange and processing are modeled by a Gaussian distribution, and all times are taken from MAC-layer timestamps. We compare V-Sync with the extended version of Tri-Message, which we call E-Tri-Message. The other simulation parameters are as follows:
- Clock initial skew: 40 ppm.
- Clock initial offset: 100 μs.
- Clock jitter: 15 μs.
- Propagation speed: 1500 m/s.
- Wait time: 5 s.
- Maximum retransmit count: 3.
- Maximum sensor speed (Vmax): 5 m/s.

2.2. Analysis

First of all, we study the effect of node level on the two algorithms. As can be seen from Figures 2 and 3, the skew error and offset error of V-Sync are much smaller than those of E-Tri-Message; the raw data are given in Tables 1 and 2. This is because Tri-Message does not consider the effect of node mobility on the propagation delay and assumes that the packet propagation delay is the same in every exchange, so its error grows greatly, while the error of V-Sync remains very small. The results also show that the error grows with the node level, because error accumulates as the number of hops increases.
Figure 2. The effect of node level on skew error (skew error in ppm, log scale, versus node level 1-5, for V-Sync and E-Tri-Message).


Figure 3. The effect of node level on offset error (offset error in μs, log scale, versus node level 1-5, for V-Sync and E-Tri-Message).


Table 1. The raw data of skew (ppm)
Node level V-Sync E-Tri-Message
1 3.56 297.36
2 5.68 551.00
3 7.92 745.24
4 8.37 872.81
5 7.72 916.61

Table 2. The raw data of offset (μs)

Node level V-Sync E-Tri-Message


1 43.3 2636.3
2 102.1 7750.6
3 195.7 13167.5
4 204.3 18195.3
5 221.6 20970.6
Secondly, we study the effect of varying nodes’ relative velocity. Figure 4 shows
that there is little change in synchronization error of V-Sync, however, the error of E-
Tri-Message increases significantly with increasing speed. This indicates that V-Sync
is more stable in the actual environment that propagation delay always changes.
Figure 4. The effect of V (synchronization error in μs, log scale, versus the nodes' relative velocity V in m/s, for V-Sync and E-Tri-Message).


Figure 5. The error after synchronization (error in s versus time after synchronization in s, log scale, for V-Sync and E-Tri-Message).


Finally, Figure 5 shows the trend of the error as time elapses after synchronization. Because V-Sync takes node mobility between round trips into account, its synchronization error is much smaller than that of E-Tri-Message, and its accuracy is about 25 μs; V-Sync thus performs better in UWSNs as time passes. The simulation results show that, in multi-hop mobile UWSNs, the proposed time synchronization algorithm achieves higher precision and maintains good performance even with a large relative node velocity.

3. Conclusion

In this paper, we presented a velocity-based time synchronization algorithm for multi-hop underwater mobile sensor networks. By investigating the relationship between the time-varying propagation delay and node mobility, it eliminates the effect of node mobility. The paper also introduces a stratified node model to synchronize the full network and puts forward an adaptive method for adjusting the synchronization period. Simulation results show that V-Sync achieves better synchronization accuracy and is easier to deploy than the other synchronization algorithms, because it needs no additional hardware and does not constrain the network topology. In future work, we will analyze how the algorithm adapts to cases such as changing relative node velocity, and we will examine its applicability through ocean experiments.

Acknowledgments

This work was sponsored by the National Natural Science Foundation of China
(61571365), the Seed Foundation of Innovation and Creation for Graduate Students in
Northwestern Polytechnical University (Z2016056).

References

[1] I. F. Akyildiz, D. Pompili, T. Melodia. Underwater acoustic sensor networks: research challenges. Ad
hoc networks 3 (2005): 257-279.

[2] J. H. Cui, J. Kong, M. Gerla, et al. The challenges of building mobile underwater wireless networks for
aquatic applications. IEEE Network 20 (2006): 12-18.
[3] J. Heidemann, W. Ye, J. Wills, et al. Research challenges and applications for underwater sensor
networking. IEEE Wireless Communications and Networking Conference. WCNC 2006. 1(2006).
[4] J. Partan, J. Kurose, B. N. Levine. A survey of practical issues in underwater networks. ACM
SIGMOBILE Mobile Computing and Communications Review 11 (2007): 23-33.
[5] F. Sivrikaya, B. Yener. Time synchronization in sensor networks: a survey. IEEE network 18(2004): 45-
50.
[6] J. Elson, L. Girod, D. Estrin. Fine-grained network time synchronization using reference broadcasts.
ACM SIGOPS Operating Systems Review 36 (2002): 147-163.
[7] S. Ganeriwal, R. Kumar, M. B. Srivastava. Network-wide time synchronization in sensor networks.
Center for Embedded Network Sensing (2003).
[8] S. Ganeriwal, R. Kumar, M. B. Srivastava. Timing-sync protocol for sensor networks. Proceedings of
the 1st international conference on Embedded networked sensor systems. ACM, 2003.
[9] M. Maróti, B. Kusy, G. Simon, et al. The flooding time synchronization protocol. Proceedings of the
2nd international conference on Embedded networked sensor systems. ACM, 2004.
[10] A. A. Syed, J. S. Heidemann. Time Synchronization for High Latency Acoustic Networks. INFOCOM.
2006.
[11] C. Tian, H. Jiang, X. Liu, et al. Tri-message: a lightweight time synchronization protocol for high
latency and resource-constrained networks. 2009 IEEE International Conference on Communications.
IEEE, 2009.
[12] N. Chirdchoo, W. S. Soh, K. C. Chua. MU-Sync: a time synchronization protocol for underwater
mobile networks. Proceedings of the third ACM international workshop on Underwater Networks.
ACM, 2008.
[13] J. Liu, Z. Zhou, Z. Peng, et al. Mobi-Sync: efficient time synchronization for mobile underwater sensor
networks. IEEE Transactions on Parallel and Distributed Systems 24 (2013): 406-416.
[14] F. Lu, D. Mirza, C. Schurgers. D-sync: Doppler-based time synchronization for mobile underwater
sensor networks. Proceedings of the Fifth ACM International Workshop on Under Water Networks.
ACM, 2010.
Fuzzy Systems and Data Mining II 429
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-429

An Electricity Load Forecasting Method Based on Association Rule Analysis Attribute Reduction in Smart Grid
Huan LIUa and Ying-Hua HANb,1
a
School of Computer Science and Engineering, Northeastern University , Shenyang,
Liaoning, 110819, China
b
Department of Computer and Communication Engineering, Northeastern University
at Qinhuangdao, Qinhuangdao, Hebei, 066004, China

Abstract. Accurate short-term load prediction helps to enhance the security and economy of the electric power system and to improve supply quality, so it is important to find an effective method that improves short-term forecast precision. Many uncertain factors in the smart grid power system directly affect the accuracy of load forecasting, and some less important factors can be removed by attribute reduction. In this paper, association rule analysis is used to analyze the relevance between the power load and its influencing factors in order to improve forecasting accuracy and reduce running time. A new method (FNNAR) based on Association Rules (AR) and a Fuzzy Neural Network (FNN) is presented to forecast the short-term load. An Association Rules Mining algorithm based on the Quantitative Concept Lattice (ARMQCL) is proposed to extract association rules, attribute reduction is carried out based on the extracted rules, and the FNN model then uses the reduced attributes as input to forecast the electric load in the smart grid. Experimental results indicate that the proposed forecast method (FNNAR) has higher accuracy and a shorter running time.

Keywords. smart grid, load forecasting, association rule mining, quantitative


concept lattice

Introduction

Smart grid is a nebulous terminology, which covers a variety of functions geared


towards making the power grid modernized. In order to make the electric power grid
more resilient, efficient and cost-effective, in its key area, smart grid uses digital
communication and control system to monitor and control power flows. Smart grid
modernizes the grid features such as power demand-side management, electricity
generation, real-time price, and automated meter activation and reading, in order to
increase the connection and self-coordination between providers, consumers, and the
network [1-3]. The ultimate goal of a smart grid is to reduce energy costs, improve
power quality, and reduce operating costs [4].

1
Corresponding Author. Ying-Hua HAN, Department of Computer and Communication Engineering,
Northeastern University at Qinhuangdao, Qinhuangdao, Hebei, 066004, China; E-mail: yhhan723@126.com.
At present, researchers have developed many forecasting methods and models. Traditional short-term load forecasting methods include time series, regression forecasting and grey system theory; modern intelligent methods include the grey system approach, Artificial Neural Network (ANN) models, fuzzy inference, genetic algorithms, Support Vector Machine (SVM) models, wavelet analysis and so on [5-6]. Grey system theory can be applied to load indices with any non-linear change, irrespective of distribution law and change tendency, and is easy to operate, but it requires the load to follow an exponential change tendency, and data with a large dispersion degree lead to worse prediction accuracy [7]. Artificial neural networks are adaptive and self-learning, with strong computing, sophisticated mapping and a variety of intelligent processing capabilities; however, it is difficult to determine the network structure scientifically, learning is slow, and training may be trapped in local minima [8]. Wavelet analysis offers high prediction accuracy, but it cannot account for weather, temperature, humidity and many other factors, and the forecasting results depend strongly on the choice of wavelet basis [9]. SVMs can be used for short-term load forecasting with higher accuracy than conventional methods; they take full account of the various factors affecting the load, converge relatively quickly, and can find the global optimal solution [10]. In practice, it is difficult to determine the most suitable mathematical model among the above methods and to establish a dependable relational expression between the load and its influencing factors.
Many uncertain factors directly affect the accuracy of load forecasting in the smart grid power system. Practice shows that the external factors affecting the load include date type, load level, weather conditions, seasonal factors and the social-economic environment, among others. Weather conditions include temperature, humidity, wind speed, rainfall and sunshine; the social-economic environment also includes several aspects; and the date type can likewise be divided into several categories. The forecasting input therefore contains many data types, and the amount of data in the smart grid is very large, which affects both the accuracy and the running time of load forecasting.
To solve the above problems, a new forecasting method, the Fuzzy Neural Network based on Association Rules (FNNAR), is presented in this paper by analyzing short-term load forecasting methods and association rule analysis. The method increases the accuracy of the forecast values and also reduces the running time. In this model, the correlation between external factors and changes of the electric load is found in massive data by association rule analysis, and attribute reduction based on this analysis is carried out on the influencing factors of load forecasting. This reduces the interference of noise data, eliminates redundant attributes and improves the effectiveness of the load forecast. The association rules are mined by the proposed Association Rules Mining algorithm based on the Quantitative Concept Lattice (ARMQCL algorithm).

1. FNN Load Forecasting Method Based on Association Rules

In the presented Fuzzy Neural Network based on Association Rules (FNNAR) method, attribute reduction based on association rule analysis is first carried out on the input of the load forecasting. The load forecast is then obtained from the FNN model using the reduced attributes as input.

1.1. Reduct Properties Based on Association Rules

Since some irrelevant or unimportant factors can interfere with load forecasting accuracy in the smart grid, attribute reduction based on association rules is necessary; it reduces the influence of noise data on forecast accuracy, so an Apriori-style algorithm suited to the smart grid should first be proposed.
Association rule analysis was originally used to reveal interesting links between items in basket data transactions and later to find hidden relationships in data [11]. Consider a database D containing h transactions, each composed of a number of items. If the set of all items is I = {i1, i2, …, in}, an association rule can be expressed in the form A → B (A ⊆ I, B ⊆ I, A ∩ B = ∅). The properties of association rules are described by the following two parameters.
Support: if s% of all transactions simultaneously contain the itemsets A and B, then s% is called the support of A → B; it can also be expressed as the ratio of the number of transactions containing both itemsets to the total number of transactions. Support represents the frequency of the rule and is written sup(A → B), with sup(A → B) = sup(A ∪ B). The minimum support is denoted min_sup.
Confidence: the probability of observing the consequent of the rule given that the transaction contains the antecedent. When c% of the transactions containing A also contain B, c% is the confidence of A → B. Confidence represents the strength of the rule and is written con(A → B), with con(A → B) = sup(A ∪ B)/sup(A). The minimum confidence is denoted min_con.
Association rule analysis determines the association rules in the transaction database D that satisfy the given thresholds min_sup and min_con. It is generally divided into two steps [12]: first, determine all frequent itemsets in the transaction database; second, generate association rules from the frequent itemsets, i.e., if B ⊂ A, B ≠ ∅ and con(B → (A − B)) ≥ min_con, then the itemsets A and B constitute the association rule B → (A − B).
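As a minimal illustration of these definitions (not the ARMQCL algorithm itself), the support and confidence of a candidate rule A → B can be computed directly from a transaction list; the item names below are hypothetical:

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

transactions = [{'temp_high', 'weekday', 'load_high'},
                {'temp_high', 'weekend', 'load_high'},
                {'temp_low', 'weekday', 'load_low'},
                {'temp_high', 'weekday', 'load_high'}]
A, B = {'temp_high'}, {'load_high'}
print(support(A | B, transactions), confidence(A, B, transactions))  # 0.75 1.0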

1.1.1. ARMQCL Algorithm


The Association Rules Mining algorithm based on the Quantitative Concept Lattice (ARMQCL) is proposed for smart grid applications to improve accuracy and time efficiency; it reduces both the number of database scans and the number of candidate itemsets.
In formal concept analysis, a formal context is usually denoted C = (X, D, R), where X is the object set, D is the attribute set and R is the binary relation between X and D, i.e., R ⊆ X × D; xRd means "the object x has the characteristic d". There is a particular indexed collection that represents the inherent lattice structure; this lattice represents the partial ordering that describes the relationship between the objects and their attributes in the context, and is defined as the Concept Lattice (CL) [13].

Definition 1: Each node in the CL is a tuple called a concept, e.g. (A, B), where A is the extension of the concept and B is the intension, denoted Extension(C) and Intension(C) respectively.
Definition 2: For C = (A, B), C′ = (|A|, B) is the quantitative concept of C, where |A| is the cardinality of A. The lattice constituted by quantitative concepts is defined as the Quantitative Concept Lattice (QCL); it quantifies the extension of the CL and ignores the specific extension information.
1.1.1.1. Construct Quantitative Concept Lattice
Based on the above discussion, the algorithm that constructs the QCL can be given.
Insert(QCL, C): insert a quantitative concept C = (N, B) into the QCL.
(1) If there is a concept C1i in the QCL with C1i ⇔ C, the intension of C is merged into the intension of C1i, i.e., Intent(C1i) = Intent(C1i) ∪ Intent(C), and the insertion is finished.
(2) Otherwise, assuming that C0 is a direct sup-concept of C, C is first inserted into the QCL as a sub-concept of C0, and the following operations are performed:
make C the direct sup-concept of each C1i (C1i < C0, C1i < C) and remove the link between C1i and C0;
insert into the QCL each Ck = (Nk, Bk) generated by joining C with the concepts of the QCL.
The generation algorithm of the QCL, Create_In_Attr(QCL), is described as follows:
Initialization: generate the QCL with the complete concept (O, {}) and the null concept ({}, all).
For i = 1 to n do: for each Ci ∈ C_ai do Insert(QCL, Ci).

1.1.1.2. Association Rules Mining Algorithm Based on Quantitative Concept Lattice

For a concept C = (A, B = {B1, B2, …, Bk}) in the QCL, support(B) = |A|/M is introduced, where M is the total number of transaction sets in the database TD.
Definition 3: If support(B) of C = (A, B) in the QCL is larger than the support threshold, C is called a frequent concept, and each intension item Bi (i = 1, 2, …, k) is a basic frequent item.
Based on the above, the ARMQCL algorithm can be described as follows:
(1) If the concepts C1 = (N1, B1) and C2 = (N2, B2) satisfy C2 ∈ sup(C1), the association rule B2 ⇒ B1 − B2 can be obtained with confidence N1/N2; otherwise, the rule B1 − B2 ⇒ B2 is obtained with confidence 100%.
(2) For C1 = (N1, B1) and C2 = (N2, B2) that have a nonempty common maximum sub-concept C = (N, B), i.e., there is no concept C3 = (N3, B3) such that C3 ∈ sup(C) ∧ C1 ∈ sup(C3) ∧ C2 ∈ sup(C3), the association rules between B1 and B2, B1 ⇒ B2 and B2 ⇒ B1, can be obtained with confidences N/N1 and N/N2 respectively.
The pseudocode of the ARMQCL algorithm is described as follows:
BEGIN
  Setnull(QCL);
  C0 := (all, ∅);
  Enqueue(QCL, C0);
  WHILE NOT empty(QCL) DO
  BEGIN
    L_node := Outqueue(Q);
    C_node := First-direct-sub-concept(L, L_node);
    WHILE ((C_node ≠ null) OR (C_node ≠ L_empty-concept_node)) DO
    BEGIN
      IF NOT (C_node in QCL) THEN
        IF Extent(C_node) / Extent(Ci) > Confidence_threshold
          THEN Output association rules;
        Enqueue(QCL, C_node);
      END;
      C_node := Next-direct-sub-concept(L, L_node);
    END;
  END;
END.

1.1.2. Generating Association Rules and Reduct Properties


The advantage of the proposed ARMQCL algorithm for smart grid applications is that the extracted rules are more intuitive and compact in form, and the time complexity of the algorithm is relatively low. Accordingly, the ARMQCL algorithm is used to determine feasible correlations between external factors and changes in the electric load. There are many influencing factors, such as date type, temperature, humidity, wind and so on; in this part, date type, temperature, humidity, wind and sunshine degree are taken as the inputs of the ARMQCL algorithm.
The first step is to preprocess the historical load forecasting data in the smart grid, change the storage structure of the historical data, and convert the original smart grid database into a Boolean database stored in binary form. Each external factor is discretized and encoded as follows:
The historical load data are divided into five categories using the Competitive Aggregation (CA) algorithm; the historical demand level at each hour is partitioned into [35.23, 49.99], [53.19, 59.55], [59.58, 63.34], [63.86, 69.04] and [81.12, 82.66].
According to the actual situation, the temperature data are summarized into five categories. When the average temperature is in [13, 20], the electricity consumption curve is stable and less volatile; when the temperature is in [20, 25], electricity consumption increases significantly; and when T > 25 degrees, electricity consumption grows steadily. The other two ranges are T ∈ [2, 13] and T < 2.
The date type is classified into three categories: weekdays, weekends and holidays.
According to research on the humidity range appropriate for the human body, humidity is divided into two categories: H > 0.8 and H ≤ 0.8.
Wind is processed in a manner similar to temperature.
Sunshine degree is divided into three conditions: sunny, cloudy and rainy.
The historical database is then converted into a Boolean database stored in binary form, using the binary values "0" and "1" to store the transaction set of each attribute.
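A minimal sketch of such an encoding is given below (our own illustration; the category boundaries follow the description above, and the attribute names are hypothetical):

def encode_record(temperature, date_type):
    # discretize temperature and date type into Boolean (0/1) columns
    temp_bins = [('T<2', temperature < 2),
                 ('2<=T<13', 2 <= temperature < 13),
                 ('13<=T<20', 13 <= temperature < 20),
                 ('20<=T<=25', 20 <= temperature <= 25),
                 ('T>25', temperature > 25)]
    date_bins = [('weekday', date_type == 'weekday'),
                 ('weekend', date_type == 'weekend'),
                 ('holiday', date_type == 'holiday')]
    return {name: int(flag) for name, flag in temp_bins + date_bins}

print(encode_record(22.5, 'weekday'))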
Secondly, the Boolean binary database is used as the input of the ARMQCL algorithm, from which association rules representing the correlation of the external factors with the electric load are mined. The extracted rules are as follows:
2 6 11
2 6 9
2 11
3 6 10
3 6 11
3 6 13
3 10
3 11
By comparing these rules with the classification numbers assigned when the database was converted to Boolean form, and matching them to the external factor items, we find that the main factors affecting the load are temperature and date type, i.e., the correlation of temperature and date type with the load is strong. The attributes that affect load forecasting can therefore be reduced to temperature and date type; thus the attribute reduction is completed by analyzing the association rules.

1.2. Load Forecasting By Using the FNNAR Model

In the new FNNAR model, the first step is attribute reduction based on the association rules detected by the Association Rules Mining algorithm based on the Quantitative Concept Lattice (ARMQCL). The FNN model then uses the reduced attributes as input to forecast the electric load in the smart grid, which improves forecasting accuracy.
A Fuzzy Neural Network (FNN) embeds fuzzy input signals and fuzzy weights into a conventional neural network (such as a feed-forward or Hopfield neural network). A fuzzy BP model is used in this paper; its implementation is described in detail in [14] and only outlined briefly here:
(1) Identify the input and output factors of the neural network;
(2) fuzzify and normalize the relevant data;
(3) train the network until a stable output is obtained;
(4) establish the mathematical model;
(5) forecast the load at the target time.

1.2.1. FNNAR Model


The first step of Fuzzy Neural Network based on Association Rules (FNNAR) model is
attribute reduction based on the extracted association rules. Secondly, the load
forecasting can be obtained by using the FNN model.
The detailed procedure of the FNNAR method is as follows (a rough code sketch is given after the list):
(1) Extract association rules with the ARMQCL algorithm suited to the smart grid.
(2) Carry out attribute reduction based on the extracted association rules.
(3) Determine the content of the load forecasting in the smart grid.
(4) Identify and preprocess abnormal historical smart grid data.
(5) Fuzzify and normalize the relevant data.
(6) Forecast the electric load in the smart grid with the FNN model.
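A rough skeleton of this pipeline is sketched below. It is our own illustration: the attribute reduction simply keeps the columns selected by the mined rules, and a scikit-learn MLPRegressor stands in for the fuzzy BP network of the paper.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

def fnnar_forecast(X, y, X_new, kept_columns):
    # attribute reduction: keep only the columns retained by the mined rules
    X_red, X_new_red = X[:, kept_columns], X_new[:, kept_columns]
    scaler = MinMaxScaler()                       # normalization step
    X_red = scaler.fit_transform(X_red)
    X_new_red = scaler.transform(X_new_red)
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    model.fit(X_red, y)                           # stand-in for the fuzzy BP network
    return model.predict(X_new_red)

# toy example: of three inputs, only temperature (0) and date type (1) are kept
rng = np.random.default_rng(0)
X = rng.random((100, 3)); y = 2 * X[:, 0] + X[:, 1]
print(fnnar_forecast(X, y, X[:5], kept_columns=[0, 1]))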
2. Simulation Analysis

Twenty-four-hour (hourly) historical load data from February to July of Hebei Province are used for the load forecasting simulation.

Figure 1. Comparison of actual and prediction value (load in MW versus time in h, for the actual value, the FNNAR prediction and the BP prediction).


Figure 1 shows the comparison between the load forecasting curve and the actual load curve. To further demonstrate the forecasting ability of the FNNAR model, a BP model is used for comparison. As can be seen, the fit of FNNAR is better than that of the BP model, which shows that the forecasting performance of FNNAR outperforms BP. The FNN overcomes the disadvantages of a single neural network and captures the nonlinear relationship between input and output more easily, thereby improving the prediction accuracy.

Figure 2. The forecasting performance (relative error in % versus time in h, with and without attribute reduction).

Figure 2 compares the relative error with and without the attribute reduction based on the detected association rules. Since attribute reduction excludes the influence of some irrelevant factors and reduces the interference of noise data, the forecast accuracy is enhanced when the forecasting attributes are reduced.
Table 1. The running time of the models

Date   Running time without attribute reduction   Running time with attribute reduction
d      9.384   1.692
d+1    8.684   1.569
d+2    8.326   1.487
d+3    8.544   1.501
d+4    8.366   1.493
d+5    8.621   1.546
d+6    8.457   1.472
Attribute reduction based on association rule analysis significantly reduces the size of the input set of the FNN model and thus the running time; Table 1 shows that the running time of the model with attribute reduction is much shorter than that of the model without it.

3. Conclusions

In this paper, a new load forecasting method adapted to the smart grid, the Fuzzy Neural Network based on Association Rule mining (FNNAR), is proposed. The FNNAR model first carries out attribute reduction based on association rule analysis in order to exclude the influence of irrelevant or unimportant factors on load forecasting, reduce the interference of noise data, and significantly reduce the size of the input set of the FNN model. The Association Rules Mining algorithm based on the Quantitative Concept Lattice (ARMQCL) is proposed to extract the association rules. The experimental results indicate that the proposed method offers better forecast precision with smaller forecast errors, and that it needs less running time. They prove that the FNNAR model is efficient and feasible and can be applied to the real conditions of an electricity market in a smart grid.

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China
under Grant No.61104005 and 61374097, by Natural Science Foundation of Liaoning
Province under Grant No.201202073, and by the Central University Fundamental
Research Foundation under Grant.N142304004.

References

[1] J. W. Cao, et al., Information system architecture for smart grid, Chinese Journal of Computers, 1(2013),
143-167.
[2] X. Fang, S. Misra, G. L. Xue, and D. J. Yang, Smart Grid-The New and Improved Power Grid: A Survey,
IEEE Communications Surveys & Tutorials, 4(2012), 944-980.
[3] Y. Ye, Y. Qian, H. Sharif, and D. Tipper, A Survey on Smart Grid Communication Infrastructures:
Motivations, Requirements and Challenges, IEEE Communications Surveys & Tutorials, 1(2013), 5-20.

[4] S. Y. Chen, S. F. Song, L. X. Li, and J. Shen, Survey on Smart Grid Technology, Power System
Technology, 8(2009), 1-7.
[5] H. Y. Zhao, L. C. Cai, and X. J. Li, Review of Apriori algorithm on Association Rules Mining, Journal
of Sichuan University &Engineering, 1(2011), 66-70.
[6] N. H. Liao, Z. H. Hu, Y. Y. Ma, and W. Y. Lu, Review of the short-term load forecasting methods of
electric power system, Power System Protection and Control, 1(2011), 147-152.
[7] P. R. Ji, J. Chen, and W. C. Zheng, Theory of grey systems and its application in electric load forecasting,
Proc. of Cybernetics and Intelligent Systems, 2008 IEEE Conference on IEEE, Chengdu, China, 2008,
1374-1378.
[8] X. H. Du, T. Feng, and S. Tan, Study of Power System Short-term Load Forecast Based on Artificial
Neural Network and Genetic Algorithm, Proc. of International Conference on Computational Aspects of
Social Networks, CASoN 2010, Taiyuan, China, 2010, 725-728.
[9] D. H. Zhang, and S. F. Jiang, Power Load Forecasting Algorithm Based on Wavelet Packet Analysis,
Proc. of Electric Power System & Automation, 2(2004), 987-990.
[10] M. G. Zhang and L. R. Li, Short-term load combined forecasting method based on BPNN and LS-SVM,
Power Engineering and Automation Conference (PEAM), 2011 IEEE, 1(2011), 319-322.
[11] S. Liu, Y. J. Yang, D. X. Chang, and W. Qiu, Improved fuzzy association rule and its mining algorithm,
Computer Engineering and Design, 4(2015), 942-946.
[12] S. Mallik, A. Mukhopadhyay, and Ujjwal Maulik, RANWAR: Rank-Based Weighted Association Rule
Mining from Gene Expression and Methylation Data, IEEE Transactions on Nanobioscience, 1(2015),
59-66.
[13] D. X. Wang, X. G. Hu, H. Wang, Algorithm of mining association rules based on Quantitative Concept Lattice, Journal of Hefei University of Technology (Natural Science), 5(2002), 678-682.
[14] H. Y. Yu, F. L. Zhang, Short-Term Load Forecasting Based on Fuzzy Neural Network, Power System
Technology, 3(2007), 68-72.
438 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-438

The Improved Projection Pursuit Evaluation Model Based on DEPSO Algorithm
Bin ZHU a,1 and Wei-Dong JIN b
a
School of Electronic Information Engineering, Yangtze Normal University,
Chongqing, 408100, China
b
School of Electrical Engineering, Southwest Jiaotong University, Chengdu, Sichuan,
610031, China

Abstract. In this paper, a new projection pursuit model based on differential


evolution & particle swarm optimization (DEPSOPP) is proposed. Firstly, a hybrid
differential evolution particle swarm optimization algorithm (DEPSO) is proposed
through the combination of the improved particle swarm optimization algorithm
with variable weights (VWPSO) and differential evolution algorithm (DE).
Secondly, the optimization of the projection direction is the key of the projection
pursuit algorithm. The hybrid DEPSO algorithm would be applied in the
optimization of the projection direction. Finally, the experiment results of test
functions show that the new hybrid DEPSO algorithm has the excellent
convergence, accuracy and robustness. This indicates that the DEPSOPP
evaluation model is a better feature assessment model.

Keywords. evaluation model, particle swarm optimization, projection pursuit,


differential evolution

Introduction

The projection pursuit (PP) algorithm is a statistical method for processing non-normal data [1]. Its idea is to project high-dimensional data onto a low-dimensional subspace according to the needs of the practical problem; a projection index function measures how likely a projection is to reveal some structure of the data, and the structural features of the high-dimensional data are then analyzed from the projected values [2-3]. The computation of PP is, however, very large. The authors therefore propose a new differential evolution particle swarm optimization projection pursuit algorithm by combining VWPSO and DE, and use it to optimize the projection direction of PP. Experimental results show that DEPSOPP achieves satisfactory optimization accuracy, convergence and robustness.

1
Corresponding Author: Bin ZHU, School of Electronic Information Engineering, Yangtze Normal
University, Chongqing, 408100, China; Email: zb8132002@163.com.
1. Construction of the DEPSOPP Evaluation Model

The particle swarm optimization algorithm (PSO) is a global optimization algorithm. It retains a population-based global search strategy and combines the advantages of self-learning and learning from others, so it can find the optimal solution within a small number of iterations [4-5].
In the standard particle swarm optimization algorithm (SPSO), each particle is a solution of the optimization problem and has a fitness value determined by the objective function. The velocity and position of the i-th particle in dimension d are updated according to Eqs. (1) and (2):

vid(t+1) = w·vid(t) + c1·rand1d·(pBestid(t) − xid(t)) + c2·rand2d·(gBestd(t) − xid(t))    (1)

xid(t+1) = xid(t) + vid(t+1)    (2)

In Eq. (1), v is the flight velocity of the particle, pBest is the local (personal) optimum, gBest is the global optimum, w is the inertia weight, c1 and c2 are the acceleration coefficients, and rand1d and rand2d are two random numbers drawn from the interval [0, 1]. In each iteration, the particle is updated using the individual extremum pBest and the global extremum gBest.
Since PSO converges prematurely easily and tends to oscillate in the vicinity of the global optimal solution in the late stage [6], the authors propose VWPSO, in which the weight changes according to Eq. (3). The initial inertia weight is largest, which favors the global search, and the late inertia weight is smaller, which favors an effective search in the vicinity of the current local extreme points. Thus both the global and the local search ability of PSO are enhanced.

w = wmax − (t − 1)(wmax − wmin) / (tmax − 1),  t ≥ 1    (3)

Where wmax denotes the maximum weight. wmin is the minimum weight. tmax is
the maximum number of iterations. t is the current iteration number.
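A minimal sketch of one VWPSO update follows (ours; the parameter values mirror the settings used later in the experiments):

import random

def inertia(t, t_max, w_max=0.9, w_min=0.4):
    return w_max - (t - 1) * (w_max - w_min) / (t_max - 1)          # Eq. (3)

def pso_step(x, v, p_best, g_best, t, t_max, c1=2.0, c2=2.0):
    w = inertia(t, t_max)
    new_v = [w * vi + c1 * random.random() * (pb - xi)
                    + c2 * random.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, p_best, g_best)]       # Eq. (1)
    new_x = [xi + vi for xi, vi in zip(x, new_v)]                   # Eq. (2)
    return new_x, new_v

print(pso_step([0.5, -0.2], [0.0, 0.0], [0.4, 0.1], [0.3, 0.0], t=10, t_max=2000))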

1.1. DEPSO Algorithm

Although the PSO has many advantages, it also has many problems, such as easy to fall
into local extremum, low searching accuracy and slow convergence speed in the late
evolutionary. Therefore, in order to improve the optimization ability and robustness of
PSO, the authors consider the introduction of DE and hope to improve particle swarm
algorithm through the superiority of DE in the maintenance of population diversity and
search ability.
Consider the minimization problem of a function f(x):

min f(x),  x = [x1, x2, …, xn],  lk ≤ xk ≤ uk,  k = 1, 2, …, n    (4)

where lk and uk denote the lower and upper search bounds of the k-th variable and n is the number of variables. Assume that xi(t) = [xi1(t), xi2(t), …, xin(t)] is the i-th individual of the t-th generation population and that x1, x2 ∈ x; the difference vector composed of them is Δd12 = x1 − x2. The input parameters include the particle population size and the maximum number of iterations, and the output is the extremum pgd of the objective function.
objective function.
The improved particle swarm optimization algorithm and differential evolution
algorithm are combined to get a new optimization algorithm. The whole procedure of
DEPSO algorithm is shown in Figure 1.

Figure 1. The flow chart of the DEPSO algorithm. The procedure is: (1) initialize the PSO and DE populations; (2) calculate the individual fitness values fit_pso of the PSO population and fit_de of the DE population, and set the initial individual optima pid(i) and the global optimum pgd; (3) while the iteration condition holds, update the position and velocity of each PSO particle according to the global optimum pgd, and perform mutation, crossover and selection on the DE individuals; (4) recalculate fit_pso and fit_de, and update pid(i) and pgd by comparing fit_pso, fit_de and pid(i); (5) if the termination condition is not satisfied, return to step (3), otherwise stop.
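The loop of Figure 1 can be sketched roughly as follows (our own simplified illustration, not the authors' code; parameter values are examples):

import random

def depso(f, dim, n_particles=20, n_iter=200, lo=-5.0, hi=5.0):
    rnd = lambda: [random.uniform(lo, hi) for _ in range(dim)]
    pso = [rnd() for _ in range(n_particles)]            # PSO sub-population
    vel = [[0.0] * dim for _ in range(n_particles)]
    de = [rnd() for _ in range(n_particles)]              # DE sub-population
    p_best = [p[:] for p in pso]
    g_best = min(pso + de, key=f)                          # shared global best
    for t in range(1, n_iter + 1):
        w = 0.9 - (t - 1) * 0.5 / (n_iter - 1)             # decreasing inertia weight
        for i in range(n_particles):
            # PSO update toward personal and shared global best
            vel[i] = [w * v + 2 * random.random() * (pb - x) + 2 * random.random() * (gb - x)
                      for v, x, pb, gb in zip(vel[i], pso[i], p_best[i], g_best)]
            pso[i] = [x + v for x, v in zip(pso[i], vel[i])]
            if f(pso[i]) < f(p_best[i]):
                p_best[i] = pso[i][:]
            # DE/rand/1 mutation, binomial crossover, greedy selection
            a, b, c = random.sample([j for j in range(n_particles) if j != i], 3)
            mutant = [de[a][d] + 0.5 * (de[b][d] - de[c][d]) for d in range(dim)]
            trial = [m if random.random() < 0.9 else x for m, x in zip(mutant, de[i])]
            if f(trial) < f(de[i]):
                de[i] = trial
        g_best = min([g_best] + p_best + de, key=f)        # share the best of both populations
    return g_best, f(g_best)

print(depso(lambda x: sum(xi * xi for xi in x), dim=5))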



1.2. DEPSOPP Evaluation Model

1.2.1. Projection Objective Function


According to the evaluation index system, the original projection data are prepared. Assume that n is the sample size, np is the number of indices, and x*(i, j) is the value of the j-th index of the i-th sample, i = 1, 2, …, n, j = 1, 2, …, np; xmin(j) and xmax(j) are the minimum and maximum values of the j-th index. The projection data are normalized according to Eqs. (5) and (6):

x(i, j) = (x*(i, j) − xmin(j)) / (xmax(j) − xmin(j)),  if x*(i, j) is a benefit index    (5)

x(i, j) = (xmax(j) − x*(i, j)) / (xmax(j) − xmin(j)),  if x*(i, j) is a cost index    (6)

Let a = (a(1), a(2), …, a(np)) be the projection direction. The one-dimensional projection value z(i) of x(i, j) in the direction a is obtained by PP as

z(i) = Σ_{j=1}^{np} a(j)·x(i, j),  i = 1, 2, …, n    (7)

In practical applications, the projection value z(i) should extract as much of the variation information in x(i, j) as possible. Therefore, the standard deviation sz of z(i) should be as large as possible, and the absolute value rzy of the correlation coefficient between z(i) and y(i) should also be as large as possible. The projection objective function is therefore chosen as

f(a) = sz · rzy    (8)

1.2.2. Objective Function Optimization


The projection index function f (a) changes with the projection direction a . The best
projection direction is determined by solving the maximum value of the projection
objective function. Therefore, the problem is transformed into a nonlinear optimization
problem with a ( a (1), a (2), , a( n p )) variables. If the nonlinear problem is solved by
DEPSO algorithm proposed above, we can obtain the best projection direction. Optimal
projection vectors are normalized, after that, we can obtain the index weight of
evaluation index.
a* = arg max_a { sz · rzy },
s.t.  Σ_{j=1}^{np} a(j)^2 = 1,  0 ≤ a(j) ≤ 1    (9)

Based on the above assumption, the problem of weight solving has been
transformed into the extremum problem of the objective function. In this case, the
weight is corresponding to the projection direction vector a of PP algorithm.
Fortunately, the hybrid DEPSO algorithm is good at seeking the extremum of the
objective function. After solving the extremum of the objective function that presented
by Eq. (9), we can get the projection direction a ( a (1), a (2), , a( n p )) , and then, we
can get the weight of different evaluation index.
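A minimal numerical sketch of Eqs. (5)-(9) follows (ours): the indices are normalized, projected onto a direction a kept on the unit sphere, and the objective sz·rzy is evaluated; any optimizer, such as the DEPSO procedure above, can then maximize it over a.

import numpy as np

def normalize_benefit(X):
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))    # Eq. (5)

def objective(a, X, y):
    a = np.abs(a) / np.linalg.norm(a)        # enforce sum a(j)^2 = 1, a(j) >= 0
    z = X @ a                                # Eq. (7)
    s_z = z.std(ddof=1)
    r_zy = abs(np.corrcoef(z, y)[0, 1])
    return s_z * r_zy                        # Eq. (8)

rng = np.random.default_rng(1)
X = normalize_benefit(rng.random((30, 4)))
y = X @ np.array([0.7, 0.5, 0.3, 0.4]) + 0.05 * rng.standard_normal(30)
print(objective(np.ones(4), X, y))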

2. Testing of DEPSO Algorithm

The performance of an optimization algorithm should be tested before it is applied, so three typical test functions were selected for the test analysis: the Rosenbrock, Rastrigin and Griewank functions.
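For reference, the standard forms of these three well-known benchmarks are sketched below (our own definitions; the dimensions and search ranges used in the experiment are not restated here):

import math

def rosenbrock(x):
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

def griewank(x):
    s = sum(xi ** 2 for xi in x) / 4000
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return s - p + 1

print(rosenbrock([1, 1]), rastrigin([0, 0]), griewank([0, 0]))  # all minima are 0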
The Rosenbrock function is a unimodal function whose extremum is approached only slowly, which makes it hard for an optimization algorithm to reach the global optimum; it can therefore be used to examine the local optimization ability of an algorithm. The Rastrigin and Griewank functions are multimodal, and optimizing such multi-extremum functions easily falls into local extrema, so they can effectively test the global optimization capability of DEPSO. Based on the above analysis, we carried out the following experiment.
Table 1. Result for all algorithms on benchmark functions

Benchmark function   Fitness   SPSO   VWPSO   DE   DEPSO
best 8.7682E+00 3.0159E-02 1.0109E-02 3.8032E-05
worst 3.7592E+01 9.4936E+00 8.7380E+00 1.4919E-01
f1
mean 1.8756E+01 3.6217E+00 3.5090E+00 2.5069E-02
Std. 7.6947E+00 2.3592E+00 2.6193E+00 3.5839E-02
best 2.9028E+00 9.9772E-01 3.1064E+00 0.0000E+00
worst 2.0098E+01 2.3879E+01 1.0203E+00 3.9798E+00
f2
mean 1.0004E+01 1.1741E+01 6.6814E+00 1.3012E+00
Std. 3.8116E+00 4.6106E+00 1.8781E+00 1.3031E+00
best 6.1226E-02 4.6796E-02 6.0007E-03 0.0000E+00
worst 5.3671E-01 3.3443E-01 7.2649E-02 4.4947E-02
f3
mean 2.4422E-01 1.5402E-01 2.7571E-02 5.6192E-03
Std. 1.2565E-01 7.3302E-02 1.6244E-02 1.0732E-02
The population size of every algorithm is 40 and the maximum number of iterations is 2000. The SPSO parameters are w = 0.9, c1 = c2 = 2; the VWPSO parameters are wmax = 0.9, wmin = 0.4, c1 = c2 = 2; the DE parameters are cr = 0.9, f = 0.5; and the DEPSO parameters are the same as those of VWPSO and DE. Each test function is run 30 times. The experimental results are shown in Table 1.
As can be seen from Table 1, the search accuracy and convergence of the DEPSO algorithm are better than those of the SPSO, VWPSO and DE algorithms, and the variance of the fitness shows that its robustness is also better. The Rastrigin and Griewank functions are multimodal and require a strong global search capability; Table 1 shows that the convergence of DEPSO is very good and that its solution accuracy is the best among all the algorithms. DEPSO can always find the optimal value of the test function, and its robustness is also very good.

3. Conclusions

A new feature evaluation model (the DEPSOPP model) is presented in this paper. The authors optimize the projection vectors of the PP algorithm by combining the VWPSO and DE algorithms. The simulation results show that the algorithm has good convergence, robustness and accuracy. The aim of this paper is to solve the evaluation problem of radar emitter signal features, for which there is no real-time requirement; therefore, the complexity of the algorithm is given less consideration.

Acknowledgments

This work was supported by the state key program of national natural science of China
(Grant No. 61134002), the natural science foundation of Chongqing City (Grant No.
CSTC2013JCYJA70010) and the science research project of Chongqing Education
Commission (Grant No. KJ1401224).

444 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-444

HRV-Based Stress Recognizing by Random Forest
Gang ZHENG a,b,1, Yan-Hui CHEN a,b and Min DAI a,b
a Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
b School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China

Abstract. When attempting to recognize mental stress using heart rate variability (HRV), single classification models tend to have lower accuracy in detecting different stress levels and are prone to overfitting, which affects the accuracy of stress recognition. This study employed the ensemble learning method of random forests (RF) and proposed a method to recognize stress using HRV. By analyzing short-term (120-180 s) electrocardiography (ECG) data recorded while the subjects played a stress-inducing video game, we extracted HRV features using time-domain, frequency-domain and non-linear methods. Next, we constructed a stress recognition model based on the RF technique, which was able to identify low, medium and high levels of stress. The model was then applied to 200 groups of stress-level data collected from the 10 subjects. The results showed that, compared to the traditional k-nearest neighbor (KNN) and logistic regression (LR) methods, the RF model could automatically detect and identify different stress levels with higher accuracy, reaching 90% accuracy in recognizing higher levels of stress.

Keywords. heart rate variability, stress recognition, logistic regression, random forests

Introduction

Psychological stress is an important factor that affects individuals’ lives in modern society. Effective monitoring and evaluation of mental stress play a significant role in improving individuals’ mental and physical health, as well as enhancing their quality of life. HRV is a technique that extracts physiological information from ECG readings, and it has been proven to be an effective method of detecting stress [1-3].
In recent years, studies utilizing HRV to monitor and recognize stress have usually extracted stress-related features using time-domain and frequency-domain methods, and then applied one of two analysis approaches: statistical analysis [4-7] or a stress recognition model [8-11]. Statistical analysis examines the changes in HRV indicators observed under different states of stress and identifies the correlation between HRV and stress; it is a commonly used method of detecting changes in mental stress. Subahni et al. [4] collected ECG readings from six

1 Corresponding Author: Gang ZHENG, Professor, School of Computer and Communication Engineering, Tianjin University of Technology, Room 317, #7 Building, Tianjin, China; E-mail: kenneth_zheng@vip.163.com

subjects to extract the HRV features for the time and frequency domains for further
analysis, and proved that HRV could be used to predict mental stress by detecting
changes in the autonomous nervous system (ANS). Construction of stress recognition
models refers to using computer analysis to establish a HRV-stress relationship model
to analyze and detect stress. A well-developed model will usually have higher accuracy
in detecting stress, when compared to the observation of statistical analysis. Currently,
analysis methods that utilize HRV in detecting stress include: K-nearest neighbors
(KNN), probabilistic neural networks (PNN), linear discriminant analysis (LDA), and
fuzzy clustering. Karthikeyan et al. [8] adopted the Stroop color-word test, applied PNN and KNN classifiers to classify features of short-term (32 s) HRV and ECG readings, and achieved a 91% accuracy rate in recognizing the two states: stressed and normal. However, most HRV-based stress recognition methods apply a traditional, single classification model to detect stress, which is likely to cause overfitting and affect the accuracy of stress recognition.
This article introduces the RF technique into ECG-based stress analysis, combining the HRV feature set extracted with time-domain, frequency-domain and non-linear methods to construct a mental stress recognition model that identifies different levels of stress.

1. Stress Induction and Extraction of HRV Features

1.1. Stress Induction and Label Assessment

Traditional laboratory stimuli that are used to induce stress include color word-based
tests, mental arithmetic tasks, pictures tests, video tests, and video game tasks[12-14].
Given that induction of mental stress is likely to be affected by the differences in
factors such as personal experience and psychological quality, it is hard to determine
the label of stress levels in most traditional stress induction experiments. In order to
scientifically determine the labels, the study employed a video game task, as it has a higher stimulating effect on the senses. Multiple difficulty levels were set to induce mental stress. At the same time, the facial expressions of the subjects and the game parameters were recorded to assist the assessment of stress labels. In addition to participants’ subjective answers to a questionnaire evaluating their stress level, their facial expressions and the game parameters were also recorded. Human observation of the participants’ facial expressions was applied in the experiment to assess their level of commitment. If a participant looked absent-minded, then the data
would be removed from the data set to avoid label confusion. Game parameters refer to
the information that could reveal the subjects’ stress level. One such example was the
number of mistakes that was made by a subject, as making an excessive amount of
mistakes tends to cause greater mental stress. Therefore, we set difficulty levels for the
game, subsequently introducing facial expressions and game parameters as objective
indicators to further verify the results of subjects’ self-reported stress levels in the
questionnaire, so as to achieve more accurate labels for stress levels.

1.2. Extraction of HRV Features

In order to recognize mental stress with HRV signals, features that reflect HRV information need to be derived from the original ECG readings. Therefore, this study requires preprocessing of the ECG data, as well as extraction of the feature parameters of the HRV signals.
The ECG signal of the human body is a weak signal with a low signal-to-noise ratio (SNR). A normal ECG signal frequency ranges from 0.05 to 100 Hz, whilst 90% of the spectral energy of the signal is distributed between 0.25 and 35 Hz [15]. The acquisition of ECG signals is subject to interference from various kinds of noise. Therefore, it is necessary to remove the noise generated during ECG acquisition in order to effectively detect and locate the R-wave peaks in the ECG signals and to obtain accurate HRV signals. Hence, we preprocessed the original ECG signal for our study.
Figure 1 shows processing steps of the original ECG signal: First, an ECG was used to
obtain the original ECG readings. Then, a wavelet thresholding technique [16] was
adopted to remove noise from the original signals. The coif4 wavelet function was used
and an initial threshold was set to remove the noise and baseline drift from the original
ECG signal. Next, we applied a window thresholding method to detect the R-wave
peak in the ECG waveform, after noise reduction. Lastly, the RR intervals were
acquired based on the R-wave peak positions, the ectopic beats were removed, and the
time series of the RR intervals or HRV signals, were obtained.

Figure 1. Extracting HRV signals from ECG
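To make the preprocessing chain concrete, the following is a minimal sketch in Python of the steps described above (wavelet-threshold denoising with the coif4 wavelet, R-wave peak detection, RR-interval extraction and crude ectopic-beat rejection). It assumes a NumPy array `ecg` sampled at 250 Hz; the threshold rule, decomposition level, peak-detection parameters and ectopic-beat criterion are illustrative choices, not the authors' exact settings.

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

FS = 250  # ECG sampling frequency in Hz, as used in the experiment

def denoise_ecg(ecg, wavelet="coif4", level=6):
    """Wavelet-threshold denoising; the universal threshold and level are illustrative."""
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate from the finest scale
    thr = sigma * np.sqrt(2.0 * np.log(len(ecg)))          # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    coeffs[0] = np.zeros_like(coeffs[0])                   # drop the approximation to suppress baseline drift
    return pywt.waverec(coeffs, wavelet)[: len(ecg)]

def extract_rr(ecg):
    """Detect R peaks in the denoised ECG and return the RR-interval series (seconds)."""
    clean = denoise_ecg(np.asarray(ecg, dtype=float))
    peaks, _ = find_peaks(clean,
                          distance=int(0.4 * FS),          # refractory period of roughly 0.4 s
                          height=np.mean(clean) + 1.5 * np.std(clean))
    rr = np.diff(peaks) / FS
    med = np.median(rr)                                    # crude ectopic-beat rejection
    return rr[np.abs(rr - med) < 0.2 * med]
```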

Using HRV to recognize individuals’ mental stress requires extracting feature parameters that reflect the HRV information. The HRV features used in this study were mainly extracted through time-domain, frequency-domain, and non-linear methods.
The time-domain method was used to analyze the variations in heart rate, detect transient changes of heart rate under stimulation, and measure HRV. From the statistical indicators computed on sequences of normal RR intervals, we selected three commonly used indicators that are suitable for short-term HRV analysis: the standard deviation of NN intervals (SDNN), the root mean square of successive differences between adjacent NNs (RMSSD), and the proportion of pairs of successive NNs that differ by more than 50 ms relative to the total number of NNs (pNN50).
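A short sketch of these three time-domain indicators, assuming the NN (RR) intervals are given in milliseconds; it follows the standard definitions rather than any code released by the authors.

```python
import numpy as np

def time_domain_features(rr_ms):
    """SDNN, RMSSD and pNN50 from a series of NN (normal RR) intervals in milliseconds."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr_ms)
    sdnn = np.std(rr_ms, ddof=1)                              # standard deviation of NN intervals
    rmssd = np.sqrt(np.mean(diff ** 2))                       # root mean square of successive differences
    pnn50 = 100.0 * np.sum(np.abs(diff) > 50.0) / rr_ms.size  # pairs differing by >50 ms / total NNs (%)
    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}
```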
Frequency-domain methods can be used to describe the distribution of energy by
assigning complex heart rate fluctuation signals to different frequency bands, so that
the effect of various physiological factors could be properly separated before further
analysis [17]. The variation of RR intervals reflects the activities of the sympathetic and vagus nerves in the ANS; however, this cannot be demonstrated effectively by the time-domain statistical indicators of HRV, whereas the frequency-domain method can provide a quantitative description of the regulatory impact that the sympathetic and vagus nerves have on the heart rate. Given that RR intervals are not uniform, resampling is usually required in order to obtain uniform RR intervals and acquire a spectrum with the Fast Fourier Transform (FFT). We employed a more effective method, the Lomb-Scargle periodogram [18], for the spectral analysis of the non-uniform RR intervals, to avoid the data loss caused by the resampling process. Using the Lomb-Scargle periodogram to calculate how the spectrum of the RR intervals changes over time, we extracted four commonly used HRV spectral indices: very low frequency (VLF), low frequency (LF), high frequency (HF), and the ratio between the low and high frequency (LF/HF).
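The band powers can be sketched as follows with SciPy's Lomb-Scargle periodogram. The band limits are the conventional short-term HRV bands (the paper does not state its exact limits), and the evaluation grid and integration rule are illustrative choices.

```python
import numpy as np
from scipy.signal import lombscargle

# Conventional HRV bands in Hz; assumed here, not quoted from the paper.
BANDS = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}

def spectral_features(rr_s):
    """Band powers of the (non-uniform) RR series via the Lomb-Scargle periodogram."""
    rr_s = np.asarray(rr_s, dtype=float)
    t = np.cumsum(rr_s)                            # beat times (s); irregular, so no resampling needed
    y = rr_s - rr_s.mean()
    freqs = np.linspace(0.003, 0.40, 400)          # evaluation grid in Hz
    pgram = lombscargle(t, y, 2 * np.pi * freqs)   # scipy expects angular frequencies
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        out[name] = np.trapz(pgram[mask], freqs[mask])   # approximate band power
    out["LF/HF"] = out["LF"] / out["HF"] if out["HF"] > 0 else np.nan
    return out
```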
HRV is believed to be non-linear, as it is affected by the complex interaction of hemodynamics, electrophysiology, body fluids, and neuromodulation. The Poincaré plot is a commonly used method for non-linear HRV analysis. We used two feature indicators that describe the Poincaré plot, the vector length index (VLI) and the vector angle index (VAI), to analyze the impact of different states of stress on HRV.

2. Stress Recognition Model Based on Random Forests Ensemble Learning

2.1. Random Forest Classifier Model

RF is a relatively new ensemble learning method. It randomizes samples and features to generate a number of tree models, and then takes a vote (classification) or the mean (regression) of the predictions of these models to obtain the final result. A large number of theoretical and empirical studies have shown that RF has a high prediction accuracy, a good tolerance of outliers and noise, and a low likelihood of overfitting [19-20].

2.2. Stress Classification Models Using HRV

In our study, the training set represented by the features of HRV and the stress label is
expressed as
T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}                (1)
where x_i ∈ χ ⊆ R^n is the HRV feature vector and y_i ∈ γ = {c_1, c_2, …, c_m} is the class of the HRV information: when m = 2, c_1 = 0 indicates the relaxed state and c_2 = 1 represents a (low, medium or high) stress state. We employed the RFC
model to establish the relationship between HRV and the states of mental stress, to
achieve the purpose of recognizing various states of stress through the HRV data, or to
permit a HRV feature vector x to get its classification y. The following steps were
adopted to construct the stress recognition model based on RF:
1) From the given HRV training set T, a bootstrap training sample was acquired by sampling N times with replacement.
2) For each acquired sample T*, one classification tree model h(x, Θ) was established. During model construction, supposing the training set has M HRV features, M_try (M_try < M) features were randomly selected from the M features at each node of each tree, and one of the M_try features was selected to split the node according to the principle of minimum node impurity. This process was repeated for each node of the classification tree until the HRV vector of each sample could be accurately classified or all the features had been used, at which point the growth of the classification tree ceased.
3) Steps 1) and 2) were repeated to establish k classification trees. No pruning was performed after the trees were established.
4) Each classification tree model predicts the input HRV feature vector, yielding k results.
5) The final classification result y is decided by voting over the predictions of the k classification trees:

y = arg max_{c_j} Σ_{i=1}^{k} I(h_i(x) = c_j),  j = 1, 2, …, m                (2)

where h_i is a single CART model, y is the output variable, and I(·) is the indicator function.
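A minimal sketch of such an RF recognition model using scikit-learn, whose RandomForestClassifier implements this bagging-plus-random-feature-selection scheme (the trees' predictions are aggregated by averaging class probabilities, which plays the role of the vote in Eq. (2)). The feature matrix X, the labels y and the hyper-parameter values shown are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# X: one row of HRV features per ECG segment (e.g. SDNN, RMSSD, pNN50, VLF, LF, HF, LF/HF, VLI, VAI);
# y: stress labels (0 = relaxed, 1 = stressed). Both are assumed to be prepared beforehand.
def build_rf(X, y, n_trees=1000, mtry=4, seed=0):
    """Bagged classification trees with random feature selection at each split."""
    rf = RandomForestClassifier(n_estimators=n_trees,   # number of classification trees (k)
                                max_features=mtry,      # M_try features examined per split
                                bootstrap=True,         # sampling with replacement (step 1)
                                random_state=seed)
    return rf.fit(X, y)
```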

3. Experiment and Results

3.1. Data Collection

We employed a self-developed single-lead ECG device with V5 leads connected through three points on the frontal chest area. The ECG sampling frequency was set at 250 Hz. The subjects of the experiment were 10 students (5 male and 5 female) aged between 18 and 25 years old and in good health. 20 groups of data were collected for each subject, giving 200 data sets in total.
We utilized the mobile version Rhythm Masters Game with different difficulty
levels to induce a low level, medium level or high level of stress, whilst the state of
relaxation was acquired when the subjects were sitting quietly. While acquiring the
ECG readings, a camera was used to record the facial expressions of the subjects and
reproduce the operational parameters of the game when it ended to assist the stress
labeling process.
When the experiment was completed, subjects were asked to fill out a
questionnaire to examine whether they perceived stress during the experiment, and rate
their stress level, with a scale of: highly stressed, stressed, slightly stressed, or not
stressed at all. In addition to the adoption of a subjective questionnaire, the experiment
also introduced objective information, such as the facial expressions of the subjects and
the operational parameter of the game to verify the subjective evaluation of the stress
level. Four types of stress level labels were obtained in the end.

3.2. Results and Analysis

In order to investigate the performance of HRV stress recognition with the RFC, we conducted a comparative analysis between our model and the KNN and LR models used in the most relevant studies. We employed the three models to identify the states relaxed/low-stress, relaxed/medium-stress, and relaxed/high-stress separately. For each recognition task, a 5-fold cross-validation method was adopted for parameter adjustment. The data were randomly divided into five portions, wherein
four portions were used as training sets, and one reserved as a testing set. The receiver
operating characteristic (ROC [21]) curve and the average accuracy rate from twenty
5-fold cross-validation tests were used as the performance evaluation index for the
model.
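A sketch of this evaluation protocol, assuming the HRV feature matrix X and binary labels y (relaxed vs. one stress level) are already prepared; the hyper-parameter values mirror those later reported in Table 1, and cross-validated probabilities are used to compute the AUC.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score, accuracy_score

def compare_models(X, y, seed=0):
    """5-fold cross-validated AUC and accuracy for KNN, LR and RFC on the same data."""
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=7),
        "LR":  LogisticRegression(penalty="l2", tol=1e-4, max_iter=1000),
        "RFC": RandomForestClassifier(n_estimators=1000, max_features=4, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
        results[name] = {"AUC": roc_auc_score(y, proba),
                         "accuracy": accuracy_score(y, proba > 0.5)}
    return results
```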
Figure 2 shows the value of the ROC curve and the area under the curve (AUC) of
the three models, KNN, LR, RFC, under the condition of the recognition target being
set as: relaxed and low-stress, relaxed and medium-stress, and relaxed and high-stress
respectively. The diagonal dotted line shows the result of a random guess, the AUC of
which is 0.5.
The closer the ROC curve is to the upper left corner, the larger the AUC, and the
better the performance of the model. Whereas the closer the ROC curve is to the
diagonal dotted line, the closer AUC is to 0.5, and the closer the performance of the
model is to a random result. According to Figure 2, when recognizing the states of
relaxed and low-stress, the AUC of KNN, LR, and RFC model were 0.62, 0.75, and
0.78 respectively. The performance of the LR and RFC was close, whilst the
performance of the KNN was relatively poorer than the other two models. The overall
performance of the three models in stress recognition was not particularly accurate.
When recognizing the states of relaxation and medium stress, the AUCs of the KNN, LR, and RFC models were 0.72, 0.90, and 0.95 respectively. The RFC model achieved good recognition performance, followed by the LR model, whilst the recognition performance of the KNN was still below par. When recognizing the states of relaxation and high stress, the AUCs of the KNN, LR, and RFC models were 0.88, 0.96, and 0.96 respectively. In this situation, the recognition performances of all three models were acceptable. Thus, as the stress level increased, the recognition performance of the three models improved accordingly, whilst the RFC and LR showed better recognition performance than the KNN.
Table 1. Classification results of the three classifiers for the relaxed state and different levels of stress
State                     Classifier   Accuracy rate (%)   Key parameters
Relaxed / Low Stress      KNN          60.38               K=7
                          LR           75.43               penalty='l2', tol=0.0001
                          RFC          73.35               Tree_nums=1000, Mtry=4
Relaxed / Medium Stress   KNN          64.46               K=7
                          LR           89.94               penalty='l2', tol=0.0001
                          RFC          91.03               Tree_nums=1000, Mtry=4
Relaxed / High Stress     KNN          84.51               K=7
                          LR           93.45               penalty='l2', tol=0.0001
                          RFC          93.88               Tree_nums=1000, Mtry=4
The model parameters were adjusted with 5-fold cross-validation. Table 1 shows
the optimal parameters and recognition accuracy of the three models when recognizing
relaxed and low-stress, relaxed and medium-stress, and relaxed and high-stress.
In the above table, an l2 penalty was used as the regularization constraint and “tol=0.0001” is the stopping tolerance of training. For the RFC, “Tree_nums” denotes the number of component classifiers in the forest and “Mtry” the number of randomly selected features considered at each split.

Figure 2. The ROC of KNN, LR, and RFC Model When Recognizing Relax and Low Stress, Relax and
Medium Stress, and Relax and High Stress

4. Conclusions

This study provided an automatic stress recognition method based on the RF technique
and the utilization of HRV signals, which can help individuals monitor and recognize
mental stress. A game task was applied to induce the state of relaxation and low-level,
medium-level, or high-level mental stress, HRV of ECG signal were obtained, and
features of the HRV were extracted. After computing by KNN, LR, and RFC models,
their recognition accuracy of relaxed and the three stress levels were acquired. The
computation complexity of KNN and LR is O(n), and that of RFC is O(mnlogn), n is
the amount of sample, m is the amount of feature. Since KNN’s computation
complexity is raised by n, the number of samples. Therefore, its computation
complexity is rather low, which was easy to be performed. The computation
complexity of REC is depended on n, and m (number of feature), which is means that
its computation time is longer than KNN, but it is still acceptable. The results showed
G. Zheng et al. / HRVBased Stress Recognizing by Random Forest 451

that HRV can be used to recognize changes in the state of mental stress, and is
especially sensitive when the stress levels are high. In addition, a comparative analysis
of the three classifiers revealed that when using HRV to recognize mental stress, the
RFC model appeared to have better overall recognition performance, and its
recognition accuracy was greater than 90%, when the stress levels were high.

Acknowledgment

This paper was supported by the Tianjin Natural Science Foundation (16JCYBJC15300).

References

[1] P. Ferreira, P. Sanches, K. Höök, et al. How to empower users to cope with stress, Proceedings of the 5th
Nordic conference on Human-computer interaction, 2008, 123-132.
[2] J. P. Niskanen, M. P. Tarvainen, P. O. Ranta-Aho, et al. Software for advanced HRV analysis. Computer
methods and programs in biomedicine, 1(2004), 73-81.
[3] D. W. Rowe, J. Sibert, Irwin D. Heart rate variability: Indicator of user state as an aid to
human-computer interaction. Proceedings of the SIGCHI conference on Human factors in computing
systems, 1998, 480-487.
[4] A. R. Subahni, L. Xia, A. S. Malik. Association of mental stress with video games, The 4th IEEE
International Conference on Intelligent and Advanced Systems (ICIAS), 2012, 82-85.
[5] C. Wang, F. Wang. An emotional analysis method based on heart rate variability, 2012 IEEE EMBS,
2012, 104-107.
[6] J. Zhang, A. Nassef, M. Mahfouf, et al. Modelling and analysis of HRV under physical and mental
workloads. Modeling and Control in Biomedical Systems. 1(2006), 189-194.
[7] Y. H. Lee, V. Shieh, C. L. Lin, et al. A Stress Evaluation and Personal Relaxation System Based on
Measurement of Photoplethysmography, The Second IEEE International Conference on Robot, Vision
and Signal Processing (RVSP), 2013, 182-185.
[8] P. Karthikeyan, M. Murugappan, S. Yaacob. Detection of Human stress using Short-Term ECG and
HRV signals. Journal of Mechanics in Medicine and Biology, 2(2013), 1-29.
[9] B. Kaur, J. J. Durek, B. L. O'Kane, et al. Heart rate variability (HRV): an indicator of stress, SPIE
Sensing Technology + Applications International Society for Optics and Photonics, 2014:
91180V-91180V-8.
[10] J. Choi, R. Gutierrez-Osuna. Using heart rate monitors to detect mental stress. 6th IEEE International
Workshop on Wearable and Implantable Body Sensor Networks, 2009, 219-223.
[11] M. Kumar, M. Weippert, R. Vilbrandt, et al. Fuzzy evaluation of heart rate signals for mental stress
assessment. IEEE Transactions on Fuzzy Systems, 5(2007), 791-808.
[12] M. Svetlak, P. Bob, M. Cernik, et al. Electrodermal complexity during the Stroop colour word test.
Autonomic Neuroscience, 1(2010), 101-107.
[13] C. Ring, M. Drayson, D. G. Walkey, et al. Secretory immunoglobulin A reactions to prolonged mental
arithmetic stress: inter-session and intra-session reliability. Biological psychology, 1(2002), 1-13.
[14] M. H. Choi, S. J. Lee, J. W. Yang, et al. Changes in cognitive performance due to three types of
emotional tension. Database Theory and Application, Bio-Science and Bio-Technology, Springer Berlin
Heidelberg 2010, 258-264.
[15] R. Q. Yan, Y. Q. Zhan, et al. Study of automatic detection on 12 leads Electrocardiogram, Chinese
Journal of Medical Instrumentation, 2(2002)88-91.
[16] P. Karthikeyan, M. Murugappan, S. Yaacob. ECG signal denoising using wavelet thresholding
technique in human stress assessment. International Journal on Electrical Engineering and Informatics,
2(2012), 306-319.
[17] H. Zong, C. C. Liu. Study on ECG signal processing and HRV analysis, Shandong University, 2009.
[18] R. H. D. Townsend. Fast calculation of the Lomb-Scargle periodogram using graphics processing units.
The Astrophysical Journal Supplement Series, 2(2010), 895-895.
[19] K. Fang, J. B. Wu, et al. A review of Technologies on Random Forrest. Statistics & Information Forum,
3(2012), 32-38.
[20] L. Breiman. Random Forests. Machine Learning, 1(2001), 5-32.
[21] T. Fawcett. An introduction to ROC analysis. Pattern recognition letters, 8(2006), 861-874.
452 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-452

Ricci Flow for Optimization Routing in WSN
Ke-Ming TANG a, Hao YANG a,b,1, Xin QIU a and Lv-Qing WU a
a School of Information Engineering, Yancheng Teachers University, Yancheng, Jiangsu, China
b School of Software and TNList, Tsinghua University, Beijing, China

Abstract. Routing based on geographic transforms is appropriate for wireless sensor networks owing to their special characteristics. However, the iterative process in most existing methods costs tremendous energy when building routes and deeply impacts their performance. In order to avoid unnecessary mapping, this paper presents an algorithm based on optimized hyperbolic Ricci flow. Two schemes are proposed that choose a potential candidate set in advance, which reduces unnecessary iterations immensely. We validate them in our operating system, and experiments show that both schemes are feasible compared with the existing methods.

Keywords. Ricci flow, hyperbolic Ricci flow, optimization, wireless sensor network

Introduction

In WSNs, routing is an inherent challenge that genuinely impacts the service of the network [1], such as timely information inquiry. That is, building an available route in time is one of the crucial problems. A greedy strategy is suitable for the features of WSNs [2]. Accordingly, virtual coordinate systems [3] have been proposed for greedy routing schemes in WSNs.
In practice, communication links in WSNs are volatile due to the inherent characteristics of sensors. In this case, nodes can be deformed to virtual coordinates to build the routing, since the geometric topology of the network may be more stable. In this paper, we employ Ricci flow to achieve this conformal map and obtain “virtual nodes”. Furthermore, a simple greedy forwarding scheme can then discover routes easily in the virtual coordinate system. However, greedy routing may fail to find a route in practice due to forwarding voids. A well-known problem with geographical forwarding is that packets may get stuck at nodes with no neighbor closer to the destination, since the geometric topology of the network may include holes. To address this problem, the applicable domain of Ricci flow is generalized from Euclidean space to hyperbolic space.
The essential merit of this method is that it can generate a route quickly. Unfortunately, the built route may be unavailable; in other words, the source cannot reach the destination through the route [4-5].
1 Corresponding Author: Hao YANG, School of Information Engineering, Yancheng Teachers University, Yancheng, Jiangsu, China; School of Software and TNList, Tsinghua University, Beijing, China; E-mail: classforyc@163.com

In this paper, we propose an algorithm based on optimized hyperbolic Ricci flow. Two schemes are presented that select a suitable candidate set before mapping. They enormously decrease the number of iterations during the operation process. Meanwhile, they guarantee the reachability of the route and can realize the global optimum solution. We validate them in our operating system. Experiments demonstrate that our approach is feasible in practice and can be adopted according to different requirements. Our contributions in this paper are as follows:
1. We present two optimized schemes based on optimized hyperbolic Ricci flow, which reduce energy consumption immensely. To the best of our knowledge, we are the first to apply optimized hyperbolic Ricci flow to decrease the cost of building routes. Both schemes avoid a large amount of unnecessary energy expenditure. In addition, they guarantee that the conformal mapping is unique in hyperbolic space by using an elaborate Möbius transformation in the Upper Half-plane model.
2. Experiments validate that our schemes are superior to the existing methods.

1. Preliminary

Hamilton [6] first introduced the theory of Ricci flow on smooth surfaces for Riemannian manifolds. It is applied to deform a Riemannian metric according to given curvatures; that is, a geometric object which is distorted or uneven can be morphed so that all curvatures become the same. [8] further extended the traditional circle packing metric and proposed the generalized discrete Ricci flow. Through this extension, discrete Ricci flow enables geometric routing to be constructed effectively in wireless sensor networks [9]. In particular, [7] adopted this technology in WLANs, which validates that it can be applied to wireless sensor networks. However, the corresponding computational expense is unacceptable when the scale of the network becomes large. In other words, optimized solutions should be considered, which is the focus of this paper.

2. Optimized Methodology

In the process of constructing routes, it is unnecessary for all sensors to be transformed to virtual coordinates, since this obviously spends enormous energy. Samples of sensors in a local area are correlated in general, so some sensors may always be selected as relay nodes in practical applications; this is the advantage of the clustering transmission idea. Furthermore, the number of iterations in the deformation process definitely increases as the scale of the network becomes large.
To reduce the energy cost of mapping, we first consider clustering the topology of the network. In this case, only nodes that are near the boundaries of clusters may need to be mapped, and the others just need to communicate with them. This helps us save energy immensely, especially for large-scale networks.
This paper presents two practical schemes to choose candidate nodes beforehand for the transformation.
Scheme one: only the boundaries of clusters are adopted to construct the suitable vertex set, which is simple but effective and decreases the energy cost. To construct a suitable vertex set for the transformation, this scheme traces the boundary B_i of each cluster C_i and constructs the vertex set V_i: if v_a ∈ B_i, then V_i = V_i ∪ {v_a}. After obtaining the vertex sets of all cluster boundaries, the suitable vertex set is V = ∪ V_i, where i = 1, …, |C| and |C| is the number of classes. Obviously, this scheme does not consider the role of internal nodes. In this case, it is possible that the built route is not the global optimum but an approximate solution, since some nodes inside the clusters may be needed. Therefore, we propose another improved scheme with more comprehensive considerations.
Scheme two: the neighbour nodes of cluster boundaries are appended, so the built route is probably better than with the former scheme, although the energy consumption is a little greater. It constructs the neighbour set Ne_i of C_i: v ∈ Ne_i if v ∈ C_i and v is a neighbour of some v_b ∈ B_i. Accordingly, the vertex set V_i is constructed as follows: for each v ∈ C_i that is a neighbour of v_b ∈ B_i, if v is also a neighbour of some v_nei ∈ Ne_i, then add v_nei instead of v_b. In this way the suitable vertex set is obtained. This scheme considers the nodes that will probably be selected in the process of building routes.
According to our optimization schemes, the number of nodes that need to be transformed to virtual coordinates is reduced, and the energy cost of building routes in the WSN decreases greatly.
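As an illustration, the two candidate-set schemes can be sketched as follows. The cluster, boundary and adjacency structures (and their names) are hypothetical inputs assumed to be produced by the clustering step; scheme two is simplified to adding the in-cluster neighbours of boundary nodes rather than reproducing the substitution rule verbatim.

```python
# clusters: {cluster_id: set of node ids}; boundary: {cluster_id: set of boundary node ids};
# adj: {node id: set of neighbouring node ids}. All of these names are illustrative.

def scheme_one(boundary):
    """Scheme one: the candidate set is the union of all cluster boundaries."""
    V = set()
    for b_i in boundary.values():
        V |= b_i
    return V

def scheme_two(clusters, boundary, adj):
    """Scheme two: additionally admit in-cluster neighbours of boundary nodes."""
    V = set()
    for cid, b_i in boundary.items():
        V |= b_i
        for vb in b_i:
            V |= (adj[vb] & clusters[cid])   # neighbours of vb inside the same cluster
    return V
```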
The detail of the optimized hyperbolic Ricci flow is as follows:
Algorithm: Optimized hyperbolic Ricci flow
Input: triangular mesh M of genus 0 with holes.
Output: virtual coordinates for the sensors, with all boundaries circular.
1. Select a candidate set by either of the above schemes.
2. Calculate the length l_B of the longest boundary.
3. If l_B < the radius of the corresponding sensors, then
4. For each vertex v_i, label v_i as un-accessed. Take the first face f_ijk and embed it onto the plane as R_i = (0, 0), R_j = (l_ij, 0), R_k = (l_ik cos θ_i^jk, l_ik sin θ_i^jk); then label v_i, v_j and v_k as accessed.
5. For each un-accessed node v_i, check all its neighbouring faces f_ijk and embed each face onto the plane once two of its vertices are accessed. Supposing v_i and v_j have been accessed already, v_k can be computed as
   R_k = R_i + (l_ik / l_ij) · e^(√−1 · θ_i^jk) · (R_j − R_i).
6. Find a point lying within [1.8 l_B, 2 l_B] of the origin and make it the centre of the Upper Half-plane model.
7. For each vertex v_i, map its planar coordinate R_i to its corresponding coordinate in the Upper Half-plane model.
8. End If

Note that all the vertex planar coordinates R_i, R_j and R_k are treated as complex numbers.
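A small sketch of the face-by-face planar embedding in steps 4-5, with vertex coordinates represented as complex numbers as noted above. The placement of the third vertex uses the corner angle obtained from the law of cosines; this is the standard construction for such an embedding under the given edge lengths, and is not claimed to be the authors' exact formula.

```python
import cmath
import math

def embed_first_face(l_ij, l_ik, theta_i):
    """Step 4: place the first triangle with R_i at the origin and R_j on the positive real axis."""
    R_i = complex(0.0, 0.0)
    R_j = complex(l_ij, 0.0)
    R_k = complex(l_ik * math.cos(theta_i), l_ik * math.sin(theta_i))
    return R_i, R_j, R_k

def embed_third_vertex(R_i, R_j, l_ij, l_ik, l_jk):
    """Step 5: given embedded R_i and R_j, place R_k by rotating the direction from R_i to R_j
    by the corner angle at v_i (law of cosines) and scaling it to length l_ik."""
    cos_t = (l_ij ** 2 + l_ik ** 2 - l_jk ** 2) / (2.0 * l_ij * l_ik)
    theta_i = math.acos(max(-1.0, min(1.0, cos_t)))          # clamp for numerical safety
    return R_i + (l_ik / l_ij) * cmath.exp(1j * theta_i) * (R_j - R_i)
```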

3. Performance Evaluations

The proposed schemes are verified in our sensor network, a continuously operating system deployed for the purpose of forest monitoring. With up to 124 nodes deployed in the wild, this system provides an excellent platform for validating the availability of our method. Figure 1 plots the real topology of the sensor network. The sink is deployed at the lower right corner and the communication links are plotted.

Figure 1. The topology of our tested platform


In our comparative evaluation, both the number of iterations and the energy cost of our schemes are compared with those of the existing method, called Traditional Ricci Flow (TRF). In this paper, we pay close attention to the greedy strategy based on geographic information, since it can build a route quickly. It suffers from the problem that geographical forwarding may get stuck when the topology of the network includes holes; that is, the built route may not reach from the sensor to the sink in practice.
Hence, this paper focuses on three metrics for our proposed schemes: the number of iterations, the energy cost, and the global optimum solution. Through our optimizations, we reduce both the iterations and the energy cost. Our experiments evaluate these performances. Furthermore, the success rates of reaching the global optimum are verified with respect to the scale of the network.
As shown in Figure 2, both schemes need fewer iterations than TRF as a whole. The advantage is obvious when there are more nodes in the network.

Figure 2. The relation of the number of nodes and iterations

Figure 3 illustrates the ratio of the energy cost of both schemes to that of TRF along the built route. In the results, the numerical value of TRF is 1 and serves as the evaluation criterion. Scheme two always costs the same energy as TRF, while scheme one costs less as the number of nodes increases, since the former builds the identical route to TRF based on the classification and the latter just obtains an approximate solution. The above experiments demonstrate that each scheme has its advantage, and we can take the better of them on the basis of our demands.

Figure 3. The relation of the number of nodes and energy ratio

Finally, we validate the global optimality of our schemes. As discussed, both of our schemes are able to build the route successfully since they can guarantee delivery, but they do not always guarantee the global optimum. The results of the experiments are shown in Fig. 4. When the number of nodes is not large (e.g. the scale of the network is no more than 50 sensors), the routes of all three methods reach the global optimum. As the number of nodes increases (e.g. the scale of the network is more than 80 sensors), scheme one may not guarantee an optimal route each time. In other words, the route built by scheme one is an approximation of the optimal solution constructed by both scheme two and TRF.

Figure 4. The relationship of success rate of the globe optimum and the number of sensors

4. Conclusions

In this paper, we propose two optimized hyperbolic Ricci flow schemes to construct virtual coordinates for geographic routing, which reduce the energy cost of the iterative process immensely. With our methods, sensors are mapped to virtual coordinates to discover a proper route. Experiments demonstrate that our optimized schemes are feasible and effective in practice and outperform existing Ricci flow-based routing schemes.

Acknowledgements

This work is supported by the National High Technology Research and Development
Program (863 Program) of China (2015AA01A201), National Science Foundation of
China under Grant No. 61402394, 61379064, 61273106, National Science Foundation of
Jiangsu Province of China under Grant No. BK20140462, Natural Science Foundation of
the Higher Education Institutions of Jiangsu Province of China under Grant No.
14KJB520040, 15KJB520035, China Postdoctoral Science Foundation funded project
under Grant No. 2016M591922, Jiangsu Planned Projects for Postdoctoral Research
Funds under Grant No. 1601162B, JLCBE14008, and sponsored by Qing Lan Project.

References

[1] T. Meng, F. Wu, Z. Yang, et al. Spatial reusability-aware routing in multi-hop wireless networks, IEEE
Transactions on Computers, 65(2016): 244-255.
[2] H. Huang, H. Yin, Y. Luo, et al. Three-dimensional geographic routing in wireless mobile ad hoc and
sensor networks, IEEE Network, 30(2016): 82-90.
[3] D. Zhang, E. Dong. A Virtual Coordinate-Based Bypassing Void Routing for Wireless Sensor Networks,
IEEE Sensors Journal, 15(2015): 3853-3862.
[4] R. Sarkar, X. Yin, J. Gao, F. Luo, and X. D. Gu, Greedy routing with guaranteed delivery using Ricci flows,
in Proc. 8th Int. Symp. Inf. Process. Sensor Netw. , 2009, 121–132.
[5] K. Cai, Z. Yin, H. Jiang, et al. Onionmap: A scalable geometric addressing and routing scheme for 3d
sensor networks, IEEE Transactions on Wireless Communications, 14(2015): 57-68.
[6] R. S. Hamilton, The Ricci flow on surfaces, Math General Relativity, 71(1988): 237-262.
[7] H. Yang, K. M. Tang, J. J. Yu, L. C. Zhu, H. Xu, Y. Y. Cao, Virtual Coordinates in Hyperbolic Space
Based on Ricci Flow for WLANs, Applied Mathematics and Computation, 243(2014), 537-545.
[8] Y. L. Yang, R. Guo, F. Luo, S.M. Hu, and X. Gu, Generalized discrete Ricci flow, Computer Graphics
Forum, 28 (2009).
[9] R. Guo, Local rigidity of inversive distance circle packing, Transactions of the American Mathematical
Society, 363(2011), 4757-4776.
458 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-458

Research on the Application-Driven Architecture in Internet of Things
Wei-Dong FANG a, Wei HE a, Wei CHEN b,1, Lian-Hai SHAN c,d and Feng-Ying MA e
a Key Laboratory of Wireless Sensor Network & Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201899, China
b School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China
c Shanghai Internet of Things Co., Ltd., Shanghai 201899, China
d Shanghai Research Center for Wireless Communications, Shanghai 200335, China
e School of Electrical Engineering and Automation, Qilu University of Technology, Jinan, Shandong, 250353, China

Abstract. As we all know, the Internet of Things (IoT) has a promising future. However, as a system or technology, the Internet of Things is an ongoing research and development (R&D) effort. Currently, there is no strict definition of the IoT system by the International Telecommunications Union (ITU), so most architecture designs for the IoT come from the requirements of specific applications. In this paper, we focus on the architecture of the IoT, especially the application-driven architecture. Firstly, we identify and summarize the applications of the IoT. Then we give a more holistic overview of the IoT’s application-driven architectures, which are divided into three categories based on Radio Frequency Identification (RFID), Wireless Sensor Networks (WSN) and Machine-to-Machine (M2M) respectively. Along the way, we analyze the pros and cons of the proposed architectures in each category qualitatively. In addition, we analyze the techniques and methods in these categories, and point out the open research issues and directions in this field.

Keywords. internet of things, architecture, wireless sensor network.

Introduction

The Internet of Things (IoT) refers to uniquely identifiable objects (things) and their
virtual representations in an Internet-like structure [1]. In 2005, the International
Telecommunications Union (ITU) gave it a definition: “The connectivity for anything
by embedding short-range mobile transceivers into a wide array of additional gadgets
and everyday items, enabling new forms of communication between people and things, and between things themselves.”[2] In our understanding, the IoT is a multi-disciplinary set of advanced technologies.
At present, more and more applications based on IoT technology are emerging. These applications involve logistics management [3], intelligent transport [4],

1 Corresponding Author: Wei CHEN, School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China; E-mail: chenw@cumt.edu.cn.

the Smart City [5], the Smart Home [6], and so on. The IoT has not only industrial value but also research significance. Many governments and research institutions have invested heavily in IoT research, and many projects have been carried out based on existing wireless sensor networks (WSN). Recently, special funds have been established to facilitate research on theories, methods and key technologies in the IoT field; numerous research results have been produced, and application demonstration and industrialization have been launched on the basis of some of these results. Gradually, the IoT has become an indispensable aspect of next-generation broadband wireless communication networks and offers huge opportunities for industrial R&D.
Although the applications of the Internet of Things have a promising future, the Internet of Things is still an ongoing R&D effort. Nowadays, the IoT system is not strictly defined by the ITU, so most architecture designs for the IoT come from the requirements of specific applications. In this paper, firstly, we identify and summarize the applications of the IoT in section 1. Then, in section 2, we give a more holistic overview of the IoT’s application-driven architectures, which are divided into three categories based on RFID, WSN and M2M respectively. Along the way, we analyze the pros and cons of the proposed architectures in each category qualitatively. In addition, we analyze the techniques and methods in these categories, and point out the open research issues and directions in this field.

1. Application-Driven IoT Architecture

In this section, the application-driven system architecture of the IoT is divided into three typical categories, based on RFID, WSN and M2M respectively.

1.1. RFID-based

Electronic tags, which transform “things” into intelligent things, may be the most flexible approach. Tagging mobile and fixed assets for commodity tracking and management is their major application. Khanam believed that RFID, just like the punch card, keyboard and barcode, is an information input approach and thus belongs to the IoT’s category [7]. As an extension of application technology, RFID improves the efficiency of information input and reduces costs. For coding, the Auto-ID Centre proposed the EPCGlobal system [8] for electronic product coding, in which RFID is only the code carrier.
As shown in Figure 1, EPCGlobal has proposed five technical components of the Auto-ID system: the electronic product code (EPC) tag, the RFID reader, the Application Level Event (ALE) middleware for information filtering and gathering, the EPC Information Service (EPCIS) and the EPCIS Discovery Service (including the Object Name Service (ONS) and the Physical Mark-up Language (PML)). The EPC only identifies the “tag”; all useful information about the product is described by a new XML-based standard (eXtensible Markup Language), named PML. Because of the existence of ONS and PML, the RFID-based EPC system really moves from a Network of Things to the Internet of Things. Based on ONS and PML, enterprise applications of RFID will move from internal closed-loop applications to open-loop supply chain applications. Zhang et al. proposed an extended six-layer architecture of the IoT based on RFID [9]. In this architecture, the perception layer was divided into a coding layer, an information acquisition layer and an information access layer from bottom to top. The coding layer is the base of the Internet of Things; the things’ coding information is obtained from barcodes, two-dimensional codes and EPC. Liu et al. proposed a simple radio frequency identification (RFID) based architecture to preserve the privacy of the target object [10]. The proposed architecture can effectively hide the presence of the target object and preserve its location information by simply transferring the ID information.

Figure 1. EPCGlobal Standard Architecture


As mentioned above, the key to the design of RFID-based IoT architecture lies in obtaining the objects’ information, especially the coding information. The ways of obtaining object information fall into two categories: active (used for barcodes, two-dimensional codes and EPC) and passive (used for active tags).

1.2. WSN-based

In general, sensor networks include the Wireless Sensor Network (WSN), the Visual Sensor Network (VSN) and the Body Sensor Network (BSN). In this sub-section, we mainly discuss the WSN, which is made up of a set of autonomous and auto-configuring wireless sensors. A wireless node consists of a sensor, an RF module, an MCU (Micro Controller Unit), memory, batteries and a UART (Universal Asynchronous Receiver/Transmitter). The sensor nodes sense information, process it into data packets, and transmit them to the sinks. The sinks aggregate these packets and forward them to the BS (Base Station). Finally, the data packets are transmitted to the user via a wide area network.
Although the wireless sensor network is a hot research topic, there are few successful cases in the industrial field. This is because research mostly focuses on the WSN’s lower layers, such as ZigBee, TinyOS and 6LoWPAN (IPv6 over Low-power Wireless Personal Area Networks), as well as on energy efficiency. It is noteworthy that 6LoWPAN was created for this very purpose, because existing protocols were unsuitable for such low-power wireless embedded devices due to a lack of resources [11]. On the other hand, the new trend is to turn sensor nodes into smart things and allow them to be accessed via the Internet [12]. Based on the above assumptions, the architecture of a flattening network is shown in Figure 2.

Figure 2. Architecture of Flattening Network


In Figure 2 there are not only the terminal users but also thousands of sensor nodes and intelligent devices. This architecture has many advantages, which include network flattening and convenience of access and management. However, due to different data formats, there is no standardization to solve the interoperability issues when different sensor networks need to connect, or even when different sensor nodes need to connect within the same network. In addition, Mohamed and Camille proposed a low-cost many-to-one WSN architecture [13], which is adapted to low-throughput, short-range applications.
Given the current development of the wireless sensor network, we think it is still some distance from the real-time sensing required by the Internet of Things. Some researchers focus on wireless communication technology instead of the combination of the field bus in the perception layer and long-distance wireless communication in the transmission layer; these two technologies have already achieved steady, large-scale application.

1.3. M2M-based

In general, the M2M concept covers a wide range, which involves part of EPCGlobal and the wireless sensor network, and both wired and wireless communication. A typical M2M system architecture is shown in Figure 3.

Figure 3. Typical M2M’s System Architecture


M2M has covered and expanded the system functions of Supervisory Control And Data Acquisition (SCADA) [14]. In many industrial fields, the SCADA system has achieved equipment data acquisition and remote monitoring, just like M2M. However, SCADA and M2M differ sharply: M2M is founded on a large number of standardized technologies, whereas many SCADA systems are based on the traditional Client/Server architecture.
Based on actual requirements, a group of OPC Foundation member companies wanted to develop SCADA applications on the .NET Framework and Windows Communication Foundation (WCF) [15]. In a similar way, M2M’s development lacks a unified standardization and architecture such as ONS and PML. Although there have been some attempts, unified standardizations have not yet been formed. In general, as key technologies of the IoT architecture, ONS and PML have broad application prospects. In addition, Magdum et al. proposed a low-cost M2M architecture to improve the existing city bus public transport system, providing real-time arrival time prediction and approximate seat availability for buses [16].
Additionally, the technology architectures for the wireless sensor network and machine-to-machine have not yet been fully raised to the level of the ONS/PML technology system for the Internet of Things. We think WSN and M2M will refer to the ONS/PML technology system architecture on the road towards the Internet of Things.

2. Architecture Analysis and Future Issues

As a large set of technologies, the IoT involves many key technologies in both theory and application. In this section, we analyze some representative technologies and present some future issues in combination with the architectures discussed in the previous section.

2.1. Architecture Analysis

The Internet of Things is made up of many co-existing heterogeneous networks. On the other hand, as the important basis of information sensing, the function of the perception layer embodies diverse aspects, which, in special applications, involve the overlapping of cross-system, cross-cell and different access technologies. Meanwhile, we have to take sufficient account of the different application characteristics, which contain the following items:
• Unified structural design
• Diversity of standards and protocols
• Interaction of hardware and software
• Intersection of function implementation and task management
Therefore, through the application-driven analysis (see the previous section), we synthesize the network structures of RFID, WSN and M2M, and then present some future issues in the next sub-section.

2.2. Future Issues

2.2.1. Robust and Secure Architecture in Front-end of IoT


The front end of the IoT is made up of many heterogeneous networks, and these information-sensing nodes face very complex environments. It is very difficult for traditional wired-network security technologies to be used directly in the front end of the IoT due to limited resources and the harsh environment. Furthermore, the characteristics of wireless networks, such as the instability of the wireless link, the possibility of opportunistic communication and the broadcast nature of the wireless medium, make the traditional layered architecture adapt poorly to the wireless networks of the IoT front end. To further enhance the security performance of the wireless network, there are requirements for variables shared between adjacent sub-layers and for communication between non-adjacent sub-layers in the front end of the IoT. We therefore think it is necessary to break the principle of layered design to a certain extent, adding a few interfaces between some layers in order to achieve a cross-layer design of the security architecture.

2.2.2. Horizontal Data Mining


The technologies unique to the IoT are the intelligent application technologies, which include intelligent data fusion and intelligent decision-making control. Intelligent data fusion can be based on policy, location, time or semantics. Intelligent decision-making control can be algorithm-based, policy-based or knowledge-based. The data mining technology composed of these two provides one of the links between them.

Figure 4. A prototype based on horizontal data mining


Traditional data mining is based on the vertical form. However, with the development of IoT technology, more extensive demands require a novel method of data mining. Horizontal data mining breaks the limitation of vertical data mining: assuming that the original industrial applications obtain massive data, the objective is to associate these data with, and across, industrial applications. A prototype based on horizontal data mining is shown in Figure 4.
In addition to cross-industry data mining, as a key step of knowledge discovery, horizontal data mining should be associated with knowledge generation, retrieval and support technologies.

3. Conclusions

There are different perspectives on the R&D of the Internet of Things. One question is whether the IoT has its own technology architecture or not, on which different people have different views. Some hold a negative attitude: they claim that the IoT only integrates existing technologies, without its own technology architecture. Others hold the opposite opinion, the viewpoint of an “Internet of Things pan-technology theory”: they argue that IoT technology has been widely used in all aspects of industrial application, related to various fields of IT R&D. In this paper, we identify and summarize the applications of the IoT, and then give a more holistic overview of the IoT’s application-driven architectures, which are divided into three categories, based on RFID, WSN and Machine-to-Machine (M2M) respectively. Along the way, we analyze the pros and cons of the proposed architectures in each category qualitatively. In addition, we analyze the techniques and methods in these categories, and point out the open research issues and directions in this area.
We believe that, although the Internet of Things encompasses computer, communications, networking and control technologies, the simple integration of these technologies cannot constitute a flexible, efficient and useful IoT. Based on the convergence of the above-mentioned existing technologies, the IoT will form its own technical architecture through further R&D and application.
In the foreseeable future, things will become smaller and smaller and more and more intelligent. They will have their own IP addresses and will be able to achieve autonomous information exchange via the IoT. Furthermore, with the development of Smart Manufacturing and Industry 4.0, the industrial structure will transition from vertical to flat, and change from centralized to decentralized design. This will inevitably require different IoT architectures to meet different application requirements. Through our contribution in this paper, we hope that the conclusions and the proposed open research issues can facilitate system design in the IoT field.

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China
(61471346, 61302113), the Shanghai Municipal Science and Technology Committee
Program (15DZ1100400), the Science and Technology Service Network Program of
Chinese Academy of Sciences (kfj-sw-sts-155), the Science and Technology
Commission of Shanghai Municipality (14ZR1439700), the National Natural Science
Foundation and Shanxi Provincial People's Government Jointly Funded Project of
China for Coal Base and Low Carbon (U1510115), the State administration of work
safety accident prevention technology project (shandong-0006-2014AQ, shandong-
0001-2014AQ), the independent innovation projects of Ji'nan University(201401210),
the Science and technology project of Housing Urban and rural construction in
Shandong Province (201419, 2015RK030), the Safety production science and
technology development plan of Shandong Province (201409, 201417) and Project
funding for young teachers of higher education in Shandong Province.

References

[1] M. M. Kashef, H. Yoon, M. Keshavarz, J. Hwang. Decision support tool for IoT service providers for
utilization of multi clouds. IEEE ICACT, Pyeongchang, Korea (south). 2016, 91-96.
[2] International Telecommunication Union (ITU), ITU Internet Reports 2005: The Internet of Things.
[3] H. Martin, T. Marek, H. Romana. The methodology of demand forecasting system creation in an
industrial company the foundation to logistics management. IEEE ICALT, Valeciennes, France. 2015,
12-15.
[4] K. Ben, G.-M. Susan. Sustainability assessment approaches for intelligent transport systems: the state of
the art. IET Intelligent Transport Systems 10(2016), 287-297.
[5] M. Andres. Smart cities concept and challenges: Bases for the assessment of smart city projects. IEEE
SMARTGREENS, Lisbon, Portugal. 2015, 1-11
[6] K. Xu, X. Wang, W. Wei, H. Song, B. Mao. Toward software defined smart home. IEEE
Communications Magazine 54 (2016), 116-122.
[7] S. Khanam, M. Mahbub, A. Mandal, M.S. Kaiser, S.A. Mamun. Improvement of RFID tag detection
using smart antenna for tag based school monitoring system. IEEE ICEEICT, Dhaka, Bangladesh. 2014,
1-6.
[8] F. Alessandro, M. Luca, P. Luigi, V. Roberto. An EPC-based middleware enabling reusable and flexible
mixed reality educational experiences. IEEE SoftCOM, Split-Primosten, Croatia. 2013, 1-6.
[9] M. Zhang, F. Sun, X. Cheng. Architecture of Internet of Things and Its Key Technology Integration
Based-On RFID. ISCID, Hangzhou, China, 2012, 294 – 297.
[10] D. Wu, J. Du, D. Zhu, S. Wang. A Simple RFID-Based Architecture for Privacy Preservation. IEEE
Trustcom/BigDataSE/ISPA, Helsinki, Finland. 2015, 1224-1229.
[11] S. C. Mukhopadhyay, N.K. Suryadevara Internet of Things: Challenges and Opportunities, Smart
Sensors, Measurement and Instrumentation, Vol. 9, Internet of Things: Challenges and Opportunities,
ISBN 978–3–319–04222–0, Springer–Verlag, by S. C. Mukhopadhyay, 2014, 1–18.
[12] C.P. Dan, M. Hedley, T. Sathyan. A manifold flattening approach for anchorless localization. Wireless
Networks, 18(2012), 319-333.
[13] T. Mohamed, D. Camille. A low-cost many-to-one WSN architecture based on UWB-IR and DWPT.
IEEE CoDIT, Metz, France. 2014, 712-718.
[14] C. Fu, Z. Ni. The application of embedded system in Supervisory Control and Data Acquisition System
(SCADA) over wireless sensor and GPRS networks. IEEE ASID, Xiamen, China. (2015), 81-85.
[15] I. Ungurean, V.G. Gaitan, N.-C. Gaitan. Integration of Information Acquired from Industrial Processes
in a Data Server Based on OPC.NET Specification. Global Journal on Technology, 3(2013), 553-558.
[16] N. Magdum, S. Patil, A. Maldar, S. Tamhankar. A low cost M2M architecture for intelligent public
transit. IEEE ICPC, Pune, India. 2015, 1-5.
466 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-466

A GOP-Level Bitrate Clustering Recognition Algorithm for Wireless Video Transmission
Wen-Juan SHI a,b,1, Song LI b, Yan-Jing SUN b, Qi CAO b and Hai-Wei ZUO b
a School of New Energy and Electrical Engineering, Yancheng Teachers University, Yancheng, 224051, China
b School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China

Abstract. The wireless video transmission process, due to the complexity and variability of wireless communication channels, requires adjusting the bitrate to match the dynamic wireless channel. An analysis of video frame quality can serve as an important basis for adjusting the wireless video bitrate. This paper aims at detecting and recognizing transmission bitrates for wireless networks based on video frame quality. According to the distribution characteristics of video frame quality, this paper proposes a GOP-level bitrate clustering recognition algorithm (GLBCR) that exploits the GOP structure of video coding and the temporal continuity of video frames to recognize different bitrates for wireless video. GLBCR uses the PSNR between each pair of original and terminal decoded frames as the feature quantifying the degradation of video frame quality. The algorithm extracts the PSNR values of all I-frames by the peak detector function, and then uses a PSNR similarity measure to recursively split the frame interval into subintervals. Finally, the different video bitrates can be recognized by GLBCR. The proposed algorithm is evaluated using the LIVE mobile video quality assessment (VQA) database. The results show that the proposed algorithm can recognize changes of the video bitrate by analyzing video frame quality; it is consistent with the real bitrate changes in wireless video transmission and requires only a small amount of computation.

Keywords. bitrate clustering recognition, wireless video, PSNR, video frame quality, clustering algorithm

Introduction

According to the Cisco Visual Networking Index Global Mobile Data Traffic Forecast
Update, mobile video traffic accounted for 55% of total mobile data traffic in 2015 and
will generate 75% of total mobile data traffic by the end of 2020 [1]. Moreover,
wireless systems are rapidly replacing present-day wire-line systems, and the wireless
video services will play a major role in our daily lives [2]. Despite growing maturity in
broadband mobile networks, the time-varying wireless channel qualities often cause the

1
Corresponding Author: Wen-Juan SHI, School of New Energy and Electrical engineering, Yancheng
Teachers University, Yancheng, 224051, China; School of Information and Electrical Engineering, China
University of Mining and Technology, Xuzhou 221116, China; E-mail: winterswj@126.com.
channel to be relatively unreliable and can lead to the loss of transmission data; this can
seriously affect the image and video quality.
Much work has been done to study the impact of frame rate on perceptual video quality [3-6]. Moorthy et al. [3] conducted subjective experiments to assess mobile video quality, and the results indicated a relationship between bitrates and subjective evaluation scores, namely that humans prefer higher bitrates. Chen et al. found that a frame rate around 15 Hz seems to be generally preferred, but the exact acceptable frame rate varies depending on video content and viewers [4]. Ou et al. explored the impact of frame rate and quantization on the perceptual quality of a video [5]. Zhan Ma et al. proposed a rate model and a quality model that are expressed as products of separate functions of quantization step size and frame rate [6]. These works focus on perceptual quality modeling and rate modeling for video.
In fact, a wireless channel is subject to radio interference, multipath fading and
shadowing, and sudden and severe fluctuations in the wireless bandwidth; all of these
factors can cause the traffic patterns of the compressed video streaming to change
dynamically and can significantly degrade the received video quality [7, 8]. Therefore,
it is important to recognize different bitrates by extracting video features and analyzing their characteristics under the condition that only the terminally decoded videos are available and the coder parameters are unknown. This paper focuses on automatically recognizing video bitrate variation based on the PSNR of each video frame. Considering the structural and consecutive features of video frames and the similarity measure of neighboring frames, a bitrate clustering recognition algorithm named GLBCR is proposed to partition the frames into clusters.
The rest of this paper is organized as follows. Section 1 gives the PSNR
computation of I-frames by peak detector function. Section 2 proposes the framework
of GLBCR and illustrates the detail of the proposed algorithm. Section 3 shows the
experiment results on LIVE mobile VQA database. Section 4 gives the conclusion.

1. PSNR Computation of I-frames

PSNR (Peak Signal to Noise Ratio) is widely used as a quality metric or performance
indicator in image and video processing [9-11], which is defined as:
\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}  (1)

\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[X(i,j)-Y(i,j)\right]^2  (2)
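The following minimal Python sketch illustrates Eqs. (1)-(2); the function name and the assumption of 8-bit frames stored as NumPy arrays are illustrative, not taken from the paper.

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray) -> float:
    """PSNR in dB between an original frame and its decoded counterpart (8-bit)."""
    x = original.astype(np.float64)
    y = decoded.astype(np.float64)
    mse = np.mean((x - y) ** 2)                # Eq. (2)
    if mse == 0.0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)   # Eq. (1)
```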

We have conducted an experiment on the LIVE mobile VQA database and found
that the mean PSNR value increases and the standard deviation decreases as the bitrate
increases, which indicates that the video frame quality improves as the bitrate increases.
For example, the PSNR values of four videos transmitted at different bitrates [R1,
R2, R3 and R4 (R1<R2<R3<R4)] are extracted; the results are illustrated in Figure 1. The
horizontal axis is the frame number. The vertical axis is the PSNR value of each video
frame. The PSNR mean and standard deviation values are shown in Table 1. From
Figure 1 and Table 1, it can be observed that the mean PSNR increases and the
standard deviation decreases as the bitrate increases.
The PSNR values of a rate-changes video are shown in Figure 2. The horizontal axis is the frame number and the vertical axis is the PSNR value of each video frame. Three types of bitrates can be clearly observed, distributed over three continuous frame intervals.
From Figure 1, it can be observed that the PSNR fluctuates under the influence of the Group of Pictures (GOP) structure. Since the I-frame is the intra-coded, full-frame compressed frame in a GOP and offers the most information as the reference for decoding the other frames, the bitrate variation is recognized from the PSNR of neighboring I-frames.

Figure 1. The distribution of PSNR values at four bitrates(R1<R2<R3<R4)


Table 1. The mean and standard deviation of PSNR values at four bitrates(R1<R2<R3<R4)

Bitrate Mean Standard Deviation


R1 31.5574 0.2993
R2 34.1488 0.2806
R3 36.9466 0.2313
R4 39.8255 0.1729

Figure 2. The distribution of PSNR values in a rate-changes video


Due to the different GOP sizes of different videos, the peak detector function [12] is adopted to obtain the GOP length from the PSNR distribution of the video frames. The peak detector function is defined as:
\Delta P_{(i-1,i)} = \mathrm{PSNR}(i) - \mathrm{PSNR}(i-1)  (3)

\Delta P_{(i,i+1)} = \mathrm{PSNR}(i+1) - \mathrm{PSNR}(i)  (4)

f(i) = \begin{cases} 1, & \mathrm{sign}(\Delta P_{(i-1,i)}) = 1 \ \text{and} \ \mathrm{sign}(\Delta P_{(i,i+1)}) = -1 \\ 0, & \text{otherwise} \end{cases}  (5)
where PSNR(i-1), PSNR(i) and PSNR(i+1) are the PSNR values at the (i-1)th, ith and (i+1)th frames, respectively.
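A hedged Python sketch of the peak detector of Eqs. (3)-(5): given the per-frame PSNR sequence, it flags the frames whose PSNR is a local maximum (interpreted here as the I-frame positions). Function and variable names are illustrative.

```python
import numpy as np

def detect_psnr_peaks(psnr_seq):
    """Return indices i with PSNR(i-1) < PSNR(i) > PSNR(i+1), per Eqs. (3)-(5)."""
    peaks = []
    for i in range(1, len(psnr_seq) - 1):
        d_prev = psnr_seq[i] - psnr_seq[i - 1]        # Eq. (3)
        d_next = psnr_seq[i + 1] - psnr_seq[i]        # Eq. (4)
        if np.sign(d_prev) == 1 and np.sign(d_next) == -1:   # Eq. (5)
            peaks.append(i)
    return peaks
```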
For example, we obtain the PSNR values of all the I-frames in Figure 1 and Figure 2 by the peak detector function. The distribution of these PSNR values is shown in Figure 3 and Figure 4, and the PSNR mean and standard deviation values of the I-frames are shown in Table 2. From Figure 3 and Table 2, it can be clearly observed that the mean PSNR increases and the standard deviation decreases as the bitrate increases. In Figure 4, the three types of bitrates, distributed over three intervals, can be observed more clearly than in Figure 2.

Figure 3. The distribution of PSNR values of I-frames at four bitrates(R1<R2<R3<R4) in Figure 1.


Table 2. The mean and standard deviation of PSNR values of I-frames at four bitrates(R1<R2<R3<R4)

Bitrate Mean Standard Deviation


R1 32.3962 0.0877
R2 34.9915 0.0868
R3 37.6429 0.0931
R4 40.3002 0.0967

Figure 4. The distribution of PSNR values of I-frames in Figure 2.


2. GLBCR Algorithm

In this paper, bitrate recognition is cast as a continuous frame clustering problem, since a video consists of consecutive frames. It must be established that a frame cluster corresponds to a continuous frame interval: the frames within a cluster are approximately at the same bitrate and are distributed over one frame interval. A GOP-level bitrate clustering recognition algorithm called GLBCR, based on the principles of binary trees, interval arithmetic and recursive subdivision, is proposed in this paper. The
framework of the proposed algorithm is shown in Figure 5. First, PSNR between each
original and decoded frame is computed as the video frame feature to quantify the
degradation of the video frame quality. Second, the PSNR values of I-frames are
extracted by peak detector function. Finally, a PSNR similarity based clustering
algorithm which is based on the principle of binary-tree, interval arithmetic and
recursive subdivision is proposed to partition the video frames into frame intervals and
recognize the different video bitrates.
A threshold is applied in the GLBCR algorithm. Generally, the threshold is the minimum difference between the mean I-frame PSNR values at different bitrates, denoted "m_threshold".
Step 1: PSNR computation → Step 2: Extracting the PSNR of I-frames → Step 3: PSNR similarity based clustering algorithm
Figure 5. The framework of the GLBCR algorithm

2.1. Computing Similarity Measure

The similarity measure is defined by:

s(x_i, x_{i+1}) = 1 - \frac{\left|x_i - x_{i+1}\right|}{x_{max} - x_{min}}  (6)
where x_i and x_{i+1} are the ith and (i+1)th values, and x_{max} and x_{min} are the maximum
and minimum values in an interval. The value of the similarity measure is in the range [0,
1]. As the value approaches 1, the neighboring PSNR values are more similar. The
smaller the similarity value, the more likely it is that the corresponding point will be a
discontinuity point that can divide a frame interval into two frame subintervals.
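As a hedged illustration, the following Python sketch evaluates Eq. (6) over an interval of PSNR values and returns the index of the least-similar neighbouring pair, i.e. the candidate discontinuity point; the names and the guard for a constant interval are assumptions, not from the paper.

```python
def similarity(x_i, x_next, x_max, x_min):
    # Eq. (6); assumes the interval is not constant, so x_max > x_min
    return 1.0 - abs(x_i - x_next) / (x_max - x_min)

def candidate_discontinuity(values):
    """Index i whose pair (values[i], values[i+1]) has the smallest similarity."""
    if len(values) < 2:
        return None
    x_max, x_min = max(values), min(values)
    if x_max == x_min:
        return None                       # no discontinuity in a constant interval
    sims = [similarity(values[i], values[i + 1], x_max, x_min)
            for i in range(len(values) - 1)]
    return min(range(len(sims)), key=sims.__getitem__)
```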

2.2. PSNR Similarity based Clustering Algorithm

In this paper, a PSNR similarity based clustering algorithm is proposed to search for the particular frames where the bitrate changes. Note that in the first level, the frame interval is composed of all I-frame numbers, not of all video frames. In the first level, according to the similarity measure discussed in Section 2.1, the PSNR similarity of neighboring I-frames is computed, and the discontinuity points are listed in ascending order. Then, the I-frame number which has the smallest similarity value is
selected as the discontinuity point. It must then be judged whether the frame interval should be divided into two subintervals on the two sides of the selected discontinuity point.
The mean difference of the frame intervals on the two sides of the discontinuity point is compared against m_threshold. Since a frame interval corresponds to a cluster, if the difference between the frame intervals on both sides of the discontinuity point is greater than the m_threshold value, these frame intervals represent two clusters; otherwise, they belong to one cluster and the frame interval is not partitioned.
Note that the frame interval is recursively split at the frame with the smallest similarity value in each splitting step, following the principle of a binary tree.
After obtaining the I-frame number where the bitrate changes, in the second level the particular frame number within the GOP in which that I-frame lies needs to be computed. According to the similarity measure discussed in Section 2.1, we calculate the neighboring PSNR similarity within the obtained GOP and take the frame number with the smallest similarity value in the GOP, which is exactly the frame where the bitrate changes.
Since different parts of the transmission may occur at the same bitrate, it is essential to compute the mean of each frame interval; however, neighboring frame intervals must belong to different clusters, according to the principles of the GLBCR method. Therefore, in order to reduce the required calculations, one need only compute the mean difference between a frame interval and all other frame intervals except its neighboring interval. For example, with four frame intervals Interval1, Interval2, Interval3 and Interval4, the required calculations are the mean differences between Interval1 and Interval3, between Interval1 and Interval4, and between Interval2 and Interval4. If the mean difference of the compared intervals is greater than the m_threshold value, the compared intervals are considered two clusters; otherwise, they belong to the same cluster and are merged into one cluster.
GLBCR is described in Algorithm 1.
Algorithm 1 GLBCR
Input: video frames
Output: clustering number, subintervals and cluster types
1. Calculate the PSNR of each video frame, and define the interval [N1, NMax];
2. Extract the PSNR of the I-frames, and define the new interval [n1, nIMax];
3. Calculate the PSNR similarity between neighboring I-frames in the interval [n1, nIMax];
4. Detect the discontinuity points disci and define the data set disc = {disc1, disc2, ..., discN} sorted in ascending order;
5. Calculate the difference d between the means of the PSNR values within the interval [n1, nIMax] on the two sides of the discontinuity point disc1;
6. if d > m_threshold do
7.   Divide the frame interval on the two sides of the discontinuity point disci into two subintervals [n1, nIi] and [nIi+1, nIMax];
8.   Calculate the particular frame number Ni between the neighboring nIi-th and nIi+1-th GOPs in which the bitrate changed, and divide the interval [N1, NMax] into two frame subintervals [N1, Ni] and [Ni+1, NMax];
9.   Renew the frame interval and repeat steps 3-9;
10. else
11.   break;
12. end if
13. Merge the intervals with almost the same bitrate;
14. Define the bitrate clusters {cluster1, cluster2, ..., clusterX}.
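The first-level splitting of Algorithm 1 can be sketched in Python as follows; this is a simplified, hedged illustration that operates on the I-frame PSNR sequence only, uses recursion for the binary-tree splitting, and omits the second-level GOP search and the merging of equal-bitrate intervals (steps 8 and 13). All names are assumptions.

```python
import numpy as np

def split_interval(psnr, lo, hi, m_threshold, clusters):
    """Recursively split [lo, hi] at the least-similar pair while mean PSNRs differ enough."""
    seg = psnr[lo:hi + 1]
    if len(seg) < 2 or max(seg) == min(seg):
        clusters.append((lo, hi))
        return
    span = max(seg) - min(seg)
    sims = [1.0 - abs(seg[i] - seg[i + 1]) / span for i in range(len(seg) - 1)]
    cut = int(np.argmin(sims))                         # discontinuity point
    left, right = seg[:cut + 1], seg[cut + 1:]
    if abs(np.mean(left) - np.mean(right)) > m_threshold:
        split_interval(psnr, lo, lo + cut, m_threshold, clusters)
        split_interval(psnr, lo + cut + 1, hi, m_threshold, clusters)
    else:
        clusters.append((lo, hi))

# Example usage (with m_threshold = 2, as reported for the video "dv"):
# clusters = []; split_interval(i_frame_psnr, 0, len(i_frame_psnr) - 1, 2.0, clusters)
```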
3. Experiment and Discussion

In this paper, we use the LIVE mobile VQA database [2] to evaluate the performance
of GLBCR, which has simulated video distortions in heavily-trafficked wireless
network [3]. The database consists of 10 source videos and 200 distorted videos at
720p (1280×720) resolution. All videos in the database are of duration 15 seconds and
frame-rate 30 fps. The distortions include compression, wireless channel transmission
losses, frame-freezes, rate adaptation and temporal dynamics.
The rate-adaption videos and rate-switches videos in the database are tested in this paper. A rate-adaption video is defined as follows: the video starts at a bitrate WRx; after n seconds the bitrate switches to a higher bitrate WRy; after another n seconds it switches back to the original bitrate. Three different bitrate switches are simulated: (1) WR1-WR4-WR1, (2) WR2-WR4-WR2, (3) WR3-WR4-WR3, named in turn s14, s24 and s34.
A rate-switches video is defined as one in which the bitrate is varied between WR1 and WR4 multiple times. Five different rate switches are simulated: (1) WR1-WR4-WR1-WR4-WR1-WR4, (2) WR1-WR2-WR4, (3) WR1-WR3-WR4, (4) WR4-WR2-WR1, (5) WR4-WR3-WR1, named in turn t14, t124, t134, t421 and t431. Unlike the regular rate-adaption videos, the bitrates of the rate-switches videos change irregularly.

3.1. Experiment Result

The GLBCR method is tested on the LIVE mobile VQA database. The accuracy of GLBCR is evaluated by comparing the frame intervals and corresponding clusters of the actual bitrate categories with the clusters of bitrates recognized by GLBCR. The different types of videos are divided into clusters separately in the experiments. The actual categories, the recognized bitrate clusters and the
corresponding frame intervals of the video called “dv” in the LIVE Mobile VQA
database are illustrated in Table 3. Table 3 displays the experimental results and shows
that the recognized bitrate clusters are approximately the same as the actual bitrate
category. A frame interval includes the frames at the same bitrate. For example, the
frame interval [1, 150] indicates the frame numbers from the first frame to the 150th
frame at the same bitrate. The correlation coefficient between the recognized clusters
and the actual frame intervals approximates 1. This means that GLBCR is capable of
recognition that is consistent with the actual bitrate.
According to the features of the PSNR values of videos at different bitrates, the m_threshold of the video called "dv" is set to 2.
Table 3. Comparison between the actual category of bitrates and the recognized clusters, based on GLBCR
of the video called “dv”.

Video type   Actual interval   Actual category   Recognized interval   Recognized cluster
s14          [1,150]           R1                [1,150]               Cluster1
             [151,300]         R4                [151,300]             Cluster2
             [301,450]         R1                [301,450]             Cluster1
s24          [1,150]           R2                [1,150]               Cluster1
             [151,300]         R4                [151,300]             Cluster2
             [301,450]         R2                [301,450]             Cluster1
s34          [1,150]           R3                [1,150]               Cluster1
             [151,300]         R4                [151,300]             Cluster2
             [301,450]         R3                [301,450]             Cluster1
t14          [1,90]            R1                [1,90]                Cluster1
             [91,150]          R4                [91,150]              Cluster2
             [151,240]         R1                [151,240]             Cluster1
             [241,300]         R4                [241,300]             Cluster2
             [301,390]         R1                [301,390]             Cluster1
             [391,450]         R4                [391,450]             Cluster2
t124         [1,180]           R1                [1,180]               Cluster1
             [181,330]         R2                [181,330]             Cluster2
             [331,450]         R4                [331,450]             Cluster3
t134         [1,210]           R1                [1,210]               Cluster1
             [211,360]         R3                [211,360]             Cluster2
             [361,450]         R4                [361,450]             Cluster3
t421         [1,120]           R4                [1,120]               Cluster1
             [121,270]         R2                [121,270]             Cluster2
             [271,450]         R1                [271,450]             Cluster3
t431         [1,90]            R4                [1,90]                Cluster1
             [91,240]          R3                [91,240]              Cluster2
             [241,450]         R1                [241,450]             Cluster3

3.2. Performance Comparison

The most well-known and commonly used partitioning methods are K-Means and K-
Medoids [13-16]. A comparison of the performance of K-Means, K-Medoids and
GLBCR is evaluated by using the LIVE mobile VQA database; the results of this
comparison are given in Table 4. From Table 4, it can be observed that the performance of the GLBCR algorithm exceeds that of K-Medoids and is close to that of the K-Means algorithm. Although the K-Medoids and K-Means algorithms work well for finding spherical-shaped clusters in small- to medium-size databases, they are limited by the given number of clusters: their common disadvantage is that users must specify the number of clusters in advance. GLBCR clusters automatically, requiring only prior knowledge of the mean PSNR values at different bitrates. The time complexity of the K-Means algorithm is O(nkt), where n is the total number of objects, k is the number of clusters, and t is the number of iterations. The time complexity of every iteration of the K-Medoids algorithm is O(k(n-k)^2), where n
and k are the same as in the K-Means method. The time complexity of the proposed GLBCR algorithm is O(n log n), where n is the total number of objects.
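For reference, the following hedged snippet shows how such a fixed-k baseline could be obtained with scikit-learn's K-Means on the I-frame PSNR values (toy values shown); unlike GLBCR, the number of clusters must be supplied in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy I-frame PSNR values at two bitrates (illustrative numbers only)
i_frame_psnr = np.array([31.5, 31.6, 39.8, 39.9, 31.4, 40.0]).reshape(-1, 1)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(i_frame_psnr)
print(labels)   # cluster label per I-frame
```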
Table 4. The performance comparison of K-means, K-medoids and GLBCR using the LIVE mobile VQA
database.

Algorithm     Accuracy (%)     Time Complexity
K-Means       97.8             O(nkt)
K-Medoids     93.75            O(k(n-k)^2)
GLBCR         97.15            O(n log n)

4. Conclusion

A GOP-level bitrate clustering recognition algorithm called GLBCR for wireless video is presented. The algorithm can recognize video bitrate variation by analyzing the video frame quality of wireless video. Compared with the K-Means and K-Medoids algorithms, the results demonstrate that the proposed GLBCR algorithm is effective and produces results that are consistent with the real bitrates.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 51274202, No. 51504214, and No. 51504255), the Fundamental Research and Development Foundation of Jiangsu Province (No. BE2015040), the Natural Science Foundation of Jiangsu Province of China (No. BK20131124, No. BK2012068, No. BK20130199), the Perspective Research Foundation of Production Study and Research Alliance of Jiangsu Province (No. BY2014028-01), and the scientific research project of Yancheng Normal University (No. 12YCKL002).

References

[1] Cisco. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2015-2020, Cisco,
2016, 01.
[2] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, et al. Wireless Video Quality Assessment: A
Study of Subjective Scores and Objective Algorithms. IEEE Transactions on Circuits and Systems for
Video Technology, 20(2010), 587-599.
[3] A. K. Moorthy, L. K. Choi, A. C. Bovik, et al. Video quality assessment on mobile devices: subjective,
behavioral and objective studies. IEEE Journal of selected topics in signal processing, 6(2012), 652-
671.
[4] J. Y. C. Chen, and J. E. Thropp. Review of low frame rate effects on human performance. IEEE Trans.
on systems, 37(2007), 1063-1076.
[5] Y. F. Ou, Z. Ma, T. Liu et al. Perceptual quality assessment of video considering both frame Rate and
quantization Artifacts. IEEE Transactions on circuits and systems for video technology, 21(2011), 286-
298.
[6] Z. Ma, M. Xu, Y. Wang. Modeling of rate and perceptual quality of compressed video as functions of
frame rate and quantization stepsize and its applications. IEEE transactions on circuits and systems for
video technology, 22(2012), 671-682.
[7] X. Q. Zhu, B. Girod. Distributed Media-Aware Rate Allocation for Wireless Video Streaming. IEEE
Transactions on Circuits and Systems for Video Technology, 20(2010), 1462-1474.
[8] Y. F. Su, Y. H. Yang, Meng-Ting Lu, et al. Smooth Control of Adaptive Media Playout for Video
Streaming. IEEE Transactions on Multimedia, 11(2008), 1331-1339.
[9] T. S. Zhao, J. H. Wang, Z. Wang, et al. PSNR-Based Coarse-Grain Scalable Video Coding. IEEE
Transactions on Broadcasting, 61(2015), 210-221.
[10] R. Raju, S. A P. PSNR Based Video Coding Using 2D-DWT. 2014 International Conference on Control,
Instrumentation, Communication and Computational Technologies, Kanyakumari, 2014, 954-957.
[11] C. L. Yang, D. Q. Xiao. Improvements for H.264 Intra Mode Selection Based on SSE and PSNR.
Journal of Electronics & Information technology, 33(2011), 289-294.
[12] Jamali, S., et al., Detecting changes in vegetation trends using time series segmentation. Remote
Sensing of Environment, 156(2015), 182-195.
[13] J. W. Han, M. Kamber. Data Mining: concepts and techniques (Third Edition). Morgan Kaufmann
Publishers, 2012, 451-457.
[14] J. Macqueen. Some methods for classification and analysis of multivariate observations. In proc. 5th
Berkeley Symposium on Mathmatical Statistics and Probability, 1967, 281-297.
[15] L. M. Xue, W. X. Luan. Improved K-means Algorithm in User Behavior Analysis. 2015 ninth
International Conference on Frontier of Computer Science and Technology, Dalian, 2015, 339-342.
[16] U. Agrawal, S. K. Roy, U. S. Tiwary, et al. K-Means Clustering for Adaptive Wavelet Based Image
Denoising. 2015 International Conference on Advances in Computer Engineering and Applications,
Ghaziabad, 2015, 134-137.
476 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-476

The Analysis of Cognitive Image and Tourism Experience in Taiwan's Old Streets Based on a Hybrid MCDM Approach
Chung-Ling KUO a and Chia-Li LIN b,1
a Department of Tourism and Leisure Management, Lee-Ming Institute of Technology, No. 22, Sec. 3, Tailin Rd., Taishan Dist., New Taipei City, 24305, Taiwan
b Department of Recreation Management, Shin Chien University, 200 University Road, Neimen, Kaohsiung, 845, Taiwan

Abstract. There are forty to fifty old streets spread around Taiwan, and each of them has its own story. Some are famous for their delicacies, others for their unique scenery, and still others promote their local cultures through the combination of local industries and festivals. A few of Taiwan's old streets have found their position through time, while others have not and are fading like dried leaves. This study, therefore, aims to find both the position and the development direction of Taiwan's old streets. From a tourism-experience perspective, an evaluation system is formed according to six aspects (Landscape image, Historical site image, Cultural image, Shopping experience, Gourmet experience, Marketing experience) in order to classify Taiwan's old streets and construct development strategies. The connections among the old-street-forming criteria are constructed by the use of the Fuzzy Cognitive Map (FCM). The old streets are classified according to their quality, and their competitive strategies are also established. Hopefully, this study will be of great value in assisting the governmental authorities in finding the characteristics of Taiwan's old streets and improving their development.

Keywords. old street, cognitive image, tourism experience, fuzzy cognitive map
(FCM), network relation map (NRM)

Introduction

The preservation and utilization of historical monuments have been increasingly


emphasized worldwide in recent years. On the one hand, through the preservation of
historical monuments, people of later generations are able to understand the cultural
development and historical changes of the place. On the other hand, the re-development
of historical monuments and cultural resources can provide modern people with tourist
and recreational spots. Such activities also help bring people’s attentions to the
preservation and re-utilization of historical monuments. When discussing the issue, one
will definitely talk about the old streets. Not only do the old streets preserve the history

1
Corresponding Author: Chia-Li LIN, Department of Recreation Management, Shin Chien University, 200
University Road, Neimen, Kaohsiung, 845, Taiwan; E-mail: linchiali0704@yahoo.com.tw.
of local cultural development, but they also witness the rise and decline of economic
activities in the place. Although urban regeneration and suburban development have
solved the issue of living for the increasing population in the modernized city, they also
result in the demolition of many old streets and buildings. Therefore, we should look
for the significance of old streets in the fast-changing time and understand their value
for the new age. This study attempts to find the image connotations of old streets by
analyzing old street experiences, through which the researcher discovers the value of
old streets in people’s minds. Reconfirmation of old street images is used as the basis
for the re-utilization of old streets. In one aspect, through the preservation of old-street
cultures, local residents’ sentimental values of the old streets are lasted. In the other,
through the re-utilization of old streets, their economic values are raised. Hence, it is
an important issue for the government to ensure that with the consideration of both
sentimental and economic values, the old streets are able to be included in the new
trend of city and town development.
However, the rapid development also brings a dilemma: urban regeneration and
the preservation of historical monuments are often in conflict. Therefore, it would be
very difficult to keep the value of historical monuments in the rapid-changing process
of urban development, and the issue needs to be carefully treated. On the one hand, to
promote the economic development of the area, some old streets and historical
monuments need to be demolished. On the other hand, it may limit the development of
the area and its economic growth if one chooses to preserve the cultural resources.
With the improvement of material life, people’s need for travel and leisure has risen
increasingly, which has also promoted the local tourism. Especially, nostalgic tours
which provide customers with knowledge of the historical monuments have become
more and more popular. Recent studies have been paying attention to the issue. Our
study uses service-experiencing perception as the starting point. We created an
evaluation system for old-street classification and development strategies which is
based on six aspects (sightseeing, historic monument visiting, culture experiencing,
shopping, food tasting and marketing). We use the Fuzzy Cognitive Map (FCM) in our
construction of the system which presents the relationship between old-street
characteristics and criteria. Some important old streets in north and central Taiwan are
used for our study cases.
This study is divided into five sections. After this introduction, the second section discusses cognitive image and tourism experience, the third presents the research method and the empirical study based on Taiwan's old streets, and the final section concludes. In the end, we would like to find the key success factors in developing Taiwan's old streets.

1. Cognitive Image and Tourism Experience of Taiwan's Old Streets

1.1. The Affective Image Dimension

In a study of tourist characteristics and their image cognition of travel destinations, one finds that there is a particular relationship between cognitive image and the motives of tourists, which leads to three results: (1) motives influence the affective image; (2) holiday traveling experiences are clearly relevant to cognitive and affective images; (3) socio-demographic features have an influence on cognitive and affective image evaluation [1].
1.1.1. Landscape Image (LI)


Tourists’ impressions of a city may build upon its energetic industrial activities. In a
survey about the impression of Hong Kong, tourists think of the place a shopping
paradise. However, the crowded, busy and nervous city life is considered to be the
weakness of the city. What attracts them are Hong Kong's landmarks, such as its
beautiful mountain scenes and the ferries under the starry sky [2]. This study divides
the landscape image into five aspects, which include Human Landscape, Settlement
Landscape, Geographical Landscape, Architectural Landscape and Geological
Landscape. The “Human Landscape” aspect presents the way in which tourists
experience the local customs through its ethnic cultural and commercial activities.
“Settlement Landscape” means to inspire the nostalgic feelings of tourists through the
characteristic settlement monuments. The “Geographical Landscape” aspect shows
how tourists are attracted to the particular geographical environment of the place.
“Architectural Landscape” means to help tourists understand the merging of local
culture and its social change through the variety of architectural styles. In “Geological
Landscape”, tourists understand the secret and power of the great nature through the
place’s characteristic geological environment.

1.1.2. Historic Monument Image (HI)


In a study which borrows the concept of collective memory, one finds that when
traveling in a slavery farm, the hegemonic consciousness of tourists may be
strengthened, which also evokes their thoughts about a specific kind of political system
[3]. The exploitation of historic monuments is in such a process: at first the government
builds the place. Then, travel agents attract international tourists with the beautiful
image of exploring the oriental culture or the old Chinese culture in Hong Kong. The
local institutes, therefore, begin to revisit and reconfirm the local culture. Finally the
owner of the place starts to draw the attention of the government and make it a local
scene [4]. Our study has divided the aspect of historic monument image into five criteria: historic relics, traces of local industry, traces of local people, religious monuments and cultural relics. The historic relics facet shows how tourists are introduced to the cultural and historical changes of the place through viewing
characteristic city walls, buildings and signboards. The local industry facet presents
how tourists know the development of local industry through traces of industrial
activity facilities, such as the pottery industry in Yingge. The local people facet means
to help tourists understand the development of local history through the traces of
residence or properties of well-known or ruling people. The religious monuments facet
shows how tourists know the development of local religion(s) through visiting religious
buildings. The “cultural relics” facet means using museums and exhibition halls to
preserve cultural relics for tourists to visit.

1.1.3. Culture Image (CI)


In a study of the cognitive difference before and after a trip, the researcher uses
journeys to India as an example. One finds that the abundant art and cultural
monuments of India are highly valued by tourists. However, tourists tend to give
negative feedback on the country's deceivers, beggars, public hygiene and safety
conditions [5]. Our study has divided the culture image aspect into five facets, which
include “traditional festivals”, “temple fairs”, “sacrificial rites”, “new emerging


festivals” and “conventional art teaching”. The “traditional festival” facet presents the
way in which tourists feel the local celebration atmosphere through its festivals. The
“temple fair” facet means to help tourists experience the pious religious belief of the
locals through religious and temple fairs. The “new emerging festivals” facet shows
how people use local festival activities to promote local cultural characteristics and
attract tourists’ attentions. The “conventional art teaching facet” means to teach
tourists traditional crafts making and to help them experience in person.

1.2. The Tourism Experience Dimension

In a study on the relationship between travelling image and holiday experience, one
finds that travelling image has direct impact on the previous cognitive quality,
satisfaction and the motivation to revisit the place. This has, therefore, verified the role
of image in the marketing of the tourist place. The relationship shows that good
service quality has positive influence on the tourists’ satisfaction and their inclination
for revisiting [6].

1.2.1. Shopping Experience (SE)


In a survey on tourists’ clothes purchasing in tourist spots, it show that clothes which
are combined with the local cultural elements sell better in the market. This also points
out that fact that local cultural commodities are clearly relevant to the consumers’
shopping behaviors [7]. In a study on the attitudes of sales in tourist spots and
consumers’ shopping behaviors, one discovers that service-oriented selling behaviors
have positive influence on diversified products, service quality and shopping behaviors.
On the other hand, product-selling-oriented behaviors may induce negative effects on
product quality, product value and the impracticability of the product [8]. This study
has divided the shopping experience aspect into five facets, which are “route planning”,
“shop types”, “product types”, “shopping environment” and “service attitude”. The
“route planning” facet emphasizes the importance to arrange clear routes for tourists to
shop easily. The “shop type” facet suggests the more diverse the shop types, the more
likely it is to fulfill the tourists’ shopping requirements. The “product type” aspect
points out that the more variety the products, the more possible it would be to satisfy
the tourists’ shopping choices. The “shopping environment” facet shows that better
shopping environments tend to enhance the tourists’ shopping desires. The “service
attitude” facet reveals that the more friendly the salesperson’s attitude, the more it
increases the tourists’ motivations to inquire about and buy the product. In addition,
flexible “business hours” also help satisfy tourists’ shopping needs because they can
shop in their preferred time.

1.2.2. Gourmet experience (GE)


In research on traveling motivation and preferences in night-market and street-vendor areas, one finds that fast food is the dominant commercial and recreational activity there, with shopping and novelty-seeking coming next. However, there are still issues about which tourists complain, such as theft, parking and traffic in the night-market and street-vendor areas in Taiwan [9]. Our research has divided the gourmet experience aspect into five facets: famous snacks, local specialties, dining environment, hygiene condition and service quality. Famous snacks
are the local cuisines or snacks; the more special they are, the more they attract tourists.
Local specialties mean the special products of the area which can be tasted by tourists
as well as for them to take home as gifts to families and friends. The dining
environment facet emphasizes the condition of the dining area which makes tourists
decide whether they dine in there or not. The hygiene condition facet suggests that
good hygiene condition will encourage customers to dine in or take away. The service
quality facet points out that a good-quality service will make customers want to visit
the shop again.

1.2.3. Marketing experience (ME)


This study has divided the marketing experience aspect into five facets: certificate
exhibiting, activity promoting, internet advertising, magazine reporting and media
broadcasting. The certificate-exhibiting facet means to promote the old-street
certification system or to use indications of tourist spots in order to make the old street
well known. Activity promoting emphasizes holding the activities which will help
tourists experience the attractiveness of the old street. Internet advertising focuses on
constructing professional websites in order to advertise for the activities of the old
street. Magazine reporting means to help the public understand more about the past and
future of the old street through interview articles in the special issues of magazines.
The media-broadcasting facet focuses on helping people understand more about the old
street through television, newspaper and magazine reporting.

2. Research Method

2.1. Fuzzy cognitive map

The FCM (Fuzzy cognitive map) approach was proposed by Kosko (1988) and Sekitani and Takahashi (2001), and was developed from the original model of Axelrod (1976) by incorporating fuzzy measures, yielding a flexible and feasible method to resolve the fuzzy network relation structure among objects in a complicated system. The FCM approach has been widely applied to enterprise management, political decision making, industrial analysis, and system control [10-24]. The research process includes five steps: (1) evaluate the initial average matrix (A), (2) evaluate the direct influence matrix (D), (3) evaluate the state matrix (C), (4) evaluate the influence relation structure of aspects/criteria and (5) draw the Network Relation Map (NRM).

(1) Evaluate the initial average matrix (A)


The field experts were asked to indicate the influence that they believe each aspect exerts on each of the others, according to a scoring scale ranging from 0 to 4, where "0" means "no influence" and "4" means "extremely strong influence"; for each question between aspects/criteria, "1", "2" and "3" mean "low influence", "medium influence" and "high influence", respectively. As the data in Table 1 show, the influence of HI (Historical site image) on CI (Cultural image) is 3.508, which means "high influence". On the other hand, the influence of GE (Gourmet experience) on HI (Historical site image) is 1.763, which means "medium influence".
Table 1 Initial influence matrix

Aspects LI HI CI SE GE ME Total
Landscape image (LI) 0.000 2.780 2.780 2.203 2.237 2.525 12.525
Historical site image (HI) 2.831 0.000 3.508 2.068 2.169 2.678 13.254
Cultural image (CI) 2.627 3.322 0.000 1.983 2.254 2.780 12.966
Shopping experience (SE) 2.153 1.695 2.186 0.000 2.729 3.068 11.831
Gourmet experience (GE) 1.966 1.763 2.475 2.712 0.000 2.983 11.898
Marketing experience (ME) 2.441 2.712 2.610 2.949 2.949 0.000 13.661
Total 12.017 12.271 13.559 11.915 12.339 14.034
(2) Evaluate the direct influence matrix (D)
The direct influence matrix (D) can be calculated from the initial average matrix (A) by Eqs. (1) and (2). Matrix D represents the direct influences; its diagonal entries are 0, and the sums of its columns and rows are at most 1 (only one equals 1). Adding the sums of each row and column of the matrix yields the direct influence value:
D = sA, \quad s > 0  (1)

where

s = \min\left[\,1/\max_{1 \le i \le n}\sum_{j=1}^{n} a_{ij},\; 1/\max_{1 \le j \le n}\sum_{i=1}^{n} a_{ij}\,\right], \quad i, j = 1, 2, \ldots, n  (2)

and \lim_{m \to \infty} D^{m} = [0]_{n \times n}, where D = [x_{ij}]_{n \times n}, with 0 \le \sum_{j=1}^{n} x_{ij} \le 1 and 0 \le \sum_{i=1}^{n} x_{ij} \le 1, and at least one row sum \sum_{j=1}^{n} x_{ij} or column sum \sum_{i=1}^{n} x_{ij} equal to one, but not all; this guarantees \lim_{m \to \infty} D^{m} = [0]_{n \times n}.

As shown in Table 1, we processed the original influence matrix (A) using Eqs. (1) and (2) and obtained the direct influence matrix (D). The diagonal items of D are all 0, and the sum of each row is at most 1, as shown in Table 2. We then calculated Table 3 by adding up the rows and columns. The sum of the row and column for the ME aspect is 1.973, making it the most important influencing aspect, as shown in Table 3. On the other hand, the sum of the row and column for the SE aspect is 1.692, making it the least important influencing aspect.
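A hedged Python sketch of Eqs. (1)-(2), computing the direct influence matrix D from the initial average matrix A; applied to the matrix of Table 1, the scaling factor s is 1/14.034 and the result reproduces Table 2 up to rounding. The function name is illustrative.

```python
import numpy as np

def direct_influence(A: np.ndarray) -> np.ndarray:
    """Scale the initial average matrix A into the direct influence matrix D."""
    s = min(1.0 / A.sum(axis=1).max(),     # 1 / max row sum
            1.0 / A.sum(axis=0).max())     # 1 / max column sum, Eq. (2)
    return s * A                           # Eq. (1): D = sA
```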
Table 2 Direct influence matrix ( D)

Aspects LI HI CI SE GE ME Total
Landscape image (LI) 0.000 0.198 0.198 0.157 0.159 0.180 0.893
Historical site image (HI) 0.202 0.000 0.250 0.147 0.155 0.191 0.944
Cultural image (CI) 0.187 0.237 0.000 0.141 0.161 0.198 0.924
Shopping experience (SE) 0.153 0.121 0.156 0.000 0.194 0.219 0.843
Gourmet experience (GE) 0.140 0.126 0.176 0.193 0.000 0.213 0.848
Marketing experience (ME) 0.174 0.193 0.186 0.210 0.210 0.000 0.973
Total 0.856 0.874 0.966 0.849 0.879 1.000
Table 3 Direct influence matrix comparison table

Aspects    Sum of row    Sum of column    Sum of row and column    Importance of influence
Landscape image (LI) 0.893 0.856 1.749 4
Historical site image (HI) 0.944 0.874 1.819 3
Cultural image (CI) 0.924 0.966 1.890 2
Shopping experience (SE) 0.843 0.849 1.692 6
Gourmet experience (GE) 0.848 0.879 1.727 5
Marketing experience (ME) 0.973 1.000 1.973 1
Total

(3) Evaluate state matrix (C)


We define the quadruple (N, D, C, f), where N = {N_1, N_2, ..., N_n} denotes the n objects and D is the relationship matrix (direct influence matrix) composed of the relationships between the objects. C is the state matrix: C^{(0)} is the initial state matrix, and C^{(t)} is the interim state matrix after t iterations. f is the threshold function that governs the interrelationship between C^{(t)} and C^{(t+1)}. Commonly used threshold functions are shown in Eqs. (3)-(5):

f(x) = \begin{cases} 1 & \text{if } x \ge 1 \\ 0 & \text{if } x < 1 \end{cases} \quad \text{(Linear function)}  (3)

f(x) = \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} \quad \text{(Hyperbolic Tangent Function)}  (4)

f(x) = \frac{1}{1 + e^{-x}} \quad \text{(Logistic Function)}  (5)

The influence relationship between aspects or criteria can be calculated through the following equation:

C^{(t+1)} = f\left(C^{(t')} D\right), \quad C^{(t')} = C^{(t)} + C^{(0)}, \quad C^{(0)} = I_{n \times n}  (6)

where I_{n \times n} represents the identity matrix.

Vector-matrix multiplication produces the interim state values of the continuous FCM (Fuzzy cognitive map) approach, and repeated multiplication drives the interim state towards a fixed value, called the limit steady-state cycle; the stabilized matrix then remains at this fixed value, the limit state cycle. The threshold used in this study is the linear function of Eq. (3), and Eq. (6) generates the state matrix of each dimension or criterion. Through continued multiplication the interim state reaches a steady state, called the limit steady-state matrix, as shown in Table 4.
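The iteration of Eq. (6) can be sketched in Python as follows; this is a hedged illustration in which the threshold function f is a parameter and defaults to the identity (no clipping), under which assumption the iteration simply accumulates influence until it converges to a limit steady-state matrix.

```python
import numpy as np

def limit_state(D: np.ndarray, f=lambda x: x, max_iter=1000, tol=1e-9):
    """Iterate C(t+1) = f((C(t) + C(0)) D) with C(0) = I until convergence (Eq. (6))."""
    n = D.shape[0]
    C0 = np.eye(n)
    C = np.zeros_like(D)
    for _ in range(max_iter):
        C_next = f((C + C0) @ D)
        if np.max(np.abs(C_next - C)) < tol:
            return C_next
        C = C_next
    return C
```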
Table 4 Limit steady-state matrix

Aspects LI HI CI SE GE ME Total
Landscape image (LI) 1.397 1.597 1.710 1.520 1.564 1.741 9.529
Historical site image (HI) 1.638 1.508 1.826 1.583 1.635 1.830 10.020
Cultural image (CI) 1.601 1.672 1.597 1.554 1.612 1.805 9.842
Shopping experience (SE) 1.456 1.465 1.598 1.318 1.521 1.690 9.048
Gourmet experience (GE) 1.455 1.477 1.620 1.486 1.365 1.694 9.097
Marketing experience (ME) 1.642 1.691 1.809 1.658 1.704 1.703 10.206
Total 9.189 9.410 10.161 9.118 9.401 10.463 -

(4) Evaluate the influence relationship of aspects/criteria

Eqs. (7) and (8) add the elements of each row of C* (the limit state matrix) to obtain d (the row sum vector), while Eqs. (7) and (9) add the elements of each column of C* to obtain the transpose of r (the column sum vector). Adding (d) and (r) gives d_i + r_i (the row-column sum vector), which represents the aggregate influence: the higher d_i + r_i is, the greater the influence relationship between aspect or criterion i and the other aspects or criteria. Subtracting (r) from (d) yields d_i - r_i (the row-column difference vector), which represents the net influence. If d_i - r_i > 0, the magnitude with which the aspect/criterion influences other aspects/criteria is greater than the magnitude with which it is influenced by them, and vice versa. To produce C_net (the net limit state matrix), the lower triangular matrix is subtracted from the upper triangular matrix, i.e. Eq. (10) is applied; after this process the values of the upper and lower triangular matrices are the same but with opposite signs.
C^{*} = [t_{ij}], \quad i, j \in \{1, 2, \ldots, n\}  (7)

d = [d_i]_{n \times 1} = \left[\sum_{j=1}^{n} t_{ij}\right]_{n \times 1} = (d_1, \ldots, d_i, \ldots, d_n)  (8)

r = [r_j]'_{1 \times n} = \left[\sum_{i=1}^{n} t_{ij}\right]'_{1 \times n} = (r_1, \ldots, r_j, \ldots, r_n)  (9)

The C* (limit state matrix) is obtained by iterating Eq. (6); Table 4 shows the calculated C*, whose elements are denoted as in Eq. (7). The sum vector of the row values is {d_i} and the sum vector of the column values is {r_j}; then, letting i = j, {d_i + r_i} is the vector of row sums plus column sums of C*. The higher d_i + r_i is, the stronger the network relation structure between the aspect/criterion and the others. The difference {d_i - r_i} expresses the net influence relation structure: if d_i - r_i > 0, the degree to which the aspect/criterion influences others is stronger than the degree to which it is influenced.
The ME (Marketing experience) aspect has the highest aggregate influence value (d_6 + r_6 = 20.669), and the SE (Shopping experience) aspect has the lowest aggregate influence value (d_4 + r_4 = 18.166). The HI (Historical site image) aspect has the highest net influence value (d_2 - r_2 = 0.610). The other net influences, in order, are: the LI (Landscape image) aspect (d_1 - r_1 = 0.340), the SE (Shopping experience) aspect (d_4 - r_4 = -0.070), the ME (Marketing experience) aspect (d_6 - r_6 = -0.257), the GE (Gourmet experience) aspect (d_5 - r_5 = -0.304) and, last, the CI (Cultural image) aspect (d_3 - r_3 = -0.319), as shown in Table 5.

Table 5. The influence magnitude at the C*

Aspects                        d_i       r_i       d_i + r_i    d_i - r_i
Landscape image (LI)           9.529     9.189     18.717       0.340
Historical site image (HI)     10.020    9.410     19.429       0.610
Cultural image (CI)            9.842     10.161    20.002       -0.319
Shopping experience (SE)       9.048     9.118     18.166       -0.070
Gourmet experience (GE)        9.097     9.401     18.497       -0.304
Marketing experience (ME)      10.206    10.463    20.669       -0.257

(5) Draw the Network Relation Map (NRM)


According to the aspects/criteria defined in Table 1, field experts were invited to discuss the relation structure and influence levels of the criteria under the same aspects and to score the relation structure and influence among the criteria based on the FCM (Fuzzy cognitive map) approach. The aspects are divided into different styles, so the field experts could answer the questionnaire in the areas/fields with which they were familiar. The net influence matrix, C_net, is determined by Eq. (10).
C_{net} = [t_{ij} - t_{ji}], \quad i, j \in \{1, 2, \ldots, n\}  (10)

The diagonal items of the matrix are all 0. In other words, the matrix consists of a strictly upper triangular matrix and a strictly lower triangular matrix, as shown in Table 6. Moreover, since the values of the strictly upper and strictly lower triangular matrices are the same but with opposite signs, we only need to consider one of the two strictly triangular matrices.
Table 4 shows the matrix at the limit steady state, and Eq. (10) produces the net limit state matrix shown in Table 6. Using the values of (d + r) and (d - r) in Table 5 as the X axis and Y axis, respectively, the NRM can be drawn as shown in Figure 1; the data in Tables 5 and 6 are used to draw it. The HI (Historical site image) aspect is the major aspect with a net relation structure, while the CI (Cultural image) aspect is the major aspect being influenced. The ME (Marketing experience) aspect is the aspect with the greatest aggregate relation structure, while the SE (Shopping experience) aspect is the one with the smallest aggregate relation structure, as shown in Figure 1.
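As a hedged sketch of Eqs. (7)-(10), the NRM coordinates and the net influence matrix can be computed from the limit state matrix C* as follows; the function name is illustrative and plotting is omitted.

```python
import numpy as np

def nrm_coordinates(C_star: np.ndarray):
    """Return (d + r, d - r, C_net) from the limit state matrix C*."""
    d = C_star.sum(axis=1)        # Eq. (8): row sums
    r = C_star.sum(axis=0)        # Eq. (9): column sums
    C_net = C_star - C_star.T     # Eq. (10): t_ij - t_ji
    return d + r, d - r, C_net    # aggregate influence, net influence, net matrix
```

Plotting each aspect at the point (d_i + r_i, d_i - r_i) then yields the network relation map of Figure 1.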
Table 6 The net influence matrix of cognitive image and tourism experience

Aspects LI HI CI SE GE ME
Landscape image (LI) -
Historical site image (HI) 0.041 -
Cultural image (CI) -0.109 -0.154 -
Shopping experience (SE) -0.063 -0.118 0.045 -
Gourmet experience (GE) -0.110 -0.158 0.008 -0.035 -
Marketing experience (ME) -0.099 -0.140 0.004 -0.032 0.010 -
[NRM scatter plot: aspects positioned by d+r (horizontal axis) against d-r (vertical axis); HI and LI lie above the horizontal axis, while CI, SE, GE and ME lie below it, with ME farthest right and SE farthest left.]
Figure 1. The improvement strategy map for cognitive image and tourism experience

3. Conclusions

The evaluation system comprises six aspects: the LI (Landscape image), HI (Historical site image), CI (Cultural image), SE (Shopping experience), GE (Gourmet experience) and ME (Marketing experience) aspects. Experts were invited to score the network relation structure among these aspects based on the FCM (Fuzzy cognitive map) approach, and the NRM (Network relation map) was analyzed; the net influence relations are derived from Eq. (10). Among the six aspects, the HI (Historical site image) and LI (Landscape image) aspects are the more influential ones, while the SE (Shopping experience), GE (Gourmet experience), CI (Cultural image) and ME (Marketing experience) aspects are the major dimensions being influenced. The ME (Marketing experience) aspect has the highest aggregate influence, while the SE (Shopping experience) aspect has the smallest.
Fuzzy Systems and Data Mining II 487
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-487

A Collaborative Filtering Recommendation Model Based on Fusion of Correlation-Weighted and Item Optimal-Weighted

Shi-Qi WEN a, Cheng WANG a,c,1, Jian-Ying WANG a, Guo-Qi ZHENG c,d, Hai-Xiao CHI a, Ji-Feng LIU b

a College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
b College of Foreign Languages, Huaqiao University, Xiamen 361021, China
c Huaqiao University-Yardi Big Data Research Centre, Xiamen 361021, China
d Yardi Technology Limited, Xiamen 361021, China
1 Corresponding Author: Cheng WANG, College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; E-mail: wangcheng@hqu.edu.cn.

Abstract. The traditional collaborative filtering algorithm has a shortcoming: it assigns equal importance to all items, which results in hot items being recommended too frequently and reduces novelty and accuracy. We therefore propose a collaborative filtering recommendation model based on the fusion of item correlation-weighting and optimal-weighting. First, the correlation-weighted method is used to find the best threshold, which ensures the stability of the algorithm under sparse conditions. Then, with the optimal threshold selected, the correlation-weighted method is fused with the item optimal-weighting strategy. Targeting the final mean absolute error (MAE) of collaborative filtering, we introduce the weight values into the prediction process to enhance the ability to mine items in which users are really interested. Theoretically, the fusion model can effectively overcome the problem of the traditional collaborative filtering algorithm and combine the advantages of correlation-weighting and item optimal-weighting. Experimental results on the MovieLens-100K data set show that the MAE of the proposed fusion model is lower than that of the traditional collaborative filtering algorithm, the item optimal-weight algorithm, the correlation-weight algorithm and the variance-weight algorithm. The proposed model reduces the average item popularity, increases coverage, improves the recall rate and achieves more novelty.

Keywords. collaborative filtering; item-weighted; correlation-weighted; optimization method; novelty; popularity

Introduction

As one of the most successful technologies in recommendation systems, collaborative filtering (CF) is an effective measure against information overload [1]. The basic
idea is to predict the preferences of target users according to their historical rating data, and to select a number of items with a high predicted degree of preference as the recommendation result [2].
However, the traditional collaborative filtering algorithm suffers from low accuracy, partly because it ignores the fact that different items have different influence and make different contributions to the recommendation result when the user-item rating matrix is used to predict scores [3].
From the perspective of the long-tail theory, items form a long tail with respect to influence and importance, and every item is valuable; the difference lies in the magnitude of its value and the type of its audience.
The traditional recommendation algorithm is mostly concerned with the impact of popular items and ignores the value of unpopular items in the tail. Giving additional weight to unpopular items is a commonly used way to improve their influence, but such weighting strategies are empirical and require extra prior knowledge. In addition, they introduce a new parameter-setting problem and can only improve recommendation accuracy from one aspect.
Lai et al. [4] proposed a collaborative filtering recommendation algorithm that incorporates changes of user interest; their algorithm designs a time-weight function and introduces a new method to calculate the similarity between items. You et al. [5] propose a recommendation algorithm combining an item clustering method with the weighted Slope One scheme. These algorithms improve recommendation accuracy to a certain extent, but they ignore the personalization of the recommendation.

1. Problems in the Item-Based Collaborative Filtering Algorithm and Results Analysis

1.1. Problem Description

Given a recommendation system consisting of m users U = {u_1, u_2, ..., u_a, ..., u_m} and n items I = {i_1, i_2, ..., i_b, ..., i_n}, the input data of traditional algorithms can be described as an m × n rating matrix. As shown in Table 1, every entry r_{a,b} of this matrix represents the rating of user u_a on item i_b, where r_{a,b} ∈ [1, 5]; this value indicates the extent of the user's interest in item i_b. If user u_a has not rated item i_b, then r_{a,b} = 0. Because most ratings are absent, a large number of entries in the user-item rating matrix are missing. In traditional item-based collaborative filtering, the nearest-neighbor set of an active item i_b is computed by comparing the similarity between item i_b and the other items according to the input rating data. The missing ratings can then be predicted from the nearest-neighbor set, and the top-N recommendation set is obtained.

Table 1. User-item rating matrix

User \ Item    i_1       …    i_b       …    i_n
u_1            r_{1,1}   …    r_{1,b}   …    r_{1,n}
…
u_a            r_{a,1}   …    r_{a,b}   …    r_{a,n}
…
u_m            r_{m,1}   …    r_{m,b}   …    r_{m,n}

1.2. Similarity Computation

The calculation of item similarity is one of the key steps of a collaborative filtering algorithm; commonly used similarity measures include cosine similarity, correlation similarity and modified cosine similarity.
Cosine similarity regards the ratings of an item as a multi-dimensional vector, and the similarity between items i_a and i_b is computed from the angle between the two vectors:

    sim(i_a, i_b) = (i_a · i_b) / (|i_a| · |i_b|)
                  = Σ_{k∈I_a∩I_b} r_{k,a} r_{k,b} / ( √(Σ_{l∈I_a} r_{l,a}²) · √(Σ_{j∈I_b} r_{j,b}²) )          (1)

In formula (1), I_a and I_b denote the sets of users who have rated items i_a and i_b respectively; r_{l,a} (l ∈ I_a) is the rating of item i_a given by user l, r_{j,b} (j ∈ I_b) is the rating of item i_b given by user j, and r_{k,a}, r_{k,b} (k ∈ I_a ∩ I_b) are the ratings given to the two items by a common user k.
Xu et al. [6] use recall, accuracy, coverage, popularity and other indicators to compare the Jaccard coefficient, the Euclidean distance, the Pearson coefficient and cosine similarity in the case of sparse data, and cosine similarity gives the best results. Therefore, cosine similarity is the basis of many kinds of weighted improvements.
However, these algorithms take into account neither the importance of unpopular items nor the excessive influence of the most frequently recommended items [7], which affects the accuracy and personalization of the recommendation.
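
As a concrete illustration of Eq. (1) (not part of the original paper), the following Python sketch computes the cosine similarity of two items from a ratings table; the data layout and all names are hypothetical:

    from math import sqrt

    def cosine_similarity(ratings_by_item, item_a, item_b):
        """Cosine similarity between two item rating vectors, as in Eq. (1).

        ratings_by_item maps an item id to a dict {user_id: rating}; the
        numerator runs over users who rated both items, the norms over all
        raters of each item.
        """
        ra = ratings_by_item.get(item_a, {})
        rb = ratings_by_item.get(item_b, {})
        common = set(ra) & set(rb)                      # users who rated both items
        numerator = sum(ra[u] * rb[u] for u in common)
        norm_a = sqrt(sum(r * r for r in ra.values()))
        norm_b = sqrt(sum(r * r for r in rb.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return numerator / (norm_a * norm_b)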

1.3. Predicting Rating

According to the ratings of the nearest neighbors of the target item, the scores of unrated items are predicted, and a number of the highest-scoring items are selected as the recommendation result for the target user. User u_v's rating on item i_a is predicted from u_v's ratings on the nearest neighbors of i_a:

    P_{u_v,i_a} = Σ_{i_b∈N} sim(i_a, i_b) · r_{v,b} / Σ_{i_b∈N} sim(i_a, i_b)          (2)

Here, i_b ∈ N means that i_b belongs to the nearest-neighbor set N of item i_a, sim(i_a, i_b) is the similarity between items i_a and i_b, and r_{v,b} is user u_v's rating on item i_b.
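
A minimal sketch of the prediction step of Eq. (2) (illustrative names; following common practice, only neighbors the user has actually rated contribute to the sums):

    def predict_rating(user_ratings, neighbours, similarity):
        """Predicted rating of one user on a target item, as in Eq. (2).

        user_ratings: {item_id: rating} for user u_v.
        neighbours:   nearest-neighbor items i_b of the target item i_a.
        similarity:   {item_id: sim(i_a, i_b)} for those neighbors.
        """
        num, den = 0.0, 0.0
        for b in neighbours:
            if b in user_ratings:          # neighbors the user has rated
                num += similarity[b] * user_ratings[b]
                den += similarity[b]
        return num / den if den else 0.0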

1.4. Results Analysis of Problems in Item-Based Collaborative Filtering Algorithm

The similarity calculation of the traditional collaborative filtering algorithm is not stable: two items are judged similar even if they share only a few common ratings that happen to agree, so the similarity computed by the traditional method is highly unstable.
The traditional item-based collaborative filtering algorithm also suffers from low accuracy and insufficient novelty, because it treats all items with equal importance, which leads to hot items being recommended excessively often. A good recommendation system not only needs to predict user behavior accurately, but should also help users find new things they are interested in, and it should improve the coverage of the recommendation result by mining more new items. In recent years, however, collaborative filtering research has concentrated mainly on recommendation accuracy [8], and there is not much work on mining unpopular items. The precision, coverage and novelty of a recommendation system all deserve attention, so mining items in which users are really interested while preserving recommendation accuracy is a problem worth studying.

2. A Collaborative Filtering Recommendation Model Based on Fusion of Correlation-Weighted and Item Optimal-Weighted

To improve prediction precision and stability, we introduce an optimization method that quickly finds the optimal weight distribution strategy, and we propose a method to select the items with higher impact and greater contribution to the recommendation result. Each item is assigned a weight, W = {w_1, w_2, ..., w_b, ..., w_n}, and the optimal solution is searched for iteratively.
In fact, if two users have rated only a few common items, similar ratings on those items do not prove that the two users are really similar. We therefore use a correlation weight to avoid the situation in which two items are rated by only a very small number of common users who happen to rate them similarly.

2.1. Correlation-Weight Model

2.1.1. Correlation-Weight Impact on Similarity Computation

The correlation weight is a kind of empirical weighting that uses the number of common ratings to adjust the similarity calculation. The correlation weight function is shown in Eq. (3), where Q is the number of users who have rated both item i_a and item i_b, and T is a threshold set beforehand:

    w_{i_a,i_b} = { Q/T,  if Q < T
                  { 1,    if Q ≥ T          (3)

The correlation-weighted similarity simc(i_a, i_b) is shown in Eq. (4):

    simc(i_a, i_b) = w_{i_a,i_b} · sim(i_a, i_b)          (4)

If T is set to 1, simc(i_a, i_b) is the same as the traditional similarity. Setting T too small or too large usually does not achieve the best result.
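
A small sketch of Eqs. (3)-(4) (illustrative names; the default threshold T = 5 anticipates the value selected experimentally in Section 3.3.1):

    def correlation_weight(num_common_raters, threshold):
        """Correlation weight of Eq. (3): Q/T below the threshold, 1 otherwise."""
        if num_common_raters >= threshold:
            return 1.0
        return num_common_raters / float(threshold)

    def correlation_weighted_similarity(sim_ab, num_common_raters, threshold=5):
        """simc(i_a, i_b) of Eq. (4): damp the cosine similarity by the weight."""
        return correlation_weight(num_common_raters, threshold) * sim_ab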

2.1.2. Correlation-Weight Impact on Nearest-Neighbor Choice and Rating Prediction

The correlation weight changes the similarity values and therefore also the choice of nearest neighbors. The prediction becomes

    Pc_{u_v,i_a} = Σ_{i_b∈N} simc(i_a, i_b) · r_{v,b} / Σ_{i_b∈N} simc(i_a, i_b)          (5)

Here, i_b ∈ N means that i_b belongs to the nearest-neighbor set N of item i_a, simc(i_a, i_b) is the correlation-weighted similarity between items i_a and i_b, and r_{v,b} is user u_v's rating on item i_b.

2.2. Item Optimal-Weighted Model

2.2.1. Item Optimal-Weighted Impact on Similarity Computation

We use cosine similarity as the similarity measure of the item optimal-weighted model. However, because of the structure of the cosine similarity, the item weights in the numerator and denominator cancel out, so optimal weighting does not affect the similarity calculation:

    simcc(i_a, i_b) = (i_a · i_b) / (|i_a| · |i_b|)
                    = Σ_{k∈I_a∩I_b} (r_{k,a} w_a)(r_{k,b} w_b) / ( √(Σ_{l∈I_a} (r_{l,a} w_a)²) · √(Σ_{j∈I_b} (r_{j,b} w_b)²) )
                    = w_a w_b Σ_{k∈I_a∩I_b} r_{k,a} r_{k,b} / ( w_a w_b √(Σ_{l∈I_a} r_{l,a}²) · √(Σ_{j∈I_b} r_{j,b}²) )
                    = sim(i_a, i_b)          (6)

Here, simcc(i_a, i_b) denotes the item optimal-weighted similarity and sim(i_a, i_b) the traditional item similarity; r_{l,a} (l ∈ I_a) is the rating of item i_a given by user l, r_{j,b} (j ∈ I_b) is the rating of item i_b given by user j, and r_{k,a}, r_{k,b} (k ∈ I_a ∩ I_b) are the ratings given to the two items by a common user k; w_a is item i_a's weight and w_b is item i_b's weight.

2.2.2. Item Optimal-Weighted Impact on Nearest-Neighbor Choice and Rating Prediction

    Pcc_{u_v,i_a} = Σ_{i_b∈N} simcc(i_a, i_b) · r_{v,b} · w_b / Σ_{i_b∈N} simcc(i_a, i_b) · w_b
                  = Σ_{i_b∈N} sim(i_a, i_b) · r_{v,b} · w_b / Σ_{i_b∈N} sim(i_a, i_b) · w_b          (7)

Here, Pcc_{u_v,i_a} is the predicted rating of user u_v on item i_a, N is the nearest-neighbor set of item i_a, simcc(i_a, i_b) is the item optimal-weighted similarity, sim(i_a, i_b) is the traditional item similarity, r_{v,b} is user u_v's rating on item i_b, and w_b is item i_b's weight.

2.3. Fusion Model Description

2.3.1. Fusion Strategy


First, the correlation-weighted method is used to find the best threshold T, which ensures the stability of the algorithm under sparse conditions. Then, with the optimal threshold selected, the correlation-weighted method is fused with the item optimal-weighting strategy. The correlation-weighted method affects the similarity calculation, while the item optimal-weighting strategy affects the rating prediction. The fusion strategy therefore combines the characteristics of the two methods and remedies the defects of the traditional algorithm.

2.3.2. Fusion Model Impact on Similarity Computation

    sim_fusion(i_a, i_b) = w_{i_a,i_b} · sim(i_a, i_b)
                         = w_{i_a,i_b} · Σ_{k∈I_a∩I_b} r_{k,a} r_{k,b} / ( √(Σ_{l∈I_a} r_{l,a}²) · √(Σ_{j∈I_b} r_{j,b}²) )          (8)

Here, w_{i_a,i_b} is the correlation weight; r_{l,a} (l ∈ I_a) is the rating of item i_a given by user l, r_{j,b} (j ∈ I_b) is the rating of item i_b given by user j, and r_{k,a}, r_{k,b} (k ∈ I_a ∩ I_b) are the ratings given to the two items by a common user k.

2.3.3. Fusion Model Impact on Nearest-Neighbor Choice and Rating Prediction

    P_fusion_{u_v,i_a} = Σ_{i_b∈N} sim_fusion(i_a, i_b) · r_{v,b} · w_b / Σ_{i_b∈N} sim_fusion(i_a, i_b) · w_b          (9)

Here, P_fusion_{u_v,i_a} is the predicted rating of user u_v on item i_a, N is the nearest-neighbor set of item i_a, sim_fusion(i_a, i_b) is the fusion item similarity, r_{v,b} is user u_v's rating on item i_b, and w_b is item i_b's weight.
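
A sketch of the fusion model of Eqs. (8)-(9), assuming the cosine similarity, the number of common raters and the learned item weights are already available; all names are illustrative, not from the paper:

    def fusion_similarity(sim_ab, num_common_raters, threshold):
        """sim_fusion of Eq. (8): cosine similarity damped by the correlation weight."""
        w = min(1.0, num_common_raters / float(threshold))
        return w * sim_ab

    def fusion_predict(user_ratings, neighbours, fusion_sim, item_weight):
        """P_fusion of Eq. (9): item weights w_b enter the prediction, not the similarity."""
        num, den = 0.0, 0.0
        for b in neighbours:
            if b in user_ratings:
                num += fusion_sim[b] * user_ratings[b] * item_weight[b]
                den += fusion_sim[b] * item_weight[b]
        return num / den if den else 0.0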

2.4. Solving the Fusion Model

2.4.1. The Optimization Goal of the Correlation-Weighted and Item Optimal-Weighted Fusion Model

We use the precision of the collaborative filtering algorithm as the fitness function of the optimization iterations: the goal is to maximize the accuracy of collaborative filtering through the movement of all particles in the search space, taking the best value found as the optimal accuracy and the corresponding position as the optimal weight vector. The fitness function is shown in Eq. (10):

    min MAE = min (1/m) Σ_{u_v=1}^{m} | P_{u_v,i_a} − r_{v,a} |

    s.t.  sim_fusion(i_a, i_b) = w_{i_a,i_b} · sim(i_a, i_b)
                               = w_{i_a,i_b} · Σ_{k∈I_a∩I_b} r_{k,a} r_{k,b} / ( √(Σ_{l∈I_a} r_{l,a}²) · √(Σ_{j∈I_b} r_{j,b}²) )          (10)

          P_fusion_{u_v,i_a} = Σ_{i_b∈N} sim_fusion(i_a, i_b) · r_{v,b} · w_b / Σ_{i_b∈N} sim_fusion(i_a, i_b) · w_b

2.4.2. Using PSO to Solve for the Optimal Weight Coefficients

An optimization problem consists in selecting the most reasonable scheme from all possible (finite or infinite) schemes, that is, in finding the optimal solution.
Particle Swarm Optimization (PSO) is an optimization algorithm based on swarm intelligence. It is simple and effective, converges quickly, and has a strong global search ability. In recent years it has attracted wide interest in academia and has been applied to function optimization, neural network training and pattern classification. Zhang et al. [9] combine particle swarm optimization with an item-clustering collaborative filtering algorithm so as to quickly find the clustering centres. Lu et al. [10] use the rating difference in the configuration file as the fitness function to improve the item weighting attribution and obtain better accuracy. The PSO optimization process is shown in Figure 1.

Figure 1. Flowchart of using PSO to find the optimal weights

Let F be the maximum number of PSO iterations, initialize the iteration counter times to 1, and initialize the global optimum gbest. The weight distributions of the particles are then initialized: one particle starts from the average (uniform) distribution and the remaining particles are assigned weights randomly; using a known weight vector as an initial value can speed up the convergence of the PSO algorithm. pbest is the best recommendation accuracy found by a particle; if pbest > gbest, the global optimal accuracy and the corresponding optimal weight allocation strategy are updated. The algorithm then checks whether the current number of iterations exceeds F; if times < F, the weight distribution of each particle is updated and the process repeats.
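
A compact sketch of this search loop, assuming an evaluate_mae(weights) routine that runs the fusion model of Eqs. (9)-(10) on the training data and returns its MAE; the inertia term and the absence of bound handling are simplifying assumptions, since the paper only fixes the particle count, the iteration limit and c1 = c2 = 0.1 (Section 3.3.2):

    import random

    def pso_item_weights(evaluate_mae, n_items, n_particles=100, n_iter=500,
                         c1=0.1, c2=0.1, inertia=0.8):
        """Search item weights that minimise MAE with a basic PSO loop."""
        # One particle starts from equal weights (the traditional algorithm),
        # the remaining particles are initialised randomly.
        swarm = [[1.0] * n_items] + \
                [[random.random() for _ in range(n_items)]
                 for _ in range(n_particles - 1)]
        velocity = [[0.0] * n_items for _ in range(n_particles)]
        pbest = [p[:] for p in swarm]
        pbest_mae = [evaluate_mae(p) for p in swarm]
        g = min(range(n_particles), key=lambda i: pbest_mae[i])
        gbest, gbest_mae = pbest[g][:], pbest_mae[g]

        for _ in range(n_iter):
            for i, pos in enumerate(swarm):
                for d in range(n_items):
                    r1, r2 = random.random(), random.random()
                    velocity[i][d] = (inertia * velocity[i][d]
                                      + c1 * r1 * (pbest[i][d] - pos[d])
                                      + c2 * r2 * (gbest[d] - pos[d]))
                    pos[d] += velocity[i][d]
                mae = evaluate_mae(pos)
                if mae < pbest_mae[i]:
                    pbest[i], pbest_mae[i] = pos[:], mae
                    if mae < gbest_mae:
                        gbest, gbest_mae = pos[:], mae
        return gbest, gbest_mae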

2.5. Theory Analysis of Fusion Model

2.5.1. Advantages of the Correlation-Weight Model

The correlation-weight model avoids accidental coupling in the similarity calculation, in which two items appear similar merely because of a handful of common ratings, and it ensures the stability of the algorithm under sparse conditions. It can be used to improve the similarity calculation without any prior knowledge.

2.5.2. Advantages of the Item Optimal-Weighted Model

The optimal-weighted model not only needs no a priori knowledge, but also has low time and space complexity: only n parameters need to be optimized, so the time complexity is O(n).

2.5.3. Advantages of the Fusion Model

Because the traditional collaborative filtering algorithm is not stable enough, we introduce the correlation-weighted strategy into our model. A threshold is set on the number of common ratings of two items; if the number of common ratings is below the threshold, the similarity may be unstable, and in that case the similarity is multiplied by a correlation weight equal to the quotient of the number of common ratings and the threshold. This method effectively increases the stability of the similarity.
We also no longer treat all items with equal importance: a weighting strategy assigns each item a weight, and the best weight allocation is obtained through iterative training. By using the optimal weighting strategy we can reduce the recommendation of hot items and instead recommend unpopular items in which users are interested.
Theoretically, the fusion model can effectively overcome the problems of the traditional collaborative filtering algorithm; combining the advantages of correlation-weighting and item optimal-weighting yields a certain improvement in both personalization and accuracy.

3. Experiment Results and Analysis

3.1. Datasets Introduction

In the experiments we use the MovieLens dataset provided by the GroupLens team at the University of Minnesota in the United States, which contains 100,000 ratings (on a 1-5 scale) given by 943 users on 1682 movies. Each user has rated at least 20 films. The dataset is very sparse, since the density of the actual rating data is 100000 / (943 × 1682) ≈ 6.3%.
In this paper, the data are taken from the table u.data, and the similarity calculations are conducted in terms of UserID, MovieID and Rating. The format of u.data is shown in Table 2.

Table 2. Format of the table u.data

UserID   MovieID   Rating   Timestamp

3.2. Evaluation Parameters

To verify the effectiveness of the algorithm, we use MAE, recall, precision, average popularity and coverage as evaluation indicators. R(u) denotes the set of items we recommend to user u, T(u) denotes the set of items that user u actually rated, and M represents the number of items.

3.2.1. Precision Evaluation Index

MAE can be defined as:

    MAE = (1/m) Σ_{u_v=1}^{m} | P_{u_v,i_a} − r_{v,a} |          (11)

where P_{u_v,i_a} is the predicted rating of user u_v on item i_a, r_{v,a} is user u_v's actual rating on item i_a, and m is the size of the test data set.

3.2.2. Individuation Evaluation Indexes

1) Recall describes what percentage of the items actually chosen by users appears among our recommendations. Recall can be defined as:

    Recall = Σ_u | R(u) ∩ T(u) | / Σ_u | T(u) |          (12)

2) Coverage describes what percentage of all items is recommended to users; it reflects the ability of the algorithm to mine unpopular items. Coverage can be defined as:

    Coverage = | R(u) | / | M |          (13)

3) Average popularity can be defined as:

    popularityAVG = Σ_{v∈I(u_a)} item_pop(v) / V          (14)

Here, popularityAVG is the average popularity, item_pop(v) is the popularity of item i_v, I(u_a) is the set of items we recommend to user u_a, and V is the number of items we recommend to users.
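
These indicators can be computed along the following lines; this is only a sketch, and the union-based reading of coverage as well as all helper names are our assumptions:

    def mae(predictions, actuals):
        """Mean absolute error over the test set."""
        return sum(abs(p - r) for p, r in zip(predictions, actuals)) / len(actuals)

    def recall(recommended, relevant):
        """recommended and relevant map each user to a set of items."""
        hit = sum(len(recommended[u] & relevant[u]) for u in recommended)
        total = sum(len(relevant[u]) for u in recommended)
        return hit / total if total else 0.0

    def coverage(recommended, n_items):
        """Share of the catalogue that is ever recommended to some user."""
        covered = set()
        for items in recommended.values():
            covered |= items
        return len(covered) / float(n_items)

    def average_popularity(recommended, item_pop):
        """Mean popularity of all recommended items."""
        pops = [item_pop[i] for items in recommended.values() for i in items]
        return sum(pops) / len(pops) if pops else 0.0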

3.3. Experiment Parameters Setting

3.3.1. Correlation Weight Parameters Setting


Figure 2 shows the experimental results for different values of the threshold parameter T.

Figure 2. Results of item-based and correlation-weight-based collaborative filtering

The experimental results in Figure 2 show that the results are most stable when T is set to 5.

3.3.2. PSO Algorithm Parameters Setting

In this experiment the number of PSO particles is 100, the number of iterations is 500, and the parameters C1 and C2 are both 0.1. When the PSO is initialized, one particle receives the average (uniform) weight distribution and the other particles are assigned weights randomly. We recommend 10 items to each user as the recommendation result.

3.4. Recommendation Experiment Results

In this paper we improve the traditional item-based collaborative filtering algorithm and compare it with the traditional algorithm, the item optimal-weight algorithm, the correlation-weight algorithm and the variance-weight algorithm. The experimental results are shown below.

1) Recommendation Precision Experiment Results

Figure 3. Comparison of the MAE of the collaborative filtering algorithms

2) Recall Experiment Results

Figure 4. Comparison of the recall of the collaborative filtering algorithms

3) Item Popularity Experiment Results

Figure 5. Comparison of the popularity of the collaborative filtering algorithms

4) Coverage Experiment Results

Figure 6. Comparison of the coverage of the collaborative filtering algorithms

5) Analysis of Experimental Results

1) From Figure 3, for different numbers of nearest neighbors, the mean absolute errors of the proposed model are lower than those of the traditional algorithm, the item optimal-weight algorithm, the correlation-weight algorithm and the variance-weight algorithm. This is because the similarity calculation of the fusion strategy is more stable, and the similarity values affect the choice of nearest neighbors. Our method therefore indeed improves the recommendation accuracy, and the experimental results confirm the theoretical analysis of our model.
2) Figures 4, 5 and 6 demonstrate that the proposed algorithm reduces the average item popularity, increases the coverage and improves the recall rate, because item optimal-weighting finds a better item weight distribution strategy and yields more personalized recommendations. The results show that our method can mine items in which users are really interested while preserving the accuracy of the recommendation.

4. Conclusions and Remarks

In this paper we propose a collaborative filtering recommendation model based on the fusion of item correlation-weighting and optimal-weighting: the weight values are introduced into the prediction process and fused with the correlation-weighted strategy, which increases the stability of the similarity calculation and strengthens the influence of unpopular items. Experimental results on the MovieLens-100K data set show that our method enhances the ability to mine items in which users are really interested and achieves more personalized recommendations.
The weighting strategy in this paper can also be applied to users, choosing more authoritative users in order to improve recommendation quality. Other fusion strategies, and other objective functions as the fitness function of the recommendation model, can also be used; this is a meaningful direction for the next step of our research.

Acknowledgements

This work was financially supported by National Natural Science Foundation of China
(Grant No.51305142, 61572204), Fujian province science and technology plan
(No.2017H01010065), project of Xiamen science and technology plan
(3502Z20151239), Postgraduate Scientific Research Innovation Ability Training Plan
Funding Projects of Huaqiao University (No.1511314023).

References

[1] G. Suganeshwari, S. P. S. Ibrahim. A Survey on Collaborative Filtering Based Recommendation


System, Proceedings of the 3rd International Symposium on Big Data and Cloud Computing
Challenges (ISBCC – 16’), Springer International Publishing (2016).
500 S.-Q. Wen et al. / A Collaborative Filtering Recommendation Model

[2] Z. L. Zhao, C. D. Wang, Y. Y. Wan, et al. Pipeline Item-Based Collaborative Filtering Based on Map
Reduce, Proceedings of the 2015 IEEE Fifth International Conference on Big Data and Cloud
Computing. IEEE Computer Society, (2015), 9-14.
[3] J. J. Castro-Schez, R. Miguel, D. Vallejo, et al. A highly adaptive recommender system based on fuzzy
logic for B2C e-commerce portals. Expert Systems with Applications, 38 (2011), 2441-2454.
[4] W. Lai, H. Deng. An improved collaborative filtering algorithm adapting to user interest changes,
Information Science and Service Science and Data Mining (ISSDM), IEEE, (2012), 598-602.
[5] H. You, H. Li, Y. Wang, et al. An improved collaborative filtering recommendation algorithm
combining item clustering and slope one scheme. Lecture Notes in Engineering & Computer Science,
2215(2015), 18-20.
[6] X. U. Xiang, X. F. Wang. Optimization Method of Similarity Degree in Collaborative Filter Algorithm.
Computer Engineering, 36(2010), 52-54.
[7] S. Y. Wei, Y. Ning, X. B. Yang. Collaborative Filtering Algorithm Combining Item Category and
Dynamic Time Weighting. Computer Engineering, 40(2014), 206-210.
[8] W. U. Hu, Y. J. Wang, Z. Wang, et al. Two-Phase Collaborative Filtering Algorithm Based on
Co-Clustering. Journal of Software, 21(2010), 1042-1054.
[9] Z. Y. Xiong, F. J. Zhang, Y. F. Zhang. Item Clustering Recommendation Algorithm Based on Particle
Swarm Optimization. Computer Engineering, 35(2009), 178-180.
[10] L. U. Chun, A. Hong, J. Gong, et al. Research on collaborative filtering recommendation method
based on PSO algorithm. Computer Engineering & Applications, 50(2014), 101-107.
Fuzzy Systems and Data Mining II 501
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-501

A Cayley Theorem for Regular Double Stone Algebras

Cong-Wen LUO 1
College of Science, China Three Gorges University, Hubei, China
1 Corresponding Author: Cong-wen Luo, College of Science, China Three Gorges University, Yichang, China; E-mail: lcw@ctgu.edu.cn.

Abstract. This paper studies Cayley's theorem for regular double Stone algebras. We introduce the concept of a regular ternary class and show that each regular double Stone algebra is isomorphic to a subalgebra of the algebra associated with some regular ternary class of functions over a set, which is analogous to the results for Stone algebras.

Keywords. Regular double Stone algebra, Representation, Regular ternary class.

1. Introduction

A double Stone algebra A = (A; ∨, ∧, ′, +, 0, 1) is an algebra of type (2, 2, 1, 1, 0, 0) such that (i) (A; ∨, ∧, 0, 1) is a bounded distributive lattice; (ii) ′ is a pseudocomplement operation (i.e., a ∧ x = 0 iff x ≤ a′) satisfying the Stone identity x′ ∨ x′′ = 1; (iii) + is a dual pseudocomplement operation (i.e., a ∨ x = 1 iff x ≥ a+) satisfying the dual Stone identity x+ ∧ x++ = 0. A double Stone algebra A is called regular if x′ = y′ and x+ = y+ imply x = y.
In the following we always use D to denote the class of bounded distributive lattices and A to denote the class of regular double Stone algebras.
Cayley's Theorem provides a well-known representation of groups by means of certain unary functions (the so-called permutations) with composition as the binary operation. For Boolean algebras and distributive lattices, representations via binary functions were given in [1,2,3,5]. Esik proved a Cayley-like representation theorem for ternary algebras ([4,6]). In this paper we introduce the concept of a regular ternary class and show that each regular double Stone algebra is isomorphic to a subalgebra of the algebra associated with some regular ternary class of functions over a set, which is analogous to the results for Stone algebras ([7]).

2. Representation by Set Algebras

We start by constructing a regular double Stone algebra. Let X be a set and S(X) = {(A, B) ∈ 2^X × 2^X | A ⊆ B}. We define the following operations on S(X):
(A1, B1) ∪ (A2, B2) = (A1 ∪ A2, B1 ∪ B2);
(A1, B1) ∩ (A2, B2) = (A1 ∩ A2, B1 ∩ B2);
(A, B)′ = (X\B, X\B);
(A, B)+ = (X\A, X\A).
Then (S(X), ∪, ∩, ′, +) is a regular double Stone algebra.
Theorem 2.1. Let A ∈ A. Then A can be embedded in some regular double Stone algebra S(X).
Proof. Let X be the set of all prime ideals of A and let Xa denote the set of all prime ideals not containing a. Since a++ ≤ a′′, we have Xa++ ⊆ Xa′′, so (Xa++, Xa′′) ∈ S(X). Define ϕ : A → S(X), a → (Xa++, Xa′′). It is easy to see that Xa∨b = Xa ∪ Xb and Xa∧b = Xa ∩ Xb. Since Xa′ = X\Xa′′ and Xa+ = X\Xa++, we have ϕ(a′) = (ϕ(a))′ and ϕ(a+) = (ϕ(a))+. The fact that A is regular implies that ϕ is one-to-one. Therefore, A can be embedded in some regular double Stone algebra S(X).
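
The construction of S(X) is easy to experiment with on a finite X. The following Python sketch (illustrative only; the helper names are ours, not the paper's) enumerates S(X) and implements the four operations:

    from itertools import chain, combinations

    def subsets(x):
        """All subsets of the finite set x, as frozensets."""
        items = list(x)
        return [frozenset(c) for c in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1))]

    def s_algebra(x):
        """The carrier of S(X): ordered pairs (A, B) of subsets of X with A ⊆ B."""
        subs = subsets(x)
        return [(a, b) for a in subs for b in subs if a <= b]

    def join(p, q):
        return (p[0] | q[0], p[1] | q[1])        # (A1 ∪ A2, B1 ∪ B2)

    def meet(p, q):
        return (p[0] & q[0], p[1] & q[1])        # (A1 ∩ A2, B1 ∩ B2)

    def pseudo(p, x):
        b = frozenset(x) - p[1]                  # (A, B)' = (X\B, X\B)
        return (b, b)

    def dual_pseudo(p, x):
        a = frozenset(x) - p[0]                  # (A, B)+ = (X\A, X\A)
        return (a, a)

For a two-element set X, s_algebra(X) contains nine pairs.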

3. Representation by Functions

Definition 3.1. Let X be a set. A class D of maps X³ → X is called a regular ternary class if the following hold:
1. The projections π1 : X³ → X, (x, y, z) → x and π3 : X³ → X, (x, y, z) → z are in D.
2. D is closed under composition, so that if f, g1, g2, g3 are in D, then the function h : X³ → X is also in D, where h is defined by:
h(x, y, z) = f (g1(x, y, z), g2(x, y, z), g3(x, y, z))
for all x, y, z ∈ X.
3. Each function g in D is idempotent:

g(x, x, x) = x,

for all x ∈ X.
4. Any two g, h ∈ D commute:
h(g(x1 , x2 , x3 ), g(y1 , y2 , y3 ), g(z1 , z2 , z3 )) = g(h(x1 , y1 , z1 ), h(x2 , y2 , z2 ), h(x3 ,
y3 , z3 )) for all xi , yi , zi ∈ X, i = 1, 2, 3.
5. Each g ∈ D is diagonal:
g(g(x1 , x2 , x3 ), g(y1 , y2 , y3 ), g(z1 , z2 , z3 )) = g(x1 , y2 , z3 ),
for all xi , yi , zi ∈ X, i = 1, 2, 3.
6. If g(z, x, x) = h(z, x, x) and g(z, z, x) = h(z, z, x), then

g(x, y, z) = h(x, y, z),

for all x, y, z ∈ X.
In what follows we always use B to denote the class of regular ternary classes over
a set X.
Let D ∈ B; we define the operations ∨, ∧, ′ and + on D as follows. For any g, h ∈ D and x, y, z ∈ X,
(g ∨ h)(x, y, z) = g(x, h(x, y, y), h(x, y, z)),
(g ∧ h)(x, y, z) = g(h(x, y, z), h(y, y, z), z),
g′(x, y, z) = g(z, z, x),
g+(x, y, z) = g(z, x, x).

Since D ∈ B, the maps g ∧ h, g ∨ h, g′ and g+ are in D. Moreover, we define the constants 0 and 1 by 1 = π1 and 0 = π3.
Theorem 3.2. Let D ∈ B. Then D ∈ A under the above operations and constants.
Proof. That D is a Stone algebra under the operations ∨, ∧ and ′ follows from Proposition 1 in [7]. Furthermore,
(1) π1++ (x, y, z) = π1+ (z, x, x) = π1 (x, z, z) = x = π1 (x, y, z)
(2) (g ∨ π1+ )(x, y, z)
= g(x, π1+ (x, y, y), π1+ (x, y, z))
= g(x, π1 (y, x, x), π1 (z, x, x))
= g(x, y, z)
(3) (g ∨ (g ∨ h)+ )(x, y, z)
= g(x, (g ∨ h)+ (x, y, y), (g ∨ h)+ (x, y, z))
= g(x, (g ∨ h)(y, x, x), (g ∨ h)(z, x, x))
= g(x, g(y, h(y, x, x), h(y, x, x)), g(z, h(z, x, x), h(z, x, x)))
= g(g(x, x, x), g(y, h(y, x, x), h(y, x, x)), g(z, h(z, x, x), h(z, x, x)))
= g(x, h(y, x, x), h(z, x, x))
= g(x, h+ (x, y, y), h+ (x, y, z))
= (g ∨ h+ )(x, y, z)
(4) (g + ∧ g ++ )(x, y, z)
= g + (g ++ (x, y, z), g ++ (y, y, z), z)
= g + (g + (z, x, x), g + (z, y, y), z)
= g + (g(x, z, z), g(y, z, z), z)
= g(z, g(x, z, z), g(x, z, z))
= g(g(z, z, z), g(x, z, z), g(x, z, z))
= g(z, z, z) = z = π3 (x, y, z)
Hence by the definition of D, D ∈ A.
Suppose L = (L, +, ·, 0, 1) ∈ D, where we write + for ∨ and · for ∧. Define
ML = {(a1, a2, a3) | ai ∈ L, ai·aj = 0 for i ≠ j, and a1 + a2 + a3 = 1}.
Let a = (a1, a2, a3), b = (b1, b2, b3) ∈ ML; we define a + b and ab as the matrix products

    a + b = (a1, a2, a3) · ( 1    0         0  )
                           ( b1   b2 + b3   0  )          (1)
                           ( b1   b2        b3 )

    ab = (a1, a2, a3) · ( b1   b2        b3 )
                        ( 0    b1 + b2   b3 )          (2)
                        ( 0    0         1  )

Moreover, we define
a′ = (a3, 0, a1 + a2), a+ = (a2 + a3, 0, a1), and let 0 = (0, 0, 1), 1 = (1, 0, 0).
Note that the two constants 0, 1 ∈ ML. Also, ML is closed under the above operations.
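
For the special case where L is the field of all subsets of a finite set X (so + is union, · is intersection, 0 = ∅ and 1 = X), the matrix products (1) and (2) and the unary operations expand to the following sketch; the helper names are ours, and the triples are assumed to be frozensets:

    def in_ML(a, x):
        """Membership test for M_L: pairwise disjoint components that cover X."""
        a1, a2, a3 = a
        return (not (a1 & a2) and not (a1 & a3) and not (a2 & a3)
                and (a1 | a2 | a3) == frozenset(x))

    def triple_join(a, b):
        """a + b, obtained by expanding the matrix product (1)."""
        a1, a2, a3 = a
        b1, b2, b3 = b
        return (a1 | (a2 & b1) | (a3 & b1),
                (a2 & (b2 | b3)) | (a3 & b2),
                a3 & b3)

    def triple_meet(a, b):
        """a · b, obtained by expanding the matrix product (2)."""
        a1, a2, a3 = a
        b1, b2, b3 = b
        return (a1 & b1,
                (a1 & b2) | (a2 & (b1 | b2)),
                (a1 & b3) | (a2 & b3) | a3)

    def triple_pseudo(a):
        a1, a2, a3 = a                     # a' = (a3, 0, a1 + a2)
        return (a3, frozenset(), a1 | a2)

    def triple_dual_pseudo(a):
        a1, a2, a3 = a                     # a+ = (a2 + a3, 0, a1)
        return (a2 | a3, frozenset(), a1)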
Theorem 3.3. Let L ∈ D. Then ⟨ML, +, ·, ′, +, 0, 1⟩ ∈ A.
Proof. Define

    DL = {ifa : L³ → L, (x, y, z) → a1·x + a2·y + a3·z, a = (a1, a2, a3) ∈ ML}.

According to Proposition 3.2 in [6], DL satisfies conditions (1)-(5) of a regular ternary class. Now we show that condition (6) also holds, hence DL ∈ A.
Suppose ifa(z, x, x) = ifb(z, x, x) and ifa(z, z, x) = ifb(z, z, x), where a = (a1, a2, a3) and b = (b1, b2, b3) are in ML. Then a1·z + (a2 + a3)·x = b1·z + (b2 + b3)·x and (a1 + a2)·z + a3·x = (b1 + b2)·z + b3·x. Setting x = 1, z = 0 and x = 0, z = 1 respectively, we have a3 = b3, a2 + a3 = b2 + b3 and a1 = b1, a1 + a2 = b1 + b2. Thus a2 = b2, from the fact that L ∈ D, so a = b and then ifa(x, y, z) = ifb(x, y, z).
Next, we define

    ϕ : ML → DL, a → ifa.

It is easy to show that ϕ is an isomorphism. In fact, if ifa = ifb, then for all x, y, z ∈ L, ifa(x, y, z) = ifb(x, y, z), that is, a1·x + a2·y + a3·z = b1·x + b2·y + b3·z. Setting x = 1, y = z = 0; x = z = 0, y = 1; and x = y = 0, z = 1, respectively, we have ai = bi, i = 1, 2, 3. Thus ϕ is one-to-one. Obviously, ϕ is onto. Furthermore,

    ifa+b = ifa ∨ ifb, ifab = ifa ∧ ifb,

and

    ifa′(x, y, z) = (ifa)′(x, y, z), ifa+(x, y, z) = (ifa)+(x, y, z).

Hence ML ∈ A.
For each set X, the subset-pair algebra S(X) is isomorphic to the algebra ML ,
where L is the field of all subsets of X. Indeed, the function

S(X) → ML

(A1 , A2 ) → (A1 , A2 \A1 , X\A2 )

is an isomorphism.
Lemma 3.4. Let A ∈ A. Then A can be embedded in S(X), where X = {I : I is a prime ideal of A}.
Proof. Let Xa = {I ∈ X : a ∉ I}. Since a++ ≤ a′′, we have Xa++ ⊆ Xa′′, so (Xa++, Xa′′) ∈ S(X). Define ϕ : A → S(X), a → (Xa++, Xa′′). It is easy to see that ϕ is a homomorphism. Since Xa′ = X\Xa′′ and Xa+ = X\Xa++, we have ϕ(a′) = (ϕ(a))′ and ϕ(a+) = (ϕ(a))+. The fact that A is regular implies that ϕ is one-to-one. Therefore, A can be embedded in S(X).
Corollary 3.5. Let A ∈ A. Then A embeds in ML.
Theorem 3.6. A ∈ A iff there exists D ∈ B such that A can be embedded in the algebra associated with D.
Proof. If A embeds in D ∈ B, then A ∈ A by Theorem 3.2. Conversely, suppose A ∈ A; then A can be embedded in S(X). But S(X) ≅ ML and ML ≅ DL, where L ∈ D.
For each A ∈ A, let X be the set of all prime ideals of A and let L be the field of all subsets of X. Define the map ϕ : A → B by A → ϕ(A) = {if(Xa++, Xa′′\Xa++, X\Xa′′) | a ∈ A},
where the function if(Xa++, Xa′′\Xa++, X\Xa′′) is

    if(Xa++, Xa′′\Xa++, X\Xa′′) : L³ → L,
    (P1, P2, P3) → (Xa++ ∩ P1) ∪ ((Xa′′ \ Xa++) ∩ P2) ∪ ((X \ Xa′′) ∩ P3).

From the proof of Theorem 3.6, we may see that ϕ is an embedding.
Example 3.7. The following figures show A, S(X), ML and DL for an A ∈ A. Let X = {(0], (a]}, and denote the prime ideal (0] by x1 and the prime ideal (a] by x2. A can be embedded in S(X); the image of A in S(X) has been shaded. Thus A embeds in the algebra associated with DL.
Figure 1. The regular double Stone algebra A: the three-element chain 0 < a < 1, with 1 = 0′ = 0+ = a+ at the top and 0 = 1′ = 1+ = a′ at the bottom.


Figure 2. The subset-pair algebra S(X) on the prime ideal set of A: the nine pairs (A1, A2) with A1 ⊆ A2 ⊆ X, ordered componentwise; the image of A, namely {(X, X), (∅, X), (∅, ∅)}, is shaded.

Figure 3. The regular double Stone algebra ML obtained from the distributive lattice L: the nine triples (A1, A2, A3) of pairwise disjoint subsets covering X, with the image of A, {(X, ∅, ∅), (∅, X, ∅), (∅, ∅, X)}, shaded.
Figure 4. The regular ternary class of functions DL obtained from ML: the functions ifa for the nine triples a ∈ ML, with if(X,∅,∅), if(∅,X,∅) and if(∅,∅,X) shaded.


References

[1] R. Balbes and Ph. Dwinger, Distributive Lattices, U. Missouri Press, 1974.
[2] S. L. Bloom and Z. Esik, Cayley iff Stone, Bull. EATCS 43 (1991), 159-161.
[3] S. L. Bloom, Z. Esik and E. G. Manes, A Cayley theorem for Boolean algebras, Amer. Math. Monthly 97 (1990), 831-833.
[4] I. Chajda, A representation of the algebra of quasiordered logic by binary functions, Demonstratio Math. 27 (1994), 601-607.
[5] I. Chajda and H. Langer, A Cayley theorem for distributive lattices, Algebra Universalis 60 (2009), 365-367.
[6] Z. Esik, A Cayley theorem for ternary algebras, International Journal of Algebra and Computation 3 (1998), 311-316.
[7] C. W. Luo, A Cayley theorem for Stone algebras, Journal of Mathematical Research and Exposition 4 (2007), 960-962.
Fuzzy Systems and Data Mining II 507
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-507

ARII-eL: An Adaptive, Informal and Interactive eLearning Ontology Network

Daniel BURGOS 1
UNESCO Chair on eLearning, Universidad Internacional de La Rioja (UNIR), Gran Via Rey Juan Carlos I, 4126002 Logroño, La Rioja, Spain
1 Corresponding Author: Daniel BURGOS, UNESCO Chair on eLearning, Universidad Internacional de La Rioja (UNIR), Gran Via Rey Juan Carlos I, 4126002 Logroño, La Rioja, Spain; E-mail: daniel.burgos@unir.net.

Abstract. User models – the representation of user properties, preferences, goals, etc. – are crucial in modern educational applications. Several related processes, such as recommendations and user adaptations, depend on a user model. The increasing demand for new and more complex system features has resulted in an increasing complexity of user models. Ontologies are a tool to deal with this growing complexity: well-defined and flexible ontologies can help developers cope with the inherent difficulties. In this paper, the authors present ARII-eL, an Ontology Network for Adaptive and Informal eLearning. ARII-eL defines a user model which is flexible, adaptable and scalable, and it has been developed for use in a range of educational settings, including informal and interactive learning, for which few user models are available. In addition, the ARII-eL ontology network and user model are presented together with one application, a practical case, and the corresponding user results. The massive use of social networks, wikis, collective repositories, instant messaging services and other social applications is moving the learning process out of classrooms into informal environments, at least as a complement to regular lessons, and learning applications should take advantage of this tendency. This work is a step forward in that direction.

Keywords. User Modelling; Ontology; Ontology network; Informal Learning;


Adaptive Learning; Personalization; Interaction; Recommendation

1. Semantic Web Technologies for User Modelling

Semantic Web (SW) technologies, initiated by Tim J. Berners-Lee, allow the addition
of meaning to information through the use of a semantic formalization, named ontology.
Therefore, an ontology corresponds to a vocabulary containing a hierarchy of semantic
concepts and properties employed for the definition of the knowledge in a given
domain. Concepts and properties are used to annotate the content of the application.
Semantic annotation consists in the creation of links between concepts and their
instances in the content. In this context, an information search can be related with
content, semantic relations or both. For example, if the user is looking for an expert
located in Paris the system can see that Paris is a French city, and if there are no experts
living in the capital it can propose experts located in other cities in France. In order to
achieve this outcome, the system makes an inference using the subsumption relations present in the ontology. There are reports in the literature which highlight the benefits of inference in ontology-based user models [1].
In order to implement the foundations of Semantic Web, W3C has defined several
formalisms which allow the conceptual model to be represented and the knowledge
base to be managed. The logical representation is based on information triples defined
in the RDF (Resource Description Framework) language [2]. In this language each
triple is formed by a subject (for example a concept), a predicate (generally a semantic
property) and an object (the value of a resource). From this logical model, and using
either the formalism RDFS (Resource Description Framework Schema) [3] or OWL
(Ontology Web Language) [4], the ontology can be defined containing only concepts
and the semantic relations between them. Starting from an ontology, annotations can be
instanced (with RDF) in order to describe the content.
Ontologies and annotations are, in general, stored in a repository and form a graph of triples that can be queried using SPARQL [5], the query language for RDF. It is also possible to apply semantic rules defined with SWRL (Semantic Web Rule Language) [6].
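
As a hedged illustration of how such triples and SPARQL queries look in practice (the example resources and the choice of the rdflib Python library are ours, not the paper's), the "expert located in a French city" scenario above could be prototyped as follows:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, FOAF

    EX = Namespace("http://example.org/")          # hypothetical vocabulary
    g = Graph()

    # Three RDF triples: subject, predicate, object.
    g.add((EX.marie, RDF.type, FOAF.Person))
    g.add((EX.marie, EX.locatedIn, EX.paris))
    g.add((EX.paris, RDF.type, EX.FrenchCity))

    # A SPARQL query over the triple store: people located in French cities.
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?person WHERE { ?person ex:locatedIn ?city . ?city a ex:FrenchCity . }
    """)
    for row in results:
        print(row.person)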
Using ontologies, several vocabularies have been proposed to describe important
domains. These vocabularies are public and permit semantic interoperability across the
Web. Interoperability issues of ontologies in educational applications have been
analysed in [7], where the authors couple two complementary systems through the use
of a common ontology.
In the following sections two of these vocabularies, which are of great importance for
user-profile modelling, will be described.

1.1. FOAF

FOAF (Friend of a Friend) [8] is based on decentralized technology and has been
designed to allow data integration through a variety of applications and Web services.
In order to achieve this goal FOAF has taken a different approach for data interchange.
It neither requires the user to specify anything about him or herself or others, nor limits
what can be said about the user or the variety of semantic vocabularies that may be
used.
Personal data are located in the category FOAF Basics. Personal Info contains
information like age, interests, etc. An important object property is knows, which
allows for the representations of interrelations between people. This property can be
very important for social networks. The category Online Accounts / IM contains the
identifiers used to connect to most extended chats. Projects and Groups contain the
projects and organizations the person belongs to. Documents and Images hold
references to documents or images browsed or in which the user has shown an interest.
In this category all the resources browsed by the user can be located. FOAF contains a
series of important concepts useful for any user model.
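
A minimal FOAF-style description of a user and a foaf:knows link, again using rdflib as an assumed tool (the namespace and resource names are illustrative):

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, FOAF

    EX = Namespace("http://example.org/people/")   # hypothetical user namespace
    g = Graph()

    g.add((EX.alice, RDF.type, FOAF.Person))
    g.add((EX.alice, FOAF.name, Literal("Alice")))
    g.add((EX.alice, FOAF.age, Literal(29)))
    g.add((EX.bob, RDF.type, FOAF.Person))
    g.add((EX.alice, FOAF.knows, EX.bob))          # the 'knows' object property

    print(g.serialize(format="turtle"))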

1.2. SIOC

The ontology SIOC (Semantically-Interlinked Online Communities) [9] provides the


main concepts and properties necessary to describe the information generated for
online communities, social networks, wikis, weblogs, etc.
Online communities have replaced traditional media like libraries and publications to
keep a community informed. They are a very useful source of information, and, on

many occasions, the place where the sought information is found. SIOC is intended to
link online communities using semantic Web technologies to describe the information
online communities have about their structure and content. Developers can use SIOC to
express the information contained in online communities in a simple and extensible
way.
Other ontologies, like Dublin Metadata Core, FOAF, etc., can be mixed with
SIOC terms. The SIOC kernel defines classes like Community, which defines a
community, and Container, a general class parent of the class Forum. The class Item is
the parent of the classes Post, User, and so on.
SIOC developers have defined a basic kernel and new concepts or extensions to
existing ones have been added as modules. This way the kernel is kept simple and
legible. At present, SIOC provides three modules: Access, Services and Types. Access
models access permissions, roles, etc. This module defines the classes Permission and
Status. Types module contains classes that extend to types like Forum and Post. This
module contains a large set of specialized classes. Developers are encouraged to add
new classes to the ontology here.
It is common for online communities to publish interfaces of Web services. These interfaces allow programmatic search as long as services for content management are used, usually SOAP and/or RESTian ones. Classes to deal with these are included in the Service module.

2. The ARII-eL Model

2.1. User Modelling with Semantic Web Technologies

User models are in general complex, but at the same time they need great flexibility. The user model should represent user characteristics but also context characteristics; for example, documents, activities or social interactions are important criteria to describe the user. Semantic Web technologies seem to be the most suitable to satisfy user-model needs and requirements. ARII-eL (Adaptive, Informal and Interactive eLearning Ontology Network), the conceptual model defined in this paper, is a fully open and flexible logical model based on the composition of different vocabularies. The power of this ontology allows users to:
• Develop a conceptual model independent of the application;
• Reuse public vocabularies;
• Extend the conceptual model with new vocabularies;
• Achieve interoperability between the descriptions of different contents;
• Search for information using semantic criteria;
• Enlarge the knowledge base with a reasoning feature, e.g., using semantic rules; and
• Manage system information and make recommendations, also using semantic rules.
The definition of ontologies needs to follow basic elements of the methodology.
The first element consists of the development of an ontology network to represent the
diversity of data and content. In order to organize the ontology network, the network
kernel must be formed by generic concepts and the most important relations between

them. This approach allows a focus on the main concepts, the user in this case, and
extends them by adding specialized ontologies like a knowledge domain. There are also
other criteria to assure the quality of the defined ontologies and the semantic coherence
of the model.

2.2. Architecture of the Ontology Network

The required ontologies must have different functionalities in the system; these
functionalities can be classified into three categories:
• Ontologies representing the conceptual model for content annotation;
• Ontologies used to extend the knowledge base through the reasoning process;
• Ontologies used for information classification and personalization.
Annotation consists of the instantiation of the ontologies in semantic descriptions
(semantic annotation). Every new element of content or user action can be associated
with a description. Part of this description can be generated automatically by the
system (automatic annotation) and the other part can be made by the user (manual
annotation). Ontologies that contain the inferences made from annotations can also be
defined.
Finally, ontologies defining criteria for information personalization will also be
used. Inferences can be applied using explicit criteria selected by the user in his or her
profile.

2.2.1. Kernel of the Ontology Network


In this section the ARII-eL ontology network will be described. This model has been
defined for use in several applications where user experience, interrelations between
users, professors, media, etc., need to be modelled. The main ontologies that compound
the ontology network and its interrelations are shown in Figure 1.
From this figure it can be seen that the design is basically modular, aiming to ease
its comprehension and extensibility. In addition, ARII-eL can be used by several
applications while keeping its basic structure. Another aspect to notice is that the ARII-
eL ontology network is centred on the user and the activities he or she is able to do.
The ontologies described below are formed of a few classes that in some cases inherit
from or reference classes defined in known ontologies, like FOAF, SIOC, SKOS, etc.,
and in other cases define the concepts from scratch.
ARII-eL is simple but this simplicity does not affect its applicability. One of the
ideas behind this design relates to the generation of new classes, properties, annotations,
etc., through the application of rules to current classes, properties, etc. In principle, for
ARII-eL to be applied to different software applications it should only be necessary to:
• Add some application-specific classes or properties; and
• Add a set of semantic rules which will allow the automatic generation of application-tailored information.
This means that if the ARII-eL ontology network has some given information, by
using rules or inferences it is possible to generate complementary information adapted
to the characteristics of a given application. In a way, the ARII-eL ontology network
can be seen as a static model, which, however, has the necessary information for the
generation of a dynamic and adaptable model.

Figure 1. Ontologies of the ARII-eL ontology network and their interrelations.

2.2.2. Description of the Ontologies


First, the User ontology will be described, which represents the application user and the
main related properties: demographic and personal data, education, etc. It contains the
User class, which inherits from the foaf: Person class. This means it has all the
properties defined by the parent class, but some new properties have been added: has
Activity, has Competence, has Goal, has Group, has Interest, has Preference. These are
object properties because they reference instances of other classes. These properties
relate the User ontology with the ontologies in Figure 2.
The Social ontology models user communities; it represents user membership and
the goals associated with a user community, and it provides an identity for the
community as a whole. This ontology contains the class Team, which inherits from the
foaf: Group class.
Regarding the Competence ontology, it is important to note that an actual
implementation of it depends a lot on the target application. In consequence a simple
definition of the main classes has been made. First, the Competence class is defined,
which is the parent class of the Cognitive Competence, Physical Competence and
Technical Competence classes. These classes are very simple; the idea is that
annotations related to acquired competences can be generated with semantic rules
which depend on the characteristics of the given application.
The Goal ontology models the personal and professional objectives which
motivate a person to use a software application. Here can be found the Goal class,
which is the parent of the Educational Goal, Social Goal and Thematic Goal classes.
User goals can determine which educational resource should be recommended to the
student. A change in user goals could imply a modification in the recommendation
strategy from that moment on. This characteristic is common to the Interest ontology,
which focuses on the professional and/or education interests of the user. This ontology
models these interests in terms of, among other characteristics, the knowledge the user
has of the domain. It is clear that an expert and a beginner will not have the same
interests, and this difference should be taken into consideration by the user model.
The Domain ontology models the knowledge domain. It can be incremented with references to ontologies specific to a given domain. This means that this ontology is a kind of connector, and its definition will depend on the application and the available domain ontologies.
The Activity ontology models the activities performed by the user. These activities can include social interactions, media interaction, etc. This ontology is also related to the domain ontology, a media ontology (if necessary), and an ontology related to user interaction. The goal is to annotate any activity performed by the user; to achieve it, any number of ontologies which model the user activities can be connected.
The Preference ontology has been defined to model all user preferences but those related to accessibility. The Preference class is defined as the parent class of classes like Cognitive Preference and Learning Preference. The Accessibility ontology contains the Accessibility Preference class, which inherits from Preference and is the parent of the classes Content Accessibility, Context Accessibility, Control Accessibility and Display Accessibility. Audio Accessibility, Video Accessibility, Keyboard Accessibility, etc., inherit from Content Accessibility. Other classes could be added and existing ones incremented according to the needs of an application. It does not make sense to define a complete accessibility ontology, with all the associated effort, and then to use a minimal part of it. So, a simple ontology should define basic concepts and create a framework for future additions and modifications.
The modular definition of ARII-eL allows any ontology to be disconnected where
it is not useful for a given application. This should not affect the rest of the network.
This flexibility should boost applicability to other software projects.

2.2.3. Synthesis of the ARII-eL Ontology Network


The ontology network proposed here has been created with the objective of being used in two projects currently under development, but it is general enough to be useful for any
enhanced learning technology project. This is because ARII-eL has the minimum
necessary structure to represent a user model. However, the defined ontology network
is functional and can be used as the starting point and kernel of any user model.
First, it should be mentioned that the user is the centre of the ontology network and a selection of the main properties related to the user has been provided. In addition, the class representing the user is related to other supporting classes that complete the user information. Any of these classes may be incremented or decremented independently of the others, according to the application requirements. But the main
classes and the relations between them should be invariable from project to project.
One design goal has been that the ontology network should ease the representation of personalized learning. Each user is represented with relations independent of the rest
of the users. The existence of an ontology focused on social communication, the Social
ontology, allows the representation of the interactions between users. This
representation can be more or less detailed depending on the project. For example, if it
is necessary to do a semantic search or to add data to the user-generated information,
the related metadata can be added to the ontology.
Analyzing ARII-eL with regard to the classification of user models, it is possible to reach the following conclusions:
• Canonical vs. individual model. The ontology network presented here has been designed to be employed as an individual model; it would be underutilized in a canonical model. However, this does not mean that it cannot be used to represent a canonical model; only that its potential would not be fully exploited.
• Explicit vs. implicit model. This classification of the models, in principle, does not imply a difference regarding the information that can be represented in a user model oriented to either of these models. However, an implicit model
with semantic rules to generate new information should be less complex than an explicit model. In any case, the ARII-eL ontology network is applicable to either of these approaches, and it should be considered that at present almost all models are a mix of both approaches.
• Short term vs. long term. It is clear that this aspect should not affect the ARII-eL ontology network at all. The only difference concerns the persistence strategy: if the information is stored in working memory, then it is a short-term model; if the information is stored on disk, then it is a long-term model.
• Another advantage of ARII-eL is how easy it is to extend. This is because of its modularity, which also gives a high degree of comprehensibility to the ARII-eL ontology network. These characteristics should increase the possibilities of applying the ontology network to a range of projects.

3. iLIME: An Application Case of the ARII-eL User Model

3.1. The iLIME Project

At present, the ARII-eL ontology network is being used as part of the project iLIME, a
software engine based on a conceptual and personalized learning model, L.I.M.E.,
which is based on four ponderable categories: Learning (L), Interaction (I), Mentoring
(M) and Evaluation (E) [10]. These contributions are the pillars for any learning
scenario in their formal and informal settings. Our approach provides the student with
adaptive tutoring and support via a simple and fine-grained, configurable rule system. In
addition, students are monitored along the course of their interaction through the
eLearning platform, which efficiently gathers necessary inputs [11, 32] like actions,
decisions, grades, communication, and so on. By combining rules, tracking data,
categories and settings, students finally get personalized counselling about their
academic path.
The model also provides added value over other recommender approaches [13, 34] in online education by delegating to the teacher/tutor/manager the following
actions: a) design of the rule set; b) distribution of a percentage contribution to each
category and setting; and c) configuration of site inputs and monitoring strategies. In
short, iLIME is a tutor-assisted framework for student guidance; as with other
recommender systems, its goal is to improve learning efficiency. The Recommendation
engine in iLIME, Meta-Mender [15], provides the management of the information and
knowledge of the user, which becomes the basis for adaptive recommendation. This
user tracking is taken from the Learning Management System, which hosts a knowledge database to be used by expert users.
In the case of the iLIME project, the ARII-eL ontology network offers support for
highly specialized communities. The main feature of this type of communities is, in addition to the inherent specialization from resident students to highly experienced
surgeons, the almost complete absence of time that individuals have to share within the
community. So, in this case implicit techniques are fundamental in order to get all
available information from the user while trying to interfere as little as possible in the
user’s interaction with the application. In this case the application of semantic inference
rules is the key factor to personalize the user model. Another important factor is the
presence of a massive number of media, the majority of them videos of surgical
operations. The low availability of time implies that users will in most cases look for
specific knowledge while skipping less interesting material; comments and posts added
by experienced users will also be of great importance, so the learning will be in a large
part informal. Since ARII-eL is conceived as a user model for applications with
an important component of adaptive, informal and interactive eLearning, it is suitable
for the iLIME application.

3.2. Design of the Application Case to Evaluate the User Model

In order to run a validation of the ontology and the user model, we designed and
implemented an application case (learning scenario) of the ARII-eL ontology network
applied to the iLIME project from 2 to 29 July 2012. We used a graduate course (in
Spanish) on “Design and management of research projects”, in the Master of Science in
eLearning and Social Networks, an online, official Master’s degree at Universidad
Internacional de La Rioja (UNIR). This course took place over four weeks, with 49
enrolled students. All the students but one took part in the experiment. Therefore, we
had 48 graduate students, between 35 and 45 years old, from two countries (Spain, 45
students; Colombia, three students) and two continents (Europe, South America), with
a gender distribution of 28 females and 20 males. The support group consisted of a
teacher, an online tutor, an academic coordinator, and a director. In addition, other
cross-support departments might have provided some assistance (i.e., administrative,
legal, counselling, research, library, etc.). The environment was executed for every user
only if (s)he agreed with the terms described in a formal document, so that the
recording and tracking of their private data were explicitly authorized.
We split the base group into two equally distributed groups (24 members for each
group). Group A (experimental) was engaged with the ARII-eL ontology network and
received personalized recommendations based on a number of inputs, including
traditional ones (e.g., teacher, tutor, admin staff). Group B (control) followed the
course without ARII-eL, and received traditional support only. The distribution of
learners between Group A and Group B was based on previous academic records.
The aim of this implementation was to prove the validity of the conceptual model
in a self-contained way. It was not our purpose to insert any disruptive element into the
development of a subject along a timeline to show significant progress in learning
assets or results in relation to every learner. On the contrary, we tested the model in a
split classroom to retrieve and analyse the learners’ track records on Learning,
Interaction, Mentoring and Evaluation, so that we might demonstrate whether or not
the conceptual model was a valid option to provide personalized feedback that might
lead to an increase in user performance.

3.3. Evaluation Results of the Application Case

The application of the ARII-eL user model to the described scenario showed a clear
and positive progress of the users in Group A, those who received recommendations
supported by ARII-eL. The overall average of inputs, categories and students shows a
final positive difference of 10.53% between the experimental group and the control
group (66.72% - 56.19%), in addition to a maximum difference between extremes of 37.37% (81.41% - 44.04%).
After the implementation of the learning scenario we distributed a questionnaire
designed for evaluation by users of the learning scenario. We collected responses from
21 users from the experimental group (Group A, n=21).
Scores followed a modified Likert scale, from 1 (strongly disagree) to 5 (strongly
agree); 0 meant “completely against”. The questionnaire combined five categories for a
total of 12 questions. Categories and Questions are shown in Figure 2.

Figure 2. Evaluation by end-users of the ARII-eL user model and the iLIME project. Score by question and
category
The questionnaire was aimed at extracting useful information from the guided
users in Group A, those who received the recommendations provided by the LIME
model.
The results of this survey show a clear approval of the recommendation approach
and a strong influence on personal performance. The overall average of 3.95 out of a
maximum of 5 shows 79.05% positive feedback (Figure 2).
By category, Content shows the highest score with 4.36 points out of 5 (87.14%).
Adaptation shows the lowest score with 3.62 points out of 5 (72.38%). The highest-
scored question was #8 (Content: “The recommendation length, was it appropriate?”)
with 89.52%. However, the lowest score was not only in the lowest category (#10,
Adaptation: “Do you think that the recommendation took all the related factors of your
contribution to the subject?”) with 65.71%, but also in the category Usefulness (#1,
Usefulness: “Would you implement this recommendation model as a general service at
this university?”), with 65.71%. The other best-scored questions were #7 (Performance:
“Was the provided recommendation accurate and did it provide good counselling?”,
86.67%), #9 (Content: “The recommendation, was it properly written and easy to
understand?”, 84.76%), and #4 (Usability: “Did the recommendation focus your
attention on the screen?”, 83.81%).
These results outline a number of insights. Firstly, most of the users approved of
the user model and the experience. They found the learning scenario a valid model to
apply, which was useful for their learning experience. In addition, they found the
provided recommendations and their presentation on the screen to be appropriate and
accurate. However, we conclude that the concept of adaptation might not have been
completely understood. Since the final scores of the evaluation questionnaire seem
lower in questions related to adaptation, we think users’ expectations were not
completely met. Since they liked the model, the system and the experience, it is quite
likely that the definition of adaptation and/or the definition of the guidance provided
was not explained well enough. Nevertheless, questions on Adaptation mostly scored 3, 4 or 5, which is a result with room for improvement but, in the end, still quite remarkable.

4. Conclusions and Future Work

In this paper, the authors have presented an ontology network for user modelling
focused on Adaptive, Informal and Interactive eLearning. The developed ontology
network is simple, modular and flexible. The simplicity comes from the fact that each
class has the most important properties necessary to represent the acquired knowledge.
The relations between classes are clear without large inheritance hierarchies. ARII-eL
is modular, because each ontology is functional by itself, and does not depend on other
ontologies to express its concepts and the relations between them. Each ontology
contains a set of classes with clear relations to each other. Finally, the flexibility is
explained by the facility to extend any class and the fact that new ontologies can be
added at any time without greatly affecting the overall network. Several types of user
models can be implemented with the proposed ontology-based model.
The presented ontology network will help fill a gap in user modelling related to the
support of applications with an important informal learning component. The
importance of social networks as a means to socialize and share knowledge and
experience must be taken into account by developers and designers of educational
applications. The lack of user models supporting this type of learning could influence
the development of applications able to take advantage of this emerging trend.
For validation we designed and implemented a learning scenario at the
Universidad Internacional de La Rioja (UNIR), in the context of the official academic
Master’s programme of Science in eLearning and Social Networks, in July 2012. We
used the subject “Design and management of research projects” and a specific software
implementation called the iLIME project, supported by the LIME conceptual model.
The scope of this scenario dealt with scheduled, regular activities (e.g., knowledge
tests), and informal learning activities (e.g., user interaction in group debates), up to 30
various inputs. Over four weeks we took weekly measurements (milestones M8, M15,
M22, M29) from two groups of 24 students: experimental (A) and control (B).
The results of the application case showed positive progress over the four weeks,
with a final positive difference of 10.53% between the groups, and a maximum
difference of 37.37%, favouring the experimental group receiving support from ARII-
eL. In addition, we distributed an online questionnaire among the members of the
experimental group (A). The results showed clear support, with 79.05% satisfaction
among the 21 respondents. The results were concentrated on the responses 4 and 5
(73.40%) on a modified Likert scale from 1 (completely disagree) to 5 (completely
agree), including value 0 (completely against). The survey grouped a total of 12
questions in five categories: Usefulness, Usability, Performance, Content, and
Adaptation.
These results, from the users’ performance and questionnaires, represent tangible
proof of the success of the ARII-eL user model and the implementation in the iLIME
application, based on a large number of objective measurements. Therefore, they back
up the conceptual design with practical experience.

5. Acknowledgements

We thank Emmanuel Jamin and Vicente Romero for their contribution to the original
conceptual work on the design of the ontology and the Meta-Mender recommendation
system. This work is partially funded by UNIR Research (http://research.unir.net),
Universidad Internacional de La Rioja (UNIR, http://www.unir.net, Spain), under the
Research Support Strategy [2013-2015], Research Group TELSOCK on Technology-
enhanced Learning and Social Networks.

References

[1] S.E. Middleton, N.R. Shadbolt and D.C. De Roure, Ontological user profiling in recommender systems.
ACM Transactions on Information Systems (TOIS), 22(2004), 54-88.
[2] R. Denaux, V. Dimitrova and L. Aroyo, Interactive Ontology-Based User Modeling for Personalized
Learning Content Management, AH 2004: Workshop Proceedings Part II, 338 – 347, 2004
[3] RDF, RDF: http://www.w3.org/RDF/
[4] RDFS, RDFS: http://www.w3.org/TR/rdf-schema/
[5] OWL, http://www.w3.org/TR/owl-features/
[6] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
[7] SWRL, http://www.w3.org/Submission/SWRL/
[8] J. Breslin, A. Harth, U. Bojars and S. Decker, Towards semantically-interlinked online communities.
Proceedings of the 2nd European Semantic Web Conference, 2005
[9] FOAF, http://xmlns.com/foaf/spec/20100101.rdf
[10] SIOC, http://rdfs.org/sioc/spec/


[11] D. Burgos, L.I.M.E. A recommendation model for informal and formal learning, engaged. International
Journal of Interactive Multimedia and Artificial Intelligence, 2(2013), 79-86.
[12] D. Burgos, C. Tattersall, and R. Koper, How to represent adaptation in eLearning with IMS Learning
Design, Interactive Learning Environments, 15(2007), 161-170.
[13] J. J. Rocchio, Relevance feedback in information retrieval, in the SMART Retrieval System.
Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice Hall, Inc., 1971.
[14] G. Linden, B. Smith, and J. York, Amazon.com recommendations: Item-to-item collaborative filtering,
Internet Computing IEEE, 7 (2003), 76-80.
[15] B. Marlin, Modeling user rating profiles for collaborative filtering, in: S. Thrun, L.K. Saul and B. Schölkopf (Eds.), Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2003, 627-634.
[17] V.A. Romero, D. Burgos, Meta-Mender: a Meta-rule Based Recommendation System for Educational
Applications. Procedia Computer Science, 1(2) (2010), 2877-2882,
Fuzzy Systems and Data Mining II 519
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-519

Early Prediction of System Faults


You LI a, Yu-Ming LIN b,1
a Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin City, Guangxi Province, 541004, China
b Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin City, Guangxi Province, 541004, China

Abstract. A system produces massive status data during its runtime, which contain rich status information. In this work, we aim to detect system faults as early as possible based on system status data sequences. Firstly, we formalize system fault detection as a classification problem, in which different types of status data are integrated to reflect the system status. Secondly, we devise a detection method to predict the class of a status sequence before its full length is available. Finally, a series of experiments is conducted to verify the proposed method's effectiveness.

Keywords. Fault detection, Early prediction, Data sequence

Introduction

Over the last few decades, data-driven applications have received much attention because the data collection and processing capabilities of computers have improved enormously. For a complex system, massive status data are generated over time, which contain rich information for diagnosing the system's status. Based on such data, it is possible for users to detect the system's faults in time, or even to predict faults before they happen.
In recent years, data-based system fault detection techniques have been proposed to maintain systems, such as in [1,2,3]. More specifically, Neuhaus et al. [4] developed Vulture to analyze the correlation of historical data on source-code modifications, bug reports and software structure by mining the vulnerability database. Alhazmi et al. [5] built a model to predict the number of undiscovered bugs by using the rate of discovering bugs. For the fault detection of software systems and hardware systems, constructing a relationship graph over different artifacts is a common method, as in HIPIKAT [6] and FRAN [7]. Traditional mechanism-analysis diagnosis methods depend on complex nonlinear dynamic mathematical models; therefore, they have limitations in timeliness and accuracy for early fault diagnosis of complicated systems. In fact,
system status data constitutes various data sequences with respect to different systems or
purposes. Generally, the system's status can be presented by some values such as temperatures and CPU workload. These values can be used independently for a single component
of the system, and they can also be merged into one value to reflect the system’s state. For
1 Corresponding Author: Yu-Ming LIN, Guilin University of Electronic Technology, No. 1, Jinji Road, Qixing District, Guilin City, Guangxi Province, 541004, China; E-mail: ymlinbh@163.com
these cases, some burst detection techniques like sliding windows [8] can find the faults
effectively. However, some values like temperatures change slowly, which leads to poor effectiveness for the traditional methods. Further, if faults can be predicted from such early harbingers, the system's risk and damage can be reduced significantly.
In this work, we tackle the problem of predicting the system’s faults on system
status data series. The diagnosis on system’s faults is treated as a classification problem
according to the status data series, in which the label means there is a system fault or
not. In this scenario, a classifier trained by labeled samples can overcome the above
limitations. As time passes, the classifier makes online predictions of system faults for the current status data series. In summary, our work mainly includes the following:
1. We formalize the system fault detection as a supervised learning problem, in
which the objective is to predict the system status accurately as early as possible.
2. A system fault prediction algorithm based on early classification is proposed, by which system faults can be identified before the full length of the series data is available.
3. Extensive experiments are conducted to verify the proposed method's effectiveness.

1. Problem Statement

Assume S = {s1, s2, ..., sn} is the set of status data sequences. For convenience of narration, we list the symbols used in this work in Table 1.

Table 1. Symbols used in this work

Symbol     Description                               Symbol       Description
si         a status data sequence                    Lsi          the label of si
si^j       the j-th status datum in si               si[1...j]    the j-length prefix subsequence of si
Lensi      the length of si                          C            the trained classifier

For each data sequence, we try to predict its class as early as possible. In other words,
we try to find the j as small as possible, where we can predict the data sequence’s class
accurately. Formally, our target can be stated as follows.


$$C^* = \arg\min_{C \in \mathcal{H},\; j \le Len_{s_i}} \sum_{i=1}^{n} loss\bigl(C(s_i[1 \ldots j]),\, L_{s_i}\bigr) \qquad (1)$$

where $\mathcal{H}$ is the set of classification hypotheses, $C(x)$ makes a prediction for the sample $x$, and $loss$ is a predefined loss function such as the 0-1 loss or the hinge loss.

2. Predicting the system’s faults as early as possible

2.1. Digitizing and serializing the system status

The system's status can be reflected by various factors, such as temperatures and workload. However, these factors are measured in different dimensions. For example, the CPU's
temperatures could be 30 degrees Celsius at a certain time point, CPU’s workload could
be 70% at the same time. In order to integrate various factors into one value, the first
thing is to normalize these factors. In this work, we use the Z-score normalization.
Moreover, each factor plays a different role in predicting the system's overall status. Weight parameters are used to regulate the contributions of the different factors, and they need to be tuned by experts experimentally. Thus, a value of the status data sequence can be calculated by the following formula.


$$s_i^j = \sum_{k=1}^{n} \theta_k \, \frac{a_k^j - \mu_k}{\sigma_k} \qquad (2)$$

where $\theta_k$ is the parameter tuning the $k$-th indicator's weight, $a_k^j$ is the $j$-th value of the $k$-th indicator, $\mu_k$ is the mean of the $k$-th indicator's values, and $\sigma_k$ is the standard deviation of the $k$-th indicator.
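As a concrete illustration of Eq. (2), the sketch below (hypothetical function and variable names) applies the Z-score normalization per indicator and then combines the indicators with the expert-tuned weights θ_k:

import numpy as np

def merge_status(readings, theta):
    """Merge raw indicator readings into one status value per time step (Eq. (2)).

    readings: array of shape (T, n) -- T time steps of n raw indicators
    theta:    length-n array of expert-tuned indicator weights (theta_k)
    """
    mu = readings.mean(axis=0)        # mu_k: mean of each indicator
    sigma = readings.std(axis=0)      # sigma_k: standard deviation of each indicator
    z = (readings - mu) / sigma       # Z-score normalization (assumes sigma_k > 0)
    return z @ theta                  # weighted sum -> one merged value per time step

# Example: CPU temperature (deg C) and CPU workload (%) combined with equal weights
readings = np.array([[30.0, 40.0], [32.0, 55.0], [35.0, 70.0], [45.0, 95.0]])
status_sequence = merge_status(readings, theta=np.array([0.5, 0.5]))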

2.2. Early prediction for system status

If the system status is made up of many status data sequences of fixed length, predicting the system status can be treated as a binary classification problem. For example, the label "1" means the system works normally, and "-1" means the system might not work properly. In such a scenario, existing classification methods can be used to make predictions.
The nearest neighbor algorithm is one of the frequently used algorithms for classification, in which a sample's label is determined by the samples close to it. This method is simple, parameter-free and does not require feature selection or discretization. However, it can only make a prediction once the fixed-length sequence has been generated completely, whereas some system statuses can be diagnosed from part of the status data. Based on early classification [9], we propose a technique to predict the system's status as early as possible.
Let $N_{s_i}^l$ be the set of data sequence $s_i$'s nearest neighbors in the training set $Tr$ on $l$-length prefixes, i.e., $N_{s_i}^l = \{t \mid t = \arg\min_{t \in Tr} dist(t[1, l], s_i[1, l])\}$, where $s_i[1, l]$ is the sequence $s_i$'s prefix subsequence of length $l$, and $dist(a, b)$ is a distance function measuring the distance between two data sequences, such as the Euclidean distance. Let $R^l(t)$ be the set of reverse nearest neighbors of $t[1, l]$, i.e., the sequences that treat $t[1, l]$ as their nearest neighbor: $R^l(t) = \{t' \mid t \in N_{t'}^l\}$.
Based on the definitions above and the conclusions in [9], the data sequence $s_i$'s Minimum Prediction Length is $MPL(s_i) = k$ if for any $l$ ($k \le l \le n$), $R^l(s_i) = R^n(s_i) \ne \emptyset$ and $R^{k-1}(s_i) \ne R^n(s_i)$, where $n$ is the full length of the data sequence $s_i$. Then, we can make a prediction for a testing sequence by the 1-nearest-neighbor classification algorithm as soon as its length reaches the MPL value. However, such a method has strict requirements on the stability of a training sample's reverse nearest neighbor set after the time point MPL. Moreover, this method is prone to overfitting the training set.
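The sketch below is a simplified reading of this definition (single Euclidean nearest neighbour per sequence, ties ignored); it computes the reverse-nearest-neighbour sets on prefixes and the MPL of one training sequence, and is meant only to make the definition concrete, not to reproduce the exact procedure of [9].

import numpy as np

def rnn_sets(train, l):
    """R^l(t) for every training sequence t, using Euclidean 1-NN on l-length prefixes."""
    pref = train[:, :l]
    dist = np.linalg.norm(pref[:, None, :] - pref[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)        # a sequence is not its own neighbour
    nn = dist.argmin(axis=1)              # index of each sequence's nearest neighbour
    return [set(np.where(nn == i)[0]) for i in range(len(train))]

def mpl(train, i):
    """Minimum Prediction Length of training sequence i (simplified, cf. [9])."""
    n = train.shape[1]
    full = rnn_sets(train, n)[i]          # R^n(s_i)
    if not full:                          # R^n empty: never stable, wait for full length
        return n
    k = n
    for l in range(n - 1, 0, -1):         # shrink the prefix while R^l stays equal to R^n
        if rnn_sets(train, l)[i] == full:
            k = l
        else:
            break
    return k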
To improve robustness, a sequence's label should be generated by a cluster of samples rather than by one sample. A clustering algorithm can thus be used to group the sequences. The MPL of a cluster G of n-length sequence data is k if for any l ≥ k, $R^l(G) = R^n(G)$, G is 1-nearest-neighbor consistent [10], and for l = k − 1 the first two conditions do not hold simultaneously [9]. A testing sample assigned to a cluster can then be predicted with the dominant label of the cluster as soon as its length reaches the cluster's
minimum prediction length. The dominant label is determined by a user-defined threshold, which measures the proportion of samples carrying a particular label among all samples in the cluster. With such a support threshold, overfitting can also be avoided.
Algorithm 1 (EPSS) shows the details of making early predictions for the system's status sequences. Some symbols are defined as follows: $Tp = \{(s_i, p_{s_i}) \mid s_i \in S$, $p_{s_i}$ is the predicted label of $s_i\}$; $MNN$ is the set of mutual nearest neighbor pairs $(S_i, S_j)$, in which $S_i$ is $S_j$'s nearest neighbor and vice versa. In Algorithm 1, the training phase is described from line 1 to line 10. First, we compute each sample's MPL, as described in lines 1 to 3. The time complexity of this process is $O(N^2 L)$, where N is the number of training samples and L is the full length of the status data sequence. A hierarchical clustering process is carried out from line 4 to line 10, in which each training sequence is initially treated as a leaf cluster. The complexity of the clustering process is $O(N^3 L)$, since we need to compute $N_{s_i}^l$ for every cluster. The prediction phase starts at line 11, in which we check whether the prediction of a sequence can be made at each timestamp l. If $N_{s_i}^l = \emptyset$ (the condition at line 14 is not satisfied), no prediction can be made at this timestamp and we have to wait for more status data.

Algorithm 1 Early Prediction for System Status (EPSS)

Input: a training set Tr, a testing set Tt
Output: a prediction set Tp
 1: for each si ∈ Tr do
 2:     compute the nearest neighbors of all of si's prefix subsequences;
 3:     compute the MPL of si;
 4: compute the set MNN;
 5: while MNN ≠ ∅ do
 6:     for each (Gi, Gk) ∈ MNN do
 7:         merge Gi and Gk into a cluster Gp;
 8:         if all sequences in Gp carry the same label then
 9:             compute the MPL of Gp and update the MPL for each sequence in Gp;
10:     recompute the set MNN with the new clusters and the unmerged clusters;
11: for each si ∈ Tt do
12:     for l = 1 to n do
13:         compute N^l_si;
14:         if N^l_si ≠ ∅ then
15:             p_si = the dominant label of the cluster corresponding to N^l_si;
16:             Tp ← Tp ∪ {(si, p_si)};
17: return Tp;
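For readers who prefer an executable form, the fragment below is a rough sketch of the prediction phase only (lines 11-17), under the simplifying assumptions that the training phase has already produced clusters carrying an MPL and a dominant label, and that the nearest cluster is chosen by the minimum Euclidean distance over prefixes; it is not the authors' implementation.

import numpy as np

def predict_early(seq, clusters):
    """Early prediction for one test sequence (simplified prediction phase of EPSS).

    seq:      1-D numpy array holding the status sequence observed so far
    clusters: list of dicts {'members': array (m, L), 'mpl': int, 'label': int},
              assumed to come from the training phase of Algorithm 1
    Returns (predicted_label, prefix_length_used), or (None, L) if undecided."""
    L = len(seq)
    for l in range(1, L + 1):
        # nearest cluster on the current l-length prefix
        best = min(clusters,
                   key=lambda c: np.linalg.norm(c["members"][:, :l] - seq[:l], axis=1).min())
        if l >= best["mpl"]:              # prefix is long enough for this cluster to be trusted
            return best["label"], l       # early decision at length l
    return None, L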

3. Experiments

We constructed a data set to verify the proposed method's effectiveness, based on computers' status data for different factors including CPU temperature, CPU workload, hard disk rotational speed, graphics card temperature and memory usage. We measured and recorded the values of the above factors every five seconds.
According to Eq. (2), we merged the above factor values into one value to describe the system's status. Further, we constructed each system status sequence from
fifty continuous merged values. Each sequence was labeled as "normal" or "abnormal"
manually. In total, the data set is made up of 6000 labeled system status sequences, in
which there are 644 samples labeled as "abnormal".
To measure the effectiveness of system fault detection, the accuracy (ACC) is treated as an evaluation indicator which focuses on the proportion of correct predictions. However, it is not enough to consider the prediction accuracy alone, because the abnormal sequences are a tiny proportion of all sequences: if we predicted all sequences to be normal, the prediction accuracy would still be very high, but completely useless for users. Our target is to find the abnormal sequences. Therefore, we treat the precision (Pf), recall (Rf) and F1-score (F1) on abnormal sequences as the important indexes.
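A small sketch of how these class-specific indexes can be computed from predictions is given below; the label encoding (-1 for "abnormal", mirroring the example in Section 2.2) is an assumption.

def abnormal_metrics(y_true, y_pred, abnormal=-1):
    """Precision (Pf), recall (Rf) and F1-score (F1) computed on the abnormal class only."""
    tp = sum(t == abnormal and p == abnormal for t, p in zip(y_true, y_pred))
    fp = sum(t != abnormal and p == abnormal for t, p in zip(y_true, y_pred))
    fn = sum(t == abnormal and p != abnormal for t, p in zip(y_true, y_pred))
    pf = tp / (tp + fp) if tp + fp else 0.0
    rf = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pf * rf / (pf + rf) if pf + rf else 0.0
    return pf, rf, f1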
In the first experiment, we compared EPSS with the full-length 1-nearest neighbor (1NN); the results are shown in Table 2. Both EPSS and 1NN make correct predictions for most samples. However, 1NN cannot make a prediction until the status data have been generated completely, whereas EPSS's average prediction length is 39.61. This is very important for system maintenance, because the system's status is known earlier. Especially in the case of a system fault, the maintainers can diagnose and troubleshoot the system earlier, which significantly reduces the risk caused by the fault. Moreover, we investigated the predictions on samples labeled as "abnormal", since such samples are crucial for the system and make up a tiny percentage of all samples. We find that EPSS also achieves high prediction precision and recall with a small prediction length, which means the proposed method is effective at predicting the system's faults.

Table 2. Prediction effectiveness comparison between the 1-nearest neighbor (1NN) and EPSS

          ACC      AVG prediction length    Rf       Pf       F1
1NN       0.995    100                      0.991    0.968    0.979
EPSS      0.991    39.61                    0.951    0.965    0.958

The first experiment shows that EPSS works well given enough samples. The second experiment focused on the influence of the number of training samples, which was increased gradually from 200 to 1000; Table 3 reports the changes in EPSS's effectiveness. Overall, the prediction effectiveness improves with the constant increase of the training sample number. Thus, as more and more training samples are collected, the proposed method achieves better effectiveness for system status prediction. On the other hand, the third line of Table 3 shows the average prediction time for one sample. We can see that the prediction process is approximately real-time, which further helps system managers to diagnose the system status as early as possible.

Table 3. Prediction effectiveness with different numbers of training samples

# of training samples    200     300     400     500     600     700     800     900     1000
ACC                      0.963   0.972   0.977   0.978   0.983   0.984   0.987   0.987   0.991
prediction time (sec.)   0.013   0.022   0.031   0.038   0.055   0.060   0.069   0.071   0.078
Rf                       0.874   0.881   0.914   0.943   0.925   0.932   0.953   0.949   0.951
Pf                       0.802   0.867   0.880   0.867   0.892   0.922   0.929   0.937   0.965
F1                       0.836   0.874   0.897   0.904   0.908   0.927   0.940   0.943   0.958
4. Conclusion

It is very important for maintainers to predict and diagnose the system's faults early, which would reduce the risk significantly. In this work, we treated system status prediction as a sequence classification problem. Then, we proposed a framework to predict system faults, by which status predictions can be made as early as possible. Moreover, we constructed a real-world data set and carried out a series of experiments to verify the proposed method's effectiveness. As future work, we will further reduce the proposed method's sensitivity to noisy data and explore a parallel solution for system status prediction based on the Map-Reduce framework.

Acknowledgment

This work is supported by the Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (No. YQ14109), the NSFC grants (No. 61562014, U1501252 and 61262008), the Guangxi Natural Science Foundation (No. 2015GXNSFAA139303),
the project of Guangxi Key Laboratory of Trusted Software, the high level of innovation
team of colleges and universities in Guangxi and outstanding scholars program funding,
and program for innovative research team of Guilin University of Electronic Technology.

References

[1] A S. Raj, N. Murali, Early classification of bearing faults using morphological operators and fuzzy
inference, IEEE Transactions on Industrial Electronics, 2 (2013), 567–574.
[2] S. Yin, G. Wang, H R. Karimi, Data-driven design of robust fault detection system for wind turbines,
Mechatronics, 4 (2014), 298–306.
[3] N. Subrahmanya, Y C. Shin, A data-based framework for fault detection and diagnostics of non-linear
systems with partial state measurement, Engineering Applications of Artificial Intelligence, 1 (2013),
446–455.
[4] T. Zimmermann, P. Weissgerber, S. Diehl, A. Zeller, Mining version histories to guide software changes,
IEEE Transactions on Software Engineering, 6 (2005), 429-445.
[5] O. Alhazmi, Y. Malaiya, I. Ray, Security vulnerabilities in software systems: A quantitative perspective,
IFIP Annual Conference on Data and Applications Security and Privacy, 2005, 281–294.
[6] D. Čubranić, G. C Murphy, J. Singer, et al., Hipikat: A project memory for software development, IEEE
Transactions on Software Engineering, 6 (2005), 446–465.
[7] Z.M. Saul, V. Filkov, P. Devanbu, C. Bird, Recommending random walks, Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, 2007, 15–24.
[8] Y. Zhu, D. Shasha, Efficient elastic burst detection in data streams, the ninth international conference
on Knowledge Discovery and Data Mining, 2003, 336–345.
[9] Z. Xing, J. Pei, S Y. Philip, Early Prediction on Time Series: A Nearest Neighbor Approach, Twenty-first
International Joint Conference on Artificial Intelligence, 2009, 1297–1302.
[10] C. Ding, X. He, K-nearest-neighbor consistency in data clustering: incorporating local information into
global optimization, Proceedings of the 2004 ACM symposium on Applied computing, 2004, 584–589.
Fuzzy Systems and Data Mining II 525
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-525

QoS Aware Hierarchical Routing Protocol


Based on Signal to Interference plus Noise
Ratio and Link Duration for Mobile Ad
Hoc Networks
Yan-Ling WU a,1, Ming LI b and Guo-Bin ZHANG b
a School of Computer, Dongguan University of Technology, Dongguan, Guangdong, 523808, China
b School of Electronic and Engineering, Dongguan University of Technology, Dongguan, Guangdong, 523808, China

Abstract. Though node mobility allows quick network setup for Mobile Ad Hoc
Networks (MANETs), it can also lead to route failures when nodes move improperly. Therefore, how to maintain the reliability of routes has always been a challenge in MANETs. Since routes are composed of relay nodes hop by hop, how to select reliable relay nodes is of great importance. Existing research on routing protocols mainly uses hop count or Received Signal Strength (RSS) as the metric, which ignores the interference from other nodes and cannot give a good indication of link quality. To address this problem, we propose an extension of the OLSR protocol, named QoS aware Hierarchical Routing Protocol (QHRP), obtained by replacing its relay-node selection policy. To be specific, we use a new metric combining the estimated signal to interference plus noise ratio (SINR) and the link duration (LD) instead of the number of hops used in OLSR. Since the SINR considers the accumulative interference from neighbor nodes, and the LD considers the lifetime of each link, routes generated by QHRP are expected to be more reliable. Extensive simulations show that QHRP achieves an outstanding performance in terms of route calculation, overhead and packet drop ratio.

Keywords. MANETs, QoS, Routing Protocol, SINR, LD.

Introduction

In Mobile Ad hoc Networks (MANETs), wireless mobile nodes dynamically establish routes among themselves without fixed network infrastructure or centralized administration.
The existing literature divides routing protocols for MANETs into two categories: table-driven solutions and source-initiated on-demand solutions [1]. The
former periodically exchange the routing information among all nodes to maintain a
route from one node to another node. Data can be immediately delivered from source
node to destination node when necessary, such as Destination-Sequenced Distance-
Vector (DSDV) [2] and Wireless Routing Protocol (WRP) [3]. However, they consume
too much bandwidth even though the nodes never move. The latter establishes the route

1 Corresponding Author: Yan-Ling WU, Dongguan University of Technology, Dongguan, Guangdong, 523808, China; E-mail: wu_yanling@hotmail.com.
only when desired. Once a route has been created, it is maintained by a route
maintenance procedure during data delivery, such as Dynamic Source Routing (DSR)
for Mobile Ad Hoc Networks for IPv4 [4] and Ad hoc On-demand Distance Vector
(AODV) [5]. However, the latency to determine a route can be quite significant if there is no available route between the source node and the destination node.
Quality of Service (QoS) is always a challenge in MANETs due to interference
and node’s mobility. Throughput, end to end latency, lifetime, available bandwidth and
packet delivery ratio are usually evaluated as the key QoS parameters [6-23]. In this
paper, we propose an extension of OLSR protocol, named QoS aware Hierarchical
Routing Protocol (QHRP), by replacing its relay-node selection policy. To be specific,
we use a new metric combining estimated signal to interference plus noise ratio (SINR)
and link duration (LD) instead of the number of hops used in OLSR. Since SINR
considers the accumulative interference from neighbor nodes, and LD considers the
lifetime per link, it is expected that paths generated by QHRP can be more reliable.
Only nodes with an SINR higher than a given threshold can be candidates for relay nodes, and the candidate with the longest LD wins the selection. Similar to OLSR,
once the relay nodes (hereafter renamed as parent-nodes in QHRP) are elected
according to our new policy, Topology Control (TC) messages are generated and
diffused by parent-nodes in the network. This strategy allows more reliable routes to be established and eliminates most of the flooding overhead in MANETs.
The rest of paper is organized as follows. In Section 1, related works are reviewed.
The details of QHRP are given in Section 2, followed by the performance evaluations
in Section 3. Finally, we conclude this paper in Section 4.

1. Related Works

In this Section, we review the existing routing protocols for MANETs from the perspective of QoS, and summarize them into two categories: strategies based on reservation of available bandwidth [7, 8], [18], [24, 25], and strategies based on link state [15, 16], [19, 20], [23].
Guimarães et al. [7] introduced a mechanism called Bandwidth Reservation over
Ad hoc Wireless Networks (BRAWN), where the available bandwidth was calculated
by each node of the network to estimate the suitable rate allocation. A cross-layer
routing protocol was proposed, which applied two different methods “Listen” and
“Hello” to estimate residual bandwidth, then integrated a QoS-aware mechanism into the route discovery procedure and provided feedback to the application [8]. Lei et al. [18] proposed an available bandwidth estimation method considering concurrent transmissions for MANETs.
Rubin and Liu [15] studied in depth the statistical properties of link stability in
four movement patterns, i.e. Random Destination, Random Walk, Random Movement
and Swarm Movement patterns, respectively. The lifetime distribution of links was
analyzed and different models for different movement patterns were developed. Al-Akaidi et al. [19] proposed three schemes to offer a novel mechanism for establishing
and maintaining routes based on-demand routing technique. The heading direction
information was applied instead of GPS information which was not always available
(e.g. in the underground). A probability-based mechanism was presented which could make an accurate estimation of the links' stability [20]. The signal strength variation
was applied as a main indicator of the nodes’ mobility. The proposed mechanism
estimated the stability and the fidelity of nodes which were integrated in OLSR.
The previous strategies attempted to provide mechanisms to guarantee QoS in MANETs, but they present certain problems: i) the available-bandwidth reservation protocols always suffered from the problem of flooding messages; ii) most link-state protocols gave a higher priority to route reliability, which may result in overusing some nodes along these links, which would then fail quickly.
In our contribution, we design a new strategy QHRP which eliminates most of
broadcasting messages through election of parent-nodes. Furthermore, the election of
parent-nodes is achieved considering LD and SINR instead of Received Signal Strength
(RSS), which is not suitable as a criterion due to the existing interference in the networks. Simulations are performed to demonstrate the effectiveness of our contribution.

2. System Model

In QHRP, we assume that:

(1) the free space propagation model is implemented, where the RSS depends only on the distance between the communicating pair and the transmission power;
(2) all nodes are synchronized;
(3) a GPS system is implemented in all nodes;
(4) the coverage of the MANET is a circle whose radius is denoted as R.

2.1. Calculation of SINR and LD

The network topology is illustrated in Figure 1.

Figure 1. Network topology (nodes e–n within a circular coverage area of radius R centred at (0, 0))


For a given node j, the SINR of one of its neighbors i can be expressed as:

$$SINR_i = \lg \frac{RSS_i}{\sum \text{RSS excluding the one from } i} \qquad (1)$$

As mentioned above, the free space propagation model is implemented in QHRP. Therefore, for a given node j, the Received Signal Strength from i (denoted as RSS_i) can be written as follows:
$$RSS_i = \frac{P_{T_i}\, G_i\, G_j\, \lambda^2}{(4\pi)^2\, d_{i,j}^2\, L} \qquad (2)$$

where $P_{T_i}$ is the transmission power of node i, $G_i$ and $G_j$ are the antenna gains of i and j, respectively, and $\lambda$, $L$ and $d_{i,j}$ represent the wavelength, the system loss, and the distance between i and j, respectively.
Based on Eqs. (1) and (2), for the node j, the $SINR_i$ can be written as:

$$SINR_i = \lg \frac{P_{T_i}\, G_i\, G_j\, \lambda^2 / \bigl((4\pi)^2\, d_{i,j}^2\, L\bigr)}{\sum \text{RSS excluding the one from node } i} \qquad (3)$$
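A small numerical sketch of Eqs. (1)-(3) follows; the function names are illustrative assumptions, and the noise term is omitted exactly as in the formula above.

import math

def free_space_rss(p_tx, g_tx, g_rx, wavelength, dist, sys_loss=1.0):
    """Received signal strength under the free space model, Eq. (2)."""
    return (p_tx * g_tx * g_rx * wavelength ** 2) / ((4 * math.pi) ** 2 * dist ** 2 * sys_loss)

def sinr_of_neighbor(rss_i, rss_all):
    """SINR of neighbour i at node j, Eqs. (1)/(3): received signal over the summed
    interference from all other received signals (base-10 log, noise omitted)."""
    interference = sum(rss_all) - rss_i
    return math.log10(rss_i / interference)

# Example: node j hears three neighbours at different distances (2.4 GHz -> lambda ~ 0.125 m)
rss_values = [free_space_rss(1e-3, 2.0, 2.0, 0.125, d) for d in (50.0, 80.0, 120.0)]
sinr_first = sinr_of_neighbor(rss_values[0], rss_values)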
Since the coordinates of each node are known in QHRP, the LD between two one-hop neighbor nodes i and j can be estimated following the proposition introduced by Su et al. [26]. Let $v_i$ and $v_j$ be the speeds, and $\theta_i$ and $\theta_j$ ($0 \le \theta_i, \theta_j < 2\pi$) be the moving directions of i and j.

$$LD_{i,j} = \frac{-(ab + cd) + \sqrt{(a^2 + c^2)\, r^2 - (ad - bc)^2}}{a^2 + c^2} \qquad (4)$$

where r is the transmission range, $a = v_i \cos\theta_i - v_j \cos\theta_j$, $b = x_i - x_j$, $c = v_i \sin\theta_i - v_j \sin\theta_j$, and $d = y_i - y_j$, respectively.
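Eq. (4) translates directly into code; the sketch below assumes both nodes report position, speed and heading (e.g. from GPS, per assumption (3) above) and returns infinity when the two velocity vectors are identical, since such a link never expires.

import math

def link_duration(xi, yi, vi, ti, xj, yj, vj, tj, r):
    """Estimated link duration LD_{i,j} between one-hop neighbours i and j, Eq. (4)."""
    a = vi * math.cos(ti) - vj * math.cos(tj)
    b = xi - xj
    c = vi * math.sin(ti) - vj * math.sin(tj)
    d = yi - yj
    if a == 0 and c == 0:                 # same velocity vector: the link never breaks
        return float("inf")
    return (-(a * b + c * d) + math.sqrt((a ** 2 + c ** 2) * r ** 2 - (a * d - b * c) ** 2)) \
           / (a ** 2 + c ** 2)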

2.2. Election of Parent-Node

In QHRP, the HELLO message is extended to carry the SINR and the LD between one-hop neighbors. The nodes whose SINR is greater than SINRthr are considered as candidates for parent-node. Once the candidates are determined, the one with the longest LD is elected as the parent-node.
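The election rule itself can be summarised in a few lines; this sketch assumes each HELLO entry has already been parsed into an (address, SINR, LD) tuple, which is an implementation detail not fixed by the paper.

def elect_parent(neighbors, sinr_thr):
    """Elect a parent-node: keep neighbours whose SINR exceeds the threshold,
    then choose the candidate with the longest link duration (LD).

    neighbors: iterable of (address, sinr, ld) tuples taken from received HELLO messages
    Returns the elected address, or None if there is no candidate."""
    candidates = [n for n in neighbors if n[1] > sinr_thr]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n[2])[0]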
Reserved Htime Willingness
Link code Reserved Link Message Size
Interface Address of g
SINR of g LDi, g
Interface Address of h
SINR of h LDi, h
Interface Address of e
SINR of e LDi, e
Interface Address of j
SINR of j LDi, j
Figure 2(a). Format of the HELLO message sent by node i
Reserved Htime Willingness


Link code Reserved Link Message Size
Interface Address of g
SINR of g LDe, g
Interface Address of h
SINR of h LDe, h
Interface Address of l
SINR of i LDe, i
Interface Address of j
SINR of j LDe, j
Figure 2(b). Format of the HELLO message sent by node e
The format of the HELLO message is illustrated in Figure 2. As illustrated in Figure 2, the nodes g, h, e and j are considered as candidates for the parent-node of node i, since their SINR values are greater than SINRthr. Note that node l is not considered as a candidate for the parent-node of i, since its SINR is below SINRthr. For the same reason, node k is not considered as a candidate for the parent-node of e.

2.3. Advertisement of Parent-Node

Once the parent-node is determined, the Parent_Update message is generated and sent to the parent-node by the children-nodes. The Parent_Update message allows each parent-node to collect information about its children-nodes. Afterwards, information is exchanged between each pair of children-node and parent-node during Htime.
The format of Parent_Update message is illustrated in Figure 3.

Parent_Election_Timer    Htime    Reserved
Interface Address of children-node
Interface Address of parent-node

Figure 3. Format of the Parent_Update message


The value of ‘Parent_Election_Timer’ is set to be smaller than Htime to ensure that
the Parent_Election procedure can be re-initiated before the appearance of next
HELLO. In the simulations, it is set to two-thirds of Htime. ‘Interface Address of
children-node’ and ‘Interface Address of parent-node’ represent the address of an
interface of children-node and its parent-node, respectively.
2.4. Topology Discovery

In QHRP, the main function of parent-nodes is relaying the messages between nodes
and establishing the suitable routes to the destination node. Each parent-node
periodically diffuses the list of its children-nodes by broadcasting TC messages which
allow creating and maintaining the routing information in the network. The TC
messages can only be generated and reconstructed by parent-nodes. A modified TC message, which includes the SINR and the LD, is introduced in QHRP. This modification helps to discover more stable routes.
The format of modified TC is illustrated in Figure 4.

ANSN Reserved
Advertised Children-node Main Address
SINR LD
Advertised Children-node Main Address
SINR LD

Figure 4. Format of TC message


Each parent-node introduces the main address of the set of all children-nodes in the
field ‘Advertised Children-node Main Address’. The relevant SINR and LD are
introduced in the field ‘SINR’ and ‘LD’, respectively. These two fields are updated
based on the current Parent_Update message. The field ‘Reserved’ is set to 0. The other fields are kept the same as defined in the OLSR specification.

3. Performance Evaluation

Kitasuka and Tagashira [27] proposed a method (denoted as shared MPR) to reduce the traffic by finding a more efficient MPR set that decreases the MPR ratio, which is defined as the number of MPR nodes divided by the total number of nodes in the network. Consequently, the number of generated TC messages decreases, and the overhead in the network is reduced. To reduce the number of TC messages, the MPR computation algorithm is modified: each node tries to select as an MPR a node that has already been selected as an MPR by its neighbors, provided the size of the resulting MPR set is not larger than that of the conventional MPR selection.
To demonstrate the effectiveness of QHRP, the performance evaluation is achieved
in comparison with the shared MPR selection.
The main simulation parameters are illustrated in Table 1.
Simulations are carried out with Matlab 7.0.4 and evaluate:
• overhead: the ratio of the number of control packets to the total number of packets generated in the network;
• packet drop ratio: the ratio of the number of lost packets to the total number of packets generated in the network.
Table 1. Simulation parameters

Parameter Value
network size 1000 m × 1000 m
mobility model RWP
pause time 5s
HELLO interval 2s
TC interval 5s
hold time 15 s
physical layer 802.11
Transmit power 1 mW
gain of antenna 3 dbi
radio frequency 2.4 GHz
transmission range of node 150 m
SINRthr -91 dBm
velocity of nodes uniform [1, 10] m/s
number of nodes 100
simulation time 3000 s
The comparison of overhead for two protocols is illustrated in Figure 5.

Figure 5. Comparison of overhead (overhead (%) vs. velocity of nodes (m/s) for QHRP and shared MPR)


In Figure 5, the simulation results illustrate that the performance in terms of overhead degrades as the velocity of nodes increases, for both protocols. The faster the nodes move, the more frequently the routes need to be re-established; as a consequence, more control packets are generated in the network.
It is well known that for OLSR extensions, reducing the size of the MPR set helps to reduce the protocol overhead. In theory, Kitasuka and Tagashira [27] should achieve better performance than QHRP, since their purpose is to minimize the size of the MPR set. However, as shown in Figure 5, the performance of Kitasuka and Tagashira [27] is not really better than that of QHRP. When the velocity of nodes is greater than 5 m/s, the performance of QHRP is much better than that of shared MPR. The main reason is that the nodes are assumed to be static in Kitasuka and Tagashira [27], so the created routes are vulnerable when the nodes move. As the nodes move more rapidly, more control packets are generated to maintain and re-discover the network topology. Contrary to Kitasuka and Tagashira [27], the routes created in QHRP are more stable since mobility has already been considered: the route establishment is based not only on the SINR between two nodes, but also on the estimation of the link duration between them.
The evaluation of packet drop ratio for two protocols is shown in Figure 6.

Figure 6. Comparison of packet drop ratio (packet drop ratio (%) vs. velocity of nodes (m/s) for QHRP and shared MPR)


As shown in Figure 6, the performance of both protocols also degrades with increasing node velocity in terms of packet drop ratio. With Kitasuka and Tagashira [27], the packet drop ratio is smaller than that of QHRP only when the velocity of nodes is less than 5.5 m/s. The reasons are detailed as follows.
• While the nodes move slowly, the routes created with shared MPR can be kept relatively stable. Furthermore, the traffic is reduced by minimizing the MPR set. As a consequence, few packets are dropped. Contrarily, the traffic generated with QHRP is always higher than with Kitasuka and Tagashira [27] due to the introduction of the Parent_Update message, which increases the probability of collisions in the network.
• As the velocity of nodes increases, the performance of both protocols degrades due to the high probability of route failure and of collisions in the network. However, shared MPR is worse, since mobility is not considered in Kitasuka and Tagashira [27]. With QHRP, the routes are
more reliable compared to shared MPR, since the SINR and LD are considered during route establishment.

4. Conclusion

Route failure resulting from node mobility has always been a challenge in MANETs. Many extensions of the link-state based routing protocol OLSR have been proposed. However, since interference cannot be predicted, selecting relay nodes mainly by counting the number of hops, as in OLSR, does not produce reliable routes in MANETs. In this paper, a QoS aware Hierarchical Routing Protocol (QHRP) is proposed. In QHRP, we use a new metric combining SINR and LD to replace the traditional hop count used in OLSR, and exchange these new metrics among one-hop neighbors via periodically broadcast HELLO messages. Upon receiving HELLO messages from its one-hop neighbor nodes, a node only considers neighbor nodes whose SINR exceeds a predefined threshold as candidates for relays, and selects the candidate with the longest LD as its parent-node. Then, similar to OLSR, TC messages are diffused by parent-nodes. Extensive simulation results confirm the effectiveness of our proposed approach in terms of route table calculation, overhead and packet drop ratio.

Acknowledgements

This work is supported by the Project of Public Welfare of Guangdong Province (2015A010103020), the National Natural Science Foundation of China (61170216), the Natural Science Foundation of Guangdong Province (2015A030313652) and the Project of Social Development of Dongguan (2014106101002).

References

[1] E. M. Royer, C-K Toh. A Review of current routing protocols for ad hoc mobile wireless networks. IEEE
Personal Communication 6 (1999), 46-55.
[2] C. E. Perkins, P. Bhagwat. Highly dynamic destination-sequenced distance-vector routing (DSDV) for
mobile computers. ACM Sigcom (1994), 234-244.
[3] S. Murthy, J. J. Garicia-Luna-Aceves. A Routing Protocol for packet radio networks. ACM MobiCom
(1995), 86-94.
[4] D. Johnson, Y. Hu, D. Maltz. The Dynamic Source Routing Protocol (DSR) for Mobile Ad Hoc
Networks for IPv4. IETF RFC 4728, Feb. 2007.
[5] C. E. Perkins, E. M. Royer. Ad hoc on-demand distance vector routing. IETF RFC 3562, 2003.
[6] T. B. Reddy, I. Karthigeyan, B.S. Manoj, C. Siva Ram Murthy. Quality of service provisioning in ad hoc
wireless networks: a survey of issues and solutions. Ad Hoc Networks 4 (2006), 83-124.
[7] R. Guimarães, L. Cerdà, José M. Barceló, J. García, M. Voorhaen, C. Blondia. Quality of service through
bandwidth reservation on multirate ad hoc wireless networks. Ad Hoc Networks 7 (2009), 388-400.
[8] L. Chen, Wendi B. Heinzelman. QoS-Aware Routing Based on Bandwidth Estimation for Mobile Ad
Hoc Networks. IEEE Journal on Selected Areas in Communications 23 (2005), 561-572.
[9] M. Xie, M. Haenggi. Towards an end-to-end delay analysis of wireless multihop networks. Ad Hoc
Networks 7 (2009), 849-861.
[10] S. Kajioka, N. Wakamiya, H. Satoh, K. Monden, M. Hayashi, S. Matsui, M. Murata. A QoS-aware
routing mechanism for multi-channel multi-interface ad-hoc networks. Ad Hoc Networks 9 (2011), 911-
927.

[11] J. J. Liu, X. H. Jiang, H. Nishiyama, N. Kato and X. M. (Sherman) Shen. End-to-End Delay in Mobile
Ad Hoc Networks with Generalized Transmission Range and Limited Packet Redundancy. IEEE
Wireless Communications and Networking Conference (2012), 1731-1736.
[12] C. K. Toh, A. N. Le and Y.Z. Cho. Load balanced Routing Protocols for Ad Hoc Mobile Wireless
Networks. IEEE Communications Magazine 47 (2009), 78-84.
[13] I. T. Haque. On the Overheads of Ad Hoc Routing Schemes. IEEE System Journal 9 (2014), 605-614.
[14] Q. Xue, A. Ganz. Ad hoc QoS on-demand routing (AQOR) in mobile ad hoc networks. Journal of
Parallel and Distributed Computing 63 (2003), 154-165.
[15] I. Rubin, Y.-C. Liu. Link stability models for QoS ad hoc routing algorithms. IEEE 58th VTC Fall
(2003), 3084-3088.
[16] N. Sarma, S. Nandi. Route Stability Based QoS Routing in Mobile Ad Hoc Networks. Wireless
Personal Communication 54 (2010), 203-224.
[17] C. W. Yu, T. K. Wu, R. H. Cheng. A low overhead dynamic route repairing mechanism for mobile ad
hoc networks. Computer Communications 30 (2007), 1152-1163.
[18] L. Lei, T. Zhang, L. Zhou, X. M Chen, C.F. Zhang, and C. Luo. Estimating the Available Medium
Access Bandwidth of IEEE 802.11 Ad Hoc Networks with Concurrent Transmissions. IEEE
Transactions on vehicular technologies 64 (2015), 689-701.
[19] M. Al-Akaidi and M. Alchaita. Link stability and mobility in ad hoc wireless networks. IET
communications 1 (2007), 173-178.
[20] A. Moussaoui, F. Sechedine, A. Boukerram. A link-state QoS protocol based on link stability for Mobile
Ad hoc Networks. Journal of Network and Computer Applications 39 (2014), 117-125.
[21] A. Nayebi, H. Sarbazi-Azad. Analysis of link lifetime in wireless mobile networks. Ad Hoc Networks
10 (2012), 1221-1237.
[22] C. Ma, Y. Y. Yang. A Battery-Aware Scheme for Routing in Wireless Ad Hoc Networks. IEEE
Transactions on vehicular technology 60 (2011), 3919-3932.
[23] T. Clausen, P. Jacquet. Optimized Link State Routing Protocol (OLSR). IETF RFC 3626, Oct. 2003.
[24] R. Belbachir, Z. M. Maaza, and A. Kies. The mobility issue in admission controls and available
bandwidth measures in MANets. Wireless Personal Communication 70 (2013), 743–757.
[25] C. Sarr, C. Chaudet, G. Chelius, and I. G. Lassous. Bandwidth estimation for IEEE 802.11-based ad
hoc networks. IEEE Transactions on Mobile Computing 7 (2008), 1228–1241.
[26] W. Su, S. J. Lee, and Mario Gerla. Mobility Prediction and Routing in Ad Hoc Wireless Networks.
International Journal of Network Management 11 (2001), 3-30.
[27] T. Kitasuka, S. Tagashira. Finding More Efficient Multipoint Relay Set to Reduce Topology Control
Traffic of OLSR. IEEE 14th International Symposium and Workshops on a World of Wireless, Mobile
and Multimedia Networks (WoWMoM), (2013), 1-9.
Fuzzy Systems and Data Mining II 535
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-535

The Design and Implementation of Meteorological Microblog Public Opinion Hot Topic Extraction System
Fang REN a,1, Lin CHEN b, Cheng-Rui YANG a
a Shaanxi Province Meteorological Service Center, Xi'an, 710014, China
b Sichuan Province Meteorological Observatory, Chengdu, 610072, China

Abstract. Taking Sina meteorological microblog as the data source, and focusing on the Sina Shaanxi meteorological microblog, this paper presents the design of a system for extracting hot topics of public opinion from microblogs. We describe the whole workflow of the system in detail, covering data extraction, the word segmentation system and the hot-topic extraction method, and discuss the problems encountered during implementation and the aspects to be improved. The next step will be to add analysis of images in order to obtain more accurate hot-topic extraction results.

Keywords. meteorological, microblog, public opinion, segmentation, hot words extraction

Introduction

At present, mainstream public opinion monitoring systems at home and abroad target traditional social networks; few of them target microblogs, because the history of microblogging is short and research on it is still limited. In addition, there are substantial barriers to obtaining microblog data, which makes it difficult to monitor public opinion on microblogs automatically. Yet many major public opinion events in recent years were first disclosed on microblogs and fermented there rapidly, so it is necessary to monitor public opinion on microblogs. The monitoring of hot public opinion on the mobile internet and the analysis of its patterns have therefore become a necessary and urgent requirement [1].

As a meteorological service platform on social media, the meteorological microblog shares the same features of large data volume, rapid refresh frequency, fast spreading speed, diversity and complexity. When an emergency happens, it is an important question how to better exert its functions of science popularization, rescue reminders, network supervision, refuting rumors and spreading positive energy, and how to enhance the public's capacity for self-rescue. This requires filtering and rapidly extracting useful information from large amounts of data, analyzing the service on the data set, estimating the benefit, and then taking appropriate measures to improve the efficiency of the service. Considering the current domestic situation, and in view of the convenience and data authority of the development platform, this paper chooses Sina microblog as the

1
Corresponding Author: Fang REN, No.36, North commissioner main Street, Lianhu District, Xi'an City,
Shaanxi Province, China; E-mail: renfang829200@163.com.

object of research for data analysis. The information collection, data processing, topic discovery and hot-spot analysis in this paper are all aimed at public opinion on the network.

1. Process of Discovery

The system extracts raw data from the microblog in real time using the API [2], applies a simple filter to every microblog message, and keeps only the text of the message and its forwarded content for word segmentation. The numbers of comments and forwards of the text, and of the forwarded content, are counted, and their statistical weights are then combined in the interdependency analysis. The word segmentation system, filtering system and word segmentation algorithm are adapted from the open word segmentation system shootseg [3] developed by the ShootSun working room. Next, word-frequency statistics are computed for all topic words, and the topic words are sorted by their heat. The interdependency of the hot words is then analyzed; the method of interdependency analysis is introduced briefly below. The output includes the topic words, the words related to each topic word and the related microblogs. The prototype system treats every hot word as a topic word, and by comparing related words with related microblogs it presents the correctness of the topic word intuitively. The process is presented in Figure 1.
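As an illustration of the workflow described above, the following minimal sketch combines segmentation, weighting by comments and forwards, and heat ranking; the actual system is written in C# against the Sina API, and all helper names here are hypothetical.

```python
# Minimal sketch of the processing pipeline described above (the real system is
# written in C# against the Sina API; all helper names here are hypothetical).
from collections import Counter

def extract_hot_topics(statuses, segment, stopwords, top_n=5):
    """statuses: list of dicts with 'text', 'reposts' and 'comments' fields;
    segment: a word-segmentation callable returning a list of words."""
    counter = Counter()
    for status in statuses:
        words = [w for w in segment(status["text"]) if w not in stopwords]
        # Weight each word by how widely the message spread.
        weight = 1 + status.get("reposts", 0) + status.get("comments", 0)
        for w in words:
            counter[w] += weight
    return counter.most_common(top_n)  # (topic word, heat) pairs sorted by heat
```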

Figure 1. Total process of the system

2. Data Acquisition of Public Opinion Monitoring on Microblog

Information on public opinion on the internet refers to the information obtained through collecting and analyzing public opinion data. Collecting this information is the basis and precondition of its research, analysis and application. This paper implements the collection of such information by studying the API interface provided by the Sina microblog platform, and can support the Shaanxi meteorological microblog in collecting and analyzing public opinion information on the internet, so as to monitor and guide public opinion.

2.1. API of Sina Microblog

The open platform of Sina microblog provides a large number of downloadable APIs adapted to various language environments. The official SDKs are developed by Sina itself, come with full official technical support, and can perform all functions for handling microblogs. Their advantage is powerful functionality, but their disadvantage is equally obvious: language support is poor, and the programming language used by the system in this paper, C#, is not supported, so data would have to be downloaded and exchanged across language environments during programming, which is very inconvenient. The third-party SDKs, in contrast, are re-packagings of the existing official SDKs by netizens, with better language adaptation. They inevitably have some missing or unstable functions, and their help documents are not as complete as those of the official SDKs, so the unofficial SDKs can only optimize and extend the official SDKs to a certain extent.

2.2. API of Sina Microblog Used by the Paper

After comparing the many APIs provided by the open platform of Sina microblog, and considering the language environment of this paper, we adopted the Sina Microblog API together with a third-party C# SDK [4] shared by the netizen LinXuanchen. This third-party API has been adapted from the original API without losing functionality; its complete functions and help documents brought great help and convenience to this work.

The system obtained the necessary basic information, including the App Key, App Secret and authorized callback page, by registering as a developer on Sina microblog and registering the application program. This basic information is an essential condition for the applications developed in this system to connect successfully to the microblog server.

The main application modules of the Sina microblog API are the login module and the information extraction module. The login module performs identity authentication of users on the open platform of Sina microblog and obtains the verification result of the current user's login. The information extraction module maps every account to one ID, obtains the user's personal information (avatar, nickname, labels, etc.), obtains the latest microblogs of the current user and of the users the current user follows, and obtains the information of the sender of the current microblog status.

2.3. Structure of Data Storage

In the process of data collection, every microblog message is collected in the form of a character string. The attributes of the data structure include: microblog creation time, microblog text, microblog ID of the replied-to source, user ID of the replied-to source, user nickname of the replied-to source, number of times the microblog has been forwarded, user ID of the sender, ID of the microblog, number of times the microblog has been collected, microblog images (large), microblog images (medium), microblog images (small), number of comments on the microblog, and the source microblog of the forward.
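For illustration, the record described above can be sketched as the following data structure; the field names are our own paraphrase of the listed attributes, not the system's actual schema.

```python
# Sketch of the per-message record described above, as a Python dataclass;
# the original system stores these fields as strings collected via the Sina API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MicroblogRecord:
    created_at: str
    text: str
    source_status_id: Optional[str] = None   # microblog ID of the replied-to source
    source_user_id: Optional[str] = None
    source_user_nickname: Optional[str] = None
    repost_count: int = 0
    user_id: str = ""
    status_id: str = ""
    favourites_count: int = 0                # "collection number" of the microblog
    image_urls_large: List[str] = field(default_factory=list)
    image_url_medium: Optional[str] = None
    image_url_small: Optional[str] = None
    comment_count: int = 0
    retweeted_status_text: Optional[str] = None
```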

3. Word Segmentation System

Word segmentation technology is another core module of this system. The word segmentation algorithm adopted in this paper is adapted from the open word segmentation system shootseg developed by the ShootSun working room, and the original algorithm has been modified throughout.
538 F. Ren et al. / The Design and Implementation of Meteorological Microblog Public Opinion

3.1. Process of Word Segmentation System

The word segmentation process consists of three parts: loading and sorting the word segmentation dictionaries, basic word segmentation of the text, and post-processing of the segmented text.

The original procedure reads the text to be processed as a character string and returns the processed text as a character string as well. After modification in this system, the text to be processed is still read as a character string, but the keywords are output directly as an ArrayList rather than being filtered from the string afterwards. The whole reading process is presented in Figure 2.

Figure 2. Total process of the word segmentation module

3.2. Module of Load Of Dictionary

Besides the four existing word segmentation dictionaries (phrase dictionary, letter dictionary, digit and Chinese digit dictionary, name processing dictionary), this paper adds a noise processing dictionary. These dictionaries are uniformly encoded in UTF-8, with one entry per line, sorted by hash value, and stored as txt files in the data folder under the root directory.

The system reads each dictionary file line by line using a StreamReader (a stream-reading method); in memory, each dictionary is held in an ArrayList whose storage type is string.
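A minimal sketch of this loading step, assuming one entry per line in UTF-8 text files; the file names are placeholders, and this is not the shootseg or C# code.

```python
# Sketch of the dictionary-loading step described above: each UTF-8 dictionary
# file stores one entry per line; entries are kept in a list sorted by hash value.
# File names are placeholders, not the shootseg file layout.
import os

def load_dictionary(path):
    with open(path, encoding="utf-8") as f:
        entries = [line.strip() for line in f if line.strip()]
    # Sort by hash value so that later lookups can use a predictable order.
    entries.sort(key=hash)
    return entries

def load_all_dictionaries(data_dir="data"):
    names = ["phrases.txt", "letters.txt", "digits.txt", "names.txt", "noise.txt"]
    return {n: load_dictionary(os.path.join(data_dir, n)) for n in names}
```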

3.3. Algorithm of Word Segmentation Module

Using the word segmentation dictionaries loaded into memory earlier, the system determines the type of the text currently being read, line by line, with the delimiter set to '/'. It first checks whether the current character is whitespace: if the character is a digit or a letter the word is segmented; otherwise the whitespace is treated as noise and filtered out directly. The type of the word is then determined; if the word cannot be matched against the phrase dictionary after its type has been determined, a mark is set to indicate its type. The marks include a digit mark, a letter mark and a name mark; the name mark usually appears when the word or character cannot be handled by traditional word segmentation. If a previous word is pending, it is separated when the system determines that the current word is of a different type, and the current word is marked; when the dictionary retrieval of the current word finishes, the word is inserted directly into the output string. The most complicated task in the algorithm is the handling of non-numeric characters: in their retrieval structure, the second-level hash table is searched first with look-ahead, and if the characters form a term, the term is inserted directly into the output string, with a mark indicating that the look-ahead segmentation has already been done so it is not repeated. It is

also necessary to determine quantifiers and name words; when the structure of a word cannot be discerned, it is output directly as an unidentified word and separated. Finally, if the current word is a rare word or a symbol, or cannot be matched in any case, the previous word is separated, all marks are initialized, and the current word and the delimiter are inserted.
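For comparison only, the following is a deliberately simplified dictionary-based segmentation sketch (forward maximum matching); it is not the shootseg algorithm described above, which additionally handles digits, letters, names and noise with type marks.

```python
# A deliberately simplified dictionary-based segmentation sketch (forward
# maximum matching).  It only illustrates matching against a loaded phrase
# dictionary; the shootseg algorithm described above is more elaborate.

def fmm_segment(text, phrase_dict, max_len=6, sep="/"):
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if length == 1 or cand in phrase_dict:
                words.append(cand)
                i += length
                break
    return sep.join(words)
```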

4. Topic Information Extraction System

4.1. Topic Information Extraction

After word segmentation, in order to count topics the system has to perform coherence analysis of the keywords of every microblog, so as to overcome the disorder, shortness, lack of structure and decentralization of microblogs, and to avoid the problems of incomplete and interleaved messages. Common methods of word coherence analysis include: (1) analysis of frequent keyword usage [5-6]; (2) word frequency and word density [7-8]; (3) keyword position and form [8]; (4) distance between keywords [9-10]; (5) analysis of links and page weights [6].

To analyze topic words it is natural to use a keyword-distance algorithm, but this method has definite problems for microblogs. Because microblog content is fragmented and covers a wide range, many keywords are extracted from each microblog, so the outcome of this method has very high dimensionality, which makes computation troublesome. The system in this paper therefore adopts a compromise: it computes and displays abundant keywords by comprehensively combining frequent keyword usage, word frequency and word density, distance between keywords, and analysis of links and page weights.
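As an illustration of this compromise, the sketch below combines word frequency, word density and keyword co-occurrence into one heat score; the weights are arbitrary placeholders, not values used by the system.

```python
# Illustrative scoring sketch for the compromise method described above; the
# weights are placeholders, not parameters from the paper.
from collections import Counter
from itertools import combinations

def score_topics(docs_keywords, w_freq=1.0, w_density=0.5, w_cooc=0.5):
    """docs_keywords: list of keyword lists, one list per microblog."""
    freq, cooc = Counter(), Counter()
    for kws in docs_keywords:
        kws = set(kws)
        freq.update(kws)                       # document frequency of each word
        for a, b in combinations(sorted(kws), 2):
            cooc[a] += 1                       # co-occurrence with other keywords
            cooc[b] += 1
    n_docs = max(len(docs_keywords), 1)
    heat = {w: w_freq * freq[w] + w_density * freq[w] / n_docs + w_cooc * cooc[w]
            for w in freq}
    return sorted(heat.items(), key=lambda kv: kv[1], reverse=True)
```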

4.2. System Implementation

The development platform of the system is Microsoft Visual Studio 2012, and the database is Microsoft SQL Server 2008. The operating environment is a dual-core P4 at 2.5 GHz with 4 GB of memory running XP SP3, and the network speed is 100 Mbps.

Before the system can run, it must be configured with the Sina microblog API settings, which include the App Key, App Secret and CallbackURL; after these three values pass verification, the program can connect to the Sina microblog server.

The system then verifies the account and password of the logged-in user by comparing the local value with the verified value, so as to complete the login. After login, the system obtains the current logged-in user and the latest microblogs of the users they follow. Because the microblogs captured in this way are ordered by time, the system downloads the microblog data in batches by setting time windows, and when the download is complete it starts to analyze the microblogs. The implementation modules of the system are presented in Figure 3.

Figure 3. System’s module structure diagram

5. Experimental Confirmation and Analysis

The system captured 671 valid microblog messages by collecting data from the Shaanxi meteorological Sina microblog over three months; after data filtering, word segmentation, topic word counting and heat ranking, 356 original microblogs and their related comments were obtained. In this process, the word segmentation algorithm amended in this paper was applied to the longer microblog texts, and popular internet words were added to the dictionary, which produced good results. First, we verified the effectiveness of the topic extraction algorithm adopted in this paper; the experiment used a SQL Server 2008 database to build a corpus simulating real data. The five top-ranked outcomes of the hot-spot extraction, after being arranged and analyzed, are presented in Table 1.

To verify whether the topic extraction algorithm can reflect hot events on the microblog, we compared the hot spots extracted in the experiment with the hot topics on Sina microblog. All the events listed by the experiment appeared on the Sina microblog ranking list, which means the experiment produced good results and verified that the topic extraction algorithm of this paper is feasible and effective for extracting the focal topics discussed and commented on enthusiastically by netizens.
Table 1. Hot words sorted
Ranking Topics Occurrence Number Frequency
1 haze 113 0.035
2 El Nino 89 0.021
3 debris flow 77 0.017
4 rainstorm 52 0.012
5 hail 31 0.007

6. Conclusions

The system effectively completes the extraction of hot topics. However, it also has some shortcomings, notably the lack of image analysis results. The next step will be further research in this direction, in order to obtain more accurate hot-topic extraction results.

References

[1] Y. Liu. Introduction of Research on the Network Public Opinion.Tianjin: Tianjin People Publishing
House, 2007.
[2] Sina .Sina Microblog. API Open Platform [EB/OL]. (2013-03-12).
http://open.t.sina.com.cn/wiki/index.hph.
[3] http://download.csdn.net/download/zeal27/3049486
[4] http://www.cnblogs.com/linxuanchen/p/5113233.html
[5] (Can.) Written by J. W. Han, M. Kamber; translated by M. Fan, X. F. Meng. Data Mining: Concepts and Techniques. Beijing: China Machine Press, 2007.3
[6] J. W. Han, X. F. Meng, J. Wang. Web Mining Research. Journal of Computer Research and
Development, 38(2001): 405-414.
[7] G. L. Ji, K. Shuai, Z. H. Sun. Data Mining Technology and Its Application. Journal of Nanjing Normal
University (Natural Science Edition), 23(2000): 25-27.
[8] J. Wu. The Beauty of Mathematics. Google Research Institute, 2008.12
[9] G. M. Yu. Characteristics and Statistical Analysis of Network Public Opinion’s Hot Event. People
Forum(Chinese), 4(2010).
[10] Q. Y. Yao, G. S. Liu, X. Li. Text Clustering Algorithm Based on Vector Space Model (VSM). Computer Engineering, 9 (2008): 40-44.
542 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-542

Modeling and Evaluating Intelligent Real-Time Route Planning and Carpooling System with Performance Evaluation Process Algebra
Jie DING a,b, Rui WANG a,d,1, and Xiao CHEN c
a School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, China
b State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China
c School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
d State Key Laboratory of Software Development Environment, Beihang University, Beijing, China

Abstract. Intelligent traffic systems (ITS) have provided effective solutions to many kinds of traffic problems, which are becoming increasingly prominent. Within this field, intelligent real-time route planning and carpooling has become a hot research issue, as a way to save travel costs, reduce harmful emissions and maximize the utilization of traffic facilities. This paper mainly focuses on the general architecture of intelligent real-time route planning and carpooling. It also adopts a formal language, Performance Evaluation Process Algebra (PEPA), to model the overall architecture and processing of the system. Furthermore, fluid flow approximation is used to carry out a detailed performance evaluation of the model. By setting the specific parameters, it is easy to obtain the maximum utilization, response time and throughput of the system. The research in this paper is therefore significant for achieving efficient management of all types of transportation resources and making better use of ITS.

Keywords. ITS, PEPA, Performance Evaluation, Fluid Flow Approximation

Introduction

Modern cities face a variety of transportation problems, such as congestion and travel delays. Sustainable development, and especially sustainable mobility (such as shared transport systems and combined transport modes), is therefore gaining more and more attention, and several efficient solutions (such as car sharing, buses and intelligent traffic systems (ITS), integrated systems built from communication technologies, vehicle sensing and many other technologies) have been proposed to ease these problems [1].
1 Corresponding Author: Rui WANG, School of Information Engineering, Yangzhou University, Yangzhou,

Jiangsu, China; E-mail: RuiWangYZU@163.com.



In this context, carpooling and ride sharing are becoming two of the most promising approaches to realizing sustainable mobility [2]. Carpooling can also help save travel costs, alleviate traffic pressure, save fuel and protect the environment.

Besides [1, 2], many other researchers have studied how to make carpooling more convenient and economical. In [3], by adopting a genetic algorithm, the authors propose a low-complexity and low-memory carpool matching method to solve the carpool service problem for passengers and help relieve traffic congestion. In [4], by analyzing a proposed Automated Wireless Carpooling System (AWCS) and a Central Monitoring System (CMS), the authors show that passengers can be placed into carpool vehicles in a city efficiently. In [5], through a cloud computing framework, the authors propose an intelligent carpool system called BlueNet, which mainly consists of a Mobile Client module and the Cloud Global Carpool Services; their experiments also show that this system can dramatically reduce the processing time for obtaining carpooling results. Moreover, in [6], the authors investigate how to use positioning systems to support a dynamic network of car and taxi pool services and maximize the use of empty seats in cars and taxis.

Figure 1. The architecture of intelligent real-time route planning and carpooling system (The CSM and Driver
module are extracted from [3, 5])

Inspired by these studies, this paper mainly focuses on the architecture of intelligent real-time route planning and carpooling in ITS. As shown in Figure 1, by utilizing a positioning system (GPS module), a storage module, an intelligent analysis system (ITS, AM and CSM), a communication network and a variety of smart mobile devices, this paper analyzes the intelligent real-time route planning and carpooling system. Meanwhile, the paper adopts a formal language, Performance Evaluation Process Algebra (PEPA), to model and evaluate the system. From [7] it is clear that the PEPA language has a great advantage in modeling and evaluating systems with a closed workflow. Moreover, we use fluid flow approximation to conduct the performance evaluation of the model. From [8, 9] it is known that in fluid approximation the discrete state space is treated as continuous, and the continuous time Markov chain (CTMC) underlying a PEPA model is converted into ordinary differential equations (ODEs). Therefore, the relevant performance of the corresponding PEPA model (i.e.

the real-time route planning and carpooling system) can be obtained by the numerical
solution of ODEs.
Section 1 gives the specific PEPA models of the process of intelligent route planning and carpooling and carries out the performance evaluation of the model. Section 2 concludes the paper.

1. Modeling and Performance Evaluation

1.1. Application Example and PEPA models

This section describes the general processing of intelligent real-time route planning and carpooling. From Figure 2, the whole process is divided into eight components: Passengers module, Driver module, GPS, traffic acquisition devices (TAD), intelligent traffic system (ITS), analysis module (AM), carpool services module (CSM) and storage module (SM). In Figure 2, rectangles denote individual activities, rounded rectangles represent shared activities and diamonds denote choices in PEPA. Three kinds of arrows are used: solid arrows describe the execution sequences of the different components, dotted arrows represent the execution sequences of the activities within every component, and short dotted arrows denote a choice between execution sequences of some activities within the SM module.
The whole working process of the intelligent real-time route planning and carpooling system is stated as follows (here we only list the PEPA models of the Passengers module and the CSM; following the semantics of PEPA, readers can derive the corresponding PEPA models of the other modules):

Figure 2. The processing of intelligent real-time route planning and carpooling (The CSM and Driver module
are extracted from [3, 5])

1. Passengers Module: First sends route planning and carpooling request (i.e.
route_carpool_req) to ITS module. Then, receives detailed results of travel routes and
carpooling which are returned from ITS module (i.e. route_carpool_rsp). Finally, exe-
cutes reset operation (i.e. reset1) and prepares to send new requests to ITS module. The
corresponding PEPA models are:
Passenger1 ≝ (carpool_req, r_carpool_req).Passenger2
Passenger2 ≝ (carpool_rsp, r_carpool_rsp).Passenger3
Passenger3 ≝ (reset1, r_reset1).Passenger1
2. ITS Module: The main functions of this module are described as follows:
• Sends a location request (i.e. locate_req) to the GPS module and receives positioning results from the SM module (i.e. locate_rsp).
• Sends a request to the TAD module to acquire the corresponding traffic status and accepts the query results from the SM module (i.e. traffic_rsp).
• Integrates the location result and the corresponding traffic information (i.e. locate_traffic_comb), sends a request to the AM module for route planning (i.e. route_req) and accepts the corresponding route planning result (i.e. route_rsp).
• Sends the carpool planning request to the CSM (i.e. carpool_req) and receives the carpool result pushed by the CSM (i.e. carpool_rsp).
• Sends a carpool confirmation request to the Driver module (i.e. info_conf_req) and acquires the corresponding results (i.e. info_conf_rsp).
3. Driver Module: Upon receiving the request from the ITS module, this module first determines whether the corresponding travel information has already been generated (i.e. judge_3). If the travel information has not been generated, the driver first generates it (i.e. info_generate) and returns it to the ITS module; otherwise the module returns the information to the ITS module directly. Next, the Driver module executes a reset operation (i.e. reset2) and prepares to receive new requests.
4. GPS module: After receiving a location request from the ITS module, this module first judges whether it is already connected to a usable satellite (i.e. judge_1). If not, the GPS connects to the usable satellite (i.e. satellite_link) and then generates the specific location information (i.e. locate_generate); otherwise it generates the position information directly.
5. TAD module: Once it accepts the request from the ITS, this module first judges whether the traffic status has already been obtained (i.e. judge_2). If not, the TAD first acquires the traffic status data (i.e. traffic_obtain) and then transforms the acquired data into real-time traffic information (i.e. traffic_generate); otherwise the TAD generates the real-time traffic information straight away.
6. AM: This module performs intelligent analysis on the information, consisting of the location and traffic status provided by the ITS module (i.e. route_plan), and feeds the best route back to the ITS.
7. CSM: The main functions of this module are described as follows:
• After receiving the carpool request from the ITS, the CSM first sends a request to the SM to obtain detailed information on the available vehicles (i.e. car_info_req) and receives the vehicle query results from the SM (i.e. car_info_rsp).
• According to the vehicle information and the route planning, the CSM performs an intelligent analysis, generates the optimal matching result (i.e. carpool_match) and returns the result to the ITS.
The corresponding PEPA models are:

CSM1 ≝ (carpool_match_req, r_carpool_match_req).CSM2
CSM2 ≝ (driver_info_req, r_driver_info_req).CSM3
CSM3 ≝ (driver_info_rsp, r_driver_info_rsp).CSM4
CSM4 ≝ (carpool_match, r_carpool_match).CSM5
CSM5 ≝ (carpool_match_rsp, r_carpool_match_rsp).CSM1
8. SM: This module is responsible for the unified storage and retrieval of the location, traffic and vehicle information (i.e. locate_get, traffic_get and car_info_get). Meanwhile, the SM provides this information to the specific modules of the system (i.e. the ITS and CSM modules).

1.2. Parameter Setting and Description

In this section, all activities and components involved in the PEPA models are listed in Table 1 and Table 2 respectively, together with their parameters. In Table 1, most parameters were obtained over the Internet via practical tests in debug mode. The duration denotes the execution time of every activity, in seconds. Because of equipment limitations, it is difficult to obtain specific information on every component, so the number of each component in Table 2 is assumed based on Internet resources and the actual situation.
Table 1. Description and duration of all activities
Action                Duration (s)    Action                Duration (s)
route_carpool_req     0.000370        route_carpool_rsp     0.00351
reset1                1.0             locate_req            0.000480
locate_rsp            0.000330        traffic_req           0.000350
traffic_rsp           0.000560        locate_traffic_comb   0.0025
route_req             0.000350        route_rsp             0.030
carpool_req           0.000480        carpool_rsp           0.000560
info_conf_req         0.00351         info_conf_rsp         0.000370
judge_1               0.001           satellite_link        1.499
locate_generate       0.670           locate_store          1.081
locate_get            0.713           traffic_store         1.081
traffic_get           0.713           car_info_req          0.000480
car_info_get          0.713           car_info_rsp          0.000560
judge_2               0.001           traffic_obtain        0.670
traffic_generate      0.670           route_plan            2.801
carpool_match         1.227           judge_3               1.0
info_generate         2.0             reset2                1.0

1.3. Performance Analysis

The specific dynamic performance of the real-time route planning and carpooling model is given in this section (see Figure 3 and Figure 4). As shown in these figures, the response time of the whole process of the model and the throughput of car_info_get and locate_traffic_comb are analyzed respectively.

Table 2. Number of components


Component Number Component Number
Passenger 300 Driver 300
GPS 60 TAD 60
ITS 80 AM 100
SM 100 CSM 100

In Figure 3, the maximum number of passengers (from Table 2) is kept constant, and the analysis examines the probability that passengers have finished the whole process of real-time route planning and carpooling. Figure 3 shows that as the number of passengers increases, the probability that passengers have completed the route planning and carpooling decreases, i.e. passengers spend more time completing the search process.

Figure 3. Response time of passengers complete the process of real-time route planning and carpooling

Figure 4. Throughput (car_info_get, locate_traffic_comb) vs. the number of Passengers

The throughput of car_info_get and locate_traffic_comb is analyzed in Figure 4. The two plots show that as the number of passengers increases (i.e. as the requests from passengers increase), the throughput of these activities also becomes larger. However, the curves become flat as the throughput approaches the maximum capacity of the system.

Through analyzing the response time and throughput of the model, it is helpful to
test the ability of the system to process passengers’ requests in practice and maximize
the utilization of the system.
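To make the fluid-flow evaluation concrete, the following is a minimal sketch of integrating the ODEs for the three-state Passenger cycle alone; it assumes exponential rates equal to 1/duration (using the route_carpool_req, route_carpool_rsp and reset1 durations of Table 1) and ignores all synchronisation with the other modules, so it only illustrates how a PEPA model is turned into ODEs and does not reproduce the figures above.

```python
# Minimal sketch (not the authors' tool chain) of the fluid-flow view, restricted
# to the three-state Passenger cycle and ignoring synchronisation with the other
# modules; rates are taken as 1/duration from Table 1.
import numpy as np

r_req, r_rsp, r_reset = 1 / 0.000370, 1 / 0.00351, 1 / 1.0
P = np.array([300.0, 0.0, 0.0])        # populations of Passenger1/2/3
dt, T = 1e-5, 5.0

for _ in range(int(T / dt)):            # forward-Euler integration of the ODEs
    flow12 = r_req * P[0]               # Passenger1 -> Passenger2
    flow23 = r_rsp * P[1]               # Passenger2 -> Passenger3
    flow31 = r_reset * P[2]             # Passenger3 -> Passenger1
    P = P + dt * np.array([flow31 - flow12,
                           flow12 - flow23,
                           flow23 - flow31])

print(P)   # approximate steady-state populations of the three local states
```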

2. Conclusion

This paper employs the formal language PEPA to model the whole process of real-time route planning and carpooling in ITS. In practice, an efficient ITS is of great significance for the management of traffic information and the reduction of traffic congestion, and performance evaluation of the system can improve its use. In our future work, we will focus on intelligent decision making and intelligent pushing of real-time traffic information; furthermore, we will look for a rational and effective intelligent decision-making algorithm and apply it in ITS.

Acknowledgements

The authors acknowledge the financial support by the National NSF of China under Grant
No. 61472343, the National Natural Science Foundation of Jiangsu Province under Grant
No.BK20160543 and BM20082061507.

References

[1] A. Awasthi, and S. S. Chauhan: Using AHP and Dempster-Shafer theory for evaluating sustainable
transport solutions. Environmental Modelling & Software, 26 (2011), 787-796.
[2] E. Cangialosi, A. D. Febbraro, and N. Sacco: Designing a multimodal generalised ride sharing system.
Institution of Engineering and Technology, 10 (2016), 227-236.
[3] M. K. Jiau, and S. C. Huang: Services-Oriented Computing Using the Compact Genetic Algorithm
for Solving the Carpool Services Problem. IEEE Intelligent Transportation Systems Society, 16 (2015),
2711-2722.
[4] R. K. Megalingam, R. N. Nair, and V. Radhakrishnan: Automated Wireless Carpooling System for
an eco-friendly travel. 3rd International Conference on Electronics Computer Technology (ICECT), 4
(2011), 325-329.
[5] S. C. Huang, M. K. Jiau, and C. H. Lin: A Genetic-Algorithm-Based Approach to Solve Carpool Service
Problems in Cloud Computing. IEEE Transactions on Intelligent Transportation Systems, 16 (2015),
352-364.
[6] P. Lalos, A. Korres, C. K. Datsikas, G. S. Tombras and K. Peppas: A Framework for Dynam-
ic Car and Taxi Pools with the Use of Positioning Systems. Computation World: Future Comput-
ing, Service Computation, Cognitive, Adaptive, Content, Patterns (COMPUTATIONWORLD ’09),
DOI.10.1109/ComputationWorld.2009.55, (2009), 385-391.
[7] J. Ding: A Comparison of Fluid Approximation and Stochastic Simulation for Evaluating Content Adap-
tation Systems. Wireless Personal Communications, 84 (2015), 231-250.
[8] J. Ding: Structural and Fluid Analysis for Large Scale PEPA Models–With Applications to Content
Adaptation Systems. PhD Thesis, The University of Edinburgh, 2010.
[9] J. Ding and J. Hillston: Numerically Representing Stochastic Process Algebra Models. Computer Jour-
nal, 55 (2012), 1383-1397.
Fuzzy Systems and Data Mining II 549
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-549

Multimode Theory Analysis of the Coupled Microstrip Resonator Structure
Ying ZHAO1, Ai-Hua ZHANG and Ming-Xiao WANG
The Institution of Electrical engineering and Information Engineering of Lanzhou
University of Technology, Lanzhou, 730030, China

Abstract. In this paper, the multimode problem of the coupled microstrip resonator structure is analyzed using the 'equivalent removing' method. With this method, the difficult computation of the electromagnetic field equations is avoided. The coupled microstrip resonator structure is treated as a substructure of a microwave equalizer which linearizes the gain characteristic of a TWTA (Travelling Wave Tube Amplifier). To make the design easier, the main structure of the equalizer can be seen as a cascade of such substructures. The multimode problem caused by the discontinuity in the structure means that the S-parameters of the main structure cannot easily be obtained from the cascade equations, so the analysis of the multimode problem of the substructure is the key step in the cascade design of the equalizer. Good results were obtained when the 'equivalent removing' method was used to analyze the problem, which allows the cascade equations to be used to compute the S-parameters of the main structure in the design.

Keywords. multimode, TWTA, equalizer, equivalent removing, microstrip resonator, HFSS

Introduction

A multimode problem often occurs when a discontinuity is introduced into a microwave structure, such as the coupled microstrip resonator structure considered here, which is a substructure of a microwave equalizer used to linearize the gain characteristic of travelling-wave-tube amplifiers (TWTAs) [1]. While the gain characteristics of solid-state devices are linear, the output of a TWTA is nonlinear, with its maximum at the center point [2]. The main problem of the design is how to deal with the multimode behavior caused by the discontinuities in the structure. The multimode problem of the equalizer is the same as that of the basic substructure, which is composed of only one resonator coupled to the transmission line. The common approach to the multimode problem is to compute the mode fields of the structure precisely, but as is well known it is very difficult to get precise results; moreover, for equalizer design the goal is the implementation, not the field computation. Choosing a suitable method to deal with the multimode problem is therefore critical to realizing the equalizer precisely.

Without the multimode problem, as shown in Figure 1, the S-parameters of the equalizer could be calculated directly using the cascade equations. But the multimode problem cannot

1
Corresponding Author: Ying ZHAO: PhD in communication engineering, The Institution of
Electrical engineering and Information Engineering of Lanzhou University of Technology. E-mail:
Zhying2005@163.com.

be avoided because of the discontinuity in the structure. To accommodate the multimode effect, a method based on modifying the structure itself is developed in this paper. As an example, we analyze an equalizer using a branch microstrip resonator loaded with a thin-film resistor, and investigate the length of the microstrip trace, which determines how fast the higher-order modes decay. Such an equalizer has a microstrip of finite length, open at one end and loaded with a thin-film resistor at the other end [3][4]; it can provide a good attenuation curve to compensate a TWTA's nonlinear response.

Figure 1. The equalizer structure Figure 2. The basic substructure

1. Basic Structure and Formulation

The basic substructure of the equalizer is shown in Figure 2: only one resonator is coupled to the microstrip trace.

Figure 3. The equivalent cascaded network Figure 4. S-parameter model of the cascaded network
The equivalent cascaded network is shown in Figure 3, where [5][6] [Sa] is the equivalent network of the input continuous part, [Sb] is the equivalent network of the discontinuous part, and [Sc] is the equivalent network of the output continuous part.

The S-parameter model of the cascaded network, with two cascaded two-ports, is shown in Figure 4. We suppose that the normalized impedance is matched at the output port of the first two-port and at the input port of the second one, and that their S-parameters are [S]^(1) and [S]^(2) respectively [7]. Then:

b11 1
S11a11  S12
1
a21, b21 1
S21a11  S22
1
a21
b12 2
S a
11 12  S a22 , b22
2
12
21
S21 a12  S22
2
a22 (1)

Then $b_2^{(1)} = a_1^{(2)}$ and $b_1^{(2)} = a_2^{(1)}$, so the total S-parameters of the structure are as follows:

$$[S] = \begin{bmatrix}
S_{11}^{(1)} + \dfrac{S_{12}^{(1)} S_{11}^{(2)} S_{21}^{(1)}}{1 - S_{22}^{(1)} S_{11}^{(2)}} &
\dfrac{S_{12}^{(1)} S_{12}^{(2)}}{1 - S_{22}^{(1)} S_{11}^{(2)}} \\[2mm]
\dfrac{S_{21}^{(1)} S_{21}^{(2)}}{1 - S_{22}^{(1)} S_{11}^{(2)}} &
S_{22}^{(2)} + \dfrac{S_{21}^{(2)} S_{22}^{(1)} S_{12}^{(2)}}{1 - S_{22}^{(1)} S_{11}^{(2)}}
\end{bmatrix} \qquad (2)$$

$$\begin{cases}
S_{11} = S_{11}^{a} + S_{12}^{a} S_{11}^{b} S_{21}^{a} / (1 - S_{22}^{a} S_{11}^{b}) \\
S_{12} = S_{12}^{a} S_{12}^{b} / (1 - S_{22}^{a} S_{11}^{b}) \\
S_{21} = S_{21}^{a} S_{21}^{b} / (1 - S_{22}^{a} S_{11}^{b}) \\
S_{22} = S_{22}^{b} + S_{21}^{b} S_{22}^{a} S_{12}^{b} / (1 - S_{22}^{a} S_{11}^{b})
\end{cases} \qquad (3)$$
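As an illustration of Eq. (3), the following sketch cascades two 2×2 S-matrices; the function name and the NumPy representation are ours, not the paper's.

```python
# Sketch of the two-port cascade of Eq. (3): combine the S-parameters of two
# networks a and b joined through a matched reference impedance.
import numpy as np

def cascade(Sa, Sb):
    """Sa, Sb: 2x2 complex S-matrices; returns the cascaded 2x2 S-matrix."""
    d = 1.0 - Sa[1, 1] * Sb[0, 0]                      # 1 - S22^a * S11^b
    S = np.empty((2, 2), dtype=complex)
    S[0, 0] = Sa[0, 0] + Sa[0, 1] * Sb[0, 0] * Sa[1, 0] / d
    S[0, 1] = Sa[0, 1] * Sb[0, 1] / d
    S[1, 0] = Sa[1, 0] * Sb[1, 0] / d
    S[1, 1] = Sb[1, 1] + Sb[1, 0] * Sa[1, 1] * Sb[0, 1] / d
    return S
```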

Ignoring the multimode effect, the total S-parameters can be obtained from Figure 3 as the cascade of [Sa], [Sb] and [Sc] using Eq. (3). However, the discontinuity is near the resonator, so Eq. (3) cannot be used directly to compute the total S-parameters. We therefore write the S-parameters of the structure as follows [7]:
$$\begin{bmatrix} V_{1,\#1}^{-} \\ V_{2,\#1}^{-} \\ V_{1,\#2}^{-} \\ V_{2,\#2}^{-} \end{bmatrix}
= \begin{bmatrix} [S]^{\#1,\#1} & [S]^{\#1,\#2} \\ [S]^{\#2,\#1} & [S]^{\#2,\#2} \end{bmatrix}
\begin{bmatrix} V_{1,\#1}^{+} \\ V_{2,\#1}^{+} \\ V_{1,\#2}^{+} \\ V_{2,\#2}^{+} \end{bmatrix},
\qquad [S]^{\#i,\#j} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \qquad (4)$$

The S-parameters of the uniform parts far from the discontinuity can be treated as those of a uniform transmission line, so the key quantity for the total S-parameters is [Sb]. The equivalent circuit of the discontinuous part is shown in Figure 3, where [Sa], [Sb] and [Sc] have the same meaning as before. The S-parameter model of the multimode circuits is then as shown in Figure 5:
[Figure: the incident and reflected mode voltages V(1,#1), V(2,#1), V(1,#2), V(2,#2) at the two physical ports, related by the block matrix [[S]^{#1,#1}, [S]^{#1,#2}; [S]^{#2,#1}, [S]^{#2,#2}]]

Figure 5. S-parameter model of the multimode circuits


For every mode the S-parameter is

$$[S]^{\#i,\#i} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \qquad (5)$$

where $S_{jj} = V_{j,\#i}^{-}/V_{j,\#i}^{+}$ with $V_{k,\#i}^{+} = 0$ for $k \neq j$, and $S_{jk} = V_{j,\#i}^{-}/V_{k,\#i}^{+}$ with $V_{l,\#i}^{+} = 0$ for $l \neq k$. The mutual S-parameter is

$$[S]^{\#i,\#j} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}, \qquad i \neq j \qquad (6)$$

where $S_{kk} = V_{k,\#j}^{-}/V_{k,\#i}^{+}$ and $S_{kl} = V_{k,\#j}^{-}/V_{l,\#i}^{+}$, with all other incident mode voltages set to zero. The total S-parameter is [7]

$$\begin{bmatrix} V_{1,\#1}^{-} \\ V_{2,\#1}^{-} \\ V_{1,\#2}^{-} \\ V_{2,\#2}^{-} \\ \vdots \\ V_{1,\#n}^{-} \\ V_{2,\#n}^{-} \end{bmatrix}
= \begin{bmatrix}
[S]^{\#1,\#1} & [S]^{\#1,\#2} & \cdots & [S]^{\#1,\#n} \\
[S]^{\#2,\#1} & [S]^{\#2,\#2} & \cdots & [S]^{\#2,\#n} \\
\vdots & \vdots & \ddots & \vdots \\
[S]^{\#n,\#1} & [S]^{\#n,\#2} & \cdots & [S]^{\#n,\#n}
\end{bmatrix}
\begin{bmatrix} V_{1,\#1}^{+} \\ V_{2,\#1}^{+} \\ V_{1,\#2}^{+} \\ V_{2,\#2}^{+} \\ \vdots \\ V_{1,\#n}^{+} \\ V_{2,\#n}^{+} \end{bmatrix} \qquad (7)$$

Here $V_{j,\#i}^{\pm}$ ($j = 1, 2$) is the mode voltage of the incident or reflected wave of the ith mode at the jth physical port, and $S_{11}$ and $S_{21}$ are the real quantities representing the reflection and transmission coefficients respectively. The mutual S-parameters describe the effect of the higher-order modes on the main mode of the transmission structure.

From Eq. (7) we can see that without multimode effects the total S-parameters have no mutual terms, and they reduce to the main-mode S-parameters of the transmission structure. Therefore, if we can remove the multimode effect from the structure, we remove the difficulty in the computation of the total S-parameters, and the cascade equation can be used to calculate the cascaded structure. The 'equivalent removing' method developed in this paper does exactly this.

2. Analysis and Results

The substructures shown in Figure 6 were fabricated for the analysis, and the HFSS simulator from Ansoft Corporation was used to analyze the multimode effect of the substructures [8]; the software includes post-processing commands for analyzing this behavior in detail. The measurements and the simulations gave the same results.

Figure 6. The equivalent circuits of the discontinuity (a) (b) (c) and (d)
The results are shown in Figure 7 (only the comparisons of the transmission coefficient S21 are shown). We analyzed the difference in transmission characteristics, especially S21, between the multimode structure and the multimode-removing structure shown in Figure 6(b), whose layout is the proposed branch resonator coupled to a 50 Ω microstrip line.

Figure 7. The multimode data and the non-multimode data    Figure 8. S21 of the multimode and the different multimode removing data
Figure 7 shows that the resonant frequency of the multimode data is lower than that of the multimode-removing structure, while the attenuation is larger. The main reason for this is that the multimode behavior increases the loaded capacitance of the resonator, which effectively lengthens the branch and lowers the resonant frequency, and the higher-order modes consume energy of the main mode, which

increases the attenuation. The data for both-side and single-side multimode removal are shown in Figure 8, where 'm' denotes the multimode data, 'srm' the single-side multimode-removing data and 'brm' the both-side multimode-removing data. The trend of the multimode behavior is the same in both cases.
For the structure shown in Figure 6(c), if we divide it along the line a-b, the multimode coupling between the two parts cannot simply be omitted, so the cascade formulation (4) cannot be used to obtain the total S-parameters of the structure, which makes the design difficult. To simplify the design, the 'equivalent removing' method is developed in this paper and validated numerically and by simulation. The idea is to divide the structure of Figure 6(c) into two parts and, when the trace is long enough, to replace the parts by the substitute structure of Figure 6(b). The first part of the divided structure is called the single-side multimode-removing equivalent structure, and the second part can be substituted with the reversed structure of Figure 6(d).
In this way, the multimode effect in the middle of the structure shown in Figure 6(c) is removed equivalently, and the total S-parameters of structure (c) can be obtained from the cascade formulation (4), using the S-parameters of structure (b) in place of those of structure (a), which removes the multimode effects of the structure equivalently. The results of applying this method to the equalizer design are shown below; the S-parameters computed with the original substructure data are compared with the measured data in Figure 9.

In Figure 9, the calculated data are the results of obtaining the total S-parameters of structure (c) with the cascade method by dividing it into two equal parts as in (a) and using the original S-parameters of structure (a). The figure shows that a large error is introduced, so the cascade method cannot be used in this way to calculate the total S-parameters of structure (c).

Figure 9. The original and the calculated result Figure 10 The adjusted and the calculated result
When the decay of the higher-order modes is exploited and the 'equivalent removing' method is applied, good results are obtained, as shown in Figure 10. This time the equivalent structure and its corresponding S-parameters are used to compute the total S-parameters with formulation (4). In Figure 10 the curve of the computed data and the curve of the measured data, measured with an HP8255, almost overlap, which means the method is suitable for the cascade design. The errors between the computed and the measured data are given in Figure 11: for 95% of the points considered the error is lower than 1 dB, and for 80% of the points it is lower than 0.7 dB. This very good agreement between the computed and measured data shows that the method is correct and well suited to the design.

Figure 11. The error data

3. Conclusion

We have investigated the influence of the length of the microstrip trace and of the cascading on the discontinuity problem in microstrip equalizer design, through numerical simulations with HFSS and through experiments, and the numerical results compare very well with the experimental ones. The results show that when the first higher-order mode decays by 90 dB, in the both-side removing structure the cascade result is good enough to meet the equalizer design requirements. In the single-side removing structure, the removing side is designed so that the first higher-order mode decays by 90 dB, while on the other side, which retains the multimode behavior, the first higher-order mode decays by only 6 dB [9][10]. In addition, the resonant frequency is closely related to the dimensions of the thin-film-resistor-loaded resonator, and the attenuation is closely related to both the resonator and the trace.

References

[1] D. J. Mellor, On the Design of Matched Equalizer of prescribed Gain Versus Frequency Profile. IEEE
MTT-S International Microwave Symposium Digest, 1997, 308-311.
[2] Broadband MIC Equalizers TWTA Output Response. IEEE Design Feature. Oct 1993.
[3] J. Y. Chi, G. X. Zhang, G. Huang. CAD and Experimental Research of The Microstrip Eqaulizer for
TWT Amplifier, Journal of Electronics & Information Technology , April 1989.
[4] M. Sankara Narayana. Gain Equalizer Flattens Attenuation Over 6-18GHz. Applied Microwave
&wireless. November 1998, 74-78.
[5] J. D. Baena, J. Bonache, F. Martin, R. M. Sillero, F. Falcone. Equivalent-Circuit models for split-ring resonators and complementary split-ring resonators coupled to planar transmission lines. IEEE Transactions on Microwave Theory and Techniques 53 (2005), 1451-1461.
[6] V. Sanz, A. Belenguer, A. L. Borja, J. Cascon, H. Esteban. Broadband Equivalent Circuit Model for a Coplanar Waveguide Line Loaded with Split Ring Resonators. International Journal of Antennas & Propagation 4 (2012), 1238-1241.
[7] O. Pitzalis, R. A. Gilson. Tables of Impedance Matching Networks Which Approximate Prescribed Attenuation Versus Frequency Slopes. IEEE Transactions on Microwave Theory and Techniques 19 (1971), 381-386.
[8] M. Shattuck. EM-Based Models Improve Circuit Simulators. Microwave & RF, June 2000, 97-108.
[9] P. Heymann, H. Prinzler, and F. Schnieder. DE-embedding of MMIC transmission-line measurements.
1994 IEEE MTT-S Digest 1045-1048.
[10] GD Vendelin, AM Pavio, UL Rohde, Microwave Circuit Design Using Linear and Nonlinear
Techniques, Wiley, 37(2005):973-974.
Fuzzy Systems and Data Mining II 555
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-555

A Method for Woodcut Rendering from Images
Hong-Qiang ZHANG, Shu-Wen WANG1, Cong MA and Bing-Kun PI
College of Electrical Engineering, Northwest University for Nationalities, Lanzhou,
China

Abstract. This paper demonstrates a method for obtaining woodcut-style renderings of images. Four steps are used to produce the woodcut rendering simulation image: boundary extraction, histogram matching, image edge enhancement and binarization processing. First, we use the Roberts operator to extract the image boundary. Second, we use histogram matching to adjust the gray-level distribution of the grayscale image. Third, through image fusion we fuse the image boundary with the adjusted image. Finally, we binarize the image by setting a threshold to obtain the woodcut rendering result. Experimental results show that the algorithm has low computational complexity and very good real-time behavior. Using the method in this paper, woodcut renderings with excellent artistic effect can be obtained.

Keywords. Non-photorealistic rendering, woodcut rendering, image binarization

Introduction

Non-photorealistic rendering is a rendering technique that simulates various kinds of artistic images by computer, such as pencil drawings and paintings. This technology has become a bridge between computer science and artistic expression. Non-photorealistic rendering can use all kinds of methods to convert a digital image into an image with artistic effects.
In recent years, scholars have obtained many outstanding research results in non-photorealistic rendering; however, there are very few studies on woodcut rendering. Mizuno et al. introduced a complete system by modelling the woodblock, paper and ink [1]; this method needs complex mathematical calculation, and the result is limited by the quality of the model. Mello et al. introduced an image-based rendering method [2]; they simulated the scores of woodcuts with a function along a trajectory, but the function is so simple that the simulated scores differ greatly from the real scores in woodcuts. Jie Li and Dan Xu introduced a simple and efficient method to obtain a one-of-a-kind traditional woodcut rendering result [3]; unfortunately, the rendering result is not very clear.
The process of creating a woodcut is very complex and time consuming. This article demonstrates a method for obtaining woodcut artwork in a simple and real-time manner, so as to meet the public's demand for such works of art. Based on a simulation of the artists' working process, we mainly consider the following two

1
Corresponding Author: Shu-Wen WANG, College of Electrical Engineering, Northwest University for
Nationalities, China; E-mail: shuwenwang@163.com.

problems: 1) artists use sketches to describe the overall shape and main profile; 2) artists engrave the dark areas to express the tone and level of the image in different regions. Taking these two issues into comprehensive consideration, this paper presents a stepwise woodcut rendering method. We use four steps to produce a woodcut rendering simulation image: boundary extraction, histogram matching, image edge enhancement and binarization processing.

1. Boundary Extraction

The image boundary provides basic information for photo processing [4]: boundary information is much needed because many important image details lie on the boundary. The Roberts operator [5] is a very simple algorithm that uses a local difference operator to detect edges; it has high positioning accuracy, although it is sensitive to noise. Figure 1 is the original image, and the result of boundary extraction with the Roberts operator is shown in Figure 2.
Robert’s operator convolution factors as follows:
ª1 0 º ª 0 1º (1)
Gx Gy
«¬0 1»¼ «¬1 0»¼
Formula for calculating grey degrees:
2 2
G Gx  G y (2)
The specific calculation is as follows:
G x, y
abs f x, y  f x  1, y  1  abs f x, y  1  f x  1, y (3)
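As an illustration of Eqs. (1)-(3), the following Python/NumPy sketch computes the boundary map using the Roberts cross differences of Eq. (3). The function name, the float conversion and the zero border handling are illustrative assumptions rather than part of the original description.

import numpy as np

def roberts_boundary(gray):
    # Boundary strength of a grayscale image (2D array), following Eq. (3):
    # |f(x,y) - f(x+1,y+1)| + |f(x,y+1) - f(x+1,y)| for every interior pixel.
    f = gray.astype(np.float64)
    g = np.zeros_like(f)
    g[:-1, :-1] = (np.abs(f[:-1, :-1] - f[1:, 1:]) +
                   np.abs(f[:-1, 1:] - f[1:, :-1]))
    return g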

Figure 1. The original drawing.
Figure 2. The result of boundary extraction.

2. Histogram Matching

Different images have different gray-level distributions, as shown in Figure 3. This causes problems for the later binarization processing: we would need to set a different binarization threshold for each image, which would be incredibly time-consuming and tedious. So, in this paper, we put forward a method to adjust the image gray-level distribution by using histogram matching [6, 7]. In

this way, the image information will be concentrated so we can set up the unified
threshold for binarization processing. The Histogram Matching results are shown in
Figure 4.
This article uses the following functions as the distribution curve:

p1(v) = (1/σ) · exp( -(uc - v)/σ )   if ub ≤ v ≤ uc,   and p1(v) = 0 otherwise          (4)

p2(v) = 1/(ub - ua)   if v ≤ ub,   and p2(v) = 0 otherwise          (5)

Among them, ua = 105, ub = 225, uc = 255 and σ = 9. The function curve is shown in Figure 5.
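One possible implementation of this step is sketched below in Python: the target gray-level distribution of Eqs. (4)-(5) is sampled on the 0-255 range and the source image is remapped through the usual CDF-to-CDF lookup. The normalization of the target density and the helper names are assumptions made for illustration, not details taken from the paper.

import numpy as np

def target_density(ua=105, ub=225, uc=255, sigma=9.0):
    v = np.arange(256, dtype=np.float64)
    p = np.zeros(256)
    mask = (v >= ub) & (v <= uc)
    p[mask] = np.exp(-(uc - v[mask]) / sigma) / sigma   # Eq. (4)
    p[v <= ub] += 1.0 / (ub - ua)                       # Eq. (5)
    return p / p.sum()                                  # normalize to a probability (assumption)

def histogram_match(gray, p_target):
    # gray is assumed to be a uint8 image; map its CDF onto the target CDF.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    cdf_src = np.cumsum(hist) / hist.sum()
    cdf_tgt = np.cumsum(p_target)
    mapping = np.searchsorted(cdf_tgt, cdf_src).clip(0, 255).astype(np.uint8)
    return mapping[gray]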

Figure 3. The gray histograms of three different photos.
Figure 4. The gray histogram after histogram matching.


Figure 5. The function distribution curve for histogram matching

3. Image Edge Enhancement

As shown in Figure 6(b), the histogram-matched image has very high brightness and shows obvious distortion. As a result, we need to strengthen the details of Figure 6(b). The boundary of the image carries important details; therefore, enhancing the image boundary information enhances the image as a whole [8]. In this paper, we enhance the image in Figure 6(b) by using edge enhancement [9]. The image in Figure 6(c) is obtained by adding the edge information to Figure 6(b). The detail of (c) is significantly improved compared with the detail of (b).



Figure 6. (a) the original image, (b) the result of histogram matching, (c) the result of edge enhancement

The specific function is the following:

T = S .* J          (6)

where S is the boundary map of the original image, J is the result of histogram matching, T is the result of edge enhancement, and .* denotes element-wise multiplication.

4. Image Binarization Processing

As shown in Figure 7, a woodcut image is a kind of binary image: it contains only two colors, black and white. In order to achieve woodcut rendering, we therefore need image binarization [10]. In order to reduce the computational complexity, we realize the binarization by setting a fixed binarization threshold value [11].

Figure 7. Classic woodcut works


The binarization formula is as follows:

p(v) = 1  if v ≥ u,   and  p(v) = 0  if v < u          (7)

The threshold value u controls the result of the binarization processing. Because the image information has been concentrated by histogram matching and, as shown in Figure 5, lies mainly at gray-level values greater than 230, the threshold u can be set to 230.
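A minimal Python sketch of the last two steps follows: the edge map is fused with the matched image by the element-wise product of Eq. (6) and the result is thresholded according to Eq. (7). The scaling of the edge map S to [0, 1] before the product is an assumption, since the paper does not state how S is normalized.

import numpy as np

def edge_enhance(S, J):
    # Eq. (6): element-wise product of the boundary map S and the matched image J.
    S = S / max(S.max(), 1e-9)        # assumed normalization of the edge map
    return S * J

def binarize(image, u=230):
    # Eq. (7): 255 (white) where the gray level is at least u, 0 (black) otherwise.
    return np.where(image >= u, 255, 0).astype(np.uint8)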

5. The Experimental Results

This article produced woodcut renderings for three different types of images: (a) natural scenery, (b) cityscape and (c) portrait photography. Very good rendering results were obtained, as shown in Figure 8.
Comparing the original image with result (a), it can be seen that the rendered image is richly detailed and that the details are accurate. For example, the reflection in the river is not merely a black shadow; there are some white spots within the reflection.
Comparing the original image with result (b), the shop signs on the wall and the structure and texture of the building are very clear. However, only part of the clouds can be seen; this is a problem that still needs to be solved.
Comparing the original image with result (c), the portrait in the result is very distinct, and the woman's hat is very clear as well.

However, there is a certain degree of roughness in the background. How to further improve the rendering quality is another problem to be solved.
The experiments prove that this algorithm can obtain rendering results with the artistic effects of woodcuts. The woodcut rendering can be realized in real time, and the method has very good generality.

(a) Natural scenery

(b) Cityscape

(c) Portrait photography


Figure 8. Three different types of image and their rendering results

6. Conclusion

In this paper, we realized a simple method for woodcut rendering. First, we used the Roberts operator to extract the image boundary. Second, we used histogram matching to adjust the gray-level distribution of the image. Third, through image fusion we combined the image boundary with the adjusted image. Finally, through image binarization with a fixed threshold we obtained the woodcut rendering results. Experimental results show that the algorithm has low computational complexity and very good real-time performance. The proposed method can obtain woodcut rendering images with good artistic effects as well as excellent generality. Future research will include improving the accuracy of rendering, the rendering quality and the artistic quality of the rendered image.

Acknowledgements

We would like to express our gratitude to the National Natural Science Foundation of China (No. 61261042) and to the scientific research innovation team on key technologies of the Internet of Things at Northwest University for Nationalities. These projects provided great support for this work.

References

[1] S. Mizuno and T. Kasaura, et al. Automatic generation of virtual woodblocks and multicolor woodblock
printing. Computer Graphics Forum, 19(2000), 51-58.
[2] V. Mello, C. R. Jung, et al. Virtual woodcuts from images. 5th international conference on Computer
graphics and interactive techniques, 2007: 103-109.
[3] J. Li, D. Xu. A scores based rendering for Yunnan out-of-print woodcut. 14th International Conference
on Computer-Aided Design and Computer Graphics, 2015: 214-215.
[4] J. H. Yeom, M. Y. Jung, Y. Kim. Line-based paddy boundary extraction using the Rapid Eye satellite
image. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, 2015: 3397-
3400.
[5] R. L. Duan, Q. X. Li, Y. H. Li. Summary of image edge detection. Optical Technique, 31(2005): 415-
419.
[6] C. Q. Huang, Q. Zhang, H. Wang, et al. A low power and low complexity automatic white balance
algorithm for amoled driving using histogram matching. Journal of Display Technology, 11(2015): 53-
59.
[7] J. Wu, L. Lu, D. Dong. Fusion mutispectral and high resolution image using IHS transform and
histogram equilibrium. Journal of Wuhan University of Technology (Transportation Science &
Engineering), 28(2004): 55-58.
[8] J. Jin, S. Y. Tang, Y. Shen. An innovative image enhancement method for edge preservation in wavelet
domain. 2015 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)
Proceedings, Pisa, 2015, 52-56.
[9] J. Chen, X. X. Cui, J. Xiao, et al. Properties of image edge enhancement using radial hilbert transform.
Acta Photonica Sinica, 40(2011): 483-486.
[10] B. Wu, Z. Y. Qin. New approaches for the automatic selection of the optimal threshold in image
binarization. Journal of Institute of Surveying and Mapping, 18(2001): 283-286.
[11] M. Soua, R. Kachouri, M. Akil. Improved Hybrid Binarization based on Kmeans for Heterogeneous
document processing. 9th International Symposium on Image and Signal Processing and Analysis
(ISPA), Zagreb, 2015, 210-215.
562 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-562

Research on a Non-Rigid 3D Shape


Retrieval Method Based on Global and
Partial Description
Tian-Wen YUAN, Yi-Nan LU1, Zhen-Kun SHI and Zhe ZHANG
College of Computer Science and Technology, Jilin University, Jilin 130000, China

Abstract. Retrieval of 3D shapes, especially non-rigid shapes, is attracting growing interest in the research community. In this paper, a new robust feature point extraction method based on two major approaches is presented. The proposed method is shown to be robust when applied to non-rigid 3D shapes as well as highly repeatable across different scales. In addition, since partial features provide more detailed information for non-rigid 3D shape retrieval, a representation combining global and partial descriptors is developed and a new similarity measurement is applied to the corresponding parts of different shapes. Finally, the proposed method is implemented using global and partial descriptors with different weights. The results indicate that the proposed method performs non-rigid 3D shape retrieval efficiently.

Keywords. 3D mesh retrieval, segmentation, curvature flow, DoG

Introduction

Due to the rapid development of computer software and hardware, three-dimensional


models have become an abundant resource in many fields, such as computing and
medicine. Similar to 2D image retrieval, 3D object retrieval has attracted growing
interest in computer vision research. In non-rigid 3D shape retrieval, similar objects
can have different poses, which increases the difficulty of the retrieval process.
Feature-based approaches rely on the identification of 3D feature points on surfaces. In Mesh-DoG, a feature extraction method developed by Zaharescu et al. [1], a scale space is constructed by applying normalized Gaussian derivatives with the DoG operator [2]. Mesh-DoG is executed on a scalar function defined on the manifold; the operator is computed on the mean curvature or the Gaussian curvature. In a similar method, the Salient Points (SP) method proposed by Castellani et al. [3], the coordinates of the points are operated on directly, and the DoG is projected onto the normal direction, withstanding the influence of 3D translation. While Mesh-DoG requires curvature information, SP cannot be used to represent characteristic regions. In short, a gap remains between these approaches: no combination of curvature information and operations performed directly on the coordinates has been explored.
Therefore, an accurate method of non-rigid 3D shape retrieval is still needed.
Global description is known to be a useful means of distinguishing a shape from other

1
Corresponding Author: Yi-Nan LU, College of Computer Science and Technology, Jilin University, Jilin
130000, China; E-mail: luyn@jlu.edu.cn.

shape classes. However, a recent study conducted by Lian et al. [4] suggests that some
global descriptors can also effectively identify similarities among shapes. In another
study, Bronstein et al. [5] developed the shape descriptor BoF-HKS based on the bags
of words (BoW) framework. However, the BoF-HKS shape descriptor yielded
unsatisfactory results when applied to a non-rigid 3D shape retrieval system in
SHREC’15 [6].
Some parts may not undergo large shape deformation, so they preserve their original partial characteristics. Ran et al. [7] and Li et al. [8] presented several partial matching
methods. However, these methods cannot be applied to shape retrieval. In a more
recent method proposed in [9] and [10], global and partial features were combined.
According to the results, this method outperformed other methods in content-based 3D
shape retrieval tasks. In another study, Sipiran et al. [11] suggested that a 3D generic
shape can be represented as a linear combination of a global descriptor and a set of
partial descriptors. First, the local features of the 3D Harris feature points were
computed. Then, a clustering approach was used to identify points in the same cluster
with similar features. However, points in the same cluster could be disjointed without
location information. Katz et al. [12] developed a novel hierarchical mesh segmentation
algorithm using Core Extraction. The segmentation results were invariant to both the
pose of the model and the differing proportions of the model’s components.
Furthermore, Zhang et al. [13] proposed a region-growing approach, in which the
mean-shift curvature is used to cluster points. The points within a cluster are then
compiled into mesh faces, which are connected to form sub graphs. These methods can
also be used to generate joint segmentation.
Therefore, in this study, a new approach is developed by combining the global and
partial descriptions for non-rigid 3D shape retrieval. This approach is fundamentally
different from most existing approaches to 3D shape retrieval. In addition, a
segmentation method is adopted to detect points using the improved DoG based on
curvature flow. Furthermore, an algorithm is used to measure the similarity among
segmented parts of different objects.

1. Method

Few approaches utilize 3D partitions as input in non-rigid retrieval tasks. In this section,
a segmentation approach based on Core Extraction is adopted. The complete and joint
partitions included in this approach are thought to be more applicable to non-rigid 3D
shape retrieval.

1.1. Mesh Segmentation


The Core Extraction approach was applied to a 3D mesh in order to generate the
complete and joint parts proposed by Katz et al. [12]. This method has been proven to
be pose-invariant and can be used to generate the correct segmentation of meshes. The
Core Extraction approach can be summarized by the following steps:
• Preprocessing: this step includes the mesh coarsening and pose-invariant representation steps. In the mesh coarsening step, the algorithm working on large meshes is accelerated in order to decrease noise sensitivity. In the pose-invariant representation step, multidimensional scaling (MDS) is used to transform the mesh into a canonical mesh.

• Keypoint extraction: the feature points that satisfy the following condition are identified and used to guide the segmentation process:

Σ_{νi∈S} GeodDist(ν, νi)  >  Σ_{νi∈S} GeodDist(νn, νi)          (1)

• Core component extraction and mesh segmentation: a spherical mirroring operation is computed in order to extract the core component, and every component is determined using one or more feature points.

1.2. Feature Points Extraction

The proposed algorithm was compared to SP and Mesh-DoG, as shown in Figure 1.


The left two images in Figure 1 were obtained using SP [14], the middle two images
were obtained using Mesh-DoG [14], and the two images on the right were obtained
using the proposed feature extraction method. Unlike SP, which only detects points
with large degrees of positional variation, Mesh-DoG utilizes the curvature information
on every vertex. This curvature information can be used to directly express geometrical
characteristics and detect a larger number of stable feature points. In the proposed
method, curvature flow information [15] is used to construct a scale space. Smoothing
with curvature flow information ensures that each vertex varies along the normal
direction at the rate of curvature. Furthermore, only the vertices that are local maxima across the three DoG layers are identified in the developed approach, and the coordinates are operated on directly, without explicitly computing the curvature.

Figure 1. Feature Points Detected by SP, Mesh-DoG, and the Proposed Algorithm.
Note that, for a non-rigid 3D shape retrieval system, a robust feature point extraction approach must detect repeatable points across models in different poses. The proposed algorithm, which is based on Mesh-DoG and SP, is presented in the algorithm below.

1.3. Description

The proposed description can be divided into two types of descriptions, including a
global descriptor of the entire 3D model and a set of partial descriptors.
BoF-HKS was used as the global feature since it has proven to be robust and
effective against non-rigid transformation.
The following procedure was used to select the set of partial descriptors (a sketch in code is given after the list):
• Given a segmented part, a set of feature points C can be obtained directly. First, the number of feature points in C is determined. If the number of feature points is less than a predetermined size, the set is discarded, since the performance of the set decreases as the number of feature points decreases.
• The neighbors surrounding the feature points are added to C in order to improve the retrieval results.

• The BoW method is used to generate the feature representation of the partial descriptor and normalize the feature vector.
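A rough Python sketch of this selection procedure is given below. The local descriptor function, the codebook obtained by clustering, the minimum set size and the mesh neighborhood helper are placeholders for choices that the text does not fix; the bag-of-words histogram is built by nearest-codeword assignment, which is one common BoW variant.

import numpy as np

def part_descriptor(points, mesh, codebook, local_descriptor, min_points=10):
    # points: indices of the feature points that fall inside one segmented part.
    if len(points) < min_points:                   # step 1: discard sets that are too small
        return None
    expanded = set(points)
    for p in points:                               # step 2: add the neighbors of each feature point
        expanded.update(mesh.vertex_neighbors(p))  # mesh.vertex_neighbors is an assumed helper
    feats = np.array([local_descriptor(mesh, p) for p in expanded])
    # step 3: bag-of-words histogram over the codebook, then normalization
    dists = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()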

Algorithm: Improved DoG Based on Curvature Flow

Require: Input the mesh M1
Ensure: A set of feature points C
1: Let C be a set of feature points;
2: C ← ∅
3: for j = 1 to max_layer do
4:   Let the function flow() represent the curvature flow operation
5:   Mj+1 = flow(Mj)
6:   for each vertex pi on Mj do
7:     Let p'i be the corresponding vertex of pi on Mj+1
8:     DoGj(pi) ← the difference between pi and p'i,
9:       computed directly on the vertex coordinates
10:  end for
11: end for
12: for each vertex pi on DoGj do
13:   if pi satisfies the following condition for all neighbors rpi around pi:
14:     DoGj(pi) > DoGj(rpi)
15:   then let qi represent the corresponding vertex of pi on DoGj-1
16:     and q'i represent the corresponding vertex of pi on DoGj+1
17:     if qi and q'i satisfy the condition (line 14) and
18:       DoGj(pi) > DoGj-1(qi) && DoGj(pi) > DoGj+1(q'i)
19:     then insert pi into the set C
20:     end if
21:   end if
22: end for

1.4. Matching Distance

In order to determine how similar two objects are, the distances between pairs of
descriptors must be computed using a dissimilarity measure. In addition, a linear
combination is applied between the global and partial descriptor distance as suggested
by Sipiran et al. [11]. The global and partial descriptions of two 3D mesh models P and
Q can be expressed as:
D_P = {(G_P, P_P) | G_P ∈ R^n, P_P = {p_P^1, p_P^2, ..., p_P^m}, p_P^i ∈ R^n}          (2)

D_Q = {(G_Q, P_Q) | G_Q ∈ R^n, P_Q = {p_Q^1, p_Q^2, ..., p_Q^k}, p_Q^i ∈ R^n}          (3)

In addition, the matching distance can be expressed as:

d(D_P, D_Q) = μ ||G_P − G_Q||_1 + (1 − μ) ||P_P − P_Q||_1          (4)

where μ weighs the degree of correspondence between the two distances, G is the global description and P is the partial description.
When computing the global-to-global distance, L1 was applied instead of L2 [11].
The retrieval results of L1 distance and L2 distance are compared in the following
section. When computing the part-to-part distance, the degree of correspondence
between the two sets P_P and P_Q was unknown. In this section, when dealing with
two parts, all of the possible corresponding distances were computed, but only the best
and second-best corresponding distances were considered, as suggested by Lowe [2].
When the difference between the best and second-best distance was less than 0.2, the
part was assumed to correspond with the other part. Otherwise, the part was assumed to
have properties similar to the other part, such as those shared by the right and left hands.
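The Python sketch below illustrates Eq. (4) together with the best/second-best test described above. The acceptance gap of 0.2 follows the text; how the accepted per-part distances are aggregated into the part-to-part term, and the fallback when no part is accepted, are assumptions made for illustration.

import numpy as np

def l1(a, b):
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def part_distance(parts_p, parts_q, gap=0.2):
    if not parts_p or not parts_q:
        return 0.0                                   # degenerate case (assumption)
    best_all, accepted = [], []
    for p in parts_p:
        d = sorted(l1(p, q) for q in parts_q)
        best_all.append(d[0])
        # a part is treated as a true correspondence when the best and
        # second-best distances differ by less than `gap`
        if len(d) >= 2 and d[1] - d[0] < gap:
            accepted.append(d[0])
    chosen = accepted if accepted else best_all      # fallback is an assumption
    return float(np.mean(chosen))

def matching_distance(G_p, parts_p, G_q, parts_q, mu=0.7):
    # Eq. (4): linear combination of the global L1 distance and the part-to-part distance
    return mu * l1(G_p, G_q) + (1.0 - mu) * part_distance(parts_p, parts_q)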

2. Experiment

The feature extraction results are shown in Figure 2. The models were obtained from the database provided in [16]. A total of 900 points were extracted from each mesh. The first four models shown in Figure 2 are in various poses; the next three models show shot noise, local scaling, and global scaling under a similar pose. Feature points on the ears, hands, and feet were repeatedly extracted.

Figure 2. Feature Point Extraction in Non-rigid Models.

Table 1. Repeatability at radius=5 of my approach and Mesh-DoG (mean curvature) feature detection
algorithm. Average number of detected points: 392
Transf.        Method        Strength
                             1        2        3        4        5
Isometry       My Appro.     91.69    96.44    93.86    90.93    92.84
               Mesh-DoG      97.75    98.13    97.92    97.14    97.70
Scaling        My Appro.     97.02    96.89    95.50    95.34    94.63
               Mesh-DoG      98.00    98.00    98.00    98.00    98.00
Shot Noise     My Appro.     99.23    98.85    98.46    98.10    97.99
               Mesh-DoG      98.25    98.00    98.00    97.87    97.75
Average        My Appro.     95.98    97.39    95.94    94.79    95.15
               Mesh-DoG      98.00    97.84    97.97    97.67    97.82

Table 2. Repeatability at radius=5 of my approach and SP feature detection algorithm. Average number of
detected points: 205
Transf.        Method        Strength
                             1        2        3        4        5
Isometry       My Appro.     88.48    93.31    91.90    89.79    91.79
               SP            79.01    83.50    83.90    84.33    84.79
Scaling        My Appro.     93.86    92.83    92.16    91.50    91.33
               SP            84.68    82.36    80.77    78.98    77.42
Shot Noise     My Appro.     98.28    96.57    95.59    94.37    93.14
               SP            77.78    73.31    66.06    62.25    59.68
Average        My Appro.     93.54    94.24    93.22    91.89    92.09
               SP            80.49    79.72    76.91    75.19    73.96

Tables 1 and 2 show the repeatability of Mesh-DoG, SP and my approach at a fixed radius (approximately 1% of the shape diameter), broken down according to transformation classes and strengths. Higher repeatability scores indicate better performance [17][18]. Mesh-DoG obtains almost perfect scores, and my approach takes second place considering all evaluation measures. Although it performs slightly worse than Mesh-DoG, it clearly outperforms it under shot noise and yields better results than SP under the same condition, namely operating directly on the coordinates.
An HKS interval of 60 and a visual dictionary of size 40 were used in the retrieval process. The segmentation database provided in [19], which includes 380 meshes across 19 object categories, was used to compare the retrieval performance of the proposed model, SP, and Mesh-DoG based on their precision-recall curves for different values of μ. Mean Average Precision (MAP), First Tier (FT), and Nearest Neighbor (NN) measurements were also used to evaluate the three methods [6]. Unfortunately, the Core Extraction code could not be obtained for segmentation; thus, the three measurements were implemented using the database provided in [19].
The recall-precision plot (PR-curve) shown in Figure 3 illustrates that the value of μ influenced the retrieval results. The model yielded poor retrieval results when all of the matching distances were determined by the partial descriptors (μ=0). This was likely because two models within the same class lacked common parts, resulting in faulty segmentation. However, when μ=1, BoF-HKS was directly applied to the dataset
as in SHREC’15, yielding acceptable results. The optimum results were obtained when
μ=0.7. As shown by the results, the performance of the proposed method was higher
than that of BoF-HKS (μ=1) and Part-HKS (μ=0). Therefore, only μ values of 0, 0.7,
and 1 were considered in the analysis.
The MAP, FT, NN, and global difference results are displayed in Figure 4. The
results obtained when μ=0 were identical due to the lack of influence of the global
distance. In addition, when computing the global distance, L1 yielded better results than
L2. Furthermore, the proposed method performed better than BoF-HKS (μ=1). Therefore, the proposed method can improve the retrieval performance of non-rigid 3D shape retrieval systems.

Figure 3. PR-curve of Retrieval



Figure 4. MAP, FT, and NN at approximately L1 + L1 and L2 + L1 for different μ

3. Conclusion

In this paper, a new feature point detector based on curvature flow was developed. The proposed method operates directly on coordinates by means of curvature flow smoothing, but does not require the computation of explicit curvature information. In addition, a new distance is proposed for non-rigid 3D shape retrieval by combining global and partial representations. The Core Extraction method was used to generate complete, joint, and functional parts, and the best and second-best distances were used to evaluate correspondence between parts of different models.
According to the experimental data, the performance of the proposed method was higher than that of approaches using only a global or only a partial representation when applied to non-rigid 3D shape retrieval. However, the performance of the proposed method was limited when two models within the same class did not share enough parts, possibly due to faulty segmentation. Regardless, the proposed method offers new representational capabilities for non-rigid 3D shape retrieval.

Acknowledgement

Supported by the Specialized Research Fund for the Doctoral Program of Higher Education, China (No. 20130061110054).

References

[1] A. Zaharescu, E. Boyer, K. Varanasi, et al. Surface Feature Detection and Description with Applications
to Mesh Matching. Computer Vision and Pattern Recognition, IEEE Conference on, Miami, FL,
2009:373-380.
[2] D. G. Lowe. Object recognition from local scale-invariant features. The proceedings of the seventh IEEE
international conference, 2(1999):1150.
[3] U. Castellani, M. Cristian, S. Fantoni, et al. Sparse points matching by combining 3D mesh saliency with
statistical descriptors. Computer Graphics Forum. Blackwell Publishing Ltd, 27(2008): 643-652.
[4] Z. H. Lian, A. Godil, B. Bustos, et al. A comparison of methods for non-rigid 3D shape retrieval. Pattern
Recognition, 46 (2013):449-461.
[5] A. M. Bronstein, M. M. Bronstein, L. J. Guibas, et al. Shape google: Geometric words and expressions
for invariant shape retrieval. Acm Transactions on Graphics 30(2011):623-636.
[6] Z. Lian, J. Zhang, S. Choi, et al. SHREC'15 Track: Non-rigid 3D shape retrieval. Eurographics
Workshop Ond Object Retrieval (2015).
[7] Ran, Gal, and D. Cohen-Or. Salient geometric features for partial shape matching and similarity. Acm
Transactions on Graphics. 25 (2006):130-150.

[8] B. Li, A. Godil, H. Johan. Non-rigid and Partial 3D Model Retrieval Using Hybrid Shape Descriptor and
Meta Similarity. Advances in Visual Computing. Springer Berlin Heidelberg, 2012:199-209.
[9] A. Mademlis, P. Daras, A. Axenopoulos, et al Combining Topological and Geometrical Features for
Global and Partial 3-D Shape Retrieval. IEEE Transactions on Multimedia 10(2008):819-831.
[10] B. Bustos, T. Schreck , M. Walter, et al. Improving 3D similarity search by enhancing and combining 3D
descriptors. Multimedia Tools & Applications 58 (2012):81-108.
[11] I. Sipiran, B. Bustos, and T. Schreck. Data-aware 3D partitioning for generic shape retrieval. Computers & Graphics, 37(2013):460-472.
[12] S. Katz, G. Leifman, A. Tal. Mesh segmentation using feature point and core extraction. Visual
Computer, 21(2005):649-658.
[13] X. Zhang, G. Li, Y. Xiong, et al. 3D Mesh Segmentation Using Mean-Shifted Curvature. Advances in
Geometric Modeling and Processing, International Conference, GMP 2008, Hangzhou, China, April 23-
25, 2008. Proceedings 2008:465-474.
[14] Tombari, Federico, S. Salti, et al. Performance Evaluation of 3D Keypoint Detectors. International
Journal of Computer Vision. 102 (2013):198-220.
[15] Chen Wei. A Mesh Smoothing Algorithm Using Curvature Flow. Computer Engineering and
Applications (2005).
[16] Shape Retrieval Contest Datasets : http://tosca.cs.technion.ac.il/book/shrec_feat2010.html
[17] A. M. Bronstein, M. M. Bronstein, B. Bustos, et al. SHREC 2010: robust feature detection and
description benchmark. Eurographics 2010 Workshop on 3D Object Retrieval. The Eurographics
Association.
[18] E. Boyer, A. M. Bronstein, M. M. Bronstein, et al. SHREC 2011: robust feature detection and
description benchmark. Eurographics 2011 Workshop on 3D Object Retrieval. The Eurographics
Association, 2011:71-78.
[19] A Benchmark for 3D Mesh Segmentation :http://segeval.cs.princeton.edu/
570 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-570

Virtual Machine Relocating with


Combination of Energy and Performance
Awareness
Xiang LIa, Ning-Jiang CHENa,b,1,You-Chang XUa and Rangsarit PESAYANAVINa
a
School of Computer and Electronic Information, Guangxi University, Nanning
Guangxi, 530004, China;
b
Guangxi Key Laboratory of Multimedia Communications and Network Technology
(Cultivating Base), Guangxi University, Nanning, Guangxi, 530004, China

Abstract. Virtual machine relocation has attracted attention as a way to enable various cloud computing services and to facilitate virtual machine (VM) migration. Most existing virtual machine relocating strategies focus on optimizing a single objective in the cloud computing environment, and that objective is often achieved at the expense of others. Therefore, this paper presents a virtual machine relocating method with a multi-factor tradeoff, in which resource utilization is converted into energy and energy consumption and performance are traded off at the same time. The strategy effectively avoids unnecessary migration by using an autoregressive model to predict performance over the coming period when choosing the virtual machines to be relocated. The experimental results show that the presented strategy can improve the response time of virtual machines and reduce the energy consumption and the number of virtual machine migrations while guaranteeing a certain SLA (Service Level Agreement).

Keywords. virtual machine relocation, energy awareness, performance awareness, autoregressive model

Introduction

In cloud computing applications, according to research from Amazon, energy costs account for 42% of the total cost of a data center [1]. Cloud service providers need to optimize energy consumption to reduce high operating costs [2] and to precisely control the high energy consumption of servers [3]. In a virtualized data center, virtual machine consolidation/relocation is an important means of energy conservation. Virtual machine relocation dynamically adjusts the location of virtual machines according to certain indicators in order to achieve more reasonable resource utilization, a better user experience, or energy savings. To provide satisfactory service to cloud users, we need to balance energy consumption and performance in the data center. In addition, before shutting down a server, in order not to affect the normal operation of the virtual machines on that server, we need to reasonably relocate some virtual machines to
1
Corresponding Author: Ning-Jiang CHEN, Guangxi University, Nanning, Guangxi, 530004, China; E-
mail: chnj@gxu.edu.cn.

another server. In existing research on virtual machine relocation, some work sets resource utilization as the target, some is driven by profit, and other work sets energy consumption as the goal. Many relocating methods that consider energy consumption [4] reduce energy without treating the on-demand service quality of the virtual machines as a key concern. In some practical application scenarios, the relocation process must reduce energy consumption and, at the same time, ensure a good user experience. The contributions of this paper include: (1) a virtual machine relocating strategy that coordinates energy and performance in the cloud is introduced; (2) the strategy effectively avoids unnecessary migration by using an autoregressive model to predict performance over the next period.
The rest of this paper is organized as follows: Section 1 discusses the related work. Section 2 presents the strategy for VM relocating. Section 3 conducts experiments to verify the strategy. Section 4 concludes this paper.

1. Related Work

Many researchers treat the virtual machine placement problem as a container placement problem [2]. In the container problem, researchers pay more attention to maximizing resource utilization without considering the quality of service pursued by the user. The QoS (Quality of Service) of a cloud service is usually measured by a series of SLA indicators. To guarantee application service-level goals, Das et al. [5]
present an adaptive QoS-aware VM provisioning mechanism to ensure efficient
utilization of the system resources. In[6], Hu et al. proposes a dynamic resource
provisioning strategy for multi-tier cloud services that employs a user preference and
service time mix-aware workload prediction method to be used as a foundation of
resource provisioning. In [7], Mann et al. present a QoS framework for VM migrations
(VMPatrol) which uses a cost of migration model to allocate a minimal bandwidth for a
migration flow. In [8], the researchers propose an improved and effective light weight
mechanism for real time service latency prediction for optimum virtual machine
resource allocation in delay-sensitive services of cloud. Beloglazov et al. [9] put
forward a Markov chain model and a control strategy, under the specific QoS
constraints to maximize every virtual machine migration time. To deal with the
problem of unbalanced traffic load in switching on and off VMs for the purpose of
energy saving, Wang et al. [10] present the strategy of Energy efficiency and Quality
of Service aware VM Placement (EQVMP), combining hop reduction, energy saving
and load balancing techniques. Besides virtualization placement based on QoS, the
researchers also focus on how to reduce energy consumption [11-12].
Research on energy-efficiency strategies is still not adequate [13-15]. Beloglazov et al. [15] put forward four methods to choose the virtual machines that need to be migrated from an overloaded server, together with a virtual machine placement method based on an energy consumption strategy (named Power Aware Best Fit Decreasing, PABFD). In this paper, a virtual machine relocation strategy with energy consumption and performance awareness, named RelocatEP, is put forward.

2. The Design of the Strategy

Figure 1 shows the virtual machine relocating architecture, which consists of three parts: the Decider, the Polynomial with Lasso Energy model based on Resource Utilization (PLERU), and the Performance Monitoring and Tracing model (PMT). The Decider module determines which virtual machines need to be relocated and where they should be relocated. In each period, PLERU collects the usage of the virtual machines and physical machines, including CPU utilization and memory utilization, to model the energy consumption. PMT monitors the performance indicators of each virtual machine. Based on the output data of the PLERU and PMT modules, the Decider module calculates the number of virtual machines that need to be relocated and the destination of each relocated virtual machine.

Figure 1. The architecture of virtual machine relocation with energy-performance awareness
In the Decider module, this paper designs a virtual machine relocation strategy. When choosing the virtual machines to be relocated, we use an autoregressive time-series prediction model to predict the SLA value in the coming period. The prediction considers the future influence on the remaining virtual machines after relocation, in order to avoid relocating a virtual machine whose temporary SLA violation is merely caused by a load mutation. At the same time, the virtual machine that violates its SLA the most is relocated, the resources of the original host are released, and the other virtual machines on the original host can use more resources. The strategy thus lets other virtual machines avoid unnecessary migration, so as to reduce the overall energy consumption and improve the service quality of the virtual machines.

2.1. Select Virtual Machine Using Autoregressive Model Based on Time Series

When choosing virtual machines, in order to avoid SLA violations, this paper adopts a time-series autoregressive forecasting model to predict the performance of virtual machines, i.e. whether an SLA violation will occur in the next period of time. The response time is used as the key performance indicator for virtual machines and physical servers. An n-order autoregressive (AR) model is used to predict the next response time value. The response time Tt at moment t only relates to the values Tt-1, Tt-2, ..., Tt-n before t, so Tt can be expressed as Formula (1):

Tt = φ1 Tt-1 + φ2 Tt-2 + ... + φn Tt-n + at          (1)

Among them, the autoregressive parameter φi indicates the degree of influence of Tt-i on Tt, and at is a normally distributed random variable.
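As a sketch of Formula (1), an AR(n) predictor can be fitted to the response times observed in the sliding window by least squares; the Python code below is only one way to estimate the coefficients φi, and the fallback for short histories is an assumption.

import numpy as np

def ar_predict(history, n=3):
    # Fit T_t = phi_1*T_{t-1} + ... + phi_n*T_{t-n} + a_t by least squares on the
    # observed window `history`, then predict the next response time value.
    h = np.asarray(history, dtype=float)
    if len(h) <= n:
        return float(h[-1])                          # not enough samples: naive fallback (assumption)
    X = np.column_stack([h[n - k - 1:len(h) - k - 1] for k in range(n)])
    y = h[n:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(phi @ h[-1:-n - 1:-1])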
After predicting the response time, the virtual machine with the maximum SLA violation, i.e. the one exceeding the response-time threshold the most, is selected. The Minimum Migration Time policy (MMT) and the Random Choice policy (RC) in [16] are both methods for selecting the virtual machine to relocate. In [17], Beloglazov et al. propose the Minimization of Migrations (MM) policy, which only considers CPU utilization when choosing a virtual machine that needs relocation, without considering the virtual machine's memory utilization and other external factors. For the choice of the virtual machine to relocate, response time should be an important factor. MMT cannot accurately calculate the migration time (migration time = memory usage / host bandwidth), because the migration time relates not only to the number of dirty pages in each memory copy and the number of copy rounds during migration, but also to the network bandwidth.
As the virtual machines with non-transitory SLA violations, or those violating their SLAs the most, are selected for relocation, the resources of the server are released to some extent; thus the overall energy is reduced and the SLAs of the virtual machines are guaranteed.

2.2. The Host Selection Based on Maximum Residual Energy Consumption

In host selection, the principle is to choose a host that does not violate its SLA and has the maximum residual energy provision. In the process of calculating the host energy, the Polynomial with Lasso Energy model based on Resource Utilization (PLERU) is adopted.
Usually, CPU energy consumption and memory energy consumption are the main components of the total server energy consumption, accounting for 58% and 28% of the total respectively [18]. Therefore, our work mainly considers CPU consumption and memory consumption. CPU energy consumption and memory energy consumption are related to CPU utilization and memory usage respectively, but the relationship is not a simple linear one. So an energy consumption model based on a multivariate regression strategy is designed. In PLERU, a multiple linear regression model of energy consumption with respect to CPU utilization and memory utilization is established. The energy consumption is given by Formula (2):
yi = β0 + β1 xi^cpu + β2 xi^mem + εi,   i = 1, ..., n,   with E(εi) = 0 and Var(εi) = σ²          (2)

where yi stands for the measured energy consumption; xi^cpu and xi^mem respectively denote the measured CPU utilization and memory utilization; εi stands for the unobservable random error, i = 1, 2, ..., n; and β0, β1, β2 stand for the regression coefficients.
The absolute values of the model regression coefficients are used as a penalty to compress the coefficients, and coefficients with small absolute values are automatically compressed to 0. In this way, the selection of significant variables and the estimation of the corresponding parameters are realized at the same time. During data training, a balance between overfitting and underfitting must be achieved.

The objective function of the lasso regression method is given as Formula (3):

min [ Σ_{i=1..m} ( yi − (β0 + β1 xi^cpu + β2 xi^mem) )² + γ Σ_j |βj| ]          (3)

In Formula (3), γ is the penalty factor controlling the severity of the punishment. If the value of γ is set too large, the final model parameters will tend to 0, resulting in underfitting; if it is set too small, the result is overfitting. Therefore, the value of γ generally needs to be determined through cross-validation.
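A possible realization of the PLERU fit (Formulas (2)-(3)) with scikit-learn is sketched below in Python; LassoCV's penalty plays the role of γ and is chosen by cross-validation, as suggested above. The feature layout and the use of scikit-learn are implementation choices, not prescribed by the paper.

import numpy as np
from sklearn.linear_model import LassoCV

def fit_energy_model(cpu_util, mem_util, energy):
    # Fit y = b0 + b1*x_cpu + b2*x_mem with an L1 penalty; the penalty factor
    # (alpha in scikit-learn, gamma in Formula (3)) is picked by cross-validation.
    X = np.column_stack([cpu_util, mem_util])
    return LassoCV(cv=5).fit(X, energy)     # model.intercept_ -> b0, model.coef_ -> (b1, b2)

def predict_energy(model, cpu_util, mem_util):
    return model.predict(np.column_stack([cpu_util, mem_util]))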
After the current energy consumption of each host has been obtained by PLERU, the remaining energy provision of each host is computed. The hosts are then sorted in descending order of remaining energy provision. After sorting, the first host that does not violate its SLA and meets the energy requirement of the virtual machine is selected as the destination.
Two concepts are used in our strategy, as follows.
1) The three thresholds: the upper threshold of host CPU utilization, the lower threshold of host CPU utilization, and the threshold of virtual machine response time. Among them, the virtual machine response time threshold is used to determine whether an SLA violation of a virtual machine happens; the other two thresholds are the basis for triggering virtual machine migration.
2) The sliding time window: the time interval over which sample data are collected.
The strategy is mainly divided into two parts: one part chooses the virtual machines that need to be relocated, and the other part chooses the host. Choosing the virtual machines to relocate uses the three-threshold method. On a host whose CPU utilization is beyond the upper threshold, the virtual machines violating their SLAs are considered, and the virtual machine whose predicted response time exceeds the response-time threshold is chosen. If the CPU utilization of a host is below the lower threshold, then all the virtual machines on that host are chosen for migration. When choosing the virtual machine to be relocated from a host whose CPU utilization exceeds the upper threshold, we introduce the concept of a sliding time window. With the time window as the sampling period, the performance value of each virtual machine is detected, and the virtual machines that exceed the SLA threshold are marked as SLA-violating. In order to avoid temporary SLA violations or unnecessary migrations caused by transient load mutations, our strategy uses the autoregressive model to predict the response time of the SLA-violating virtual machines, based on the historical data detected in the sliding window. The virtual machines whose predicted values still violate the SLA are sorted in descending order, and the virtual machine that violates its SLA the most is selected as the one to relocate. The next step is to determine the host to accept the relocated virtual machine. The hosts that do not violate their SLAs are considered, and the energy consumption of each host is obtained from the nonlinear lasso regression energy consumption model. The host with the minimum energy consumption is regarded as the relocation target. The process of the strategy is shown in Figure 2.

Figure 2. The process of the RelocatEP strategy
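The core of the RelocatEP selection logic described above can be condensed into the Python sketch below. The Host/VM attribute names and the use of ar_predict from the earlier sketch are assumptions introduced to make the flow concrete.

def select_vms_to_relocate(hosts, cpu_upper, cpu_lower, rt_threshold):
    # Three-threshold selection: an overloaded host gives up its worst predicted
    # SLA violator; an under-loaded host gives up all of its virtual machines.
    selected = []
    for host in hosts:
        if host.cpu_utilization > cpu_upper:
            violators = []
            for vm in host.vms:
                predicted = ar_predict(vm.response_times)      # sliding-window history of the VM
                if predicted > rt_threshold:                   # still violating after prediction
                    violators.append((predicted - rt_threshold, vm))
            if violators:
                violators.sort(key=lambda x: x[0], reverse=True)
                selected.append(violators[0][1])               # worst predicted violation first
        elif host.cpu_utilization < cpu_lower:
            selected.extend(host.vms)
    return selected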

3. Experiments

We conducted a series of experiments to validate the strategy on a cloud computing platform set up by Guangxi University. The platform integrates a set of cloud service components, including the SWIFT storage system, the KVM virtualization system, self-developed monitoring tools, and so on. Since commercial virtual machine systems are generally leased to users as Web servers, we use the well-known benchmark TPC-W as the test application deployed on the virtual machines. TPC-W is a well-known benchmark for web server and database performance, in which the workload is performed in a controlled internet commerce environment that simulates the activities of a business-oriented transactional Web server. The study in [16] indicated that for 90% of the shopping interactions in TPC-W the response time does not exceed 5000 ms, so the response-time threshold of the virtual machines is set to 5000 ms. RelocatEP is compared with MMT [15] + PABFD and MM [16] + GAPA [17] in the experiments, in which the concerned indicators include the average response time, the energy consumption saved after relocating the virtual machines, and the number of virtual machine migrations.
The experimental environment includes five servers: one database server acts as the platform of the database system and the TPC-W client, and three servers act as virtualization hosts running a number of virtual machines. In the experiments, we configured 50, 75 and 100 virtual machines on the hosts, and these virtual machines and the hosts were monitored. The

response time output from the TPC-W benchmark via PMT is regarded as the response time of the virtual machine. The experimental results are shown in Figures 3 to 5.

Figure 3. The energy consumption saved on the hosts

Figure 4. The response time of the relocated virtual machines after relocation

Figure 5. The number of virtual machine migrations


As shown in Figure 3, whether with 50, 75 or 100 virtual machines, our strategy can reduce the energy consumption. The MMT+PABFD method reduces energy consumption the most in the cases of 75 and 100 virtual machines, but it does so by sacrificing the response time of the virtual machines. As shown in Figure 4, the response times of the MMT+PABFD method in the cases of 75 and 100 virtual machines exceed the 5000 ms that users can accept, which indicates that this method fails to improve the performance of the virtual machines. During the selection of the target host, RelocatEP selects the virtual machine with the largest SLA violation to be relocated and, at the same time, selects the host according to its energy consumption. This releases resources as much as possible, and the virtual machines are relocated to the host with the minimum energy consumption. Also, as seen from Figure 4, after the virtual machines are relocated

by a certain strategy, the response times of the virtual machines are shortened. The RelocatEP strategy guarantees that the response time does not exceed 5000 ms after virtual machine relocation in all three situations, and it effectively ensures the user experience. From Figure 5, it can be seen that RelocatEP migrates the fewest virtual machines. This is because RelocatEP uses the prediction process based on the autoregressive model to predict the performance of the virtual machines over the coming period, which helps to handle performance violations caused by temporary load mutations of virtual machines.
In the experiments, compared with the MM+GAPA and MMT+PABFD methods, our strategy saves more energy than MM+GAPA but less than MMT+PABFD. However, in terms of response time and the number of virtual machine migrations, RelocatEP is better than the others by at least 30% and 10%, respectively.
In summary, compared to MM+GAPA and MMT+PABFD, RelocatEP is good at reducing the energy consumption and avoiding unnecessary virtual machine migration while guaranteeing the SLA.

4. Conclusions

In this paper, we propose the RelocatEP strategy. It uses the three-threshold method for choosing the virtual machines to be relocated. It then adopts an automatic regression model based on time series to predict SLA violations in the next time period, in order to detect whether a virtual machine's SLA violation is merely a temporary load mutation, thus avoiding unnecessary migration. During the process of choosing a host, the performance and the energy consumption of the host are balanced. The experimental results show that RelocatEP can guarantee the user's SLA, provides an effective way to reduce energy consumption, and can also avoid unnecessary virtual machine migration. In future work we will research the optimization of cost under different virtual machine migration strategies.

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 61063012, 61363003) and the National Key Technology R&D Program of China (No. 2015BAH55F02).

References

[1] J. S. Yan, S. Ali, S. Kun, et al. State-of-the-art research study for green cloud computing. The Journal of
Supercomputing, 65(2013): 445-468.
[2] J. Dong, X. Jin, H. Wang, et al. Energy-saving virtual machine placement in cloud data centers//
Proceedings of 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
(CCGrid), IEEE, 2013: 618-624.
[3] Y. Wang, X. Wang, M. Chen, et al. Partic: Power-aware response time control for virtualized web servers.
IEEE Transactions on parallel and distributed systems, 22(2011): 323-336.
[4] J. W. Jang, M. Jeon, H. S. Kim, et al. Energy Reduction in Consolidated Servers through Memory-Aware
Virtual Machine Scheduling[J]. IEEE Transactionson Computers, 60(2011):552-564.

[5] A. K. Das, T. Adhikary, M. A. Razzaque, et al. An intelligent approach for virtual machine and QoS
provisioning in cloud computing// Proceedings of The International Conference on Information
Networking 2013 (ICOIN). IEEE, 2013: 462-467.
[6] D. Hu, N. Chen, S. Dong, et al. A user preference and service time mix-aware resource provisioning
strategy for multi-tier cloud services. AASRI Procedia, 2013, 5: 235-242.
[7] V.Mann, A. Vishnoi, A. Iyer, et al. Vmpatrol: Dynamic and automated qos for virtual machine
migrations//Proceedings of the 8th International Conference on Network and Service Management.
International Federation for Information Processing, 2012: 174-178.
[8] R. K.Sharma, P. Kamal, S. P. Singh. A latency reduction mechanism for virtual machine resource
allocation in delay sensitive cloud service//Green Computing and Internet of Things (ICGCIoT), 2015
International Conference on. IEEE, 2015: 371-375.
[9] A.Beloglazov, R. Buyya. Managing overloaded hosts for dynamic consolidation of virtual machines in
cloud data centers under quality of service constraints. IEEE Transactions on Parallel and Distributed
Systems, 24(2013): 1366-1379.
[10] S. H. Wang, P. P. W. Huang, C. H. P. Wen, et al. EQVMP: Energy-efficient and QoS-aware virtual
machine placement for software defined datacenter networks// Proceedings of The International
Conference on Information Networking 2014 (ICOIN2014). IEEE, 2014: 220-225.
[11] Y. Kessaci, N. Melab, E. G. Talbi. A Pareto-based metaheuristic for scheduling HPC applications on a
geographically distributed cloud federation. Cluster Computing,16(2013):451-468.
[12] T. Mastelic, A. Oleksiak, H. Claussen, et al. Cloud computing: Survey on energy efficiency. ACM
Computing Surveys, 47(2015): 33.
[13] A. Kansal, J. Liu, A. Singh, et al. Semantic-less coordination of power management and application
performance. ACM SIGOPS Operating Systems Review,44(2010): 66-70.
[14] G. Jung, M. A. Hiltunen, K. R. Joshi, et al. Mistral: Dynamically managing power, performance, and
adaptation cost in cloud infrastructures// Proceedings of IEEE 30th International Conference on
Distributed Computing Systems (ICDCS), IEEE, 2010: 62-73.
[15] A. Beloglazov, R. Buyya. Optimal online deterministic algorithms and adaptive heuristics for energy
and performance efficient dynamic consolidation of virtual machines in cloud data centers.
Concurrency and Computation: Practice and Experience, 24(2012): 1397-1420.
[16] A. Beloglazov, J. Abawajy, R. Buyya. Energy-aware resource allocation heuristics for efficient
management of data centers for Cloud computing. Future Generation Computer Systems,28(2012):755-
768.
[17] L. L. Xiao, H. Z. Xi. A virtualized cloud computing data center energy aware resource allocation
mechanism .Computer application, 33(2013): 3586-3590.(in Chinese)
[18] N. Quang-Hung, P. D. Nien, N. H. Nam, et al. A genetic algorithm for power-aware virtual machine
allocation in private cloud//Proceedings of Information and Communication Technology-EurAsia
Conference. Springer Berlin Heidelberg, 2013: 183-191.
Fuzzy Systems and Data Mining II 579
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-579

Network Evolution via Preference and


Coordination Game
En-Ming DONG 1 , Jian-Ping LI and Zheng XIE
National University of Defense Technology, Changsha, Hunan, 410073, China

Abstract. As an efficient technique for researching complex systems, networks can reveal the interactions in many systems. The interactions in many real-life networks are complex, for example the cooperation and competition in social networks. The different strategies in game theory can be used to describe these complex interactions. A network evolution model is proposed with consideration of both preference and the coordination game. Theoretical analysis and experimental results show that the model can describe many statistical properties of real-life complex networks well, such as scale-free degree distributions and high clustering coefficients.

Keywords. network evolution, coordination game, power law degree distribution

Introduction

Many real physical, biological and social complex systems, such as transportation, the nervous system and social relations, can be abstracted into topological network models with nodes representing individuals and edges representing interactions.
Since the 1990s, many models have been proposed to characterize the properties of complex networks, such as the small-world and scale-free properties, among which the small-world model proposed by Watts and Strogatz [1] and the growth preferential attachment model proposed by Barabási and Albert (BA model) [2] are recognized as the pioneering work. (In fact, as early as 1965, Price proposed the preferential attachment mechanism on degree during his study of citation relations [3].) Inspired by the BA model, many network evolution models have been proposed that consider local preferential attachment [4], age preferential attachment [5], and fitness [6].
Beyond the topological structure of networks, some recent studies take the locations of nodes into consideration. Dynamic random geometric graphs are the basic framework of these models, and the networks generated are more similar to real complex networks. Krioukov et al. found that hyperbolic geometry is the hidden geometry of networks with a power-law degree distribution, and built a network navigation algorithm [7]. Xie et al. constructed geometric graph models for citations, cooperations and the Internet, using the geometric distance to describe the correlations between nodes [8].
Note that most studies rarely consider the various interactions between the nodes in a network, which are hard to describe by simple links. For example, in protein networks, different
1 Corresponding Author: En-Ming Dong, National University of Defense Technology, Changsha, Hunan,

410073, China; E-mail: dream0617@163.com.



types of proteins have two diametrically opposite effects: mutual promotion or suppression; and in a social network, the edges between individuals can represent both cooperation and competition. In this paper, under the framework of game theory, strategies are used to mimic the relationships between nodes in networks. A network evolution model is proposed, in which a new node first selects its neighbors by preferential attachment. Then, in order to obtain its expected payoff, it changes some neighbors according to the coordination game. Two neighbor-changing methods are considered in the model: one only breaks links (NEB), while the other rebuilds links (NER). Theoretical analysis and experimental results show that networks generated by both the NEB and NER models follow power-law degree distributions for large degrees. By controlling the expected payoff rate, the proportion of small-degree nodes and the clustering coefficient can be made more similar to those of real-world networks.

1. The Model

1.1. Basic Descriptions of Network Game Theory

Cooperation and competition are common phenomena in real life. Game theory is a subject that studies them mathematically; it is an important branch of operational research with applications to economics, the military and psychology. The game model can take many forms depending on the real-life example, but is essentially composed of three basic elements: the players, the strategy set and the payoff matrix. Players are the game participants, who can choose strategies. At least two players are needed in one game. Each player i can take a strategy si. When all the players' strategies are chosen, a game situation x is formed. The payoff of each player is a function of the situation x. A basic game model is defined below.

Definition 1 A basic game model refers to a triple Γ = (N, S, P), where N is a non-empty finite set of players, S is a non-empty strategy set, and P is the payoff matrix of two or more players under different strategies. [9]

                     player n2: C     player n2: D
    player n1: C     a, a             0, b
    player n1: D     b, 0             c, c

In real life, people are usually influenced by their neighbors: they consult the surrounding
neighbors when buying a product, and they consider compatibility with others when buying
software. Such conformity can be described by a majority game, in which people are driven to
take actions or strategies that are consistent with most of their neighbors. In fact, people can
take two kinds of action to achieve consistency with their surrounding neighbors: one is to
change their own actions or strategies; the other is to change their neighbors, by choosing
neighbors with the same interests. Online social software can help people choose friends with
similar views, behaviors and hobbies. The coordination game efficiently captures the payoff of
such consistency: it is a special kind of game in which only players with the same strategy
obtain income, i.e., the income comes from the coordination rather than the conflict of
strategies. Taking the two-person game as an example, the payoff matrix is defined above.
A network evolution mechanism is proposed based on the coordination game; we first
define the node label strategy.

Definition 2 Assign a k-dimensional label vector L = (l1, l2, . . . , lk) to the network G = (V, E).
For any component lj (j = 1, 2, . . . , k) and any node i ∈ V, there exists a strategy
sij ∈ {0, 1}. Si = (si1, si2, . . . , sik) is called the label strategy of node i in network G
corresponding to the label vector L.

In particular, the label vector and the label strategies are an enrichment of the network.
In social networks, for example, if one component of L is "basketball", the corresponding
strategy of node i will be 1 if node i likes basketball and 0 otherwise. According to the BA
model, when a new node comes into a network it prefers to link to existing famous nodes,
namely nodes with large degree. However, the well-known nodes may not share the same
interests and hobbies as the new node; for example, it would be odd to follow a baseball star
when you are absorbed in football. We therefore use the coordination game to make a
selection among the neighbors.

1.2. Network Evolution Based on Payoff Rate

Consider a simple coordination game whose payoff matrix is defined below: if two nodes
have the same strategy they both gain 1, and otherwise both gain 0. New nodes can therefore
choose neighbors based on the label strategies in order to make their payoff rate higher than a
given threshold α.
                     player n2: C     player n2: D
    player n1: C     1, 1             0, 0
    player n1: D     0, 0             1, 1

For a given network G = (V, E) with label vector L = (l1, l2, . . . , lk), the label strategy of
an existing node j ∈ V is Sj. The new node i has label strategy Si, and its neighbors selected by
preferential attachment are N = (n1, n2, . . . , nm), with corresponding label strategies
Sn1, Sn2, . . . , Snm. From a neighbor nc, node i obtains the payoff

    u_ic = ∑_d u(S_nc(d), S_i(d)),

in which u(S_nc(d), S_i(d)) is the payoff of the d-th label component according to the payoff
matrix of the coordination game. The payoff rate of the new node is then

    R_ui = (∑_c u_ic) / (m k).

When R_ui is smaller than α, the new node breaks the link with the neighbor providing the
least payoff, and repeats this until R_ui > α. The network evolution algorithm NEB is as follows.

Algorithm 1
Step 1 Initialization: randomly generate an initial network G = (V, E) with m0 nodes, a label
vector L = (l1, l2, . . . , lk), and the corresponding label strategies Sj, j = 1, 2, . . . , m0;
Step 2 Termination condition: if the total number of nodes is larger than N, stop; otherwise go to Step 3;
Step 3 Preferential attachment: a new node i comes into the network with label strategy Si;
it first selects m (m < m0) nodes by preferential attachment, namely the probability of choosing
an existing node j is

    P_j = d_j / ∑_x d_x ;

Step 4 Coordination game: for all neighbors N = (n1, n2, . . . , nm) with label strategies
Sn1, Sn2, . . . , Snm, calculate the payoffs u_ic and the overall payoff rate R_ui; for a given
threshold α, if R_ui > α, go to Step 3; otherwise break the link with the node providing the
least payoff and repeat Step 4.
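As a concrete illustration, the following Python sketch mirrors one iteration of Algorithm 1
(NEB). It is a minimal sketch under our own naming, not the authors' implementation, and it
assumes the unit payoff per matching label component defined by the coordination-game
payoff matrix above.

import random

def payoff_rate(S_i, neighbor_strategies):
    # Payoff rate R_ui of the new node i: each matching label component earns payoff 1.
    k = len(S_i)
    payoffs = [sum(1 for d in range(k) if S[d] == S_i[d]) for S in neighbor_strategies]
    return payoffs, sum(payoffs) / (len(neighbor_strategies) * k)

def neb_step(degrees, strategies, S_i, m, alpha):
    # One NEB iteration: preferential attachment (Step 3) followed by link breaking (Step 4).
    nodes = list(degrees.keys())
    weights = [degrees[j] for j in nodes]
    neighbors = set()
    while len(neighbors) < m:
        neighbors.add(random.choices(nodes, weights=weights)[0])   # P_j = d_j / sum_x d_x
    neighbors = list(neighbors)
    while True:
        payoffs, rate = payoff_rate(S_i, [strategies[j] for j in neighbors])
        if rate > alpha or len(neighbors) == 1:
            return neighbors
        neighbors.pop(payoffs.index(min(payoffs)))                 # break the least-paying link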

To test the performance of the algorithm, we consider the simple case k = 2. The strategies
of the nodes with respect to the network labels are generated randomly. The degree distributions
are shown in Figure 1, where the different curves correspond to different values of α, with
m0 = 30, m = 20 and N = 10000. When d > 20, the degree distributions of all the generated
networks follow power laws. When d ≤ 20, the fraction of nodes with small degree increases as
α increases.

Figure 1. Degree distributions for different α (log-log plot of the probability P(k) versus degree k; curves for α = 0, 0.3, 0.5, 0.8, 1)

It can also be proved that the degree distributions of the networks generated by Algorithm 1
follow a power law.

Theorem 1 When a new node i comes into the network, the probability that an existing node q
is selected as a neighbor of i is independent of α.

Proof: First, node q can be selected by the preferential attachment process with probability
P_q^P = d_q / ∑_j d_j. Suppose that in the coordination game process node q is kept with
probability P_q^G. The total probability that node q is selected is then P_q = P_q^P P_q^G.
Note that ∑_j d_j is twice the number of links in the network, so if only the preferential
attachment process is considered, ∑_j d_j ≈ 2mt after t nodes have been added. When the
coordination game process is taken into account, α controls how many links are broken and
therefore affects ∑_j d_j: since, on average, a neighbor q is not dropped with probability
P_q^G, we have ∑_j d_j ≈ 2m P_q^G t. As a result, the probability that node q is selected is

    P_q = P_q^P P_q^G = [d_q / (2m P_q^G t)] P_q^G = d_q / (2mt),

which is the same as in the BA model and is independent of α.


Therefore, the networks generated by algorithm 1 have nearly the same degree dis-
tributions with BA networks for large k, which is consistent with the results shown in
Figure 1.

1.3. Network Evolution Based on Payoff

In algorithm 1, if the payoff of building a link is 0, the total payoff of the new node is
unchanged. However, people are more concerned with the total amount of payoff instead
of the payoff rate. So after breaking a link, the new node will select one existing node
with nearly the same strategies to remain the total number of neighbors m unchanged.
For the payoff rate α , the new node will get payoff α m.
In most cases, nodes do not know strategy information of the other nodes in the
network. They can only get information from their neighbors, so the new neighbors will
be selected in the neighborhood of the existing neighbors. The NER algorithm is nearly
the same as algorithm 1. The only difference is in step 3, after breaking a link, the new
node i will chose one node in the neighborhood of its neighbors according to the payoff.

Figure 2. Degree distributions for different α (log-log plot of the probability P(k) versus degree k; curves for α = 0, 0.5, 1)

The degree distribution of the networks generated by the NER algorithm also follows a power
law; the theoretical proof is similar to that for the model proposed by Holme [10]. Experimental
results are shown in Figure 2, in which the fraction of nodes with small degree again increases
as α grows. The clustering coefficient of the generated networks is shown in Figure 3: it
remains stable as N increases, and it can be adjusted by α, growing as α increases.

Figure 3. Clustering coefficient versus the number of nodes N for different α (curves for α = 0, 0.2, 0.4, 0.6, 0.8, 1)

Conclusion

Based on preferential attachment and the coordination game, a network evolution model is
proposed with two types of link formation mechanisms: NEB and NER. Networks generated by
both mechanisms follow power-law degree distributions when the degree is large. The fraction
of small-degree nodes (in both NEB and NER) and the clustering coefficient (in NER) increase
as the expected payoff rate grows, which makes the generated networks more similar to
real-world networks. It is therefore reasonable to believe that game theory is one inherent
mechanism behind the abundance of small-degree nodes and the emergence of clustering in
network evolution.

References

[1] D. J. Watts, S. H. Strogatz, Collective dynamics of small-world networks, Nature 393 (1998), 440-442.
[2] A. L. Barabási, R. Albert, and H. Jeong, Emergence of scaling in random networks, Science 286 (1999),
509-512.
[3] D. J. Price, Networks of scientific papers, Science 149 (1965), 510-515.
[4] X. Li, G. Chen, A local-world evolving network model, Physica A 328 (2003), 274-286.
[5] S. N. Dorogovtsev, P. L. Krapivsky, J. F. F. Mendes, Transition from small to large world in growing
networks, Europhys. Lett. 81 (2007), 226-234.
[6] R. Albert, A. L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002), 47.
[7] M. Boguna, D. Krioukov, K. Claffy, Navigability of complex networks, Nature Physics 5 (2009), 74-80.
[8] Z. Xie, Z. Ouyang, J. Li, A geometric graph model for coauthorship networks, Journal of Informetrics
10 (2016), 299-311.
[9] D. Fudenberg, Game Theory, The MIT Press 60 (1991), 841-846.
[10] P. Holme, B. J. Kim, Growing scale-free networks with tunable clustering, Physical Review E 65 (2002),
95-129.
Fuzzy Systems and Data Mining II 585
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-585

Sensor Management Strategy with


Probabilistic Sensing Model for
Collaborative Target Tracking in Wireless
Sensor Network
Yong-Jian YANG1, Xiao-Guang FAN, Sheng-Da WANG, Zhen-Fu ZHUO, Jian MA
and Biao WANG
Aeronautics and Astronautics Engineering College, Air Force Engineering University,
Xi’an, Shaanxi, China

Abstract. This paper addresses collaborative target tracking in wireless sensor
networks, where a probabilistic sensing model is adopted because the 0/1 disk sensing
model is coarse and unrealistic. However, the probabilistic sensing model leads to a
decrease in the number of useful sensor nodes, a degradation of the detection
probability, and possible divergence of the tracking results. To address these problems,
mobile sensor nodes are introduced and a novel sensor management strategy with
mobile sensor nodes is proposed based on distributed Kalman filtering fusion with
feedback. Simulation results verify that the number of useful sensors, the collaborative
detection probability and the tracking performance are all clearly improved.

Keywords. collaborative target tracking, wireless sensor network, probabilistic


sensing model, detection probability

Introduction

A wireless sensor network (WSN), which usually consists of a large number of static
sensor nodes, has been widely applied in military and civilian fields [1-2]. Target
tracking is considered one of the important applications of WSNs, for example the
monitoring of wild animals, intruder detection and surveillance in military areas.
How to improve the energy efficiency and the tracking performance in WSNs has
attracted more and more attention [3]. The sensor node selection problem plays a
significant role in improving both the energy efficiency and the tracking performance in
WSNs [4-5]. The prediction-based scheme [6] is one such method for selecting sensor nodes
for the next sampling period. Based on the prediction-based scheme, various methods have
been proposed to improve the energy efficiency, increase the tracking performance, and
shorten the latency between the sensors and the sink to improve the communication
performance [7-9]. These methods can be roughly divided into two categories. One is
clustering management methods [2, 10-15], which create multiple clusters of sensor nodes;
each cluster consists of a cluster head node and several cluster member nodes. Cluster

1
Corresponding Author: Yong-Jian YANG, Aeronautics and Astronautics Engineering College, Air
Force Engineering University, Xi’an, Shaanxi, China; Email: yangyongjian_king@126.com.

formation, maintenance, reconfiguration and cluster-head election are the main problems
related to those methods. For example, [10] proposes a new cluster management strategy
using proactive and reactive cluster management. The other category is Information Driven
Sensor Querying (IDSQ) methods [16-18]. The basic idea of those methods is to exploit the
content of the data captured by the sensors to optimize the future readings. For example, [17]
proposes a distributed Kalman filter algorithm combined with information-driven extension
methods.
However, all the previous works are based on the 0/1 disk sensing model [19], which is
a coarse and unrealistic model. The probabilistic sensing model is more accurate and has
been widely adopted to analyze the quality of coverage in WSNs [20]. To the best of our
knowledge, few works discuss the influence of the probabilistic sensing model on the
tracking results in WSNs.
A common goal of a target tracking system is the continuous detection of mobile
targets with a minimum number of active sensor nodes. On the other hand, the number
of active sensor nodes affects the success probability of target tracking. If the probabilistic
sensing model is adopted, it is possible that the active sensor nodes cannot detect the
target, which leads to intermittent observations and degraded tracking performance
(or even divergence of the tracking results). Therefore, improving the target detection
probability in a WSN with a probabilistic sensing model is a very practical problem. This
paper attempts to use mobile sensor nodes to improve the collaborative detection of a
WSN with a probabilistic sensing model. By using mobile sensor nodes, the number
of useful active sensor nodes is clearly increased and the tracking performance obtained with
distributed Kalman filtering fusion with feedback [21] is also significantly improved.
The rest of the paper is organized as follows. The problem is formulated in Section 1.
The sensor management strategy is presented in Section 2. Section 3 presents the simulation
results, and conclusions are given in Section 4.

1. Problem Formulation

Suppose a WSN, in which the sensor locations Si = (x_i^S, y_i^S), i = 1, 2, …, Ns, are assumed
known, is used to track the state of a moving target whose motion equation and observation
equation can be expressed as follows:

    x(k+1) = Φ(k+1|k) x(k) + w(k)                                                    (1)
    z_i(k) = H_i(k) x(k) + v_i(k)                                                    (2)

where x(k) ∈ R^n is the target state vector and z_i(k) ∈ R^m is the measurement vector obtained
from the i-th sensor (provided the target is detected by the i-th sensor). w(k) ~ N(0, Q(k)) ∈ R^n
and v_i(k) ~ N(0, R_i(k)) ∈ R^m are the process and measurement noise, respectively.
Φ(k+1|k) and H_i(k) are constant matrices with suitable dimensions.
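As a concrete (hypothetical) example of the state-space model in Eqs. (1)-(2), the sketch below
builds a constant-acceleration Φ for one coordinate and a position-only H_i; the sampling
interval and the noise levels are illustrative values chosen by us, not taken from the paper.

import numpy as np

Ts = 1.0                                    # sampling interval (illustrative)
# Constant-acceleration dynamics for one axis: state = [position, velocity, acceleration]
Phi = np.array([[1.0, Ts, 0.5 * Ts**2],
                [0.0, 1.0, Ts],
                [0.0, 0.0, 1.0]])
H_i = np.array([[1.0, 0.0, 0.0]])           # sensor i observes position only
Q = 0.1 * np.eye(3)                         # process noise covariance (illustrative)
R_i = np.array([[50.0**2]])                 # measurement noise covariance (illustrative)

x = np.array([200.0, 30.0, 0.0])            # initial state
x_next = Phi @ x + np.random.multivariate_normal(np.zeros(3), Q)        # Eq. (1)
z_i = H_i @ x_next + np.random.multivariate_normal(np.zeros(1), R_i)    # Eq. (2)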
According to the distributed Kalman filtering fusion with feedback [21], the estimated state
and its covariance at the fusion center are as follows:

    x̂(k|k) = x̂(k|k−1) + P(k|k) ∑_{i=1}^{N} H_i^T(k) (R_i(k))^{-1} z̃_i(k)              (3)

    (P(k|k))^{-1} = ∑_{i=1}^{N} (P_i(k|k))^{-1} − (N−1) (P(k|k−1))^{-1}                  (4)

where z̃_i(k) = z_i(k) − H_i(k) x̂_i(k|k−1) is the local innovation, x̂_i(k|k) and P_i(k|k)
are the local estimated state and covariance obtained by Kalman filtering, and N is the number
of sensors that participated in the tracking and detected the target at time k.
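A minimal sketch of the fusion step in Eqs. (3)-(4) is given below, assuming each local filter
has already produced its innovation z̃_i(k) and its updated covariance P_i(k|k); the names and
the naive use of matrix inverses are ours, chosen for clarity rather than efficiency.

import numpy as np

def fuse_with_feedback(x_pred, P_pred, H_list, R_list, innovations, P_local_list):
    # Distributed Kalman filtering fusion with feedback, Eqs. (3)-(4).
    # x_pred, P_pred : predicted state and covariance at the fusion center
    # H_list, R_list : observation matrices and noise covariances of the detecting sensors
    # innovations    : local innovations z~_i(k) = z_i(k) - H_i(k) x^_i(k|k-1)
    # P_local_list   : local updated covariances P_i(k|k)
    N = len(H_list)
    # Eq. (4): fused covariance in information form
    P_inv = sum(np.linalg.inv(Pi) for Pi in P_local_list) - (N - 1) * np.linalg.inv(P_pred)
    P_fused = np.linalg.inv(P_inv)
    # Eq. (3): fused state correction accumulated from the local innovations
    correction = sum(H.T @ np.linalg.inv(R) @ z for H, R, z in zip(H_list, R_list, innovations))
    return x_pred + P_fused @ correction, P_fused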
Due to the properties of the sensors and the environment, only a subset of the sensor nodes
can detect the target at time k. Assuming the sensing range of the i-th sensor node is r_i and
the uncertain sensing range is δ·r_i (δ < 1), a reliable detection probability of the target by
the i-th sensor node can be modeled as follows [20]:

    p_Ri = 0                                        if d(S_i, target)/r_i − 1 ≥ δ
    p_Ri = λ2 exp(−λ1 α1^β1 / α2^β2)                if −δ ≤ d(S_i, target)/r_i − 1 < δ       (5)
    p_Ri = λ2                                       if d(S_i, target)/r_i − 1 < −δ

where d(S_i, target) = sqrt((x_t − x_i^S)^2 + (y_t − y_i^S)^2) is the Euclidean distance between
the target and the i-th sensor node, λ1, λ2, β1 and β2 are constants, and α1 and α2 are functions
of r_i and d(S_i, target):

    α1 = d(S_i, target) − r_i(1 − δ),    α2 = r_i(1 + δ) − d(S_i, target).                   (6)
Eq. (5) indicates that even when the target is within the sensing range of a sensor node, the
detection probability is less than 1. Although the collaborative detection probability of several
sensor nodes is larger than that of a single sensor node, the observations of any single sensor
node are intermittent, which leads to a reduction in the number of nodes participating in the
tracking. Usually, according to the properties of the sensors, the smaller d(S_i, target) is, the
higher the detection probability that can be achieved. Therefore, it is reasonable to use mobile
sensor nodes to improve the detection of the target.
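The sketch below evaluates the piecewise detection probability of Eq. (5); it assumes our
reconstruction of α1 and α2 given in Eq. (6), and the default parameter values are simply the
ones listed later in Section 3 (distances expressed in km).

import math

def detection_probability(d, r_i, delta=0.6, lam1=1.0, lam2=0.6, beta1=3.0, beta2=2.0):
    # Piecewise detection probability of Eq. (5) for sensor-target distance d.
    ratio = d / r_i - 1.0
    if ratio >= delta:                 # target far outside the sensing range
        return 0.0
    if ratio < -delta:                 # target well inside the reliable range
        return lam2
    alpha1 = d - r_i * (1.0 - delta)   # assumed reading of Eq. (6)
    alpha2 = r_i * (1.0 + delta) - d
    return lam2 * math.exp(-lam1 * alpha1**beta1 / alpha2**beta2)

p = detection_probability(d=1.0, r_i=1.2)   # roughly 0.51 with these parameters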

2. Sensor Management Strategy with Mobile Sensor Nodes

To the best of our knowledge, most work on collaborative target tracking in WSNs focuses on
how to use sensors in an energy-efficient way. These methods assume that the target will be
detected whenever it is within the sensing range of a sensor node and that the sensor nodes
cannot move; furthermore, they require redundant sensor nodes. However, it is possible that
no sensor is available to track the target if the distance between the target and the sensors is
large. Therefore, we assume that the WSN includes both static and mobile sensors: a static
sensor node cannot move, while a mobile node can move in any direction with limited velocity.
The procedure of the sensor management strategy with mobile sensor nodes in a WSN is
described as follows.
Step 1 Potential sensor node selection. Select the sensor nodes whose sensing range
includes the predicted location of the target as the potential sensor nodes at time k. Calculate
Np(k), the number of potential sensor nodes, and set Nu(k) = Np(k), where Nu(k) is the number
of useful sensor nodes. A useful sensor node is a sensor node that detected the target and
received the echoes from the target.
Step 2 Mobile sensor node location update. When the potential sensor nodes include
mobile sensor nodes, update the positions of these mobile sensor nodes as follows:

    θ = atan( (x̂_1(k|k−1) − x_i^Sm(k−1)) / (x̂_2(k|k−1) − y_i^Sm(k−1)) )                 (7)

    x_i^Sm(k) = x_i^Sm(k−1) + v · sin(θ)
    y_i^Sm(k) = y_i^Sm(k−1) + v · cos(θ)                                                  (8)

where x̂_1(k|k−1) and x̂_2(k|k−1) are the predicted target positions at time k along the x and y
directions, respectively, x_i^Sm(k−1) and y_i^Sm(k−1) are the position of the mobile sensor
node at time k−1 along the x and y directions, respectively, and v is the velocity of the mobile
sensor.
Step 3 Local estimated state and error covariance update. Under the detection probability
modeled by (5), update the estimated state and the corresponding error covariance matrix by
Kalman filtering and return these data to the fusion center. When an observation is missed, set
Nu(k) = Nu(k) − 1 and return no data to the fusion center.
Step 4 Target trajectory update. Update the estimated state and the corresponding
covariance of the target by distributed Kalman filtering fusion with feedback, and then
broadcast the fused estimated state and covariance to the potential sensor nodes.
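The steering rule of Step 2, Eqs. (7)-(8), can be sketched as below; atan2 is used instead of the
plain arctangent so that all quadrants are handled, and the names are ours.

import math

def move_mobile_node(x_pred, y_pred, x_sm, y_sm, v):
    # Move a mobile sensor node toward the predicted target position, Eqs. (7)-(8).
    # x_pred, y_pred : predicted target position x^_1(k|k-1), x^_2(k|k-1)
    # x_sm, y_sm     : mobile node position at time k-1
    # v              : distance travelled per sampling period
    theta = math.atan2(x_pred - x_sm, y_pred - y_sm)   # Eq. (7), angle measured from the y axis
    return x_sm + v * math.sin(theta), y_sm + v * math.cos(theta)   # Eq. (8)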
Obviously, the proposed sensor management strategy does not consider energy efficiency,
but it is very useful for improving the tracking performance. In some applications, such as
battlefield surveillance, the sensor nodes are not redundant and are generally rechargeable;
thus, energy efficiency is not a priority.

3. Simulation Results

Assume the sensing field of the WSN is 4 km × 4 km. The parameters of the probabilistic
sensing model in (5) are r_i = 1.2 km (i = 1, 2, …, Ns), δ = 0.6, λ1 = 1, λ2 = 0.6, β1 = 3 and β2 = 2.
The total number of sensor nodes is Ns = 38, which includes N_Ss = 19 static sensor nodes and
N_Sm = 19 mobile sensor nodes. Rs = 50 m and Rm = 100 m represent the measurement noise
covariance of the static and mobile sensor nodes, respectively. The speed of the mobile sensor
nodes is v = 50 m/s.
The target starts moving at a constant velocity of [30 m/s, 30 m/s] from the position
[200 m, 400 m]. During 50 s~70 s the target performs a uniformly accelerated motion with
acceleration [-5 m/s^2, 4 m/s^2]; during 70 s~100 s its acceleration is [10 m/s^2, -10 m/s^2].
The total movement time is 100 s. The CA (constant acceleration) model is adopted to track the
target in this paper.
The distribution of the sensor nodes and the trajectory of the target are shown in Figure 1,
together with the estimated trajectory of the target and the trajectories of the mobile sensor
nodes obtained with the strategy proposed in this paper.
Figure 2 shows the indexes of the sensor nodes that participated in the tracking and
detected the target. From Figure 1 and Figure 2 we can see that the static sensor nodes
2, 4, 7, 8, 12 and 16 are selected to track the target, because the predicted target position is
located within their sensing ranges, but the target is not detected by these sensor nodes. This
is because the distance between the target and these sensor nodes is large, which leads to a
low detection probability for them. In addition, because of the target maneuvers and the
detection probability of each sensor node, the observations acquired by each sensor node are
intermittent.
Figure 1. The distribution of nodes (mobile, static and selected tracking nodes), the true and estimated target trajectories, and the trajectories of the mobile nodes (X (m) versus Y (m))
Figure 2. The indexes of the sensor nodes that detected the target versus time (s)

In order to analyze the performance of the sensor management strategy with mobile
sensor nodes, two simulation scenarios are considered: one is collaborative tracking using
19 static sensor nodes and 19 mobile sensor nodes, and the other is collaborative tracking
using 38 static sensor nodes.
Figure 3. RMSEs of position, velocity and acceleration versus time, using mobile sensor nodes and using static sensor nodes
Figure 4. The average numbers of useful tracking sensor nodes (using mobile and using static sensor nodes) and of potential tracking sensor nodes versus time
Figure 3 shows the RMSEs of position, velocity and acceleration using mobile sensor
nodes and using static sensor nodes; the simulation was independently repeated 100 times.
Obviously, the estimated position and velocity of collaborative tracking using static sensor
nodes diverge after the target maneuver occurs, whereas collaborative tracking using mobile
sensor nodes clearly improves the precision of the estimated target state.
Figure 4 shows the average number of useful sensor nodes over the 100 simulation runs.
Obviously, the number of useful sensor nodes is smaller than the number of potential sensor
nodes because the detection probability of each sensor node is less than 1. The number of
useful sensor nodes when mobile sensor nodes are used is greater than when only static sensor
nodes are used, which indicates the higher collaborative detection probability obtained with
mobile sensor nodes.

From these simulation results, the following conclusions can be drawn. (1) The number of
useful sensor nodes is always smaller than the number of potential sensor nodes because the
detection probability of each sensor node is less than 1. (2) The collaborative tracking results
using static sensor nodes are worse than those using mobile sensor nodes; in particular, when a
target maneuver occurs, the tracking results using static sensor nodes even diverge. (3) When
some mobile sensor nodes are added to the WSN, the sensor management strategy with mobile
sensor nodes proposed in this paper can improve the collaborative detection probability and the
precision of the estimated state.

4. Conclusions
Focusing on collaborative target tracking in wireless sensor networks with a probabilistic
sensing model, this paper has proposed the use of mobile sensor nodes to counteract the
shortcomings resulting from the probabilistic sensing model. Specifically, by leveraging mobile
sensor nodes, a sensor management strategy is proposed to increase the number of useful
sensor nodes, the collaborative detection probability and the precision of the estimated state.
Extensive simulations have been conducted, and the results verify that the proposed sensor
management strategy with mobile sensor nodes, based on distributed Kalman filtering fusion
with feedback, achieves high performance in terms of the number of useful sensors, the
collaborative detection probability and the tracking results.

References
[1] O. Eemigha, W. Hidouci, and T. Ahmed, On energy efficiency in collaborative target tracking in wireless
sensor network: a review, IEEE Communications Surveys & Tutorials, 15(2013), 1210-1222.
[2] Z. X. Cai, S. Wen, and L. J. Liu, Dynamic cluster member selection method for multi-target tracking in
wireless sensor network, Journal of Central South University, 21(2014), 636-645.
[3] J. M. Chen, J. K. Li, and T. H. Lai, Energy-efficient intrusion detection with a barrier of probabilistic
sensors: global and local, IEEE Transactions on Wireless Communications, 12(2013): 4742-4755.
[4] V. Isler, and R. Bajcsy, The sensor selection problem for bounded uncertainty sensing models, IEEE
Transaction on Automation Science and Engineering., 3(2006), 372-381.
[5] U. D. Ramdaras, F. G. J. Absil, and R. V. Genderen, Sensor selection for optimal target tracking in
sensor networks, International Journal of Intelligent Defence Support Systems, 4(2011), 187-207.
[6] J. H. Yoo, and H. J. Kim, Predictive target detection and sleep scheduling for wireless sensor networks,
IEEE International Conference on Systems, Man and Cybernetics, 2013, 362-367.
[7] G. Wang, Y. Wu, K. Dou, Y. Ren, and J. Li, AppTCP: The design and evaluation of application-based
TCP for e-VLBI in fast long distance networks, Future Generation Computer Systems, 39(2014), 67–
74.
[8] G. Wang, Y. Ren, K. Dou, and J. Li, IDTCP: An effective approach to mitigating the TCP Incast problem
in data center networks, Information Systems Frontiers, 16(2014), 35–44.
[9] G. Wang, Y. Ren, and J. Li, An effective approach to alleviating the challenges of transmission control
protocol, IET Communications, 8(2014), 860–869.
[10] J. Teng, H. Snoussi, and C. Richard, Prediction-based cluster management for target tracking in
wireless sensor networks, Wireless Communications and Mobile Computing, 12(2012), 797-812.
[11] Z. Zhou, S. L. Zhou, S. G. Cui, et al., Energy-efficient cooperative communication in clustered wireless
sensor networks, IEEE Transactions on Vehicular Technology, 3(2006), 271−290.
[12] G. Wang, Y. Zhao, J. Huang, et al. A K-means-based Network Partition Algorithm for Controller
Placement in Software Defined Network, International Conference on Communications (ICC), 2016.
[13] B. Jinsuk, K. A. Sun, and P. Fisher, Dynamic cluster header selection and conditional re-clustering for
wireless sensor networks, IEEE Transactions on Consumer Electronics, 56(2010), 2249−2257.
[14] J. Meng, S. R. Li, and Z. Zhou, Overall energy efficient clustering algorithm in UWB based wireless
sensor network, Second International Symposium on Intelligent Information Technology Application.
Shanghai, China, 2(2008), 806−810.

[15] Z. Wang, W. Lou, Z. Wang, et al., A novel mobility management scheme for target tracking in cluster-
based sensor networks, Lecture Notes in Computer Science, 6131(2010), 172-186.
[16] F. Zhao, J. Shin, and J. Reich, Information-driven dynamic sensor collaboration, IEEE Signal
Processing Magazine, 19(2002), 61-72.
[17] R. Olfati-Saber, Distributed tracking for mobile sensor networks with information-driven mobility, In
Proc. 2007 American Control Conference, New York City, USA, 2007, 4606-4612.
[18] J. Passerieux, and D. Van Cappel, Optimal observer maneuver for bearings-only tracking, IEEE
Transactions on Aerospace and Electronic Systems, 34(1998): 777-788.
[19] B. Wang, Coverage problems in sensor networks: a survey, ACM Computing Surveys, 43(2011), 32-53.
[20] X. Wang, J. J. Ma, S. Wang, et al., Distributed energy optimization for target tracking in wireless
sensor Networks, IEEE Transactions on Mobile Computing, 9(2009), 73-86.
[21] Y. M. Zhu, Z. S. You, J. Zhao, et al., The optimality for the distributed Kalman filtering fusion with
feedback, Automatica, 37(2001), 1489-1493.
592 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-592

Generalized Hybrid Carrier Modulation


System Based M-WFRFT with Partial
FFT Demodulation over Doubly
Selective Channels
Yong LI a,b,c,1, Zhi-Qun SONG a,b and Xue-Jun SHA c
a Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory, Shijiazhuang, Hebei, China
b The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, Hebei, China
c Communication Research Center, Harbin Institute of Technology, Harbin, China

Abstract. To mitigate the inter-symbol interference (ISI) and inter-carrier interference
(ICI) over doubly selective (DS) channels, we propose in this paper a novel communication
system: the generalized hybrid carrier modulation (GHCM) system with partial fast Fourier
transform (FFT) demodulation. The GHCM system, which is based on the multi-weighted-type
fractional Fourier transform (M-WFRFT), merges components of the single carrier modulation
(SCM) system and the orthogonal frequency division multiplexing (OFDM) system, and with
partial FFT demodulation it is expected to perform better than SCM and OFDM systems using
the same demodulation and channel equalization over DS channels. In this paper, we first
review the necessary background and then derive the structure of GHCM with partial FFT
demodulation. Numerical simulations demonstrate that GHCM with partial FFT performs
better than the SCM and OFDM systems. GHCM can thus be seen as a development of the
hybrid carrier modulation (HCM) system.
Keywords. Generalized hybrid carrier modulation (GHCM) system, M-WFRFT,
partial FFT demodulation

Introduction

Recently, research on inter-carrier interference (ICI) and inter-symbol interference (ISI)
mitigation over time-frequency selective fading (i.e., doubly selective (DS)) channels, such as
next-generation cellular channels, digital video broadcasting (DVB) channels and low earth
orbit (LEO) satellite channels [1–4], has become popular.
To overcome ICI and ISI over DS channels, the orthogonal frequency division multiplexing
(OFDM) and single carrier modulation (SCM) systems have been considered.

1 Corresponding Author: Yong LI, The 54th Research Institute of China Electronic Technology Group

Corporation, Shijiazhuang, China, E-mail:young li 54@126.com



However, the SCM system is easily plagued by ISI under DS channels, while the OFDM
system is impaired by the significant time variations over DS channels with high Doppler
spread [5, 6].
To this end, in this paper we propose the generalized hybrid carrier modulation (GHCM)
system with partial fast Fourier transform (FFT) demodulation to mitigate the ISI and ICI over
DS channels. The GHCM system merges components of the SCM and OFDM systems. It is
demonstrated via numerical simulation that, with partial FFT demodulation, the GHCM system
outperforms both the SCM and OFDM systems over the same DS channels.
This paper is organized as follows. The first section presents preliminaries, deriving the
multi-weighted-type fractional Fourier transform (M-WFRFT) and its important property. The
GHCM system with partial FFT demodulation is presented in Section 2, and simulations and
discussions are given in Section 3. We conclude the paper in the last section.

1. Preliminary

1.1. Multi-Weighted Type Fractional Fourier Transform (M-WFRFT)

There are various forms of the multi-WFRFT according to [7], such as the classical
fractional Fourier transform (CFRFT) based multi-WFRFT and the generalized classical
fractional Fourier transform (GCFRFT) based multi-WFRFT [8–12]. The 4-weighted-type
fractional Fourier transform (4-WFRFT) based M-WFRFT, however, is popular due to its
structure [2]. Moreover, the authors of [7] have provided a theoretical explanation of the
multi-WFRFT, but its application to wireless communication has not been explored. In this
paper, we focus on the 4-WFRFT based M-WFRFT.
Since the M-WFRFT is based on the 4-WFRFT, we first give the definition of the 4-WFRFT
of the original signal. For a set of N symbols X = {x1, x2, ..., xN}, the α-order 4-WFRFT of X
can be defined as:

    S = F_4^α[X] = W_4^α X^T                                                             (1)

where W_4^α is the N × N 4-WFRFT matrix, composed of the matrices F^l, l = 0, ..., 3, where
F^l is the l-th power of the discrete Fourier transform (DFT) matrix F. In particular, F^l
degenerates to the identity matrix when l = 0 and reduces to the DFT matrix F when l = 1.
The 4-WFRFT matrix W_4^α can be written as:

    W_4^α = ∑_{ρ=0}^{3} A_ρ(α) F^ρ,                                                      (2)

where the weighting coefficients are A_ρ(α) = (1/4) ∑_{m=0}^{3} exp[−jmπ(α−ρ)/2],
ρ = 0, 1, 2, 3. As mentioned previously, there are several different methods for defining the
multi-WFRFT [7]. Based on the 4-WFRFT, we can define the M-WFRFT of the signal X as
follows:

Figure 1. GHCM system with Partial FFT demodulation over DS channels

    F_M^{αM} X = W_M^{αM} X,  M > 4,                                                     (3)

with

    W_M^{αM} = ∑_{ρ=0}^{M−1} B_ρ(αM) W_4^{(4ρ/M)},  M > 4,                               (4)

where

    B_ρ(αM) = (1/M) · (1 − exp[−2iπ(αM − ρ)]) / (1 − exp[−2iπ(αM − ρ)/M]),  M > 4.       (5)
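The following Python sketch builds the 4-WFRFT and M-WFRFT matrices of Eqs. (2)-(5) and
numerically checks the additivity stated in Property 1 below; it assumes the unitary DFT
matrix and the weighting coefficients as reconstructed above, and all names are ours.

import numpy as np

def w4(alpha, N):
    # 4-WFRFT matrix W_4^alpha of size N x N, Eq. (2).
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)              # unitary DFT matrix, F^4 = I
    powers = [np.linalg.matrix_power(F, l) for l in range(4)]
    A = [sum(np.exp(-1j * m * np.pi * (alpha - rho) / 2) for m in range(4)) / 4
         for rho in range(4)]
    return sum(A[rho] * powers[rho] for rho in range(4))

def wM(alpha_M, N, M):
    # M-WFRFT matrix W_M^alpha_M built from 4-WFRFT matrices, Eqs. (4)-(5).
    def B(rho):
        num = 1 - np.exp(-2j * np.pi * (alpha_M - rho))
        den = 1 - np.exp(-2j * np.pi * (alpha_M - rho) / M)
        return num / den / M
    return sum(B(rho) * w4(4 * rho / M, N) for rho in range(M))

# Numerical check of the additivity W_M^a W_M^b = W_M^(a+b) for non-integer orders:
N, M = 8, 8
err = np.linalg.norm(wM(0.3, N, M) @ wM(0.5, N, M) - wM(0.8, N, M))   # close to zero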

1.2. Property of M-WFRFT

Property 1 For any α and β, the additivity of the M-WFRFT holds:

    W_M^{α+β} = W_M^α W_M^β = W_M^β W_M^α                                                (6)

According to the additivity property of the 4-WFRFT [2, 5], the proof of Property 1 is
straightforward and is omitted here. Furthermore, the M-WFRFT degenerates to the 4-WFRFT
when M = 4.

2. GHCM Based on M-WFRFT and Partial FFT Demodulation

The baseband model of the GHCM system employing partial FFT demodulation is shown
in Figure 1. When M = 4, the GHCM system degenerates to the HCM system based on the
4-WFRFT. Assume that a set of N symbols S = {s1, s2, ..., sN} lies in the α-order M-WFRFT
domain. The signal S is converted to the time domain through a −α-order M-WFRFT. The
GHCM block inserts a cyclic prefix (CP) of duration Tp ≥ Tl, where Tl is the maximum
multipath delay spread. The transmitted GHCM signal can thus be written as:

    D = W_M^{−α} S,                                                                      (7)

The signal D is then transmitted serially over the DS channel. The received signal, after
removing the CP, can be expressed as:

    V = H_t D + n,                                                                       (8)

where H_t denotes the time-varying channel convolution matrix [1, 13] and n is the vector of
additive white Gaussian noise (AWGN) samples.

2.1. Partial FFT Demodulation

At the receiver, the time domain signal V can be converted to frequency domain through
FFT. However, to mitigate the ICI to some degree. We, in this paper, employ the partial
FFT demodulation. The partial FFT demodulation, first divide the whole sampling inter-
val [0, T ] to P non-overlapping intervals, with the pth interval of [(p − 1)T /P, pT /P ].
Each interval can be operated by an FFT. Let V = {v1 , v2 , ..., vN } be the signals at the
receiver, the partial FFT demodulation output will be holds:

Yp = FVp (9)

with

Vp = diag{p }V, p = 1, 2, ..., P. (10)

and

p = {0, 0, ..., 0, 1, 1, ..., 1, 0, 0, ..., 0}, p = 1, 2, ..., P. (11)


 ! "  ! "  ! "
N
(p−1)N
P P N − pN
P
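A minimal sketch of Eqs. (9)-(11) is given below; the masking vector is written Lambda_p as
above, the unitary FFT scaling is our choice, and it assumes P divides N.

import numpy as np

def partial_fft(v, P):
    # Partial FFT demodulation, Eqs. (9)-(11): returns a P x N array whose p-th row is Y_p.
    N = len(v)
    assert N % P == 0, "this sketch assumes P divides N"
    Y = np.zeros((P, N), dtype=complex)
    for p in range(P):
        mask = np.zeros(N)
        mask[p * N // P:(p + 1) * N // P] = 1.0        # Lambda_p of Eq. (11)
        Y[p] = np.fft.fft(mask * v) / np.sqrt(N)       # Eqs. (9)-(10)
    return Y

# The P partial outputs of each subcarrier are later combined by the MMSE weights U_k of Eq. (13).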

2.2. Optimal Compensation based on MMSE

In order to further suppress the ICI, an appropriate equalizer is needed in the GHCM
system. Assume U_k = {u_{k,1}, u_{k,2}, ..., u_{k,P}} is the optimal compensation vector for
the k-th subcarrier. The estimate of X_k can then be expressed as:

    X̂_k = U_k Y_k^T                                                                      (12)

in which

    U_k = [ ∑_{l=1}^{N} (H_{f,l} u_{l−k})^2 + (N_0/P) I_P ]^{−1} H_{f,k} u_0,             (13)

Figure 2. BER performance comparison between the GHCM system (M = 8, α = 0.8) and the SCM and
OFDM systems over DS channels

The estimate Ŝ is finally obtained by a β-order M-WFRFT:

    Ŝ = W_M^β X̂                                                                          (14)

with α + β = −1.

3. Simulation and Analysis

To verify the effectiveness of the GHCM system compared to the OFDM and SCM schemes
with partial FFT, we present a simulation over typical DS channels. We let M = 8 and set the
division number to 32, which trades off computational complexity against performance. The
normalized Doppler frequency is f_d T_d = 0.27, where f_d denotes the maximum Doppler
frequency of the signal and T_d is the symbol sampling duration. To simulate the DS channels,
we assume a ten-tap wide-sense stationary uncorrelated scattering (WSSUS) channel [14].
According to the simulation results in Figure 2, the GHCM system (M = 8, α = 0.8)
performs better than the OFDM system with partial FFT demodulation when the division
number is 32. Furthermore, the GHCM system with partial FFT exhibits better BER
performance than the SCM system with partial FFT when SNR > 15 dB.

4. Conclusion

In this paper, we proposed a GHCM system based on the M-WFRFT with partial FFT
demodulation over doubly selective channels. After the preliminaries, we derived the structure
of GHCM and explored the properties of the M-WFRFT. Over the considered DS channel, the
GHCM system with a proper modulation order is more robust than the OFDM and SCM
systems with partial FFT in the moderate-to-high SNR region.

Acknowledgements

This work is supported by the fund of Science and Technology on Communication Net-
works Laboratory under grant number EX156410046 and the 973 Program under Grant
No.2013CB329003. Moreover, this work is also supported by the fund of National Key
Laboratory of Science and Technology on Communications.

References

[1] P. Schniter, Low-Complexity Equalization of OFDM in Doubly Selective Channels, IEEE
Transactions on Signal Processing, 52 (2004), 1002–1011.
[2] Y. Li, X. Sha and K. Wang, Low Complexity Equalization of HCM Systems with DPFFT
Demodulation over Doubly-Selective Channels, IEEE Signal Processing Letters, 21 (2014),
862–865.
[3] P. Schniter and H. Liu, Iterative equalization for single-carrier cyclic-prefix in doubly-
dispersive channels, in Proc. 2003 Asilomar Conference on Signals, Systems and Computers,
1 (2003), 502–506.
[4] Y. Li, X. Sha and K. Wang, Hybrid Carrier Communication with Partial FFT Demodulation
over Underwater Acoustic Channels, IEEE Communications Letters, 17 (2013), 2260–2263.
[5] Y. Li, X. Sha and K. Wang, Hybrid Carrier Modulation System with Partial FFT Demod-
ulation Over Doubly Selective Channels in Presence of Carrier Frequency Offset, Circuits,
Systems, and Signal Processing , 33 (2014), 3967–3979.
[6] K. Tu, D. Fertonani and T. M. Duman and et.al, Mitigation of Intercarrier Interference for
OFDM Over Time-Varying Underwater Acoustic Channels, IEEE Journal of Oceanic Engi-
neering, 36 (2011), 156–171.
[7] Q. Ran, D. S. Yeung, C. C. Eric Tsang and et.al, General Multifractional Fourier Transform
Method Based on the Generalized Permutation Matrix Group, IEEE Transactions on Signal
Processing, 53 (2005), 83–98.
[8] H. M. Ozaktas, O. Arikan, A. Kutay and G. Bozdagi, Digital computation of the fractional
Fourier transform, IEEE Transactions on Signal Processing, 44 (1996), 2141–2150.
[9] G. Cariolaro, T. Erseghe, P. Kraniauskas and N. Laurenti, A unified framework for the frac-
tional Fourier transforms, IEEE Transactions on Signal Processing, 46 (1998), 3206–3219.
[10] G. Cariolaro, T. Erseghe, P. Kraniauskas and N. Laurenti, Multiplicity of Fractional Fourier
Transform and Their Relationship, IEEE Transactions on Signal Processing, 48 (2000), 227–
241.
[11] T. Erseghe, P. Kraniauskas and G. Cariolaro, Unified fractional Fourier transform and sam-
pling theorem, IEEE Transactions on Signal Processing, 47 (1999), 3419–3423.
[12] S. C. Pei, M. H. Yeh and C. C. Tseng, Discrete fractional Fourier transform based on orthog-
onal projections, IEEE Transactions on Signal Processing, 47 (1999), 1335–1348.
[13] Y. Li, X. Sha, Performance analysis of OP-HCM system with DPFFT demodulation over
doubly-selective channels, Wireless Communications and Signal Processing, 2014 Sixth In-
ternational Conference on, 1(2014), 1–6.
[14] J. Huang, S. Zhou, J. Huang, C. Berger, et al., Progressive inter-carrier interference
equalization for OFDM transmission over time-varying underwater acoustic channels, IEEE
Journal of Selected Topics in Signal Processing, 5 (2011), 84–89.
598 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-598

On the Benefits of Network Coding for


Unicast Application in Opportunistic
Traffic Offloading
Jia-Ke JIAO a , Da-Ru PAN a,1 , Ke LV b and Li-Fen SUN a
a School of Physics and Telecommunication Engineering, South China Normal

University,Guangzhou China
b University of Chinese Academy of Sciences, Beijing China

Abstract. With the explosive increase in mobile services and user demands, cellular
networks will very likely become overloaded and congested in the near future. To cope
with this explosive growth in traffic demand under the limited capacity of current networks,
opportunistic networks can be used to offload traffic from cellular networks onto free
device-to-device networks. Network coding can make full use of the mobility of nodes and
improve the performance of opportunistic traffic offloading. In this paper, we investigate
the benefits of applying a form of network coding known as random linear coding (RLC) to
unicast applications in opportunistic traffic offloading, and we establish a mathematical
model to analyze these benefits. RLC schemes achieve faster information propagation at
the price of a greater number of transmissions and a larger storage footprint. To optimize
the scheme, we utilize the survival time to control the number of packets in the network
and free up node storage space.

Keywords. Opportunistic network, network coding, packets control

Introduction

Mobile Internet access is becoming ever more popular and today provides various services
and applications, including audio, images and video. Mobile social networks have begun to
attract increasing attention in recent years, and currently a large percentage of mobile data
traffic is generated by these mobile social applications and mobile broadband-based PCs [1].
Therefore, mobile data traffic is growing at an unprecedented rate, which causes many
problems for network providers. Many studies show that the traffic load on cellular networks
may soon reach the networks' critical limit.
To cope with the explosive traffic demands and the limited capacity provided by the current
cellular networks, many schemes based on opportunistic networks have been proposed
[2][3][4]. An opportunistic network is a self-organizing network that utilizes contact
opportunities between mobile nodes to achieve data communication; it does not require that a
complete communication path exist between source and destination. By utilizing delay-tolerant
networking,
1 Corresponding Author: Da-Ru PAN, school of Physics and Telecommunication Engineering, South China

Normal University, Guangzhou China; E-mail:pandr@scnu.edu.cn



service providers may deliver the information to only a small fraction of selected users to
reduce mobile data traffic, and the selected users then help to further propagate the information
among all the subscribed users through their social participation. When non-selected nodes are
within the communication range of selected nodes, the selected nodes disseminate the
information to them, and the non-selected nodes in turn disseminate the messages to other
nodes that have not yet received them. However, these schemes cannot make full use of the
nodes' mobility characteristics, because in a delay-tolerant network the factors that affect
information dissemination are related not only to the probability of node encounters but also to
the effective information carried by the nodes. Encoding can increase the effectiveness of the
information carried by nodes when they are within transmission range of each other.
Our main contribution is that, by extending the results of Zhang et al. [4], we analyze the
multiple-source single-destination situation and apply this scheme to offload mobile data;
moreover, we establish a mathematical framework to analyze the packet delivery rate. Another
contribution is a new scheme to reduce the number of packets in the network. Under the RLC
scheme, when two nodes meet, the probability that they carry information useful to each other
is higher, so the number of packet copies in the network is larger. The main existing replication
control schemes assign a certain number of tokens to each packet, but their flooding speed is
slow. In our scheme, packets flood until a threshold calculated from the probability model we
derive, and the packets are then deleted according to their TTL. The advantage of this scheme
is that it achieves much lower latency and can still control the number of packets in the
network.
This paper is organized as follows: we briefly review related work in Section 1, analyze
offloading of mobile traffic by the RLC scheme and optimize the RLC scheme in Sections 2
and 3, and conclude the paper in Section 4.

1. Related Work

A detailed study of linear coding was carried out by Zhang et al. [4], who focused on the
benefits of applying random linear coding to unicast applications in opportunistic networks
and exhibited the benefit of RLC schemes through simulations. In our work, we carry out
further research on RLC schemes and apply them to offloading mobile data.
Groenevelt et al. [5] built a stochastic model using the Laplace-Stieltjes transform to
analyze the message delay and the number of copies in the network. The work in [6] develops
a rigorous, unified framework based on ordinary differential equations (ODEs) to study
epidemic routing and its variations, and Lin et al. [7] developed ODE models to analyze the
group delivery delay for a single group of single-source single-destination packets under
random linear coding and without coding. We establish a mathematical model to analyze the
relationship between the delivery rate and the simulation time.
For RLC schemes, to control the number of packets, Zhang et al. [8] introduced a
token-based RLC scheme that extends binary spray-and-wait [9][10]. This scheme associates a
token number with each generation, which limits the number of times combinations are
transmitted, and it is implemented in two steps. The first step is token reallocation, which
redistributes the tokens of the generations in proportion to the ranks of the nodes; the second
step is the transmission of one combination: if the meeting nodes have useful information for
each other, they transmit a random linear combination and their tokens are decreased by one.
We instead utilize the survival time to control the number of packets in the network and free
up node storage space.

2. System Model and Problem Formulation

In this section, we describe the scheme that uses random linear coding to deliver messages and
apply this scheme to offload mobile data.

2.1. RLC-Based Routing scheme

Random linear coding, as its name implies, generates linear combinations of packets, where
each packet is viewed as a vector of symbols from a finite field GF_q of size q [11]. An S-bit
packet can be viewed as a vector of d = ⌈S/log2 q⌉ symbols from GF_q. We therefore assume
that the data can be divided into K packets P_i, i = 1, 2, . . . , K, each of size S bits, and we view
P_i as a vector m_i, i = 1, 2, . . . , K, of d symbols from GF_q. Combining the K packets
linearly, an encoded packet x can be written as

    x = ∑_{i=1}^{K} α_i m_i
      = (α_1, α_2, . . . , α_K) × (m_1, m_2, . . . , m_K)^T                               (1)
      = α × M

where α_i is an encoding coefficient from GF_q, the vector α = (α_1, α_2, . . . , α_K) is the
encoding vector, x is the encoded packet, and M = (m_1, m_2, . . . , m_K)^T is the K × d matrix
of the K original packets.
In the RLC scheme, encoded packets are transmitted in the network in place of the original
packets. The encoded packets received by a mobile node form an encoding matrix in its buffer:
if the node stores r encoded packets, the encoding matrix is X = (x_1, x_2, . . . , x_r), and A is
formed by the encoding vectors of x_1, x_2, . . . , x_r. A destination node can recover the
original packets by solving the linear system X = AM, where A is the r × K matrix of encoding
vectors and X is the r × d matrix of encoded packets. M can be recovered if and only if the rank
of A is K, so the destination node needs to receive K linearly independent encoding vectors.
Without coding, nodes only forward the packets they store, whereas under RLC the packets
delivered are linear combinations of the packets stored in the buffer. We next describe the
information exchange between two nodes under the RLC scheme.
When two mobile nodes are within transmission range, they first exchange information
about the encoded packets they carry and judge whether the other node carries encoded packets
that are useful to them. For example, for two nodes u and v, if the encoding vectors of the
encoded packets stored in u's buffer can increase the rank of the encoding vectors that v holds,
then u generates a new random linear combination of the encoded packets in its buffer
x_1, x_2, . . . , x_r: X_new = ∑_{j=1}^{r} β_j x_j, where the coefficients are chosen uniformly
from the finite field. X_new is useful to v (i.e., it increases the rank of v's encoding matrix)
with probability greater than or equal to 1 − 1/q [12].
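As an illustration of the encoding and rank test just described, the sketch below implements
RLC over a prime field; the field size q = 251 and all names are our own illustrative choices,
not those of the paper.

import numpy as np

q = 251                                    # illustrative prime field size

def rlc_encode(M):
    # Draw a random encoding vector alpha and form one coded packet x = alpha * M over GF(q).
    K = M.shape[0]
    alpha = np.random.randint(0, q, size=K)
    return alpha, (alpha @ M) % q

def rank_gf(A):
    # Rank of the encoding-vector matrix A over GF(q) via Gaussian elimination.
    A = np.array(A, dtype=np.int64) % q
    rank = 0
    for col in range(A.shape[1]):
        pivot = next((r for r in range(rank, A.shape[0]) if A[r, col]), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]
        inv = pow(int(A[rank, col]), q - 2, q)         # modular inverse (q is prime)
        A[rank] = (A[rank] * inv) % q
        for r in range(A.shape[0]):
            if r != rank and A[r, col]:
                A[r] = (A[r] - A[r, col] * A[rank]) % q
        rank += 1
    return rank

# A destination can recover the K original packets once rank_gf(A) == K.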

2.2. Network modeling

We consider a network with N identical mobile nodes. Among the N nodes, there are N1
initial nodes, N2 intermediary nodes and N3 destination nodes. The size of the data item is M
and the size of a packet is m, so the number of packets is M/m, which equals N1, i.e. N1 = M/m.
In our scheme, every initial node carries one packet, and each destination node needs to collect
the N1 packets. If a destination node cannot receive the N1 initial packets from other nodes
within a specified "tolerable" duration, which is related to the data lifetime, it can directly
request the missing packets from the cellular network. We denote by L the total number of
packets sent in the core network, by Q the number of useful packets a destination node has
received, by W the number of destination nodes that have received the complete data, and by U
the number of packets the core network has to send. If Q = N1, we say that this destination
node belongs to W and does not need to receive packets from the base station.

2.3. Mathematical analysis of the multiple-source single-destination RLC routing scheme

We now analyze the multiple-source single-destination RLC scheme. Compared with epidemic
routing, the advantage of the multiple-source single-destination RLC scheme in reducing
cellular traffic is that it increases the effectiveness of the carried information. In the RLC
scheme, each node may contain packets needed by the destination, so the Poisson intensity
with which a node meets the other nodes at the next moment is Nλ. We assume that the
probability that a node carries a useful packet is 1/2 when two nodes encounter each other, so
the effective Poisson intensity is Nλ/2. As introduced above, the probability that a received
packet is useful is greater than or equal to 1 − 1/q, which depends only on the size of the finite
field. In the RLC scheme, a destination node obtains the data only when it has collected N1
linearly independent packets, so the destination node needs to receive at most N1/(1 − 1/q)
packets. Hence, within a period of time T, the probability that the target node receives one
packet is P_rlc = 1 − e^{−NλT/2}, and the probability that the target node successfully obtains
the whole initial data item is

    P_RLC = (1 − e^{−NλT/2})^{N1/(1−1/q)}                                                 (2)
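The delivery probability of Eq. (2) can be evaluated directly, as in the small sketch below; all
numerical values are illustrative and are not taken from the paper's simulations.

import math

def delivery_probability(N, lam, T, N1, q):
    # P_RLC of Eq. (2): probability that a destination collects N1 independent packets by time T.
    p_rlc = 1.0 - math.exp(-0.5 * N * lam * T)      # probability of one useful reception by time T
    return p_rlc ** (N1 / (1.0 - 1.0 / q))

# Example: 100 nodes, pairwise meeting rate 0.001/s, 10 source packets, GF(256), T = 600 s.
P = delivery_probability(N=100, lam=0.001, T=600, N1=10, q=256)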

2.4. Offloading Mobile Traffic By RLC Scheme

Next, we compare the benefits of offloading mobile data under the RLC scheme and under
epidemic routing. We use relatively small data packets of m bits, which can be successfully
transmitted within the transmission range, and we calculate the total amount of mobile data
offloaded and the consumption of mobile traffic. In this paper, we use the total number of
packets sent in the core network to compare the routing mechanisms. Without the
delay-tolerant network, U equals U = N1 × N3. Under the multiple-source single-destination
RLC scheme, W = N3 × P_RLC and U is

    U = N1 + ∑_{i=1}^{N3−W} (N1 − Q_i)                                                    (3)

In this simulation we divide one data item into 10 packets and encode them (i.e., the number of
initial nodes is 10 and every initial node carries one packet according to our scheme), set 10
target nodes, and select 80 volunteer nodes to help with the transmission. Figure 1 plots the
delivery rate of epidemic routing and of RLC. The flooding speed of the multiple-source RLC
routing is faster, and all the target nodes successfully receive the data in less time than with
epidemic routing. Figure 2 plots the total number of packet copies in the network: the number
of packets in the nodes' buffers under RLC routing is larger than under epidemic routing.
Another phenomenon shown in Figure 2 is that when the growth of the curves saturates (i.e.,
every node holds all the packets), the final number of packets under RLC routing is larger than
under epidemic routing. The reason is that the RLC scheme may produce linearly dependent
encoding vectors during the rapid flooding, so the RLC scheme takes up a large amount of
storage space.

Figure 1. Delivery rate of the two schemes
Figure 2. The number of packets in the network

To control the number of packets in the network and free up node storage space, a new scheme
is proposed in the following section.

3. A New Scheme to Reduce the Number of Copies in the Network

While flooding-based routing schemes have a high probability of delivery, they waste a lot of energy and incur a great number of transmissions. A frequently used scheme to reduce the number of copies in the network is binary spray and wait. Zhang et al. [8] proposed a token-based RLC scheme that extends binary spray and wait, in which nodes redistribute their tokens in proportion to their ranks. However, limiting the number of tokens slows down flooding. We propose a new replication control scheme based on the TTL of packets. In our scheme, every initial packet generated at the source node is assigned a survival time Tttl. When two nodes are within transmission range, a new copy with survival time Tttl is generated and handed over to the other node. This scheme trades a larger number of transmissions for a smaller number of copies in the network. Considering that node mobility shows a very high degree of temporal and spatial regularity, and that each individual returns to a few highly frequented locations with a significant probability, the packets stored in the nodes are nearly saturated after the messages have flooded the network for a period of time, and the messages carried by the nodes then waste a large amount of storage space. Our scheme therefore consists of two phases: a flooding phase and a delete phase.

In the flooding phase, given a target probability PRLC, we calculate T from the probability model PRLC = (1 − e^(−(1/2)NλT))^(N1/(1−1/q)) derived above; T is used to control the duration of packet flooding. In the delete phase (i.e. after time T), the packets are removed according to their Tttl. The Tttl is determined from Prlc = 1 − e^(−(1/2)NλTttl), where Prlc is the probability that one node successfully receives a packet.
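A minimal sketch of this inversion, assuming the parameters N, λ, q and N1 are known and that Tttl enters the single-packet model in the same way as T; all names and numeric values below are ours, not the paper's.

import math

def flooding_time(P_target, N, lam, q, N1):
    """Invert P_RLC = (1 - exp(-N*lam*T/2)) ** (N1/(1 - 1/q)) for the flooding time T."""
    p_single = P_target ** ((1.0 - 1.0 / q) / N1)
    return -2.0 / (N * lam) * math.log(1.0 - p_single)

def packet_lifetime(p_single_target, N, lam):
    """Invert P_rlc = 1 - exp(-N*lam*T_ttl/2) for the per-packet lifetime T_ttl."""
    return -2.0 / (N * lam) * math.log(1.0 - p_single_target)

# Illustrative parameters only
print(flooding_time(P_target=0.9, N=100, lam=0.005, q=256, N1=10))
print(packet_lifetime(p_single_target=0.9, N=100, lam=0.005))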

Figure 3. Packets received from the base station.
Figure 4. The number of packets in the network.

Figure 5. The number of packets in the network.
Figure 6. Packets received from the base station.

Figure 3 plots the number of packets that need to be received from the base station under three schemes. Our scheme outperforms the other two schemes (binary spray and wait, and rank-based packet control): within the same simulation time, it achieves a higher delivery rate. Figure 4 depicts the number of packets in the network for four schemes. The number of packets keeps growing and tends to saturate after a period of time. For the rank-based control and binary spray and wait schemes we set token = 64; because these schemes flood slowly, the number of packets is relatively small and the delivery rate is low. In our scheme, there is a significant decline at the threshold T, which controls the duration of packet flooding; after time T, the number of packets remains relatively stable. Figure 5 and Figure 6 plot the number of packets in the network and the number of packets received from the base station under different parameter values. A higher value yields a higher delivery rate, but meanwhile consumes a large amount of storage space.

4. Conclusion

In this paper, we have studied how to offload mobile data traffic from overloaded cellular networks using the RLC scheme in an opportunistic network. Because of its higher degree of randomness compared with non-coding schemes, the RLC scheme increases the delivery rate and reduces the load on the base station. In particular, we analyze the relationship between the delivery rate and the simulation time by establishing a mathematical model. To optimize the scheme, we use the survival time to control the number of packets in the network and free up storage space at the nodes. In future work, we plan to investigate the influence of packet size on the RLC scheme.

Acknowledgements

This study was supported by the National Natural Science Foundation of China under
Grant Nos.61471175 and U1301251, the Supporting Plan for New Century Excellent
Talents of the Ministry of Education under Grant No. NCET-13-0805.

References

[1] Mobile Data Traffic Surpasses Voice, http://www.cellularnews.com/story/42543.php,2010


[2] B. Han, P. Hui, et al. Mobile Data Offloading through Opportunistic Communications and Social Partic-
ipation. IEEE Transactions on Mobile Computing 11(2012):821-834.
[3] X. Zhang, G. Neglia, J. Kurose, et al. Benefits of network coding for unicast application in disruption-
tolerant networks. IEEE/ACM Transactions on Networking, 21(2013):1407-1420.
[4] S. Jain, M. Demmer, R. Patra, et al. Using redundancy to cope with failures in a delay tolerant network.
ACM, 2005:109-120.
[5] R. Groenevelt, P. Nain, G. Koole. The message delay in mobile ad hoc networks. Performance Evalua-
tion,62(2005):210-228.
[6] X. Zhang, G. Neglia, J. Kurose, et al. Performance modeling of epidemic routing. Lecture Notes in
Computer Science,51(2006):827-839.
[7] Y. Lin, B. Liang, B. Li. Performance Modeling of Network Coding in Epidemic Routing. In
Proc.MobiOpp,2007
[8] X. Zhang, G. Neglia, J. Kurose, et al. Benefits of Network Coding in Disruption Tolerant Networks.
IEEE/ACM Transactions on Networking,2(2010):267-308.
[9] T. Small, Z. J. Haas. Resource and Performance Tradeoffs in Delay-Tolerant Wireless Networks. Acm
Workshop on Delay Tolerant Networking, 2005:260-267.
[10] T. Spyropoulos, K. Psounis, C. S. Raghavendra. Spray and wait: an efficient routing scheme for inter-
mittently connected mobile networks. ACM SIGCOMM Workshop on Delay-Tolerant NETWORKING.
ACM, 2005:252-259.
[11] R. Lidl and H. Niederreiter, Finite Fields,2nd ed. Cambridge,U.K.: Cambridge Univ. Press,1997.
[12] S. Deb, M. Medard, C. Choute. Algebraic gossip: a network coding approach to optimal multiple rumor
mongering. IEEE Transactions on Information Theory, 14(2006):2486-2507.
Fuzzy Systems and Data Mining II 605
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-605

A Geometric Graph Model of Citation Networks with Linearly Growing Node-Increment
Qi LIU 1, Zheng XIE, En-Ming DONG and Jian-Ping LI
National University of Defense Technology, Changsha, 410073, China

Abstract. Since the numbers of annually published papers in some citation networks grow linearly, a geometric model is proposed to reproduce some statistical features of those networks, in which the academic in-
fluence scopes of the papers are denoted through specific geometric areas related to
time and space. In the model, nodes (papers) are uniformly and randomly sprinkled
onto a cluster of circles of the Minkowski space whose centers are on the time axis.
Edges (citations) are linked according to an influence mechanism which indicates
that an existing paper will be cited by a new paper located in its influence zone.
Considering the citations among papers in different disciplines, an interdisciplinary
citation mechanism is added to the model in which some papers with a small prob-
ability of being chosen will cite some existing papers randomly and uniformly. Dif-
ferent from most existing models that only study the scale-free tail of the in-degree
distribution, this model characterizes the overall in-degree distribution. Moreover,
the model can also predict the scale-free tail of the out-degree distribution, which
indicates that the model is a good tool in researches on the evolutionary mechanism
of citation networks.
Keywords. Citation network, Influence mechanism, Interdisciplinary citation
mechanism, Power-law distribution

Introduction

The research of citation networks has drawn increasing attention and been applied to
many fields. It can help scientists find useful academic papers, help inventors find inter-
esting patents, or help judges discover relevant past judgements. The scientific citation
networks considered in this paper are directed graphs, in which nodes represent papers,
while edges stand for the citation relationships between them. Since new papers can only
cite the published papers [1], these graphs are acyclic.
Degree distribution is a fundamental research object of citation networks, and a se-
ries of models have been proposed to illustrate it. The Price model appears to be the first
to discuss cumulative advantage in the context of citation networks and their in-degree distributions [2,3]. The idea is that the rate at which a paper gets new citations
1 Corresponding Author: Qi LIU, National University of Defense Technology, Changsha, China ; E-mail:

liuqi@smail.nju.edu.cn.

should be proportional to the citations that it already has. This can lead to a scale-free
distribution according to the Price model [4]. The cumulative advantage is also known as
the preferential attachment in other literatures [5]. An investigation has been conducted
by Eom et al [1] on the microscopic mechanism for the evolution of citation networks, by proposing a linear preferential attachment with an initial attractiveness that depends on time. The model reproduces the tails of the in-degree distributions and the phenomenon
called “burst”: the citations received by papers increase rapidly in the early years since
publication. The above-mentioned models have studied the tail of the in-degree distribu-
tion only, while the two-mechanism model proposed by George et al [4] characterizes
the properties of the overall in-degree distributions.
With respect to the research of real-world networks (e.g. citation networks), using random geometric graphs has become a hot topic in recent years. Xie et al [6] define
the academic influence scope as a geometric area and present an influence mechanism,
which means that an existing paper will be cited by a new paper located in its influence
zone. Based on this mechanism, they further propose the concentric circles model (CC
model), which can fit the scale-free tails of the in-degree distributions of the citation
networks with the exponentially growing nodes. Nevertheless, the forepart of the in-
degree distribution and the out-degree distribution cannot be well fitted by this model.
In reality, node-increment in many current citation networks enjoys a linear growth,
e.g. Cit-HepPh, Cit-HepTh (Figure 1). Therefore, a model with linearly growing node-
increment is proposed. The edges in the model are still linked according to the influence
mechanism, whereas they are revised in that the influence scopes of papers are deter-
mined by their topics and ages (the time that has passed since publication). Different from the previous models that only focus on the tails of in-degree distributions, the
improved model can well predict the overall in-degree distributions of the real citation
networks. In consideration of the citations among different disciplines in real citation
networks, a mechanism that is referred to as the interdisciplinary citation mechanism is
proposed. Under appropriate parameters, these mechanisms can reproduce the scale-free
tail of the out-degree distribution of citation networks. These results show that our model
can be used as a medium to study the intrinsic mechanism of citation networks.
The structure of this paper is as follows. The model is described in Section 2. The
degree distributions are analyzed in Section 3, and finally the conclusion is provided in
the last section.
Table 1. Some statistical indices of the citation networks.
Networks Nodes Links CC AC AC-In AC-out PG MO
Cit-HepTh 27770 352807 0.165 -0.030 0.041 0.096 0.987 0.650
Cit-HepPh 34546 421578 0.149 -0.006 0.077 0.112 0.999 0.724
Modeled network 33165 162080 0.393 -0.068 0.316 0.166 0.970 0.967
1 The first two networks are extracted from arXiv and cover papers from January 1993 to April 2003
(124 months) in high energy physics theory and in high energy physics phenomenology [7, 8]. The
last network is generated according to the generating process of the model, with parameters
m = 15, T = 66, β0 = 0.035, λ = 0.001, S = 66, p = 1, η = 2.5, α = 1.3, r = 0.01, ξ = 2.7, k0 = 6.
2 In the header of the table, CC, AC, AC-In, AC-Out, PG and MO denote the clustering coefficient,
the assortative coefficient, the in-assortative coefficient, the out-assortative coefficient, the node
proportion of giant component and modularity, respectively.

[Figure 1 panels (a)–(c): the number of papers vs. t (months), each fitted by a linear function; the fits shown in the figure are y = 1.543t + 210.7 and y = 2.277t + 208.2 for the two arXiv networks, and y = 15t for the modeled network (R² = 0.7292, 0.7287 and 1).]

Figure 1. The changing trends of the monthly numbers of papers of the data in Table 1: (a) the trend for the papers of Cit-HepTh; (b) the trend for the papers of Cit-HepPh; (c) the trend for the modeled network. They are fitted by linear functions. The coefficient of determination (R²) is used to measure the goodness of fit.

1. The model

Since many journals and databases publish papers monthly or yearly and papers in the same issue normally cannot cite each other, models such as the Price model or the copy model, which publish one paper at each time step, do not capture the growing trends of papers. Xie et al [6] consider citation networks in which the number of papers published each year grows exponentially. However, in some real citation networks (e.g. Cit-HepPh and Cit-HepTh), the monthly or annual numbers of published papers grow linearly (Figure 1). To study the evolution and features of these networks, a geometric graph model, in which the node-increment per time unit grows linearly, is proposed here.
In our model, a simple spacetime, (2+1)-dimensional Minkowski spacetime of two
spatial dimensions, along with one temporal dimension is considered, so that the time
characteristics of the nodes in citation networks can be modeled. The nodes in the model
are uniformly and randomly sprinkled onto a cluster of concentric circles (the centers of
which are on the time axis), and some spatial coordinates are given to them to represent
the research contents of papers. In addition, the nodes on different circles are generated
in different time units, while those in the same circle represent the papers published in
the same issue. The number of nodes in a circle is a linearly increasing function of the
circle’s temporal coordinate. In the spacetime, nodes are identified by their locations
(R(t), θ ,t), where t is the generated time of the node, R(t) is the radius of the circle
born at time t, and θ is the angular coordinate. Since the radius R(t) and the time t are in one-to-one correspondence, each node can be identified by its time coordinate t and angular coordinate θ only. The edges in the model are linked according to the influence mechanism and the interdisciplinary citation mechanism.
Supposing that a modeled network has N(t) = mt papers (m ∈ Z+ ) published at time
t (t = 1, 2, ..., T ∈ Z+ ), including some interdisciplinary papers, the generating process
of the model is listed as follows.
1. Generate a new circle Ct with radius R(t) = N(t)/(2πδ ) (δ ∈ R+ ) centered at
point (0, 0,t) at each time t = 1, 2, ...T ∈ Z+ , sprinkle N(t) nodes (papers) on it
randomly and uniformly, and give each node a coordinate, e.g. the coordinate of
node i is (θi ,ti ).
2. For each node with coordinate (θ, t), the influence zone (academic influence scope) of the node is defined as an interval of angular coordinate with center θ and arc-length D = β(θ)/t^α, where α ∈ (1, 2) is used to tune the exponent of

the scale-free tail of the in-degree distribution, and β(θ) is used to make the in-degree distribution of the papers published at each time have a scale-free tail.
3. For node i and node j, the coordinates of which are (θi, ti) and (θj, tj) respectively, if the distance of angular coordinates Δ(θi, θj) = π − |π − |θi − θj|| < |Di| and tj > ti, a directed edge is drawn from j to i with probability p.
4. Select r (r ≪ 1) percent of the nodes as interdisciplinary papers that continually and randomly cite some existing papers, so that the reference lengths (out-degrees) of those papers are random variables drawn from a power-law distribution f(k) = k^(−ξ) (k > k0).
The function β(θ) in Step 2 is a staircase function of θ:

$$\beta(\theta) = \begin{cases} \beta_0, & \theta \in [0, \theta_1] \\ \beta_0 + \lambda, & \theta \in [\theta_1, \theta_2] \\ \;\;\vdots & \\ \beta_0 + (S-1)\lambda, & \theta \in [\theta_{S-1}, 2\pi], \end{cases}$$

where β0 ∈ ℝ+, λ > 0, S ∈ ℤ+, and [θ0, θ1], ..., [θS−1, θS] is a specific partition of [0, 2π] satisfying Δ(θ_{i+1}, θ_i) = 2π(β0 + iλ)^(−η) / Σ_{j=1}^{S−1}(β0 + jλ)^(−η), i = 0, 1, ..., S − 1, η > 0, θ0 = 0, θS = 2π. The aging of the papers' influences is ignored here due to the short time span of the empirical data (around ten years) (Table 1).
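To make the generating process concrete, the following Python sketch is our own illustrative rendering of Steps 1–3 (with a uniform β0 instead of the staircase β(θ), and without the interdisciplinary citations of Step 4); it is not the authors' code.

import math
import random

def generate_citation_graph(m=15, T=66, beta0=0.035, alpha=1.3, p=1.0):
    """Sketch of Steps 1-3: at each time t, sprinkle m*t nodes uniformly on circle C_t
    and link each new node to every earlier node whose influence zone
    (an arc of angular width beta0/t_i**alpha) covers it."""
    nodes = []                                   # list of (theta, t)
    edges = []                                   # (citing_index, cited_index)
    for t in range(1, T + 1):
        start = len(nodes)
        nodes.extend((random.uniform(0.0, 2.0 * math.pi), t) for _ in range(m * t))
        for j in range(start, len(nodes)):       # nodes born at time t
            theta_j = nodes[j][0]
            for i in range(start):               # all earlier nodes
                theta_i, t_i = nodes[i]
                D_i = beta0 / (t_i ** alpha)     # influence zone width (uniform beta0 here)
                dist = math.pi - abs(math.pi - abs(theta_i - theta_j))
                if dist < D_i and random.random() < p:
                    edges.append((j, i))         # new paper j cites earlier paper i
    return nodes, edges

nodes, edges = generate_citation_graph(m=3, T=20)   # toy sizes; the paper uses m = 15, T = 66
print(len(nodes), len(edges))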
[Figure 2 panels: out-degree distribution vs. node out-degree k for (a) the modeled network, (b) Cit-HepTh and (c) Cit-HepPh; the generalized Poisson fit parameters a, b and the goodness-of-fit values (R², RMSE) are shown in each panel.]

Figure 2. Out-degree distribution. (a) the out-degree distribution of the modeled network; (b)
the out-degree distribution of Cit-HepTh; (c) the out-degree distribution of Cit-HepPh. The fit-
ting functions of their foreparts are f1 (k) = a(a + bk)k−1 e−a−bk /k!. The root mean squared error
(RMSE) and coefficient of determination (R2 ) are used to measure the goodness of fits.

2. Out-degree Distribution

The out-degree distributions of the real citation networks (Table 1) take the form of scale-
free tails and curves in the forepart (Figures 2b,2c). The curves in the forepart of the out-
degree distributions can be well fitted by the generalized Poisson distribution. In reality,
whether paper j cites paper i is influenced by the number of citations that paper i already has [2, 3] and by the popularity of paper i's author. At the same time, it can be viewed as a low-probability event (the reference length of paper j is very small compared with the large number of papers). These conditions are suitable for the use of the generalized Poisson distribution. Now the formulas for the forepart and the tail of the out-degree distribution of the modeled network (Table 1) are derived to reveal the mathematical mechanism behind the phenomenon that our model generates a similar curve and scale-free tail (Figure 2a).

The edges in the model are linked according to the influence mechanism and the
interdisciplinary citation mechanism. Firstly, the non-interdisciplinary paper i with co-
ordinate (θi ,ti ) is considered. For prior node j, its coordinate is (θ j ,t j ), where t j < ti . If
Δ(θi , θ j ) < β (θ j )/t αj , node i is located in the influence zone of node j. When β (θ j )/t αj
is small enough, β (θi ) ≈ β (θ j ), because β (.) is a staircase function. Then the expected
out-degree of node i is as follows:

$$k^{+}(\theta_i, t_i) = \sum_{t_j=1}^{t_i-1} \frac{\beta(\theta_i)\,p}{t_j^{\alpha}}\, R(t_j)\,\delta \;\approx\; \frac{m\,\beta(\theta_i)\,p\; t_i^{2-\alpha}}{2\pi(2-\alpha)}, \qquad (1)$$

which is an increasing function of the temporal coordinate ti . When ti is large enough,


∂ k+ (θi ,ti )/∂ ti ≈ 0, indicating that the reference length of the paper denoted by node
i is approximately a constant. This is in accordance with the actual situation that the
reference length of papers cannot grow infinitely.
Since the process of sprinkling nodes follows the Poisson point process, the actual
out-degree of node i is not exactly equal to the expected out-degree. Therefore, in or-
der to obtain the correct out-degree distribution, it is necessary to average the Poisson
distribution,

$$p(k^{+}(\theta_i, t_i) = k) = \frac{1}{k!}\,\bigl(k^{+}(\theta_i, t_i)\bigr)^{k}\, e^{-k^{+}(\theta_i, t_i)}, \qquad (2)$$

which is the probability that node i has out-degree k, with the temporal density ρ (ti ). In
this model, ρ (ti ) ≈ 2ti /T 2 . So the out-degree distribution is
$$p_{non}(k^{+} = k) \approx \frac{1}{2\pi}\int_{0}^{2\pi} \Bigl(\frac{m\beta(\theta_i)p}{2\pi(2-\alpha)}\Bigr)^{k+\frac{\alpha}{2-\alpha}} \frac{e^{-\frac{m\beta(\theta_i)p}{2\pi(2-\alpha)}}}{k!}\; d\theta_i. \qquad (3)$$

It is a mixture Poisson distribution, which is similar to that of the real citation networks.
Moreover, we use the generalized Poisson distribution to fit the curve in the forepart of
the modeled out-degree distribution, and get a good result (Figure 2a).
The interdisciplinary papers make the tail of the modeled out-degree distribution
fat (Step 4) (Figure 2a). Thus, in combination with the non interdisciplinary papers, the
out-degree distribution is

p(k+ = k) = (1 − r)pnon (k+ = k) + r f (k), (4)

where r expresses the proportion of interdisciplinary papers, and f (k) refers to the power-
law distribution mentioned in Step 4.

[Figure 3 panels: in-degree distribution vs. node in-degree k for (a) the modeled network, (b) Cit-HepTh and (c) Cit-HepPh; the generalized Poisson fit parameters (a, b) for the foreparts and the power-law fit parameters (γ, xmin) for the tails, together with R² and RMSE values, are shown in each panel.]

Figure 3. In-degree distribution. (a) the in-degree distribution of the modeled network; (b) the
in-degree distribution of Cit-HepTh; (c) the in-degree distribution of Cit-HepPh. The fitting func-
tions of their foreparts are f1 (k) = a(a + bk)k−1 e−a−bk /k!, and tails are f2 (k) = ck−γ (fitted by the
method in Ref ( [9])).

3. In-degree Distribution

The in-degree distributions of the real citation networks have been investigated with the
result showing that the curves in the forepart of the in-degree distributions can be well
fitted by the generalized Poisson distribution (Figures 3b,3c). Actually, the citations of
one paper are affected by the new papers of its authors, and the probability of one paper receiving a citation (i.e. being selected from among a large number of papers) is small and not equal for all papers. These are the conditions under which the generalized Poisson distribution can be applied. Besides, the in-degree distributions of the real citation networks have scale-free tails (Figures 3b, 3c), which are usually interpreted as a result of preferential attachment [2,3]. In this model, this phenomenon is caused by the highly cited papers with large influence zones. Now, an expression for the forepart and the tail of the in-degree distribution of the modeled network is derived to reveal the mathematical mechanism behind the phenomenon that our model generates a similar curve and scale-free tail (Figure 3a).
For the modeled paper i, it can receive citations from the papers located inside or
outside of its influence zone. Therefore, the expected in-degree of paper i with coordinate
(θi ,ti ) is

$$k^{-}(\theta_i, t_i) = \sum_{s=t_i+1}^{T} \frac{\beta(\theta_i)\,p}{t_i^{\alpha}}\, R(s)\,\delta + r\sum_{s=t_i+1}^{T} \frac{2}{s-1} \;\approx\; \frac{m\beta(\theta_i)p}{4\pi t_i^{\alpha}}\,(T^2 - t_i^2) + 2r\ln\frac{T-1}{t_i}. \qquad (5)$$

When ti is small, the first term in formula (5) is larger than the second, so k^−(θi, ti) ≈ mβ(θi)pT²/(4πti^α). The in-degree distribution in the large in-degree region, obtained by averaging the Poisson distribution, is

$$p(k^{-} = k) \approx \frac{1}{2\pi}\int_{0}^{2\pi}\!\!\int_{a_2/T^{\alpha}}^{a_2} k^{-(1+\frac{2}{\alpha})}\; \frac{\exp\!\Bigl(-\frac{(\tau - k + \frac{\alpha}{\alpha+2})^2}{2(k - \frac{\alpha}{\alpha+2})}\Bigr)}{\sqrt{2\pi\,(k - \frac{\alpha}{\alpha+2})}}\; d\tau\, d\theta_i, \qquad (6)$$

where a2 = mβ(θi)pT²/(4π) and τ = a2/ti^α. The Laplace approximation and Stirling's approximation are used in this derivation. It can be proven that the integral over τ is approximately independent of k. In this way, the modeled in-degree distribution in the large-k region has a scale-free tail with exponent 1 + 2/α.
When ti is large, the time derivative of the influence zone of paper i is considered: (β(θi)/ti^α)′ = −αβ(θi)ti^(−α−1) ≈ 0. It means that the influence zone of paper i in this

model is approximately a constant when ti is large. Hence, it is assumed that β (θi )/tiα ≈
D (D is a constant). The expected in-degree of paper i is k− (θi ,ti ) ≈ mDp(T 2 −ti2 )/4π +
2r ln((T − 1)/ti ). And thus the in-degree distribution in the small in-degree region is

$$p(k^{-} = k) \approx \frac{4c\pi}{mDpT^{2}}\,\frac{\bigl(\frac{mDp}{4\pi}(T-1)^{2}\bigr)^{k}\, e^{-\frac{mDp}{4\pi}(T-1)^{2}}}{k!} + \frac{(1-c)\,e^{-2\ln(T-1)}\,(2r\ln(T-1))^{k}\, e^{-2r\ln(T-1)}}{r\,k!}. \qquad (7)$$
It indicates that the in-degree distribution of the modeled network in the small in-degree
region is a mixture Poisson distribution, which is similar to that of the real citation net-
works. Moreover, the fitting shows that the in-degree distribution of the modeled network
in the small in-degree region can be well fitted by the generalized Poisson distribution
(Figure 3a).

4. Conclusion

A model of scientific citation networks with linearly growing node-increment is pro-


posed, in which the influence mechanism and the interdisciplinary citation mechanism
are involved. Under appropriate parameters, the formula of the modeled network’s in-
degree distribution is derived, and it shows a similar behavior to the empirical data in the
small in-degree region and a scale-free tail in the large in-degree region. Different from
most previous models that just study the forepart of the out-degree distribution of the
empirical data, this model also captures the fat tails. The model vividly characterizes the
academic influence power of papers by geometric zones, and reproduces the scale-free
tails of citation networks’ in-degree distributions by the papers’ inhomogeneous influ-
ence power. Therefore, it is believed that this model is a suitable geometric tool to study
the citation networks.

References

[1] YH Eom, S Fortunato, Characterizing and modeling citation dynamics. PLoS ONE 6 (2011): e24926.
[2] DJ Price, Networks of scientific papers. Science 149 (1965): 510-515.
[3] DJ Price, A General Theory of Bibliometric and Other Cumulative Advantage Processes. J Am Soc Inf
Sci Technol 27 (1976): 293.
[4] GJ Peterson, S Pressé, KA Dill, Nonuniversal power law scaling in the probability distribution of scien-
tific citations. Proc Natl Acad Sci USA 107 (2010): 16023-16027.
[5] MEJ Newman, Clustering and preferential attachment in growing networks. Phys Rev E 64 (2001):
025102.
[6] Z Xie, ZZ Ouyang, PY Zhang, DY Yi, DX Kong, Modeling the citation network by network cosmology.
PLoS ONE 10 (2015): e0120687.
[7] J Leskovec, J Kleinberg, C Faloutsos, Graph Evolution: Densification and Shrinking Diameters. ACM
TKDD 1 (2007): 2.
[8] J Gehrke, P Ginsparg, J Kleinberg, Overview of the 2003 KDD Cup. SIGKDD Explorations 5 (2003):
149-151.
[9] A Clauset, CR Shalizi, MEJ Newman, Power-law distributions in empirical data. SIAM Rev 51 (2009):
661-703.
612 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-612

Complex System in Scientific Knowledge


Zong-Lin XIE 1 , Zheng XIE, Jian-Ping LI and Xiao-Jun DUAN
National University of Defense Technology, Changsha, Hunan, 410073, China

Abstract. The research of complex systems, which mainly investigates how relationships among individual parts generate the collective behaviors of a system, is widespread across scientific fields. This paper aims to analyze the role of complex systems in scientific knowledge by applying co-word occurrence analysis to the corpus of papers published in PNAS in the years 1999-2012. Surprisingly, the papers containing both 'complex' and 'system' account for 47% of the total. The major topics of complex systems, such as complex network, emergence, complexity and uncertainty, also show growth trends. Some topic words in the social, biological, and physical sciences are selected by domain experts and are mapped into networks based on their co-occurrence frequencies. We find that 'complex' and 'system' are the core nodes in each network, which shows the importance of complex systems in every research field.

Keywords. complex system, complex network, system science, data mining

Introduction

Since the concept of general systems was first hinted at by von Bertalanffy in the 1920s,
the contents of systems science have covered a very wide range of topics, including:
human behavioral systems, social systems, mechanical systems, computer and network
systems, intelligent systems, simulation systems, biological systems, aerospace systems,
the earth system, commercial systems, administrative systems, etc[1]. The research of
complexity is one of the foci of systems science[2,3].
A complex system consists of interconnected or interwoven parts. To under-
stand the behavior of a complex system one must understand not only the behaviors of
the parts, but also how they act together to form the behavior of the whole[4]. Different
methods are established to resolve various kinds of complexity in different systems. Op-
erations research, information theory, control theory and cybernetics, dissipative struc-
tures, synergetics, complex networks and so on offer theories and methods to investigate
natural and social systems.
This paper aims to analyze the role of complex system in scientific knowledge. The
Proceedings of the National Academy of Sciences(PNAS, http://pnas.org) is an impor-
tant scientific journal and knowledge dataset that publishes highly cited research reports,
commentaries, reviews, perspectives and letters. A data set comprising the set of 52,025
papers published in the PNAS in the years 1999-2012 is used to achieve the aim. It has
1 Corresponding Author: Zong-Lin Xie, National University of Defense Technology, Changsha, Hunan,

410073, China; E-mail: 845640593@qq.com.



been analyzed and modeled by us to some extent [5]. Amazingly, the papers containing both 'complex' and 'system' account for 47% of the total. Research into the major topics of complex systems, such as complex network, emergence, complexity, and uncertainty, also shows similar upward trends.
Coverage in PNAS broadly spans the biological, physical, and social sciences, so we can analyze in detail the effect of complex systems on research topics in each of these sciences. Co-word occurrence analysis is a content analysis technique used here to identify the strength of association between topic words based on their co-occurrence in the same document. The resulting co-occurrence frequency matrix can be expressed as a weighted network. We find that the words 'complex' and 'system' are the core nodes in the co-word network of each science, which shows the important role of complex systems in every research field.

1. The growth trend of complex system in knowledge

The co-occurrence frequency is used to measure the strength of the relationship between words. K. Börner et al used this measurement to analyze the PNAS data from 1982-2001 and achieved interesting results [6]. In the corpus used here, the co-occurrence frequency of 'complex' and 'system' is 24,657 and has the same increasing trend as the number of papers (Figure 1(a)). This means that 'complex' and 'system' are closely related and are widely and frequently used in research. 'Model' and 'control' are two main topic words in complex systems and also in other research fields; their frequencies reach 37,432 and 36,117, respectively. The increasing trends of these words also follow that of the number of papers. It is clear that research into the major topics of complex systems, such as emergence, uncertainty and feedback, is strongly coupled and shows similar upward trends (Figure 2(a)).
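Such co-occurrence trends can be computed directly from the corpus; the Python sketch below counts, per year, the papers whose text contains both words of a pair, assuming the corpus is available as (year, text) records (an assumption of ours, with a tiny made-up corpus for demonstration).

from collections import Counter

def cooccurrence_trend(corpus, word_a, word_b):
    """Count, per year, the papers that mention both word_a and word_b.
    `corpus` is assumed to be an iterable of (year, text) pairs."""
    trend = Counter()
    for year, text in corpus:
        tokens = set(text.lower().split())
        if word_a in tokens and word_b in tokens:
            trend[year] += 1
    return dict(sorted(trend.items()))

corpus = [(1999, "A complex system of interacting agents"),
          (1999, "Protein folding dynamics"),
          (2000, "Control of a complex network system")]
print(cooccurrence_trend(corpus, "complex", "system"))   # {1999: 1, 2000: 1}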
Specifically, the research advances on complex systems have mostly appeared in the scientific area of complex networks. This indicates that many realistic complex systems can be investigated by abstracting them into network structures without specifying the functionalities, scales, and other characteristics of the original systems. Such theoretical generality allows researchers from different fields to employ a unified framework to investigate their respective networks. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of real-world networks such as computer networks and social networks. As A. L. Barabási said, data-based mathematical models of complex systems are offering a fresh perspective, rapidly developing into a new discipline: network science [7]. In the corpus, the papers containing both 'complex' and 'network' account for 13.5% of the total, and the frequency of this co-word pair grows at an accelerating rate (Figure 1(a)).
The network has become a part of nature, human life and technology. Understanding
of natural or technological systems is reflected in our ability to control them[8]. Hence
network controllability has become an important subject in systems science, which is concerned with the structural controllability of a network. We find that the frequency trend of the co-words 'complex' and 'network' is strongly coupled with, and shows a similar upward trend to, that of 'control' and 'network' (Figure 2(b)).
The research into those major topics of complex network, such as the small-world
property, scale-free property, clustering, and community structure, is strongly coupled
and shows similar upward trends(Figure 2(b)).

[Figure 1 panels (a) and (b): frequency vs. year; the legends include the number of papers, 'complex system' and 'complex network' in panel (a), and the number of papers, 'control' and 'model' in panel (b).]

Figure 1. Panel(a) shows the frequency trend of the co-occurrence of ‘complex’ and ‘system’ and that of
‘complex’ and ‘network’. Panel(b) shows the frequency trend of the words ‘model’ and ‘control’.

[Figure 2 panels (a) and (b): frequency vs. year; the legends include complexity, feedback, emergence, nonlinear and uncertainty in panel (a), and complex network, network control, network community, small world network and scale free network in panel (b).]

Figure 2. The frequency trend of some hot topics in complex systems (Panel (a)) and in complex networks (Panel (b)). It can be seen that, compared with network structure, network control has received more attention from researchers.

2. Complex system in main disciplines

In order to explain explicitly and jointly the general and fundamental postulates of different systems, be they natural or social, complex systems theory (or complexity) provides one of those unified theories that describe the intrinsic mechanisms underlying the systems and the interactions between these individual systems and their outside environments. In the field of complex systems, many methodologies, although derived from specific disciplines, offer a possible way to unify physical, biological and social systems.
In the corpus, there are 40,262 biological papers, which account for 77% of the total.
Meanwhile, PNAS recruits papers and publishes special features in the physical(8,997
papers in the corpus, including 3,647 papers in biophysics) and social sciences(1,192
papers in the corpus).
By the analysis above, the PNAS data can mainly be divided into three categories: biological, physical, and social sciences. Now we use the co-word technique to analyze the relationship between complex systems and the topics in those sciences. Firstly, we use the Natural Language Toolkit (NLTK, www.nltk.org) to build the wordlist for the corpus by selecting nouns and the words whose synsets contain nouns. Secondly, we divide most of the papers (46,766 papers) in the corpus into three overlapping clusters, i.e., biological, physical, and social sciences, based on the discipline tags of the papers given by the authors; the others are tagged as acknowledgment, letter, symposium, etc. Thirdly, for each paper cluster, we select the highly frequent words, namely those that occur in more than 10% of the papers in the cluster. Fourthly, the domain experts in each science select the most meaningful words as topic words based on their professional knowledge.
The resulting co-occurrence frequency matrix can be expressed as a weighted network. The layouts depicted in Figures 3, 4 and 5 are generated by the free software Gephi (www.gephi.github.io) using the force-directed algorithm proposed by Fruchterman and Reingold [9]. We find that the words 'complex' and 'system' are the core nodes in the co-word network of each science (Figures 3, 4 and 5), which reveals the importance of complex systems in every research field. In what follows, we briefly discuss the role of complex systems in each science.
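A minimal Python sketch of the co-word step described above, counting how often two topic words occur in the same paper and producing the weighted edge list that a tool such as Gephi can import; the data layout and the toy texts are assumptions of ours.

from itertools import combinations
from collections import Counter

def coword_edges(papers, topic_words):
    """Build a weighted co-occurrence edge list: weight(u, v) = number of papers
    whose text contains both topic words u and v."""
    weights = Counter()
    vocab = set(topic_words)
    for text in papers:
        present = sorted(vocab & set(text.lower().split()))
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1
    return weights

papers = ["The complex system shows emergent network behavior",
          "A social network model of complex interactions"]
for (u, v), w in coword_edges(papers, ["complex", "system", "network", "social"]).items():
    print(u, v, w)      # e.g. "complex network 2"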
Social complexity and its emergent properties are central recurring themes through-
out the historical development of social thought and the study of social change. The early founders of sociological theory all examined the exponential growth and increasing interrelatedness of social encounters and exchanges. This emphasis on inter-connectivity in social relationships and the emergence of new properties within society is found in theoretical thinking in multiple areas of sociology. Complexity in the social and behavioral sciences, referring mainly to complex systems, is found in the study of modern organizations and in management studies. The related objects of study in topics of social complexity are also shown in Figure 3.
social complexity is also shown in Figure 3.
[Figure 3: layout of the co-word network; 'complex' and 'system' appear as core nodes among topic words such as social, network, population, human, behavior, environment, policy, health and development.]

Figure 3. Co-word network of the highly frequent and meaningful words based on 1,192 social science papers
in the PNAS data(1999-2012).

We randomly select 10 papers from the 450 that belong to social science and contain 'complex' and 'system'. These studies concern the phenomenon of women's under-representation, the reasons for the Maya abandonments, collaboration in social networks, the phenomenon of territorial expansion, the emergence of segregation in social networks, climate change mitigation, emerging sign language, the factors affecting gene expression, the climate negotiations, and the dynamics of the relationship between two peoples.
The complexity of biological science lies not only in the emergence and evolution of organisms and species, but also in both the structure and the function of biological organisms, with emphasis placed on the complex interactions and the fundamental relations and relational patterns that are essential to life. A complete definition of complexity for individual organisms, species, ecosystems, biological evolution and the biosphere is still an ongoing issue. Specifically, the topics related to complex systems biology include, but are not limited to, DNA, structure and function, relations and evolution of organisms and species, interactions among species, evolution theories, self-reproduction, computational gene models, autopoiesis, protein folding, cell signaling, signal transduction networks, complex neural nets, and genetic networks. Many related keywords appear in Figure 4.

[Figure 4: layout of the co-word network; 'complex' and 'system' appear as core nodes among topic words such as protein, gene, cell, dna, virus, disease, brain, population and evolution.]
Figure 4. Co-word network of the highly frequent and meaningful words based on 40,262 biological science
papers in the PNAS data(1999-2012).

We randomly select 10 papers from the 19,515 that belong to biological science and contain 'complex' and 'system'. These studies concern genetic code expansion, the dynamics in an RNA virus, a disease treatment, the mechanism of DNA transcription, the effect of the expression of intracellular signaling molecules, the behavior of insects, changes in gene expression, the factors influencing bacterial growth, the neural processing of emotional faces, and yeast commitment complex formation.
In the physical world, classical dynamics presents a reversible and symmetric image. Nowadays, many methodologies from physics offer possible approaches to unify natural and social systems. Some examples include classical mechanics, friction, patterned ground, statistical mechanics, electrical networks, temperature, and convection. Most of the laws of physics themselves appear to be among the most fundamental principles in the universe, raising the question of what might be the most fundamental law of physics from which all others emerged.
The co-word network (Figure 5) shows the current interest in physical complexity. Significantly, as Figure 5 shows, some biological topics, such as protein, cell and molecule, are also hot topics in physical science, because they are the hot topics in biophysics, whose papers account for 42% of the total physical science papers in the corpus. We randomly select 10 papers from the 4,703 that belong to physical science and contain 'complex' and 'system'. These studies are related to the control of protein crystal nucleation, intermolecular forces, the permeation mechanism of a mammalian urea transporter, dynamic force spectroscopy of adhesion bonds, nucleic acid-triggered catalytic drug release, biological macromolecules, the lipid matrix and tensile force, protein-protein interactions, the coupling of protein and hydration-water dynamics in biological membranes, and molecular dynamics simulations.

[Figure 5: layout of the co-word network; 'complex' and 'system' appear as core nodes among topic words such as protein, molecule, cell, energy, electron, membrane, crystal, dna and dynamic.]

Figure 5. Co-word network of the highly frequent and meaningful words based on 8,997 physical science
papers (including 3,647 biophysics papers) in the PNAS data(1999-2012).

3. Conclusion

To show the role of the complex systems in scientific knowledge, we apply the co-word
occurrence analysis to the corpus of papers published in PNAS 1999-2013. Surprisingly,
the percentage of the papers containing ‘complex’ and ‘system’ contributes to the total
touches 47%. The papers containing ‘complex’ and ‘network’ account for 13.5% of the
total. The research about the major topics of complex systems, such as complex network,
emergence, complexity, uncertainty also shows upward trend. The papers in the corpus
mainly belong to the biological, physical, and social sciences. We further analyze the
effect of complex system on research topics in those sciences respectively. The frequent
and meaningful words are selected as topics by domain experts. The co-occurrence fre-
quency matrix of those topic words is expressed by a weighted network. We find that
‘complex’ and ‘system’ are the core nodes in these co-word networks of all sciences.
The phenomenon shows the importance of complex system in every research field.

References

[1] D. B. Kenneth, Fifty Years of Systems Science: Further Reflections, Systems Research and Behavioral
Science, 22 (2005), 355-361.
[2] D. Chu, R. Strand, R. Fjelland, Theories of complexity, Complexity, 8 (2003), 19-30.
[3] D. Harel, Statecharts: a visual formalism for complex systems, Science of Computer Programming, 8
(1987), 231-274.
[4] Z. Xie, X. Duan, Z. Ouyang, P Zhang, Quantitative Analysis of the Interdisciplinarity of Applied Math-
ematics, Plos One, 10 (2015), e0137424.
[5] Z. Xie, Z. Ouyang, J. Li, A geometric graph model for coauthorship networks, Journal of Informetrics,
10 (2016), 299-311.
[6] K. K. Mane, K. Börner, Mapping Topics and Topic Bursts in PNAS, Proceedings of the National Acade-
my of Sciences of the United States of America, 101 (2004), 5287-5290.
[7] A. L. Barabási, The network takeover, Nature Physics, 8 (2012), 14-16.
[8] Y. Y. Liu, J. J. Slotine, A. L. Barabási, Controllability of complex networks, Nature, 473 (2011), 167-
173.
[9] T. M. J. Fruchterman, E. M. Reingold, Graph drawing by force-directed placement, Software-Practice
and Experience, 21 (1991), 1129-1164.
618 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-618

Two-Wavelength Transport of Intensity


Equation for Phase Unwrapping
Cheng ZHANG a,1, Hong CHENG a, Chuan SHEN a, Fen ZHANG a, Wen-Xia BAO a, Sui WEI a, Chao HAN b, Jie FANG c, and Yun XIA d
a Key Laboratory of Intelligent Computing & Signal Processing, Anhui University, Hefei 230039, China
b College of Electrical Engineering, Anhui Polytechnic University, Wuhu, 241000, China
c College of Mechanical and Electronic Engineering, West Anhui University, Lu'an 237012, China
d Local Taxation Bureau of Anhui Province, Hefei, 230061, China

Abstract. A novel method based on the two-wavelength transport of intensity equation (TW-TIE) is proposed for phase unwrapping under conditions of large-step discontinuities. Starting from a wrapped phase obtained with an interferometric setup, illuminations at two different wavelengths are used to obtain two different intensity images, and then TW-TIE is applied for phase unwrapping. Numerical experiments are presented to demonstrate the feasibility and effectiveness of our proposed method.

Keywords. phase unwrapping, transport of intensity equation, two wavelength


illuminations, fast Fourier transform solvers

Introduction

In various types of optical metrology, such as digital holography [1], digital speckle pattern interferometry [2], X-ray imaging [3] and so on, an important problem is that the phase recovered from the intensity suffers from wrapping caused by the arctangent function, which restricts the obtained phases to the interval [-π, π], i.e. modulo 2π of the true phases. The inverse process, called phase unwrapping [4], aims to recover the missing integer multiples of 2π from the wrapped phase map and thereby remove the discontinuities in the phase map.
Many algorithms can be used to deal with the phase unwrapping problem, such as the Branch-cut algorithm [5], the quality-guided method [6], and so on. These methods can be categorized as spatial phase unwrapping (SPU). In general, for these SPU methods, an unwrapped phase is derived from a single wrapped phase map only, according to the neighborhood characteristics of the phase values of the pixels. However, these methods have a common limitation: they cannot deal with cases that contain discontinuities, and they fail in particular for large-step objects and isolated objects.
Compared to SPU methods, another type of phase unwrapping method, temporal phase unwrapping (TPU), has been reported [7], for the purpose of resolving the

Corresponding Author: Cheng ZHANG, Key Laboratory of Intelligent Computing & Signal Processing,
Anhui University, Hefei 230039, China; E-mail: question1996@163.com.

wrapped phase of objects with large-step discontinuities and separations. Usually, more than one wrapped phase map or an additional black/white pattern is utilized to provide additional information about the fringe patterns. The most competitive advantage of TPU is that it is capable of analyzing large-step objects and isolated objects, and noise uncertainties do not propagate during the unwrapping process.
In this paper, a novel phase unwrapping method based on the two-wavelength transport of intensity equation (TW-TIE) is proposed. Different from the classical transport of intensity equation (TIE), illuminations at two different wavelengths, rather than two different propagation distances, are used to obtain the two intensity images, which has the benefit that no precise translation is needed. Numerical demonstrations are given to show the effectiveness of our proposed method.

1. Principles

Under the paraxial approximation, the classical TIE can be derived from the parabolic wave equation [8]:

$$\nabla \cdot [I_z(\mathbf{r})\,\nabla \phi_z(\mathbf{r})] = -k\,\partial I_z(\mathbf{r})/\partial z \qquad (1)$$

where z denotes the optical axis along which the light propagates, r = (x, y) denotes the spatial coordinates in the x–y plane orthogonal to the z axis, the wave number k = 2π/λ where λ is the wavelength of the light source, ∇ = (∂x, ∂y) represents the 2D differential operator in the x–y plane, and I_z(r) and φ_z(r) represent the intensity and the phase function at the detector plane.
Many algorithms have been proposed to recover the phase function φ_z(r) from the measured longitudinal derivative ∂I/∂z, such as the fast Fourier transform method (FFT-TIE) [9,10], the Green function method (GF-TIE) [8], and so on. The most widely used method for solving the TIE is the FFT-TIE, first reported by Gureyev et al [9, 10]. Under the assumption that the object is a pure phase object with unit intensity at the focus plane, i.e. I_z(r) = 1 for all pixels, the simplified version of the standard TIE can be solved using an FFT-based Poisson solver of the form:
$$\phi(\mathbf{r}) = \mathcal{F}^{-1}\!\left\{ \frac{k\,\mathcal{F}\{\partial I_z(\mathbf{r})/\partial z\}}{4\pi^2\,(f_x^2 + f_y^2)} \right\} \qquad (2)$$
where fx and fy denote the 2D reciprocal-space coordinates in the Fourier domain, and ℱ{·} and ℱ⁻¹{·} are the 2D Fourier transform and its inverse, respectively. Note that the FFT-based solver in Eq. (2) has a very fast and memory-efficient implementation, since only two FFTs are required.
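A minimal numerical sketch of the FFT-based solver of Eq. (2), assuming a pure phase object (unit intensity) and a precomputed longitudinal derivative; the small regularization constant eps is our own addition to avoid division by zero at the zero frequency.

import numpy as np

def fft_tie_solve(dIdz, wavelength, pixel_size, eps=1e-9):
    """Recover the phase from the longitudinal intensity derivative via Eq. (2):
    phi = IFFT{ k * FFT{dI/dz} / [4*pi^2*(fx^2 + fy^2)] }, with I_z = 1 assumed."""
    k = 2.0 * np.pi / wavelength
    ny, nx = dIdz.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    denom = 4.0 * np.pi ** 2 * (FX ** 2 + FY ** 2) + eps   # regularized inverse Laplacian
    phi = np.fft.ifft2(k * np.fft.fft2(dIdz) / denom)
    return np.real(phi)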

2. Two-wavelength Transport of Intensity Equation for Phase Unwrapping

In digital holography and other interferometric setups, a wrapped phase φ_w(r) can be extracted from one or more interferograms, and an unwrapped phase map is then resolved from the wrapped phase φ_w(r) to eliminate the discontinuities. Firstly, an auxiliary optical field u_o(r; z = 0) is constructed as follows:

$$u_o(\mathbf{r}; z=0) = \exp[i\,\phi_w(\mathbf{r})] \qquad (3)$$
In the TIE, the longitudinal derivative can be calculated using the simple first-order finite difference formula:

$$\partial I_z(\mathbf{r})/\partial z \approx [I(z+\Delta z) - I(z)]/\Delta z \qquad (4)$$
For our purpose, two different intensities can be obtained through free-space propagation, which can be calculated using the angular spectrum method (ASM). The transfer function for the two wavelengths can be expressed as

$$H(f_x, f_y; z, \lambda_i) = \exp\!\Bigl[i\,\frac{2\pi}{\lambda_i}\, z\, \sqrt{1 - (\lambda_i f_x)^2 - (\lambda_i f_y)^2}\Bigr],\quad i = 1, 2 \qquad (5)$$
The transfer functions for the two wavelengths λ1 and λ2 are closely related; the relationship between them can be derived as follows:

$$H(f_x, f_y; z, \lambda_2) = \exp\!\Bigl[i\,\frac{2\pi}{\lambda_2}\, z\, \sqrt{1 - (\lambda_2 f_x)^2 - (\lambda_2 f_y)^2}\Bigr] \approx \exp\!\Bigl[i\,\frac{2\pi}{\lambda_1}\, z\Bigl(1 - \frac{\Delta\lambda_{21}}{\lambda_2}\Bigr) \sqrt{1 - (\lambda_1 f_x)^2 - (\lambda_1 f_y)^2}\Bigr] = H(f_x, f_y; z', \lambda_1) \qquad (6)$$

Here, z' = z(1 − Δλ21/λ2), where Δλ21 = λ2 − λ1. Under this approximation, the transfer function for wavelength λ2 is approximately equivalent to the transfer function H(fx, fy; z', λ1) for wavelength λ1 at a different distance z'. With some parameters neglected, the two intensity images corresponding to the two wavelengths λ1 and λ2 are obtained as follows:
obtained as follows:
$$I(\lambda_1) = \bigl|\mathcal{F}^{-1}\bigl(\mathcal{F}[u_0]\cdot H(z, \lambda_1)\bigr)\bigr|^2 \qquad (7)$$

$$I(\lambda_2) = \bigl|\mathcal{F}^{-1}\bigl(\mathcal{F}[u_0]\cdot H(z, \lambda_2)\bigr)\bigr|^2 \approx \bigl|\mathcal{F}^{-1}\bigl(\mathcal{F}[u_0]\cdot H(z', \lambda_1)\bigr)\bigr|^2 \qquad (8)$$

The intensity I(λ2) can be considered as the defocused intensity image for wavelength λ1 at the different distance z'. Then, according to Eq. (4), the longitudinal derivative for the two wavelengths can be expressed as:

$$\frac{\partial I_z(\mathbf{r})}{\partial z} \approx \frac{I(\lambda_1) - I(\lambda_2)}{\Delta z'} = \frac{I(z, \lambda_1) - I(z - z\,\Delta\lambda_{21}/\lambda_2,\; \lambda_1)}{\Delta z'} \qquad (9)$$

Once Eq. (9) is computed, the phase distribution can be recovered using various TIE solvers, i.e., the unwrapped phase is finally obtained.
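The propagation and finite-difference steps of Eqs. (5)–(9) can be sketched as follows, using the angular spectrum transfer function at each wavelength; this is our illustrative rendering under the same assumptions, with variable names of our own choosing. Its output can be fed to an FFT-based TIE solver such as the one sketched after Eq. (2).

import numpy as np

def angular_spectrum_intensity(u0, z, wavelength, pixel_size):
    """Propagate the field u0 a distance z with the ASM transfer function of Eq. (5)
    and return the resulting intensity |u_z|^2."""
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(1j * 2.0 * np.pi / wavelength * z * np.sqrt(np.maximum(arg, 0.0)))  # clamp evanescent part
    u_z = np.fft.ifft2(np.fft.fft2(u0) * H)
    return np.abs(u_z) ** 2

def tw_tie_derivative(phi_wrapped, z, lam1, lam2, pixel_size):
    """Eqs. (3), (7)-(9): build u0 = exp(i*phi_w), propagate it at the two wavelengths,
    and form the finite-difference estimate of dI/dz with dz' = z*(lam2 - lam1)/lam2."""
    u0 = np.exp(1j * phi_wrapped)
    I1 = angular_spectrum_intensity(u0, z, lam1, pixel_size)
    I2 = angular_spectrum_intensity(u0, z, lam2, pixel_size)
    dz_eff = z * (lam2 - lam1) / lam2
    return (I1 - I2) / dz_eff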

3. Numerical Simulation

In this section, numerical examples are presented to verify the feasibility and
effectiveness of TW-TIE for phase unwrapping, which is tested with two numerical
synthetic phase fields.
In the first simulation, a 128×128 noiseless unwrapped phase is generated using the Matlab peaks function as the true phase profile φ_true(r), shown in Figure 1(a). The phase range is about ten times larger than the wavelengths, and the transverse pixel size is dx × dy = 5 μm × 5 μm. In Figure 1(b), the wrapped phase φ_w(r) is obtained from the true phase by the wrapping (modulo 2π) operation. Next, the complex auxiliary field u_o(r) is constructed, and two intensity images corresponding to the two wavelengths are calculated using Eq. (7) and Eq. (8) and shown in Figure 1(c) and Figure 1(d), respectively. Then the longitudinal derivative ∂I_z(r)/∂z is computed using Eq. (9) and shown in Figure 1(e), and the final unwrapped phase is computed using Eq. (2), as shown in Figure 1(f). For comparison, the result of the Branch-Cut method is shown in Figure 1(g).

Figure 1. Simulation of the phase function generated from the 'peaks' function.



Figure 2. Simulation of phase function of standard 'Peppers' image.


In order to further illustrate the superiority of TW-TIE, a more complex phase is tested. In Figure 2(a), the original phase is generated from the standard 'Peppers' image with values in the interval [-4π, 4π], and the corresponding wrapped phase is shown in Figure 2(b). Figure 2(c) and Figure 2(d) show the two intensities I(λ1) and I(λ2) for the two wavelengths λ1 and λ2 at the propagation distance z = 10 μm. Figure 2(e) is the longitudinal derivative obtained from the finite difference of I(λ1) and I(λ2). Figure 2(f) and Figure 2(g) are the final unwrapped phases obtained using TW-TIE and the classical Branch-cut phase unwrapping method, respectively. Figure 2(h) gives one cross-section line of the two unwrapped phases compared with the original phase.
As observed in Figure 1 and Figure 2, our proposed TW-TIE method can successfully achieve phase unwrapping in both the simple and the complex case. In the simple case, both methods perform very well. However, in the more complex case, block artifacts can clearly be found in the result of the Branch-Cut method, owing to the complex structure of the phase function, whereas our proposed TW-TIE again performs very well.

4. Conclusion

A novel phase unwrapping method based on the two-wavelength transport of intensity equation is presented, and numerical simulations are given to illustrate its effectiveness. In comparison with the classical Branch-cut phase unwrapping method, the effectiveness of our proposed TW-TIE method is demonstrated, especially its superiority in the complex case. Further studies can be extended to obtaining more intensity images to enhance the quality of the unwrapped phase map.

Acknowledgements

This Research is partially supported by National Natural Science Foundation of China


(Grant Nos. 61301296, 61302179, 61377006, 61501001); Starting Research Fund of
Anhui University (No. 33190218); Natural Science Foundation of Anhui Province (Nos.
1508085MF121, 1608085QF161); Natural Science Project of Anhui Higher Education
Institutions of China (Grant Nos. KJ2016A029, KJ2015A114).

References

[1] D. Parshall, M. K. Kim. Digital holographic microscopy with dual-wavelength phase unwrapping.
Applied Optics, 45(2006), 451-459.
[2] B. Bhaduri, N. K. Mohan, M. P. Kothiyal, et al. Use of spatial phase shifting technique in digital speckle
pattern interferometry (DSPI) and digital shearography (DS). Optics Express, 14(2006): 11598-11607.
[3] A. Momose. Recent advances in X-ray phase imaging. Japanese Journal of Applied Physics, 44(2005):
6355-6367.
[4] J. C. Estrada, M. Servin, J. Vargas. 2D simultaneous phase unwrapping and filtering: A review and
comparison. Optics and Lasers in Engineering, 50(2012): 1026-1029.
[5] R. M. Goldstein, H. A. Zebker, C. L. Werner. Satellite radar interferometry: Two-dimensional phase unwrapping. Radio Science, 23(1988): 713-720.
[6] M. Zhao, L. Huang, Q. Zhang, et al. Quality-guided phase unwrapping technique: comparison of quality
maps and guiding strategies. Applied optics, 50(2011): 6214-6224.
[7] A. Davila, J. M. Huntley, C. Pallikarakis, et al. Simultaneous wavenumber measurement and coherence
detection using temporal phase unwrapping. Applied optics, 51(2012): 558-567.
[8] M. R. Teague. Deterministic phase retrieval: a Green’s function solution. JOSA, 73(1983): 1434-1441.
[9] T. E. Gureyev, K. A. Nugent. Rapid quantitative phase imaging using the transport of intensity equation.
Optics Communications, 133(1997): 339-346.
[10] D. Paganin, K. A. Nugent. Noninterferometric phase imaging with partially coherent light. Physical
review letters, 80(1998): 2586-2589.
624 Fuzzy Systems and Data Mining II
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-61499-722-1-624

A Study of Filtering Method for Accurate Indoor Positioning System Using Bluetooth Low Energy Beacons
Young Hyun JINa, Wonseob JANG a, Bin LIa, Soo Jeong KWONa, Sung Hoon LIMa
and Andy Kyung-yong YOONa,b,1
a YAP Company Inc., R&D Centre, Seoul, Korea
b Yonsei University, School of Electrical & Electronic Eng., Korea

Abstract. The fingerprinting technique is an essential element of an indoor positioning system (IPS). Common methods utilize Wi-Fi signals. However, because most Wi-Fi access points are pre-installed for other purposes, they are difficult to use fully for fingerprinting. Bluetooth beacon fingerprinting (BBF) can be used to overcome the drawbacks of Wi-Fi fingerprinting: it enables cost reduction and has the advantage of adapting flexibly to variations in the environment. This study determines whether it is possible to achieve an IPS by implementing BLE beacon-based fingerprinting, and presents an optimized method for building the fingerprinting data. In order to increase the accuracy, we apply filtering and a probabilistic smoothing technique for fluctuation correction to the necessary parts of the signal. The results of the study indicate that BLE beacon fingerprinting can be a replacement for Wi-Fi based fingerprinting.

Keywords. beacon, fingerprint, indoor positioning, Bluetooth Low Energy (BLE), radio map

Introduction

Bluetooth Low Energy (BLE) beacon technology refers to the protocol-based short-range wireless non-contact recognition technology in Bluetooth version 4.0. Recently, commercial BLE beacon devices such as iBeacon have been put to practical use, and several companies have developed indoor positioning systems (IPS) using a Bluetooth device infrastructure [1-2]. Among the ways to implement an IPS, Cell-ID, trilateration, and fingerprinting are the most widely known methods. The recently commercialized BLE beacon devices are easy to carry and have been used in many areas due to their low cost. Compared to existing similar systems, a BLE beacon indoor positioning system has the advantage of lower power consumption than Wi-Fi and wider coverage than RFID. In addition, compatibility with smartphones and various other devices is higher than with Zigbee.
The cost of a Bluetooth beacon IPS can be relatively high, mainly because the installation of new infrastructure is required, whereas Wi-Fi usually requires no additional

1 Corresponding Author: Andy Kyung-yong YOON, CTO, Professor, 623 Gangnamdae-ro, Seocho-gu, Seoul 06524, Korea; E-mail: xperado@yonsei.ac.kr.

installation in most cases. Also, the RSSI at a fixed position can fluctuate greatly with noise, depending on the environment. Recently, many studies have been performed to reduce the RSSI error; the most commonly applied methods are the recursive Bayesian algorithm [3], the Kalman filter [4], and signal weighting [5]. Position-tracking methods include TOA, TDOA, AOA, and RSSI. However, the schemes other than RSSI require antennas for position tracking and are therefore not suitable for location tracking with Bluetooth. On the other hand, RSSI does not require synchronization between devices, which makes it suitable for location tracking using BLE beacons [6]. This paper studies the accuracy of an IPS based on Kalman-filtered beacon fingerprinting, and also studies how updating the fingerprint database with a Gaussian filter affects the positioning accuracy.

1. Review of Related Studies

1.1. Positioning Technique

Trilateration and signal fingerprinting are the most widely known methods for implementing an IPS. The trilateration technique uses the geometry of the signals to determine the relative position of an object. Trilateration has the advantage that the position can easily be calculated geometrically, but it has the disadvantage of requiring additional filtering because of signal distortion [7]. The fingerprinting technique takes advantage of the RSSI pattern unique to a specific location. To track a location, RSSI is first collected in an offline phase to build a signal fingerprint database; then, in the online phase, the received signal pattern is compared with the collected fingerprint database to find the most similar signal pattern [8].

1.2. Kalman Filter

The Kalman filter is a probability-based inference method for estimating a continuous state. The filter removes noise from data measured over time and reduces the uncertainty contained in the measurements. It is used in computer vision, robotics, radar, and various other fields, and it is also used in studies of indoor positioning technology. In this study, the Kalman filter is applied to the signals received from the access points when constructing the fingerprint database.
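
As a concrete illustration of this idea (not the implementation used in the paper), the sketch below applies a scalar Kalman filter with a constant-level state model to a stream of RSSI readings; the process-noise and measurement-noise variances are assumed values that would have to be tuned.

```python
def kalman_filter_rssi(rssi_stream, process_var=1e-3, meas_var=4.0):
    """Smooth a sequence of RSSI readings with a scalar Kalman filter.

    Assumes the true RSSI at a fixed position is nearly constant, so the
    state-transition model is the identity plus small process noise.
    """
    estimate = rssi_stream[0]   # initial state estimate
    error_var = 1.0             # initial estimate variance (assumed)
    smoothed = []
    for z in rssi_stream:
        # Predict: state unchanged, uncertainty grows by the process noise.
        error_var += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1 - gain)
        smoothed.append(estimate)
    return smoothed

# Example: suppress fluctuations in raw readings from one beacon.
print(kalman_filter_rssi([-72, -70, -75, -68, -90, -71, -69]))
```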

1.3. Gaussian Smoothing

A Gaussian smoothing filter removes noise using a Gaussian distribution. The basic principle is to give the largest weight at the kernel centre and progressively less weight to its surroundings as the distance increases. Samples with high probability values are therefore retained while samples with low probability values are excluded. In the experiments, the fingerprint database is created by applying a Gaussian filter to the data collected in the offline training phase, and Gaussian interpolation is used to estimate signal data at locations where no collection took place [9].
626 Y.H. Jin et al. / A Study of Filtering Method for Accurate Indoor Positioning System

2. Proposed Method

2.1. Fingerprinting Technique

A Kalman filter is applied to the RSSI collected at each reference point (RP), and the fingerprint database is constructed from the average signal values. The beacon signal strengths collected from all RPs make up the fingerprint database: B_{m,n} denotes the average RSSI of the n-th beacon collected at the m-th RP. Eq. (1) shows the structure of the fingerprint database.

      | B_{1,1}  B_{2,1}  ...  B_{m,1} |
      | B_{1,2}  B_{2,2}  ...  B_{m,2} |
 FP = | B_{1,3}  B_{2,3}  ...  B_{m,3} |          (1)
      |   ...      ...    ...    ...   |
      | B_{1,n}  B_{2,n}  ...  B_{m,n} |

Eq. (2) describes the process for finding the Nearest Neighbor (NN). R_i denotes the average RSSI fingerprint at RP position i, a 9-dimensional vector, and T_k denotes the average RSSI of the Test Point (TP) at position k, also a 9-dimensional vector. The NN is the RP i that minimizes the Euclidean distance between TP and RP:

NN := argmin_i |R_i − T_k|,   i = 1, 2, 3, ..., 36,   (k = 1, 2, 3, ..., 16)          (2)

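To make Eqs. (1) and (2) concrete, the following sketch (an illustrative rewrite, not the authors' code) stores the averaged RSSI of the 9 beacons at each of the 36 RPs as rows of a matrix and returns the RP whose row is closest, in Euclidean distance, to a TP's 9-dimensional average RSSI vector; the random numbers only stand in for real measurements.

```python
import numpy as np

def nearest_reference_point(fingerprint_db, tp_vector):
    """fingerprint_db: (36, 9) array, row i = average RSSI of the 9 beacons at RP i.
    tp_vector: (9,) array, average RSSI measured at a Test Point.
    Returns the index of the nearest-neighbor RP (Eq. (2))."""
    distances = np.linalg.norm(fingerprint_db - tp_vector, axis=1)
    return int(np.argmin(distances))

# Toy usage with random data standing in for collected averages.
rng = np.random.default_rng(0)
db = rng.uniform(-90, -50, size=(36, 9))   # 36 RPs x 9 beacons
tp = rng.uniform(-90, -50, size=9)
print("estimated position = RP", nearest_reference_point(db, tp))
```
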
2.2. Gaussian Smoothing

First, the value of σ is determined in proportion to the distance from the kernel. Next, the signal data around the kernel needed for smoothing are obtained in accordance with σ. Smoothing is then applied to the RP signal in the kernel using the signal data computed in the previous step, and finally the fingerprint database is updated. The normal-distribution formula used for Gaussian smoothing is

f(x; μ, σ) = (1 / (σ √(2π))) exp( −(x − μ)² / (2σ²) )          (3)
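
The sketch below shows one possible form of this smoothing step for a single beacon's radio map laid out on a regular RP grid: each RP value is replaced by a normal-distribution-weighted average of its neighbours, following Eq. (3); the grid layout, σ, and the neighbourhood radius are assumptions, not parameters reported in the paper.

```python
import numpy as np

def gaussian_smooth_radio_map(rssi_grid, sigma=1.0, radius=2):
    """Smooth one beacon's RSSI radio map (RPs arranged on a regular grid)
    with Gaussian weights exp(-d^2 / (2*sigma^2)) over neighbours within `radius`."""
    rows, cols = rssi_grid.shape
    smoothed = np.empty_like(rssi_grid, dtype=float)
    for r in range(rows):
        for c in range(cols):
            acc, wsum = 0.0, 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w = np.exp(-(dr ** 2 + dc ** 2) / (2 * sigma ** 2))
                        acc += w * rssi_grid[rr, cc]
                        wsum += w
            smoothed[r, c] = acc / wsum
    return smoothed
```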

3. Design for Experiment

For the experiment, a beacon deployment layout was designed. There are 3 rows of 3 beacons, i.e. a total of 9 beacons, placed in a 14 m × 14 m indoor environment as shown in Figure 1, with a distance of 7 m between rows. Reference Points are located inside the 6 m × 6 m beacon area, and the shortest distance between RPs is about 2.33 m. Test Points are randomly selected inside the experiment area; at least 3 TPs should lie inside each of the 4 beacon squares shown in Figure 1. The device used to retrieve Bluetooth RSSI data must be identical at every RP, the duration of retrieval at every RP should be long enough to yield a substantial result, and the height of the collection device should be fixed. Scanned data may be filtered for noise reduction.
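
As a quick sanity check of this geometry, the sketch below lays out the 3×3 beacon grid with 7 m spacing over the 14 m × 14 m area; the coordinate origin and the illustrative 6×6 grid of reference points (chosen only because it is consistent with the 36 RPs of Eq. (2) and the ≈2.33 m spacing stated above) are assumptions rather than coordinates given in the paper.

```python
# 3 x 3 beacons, 7 m apart, covering the 14 m x 14 m area (origin assumed at one corner).
beacons = [(x, y) for y in (0.0, 7.0, 14.0) for x in (0.0, 7.0, 14.0)]

# Hypothetical 6 x 6 grid of reference points; the ~2.33 m pitch matches the
# shortest RP distance stated above, but the exact placement is an assumption.
pitch = 14.0 / 6.0                      # ~2.33 m
reference_points = [(pitch * (i + 0.5), pitch * (j + 0.5))
                    for j in range(6) for i in range(6)]

print(len(beacons), "beacons,", len(reference_points), "reference points,",
      "RP pitch = %.2f m" % pitch)
```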

Figure 1. Design of Beacon Deployment Layout

Figure 2. RP Deployment

4. Experiment

4.1. Experimental Setup

The experiment was conducted in a school gymnasium at Gwacheon Foreign Language High School, as shown in Figure 2. Beacons, RPs, and TPs were placed as described in Section 3. The 16 TPs, represented as blue stars, were randomly placed as shown in Figure 1. At each point, RSSI was collected for at least 20 seconds using the same smartphone. The Bluetooth beacons used for this experiment are the BS-LBB724 of YAP Company, a battery-powered BLE beacon (Bluetooth 4.0).

The beacons' TX power was set to 0 dBm (Lv6), and the collected Bluetooth RSSI ranged from -50 to -90 dBm. The mobile device was a Samsung Galaxy Note 5 smartphone running Android 5.1, with a 2.1 GHz ARM Cortex-A57 processor and 4 GB of RAM.

4.2. Offline Phase

• Kalman Filter


Beacon RSSI was collected for the fingerprinting. Since the received beacon RSSI is unstable, a Kalman filter was applied to stabilize it. As a result of the filtering, the fluctuation of the RSSI data decreased and suspicious RSSI values were adjusted, as shown in Figure 3.
Figure 3. Beacon RSSI measurement before and after applying the Kalman filter (RSSI versus count of incoming signal).

Figure 4. RSSI Distribution Heat map



• RSSI Distribution Analysis


A beacon RSSI distribution heat map was generated for each beacon using the data collected at all RPs, as shown in Figure 4. The 9 heat maps all cover the same area, and each map shows a different distribution according to the beacon's installation position. Each heat map shows the coverage of the corresponding beacon, with darker colors representing stronger RSSI. The characteristics of the collected RSSI were analyzed using the heat maps. It was confirmed that the decrease in signal strength is not proportional to the distance from the beacon device, and the data showed that RSSI is affected by time and by the spatial environment even when the source is the same.

4.3. Online Phase

In this experiment, the TP represents the location of the actual target. One signal data set collected at the TP was selected at random and compared with the previously collected RP data using the RSSI difference; the most closely matched RP was then reported as the positioning result. The success ratio of two different test cases was measured for the analysis. The first case tests whether a TP is located inside a cell of 3.2 m diameter formed by four neighboring RPs. For the second case we measured the NN success ratio of the TP, where the distance to the nearest RP is 0.6 to 0.85 m.
Lastly, we analyzed the collected data to measure the error distance and the positioning success rate. After the first experiment, the same experiment was conducted using Gaussian smoothing to analyze the effect of the filtered data, as shown in Figure 5 and Table 1. Table 1 shows the positioning results for each TP. The final results showed an average error distance of 1.74 m, and the NN accuracy rate and in-cell accuracy rate were 46.04% and 94.58%, respectively.
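
For completeness, here is one way the per-TP metrics reported in Table 1 could be computed from repeated positioning trials (an illustrative formulation under assumed inputs, not the authors' evaluation code): the error distance is the mean distance between the estimated RP and the true TP location, the NN hit rate is the fraction of trials that return the RP nearest to the TP, and the in-cell hit rate is the fraction of trials whose estimate is one of the four RPs surrounding the TP.

```python
import numpy as np

def tp_metrics(estimated_rps, tp_xy, rp_coords, cell_rps):
    """estimated_rps: RP indices returned by the matcher over repeated trials at one TP.
    tp_xy: (x, y) of the test point; rp_coords: array-like of RP coordinates;
    cell_rps: indices of the four RPs forming the cell around the TP."""
    rp_xy = np.asarray(rp_coords, dtype=float)
    tp_xy = np.asarray(tp_xy, dtype=float)
    nearest_rp = int(np.argmin(np.linalg.norm(rp_xy - tp_xy, axis=1)))
    error_distance = float(np.mean(
        [np.linalg.norm(rp_xy[i] - tp_xy) for i in estimated_rps]))
    nn_hit = 100.0 * np.mean([i == nearest_rp for i in estimated_rps])
    in_cell_hit = 100.0 * np.mean([i in cell_rps for i in estimated_rps])
    return error_distance, nn_hit, in_cell_hit
```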

Figure 5. Comparison of Positioning result before/after Gaussian Smoothing



Table 1. Result of Positioning at respective TPs using Gaussian Smoothing

Before Gaussian Smoothing After Gaussian Smoothing


TP Error Distance(m) NN hit (%) In-cell hit (%) Error Distance(m) NN hit (%) In-cell hit (%)
1 1.49 73.33 96.67 1.20 83.33 100.00
2 2.38 0.00 100.00 2.38 0.00 100.00
3 0.99 86.67 100.00 1.37 73.33 93.33
4 0.85 100.00 100.00 1.83 10.00 100.00
5 1.90 0.00 100.00 1.90 0.00 100.00
6 2.22 36.67 53.33 1.10 86.67 93.33
7 2.25 6.67 100.00 2.63 0.00 100.00
8 1.70 0.00 100.00 1.95 20.00 86.67
9 1.27 80.00 100.00 1.20 83.33 100.00
10 2.19 10.00 100.00 1.83 36.67 100.00
11 3.26 0.00 86.67 3.04 0.00 96.67
12 1.04 80.00 100.00 0.91 93.33 100.00
13 1.07 93.33 96.67 0.85 100.00 100.00
14 1.14 70.00 100.00 1.01 83.33 100.00
15 2.75 10.00 90.00 2.90 0.00 100.00
16 1.34 90.00 90.00 0.85 100.00 100.00
Average 1.74 46.04 94.58 1.68 48.125 98.13

Therefore, we conclude that the actual positioning error distance is about 1 m. There is a slight increase in the accuracy rate when Gaussian smoothing is applied. TP 6 in Table 1 shows a significant difference between the Gaussian-smoothed and non-smoothed data, confirming that Gaussian smoothing can compensate for unstable signals and increase positioning accuracy. We noticed that individual sample data sometimes showed a decrease in accuracy rate; however, the overall results showed an improvement. As mentioned previously, the distance from a TP to its nearest RP is 0.6 m to 0.85 m.

5. Conclusion

In this study, a beacon fingerprinting experiment using BLE signals for an IPS was performed as an alternative to Wi-Fi fingerprinting technology. Our results confirmed that it is possible to achieve positioning accuracy at the level of the RP spacing using only a simple NN algorithm. Thus, a simple and repetitive method is sufficient to construct a low-cost BLE IPS. A Kalman filter was used to reduce the measurement error caused by the instability of the BLE signal; in addition, the signal data were accumulated repeatedly and their average was used. The results confirmed the practical use of low-cost BLE for an IPS, and showed a slight improvement in both error distance and hit ratio when Gaussian smoothing was applied. The reliability of a fingerprint database is low when the data are collected in a short time at low cost, and in that case the improvement from the proposed method appears to be more pronounced. Gaussian smoothing can correct errors in exceptional signal samples by exploiting spatial continuity, but it can also reduce the distinctiveness of the signal pattern. Filtering approaches for multi-dimensional spatial data or image-processing techniques should be studied to explore further possibilities.

References

[1] K. Kamol, K. Prashant, Modeling of indoor positioning systems based on location fingerprinting, INFOCOM 2004, Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies 2 (2004), IEEE, 1012-1022.
[2] V. Gabriel, et al. Monitoring and detection platform to prevent anomalous situations in home care,
Sensors 14(2014), 9900-9921.
[3] R. Aswin N., et al. Accurate mobile robot localization in indoor environments using Bluetooth, Robotics
and Automation (ICRA), 2010 IEEE International Conference on (2010), IEEE, 4391-4396.
[4] P. Nazemzadeh, D. Fontanelli, D. Macii, T. Rizano, and L. Palopoli, Design and performance analysis
of an indoor position tracking technique for smart rollators, Indoor Positioning and Indoor Navigation
(IPIN) 2013 International Conference on, IEEE, 1–10.
[5] L.J. Liu and H.J. Ma, Study on wireless sensor network boundary localization based on rssi, Wireless
Communication and Sensor Network (WCSN), 2014 International Conference on, IEEE , 232–235.
[6] Y. Wang, X. Yang, Y. Zhao, Y. Liu, and L. Cuthbert, Bluetooth positioning using rssi and triangulation
methods, Consumer Communications and Networking Conference (CCNC), 2013 IEEE, 837–842.
[7] H. Guangjie, C. Deokjai, L. Wontaek, A novel reference node selection algorithm based on trilateration
for indoor sensor networks. In: Computer and Information Technology 2007, CIT 2007, 7th IEEE
International Conference on, IEEE, (2007), 1003-1008.
[8] Z. Peng, et al. Collaborative WiFi fingerprinting using sensor-based navigation on smartphones.
Sensors 15 (2015), 17534-17557.
[9] K. In-Cheol, C. Eun-Mi, O. Hui-Kyung, Gaussian Interpolation-Based Pedestrian Tracking in
Continuous Free Spaces, The KIPS Transactions: PartB 19 (2012), 177-182.
Fuzzy Systems and Data Mining II 633
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.

Subject Index
3D mesh retrieval 562 comment mining 220
access control 407 complex network 612
accident prediction 232 complex system 612
accident state vector 232 computation times 241
adaptive learning 507 computational method 3
aggregation operator 37 conditional probability
applicability 254 introduction 220
architecture 327, 458 cooperative relaying 312
area compensation 58 coordination game 579
artificial neural network 173 correlation analysis 274
association pair extraction 220 correlation-weighted 487
association rule mining 141, 429 course 397
attribute reduction 94 credit approval dataset 149
authentication client 407 curvature flow 562
auto-encoder 321 data analysis 254
autoregressive model 570 data center networks 167
avalanche 226 data mining 65, 149, 159, 173,
bankruptcy prediction 282 179, 194, 248, 347, 612
battery equalization 28 data mining visualization 353
beacon 624 data sequence 519
big data 65 database design 414
bipolar-valued fuzzy set 115 decision tree model 194
bitrate clustering recognition 466 deep learning 149
Bluetooth Low Energy (BLE) 624 delay 87
blurred security boundaries 208 detection probability 585
Bonferroni mean 37 differential evolution 438
BP network 274, 341 DoG 562
BP neural network 360 DRAM 173
BPNN 390 Dynamic Bayesian Networks
bushing type cable terminal 360 (DBN) 290
card transaction dataset 149 dynamic itemset mining 141
case 65 early prediction 519
centroid points 94 early warning index 341
chance constraints 121 Ebola 226
citation network 605 e-commerce 65
cloud reasoning 377 energy awareness 570
clustering 397 energy saving 167
clustering algorithm 466 energy utilization 28
cobweb 397 engineering 353
cognitive image 476 equalizer 549
collaborative filtering 397, 487 equivalent removing 549
collaborative target tracking 585 E-R model 414
color histogram 299 evaluation model 438

exact database 179 incompatible bipolarity 115


exact frequent itemset mining 179 indoor positioning 624
expectation-maximization (EM) 267 influence mechanism 605
face recognition 367 informal learning 507
factor analysis 254, 341 information system 353
fast Fourier transform solvers 618 infrared image 360
fault detection 173, 519 intelligent academic advising
feature extraction 360 system 397
filling valley balancing 28 intensity inhomogeneity 241
finance 102 interaction 507
fingerprint 624 interdisciplinary citation
fluid flow approximation 542 mechanism 605
FMADM 94 Internet of Things (IoT) 327, 458
forecasting 3 inter-timeslice arc 290
Fourier-Mellin transform 360 interval type-2 fuzzy set 45
fuzzy 87, 102 itemset mining 141
fuzzy C means clustering 81 item-weighted 487
fuzzy cognitive map (FCM) 476 ITS 542
fuzzy control 28, 108 IVHFS 37
fuzzy dual calculus 11 joint robot 108
fuzzy dual numbers 11 kernel extreme learning machine 282
fuzzy logic 22, 131 kernelled support matrix
fuzzy MEBN 71 machine 347
fuzzy rough sets 81 K-means 397
fuzzy set 115 k-preference integration 58
fuzzy tensors 51 LD 525
fuzzy time series 3 level set 241
generalization hybrid carrier lithium battery 28
modulation (GHCM) system 592 load forecasting 429
genetic algorithm (GA) 260, 377 local learning 248
global minimization 241 logistic regression 444
group key 208 low density parity check (LDPC) 306
hazardous chemical accidents 232 MANETs 525
heart rate variability 444 Markov blanket 248
heavy metal pollution 274 mathematical programming 11
hesitant fuzzy set 115 maximal information coefficient
HFSS 549 (MIC) 290
high dimension PCA 390 member indicator 208
hot words extraction 535 meta-investment strategy 194
hyperbolic ricci flow 452 meteorological 535
hypergraph 334 metro construction 341
hypergraph Laplacian 334 micro expressions 367
illustration image 299 microblog 535
image binarization 555 microstrip resonator 549
image classification 299 minimal strong component 51
image recognition 360 minimum cost flow 58
implicit feature 220 momentum 102
improved particle swarm multi-criteria decision making 45
optimization 282 multi-hop 420

multimode 549 phase unwrapping 618


multi-objective optimization 260 pollution index 274
multiple criteria decision making 115 Poor man’s Data Augmentation
multiple instruction multiple (PMDA) 267
data (MIMD) 306 popularity 487
multiple support thresholds 141 power efficiency 312
multi-pose 367 power law degree distribution 579
M-WFRFT 592 power-law distribution 605
National Geographic Conditions preference order 45
Monitoring 377 priority 167
network coding 598 probabilistic frequent itemset
network evolution 579 mining 179
network relation map (NRM) 476 probabilistic frequent itemset 159
network reviews 220 probabilistic sensing model 585
networked systems 87 probability distribution 3
neural network 321 projection pursuit 438
Next Generation Network propagation of non-stationary
(NGN) 327 waves 353
node mobility 420 PR-OWL 71
non-photorealistic rendering 555 pseudonym 208
novelty 487 PSNR 466
numerical method 353 public opinion 535
old street 476 QoS 525
ontology 507 quality prediction 390
ontology language 71 quantitative concept lattice 429
ontology network 507 quantum bee colony
ontology reasoning 414 optimization 312
open system interconnection radio map 624
(OSI) 327 radius agency 407
opportunistic network 598 radius EAPOL 802.1x 407
optimization 11, 452 radon transform 360
optimization method 487 random forests 444
orthogonal frequency division recommendation 507
multiplexing (OFDM) 267 recommender system 321
oscillation period 51 regular double Stone algebra 501
packets control 598 regular ternary class 501
parallel computing 377 relational semantics 131
parameter tuning 282 relay selection 312
partial FFT demodulation 592 representation 501
particle swarm optimization 438 residual space 367
PatientsLikeMe 208 return voltage 81
pattern recognition 22 reversed hash key chain 208
PEPA 542 Ricci flow 452
performance awareness 570 robust optimization 121
performance degradation 186 rough sets 94
performance evaluation 542 routing 167
personalization 507 routing protocol 525
personnel optimization row-sparse 334
scheduling 377 sample entropy 186

sample self-representation 334 time synchronization 420


sampling 159 t-norm (based) logics 131
sandpile model 226 TOPSIS 45
sector rotation theory 194 tourism experience 476
segmentation 535, 562 transport of intensity equation 618
semantic web 71, 414 trapezoidal fuzzy number 94
(set-theoretic) Kripke-style Trello 208
semantics 131 two wavelength illuminations 618
signed distance 58 TWTA 549
simple additive weighting 45 uncertain classification 121
simulated annealing 377 uncertain database 159, 179
simulation 108 unrestricted Bayesian classifier 248
SINR 525 user modelling 507
sliding-mode control 108 UWSNs 420
smart grid 429 video frame quality 466
spectral clustering 334 virtual image 367
sports gambling markets 22 virtual machine relocation 570
sports metric forecasting 22 virtualized security defense
stock 102 system 208
stock selection model 194 VR/AR 208
stress recognition 444 weapon-target allocation 260
structure learning 290 WEKA 397
style 299 wireless sensor
substructural logic 131 network 452, 458, 585
support vector data description 186 wireless video 466
support vector WLAN 306
machine 121, 232, 347 woodcut rendering 555
system science 612 worldwide interoperability
tax inspection cases-choice 347 for microwave access
technical analysis 102 (WIMAX) 306
telemetry big data 186 yield analysis 173
Fuzzy Systems and Data Mining II 637
S.-L. Sun et al. (Eds.)
IOS Press, 2016
© 2016 The authors and IOS Press. All rights reserved.

Author Index
Abdullah, L. 45 Gao, Z.-W. 327
Abuzayed, N. 141 Guo, Z.-B. 367
Akhmetova, Z. 353 Han, C. 618
Bao, W.-X. 618 Han, Y. 115
Boranbayev, S. 353 Han, Y.-H. 429
Burgos, D. 507 Han, Z.-Z. 232
Cai, J.-D. 81 He, D. 267
Cao, H.-Y. 220 He, D.-H. 81
Cao, Q. 466 He, L.-M. 194
Cao, Q.-L. 121 He, W. 327, 458
Chang, C.-W. 173 He, X.-R. 37
Chen, H.-L. 282 Hsu, Y.-C. 22
Chen, H.-Y. 407 Hu, K.-W. 28
Chen, Lin 535 Hu, Y. 194
Chen, Ling 51 Huang, F. 65
Chen, N.-J. 570 Huang, N.-J. 121
Chen, S. 115 Huang, P. 94
Chen, S.-D. 194 Jang, W. 624
Chen, W. 458 Ji, Y. 28
Chen, X. 542 Jiang, H.-Y. 194
Chen, X.-H. 267 Jiao, J.-K. 598
Chen, Yan-Hui 444 Jin, W.-D. 438
Chen, Yao-Hua 254 Jin, Y.H. 624
Chen, Y.-W. 290 Jing, H. 306
Cheng, D.-B. 334 Jing, J.-H. 260
Cheng, H. 618 Kamal, C.W.R.A.C.W. 45
Chi, H.-X. 487 Kang, X. 186
Chi, J.-R. 108 Kita, K. 299
Cosenza, C.A.N. 11 Krykhtine, F. 11
Dai, M. 444 Kumar, S. 3
Deng, Z.-Y. 334 Kuo, C.-L. 476
Ding, J. 542 Kwon, S.J. 624
Dong, E.-M. 579, 605 Lai, F.-G. 312
Du, J. 377 Li, B. 624
Duan, X.-J. 612 Li, D. 71
El Moudani, W. 11 Li, G.-L. 290
Ergenç, B. 141 Li, H.-F. 159, 179, 241
Fan, X.-G. 585 Li, J.-P. 579, 605, 612
Fang, J. 618 Li, L. 71
Fang, W.-D. 327, 458 Li, M. 525
Fujisawa, A. 299 Li, M.-H. 248
Gangwar, S.S. 3 Li, Q. 282
Gao, J.-J. 420 Li, S. 466

Li, Xiang 570 Qiu, X. 226, 452


Li, Xiaosong 397 Ran, F. 28
Li, X.-F. 267 Ren, F. 535
Li, Yong 592 Sarsenov, B. 353
Li, You 519 Sha, X.-J. 592
Li, Y.-G. 334 Shan, L.-H. 327, 458
Li, Y.-T. 312 Shang, Z.-J. 312
Lim, S.H. 624 She, J.-H. 347
Lin, C.-L. 476 Shen, C. 618
Lin, S.-Y. 173 Shen, L.-M. 282
Lin, Y.-M. 519 Shen, X.-H. 420
Liu, G.-Q. 241 Sheng, X.-P. 274
Liu, H. 429 Shi, W.-J. 466
Liu, J. 414 Shi, Z.-K. 562
Liu, J.-F. 487 Song, X. 65
Liu, K.-W. 232 Song, Z.-Q. 592
Liu, Qi 605 Sun, L.-F. 598
Liu, Qian 377 Sun, W.-W. 274
Liu, Qin 226 Sun, Y.-J. 466
Liu, X.-L. 407 Tallón-Ballesteros, A.J. v
Liu, Z.-Y. 71 Tang, K.-M. 226, 452
Lu, L.-Z. 51 Tang, W.-Z. 321
Lu, S.-C. 58 Tao, S. 377
Lu, Y.-N. 562 Tao, Y.-Z. 220
Luo, C.-W. 501 Tian, H.-D. 186
Luo, Q. 115 Wan, J.-H. 232
Lv, K. 598 Wan, X.-Y. 220
Lv, Z.-Y. 94 Wang, B. 585
Ma, B.-L. 341 Wang, C. 487
Ma, C. 555 Wang, C.-K. 226
Ma, F.-Y. 458 Wang, H.-D. 341
Ma, J. 585 Wang, H.-Y. 420
Matsumoto, K. 299 Wang, J. 167
Meng, S. 37 Wang, J.-Y. 487
Mora-Camino, F.A.C. 11 Wang, K.-J. 282
Niimi, A. 149 Wang, L.-M. 248
Niu, H. 108 Wang, M.-J. 282
Niu, H.-Q. 360 Wang, M.-X. 549
Ouyang, D.-T. 414 Wang, R. 542
Pan, D.-R. 598 Wang, S.-D. 585
Park, H.-A. 208 Wang, S.-W. 555
Peachavanish, R. 102 Wang, X.-D. 321
Peng, L.-X. 254 Wang, X.-F. 58
Pesayanavin, R. 570 Wang, Yang 367
Pi, B.-K. 555 Wang, Yong 220
Pi, D.-C. 186 Wang, Yue 159, 179
Qian, L. 167 Wang, Y.-L. 321
Qiao, Y. 87 Wang, Z.-F. 71
Qiao, Z.-T. 260 Wei, S. 618

Wen, S.-Q. 487 Zhang, C. 618


Wu, J.-Z. 360 Zhang, F. 618
Wu, L.-Q. 452 Zhang, G.-B. 525
Wu, Y.-J. 321 Zhang, H. 360
Wu, Y.-L. 525 Zhang, H.-Q. 555
Wu, Y.-Y. 37 Zhang, H.-Y. 167
Xia, Y. 618 Zhang, M.-N. 420
Xiao, Y.-B. 121 Zhang, N. 159
Xie, S. 81 Zhang, S.-C. 334
Xie, Y. 254 Zhang, X.-Y. 94
Xie, Z. 579, 605, 612 Zhang, Y.-C. 341
Xie, Z.-L. 612 Zhang, Y.-J. 159
Xing, L.-N. 290 Zhang, Y.-S. 260
Xu, J. 360 Zhang, Z. 562
Xu, X.-X. 414 Zhang, Z.-H. 194
Xu, Y.-C. 570 Zhao, J.-W. 28
Yan, Y.-Y. 367 Zhao, L.-Y. 327
Yang, B.-Z. 121 Zhao, Y. 549
Yang, C.-R. 535 Zheng, G. 444
Yang, E. 131 Zheng, G.-Q. 487
Yang, H. 226, 452 Zheng, L.-W. 94
Yang, J. 108 Zheng, W.-J. 360
Yang, Y.-J. 585 Zheng, Z.-Y. 71
Yao, H.-J. 87 Zhou, F. 186
Ye, Y.-X. 414 Zhou, H. 390
Yi, Q. 306 Zhou, J.-C. 167
Yoon, A.K.-y. 624 Zhou, W. 37
Yoshida, M. 299 Zhou, X. 377
Yu, D.-J. 37 Zhu, B. 438
Yu, K.-M. 390 Zhu, B.-L. 282
Yuan, F.-S. 87 Zhuo, J. 347
Yuan, H.-Y. 367 Zhuo, Z.-F. 585
Yuan, T.-W. 562 Zhuzbayev, S. 353
Zeng, Q.-M. 81 Zuo, H.-W. 466
Zhang, A.-H. 549
