
Design on Web Services Similarity Calculation Framework
Guisheng Yin, Xiaohui Cui, Yuxin Dong
College of Computer Science and Technology
Harbin Engineering University
Harbin, China
yinguisheng@hrbeu.edu.cn, cuixiaohui@yeah.net, dongyuxin@hrbeu.edu.cn

Abstract: Semantic Similarity Calculation (SSC) is a main factor affecting the accuracy of Web Services Discovery. Based on recent research and on the requirements for nearness to the user request and effective algorithm response, this paper introduces the Web Services Similarity Calculation Framework (WS-SCF). The framework extracts semantic information from traditional Web Services and integrates an SSC algorithm to produce a semantic relation matrix. Experiments show that the matrix reflects domain features and satisfies user requests, and that the framework can be widely applied in the Web Service domain.

Keywords: Web Services; Ontology; Semantic Similarity; Extension Semantic Library

I. INTRODUCTION

A Web Service is a self-contained, self-described and modularized application that can be described, published, searched and invoked on the Internet. Web Services therefore not only extend the functions of applications but also supply software dynamically [1]. In recent years a great number of Web Services have emerged, and finding the services that satisfy a User Service Request (USR) has become a research highlight in Web Service Discovery, while Semantic Similarity Calculation (SSC) between the USR and Web Services is the bottleneck. Accordingly, domestic and overseas scholars have proposed solutions from various aspects.

Keyword-based SSC algorithms: using UDDI [2] to express the user request, these algorithms use keyword frequency to calculate similarity. They lack flexibility and cannot adequately meet the requirements of user requests.

Semantics-based SSC algorithms: using ontology concepts to express the user request, for instance with OWL-S [3], these algorithms use the semantic similarity of ontology concepts to match services. At present, SSC for Web Services mostly draws on traditional ontology similarity research. One approach introduces Semantic Attribute Density (SAD) to measure the similarity relation [4], which improves on SSC that relies merely on semantic distance. However, the way of searching semantic attributes and the calculation are not described in detail, and the lack of an integrated framework to support the whole process makes it hard to apply in practice. Another approach incorporates Domain Information Content (DIC) into the weights of directed edges as a basis for optimizing the semantic distance between ontology concepts in the semantic tree. Since the process uses domain-related training texts, every semantic node reflects domain features well [5, 6]. However, because of the huge number of semantic nodes, the closer a node is to the root of the semantic tree, the more complex the calculation of total Information Content becomes, which makes computational complexity hard to control. In addition, [7] considers different semantic types of directed edges, but distinguishing semantic types is still an open problem.

According to the above analysis, the problems of SSC are computational complexity, ambiguous semantic relations, low flexibility and the lack of a supporting framework, which make existing methods impractical to apply to Web Services. Considering the importance of SSC, we first propose the Web Services Similarity Calculation Framework (WS-SCF). Then we construct the Web Service Behavior (WSB) model and the User Service Request (USR) model, and adopt an asynchronous mode to generate the Semantic Web Service Library and the Extension Semantic Library. Finally, we design and implement an algorithm to calculate the semantic relation matrix between USR and WSB. Experiments indicate that the framework can be applied directly to Web Services and improves the efficiency of subsequent Web Service Discovery.

II. WEB SERVICES SIMILARITY CALCULATION FRAMEWORK

The framework consists of two main phases: Services Register and Similarity Calculation. Fig. 1 shows the components of the framework in detail.

Services Register: This phase extracts the semantic behavior of Web Services. Since present Web Services all adopt the industry standard WSDL to describe themselves, a Web Service Fetch Model is needed to obtain the WSDL of each Web Service. The Semantic Information Analysis Model then obtains ontology concepts as the index of each specific Web Service to generate the Semantic Web Service Library.
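The Services Register phase described above can be illustrated with a minimal sketch, assuming a WSDL 1.1 document and the Python standard library. The function name `extract_behaviors`, the sample document and the identity mapping from message names to ontology concepts are hypothetical choices made for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace; the "wsdl" prefix is only a local alias for the queries.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical sample document, used only to make the sketch runnable.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:bookshop" name="BookShop">
  <message name="BookTitle"/>
  <message name="BookPrice"/>
  <portType name="BookShopPort">
    <operation name="getPrice">
      <input message="tns:BookTitle"/>
      <output message="tns:BookPrice"/>
    </operation>
  </portType>
</definitions>"""


def local_name(qname):
    """Drop the namespace prefix of a QName such as 'tns:BookTitle'."""
    return (qname or "").split(":")[-1]


def extract_behaviors(wsdl_text):
    """Return one record per operation: an id plus input and output names.

    Assumption: each message name is used directly as an ontology concept.
    """
    root = ET.fromstring(wsdl_text)
    behaviors = []
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            inputs = [local_name(e.get("message"))
                      for e in op.findall("wsdl:input", WSDL_NS)]
            outputs = [local_name(e.get("message"))
                       for e in op.findall("wsdl:output", WSDL_NS)]
            behaviors.append({
                "id": f"{port_type.get('name')}.{op.get('name')}",
                "inputs": inputs,
                "outputs": outputs,
            })
    return behaviors


behaviors = extract_behaviors(SAMPLE_WSDL)
```

Each record plays the role of one index entry in the Semantic Web Service Library. A real implementation would also need to fetch the WSDL and map names onto concepts of an ontology such as WordNet©, which this sketch leaves out.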

978-1-4244-5895-0/10/$26.00 ©2010 IEEE

Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 13,2010 at 15:46:09 UTC from IEEE Xplore. Restrictions apply.
[Figure 1: block diagram of the two phases. Services Register: Traditional Web Service Library, Web Service Fetch, WSDL, Web Service Semantic Information Analysis, Web Service Behavior Ontology, Semantic Web Service Library. Similarity Calculation: User Service Request, Request Ontology, Ontology Library, Ontology Fetch, Ontology Extension, Extension Semantic Library, Web Service Behavior Ontology, Semantic Similarity Calculation, Semantic Relation Matrix.]

Figure 1 Web Services Similarity Calculation Framework


Similarity Calculation: This phase generates the semantic relation matrix between USR and WSB. By utilizing training texts and the Ontology Extension Model, it attaches domain Information Content, out-degree and depth to each node in the semantic tree to generate the Extension Semantic Library. To increase the domain-nearness of the ontology in the Extension Semantic Library, the amount of training text can be adjusted dynamically. The Ontology Extension algorithm is presented in full in the introduction of the Ontology Extension Model.

The Semantic Similarity Calculation Model first obtains the WSB and the USR, which can be obtained from the Semantic Web Service Library and the Extension Semantic Library respectively. It then generates the semantic relation matrix R that can be applied in Web Service Discovery. Because the Extension Semantic Library has already been generated, this model has lower computational complexity and only performs simple calculations on semantic nodes, which makes it easy to implement and adopt.

III. SEMANTIC SIMILARITY MODEL

A. Extension Ontology Concept Model

The Ontology Extension algorithm generates an Extension Ontology Concept, containing Information Content, depth and out-degree, for each node in the semantic tree. Definition 1 gives the details of the Extension Ontology Concept.

Definition 1: An Extension Ontology Concept is a tuple of 5 members, C = <N, Dep, IC, Deg, Fat>, in which N, Dep, IC, Deg and Fat stand for the Name, Depth, Information Content, Out-degree and Father of a node in the semantic tree T respectively.

B. Web Service Behavior Model

The operations of a Web Service depict its behaviors from different aspects, which means that operations can be used as the foundation for constructing the Web Service Behavior Model.

Definition 2: A Web Service Behavior (WSB) is a tuple of 3 members, p = <ID, I, O>, where I = {i_1, i_2, i_3, …, i_n} and O = {o_1, o_2, o_3, …, o_n}, and for ∀i∈I, o∈O: i, o∈T. In the tuple, ID, I and O stand for the Identity Number, the input ontology set and the output ontology set respectively.

The Semantic Information Analysis Model adopts the following rules to obtain each member of the WSB Model. Each concept in the input ontology set maps directly to the NAME element of the MESSAGE elements carried by the portType element of the corresponding operation in the WSDL. For each concept in the output ontology set, the acquisition rule is the same as for the input ontology set, except for the direction of the portType element.

C. User Service Request Model

The service request provided by the user is modeled so as to maintain consistency between USR and WSB, which satisfies the basic need of WS-SCF.

Definition 3: A User Service Request (USR) r is a tuple of 2 members, r = <I_r, O_r>, where I_r = {i_r1, i_r2, …, i_rm} and O_r = {o_r1, o_r2, …, o_rn}, and for ∀i_r∈I_r, o_r∈O_r: i_r, o_r∈T. In the tuple, I_r stands for the input ontology set provided by the user when invoking Web Services, while O_r stands for the output ontology set the user expects after invoking Web Services.

IV. SEMANTIC SIMILARITY CALCULATION ON WEB SERVICE

According to the design of WS-SCF, the similarity

between USR and WSB is determined by the corresponding elements in the input ontology sets and output ontology sets of the above two tuples. Using WordNet© as the elementary ontology library, we first analyze the correctness of SSC based on semantic distance, then introduce impact factors into the semantic distance, and finally design and implement an SSC algorithm based on the Extension Semantic Library.

A. Similarity Calculation based on semantic distance

SSC is frequently measured by semantic distance, which depends on an established semantic tree such as WordNet©. The semantic tree is built on the following 2 assumptions: (1) the closer a node is to the root, the more abstract the meaning it expresses; (2) the meaning of a father node covers the meaning of its children.

The process of SSC maps different ontology concepts to nodes in the semantic tree, and similarity is determined by the shortest path between nodes. Therefore, semantic similarity and semantic distance are in a non-linear inverse relation, as described by (1).

SemSim(C_i, C_j) = \frac{\alpha}{Dist(C_i, C_j)^2 + \alpha}    (1)

Dist(C_i, C_j) stands for the distance between concepts C_i and C_j, while α is a positive integer that adjusts the convergence rate of the function SemSim. The correctness of (1) is guaranteed by Theorem 1.

Theorem 1: For ∀C_i, C_j, C_k, if Dist(C_i, C_j) > Dist(C_j, C_k) then SemSim(C_i, C_j) < SemSim(C_j, C_k). In particular, when i = j = k, SemSim(C_i, C_j) = SemSim(C_j, C_k) = 1.

Theorem 1 can be proved from the monotonicity of composite functions. The value range of (1) is (0, 1] when Dist(C_j, C_k) ∈ [0, ∞). Based on the idea that an arbitrary interval [a, b) on the number axis can be mapped injectively onto [0, 1), we introduce impact factors into the SSC algorithm so that Dist(C_j, C_k) ∈ [0, 1).

B. Impact factor for semantic distance

Semantic distance can be transformed into the sum of the edge weights along the path between two nodes in the semantic tree. The edge weight should take into account the Information Content, out-degree and depth of the end nodes.

1) Depth of end nodes

In terms of construction assumption (1) of the semantic tree, node depth and degree of abstraction are inversely related. The deeper a semantic node is located, the smaller the edge weight. The edge weight in the semantic tree can be calculated by (2).

depthWeight = e^{-dep}    (2)

In (2), the node depth can be obtained from the Extension Semantic Library, and depthWeight ∈ (0, 1).

2) Out-degree of end nodes

In terms of construction assumption (2) of the semantic tree, the edge weight is inversely related to the out-degree of nodes and is calculated by (3).

degreeWeight = |N| - \frac{deg}{\sum_{n_t \in N} n_t.deg}    (3)

In (3), N = path(C_i, C_j) \ {C_i, C_j}, and the out-degree can be obtained from the Extension Semantic Library. path(C_i, C_j) denotes the nodes on the shortest path between semantic nodes C_i and C_j. degreeWeight ∈ (0, 1).

3) Information Content of end nodes

To enhance the domain-nearness of semantic nodes, we use word frequency to calculate the Information Content of ontology concepts. The greater the Information Content of an ontology concept, the greater the weight of the edge. In other words, the edge weight changes in the same direction as the Information Content. The Information Content weight can be calculated by (4).

infocontWeight = \arctan^{-1}(IC)    (4)

In (4), IC = -\log(P(n_i)), where P(n_i) is the word frequency of concept n_i in the training text. infocontWeight ∈ (0, 1).

C. Semantic Similarity Calculation Algorithm

Based on the output identifier, the SSC algorithm outputs the semantic relation matrix R to describe the matching level between USR and WSB. Each element r_ij in R stands for the atomic semantic similarity between ontology concept i in the ontology set of the USR and ontology concept j in the ontology set of the WSB. Procedure SimCal describes the steps that generate the semantic relation matrix R.

Procedure SimCal
Input: r = <I_r, O_r>, p = <ID, I, O> and flag. Output: R = {r_ij}
1. if(flag == 0) do
2.     sour = r.Or; targ = p.O;
3. else targ = r.Ir; sour = p.I;
4. if(|sour| > |targ|) return NULL;
5. Set R = (sour)^T × targ
6. foreach(rij in R) do
7.     path = GetPath(sour[i], targ[j]);
8.     ComFat = path.CommonFather;
9.     totalWeight, index, totalDegree = 0;
10.    foreach (nt in path) do
11.        if(nt == ComFat) break; edgeWeight = 0; Set fat = nt.fat;
12.        depthWeight = e^{-fat.dep}; degreeWeight = fat.deg;
13.        infocontWeight = arctan^{-1}(fat.IC);
14.        totalDegree += degreeWeight;
15.        edgeWeight = (depthWeight + degreeWeight + infocontWeight) / 3;
16.        totalWeight += edgeWeight; index++;
17.    totalWeight = index - totalWeight/totalDegree
18.    Set rij = α / ((totalWeight)^2 + α)
19. return R
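The ideas behind (1), (2) and Procedure SimCal can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the out-degree weight is taken as 1/deg (a plain inverse relation standing in for (3)), the Information Content weight is taken as (2/π)·arctan(IC) so that it stays in (0, 1) (one possible reading of (4)), and the normalisation in row 17 is omitted. The tree, its IC values and all identifiers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """Extension Ontology Concept <N, Dep, IC, Deg, Fat> from Definition 1."""
    name: str
    dep: int
    ic: float
    deg: int
    fat: Optional["Concept"] = None


def ancestors(c):
    """Return the chain from a node up to the root, starting with the node."""
    chain = []
    while c is not None:
        chain.append(c)
        c = c.fat
    return chain


def edge_weight(fat):
    """Weight of an edge, computed from the father node as in rows 11 to 15."""
    depth_w = math.exp(-fat.dep)                # equation (2)
    degree_w = 1.0 / max(fat.deg, 1)            # assumption: inverse of out-degree
    info_w = (2 / math.pi) * math.atan(fat.ic)  # assumption: scaled arctan(IC)
    return (depth_w + degree_w + info_w) / 3


def weighted_distance(a, b):
    """Sum edge weights from both nodes up to their nearest common father."""
    ids_b = {id(n) for n in ancestors(b)}
    common = next(n for n in ancestors(a) if id(n) in ids_b)
    total = 0.0
    for node in (a, b):
        while node is not common:
            total += edge_weight(node.fat)
            node = node.fat
    return total


def sem_sim(dist, alpha=1):
    """Equation (1): similarity decreases non-linearly with distance."""
    return alpha / (dist ** 2 + alpha)


def sim_cal(sour, targ, alpha=1):
    """Build R with one row per source concept; None signals a failed match."""
    if len(sour) > len(targ):
        return None
    return [[sem_sim(weighted_distance(s, t), alpha) for t in targ]
            for s in sour]


# Hypothetical four-level fragment of a semantic tree with made-up IC values.
entity = Concept("entity", 0, 0.1, 2)
commodity = Concept("commodity", 1, 1.2, 2, entity)
money = Concept("money", 1, 0.9, 0, entity)
product = Concept("product", 2, 2.0, 0, commodity)
goods = Concept("goods", 2, 2.1, 0, commodity)

R = sim_cal([product], [goods, money])
```

In this fragment, `R[0][0]` compares two siblings and `R[0][1]` compares concepts whose nearest common father is the root, so the first value comes out larger, in line with Theorem 1.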
Rows 1~4 check whether the input of the algorithm satisfies the precondition. When the algorithm matches input ontology sets, the size of the input ontology set in the USR should be greater than the size of the input ontology set in the WSB. However, when the algorithm matches output ontology sets, the size of the output ontology set in the USR should be smaller than the size of the output ontology set in the WSB. Returning a null matrix means that matching has failed.

Rows 5~19 improve the SSC algorithm based on semantic distance. After receiving the shortest path, SimCal uses the depth, out-degree and Information Content of each edge in the path to obtain the weighted distance, and then calculates semantic similarity by (1). When obtaining the shortest path, SimCal uses the notion of the nearest common father to simplify the process.

V. EXPERIMENTAL ANALYSIS

Against an E-Commerce background, we use WSDL to describe Web Services and generate the Service Library. For ontology concepts, we extend the corresponding concepts into the Extension Semantic Library based on WordNet© and use domain-related text to calculate the Information Content weight.

To analyze the efficiency of WS-SCF, we compare SAD and DIC, presented in [4] and [5], with WS-SCF when calculating semantic similarity for one specific USR over Web Service Libraries of different sizes. The time consumed by each method is shown in Fig. 2.

[Figure 2: plot of Calculation Time (ms) against Service Number]

Figure 2 Time-consuming comparison

In Fig. 2, SAD costs less time than the others because it only considers the semantic attribute factor, which results in a linear increase in time consumption. In contrast, DIC takes 80.21 ms when the number of services reaches 1000. The reason is that DIC calculates Information Content from the current node and its children, and this must be done for every similarity calculation, which leads to high computational complexity. WS-SCF takes a little more time than SAD, but its time consumption is acceptable in practice, as it costs 50 ms for a library of 1000 services. In short, WS-SCF takes a reasonable time to calculate while achieving higher nearness to the domain and to the user service request.

To analyze nearness to the domain and to the user request, we select 4 groups of representative ontology concepts and calculate their semantic similarity by SAD, DIC and WS-SCF. The results are shown in Table I.

Table I Semantic Similarity Calculation results

No  Concept Group         SAD     DIC     WS-SCF
1   (product, goods)      0.2313  0.6725  0.7723
2   (fruit, apple)        0.5712  0.5214  0.4756
3   (money, currency)     0.5884  0.9324  0.9973
4   (credit card, bank)   0.4523  0.5543  0.7633

The comparison for Group 1 shows the effect of depth for semantic nodes with the same attributes. In the calculation for Group 3, neither SAD nor DIC reflects the effect of abstraction level on semantic nodes. Groups 3 and 4 show the effect on domain-nearness when domain training text is taken into account.

As the above analysis of semantic similarity shows, WS-SCF overcomes the computational complexity and the loss of nearness of traditional SSC. With lower time consumption, the algorithm preserves nearness to the domain and to the request in SSC, and thereby improves Web Service Discovery.

VI. CONCLUSION

Based on the requirements for SSC on Web Services, we design and implement WS-SCF. The framework is compatible with existing Web Services and extracts semantics from WSDL to generate the Semantic Web Service Library. The framework also considers the effects of depth, out-degree and domain-nearness to generate the Extension Semantic Library, and calculates the semantic relation matrix from WSB and USR. In the future, we will extract eigenvectors from the semantic relation matrix to express the optimum matching, which will further simplify SSC and reduce computational complexity in Web Service Discovery.

ACKNOWLEDGEMENT

This work is sponsored by the National Natural Science Foundation of China under Grant No. 60973075, and the Foundation of Harbin Science and Technology Bureau under Grant No. RC2009XK010003.

REFERENCES

[1] Sun Ping, Jiang Chang-Jun. Using Service Clustering to Facilitate Process-Oriented Semantic Web Service Discovery. Chinese Journal of Computers, 2008, 31(8): 1340~1353
[2] L Clement, A Hately, C von Riegen, et al. UDDI Version 3.0 [OL]. http://uddi.org/pubs/uddi_v3.htm, 2002
[3] DAVID M, MARK B. OWL-S: semantic markup for web services [EB/OL]. http://ww.w3.org/submission/2004/SUBM-OWL-S-30041122
[4] Zhao Yong-jin, Zheng Hong-yuan, Ding Qin-lin. Study on semantic similarity algorithm based on ontology. Journal of Computer Applications, 2009, 29(11): 3074~3076 (in Chinese)
[5] Qiu Jiang-nan, Zhong Qiu-yuan, Cui Yan. Research on integrated semantics matching algorithm in service matching model. Journal of Dalian University of Technology, 2007, 46(6): 914~919 (in Chinese)
[6] Shi Bin, Yan Jian-zhuo, Wang Pu, Fang Li-ying. Ontology-based Measure of Semantic Similarity Between Concepts. Computer Engineering, 2009, 35(19): 83~85 (in Chinese)
[7] Huang Guo, Zhou Zhu-rong. Research on domain ontology-based concept semantic similarity computation. Computer Engineering and Design, 2007, 28(10): 2460~2463 (in Chinese)
[8] EDGINGTON, CHOIB, HENSON K. Knowledge ontology to facilitate adopting [J]. Communications of the ACM, 2004, 47(11):85~9
