IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 2, NO. 1, JANUARY-MARCH 2009
Abstract—The Web is transforming from a Web of data to a Web of both Semantic data and services. This trend is providing us with
increasing opportunities to compose potentially interesting and useful services from existing services. While we may not always
have the specific queries needed in top-down service composition approaches to identify them, the early and proactive exposure of
these opportunities will be key to harvesting the great potential of the large body of Web services. In this paper, we propose a Web service
mining framework that allows unexpected and interesting service compositions to automatically emerge in a bottom-up fashion. We
present several mining techniques aiming at the discovery of such service compositions. We also present evaluation measures of their
interestingness and usefulness. As a novel application of this framework, we demonstrate its effectiveness and potential by applying it
to service-oriented models of biological processes for the discovery of interesting and useful pathways.
1 INTRODUCTION
are novel (i.e., previously unknown) and whether they are established in a surprising way (e.g., if they link segments not previously known to be related). An interactive session follows next with the user taking hints from highlighted interesting segments within a composition network and picking a handful of nodes to pursue further. These nodes are then automatically linked into a connected subgraph, to the extent possible, using a subset of nodes and edges in the original graph. This subgraph provides the user the basis to formulate hypotheses, which can then be tested out via simulation. In the case of pathway discovery, the simulation is used to invoke relevant service operations, changing the quantity/attribute value of various entities involved in the composition network. Results from the simulation phase are expected to reveal hidden relationships among the corresponding processes. These results are then presented to the user, whose subjective evaluation finally determines whether the subgraph in pursuit is actually useful. In some cases, the user may want to revise the simulation initial setting, rerun the simulation, and evaluate new simulation results. At the end, the user may want to introduce some of the discovered service composition subgraphs representing pathways to a pathway base for future references. One use of such references may be in the area of building models for biological entities at a more complex level. We present details of various phases of the mining process in the following sections.

4 PRESCREENING PLANNING
The search space of the mining process can be scoped down if we are only interested in finding potentially interesting/useful composed services within certain functional areas and locale of mining interest limiting the applicability of these functions. We organize our prescreening planning to contain two phases: Scope Specification and Search Space Determination.

4.1 Scope Specification
The mining process starts with the scope specification phase where a composite Web service engineer optionally takes advantage of necessary subjective interestingness measures to bootstrap the mining process. The engineer may scope the mining activity by defining a list of functional areas and the locales where these functions reside. For example, the engineer may express a general interest in service compositions that involve travel, healthcare, or insurance within the locale of the continental US. Since different functional areas are drawn from corresponding domains, which may, in turn, rely on different ontologies, scope specification essentially determines a set of ontologies to use for the mining process. When presented with these ontologies, the engineer may choose to assign interestingness weights to various ontology nodes that he/she is particularly interested in. In addition, the engineer may optionally choose to assign interestingness weights to some of the operation interfaces within these domains that are also of interest. The end product of scope specification is the mining context containing a list of relevant domains and locales limiting the applicability of corresponding functions within these domains. We formally define mining context C in (1)

$$C = \{\, d(L) \mid d \in D \,\}, \qquad (1)$$

where

• D is a set of Web service domains;
• L is a set of locale attributes of mining interest;
• d(L) is a domain carved out by L.

Consequently, if we use $\rightarrow$ to denote the relationship of refers to, then the set of all ontologies referred to in C can be denoted as Ont(C) and calculated using

$$Ont(C) = \{\, ont \mid \exists d \in C \wedge d \rightarrow ont \,\}, \qquad (2)$$

and the set of all operation interfaces included in C can be denoted as $OP_{intf}(C)$ and calculated using

$$OP_{intf}(C) = \{\, op_{intf} \mid \exists d \in C \wedge op_{intf} \in d \,\}. \qquad (3)$$

4.2 Search Space Determination
The mining scope determines the coverage of the search space when looking for composable components for the purpose of composition. Similar to the drug discovery process, the end product of our search space determination phase is a focused library consisting of Web services from service registry R that are involved in mining context C. We formally define focused library L in (4)

$$L = \{\, s \mid s \in R \wedge ( s.operations \cap OP_{intf}(C) \neq \emptyset \;\vee\; \exists op \in s.operations : op_{consume}(OP_{intf}) \cap OP_{intf}(C) \neq \emptyset ) \,\}, \qquad (4)$$

where s.operations denotes the set of operations implemented by s and $op_{consume}(OP_{intf})$ denotes the set of operation interfaces that are consumed by op. Thus (4) gives the focused library as the set of all Web services that either provide implementation(s) for some interface(s) in $OP_{intf}(C)$ or whose operation(s) consume(s) some implementation(s) of interface(s) in $OP_{intf}(C)$. The focused library thus covers the search space that is carved out based on the identified mining context.

5 SCREENING
The screening phase in our framework consists of three distinct subphases: filtering, static verification, and linking.

5.1 Filtering
To address the problem of combinatorial explosion, we rely on a publish/subscribe mechanism to convert the traditional combinatorial search problem into a service/operation recognition problem. As a result, top-down searches are transformed into bottom-up matches. We filter Web services at two levels: operation and parameter.

5.1.1 Operation Level Filtering
At the operation level, operation interfaces within the mining context serve as the medium for Web service operations to plug into one another via direct recognition. We show our operation level filtering mechanism in Algorithm 1. Algorithms 2 and 3 list our operation agent’s functions for publication and subscription.
ZHENG AND BOUGUETTAYA: SERVICE MINING ON THE WEB 69
Fig. 4. Exhaustive search versus our filtering mechanisms. (a) Exhaustive search. (b) Operation level filtering. (c) Parameter level filtering.
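The bottom-up matching contrasted with exhaustive search in Fig. 4 can be illustrated with a minimal Python sketch. It is not a transcription of Algorithms 1-3: the `Operation` record, the function names, and the travel-style services and interfaces are all hypothetical, chosen only for illustration. The sketch first selects a focused library in the spirit of (4), then lets each operation publish to the interface it implements and subscribe to the interfaces it consumes, so that a composition lead is recorded whenever a publisher and a subscriber meet on the same interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    service: str                       # name of the owning service (hypothetical)
    name: str                          # operation name (hypothetical)
    implements: str                    # operation interface this operation implements
    consumes: frozenset = frozenset()  # operation interfaces this operation consumes


def focused_library(registry, context_interfaces):
    """Keep services with an operation that implements or consumes a context interface, cf. (4)."""
    return {
        service: ops
        for service, ops in registry.items()
        if any(op.implements in context_interfaces or op.consumes & context_interfaces
               for op in ops)
    }


def operation_level_filter(library):
    """Return (consumer, provider) leads found by publish/subscribe on operation interfaces."""
    publishers = defaultdict(list)   # interface -> operations that implement it
    subscribers = defaultdict(list)  # interface -> operations that consume it
    leads = []
    for ops in library.values():
        for op in ops:
            # Publish: operations already subscribed to this interface recognize op.
            for consumer in subscribers[op.implements]:
                leads.append((consumer, op))
            publishers[op.implements].append(op)
            # Subscribe: op recognizes operations already published on what it consumes.
            for interface in op.consumes:
                for provider in publishers[interface]:
                    leads.append((op, provider))
                subscribers[interface].append(op)
    return leads


# Hypothetical registry: service name -> list of operations.
registry = {
    "FlightSvc": [Operation("FlightSvc", "searchFlights", "FlightSearch")],
    "TripSvc": [Operation("TripSvc", "planTrip", "TripPlanning",
                          frozenset({"FlightSearch", "HotelSearch"}))],
    "HotelSvc": [Operation("HotelSvc", "searchHotels", "HotelSearch")],
    "WeatherSvc": [Operation("WeatherSvc", "forecast", "Forecast")],
}
context = frozenset({"FlightSearch", "HotelSearch"})

library = focused_library(registry, context)
leads = operation_level_filter(library)
for consumer, provider in leads:
    print(f"{consumer.service}.{consumer.name} can consume {provider.service}.{provider.name}")
```

Each consumer/provider pair is recorded exactly once, at the moment the later of the two operations is introduced, so no service is ever compared against every other service in the library.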
      cs ← generateLead(op, op′);
      Lps.add(cs);
    end for
  end if
  return Lps;

TABLE 1
Symbols and Parameters
• Exact match or synonym. $n_a = n_b$. One index node is created for all synonymous ontology nodes.
• Is-a. $n_a$ is a child of $n_b$.
• Has-a. $n_a$ has a component $n_b$.

We assume that the above relationships among parameter types are already declared in domain ontologies and thus can be automatically detected. Fig. 4c illustrates our parameter level filtering mechanism. Since ontological index nodes are used to describe the type of operation parameters, a parameter is considered an instance of such a node. When a Web service operation is introduced in the mining process, each of its output parameters will publish to an ontology index node it is an instance of. Similarly, each of its input parameters will subscribe to an ontology index node it is an instance of. The publication and subscription on a node can sometimes propagate to other nodes within the ontology index node network. This happens when the node is involved in an inheritance or compositional relationship with other nodes. In general, publication propagates down a composition tree and up an inheritance tree, while subscription propagates up a composition tree and down an inheritance tree. In addition to its parameters, a service also subscribes to the ontology index node that defines the type of its service providing entity. For better performance, we include this subscription in lines 21-27 of Algorithm 1. Due to the page limit, we omit the listing of parameter filtering algorithms.

As Web service operations are introduced into the mining process, subscriptions and publications at both the operation and parameter levels are triggered. Each operation interface and ontology index node keeps track of its own subscribers and publishers. This tracking enables Web services to recognize one another at both levels.

5.2 Complexity Analysis
We compare the computational complexity of our operation level filtering algorithms against a naive exhaustive search algorithm. Table 1 lists relevant variables used in our complexity analysis.

If we refer to the size of collection s.operations as |S|, then the time to carry out a hashtable-based check of the operation (lines 6 and 11 in Algorithm 1) is $O[\log(|S|)]$. We first analyze the performance of the traditional exhaustive search mechanism (see Fig. 4a). An operation level composability check will iterate through all services in the scope and check each service against all other services. For a pair of services $s_1$ and $s_2$, it checks whether $s_1$’s operation could consume any of $s_2$’s operations, and vice versa. There are two ways to match up operations from $s_1$ and $s_2$. The first is to iterate through $s_1$’s operations and, for each operation, iterate through its $N_{oc}$ consumable operations to see if a consumable operation is in $s_2$’s operation set. The second is to iterate through $s_1$’s operations and then $s_2$’s operations. For each operation found in $s_2$’s operation set, check if it is consumable by the one found in $s_1$’s operation set. Thus, the time to perform operation level comparison using an exhaustive search is

$$T_{of} = O\left(N_{ws}^{2} \, \min\left(N_{oi} N_{oc} \log N_{oi},\; N_{oi}^{2} \log N_{oc}\right)\right).$$

We now analyze the performance of our operation level filtering algorithms. According to Algorithm 2, the time to perform publish(op) is $O(N_{ops})$. Likewise, from Algorithm 3, the time to perform subscribe(op) is $O(N_{opp})$. Thus, $T_{of}$ can be calculated according to Algorithm 1

$$\begin{aligned} T_{of} &= O[\,N_{op} + N_{ws}(N_{oi}(\log N_{op} + N_{ops} + N_{oc}(\log N_{op} + N_{opp})) + \log(|ont|))\,] \\ &= O[\,N_{ws}(N_{oi}(N_{ops} + N_{oc} N_{opp} + (1 + N_{oc}) \log N_{op}) + \log(|ont|)) + N_{op}\,]. \end{aligned}$$

Comparing the performance of our filtering algorithms against that of an exhaustive search algorithm, we see that when $N_{op}$ is relatively small and stable as compared to $N_{ws}$, $T_{of}$ in our filtering algorithm is linear in $N_{ws}$, while $T_{of}$ in a traditional exhaustive search grows quadratically with $N_{ws}$.

We conducted an experiment on an XP machine with a 2.8 GHz dual-core processor to simulate the performance of the filtering algorithms. In this experiment, we focus on investigating the relationship between the total processing time of the filtering algorithms and the number of Web services that are used as inputs to these algorithms. Table 2 lists the configuration variables used in our experiment.

We use $n_s$ to denote the number of services in the input and $s_{ratio}$ the ratio of ontology index nodes to services. For each $s_{ratio}$, we iterate through $n_s$, which starts at 100 and doubles its value for each subsequent iteration, as indicated in Table 2. For each pair of ($n_s$, $s_{ratio}$), we run through the filtering algorithms 10 times. We then take the averages of the total processing time from these runs and plot them in Fig. 5. According to simulation results in Fig. 5, we see that the total filtering time is linear in the number of services used as input.
TABLE 2
Experiment Settings for Performance Simulation
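The measurement procedure (start at 100 services, double the input size on each iteration, and average 10 runs per setting) can be sketched in Python as follows. This is an illustrative harness only, not the setup behind Table 2 or Fig. 5: the synthetic service layout, the function names, and the chosen ratio of 0.5 are assumptions, and the filter is a simplified publish/subscribe stand-in rather than the filtering algorithms themselves.

```python
import time
from collections import defaultdict


def make_services(ns, s_ratio):
    """Synthetic input (assumed layout): service i publishes to one index node
    and subscribes to the next one; the node count scales with ns * s_ratio."""
    n_nodes = max(1, int(ns * s_ratio))
    return [(i % n_nodes, (i + 1) % n_nodes) for i in range(ns)]


def filter_services(services):
    """Simplified publish/subscribe filter; returns the number of recognized pairs."""
    publishers = defaultdict(list)
    subscribers = defaultdict(list)
    matches = 0
    for sid, (pub_node, sub_node) in enumerate(services):
        matches += len(subscribers[pub_node])  # earlier subscribers recognize this publisher
        publishers[pub_node].append(sid)
        matches += len(publishers[sub_node])   # this subscriber recognizes earlier publishers
        subscribers[sub_node].append(sid)
    return matches


def run_experiment(s_ratio, start=100, doublings=4, repeats=10):
    """Double ns on each iteration and average the elapsed time over `repeats` runs."""
    rows = []
    ns = start
    for _ in range(doublings):
        services = make_services(ns, s_ratio)
        elapsed = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            matches = filter_services(services)
            elapsed.append(time.perf_counter() - t0)
        rows.append((ns, matches, sum(elapsed) / repeats))
        ns *= 2
    return rows


rows = run_experiment(s_ratio=0.5)
for ns, matches, avg in rows:
    print(f"ns={ns:5d}  matches={matches:6d}  avg_time={avg * 1e3:.3f} ms")
```

Because the number of index nodes grows with the number of services in this layout, both the work done and the number of recognized pairs grow in proportion to ns. Measured times will vary from machine to machine.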
In our simulation, we focus on investigating the interestingness skyline of service compositions. In particular, we focus on the study of interestingness of compositions obtained through indirect recognition since they require more computation according to (11), (8), and (10). Table 3 lists the configuration variables used in our experiment.

Fig. 6 uses circles to highlight skyline points. It shows the interestingness skylines for different numbers of operation interfaces per domain. We see that as this number increases, the number of discovered compositions also increases dramatically. However, the interestingness skyline keeps a population of top candidates with a relatively stable size.
Fig. 6. Skylines versus number of operations. (a) 50 operations/domain. (b) 200 operations/domain. (c) 500 operations/domain.
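A skyline, in the sense of the skyline operator [9], keeps exactly those candidates that no other candidate dominates. The Python sketch below shows the idea on a handful of compositions. The two score dimensions and all values are hypothetical, higher is assumed to be better in each dimension, and the quadratic scan is the simplest possible formulation, not necessarily the evaluation procedure used for Fig. 6.

```python
def dominates(a, b):
    """True if a is at least as good as b in every dimension and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(scored):
    """Return the candidates whose score tuple is not dominated by any other candidate."""
    return {
        name: score
        for name, score in scored.items()
        if not any(dominates(other, score) for other in scored.values())
    }


# Hypothetical compositions scored on two assumed interestingness dimensions.
candidates = {
    "c1": (0.9, 0.2),
    "c2": (0.6, 0.6),
    "c3": (0.5, 0.5),  # dominated by c2
    "c4": (0.2, 0.9),
    "c5": (0.6, 0.4),  # dominated by c2
}

front = skyline(candidates)
print(sorted(front))  # ['c1', 'c2', 'c4']
```

Adding further dominated candidates leaves the skyline unchanged, which is one way to see why the set of top candidates can stay small while the total number of compositions grows.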
7 APPLICATION TO PATHWAY DISCOVERY
Limitations of existing biological process representation approaches motivated us to propose to model these processes as Web services [10]. To demonstrate the effectiveness of our mining framework, we applied it to the discovery of pathways linking these service-oriented processes. To prepare for our experiment, we first compiled a list of conceptual models of biological processes based on [11], [12], [13], [14], and [15]. In addition to describing process models, these sources also reveal some simple relevant pathways that can be manually put together, as shown in Fig. 9, where each subfigure represents models constructed based on information obtained from a single source. Ontology concepts (Fig. 9a) are used by these models to define the type of service providing entities and operation input/output parameters. Multiple examples of promotion, inhibition and indirect recognition can be found in these simple pathways. For example, Fig. 9c shows that upon injury, LTB4 recruits Neutrophil, promoting its service of producing COX2. Fig. 9d shows that Gastric Juice’s service can inhibit the services of both Stomach Cell and Mucus. Example of indirect recognition can be found in Fig. 9e, where PLA2’s service can liberate Arachidonic Acid, which can, in turn, be used as input to either the produce PGG2 operation of COX1’s service or the produce PGE2 operation of the COX2 service. In practice, we envision that research labs (i.e., model sources) can publish their discoveries of individual biological processes independently using the vehicle of Web services. Based on these models, we constructed corresponding WSDL services, wrapped them using WSML [16] and deployed them into a WSMX [17] runtime environment [10]. We use simple pathways manually constructed here as references when we later check the correctness of pathways automatically discovered using our mining algorithms.

Fig. 10. Discovered pathway highlighted with interesting subgraph and sample simulation results.

Fig. 10 gives a snapshot of a pathway network automatically discovered. To enable the identification of interesting service compositions (i.e., segments within a pathway network), we extended each WSML service modeling a biological process to declare the modeling source in its nonfunctional properties (nfp) section. Based on comparison of such information from edges in the pathway graph involved in recognition patterns in Figs. 2b, 2c, and 2d, our interestingness evaluation algorithm is then used to highlight those that are determined novel. These highlighted edges provide the user with some visual clues aiding the manual selection of interesting nodes to pursue further. Once nodes of interest are selected by the user, our graph expansion algorithm (Section 6.2.2) is then used to link interesting nodes and edges into a connected graph, which forms the basis for the user to formulate hypotheses. An example of these hypotheses may state that an increased dosage amount of Aspirin will lead to the relief of pain, but may increase the risk of ulcer in the stomach. To test out hypotheses such as these, an initial quantity representing units of service is assigned to all service providing entities at the beginning of the simulation (lines 1-3 in Algorithm 4). These quantities are expected to change as entities involved in the simulation interact with each other over time. From the two sample plots generated based on simulation results, we see that as the quantity of Aspirin increases from 10 in plot (a) to 40 in plot (b), there is an increase in the erosion
of stomach by the gastric juice due to the increased suppression on the production of mucus that covers the stomach wall. We also notice (in plots (a) and others that are not shown in Fig. 10) that when the senseRelief operation is enabled, it tends to obliterate the trace of Aspirin’s impact on pain sensation due to the ‘leaky bucket’ effect it has on pain and relief signals. Once we disable this operation (see plot (b)) in our simulation, we see a dramatic association between the Aspirin dosage and the suppression on the amount of pain signal being generated. This together with the observation of Aspirin’s impact on stomach erosion as noted earlier essentially confirms the initial hypothesis from the user.

8 RELATED WORK
Web mining research focuses on applying data mining techniques to discover interesting patterns of data from the Web. In contrast, our research focuses on studying service behaviors that are intrinsically dynamic in nature, thus the need of dynamic invocation of services after the discovery of interesting service compositions. A comprehensive QoS-based service composition selection strategy is proposed in [8]. Our weighted function on usefulness (12) is based on this strategy. Ardagna and Pernici [18] take this a step further by considering the frequency of execution paths. In our framework, we don’t assume that such frequency is readily available. Xiong et al. [19] investigate how to configure Web services in a dynamically changing environment. In this regard, our research aims at the quick identification of best service compositions and thus focuses more on the initial selection of service compositions using usefulness measures. Lamparter et al. [20] rely heavily on user preferences in the selection of Web services. Such preferences lead to a typical top-down service composition approach and are thus not taken advantage of in our approach. A number of feedback and log-based approaches have been proposed to improve QoS and service composability measures. For example, Jurca et al. [21] propose a QoS monitoring scheme based on quality ratings from service clients, Dustdar and Hoffmann [22] rely on analyzing Web service execution log data to discover potential process workflow instances involving these services, and Liang et al. [23] rely on usage data at user, template, and instance levels to mine for Web service composition patterns. While these approaches may work well for business processes over time as user feedback and execution logs are expected to become available, the challenge of identifying interesting workflows in the absence of such feedback and logs, especially at the time when component Web services are just introduced, is still real. Our Web service mining framework allows the mining of interesting service compositions to be carried out in the absence of user feedback and execution logs. When applied to the field of pathway discovery, where the expedience of such discovery is the key to success, our approach enables the proactive discovery of interesting pathways upon the availability of these services.

9 CONCLUSION
In this paper, we proposed a Web service mining framework that enables the proactive discovery of interesting and useful service compositions. To address the challenge of combinatorial explosion, we developed mining algorithms that can scale well with a growing number of Web services. We also discussed how interestingness and usefulness can be objectively evaluated. Finally, we presented a novel application of our framework to the discovery of pathways linking biological processes. Future
work includes improving the agility of our mining framework to accommodate the dynamic expansion and evolution of WSML services. This would not only allow the framework to be checked against an expanding pool of Web services, but more importantly, ensure that the results of the mining process are updated to reflect the current availability and semantic description of service capabilities.

REFERENCES
[1] Web Services Architecture, W3C Working Group Note, http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, Feb. 2004.
[2] G. Zheng and A. Bouguettaya, “A Web Service Mining Framework,” Proc. IEEE Int’l Conf. Web Services (ICWS ’07), July 2007.
[3] P. Ball, Designing the Molecular World: Chemistry at the Frontier. Princeton Univ. Press, 1994.
[4] OWL-S: Semantic Markup for Web Services, W3C Member Submission, http://www.w3.org/Submission/OWL-S/, Nov. 2004.
[5] Web Service Modeling Ontology, http://www.wsmo.org/, 2009.
[6] B. Medjahed, A. Bouguettaya, and A.K. Elmagarmid, “Composing Web Services on the Semantic Web,” VLDB J., Sept. 2003.
[7] J. Augen, “The Evolving Role of Information Technology in the Drug Discovery Process,” Drug Discovery Today, vol. 7, pp. 315-323, 2002.
[8] L. Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, “QoS-Aware Middleware for Web Services Composition,” IEEE Trans. Software Eng., vol. 30, no. 5, pp. 311-327, May 2004.
[9] S. Borzsonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” Proc. 17th Int’l Conf. Data Eng., pp. 421-430, 2001.
[10] G. Zheng and A. Bouguettaya, “Discovering Pathways of Service Oriented Biological Processes,” Proc. Ninth Int’l Conf. Web Information Systems Eng. (WISE ’08), Sept. 2008.
[11] S.Y. Auyang, “From Experience to Design: The Science behind Aspirin,” http://www.creatingtechnology.org/biomed/aspirin.htm, 2009.
[12] C. Freudenrich, “How Pain Works,” http://health.howstuffworks.com/pain.htm, 2009.
[13] L. Hoffman, “How Aspirin Works,” http://health.howstuffworks.com/aspirin1.htm, 2009.
[14] M. Landau, “Inflammatory Villain Turns Do-Gooder,” http://focus.hms.harvard.edu/2001/Aug10_2001/immunology.html, 2009.
[15] M.-J. Yin, Y. Yamamoto, and R.B. Gaynor, “The Anti-Inflammatory Agents Aspirin and Salicylate Inhibit the Activity of IκB Kinase-β,” Nature, vol. 396, pp. 77-80, Nov. 1998.
[16] The Web Service Modeling Language WSML, http://www.wsmo.org/wsml/wsml-syntax, 2009.
[17] Web Services Execution Environment, http://sourceforge.net/projects/wsmx, 2009.
[18] D. Ardagna and B. Pernici, “Global and Local QoS Constraints Guarantee in Web Service Selection,” Proc. IEEE Int’l Conf. Web Services (ICWS ’05), July 2005.
[19] P. Xiong, Y. Fan, and M. Zhou, “QoS-Aware Web Service Configuration,” IEEE Trans. Systems, Man, and Cybernetics, Part A, vol. 38, no. 4, pp. 888-895, 2008.
[20] S. Lamparter, A. Ankolekar, R. Studer, and S. Grimm, “Preference-Based Selection of Highly Configurable Web Services,” Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 1013-1022, 2007.
[21] R. Jurca, B. Faltings, and W. Binder, “Reliable QoS Monitoring Based on Client Feedback,” Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 1003-1012, 2007.
[22] S. Dustdar, T. Hoffmann, and W. van der Aalst, “Mining of Ad-Hoc Business Processes with TeamLog,” Data and Knowledge Eng., http://citeseer.ist.psu.edu/dustdar04mining.html, 2005.
[23] Q.A. Liang, J.-Y. Chung, S. Miller, and Y. Ouyang, “Service Pattern Discovery of Web Service Mining in Web Service Registry-Repository,” Proc. IEEE Int’l Conf. e-Business Eng. (ICEBE ’06), pp. 286-293, 2006.

George Zheng received the BS degree in electronics engineering from Shanghai Jiao Tong University, China, in 1986, the MS degree in electrical engineering from the University of Virginia, Charlottesville, in 1991, and the MS degree in computer science from Johns Hopkins University, Baltimore, Maryland, in 1997. He received the PhD degree in computer science from the Virginia Polytechnic Institute and State University, Blacksburg, in 2009. He is currently a principal systems engineer with Science Applications International Corporation (SAIC). His research interests include Web services mining, bioinformatics, workflow, software simulation, and systems integration.

Athman Bouguettaya received the PhD degree in computer science from the University of Colorado at Boulder in 1992. He is a science leader at CSIRO ICT Center, Canberra. He was previously a tenured faculty member in the Computer Science Department at Virginia Polytechnic Institute and State University (commonly known as Virginia Tech). He is on the editorial boards of several journals, including the IEEE Transactions on Services Computing, the International Journal on Web Services Research, the VLDB Journal, the Distributed and Parallel Databases Journal, and the International Journal of Cooperative Information Systems. He was invited to be a guest editor of a special issue of Computer on trust management in Web service environments and a special issue of Internet Computing on database technology on the Web. He also guest edited a special issue of the ACM Transactions on Internet on Semantic Web Services. He served as a program chair of the 2008 International Conference on Service Oriented Computing (ICSOC) and the IEEE RIDE Workshop on Web Services for E-Commerce and E-Government (RIDE-WS-ECEG 2004). He has served on numerous program committees of database and service-oriented computing conferences. His current research interests are in service-oriented computing. He is a senior member of the IEEE and the ACM.