
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 2, NO. 1, JANUARY-MARCH 2009, PAGE 65

Service Mining on the Web


George Zheng and Athman Bouguettaya, Senior Member, IEEE

Abstract—The Web is transforming from a Web of data to a Web of both Semantic data and services. This trend is providing us with
increasing opportunities to compose potentially interesting and useful services from existing services. While we may not always
have the specific queries needed in top-down service composition approaches to identify them, the early and proactive exposure of
these opportunities will be key to harvesting the great potential of the large body of Web services. In this paper, we propose a Web service
mining framework that allows unexpected and interesting service compositions to automatically emerge in a bottom-up fashion. We
present several mining techniques aiming at the discovery of such service compositions. We also present evaluation measures of their
interestingness and usefulness. As a novel application of this framework, we demonstrate its effectiveness and potential by applying it
to service-oriented models of biological processes for the discovery of interesting and useful pathways.

Index Terms—Service mining, service recognition, interestingness, usefulness, pathway discovery.

• G. Zheng is with the Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 94061. E-mail: gzheng@vt.edu.
• A. Bouguettaya is with Commonwealth Scientific and Industrial Research Organisation (CSIRO) ICT Centre, Building 108, North Road, ANU Campus, ACTON ACT 2601, Australia. E-mail: athman.bouguettaya@csiro.au.

Manuscript received 10 Aug. 2008; revised 15 Dec. 2008; accepted 18 Jan. 2009; published online 22 Jan. 2009.
For information on obtaining reprints of this article, please send e-mail to: tsc@computer.org, and reference IEEECS Log Number TSC-2008-08-0072.
Digital Object Identifier no. 10.1109/TSC.2009.2.
1939-1374/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

The Web is currently going through a transformation from a data-centric Web to a Semantic Web consisting of both self-describable data and Web services, which are a new type of first class object. The Web service deployment of previously isolated applications allows such an application to be described and published by one organization (i.e., service provider), and discovered and invoked later by other independently developed applications (i.e., service consumers) [1], essentially making these applications interoperable on the Web. This unprecedented ease of application integration contributed to the increasing popularity of Web service composition, which aims at providing value-added services through composing existing services. Web service composition has traditionally taken a top-down approach. The top-down approach requires a user to provide a goal containing specific search criteria defining the exact service functionality the user expects, as shown through an example of composing a travel service at the top left of Fig. 1. Often, the more specific the query and search criteria are, the smaller the search space and more relevant the composition results will be. The specificity of the search criteria would reflect the interest and often knowledge of the service composer about the potential composability of existing Web services. Since the composer is typically only aware of and consequently interested in some specific types of compositions, the scope of such a search is usually very narrow. As a result, an attempt to start the search with a set of specific criteria that are not framed to coincide with the availability of interested Web services will most likely end up empty-handed. The top-down approach can thus work well only if the service composer clearly knows what to look for and the component Web services needed to compose such services are available.

Aiming at exploring the full potential of the service space without prior knowledge of what exactly is in it, another view that approaches service composition from the bottom-up is building up recently [2]. Instead of starting the search with a specific goal, a service engineer may be interested in discovering any interesting and useful service compositions that may come up in the search process. For performance reasons, a general goal may be provided at the beginning to scope down the initial search space to a reasonable size. For an illustration, we show at the bottom of Fig. 1 that a service engineer sets out to find any interesting and useful services with a general interest in Chinese medicine in mind. What comes out of the search process might be quite surprising. For example, in addition to discovering the possibility of composing a service for translating Tsalagi¹ to Chinese, the engineer also discovers, with the help of a service mining tool, a service composition that takes as input a biological sample from a subject, determines the corresponding genome and the possible diseases the subject is predisposed to, and finally generates a list of treatment recommendations and/or life style suggestions. Thus, unlike the search process in the top-down approach that is strictly driven by the search criteria, the search process in the bottom-up approach is serendipitous in nature, i.e., it has the potential of finding interesting and useful service compositions that are unexpected.

1. A language spoken by the Cherokee Indian tribe.

As more diverse services are deployed to the Web at an accelerating rate, the collective opportunities of composing services will surpass anyone's imagination. Many of these opportunities will be hidden in the Web of available services and unexpected to most people. While we may not sometimes have the specific queries needed to search for them, being able to discover these opportunities early in today's business environment equates to gaining competitive business advantages. For government agencies, doing
so also means that citizens receive useful and potentially life-enhancing services in advance. It is thus essential to be able to proactively discover opportunities for composing useful services even when the goals are unspecified at the moment, or simply hard to imagine or unknown. Much like the easy access to a glut of data that has provided a fertile ground for data mining research, we expect that the increase in Web services' availability will also spur both the need and opportunities to break new ground on Web service mining. We define Web service mining as a bottom-up search process aimed at the proactive discovery of potentially interesting and useful Web services from existing services.

Fig. 1. Top-down composition versus bottom-up mining.

Web service mining faces two main challenges, namely, combinatorial explosion and evaluation of interestingness and usefulness. A naive bottom-up approach would be to conduct an exhaustive search and full-blown composability analysis between any two Web services in the service registry. Once a potentially positive composition from two component services is identified, the analysis would expand to allow for more component services in a composition. As the number of registered Web services increases at an accelerating rate, such an approach can quickly become infeasible due to the overwhelming computation resulting from a "combinatorial explosion." The second challenge is determining the interestingness and usefulness of a composed service identified through mining. In the top-down approaches, the determination of interestingness and usefulness is not a major concern since the goal provided by the user already implies what types of compositions the user anticipates. In Web service mining, neither interestingness nor usefulness would be so obvious when the composed Web services are discovered without any specific goals. Useful composed services may be contaminated with frivolous or trivial ones that add little or no value. In addition, some of these useful services may have already been known. It will thus be necessary to distinguish interesting and useful composed services from those that are either trivial, already known, or useless.

In this paper, we propose a Web service mining framework that addresses these two challenges. We organize the remainder of the paper as follows: Section 2 first introduces the concept of Web service recognition, which forms the basis of much of our mining algorithms. We present our service mining framework and details of all the phases in Sections 3-6. We show the application of this framework to pathway discovery in Section 7. In Section 8, we survey and contrast related work with our research. We conclude the paper with discussion of future work in Section 9.

2 WEB SERVICE RECOGNITION

Much like molecules in the natural world, which can recognize each other and form bonds [3], Web services and operations can also recognize each other through both syntax and semantics. Consequently, potentially interesting and useful service compositions may emerge from the bottom up through such a mechanism. In the following, we first introduce our extensions to existing Web service ontologies (e.g., OWL-S [4] and WSMO [5]) that make this possible.

Operation interface. A construct used to specify a shared service capability. An operation can implement an operation interface or make known its need to invoke operations that implement an operation interface. The separation of operation from operation interface allows the same interface to be implemented by multiple service operations. This construct allows Web services to declaratively plug into one another at the operation level.

Domain. It is used to group relevant service capabilities or operation interfaces into the same category. A Web service's involvement with a domain is reflected by whether it supplies or consumes an implementation of an operation interface in such a domain. Like OWL-S and WSMO, we rely on domain ontologies to define the type of operation parameters.

Based on these extensions, we identify four types of recognition between Web services and operations, as shown in Fig. 2.

1. Direct recognition. A direct recognition is established between operations $op_a$ and $op_b$ if $op_a$ consumes the implementation of an operation interface $op_{intf}$, which is implemented by $op_b$. In addition, $op_a$ and $op_b$ must be mode, binding, and message composable [6]. Mode composability states that the following pairs are composable: notification with one-way and solicit-response with request-response. Binding composability states that both operations should share the same protocol. Message composability states that the number of parameters and type of each parameter should match between the two operations.

2. Indirect recognition. A target operation $op_t$ indirectly recognizes a source operation $op_s$ if $op_s$ generates some or all input parameters of $op_t$. We use the term indirect to indicate the fact that there is a potential need to relay parts of the output message from $op_s$ to parts of the input message to $op_t$ at the composition level.

The following two recognition mechanisms are relevant to service oriented models of biological processes as we apply
our mining framework later to the discovery of biological pathways. Each service models the process(es) of a biological entity, also referred to later as a service providing entity.

3. Promotion. When operation $op_1$ of service $s_a$ produces an entity (i.e., output parameter) that in turn provides service $s_b$, we say that $s_a : op_1$ promotes $s_b$.

4. Inhibition. When operation $op_1$ of service $s_a$ consumes an entity (i.e., input parameter) that in turn provides service $s_b$, we say that $s_a : op_1$ inhibits $s_b$.

Note that in order for Web services and operations to recognize one another using these mechanisms, additional pre- and postconditions may also need to be met.

Fig. 2. Recognitions between services and operations.

3 THE SERVICE MINING FRAMEWORK

Fig. 3 shows our Web service mining framework using a multiphase approach that is inspired by the drug discovery process [7]. The idea is to keep the computation complexity simple in the early phases when there are a large number of Web services to process. As the size of the candidate pool shrinks toward the later phases, it increases the computation complexity in order to achieve better accuracy. The framework starts first with scope specification, a manual phase involving a domain expert defining the context of mining. We expect the domain expert to have a general idea about the "seeds" of Web service functional areas (e.g., travel, insurance, medicine) and optionally the locales of these functions that he/she is interested in mining. Such seeds are expected to grow into fruitful compositions as the mining progresses. Within this phase, a hierarchy of domain ontology indexes is established to speed up later phases in the mining process. Weights may be assigned to these seeds to differentiate the user's interest in them and to help retain compositions grown out of those that the user is more interested in. Note that rather than specifying the exact goal of the compositions in pursuit as would a traditional Web service composition approach, scope specification does not limit what that goal should be. Consequently, any composition leads that emerge within this scope will be pursued further.

Fig. 3. Service mining framework.

2. Pathways are represented as a network of interactions among biological entities such as cell, DNA, RNA, and enzyme. Exposure of pathways is expected to deepen our understanding of how diseases come about and help expedite drug discovery for treating them.

Scope specification is followed by several automatic phases. The first of these is search space determination. To help curb the problem of combinatorial explosion when faced with a large number of Web services, the mining context is used in this phase to identify a focused library of existing Web services as the initial pool for further mining. The next is the screening phase, which contains three subphases representing a grow-weed-grow cycle. In the filtering (first growing) subphase, Web services in the focused library would go through filtering algorithms for the purpose of identifying potentially interesting leads of service compositions. This is achieved through establishing linkages between Web services based on the four recognition mechanisms at a "coarse-grained" level (i.e., involving only a subset of matching Web service characteristics such as operation interfaces and parameter types), so the filtering step can be quickly completed. In the static verification (weeding) subphase, service composition leads identified earlier are semantically verified based on a subset of operation pre- and postconditions involving binary variables (e.g., whether the input to an operation is activated) and enumerated properties (e.g., the locale of an operation input). In the linking (last growing) subphase, verified service compositions are linked together to establish more comprehensive composition networks. When the mining framework is applied to the discovery of biological pathways,² such composition networks would represent pathways linking service oriented models of biological processes. The composition networks are then input to the evaluation phase, which consists of four subphases. Objective evaluation identifies and highlights interesting segments of a composition network by checking whether such linkages
are novel (i.e., previously unknown) and whether they are established in a surprising way (e.g., if they link segments not previously known to be related). An interactive session follows next with the user taking hints from highlighted interesting segments within a composition network and picking a handful of nodes to pursue further. These nodes are then automatically linked into a connected subgraph, to the extent possible, using a subset of nodes and edges in the original graph. This subgraph provides the user the basis to formulate hypotheses, which can then be tested out via simulation. In the case of pathway discovery, the simulation is used to invoke relevant service operations, changing the quantity/attribute value of various entities involved in the composition network. Results from the simulation phase are expected to reveal hidden relationships among the corresponding processes. These results are then presented to the user, whose subjective evaluation finally determines whether the subgraph in pursuit is actually useful. In some cases, the user may want to revise the initial simulation settings, rerun the simulation, and evaluate new simulation results. At the end, the user may want to introduce some of the discovered service composition subgraphs representing pathways to a pathway base for future references. One use of such references may be in the area of building models for biological entities at a more complex level. We present details of various phases of the mining process in the following sections.

4 PRESCREENING PLANNING

The search space of the mining process can be scoped down if we are only interested in finding potentially interesting/useful composed services within certain functional areas and locale of mining interest limiting the applicability of these functions. We organize our prescreening planning to contain two phases: Scope Specification and Search Space Determination.

4.1 Scope Specification

The mining process starts with the scope specification phase where a composite Web service engineer optionally takes advantage of necessary subjective interestingness measures to bootstrap the mining process. The engineer may scope the mining activity by defining a list of functional areas and the locales where these functions reside. For example, the engineer may express a general interest in service compositions that involve travel, healthcare, or insurance within the locale of the continental US. Since different functional areas are drawn from corresponding domains, which may, in turn, rely on different ontologies, scope specification essentially determines a set of ontologies to use for the mining process. When presented with these ontologies, the engineer may choose to assign interestingness weights to various ontology nodes that he/she is particularly interested in. In addition, the engineer may optionally choose to assign interestingness weights to some of the operation interfaces within these domains that are also of interest. The end product of scope specification is the mining context containing a list of relevant domains and locales limiting the applicability of corresponding functions within these domains. We formally define mining context $C$ in (1)

$$C = \{ d(L) \mid d \in D \}, \tag{1}$$

where

• $D$ is a set of Web service domains;
• $L$ is a set of locale attributes of mining interest;
• $d(L)$ is a domain carved out by $L$.

Consequently, if we use $\rightarrow$ to denote the relationship of refers to, then the set of all ontologies referred to in $C$ can be denoted as $Ont(C)$ and calculated using

$$Ont(C) = \{ ont \mid \exists d \in C \wedge d \rightarrow ont \}, \tag{2}$$

and the set of all operation interfaces included in $C$ can be denoted as $OP_{intf}(C)$ and calculated using

$$OP_{intf}(C) = \{ op_{intf} \mid \exists d \in C \wedge op_{intf} \in d \}. \tag{3}$$

4.2 Search Space Determination

The mining scope determines the coverage of the search space when looking for composable components for the purpose of composition. Similar to the drug discovery process, the end product of our search space determination phase is a focused library consisting of Web services from service registry $R$ that are involved in mining context $C$. We formally define focused library $L$ in (4)

$$L = \{ s \mid s \in R \wedge ( s.operations \cap OP_{intf}(C) \neq \emptyset \;\vee\; \exists op \in s.operations : op_{consume}(OP_{intf}) \cap OP_{intf}(C) \neq \emptyset ) \}, \tag{4}$$

where $s.operations$ denotes the set of operations implemented by $s$ and $op_{consume}(OP_{intf})$ denotes the set of operation interfaces that are consumed by $op$. Thus (4) gives the focused library as the set of all Web services that either provide implementation(s) for some interface(s) in $OP_{intf}(C)$ or whose operation(s) consume(s) some implementation(s) of interface(s) in $OP_{intf}(C)$. The focused library thus covers the search space that is carved out based on the identified mining context.

5 SCREENING

The screening phase in our framework consists of three distinct subphases: filtering, static verification, and linking.

5.1 Filtering

To address the problem of combinatorial explosion, we rely on a publish/subscribe mechanism to convert the traditional combinatorial search problem into a service/operation recognition problem. As a result, top-down searches are transformed into bottom-up matches. We filter Web services at two levels: operation and parameter.

5.1.1 Operation Level Filtering

At the operation level, operation interfaces within the mining context serve as the medium for Web service operations to plug into one another via direct recognition. We show our operation level filtering mechanism in Algorithm 1. Algorithms 2 and 3 list our operation agent's functions for publication and subscription.
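The two inputs that Algorithm 1 consumes, the context operation interfaces of (3) and the focused library of (4), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Operation` and `Service` classes, the dictionary representation of the mining context (domain name mapped to interface names), and the sample registry are all hypothetical, and the locale restriction $d(L)$ of (1) is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical operation record; field names are illustrative only."""
    name: str
    implements: frozenset = frozenset()  # operation interfaces this operation implements
    consumes: frozenset = frozenset()    # operation interfaces whose implementations it invokes


@dataclass(frozen=True)
class Service:
    name: str
    operations: tuple = ()


def context_interfaces(context):
    """OP_intf(C) as in (3): all operation interfaces found in the context's domains."""
    interfaces = set()
    for domain_interfaces in context.values():
        interfaces |= set(domain_interfaces)
    return interfaces


def focused_library(registry, context):
    """Focused library as in (4): services that provide or consume a context interface."""
    op_intf_c = context_interfaces(context)
    library = []
    for service in registry:
        provides = any(op.implements & op_intf_c for op in service.operations)
        consumes = any(op.consumes & op_intf_c for op in service.operations)
        if provides or consumes:
            library.append(service)
    return library


# Hypothetical registry: only the first two services touch the "Translate" interface.
registry = [
    Service("Translator", (Operation("translate", implements=frozenset({"Translate"})),)),
    Service("Clinic", (Operation("recommend", consumes=frozenset({"Translate"})),)),
    Service("Weather", (Operation("forecast", implements=frozenset({"Forecast"})),)),
]
context = {"language": {"Translate"}}
print([s.name for s in focused_library(registry, context)])  # ['Translator', 'Clinic']
```

The sketch reads the first disjunct of (4) as "some operation of the service implements an interface in the context", which matches the prose explanation that follows the equation.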
Fig. 4. Exhaustive search versus our filtering mechanisms. (a) Exhaustive search. (b) Operation level filtering. (c) Parameter level filtering.

Algorithm 1. Operation Level Filtering
Input: Context operation interfaces $OP_{intf}(C)$, focused library $F$.
Output: Leads of composed Web services $L$.
Variables: Leads from publication and subscription $L_{ps}$; operation interfaces consumed by $op$, $op_{consume}(OP_{intf})$.
1: // Create an agent per operation interface for keeping track of publishers and subscribers
2: for all $op_{intf} \in OP_{intf}(C)$ do
3:   create $Agent(op_{intf})$;
4: end for
5: for all $s \in F$ do
6:   for all $op \in s.operations$ do
7:     // Operation implementing an operation interface publishes through the interface
8:     if $\exists op_{intf} \in OP_{intf}(C)$: $op$ implements $op_{intf}$ then
9:       $L_{ps} \leftarrow Agent(op_{intf}).publish(op)$;
10:      $L.add(L_{ps})$;
11:    end if
12:    // Operation consuming implementation of an operation interface subscribes to the interface
13:    for all $op_{intf} \in op_{consume}(OP_{intf})$ do
14:      if $op_{intf} \in OP_{intf}(C)$ then
15:        $L_{ps} \leftarrow Agent(op_{intf}).subscribe(op)$;
16:        $L.add(L_{ps})$;
17:      end if
18:    end for
19:  end for
20:  // Subscribe service to the service providing entity type
21:  $k \leftarrow type(s.providerEntity)$;
22:  if $k \in Ont(C)$ then
23:    if $\neg \exists Agent(k)$ then
24:      create $Agent(k)$;
25:    end if
26:    $Agent(k).subscribe(s)$;
27:  end if
28: end for

When publishing an operation that implements an interface, function $publish(op)$ of the corresponding agent checks whether there is any subscriber to the interface. If so, it tries to establish a service composition lead using direct recognition between the publisher and the subscriber. Similarly, when a service operation subscribes to an operation interface that it consumes, function $subscribe(op)$ checks whether there is any publisher that implements the interface. If so, it tries to establish a lead service composition between the subscriber and the publisher. Fig. 4b depicts our operation level filtering mechanism.

Algorithm 2. Operation Agent Function for Publication publish(op)
Input: Web service operation $op$ providing implementation for the operation interface that this agent represents.
Output: Leads of composed Web services $L_{ps}$.
Variable: A composed service $cs$.
1: $publishers.add(op)$;
2: if $subscribers \neq \emptyset$ then
3:   for all $op' \in subscribers$ do
4:     $cs \leftarrow generateLead(op', op)$;
5:     $L_{ps}.add(cs)$;
6:   end for
7: end if
8: return $L_{ps}$;

Algorithm 3. Operation Agent Function for Subscription subscribe(op)
Input: Web service operation $op$ interested in invoking the operation interface that this agent represents.
Output: Leads of composed Web services $L_{ps}$.
Variable: A composed service $cs$.
1: $subscribers.add(op)$;
2: if $publishers \neq \emptyset$ then
3:   for all $op' \in publishers$ do
4:     $cs \leftarrow generateLead(op, op')$;
5:     $L_{ps}.add(cs)$;
6:   end for
7: end if
8: return $L_{ps}$;

TABLE 1. Symbols and Parameters
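As a rough illustration of how Algorithms 1-3 fit together, the following Python sketch keeps one agent per operation interface and generates a lead whenever a publisher meets a subscriber. It is a simplified sketch rather than the authors' implementation: services are plain dictionaries, a lead is just a `(consumer, provider)` pair of operation names, the composability checks of direct recognition are skipped, and the entity-type subscription in lines 20-27 of Algorithm 1 is omitted. All names (`InterfaceAgent`, `operation_level_filtering`, the sample library) are hypothetical.

```python
class InterfaceAgent:
    """Tracks publishers and subscribers of one operation interface."""

    def __init__(self, interface):
        self.interface = interface
        self.publishers = []
        self.subscribers = []

    def publish(self, op):
        """Algorithm 2: register a provider and pair it with existing subscribers."""
        self.publishers.append(op)
        return [(subscriber, op) for subscriber in self.subscribers]

    def subscribe(self, op):
        """Algorithm 3: register a consumer and pair it with existing publishers."""
        self.subscribers.append(op)
        return [(op, publisher) for publisher in self.publishers]


def operation_level_filtering(context_interfaces, library):
    """Algorithm 1 without the entity-type subscription step."""
    agents = {intf: InterfaceAgent(intf) for intf in context_interfaces}
    leads = []
    for service in library:
        for op in service["operations"]:
            for intf in op.get("implements", ()):
                if intf in agents:
                    leads.extend(agents[intf].publish(op["name"]))
            for intf in op.get("consumes", ()):
                if intf in agents:
                    leads.extend(agents[intf].subscribe(op["name"]))
    return leads


# Hypothetical focused library with one provider and two consumers of "Translate".
library = [
    {"name": "Translator", "operations": [{"name": "translate", "implements": ["Translate"]}]},
    {"name": "Clinic", "operations": [{"name": "recommend", "consumes": ["Translate"]}]},
    {"name": "Portal", "operations": [{"name": "render", "consumes": ["Translate", "Forecast"]}]},
]
print(operation_level_filtering({"Translate"}, library))
# [('recommend', 'translate'), ('render', 'translate')]
```

Because every agent remembers both sides, the same set of leads is produced regardless of the order in which services are introduced; only the order of the leads changes.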

5.1.2 Parameter Level Filtering

Parameter level filtering targets three types of recognition: promotion, inhibition, and indirect recognition, as described in Section 2. We consider three types of matching between parameters $p_1$ and $p_2$, whose data types refer to domain ontology index nodes (DOINs) $n_a$ and $n_b$, respectively:

• Exact match or synonym. $n_a = n_b$. One index node is created for all synonymous ontology nodes.
• Is-a. $n_a$ is a child of $n_b$.
• Has-a. $n_a$ has a component $n_b$.

We assume that the above relationships among parameter types are already declared in domain ontologies and thus can be automatically detected. Fig. 4c illustrates our parameter level filtering mechanism. Since ontological index nodes are used to describe the type of operation parameters, a parameter is considered an instance of such a node. When a Web service operation is introduced in the mining process, each of its output parameters will publish to an ontology index node it is an instance of. Similarly, each of its input parameters will subscribe to an ontology index node it is an instance of. The publication and subscription on a node can sometimes propagate to other nodes within the ontology index node network. This happens when the node is involved in an inheritance or compositional relationship with other nodes. In general, publication propagates down a composition tree and up an inheritance tree, while subscription propagates up a composition tree and down an inheritance tree. In addition to parameters, a service would also subscribe to the ontology index node that defines the type of its service providing entity. For better performance, we include this subscription in lines 21-27 of Algorithm 1. Due to the page limit, we omit the listing of the parameter filtering algorithms.

As Web service operations are introduced into the mining process, subscriptions and publications at both the operation and parameter levels are triggered. Each operation interface and ontology index node keeps track of its own subscribers and publishers. This tracking enables Web services to recognize one another at both levels.

5.2 Complexity Analysis

We compare the computation complexity of our operation level filtering algorithms against a naive exhaustive search algorithm. Table 1 lists relevant variables used in our complexity analysis.

If we refer to the size of collection $s.operations$ as $|S|$, then the time to carry out a hashtable-based check of the $\in$ operation (lines 6 and 11 in Algorithm 1) is $O[\log(|S|)]$. We first analyze the performance of the traditional exhaustive search mechanism (see Fig. 4a). An operation level composability check will iterate through all services in the scope and check each service against all other services. For a pair of services $s_1$ and $s_2$, it checks whether $s_1$'s operation could consume any of $s_2$'s operations, and vice versa. There are two ways to match up operations from $s_1$ and $s_2$. The first is to iterate through $s_1$'s operations and, for each operation, iterate through the $N_{oc}$ operations it can consume to see if a consumable operation is in $s_2$'s operation set. The second is to iterate through $s_1$'s operations and then $s_2$'s operations. For each operation found in $s_2$'s operation set, check if it is consumable by the one found in $s_1$'s operation set. Thus, the time to perform operation level comparison using an exhaustive search is

$$T_{of} = O\left( N_{ws}^2 \times \min\left( N_{oi} \times N_{oc} \times \log N_{oi},\; N_{oi}^2 \times \log N_{oc} \right) \right).$$

We now analyze the performance of our operation level filtering algorithms. According to Algorithm 2, the time to perform $publish(op)$ is $O(N_{ops})$. Likewise, from Algorithm 3, the time to perform $subscribe(op)$ is $O(N_{opp})$. Thus, $T_{of}$ can be calculated according to Algorithm 1

$$T_{of} = O[ N_{op} + N_{ws} \times ( N_{oi} \times ( \log N_{op} + N_{ops} + N_{oc} \times ( \log N_{op} + N_{opp} ) ) + \log(|ont|) ) ]$$
$$\phantom{T_{of}} = O[ N_{ws} \times ( N_{oi} \times ( N_{ops} + N_{oc} \times N_{opp} + (1 + N_{oc}) \times \log N_{op} ) + \log(|ont|) ) + N_{op} ].$$

Comparing the performance of our filtering algorithms against that of an exhaustive search algorithm, we see that when $N_{op}$ is relatively small and stable as compared to $N_{ws}$, $T_{of}$ in our filtering algorithm is linear in $N_{ws}$, while $T_{of}$ in a traditional exhaustive search is quadratic in $N_{ws}$.

We conducted an experiment on a Windows XP machine with a 2.8 GHz dual-core processor to simulate the performance of the filtering algorithms. We focus in our experiment on investigating the relationship between the total processing time of the filtering algorithms and the number of Web services that are used as inputs to these algorithms. Table 2 lists the configuration variables used in our experiment.

We use $n_s$ to denote the number of services in the input and $s_{ratio}$ the ratio of ontology index nodes to services. For each $s_{ratio}$, we iterate through $n_s$, which starts at 100 and doubles its value for each subsequent iteration, as indicated in Table 2. For each pair of ($n_s$, $s_{ratio}$), we run through the filtering algorithms 10 times. We then take the averages of the total processing time from these runs and plot them in Fig. 5. According to the simulation results in Fig. 5, we see that the total filtering time is linear in the number of services used as input.
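The measurement protocol described above (start at 100 services, double the input size each iteration, average 10 runs per setting) can be sketched as a small Python timing harness. This is a minimal sketch, not the authors' experiment: the number of doubling steps is an assumption because the contents of Table 2 are not available here, and `toy_filter` is a hypothetical stand-in workload rather than the filtering algorithms themselves.

```python
import time
from statistics import mean


def doubling_sizes(start=100, steps=5):
    """Input sizes n_s: start at `start` and double each iteration (`steps` is an assumption)."""
    return [start * 2 ** k for k in range(steps)]


def average_runtime(fn, size, runs=10):
    """Average wall-clock time of fn(size) over `runs` repetitions."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        fn(size)
        samples.append(time.perf_counter() - started)
    return mean(samples)


def toy_filter(num_services):
    """Stand-in workload: one registration per service against 10 hypothetical agents."""
    agents = {}
    for service_id in range(num_services):
        agents.setdefault(service_id % 10, []).append(service_id)
    return agents


if __name__ == "__main__":
    for n_s in doubling_sizes():
        print(n_s, average_runtime(toy_filter, n_s))
```

Replacing `toy_filter` with a real filtering routine and plotting the averages against $n_s$ would reproduce the shape of the experiment; roughly doubling times for doubling sizes indicate linear growth.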
ZHENG AND BOUGUETTAYA: SERVICE MINING ON THE WEB 71

TABLE 2
Experiment Settings for Performance Simulation

Fig. 5. Filtering time versus number of services.


5.3 Static Verification and Linking
Various measures have been proposed in [6] to determine the following function to measure the similarity between opi
whether two operations are composable at both syntactic and opj :
and semantic levels. These measures can be used to
Simðopi ; opj Þ
determine whether a direct recognition-based composition  
is actually valid. Promotion-based and inhibition-based compositions are valid because the entities involved provide the corresponding services by declaration. For an indirect recognition-based composition, validity can be determined by checking whether the preconditions and postconditions of the source and target operations in the composition overlap. Statically verified service compositions are then linked together in the linking subphase into more comprehensive service composition networks. Within such a network, nodes represent services, operations, operation interfaces, and parameters (see Fig. 2). Edges represent relationships among these nodes. These relationships include: an operation implementing an operation interface, an operation consuming the implementation of an operation interface, an entity providing a service, a service providing an operation, and an operation consuming or producing a parameter.

6 POSTSCREENING EVALUATION
Not all service compositions discovered during the screening phase are necessarily interesting and useful. The purpose of postscreening analysis and evaluation is to identify those that are truly interesting and useful. Postscreening evaluation consists of four distinct steps: objective evaluation, interactive hypothesis formulation, simulation, and subjective evaluation.

6.1 Objective Evaluation
Objective evaluation aims at using objective measures to evaluate the interestingness and usefulness of composed Web services. In this section, we first introduce the concepts of operation similarity, domain correlation, and domain unrelatedness. These concepts are used in our objective measures for interestingness and usefulness, which are described thereafter.

6.1.1 Operation Similarity
The concept of operation similarity is relevant when we study the interestingness of an indirect recognition-based composition. The similarity of two operations can be measured by comparing their input parameter sets, output parameter sets, preconditions, and postconditions. We use the following function to measure this similarity:

\[
Sim(op_i, op_j) = c_p \cdot \frac{|P_{in}(op_i) \cap P_{in}(op_j)|}{|P_{in}(op_i) \cup P_{in}(op_j)|} \cdot \frac{|P_{out}(op_i) \cap P_{out}(op_j)|}{|P_{out}(op_i) \cup P_{out}(op_j)|} + c_c \cdot \frac{|C_{pre}(op_i) \cap C_{pre}(op_j)|}{|C_{pre}(op_i) \cup C_{pre}(op_j)|} \cdot \frac{|C_{post}(op_i) \cap C_{post}(op_j)|}{|C_{post}(op_i) \cup C_{post}(op_j)|}, \tag{5}
\]

where $c_p$ and $c_c$ are weights such that $0 \le c_p, c_c \le 1$ and $c_p + c_c = 1$. $|P|$ and $|C|$ give the size of parameter set $P$ and condition set $C$, respectively. When dealing with parameters, operators $\cap$ and $\cup$ are based on ontological overlap and union of two concepts. For conditions, these two operators would find the overlap and union of two concepts that are either the same or synonyms. According to (5), $Sim(op_i, op_j)$ ranges from 0 to 1, with 1 indicating that the two operations have the same parameters and conditions. Note that (5) focuses only on the externally observable similarity between two operations.

6.1.2 Domain Correlation and Unrelatedness
Domain correlation $\tau$ measures the relevance of two domains $d_i$ and $d_j$ or the cohesion of the same domain (when $i = j$). The relevance of $d_i$ and $d_j$ can be reflected by the composability among operations from the two domains. When $i = j$, this relevance becomes the measure of cohesion of a single domain. Based on heuristics, domain correlation is defined as

\[
\tau[d_i, d_j] = e^{-\frac{1}{\varepsilon_0 (n+1)}}, \tag{6}
\]

where $n$ is the number of unique pairs of operations, $\{(op_i, op_j) \mid op_i \in d_i, op_j \in d_j\}$, which are previously known to have been involved in a composition. When $n = 0$, the correlation between two domains in (6) is assigned an initial value of $\tau_0 = e^{-1/\varepsilon_0}$. Equation (6) shows that $\tau$ for two domains quickly approaches 1 as $n$ increases. We define the multiplicative inverse of the domain correlation as domain unrelatedness $\bar{\tau}$. We bound the maximum value of $\bar{\tau}$ to 1, thus,

\[
\bar{\tau}[d_i, d_j] = \frac{\tau_0}{\tau[d_i, d_j]}. \tag{7}
\]

Both (6) and (7) can be used later to measure the diversity of components involved in a service composition, as discussed in the following section.
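The measures in (5), (6), and (7) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the paper: the two overlap ratios in each term of (5) are multiplied, the ontological overlap and union are approximated by plain set intersection and union, two empty sets are treated as fully overlapping, and the constant in (6) is given the hypothetical name `eps0` with an arbitrary default. The dictionary keys and function names are likewise illustrative only.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Plain-set overlap ratio |a & b| / |a | b| (assumed 1.0 when both are empty)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def operation_similarity(op_i: dict, op_j: dict,
                         c_p: float = 0.5, c_c: float = 0.5) -> float:
    """Equation (5), read as products of overlap ratios; expects c_p + c_c = 1."""
    params = jaccard(op_i["p_in"], op_j["p_in"]) * jaccard(op_i["p_out"], op_j["p_out"])
    conds = jaccard(op_i["c_pre"], op_j["c_pre"]) * jaccard(op_i["c_post"], op_j["c_post"])
    return c_p * params + c_c * conds


def domain_correlation(n: int, eps0: float = 0.5) -> float:
    """Equation (6): exp(-1 / (eps0 * (n + 1))) for n known composed operation pairs."""
    return math.exp(-1.0 / (eps0 * (n + 1)))


def domain_unrelatedness(n: int, eps0: float = 0.5) -> float:
    """Equation (7): initial correlation divided by current correlation (at most 1)."""
    return domain_correlation(0, eps0) / domain_correlation(n, eps0)
```

Under these assumptions, two operations with identical parameter and condition sets score 1, and unrelatedness starts at 1 for domains with no known compositions and decreases as more compositions between them become known.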
6.1.3 Objective Interestingness
Interestingness indicates how interesting a Web service composition discovered from the screening phase really is. We consider two types of application area where this needs to be determined. The first is e-government and e-commerce, where Quality of Web Service (QoWS) plays an important role in the case of direct recognition-based composition. We say that the composition is interesting if it exhibits better qualities than all previously discovered similar operations. The second application area is biological pathway discovery, where QoWS is less important as we do not expect any occurrences of direct recognition-based composition. In this section, we attempt to devise an objective interestingness measure factoring in various considerations for the other three recognition mechanisms that are applicable to both application areas. We define interestingness $I$ as a tuple $[A, N, U_q, D_v, W]$, where

• $A$ is the actionability of the composition,
• $N$ is the novelty of the composition,
• $U_q$ is the uniqueness of the composition,
• $D_v$ is the diversity of the composition constituents, and
• $W$ is the product of expert-assigned weights $w_i$ ($w_i > 1$) to DOINs and operation interfaces that are involved in a composition: $W = \prod_{i=1}^{m} w_i$. We choose to multiply all such weights involved in a composition to reflect their subjective interestingness-enhancing effect.

Actionability. We define actionability as a binary value (i.e., 1 for actionable, 0 for nonactionable) representing whether the composability of a composition can be verified through simulation or live execution. A nonactionable composition is considered uninteresting. Thus, actionability contributes multiplicatively toward the overall interestingness.

Novelty. We define novelty as a binary value (i.e., 1 for novel, 0 for old or known). The source of this information may be a database or registry that keeps track of known service compositions. An old or known composition is considered uninteresting. Thus, novelty also contributes multiplicatively toward the overall interestingness.

Uniqueness. Uniqueness $U_q$ measures how unique a composition is. We use the following function to calculate uniqueness:

\[
U_q = \begin{cases} 1, & \text{promotion or inhibition} \\ 1 - \max_{op \in D} Sim(comp(OP_s, op_t), op), & \text{indirect recognition.} \end{cases} \tag{8}
\]

For both promotion and inhibition, the uniqueness is set to 1 due to the validity of the composition. For indirect recognition, $D$ is a reference set of domains. Obviously, the more similar the composed operation is to an existing operation, the less unique it is regarded. $U_q$ can thus vary between 0 and 1.

Diversity. Diversity $D_v$ indicates how diverse the components involved in a Web service composition are. We use the following objective function to measure diversity:

\[
D_v = \begin{cases} \sum_{d_s \in D(s)} \bar{\tau}[d(op), d_s], & \text{promotion or inhibition} \\ \sum_{op \in OP_s} \bar{\tau}[d(op), d(op_t)], & \text{indirect recognition.} \end{cases} \tag{9}
\]

If we replace $\bar{\tau}$ with $\tau$, (9) can be rewritten as

\[
D_v = \begin{cases} \sum_{d_s \in D(s)} \frac{\tau_0}{\tau[d(op), d_s]}, & \text{promotion or inhibition} \\ \sum_{op \in OP_s} \frac{\tau_0}{\tau[d(op), d(op_t)]}, & \text{indirect recognition.} \end{cases} \tag{10}
\]

For promotion or inhibition involving $op$ and $s$, $D(s)$ is a set of domains $s$ is involved with. In all cases, $d(\cdot)$ gives the domain of a given service operation. According to (10), in the case of promotion or inhibition, the more domains $s$ is involved with and the less the correlation between these domains and that for $op$, the higher the value of diversity. In the case of indirect recognition, both the number and the domains of source operations $op \in OP_s$ contribute to the overall diversity. The more source operations that are involved in the composition and the less the correlation between their domains and that of the target operation, the higher the value of diversity.

6.1.4 Objective Usefulness
While it is difficult to objectively quantify the usefulness of new properties that emerge from a service composition using a simple usefulness measure, we take into consideration elements of usefulness that can be objectively evaluated for the cases of direct and indirect recognition. Due to the prevalent use of QoWS attributes in e-commerce and e-government, usefulness in these settings could be calculated for the following two cases:

Case 1. There are multiple operations competing to provide each of the input parameters of a completely bound operation within a group of relevant leads (GRL) centered around an operation interface. Each of these competing operations may, in turn, have implementations provided by multiple services. If the overall composition provides the same or similar function not seen before, then the usefulness can be at least partially calculated using the overall QoWS. This allows all the candidate compositions to be compared to find out which one of them exhibits the highest QoWS.

Case 2. If a composition provides a function that is the same as or similar to an existing function, usefulness could be expressed through its QoWS improvement achieved over the existing function.

In either case, QoWS-based usefulness can be determined statically if the QoWS values are registered over time and become widely known, or measured at runtime if they are vague and have to be determined dynamically.

6.1.5 Evaluation of Objective Interestingness and Usefulness
When multiple compositions are identified through the screening phase, the candidate pool for further consideration can be reduced by selecting service compositions that exhibit the highest values of objective interestingness and/or usefulness. This can be done using two approaches, namely, weighted function and skyline methods.
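The uniqueness and diversity measures of Section 6.1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity values and domain correlations are taken as precomputed inputs, the function names and argument shapes are hypothetical, and an empty list of reference similarities is assumed to yield a uniqueness of 1.

```python
def uniqueness(kind: str, similarities: list[float]) -> float:
    """Equation (8): 1 for promotion/inhibition, else 1 - max similarity to known operations."""
    if kind in ("promotion", "inhibition"):
        return 1.0
    # Assumption: with no reference operations to compare against, nothing is similar.
    return 1.0 - max(similarities, default=0.0)


def diversity(correlations: list[float], tau0: float) -> float:
    """Equation (10): sum of tau0 / tau over the domain pairs involved in the composition."""
    return sum(tau0 / tau for tau in correlations)
```

Here `correlations` would hold one correlation value per domain of the service (promotion or inhibition) or per source operation (indirect recognition), so adding more weakly correlated domains raises the diversity.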

Weighted function. A weighted function taking into account interestingness measures can be devised as follows:

\[
I = A N (c_u U_q + c_d D_v) \prod_{i=1}^{m} w_i, \tag{11}
\]

where $c_u$ and $c_d$ are weights for the weighted function such that $0 \le c_u, c_d \le 1$ and $c_u + c_d = 1$, and $m$ is the number of expert-assigned weights $w_i$ ($w_i > 1$) to operation interfaces and DOINs that are involved in a composition. Likewise, a similar weighted function on usefulness in the e-commerce environment can be calculated using

\[
U = \begin{cases} \sum_{q \in pos} w_q \frac{q - q_{min}}{q_{max} - q_{min}} + \sum_{q \in neg} w_q \frac{q_{max} - q}{q_{max} - q_{min}}, & \text{case 1,} \\ \sum_{q \in pos} w_q \frac{\Delta q}{q_{max} - q_{min}} - \sum_{q \in neg} w_q \frac{\Delta q}{q_{max} - q_{min}}, & \text{case 2,} \end{cases} \tag{12}
\]

where $pos$ is a set of quality attributes (e.g., reliability) that contribute positively toward the weighted function while $neg$ is a set of quality attributes (e.g., response time) that contribute negatively toward the weighted function. $w_q$ are weights assigned by users to each quality attribute such that $w_q \ge 0$ and $\sum w_q = 1$. $q$ represents the value of an aggregate lead QoWS attribute. $\Delta q = q_{composition} - q_{existing}$. $q_{max}$ and $q_{min}$ are, respectively, the maximum and minimum values of the same attribute among all leads in the GRL. The first half of (12) is essentially a combined form of the scaling phase and weighting phase proposed in [8]. The second half of (12) extends the function to the improvement of QoWS calculated between two compositions.

A weighted function, once configured with all the weights, can be rather simple to use. Unfortunately, the configuration itself requires the user to first express his/her preferences over several interestingness-related measures or quality attributes as numeric weights. Often the user has to go through a time-consuming trial and error process, as the data are being presented, to arrive at a desired combination of such weights.

Skyline. Another popular approach in selecting service compositions that exhibit high values of desired properties is the use of the skyline operator, which originates from the database community [9]. A skyline is defined as a set of objects that are not dominated by other objects. In a multi-objective environment such as those listed in the interestingness tuple, composition $comp_a$ dominates composition $comp_b$ if $comp_a$ exhibits a better value in at least one dimension than does $comp_b$ and values as good as or better than does $comp_b$ in all other dimensions. The skyline operator addresses well the problem faced by the weighted function as the user is not required to come up with an optimal combination of all the weights used in the weighted function.

6.1.6 Interestingness Simulation
In our simulation, we focus on investigating the interestingness skyline of service compositions. In particular, we focus on the study of interestingness of compositions obtained through indirect recognition since they require more computation according to (11), (8), and (10). Table 3 lists the configuration variables used in our experiment.

TABLE 3
Experiment Settings for Interestingness Simulation

Since both actionability and novelty are boolean variables, we ignore them in our simulation. During each iteration of our mining algorithms, we pick a value for the number of operation interfaces per domain and use that to populate operations in 50 domains. For each domain operation, we generate its input/output parameters such that the number of these parameters uniformly falls in the range of 0-5. Each of these parameters is associated with a DOIN, which is identified with a sequence number. For simplicity, we flatten all the DOINs (i.e., no inheritance and composition relationships among ontology nodes) so that only exact matches and synonyms will be considered. We place these DOINs (50,000 of them for the experiment) in a circular buffer so that the last sequence number is next to the first one. To study the contribution of user-assigned weights on DOINs toward the interestingness, we randomly choose 100 nodes ($50{,}000 \times 0.002$) using a uniform distribution and assign a weight uniformly distributed between 1.0 and 5.0. To simulate the cohesive nature of DOINs in a domain, we pick them for the domain using a Gaussian distribution around a mean sequence number randomly chosen for the domain according to a uniform distribution. We assume that each parameter has an equal chance of being associated with a DOIN. To simulate the pre- and postconditions, each parameter is symbolically given a range randomly chosen between 0 and 1.0 using a uniform distribution. We use the overlap of two such ranges (see (5)) to calculate the contribution of these conditions toward the similarity of two operations. During each mining iteration based on the chosen number of operation interfaces per domain, $n_o$, we calculate uniqueness, diversity, and weight product for compositions discovered in the iteration. These values are then normalized using the following equation:

\[
\bar{v} = \frac{v - v_{min}}{v_{max} - v_{min}}. \tag{13}
\]

The interestingness skylines can be shown as a surface formed in a 3D space with compositions' uniqueness $U_q$, diversity $D_v$, and weight product $\prod_{i=1}^{m} w_i$ as the coordinates. Fig. 6 uses circles to highlight skyline points. It shows the interestingness skylines for different numbers of operation interfaces per domain. We see that as this number increases, the number of discovered compositions also increases dramatically. However, the interestingness skyline keeps a population of top candidates with a relatively stable size.
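To make the two selection approaches concrete, the following Python sketch implements the weighted function (11), the normalization (13), and a naive skyline filter over (uniqueness, diversity, weight product) tuples. It is an illustration under assumptions, not the implementation used in the experiment: the quadratic pairwise scan, the handling of a constant column in the normalization, and all function names are choices made here for brevity.

```python
import math


def interestingness(a: int, n: int, u_q: float, d_v: float,
                    weights: list[float], c_u: float = 0.5, c_d: float = 0.5) -> float:
    """Equation (11): A * N * (c_u * U_q + c_d * D_v) * product of expert weights."""
    return a * n * (c_u * u_q + c_d * d_v) * math.prod(weights)


def normalize(values: list[float]) -> list[float]:
    """Equation (13): min-max scaling; a constant column maps to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dominates(a: tuple, b: tuple) -> bool:
    """a dominates b: at least as good in every dimension, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points: list[tuple]) -> list[tuple]:
    """Keep the points that no other point dominates (simple quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The skyline filter needs no weights at all, which is the advantage noted above; the price in this naive form is a pairwise comparison of all candidates, which dedicated skyline algorithms such as those in [9] are designed to avoid.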

Fig. 6. Skylines versus number of operations. (a) 50 operations/domain. (b) 200 operations/domain. (c) 500 operations/domain.

6.2 Interactive Hypothesis Formulation
To help the user formulate hypotheses that would lead to the identification of ultimately interesting and useful service compositions, we have developed strategies aimed at providing the user with visual aids toward that goal. We describe these strategies in this section and focus on the aspect of interestingness as we apply these strategies later to the discovery of pathways.

6.2.1 Identification of Interesting Segments in a Composition Network
After a service composition network is discovered from the screening phase, the identification of interesting segments within the network helps the user focus on them, as they are expected to become part of the final outcome from the service mining process. The evaluation strategies discussed in Section 6.1.5 can be used, in general, to identify compositions of high interestingness.

6.2.2 Establishment of Fully Connected Graph
Once interesting edges in a service composition or pathway network are highlighted to the user, he/she can then use them as hints in selecting nodes of interest for further exploration. We have developed the following strategy to link both interesting edges and user-selected nodes into a connected graph to the extent possible.
1. Coalesce nodes (e.g., a, b, and c in Fig. 7) linked by interesting edges into a group.
2. Convert interesting nodes (e.g., t picked by user) and groups encompassing interesting nodes (e.g., c, f) into nuclei, i.e., graph expansion focus nodes.
3. Incrementally expand all the nuclei. We use the heuristics of connecting all the interesting nodes using as many interesting edges as possible. To achieve this, whenever a newly encountered node is part of a nonnucleus group (e.g., one that contains h, i, and j), an additional expansion is also triggered and the whole group is engulfed. The expansion stops when all nuclei are connected or when all nodes in the graph are visited.
We omit the listing of the corresponding algorithms due to the page limit. Connected graphs identified using this process are then presented to the user as the basis for hypothesis formulation.

Fig. 7. Expansion of interesting segments in graph.

6.3 Runtime Simulation
Potentially composable Web services are identified in the screening phase based on ontologies, which contain static information about the semantics of those services. Such composability may not survive the dynamic runtime configuration that contains realistic external conditions and interdependencies among service operations. For example, a biological Web service may specify conditions necessary to enable or disable a biological process. These conditions may include temperature, parameter locale and quantity, kinetic energy, etc. While it may not be feasible to account for the external conditions and operation interdependencies during the screening phase since they would force it to become too selective and computationally more expensive, they need to be considered in the evaluation phase to ensure that the pathways identified in the screening phase really exist. The verification of pathway validity can be carried out using a simulation environment, where functions of biological Web services can be invoked in the order identified in pathway leads. A pathway lead identified in the screening phase indicates the potential existence of a pathway based on service and operation recognition. Verification aims at determining if segments of an identified pathway lead can indeed be enabled with a chain of relevant conditions. The

second important aspect of runtime simulation is its ability to support predictive analysis. For example, based on pathway leads established from the screening phase and later highlighted in the interactive hypothesis formulation subphase, the user may attempt to predict certain outcomes from indirect relationships derived from the way the pathway network is laid out. Such predictions can be tested using the simulation strategy outlined in Algorithm 4.

Algorithm 4. Simulation Algorithm
Input: Pathway network PN, function f() determining initial number of instances for an entity type, total number of iterations I, upper bound S for random number generator random with uniform distribution
Output: Statistics Stats
Variables: entity type et, entity instance container Container(et) of type et, operation op, input entity op_in, output entity op_out and precondition op_pre of op
1: for all et ∈ PN do
2:   Container(et) ← create f(et) instances;
3: end for
4: Stats ← Tally entity quantities in each container;
5: for i = 0 to I do
6:   for all op ∈ PN do
7:     s ← op.getProviderService();
8:     et_parameter ← op.getInputParameter().getEntityType();
9:     et_provider ← s.getProviderEntityType();
10:    if et_parameter = et_provider then
11:      n ← number of entities of type et_provider that match op_pre
12:    else
13:      n ← number of entities of type et_parameter
14:    end if
15:    // Calculate the number of times to invoke the operation
16:    n ← n/S + ((random.nextInt(S) < (n modulo S)) ? 1 : 0);
17:    for j = 0 to n do
18:      if ∃ op_in ∈ Container(et_parameter) : op_in matches op_pre then
19:        op_out ← invoke(op) with op_in;
20:        if et_parameter ≠ et_provider ∧ provider is consumable then
21:          Container(et_provider).remove(0);
22:        end if
23:        if et_parameter ≠ et_provider ∨ provider is consumable then
24:          Container(et_parameter).remove(op_in);
25:        end if
26:        et_parameter ← op_out.getEntityType();
27:        Container(et_parameter).add(op_out);
28:      end if
29:    end for
30:  end for
31:  Stats ← Tally entity quantities in each container;
32: end for

When an operation is to be invoked, the algorithm checks two factors. First, it examines whether all the preconditions of the operation are met. An operation that does not have available input entities meeting its preconditions should simply not be invoked. Second, it determines how many instances are available for providing the corresponding service. This factor is needed in the case of pathway discovery because each biological entity of a given type has a discrete service process that deals with input and output of a finite proportion. This differs from traditional business service processes that are often represented as collective singletons for a given organization (e.g., credit check, loan approval). The available instances of a particular biological entity that provides a service will drive the amount of various other entities they may consume and/or produce. For this reason, the algorithm treats each entity node in a pathway network as a container of entity instances of the noted ontology type. We determine the number of times an operation should be invoked based on the quantity of the corresponding service providing entity (lines 7-16). To make sure that an operation from a service providing entity of a small quantity also gets the chance to be invoked, a random number generator is used (line 16). Fig. 8 shows the logic used in Algorithm 4 for removing an entity in the corresponding entity container.

Fig. 8. Truth Table for Removing Entity Instance.

6.4 Subjective Evaluation
In addition to objective measures for both interestingness and usefulness, a user evaluating these aspects of a service composition may choose to use subjective measures. The reference base of such measures may be personal knowledge, belief, bias, and needs. Unfortunately, approaches based solely on subjective measures tend to inhibit us from getting interesting and useful compositions that were not thought of. An extreme case of relying on subjective measures to carry out Web service mining is the traditional composition approach where the user issues a query specifying the composition in pursuit to start the search process. Our approach allows objective measures to be used first to reduce the population size of the candidate pool and pushes the more expensive subjective evaluation toward the end, where the population size is presumably much smaller. Results from runtime simulation are presented to the user, who can then correlate them with the hypotheses made earlier to see if they can be either confirmed or rejected. Based on such analysis, the user may then make the ultimate determination as to whether the pathway under investigation is really interesting and useful.
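Returning briefly to the invocation count in line 16 of Algorithm 4 (Section 6.3), its effect can be seen in a small Python sketch. The sketch assumes that the division n/S is an integer division and that random.nextInt(S) draws uniformly from 0 to S-1; the function name and the use of Python's `random.Random` are choices made here for illustration.

```python
import random


def invocation_count(n: int, s: int, rng: random.Random) -> int:
    """Line 16 of Algorithm 4: n // s, plus one more with probability (n mod s) / s."""
    return n // s + (1 if rng.randrange(s) < n % s else 0)


# Usage: a provider with only 3 instances and S = 10 is still invoked
# in roughly 30 percent of the iterations instead of never.
rng = random.Random(42)
draws = [invocation_count(3, 10, rng) for _ in range(10_000)]
fraction_invoked = sum(draws) / len(draws)
```

Read this way, line 16 is a form of randomized rounding: the expected number of invocations equals n/S exactly, so scarce service providing entities are neither ignored nor over-represented.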

Fig. 9. Conceptual models of biological processes.

7 APPLICATION TO PATHWAY DISCOVERY
Limitations of existing biological process representation approaches motivated us to propose modeling these processes as Web services [10]. To demonstrate the effectiveness of our mining framework, we applied it to the discovery of pathways linking these service-oriented processes. To prepare for our experiment, we first compiled a list of conceptual models of biological processes based on [11], [12], [13], [14], and [15]. In addition to describing process models, these sources also reveal some simple relevant pathways that can be manually put together, as shown in Fig. 9, where each subfigure represents models constructed based on information obtained from a single source. Ontology concepts (Fig. 9a) are used by these models to define the type of service providing entities and operation input/output parameters. Multiple examples of promotion, inhibition, and indirect recognition can be found in these simple pathways. For example, Fig. 9c shows that upon injury, LTB4 recruits Neutrophil, promoting its service of producing COX2. Fig. 9d shows that Gastric Juice's service can inhibit the services of both Stomach Cell and Mucus. An example of indirect recognition can be found in Fig. 9e, where PLA2's service can liberate Arachidonic Acid, which can, in turn, be used as input to either the produce PGG2 operation of COX1's service or the produce PGE2 operation of the COX2 service. In practice, we envision that research labs (i.e., model sources) can publish their discoveries of individual biological processes independently using the vehicle of Web services. Based on these models, we constructed corresponding WSDL services, wrapped them using WSML [16] and deployed them into a WSMX [17] runtime environment [10]. We use simple pathways manually constructed here as references when we later check the correctness of pathways automatically discovered using our mining algorithms.

Fig. 10 gives a snapshot of a pathway network automatically discovered. To enable the identification of interesting service compositions (i.e., segments within a pathway network), we extended each WSML service modeling a biological process to declare the modeling source in its nonfunctional properties (nfp) section. Based on a comparison of such information from edges in the pathway graph involved in the recognition patterns in Figs. 2b, 2c, and 2d, our interestingness evaluation algorithm is then used to highlight those that are determined to be novel. These highlighted edges provide the user with visual clues aiding the manual selection of interesting nodes to pursue further. Once nodes of interest are selected by the user, our graph expansion algorithm (Section 6.2.2) is then used to link interesting nodes and edges into a connected graph, which forms the basis for the user to formulate hypotheses. An example of these hypotheses may state that an increased dosage of Aspirin will lead to the relief of pain, but may increase the risk of ulcer in the stomach. To test hypotheses such as these, an initial quantity representing units of service is assigned to all service providing entities at the beginning of the simulation (lines 1-3 in Algorithm 4). These quantities are expected to change as entities involved in the simulation interact with each other over time. From the two sample plots generated based on simulation results, we see that as the quantity of Aspirin increases from 10 in plot (a) to 40 in plot (b), there is an increase in the erosion

of stomach by the gastric juice due to the increased suppression on the production of mucus that covers the stomach wall. We also notice (in plot (a) and others that are not shown in Fig. 10) that when the senseRelief operation is enabled, it tends to obliterate the trace of Aspirin's impact on pain sensation due to the 'leaky bucket' effect it has on pain and relief signals. Once we disable this operation (see plot (b)) in our simulation, we see a dramatic association between the Aspirin dosage and the suppression on the amount of pain signal being generated. This, together with the observation of Aspirin's impact on stomach erosion as noted earlier, essentially confirms the initial hypothesis from the user.

Fig. 10. Discovered pathway highlighted with interesting subgraph and sample simulation results.

8 RELATED WORK
Web mining research focuses on applying data mining techniques to discover interesting patterns of data from the Web. In contrast, our research focuses on studying service behaviors that are intrinsically dynamic in nature, thus the need for dynamic invocation of services after the discovery of interesting service compositions. A comprehensive QoS-based service composition selection strategy is proposed in [8]. Our weighted function on usefulness (12) is based on this strategy. Ardagna and Pernici [18] take this a step further by considering the frequency of execution paths. In our framework, we do not assume that such frequency is readily available. Xiong et al. [19] investigate how to configure Web services in a dynamically changing environment. In this regard, our research aims at the quick identification of the best service compositions and thus focuses more on the initial selection of service compositions using interestingness and usefulness measures. Lamparter et al. [20] rely heavily on user preferences in the selection of Web services. Such preferences lead to a typical top-down service composition approach and are thus not taken advantage of in our approach. A number of feedback and log-based approaches have been proposed to improve QoS and service composability measures. For example, Jurca et al. [21] propose a QoS monitoring scheme based on quality ratings from service clients, Dustdar and Hoffmann [22] rely on analyzing Web service execution log data to discover potential process workflow instances involving these services, and Liang et al. [23] rely on usage data at user, template, and instance levels to mine for Web service composition patterns. While these approaches may work well for business processes over time as user feedback and execution logs are expected to become available, the challenge of identifying interesting workflows in the absence of such feedback and logs, especially at the time when component Web services are just introduced, is still real. Our Web service mining framework allows the mining of interesting service compositions to be carried out in the absence of user feedback and execution logs. When applied to the field of pathway discovery, where the expedience of such discovery is the key to success, our approach enables the proactive discovery of interesting pathways upon the availability of these services.

9 CONCLUSION
In this paper, we proposed a Web service mining framework that enables the proactive discovery of interesting and useful service compositions. To address the challenge of combinatorial explosion, we developed mining algorithms that can scale well with a growing number of Web services. We also discussed how interestingness and usefulness can be objectively evaluated. Finally, we presented a novel application of our framework to the discovery of pathways linking biological processes. Future

work includes improving the agility of our mining framework to accommodate the dynamic expansion and evolution of WSML services. This would not only allow the framework to be checked against an expanding pool of Web services, but more importantly, ensure that the results of the mining process are updated to reflect the current availability and semantic description of service capabilities.

George Zheng received the BS degree in electronics engineering from Shanghai Jiao Tong University, China, in 1986, the MS degree in electrical engineering from the University of Virginia, Charlottesville, in 1991, and the MS degree in computer science from Johns Hopkins University, Baltimore, Maryland, in 1997. He received the PhD degree in computer science from the Virginia Polytechnic Institute and State University, Blacksburg, in 2009. He is currently a principal systems engineer with Science Applications International Corporation (SAIC). His research interests include Web services mining, bioinformatics, workflow, software simulation, and systems integration.

Athman Bouguettaya received the PhD degree in computer science from the University of Colorado at Boulder in 1992. He is a science leader at CSIRO ICT Center, Canberra. He was previously a tenured faculty member in the Computer Science Department at Virginia Polytechnic Institute and State University (commonly known as Virginia Tech). He is on the editorial boards of several journals, including the IEEE Transactions on Services Computing, the International Journal on Web Services Research, the VLDB Journal, the Distributed and Parallel Databases Journal, and the International Journal of Cooperative Information Systems. He was invited to be a guest editor of a special issue of Computer on trust management in Web service environments and a special issue of Internet Computing on database technology on the Web. He also guest edited a special issue of the ACM Transactions on Internet on Semantic Web Services. He served as a program chair of the 2008 International Conference on Service Oriented Computing (ICSOC) and the IEEE RIDE Workshop on Web Services for E-Commerce and E-Government (RIDE-WS-ECEG 2004). He has served on numerous program committees of database and service-oriented computing conferences. His current research interests are in service-oriented computing. He is a senior member of the IEEE and the ACM.

REFERENCES
[1] Web Services Architecture, W3C Working Group Note, http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, Feb. 2004.
[2] G. Zheng and A. Bouguettaya, “A Web Service Mining Framework,” Proc. IEEE Int’l Conf. Web Services (ICWS ’07), July 2007.
[3] P. Ball, Designing the Molecular World: Chemistry at the Frontier. Princeton Univ. Press, 1994.
[4] OWL-S: Semantic Markup for Web Services, W3C Member Submission, http://www.w3.org/Submission/OWL-S/, Nov. 2004.
[5] Web Service Modeling Ontology, http://www.wsmo.org/, 2009.
[6] B. Medjahed, A. Bouguettaya, and A.K. Elmagarmid, “Composing Web Services on the Semantic Web,” VLDB J., Sept. 2003.
[7] J. Augen, “The Evolving Role of Information Technology in the Drug Discovery Process,” Drug Discovery Today, vol. 7, pp. 315-323, 2002.
[8] L. Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, “QoS-Aware Middleware for Web Services Composition,” IEEE Trans. Software Eng., vol. 30, no. 5, pp. 311-327, May 2004.
[9] S. Borzsonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” Proc. 17th Int’l Conf. Data Eng., pp. 421-430, 2001.
[10] G. Zheng and A. Bouguettaya, “Discovering Pathways of Service Oriented Biological Processes,” Proc. Ninth Int’l Conf. Web Information Systems Eng. (WISE ’08), Sept. 2008.
[11] S.Y. Auyang, “From Experience to Design: The Science behind
Aspirin,” http://www.creatingtechnology.org/biomed/aspirin.
htm, 2009.
[12] C. Freudenrich, “How Pain Works,”
http://health.howstuffworks.com/pain.htm, 2009.
[13] L. Hoffman, “How Aspirin Works,”
http://health.howstuffworks.com/aspirin1.htm, 2009.
[14] M. Landau, “Inflammatory Villain Turns Do-Gooder,”
http://focus.hms.harvard.edu/2001/Aug10_2001/immunology.html, 2009.
[15] M.-J. Yin, Y. Yamamoto, and R.B. Gaynor, “The Anti-Inflammatory
Agents Aspirin and Salicylate Inhibit the Activity of IκB kinase-β,”
Nature, vol. 396, pp. 77-80, Nov. 1998.
[16] The Web Service Modeling Language WSML,
http://www.wsmo.org/wsml/wsml-syntax, 2009.
[17] Web Services Execution Environment,
http://sourceforge.net/projects/wsmx, 2009.
[18] D. Ardagna and B. Pernici, “Global and Local QoS Constraints
Guarantee in Web Service Selection,” Proc. IEEE Int’l Conf. Web
Services (ICWS ’05), July 2005.
[19] P. Xiong, Y. Fan, and M. Zhou, “QoS-Aware Web Service
Configuration,” IEEE Trans. Systems, Man, and Cybernetics, Part
A, vol. 38, no. 4, pp. 888-895, 2008.
[20] S. Lamparter, A. Ankolekar, R. Studer, and S. Grimm,
“Preference-Based Selection of Highly Configurable Web Services,”
Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 1013-1022,
2007.
[21] R. Jurca, B. Faltings, and W. Binder, “Reliable QoS Monitoring
Based on Client Feedback,” Proc. 16th Int’l Conf. World Wide Web
(WWW ’07), pp. 1003-1012, 2007.
[22] S. Dustdar, T. Hoffmann, and W. van der Aalst, “Mining of Ad-
Hoc Business Processes with TeamLog,” Data and Knowledge Eng.,
http://citeseer.ist.psu.edu/dustdar04mining.html, 2005.
[23] Q.A. Liang, J.-Y. Chung, S. Miller, and Y. Ouyang, “Service
Pattern Discovery of Web Service Mining in Web Service Registry-
Repository,” Proc. IEEE Int’l Conf. e-Business Eng. (ICEBE ’06),
pp. 286-293, 2006.
