Conference Paper · May 2018


NL2API: A Framework for Bootstrapping Service Recommendation using Natural
Language Queries

Chen Lin∗, Anup K. Kalia†, Jin Xiao†, Maja Vukovic† and Nikos Anerousis†
∗Department of Computer Science
North Carolina State University, Raleigh, NC
Email: clin12@ncsu.edu
†IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
{anup.kalia, jinoaix, maja, nikos}@us.ibm.com

Abstract—Existing approaches to recommend services using natural language queries are supervised or unsupervised. Supervised approaches rely on a dataset of natural language queries annotated with categorizing labels. As the annotation process is manual and requires deep domain knowledge, these approaches are not readily applicable to new datasets. Unsupervised approaches overcome this limitation. To date, unsupervised approaches are primarily based on matching keywords, entity relationships, topics, and clusters. Keywords and entity relationships ignore the semantic similarity between a query and services. Topics and clusters capture the semantic similarity, but rely on mashups that explicitly capture relationships between services; for new services, this information is not readily available. We propose NL2API, a framework that relies solely on service descriptions for recommending services. NL2API has the benefit of being immediately applicable as a bootstrap recommender for new datasets. To capture relationships among services, NL2API provides different approaches to construct communities, where a community represents an abstraction over a group of services. Based on the communities and users' queries, NL2API applies a query matching approach to recommend top-k services. We evaluate NL2API on datasets collected from Programmable Web and API Harmony. Our evaluation shows that for sizable datasets such as Programmable Web, NL2API outperforms baseline approaches.

Keywords—service discovery, topic modeling, deep learning, community detection, clustering, web services

I. INTRODUCTION

Web services help users to develop applications by offering them a wide range of functionalities in domains such as IT, e-commerce, marketing, location, and communications. With the increasing number of services in different domains, the discovery process has become challenging.

Existing discovery techniques rely on two broad approaches: supervised and unsupervised. In supervised approaches, a training dataset for a domain is prepared in which natural language queries are annotated with specific services by domain experts [9] or by the crowd [14]. There are a few challenges associated with the supervised approach. First, the annotation process is time consuming and laborious. Second, annotations can be misleading and biased. Unsupervised approaches overcome these limitations by considering techniques such as keyword matching, entity relationship extraction, and extraction of topics and clusters. The keyword matching approach [11] and the entity relationship extraction technique [17] match users' natural language queries syntactically with service descriptions, thereby ignoring semantic similarity, which may lead to false positives. For example, consider a user who wants to find services related to bus: the response returns a service Message Bus that has no relation to the user's actual intent, i.e., transport services. In contrast, approaches based on the extraction of topics and clusters [13], [6], [5], [16] capture the semantic similarity between a user query and service descriptions. However, these approaches rely on mashups that explicitly capture the relationships between services, and mashups may not be available for all services.

In this paper, we propose a novel unsupervised framework, natural language to service APIs (NL2API), that recommends services based on users' queries and service descriptions. The approach does not rely on mashups, since they are not available for new services, especially if the services have not been used in any application. Thus, NL2API acts as a bootstrap service recommender for new services. NL2API identifies relationships between services by extracting communities based on the semantic similarity of their intents. Each community is represented as a tree-like structure in which the internal nodes represent hierarchical intents and the leaf nodes represent topics. A topic refers to a common theme across a group of services, whereas an intent refers to an abstraction over topics or other intents. NL2API identifies the communities most closely related to the user's query and then recommends top-k services with a confidence score from these communities using the latent semantic indexing (LSI) technique.

The proposed framework was evaluated on two datasets: Programmable Web and API Harmony [8]. The results show that the proposed framework can retrieve relevant results without external information such as mashups. The results also show that the proposed framework outperforms baseline approaches, especially when applied to a large dataset such as Programmable Web.
II. RELATED WORK

Existing approaches for service recommendation are either supervised or unsupervised.

A. Supervised Approaches

Su et al. [14] propose a framework for web service discovery. Since the framework relies on a supervised approach, Su et al. create training data based on inputs from the crowd. The training data consists of mappings between informal natural language commands and service calls. Su et al. provide an intermediate representation, or canonical command, for each service in terms of its HTTP verb, resource, return type, required parameters, and optional parameters. Then, the crowd is employed to paraphrase the canonical commands. They use an LSTM for training and evaluation on their annotated dataset. Their approach has limitations. One, collecting informal commands for a large repository of services can be challenging. Two, with the inclusion of new domains, the intermediate representation has to be refined.

Kalia et al. [9] provide service discovery in a service catalog based on natural language requests from users. Their approach relies on multiple labels associated with each service. For example, a service can be associated with a category, a task, and an action. To prepare the training data, Kalia et al. consider IT change requests and annotate them with multiple labels based on a service catalog. For building their model, they use classifier chain multilabel classification and improve the classification accuracy by providing feedback based on parameters extracted from users' requests. Their approach is limited to a specific catalog of services. For new catalogs, the annotation process can be cumbersome and requires extensive domain knowledge.

B. Unsupervised Approaches

Xiong et al. [17] provide a framework that considers users' personalized requirements and semantic graphs to generate recommendations for users. The semantic graph is constructed from natural language API descriptions. A semantic representation is captured as triples of the form (X, α, Y), where X and Y are entities and α is the string of words that intervenes between X and Y. Relationships are constructed by extracting typed dependency trees from descriptions. Using the relationships, they produce top-k recommendations. Their approach does not discover the semantic relationships between services based on their descriptions; thus, it may fail to recognize the users' intent from their queries.

Rahman et al. [11] provide an approach that uses crowdsourced knowledge for the recommendation of APIs. Their approach exploits the relationships of keywords extracted from Stack Overflow questions and answers with APIs based on their descriptions. One important limitation is that the approach relies on an external dataset such as Stack Overflow, and we assume that Stack Overflow might not provide data for a new domain. Another limitation is that they consider keyword-based matching, which ignores the semantic similarity of users' queries with service descriptions and the relationships between services.

Xie et al. [16] provide an approach that takes mashups (related services) as input and clusters them into groups based on their textual descriptions. For the clustering, they use the K-medoids algorithm. From the clusters, services are recommended based on the user's requirements. The approach has several limitations. One, the approach relies on mashups, which may not be present for a new domain. Two, the approach clusters mashups rather than services; thus, the identified clusters may not correctly depict the relationships between services.

Gao and Wu [5] provide a framework to recommend a set of services for mashup composition based on users' text inputs. In addition to matching services with users' requirements, they emphasize the quality of the services recommended. For recommending top-k services, rather than using traditional topic modeling techniques, they create a mapping between service descriptions and mashup composition. The supervised technique first learns the relevance between service descriptions and mashups; then, they apply clustering using K-means to recommend top-k services. They use the model learned from the supervised approach to improve the clustering. Clearly, the approach relies on mashup composition to cluster services, which may not be present for new domains.

Samanta and Liu [13] provide an approach for service recommendation that applies a Hierarchical Dirichlet Process to services to generate topics. The approach considers mashups as its input to generate topics. A cosine similarity between the topics generated from services and mashups is computed to generate a candidate list of services. Further, the usage history of services in mashups is used to produce the top-k services from the candidate list. Hao et al. [6] propose an approach to refine service descriptions for a specific query. The approach relies on mashup descriptions that provide application scenarios for services; such descriptions provide additional information that can be used to reconstruct the original service descriptions. For the reconstruction, Hao et al. use Latent Dirichlet Allocation to extract topics from the mashup descriptions and the query given by the user. Both approaches rely on mashups, which may not be present for all kinds of domains.

III. THE NL2API FRAMEWORK

In this section, we present an overview of our proposed framework, NL2API. Our framework has three components: a preprocessing unit, a community extractor, and a query matcher. In the following sections, we present each of the components in greater detail.
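As a concrete, deliberately simplistic illustration of how the three components fit together, the toy sketch below uses plain word overlap in place of the topic modeling, community detection, and LSI techniques described in the following sections; all names here are ours for illustration, not the authors' implementation.

```python
STOP = {"and", "the", "a", "an", "of", "for", "to"}

def toy_nl2api(query, services, k=2):
    """Toy end-to-end sketch of the NL2API pipeline: preprocess
    descriptions, group services into crude 'communities', then match
    the query. `services` maps service name -> description."""
    # 1) Preprocessing unit: lowercase, tokenize, drop stop words.
    docs = {name: set(desc.lower().split()) - STOP
            for name, desc in services.items()}
    # 2) Community extractor: greedily group services that share a term
    #    (a crude stand-in for topic/community extraction).
    communities = []
    for name, words in docs.items():
        for com in communities:
            if words & com["words"]:
                com["services"].append(name)
                com["words"] |= words
                break
        else:
            communities.append({"services": [name], "words": set(words)})
    # 3) Query matcher: pick the community with the largest term overlap,
    #    then rank its member services by overlap with the query.
    q = set(query.lower().split()) - STOP
    best = max(communities, key=lambda c: len(q & c["words"]))
    ranked = sorted(best["services"],
                    key=lambda n: len(q & docs[n]), reverse=True)
    return ranked[:k]
```

A query such as "bus schedule" would first select the transport-like community and then rank its services, mirroring the community-then-service matching order described below.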
A. The Preprocessing Unit

The preprocessing unit takes natural language service descriptions as input and preprocesses them using a pipeline that removes punctuation, applies a part-of-speech (POS) tagger, and removes frequent, infrequent, and corpus-specific stop words. In our framework, we removed infrequent words that appeared in three or fewer service descriptions and frequent words that appear in more than 10% of the service descriptions. For corpus-specific stop words, we heuristically select low-quality topics and remove the most frequent terms in those topics. Table I shows examples of such words.
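A minimal sketch of the frequency-based filtering described above (the POS-tagging step is omitted; the thresholds follow the text, while the function name and the sample corpus-specific stop words, taken from Table I, are illustrative):

```python
import re
from collections import Counter

def preprocess(descriptions, min_df=4, max_df_ratio=0.10,
               corpus_stopwords=frozenset({"management", "feature",
                                           "app", "xml"})):
    """Tokenize service descriptions and drop infrequent words
    (appearing in 3 or fewer descriptions), frequent words (appearing
    in more than 10% of descriptions), and corpus-specific stop words."""
    # Lowercase and keep alphabetic tokens only (punctuation removed).
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in descriptions]
    # Document frequency: number of descriptions containing each word.
    df = Counter(w for doc in tokenized for w in set(doc))
    n_docs = len(tokenized)
    def keep(w):
        return (df[w] >= min_df
                and df[w] <= max_df_ratio * n_docs
                and w not in corpus_stopwords)
    return [[w for w in doc if keep(w)] for doc in tokenized]
```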

Table I
EXAMPLES OF MOST FREQUENT, INFREQUENT, AND CORPUS-SPECIFIC STOP WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB.

Type | Words
Frequent words | service, time, method, platform, developer, api, web, system, site, response, functionality
Infrequent words | phyloinformatic, carregistrationapi, myhurricane, atomicreach, hotukdeal, polypeptide, cooladata, eventcategorie, changeavatar, zubhium, americascup
Corpus-specific stop words | management, feature, message, solution, integration, app, tool, function, client, request, xml, way, provider

Figure 1. Examples of communities constructed from service descriptions.

B. The Community Extractor

A community is a hierarchical structure that represents a set of topics related to a common intent. Within each community (Figure 1), the structure can be constructed as a tree where the internal nodes represent intents and the leaf nodes represent topics. Each topic is inferred from a list of service descriptions. Note that the depth of the hierarchical tree indicates intents at different levels of abstraction. Intents closer to the root node represent abstract and general intents (e.g., investment), whereas intents closer to the topics represent detailed and specific intents (e.g., stock exchange). Topics that share a common ancestor have a higher degree of relevance. For example, in Figure 1, the topics "music", "movie", and "video" are closely related. Similarly, the topics "location", "route", and "map" are related.

To extract communities, we utilize representative topic modeling and clustering techniques, and devise solutions that integrate them into our desired tree structure. Accordingly, we propose three solutions. One, the baseline approach uses a topic modeling technique where each learned topic forms a single community. Note that the baseline approach does not construct a hierarchical intent tree. Two, the bottom-up approach first infers topics, then applies a community detection method to identify communities. Three, the top-down approach learns a latent high-level, low-dimensional vector representation for each service and then groups similar vectors into communities using a k-means clustering approach. Finally, a topic modeling method is applied to the services within each cluster. Detailed descriptions are provided in the following paragraphs.

1) Topic Modeling (TNMF): Existing topic modeling techniques consider well-known algorithms such as Latent Dirichlet Allocation (LDA) [1] and non-negative matrix factorization (NMF) [3]. However, empirical studies [12], [19] have shown that their effectiveness greatly reduces as the length of documents decreases. Considering that most service descriptions are short sentences, we adopt non-negative matrix factorization based on the term correlation matrix (TNMF) [18], as the method has been proven to outperform LDA, NMF, probabilistic latent semantic analysis (PLSA), graph regularized NMF [3], and symmetric NMF [10] on multiple datasets containing short texts.

TNMF [18] is a method designed exclusively for discovering topics, especially from short texts. Traditional NMF topic modeling decomposes the term-document matrix, which indicates term occurrences in each document. However, for short texts, the term-document matrix can be extremely sparse, which prevents the model from learning reliable topics. TNMF tackles the problem by assuming that terms co-occurring frequently are most likely relevant to a common topic. For example, if the terms address and zipcode co-occur in several documents at the same time, they are more likely to be about a common topic location. Thus, TNMF learns reliable topics by decomposing the term-correlation matrix [18] instead; this matrix does not suffer from the sparsity problem, and in fact its size remains stable even as the number of documents grows. Examples of topics and topic words extracted from service descriptions in Programmable Web are shown in Table II.

Table II
EXAMPLES OF TOPICS AND TOPIC WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB.

Topic | Topic Words
Topic 1 | train, bus, commuter, timetable, transport, departure, transit, journey
Topic 2 | wind, humidity, weather, precipitation, barometric, temperature, irradiance, forecast
Topic 3 | student, class, course, education, school, college, instructor, assignment
Topic 4 | election, committee, voter, vote, ballot, voting, candidate, privilege
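The term co-occurrence signal that TNMF factorizes can be sketched as follows; this builds only the raw pair counts behind the term-correlation matrix (the factorization itself is described in [18]), and the function name is ours:

```python
from collections import Counter
from itertools import combinations

def term_cooccurrence(docs):
    """For each unordered term pair, count the documents in which both
    terms appear together - the raw signal behind TNMF's
    term-correlation matrix. The resulting matrix is vocabulary-sized,
    so it does not grow sparser as more short documents are added."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs
```

For the address/zipcode example above, frequent co-occurrence yields a large matrix entry, which the subsequent factorization can then group into a single location-like topic.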
2) Topic Modeling (TNMF) + Louvain's Community Detection (LCD): The second approach first learns topics using TNMF, then uses Louvain's Community Detection (LCD) [4] to extract communities from a constructed network that models pairwise associations between topics and services. Specifically, the network is constructed based on the learned topic-document matrix V. In the network graph, nodes represent services (or documents) and topics. A weighted edge is formed between a topic and a service if the corresponding entry in V is non-zero; the weight is the value of that entry. Figure 2 shows an example of such a network. After the network is constructed, LCD is applied to extract communities from it.

Figure 2. The network is created using topics and services. Communities are derived by applying the LCD approach.

LCD is a greedy optimization algorithm; the value optimized by LCD is a scalar called modularity (ranging between -1 and 1). Modularity measures the density of edges inside communities relative to edges outside communities. Optimizing modularity would lead to the best possible partition into communities, but iterating through all possible partitions is highly impractical. Thus, LCD relies on a heuristic approach in which the following two phases are repeated iteratively until convergence.

In the first phase, all nodes are assigned to their own communities. The change of modularity is computed for moving a node i from its own community into a neighboring community j (containing only node j at the beginning), using the following heuristic function:

\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right]  (1)

where \Sigma_{in} is the sum of the weights of the edges inside the community that i is moving into, \Sigma_{tot} is the sum of the weights of the edges incident to nodes in that community, k_i is the weighted degree of i, k_{i,in} is the sum of the weights of the links between i and the other nodes in the community, and m is the sum of the weights of all edges in the network. Once the value is computed for all communities that node i is connected to, node i is moved into the community that yields the greatest \Delta Q. The process is applied sequentially to all nodes in the network.

In the second phase, the nodes in the same community are grouped together: the entire community is treated as a single node, and a new network is constructed. Then, the first phase is reapplied. The procedure is repeated until no change in modularity larger than a predefined threshold is possible. For a more detailed description of LCD, please refer to [2]. In Table III we show examples of communities extracted from service descriptions in Programmable Web using TNMF and LCD.

Table III
EXAMPLES OF COMMUNITIES, TOPICS, AND TOPIC WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB USING TNMF AND LCD.

Communities | Topics | Topic Words
Community 1 | Topic 1 | mining, block, bitcoin, currency
            | Topic 2 | equity, market, stock, exchange
            | Topic 3 | sell, audit, buy, priority
Community 2 | Topic 4 | gene, research, metabolic
            | Topic 5 | sequence, protein, amino, acid

3) LSTM-based Autoencoder + K-means Clustering: The third approach explores a language generation task: training a Long Short-Term Memory (LSTM) based autoencoder to build an embedding of the service descriptions and then decoding the embedding to reconstruct the original sentence. The embedding can be seen as a high-level, low-dimensional representation of the original service descriptions. The K-means clustering method is used to partition the service embeddings into different clusters (or communities).

LSTM [7] has frequently been used for language generation tasks such as machine translation and parsing. The power of LSTM comes from its ability to capture local dependencies between words: neighboring words are combined to express a particular meaning. It is a special type of Recurrent Neural Network (RNN) that can avoid the vanishing (and exploding) gradient problem. An LSTM unit contains three major components: Forget, Input, and Output. The components interact with each other to control how information flows. The Forget component determines what information from the previous memory cell has expired and should be thrown away. The Input component determines what information is new and requires updating. The Output component is an activation function that filters the value from a memory cell.
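The modularity gain of Equation (1) can be computed directly from the five quantities it uses; the helper below is a sketch with our own variable names:

```python
def delta_q(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Modularity gain (Eq. 1) for moving node i into a community.
    sigma_in:  sum of weights of edges inside the target community
    sigma_tot: sum of weights of edges incident to the community
    k_i:       weighted degree of node i
    k_i_in:    sum of weights of links from i into the community
    m:         sum of the weights of all edges in the network"""
    after = ((sigma_in + k_i_in) / (2 * m)
             - ((sigma_tot + k_i) / (2 * m)) ** 2)
    before = (sigma_in / (2 * m)
              - (sigma_tot / (2 * m)) ** 2
              - (k_i / (2 * m)) ** 2)
    return after - before
```

A node whose links all point into the candidate community yields a positive gain, while a node with no links into it yields a negative one; this difference is what drives the greedy moves in the first phase.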
The LSTM-based autoencoder shown in Figure 3 creates a compact representation of service descriptions. The autoencoder is a neural model that consists of two LSTMs. One LSTM encodes a sequence of words into a fixed-length vector representation; the output of the last LSTM unit produces an embedding of the service description. The second LSTM decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. In the example shown in Figure 3, the target sequence and the source sequence are the same, since the autoencoder aims at obtaining a representation that fully captures the content of its input; a good representation retains a significant amount of information about the input.

Figure 3. LSTM-based autoencoder.

After the low-dimensional representation is obtained, we apply the K-means clustering method to partition the services into K specified clusters (or communities). Finally, TNMF is applied to the services within each community to extract topics per community. An example is demonstrated in Table IV.

Table IV
EXAMPLES OF COMMUNITIES, TOPICS, AND TOPIC WORDS EXTRACTED FROM SERVICE DESCRIPTIONS IN PROGRAMMABLE WEB USING THE LSTM AUTOENCODER, K-MEANS, AND TNMF.

Communities | Topics | Topic Words
Community 1 | Topic 1 | currency, trading, market
            | Topic 2 | product, sale, order, price
            | Topic 3 | advertiser, publisher, revenue
Community 2 | Topic 4 | sequence, protein, structure
            | Topic 5 | analysis, search, dataset, result
            | Topic 6 | database, gene, record, protein

C. The Query Matcher

The query matcher recommends a list of services based on the query from a user. Given a query, we check which communities and underlying topics the query is related to. To check the relatedness, we compute a matching score M(query, topic) between the query and a topic, i.e., the sum of the similarity scores between each word in the query and each top keyword in the given topic. We compute the word similarity using the well-known Wu-Palmer score [15], which calculates the relatedness of two word senses by computing the path length from the least common subsumer of the two word senses to the root node, where the least common subsumer captures the most specific concept the senses share.

The matching score is given as follows:

M(query, topic) = \sum_{u \in query} \sum_{w \in topic} Similarity(u, w)  (2)

Similarity(u, w) = \max_{i,j} WuPalmer(u_i, w_j)  (3)

where u are the words in the query, w are the topic terms, and u_i and w_j are the word senses of u and w, respectively. For a given query, we select the topics with the highest M(query, topic) and the intents that are the ancestors of the identified topics. If the topics under consideration belong to different communities and do not share a common ancestor, then the parent nodes of the topics are identified. The services under the identified intents are returned as candidates.

Given the set of candidate services, Latent Semantic Indexing (LSI) is used to calculate a matching score between the candidate services and the query. The services with the highest matching scores are recommended to the user.

IV. EVALUATION

We evaluate NL2API on two datasets: Programmable Web and API Harmony.

Programmable Web is one of the largest repositories for discovering and searching application programming interfaces (APIs), or services, to be used in web or mobile applications. Every day, new services are added to Programmable Web to make them available to developers. In addition to services, Programmable Web provides mashup data, source code for using APIs, and a list of frameworks, libraries, and SDKs, each of which is connected to API-driven development.

API Harmony provides a catalog of web services. API Harmony helps a developer to better understand APIs through executable examples and by providing the historical usage of a specific service. API Harmony has three features. One, developers can use API Harmony to discover services; it uses past search information to recommend services for a new search query. Two, developers can learn about services based on their documentation, community discussions, and sample code. Three, API Harmony provides developers with examples and historical usage showing how to integrate a specific service into their applications.

Table V shows the count of services we consider for each dataset. For Programmable Web, we developed a crawler to mine services along with their names, descriptions, and categories. For API Harmony, we collected the services directly
from its authors. Each service has a name, description, URL paths, and parameters.

Table V
COUNTS OF SERVICES IN EACH DATASET.

Dataset | # of Services
Programmable Web | 12372
API Harmony | 1180

We evaluate the NL2API framework in terms of coverage. Coverage measures the number of relevant recommendations returned by a specific approach, i.e., (1) TNMF, (2) TNMF and LCD, and (3) LSTM and K-means. A recommendation is relevant if it is an element of the ground truth.

A. Experiment Design

To evaluate NL2API, we need ground truth. As all of the proposed approaches are unsupervised, determining the ground truth is challenging. Traditional approaches rely on mashups, from which we can get information about the relationship of one service to another. However, NL2API is agnostic of mashups.

One way to collect the ground truth is to expose NL2API to a crowd and ask them to provide queries and recommendations. However, this approach is quite limiting, as crowd workers may not have the expertise to provide relevant queries and interpret recommendations. To overcome the limitation, Su et al. [14] provide an intermediate representation of a service for the crowd to generate queries; however, scaling that approach to multiple datasets with a large number of services is infeasible. Thus, we provide an experimental procedure that is agnostic to the crowd and can be applied to any unsupervised approach, as follows:

Step 1. Generate a set of intents. For each dataset, we randomly select N descriptions. We hire two investigators with sufficient and similar technical expertise to read the N descriptions and produce N intents they both agree on. An intent here refers to a description that concisely specifies the type of service a user is looking for. Both investigators must agree on the semantic interpretation of all of the intents produced.

Step 2. Determine ground truths for each intent. For each intent in N, a ground truth result set is constructed. Ideally, the ground truth set is formed by manually evaluating every service in a dataset and selecting all services that match the intent. However, this is infeasible due to the large size of the datasets, especially Programmable Web. The investigators hence approximate the exhaustive search by conducting a series of keyword-based searches on the dataset: first, the investigators independently formulate likely keyword queries based on an intent and collect the results that match the intent. Then, they further generate new keywords based on the descriptions of the services identified. The process is iterated until there are no new services matching the intent and all search queries have been exhausted. Finally, for each intent, the investigators merge their independent sets of valid services to produce a final ground truth set.

Step 3. Generate queries for each intent. We hire a third investigator to produce queries for each intent identified in Step 1. Generated queries can be of two types: one based on a set of keywords and another based on semantics, i.e., well-formed natural language phrases. For example, to search for speech-to-text services, one can search using keywords such as speech to text and tools for speech to text, or using semantics such as get speech to text services and find platforms that convert speech to text.

Step 4. Compute coverage. We ask the third investigator to provide the queries consisting of keywords and semantics for each intent. We run the queries utilizing a search approach in NL2API to obtain a result set under that specific method. The intersection of the result set with the ground truth set of that intent yields a valid recommendation set, and the ratio of the valid recommendation set over the ground truth set of the intent yields the coverage measurement for the specific search approach for that intent.

Coverage is therefore a normalized ratio capturing how well a search method worked for a given intent. We multiply the ratio by 100 to obtain a percentage score.

• Keywords Coverage. Measures the performance of keyword queries.

• Semantics Coverage. Measures the performance of semantic queries.

• Total Coverage. Measures the overall recommendation performance from both keyword and semantic queries.

V. RESULTS

We apply the steps given in Section IV-A to the datasets from Programmable Web and API Harmony. For each dataset we randomly select five (N = 5) service descriptions. For each description, the two investigators come up with five intents. For example, for a description such as Picwing provides easy photo sharing, bridging online sharing and offline sharing . . ., the agreed intent could be put photo online for sharing. We provide the intents to the third investigator, who generates queries and searches for valid recommendations using all three approaches. For example, for an intent such as put photo online for sharing, the third investigator can suggest a query based on keywords such as image sharing or semantics such as share photos with my friends.

For each approach and each query in an intent, we obtain the recommendations corresponding to that approach and that intent, and compute three coverage scores: keywords,
Figure 4. Boxplots comparing the approaches (TNMF, TNMF+LCD, LSTM+K-Means) applied on datasets from Programmable Web (PW) and API Harmony (AH): (a) total coverage (PW); (b) keywords coverage (PW); (c) semantics coverage (PW); (d) total coverage (AH); (e) keywords coverage (AH); (f) semantics coverage (AH).

semantics, and total coverage. Figure 4 shows boxplots of the coverage performance for each approach, obtained by averaging the coverage scores across the 5 intents. The exercise is conducted for Programmable Web (PW) and for API Harmony (AH).

Total Coverage. For the Programmable Web dataset, we find that the medians of the total coverage for the approaches TNMF + LCD (10.41%) and LSTM + K-means (9.12%) outperform the baseline approach TNMF (9.02%). However, for the API Harmony dataset, we find that the median of the total coverage for the approach TNMF (28.35%) outperforms the approaches TNMF + LCD (10.75%) and LSTM + K-means (8.48%). The observations show that for a large dataset such as Programmable Web, extracting communities could be beneficial in discovering closely related services, whereas for a small dataset such as API Harmony, a baseline approach is sufficient to discover relevant services. These results suggest that with increasing dataset size, our approach becomes more beneficial to developers.

Keyword Coverage. For the Programmable Web dataset, we find that the medians of the keyword coverage for the approaches TNMF (8.33%) and TNMF + LCD (8.33%) outperform the LSTM + K-means (3.33%) approach. For the API Harmony dataset, we find that the median of the keyword coverage for the approach TNMF (30.79%) outperforms the approaches TNMF + LCD (6.32%) and LSTM + K-means (5.09%). From these observations we find that for searching services based on keyword queries, topic modeling approaches based on TNMF can provide relevant recommendations. Additionally, for the Programmable Web dataset, the standard deviations (std) and variances (var) for the TNMF + LCD (std = …, var = 66.93%) and LSTM + K-means (std = 7.5%, var = 56.56%) approaches are significantly larger than for the TNMF approach (std = 0.83%, var = 0.69%). The observation suggests that the output based on the constructed communities can vary largely from one query to another, i.e., for some queries we can get a much higher coverage than for others; this may not hold for the TNMF approach. For the API Harmony dataset, we find that the standard deviation (std) and variance (var) for the TNMF approach are larger than for the TNMF + LCD and LSTM + K-means approaches. The observation suggests that a larger dataset can be crucial to improving the performance of the recommendations.

Semantic Coverage. For the Programmable Web dataset, we find that the median of the semantic coverage for the approach TNMF + LCD (12.5%) outperforms the approaches TNMF (9.7%) and LSTM + K-means (9.7%). For the API Harmony dataset, we find that the median of the semantic coverage for the TNMF approach (23.22%) outperforms the TNMF + LCD (13.02%) and LSTM + K-means (7.97%) approaches. From these observations we find that for a larger dataset such as Programmable Web, the communities extracted using LCD could be useful in discovering relevant services based on semantic queries. In addition, we observe that for the Programmable Web dataset, the standard deviations (std) and variances (var) for the TNMF + LCD (std = 13.2%, var = 174.7%) and LSTM + K-means (std = 5.8%, var = 33.7%) approaches are significantly larger than for the TNMF approach (std = 4.49%, var = 20.17%). The observation suggests that based on the communities constructed
dataset we find that the standard deviations (std) and vari- we can get higher coverage for certain queries. It may not
ances (var) for the TNMF + LCD (std = 8.18%, var = hold for the TNMF approach. For the API Harmony dataset,
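The spread comparisons above reduce to computing the median, standard deviation, and variance of per-query coverage scores for each approach. A minimal sketch in Python (the score lists below are illustrative placeholders, not the paper's actual per-query data):

```python
import statistics

def coverage_stats(per_query_coverage):
    """Summarize per-query coverage scores (in percent) for one approach."""
    return {
        "median": statistics.median(per_query_coverage),
        "std": statistics.pstdev(per_query_coverage),
        "var": statistics.pvariance(per_query_coverage),
    }

# Hypothetical per-query coverage for a community-based approach and a baseline.
community_based = [2.0, 5.5, 12.5, 20.0, 35.0]
baseline = [5.0, 8.0, 9.7, 12.0, 15.0]

# A larger std/var means coverage varies strongly from one query to another,
# which is the pattern observed for the community-based approaches above.
assert coverage_stats(community_based)["std"] > coverage_stats(baseline)["std"]
```

In practice, such statistics would be computed per approach and per coverage type (keyword, semantic, total), then visualized as boxplots like those in Figure 4.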
For the API Harmony dataset, we find that the standard deviation and variance for the TNMF + LCD approach are larger than those for the TNMF and LSTM + K-means approaches. This suggests that for semantic queries, generating communities from topics is beneficial for better recommendations irrespective of the dataset size.

VI. DISCUSSION

We present NL2API, a framework for recommending top-k services based on three different approaches. The approaches are agnostic of mashups or pre-specified mappings from natural language queries to services. Thus, NL2API can act as a quick bootstrapping recommender for datasets that do not have mashups or a prior history of service usage. We evaluate NL2API on two different datasets, Programmable Web and API Harmony. Our findings show that for larger datasets such as Programmable Web, the TNMF + LCD and LSTM + K-means approaches provide significantly higher coverage than the baseline TNMF approach. However, for a smaller dataset such as API Harmony, the baseline approach can produce better results than the other approaches.

In the future, we plan to further expand our investigation. One, we plan to create a larger repository of services. Two, we plan to provide a more efficient query matching technique, such as one based on word vectors, to discover relevant communities generated by the TNMF + LCD and LSTM + K-means approaches. Three, we plan to perform an extensive user study to further evaluate the effectiveness of our approaches.

REFERENCES

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.

[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.

[3] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.

[4] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.

[5] W. Gao and J. Wu. A novel framework for service set recommendation in mashup creation. In Proceedings of the IEEE International Conference on Web Services, pages 65–72, Honolulu, Jun 2017. IEEE.

[6] Y. Hao, Y. Fan, W. Tan, and J. Zhang. Service recommendation based on targeted reconstruction of service descriptions. In Proceedings of the IEEE International Conference on Web Services, pages 285–292, Honolulu, Jun 2017. IEEE.

[7] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[8] IBM. API Harmony, 2017.

[9] A. K. Kalia, J. Xiao, M. F. Bulut, N. Anerousis, and M. Vukovic. Cataloger: Catalog recommendation service for IT change requests. In Proceedings of the International Conference on Service-Oriented Computing, pages 545–560, Malaga, Nov 2017. Springer.

[10] D. Kuang, C. Ding, and H. Park. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 106–117. SIAM, 2012.

[11] M. M. Rahman, C. K. Roy, and D. Lo. RACK: Automatic API recommendation using crowdsourced knowledge. In Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, pages 349–359, Klagenfurt, 2016. IEEE.

[12] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 487–494. AUAI Press, 2004.

[13] P. Samanta and X. Liu. Recommending services for new mashups through service factors and top-k neighbors. In Proceedings of the IEEE International Conference on Web Services, pages 381–388, Honolulu, Jun 2017. IEEE.

[14] Y. Su, A. H. Awadallah, M. Khabsa, P. Pantel, and M. Gamon. Building natural language interfaces to web APIs. In Proceedings of the 26th ACM International Conference on Information and Knowledge Management, pages 1–10, Singapore, Nov 2017. ACM.

[15] Z. Wu and M. Palmer. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138. Association for Computational Linguistics, 1994.

[16] F. Xie, J. Liu, M. Tang, D. Zhou, B. Cao, and M. Shi. Multi-relation based manifold ranking algorithm for API recommendation. In G. Wang, Y. Han, and G. M. Pérez, editors, Proceedings of the 10th Asia-Pacific Services Computing Conference, pages 15–32, Zhangjiajie, 2016. Springer.

[17] W. Xiong, Z. Wu, B. Li, Q. Gu, L. Yuan, and B. Hang. Inferring service recommendation from natural language API descriptions. In Proceedings of the IEEE International Conference on Web Services, pages 316–323, San Francisco, Jun 2016. IEEE.

[18] X. Yan, J. Guo, S. Liu, X. Cheng, and Y. Wang. Learning topics in short texts by non-negative matrix factorization on term correlation matrix. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 749–757, San Diego, 2013. SIAM.

[19] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. In Proceedings of the European Conference on Information Retrieval, pages 338–349. Springer, 2011.
