
Searching Ontologies Based on Content: Experiments in the Biomedical Domain: A Review

Nur Shamilla Selamat (1), Mdm Rozilawati binti Dollah @ Md Zain (2)

(1) Department of Information Systems, Faculty of Computing, Universiti Teknologi Malaysia,
81310 Johor Bahru, Johor, Malaysia
nurshamillaselamat@gmail.com

(2) Department of Information Systems, Faculty of Computing, Universiti Teknologi Malaysia,
81310 Johor Bahru, Johor, Malaysia
rozilawati@utm.my

Abstract

Searching a collection of ontologies according to a user's query is hard, as
the ontologies that are most relevant to the query, more often than not, do
not have the query term in the names of their concepts. A study by
Alani, Noy, Shah, Shadbolt, & Musen (2007) addresses this problem by
retrieving from the Web a collection of terms that match the user's query
term and using them to expand the query, thus improving the results of the
search by 113%.

Keywords: Ontology Searching, Biomedical Ontologies, Ontology Analysis

1.0 Ontology Search

Ontologies are one of the essential components of the Semantic Web. At the same time, because
an abundance of ontologies is available, it is difficult for users to find the ontologies that are
relevant to their domain of interest. The reason is that the ontologies most relevant to a
particular domain, more often than not, do not contain the name of the domain itself in the
names of their classes, properties, and property values.
A study by Alani, Noy, Shah, Shadbolt, & Musen (2007) defines the search problem for an
ontology repository R as follows: given a user query Q, R returns a collection of ontologies
relevant to Q. The authors view ontology search and ontology ranking as two complementary
sides of the problem of finding the most relevant ontologies.
To understand how users search for ontologies, the authors monitored the Protégé
mailing lists, which receive requests from users looking for ontologies in various
domains. The observation shows that most requests name the domain, but not the terms
that represent the domain. Furthermore, search engines return ontologies that have the
query term in their class or property names, rather than ontologies that cover the domain
described by the query term.
Ontology metadata provided by ontology authors helps in evaluating whether or not an
ontology covers a particular domain. Ontology repositories such as BioPortal
enable authors to specify the domain of their ontologies, among other metadata.
Alani, Noy, Shah, Shadbolt, & Musen (2007) present a new mechanism for searching for
ontologies, which expands the query term supplied by the user and then searches for the
ontologies that contain the largest number of terms from the expanded set, taking term
frequency into account. It mimics the way a human expert would look for terms that are
relevant to a topic: it searches the Web for the topic using Google, restricting the results
to those from Wikipedia with pages from 2 to 50, in order to build a corpus that describes
the domain of the query term. Subsequently, the frequency of the terms that appear in the
corpus is calculated using a TF (term frequency) algorithm, and the top 50 terms are used
as the new user query.
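The expansion step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corpus is assumed to be already downloaded and passed in as plain strings, the tokenizer and stop-word list are simplistic placeholders, and the scoring function (summing the corpus frequency of every expanded term found among an ontology's labels) is a hypothetical reading of "largest number of terms, taking term frequency into account". All function names are illustrative.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
              "for", "on", "with", "by", "that", "as", "or"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(corpus_pages, top_k=50):
    """Return the top_k most frequent non-stop-word terms with their raw counts."""
    counts = Counter()
    for page in corpus_pages:
        counts.update(t for t in tokenize(page) if t not in STOP_WORDS)
    return dict(counts.most_common(top_k))


def score_ontology(ontology_labels, expanded_terms):
    """Hypothetical score: sum the corpus frequency of each expanded term
    that occurs among the tokens of the ontology's labels."""
    label_tokens = set()
    for label in ontology_labels:
        label_tokens.update(tokenize(label))
    return sum(freq for term, freq in expanded_terms.items() if term in label_tokens)


def rank_ontologies(ontologies, expanded_terms):
    """Sort ontology names by descending score (ontologies: name -> list of labels)."""
    scored = {name: score_ontology(labels, expanded_terms)
              for name, labels in ontologies.items()}
    return sorted(scored, key=lambda name: scored[name], reverse=True)


if __name__ == "__main__":
    # Made-up corpus and ontologies, for illustration only.
    pages = ["The heart is an organ. The heart pumps blood.",
             "Organ anatomy: heart, lung, liver."]
    expanded = expand_query(pages, top_k=5)
    print(rank_ontologies({"body_parts": ["Heart", "Lung", "Organ"],
                           "vehicles": ["Car", "Engine"]}, expanded))
```

Note that in this toy example neither ontology contains the word "anatomy" in its labels, yet the one describing body parts still ranks first because it shares terms with the corpus, which is the effect that query expansion is meant to achieve.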
To evaluate the approach, experts in biomedical ontologies were asked to identify
relevant ontologies from the Open Biomedical Ontologies repository, which is available
through the BioPortal of the National Center for Biomedical Ontology, for queries
such as (1) anatomy, (2) pathology, (3) physiological process, and (4) histology. The
experts' selections serve as the gold standard for computing precision, recall and
F-measure. The results of the authors' approach are compared against three baseline
cases: (1) search the class and property names of ontologies using only the query terms
supplied by users, (2) search the class and property names, as well as the property values,
using the query terms supplied by users, and (3) return all ontologies in the repository.
The authors' approach demonstrated a 113% improvement over baseline (1) and a 43%
improvement over baseline (2).
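As a reminder of how these measures are computed against a gold standard, the following Python sketch may help. It uses the standard set-based definitions of precision, recall and balanced F-measure. The `relative_improvement` helper is an assumption about how a figure such as 113% would be derived (a relative gain over a baseline score); the review does not state which measure the reported percentages refer to. The ontology identifiers in the usage note are made up.

```python
def precision_recall_f(retrieved, gold):
    """Set-based precision, recall and balanced F-measure (F1)."""
    retrieved, gold = set(retrieved), set(gold)
    hits = len(retrieved & gold)  # ontologies both returned and judged relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score (assumed interpretation)."""
    return (new_score - baseline_score) / baseline_score * 100.0
```

For instance, with a hypothetical gold standard `{"a", "b", "e"}` and a returned set `{"a", "b", "c", "d"}`, two of the four returned ontologies are relevant, giving a precision of 1/2, a recall of 2/3 and an F-measure of 4/7.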
All in all, the authors' contributions are: analysing inter-expert agreement for
the problem of ontology search; discovering that searching property values in
conjunction with class and property names significantly improves search results;
describing an ontology search approach that uses Wikipedia for query expansion;
and evaluating the approach on a repository of biomedical ontologies by comparing its
results to those derived manually by domain experts.

2.0 Related Work

Ontologies differ from standard Web pages: ontologies are highly structured graphs of
classes and properties, whereas Web pages are made up of text. Even so, ontology search
engines apply traditional Web search techniques when searching for ontologies, as
described in Table 1.

Table 1 Ontology Search Engines

Search Engine   Description

Swoogle         The dominant ontology search engine. Searches an index of
                ontologies crawled from the Web for classes and properties
                whose names contain the query terms specified by the user.
OntoSearch      Employs reasoning by using Pellet, and supports direct
                matches to labels or structures only.
OntoSelect      Searches by content, similar to the authors' own approach,
                but relies on the user to find a one-document corpus,
                whereas the authors propose a method that automates this
                step of searching for ontologies.

Nevertheless, traditional Web search techniques do not always return relevant results, hence
query expansion is employed. In addition, ontology ranking is as important as ontology
search. Thus, the authors propose to combine their ontology search approach with one
of the ontology-ranking approaches, namely the PageRank-based ranking of Swoogle,
structure-based ranking such as AKTiveRank, or user ratings.

3.0 Conclusion

Ontologies are difficult and costly to build. Thus, to encourage the reuse of
ontologies, studies on advanced ontology search approaches are encouraged. One of the
problems in ontology search is that queries name a domain, rather than the terms that are
most likely to appear among the ontology concepts. Query expansion is therefore
introduced into ontology search. Nevertheless, how a query is expanded, and the result of
the expansion, affect the final results significantly, as an expanded query that contains
terms that are too general or irrelevant may pollute the result set.

4.0 References

Alani, H., Noy, N. F., Shah, N., Shadbolt, N., & Musen, M. A. (2007). Searching Ontologies
Based on Content: Experiments in the Biomedical Domain.
