Keywords: Genetic Algorithm, Ontology, Ontology Alignment, Ontology Mapping, Optimized Ontology Mapping.
------------------------------------------------------------------------------------------------------------------- --------------------------
Date of Submission: Feb 12, 2018 Date of Acceptance: March 07, 2018
-------------------------------------------------------------------------------------------------------------------------- -------------------
1. Introduction

The semantic web emphasizes incorporating meaning into the information displayed on the web. Ontologies are the backbone of knowledge exchange in the semantic web, where an ontology is the taxonomy for a domain, representing concepts, objects, attributes and their relationships with each other. An ontology represents a shared conceptualization (Gruber, 1995) of a domain for use in semantics-driven applications on the present Internet, where shared conceptualization refers to the commonly accepted understanding of the conceptual model of the domain under consideration. Ontologies (Singh et al., 2011) find applicability in system engineering, the semantic web, artificially intelligent systems, and information extraction and aggregation, to name a few areas. Ontologies (Singh et al., 2010) aim to capture knowledge in a generic and formal way so that it may be reused and shared across applications and by groups of people.

However, with the wide acceptance of Internet-based applications, more and more ontologies are being developed by various stakeholders for different purposes, making their interoperability difficult. Further, considering the size of the Internet, its users and the variety of applications in use, it is difficult to force users to work with a single ontology for a domain. Yet for different applications to communicate with each other and exchange knowledge, it is essential for ontologies to be interoperable. The semantic web community considers this an important issue, and many efforts have been made in this direction under the names of ontology alignment and ontology mapping. Some researchers have tried to focus on optimizing ontology mapping; however, the reason for optimizing ontology mapping and the scenarios requiring it are not clearly stated. Before moving further, some basic terms related to ontologies need to be made clear. The next subsection throws light on some such terms.

1.1 Technical Preliminaries
This section briefly explains some terms which need to be understood clearly for understanding this work:
a. Ontology Mapping: Ontology mapping refers to the method of translating concepts of one ontology into concepts defined in some other ontology. Ontology mapping usually involves some loss of information; however, it doesn't lead to inconsistencies. Ontology alignment and articulation are used synonymously for ontology mapping. These are defined as:
   Ontology Alignment: refers to establishing a set of binary relations between the vocabularies of two ontologies.
   Ontology Articulation: involves generation of rules through which fusion or merging of ontologies can be carried out. Conditions of ontology alignment are referred to as articulation (Chitra & Aghila, 2014).
b. Similarity Measure: Similarity is a numeric measure of the degree to which two objects are alike. Similarity measures focus on providing a concrete basis for finding similarity between two entities belonging to separate ontologies. Two objects must have similar characteristics to be comparable. The formal definition of similarity between two objects x and y as given by Ehrig and Sure (2004) states:
   sim(x, y) ∈ [0..1]
   sim(x, y) = 1 → x = y: two objects are identical.
   sim(x, y) = 0: two objects are different and have no common characteristics.
   sim(x, x) = 1: similarity is reflexive.
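The properties listed for a similarity measure can be checked on a concrete function. The sketch below is only an illustration: it assumes concept names are compared as plain strings and uses Python's `difflib.SequenceMatcher` ratio as the measure, which is not the measure of Ehrig and Sure (2004) nor the one proposed later in this work. The function name `sim` and the sample terms are hypothetical.

```python
from difflib import SequenceMatcher


def sim(x: str, y: str) -> float:
    """String-based similarity in [0, 1] (an assumed stand-in measure)."""
    # ratio() returns 2*M/T, where M is the number of matched characters
    # and T is the total number of characters in both strings.
    return SequenceMatcher(None, x, y).ratio()


identical = sim("faculty", "faculty")   # same object compared with itself
disjoint = sim("abc", "xyz")            # no common characters
partial = sim("staff", "faculty")       # some characters in common
```

Here `identical` evaluates to 1, `disjoint` to 0, and `partial` lies strictly between the two, which mirrors the range, identity and reflexivity conditions of the definition.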
Int. J. Advanced Networking and Applications 3572
Volume: 09 Issue: 05 Pages: 3571-3579(2018) ISSN: 0975-0290
Ontology mapping involves searching for the concepts of one ontology in another one. The size of these taxonomies can be quite large, leading to increased time and space complexity of the search process. Thus, heuristic search techniques need to be employed to reduce the number of alternatives to be explored in the search space. Heuristic search techniques make use of a fitness function to decide the next alternative to be explored among the many available alternatives. It is usually implemented by assigning weights to the various alternatives, i.e. candidates in a search space. However, manual assignment of these weights is neither practically feasible nor desirable in web-based applications. A better mechanism for searching ontologies and automating the computation of the fitness function is the use of machine learning techniques such as genetic algorithms.

Consequently, the main aim of the current work is to present a genetic algorithm based optimized ontology mapping technique.

The rest of the paper is structured as follows: Section 2 provides a brief overview of the genetic algorithm and its working. Section 3 presents a survey of relevant literature in ontology alignment, ontology similarity parameters and genetic algorithms. Section 4 introduces the proposed mechanism, and experimental analysis is illustrated in section 5. Finally, section 6 concludes this work.

2. Genetics for Ontology Mapping: An Overview
Genetic Algorithm (GA) (Man et al., 1996) is based on evolutionary theory that follows the principle of 'survival-of-the-fittest'. It was presented by J. H. Holland in the 1970s and has proved to be a significant instrument for scientific and engineering applications (Malhotra et al., 2011) since then. GA works on natural processes of evolution such as reproduction, mutation, recombination and selection to provide solutions to complex and conflicting problems. Due to the availability of cheap and high-speed computational components, GA has emerged as an appealing solution for a wide range of complex, time-consuming tasks such as information retrieval (Thada & Jaglan, 2013), ontology mapping (Wang et al., 2006) and text mining.

GA starts with an initial population, where population refers to a set of possible solutions for a problem. Each member of the population is termed a chromosome, and it represents a string of genes where a gene represents a bit pattern. The goal is to obtain a set of most suitable chromosomes, or the most suitable individual chromosome, after some iterations of GA. Suitability of a chromosome for a particular problem is measured using a fitness function (Renjith & Chandrika, 2013). A population obtained after some iterations is called a generation.

Effectiveness of the next generation is enhanced by applying reproduction, crossover and mutation operations. The purpose of these operations is to mix or recombine the genes of parents for production of their offspring in the next generations. Here reproduction refers to selecting the fittest chromosome based on its fitness value. Crossover refers to exchanging genes between two individual chromosomes of a population to produce new offspring. Mutation deals with randomly changing genes in a chromosome. It is of two types, i.e. point mutation and chromosomal mutation. In point mutation only a single gene is altered in a chromosome, whereas in chromosomal mutation a few genes are altered completely.

Thus the process of GA for problem solving may be summarized as follows:
1) Obtain a set of initial population
2) Iterative execution of:
   (i) Evaluation
   (ii) Selection
   (iii) Reproduction
   (iv) Crossover
   (v) Mutation
3) Convergence to a solution
The next section presents a literature review in the relevant domains.

3. Literature Survey
This section explores existing literature on ontology similarity measures, mapping mechanisms and the various methods available for ontology mapping optimization.

Man et al. (1996) in [7] introduced GA as a complete entity in which knowledge can be integrated to develop a framework for a design tool. The authors highlighted that genetic algorithms may be used as an optimization tool.

Maedche and Staab (2001) in [1] considered ontologies as semiotic sign systems that are used to communicate meaning. They proposed a methodology to measure the extent to which two ontologies overlap and fit with each other at various semiotic levels. However, evaluation of the proposed method with real-world data is left as part of future work.

Wiesman and Roos (2004) in [4] introduced an agent-based, domain-independent method for ontology mapping based on learning the relationship between ontologies. However, mapping between different representations of the same concepts can't be handled properly. The authors emphasized that context-dependent ontology mapping is an NP-hard problem. Further, an extension of this method to learn a mapping between groups of interrelated concepts has been left as part of future research.

Euzenat J. (2004) in [5] compared ontology alignment methods on common tests. The main purpose of this evaluation of ontology alignment methods was to help designers and developers of such methods to improve them further and to help users evaluate the suitability of the proposed methods for their applications.

A semi-automatic ontology mapping tool called GLUE was deployed by Doan et al. (2004) in [9]. This tool makes use of multi-strategy learning
approach. It makes use of the Naïve Bayes learning technique, which applies well to long textual elements but is less effective with short, numeric elements.

Wang et al. (2006) in [8] developed a genetic algorithm based optimization procedure for the ontology matching problem, treating it as a feature-matching process. A global similarity measure between two ontologies, based on feature sets, has been taken as the fitness function.

Martinez-Gil et al. (2008) in [2] presented the Genetics for Ontology Alignments (GOAL) approach to compute the optimal ontology alignment functions for a given ontology input set. However, a multi-objective strategy that avoids unwanted deviations from precision and recall values is left as part of future study. Further, the authors emphasized that there should be a technique which, given the specifications of an ontology matching problem, may compute the optimum alignment function, so that the ontology alignment problem may be solved accurately and without human intervention. This would lead to real interoperability in the semantic web.

Lin and Sandkuhl (2008) in [14] provided a review on exploiting WordNet for ontology mapping. The authors emphasized that synonyms can help solve naming conflicts [4] among various ontologies while mapping, and that WordNet thesauri can help improve similarity measures dealing with ontology mapping.

A design structure for the development [12] of ontological databases in general was proposed by Singh et al. (2010) in [11]. This work elaborated the minute details to be considered while designing ontology databases to make knowledge interchange language independent.

Malhotra et al. (2011) in [6] discussed the concept and design procedure of genetic algorithms as an optimization tool. They applied GA to process control in induction motor drives, speed control of gas turbines, etc. and optimized the control parameters for them.

Singh et al. (2011) in [10] proposed an agent-based ontology mapping mechanism for mapping in homogeneous as well as heterogeneous domains, in order to facilitate interoperability between multi-agent systems developed by different stakeholders for different purposes. This mechanism makes use of ontology extension and intension concepts. However, this work doesn't consider optimization during ontology mapping.

Hartung et al. (2013) in [3] presented the Generic Ontology Matching and Mapping Management (GOMMA) framework, which works on n-gram matching for computing the similarity of concept names and synonyms. This work outlined the use of the Graphics Processing Unit (GPU) for highly parallel string matching. GPU-based execution of algorithms like n-gram matching requires some effort to overcome the CPU limitations but boosts performance. However, the effect of different kinds of GPU hardware on GPU-based similarity computations has been left as part of future research.

Singh and Anand (2013) in [13] developed an agent-based mechanism for automatic construction of domain ontologies. The authors used mapping between already existing ontologies to construct a new ontology, thus reducing the time and effort required in this process. A comparison and summarization of various existing techniques is given in Table 1.
From the above table it can be concluded that, although many efforts have been made towards ontology mapping, optimization of ontology mapping is still an open research issue. It is clear that genetic algorithms may be used for problems having large search spaces. Some researchers have already applied this technique to ontology mapping; however, there is still scope for a mechanism which may incorporate semantic knowledge in the optimization process. Therefore, the motivation for the current work is to develop an approach for optimizing ontology mapping using genetic algorithms, as introduced in the next section.

4. The Proposed Optimizing Ontology Mapping Using Genetic Algorithms (OOMGA) Approach

This work presents the Optimized Ontology Mapping using Genetic Algorithm (OOMGA) mechanism for optimal ontology mapping. This mechanism takes into consideration synonymous concepts existing in the compared ontologies along with the usual method of term-frequency based mapping. The reason for deploying GA among all machine learning techniques is that GA specializes in searching very high dimensional search spaces, as this problem is.

This work focuses on finding the optimal matching ontology, among the large number of existing ontologies, for a source ontology. Consider a source ontology SO1 consisting of n concepts and k target ontologies available for mapping, each consisting of m concepts. The total number of comparisons required to choose the best match is then given by the following equation:
Optimal_matching(SO1) = f(n × k × m) (3)
In order to solve this problem using GA, both the fitness function (FT) and the evaluation function need to be decided. The ontology taxonomies (hierarchies) (OH) act as input in the formation of the chromosomes of the sample space, where a chromosome is a collection of i genes.

For formulating genes, OH is traversed from the root node to a leaf node in depth-first order. One such traversal produces one gene, and traversal of the complete OH produces i genes {g1, g2, g3, ..., gi}. Thus the source ontology hierarchy OHs can be represented as a chromosome Cs, where
Cs = {g1s, g2s, g3s, ..., gis} (4)
Ontology mapping will involve comparison of Cs(OHs) with {C1(OH1), C2(OH2), ..., Ck(OHk)} as shown below in figure 2:
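[Figure 2: source gene g1s compared with the target genes g11, g12, g13, g14, ..., g1m of chromosome 1 and g21, g22, g23, g24, ..., g2m of chromosome 2]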
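The gene formation step described above (one root-to-leaf, depth-first traversal of the ontology hierarchy per gene) can be sketched as follows. This is a minimal illustration under stated assumptions: the hierarchy is represented as a nested dictionary, and the function name `genes_from_hierarchy` and the sample hierarchy are hypothetical, not taken from the paper.

```python
def genes_from_hierarchy(hierarchy: dict) -> list:
    """Return one gene (list of concept names) per root-to-leaf path."""
    genes = []

    def walk(node: str, children: dict, path: list) -> None:
        path = path + [node]          # extend the current root-to-node path
        if not children:              # leaf reached: the path is one gene
            genes.append(path)
            return
        for child, grandchildren in children.items():
            walk(child, grandchildren, path)   # depth-first descent

    for root, children in hierarchy.items():
        walk(root, children, [])
    return genes


# Hypothetical fragment of a university ontology hierarchy (assumption).
source_hierarchy = {
    "employee": {
        "faculty": {"asstt. prof.": {}, "professor": {}},
        "admin staff": {},
    }
}

# The chromosome is the collection of all genes of the hierarchy.
source_chromosome = genes_from_hierarchy(source_hierarchy)
```

For this sample hierarchy the chromosome contains three genes, one for each leaf, the first being `["employee", "faculty", "asstt. prof."]`.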
can be used as it provides the magnitude of difference between two genes, as follows:
$J(g_{1s}, g_{11}) = \dfrac{|g_{1s} \cap g_{11}|}{|g_{1s} \cup g_{11}|}$ (5)
The Jaccard coefficient (J) between two genes will be 1 or close to 1 if they are identical or nearly identical; however, it will be 0 for genes that have nothing in common.

The fitness function of the proposed framework is defined as:
$fitness\_fun = cos\_sim(g_{1s}, g_{11}) + J(g_{1s}, g_{11})$ (6)
If there is no semantic similarity between two genes or two ontologies, then the Jaccard coefficient (J) will be 0 or close to zero. The similarity will then depend mainly on the Cosine_similarity of the genes.

4.1 Example for mapping between two educational ontologies
To clarify the above stated concept, consider the two example ontologies shown in figures 3 and 4. Both ontologies are from the education domain: one represents part of a university ontology and the other illustrates part of a school ontology.
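[Figures 3 and 4: parts of the university ontology and the school ontology; recoverable node labels: Employees, Staff]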
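Before synonyms are taken into account, the similarity of two genes from the example ontologies can be sketched as below. The sketch rests on assumptions that should be read as such: cosine similarity is computed over term-frequency vectors of the gene terms, and the fitness value is taken to be the sum of the cosine similarity and the Jaccard coefficient. The function names are hypothetical, and the two genes are the ones used in this example section.

```python
import math
from collections import Counter


def jaccard(a, b) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(a, b) -> float:
    """Cosine similarity of the term-frequency vectors of two genes."""
    fa, fb = Counter(a), Counter(b)
    dot = sum(fa[t] * fb[t] for t in fa)     # Counter returns 0 for absent terms
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fitness(a, b) -> float:
    # Assumption: the two measures are combined by addition.
    return cosine_similarity(a, b) + jaccard(a, b)


g1s = ["employee", "faculty", "asstt. prof."]   # source gene
g11 = ["staff", "faculty", "lecturer"]          # target gene

j_value = jaccard(g1s, g11)
cos_value = cosine_similarity(g1s, g11)
fit_value = fitness(g1s, g11)
```

Only the term faculty is shared, so the Jaccard coefficient is 1/5 and the cosine similarity is 1/3. Both values are low even though staff and employee, or lecturer and asstt. prof., are contextually related, which is the gap the synonym sets are meant to close.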
Now, every concept of the source and target ontologies has a synonym set associated with it. These synonyms are represented as numeric values using the UIA table. For example, the concept employee has the synonym set {staff, worker}, which can be represented as {13,21} using the positional values of staff and worker from Table 2; similarly, the term faculty has the synonym set {13,16,17}.

For generalization, consider comparing two genes for similarity, i.e., checking whether g1s matches g11, where:
g1s = {employee, faculty, asstt. prof.} = {1,3,7} (7)
g11 = {staff, faculty, lecturer} = {13,3,17} (8)
Before comparing, g1s is looked up in the synonym set matrix (Table 3), and its synonymous set, termed syn_set, is generated by replacing each term with all its synonyms one by one. For example, syn_set for g1s is given below:
syn_set(g1s) = {{13,21},{13,16,17},{17}}
Using this, g1s can be rewritten in expanded form as shown below.
g1s = {{1,3,7},{13,3,7},{21,3,7},{1,13,7},{1,16,7},{1,17,7},{1,3,17}} (9)
g11 = {13,3,17} (10)
Compared to the original equations (7) and (8), where only one term matched exactly, the new equations (9) and (10) give a closer match: the second subset in equation (9), {13,3,7}, shares two terms with g11, and its remaining term 7 has 17 as a synonym, so all three terms match on the basis of contextual similarity. Now the J-coefficient of g11 with each subset is computed, and the maximum among all calculated values is taken as the J-coefficient of the original pair (g1s, g11). To obtain more relevant matches and fewer false negatives, the fitness function is then computed.

This similarity calculation mechanism is better than cosine similarity alone, as it incorporates the contextual similarity of terms in various ontologies.

4.2 Work Flow of OOMGA
Figure 5 given below illustrates the work flow in OOMGA. For optimized ontology mapping, the concepts of the source ontology are first converted into genes. All unique terms of these genes are entered into the UIA and assigned unique integer values. Further, synonyms of all unique terms are obtained from WordNet and inserted into the synonym set matrix. Afterwards, genes are converted into numeric sets. Then the synonymous set (syn_set) is generated for the source gene and used for computing the Jaccard coefficient with the target gene. In this process, the J value for contextually similar genes becomes close to one. Cosine similarity of source and target genes will also be computed.
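The synonym expansion of equations (7) to (10) and the selection of the maximum J-coefficient can be sketched as follows. This is a minimal illustration, assuming that the synonym set matrix is available as a dictionary from a term's integer value to the integer values of its synonyms; the names `expand_gene` and `max_jaccard` are hypothetical, and the integer values are those of the example.

```python
def jaccard(a, b) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_gene(gene, syn_set):
    """Return the gene plus every variant with one term replaced by a synonym."""
    variants = [list(gene)]
    for pos, term in enumerate(gene):
        for synonym in syn_set.get(term, []):
            variant = list(gene)
            variant[pos] = synonym        # replace one term at a time
            variants.append(variant)
    return variants


def max_jaccard(source_gene, target_gene, syn_set) -> float:
    """J-coefficient of the pair: the best value over all expanded variants."""
    return max(jaccard(v, target_gene) for v in expand_gene(source_gene, syn_set))


g1s = [1, 3, 7]        # {employee, faculty, asstt. prof.}
g11 = [13, 3, 17]      # {staff, faculty, lecturer}
syn_set = {1: [13, 21], 3: [13, 16, 17], 7: [17]}

variants = expand_gene(g1s, syn_set)
plain_j = jaccard(g1s, g11)
best_j = max_jaccard(g1s, g11, syn_set)
```

With single-term replacement the expansion yields the seven subsets of equation (9), and the J-coefficient of the pair rises from 1/5 for the unexpanded gene to 1/2, reached by the subsets {13,3,7} and {1,3,17}.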
Then, the fitness function between two genes will be computed using equation (6) and stored in the fitness matrix shown in Table 4 given below. The purpose of Table 4 is to keep a record of fitness function values when the source gene is compared with different target genes. Based on a threshold value, genes will be selected for the next generation, and then mutation and crossover operations will be applied with some probability (to be decided at the time of experiment) to generate the next generation.

This process will be repeated on all ontologies under consideration for mapping, and the best matching ontologies will be considered the optimal matching pair.

4. Conclusions and Future Work
This work presented an optimized ontology mapping technique deploying a genetic algorithm. GA specializes in searching very high dimensional search spaces, so it is a promising technique for optimized ontology mapping. Further, the proposed technique deploys a similarity calculation mechanism that is better than cosine similarity alone, as it incorporates the contextual similarity of terms in various ontologies during mapping optimization. However, the proposed mechanism is still in the process of implementation. Future work involves its implementation and comparison with existing techniques.

References:
1. Maedche, A., & Staab, S. (2001). Comparing ontologies-similarity measures and a comparison study (p. 16). AIFB.
2. Martinez-Gil, J., Alba, E., & Aldana-Montes, J. F. (2008, October). Optimizing ontology alignments by using genetic algorithms. In Proceedings of the workshop on nature based reasoning for the semantic Web. Karlsruhe, Germany.
3. Hartung, M., Kolb, L., Groß, A., & Rahm, E. (2013, January). Optimizing Similarity Computations for Ontology Matching-Experiences from GOMMA. In Data Integration in the Life Sciences (pp. 81-89). Springer Berlin Heidelberg.
4. Wiesman, F., & Roos, N. (2004, July). Domain independent learning of ontology mappings. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 2 (pp. 846-853). IEEE Computer Society.
5. Euzenat, J. (2004). Evaluating Ontology Alignment Methods. In Proceedings of Dagstuhl Seminar on Semantic Interoperability and Integration, September 2004, Wadern, Germany.
6. Malhotra, R., Singh, N., & Singh, Y. (2011). Genetic Algorithms: Concepts, Design for Optimization of Process Controllers. International Journal of Computer and Information Science, 4(2), 39-54, March 2011. Canadian Center of Science and Education.
7. Man, K. F., Tang, K. S., & Kwong, S. (1996). Genetic Algorithms: Concepts and Applications. IEEE Transactions on Industrial Electronics, 43(5), 519-534, October 1996.
8. Wang, J., Ding, Z., & Jiang, C. (2006, December). GAOM: Genetic algorithm based ontology matching. In Services Computing, 2006. APSCC'06. IEEE Asia-Pacific Conference on (pp. 617-620). IEEE.
9. Doan, A., Madhavan, J., Domingos, P., & Halevy, A. (2004). Ontology matching: A machine learning approach. In Handbook on ontologies (pp. 385-403). Springer Berlin Heidelberg. (GLUE Approach)
10. Singh, A., Juneja, D., & Sharma, A. K. (2011). Design of an Intelligent and Adaptive Mapping Mechanism for Multiagent Interface. In High Performance Architecture and Grid Computing (pp. 373-384). Springer Berlin Heidelberg.
11. Singh, A., Juneja, D., & Sharma, A. K. (2010). General Design Structure of Ontological Databases in Semantic Web. International Journal of Engineering Science and Technology, 2(5), 1227-1232.
12. Singh, A., & Anand, P. (2013). State of Art in Ontology Development Tools. International Journal of Advances in Computer Science & Technology, 2(7), 96-101, July 2013.
13. Singh, A., & Anand, P. (2013). Automatic Domain Ontology Construction Mechanism. In Proceedings of 2013 IEEE International Conference on Recent Advances in Intelligent Computing Systems (RAICS), Trivandrum, Kerala, India, December 19-21, 2013 (pp. 304-309).
14. Lin, F., & Sandkuhl, K. (2008). A survey of exploiting wordnet in ontology matching. In Artificial Intelligence in Theory and Practice II (pp. 341-350). Springer US.
15. Ehrig, M., & Sure, Y. (2004). Ontology Mapping-an integrated approach. In The Semantic Web: Research and Applications (pp. 76-91). Springer Berlin Heidelberg.
16. Lee, W. N., Shah, N., Sundlass, K., & Musen, M. (2008). Comparison of ontology-based semantic-similarity measures. In AMIA annual symposium proceedings (Vol. 2008, p. 384). American Medical Informatics Association.
17. Gruber, T. R. (1995). Toward principles for the design of ontologies used for knowledge sharing. International journal of human-computer studies, 43(5), 907-928.
18. Gao, Y., & Gao, W. (2012). Ontology similarity measure and ontology mapping via learning optimization similarity function. International Journal of Machine Learning and Computing, 2(2), 107-112.
19. Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research, 37(1), 141-188.
20. Chitra, S., & Aghila, G. (2014). A survey on tools and algorithms of ontology operations. Research Journal of Engineering Sciences, 3(5), 12-25, May 2014.
21. Jing, L., Zhou, L., Ng, M. K., & Huang, J. Z. (2006, April). Ontology-based distance measure for text clustering. In Proceedings of the Text Mining Workshop, SIAM International Conference on Data Mining (Vol. 23).
22. Thada, V., & Jaglan, V. (2013). Comparison of Jaccard, Dice, Cosine similarity coefficient to find best fitness value for web retrieved documents using genetic algorithm. International journal of Innovations in Engineering and Technology (IJIET), 2(4), 202-205, August 2013.
23. Renjith, S., & Chandrika, A. (2013). Fitness function in genetic algorithm based information filtering- a survey. International Journal of Computer Science and Mobile Computing, 80-86, December 2013.
24. http://www.merriam-webster.com/dictionary/optimization