
Biologically Inspired Cognitive Architectures (2015) 12, 121– 133


journal homepage: www.elsevier.com/locate/bica

RESEARCH ARTICLE

Human-inspired semantic similarity between sentences

J. Ignacio Serrano a,*, M. Dolores del Castillo a, Jesús Oliva b, Rafael Raya a

a Group of Neural and Cognitive Engineering (gNeC), Centro de Automática y Robótica, Consejo Superior de Investigaciones Científicas (CSIC), Spain
b BBVA Data & Analytics, Spain

Received 23 March 2015; received in revised form 10 April 2015; accepted 10 April 2015

KEYWORDS
Cognitive linguistics;
Computational linguistics;
Semantic similarity

Abstract
Following the Principle of Compositionality, the meaning of a complex expression is influenced, to some extent, not only by the meanings of its individual words, but also by the structural way the words are assembled. Compositionality has been a central research issue for linguists and psycholinguists. However, it remains unclear how syntax influences the meaning of a sentence. In this paper, we propose an interdisciplinary approach to better understand that relation. We present an empirical study that seeks the different weights given by humans to different syntactic roles when computing semantic similarity. In order to test the validity of the hypotheses derived from the psychological study, we use a computational paradigm. We incorporate the results of that study into a psychologically plausible computational measure of semantic similarity. The results shown by this measure, in terms of correlation with human judgments on a paraphrase recognition task, confirm the different importance that humans give to different syntactic roles in the computation of semantic similarity. These results contrast with generative grammar theories but support neurolinguistic evidence.
© 2015 Elsevier B.V. All rights reserved.

Introduction

We, humans, are continuously assessing the similarity of objects in our daily life. As explained by Goldstone (1994), when humans try to judge the similarity of visual scenes, we take into account the structure of the compared objects and the different relationships between the different parts.

* Corresponding author.
E-mail addresses: jignacio.serrano@csic.es (J.I. Serrano), md.delcastillo@csic.es (M.D. del Castillo), jesus.oliva1984@gmail.com (J. Oliva), rafael.raya@csic.es (R. Raya).
http://dx.doi.org/10.1016/j.bica.2015.04.007

So humans use structural information in the comparison of general objects, but can this conclusion be generalized to language? To what extent do the different parts of the hierarchical structure of a sentence influence its global meaning? The relations between syntax and semantics have been studied for several years. In particular, compositionality has been largely studied since it was first proposed as the notion that the meaning of an expression is determined by the meaning of its constituents and the way they are combined. This is clearly shown by sentences made up of the same words but with very different semantic interpretations, like "The dog bit the man" and "The man bit the dog". Despite the great amount of work about the Principle of Compositionality and its different interpretations, the real influence of different syntactic roles on the representation of meaning is still not clear. See the following examples:

(a) That movie made me cry quickly.
    That movie made me cry slowly.
(b) That movie made me cry quickly.
    That movie made me laugh quickly.

It seems clear that the two sentences in (a) are more semantically similar than the ones in (b). However, in both cases we have just replaced one word by one of its antonyms. So, how does our brain compute semantic similarity? Are some parts of a sentence more important than others in the computation of semantic similarity? Most studies have centered on the dominant effect of verbs on the meaning of a sentence, but there is a lack of work about the relative influence of different syntactic roles. The only study in this direction is the one presented by Wiemer-Hastings (2004). In that paper, Wiemer-Hastings points out that human judges tend to ignore similarities between segments with different functional roles, denoting the importance of syntactic structure analysis in the computation of semantic similarity, and claiming that different syntactic roles have different levels of significance in the calculation of semantic similarity by humans.

In this paper, we propose an interdisciplinary approach to better understand how our mind computes semantic similarity and, in particular, the different importance that humans give to different syntactic roles in the computation of semantic similarity. Following Cambria and White (2014), the work presented here aims at explaining how we can jump from the syntactics and semantics curves to the pragmatics one. To this end, we performed a psychological study about how humans compute semantic similarity between sentences and then used a computational paradigm to test the validity of the hypotheses derived from that study. First of all, we present an empirical study that seeks the different weights given by humans to different syntactic roles when computing semantic similarity. Following the results obtained by Wiemer-Hastings (2004), we start from the hypothesis that different syntactic roles have different importance in the calculation of semantic similarity by humans. Going a step further, we carried out a deeper empirical study, based on two experiments with human judges, that complements the experiments carried out by Wiemer-Hastings (2004) and overcomes some of their limitations. Our experiments are not restricted to specific domains, while the work of Wiemer-Hastings (2004) is focused on two specific domains: computer literacy and psychological research methods. Moreover, we give a more accurate quantitative measure of the different weights given by humans to different syntactic roles while computing semantic similarity.

In order to assess the validity of the conclusions obtained with the experiments carried out with humans, we used a computational paradigm. We incorporated the results of the empirical study into a psychologically plausible semantic similarity measure (Oliva, Serrano, Del Castillo, & Iglesias, 2011) that takes into account the influence of different syntactic roles on the overall sentence meaning. The semantic similarity measure was applied to a paraphrase recognition task (Dolan, Quirk, & Brockett, 2004) using two different combinations of weights: one obtained from human judgments for a semantic similarity task and one from human judgments for the same paraphrase recognition task. The results obtained with both versions confirm the different contributions of different syntactic roles to semantic similarity computation. The different variations of the similarity measure tested with the two combinations of weights outperformed their non-weighted counterparts. Moreover, they obtained results similar to the ones reported by Islam and Inkpen (2008) and Mihalcea, Corley, and Strapparava (2006) on the paraphrase recognition task. Furthermore, four of the six approaches tested significantly outperform the method of Mihalcea et al. (2006), and the results of three of them are similar to the ones reported by Islam and Inkpen (2008). Finally, we compared the different weights given by humans to different syntactic roles on different tasks that involve semantic similarity computation. The weights obtained from the human ratings of semantic similarity and the ones obtained from the paraphrase recognition task were very similar, showing that humans tend to use the same weights across different tasks.

The interdisciplinary character of this work is not only attested by the combination of experimental techniques derived from psycholinguistics and computational sciences; the contributions of this paper are also of interest from both a theoretical and a practical point of view. The importance of sentence semantic similarity measures in natural language research is increasing due to the great number of applications arising in many text-related research fields. For example, in text summarization, sentence semantic similarity is used (mainly in sentence-based extractive document summarization) to cluster similar sentences and then find the most representative sentence of each cluster (Aliguliyev, 2009). Also, in web page retrieval, sentence similarity can improve effectiveness by calculating similarities between the titles of pages (Park, Ra, & Jang, 2005). Conversational agents can also benefit from the use of sentence similarity measures, reducing the scripting process by using natural language sentences rather than patterns of sentences (Allen, 1995). These are only a few examples of applications whose effectiveness could improve with sentence semantic similarity calculation. Therefore, our work not only sheds light on the theoretical question of how humans use syntactic roles when computing semantic similarity; it also shows how those results can be straightforwardly used in a psychologically plausible computational system with many practical applications.
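The extractive-summarization use of sentence similarity mentioned above can be sketched in a few lines. This is a toy illustration, not the paper's method: Jaccard word overlap stands in for a real sentence-similarity measure, and the greedy threshold clustering is a hypothetical simplification. Note how the bag-of-words stand-in groups "The man bit the dog" with "The dog bit the man" — exactly the structural blindness this paper addresses.

```python
# Toy extractive-summarization step: cluster sentences greedily by
# similarity, then keep the most central sentence of each cluster.
# Jaccard word overlap is a hypothetical stand-in for a real measure.

def jaccard(s1: str, s2: str) -> float:
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b)

def cluster_and_pick(sentences, threshold=0.5):
    clusters = []
    for s in sentences:
        for c in clusters:
            if jaccard(s, c[0]) >= threshold:  # join first close-enough cluster
                c.append(s)
                break
        else:
            clusters.append([s])
    # keep the sentence with the highest total similarity to its cluster
    return [max(c, key=lambda s: sum(jaccard(s, t) for t in c)) for c in clusters]

sents = [
    "The dog bit the man",
    "The dog bit the young man",
    "The man bit the dog",  # same bag of words as the first sentence!
    "Stocks fell sharply on Monday",
]
print(cluster_and_pick(sents))
# → ['The dog bit the man', 'Stocks fell sharply on Monday']
```

Because word overlap ignores syntactic roles entirely, the third sentence lands in the same cluster as the first two despite its opposite meaning; a role-sensitive measure of the kind studied here would separate them.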

The structure of this paper is the following: the next section presents a brief review of some studies about the influence of syntax on semantics and some approaches that compute semantic similarity taking syntactic information into account to some extent. The following section discusses the psychological experiments carried out to measure the contributions of different syntactic roles to the overall meaning of a sentence. The section after that presents an adaptation of a psychologically plausible semantic similarity measure that applies the conclusions obtained in the previous experiments to the computation of semantic similarity by humans. Finally, a conclusion section sums up the work and points out some conclusions and future work.

Background

Since the early 70s the relations between syntax and semantics have been studied to some extent. However, a deep empirical study about the quantitative and qualitative contribution of the different syntactic roles to the overall meaning of a sentence has not been carried out to date.

An early study by Healy and Miller (1970) focused on the influence of verbs and subjects on semantics. In that study, 25 sentences were constructed from 5 subjects (the salesman, the writer, the critic, the student and the publisher), 5 verbs (sold, wrote, criticized, studied and published), and 1 object (the book). Participants were asked to sort the sentences into groups according to similarity in meaning. Results showed that participants tended to group sentences that share the same verb rather than the same subject. The conclusion obtained by Healy and Miller is that the verb is the main determinant of sentence meaning.

Starting from this experiment, most linguistic and psycholinguistic theories assume that verbs play a main role in sentence semantics. However, Bencini and Goldberg (2000) suggest that the contribution of verbs to the overall meaning of the sentence may not be as strong as assumed. In their study, Bencini and Goldberg argue that argument structure constructions (i.e. the different configurations of the complements of a verb) are directly associated with sentence meaning. To reach this conclusion, they carried out an experiment in which participants were asked to group sentences according to similarity in meaning. Sixteen sentences were used, obtained by crossing four verbs (throw, slice, get and take) with four constructions (ditransitive, caused motion, resultative and transitive). The results showed that most of the participants sorted the sentences by construction and not by verb. Thus, argument structure constructions seem to play a crucial role in sentence interpretation, independently of the contribution of the main verb. However, Bencini and Goldberg also acknowledged the importance of verb semantics for the overall meaning of a sentence, stating that in some cases its contribution could be higher than that of argument constructions.

The influence of syntax on semantics has been assessed across different areas. For example, the syntactic bootstrapping theory (Gleitman & Gillete, 1994) posits that children use syntactic knowledge during the lexical acquisition of verbs. Despite the great amount of work studying the relations between syntax and semantics, the most important effort to measure the influence of different syntactic roles on semantic similarity computation is the one by Wiemer-Hastings (2004). In that work, Wiemer-Hastings tried to capture the relative influence of the main phrases (subject, verb, object and indirect object) within a sentence. A corpus of 50 sentence pairs was constructed for the experiment. The first sentence of each pair was built by randomly selecting a subject, verb, direct object and, optionally, an indirect object from a list of candidates. The second sentence was created by taking some of the candidates selected for the first sentence but changing the roles of some of them. The participants were asked to rate the similarity of each pair on a scale from 1 (totally dissimilar) to 6 (completely similar). The main conclusions obtained by Wiemer-Hastings are that humans tend to ignore similarities between segments with different functional roles and that verbs play a main role in the overall semantics, followed by subjects and objects, among which there is no significant difference. This is also in accordance with recent neurolinguistic findings (Malaia & Newman, 2014).

Regarding computational methods, there are not many methods that take into account the different importance of syntactic roles to compute semantic similarity between sentences. There are a few methods that consider pseudo-syntactic information such as word order in the sentence. The ones proposed by Achananuparp, Hu, Zhou, and Zhang (2008), Li, McLean, Bandar, O'Shea, and Crockett (2006) and Islam and Inkpen (2008) use this kind of information, showing the best results reported in the literature in terms of correlation with human intuition, thus verifying that syntactic information is of great importance in the computation of sentence semantic similarity. Given the promising results obtained by methods that include shallow syntactic information, some authors have tried to consider deeper syntactic information. In this direction we can find the approach proposed by Achananuparp, Hu, and Yang (2009), which takes into account the verb-argument structure of the sentences to measure their semantic similarity, and the efforts of Wiemer-Hastings (2004), Wiemer-Hastings and Ziprita (2001), Wiemer-Hastings (2000) and Wiemer-Hastings, Wiemer-Hastings, and Graesser (1999) to add syntactic information to LSA (Landauer, Foltz, & Laham, 1998), a corpus-based method for computer modeling and simulation of the meaning of words and passages that allows semantic similarity to be computed easily. Both approaches improve the results obtained by similar methods that do not take into account the contributions of different syntactic roles. However, going beyond the verb-argument structure of the sentence could lead to better performance. To our knowledge, the only effort made in that direction is the one of Oliva et al. (2011). They proposed a semantic similarity measure that computes the semantic similarity between concepts that play the same syntactic role in the two sentences compared. This approach outperformed existing methods in the task of computing semantic similarity between sentences and obtained results similar to the best performing methods on the paraphrase recognition task. Given that this measure takes into account the different syntactic roles, we adapted it in order to test the

validity of the conclusions extracted from the psychological study carried out in this paper.

How do humans weigh syntactic roles when judging semantic similarity?

Wiemer-Hastings (2004) pointed out that human judges tend to ignore similarities between segments with different functional roles, denoting the importance of the different syntactic roles in the computation of semantic similarity, and claiming that different syntactic roles have different importance in the calculation of semantic similarity by humans. In order to check this hypothesis we carried out two experiments. These experiments tried to capture the importance given by humans to four basic syntactic roles: subject, verb, direct object and adverbial complement. We designed the first experiment to check whether there exist significant qualitative differences in the weights used by humans during the computation of semantic similarity. The second experiment complements the first one by measuring those different weights quantitatively.

Experiment I

Experiment I examined whether participants are sensitive to changes in syntactic roles and whether there are qualitative differences in the importance of changes in different syntactic roles. This experiment tried to rank the four basic syntactic roles (verb, subject, direct object and adverbial complement) according to the importance that humans give to each of them while computing semantic similarity between sentences.

Participants
The dataset was assessed by 30 evaluators, all adult, native Spanish speakers.

Materials
Two sets of four sentences were built for use as stimuli in the experiment. Both sets used the frame sentence "El joven animó a su amigo rápidamente" (in English: "The young man encouraged his friend quickly"), and we built a first set of four sentences by changing, in each one, one of the four syntactic roles for a synonym. The second set of four sentences was constructed by replacing, in each one, one of the syntactic roles of the frame sentence with an antonym. The synonyms and antonyms used in the experiment were extracted from the online version of the dictionary of the Real Academia Española de la Lengua. The complete set of sentences used in the experiment is shown in Table 1 (see Appendix for the Spanish translations used in the experiment).

Method
The evaluators were asked to rank the sentences in each set according to their semantic similarity with the frame sentence (rank 4 indicates the most similar sentence and rank 1 indicates the most dissimilar sentence). Each participant was presented with a form with the 4-sentence lists simultaneously, and there was no time limit to answer. The sentences were presented to the evaluators in Spanish, as shown in the Appendix.

Results and discussion
Table 1 shows the complete set of sentences and the average scores given by human evaluators. A two-tailed t-test was performed to determine if there were significant differences in the way humans weigh different syntactic roles. In the synonym set, not surprisingly, ratings given to the sentence with a different verb were significantly lower than ratings given to the sentences with a different subject, object and adverbial complement (t(29) = 2.76, p < 0.01; t(29) = 2.21, p < 0.05; t(29) = 4.22, p < 0.001, respectively). Moreover, ratings given to the sentence with a different adverbial complement were significantly higher than the ones given to the sentences with a different subject, object and verb (t(29) = 2.12, p < 0.05; t(29) = 2.48, p < 0.05; t(29) = 4.22, p < 0.001, respectively). However, there was no significant difference between the ratings given to the sentence with a different subject and the one with a different object (t(29) = 0.62, p = 0.541). The results with the antonym set are similar in terms of statistical significance. All the differences in

Table 1 Similarity ranks obtained from human evaluators for different syntactic roles and semantic relations of changed words to the reference sentence "The young man encouraged his friend quickly". Rank ranges from 1 to 4, with 4 being the most similar.

Syntactic role substituted | Sentence | Mean | Std.

Synonyms
Subject       | The lad encouraged his friend quickly        | 2.60 | 1.08
Direct object | The young man encouraged his buddy quickly   | 2.40 | 0.95
Adverb. comp. | The young man encouraged his friend rapidly  | 3.17 | 1.07
Verb          | The young man heartened his friend quickly   | 1.83 | 0.93

Antonyms
Subject       | The old man encouraged his friend quickly    | 2.63 | 0.69
Direct object | The young man encouraged his enemy quickly   | 2.30 | 0.83
Adverb. comp. | The young man encouraged his friend slowly   | 3.80 | 1.07
Verb          | The young man discouraged his friend quickly | 1.20 | 0.40

Table 2 Similarity scores obtained from human evaluators for different syntactic roles and semantic relations of changed words in sentence pairs. Similarity scores range from 0 to 100, with 100 indicating the highest similarity.

Syntactic role substituted | Sentence pair | Mean | Std.

Subject | That gem surprised the man / That jewel surprised the man             | 83.667 | 17.839
Subject | That crane surprised the man / That implement surprised the man       | 55.667 | 21.944
Subject | That glass surprised the man / That magician surprised the man        | 23.133 | 18.747

Direct object | The man saw the gem yesterday / The man saw the jewel yesterday           | 82.882 | 15.277
Direct object | The man saw the crane yesterday / The man saw the implement yesterday     | 45.867 | 23.562
Direct object | The man saw the glass yesterday / The man saw the magician yesterday      | 23.882 | 19.787

Adverbial complement | The man left my bike near those gems / The man left my bike near those jewels    | 92.000 | 7.024
Adverbial complement | The man left my bike near that crane / The man left my bike near that implement | 60.200 | 24.026
Adverbial complement | The man left my bike near that glass / The man left my bike near that magician  | 39.133 | 23.263

Verb | The man split the bin / The man divided the bin   | 73.667 | 20.450
Verb | The man crushed the bin / The man split the bin   | 56.600 | 23.619
Verb | The man emptied the bin / The man situated the bin | 9.467 | 11.860

ratings were statistically significant except the one between the sentence with a different subject and the sentence with a different object.

Given the results shown in Table 1 we can conclude that different syntactic roles have different effects on sentence semantic similarity. Furthermore, we could extract two preliminary conclusions:

1. Humans give great importance to verbs in the computation of sentence semantic similarity. Substituting a verb by one of its antonyms makes the sentence much more different from the frame sentence than substituting any other syntactic role. Also, when making the substitution with a synonym, the resulting sentence is the most different from the frame sentence. This indicates that a little change in the verb produces a higher change in sentence semantics than the one produced by a little change in other syntactic roles.
2. Humans give low importance to adverbial complements in the computation of sentence semantic similarity. The effects produced by a substitution of the adverbial complement in the frame sentence are exactly the opposite of the ones produced by substituting the verb. Changing the adverbial complement by one of its antonyms yields a sentence very similar to the frame sentence. Furthermore, a slight change in the adverbial complement keeps the sentence very similar to the original one.

The importance that humans give to subjects and objects when computing semantic similarity seems to be very similar. The differences obtained are not statistically significant. This result is in line with the one reported by Wiemer-Hastings (2004), which also showed no statistical significance between these syntactic roles.

Experiment II

In order to seek more evidence and to capture the relative importance of each syntactic role we carried out a second experiment. Experiment I showed that participants gave the highest importance to verbs and the lowest importance to adverbial complements. A worthwhile question to ask is what the relative importance of each syntactic role is. Thus, the objective of this second experiment was not only to rank the different syntactic roles according to their importance when computing semantic similarity, but also to obtain a quantitative measure of the different weights humans give to different syntactic roles when computing semantic similarity. This way, this second experiment complements the first one and completes the qualitative and quantitative study presented in this paper.

Participants
The participants consisted of 27 evaluators, all adult, native Spanish speakers.
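The two-tailed t statistics quoted in the Experiment I results (and again below for Experiment II) compare ratings across participants. A minimal self-contained sketch of a paired t computation, using synthetic ranks rather than the study's raw data, and assuming a paired design across participants:

```python
from math import sqrt

def paired_t(x, y):
    """Two-tailed paired t statistic across participants:
    t = mean(d) / (sd(d) / sqrt(n)), df = n - 1,
    where d holds each participant's rating difference."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / (sqrt(var) / sqrt(n)), n - 1

# Synthetic ranks from four hypothetical participants rating two
# sentence variants (NOT the study's data, which is not reproduced here).
t, df = paired_t([4, 3, 4, 2], [2, 2, 3, 1])
print(t, df)  # 5.0 3
```

The resulting t is compared against the t distribution with df degrees of freedom to obtain the two-tailed p values reported above (e.g. t(29) with 30 participants).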

Materials
We collected human ratings of semantic similarity for pairs of sentences following the existing designs for word similarity measures given by Rubenstein and Goodenough (1965) and Yang and Powers (2006). In order to build the data set, we took three noun pairs from the experiment of Rubenstein and Goodenough (1965). These three noun pairs were selected according to the semantic similarity given by human evaluators in that experiment. A high similarity pair (gem – jewel, with a similarity of 3.84), a medium similarity pair (crane – implement, with a similarity of 1.68) and a low similarity pair (glass – magician, with a similarity of 0.11) were selected. In the same way, we selected three pairs of verbs from the experiment of Yang and Powers (2006) (divide – split, with a similarity of 4; split – crush, with a similarity of 2; and empty – situate, with a similarity of 0.17). With this set of words, we built for each syntactic role three pairs of sentences with exactly the same words except for the word which plays the corresponding syntactic role. For example, for the syntactic role 'subject' we built the pairs of sentences: "That gem surprised the man – That jewel surprised the man", "That crane surprised the man – That implement surprised the man" and "That glass surprised the man – That magician surprised the man". The complete set of sentences used in the experiment is shown in Table 2 (see Appendix for the Spanish translations used in the experiment).

Method
Each evaluator was given a subset made up of four pairs of sentences, with one of the pairs of sentences built for each syntactic role, so the evaluations were not influenced by the presence of all the similar sentences. Every sentence pair was assessed by nine evaluators.

The evaluators were asked to rate the semantic similarity of the sentence pairs on a scale from 0 (minimum similarity) to 100 (maximum similarity). The scale used in the experiments carried out by Rubenstein and Goodenough (1965) and Yang and Powers (2006) ranged from 0 to 4. However, we selected this new scale to make it easier for evaluators to indicate slight differences in scores.

Results and discussion
Table 2 shows the complete set of sentences and the average scores given by human evaluators. We calculated Pearson's correlation coefficient (r) between each participant's ratings and the average rating. The range was r = 0.564 to r = 0.961, with a mean of 0.8597 (SD = 0.107). These results are consistent with the ones presented in O'Shea, Bandar, Crockett, and McLean (2008) for a similar task. In that experiment, humans measured the similarity of sentence pairs, obtaining an average correlation of 0.825 (SD = 0.072), moving in a range from 0.594 to 0.921. Also, a two-tailed t-test was performed to determine if there were significant differences in the way humans weigh different syntactic roles. First of all, we took a look at the four pairs of sentences built with the most similar pairs of words. Human ratings given to the sentence pair built with the most similar verb pair were significantly lower than ratings given to the sentences built with the word pair "gem – jewel" playing the role of subject, object and adverbial complement (t(16) = 2.18, p < .05; t(16) = 2.42, p < .05; t(16) = 3.83, p < .01, respectively). Moreover, ratings given to the sentence pair built with the most similar adverbial complement were significantly higher than ratings given to the sentences built with the most similar subjects, objects and verbs (t(16) = 2.13, p < .05; t(16) = 2.60, p < .05; t(16) = 3.83, p < .01, respectively). However, there was no significant difference between the ratings given to the sentence pairs with the most similar subjects and the most similar objects (t(16) = 0.07, p = 0.943). The results with the most dissimilar pairs of words are similar in terms of statistical significance. All the differences in ratings were statistically significant except the one between the sentence with the most similar subjects and the most similar objects.

The two conclusions obtained in the first experiment were confirmed by the results shown in Table 2. The pair of sentences built with the most dissimilar verbs (empty – situate) is scored by humans as the most dissimilar pair of sentences. The mean score given by humans to this pair of sentences is 9.467, significantly lower than the rest of the sentence pairs made with the most dissimilar pair (glass – magician). Also, the sentence pair constructed with the most similar pair of words (divide – split) obtained the lowest score (73.667) when these words play the role of verb. The results of the second experiment also confirmed the observation that humans give low importance to adverbial complements while calculating sentence semantic similarity: the sentence pair made by using the most dissimilar pair of words (glass – magician) was scored with the highest similarity (39.133) of all the sentence pairs made with this pair of words, and furthermore, the sentence pair constructed with the most similar pair of noun words (gem – jewel) obtained the highest score (92.000) when these words play the role of adverbial complement. These results confirm that a change (either big or little) in the meaning of the adverbial complement leads to a smaller change (from 39.133 to 92.000) in the sentence meaning than the one produced by a similar change in other syntactic roles. Conversely, a change (either big or little) in the meaning of the verb leads to a greater change (from 9.467 to 73.667) in the sentence meaning than the one produced by a similar change in other syntactic roles.

Observing the results obtained when dealing with subjects and direct objects, we can see that the differences between these two syntactic roles are not as obvious as the ones described above. This second experiment confirmed the observation from the first experiment that the importance given by human judgments to the subject and the object roles is very similar.

Assessment of the importance of weighing syntactic roles

In order to assess the conclusions obtained in the previous section, we adapted a semantic similarity measure described in Oliva et al. (2011) that takes into account the influence of different syntactic roles on the overall sentence meaning. Here, the semantic similarity measure is applied to a paraphrase recognition task using two different combinations of weights obtained from the human

evaluators reported in the previous section and from other human judgments for the same paraphrase recognition task.

As pointed out by Corley and Mihalcea (2005) and Islam and Inkpen (2008), sentence semantic similarity computation is not the same task as paraphrase recognition. While in semantic similarity computation a similarity score must be assigned to each pair of sentences, in paraphrase recognition a binary decision must be made for each pair: whether the two sentences mean exactly the same or not. Nevertheless, the two tasks are closely related, and much can be learned from one and applied to the other. We used the paraphrase recognition task in our work for two main reasons. First, there exist large datasets judged by humans, so we can test the hypotheses of our experimental study on a significant sample. Moreover, the paraphrase recognition task allows us to expand the results obtained in the experimental study by checking whether humans use similar weights in similar tasks.

W-SyMSS: Weighted Syntax-based Measure of Semantic Similarity

In this subsection, we describe the W-SyMSS method proposed by Oliva et al. (2011). The method captures the influence of the syntactic structure of the compared sentences on the calculation of their semantic similarity. It is based on the notion that the meaning of a sentence is made up not only of the meanings of its individual words, but also of the structural way these words are combined.

W-SyMSS captures and combines syntactic and semantic information to compute the semantic similarity of two sentences. Semantic information is obtained from WordNet (Fellbaum, 1998), whose structure allows calculating different types of semantic similarity measures between concepts. Syntactic information is obtained through a parsing process that extracts the phrases that make up the sentence, i.e. the groups of words that function as a single unit in the syntax of the sentence, as well as their syntactic functions. With this information, the method measures the semantic similarity between concepts that have the same syntactic function.

The similarity between two sentences is calculated as a weighted sum of the similarities between the heads of the phrases that have the same syntactic function in the two sentences, according to the following formula:

sim(s_1, s_2) = \frac{w_S \cdot s_S + w_V \cdot s_V + w_O \cdot s_O + w_A \cdot s_A + \sum_{i=1}^{n} w_R \cdot \mathrm{sim}(h_{1i}, h_{2i})}{w_S + w_V + w_O + w_A + w_R \cdot n} - l \cdot PF

Let us assume that sentences s_1 and s_2 are made up of a subject (S), a verb (V), a direct object (O) and an adverbial complement (A), whose semantic similarities are s_S, s_V, s_O and s_A. Each sentence may also have n other syntactic roles, whose heads are h_{11}, ..., h_{1n} and h_{21}, ..., h_{2n}, respectively, for sentences s_1 and s_2, where the phrases of h_{1i} and h_{2i} have the same syntactic function. Let us also assume that the sentences have l syntactic roles that are present in only one of the two sentences. In that case, if one sentence has a phrase not shared by the other, a penalization factor (PF) is introduced to reflect the fact that one of the sentences has extra information.

W-SyMSS obtains the semantic similarity between concepts (in the formula: s_S, s_V, s_O, s_A and sim(h_1, h_2)) from WordNet, using its hierarchical structure and the different glosses associated with each term. The similarity between concepts is the basic unit of similarity used by the sentence similarity method, so using a poor similarity measure at this point could reduce the overall performance of the proposed method.

In this paper we have carried out a comparative study following the approach of Oliva et al. (2011), using six different measures in order to compare their performance. These six measures belong to three categories (for a more detailed explanation of these measures see Pedersen, Banerjee, & Patwardhan (2005)): the path-based category (the Path and the Hirst and St. Onge measures), the information content-based category (the Resnik, Lin, and Jiang and Conrath measures) and the gloss-based category (the Vector measure). In each experiment we tested six variations of W-SyMSS, each using one of these measures. From now on, these variations will be named with a prefix (PATH, HSO, RES, LIN, JCN and VECTOR) indicating the concept similarity measure used.

The experimental methodology is the same as the one used by Oliva et al. (2011), but in that study the weights assigned to each syntactic role were the ones obtained empirically by Wiemer-Hastings (2004). We will use those results in order to show that our current estimation of the weights is more accurate.

SW-SyMSS: Similarity task Weighted SyMSS

As evidenced above, words with different syntactic roles in a sentence contribute differently to the sentence semantic similarity computation made by humans. Therefore, an appropriate weighing strategy is needed in order to reflect the contribution that each syntactic component makes to the overall measurement.

Experiment II gives us a quantitative estimation of the different weights given by human evaluators to different syntactic roles. To obtain the combination of weights that best fits these human evaluations, we used an evolutionary strategy (De Jong, 2006). The parameters of the evolutionary strategy were: a population of μ = 30 individuals in each generation, an offspring of λ = 200 individuals, and a (μ + λ) selection scheme (i.e. the μ individuals of the next generation are selected from among the parents and the offspring). The fitness of each combination of weights was calculated as the Pearson correlation coefficient between the similarities obtained using the W-SyMSS method and the similarity values given by the human evaluators in Experiment II. To compute the semantic similarity of each of the sentence pairs in Experiment II, we used the similarity values given by humans in Rubenstein and Goodenough (1965) to the word pair around which the sentence pair had been constructed. For example, take the word pair "crane – implement", whose similarity is 1.68 on the interval [0–4], rescaled to 0.42 on the interval [0–1]. Given a combination of weights w_S, w_V, w_O and w_A (the weights for subject, verb, object and adverbial complement, respectively), the semantic similarity of the sentence pair:
I saw that crane yesterday.
I saw that implement yesterday.

would be:

sim(s_1, s_2) = \frac{w_S + w_V + (w_O \cdot 0.42) + w_A}{w_S + w_V + w_O + w_A}

Table 3  Optimal weights for the semantic similarity task.

Syntactic role          Weight
Subject                 0.65293
Verb                    0.75191
Object                  0.68669
Adverbial complement    0.55155
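The weighted combination above, together with the worked example, can be sketched in code. This is a minimal illustration; the function name, argument layout and the default value of w_R are ours, not the authors' implementation:

```python
def symss_similarity(role_sims, weights, extra_sims=(), w_R=0.4,
                     n_unmatched=0, pf=0.0):
    """Weighted combination of per-role similarities (W-SyMSS style).

    role_sims / weights: per-role similarity scores and weights, keyed by
    syntactic role. extra_sims: similarities of the n other shared roles,
    each weighted by w_R. n_unmatched: the l roles present in only one
    sentence, each subtracting one penalization factor pf.
    """
    num = sum(weights[r] * role_sims[r] for r in role_sims) + w_R * sum(extra_sims)
    den = sum(weights[r] for r in role_sims) + w_R * len(extra_sims)
    return num / den - n_unmatched * pf

# Worked example: 'I saw that crane/implement yesterday'. Only the object
# differs (word-pair similarity 0.42); the other roles are identical
# (similarity 1.0). Weights are taken from Table 3.
weights = {'S': 0.65293, 'V': 0.75191, 'O': 0.68669, 'A': 0.55155}
sims = {'S': 1.0, 'V': 1.0, 'O': 0.42, 'A': 1.0}
score = symss_similarity(sims, weights)  # ≈ 0.849
```

With these weights the example pair scores roughly 0.849, i.e. the object mismatch is discounted according to its relative weight.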
After running the evolutionary strategy, the best combination of weights is the one shown in Table 3. The similarity scores obtained using this combination of weights lead to a correlation coefficient of 0.974 (p < .01) with the human similarities. Given that there are no human data about other syntactic roles, the weight for the rest of the possible functions present in a sentence was set empirically to w_R = 0.4. From now on, the W-SyMSS method using the weights in Table 3 will be called SW-SyMSS.

In order to evaluate our sentence similarity measure SW-SyMSS with a large dataset and on a much more challenging task, we used the Microsoft Paraphrase Corpus (Dolan et al., 2004). This corpus consists of 4076 training and 1725 test pairs collected from thousands of news sources on the web over a period of 18 months, which were labeled by two human judges who determined whether the two sentences in a pair were semantically equivalent paraphrases or not. The agreement between the human judges was approximately 83%, which can be considered an upper bound for the automatic task. For this paraphrase identification task, we used SW-SyMSS as a supervised method: the training set was used to obtain the similarity threshold that yields the best accuracy on the training set, and the test set was used to check the method with this similarity threshold. In order to determine whether a pair is a paraphrase or not, we used similarity thresholds ranging from 0 to 1 in steps of 0.05. For each candidate paraphrase pair in the training set, the system obtained the semantic similarity score and then labeled the candidate pair as a paraphrase if the similarity score exceeded each of the thresholds used. After the evaluation with the training set, we selected the best similarity threshold in terms of accuracy for each of the variations (PATH, HSO, RES, LIN, JCN and VECTOR) of SW-SyMSS evaluated. Then, these similarity thresholds were used in the evaluation process with the test set.

Following Mihalcea et al. (2006), two baselines were used: random simply makes a random decision for each candidate pair, and vector-based uses a cosine similarity measure as traditionally used in information retrieval, with tf-idf weighting (term frequency × inverse document frequency). In order to show the contribution of our weighing strategy, we also computed one more baseline method for each variation, using a value of 1 for all weights in W-SyMSS.

The evaluation metrics used to measure the performance of the different variations of SW-SyMSS are the ones proposed by Achananuparp et al. (2008). Precision is the proportion of correctly predicted paraphrase sentences to all predicted paraphrase sentences. Recall is the proportion of correctly predicted paraphrase sentences to all paraphrase sentences. Rejection is the proportion of correctly predicted non-paraphrase sentences to all non-paraphrase sentences. Accuracy is the proportion of all correctly predicted sentences to all sentences.

Accuracy comparisons between the weighted and non-weighted variations can be seen in Table 4, and complete results are shown in Table 5. Baseline results and the results obtained by similar studies (Islam & Inkpen, 2008; Mihalcea et al., 2006; Oliva et al., 2011) are also shown for the sake of comparison. Concretely, Mihalcea et al. (2006) proposed a combined unsupervised method that uses six WordNet-based measures and two corpus-based measures and combines the results to show how these measures can be used to derive a short-text similarity measure. The main drawback of this method is that it computes the similarity of words using eight different methods, which is not computationally efficient. Islam and Inkpen (2008) proposed a corpus-based similarity method that considered pseudo-syntactic information, such as common word order similarity. It is important to note that Oliva et al. (2011) used two versions of the same similarity method: the first one is a version of W-SyMSS with all the weights equal to one (here called SyMSS), and the second one uses the combination of weights extracted from the work of Wiemer-Hastings (2004) (here called WHW-SyMSS). Therefore, a comparison with that method is of special interest and will show that our approach better fits the weights humans give to different syntactic roles.

Results and discussion

Table 4 clearly shows the influence of using syntactic information to compute semantic similarity. The six variations that use a weighing strategy (SW-SyMSS) similar to the one used by the human evaluators in the similarity task outperform the corresponding variations that do not use this strategy. The improvement of three of these variations (JCN, LIN and VECTOR) was found to be significant (p < .05) using a parametric paired t-test, once we had confirmed that the data were normally distributed by a Chi-Square goodness-of-fit test (p < .05). The other three variations perform similarly with our combination of weights and with the ones used in Oliva et al. (2011).

Moreover, Table 5 shows encouraging results, given that the combination of weights used was obtained from a similarity task and then used for the paraphrase recognition task. Three of the variations of SW-SyMSS (JCN, LIN and VECTOR) significantly outperform (p < .05, using the statistical test mentioned previously) the accuracy results of Mihalcea et al. (2006). Also, the approaches based on the VECTOR measure and the JCN measure (VECTOR-SW-SyMSS and JCN-SW-SyMSS, respectively) obtain results similar to the ones obtained by Islam and Inkpen (2008) in terms of accuracy.

The results obtained from this computational experiment confirm the working hypothesis: humans give different
Table 4  Accuracy values with the MSR corpus for SW-SyMSS, WHW-SyMSS and non-weighted SyMSS measures, and different WordNet-based word similarity measures. Bold entries show the best performing variation for each word similarity measure.

Word similarity    SW-SyMSS    WHW-SyMSS    SyMSS
PATH               69.80       69.81*       69.16
JCN                71.83*      70.87        70.42
RES                69.62       69.32        69.48
LIN                71.63*      70.63        70.10
HSO                69.27       68.72        68.48
VECTOR             72.08*      70.82        70.52

* p < .05.
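The supervised threshold selection and the evaluation metrics described earlier (precision, recall, rejection, accuracy) can be sketched as follows. The function names and the toy data are ours, not from the original system:

```python
def metrics(preds, gold):
    """Precision, recall, rejection and accuracy for binary paraphrase labels."""
    tp = sum(p and g for p, g in zip(preds, gold))
    fp = sum(p and not g for p, g in zip(preds, gold))
    tn = sum((not p) and (not g) for p, g in zip(preds, gold))
    fn = sum((not p) and g for p, g in zip(preds, gold))
    return {
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
        'rejection': tn / (tn + fp) if tn + fp else 0.0,
        'accuracy': (tp + tn) / len(gold),
    }

def best_threshold(scores, gold):
    """Sweep thresholds 0, 0.05, ..., 1 and keep the most accurate one."""
    candidates = [i * 0.05 for i in range(21)]
    def acc(t):
        return metrics([s > t for s in scores], gold)['accuracy']
    return max(candidates, key=acc)
```

On the training set the threshold maximizing accuracy would be kept; the same threshold is then applied unchanged to the test set.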

Table 5  Results with the MSR corpus: SW-SyMSS variations, similar methods and baselines. F1 and f1 are the uniform harmonic means of precision–recall and rejection–recall, respectively. Bold entries show the best performing method for each evaluation measure.

Measure                        Best similarity threshold   Pr.     Rec.    Rej.    F1      f1      Acc.
PATH-SW-SyMSS                  0.4                         73.03   88.25   30.41   79.92   45.23   69.80
JCN-SW-SyMSS                   0.4                         75.29   85.02   42.25   79.86   53.02   71.83
RES-SW-SyMSS                   0.35                        71.91   91.13   25.39   80.38   39.71   69.92
LIN-SW-SyMSS                   0.4                         73.76   88.92   30.89   80.63   45.85   71.63
HSO-SW-SyMSS                   0.45                        71.47   78.20   43.16   74.68   55.62   69.27
VECTOR-SW-SyMSS                0.5                         74.5    88.71   38.15   80.99   53.35   72.08
Islam and Inkpen               0.6                         74.65   89.13   39.97   81.25   55.19   72.64
Mihalcea et al.                0.5                         69.60   97.70   –       81.30   –       70.30
Oliva et al. (JCN-WHW-SyMSS)   0.45                        74.47   84.17   41.61   79.02   55.68   70.87
Random                         –                           68.30   50.00   –       57.80   –       51.30
Vector-based                   0.5                         71.60   79.50   –       75.30   –       65.40
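For illustration, the (μ + λ) evolutionary strategy used earlier to fit the role weights can be sketched as follows. This is a simplified version with a toy fitness of our own; the original used μ = 30, λ = 200 and a Pearson-correlation fitness over the Experiment II data:

```python
import random

def evolve(fitness, dim=4, mu=30, lam=200, generations=50, sigma=0.1):
    """(mu + lambda) evolution strategy: the next generation is the best mu
    individuals among the parents and the offspring."""
    pop = [[random.random() for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = random.choice(pop)
            # Gaussian mutation of each gene, clipped to stay non-negative
            offspring.append([max(0.0, g + random.gauss(0.0, sigma)) for g in parent])
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:mu]
    return pop[0]

# Toy fitness: recover a known target weight vector (stands in for the
# Pearson-correlation fitness used in the paper).
target = [0.65, 0.75, 0.69, 0.55]
fit = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
best = evolve(fit)  # weights close to `target`
```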

weights to different syntactic roles in semantic-similarity-related tasks. This conclusion was already pointed out by Oliva et al. (2011). As commented before, they used the results of Wiemer-Hastings (2004) to show that a psychologically plausible weighing of syntactic roles led to a better fit to human evaluations. The presented approach obtains better results than the WHW-SyMSS version of Oliva et al. (2011) with three of the variations tested. Therefore, the working hypothesis is, once again, supported. Moreover, these results show that our psychological study, carried out in Experiment II, is more accurate than that of Wiemer-Hastings (2004) for measuring the quantitative contribution of each syntactic role in semantic-similarity-related tasks.

From a computational point of view, Table 5 shows that enhancing the psychological plausibility of an existing method led to an improvement in its overall performance. The proposed semantic similarity measure can compete with state-of-the-art methods of semantic similarity computation. Therefore, the contribution of this paper is relevant not only from a cognitive point of view but also from a computational point of view. As stated in the introduction, the importance of sentence semantic similarity measures in natural language research is increasing due to the great number of applications that are arising in many text-related research fields.

PW-SyMSS: Paraphrase task Weighted SyMSS

The work of Wiemer-Hastings in combination with this study shows that humans give different importance to different syntactic roles while computing semantic similarity. A natural question that arises is whether or not humans use similar weights when facing similar natural language processing tasks. In order to check this hypothesis, we compute the optimal weights for the paraphrase recognition task and compare them to the ones obtained for the semantic similarity computation task. As acknowledged by Corley and Mihalcea (2005) and Islam and Inkpen (2008), sentence semantic similarity measures are a necessary step in the paraphrase recognition task, so it could be expected that humans use similar weights in this task.

The optimal weights for the paraphrasing task were computed using an evolutionary strategy in the same way as for SW-SyMSS. A hundred pairs of sentences (50 paraphrase and 50 non-paraphrase) were selected from the Microsoft Paraphrase Corpus. The fitness of each combination of weights was calculated as the accuracy in the detection of paraphrase sentences among these 100 selected pairs. Table 6 shows the best combination of weights obtained after running the evolutionary strategy. We only show the results obtained with the VECTOR measure given that, following the previous study, it is the best
performing concept similarity measure in terms of accuracy.

The weight for the rest of the possible functions present in a sentence was set empirically to w_R = 0.07348. Using this combination of weights in the same way as for SW-SyMSS, we evaluated this new version of the proposed system with the Microsoft Paraphrase Corpus, obtaining the results shown in Table 7. Baseline results and the results obtained by Mihalcea et al. (2006) and Islam and Inkpen (2008) are also shown for the sake of comparison.

Table 6  Optimal weights for the paraphrase recognition task.

Syntactic role          Weight
Subject                 0.52791
Verb                    0.59672
Object                  0.47315
Adverbial complement    0.22383

The optimal weights obtained for the paraphrase recognition task show that humans tend to use similar weights although the two tasks are different. Despite the fact that the weights obtained for the paraphrase recognition task are lower than the ones obtained for the semantic similarity computation task, the relative influence of each syntactic role is almost the same for both tasks (see Table 8). In the paraphrase recognition task, humans give the highest influence to the verb, while the adverbial complement is, by far, the least important syntactic role. The results obtained for the subject and object roles show, once again, that the differences between subject and object are not significant in the calculation of sentence semantic similarity. As can be seen, for the semantic similarity computation task the object had a slightly higher weight than the subject, whereas for the paraphrase recognition task the subject is the role with the slightly higher weight. However, the differences are very small, so we can conclude that humans do not make a significant distinction between these two syntactic roles. This result matches up with the experiment carried out by Wiemer-Hastings (2004) and with Experiments I and II presented in this paper, which also show very slight differences between the effects of subject and object changes on semantic similarity.

The results obtained for both approaches on the paraphrase recognition task show, as expected, that the weights computed from the Microsoft Paraphrase Corpus are more suitable for the paraphrase recognition task. However, the results obtained using the weights computed from the semantic similarity task are only slightly different, showing again that the weights used by humans in different tasks are closely related. If we measure the performance of the paraphrase weights on the corpus used to compute the semantic similarity weights, we obtain a correlation coefficient of 0.924, which is slightly lower than the 0.974 obtained with the weights computed from this corpus. These results show that both combinations of weights are suitable for both tasks.

Once again, the results of this experiment are relevant not only from a cognitive point of view but also from a computational point of view. Four of the six approaches tested (PATH, JCN, LIN and VECTOR) significantly outperform the method of Mihalcea et al. (2006). This improvement was found to be significant (p < .05) using a parametric paired t-test, once we had confirmed that the data were normally distributed by a Chi-Square goodness-of-fit test (p < .05). Moreover, the results of two of them (JCN and VECTOR) are similar to the ones reported by Islam and Inkpen (2008) in terms of accuracy of paraphrase recognition, showing the importance of taking into account the different importance of syntactic roles in the computation of semantic similarity.

Conclusions

This paper proposes an interdisciplinary approach to better understand how our mind computes semantic similarity and, in particular, the different importance that humans give to different syntactic roles in its computation. We carried out a psychological study of how humans compute semantic similarity between sentences and then used a computational paradigm in order to test the validity of the hypotheses derived from that study. First of all, we presented an empirical study that seeks the different weights given by humans to different syntactic roles when computing semantic similarity. Two experiments were carried out to check the hypotheses that human beings tend to ignore similarities between segments with different functional roles and that different syntactic roles have different importance in their calculation of semantic similarity. The qualitative and quantitative results show that humans give great importance to verbs and low importance to adverbial complements in the computation of sentence semantic similarity. Furthermore, in our experiments we found no significant difference between the importance given by humans to the subject and the object roles, indicating that humans assign very similar weights to these syntactic roles.

In order to assess the validity of the conclusions obtained from the experiments carried out with humans, we used a computational paradigm. We incorporated the results of the empirical study into a psychologically plausible semantic similarity method described in Oliva et al. (2011) that takes into account the influence of different syntactic roles on the overall sentence meaning.

The semantic similarity method was applied to a paraphrase recognition task using two different combinations of weights: one obtained from twenty-seven human evaluators for a semantic similarity task and one from two human judgments for the same paraphrase recognition task. The results obtained with both versions confirm the different contribution of different syntactic roles to semantic similarity computation. The different variations tested with the two combinations of weights outperformed their non-weighted counterparts. Furthermore, they obtained results similar to the ones reported by Islam and Inkpen (2008) and Mihalcea et al. (2006) on the paraphrase recognition task. Moreover, four of the six approaches tested significantly outperform the method of Mihalcea et al. (2006), and the results of three of them are similar to the ones reported by Islam and Inkpen (2008). Finally, we compared the different weights given by humans to different syntactic roles on
Table 7  Results with the MSR corpus: PW-SyMSS variations, similar methods and baselines. Bold entries show the best performing method for each evaluation measure.

Measure                        Best similarity threshold   Pr.     Rec.    Rej.    F1      f1      Acc.
PATH-PW-SyMSS                  0.4                         71.65   91.23   28.32   80.26   43.22   71.09
HSO-PW-SyMSS                   0.45                        71.62   78.75   43.85   75.02   56.33   69.47
JCN-PW-SyMSS                   0.4                         75.42   84.91   44.01   79.88   57.97   72.12
RES-PW-SyMSS                   0.35                        72.63   91.46   27.10   80.96   41.81   70.84
LIN-PW-SyMSS                   0.4                         75.31   88.53   35.69   81.39   50.87   72.03
VECTOR-PW-SyMSS                0.5                         75.94   90.77   39.68   82.69   55.22   72.67
Islam and Inkpen               0.6                         74.65   89.13   39.97   81.25   55.19   72.64
Mihalcea et al.                0.5                         69.60   97.70   –       81.30   –       70.30
Oliva et al. (JCN-WHW-SyMSS)   0.45                        74.47   84.17   41.61   79.02   55.68   70.87
Random                         –                           68.30   50.00   –       57.80   –       51.30
Vector-based                   0.5                         71.60   79.50   –       75.30   –       65.40

Table 8  Relative influence of the different syntactic roles on each task.

Syntactic role          % Influence on semantic similarity computation   % Influence on paraphrase recognition
Subject                 24.70                                            28.98
Verb                    28.45                                            32.76
Object                  25.98                                            25.97
Adverbial complement    20.87                                            12.29
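As a quick check, the relative influences in Table 8 are simply each role's weight (Tables 3 and 6) normalized by the total weight mass for its task. The function name is ours:

```python
def relative_influence(weights):
    # Each role's weight as a percentage of the total weight for the task
    total = sum(weights.values())
    return {role: round(100 * w / total, 2) for role, w in weights.items()}

# Weights from Table 3 (similarity task) and Table 6 (paraphrase task)
similarity_task = {'Subject': 0.65293, 'Verb': 0.75191,
                   'Object': 0.68669, 'Adverbial complement': 0.55155}
paraphrase_task = {'Subject': 0.52791, 'Verb': 0.59672,
                   'Object': 0.47315, 'Adverbial complement': 0.22383}

col1 = relative_influence(similarity_task)   # matches Table 8, column 1
col2 = relative_influence(paraphrase_task)   # matches Table 8, column 2
```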

different tasks that involve semantic similarity computation. The weights obtained from the human ratings of semantic similarity and the ones obtained from the paraphrase recognition task were very similar, showing that humans tend to use the same weights across different tasks.

The experimental results obtained from both the psychological and the computational experiments are in line with the ones obtained by previous work (Wiemer-Hastings, 2004) and support the Principle of Compositionality. Moreover, our study goes a step further and tries to measure qualitatively and quantitatively the influence of different syntactic roles in the semantic similarity computation task. The fact that using these quantitative results improves the performance of a psychologically plausible measure of semantic similarity supports the working hypothesis. It also indicates that our psychological experiments are more accurate than the existing ones.

These results are bound to some limitations present in this work. The concept of similarity is a subjective idea that human subjects might interpret differently. A more accurate approach should take this into account, although the problem of determining, or at least grouping, different conceptions of similarity is indeed a hard problem. Besides, the evidenced influence of the syntactic roles has been characterized for English (Wiemer-Hastings, 2004) and Spanish (this work); it is not proven that the same holds for other languages. The revealed weights might be related to the linguistic typology, which in both mentioned languages is Subject–Verb–Object (SVO). Other languages of different typology, such as Egyptian (VSO) or Standard Mandarin (SOV), might yield different results. Moreover, the mode of the compared sentences can also influence the similarity assessment. The sentences used in this work are all expository, indicative and neutral, with no named entities. The influence of the syntactic roles might be affected by changes in these variables.

The interdisciplinary character of this work is not only assessed by the combination of experimental techniques derived from psycholinguistics and computational sciences. Moreover, the contributions of this paper are of interest both from a theoretical and from a practical point of view. On the one hand, our work goes a step forward in the quantitative and qualitative study of how our mind weighs different syntactic roles when computing semantic similarity. This is in accordance with recent neurolinguistic findings (Malaia & Newman, 2014) and contradicts the generative grammar theories. On the other hand, this paper shows how those results can be straightforwardly used in a psychologically plausible computational system with many practical applications, similarly to the work of Jackendoff (2007). That work and the one presented here are a proof of how computational approaches, together with neuroscientific findings, can help shape theory by a better approximation of actual human processing.

Future work includes extending the experiments to other syntactic roles in order to obtain general results about how humans weigh the most common syntactic roles when they compute sentence semantic similarity. It also includes extending the study to languages of other linguistic typologies. Furthermore, it would be interesting to check the hypothesis proposed by Bencini and Goldberg (2000), who proposed that argument structure constructions seem to play a crucial role in sentence interpretation, independent of the contribution of the main verb. Thus, it would be interesting to use the interdisciplinary approach of this study to check
that hypothesis. Moreover, from a computational point of view, it would be interesting to merge the conclusions of those two studies in order to enhance the psychological plausibility of the proposed semantic similarity measure. Other lines of future work are related to the application of the proposed method to different natural language processing tasks that involve semantic similarity computation to some extent. This way, it could be observed whether humans keep on using similar weights when facing different tasks.

Acknowledgment

This work has been funded by project PIE-201350E070.

Appendix. Spanish test sentences

Tables 9 and 10.

Table 9 Spanish test sentences used in Experiment I.


Syntactic role substituted Sentence
Synonyms
Subject El muchacho animó a su amigo rápidamente
Direct object El joven animó a su colega rápidamente
Adverb. comp. El joven animó a su amigo velozmente
Verb El joven alentó a su amigo rápidamente

Antonyms
Subject El viejo animó a su amigo rápidamente
Direct object El joven animó a su enemigo rápidamente
Adverb. comp. El joven animó a su amigo lentamente
Verb El joven desanimó a su amigo rápidamente

Table 10 Spanish test sentences used in Experiment II.


Syntactic role substituted Sentence pair
Subject Aquella gema sorprendió al hombre
Aquella joya sorprendió al hombre
Aquella grúa sorprendió al hombre
Aquella herramienta sorprendió al hombre
Aquel cristal sorprendió al hombre
Aquel mago sorprendió al hombre

Direct object El hombre vió la gema ayer


El hombre vió la joya ayer
El hombre vió la grúa ayer
El hombre vió la herramienta ayer
El hombre vió el cristal ayer
El hombre vió al mago ayer

Adverbial complement El hombre dejó mi bicicleta cerca de aquellas gemas


El hombre dejó mi bicicleta cerca de aquellas joyas
El hombre dejó mi bicicleta cerca de aquella grúa
El hombre dejó mi bicicleta cerca de aquellas herramientas
El hombre dejó mi bicicleta cerca de aquel cristal
El hombre dejó mi bicicleta cerca de aquel mago

Verb El hombre partió el contenedor


El hombre dividió el contenedor
El hombre aplastó el contenedor
El hombre partió el contenedor
El hombre vació el contenedor
El hombre situó el contenedor
References

Achananuparp, P., Hu, X., Zhou, X., & Zhang, X. (2008). Utilizing sentence similarity and question type similarity to response to similar questions in knowledge-sharing community. In Proceedings of the QAWeb 2008 workshop.

Achananuparp, P., Hu, X., & Yang, C. C. (2009). Addressing the variability of natural language expressions in sentence similarity with semantic structure of the sentences. In Proceedings of the 13th Pacific–Asia conference on knowledge discovery and data mining.

Aliguliyev, R. M. (2009). A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Systems with Applications, 36(4), 7764–7772.

Allen, J. (1995). Natural language understanding. The Benjamin/Cummings Publishing Company, Inc.

Bencini, G. M. L., & Goldberg, A. E. (2000). The contribution of argument structure constructions to sentence meaning. Journal of Memory and Language, 43(4), 640–651.

Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, (May), 48–57.

Corley, C., & Mihalcea, R. (2005). Measures of text semantic similarity. In Proceedings of the ACL workshop on empirical modeling of semantic equivalence.

De Jong, K. A. (Ed.). (2006). Evolutionary computation: A unified approach. MIT Press.

Dolan, W., Quirk, C., & Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th international conference on computational linguistics.

Fellbaum, C. (1998). WordNet: An electronic lexical database. MIT Press.

Gleitman, L., & Gillete, J. (1994). The role of syntax in verb learning. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language. Blackwell.

Goldstone, R. (1994). Similarity, interactive activation, and mapping. Journal of Experimental Psychology, 20(1), 3–28.

Healy, A., & Miller, G. (1970). The verb as the main determinant of sentence meaning. Psychonomic Science, 20.

Islam, A., & Inkpen, D. (2008). Semantic text similarity using corpus-based word similarity and string similarity. ACM

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes, 25, 259–284.

Li, Y., McLean, D., Bandar, Z., O'Shea, J., & Crockett, K. A. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138–1150.

Malaia, E., & Newman, S. (2014). Neural bases of event knowledge and syntax integration in comprehension of complex sentences. Neurocase, 20, 1–14.

Mihalcea, R., Corley, C., & Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the American Association for Artificial Intelligence (AAAI 2006).

Oliva, J., Serrano, J. I., Del Castillo, M. D., & Iglesias, A. (2011). SyMSS: A syntax-based measure for short-text semantic similarity. Data and Knowledge Engineering, 70(4), 390–405.

O'Shea, J., Bandar, Z., Crockett, K. A., & McLean, D. (2008). Agent and multi-agent systems: Technologies and applications (Vol. 4953, pp. 172–181). Springer.

Park, E. K., Ra, D. Y., & Jang, M. G. (2005). Techniques for improving web retrieval effectiveness. Information Processing and Management, 41(5), 1207–1223.

Pedersen, T., Banerjee, S., & Patwardhan, S. (2005). Maximizing semantic relatedness to perform word sense disambiguation (Research Report No. UMSI 2005/25). University of Minnesota Supercomputing Institute. <http://www.patwardhans.net/papers/PedersenBP05.pdf>

Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

Wiemer-Hastings, P. (2000). Adding syntactic information to LSA. In Proceedings of the 22nd annual conference of the Cognitive Science Society. Erlbaum.

Wiemer-Hastings, P. (2004). All parts are not created equal: SIAM-LSA. In Proceedings of the 26th annual conference of the Cognitive Science Society. Erlbaum.

Wiemer-Hastings, P., Wiemer-Hastings, K., & Graesser, A. (1999). How latent is latent semantic analysis? In Proceedings of the sixteenth international joint congress on artificial intelligence (pp. 932–937). Morgan Kaufman.

Wiemer-Hastings, P., & Ziprita, I. (2001). Rules for syntax, vectors for semantics. In Proceedings of the 23rd annual conference of the Cognitive Science Society. Erlbaum.

Yang, D., & Powers, D. M. W. (2006). Verb similarity on the taxonomy of WordNet. In Proceedings of the third international WordNet
Transactions on Knowledge Discovery from Data, 2(2), 1–25.
conference (pp. 121–128). Masaryk University..
Jackendoff, R. (2007). A parallel architecture perspective on
language processing. Brain Research, 1146, 2–22.